The iterations of the majorization algorithm, started from -1.5, 0.5, and 0.0, are given next. Convergence is fast and ultimately seems to be quadratic. Note that, in this example at least, convergence is to the point where the two quadratic majorizers are equal. At this point
$$
f(x^{(k)})+f'(x^{(k)})(x^{(k+1)}-x^{(k)})=-f(x^{(k)})-f'(x^{(k)})(x^{(k+1)}-x^{(k)}),
$$
or
$$
x^{(k+1)}=x^{(k)}-\frac{f(x^{(k)})}{f'(x^{(k)})},
$$
which is a Newton step.

```{r ad_hoc, echo = FALSE}
source("auxilary.R")
a <- c(0,-1,0,1)/6
myAbs <- function (y, a, k, lw, up) {
  f <- polynomial (a)
  df <- deriv (f)
  ff <- c(predict (f, y), -predict (f, y))
  gg <- c(predict (df, y), -predict (df, y))
  return (minimaxQ (y, ff, gg, k, lw, up))
}
```

```{r ad_hoc_2}
h <- myIterator (xinit = -1.5, f = myAbs, a = a, k = 2, lw = -2, up = 2, verbose = TRUE, final = FALSE)
h <- myIterator (xinit = 0.5, f = myAbs, a = a, k = 2, lw = -2, up = 2, verbose = TRUE, final = FALSE)
h <- myIterator (xinit = 0.0, f = myAbs, a = a, k = 2, lw = -2, up = 2, verbose = TRUE, final = FALSE)
```

We can find better quadratic majorizers of our cubics by using the results of @deleeuw_E_16q. Define the sharp quadratic majorization constant
$$
K(y):=\max_{L\leq x\leq U}\left\{f''(y)+\frac13 f'''(y)(x-y)\right\}=f''(y)+\frac13\max\left(f'''(y)(L-y),\ f'''(y)(U-y)\right).
$$
For $\frac16(x^3-x)$ this gives $K_1(y)=y+\frac13(U-y)$, and for $-\frac16(x^3-x)$ it gives $K_2(y)=-y-\frac13(L-y)$. If $L=-2$ and $U=+2$, then $K_1(y)=\frac23 y+\frac23$ and $K_2(y)=-\frac23 y+\frac23$. At $y=-\frac32$ this means $K_1=-\frac13$ and $K_2=\frac53$. At $y=\frac12$ we have $K_1=1$ and $K_2=\frac13$. At $y=0$ we have $K_1=K_2=\frac23$. The piecewise quadratic majorizers are in figure `r figure_nums("sharp", display = "n")`. Note that for both $y=-\frac32$ and for $y=\frac12$ there are two interior knots.
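The sharp constants quoted above are easy to recompute. The following is a minimal sketch; the helpers `K1` and `K2` are illustrative and not part of `auxilary.R`.

```r
# Sharp quadratic majorization constants on [L, U] for
# f(x) = (x^3 - x)/6 (constant K1) and for -f (constant K2),
# specialized from K(y) = f''(y) + (1/3) max(f'''(y)(L-y), f'''(y)(U-y)).
K1 <- function (y, U = 2) y + (U - y) / 3
K2 <- function (y, L = -2) -y - (L - y) / 3
rbind (y = c(-1.5, 0.5, 0),
       K1 = K1 (c(-1.5, 0.5, 0)),   # -1/3, 1, 2/3
       K2 = K2 (c(-1.5, 0.5, 0)))   # 5/3, 1/3, 2/3
```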

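The Newton-step interpretation can also be illustrated directly. The sketch below (not part of the paper's code) iterates $x\leftarrow x-f(x)/f'(x)$ for $f(x)=\frac16(x^3-x)$; from the starting points $-1.5$ and $0.5$ the iterates converge to the root $-1$.

```r
# Newton iteration x <- x - f(x)/f'(x) for f(x) = (x^3 - x)/6,
# the step that the equal-majorizer condition reduces to.
newton <- function (x, eps = 1e-12, itmax = 100) {
  for (i in 1:itmax) {
    step <- ((x ^ 3 - x) / 6) / ((3 * x ^ 2 - 1) / 6)
    x <- x - step
    if (abs (step) < eps) break
  }
  return (x)
}
sapply (c(-1.5, 0.5), newton)   # both converge to the root -1
```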
```{r next_distraction, fig.align= "center", fig.width = 10, fig.height = 10, echo = FALSE, cache = FALSE}
a <- c(0,-1,0,1)/6
x <- seq (-2, 2, length = 1000)
f <- polynomial (a)
fx <- predict (f, x)
gx <- - predict (f, x)
df <- deriv (f)
y <- .5
k1 <- 1
k2 <- 1/3
ffx <- predict (f, y) + predict (df, y) * (x-y) + 0.5 * k1 * (x - y) ^ 2
ggx <- -predict (f, y) - predict (df, y) * (x-y) + 0.5 * k2 * (x - y) ^ 2
plot(x, pmax(fx, gx), type = "l", col = "RED", lwd = 2)
lines (x, pmax (ffx, ggx), col = "BLUE", lwd = 2)
y <- -1.5
k1 <- -1/3
k2 <- 5/3
ffx <- predict (f, y) + predict (df, y) * (x-y) + 0.5 * k1 * (x - y) ^ 2
ggx <- -predict (f, y) - predict (df, y) * (x-y) + 0.5 * k2 * (x - y) ^ 2
lines (x, pmax (ffx, ggx), col = "BLUE", lwd = 2)
y <- 0
k1 <- 2/3
k2 <- 2/3
ffx <- predict (f, y) + predict (df, y) * (x-y) + 0.5 * k1 * (x - y) ^ 2
ggx <- -predict (f, y) - predict (df, y) * (x-y) + 0.5 * k2 * (x - y) ^ 2
lines (x, pmax (ffx, ggx), col = "BLUE", lwd = 2)
abline(v=.5)
abline(v=-1.5)
abline(v=0)
```

As expected, using no lifting and sharp majorization requires fewer iterations to reach the roots, although of course each iteration is more work.

```{r ad_hoc_3, echo = FALSE}
source("auxilary.R")
a <- c(0,-1,0,1)/6
myAbs <- function (y, a, k, lw, up) {
  f <- polynomial (a)
  df <- deriv (f)
  ff <- c(predict (f, y), -predict (f, y))
  gg <- c(predict (df, y), -predict (df, y))
  return (minimaxB (y, ff, gg, k, lw, up))
}
```

```{r ad_hoc_4}
h <- myIterator (xinit = -1.5, f = myAbs, a = a, k = c(-1/3,5/3), lw = -2, up = 2, verbose = TRUE, final = FALSE)
h <- myIterator (xinit = 0.5, f = myAbs, a = a, k = c(1,1/3), lw = -2, up = 2, verbose = TRUE, final = FALSE)
h <- myIterator (xinit = 0.0, f = myAbs, a = a, k = c(2/3,2/3), lw = -2, up = 2, verbose = TRUE, final = FALSE)
```

The root-finding example can easily be extended to more interesting cases. If we have a convex function $f$ with a quadratic majorizer, then $-f$ is concave and, if so desired, it can be lifted to have the same quadratic term.

##MDS in the Chebyshev Norm

Consider the multidimensional scaling (MDS) problem of minimizing the loss function
$$
\sigma(x):=\max_{i=1}^n w_i\left|\delta_i-\sqrt{x'A_ix}\right|.
$$
Here the $w_i$ are positive weights, the $\delta_i$ are positive dissimilarities, and the $\sqrt{x'A_ix}$ are Euclidean distances. We give the quadratic majorizations that are needed. The first majorization, based on the Cauchy-Schwarz inequality, is actually linear. This is the basic SMACOF majorization, due to @deleeuw_C_77.
$$
\delta_i-\sqrt{x'A_{i}x}\leq \delta_i-\frac{1}{\sqrt{y'A_{i}y}}x'A_{i}y.
$$
The second majorization, based on the AM/GM inequality, is quadratic. It was first used in the MDS context by @heiser_91.
$$
\sqrt{x'A_ix}-\delta_i\leq \frac12\frac{1}{\sqrt{y'A_iy}}\{x'A_ix+y'A_iy\}-\delta_i.
$$
We can add an appropriate quadratic term to the first (linear) majorization to achieve multivariate lifting.
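Both bounds are easy to check numerically. The sketch below is illustrative only: a random positive semi-definite matrix `A` stands in for $A_i$, and none of the names come from the paper's code.

```r
set.seed (13)
p <- 4
B <- matrix (rnorm (p * p), p, p)
A <- crossprod (B)                     # random PSD stand-in for A_i
x <- rnorm (p)
y <- rnorm (p)
dxx <- sqrt (sum (x * (A %*% x)))      # sqrt(x'Ax)
dyy <- sqrt (sum (y * (A %*% y)))      # sqrt(y'Ay)
# Cauchy-Schwarz (linear) bound: -sqrt(x'Ax) <= -x'Ay / sqrt(y'Ay)
stopifnot (-dxx <= -sum (x * (A %*% y)) / dyy + 1e-10)
# AM/GM (quadratic) bound: sqrt(x'Ax) <= (x'Ax + y'Ay) / (2 sqrt(y'Ay))
stopifnot (dxx <= (dxx ^ 2 + dyy ^ 2) / (2 * dyy) + 1e-10)
```

Both inequalities hold with equality at $x=y$, which is what makes them majorizations at the current iterate.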
We will not give a full implementation in this paper, but in principle it is quite straightforward. It also is of some interest to observe that the alternative loss function
$$
\sigma(x):=\max_{i=1}^n w_i\left|\delta_i^2-x'A_ix\right|
$$
is already a maximum of quadratics, and consequently minimizing it does not require quadratic majorization. Without multivariate lifting, however, the regions where one quadratic function dominates all others can be quite complicated.

#Appendix: Code

##auxilary.R

```{r file_auxilary, code = readLines("auxilary.R")}
```

##minimax.R

```{r file_minimax, code = readLines("minimax.R")}
```

#References