# Majorizing Cubics on Intervals

Jan de Leeuw

Version 15, December 11, 2016

Note: This is a working paper which will be expanded/updated frequently. All suggestions for improvement are welcome. The directory [gifi.stat.ucla.edu/cubic](http://gifi.stat.ucla.edu/cubic) has a pdf version, the complete Rmd file with all code chunks, the bib file, and the R source code.

# Introduction

Suppose $\mathcal{I}$ is the closed interval $[L,U]$, and $f:\mathcal{I}\rightarrow\mathbb{R}$. A function $g:\mathcal{I}\otimes\mathcal{I}\rightarrow\mathbb{R}$ is a *majorization scheme* for $f$ on $\mathcal{I}$ if

* $g(x,x) = f(x)$ for all $x\in\mathcal{I}$,
* $g(x,y)\geq f(x)$ for all $x,y\in\mathcal{I}$.

In other words, for each $y$ the global minimum of $g(x,y)-f(x)$ over $x\in\mathcal{I}$ is zero, and it is attained at $y$. If the functions $f$ and $g$ are differentiable and the minimum is attained at an interior point of the interval, we have $\mathcal{D}_1g(y,y)=\mathcal{D}f(y)$. If the functions are in addition twice differentiable we have $\mathcal{D}_{11}g(y,y)\geq\mathcal{D}^2f(y)$.

The majorization conditions are not symmetric in $x$ and $y$, and consequently it is sometimes clearer to write $g_y(x)$ for $g(x,y)$, so that $g_y:\mathcal{I}\rightarrow\mathbb{R}$. We say that $g_y$ majorizes $f$ on $\mathcal{I}$ at $y$, or with *support point* $y$.

A *majorization algorithm* is of the form
$$x^{(k+1)}\in\mathop{\mathbf{argmin}}_{x\in\mathcal{I}}g(x,x^{(k)}).$$
It then follows that
\begin{equation}\label{E:sandwich}
f(x^{(k+1)})\leq g(x^{(k+1)},x^{(k)})\leq g(x^{(k)},x^{(k)})=f(x^{(k)}).
\end{equation}
Thus a majorization step decreases the value of the objective function. The chain $\eqref{E:sandwich}$ is called the *sandwich inequality*. In $\eqref{E:sandwich}$ the inequality $f(x^{(k+1)})\leq g(x^{(k+1)},x^{(k)})$ follows from majorization, and the inequality $g(x^{(k+1)},x^{(k)})\leq g(x^{(k)},x^{(k)})$ follows from minimization.
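The majorization conditions and the sandwich inequality are easy to check numerically. The following sketch is an illustration added here, not part of the paper's code: it uses the classic quadratic majorizer of $f(x)=\cos x$ with constant curvature bound $K=1$, which works because $f''(x)=-\cos x\leq 1$.

```r
# Illustration (not from the paper): quadratic majorizer of f(x) = cos(x)
# at support point y, g(x, y) = f(y) + f'(y)(x - y) + (1/2) K (x - y)^2
# with K = 1, valid globally since f''(x) = -cos(x) <= 1.
f <- function (x) cos (x)
g <- function (x, y) cos (y) - sin (y) * (x - y) + 0.5 * (x - y) ^ 2
y <- 1                                       # support point in I = [0, pi]
x <- seq (0, pi, length = 1000)
stopifnot (abs (g (y, y) - f (y)) < 1e-12)   # g(y, y) = f(y)
stopifnot (all (g (x, y) >= f (x) - 1e-12))  # g(x, y) >= f(x) on I
xnew <- y + sin (y)          # minimizer of g(., y), from D1 g(x, y) = 0
stopifnot (f (xnew) <= g (xnew, y), g (xnew, y) <= f (y))  # sandwich
```

Running the block silently is the check: each `stopifnot` asserts one of the two majorization conditions or one link of the sandwich chain.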
This explains why majorization algorithms are also called *MM algorithms* (@lange_16). Using the *MM* label has the advantage that it can also be used for the dual family of minorization-maximization algorithms.

In this note we are interested in *quadratic majorization*, i.e. in majorization functions of the form
\begin{equation}\label{E:qmaj}
g(x,y)=f(y)+f'(y)(x-y)+\frac12 K(x-y)^2,
\end{equation}
and specifically in quadratic majorizers of a cubic on a closed interval of the real line.

If $K\leq 0$ in $\eqref{E:qmaj}$ the majorization function is concave and attains its minimum at one of the endpoints of $\mathcal{I}$. If $K>0$ then define the *algorithmic map* $\mathcal{A}(x)=x-f'(x)/K$ and
$$x^{(k+1)}=\begin{cases}L&\text{ if }\mathcal{A}(x^{(k)})<L,\\\mathcal{A}(x^{(k)})&\text{ if }L\leq\mathcal{A}(x^{(k)})\leq U,\\U&\text{ if }\mathcal{A}(x^{(k)})>U.\end{cases}$$
Assuming that the sequence $x^{(k)}$ converges to a fixed point $x_\infty$ of $\mathcal{A}$, i.e. a point with $\mathcal{D}f(x_\infty)=0$, the rate of convergence is
$$\rho(x_\infty)=1-\frac{\mathcal{D}^2f(x_\infty)}{K}.$$

# Majorizing a Cubic

Suppose $f$ is a non-trivial cubic, with non-zero leading coefficient. The function and its derivatives are
\begin{align*}
f(x)&=d+cx+bx^2+ax^3,\\
f'(x)&=c+2bx+3ax^2,\\
f''(x)&=2b+6ax,\\
f'''(x)&=6a.
\end{align*}

## Uniform Quadratic Majorization

Define
$$K_0=\max_{L\leq x\leq U}f''(x)=\max(f''(L),f''(U)).$$
Thus
$$K_0=\begin{cases}6aU+2b&\text{ if }a>0,\\6aL+2b&\text{ if }a<0.\end{cases}$$
Our majorization function is the quadratic
$$g(x,y):=f(y)+f'(y)(x-y)+\frac12 K_0(x-y)^2.$$
The corresponding majorization algorithm is
$$x^{(k+1)}=\mathop{\mathbf{argmin}}_{L\leq x\leq U}g(x,x^{(k)}).$$
Note that the majorizing function can be a concave quadratic, in which case its minimum is always at one of the endpoints of the interval. Assuming we eventually converge to a value $L<x_\infty<U$ with $\mathcal{D}f(x_\infty)=0$, the rate of convergence is
$$\rho(x_\infty)=\begin{cases}\dfrac{6a(U-x_\infty)}{6aU+2b}&\text{ if }a>0,\\\dfrac{6a(L-x_\infty)}{6aL+2b}&\text{ if }a<0.\end{cases}$$
Note that this does not depend on the lower limit $L$ of the interval if $a>0$.
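The uniform scheme can be sketched in a few lines of R. The cubic $f(x)=x^3-3x$ on $[0,2]$ is a made-up example, not one of the paper's: here $a=1$ and $b=0$, so $K_0=6aU+2b=12$, and each step clamps the algorithmic map to the interval.

```r
# Sketch (hypothetical example): uniform quadratic majorization for
# f(x) = x^3 - 3x on [L, U] = [0, 2], i.e. a = 1, b = 0, c = -3, d = 0.
lw <- 0; up <- 2
fp <- function (x) 3 * x ^ 2 - 3       # f'(x) = c + 2bx + 3ax^2
k0 <- 6 * 1 * up + 2 * 0               # K_0 = 6aU + 2b = 12, since a > 0
x <- 2                                 # starting value
for (iter in 1:200)                    # x <- clamp(A(x)), A(x) = x - f'(x)/K_0
  x <- min (up, max (lw, x - fp (x) / k0))
# x is now (numerically) the interior local minimum at 1; the linear
# rate there is 1 - f''(1)/K_0 = 1 - 6/12 = 1/2
stopifnot (abs (x - 1) < 1e-8, abs (fp (x)) < 1e-6)
```

The clamping reproduces the three-case update above: the unconstrained step is taken whenever it lands inside $[L,U]$, and is otherwise projected onto the nearer endpoint.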
Convergence is faster if the upper limit $U$ happens to be close to $x_\infty$, and in fact the rate can be close to zero. If $U\rightarrow\infty$ the rate of convergence goes to one.

As a first example, consider $f(x)=\frac16(x^3-x)$. This cubic has roots at $-1$, $0$, and $1$. There is a local maximum at $-\frac13\sqrt{3}$ and a local minimum at $\frac13\sqrt{3}$. The function (red), its first derivative (blue), and its second derivative (green) are in figure 1.

Figure 1: Example Cubic

We will look for a local minimum in the interval $[-2,2]$, starting with initial value $1$. Note that $f''(x)=x$ and thus $K_0=f''(U)=2$. At $x_\infty=\frac13\sqrt{3}$ we have $\rho(x_\infty)=1-\frac16\sqrt{3}$, i.e. approximately 0.7113248654. We report the results of the final iteration.

```
## Iteration:   35 xinit:  1.00000000 xfinal:  0.57735207 rate:  0.71132334
```

If we look for a local minimum in the interval $[0,1]$ instead of $[-2,2]$, we get much faster convergence, because the upper bound is now much closer to the solution.

```
## Iteration:   14 xinit:  0.50000000 xfinal:  0.57734974 rate:  0.42265183
```

If we start at a value to the left of $-\frac13\sqrt{3}$ and look for a minimum in $[-2,2]$, then the algorithm converges to the boundary at $-2$.

```
## Iteration:    3 xinit: -1.50000000 xfinal: -2.00000000 rate:  0.00000000
```

Note that alternatively we could have used
$$K_0^+:=\max_{L\leq x\leq U} |f''(x)|$$
in our majorization functions. Since $f''$ is linear we see that $|f''|$ is convex, and thus $K_0^+=\max(|f''(L)|,|f''(U)|)$. Using $K_0^+$ gives a majorization which is generally less precise, but uses majorization functions that are always convex quadratics. Also note that if there is a strict local minimum in $\mathcal{I}$ then $K_0>0$, although we can still have $K_0^+>K_0$.

Figure 2: Sublevel Majorization

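One sublevel-majorization step for the example cubic $f(x)=\frac16(x^3-x)$ (the function `a` in the appendix code) can be sketched as follows. The sketch is modeled on the `cubicSublevel` routine in the appendix, which takes the bound $K$ from the roots of the quadratic $K^2-f''(y)K+\frac23 f'''(y)f'(y)=0$; it assumes a positive discriminant and at least one nonnegative root.

```r
# Sketch of one sublevel-majorization step for f(x) = (x^3 - x)/6,
# modeled on cubicSublevel in the appendix. The bound k is a root of
# k^2 - f''(y) k + (2/3) f'''(y) f'(y) = 0; this sketch assumes the
# discriminant is positive and a nonnegative root exists.
sublevelStep <- function (y) {
  dfy <- (3 * y ^ 2 - 1) / 6           # f'(y)
  dgy <- y                             # f''(y)
  dhy <- 1                             # f'''(y)
  disk <- dgy ^ 2 - (8 / 3) * dhy * dfy
  r <- sort ((dgy + c (-1, 1) * sqrt (disk)) / 2)
  k <- ifelse (r[1] >= 0, dgy, r[2])
  y - dfy / k
}
sublevelStep (1)   # 2/3, matching the first iteration of the run below
```

With $y=1$ we get $f'(y)=\frac13$, $f''(y)=1$, roots $\frac13$ and $\frac23$, hence $k=1$ and the step $1-\frac13=\frac23$, which agrees with the first line of the iteration log that follows.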
If we are close to a strict local minimum $x$ we have $f'(x)\approx 0$ and $f''(x)>0$. Thus the quadratic in $K$ will have one root approximately zero and one root approximately equal to $f''(x)$, and the iteration is basically a Newton iteration. Hence, at least for cubics, sublevel majorization has quadratic convergence. We illustrate this by analyzing our small example with sublevel majorization, starting from $x=1$ and $x=\frac12$.

```
## Iteration:    1 xold:   1.00000000 xnew:   0.66666667 cnew:   0.33333333 rate:   0.00000000
## Iteration:    2 xold:   0.66666667 xnew:   0.58333333 cnew:   0.08333333 rate:   0.25000000
## Iteration:    3 xold:   0.58333333 xnew:   0.57738095 cnew:   0.00595238 rate:   0.07142857
## Iteration:    4 xold:   0.57738095 xnew:   0.57735027 cnew:   0.00003068 rate:   0.00515464
## Iteration:    5 xold:   0.57735027 xnew:   0.57735027 cnew:   0.00000000 rate:   0.00002657
```

```
## Iteration:    1 xold:   0.50000000 xnew:   0.57569391 cnew:   0.07569391 rate:   0.00000000
## Iteration:    2 xold:   0.57569391 xnew:   0.57734948 cnew:   0.00165557 rate:   0.02187189
## Iteration:    3 xold:   0.57734948 xnew:   0.57735027 cnew:   0.00000079 rate:   0.00047792
```

If $\mathcal{I}=[L,U]$ then our analysis must be modified slightly. We want inequality $\eqref{E:sharp}$ on the intersection of the sublevel interval and $[L,U]$. If the sublevel interval is in $[L,U]$ our previous analysis applies. Because we always have $y\in[L,U]$ the other possible intervals are $[y,U]$ if $y\leq U\leq y-2f'(y)/K$ and $[L,y]$ if $y-2f'(y)/K\leq L\leq y$.

# Appendix: Code

## auxilary.R

```r
library (polynom) # for polynomial (), deriv (), predict (), solve ()

# print a numeric object with fixed width and decimals
mprint <- function (x, d = 2, w = 5) {
  print (noquote (formatC (
    x, di = d, wi = w, fo = "f"
  )))
}

# cobweb plot of the iterations of the map func, starting at xold
cobwebPlotter <-
  function (xold,
            func,
            lowx = 0,
            hghx = 1,
            lowy = lowx,
            hghy = hghx,
            eps = 1e-10,
            itmax = 25,
            ...) {
    x <- seq (lowx, hghx, length = 100)
    y <- sapply (x, function (x)
      func (x, ...))
    plot (
      x,
      y,
      xlim = c(lowx, hghx),
      ylim = c(lowy, hghy),
      type = "l",
      col = "RED",
      lwd = 2
    )
    abline (0, 1, col = "BLUE")
    base <- 0
    itel <- 1
    repeat {
      xnew <- func (xold, ...)
      if (itel > 1) {
        lines (matrix (c(xold, xold, base, xnew), 2, 2))
      }
      lines (matrix (c(xold, xnew, xnew, xnew), 2, 2))
      if ((abs (xnew - xold) < eps) || (itel == itmax)) {
        break
      }
      base <- xnew
      xold <- xnew
      itel <- itel + 1
    }
  }

# minimize the quadratic with coefficient vector a over [lw, up]
minQuadratic <- function (a, lw, up) {
  f <- polynomial (a)
  fup <- predict (f, up)
  flw <- predict (f, lw)
  if (a[3] <= 0) {   # concave (or linear): minimum at an endpoint
    if (fup <= flw)
      return (list (x = up, f = fup))
    if (fup >= flw)
      return (list (x = lw, f = flw))
  }
  xmn <- -a[2] / (2 * a[3])
  fmn <- predict (f, xmn)
  if (xmn >= up)
    return (list (x = up, f = fup))
  if (xmn <= lw)
    return (list (x = lw, f = flw))
  return (list (x = xmn, f = fmn))
}
```

## iterate.R

```r
# iterate the map f from xinit until the change is below eps
myIterator <-
  function (xinit,
            f,
            eps = 1e-6,
            itmax = 100,
            verbose = FALSE,
            final = TRUE,
            ...) {
    xold <- xinit
    cold <- Inf
    itel <- 1
    repeat {
      xnew <- f (xold, ...)
      cnew <- abs (xnew - xold)
      rate <- cnew / cold
      if (verbose)
        cat (
          "Iteration: ",
          formatC (itel, width = 3, format = "d"),
          "xold: ",
          formatC (xold, digits = 8, width = 12, format = "f"),
          "xnew: ",
          formatC (xnew, digits = 8, width = 12, format = "f"),
          "cnew: ",
          formatC (cnew, digits = 8, width = 12, format = "f"),
          "rate: ",
          formatC (rate, digits = 8, width = 12, format = "f"),
          "\n"
        )
      if ((cnew < eps) || (itel == itmax))
        break
      xold <- xnew
      cold <- cnew
      itel <- itel + 1
    }
    if (final)
      cat (
        "Iteration: ",
        formatC (itel, width = 3, format = "d"),
        "xinit: ",
        formatC (xinit, digits = 8, width = 6, format = "f"),
        "xfinal: ",
        formatC (xnew, digits = 8, width = 6, format = "f"),
        "rate: ",
        formatC (rate, digits = 8, width = 6, format = "f"),
        "\n"
      )
    return (list (
      itel = itel,
      xinit = xinit,
      xfinal = xnew,
      change = cnew,
      rate = rate
    ))
  }

# one uniform (or sharp) quadratic majorization step for the cubic
# with coefficient vector a on [lw, up]
cubicUQ <- function (x, a, up, lw, sharp = FALSE) {
  f <- polynomial (a)
  g <- deriv (f)
  h <- deriv (g)
  i <- deriv (h)
  if (!sharp) {
    if (a[4] > 0)
      k <- predict (h, up)
    if (a[4] < 0)
      k <- predict (h, lw)
  }
  if (sharp) {
    if (a[4] > 0)
      k <- predict (h, x) + predict (i, x) * (up - x) / 3
    if (a[4] < 0)
      k <- predict (h, x) + predict (i, x) * (lw - x) / 3
  }
  xmin <- x - predict (g, x) / k
  if ((xmin <= up) && (xmin >= lw))
    return (xmin)
  fup <- predict (f, up)
  flw <- predict (f, lw)
  return (ifelse (fup < flw, up, lw))
}
```

## sublevel.R

```r
# plot f and its quadratic majorizer with bound k over the sublevel interval
tester <- function (y, k, func, grad) {
  qmaj <- function (x)
    func (y) + grad (y) * (x - y) + .5 * k * (x - y) ^ 2
  ybnd <- y - 2 * grad (y) / k
  up <- max (y, ybnd)
  lw <- min (y, ybnd)
  x <- seq (lw, up, length = 100)
  s <- paste ("K = ", formatC (k, digits = 1, format = "f"))
  plot (
    x,
    func (x),
    col = "RED",
    lwd = 2,
    type = "l",
    ylab = "f,g",
    main = s
  )
  lines (x, qmaj (x), col = "BLUE", lwd = 2)
  abline (v = up)
  abline (v = lw)
}

f <- function (x) log (1 + exp (x))
g <- function (x) exp (x) / (1 + exp (x))
a <- function (x) (x ^ 3 - x) / 6
b <- function (x) (3 * x ^ 2 - 1) / 6

# one sublevel majorization step for the cubic with coefficient vector a
cubicSublevel <- function (y, a, sharp = FALSE) {
  f <- polynomial (a)
  g <- deriv (f)
  h <- deriv (g)
  i <- deriv (h)
  dfy <- predict (g, y)
  dgy <- predict (h, y)
  dhy <- predict (i, y)
  disk <- dgy ^ 2 - 4 * (2 / 3) * dhy * dfy
  if (disk <= 0)
    k <- max (0, dgy)
  else {
    # roots of k^2 - f''(y) k + (2/3) f'''(y) f'(y), sorted
    r <- sort (solve (polynomial (c((2 / 3) * dhy * dfy, -dgy, 1))))
    if ((r[1] >= 0) && (r[2] >= 0))
      k <- dgy
    if ((r[1] <= 0) && (r[2] >= 0))
      k <- r[2]
    if ((r[1] <= 0) && (r[2] <= 0))
      stop ("unbounded")
  }
  return (y - dfy / k)
}
```

# References