--- title: "Majorizing Cubics on Intervals" author: "Jan de Leeuw" date: "Version 15, December 11, 2016" output: pdf_document: keep_tex: yes number_sections: yes toc: yes toc_depth: 3 html_document: keep_md: yes number_sections: yes toc: yes toc_depth: 3 fontsize: 12pt graphics: yes bibliography: cubic.bib abstract: We illustrate uniform quadratic majorization, sharp quadratic majorization, and sublevel quadratic majorization using the example of a univariate cubic. --- {r function_code, echo = FALSE} source("cubic.R") source("auxilary.R") source("iterate.R") source("sublevel.R")  {r packages, echo = FALSE} options (digits = 10) library(captioner) library(polynom)  {r equation_numbering, echo = FALSE} figure_nums <- captioner (prefix = "Figure") theorem_nums <- captioner (prefix = "Theorem") lemma_nums <- captioner (prefix = "Lemma") corollary_nums <- captioner (prefix = "Corollary") figure_nums (name = "first_example", caption = "Example Cubic", display = FALSE) figure_nums (name = "sublevel", caption = "Sublevel Majorization", display = FALSE)  Note: This is a working paper which will be expanded/updated frequently. All suggestions for improvement are welcome. The directory [gifi.stat.ucla.edu/cubic](http://gifi.stat.ucla.edu/cubic) has a pdf version, the complete Rmd file with all code chunks, the bib file, and the R source code. #Introduction Suppose $\mathcal{I}$ is the closed interval $[L,U]$, and $f:\mathcal{I}\rightarrow\mathbb{R}$. A function $g:\mathcal{I}\otimes\mathcal{I}\rightarrow\mathbb{R}$ is a *majorization scheme* for $f$ on $\mathcal{I}$ if * $g(x,x) = f(x)$ for all $x\in\mathcal{I}$, * $g(x,y)\geq f(x)$ for all $x,y\in\mathcal{I}$. In other words for each $y$ the global minimum of $g(x,y)-f(x)$ over $x\in\mathcal{I}$ is zero, and it is attained at $y$. If the functions $f$ and $g$ are differentiable and the minimum is attained at an interior point of the interval, we have $\mathcal{D}_1g(x,y)=\mathcal{D}f(x)$. If the functions are in addition twice differentiable we have $\mathcal{D}_{11}g(x,y)\geq\mathcal{D}^2f(x)$. The majorization conditions are not symmetric in $x$ and $y$, and consequently it sometimes is more clear to write $g_y(x)$ for $g(x,y)$, so that $g_y:\mathcal{I}\rightarrow\mathbb{R}$. We say that $g_y$ majorizes $f$ on $\mathcal{I}$ at $y$, or with *support point* $y$. A *majorization algorithm* is of the form $$x^{(k+1)}\in\mathop{\mathbf{argmin}}_{x\in\mathcal{I}}g(x,x^{(k)})$$ It then follows that $$\label{E:sandwich} f(x^{(k+1)})\leq g(x^{(k+1)},x^{(k)})\leq g(x^{(k)},x^{(k)})=f(x^{(k)}),$$ Thus a majorization step decreases the value of the objective function. The chain $\eqref{E:sandwich}$ is called the *sandwich inequality*. In $\eqref{E:sandwich}$ the inequality $f(x^{(k+1)})\leq g(x^{(k+1)},x^{(k)})$ follows from majorization, the inequality $g(x^{(k+1)},x^{(k)})\leq g(x^{(k)},x^{(k)})$ follows from minimization. This explains why majorization algorithms are also called *MM algorithms* (@lange_16). Using the *MM* label has the advantage that it can also be used for the dual family of minorization-maximization algorithms. In this note we are interested in *quadratic majorization*, i.e. in majorization functions of the form $$\label{E:qmaj} g(x,y)=f(y)+f'(y)(x-y)+\frac12 K(x-y)^2,$$ and specifically on quadratic majorizers of a cubic on an closed interval of the real line. If $K\leq 0$ in $\eqref{E:qmaj}$ the majorizaton function is concave and attains its minimum at one of endpoints of $\mathcal{I}$. If $K>0$ then define the *algorithmic map* $\mathcal{A}(x)=x-f'(x)/K$ and $$x^{(k+1)}=\begin{cases}L&\text{ if }\mathcal{A}(x^{(k)})U.\end{cases}$$ Assuming that the sequence $x^{(k)}$ converges to a fixed point $x_\infty$ of $\mathcal{A}$, i.e a point with $\mathcal{D}f(x_\infty)=0$, the rate of convergence is $$\rho(x_\infty)=1-\frac{\mathcal{D}^2f(x_\infty)}{K}.$$ # Majorizing a Cubic Suppose $f$ is a non-trivial cubic, with non-zero leading coefficient. The function and its derivatives are \begin{align*} f(x)&=d+cx+bx^2+ax^3,\\ f'(x)&=c+2bx+3ax^2,\\ f''(x)&=2b+6ax,\\ f'''(x)&=6a. \end{align*} ##Uniform Quadratic Majorization Define $$K_0=\max_{L\leq x\leq U}f''(x)=\max(f''(A),f''(B)).$$ Thus $$K_0=\begin{cases}6aU+2b&\text{ if }a>0,\\6aL+2b&\text{ if }a<0.\end{cases}$$ Our majorization function is the quadratic $$g(x,y):=f(y)+f'(y)(x-y)+\frac12 K_0(x-y)^2.$$ The corresponding majorization algorithm is $$x^{(k+1)}=\mathop{\mathbf{argmin}}_{L\leq x\leq U}g(x,x^{(k)})$$ Note the majorizing function can be a concave quadratic, in which case its minimum is always at one of the endpoints of the interval. Assuming we eventualy converge to a value L0,\\\frac{6a(L-x_\infty)}{6aL+2b}&\text{ if }a<0.\end{cases} Note that this does not depend on the lower limit L of the interval if a>0. Convergence is faster if the upper limit U happens to be close to x_\infty, and in fact it can be close to zero. If U\rightarrow\infty the rate of convergence goes to one. {r a_cubic, echo = FALSE} a <- c(1,-2,0,1)/6 p <- polynomial (a) r <- solve (p) d1p <- deriv (p) r1 <- solve (d1p) d2p <- deriv (d1p) d3p <- deriv (d2p)  As a first example, consider f(x)=\frac16(1-2x+x^3). This cubic has roots at r r. There is a local maximum at -\frac13\sqrt{6} and a local minimum at \frac13\sqrt{6}. The function (red), its first derivative (blue), and its second derivative (green) are in figure r figure_nums("first_example", display = "n"). {r first_example, fig.align= "center", fig.width = 10, fig.height = 10, echo = FALSE, cache = FALSE} a <- c(0,-1,0,1)/6 #a <- c(1,-2,0,1)/6 x <- seq (-2, 2, length = 1000) y <- ecubic (a, x) plot (x, yfx, xlim = c(-2,2), ylim = c(-1,1), type = "l", col = "RED", ylab = "y", lwd = 2) lines (x, ygx, col = "BLUE", lwd = 2) lines (x, yhx, col = "GREEN", lwd = 2) abline(h=0) abline(v=1) abline(v=-1) abline(v=0) abline(v=sqrt(3)/3) abline(v=-sqrt(3)/3)  r figure_nums("first_example") We will look for a local minimum in the interval [-2,2], stating with initial value 1. Note that f''(x)=x and thus K_0=B=2. At x_\infty=\frac13\sqrt{3} we have \rho(x_\infty)=1-\frac16\sqrt{3}, i.e. approximately r 1-sqrt(3)/6. We report the results of the final iteration. {r first_analysis_1, echo = FALSE, size = "tiny"} h <- myIterator(xinit = 1.0, f = cubicUQ, eps = 1e-6, itmax = 100, verbose = FALSE, final = TRUE, a = a, up = 2, lw = -2, sharp = FALSE)  If we look for a local minimum in the interval [0,1] instead of [-2,2], we get much faster convergence, because the upper bound is now much closer to the solution. {r first_analysis_2, echo = FALSE, size = "tiny"} h <- myIterator(xinit = 0.5, f = cubicUQ, eps = 1e-6, itmax = 100, verbose = FALSE, final = TRUE, a = a, up = 1, lw = 0, sharp = FALSE)  If we start at a value to the left of -\frac13\sqrt{3} and look for a minimum in [-2,2] then the algorithm converges to the boundary at -2. {r first_analysis_3, echo = FALSE, size = "tiny"} h <- myIterator(xinit = -1.5, f = cubicUQ, eps = 1e-6, itmax = 100, verbose = FALSE, final = TRUE, a = a, up = 2, lw = -2, sharp = FALSE)  Note that alteratively we could have used K_0^+:=\max_{L\leq x\leq U} |f''(x)|$in our majorization fuctions. Since$f''$is linear we see that$|f''|$is convex, and thus$K_0^+=\max(|f''(L)|,|f''(U)|$. Using$K_0^+$gives a majorization which is generally less precise, but uses majorization functions that are always convex quadratics. Also note that if there is a strict local minimum in$\mathcal{I}$then$K_0>0$, although we can still have$K_0 {r tester, fig.align= "center", echo = FALSE} par(mfrow=c(2,2)) func <- function (x) (x ^ 3 - x) / 6 grad <- function (x) (3 * x ^ 2 - 1) / 6 tester (.5, 2, func, grad) tester (.5, 1, func, grad) tester (.5, .55, func, grad) tester (.5, .50, func, grad) 
r figure_nums("sublevel")

If we are close to a strict local minimum $x$ we have $f'(x)\approx 0$ and $f''(x)>0$. Thus the quadratic will have one root approximately zero and one root approximately equal to $f''(x)$, and the iteration is basically a Newton iteration. Thus, at least for cubics, sublevel majorization has quadratic convergence. We illustrate this by analyzing our small example with sublevel majorization, starting from $x=1$ and $x=\frac12$. {r third_analysis_1, echo = FALSE, size = "tiny"} h <- myIterator(xinit = 1, f = cubicSublevel, eps = 1e-6, itmax = 100, verbose = TRUE, final = FALSE, a = a)  {r third_analysis_2, echo = FALSE, size = "tiny"} h <- myIterator(xinit = .5, f = cubicSublevel, eps = 1e-6, itmax = 100, verbose = TRUE, final = FALSE, a = a)  If $\mathcal{I}=[L,U]$ then our analysis must be modified slightly. We want inequality $\eqref{E:sharp}$ on the intersection of the sublevel interval and $[L,U]$. If the sublevel interval is in $[L,U]$ our previous analysis applies. Because we always have $y\in[L,U]$ the other possible intervals are $[y,U]$ if $y\leq U\leq y-2f'(y)/K$ and $[L,y]$ if $y-2f'(y)/K\leq L\leq y$. #Appendix: Code ##auxilary.R {r file_auxilary, code = readLines("auxilary.R")}  ##iterate.R {r file_auxilary, code = readLines("iterate.R")}  ##sublevel.R {r file_auxilary, code = readLines("sublevel.R")}  #References