Note: This is a working paper which will be expanded/updated frequently. All suggestions for improvement are welcome. The directory gifi.stat.ucla.edu/block has a pdf version, the complete Rmd file, and the bib file.

# 1 Introduction

We use notation and terminology taken from De Leeuw (1994).

# 2 Block Relaxation

To minimize $$g:X\otimes Y\rightarrow\mathbb{R}$$ over $$x\in X$$ and $$y\in Y$$ we can use the block relaxation algorithm. \begin{align*} y^{(k+1)}&\in\mathbf{argmin}_{y\in Y}g(x^{(k)},y),\\ x^{(k+1)}&\in\mathbf{argmin}_{x\in X}g(x,y^{(k+1)}). \end{align*}

Note that the argmin’s are point-to-set maps, because the minima over blocks are not necessarily unique.

As an example, consider $$g(a,b)=\mathbf{SSQ}(y-Xa-Zb)$$ with $$\mathbf{SSQ}()$$ the sum of squares. The algorithm, using Moore-Penrose inverses, is \begin{align*} b^{(k+1)}&=Z^+(y-Xa^{(k)}),\\ a^{(k+1)}&=X^+(y-Zb^{(k+1)}). \end{align*}

# 3 Augmentation

Suppose the original problem is to minimize $$f:X\rightarrow\mathbb{R}$$ over $$x\in X$$ and we can find $$g:X\times Y\rightarrow\mathbb{R}$$ such that $$f(x)=\min_{y\in Y}g(x,y)$$. Such a $$g$$ is called an augmentation of $$f$$. Minimizing $$f$$ over $$x\in X$$ can be done by applying block relaxation to the augmentation $$g$$ over $$x\in X$$ and $$y\in Y$$.

In least squares factor analysis, for example, we minimize $f(X)=\mathbf{SSQ}(\mathbf{off}(R-XX')),$ where $$\mathbf{off}(X)=X-\mathbf{diag}(X)$$. Choose the augmentation $g(X,\Delta)=\mathbf{SSQ}(R-XX'-\Delta)$ where $$\Delta$$ varies over diagonal matrices. The block relaxation algorithm is \begin{align*} \Delta^{(k+1)}&=\mathbf{diag}(R-X^{(k)}(X^{(k)})'),\\ (R-\Delta^{(k+1)})X^{(k+1)}&=X^{(k+1)}\Lambda, \end{align*}

where $$\Lambda$$ is a symmetric matrix of Lagrange multipliers. Thus finding $$X^{(k+1)}$$ involves solving the eigen problem for $$R-\Delta^{(k+1)}$$.

# 4 Majorization

Again we want to minimize $$f:X\rightarrow\mathbb{R}$$ over $$x\in X$$. Suppose there is a $$g:X\times X\rightarrow\mathbb{R}$$ such that $$g(x,y)\geq f(x)$$ for all $$x\in X$$ and $$y\in X$$ and such that $$g(x,x)=f(x)$$ for all $$x\in X$$. Such a $$g$$ is called a majorization of $$f$$. Minimize $$f$$ over $$x\in X$$ by applying block relaxation to the majorization $$g$$ over $$x\in X$$ and $$y\in X$$.

Clearly any majorization of $$f$$ is also an augmentation of $$f$$. Majorization is a special type of augmentation because $$X=Y$$ and $$x\in\mathbf{argmin}_{y\in Y} g(x,y)$$. Thus the block relaxation is simply $x^{(k+1)}\in\mathbf{argmin}_{x\in X}g(x,x^{(k)}).$ Thus majorization algorithms are block relaxation algorithms.

# 5 Majorization from Blocking

Suppose $$h:X\otimes Z\rightarrow\mathbb{R}$$. Define $$T(x)=\mathbf{argmin}_{z\in Z} h(x,z)$$, and suppose $$t(x)$$ is a selection from $$T(x)$$, i.e. $$t(x)\in T(x)$$ for all $$x\in X$$. Define $$f(x)=h(x,t(x))$$ and $$g(x,y)=h(x,t(y))$$. Then $$g(x,y)\geq g(x,x)=f(x)$$. Thus $$g$$ is a majorization of $$f$$. The majorization algorithm for $$f$$ and $$g$$ is simply the block relaxation algorithm for $$h$$. Thus block relaxation algorithms are majorization algorithms. Our reasoning here is very similar to Lange (2016) (section 4.9).

As an example consider $h(X,\Delta)=\mathbf{SSQ}(R-XX'-\Delta).$ Then $f(X) = \mathbf{SSQ}(\mathbf{off}(R-XX')),$ and the majorization of $$f$$ is $g(X,Y)=\mathbf{SSQ}(R - XX'-\mathbf{diag}(R-YY')).$

Another example is $h(a,b)=\mathbf{SSQ}(y-Xa-Zb).$ Then $f(a)=(y-Xa)'(I-ZZ^+)(y-Xa),$ and the majorization of $$f$$ is $g(a,b)=\mathbf{SSQ}(y-Xa-ZZ^+(y-Xb)).$ Clearly we can also interchange the role of the two blocks. In the factor analysis example we can minimize out $$X$$ to get $f(\Delta)=\sum_{s=p+1}^n \lambda_s(R-\Delta),$ where the $$\lambda_s(X)$$ are the ordered eigenvalues of $$X$$ (assuming the $$p$$ largest eigenvalues are non-negative). The majorization function is $g(\Delta,\Omega)=\mathbf{SSQ}(R-\Delta-(R-\Omega)_p),$ with $$(X)_p$$ the best rank $$p$$ approximation of $$X$$.

# 6 Partial Majorization

Suppose the problem we want to solve is minimizing $$g(x,y)$$ over $$x\in X$$ and $$y\in Y$$. If both minimizing $$g(x,y)$$ over $$x\in X$$ for fixed $$y\in Y$$ and minimizing $$g(x,y)$$ over $$y\in Y$$ for fixed $$x\in X$$is easy, then we often use block-relaxation, alternating the two conditional minimization problems until convergence.

But now suppose only one of the two problems, say minimizing $$g(x,y)$$ over $$y\in Y$$ for fixed $$x\in X$$, is easy. Define $f(x)=\min_{y\in Y} g(x,y)$ and let $$y(x)$$( be any $$y\in Y$$ such that $$f(x)=g(x,y(x))$$.

Suppose we have a majorizing function $$h(x,z)$$ for $$f(x)$$. Thus \begin{align*} f(x)&\leq h(x,z)\qquad\forall x,z\in X,\\ f(x)&= h(x,x)\qquad\forall x\in X. \end{align*}

Suppose our curent best solution for $$x$$ is $$\tilde x$$, with corresponding $$\tilde y=y(\tilde x)$$. Let $$x^+$$ be any minimizer of $$h(x,\tilde x)$$ over $$x\in X$$. Now $g(x^+,y(x^+))=f(x^+)\leq h(x^+,\tilde x)\leq h(\tilde x,\tilde x)=f(\tilde x)=g(\tilde x,y(\tilde x))$ which means that $$(x^+,y(x^+))$$ gives a lower loss function value than $$(\tilde x,y(\tilde x))$$. Thus we have, under the usual conditions, a convergent algorithm.

# References

De Leeuw, J. 1994. “Block Relaxation Algorithms in Statistics.” In Information Systems and Data Analysis, edited by H.H. Bock, W. Lenski, and M.M. Richter, 308–24. Berlin: Springer Verlag. http://www.stat.ucla.edu/~deleeuw/janspubs/1994/chapters/deleeuw_C_94c.pdf.

Lange, K. 2016. MM Optimization Algorithms. SIAM.