# Block Relaxation as Majorization
Jan de Leeuw
Version 007, November 04, 2016
---
Note: This is a working paper which will be expanded/updated frequently. All suggestions for improvement are welcome. The directory [gifi.stat.ucla.edu/block](http://gifi.stat.ucla.edu/block) has a pdf version, the complete Rmd file, and the bib file.
# Introduction
We use notation and terminology taken from @deleeuw_C_94c.
# Block Relaxation
To minimize $g:X\times Y\rightarrow\mathbb{R}$ over $x\in X$ and $y\in Y$ we can use
the *block relaxation* algorithm.
\begin{align*}
y^{(k+1)}&\in\mathbf{argmin}_{y\in Y}g(x^{(k)},y),\\
x^{(k+1)}&\in\mathbf{argmin}_{x\in X}g(x,y^{(k+1)}).
\end{align*}
Note that the argmin's are point-to-set maps, because the minima over blocks are not necessarily unique.
As an example, consider $g(a,b)=\mathbf{SSQ}(y-Xa-Zb)$ with $\mathbf{SSQ}()$ the sum of squares.
The algorithm, using Moore-Penrose inverses, is
\begin{align*}
b^{(k+1)}&=Z^+(y-Xa^{(k)}),\\
a^{(k+1)}&=X^+(y-Zb^{(k+1)}).
\end{align*}
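A minimal numerical sketch of this alternation, using `numpy.linalg.pinv` for the Moore-Penrose inverses (the data, starting value, and iteration count are arbitrary assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = rng.standard_normal((n, 3))
Z = rng.standard_normal((n, 2))
y = rng.standard_normal(n)

a = np.zeros(3)  # starting value a^(0)
for _ in range(200):
    b = np.linalg.pinv(Z) @ (y - X @ a)  # b^(k+1) = Z^+ (y - X a^(k))
    a = np.linalg.pinv(X) @ (y - Z @ b)  # a^(k+1) = X^+ (y - Z b^(k+1))
b = np.linalg.pinv(Z) @ (y - X @ a)      # refresh b at the final a

# compare with the direct least squares solution on the combined design [X Z]
ab = np.linalg.lstsq(np.hstack([X, Z]), y, rcond=None)[0]
```

Since the joint problem is a strictly convex quadratic here, the alternation converges to the same solution as solving the combined least squares problem directly; the speed of convergence depends on the canonical correlations between the column spaces of $X$ and $Z$.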
# Augmentation
Suppose the original problem is to minimize $f:X\rightarrow\mathbb{R}$ over $x\in X$ and we can find $g:X\times Y\rightarrow\mathbb{R}$ such that $f(x)=\min_{y\in Y}g(x,y)$. Such a $g$ is called an *augmentation* of $f$. Minimizing
$f$ over $x\in X$ can be done by applying block relaxation to the augmentation $g$ over $x\in X$ and $y\in Y$.
In least squares factor analysis, for example, we minimize
$$
f(X)=\mathbf{SSQ}(\mathbf{off}(R-XX')),
$$
where $\mathbf{off}(X)=X-\mathbf{diag}(X)$. Choose the augmentation
$$
g(X,\Delta)=\mathbf{SSQ}(R-XX'-\Delta)
$$
where $\Delta$ varies over diagonal matrices. The block relaxation algorithm is
\begin{align*}
\Delta^{(k+1)}&=\mathbf{diag}(R-X^{(k)}(X^{(k)})'),\\
(R-\Delta^{(k+1)})X^{(k+1)}&=X^{(k+1)}\Lambda,
\end{align*}
where $\Lambda$ is a symmetric matrix of Lagrange multipliers. Thus finding $X^{(k+1)}$ involves
solving the eigen problem for $R-\Delta^{(k+1)}$.
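A minimal numerical sketch of this alternation (the data, starting point, and iteration count are arbitrary assumptions; the eigen step keeps the $p$ largest eigenpairs, clipping negative eigenvalues at zero):

```python
import numpy as np

def off_ssq(R, X):
    """f(X) = SSQ of the off-diagonal part of R - XX'."""
    E = R - X @ X.T
    return np.sum((E - np.diag(np.diag(E)))**2)

rng = np.random.default_rng(1)
n, p = 6, 2
L = rng.standard_normal((n, p))
R = L @ L.T + np.diag(rng.uniform(0.5, 1.0, n))  # a matrix with exact factor structure

X = rng.standard_normal((n, p))  # starting value X^(0)
f_start = off_ssq(R, X)
for _ in range(500):
    Delta = np.diag(np.diag(R - X @ X.T))           # Delta^(k+1)
    w, V = np.linalg.eigh(R - Delta)                # eigen problem for R - Delta^(k+1)
    idx = np.argsort(w)[::-1][:p]                   # keep the p largest eigenvalues
    X = V[:, idx] * np.sqrt(np.maximum(w[idx], 0))  # X^(k+1)
f_end = off_ssq(R, X)
```

Each block step cannot increase $g$, and $f(X^{(k)})=g(X^{(k)},\Delta^{(k+1)})$, so the loss $f$ decreases monotonically along the iterates.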
# Majorization
Again we want to minimize $f:X\rightarrow\mathbb{R}$ over $x\in X$. Suppose there is a $g:X\times X\rightarrow\mathbb{R}$ such that $g(x,y)\geq f(x)$ for all $x\in X$
and $y\in X$ and such that $g(x,x)=f(x)$ for all $x\in X$. Such a $g$ is called a *majorization* of $f$. Minimize $f$ over $x\in X$ by applying block relaxation to the majorization $g$ over $x\in X$ and $y\in X$.
Clearly any majorization of $f$ is also an augmentation of $f$. Majorization is a special type of augmentation in which $Y=X$ and $x\in\mathbf{argmin}_{y\in X} g(x,y)$. Thus the block relaxation is simply
$$
x^{(k+1)}\in\mathbf{argmin}_{x\in X}g(x,x^{(k)}).
$$
Thus majorization algorithms are block relaxation algorithms.
# Majorization from Blocking
Suppose $h:X\times Z\rightarrow\mathbb{R}$. Define $T(x)=\mathbf{argmin}_{z\in Z} h(x,z)$, and suppose $t(x)$ is a selection from $T(x)$, i.e. $t(x)\in T(x)$ for all $x\in X$. Define $f(x)=h(x,t(x))$ and $g(x,y)=h(x,t(y))$. Then $g(x,y)=h(x,t(y))\geq\min_{z\in Z}h(x,z)=h(x,t(x))=g(x,x)=f(x)$. Thus $g$ is a majorization
of $f$. The majorization algorithm for $f$ and $g$ is simply the block relaxation algorithm for $h$. Thus block relaxation algorithms are majorization algorithms. Our reasoning here is very similar to @lange_16 (section 4.9).
As an example consider
$$
h(X,\Delta)=\mathbf{SSQ}(R-XX'-\Delta).
$$
Then
$$
f(X) = \mathbf{SSQ}(\mathbf{off}(R-XX')),
$$
and the majorization of $f$ is
$$
g(X,Y)=\mathbf{SSQ}(R - XX'-\mathbf{diag}(R-YY')).
$$
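The two defining properties of a majorization can be checked numerically for this $g$; a small sketch with random data (an assumption, for illustration only):

```python
import numpy as np

def f(R, X):
    """f(X) = SSQ(off(R - XX'))."""
    E = R - X @ X.T
    return np.sum((E - np.diag(np.diag(E)))**2)

def g(R, X, Y):
    """g(X, Y) = SSQ(R - XX' - diag(R - YY'))."""
    D = np.diag(np.diag(R - Y @ Y.T))
    return np.sum((R - X @ X.T - D)**2)

rng = np.random.default_rng(5)
n, p = 5, 2
A = rng.standard_normal((n, n))
R = (A + A.T) / 2                 # arbitrary symmetric matrix
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, p))
```

Indeed $g(X,Y)=f(X)+\mathbf{SSQ}(\mathbf{diag}(R-XX')-\mathbf{diag}(R-YY'))\geq f(X)$, with equality when $Y=X$.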
Another example is
$$
h(a,b)=\mathbf{SSQ}(y-Xa-Zb).
$$
Then
$$
f(a)=(y-Xa)'(I-ZZ^+)(y-Xa),
$$
and the majorization of $f$ is
$$
g(a,b)=\mathbf{SSQ}(y-Xa-ZZ^+(y-Xb)).
$$
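Minimizing this $g(a,a^{(k)})$ over $a$ gives the iteration $a^{(k+1)}=X^+(y-ZZ^+(y-Xa^{(k)}))$, which is just the block relaxation algorithm with $b$ eliminated. A minimal sketch (random data and iteration count are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
X = rng.standard_normal((n, 3))
Z = rng.standard_normal((n, 2))
y = rng.standard_normal(n)

Xp = np.linalg.pinv(X)
P = Z @ np.linalg.pinv(Z)  # the projector ZZ^+

a = np.zeros(3)
for _ in range(300):
    a = Xp @ (y - P @ (y - X @ a))  # a^(k+1) minimizes g(a, a^(k))

# compare with the direct minimizer of f(a) = (y - Xa)'(I - ZZ^+)(y - Xa)
Q = np.eye(n) - P
a_direct = np.linalg.lstsq(Q @ X, Q @ y, rcond=None)[0]
```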
Clearly we can also interchange the role of the two blocks. In the factor analysis example we can minimize out $X$ to get
$$
f(\Delta)=\sum_{s=p+1}^n \lambda_s^2(R-\Delta),
$$
where the $\lambda_s(X)$ are the decreasingly ordered eigenvalues of $X$ (assuming the $p$ largest eigenvalues of $R-\Delta$ are non-negative). The majorization function is
$$
g(\Delta,\Omega)=\mathbf{SSQ}(R-\Delta-(R-\Omega)_p),
$$
with $(X)_p$ the best rank $p$ approximation of $X$.
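Minimizing $g(\Delta,\Omega)$ over diagonal $\Delta$ gives the update $\Delta^{+}=\mathbf{diag}(R-(R-\Omega)_p)$. A minimal sketch of the resulting iteration (data and iteration count are assumptions; `rank_p` is a helper computing the best rank-$p$ Frobenius approximation of a symmetric matrix):

```python
import numpy as np

def rank_p(S, p):
    """(S)_p: best rank-p approximation of a symmetric matrix in Frobenius norm."""
    w, V = np.linalg.eigh(S)
    idx = np.argsort(np.abs(w))[::-1][:p]  # p eigenvalues largest in modulus
    return (V[:, idx] * w[idx]) @ V[:, idx].T

def f(R, Delta, p):
    """f(Delta) = SSQ(R - Delta - (R - Delta)_p)."""
    E = R - Delta
    return np.sum((E - rank_p(E, p))**2)

rng = np.random.default_rng(3)
n, p = 6, 2
L = rng.standard_normal((n, p))
R = L @ L.T + np.diag(rng.uniform(0.5, 1.0, n))

Delta = np.zeros((n, n))
f_start = f(R, Delta, p)
for _ in range(500):
    Delta = np.diag(np.diag(R - rank_p(R - Delta, p)))  # Delta^+ = diag(R - (R - Omega)_p)
f_end = f(R, Delta, p)
```

Since $f(\Delta)\leq g(\Delta,\Omega)$ with equality at $\Omega=\Delta$, the update cannot increase $f$.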
# Partial Majorization
Suppose the problem we want to solve is minimizing \(g(x,y)\) over \(x\in X\)
and \(y\in Y\). If both minimizing \(g(x,y)\) over \(x\in X\) for fixed \(y\in Y\)
and minimizing \(g(x,y)\) over \(y\in Y\) for fixed \(x\in X\) is easy, then we
often use block-relaxation, alternating the two conditional minimization
problems until convergence.
But now suppose only one of the two problems, say
minimizing \(g(x,y)\) over \(y\in Y\) for fixed \(x\in X\), is easy. Define
\[
f(x)=\min_{y\in Y} g(x,y)
\]
and let \(y(x)\) be any \(y\in Y\) such that \(f(x)=g(x,y(x))\).
Suppose we have a majorizing function \(h(x,z)\) for \(f(x)\). Thus
\begin{align*}
f(x)&\leq h(x,z)\qquad\forall x,z\in X,\\
f(x)&= h(x,x)\qquad\forall x\in X.
\end{align*}
Suppose our current best solution for \(x\) is \(\tilde x\), with corresponding
\(\tilde y=y(\tilde x)\). Let \(x^+\) be any minimizer of \(h(x,\tilde x)\) over \(x\in X\). Now
\[
g(x^+,y(x^+))=f(x^+)\leq h(x^+,\tilde x)\leq h(\tilde x,\tilde x)=f(\tilde x)=g(\tilde x,y(\tilde x))
\]
which means that \((x^+,y(x^+))\) gives a loss function value no larger than that of \((\tilde x,y(\tilde x))\). Thus, under the usual conditions, we have a convergent algorithm.
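A minimal sketch of this scheme, reusing the earlier regression example \(g(a,b)=\mathbf{SSQ}(y-Xa-Zb)\). Here the inner minimization over \(b\) is the easy step, and the (in this case quadratic) \(f\) is majorized by a standard uniform quadratic upper bound; the bound, data, and step count are assumptions for illustration, not part of the text:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 40
X = rng.standard_normal((n, 3))
Z = rng.standard_normal((n, 2))
y = rng.standard_normal(n)

P = Z @ np.linalg.pinv(Z)         # ZZ^+, projector onto col(Z)
Q = np.eye(n) - P
H = 2 * X.T @ Q @ X               # Hessian of f(a) = (y - Xa)' Q (y - Xa)
Lc = np.linalg.eigvalsh(H).max()  # curvature bound for the quadratic majorizer h

def f(a):
    r = y - X @ a
    return r @ Q @ r

a = np.zeros(3)
losses = [f(a)]
for _ in range(300):
    grad = -2 * X.T @ (Q @ (y - X @ a))
    a = a - grad / Lc             # x^+ minimizes h(., a~) = f(a~) + grad'(a - a~) + (Lc/2)||a - a~||^2
    losses.append(f(a))

b = np.linalg.pinv(Z) @ (y - X @ a)  # the easy inner step: y(x)
```

The sandwich inequality above guarantees that the recorded loss values never increase, and that \(g(a,b)=f(a)\) at the recovered pair.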
\bibliographystyle{plainnat}
# References