---
title: "An Alternative Majorization for Multidimensional Scaling"
author: "Jan de Leeuw"
date: "Version April 09, 2018"
output:
  html_document:
    keep_md: yes
    number_sections: yes
    toc: yes
  pdf_document:
    keep_tex: yes
    number_sections: yes
    toc: yes
    toc_depth: 3
fontsize: 12pt
graphics: yes
bibliography: cmds.bib
abstract: The Cauchy-Schwarz majorization of the distance function in SMACOF is replaced by a majorization of the squared distance function. This leads to an interesting SMACOF alternative, which we call SMOCAF.
---
Note: This is a working paper which will be expanded/updated frequently. All suggestions for improvement are welcome. The directory [gifi.stat.ucla.edu/cmds](http://gifi.stat.ucla.edu/cmds) has a pdf version, the bib file, the complete Rmd file with the code chunks, and the R source code.
#Introduction
The (Euclidean, metric, least squares) multidimensional scaling or MDS problem is a special case of a nonconvex optimization problem in which we minimize a function of the form
$$
f(x)\mathop{=}\limits^{\Delta}1 + \frac12 x'x - \sum_{i=1}^n \sqrt{x'A_ix},
$$
where the $A_i$ are given positive semi-definite matrices. Because the third term on the right, the one with the square root, is convex, the problem is a DC-problem (@lethi_tao_18), i.e. the minimization of the difference of two convex functions, one of which is quadratic and one of which is not differentiable at some points. We do not really have to worry about non-differentiability, because of the result in @deleeuw_A_84f.
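To make the objective concrete, it can be evaluated directly in R. This is a minimal sketch: the function name `mdsobj` is hypothetical, and the $A_i$ are assumed to be stored as the slices of a three-dimensional array `a`, the same storage convention used by the code in the appendix.

```r
# f(x) = 1 + x'x / 2 - sum_i sqrt(x'A_i x)
# a is an m x m x n array; slice a[, , i] is the psd matrix A_i
mdsobj <- function(x, a) {
  n <- dim(a)[3]
  s <- 1 + sum(x ^ 2) / 2
  for (i in 1:n)
    s <- s - sqrt(sum(a[, , i] * outer(x, x)))  # x'A_i x as an elementwise sum
  s
}
```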
#SMACOF
The SMACOF algorithm (@deleeuw_C_77, @deleeuw_heiser_C_77, @deleeuw_mair_A_09c) to minimize the MDS objective function $f$ is a majorization algorithm (@deleeuw_C_94c), currently more commonly known as an MM algorithm (@lange_16). It is based on the Cauchy-Schwarz inequality
$$
\sqrt{x'A_ix\ y'A_iy}\geq x'A_iy.
$$
Define
$$
e_i(x)\mathop{=}\limits^{\Delta}\begin{cases}(x'A_ix)^{-\frac12}&\text{ if }x'A_ix\not= 0,\\
0&\text{ otherwise},\end{cases}
$$
and
$$
g(x,y)\mathop{=}\limits^{\Delta}1+\frac12 x'x -\sum_{i=1}^ne_i(y)x'A_iy,
$$
then we have the basic majorization relations $f(x)\leq g(x,y)$ and $f(x)=g(x,x)$. Thus step $k$ in the iterative algorithm minimizes $g(x,x^{(k)})$ over $x$, which leads to the convergent algorithm
$$
x^{(k+1)}=\sum_{i=1}^ne_i(x^{(k)})A_ix^{(k)}.
$$
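A single step of this update can be coded directly. This is a sketch: `smacof_step` is a hypothetical name, and the $A_i$ are assumed stored as slices of an array `a`, as in the appendix code.

```r
# one SMACOF step: x_new = sum_i e_i(x) A_i x, with e_i(x) = (x'A_i x)^(-1/2)
smacof_step <- function(x, a) {
  n <- dim(a)[3]
  xnew <- rep(0, length(x))
  for (i in 1:n) {
    d <- sqrt(sum(a[, , i] * outer(x, x)))  # d_i(x) = sqrt(x'A_i x)
    if (d > 0)                              # e_i(x) = 0 when x'A_i x = 0
      xnew <- xnew + drop(a[, , i] %*% x) / d
  }
  xnew
}
```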
#SMOCAF
For our alternative majorization we use the inequality
$$
x'A_ix\geq 2x'A_iy-y'A_iy.
$$
This inequality follows from expanding $(x-y)'A_i(x-y)\geq 0$, which holds because $A_i$ is positive semi-definite. Define the polyhedral convex set $\mathcal{C}(y)$ of all $x$ such that $2x'A_iy-y'A_iy\geq 0$
for all $i$. Clearly $y\in\mathcal{C}(y)$, and thus $\mathcal{C}(y)$ is non-empty. Also define
$$
h(x,y)\mathop{=}\limits^{\Delta}1+\frac12 x'x-\sum_{i=1}^n\sqrt{2x'A_iy-y'A_iy},
$$
for all $x\in\mathcal{C}(y)$. Because the square root of a non-negative linear form is concave, minimizing $h(x,x^{(k)})$ over $x\in\mathcal{C}(x^{(k)})$ is a convex problem with linear constraints and leads to a convergent MM algorithm, which we call SMOCAF.
Of course, a SMOCAF step is inherently more complicated than a SMACOF step. The idea is interesting, at least to me, because the SMOCAF algorithm solves a sequence of convex programming problems in which the constraint set changes over iterations.
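One such convex programming step can be sketched with `constrOptim()`, which takes the linear constraints in the form $ui\,x - ci \geq 0$. This is a sketch under the assumptions of the appendix code (the $A_i$ stored as slices of an array `a`); the name `smocaf_step` is hypothetical, and the `abs()` guards against tiny negative values of the linear forms near the constraint boundary, as in the appendix.

```r
# one SMOCAF step from y = x^(k): minimize h(x, y) over C(y),
# with C(y) passed to constrOptim() as ui %*% x - ci >= 0,
# where ui[i, ] = 2 A_i y and ci[i] = y'A_i y
smocaf_step <- function(y, a) {
  n <- dim(a)[3]
  ui <- t(sapply(1:n, function(i) 2 * drop(a[, , i] %*% y)))
  ci <- sapply(1:n, function(i) sum(a[, , i] * outer(y, y)))
  h <- function(x) {  # majorizing function h(x, y)
    s <- sum(x ^ 2) / 2
    for (i in 1:n)
      s <- s - sqrt(abs(sum(a[, , i] * outer(2 * x - y, y))))
    s
  }
  g <- function(x) {  # gradient of h in x
    s <- x
    for (i in 1:n)
      s <- s - drop(a[, , i] %*% y) /
        sqrt(abs(sum(a[, , i] * outer(2 * x - y, y))))
    s
  }
  constrOptim(y, h, g, ui = ui, ci = ci)$par
}
```

Note that the start `y` is strictly feasible whenever $y'A_iy>0$ for all $i$, which `constrOptim()` requires for its barrier method.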
#Example
We try out both SMACOF and SMOCAF on a simple example, where there are 10 matrices $A_i$,
all covariance matrices of a 20 by 5 matrix filled with random standard normals. The SMOCAF iterations are implemented using `constrOptim()` from base R, which uses a logarithmic barrier
method for the convex programming step. Since we have very good initial estimates, and we do not really have to iterate until convergence within each step, I assume the SMOCAF implementation can be made much more efficient. In addition, acceleration techniques specific to DC programming, such as the one discussed in @artacho_fleming_vuong_17, could be applied, as could general acceleration techniques, discussed recently in great detail in @sidi_17. But optimal speed is not what interests us here.
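The test array can be generated along the following lines. This is a sketch: the source does not record the random seed, so the seed below is hypothetical and the numbers printed in the output will not be reproduced exactly.

```r
set.seed(12345)  # hypothetical seed; the seed used for the printed results is not recorded
a <- array(0, c(5, 5, 10))
for (i in 1:10)
  a[, , i] <- cov(matrix(rnorm(20 * 5), 20, 5))  # covariance of a 20 x 5 standard normal matrix
```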
```r
system.time(print(smacof(a, rep(1,5), verbose = FALSE)))
```
```
## $itel
## [1] 116
##
## $stress
## [1] -61.77740087
##
## $coef
## [1] 0.7270085698 6.3676564611 8.5480388387 -2.1757084262 2.1625300759
##
## $grad
## [1] -7.316141844e-07 -4.745316849e-06 4.648248849e-06 5.576051584e-06
## [5] 1.455168160e-06
```
```
## user system elapsed
## 0.036 0.001 0.037
```
```r
system.time(print(smocaf(a, rep(1,5), verbose = FALSE)))
```
```
## $itel
## [1] 120
##
## $stress
## [1] -61.77740087
##
## $coef
## [1] 0.727008322 6.367654855 8.548040412 -2.175706538 2.162530569
##
## $grad
## [1] -7.554716913e-07 -4.899925776e-06 4.799694909e-06 5.757723873e-06
## [5] 1.502571069e-06
```
```
## user system elapsed
## 0.433 0.013 0.451
```
For both majorizations convergence is nice and smooth, and both converge to the same solution. The number of iterations is about the same, but since SMOCAF iterations are considerably more time-consuming, the overall SMOCAF time is about 10 times the SMACOF time.
#Appendix: Code
##smacof.R
```r
smacof <-
function (a,
xold,
itmax = 1000,
eps = 1e-10,
verbose = TRUE) {
n <- dim(a)[3]
m <- dim(a)[1]
dold <- rep(0, n)
itel <- 1
for (i in 1:n)
dold[i] <- sqrt (sum (a[, , i] * outer (xold, xold)))
sold <- sum (xold ^ 2) / 2 - sum(dold)
repeat {
xnew <- rep (0, m)
dnew <- rep (0, n)
for (i in 1:n)
xnew <- xnew + drop(a[, , i] %*% xold) / dold [i]
for (i in 1:n)
dnew[i] <- sqrt (sum (a[, , i] * outer (xnew, xnew)))
snew <- sum (xnew ^ 2) / 2 - sum (dnew)
if (verbose)
cat(
"Iteration: ",
formatC (itel, width = 3, format = "d"),
"sold: ",
formatC (
sold,
digits = 8,
width = 12,
format = "f"
),
"snew: ",
formatC (
snew,
digits = 8,
width = 12,
format = "f"
),
"\n"
)
if ((sold - snew) < eps || (itel == itmax))
break
sold <- snew
xold <- xnew
dold <- dnew
itel <- itel + 1
}
g <- xnew
for (i in 1:n) {
g <- g - drop(a[, , i] %*% xnew) / dnew[i]
}
return(list(
itel = itel,
stress = snew,
coef = xnew,
grad = g
))
}
```
##smocaf.R
```r
obj <- function (x, y, a) {
  f <- sum (x ^ 2) / 2
  for (i in 1:dim(a)[3]) {
    f <- f - sqrt (abs (sum (a[, , i] * outer (2 * x - y, y))))
  }
  return (f)
}
grad <- function (x, y, a) {
  g <- x
  for (i in 1:dim(a)[3]) {
    g <-
      g - (1 / sqrt (abs (sum (
        a[, , i] * outer (2 * x - y, y)
      )))) * a[, , i] %*% y
  }
  return (g)
}
smocaf <-
function (a,
xold,
itmax = 1000,
eps = 1e-10,
verbose = TRUE) {
n <- dim(a)[3]
m <- dim(a)[1]
dold <- rep(0, n)
itel <- 1
for (i in 1:n)
dold[i] <- sqrt (sum (a[, , i] * outer (xold, xold)))
sold <- sum (xold ^ 2) / 2 - sum(dold)
repeat {
uold <- matrix (0, n, m)
cold <- rep (0, n)
dnew <- rep (0, n)
for (i in 1:n) {
uold[i, ] <- 2 * a[, , i] %*% xold
cold[i] <- sum (a[, , i] * outer (xold, xold))
}
h <-
constrOptim (xold, obj, grad, uold, cold, y = xold, a = a)
xnew <- h$par
for (i in 1:n)
dnew[i] <- sqrt (sum (a[, , i] * outer (xnew, xnew)))
snew <- sum (xnew ^ 2) / 2 - sum (dnew)
if (verbose)
cat(
"Iteration: ",
formatC (itel, width = 3, format = "d"),
"sold: ",
formatC (
sold,
digits = 8,
width = 12,
format = "f"
),
"snew: ",
formatC (
snew,
digits = 8,
width = 12,
format = "f"
),
"\n"
)
if ((sold - snew) < eps || (itel == itmax))
break
sold <- snew
xold <- xnew
dold <- dnew
itel <- itel + 1
}
g <- xnew
for (i in 1:n) {
g <- g - drop(a[, , i] %*% xnew) / dnew[i]
}
return(list(
itel = itel,
stress = snew,
coef = xnew,
grad = g
))
}
```
#References