--- title: "Multidimensional Scaling with Upper Bounds" author: "Jan de Leeuw" date: "Version 22, January 13, 2017" output: pdf_document: keep_tex: yes number_sections: yes toc: yes toc_depth: 3 html_document: keep_md: yes number_sections: yes toc: yes toc_depth: 3 fontsize: 12pt graphics: yes bibliography: below.bib abstract: We give an algorithm, with R code, to minimize the multidimensional scaling stress loss function under the condition that some or all of the fitted distances are smaller than given upper bounds. --- {r function_code, echo = FALSE} source("below.R")  {r packages, echo = FALSE} options (digits = 10) library (captioner)  {r equation_numbering, echo = FALSE} figure_nums <- captioner (prefix = "Figure") figure_nums (name = "equi_regular_conf", caption = "Equidistance Data, Regular Left, From Below Right", display = FALSE) figure_nums (name = "equi_regular_shepard", caption = "Equidistance Data, Regular Left, From Below Right", display = FALSE) figure_nums (name = "gruijter_regular_conf", caption = "De Gruijter Data, Regular Left, From Below Right", display = FALSE) figure_nums (name = "gruijter_regular_shepard", caption = "De Gruijter Data, Regular Left, From Below Right", display = FALSE) figure_nums (name = "gruijter_maxbound", caption = "De Gruijter Data, Distances Bounded by Maximum Dissimilarity", display = FALSE) figure_nums (name = "gruijter_infbound", caption = "De Gruijter Data, Imposing Small Distances", display = FALSE) figure_nums (name = "ekman_regular_conf", caption = "Ekman Data, Regular Left, From Below Right", display = FALSE) figure_nums (name = "ekman_regular_shepard", caption = "Ekman Data, Regular Left, From Below Right", display = FALSE) figure_nums (name = "ekman_distances", caption = "Ekman Data, Regular vs Bounded Distances", display = FALSE)  Note: This is a working paper which will be expanded/updated frequently. All suggestions for improvement are welcome. 
The directory [gifi.stat.ucla.edu/below](http://gifi.stat.ucla.edu/below) has a pdf version, the complete Rmd file with all code chunks, the bib file, and the R source code.

#Introduction

In this paper we study the minimization of *stress*, defined as
$$\mathbf{stress}(X):=\frac14\sum_{i=1}^n\sum_{j=1}^n w_{ij}(\delta_{ij}-d_{ij}(X))^2$$
over $n\times p$ *configuration matrices* $X$. Here $W=\{w_{ij}\}$ and $\Delta=\{\delta_{ij}\}$ are given symmetric, hollow, and non-negative matrices of, respectively, *weights* and *dissimilarities*. The $d_{ij}(X)$ are Euclidean distances between the points with coordinates in the rows of $X$. Thus
$$d_{ij}^2(X):=(x_i-x_j)'(x_i-x_j)=(e_i-e_j)'XX'(e_i-e_j)=\mathbf{tr}\ X'A_{ij}X.$$
Here $e_i$ and $e_j$ are unit vectors, with all elements equal to zero except for the single element $i$ or $j$, which is equal to one. Also $A_{ij}:=(e_i-e_j)(e_i-e_j)'$. The problem of minimizing stress is well-studied (see, for example, @borg_groenen_05).

In this paper we analyze the problem of minimizing stress over all configurations $X$ for which $d_{ij}(X)\leq\alpha_{ij}$ for some or all pairs $i,j$, where the $\alpha_{ij}$ are positive bounds. As a special case we have *MDS from below*, where we require $d_{ij}(X)\leq\delta_{ij}$ for all $i,j$. Note that constraints of this type are not exactly independent. If we require $d_{ij}(X)\leq\alpha_{ij}$ and $d_{ik}(X)\leq\alpha_{ik}$ then, by the triangle inequality, we automatically require $d_{jk}(X)\leq\alpha_{ij}+\alpha_{ik}$. On the other hand the constraints are always consistent, because they will be satisfied if we take $X$ small enough.

The constraints $d_{ij}(X)\leq\alpha_{ij}$, or equivalently $d_{ij}^2(X)\leq\alpha_{ij}^2$, define a convex set in configuration space, but stress is not convex. We can, however, use basic smacof theory (@deleeuw_C_77) to majorize stress by a convex quadratic. This can be used to define a simple iterative algorithm.
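As an aside, the loss function itself is easy to evaluate in a few lines of R. The sketch below is our illustration, not part of the paper's code (the name `stressValue` is ours). It computes stress over the pairs $i<j$, using the fact that summing over all $i,j$ with the factor $\frac14$ equals summing once over each pair with the factor $\frac12$.

```r
# Illustrative sketch, not the paper's code: evaluate stress for an n x p
# configuration x, with delta and w vectors over the pairs i < j, in the
# order produced by dist().
stressValue <- function (x, delta, w = rep (1, length (delta))) {
  d <- as.vector (dist (x))          # Euclidean distances d_ij(X), i < j
  sum (w * (delta - d) ^ 2) / 2      # each pair counted once, hence 1/2
}

set.seed (12345)
x <- matrix (rnorm (20), 10, 2)
delta <- as.vector (dist (matrix (rnorm (20), 10, 2)))
stressValue (x, delta)
```

Note that `stressValue (x, as.vector (dist (x)))` is exactly zero, since a configuration fits its own distances perfectly.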
In each iteration we majorize stress, and then minimize the convex quadratic majorizer over configurations satisfying the convex quadratic constraints $d_{ij}^2(X)\leq\alpha_{ij}^2$. We then compute a new majorization at the minimizer of the majorized problem, and so on. These are the two steps in the majorization or MM approach to optimization (@deleeuw_C_94c, @lange_16). To minimize a convex quadratic under convex quadratic constraints we use the dual Newton method from @deleeuw_E_17a.

##Simplifying stress

A simplified expression for stress, first used in @deleeuw_R_93c, is
$$\mathbf{stress}(x)=\frac12\sum_{k=1}^K w_k(\delta_k-\sqrt{x'A_kx})^2.$$
We have put the elements below the diagonal of $W$ and $\Delta$ in a vector of length $\frac12 n(n-1)$. Also $x:=\mathbf{vec}(X)$ and the $A_k$ are the direct sums of $p$ copies of the corresponding $A_{ij}$. Now expand the square, and assume without loss of generality that $\frac12\sum_{k=1}^K w_k\delta_k^2=1$. Then
$$\mathbf{stress}(x)=1-\sum_{k=1}^K w_k\delta_k\sqrt{x'A_kx}+\frac12 x'Vx,$$
with $V:=\sum_{k=1}^K w_kA_k$. Suppose $V=K\Lambda^2K'$ is a complete eigen-decomposition of $V$. Some care is needed to handle the fact that $V$ is singular, but under the assumption of irreducibility (@deleeuw_C_77) this is easily taken care of. Change variables with $z=\Lambda K'x$. Then
$$\mathbf{stress}(z)=1-\sum_{k=1}^K w_k\delta_k\sqrt{z'\tilde A_kz}+\frac12 z'z,$$
with $\tilde A_k=\Lambda^{-1}K'A_kK\Lambda^{-1}$, so that $\sum_{k=1}^K w_k\tilde A_k=I$.

##Smacof Notation and Theory

We have seen in the previous section that we can write
$$\mathbf{stress}(x)=1-\rho(x)+\frac12 x'x,$$
where
$$\rho(x):=\sum_{k=1}^K w_k\delta_k d_k(x),$$
with distances $d_k(x):=\sqrt{x'A_kx}$, and the $A_k$ positive semi-definite matrices that add up to the identity.
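The change of variables is easy to check numerically. The sketch below is our illustration, not the paper's code (the helper name `aMat` is ours). For $n=4$ points in one dimension with unit weights it builds $V=\sum_k A_k$, verifies that $V=nJ_n$, and verifies that after rescaling by the inverse square roots of the nonzero eigenvalues of $V$ the transformed matrices add up to the identity.

```r
# Numeric check (our illustration): for n points in one dimension with unit
# weights, V = sum_k A_k equals n * J_n, and the rescaled A_k sum to the
# identity on the orthogonal complement of the null space of V.
n <- 4
aMat <- function (i, j, n) {
  e <- rep (0, n); e[i] <- 1; e[j] <- -1
  outer (e, e)                               # (e_i - e_j)(e_i - e_j)'
}
pairs <- combn (n, 2)
v <- matrix (0, n, n)
for (k in 1:ncol (pairs)) v <- v + aMat (pairs[1, k], pairs[2, k], n)
jn <- diag (n) - 1 / n                       # centering projector J_n
max (abs (v - n * jn))                       # zero: V = n J_n
ev <- eigen (v, symmetric = TRUE)
kMat <- ev$vectors[, 1:(n - 1)]              # drop the null space of V
lamInv <- diag (1 / sqrt (ev$values[1:(n - 1)]))
s <- matrix (0, n - 1, n - 1)
for (k in 1:ncol (pairs)) {
  a <- aMat (pairs[1, k], pairs[2, k], n)
  s <- s + lamInv %*% t (kMat) %*% a %*% kMat %*% lamInv
}
max (abs (s - diag (n - 1)))                 # zero: rescaled A_k sum to I
```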
By Cauchy-Schwarz,
$$\rho(x)=x'B(x)x\geq x'B(y)y,$$
where
$$B(x):=\sum_{k=1}^K w_k\frac{\delta_k}{d_k(x)}A_k.$$
If we define the *Guttman transform* of $x$ as $\Gamma(x):=B(x)x$, then for all $x$
$$\mathbf{stress}(x)=1-x'\Gamma(x)+\frac12 x'x=(1-\frac12\Gamma(x)'\Gamma(x))+\frac12(x-\Gamma(x))'(x-\Gamma(x)),$$
and for all $x,y$
$$\mathbf{stress}(x)\leq 1-x'\Gamma(y)+\frac12 x'x=(1-\frac12\Gamma(y)'\Gamma(y))+\frac12(x-\Gamma(y))'(x-\Gamma(y)).$$
In this notation a smacof iteration is $x^{(k+1)}=\Gamma(x^{(k)})$.

##Necessary Conditions

We look at the necessary conditions for an optimum of the bounded MDS problem. The Lagrangian is
$$\mathcal{L}(x,\lambda):=\mathbf{stress}(x)+\frac12\sum_{k=1}^K\lambda_k(x'A_kx-\delta_k^2).$$
If $x$ is feasible and optimal for the problem of minimizing stress from below then there is a vector of multipliers $\lambda\geq 0$ such that
$$\mathcal{D}_1\mathcal{L}(x,\lambda)=\left(I+\sum_{k=1}^K\lambda_kA_k\right)x-\Gamma(x)=0$$
and the multipliers $\lambda$ and constraint functions are complementary, i.e. $\lambda_k(x'A_kx-\delta_k^2)=0$ for all $k$.

Compare this with the necessary condition for a local minimum in regular smacof, which is simply $x=\Gamma(x)$. It should be noted that $x=\Gamma(x)$ implies $\rho(x)=x'x$, and consequently at a stationary point $\mathbf{stress}(x)=1-\frac12 x'x$, which implies $x'x\leq 2$. Thus $\sum_{k=1}^K w_kd_k^2(x)\leq 2$, which implies $d_k(x)\leq\sqrt{2/w_k}$. Even in regular smacof, without bound constraints, the distances at a stationary point are bounded, and imposing the constraints $x'A_kx\leq 2/w_k$ still allows the unconstrained smacof solution. If the minimum stress is non-zero, then none of these constraints will be active, and all the distances will be strictly smaller than these a priori bounds.
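A bare-bones version of the unconstrained smacof iteration is easy to write down. The sketch below is our illustration, not the paper's `smacof` function (the names `guttman` and `sigma` are ours). It uses unit weights, for which $V$ is $n$ times the centering projector, so that the Guttman update of a centered configuration reduces to dividing $B(X)X$ by $n$. By the majorization argument above, stress cannot increase from one iteration to the next.

```r
# Illustrative sketch, not the paper's smacof(): Guttman transform
# iterations for unit weights. For centered X the update is X <- B(X) X / n.
guttman <- function (x, delta) {
  n <- nrow (x)
  d <- as.vector (dist (x))
  b <- matrix (0, n, n)
  b[lower.tri (b)] <- -delta / pmax (d, 1e-10)  # same pair order as dist()
  b <- b + t (b)
  diag (b) <- -rowSums (b)                      # zero row sums
  (b %*% x) / n
}
sigma <- function (x, delta) sum ((delta - as.vector (dist (x))) ^ 2) / 2
set.seed (12345)
delta <- as.vector (dist (matrix (rnorm (10), 5, 2)))
x <- apply (matrix (rnorm (10), 5, 2), 2, function (z) z - mean (z))
sBefore <- sigma (x, delta)
for (iter in 1:200) x <- guttman (x, delta)
sAfter <- sigma (x, delta)
c (sBefore, sAfter)
```

The second printed value is no larger than the first, illustrating the monotone convergence of smacof.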
#MDS from Below

Classical MDS (@torgerson_58) can be interpreted as minimizing the loss function *strain*, defined as
$$\mathbf{strain}(X):=\frac14\mathbf{tr}\ J_n(\Delta^2-D^2(X))J_n(\Delta^2-D^2(X)),$$
over all $n\times p$ centered configuration matrices $X$, where $J_n$ is the centering operator $J_n=I_n-\frac{1}{n}ee'$, $\Delta^2$ is a matrix with squared dissimilarities, and $D^2(X)$ is a matrix of squared Euclidean distances. The loss function interpretation is based on the identity
$$-\frac12 J_nD^2(X)J_n=XX',$$
which implies
$$\mathbf{strain}(X)=\mathbf{tr}\ (C-XX')^2,$$
with $C:=-\frac12J_n\Delta^2J_n$.

If $C$ has signature $(n_+,n_0,n_-)$ then we can write $C=\overline{C}-\underline{C}$ with both $\overline{C}$ and $\underline{C}$ positive semi-definite, of ranks $n_+$ and $n_-$, and with $\mathbf{tr}\ \overline{C}\underline{C}=0$. $\overline{C}$ is the projection of $C$ on the convex cone of positive semi-definite matrices, and $\underline{C}=-(C-\overline{C})$ is the negative of the projection on its polar cone. In terms of the eigenvalues and eigenvectors of $C$ we can write $C=\overline{K}\overline{\Lambda}\overline{K}'+\underline{K}\underline{\Lambda}\underline{K}'$, with overlining used for the positive eigenvalues and their eigenvectors, and underlining for the negative ones.
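The double centering identity above, and the Torgerson construction it suggests, are easy to check numerically. The sketch below is our illustration, not the paper's code (the name `torgerson` is ours). It double-centers the squared dissimilarities of a known two-dimensional configuration and rebuilds a configuration from the $p$ dominant eigenvalues of $C$; since the dissimilarities are exactly two-dimensional Euclidean distances, the fitted distances reproduce them.

```r
# Illustrative sketch, not the paper's code: classical MDS via double
# centering, keeping the p dominant (positive) eigenvalues of C.
torgerson <- function (delta2, p = 2) {
  n <- nrow (delta2)
  jn <- diag (n) - 1 / n
  cc <- -(jn %*% delta2 %*% jn) / 2          # C = -J delta^2 J / 2
  ev <- eigen (cc, symmetric = TRUE)
  lam <- pmax (ev$values[1:p], 0)            # guard against tiny negatives
  ev$vectors[, 1:p] %*% diag (sqrt (lam))
}
set.seed (12345)
z <- matrix (rnorm (12), 6, 2)
z <- apply (z, 2, function (u) u - mean (u))  # center the true configuration
delta2 <- as.matrix (dist (z)) ^ 2
jn <- diag (6) - 1 / 6
max (abs (-(jn %*% delta2 %*% jn) / 2 - z %*% t (z)))  # identity C = ZZ'
x <- torgerson (delta2)
max (abs (as.matrix (dist (x)) - as.matrix (dist (z))))  # distances recovered
```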
Thus
$$\mathbf{strain}(X)=\mathbf{tr}\ (\overline{C}-\underline{C}-XX')^2=\mathbf{tr}\ (\overline{C}-XX')^2+\mathbf{tr}\ \underline{C}^2+2\mathbf{tr}\ X'\underline{C}X,$$
and
$$\min_X\mathbf{strain}(X)\geq\min_Z\mathbf{tr}\ (\overline{C}-ZZ')^2+\mathbf{tr}\ \underline{C}^2+2\min_Y\mathbf{tr}\ Y'\underline{C}Y=\min_Z\mathbf{tr}\ (\overline{C}-ZZ')^2+\mathbf{tr}\ \underline{C}^2.$$
Thus if $p\geq n_+$ we choose $Z$ so that $ZZ'=\overline{C}$, and if $p<n_+$ we build $Z$ from the $p$ largest eigenvalues of $\overline{C}$ and the corresponding eigenvectors.

```{r equi_confplot, fig.align = "center", echo = FALSE}
par(mfrow=c(1,2), pty = "s")
plot(h2a$x, type = "n", col = "RED", xlab = "dim 1", ylab = "dim 2")
text (h2a$x, as.character(1:10), col = "RED")
plot(h2b$x, type = "n", col = "RED", xlab = "dim 1", ylab = "dim 2")
text (h2b$x, as.character(1:10), col = "RED")
```

`r figure_nums("equi_regular_conf")`

```{r equi_shepardplot, fig.align = "center", echo = FALSE}
par(mfrow=c(1,2), pty = "s")
plot(h2a$delta, h2a$dist, xlab = "dissimilarities", ylab = "distances", col = "RED", xlim = c(0, 1.5), ylim = c(0, 1.5))
abline (0, 1, col = "BLUE", lwd = 2)
plot(h2b$delta, h2b$dist, xlab = "dissimilarities", ylab = "distances", col = "RED", xlim = c(0, 1.5), ylim = c(0, 1.5))
abline (0, 1, col = "BLUE", lwd = 2)
```

`r figure_nums("equi_regular_shepard")`

To illustrate complementary slackness we give the values of the constraint functions $d_{ij}^2(X)-\delta_{ij}^2$, which are all non-positive, and the values of the Lagrange multipliers, which are all non-negative. We see that a constraint function equal to zero goes with a positive Lagrange multiplier, and a constraint function that is negative goes with a zero Lagrange multiplier.

```{r equi_complement}
mprint (h2b$constraints, d = 6, f = "+")
mprint (h2b$multipliers, d = 6, f = "+")
```

##Dutch Political Parties 1967

As the next illustration we use data from @degruijter_67, with average dissimilarity judgments between Dutch political parties in 1967.
The data are

```{r poldist-data, echo = FALSE}
poldist <- structure(
  c(5.63, 5.27, 4.6, 4.8, 7.54, 6.73, 7.18, 6.17, 6.72, 5.64, 6.22, 5.12,
    4.59, 7.22, 5.47, 5.46, 4.97, 8.13, 7.55, 6.9, 4.67, 3.2, 7.84, 6.73,
    7.28, 6.13, 7.8, 7.08, 6.96, 6.04, 4.08, 6.34, 7.42, 6.88, 6.36, 7.36),
  Labels = c("KVP", "PvdA", "VVD", "ARP", "CHU", "CPN", "PSP", "BP", "D66"),
  Size = 9L, call = quote(as.dist.default(m = polpar)), class = "dist",
  Diag = FALSE, Upper = FALSE)
print (poldist)
poldata <- as.vector (poldist)
polnames <- attr (poldist, "Labels")
```

```{r below_comp, echo = FALSE, cache = FALSE}
h1a <- smacof (poldata)
h1b <- smacofBelow (poldata)
```

The optimal smacof solution in two dimensions needs `r h1a$itel` iterations to arrive at stress `r h1a$stress`. Approximation from below uses `r h1b$itel` iterations and finds stress `r h1b$stress`. At the solution there are `r length (which (abs (h1b$constraints) < 1e-6))` active constraints, but as the Shepard plots in figure `r figure_nums (name = "gruijter_regular_shepard", display = "n")` show, the active constraints now do not correspond with the largest dissimilarities, and the two configurations in figure `r figure_nums (name = "gruijter_regular_conf", display = "n")` are quite different. That could, of course, also be because they correspond with two different local minima, which mostly differ in the location of the D66 party (a neoliberal party, full of civilized protest).
```{r smacof_confplot, fig.align = "center", echo = FALSE}
par(mfrow=c(1,2), pty = "s")
plot(h1a$x, type = "n", xlab = "dim 1", ylab = "dim 2")#, xlim = c(-.15, .15), ylim = c(-.15, .15))
text(h1a$x, polnames, col = "RED")
plot(h1b$x, type = "n", xlab = "dim 1", ylab = "dim 2")#, xlim = c(-.15, .15), ylim = c(-.15, .15))
text(h1b$x, polnames, col = "RED")
```
`r figure_nums("gruijter_regular_conf")`

```{r smacof_shepardplot, fig.align = "center", echo = FALSE}
par(mfrow=c(1,2), pty = "s")
plot(h1a$delta, h1a$dist, xlab = "dissimilarities", ylab = "distances", col = "RED")#, xlim = c(0, .25), ylim = c(0, .25))
abline (0, 1, col = "BLUE", lwd = 2)
plot(h1b$delta, h1b$dist, xlab = "dissimilarities", ylab = "distances", col = "RED")#, xlim = c(0, .25), ylim = c(0, .25))
abline (0, 1, col = "BLUE", lwd = 2)
```
`r figure_nums("gruijter_regular_shepard")`

In the next analysis we require that all distances are less than or equal to 8.13, the maximum dissimilarity. In other words, the maximum distance must be less than or equal to the maximum dissimilarity.

```{r gruijter_maxbound_comp, echo = FALSE}
h1c <- smacofBelow (poldata, bnd = rep(max(poldata), length (poldata)))
```

The optimal solution under these constraints in two dimensions needs `r h1c$itel` iterations to arrive at stress `r h1c$stress`. At the solution there are `r length (which (abs (h1c$constraints) < 1e-6))` active constraints, i.e. six distances equal to 8.13.

```{r gruijter_maxbound_plot, fig.align = "center", echo = FALSE}
par(mfrow=c(1,2), pty = "s")
plot (h1c$x, type = "n", xlab = "dim 1", ylab = "dim 2")
text(h1c$x, polnames, col = "RED")
plot (h1c$delta, h1c$dist, xlab = "dissimilarities", ylab = "distances", col = "RED")
abline (0, 1, col = "BLUE", lwd = 2)
```

`r figure_nums ("gruijter_maxbound")`

And finally an analysis in which we require that the distances between ARP, CHU, and KVP (the Christian Democrat parties) are less than 0.1, and that the distances between PvdA, PSP, and CPN (the leftist parties) are also less than 0.1. This will effectively collapse them in the configuration plot.

```{r gruijter_infbound_comp, echo = FALSE}
bnd <- rep (Inf, length (poldata))
bnd[c(3, 4, 22, 12, 13, 31)] <- 4
h1d <- smacofBelow (poldata, bnd = bnd, verbose = FALSE)
```

The optimal solution under these constraints in two dimensions needs `r h1d$itel` iterations to arrive at stress `r h1d$stress`. At the solution all `r length (which (abs (h1d$constraints) < 1e-6))` constraints are active.
```{r gruijter_infbound_plot, fig.align = "center", echo = FALSE}
par(mfrow=c(1,2), pty = "s")
plot (h1d$x, type = "n", xlab = "dim 1", ylab = "dim 2", xlim = c(-5, 5), ylim = c(-5, 5))
text(h1d$x, polnames, col = "RED")
plot (h1d$delta, h1d$dist, xlab = "dissimilarities", ylab = "distances", col = "RED")
abline (0, 1, col = "BLUE", lwd = 2)
```
`r figure_nums ("gruijter_infbound")`

##Ekman Color Data

```{r ekman, echo = FALSE}
data (ekman, package = "smacof")
nms <- as.character (attr (ekman, "Labels"))
ekman <- as.vector (1 - ekman)
```

```{r ekman_run, echo = FALSE, cache = FALSE}
h3a <- smacof (ekman)
h3b <- smacofBelow (ekman)
```

The final example uses the color data from @ekman_54. The optimal smacof solution in two dimensions needs `r h3a$itel` iterations to arrive at stress `r h3a$stress`. Approximation from below uses `r h3b$itel` iterations and finds stress `r h3b$stress`. At the solution there are `r length (which (abs (h3b$constraints) < 1e-6))` active constraints. Again the bounded solution is a shrunken version of the regular smacof solution, and figure `r figure_nums (name = "ekman_regular_shepard", display = "n")` shows that the active constraints correspond with the largest dissimilarities and distances.

```{r ekman_confplot, fig.align = "center", echo = FALSE}
par(mfrow=c(1,2), pty = "s")
plot(h3a$x, type = "n", xlab = "dim 1", ylab = "dim 2")#, xlim = c(-.08, .08), ylim = c(-.08, .08))
text(h3a$x, nms, col = "RED")
plot(h3b$x, type = "n", xlab = "dim 1", ylab = "dim 2")#, xlim = c(-.08, .08), ylim = c(-.08, .08))
text(h3b$x, nms, col = "RED")
```

`r figure_nums("ekman_regular_conf")`

```{r ekman_shepardplot, fig.align = "center", echo = FALSE}
par(mfrow=c(1,2), pty = "s")
plot(h3a$delta, h3a$dist, xlab = "dissimilarities", ylab = "distances", col = "RED")#, xlim = c(0, .15), ylim = c(0, .15))
abline (0, 1, col = "BLUE", lwd = 2)
plot(h3b$delta, h3b$dist, xlab = "dissimilarities", ylab = "distances", col = "RED")#, xlim = c(0, .15), ylim = c(0, .15))
abline (0, 1, col = "BLUE", lwd = 2)
```

`r figure_nums("ekman_regular_shepard")`

Figure `r figure_nums("ekman_distances", display = "n")` shows that, although the two configurations in figure `r figure_nums ("ekman_regular_conf", display = "n")` may look proportional, they really are not. There are many deviations from proportionality if we compare the fitted distances in the regular and bounded smacof solutions.
```{r ekman_distances, fig.align = "center", echo = FALSE}
plot(h3a$dist, h3b$dist, col = "RED", xlab = "regular distances", ylab = "bounded distances")
```

`r figure_nums("ekman_distances")`

#Discussion

If $p=n-1$ our multidimensional scaling problem is full-dimensional scaling or FDS (@deleeuw_groenen_mair_E_16e). In that case stress can be parametrized by using the scalar products $C$ and we must minimize
$$\mathbf{stress}(C):=\sum_{i=1}^n\sum_{j=1}^n w_{ij}(\delta_{ij}-\sqrt{\mathbf{tr}\ A_{ij}C})^2$$
over $C\gtrsim 0$ and $\mathbf{tr}\ A_{ij}C\leq\alpha_{ij}^2$. Thus FDS with upper bounds amounts to minimizing a convex function over a convex set, in fact to solving a semidefinite programming problem. It also means that the concept of Gower rank starts playing a role (@deleeuw_E_16k).

MDS from below can also be used with different loss functions. Minimizing
$$\sum_{i=1}^n\sum_{j=1}^nw_{ij}|\delta_{ij}-d_{ij}(X)|$$
over $d_{ij}(X)\leq\delta_{ij}$ is the same as maximizing the weighted sum of the distances, a convex function. Thus we can use tangential majorization (@deleeuw_B_16b), which means the majorization function is linear and has no quadratic term. Alternatively, the problem of minimizing
$$\sum_{i=1}^n\sum_{j=1}^nw_{ij}|\delta_{ij}^2-d_{ij}^2(X)|$$
over $d_{ij}(X)\leq\delta_{ij}$ means maximizing a convex quadratic form. Although the objective function is quadratic, we still need tangential majorization to get a convex objective to minimize.

We do not address in this paper the problem of minimizing stress under constraints of the form $\beta_{ij}\leq d_{ij}(X)\leq\alpha_{ij}$. This is a fundamentally more complicated problem, because the lower bounds make the solution set non-convex. In fact, using lower bounds may make the solution set empty, because there may not be $p$-dimensional Euclidean distances satisfying the constraints.
It will be of some interest to apply our algorithm to unfolding examples, where we could have upper bound constraints on the row distances, the column distances, or the row-column distances. Of course in the unfolding case lower bound constraints are at least as interesting, because they can in principle be used to avoid the trivial solutions that are so common in unfolding, but we have argued that computationally lower bound constraints are far more difficult to handle.

#Appendix: Code

##below.R

```{r file_below, code = readLines("below.R")}
```

##auxilary.R

```{r file_auxilary, code = readLines("auxilary.R")}
```

##mdsUtils.R

```{r file_mdsutils, code = readLines("mdsUtils.R")}
```

##smacof.R

```{r file_smacof, code = readLines("smacof.R")}
```

##smacofBelow.R

```{r file_smacofbelow, code = readLines("smacofBelow.R")}
```

#References