--- title: "Multidimensional Scaling with Lower Bounds" author: "Jan de Leeuw" date: "Version 01, January 13, 2017" output: html_document: keep_md: yes number_sections: yes toc: yes toc_depth: 3 pdf_document: keep_tex: yes number_sections: yes toc: yes toc_depth: 3 fontsize: 12pt graphics: yes bibliography: above.bib abstract: We give an algorithm, with R code, to minimize the multidimensional scaling stress loss function under the condition that some or all of the fitted distances are larger than given positive lower bounds. This paper is a companion to @deleeuw_E_17b. We also give some interesting majorization theory. --- {r function_code, echo = FALSE} source("above.R")  {r packages, echo = FALSE} options (digits = 10) library (captioner)  {r equation_numbering, echo = FALSE} figure_nums <- captioner (prefix = "Figure") lemma_nums <- captioner (prefix = "Lemma") theorem_nums <- captioner (prefix = "Theorem") lemma_nums (name= "feasibility", caption ="[Feasibility]", display = FALSE) lemma_nums (name= "monotonicity", caption ="[Monotonicity]", display = FALSE) theorem_nums (name= "convergence", caption ="[Convergence]", display = FALSE) theorem_nums (name= "necessary", caption ="[Necessary]", display = FALSE) figure_nums (name = "equi_regular_conf", caption = "Equidistance Data, Regular Left, Lower Bounds Right", display = FALSE) figure_nums (name = "equi_regular_shepard", caption = "Equidistance Data, Regular Left, Lower Bounds Right", display = FALSE) figure_nums (name = "gruijter_regular_conf", caption = "De Gruijter Data, Regular Left, From Above Right", display = FALSE) figure_nums (name = "gruijter_regular_shepard", caption = "De Gruijter Data, Regular Left, From Above Right", display = FALSE) figure_nums (name = "gruijter_maxbound", caption = "De Gruijter Data, Distances Bounded by Minimum Dissimilarity", display = FALSE) figure_nums (name = "gruijter_infbound", caption = "De Gruijter Data, Imposing Large Distances", display = FALSE)  Note: This is a 
working paper which will be expanded/updated frequently. All suggestions for improvement are welcome. The directory [gifi.stat.ucla.edu/above](http://gifi.stat.ucla.edu/above) has a pdf version, the complete Rmd file with all code chunks, the bib file, and the R source code.

#Introduction

In this paper we study the minimization of *stress*, defined as
$$\mathbf{stress}(X):=\frac14\sum_{i=1}^n\sum_{j=1}^n w_{ij}(\delta_{ij}-d_{ij}(X))^2$$
over $n\times p$ *configuration matrices* $X$. Here $W=\{w_{ij}\}$ and $\Delta=\{\delta_{ij}\}$ are given symmetric, hollow, and non-negative matrices of, respectively, *weights* and *dissimilarities*. The $d_{ij}(X)$ are Euclidean distances between the points with coordinates in the rows of $X$. Thus
$$d_{ij}^2(X):=(x_i-x_j)'(x_i-x_j)=(e_i-e_j)'XX'(e_i-e_j)=\mathbf{tr}\ X'A_{ij}X.$$
Here $e_i$ and $e_j$ are unit vectors, with all elements equal to zero except for the single element $i$ or $j$, which is equal to one. Also $A_{ij}:=(e_i-e_j)(e_i-e_j)'$.

The problem of minimizing stress is well-studied (see, for example, @borg_groenen_05). In this paper we analyze the problem of minimizing stress over all configurations $X$ for which $d_{ij}(X)\geq\alpha_{ij}$ for some or all pairs $i,j$, where the $\alpha_{ij}$ are positive bounds. As a special case we have *MDS from above*, where we require $d_{ij}(X)\geq\delta_{ij}$ for all $i,j$.

These optimization problems are non-standard in various ways. The constraints $d_{ij}(X)\geq\alpha_{ij}$ define a *reverse convex set* (@meyer_70) in configuration space, and in addition stress is not convex. We can, however, use basic smacof theory (@deleeuw_C_77) to majorize stress by a convex quadratic and to majorize the constraints by linear functions. Majorization can be used to define a simple iterative algorithm.
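As a concrete reference for the loss just defined, here is a minimal R sketch of stress. The function name `stressLoss` and the full-matrix calling convention are ours, chosen for illustration; the paper's own code works with the vectorized form introduced later.

```r
# Stress as defined above: a quarter of the double sum over the full
# symmetric hollow matrices w and delta, for an n x p configuration x.
stressLoss <- function (w, delta, x) {
  d <- as.matrix (dist (x))        # Euclidean distances d_ij(X)
  sum (w * (delta - d) ^ 2) / 4    # each pair is counted twice in the sum
}

## two points at distance 1, with dissimilarity 1: a perfect fit
x <- matrix (c (0, 1, 0, 0), 2, 2)
delta <- matrix (c (0, 1, 1, 0), 2, 2)
w <- 1 - diag (2)
stressLoss (w, delta, x)           # 0
```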
In each iteration we majorize stress and the constraints, and then minimize the convex quadratic majorizer over configurations satisfying the majorized linear constraints, using quadratic programming. We then compute a new majorization at the minimizer of the majorized problem, and so on. These are the two steps in the majorization or MM approach to optimization (@deleeuw_C_94c, @lange_16).

#Some Majorization Theory

Suppose the problem we are interested in is to minimize a real-valued $f_0$ over all $x\in\mathbb{R}^n$ that satisfy $f_j(x)\leq 0$ for all $j=1,\cdots,m$. We call this problem $\mathcal{P}$. Now suppose $g_j:\mathbb{R}^n\otimes\mathbb{R}^n\rightarrow\mathbb{R}$ majorizes $f_j$, i.e.
\begin{align*}
f_j(x)&\leq g_j(x,y)\text{ for all }x,y\in\mathbb{R}^n,\\
f_j(y)&=g_j(y,y)\text{ for all }y\in\mathbb{R}^n.
\end{align*}
To simplify matters we suppose all $f_j$ and $g_j$ are continuously differentiable, and all $g_j$ are convex in their first argument.

Define problem $\mathcal{P}_k$ as the problem of minimizing $g_0(x,x^{(k-1)})$ over the $x\in\mathbb{R}^n$ that satisfy $g_j(x,x^{(k-1)})\leq 0$. Define a sequence $x^{(k)}$ with $x^{(0)}$ feasible for $\mathcal{P}$, and with $x^{(k)}$ for $k\geq 1$ a solution of $\mathcal{P}_k$.

**`r theorem_nums("convergence")`**

1. $x^{(k)}$ is feasible for $\mathcal{P}$ for all $k\geq 1$.
2. $f_0(x^{(k)})\leq f_0(x^{(k-1)})$ for all $k\geq 1$.

**Proof:** For the first part: $f_j(x^{(k)})\leq g_j(x^{(k)},x^{(k-1)})$ by majorization, and $g_j(x^{(k)},x^{(k-1)})\leq 0$ by feasibility for $\mathcal{P}_k$. For the second part: $f_0(x^{(k)})\leq g_0(x^{(k)},x^{(k-1)})$ by majorization. Because $x^{(k)}$ solves $\mathcal{P}_k$ we have $g_0(x^{(k)},x^{(k-1)})\leq g_0(x^{(k-1)},x^{(k-1)})$ if $x^{(k-1)}$ is feasible for $\mathcal{P}_k$, i.e. if $g_j(x^{(k-1)},x^{(k-1)})\leq 0$. But $g_j(x^{(k-1)},x^{(k-1)})=f_j(x^{(k-1)})\leq 0$ by the first part of the theorem.
**QED**

#Fixed Points

By theorem `r theorem_nums("convergence", display = "n")` we have a sequence of points with decreasing function values that are all feasible for $\mathcal{P}$. We can say a bit more about accumulation points of this sequence. Define $\mathcal{P}(y)$ as the convex problem of minimizing $g_0(x,y)$ over the $x\in\mathbb{R}^n$ with $g_j(x,y)\leq 0$.

**`r theorem_nums("necessary")`** If $x$ solves $\mathcal{P}(x)$ then $x$ satisfies the first order necessary conditions for a local minimum of $\mathcal{P}$.

**Proof:** The necessary and sufficient conditions for $x$ to be a minimum of $\mathcal{P}(y)$ are that there exists $\lambda\geq 0$ such that
$$\mathcal{D}_1g_0(x,y)+\sum_{j=1}^m\lambda_j\mathcal{D}_1g_j(x,y)=0,$$
with in addition $g_j(x,y)\leq 0$ and $\lambda_jg_j(x,y)=0$. The first order necessary conditions for $x$ to be a local minimum of $\mathcal{P}$ are that there exists $\lambda\geq 0$ such that
$$\mathcal{D}f_0(x)+\sum_{j=1}^m\lambda_j\mathcal{D}f_j(x)=0,$$
with in addition $f_j(x)\leq 0$ and $\lambda_jf_j(x)=0$. But if the $f_j$ and $g_j$ are differentiable, we know from majorization theory (@deleeuw_B_16b, @lange_16) that $\mathcal{D}f_j(x)=\mathcal{D}_1g_j(x,x)$ for all $x$. **QED**

It follows that if we define $\mathcal{F}(y)$ as the solution of $\mathcal{P}(y)$, then fixed points of $\mathcal{F}$ satisfy the first order conditions for a local minimum of $\mathcal{P}$. Thus if $x^{(k)}$ converges to a fixed point of $\mathcal{F}$, it converges to a local minimizer of $\mathcal{P}$, and $f_0(x^{(k)})$ converges to a local minimum.

#MDS with Lower Bounds

We have seen in the companion paper (@deleeuw_E_17b) that we can write
$$\mathbf{stress}(x)=1-\rho(x)+\frac12 x'x,$$
where
$$\rho(x):=\sum_{k=1}^K w_k\delta_k d_k(x),$$
with distances $d_k(x):=\sqrt{x'A_kx}$, and the $A_k$ positive semi-definite matrices that add up to the identity.
Define
$$B(x):=\sum_{k=1}^K w_k\frac{\delta_k}{d_k(x)}A_k,$$
and define the *Guttman transform* of $x$ as $\Gamma(x):=B(x)x$. Then for all $x$
$$\mathbf{stress}(x)=1-x'\Gamma(x)+\frac12 x'x=(1-\frac12\Gamma(x)'\Gamma(x))+\frac12(x-\Gamma(x))'(x-\Gamma(x)),$$
and for all $x,y$ we have the majorization
$$\mathbf{stress}(x)\leq 1-x'\Gamma(y)+\frac12 x'x=(1-\frac12\Gamma(y)'\Gamma(y))+\frac12(x-\Gamma(y))'(x-\Gamma(y)).$$
The constraints are $\sqrt{x'A_kx}\geq\alpha_k$, or $\alpha_k-\sqrt{x'A_kx}\leq 0$. The linearized constraints are
$$\alpha_k-\frac{1}{\sqrt{y'A_ky}}x'A_ky\leq 0.$$
By Cauchy-Schwarz $x'A_ky\leq d_k(x)d_k(y)$, so the linearized constraint function majorizes the original one, and the theory of the previous sections applies. Thus the quadratic programming problem we have to solve in each smacof iteration is minimization of $1-x'\Gamma(x^{(k-1)})+\frac12 x'x$ over $x$ satisfying
$$x'A_kx^{(k-1)}\geq\alpha_kd_k(x^{(k-1)}).$$
We use the qp function from the lsei package (@wang_lawson_hanson_15).

#Examples

##Equidistances

```{r below_comp_2, echo = FALSE, cache = FALSE}
eqdata <- rep (1, 45)
h2a <- smacof (eqdata)
h2b <- smacofAbove (eqdata, bnd = c(rep (1, 9), rep (0, 36)))
```

Our first example has $n=10$ with all $\delta_{ij}$ equal. The optimal smacof solution in two dimensions needs `r h2a$itel` iterations to arrive at stress `r h2a$stress`. We reanalyze the data with the constraint that the distance between the first point and all nine others is at least one. The algorithm incorporating these bounds, implemented in the function smacofAbove, uses `r h2b$itel` iterations and finds stress `r h2b$stress`. The two configurations are in figure `r figure_nums (name = "equi_regular_conf", display = "n")`.
```{r equi_confplot, fig.align = "center", echo = FALSE}
par(mfrow=c(1,2), pty = "s")
plot(h2a$x, type = "n", col = "RED", xlab = "dim 1", ylab = "dim 2")
text (h2a$x, as.character(1:10), col = "RED")
plot(h2b$x, type = "n", col = "RED", xlab = "dim 1", ylab = "dim 2")
text (h2b$x, as.character(1:10), col = "RED")
```
r figure_nums("equi_regular_conf")

```{r equi_shepardplot, fig.align = "center", echo = FALSE}
par(mfrow=c(1,2), pty = "s")
plot(h2a$delta, h2a$dist, xlab = "dissimilarities", ylab = "distances", col = "RED", xlim = c(0, 1.5), ylim = c(0, 1.5))
abline (0, 1, col = "BLUE", lwd = 2)
plot(h2b$delta, h2b$dist, xlab = "dissimilarities", ylab = "distances", col = "RED", xlim = c(0, 1.5), ylim = c(0, 1.5))
abline (0, 1, col = "BLUE", lwd = 2)
```
r figure_nums("equi_regular_shepard")

##Dutch Political Parties 1967

As the next illustration we use data from @degruijter_67, with average dissimilarity judgments between Dutch political parties in 1967. The data are

```{r poldist-data, echo = FALSE}
poldist <- structure(
  c(5.63, 5.27, 4.6, 4.8, 7.54, 6.73, 7.18, 6.17, 6.72, 5.64,
    6.22, 5.12, 4.59, 7.22, 5.47, 5.46, 4.97, 8.13, 7.55, 6.9,
    4.67, 3.2, 7.84, 6.73, 7.28, 6.13, 7.8, 7.08, 6.96, 6.04,
    4.08, 6.34, 7.42, 6.88, 6.36, 7.36),
  Labels = c("KVP", "PvdA", "VVD", "ARP", "CHU", "CPN", "PSP", "BP", "D66"),
  Size = 9L,
  call = quote(as.dist.default(m = polpar)),
  class = "dist",
  Diag = FALSE,
  Upper = FALSE
)
print (poldist)
poldata <- as.vector (poldist)
polnames <- attr (poldist, "Labels")
```

```{r below_comp, echo = FALSE, cache = FALSE}
h1a <- smacof (poldata)
h1b <- smacofAbove (poldata, bnd = poldata)
```

The optimal smacof solution in two dimensions needs `r h1a$itel` iterations to arrive at stress `r h1a$stress`. Approximation from above, where we require that all distances are at least as large as the corresponding dissimilarities, uses `r h1b$itel` iterations and finds stress `r h1b$stress`. At the solution there are `r length (which (abs (h1b$constraints) < 1e-6))` active constraints, but as the Shepard plots in figure `r figure_nums (name = "gruijter_regular_shepard", display = "n")` show, the active constraints now do not correspond with the smallest dissimilarities, although they do correspond to the smallest distances.
```{r smacof_confplot, fig.align = "center", echo = FALSE}
par(mfrow=c(1,2), pty = "s")
plot(h1a$x, type = "n", xlab = "dim 1", ylab = "dim 2")
text(h1a$x, polnames, col = "RED")
plot(h1b$x, type = "n", xlab = "dim 1", ylab = "dim 2")
text(h1b$x, polnames, col = "RED")
```

`r figure_nums("gruijter_regular_conf")`

```{r smacof_shepardplot, fig.align = "center", echo = FALSE}
par(mfrow=c(1,2), pty = "s")
plot(h1a$delta, h1a$dist, xlab = "dissimilarities", ylab = "distances", col = "RED")
abline (0, 1, col = "BLUE", lwd = 2)
plot(h1b$delta, h1b$dist, xlab = "dissimilarities", ylab = "distances", col = "RED")
abline (0, 1, col = "BLUE", lwd = 2)
```

`r figure_nums("gruijter_regular_shepard")`

In the next analysis we require that all distances are larger than or equal to `r min(poldata)`, the minimum dissimilarity. In other words, the smallest distance must be larger than or equal to the smallest dissimilarity.

```{r gruijter_maxbound_comp, echo = FALSE}
h1c <- smacofAbove (poldata, bnd = rep(min(poldata), length (poldata)))
```

The optimal solution under these constraints in two dimensions needs `r h1c$itel` iterations to arrive at stress `r h1c$stress`.

```{r gruijter_maxbound_plot, fig.align = "center", echo = FALSE}
par(mfrow=c(1,2), pty = "s")
plot (h1c$x, type = "n", xlab = "dim 1", ylab = "dim 2")
text(h1c$x, polnames, col = "RED")
plot (h1c$delta, h1c$dist, xlab = "dissimilarities", ylab = "distances", col = "RED")
abline (0, 1, col = "BLUE", lwd = 2)
```

`r figure_nums ("gruijter_maxbound")`

And finally an analysis in which we require that the distances between ARP, CHU, and KVP (the Christian Democrat parties) are larger than 5, and the distances between PvdA, PSP, and CPN (the leftist parties) are also larger than 5.
This forces the two blocks of parties apart.

```{r gruijter_infbound_comp, echo = FALSE}
bnd <- rep (0, length (poldata))
bnd[c(3, 4, 22, 12, 13, 31)] <- 5
h1d <- smacofAbove (poldata, bnd = bnd, verbose = FALSE)
```

The optimal solution under these constraints in two dimensions needs `r h1d$itel` iterations to arrive at stress `r h1d$stress`. The six constrained distances are `r h1d$dist[c(3, 4, 22, 12, 13, 31)]`, so the Christian Democrats, for example, are on an equilateral triangle with side 5.
```{r gruijter_infbound_plot, fig.align = "center", echo = FALSE}
par(mfrow=c(1,2), pty = "s")
plot (h1d$x, type = "n", xlab = "dim 1", ylab = "dim 2", xlim = c(-5, 5), ylim = c(-5, 5))
text(h1d$x, polnames, col = "RED")
plot (h1d$delta, h1d$dist, xlab = "dissimilarities", ylab = "distances", col = "RED")
abline (0, 1, col = "BLUE", lwd = 2)
abline (h = 5, col = "GREEN", lwd = 2)
```
r figure_nums ("gruijter_infbound")

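For reference, the basic iteration used in these examples can be sketched as follows, in one dimension. This is a simplified sketch, not the paper's smacofAbove: it works with the unnormalized $V=\sum_k w_kA_k$ instead of the normalized $A_k$ that add up to the identity, adds a small ridge because $V$ is only positive semi-definite (constant vectors are in its null space), and uses solve.QP from the quadprog package in place of the qp function from lsei.

```r
library (quadprog)

# One majorization step: minimize the quadratic majorizer of stress
# subject to the linearized constraints x'A_k y >= alpha_k d_k(y),
# where y is the current configuration. pairs is a K x 2 matrix of
# index pairs (i, j); alpha_k = 0 means pair k is unconstrained.
smacofAboveStep <- function (x, w, delta, alpha, pairs) {
  n <- length (x)
  K <- nrow (pairs)
  V <- matrix (0, n, n)          # V = sum_k w_k A_k (unnormalized)
  bx <- rep (0, n)               # B(x) x, the Guttman side of the majorizer
  amat <- matrix (0, n, K)       # linearized constraints, one per column
  bvec <- rep (0, K)
  for (k in 1:K) {
    e <- rep (0, n)
    e[pairs[k, 1]] <- 1
    e[pairs[k, 2]] <- -1
    ak <- outer (e, e)           # A_k = (e_i - e_j)(e_i - e_j)'
    dk <- sqrt (sum (x * (ak %*% x)))
    V <- V + w[k] * ak
    bx <- bx + w[k] * (delta[k] / dk) * drop (ak %*% x)
    amat[, k] <- drop (ak %*% x)
    bvec[k] <- alpha[k] * dk
  }
  act <- which (alpha > 0)       # only bounded pairs enter the QP constraints
  solve.QP (V + 1e-6 * diag (n), bx,
            amat[, act, drop = FALSE], bvec[act])$solution
}

# three points on a line, all dissimilarities 1, and d(1, 2) >= 1.2
pairs <- rbind (c (1, 2), c (1, 3), c (2, 3))
w <- rep (1, 3)
delta <- rep (1, 3)
alpha <- c (1.2, 0, 0)
x <- c (0, 1.5, 3)               # a feasible start
for (it in 1:50) x <- smacofAboveStep (x, w, delta, alpha, pairs)
abs (x[1] - x[2])                # stays above the bound 1.2
```

By Cauchy-Schwarz, any solution of the linearized subproblem satisfies the original distance bounds, so every iterate of the loop above is feasible, as guaranteed by the convergence theorem.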
#Discussion

The obvious next step is to combine the algorithms of @deleeuw_E_17b and this paper to implement both upper and lower bounds on the distances. Of course, for that case we have to be careful that a solution actually exists. Another next step is to see to what extent these bounding algorithms are useful for both metric and non-metric unfolding.

#Appendix: Code

##above.R

```{r file_above, code = readLines("above.R")}
```

##auxilary.R

```{r file_auxilary, code = readLines("auxilary.R")}
```

##mdsUtils.R

```{r file_mdsutils, code = readLines("mdsUtils.R")}
```

##smacof.R

```{r file_smacof, code = readLines("smacof.R")}
```

##smacofAbove.R

```{r file_smacofabove, code = readLines("smacofAbove.R")}
```

#References