# Singularities and Zero Distances in Multidimensional Scaling
Jan de Leeuw, Patrick Groenen, Patrick Mair
Version 01, February 15, 2016
Note: This is a working paper which will be expanded/updated frequently. The directory [gifi.stat.ucla.edu/zeroes](http://gifi.stat.ucla.edu/zeroes) has a pdf copy of this article, the complete Rmd file with all code chunks, the bib file, and the R source code.
#Problem
Multidimensional scaling (MDS) in this paper is minimization of the loss function *stress*, introduced by @kruskal_64a, @kruskal_64b.
Stress is defined as
\begin{equation}
\sigma(X):=\frac12\mathop{\sum\sum}_{1\leq i<j\leq n}w_{ij}(\delta_{ij}-d_{ij}(X))^2,
\end{equation}
where the $w_{ij}$ are non-negative weights, the $\delta_{ij}$ are dissimilarities, and $d_{ij}(X):=\sqrt{\mathbf{tr}\ X'A_{ij}X}$, with $A_{ij}:=(e_i-e_j)(e_i-e_j)'$, is the Euclidean distance between rows $i$ and $j$ of the $n\times p$ configuration $X$. Define the symmetric matrices $V$ and $B(X)$ by
\begin{align*}
v_{ij}&:=-w_{ij},\\
v_{ii}&:=-\sum_{1\leq j\not= i\leq n} v_{ij},\\
b_{ij}(X)&:=\begin{cases}-\dfrac{w_{ij}\delta_{ij}}{d_{ij}(X)}&\text{ if }d_{ij}(X)>0,\\
0&\text{ if }d_{ij}(X)=0,
\end{cases}\\
b_{ii}(X)&:=-\sum_{1\leq j\not= i\leq n} b_{ij}(X).
\end{align*}
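These definitions are easy to check numerically. The following NumPy sketch (ours, not the paper's R code, which lives at the URL above; all function names are our own) builds $V$ and $B(X)$ and verifies the identities $\mathbf{tr}\ X'VX=\sum\sum w_{ij}d_{ij}^2(X)$ and $\mathbf{tr}\ X'B(X)X=\sum\sum w_{ij}\delta_{ij}d_{ij}(X)$, which are used below.

```python
import numpy as np

rng = np.random.default_rng(1)

def dist(X):
    """Matrix of Euclidean distances between the rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

def V_matrix(W):
    """Off-diagonal v_ij = -w_ij; the diagonal makes every row sum to zero."""
    V = -W.astype(float)
    np.fill_diagonal(V, 0.0)
    np.fill_diagonal(V, -V.sum(axis=1))
    return V

def B_matrix(X, W, Delta):
    """Off-diagonal b_ij = -w_ij delta_ij / d_ij(X), or 0 when d_ij(X) = 0;
    the diagonal again makes every row sum to zero."""
    D = dist(X)
    B = np.zeros_like(W, dtype=float)
    pos = D > 0                      # excludes the diagonal, where D = 0
    B[pos] = -W[pos] * Delta[pos] / D[pos]
    np.fill_diagonal(B, 0.0)
    np.fill_diagonal(B, -B.sum(axis=1))
    return B

n, p = 5, 2
X = rng.standard_normal((n, p))
W = rng.uniform(0.5, 1.5, (n, n)); W = (W + W.T) / 2; np.fill_diagonal(W, 0.0)
Delta = rng.uniform(0.5, 1.5, (n, n)); Delta = (Delta + Delta.T) / 2
np.fill_diagonal(Delta, 0.0)

D, iu = dist(X), np.triu_indices(n, k=1)
V, B = V_matrix(W), B_matrix(X, W, Delta)

# tr X'VX = sum_{i<j} w_ij d_ij(X)^2, tr X'B(X)X = sum_{i<j} w_ij delta_ij d_ij(X)
print(np.allclose(np.trace(X.T @ V @ X), np.sum(W[iu] * D[iu] ** 2)))
print(np.allclose(np.trace(X.T @ B @ X), np.sum(W[iu] * Delta[iu] * D[iu])))
```

Both checks print `True`; note that both $V$ and $B(X)$ are doubly centered by construction.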
An important result on zero distances at a local minimum was proved by @deleeuw_A_84f. It is based on a formula for the Dini directional derivative (see also @deleeuw_groenen_mair_E_16b).
**Theorem 1: [Directional Derivative]**
$$
\lim_{\epsilon\downarrow 0}\frac{\sigma(X+\epsilon Y)-\sigma(X)}{\epsilon}=\mathbf{tr}\ Y'(V-B(X))X-\sum\{w_{ij}\delta_{ij}d_{ij}(Y)\mid d_{ij}(X)=0\}
$$
**Proof:**
Use the expansion
$$
d_{ij}(X+\epsilon Y)=d_{ij}(X)+\epsilon\begin{cases}\frac{1}{d_{ij}(X)}\mathbf{tr}\ X'A_{ij}Y&\text{ if }d_{ij}(X)>0,\\
d_{ij}(Y)&\text{ if }d_{ij}(X)=0,
\end{cases}
$$
valid up to a term of order $o(\epsilon)$, to get the required result.
**QED**
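Theorem 1 can be verified by a finite-difference experiment. The NumPy sketch below (ours; it assumes stress is coded with the factor $\frac12$ of the definition above, which makes the displayed formula come out without extra constants) evaluates both sides of the formula at a configuration with one zero distance.

```python
import numpy as np

rng = np.random.default_rng(0)

def dist(X):
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

def stress(X, W, Delta):
    # sigma(X) = (1/2) sum_{i<j} w_ij (delta_ij - d_ij(X))^2
    iu = np.triu_indices(len(X), k=1)
    return 0.5 * np.sum(W[iu] * (Delta[iu] - dist(X)[iu]) ** 2)

def V_B(X, W, Delta):
    D = dist(X)
    V = -W.copy()
    B = np.zeros_like(W)
    pos = D > 0
    B[pos] = -W[pos] * Delta[pos] / D[pos]
    for M in (V, B):                     # fill diagonals: rows sum to zero
        np.fill_diagonal(M, 0.0)
        np.fill_diagonal(M, -M.sum(axis=1))
    return V, B

# three objects; rows 1 and 2 of X coincide, so d_12(X) = 0
X = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 0.0]])
W = np.ones((3, 3)); np.fill_diagonal(W, 0.0)
Delta = np.ones((3, 3)); np.fill_diagonal(Delta, 0.0)
Y = rng.standard_normal(X.shape)         # an arbitrary direction

V, B = V_B(X, W, Delta)
# right-hand side: the smooth part plus the correction for the
# single zero-distance pair (objects 1 and 2, indices 0 and 1)
rhs = np.trace(Y.T @ (V - B) @ X) - W[0, 1] * Delta[0, 1] * dist(Y)[0, 1]

eps = 1e-6
lhs = (stress(X + eps * Y, W, Delta) - stress(X, W, Delta)) / eps
print(abs(lhs - rhs))  # agrees up to O(eps)
```

The one-sided difference quotient matches the right-hand side to roughly the order of `eps`, as the theorem predicts.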
**Theorem 2: [Necessary Condition]** If $\sigma$ has a local minimum at $X$ then $(V-B(X))X=0$ and $d_{ij}(X)>0$ for all $i\not= j$ such that $w_{ij}\delta_{ij}>0$.
**Proof:** Direct from theorem 1. **QED**
If $\sigma$ has a local minimum at $X$ and is differentiable at $X$, then the derivative vanishes. In the usual MDS problems we have $w_{ij}\delta_{ij}>0$ for all $i\not= j$, and theorem 2 then says that all distances are positive at a local minimum, so that stress is differentiable at local minima.
Points where the derivative exists and vanishes are stationary points, which are not necessarily local minima. Because of the possibility of zero distances we need a more general notion of a stationary point, which we now define as a point where $0\in\partial\sigma(X)$, with $\partial\sigma(X)$ the subdifferential of $\sigma$ at $X$. We know (@deleeuw_C_77) that $\sigma$ is, up to an additive constant, the difference of the convex quadratic $\frac12\eta(X)$, with $\eta(X):=\mathbf{tr}\ X'VX$, and the convex function $\rho(X):=\mathbf{tr}\ X'B(X)X$, and thus the subdifferential is simply a translation of the convex subdifferential: $\partial\sigma(X)=VX-\partial\rho(X)$.
**Theorem 3: [Stationary Points]** We have $0\in\partial\sigma(X)$ if and only if there are vectors $u_{ij}$, with $u_{ij}'u_{ij}\leq 1$, such that
$$
VX=B(X)X+\sum\{w_{ij}\delta_{ij}(e_i-e_j)u_{ij}'\mid d_{ij}(X)=0\}.
$$
**Proof:** The subdifferential of the convex function $d_{ij}$ at $X$ is
$$
\partial d_{ij}(X)=\begin{cases}\left\{\dfrac{1}{d_{ij}(X)}A_{ij}X\right\}&\text{ if }d_{ij}(X)>0,\\
\left\{(e_i-e_j)u'\mid u'u\leq 1\right\}&\text{ if }d_{ij}(X)=0,
\end{cases}
$$
and thus $\partial\rho(X)=B(X)X+\sum\{w_{ij}\delta_{ij}\partial d_{ij}(X)\mid d_{ij}(X)=0\}$.
**QED**
Note that theorem 3 implies that always $(V-B(X))X\in\partial\sigma(X)$, whether there are zero distances or not.
#Zero Distances
Theorem 2 tells us that at a local minimum $w_{ij}\delta_{ij}=0$ whenever $d_{ij}(X)=0$. But this leaves open some interesting questions about stationary points and local minima with zero distances.
Can there be local minima with zero distances? If we allow for zero dissimilarities the answer is clearly yes. Take $\Delta$ equal to $D(X)$, where the configuration $X$ has at least one zero distance. Then $X$ gives the global minimum of stress, equal to zero, and a global minimum is, of course, also a local minimum. If we allow for zero weights, then the answer is again yes, even if all dissimilarities are positive. Suppose we have four objects with all dissimilarities equal to one, and we fit two-dimensional configurations. If all weights are one, the global minimum is attained for four points in the corners of a square. If we set $w_{12}=0$, however, we get a perfect fit by putting points in the corners of an equilateral triangle with unit sides, with objects 1 and 2 together in one corner. Since this makes stress zero, it is the global minimum. If we set, in addition, $w_{34}=0$, then we get the global minimum from just two points at distance one, one carrying objects 1 and 2 and the other carrying objects 3 and 4.
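The zero-weight example is easy to confirm numerically. A small NumPy check of ours (the `stress` helper is our own, coded with the factor $\frac12$ of the definition above, which does not affect where stress vanishes):

```python
import numpy as np

def stress(X, W, Delta):
    # sigma(X) = (1/2) sum_{i<j} w_ij (delta_ij - d_ij(X))^2
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    iu = np.triu_indices(len(X), k=1)
    return 0.5 * np.sum(W[iu] * (Delta[iu] - D[iu]) ** 2)

# four objects, all dissimilarities one
Delta = np.ones((4, 4)); np.fill_diagonal(Delta, 0.0)
W = np.ones((4, 4)); np.fill_diagonal(W, 0.0)

# drop the weight for the pair (1, 2) (row indices 0 and 1)
W[0, 1] = W[1, 0] = 0.0

# equilateral triangle with unit sides; objects 1 and 2 share a corner
corner = np.array([0.0, 0.0])
X = np.vstack([corner, corner,
               [1.0, 0.0],
               [0.5, np.sqrt(3) / 2]])

print(stress(X, W, Delta))  # essentially zero: a perfect fit
```

All five weighted pairs have distance one, so stress vanishes (up to floating-point rounding) at a configuration with a zero distance.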
A related question is whether there can be solutions of the stationary equations $(V-B(X))X=0$ with zero distances. We answer this more systematically.
Suppose $X$ satisfies the stationary equations $(V-B(X))X=0$ and $X$ is of block form, with all rows within a block the same, and rows in different blocks different. If there are $m$ blocks, then we can write $X=G\overline{X}$, where $G$ is an $n\times m$ binary indicator matrix indicating block membership, and $\overline{X}$ is the $m\times p$ matrix of distinct rows. Let $I_j$ be the indices of the objects in block $j$.
It follows that $(V-B(X))G\overline{X}=0$ and thus $(\overline{V}-\overline{B}(X))\overline{X}=0$, where $\overline{V}:=G'VG$ and
$\overline{B}(X):=G'B(X)G$. We can write, somewhat suggestively, $\overline{B}(X)=B(\overline X)$, using the fact that all between-block distances $d_{ik}(X)$ with $i\in I_j$ and $k\in I_\ell$ are equal to $d_{j\ell}(\overline{X})$. Thus, for $j\not=\ell$,
\begin{align}
\overline{w}_{j\ell}&:=-\sum_{i\in I_j}\sum_{k\in I_\ell}w_{ik},\label{E:wcls}\\
\overline{b}_{j\ell}(X)&:=-\frac{1}{d_{j\ell}(\overline{X})}\sum_{i\in I_j}\sum_{k\in I_\ell}w_{ik}\delta_{ik},\label{E:bcls}
\end{align}
and the diagonal elements are filled in to make the matrices doubly centered.
So the stationary equations for an MDS problem with zero distances reduce to a *clustered* set of stationary equations, of the order of the number of *clusters* of points, which does not involve zero distances. This reduced MDS problem can also be interpreted as the original MDS problem, with the constraint that there is some fixed clustering of points.
Conversely, if we have a solution of the stationary equations for an MDS problem with non-zero weights and distances, then we can always expand it to a solution of the
stationary equations for a problem with zero distances, simply by finding $W$ and $\Delta$ that satisfy $\eqref{E:wcls}$ and $\eqref{E:bcls}$.
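The reduction can be illustrated on the triangle example from the previous paragraphs. In this NumPy sketch of ours, $G$ maps the three cluster points $\overline{X}$ to the four-row configuration $X$, and both the full and the reduced stationary equations hold:

```python
import numpy as np

def dist(X):
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

def V_matrix(W):
    V = -W.copy()
    np.fill_diagonal(V, 0.0); np.fill_diagonal(V, -V.sum(axis=1))
    return V

def B_matrix(X, W, Delta):
    D = dist(X)
    B = np.zeros_like(W)
    pos = D > 0
    B[pos] = -W[pos] * Delta[pos] / D[pos]
    np.fill_diagonal(B, 0.0); np.fill_diagonal(B, -B.sum(axis=1))
    return B

# four objects, all dissimilarities one, w_12 = 0 (indices 0 and 1)
W = np.ones((4, 4)); np.fill_diagonal(W, 0.0); W[0, 1] = W[1, 0] = 0.0
Delta = np.ones((4, 4)); np.fill_diagonal(Delta, 0.0)

# m = 3 clusters: objects 1 and 2 share the first corner of a unit triangle
Xbar = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])
G = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
X = G @ Xbar                          # expanded X, with d_12(X) = 0

V, B = V_matrix(W), B_matrix(X, W, Delta)
Vbar, Bbar = G.T @ V @ G, G.T @ B @ G  # clustered matrices

# full and reduced (clustered) stationary equations
print(np.allclose((V - B) @ X, 0.0))
print(np.allclose((Vbar - Bbar) @ Xbar, 0.0))
```

Both print `True`: the clustered equations $(\overline{V}-\overline{B}(X))\overline{X}=0$ are an order-$3$ MDS stationarity condition with no zero distances, obtained from the order-$4$ problem by pre-multiplying with $G'$.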
#Singularities
Suppose $(V-B(X))X=0$ and the $n\times p$ matrix $X$ has rank $r<p$.