Note: This is a working paper which will be expanded/updated frequently. The directory gifi.stat.ucla.edu/full has a pdf copy of this article, the complete Rmd file with all code chunks, the bib file, the LaTeX style file, and the R source code.

Multidimensional scaling (MDS) is the problem of minimizing the loss function *stress*, defined as \[\sigma(X):=\frac12\mathop{\sum\sum}_{1\leq i<j\leq n}w_{ij}(\delta_{ij}-d_{ij}(X))^2,\] over the \(n\times p\) *configurations* \(X\) (Kruskal (1964a), Kruskal (1964b)). The symmetric matrix \(W=\{w_{ij}\}\) has known non-negative *weights*, the symmetric matrix \(\Delta=\{\delta_{ij}\}\) has non-negative *dissimilarities*, and the symmetric matrix-valued function \(D(X)=\{d_{ij}(X)\}\) has Euclidean distances between the *points*, i.e. the rows of \(X\).

In the usual applications of MDS the user chooses *dimensionality* \(p\) to be much smaller than \(n\), because the name of the game is *dimensionality reduction*. In this paper we give a detailed analysis of *full dimensional scaling* or *FDS*, which is MDS with \(p=n\).

Now expand stress. Assume, without loss of generality, that \[ \frac12\mathop{\sum\sum}_{1\leq i<j\leq n} w_{ij}\delta_{ij}^2=1, \] and define

\begin{align} \rho(X)&:=\mathop{\sum\sum}_{1\leq i<j\leq n} w_{ij}\delta_{ij}d_{ij}(X),\label{E:rho}\\ \eta(X)&:=\mathop{\sum\sum}_{1\leq i<j\leq n} w_{ij}d_{ij}^2(X).\label{E:eta} \end{align} Now \begin{equation} \sigma(X)=1-\rho(X)+\frac12\eta(X). \end{equation} To develop some convenient matrix notation we use the matrices \(A_{ij}:=(e_i-e_j)(e_i-e_j)'\), with \(e_i\) and \(e_j\) unit basis vectors, so that \(d_{ij}^2(X)=\mathbf{tr}\ X'A_{ij}X\), and define \begin{align} V&:=\mathop{\sum\sum}_{1\leq i<j\leq n} w_{ij}A_{ij},\\ B(X)&:=\mathop{\sum\sum}_{1\leq i<j\leq n} \begin{cases}w_{ij}\frac{\delta_{ij}}{d_{ij}(X)}A_{ij}&\text{ if }d_{ij}(X)>0,\\ 0&\text{ if }d_{ij}(X)=0,\end{cases} \end{align} so that \begin{align} \rho(X)&=\mathbf{tr}\ X'B(X)X,\\ \eta(X)&=\mathbf{tr}\ X'VX. \end{align} Note that \(V\) has off-diagonal elements equal to \(-w_{ij}\), and \(B(X)\) has off-diagonal elements \[b_{ij}(X)=-w_{ij}\frac{\delta_{ij}}{d_{ij}(X)}\] if \(d_{ij}(X)>0\) and zero otherwise. Diagonal elements are filled in to make all rows and columns sum to zero. Both \(V\) and \(B(X)\) are doubly-centered and positive semi-definite. If we assume, without loss of generality, that the MDS problem is *irreducible* (De Leeuw (1977)), then \(\mathbf{rank}(V)=n-1\) and the only vectors in the null space of \(V\) are multiples of \(e:=(1,1,\cdots,1)\). If all weights are equal to one, then \(V=nI-ee'\).
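As a concrete check on these definitions, here is a small numpy sketch (ours; the paper's own code is in R, and the function names below are not from the paper) that builds \(V\) and \(B(X)\) from their off-diagonal elements and verifies \(\rho(X)=\mathbf{tr}\ X'B(X)X\) and \(\eta(X)=\mathbf{tr}\ X'VX\):

```python
import numpy as np

def vmat(W):
    # V: off-diagonal elements -w_ij, diagonal filled so rows/columns sum to zero
    V = -W.astype(float).copy()
    np.fill_diagonal(V, 0.0)
    np.fill_diagonal(V, -V.sum(axis=1))
    return V

def bmat(W, Delta, D):
    # B(X): off-diagonal -w_ij * delta_ij / d_ij where d_ij > 0, else zero
    R = np.divide(Delta, D, out=np.zeros_like(Delta), where=D > 0)
    B = -W * R
    np.fill_diagonal(B, 0.0)
    np.fill_diagonal(B, -B.sum(axis=1))
    return B

def dists(X):
    # Euclidean distances between the rows of X
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

rng = np.random.default_rng(1)
n, p = 6, 2
X = rng.normal(size=(n, p))
W = np.ones((n, n)) - np.eye(n)          # unit weights
Delta = dists(rng.normal(size=(n, p)))   # some symmetric non-negative dissimilarities
D = dists(X)

V, B = vmat(W), bmat(W, Delta, D)
iu = np.triu_indices(n, 1)
rho = (W[iu] * Delta[iu] * D[iu]).sum()  # sum over i < j
eta = (W[iu] * D[iu] ** 2).sum()
assert np.isclose(rho, np.trace(X.T @ B @ X))
assert np.isclose(eta, np.trace(X.T @ V @ X))
```

The diagonal fill makes both matrices doubly-centered, which is exactly what the trace identities require.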

De Leeuw (1984) proves a key necessary condition for a local minimum in MDS. Also see De Leeuw, Groenen, and Mair (2016) for a more recent proof. We also review some additional results on maxima and minima of stress.

**Theorem 1: [De Leeuw 84, De Leeuw 93, De Leeuw and Groenen 97]**

- If \(\sigma\) has a local minimum at \(X\) then \(d_{ij}(X)>0\) for all \(i,j\) such that \(w_{ij}\delta_{ij}>0\).
- If \(\sigma\) has a local minimum at \(X\) then \(\sigma\) is differentiable in a neighborhood of \(X\). Thus \(B(X)X=VX\).
- If \(\sigma\) has a local minimum at a column-centered \(X\) and \(\mathbf{rank}(X)=n-1\) then \(\sigma(X)=0\).
- \(\sigma\) has a local maximum at \(X\) if and only if \(X=0\).

Let’s look at a small example with four points, all dissimilarities equal, and all weights equal to one. There is a local minimum with four points in the corners of a square, with stress equal to 0.0285955. And there is another local minimum with three points forming an equilateral triangle, and the fourth point in the center. This has stress 0.0669873. We can compute stress for all points of the form \(\alpha X+\beta Y\), where \(X\) and \(Y\) are the two local minima. Figure 1 has a contour plot of \(\sigma(\alpha,\beta)\), showing the local minima at \((1,0)\) and \((0,1)\), and the local maximum at \((0,0)\).
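Both stress values can be reproduced with a short numpy sketch (ours; the paper's own code is in R). We scale the six equal dissimilarities so that \(\frac12\sum\sum w_{ij}\delta_{ij}^2=1\), and minimize over the size of each shape, using the fact that \(\sigma(\alpha X)=1-\alpha\rho(X)+\frac12\alpha^2\eta(X)\) is minimal at \(\alpha=\rho(X)/\eta(X)\), with minimum value \(1-\frac12\rho^2(X)/\eta(X)\):

```python
import numpy as np

def dists(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

n = 4
iu = np.triu_indices(n, 1)
delta = np.ones(len(iu[0]))
delta /= np.sqrt(0.5 * (delta ** 2).sum())  # now (1/2) * sum of delta^2 equals one

def stress_at_optimal_scale(X):
    d = dists(X)[iu]
    rho, eta = (delta * d).sum(), (d ** 2).sum()
    # sigma(alpha X) = 1 - alpha rho + alpha^2 eta / 2, minimal at alpha = rho / eta
    return 1.0 - 0.5 * rho ** 2 / eta

square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
triangle = np.array([[0, 0], [1, 0], [0.5, np.sqrt(3) / 2],
                     [0.5, np.sqrt(3) / 6]])  # equilateral triangle plus centroid

print(round(stress_at_optimal_scale(square), 7))    # 0.0285955
print(round(stress_at_optimal_scale(triangle), 7))  # 0.0669873
```

The sketch only optimizes the scale of each fixed shape; that the optimally scaled shapes are in fact local minima of the full problem is the statement in the text.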

Alternatively, we can plot stress on the line connecting \(X\) and \(Y\). Note that although stress only has a local maximum at the origin in configuration space, it can very well have local maxima if restricted to lines.

Stress can also be regarded as a function of the cross product matrix \(C:=XX'\), with \(\eta(C):=\mathbf{tr}\ VC\) and \(d_{ij}^2(C):=\mathbf{tr}\ A_{ij}C\).

We call the space of all positive semi-definite \(n\times n\) matrices *cross product space*. The problem of minimizing \(\sigma\) over \(n\times p\)-dimensional configuration space is equivalent to the problem of minimizing \(\sigma\) over the set of matrices \(C\) in \(n\times n\)-dimensional cross product space that have rank less than or equal to \(p\). The corresponding solutions are related by the simple relationship \(C=XX'\).

**Theorem 2: [De Leeuw 93]** Stress is convex on cross product space.

**Proof:** First, \(\eta\) is linear in \(C\). Second, \[
\rho(C)=\mathop{\sum\sum}_{1\leq i<j\leq n} w_{ij}\delta_{ij}\sqrt{\mathbf{tr}\ A_{ij}C}.
\] This is a weighted sum of square roots of non-negative functions that are linear in \(C\), and it is consequently concave. Thus \(-\rho\) is convex, and so is \(\sigma=1-\rho+\frac12\eta\). **QED**
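Theorem 2 is easy to probe numerically. The following sketch (our construction: five points, unit weights, arbitrary fixed dissimilarities, and the constant term taken as one) draws random positive semi-definite matrices and checks the convexity inequality for \(\sigma\) as a function of \(C\):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5
iu = np.triu_indices(n, 1)
delta = np.abs(rng.normal(size=len(iu[0]))) + 0.1  # arbitrary positive dissimilarities

def sigma(C):
    # d_ij^2(C) = tr A_ij C = c_ii + c_jj - 2 c_ij
    d2 = np.diag(C)[iu[0]] + np.diag(C)[iu[1]] - 2 * C[iu]
    d = np.sqrt(np.maximum(d2, 0.0))
    return 1.0 - (delta * d).sum() + 0.5 * (d ** 2).sum()

def random_psd():
    A = rng.normal(size=(n, n))
    return A @ A.T

violations = 0
for _ in range(1000):
    C1, C2, lam = random_psd(), random_psd(), rng.uniform()
    lhs = sigma(lam * C1 + (1 - lam) * C2)
    rhs = lam * sigma(C1) + (1 - lam) * sigma(C2)
    violations += lhs > rhs + 1e-9
print(violations)  # 0
```

No random pair violates \(\sigma(\lambda C_1+(1-\lambda)C_2)\leq\lambda\sigma(C_1)+(1-\lambda)\sigma(C_2)\), as the theorem requires.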

Unfortunately the subset of cross product space of all matrices with rank less than or equal to \(p\) is far from simple (see Dattorro (2015)), so computational approaches to MDS prefer to work in configuration space.

Cross product space, the set of all positive semi-definite matrices, is a closed convex cone \(\mathcal{K}\) in the linear space of all \(n\times n\) symmetric matrices. This has an interesting consequence.

**Theorem 3: [De Leeuw 93]** Full-dimensional scaling, i.e. minimizing \(\sigma\) over \(\mathcal{K}\), is a convex programming problem. Thus in FDS all local minima are global. If \(w_{ij}\delta_{ij}>0\) for all \(i,j\) then the minimum is unique.

This result has been around since about 1985. De Leeuw (1993) gives a proof, but the report it appeared in remained unpublished. A published proof is in De Leeuw and Groenen (1997). Another treatment of FDS, with a somewhat different emphasis, is in De Leeuw (2014).

Now, by a familiar theorem (Theorem 31.4 in Rockafellar (1970)), a matrix \(C\) minimizes \(\sigma\) over \(\mathcal{K}\) if and only if \begin{align} C&\in\mathcal{K},\\ V-B(C)&\in\mathcal{K},\\ \mathbf{tr}\ C(V-B(C))&=0. \end{align} We give a computational proof of this result for FDS that actually yields a bit more.

**Proof:** Simple expansion. **QED**

**Theorem 5: [Convex]** Suppose \(C\) is a solution to the problem of minimizing \(\sigma\) over \(\mathcal{K}\). Then

- \(\mathbf{tr}\ A_{ij}C > 0\) for all \(i,j\) for which \(w_{ij}\delta_{ij}>0\).
- \(V-B(C)\) is positive semi-definite.
- \(\mathbf{tr}\ C(V-B(C))=0\).
- If \(C\) is positive definite then \(V=B(C)\) and \(\sigma(C)=0\).

**Proof:** The \(\epsilon^\frac12\) term in \(\eqref{E:expand}\) needs to vanish at a local minimum. This proves the first part. It follows that at a local minimum \[ \sigma(C+\epsilon\Delta)=\sigma(C)+\epsilon\ \mathbf{tr}\ (V-B(C))\Delta+o(\epsilon) \] for all \(\Delta\in\mathcal{K}\).

If \(V-B(C)\) is not positive semi-definite, then there is a \(\Delta\in\mathcal{K}\) such that \(\mathbf{tr}\ (V-B(C))\Delta < 0\). Thus \(C\) cannot be the minimum, which proves the second part. If we choose \(\Delta=C\) we find

\begin{equation*} \sigma((1+\epsilon)C)=\sigma(C)+ \epsilon\ \mathbf{tr}\ (V-B(C))C+o(\epsilon), \end{equation*} and choosing \(\epsilon\) small and negative shows we must have \(\mathbf{tr}\ (V-B(C))C=0\) for \(C\) to be a minimum. This proves the third part. Finally, if \(\sigma\) has a minimum at \(C\), and \(C\) is positive definite, then from parts 2 and 3 we have \(V=B(C)\). Comparing off-diagonal elements shows \(\Delta=D(C)\), and thus \(\sigma(C)=0\). **QED**

If \(C\) is the solution of the FDS problem, then \(\mathbf{rank}(C)\) defines the *Gower rank* of the dissimilarities. The number of positive eigenvalues of the negative of the doubly-centered matrix of squared dissimilarities, i.e. of the matrix factored in classical MDS, defines the *Torgerson rank* of the dissimilarities. The *Gower conjecture* is that the Gower rank is less than or equal to the Torgerson rank. Neither a proof nor a counterexample has been found.

We minimize stress iteratively in the space of all \(n\times n\) configurations, using the identity matrix as a default starting point. Since we work in configuration space, not in cross product space, this does not guarantee convergence to the unique FDS solution, but after convergence we can easily check the necessary and sufficient conditions of Theorem 5.
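A minimal numpy sketch of such an iteration (ours, not the paper's R implementation; we use the standard smacof majorization update \(X\leftarrow V^+B(X)X\), which for unit weights and centered \(X\) reduces to \(\frac1nB(X)X\)). With four points and all dissimilarities equal, in full dimension the points fit perfectly as a regular tetrahedron, so stress decreases monotonically to zero, consistent with part 3 of Theorem 1:

```python
import numpy as np

def dists(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def guttman(Delta, X):
    # unit weights: smacof update X <- (1/n) B(X) X
    n = X.shape[0]
    D = dists(X)
    B = -np.divide(Delta, D, out=np.zeros_like(Delta), where=D > 0)
    np.fill_diagonal(B, 0.0)
    np.fill_diagonal(B, -B.sum(axis=1))
    return B @ X / n

n = 4
Delta = np.ones((n, n)) - np.eye(n)
Delta *= 2.0 / np.sqrt((Delta ** 2).sum())  # now (1/2) sum_{i<j} delta^2 = 1

def stress(X):
    D = dists(X)
    # full-matrix sums count every pair twice, hence the divisors
    return 1.0 - (Delta * D).sum() / 2 + (D ** 2).sum() / 4

X = np.eye(n)            # default starting point
trace = [stress(X)]
for _ in range(50):
    X = guttman(Delta, X)
    trace.append(stress(X))

assert all(s2 <= s1 + 1e-12 for s1, s2 in zip(trace, trace[1:]))
print(trace[-1] < 1e-12)  # True: perfect fit in full dimension
```

The majorization argument guarantees the monotone decrease of stress that the assertion checks.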

As a small example, consider four points with all dissimilarities equal to one, except \(\delta_{14}\), which is equal to three. The triangle inequality is violated, because \(\delta_{14}=3>\delta_{12}+\delta_{24}=2\), and thus there certainly is no configuration in any Euclidean space with perfect fit.

The FDS solution turns out to have rank two, thus the Gower rank is two. The singular values of the FDS solution are

`## [1] 0.4508464709 0.2125310645 0.0000001303`

Gower rank two also follows from the eigenvalues of the matrix \(V^+B(C)\), which are

`## [1] 1.0000000000 1.0000000000 0.9205543464`

In *combination space* we consider stress as a function of the coefficients \(\gamma\) in linear combinations \(X(\gamma):=\sum_s\gamma_sY_s\) of a fixed set of configurations \(Y_s\). With \(U:=\{u_{st}\}\), where \(u_{st}:=\mathbf{tr}\ Y_s'VY_t\), and \(B(\gamma):=\{b_{st}(\gamma)\}\), where \(b_{st}(\gamma):=\mathbf{tr}\ Y_s'B(X(\gamma))Y_t\), we have \(\eta(X(\gamma))=\gamma'U\gamma\) and \(\rho(X(\gamma))=\gamma'B(\gamma)\gamma\), and the stationary equations are \[
B(\gamma)\gamma=U\gamma.
\] We have already seen an example of \(\sigma\) on combination space in Figure 1, where \(X\) and \(Y\) are two local minima. Another interesting example uses a configuration \(X\) and its gradient \(Y=\mathcal{D}\sigma(X)\). In that case finding the minimum of \(\sigma\) over combination space means finding the optimal step size for the steepest descent gradient method.
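Since \(\sigma(X)=1-\rho(X)+\frac12\eta(X)\), the gradient at a configuration with positive distances is \(\mathcal{D}\sigma(X)=(V-B(X))X\). The following sketch (ours: unit weights, random dissimilarities normalized so that \(\frac12\sum\sum w_{ij}\delta_{ij}^2=1\)) checks this formula against finite differences, and then scans for the optimal step size along the negative gradient:

```python
import numpy as np

def dists(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

rng = np.random.default_rng(3)
n, p = 4, 2
Delta = dists(rng.normal(size=(n, p)))
Delta *= 2.0 / np.sqrt((Delta ** 2).sum())   # (1/2) sum_{i<j} delta^2 = 1

def stress(X):
    D = dists(X)
    return 1.0 - (Delta * D).sum() / 2 + (D ** 2).sum() / 4

def gradient(X):
    # D sigma(X) = (V - B(X)) X, with unit weights and V = nI - ee'
    D = dists(X)
    B = -np.divide(Delta, D, out=np.zeros_like(Delta), where=D > 0)
    np.fill_diagonal(B, 0.0)
    np.fill_diagonal(B, -B.sum(axis=1))
    V = n * np.eye(n) - np.ones((n, n))
    return (V - B) @ X

X = rng.normal(size=(n, p))
G = gradient(X)

# central finite differences as an independent check of the gradient
num = np.zeros_like(X)
h = 1e-6
for i in range(n):
    for j in range(p):
        E = np.zeros_like(X)
        E[i, j] = h
        num[i, j] = (stress(X + E) - stress(X - E)) / (2 * h)
assert np.allclose(G, num, atol=1e-5)

# optimal step size along the negative gradient, by a crude grid search
steps = np.linspace(0.0, 1.0, 1001)
values = [stress(X - step * G) for step in steps]
print("optimal step size:", steps[int(np.argmin(values))])
assert min(values) < stress(X)   # steepest descent finds a better configuration
```

The grid search stands in for the one-dimensional minimization over combination space described above.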

For a configuration with three points equally spaced on a horizontal line and the fourth point above the middle of the three

```
## [,1] [,2]
## [1,] -0.1006209 -0.1006209
## [2,] 0.0000000 -0.1006209
## [3,] 0.1006209 -0.1006209
## [4,] 0.0000000 0.3018627
```

we have the gradient

```
## [,1] [,2]
## 1 0.3250910 -0.08772788
## 2 0.3638044 -0.07804082
## 3 -0.6888953 -0.08772788
## 4 0.0000000 0.25349657
```

The contour plot for the linear combinations is shown in the figure below.