Abstract

In Multidimensional Scaling we sometimes find that stress does not decrease if we increase dimensionality. This is explained in this note by using the Gower rank. Some examples with small Gower rank are analyzed.**Note:** This is a working paper which may be expanded/updated frequently. All suggestions for improvement are welcome. The directory gifi.stat.ucla.edu/gower has a pdf version, the complete Rmd file with all code chunks, and the bib file.

It follows that \(\sigma\) is convex on the convex cone of psd matrices, and thus all local minimizers of the FDS problem are global. Suppose \(\mathcal{I}\) is the set of all pairs \((i,j)\) with \(w_{ij}\delta_{ij}>0\). Then (De Leeuw (1984)) at a local minimum \(d_{ij}(C)>0\) for all \((i,j)\in\mathcal{I}\). Thus \(\sigma\) is differentiable in a neighborhood of local minima.

We need some additional definitions. We use theIf we assume, for convenience, that \(\sum_{j=1}^n\sum_{j=1}^n w_{ij}\delta_{ij}^2=1\), then \(\sigma(C)=1-2\mathbf{tr}\ B(C)C+\mathbf{tr}\ VC\).

From convex analysis (Rockafellar (1970), theorem 31.4) the necessary and sufficient conditions for \(C\) to be a minimizer are \begin{align} C&\gtrsim 0,\label{E:psd}\\ B(C)&\lesssim V,\label{E:pos}\\ \mathbf{tr}\ (B(C)-V)C&=0.\label{E:ort} \end{align}In general we do not have uniqueness. Think of the case in which \(\mathcal{I}\) has only one element, say \((1,2)\). Then any psd matrix \(C\) with \(c_{11}+c_{22}-2c_{12}=\delta_{12}^2\) is a minimizer with stress equal to zero. The local minimum is unique if and only if \(C=0\) is the only psd solution of the homogeneous linear system \(d_{ij}^2(C)=\mathbf{tr}\ A_{ij}C=0\) with \((i,j)\in\mathcal{I}\). Let us assume that this is the case, and that the global minimizer \(C\) is of rank \(p\). Then \(p\) is called the *Gower rank* of the pair \((W,\Delta)\). If the minimum is not unique the Gower rank is the minimum rank of the minimizers.

There is a clear analogy with classical MDS, in which the corresponding quantity \(p^\star\) is the number of positive eigenvalues of \(-\frac12 J\Delta^{(2)}J\), the doubly-centered matrix of squared dissimilarities. In fact, the *Gower conjecture* is that the Gower rank \(p\) is less than or equal to \(p^\star\).

If \(C=XX'\) is a full rank decomposition, with \(X\) of dimensions \(n\times p\), then we can write \(\eqref{E:ort}\) as \(V^+B(C)X=X\), using superscript \(+\) for the Moore-Penrose inverse. Thus \(X\) are eigenvectors of \(V^+B(C)\) with eigenvalues equal to one. Condition \(\eqref{E:pos}\) says that all eigenvalues of \(V^+B(C)\) are less than or equal to one, and \(X\) corresponds with the \(p\) largest eigenvalues.

`smacof`

(De Leeuw and Mair (2009)) parametrize the MDS problem in terms of an \(n\times p\) with \(d_{ij}(X)=\sqrt{\mathbf{tr}\ A_{ij}XX'}\). An MDS(p) algorithm minimizes \(\sigma\) over all \(n\times p\) configurations.

An iterative MDS(p) algorithm converges to a configuration with \(B(X)X=VX\), where \(B(X)\) is defined by \(\eqref{E:bdef}\) with \(C=XX'\). Thus \(C=XX'\) solves \(\eqref{E:psd}\) and \(\eqref{E:ort}\), but the eigenvalues of \(V^+B(X)\) can be larger than one, and consequently \(C\) does not solve the FDS problem. But conversely if the MDS(p) solution \(X\) gives a matrix \(V^+B(X)\) with \(p\) eigenvalues equal to one, then \(p\) is the Gower rank and \(C=XX'\) solves the FDS problem. This also implies that \(X\) gives the global minimum of the MDS(q) problem for all \(q\geq p\).

The following asymmetric matrix is taken from Wang and Chin (2010), table 5. Wong and Beasley (1990) constructed artificial data comparing seven hypothetical departments in a university in terms of three input variables (number of academic staff, academic staff salaries, support staff salaries) and three output variables (number of undergraduates, number of postgraduate students, number of resaerch papers). The data are used to construct a cross-efficiency matrix, which uses linear programming to compute input and output weights that measure efficiency of a department relative to another department. This produces cross efficiencies between zero and one, which can be interpreted as similarities, so we actually use one minus the data from Wang and Chin (2010).

```
## a b b d d f g
## A 0.0000 0.0188 0.2310 0.3589 0.0618 0.0000 0.0000
## B 0.0781 0.0000 0.2281 0.2987 0.1010 0.0000 0.0000
## C 0.0000 0.1490 0.0000 0.5458 0.5050 0.0000 0.7059
## D 0.3125 0.0000 0.2651 0.1803 0.2350 0.0494 0.0000
## E 0.0000 0.1539 0.3349 0.5865 0.0000 0.0896 0.0000
## F 0.0000 0.0188 0.2310 0.3589 0.0618 0.0000 0.0000
## G 0.0000 0.0188 0.2310 0.3589 0.0618 0.0000 0.0000
```

It was observed by Shahin Ashkiani, PhD candidate at Autonomous University of Barcelona, that metric unfolding solutions for this matrix exhibit some curious behaviour (personal communication, June 4, 2016). Increasing the dimensionality of the solution does not decrease the minimum stress. The MDS(p) solution has the same minimum stress for all \(p\geq 2\). This indicates the Gower rank of these data is two.

Although metric unfolding can be more efficiently done by `smacofRect()`

, we use `smacofSym()`

on the \(14\times 14\) augmented matrix, with the data in the off-diagonal submatrices and with zero weights for the diagonal submatrices. This corresponds more directly with the notation we use in this paper. We start with a random configuration and iterate until difference between successive stresses is less that 1e-15.

`## ndim = 1 | iterations = 5 | stress = 0.5805246365`

`## ndim = 2 | iterations = 136 | stress = 0.4067723531`

`## ndim = 3 | iterations = 129 | stress = 0.4067723531`

`## ndim = 4 | iterations = 144 | stress = 0.4067723531`

`## ndim = 5 | iterations = 119 | stress = 0.4067723531`

`## ndim = 6 | iterations = 142 | stress = 0.4067723531`

We can also illustrate this by showing the singular values of the configuration computed by `smacofSym()`

.

`## [1] 2.724878`

`## [1] 2.574972 1.997666`

`## [1] 2.5749717 1.9976662 0.0000001`

`## [1] 2.574972 1.997666 0.000000 0.000000`

`## [1] 2.5749712 1.9976669 0.0000005 0.0000000 0.0000000`

`## [1] 2.574972 1.997666 0.000000 0.000000 0.000000 0.000000`

There is a more interesting example in Wang and Chin (2010), analyzing cross efficiences computed from data of Tofallis (1997). In this case the decision making units are 14 airlines. The input variables are aircraft capacity, operating cost, and non-flight assets. The output variables are passenger kilometres and non-passenger revenue. The cross-efficiences are

```
## a b c d e f g h i j k l
## A 0.13 0.55 0.38 0.13 0.15 0.53 0.19 0.21 0.30 0.25 0.13 0.23
## B 0.83 0.66 0.95 0.83 0.83 0.98 0.75 0.73 0.72 0.79 0.83 0.80
## C 0.12 0.81 0.05 0.12 0.12 0.31 0.28 0.32 0.38 0.22 0.12 0.19
## D 0.04 0.57 0.30 0.04 0.06 0.30 0.18 0.21 0.30 0.19 0.04 0.17
## E 0.03 0.63 0.00 0.03 0.00 0.00 0.23 0.26 0.22 0.00 0.03 0.00
## F 0.12 0.89 0.04 0.12 0.12 0.02 0.34 0.39 0.49 0.28 0.12 0.25
## G 0.08 0.22 0.52 0.08 0.12 0.66 0.00 0.00 0.16 0.22 0.08 0.20
## H 0.22 0.39 0.48 0.22 0.23 0.71 0.15 0.14 0.18 0.25 0.22 0.24
## I 0.21 0.27 0.49 0.21 0.21 0.73 0.12 0.09 0.05 0.16 0.21 0.16
## J 0.22 0.36 0.35 0.22 0.17 0.64 0.22 0.21 0.00 0.00 0.22 0.03
## K 0.00 0.00 0.57 0.00 0.00 0.56 0.00 0.00 0.00 0.00 0.00 0.00
## L 0.05 0.37 0.25 0.05 0.04 0.56 0.06 0.06 0.00 0.00 0.05 0.00
## M 0.00 0.57 0.00 0.00 0.00 0.54 0.00 0.00 0.00 0.02 0.00 0.00
## N 0.00 0.77 0.00 0.00 0.00 0.00 0.22 0.27 0.35 0.14 0.00 0.12
## m n
## A 0.13 0.13
## B 0.83 0.83
## C 0.12 0.12
## D 0.04 0.04
## E 0.03 0.03
## F 0.12 0.12
## G 0.08 0.08
## H 0.22 0.22
## I 0.21 0.21
## J 0.22 0.22
## K 0.00 0.00
## L 0.05 0.05
## M 0.00 0.00
## N 0.00 0.00
```

We repeat the same analysis as in the first example. In this case we conclude the Gower rank is equal to three.

`## ndim = 1 | iterations = 7 | stress = 0.4984859586`

`## ndim = 2 | iterations = 291 | stress = 0.2403664444`

`## ndim = 3 | iterations = 1340 | stress = 0.2386715692`

`## ndim = 4 | iterations = 1784 | stress = 0.2386715692`

`## ndim = 5 | iterations = 1870 | stress = 0.2386715692`

`## ndim = 6 | iterations = 1920 | stress = 0.2386715692`

The singular values for the second example are

`## [1] 4.487714`

`## [1] 4.247787 2.580708`

`## [1] 4.151090 2.632157 1.042903`

`## [1] 4.151090 2.632154 1.042909 0.000020`

`## [1] 4.151090 2.632154 1.042909 0.000020 0.000000`

`## [1] 4.151090 2.632154 1.042909 0.000020 0.000000 0.000000`

The third example was already analyzed in De Leeuw, Groenen, and Mair (2016). The Ekman (1954) color data give similarities between 14 colors. We transform to dissimilarities by using the transformation \((1-x)^3\), and we repeat the analysis from the previous sections. Note that this is not an unfolding example, all weights are equal to one. The Gower rank turns out to be two, i.e. for this transformation the two-dimensional `smacof`

solution is the global minimizer of stress in \(p\geq 2\) dimensions.

`## ndim = 1 | iterations = 4 | stress = 0.3893229700`

`## ndim = 2 | iterations = 47 | stress = 0.1049991045`

`## ndim = 3 | iterations = 174 | stress = 0.1049991045`

`## ndim = 4 | iterations = 174 | stress = 0.1049991045`

`## ndim = 5 | iterations = 173 | stress = 0.1049991045`

`## ndim = 6 | iterations = 175 | stress = 0.1049991045`

The singular values for the third example are

`## [1] 2.348357`

`## [1] 2.060186 1.477827`

`## [1] 2.0601856 1.4778274 0.0000007`

`## [1] 2.0601856 1.4778274 0.0000007 0.0000000`

`## [1] 2.0601856 1.4778274 0.0000007 0.0000000 0.0000000`

`## [1] 2.0601856 1.4778274 0.0000007 0.0000000 0.0000000 0.0000000`

001 06/06/16

- First release

002 08/06/16

- Third Example

003 08/06/16

- Added something on uniqueness
- Added Gower conjecture

De Leeuw, J. 1984. “Differentiability of Kruskal’s Stress at a Local Minimum.” *Psychometrika* 49: 111–13. http://www.stat.ucla.edu/~deleeuw/janspubs/1984/articles/deleeuw_A_84f.pdf.

———. 1993. “Fitting Distances by Least Squares.” Preprint Series 130. Los Angeles, CA: UCLA Department of Statistics. http://www.stat.ucla.edu/~deleeuw/janspubs/1993/reports/deleeuw_R_93c.pdf.

De Leeuw, J., and P. Mair. 2009. “Multidimensional Scaling Using Majorization: SMACOF in R.” *Journal of Statistical Software* 31 (3): 1–30. http://www.stat.ucla.edu/~deleeuw/janspubs/2009/articles/deleeuw_mair_A_09c.pdf.

De Leeuw, J., P. Groenen, and P. Mair. 2016. “Full-Dimensional Scaling.” doi:10.13140/RG.2.1.1038.4407.

Ekman, G. 1954. “Dimensions of Color Vision.” *Journal of Psychology* 38: 467–74.

Rockafellar, R.T. 1970. *Convex Analysis*. Princeton University Press.

Tofallis, C. 1997. “Input Efficiency Profiling: an Application to Airlines.” *Computers and Operations Research* 24: 253–58.

Wang, Y.-M., and K.-S. Chin. 2010. “A Neutral DEA Model for Cross-efficiency Evaluation and its Extension.” *Expert Systems with Applications* 37: 3666–75.

Wong, Y.-H. B., and J.E. Beasley. 1990. “Restricting Weight Flexibility in Data Envelopment Analysis.” *The Journal of the Operational Research Society* 41: 829–35.