--- title: "Gower Rank" author: "Jan de Leeuw" date: "Version 0.03, Jun 08, 2016" output: pdf_document: keep_tex: yes number_sections: yes toc: yes toc_depth: 2 html_document: keep_md: yes number_sections: yes toc: yes toc_depth: 2 fontsize: 12pt graphics: yes bibliography: gower.bib abstract: In Multidimensional Scaling we sometimes find that stress does not decrease if we increase dimensionality. This is explained in this note by using the Gower rank. Some examples with small Gower rank are analyzed. --- {r mprint, echo = FALSE} library(smacof) mprint <- function (x, d = 2, w = 5) { print (noquote (formatC (x, di = d, wi = w, fo = "f"))) }  **Note:** This is a working paper which may be expanded/updated frequently. All suggestions for improvement are welcome. The directory [gifi.stat.ucla.edu/gower](http://gifi.stat.ucla.edu/gower) has a pdf version, the complete Rmd file with all code chunks, and the bib file. #Theory In *full-dimensional scaling* (@deleeuw_R_93c, @deleeuw_groenen_mair_E_16e) or *FDS* we minimize a loss function defined as $$\sigma(C):=\sum_{i=1}^n\sum_{j=1}^nw_{ij}(\delta_{ij}-d_{ij}(C))^2$$ over all positive semi-definite (psd) matrices $C$. Here $w_{ij}$ are known non-negative *weights* and $\delta_{ij}$ are known non-negative *dissimilarities*. To prevent complete triviality we assume $w_{ij}\delta_{ij}>0$ for at least one pair $i,j$. The *distances* $d_{ij}(C)$ are defined by $$d_{ij}(C):=\sqrt{c_{ii}+c_{jj}-2c_{ij}}.$$ It follows that $\sigma$ is convex on the convex cone of psd matrices, and thus all local minimizers of the FDS problem are global. Suppose $\mathcal{I}$ is the set of all pairs $(i,j)$ with $w_{ij}\delta_{ij}>0$. Then (@deleeuw_A_84f) at a local minimum $d_{ij}(C)>0$ for all $(i,j)\in\mathcal{I}$. Thus $\sigma$ is differentiable in a neighborhood of local minima. We need some additional definitions. We use the *Loewner order*, i.e. $A\gtrsim B$ means that $A-B$ is psd. Define the matrices $$A_{ij}:=(e_i-e_j)(e_i-e_j)',$$ with $e_i$ and $e_j$ unit vectors, i.e. vectors with one element equal to one and the rest equal to zero. Note that $d_{ij}(C)=\sqrt{\mathbf{tr}\ A_{ij}C}$. Using the matrices $A_{ij}$ we define \begin{align} V&:=\sum_{j=1}^n\sum_{j=1}^n w_{ij}A_{ij},\label{E:vdef}\\ B(C)&:=\sum_{j=1}^n\sum_{j=1}^n w_{ij}\frac{\delta_{ij}}{d_{ij}(C)}A_{ij}.\label{E:bdef} \end{align} If we assume, for convenience, that $\sum_{j=1}^n\sum_{j=1}^n w_{ij}\delta_{ij}^2=1$, then $\sigma(C)=1-2\mathbf{tr}\ B(C)C+\mathbf{tr}\ VC$. From convex analysis (@rockafellar_70, theorem 31.4) the necessary and sufficient conditions for $C$ to be a minimizer are \begin{align} C&\gtrsim 0,\label{E:psd}\\ B(C)&\lesssim V,\label{E:pos}\\ \mathbf{tr}\ (B(C)-V)C&=0.\label{E:ort} \end{align} In general we do not have uniqueness. Think of the case in which $\mathcal{I}$ has only one element, say $(1,2)$. Then any psd matrix $C$ with $c_{11}+c_{22}-2c_{12}=\delta_{12}^2$ is a minimizer with stress equal to zero. The local minimum is unique if and only if $C=0$ is the only psd solution of the homogeneous linear system $d_{ij}^2(C)=\mathbf{tr}\ A_{ij}C=0$ with $(i,j)\in\mathcal{I}$. Let us assume that this is the case, and that the global minimizer $C$ is of rank $p$. Then $p$ is called the *Gower rank* of the pair $(W,\Delta)$. If the minimum is not unique the Gower rank is the minimum rank of the minimizers. There is a clear analogy with classical MDS, in which the corresponding quantity $p^\star$ is the number of positive eigenvalues of $-\frac12 J\Delta^{(2)}J$, the doubly-centered matrix of squared dissimilarities. In fact, the *Gower conjecture* is that the Gower rank $p$ is less than or equal to $p^\star$. If $C=XX'$ is a full rank decomposition, with $X$ of dimensions $n\times p$, then we can write $\eqref{E:ort}$ as $V^+B(C)X=X$, using superscript $+$ for the Moore-Penrose inverse. Thus $X$ are eigenvectors of $V^+B(C)$ with eigenvalues equal to one. Condition $\eqref{E:pos}$ says that all eigenvalues of $V^+B(C)$ are less than or equal to one, and $X$ corresponds with the $p$ largest eigenvalues. #Algorithm Multidimensional scaling (MDS) algorithms such as smacof (@deleeuw_mair_A_09c) parametrize the MDS problem in terms of an $n\times p$ *configuration* $X$. Thus $$\sigma(X):=\sum_{i=1}^n\sum_{j=1}^nw_{ij}(\delta_{ij}-d_{ij}(X))^2$$ with $d_{ij}(X)=\sqrt{\mathbf{tr}\ A_{ij}XX'}$. An MDS(p) algorithm minimizes $\sigma$ over all $n\times p$ configurations. An iterative MDS(p) algorithm converges to a configuration with $B(X)X=VX$, where $B(X)$ is defined by $\eqref{E:bdef}$ with $C=XX'$. Thus $C=XX'$ solves $\eqref{E:psd}$ and $\eqref{E:ort}$, but the eigenvalues of $V^+B(X)$ can be larger than one, and consequently $C$ does not solve the FDS problem. But conversely if the MDS(p) solution $X$ gives a matrix $V^+B(X)$ with $p$ eigenvalues equal to one, then $p$ is the Gower rank and $C=XX'$ solves the FDS problem. This also implies that $X$ gives the global minimum of the MDS(q) problem for all $q\geq p$. #First Example The following asymmetric matrix is taken from @wang_chin_10, table 5. @wong_beasley_90 constructed artificial data comparing seven hypothetical departments in a university in terms of three input variables (number of academic staff, academic staff salaries, support staff salaries) and three output variables (number of undergraduates, number of postgraduate students, number of resaerch papers). The data are used to construct a cross-efficiency matrix, which uses linear programming to compute input and output weights that measure efficiency of a department relative to another department. This produces cross efficiencies between zero and one, which can be interpreted as similarities, so we actually use one minus the data from @wang_chin_10. {r delta_data, echo = FALSE} delta <- structure(c(0, 0.0781, 0, 0.3125, 0, 0, 0, 0.0188, 0, 0.149, 0, 0.1539, 0.0188, 0.0188, 0.231, 0.2281, 0, 0.2651, 0.3349, 0.231, 0.231, 0.3589, 0.2987, 0.5458, 0.1803, 0.5865, 0.3589, 0.3589, 0.0618, 0.101, 0.505, 0.235, 0, 0.0618, 0.0618, 0, 0, 0, 0.0494, 0.0896, 0, 0, 0, 0, 0.7059, 0, 0, 0, 0), .Dim = c(7L, 7L), .Dimnames = list(NULL, c("V1", "V2", "V3", "V4", "V5", "V6", "V7"))) delta<-as.matrix(delta) row.names(delta)<-c("A","B","C","D","E","F","G") colnames(delta)<-c("a","b","b","d","d","f","g") mprint(delta, 4, 6)  It was observed by Shahin Ashkiani, PhD candidate at Autonomous University of Barcelona, that metric unfolding solutions for this matrix exhibit some curious behaviour (personal communication, June 4, 2016). Increasing the dimensionality of the solution does not decrease the minimum stress. The MDS(p) solution has the same minimum stress for all $p\geq 2$. This indicates the Gower rank of these data is two. Although metric unfolding can be more efficiently done by smacofRect(), we use smacofSym() on the $14\times 14$ augmented matrix, with the data in the off-diagonal submatrices and with zero weights for the diagonal submatrices. This corresponds more directly with the notation we use in this paper. We start with a random configuration and iterate until difference between successive stresses is less that 1e-15. {r delta_augment, echo = FALSE} dd <- matrix (0, 14, 14) dd[1:7, 8:14] <- delta dd[8:14, 1:7] <- t(delta) ww <- matrix (0, 14, 14) ww[1:7, 8:14] <- 1 ww[8:14, 1:7] <- 1  {r delta_analysis, echo = FALSE, cache = TRUE} h1 <- smacofSym(dd, ndim = 1, init = "random", weightmat = ww, eps = 1e-15, itmax = 10000) h2 <- smacofSym(dd, ndim = 2, init = "random", weightmat = ww, eps = 1e-15, itmax = 10000) h3 <- smacofSym(dd, ndim = 3, init = "random", weightmat = ww, eps = 1e-15, itmax = 10000) h4 <- smacofSym(dd, ndim = 4, init = "random", weightmat = ww, eps = 1e-15, itmax = 10000) h5 <- smacofSym(dd, ndim = 5, init = "random", weightmat = ww, eps = 1e-15, itmax = 10000) h6 <- smacofSym(dd, ndim = 6, init = "random", weightmat = ww, eps = 1e-15, itmax = 10000)  {r delta_out, echo = FALSE} cat (" ndim = 1 | iterations = ", formatC(h1$niter, width = 4, format = "d"), " | stress = ", formatC (h1$stress, digits = 10, width = 12, format = "f"), "\n") cat (" ndim = 2 | iterations = ", formatC(h2$niter, width = 4, format = "d"), " | stress = ", formatC (h2$stress, digits = 10, width = 12, format = "f"), "\n") cat (" ndim = 3 | iterations = ", formatC(h3$niter, width = 4, format = "d"), " | stress = ", formatC (h3$stress, digits = 10, width = 12, format = "f"), "\n") cat (" ndim = 4 | iterations = ", formatC(h4$niter, width = 4, format = "d"), " | stress = ", formatC (h4$stress, digits = 10, width = 12, format = "f"), "\n") cat (" ndim = 5 | iterations = ", formatC(h5$niter, width = 4, format = "d"), " | stress = ", formatC (h5$stress, digits = 10, width = 12, format = "f"), "\n") cat (" ndim = 6 | iterations = ", formatC(h6$niter, width = 4, format = "d"), " | stress = ", formatC (h6$stress, digits = 10, width = 12, format = "f"), "\n")  We can also illustrate this by showing the singular values of the configuration computed by smacofSym(). {r delta_svd, echo = FALSE} zapsmall(svd(h1$conf)$d) zapsmall(svd(h2$conf)$d) zapsmall(svd(h3$conf)$d) zapsmall(svd(h4$conf)$d) zapsmall(svd(h5$conf)$d) zapsmall(svd(h6$conf)$d)  #Second Example There is a more interesting example in @wang_chin_10, analyzing cross efficiences computed from data of @tofallis_97. In this case the decision making units are 14 airlines. The input variables are aircraft capacity, operating cost, and non-flight assets. The output variables are passenger kilometres and non-passenger revenue. The cross-efficiences are {r tofallis_data, echo = FALSE} tofallis <- structure(c(0.1316, 0.8281, 0.1174, 0.0419, 0.0347, 0.1182, 0.0789, 0.2187, 0.2145, 0.2179, 0, 0.0538, 0, 0, 0.5499, 0.6621, 0.8058, 0.5741, 0.6342, 0.8892, 0.2219, 0.3886, 0.2722, 0.3646, 0, 0.3664, 0.5743, 0.7723, 0.3775, 0.9528, 0.0525, 0.2966, 0, 0.0437, 0.5227, 0.4838, 0.4924, 0.348, 0.5713, 0.25, 0, 0, 0.1316, 0.8281, 0.1174, 0.0419, 0.0347, 0.1182, 0.0789, 0.2187, 0.2145, 0.2179, 0, 0.0538, 0, 0, 0.1508, 0.8265, 0.1156, 0.0587, 0, 0.122, 0.1205, 0.2297, 0.2111, 0.175, 0, 0.0397999999999999, 0, 0, 0.5274, 0.9753, 0.3102, 0.3027, 0, 0.0234, 0.6618, 0.7076, 0.7323, 0.6436, 0.5582, 0.5605, 0.5445, 0, 0.1892, 0.7521, 0.2768, 0.1772, 0.2296, 0.3385, 0, 0.1542, 0.1218, 0.222, 0, 0.0638, 0, 0.2205, 0.2119, 0.7276, 0.3167, 0.215, 0.2641, 0.3916, 0, 0.1412, 0.0928, 0.2056, 0, 0.0605, 0, 0.2725, 0.2969, 0.7192, 0.3775, 0.3009, 0.2222, 0.4901, 0.1605, 0.1792, 0.0523, 0, 0, 0.000199999999999978, 0, 0.3522, 0.2488, 0.7942, 0.2154, 0.1887, 0, 0.2824, 0.2192, 0.2468, 0.1625, 0, 0, 0, 0.0157, 0.1431, 0.1316, 0.8281, 0.1174, 0.0419, 0.0347, 0.1182, 0.0789, 0.2187, 0.2145, 0.2179, 0, 0.0538, 0, 0, 0.2287, 0.7975, 0.1928, 0.1659, 0, 0.2522, 0.1988, 0.2369, 0.1631, 0.0281, 0, 0, 0, 0.1162, 0.1316, 0.8281, 0.1174, 0.0419, 0.0347, 0.1182, 0.0789, 0.2187, 0.2145, 0.2179, 0, 0.0538, 0, 0, 0.1316, 0.8281, 0.1174, 0.0419, 0.0347, 0.1182, 0.0789, 0.2187, 0.2145, 0.2179, 0, 0.0538, 0, 0), .Dim = c(14L, 14L), .Dimnames = list(NULL, c("V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", "V12", "V13", "V14"))) tofallis <- as.matrix(tofallis) row.names(tofallis) <- LETTERS[1:14] colnames(tofallis) <- letters[1:14] mprint(tofallis)  {r tofallis_augment, echo = FALSE} dd <- matrix (0, 28, 28) dd[1:14, 15:28] <- tofallis dd[15:28, 1:14] <- t(tofallis) ww <- matrix (0, 28, 28) ww[1:14, 15:28] <- 1 ww[15:28, 1:14] <- 1  We repeat the same analysis as in the first example. In this case we conclude the Gower rank is equal to three. {r tofallis_analysis, echo = FALSE, cache = FALSE} h1 <- smacofSym(dd, ndim = 1, init = "random", weightmat = ww, eps = 1e-15, itmax = 10000) h2 <- smacofSym(dd, ndim = 2, init = "random", weightmat = ww, eps = 1e-15, itmax = 10000) h3 <- smacofSym(dd, ndim = 3, init = "random", weightmat = ww, eps = 1e-15, itmax = 10000) h4 <- smacofSym(dd, ndim = 4, init = "random", weightmat = ww, eps = 1e-15, itmax = 10000) h5 <- smacofSym(dd, ndim = 5, init = "random", weightmat = ww, eps = 1e-15, itmax = 10000) h6 <- smacofSym(dd, ndim = 6, init = "random", weightmat = ww, eps = 1e-15, itmax = 10000)  {r tofallis_out, echo = FALSE} cat (" ndim = 1 | iterations = ", formatC(h1$niter, width = 4, format = "d"), " | stress = ", formatC (h1$stress, digits = 10, width = 12, format = "f"), "\n") cat (" ndim = 2 | iterations = ", formatC(h2$niter, width = 4, format = "d"), " | stress = ", formatC (h2$stress, digits = 10, width = 12, format = "f"), "\n") cat (" ndim = 3 | iterations = ", formatC(h3$niter, width = 4, format = "d"), " | stress = ", formatC (h3$stress, digits = 10, width = 12, format = "f"), "\n") cat (" ndim = 4 | iterations = ", formatC(h4$niter, width = 4, format = "d"), " | stress = ", formatC (h4$stress, digits = 10, width = 12, format = "f"), "\n") cat (" ndim = 5 | iterations = ", formatC(h5$niter, width = 4, format = "d"), " | stress = ", formatC (h5$stress, digits = 10, width = 12, format = "f"), "\n") cat (" ndim = 6 | iterations = ", formatC(h6$niter, width = 4, format = "d"), " | stress = ", formatC (h6$stress, digits = 10, width = 12, format = "f"), "\n")  The singular values for the second example are {r tofallis_svd, echo = FALSE} zapsmall(svd(h1$conf)$d) zapsmall(svd(h2$conf)$d) zapsmall(svd(h3$conf)$d) zapsmall(svd(h4$conf)$d) zapsmall(svd(h5$conf)$d) zapsmall(svd(h6$conf)$d)  #Third Example The third example was already analyzed in @deleeuw_groenen_mair_E_16e. The @ekman_54 color data give similarities between 14 colors. We transform to dissimilarities by using the transformation $(1-x)^3$, and we repeat the analysis from the previous sections. Note that this is not an unfolding example, all weights are equal to one. The Gower rank turns out to be two, i.e. for this transformation the two-dimensional smacof solution is the global minimizer of stress in $p\geq 2$ dimensions. {r ekman_example, echo = FALSE} data(ekman, package = "smacof") ekman <- as.matrix(1 - ekman) ekman <- ekman ^ 3  {r ekman_analysis, echo = FALSE, cache = FALSE} h1 <- smacofSym(ekman, ndim = 1, init = "random", eps = 1e-15, itmax = 10000) h2 <- smacofSym(ekman, ndim = 2, init = "random", eps = 1e-15, itmax = 10000) h3 <- smacofSym(ekman, ndim = 3, init = "random", eps = 1e-15, itmax = 10000) h4 <- smacofSym(ekman, ndim = 4, init = "random", eps = 1e-15, itmax = 10000) h5 <- smacofSym(ekman, ndim = 5, init = "random", eps = 1e-15, itmax = 10000) h6 <- smacofSym(ekman, ndim = 6, init = "random", eps = 1e-15, itmax = 10000)  {r ekman_out, echo = FALSE} cat (" ndim = 1 | iterations = ", formatC(h1$niter, width = 4, format = "d"), " | stress = ", formatC (h1$stress, digits = 10, width = 12, format = "f"), "\n") cat (" ndim = 2 | iterations = ", formatC(h2$niter, width = 4, format = "d"), " | stress = ", formatC (h2$stress, digits = 10, width = 12, format = "f"), "\n") cat (" ndim = 3 | iterations = ", formatC(h3$niter, width = 4, format = "d"), " | stress = ", formatC (h3$stress, digits = 10, width = 12, format = "f"), "\n") cat (" ndim = 4 | iterations = ", formatC(h4$niter, width = 4, format = "d"), " | stress = ", formatC (h4$stress, digits = 10, width = 12, format = "f"), "\n") cat (" ndim = 5 | iterations = ", formatC(h5$niter, width = 4, format = "d"), " | stress = ", formatC (h5$stress, digits = 10, width = 12, format = "f"), "\n") cat (" ndim = 6 | iterations = ", formatC(h6$niter, width = 4, format = "d"), " | stress = ", formatC (h6$stress, digits = 10, width = 12, format = "f"), "\n")  The singular values for the third example are {r ekman_svd, echo = FALSE} zapsmall(svd(h1$conf)$d) zapsmall(svd(h2$conf)$d) zapsmall(svd(h3$conf)$d) zapsmall(svd(h4$conf)$d) zapsmall(svd(h5$conf)$d) zapsmall(svd(h6$conf)$d)  #NEWS 001 06/06/16 * First release 002 08/06/16 * Third Example 003 08/06/16 * Added something on uniqueness * Added Gower conjecture #References