# More on Inverse Multidimensional Scaling

Jan de Leeuw, Patrick Groenen, Patrick Mair

Version 023, February 22, 2016

Note: This is a working paper which will be expanded/updated frequently. It is a dynamic and reproducible upgrade of @deleeuw_U_12b, which is itself an update of @deleeuw_groenen_A_97. The directory [gifi.stat.ucla.edu/inverse](http://gifi.stat.ucla.edu/inverse) has a pdf copy of this article, the complete Rmd file with all code chunks, the bib file, and the R source code.

#Problem

A *multidimensional scaling* or *MDS* problem, as used in this paper, starts with a matrix of *weights* $W$ and a matrix of *dissimilarities* $\Delta$. Both $W$ and $\Delta$ are non-negative, symmetric, and hollow (i.e. have zero diagonal). The problem is to find an $n\times p$ *configuration* $X$ minimizing the *stress* loss function, first defined in @kruskal_64a and @kruskal_64b,

$$\sigma(X,W,\Delta):=\mathop{\sum\sum}_{1\leq i<j\leq n}w_{ij}(\delta_{ij}-d_{ij}(X))^2,$$

where $d_{ij}(X):=\sqrt{(x_i-x_j)'(x_i-x_j)}$ is the Euclidean distance between rows $i$ and $j$ of $X$. The distances are collected in the matrix $D(X)$. Any stationary point of stress, and in particular any local minimum, satisfies the *stationary equations* $B(X)X=VX$, where

$$\begin{split}
B(X)&:=\mathop{\sum\sum}_{1\leq i<j\leq n}w_{ij}\delta_{ij}d_{ij}^\dagger(X)A_{ij},\\
V&:=\mathop{\sum\sum}_{1\leq i<j\leq n}w_{ij}A_{ij},
\end{split}$$

with $A_{ij}:=(e_i-e_j)(e_i-e_j)'$, and $d_{ij}^\dagger(X):=d_{ij}^{-1}(X)$ if $d_{ij}(X)>0$ and zero otherwise.

In *inverse MDS*, or *iMDS*, we turn this around: given a weight matrix $W$ and a configuration $X$, we want to find the set of all dissimilarity matrices for which $X$ satisfies the stationary equations, i.e.

$$\Delta(W,X):=\{\Delta\mid B(X)X=VX\},$$

and in particular its intersection with the set $\Delta_+$ of non-negative, symmetric, hollow matrices. Conversely, $X(W,\Delta)$ is the set of all configurations satisfying the stationary equations for given $W$ and $\Delta$.

A configuration $X$ is *regular* if $d_{ij}(X)>0$ for all $i\not= j$. Assuming regularity can be done without loss of generality: if some of the distances are zero, then the corresponding iMDS problem can be reduced to a regular problem with a smaller number of points (@deleeuw_groenen_mair_E_16g). Also, an $n\times p$ configuration $X$ is *normalized* if it is column-centered and has rank $p$. For such an $X$ there exists an $n\times(n-p-1)$ centered orthonormal matrix $K$ such that $K'X=0$. In iMDS we will always assume that $X$ is both regular and normalized. We also assume, unless it is explicitly stated otherwise, that all off-diagonal weights $w_{ij}$ are non-zero.

We use $A\circ B$ for the elementwise or Hadamard product of two matrices or vectors. The elementwise generalized inverse of $A$ is $A^\dagger$, with

$$a_{ij}^\dagger=\begin{cases}a_{ij}^{-1}&\text{ if }a_{ij}\not= 0,\\0&\text{ if }a_{ij}=0.\end{cases}$$

Note that all binary matrices satisfy $A^\dagger=A$. $E$ is the elementwise complement of the identity matrix, i.e.
a matrix with all elements equal to one, except for the diagonal, which is zero. And $e$ is a vector with all elements equal to one.

Our code also has the two R functions `lower_triangle()` and `fill_symmetric()`, which correspond to the two linear operators $\mathbf{low}()$ and $\mathbf{fill}()$. If $A$ is symmetric of order $n$, then $\mathbf{low}(A)$ contains the elements of $A$ below the diagonal, ravelled columnwise in a vector with $\frac12 n(n-1)$ elements, and $\mathbf{fill}()$ is the inverse of $\mathbf{low}()$. Thus if $A$ is

```
##      [,1] [,2] [,3]
## [1,]    0    6   10
## [2,]    6    0   14
## [3,]   10   14    0
```

then $\mathbf{low}(A)$ is

```r
lower_triangle(a)
```

```
## [1]  6 10 14
```

and $\mathbf{fill}(\mathbf{low}(A))$ reproduces $A$

```r
fill_symmetric(lower_triangle(a))
```

```
##      [,1] [,2] [,3]
## [1,]    0    6   10
## [2,]    6    0   14
## [3,]   10   14    0
```

**Lemma 1: [Matrix]** Suppose $X$ is an $n\times p$ matrix of rank $p$, and $K$ is an $n\times(n-p)$ orthonormal basis for the orthogonal complement of the column space of $X$. Then a symmetric matrix $A$ satisfies $AX=0$ if and only if there is a symmetric $S$ such that $A=KSK'$.

**Proof:** Write $X=LT$ with $L$ an orthonormal basis for the column space of $X$ and $T$ non-singular. Write $A$ as

$$A=\begin{bmatrix}L&K\end{bmatrix}
\begin{bmatrix}
A_{11}&A_{12}\\A_{21}&A_{22}
\end{bmatrix}
\begin{bmatrix}L'\\K'\end{bmatrix}.$$

Then $AX=LA_{11}T+KA_{21}T=0$ if and only if $A_{11}=0$ and $A_{21}=0$, and by symmetry $A_{12}=0$. Thus $A=KA_{22}K'$. **QED**

**Theorem 1: [Key]** $\Delta\in\Delta(W,X)$ if and only if there is a symmetric $S$ of order $n-p-1$ such that

$$\Delta=D(X)\circ(E-W^\dagger\circ KSK').\label{E:key}$$

Thus $\Delta(W,X)$ is a non-empty affine set, closed under linear combinations with coefficients that add up to one.

**Proof:** By lemma 1 we have $(V-B(X))X=0$ if and only if there is a symmetric $S$ such that $V-B(X)=KSK'$. This can be rearranged to yield $\eqref{E:key}$. Note that always $D(X)\in\Delta(W,X)$, corresponding to $S=0$.
**QED**

Note that theorem 1 implies that if $\Delta_1$ and $\Delta_2$ are in $\Delta(W,X)$, then so is the whole line through $\Delta_1$ and $\Delta_2$. Specifically, if we compute a solution $X$ of the stationary equations with some MDS algorithm such as smacof, then the line through the data $\Delta$ and $D(X)$ is in $\Delta(W,X)$. Of course $X\in X(W,\Delta)$ if and only if $\Delta\in\Delta(W,X)$.

The next result is corollary 6.3 in @deleeuw_groenen_A_97.

**Corollary 1: [Full]** If $p=n-1$ then $\Delta\in\Delta(W,X)$ if and only if $W\circ\Delta=W\circ D(X)$, if and only if $\sigma(X,W,\Delta)=0$.

**Proof:** If $p=n-1$ then $S$ in theorem 1 is of order zero. **QED**

For any two elements of $\Delta(W,X)$, one cannot be elementwise larger (or smaller) than the other. This is corollary 3.3 in @deleeuw_groenen_A_97.

**Corollary 2: [Dominate]** If $\Delta_1$ and $\Delta_2$ are both in $\Delta(W,X)$ and $\Delta_1\leq\Delta_2$, then $W\circ\Delta_1=W\circ\Delta_2$.

**Proof:** With obvious notation $\mathbf{tr}\ X'(B_1(X)-B_2(X))X=0$, which can be written as

$$\mathop{\sum\sum}_{1\leq i<j\leq n}w_{ij}(\delta_{1ij}-\delta_{2ij})d_{ij}(X)=0.$$

Because $\Delta_1\leq\Delta_2$ all terms in the sum are non-positive, and thus $w_{ij}\delta_{1ij}=w_{ij}\delta_{2ij}$ for all $i\not= j$. **QED**

Figure 1: De Gruijter configuration

The maximum and minimum dissimilarities in $\Delta(W,X)\cap\Delta_+$ are

```
##       KVP PvdA  VVD  ARP  CHU  CPN  PSP   BP  D66
## KVP   0.0 32.6  9.7  9.3 24.6 24.3 11.7 18.9 15.2
## PvdA 32.6  0.0 25.7 36.9 20.7 34.1 36.1 29.7 22.2
## VVD   9.7 25.7  0.0 17.4 32.0 34.4 22.3 23.5 20.4
## ARP   9.3 36.9 17.4  0.0 28.0 33.7 25.6 28.9 23.0
## CHU  24.6 20.7 32.0 28.0  0.0 20.8 31.9 33.3 26.3
## CPN  24.3 34.1 34.4 33.7 20.8  0.0 24.9 30.7 31.8
## PSP  11.7 36.1 22.3 25.6 31.9 24.9  0.0 23.7 27.7
## BP   18.9 29.7 23.5 28.9 33.3 30.7 23.7  0.0 35.0
## D66  15.2 22.2 20.4 23.0 26.3 31.8 27.7 35.0  0.0
```

```
##       KVP PvdA  VVD  ARP  CHU  CPN  PSP   BP  D66
## KVP  0.00 3.48 0.00 0.00 2.34 0.00 0.00 0.00 0.00
## PvdA 3.48 0.00 0.00 0.00 0.00 0.00 0.93 0.00 0.00
## VVD  0.00 0.00 0.00 0.00 0.83 0.00 0.00 0.00 0.00
## ARP  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## CHU  2.34 0.00 0.83 0.00 0.00 0.00 0.00 0.00 0.00
## CPN  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## PSP  0.00 0.93 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## BP   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## D66  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
```

#Second Order Inverse MDS

As we mentioned before, solutions of the stationary equations need not be local minima. For a local minimum it is necessary that the stationary equations are satisfied, but it is also necessary that at a solution of the stationary equations the Hessian is positive semi-definite. Thus it becomes interesting to find the dissimilarities in $\Delta(W,X)\cap\Delta_+$ for which the Hessian is positive semi-definite. Define

$$\Delta_H(W,X):=\{\Delta\mid\mathcal{D}^2\sigma(X)\gtrsim 0\}.$$

We can now study $\Delta(W,X)\cap\Delta_H(W,X)$ or $\Delta(W,X)\cap\Delta_H(W,X)\cap\Delta_+$.

**Theorem 5: [Hessian]** $\Delta_H(W,X)$ is a convex non-polyhedral set.

**Proof:** In MDS the Hessian is $2(V-H(x,W,\Delta))$, where

$$H(x,W,\Delta):=\mathop{\sum\sum}_{1\leq i<j\leq n}\frac{w_{ij}\delta_{ij}}{d_{ij}(X)}\left(A_{ij}-\frac{A_{ij}xx'A_{ij}}{d_{ij}^2(X)}\right),$$

with $x:=\mathbf{vec}(X)$, and where, with a slight abuse of notation, $A_{ij}$ and $V$ now stand for the direct sums of $p$ copies of the matrices $A_{ij}$ and $V$ defined earlier. Because $H(x,W,\Delta)$ is linear in $\Delta$, the requirement that $2(V-H(x,W,\Delta))$ is positive semi-definite is a linear matrix inequality in $\Delta$. Its solution set $\Delta_H(W,X)$ is consequently convex, but generally not polyhedral. **QED**

Figure 2: De Gruijter minimum iStress configuration

#Appendix: Code

```r
library (rcdd)
library (numDeriv)
library (vertexenum)
library (MASS)
library (lpSolve)
library (quadprog)

dyn.load("nextCombination.so")
dyn.load("cleanup.so")

# basis for the affine set Delta(W,X), one column per element of S
inverseMDS <- function (x) {
  n <- nrow (x)
  m <- ncol (x)
  x <- apply (x, 2, function (y) y - mean (y))
  nm <- n - (m + 1)
  kk <- cbind (1, x, matrix (rnorm (n * nm), n, nm))
  kperp <- as.matrix (qr.Q (qr (kk))[, -(1:(m + 1))])
  dd <- Euclid (x)
  k <- 1
  base <- matrix (0, n * (n - 1) / 2, nm * (nm + 1) / 2)
  for (i in 1:nm) {
    for (j in 1:i) {
      oo <- outer (kperp[, i], kperp[, j])
      if (j != i) {
        oo <- oo + t(oo)
      }
      base[, k] <- lower_triangle (dd + (1 - oo))
      k <- k + 1
      print (c(i, j, k))
    }
  }
  return (base = cbind (lower_triangle (dd), base))
}

# H-representation to V-representation for Delta(W,X) intersected with Delta_+
inversePlus <- function (base, affine = TRUE) {
  if (affine) {
    hrep <- makeH (
      a1 = d2q (-base),
      b1 = d2q (rep (0, nrow (base))),
      a2 = d2q (rep (1, ncol (base))),
      b2 = d2q (1)
    )
  } else {
    hrep <- makeH (a1 = d2q (-base), b1 = d2q (rep (0, nrow (base))))
  }
  vrep <- scdd (hrep)
  hrep <- q2d (hrep)
  vrep <- q2d (vrep$output)
  pr <- tcrossprod (hrep[, -c(1, 2)], vrep[, -c(1, 2)])[-1, ]
  return (list (
    base = base,
    hrep = hrep,
    vrep = vrep,
    pr = pr
  ))
}

twoPoints <- function (x, y, w = 1 - diag (nrow (x))) {
  dx <- lower_triangle (as.matrix (dist (x)))
  dy <- lower_triangle (as.matrix (dist (y)))
  w <- lower_triangle (w)
  gx <- makeG (x)
  gy <- makeG (y)
  hx <- (dx / w) * gx
  hy <- (dy / w) * gy
  lxy <- lm.fit (cbind (hx, -hy), dx - dy)
  lxx <- lxy$coefficients[1:ncol(hx)]
  lyy <- lxy$coefficients[-(1:ncol(hx))]
  return (list (
    delta1 = dx - hx %*% lxx,
    delta2 = dy - hy %*% lyy,
    res = sum (abs (lxy$residuals)),
    rank = lxy$rank
  ))
}

# analytic Hessian of stress at x
second_partials_stress <- function (x, delta, w = nonDiag (nrow (x))) {
  n <- nrow (x)
  p <- ncol (x)
  d <- Euclid (x)
  fac <- (w * delta) / (d + diag (n))
  dd <- d * d
  v <- vmat (w)
  deri <- direct_sum (repList (v, p))
  xx <- as.vector (x)
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      aa <- direct_sum (repList (aijn (i, j, n), p))
      ax <- drop (aa %*% xx)
      deri <- deri - fac[i, j] * (aa - outer (ax, ax) / dd[i, j])
    }
  }
  return (deri)
}

# numerical Hessian, for checking second_partials_stress()
second_partials_numerical <- function (x, delta, w = nonDiag (nrow (x))) {
  stress <- function (x, delta, w) {
    n <- nrow (delta)
    p <- length (x) / n
    d <- Euclid (matrix (x, n, p))
    res <- delta - d
    return (sum (w * res * res) / 2)
  }
  return (hessian (stress, x, delta = delta, w = w))
}

lower_triangle <- function (x) {
  n <- nrow (x)
  return (x[outer(1:n, 1:n, ">")])
}

fill_symmetric <- function (x) {
  m <- length (x)
  n <- (1 + sqrt (1 + 8 * m)) / 2
  d <- matrix (0, n, n)
  d[outer(1:n, 1:n, ">")] <- x
  return (d + t(d))
}

Euclid <- function (x) {
  c <- tcrossprod (x)
  d <- diag (c)
  return (sqrt (abs (outer (d, d, "+") - 2 * c)))
}

circular <- function (n) {
  x <- seq (0, 2 * pi, length = n + 1)
  z <- matrix (0, n + 1, 2)
  z[, 1] <- sin (x)
  z[, 2] <- cos (x)
  return (z[-1, ])
}

direct_sum <- function (x) {
  n <- length (x)
  nr <- sapply (x, nrow)
  nc <- sapply (x, ncol)
  s <- matrix (0, sum (nr), sum (nc))
  k <- 0
  l <- 0
  for (j in 1:n) {
    s[k + (1:nr[j]), l + (1:nc[j])] <- x[[j]]
    k <- k + nr[j]
    l <- l + nc[j]
  }
  return (s)
}

ein <- function (i, n) {
  return (ifelse (i == 1:n, 1, 0))
}

aijn <- function (i, j, n) {
  d <- ein (i, n) - ein (j, n)
  return (outer (d, d))
}

nextCombination <- function (x, n) {
  m <- length (x)
  if (all (x == ((n - m) + 1:m)))
    return (NULL)
  z <- .C("nextCombination", as.integer(n), as.integer(m), as.integer(x))
  return (z[[3]])
}

repList <- function (x, n) {
  z <- list ()
  for (i in 1:n) {
    z <- c (z, list (x))
  }
  return (z)
}

nonDiag <- function (n) {
  return (matrix (1, n, n) - diag (n))
}

bmat <- function (x, delta, w = nonDiag (nrow (x))) {
  d <- Euclid (x)
  bmat <- -(w * delta) / (d + diag (nrow (x)))
  diag (bmat) <- -rowSums (bmat)
  return (bmat)
}

vmat <- function (w) {
  vmat <- -w
  diag (vmat) <- -rowSums (vmat)
  return (vmat)
}

torgerson <- function (delta, p = 2) {
  n <- nrow (delta)
  dd <- delta * delta
  sd <- rowSums (dd) / n
  ed <- sum (dd) / (n * n)
  cc <- -(dd - outer (sd, sd, "+") + ed) / 2
  ee <- eigen (cc)
  ea <- ee$values
  eb <- ifelse (ea > 0, sqrt (abs (ea)), 0)
  return (ee$vectors[, 1:p] %*% diag (eb[1:p]))
}

# drop near-duplicate rows of a
cleanUp <- function (a, eps = 1e-3) {
  nv <- nrow (a)
  ind <- rep (TRUE, nv)
  for (i in 1:(nv - 1)) {
    xx <- a[i, ]
    for (j in (i + 1):nv) {
      if (!ind[j]) next
      yy <- a[j, ]
      mm <- max (abs (xx - yy))
      if (mm < eps) ind[j] <- FALSE
    }
  }
  return (ind)
}

# vertex enumeration for {x | a x <= b} by checking all basic solutions
bruteForce <- function (a, b, eps = 1e-3) {
  n <- nrow (a)
  m <- ncol (a)
  cb <- combn (n, m)
  n1 <- ncol (cb)
  ind <- rep (TRUE, n1)
  ht <- numeric ()
  for (i in 1:n1) {
    gg <- a[cb[, i], ]
    bg <- b[cb[, i]]
    qg <- qr (gg)
    if (qg$rank < m) {
      ind[i] <- FALSE
      next
    }
    hh <- solve (qg, bg)
    hg <- drop (a %*% hh)
    if (min (b - hg) < -eps) {
      ind[i] <- FALSE
      next
    }
    ht <- c (ht, hh)
  }
  n2 <- sum (ind)
  ht <- t (matrix (ht, m, n2))
  ind <- cleanUp (ht, eps)
  n3 <- sum (ind)
  return (list (
    x = ht[ind, ],
    n1 = n1,
    n2 = n2,
    n3 = n3
  ))
}

bruteForceOne <- function (a, b, p, q, v, eps = 1e-3) {
  n <- nrow (a)
  m <- ncol (a)
  ind <- which ((q - v %*% p) > -eps)
  v <- v[ind, ]
  cb <- combn (n, m - 1)
  n1 <- ncol (cb)
  ind <- rep (TRUE, n1)
  ht <- numeric ()
  for (i in 1:n1) {
    gg <- rbind (a[cb[, i], ], p)
    bg <- c (b[cb[, i]], q)
    qg <- qr (gg)
    if (qg$rank < m) {
      ind[i] <- FALSE
      next
    }
    hh <- solve (qg, bg)
    hg <- drop (a %*% hh)
    if (min (b - hg) < -eps) {
      ind[i] <- FALSE
      next
    }
    ht <- c (ht, hh)
  }
  n2 <- sum (ind)
  ht <- matrix (ht, m, n2)
  ind <- .C ("cleanup", as.double(ht), as.integer(n2), as.integer(m),
             as.integer(rep(1, n2)), as.double(eps))[[4]]
  n3 <- sum (ind)
  return (list (
    x = t(ht)[ind, ],
    n1 = n1,
    n2 = n2,
    n3 = n3
  ))
}

rankTest <- function (x, a, b, eps = 1e-3) {
  h <- drop (a %*% x)
  ind <- which (abs (h - b) < eps)
  r <- qr (a[ind, ])$rank
  f <- min (b - h) > -eps
  return (list (rank = r, feasibility = f))
}

makeDC <- function (x) {
  y <- -x
  diag (y) <- -rowSums (y)
  return (y)
}

# note: this version of bmat() replaces the one defined above
bmat <- function (delta, w, d) {
  n <- nrow (w)
  dd <- ifelse (d == 0, 0, 1 / d)
  return (makeDC (w * delta * dd))
}

smacof <- function (delta, w, xini, eps = 1e-6, itmax = 100, verbose = TRUE) {
  n <- nrow (xini)
  xold <- xini
  dold <- as.matrix (dist (xold))
  sold <- sum (w * (delta - dold) ^ 2) / 2
  itel <- 1
  v <- ginv (makeDC (w))
  repeat {
    b <- bmat (delta, w, dold)
    xnew <- v %*% b %*% xold
    dnew <- as.matrix (dist (xnew))
    snew <- sum (w * (delta - dnew) ^ 2) / 2
    if (verbose) {
      cat (
        formatC (itel, width = 4, format = "d"),
        formatC (sold, digits = 10, width = 13, format = "f"),
        formatC (snew, digits = 10, width = 13, format = "f"),
        "\n"
      )
    }
    if ((itel == itmax) || ((sold - snew) < eps)) break ()
    itel <- itel + 1
    sold <- snew
    dold <- dnew
    xold <- xnew
  }
  return (list (
    x = xnew,
    d = dnew,
    s = snew,
    itel = itel
  ))
}

oneMore <- function (g, u) {
  v <- bruteForce (g, u)$x
  nv <- nrow (v)
  s <- matrix (0, 2, 2)
  ev <- rep (0, nv)
  for (i in 1:nv) {
    s[1, 1] <- v[i, 1]
    s[2, 2] <- v[i, 2]
    s[1, 2] <- s[2, 1] <- v[i, 3]
    ee <- eigen (s)
    ev[i] <- min (ee$values)
    if (ev[i] < 0) {
      yy <- ee$vectors[, 2]
      hh <- c (yy[1] ^ 2, yy[2] ^ 2, 2 * yy[1] * yy[2])
      g <- rbind (g, -hh)
      u <- c (u, 0)
    }
  }
  return (list (v = v, g = g, u = u, e = ev))
}

# columns span {low(KSK') | S symmetric} for the K corresponding to x
makeG <- function (x) {
  n <- nrow (x)
  p <- ncol (x)
  m <- n - p - 1
  k <- qr.Q (qr (cbind (1, x, diag (n))))[, -c(1:(p + 1))]
  g <- matrix (0, n * (n - 1) / 2, m * (m + 1) / 2)
  l <- 1
  if (m == 1) {
    g[, 1] <- lower_triangle (outer (k, k))
  } else {
    for (i in 1:m) {
      g[, l] <- lower_triangle (outer (k[, i], k[, i]))
      l <- l + 1
    }
    for (i in 1:(m - 1))
      for (j in (i + 1):m) {
        g[, l] <- lower_triangle (outer (k[, i], k[, j]) + outer (k[, j], k[, i]))
        l <- l + 1
      }
  }
  return (g)
}

iStress <- function (x, delta, w = rep (1, length (delta)), only = TRUE) {
  m <- length (delta)
  n <- (1 + sqrt (1 + 8 * m)) / 2
  x <- matrix (x, n, length (x) / n)
  d <- lower_triangle (as.matrix (dist (x)))
  g <- makeG (x)
  h <- (d / w) * g
  u <- -colSums (w * (delta - d) * h)
  v <- crossprod (h, w * h)
  s <- solve.QP (Dmat = v, dvec = u, Amat = -t(h), bvec = -d)
  ds <- d - h %*% s$solution
  is <- sum (w * (delta - ds) ^ 2)
  if (only)
    return (is)
  else
    return (list (istress = is, delta = fill_symmetric (ds)))
}
```

#NEWS

001 01/29/16 Incomplete first upload

002 01/30/16 Added preliminary section on inverse FDS

003 01/31/16 Added preliminary section on using Hessian

004 02/01/16
* Second example in detail.
* Many small edits.

005 02/02/16
* Some sectioning.
* More on Hessian.

006 02/02/16 More on FDS.

007 02/02/16 Added theorem [Bounded] with new proof.

008 02/04/16 Some work with vertexenum package.

009 02/04/16 Added theorem [Dominate].

010 02/05/16
* Many small edits.
* Switched to Hadamard products.
* Added theorems [Full] and [Norm]
* More sectioning
* Some rearrangement

011 02/06/16
* Added code for cleanUp
* Added code and results for bruteForce
* Added stress maxima over non-negative delta

012 02/06/16
* Resolved the number of vertices problem for example 3
* Added code for rankTest

013 02/10/16
* Sketch of matrix result
* Incorrect handling of zero weights and/or distances addressed
* Folded in smacof code
* Added unfolding example

014 02/15/16
* Proof matrix result
* More on iStress
* Second derivatives for example 3
* More on second order iMDS

015 02/15/16
* Added sensitivity section and example

016 02/16/16
* Added theorem [Line]
* Added caches

017 02/17/16
* iFDS example
* added diameters

018 02/18/16
* added bruteForceOne() to update H to V conversion

019 02/19/16
* rewrote some of the Problem section
* moved the matrix lemma
* many small edits

020 02/20/16
* removed Ekman example
* added De Gruijter example

021 02/20/16
* added iStress minimization

022 02/21/16
* iStress example
* small edits
* use makeG() function throughout
* introduce fill() and low()
* discussed the vector implementation

023 02/22/16
* work on multiple solutions section
* nextCombination()
* cleanup() translated to C

#References
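The parametrization of theorem 1 can also be checked numerically. The following is a minimal sketch in Python/NumPy rather than in the R of the paper, assuming unit weights $W=E$ (so that $W^\dagger=E$). It builds a normalized configuration $X$, a centered orthonormal $K$ with $K'X=0$, forms $\Delta=D(X)\circ(E-W^\dagger\circ KSK')$ for a random symmetric $S$, and verifies that the stationary residual $(V-B(X))X$ vanishes.

```python
import numpy as np

def theorem1_check(n=6, p=2, seed=0):
    rng = np.random.default_rng(seed)

    # a normalized configuration: column-centered, rank p
    X = rng.standard_normal((n, p))
    X -= X.mean(axis=0)

    # K: n x (n-p-1) centered orthonormal basis with K'X = 0,
    # taken from a QR decomposition of [1, X, random]
    Q, _ = np.linalg.qr(np.c_[np.ones((n, 1)), X,
                              rng.standard_normal((n, n - p - 1))])
    K = Q[:, p + 1:]

    # an arbitrary small symmetric S of order n-p-1
    # (small, so that Delta stays non-negative)
    S = rng.standard_normal((n - p - 1, n - p - 1))
    S = 0.01 * (S + S.T)

    # key equation: Delta = D(X) o (E - W^dagger o KSK'), here with W = E
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    E = 1.0 - np.eye(n)
    Delta = D * (E - E * (K @ S @ K.T))

    # B(X) and V for unit weights: off-diagonal -delta_ij/d_ij and -1,
    # diagonals chosen so that all rows sum to zero
    ratio = np.divide(Delta, D, out=np.zeros_like(D), where=D > 0)
    B = -ratio
    np.fill_diagonal(B, -B.sum(axis=1))
    V = n * np.eye(n) - np.ones((n, n))

    # stationarity: (V - B(X)) X = 0
    return np.abs((V - B) @ X).max()

print(theorem1_check())
```

With unit weights the construction gives $V-B(X)=-KSK'$ exactly, so the residual is zero up to rounding for any symmetric $S$.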