```{r global_perspective, fig.align= "center", fig.width = 10, fig.height = 10, echo = FALSE, cache = FALSE}
par(pty = "s")
persp(aa, bb, z, col = "RED", xlab = "theta_1", ylab = "theta_2", zlab = "stress")
```

We see the symmetry, following from the fact that stress is even. We also see the local maximum at the origin, where stress is not differentiable. Also note the ridge, where $d_{12}(\theta)=0$ and where stress is not differentiable either. The ridge shows nicely that on rays emanating from the origin stress is a convex quadratic. Far away from the origin, stress globally behaves very much like a convex quadratic as well (except on the ridge). Clearly local minima must be found in the valleys surrounding the small mountain at the origin, all within the sphere with radius $\sqrt{2}$. Figure `r figure_nums("global_contour", display = "n")` is a contour plot of stress over $(-2,+2)\times(-2,+2)$. The red line is $\{\theta\mid d_{12}(\theta) = 0\}$. The blue line connects the minima of the convex quadratics on the rays through the origin. Thus all local minima, and in fact all stationary points, are on the blue line. Note that the plot uses $\theta$ to define the coordinate axes, not $\gamma=(\alpha,\beta)$. Thus there are no stationary points at $(0,1)$ and $(1,0)$, but at the corresponding points (`r bsum[,1]`) and (`r bsum[,2]`) in the $\theta$ coordinates (and, of course, at their mirror images). Besides the single local maximum at the origin, it turns out that in this example there are at least five pairs of stationary points. More precisely, I have not been able to find more than five. Each stationary point $\theta$ has a mirror image $-\theta$. Three of the five are local minima, two are saddle points. Local minima are plotted as blue points, saddle points as red points.

```{r global_contour, fig.align= "center", fig.width = 10, fig.height = 10, echo = FALSE, cache = FALSE}
contour(aa, bb, z, levels = seq(0, .30, length = 50))
aa <- seq (-2 * pi, 2 * pi, length = 100)
bb <- sin (aa)
cc <- cos (aa)
xxx <- matrix (0, 100, 2)
for (k in 1:100) {
  z <- c (bb[k], cc[k])
  dd <- matrix (0, 4, 4)
  for (i in 1:4) for (j in 1:4) {
    dd[i, j] <- sqrt (sum (uu (i, j, x, y) * outer (z, z)))
  }
  lbd <- sum (dd * delta) / sum (dd ^ 2)
  xxx[k, ] <- lbd * z
}
lines (xxx[, 1], xxx[, 2], lwd = 2, col = "BLUE")
abline (0, th0[2] / th0[1], col = "RED", lwd = 2)
th <- sth1$theta
points (th[1], th[2], pch = 19, cex = 1.0, col = "BLUE")
th <- -th
points (th[1], th[2], pch = 19, cex = 1.0, col = "BLUE")
th <- sth2$theta
points (th[1], th[2], pch = 19, cex = 1.0, col = "BLUE")
th <- -th
points (th[1], th[2], pch = 19, cex = 1.0, col = "BLUE")
th <- sth3$theta
points (th[1], th[2], pch = 19, cex = 1.0, col = "BLUE")
th <- -th
points (th[1], th[2], pch = 19, cex = 1.0, col = "BLUE")
th <- sth4$theta
points (th[1], th[2], pch = 19, cex = 1.0, col = "RED")
th <- -th
points (th[1], th[2], pch = 19, cex = 1.0, col = "RED")
th <- sth5$theta
points (th[1], th[2], pch = 19, cex = 1.0, col = "RED")
th <- -th
points (th[1], th[2], pch = 19, cex = 1.0, col = "RED")
```

## First Minimum

We zoom in on the first local minimum at (`r sth1$theta`). Its stress is `r sth1$stress` and the corresponding configuration is in figure `r figure_nums("configuration_first_minimum", display = "n")`.

```{r configuration_first_minimum, fig.align= "center", echo = FALSE, cache = FALSE}
par(pty = "s")
plot(xso1, type = "n", xlab = "dim 1", ylab = "dim 2")
text (xso1, as.character(1:4), col = "RED")
```

Note that this local minimum corresponds with the equilateral triangle with center, which is a saddle point in configuration space (@trosset_mathar_97). The eigenvalues of $B(\theta)$ are (`r zapsmall(eigen(sth1$b)$values)`) and those of the Hessian $I-H(\theta)$ are (`r zapsmall(eigen(sth1$h)$values)`). A contour plot of the region around the stationary point is in figure `r figure_nums("contour_first_minimum", display = "n")`.

```{r values_first_minimum, echo = FALSE, cache = TRUE}
aa <- bb <- seq (.7, 1.2, length = 100)
z <- matrix (0, 100, 100)
for (i in 1:100) for (j in 1:100) z[i, j] <- stress (aa[i], bb[j], x, y, delta)
```

```{r contour_first_minimum, fig.align= "center", fig.width = 10, fig.height = 10, echo = FALSE, cache = FALSE}
par(pty = "s")
contour(aa, bb, z, levels = seq(0.065, .08, length = 100))
th <- sth1$theta
points (th[1], th[2], pch = 19, cex = 1.5, col = "BLUE")
```

## Second Minimum

The second local minimum (which is the global minimum) at (`r zapsmall(sth2$theta)`) has stress `r sth2$stress` and the corresponding configuration is in figure `r figure_nums("configuration_second_minimum", display = "n")`.

```{r configuration_second_minimum, fig.align= "center", echo = FALSE, cache = FALSE}
par(pty = "s")
plot(xso2, type = "n", xlab = "dim 1", ylab = "dim 2")
text (xso2, as.character(1:4), col = "RED")
```

The eigenvalues of $B(\theta)$ are (`r zapsmall(eigen(sth2$b)$values)`) and those of the Hessian $I-H(\theta)$ are (`r zapsmall(eigen(sth2$h)$values)`). A contour plot of the region around the stationary point is in figure `r figure_nums("contour_second_minimum", display = "n")`.

```{r values_second_minimum, echo = FALSE, cache = TRUE}
aa <- seq (1, 2, length = 100)
bb <- seq (-.5, .5, length = 100)
z <- matrix (0, 100, 100)
for (i in 1:100) for (j in 1:100) z[i, j] <- stress (aa[i], bb[j], x, y, delta)
```

```{r contour_second_minimum, fig.align= "center", fig.width = 10, fig.height = 10, echo = FALSE, cache = FALSE}
par(pty = "s")
contour(aa, bb, z, levels = seq(0.02, .10, length = 100))
th <- sth2$theta
points (th[1], th[2], pch = 19, cex = 1.5, col = "BLUE")
```

## Third Minimum

The third local minimum at (`r sth3$theta`) has stress `r sth3$stress` and the corresponding configuration is in figure `r figure_nums("configuration_third_minimum", display = "n")`.

```{r configuration_third_minimum, fig.align= "center", echo = FALSE, cache = FALSE}
par(pty = "s")
plot(xso3, type = "n", xlab = "dim 1", ylab = "dim 2")
text (xso3, as.character(1:4), col = "RED")
```

The eigenvalues of $B(\theta)$ are (`r zapsmall(eigen(sth3$b)$values)`) and those of the Hessian $I-H(\theta)$ are (`r zapsmall(eigen(sth3$h)$values)`). A contour plot of the region around the stationary point is in figure `r figure_nums("contour_third_minimum", display = "n")`.

```{r values_third_minimum, echo = FALSE, cache = TRUE}
aa <- seq (-.5, .5, length = 100)
bb <- seq (1, 2, length = 100)
z <- matrix (0, 100, 100)
for (i in 1:100) for (j in 1:100) z[i, j] <- stress (aa[i], bb[j], x, y, delta)
```

```{r contour_third_minimum, fig.align= "center", fig.width = 10, fig.height = 10, echo = FALSE, cache = FALSE}
par(pty = "s")
contour(aa, bb, z, levels = seq(.11, .20, length = 100))
th <- sth3$theta
points (th[1], th[2], pch = 19, cex = 1.5, col = "BLUE")
```

## First Saddle Point

The saddle point at (`r sth4$theta`) has stress `r sth4$stress` and the corresponding configuration is in figure `r figure_nums("configuration_first_saddlepoint", display = "n")`.

```{r configuration_first_saddlepoint, fig.align= "center", echo = FALSE, cache = FALSE}
par(pty = "s")
plot(xso4, type = "n", xlab = "dim 1", ylab = "dim 2")
text (xso4, as.character(1:4), col = "RED")
```

The eigenvalues of $B(\theta)$ are (`r zapsmall(eigen(sth4$b)$values)`) and those of the Hessian $I-H(\theta)$ are (`r zapsmall(eigen(sth4$h)$values)`). A contour plot of the region around the stationary point is in figure `r figure_nums("contour_first_saddlepoint", display = "n")`.

```{r values_first_saddlepoint, echo = FALSE, cache = TRUE}
aa <- seq (.30, .34, length = 100)
bb <- seq (1.25, 1.30, length = 100)
z <- matrix (0, 100, 100)
for (i in 1:100) for (j in 1:100) z[i, j] <- stress (aa[i], bb[j], x, y, delta)
```

```{r contour_first_saddlepoint, fig.align= "center", fig.width = 10, fig.height = 10, echo = FALSE, cache = FALSE}
par(pty = "s")
contour(aa, bb, z, levels = seq (.112, .113, length = 100))
th <- sth4$theta
points (th[1], th[2], pch = 19, cex = 1.5, col = "RED")
```

## Second Saddle Point

The saddle point at (`r sth5$theta`) has stress `r sth5$stress` and the corresponding configuration is in figure `r figure_nums("configuration_second_saddlepoint", display = "n")`.

```{r configuration_second_saddlepoint, fig.align= "center", echo = FALSE, cache = FALSE}
par(pty = "s")
plot(xso5, type = "n", xlab = "dim 1", ylab = "dim 2")
text (xso5, as.character(1:4), col = "RED")
```

The eigenvalues of $B(\theta)$ are (`r zapsmall(eigen(sth5$b)$values)`) and those of the Hessian $I-H(\theta)$ are (`r zapsmall(eigen(sth5$h)$values)`). A contour plot of the region around the stationary point is in figure `r figure_nums("contour_second_saddlepoint", display = "n")`.

```{r values_second_saddlepoint, echo = FALSE, cache = TRUE}
aa <- seq (1.11, 1.14, length = 100)
bb <- seq (.76, .79, length = 100)
z <- matrix (0, 100, 100)
for (i in 1:100) for (j in 1:100) z[i, j] <- stress (aa[i], bb[j], x, y, delta)
```

```{r contour_second_saddlepoint, fig.align= "center", fig.width = 10, fig.height = 10, echo = FALSE, cache = FALSE}
par(pty = "s")
contour(aa, bb, z, levels = seq (.0672, .0678, length = 100))
th <- sth5$theta
points (th[1], th[2], pch = 19, cex = 1.5, col = "RED")
```

# Regions of Attraction

## Smacof

We use the `smacof()` function from the code in the appendix with 100 different starting points of $\theta$, equally spaced on the circle. Figure `r figure_nums("histogram_smacof", display = "n")` is a histogram of the number of smacof iterations to convergence within $10^{-15}$. In all cases `smacof` converges to a local minimum in coefficient space, never to a saddle point. Figure `r figure_nums("path_smacof", display = "n")` shows which local minima are reached from the different starting points. This shows, more or less contrary to what @trosset_mathar_97 suggest, that non-global minima can indeed be points of attraction for `smacof` iterations.

```{r one_hundred_smacof, echo = FALSE, cache = TRUE}
xx <- seq (-2 * pi, 2 * pi, length = 100)
aa <- sin (xx)
bb <- cos (xx)
zs <- as.list(1:100)
ss <- ts <- gs <- rep (0, 100)
for (i in 1:100) {
  zs[[i]] <- smacof(aa[i], bb[i], x, y, delta, eps = 1e-15, verbose = FALSE)
  ss[i] <- zs[[i]]$stress
  gs[i] <- max(zs[[i]]$g)
  ts[i] <- min(eigen (zs[[i]]$h)$values)
}
```

```{r histogram_smacof, fig.align= "center", echo = FALSE, cache = FALSE}
iterations <- sapply(zs, function (z) z$itel)
hist(iterations)
```

```{r path_smacof, fig.align= "center", fig.width = 10, fig.height = 10, echo = FALSE, cache = FALSE}
par(pty = "s")
plot (0, xlim = c(-2, 2), ylim = c(-2, 2), type = "n")
lines (aa, bb, col = "RED")
for (i in 1:100) {
  points (zs[[i]]$theta[1], zs[[i]]$theta[2], pch = 19, cex = 1.5, col = "BLUE")
  lines (matrix(c(aa[i], bb[i], zs[[i]]$theta), 2, 2, byrow = TRUE))
}
```

## Newton

```{r one_hundred_newton, echo = FALSE, cache = TRUE}
xx <- seq (-2 * pi, 2 * pi, length = 100)
aa <- sin (xx)
bb <- cos (xx)
zn <- as.list(1:100)
sn <- tn <- gn <- rep (0, 100)
for (i in 1:100) {
  zn[[i]] <- newton(aa[i], bb[i], x, y, delta, eps = 1e-15, verbose = FALSE)
  sn[i] <- zn[[i]]$stress
  gn[i] <- max(zn[[i]]$g)
  tn[i] <- min(eigen (zn[[i]]$h)$values)
}
```

We repeat the same exercise with Newton's method, which converges from all 100 starting points. In higher dimensions we may not be so lucky. The histogram of iteration counts is in figure `r figure_nums("histogram_newton", display = "n")`. It shows that in this example `smacof` needs about 10 times the number of iterations that Newton needs. Because `smacof` iterations are much less expensive than Newton ones, this does not really say much about computing times. If we look at figure `r figure_nums("path_newton", display = "n")` we see the problem with non-safeguarded Newton. Although we have fast convergence from all 100 starting points, Newton converges to a saddle point in `r length(which(tn < 0))` cases.
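A standard remedy, not implemented in the `newton()` function used here, is to safeguard the step: use the Newton direction only when the Hessian is positive definite, fall back to the gradient direction otherwise, and halve the step until the function decreases. The following sketch illustrates the idea on a generic function of two variables; the name `safeNewtonStep` and the toy quadratic in the demo are hypothetical, not part of the code in the appendix.

```r
# One safeguarded Newton step for a smooth function of two variables.
# f, grad, hess are functions returning the value, gradient, and Hessian.
safeNewtonStep <- function (theta, f, grad, hess) {
  g <- grad (theta)
  h <- hess (theta)
  ev <- eigen (h, symmetric = TRUE, only.values = TRUE)$values
  # Newton direction if the Hessian is positive definite, else gradient
  step <- if (min (ev) > 0) solve (h, g) else g
  lbd <- 1
  for (k in 1:30) {   # simple step halving, at most 30 times
    if (f (theta - lbd * step) < f (theta)) break
    lbd <- lbd / 2
  }
  theta - lbd * step
}

# Demo on a toy convex quadratic with minimum at (1, 2): one step suffices.
fq <- function (z) sum ((z - c(1, 2)) ^ 2)
gq <- function (z) 2 * (z - c(1, 2))
hq <- function (z) 2 * diag (2)
safeNewtonStep (c(0, 0), fq, gq, hq)
```

Because the step always decreases the function, safeguarded Newton cannot be attracted to a saddle point it approaches from above, at the cost of losing the pure Newton rate near such points.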

```{r histogram_newton, fig.align= "center", echo = FALSE, cache = FALSE}
iterations <- sapply(zn, function (z) z$itel)
hist(iterations)
```

```{r path_newton, fig.align= "center", fig.width = 10, fig.height = 10, echo = FALSE, cache = FALSE}
par(pty = "s")
plot (0, xlim = c(-2, 2), ylim = c(-2, 2), type = "n")
lines (aa, bb, col = "RED")
for (i in 1:100) {
  points (zn[[i]]$theta[1], zn[[i]]$theta[2], pch = 19, cex = 1.5, col = "BLUE")
  lines (matrix(c(aa[i], bb[i], zn[[i]]$theta), 2, 2, byrow = TRUE))
}
```

# Another Look

Remember that $\rho(\theta)=\theta'B(\theta)\theta$. Thus $\sigma(\lambda\theta)=1-\lambda\rho(\theta)+\frac12\lambda^2\theta'\theta$, and
$$
\min_\lambda\sigma(\lambda\theta)=1-\frac12\frac{\rho^2(\theta)}{\theta'\theta}.
$$
Thus we can minimize $\sigma$ over $\theta$ by maximizing $\rho$ over the unit circle $\mathcal{S}:=\{\theta\mid\theta'\theta=1\}$. This is a nice formulation, because $\rho$ is a norm, i.e. a homogeneous convex function of $\theta$. Consequently we have transformed the problem from unconstrained minimization of the DC function (i.e. difference of convex functions) stress to maximization of a ratio of norms. In turn this is equivalent to maximization of the convex function $\rho$ over the unit circle, or, again equivalently, over the unit ball, a compact convex set. This transformation was first used in MDS by @deleeuw_C_77, partly because it made the theory developed by @robert_67 available. The level sets $\{\theta\mid\rho(\theta)=\kappa\}$ are the $\rho$-circles defined by the norm $\rho$. The corresponding $\rho$-balls $\{\theta\mid\rho(\theta)\leq\kappa\}$ are closed and nested convex sets containing the origin. Thus we want to find the largest $\rho$-circle that still intersects $\mathcal{S}$. The similarity with the geometry of eigenvalue problems is obvious.

```{r ropt, echo = FALSE}
ph <- sth2$theta / sqrt (sum (sth2$theta ^ 2))
ropt <- rho (ph[1], ph[2], x, y, delta)
```

In our example we know that the global optimum of stress is at (`r zapsmall(sth2$theta)`), and projecting that point on the circle gives (`r zapsmall(ph)`). The corresponding optimal $\rho$ is `r ropt`. Figure `r figure_nums("rho_contour", display = "n")` gives the contour plot for $\rho$, with the outer $\rho$-circle corresponding to the optimal value. The fact that the optimal value contour is disjoint from the interior of $\mathcal{S}$ is necessary and sufficient for global optimality (@dur_horst_locatelli_98). Notice the sharp corners in the contour plot, showing the non-differentiability of $\rho$ at the points where $d_{12}(\theta)=0$. We could also look for the minimum of $\rho$ on the unit circle, which means finding the largest $\rho$-circle that touches $\mathcal{S}$ on the inside. Inspecting figure `r figure_nums("rho_contour", display = "n")` shows that this will be a point where $\rho$ is not differentiable, i.e. a point with $d_{12}(\theta)=0$. This minimum $\rho$ problem does not make much sense in the context of multidimensional scaling, however, and is not directly related to the minimization of stress.

```{r rho_values, echo = FALSE, cache = TRUE}
aa <- bb <- seq (-1.2, 1.2, length = 100)
z <- matrix (0, 100, 100)
for (i in 1:100) for (j in 1:100) z[i, j] <- rho (aa[i], bb[j], x, y, delta)
```
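Written out, the minimization over $\lambda$ at the beginning of this section only uses that $\sigma(\lambda\theta)$ is a convex quadratic in $\lambda$. Setting the derivative to zero,
$$
\frac{\partial}{\partial\lambda}\sigma(\lambda\theta)=-\rho(\theta)+\lambda\,\theta'\theta=0
\quad\Longrightarrow\quad
\hat\lambda=\frac{\rho(\theta)}{\theta'\theta},
$$
and substituting $\hat\lambda$ gives
$$
\sigma(\hat\lambda\theta)=1-\frac{\rho^2(\theta)}{\theta'\theta}+\frac12\frac{\rho^2(\theta)}{\theta'\theta}=1-\frac12\frac{\rho^2(\theta)}{\theta'\theta}.
$$
For $\theta$ on the unit circle this is $\hat\lambda=\rho(\theta)$, which is, up to the normalization built into `uu()`, the `lbd` used to draw the blue line in the global contour plot.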

```{r rho_contour, fig.align= "center", fig.width = 10, fig.height = 10, echo = FALSE, cache = FALSE}
par(pty = "s")
contour(aa, bb, z, levels = c(seq(0, 1.3, length = 14), ropt), lwd = 2, col = "BLUE")
av <- seq (0, 2 * pi, length = 100)
bv <- sin (av)
cv <- cos (av)
lines(bv, cv, col = "RED", lwd = 3)
```

# A Final Look

Now that we know that the MDS problem is equivalent to maximizing $\rho$ on the unit circle, we can use nonlinear coordinates $(\theta_1,\theta_2)=(\sin\xi,\cos\xi)$ to reduce the problem to a one-dimensional unconstrained one in, say, ``circle space''. Thus, with the same abuse of notation as for stress, $\rho(\xi):=\rho(\sin\xi,\cos\xi)$, and we have to maximize $\rho$ over $0\leq\xi\leq\pi$. In figure `r figure_nums("rho_nonlinear_plot", display = "n")` we have plotted $\rho$ as a function of $\xi$. There are blue vertical lines at the three local minima in coefficient space, red vertical lines at the saddle points, and a green vertical line where $d_{12}(\xi)=0$. Note that in circle space stress has both multiple local minima and multiple local maxima.

```{r rho_nonlinear_values, echo = FALSE, cache = TRUE}
av <- seq (0, pi, length = 10000)
bv <- rep(0, 10000)
sav <- sin (av)
cav <- cos (av)
for (i in 1:10000) bv[i] <- rho (sav[i], cav[i], x, y, delta)
```

```{r rho_nonlinear_plot, fig.align= "center", fig.width = 10, fig.height = 10, echo = FALSE, cache = FALSE}
par(pty = "s")
plot (av, bv, type = "l", col = "RED", lwd = 2, xlab = "xi", ylab = "rho")
th1 <- sth1$theta
th1 <- th1 / sqrt (sum (th1 ^ 2))
xi1 <- asin (th1[1])
abline (v = xi1, col = "BLUE")
th2 <- sth2$theta
th2 <- th2 / sqrt (sum (th2 ^ 2))
xi2 <- asin (th2[1])
abline (v = xi2, col = "BLUE")
th3 <- sth3$theta
th3 <- th3 / sqrt (sum (th3 ^ 2))
xi3 <- asin (th3[1])
abline (v = xi3, col = "BLUE")
th4 <- sth4$theta
th4 <- th4 / sqrt (sum (th4 ^ 2))
xi4 <- asin (th4[1])
abline (v = xi4, col = "RED")
th5 <- sth5$theta
th5 <- th5 / sqrt (sum (th5 ^ 2))
xi5 <- asin (th5[1])
abline (v = xi5, col = "RED")
th0 <- th0 / sqrt (sum (th0 ^ 2))
xi0 <- asin (th0[1])
abline (v = pi + xi0, col = "GREEN")
```

```{r second_derivatives, echo = FALSE}
th <- sth1$theta
th <- th / sqrt (sum (th ^ 2))
bh <- bmat (th[1], th[2], x, y, delta)
eh <- max (eigen (bh$h)$values)
rh <- rho (th[1], th[2], x, y, delta)
hes1 <- eh - rh
th <- sth2$theta
th <- th / sqrt (sum (th ^ 2))
bh <- bmat (th[1], th[2], x, y, delta)
eh <- max (eigen (bh$h)$values)
rh <- rho (th[1], th[2], x, y, delta)
hes2 <- eh - rh
th <- sth3$theta
th <- th / sqrt (sum (th ^ 2))
bh <- bmat (th[1], th[2], x, y, delta)
eh <- max (eigen (bh$h)$values)
rh <- rho (th[1], th[2], x, y, delta)
hes3 <- eh - rh
th <- sth4$theta
th <- th / sqrt (sum (th ^ 2))
bh <- bmat (th[1], th[2], x, y, delta)
eh <- max (eigen (bh$h)$values)
rh <- rho (th[1], th[2], x, y, delta)
hes4 <- eh - rh
th <- sth5$theta
th <- th / sqrt (sum (th ^ 2))
bh <- bmat (th[1], th[2], x, y, delta)
eh <- max (eigen (bh$h)$values)
rh <- rho (th[1], th[2], x, y, delta)
hes5 <- eh - rh
```

From lemma `r lemma_nums("nonlinear", display = "n")` we see that the second derivative $\mathcal{D}^2\rho(\xi)$ is equal to $\mathbf{tr}\ H(\xi)-\rho(\xi)$. For the three local minima in coordinate space we find second derivatives `r c(hes1, hes2, hes3)` in circle space, i.e. they are properly converted to local maxima. The two saddle points in coordinate space have second derivatives `r c(hes4, hes5)`, and are turned into local minima. For more general cases, by the way, we know from @lyusternik-schnirelmann_34 that a continuously differentiable even function on the unit sphere in $\mathbb{R}^n$ has at least $n$ distinct pairs of stationary points.

# Discussion

Although our results are based on a single small example, there are some general conclusions we can draw. Stress has no local maxima, except for the one at the origin. It has saddle points and non-global minima. If we reparametrize configuration space by using linear combinations of a fixed number of configurations, we can ``upgrade'' some of the stationary points to local minima.
If we remove homogeneity from the problem by working on the unit ball, we can upgrade even more stationary points. In comparing `smacof` with Newton we found, not surprisingly, that Newton uses fewer iterations but often converges to saddle points.

```{r mathar_trosset, echo = FALSE, cache = TRUE}
set.seed(12345)
delta <- 1 - diag (4)
delta <- delta * sqrt (2 / sum (delta ^ 2))
a <- 3 + sqrt(3)
x <- matrix (c(0, 2 * a, a, a, 0, 0, a * sqrt(3), a * sqrt(3) / 3), 4, 2)
x <- apply (x, 2, function (x) x - mean (x))
dx <- as.matrix(dist(x))
lb <- sum (delta * dx) / sum (dx ^ 2)
x <- lb * x
z <- matrix (c(0, -2, sqrt(3), -2 - sqrt(3) / 3, 0, 0, sqrt(3), 2 + sqrt(3)), 4, 2)
z <- apply (z, 2, function (x) x - mean (x))
bmat2 <- function (x) {
  dx <- as.matrix (dist (x))
  b <- - delta / (dx + diag (4))
  diag (b) <- - rowSums (b)
  return (b)
}
stress2 <- function (x) {
  dx <- as.matrix (dist (x))
  return (sum ((delta - dx) ^ 2) / 2)
}
smacof2 <- function (xold, itmax = 10000, eps = 1e-15) {
  fold <- stress2 (xold)
  itel <- 1
  path <- numeric(0)
  repeat {
    b <- bmat2 (xold)
    xnew <- b %*% xold / 4
    fnew <- stress2 (xnew)
    path <- c(path, fnew)
    if (((fold - fnew) < eps) || (itel == itmax)) break
    fold <- fnew
    xold <- xnew
    itel <- itel + 1
  }
  return (list (x = xnew, itel = itel, stress = fnew, path = path))
}
h <- smacof2 (x + .001 * matrix (rnorm (8), 4, 2))
s1 <- h$path[1]
sn <- h$path[h$itel]
ss <- stress2(x)
```

The finding that saddle points in configuration space can correspond with local minima in coefficient space has an interesting implication for the `smacof` algorithm. We know that the set of starting points from which `smacof` converges to a saddle point has measure zero; in other words, unless you start in a saddle point, you will almost certainly converge to a local minimum (@lee_simchowitz_jordan_recht_16). To illustrate this we repeat a calculation first done in @deleeuw_A_88b.
Suppose we start `smacof` iterations at $Y+.001E$, where $E$ has standard normal entries, and $Y$ is the triangle with center (which has loss function value `r ss`). `smacof` starts with a loss function value for the perturbed $Y$ of `r s1`. It does not drop below 0.066 until iteration `r which(h$path < .066)[1]`. But then it rather quickly converges, in iteration `r h$itel`, to `r sn`, the loss function value of the global minimum, four points in the corners of a square. On the other hand, if we minimize $\sigma(\gamma)$ over $\gamma$ we have a non-zero probability of converging to the local minimum in coefficient space at $(0,1)$, i.e. to $Y$. This indicates, perhaps, that `smacof` should be used with somewhat higher precision than the default, and that checking second derivative information is always a good idea, no matter what space we are working in. Minimizing stress over a plane spanned by two configurations may seem somewhat artificial and limited. But think of the situation in which we use configurations $X$ and $Y=\mathcal{D}\sigma(X)$. Or, equivalently, $X$ and $B(X)X$. In that case minimizing over the plane means computing the optimal step size of a gradient step, or the optimal over-relaxation of `smacof` iterations, a problem first addressed perhaps in @deleeuw_heiser_C_80. Or the case in which $Y$ is the Newton step, and optimizing over the plane is a stabilized version of Newton's method.

# Appendix: Two Lemmas

Here we present two lemmas that describe the change of coordinates from configuration space to coefficient space, and then to circle space. The proofs, which are simple computations, are omitted.

**`r lemma_nums("linear", display = "f")`** Suppose $X$ and $Y$ are $n\times m$ matrices and $f$ is a twice-differentiable function on $\mathbb{R}^{n\times m}$. Define $g(\alpha,\beta):=f(\alpha X+\beta Y)$. Then
$$
\mathcal{D}g(\alpha,\beta)=\begin{bmatrix}\mathbf{tr}\ X'\mathcal{D}f(\alpha X+\beta Y)\\ \mathbf{tr}\ Y'\mathcal{D}f(\alpha X+\beta Y)\end{bmatrix},
$$
and
$$
\mathcal{D}^2g(\alpha,\beta)= \begin{bmatrix} \sum_{i=1}^n\sum_{j=1}^m\sum_{k=1}^n\sum_{\ell=1}^m x_{ij}x_{k\ell}\frac{\partial^2 f}{\partial z_{ij}\partial z_{k\ell}}(\alpha X+\beta Y)& \sum_{i=1}^n\sum_{j=1}^m\sum_{k=1}^n\sum_{\ell=1}^m x_{ij}y_{k\ell}\frac{\partial^2 f}{\partial z_{ij}\partial z_{k\ell}}(\alpha X+\beta Y)\\ \sum_{i=1}^n\sum_{j=1}^m\sum_{k=1}^n\sum_{\ell=1}^m y_{ij}x_{k\ell}\frac{\partial^2 f}{\partial z_{ij}\partial z_{k\ell}}(\alpha X+\beta Y)& \sum_{i=1}^n\sum_{j=1}^m\sum_{k=1}^n\sum_{\ell=1}^m y_{ij}y_{k\ell}\frac{\partial^2 f}{\partial z_{ij}\partial z_{k\ell}}(\alpha X+\beta Y) \end{bmatrix}.
$$

It is clear how this result generalizes to linear combinations of more than two matrices.
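The gradient formula in the first lemma is easy to check numerically. The following sketch does so for the illustrative choice $f(Z)=\sum z_{ij}^4$, whose gradient is the elementwise $4Z^3$; the matrices `xm` and `ym` are random stand-ins, not the configurations used in the example above.

```r
# Check the lemma's gradient formula against central finite differences
# for f(Z) = sum(Z^4), with gradient 4 * Z^3 (elementwise).
set.seed (123)
xm <- matrix (rnorm (8), 4, 2)
ym <- matrix (rnorm (8), 4, 2)
f <- function (z) sum (z ^ 4)
g <- function (ab) f (ab[1] * xm + ab[2] * ym)
# gradient of g according to the lemma: (tr X'Df, tr Y'Df)
lemGrad <- function (ab) {
  df <- 4 * (ab[1] * xm + ab[2] * ym) ^ 3
  c (sum (xm * df), sum (ym * df))
}
# central finite differences
numGrad <- function (ab, h = 1e-6) {
  c ((g (ab + c(h, 0)) - g (ab - c(h, 0))) / (2 * h),
     (g (ab + c(0, h)) - g (ab - c(0, h))) / (2 * h))
}
ab <- c (.5, -.3)
max (abs (lemGrad (ab) - numGrad (ab)))  # should be very small
```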

**`r lemma_nums("nonlinear", display = "f")`** If $f$ is a twice-differentiable function of two variables and $g(x)=f(\sin x,\cos x)$ then
$$
\mathcal{D}g(x)=y'\mathcal{D}f(z),
$$
with $z:=(\sin x,\cos x)$ and $y:=(\cos x,-\sin x)$, and
$$
\mathcal{D}^2g(x)=y'\mathcal{D}^2f(z)y-z'\mathcal{D}f(z).
$$
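Both formulas in the second lemma can also be verified numerically. The sketch below uses the illustrative choice $f(a,b)=a^2b+b^3$, with gradient $(2ab,\,a^2+3b^2)$ and Hessian $\begin{bmatrix}2b&2a\\2a&6b\end{bmatrix}$, and compares the lemma against finite differences at an arbitrary point.

```r
# Check D g(x) = y'Df(z) and D^2 g(x) = y'D^2f(z)y - z'Df(z)
# for the toy function f(a, b) = a^2 b + b^3.
f   <- function (z) z[1] ^ 2 * z[2] + z[2] ^ 3
df  <- function (z) c (2 * z[1] * z[2], z[1] ^ 2 + 3 * z[2] ^ 2)
d2f <- function (z) matrix (c (2 * z[2], 2 * z[1], 2 * z[1], 6 * z[2]), 2, 2)
g <- function (x) f (c (sin (x), cos (x)))
x0 <- .7
z <- c (sin (x0), cos (x0))
y <- c (cos (x0), -sin (x0))
h <- 1e-4
numD1 <- (g (x0 + h) - g (x0 - h)) / (2 * h)          # finite difference D g
numD2 <- (g (x0 + h) - 2 * g (x0) + g (x0 - h)) / h ^ 2  # finite difference D^2 g
lemD1 <- sum (y * df (z))
lemD2 <- sum (y * (d2f (z) %*% y)) - sum (z * df (z))
c (numD1 - lemD1, numD2 - lemD2)  # both differences should be tiny
```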

More generally, if a differentiable function $f$ on the unit sphere has a stationary point at $\theta$ then $\theta'\theta=1$ and there is a multiplier $\lambda$ such that $\mathcal{D}f(\theta)=\lambda\theta$. If $f$ is homogeneous of degree one then $\theta'\mathcal{D}f(\theta)=f(\theta)$, and thus at a stationary point $\lambda=f(\theta)$. If the function is twice differentiable and the stationary point is a local maximum then in addition $\nu'\mathcal{D}^2f(\theta)\nu\leq\lambda$ for all $\nu$ with $\nu'\nu=1$ and $\nu'\theta=0$. For a norm, i.e. a homogeneous convex $f$, this says equivalently that the largest eigenvalue of $\mathcal{D}^2f(\theta)$ is less than or equal to $f(\theta)$. Note that the second order necessary conditions for a local maximum become sufficient if the inequality is strict.

# Appendix: Code

```{r display_code, eval = FALSE}
bmat <- function (a, b, x, y, delta) {
  bm <- matrix (0, 2, 2)
  hm <- matrix (0, 2, 2)
  z <- c(a, b)
  for (i in 1:4) {
    for (j in 1:4) {
      if (i == j) next
      uij <- uu (i, j, x, y)
      uz <- drop (uij %*% z)
      dij <- sqrt (sum (uij * outer (z, z)))
      bm <- bm + (delta[i, j] / dij) * uij
      hm <- hm + (delta[i, j] / dij) * (uij - outer (uz, uz) / sum (z * uz))
    }
  }
  return (list (b = bm, h = hm))
}

stress <- function (a, b, x, y, delta) {
  z <- c (a, b)
  bm <- bmat (a, b, x, y, delta)$b
  return (1 + sum(z ^ 2) / 2 - sum (z * bm %*% z))
}

rho <- function (a, b, x, y, delta) {
  z <- c (a, b)
  bm <- bmat (a, b, x, y, delta)$b
  return (sum (z * bm %*% z))
}

vv <- function (i, j, x, y) {
  a <- matrix (0, 2, 2)
  a[1, 1] <- sum ((x[i, ] - x[j, ]) ^ 2)
  a[2, 2] <- sum ((y[i, ] - y[j, ]) ^ 2)
  a[1, 2] <- a[2, 1] <- sum ((x[i, ] - x[j, ]) * (y[i, ] - y[j, ]))
  return (a)
}

uu <- function (i, j, x, y) {
  n <- nrow (x)
  asum <- 2 * n * matrix (c (sum(x ^ 2), sum (x * y), sum (x * y), sum (y ^ 2)), 2, 2)
  csum <- solve (chol (asum))
  return (t(csum) %*% vv (i, j, x, y) %*% csum)
}

smacof <- function (a, b, x, y, delta, eps = 1e-10, itmax = 1000, verbose = TRUE) {
  zold <- c(a, b)
  bold <- bmat (a, b, x, y, delta)$b
  fold <- 1 + sum(zold ^ 2) / 2 - sum (zold * bold %*% zold)
  itel <- 1
  repeat {
    znew <- drop (bold %*% zold)
    bhmt <- bmat (znew[1], znew[2], x, y, delta)
    bnew <- bhmt$b
    fnew <- 1 + sum(znew ^ 2) / 2 - sum (znew * bnew %*% znew)
    if (verbose) {
      cat (
        formatC (itel, width = 4, format = "d"),
        formatC (fold, digits = 10, width = 13, format = "f"),
        formatC (fnew, digits = 10, width = 13, format = "f"),
        "\n"
      )
    }
    if ((itel == itmax) || (fold - fnew) < eps) break
    itel <- itel + 1
    fold <- fnew
    zold <- znew
    bold <- bnew
  }
  return (list (stress = fnew, theta = znew, itel = itel, b = bnew,
                g = znew - bnew %*% znew, h = diag(2) - bhmt$h))
}

newton <- function (a, b, x, y, delta, eps = 1e-10, itmax = 1000, verbose = TRUE) {
  zold <- c(a, b)
  bhmt <- bmat (a, b, x, y, delta)
  bold <- bhmt$b
  hold <- diag(2) - bhmt$h
  fold <- 1 + sum(zold ^ 2) / 2 - sum (zold * bold %*% zold)
  itel <- 1
  repeat {
    znew <- drop (solve (hold, bold %*% zold))
    bhmt <- bmat (znew[1], znew[2], x, y, delta)
    bnew <- bhmt$b
    hnew <- diag(nrow(bnew)) - bhmt$h
    fnew <- 1 + sum(znew ^ 2) / 2 - sum (znew * bnew %*% znew)
    if (verbose) {
      cat (
        formatC (itel, width = 4, format = "d"),
        formatC (fold, digits = 10, width = 13, format = "f"),
        formatC (fnew, digits = 10, width = 13, format = "f"),
        "\n"
      )
    }
    if ((itel == itmax) || abs (fold - fnew) < eps) break
    itel <- itel + 1
    fold <- fnew
    zold <- znew
    bold <- bnew
    hold <- hnew
  }
  return (list (stress = fnew, theta = znew, itel = itel, b = bnew,
                g = znew - bnew %*% znew, h = hnew))
}

mprint <- function (x, d = 2, w = 5) {
  print (noquote (formatC (x, di = d, wi = w, fo = "f")))
}
```

# References