Note: This is a working paper which will be expanded/updated frequently. All suggestions for improvement are welcome. The directory gifi.stat.ucla.edu/third has a pdf version, the bib file, the complete Rmd file with the code chunks, and the R and C source code.

# 1 Introduction

The multidimensional scaling loss function fStress is defined as $$\sigma(x):=\frac12\mathop{\sum\sum}_{1\leq i<j\leq n}w_{ij}(\delta_{ij}-f(x'A_{ij}x))^2.$$

Here $$x=\mathbf{vec}(X)$$, with $$X$$ the usual MDS configuration of $$n$$ points in $$p$$ dimensions. The distance between points $$i$$ and $$j$$ in the configuration is $$d_{ij}(x):=\sqrt{x'A_{ij}x}$$.

The $$np\times np$$ matrices $$A_{ij}$$ are defined using unit vectors $$e_i$$ and $$e_j$$, all zero except for one element that is equal to one. If you like, the $$e_i$$ are the columns of the identity matrix. Define $E_{ij}:=(e_i-e_j)(e_i-e_j)',$ and use $$p$$ copies of $$E_{ij}$$ to make the direct sum $A_{ij}:=\underbrace{E_{ij}\oplus\cdots\oplus E_{ij}}_{p\text{ times}}.$ Assume, without loss of generality, that the dissimilarities are normalized so that $\frac12\mathop{\sum\sum}_{1\leq i<j\leq n}w_{ij}\delta_{ij}^2=1$. Then $\sigma(x)=1-\rho(x)+\frac12\eta^2(x),$ with $\rho(x):=\mathop{\sum\sum}_{1\leq i<j\leq n}w_{ij}\delta_{ij}f(x'A_{ij}x),$ and $\eta^2(x):=\mathop{\sum\sum}_{1\leq i<j\leq n}w_{ij}f^2(x'A_{ij}x).$

# 2 Partials, Partials Everywhere
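As a quick sanity check on the preceding definitions, the sketch below (in Python with NumPy, illustrative only and not the paper's R/C code) builds $$A_{ij}$$ as the $$p$$-fold direct sum of $$E_{ij}$$ via a Kronecker product and verifies that $$x'A_{ij}x$$ is the squared distance between points $$i$$ and $$j$$:

```python
import numpy as np

# Build the np x np matrix A_ij as the p-fold direct sum of
# E_ij = (e_i - e_j)(e_i - e_j)', then check x'A_ij x against the
# squared Euclidean distance between rows i and j of X.
def make_A(i, j, n, p):
    e = np.eye(n)
    E = np.outer(e[i] - e[j], e[i] - e[j])
    # direct sum of p copies of E_ij = block-diagonal kron(I_p, E_ij)
    return np.kron(np.eye(p), E)

n, p = 4, 2
rng = np.random.default_rng(0)
X = rng.standard_normal((n, p))
x = X.flatten(order="F")      # vec(X) stacks the columns of X
i, j = 0, 2
A = make_A(i, j, n, p)
d2 = x @ A @ x
print(np.isclose(d2, np.sum((X[i] - X[j]) ** 2)))  # True
```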

## 2.1 Derivatives of fDistances

Consider $f(x)=g(x'Ax)$, and write $b:=Ax$ for the vector with elements $b_k$ and $a_{kl}$ for the elements of $A$. Then $\mathcal{D}_kf(x)=2\mathcal{D}g(x'Ax)b_k,$ $\mathcal{D}_{kl}f(x)=2\mathcal{D}g(x'Ax)a_{kl}+4\mathcal{D}^2g(x'Ax)b_kb_l,$ $\mathcal{D}_{klv}f(x)=4\mathcal{D}^2g(x'Ax)\{a_{kl}b_v+a_{lv}b_k+a_{kv}b_l\}+8\mathcal{D}^3g(x'Ax)b_kb_lb_v.$

$\begin{multline*} \mathcal{D}_{klvu}f(x)=8\mathcal{D}^3g(x'Ax)\{a_{kl}b_vb_u+a_{lv}b_kb_u+a_{kv}b_lb_u+a_{ku}b_lb_v+a_{vu}b_kb_l+a_{lu}b_kb_v\}+\\4\mathcal{D}^2g(x'Ax)\{a_{kl}a_{vu}+a_{lv}a_{ku}+a_{kv}a_{lu}\}+16\mathcal{D}^4g(x'Ax)b_kb_lb_vb_u \end{multline*}$
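These formulas are easy to check numerically. The sketch below (Python/NumPy, illustrative only and not the paper's R/C code) takes $g$ the square root, so that $f(x)=\sqrt{x'Ax}$ is a distance, and compares the closed-form first and second partials with central finite differences:

```python
import numpy as np

# Check D_k f = 2 g'(x'Ax) b_k and
# D_kl f = 2 g'(x'Ax) a_kl + 4 g''(x'Ax) b_k b_l for g = sqrt.
rng = np.random.default_rng(1)
m = 5
M = rng.standard_normal((m, m))
A = M @ M.T + m * np.eye(m)          # positive definite, so x'Ax > 0
x = rng.standard_normal(m)

g = lambda t: np.sqrt(t)
Dg = lambda t: 0.5 / np.sqrt(t)          # g'
D2g = lambda t: -0.25 * t ** (-1.5)      # g''

t = x @ A @ x
b = A @ x                                # b_k = (Ax)_k
grad = 2 * Dg(t) * b                     # first partials
hess = 2 * Dg(t) * A + 4 * D2g(t) * np.outer(b, b)  # second partials

f = lambda z: g(z @ A @ z)
gradf = lambda z: 2 * Dg(z @ A @ z) * (A @ z)
h = 1e-5
I = np.eye(m)
grad_fd = np.array([(f(x + h * I[k]) - f(x - h * I[k])) / (2 * h)
                    for k in range(m)])
hess_fd = np.array([(gradf(x + h * I[k]) - gradf(x - h * I[k])) / (2 * h)
                    for k in range(m)])
print(np.allclose(grad, grad_fd, atol=1e-6),
      np.allclose(hess, hess_fd, atol=1e-5))  # True True
```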

## 2.2 Derivatives of Stress

The stress loss function is $\sigma(x):=\frac12\mathop{\sum\sum}_{1\leq i<j\leq n}w_{ij}(\delta_{ij}-d_{ij}(x))^2,$ with $$x=\mathbf{vec}(X)$$, the matrices $$A_{ij}$$, and the distances $$d_{ij}(x)=\sqrt{x'A_{ij}x}$$ as defined in the introduction. Thus stress is fStress with $$f$$ the square root.

Assume, without loss of generality, that the dissimilarities are normalized so that $\frac12\mathop{\sum\sum}_{1\leq i<j\leq n}w_{ij}\delta_{ij}^2=1$. Then $\sigma(x)=1-\rho(x)+\frac12x'Vx,$ with $\rho(x):=\mathop{\sum\sum}_{1\leq i<j\leq n}w_{ij}\delta_{ij}\sqrt{x'A_{ij}x},$ and $V:=\mathop{\sum\sum}_{1\leq i<j\leq n}w_{ij}A_{ij}.$ The function $$\frac12x'Vx$$ is quadratic, so its derivatives are trivial to compute. For $$\rho$$ we expand $\rho(x+y)=\mathop{\sum\sum}_{1\leq i<j\leq n}w_{ij}\delta_{ij}\sqrt{(x+y)'A_{ij}(x+y)}.$
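The decomposition can be illustrated numerically. The sketch below (Python/NumPy, with made-up data rather than anything from the paper) normalizes random dissimilarities so that $\frac12\mathop{\sum\sum}w_{ij}\delta_{ij}^2=1$ and checks $\sigma(x)=1-\rho(x)+\frac12 x'Vx$:

```python
import numpy as np

# Verify sigma = 1 - rho + (1/2) x'Vx on random data, with the
# dissimilarities normalized so that (1/2) sum_{i<j} w_ij delta_ij^2 = 1.
rng = np.random.default_rng(2)
n, p = 5, 2
X = rng.standard_normal((n, p))
iu = np.triu_indices(n, 1)               # index pairs with i < j
W = np.ones((n, n))
Delta = rng.uniform(1.0, 2.0, (n, n))
Delta = (Delta + Delta.T) / 2
Delta /= np.sqrt(0.5 * np.sum(W[iu] * Delta[iu] ** 2))  # normalize

D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))  # d_ij(x)
sigma = 0.5 * np.sum(W[iu] * (Delta[iu] - D[iu]) ** 2)
rho = np.sum(W[iu] * Delta[iu] * D[iu])
eta2 = np.sum(W[iu] * D[iu] ** 2)        # equals x'Vx
print(np.isclose(sigma, 1 - rho + 0.5 * eta2))  # True
```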

$\rho(x+y)=\rho(x)+y'\mathcal{D}\rho(x)+\frac12 y'\mathcal{D}^2\rho(x)y+\cdots$

$\mathcal{D}\rho(x)=\mathop{\sum\sum}_{1\leq i<j\leq n}w_{ij}\frac{\delta_{ij}}{d_{ij}(x)}A_{ij}x$
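In terms of the configuration matrix $$X$$, the term $$A_{ij}x$$ contributes $X_i-X_j$ to row $$i$$ of the gradient and $-(X_i-X_j)$ to row $$j$$. The sketch below (Python/NumPy, with made-up data, not the paper's code) checks this gradient against central finite differences:

```python
import numpy as np

# Finite-difference check of D rho(x) = sum w_ij delta_ij / d_ij(x) A_ij x,
# written row-wise on the configuration matrix X.
n, p = 5, 2
X = np.array([[0., 0.], [1., 2.], [3., 1.], [4., 4.], [2., 5.]])
W = np.ones((n, n))
Delta = 1.0 + np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

def rho(X):
    return sum(W[i, j] * Delta[i, j] * np.linalg.norm(X[i] - X[j])
               for i in range(n) for j in range(i + 1, n))

def grad_rho(X):
    G = np.zeros_like(X)
    for i in range(n):
        for j in range(i + 1, n):
            c = (W[i, j] * Delta[i, j]
                 / np.linalg.norm(X[i] - X[j]) * (X[i] - X[j]))
            G[i] += c                    # row i gets +(X_i - X_j)
            G[j] -= c                    # row j gets -(X_i - X_j)
    return G

h = 1e-6
G_fd = np.zeros_like(X)
for k in range(n):
    for l in range(p):
        E = np.zeros_like(X)
        E[k, l] = h
        G_fd[k, l] = (rho(X + E) - rho(X - E)) / (2 * h)
print(np.allclose(grad_rho(X), G_fd, atol=1e-6))  # True
```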

$\mathcal{D}^2\rho(x)=\mathop{\sum\sum}_{1\leq i<j\leq n}w_{ij}\delta_{ij}\left\{\frac{A_{ij}}{d_{ij}(x)}-\frac{A_{ij}xx'A_{ij}}{d_{ij}^3(x)}\right\}$
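The second derivative can be checked the same way. This sketch (Python/NumPy, illustrative only) assumes the form $\mathcal{D}^2\rho(x)=\mathop{\sum\sum}_{1\leq i<j\leq n}w_{ij}\delta_{ij}\{A_{ij}/d_{ij}(x)-A_{ij}xx'A_{ij}/d_{ij}^3(x)\}$, standard in the smacof literature, and compares it with finite differences of the gradient:

```python
import numpy as np

# Check D^2 rho(x) = sum w delta { A_ij/d_ij - A_ij x x' A_ij / d_ij^3 }
# against central finite differences of the gradient of rho.
n, p = 4, 2
x = np.arange(n * p, dtype=float)    # a configuration with no zero distances
W = np.ones((n, n))
Delta = 1.0 + np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
e = np.eye(n)
A = {(i, j): np.kron(np.eye(p), np.outer(e[i] - e[j], e[i] - e[j]))
     for i in range(n) for j in range(i + 1, n)}

def grad_rho(x):
    g = np.zeros_like(x)
    for (i, j), Aij in A.items():
        d = np.sqrt(x @ Aij @ x)
        g += W[i, j] * Delta[i, j] / d * (Aij @ x)
    return g

H = np.zeros((n * p, n * p))
for (i, j), Aij in A.items():
    d = np.sqrt(x @ Aij @ x)
    u = Aij @ x
    H += W[i, j] * Delta[i, j] * (Aij / d - np.outer(u, u) / d ** 3)

h = 1e-6
H_fd = np.array([(grad_rho(x + h * v) - grad_rho(x - h * v)) / (2 * h)
                 for v in np.eye(n * p)])
print(np.allclose(H, H_fd, atol=1e-5))  # True
```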