---
title: "Gifi Analysis of Multivariate Data"
author:
- Jan de Leeuw
- Patrick Mair
- Patrick Groenen
date: "Version 0.01, April 05, 2016"
output:
  bookdown::gitbook:
    keep_md: yes
    split_by: none
  bookdown::epub_book: null
  bookdown::pdf_book:
    keep_tex: yes
  html_document:
    keep_md: yes
    number_sections: yes
    toc: yes
    toc_depth: 3
  pdf_document:
    keep_tex: yes
    latex_engine: pdflatex
    number_sections: yes
    toc: yes
    toc_depth: 3
documentclass: book
fontsize: 12pt
graphics: yes
link-citations: yes
mainfont: Times New Roman
description:
site: bookdown::bookdown_site
bibliography: gifi.bib
---

Note: The directory [gifi.stat.ucla.edu/gifi](http://gifi.stat.ucla.edu/gifi) has the complete Rmd file with all code chunks, pdf and epub versions, and R and C files with the code.

# Loss Function

In the multivariate analysis techniques presented in this paper the data are measurements or classifications of $n$ _objects_ by $p$ _variables_. Variables are partitioned into $m$ _sets of variables_, with set $j$ containing $p_j$ variables. Thus $\sum_{j=1}^m p_j=p$. The number of variables in a set is called the _span_ of the set.

_Gifi Multivariate Analysis_, or *GMVA*, is the minimization of the loss function
\begin{equation}
\sigma(X,H,A)=\frac{1}{m}\sum_{j=1}^m\mathbf{SSQ}\ (X-H_jA_j),\label{E:newloss}
\end{equation}
where $\mathbf{SSQ}()$ is the sum of squares. The loss function $\eqref{E:newloss}$ depends on three sets of parameters: $X$, $H$, and $A$. One of them, the matrix $X$, is common to all sets. The data for set $j$ are in the matrix $H_j$, which codes observations on the $p_j$ variables in set $j$. GMVA does not operate on the data directly, but on _admissible transformations_ of the variables. Matrix $A_j$ has the coefficients used to make linear combinations of the variables in set $j$.

1. The $n\times r$ matrix $X$ of _object scores_ is required to be column-centered and normalized by $X'X=I$. The _dimensionality_ $r$ is chosen by the user.
2. The $n\times p_j$ matrices of _transformed variables_ $H_j$ have columns $h_{j\ell}$, with $\ell=1,\cdots,p_j$. We require $h_{j\ell}\in\mathcal{K}_{j\ell}\cap\mathcal{S}$, where $\mathcal{K}_{j\ell}$ is a _cone of admissible transformations_ and $\mathcal{S}$ is the unit sphere in $\mathbb{R}^n$. For example, $\mathcal{K}_{j\ell}$ can be the cone of all centered vectors in $\mathbb{R}^n$ that are monotone with the data in variable $(j,\ell)$.
3. The $p_j\times r$ matrices $A_j$ of _weights_ are unconstrained.

Thus we make linear combinations of the transformed variables in such a way that the _set scores_ $H_jA_j$ are as close as possible to the object scores $X$, and consequently as close as possible to each other. This explains why GMVA used to be called *homogeneity analysis*: we want to make the $m$ set scores as _homogeneous_ as possible. Note that GMVA is both linear and non-linear, in the sense that it makes linear combinations of non-linear transformations of the variables.

A GMVA solution can be associated with various sets of *loadings*, i.e. correlations between observed variables and constructed (or latent) variables. We have the *object loadings* $X'H_j$, the correlations between object scores and transformed variables in set $j$, and the *variable loadings*, the correlations between the transformed variables $H_j$ and the set scores $H_jA_j$. Because the optimal $A_j$ is the linear least squares solution for given $X$ and $H_j$, we have $H_j'(X-H_jA_j)=0$, which means the object loadings are equal to the covariances between transformed variables and set scores. Note that the set scores are not normalized, so the variable loadings are $H_j'H_jA_j\{\mathbf{diag}(A_j'H_j'H_jA_j)\}^{-\frac12}$. In addition we can compute the *induced correlation matrix*, which is $H'H$, with $H=(H_1\mid\cdots\mid H_m)$.
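For concreteness, the loss $\eqref{E:newloss}$ can be written out in a few lines of R. This is an illustrative sketch only, not the package implementation; the random $X$ and $H_j$ are stand-ins for an actual GMVA solution, and the $A_j$ are the optimal least squares weights for the given $X$ and $H_j$.

```r
# The GMVA loss: average over sets of the squared distances between the
# object scores X and the set scores H_j A_j. Illustrative only.
gifiLoss <- function (x, h, a) {
  m <- length (h)
  sum (sapply (seq_len (m), function (j) sum ((x - h[[j]] %*% a[[j]]) ^ 2))) / m
}

set.seed (123)
n <- 10
r <- 2
# centered object scores with X'X = I
x <- qr.Q (qr (scale (matrix (rnorm (n * r), n, r), scale = FALSE)))
# two sets of "transformed variables", here just random matrices
h <- list (matrix (rnorm (n * 3), n, 3), matrix (rnorm (n * 2), n, 2))
# least squares A_j for given X and H_j, so that H_j'(X - H_j A_j) = 0
a <- lapply (h, function (hj) qr.solve (hj, x))
gifiLoss (x, h, a)
```

Because the $A_j$ are computed by least squares, the residuals $X-H_jA_j$ are orthogonal to the columns of $H_j$, as claimed above.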
## History

The history of loss function $\eqref{E:newloss}$ is complicated. Least squares and eigenvalue methods for quantifying multivariate qualitative data were introduced by @guttman_41. They were taken up by, among others, @deleeuw_R_68e and by Benzécri and his students [@cordier_65]. In this earlier work the emphasis was often on optimizing quadratic forms, or ratios of quadratic forms, and not so much on least squares, distance geometry, and so-called _biplots_ [@gower_hand_96]. In @deleeuw_B_74 a first attempt was made to unify most classical descriptive multivariate techniques using a single least squares loss function and a corresponding _alternating least squares (ALS)_ optimization method. The project then bifurcated into the *ALSOS project*, with Young and Takane at the University of North Carolina at Chapel Hill, and the *Gifi project*, at the Department of Data Theory of the University of Leiden.

The ALSOS project was started in 1973-1974, when De Leeuw was visiting Bell Telephone Laboratories in Murray Hill. ALSOS stands for Alternating Least Squares with Optimal Scaling. The ALS part of the name was provided by @deleeuw_R_68d and the OS part by @bock_60. Young, De Leeuw, and Takane applied the basic ALS and OS methodology to conjoint analysis, regression, principal component analysis, multidimensional scaling, and factor analysis, producing computer programs (and SAS modules) for each of the techniques. An overview of the project, basically at the end of its lifetime, is in @young_deleeuw_takane_C_80 and @young_81. The ALSOS project was clearly inspired by the path-breaking work of @kruskal_64a, @kruskal_64b, and @kruskal_65, who designed a general way to turn metric multivariate analysis techniques into non-metric ones.

The Gifi project took its inspiration mostly from @guttman_41.
Guttman's quantification method, which later became known as _multiple correspondence analysis_, was merged with linear and nonlinear principal component analysis in the HOMALS/PRINCALS techniques and programs [@deleeuw_vanrijckevorsel_C_80]. The GMVA loss function that was ultimately chosen, for example in the work of @vanderburg_deleeuw_verdegaal_A_88, had been used earlier by @carroll_68 in multi-set canonical correlation analysis.

In the Gifi system [@gifi_B_90; @michailidis_deleeuw_A_98] a slightly different parametrization, and a correspondingly different ALS algorithm, were used than the ones we have implemented here. The loss function used by Gifi is
\begin{equation}
\sigma(X,Y)=\frac{1}{m}\sum_{j=1}^m\mathbf{SSQ}\ (X-\sum_{\ell=1}^{p_j}G_{j\ell}Y_{j\ell}),\label{E:oldloss}
\end{equation}
where the $G_{j\ell}$ are known spanning matrices for the cones of admissible transformations, and the $Y_{j\ell}$ are $k_{j\ell}\times r$ matrices of _category quantifications_. By using the full rank decompositions $Y_{j\ell}=Z_{j\ell}A_{j\ell}$ we can show that $\eqref{E:newloss}$ and $\eqref{E:oldloss}$ are essentially the same. We feel that the alternative parametrization in terms of $H_j$ and $A_j$ has some conceptual and computational advantages.

## Copies

Within a set there can be more than one version of a variable. These multiple versions are called *copies*. Since GMVA transforms variables, having more than one copy is not just redundant, because different copies can be transformed differently. As a simple example of copies, think of using different monomials or orthogonal polynomials of a single variable $x$ in a polynomial regression. In the algorithm copies of a variable are treated in exactly the same way as other variables. The notion of copies replaces the notion of the _rank of a quantification_ used in traditional Gifi, which in turn generalizes the distinction between _single_ and _multiple_ quantifications.
A single variable has only one copy in its set; a multiple variable has the maximum number of copies. It is possible, in principle, to have copies of a variable in different sets, but we have not implemented that possibility.

## Associated Eigenvalue Problems

Associated with the problem of minimizing loss function $\eqref{E:newloss}$ are some eigen and singular value problems defined by the matrices $H_j$. This has been discussed extensively in @gifi_B_90, and there are some more recent discussions in @tenenhaus_tenenhaus_11 and @vandervelden_takane_12.

We begin the section with some definitions, which are more or less standard in GMVA. First $H:=(H_1\mid H_2\mid\cdots\mid H_m)$, and $C:=H'H$. The matrix $C$, which is called the *Burt matrix* in correspondence analysis, is a $p\times p$ block matrix, with $m\times m$ blocks defined by $C_{j\ell}:=H_j'H_\ell$. We also use the separate notation $D_j:=C_{jj}=H_j'H_j$ for the diagonal blocks in $C$, and for their direct sum $D:=D_1\oplus\cdots\oplus D_m$. Finally $A$ stacks the $A_j$ on top of each other.

The stationary equations for minimizing $\sigma$ over $X'X=I$ and $A$, for given $H$, are
\begin{align}
H'X&=DA,\\
HA&=XM,
\end{align}
with $X'X=I$, and $M$ an $r\times r$ symmetric matrix of Lagrange multipliers. It follows that
\begin{equation}
CA=DAM,\label{E:stata}
\end{equation}
as well as
\begin{equation}
HD^+H'X=XM,\label{E:statx}
\end{equation}
with $X'X=I$ and $D^+$ the Moore-Penrose inverse of $D$. It follows that $X=KT$, where $K$ are the eigenvectors corresponding with the $r$ largest eigenvalues of $HD^+H'$ and $T$ is an arbitrary rotation matrix. In the same way $A=LT$, where $L$ are the eigenvectors corresponding with the $r$ largest eigenvalues of $D^+C$. The non-zero eigenvalues of $HD^+H'$ and $D^+C$ are the same; both are equal to the squares of the singular values of $HD^{-\frac12}$, with $D^{-\frac12}$ the symmetric square root of $D^+$.
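The equivalence of the two eigenvalue problems is easy to check numerically. A small sketch in base R, with randomly generated blocks standing in for actual transformed variables (so $D$ is invertible and $D^+$ reduces to $D^{-1}$):

```r
# Check that HD^+H' and D^+C have the same non-zero eigenvalues.
set.seed (123)
n <- 20
h1 <- matrix (rnorm (n * 2), n, 2)        # "transformed variables" of set 1
h2 <- matrix (rnorm (n * 3), n, 3)        # "transformed variables" of set 2
hh <- cbind (h1, h2)                      # H = (H_1 | H_2)
cc <- crossprod (hh)                      # Burt matrix C = H'H
dd <- matrix (0, 5, 5)                    # direct sum D of the diagonal blocks
dd[1:2, 1:2] <- crossprod (h1)
dd[3:5, 3:5] <- crossprod (h2)
e1 <- eigen (hh %*% solve (dd) %*% t (hh), only.values = TRUE)$values
e2 <- eigen (solve (dd) %*% cc, only.values = TRUE)$values
# the five non-zero eigenvalues of HD^+H' match those of D^+C
sort (Re (e1), decreasing = TRUE)[1:5]
sort (Re (e2), decreasing = TRUE)
```

Note that $HD^+H'$ is symmetric of order $n$ with at most $p$ non-zero eigenvalues, while $D^+C$ is a non-symmetric $p\times p$ matrix with the same non-zero spectrum.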
The result can be made a bit more intuitive by defining the orthogonal projectors $P_j:=H_jD_j^+H_j'$ and their average $P_\star$. Then $X$ can be chosen as the normalized eigenvectors of $P_\star$ and, if the $\lambda_s(P_\star)$ are the corresponding ordered eigenvalues,
\begin{equation}
\min_{X'X=I}\min_A\sigma(X,H,A)=\sum_{s=1}^r(1-\lambda_s(P_\star)).
\end{equation}
The eigenvalues of $P_\star$ are all between zero and one; we collect the $r$ largest in the diagonal matrix $\Lambda$. In GMVA the fit of set $j$ is called the *discrimination measure*. It is defined as $\Delta_j:=X'P_jX=A_j'D_jA_j$. Note that the average discrimination measure $\Delta_\star$ is equal to the diagonal matrix $\Lambda$.

# Algorithm

The standard way to minimize loss function $\eqref{E:oldloss}$ is implemented in the OVERALS program [@vanderburg_deleeuw_verdegaal_A_88; @meulman_heiser_11]. It is also the one used in the homals package [@deleeuw_mair_A_09a]. In this paper the algorithm is different, because we use the loss function $\eqref{E:newloss}$. We still use ALS, which means in this case that we cycle through three substeps in each iteration. We update $A$ for given $X$ and $H$, we then update $X$ for given $H$ and $A$, and finally we update $H$ for given $X$ and $A$. Algorithm A goes as follows.

0. Set $k=0$ and start with some $X^{(0)},H^{(0)},A^{(0)}$.
1. $X^{(k+1)}=\mathbf{ortho}(\mathbf{center}(H^{(k)}A^{(k)}))$.
2. For $j=1,\cdots,m$ compute $A_j^{(k+1)}=\{H_j^{(k)}\}^+X^{(k+1)}$.
3. For $j=1,\cdots,m$ and $s=1,\cdots,p_j$ compute $h_{js}^{(k+1)}=\mathbf{proj}_{\mathcal{K}_{js}\cap\mathcal{S}}((X^{(k+1)}-\sum_{t\not= s}h_{jt}^{(k)}\{a_{jt}^{(k+1)}\}')a_{js}^{(k+1)})$.
4. If converged stop. Else $k\leftarrow k+1$ and go to step 1.

In step 1 the center operator does column centering, and the ortho operator finds an orthonormal basis for the column space of its argument. In step 2 we use superscript $+$ for the Moore-Penrose inverse. The complicated part is step 3, the _optimal scaling_, i.e. the updating of $H_j$ for given $X$ and $A_j$.
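The $X$ and $A$ substeps of Algorithm A can be sketched in a few lines of R. In this illustrative version the transformations $H_j$ are held fixed (i.e. the optimal scaling step is skipped entirely), which is enough to show the basic mechanics and the monotone decrease of the loss; it is not the gifiEngine() implementation. The loss is recorded after the $A$ update, where it is invariant under the choice of orthonormal basis in the ortho operator.

```r
# Bare-bones ALS for the X and A updates, with H fixed; illustrative only.
set.seed (123)
n <- 30
r <- 2
m <- 2
# center and normalize the columns of a matrix, as required for the h_{js}
normalize <- function (z)
  apply (scale (z, scale = FALSE), 2, function (u) u / sqrt (sum (u ^ 2)))
h <- list (normalize (matrix (rnorm (n * 3), n, 3)),
           normalize (matrix (rnorm (n * 2), n, 2)))
x <- qr.Q (qr (scale (matrix (rnorm (n * r), n, r), scale = FALSE)))
a <- lapply (h, function (hj) qr.solve (hj, x))
loss <- rep (0, 10)
for (k in 1:10) {
  z <- Reduce ("+", Map ("%*%", h, a))              # sum of set scores H_j A_j
  x <- qr.Q (qr (scale (z, scale = FALSE)))         # step 1: ortho (center (HA))
  a <- lapply (h, function (hj) qr.solve (hj, x))   # step 2: A_j update
  loss[k] <- sum (mapply (function (hj, aj)
    sum ((x - hj %*% aj) ^ 2), h, a)) / m
}
all (diff (loss) <= 1e-10)    # the loss sequence is non-increasing
```

Here qr.Q (qr (.)) plays the role of the ortho operator, and qr.solve computes the least squares weights, i.e. the Moore-Penrose update of step 2.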
We cycle through the variables in the set, each time projecting a single column on the cone of admissible transformations of the variable, and then normalizing the projection to length one. The _target_, i.e. the vector we are projecting, is complicated, because the other variables in the same set must be taken into account.

In order to simplify the optimal scaling computations within an iteration we can use majorization [@deleeuw_C_94c; @heiser_95; @lange_hunter_yang_00; @deleeuw_E_15b]. This has the additional benefit that the optimal scaling step becomes embarrassingly parallel. We expand the loss for set $j$ around a previous solution $\tilde H_j$.
$$\mathbf{SSQ}(X-H_jA_j)=\mathbf{SSQ}(X-\tilde H_jA_j)-2\mathbf{tr}\ (H_j-\tilde H_j)'(X-\tilde H_jA_j)A_j'+\mathbf{tr}\ A_j'(H_j-\tilde H_j)'(H_j-\tilde H_j)A_j.$$
Now
$$\mathbf{tr}\ (H_j-\tilde H_j)A_jA_j'(H_j-\tilde H_j)'\leq\kappa_j\ \mathbf{tr}\ (H_j-\tilde H_j)'(H_j-\tilde H_j),$$
where $\kappa_j$ is the largest eigenvalue of $A_j'A_j$. Thus
$$\mathbf{SSQ}(X-H_jA_j)\leq\mathbf{SSQ}(X-\tilde H_jA_j)+\kappa_j\ \mathbf{SSQ}(H_j-U_j)-\frac{1}{\kappa_j}\ \mathbf{SSQ}((X-\tilde H_jA_j)A_j'),$$
where $U_j$ is the _target_
$$U_j=\tilde H_j+\frac{1}{\kappa_j}(X-\tilde H_jA_j)A_j'.\tag{3}$$
It follows that we can update the optimal scaling of the variables by projecting the columns of $U_j$ on their respective cones and then normalizing. See @deleeuw_U_75a for results on normalized cone regression. This can be done for all variables in the set separately, without taking any of the other variables in the set (or in any of the other sets) into account. Thus the optimal scaling is easy to parallelize. The resulting algorithm B is as follows.

0. Set $k=0$ and start with some $X^{(0)},H^{(0)},A^{(0)}$.
1. $X^{(k+1)}=\mathbf{ortho}(\mathbf{center}(H^{(k)}A^{(k)}))$.
2. For $j=1,\cdots,m$ compute $A_j^{(k+1)}=\{H_j^{(k)}\}^+X^{(k+1)}$.
3. For $j=1,\cdots,m$ compute $U_j^{(k+1)}=H_j^{(k)}+\frac{1}{\kappa_j}(X^{(k+1)}-H_j^{(k)}A_j^{(k+1)})\{A_j^{(k+1)}\}'$ and for $s=1,\cdots,p_j$ compute $h_{js}^{(k+1)}=\mathbf{proj}_{\mathcal{K}_{js}\cap\mathcal{S}}(u_{js}^{(k+1)})$.
4. If converged stop. Else $k\leftarrow k+1$ and go to step 1.

## Implementation Details

If we follow the ALS strategy strictly, the $\mathbf{ortho}()$ operator should be implemented using Procrustes rotation [@gibson_62]. Thus if $Z=K\Lambda L'$ is the singular value decomposition of $Z$, then $\mathbf{ortho}(Z)=KL'$. Note, however, that any other basis for the column space of $Z$ merely differs from the Procrustes basis by a rotation. This rotation matrix will carry unmodified into the update of $A_j$ in step 2 of the algorithm, so after steps 1 and 2 the loss will be the same, no matter which rotation we select. In our algorithm we use the QR decomposition to find the basis, using the Gram-Schmidt code from @deleeuw_E_15d.

We implement the cone restrictions by the constraints $h_{js}=G_{js}z_s$ in combination with $T_{js}h_{js}\geq 0$. Thus the transformed variables must be in the intersection of the subspace spanned by the columns of the _transformation basis_ $G_{js}$ and the polyhedral convex cone of all vectors $h$ such that $T_{js}h\geq 0$. We suppose that all columns of the $G_{js}$ add up to zero, and we require, in addition, the normalization $\mathbf{SSQ}(h_{js})=1$.

In earlier GMVA work, summarized for example in @gifi_B_90 or @michailidis_deleeuw_A_98, the transformation basis matrices $G_{js}$ were binary zero-one matrices, indicating category membership. The same is true for the software in IBM SPSS Categories [@meulman_heiser_11] and in the R package homals [@deleeuw_mair_A_09a]. In this paper we extend the current GMVA software using _B-spline bases_, which provide a form of fuzzy non-binary coding suitable for both categorical and numerical variables [@vanrijckevorsel_deleeuw_B_88].
These generalizations were already discussed in @deleeuw_vanrijckevorsel_vanderwouden_A_81 and @gifi_B_90, but corresponding easily accessible software was never released. We use the code described in @deleeuw_E_15e to generate B-spline bases. Note that for coding purposes binary indicators are B-splines of degree zero, while polynomials are B-splines without interior knots. Also note that binary indicators can be created for qualitative non-numerical variables, for which B-splines are not defined. We have added the option of using degree $-1$ to bypass the B-spline code and generate an indicator matrix. Throughout we first orthonormalize the basis matrices $G_{js}$, using the Gram-Schmidt code from @deleeuw_E_15d.

The matrices $T_{js}$ in the homogeneous linear inequality restrictions that define the cones $\mathcal{K}_{js}$ can be used to impose monotonicity or convexity of the resulting transformations. In the current implementation we merely allow for monotonicity, which means the $T_{js}$ do not have to be stored. The transformations for each variable can be restricted to be increasing, or they can be unrestricted. By using splines without interior knots we allow in addition for polynomial transformations, which again can be restricted to be either monotonic or not. This covers the previous Gifi types _nominal_, _ordinal_, and _numerical_, which were of course designed for categorical variables with a small number of categories. Note that it is somewhat misleading to say we are fitting monotone splines or polynomials; we are merely requiring monotonicity at the data points.

Missing data are incorporated in the definition of the cones of transformations by using a $G_{js}$ which is the direct sum of a spline basis for the non-missing data and an identity matrix for the missing data. This option is called _missing data multiple_ in @gifi_B_90. There are no linear inequality restrictions on the quantifications of the missing data.
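For the monotone (ordinal) cones, the projection in the optimal scaling step can be sketched with base R's isoreg(), which implements the pooled adjacent violators algorithm. Following the normalized cone regression result cited above, we project on the cone first and normalize afterwards. This is an illustrative stand-in for the actual implementation, and it assumes the data values are already in increasing order.

```r
# Projection of a target u on the cone of centered vectors monotone with the
# (sorted) data, intersected with the unit sphere. Illustrative sketch.
projectMonotone <- function (u) {
  v <- isoreg (u)$yf              # monotone increasing least squares fit (PAVA)
  v <- v - mean (v)               # center (the PAVA fit preserves the mean)
  v / sqrt (sum (v ^ 2))          # normalize to the unit sphere
}

set.seed (123)
u <- rnorm (10)
h <- projectMonotone (u)
```

The result is monotone with the data order, centered, and of unit length, as required for a transformed variable $h_{js}$.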
## Wrappers

The homals() implementation in @deleeuw_mair_A_09a is a single monolithic program in R, which specializes to the various GMVA techniques by a suitable choice of its parameters. This has some obvious disadvantages. If we want principal component analysis, we know all sets are singletons. If we want multiple correspondence analysis, we know each variable has $p$ copies. If we want multiple regression, we know there are two sets, and one is a singleton. So it is somewhat tedious to specify all parameters all of the time. Also, some of the output is specific to a particular technique. For regression we want residuals and fitted values, in canonical analysis we want set scores and loadings. And, more generally, we may want the output in a form familiar from the classical MVA techniques. It is indeed possible to transform the homals() output to more familiar forms (@deleeuw_R_09c), but this requires some extra effort.

In this book we go back to the original approach of @gifi_R_89 and write separate programs for principal component analysis, multiple regression, canonical analysis, discriminant analysis, and so on. These programs are wrappers for the main computational core, the program gifiEngine(). The wrappers, which have the familiar names morals(), corals(), princals(), homals(), criminals(), overals(), primals(), and canals(), set up the input for gifiEngine(), receive the output, and transform the output to a format familiar from the corresponding technique in classical MVA. Each wrapper foo returns a structure of class foo.

# Multiple Correspondence Analysis and homals()

Suppose all basis matrices $G_{j\ell}$ in set $j$ are the same, say equal to $G_j$. Then the set scores $H_jA_j$ are equal to $G_jZ_jA_j$, which we can write simply as $G_jY_j$. Thus the loss must be minimized over $X$ and the $Y_j$. If all $G_j$ are binary indicators of categorical variables, and the $m$ sets are all of span one, then GMVA is _multiple correspondence analysis_ (MCA).
The set scores $G_jY_j$ are $k_j$ different points in $\mathbb{R}^r$, with $k_j$ the number of categories of the variable, which is usually much smaller than $n$. The plot connecting the set scores to the object scores is called the _star plot_ of the variable. If $k_j$ is much smaller than $n$ a star plot will connect all object scores to their category centroids, and the plot for a set (i.e. a variable) will show $k_j$ stars. Since the loss $\sigma$ is equal to the sum of squared distances between object scores and set scores, we quantify or transform variables so that the stars are small.

In our GMVA MCA function homals() we allow for B-spline bases and for monotonicity restrictions. The input data (as for all GMVA programs) need to be numeric, and we included a small utility function makeNumeric() that can be used on data frames, factors, and character variables to turn them into numeric matrices. All other arguments to the function have default values.

```r
homals <- function (data, knots = knotsD (data), degrees = rep (-1, ncol (data)),
                    ordinal = rep (FALSE, ncol (data)), ndim = 2, itmax = 1000,
                    eps = 1e-6, seed = 123, verbose = FALSE)
```

The output is a structure of class homals, i.e. a list with a class attribute homals. The list consists of the transformed variables (in xhat), their correlations (in rhat), the object scores (in objectscores), the set scores (in setscores, which is itself a list with length equal to the number of variables), the discrimination measures (in dmeasures, a list with length equal to the number of variables), their average (in lambda), the weights (in a), the number of iterations (in ntel), and the loss function value (in f).

```r
return (structure (
  list (
    xhat = xhat,
    rhat = crossprod (xhat),
    objectscores = h$x,
    setscores = y,
    dmeasures = d,
    lambda = dsum / ncol (data),
    weights = a,
    ntel = h$ntel,
    f = h$f
  ),
  class = "homals"
))
```

## Examples

### Hartigan's Hardware

Our first example uses semi-serious data from @hartigan_75 (p. 228), also analyzed in @gifi_B_90 (p. 128-135).
```
## thread head indentation bottom length brass
## tack N F N S 1 N
## nail1 N F N S 4 N
## nail2 N F N S 2 N
## nail3 N F N F 2 N
## nail4 N F N S 2 N
## nail5 N F N S 2 N
## nail6 N C N S 5 N
## nail7 N C N S 3 N
## nail8 N C N S 3 N
## screw1 Y O T S 5 N
## screw2 Y R L S 4 N
## screw3 Y Y L S 4 N
## screw4 Y R L S 2 N
## screw5 Y Y L S 2 N
## bolt1 Y R L F 4 N
## bolt2 Y O L F 1 N
## bolt3 Y Y L F 1 N
## bolt4 Y Y L F 1 N
## bolt5 Y Y L F 1 N
## bolt6 Y Y L F 1 N
## tack1 N F N S 1 Y
## tack2 N F N S 1 Y
## nailb N F N S 1 Y
## screwb Y O L S 1 Y
```

```r
h <- homals (makeNumeric (hartigan), degrees = rep (-1, ncol (hartigan)))
```

Figure 1: Hartigan Data, Object Scores

Figure 2: Hartigan Data, Star Plots

```
## [,1] [,2]
## [1,] 0.93 0.16
## [2,] 0.16 0.03
## [,1] [,2]
## [1,] 0.96 0.04
## [2,] 0.04 0.64
## [,1] [,2]
## [1,] 0.94 0.07
## [2,] 0.07 0.68
## [,1] [,2]
## [1,] 0.40 -0.13
## [2,] -0.13 0.04
## [,1] [,2]
## [1,] 0.29 -0.18
## [2,] -0.18 0.81
## [,1] [,2]
## [1,] 0.07 0.04
## [2,] 0.04 0.03
```

```
## [,1] [,2]
## [1,] 0.60 0.00
## [2,] 0.00 0.37
```

```
## 0.5157326
```

```
## 1.031465
```

```
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
## [1,] 1.00 0.00 -0.66 -0.75 0.82 -0.58 0.46 0.00 -0.41 0.03 -0.22
## [2,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## [3,] -0.66 0.00 1.00 -0.00 -0.71 0.14 -0.44 0.00 0.56 0.43 0.17
## [4,] -0.75 0.00 -0.00 1.00 -0.45 0.65 -0.23 0.00 0.06 -0.41 0.14
## [5,] 0.82 0.00 -0.71 -0.45 1.00 -0.00 0.51 0.00 -0.41 -0.37 -0.15
## [6,] -0.58 0.00 0.14 0.65 -0.00 1.00 -0.07 0.00 0.13 -0.56 0.17
## [7,] 0.46 0.00 -0.44 -0.23 0.51 -0.07 1.00 0.00 -0.28 -0.17 -0.29
## [8,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## [9,] -0.41 0.00 0.56 0.06 -0.41 0.13 -0.28 0.00 1.00 0.00 -0.24
## [10,] 0.03 0.00 0.43 -0.41 -0.37 -0.56 -0.17 0.00 0.00 1.00 -0.09
## [11,] -0.22 0.00 0.17 0.14 -0.15 0.17 -0.29 0.00 -0.24 -0.09 1.00
## [12,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## [,12]
## [1,] 0.00
## [2,] 0.00
## [3,] 0.00
## [4,] 0.00
## [5,] 0.00
## [6,] 0.00
## [7,] 0.00
## [8,] 0.00
## [9,] 0.00
## [10,] 0.00
## [11,] 0.00
## [12,] 0.00
```

```
## 0.60 0.37 0.21 0.13 0.10 0.07 0.02 0.00 0.00 0.00 -0.00
## -0.00
```

### GALO

```r
h <- homals (galo, degrees = rep (-1, 4))
```

The four star plots are in figure 4.

### Thirteen Personality Scales

Our next example is a small data set from the psych package [@revelle_15] of five scales from the Eysenck Personality Inventory, five from a Big Five inventory, a Beck Depression Inventory, and State and Trait Anxiety measures.

```r
data (epi.bfi, package = "psych")
epi <- epi.bfi
epi_knots <- lapply (epi, function (x) fivenum (x)[2:4])
epi_degrees <- rep (0, 13)
epi_ordinal <- rep (FALSE, 13)
```

We perform a two-dimensional MCA, using degree zero and inner knots at the three quartiles for all 13 variables.

```r
h <- homals (epi, epi_knots, epi_degrees, epi_ordinal)
```

We have convergence in 260 iterations to loss 0.7478043. The object scores are in figure 3.

Figure 3: Personality Scales, Object Scores, Multiple Nominal, Degree Zero

Figure 4 has the $G_jY_j$ for each of the thirteen variables, with the first dimension in red, and the second dimension in blue. Because the degree of the splines is zero, these _transformation plots_ show step functions, with the steps at the knots, which are represented by vertical lines.

Figure 4: Personality Scales, Transformations, Multiple Nominal, Degree Zero

The thirteen star plots are in figure 5.

Figure 5: Personality Scales, Star Plots, Multiple Nominal, Degree Zero

Now change the degree to two for all variables, i.e. fit piecewise quadratic polynomials which are differentiable at the knots. We still have two copies for each variable, and these two copies define the sets.

```r
epi_degrees <- rep (2, 13)
h <- homals (epi, epi_knots, epi_degrees, epi_ordinal)
```

We have convergence in 785 iterations to loss 0.7179135.
The object scores are in figure 6 and the transformation plots in figure 7.

Figure 6: Personality Scales, Object Scores, Multiple Nominal, Degree Two

Figure 7: Personality Scales, Transformations, Multiple Nominal, Degree Two

# Correspondence Analysis and corals()

# Nonlinear Principal Component Analysis and princals()

Suppose all $m$ sets each contain only a single variable. Then the Burt matrix is the correlation matrix of the $H_j$, which are all $n\times 1$ matrices in this case. It follows that GMVA maximizes the sum of the $r$ largest eigenvalues of the correlation matrix over transformations, i.e. GMVA is _nonlinear principal component analysis_ [@deleeuw_C_14].

## Example: Thirteen Personality Scales

We use the same data as before for an NLPCA with all sets of rank one, all variables ordinal, and splines of degree two.

```r
library (nnls)
epi_copies <- rep (1, 13)
epi_ordinal <- rep (TRUE, 13)
h <- princals (epi, epi_knots, epi_degrees, epi_ordinal, epi_copies)
```

In 19 iterations we find minimum loss 0.7330982. The object scores are in figure 8 and the transformation plots in figure 9. NLPCA maximizes the sum of the two largest eigenvalues of the correlation matrix of the variables. Before transformation the eigenvalues are 4.0043587, 2.6702003, 1.9970912, 0.8813983, 0.6571463, 0.6299946, 0.5246896, 0.4657022, 0.3457515, 0.3403361, 0.2767531, 0.1835449, 0.0230333; after transformation they are 4.1939722, 2.7454868, 1.604906, 0.8209072, 0.7184825, 0.677183, 0.51865, 0.4545214, 0.4200148, 0.351787, 0.2928574, 0.1699557, 0.0312759. The sum of the first two goes from 6.674559 to 6.9394591.

Figure 8: Personality Scales, Object Scores, Single Ordinal, Degree Two

Figure 9: Personality Scales, Transformations, Single Ordinal, Degree Two

We repeat the analysis with ordinal variables of degree two, without interior knots. Thus the transformation plots will be quadratic polynomials that are monotone over the range of the data.
```r
h <- princals (epi, knotsE (epi), epi_degrees, epi_ordinal)
```

In 20 iterations we find minimum loss 0.7393666. The object scores are in figure 10 and the transformation plots in figure 11. The eigenvalues are now 4.0828642, 2.6936186, 1.8391342, 0.8732231, 0.6666505, 0.6491709, 0.5390077, 0.459182, 0.3632868, 0.3471175, 0.2845394, 0.1782232, 0.023982, with the sum of the first two equal to 6.7764828.

Figure 10: Personality Scales, Object Scores, Single Numerical, Degree Two

Figure 11: Personality Scales, Transformations, Single Numerical, Degree Two

# Optimal Scaling and primals()

# Canonical Analysis and canals()

If there are only two sets the generalized eigenvalue problem for the Burt matrix becomes
$$\begin{bmatrix}
D_1&C_{12}\\C_{21}&D_2
\end{bmatrix}
\begin{bmatrix}
a_1\\a_2
\end{bmatrix}=2\lambda\begin{bmatrix}D_1&0\\0&D_2\end{bmatrix}\begin{bmatrix}
a_1\\a_2
\end{bmatrix},$$
which we can rewrite as
$$\begin{split}
C_{12}a_2&=(2\lambda-1)D_1a_1,\\
C_{21}a_1&=(2\lambda-1)D_2a_2,
\end{split}$$
from which we see that GMVA maximizes the sum of the $r$ largest canonical correlations between $H_1$ and $H_2$. See also @vandervelden_12.

# Multiple Regression and morals()

If the second set only contains a single copy of a single variable, then we choose transformations that maximize the multiple correlation between that variable and the variables in the first set.

## Example: Polynomial Regression

```r
x <- center (as.matrix (seq (0, pi, length = 20)))
y <- center (as.matrix (sin (x)))
h <- morals (x, y, xknots = knotsE (x), xdegrees = 3, xordinal = TRUE)
plot (y, h$yhat)
```

![](_main_files/figure-html/polynom_data-1.png)

```r
plot (x, h$xhat)
```

![](_main_files/figure-html/polynom_data-2.png)

```r
plot (x, y)
lines (x, h$ypred)
```

![](_main_files/figure-html/polynom_data-3.png)

## Example: Gases with Convertible Components

We analyze a regression example, using data from Neumann, previously used by Willard Gibbs, and analyzed with regression in a still quite readable article by @wilson_26.
Wilson's analysis was discussed and modified using splines in @gifi_B_90 (pages 370-376). In the regression analysis in this section we use two copies of temperature, with spline degree zero, and the first copy ordinal. For pressure and the dependent variable density we use a single ordinal copy with spline degree two.

```r
data (neumann, package = "homals")
xneumann <- neumann[, 1:2]
yneumann <- neumann[, 3, drop = FALSE]
xdegrees <- c(0, 2)
```

```r
h <- morals (xneumann, yneumann, xdegrees = c(0, 2), xcopies = c(2, 1))
```

In 34 iterations we find minimum loss 0.0295045, corresponding with a multiple correlation of 0.8854642. The object scores are in figure 12, plotted against the original variables (not the transformed variables), and the transformation plots are in figure 13.

Figure 12: Gases with Convertible Components, Object Scores

Figure 13: Gases with Convertible Components, Transformations

# Discriminant Analysis and criminals()

If the second set contains more than one copy of a single variable, and we use binary indicator coding for that variable, then we optimize the eigenvalue (between/within ratio) sums for a canonical discriminant analysis.

## Example: Iris Data

The next example illustrates (canonical) discriminant analysis, using the obligatory Anderson-Fisher iris data. Since there are three species of iris, we use two copies for the species variable. The other four variables are in the same set; they are transformed using piecewise linear monotone splines with five knots.

```r
data (iris, package = "datasets")
iris_vars <- names (iris)
iris[] <- lapply (iris, as.numeric)
iris_knots <- as.list (1:5)
for (i in 1:4) iris_knots[[i]] <- quantile (iris[[i]], (1:5) / 6)
iris_knots[[5]] <- 1:3
iris_degrees <- c(1, 1, 1, 1, 0)
iris_ordinal <- c(TRUE, TRUE, TRUE, TRUE, FALSE)
iris_copies <- c(1, 1, 1, 1, 2)
iris_sets <- c(1, 1, 1, 1, 2)
```

```r
h <- criminals (iris, iris_knots, iris_degrees, iris_ordinal, iris_sets, iris_copies)
```

In 34 iterations we find minimum loss 0.0295045. The object scores are in figure 14, plotted against the original variables (not the transformed variables), and the transformation plots are in figure 15.

Figure 14: Iris Data, Object Scores

Figure 15: Iris Data, Transformations

Discriminant analysis decomposes the total dispersion matrix $T$ into a sum of a between-groups dispersion $B$ and a within-groups dispersion $W$, and then finds directions in the space spanned by the variables for which the between-variance is largest relative to the total variance. GMVA optimizes the sum of the $r$ largest eigenvalues of $T^{-1}B$. Before optimal transformation these eigenvalues for the iris data are r , after transformation they are r .

# Multiset Canonical Correlation and overals()

## Example: Thirteen Personality Scales

This is the same example as before, but now we group the five scales from the Eysenck Personality Inventory and the five from the Big Five inventory into sets. The remaining three variables define three separate sets. No copies are used, and we use monotone cubic splines with the interior knots at the quartiles.

```r
epi_knots <- lapply (epi, function (x) fivenum (x)[2:4])
epi_degrees <- rep (3, 13)
epi_sets <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 4, 5)
```

```r
h <- overals (epi, epi_sets, epi_copies, epi_knots, epi_degrees, epi_ordinal)
```

```
## 231 13
```

In 196 iterations we find minimum loss 0.4724286. The object scores are in figure 16 and the transformation plots in figure 17.

Figure 16: Personality Scales, Multi-Set, Object Scores

Figure 17: Personality Scales, Multi-Set, Transformations
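The dispersion decomposition $T=B+W$ from the discriminant analysis section can be computed directly in base R. The following sketch uses the raw, untransformed iris measurements, so it illustrates only the "before optimal transformation" side, independently of the criminals() code.

```r
# Total, between-groups, and within-groups dispersion for the iris data,
# and the eigenvalues of T^{-1}B whose sum GMVA optimizes. Illustrative only.
data (iris, package = "datasets")
z <- scale (as.matrix (iris[, 1:4]), scale = FALSE)   # centered measurements
g <- model.matrix (~ Species - 1, iris)               # binary group indicator
tmat <- crossprod (z)                                 # total dispersion T
pz <- g %*% solve (crossprod (g), crossprod (g, z))   # rows replaced by group means
bmat <- crossprod (pz)                                # between-groups dispersion B
wmat <- tmat - bmat                                   # within-groups dispersion W
ev <- Re (eigen (solve (tmat, bmat), only.values = TRUE)$values)
ev                                                    # at most two non-zero values
```

With three species, $B$ has rank two, so there are at most two non-zero eigenvalues, all between zero and one.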

\appendix

# Code

## R Code

### Engine

### Wrappers

### Utilities

### Splines

### Gram-Schmidt

## C Code

### Splines

### Gram-Schmidt

# NEWS

# TO DO

# References