# Croissants and Wedges in Multiple Correspondence Analysis

Jan de Leeuw

Version 2.01, December 16, 2015

![croissants and wedges.](croissants.png)

Note: This is a working paper which will be expanded/updated frequently. The directory [gifi.stat.ucla.edu/croissants](http://gifi.stat.ucla.edu/croissants) has a pdf copy of this article, the complete Rmd file with all code chunks, and R and C files with the code. Unfortunately the Norway data cannot be shared, so that part of the paper is not reproducible. If you want to knit the Rmd file, remove the Norway section.

#Introduction

In multiple correspondence analysis (MCA) we frequently observe two-dimensional scatter plots in which the points are on or close to a convex function, with a shape similar to a parabola. This is often referred to as the *horseshoe effect*, but that name is not quite appropriate, since horseshoes fold back at the endpoints. We therefore suggest calling such plots *croissants*, which seems especially suitable if there is considerable dispersion around the convex function.

Croissants have a complicated history. In data analysis they were first described in the context of scale analysis by @guttman_50. The MCA eigenvectors of perfect scales satisfy a second order difference equation, whose solutions are the discrete orthogonal polynomials. The first and second components define the croissant. In the French data analysis literature the horseshoe or croissant is consequently called the *Effet Guttman*. The next major contribution was @lancaster_58, who described a family of bivariate distributions for which the singular value decomposition of the density consisted of orthogonal polynomials. The most famous member of the family is the bivariate normal, which has the Hermite-Chebyshev polynomials as its components. Lancaster's work was generalized and extended by his students and co-workers. It was taken up by Benzécri's school, in particular in the thesis of @naouri_70.

In ecology horseshoes were first discussed in the influential paper of @hill_74. They were considered a nuisance, and various techniques were developed to get rid of them [@hill_gauch_80]. Ecologists work with the Gaussian abundance model for environmental gradients, and @ihm_vangroenewoud_75 showed this resulted in horseshoes.

In multidimensional scaling quadratic structures were first discussed and analyzed by @levelt_vandegeer_plomp_66 for musical intervals (a parabola) and by @degruijter_67 for political parties (an ellipse). More information on horseshoes in the MDS context is in @diaconis_goel_holmes_08 and @deleeuw_R_07a.

The application of Hermite-Chebyshev polynomials to the multivariate normal case in MCA was outlined in section 3.8 of @deleeuw_B_74. @gifi_B_80 pointed out for the first time that oscillation matrices and total positivity were the relevant mathematical theories, see also section 9.2 of @gifi_B_90. Total positivity was used in a general theory by @schriever_83, @schriever_85 that covers all previously discussed special cases. The appropriate way to patch the decompositions of the various bivariate marginals together in the multivariate case is due to @deleeuw_C_82, see also @bekker_deleeuw_C_88. Much of the work of the Gifi school on horseshoes was summarized in @vanrijckevorsel_87.

#MCA

We give a very brief introduction to MCA to fix the terminology. We start with n measurements on m categorical variables. Variable j has $k_j$ categories. We then expand each variable to an indicator matrix $G_j$, indicating category membership.
Thus $G_j$ is an $n\times k_j$ binary matrix with exactly one element equal to one in each row, and zeroes everywhere else. Define $G:=\begin{bmatrix}G_1\mid\cdots\mid G_m\end{bmatrix}$ and $C=G'G$. The matrix $C$, which contains the bivariate cross tables of the variables, is called the *Burt Matrix* (*Tableau de Burt*). Also define $D:=\mathbf{diag}(C)$, the diagonal matrix of univariate marginals. Both $C$ and $D$ are of order $K:=\sum k_j$.

MCA is defined as solving the generalized eigenvalue problem $CY=mDY\Lambda$, with $Y'DY=I$. The generalized eigenvectors $Y$ are called the *category quantifications*, and the centroids $X:=\frac{1}{m}GY$ are called the *object scores*. In most cases we do not use all eigenvectors in the data analysis but only a small number of them. The category quantifications are in a $K\times p$ matrix, and the object scores are in an $n\times p$ matrix. Note that with the normalization we have chosen $0\leq\Lambda\leq I$ and $X'X=\frac{1}{m}\Lambda$. In our computations we use the package geigen [@geigen_15]. Alternatively, we can define $E:=D^{-\frac12}CD^{-\frac12}$, compute eigenvectors $Z$ such that $EZ=mZ\Lambda$, and then set $Y=D^{-\frac12}Z$. Also, because of the singularities in $G$, there is one generalized eigenvalue equal to one (with a corresponding generalized eigenvector $y$ that has all elements equal) and there are $m-1$ generalized eigenvalues equal to zero.

#Synthetic Data: Normal Distribution

One simple way to reliably generate croissants is to discretize a sample from a multinormal. Of course there are various parameters to consider. The function discreteNormal() allows one to choose the size of the sample n, the number of variables m, the correlation between the variables r, and the discretization points (aka knots).

```r
formals ("discreteNormal")
```

```
## $n
## 
## 
## $m
## 
## 
## $r
## 
## 
## $knots
```
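
As an illustration, here is a minimal sketch of how such a croissant can be generated and inspected, using discreteNormal(), repList(), and burtTable() from the appendix and the $E=D^{-\frac12}CD^{-\frac12}$ route described above. This is only a sketch, not the exact code behind the figures in this section.

```r
# Minimal sketch: discretize a multinormal sample, do the MCA through the
# eigendecomposition of E = D^{-1/2} C D^{-1/2}, and plot the first two
# non-trivial dimensions. Assumes discreteNormal (), repList (), and
# burtTable () from the appendix have been sourced.
set.seed (12345)
x <- discreteNormal (10000, 4, matrix (.75, 4, 4) + .25 * diag (4), repList (-3:3, 4))
b <- burtTable (x)
d <- 1 / sqrt (diag (b$c))
e <- b$c * outer (d, d)
z <- eigen (e, symmetric = TRUE)
y <- d * z$vectors[, 2:3]            # category quantifications Y = D^{-1/2} Z
xs <- (b$g %*% y) / 4                # object scores are centroids X = GY / m
plot (xs, pch = ".", col = "red", xlab = "dimension 1", ylab = "dimension 2")
points (y, pch = 19)
```
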
Throughout this section we choose $n=10,000$. Our first normal croissant (category quantifications on the left, object scores on the right) has four variables, knots at the integers from -3 to +3, and all correlations equal to .75. We see somewhat more scatter or filling in the object score croissant, basically because object scores are convex combinations of category quantifications.
Figure 1: Normal Croissants, m = 4, r = .75

If we decrease the correlation to .25 the croissant crumbles. A correlation of .25 is too close to independence, which means that not enough variation will be captured by the first two eigenvalues. Nevertheless if we discard the outliers the croissant is still there.
Figure 2: Normal Croissants, m = 4, r = .25

In the third simulation we increase the number of variables to 25, with correlation .75. The croissants are tight and symmetric.
Figure 3: Normal Croissants, m = 25, r = .75

Moreover for 25 variables we still see a clear croissant if the correlation is .25, although there obviously is more filling. Lower correlation means object scores will be convex combinations of more category quantifications, and thus they will cluster more around the origin, which is the fattest part of the croissant.
Figure 4: Normal Croissants, m = 25, r = .25

We can also study the influence of skewness by choosing the knots to be c(0,1,2,3), keeping the correlation between the four variables at .75. A truncated version of the croissant appears.
Figure 5: Normal Croissants, m = 4, r = .75, skew

Finally we can choose a correlation matrix which does not have all correlations the same. We use the correlation matrix

```
##        [,1]   [,2]   [,3]   [,4]
## [1,]  1.000 -0.027  0.729  0.008
## [2,] -0.027  1.000  0.001  0.343
## [3,]  0.729  0.001  1.000 -0.027
## [4,]  0.008  0.343 -0.027  1.000
```

This has the effect of perturbing the croissant, because it becomes a mixture of different croissants for the different variables.
Figure 6: Normal Croissants, m = 4, r is various

In real data we have a combination of all these effects. We will have low correlations and/or skewness for some of the variables. Thus croissants can be asymmetric and have a lot of scatter. And, of course, there may be deviations from underlying normality as well, which can easily destroy the croissant.

#Real Data: GALO

The GALO example has served us well for almost 40 years. The objects (individuals) are 1290 school children in the sixth grade of elementary school in the city of Groningen (Netherlands) in 1959. There are four variables.

* Gender: M/F.
* IQ: The original range (60 to 144) has been categorized into 9 ordered categories.
* SES:
    + LoWC = Lower white collar;
    + MidWC = Middle white collar;
    + Prof = Professional, Managers;
    + Shop = Shopkeepers;
    + Skil = Schooled labor;
    + Unsk = Unskilled labor.
* Teacher's Advice Secondary Education:
    + Agr = Agricultural;
    + Ext = Extended primary education;
    + Gen = General;
    + Grls = Secondary school for girls;
    + Man = Manual, including housekeeping;
    + None = No further education;
    + Uni = Pre-University.

##MCA

We use the first two dimensions, which have eigenvalues 0.5391591 and 0.3915176. The category quantifications from MCA for each variable are in Figure 7. Small croissants are visible throughout.
Figure 7: GALO Category Quantifications

The croissant is somewhat clearer if we put the category quantifications of all variables in a single plot.
Figure 8: GALO Category Quantifications

We plot the object scores in a somewhat nonstandard way, by putting unnormalized object scores (centroids of category quantifications) within the hull of the category quantifications. This illustrates that, at least to some extent, the shape of the object score plot is determined by the shape of the category quantification plot. We see an obvious croissant for these data.
Figure 9: GALO Object Scores in Convex Hull of Quantifications

##Two-step MCA: PRIMALS

One of the standard ways to get rid of MCA croissants is to impose constraints on the category quantifications. In his canonical MCA paper @guttman_41 already argued that only the first dimension of an MCA should be used in linear multivariate analysis, because the remaining dimensions are components of non-linearity. The MCA does not decompose variance, as PCA does in classical multivariate analysis, but it decomposes chi-square, and thus its interpretation should be completely different. There are various ways to incorporate this insight. The simplest one is to use the first MCA dimension to quantify the variables, and to use these quantified variables in a standard linear technique such as PCA or regression. The combination of MCA quantification and PCA is sometimes called PRIMALS [@meulman_gifi_81]. The correlation matrix of the quantified variables is

```
##           SEX        IQ       ADV       SES
## SEX 1.0000000 0.2250425 0.1530414 0.2624973
## IQ  0.2250425 1.0000000 0.7838499 0.3591835
## ADV 0.1530414 0.7838499 1.0000000 0.3809945
## SES 0.2624973 0.3591835 0.3809945 1.0000000
```

with eigenvalues (divided by 4)

```
## [1] 0.53915911 0.23737936 0.17062083 0.05284071
```

and eigenvectors

```
##            [,1]       [,2]       [,3]        [,4]
## [1,] -0.2968564  0.8447797 -0.4381085  0.07927491
## [2,] -0.5990117 -0.2807262 -0.2625353 -0.70246210
## [3,] -0.5930407 -0.3551810 -0.1553433  0.70570362
## [4,] -0.4487360  0.2852833  0.8455794 -0.04738025
```

Using the first two eigenvectors we can make a PCA plot.
Figure 10: PRIMALS GALO PCA Plot
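
The two-step computation described above can be sketched in a few lines. This is a minimal sketch, assuming burtTable() from the appendix has been sourced and x is the matrix of GALO category codes; it is not the actual PRIMALS program, but it should reproduce the induced correlation matrix and the eigenvalues shown above (up to numerical detail).

```r
# Sketch of the PRIMALS two-step idea. Step 1: the first non-trivial MCA
# dimension gives a single quantification for each variable. Step 2: ordinary
# PCA of the correlation matrix of the quantified variables.
primalsSketch <- function (x) {
  m <- ncol (x)
  b <- burtTable (x)
  d <- 1 / sqrt (diag (b$c))
  e <- b$c * outer (d, d)
  y <- d * eigen (e, symmetric = TRUE)$vectors[, 2]        # first non-trivial dimension
  g <- ifelse (outer (rep (1:m, b$ord), 1:m, "=="), 1, 0)  # category-to-variable map
  h <- b$g %*% (y * g)                                     # quantified variables
  r <- cor (h)                                             # induced correlation matrix
  ev <- eigen (r)
  list (r = r, eval = ev$values / m, evec = ev$vectors)
}
```
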

To show how this analysis is related to MCA, we first normalize the category quantifications of variable $j$ to $y_j'D_jy_j=1$. We then create the first column of $Y$ by multiplying the $y_j$ by the corresponding entries of the first eigenvector of the correlation matrix, and we do the same for the second column of $Y$ with the second eigenvector. We still have $Y'DY=I$, while $\frac{1}{m}\mathbf{tr}\ Y'CY$ is equal to the sum of the first two eigenvalues of the correlation matrix (divided by $m$), i.e. 0.7765385. For the MCA the corresponding sum is 0.9306767.
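
A minimal sketch of this construction, assuming yj is a list with the normalized first-dimension quantifications $y_j$ and a is the matrix of eigenvectors of the induced correlation matrix:

```r
# Rank-one PRIMALS quantifications: column s of the block for variable j is
# a[j, s] * y_j, so the p columns for a variable are proportional to each other.
primalsY <- function (yj, a, p = 2) {
  do.call (rbind, lapply (1:length (yj), function (j) outer (yj[[j]], a[j, 1:p])))
}
```
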
Figure 11: PRIMALS GALO Category Quantifications

The corresponding object scores are the left singular vectors of the matrix of quantified variables. There is no croissant in sight any more.
Figure 12: PRIMALS GALO Object Scores

##Nonlinear PCA: PRINCALS

The PRIMALS approach is a two-step approach, in the sense that we first maximize by choosing the largest eigenvalue of the Burt table and then we maximize by choosing the largest eigenvalues of the induced correlation matrix. Thus there are two different criteria involved, and this could be seen as a disadvantage. In the PRINCALS approach we choose quantifications of the variables that maximize the sum of the first p eigenvalues of the induced correlation matrix. For $p=1$ this is PRIMALS, but for $p>1$ the two approaches are different. We perform the necessary computations, in the GALO case for $p=2$, using the homals package [@deleeuw_mair_A_09a]. Alternatively, the aspect package [@mair_deleeuw_A_10] could be used. The correlation matrix of the quantified variables is

```
##             SEX         IQ         ADV       SES
## SEX  1.00000000 0.01951188 -0.07525007 0.3791437
## IQ   0.01951188 1.00000000  0.58402262 0.4182533
## ADV -0.07525007 0.58402262  1.00000000 0.3333477
## SES  0.37914374 0.41825334  0.33334774 1.0000000
```

with eigenvalues (divided by 4)

```
## [1] 0.4825477 0.3013908 0.1146095 0.1014520
```

and eigenvectors

```
##            [,1]       [,2]       [,3]         [,4]
## [1,] -0.1899217  0.8085833  0.5568847 -0.001470354
## [2,] -0.5980856 -0.2443347  0.1488185 -0.748630276
## [3,] -0.5556153 -0.3817234  0.3664577  0.641317128
## [4,] -0.5454494  0.3752077 -0.7303706  0.168115701
```

The largest eigenvalue is now 0.4825477, while for PRIMALS it was the larger value 0.5391591. But the sum of the two largest eigenvalues for PRINCALS is 0.7839385, while for PRIMALS it is the smaller value 0.7765385. Using the first two eigenvectors of the induced correlation matrix we can make a PCA plot.
Figure 13: PRINCALS PCA Plot
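
For reference, a hedged sketch of how the computation mentioned above might be invoked with the homals package. The argument names (ndim, rank, level) and the data frame name galo are our assumptions and should be checked against the package documentation; this is not necessarily the call used for the figures.

```r
library (homals)
# Nonlinear PCA of the GALO data (assumed to be a data frame of factors):
# two dimensions, rank-one (single) nominal quantifications for every
# variable, which is the PRINCALS-type restriction.
fit <- homals (galo, ndim = 2, rank = 1, level = "nominal")
```
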

In the same way as for PRIMALS we can make the PRINCALS category quantifications and object scores.
Figure 14: PRINCALS GALO Category Quantifications

Figure 15: PRINCALS GALO Object Scores

##Simultaneous Bilinearizing MCA

A more complete and more revealing analysis is possible by using the theory in @deleeuw_C_82, see also @bekker_deleeuw_C_88 and @deleeuw_E_15g. We study an alternative way of diagonalizing the matrix $E=D^{-\frac12}CD^{-\frac12}$. Suppose we can find square orthonormal $K_j$ such that $K_j'K_j=I$ and $K_j'E_{j\ell}K_\ell$ is diagonal for all $j,\ell=1,\cdots,m$, say equal to the diagonal matrices $\Gamma_{j\ell}$. The condition is that if two submatrices of $E$ have a subscript in common, then they also have the corresponding set of singular vectors in common. @deleeuw_A_88a calls this *simultaneous bilinearizability*. Since that name is a bit verbose, we call the condition *SBL*. We know SBL is possible for the continuous multinormal, with the $K_j$ the Hermite-Chebyshev polynomials. It is also possible if all variables are binary, and it is possible if $m=2$. Those are some important gauges, and we expect SBL to be true approximately in many practical situations.

Define the direct sum $K:=K_1\oplus\cdots\oplus K_m$, and $\Gamma:=K'EK$. Under SBL all $\Gamma_{j\ell}$ are diagonal. We can then find a permutation $P$ of rows and columns such that $R:=P'\Gamma P$ is the direct sum of a number of correlation matrices. If we start with $m$ variables with $k$ categories each, then $R=R_1\oplus\cdots\oplus R_k$, where each $R_s$ is of order $m$. The eigenvectors of $R$ are the direct sum $L:=L_1\oplus\cdots\oplus L_k$, and under SBL we have $$L'RL=L'P'\Gamma PL=L'P'K'EKPL=m\Lambda.$$ Thus under SBL the eigenvectors of $E$ are given by $Z=KPL$, and the eigenvalues of $E$ are also the eigenvalues of $\Gamma$ and of $R$. Thus for each eigenvalue of $E$ we can say which $R_s$ it belongs to.

If the variables have a different number of categories, then under SBL we have $k$ matrices $R_s$, where $k$ is now the maximum of the $k_j$. The $R_s$ are generally of different orders, because variable $j$ is represented in $R_s$ if and only if $k_j\geq s$. Thus a binary variable only occurs in $R_1$ and $R_2$. In all cases the matrix $R_1$ can be chosen to be the $m\times m$ matrix with all elements equal to one. If all variables are binary, the matrix $R_2$ is the matrix of phi-coefficients (point correlations).

Now SBL will not be satisfied precisely for empirical data, except in the trivial cases $k_j\equiv 2$ or $m=2$. To create a suitable data analysis technique we choose the $K_j$ in such a way that the sum of squares of the off-diagonal elements of the $\Gamma_{j\ell}$ is minimized, or, equivalently, the sum of squares of the diagonal elements is maximized. The details and the code are in @deleeuw_R_08a. For this paper the approximate diagonalization procedure is in the function threeStepApprox(), which uses kplSVD() to find the $K_j$ and kplPerm() to find the permutation $P$. Because the SBL theory is somewhat more complicated than other MCA theory, we analyze a simple normal example in detail.
r set.seed (12345) x <- discreteNormal(1000, 4, matrix (.75, 4, 4) + .25 * diag (4), repList (c(-2,2), 4)) b <- burtTable (x) d <- 1 / sqrt (diag (b$c)) e <- b$c * outer (d, d) h <- threeStepApprox (e, rep(3, 4))  The Burt Table is  ## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] ## [1,] 25 0 0 9 16 0 9 16 0 9 16 0 ## [2,] 0 943 0 18 913 12 14 920 9 15 915 13 ## [3,] 0 0 32 0 21 11 0 20 12 0 27 5 ## [4,] 9 18 0 27 0 0 8 19 0 12 15 0 ## [5,] 16 913 21 0 950 0 15 920 15 12 929 9 ## [6,] 0 12 11 0 0 23 0 17 6 0 14 9 ## [7,] 9 14 0 8 15 0 23 0 0 9 14 0 ## [8,] 16 920 20 19 920 17 0 956 0 15 927 14 ## [9,] 0 9 12 0 15 6 0 0 21 0 17 4 ## [10,] 9 15 0 12 12 0 9 15 0 24 0 0 ## [11,] 16 915 27 15 929 14 14 927 17 0 958 0 ## [12,] 0 13 5 0 9 9 0 14 4 0 0 18  The normalized Burt Table $E$ is  ## 1.0000 0.0000 0.0000 0.3464 0.1038 0.0000 0.3753 0.1035 0.0000 0.3674 0.1034 0.0000 ## 0.0000 1.0000 0.0000 0.1128 0.9646 0.0815 0.0951 0.9690 0.0640 0.0997 0.9627 0.0998 ## 0.0000 0.0000 1.0000 0.0000 0.1204 0.4055 0.0000 0.1143 0.4629 0.0000 0.1542 0.2083 ## 0.3464 0.1128 0.0000 1.0000 0.0000 0.0000 0.3210 0.1183 0.0000 0.4714 0.0933 0.0000 ## 0.1038 0.9646 0.1204 0.0000 1.0000 0.0000 0.1015 0.9654 0.1062 0.0795 0.9738 0.0688 ## 0.0000 0.0815 0.4055 0.0000 0.0000 1.0000 0.0000 0.1146 0.2730 0.0000 0.0943 0.4423 ## 0.3753 0.0951 0.0000 0.3210 0.1015 0.0000 1.0000 0.0000 0.0000 0.3831 0.0943 0.0000 ## 0.1035 0.9690 0.1143 0.1183 0.9654 0.1146 0.0000 1.0000 0.0000 0.0990 0.9687 0.1067 ## 0.0000 0.0640 0.4629 0.0000 0.1062 0.2730 0.0000 0.0000 1.0000 0.0000 0.1199 0.2057 ## 0.3674 0.0997 0.0000 0.4714 0.0795 0.0000 0.3831 0.0990 0.0000 1.0000 0.0000 0.0000 ## 0.1034 0.9627 0.1542 0.0933 0.9738 0.0943 0.0943 0.9687 0.1199 0.0000 1.0000 0.0000 ## 0.0000 0.0998 0.2083 0.0000 0.0688 0.4423 0.0000 0.1067 0.2057 0.0000 0.0000 1.0000  and its eigenvalues are  ## 4.0000 2.1028 1.9429 0.9545 0.7356 0.6493 0.6118 0.5338 0.4695 -0.0000 -0.0000 -0.0000  After application of kplSVD() to $E$ we have  ## 1.0000 0.0000 0.0000 1.0000 0.0000 0.0000 1.0000 -0.0000 -0.0000 1.0000 0.0000 -0.0000 ## 0.0000 1.0000 0.0000 -0.0000 0.3313 -0.0209 -0.0000 0.3622 -0.0212 -0.0000 0.3529 -0.0049 ## 0.0000 0.0000 1.0000 0.0000 -0.0226 0.3858 0.0000 -0.0218 0.4459 0.0000 -0.0080 0.1868 ## 1.0000 -0.0000 0.0000 1.0000 -0.0000 -0.0000 1.0000 -0.0000 -0.0000 1.0000 0.0000 -0.0000 ## 0.0000 0.3313 -0.0226 -0.0000 1.0000 -0.0000 0.0000 0.3054 -0.0141 0.0000 0.4590 -0.0102 ## 0.0000 -0.0209 0.3858 -0.0000 -0.0000 1.0000 0.0000 -0.0136 0.2541 0.0000 -0.0117 0.4289 ## 1.0000 -0.0000 0.0000 1.0000 0.0000 0.0000 1.0000 0.0000 -0.0000 1.0000 0.0000 -0.0000 ## -0.0000 0.3622 -0.0218 -0.0000 0.3054 -0.0136 0.0000 1.0000 0.0000 -0.0000 0.3692 -0.0024 ## -0.0000 -0.0212 0.4459 -0.0000 -0.0141 0.2541 -0.0000 0.0000 1.0000 -0.0000 -0.0042 0.1883 ## 1.0000 -0.0000 0.0000 1.0000 0.0000 0.0000 1.0000 -0.0000 -0.0000 1.0000 0.0000 -0.0000 ## 0.0000 0.3529 -0.0080 0.0000 0.4590 -0.0117 0.0000 0.3692 -0.0042 -0.0000 1.0000 0.0000 ## -0.0000 -0.0049 0.1868 -0.0000 -0.0102 0.4289 -0.0000 -0.0024 0.1883 -0.0000 -0.0000 1.0000  Permuting rows and columns gives the matrix $R=R_1\oplus R_2\oplus R_3$  ## 1.0000 1.0000 1.0000 1.0000 0.0000 0.0000 -0.0000 0.0000 0.0000 0.0000 -0.0000 -0.0000 ## 1.0000 1.0000 1.0000 1.0000 -0.0000 -0.0000 -0.0000 0.0000 0.0000 -0.0000 -0.0000 -0.0000 ## 1.0000 1.0000 1.0000 1.0000 -0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 -0.0000 -0.0000 ## 1.0000 1.0000 1.0000 1.0000 -0.0000 0.0000 -0.0000 0.0000 0.0000 0.0000 -0.0000 
-0.0000 ## 0.0000 -0.0000 -0.0000 -0.0000 1.0000 0.3313 0.3622 0.3529 0.0000 -0.0209 -0.0212 -0.0049 ## 0.0000 -0.0000 0.0000 0.0000 0.3313 1.0000 0.3054 0.4590 -0.0226 -0.0000 -0.0141 -0.0102 ## -0.0000 -0.0000 0.0000 -0.0000 0.3622 0.3054 1.0000 0.3692 -0.0218 -0.0136 0.0000 -0.0024 ## 0.0000 0.0000 0.0000 -0.0000 0.3529 0.4590 0.3692 1.0000 -0.0080 -0.0117 -0.0042 0.0000 ## 0.0000 0.0000 0.0000 0.0000 0.0000 -0.0226 -0.0218 -0.0080 1.0000 0.3858 0.4459 0.1868 ## 0.0000 -0.0000 0.0000 0.0000 -0.0209 -0.0000 -0.0136 -0.0117 0.3858 1.0000 0.2541 0.4289 ## -0.0000 -0.0000 -0.0000 -0.0000 -0.0212 -0.0141 0.0000 -0.0042 0.4459 0.2541 1.0000 0.1883 ## -0.0000 -0.0000 -0.0000 -0.0000 -0.0049 -0.0102 -0.0024 -0.0000 0.1868 0.4289 0.1883 1.0000  And we can now diagonalize the blocks.  ## 4.0000 -0.0000 0.0000 -0.0000 -0.0000 -0.0000 0.0000 -0.0000 -0.0000 0.0000 0.0000 -0.0000 ## -0.0000 -0.0000 0.0000 0.0000 -0.0000 -0.0000 0.0000 -0.0000 -0.0000 0.0000 -0.0000 -0.0000 ## 0.0000 0.0000 0.0000 -0.0000 -0.0000 0.0000 -0.0000 -0.0000 -0.0000 0.0000 -0.0000 -0.0000 ## -0.0000 -0.0000 0.0000 -0.0000 0.0000 0.0000 0.0000 -0.0000 -0.0000 0.0000 -0.0000 -0.0000 ## -0.0000 -0.0000 -0.0000 0.0000 2.0923 -0.0000 0.0000 -0.0000 -0.0395 -0.0077 -0.0066 -0.0015 ## -0.0000 -0.0000 0.0000 0.0000 -0.0000 0.7349 -0.0000 0.0000 -0.0037 0.0047 -0.0058 -0.0110 ## 0.0000 0.0000 -0.0000 0.0000 0.0000 -0.0000 0.6412 -0.0000 0.0045 -0.0007 -0.0144 0.0158 ## -0.0000 -0.0000 -0.0000 -0.0000 -0.0000 0.0000 -0.0000 0.5315 -0.0083 -0.0058 0.0038 0.0126 ## -0.0000 -0.0000 -0.0000 -0.0000 -0.0395 -0.0037 0.0045 -0.0083 1.9532 -0.0000 -0.0000 -0.0000 ## 0.0000 0.0000 0.0000 0.0000 -0.0077 0.0047 -0.0007 -0.0058 -0.0000 0.9543 0.0000 0.0000 ## 0.0000 -0.0000 -0.0000 -0.0000 -0.0066 -0.0058 -0.0144 0.0038 -0.0000 0.0000 0.6185 0.0000 ## -0.0000 -0.0000 -0.0000 -0.0000 -0.0015 -0.0110 0.0158 0.0126 -0.0000 0.0000 0.0000 0.4740  and we can sort the diagonal elements  ## 4.0000 2.0923 1.9532 0.9543 0.7349 0.6412 0.6185 0.5315 0.4740 0.0000 -0.0000 -0.0000  to find they are really close to the eigenvalues of $E$. The correlations between the eigenvector of $E$ and the SBL approximations are  ## -1.0000 0.0000 -0.0000 0.0000 0.0000 -0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 -0.0000 ## -0.0000 0.0000 0.0000 -0.0000 -0.9669 -0.0007 0.0007 -0.0014 0.2551 0.0065 0.0043 0.0009 ## -0.0000 0.0000 0.0000 -0.0000 -0.2551 0.0029 -0.0033 0.0057 -0.9669 0.0020 0.0013 0.0003 ## 0.0000 -0.0000 -0.0000 -0.0000 0.0068 0.0215 -0.0023 -0.0138 0.0002 0.9996 -0.0006 -0.0009 ## -0.0000 -0.0000 -0.0000 -0.0000 -0.0003 0.9976 0.0009 -0.0030 0.0030 -0.0216 -0.0500 -0.0420 ## -0.0000 -0.0000 -0.0000 -0.0000 0.0019 0.0182 -0.9045 0.0044 0.0032 -0.0022 0.4180 -0.0826 ## 0.0000 0.0000 0.0000 0.0000 0.0041 0.0472 0.4154 0.0503 -0.0008 0.0011 0.9057 0.0486 ## 0.0000 -0.0000 -0.0000 0.0000 0.0002 0.0091 -0.0356 0.9786 0.0059 0.0134 -0.0490 0.1957 ## 0.0000 -0.0000 -0.0000 -0.0000 -0.0009 -0.0404 0.0901 0.1987 0.0007 0.0029 0.0020 -0.9751 ## -0.0000 -0.4411 -0.4836 0.7560 0.0000 -0.0000 0.0000 -0.0000 -0.0000 -0.0000 -0.0000 0.0000 ## 0.0000 0.0000 0.8424 0.5389 0.0000 -0.0000 -0.0000 0.0000 -0.0000 0.0000 -0.0000 -0.0000 ## -0.0000 -0.8975 0.2377 -0.3716 0.0000 -0.0000 0.0000 0.0000 -0.0000 0.0000 -0.0000 0.0000  Thus the eight non-trivial eigenvectors of $E$ correspond, repectively, with first of $R_2$, first of $R_3$, second of $R_3$, second of $R_2$, third of $R_2$, third of $R_3$, fourth of $R_2$, fourth of $R_3$. 
In terms of the polynomials underlying the category quantifications this means linear, quadratic, quadratic, linear, linear, quadratic, linear, quadratic.

Back to GALO. For GALO the eigenvalues of $E/4$ are

```
## 1.0000 0.5392 0.3915 0.3833 0.3465 0.3083 0.2764 0.2702 0.2569 0.2517 0.2454 0.2414 0.2379 0.2282 0.2183 0.1902 0.1706 0.1519 0.1283 0.1150 0.0490 -0.0000 -0.0000 -0.0000
```

The direct sum of the eigenvalues of the $R_s/m$, which have orders 4, 4, 3, 3, 3, 3, 2, 1, 1, is

```
## 1.0000 0.0000 0.0000 -0.0000
## 0.5376 0.2471 0.1636 0.0517
## 0.3908 0.2425 0.1167
## 0.3505 0.2442 0.1553
## 0.3238 0.2527 0.1735
## 0.2688 0.2606 0.2206
## 0.2716 0.2284
## 0.2500
## 0.2500
```

and if we sort these we find

```
## 1.0000 0.5376 0.3908 0.3505 0.3238 0.2716 0.2688 0.2606 0.2527 0.2500 0.2500 0.2471 0.2442 0.2425 0.2284 0.2206 0.1735 0.1636 0.1553 0.1167 0.0517 0.0000 0.0000 -0.0000
```

Clearly the approximate eigenvalues from the KPL diagonalization and the actual eigenvalues are very similar, especially the larger ones and the smaller ones.

The correlations between the actual eigenvectors and the approximate eigenvectors are collected in $Q:=Z'KPL$. Let us look at the first three non-trivial eigenvectors of $E$, which correspond with rows 2, 3, and 4 of $Q$. We regress them on the non-trivial columns of $KPL$, which are all columns except the first four. The proportions of variance explained by the columns of $KPL$ are the squares of the elements of rows 2, 3, and 4 of $Q$.

```
## 0.99 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## 0.00 0.00 0.00 0.00 0.98 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## 0.00 0.31 0.01 0.00 0.01 0.01 0.00 0.20 0.07 0.00 0.26 0.02 0.00 0.01 0.08 0.00 0.00 0.00 0.00 0.01
```

Thus we see that the first non-trivial eigenvector of $E$ corresponds with the first eigenvector of $R_2$, while the next eigenvector of $E$ corresponds with the first eigenvector of $R_3$. The third non-trivial eigenvector of $E$ is most closely related to the second eigenvector of $R_2$, but there are also contributions of the first eigenvectors of $R_4$ and $R_5$.

There is a somewhat more rigorous method to match the eigenvectors in $Z$ and $KPL$. The Hadamard product $Q*Q$ is doubly stochastic. Finding the permutation matrix closest in the least squares sense to $Q*Q$ is a linear assignment problem, which can be solved by using lp.assign() from the lpSolve package [@lpsolve_15] (see the short code sketch below). We indicate for each row the column where the permutation matrix has a one.

```
##  [1]  1  5  9  6 12 15 21 18 23 24 13 16 10 22 20  7 17 14 19 11  8  2  3
## [24]  4
```

Again this indicates that the first three non-trivial eigenvectors of $E$ correspond with the dominant eigenvalue of $R_2$, the dominant eigenvalue of $R_3$, and the second eigenvalue of $R_2$.

If the eigenvectors in $K$ correspond with the orthogonal polynomials, as they do in the multinormal case, then the first two non-trivial eigenvectors produce a croissant, because they come from the linear and quadratic transformations, respectively. But if we select the first and the third non-trivial eigenvector of $E$ to plot, then they both come from the linear quantifications and they do not produce a croissant. We first plot $Y=D^{-\frac12}KPL$, using the columns corresponding to the first two non-trivial eigenvectors of $E$. This shows the croissant.
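
As announced above, the matching step can be sketched in a few lines. This is a minimal sketch, assuming q contains the matrix $Q=Z'KPL$ of correlations between the exact and the approximate eigenvectors.

```r
library (lpSolve)
# The permutation matrix closest to Q * Q in the least squares sense is the
# one maximizing tr (P' (Q * Q)), which is a linear assignment problem.
matchEigenvectors <- function (q) {
  a <- lp.assign (q ^ 2, direction = "max")
  apply (a$solution, 1, which.max)   # for each eigenvector of E, its match in KPL
}
```
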
Figure 16: SBL GALO Category Quantifications, Different R

If we choose the first and second eigenvector of $R_2$, i.e. the solution corresponding to the first and third non-trivial eigenvectors of $E$, there is no croissant. In fact, the solution is similar to the one given by PRIMALS, consisting of a line through the origin for each variable, with the category quantifications on the line.
Figure 17: SBL GALO Category Quantifications, Same R

We can compare this to the solution we get by actually plotting the first and third non-trivial eigenvectors of $E$. There is no croissant in this case either.
Figure 18: GALO Category Quantifications, Skip One Dimension

#Real Data: Norway

The data are register data from Norway, with information on a large number (n = 159539) of individuals. The analysis is part of a social mobility study conducted by Arild Danielsen, Department of History, Sociology and Innovation, Faculty of Economics and Social Sciences, University College of Southeast Norway. See @danielsen_15. The data have four variables, giving the class-fractions (based on occupational categories) of the individual's two parents (in 2008) and two grandfathers (in 1980). Thus we have information for each of the 159539 individuals about the membership of Mother, Father, Father's Father, and Mother's Father in the following categories.

1. Cultural elite
2. Professional elite
3. Economic elite
4. Cultural upper-middle
5. Professional upper-middle
6. Economic upper-middle
7. Cultural lower-middle
8. Professional lower-middle
9. Economic lower-middle
10. Skilled workers
11. Unskilled and semi-skilled workers
12. Benefits (family / unempl. / social)
13. Not possible to classify
14. Missing observations

In order to simplify our analysis we decided to eliminate all individuals with scores in categories 12-14. This reduces the number of observations to 81259. For the actual sociological analysis this would probably be a wasteful and unwise decision, but for our croissant illustration it makes sense. The marginals of the remaining eleven categories are as follows.

```
##      [,1] [,2] [,3] [,4]  [,5]  [,6] [,7]  [,8] [,9] [,10] [,11]
## [1,]  716 3290 1117 2081  4902  6618 1308  3229 4799 18730 34469
## [2,]  650 3451 1072 1996  4987  6873 1384  3265 5030 18868 33683
## [3,] 1373 3839 3710 4198 11458 11078  860  4534 8630 17506 14073
## [4,]  639 1069  284 6466 10570  2587 3090 14977 7225 17579 16773
```

##MCA

The first two eigenvalues of an MCA on these data are 0.4486663 and 0.3095208. We plot the category quantifications of the four variables. Calling the shape of the plot a croissant is a stretch; it is better to call it a *wedge*.
Figure 19: Norway Category Quantifications, Joint

The first nine categories have a factorial structure, combining the three possibilities Cultural, Professional, Economic with the three levels Elite, Upper Middle, and Lower Middle. We use this factorial structure to connect the nine categories by lines. The line "Cultural" connects points 1, 4, 7, the line "Elite" connects points 1, 2, 3, and so on. In total there are six lines. Categories 10 and 11, skilled and unskilled labour, are not connected. We'll leave the interpretation to people who know what they are talking about.
Figure 20: Norway Category Quantifications, Separate

We can replot the separate category quantifications, labeled by the category frequencies. Obviously the sharp point of the wedge corresponds with the categories with the highest frequencies, illustrating the general point that in MCA heavily populated categories tend to be close to the origin of the plot.
Figure 21: Norway Category Quantifications, Frequencies

The 81259 object scores form a very sharp and well-filled wedge.
Figure 22: Norway Object Scores

##Additive Constraints

In Figure 20 we used the factorial $3\times 3$ structure of the first nine categories to draw a grid in the plot. Instead, we can require that the nine points for each variable are on a regular grid, by using a design matrix that forces the "Cultural Elite" category point to be the sum of a "Cultural" point and an "Elite" point, and so on for all nine combinations. The grids for the four variables are generally different. The MCA under these constraints has first two eigenvalues 0.4470581 and 0.3038511, which are only marginally smaller than the unconstrained eigenvalues 0.4486663 and 0.3095208. The grid lines clearly display the structure within the wedge.
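
One way such an additive design matrix could be set up is sketched below. This is a hypothetical illustration of the bookkeeping, not the actual code used for the Norway analysis; the constraint is that the quantifications of each variable are of the form $y_j=Sa_j$.

```r
# Design matrix S for the 3 x 3 factorial part of the eleven categories: each
# of the nine class-fraction points is the sum of a "fraction" point (Cultural,
# Professional, Economic) and a "level" point (Elite, Upper-middle,
# Lower-middle). Categories 10 and 11 (skilled, unskilled) remain free.
fraction <- kronecker (matrix (1, 3, 1), diag (3))   # 9 x 3, fraction varies fastest
level <- kronecker (diag (3), matrix (1, 3, 1))      # 9 x 3, level varies slowest
s <- rbind (cbind (fraction, level, matrix (0, 9, 2)),
            cbind (matrix (0, 2, 6), diag (2)))      # 11 x 8 design matrix
# The constrained MCA restricts the category quantifications of each variable
# to the column space of s, i.e. y_j = s %*% a_j.
```
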
Figure 23: Norway Category Quantifications, Additive

It is somewhat surprising that the object scores show a pattern which is quite different from the unconstrained scores in Figure 22. The wedge is still there, but it is clearly partitioned into five or six parallel clouds of individuals.
Figure 24: Norway Object Scores, Additive

##Additive and Equality Constraints

We can go a step further in constraining the solution by requiring additivity, as before, and in addition requiring that all four grids are the same. The MCA under these constraints has first two eigenvalues 0.4227509 and 0.2935249, which are somewhat smaller than the previous eigenvalues.
Figure 25: Norway Category Quantifications, Additive and Equality

The object scores show the same clouds within the wedge, but because many more individuals get the same scores there are fewer points and the clouds are far less dense.
Figure 26: Norway Object Scores, Additive and Equality

##Simultaneous Bilinearizing MCA

Using additivity does not really get rid of the wedge, but allows us to study it in more detail and take it apart. SBL theory gives us another tool to look at the wedge. For Norway the eigenvalues of $E/4$ are

```
## 1.0000 0.4487 0.3095 0.2912 0.2783 0.2745 0.2620 0.2607 0.2594 0.2585 0.2569 0.2561 0.2550 0.2537 0.2527 0.2517 0.2512 0.2501 0.2498 0.2484 0.2481 0.2477 0.2470 0.2468 0.2457 0.2449 0.2446 0.2426 0.2425 0.2414 0.2387 0.2380 0.2365 0.2339 0.2304 0.2288 0.2198 0.2134 0.1984 0.1845 0.1580 0.0000 -0.0000 -0.0000
```

The direct sum of the eigenvalues of the $R_s/m$ is

```
## 1.0000 0.0000 -0.0000 0.0000
## 0.4486 0.2054 0.1869 0.1591
## 0.3081 0.2430 0.2407 0.2082
## 0.2909 0.2470 0.2413 0.2208
## 0.2742 0.2479 0.2452 0.2328
## 0.2766 0.2497 0.2438 0.2299
## 0.2597 0.2530 0.2482 0.2391
## 0.2577 0.2522 0.2471 0.2430
## 0.2576 0.2500 0.2466 0.2457
## 0.2577 0.2510 0.2488 0.2425
## 0.2551 0.2534 0.2497 0.2418
```

Note that each row here adds up to one. Also note that the orders of the matrices $R_s$ in this case are 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, because all variables have 11 categories. If we sort the eigenvalues of all $R_s/m$ we find

```
## 1.0000 0.4486 0.3081 0.2909 0.2766 0.2742 0.2597 0.2577 0.2577 0.2576 0.2551 0.2534 0.2530 0.2522 0.2510 0.2500 0.2497 0.2497 0.2488 0.2482 0.2479 0.2471 0.2470 0.2466 0.2457 0.2452 0.2438 0.2430 0.2430 0.2425 0.2418 0.2413 0.2407 0.2391 0.2328 0.2299 0.2208 0.2082 0.2054 0.1869 0.1591 0.0000 0.0000 -0.0000
```

The two sets of sorted eigenvalues are plotted in Figure 27. It is pretty obvious they are close.
Figure 27: Eigenvalues of E and R for Norway Example

The assignment problem for the eigenvectors gives the solution

```
##  [1]  1  5  9 13 21 17 25 33 29 26 37 42 30 22 41 38 43 31 27 34 19 39 18
## [24] 35 36 14 40 32 10 23 15 28 11 44 20 24 16 12  6  7  8  2  3  4
```

Again we see the familiar pattern: the leading eigenvalues of $E$ are the first eigenvalue of the (trivial) $R_1$, followed by the first eigenvalues of $R_2$, $R_3$, and $R_4$. The remaining eigenvalues of $R_2$ are actually the smallest ones of the whole sequence, except for the trivial zeroes. This tells us that these data are highly one-dimensional. We see the familiar wedge again in Figure 28, which plots the dominant eigenvectors corresponding to $R_2$ and $R_3$.
Figure 28: SBL Norway Category Quantifications, Different R

#Appendix: Code r discreteNormal <- function(n, m, r, knots) { x <- matrix(rnorm(n * m), n, m) x <- x %*% chol(r) for (j in 1:m) { x[, j] <- apply(outer(x[, j], c(knots[[j]], Inf), ">"), 1, which.min) } return(x) } repList <- function(x, m) { z <- vector("list", m) return(lapply(z, function(i) x)) } burtTable <- function(x, degrees = rep(-1, ncol(x)), knots = NULL, center = FALSE, standardize = FALSE, orthonormalize = FALSE) { n <- nrow(x) m <- ncol(x) g <- matrix(0, n, 0) l <- rep(0, m) for (j in 1:m) { z <- x[, j] if (degrees[j] < 0) { h <- ifelse(outer(z, sort(unique(z)), "=="), 1, 0) } else { h <- bsplineBasis(z, degrees[j], knots[[j]]) } if (center) { h <- center(h)[, -1] } if (standardize) { h <- standardize(h) } if (orthonormalize) { h <- gs(h)$q } g <- cbind(g, h) l[j] <- ncol(h) } return(list(c = crossprod(g), g = g, ord = l)) } directSum <- function(x) { m <- length(x) nr <- sum(sapply(x, nrow)) nc <- sum(sapply(x, ncol)) z <- matrix(0, nr, nc) kr <- 0 kc <- 0 for (i in 1:m) { ir <- nrow(x[[i]]) ic <- ncol(x[[i]]) z[kr + (1:ir), kc + (1:ic)] <- x[[i]] kr <- kr + ir kc <- kc + ic } return(z) } kplPerm <- function(cc, k) { kl <- unlist(sapply(k, function(i) 1:i)) p <- ifelse(outer(1:sum(k), order(kl), "=="), 1, 0) return(list(pcp = t(p) %*% cc %*% p, perm = p, ord = as.vector(table(kl)))) } makeE <- function(cc, k) { dd <- mInvSqrt(blockSelect(cc, k)) return(dd %*% cc %*% dd) } mInvSqrt <- function(x) { ex <- eigen(x) ew <- abs(ex$values) ev <- ifelse(ew == 0, 0, 1/sqrt(ew)) ey <- ex$vectors return(ey %*% (ev * t(ey))) } blockSelect <- function(cc, k) { l <- unlist(lapply(1:length(k), function(i) rep(i, k[i]))) return(cc * ifelse(outer(l, l, "=="), 1, 0)) } kplSVD <- function(e, k, eps = 1e-06, itmax = 500, verbose = TRUE, vectors = TRUE) { m <- length(k) sk <- sum(k) ll <- kk <- ww <- diag(sk) itel <- 1 ossq <- 0 klw <- 1 + cumsum(c(0, k))[1:m] kup <- cumsum(k) ind <- lapply(1:m, function(i) klw[i]:kup[i]) for (i in 1:m) kk[ind[[i]], ind[[i]]] <- t(svd(e[ind[[i]], ])$u) kek <- kk %*% e %*% t(kk) for (i in 1:m) for (j in 1:m) ww[ind[[i]], ind[[j]]] <- ifelse(outer(1:k[i], 1:k[j], "=="), 1, 0) repeat { for (l in 1:m) { if (k[l] == 2) (next)() li <- ind[[l]] for (i in (klw[l] + 1):(kup[l] - 1)) for (j in (i + 1):kup[l]) { bi <- kek[i, -li] bj <- kek[j, -li] wi <- ww[i, -li] wj <- ww[j, -li] acc <- sum(wi * bi^2) + sum(wj * bj^2) acs <- sum((wi - wj) * bi * bj) ass <- sum(wi * bj^2) + sum(wj * bi^2) u <- eigen(matrix(c(acc, acs, acs, ass), 2, 2))$vectors[, 1] c <- u[1] s <- u[2] kek[-li, i] <- kek[i, -li] <- c * bi + s * bj kek[-li, j] <- kek[j, -li] <- c * bj - s * bi if (vectors) { ki <- kk[i, li] kj <- kk[j, li] kk[i, li] <- c * ki + s * kj kk[j, li] <- c * kj - s * ki } } } nssq <- sum(ww * kek^2) - sum(diag(kek)^2) if (verbose) cat("Iteration ", formatC(itel, digits = 4), "ssq ", formatC(nssq, digits = 10, width = 15), "\n") if (((nssq - ossq) < eps) || (itel == itmax)) (break)() itel <- itel + 1 ossq <- nssq } return(list(kek = kek, kk = kk, itel = itel, ssq = nssq)) } inducedR <- function(c, y, k) { m <- length(k) l <- unlist(lapply(1:m, function(i) rep(i, k[i]))) g <- ifelse(outer(l, 1:m, "=="), 1, 0) s <- g * matrix(y, length(y), m) r <- crossprod(s, c %*% s) e <- abs(diag(r)) d <- ifelse(e == 0, 0, e) return(r/sqrt(outer(d, d))) } threeStepApprox <- function(e, k, eps = 1e-06, itmax = 500, verbose = FALSE, vectors = TRUE) { ee <- eigen(e) ev <- ee$vectors ea <- ee$values f <- kplSVD(e, k, eps = eps, itmax = itmax, verbose = verbose, vectors = TRUE) ef <- f$kek kf <- f$kk g 
<- kplPerm(ef, k) eg <- g$pcp kg <- crossprod(g$perm, kf) gg <- g$ord gl <- cumsum(gg) lg <- 1 + c(0, gl)[-length(gl) - 1] bb <- lapply(1:length(gl), function(i) eigen(eg[lg[i]:gl[i], lg[i]:gl[i]])\$vectors) vc <- directSum(bb) eh <- crossprod(vc, eg %*% vc) kh <- crossprod(kg, vc) cc <- crossprod(ev, kh) mc <- apply(cc, 1, function(x) max(abs(x))) wc <- apply(cc, 1, function(x) which.max(abs(x))) yc <- apply(cc, 2, function(x) max(abs(x))) vc <- apply(cc, 2, function(x) which.max(abs(x))) return(list(eval = ea, evec = ev, aval = diag(eh), avec = kh, bval = ef, cval = eg, dval = eh, cc = cc, mc = mc, wc = wc, yc = yc, vc = vc, gg = gg)) }  #Appendix: NEWS 0.01 12/04/15 * First working version posted 0.02 12/06/15 * more runs of examples 0.03 12/07/15 * normal examples 1.00 12/08/15 * added text * added constrained examples * first published version 1.01 12/09/15 * GALO text * GALO single * GALO PCA 1.02 12/10/15 * subsections * removed Norway data from repository * short MCA introduction * added PRIMALS section to GALO * added PRINCALS section to GALO 1.03 12/12/15 * code for three-step MCA * GALO analysis by three-step MCA 1.04 12/13/15 * GALO analysis by three-step MCA * intro theory for simultaneous bilinearity 2.00 12/14/15 * GALO analysis with SBL completed * Norway analysis with SBL * That's it for now 2.01 12/16/15 * small detailed SBL example #References