# Exceedingly Simple Isotone Regression with Ties Jan de Leeuw Version 001, January 19, 2016 Note: This is a working paper which will be expanded/updated frequently. The directory [gifi.stat.ucla.edu/isotone](http://gifi.stat.ucla.edu/isotone) has a pdf copy of this article, the complete Rmd file that includes all code chunks, and R and FORTRAN files with the code. Suggestions are welcome 24/7. #Problem In *least squares monotone regression with a simple order* we minimize a loss function of the form \begin{equation} \sigma(x)=\sum_{i=1}^n w_i(x_i-y_i)^2 \end{equation} over $x_1\leq x_2\leq\cdots\leq x_n$. The vector $w$ contains positive *weights*. The vector $y$ is the *target*. Monotone regression projects the target on the cone defined by the inequality constraints. There are many `R` implementations available in various CRAN packages, starting with `isoreg()` in the `stats` package (if the weights are all equal). The function `amalgm()` in this paper is another such implementation, merely an `R` wrapper for a double precision modification of the `FORTRAN` code of @cran_80. We could also have used the `C` implementation in recent versions of the `smacof` package. Various additions and improvements of the original package of @deleeuw_mair_A_09c are discussed in @mair_deleeuw_groenen_U_15. If there are ties in the data the required order on $x$ needs to take those into account. There are three basic approaches. The first two are described by @kruskal_64b, the third is in @deleeuw_A_77. In the primary approach. In the *primary approach* we require ordering only between tie blocks, not within tieblocks. The *secondary approach* requires equality within tie blocks and ordering between tie blocks, the *tertiary approach* only requires ordering of the tie block means. So if the data used to generate the order are $$ \begin{bmatrix}2&2&3&1&3&3&1&2&1\end{bmatrix} $$ then in the primary approach we require $$ \{x_4,x_7,x_9\}\leq\{x_1,x_2,x_8\}\leq\{x_3,x_5,x_6\}, $$ in the secondary approach this becomes $$ x_4=x_7=x_9\leq x_1=x_2=x_8\leq x_3=x_5=x_6, $$ and in the tertiary approach it is $$ \frac{x_4+x_7+x_9}{3}\leq\frac{x_1+x_2+x_8}{3}\leq\frac{x_3+x_5+x_6}{3} $$ #Implementation The code in the appendix implements the three approaches to ties by using a list of indices, computed by ```r f <- sort(unique(x)) g <- lapply(f, function (z) which(x == z)) ``` Thus if the data are ``` ## [1] 2 2 3 1 3 3 1 2 1 ``` then the list is ``` ## [[1]] ## [1] 4 7 9 ## ## [[2]] ## [1] 1 2 8 ## ## [[3]] ## [1] 3 5 6 ``` The function `isotone()` uses simple list manipulations to get the data in the correct form for a call to `amalgm()`. Study it at your leisure. #Example Again we use the same data to generate the order ``` ## [1] 2 2 3 1 3 3 1 2 1 ``` We also choose the target ``` ## [1] 2 1 6 5 4 7 8 9 3 ``` The primary approach gives ``` ## [1] 4.000000 4.000000 6.333333 4.000000 6.333333 7.000000 4.000000 6.333333 ## [9] 3.000000 ``` with a loss function value of 42.6666667. For the secondary approach the projection is ``` ## [1] 4.666667 4.666667 5.666667 4.666667 5.666667 5.666667 4.666667 4.666667 ## [9] 4.666667 ``` with a loss function value of 58. And for the tertiary approach we find ``` ## [1] 2.666667 1.666667 6.000000 4.333333 4.000000 7.000000 7.333333 9.666667 ## [9] 2.333333 ``` with a loss function value of only 2.6666667. We show the transformations of the data in figure 1.