---
title: "Least Squares Solutions of Linear Inequality Systems"
author: "Jan de Leeuw"
date: "Version 21, December 20, 2016"
output:
pdf_document:
keep_tex: yes
number_sections: yes
toc: yes
toc_depth: 3
html_document:
keep_md: yes
number_sections: yes
toc: yes
toc_depth: 3
fontsize: 12pt
graphics: yes
bibliography: linineq.bib
abstract: We discuss the problem of finding an approximate solution to an overdetermined
system of linear inequalities, or an exact solution if the system is consistent.
Theory and R code is provided for four different algorithms. Two techniques use
active set methods for non-negatively constrained least squares, one uses alternating
least squares, and one uses a nonsmooth Newton method.
---
```{r function_code, echo = FALSE}
source("linInEq.R")
source("matrix.R")
```
```{r packages, echo = FALSE}
options (digits = 10)
library(captioner)
library(lsei)
```
```{r equation_numbering, echo = FALSE}
figure_nums <- captioner (prefix = "Figure")
theorem_nums <- captioner (prefix = "Theorem")
lemma_nums <- captioner (prefix = "Lemma")
corollary_nums <- captioner (prefix = "Corollary")
figure_nums (name = "first_example", caption = "Example Cubic", display = FALSE)
figure_nums (name = "sublevel", caption = "Sublevel Majorization", display = FALSE)
```
Note: This is a working paper which will be expanded/updated frequently. All suggestions for improvement are welcome. The directory [gifi.stat.ucla.edu/linineq](http://gifi.stat.ucla.edu/linineq) has a pdf version, the complete Rmd file with all code chunks, the bib file,
and the R source code.
#Introduction
For another project (@deleeuw_E_16r) I needed a quick way to check if a system of inhomogeneous linear inequalities $Ax\leq b$ is consistent (i.e. has at least one solution). There is a substantial literature on this problem, but not much R code.
The approach we take here is to minimize the least squares loss function
\begin{equation}\label{E:loss}
\sigma(x,y)=\frac12\mathbf{SSQ}(Ax-b+y)=\frac12\mathbf{SSQ}(\begin{bmatrix}A&I\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix}-b),
\end{equation}
with $\mathbf{SSQ}$ the sum of squares, which we minimize over $x$ and over the *slack variables* $y\geq 0$. For brevity, we will call this problem
$\mathcal{P}$. We call a solution $(x,y)$ of $\mathcal{P}$ the *augmented least squares solution of the system of linear inequalities*.
For homogeneous linear inequalities least squares solutions were already computed in @deleeuw_R_68d and @deleeuw_R_69c, in some shaky papers
in which *alternating least squares* was first introduced as a general principle of algorithm construction. For homogeneous inequalities there is
always a (trivial) solution with both $x=0$ and $y=0$, so an additional constraint such as $\|x\|=1$ or $\|Ax\|=1$ or $\|y\|=1$ was used in these early papers.
Clearly $\sigma$ of $\eqref{E:loss}$ is a convex quadratic in $(x,y)$, bounded below by zero, and by the Frank-Wolfe theorem (@frank_wolfe_56) it attains its minimum at some $x$ and $y\geq 0$. The minimum is equal to zero if and only if the system is consistent. The matrix $\begin{bmatrix}A&I\end{bmatrix}$ has a nontrivial null space, however, which means the minimizer is not necessarily unique.
Initially there are no constraints on the shape of $A$: it can be singular, it can be tall, it can be wide. But we can do some preprocessing and replace $A$ by an orthonormal basis for its column space. Thus we suppose, without loss of generality, that from now on $A$ is an $n\times m$ matrix satisfying $A'A=I$, which eliminates at least one obvious type of non-uniqueness.
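This preprocessing is straightforward in base R. The sketch below uses a pivoted QR decomposition as a hypothetical stand-in for the `gsRC` routine sourced with the other code (the function name here is invented for illustration):

```r
# A hypothetical stand-in for the gsRC routine used in the examples
# below: replace A by an orthonormal basis of its column space, so
# that A'A = I afterwards.
orthonormalBasis <- function (a) {
  d <- qr (a)  # base R qr () pivots, so rank-deficient input is fine
  qr.Q (d)[, seq_len (d$rank), drop = FALSE]
}

aOrth <- orthonormalBasis (matrix (rnorm (200), 100, 2))
max (abs (crossprod (aOrth) - diag (2)))  # numerically zero
```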
We analyze $\mathcal{P}$ using theorem 31.4 of @rockafellar_70. The set of all $(x,y)\in\mathbb{R}^m\otimes\mathbb{R}^n$ with $y\geq 0$ is a closed convex cone $\mathcal{K}$. The negative of the polar of $\mathcal{K}$ is the set $\mathcal{K}^\star=\{(u,v)\mid u'x + v'y\geq 0\text{ for all }(x,y)\in\mathcal{K}\}$, and thus $\mathcal{K}^\star=\{(0,v)\mid v\geq 0\}$. The derivatives of $\sigma$ are
$$
\begin{bmatrix}\mathcal{D}_1\sigma(x,y)\\\mathcal{D}_2\sigma(x,y)\end{bmatrix}=\begin{bmatrix}A'\\I\end{bmatrix}\left(Ax+y-b\right)=
\begin{bmatrix}x-A'(b-y)\\y-(b-Ax)\end{bmatrix}.
$$
It follows that $(x, y)$ is a solution of problem $\mathcal{P}$ if and only if
\begin{align*}
x&=A'(b-y),\\
y&\geq b-Ax,\\
y&\geq 0,\\
y'(y-(b-Ax))&=0.
\end{align*}
The last three conditions taken together are equivalent to $y=(b-Ax)_+=\max(0,b-Ax)$.
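These conditions are easy to verify numerically for a candidate pair. A small sketch (the function name is hypothetical, for illustration only):

```r
# Numerical check of the optimality conditions above, assuming A'A = I:
# x = A'(b - y)  and  y = (b - Ax)_+.
kktCheck <- function (a, b, x, y, eps = 1e-10) {
  c1 <- max (abs (x - drop (crossprod (a, b - y)))) < eps
  c2 <- max (abs (y - pmax (0, b - drop (a %*% x)))) < eps
  c1 && c2
}

# Tiny example: A = (1, 0)', so the inequalities are x <= 1 and 0 <= 1.
# The pair x = 1, y = (0, 1) satisfies the conditions with zero loss.
kktCheck (matrix (c (1, 0), 2, 1), c (1, 1), x = 1, y = c (0, 1))  # TRUE
```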
#Direct Attack
##Alternating Least Squares
In block relaxation algorithms, and specifically in alternating least squares (ALS) algorithms, we minimize over a subset of the variables
while keeping all other variables fixed at their current values, and then we cycle over subsets (@deleeuw_C_94c). In our case the two natural subsets are $x$ and $y$, and the ALS algorithm is
\begin{align*}
x^{(k+1)}&=A'(b-y^{(k)}),\\
y^{(k+1)}&=\max(0, b-Ax^{(k+1)}).
\end{align*}
Because the loss function is strictly convex in $x$ for given $y$, and in $y$ for given $x$, the block relaxation algorithm converges linearly to a stationary value, which by convexity is necessarily the minimum, and to a fixed point of the ALS algorithm.
Convergence is very stable but typically slow, especially when the system is consistent, so that the minimum of $\eqref{E:loss}$ is zero. Iterations are simple and fast, however, and even moderately sized problems, which may take hundreds of iterations, remain feasible.
For historical and sentimental reasons I am fond of this approach, implemented in the R function `LinInEqALS`.
Two examples follow, the first for an inconsistent system, the second for a consistent one. We iterate until $\sigma^{(k)}-\sigma^{(k+1)}<10^{-15}$.
The rate we compute is $(\sigma^{(k)}-\sigma^{(k+1)})/(\sigma^{(k-1)}-\sigma^{(k)})$. Since we are only interested in the consistency of the system of inequalities we do not pay much attention to the solutions, which in the consistent case are generally not unique anyway.
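As an illustration, the two updates can be sketched in a few self-contained lines; this is a hypothetical, simplified stand-in for `LinInEqALS`, not the code in the appendix, and it assumes $A$ has orthonormal columns:

```r
# Minimal ALS sketch: alternate the two updates until the loss
# decrease falls below eps.
alsSketch <- function (a, b, eps = 1e-15, itmax = 10000) {
  y <- pmax (0, b)
  sold <- Inf
  for (k in 1:itmax) {
    x <- drop (crossprod (a, b - y))   # x update: x = A'(b - y)
    y <- pmax (0, b - drop (a %*% x))  # y update: y = (b - Ax)_+
    snew <- sum ((drop (a %*% x) + y - b) ^ 2) / 2
    if (sold - snew < eps) break
    sold <- snew
  }
  list (x = x, y = y, loss = snew, itel = k)
}

# The system x <= 0 and x >= 1, written as Ax <= b with a normalized A,
# is inconsistent; the minimal loss works out to 1/8 up to rounding.
alsSketch (matrix (c (1, -1) / sqrt (2), 2, 1), c (0, -1 / sqrt (2)))$loss
```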
```{r als_inconsistent}
set.seed (12345)
a <- gsRC (matrix (rnorm (200), 100, 2))$x
b_inc <- rnorm (100)
print (LinInEqALS (a, b_inc, eps = 1e-15, itmax = 10000))
```
```{r als_consistent}
b_con <- rowSums (a) + rchisq (100, 1)
print (LinInEqALS (a, b_con, eps = 1e-15, itmax = 10000))
```
##Quadratic Programming
Minimizing $\eqref{E:loss}$ over $x$ and $y\geq 0$ is a quadratic programming problem, more precisely a partially non-negative least squares problem, in $n+m$ variables with a singular coefficient matrix $\begin{bmatrix}A&I\end{bmatrix}$. This is a somewhat unsatisfactory formulation, because it leaves the problem unnecessarily big and sparse, but it is easy to implement using the `pnnls` function from the `lsei` package (@wang_lawson_hanson_15).
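The formulation can be sketched as follows, assuming, as the `lsei` documentation describes, that `pnnls (a, b, k)` leaves the first `k` coefficients unrestricted, constrains the rest to be nonnegative, and returns the solution in component `x`; the wrapper name is hypothetical:

```r
library (lsei)

# Pose problem P directly: stack the slacks next to x and minimize
# SSQ([A I](x, y)' - b)/2 with the first m coefficients (that is, x) free.
pnnlsSketch <- function (a, b) {
  n <- nrow (a)
  m <- ncol (a)
  h <- pnnls (cbind (a, diag (n)), b, k = m)
  x <- h$x[1:m]
  y <- h$x[-(1:m)]
  list (x = x, y = y, loss = sum ((drop (a %*% x) + y - b) ^ 2) / 2)
}
```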
```{r use_lsei_inc}
print (LinInEqPNNLS (a, b_inc))
```
```{r use_lsei_con}
print (LinInEqPNNLS (a, b_con))
```
#Using Projection
*Projection* means constructing the two new loss functions $\sigma_\star(x)=\min_{y\geq 0}\sigma(x,y)$ and $\sigma_\dagger(y)=\min_x\sigma(x,y)$, and then minimizing those projected loss functions. Note that $\sigma_\star$ is a function of $m$ variables, while $\sigma_\dagger$ is a function
of $n$ variables.
##Projecting out $x$
We have
$$
\sigma_\dagger(y):=\min_x\sigma(x,y)=\frac12\mathbf{SSQ}(A(A'(b-y))-(b-y))=\frac12\mathbf{SSQ}((I-AA')(b-y)).
$$
Choose an orthonormal basis $A_\perp$ for the null space of $A$, and set $c=A_\perp'b$. Then
$$
\sigma_\dagger(y)=\frac12\mathbf{SSQ}(A_\perp'y-c),
$$
and we must minimize $\sigma_\dagger(y)$ over $y\geq 0$, which can be done with the `nnls` function of `lsei` (@wang_lawson_hanson_15). For null space bases and least squares solutions our function `LinInEqNNLS` uses the routines of @deleeuw_E_16i. We reanalyze our two examples in this way.
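For illustration only, here is a sketch using base R for the null space basis and `lsei::nnls`; the actual `LinInEqNNLS` relies on the routines of @deleeuw_E_16i instead, and the function name below is invented:

```r
library (lsei)

# Project out x: with aperp an orthonormal basis for the orthogonal
# complement of the column space of A, minimize SSQ(aperp' y - c)/2
# over y >= 0, then recover x = A'(b - y).
nnlsSketch <- function (a, b) {
  m <- ncol (a)
  aperp <- qr.Q (qr (a), complete = TRUE)[, -(1:m), drop = FALSE]
  y <- nnls (t (aperp), drop (crossprod (aperp, b)))$x
  x <- drop (crossprod (a, b - y))
  list (x = x, y = y, loss = sum ((drop (a %*% x) + y - b) ^ 2) / 2)
}
```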
```{r nnls_inconsistent}
print (LinInEqNNLS (a, b_inc))
```
```{r nnls_consistent}
print (LinInEqNNLS (a, b_con))
```
The approach works well, but it has the disadvantage that we still have to solve a problem in $n$ variables, and that consequently the size of the null space basis $A_\perp$ can be prohibitive.
##Projecting out $y$
Projecting out $y$ leads to a function of $m$ variables. We can write
$$
\sigma_\star(x)=\sigma(x,(b-Ax)_+)=\frac12\mathbf{SSQ}((Ax-b)_+).
$$
Thus $\sigma_\star$ is convex, piecewise quadratic, and differentiable, although generally not twice differentiable.
Since @han_80 the solution $x$ minimizing $\sigma_\star$ has been known as the least squares solution of the system of linear inequalities. @han_80 shows that minimizing $\sigma_\star$ is equivalent to the quadratic program of minimizing $\frac12 z'z$ over $(x,z)$ subject to $Ax-b\leq z$.
Han also proposes an efficient algorithm to minimize $\sigma_\star$. Several variants have been proposed by
@yang_90, @spoonamore_92, @bramley_winnicka_96, @carp_popa_serban_13, and @popa_serban_14, but we implement the original algorithm. What is basically the Han algorithm is presented as a generalized Newton method
in @pinar_98 and @salahi_ketabchi_10.
Suppose $I(x)=\{i\mid a_i'x> b_i\}$ is the index set of the violated inequalities, and write $I^{(k)}$ for $I(x^{(k)})$. We first compute the *direction of descent* $d$.
$$
d^{(k)}:=\mathop{\mathbf{argmin}}_d \mathbf{SSQ}(A_{I^{(k)}}d-(b_{I^{(k)}}-A_{I^{(k)}}x^{(k)}))
$$
If the least squares solution is not unique, we have to make a choice. @han_80 used the minimum norm least squares solution, while @bramley_winnicka_96 used the least squares solution based on QR with pivoting. Since we have the code from @deleeuw_E_16i, we use QR with pivoting as well. Next we compute the *step size* $\lambda$.
$$
\lambda^{(k)}:=\mathop{\mathbf{argmin}}_\lambda \sigma_\star(x^{(k)}+\lambda d^{(k)})
$$
Since $\sigma_\star$ is a convex and piecewise quadratic function of $\lambda$ we simply move from left to right, from one breakpoint to the next, as in @deleeuw_E_16r, until we reach the minimum corresponding with the smallest minimizer. To finish the iteration we take the step $x^{(k+1)}=x^{(k)}+\lambda^{(k)}d^{(k)}$. In our examples the Han algorithm finishes in three iterations, obviously with a superlinear convergence rate.
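For illustration, the iteration can be sketched as follows. This is only a simplified, hypothetical stand-in for `LinInEqHan`: the direction is computed with base R `qr.coef` (QR with pivoting, dropped coefficients set to zero), and the exact breakpoint line search is replaced by a generic one-dimensional convex minimization with `optimize`:

```r
# A simplified sketch of the Han iteration: find the violated
# inequalities, take a least squares direction on that set, and do a
# convex line search in lambda.
hanSketch <- function (a, b, eps = 1e-12, itmax = 100) {
  sstar <- function (z) sum (pmax (0, drop (a %*% z) - b) ^ 2) / 2
  x <- rep (0, ncol (a))
  k <- 0
  repeat {
    k <- k + 1
    viol <- which (drop (a %*% x) > b)
    if (length (viol) == 0) break      # feasible: loss is zero
    ai <- a[viol, , drop = FALSE]
    d <- qr.coef (qr (ai), b[viol] - drop (ai %*% x))
    d[is.na (d)] <- 0                  # rank deficiency: drop coefficients
    up <- 1                            # bracket the convex line search
    while ((sstar (x + 2 * up * d) < sstar (x + up * d)) && (up < 2 ^ 30))
      up <- 2 * up
    lambda <- optimize (function (l) sstar (x + l * d),
                        lower = 0, upper = 2 * up)$minimum
    xnew <- x + lambda * d
    if (sstar (x) - sstar (xnew) < eps) { x <- xnew; break }
    x <- xnew
    if (k >= itmax) break
  }
  list (x = x, loss = sstar (x), itel = k)
}
```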
```{r han_inconsistent}
print (LinInEqHan (a, b_inc))
```
```{r han_consistent}
print (LinInEqHan (a, b_con))
```
#Discussion
We started this paper with the problem of minimizing the loss function $\sigma:\mathbb{R}^m\otimes\mathbb{R}^n\rightarrow\mathbb{R}_+$, and then defined $\sigma_\star:\mathbb{R}^m\rightarrow\mathbb{R}_+$ and $\sigma_\dagger:\mathbb{R}^n\rightarrow\mathbb{R}_+$ by projection. But we could also have started with the problem of minimizing $\sigma_\star$ on $\mathbb{R}^m$ and then used *augmentation* (@deleeuw_C_94c) to arrive at minimizing
$\sigma$ over $\mathbb{R}^m\otimes\mathbb{R}^n_+$.
If the problem is to minimize a function $f$ over $X$, then an augmentation of $f$ is a function $g$ on $X\otimes Y$ such that $f(x)=\min_{y\in Y}g(x,y)$
for all $x\in X$. The augmented minimization problem is to minimize $g$ over $x\in X$ and $y\in Y$. In our case $\sigma_\star(x)=\min_{y\geq 0}\sigma(x,y)$,
and thus $\sigma$ is an augmentation of $\sigma_\star$.
There is little doubt that the Han algorithm is computationally superior to the other options, although it requires somewhat more intricate programming. The quadratic programming methods are very simple to implement, given the existence of `lsei` on CRAN, but they are wasteful because they do not use the special structure of the problem. The ALS method is cute and very easy to program from scratch, but it may be too slow for larger problems.
With different technology, and with more rigor and detail, there are similar arguments in a very nice recent expository paper by @hiriart-urruty_contesse_penot_16.
#Appendix: Code
##LinInEqALS.R
```{r file_als, code = readLines("LinInEqALS.R")}
```
##LinInEqPNNLS.R
```{r file_pnnls, code = readLines("LinInEqPNNLS.R")}
```
##LinInEqNNLS.R
```{r file_nnls, code = readLines("LinInEqNNLS.R")}
```
##LinInEqHan.R
```{r file_han, code = readLines("LinInEqHan.R")}
```
#References