```{r example_approx, fig.align = "center", echo = FALSE}
par(pty = "s")
plot(w, outer(uv$u, uv$v), ylab = "c", col = "RED")
abline(0, 1, col = "BLUE", lwd = 2)
```

The optimal $u$ and $v$ computed with `mbound()` are plotted in figure `r figure_nums("uvplots", display = "n")`.

```{r uvplots, fig.align = "center", echo = FALSE}
par(mfrow = c(1, 2))
plot(1:24, uv$u, xlab = "hour", ylab = "u", col = "RED", lwd = 3, type = "l")
plot(1:7, uv$v, xlab = "day", ylab = "v", col = "RED", lwd = 3, type = "l")
```
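A feasible, though generally suboptimal, rank-one bound can also be obtained by a simple alternating heuristic. The sketch below is our own illustration and not the `mbound()` implementation; it merely shows that alternating "smallest feasible" updates keeps $u_iv_j\geq w_{ij}$ throughout.

```r
# Illustrative heuristic, not the mbound() code: alternate the smallest
# feasible u given v and the smallest feasible v given u, starting from
# v identically one, so that u_i v_j >= w_ij holds at every step.
set.seed(1)
w <- matrix(runif(24 * 7), 24, 7)            # stand-in weight matrix
v <- rep(1, ncol(w))
for (k in 1:10) {
  u <- apply(sweep(w, 2, v, "/"), 1, max)    # u_i = max_j w_ij / v_j
  v <- apply(sweep(w, 1, u, "/"), 2, max)    # v_j = max_i w_ij / u_i
}
stopifnot(all(outer(u, v) >= w - 1e-12))     # rank-one bound dominates w
```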

The `wPCA()` function has a `bnd` parameter, which can be equal to `all`, `col`, `row`, or `opt`. If `bnd="all"` then $c_{ij}=\max w_{ij}$, as in @kiers_97; if `bnd="row"` then $c_{ij}=\max_j w_{ij}$, as in @groenen_giaquinto_kiers_03. If `bnd="col"` then $c_{ij}=\max_i w_{ij}$, and if `bnd="opt"` then we compute the optimal $u_iv_j$.

```{r wpca, cache = FALSE, echo = FALSE}
p <- 1
h1 <- wPCA(crashi, w, p = p, bnd = "all")
h2 <- wPCA(crashi, w, p = p, bnd = "col")
h3 <- wPCA(crashi, w, p = p, bnd = "row")
h4 <- wPCA(crashi, w, p = p, bnd = "opt")
```

First look at $p=1$. For all four values of the `bnd` parameter we find a chi-square equal to `r h1$f` with $168-((24+7)-1)=138$ degrees of freedom. The iteration counts for `bnd="all","col","row","opt"` are, respectively, `r c(h1$itel, h2$itel, h3$itel, h4$itel)`. Because in this example the between-row/within-column variation is so much larger than the within-row/between-column variation, it makes sense that `bnd="row"` performs well, almost as good as `bnd="opt"`, while `bnd="col"` performs poorly, almost as poorly as `bnd="all"`. The one-dimensional plots of $a$ and $b$ from $z=ab'$ are in figure `r figure_nums("abplots1", display = "n")`.
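For concreteness, the three fixed choices of the bound can be written out directly from the weight matrix. This sketch uses a small arbitrary `w` and our own variable names, not the package code:

```r
# The three non-optimal bounds for a small example weight matrix w.
w <- matrix(c(1, 4, 2,
              3, 1, 5), nrow = 2, byrow = TRUE)
n <- nrow(w); m <- ncol(w)
c_all <- matrix(max(w), n, m)                          # bnd = "all": overall maximum
c_row <- matrix(apply(w, 1, max), n, m)                # bnd = "row": c_ij = max_j w_ij
c_col <- matrix(apply(w, 2, max), n, m, byrow = TRUE)  # bnd = "col": c_ij = max_i w_ij
# every bound dominates the weights elementwise
stopifnot(all(c_all >= w), all(c_row >= w), all(c_col >= w))
```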

```{r abplots1, fig.align = "center", echo = FALSE}
par(mfrow = c(1, 2))
s <- svd(h1$z)
plot(1:24, s$u[, 1], xlab = "hour", ylab = "u", col = "RED", lwd = 3, type = "l")
plot(1:7, s$v[, 1], xlab = "day", ylab = "v", col = "RED", lwd = 3, type = "l")
```

```{r aa, echo = FALSE}
ha1 <- lrDerive(h1$z, crashi, p = p, u = h1$u, v = h1$v, w)
la1 <- max(Mod(eigen(ha1)$values))
ha2 <- lrDerive(h2$z, crashi, p = p, u = h2$u, v = h2$v, w)
la2 <- max(Mod(eigen(ha2)$values))
ha3 <- lrDerive(h3$z, crashi, p = p, u = h3$u, v = h3$v, w)
la3 <- max(Mod(eigen(ha3)$values))
ha4 <- lrDerive(h4$z, crashi, p = p, u = h4$u, v = h4$v, w)
la4 <- max(Mod(eigen(ha4)$values))
```

We also computed the spectral norms of the derivative at the solution, using the functions in `lowrank.R`. For `bnd="all","col","row","opt"` the convergence rates are, respectively, `r c(la1, la2, la3, la4)`.

```{r wpca2, cache = FALSE, echo = FALSE}
p <- 2
h1 <- wPCA(crashi, w, p = p, bnd = "all")
h2 <- wPCA(crashi, w, p = p, bnd = "col")
h3 <- wPCA(crashi, w, p = p, bnd = "row")
h4 <- wPCA(crashi, w, p = p, bnd = "opt")
```

For $p=2$, for all four values of the `bnd` parameter we find a chi-square equal to `r h1$f` with $168-((24+7)*2-4)=110$ degrees of freedom. The iteration counts for `bnd="all","col","row","opt"` are, respectively, `r c(h1$itel, h2$itel, h3$itel, h4$itel)`.
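The interpretation of these spectral norms as convergence rates can be illustrated on a toy fixed-point iteration; the example below is entirely our own and unrelated to `lrDerive()`. For a contraction, the ratio of successive errors tends to the spectral radius of the derivative at the fixed point.

```r
# Toy contraction: phi(x) = 0.5 x + 1 has fixed point 2 and derivative
# 0.5, so the error ratio |x_{k+1} - 2| / |x_k - 2| equals 0.5.
phi <- function(x) 0.5 * x + 1
x <- 10
prev_err <- abs(x - 2)
rate <- NA
for (k in 1:30) {
  x <- phi(x)
  err <- abs(x - 2)
  rate <- err / prev_err      # empirical linear convergence rate
  prev_err <- err
}
stopifnot(abs(rate - 0.5) < 1e-8)
```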

```{r abplots2, fig.align = "center", echo = FALSE}
par(mfrow = c(1, 2))
s <- svd(h1$z)
su <- s$u[, 1:2]
plot(su, xlab = "dim 1", ylab = "dim 2", type = "n")
text(su, as.character(0:23), col = "RED")
for (i in 1:23) lines(matrix(c(su[i, ], su[i + 1, ]), 2, 2, byrow = TRUE), col = "BLUE", lwd = 2)
sv <- s$v[, 1:2]
plot(sv, xlab = "dim 1", ylab = "dim 2", type = "n")
text(sv, as.character(1:7), col = "RED")
for (i in 1:6) lines(matrix(c(sv[i, ], sv[i + 1, ]), 2, 2, byrow = TRUE), col = "BLUE", lwd = 2)
```

```{r aa2, echo = FALSE}
ha1 <- lrDerive(h1$z, crashi, p = p, u = h1$u, v = h1$v, w)
la1 <- max(Mod(eigen(ha1)$values))
ha2 <- lrDerive(h2$z, crashi, p = p, u = h2$u, v = h2$v, w)
la2 <- max(Mod(eigen(ha2)$values))
ha3 <- lrDerive(h3$z, crashi, p = p, u = h3$u, v = h3$v, w)
la3 <- max(Mod(eigen(ha3)$values))
ha4 <- lrDerive(h4$z, crashi, p = p, u = h4$u, v = h4$v, w)
la4 <- max(Mod(eigen(ha4)$values))
```

For `bnd="all","col","row","opt"` the convergence rates are, respectively, `r c(la1, la2, la3, la4)`.

# Discussion

The main question, which we did not answer, is whether the decrease in computing time from lowering the number of iterations compensates for the extra work of computing the optimal rank-one bounds. Computing the optimal bounds is a quadratic programming problem of size $nm$, which can be substantial and rapidly becomes impractical for really large data sets. In that case alternative strategies are needed, for example starting with row or column bounds and performing just a few iterations of an iterative quadratic programming method. We have also not explored $\ell_1$ and $\ell_\infty$ solutions, which lead to linear programming problems that are less expensive than the quadratic ones.

Also note that in each majorization step we solve a low-rank approximation problem. We would still have a convergent algorithm if we replaced the singular value decomposition in each majorization step by one or more alternating least squares iterations. This will generally lead to more majorization iterations, but each majorization iteration becomes less expensive. In addition, we can use the $A$ and $B$ from $X\approx AB'$ of the previous majorization iteration as starting points for the next one. There is an interesting recent paper by @szlam_tulloch_tygert_17, who show that just a few alternating least squares iterations will generally provide a very good low-rank approximation.
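One such inner alternating least squares pass can be sketched as follows. This is our own illustration, with made-up names and a random matrix, not the code in `wPCA()`; each half-step solves an ordinary least squares problem, so the fit never deteriorates.

```r
# Sketch of alternating least squares for X ~ A B' of rank p, as an
# alternative to computing a full SVD in each majorization step.
set.seed(1)
X <- matrix(rnorm(24 * 7), 24, 7)
p <- 2
B <- matrix(rnorm(7 * p), 7, p)                 # arbitrary start
f0 <- sum((X - (X %*% B %*% solve(crossprod(B))) %*% t(B))^2)
for (k in 1:5) {
  A <- X %*% B %*% solve(crossprod(B))          # least squares A given B
  B <- t(X) %*% A %*% solve(crossprod(A))       # least squares B given A
}
f1 <- sum((X - A %*% t(B))^2)
stopifnot(f1 <= f0 + 1e-12)                     # loss is monotone non-increasing
```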
This implies that it may not pay off to optimize the number of iterations needed for convergence, if full convergence is not of much practical value anyway. It also indicates that performing only a small number of "inner" alternating least squares iterations in each majorization step is probably a good strategy. Again, that is something to be explored.

# Appendix: Code

## mbound.R

```{r file_mbound, code = readLines("mbound.R")}
```

## wPCA.R

```{r file_wpca, code = readLines("wPCA.R")}
```

## lowrank.R

```{r file_lowrank, code = readLines("lowrank.R")}
```

# References