Chapter 7 Regression
7.1 Ordinary least-squares
Both OLS and WLS are implemented here using QR decomposition. For OLS this is straightforward in R: given a feature matrix X and a corresponding outcome vector y, we can use qr.solve(X, y) to compute \(\hat\beta\).
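For concreteness, here is a minimal sketch on simulated data (the data-generating process and all names below are made up purely for illustration); the QR-based solution should agree with lm() up to numerical precision:

set.seed(42)
n <- 100
X <- cbind(1, matrix(rnorm(n * 2), n, 2))  # intercept plus two illustrative features
beta_true <- c(1, 2, -0.5)
y <- X %*% beta_true + rnorm(n)
beta_hat <- qr.solve(X, y)   # OLS via QR decomposition
coef(lm(y ~ X - 1))          # reference fit for comparison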
7.2 Weighted least-squares
For the weighted least-squares estimator we have (see the appendix for a derivation):
\[ \begin{equation} \begin{aligned} \hat\beta^{WLS}&= \left( \mathbf{X}^T \Phi^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^T\Phi^{-1}\mathbf{y}\\ \end{aligned} \tag{7.1} \end{equation} \]
In R this can be implemented as follows:
function (X, y, weights = NULL)
{
    if (!is.null(weights)) {
        # Phi here plays the role of Phi^(-1) in (7.1): the supplied weights form its diagonal
        Phi <- diag(weights)
        beta <- qr.solve(t(X) %*% Phi %*% X, t(X) %*% Phi %*% y)
    } else {
        # without weights this reduces to the OLS estimator
        beta <- qr.solve(X, y)
    }
    return(beta)
}
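As a quick sanity check (the heteroskedastic data below is simulated purely for illustration), Equation (7.1) with the supplied weights on the diagonal can be compared against lm() and its weights argument:

set.seed(1)
n <- 200
X <- cbind(1, rnorm(n))
w <- runif(n, 0.5, 2)                      # illustrative precision weights
y <- X %*% c(1, 3) + rnorm(n, sd = 1 / sqrt(w))
Phi <- diag(w)
beta_wls <- qr.solve(t(X) %*% Phi %*% X, t(X) %*% Phi %*% y)
coef(lm(y ~ X - 1, weights = w))           # should match beta_wls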
7.3 Logistic regression
To model the probability of \(y=1\) we will use logistic regression:
\[ \begin{equation} \begin{aligned} \mathbf{p}&= \frac{ \exp( \mathbf{X} \beta )}{1 + \exp(\mathbf{X} \beta)} \end{aligned} \tag{7.2} \end{equation} \]
Equation (7.2) is not estimated directly but rather derived from the linear predictor
\[ \begin{equation} \begin{aligned} \log \left( \frac{\mathbf{p}}{1-\mathbf{p}} \right)&= \mathbf{X} \beta \\ \end{aligned} \tag{7.3} \end{equation} \]
where \(\beta\) can be estimated through iteratively reweighted least-squares (IRLS), a simple implementation of Newton’s method (see for example Wasserman (2013); a complete derivation can also be found in the appendix):
\[ \begin{equation} \begin{aligned} \beta_{s+1}&= \left( \mathbf{X}^T \mathbf{W} \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{W}z\\ \text{where} \quad z&= \mathbf{X}\beta_{s} + \mathbf{W}^{-1} (\mathbf{y}-\mathbf{p}) \\ \mathbf{W}&= \text{diag}\{p_i(\beta_{s})(1-p_i(\beta_{s}))\}_{i=1}^n \\ \end{aligned} \tag{7.4} \end{equation} \]
In R this can be implemented from scratch as below. For the empirical exercises we will rely on glm([formula], family="binomial"), which scales much better to higher-dimensional problems than my custom function and also implements the weighted logit.
function (X, y, beta_0 = NULL, tau = 1e-09, max_iter = 10000)
{
    # Add an intercept column if X does not already contain one
    if (!all(X[, 1] == 1)) {
        X <- cbind(1, X)
    }
    p <- ncol(X)
    n <- nrow(X)
    # Starting values: zero vector unless supplied
    if (is.null(beta_0)) {
        beta_latest <- matrix(rep(0, p))
    } else {
        beta_latest <- beta_0
    }
    W <- diag(n)
    can_still_improve <- TRUE
    iter <- 1
    while (can_still_improve & iter < max_iter) {
        # Linear predictor and fitted probabilities, as in (7.2)
        y_hat <- X %*% beta_latest
        p_y <- exp(y_hat) / (1 + exp(y_hat))
        # Score (gradient), used to check convergence
        df_latest <- crossprod(X, y - p_y)
        # Weights and working response, as in (7.4)
        diag(W) <- p_y * (1 - p_y)
        Z <- X %*% beta_latest + qr.solve(W) %*% (y - p_y)
        # Weighted least-squares update
        beta_latest <- qr.solve(crossprod(X, W %*% X), crossprod(X, W %*% Z))
        can_still_improve <- mean(abs(df_latest)) > tau
        iter <- iter + 1
    }
    return(list(fitted = p_y, coeff = beta_latest))
}
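For illustration (the simulated data below is made up for this example), the glm() interface can be used as follows; if the from-scratch function above were assigned to a name, say logit_irls (a hypothetical name), its estimates should agree closely:

set.seed(7)
n <- 500
X <- matrix(rnorm(n * 2), n, 2)
p_true <- 1 / (1 + exp(-(0.5 + X %*% c(1, -2))))
y <- rbinom(n, 1, p_true)
df <- data.frame(y = y, X)
coef(glm(y ~ ., data = df, family = "binomial"))                # plain logit
w <- sample(1:3, n, replace = TRUE)                             # illustrative integer observation weights
coef(glm(y ~ ., data = df, family = "binomial", weights = w))   # weighted logit
# logit_irls(X, y)$coeff should be near-identical to the unweighted glm() coefficients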
7.4 Appendix
7.4.1 Weighted least-squares

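A sketch of the standard argument (interpreting \(\Phi\) as the error covariance matrix, as in Section 7.2): the WLS estimator minimizes the weighted residual sum of squares
\[ \begin{aligned} \hat\beta^{WLS} &= \arg\min_{\beta} \; (\mathbf{y}-\mathbf{X}\beta)^T \Phi^{-1} (\mathbf{y}-\mathbf{X}\beta). \end{aligned} \]
Setting the derivative with respect to \(\beta\) to zero gives
\[ \begin{aligned} -2\,\mathbf{X}^T \Phi^{-1} (\mathbf{y}-\mathbf{X}\hat\beta) &= 0 \\ \mathbf{X}^T \Phi^{-1} \mathbf{X}\, \hat\beta &= \mathbf{X}^T \Phi^{-1} \mathbf{y} \\ \hat\beta^{WLS} &= \left( \mathbf{X}^T \Phi^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^T \Phi^{-1} \mathbf{y}, \end{aligned} \]
which is Equation (7.1).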
7.4.2 Iterative reweighted least-squares

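A sketch of the standard Newton-step argument behind (7.4), writing \(x_i^T\) for the \(i\)-th row of \(\mathbf{X}\): the logistic log-likelihood is
\[ \begin{aligned} \ell(\beta) &= \sum_{i=1}^n \left[ y_i \, x_i^T\beta - \log\left(1 + \exp(x_i^T\beta)\right) \right] \end{aligned} \]
with gradient \(\nabla \ell(\beta) = \mathbf{X}^T(\mathbf{y}-\mathbf{p})\) and Hessian \(\nabla^2 \ell(\beta) = -\mathbf{X}^T \mathbf{W} \mathbf{X}\), where \(\mathbf{W}\) is defined as in (7.4). The Newton update \(\beta_{s+1} = \beta_s - \left[\nabla^2 \ell(\beta_s)\right]^{-1} \nabla \ell(\beta_s)\) therefore becomes
\[ \begin{aligned} \beta_{s+1} &= \beta_s + \left(\mathbf{X}^T \mathbf{W} \mathbf{X}\right)^{-1} \mathbf{X}^T (\mathbf{y}-\mathbf{p}) \\ &= \left(\mathbf{X}^T \mathbf{W} \mathbf{X}\right)^{-1} \mathbf{X}^T \mathbf{W} \left[ \mathbf{X}\beta_s + \mathbf{W}^{-1}(\mathbf{y}-\mathbf{p}) \right] \\ &= \left(\mathbf{X}^T \mathbf{W} \mathbf{X}\right)^{-1} \mathbf{X}^T \mathbf{W} z, \end{aligned} \]
which is Equation (7.4).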