Chapter 7 Regression
7.1 Ordinary least-squares
Both OLS and WLS are implemented here using the QR decomposition. For OLS this is straightforward in R: given a feature matrix X and a corresponding outcome variable y, qr.solve(X, y) computes the least-squares solution \(\hat\beta\).
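To make this concrete, here is a minimal sketch with simulated data (the variable names are illustrative) comparing qr.solve() against R's built-in lm():
set.seed(42)
n <- 100
X <- cbind(1, rnorm(n), rnorm(n))   # feature matrix with an intercept column
beta_true <- c(1, 2, -0.5)
y <- drop(X %*% beta_true) + rnorm(n)
beta_hat <- qr.solve(X, y)          # least-squares fit via the QR decomposition
cbind(qr = beta_hat, lm = coef(lm(y ~ X[, -1])))  # estimates agree up to numerical noise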
7.2 Weighted least-squares
For the weighted least-squares estimator we have (see the appendix for a derivation):
\[ \begin{equation} \begin{aligned} && \hat\beta_m^{WLS}&= \left( \mathbf{X}^T \Phi^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^T\Phi^{-1}\mathbf{y}\\ \end{aligned} \tag{7.1} \end{equation} \]
In R this can be implemented as follows, where the supplied weights correspond to the diagonal of \(\Phi^{-1}\) in Equation (7.1):
function (X, y, weights = NULL)
{
    if (!is.null(weights)) {
        # Weighted case: the weights form the diagonal of Phi^{-1} in Equation (7.1)
        Phi <- diag(weights)
        beta <- qr.solve(t(X) %*% Phi %*% X, t(X) %*% Phi %*% y)
    }
    else {
        # Unweighted case: plain OLS via the QR decomposition
        beta <- qr.solve(X, y)
    }
    return(beta)
}
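As a quick check of the function above (assuming it has been assigned to the name wls, which is purely illustrative), its output can be compared to lm() with a weights argument:
set.seed(1)
n <- 100
x <- rnorm(n)
w <- runif(n, 0.5, 2)                 # illustrative observation weights
y <- 1 + 2 * x + rnorm(n) / sqrt(w)
X <- cbind(1, x)
cbind(custom = wls(X, y, weights = w), lm = coef(lm(y ~ x, weights = w)))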
7.3 Logistic regression
To model the probability of \(y=1\) we will use logistic regression:
\[ \begin{equation} \begin{aligned} && \mathbf{p}&= \frac{ \exp( \mathbf{X} \beta )}{1 + \exp(\mathbf{X} \beta)} \end{aligned} \tag{7.2} \end{equation} \]
Equation (7.2) is not estimated directly but is instead derived from the linear predictor:
\[ \begin{equation} \begin{aligned} \log \left( \frac{\mathbf{p}}{1-\mathbf{p}} \right)&= \mathbf{X} \beta \\ \end{aligned} \tag{7.3} \end{equation} \]
where \(\beta\) can be estimated through iterative re-weighted least-squares (IRLS), which is a simple implementation of Newton’s method (see for example Wasserman (2013); a complete derivation can also be found in the appendix):
\[ \begin{equation} \begin{aligned} && \beta_{s+1}&= \left( \mathbf{X}^T \mathbf{W} \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{W}z\\ \text{where}&& z&= \mathbf{X}\beta_{s} + \mathbf{W}^{-1} (\mathbf{y}-\mathbf{p}) \\ && \mathbf{W}&= \text{diag}\{p_i(\beta_{s})(1-p_i(\beta_{s}))\}_{i=1}^n \\ \end{aligned} \tag{7.4} \end{equation} \]
In R this can be implemented from scratch as below. For the empirical exercises we will rely on glm([formula], family="binomial"), which scales much better to higher-dimensional problems than my custom function and also implements weighted logit.
function (X, y, beta_0 = NULL, tau = 1e-09, max_iter = 10000)
{
    # Add an intercept column unless the first column is already constant 1:
    if (!all(X[, 1] == 1)) {
        X <- cbind(1, X)
    }
    p <- ncol(X)
    n <- nrow(X)
    # Initialize coefficients at zero unless a starting value is supplied:
    if (is.null(beta_0)) {
        beta_latest <- matrix(rep(0, p))
    }
    else {
        beta_latest <- matrix(beta_0)
    }
    W <- diag(n)
    can_still_improve <- TRUE
    iter <- 1
    while (can_still_improve & iter < max_iter) {
        y_hat <- X %*% beta_latest           # linear predictor X beta_s
        p_y <- exp(y_hat)/(1 + exp(y_hat))   # fitted probabilities, Equation (7.2)
        df_latest <- crossprod(X, y - p_y)   # score (gradient of the log-likelihood)
        diag(W) <- p_y * (1 - p_y)           # IRLS weights, Equation (7.4)
        Z <- X %*% beta_latest + qr.solve(W) %*% (y - p_y)  # working response z
        beta_latest <- qr.solve(crossprod(X, W %*% X), crossprod(X, W %*% Z))  # WLS update
        can_still_improve <- mean(abs(df_latest)) > tau  # stop once the score is close to zero
        iter <- iter + 1
    }
    return(list(fitted = p_y, coeff = beta_latest))
}
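As a quick sanity check of the IRLS implementation above (assuming it has been assigned to the name logit_irls, which is purely illustrative), its coefficients can be compared to glm():
set.seed(123)
n <- 500
x <- rnorm(n)
p_true <- exp(-0.5 + x)/(1 + exp(-0.5 + x))
y <- rbinom(n, 1, p_true)
fit_custom <- logit_irls(cbind(x), y)        # an intercept column is added internally
fit_glm <- glm(y ~ x, family = "binomial")
cbind(custom = fit_custom$coeff, glm = coef(fit_glm))  # estimates should be nearly identical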
7.4 Appendix
7.4.1 Weighted least-squares
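A brief sketch of a standard derivation of Equation (7.1): the WLS estimator minimizes the weighted residual sum of squares,
\[ \begin{aligned} \hat\beta^{WLS}&= \arg\min_{\beta} \; (\mathbf{y}-\mathbf{X}\beta)^T \Phi^{-1} (\mathbf{y}-\mathbf{X}\beta). \end{aligned} \]
Setting the gradient with respect to \(\beta\) to zero gives the weighted normal equations,
\[ \begin{aligned} -2\,\mathbf{X}^T \Phi^{-1} (\mathbf{y}-\mathbf{X}\beta)&= 0 \quad\Longleftrightarrow\quad \mathbf{X}^T \Phi^{-1} \mathbf{X}\,\beta = \mathbf{X}^T \Phi^{-1} \mathbf{y}, \end{aligned} \]
and solving for \(\beta\) yields the estimator in Equation (7.1).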
7.4.2 Iterative reweighted least-squares
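A brief sketch of a standard derivation of the update in Equation (7.4): the logistic log-likelihood is
\[ \begin{aligned} \ell(\beta)&= \sum_{i=1}^n \left\{ y_i \log p_i(\beta) + (1-y_i) \log\big(1-p_i(\beta)\big) \right\}, \end{aligned} \]
with gradient \(\nabla_\beta \ell(\beta) = \mathbf{X}^T(\mathbf{y}-\mathbf{p})\) and Hessian \(\nabla^2_\beta \ell(\beta) = -\mathbf{X}^T \mathbf{W} \mathbf{X}\), where \(\mathbf{W}\) is the diagonal weight matrix defined in Equation (7.4). A Newton step is therefore
\[ \begin{aligned} \beta_{s+1}&= \beta_s + \left( \mathbf{X}^T \mathbf{W} \mathbf{X} \right)^{-1} \mathbf{X}^T (\mathbf{y}-\mathbf{p}) = \left( \mathbf{X}^T \mathbf{W} \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{W} \left( \mathbf{X}\beta_s + \mathbf{W}^{-1}(\mathbf{y}-\mathbf{p}) \right), \end{aligned} \]
which is exactly the weighted least-squares update in Equation (7.4): each Newton step solves a WLS problem with working response \(z\) and weights \(\mathbf{W}\).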