Regression

Everything is just linear regression

graph LR
AA(Linear Regression) --> A
A(Least Squares) <--"Gaussian Linear Model"--> B(Maximum Likelihood Estimation)
AA --"feature augmentation"--> G:::hidden
G --"polynomial feature"--> C(Polynomial Regression)
G --"general feature"--> D(Generalized Linear Regression)
D --> S(Splines)
A --"regularization"--> R:::hidden
R --"L2"--> RR(Ridge Regression)
R --"L1"--> L(LASSO)
RR <--"Gaussian prior"--> M(Maximum a Posteriori)
AA --"underdetermined system"--> N(Least Norm)
N --"kernel"--> NK(Kernel Regression)
AA --"prior to posterior"--> BB(Bayesian Linear Regression)
BB --"kernel"--> BK(Gaussian Process Regression)
B --"link function"--> GL(Generalized Linear Model)
GL --> GLL(Logistic Regression)
GL --> GLP(Poisson Regression)
GLL --"prior to posterior"--> BL(Bayesian Logistic Regression)

classDef hidden display: none;
class A,AA,B,BB,BL,BK,C,D,G,GL,GLL,GLP,L,M,N,NK,R,RR,S internal-link

Regression is a fundamental statistical task¹ aimed at uncovering the relationship between two correlated random variables. When one variable is designated as the input and the other as the output, regression focuses on modeling the effect of the input on the output. This setup makes regression closely related to, and often regarded as a type of, the Prediction task, where the goal is to match the prediction to the true output; please refer to Regression and Prediction below for more discussion on the relationship between the two tasks.

A regression task involves the following components:

  • Input variable $X$, whose samples are known as measurements, covariates, features, explanatory variables, and independent variables;
  • Output variable $Y$, whose samples are known as responses, labels, and dependent variables;
  • Model function $f$ that captures the relationship between the input and output.

The most general model is the conditional distribution $p(y \mid x)$. More practically, we often do partial modeling, which decomposes the model into a regression function that partially describes the distribution of $Y$ given $X$ and a residual uncertainty term. The most common example takes the form

$$Y = f(X) + \varepsilon,$$

where $\varepsilon$ denotes a noise term and our goal is to recover the deterministic function $f$. Our target random variable can also be various Statistics of $Y \mid X$, such as conditional quantiles. Moreover, a regression task often restricts the search space of $f$ to a parametric model $\{f_\theta\}$. Different parametric models learn $\theta$ from data differently. The learned model is then called a regression function parameterized by $\theta$.
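
As a minimal sketch of this setup (assuming, purely for illustration, a linear $f$ and Gaussian noise), least squares recovers the parameter of $f_\theta$ from noisy samples:

```python
import numpy as np

# Minimal sketch of the setup Y = f(X) + eps, assuming a linear
# f(x) = x @ theta and Gaussian noise (illustrative choices, not
# required by the text above).
rng = np.random.default_rng(0)

theta_true = np.array([1.5, -2.0, 0.5])   # parameterizes the deterministic f
X = rng.normal(size=(200, 3))             # input samples (features)
eps = rng.normal(scale=0.1, size=200)     # residual uncertainty term
y = X @ theta_true + eps                  # output samples (responses)

# Least squares searches the parametric family f_theta(x) = x @ theta
# for the theta minimizing ||y - X @ theta||^2.
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta_hat)  # close to theta_true when the noise is small
```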

In the context of machine learning, regression is a Supervised Learning task, as the label is given for each sample.

Different regression function families and different loss functions give different regression models, as summarized in the graph above; the sketch below contrasts two of them.
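
As a hedged sketch (with the penalty weight `lam` chosen arbitrarily): keeping the linear family but adding an L2 penalty to the squared-error loss turns least squares into Ridge Regression, while an L1 penalty would give LASSO.

```python
import numpy as np

# Same data, two loss functions, two regression models (a sketch;
# lam = 1.0 is an arbitrary illustrative choice).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.2, size=100)

# Least squares: minimize ||y - X @ theta||^2  (closed form).
theta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge: minimize ||y - X @ theta||^2 + lam * ||theta||^2  (closed form).
lam = 1.0
theta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)

print(theta_ols)
print(theta_ridge)  # shrunk toward zero relative to theta_ols

# LASSO swaps in an L1 penalty; it has no closed form and is usually
# solved iteratively (e.g., coordinate descent), so it is omitted here.
```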

Regression and Prediction

One could consider regression and Prediction as distinct tasks, as they have different goals and thus different metrics. Given a random variable $(X, Y)$, regression focuses on fitting the model $f$, so it is evaluated using a model-level metric of the form $d(\hat{f}, f)$. For example, if $f$ belongs to a parametric family $\{f_\theta\}$, then the metric is usually of the form $d(\hat{\theta}, \theta)$, reducing the task to an instance of Estimation. In contrast, prediction focuses on matching the output $Y$, typically evaluated via a pointwise metric of the form $\ell(\hat{y}, y)$, e.g., the zero-one prediction error.
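
To make the contrast concrete, here is a small sketch (the linear family and Gaussian noise are illustrative assumptions) that evaluates one fitted model under both metric styles:

```python
import numpy as np

# Sketch of the two metric styles (linear family and Gaussian noise
# are assumptions made for illustration).
rng = np.random.default_rng(2)
theta_true = np.array([2.0, -1.0])

X_train = rng.normal(size=(50, 2))
y_train = X_train @ theta_true + rng.normal(scale=0.3, size=50)
theta_hat, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Model-level (regression/estimation) metric: distance to the true parameter.
model_metric = np.linalg.norm(theta_hat - theta_true)

# Pointwise (prediction) metric: average loss on held-out samples.
X_test = rng.normal(size=(1000, 2))
y_test = X_test @ theta_true + rng.normal(scale=0.3, size=1000)
pred_metric = np.mean((X_test @ theta_hat - y_test) ** 2)

print(model_metric, pred_metric)
```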

One could also argue that regression is a special case of the Prediction task: the model $f$ is completely determined by the mapping $x \mapsto f(x)$, hence knowing the prediction for every input $x$ essentially recovers the model $f$. However, this premise is almost never true, so the inclusion is only theoretically valid. In Statistics or Supervised Learning, we only have access to the sample or training data, and thus the best predictor we can learn may deviate from the true model arbitrarily on unseen data, even if its predictions on unseen data are close to the truth with high probability (i.e., it generalizes well in terms of prediction error). For example, even when the true model is simple, an over-parameterized neural network may learn a complex and uninterpretable predictor that achieves excellent prediction accuracy but bears little resemblance to $f$. On the other hand, if we restrict the search space of the predictor to the class of regression models, then solving a prediction task essentially reduces to solving a regression task. In other words, although we can view regression as a type of prediction task, not every prediction method is suitable for regression. The converse is always true: regression, as a class of methods, can be used to solve prediction tasks.

One could also view Prediction as a general regression task if we use prediction accuracy as the fitting metric between the predictor and the model. For example, we can have $\hat{f} \to f$ as the sample size increases, in the sense that $\mathbb{E}\big[(\hat{f}(X) - f(X))^2\big] \to 0$. In this sense, we can say $\hat{f}$ fits $f$ well and solves the regression task under a prediction-oriented metric.
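
A toy simulation of this convergence (assuming a linear true model fitted by least squares, purely for illustration):

```python
import numpy as np

# Toy check of the convergence claim (linear true model is an
# illustrative choice): the Monte Carlo estimate of
# E[(f_hat(X) - f(X))^2] shrinks as the sample size n grows.
rng = np.random.default_rng(3)
theta_true = np.array([1.0, -0.5, 2.0])

for n in [10, 100, 1000, 10000]:
    X = rng.normal(size=(n, 3))
    y = X @ theta_true + rng.normal(scale=0.5, size=n)
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

    X_new = rng.normal(size=(5000, 3))  # fresh inputs for the expectation
    gap = np.mean((X_new @ theta_hat - X_new @ theta_true) ** 2)
    print(n, gap)
```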

Ultimately, different perspectives frame the relationship between regression and prediction differently, with the distinction lying in the specific goals and metrics.

Footnotes

  1. The term regression also refers to the class of methods used to solve this type of task.