Maximum a Posteriori

As in Bayesian inference, we have a posterior distribution determined by a prior distribution and the observations. However, instead of taking the posterior expectation, we can take the value that maximizes the posterior density, in the same way Maximum Likelihood Estimation maximizes the likelihood. This gives the maximum a posteriori (MAP) estimate.

Note that here we are not maximizing the likelihood of the observations; we are maximizing the posterior density of the weights given the observations.
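
Written out, the distinction can be sketched as follows. The symbols are assumptions introduced here for illustration: $w$ denotes the weights and $\mathcal{D}$ the observations. By Bayes' rule the evidence $p(\mathcal{D})$ does not depend on $w$, so it drops out of the maximization:

```latex
\hat{w}_{\mathrm{MLE}} = \arg\max_{w} \, p(\mathcal{D} \mid w)

\hat{w}_{\mathrm{MAP}}
  = \arg\max_{w} \, p(w \mid \mathcal{D})
  = \arg\max_{w} \, \frac{p(\mathcal{D} \mid w)\, p(w)}{p(\mathcal{D})}
  = \arg\max_{w} \, \bigl[ \log p(\mathcal{D} \mid w) + \log p(w) \bigr]
```

The only difference from MLE is the extra $\log p(w)$ term contributed by the prior.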

Relationship with Ridge Regression

Just as MLE is the probabilistic interpretation of least squares estimation, MAP is the probabilistic interpretation of ridge regression.

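Assume the prior distribution of the weights $w$ is $\mathcal{N}(0, \tau^2 I)$ and the observation noise is Gaussian with variance $\sigma^2$, then

$$\hat{w}_{\mathrm{MAP}} = \arg\max_{w} \left[ \log p(y \mid X, w) + \log p(w) \right] = \arg\min_{w} \left[ \frac{1}{2\sigma^2} \lVert y - Xw \rVert_2^2 + \frac{1}{2\tau^2} \lVert w \rVert_2^2 \right],$$

which gives $\hat{w}_{\mathrm{MAP}} = (X^\top X + \lambda I)^{-1} X^\top y$, equal to the ridge regression solution with regularizer parameter $\lambda = \sigma^2 / \tau^2$.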
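
The equivalence can be checked numerically. The following is a minimal sketch, not a definitive implementation. It assumes a linear model with Gaussian noise of variance `sigma2` and a zero-mean Gaussian prior with variance `tau2` on each weight. The function names, the synthetic data, and these parameter values are all hypothetical. The MAP estimate is found by gradient descent on the negative log posterior and compared with the closed-form ridge solution using `lam = sigma2 / tau2`.

```python
import numpy as np


def ridge_closed_form(X, y, lam):
    """Ridge solution: argmin_w ||y - Xw||^2 + lam * ||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def neg_log_posterior_grad(w, X, y, sigma2, tau2):
    """Gradient of the negative log posterior (constants dropped).

    Likelihood: y ~ N(Xw, sigma2 * I); prior: w ~ N(0, tau2 * I).
    """
    return -(X.T @ (y - X @ w)) / sigma2 + w / tau2


def map_by_gradient_descent(X, y, sigma2, tau2, steps=5000):
    """Minimize the negative log posterior with plain gradient descent."""
    d = X.shape[1]
    # The objective is quadratic; its Hessian is constant.
    hessian = X.T @ X / sigma2 + np.eye(d) / tau2
    # Step size 1/L, with L the largest Hessian eigenvalue, guarantees descent.
    lr = 1.0 / np.linalg.eigvalsh(hessian).max()
    w = np.zeros(d)
    for _ in range(steps):
        w -= lr * neg_log_posterior_grad(w, X, y, sigma2, tau2)
    return w


# Synthetic data (assumed values, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
sigma2, tau2 = 0.25, 4.0
y = X @ w_true + rng.normal(scale=np.sqrt(sigma2), size=50)

w_map = map_by_gradient_descent(X, y, sigma2, tau2)
w_ridge = ridge_closed_form(X, y, lam=sigma2 / tau2)

print("MAP estimate:  ", w_map)
print("Ridge estimate:", w_ridge)
```

A tighter prior (smaller `tau2`) corresponds to a larger `lam`, which shrinks the weights more strongly toward zero.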