Goodness of Fit Test

Goodness of fit tests are non-parametric tests that assess how well a statistical model fits a set of observations. The general idea is to test the empirical PMF/PDF against the null PMF/PDF of a statistical model.

Discrete Case

A discrete distribution with a finite sample space is a generalized Bernoulli distribution, also termed a categorical distribution, determined by the PMF $p = (p_1, \dots, p_k)$, where $1, \dots, k$ are the categories. The collection of all categorical distributions with $k$ categories is actually a parametrized model with parameter space $\Delta_k = \{ p \in \mathbb{R}^k : p_j \ge 0, \ \sum_{j=1}^k p_j = 1 \}$, the simplex. And this parameter space is of dimension $k - 1$. This implies the following theorem:

Thm

Under the null hypothesis $H_0: p = p^0$,

$$T_n = n \sum_{j=1}^{k} \frac{(\hat p_j - p^0_j)^2}{p^0_j} \xrightarrow{d} \chi^2_{k-1},$$

where $\hat p$ is the PMF MLE, $p^0$ is the true PMF (under the null), and $\chi^2_{k-1}$ is the Chi-Square Distribution with $k - 1$ degrees of freedom (DoF).

With some calculation, one can show that $\hat p$ is indeed the empirical PMF: $\hat p_j = \frac{N_j}{n}$, where $N_j$ is the number of observations in category $j$.
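As a quick illustration, here is a minimal NumPy sketch (the labels and counts are hypothetical) showing that the MLE is just the vector of normalized category counts:

```python
import numpy as np

# hypothetical sample of n = 10 observations over k = 3 categories (labels 0, 1, 2)
x = np.array([0, 1, 1, 2, 0, 1, 2, 2, 1, 0])
k = 3

# N_j = number of observations in category j; the MLE is p_hat_j = N_j / n
counts = np.bincount(x, minlength=k)
p_hat = counts / len(x)

print(counts)  # [3 4 3]
print(p_hat)   # [0.3 0.4 0.3]
```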

$$n \sum_{j=1}^{k} \frac{(\hat p_j - p^0_j)^2}{p^0_j} \xrightarrow{d} \chi^2_{k-1} \quad \text{vs} \quad n \, (\hat\theta_n - \theta)^\top I(\theta) \, (\hat\theta_n - \theta) \xrightarrow{d} \chi^2_d.$$

Additionally, the result seems to be consistent with the classical asymptotic distribution of the MLE for a $d$-dimensional parameter (shown on the right), with $I(p) = \operatorname{diag}(1/p_1, \dots, 1/p_k)$, except the DoF of the chi-square distribution is $k - 1$ instead of $k$.

Although one can memorize the above result by mirroring it to the classical properties of the MLE, the same derivation will not go through. This is because, due to the linear dependence among $\hat p$'s coordinates (they sum to $1$), the covariance matrix of the score function is of rank $k - 1$, hence not invertible, failing the regularity conditions for Maximum Likelihood Estimation.

The asymptotically valid test using the above theorem against $H_0: p = p^0$, namely $\psi_n = \mathbb{1}\{ T_n > q_{1-\alpha}(\chi^2_{k-1}) \}$, is called the chi-square test.
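A sketch of the resulting chi-square test (the null PMF and counts below are hypothetical; `scipy.stats.chisquare` computes the same Pearson statistic from the expected counts $n p^0_j$):

```python
import numpy as np
from scipy import stats

# hypothetical null PMF over k = 3 categories, and observed counts with n = 100
p0 = np.array([0.5, 0.3, 0.2])
counts = np.array([48, 35, 17])
n = counts.sum()

# T_n = n * sum_j (p_hat_j - p0_j)^2 / p0_j
p_hat = counts / n
T = n * np.sum((p_hat - p0) ** 2 / p0)

# reject at level alpha if T exceeds the (1 - alpha) quantile of chi2 with k - 1 DoF
alpha = 0.05
crit = stats.chi2.ppf(1 - alpha, df=len(p0) - 1)
reject = T > crit  # False here: these counts are close to the null

# scipy agrees when given the expected counts n * p0
T_scipy, pval = stats.chisquare(counts, f_exp=n * p0)
```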

Continuous Case

One possible way is to plot the Histogram of the sample, which is essentially a PMF with finitely many bins, and use the chi-square test. Suppose our null PDF is $f^0$; then the discretized null PMF is $p^0_j = \int_{B_j} f^0(x) \, dx$, where $B_j$ is the $j$-th bin.
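A sketch of this binning approach, assuming (as a made-up example) a standard normal null and four hand-picked bins:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=500)  # hypothetical sample; null is N(0, 1)

# bins B_j; the null PMF of bin B_j is the integral of f0 over B_j,
# computed here as differences of the null CDF at the bin edges
edges = np.array([-np.inf, -1.0, 0.0, 1.0, np.inf])
counts, _ = np.histogram(x, bins=edges)
p0 = np.diff(stats.norm.cdf(edges))

# chi-square test on the discretized PMF (expected counts n * p0_j)
T, pval = stats.chisquare(counts, f_exp=len(x) * p0)
```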

Another convenient visual way is to inspect the quantile function, leading to the quantile-quantile (Q-Q) plot.

We can also look at the CDF. Note that the CDF of a Discrete Random Variable is a step function. For any data-generating distribution, the empirical CDF constructed from finitely many samples is naturally a step function, so we do not need to manually discretize the data into bins. And because of the nice convergence properties covered in Inference for CDFs, we can directly use the empirical CDF to test against the null CDF.

Let $\hat F_n$ be the empirical CDF. The Kolmogorov–Smirnov test statistic is $T_n = \sqrt{n} \, \sup_x |\hat F_n(x) - F^0(x)|$. We can construct an asymptotically valid test using Donsker’s theorem, with the critical values being the quantiles of $\sup_{t \in [0,1]} |\mathbb{B}(t)|$, where $\mathbb{B}$ is a Brownian bridge.
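A sketch of the statistic and the asymptotic critical value (the helper function is hypothetical; scipy exposes the limiting distribution of $\sup_t |\mathbb{B}(t)|$ as `scipy.stats.kstwobign`):

```python
import numpy as np
from scipy import stats

def ks_statistic(x, null_cdf):
    """T_n = sqrt(n) * sup_x |F_n(x) - F0(x)| for a continuous null CDF."""
    n = len(x)
    f0 = null_cdf(np.sort(x))
    i = np.arange(1, n + 1)
    # the sup is attained at a data point, just after or just before a jump of F_n
    d = np.max(np.maximum(i / n - f0, f0 - (i - 1) / n))
    return np.sqrt(n) * d

rng = np.random.default_rng(0)
x = rng.normal(size=200)              # hypothetical sample
T = ks_statistic(x, stats.norm.cdf)   # null: standard normal

# asymptotic critical value at level 0.05: the 0.95-quantile of sup |B(t)|
crit = stats.kstwobign.ppf(0.95)
reject = T > crit
```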

We can also construct a non-asymptotic valid test by noticing that $T_n$ is pivotal: it has the same distribution for any continuous $F^0$, given the same sample size $n$. Therefore, we can build the quantile table of $T_n$ using offline simulations, and then fetch the critical values for later hypothesis testing. This is called inference by simulation:

  • Input: sample size $n$, number of rounds $M$.
  • For $m = 1, \dots, M$:
    • Sample $X^{(m)}_i \sim \mathrm{Unif}(0, 1)$ for $i = 1, \dots, n$.
    • Compute the empirical CDF $\hat F^{(m)}_n$ of $X^{(m)}_1, \dots, X^{(m)}_n$.
    • Compute $T^{(m)}_n = \sqrt{n} \, \sup_x |\hat F^{(m)}_n(x) - x|$.
  • Let $\hat q_{1-\alpha}$ be the $(1 - \alpha)$-quantile of $\{ T^{(m)}_n \}_{m=1}^{M}$.

Once we conduct the above simulation for a large $M$, we get the critical values and can use them for any test with a sample size $n$.
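The simulation above can be sketched as follows ($M$, the seed, and the vectorized sup computation are implementation choices, not part of the recipe):

```python
import numpy as np

def simulate_ks_quantile(n, M, alpha, seed=0):
    """(1 - alpha)-quantile of T_n = sqrt(n) * sup_x |F_n(x) - x|, simulated
    with Unif(0, 1) samples; by pivotality this works for any continuous F0."""
    rng = np.random.default_rng(seed)
    i = np.arange(1, n + 1)
    T = np.empty(M)
    for m in range(M):
        u = np.sort(rng.uniform(size=n))
        # sup over x of |F_n(x) - x|, attained at the jump points of F_n
        T[m] = np.sqrt(n) * np.max(np.maximum(i / n - u, u - (i - 1) / n))
    return np.quantile(T, 1 - alpha)

q = simulate_ks_quantile(n=100, M=5000, alpha=0.05)
# q is already close to the asymptotic value (about 1.36) at n = 100
```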

Then, a $1 - \alpha$ confidence band for $F$ is $\left[ \hat F_n(x) - \frac{\hat q_{1-\alpha}}{\sqrt{n}}, \ \hat F_n(x) + \frac{\hat q_{1-\alpha}}{\sqrt{n}} \right]$. For the null hypothesis $H_0: F = F^0$, the Kolmogorov–Smirnov test is $\psi_n = \mathbb{1}\{ T_n > \hat q_{1-\alpha} \}$.

Using inference by simulation also directly gives an approximate p-value: $\widehat{\text{p-value}} = \frac{1}{M} \sum_{m=1}^{M} \mathbb{1}\{ T^{(m)}_n \ge T_n \}$.
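A sketch of the approximate p-value (the sample and the $N(0, 1)$ null are hypothetical; simulating with uniforms is justified because $T_n$ is pivotal):

```python
import numpy as np
from scipy import stats

n, M = 100, 2000
i = np.arange(1, n + 1)

def ks_stat(u_sorted):
    # sqrt(n) * sup |F_n - F0|, evaluated on sorted, CDF-transformed values
    return np.sqrt(n) * np.max(np.maximum(i / n - u_sorted, u_sorted - (i - 1) / n))

rng = np.random.default_rng(1)

# observed statistic: hypothetical sample tested against the null N(0, 1)
x = rng.normal(size=n)
T_obs = ks_stat(np.sort(stats.norm.cdf(x)))

# simulated null statistics, then the fraction at least as large as T_obs
T_sim = np.array([ks_stat(np.sort(rng.uniform(size=n))) for _ in range(M)])
p_value = np.mean(T_sim >= T_obs)
```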

Remark

We conclude by commenting on elements in the above test that also appear throughout the topic of Hypothesis Testing:

  • The test statistic $T_n$ has the same distribution as $\sqrt{n} \, \sup_t |\hat F^{\mathrm{Unif}}_n(t) - t|$ under the null. The simulation corresponds to the uniform distribution, but $F^0$ can be any continuous distribution. This is similar to the CLT Test Statistic, where different test statistics all converge to the standard normal distribution.
  • Before any tests, you conduct a large simulation of $T_n$ to get its quantile table. This is similar to other HT settings where you prepare the quantile tables for, e.g., the Gaussian, t-Distribution, etc.
  • Now given a specific HT task and a sample, you can construct the test statistic and/or the p-value, which are all random variables. But note that, even though $T_n$ involves both $\hat F_n$ and $F^0$, its randomness comes solely from $\hat F_n$, because $F^0$ is a known distribution.
  • After observing the sample data $x_1, \dots, x_n$, you can compute the realized p-value. The calculated p-value quantifies the probability of observing sample data at least as extreme under the null. Thus, when it is small, the data provide strong evidence against the null.