OLS Summary: P-values and Confidence Intervals

Albert Um
3 min read · Jan 29, 2021

In short, a p-value is the probability of seeing a coefficient estimate at least as far from 0 as the one observed, assuming the population coefficient really is 0. It is not the probability that the coefficient is 0. The exact population coefficients are unknown due to sampling uncertainty, and the confidence intervals give a plausible range for each population coefficient at a chosen confidence level (e.g., the 95% CI reported in the OLS summary). If the population coefficient is 0, then the independent variable can be deemed insignificant. Therefore, if the p-value is small (e.g., below 0.05), the observed estimate would be unlikely under a zero coefficient, and we can lean in favor of the alternative that the feature is significant.

The p-values for the coefficients are derived from a hypothesis test. The null hypothesis states the coefficient is zero (insignificant), and the alternative claims the coefficient is not zero (significant). Hypothesis tests work by “proof by disproof”: assume the null is true, then check whether the data would be surprising under that assumption. The p-value is the probability of obtaining a test statistic at least as extreme as the one observed, given that the null statement, coefficient = 0, is true. If the p-value is small, we can reject the null hypothesis in favor of the alternative.

The output of the test is a t-score, which is then translated to a p-value using a t-value table. The two-tailed t-test statistic can be calculated as such:

    t = (x_bar - 0) / SE

x_bar stands for the coefficient estimated from the sample, 0 is the value claimed by the null hypothesis, and SE stands for the coefficient’s standard error. Simply dividing the coefficient by the standard error gives you the t-score. The p-value is then found in a t-value chart using the t-score and the degrees of freedom (which equal the number of observations minus the number of feature columns minus 1, the extra 1 accounting for the intercept).
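As a rough illustration of the t-score and table lookup described above, the sketch below computes the two-tailed p-value by numerically integrating the t-distribution's density, which is the calculation a t-value chart tabulates. The function names and the input numbers are hypothetical and are not taken from any real OLS summary. It uses only the standard library; in practice a statistics package would usually do this step for you.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p_value(t_score, df, steps=2000):
    """P(|T| >= |t_score|) under the null, via Simpson's rule on [0, |t|]."""
    upper = abs(t_score)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # probability that |T| < |t_score|
    return max(0.0, 1.0 - central_mass)

# Hypothetical inputs: one feature plus an intercept, 32 observations.
coef, std_err = 2.5, 1.0
n_obs, n_features = 32, 1
df = n_obs - n_features - 1       # degrees of freedom
t_score = (coef - 0) / std_err    # coefficient divided by its standard error
p_value = two_sided_p_value(t_score, df)
print(f"t = {t_score:.2f}, df = {df}, p = {p_value:.4f}")
```

With these made-up inputs the p-value falls below 0.05, so the null hypothesis of a zero coefficient would be rejected at that alpha.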

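If the p-value is less than some assigned alpha (e.g., 0.05), we can reject the hypothesis that the population coefficient is 0 and lean toward the alternative that the feature is significant. The confidence interval for a coefficient in the OLS summary is a 95% confidence range for the population coefficient. The interval is calculated as such:

    CI = coefficient ± t_critical * SE

Here t_critical is the two-tailed critical value from the t-value chart for the chosen confidence level (alpha = 0.05 for a 95% interval) and the same degrees of freedom as the t-test.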
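To make the interval concrete, here is a small worked sketch of coefficient plus or minus the critical value times the standard error. The critical values are a few rows copied from a standard two-tailed 95% t-table; the coefficient, standard error, and helper name are hypothetical.

```python
# Two-tailed 95% critical values from a standard t-table,
# keyed by degrees of freedom. Only a few rows are included here.
T_CRIT_95 = {10: 2.228, 30: 2.042, 60: 2.000, 120: 1.980}

def confidence_interval(coef, std_err, t_crit):
    """Return (lower, upper) = coef -/+ t_crit * std_err."""
    margin = t_crit * std_err
    return coef - margin, coef + margin

# Hypothetical inputs, not taken from any real OLS summary.
coef, std_err, df = 2.5, 1.0, 30
lower, upper = confidence_interval(coef, std_err, T_CRIT_95[df])

# The feature is significant at alpha = 0.05 when the interval excludes 0.
significant = not (lower <= 0.0 <= upper)
print(f"95% CI: [{lower:.3f}, {upper:.3f}], excludes 0: {significant}")
```

A 95% interval that excludes 0 goes hand in hand with a two-tailed p-value below 0.05, because both comparisons rest on the same t-score and the same critical value.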

It’s important to note that an increase in sample observations will tighten the confidence interval, as the standard error of the coefficient will decrease (roughly in proportion to 1 / sqrt(n)). In addition, when the population coefficient is not 0, the t-score will tend to grow in magnitude as the number of sample observations increases, therefore lowering the p-value. Intuitively, more observations reduce the sample-to-sample variation of the coefficient estimate. However, increasing observations will not shrink the standard error of the regression, which estimates the spread of the residuals; only the standard error of the coefficient shrinks.
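The effect of sample size can be sketched with NumPy. The example below holds the residual spread (sigma, a hypothetical value) fixed and repeats the same small design four times over, so the only thing that changes is the number of observations. The helper name and data are assumptions made for illustration.

```python
import numpy as np

def coef_std_errors(X, sigma):
    """Standard errors of OLS coefficients for design X and residual std sigma."""
    xtx_inv = np.linalg.inv(X.T @ X)
    return sigma * np.sqrt(np.diag(xtx_inv))

# Hypothetical design: an intercept column plus one feature.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X_small = np.column_stack([np.ones_like(x), x])   # 5 observations
X_large = np.tile(X_small, (4, 1))                # same pattern, 20 observations

sigma = 2.0  # standard error of the regression, held fixed on purpose

se_small = coef_std_errors(X_small, sigma)
se_large = coef_std_errors(X_large, sigma)
print("SE with 5 obs: ", se_small)
print("SE with 20 obs:", se_large)
print("ratio:", se_small / se_large)
```

Quadrupling the observations halves each coefficient's standard error while sigma stays the same, which matches the 1 / sqrt(n) behavior and the distinction between the two kinds of standard error.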

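The standard error of a coefficient is calculated with the sum of squared residuals (SSR), the degrees of freedom (df), and the inverse of X_transpose*X:

    SE(coefficient_j) = sqrt( (SSR / df) * [inverse(X_transpose*X)]_jj )

where [ ... ]_jj is the j-th diagonal entry of the inverse matrix.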
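The calculation can be sketched in a few lines of NumPy. This is a minimal illustration on made-up data, assuming X already contains a column of ones for the intercept; the function name is hypothetical, and the normal-equations solve is used for readability rather than numerical robustness.

```python
import numpy as np

def ols_with_std_errors(X, y):
    """Return OLS coefficients and their standard errors.

    X must already include a column of ones for the intercept.
    """
    n_obs, n_cols = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)        # inverse of X_transpose * X
    beta = xtx_inv @ X.T @ y                # coefficient estimates
    residuals = y - X @ beta
    ssr = residuals @ residuals             # sum of squared residuals
    df_resid = n_obs - n_cols               # observations - features - 1
    sigma2 = ssr / df_resid                 # estimated residual variance
    std_err = np.sqrt(sigma2 * np.diag(xtx_inv))
    return beta, std_err

# Hypothetical data: five observations, one feature.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])

beta, std_err = ols_with_std_errors(X, y)
t_scores = beta / std_err                   # coefficient / standard error
print("coefficients:   ", beta)
print("standard errors:", std_err)
print("t-scores:       ", t_scores)
```

The standard errors computed this way should line up with the std err column of an OLS summary fitted on the same data, and dividing each coefficient by its standard error reproduces the t-scores.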

I hope this blog has helped you interpret the OLS summary. The main takeaways from this blog are:

  • The p-value is the by-product of the hypothesis test that the population coefficient is 0 (insignificant).
  • Coefficient confidence intervals represent a plausible range for the population coefficient. The larger the number of samples, the tighter the confidence interval of the coefficients (NOT of the regression/model) will be.
  • The standard error of the coefficients can be calculated using fancy linear algebra.
