Covariate shift in Machine Learning

Photo by Erda Estremera on Unsplash

What is Covariate shift?

For example, let’s say I created a classifier for health insurance coverage using census data. The model will susceptibly deteriorate over time as the distribution of the inputs changes. A change in the input probabilities violates a primary assumption that the probabilities are the same during training and testing.

In addition, if my health insurance classifier was trained using New York’s census data explicitly and was then used to classify Alabama residents, the model might perform worse on Alabama than during training(New York).

Most supervised models follow the assumption that the training and testing sets are pulled from the same population. In order to maintain this expensive assumption, the model should be refit regularly with an addition of the most recent data. However, when it’s not possible to collect additional data, there are some simple tricks such as importance reweighting. Before handling the shift, it’s important to first identify if a shift has occurred.

Identify Shift

When the model handles multiple variables, I can create pseudo labels for the training and testing sets(1 for test and 0 for training) and use the combined data sets to fit a binary classifier to see if the model can detect the difference. If the training set and the testing set come from the same population, then the pseudo classifier will have a hard time classifying.

Handling Shift

I extract weights by fitting a model with the joined training and testing set with a pseudo-dependent variable, 1 for testing and 0 for training. The objective is to return the confidence scores on the training set after fitting and using the exponentiated confidence scores as weights.

The higher the confidence score, the more confident the model is that the observation belongs in the positive case(similar to the test). The confidence scores are then exponentiated to return the weights. Since I don’t want specific observations to have outrageous weight assignments, I clipped the confidence scores at a certain threshold. Clipping will increase bias but at the same time decrease variance significantly.

The final classifier is fit with the appropriate weights that look similar to the test set.

Hello! My name is Albert Um.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store