Time Series: Feature Engineering Datetime Month

Photo by Jay Mantri on Unsplash

For this blog, I will talk about a feature engineering trick as an alternative to creating dummy variables for months. It’s common practice to create datetime dummy month columns in additive models. For example, I am trying to forecast wildfires in Queensland, Australia.


And in order to capture the month seasonality, I create dummy month features from the datetime index using a loop as such:

When I take these features and forecast using SARIMAX, there might be large step-downs or step-ups from the last day of a given month to the first day of the next month. For example:

This makes a lot of sense because each month's feature has different coefficients. However, maybe creating on and off switches based on month isn’t exactly what I want. Although January 1st is in a new month, it’s not very far from December 31st. I would want the beginning of January(month_1) to take in some elements from December(month_12).

In order to create these features, I used a radial basis function to return me bumps for each month.

However, I had to brute force the solution by creating multiple(n years) arrays of cumulative days of the data frame and minus the cumulative mid-day for that year and returning max values for that array because initially, January didn’t take any elements of December and vice versa.

The month features are now a bit curvier than the previous on and off switches:

When I refit the training set with these features, I get a more generalized and curvy plot as below:

I learned about these ideas from Vincent D. Warmerdam at his PyData talk back in 2018. Many kudos to him! Please check out his talk here:


Hello! My name is Albert Um.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store