Time Series: Feature Engineering Datetime Month

Albert Um
3 min readDec 11, 2020

--

Photo by Jay Mantri on Unsplash

For this blog, I will talk about a feature engineering trick as an alternative to creating dummy variables for months. It’s common practice to create datetime dummy month columns in additive models. For example, I am trying to forecast wildfires in Queensland, Australia.

https://github.com/Call-for-Code/Spot-Challenge-Wildfires

And in order to capture the month seasonality, I create dummy month features from the datetime index using a loop as such:

When I take these features and forecast using SARIMAX, there might be large step-downs or step-ups from the last day of a given month to the first day of the next month. For example:

This makes a lot of sense because each month's feature has different coefficients. However, maybe creating on and off switches based on month isn’t exactly what I want. Although January 1st is in a new month, it’s not very far from December 31st. I would want the beginning of January(month_1) to take in some elements from December(month_12).

In order to create these features, I used a radial basis function to return me bumps for each month.

However, I had to brute force the solution by creating multiple(n years) arrays of cumulative days of the data frame and minus the cumulative mid-day for that year and returning max values for that array because initially, January didn’t take any elements of December and vice versa.

The month features are now a bit curvier than the previous on and off switches:

When I refit the training set with these features, I get a more generalized and curvy plot as below:

I learned about these ideas from Vincent D. Warmerdam at his PyData talk back in 2018. Many kudos to him! Please check out his talk here:

https://www.youtube.com/watch?v=68ABAU_V8qI&ab_channel=PyData

--

--

Albert Um
Albert Um

Written by Albert Um

Hello! My name is Albert Um.

No responses yet