Time Series: Feature Engineering Datetime Month

Photo by Jay Mantri on Unsplash

In this post, I describe a feature engineering trick that can replace dummy variables for months. Creating dummy month columns from a datetime index is common practice in additive models. As a running example, I am trying to forecast wildfires in Queensland, Australia:

https://github.com/Call-for-Code/Spot-Challenge-Wildfires

To capture monthly seasonality, I create dummy month features from the datetime index, one 0/1 column per month, using a loop over the twelve months.
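A minimal sketch of such a loop, assuming a pandas DataFrame with a daily DatetimeIndex. The date range and the `month_1` ... `month_12` column names are placeholders for illustration (the column names follow the labels used later in this post), not the actual wildfire data:

```python
import pandas as pd

# Hypothetical daily index standing in for the wildfire data's datetime index.
idx = pd.date_range("2019-01-01", "2020-12-31", freq="D")
df = pd.DataFrame(index=idx)

# One 0/1 column per calendar month: month_1 ... month_12.
for m in range(1, 13):
    df[f"month_{m}"] = (df.index.month == m).astype(int)
```

Every row has exactly one column switched on, which is what produces the hard on/off behaviour at month boundaries.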

When I forecast with SARIMAX using these features, the prediction can step sharply up or down between the last day of one month and the first day of the next.

This makes a lot of sense because each month's feature gets its own coefficient, so the fitted level can jump whenever the active month changes. However, on/off switches based on month may not be exactly what I want. Although January 1st falls in a new month, it is only one day away from December 31st. I would want the beginning of January (month_1) to take in some elements from December (month_12).

To create these features, I used a radial basis function, which returns a smooth bump for each month.
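As a rough illustration of the idea, here is a sketch of plain Gaussian bumps over the day of the year. The evenly spaced month centres and the one-month bump width are assumptions chosen for the example, not values from the original analysis:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2019-01-01", "2019-12-31", freq="D")
day = idx.dayofyear.to_numpy()

# Assumed month centres: 12 evenly spaced points through a 365-day year.
centers = (np.arange(12) + 0.5) * 365 / 12
width = 365 / 12  # assumed bump width, roughly one month

# Plain (non-cyclic) Gaussian bump per month, one column each.
rbf = np.exp(-((day[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))
naive = pd.DataFrame(rbf, index=idx, columns=[f"month_{m}" for m in range(1, 13)])
```

In this naive form the distance is measured along a straight line, so on January 1st the `month_12` column is effectively zero: December sits almost a full year away instead of one day away.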

However, I had to brute-force the solution. Initially, January didn't take any elements of December and vice versa, so I created multiple arrays (one per year, for n years) of the cumulative days of the data frame minus the cumulative mid-day for that year, and returned the maximum values across those arrays.
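The wraparound can also be obtained without stacking per-year arrays. The sketch below is not the brute-force approach described above but a swapped-in alternative that uses a circular distance on the day of the year. The function name `cyclic_month_rbf`, the 365.25-day period, and the default width are all assumptions for illustration:

```python
import numpy as np
import pandas as pd

def cyclic_month_rbf(index, width=365.25 / 12, period=365.25):
    """Return 12 Gaussian bumps per date that wrap around the year boundary."""
    day = index.dayofyear.to_numpy().astype(float)
    # Assumed month centres: 12 evenly spaced points through the period.
    centers = (np.arange(12) + 0.5) * period / 12
    # Circular distance: take the shorter way around the year.
    diff = np.abs(day[:, None] - centers[None, :])
    dist = np.minimum(diff, period - diff)
    values = np.exp(-(dist ** 2) / (2 * width ** 2))
    cols = [f"month_{m}" for m in range(1, 13)]
    return pd.DataFrame(values, index=index, columns=cols)

# Hypothetical two-year daily index, including a leap year.
idx = pd.date_range("2019-01-01", "2020-12-31", freq="D")
features = cyclic_month_rbf(idx)
```

With this distance, January 1st gets a large `month_12` value as well as a large `month_1` value, while a mid-year column such as `month_6` stays near zero. The `width` argument controls how far each month bleeds into its neighbours and would need tuning for real data.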

The month features are now smooth curves rather than the previous on/off switches.

When I refit the model on the training set with these features, I get a smoother, more generalized fit.

I learned about these ideas from Vincent D. Warmerdam at his PyData talk back in 2018. Many kudos to him! Please check out his talk here:

https://www.youtube.com/watch?v=68ABAU_V8qI&ab_channel=PyData


Hello! My name is Albert Um.
