Introduction
Forecasting electricity demand matters a great deal, both to the research community and to the industry itself. Forecasts can be long-term, medium-term or short-term. Medium- to long-term load forecasts are used in planning and policy making, while short-term forecasts are used to schedule the generation process and to determine prices on electricity exchanges.
Short-term demand forecasts are vital for the scheduling and control of power systems. They tell us which generators and turbines should stay functional, which ones should stay idle, and for how long. They also help us prevent overloading, which may lead to equipment failure.
Another important issue is the gradual decentralization of electricity markets. Some countries have adopted it, while many are still considering it. Markets use these forecasts to price electricity. With supply and demand fluctuating and electricity prices spiking by a factor of ten or more in a matter of hours, load forecasting is vitally important for all market participants. Reducing forecast error is therefore a major concern: even a small error can lead to a huge loss for producers, distributors, investors and others.
Electricity load depends heavily on weather conditions. Weather anomalies produce dramatic changes in the demand for electricity, and the weather factor becomes increasingly important when we analyze medium- to long-term demand. This calls for complex multivariate time series models. Nevertheless, several studies have shown that univariate time series methods perform quite well. Researchers and practitioners have typically used regression, exponential smoothing and autoregressive integrated moving average (ARIMA) models for the purpose. Neural networks are also used, but their performance is inferior to classical time series methods. With machine learning methods gaining popularity, support vector regression and its variant, least-squares support vector regression, have also been tried, and the results are encouraging.
Method and Data
In this blog post I will use a modified exponential smoothing method called TBATS (an acronym for Trigonometric, Box-Cox Transformation, ARMA Errors, Trend and Seasonality) for short-term electricity demand forecasting. It is an innovations state space modeling framework for forecasting complex time series that contain more than one seasonal period, or otherwise complex seasonality. The approach was published by De Livera et al. in the Journal of the American Statistical Association. The article contains intricate mathematical details. The method has been implemented by Slava Razbash and Rob J. Hyndman in R’s “forecast” package.
The data for the analysis is available from the website of the Independent Electricity System Operator, which is responsible for power distribution in Ontario, Canada. The original dataset comprises an hourly demand time series from 2002 until 2014. I have used about 12 weeks of data, from 6 January 2014 until 31 March 2014, for the analysis. I used roughly the first eight weeks (6 January to 3 March) as the training set and the last four weeks (4 March to 31 March) as the test set.
As the plot below shows, the data is nonlinear and contains two seasonal periods: a daily seasonality and a weekly seasonality. If a longer time series is used, we also need to check for and include the annual seasonality.
The electricity demand pattern on holidays is markedly different from that on ordinary weekdays, which may have an adverse effect on model estimation. One simple way to deal with this problem is to replace the holiday data with the mean of the days before and after it. For example, you may replace the value at 01:00 on a holiday (say a Tuesday) with the mean of the values at 01:00 on Monday and Wednesday, and so on for the other hours. The data used here contains very few holidays, so I have skipped this step.
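The holiday replacement described above was not applied in this analysis, but it can be sketched in a few lines of base R. The helper name replace_holiday and the toy series are hypothetical, and the sketch assumes the series starts at the first hour of a day and has exactly 24 values per day.

```r
# Hypothetical helper (not part of the forecast package): replace the 24 hourly
# values of one holiday with the hour-by-hour mean of the previous and next day.
# Assumes `demand` starts at hour 1 of day 1 and has 24 values per day.
replace_holiday <- function(demand, holiday_day) {
  hours <- 1:24
  prev_idx <- (holiday_day - 2) * 24 + hours   # same hours, day before
  hol_idx  <- (holiday_day - 1) * 24 + hours   # the holiday itself
  next_idx <- holiday_day * 24 + hours         # same hours, day after
  # A neighbouring day must exist on both sides of the holiday.
  stopifnot(holiday_day > 1, max(next_idx) <= length(demand))
  demand[hol_idx] <- (demand[prev_idx] + demand[next_idx]) / 2
  demand
}

# Toy series: three days, the middle one being a low-demand "holiday".
demand <- c(rep(100, 24), rep(50, 24), rep(120, 24))
adjusted <- replace_holiday(demand, holiday_day = 2)
adjusted[25:48]   # every holiday hour is now (100 + 120) / 2 = 110
```

If the neighbouring day is itself a holiday or a weekend, a different reference day would probably be a better choice; the sketch does not handle that case.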
Application
Here I will show how this can be done using R’s forecast package. I have supplied the time series and the seasonal periods to the model. The other parameters are chosen by the model based on the Akaike Information Criterion.
Loading the data in R:
electricity.data <- read.csv("HourlyDemands_2002-2014.csv")
head(electricity.data)
## Date Hour Total.Market.Demand Ontario.Demand
## 1 01-May-02 1 14141 14137
## 2 01-May-02 2 13876 13872
## 3 01-May-02 3 13974 13820
## 4 01-May-02 4 13898 13744
## 5 01-May-02 5 14378 14224
## 6 01-May-02 6 15408 15404
As I am interested in forecasting the Ontario demand, I will use the time series in the fourth column of the data, under the variable name “Ontario.Demand”.
#Training set: 57 days (8 weeks plus one day) of data containing 1368 hourly
#data points from 06 January 2014 until 03 March 2014.
electricity.train <- electricity.data[102433:103800,]
#Test set: 4 weeks of data containing 672 hourly data points from
#04 March 2014 until 31 March 2014.
electricity.test <- electricity.data[103801:104472,]
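The row numbers used for the two subsets can be derived from the dates rather than looked up by hand. The following is a minimal sketch, assuming the file has exactly 24 rows per day with no gaps and starts at hour 1 of 1 May 2002, as the output of head() suggests. The helper first_row_of is hypothetical.

```r
# Assumption: 24 rows per day, no missing hours, first row = 1 May 2002, hour 1.
first_day <- as.Date("2002-05-01")

# Hypothetical helper: row number of hour 1 on a given date.
first_row_of <- function(day) {
  as.integer(as.Date(day) - first_day) * 24 + 1
}

train_start <- first_row_of("2014-01-06")       # hour 1 of 6 January 2014
train_end   <- first_row_of("2014-03-04") - 1   # hour 24 of 3 March 2014
test_start  <- first_row_of("2014-03-04")       # hour 1 of 4 March 2014
test_end    <- first_row_of("2014-04-01") - 1   # hour 24 of 31 March 2014

c(train_start, train_end, test_start, test_end)
# 102433 103800 103801 104472
c(train = train_end - train_start + 1, test = test_end - test_start + 1)
# train: 1368 hours, test: 672 hours
```

If the file had missing or duplicated hours, these indices would drift, so it is worth checking the Date and Hour columns of the first and last row of each subset.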
Fitting the TBATS model:
#The output below has been trimmed to avoid printing a large amount of data
#on the screen.
library(forecast)
## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Loading required package: timeDate
## This is forecast 6.1
electricity.tbats <- tbats(electricity.train[,4],seasonal.periods = c(24,168))
electricity.tbats
## TBATS(0, {3,1}, 0.95, {<24,8>, <168,4>})
##
## Call: tbats(y = electricity.train[, 4], seasonal.periods = c(24, 168))
##
## Parameters
## Lambda: 5.3e-05
## Alpha: 0.02193652
## Beta: 0.003105538
## Damping Parameter: 0.949609
## Gamma-1 Values: -9.47268e-06 -1.780174e-05
## Gamma-2 Values: -4.117514e-05 2.1836e-06
## AR coefficients: 0.715125 0.30404 -0.2388
## MA coefficients: 0.591792
##
## Sigma: 0.01059258
## AIC: 24332.95
#Plot showing the decomposition of the time series.
plot(electricity.tbats)
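The fitted Lambda of 5.3e-05 is very close to zero, and a Box-Cox transformation with a lambda of zero is the natural logarithm. The model is therefore effectively fitted to log demand, which is presumably why Sigma looks so small next to demand values in the thousands. The sketch below illustrates this with a hand-written Box-Cox function; box_cox and the sample demand values are illustrative and not taken from the package or the dataset.

```r
# Illustrative Box-Cox transform, written out by hand for clarity.
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-10) log(y) else (y^lambda - 1) / lambda
}

lambda_hat <- 5.3e-05               # Lambda reported by the fitted model
demand <- c(12000, 15000, 20000)    # made-up demand levels of a similar size

transformed <- box_cox(demand, lambda_hat)

# Relative difference between the Box-Cox values and a plain log transform.
rel_diff <- abs(transformed - log(demand)) / log(demand)
rel_diff   # all well below 0.1 percent
```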
I have used the “forecast” function from the forecast package instead of the usual “predict” function; both work.
electricity.forecast <- forecast(electricity.tbats, h=672)
plot(electricity.forecast, main = "Forecasts from TBATS",xlab = "Time in Hours", ylab = "Electricity Demand")
The results contain point forecasts as well as the prediction intervals. I will use the function “accuracy” to find the error measures.
#electricity.forecast is the forecasted demand.
#electricity.test[,4] is the test set.
accuracy(electricity.forecast,electricity.test[,4])
## ME RMSE MAE MPE MAPE
## Training set 1.035377 194.1032 145.0593 -0.001292152 0.796321
## Test set -1776.148647 2045.2013 1820.6367 -10.697953811 10.940501
## MASE ACF1
## Training set 0.3004426 -0.04734554
## Test set NA NA
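To make the columns of this table concrete, here is a small sketch that computes the main error measures by hand on made-up numbers. It assumes the same sign convention as the forecast package, where the error is the actual value minus the forecast; error_measures is a hypothetical helper, not a package function.

```r
# Hypothetical helper computing the basic error measures by hand.
# Sign convention: error = actual - forecast.
error_measures <- function(actual, forecast) {
  e  <- actual - forecast
  pe <- 100 * e / actual             # percentage errors
  c(ME   = mean(e),
    RMSE = sqrt(mean(e^2)),
    MAE  = mean(abs(e)),
    MPE  = mean(pe),
    MAPE = mean(abs(pe)))
}

actual    <- c(100, 200, 400)        # made-up observations
predicted <- c(110, 190, 380)        # made-up forecasts
measures  <- error_measures(actual, predicted)
round(measures, 3)
# ME 6.667, RMSE 14.142, MAE 13.333, MPE 0, MAPE 6.667
```

Under this convention, the negative ME and MPE on the test set above mean that the forecasts were, on average, higher than the observed demand.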
Conclusion
The accuracy can probably be improved by trying different values of the parameters. One may try running the program with and without the Box-Cox transformation, and setting TRUE/FALSE for ARMA errors, trend, damped trend and several additional arguments. Note that these functions generate a lot of output, most of which is not shown here. For further details about the parameters, read the documentation.
library(forecast)
help(tbats)
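As a rough sketch of the kind of experiment suggested in the conclusion, the code below fits TBATS with and without the Box-Cox transformation and collects the AIC of each fit. The helper compare_tbats_options and the synthetic series are hypothetical; the series has only a daily cycle so that the example runs quickly, and ARMA errors are switched off for the same reason. The fitting step runs only if the forecast package is installed.

```r
# Hypothetical helper: fit TBATS with and without Box-Cox and report the AIC.
compare_tbats_options <- function(y, periods) {
  settings <- c(TRUE, FALSE)
  aic <- sapply(settings, function(use_bc) {
    fit <- forecast::tbats(y,
                           seasonal.periods = periods,
                           use.box.cox = use_bc,
                           use.arma.errors = FALSE,  # off to keep the sketch fast
                           use.parallel = FALSE)
    fit$AIC
  })
  data.frame(use.box.cox = settings, AIC = aic)
}

# Synthetic hourly series: 10 days with a daily cycle plus noise (made-up data).
set.seed(1)
hour <- 1:240
y <- 15000 + 2000 * sin(2 * pi * hour / 24) + rnorm(240, sd = 100)

if (requireNamespace("forecast", quietly = TRUE)) {
  aic_table <- compare_tbats_options(y, periods = 24)
  print(aic_table)
}
```

For the data in this post, the call would presumably be compare_tbats_options(electricity.train[,4], c(24,168)), which takes considerably longer. A lower AIC does not guarantee a lower error on the test set, so it is worth comparing the variants with accuracy() as well.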