ARIMA Modeling To Forecast Future Oil Prices



This article works through the ARIMA approach to time series forecasting, applied to WTI crude oil prices.

Monthly oil prices from January 2003 to October 2016 are best fitted by an autoregressive (AR) model.

The model projects near-term upside in prices.

Recently I posted an article regarding the econometric modeling of WTI crude oil (NYMEX:"CL")(NYSEARCA:USO)(NYSEARCA:OIL) prices using ordinary least squares ("OLS") multiple regression analysis. That model used a variety of factors, including consumption, production, real GDP and inventory builds to predict the price of oil. Based on how current data conformed to the modeling output, I estimated oil to be valued in the low-$50s per barrel.

Time series procedures are nonetheless a more typical means of forecasting future price points. In econometrics, an autoregressive integrated moving average ("ARIMA") model is commonly used to predict future points with respect to time-varying phenomena or as an alternative means of studying the nature of time series data. ARIMA can therefore be particularly useful when it comes to studying various forms of financial or economic data.

An ARIMA model combines three components: autoregressive ("AR"), integrated ("I"), and moving-average ("MA"). The autoregressive component regresses the variable of interest (in this case, the price of oil) on its own prior values. The "I" component applies differencing and is generally used when the data in the sample are non-stationary (i.e., the joint probability distribution of the variables under consideration changes over time). The moving-average component states that the output variable depends linearly on the present and past values of a stochastic term. ARIMA models are generally described in an "ARIMA (p,d,q)" format, where p is the number of AR terms, d the number of times the data are differenced, and q the number of MA terms. For example, an ARIMA (2,0,2) model would denote a model consisting of two AR terms, no differencing, and two MA terms. (A more rigorous explanation of ARIMA models, including mathematical definitions, can be found here.)
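For reference, one common way to write an ARIMA model in the usual (p,d,q) notation uses the lag operator L, where L y_t = y_{t-1}. The symbols here are my own notation rather than anything taken from the modeling output: y_t is the oil price in month t, c is a constant, the phi and theta terms are the AR and MA coefficients, and epsilon_t is the random shock.

```latex
\left(1 - \sum_{i=1}^{p} \phi_i L^{i}\right) (1 - L)^{d} \, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j L^{j}\right) \varepsilon_t
```

Setting p = 2, d = 0 and q = 0 collapses this to a plain second-order autoregression:

```latex
y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t
```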

The confidence intervals attached to the forecasts rely on the assumption that the residuals (i.e., the differences between the observed values and the values fitted by the model) are normally distributed and uncorrelated. Accordingly, it is important to plot an autocorrelation function ("ACF") and a partial autocorrelation function ("PACF") to help diagnose the proper ARIMA model to use on the data under consideration. For this exercise, WTI crude oil prices are taken from January 2, 2003, to October 21, 2016. The monthly price points of oil are graphed in the top part of the diagram below, while the ACF and PACF are graphed at the bottom:
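For readers who want to see what goes into an ACF and PACF plot, here is a minimal sketch in plain Python. It is not the code behind the charts in this article. The series is simulated from an AR(2) process with made-up coefficients (0.6 and 0.3), and the function names `acf`, `pacf` and `simulate_ar2` are hypothetical helpers written for this illustration. The PACF is computed with the Durbin-Levinson recursion.

```python
import random

def acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]

def pacf(series, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(series, max_lag)
    out = [1.0]
    phi_prev = []  # coefficients of the order k-1 autoregression
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                   for j in range(k - 1)]
        phi_new.append(phi_kk)
        out.append(phi_kk)
        phi_prev = phi_new
    return out

def simulate_ar2(phi1, phi2, n, seed=0):
    """Simulated AR(2) series (illustrative data only, not oil prices)."""
    rng = random.Random(seed)
    y = [0.0, 0.0]
    for _ in range(n + 100):  # the first 100 draws are discarded as burn-in
        y.append(phi1 * y[-1] + phi2 * y[-2] + rng.gauss(0.0, 1.0))
    return y[-n:]

series = simulate_ar2(0.6, 0.3, 3000)
band = 1.96 / len(series) ** 0.5  # rough 5% significance band
for lag, (a, p) in enumerate(zip(acf(series, 6), pacf(series, 6))):
    print(lag, round(a, 3), round(p, 3), "significant" if abs(p) > band else "")
```

On a simulated AR(2) series like this one, the ACF should fade gradually while the PACF should drop inside the band after lag 2, which is the same visual signature discussed for the oil data.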

(Source: author)

The slow decay of the ACF suggests the data follow a long-memory process. In real-world terms, shocks in the oil market have an enduring influence on the commodity's future price trajectory, affecting the market several observations ahead.

The PACF displays a sharp cutoff while the ACF decays geometrically, a pattern that is best modeled by adding AR terms rather than MA terms. In statistics parlance, it's deemed a "stationarized series." The gradual decay in the ACF simply suggests that each successive lag affects present and future observations less as time goes on. Therefore, events in the oil market from April 2015 are less likely to affect the market today than those from September 2016. This seems self-evident, but each dataset is different.

The PACF plot guides how many AR terms to add, while the ACF plot guides how many MA terms to add. Given the slow decay of the ACF, no MA terms are needed: adding them wouldn't affect the forecast much and would have no statistically significant effect. The PACF has two significant spikes, which gives us two AR terms. Accordingly, our ARIMA model can be reduced to a (2,0,0) model, or essentially just an AR model, given that the zeros exclude the I and MA terms. A check of statistical significance, comparing each AR coefficient with its standard error (the former divided by the latter should exceed 1.96 or fall below -1.96 for significance at the 5% level), shows that a (2,0,0) model is probably the best fit for the data, with both coefficients coming up as statistically significant.
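The significance check described above can be sketched in a few lines. This is an illustration under stated assumptions, not the estimation routine used for this article. The AR(2) coefficients are backed out of the lag-1 and lag-2 autocorrelations with the Yule-Walker equations, and the standard error uses the textbook large-sample approximation for an AR(2). The autocorrelation inputs (0.857 and 0.814) are hypothetical placeholders, and 166 is simply the count of months from January 2003 through October 2016.

```python
import math

def yule_walker_ar2(r1, r2):
    """AR(2) coefficients implied by the lag-1 and lag-2 autocorrelations."""
    denom = 1.0 - r1 * r1
    phi1 = r1 * (1.0 - r2) / denom
    phi2 = (r2 - r1 * r1) / denom
    return phi1, phi2

def ar2_standard_error(phi2, n_obs):
    """Large-sample standard error, identical for both AR(2) coefficients."""
    return math.sqrt((1.0 - phi2 * phi2) / n_obs)

def is_significant(coef, std_err, critical=1.96):
    """True when coef / std_err lies outside +/- critical (5% level by default)."""
    return abs(coef / std_err) > critical

# Hypothetical inputs for illustration; these are NOT the article's estimates.
r1, r2 = 0.857, 0.814
n_obs = 166  # assumed number of monthly observations

phi1, phi2 = yule_walker_ar2(r1, r2)
se = ar2_standard_error(phi2, n_obs)
for name, coef in (("ar1", phi1), ("ar2", phi2)):
    print(name, round(coef, 3), "ratio:", round(coef / se, 2),
          "significant:", is_significant(coef, se))
```

Statistical packages typically estimate the coefficients by maximum likelihood and report their own standard errors, so the figures from a sketch like this will differ somewhat from packaged output.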


The results of the model suggest some future upside in oil. I forecasted for the next twelve months, or through October 2017.

(Source: author)

The blue line represents the mean expectation or "point forecast." The darker blue cloud displays an 80% confidence interval while the lighter blue cloud shows a 95% confidence interval. The chart below shows that the point forecast expects oil to be priced at $58 per barrel at this time next year. Understandably, an 80% confidence interval provides a wide range of outcomes a year out, going from $26 at the 10th percentile (roughly the market's 2016 lows) to $90 at the 90th percentile.
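To show why the interval fans out as the horizon lengthens, here is a rough sketch of how point forecasts and intervals can be generated from an AR(2) model. Every number passed in below (the constant, the two coefficients, the residual standard deviation and the last two prices) is a hypothetical placeholder, not an estimate from this article, and `ar2_forecast` is a name invented for the example. The intervals assume normally distributed, uncorrelated residuals and ignore parameter uncertainty.

```python
import math

def ar2_forecast(history, const, phi1, phi2, sigma, horizon):
    """Return (point, lo80, hi80, lo95, hi95) for each step 1..horizon."""
    prev2, prev1 = history[-2], history[-1]
    psi = [1.0]      # psi weights of the model's moving-average representation
    var_sum = 0.0    # running sum of squared psi weights
    rows = []
    for _ in range(horizon):
        point = const + phi1 * prev1 + phi2 * prev2
        prev2, prev1 = prev1, point
        var_sum += psi[-1] ** 2
        se = sigma * math.sqrt(var_sum)  # forecast standard error at this step
        rows.append((point,
                     point - 1.2816 * se, point + 1.2816 * se,  # 80% interval
                     point - 1.96 * se, point + 1.96 * se))     # 95% interval
        older = psi[-2] if len(psi) > 1 else 0.0
        psi.append(phi1 * psi[-1] + phi2 * older)
    return rows

# Hypothetical inputs for illustration only.
last_two_prices = [45.0, 50.0]
for step, row in enumerate(ar2_forecast(last_two_prices, 3.0, 1.2, -0.25, 6.0, 12), 1):
    print(step, [round(v, 1) for v in row])
```

The forecast standard error grows with each step because every additional month adds another squared psi weight to the variance, which is what produces the widening cloud around the point forecast.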

(Source: author)

Alternative Time Series Forecasts

The theta method, introduced in Assimakopoulos and Nikolopoulos (2000), became popular in academic circles shortly after publication owing to its performance in the M3 competition with respect to monthly time series data. It has developed a fairly popular following in supply chain management and planning and in other microeconomic data forecasting due to the accuracy of its point forecasts. It represents a special case of an SES-d model (simple exponential smoothing with drift). The model isn't wholly appropriate for use on most macroeconomic phenomena, but out of basic curiosity I ran it anyway, predicting 12 months forward. The results are shown below:
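As a rough illustration of the SES-with-drift idea, the sketch below combines a simple exponential smoothing level with half the slope of a straight line fitted to the data, which is one commonly used closed form of the standard theta forecast. It is a simplification and not the code behind the chart: the smoothing weight `alpha` is fixed by hand here rather than estimated, no seasonal adjustment is applied, and the function names and sample prices are hypothetical.

```python
def ses_level(series, alpha):
    """Final level from simple exponential smoothing, seeded with the first value."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1.0 - alpha) * level
    return level

def ols_slope(series):
    """Slope of a least-squares line fitted against time 0..n-1."""
    n = len(series)
    t_mean = (n - 1) / 2.0
    y_mean = sum(series) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

def theta_forecast(series, alpha, horizon):
    """SES level plus a drift of half the fitted slope per step ahead."""
    n = len(series)
    level = ses_level(series, alpha)
    b0 = ols_slope(series)
    adj = 1.0 / alpha - (1.0 - alpha) ** n / alpha  # finite-sample correction
    return [level + 0.5 * b0 * (h - 1 + adj) for h in range(1, horizon + 1)]

# Hypothetical monthly prices for illustration only.
prices = [42.0, 44.5, 43.0, 46.0, 48.5, 47.0, 49.5, 51.0]
print([round(f, 2) for f in theta_forecast(prices, alpha=0.5, horizon=12)])
```

Because the drift is a constant half-slope, the unadjusted forecasts lie on a straight line; any seasonal shape in a theta forecast comes from a separate seasonal adjustment step that this sketch leaves out.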

(Source: author)

The theta model detects seasonality in the data, indicating that crude prices are more likely to rise in the summer, primarily in response to the summer driving season. For this reason, the model predicts a dip to the mid-$40s per barrel in the 2016-17 winter before a rise back to the mid-$50s by summer 2017. If we were to expand the forecast from 12 to 24 months, the seasonality pattern is likely to still be apparent.

(Source: author)

And sure enough it is, following the exact same pattern, but this time with higher highs for summer 2018 and a slight bullish bias overall. Still, the theta model lacks an autoregressive component and isn't an optimal choice for macroeconomic data in most cases.


Time series models are helpful for learning about various meaningful statistics based on prior observations. Naturally, forecasts are most reliable over shorter time frames, as unanticipated events can pop up to change the price trajectory of the market. Models account for this uncertainty through wide forecast intervals, but those intervals limit the utility of the model as they become broader over time.

One type of model may also be more effective at one point in time than at another. For this reason, no single source of analysis should be relied on exclusively to derive price forecasts. Ideally, a combination of quantitative/technical models and fundamental analysis should be used to determine whether one might have an edge in predicting the future price of any financial asset.

Disclosure: I/we have no positions in any stocks mentioned, and no plans to initiate any positions within the next 72 hours.

I wrote this article myself, and it expresses my own opinions. I am not receiving compensation for it (other than from Seeking Alpha). I have no business relationship with any company whose stock is mentioned in this article.