Date: 2017-07-25

They found that this technique even had power to predict U.S. recessions 3 quarters ahead, a horizon at which the mean of the Survey of Professional Forecasters predictions has never been negative.

In 2016, two researchers from University College London released their findings that a machine learning technique, known as random forest regression, could outperform professional forecasters (as measured by the Survey of Professional Forecasters, or SPF) in predicting GDP. Perhaps more importantly, this technique has predictive power over recessions as far as 3 quarters into the future - a horizon at which the mean of SPF estimates has never predicted negative growth.

What Is a 'Random Forest Regression'?

For a more in-depth understanding of how a random forest regression (or RFR, as I'll call it in this article) works, you should see the link above. If a really simple overview is all you're looking for, keep on reading.

For predicting real-valued variables, random forests are built from things called decision trees. A decision tree partitions the data it is given into a number of subsets, then assigns each data point an estimate depending on which subset it falls into. In the implementation of RFR used for this article, each subset is represented by the mean value of the data in the subset. For example, a decision tree may partition (2, 5), (3, 7), (1, 4.5), (7, 11), (8, 11), (6, 9), where the first number is meant to explain the second, into subsets depending on whether the first number is greater than 5. This would give two subsets: [(2, 5), (3, 7), (1, 4.5)] and [(7, 11), (8, 11), (6, 9)]. With the mean of 5, 7, and 4.5 being 5.5 and the second subset having a mean of 10.33, any future data point we wish to estimate with this decision tree will be estimated as 5.5 if its first number is 5 or less, and 10.33 otherwise. While this example seems to show the inflexibility of estimates by decision tree, we can make the estimates more flexible, first by partitioning the data into more subsets, and second by using the technique of random forests.
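The worked example above can be reproduced with a minimal sketch of a one-split tree (a "stump"). The function names fit_stump and predict are made up for illustration, and the threshold of 5 is fixed by hand here, whereas a real decision tree would search for its own split points.

```python
from statistics import mean

# (x, y) pairs from the example above: x is meant to explain y.
data = [(2, 5), (3, 7), (1, 4.5), (7, 11), (8, 11), (6, 9)]

def fit_stump(points, threshold):
    """Split the points on x > threshold and store the mean y of each side."""
    left = [y for x, y in points if x <= threshold]
    right = [y for x, y in points if x > threshold]
    return {"threshold": threshold, "left": mean(left), "right": mean(right)}

def predict(stump, x):
    """Estimate y as the mean of whichever subset x falls into."""
    return stump["right"] if x > stump["threshold"] else stump["left"]

# Threshold chosen by hand to match the example (an assumption of this sketch).
stump = fit_stump(data, threshold=5)
print(round(predict(stump, 4), 2))  # 5.5
print(round(predict(stump, 9), 2))  # 10.33
```

Every new point gets one of only two possible estimates, which is the inflexibility the paragraph describes.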

Random forests are a way to allow decision trees to partition a data set into a large number of subsets, according to fine-grained characteristics of the data set, but without overfitting to noise in the data (a serious problem in economic forecasting). In a random forest, a number of decision trees are created, each given a data set randomly picked from the given data set with replacement. The estimates of each tree are then averaged to obtain a final estimate. This allows each tree to tightly fit its given data set, while averaging out any random noise in the data.
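To make the resampling-and-averaging idea concrete, here is a rough sketch in plain Python. It is not the implementation used for this article: it only shows bootstrap sampling plus averaging (often called bagging) with one-split trees, while a full random forest grows deeper trees and also randomizes which variables each split may use. The names fit_stump and fit_forest, the tree count, and the seed are all assumptions for illustration.

```python
import random
from statistics import mean

data = [(2, 5), (3, 7), (1, 4.5), (7, 11), (8, 11), (6, 9)]

def fit_stump(points):
    """Fit a one-split tree: pick the x threshold with the lowest squared error."""
    xs = sorted({x for x, _ in points})
    if len(xs) < 2:
        # The resample holds a single distinct x, so just predict its mean y.
        m = mean(y for _, y in points)
        return lambda x: m
    best = None
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # candidate threshold halfway between neighbours
        left = [y for x, y in points if x <= t]
        right = [y for x, y in points if x > t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: mr if x > t else ml

def fit_forest(points, n_trees=200, seed=0):
    """Fit each tree on a bootstrap resample and average the trees' estimates."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        # Bootstrap: draw len(points) samples with replacement.
        sample = [rng.choice(points) for _ in points]
        trees.append(fit_stump(sample))
    return lambda x: mean(tree(x) for tree in trees)

forest = fit_forest(data)
print(forest(4), forest(9))
```

Because each tree sees a slightly different resample, the averaged estimate is no longer limited to two fixed values the way a single stump is.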

Predicting U.S. GDP

As the researchers' goal in using RFR was to improve recession forecasting, their chosen variables (the 3-month Treasury Bill rate, the 10-year Treasury Bond rate, the quarterly change in the S&P 500 index, and the ratio of private sector debt to current price GDP) were all focused on monetary factors. Since I, on the other hand, want to predict not just recessions but also GDP during normal economic times, I chose a different set of variables. These were:

The percent change from a year ago in the PCE price index

The percent change from a year ago in the PCE excluding food and energy (chain-type price index)

The spread between 2 and 10-year Treasuries on the first day of the month

The percent change from a year ago in mortgage debt service payments as a percent of disposable personal income

The unemployment rate

The percent change from a year ago in the unemployment rate

The percent change from a year ago in the U.S. Leading Index

The percent change from year ago of retail sales (excluding food sales)

The month supply of homes

For each prediction, I lagged the input data by the gap between the most recent available data and the last day of the period whose GDP is being predicted. For instance, in predicting Q2 GDP (04-01-2017 through 07-01-2017), the last unemployment rate figure is from June 1st, one month before the end of the period. So, for the data point estimating GDP on 04-01-2017, I would use the unemployment rate from 03-01-2017. It should also be pointed out that all of my data was pulled from FRED, so each figure is the most current (revised) estimate and not necessarily the value that was available when the data was first released.
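The date arithmetic in that example can be sketched as follows. This is only an illustration of the alignment rule as I read it; the helper names shift_months and months_between are hypothetical, and the sketch assumes every series is dated on the first of the month.

```python
from datetime import date

def shift_months(d, months):
    """Move a first-of-month date by a whole number of months (negative = back)."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)

def months_between(earlier, later):
    """Whole months from one first-of-month date to another."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)

# Q2 2017 ends at 07-01-2017, and the latest unemployment figure is from 06-01-2017.
period_end = date(2017, 7, 1)
latest_observation = date(2017, 6, 1)
lag = months_between(latest_observation, period_end)

# The same lag is applied to every historical GDP date used for fitting.
gdp_date = date(2017, 4, 1)
feature_date = shift_months(gdp_date, -lag)
print(lag, feature_date)  # 1 2017-03-01
```

Applying the same lag to the historical rows means the model is fitted on inputs that are exactly as stale as the ones it will see when making the real forecast.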

The GDP Predictions

First off, I attempted to predict the percent change from a year ago of GDP for Q2 (which ended at the end of June), Q3, and Q4. Alongside these predictions, I used a form of cross-validation to test the predictive power of RFR at each time lag. This involves, for each sample, fitting the random forest on all other samples, using the model to estimate the held-out sample, and calculating the error between the actual and predicted values. I then calculated the correlation and R-squared between predicted and actual values, as well as the standard deviation of the errors between the two.
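The hold-one-out procedure described above might look roughly like this. To keep the sketch self-contained it uses a simple nearest-neighbour model as a stand-in for the random forest, and the toy data is made up. Computing R-squared as one minus the ratio of residual to total squared error, and using the population standard deviation for the errors, are assumptions about how the figures below were calculated.

```python
from math import sqrt
from statistics import mean, pstdev

def loo_predictions(fit, X, y):
    """For each sample, fit on all the other samples and predict the held-out one."""
    preds = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)
        preds.append(model(X[i]))
    return preds

def correlation(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual SS / total SS (assumed definition)."""
    m = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - m) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def fit_nearest(X_train, y_train):
    """Stand-in model (1-nearest neighbour), NOT a random forest."""
    def predict(x):
        dists = [sum((u - v) ** 2 for u, v in zip(row, x)) for row in X_train]
        return y_train[dists.index(min(dists))]
    return predict

# Made-up toy data, purely for illustration.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]

preds = loo_predictions(fit_nearest, X, y)
errors = [p - a for p, a in zip(preds, y)]
print(correlation(preds, y), r_squared(y, preds), pstdev(errors))
```

Any model with the same fit-then-predict shape can be passed in place of fit_nearest, including a random forest.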

(Note that charts don't represent a time series)

Prediction for Q2 2017:

Cross-Validation Predictions vs. Actual

Prediction: 3.04

Correlation: 0.908

R-Squared: 0.80

Standard Deviation Of Errors: 0.879

Prediction for Q3 2017:

Cross-Validation Predictions vs. Actual

Prediction: 2.85

Correlation: 0.70

R-Squared: 0.50

Standard Deviation Of Errors: 1.39

Prediction for Q4 2017:

Cross-Validation Predictions vs. Actual

Prediction: 3.28

Correlation: 0.78

R-Squared: 0.608

Standard Deviation Of Errors: 1.23

But What Does it All Mean?

2016 GDP for Q2, Q3, and Q4 was $18450.1, $18675.3, and $18869.4 billion respectively. So, if the predictions are to be believed, 2017 GDP will be $19077.4 bil. for Q2, $19207.5 bil. for Q3, and $19488.3 bil. for Q4. That's a total growth by year's end of $461.2 bil., or about 2.4%, from the current estimate of Q1 GDP.
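The conversion from a year-over-year percent change to a dollar level is just the year-ago level times one plus the growth rate. A small sketch, shown here for Q3 and Q4 with the figures quoted above (the function name project_level is made up, and the Q1 level is only implied by the text as the Q4 projection minus $461.2 bil.):

```python
def project_level(base_level, yoy_pct):
    """Apply a year-over-year percent change to the year-ago level."""
    return base_level * (1 + yoy_pct / 100)

# Year-ago levels (in $ bil.) and predicted percent changes from the article.
q3_2017 = project_level(18675.3, 2.85)
q4_2017 = project_level(18869.4, 3.28)
print(round(q3_2017, 1), round(q4_2017, 1))  # 19207.5 19488.3

# Growth through year's end relative to the Q1 level implied by the text.
implied_q1 = 19488.3 - 461.2
print(round(100 * 461.2 / implied_q1, 1))  # 2.4
```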

If the prediction error stays within one standard deviation, each prediction can be expected to be within about +/- $250 bil. of the actual figure. This is definitely a pretty big range, but the errors should offset each other over the 3 quarters (one quarter might have a positive prediction error while the next may be negative).
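As a rough check on that range, the standard deviations of errors reported above (in percentage points) can be converted to dollars by applying them to the year-ago GDP levels. This assumes the error band scales with the year-ago level, which is my reading rather than something stated outright; the function name band_in_dollars is made up.

```python
def band_in_dollars(base_level, std_pct_points):
    """One standard deviation of forecast error, converted from percent to $ bil."""
    return base_level * std_pct_points / 100

# (quarter, year-ago level in $ bil., standard deviation of errors) from the article.
quarters = [("Q2", 18450.1, 0.879), ("Q3", 18675.3, 1.39), ("Q4", 18869.4, 1.23)]
bands = {label: band_in_dollars(base, std) for label, base, std in quarters}
for label, band in bands.items():
    print(label, round(band))  # roughly 162, 260 and 232
```

On these numbers the band is closer to $160 bil. for Q2 and about $230 to $260 bil. for the two later quarters.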

Given the large possible prediction errors, the main takeaway from these numbers isn't the actual estimates. Instead, it is that GDP is likely to continue to grow at an average rate through the end of the year. If you ask me, whether or not the Fed continues hiking rates will probably determine whether the predictions over- or under-promise. Either way, though, expect a status quo economy through this year's end.

