A few economic data series may be useful for market timing. Retail sales are one of them. Advance Real Retail and Food Services Sales (RRSFS) are published by the Federal Reserve Bank of St. Louis using data from the Advance Monthly Retail Trade Survey conducted by the U.S. Census Bureau. The series is usually released in the second half of each month for the previous month. Data are available from 1992.

RRSFS values may be revised, which raises a question: should we use the initial releases or the revised data?

Doing a backtest means simulating a series of past decisions using only the data that were available at each decision point. From this point of view, initial releases are preferable. But is it always smart to use point-in-time data to evaluate an indicator? Is a "backtest" the best tool to study data that may change after publication? I don't think so. The purpose of a backtest is to simulate a time machine and act as if we were in the past. The purpose of my market timing study is to identify relations between entry signals from various indicators and an output signal: a stock index. My preference is not to run "strict" backtests, but to have as little noise as possible in the entry signals.

When a data series is revised, I want to know whether the revisions are part of the information or noise due to measurement errors. If they are part of the information, it is better to use point-in-time data. If they are noise due to measurement errors, it is better to use the latest revised data, which are the closest to the real state of the system at every point in the past. I think a model should be based on the real states of the system. Moreover, we can expect the measurement process to improve with time. Not only may we expect it, but we can check it. The next chart plots a metric of RRSFS measurement errors: a 12-month sum of revisions, where each revision is defined as the initial release minus the revised value.
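The revision metric can be sketched in a few lines of Python. This is a minimal illustration, assuming the initial releases and the latest revised values are already aligned month by month in two lists (retrieving the vintages, for example from ALFRED, the St. Louis Fed's archive of data vintages, is outside its scope). The function names are hypothetical.

```python
from typing import List, Optional

def revisions(initial: List[float], revised: List[float]) -> List[float]:
    """Monthly revisions, defined as initial release minus latest revised value."""
    if len(initial) != len(revised):
        raise ValueError("both series must cover the same months")
    return [first - last for first, last in zip(initial, revised)]

def rolling_sum(values: List[float], window: int = 12) -> List[Optional[float]]:
    """Trailing sum over `window` months; None until a full window is available."""
    return [
        sum(values[k + 1 - window : k + 1]) if k + 1 >= window else None
        for k in range(len(values))
    ]

# Hypothetical numbers, not actual RRSFS releases.
initial = [100.0, 101.0, 102.5]
revised = [99.5, 101.2, 102.0]
print(rolling_sum(revisions(initial, revised), window=2))
```

A positive rolling sum means the initial releases were, on balance, higher than the revised values over the window.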

RRSFS initial releases were skewed toward overestimation (optimism) until 2015, then became slightly skewed toward underestimation (pessimism). Large positive and negative errors may offset each other in this chart. The next chart shows the 12-month sum of the absolute values of revisions, which is more relevant for evaluating noise.
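Why the absolute value matters can be shown with a toy example. The revision numbers below are made up for illustration: alternating errors of the same size cancel out in the signed sum, while the sum of absolute values still measures how noisy the initial releases were.

```python
from typing import List

def sum_revisions(revs: List[float]) -> float:
    """Signed sum: positive (optimistic) and negative (pessimistic) errors cancel."""
    return sum(revs)

def sum_abs_revisions(revs: List[float]) -> float:
    """Sum of absolute revisions: every error adds to the noise measure."""
    return sum(abs(r) for r in revs)

# Hypothetical 12 months of revisions (initial minus revised), alternating in sign.
example = [0.5, -0.5] * 6
print(sum_revisions(example))      # 0.0
print(sum_abs_revisions(example))  # 6.0
```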

The amplitude of noise from measurement errors is on a downtrend. Quoting the U.S. Census Bureau about the Advance Monthly Retail Trade Survey, "numerous improvements to the estimation procedure have been made since then and the average absolute revision to the advance estimate is now two-tenths of one percent."

Therefore, I systematically use revised data for RRSFS. It would make little sense to build a model on "point-in-time" data produced by an older, noisier measurement process.

I will show market-timing test results based on monthly decisions. Indicators are observed on the 1st day of every month. Each indicator gives a binary signal: "bullish" (0) or "bearish" (1). Every indicator is tested by calculating the performance of an investment in the S&P 500 (VOO, IVV, or SPY) with a market-timing strategy that gradually exits the market during the month of a bearish signal. Gradualness is simulated by using the average of daily closing prices as the monthly price. In other words, a trade "off" or "on" is smoothed across the month, as if 1/21 of the trade were executed at every market close in an average month of 21 trading days. The first advantage is that a free and reliable price series built on this rationale is easy to get over a very long period (Robert Shiller's online data). The second advantage of using smoothed monthly prices is a lower sensitivity to short-term moves: it avoids designing a model unintentionally curve-fitted to a specific series of daily prices (the first trading days of every month). There is a third advantage: it is more realistic for investors who cannot make a big move on a single day because of capital size or compliance (especially fund managers).
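The gradual-trading rule can be sketched as follows. This is a minimal illustration under assumptions that the text does not state: cash earns nothing, dividends and trading costs are ignored, and the inputs are monthly averages of daily closes. The function names are hypothetical. Because a trade decided on the 1st day of month m is spread over that month, it is filled at the average price of month m, so the exposure chosen in month m earns the move from that average to the next month's average.

```python
from typing import List

def monthly_average(daily_closes: List[float]) -> float:
    """Average of a month's daily closes, used as the month's execution price."""
    return sum(daily_closes) / len(daily_closes)

def timing_equity(avg_prices: List[float], signals: List[int], start: float = 1.0) -> List[float]:
    """Equity curve of a strategy trading at monthly average prices.

    signals[m] is observed on the 1st day of month m: 0 = bullish, 1 = bearish.
    Assumptions: cash earns nothing; dividends and trading costs are ignored.
    """
    if len(signals) != len(avg_prices):
        raise ValueError("one signal per month is required")
    equity = [start]
    for m in range(len(avg_prices) - 1):
        exposure = 1 - signals[m]  # 1 when invested, 0 when in cash
        growth = avg_prices[m + 1] / avg_prices[m]
        equity.append(equity[-1] * (1 + exposure * (growth - 1)))
    return equity

# Hypothetical prices: the bearish signal in month 1 skips the 10% drop to 99.
print(timing_equity([100.0, 110.0, 99.0, 108.9], [0, 1, 0, 0]))
```

Averaging daily closes is equivalent to executing an equal fraction of the trade at each close, which is the smoothing described above.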

The following tests simulate going to cash on a bearish signal. Obviously, this is rarely the best strategy. Opening or increasing hedging positions is usually a better way to manage riskier periods: it incurs lower trading costs when the portfolio holds many positions or when holdings are not very liquid, and it keeps dividends coming, if there are any.

The RRSFS value for a given month is generally published close to the middle of the following month. On the 1st day of month "m", the value for m-1 has not been released yet, so decisions can only use RRSFS (m-2).
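This timing logic can be checked with a small date calculation. It is a sketch that assumes, hypothetically, that each month's value is released on the 15th of the following month; the constant `RELEASE_DAY` and the function name are illustrative and not part of any data API.

```python
from datetime import date
from typing import Tuple

# Assumption: a month's RRSFS value is released around this day of the following month.
RELEASE_DAY = 15

def latest_available_month(decision_day: date, release_day: int = RELEASE_DAY) -> Tuple[int, int]:
    """Return (year, month) of the latest RRSFS value published by decision_day."""
    # Last month's value appears only once release_day is reached; before that,
    # the freshest published value is two months back.
    months_back = 1 if decision_day.day >= release_day else 2
    # Count months from year 0 so that crossing a year boundary is plain arithmetic.
    index = decision_day.year * 12 + (decision_day.month - 1) - months_back
    return index // 12, index % 12 + 1

# Decisions on the 1st of January 2019 can only use November 2018 data (m-2).
print(latest_available_month(date(2019, 1, 1)))  # (2018, 11)
```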

After trying several possibilities, I have chosen to show this indicator: bearish when RRSFS has declined over the past 6 months, bullish otherwise.

For robustness test purposes, I have also tested data delayed by 1 month. This checks that the model is not too sensitive to timing, and also gives an idea of how a first possible revision may affect the result.

In the tables below:

CAGR is the annualized return (compound annual growth rate) in percent.

Ddmax is the maximum drawdown depth, also in percent.

DLmax is the maximum drawdown duration in months.

MAR ratio is a risk-adjusted performance metric defined as MAR = CAGR/Ddmax.

The first column gives the starting year of each test; the end date is always 1/1/2019.

For all tables, benchmark data (S&P 500, buy and hold) are repeated in the first four metric columns to facilitate comparisons.
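These metrics can be sketched for a monthly equity curve given as a list of portfolio values. The definition of DLmax used here (the longest run of months spent below a previous peak) is an assumption, as are the function names. As a sanity check of the MAR definition, 3.22 / 50.82 ≈ 0.06 and 6.24 / 19.09 ≈ 0.33, matching the tables.

```python
from typing import List

def cagr(equity: List[float], months_per_year: int = 12) -> float:
    """Annualized return in percent from a monthly equity curve."""
    years = (len(equity) - 1) / months_per_year
    return ((equity[-1] / equity[0]) ** (1 / years) - 1) * 100

def max_drawdown(equity: List[float]) -> float:
    """Deepest peak-to-trough decline in percent, reported as a positive number."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak * 100)
    return worst

def max_drawdown_duration(equity: List[float]) -> int:
    """Longest stretch, in months, spent below a previous peak (assumed DLmax definition)."""
    peak, current, longest = equity[0], 0, 0
    for value in equity:
        if value >= peak:
            peak, current = value, 0
        else:
            current += 1
            longest = max(longest, current)
    return longest

def mar_ratio(equity: List[float]) -> float:
    """MAR = CAGR / Ddmax, both in percent (undefined if there is no drawdown)."""
    return cagr(equity) / max_drawdown(equity)

# Hypothetical 4-month equity curve with a 25% drawdown lasting 2 months.
curve = [100.0, 120.0, 90.0, 100.0, 121.0]
print(max_drawdown(curve), max_drawdown_duration(curve))  # 25.0 2
```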

In the first series of tests below, the bearish signal is given by RRSFS (m-2) < RRSFS (m-8), meaning the 6-month momentum is negative.
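The signal itself can be sketched as a simple comparison on a list of revised RRSFS values indexed by month. The list layout, the function name and the `delay` parameter are illustrative assumptions; with `delay=1`, the same function gives the delayed condition RRSFS (m-3) < RRSFS (m-9) used in the robustness test.

```python
from typing import List, Optional

def rrsfs_momentum_signal(rrsfs: List[float], m: int, delay: int = 0) -> Optional[int]:
    """Signal for decision month index m: 1 = bearish, 0 = bullish, None = not enough data.

    rrsfs[k] holds the revised RRSFS value of month index k. On the 1st day of
    month m, the latest usable value is month m-2 (publication lag), shifted
    back by any extra `delay`. Bearish when it is below the value 6 months earlier.
    """
    recent = m - 2 - delay  # RRSFS(m-2), or RRSFS(m-3) when delay=1
    past = recent - 6       # RRSFS(m-8), or RRSFS(m-9) when delay=1
    if past < 0 or recent >= len(rrsfs):
        return None
    return 1 if rrsfs[recent] < rrsfs[past] else 0

# Hypothetical, steadily declining series: the 6-month momentum is negative.
declining = [110.0 - k for k in range(12)]
print(rrsfs_momentum_signal(declining, m=10))           # 1
print(rrsfs_momentum_signal(declining, m=10, delay=1))  # 1
```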

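| Since | S&P 500 CAGR | S&P 500 MAR | S&P 500 Ddmax | S&P 500 DLmax | RRSFS timing CAGR | RRSFS timing MAR | RRSFS timing Ddmax | RRSFS timing DLmax |
|---|---|---|---|---|---|---|---|---|
| 2000 | 3.22 | 0.06 | 50.82 | 80 | 6.24 | 0.33 | 19.09 | 38 |
| 1993 | 7.12 | 0.14 | 50.82 | 80 | 9.01 | 0.47 | 19.09 | 38 |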

The second series is the robustness test for the same condition with data delayed by 1 month. It means the bearish signal is given by RRSFS (m-3) < RRSFS (m-9).

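| Since | S&P 500 CAGR | S&P 500 MAR | S&P 500 Ddmax | S&P 500 DLmax | RRSFS timing CAGR | RRSFS timing MAR | RRSFS timing Ddmax | RRSFS timing DLmax |
|---|---|---|---|---|---|---|---|---|
| 2000 | 3.22 | 0.06 | 50.82 | 80 | 5.39 | 0.17 | 31.63 | 55 |
| 1993 | 7.12 | 0.14 | 50.82 | 80 | 8.49 | 0.27 | 31.63 | 55 |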

Chart since 1993:

This indicator improves all four metrics over buy and hold in both periods, including in the robustness test with delayed data. The main drawback is that it can only be tested over a relatively short period, since data are available from 1992.

RRSFS was bullish in my latest update for subscribers.

