We’ve been discussing sources of performance decay, degrees of freedom, and the implied statistical significance of systematic trading strategies, so I was pleased when some recent articles triggered an idea for a related case study.
Albert Einstein is oft credited with suggesting that problems should be made ‘as simple as possible, but not simpler’. In fact, a poster with this very phrase and a picture of Einstein’s unmistakable visage adorned the inside of my bedroom door for much of my adolescence. However, readers might be interested to learn that this particular phrase has never been traced to any of Einstein’s published works. Rather, it’s surmised that the statement is a distilled version of a slightly less accessible quotation from a lecture entitled “On the Method of Theoretical Physics,” delivered at Oxford in 1933. The actual quote from Einstein was, “It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience.”
In any event, the distillation is a useful heuristic, and nowhere more so than in the field of empirical finance. To wit, it is attractive to think that an already simple approach, such as Mebane Faber’s ‘Ivy Portfolio,’ a 5-asset, 10-month moving average methodology that requires monthly attention as originally proposed, might be equally effective with annual rebalancing. For those who aren’t already acquainted with it, Faber’s Ivy Portfolio approach was first proposed in 2007 in a paper entitled “A Quantitative Approach to Tactical Asset Allocation.” It has been updated several times since, including an update in early 2013 that extended the results through 2012. I’m not ashamed to admit that this paper was a primary catalyst for our own interest in quantitative approaches to asset allocation.
The mechanics of Faber’s approach are quite simple. First, compose a diversified portfolio from each of the major asset classes held in equal weight: bonds, U.S. stocks, international stocks, real estate, and commodities. Next, compute a moving average (MA) of closing prices over the prior 10 months for each asset. Observe the portfolio at the end of each month, and where an asset closes out the month below the level of its moving average, sell the asset and hold cash, repurchasing only when it closes back above its moving average at the end of any subsequent month.
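For readers who prefer code, the rule above can be sketched in a few lines of Python. This is a minimal illustration rather than Faber’s implementation or our own test code. The function name `ma_signals`, the choice to include the current month’s close in the average, and the handling of edge cases (too little history, or a close exactly equal to the average, both mapped to cash) are assumptions made for the sketch.

```python
def ma_signals(closes, window=10):
    """Return one boolean per month-end close: True = hold the asset, False = hold cash.

    Assumptions of this sketch (not specified in the source rule):
    - the moving average includes the current month's close;
    - months without a full `window` of history are treated as cash;
    - a close exactly equal to the average is treated as cash.
    """
    signals = []
    for i in range(len(closes)):
        if i + 1 < window:
            signals.append(False)  # not enough history yet
            continue
        average = sum(closes[i + 1 - window : i + 1]) / window
        signals.append(closes[i] > average)
    return signals


# Hypothetical month-end closes for a single asset.
example_closes = [10] * 9 + [12, 8]
example_signals = ma_signals(example_closes)
# The asset is held after month 10 (12 > 10.2) and sold after month 11 (8 < 10).
```

In an equal-weight portfolio, the same function would be applied to each asset separately, with each asset’s slice held in cash whenever its signal is False.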
Our analysis will attempt to answer several questions:
- Is it possible to make this approach even simpler by only rebalancing the portfolio every 12 months rather than at the end of every month?
- What can we discover by backtesting a strategy which only trades the portfolio on the last trading day of the year?
- If this backtest did yield results that were comparable to the monthly approach, how statistically significant is this result?
- How might we improve our understanding of the true distribution of risk and return for an annually rebalanced strategy relative to a monthly version?
Before diving into our quantitative analysis, please recall that the cornerstone of robust system development is statistical significance. Furthermore, statistical significance is largely a function of the number of observations. It’s difficult to achieve statistical significance with only a few trades, as each trade constitutes one observation. As a result, an annually traded approach starts out with a large hurdle to overcome, which is that we are only able to generate one observation per year per instrument. For example, using just the original Ivy Portfolio’s 5 asset classes (US stocks, EAFE stocks, US real estate, US Treasuries and commodities), 40 years of data yields only about 40*5 = 200 observations.
Granted, 40 years of time series data is meaningful because it covers several secular market regimes, such as the 1970s stagflation, the 2000 tech bubble, the emerging market and commodity boom of the mid ‘naughts, and the Global Financial Crisis of 2008. Even so, 200 total observations is not enough to generate meaningful statistical confidence, as we will demonstrate below.
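To make the link between observation count and confidence concrete, here is a small sketch using the textbook t-statistic for a sample mean. The inputs are hypothetical numbers chosen for illustration, not results from our tests. The formula also assumes independent observations, which is optimistic when the observations come from correlated asset classes.

```python
import math


def t_statistic(mean_return, stdev, n_obs):
    """t-statistic of a sample mean against zero, assuming independent observations."""
    standard_error = stdev / math.sqrt(n_obs)
    return mean_return / standard_error


# Hypothetical per-trade edge of 1% with 15% dispersion (illustrative values only).
t_200 = t_statistic(0.01, 0.15, 200)   # roughly 0.94
t_800 = t_statistic(0.01, 0.15, 800)   # four times the observations, twice the t-statistic
```

Because the t-statistic grows only with the square root of the number of observations, quadrupling the sample merely doubles it. This is why a few hundred observations can leave even a genuine edge statistically indistinguishable from zero.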
To answer the questions we raised above, we ran several tests. Before we explain the tests, however, note that we altered the original Ivy 5 concept in some subtle ways:
- We added emerging market equities (EEM), Japanese equities (EWJ), gold (GLD), international real estate (RWX) and long-duration Treasuries (TLT) to the original 5 asset universe. The broader universe generated more observations, and allowed us to test whether the parameters specified in the original Ivy Portfolio approach were optimized to work on just those 5 assets, or whether the rules are more universally applicable.
- We used daily data for our tests rather than monthly data. As a result, our tests only go back to 1995 because daily data for all of the indexes was unavailable prior to that time. However, daily data allows us to test trading on each of the 252 trading days of the calendar year, which multiplies our number of observations by a factor of over 250.
- We used the daily equivalent of the monthly moving averages applied in the original report. For example, rather than using a 12 month moving average, we used a 252 day MA. Any performance deviations due to our use of daily vs. monthly moving average calculations are statistically immaterial to the analysis.
- We tested both the 200 day (~10 month) and the 252 day (~12 month) moving averages as filters to see if there was a material difference in results by varying the length of the moving average.
We approached our analysis from three directions. First, we ran tests using a 252 day (~12 month) moving average rule with annual rebalancing, but where the annual rebalance occurs in months other than December. We also compared each of the annually rebalanced systems to results from a system that is rebalanced at the end of every month, and another system that is observed for rebalancing every day. Next we performed the same analyses, but using a 200 day (~10 month) moving average filter instead of a 252 day MA filter.
Note that we imposed onerous all-in transaction costs of 100 bps for annual rebalancing, 150 bps for monthly rebalancing, and 200 bps for daily rebalancing.
Figures 1 and 2 show the dispersion of performance results for these moving average systems when the annual rebalance occurs on the last trading day of a given calendar month. In other words, the results for January assume annual rebalancing on the last trading day of January in each calendar year.
The red bars in the charts show the average results for annually rebalanced models across all of the months in the calendar year. The green bar shows the results of the traditional monthly rebalanced system, and the orange bar demonstrates the performance of a system that is rebalanced daily. For those who aren’t familiar with MAR, it is simply the compound annual return divided by the maximum drawdown.
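For readers who want to compute MAR themselves, a minimal sketch follows. It assumes the numerator is the compound annual return and that the input is an equity curve sampled at a regular frequency. The function names and the `periods_per_year` parameter are our own labels for this illustration, not part of the charts above.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a positive fraction."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def compound_annual_return(equity, periods_per_year=252):
    """Annualized growth rate between the first and last points of the curve."""
    years = (len(equity) - 1) / periods_per_year
    return (equity[-1] / equity[0]) ** (1 / years) - 1


def mar_ratio(equity, periods_per_year=252):
    """MAR = compound annual return / maximum drawdown.

    Raises ZeroDivisionError for a curve that never draws down.
    """
    return compound_annual_return(equity, periods_per_year) / max_drawdown(equity)


# Hypothetical year-end equity values: a 20% drawdown, then a recovery.
example_mar = mar_ratio([100, 80, 121], periods_per_year=1)  # about 0.5
```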
Figure 1. Performance results for 252 day moving average system, annual rebalancing at the end of each calendar month
Data source: Bloomberg
Figure 2. Performance results for 200 day moving average system, annual rebalancing at the end of each calendar month
Data source: Bloomberg
First, note that the 252 day (~12 month) and 200 day (~10 month) versions deliver statistically indistinguishable results on all relevant metrics. So we can safely assert that the 10 month MA used in the original report is relatively robust. However, the validations end there.
Recall that we penalized annually rebalanced models by 1% per year, the monthly rebalanced system by 1.5% per year, and the daily observed system by 2% per year. To our thinking, the most relevant comparisons are between red and green bars, because they illustrate the average results of all the annually rebalanced systems, and the monthly system, respectively. It’s clear from the charts that results from the monthly rebalanced system are demonstrably better in every performance metric than the average of all annually rebalanced systems. Indeed, the monthly version is better than even the best annually rebalanced systems in most respects.
Results for annually rebalanced systems traded in certain months – June and July for 252 day MA systems and July and August for 200 day MA systems – show just slightly lower Sharpe ratios and higher MARs than the monthly system. Nascent quants might be tempted to conclude that you would do just as well trading an annual 252 day MA system so long as you trade in June or July, or an annual 200 day MA system so long as you trade in July or August. But this is an illusion.
Recall that there were really just 2 bear markets over the test horizon: the 2000 bursting of the technology bubble, and the 2008 Global Financial Crisis. Further, only the 2008 Global Financial Crisis really qualifies as a true multi-asset class crash. It just so happens that in 2008 most assets, with the exception of U.S. stocks and real estate, delivered strong returns until June. Further, the crash didn’t really get going in earnest until September. The favourable ‘mid summer’ strategies rebalanced in June, July or August also avoided the whipsaws and volatile bottoming process that occurred in the first three months of 2009. Annual strategies that rebalanced in June, July or August were able to capture all of the returns in 2008, avoid almost all of the ensuing crash, avoid the January whipsaw and V bottom in March, and harness a substantial portion of the 2009 rebound. Lucky stuff, not likely to be repeated in the same way next time.
For fun, we took the next natural step for this analysis by examining the performance of annually rebalanced systems traded on each day of the calendar year. There are typically 252 trading days in a calendar year, so we examined the results for systems that trade annually on day 1, day 2, day 3…day 251, day 252. Trade day 1 will have a slightly different calendar date each year, depending on where New Year’s Day falls in the week, but in all we have 252 different annually rebalanced systems to compare. Figures 3 and 4 show these results separately for 252 day MA and 200 day MA systems. Rather than show the results for each annual trade day (which would have made for a very wide chart), we sorted results into quantiles; this better illustrates the distribution of performance for all of the individual systems.
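One way to set up such a family of systems is sketched below. This is an illustration under our own assumptions, not the code behind the figures. `annual_trade_indices` finds the k-th trading day of each calendar year in a sorted list of dates. `hold_between_rebalances` then freezes whatever a daily signal said on those days until the next rebalance. Both function names are hypothetical.

```python
from datetime import date


def annual_trade_indices(dates, k):
    """Index of the k-th trading day (1-based) within each calendar year of `dates`.

    `dates` must be sorted. Days are counted from the first date present in each
    year, so a partial first year is counted from wherever the data starts, and a
    year with fewer than k trading days produces no rebalance.
    """
    indices = []
    current_year = None
    count = 0
    for i, d in enumerate(dates):
        if d.year != current_year:
            current_year = d.year
            count = 0
        count += 1
        if count == k:
            indices.append(i)
    return indices


def hold_between_rebalances(signals, rebalance_indices):
    """Carry the signal observed on each rebalance day forward until the next one.

    Before the first rebalance the position is cash (False), an assumption of
    this sketch.
    """
    rebalance_days = set(rebalance_indices)
    held = []
    current = False
    for i, signal in enumerate(signals):
        if i in rebalance_days:
            current = signal
        held.append(current)
    return held


# Hypothetical trading calendar spanning two years.
calendar = [date(2020, 1, 2), date(2020, 1, 3), date(2020, 1, 6),
            date(2021, 1, 4), date(2021, 1, 5)]
second_day_each_year = annual_trade_indices(calendar, 2)  # [1, 4]
```

Running the same backtest once for each k from 1 to 252 yields the 252 annually rebalanced variants described above.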
The numbers at the bottom of each chart represent percentile values. For example, the bar above 0.1 in any chart describes the 10th percentile observation; that is, the observation that is exceeded by 90% of all observations. Among 250 observations, this would be the 25th lowest value. The 0.5 bar is highlighted in red because it represents the median value, or the 50th percentile. 50% of all results exceed this value, and 50% are below.
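The percentile reading described above corresponds to a simple ‘nearest rank’ calculation, sketched below. The nearest-rank convention is an assumption on our part for illustration. Charting packages often interpolate between neighbouring values instead, which can shift the result slightly. The percentile is passed as a whole number to avoid floating-point rounding in the rank.

```python
def percentile_nearest_rank(values, pct):
    """Value at the `pct`-th percentile (pct is an integer from 1 to 100).

    Uses the nearest-rank convention: the rank is ceil(pct * n / 100), counted
    from the lowest value.
    """
    ordered = sorted(values)
    rank = max(1, -(-pct * len(ordered) // 100))  # integer ceiling division
    return ordered[rank - 1]


# With 250 hypothetical results valued 1..250, the 10th percentile is the
# 25th lowest value, matching the example in the text.
results = list(range(250, 0, -1))
tenth = percentile_nearest_rank(results, 10)   # 25
median = percentile_nearest_rank(results, 50)  # 125
```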
Figure 3. Quantile analysis of 252 annually rebalanced 252 day MA systems vs. monthly and daily traded systems
Figure 4. Quantile analysis of 252 annually rebalanced 200 day MA systems vs. monthly and daily traded systems
It is useful to compare the median performance (red bars) among all possible annually rebalanced models against the performance of the monthly rebalanced and daily rebalanced versions (green and orange bars, respectively). Note again that in every case the monthly rebalanced system outperforms the median annually rebalanced system.
Somewhat surprisingly, the Sharpe ratios of the monthly rebalanced systems exceed the Sharpe ratios for 99% of the annually rebalanced versions. You can observe this for yourself by comparing the 0.99 bar in the charts to the green and orange bars. You’d have to be incredibly lucky to trade an annually rebalanced system and exceed the performance of the monthly model; fewer than 1 in 100 who try are likely to be successful.
Some readers may have been wondering whether there was anything magical about the fact that the monthly traded approach always executes on the last trading day of the month. Would results vary if we traded monthly, but on the 8th day of the month, or perhaps day 17? To satisfy your curiosity, we ran the monthly traded system with trading days from day 1 to day 20 in each month to see if this made a large difference to results. Figure 5 summarizes the output.
Figure 5. Performance results for monthly systems rebalanced at each trading day of the month, 10 month MA
Data source: Bloomberg
Some of you may be surprised to learn that rebalancing on the last day of the month carries no advantage, and may in fact be disadvantageous. Keen systematicians may choose to divide their capital and trade each fraction on a different day of the month to further stabilize results without impacting turnover (though smaller investors may incur more trading costs).
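The capital-splitting idea can be sketched as follows. Each tranche is assumed to follow the same rules but trade on a different day of the month, so each produces its own return stream. The tranches start with equal capital and compound independently, without being rebalanced against one another. The function name and the return figures are hypothetical.

```python
def tranche_equity(tranche_returns, capital=1.0):
    """Combined equity curve of equally funded tranches that compound independently.

    `tranche_returns` holds one list of periodic returns per tranche, all of the
    same length.
    """
    n = len(tranche_returns)
    balances = [capital / n] * n
    curve = [capital]
    for period in zip(*tranche_returns):
        balances = [b * (1 + r) for b, r in zip(balances, period)]
        curve.append(sum(balances))
    return curve


# Two hypothetical tranches that earn the same 10% in different periods.
combined = tranche_equity([[0.10, 0.00], [0.00, 0.10]], capital=100.0)
# The combined curve rises in two steps (about 105, then 110) rather than one
# jump, which is the stabilizing effect described above.
```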
The goal of this article was not to conclude whether annual, monthly, or daily rebalancing is optimal for Faber’s ‘Ivy 5’ portfolio. Indeed, quite the opposite. Rather, the goal was to provide a framework for judging the statistical robustness of a simple systematic asset allocation strategy. In doing so, it’s important to test how sensitive a strategy is to small changes in important features of the system. In this case, while our tests were very consistent with the spirit of the original analysis of the Ivy 5 method, small changes to the asset universe, moving average window, and in particular trade dates resulted in material dispersion in results. For example, the 5th percentile outcome for all annually rebalanced approaches, per Figure 3, was a compound return under 4%, a Sharpe ratio under 0.15, and a maximum drawdown of over 25%. In contrast, the 95th percentile outcome was a compound return over 6%, a Sharpe ratio over 0.5 and a maximum drawdown under 10%. Pretty significant.
It also became clear through our analysis that an annually rebalanced approach to an Ivy 5 type methodology is very unlikely to generate the same absolute or risk-adjusted performance as the monthly rebalanced approach, even after accounting for fairly onerous transaction cost assumptions. On the other hand, more frequent daily rebalancing incurs transaction costs that swamp any potential benefits and may be vulnerable to more frequent whipsaws which have the potential to amplify drawdowns.
As simple as possible, but no simpler!