My palate is simpler than it used to be. A young chef adds and adds and adds to the plate. As you get older, you start to take away. – Jacques Pépin, famous French chef
The current article series deals with the concept of performance decay, which occurs when the performance of a systematic trading strategy is materially worse in application than it appeared during testing. We dealt with the concept of arbitrage in our last post, drawing a parallel with the phenomenon of ‘multiple discovery’ in science. Essentially, we hypothesized that many developers drawing from a similar body of research will stumble upon similar applications at approximately the same time. As these investors compete to harvest the same or similar anomalies, each investor will harvest a smaller share of the available alpha.
We also touched on reasons why we are confident that thoughtful active asset allocation strategies are likely to preserve their strong risk-adjusted return profile for the foreseeable future. Recall that a variety of structural impediments prevent contemporary ‘big money interests’ like pensions, endowments, and other large institutions from exploiting this arbitrage opportunity. At root, these large capital pools are constrained by group-think, corporate structure, and slow-moving governance procedures. These constraints preclude them from migrating their focus from traditional sources of alpha (i.e. security selection) to tactical sources.
This post begins our exploration of the concept of ‘degrees of freedom’ in system development. The term ‘degrees of freedom’ has slightly different meanings depending on whether the context is formal statistics or mechanical systems. While investment system design often draws from both contexts, for the purpose of this series we will skew much closer to the latter. Essentially, the number of degrees of freedom in a system refers to the number of independent parameters in the system that may impact results.
When I first discovered systematic investing, my intuition was to find as many ways to measure and filter time series as could fit on an Excel worksheet. I was like a boy who had tasted an inspired bouillabaisse for the first time, and just had to try to replicate it myself. But rather than explore the endless nuance of French cuisine, I just threw every conceivable French herb into the pot at once.
To wit, one of my early designs had no fewer than 37 classifiers, including filters related to regressions, moving averages, raw momentum, technical indicators like RSI and stochastics, as well as fancier trend and mean reversion filters like TSI, DVI, DVO, and a host of other three- and four-letter acronyms. Each indicator was finely tuned to optimal values in order to maximize historical returns, and these values changed as I optimized against different securities. At one point I designed a system to trade IWM with a historical return above 50% and a Sharpe ratio over 4.
These are the kinds of systems that perform incredibly well in hindsight and then blow up in production, and that’s exactly what happened. My partner applied the IWM system to time US stocks for a few weeks, and lost 25%. Dozens of hours and weeks of late nights at the computer down the drain.
The problem with complicated systems with many moving parts is that they require you to find the exact perfect point of optimization in many different dimensions – in my case, 37. To understand what I mean by that, imagine trying to create a tasty dish with 37 different ingredients. How could you ever find the perfect combination? A little more salt may bring out the flavour of the rosemary, but might overpower the truffle oil. What to do? Add more salt and more truffle oil? But more truffle oil may not complement the earthiness of the chanterelles.
You see, it isn’t enough to simply find the local optimum for each classifier individually, any more than you can decide on the optimal amount of any ingredient in a dish without considering its impact on the other ingredients. That’s because, in most cases, the signal from one classifier interacts with other classifiers in non-linear ways. For example, if you operate with two filters in combination – say a moving average cross and an oscillator – you are no longer concerned about the optimal length of the moving average(s) or the lookback periods for the oscillator independently; rather, you must examine the results of the oscillator during periods where the price is above the moving average, and again when the price is below the moving average. You may find that the oscillator behaves quite differently when the moving average filter is in one state than it does in the other.
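The two-filter example can be made concrete with a small sketch. Everything here is an assumption for illustration: the price series is a synthetic random walk, the 200-day moving average and the 10-day "oscillator" are stand-ins rather than the filters used in the original system, and because the data is noise no real edge should be expected. The point is only the shape of the analysis: next-day returns are grouped by the joint state of both filters, not by each filter on its own.

```python
import random
from statistics import mean

random.seed(0)  # synthetic data only; the seed just makes the run repeatable

# Hypothetical price series: a random walk with a small upward drift.
N_DAYS = 2_000
prices = [100.0]
for _ in range(N_DAYS - 1):
    prices.append(prices[-1] * (1 + random.gauss(0.0003, 0.01)))

MA_LENGTH = 200   # assumed trend-filter lookback
OSC_LENGTH = 10   # assumed oscillator lookback


def above_moving_average(t):
    """True if the price at t is above its trailing MA_LENGTH-day simple average."""
    window = prices[t - MA_LENGTH + 1 : t + 1]
    return prices[t] > mean(window)


def oscillator_oversold(t):
    """Toy oscillator: True if the trailing OSC_LENGTH-day return is negative."""
    return prices[t] < prices[t - OSC_LENGTH]


# Bucket next-day returns by the joint state of the two filters.
buckets = {(trend, osc): [] for trend in (True, False) for osc in (True, False)}
for t in range(MA_LENGTH - 1, N_DAYS - 1):
    next_return = prices[t + 1] / prices[t] - 1
    buckets[(above_moving_average(t), oscillator_oversold(t))].append(next_return)

for (trend, osc), rets in sorted(buckets.items(), reverse=True):
    avg = mean(rets) if rets else float("nan")
    print(f"above MA={trend!s:5}  oversold={osc!s:5}  "
          f"n={len(rets):4d}  mean next-day return={avg:+.5f}")
```

Even with only two binary filters, the usable history is split four ways, so each cell rests on fewer observations than the full sample.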
To give you an idea of the scope of this challenge, consider a simplification where each classifier has just 12 possible settings, say a lookback range of 1 to 12 months. 37 classifiers with 12 possible choices per classifier represents 12^37, or roughly 8.5 x 10^39, possible combinations. While a number with 40 digits may not seem like a simplification, consider that many of the classifiers in my 37-dimension IWM system had two or three parameters of their own (short lookback, long lookback, z score, p value, etc.), and each of those parameters was also optimized. Never mind finding a needle in a haystack, this is like finding one particular grain of sand on the beach.
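For readers who want to check the scale themselves, here is a minimal sketch of the counting argument. The function name and the backtest throughput are hypothetical. The only assumption carried over from the text is that each classifier is tuned independently over the same number of settings, so the count is settings raised to the power of the number of classifiers.

```python
def count_configurations(n_classifiers: int, settings_per_classifier: int) -> int:
    """Number of distinct joint settings when every classifier is tuned independently."""
    return settings_per_classifier ** n_classifiers


# Small case that is easy to check by hand: 2 classifiers x 12 settings each.
small = count_configurations(2, 12)  # 12 * 12 = 144

# The simplified case from the text: 37 classifiers, 12 lookbacks each.
full = count_configurations(37, 12)

print(f"2 classifiers:  {small:,} configurations")
print(f"37 classifiers: {full:.3e} configurations")

# Hypothetical throughput, purely for scale: one million backtests per second.
BACKTESTS_PER_SECOND = 1_000_000
SECONDS_PER_YEAR = 60 * 60 * 24 * 365
years = full / (BACKTESTS_PER_SECOND * SECONDS_PER_YEAR)
print(f"Exhaustive search at that rate: about {years:.3e} years")
```

Each additional classifier multiplies the search space by the number of settings instead of adding to it, which is why an exhaustive search stops being feasible after only a handful of filters.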
There is another problem as well: each time you divide the system into two or more states, you by definition reduce the number of observations in each state. To illustrate, imagine if each of the 37 classifiers in my IWM system had just 2 states – long or cash. Then there would be 2^37 = 137 billion possible system states. Recall that statistical significance depends on the number of observations, so reducing the number of observations per state of the system reduces the statistical significance of the observed results for each state, and also for the system in aggregate. For example, take a daily traded system with 20 years of testing history. If you divide a 20 year (~5000 day) period into 137 billion possible states, each state will contain on average only 5000/137 billion = 0.00000004 observations! Clearly 20 years of history isn’t enough to have any confidence in this system; at roughly 250 trading days per year, you would need more than 500 million years of daily data just to average a single observation per state, never mind enough to derive statistical significance.
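The observations-per-state arithmetic can be reproduced in a few lines. This is a sketch using the figures from the text (37 binary classifiers, about 5000 trading days). The 250 trading days per year is simply what 5000 days over 20 years implies, and the helper `years_needed` is a hypothetical name introduced here.

```python
N_CLASSIFIERS = 37
STATES_PER_CLASSIFIER = 2           # long or cash
TRADING_DAYS = 5_000                # roughly 20 years of daily data
DAYS_PER_YEAR = TRADING_DAYS / 20   # 250, implied by the figures above

joint_states = STATES_PER_CLASSIFIER ** N_CLASSIFIERS
obs_per_state = TRADING_DAYS / joint_states

print(f"Joint states: {joint_states:,}")  # 137,438,953,472
print(f"Average observations per state: {obs_per_state:.1e}")


def years_needed(min_obs_per_state: float) -> float:
    """Years of daily history needed to average the given observations per state."""
    return min_obs_per_state * joint_states / DAYS_PER_YEAR


print(f"Years of data for an average of 1 observation per state: {years_needed(1):,.0f}")

# How quickly coverage collapses as binary classifiers are added.
for k in (1, 2, 5, 10, 20, 37):
    print(f"{k:2d} classifiers -> {2 ** k:>15,} states, "
          f"{TRADING_DAYS / 2 ** k:.2e} observations per state")
```

Note that a 5000-day history can visit at most 5000 distinct states, so with 37 binary classifiers almost every possible state is never observed at all.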
As a rule, the more degrees of freedom your model has, the greater the sample size that is required to prove statistical significance. The converse is also true: given the same sample size, a model with fewer degrees of freedom is likely to have higher statistical significance. In the investing world, if you are looking at back-tested results of two investment models with similar performance, you should generally have more confidence in the model with fewer degrees of freedom. At the very least, we can say that the results from that model would have greater statistical significance, and a higher likelihood of delivering results in production that are consistent with what was observed in simulation.
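One way to see why extra freedom erodes confidence is to run the selection process on pure noise. The sketch below is an illustration under stated assumptions, not a formal significance test: each "strategy" is a series of zero-mean random daily returns, the number of candidates tried stands in loosely for the number of parameter settings explored, and all names and constants are hypothetical.

```python
import random
from statistics import mean, stdev

random.seed(1)  # repeatable synthetic data

N_DAYS = 1_000       # length of each simulated backtest
N_CANDIDATES = 200   # hypothetical number of parameter settings tried


def annualized_sharpe(returns, periods_per_year=250):
    """Mean over standard deviation of daily returns, scaled to a yearly figure."""
    return mean(returns) / stdev(returns) * periods_per_year ** 0.5


# Every candidate is pure noise with zero expected return, so any
# apparent skill in the backtest comes from selection alone.
sharpes = []
for _ in range(N_CANDIDATES):
    daily = [random.gauss(0.0, 0.01) for _ in range(N_DAYS)]
    sharpes.append(annualized_sharpe(daily))

# Best backtest found after trying the first n candidates.
best_of = {n: max(sharpes[:n]) for n in (1, 10, 50, 200)}
for n, s in best_of.items():
    print(f"best of {n:3d} noise strategies: Sharpe = {s:.2f}")
```

Because each larger group contains the smaller ones, the best result can only stay the same or improve as more candidates are tried, even though none of them has any real edge. A backtest chosen from many alternatives therefore needs a correspondingly higher bar, or more data, before it deserves the same confidence as one chosen from few.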
How many bowls of bouillabaisse would you have to sample to be sure you’d found the perfect combination of ingredients?
Because of this, optimization, like cooking, must be conducted in an integrated way that accounts for all of the dimensions of the problem at once. And this is the driving force behind the strange reality that in the investing world, as with cooking, novices often seek complexity while veterans seek simplicity. This is counterintuitive, even for investment professionals, which is why system design has a strange learning curve: the tendency is to move very quickly away from the simple approach that introduced you to systematic trading in the first place (in our case Faber’s work along with The Chartist and Dorsey Wright) toward extremely complex designs, each with a very precise optimal setting.
Eventually you recognize the folly of this pursuit, and work backward toward coherence and simplicity. Of course, simple doesn’t mean easy, any more than a novice can follow a simple recipe to recreate a culinary masterpiece. As you will discover, thoughtful simplicity can be deceptively complex. We will give you an example of that in our next article. For now, please pass the salt and pepper.