
 

Blind Validation Study for

Dow Utility Components

 

July 2004

 

We have shown how Bellwether patterns of recent returns can predict a substantial amount of the observed variability in future returns on specific stocks.  The basic approach that we have taken, explained elsewhere, is to find these bellwether patterns and examine how well they have worked over the past, and then to also calculate the statistical significance of the patterns.  When we find a pattern that works well over the past and that has a high level of calculated statistical significance, we use that pattern to predict future returns.

 

For the companies that make up the Dow Utilities index, we have performed the bellwether analysis using ten years of data, and the results for outlooks to the month ahead are shown below:

 

 

Performance of Bellwether patterns for Dow Utilities Components

 

The entire study is available from Quantext, but the results above are sufficient to motivate the discussion here.  The Bellwether patterns are used to assign a rating — STRONG, NEUTRAL, or WEAK.  The incremental monthly return is the difference between the average monthly return for months with a specific rating and the average monthly return for all months.  STRONG rated months over the past ten years have generated a return of 3.7% in addition to the average monthly return, for example. 
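The incremental return defined above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code used in the study; the function name and the sample returns are made up for the example.

```python
from collections import defaultdict
from statistics import mean


def incremental_returns(rated_months):
    """Return {rating: mean return for that rating minus mean return of all months}.

    rated_months is a list of (rating, monthly_return_percent) pairs.
    """
    overall = mean(ret for _, ret in rated_months)
    by_rating = defaultdict(list)
    for rating, ret in rated_months:
        by_rating[rating].append(ret)
    return {rating: mean(rets) - overall for rating, rets in by_rating.items()}


# Hypothetical monthly returns in percent, for illustration only.
sample = [
    ("STRONG", 5.0), ("STRONG", 3.0),
    ("NEUTRAL", 1.0), ("NEUTRAL", 0.0),
    ("WEAK", -2.0), ("WEAK", -4.0),
]
result = incremental_returns(sample)
print(result)
```

In this made-up sample the average month returns 0.5%, so a STRONG average of 4.0% corresponds to an incremental return of 3.5%.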

 

The average incremental return for STRONG rated months is 3.7% and the average incremental return for WEAK rated months is -4.0%.  The bellwether relationships that we find are highly statistically significant.  The Significance, shown in the table above, is the calculated probability that the observed level of predictability could have occurred by chance.  For WMB, the firm with the weakest result, the probability that the bellwether pattern is entirely due to chance is estimated to be slightly more than one quarter of one percent, which still makes this pattern very significant.  A significance of 1% is called ‘highly significant’ by statistical convention; WMB’s value is well below that threshold, and the values for all of the other firms are lower still. 

 

Understanding Statistical Significance

 

There are some important caveats to this type of calculation of statistical significance.  The first, and most important, is that statistical significance is the probability that the observed level of prediction could be this high if you selected the input variables at random from the entire universe of all possible variables.  This is, of course, not what we have done.  Whenever you look for causal relationships, there is substantial pre-conditioning of the data.  For the fourteen companies in this study, we are looking at how well recent returns (from the past three months) are precursors of future returns for a single company of interest.  We do not choose five predictors at random, but rather look for the best combination.  Because we search for the ‘best’ pattern for explaining the data, the calculated statistical significance will tend to overstate the true significance.  This is the dilemma of any ‘model’ or ‘theory.’  The fact that it explains the past is not a guarantee that it will explain the future, yet people can only build models that explain the past and then use these models to forecast the future.  The statistical significance is best read as an indicator of the strength of the relationship (the bellwether pattern, in this case). 
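The effect of searching for the best pattern can be illustrated with a small simulation. This is a deliberately simplified sketch, not the Bellwether procedure: it uses a single predictor rather than a combination of five, and both the target and the candidates are pure random noise, so any apparent relationship is spurious by construction. The function names, the number of candidates, and the 120-month length are assumptions made for the example.

```python
import random
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def best_vs_typical(n_months=120, n_candidates=200, seed=0):
    """Compare the best of many noise predictors with a typical one.

    Returns (largest |correlation|, median |correlation|) between a random
    target series and n_candidates unrelated random series.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_months)]
    abs_corrs = []
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_months)]
        abs_corrs.append(abs(correlation(candidate, target)))
    abs_corrs.sort()
    return abs_corrs[-1], abs_corrs[len(abs_corrs) // 2]


best, typical = best_vs_typical()
print(f"best |r| = {best:.3f}, typical |r| = {typical:.3f}")
```

The best candidate always looks stronger than a typical one even though none of them has any real relationship to the target, which is why a significance calculated as if the predictors were chosen at random is too optimistic after a search.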

 

For the ten years of data used in the analysis of the Dow Utilities components, for example, Dominion (ticker: D) exhibits a bellwether pattern with Significance estimated at 0.005%, which suggests a robust pattern.  Further, when we look at incremental monthly returns across ratings for D, STRONG rated months exhibit additional returns of 2.1% and WEAK rated months average -3.0% as compared to months of all ratings.  The spread of more than 5% per month between STRONG and WEAK rated months is worth paying attention to.  The statistical significance for D is due to the fact that we are using a model with five variables (five companies comprise the bellwether pattern) to explain about 22% of the historical volatility in monthly returns over the last ten years.  There is no way to calculate the true statistical significance, which would reflect that we looked for the best pattern, but the calculated Significance is indicative of relative performance of the model.  While high statistical significance is likely to be an exaggeration of reality, it is important to note that low statistical significance is invariably a bad sign. 
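One conventional way to turn "five variables explaining about 22% of the variability" into a test statistic is the overall F-statistic of a linear regression. The sketch below assumes that the 22% figure behaves like an R-squared and that ten years of monthly data give 120 observations; the study does not state either, and the Significance reported above may have been calculated differently.

```python
def f_statistic(r_squared, n_obs, n_predictors):
    """Overall F-statistic for a linear regression with an intercept.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), where k is the number of
    predictors and n is the number of observations.
    """
    dof_residual = n_obs - n_predictors - 1
    return (r_squared / n_predictors) / ((1.0 - r_squared) / dof_residual)


# Assumed inputs: R^2 = 0.22, 120 monthly observations, 5 predictors.
f_value = f_statistic(r_squared=0.22, n_obs=120, n_predictors=5)
print(round(f_value, 2))
```

Under these assumptions the statistic is about 6.4 on 5 and 114 degrees of freedom. Converting it to a probability requires the upper tail of the F distribution (for example `scipy.stats.f.sf`), and that probability still ignores the search for the best pattern.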

 

Validating Bellwether Patterns Using a Holdout Sample

 

So what’s the solution to this problem?  Can’t we do better in determining if a pattern is really good?  The most obvious way to solve this dilemma is to use a ‘holdout’ sample from your data.  This means that you calculate the bellwether patterns using most of the data, but then test the predictions on a separate body of data.  This is often called ‘blind’ or ‘out of sample’ testing.  For the Bellwether model, we have performed an out-of-sample validation that is especially demanding.  In our operational outlooks, we use all available data prior to the period being forecast.  This means that we have data through May 2004 for predicting June 2004, etc.  For our out-of-sample validation, we have created an additional challenge for the model.  We hold out the period from June 2003 through May 2004, twelve individual months.  We then calculate bellwether patterns using data through May 2003.  Without ever adjusting the model again, we generate one-month outlooks for each of the months in the holdout period (June 2003-May 2004).  Having performed this calculation, we can then examine how well the Bellwether patterns perform for assigning ratings. 
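The split described above can be sketched as follows. This shows only the partitioning of the data, with placeholder returns; the start date of the sample (June 1994) is an assumption chosen to give ten years of months, and the function names are illustrative rather than taken from the study.

```python
from datetime import date


def month_range(start, end):
    """List the first day of every month from start to end, inclusive."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def split_holdout(monthly_data, last_fit_month):
    """Split (month, return) pairs into a fitting set and a holdout set.

    Months up to and including last_fit_month are used to find the pattern;
    later months are reserved for blind testing.
    """
    fit = [(m, r) for m, r in monthly_data if m <= last_fit_month]
    holdout = [(m, r) for m, r in monthly_data if m > last_fit_month]
    return fit, holdout


# Placeholder returns of 0.0; only the dates matter for this illustration.
data = [(m, 0.0) for m in month_range(date(1994, 6, 1), date(2004, 5, 1))]
fit, holdout = split_holdout(data, last_fit_month=date(2003, 5, 1))
print(len(fit), len(holdout), holdout[0][0], holdout[-1][0])
```

The pattern would be estimated once on `fit` and then applied unchanged to each month in `holdout`, so no information from June 2003 onward can influence the ratings being tested.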

 

 

Out-of-Sample Monthly Returns by Rating

 

The table above shows the average monthly returns in the holdout period for months rated as STRONG, NEUTRAL, and WEAK.  Across all companies, the average STRONG-rated month generates a return of 4.1%, whereas WEAK-rated months average -0.9%.  The difference between STRONG and WEAK rated months is 5% per month.  This is less than the spread of 7.7% (3.7% versus -4.0%) observed for the entire ten-year period, but still suggests a substantial advantage to using the ratings.

 

For each company in the table above, the bellwether pattern was used to generate a rating for twelve individual months.  The consistency of the bellwether patterns in predicting high and low return months is quite striking.  This is especially true given that we have subjected the bellwether patterns to a more difficult test than we face in reality.  For our forecasts, we would not hold the bellwether patterns static for a twelve-month period without allowing the model to evolve in time, but that is exactly what we have done for this study.  By not allowing the model to use the most current data in this out-of-sample test, we are creating a substantial additional hurdle to forecast performance. 

 

The only notable failure of the bellwether patterns in this analysis is for TXU.  For the holdout period, the average return for WEAK rated months is higher than the average monthly return for STRONG rated months.  In the case of TXU, the statistical bellwether pattern has performed very well in explaining the past, but performs poorly in predicting the future.  This may be due to a number of factors, but the most likely is simply that TXU is perceived by the investor community as increasingly independent from its peer group, which is indeed the case.  For the past year in particular, TXU has increasingly been discussed as pursuing a business model that is quite different from that of its peers.  Perhaps a shift in expected earnings among a group of firms in TXU’s sector no longer affects the perception of TXU’s prospects the way it would have in the past. 
