Wednesday, May 2, 2012

Take-the-Best Statistical Model

Why do we do multiple regression?


Multiple regression is the workhorse of econometrics.  Almost every empirical paper in economics relies on it, and it also forms the basis for a large majority of political science research.  But is this a valid model for prediction?  How sensitive are the results?

As it turns out, the answer is "very".  Multiple regression is sensitive to a host of assumptions, including normality, linearity, and low-noise data.  But if the real world doesn't always fit these assumptions, why do we use the model to predict the real world?  One thing to notice when reading empirical papers is that they often tell you that a certain coefficient is statistically significant, but rarely does one see the confidence interval for that coefficient.  Of course, listing confidence intervals for every coefficient, especially when there are many, is cumbersome.  Yet this convenient omission often leads us to be too confident about our estimates.  How much do we really know?
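Reporting the intervals isn't actually hard.  Here is a minimal sketch using statsmodels on made-up data; everything in it is illustrative and none of it comes from any paper discussed here:

```python
# Minimal sketch: fit an OLS regression and print confidence intervals
# alongside the point estimates. Data are synthetic, purely for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                                  # three made-up predictors
y = X @ np.array([1.5, 0.0, -0.7]) + rng.normal(scale=2.0, size=200)

model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.params)       # point estimates
print(model.conf_int())   # 95% confidence intervals, one row per coefficient
```

The intervals take one extra line to compute; the omission in most papers is a choice, not a limitation of the tools.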

Behavioral economists have long criticized this model of human decision making: there's no feasible way that we can run a regression in our heads and then make a decision.  At least, I know I don't.  Although many of my friends may have used Excel spreadsheets to decide where to go to college, they did not end up basing their decision on some kind of complex regression model.  It's just too computationally intractable for everyday use.

Furthermore, one key problem of multiple regression is ecological validity.  We know that the regression model fits the sample pretty well, but does it predict the future with any accuracy?  Especially if the future is highly variable and uncertain, why should we trust Gaussian methods that are highly sensitive to outliers?  According to Gerd Gigerenzer, the most accurate rules are often not the high-powered, computation-intensive statistical methods.  Rather, fast and frugal algorithms that deliberately limit the information they evaluate can produce more accurate predictions.

One of the prototypical fast and frugal algorithms Gigerenzer describes is Take the Best.  While multiple regression looks at all the data and tests each variable's contribution to the dependent variable, Take the Best works through a sequential evaluation of a list of key determinants.  For example, multiple regression would decide between two restaurants by looking at all the data: food quality, wait time, location, parking spots.  It would weight each input according to an equation and then compare the results of that equation for the two restaurants.  Whichever restaurant returns the higher value is chosen.  Take the Best, on the other hand, would first check whether the food quality gap is large.  If so, pick the restaurant with the better food.  If the gap is not large enough, move on to the next cue and repeat this simple process.  Computationally, this requires M+1 evaluations, which is linear time and quite tractable.
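A rough sketch of that comparison, to make the sequential logic concrete.  The cue names, their ordering, and the threshold are all invented for illustration; Gigerenzer's version orders cues by validity and stops at the first cue that discriminates:

```python
# Sketch of the Take the Best comparison described above.
# All cues are coded so that a higher value is better.
def take_the_best(option_a, option_b, cues, threshold=0.0):
    """Return the preferred option by checking cues one at a time.

    `cues` is a list of keys ordered from most to least important.
    The first cue whose gap exceeds `threshold` decides the choice;
    if no cue discriminates, the options are treated as equivalent.
    """
    for cue in cues:
        gap = option_a[cue] - option_b[cue]
        if abs(gap) > threshold:
            return option_a if gap > 0 else option_b
    return None  # no cue discriminates

# The restaurant example from the paragraph above (made-up numbers;
# wait time is negated so that higher still means better).
restaurant_1 = {"food_quality": 8, "wait_time": -30, "parking": 5}
restaurant_2 = {"food_quality": 6, "wait_time": -10, "parking": 20}
choice = take_the_best(restaurant_1, restaurant_2,
                       cues=["food_quality", "wait_time", "parking"],
                       threshold=1.0)
```

Note that the loop touches each cue at most once, which is where the linear-time claim comes from.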

Gigerenzer applied this heuristic to predicting Chicago high school dropout rates.  Given two high schools and all the associated statistics (attendance rate, proportion of low-income students, social science test scores, and more), what's the most accurate way to predict which school has the higher dropout rate?  Gigerenzer and his fellow researchers took half the population of schools and fit both a Take the Best model and a multiple regression model to the data.  No surprise, multiple regression did better in-sample, predicting over 70% of the pairs correctly while Take the Best only managed around 65%.  Yet when the two models were tested on the other half of the population, Take the Best had about a 60% accuracy rate and multiple regression barely managed accuracy in the low 50s.
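For readers who want to see the protocol rather than the result: the sketch below is synthetic, not the Chicago data, and the Take the Best stand-in is simplified to the single cue that looks best on the training half.  It only illustrates the fit-on-half, test-on-the-other-half comparison of pairwise accuracy:

```python
# Hedged sketch of the evaluation protocol described above, on synthetic data.
from itertools import combinations
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n, n_cues = 60, 4
X = rng.normal(size=(n, n_cues))
y = X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=2.0, size=n)   # noisy "dropout rate"

train = np.arange(n) < n // 2          # fit on the first half of the schools
test = ~train                          # evaluate on the held-out half

reg = LinearRegression().fit(X[train], y[train])
reg_scores = reg.predict(X)

def pairwise_accuracy(scores, subset):
    """Fraction of pairs within `subset` whose ordering the scores get right."""
    pairs = list(combinations(np.flatnonzero(subset), 2))
    return np.mean([(scores[i] > scores[j]) == (y[i] > y[j]) for i, j in pairs])

# Simplified Take the Best: pick the single cue with the strongest
# training-half correlation and rank by it (sign-corrected).
corrs = [np.corrcoef(X[train, k], y[train])[0, 1] for k in range(n_cues)]
best_cue = int(np.argmax(np.abs(corrs)))
ttb_scores = np.sign(corrs[best_cue]) * X[:, best_cue]

print("regression, in-sample:    ", pairwise_accuracy(reg_scores, train))
print("regression, out-of-sample:", pairwise_accuracy(reg_scores, test))
print("take-the-best, out-of-sample:", pairwise_accuracy(ttb_scores, test))
```

The point is the split itself: the regression's advantage shows up on the half it was fit to, and the question is what survives on the half it never saw.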

Surprising?  Gigerenzer in Gut Feelings explains:
But why did ignoring information pay in this case? High school dropout rates are highly unpredictable: in only 60 percent of the cases could the better strategy correctly predict which school had the higher rate. (Note that 50 percent would be chance.) Just as a financial adviser can produce a respectable explanation for yesterday's stock results, the complex strategy can weigh its many reasons so that the resulting equation fits well with what we already know. Yet, as Figure 5-2 clearly shows, in an uncertain world, a complex strategy can fail exactly because it explains too much in hindsight. Only part of the information is valuable for the future, and the art of intuition is to focus on that part and ignore the rest. A simple rule that relies only on the best clue has a good chance of hitting on that useful piece of information.
Looking backwards can hurt; you might end up blindsided by what the future holds. The data might show trends that are only valid for the sample, not for the population as a whole.  Your results won't be ecologically valid if the model is allowed to fit every scrap of the sample.  Especially since correlations change substantially over time, Gaussian methods are more likely to offer the pretense of knowledge than knowledge itself.

This has massive policy implications.  From Gigerenzer:
According to the complex strategy, the best predictors for a high dropout rate were the school's percentage of Hispanic students, students with limited English, and black students, in that order. In contrast, Take the Best ranked attendance rate first, then writing score, then social science test score. On the basis of the complex analysis, a policy maker might recommend helping minorities to assimilate and supporting the English as a second language program. The simpler and better approach instead suggests that a policy maker should focus on getting students to attend class and teaching them the basics more thoroughly. Policy, not just accuracy, is at stake.
Yet with these policy issues at stake, it's surprising that fast and frugal algorithms aren't used more in economics research.  One disadvantage of Take the Best is that it doesn't give much quantitative precision.  It only tells you which option ranks higher, not by how much.  But how much does that matter?  While multiple regression may give more statistically significant coefficients, do we really have the power to tune the economy that finely?  Even the best natural experiments don't yield parameters that stay stable over time.  Romer and Romer beautifully estimate tax elasticity, but how arrogant would a person need to be to build our entire tax policy on an estimate from an almost 80-year-old data set?  The quantitative accuracy of DSGE models is such a joke that peripheral ad hoc models are needed to make them even somewhat useful.

These alternative statistical tools are likely to add much to our insight into models.  The development of robust heuristics will be critical in a complex world, where calculation becomes increasingly difficult and Gaussian methods increasingly fragile.
