Thinking outside of the bell curve

Statistical distributions do not usually generate widespread public enthusiasm. Sure, some people spend hours admiring the shape-shifting forms of the beta distribution or the unflappable positivity of the exponential distribution; the limitless potential of the Gumbel distribution or the black-or-white certainty of the Bernoulli distribution. But those people usually reside at the extreme reaches of the societal bell curve.

Until now. The financial crisis has been blamed on numerous factors, including inflated house prices, greedy bankers and weak regulation of Wall Street, but a commentator recently suggested an entirely different type of culprit: the Gaussian copula function. This function measures the dependence between variables, such as the values of the housing and stock markets, and then describes how they vary jointly. The problem is that the Gaussian copula assumes the relationship between variables becomes weaker as each variable becomes more extreme – so that the chance of (say) a large simultaneous drop in both house prices and share prices is estimated to be extremely small. Unfortunately, experience shows us that the opposite is often the case.
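
To see what this tail assumption does in practice, here is a minimal sketch (using numpy and scipy, with an assumed correlation of 0.7 and a Monte Carlo sample – not anyone's actual pricing model). It samples from a bivariate Gaussian copula and shows that the chance of one variable being extreme, given that the other one is, shrinks as we move further into the tail:

```python
# Sketch: why the Gaussian copula plays down joint extremes. Even with
# strong overall correlation, the chance that one variable is extreme
# *given* the other is extreme keeps shrinking in the far tail.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
rho = 0.7                      # assumed overall correlation
n = 2_000_000                  # Monte Carlo sample size

# Sample a correlated bivariate normal, then map each margin to [0, 1]
# with the normal CDF -- the pair (u, v) then follows a Gaussian copula.
z1 = rng.standard_normal(n)
z2 = rho * z1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)
u, v = norm.cdf(z1), norm.cdf(z2)

for q in [0.9, 0.99, 0.999, 0.9999]:
    joint = np.mean((u > q) & (v > q))      # P(both exceed quantile q)
    conditional = joint / (1 - q)           # P(V > q | U > q), since P(U > q) = 1 - q
    print(f"q = {q}:  P(V > q | U > q) = {conditional:.3f}")
```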

Plenty of commentators disagree with this hypothesis, but one thing is for certain: the way we measure probabilities of rare events matters. Important decisions are made in the present based on our assumptions about the frequency or rarity of hypothetical extreme events in the future. And this applies beyond the world of financial markets.

Consider some recent (and not-so-recent) meteorological disasters: Hurricane Sandy in 2012, the Queensland floods in 2010/11, the North Sea flood of 1953 and the Great Mississippi flood of 1927. In all cases, the event was described as a ‘perfect storm’ of factors aligning to cause the catastrophe. In all cases, society would have been better able to manage the event if only one of the factors (such as extreme winds or a king tide) had occurred in isolation, rather than all of them happening at once. And in all cases, the event was largely unforeseen, and that surprise was one of the principal reasons for its catastrophic impact.

The 1953 flood occurred because of the coincidence of a spring tide with a severe storm surge. The high water levels overwhelmed dykes and led to extensive inundation in large parts of The Netherlands and elsewhere.

We must be realistic: how can we possibly foresee catastrophic events when they are often without precedent in the instrumental record? The common response from engineers and planners is to fit a statistical distribution to the (smaller) events that we do have on record, and then extrapolate to the hypothetical future event. And the tail assumptions of the distribution – the assumptions that dictate whether events become more or less correlated the more extreme they become – will make all the difference.

Consider, for example, the probability that a 1 in 10 year flood occurs on the same day as a 1 in 10 year storm surge. This could be important if we are building a house or a bridge close to the coast, and we want to estimate how high it needs to be. If the two processes are independent, the chance of both occurring on any given day is (1/3650) × (1/3650), or roughly once every 13.3 million days – once every 36,500 years. That is probably not worth worrying about for all but the most critical infrastructure. In contrast, if the two always happen at the same time, you can expect them together once every 10 years. In that case, I certainly would want to make sure my house is built to withstand such an event! Of course, in most cases the reality lies somewhere between these two ends of a very wide spectrum.
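
For the curious, the back-of-the-envelope arithmetic above can be written out as a short script (a sketch of the reasoning only – a real design would use observed records and a proper joint model):

```python
# Return period of a 1-in-10-year flood and a 1-in-10-year storm surge
# occurring on the same day, under the two bounding dependence assumptions.
daily_prob = 1 / (10 * 365)        # chance of a 1-in-10-year event on any given day

# Fully independent: multiply the daily probabilities.
joint_daily_independent = daily_prob ** 2
return_period_independent = 1 / joint_daily_independent / 365   # in years

# Fully dependent: whenever one happens, so does the other.
return_period_dependent = 10                                     # in years

print(f"independent: once every {return_period_independent:,.0f} years")
print(f"fully dependent: once every {return_period_dependent} years")
```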

What is the solution to this problem? The answer is not trivial, and requires a strong foundation in risk estimation, multivariate statistics, and an understanding of the physical processes that will cause such an event. Certainly more effort needs to be placed on improving our scientific understanding of extremes and more accurately quantifying the risk of rare events, so that we can be better prepared when they occur. By the same token, we also must accept (as pointed out in The Black Swan) that we might never be able to successfully estimate the risk of very rare events, and that there always will be unknown unknowns whose probability cannot be quantified. Whatever we do, we should be spending more time thinking about the assumptions implicit in our methods, and not have blind faith that a statistical distribution – no matter how elegant – will furnish us with all the answers.

Acknowledgements: The idea for this blog post came from an anonymous reviewer of a paper that some colleagues and I have been writing on compound extremes, as well as from discussions with Dr Michael Leonard on multivariate distributions and the “formula that killed Wall Street”.
