Fisher’s Flawed Foundations of Statistics

This lecture is about the personality of Sir Ronald Fisher and his foundational ideas about statistics. The idea that statistics is all about "data reduction" stems from the lack of computational capabilities in Fisher's time: it was not possible to analyze thousands of data points without reducing them to manageable summaries. Even though computing power now makes such analysis possible, intellectual inertia has kept the discipline of statistics bound to the now obsolete mold into which Fisher cast it.

Fisher was a prominent Eugenicist, and he had eight children in accordance with his belief that the path to improvement of the human race involved increasing the propagation of superior specimens of humanity. A central question for us is: "Is modern statistics FREE of its Eugenicist origins?" The minority position is NO. This position is described and well defended by Donald MacKenzie in his book Statistics in Britain, 1865–1930: The Social Construction of Scientific Knowledge. He writes that "Connections between eugenics and statistics can be seen both at the organization level and at the detailed level of the mathematics of regression and association discussed in chapters 3 and 7. Without eugenics, statistical theory would not have developed in the way it did in Britain – and indeed might not have developed at all, at least till much later." In brief, Eugenics shaped the tools and techniques developed in statistics. However, the dominant view is that modern statistics is FREE of its racist origins. This view is ably defended by Francisco Louçã in his article "Emancipation Through Interaction – How Eugenics and Statistics Converged and Diverged," Journal of the History of Biology 42.4 (2009): 649–684. He argues in favor of the consensus view: there is no doubt that the origins of statistics lie in the Eugenics project, but the field has now broken free of these dark origins.

In this part of the lecture, we look at the personality of Fisher, and assess how it shaped the foundations of statistics. It is acknowledged by all that Fisher was cantankerous, proud & obstinate. He would never admit to mistakes, and was stubborn in defending his position, even against facts. He was also vengeful: to oppose Fisher was to turn him into a permanent enemy. In many battles, Fisher took the wrong side. HOWEVER, he won most of his battles because of his brilliance, to the detriment of truth. The impact of Fisher’s victories has permanently scarred statistics, and continues to guide the field in the wrong directions. This lecture is about SOME (not all) of his fundamental mistakes.

Perhaps the most basic, and also the most confusing, was the battle between Fisher and Neyman-Pearson regarding the testing of statistical hypotheses. This is confusing because today both conflicting positions are taught simultaneously to students of statistics. Even though the conflict was never resolved, it is now ignored, glossed over, and swept under the carpet. The fundamental question is: "WHAT is a hypothesis about the data?" According to Fisher, a hypothesis treats the data as a random sample from a hypothetical infinite population which can be described by a FEW parameters. WHERE does this ASSUMPTION come from? It comes from the NEED to reduce a large amount of data to a FEW numbers which can be studied. This reduction is needed because of our LIMITED mental capabilities – we cannot handle or understand large data sets. Fisher wrote: "In order to arrive at a distinct formulation of statistical problems, it is necessary to define the task which the statistician sets himself: briefly, and in its most concrete form, the object of statistical methods is the reduction of data. A quantity of data, which usually by its mere bulk is incapable of entering the mind, is to be replaced by relatively few quantities which shall adequately represent the whole, or which, in other words, shall contain as much as possible, ideally the whole, of the relevant information contained in the original data." The parametric mathematical model – treating the data as a random sample from a hypothetical infinite population – allows us to reduce the data, making inference possible. The hypothetical infinite population has no counterpart in reality.
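
To make the idea of data reduction concrete, here is a minimal sketch in Python (all numbers are invented for illustration): under an assumed normal model, ten thousand observations collapse into just two summary quantities.

```python
import numpy as np

# A minimal sketch of Fisherian "data reduction" (illustrative only):
# under the ASSUMPTION that the data are a random sample from a normal
# population, the entire dataset can be replaced by two summary numbers.
rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=2.0, size=10_000)  # hypothetical raw data

# Sufficient statistics under the assumed normal model:
xbar = data.mean()        # estimates the population mean
s2 = data.var(ddof=1)     # estimates the population variance

print(f"10,000 observations reduced to two numbers: mean={xbar:.3f}, var={s2:.3f}")
# Everything the normal model can "say" about the data is contained in these
# two quantities -- but only IF the assumed infinite normal population is apt.
```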

What is to prevent the statistician from making completely ridiculous assumptions, since the model comes purely from the imagination, and purely for mathematical convenience? For this purpose, Fisher proposed the use of p-values. If the data are extremely unlikely under the null hypothesis, this casts doubt on the validity of the proposed model for the data. The p-value tests for GROSS CONFLICT between the data and the assumed model. One can never learn whether or not the model is true, because there is nothing real that maps onto the hypothetical infinite population following the assumed theoretical distribution. To Fisher, the mathematical model is a device to enable the reduction of the data, not a true description of reality.
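
The following sketch illustrates this Fisherian use of the p-value as a rough screen for gross conflict; the data, the assumed model, and the Kolmogorov-Smirnov check are all illustrative assumptions, not a prescription.

```python
import numpy as np
from scipy import stats

# Sketch: the p-value as a rough check for GROSS conflict between data and
# an assumed model, not as proof that the model is true.
rng = np.random.default_rng(1)
data = rng.exponential(scale=1.0, size=500)   # the data are actually skewed

# Assumed model: a normal population with the sample's own mean and spread.
ks_stat, p_value = stats.kstest(data, "norm", args=(data.mean(), data.std(ddof=1)))
print(f"KS p-value under the assumed normal model: {p_value:.4g}")
# A tiny p-value signals gross conflict with the normal assumption.
# A large p-value would NOT show the model is true -- many other imaginary
# populations would fit the same data equally well.
```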

In a classic example of mistaking the map for the territory, the Neyman-Pearson theory of hypothesis testing takes the Fisherian model as the TRUTH. The null hypothesis is ONE of the parametric configurations. The alternative hypothesis is SOME OTHER parametric configuration. The Neyman-Pearson theory then allows us to calculate the exact most powerful test – under the assumption that the parametric models COVER the truth. The possibility of TYPE III errors – that none of the assumed parametric models is valid – is ruled out by assumption, and never taken into consideration. BUT the assumption of a parametric model to describe the data is arbitrary. The imaginary infinite population following a theoretical distribution has been made up purely for mathematical convenience!
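
A small illustrative sketch (hypothetical numbers, simplified to two point hypotheses and a likelihood-ratio threshold of one) shows how the Neyman-Pearson machinery delivers a confident verdict even when the truth lies outside both assumed models.

```python
import numpy as np
from scipy import stats

# Sketch of the Neyman-Pearson setup: choose between TWO pre-specified
# parametric hypotheses via the likelihood ratio.  The possibility that
# NEITHER is right (a "Type III" error) never enters the calculation.
rng = np.random.default_rng(2)
data = rng.lognormal(mean=0.0, sigma=1.0, size=200)  # truth: neither hypothesis

# H0: data ~ Normal(1, 1);  H1: data ~ Normal(2, 1)  (both assumed, both wrong)
loglik_h0 = stats.norm.logpdf(data, loc=1.0, scale=1.0).sum()
loglik_h1 = stats.norm.logpdf(data, loc=2.0, scale=1.0).sum()

# Simplified decision rule: pick whichever assumed model is less bad.
# (A real NP test sets the likelihood-ratio cutoff to control the Type I error.)
print("log-likelihood ratio (H1 minus H0):", loglik_h1 - loglik_h0)
print("Decision:", "reject H0" if loglik_h1 > loglik_h0 else "retain H0")
# The machinery happily delivers a verdict even though the true data-generating
# process (lognormal) is outside both hypotheses.
```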

In the course of the bitter personal conflict which ensued, the real issues, related to the common weakness of both approaches, were ignored and suppressed. Instead, Fisher's promotion of his methods led to dramatic misuse and abuse of Fisherian p-values. P-values were MEANT to assess gross conflict and serve as a rough check on the modelling process. Instead, they were turned into a REQUIREMENT for valid statistical results. The hugely popular philosophy of science developed by Karl Popper was very useful in elevating the importance of the p-value: we can never PROVE a scientific hypothesis, but we can disprove one. A significant p-value disproves a null hypothesis, creating a scientific fact; insignificant p-values mean nothing. This led to a fundamentally flawed statistical methodology currently being taught and used all over the world. The problem is that there are huge numbers of hypotheses which are NOT in gross conflict with the data. By careful choice of parametric models, we can ensure that our desired null hypothesis does not conflict with the data. The Neyman-Pearson theory can ADD to this illusion of the validity of imaginary hypotheses, if we choose alternatives which are even more implausible than our favored null hypothesis.
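
The point that many mutually incompatible models can escape gross conflict with the same data can be shown with a short sketch; the particular distributions fitted here are arbitrary choices for demonstration, not recommendations.

```python
import numpy as np
from scipy import stats

# Sketch: several different imaginary populations can all pass the "no gross
# conflict" check for the SAME data, so a non-significant p-value cannot
# single out any one of them as true.
rng = np.random.default_rng(3)
data = rng.gamma(shape=20.0, scale=1.0, size=150)

candidates = {
    "normal":  ("norm",    stats.norm.fit(data)),
    "gamma":   ("gamma",   stats.gamma.fit(data)),
    "lognorm": ("lognorm", stats.lognorm.fit(data)),
}
for name, (dist, params) in candidates.items():
    _, p = stats.kstest(data, dist, args=params)
    print(f"{name:8s} goodness-of-fit p-value: {p:.3f}")
# When all of these p-values come out comfortably large, none of the three
# mutually incompatible models is in "gross conflict" with the data --
# yet at most one of them can be right.
```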

Fisher Versus Gosset. The test of significance developed by Gosset measures statistical significance, which is very different from practical significance. Gosset warned against confusing the two from the beginning. Unfortunately, because it was a tool in Fisher's war against Neyman-Pearson, Fisher pushed it to the hilt. This led to a fundamental misunderstanding of the role and importance of p-values in statistical research which persists to this day. The damage inflicted by these misguided statistical procedures has been documented by Stephen T. Ziliak and Deirdre N. McCloskey in The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives.
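
The gap between statistical and practical significance is easy to demonstrate. In this illustrative sketch (all numbers invented), a negligible difference becomes highly "significant" simply because the sample is enormous.

```python
import numpy as np
from scipy import stats

# Sketch of Gosset's warning: with a large enough sample, a practically
# trivial effect becomes "statistically significant".
rng = np.random.default_rng(4)
n = 1_000_000
treatment = rng.normal(loc=100.1, scale=15.0, size=n)  # +0.1 units: negligible
control   = rng.normal(loc=100.0, scale=15.0, size=n)

t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"estimated effect: {treatment.mean() - control.mean():.3f} units "
      f"(against a spread of 15 units)")
print(f"p-value: {p_value:.2e}")
# The p-value is tiny ("highly significant") even though the effect is far too
# small to matter for any practical decision: statistical significance is not
# practical significance.
```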

Perhaps of even greater fundamental importance was the battle between Fisher and Sewall Wright. Sewall Wright invented path analysis – a method for assessing CAUSAL effects. If this method had been understood and adopted, modern statistics would be entirely different. Unfortunately, Sewall Wright had a fight with Fisher over an obscure genetics controversy related to EUGENICS. As a result, Fisher ignored, neglected, and criticized all of Sewall Wright's contributions and attempts at developing a theory of causality. To be fair, this was not entirely Fisher's fault. The theories of knowledge then in vogue, based on logical positivism, suggested that unobservables cannot be part of scientific theories. This led to difficulties in understanding causality, because causality is never directly observable, and is always based on understanding of unobservable real-world mechanisms (for more details, see Causality As Child's Play). Over the past few decades, there have been revolutionary advances in the understanding of causality, made by Judea Pearl and his students, which build on causal path analysis similar to the methods of Sewall Wright. Unfortunately, statisticians and econometricians have mostly failed to learn from these methods, because they conflict with decades of indoctrination against such approaches.
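
As a rough illustration of the path-analytic idea (a simplified linear model with made-up coefficients and unstandardized variables, not Wright's own notation or data), a regression that follows the assumed causal diagram recovers the direct effect, while a naive correlation mixes causal and spurious paths.

```python
import numpy as np

# Sketch of Wright-style path analysis on an assumed causal diagram:
#   Z -> X (coefficient 0.6),  Z -> Y (0.3),  X -> Y (0.5).
rng = np.random.default_rng(5)
n = 200_000
Z = rng.normal(size=n)
X = 0.6 * Z + rng.normal(size=n)
Y = 0.5 * X + 0.3 * Z + rng.normal(size=n)

# Naive correlational answer: regress Y on X alone.
naive_slope = np.polyfit(X, Y, 1)[0]

# Path-analytic answer: regress Y on X AND Z, following the causal diagram.
A = np.column_stack([X, Z, np.ones(n)])
coefs, _, _, _ = np.linalg.lstsq(A, Y, rcond=None)

print(f"naive slope of Y on X:          {naive_slope:.3f}  (mixes causal and spurious paths)")
print(f"direct path X -> Y (adjust Z):  {coefs[0]:.3f}  (close to the true 0.5)")
```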

Failure to understand causality continues to be a serious problem for statistics. One of the most dramatic illustrations was the controversy about cigarettes and cancer in the middle of the 20th century. For more details about this controversy, see Pearl & Mackenzie, The Book of Why (Chapter 5), and also Walter Bodmer, "RA Fisher, statistician and geneticist extraordinary: a personal view." A friendly relationship turned into enmity when Bradford Hill and Richard Doll published an extensive empirical study documenting the effect of smoking on cancer. This conflicted with Fisher's view that correlations cannot prove causation, and also with his libertarian ideology. These convictions led Fisher to deny the empirical evidence regarding the link between smoking and cancer long after it had become overwhelming. Because of his enormous prestige, his obstinate refusal to accept strong statistical evidence in conflict with his ideology delayed recognition of the link and the necessary policy response, and probably led to substantial loss of lives due to lung cancer.
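
Fisher's objection can at least be stated precisely: a hidden common cause could, in principle, generate the observed smoking-cancer correlation. The sketch below (entirely hypothetical numbers) simulates two worlds, one causal and one purely confounded, that look alike to naive correlational analysis; it shows why settling the question required causal reasoning and converging lines of evidence, which in this case overwhelmingly supported causation.

```python
import numpy as np

# Sketch (hypothetical probabilities) of why correlation alone cannot decide
# between a causal effect and a hidden common cause.
rng = np.random.default_rng(6)
n = 500_000

# World A: smoking causes cancer.
smoke_a = rng.random(n) < 0.3
cancer_a = rng.random(n) < np.where(smoke_a, 0.15, 0.05)

# World B: a hidden trait raises BOTH the urge to smoke and cancer risk;
# smoking itself does nothing.
trait = rng.random(n) < 0.3
smoke_b = rng.random(n) < np.where(trait, 0.8, 0.1)
cancer_b = rng.random(n) < np.where(trait, 0.15, 0.05)

for label, s, c in [("causal world    ", smoke_a, cancer_a),
                    ("confounded world", smoke_b, cancer_b)]:
    print(f"{label}: P(cancer|smoker)={c[s].mean():.3f}, "
          f"P(cancer|non-smoker)={c[~s].mean():.3f}")
# Both worlds show markedly higher cancer rates among smokers; telling them
# apart requires causal analysis or intervention, not correlation alone.
```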

What lessons can be learned from this personal history of the founder of modern statistics? Islam teaches us a lot about the search for knowledge. See Principles of Islamic Education for a detailed discussion. Here we briefly discuss some of the required attitudes for seekers of truth. We must learn to value knowledge as the most precious treasure of God, seeking it with passion, energy, and utmost effort. This was one of the keys to how Islamic teachings made world leaders out of ignorant and backward Bedouins. We must also understand that knowledge, or insight, is a GIFT of God. We must learn to take small steps, and be grateful for small advances in understanding. Knowledge is like a castle constructed brick by brick from small elements. We must acquire patience for the long haul, instead of expecting quick results. The knowledge we acquire does not come from our personal capabilities; it is a gift of God. We cannot take pride in discoveries, because they are not due to our own genius. The pride of Qaroon (Bible: Korah) is condemned – the claim that "my wealth is due to my own wisdom and capabilities, and therefore I do not recognize the rights of others." We must learn humility and gratitude: I have been given knowledge beyond what I deserve, and beyond my capabilities. Furthermore, because of our limited capabilities, we often make mistakes, fail to recognize the truth, and confuse it with falsehood. An essential part of the search for truth is UNLEARNING – we must be ready to abandon cherished preconceptions, and rebuild our knowledge on new foundations if the evidence calls for it.

3 thoughts on "Fisher's Flawed Foundations of Statistics"

  1. Thank you for this deeper dive into the problems of categorization and taxonomy underlying our statistics. My take is also the over-reliance on price-derived data underlying all macroeconomic indicators, from GDP to inflation, debt, deficits, etc.

    Since prices are a function of human ignorance (always historic and allowing externalities), see my most recent critiques on our Latest headlines and Books & Reviews page on http://www.ethicalmarkets.com

  2. Thanks Hazel, your work on ethical markets is of central importance in showing us the goals we need to collectively strive for, in order to protect the planet from disaster and improve the quality of life of its inhabitants. Statistics are manufactured to support the dominant perspectives – it is very costly to collect data on a global basis, and those who pay the piper call the tune. When people throw statistics at me, I tell them that the single best-selling book in statistics – with more sales than all other statistics textbooks combined – is entitled How to Lie With Statistics, by Darrell Huff.

  3. From Statistics in Britain, 1865–1930: The Social Construction of Scientific Knowledge, Donald A. MacKenzie, 1981, pp. 225–226.

    Science is an activity not of passive contemplation and ‘discovery’ but of invention. It is goal-oriented, and, while its goals may all in a general sense have to do with the enhancement of the human potential to predict and control the world, they represent different particularisations of this overall objective. The pursuit of particular goals is typically sustained by social interests located either in the internal social structure of science or in that of society at large. Scientific knowledge is thus a social construct in two senses. First, in that it is typically the product of interacting groups of scientists. Second, in that social interests affect it not merely at the organisational level but at the most basic level of the development and evaluation of theories and techniques. Because science is goal-oriented, and because its goals are socially sustained, scientific knowledge is constitutively social.

    A final note of caution is, however, perhaps in order. To say, following Habermas, that goals and interests are constitutive of knowledge is to invite a possible misunderstanding. The German language differentiates between two aspects of the notion of ‘knowledge’; Erkenntnis (‘the act, process, form or faculty of knowing’) and Wissen (‘the passive content of what is known’). Habermas’s analysis refers to the first, rather than the second (Habermas 1972, 319). So must any similar analysis, if it is to avoid the ‘genetic fallacy’ of concluding that the origins of knowledge forever determine its status. Knowledge must be analysed as a resource for practice, and knowing must be seen as a process.

    The analogy between knowledge as a resource for practice and tools in the everyday sense may make this point clearer. A tool’s construction will reflect the tasks for which it was designed, and it will initially be evaluated according to its adequacy in the performance of these tasks. This does not mean, however, that its use is always limited to these tasks: it may well be found helpful for purposes quite different from those for which it was developed. Similarly, the construction and evaluation of knowledge can be structured by particular goals without these determining for all time the fate of this knowledge. Of course, it is true that the initial uses of a tool may well give us a clue as to other possible uses, may suggest the amendments that will be required to achieve different objectives with it, and may indicate in which situations we may have to discard it. All of this, however, is contingent, not necessary.

    That eugenic concerns structured Galton’s and Pearson’s statistical theory does not imply, therefore, that the modern statistician who does not share these concerns need necessarily eschew the use of the concepts developed by them. It is not that the acceptance of a technique by modern statisticians guarantees its context-independent validity. Rather, the construction and evaluation of statistical theory by modern statisticians needs to be studied in its own right before any conclusions can be drawn as to the goals and interests constitutive of present-day statistics.

    ‘Our statistics is different’, the modern statistician may well claim. To say this is false in one sense, true in another. It is false, in that to claim that ‘we’ have achieved eternally valid knowledge, or evaluations not structured by context or interest, would be unjustifiable. It is true, to the extent that ‘our’ statistical theory has emerged in a historical process [different] from ‘theirs’. This historical process has largely been one of the generalisation of the scope of statistical theory, as statisticians have come to grips with new situations. ‘Their’ concepts have been modified, stretched or discarded. So ‘our’ statistics is in this sense more general than ‘theirs’, and hence it is relatively easy for us to see the context-bound nature of ‘their’ thought. It is not that ‘our’ statistics explains ‘theirs’ as a special case; rather, ‘theirs’ helps to explain ‘ours’, in that ‘their’ knowledge was used in the construction of ‘ours’. It is not, as a Platonist might have it, that Galton and Pearson discovered some of the current stock of truths; rather, it is that they, in solving their problems, produced resources that have been used by later statisticians to solve other problems. ‘Our’ statistics is different from ‘theirs’ in that it has evolved from it; but, like ‘theirs’, it is a social and historical product, and can and should be analysed as such.
