This is a write-up of Section E of Chapter 1 “Why We Need A New Approach to Statistics?” of Real Statistics: A Radical Approach. To register for the ongoing online course: http://bit.ly/AZRealStats – This section describes four fundamental flaws in modern statistics, which make it necessary to rebuild the entire discipline on new foundations for the 21st Century.
In the early 20th Century, Sir Ronald Fisher initiated an approach to statistics that he characterized as follows: “… the object of statistical methods is the reduction of data. A quantity of data, which usually by its mere bulk is incapable of entering the mind, is to be replaced by relatively few quantities which shall adequately represent the whole …” As he clearly indicates, we want to reduce the data because our minds cannot comprehend large amounts of data. Therefore, we want to summarize the data in a few numbers which adequately represent the whole data set.
It should be obvious from the start that this is an impossible task. One cannot reduce the information contained in 1000 points of data to two or three numbers; there must be loss of information in this process. Fisher developed a distinctive methodology, which is still at the heart of conventional statistics. The central element of this methodology was an ASSUMPTION: the data is a random sample from a larger population, and this larger population is characterized by a few key parameters. Under this assumption, the key parameters which characterize the larger population are enough to characterize the data set at hand. Fisher showed that, in this setting, there are “sufficient statistics”: a small set of numbers that captures all of the information available in the data. Thus, once in possession of the sufficient statistics, the data analyst could actually throw away the original data, as all relevant information from the data set had been captured in the sufficient statistics. Our goal in this section is to explain how this methodology works, why it was a brilliant contribution by Fisher in his time, and why it is now obsolete and a handicap to progress in statistics.
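To make the idea of sufficient statistics concrete, here is a minimal sketch in Python. It assumes the textbook case of a normal population, for which the count, the sum, and the sum of squares of the data are sufficient. The function names and the two tiny data sets are hypothetical, chosen only for illustration. The two data sets have clearly different shapes, yet they share the same three summary numbers, so under the normal assumption they are treated as identical.

```python
import math

def sufficient_stats(data):
    """Return (n, sum of x, sum of x squared) for a list of numbers."""
    return len(data), sum(data), sum(x * x for x in data)

def loglik_pointwise(data, mu, sigma):
    """Normal log-likelihood computed from every individual data point."""
    const = -0.5 * math.log(2 * math.pi * sigma ** 2)
    return sum(const - (x - mu) ** 2 / (2 * sigma ** 2) for x in data)

def loglik_from_stats(stats, mu, sigma):
    """The same log-likelihood, computed from the three summary numbers alone."""
    n, s1, s2 = stats
    squared_dev = s2 - 2 * mu * s1 + n * mu ** 2  # equals sum((x - mu)**2)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - squared_dev / (2 * sigma ** 2)

# Two illustrative data sets: one has a low outlier, the other a high outlier.
low_outlier = [0.0, 4.0, 4.0, 4.0]
high_outlier = [2.0, 2.0, 2.0, 6.0]

stats_low = sufficient_stats(low_outlier)
stats_high = sufficient_stats(high_outlier)

# Any (mu, sigma) will do; these values are arbitrary.
mu, sigma = 2.5, 1.7
ll_low = loglik_pointwise(low_outlier, mu, sigma)
ll_high = loglik_pointwise(high_outlier, mu, sigma)
ll_summary = loglik_from_stats(stats_low, mu, sigma)
```

Both data sets reduce to the same triple, and the likelihood computed from the raw points agrees (up to rounding) with the one computed from the summary. Once the normal assumption is imposed, the difference in shape between the two data sets becomes invisible to the analysis.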
In the raw data, each data point is unique and informative. But Fisher’s approach anonymizes all of the data by making them all equally representative of a population. This actually has parallels to our real approach – we think of the data as informing us about the real world which is hidden. But the problem is that Fisher uses an imaginary world from which the data comes, whereas we are interested in the real world. According to conventional statistical methodology, the statistician is free to make up a class of imaginary populations, and pretend that the data is a random sample from this imagined population. It is a little-noticed effect of this approach that the data is actually replaced by the imaginary population. Using this methodological freedom, the statistician can restrict the imaginary populations to satisfy some desired prerequisite or bias. Then statistical inference will confirm this bias, making it appear as if the data is providing us with this information, when in fact, it is the bias that has been built into the assumptions, and all data sets will confirm this bias.
Logical Positivism is a nominalist philosophy – it says that we can have knowledge only of the appearances. Knowledge of the hidden reality behind the appearances is impossible for us to have. Therefore, we should abandon the quest for such knowledge. This idea, first introduced by Kant, has had a tremendous influence on subsequent philosophical developments in Europe. There is a critical mistake here, which is not recognized by Western philosophers even today. The mistake arises from the misconception that to qualify as “knowledge” we must have certainty. We can never have certain knowledge of the hidden reality; we can only make guesses regarding it. Western philosophers refused to accord the status of “knowledge” to our guesses concerning hidden reality. The detailed and complex analysis of how this happened has been carried out by Peter T. Manicas in “A History and Philosophy of the Social Sciences”. In reality, we face huge amounts of uncertainty in our lives. Most, if not all, of the knowledge that we have is not 100% certain. We cannot know for certain the intention of the driver on the crossroad as we approach a light turning green. Yet we stake our lives on the guess that he is not slowing down to deceive us into crossing, and plans to speed up to crash into us after luring us into the intersection. In our daily lives, we have no choice but to guess at matters about which we cannot have certainty (like which career to choose, which person to marry, etc.).
Realist Philosophy is opposed to nominalist philosophy in saying that knowledge is concerned almost exclusively with the hidden reality behind the appearances. It acknowledges that such knowledge cannot be 100% certain. We use our hearts for guidance in such matters, but our hearts serve this function well only if they are not clouded by various kinds of influences. We cannot ascertain the purity of our hearts, and cannot be sure that the testimony of the heart is 100% reliable. But this uncertain knowledge is the only kind that is available to mankind.
Note the dramatic difference between nominalism and realism. Under a nominalist philosophy, knowledge is concerned purely with appearances. All appearances can be quantified or measured in some way, and thereby reduced to numbers. Then, since we never need to go beyond the appearances, statistics can confine itself purely to an analysis of numbers. In contrast, real statistics aims to use numbers to learn about the hidden structures of the real world. Probability and Causation are two such structures, which are never observable. We can learn about them only via indirect clues. A realist philosophy holds that knowledge is primarily concerned with the real world, and not with the appearances. Thus, the subject matter of real statistics is radically different from that of nominal statistics. We illustrate this point more clearly in the context of the two prime examples, probability and causation.
Students will be surprised to learn that statisticians do not have a clue as to the real nature of probability. The vast majority of textbooks use the frequency theory definition, which is impossible to defend. One can never flip a coin an infinite number of times, nor can one define the “method of flipping” in a way that remains consistent over time and is guaranteed to produce the same probability on each trial. All attempts to resolve these difficulties have been proven to fail. For this reason, a small minority of authors use subjective probability, which reduces probability to an opinion. This is also of no use in real-world applications, which require assessments of probabilities based on the empirical evidence, not a guess.
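The difficulty with the frequency definition can be seen in a small simulation. The sketch below is only an illustration, with a hypothetical function name and parameters: it tracks the running proportion of heads over a finite number of simulated flips. Note that the simulation has to be handed a probability (`p_heads`) before it can produce any frequencies at all, so it presupposes the very thing the frequency theory is supposed to define.

```python
import random

def running_frequency(n_flips, p_heads=0.5, seed=0):
    """Proportion of heads after each of n_flips simulated coin flips.

    p_heads and seed are illustrative assumptions, not part of any real experiment.
    """
    rng = random.Random(seed)
    heads = 0
    frequencies = []
    for flip_number in range(1, n_flips + 1):
        if rng.random() < p_heads:
            heads += 1
        frequencies.append(heads / flip_number)
    return frequencies

frequencies = running_frequency(10_000)

# The observed frequency after a few flips and after many flips.
for n in (10, 100, 1_000, 10_000):
    print(n, frequencies[n - 1])
```

However many flips are simulated, the output is a finite list of observed proportions. The limiting value that the frequency theory identifies with probability never appears among them.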
Instead of discussing the arguments and counterarguments, it is more insightful to understand why statisticians have failed to find a satisfactory definition of probability. Again, this comes from a commitment to logical positivism. The natural way to define probability is to say that when we flip a coin and observe Heads, Tails could also have occurred with equal likelihood. But this counterfactual event can never be observed. We can never go back in time, recreate the flip, and see if Tails comes up. The time-branching model of probability says that when the coin is flipped the world branches into two: one world in which the outcome is Heads, and another in which the outcome is Tails. This natural model also captures the time-bound nature of probability: it exists before the experiment, and is extinguished after the experiment. That is, after the coin is flipped, we can no longer talk about probabilities. None of these intuitive and natural characteristics of probability is captured by any of the standard definitions. Commitment to positivism makes it impossible to define probability in a natural way. This textbook provides a new definition of probability, which corresponds to our intuitions about probability, unlike any of the standard definitions.
Just as positivism blocks the possibility of defining probability, it also blocks the possibility of defining causality. The best way to understand the statement that Event X caused Event Y is by a counterfactual. In exactly the same circumstances prevailing at the time that event X took place, if event X had not taken place, then event Y would not have taken place. It is impossible to observe the truth of this statement – no empirical evidence can be provided one way or the other. According to the logical positivist philosophy, this statement is meaningless noise, just like morality.
The literature on causality is exceedingly complex. Ever since David Hume pointed out that causation can never be observed, while correlation can, philosophers have been attempting to define causality without success. An enormous amount of literature exists on the topic, but it only increases confusion, without bringing clarity. While philosophers have been confused for centuries, child development experts have shown that infants can understand causality and differentiate between correlation and causation. This leads to a mystery: why can infants understand causality, while philosophers cannot? In this textbook, we will use the children’s approach to cut through the confusion, and present a simple definition of causality not available in statistics textbooks. This is of great importance, since one of the most important jobs of statisticians is to figure out causal relationships: “does smoking cause cancer?”, or “is this vaccine effective in preventing the disease?”. Failure to understand the basics of causality has led to many wrong causal inferences, which have caused an enormous loss of life. Some of these cases are documented in the first two chapters of David Freedman’s textbook on Statistics.
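The counterfactual reading of “X caused Y”, and Hume's point that only correlation is observable, can be sketched with a toy simulation. Everything in it is hypothetical: `simulate_unit`, the hidden circumstance `z`, and the two made-up worlds exist only for illustration. In one world X really causes Y; in the other a hidden factor produces both. The observed data are identical in the two worlds. They differ only in what would have happened if, in exactly the same circumstances, X had not taken place, which is something a simulation can replay but real data cannot.

```python
import random

def simulate_unit(z, x_causes_y, x_forced=None):
    """One case in a toy world. z is the hidden circumstance.

    By default X simply follows z; passing x_forced overrides X while
    keeping the circumstance z exactly the same (the counterfactual).
    """
    x = z if x_forced is None else x_forced
    y = x if x_causes_y else z  # Y follows X in one world, z in the other
    return x, y

rng = random.Random(1)
circumstances = [rng.random() < 0.5 for _ in range(1000)]

results = {}
for x_causes_y in (True, False):
    observed = [simulate_unit(z, x_causes_y) for z in circumstances]
    # Counterfactual replay: same circumstances, but X is prevented.
    replay = [simulate_unit(z, x_causes_y, x_forced=False) for z in circumstances]
    y_changed = sum(obs[1] != cf[1] for obs, cf in zip(observed, replay))
    results[x_causes_y] = (observed, y_changed)

same_observations = results[True][0] == results[False][0]
changed_when_causal = results[True][1]
changed_when_not_causal = results[False][1]
```

Here `same_observations` is `True`: no amount of staring at the observed pairs can separate the two worlds. Only the replay separates them. Preventing X changes Y in the world where X causes Y, and leaves Y untouched in the world where the hidden factor does all the work.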
When Max Planck introduced quantum mechanics, it was nearly universally rejected by leading physicists. Einstein never accepted it, even after it had become an established theory. Planck's disappointment is reflected in a remark of his that is often paraphrased as “Physics progresses one funeral at a time”. Thomas Kuhn noted that scientific revolutions are never accepted by the older generation. It is the new students who are attracted to new methodologies, and eventually create radical changes.
Our new approach to statistics rejects a century of developments based on foundations established by Sir Ronald Fisher, a Professor of Eugenics. This approach will not be welcomed by those who have received deep training in the conventional methods. Most statisticians who have spent many years mastering the complexities of data reduction will be unhappy to learn that this is no longer necessary. However, the youth will find these new methods very attractive for a number of reasons. First, these methods correspond to our intuitions about probability and statistics. Because we deal directly with the data, instead of imposing arbitrary distributional assumptions, our methods lead to easily comprehensible data analysis. This also makes teaching easy, because the concepts being taught match the intuitions of the students and correspond to their experiences. Even though our methods are direct and simple, they lead to powerful tools for data analysis which will eventually come to dominate. However, this will not happen without a struggle with the establishment.
Students of this course should be aware that the ideas being taught are highly controversial, and will be challenged by statisticians who have been trained to think in very different ways. Students will not be sufficiently familiar with either the new methodology developed in this book or the old classical methodology taught in current textbooks to be able to win arguments against those who are opposed to change. However, it will be easy to persuade neutral third parties because of the transparency and clarity of the new approach.