The talk linked below explains why the positivist/nominalist methodology used in econometrics leads to mostly nonsense regressions. It also explains how a realist alternative can be developed.
“The Philosophy and Techniques for Quantitative Research” – Keynote Address by Dr. Asad Zaman, VC PIDE at Workshop on 19-20 April, 2018 Dept of Economics, Fatima Jinnah Women’s University, Rawalpindi, Pakistan.
My message will come as a surprise to students gathered here to learn advanced econometric techniques. Let me begin by stating it baldly: “Econometrics is nothing more than Fraud by Numbers”.
As Joan Robinson famously said, “The purpose of studying economics is not to acquire a set of ready-made answers to economic questions, but to learn how to avoid being deceived by economists.” A similar statement holds for econometrics. We should learn it not in order to acquire techniques which will teach us how to use data sets to make inferences about reality. Rather, we should learn it to avoid being deceived by econometricians. The techniques described by Perkins in “Confessions of an Economic Hit-Man” are in common use around the world. Fancy econometrics is used to persuade people to adopt policies which harm the public, while fattening corporate coffers.
As a simple illustration of econometric fraud, consider the following regression:
CONS = -268.7 + 6.78 SUR - 1.82 CO2 + error   (R² = 0.84)
Std. Err.:  (25.9)   (0.73)   (0.65)   (20.0)
Where CONS = Private Consumption Expenditure in Pakistan, SUR = Survival to age 65, female (% of cohort) = SP.DYN.TO65.FE.ZS, and CO2 = CO2 emissions from gaseous fuel consumption (% of total) = EN.ATM.CO2E.GF.ZS. These variables are taken from the WDI data set.
This regression has a high R-squared, and highly significant coefficients, but is completely meaningless. It is known as “nonsense” or “spurious” regression. What is not well-known or understood, especially by econometrics students, is that nearly all regressions are nonsense. The reason for this is very simple. Regression proceeds by making a large number of assumptions – stochastic structure of errors, exogeneity of regressors, correct specification (inclusion of all relevant regressors, and exclusion of irrelevant ones), linearity of functional form, etc. In nearly all real world applications, not all of these assumptions hold, which makes the regression results worthless. (see Axiom of Correct Specification for more details.)
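The prevalence of nonsense regressions is easy to demonstrate by simulation. The sketch below is a minimal NumPy illustration using simulated series, not any real data set: it regresses pairs of independent random walks on each other. Although no relationship exists by construction, a large share of these regressions produce a high R², while the white-noise control almost never does.

```python
import numpy as np

rng = np.random.default_rng(0)

def ols_r2(y, x):
    """R-squared from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

n, reps = 100, 500
walk_hits = noise_hits = 0
for _ in range(reps):
    # Two INDEPENDENT random walks: no true relationship exists.
    xw = np.cumsum(rng.standard_normal(n))
    yw = np.cumsum(rng.standard_normal(n))
    if ols_r2(yw, xw) > 0.5:
        walk_hits += 1
    # Control: independent white noise rarely shows a high R-squared.
    if ols_r2(rng.standard_normal(n), rng.standard_normal(n)) > 0.5:
        noise_hits += 1

print("trending series, share with R² > 0.5:", walk_hits / reps)
print("white noise,     share with R² > 0.5:", noise_hits / reps)
```

This is the classic Granger–Newbold spurious-regression phenomenon: trending series routinely produce impressive fit statistics even when they are, by construction, completely unrelated.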
The reason regressions are fraudulent is that, as textbooks clearly state in their initial chapters, many assumptions are required for regression analysis. After this initial mention, however, the textbooks carry out all regression analysis without any further discussion of the assumptions. This creates the impression that the DATA imply the RESULT, while the assumptions remain hidden in the background without any discussion. As a result, students, econometricians, and readers are deceived into thinking that the results are valid because they come from the data, when in fact the results are invalid because they come from the false assumptions built into the regression analysis.
This is a shocking claim: everything currently being taught in econometrics textbooks all over the world today is a fraud. This immediately leads to a second question: The West has a sophisticated, deep, and complex intellectual tradition, which has led to miraculous innovations which surround us in our daily lives – miracle drugs, rockets, computers and an endless list of technological wonders. How is it possible for them to make such major mistakes in statistics? The implausibility of the idea that a mathematical genius can make elementary blunders in calculations creates a psychological barrier in the path of understanding my message. Therefore, I will briefly elaborate on the reasons why Western intellectuals went astray in economics and econometrics. For further discussion, see my post and lecture on “Economics for the 21st Century”, which discusses three major mistakes which were made in developing the foundations of the social sciences in the West.
Science was introduced to the West during the re-conquest of Spain, completed in 1492, which gave Europeans access to millions of books in the libraries of Al-Andalus – Islamic Spain. This knowledge often conflicted sharply with the teachings of the Catholic Church, which did its best to suppress it, using censorship and the Inquisition to battle heretical thoughts. It took Europe two centuries of violent struggle between orthodoxy and this flood of new knowledge to assimilate and absorb some portion of it. This became known as the battle of “Science” and “Religion”, and it has decisively shaped European thought. The bitterness of the fight, and the eventual victory of science, led to the still dominant European misconception that science is the only valid source of knowledge – see “Method or Madness?” for further elaboration. European philosophers became engaged in a centuries-long effort to understand what science is, and to prove its superiority and its exclusive claim to the production of knowledge. This “Deification of Science” has led to an intellectual disaster which has still not been understood, and has caused deep and fundamental flaws in the foundations of the Western social sciences.
Very briefly, widespread acceptance of what might be termed Kant’s Blunder led to the emergence of nominalist methodology – the idea that observations matter, and the underlying reality which generated the observations does not. Kant said that until now, philosophers have been concerned with the problem of whether our mental models of reality match the true reality. However, this is an impossible problem – the true reality (the thing in-itself) is forever hidden from us. We should abandon this effort to match mental models (scientific theories) to true reality. Instead, we should concern ourselves with how the mind organizes observations in order to generate theories, which order these observations into coherent ensembles. This major blunder became widely accepted and eventually led to the emergence of the philosophy of logical positivism, which became wildly popular. Even though this fatally flawed philosophy was eventually rejected by philosophers, the methodological foundations of modern economics and modern econometrics were constructed on the basis of this philosophy, and they have not been revised since the collapse of positivism. This is the central reason why modern econometrics mostly produces nonsense regressions. After this digression on philosophy and methodology, we come back to a more concrete discussion of the current nominalist methodology of econometrics, and how it needs to be replaced by a realist methodology.
In a capsule summary, the Nominalist Philosophy of Science is based on the following three principles:
N1: It is the goal of science to search for laws. A scientific law is a strict regularity or a pattern in the collection of observations.
N2: The scientific method looks for patterns and regularities in collections of data. There are many ways to find patterns. If a pattern enables prediction, it is a candidate for a scientific law.
N3: Explanation (and prediction) in science proceeds by appeal to scientific laws.
All three methodological principles, at the heart of modern economics and econometrics, are fundamentally wrong (for further details, see Methodological Mistakes and Econometric Consequences). We need to replace them with three alternative principles which would form the foundations for a Realist Philosophy of Science:
R1: A scientific law identifies causal effects in operation between objects, observable and not observable, which exist in the real world.
R2: Scientific methodology consists of using abductive inference from experience to develop and test hypotheses about causal mechanisms, observable and non-observable, at work in the world.
R3: Explanation in science is causal explanation: We can often explain when it would have been impossible to predict.
These abstract philosophical concepts are best understood in the context of real world examples. Consider for example the Regression of Infant Mortality on GNP in Pakistan:
InfMort = 183 - 0.22 GNP/cap
The R-squared is 95%, and both coefficients are highly significant, so using standard econometric methodology, we would come to the superficially plausible conclusion that GNP per capita is a strong determinant of infant mortality, and that increases in GNP per capita lead to decreases in infant mortality. According to the realist approach, this is a wrong conclusion. The regression merely displays a PATTERN – in Pakistan, GNP per capita has been increasing while infant mortality has been going down. There is no necessary connection between these two events. This is the fundamental weakness and flaw of the nominalist methodology; it stops at displaying a pattern, and mistakes patterns for laws. To move towards a realist methodology, we need to ask whether this pattern is just a coincidence – two things happening together without any close connection – or whether there are real underlying causes which lead to this pattern. To find out, we must trace the causal chain: HOW does increasing GNP affect infant mortality? We can think of many reasons why this might happen. We could look at government expenditures on public health, especially those which relate to infant mortality – for example, the Lady Health Workers programme and many other schemes involving healthcare for children and infants. Similarly, we might look at the effects of rising income on the population's usage of health services. The point is that we must search for strong underlying causal connections in order to come up with real explanations of patterns.
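One simple data-side check in this spirit is to strip out the shared trend and see whether the pattern survives. The sketch below uses invented series, not the actual WDI figures: GNP per capita drifts upward and infant mortality drifts downward, but their year-to-year movements are generated independently. The levels are strongly correlated purely because of the two trends, while the correlation of annual changes is negligible.

```python
import numpy as np

rng = np.random.default_rng(1)
years = 40

# Hypothetical series (NOT the actual Pakistan data): GNP per capita
# drifts upward, infant mortality drifts downward, but the year-to-year
# wiggles of the two series are generated independently of each other.
gnp = 500 + 15 * np.arange(years) + np.cumsum(rng.normal(0, 10, years))
inf_mort = 180 - 2.5 * np.arange(years) + np.cumsum(rng.normal(0, 2, years))

corr_levels = np.corrcoef(gnp, inf_mort)[0, 1]
corr_diffs = np.corrcoef(np.diff(gnp), np.diff(inf_mort))[0, 1]

print(f"correlation in levels:  {corr_levels:+.2f}")  # strong, driven by trends
print(f"correlation in changes: {corr_diffs:+.2f}")   # weak: no real link here
```

Passing this check does not establish causation, and failing it does not rule causation out; it is only a first clue that the pattern in levels may be coming from two unrelated trends rather than from a real mechanism.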
Now turn to the question of whether or not we should do regressions, and if so, how. Used with suitable precautions (which is OFTEN not the case, because these precautions are not taught in typical courses), regressions can highlight unusual patterns in the data. These patterns provide us with clues to the real forces in operation which produce them. In order to use regressions correctly, we must run them on variables which are connected via strong and short causal chains: X must directly impact Y. A very useful tool for this purpose is the methodology of “Structural Equation Models”. This allows us to develop path diagrams of causal relationships between variables, and thereby capture a complex causal structure which is a realistic depiction of the real world. It is by no means a magic bullet, and it is easy to abuse this method: it depends heavily on specifying the correct causal chains in advance. If the researcher specifies the structure accurately, capturing the short and strong causal chains in the initial path diagram, this methodology allows us to reach reasonable inferences. We need to develop the correct PATH diagrams, and test EACH LINK separately, with reference to real world events, and not just by looking at correlations in the data. To explain this a bit further, we look at a few examples.
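As a minimal illustration of testing each link separately, the sketch below simulates a two-link causal chain (the chain GNP → health spending → infant mortality is assumed purely for the example, and all data are invented). Each arrow of the path diagram is estimated as its own regression, and the effect implied by the chain can then be compared with the reduced-form shortcut of regressing the outcome directly on the origin.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

# Hypothetical causal chain (an assumption made for illustration only):
#   gnp  -->  health_spend  -->  inf_mort
gnp = rng.normal(1000, 100, n)
health_spend = 0.05 * gnp + rng.normal(0, 3, n)
inf_mort = 150 - 1.2 * health_spend + rng.normal(0, 2, n)

def fit_slope(y, x):
    """OLS slope of y on a constant and x."""
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Each arrow in the path diagram is estimated as a separate link,
# rather than as one big regression of inf_mort on gnp.
link1 = fit_slope(health_spend, gnp)        # gnp -> spending
link2 = fit_slope(inf_mort, health_spend)   # spending -> mortality
implied = link1 * link2                     # effect along the whole chain
direct = fit_slope(inf_mort, gnp)           # reduced-form shortcut

print(f"link 1 (spending per unit of GNP):   {link1:+.3f}")
print(f"link 2 (mortality per unit spent):   {link2:+.3f}")
print(f"implied chain effect:                {implied:+.4f}")
print(f"reduced-form slope, for comparison:  {direct:+.4f}")
```

In this simulation the two agree because the assumed chain is the true data-generating process; in real applications, each link must additionally be defended with reference to real world events, not just the fit statistics.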
The famous Quantity Theory of Money says that increases in the money stock lead to proportional increases in prices, with no other effects on the real economy. Friedman and Schwartz wrote a long book analyzing the patterns of growth of money and of prices, and looking at how these move together. According to the realist methodology, looking at patterns can provide clues, but the crucial task is to determine the mechanism by which money influences prices. In order to do this, we have to look at the process by which money is created, the process by which prices change, and the link between the two. The State Bank of Pakistan (SBP) creates money and loans it either to the government or to private banks. The government then spends it on salaries (and other purchases). The salaries add to the incomes of government officials and increase their aggregate demand, which can have an effect on prices. Note that this would affect the prices of only those consumer goods which are purchased by government employees. Similarly, we can track the effects of other government expenditures and see what types of prices they can affect. SBP loans to the private banks allow them to extend credit, by a huge multiplier, to the private sector. Again, we have to see the nature of the loans they make – whether these are consumer loans, investment loans, or some other type. This will then determine the inflationary impact. The point is that realist methodology requires careful consideration of the intricate details of the mechanism by which money affects prices – an overall finding of patterns is not satisfactory science from the realist point of view.
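The point that new money affects only the prices of the goods the new income is actually spent on can be made concrete with a toy basket. All weights and price changes below are invented for illustration; the new salary income raises demand for food and transport but not for machinery or exports, so the aggregate inflation figure averages over very uneven sectoral effects.

```python
# Toy arithmetic sketch (all figures hypothetical): new money enters
# through government salaries, which are spent mostly on food and
# transport, not on machinery or exports. Prices respond only where
# the extra demand actually lands.
goods = {
    #  name          (CPI weight, demand-driven price change)
    "food":        (0.40, 0.08),  # salary earners buy more food
    "transport":   (0.20, 0.05),  # and more transport
    "machinery":   (0.25, 0.00),  # no extra demand here
    "exports":     (0.15, 0.00),  # nor here
}

# Aggregate inflation is the weight-averaged price change; it hides
# the fact that half the basket saw no price movement at all.
overall_inflation = sum(w * dp for w, dp in goods.values())
print(f"aggregate inflation: {overall_inflation:.3f}")
for name, (w, dp) in goods.items():
    print(f"  {name:10s} price change: {dp:.2f}")
```

The headline number is positive, yet it tells us nothing about which prices moved or why; the realist task is precisely to trace the disaggregated channels that the average conceals.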
Once we understand that regression analysis is based on adding unverifiable and typically false assumptions to the data set, the importance of simply looking at the data becomes much clearer. Econometrics conveys the impression that by using ever more sophisticated and complex techniques, one can extract more and more information from the data set. This is an illusion: sophisticated techniques add more assumptions, and hence give even more false and misleading results. Instead, we need to learn very simple techniques of data analysis which work reliably with minimal assumptions. In particular, the newly emerging and powerful techniques of Data Visualisation are very useful. They allow us to look at a picture of the data. This picture is all there is! There is no more information in the data than what is available from suitably drawn pictures of it. Once again, let me provide a concrete and practical demonstration of these ideas with reference to a particular data set.
My course on “Descriptive Statistics: An Islamic Approach” (freely available online) introduces and incorporates many of these ideas in order to create a unique course in statistics. The lectures focus purely on describing and picturing the data, without any theoretical tools. We start by explaining how the Islamic approach makes a difference to the teaching of statistics (see the first lecture). Subsequent lectures explain that statistics is basically about making arguments with numbers. The power of statistics comes from the deceptive nature of numbers – some are concrete, factual, and indisputable, but most numbers we use are measures constructed in arbitrary and meaningless ways. Nonetheless, once we construct a number, it creates a convincing argument, because who can argue with numbers? The course starts by explaining the arbitrary nature of index numbers, and how one can prove anything using such numbers.
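The arbitrariness of index numbers is easy to show concretely. In the sketch below (all figures hypothetical), the same price data yield opposite conclusions depending on how the index is weighted: a base-weighted (Laspeyres) index reports that prices rose, while a current-weighted (Paasche) index reports that they fell, because households shifted their purchases toward the good that became cheaper.

```python
# Hypothetical two-good economy (all figures invented). The two prices
# move in opposite directions, and households substitute toward the
# good that got cheaper.
p0 = {"A": 10, "B": 10}      # base-year prices
p1 = {"A": 30, "B": 5}       # current-year prices
q0 = {"A": 1.0, "B": 3.0}    # base-year quantities
q1 = {"A": 0.2, "B": 6.0}    # current-year quantities

def index(p_num, p_den, q):
    """Cost of basket q at prices p_num, relative to its cost at p_den."""
    return sum(p_num[g] * q[g] for g in q) / sum(p_den[g] * q[g] for g in q)

laspeyres = index(p1, p0, q0)  # weights by the OLD basket
paasche = index(p1, p0, q1)    # weights by the NEW basket

print(f"Laspeyres index: {laspeyres:.3f}")  # above 1: 'prices rose'
print(f"Paasche index:   {paasche:.3f}")    # below 1: 'prices fell'
```

Both indices are textbook-legitimate constructions of "the" price level from the same raw data, so the analyst's choice of weighting scheme, not the data, decides whether the report says inflation or deflation.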
The next few lectures cover basic concepts like measures of central tendency and variation. We explain how statistical techniques emerged in the early part of the twentieth century, when very little computational power was available. As a result, following the pattern already discussed in the context of regression models, ASSUMPTIONS were made to substitute for computational capabilities. The assumption that the data are always normal is exceedingly powerful because it allows us to reduce a data set to just two numbers: the mean and the standard deviation. This assumption became embedded as a bedrock of statistical analysis, because the computational requirements for analyzing non-normal data sets were much greater. Unfortunately, most real world data sets are not normally distributed. We now have the computational power required for their analysis, but the theory is still stuck in the pre-computer era. The best measures of central tendency and dispersion are the median and the IQR, but textbooks continue to teach the mean and standard deviation, which are best for normal distributions but very poor for others. My lectures explain the properties of these measures in the context of real data sets, and show how the mean, median, and mode must be chosen according to the real world purpose of the analysis – the choice cannot be made in the abstract, on purely theoretical grounds. This is the essence of a realist approach to statistics and econometrics.
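The contrast between the two pairs of measures is easiest to see on a skewed data set. The sketch below uses simulated log-normal "incomes" (an assumption made for illustration; real income data are similarly right-skewed): the mean is dragged well above the median by the long right tail, so the normal-theory summary misrepresents what a typical observation looks like.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical incomes: heavily right-skewed (log-normal), like much
# real-world economic data, and decidedly NOT normally distributed.
incomes = rng.lognormal(mean=10, sigma=1.0, size=10_000)

mean, std = incomes.mean(), incomes.std()
median = np.median(incomes)
q1, q3 = np.percentile(incomes, [25, 75])
iqr = q3 - q1

print(f"mean   = {mean:,.0f}   std = {std:,.0f}")
print(f"median = {median:,.0f}   IQR = {iqr:,.0f}")
# For normal data the two summaries would tell the same story; here the
# mean sits far above the median, pulled up by the long right tail.
print(f"share of observations below the mean: {(incomes < mean).mean():.0%}")
```

Well over half the simulated population earns less than the "average income", which is exactly the kind of misleading summary that reporting only the mean and standard deviation produces on skewed data.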
Finally, we end the lecture with an example of how visual and graphical data analysis provides us with a great deal of useful information, without any need for regressions or the other assumptions required by statistical models. Graphs of median infant mortality show a remarkable global decline. Graphs for Pakistan show that it has remained at a constant 80th percentile, following the overall global trend. However, both Iran and Egypt have made remarkable improvements in their percentile positions: starting from positions worse than Pakistan's, they have moved to better than the 50th percentile. What accounts for their superior performance? For this, we need to look at the programs in Iran and Egypt to see what they did. This is the essence of real econometrics. Data analysis must be done in the real world context. We analyze data generated by the real world, and then use it to try to understand the real world. Data analysis cannot be done in isolation, on a bare set of numbers detached from their real world meanings and from the purpose of the analysis.
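The percentile-position comparison described above requires no statistical model at all. The sketch below computes it on a hypothetical panel of invented numbers (not the actual WDI series): every country follows the global downward trend in infant mortality, but one country improves much faster, so its percentile position climbs while a trend-follower's position stays roughly constant.

```python
import numpy as np

rng = np.random.default_rng(4)
n_countries, n_years = 120, 30

# Hypothetical infant-mortality panel (rows = countries, cols = years):
# every country improves over time, but country 0 improves much FASTER.
base = rng.uniform(20, 150, size=(n_countries, 1))
global_decline = np.linspace(1.0, 0.5, n_years)         # worldwide trend
panel = base * global_decline                           # all follow the trend
panel[0] = base[0, 0] * np.linspace(1.0, 0.2, n_years)  # the fast improver

def percentile_position(panel, country, year):
    """Share of countries with HIGHER mortality that year (higher = better rank)."""
    col = panel[:, year]
    return (col > col[country]).mean()

# A country that merely follows the global trend keeps its rank;
# a faster improver climbs through the percentiles.
start = percentile_position(panel, 0, 0)
end = percentile_position(panel, 0, n_years - 1)
print(f"country 0 percentile position: year 0 = {start:.0%}, final year = {end:.0%}")
```

The computation is nothing but counting and ranking, yet it isolates exactly the question the lecture poses: who beat the global trend, so that we know where to go looking for real-world causes.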