This is the continuation of a sequence of posts on the methodology of economics and econometrics (for previous posts, see: Mistaken Methodologies of Science 1, Models and Realities 2, Thinking about Thinking 3, Errors of Empiricism 4, Three Types of Models 5, Unrealistic Mental Models 6, The WHY of Crazy Models 7, The Knowledge of Childless Philosophers 8, Beyond Kant 9). In this (10th) post, we consider the methodology of econometrics, which is based on Baconian or observational models. That is, econometric models tend to look only at what is available on the surface, as measured by observations, without attempting to discover the underlying reality which generates these observations. This is an over-simplified description, and we will provide additional details about econometric methodology later.
The methodology of econometrics is rarely discussed in econometrics textbooks. Instead, students are taught how to DO econometrics in an apprentice-like fashion. All textbooks mention the “assumptions of the regression model” in an introductory chapter. In later chapters, they proceed to do regression analysis on data, without any discussion of whether or not these assumptions hold, and what difference it could make to the analysis. The student is taught – by example, not by words – that these assumptions do not matter, and that we can always assume them to be true. In fact, as I learned later, these assumptions are all-important; they actually drive all the analysis. By ignoring them, we create the misleading impression that we are learning from the data, when in fact the results of the analysis come out of the hidden assumptions we make about it. Once one realizes this clearly, it becomes EASY to carry out a regression analysis which makes any data produce any result at all, by varying the underlying assumptions about the functional form and the nature of the unobservable errors. This issue is discussed and explained in greater detail in my paper and video lecture on “Choosing the Right Regressors”. The fundamental underlying positivist principles of econometric methodology, which are never discussed and explained, can be summarized as the following three ideas:
Discovery of Scientific Laws: The GOAL of data analysis is to find patterns in the data. By changing functional forms, and adding random errors, the range of patterns that we can find in any finite amount of data is massively expanded beyond what we can see in scatter plots of the data. Any pattern we can find, subject to the rules students are taught, is a candidate for a new scientific law.
Verification of Scientific Laws: By running regressions until we get a good fit, we discover scientific laws. By now it is well established that some of these good fits are “spurious” – they are ‘accidental’ patterns in the observations, which do not actually reflect any significance, or law, which drives them. So how do we assess a potential scientific law, to see if it is valid? The standard positivist answer is “forecasting”. If a law forecasts well, it is being tested OUTSIDE the range of data on which it was estimated, so if it continues to hold, this is taken as evidence of its validity.
Explanation by Scientific Laws: It is thought that learning scientific laws deepens our understanding of reality. But there is an active quarrel among philosophers as to what it means to have deeper understanding. According to the positivists, the patterns that we see in the observations ARE the understanding. There is no more (deeper) understanding to be had. This has been formalized in the Deductive-Nomological (D-N) model of explanation. To “explain” a particular event is to show that it is a particular case of a general law. If a particular data point fits a regression (law), then it is explained by the regression.
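The danger lurking in the second idea – that a good fit signals a scientific law – can be illustrated with a small simulation. The sketch below (a hypothetical Monte Carlo in Python/NumPy, with all numbers invented for illustration) regresses one random walk on another, completely independent, random walk. No law connects the two series, yet the slope comes out “statistically significant” far more often than the nominal 5% rate:

```python
import numpy as np

rng = np.random.default_rng(0)
T, reps = 200, 500

def slope_t(x, y):
    """t-statistic for the slope in a simple regression of y on x."""
    r = np.corrcoef(x, y)[0, 1]
    return r * np.sqrt((len(x) - 2) / (1 - r**2))

sig_walks = 0   # "significant" fits between independent random walks
sig_noise = 0   # "significant" fits between independent white noise
for _ in range(reps):
    y_walk = np.cumsum(rng.normal(size=T))   # a random walk
    x_walk = np.cumsum(rng.normal(size=T))   # an INDEPENDENT random walk
    sig_walks += abs(slope_t(x_walk, y_walk)) > 2
    sig_noise += abs(slope_t(rng.normal(size=T), rng.normal(size=T))) > 2

print(sig_walks / reps)   # far above 0.05: spurious "laws" everywhere
print(sig_noise / reps)   # close to the nominal 0.05
```

The trending series mimic typical economic time series; the exercise echoes the well-known spurious-regression findings of Granger and Newbold.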
All three of these ideas, on which modern econometric methodology is based, are challenged and contradicted by a “Realist Approach” to Econometrics.
Discovery of Unobservable Objects and Effects: The object of data analysis is NOT to find patterns (good fits, high R-squared). Rather, we look for patterns which reveal hidden, unobservable, real world objects and effects which manifest themselves in the patterns that we see. For example, we observe the pattern (opium ==> sleep): opium puts people to sleep. We ask “why” – perhaps it is some chemical contained within opium which has this property – if so, all compounds which contain the same chemical will have this property. We look at chemical constituents of opium to search for possible explanations – this is an example of going beyond the observations to search for deeper hidden causes of the patterns that we observe.
Verification by Experimentation: A large number of observed chemical phenomena could be explained by the hypothesis that chemicals were composed of molecules. Different experiments designed to discover or confirm properties of these hypothesized objects produced results conforming to the presumed existence of molecules. Hypotheses about hidden objects and causal effects are confirmed (but not proven) by experiments or observations which are designed to highlight the presence or absence of such objects and effects, by screening out elements which would interfere with their detection. If the same object or effect succeeds in explaining a variety of observed phenomena, then we get strong indirect confirmation of its existence. “Prediction” or “forecasting”, on the other hand, is NOT a good method for confirming scientific laws, because we live in a complex world where a huge number of different laws are in operation at the same time. A valid law may fail to forecast well because of other factors in operation. Similarly, an invalid law may forecast well by chance. Given a sufficiently large number of models, one of them will automatically hit the right forecast without being correct. Success in prediction requires that there should be only one law in operation – this is what experiments try to achieve. They do so by creating artificial environments which screen out all other effects so as to highlight the effect they are looking for. Weak effects will fail to forecast well, because they will be overwhelmed by other, stronger factors in operation in real-world situations.
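The point that a large enough collection of models guarantees a lucky forecaster can be sketched with a toy simulation (all numbers hypothetical). Among 1000 “models” that are pure coin-flips, the best one looks impressive on past data, yet does no better than chance on future data:

```python
import numpy as np

rng = np.random.default_rng(1)
n_models, n_past, n_future = 1000, 20, 20

# The "truth": a sequence of up/down movements, past and future.
truth_past = rng.integers(0, 2, size=n_past)
truth_future = rng.integers(0, 2, size=n_future)

# 1000 models that forecast by flipping coins -- none has any validity.
forecasts_past = rng.integers(0, 2, size=(n_models, n_past))
forecasts_future = rng.integers(0, 2, size=(n_models, n_future))

# Pick the model with the best track record on past data.
past_hits = (forecasts_past == truth_past).mean(axis=1)
best = past_hits.argmax()
future_hits = (forecasts_future[best] == truth_future).mean()

print(past_hits[best])   # well above 50% -- looks like a valid law
print(future_hits)       # back to roughly coin-flip accuracy
```

With 1000 coin-flippers and only 20 past observations, some model almost always compiles a track record of 75% or better, purely by chance – which is why forecasting success alone cannot confirm a law.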
Causal Explanation: Opposed to the idea of explanation by patterns is the idea of a real explanation, which goes beyond observed patterns to seek the hidden and unobserved real causes which create the pattern. The examples given in “Simpson’s Paradox” can be used to provide an illustration. Suppose we observe that the admit ratio of males is higher than the admit ratio of females at Berkeley. This is, by itself, a pattern, which can be converted into a law: if a female applies to Berkeley, her chances of getting admission are lower than those of a male who applies to Berkeley. However, causal explanation requires a deeper search for the reasons for this discrepancy. The obvious hypothesis which suggests itself is that Berkeley discriminates against women. We then search for additional evidence to confirm whether or not this is true. This might lead us to look at the admissions separately for each department. Examining these ratios leads to the reverse conclusion: each department discriminates in favor of women and against men. Looking at this breakdown led Bickel, Hammel, and O’Connell (1975), “Sex Bias in Graduate Admissions: Data from Berkeley”, to a rather different conclusion. The low admit rates for women arose because more women applied to departments which were more difficult to get into. As discussed in much greater detail in 2-Simpson’s Paradox, there is a wide variety of different causal structures, with radically different implications, all of which lead to the same set of observable data. So explanation by patterns is not really possible — patterns do not have any direct meaning, and must be interpreted in the context of a causal hypothesis about the underlying causes of the pattern.
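The reversal can be reproduced with toy admission numbers (hypothetical figures, invented in the spirit of the Berkeley data, not the actual counts). Department A is easy to get into and attracts mostly men; Department B is hard to get into and attracts mostly women. Each department admits women at a higher rate, yet the aggregate favors men:

```python
# dept: (men_applied, men_admitted, women_applied, women_admitted)
applicants = {
    "A": (800, 480, 100, 70),   # admit rates: men 60%, women 70%
    "B": (100, 10, 800, 120),   # admit rates: men 10%, women 15%
}

# Within EACH department, women are admitted at a higher rate.
for dept, (ma, mi, wa, wi) in applicants.items():
    assert wi / wa > mi / ma, dept

# Yet in the aggregate, men are admitted at a higher rate.
men_app = sum(v[0] for v in applicants.values())    # 900
men_adm = sum(v[1] for v in applicants.values())    # 490
wom_app = sum(v[2] for v in applicants.values())    # 900
wom_adm = sum(v[3] for v in applicants.values())    # 190

print(men_adm / men_app)   # 490/900 ≈ 0.544
print(wom_adm / wom_app)   # 190/900 ≈ 0.211
```

The aggregate pattern (men favored) and the disaggregated pattern (women favored) are both present in the same data; which one is meaningful depends entirely on the causal hypothesis brought to the data.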
For reasons discussed above, the positivist methodology for econometrics naturally leads to models which are over-fitted to the data, and routinely fail to work outside the data sets on which they were fitted. This is because econometricians look for strong fits, instead of surprising patterns which require examination and explanation. This is explained in greater depth and detail in my paper on “Methodological Mistakes and Econometric Consequences“. See video lecture on this topic below:
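The over-fitting problem can also be sketched in a few lines (a hypothetical simulation, with all numbers invented for illustration). The true law is linear; a degree-9 polynomial fits the 12 training points almost perfectly, but collapses when used outside the range on which it was fitted:

```python
import numpy as np

rng = np.random.default_rng(3)
x_train = np.linspace(0.0, 1.0, 12)
y_train = 2.0 * x_train + rng.normal(scale=0.3, size=12)  # true law: y = 2x
x_test = np.linspace(1.0, 1.5, 50)          # OUTSIDE the fitted range
y_test = 2.0 * x_test + rng.normal(scale=0.3, size=50)

def fit_errors(degree):
    """In-sample and out-of-sample mean squared error of a polynomial fit."""
    coefs = np.polyfit(x_train, y_train, degree)
    in_err = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    out_err = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    return in_err, out_err

in1, out1 = fit_errors(1)   # honest linear fit
in9, out9 = fit_errors(9)   # over-fitted polynomial

print(in9 < in1)    # the over-fit always "wins" in-sample
print(out9 > out1)  # and loses badly outside the estimation range
```

Selecting models by the strength of the in-sample fit systematically rewards the degree-9 specification, even though it embodies no law at all – exactly the failure mode described above.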