This is the second in a sequence of lectures about regression models and methodology (for the first lecture, on the underlying philosophy of science, see: Flawed Foundations of Social Sciences). These regression models are central to econometrics. The current methodology underlying these models makes them completely useless for learning about the real world. To learn how and why, we first discuss the differences between nominalist and realist methodology in science.
Underlying Philosophy of Science
Many important structures of the real world are hidden from view. However, as briefly sketched in the previous lecture on Ibnul Haytham: First Scientist, current views hold that science is based only on observables. Causation is central to statistics and econometrics, but it is not observable. As a result, there is no standard notation for the relationship of causation between two variables. We will use X => Y as notation for "X causes Y". Roughly speaking, this means that if the value of X were to change, then Y would have a tendency to change as a result. This is not observable, for two separate reasons. ONE: it is based on a counterfactual. In another world, where the value of X was different from what was actually observed in our current world, this change would exert pressure on Y to change; but we only ever observe the world we are in. TWO: X exerts an influence on Y, but other causal factors are also involved. Y might therefore not change in the expected direction, because the effect of X might be offset by other causal factors that we have not accounted for. For both of these reasons, causality is not directly observable.
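To make the two reasons concrete, here is a minimal Python sketch built on a hypothetical structural equation in which Y depends on X and on one other cause Z. The function name `y_mechanism` and the coefficients 2.0 and 3.0 are illustrative assumptions, not estimates of anything. The counterfactual value can be computed here only because we wrote the mechanism down ourselves; in the real world it is never observed.

```python
# Hypothetical structural equation: X and Z are both causes of Y.
# The coefficients 2.0 and 3.0 are illustrative assumptions, not estimates.
def y_mechanism(x: float, z: float) -> float:
    """Y responds positively to X (X => Y) and to another cause Z (Z => Y)."""
    return 2.0 * x + 3.0 * z

# Actual world: what we observe.
y_actual = y_mechanism(x=1.0, z=2.0)

# Counterfactual world (never observed): X is higher, Z is unchanged.
y_counterfactual = y_mechanism(x=2.0, z=2.0)

# A later observation in which X rose but Z fell at the same time.
y_observed_later = y_mechanism(x=2.0, z=1.0)

print(f"actual Y = {y_actual}")
print(f"counterfactual Y (X raised, Z held fixed) = {y_counterfactual}")
print(f"observed Y after X rose and Z fell = {y_observed_later}")
# X has a positive causal effect (counterfactual Y exceeds actual Y),
# yet the observed Y fell, because the drop in Z offset the effect of X.
```

In this sketch the causal claim X => Y is a statement about the counterfactual comparison, while the observed comparison moves in the opposite direction.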
Achieving Conceptual Clarity
The standard approach to statistics and econometrics rests on a large number of confusions, many of which arise because the same word is used for several different concepts. To clear up these confusions, we need to develop new language and notation. We start by distinguishing between three types of ideas:
- O-concepts refer to the observables.
- M-concepts refer to a model for the data.
- R-concepts refer to the real world.
R-concepts refer to factors in the real world and to the causal effects that link them. For example, household income could be one of the factors that causally influence consumption decisions. Let us use HI* and HC* to denote the real-world household income and consumption of some particular household. We will use => to denote causation: HI* => HC*. A household income-expenditure survey obtains measures HI and HC of HI* and HC*. Throughout, a starred symbol denotes a real-world concept and the unstarred symbol its observable counterpart. Econometric analysis seeks a model that fits the data on HI and HC. A real model uses the data on HI and HC as clues to tease out the causal relationships among the real-world variables HI* and HC*.
To illustrate how real-world models are constructed, we go through a hypothetical example that is close to reality. We start with a hypothesis about a real-world causal relationship, for example HI* => HC*. Causal relationships are unobservable, so no direct confirmation is possible. However, examining data on HI and HC can provide indirect evidence confirming or disconfirming the hypothesis. There are three main possibilities: HI* => HC*, HC* => HI*, and HC* ⊥ HI*. The symbol ⊥ in the last is the standard symbol for independence; since it is not always available, we will also use double vertical bars || as a replacement: HC* || HI* means that the two variables are independent, so that neither causes the other. There are many other possibilities, such as bidirectional causality or causal effects mediated through intervening variables, but to begin with we consider only the simplest cases.
The data can provide us with evidence regarding these causal relationships. If we see large variations in HI and very little variation in HC, we would be tempted to reject the causal hypothesis HI* => HC*. This might be the case in an ideal Islamic society, where everyone follows a simple lifestyle regardless of income level. If, on the other hand, we see that consumption levels increase with income, this would suggest that our hypothesis may be true. But we always need to check for reverse causation. Suppose, for example, that people are accustomed to different lifestyles, and they earn to support their lifestyle. Those who desire higher consumption levels will be driven to earn higher incomes. In this case the causal direction is reversed: HC* => HI*. There are many ways to judge the direction of causation, depending on the data available, including experiments that vary income.
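An experiment that varies income can separate two directions that look alike in observational data. The Python sketch that follows assumes a hypothetical randomized cash grant (`GRANT = 10` units, assigned by coin flip). The function `consumption` and all numbers are illustrative assumptions. Under HI* => HC* the treated group consumes more. Under HC* => HI*, consumption is set by lifestyle, and the extra income leaves it unchanged.

```python
import random
import statistics

random.seed(2)
N = 4_000
GRANT = 10.0  # hypothetical income supplement given to the treated group

def consumption(world: str, base_income: float, grant: float) -> float:
    """Consumption HC* for one household under a hypothetical causal world."""
    if world == "HI* => HC*":
        # Consumption responds to total income, including the grant.
        return 5.0 + 0.6 * (base_income + grant) + random.gauss(0.0, 3.0)
    # "HC* => HI*": consumption is set by lifestyle and ignores the grant.
    return random.gauss(35.0, 6.0)

effects = {}
for world in ("HI* => HC*", "HC* => HI*"):
    treated, control = [], []
    for _ in range(N):
        income = random.gauss(50.0, 10.0)
        # Random assignment: a fair coin decides who receives the grant.
        if random.random() < 0.5:
            treated.append(consumption(world, income, GRANT))
        else:
            control.append(consumption(world, income, 0.0))
    effects[world] = statistics.fmean(treated) - statistics.fmean(control)
    print(f"{world}: treated minus control consumption = {effects[world]:.2f}")
```

Random assignment is what makes the difference in means interpretable: the grant, not the household's lifestyle, decides who gets the extra income.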
The central point we are trying to make here concerns the difference between real models and econometric models. Econometric models are confined to the OBSERVED data HC and HI. Real models ALWAYS go beyond the observed data, and involve causal hypotheses linking the unobservables HC* and HI*. Real models can never be proven or disproven, but data can provide supporting or disconfirming evidence. In this regard, the data are suggestive, never conclusive, for a number of reasons. First, what is observed is an imperfect measure of the underlying real variable. Second, causal effects may be suppressed in the sample by the operation of other factors of which we have no knowledge. For example, we might observe a sample where consumption is nearly identical across households but income levels vary greatly, and conclude that the causal hypothesis HI* => HC* is not valid. However, we may then find that the data come from a population of migrant workers who send all their savings back home to their families, while keeping personal consumption to the bare minimum. Here, another factor is operating to suppress the causal effect that would appear in its absence.
The video continues with a discussion of regression models, followed by a regression analysis of a specific data set on two different measures of Serum Kanamycin Levels in the blood. For the full writeup, see: https://azprojects.wordpress.com/2021/03/21/regression-econometrics-vs-reality/