This continues from the previous post on Lies, Damned Lies, and Statistics.
More than 1.5 million copies sold, more than all other textbooks of statistics combined. Online copy
The vast majority of our life experience is built upon knowledge which cannot be reduced to numbers and facts. Our hopes, dreams, struggles, sacrifices, what we live for, and what we are ready to die for – none of these things can be quantified. However, as we have discussed, logical positivists said that what cannot be observed by our senses cannot be part of a scientific theory. As a result of this false idea, later disproven by philosophers, the attempt was made to measure everything – numbers were assigned to intelligence, trust, integrity, corruption, preferences, etc. – even though a long-standing tradition, as well as common intuition, tells us that these things are qualitative, and not measurable. Scientific progress was deeply and dramatically influenced by what I have called Lord Kelvin’s Blunder (2019): “When you can measure what you are speaking about, and express it in numbers, you know something about it; when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind”. This mindset leads to the urge to assign numbers and attempt to measure the unmeasurable. The harms that this has caused have been discussed in Beyond numbers & material rewards and Corruption: Measuring the Unmeasurable.
In this note, we would like to focus on just one aspect of this attempt to measure what cannot be measured. The problem of taking multiple measures of performance and reducing them to a single number is known as the index number problem. Very few people realize that an objective solution is inherently impossible – all such attempts must inevitably involve making subjective decisions regarding how the different measures should be combined. What is most often done in practice is that the subjective decisions hidden in the choice of measures and their associated weights are presented as objective. Because of this illusion of objectivity created by standard practice, most people are unaware that there are no objective solutions to the index number problem.
Impossibility of Combining Indicators: There is no objective way to combine two or more measures of performance to come up with a single number which measures overall performance.
We will explain and illustrate this with a few examples. We start with a familiar case where scores from two exams must be merged to create a single score for the course grade. Suppose that instructor Orhan has four students who received the following scores on the midterm and final in the course:
The teacher has interacted with all four students throughout the semester and has a good idea of their capabilities, over and above what the scores on the exams show. Suppose he thinks that Anil is the best among the four, and would like the classroom grades to reflect this opinion. As long as the weight given to the Final is greater than 60%, Anil will have the highest score. On the other hand, he may know that Bera is a brilliant student who just had a bad day on the Final. Weights of more than 60% for the midterm would make Bera the top student. Equal weights would make Javed come out on top. A slight increase in the weight for the final (45% MT, 55% Fin) would make Dawood the best student. So, depending on his subjective decision, the teacher can choose weights to make any one of the four the top student. Furthermore, by assigning weights and calculating the score, the subjective opinion will look like an objective and impartial decision. The teacher could give apparently objective reasons for any choice of weights, by referring to the length, difficulty, and scope of coverage of the exams, all of which are themselves matters of subjective judgment.
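The dependence of the "top student" on the weights can be checked with a short calculation. The sketch below is only an illustration: the scores are hypothetical numbers invented to be consistent with the description above, not the instructor's actual table, and the names `SCORES`, `combined_score`, and `top_student` are made up for this example.

```python
# Hypothetical (midterm, final) scores, invented for this sketch.
# They are NOT the actual scores from the instructor's table.
SCORES = {
    "Anil": (40, 100),
    "Bera": (100, 40),
    "Javed": (78, 72),
    "Dawood": (68, 81),
}


def combined_score(midterm, final, final_weight_pct):
    """Weighted course score, scaled by 100.

    The final gets final_weight_pct percent of the weight and the midterm
    gets the rest. Integer arithmetic avoids floating-point rounding ties.
    """
    return midterm * (100 - final_weight_pct) + final * final_weight_pct


def top_student(final_weight_pct, scores=SCORES):
    """Name of the student with the highest combined score."""
    return max(
        scores,
        key=lambda name: combined_score(*scores[name], final_weight_pct),
    )


if __name__ == "__main__":
    # Same exam scores, four different "objective" rankings.
    for pct in (35, 50, 55, 65):
        print(f"final weight {pct}%: top student is {top_student(pct)}")
```

With these made-up scores, giving the final a weight of 35%, 50%, 55%, or 65% puts a different student on top each time, although nothing about the students' performance has changed.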
Which one of the four is the best student? What surprises most students is that there is no objective answer to this question. As a matter of mathematics, it is impossible to summarize the information contained in two numbers by one number. Two dimensions cannot be reduced to one. Whenever we carry out such a reduction, we lose half of the information contained in the two numbers. When there are multiple indicators, we lose even more information. Because there is no objective answer, the choice must be made on subjective grounds. This subjective element was traditionally the domain of rhetoric – the persuasive tactics used to argue that the final should carry more weight, or that the midterm should carry more weight, or even that some factor not considered, like attendance, should be taken into account. However, the philosophy of positivism teaches us that subjective judgments and personal opinions are of no value, and only the ‘facts’ should be considered. As a result, the subjective process of assigning weights, and choosing factors, must be concealed under an appearance of objectivity. This is the “hidden rhetoric” of statistics. Unlike pre-positivist rhetoric, this form is deadly because the unsuspecting victim only sees the numbers, and is told that you cannot argue with the facts. He does not even get to see the subjective elements which have gone into the manufacturing of these numbers. Before proceeding to our main topic of GDP, we give one more common example of how statistics are used to create a false impression of objectivity, in the context of rankings of universities.
One implication of the impossibility of objectively combining multiple indicators is that it is impossible to objectively rank products which have multiple dimensions of performance. This point is made very clearly, accurately, and forcefully by Gladwell (2011). We consider only one of his examples to illustrate. The interested reader is strongly encouraged to read the original article. We consider popular methods for coming up with a single number to rank universities. This is done by making numerical judgments according to several criteria and then combining them using subjectively chosen weights. As a specific example, suppose that Criterion A is the Financial Resources available to the university per student, indicated by money spent on faculty salaries, libraries, and other academic infrastructure. Criterion B is the percentage of admitted students who graduate. Criterion C is selectivity (the percentage of all applicants who are admitted). Hypothetical numbers for the three criteria are given below.
Which of the three universities is the “best”? Malcolm Gladwell (2011, The Order of Things) says that the question does not make sense, and it cannot be answered. The numbers and names used for illustration here are hypothetical, but plausible. Chicago is a private university which charges high fees and has a relatively easy admissions policy. It encourages fierce competition among students and selects the survivors, leading to a high dropout rate. It invests substantial financial resources in faculty salaries and institutional overheads, providing high quality facilities and charging high fees. Stanford is an exclusive elitist university, where only a few students who are the cream of the cream are admitted. The university is well equipped financially, and invests a huge amount in faculty and academic resources. Because all admitted students are extremely good, nearly all complete their studies. Penn State is a large public sector university which aims to provide education to the masses. It has an easy enrollment policy, and helps and encourages all students to graduate, resulting in a low dropout rate. It invests less in resources to make education affordable for the masses, and has a high student/faculty ratio for this reason. Each of the universities has a different goal, and when evaluated with respect to its own goals, each of them is the best of the three. By choosing different weights for the different criteria, we can make the combined index come out to favor any one of the three universities as the best. There is no objective way to choose weights. In fact, it could be argued that each of the factors can be considered a virtue or a defect – it can receive a negative or a positive score – depending on our subjective point of view. The standard rankings assign a positive weight to financial resources, evaluating a university as higher ranking if it spends more.
However, this factor is negatively correlated with affordability, which may be much more important to students, and would result in a reversal of the ranking by this factor. Similarly, there is an argument that we should try to carry along and educate all students so that high dropout rates are bad. However, we could also argue that rigorous competition leads to selection of the best students, and poor students are eliminated, resulting in the best graduates. Selectivity is good for the students who get in, but bad for the ones excluded by the process. How much weight to give each factor, which factors to consider, and whether the factor is considered as a plus or a minus, all of these are subjective decisions.
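The effect of both the size and the sign of the weights can be seen in a small calculation. The sketch below is a hedged illustration only: the criterion scores are invented numbers on a 0 to 100 scale, chosen to match the verbal descriptions above (here a higher `selectivity` score means a more exclusive university), and they are not the figures from the table. The names `UNIVERSITIES`, `index_value`, and `rank` are likewise made up for this example.

```python
# Hypothetical criterion scores on a 0-100 scale, invented for this sketch.
# A higher "selectivity" score means a MORE exclusive admissions policy.
UNIVERSITIES = {
    "Chicago": {"resources": 90, "graduation": 60, "selectivity": 40},
    "Stanford": {"resources": 95, "graduation": 95, "selectivity": 95},
    "Penn State": {"resources": 40, "graduation": 85, "selectivity": 20},
}


def index_value(name, weights, data=UNIVERSITIES):
    """Combined index: weighted sum of the criterion scores."""
    return sum(weights[c] * data[name][c] for c in weights)


def rank(weights, data=UNIVERSITIES):
    """University names ordered from highest to lowest combined index."""
    return sorted(
        data,
        key=lambda name: index_value(name, weights, data),
        reverse=True,
    )


if __name__ == "__main__":
    viewpoints = {
        # Every criterion counts as a virtue.
        "conventional": {"resources": 1, "graduation": 1, "selectivity": 1},
        # High spending and exclusivity count as defects.
        "affordable and open": {"resources": -1, "graduation": 1, "selectivity": -1},
        # Weeding out weak students counts as a virtue.
        "open but rigorous": {"resources": 1, "graduation": -1, "selectivity": -1},
    }
    for label, weights in viewpoints.items():
        print(f"{label}: {rank(weights)}")
```

With these made-up numbers, each of the three viewpoints puts a different university first, so the ranking reflects the viewpoint built into the weights rather than any objective ordering of the universities.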
When evaluation is carried out in multiple dimensions, the choice of dimensions, weights attached to them, and whether they count as positive or negative factors are all subjective choices. However, because of the positivist philosophy of knowledge which is the basis of modern statistics, this subjectivity is concealed, so as to create an appearance of objectivity. In the rest of this article, we explore this subjectivity in the context of one of the most important and widespread measures of economic performance, namely the GDP per capita.
To be continued.