On January 9th, a little over two months ago, Bryan Caplan wrote,
Many, perhaps most, of my friends are giddy about ChatGPT. My Inbox is full of emails from Tyler Cowen touting its glories. More concretely, Alex Tabarrok recently assured me that existing AIs could already beat the performance of most of our economics students in our beloved subject. Though I was on the road almost all of last month, I vowed to put these claims to the test upon my return. […]
Outcome: If ChatGPT had been a regular student, it would have earned 31/100. That’s a D. Even worse than I expected.
Isn’t the software amazing nonetheless? Only a little. […]
You could object: “Who cares if ChatGPT gets a D on your midterm?” I care. At minimum, this shows, contra Alex Tabarrok, that the AI is not yet as good at economics as most of our students. For me personally, this also reinforces my suspicion that AI enthusiasts cherry-pick the best AI output.
I was surprised that this was Caplan’s reaction to a computer program receiving a D on one of his exams. For a long time, it was considered extremely hard to get computers to comprehend even basic sentences. Ten years ago, the state of the art in language modeling was pitiful. You usually couldn’t get an AI to spit out a coherent answer to a question on an economics exam, much less a correct answer.
Given that, within ten years, we went from AI that could barely understand English at all to AI that can get a D on one of Caplan’s exams, one would think that this development represented amazing progress, even if the tech is still immature.
As Scott Aaronson put it,
Again and again, I’ve used the analogy that GPT is like the chess-playing horse that gets called unimpressive for only winning 3 out of 10 games. This is almost literally true in this case! For most of my life, an AI that could get a D rather than an F on your exam would’ve been a sci-fi fantasy, especially seeing how stringently you grade.
Caplan replied to Aaronson,
Some existing software is already miraculous. Google is amazing by absolute standards. ChatGPT isn't, for now. Though of course you're right that getting a D is impressive in some sense. If one of my kids had written ChatGPT I would be taking victory laps, but I can't get excited about it for now.
I thought this moment was a good time to put Caplan’s beliefs to the test and challenge him to a bet about whether AI will continue to score poorly on his exams in the coming years. True to his reputation for putting his money where his mouth is, and having maintained a perfect track record of 24 wins in a row, Caplan accepted my offer.
The terms of the bet were outlined in a later post from Caplan. To summarize, I will win the bet if by January 30, 2029, an AI is able to get an A- or higher on at least 5/6 of his most recent midterm exams.
Now that the newest model from OpenAI, GPT-4, has been released, I thought I’d try grading its performance on the same exam that Caplan gave to ChatGPT. In short, GPT-4 got an A, even though I was trying hard to emulate Caplan’s nitpicky grading style. To be clear, despite this result, I am not yet sure that current AI is good enough to win the bet. GPT-4 got an A, but not by a huge margin. That said, I’m still feeling good about the bet with nearly 6 years left on the clock. AI will continue to improve.
In the rest of this post I’ll present the results. Note that I only gave GPT-4 the exam once, only allowed it one attempt per question, and I didn’t give the AI any hints. Also, I’m not a professor in economics, nor have I taught labor economics before, so I might have made some mistakes while grading GPT-4’s responses.
Economics 321 Midterm
Prof. Bryan Caplan
Fall 2022
Score     Approximate Grade
60-100    A
45-59     B
35-44     C
25-34     D
0-24      F
GPT-4’s Total Score: 66/100
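As a quick check on the arithmetic, the total can be recomputed from the per-question scores reported in the grading below and mapped onto the grade scale above. This is a minimal Python sketch; the names `scores`, `GRADE_SCALE`, and `letter_grade` are illustrative labels, not anything taken from the exam itself.

```python
# Per-question results as reported in the grading below: (points earned, points possible).
scores = {
    "Part 1, Q1": (10, 10),
    "Part 1, Q2": (7, 10),
    "Part 1, Q3": (2, 10),
    "Part 1, Q4": (10, 10),
    "Part 1, Q5": (10, 10),
    "Part 1, Q6": (10, 10),
    "Part 2, Q1": (8, 20),
    "Part 2, Q2": (9, 20),
}

# The approximate grade scale from the exam header: (minimum score, letter).
GRADE_SCALE = [(60, "A"), (45, "B"), (35, "C"), (25, "D"), (0, "F")]


def letter_grade(total):
    """Map a total score out of 100 onto the approximate letter grade."""
    for minimum, letter in GRADE_SCALE:
        if total >= minimum:
            return letter
    raise ValueError("score must be non-negative")


total = sum(earned for earned, _ in scores.values())
possible = sum(maximum for _, maximum in scores.values())
print(f"GPT-4: {total}/{possible} -> {letter_grade(total)}")
print(f"ChatGPT (Caplan's grading): 31/100 -> {letter_grade(31)}")
```

On this scale the cutoff for an A is 60, so a 66 clears it by six points, which is why the margin should not be read as large.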
Part 1: True, False, and Explain
(10 points each - 2 for the right answer, and 8 for the explanation)
State whether each of the following six propositions is true or false. In 2-3 sentences, explain why. Use diagrams if helpful.
1. Suppose a country’s workplaces start to experience periodic blackouts. T, F, and Explain: Aggregate Labor Demand will clearly fall even though some workers will benefit.
Suggested Answer:
TRUE. ALD will definitely fall because MPP is the sole cause of MVP at the aggregate level, and workers are less productive with intermittent electricity. However, workers in industries with inelastic product demand will benefit because MVP=MPP*P, and P will rise more than MPP falls. Electrical repairmen are an obvious example.
GPT-4’s answer:
True. Aggregate labor demand is likely to fall due to periodic blackouts because blackouts generally disrupt production, reduce productivity, and increase uncertainty for businesses. While some workers, such as those involved in repairing or preventing blackouts, may benefit from increased demand for their services, the overall negative impact on productivity and production for the majority of industries will outweigh the benefits for a few, leading to a decrease in aggregate labor demand.
Score: 10/10. The AI gets 2 points for the correct answer, plus 2 points for stating that labor demand falls, plus 3 points for connecting falling labor demand to lower labor productivity, plus 3 points for mentioning that workers in some industries with inelastic product demand benefit from falling productivity.
2. “Productivity growth in one sector can very easily reduce employment in that sector.” (Krugman, The Accidental Theorist) T, F, and Explain: Krugman argues that such employment loss is a market failure that justifies government regulation.
Suggested answer:
FALSE. According to Krugman, employment will fall in some sectors, but rise in others. The net result of higher productivity, as usual, is that living standards rise overall, though not everyone gains. See his hotdog/bun thought experiment.
GPT-4’s answer:
False. While Krugman does argue that productivity growth in one sector can reduce employment in that sector, he does not necessarily argue that this employment loss is a market failure that justifies government regulation. The statement in question highlights the potential for technological advancements or improvements in efficiency to lead to job displacement, but it does not suggest that the government should intervene to correct the situation. Technological progress can lead to structural unemployment in the short run, but it also can create new opportunities and jobs in the long run, as workers shift to new industries or occupations.
Score: 7/10. The AI gets 2 points for the correct answer, plus 2 points for correctly restating Krugman's point, plus 3 points for mentioning that employment will rise in other industries. However, the AI does not mention that productivity growth is a net economic gain and thus raises overall living standards, even though it hurts some workers.
3. Suppose the government starts licensing car mechanics, but this fails to raise the quality of their work. T, F, and Explain: Mechanics’ wages will definitely rise.
Suggested answer:
TRUE. Licensing reduces supply, which raises wages and reduces quantity. If licensing raised quality, demand would go up, too. But by assumption, that didn’t happen. (And if it did, it would only reinforce the conclusion that wages go up!)
GPT-4’s answer:
False. Although licensing requirements can create barriers to entry for car mechanics, which may lead to a reduction in the supply of mechanics and potentially higher wages, it is not guaranteed that their wages will definitely rise. If the licensing does not raise the quality of their work, consumers may not be willing to pay more for the same quality of service, and competition among existing mechanics could keep wages from rising. Additionally, if the licensing process is not particularly difficult or costly, it may not effectively limit the supply of mechanics, and therefore, may not lead to higher wages.
Score: 2/10. The AI gets 2 points for stating that wages may go up as a result of a reduction in the labor supply from licensing. However, the AI does not realize that if consumer demand does not rise, and the labor supply decreases, then the wages of mechanics will go up, even with competition among existing mechanics.
4. In his discussion of immigration, Landsburg remarks: "[W]e ostentatiously ignore the most obvious beneficiaries of open borders and can still conclude that open borders are a good thing." T, F, and Explain: According to Landsburg, the "most obvious beneficiaries" are skilled American workers.
Suggested Answer:
FALSE. Landsburg says that the “most obvious beneficiaries” are the immigrants themselves, who get a massive increase in pay. His point is that the net benefits for everyone else are ALSO positive, so immigration looks like a big social gain even if we ignore the most obvious gainers.
GPT-4’s answer:
False. According to Landsburg, the "most obvious beneficiaries" of open borders are not highly skilled American workers, but rather the immigrants themselves. By allowing individuals to move freely across borders, immigrants are given the opportunity to improve their standard of living, find better job opportunities, and escape poverty or other difficult circumstances in their home countries. While highly skilled American workers may also benefit from open borders through increased specialization, knowledge exchange, and access to global talent, Landsburg is referring to the immigrants themselves as the primary beneficiaries.
Score: 10/10. The AI gets 2 points for the correct answer, and 4 points for correctly stating Landsburg's point. It also gets 4 points for putting Landsburg's point in context, pointing out that open borders can also benefit highly skilled American workers even if those aren’t the most obvious beneficiaries of the policy.
5. Suppose you’re a typical selfish person. T, F, and Explain: Evidence of severe credit market imperfections will make you more eager to continue your education, but evidence of severe externalities of education will not.
Suggested Answer:
TRUE. If there are severe credit market imperfections, the private return to education will consistently exceed the ordinary investment return throughout the rest of the economy. The idea is that due to lack of collateral, students might be unable to get the loans they need to capture this return, which keeps returns elevated. Knowledge of this elevated return is an extra reason to stay in school. Externalities, in contrast, by definition, don’t affect private returns, so selfish students won’t care about them.
GPT-4’s answer:
True. As a typical selfish person, you are primarily concerned with your own well-being and benefits. Evidence of severe credit market imperfections, such as limited access to loans or high interest rates, might make you more eager to continue your education because it implies that individuals who manage to obtain education despite these imperfections are likely to earn a higher return on their investment, as fewer people can afford education and there is less competition in the educated labor market.
On the other hand, evidence of severe externalities of education, whether positive or negative, will not directly affect your eagerness to continue your education, because externalities represent the indirect benefits or costs that your education has on others. As a selfish person, you are primarily concerned with your own direct benefits, and the impact on others is not a significant factor in your decision-making process.
Score: 10/10. The AI gets 2 points for the correct answer. It gets 2 points for correctly stating that if there are severe credit market imperfections then private returns will consistently exceed the ordinary investment return, and 2 points for providing a correct explanation why this is true. It gets another 4 points for correctly stating the logic of externalities, and how it relates to selfishness.
6. Higher-IQ students have a higher college completion probability. T, F, and Explain: This does not affect college students’ expected rate of return.
Suggested Answer:
FALSE. The rate of return to education depends on both (a) the gains of school if you graduate, and (b) the probability of graduating. If high IQ raises (b), then its expected return automatically rises as well. It’s like investing in a restaurant: Profitability depends on (a) the profits if the restaurant survives, and (b) the probability that the restaurant survives.
GPT-4’s answer:
False. Higher-IQ students having a higher college completion probability does affect college students' expected rate of return. The expected rate of return on a college education depends on both the probability of completing the degree and the expected earnings premium associated with the degree. If higher-IQ students are more likely to complete college, they are more likely to receive the higher earnings associated with a college degree, which increases their expected rate of return on their investment in education. On the other hand, lower-IQ students, with a lower probability of completing college, face a lower expected rate of return as they may not receive the full earnings premium associated with degree completion.
Score: 10/10. The AI gets 2 points for giving the right answer, and 8 points for correctly identifying the way that IQ can affect the expected rate of return from going to college. More specifically, it correctly identifies that the expected rate of return on a college education depends both on the probability of completing the degree and the expected earnings premium associated with the degree, and relates this fact successfully to the effect of a higher IQ.
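The logic of this question can be illustrated with a small calculation. The sketch below assumes a deliberately simplified one-period model in which the expected payoff is the completion probability times the payoff from graduating, plus the remaining probability times the payoff from dropping out. Every number in it (the probabilities, the payoffs, and the cost) is hypothetical and chosen only to show the direction of the effect.

```python
def expected_return(p_complete, payoff_if_complete, payoff_if_dropout, cost):
    """Expected rate of return on enrolling, in a simplified one-period model.

    All inputs are hypothetical present values in dollars; p_complete is the
    probability of finishing the degree.
    """
    expected_payoff = (
        p_complete * payoff_if_complete
        + (1 - p_complete) * payoff_if_dropout
    )
    return (expected_payoff - cost) / cost


# Two hypothetical students who face the same cost and the same payoffs,
# and differ only in their probability of completing college.
low_completion = expected_return(0.5, 300_000, 100_000, 100_000)
high_completion = expected_return(0.9, 300_000, 100_000, 100_000)

print(f"50% completion chance: expected return {low_completion:.0%}")
print(f"90% completion chance: expected return {high_completion:.0%}")
```

Holding the payoffs fixed, raising the completion probability raises the expected return, which is the point made in both the suggested answer and GPT-4's answer.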
Part 2: Short Answer
(20 points each)
In 4-6 sentences, answer both of the following questions. Use diagrams if helpful.
1. The U.S. has had two years of very high inflation. Use a clearly-labelled supply-and-demand diagram to show what has been happening to the real minimum wage from 2019-2022. What will ultimately happen to low-skilled workers’ living standards and employment if high inflation continues and the federal government fails to raise the minimum wage? Consider all of the notable effects.
Suggested answer:
During this time, the real minimum wage has been rapidly falling. As a result, the disemployment effect of the minimum wage has been falling as well. Real wages are getting closer to market-clearing levels, and employment is getting closer to full employment. If inflation continues and the government doesn’t raise the minimum wage, this will ultimately amount to the abolition of the minimum wage. The result: Wages will fall to the market-clearing level, and employment will reach full employment. The effective abolition of the minimum wage is bad for lucky low-skilled workers who currently get higher pay, but better for everyone else, including consumers, unlucky low-skilled workers who can’t find work at all, employers, investors, and anyone who wants basic job training. (See graph on the next page).
GPT-4’s answer:
During the period of high inflation from 2019-2022, the real minimum wage would decline as the purchasing power of money erodes. In a supply-and-demand diagram, the nominal minimum wage would be represented as a horizontal line at the minimum wage level. As inflation occurs, the real minimum wage (i.e., the nominal minimum wage adjusted for inflation) would effectively decrease, causing the horizontal line to shift downwards.
If high inflation continues and the federal government fails to raise the minimum wage, low-skilled workers' living standards would likely deteriorate, as their wages would have less purchasing power in the economy. This decrease in real minimum wage may initially lead to an increase in the demand for low-skilled labor, as firms may find it relatively cheaper to hire such workers. However, over time, other factors such as technological advancements, labor market adjustments, and the changing structure of the economy may offset these short-term benefits, as firms seek more cost-effective ways to operate or replace low-skilled labor with automation.
In summary, the living standards of low-skilled workers would likely decline due to the erosion of their purchasing power, and while employment opportunities may increase in the short run, this effect may be offset in the long run by factors like automation and labor market adjustments.
Score: 8/20. The AI gets 6 points for correctly explaining that the real minimum wage will decline as purchasing power of money erodes. It also gets 2 points for partially describing what this effect would look like in a supply and demand diagram (albeit without drawing a diagram). However, the AI does not present a correct picture of this effect on living standards in the supply-and-demand model. The AI says that a decline in the real minimum wage may increase demand for low-skilled labor, but that is not true. The AI paints a uniformly negative picture of a declining minimum wage in the long run, failing to point out the positive employment effects of a lower real minimum wage.
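To make the mechanism in the suggested answer concrete, here is a rough sketch of how a fixed nominal minimum wage erodes in real terms. The $7.25 federal minimum is the actual nominal figure; the 7% annual inflation rate and the $6.00 market-clearing real wage are purely hypothetical assumptions picked for illustration, and the sketch further assumes that the market-clearing wage stays constant in real terms.

```python
FEDERAL_MINIMUM_WAGE = 7.25  # nominal dollars per hour


def real_wage_path(nominal_wage, annual_inflation_rates):
    """Real wage, in base-year dollars, at the end of each year of inflation."""
    price_level = 1.0
    path = []
    for rate in annual_inflation_rates:
        price_level *= 1 + rate
        path.append(nominal_wage / price_level)
    return path


def years_until_nonbinding(nominal_wage, inflation_rate, market_clearing_real_wage,
                           max_years=100):
    """First year in which the real minimum wage is at or below the
    market-clearing real wage, or None if that never happens within max_years."""
    price_level = 1.0
    for year in range(1, max_years + 1):
        price_level *= 1 + inflation_rate
        if nominal_wage / price_level <= market_clearing_real_wage:
            return year
    return None


# Hypothetical inputs: constant 7% inflation, $6.00 market-clearing real wage.
path = real_wage_path(FEDERAL_MINIMUM_WAGE, [0.07, 0.07, 0.07])
years = years_until_nonbinding(FEDERAL_MINIMUM_WAGE, 0.07, 6.00)

for year, wage in enumerate(path, start=1):
    print(f"After year {year}: real minimum wage = ${wage:.2f}")
print(f"Minimum wage stops binding after {years} years under these assumptions")
```

Once the real minimum falls to the market-clearing level it no longer constrains anyone, which is what the suggested answer means by the effective abolition of the minimum wage.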
2. Suppose there are two goods – meals and steel. Here is how much American and Mexican workers can produce in an hour:
            Steel   Meals
American      7       3
Mexican       1       2
Give a simple example showing how Mexican immigration effectively increases American workers’ productivity. Why is this important? (Hint: How important are services in modern economies?)
Suggested Answer:
Americans have a comparative advantage in Steel, Mexicans in Meals, and a price ratio of 1:1 would make it profitable for each country to specialize in its comparative advantage. As a result, if an American worker switches from spending one hour each on Steel and Meals to producing Steel exclusively, he goes from producing 7 Steel and 3 Meals to producing 14 Steel. This allows him to purchase 3 Meals and still be 4 Steel richer than he started.
This is important because in modern economies, services are 80% of employment. So if you can’t trade services, most potential gains to trade go unrealized. Free trade in goods without free trade in labor is a pale imitation of actual free trade.
GPT-4’s answer:
To demonstrate how Mexican immigration effectively increases American workers' productivity, let's consider a hypothetical example where trade and specialization occur between American and Mexican workers. Suppose one American worker and one Mexican worker are both working for an hour.
In this case, without specialization, an American worker could produce either 7 units of steel or 3 meals. A Mexican worker could produce either 1 unit of steel or 2 meals. If each worker splits their time equally between the two tasks, the American worker would produce 3.5 units of steel and 1.5 meals, while the Mexican worker would produce 0.5 units of steel and 1 meal. Together, they would produce a total of 4 units of steel and 2.5 meals.
Now, let's assume they specialize according to their comparative advantage. The American worker should focus on producing steel, while the Mexican worker should focus on producing meals. In one hour, the American worker would produce 7 units of steel, and the Mexican worker would produce 2 meals. Together, they now produce a total of 7 units of steel and 2 meals.
By specializing and allowing Mexican immigration, the combined productivity of American and Mexican workers effectively increases (from 4 steel units and 2.5 meals to 7 steel units and 2 meals). This is important because services, such as meal preparation, are a significant part of modern economies. By allowing workers to focus on their areas of comparative advantage, overall productivity and economic efficiency increase, benefiting both American and Mexican workers and promoting economic growth.
Score: 9/20. Although it had a minor calculation error, the AI gets 6 points for correctly demonstrating the principle of comparative advantage and how an American worker and a Mexican worker can increase their overall productivity via specialization. The AI also gets 3 points for correctly stating that services are a “significant” part of modern economies; however, it fails to put this statement in context by explaining that most services cannot be traded internationally without immigration, and services comprise a majority of employment in modern rich economies. Also, the AI neglects to explain how the American worker can trade with the Mexican worker to become better off than he would be in the absence of trade.
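For reference, the arithmetic in the suggested answer to this question can be reproduced in a few lines. The sketch below uses the output-per-hour table from the exam and the 1:1 price ratio from the suggested answer; the two-hour working period comes from the suggested answer's example, and the variable names are illustrative.

```python
from fractions import Fraction

# Output per hour of work, taken from the exam's table.
OUTPUT_PER_HOUR = {
    "American": {"steel": 7, "meals": 3},
    "Mexican": {"steel": 1, "meals": 2},
}


def meal_cost_in_steel(worker):
    """Opportunity cost of one meal, measured in units of steel forgone."""
    rates = OUTPUT_PER_HOUR[worker]
    return Fraction(rates["steel"], rates["meals"])


# Whoever gives up the least steel per meal has the comparative advantage in meals.
meal_specialist = min(OUTPUT_PER_HOUR, key=meal_cost_in_steel)

MEAL_PRICE_IN_STEEL = 1  # the 1:1 price ratio used in the suggested answer

# American worker with two hours: one hour on each good vs. two hours on steel.
american = OUTPUT_PER_HOUR["American"]
steel_alone = american["steel"]
meals_alone = american["meals"]
steel_specialized = 2 * american["steel"]
steel_after_trade = steel_specialized - meals_alone * MEAL_PRICE_IN_STEEL
american_gain_in_steel = steel_after_trade - steel_alone

# Mexican worker, per hour: selling meals at 1:1 vs. making steel directly.
mexican = OUTPUT_PER_HOUR["Mexican"]
mexican_gain_per_hour = mexican["meals"] * MEAL_PRICE_IN_STEEL - mexican["steel"]

print(f"Comparative advantage in meals: {meal_specialist}")
print(f"American ends with {steel_after_trade} steel and {meals_alone} meals, "
      f"a gain of {american_gain_in_steel} steel")
print(f"Mexican gains {mexican_gain_per_hour} steel per hour by selling meals")
```

The 1:1 price works because it lies between the two opportunity costs of a meal (1/2 of a unit of steel for the Mexican worker and 7/3 for the American worker), so both sides gain from the exchange.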
I wonder if it does any better if you tell it that the test is being graded by Bryan Caplan and its goal is to get the highest possible score. Maybe then it would, e.g., notice the disemployment effects of the minimum wage.
I'm going to be honest, a lot of this strikes me as debatable, e.g.:
""3. Suppose the government starts licensing car mechanics, but this fails to raise the quality of their work. T, F, and Explain: Mechanics’ wages will definitely rise.
Suggested answer:
TRUE. Licensing reduces supply, which raises wages and reduces quantity. If licensing raised quality, demand would go up, too. But by assumption, that didn’t happen. (And if it did, it would only reinforce the conclusion that wages go up!)
GPT-4’s answer
False. Although licensing requirements can create barriers to entry for car mechanics, which may lead to a reduction in the supply of mechanics and potentially higher wages, it is not guaranteed that their wages will definitely rise. If the licensing does not raise the quality of their work, consumers may not be willing to pay more for the same quality of service, and competition among existing mechanics could keep wages from rising. Additionally, if the licensing process is not particularly difficult or costly, it may not effectively limit the supply of mechanics, and therefore, may not lead to higher wages.""
"Definitely" is incredibly strong. Suppose the rule is brought in by a city government, and there are ten mechanics before it, the licensing requirements are not especially onerous, and as a result the number of mechanics remains static despite a nominal barrier to entry. We are given no assurances that the licensing process *actually reduces supply*; it could be super easy. For these reasons, I would be *very* reluctant to use the word "definitely". Definitely is a term that should pretty much never be used in economics.
And don't even get me started on the minimum wage question.