GPT-4 takes Bryan Caplan's midterm and gets an A

Mar 15, 2023

On January 9th, a little over two months ago, Bryan Caplan wrote,

Many, perhaps most, of my friends are giddy about ChatGPT. My Inbox is full of emails from Tyler Cowen touting its glories. More concretely, Alex Tabarrok recently assured me that existing AIs could already beat the performance of most of our economics students in our beloved subject. Though I was on the road almost all of last month, I vowed to put these claims to the test upon my return. […]
Outcome: If ChatGPT had been a regular student, it would have earned 31/100. That’s a D. Even worse than I expected.
Isn’t the software amazing nonetheless? Only a little. […]
You could object: “Who cares if ChatGPT gets a D on your midterm?” I care. At minimum, this shows, contra Alex Tabarrok, that the AI is not yet as good at economics as most of our students. For me personally, this also reinforces my suspicion that AI enthusiasts cherry-pick the best AI output.

I was surprised that this was Caplan’s reaction to a computer program receiving a D on one of his exams. For a long time, it was considered extremely hard to get computers to comprehend even basic sentences. Ten years ago, the state of the art in language modeling was pitiful. You usually couldn’t get an AI to spit out a coherent answer to a question on an economics exam, much less a correct answer.

Given that we went, within ten years, from AI that could barely understand English at all to AI that can get a D on one of Caplan’s exams, one would think that this development represents amazing progress, even if the tech is still immature.

As Scott Aaronson put it,

Again and again, I’ve used the analogy that GPT is like the chess-playing horse that gets called unimpressive for only winning 3 out of 10 games. This is almost literally true in this case! For most of my life, an AI that could get a D rather than an F on your exam would’ve been a sci-fi fantasy, especially seeing how stringently you grade.

Caplan replied to Aaronson,

Some existing software is already miraculous. Google is amazing by absolute standards. ChatGPT isn't, for now. Though of course you're right that getting a D is impressive in some sense. If one of my kids had written ChatGPT I would be taking victory laps, but I can't get excited about it for now.

I thought this was a good moment to put Caplan’s beliefs to the test, so I challenged him to a bet about whether AI will continue to score poorly on his exams in the coming years. True to his reputation for putting his money where his mouth is, and having maintained a perfect track record of 24 wins in a row, Caplan accepted my offer.

The terms of the bet were outlined in a later post from Caplan. To summarize, I will win the bet if by January 30, 2029, an AI is able to get an A- or higher on at least 5/6 of his most recent midterm exams.

Now that the newest model from OpenAI, GPT-4, has been released, I thought I’d try grading its performance on the same exam that Caplan gave to ChatGPT. In short, GPT-4 got an A, even though I was trying hard to emulate Caplan’s nitpicky grading style. To be clear, despite this result, I am not yet sure that current AI is good enough to win the bet. GPT-4 got an A, but not by a huge margin. That said, I’m still feeling good about the bet with nearly 6 years left on the clock. AI will continue to improve.

In the rest of this post I’ll present the results. Note that I gave GPT-4 the exam only once, allowed it only one attempt per question, and didn’t give it any hints. Also, I’m not a professor of economics, nor have I taught labor economics before, so I might have made some mistakes while grading GPT-4’s responses.

Economics 321 Midterm

Prof. Bryan Caplan

Fall 2022

Score     Approximate Grade
60-100    A
45-59     B
35-44     C
25-34     D
0-24      F

GPT-4’s Total Score: 66/100
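
As a quick sanity check on the scale above, here is a minimal sketch of the score-to-grade lookup. The band boundaries are copied from the exam; the function name `letter_grade` is only an illustrative choice, not anything Caplan uses.

```python
# Grade bands copied from the exam's scale: (minimum score, letter grade).
GRADE_BANDS = [(60, "A"), (45, "B"), (35, "C"), (25, "D"), (0, "F")]


def letter_grade(score: int) -> str:
    """Return the approximate letter grade for a score out of 100."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for minimum, grade in GRADE_BANDS:
        if score >= minimum:
            return grade
    raise AssertionError("unreachable: the F band starts at 0")


print(letter_grade(66))  # GPT-4's total in this post -> A
print(letter_grade(31))  # ChatGPT's total as graded by Caplan -> D
```

On this scale the A band starts at 60, so a 66 clears it by six points.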

Part 1: True, False, and Explain

(10 points each - 2 for the right answer, and 8 for the explanation)

State whether each of the following six propositions is true or false. In 2-3 sentences, explain why. Use diagrams if helpful.

1. Suppose a country’s workplaces start to experience periodic blackouts. T, F, and Explain: Aggregate Labor Demand will clearly fall even though some workers will benefit.

Suggested Answer:

TRUE. ALD will definitely fall because MPP is the sole cause of MVP at the aggregate level, and workers are less productive with intermittent electricity. However, workers in industries with inelastic product demand will benefit because MVP=MPP*P, and P will rise more than MPP falls. Electrical repairmen are an obvious example.

GPT-4’s answer:

True. Aggregate labor demand is likely to fall due to periodic blackouts because blackouts generally disrupt production, reduce productivity, and increase uncertainty for businesses. While some workers, such as those involved in repairing or preventing blackouts, may benefit from increased demand for their services, the overall negative impact on productivity and production for the majority of industries will outweigh the benefits for a few, leading to a decrease in aggregate labor demand.

Score: 10/10. The AI gets 2 points for the correct answer, plus 2 points for stating that labor demand falls, plus 3 points for connecting falling labor demand to lower labor productivity, plus 3 points for mentioning that workers in some industries with inelastic product demand benefit from falling productivity.
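
The suggested answer leans on the identity MVP = MPP*P. The sketch below shows how the sign of the change in MVP depends on whether the price rises by more than productivity falls. The percentage changes are hypothetical numbers chosen for illustration; they are not from the exam.

```python
def mvp_change(mpp_change: float, price_change: float) -> float:
    """Fractional change in MVP = MPP * P, given fractional changes in MPP and P."""
    return (1 + mpp_change) * (1 + price_change) - 1


# Hypothetical industry with inelastic product demand:
# productivity falls 20%, but the product price rises 50%.
inelastic = mvp_change(-0.20, 0.50)

# Hypothetical industry with elastic product demand:
# productivity falls 20%, and the price rises only 10%.
elastic = mvp_change(-0.20, 0.10)

print(f"inelastic demand: MVP changes by {inelastic:+.0%}")
print(f"elastic demand:   MVP changes by {elastic:+.0%}")
```

Under these assumed numbers, MVP rises in the first industry and falls in the second, which is the sense in which some workers can benefit even though aggregate labor demand falls.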

2. “Productivity growth in one sector can very easily reduce employment in that sector.” (Krugman, The Accidental Theorist) T, F, and Explain: Krugman argues that such employment loss is a market failure that justifies government regulation.

Suggested Answer:

FALSE. Landsburg says that the “most obvious beneficiaries” are the immigrants themselves, who get a massive increase in pay. His point is that the net benefits for everyone else are ALSO positive, so immigration looks like a big social gain even if we ignore the most obvious gainers.

GPT-4’s answer:

False. According to Landsburg, the "most obvious beneficiaries" of open borders are not highly skilled American workers, but rather the immigrants themselves. By allowing individuals to move freely across borders, immigrants are given the opportunity to improve their standard of living, find better job opportunities, and escape poverty or other difficult circumstances in their home countries. While highly skilled American workers may also benefit from open borders through increased specialization, knowledge exchange, and access to global talent, Landsburg is referring to the immigrants themselves as the primary beneficiaries.

Score: 10/10. The AI gets 2 points for the correct answer, and 4 points for correctly stating Landsburg's point. It also gets 4 points for putting Landsburg's point in context, pointing out that open borders can also benefit highly skilled American workers even if those aren’t the most obvious beneficiaries of the policy.

5. Suppose you’re a typical selfish person. T, F, and Explain: Evidence of severe credit market imperfections will make you more eager to continue your education, but evidence of severe externalities of education will not.

Suggested Answer:

TRUE. If there are severe credit market imperfections, the private return to education will consistently exceed the ordinary investment return throughout the rest of the economy. The idea is that due to lack of collateral, students might be unable to get the loans they need to capture this return, which keeps returns elevated. Knowledge of this elevated return is an extra reason to stay in school. Externalities, in contrast, by definition, don’t affect private returns, so selfish students won’t care about them.

GPT-4’s answer:

True. As a typical selfish person, you are primarily concerned with your own well-being and benefits. Evidence of severe credit market imperfections, such as limited access to loans or high interest rates, might make you more eager to continue your education because it implies that individuals who manage to obtain education despite these imperfections are likely to earn a higher return on their investment, as fewer people can afford education and there is less competition in the educated labor market.

On the other hand, evidence of severe externalities of education, whether positive or negative, will not directly affect your eagerness to continue your education, because externalities represent the indirect benefits or costs that your education has on others. As a selfish person, you are primarily concerned with your own direct benefits, and the impact on others is not a significant factor in your decision-making process.

Score: 10/10. The AI gets 2 points for the correct answer. It gets 2 points for correctly stating that if there are severe credit market imperfections then private returns will consistently exceed the ordinary investment return, and 2 points for providing a correct explanation why this is true. It gets another 4 points for correctly stating the logic of externalities, and how it relates to selfishness.

6. Higher-IQ students have a higher college completion probability. T, F, and Explain: This does not affect college students’ expected rate of return.

Suggested Answer:

FALSE. The rate of return to education depends on both (a) the gains of school if you graduate, and (b) the probability of graduating. If high IQ raises (b), then its expected return automatically rises as well. It’s like investing in a restaurant: Profitability depends on (a) the profits if the restaurant survives, and (b) the probability that the restaurant survives.

GPT-4’s answer:

False. Higher-IQ students having a higher college completion probability does affect college students' expected rate of return. The expected rate of return on a college education depends on both the probability of completing the degree and the expected earnings premium associated with the degree. If higher-IQ students are more likely to complete college, they are more likely to receive the higher earnings associated with a college degree, which increases their expected rate of return on their investment in education. On the other hand, lower-IQ students, with a lower probability of completing college, face a lower expected rate of return as they may not receive the full earnings premium associated with degree completion.

Score: 10/10. The AI gets 2 points for giving the right answer, and 8 points for correctly identifying the way that IQ can affect the expected rate of return from going to college. More specifically, it correctly identifies that the expected rate of return on a college education depends both on the probability of completing the degree and on the expected earnings premium associated with the degree, and it successfully relates this fact to the effect of a higher IQ.
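
The logic of question 6 can be sketched as a simple expected-value calculation: the payoff from enrolling is the payoff if you graduate weighted by the probability of graduating. The completion probabilities and payoff below are hypothetical, and the model is deliberately crude (it assumes dropping out pays nothing by default).

```python
def expected_payoff(p_complete: float,
                    payoff_if_complete: float,
                    payoff_if_dropout: float = 0.0) -> float:
    """Expected payoff from enrolling, weighting each outcome by its probability."""
    return (p_complete * payoff_if_complete
            + (1 - p_complete) * payoff_if_dropout)


# Hypothetical students facing the same payoff from a finished degree
# but different chances of finishing it.
high_completion = expected_payoff(p_complete=0.9, payoff_if_complete=100.0)
low_completion = expected_payoff(p_complete=0.5, payoff_if_complete=100.0)

print(high_completion > low_completion)  # True
```

Holding the payoff from graduating fixed, a higher completion probability raises the expected payoff, which is why the proposition is false.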

Part 2: Short Answer

(20 points each)

In 4-6 sentences, answer both of the following questions. Use diagrams if helpful.

1. The U.S. has had two years of very high inflation. Use a clearly-labelled supply-and-demand diagram to show what has been happening to the real minimum wage from 2019-2022. What will ultimately happen to low-skilled workers’ living standards and employment if high inflation continues and the federal government fails to raise the minimum wage? Consider all of the notable effects.

Suggested Answer:

Americans have a comparative advantage in Steel, Mexicans in Meals, and a price ratio of 1:1 would make it profitable for each country to specialize in its comparative advantage. As a result, if an American worker switches from spending one hour each on Steel and Meals to producing Steel exclusively, he goes from producing 7 Steel and 3 Meals to producing 14 Steel. This allows him to purchase 3 Meals and still be 4 Steel richer than he started.

This is important because in modern economies, services are 80% of employment. So if you can’t trade services, most potential gains to trade go unrealized. Free trade in goods without free trade in labor is a pale imitation of actual free trade.
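
The arithmetic in the suggested answer above can be checked with a short sketch. It assumes, as the answer implies, that the American worker makes 7 Steel or 3 Meals per hour, works two hours, and can trade at the stated 1:1 price ratio. The function and parameter names are illustrative only.

```python
def gain_from_specializing(steel_per_hour: int = 7,
                           meals_per_hour: int = 3,
                           steel_per_meal: int = 1) -> tuple[int, int, int]:
    """Return (steel produced, steel left after buying meals, net gain in steel)."""
    # Without trade: one hour on each good.
    steel_without_trade = steel_per_hour * 1
    meals_without_trade = meals_per_hour * 1

    # With trade: both hours go to steel.
    steel_produced = steel_per_hour * 2

    # Buy back the same number of meals at the 1:1 price ratio.
    steel_spent_on_meals = meals_without_trade * steel_per_meal
    steel_left = steel_produced - steel_spent_on_meals

    # Same meals as before, so the gain is the extra steel.
    return steel_produced, steel_left, steel_left - steel_without_trade


produced, left, gain = gain_from_specializing()
print(produced, left, gain)  # 14 11 4
```

The worker ends up with the same 3 Meals as before plus 11 Steel instead of 7, matching the "4 Steel richer" figure.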

GPT-4’s answer:

To demonstrate how Mexican immigration effectively increases American workers' productivity, let's consider a hypothetical example where trade and specialization occur between American and Mexican workers. Suppose one American worker and one Mexican worker are both working for an hour.

In this case, without specialization, an American worker could produce either 7 units of steel or 3 meals. A Mexican worker could produce either 1 unit of steel or 2 meals. If each worker splits their time equally between the two tasks, the American worker would produce 3.5 units of steel and 1.5 meals, while the Mexican worker would produce 0.5 units of steel and 1 meal. Together, they would produce a total of 4 units of steel and 2.5 meals.

Now, let's assume they specialize according to their comparative advantage. The American worker should focus on producing steel, while the Mexican worker should focus on producing meals. In one hour, the American worker would produce 7 units of steel, and the Mexican worker would produce 2 meals. Together, they now produce a total of 7 units of steel and 2 meals.

By specializing and allowing Mexican immigration, the combined productivity of American and Mexican workers effectively increases (from 4 steel units and 2.5 meals to 7 steel units and 2 meals). This is important because services, such as meal preparation, are a significant part of modern economies. By allowing workers to focus on their areas of comparative advantage, overall productivity and economic efficiency increase, benefiting both American and Mexican workers and promoting economic growth.

Score: 9/20. Although it had a minor calculation error, the AI gets 6 points for correctly demonstrating the principle of comparative advantage and how an American worker and a Mexican worker can increase their overall productivity via specialization. The AI also gets 3 points for correctly stating that services are a “significant” part of modern economies; however, it fails to put this statement in context by explaining that most services cannot be traded internationally without immigration, and services comprise a majority of employment in modern rich economies. Also, the AI neglects to explain how the American worker can trade with the Mexican worker to become better off than he would be in the absence of trade.
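
For reference, the totals in GPT-4's answer can be recomputed from the hourly rates it states. The sketch below also values each bundle in units of steel at the 1:1 price ratio from the suggested answer; applying that price here is an assumption, since GPT-4's answer never mentions one.

```python
# Output per hour as stated in GPT-4's answer: (steel, meals).
RATES = {"American": (7, 3), "Mexican": (1, 2)}


def split_time_totals() -> tuple[float, float]:
    """Combined output when each worker spends half an hour on each good."""
    steel = sum(steel_rate * 0.5 for steel_rate, _ in RATES.values())
    meals = sum(meal_rate * 0.5 for _, meal_rate in RATES.values())
    return steel, meals


def specialized_totals() -> tuple[float, float]:
    """Combined output when the American makes only steel and the Mexican only meals."""
    return RATES["American"][0] * 1.0, RATES["Mexican"][1] * 1.0


def value_in_steel(steel: float, meals: float, steel_per_meal: float = 1.0) -> float:
    """Value of a bundle in steel units at an assumed price of meals."""
    return steel + meals * steel_per_meal


print(split_time_totals())                    # (4.0, 2.5)
print(specialized_totals())                   # (7.0, 2.0)
print(value_in_steel(*split_time_totals()))   # 6.5
print(value_in_steel(*specialized_totals()))  # 9.0
```

Specialization adds 3 units of steel but gives up half a meal, so the comparison only shows a clear gain once the two goods are valued at some price.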

penttrioctium

Mar 15, 2023 (edited)

I wonder if it does any better if you tell it that the test is being graded by Bryan Caplan and its goal is to get the highest possible score. Maybe then it would, e.g., notice the disemployment effects of the minimum wage.


Philosophy bear

Mar 16, 2023

I'm going to be honest, a lot of this strikes me as debatable, e.g.:

"3. Suppose the government starts licensing car mechanics, but this fails to raise the quality of their work. T, F, and Explain: Mechanics’ wages will definitely rise."

"Definitely" is incredibly strong. Suppose the rule is brought in by a city government, and there are ten mechanics before it, the licensing requirements are not especially onerous, and as a result the number of mechanics remains static despite a nominal barrier to entry. We are given no assurances that the licensing process *actually reduces supply*; it could be super easy. For these reasons, I would be *very* reluctant to use the word "definitely". Definitely is a term that should pretty much never be used in economics.

And don't even get me started on the minimum wage question.


Matthew Barnett’s Blog
