How do we know?

Lane Kenworthy, The Good Society
September 2020

The aim of this book, The Good Society, is to figure out what institutions and policies best facilitate human flourishing in a rich society. How do we do that?

We use science. We ask a question, formulate a hypothesis (sometimes called a “model” or “theory”), examine evidence to see whether it’s consistent with the hypothesis, adjust the hypothesis if necessary, think about what else we would expect to observe if the hypothesis were true, and then look for more evidence. The key to science isn’t that it proceeds in exactly this order. The key is that it’s evidence-based and thorough.


ASK A QUESTION

First, pose a question. It’s tempting to begin with a topic or an issue, but it’s better to start with a question. A topic is too big and too vague. Narrowing it down to a specific question helps us focus. And it doesn’t preclude flexibility; we can always change the question later.

Suppose we’re interested in improving happiness. Let’s start off with the following question: What would increase happiness in the United States?

CONSIDER THE EVIDENCE

A good next step is to read some summaries and discussions of prior research on the question.1 This will give us some familiarity with what’s been done already and some ideas for how to proceed.

Next, we could take a look at some evidence. Can happiness be captured in a quantitative measure? Public opinion surveys have asked questions along the lines of “How happy would you say you are?” for some time in the US. Respondents are given three options: very happy, pretty happy, and not too happy. Suppose we decide to use the responses to this question as our measure of happiness. Specifically, we might use the share of adults who respond “very happy” as the indicator.
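To make the measure concrete, here is a minimal sketch in Python of how that indicator would be computed. The responses are made up for illustration; they are not actual GSS data.

```python
# Illustrative only: made-up survey responses, not actual GSS data.
responses = ["very happy", "pretty happy", "not too happy", "very happy",
             "pretty happy", "pretty happy", "not too happy", "very happy"]

# Our indicator: the share of respondents answering "very happy".
very_happy_share = responses.count("very happy") / len(responses)
print(f"Share 'very happy': {very_happy_share:.0%}")  # 38% in this toy sample
```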

Let’s start with the trend over time in the United States. Data from the General Social Survey (GSS) tell us that the share of American adults saying they are “very happy” hasn’t changed much since the early 1970s. If anything, it has decreased slightly. That’s surprising: the economy has grown, and it seems likely that more income would translate into more happiness, because income allows people to buy more things and it alleviates financial stress. But maybe additional income no longer increases happiness once a country reaches a certain level of affluence. This is interesting. Let’s revise our question: In an affluent nation, does more income bring greater happiness?

A research question is, in effect, the same thing as a hypothesis. We could restate our question as the following hypothesis: In an affluent nation, more income brings greater happiness. Our goal is to figure out the degree to which this is true.

Notice that I said “the degree to which,” rather than “whether or not.” That’s because in most instances we want to know not just whether there is an effect, but also how large the effect is.2

The over-time pattern in the United States since the early 1970s suggests that more income may not increase happiness. But we need to think more carefully. Were there any other developments during this period that might have reduced happiness and thereby offset an increase produced by higher income? Let’s say we know from our reading of prior studies that people who are religious tend to be happier. Religiosity has been decreasing in the US in recent decades, so that could have caused a decline in happiness. Also, since the rich in America have been getting much richer (some CEOs, hedge fund managers, athletes, and entertainers now take home more than $100 million in a single year), perhaps many Americans feel that they’ve been falling farther behind, and therefore are less happy, even though their incomes have been increasing.

Given these considerations, we might conclude that the over-time pattern in the United States doesn’t yield a clear answer to our question. Let’s turn to other evidence.

To answer most questions in social science, we need to compare. To ask “Does more income bring greater happiness?” is to ask “Does low income tend to go with low happiness and high income tend to go with high happiness?” In examining the over-time pattern in the US, we were comparing different points in time. We were looking to see if happiness was greater in years in which income was higher. What other types of comparative evidence are there?

We can compare across countries. If income increases happiness, then nations with higher income should tend to have greater average happiness.

Which countries should we examine? Lack of data means we can’t include all of the world’s 190-plus nations. We wouldn’t want to include all of them even if we could, because most countries are much poorer than the United States. If we’re interested in whether more income brings more happiness in an affluent nation like the US, then we should focus on affluent nations. Also, we might want to include only countries with democratic political systems. And we should probably exclude a few, like Iceland and Luxembourg, that have populations on the scale of a city rather than a country. There are 20 or so countries that are comparable to the US in these respects.

Suppose we find that, among this group of countries, higher income is associated with higher happiness (income is positively correlated with happiness). That’s interesting, but once again we need to consider whether income is genuinely the cause or whether there is something else making it appear as though income is the cause. Perhaps employment increases happiness, and more people are employed in countries with higher income, and it’s the higher employment rather than the higher income that causes the greater happiness in these nations. Or perhaps good health increases happiness, and countries with healthier people have higher income, and it’s the better health that brings the greater happiness.

We can conduct a statistical analysis to control for (“hold constant”) these other potential causes.3 Let’s suppose such an analysis suggests that income is in fact associated with happiness across the world’s rich nations.
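One standard way to do this is multiple regression (see note 3). Here is a minimal sketch in Python using the statsmodels library, with fabricated data for a set of hypothetical affluent countries; the variable names and all of the numbers are invented for illustration, not real figures.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Fabricated data for 20 hypothetical affluent countries (not real figures).
rng = np.random.default_rng(0)
n = 20
income = rng.normal(45, 8, n)        # GDP per capita, thousands of dollars
employment = rng.normal(72, 5, n)    # employment rate, percent
health = rng.normal(80, 2, n)        # life expectancy, years
happiness = 10 + 0.3 * income + 0.2 * employment + 0.5 * health + rng.normal(0, 3, n)

df = pd.DataFrame({"income": income, "employment": employment,
                   "health": health, "happiness": happiness})

# Regress happiness on income while controlling for employment and health.
X = sm.add_constant(df[["income", "employment", "health"]])
model = sm.OLS(df["happiness"], X).fit()
print(model.params["income"])  # estimated effect of income, holding the other two constant
```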

CONSIDER MORE EVIDENCE

Examining over-time developments in a single country and comparing across countries at a single point in time are both helpful analytical strategies. Better still is to compare change over time across countries. This is closer to an experiment, which is the ideal strategy for assessing causality.4 If more income boosts happiness, countries with larger increases in income should tend to have larger increases in happiness.
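With invented figures for a handful of hypothetical countries, that comparison of changes might look like this:

```python
import numpy as np

# Invented figures: income (thousands of dollars) and "very happy" share (%)
# for four hypothetical countries at two points in time.
income_1990 = np.array([30, 25, 35, 28])
income_2020 = np.array([50, 30, 55, 33])
happy_1990 = np.array([30, 28, 33, 29])
happy_2020 = np.array([36, 29, 38, 30])

income_change = income_2020 - income_1990
happy_change = happy_2020 - happy_1990

# If more income boosts happiness, countries with bigger income gains
# should tend to show bigger happiness gains.
print(np.corrcoef(income_change, happy_change)[0, 1])
```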

We also could, if data are available, do these same sorts of comparative analyses across states or regions.

Another helpful strategy is to look for “natural experiments.” Suppose two neighboring states are similar in economic structure, population characteristics, culture, and other respects. Examples might include North Dakota and South Dakota, or Alabama and Mississippi. Suppose one state gets an income windfall, perhaps from discovery of oil or natural gas or some other energy resource, and this leads to a significant increase in income. If income boosts happiness, we would expect a larger jump in happiness in the state with the income rise than in the other state. This same sort of approach can be applied to countries (the US and Canada, or Denmark and Sweden) or to cities or neighborhoods. And if data are available, it can be applied to large numbers of pairs.5
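The arithmetic behind this kind of paired comparison, often called difference-in-differences, is simple. A sketch with made-up numbers for a hypothetical windfall state and its neighbor:

```python
# Made-up "very happy" shares (%) before and after a hypothetical windfall.
windfall_before, windfall_after = 30.0, 34.0   # state that got the income boost
neighbor_before, neighbor_after = 31.0, 32.0   # similar neighboring state

# Difference-in-differences: the extra change in the windfall state,
# over and above the change its neighbor experienced anyway.
did_estimate = (windfall_after - windfall_before) - (neighbor_after - neighbor_before)
print(did_estimate)  # 3.0 percentage points in this toy example
```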

A valuable and sometimes-underappreciated strategy is careful, in-depth examination of developments in a single country. For our question, it would be particularly useful to study a country in which there does appear to be a link between income and happiness. This would allow us to explore whether income genuinely is the cause, and it could help us to better understand the path(s) through which income affects happiness.6

Now, suppose we conclude from our various analyses that more income does tend to produce greater average happiness in a country. If this is a genuine causal effect, we ought to see it among individuals too. We could use data from a survey such as the GSS to examine whether persons with higher income tend to be happier.

Ideally, we would have data on particular individuals over time. This would allow us to see whether persons whose income increases tend to become happier.
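Here is a compact sketch, again with fabricated data, of both individual-level checks: a cross-sectional comparison (are higher-income people happier?) and a within-person comparison (do people whose income rises become happier?).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Fabricated panel: each person's income (thousands of dollars) and a 0-10
# happiness score at two interview waves.
income_w1 = rng.normal(50, 15, n)
income_w2 = income_w1 + rng.normal(3, 5, n)
happy_w1 = 4 + 0.02 * income_w1 + rng.normal(0, 1, n)
happy_w2 = 4 + 0.02 * income_w2 + rng.normal(0, 1, n)

# Cross-sectional check: do higher-income people report more happiness?
print(np.corrcoef(income_w1, happy_w1)[0, 1])

# Within-person check: do people whose income rose become happier?
print(np.corrcoef(income_w2 - income_w1, happy_w2 - happy_w1)[0, 1])
```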

Another possibility is to conduct a formal experiment with individuals. We might take a randomly selected set of people, measure each person’s happiness, have them do an activity in which some of them, chosen at random, receive a nontrivial sum of money, and then measure each person’s happiness again. If income increases happiness, we should find a larger increase in happiness among those who received the money.
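The logic of the experiment is random assignment plus a before-and-after comparison. A sketch with invented numbers (the effect is built into the fake data, so the output is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

# Fabricated experiment: measure happiness (0-10), randomly assign half the
# participants to receive a monetary windfall, then measure happiness again.
happiness_before = rng.normal(6, 1, n)
got_money = rng.permutation(np.array([True] * (n // 2) + [False] * (n // 2)))

true_effect = 0.4  # built into the fake data for illustration
happiness_after = happiness_before + got_money * true_effect + rng.normal(0, 0.5, n)

change = happiness_after - happiness_before
# If money boosts happiness, the change should be larger in the treated group.
print(change[got_money].mean() - change[~got_money].mean())
```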

There are lots of other ways of studying individuals that might help. We could pick a group of people with low income and low happiness and interview them in depth to try to understand exactly why low income reduces their subjective well-being. Or we could do an ethnographic study, spending time with such people as they go about their daily activities, to try to sort out the causal relationship between income and happiness.

What if we observe a correlation between income and happiness when comparing across countries but not when comparing across time in a particular country? Or what if our finding from studying individuals isn’t consistent with what we find in examining countries? Then we need to try to figure out the source of the inconsistency. Sometimes the inconsistency can be resolved; sometimes it indicates that one of the findings is mistaken.

I’ve described a host of analytical strategies. The actual practice of science typically involves doing just one of them and then reporting the findings. The accumulation of such studies gives us a progressively clearer picture of the answer to our question.

The best social science often is similar to detective work, with the social scientist more like Sherlock Holmes than like a chemist in a lab. Seldom do we have the evidence we want. So we use various types of data, and we deploy a mixture of analytical methods. We ask: “What would we expect to observe if a particular hypothesis were true? Is that what we in fact observe? If so or if not, what does that tell us about the answer to our question?” Then we piece together a conclusion from our multiple imperfect and incomplete bits of evidence.

TENDENCIES, NOT LAWS

If a physicist knows the volume and pressure of a fixed quantity of gas, she can calculate its exact temperature. These three things are related to one another in law-like fashion. In the social world, there are few iron laws.7 Even if more income boosts happiness, that effect almost certainly will be a tendency. It will happen for some people or countries but not for all, and the magnitude of the impact will be larger for some than for others. That’s why in discussing causal effects social scientists frequently use phrases such as “on average” and “tends to.”

MACRO EVIDENCE IS VITAL

In recent years social scientists have increasingly utilized experiments and “randomized controlled trials,” in which a policy or incentive is administered to one group but not another. This source of evidence is considered to be the gold standard in social science because it’s the most likely to yield a clear signal about causality.

However, we shouldn’t draw inferences about country-level patterns based solely on individual-level data.8 We can’t tell from studies of individuals how strong the effect will be at the level of society as a whole. Nor can we tell whether the cause will have other effects on the outcome that offset the effect identified at the micro level. Suppose studies of individuals find that people tend to respond to higher taxes by reducing their work effort. We shouldn’t infer that increasing tax rates in the United States will reduce economic growth. At the aggregate level the impact may be small and thus overshadowed by other determinants of growth. And higher tax rates may have other effects on economic growth (enabling more investment in infrastructure or research, for example) that offset a negative impact on individuals’ work effort.

To draw inferences about the societal impact of policies and institutions, we have to examine macro units — countries, states, cities. Even if it seldom gives us a definitive indication of causality, macro evidence is vital.

SMART CONSUMPTION

Suppose you want to form an opinion on a question or an issue but you aren’t an expert, and you don’t have enough time or interest to become one. What should you do?

Too often, we simply side with our “team” — Democrats, southerners, Burkean conservatives, Mormons, Jews, environmentalists, libertarians, country music fans, Upper East Side intellectuals, “freshwater” economists, Fox News watchers, or any other group with whom we feel an affiliation. When we don’t know what our team’s view is on a particular issue, we may take our cue from an influential organization or person who speaks for the team — the AARP, the Chamber of Commerce, the president, a talk radio host.

Consider the question of whether the planet is warming. This issue is too complex for almost anyone who isn’t a climate expert to be certain about. One possibility, then, is to believe what most scientists believe. But in practice, that’s not how most of us make up our minds. A survey by the Pew Research Center in 2015 found that 92% of Americans who consider themselves Democrats believe there is solid evidence that the earth’s average temperature is rising, while only 38% of Republicans believe that.9

Our inclination to side with our team leads us to uncritically accept argument and evidence consistent with our position, and to reject opposing argument and evidence. It also encourages us to use bad argument and evidence to try to sway others. Sometimes this is conscious, sometimes unconscious. Even the wisest among us are susceptible. It’s particularly likely in formats designed to persuade, such as political speeches, talk radio, op-eds, and films.

If we want to make up our own mind, and we aren’t capable of digesting the scientific literature, we need other sources of information. There are many: books, magazine and newspaper articles, op-eds, online essays and posts, movies, videos, lectures, discussions, and more. Which ones should we trust?

A good source will have many or all of the following features: It will address a question, rather than an issue. It will evaluate evidence. It will formulate a conclusion. It will consider objections and alternatives to the conclusion. It will acknowledge uncertainty.

Just as valuable, perhaps, is to be aware of what makes for a bad source of information. Here are some red flags10:

Presents only one side, with no attention to objections or counterarguments. Occasionally this is justifiable because space is limited, as in an op-ed or a brief interview spot. But usually it’s a signal that the conclusion hasn’t been carefully thought through.

Addresses weak arguments that the other side doesn’t actually make. This is a common trick of op-ed writers and politicians. Arguing against a straw man makes your case appear more sensible.

Asks a question, but then answers a different one. This too is quite common. The author begins with the question of interest, but then shifts, almost imperceptibly, to a different question, the answer to which just happens to be their preferred solution or position.

Generalizes from an anecdote. Journalists and politicians typically begin with a story about a person, group, place, or event. This is an expository technique aimed at hooking you in, at grabbing your attention. Rarely is it, in and of itself, evidence for or against a hypothesis. Sometimes a source will intentionally generalize from a single anecdote or a small number of them. That’s bad. A single story might be representative of a larger phenomenon, but it just as easily might be the exception to the rule.

Tells some history and then implies that today’s similarities or differences give us the solution. This is common in books and documentaries. History helps to put our current situation in perspective, and many people find it interesting. But these sources sometimes use the history to draw excessively simplistic conclusions. They focus on one or two changes and argue that if these were reversed, we could get back to the desirable outcome we used to have. The problem is that lots of other things may have changed as well, so that reversing one or two institutions or policies might not have the same effect as in the past.

Posits a tradeoff, implying that we can have one thing or the other but not both. Societies and individuals do face tradeoffs; we can’t always have our cake and eat it too. But too often a source will posit a tradeoff because it seems plausible that there is one, even if the evidence suggests that it’s actually weak or nonexistent. Commentators frequently say that the United States could afford more generous social programs if we were to increase taxes, but that this would reduce investment and economic growth. That’s a reasonable hypothesis, but it turns out that most of the evidence suggests it’s incorrect.

Uses a theory based on highly simplified assumptions to draw inferences about the real world. This can be a useful analytical tool, particularly for clarifying our thinking about causal processes. It’s been the bread and butter of academic economists for much of the past century. But it’s vital that the reader be told when the theory (“model”) is aimed mainly at crystallizing thinking rather than at explaining actual empirical phenomena.

Treats a single scientific study as though it were definitive. Almost never does a single study get everything right. Many findings turn out to be not quite right or altogether wrong. Knowledge advances via reanalysis and additional analysis.

Suggests that there are only two alternatives and one of them is extreme. The social world is complex, and often there are multiple ways to attack a problem. But it’s easier to argue for your favored strategy if you can convince your audience that there really are only two options, and the other one isn’t realistic or isn’t consistent with their values.

Defers to public opinion. Saying “it’s what Americans want” is a common sidestep. Public opinion can be interesting and important, but it seldom tells us anything about whether the evidence supports a particular policy or position.

Oversimplifies with categories. Grouping things — individuals, organizations, countries — into categories can enhance our understanding by alerting us to key differences. But sometimes it hides more than it reveals. Hans Rosling notes, for instance, that “human beings have a strong dramatic instinct toward binary thinking, a basic urge to divide things into two distinct groups, with nothing but an empty gap in between. We love to dichotomize…. The gap instinct makes us imagine division where there is just a smooth range…. When people say ‘developing’ and ‘developed,’ what they are probably thinking is ‘poor countries’ and ‘rich countries’…. Today, most people are in the middle. There is no gap between the West and the rest, between developed and developing, between rich and poor. And we should all stop using the simple pairs of categories that suggest there is.”11

Uses data tricks. Misleading use of data, whether intentional or accidental, is rampant. So be alert. One common trick is to exaggerate size by using a really big number. Someone who wants us to be alarmed at the size of Social Security will say that it costs $850 billion a year, without mentioning that this is only about 5% of the country’s GDP. Another trick is to overstate or understate change. Suppose the share of Americans receiving disability payments increases from 1% to 2%. In percentage-point terms this is a fairly small increase, but if we say that disability receipt doubled, it sounds quite large. If we convey this via a line graph and have the vertical axis range from 1 to 2, it will appear to the casual observer that there was a massive jump. Similarly, we could make it look as though there were no change at all by having the vertical axis range from 0 to 100.
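To see how much the choice of axis range matters, here is a sketch using Python’s matplotlib that plots the invented one-to-two-percent disability figures three ways:

```python
import matplotlib.pyplot as plt

years = [2000, 2020]
disability_share = [1, 2]  # percent receiving disability payments (invented figures)

fig, axes = plt.subplots(1, 3, figsize=(9, 3))
titles = ["Axis 1 to 2: looks huge", "Axis 0 to 100: looks flat", "Axis 0 to 5: more honest"]
limits = [(1, 2), (0, 100), (0, 5)]

# Same two data points in every panel; only the vertical axis range changes.
for ax, title, (lo, hi) in zip(axes, titles, limits):
    ax.plot(years, disability_share, marker="o")
    ax.set_ylim(lo, hi)
    ax.set_title(title, fontsize=9)
    ax.set_ylabel("Percent")

plt.tight_layout()
plt.show()
```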

If your source of information exhibits one or more of these red flags, be skeptical.

GOOD SCIENCE AND GOOD ARGUMENT

Social science isn’t easy. But if we want to make the United States (or any other country) better, we depend on it. Given this, it’s a good idea to try to practice it and consume it effectively.


  1. See, for example, Lane Kenworthy, “Happiness,” The Good Society. 
  2. Many, many reports by good journalists and academics ignore the strength of effect. Here are two examples: Margot Sanger-Katz, “Income Inequality: It’s Also Bad for Your Health,” New York Times, March 30, 2015; Jane Brody, “Nuts Are a Nutritional Powerhouse,” New York Times, March 31, 2015. 
  3. For an introduction, see Brandon Foltz, “Statistics 101: Multiple Regression (Part 1), The Very Basics”; Lane Kenworthy, “Toward Improved Use of Regression in Macrocomparative Analysis,” Comparative Social Research, 2007. 
  4. Glenn Firebaugh, Seven Rules for Social Research, Princeton University Press, 2008, ch. 5; Joshua D. Angrist and Jorn-Steffen Pischke, Mostly Harmless Econometrics, Princeton University Press, 2009, pp. 227-243; Lane Kenworthy, “Step Away from the Pool,” Newsletter of the American Political Science Association Organized Section for Qualitative and Multi-Method Research, 2011. 
  5. David Card and Alan B. Krueger, Myth and Measurement: The New Economics of the Minimum Wage, Princeton University Press, 1995; Arindrajit Dube, T. William Lester, and Michael Reich, “Minimum Wage Effects Across State Borders: Estimates Using Contiguous Counties,” Review of Economics and Statistics, 2010. 
  6. David Collier, “Understanding Process Tracing,” PS: Political Science and Politics, 2011. 
  7. There are exceptions. See, for instance, Charles Ragin, The Comparative Method, University of California Press, 1987; Gary Goertz and Harvey Starr, eds., Necessary Conditions, Rowman and Littlefield, 2003. 
  8. For more, see Angus Deaton, “Instruments, Randomization, and Learning about Development,” Journal of Economic Literature, 2010; Martin Ravallion, “Fighting Poverty One Experiment at a Time,” Journal of Economic Literature, 2012; Angus Deaton and Nancy Cartwright, “Understanding and Misunderstanding Randomized Controlled Trials,” Social Science and Medicine, 2018; Christopher J. Ruhm, “Shackling the Identification Police?,” Southern Economic Journal, 2019. 
  9. Jocelyn Kiley, “Ideological Divide Over Global Warming as Wide as Ever,” Pew Research Center, 2015. 
  10. See also Joel Best, Damned Lies and Statistics, University of California Press, 2001; Noah Smith, “A Few Words about Math,” Noahpinion, 2013; Noah Smith, “Most of What You Learned in Econ 101 Is Wrong,” Bloomberg View, 2015; Jon Bakija, Lane Kenworthy, Peter Lindert, and Jeff Madrick, How Big Should Our Government Be?, University of California Press, 2016; Hans Rosling, Factfulness, Flatiron Books, 2018. 
  11. Rosling, Factfulness, ch. 1. See also Charles Tilly, Durable Inequality, University of California Press, 1998.