Soci 109: Analysis of Sociological Data (2020)

University of California-San Diego
Winter 2020
MWF 8:00–8:50, SSB 101

Lane Kenworthy
Email: lkenworthy@ucsd.edu
Office hours: F 9:00–11:00, SSB 472

COURSE DESCRIPTION

This course introduces you to strategies, techniques, and software for analyzing quantitative social science data. I’ll emphasize the following:

  • Ask a good research question
  • Measure
  • Describe
  • Graph
  • Compare
  • Control
  • Pay attention to the magnitude, not just the existence, of effects
  • Where possible, use multiple types of data
  • Don’t infer causation from correlation alone
  • Be thorough
  • Be skeptical
  • Admit your uncertainty
  • Write clearly and simply

Here’s what you’ll need:

  • The course readings, videos, and data sets are available via the links below, at no cost.
  • You’ll need to purchase access to Stata, a statistical software program. For this course you only need a six-month “Student” license, which is $48 (at the site, click on “6-month”).
  • We’ll make a lot of use of this data set: General Social Survey 1972-2018 (Excel). To check variable names, question wording, and more, use the GSS codebook.
  • You’ll find it helpful to refer regularly to this “Stata Quick Guide.”
  • I’ve put the Stata commands and data sets for some of the graphs we’ll examine on a “Sample Graph Commands” page.

Grading: class attendance 15%, short assignments 60%, research report 25%. Details are below. I’ll post grades on Canvas.

SCHEDULE

Week 1
January 6, 8, 10
Social science, scatterplots, Stata, the General Social Survey

Readings and videos

  • Lane Kenworthy, “How Do We Know?,” The Good Society. READING
  • Stata Corp, “Getting Started in Stata,” 22 minutes. VIDEO

In-class analyses

  • Does the economy influence presidential election outcomes? Kenworthy, “Democracy,” The Good Society, figure 20. GRAPH
  • Has life gotten better? Hans Rosling, “200 Countries, 200 Years, 4 Minutes,” Gapminder, 2016. VIDEO

Weeks 2 and 3
January 13, 15, 17, 22, 24 (no class Jan 20)
Scatterplots, regression

Readings and videos

  • Khan Academy, “Mean, Median, and Mode,” 9 minutes. VIDEO
  • Khan Academy, “Standard Deviation,” 8 minutes. VIDEO
  • Khan Academy, “Scatterplots,” 2 minutes. VIDEO
  • Khan Academy, Regression 1, 7 minutes. VIDEO
  • Khan Academy, Regression 2, 9 minutes. VIDEO
  • Khan Academy, “Correlation and Causality,” 11 minutes. VIDEO
  • David Spiegelhalter, “What Causes What?,” The Art of Statistics, Basic Books, 2019, pp. 95-119. READING
  • Tim Harford, “Cigarettes, Damn Cigarettes, and Statistics,” Financial Times, April 10, 2015. READING
  • Bruce W. Hardy and Jessica Castonguay, “The Moderating Role of Age in the Relationship between Social Media Use and Mental Well-Being: An Analysis of the 2016 General Social Survey.” Computers in Human Behavior, 2018. READING

In-class analyses

  • Does education improve individuals? Does it improve countries? Kenworthy, “What Good Is Education?,” The Good Society, figures 2, 9. GRAPHS
  • How unequal is opportunity in America? Kenworthy, “Equality of Opportunity,” The Good Society, figure 2. GRAPH
  • Is the south less educated? If so, has it been catching up? General Social Survey data.
  • Does social media make people unhappy? General Social Survey data.
  • Does life expectancy rise as countries get richer? Kenworthy, “Progress,” The Good Society, figure 13. GRAPH
  • Does income increase happiness? Kenworthy, “Happiness,” The Good Society, figures 1, 2, 6. GRAPHS

Week 4
January 27, 29, 31
Line graphs

Readings and videos

  • Khan Academy, “Line Graph,” 2 minutes. VIDEO
  • Claire Cain Miller, “The Divorce Surge Is Over, But the Myth Lives On,” New York Times, 2014. READING

In-class analyses

  • Is earth warming? Alberto Cairo, How Charts Lie: Getting Smarter about Visual Information, W.W. Norton, 2019, p. 37. GRAPH. Kenworthy, “Climate Stability,” The Good Society, figure 4. GRAPH
  • Do Americans want a big welfare state? Kenworthy, “How Much Public Insurance Do Americans Want?,” The Good Society, figures 1-27. GRAPHS
  • How does life expectancy in the US compare to other rich democratic countries? Kieran Healy, Data Visualization: A Practical Introduction, Princeton University Press, 2019, figure 4.21. GRAPH. Kenworthy, “Longevity,” The Good Society, figure 4. GRAPH
  • How much has income inequality increased in the United States? Kenworthy, “Income Distribution,” The Good Society, figures 3, 7, 8, 9. GRAPHS
  • How progressive are state taxes? Kenworthy, “Taxes: Additional Data,” The Good Society, figure A3. GRAPH
  • Is it only Democrats who have gotten more progressive on social issues? Kenworthy, “Personal Freedom: Additional Data,” The Good Society, figure A8. GRAPH
  • Did lead poisoning cause the surge in violent crime? Kenworthy, “Safety,” The Good Society, figure 16. GRAPH
  • What impact have changes in abortion law had on the incidence of abortion? Kenworthy, “Personal Freedom,” The Good Society, figure 14. GRAPH
  • Does it matter whether the president is a Democrat or a Republican? Kenworthy, “Do Election Outcomes Matter?,” The Good Society. GRAPHS
  • Is America getting less religious? Kenworthy, “Religion,” The Good Society, figure 11. GRAPH

Weeks 5 and 6
February 3, 5, 7, 10, 12, 14
Multiple regression

Readings and videos

  • Lane Kenworthy and Melissa Malami, “Gender Inequality in Political Representation: A Worldwide Comparative Analysis,” Social Forces, 1999. READING
  • William Zinsser, On Writing Well, excerpt, 2006. READING

In-class analyses

  • What determines household income? General Social Survey data.
  • What determines speeding ticket fine amounts? Makowsky et al. data set. LINK
  • What determines state-level presidential vote outcomes? State politics data set. LINK
  • Do cell phones distract drivers and cause accidents? Cell phone data set. LINK
  • What causes the gender pay gap? General Social Survey data.
  • Is democracy good for political stability? Political stability data set. LINK

Weeks 7 and 8
February 19, 21, 24, 26, 28 (no class Feb 17)
Maps, histograms, dot plots (bar graphs)

Readings and videos

  • Alberto Cairo, How Charts Lie: Getting Smarter about Visual Information, W.W. Norton, 2019, pp. 1-7. READING
  • Khan Academy, “Histograms,” 6 minutes. VIDEO
  • Khan Academy, “Bar Graph,” 3 minutes. VIDEO

In-class analyses

  • Which parts of America vote Democratic and which parts vote Republican? Kenworthy, “Is America Too Polarized?,” The Good Society, figures 1-2. GRAPHS
  • How do states differ in their marijuana laws? Kenworthy, “Marijuana,” The Good Society, figures 1-2. GRAPHS
  • What parts of the US have large concentrations of African Americans? Kieran Healy, Data Visualization: A Practical Introduction, Princeton University Press, 2019, figure 7.12. GRAPH
  • How will climate change affect temperatures in the US? Heidi Cullen, “Think It’s Hot Now? Just Wait,” New York Times, 2016. GRAPHS
  • Where have opioid-related deaths increased the most? Kieran Healy, Data Visualization: A Practical Introduction, Princeton University Press, 2019, figures 7.16, 7.18. GRAPHS
  • Have Americans become more polarized in their political views? Kenworthy, “Is America Too Polarized?,” The Good Society, figures 3-4. GRAPHS
  • Have Americans gotten more educated in the past half century? General Social Survey data. Command: hist educ5, percent by(period_2, col(1)).
  • Are immigrants less educated than native-born Americans? General Social Survey data. Command: hist educ5, percent by(born, col(1)).
  • How do rich democracies vary in helping people have a balanced life? Kenworthy, “Work-Family-Leisure Balance,” The Good Society, figure 8. GRAPH
  • Are views about climate change determined by education? General Social Survey data.
  • Are Americans still more civically engaged than people in other rich democracies? Kenworthy, “Civic Engagement,” The Good Society, figures 3-5. GRAPHS

Week 9
March 2, 4, 6
Difference-in-differences regression, regression discontinuity

In-class analyses

  • Does inequality in salaries help or hurt the pay of ordinary pro athletes? Kenworthy, “Is Winner-Take-All Bad or Good for the Middle Class? Evidence from Baseball,” Consider the Evidence, 2011. ANALYSES
  • Does income inequality increase obesity? Kenworthy, “Weight Moderation,” The Good Society, figures 10-13. ANALYSES
  • Do educational degrees boost income? General Social Survey data. Command: scatter coninc_mean educ || lfit coninc_mean educ if educ<=11 || lfit coninc_mean educ if educ>=12 & educ<=15 || lfit coninc_mean educ if educ>=16 & educ<=20 || , xline(11.5) xline(15.5) legend(off)
  • Does birth date affect athletic success? Malcolm Gladwell, Outliers: The Story of Success, Little, Brown, and Company, 2008, ch. 1. READING

Week 10
March 9, 11, 13
Putting it all together

Readings and videos

  • Khan Academy, “Hypothesis Testing and P-Values,” 11 minutes. VIDEO

In-class analyses

  • Do public social programs help the poor? Lane Kenworthy, Social Democratic Capitalism, Oxford University Press, 2020, figures 2.5, 2.6, 2.7, 2.17.
  • Why is interpersonal trust so low in the US? Kenworthy, “Trust,” The Good Society, figures 4-16. ANALYSES
  • What are the consequences of income inequality? Kenworthy, “Is Income Inequality Harmful?,” The Good Society. ANALYSES
  • Why have deaths among middle-aged white Americans increased? Kenworthy, “Longevity,” The Good Society, figures 10-13. ANALYSES

Exam week
Research report due: Monday, March 16, 11:00am.

COURSE AIMS

Here’s what you should be able to do by the end of this course:

  • Evaluate someone else’s use of quantitative data
  • Understand and explain the difference between correlation and causation
  • Decide which analytical tool is best for answering a question
  • Enter data into a statistical software program
  • Create and interpret a scatterplot graph
  • Create and interpret a line graph
  • Create and interpret a histogram
  • Execute and interpret a basic linear regression analysis
  • Execute and interpret a regression analysis with multiple independent variables
  • Execute and interpret a regression analysis with categorical independent variables
  • Understand a curvilinear regression analysis
  • Understand a regression analysis with interaction effects
  • Understand a difference-in-differences regression analysis
  • Understand a regression discontinuity analysis
  • Understand “statistical significance,” when it is relevant, and what it tells us
  • Understand and explain the limits of quantitative data for answering a question

STATA RESOURCES

Stata guide

Stata help. In Stata, type the following:

  • help schemes
  • help summarize
  • help regress
  • help correlate
  • help graph
  • help scatterplot
  • help histogram
  • help graph bar

Stata videos

GRADING

Course grades will be based on the following:

  • 15%: attendance
  • 60%: short assignments 1-8
  • 25%: research report

Each of these will be graded on a scale of 0 to 100. So your numerical course grade is calculated as: (attendance x .15) + (average score on assignments 1-8 x .60) + (research report x .25).
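As a worked example of this formula, here is the calculation for a purely hypothetical student with an attendance grade of 100, an assignment average of 88, and a research report score of 92. Stata's display command can do the arithmetic:

```stata
* Hypothetical scores, for illustration only
display (100 * .15) + (88 * .60) + (92 * .25)
```

By hand, that is 15 + 52.8 + 23 = 90.8.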

Your letter grade for the course will be determined as follows:

  • 96.67 and above = A+
  • 93.34–96.66 = A
  • 90–93.33 = A–
  • 86.67–89.99 = B+
  • 83.34–86.66 = B
  • 80–83.33 = B–
  • 76.67–79.99 = C+
  • 73.34–76.66 = C
  • 70–73.33 = C–
  • 60–69.99 = D
  • below 60 = F

There will be no extra-credit projects or assignments.

ATTENDANCE

I won’t take attendance the first two weeks. In weeks 3-10, you’re allowed to miss three classes without penalty. Your attendance grade depends on the number of classes you miss:

  • Three or fewer: 100
  • Four: 90
  • Five: 80
  • Six: 70
  • Seven: 60
  • More than seven: 0

You can miss a class day without it counting as one of your three freebies for any of the following reasons: (1) holidays or special events observed by organized religions (for students who show affiliation with that particular religion), (2) absences pre-approved by the UCSD Dean of Students (or Dean’s designee), (3) extended illness (this requires a doctor’s note). I will need written verification of the circumstances.

SHORT ASSIGNMENTS

Short assignment 1

  • Purchase Stata 16 and install it on your computer. The link is above.
  • Open Stata and take a screenshot showing “Licensed to: your name.”
  • Download the General Social Survey data set (link is above). Open the data file in Excel. Copy and paste the data into a Stata file. Get a frequency distribution for the educ variable. What is the median number of years of schooling completed by American adults according to the GSS? What is the mean? What is the standard deviation? Create a new “grouped” version of educ, with the groups defined as follows: 1=0-11, 2=12, 3=13-15, 4=16, 5=17 or more. Name this new variable educ5. Get a frequency distribution for educ5. Get histograms for educ (ask for 20 bars) and educ5. Does the shape of the distribution differ depending on which version of the variable — educ or educ5 — you use? If so, why?
  • Upload your answers in Canvas. For the Stata output, either take screenshots and insert them into your word processing document or copy-and-paste the output into the word processing document. Your written answers should be no more than 1,000 words.
  • Due Wednesday, January 22, 8:00am.
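A minimal sketch of the Stata steps in assignment 1, not a complete answer. It assumes the GSS data are already in Stata and that valid educ values run from 0 to 20; check the frequency distribution and the codebook for how missing values are coded before you recode.

```stata
* Frequency distribution, then mean, median (50th percentile), and standard deviation
tabulate educ
summarize educ, detail

* Grouped version: 1=0-11, 2=12, 3=13-15, 4=16, 5=17 or more
* Assumption: valid values are 0-20. Values outside the listed ranges are
* copied unchanged, so inspect the result for stray codes.
recode educ (0/11=1) (12=2) (13/15=3) (16=4) (17/20=5), generate(educ5)
tabulate educ5

* Histograms: 20 bars for educ, one bar per category for educ5
histogram educ, bin(20) percent
histogram educ5, discrete percent
```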

Short assignment 2

  • Question: It’s often thought that people with more education are likely to have fewer children. Indeed, a common recommendation for countries that want to reduce population growth is to increase education. Is this hypothesis correct in the United States?
  • Analytical tools: scatterplot and regression.
  • Data: Use the GSS 1972-2018 data set that you downloaded for assignment 1. The variables are educ and childs. You might want to confine the analysis to people beyond peak childbearing age — say, age 45 and over.
  • Ignore statistical significance for this assignment.
  • Upload your answer in Canvas. No more than 1,000 words.
  • Due Monday, January 27, 8:00am.
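One way the scatterplot and regression for assignment 2 might be set up, as a sketch. The age cutoff of 45 is the one suggested in the assignment, and the jitter amount is an arbitrary choice. Stata treats missing values as larger than any number, so the condition also excludes cases with missing age.

```stata
* Scatterplot with a fitted line, people age 45 and over
* jitter() spreads out points that would otherwise sit on top of each other
scatter childs educ if age >= 45 & !missing(age), jitter(5) || lfit childs educ if age >= 45 & !missing(age)

* Bivariate regression of number of children on years of schooling
regress childs educ if age >= 45 & !missing(age)
```

The coefficient on educ is the estimated difference in number of children associated with one additional year of schooling.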

Short assignment 3

  • Question: Did the “war on poverty” fail?
  • Analytical tool: line graph.
  • The debate: In the mid-1960s, President Lyndon Johnson declared a “war on poverty” and the federal government began spending more money on programs that help low-income Americans. In the mid-1980s, President Ronald Reagan argued that “poverty won the war.” His reasoning was that despite the increase in government expenditures, the poverty rate stopped falling in the early 1970s. Others countered that this was a result of two developments: First, the economy had gotten much tougher for the poor, with low-end wages stagnating and job stability eroding. That increased the poverty rate, hiding the poverty-reducing impact of government programs. Second, while government cash benefits to elderly Americans (mainly Social Security) rose steadily over time, cash benefits for the working-aged poor, most importantly Aid to Families with Dependent Children (AFDC), had been decreasing rather than increasing since the early 1970s. This debate continues.
  • An empirical test: Create a line graph of the poverty rate from 1959 to 2018. Then create a second line graph with one line showing the poverty rate for 18-to-64-year-olds and another line showing the poverty rate for persons aged 65 and over. What does this suggest about which side is correct in the “war on poverty” debate?
  • Data: Census Bureau, “Historical Poverty Tables,” table 2 (poverty status of people by family relationship, race, and Hispanic origin) and table 3 (poverty status of people by age, race, and Hispanic origin). The data are in Excel. From table 2, copy and paste into Stata the following from the “all races” section: year, percent below poverty all people. From table 3, copy and paste into Stata the following from the “all races” section: year, percent below poverty 18 to 64 years, percent below poverty 65 years and over. You’ll need to clean up the “year” data in Stata, or perhaps enter the years manually.
  • Upload your answer in Canvas. No more than 1,000 words.
  • Due Monday, February 3, 8:00am.
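A sketch of the two line graphs for assignment 3. All variable names here (year_raw, pov_all, pov_18to64, pov_65plus) are hypothetical; yours will depend on how you paste and name the Census columns. It also assumes the three series sit in one data set with one row per year.

```stata
* If the pasted year column is text with footnote marks attached,
* keep the first four characters and convert them to a number
generate year = real(substr(year_raw, 1, 4))

* Overall poverty rate
line pov_all year, sort

* Working-age and elderly poverty rates on one graph
line pov_18to64 pov_65plus year, sort legend(order(1 "Ages 18 to 64" 2 "Ages 65 and over"))
```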

Short assignment 4

  • Question: In a 2011 Scientific American article, “How New York Beat Crime,” Franklin Zimring contends that New York City’s policing strategy reduced violent crime. He points out that the city’s policing strategy changed in the early 1990s, and after that its homicide rate fell dramatically. Yet it fell in other cities too. If Zimring’s hypothesis is correct, we would expect the murder rate to have decreased more in New York than in other large cities. Did it?
  • Analytical tool: line graph.
  • Data: Homicide rates from 1985 to 2012 for all large cities in metro areas with 3 million or more people. Data source: city police departments, via FBI Uniform Crime Reports. The data: Homicide, 19 large US cities, 1985-2012 (Excel); original data source.
  • Upload your answer in Canvas. No more than 1,000 words.
  • Due Monday, February 10, 8:00am.

Short assignment 5

  • Question: Why do some people watch more television than others?
  • Analytical tools: scatterplot and regression.
  • Data: Use the GSS 1972-2018 data set that you downloaded for assignment 1. It includes a variable called tvhours, which is the (self-reported) number of hours per day the respondent watches television. Use multivariate OLS regression to examine the following variables as possible causes of time spent watching TV: age, sex, race, region (recode as south, other), education (educ), work status (wrkstat: recode as employed full-time, other), subjective class position (class), number of children (childs), US-born or immigrant (born), health, happiness (happy).
  • Check for, and if necessary deal with, any outliers (extreme values) for the tvhours variable.
  • Ignore statistical significance for this assignment.
  • Upload your answer in Canvas. No more than 1,000 words.
  • Show your Stata commands in an appendix at the end of your write-up. (This won’t count toward your word total.)
  • Due Wednesday, February 19, 8:00am.
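A sketch of how the recodes and the regression for assignment 5 might look. Several things here are assumptions to verify against the GSS codebook: that region values 5-7 are the south, that wrkstat value 1 is working full-time, and that missing values are stored as Stata missing rather than as numeric codes. The cutoff of 16 hours is only an illustrative way of handling outliers, not a recommended value.

```stata
* Look for extreme values of the outcome
summarize tvhours, detail

* Two-category recodes (assumed codings: region 5-7 = south, wrkstat 1 = full-time)
generate south = inrange(region, 5, 7) if !missing(region)
generate fulltime = (wrkstat == 1) if !missing(wrkstat)

* Multiple regression; i. treats sex, race, and born as categorical
regress tvhours age i.sex i.race south educ fulltime class childs i.born health happy if tvhours <= 16
```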

Short assignment 6

  • Question: What causes family instability?
  • Analytical tools: scatterplot and regression.
  • The share of American children who grow up in a single-parent family has increased steadily since the 1960s. Hypotheses about the key culprits abound. They include: (1) urbanization (the traditional stigma against divorce and out-of-wedlock childbearing erodes more rapidly in cities); (2) secularization (decline in religiosity erodes those stigmas); (3) declining social capital (Robert Putnam’s 2000 book Bowling Alone attributed the worsening of an array of social problems to the decrease in social capital in the US); (4) immigration (heterogeneity reduces the influence of norms); (5) poverty (low income puts strain on relationships); (6) income inequality (according to Richard Wilkinson and Kate Pickett, in their book The Spirit Level, income inequality increases anxiety and stress and thereby contributes to family dissolution); (7) labor force participation (William Julius Wilson, in his books The Truly Disadvantaged and When Work Disappears, hypothesized that lack of employment reduces family stability by reducing the number of marriageable males and by contributing to the erosion of other supportive institutions, such as community organizations); (8) college graduation rate (people with a college degree tend to wait longer before having a child).
  • Data: Raj Chetty and colleagues have assembled data for 741 commuting zones in the United States in the early 2000s: Family stability and more, 741 US commuting zones, 2000s (Excel). For the outcome, family instability, use the variable cs_fam_wkidsinglemom, which is the fraction of children living with a single mother. Use the following variables to assess the hypotheses: (1) intersects_msa (0=nonurban, 1=urban); (2) rel_tot (fraction religious); (3) scap_ski90pcm (social capital index); (4) cs_born_foreign (fraction foreign born); (5) hhinc00 (household income per capita); (6) gini (larger numbers indicate higher income inequality); (7) cs_labforce (labor force participation rate); (8) gradrate_r (college graduation rate, income adjusted).
  • Ignore statistical significance for this assignment.
  • Upload your answer in Canvas. No more than 1,000 words.
  • Show your Stata commands in an appendix at the end of your write-up. (This won’t count toward your word total.)
  • Due Wednesday, February 26, 8:00am.
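A sketch of the analysis for assignment 6, using the variable names listed in the assignment. The choice of rel_tot for the example scatterplot is arbitrary, and the beta option is one possible way to compare magnitudes across predictors measured in different units, not a required step.

```stata
* Scatterplot with fitted line for one hypothesized cause at a time
scatter cs_fam_wkidsinglemom rel_tot || lfit cs_fam_wkidsinglemom rel_tot

* Multiple regression with all eight hypothesized causes
regress cs_fam_wkidsinglemom intersects_msa rel_tot scap_ski90pcm cs_born_foreign hhinc00 gini cs_labforce gradrate_r

* Same model, reporting standardized coefficients
regress cs_fam_wkidsinglemom intersects_msa rel_tot scap_ski90pcm cs_born_foreign hhinc00 gini cs_labforce gradrate_r, beta
```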

Short assignment 7

  • Question: Religiosity has been declining in the United States, but Americans remain more religious than their counterparts in other rich democratic nations. What groups of Americans are the most and least religious these days?
  • Analytical tool: dot plot (bar graph).
  • Data: Use the GSS 1972-2018 data set that you downloaded for assignment 1. The variable for religiosity is attend. The values are 0 = never, 1 = less than once a year, 2 = once a year, 3 = several times a year, 4 = once a month, 5 = two or three times a month, 6 = nearly every week, 7 = every week, 8 = more than once a week. Recode it into a form that will allow you to calculate a mean: days per year. To do this, make a copy of the attend variable and call it attend_daysperyear. Then recode it as follows: recode attend_daysperyear 9=. 0=0 1=0 2=1 3=3 4=12 5=30 6=45 7=52 8=78. Then get the mean of attend_daysperyear for the following groups: sex (women, men), age (18-34, 35-64, 65-89), race (white, black, other), region (1-2 = northeast, 3-4 = midwest, 5-7 = south, 8-9 = west), educ (1-12, 13-15, 16, 17-20). Graph the means for these 16 groups with a dot plot, with the groups shown in descending order according to average attend_daysperyear.
  • Upload your answer in Canvas. No more than 1,000 words.
  • Show your Stata commands in an appendix at the end of your write-up. (This won’t count toward your word total.)
  • Due Monday, March 2, 8:00am.
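A sketch of the recode and the group means for assignment 7, shown for region only. The new variable name region4 is hypothetical; the recode values are the ones given in the assignment.

```stata
* Copy attend and convert its categories to approximate days per year
generate attend_daysperyear = attend
recode attend_daysperyear 9=. 0=0 1=0 2=1 3=3 4=12 5=30 6=45 7=52 8=78

* Collapse the nine regions into four labeled areas
recode region (1/2=1 "northeast") (3/4=2 "midwest") (5/7=3 "south") (8/9=4 "west"), generate(region4)

* Mean attendance by area
tabstat attend_daysperyear, by(region4) statistics(mean)

* Dot plot of the means, highest first
graph dot (mean) attend_daysperyear, over(region4, sort(1) descending)
```

This produces a dot plot for one grouping. To show all 16 groups in a single plot, one option is to enter the 16 group names and their means as a small new data set and graph that. Since the question asks about “these days,” you may also want to restrict the analysis to recent survey years.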

Short assignment 8

  • Question: Why haven’t Americans gotten happier in the past half century?
  • Pick four (or more) of the determinants of happiness I mention in “Happiness” and explore their relationship with happiness in the United States over time. The data points (cases) will be years.
  • Analytical tools: line graph, scatterplot, and regression.
  • Data: Use the GSS data to calculate a measure of happiness in each GSS year from 1972 to 2018. You could use the share answering “not too happy,” use the share answering “very happy,” or calculate average happiness. The units are years. Use the GSS (and, if you wish, other data sources) to get data for the hypothesized causes you choose to examine.
  • Ignore statistical significance for this assignment.
  • Upload your answer in Canvas. No more than 1,000 words.
  • Show your Stata commands in an appendix at the end of your write-up. (This won’t count toward your word total.)
  • Due Monday, March 9, 8:00am.
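A sketch of how yearly data might be built for assignment 8. It assumes happy is coded 1 = very happy, 2 = pretty happy, 3 = not too happy (check the codebook), and it uses educ purely as a stand-in predictor because that variable is known to be in the data set. Substitute the determinants you choose.

```stata
* collapse replaces the data in memory, so keep a copy to return to
preserve

* Percent answering "very happy" among valid responses
generate veryhappy = 100 * (happy == 1) if inlist(happy, 1, 2, 3)

* One row per survey year, holding yearly averages
collapse (mean) veryhappy educ, by(year)

* Trend over time, then the relationship across years
line veryhappy year
scatter veryhappy educ || lfit veryhappy educ
regress veryhappy educ

restore
```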

RESEARCH REPORT

Write a report using quantitative data analysis to help answer a social science research question of your choosing. Follow the principles listed at the top of this syllabus.

Upload your report in Canvas. Emailed reports won’t be accepted. To turn it in, go to canvas.ucsd.edu, log in, choose this course, and click on the “Research report” module.

Length: No more than 2,500 words. Fewer is fine. If you need more words, put them in an appendix. The report should be typed single-spaced in a 12-point font with two-inch left and right margins. Put key graphs and tables in the text. Put appendix graphs and tables, if any, at the end.

Show your Stata commands in an appendix at the end. (This won’t count toward your word total.)

Use footnotes to give credit to anyone from whom you borrow evidence or argument. I’m not picky about the formatting of the footnotes, but be sure to include the author(s), title, and year; don’t simply list an internet address.

Don’t plagiarize. If you aren’t sure what constitutes plagiarism and how to avoid it, see the UC San Diego Library’s guide to preventing plagiarism.

The due date is listed in the schedule above. A paper turned in late but within 48 hours of the deadline will be penalized 25 points (out of 100). A paper turned in more than 48 hours late, or not turned in at all, will receive a grade of zero.

Some possible data sources for your research report and beyond

ACADEMIC INTEGRITY

Students are encouraged to share intellectual views and discuss freely the principles and applications of course materials. However, graded work must be the product of independent effort unless otherwise instructed. Students are expected to adhere to UC San Diego policy on academic integrity.

SPECIAL NEEDS AND ACCOMMODATIONS

Students who need special accommodation or services should contact the Office for Students with Disabilities (OSD). You must register and request that the OSD send me official notification of your accommodation needs as soon as possible. Please meet with me to discuss accommodations and how the course requirements and activities may impact your ability to fully participate.

SUBJECT TO CHANGE

Information here, other than the grade and attendance policy, may be subject to change with advance notice, as deemed appropriate by the instructor.