Soci 109: Analysis of Sociological Data

DRAFT * DRAFT * DRAFT * DRAFT * DRAFT

University of California-San Diego
Winter 2020
MWF 8:00–8:50, SSB 101

Lane Kenworthy
Email: lkenworthy@ucsd.edu
Office hours: F 9:00–11:00, SSB 472

This course introduces you to strategies, techniques, and software for analyzing quantitative social science data. I’ll emphasize the following:

  • Ask a good research question
  • Measure
  • Describe
  • Graph
  • Compare
  • Control
  • Pay attention to the magnitude, not just the existence, of effects
  • Where possible, use multiple types of data
  • Don’t infer causation from correlation alone
  • Be thorough
  • Be skeptical
  • Admit your uncertainty
  • Write clearly and simply

The course readings, videos, and data sets are available via the links below, at no cost. You’ll need to purchase access to Stata, a statistical software program. For this course you only need a six-month “Student” license, which is $48 (at the site, click on “6-month”).

You’ll also need this data set: General Social Survey 1972-2018 (Excel).

Grading: class attendance 15%, short assignments 60%, research report 25%. Details are below. I’ll post grades on Canvas.

SCHEDULE

Week 1
January 6, 8, 10
Social science, scatterplot, Stata, the General Social Survey

  • Reading: Kenworthy, “How Do We Know?,” The Good Society. LINK
  • In-class analysis: Does the economy influence presidential election outcomes? Kenworthy, “Democracy,” The Good Society, figure 20. LINK
  • In-class video: Hans Rosling, “200 Countries, 200 Years, 4 Minutes,” Gapminder, 2016. LINK
  • Video: Stata Corp, “Getting Started in Stata” (22:01). LINK

Week 2
January 13, 15, 17
Scatterplot, regression

  • Video: Khan Academy, “Mean, Median, and Mode” (8:53). LINK
  • Video: Khan Academy, “Standard Deviation” (8:04). LINK
  • In-class analysis: Does education improve individuals? What about countries? Kenworthy, “What Good Is Education?,” The Good Society, figures 1-20. LINK
  • In-class analysis: Does social media make people unhappy? General Social Survey data.
  • Reading: Bruce W. Hardy and Jessica Castonguay, “The Moderating Role of Age in the Relationship between Social Media Use and Mental Well-Being: An Analysis of the 2016 General Social Survey.” Computers in Human Behavior, 2018. LINK
  • In-class analysis: Is the south less educated? If so, has it been catching up? General Social Survey data.

Week 3
January 20, 22, 24
Scatterplot, regression

  • Video: Khan Academy, “Scatterplots” (2:21). LINK
  • Video: Khan Academy, “Regression 1” (7:47). LINK
  • Video: Khan Academy, “Regression 2” (9:14). LINK
  • Video: Khan Academy, “Correlation and Causality” (10:44). LINK
  • Reading: Tim Harford, “Cigarettes, Damn Cigarettes, and Statistics.” Financial Times, April 10, 2015. LINK
  • In-class analysis: Does life expectancy rise as countries get richer? Kenworthy, “Progress,” The Good Society, figure 13. LINK
  • In-class analysis: Does income increase happiness? Kenworthy, “Happiness,” The Good Society, figures 1, 2, 6. LINK
  • In-class analysis: How unequal is opportunity in America? Kenworthy, “Equality of Opportunity,” The Good Society, figure 2. LINK

Week 4
January 27, 29, 31
Line graph

  • Video: Khan Academy, “Line Graph” (2:21). LINK
  • In-class analysis: Do Americans want a big welfare state? Kenworthy, “How Much Public Insurance Do Americans Want?,” The Good Society, figures 1-27. LINK
  • In-class analysis: How much has income inequality increased in the United States? Kenworthy, “Income Distribution,” The Good Society, figures 3, 7, 8, 9. LINK
  • In-class analysis: How progressive are state taxes? Kenworthy, “Taxes: Additional Data,” The Good Society, figure A3. LINK
  • In-class analysis: Does it matter whether the president is a Democrat or a Republican? Kenworthy, “Do Election Outcomes Matter?,” The Good Society. LINK
  • In-class analysis: Is America getting less religious? Kenworthy, “Religion,” The Good Society, figure 11. LINK
  • Reading: Claire Cain Miller, “The Divorce Surge Is Over, But the Myth Lives On,” New York Times, 2014. LINK

Week 5
February 3, 5, 7
Multiple regression

  • Reading: Lane Kenworthy and Melissa Malami, “Gender Inequality in Political Representation: A Worldwide Comparative Analysis,” Social Forces, 1999. LINK
  • In-class analysis: What determines speeding ticket fine amounts? Makowsky et al. data set. LINK
  • In-class analysis: What determines household income? General Social Survey data.
  • In-class analysis: What determines state-level presidential vote outcomes? State politics data set. LINK

Week 6
February 10, 12, 14
Multiple regression

  • Reading: William Zinsser, On Writing Well, excerpt, 2006. LINK
  • In-class analysis: Do cell phones distract drivers and cause accidents? Cell phone data set. LINK
  • In-class analysis: What causes the gender pay gap? General Social Survey data.
  • In-class analysis: Is democracy good for political stability? Political stability data set. LINK

Week 7
February 17, 19, 21
Histogram

  • Video: Khan Academy, “Histograms” (6:07). LINK
  • In-class analysis: Are immigrants less educated than native-born Americans? General Social Survey data.
  • In-class analysis: Have Americans become more polarized politically? Kenworthy, “Is America Too Polarized?,” The Good Society, figure 4. LINK
  • In-class analysis: Have Americans gotten more educated in the past half century? General Social Survey data.

Week 8
February 24, 26, 28
Dot plot (bar chart)

  • Video: Khan Academy, “Bar Graph” (2:58). LINK
  • In-class analysis: How do rich democracies vary in helping people have a balanced life? Kenworthy, “Work-Family-Leisure Balance,” The Good Society, figure 8. LINK
  • In-class analysis: Are views about climate change determined by education? General Social Survey data.
  • In-class analysis: Are Americans still more civically engaged than people in other rich democracies? Kenworthy, “Civic Engagement,” The Good Society, figures 2-5. LINK

Week 9
March 2, 4, 6
Difference-in-differences regression, regression discontinuity

  • In-class analysis: Does inequality in salaries help or hurt the pay of ordinary players? Kenworthy, “Is Winner-Take-All Bad or Good for the Middle Class? Evidence from Baseball,” Consider the Evidence, 2011. LINK
  • In-class analysis: Does income inequality increase obesity? Kenworthy, “Weight Moderation,” The Good Society, figures 10-13. LINK
  • In-class analysis: Do educational degrees boost income? General Social Survey data.

Week 10
March 9, 11, 13
Putting it all together

  • In-class analysis: Why is interpersonal trust so low in the US? Kenworthy, “Trust,” The Good Society, figures 4-16. LINK
  • In-class analysis: What are the consequences of income inequality? Kenworthy, “Is Income Inequality Harmful?,” The Good Society. LINK
  • In-class analysis: Why have deaths among middle-aged white Americans increased? Kenworthy, “Longevity,” The Good Society, figures 8-11. LINK
  • Video: Khan Academy, “Hypothesis Testing and P-Values” (11:26). LINK

COURSE AIMS

Here’s what you should be able to do by the end of this course:

  • Evaluate someone else’s use of quantitative data
  • Understand and explain the difference between correlation and causation
  • Decide which analytical tool is best for answering a question
  • Enter data into a statistical software program
  • Create and interpret a scatterplot graph in a statistical software program
  • Create and interpret a line graph in a statistical software program
  • Create and interpret a histogram in a statistical software program
  • Execute and interpret a basic linear regression analysis
  • Execute and interpret a regression analysis with multiple independent variables
  • Execute and interpret a curvilinear regression analysis
  • Execute and interpret a regression analysis with categorical independent variables
  • Execute and interpret a regression analysis with interaction effects
  • Execute and interpret a difference-in-differences regression analysis
  • Execute and interpret a regression discontinuity analysis
  • Understand “statistical significance,” when it is relevant, and what it tells us
  • Understand and explain the limits of quantitative data for answering a question

ADDITIONAL STATA RESOURCES

Stata help. In Stata, type the following:

  • help schemes
  • help summarize
  • help regress
  • help correlate
  • help graph
  • help scatter
  • help histogram
  • help graph bar
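
To see what the commands behind these help topics do, here is a minimal sketch that uses auto, an example data set installed with Stata. The variables used (price, mpg, foreign) are for illustration only and have nothing to do with the course data.

```stata
* Load an example data set that is installed with Stata
sysuse auto, clear

* Mean, standard deviation, minimum, and maximum
summarize price mpg

* Correlation between two variables
correlate price mpg

* Simple linear regression: outcome first, then predictor
regress price mpg

* Scatterplot: y variable first, then x variable
scatter price mpg

* Histogram with 10 bars
histogram mpg, bin(10)

* Bar chart of the mean price for each category of foreign
graph bar (mean) price, over(foreign)
```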

Additional Stata help

Stata videos

GRADING

Course grades will be based on the following:

  • 15%: attendance
  • 60%: short assignments 1-8
  • 25%: research report

Each of these will be graded on a scale of 0 to 100. So your numerical course grade is calculated as: (attendance x .15) + (average score on assignments 1-8 x .60) + (research report x .25).
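
As a worked example of the 15% / 60% / 25% weights listed above, you can use Stata as a calculator. The three scores below are hypothetical.

```stata
* Hypothetical scores on the 0-100 scale: attendance 90, assignments 85, report 80
display 0.15*90 + 0.60*85 + 0.25*80
```

This works out to 13.5 + 51 + 20 = 84.5, which falls in the B range in the table below.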

Your letter grade for the course will be determined as follows:

  • 96.67 and above = A+
  • 93.34–96.66 = A
  • 90–93.33 = A–
  • 86.67–89.99 = B+
  • 83.34–86.66 = B
  • 80–83.33 = B–
  • 76.67–79.99 = C+
  • 73.34–76.66 = C
  • 70–73.33 = C–
  • 60–69.99 = D
  • below 60 = F

There will be no extra-credit projects or assignments.

ATTENDANCE

You’re allowed to miss three classes without penalty. If you miss no more than three, your attendance grade will be 100. If you miss four days, your attendance grade will be 90. If you miss five, it will be 80. If you miss six, it will be 70. If you miss seven, it will be 60. If you miss more than seven, your attendance grade will be zero.
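
The attendance rule can also be written as a single Stata expression. This is just a sketch; the number of missed classes is a hypothetical value.

```stata
* Hypothetical number of classes missed
local missed = 5

* 100 for up to three absences, minus 10 per extra absence through seven, then 0
display cond(`missed' <= 3, 100, cond(`missed' <= 7, 100 - 10*(`missed' - 3), 0))
```

With five absences this displays 80, matching the schedule above.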

You can miss a class day without it counting as one of your three freebies for any of the following reasons: (1) holidays or special events observed by organized religions (for students who show affiliation with that particular religion), (2) absences pre-approved by the UCSD Dean of Students (or Dean’s designee), (3) extended illness (this requires a doctor’s note). I will need written verification of the circumstances.

SHORT ASSIGNMENTS

Assignment 1

  • Purchase Stata 16 and install it on your computer. The link is listed above.
  • Open Stata and take a screen shot showing “Licensed to: your name.”
  • Download the General Social Survey data set (link). Open the data file in Excel. Copy and paste the data into a Stata file. Get and print a frequency distribution for the educ variable. What is the median number of years of schooling completed by American adults according to the GSS? What is the mean? What is the standard deviation? Create a new “grouped” version of educ, with the groups defined as follows: 1=0-11, 2=12, 3=13-15, 4=16, 5=17 or more. Name this new variable educ5. Get and print a frequency distribution for educ5. Get and print histograms for educ (ask for 20 bars) and educ5. Does the shape of the distribution differ depending on which version of the variable — educ or educ5 — you use? If so, why?
  • Turn in a hard copy of your answers. No more than two single-spaced typed pages of text, with 12-point font and two-inch left and right margins. Attach your Stata printouts and screen shot.
  • Due Monday, January 20, at the beginning of class.
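
If you get stuck on the Stata steps for Assignment 1, here is a minimal sketch of one way to do them. It assumes the GSS data are already in Stata and that missing values of educ are stored as Stata missing (.), not as numeric codes; check this in your own file.

```stata
* Frequency distribution for years of schooling
tabulate educ

* Detailed summary: the 50th percentile is the median; also shows mean and std. dev.
summarize educ, detail

* Grouped version: 1=0-11, 2=12, 3=13-15, 4=16, 5=17 or more
recode educ (0/11 = 1) (12 = 2) (13/15 = 3) (16 = 4) (17/max = 5), generate(educ5)
tabulate educ5

* Histograms: 20 bars for educ, one bar per category for educ5
histogram educ, bin(20)
histogram educ5, discrete
```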

Assignment 2

  • Question: It is often thought that people with more education are likely to have fewer children. Indeed, a common recommendation for countries that want to reduce population growth is to increase education. Is this hypothesis correct in the United States?
  • Analytical tools: scatterplot and regression.
  • Data: Use the 1,200-person sample from the 2012 GSS (link is in the “Data sets” section above). The variables are educ and childs. You might want to confine the analysis to people beyond peak childbearing age — say, age 45 and over.
  • Ignore statistical significance for this assignment.
  • Turn in a hard copy of your answer. No more than two single-spaced typed pages of text, with 12-point font and two-inch left and right margins.
  • Due Monday, January 27, at the beginning of class.
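
A minimal sketch of the scatterplot and regression for Assignment 2, assuming the sample includes a numeric age variable. Because Stata treats missing values as larger than any number, the condition also excludes cases with missing age.

```stata
* Scatterplot with a fitted line, limited to respondents age 45 and over
* (jitter spreads out points that would otherwise sit on top of each other)
twoway (scatter childs educ if age >= 45 & !missing(age), jitter(5)) ///
       (lfit childs educ if age >= 45 & !missing(age))

* Regression of number of children on years of education for the same group
regress childs educ if age >= 45 & !missing(age)
```

The /// line continuation works in a do-file, not in the Command window.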

Assignment 3

  • Question: Franklin Zimring, in a 2011 Scientific American article “How New York Beat Crime,” contends that New York City’s policing strategy reduced violent crime. As he points out, the homicide rate in New York has fallen dramatically since the early 1990s. Yet it has fallen in other cities too. If Zimring’s hypothesis is correct, we would expect the murder rate to have decreased more in New York than in other large cities. Did it?
  • Analytical tool: line graph.
  • Data: Homicide rates from 1985 to 2012 in all large cities in a metro area with 3 million or more people. Data source: city police departments, via FBI Uniform Crime Reports. The data: Homicide, 19 large US cities, 1985-2012 (Excel); original data source.
  • Turn in a hard copy of your answer. No more than two single-spaced typed pages of text, with 12-point font and two-inch left and right margins.
  • Due Monday, February 3, at the beginning of class.

Assignment 4

  • Question: Did the “war on poverty” fail?
  • Analytical tool: line graph.
  • The debate: In 1964, President Lyndon Johnson declared a “war on poverty” and the federal government created a variety of new programs to try to help low-income Americans. In the mid-1980s, President Ronald Reagan argued that “poverty had won the war.” His reasoning was that despite the increase in government expenditures, the poverty rate stopped falling in the mid-1970s. Others countered that this was a result of two developments: First, the economy had gotten much tougher for the poor, with low-end wages stagnating and job stability eroding. That made it more difficult for government programs to continue to reduce poverty. Second, government cash benefits for the working-aged poor, most importantly Aid to Families with Dependent Children (AFDC), had been decreasing rather than increasing since the mid-1970s. Nearly three decades later, this debate continues.
  • An empirical test: Create a line graph of the poverty rate since 1959. Then create a second line graph showing the poverty rate for 18-to-64-year-olds and the poverty rate for those age 65 and over. What does this suggest about which side is correct in the “war on poverty” debate?
  • Data: Census Bureau, “Historical Poverty Tables: People,” table 3 (poverty status by age, race, and Hispanic origin). Link to the data: Poverty rates by age group, US, 1959-2012. The data are in Excel. Copy and paste into Stata the following from the “all races” section of the Excel file: year, percent below poverty level all people, percent below poverty level 18 to 64 years, percent below poverty level 65 years and over. You’ll need to clean up the “year” data in Stata, or perhaps enter the years manually. If you use an Apple computer, there’s a problem with downloading the Excel file from the Census Bureau website; instead use the Excel file in the “Data sets” section above.
  • Turn in a hard copy of your answer. No more than two single-spaced typed pages of text, with 12-point font and two-inch left and right margins.
  • Due Monday, February 10, at the beginning of class.
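
A minimal sketch of the two line graphs for Assignment 4. The variable names (year, pov_all, pov_18to64, pov_65plus) are hypothetical; use whatever names you gave the pasted columns. The first step applies only if year came into Stata as text.

```stata
* If year was pasted as text, keep the first four characters as a number
generate year_num = real(substr(year, 1, 4))

* Overall poverty rate over time
line pov_all year_num

* Two age groups on the same graph
line pov_18to64 pov_65plus year_num, ///
    legend(order(1 "Age 18 to 64" 2 "Age 65 and over"))
```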

Assignment 5

  • Question: Religiosity has been declining in the United States, but Americans remain more religious than their counterparts in other rich democratic nations. What groups of Americans are the most and least religious these days?
  • Analytical tool: dot plot (bar graph).
  • Data: Use the 1,200-person sample from the 2012 GSS (link is in the “Data sets” section above). The variable for religiosity is attend. The values are 0 = never, 1 = less than once a year, 2 = once a year, 3 = several times a year, 4 = once a month, 5 = two or three times a month, 6 = nearly every week, 7 = every week, 8 = more than once a week. Recode it into a form that will allow you to calculate a mean: days per year. To do this, make a copy of the attend variable and call it attend_daysperyear. Then recode it as follows: recode attend_daysperyear 9=. 0=0 1=0 2=1 3=3 4=12 5=30 6=45 7=52 8=78. Then get the mean of attend_daysperyear for the following groups: sex (women, men), age (18-34, 35-64, 65-89), race (white, black, other), region (1-2 = northeast, 3-4 = midwest, 5-7 = south, 8-9 = west), educ (1-12, 13-15, 16, 17-20). Graph the means for these 16 groups with a dot plot, with the groups shown in descending order according to average attend_daysperyear.
  • Turn in a hard copy of your answer. No more than two single-spaced typed pages of text, with 12-point font and two-inch left and right margins.
  • Due Monday, February 17, at the beginning of class.
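
A minimal sketch of the recode and the group means for Assignment 5, shown for one grouping variable only (sex). The other groupings need their own recoded variables first, and combining all 16 groups into a single sorted dot plot takes additional steps not shown here.

```stata
* Copy attend, then convert its categories to approximate days per year
generate attend_daysperyear = attend
recode attend_daysperyear (9 = .) (0 = 0) (1 = 0) (2 = 1) (3 = 3) ///
    (4 = 12) (5 = 30) (6 = 45) (7 = 52) (8 = 78)

* Mean days per year for women and men
tabstat attend_daysperyear, by(sex) statistics(mean)

* Dot plot of the group means, largest mean first
graph dot (mean) attend_daysperyear, over(sex, sort(1) descending)
```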

Assignment 6

  • Question: What causes family instability?
  • Analytical tools: scatterplot and regression.
  • The share of American children who grow up in a single-parent family has increased steadily since the 1960s. Hypotheses about the key culprits abound. They include: (1) urbanization (the traditional stigma against divorce and out-of-wedlock childbearing erodes more rapidly in cities); (2) secularization (decline in religiosity erodes those stigmas); (3) declining social capital (Robert Putnam’s 2000 book Bowling Alone attributed the worsening of an array of social problems to the decrease in social capital in the US); (4) immigration (heterogeneity reduces the influence of norms); (5) poverty (low income puts strain on relationships); (6) income inequality (according to Richard Wilkinson and Kate Pickett, in their book The Spirit Level, income inequality increases anxiety and stress and thereby contributes to family dissolution); (7) labor force participation (William Julius Wilson, in his books The Truly Disadvantaged and When Work Disappears, hypothesized that lack of employment reduces family stability by reducing the number of marriageable males and by contributing to the erosion of other supportive institutions, such as community organizations); (8) college graduation rate (people with a college degree tend to wait longer before having a child).
  • Data: Raj Chetty and colleagues have assembled data for 741 commuting zones in the United States in the early 2000s: Family stability and more, 741 US commuting zones, 2000s (Excel); original data source. For the outcome, family instability, use the variable cs_fam_wkidsinglemom, which is the fraction of children living with a single mother. Use the following variables to assess the hypotheses: (1) intersects_msa (0=nonurban, 1=urban); (2) rel_tot (fraction religious); (3) scap_ski90pcm (social capital index); (4) cs_born_foreign (fraction foreign born); (5) hhinc00 (household income per capita); (6) gini (larger numbers indicate higher income inequality); (7) cs_labforce (labor force participation rate); (8) gradrate_r (college graduation rate, income adjusted).
  • Ignore statistical significance for this assignment.
  • Turn in a hard copy of your answer. No more than two single-spaced typed pages of text, with 12-point font and two-inch left and right margins.
  • Due Monday, February 24, at the beginning of class.
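
A minimal sketch of the Stata steps for Assignment 6, using the variable names listed above. It shows one hypothesis on its own (religiosity) and then all eight together. Which comparisons you report is up to you.

```stata
* One hypothesis at a time: scatterplot, then simple regression
scatter cs_fam_wkidsinglemom rel_tot
regress cs_fam_wkidsinglemom rel_tot

* All eight hypothesized causes in one multiple regression
regress cs_fam_wkidsinglemom intersects_msa rel_tot scap_ski90pcm ///
    cs_born_foreign hhinc00 gini cs_labforce gradrate_r
```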

Assignment 7

  • Question: Why do some people watch more television than others?
  • Analytical tools: scatterplot and regression.
  • Data: Use the 1,200-person sample from the 2012 GSS (link is in the “Data sets” section above). It includes a variable called tvhours, which is the (self-reported) number of hours per day the respondent watches television. Use multivariate OLS regression to examine the following variables as possible causes of time spent watching TV: age, sex, race, region (recode as south, other), education (educ), work status (wrkstat: recode as employed full-time, other), subjective class position (class), number of children (childs), US-born or immigrant (born), health, happiness (happy).
  • Check for, and if necessary deal with, any outliers (extreme values) for the tvhours variable.
  • Ignore statistical significance for this assignment.
  • Turn in a hard copy of your answer. No more than two single-spaced typed pages of text, with 12-point font and two-inch left and right margins.
  • Due Monday, March 2, at the beginning of class.
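
A minimal sketch of the outlier check and a first regression for Assignment 7. It uses only a few of the listed predictors and assumes region is coded as described in Assignment 5 (5-7 = south). The remaining recodes are left to you.

```stata
* Look for extreme values of tvhours before modeling
summarize tvhours, detail
histogram tvhours, discrete

* South indicator: 1 if region is 5-7, 0 otherwise, missing if region is missing
generate south = inrange(region, 5, 7) if !missing(region)

* Multiple regression with a few of the candidate predictors
regress tvhours age educ childs south
```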

Assignment 8

  • Question: Why haven’t Americans gotten happier in the past half century?
  • In my “Happiness” chapter, I hypothesize that it is because some happiness boosters have decreased and some happiness depressors have increased. Pick five (or more) of the ones I mention and explore their relationship with happiness in the United States over time. The data points (cases) will be years.
  • Analytical tools: line graph, scatterplot, and regression.
  • Data: Use the GSS data to calculate a measure of happiness in each GSS year from 1972 to 2018. You could use the share answering “not too happy,” use the share answering “very happy,” or calculate average happiness. The units are years. Use the GSS (and, if you wish, other data sources) for the hypothesized causes you choose to examine.
  • Ignore statistical significance for this assignment.
  • Turn in a hard copy of your answer. No more than two single-spaced typed pages of text, with 12-point font and two-inch left and right margins.
  • Due Monday, March 9, at the beginning of class.
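
A minimal sketch of how to turn the individual-level GSS data into one data point per year for Assignment 8. It assumes happy is coded 1 = very happy and that the survey year variable is named year; check both in your file. Note that collapse replaces the data in memory, so save your file first.

```stata
* Indicator for "very happy": 1 if very happy, 0 otherwise, missing if happy is missing
generate veryhappy = (happy == 1) if !missing(happy)

* Replace the survey data with one row per year holding the yearly share
collapse (mean) veryhappy, by(year)

* Trend in the share who say they are very happy
line veryhappy year
```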

RESEARCH REPORT

Write a report using quantitative data analysis to help answer a social science research question of your choosing. Follow the principles listed at the top of this syllabus.

Turn in a hard copy and upload your report on Canvas. Emailed reports won’t be accepted. To turn it in on Canvas, go to canvas.ucsd.edu, log in, choose this course, and click on “Upload research report” in the blue menu bar. Your report won’t be visible to other students on Canvas; this is just to allow me to check for plagiarism and length.

Length: No more than 5,000 words. Fewer is fine. If you need more words, put them in an appendix. The report should be typed single-spaced using 12-point font and two-inch left and right margins. Put key graphs and tables in the text. Put appendix graphs and tables, if any, at the end.

Use footnotes to give credit to anyone from whom you borrow evidence or argument. I’m not picky about the formatting of the footnotes, but be sure to include the author(s), title, and year; don’t simply list an internet address.

Don’t plagiarize. If you aren’t sure what constitutes plagiarism and how to avoid it, see the UC San Diego Library’s guide to preventing plagiarism.

The due date is listed above. A paper turned in late but within 48 hours of the deadline will be penalized 25 points (out of 100). A paper turned in more than 48 hours late, or not turned in at all, will receive a grade of zero.

Some possible data sources for your research report and beyond

ACADEMIC INTEGRITY

Students are encouraged to share intellectual views and discuss freely the principles and applications of course materials. However, graded work must be the product of independent effort unless otherwise instructed. Students are expected to adhere to UC San Diego policy on academic integrity.

SPECIAL NEEDS AND ACCOMMODATIONS

Students who need special accommodation or services should contact the Office for Students with Disabilities (OSD). You must register and request that the OSD send me official notification of your accommodation needs as soon as possible. Please meet with me to discuss accommodations and how the course requirements and activities may impact your ability to fully participate.

SUBJECT TO CHANGE

Information here, other than the grade and attendance policy, may be subject to change with advance notice, as deemed appropriate by the instructor.