Soci 109: Analysis of Sociological Data (2015)

University of California, San Diego
Spring 2015
MWF 10:00–10:50, SSB 101

Lane Kenworthy
Email: lane.kenworthy@gmail.com
Office hours: MW 12:30–1:30, SSB 472

This course introduces you to techniques and software for analyzing quantitative social science data. I’ll emphasize the following:

  • Ask a good research question
  • Measure
  • Describe
  • Graph
  • Compare
  • Control
  • Pay attention to the magnitude, not just the existence, of effects
  • Where possible, use multiple types of data
  • Don’t infer causation from correlation alone
  • Be thorough
  • Be skeptical
  • Admit your uncertainty
  • Write clearly and simply

The course readings, videos, and data sets are available via the links below at no cost. You’ll need to purchase access to Stata, a statistical software program. For this course you only need a six-month license for “Small Stata,” which is $35. (Small Stata allows 1,200 cases and 99 variables. If you’d like to be able to analyze larger data sets, you can get Stata IC.)

Grading: eight short assignments 65%, research report 35%. Details are below.

I’ll post grades on Ted. Everything else you need for the course — instructions, links, assignments — is in this online syllabus.

SCHEDULE

Week 1
Introduction to data analysis, Stata, and the General Social Survey

Weeks 2-5
Mean, median, standard deviation, histogram, dot plot (bar graph), line graph

Weeks 6-9
Scatterplot, correlation, regression

Week 10
Statistical significance

COURSE MATERIALS

Data sources

Data sets

Analytical tools: Khan Academy videos

Stata readings

Stata videos

Tips on how to do research

Have Americans become more politically polarized?

Do Americans want a smaller government?

How much has inequality increased in the United States?

Why has violent crime decreased?

How rapidly has legalization of same-sex marriage spread?

The family in decline: Still ongoing? Among what groups?

Is the middle class richer in the United States than in other affluent countries?

Does having more income make people happier?

Why has obesity increased?

Are Americans becoming less religious?

Why have so many working-class whites switched from Democratic to Republican?

How much intergenerational mobility is there in America?

Does economic growth increase life expectancy?

Does education pay off?

How much economic growth trickles down to the middle class and the poor?

Do types of economic freedom go together?

Why do some countries have more women in politics than others?

Is income inequality harmful?

Pay attention to the magnitude, not just the existence, of effects

Don’t infer causation from correlation alone

Write clearly and simply

SHORT ASSIGNMENTS

Assignment 1

  • Purchase Stata and install it on your computer.
  • Open Stata and take a screen shot showing “Licensed to: your name.”
  • Download the General Social Survey 2012 data set (link is in the “Data sets” section above). Open the data file in Excel. Copy and paste the data into a Stata file. Get and print a frequency distribution for the educ variable. What is the median number of years of schooling completed by American adults according to the GSS? What is the mean? What is the standard deviation? Create a new grouped version of educ, with the groups defined as follows: 1=0-11, 2=12, 3=13-15, 4=16, 5=17 or more. Call this new variable educ5. Get and print a frequency distribution for educ5. Get and print histograms for educ (ask for 20 bars) and educ5. Does the shape of the distribution differ depending on which version of the variable (educ or educ5) you use? If so, why?
  • Turn in a hard copy of your answers. No more than two single-spaced typed pages of text, with 12-point font and two-inch left and right margins. Attach your Stata printouts and screen shot.
  • Due Monday, April 13, at the beginning of class.
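
The grouping rule for educ5 can be sketched outside Stata to check your understanding of the cutoffs. The following is a minimal Python illustration, not the Stata procedure the assignment asks for. The function name educ5 and the sample values are assumptions made up for this example; they are not GSS data.

```python
from collections import Counter


def educ5(years):
    """Map years of schooling to the five groups defined in Assignment 1."""
    if years is None:
        return None  # keep missing values missing
    if years <= 11:
        return 1  # 0-11 years
    if years == 12:
        return 2  # 12 years
    if years <= 15:
        return 3  # 13-15 years
    if years == 16:
        return 4  # 16 years
    return 5      # 17 or more years


# Hypothetical values for illustration only; not GSS data.
sample_educ = [8, 12, 12, 14, 16, 18, 20, 11, 13, None]

grouped = [educ5(y) for y in sample_educ]

# Frequency distribution of the grouped variable, skipping missing values.
freq = Counter(g for g in grouped if g is not None)
for group in sorted(freq):
    print(group, freq[group])
```

Note that the five groups span unequal ranges of years, which is worth keeping in mind when you compare the two histograms.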

Assignment 2

  • Question: Franklin Zimring, in his 2011 article “How New York Beat Crime” (link is above), contends that New York City’s policing strategy reduced violent crime. As he points out, the homicide rate in New York has fallen dramatically since the early 1990s. Yet it has fallen in other cities too. If Zimring’s hypothesis is correct, we would expect the murder rate to have decreased more in New York than in other large cities. Did it?
  • Analytical tool: line graph.
  • Data: Homicide rates from 1985 to 2012 in all large cities in a metro area with 3 million or more people. Data source: city police departments, via FBI Uniform Crime Reports. A link to the data set is in the “Data sets” section above.
  • Turn in a hard copy of your answer. No more than two single-spaced typed pages of text, with 12-point font and two-inch left and right margins.
  • Due Monday, April 20, at the beginning of class.

Assignment 3

  • Question: Did the “war on poverty” fail?
  • Analytical tool: line graph.
  • The debate: In 1964, President Lyndon Johnson declared a “war on poverty” and the federal government created a variety of new programs to try to help low-income Americans. In the mid-1980s, President Ronald Reagan argued that “poverty had won the war.” His reasoning was that despite the increase in government expenditures, the poverty rate stopped falling in the mid-1970s. Others countered that this was a result of two developments: First, the economy had gotten much tougher for the poor, with low-end wages stagnating and job stability eroding. That made it more difficult for government programs to continue to reduce poverty. Second, government cash benefits for the working-aged poor, most importantly Aid to Families with Dependent Children (AFDC), had been decreasing rather than increasing since the mid-1970s. Nearly three decades later, this debate continues.
  • An empirical test: Create a line graph of the poverty rate since 1959. Then create a second line graph showing the poverty rate for 18-to-64-year-olds and the poverty rate for those age 65 and over. What does this suggest about which side is correct in the “war on poverty” debate?
  • Data: Census Bureau, “Historical Poverty Tables: People,” table 3 (poverty status by age, race, and Hispanic origin). A link to this webpage is in the “Data sources” section above. The data are in Excel. Copy and paste into Stata the following from the “all races” section of the Excel file: year, percent below poverty level all people, percent below poverty level 18 to 64 years, percent below poverty level 65 years and over. You’ll need to clean up the “year” data in Stata, or perhaps enter the years manually. If you use an Apple computer, there’s a problem with downloading the Excel file from the Census Bureau website; instead use the Excel file in the “Data sets” section above.
  • Turn in a hard copy of your answer. No more than two single-spaced typed pages of text, with 12-point font and two-inch left and right margins.
  • Due Wednesday, April 29, at the beginning of class.

Assignment 4

  • Question: Religiosity has been declining in the United States, but Americans remain more religious than their counterparts in other rich democratic nations. What groups of Americans are the most and least religious these days?
  • Analytical tool: dot plot (bar graph).
  • Data: Use the 1,200-person sample from the 2012 GSS (link is in the “Data sets” section above). The variable for religiosity is attend. The values are 0 = never, 1 = less than once a year, 2 = once a year, 3 = several times a year, 4 = once a month, 5 = two or three times a month, 6 = nearly every week, 7 = every week, 8 = more than once a week. Recode it into a form that will allow you to calculate a mean: days per year. To do this, make a copy of the attend variable and call it attend_daysperyear. Then recode it as follows: recode attend_daysperyear 9=. 0=0 1=0 2=1 3=3 4=12 5=30 6=45 7=52 8=78. Then get the mean of attend_daysperyear for the following groups: sex (women, men), age (18-34, 35-64, 65-89), race (white, black, other), region (1-2 = northeast, 3-4 = midwest, 5-7 = south, 8-9 = west), educ (1-12, 13-15, 16, 17-20). Graph the means for these 16 groups with a dot plot, with the groups shown in descending order according to average attend_daysperyear.
  • Turn in a hard copy of your answer. No more than two single-spaced typed pages of text, with 12-point font and two-inch left and right margins.
  • Due Monday, May 4, at the beginning of class.
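
The recode and group-mean steps described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Stata workflow the assignment requires. The days-per-year values come from the recode command in the assignment; the helper names and the sample rows are hypothetical and are not GSS data.

```python
from statistics import mean

# Days-per-year values taken from the recode command in Assignment 4.
# Code 9 is treated as missing, so it is left out of the mapping.
ATTEND_TO_DAYS = {0: 0, 1: 0, 2: 1, 3: 3, 4: 12, 5: 30, 6: 45, 7: 52, 8: 78}


def attend_daysperyear(attend):
    """Return estimated days per year, or None for code 9 or any other code."""
    return ATTEND_TO_DAYS.get(attend)


def group_means(rows, group_key):
    """Mean attend_daysperyear per group, sorted in descending order."""
    by_group = {}
    for row in rows:
        days = attend_daysperyear(row["attend"])
        if days is None:
            continue  # skip missing values
        by_group.setdefault(row[group_key], []).append(days)
    means = {group: mean(values) for group, values in by_group.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rows for illustration only; not GSS data.
rows = [
    {"sex": "women", "attend": 7},
    {"sex": "women", "attend": 3},
    {"sex": "men", "attend": 0},
    {"sex": "men", "attend": 4},
    {"sex": "men", "attend": 9},  # missing, dropped from the mean
]

result = group_means(rows, "sex")
for group, avg in result:
    print(group, avg)
```

Sorting the means before plotting is what puts the groups in descending order on the dot plot.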

Assignment 5

  • Question: It is often thought that people with more education are likely to have fewer children. In fact, a common recommendation for countries that want to reduce population growth is to increase education. Is this hypothesis correct in the United States?
  • Analytical tools: scatterplot and regression.
  • Data: Use the 1,200-person sample from the 2012 GSS (link is in the “Data sets” section above). The variables are educ and childs. You might want to confine the analysis to people beyond peak childbearing age — say, age 45 and over.
  • Ignore statistical significance for this assignment.
  • Turn in a hard copy of your answer. No more than two single-spaced typed pages of text, with 12-point font and two-inch left and right margins.
  • Due Monday, May 11, at the beginning of class.
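
For intuition about what a bivariate regression of childs on educ computes, here is a minimal Python sketch of the least-squares slope and intercept, including the suggested restriction to respondents age 45 and over. In the course you would get these numbers from Stata; the function name ols_line and the respondent records below are assumptions invented for illustration and are not GSS data.

```python
def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical respondents for illustration only; not GSS data.
respondents = [
    {"age": 30, "educ": 16, "childs": 0},
    {"age": 50, "educ": 10, "childs": 4},
    {"age": 62, "educ": 12, "childs": 3},
    {"age": 47, "educ": 16, "childs": 2},
    {"age": 71, "educ": 18, "childs": 1},
]

# Confine the analysis to people beyond peak childbearing age.
older = [r for r in respondents if r["age"] >= 45]

intercept, slope = ols_line(
    [r["educ"] for r in older],
    [r["childs"] for r in older],
)
print(round(intercept, 2), round(slope, 2))
```

The slope is the predicted change in number of children for each additional year of schooling, which is the magnitude your write-up should focus on.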

Assignment 6

  • Question: What causes family instability?
  • Analytical tools: scatterplot and regression.
  • The share of American children who grow up in a single-parent family has increased steadily since the 1960s. Hypotheses about the key culprits abound. They include: (1) urbanization (the traditional stigma against divorce and out-of-wedlock childbearing erodes more rapidly in cities); (2) secularization (decline in religiosity erodes those stigmas); (3) declining social capital (Robert Putnam’s 2000 book Bowling Alone attributed the worsening of an array of social problems to the decrease in social capital in the US); (4) immigration (heterogeneity reduces the influence of norms); (5) poverty (low income puts strain on relationships); (6) income inequality (according to Richard Wilkinson and Kate Pickett, in their book The Spirit Level, income inequality increases anxiety and stress and thereby contributes to family dissolution); (7) labor force participation (William Julius Wilson, in his books The Truly Disadvantaged and When Work Disappears, hypothesized that lack of employment reduces family stability by reducing the number of marriageable males and by contributing to the erosion of other supportive institutions, such as community organizations); (8) college graduation rate (people with a college degree tend to wait longer before having a child).
  • Data: Raj Chetty and colleagues have assembled data for 741 commuting zones in the United States in the early 2000s. The link is in the “Data sets” section above. For the outcome, family instability, use the variable cs_fam_wkidsinglemom, which is the fraction of children living with a single mother. Use the following variables to assess the hypotheses: (1) intersects_msa (0=nonurban, 1=urban); (2) rel_tot (fraction religious); (3) scap_ski90pcm (social capital index); (4) cs_born_foreign (fraction foreign born); (5) hhinc00 (household income per capita); (6) gini (larger numbers indicate higher income inequality); (7) cs_labforce (labor force participation rate); (8) gradrate_r (college graduation rate, income adjusted).
  • Ignore statistical significance for this assignment.
  • Turn in a hard copy of your answer. No more than two single-spaced typed pages of text, with 12-point font and two-inch left and right margins.
  • Due Monday, May 18, at the beginning of class.

Assignment 7

  • Question: Why do some people watch more television than others?
  • Analytical tools: scatterplot and regression.
  • Data: Use the 1,200-person sample from the 2012 GSS (link is in the “Data sets” section above). It includes a variable called tvhours, which is the (self-reported) number of hours per day the respondent watches television. Use multivariate OLS regression to examine the following variables as possible causes of time spent watching TV: age, sex, race, region (recode as south, other), education (educ), work status (wrkstat: recode as employed full-time, other), subjective class position (class), number of children (childs), US-born or immigrant (born), health, happiness (happy).
  • Check for, and if necessary deal with, any outliers (extreme values) for the tvhours variable.
  • Ignore statistical significance for this assignment.
  • Turn in a hard copy of your answer. No more than two single-spaced typed pages of text, with 12-point font and two-inch left and right margins.
  • Due Wednesday, May 27, at the beginning of class.

Assignment 8

  • Question: Why haven’t Americans gotten happier in the past 40 years?
  • In my “Happiness” chapter (link is above), I hypothesize that it is because some happiness boosters have decreased and some happiness depressors have increased. Pick five (or more) of the ones I mention and explore their relationship with happiness in the United States over time. The data points (cases) will be years.
  • Analytical tools: line graph, scatterplot, and regression.
  • Data: Use the GSS Berkeley site to calculate a measure of happiness in each GSS year from 1972 to 2012. You could use the share answering “not too happy,” use the share answering “very happy,” or calculate average happiness. Use the GSS and other data sources for the hypothesized causes you choose to examine.
  • Ignore statistical significance for this assignment.
  • Turn in a hard copy of your answer. No more than two single-spaced typed pages of text, with 12-point font and two-inch left and right margins.
  • Due Monday, June 1, at the beginning of class.

RESEARCH REPORT

Write a report using quantitative data analysis to help answer a social science research question of your choosing. Follow the principles listed at the top of this syllabus.

Turn in a hard copy and upload your report on Ted. Emailed reports won’t be accepted. To turn it in on Ted, go to ted.ucsd.edu, log in, choose this course, and click on “Upload research report” in the blue menu bar. Your report won’t be visible to other students on Ted; this is just to allow me to check for plagiarism and length.

Length: 3,500 to 10,000 words. The report should be typed single-spaced using 12-point font and two-inch left and right margins. Put key graphs and tables in the text. Put appendix graphs and tables (if any) at the end.

Use footnotes to give credit to anyone from whom you borrow evidence or argument. I’m not picky about the formatting of the footnotes, but be sure to include the author(s), title, and year; don’t simply list an internet address.

Don’t plagiarize. If you aren’t sure what constitutes plagiarism and how to avoid it, see the UCSD Library’s guide to preventing plagiarism.

Due Wednesday, June 10, by 10:00am, in my office (SSB 472). A report turned in late but within 48 hours of the deadline will be penalized 25 points (out of 100). A report turned in more than 48 hours late, or not turned in at all, will receive a grade of zero.

GRADING

Course grades will be based on the following:

  • 65%: assignments 1-8
  • 35%: research report

Each of these will be graded on a scale of 0 to 100. So your numerical course grade is calculated as: (average score on assignments 1-8 × 0.65) + (research report score × 0.35).

Your letter grade for the course will be determined as follows:

  • 97 and above = A+
  • 93–96 = A
  • 90–92 = A–
  • 87–89 = B+
  • 83–86 = B
  • 80–82 = B–
  • 77–79 = C+
  • 73–76 = C
  • 70–72 = C–
  • 60–69 = D
  • below 60 = F
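
The grade calculation and letter-grade scale above can be sketched as a short Python example. It is a minimal illustration, not an official grade calculator. One assumption is labeled in the code: the syllabus lists whole-number ranges, so the sketch rounds the numerical grade to the nearest whole number before looking up the letter. The syllabus does not specify how fractional scores are handled, and the example scores are made up.

```python
import math

# Letter-grade cutoffs from the syllabus, highest first.
CUTOFFS = [
    (97, "A+"), (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (60, "D"),
]


def course_score(assignment_scores, report_score):
    """65% average of assignments 1-8 plus 35% research report."""
    if len(assignment_scores) != 8:
        raise ValueError("expected scores for assignments 1-8")
    average = sum(assignment_scores) / len(assignment_scores)
    return (65 * average + 35 * report_score) / 100


def letter_grade(score):
    """Look up the letter grade for a numerical course grade."""
    # Assumption (not stated in the syllabus): round to the nearest whole
    # number, with halves rounding up, before applying the cutoffs.
    rounded = math.floor(score + 0.5)
    for cutoff, letter in CUTOFFS:
        if rounded >= cutoff:
            return letter
    return "F"


# Hypothetical scores for illustration only.
example = course_score([90, 85, 100, 95, 80, 90, 85, 95], 88)
print(round(example, 2), letter_grade(example))
```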

There will be no extra-credit projects or assignments.

ACADEMIC INTEGRITY

Students are encouraged to share intellectual views and discuss freely the principles and applications of course materials. However, graded work must be the product of independent effort unless otherwise instructed. Students are expected to adhere to UCSD policy on academic integrity.

SPECIAL NEEDS AND ACCOMMODATIONS

Students who need special accommodation or services should contact the Office of Students with Disabilities (OSD), University Center 202, email osd@ucsd.edu, tel 858.534.4382. You must register and request that the OSD send me official notification of your accommodation needs as soon as possible. Please meet with me to discuss accommodations and how the course requirements and activities may affect your ability to fully participate.