University of California, San Diego
Spring 2015
MWF 10:00–10:50, SSB 101
Lane Kenworthy
Email: lane.kenworthy@gmail.com
Office hours: MW 12:30–1:30, SSB 472
This course introduces you to techniques and software for analyzing quantitative social science data. I’ll emphasize the following:
- Ask a good research question
- Measure
- Describe
- Graph
- Compare
- Control
- Pay attention to the magnitude, not just the existence, of effects
- Where possible, use multiple types of data
- Don’t infer causation from correlation alone
- Be thorough
- Be skeptical
- Admit your uncertainty
- Write clearly and simply
The course readings, videos, and data sets are available via the links below at no cost. You’ll need to purchase access to Stata, a statistical software program. For this course you only need a six-month license for “Small Stata,” which is $35. (Small Stata allows 1,200 cases and 99 variables. If you’d like to be able to analyze larger data sets, you can get Stata IC.)
Grading: eight short assignments 65%, research report 35%. Details are below.
I’ll post grades on Ted. Everything else you need for the course — instructions, links, assignments — is in this online syllabus.
SCHEDULE
Week 1
Introduction to data analysis, Stata, and the General Social Survey
Weeks 2-5
Mean, median, standard deviation, histogram, dot plot (bar graph), line graph
Weeks 6-9
Scatterplot, correlation, regression
Week 10
Statistical significance
COURSE MATERIALS
Data sources
- General Social Survey (GSS), 1972-2014; GSS Codebook
- World Values Survey, 1981-2014
- US Census Bureau, Historical Income Data, 1947-2013
- US Census Bureau, Historical Poverty Tables, 1959-2013
- OECD
- Luxembourg Income Study
- World Top Incomes Database
- World Database of Happiness
- UNDP Human Development Report
- World Bank
Data sets
- GSS 2012, sample of 1,200 for use with Small Stata (Excel); original data source
- Homicide, 19 large US cities, 1985-2012 (Excel); original data source
- Poverty rates by age group, US, 1959-2012 (Excel)
- Family stability and more, 741 US commuting zones, 2000s (Excel); original data source
Analytical tools: Khan Academy videos
- Mean and median
- Standard deviation
- Histogram
- Bar graph
- Line graph
- Scatterplots
- Regression 1
- Regression 2
- Correlation and Causality
- Hypothesis testing and p-values
Stata readings
- Stata 13 quick guide
- Stata sample commands
- Getting started with Stata
- Schemes
- Summarize
- Graph intro
- Graph
- Histogram
- Dot plot (bar graph)
- Line graph (scatterplot)
- Scatterplot
- Scatterplot with regression line
- Scatterplot with loess curve
- Regression
- Correlation
Stata videos
- Introduction
- Quick help in Stata
- Copy and paste data from Excel into Stata
- Example data included with Stata
- Mean and median
- Histogram
- Bar graph
- Scatterplot
- Regression
- Correlation
Tips on how to do research
- Lane Kenworthy, “Doing Research,” 2014
Have Americans become more politically polarized?
- Kenworthy, Lane. 2015. “Political Polarization.” The Good Society.
Do Americans want a smaller government?
- Kenworthy, Lane. 2014. “How Much Public Insurance Do Americans Want?” The Good Society.
How much has inequality increased in the United States?
- Kenworthy, Lane. 2015. “Income Inequality.” The Good Society.
Why has violent crime decreased?
- Zimring, Franklin. 2011. “How New York Beat Crime.” Scientific American, August.
How rapidly has legalization of same-sex marriage spread?
- Park, Haeyoun. 2015. “Gay Marriage State by State: How a Trickle Became a Torrent.” New York Times, March 31.
The family in decline: Still ongoing? Among what groups?
- Kenworthy, Lane. 2015. “Families.” The Good Society.
- Miller, Claire Cain. 2014. “The Divorce Surge Is Over, but the Myth Lives On.” New York Times, December 2.
Is the middle class richer in the United States than in other affluent countries?
- Leonhardt, David and Kevin Quealy. 2014. “The American Middle Class Is No Longer the World’s Richest.” New York Times, April 22.
- Kenworthy, Lane. 2015. “Shared Prosperity.” The Good Society.
Does having more income make people happier?
- Kenworthy, Lane. 2015. “Happiness.” The Good Society.
Why has obesity increased?
- Kenworthy, Lane. 2015. “Weight Moderation.” The Good Society.
Are Americans becoming less religious?
- Voas, David and Mark Chaves. 2014. “Is the United States a Counterexample to the Secularization Thesis?” Unpublished.
- Pew Research Center. 2015. “America’s Changing Religious Landscape.”
Why have so many working-class whites switched from Democratic to Republican?
- Kenworthy, Lane, Sondra Barringer, Daniel Duerr, and Garrett Andrew Schneider. 2007. “The Democrats and Working-Class Whites.” Unpublished.
How much intergenerational mobility is there in America?
- Economic Mobility Project. 2012. “Pursuing the American Dream: Economic Mobility Across Generations.” Pew Charitable Trusts.
Does economic growth increase life expectancy?
- Rosling, Hans. 2010. “200 Countries, 200 Years, 4 Minutes.” Gapminder.org.
Does education pay off?
- Kenworthy, Lane. 2014. “What Good Is Education?” The Good Society.
How much economic growth trickles down to the middle class and the poor?
- Kenworthy, Lane. 2015. “Shared Prosperity.” The Good Society.
- Kenworthy, Lane. 2015. “A Decent and Rising Income Floor.” The Good Society.
Do types of economic freedom go together?
- Kenworthy, Lane. 2014. “Economic Freedom.” The Good Society.
Why do some countries have more women in politics than others?
- Kenworthy, Lane and Melissa Malami. 1999. “Gender Inequality in Political Representation: A Worldwide Comparative Analysis.” Social Forces 78: 235-269.
Is income inequality harmful?
- Kenworthy, Lane. 2015. “Is Income Inequality Harmful?” The Good Society.
Pay attention to the magnitude, not just the existence, of effects
- Brody, Jane. 2015. “Nuts Are a Nutritional Powerhouse.” New York Times, March 31.
- Sanger-Katz, Margot. 2015. “Income Inequality: It’s Also Bad for Your Health.” New York Times, March 30.
Don’t infer causation from correlation alone
- Harford, Tim. 2015. “Cigarettes, Damn Cigarettes, and Statistics.” Financial Times, April 10.
Write clearly and simply
- William Zinsser, On Writing Well, excerpt, 2006
SHORT ASSIGNMENTS
Assignment 1
- Purchase Stata and install it on your computer.
- Open Stata and take a screen shot showing “Licensed to: your name.”
- Download the General Social Survey 2012 data set (link is in the “Data sets” section above). Open the data file in Excel. Copy and paste the data into a Stata file. Get and print a frequency distribution for the educ variable. What is the median number of years of schooling completed by American adults according to the GSS? What is the mean? What is the standard deviation? Create a new grouped version of educ, with the groups defined as follows: 1=0-11, 2=12, 3=13-15, 4=16, 5=17 or more. Call this new variable educ5. Get and print a frequency distribution for educ5. Get and print histograms for educ (ask for 20 bars) and educ5. Does the shape of the distribution differ depending on which version of the variable, educ or educ5, you use? If so, why?
- Turn in a hard copy of your answers. No more than two single-spaced typed pages of text, with 12-point font and two-inch left and right margins. Attach your Stata printouts and screen shot.
- Due Monday, April 13, at the beginning of class.
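The grouping rule and summary statistics in Assignment 1 can be sketched outside Stata as a way to check your understanding. This is a minimal Python illustration, not part of the assignment: the function name educ5 mirrors the variable name above, and the values in toy_educ are made up for the example and are not GSS data.

```python
import statistics


def educ5(years):
    """Map years of schooling to the five groups defined in Assignment 1."""
    if years <= 11:
        return 1  # 0-11 years
    if years == 12:
        return 2  # 12 years
    if years <= 15:
        return 3  # 13-15 years
    if years == 16:
        return 4  # 16 years
    return 5      # 17 or more years


# Hypothetical toy values, NOT GSS data; they only exercise the functions.
toy_educ = [8, 12, 12, 14, 16, 18, 12, 16, 20, 11]

toy_educ5 = [educ5(y) for y in toy_educ]
mean_educ = statistics.mean(toy_educ)
median_educ = statistics.median(toy_educ)
sd_educ = statistics.stdev(toy_educ)  # sample standard deviation (n - 1)
```

Grouping collapses many distinct values into five unequal-width bins, which is worth keeping in mind when you compare the two histograms.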
Assignment 2
- Question: Franklin Zimring, in his 2011 article “How New York Beat Crime” (link is above), contends that New York City’s policing strategy reduced violent crime. As he points out, the homicide rate in New York has fallen dramatically since the early 1990s. Yet it has fallen in other cities too. If Zimring’s hypothesis is correct, we would expect the murder rate to have decreased more in New York than in other large cities. Did it?
- Analytical tool: line graph.
- Data: Homicide rates from 1985 to 2012 in all large cities in a metro area with 3 million or more people. Data source: city police departments, via FBI Uniform Crime Reports. A link to the data set is in the “Data sets” section above.
- Turn in a hard copy of your answer. No more than two single-spaced typed pages of text, with 12-point font and two-inch left and right margins.
- Due Monday, April 20, at the beginning of class.
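One way to put "decreased more" into numbers is the percent change in each city's rate between two years. The sketch below is a Python illustration under stated assumptions: the city names, years, and rates are hypothetical placeholders rather than values from the data set, and comparing percentage declines instead of absolute declines is itself a judgment call you should defend.

```python
def percent_change(start, end):
    """Percent change from start to end; a negative value means a decline."""
    return (end - start) / start * 100


# Hypothetical placeholder rates per 100,000 (NOT the assignment's data).
toy_rates = {
    "City A": {1990: 30.0, 2012: 6.0},
    "City B": {1990: 20.0, 2012: 10.0},
}

changes = {
    city: percent_change(rates[1990], rates[2012])
    for city, rates in toy_rates.items()
}

# Cities ordered from the largest decline to the smallest.
ranked = sorted(changes, key=changes.get)
```

A line graph of the full series remains the main tool for this assignment; a two-point comparison like this one depends heavily on which start and end years you choose.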
Assignment 3
- Question: Did the “war on poverty” fail?
- Analytical tool: line graph.
- The debate: In 1964, President Lyndon Johnson declared a “war on poverty” and the federal government created a variety of new programs to try to help low-income Americans. In the mid-1980s, President Ronald Reagan argued that “poverty had won the war.” His reasoning was that despite the increase in government expenditures, the poverty rate stopped falling in the mid-1970s. Others countered that this was a result of two developments: First, the economy had gotten much tougher for the poor, with low-end wages stagnating and job stability eroding. That made it more difficult for government programs to continue to reduce poverty. Second, government cash benefits for the working-aged poor, most importantly Aid to Families with Dependent Children (AFDC), had been decreasing rather than increasing since the mid-1970s. Nearly three decades later, this debate continues.
- An empirical test: Create a line graph of the poverty rate since 1959. Then create a second line graph showing the poverty rate for 18-to-64-year-olds and the poverty rate for those age 65 and over. What does this suggest about which side is correct in the “war on poverty” debate?
- Data: Census Bureau, “Historical Poverty Tables: People,” table 3 (poverty status by age, race, and Hispanic origin). A link to this webpage is in the “Data sources” section above. The data are in Excel. Copy and paste into Stata the following from the “all races” section of the Excel file: year, percent below poverty level all people, percent below poverty level 18 to 64 years, percent below poverty level 65 years and over. You’ll need to clean up the “year” data in Stata, or perhaps enter the years manually. If you use an Apple computer, there’s a problem with downloading the Excel file from the Census Bureau website; instead use the Excel file in the “Data sets” section above.
- Turn in a hard copy of your answer. No more than two single-spaced typed pages of text, with 12-point font and two-inch left and right margins.
- Due Wednesday, April 29, at the beginning of class.
Assignment 4
- Question: Religiosity has been declining in the United States, but Americans remain more religious than their counterparts in other rich democratic nations. What groups of Americans are the most and least religious these days?
- Analytical tool: dot plot (bar graph).
- Data: Use the 1,200-person sample from the 2012 GSS (link is in the “Data sets” section above). The variable for religiosity is attend. The values are 0 = never, 1 = less than once a year, 2 = once a year, 3 = several times a year, 4 = once a month, 5 = two or three times a month, 6 = nearly every week, 7 = every week, 8 = more than once a week. Recode it into a form that will allow you to calculate a mean: days per year. To do this, make a copy of the attend variable and call it attend_daysperyear. Then recode it as follows: recode attend_daysperyear 9=. 0=0 1=0 2=1 3=3 4=12 5=30 6=45 7=52 8=78. Then get the mean of attend_daysperyear for the following groups: sex (women, men), age (18-34, 35-64, 65-89), race (white, black, other), region (1-2 = northeast, 3-4 = midwest, 5-7 = south, 8-9 = west), educ (1-12, 13-15, 16, 17-20). Graph the means for these 16 groups with a dot plot, with the groups shown in descending order according to average attend_daysperyear.
- Turn in a hard copy of your answer. No more than two single-spaced typed pages of text, with 12-point font and two-inch left and right margins.
- Due Monday, May 4, at the beginning of class.
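The recode and group-mean steps in Assignment 4 can be sketched as follows. This is a Python illustration of the logic, not a substitute for the Stata commands: the mapping is copied from the recode command above, while the helper names and the rows in toy_rows are hypothetical and are not GSS data.

```python
# Mapping taken from the recode command in the assignment text.
DAYS_PER_YEAR = {0: 0, 1: 0, 2: 1, 3: 3, 4: 12, 5: 30, 6: 45, 7: 52, 8: 78}


def attend_daysperyear(code):
    """Return days per year for an attend code, or None if missing (e.g. 9)."""
    return DAYS_PER_YEAR.get(code)


def group_means(rows):
    """Mean days per year by group label, skipping missing values,
    sorted in descending order of the mean."""
    totals = {}
    for group, code in rows:
        days = attend_daysperyear(code)
        if days is None:
            continue  # code 9 is treated as missing, as in the recode
        total, n = totals.get(group, (0, 0))
        totals[group] = (total + days, n + 1)
    means = {g: total / n for g, (total, n) in totals.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy rows (group label, attend code), NOT GSS data.
toy_rows = [("women", 7), ("women", 4), ("men", 0), ("men", 3), ("men", 9)]
result = group_means(toy_rows)
```

Sorting the means before graphing is what produces the descending order the dot plot calls for.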
Assignment 5
- Question: It is often thought that people with more education are likely to have fewer children. In fact, a common recommendation for countries that want to reduce population growth is to increase education. Is this hypothesis correct in the United States?
- Analytical tools: scatterplot and regression.
- Data: Use the 1,200-person sample from the 2012 GSS (link is in the “Data sets” section above). The variables are educ and childs. You might want to confine the analysis to people beyond peak childbearing age — say, age 45 and over.
- Ignore statistical significance for this assignment.
- Turn in a hard copy of your answer. No more than two single-spaced typed pages of text, with 12-point font and two-inch left and right margins.
- Due Monday, May 11, at the beginning of class.
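To see what the regression in Assignment 5 estimates, here is a minimal Python sketch of a bivariate least-squares fit restricted to respondents age 45 and over. The function name and the rows in toy_rows are hypothetical and are not GSS data; Stata's regression output reports the same slope and intercept along with much more.

```python
def ols_slope_intercept(xs, ys):
    """Least-squares slope and intercept for a regression of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical toy rows (age, educ, childs), NOT GSS data.
toy_rows = [(50, 10, 4), (62, 12, 3), (47, 14, 2), (30, 16, 0), (70, 16, 1)]

# Keep respondents age 45 and over, as the assignment suggests.
older = [(educ, childs) for age, educ, childs in toy_rows if age >= 45]
slope, intercept = ols_slope_intercept(*zip(*older))
```

The slope is the magnitude to interpret: it is the difference in the predicted number of children associated with one additional year of schooling.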
Assignment 6
- Question: What causes family instability?
- Analytical tools: scatterplot and regression.
- The share of American children who grow up in a single-parent family has increased steadily since the 1960s. Hypotheses about the key culprits abound. They include: (1) urbanization (the traditional stigma against divorce and out-of-wedlock childbearing erodes more rapidly in cities); (2) secularization (decline in religiosity erodes those stigmas); (3) declining social capital (Robert Putnam’s 2000 book Bowling Alone attributed the worsening of an array of social problems to the decrease in social capital in the US); (4) immigration (heterogeneity reduces the influence of norms); (5) poverty (low income puts strain on relationships); (6) income inequality (according to Richard Wilkinson and Kate Pickett, in their book The Spirit Level, income inequality increases anxiety and stress and thereby contributes to family dissolution); (7) labor force participation (William Julius Wilson, in his books The Truly Disadvantaged and When Work Disappears, hypothesized that lack of employment reduces family stability by reducing the number of marriageable males and by contributing to the erosion of other supportive institutions, such as community organizations); (8) college graduation rate (people with a college degree tend to wait longer before having a child).
- Data: Raj Chetty and colleagues have assembled data for 741 commuting zones in the United States in the early 2000s. The link is in the “Data sets” section above. For the outcome, family instability, use the variable cs_fam_wkidsinglemom, which is the fraction of children living with a single mother. Use the following variables to assess the hypotheses: (1) intersects_msa (0=nonurban, 1=urban); (2) rel_tot (fraction religious); (3) scap_ski90pcm (social capital index); (4) cs_born_foreign (fraction foreign born); (5) hhinc00 (household income per capita); (6) gini (larger numbers indicate higher income inequality); (7) cs_labforce (labor force participation rate); (8) gradrate_r (college graduation rate, income adjusted).
- Ignore statistical significance for this assignment.
- Turn in a hard copy of your answer. No more than two single-spaced typed pages of text, with 12-point font and two-inch left and right margins.
- Due Monday, May 18, at the beginning of class.
Assignment 7
- Question: Why do some people watch more television than others?
- Analytical tools: scatterplot and regression.
- Data: Use the 1,200-person sample from the 2012 GSS (link is in the “Data sets” section above). It includes a variable called tvhours, which is the (self-reported) number of hours per day the respondent watches television. Use multivariate OLS regression to examine the following variables as possible causes of time spent watching TV: age, sex, race, region (recode as south, other), education (educ), work status (wrkstat: recode as employed full-time, other), subjective class position (class), number of children (childs), US-born or immigrant (born), health, happiness (happy).
- Check for, and if necessary deal with, any outliers (extreme values) for the tvhours variable.
- Ignore statistical significance for this assignment.
- Turn in a hard copy of your answer. No more than two single-spaced typed pages of text, with 12-point font and two-inch left and right margins.
- Due Wednesday, May 27, at the beginning of class.
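The outlier check in Assignment 7 can be approached in more than one way. The Python sketch below shows one possibility, flagging values above a cutoff and then capping them (top-coding). The cutoff of 12, the function names, and the values in toy_tvhours are all assumptions made for illustration and are not GSS data; dropping the extreme cases or leaving them in are equally legitimate choices if you justify them.

```python
def flag_outliers(values, cutoff):
    """Return the values strictly above a chosen cutoff."""
    return [v for v in values if v > cutoff]


def top_code(values, cutoff):
    """Cap each value at the cutoff (one possible treatment, not the only one)."""
    return [min(v, cutoff) for v in values]


# Hypothetical toy values of hours per day, NOT GSS data.
toy_tvhours = [0, 1, 2, 2, 3, 4, 24]
CUTOFF = 12  # hypothetical threshold chosen only for illustration

outliers = flag_outliers(toy_tvhours, CUTOFF)
capped = top_code(toy_tvhours, CUTOFF)
mean_raw = sum(toy_tvhours) / len(toy_tvhours)
mean_capped = sum(capped) / len(capped)
```

Comparing results with and without the treatment shows how much a few extreme values drive the estimates.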
Assignment 8
- Question: Why haven’t Americans gotten happier in the past 40 years?
- In my “Happiness” chapter (link is above), I hypothesize that it is because some happiness boosters have decreased and some happiness depressors have increased. Pick five (or more) of the ones I mention and explore their relationship with happiness in the United States over time. The data points (cases) will be years.
- Analytical tools: line graph, scatterplot, and regression.
- Data: Use the GSS Berkeley site to calculate a measure of happiness in each GSS year from 1972 to 2012. You could use the share answering “not too happy,” use the share answering “very happy,” or calculate average happiness. Use the GSS and other data sources for the hypothesized causes you choose to examine.
- Ignore statistical significance for this assignment.
- Turn in a hard copy of your answer. No more than two single-spaced typed pages of text, with 12-point font and two-inch left and right margins.
- Due Monday, June 1, at the beginning of class.
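Since the cases in Assignment 8 are years, each year needs a single happiness number. A minimal Python sketch of one of the measures mentioned above, the share answering "very happy," follows. The function name and the responses in toy are hypothetical and are not GSS data; in practice the GSS Berkeley site produces these shares for you.

```python
from collections import defaultdict


def share_very_happy(responses):
    """Share of respondents answering "very happy" in each year."""
    counts = defaultdict(lambda: [0, 0])  # year -> [very happy, total]
    for year, answer in responses:
        counts[year][1] += 1
        if answer == "very happy":
            counts[year][0] += 1
    return {year: very / total for year, (very, total) in sorted(counts.items())}


# Hypothetical toy responses (year, answer), NOT GSS data.
toy = [
    (1972, "very happy"), (1972, "pretty happy"),
    (1972, "not too happy"), (1972, "very happy"),
    (2012, "very happy"), (2012, "pretty happy"),
    (2012, "pretty happy"), (2012, "not too happy"),
]
shares = share_very_happy(toy)
```

The resulting year-by-year series is what you would plot as a line graph and pair with each hypothesized cause in a scatterplot.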
RESEARCH REPORT
Write a report using quantitative data analysis to help answer a social science research question of your choosing. Follow the principles listed at the top of this syllabus.
Turn in a hard copy and upload your report on Ted. Emailed reports won’t be accepted. To turn it in on Ted, go to ted.ucsd.edu, log in, choose this course, and click on “Upload research report” in the blue menu bar. Your report won’t be visible to other students on Ted; this is just to allow me to check for plagiarism and length.
Length: 3,500 to 10,000 words. The report should be typed, single-spaced, in 12-point font with two-inch left and right margins. Put key graphs and tables in the text. Put appendix graphs and tables (if any) at the end.
Use footnotes to give credit to anyone from whom you borrow evidence or argument. I’m not picky about the formatting of the footnotes, but be sure to include the author(s), title, and year; don’t simply list an internet address.
Don’t plagiarize. If you aren’t sure what constitutes plagiarism and how to avoid it, see the UCSD Library’s guide to preventing plagiarism.
Due Wednesday, June 10, by 10:00am, in my office (SSB 472). A report turned in late but within 48 hours of the deadline will be penalized 25 points (out of 100). A report turned in more than 48 hours late, or not turned in at all, will receive a grade of zero.
GRADING
Course grades will be based on the following:
- 65%: assignments 1-8
- 35%: research report
Each of these will be graded on a scale of 0 to 100. So your numerical course grade is calculated as: (average score on assignments 1-8 × 0.65) + (research report score × 0.35).
Your letter grade for the course will be determined as follows:
- 97 and above = A+
- 93–96 = A
- 90–92 = A–
- 87–89 = B+
- 83–86 = B
- 80–82 = B–
- 77–79 = C+
- 73–76 = C
- 70–72 = C–
- 60–69 = D
- below 60 = F
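The grade calculation and letter scale can be sketched as follows. This is a Python illustration with one loudly stated assumption: the scale above lists whole numbers only, so the sketch assigns a fractional score to the band whose lower bound it meets or exceeds (96.5 would be an A, for example). The syllabus does not say how fractions are handled, and the function names are hypothetical.

```python
def course_score(assignment_scores, report_score):
    """Weighted numerical grade: 65% assignment average, 35% research report."""
    average = sum(assignment_scores) / len(assignment_scores)
    return average * 0.65 + report_score * 0.35


# Lower bound of each letter band, from the list above.
BANDS = [(97, "A+"), (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
         (77, "C+"), (73, "C"), (70, "C-"), (60, "D")]


def letter_grade(score):
    """Letter for a numerical score.

    ASSUMPTION: a score counts toward a band once it reaches the band's
    lower bound; the syllabus does not specify a rounding rule.
    """
    for lower_bound, letter in BANDS:
        if score >= lower_bound:
            return letter
    return "F"


# Example: 90 on every assignment and 80 on the research report.
score = course_score([90] * 8, 80)
letter = letter_grade(score)
```

In this example the assignment average contributes 58.5 points and the report 28, for a course score of about 86.5, which falls in the B band under the assumption above.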
There will be no extra-credit projects or assignments.
ACADEMIC INTEGRITY
Students are encouraged to share intellectual views and discuss freely the principles and applications of course materials. However, graded work must be the product of independent effort unless otherwise instructed. Students are expected to adhere to UCSD policy on academic integrity.
SPECIAL NEEDS AND ACCOMMODATIONS
Students who need special accommodation or services should contact the Office of Students with Disabilities (OSD), University Center 202, email osd@ucsd.edu, tel 858.534.4382. You must register and request that the OSD send me official notification of your accommodation needs as soon as possible. Please meet with me to discuss accommodations and how the course requirements and activities may affect your ability to fully participate.