Chapter 1 Syllabus

1.1 Course Description

The purpose of this course is to provide graduate students with the critical computing skills necessary to conduct research in quantitative / computational social science. This course is not an introduction to statistics, computer science, or specialized social science methods. Rather, the focus will be on practical skills necessary to be successful in further methods work. The first portion of the class introduces students to basic computer literacy, terminologies, and the R programming language. The second part of the course provides students the opportunity to use the skills they learned in part 1 towards practical applications such as webscraping, data collection through APIs, automated text analysis, etc. We will assume no prior experience with programming or computer science.

Objectives

By the end of the course, students should be able to:

  • Understand basic programming terminologies, structures, and conventions.
  • Write, execute, and debug R code.
  • Produce reproducible analyses using R Markdown.
  • Clean, transform, and wrangle data using the tidyverse packages.
  • Scrape data from websites and APIs.
  • Parse and analyze text documents.
  • Be familiar with the concepts and tools of a variety of computational social science applications.
  • Master basic Git and GitHub workflows
  • Learn independently and train themselves in a variety of computational applications and tasks through online documentation.

1.2 Who should take this course

This course is designed for Political Science Graduate students, but any graduate student is welcome. We will not presume any prior programming or computer science experience.

1.3 Requirements and Evaluation

Final Grades

This is a graded class based on the following:

  • Completion of assigned homework (50%)
  • Participation (25%)
  • Final project (25%)

Assignments

Assignments will be assigned at the end of every Thursday session. They will be due one week later, unless otherwise noted. The assignments are intended to expand upon the lecture material and to help students develop the actual skills that will be useful for applied work. The assignments will be frequent but each of them should be fairly short.

You are encouraged to work in groups, but the work you turn in must be your own. Group submission of homework, or turning in copies of the same code or output, is not acceptable. While you are encouraged to use the internet to help you debug, but do not copy and paste large chunks of code that you do not understand. Remember, the only way you actually learn how to write code is to write code!

Portions of the homework in R should be completed using R markdown, a markup language for producing well-formatted documents with embedded R code and outputs. To submit your homework, knit the R Markdown file to PDF, and then submit the PDF file through Canvas (unless otherwise noted).

Class Participation

The class participation portion of the grade can be satisfied in one or more of the following ways:

  • attending the lectures
  • asking and answering questions in class
  • attending office hours
  • contributing to class discussion through the Canvas site, and/or
  • collaborating with the computing community, either by attending a workshop or meetup, submitting a pull request to a github repository (including the class repository), answering a question on StackExchange, or other involvement in the social computing / digital humanities community.

Final Project

Students have two options for class projects:

  1. Data project: Use the tools we learned in class on your own data of interest. Collect and/or clean the data, perform some analysis, and visualize the results. Post your reproducible code on Github.

  2. Tool project: Create a tutorial on a tool we didn’t cover in class. Ideas include: bash, LaTex, pandoc, quanteda, tidytext, etc. Post it on github.

Students are required to write a short proposal by November 7 (no more than 2 paragraphs) in order to get approval / feedback from the instructors.

Project materials (i.e. a github repo) will be due by end of day on December 9. We will specify submission details in class.

On December 10 we will have a lightning talk session where students can present their projects in a maximum 5 minute talk.

Late Policy and Incompletes

All deadlines are strict. Late assignments will be dropped a full letter grade for each 24 hours past the deadline. Exceptions will be made for students with a documented emergency or illness.

I will only consider granting incompletes to students under extreme personal/family duress.

Academic Integrity

I follow a zero-tolerance policy on all forms of academic dishonesty. All students are responsible for familiarizing themselves with, and following, university policies regarding proper student conduct. Being found guilty of academic dishonesty is a serious offense and may result in a failing grade for the assignment in question, and possibtire course.

1.4 Activities and Materials

Course Structure

Classes will follow a “workshop” style, combining lecture, demonstration, and coding exercises. We envision the class to be as interactive / hands on as possible, with students programming every session. You must bring a laptop to class.

It is important that students complete the requisite reading before class. I anticipate spending 1/2 the class lecturing, and 1/2 practicing with code challenges.

Course Website

All course notes will be posted on https://plsc-31101.github.io/course/, including class notes, code demonstrations, sample data, and assignments. Students will be assigned reading from these notes before every class.

Students are encouraged to submit pull requests to the website repository, for example if they find a typo, or if they found a particularly helpful resource that would aid other students.

Students projects will also be shared on the course website.

Canvas

We will use Canvas for communication (announcements and questions) and turning in assignments.

You should ask questions about class material and assignments through the Canvas website so that everyone can benefit from the discussion. We encourage you to respond to each other’s questions as well. Questions of a personal nature can be emailed to us directly.

Computer Requirements and Software

By the end of the first week, you should install the following software on your computer:

  • Access to the UNIX command line (e.g., a Mac laptop, a Bash wrapper on Windows)
  • Git
  • R and RStudio (latest versions)

This requires a computer that can handle all this software. Almost any Mac will do the job. Most Windows machines are fine too if they have enough space and memory.

See Install Page for more information. We will be having an InstallFest on Wednesday, Oct 2 from 9:30 to 11:30 in Pick 504. for those students experiencing difficulties downloading and installing the requisite software.

1.5 Curriculum Outline / Schedule

  1. Oct 1 - Introduction
    • course objectives, logistics, overview of programming and reproducible research.
  2. Oct 3 - R Tools
    • R Studio, R Markdown, packages, help.
  3. Oct 8 - R Syntax
    • basic syntax, variables, functions, data types.
  4. Oct 10 - Data Structures
    • vectors, lists, factors, matrices, data frames.
  5. Oct 15 - Indexing and Subsetting
    • subsetting formats, operators, boolean conditionals, sub-assignment, applications.
  6. Oct 17 - Introduction to Data
    • common terms, formats, tidy data, storage, import/export, exploratory data analysis.
  7. Oct 22 - Manipulating data with dplyr
    • importing data, subsetting, summarizing, and conducting calculations across groups, piping.
  8. Oct 24 - Tidying data with tidyr
    • tidy data principles, gather, spread, separate, unite
  9. Oct 29 - Merging and Linking Data
    • relational data, keys, joins, missing values.
  10. Oct 31 - Visualization
    • base plotting, ggplot, grammar of graphics, writing images.
  11. Nov 5 - Hypothesis testing and regressions
    • models, model objects, stargazer.
  12. Nov 7 - Strings and Regex

  13. Nov 12 - R programming 1
    • conditional flow, functions.
  14. Nov 14 - R Programming 2
    • iterations, map.
  15. Nov 19 - Collecting data with APIs
    • requests, RESTful APIs, queries, API libraries.
  16. Nov 21 - Webscraping
    • HTML, CSS, In-Browser tools, scraping tools.
  17. Nov 26 - Text analysis 1
    • preprocessing, DTM, dictionary methods, distinctive words.
  18. Nov 28 - No class, Thanksgiving

  19. Dec 3 - Text analysis 2
    • supervised vs. unsupervised learning, Vector-space models, topic models.
  20. Dec 5 - Git / Github

  21. Dec 10, 1:30-3:30pm - Final project lightning talks