Chapter 5 Introduction

5.1 The Motivation

Here is the dream:

Computers have revolutionized research, and that revolution is only beginning. Every day, social scientists and humanists all over the world use them to study things that are too big, too small, too fast, too slow, too expensive, too dangerous, or just too hard to study any other way.

Now here is the reality:

Every day, scholars all over the world waste time wrestling with computers. Tasks that should take a few moments take hours or days, and many things never work at all. When scholars try to get help, they are inundated with unhelpful information and give up.

This sorry state of affairs persists for three reasons:

  1. No room, no time: Everybody’s schedule is full — there is simply not space to add more about computing without dropping something else.

  2. The blind leading the blind: The infrastructure does not exist to help scholars develop the skills they need. Senior researchers cannot teach the next generation how to do things that they do not know how to do themselves.

  3. Autodidact chauvinism: Since there are no classrooms, scholars are pressured to teach themselves. But self-learning is time consuming and nearly impossible without a base level of knowledge.

Despite these challenges, there are great reasons to learn how to program:

  1. Practical efficiency: Even though it takes some time at first, once you get the hang of it, learning how to program can save you an enormous amount of time doing basic tasks that you would otherwise do by hand.

  2. New tools: Some things are impossible or nearly impossible to do by hand. Computers open the door for new tools and methods, but many require programming skills.

  3. New data: The Internet is a wealth of data waiting to be analyzed! Whether it is collecting Twitter data, working with the Congress API, or scraping websites, programming knowledge is a must.

  4. Better scholarship: (Quality) programming can open the door to better transparency, reproducibility, and collaboration in the Social Sciences and the Humanities.

Goal of the Class: Learn to Learn

The basic learning objective of this course is to leave here with the knowledge and skills required to learn on your own, whether that is through programming documentation, StackExchange and other online fora, courses, or D-Lab workshops.

By the end of the course, students should be able to

  • Understand basic programming terminologies, structures, and conventions.
  • Write, execute, and debug R code.
  • Produce reproducible analyses using R Markdown.
  • Clean, transform, and wrangle data using the tidyverse packages.
  • Scrape data from websites and APIs.
  • Parse and analyze text documents.
  • Be familiar with the concepts and tools of a variety of computational social science applications.
  • Master basic Git and GitHub workflows.
  • Learn independently and train themselves in a variety of computational applications and tasks through online documentation.

This course will not

  • Teach you to be a professional programmer or software developer.
  • Teach you statistics, computer science, or specialized social science / digital humanities methods.

Why not just take a Computer Science course?

Computer Science courses do not anticipate the types of questions social scientists might ask, and therefore they

  • Introduce many unnecessary concepts.
  • Do a poor job of explaining how computer programming tools might be used by social scientists.
  • Are too resource-intensive for the average social scientist.

Programming is not useful just for computer scientists, methodologists, or people who work with “big data.”

A Practical Example

To illustrate, here is a practical example that comes out of my own research:

Task 1 (by hand)

Topic: International Human Rights “Naming and Shaming”

Question: Who shames whom on what?

Data: UN Human Rights Committee’s Universal Periodic Review

From Antigua & Barbuda 2011 review:

The Task: Parse a bunch of reports (PDFs) into a dataset (CSV). Add metadata for issue, action, and sentiment.

How much time will it take?

By Hand: 40,000 recommendations x 3 min per recommendation x 8-hour days x 5-day weeks = 12 months

Task 2 (by hand)

What if we wanted to extend this research?

Question:: How does UPR shaming compare to actual human rights abuses?

Data: Amnesty International’s Urgent Actions

Task: Collect all of Amnesty International’s Urgent Actions and add metadata for issue and country.

How much time will it take?

By hand: 25,000 recommendations x 3 min per recommendation x 8-hour days x 5-day weeks = 7.5 months

Tasks 1 & 2 (with a computer)

With a computer, we can write a program that

  1. Parses recommendations into a CSV.
  2. Codes recommendations by issue, action, and sentiment using computational text analysis tools.
  3. Uses webscraping to collect all of Amnesty International’s urgent actions.
  4. Runs simple regression models with R to correlate Amnesty reports with UPR shaming.

Total time: 2 months

Time Saved: 1.5 years

5.2 About This Class

About Me

My name is Rochelle Terman and I am a faculty member in Political Science.

  • A few years ago, I did not know how to program. Now I program almost every day.
  • I program mostly in Python and R. I have a special interest in text analysis and webscraping.
  • My substantive research is on international norms and human rights.
  • I will not be able to answer all your questions.
  • No one will.
  • But especially me.

Course Structure

The course is divided into two main sections:

1. Skills

Basic computer literacy, terminologies, and programming languages:

  • Base R: objects and data structures.
  • tidyverse for data analysis.
  • Modeling and visualization.
  • Key programming concepts (iteration, functions, conditional flow, etc.).

We are using R because it is the common programming language among Political Scientists. But if you understand the concepts, you should able to pick up Python and other languages pretty easily.

2. Applications

Use the skills learned in part 1 towards practical applications:

  • Webscraping.
  • APIs.
  • Computational Text Analysis.
  • Version control and communication.

The goal is to introduce the students to a medley of common applications so that they can discover which avenue to pursue in their own research and what such training would entail.

Class Activities

Classes will follow a “workshop” style, combining lecture, demonstration, and coding exercises. We envision the class to be as interactive/hands-on as possible, with students programming every session.

It is important that students complete the requisite reading before class. I anticipate spending 1/2 the class lecturing and 1/2 practicing with coding challenges.

Course Websites

Class notes and other materials are available here: https://github.com/plsc-31101/course/

We will use Canvas to distribute/accept assignments.

We will use Piazza for communications and discussion. Please use Piazza liberally.

Evaluation

This is a graded class based on the following:

  • Completion of assigned homework (50%).
  • Participation (25%).
  • Final project (25%).

If you want to audit, please let me know ASAP.

Assignments

  • In general, assignments are assigned at the end of lecture and are due the following week.
  • Exceptions will be noted.
  • The first assignment is due next week before class on Wednesday, October 7. It is available on Canvas.
  • Turn in assignments on Canvas.
  • Work in groups, but submit your own assignment.

Participation

The class participation portion of the grade can be satisfied in one or more of the following ways:

  • Attending the lectures.
  • Asking and answering questions in class.
  • Attending office hours.
  • Contributing to class discussion through the Piazza site.
  • Collaborating with the computing community.

Final Project

Students have two options for class projects:

  1. Data project: Use the tools we learned in class on your own data of interest.

  2. Tutorial project: Create a tutorial on a tool we did not cover in class.

Both options require an R Markdown file narrating the project.

Students are required to write a short proposal by November 6 (no more than 2 paragraphs) in order to get approval/feedback from the instructors.

Project materials (i.e., a GitHub repo) will be due by end of day on December 10. We will specify submission details in class.

On December 11, we will have a lightning talk session where students can present their projects in a maximum 5-minute talk.

Software

  • Installation instructions are on the website.
  • Get started EARLY.
  • Go to the Installfest (Wednesday, Sep 30, 9:30-11:30 on Zoom) to double check your installation.
  • If you have computer troubles, post the problem on Piazza with as much detail as possible.

5.3 Learning How to Program

Before we talk about what it takes to learn how to program, let’s first review what programming is.

What is programming?

A program is a sequence of instructions that specifies how to perform a computation. Most programs are written in a human-readable programming language (or “source code”) and then executed with the help of a compiler or interpreter.

A few basic instructions appear in just about every language:

  1. Input: Get data from the keyboard, a file, the network, or some other device.
  2. Output: Display data on the screen, save it in a file, send it over the network, etc.
  3. Math: Perform basic mathematical operations like addition and multiplication.
  4. Conditional execution: Only perform tasks if certain conditions are met.
  5. Iteration: Do the same task over and over again on different inputs.

That being said, programming languages differ from one another in the following ways:

  1. Syntax: Whether to add a semicolon at the end of each line, etc.
  2. Usage: JavaScript is for building websites, R is for statistics, Python is general purpose, etc.
  3. Level: How close you are to the hardware. ‘C’ is often considered to be the lowest- (or one of the lowest-) level language.
  4. Object-oriented: “Objects” are data + functions. Some programming languages are object-oriented (e.g., Python), and some are not (e.g., C).
  5. Many more: Here is a list of all the types of programming languages out there.

What language should you learn?

Most programmers can program in more than one language. That is because they know how to program generally, as opposed to “knowing” Python, R, Ruby, or whatever.

So what should your first programming language be? That is, what programming language should you use to learn how to program? At the end of the day, the answer depends on what you want to get out of programming. Many people recommend Python because it is fun, easy, and multi-purpose. Here is an article that can offer more advice.

In this class, we will be using R because it is the most popular language in our disciplinary community (of political scientists).

Regardless of what you choose, you will probably grow comfortable in one language while learning the basic concepts and skills that allow you to ‘hack’ your way into other languages. That is because programming is an extendible skill.

Thus, “knowing how to program” means learning how to think like a programmer, not necessarily knowing all the language-specific commands off the top of your head. Do not learn specific programming languages; learn how to program.

What programming is really like.

Here is the sad reality: When you are programming, 80% or more of your time will be spent debugging, looking stuff up (like program-specific syntax, documentation for packages, useful functions, etc.), or testing. This does not apply just to beginner or intermediate programmers, although you will grow more “fluent” over time.

Google software engineers write an average of 10-20 lines of code per day.

The lesson: Programming is a slow activity, especially in the beginning.

If you are a good programmer, you are a good detective!

How to Learn

Here are some tips on how to learn computer programming:

  1. Learning to program is 5% intelligence, 95% endurance.
  2. Like learning to play an instrument or speak a foreign language, it takes practice, practice, practice.
  3. Program a little bit every day.
  4. Program with others. Do the problem sets in pairs or groups.
  5. It is better to type than to copy and paste.
  6. Most “programming” is actually researching, experimenting, and thinking.
  7. Stay organized.

The 15-Minute Rule

We will follow the 15-minute rule in this class. If you encounter a problem in your assignments, spend 15 minutes troubleshooting the problem on your own. After 15 minutes, if you still cannot solve the problem, ask for help.

(Hat tip to Computing for Social Sciences.)

Debugging

Those first 15 minutes should be spent trying to debug your code. Here are some tips:

  • Read the errors!
  • Read the documentation.
  • Make it smaller.
  • Figure out what changed.
  • Check your syntax.
  • Print statements are your friends.

Using the Internet

You should also make use of Google and StackOverflow to resolve the error. Here are some tips for how to google errors:

  • Google: name-of-program + text in error message.
  • Remove user- and data-specific information first!
  • See if you can find examples that do and do not produce the error. Try other people’s code, but do not fall into the copy-paste trap.

Asking for Help

We will use Piazza for class-related questions and discussion. You are highly encouraged to ask questions and answer one another’s questions.

  1. Include a brief, informative title.
  2. Explain what you are trying to do and how it failed.
  3. Include a reproducible example.

Here are some helpful guidelines on how to properly ask programming questions:

  1. “How to Ask Programming Questions,” ProPublica.
  2. “How do I ask a good question?” StackOverflow.
  3. “How to properly ask for help,” Computing for Social Science.