PLSC 31101: Computational Tools for Social Science

Chapter 20 Cheat Sheets and Guides

  1. RStudio IDE Cheat Sheet

  2. R Markdown Cheat Sheet

  3. R Markdown Reference Guide

  4. Base R Cheat Sheet

  5. Data Transformation with dplyr Cheat Sheet

  6. Data Visualization Cheat Sheet

  7. Regular Expressions Cheat Sheet