DATA 202 Supplemental Notes

6 Communication

6.1 Resources

  • Tell a Meaningful Story With Data

  • Don’t misuse “experiment”

  • Shiny Apps

  • but-therefore

  • GitHub Pages