class: left, top, title-slide

.title[
# Introduction
Example
]
.author[
### Keith VanderLinden
Calvin University
]

---

<!--
Start with a scripture reading that is in keeping with the perspectives assignments in the course (TBD). Ken uses Philippians 1:9-11.
-->

# Data Science in Action

Consider H. Rosling’s well-known demonstration of data visualization.

<iframe width="560" height="315" src="https://www.youtube.com/embed/hVimVzgtD6w" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

.footnote[See: https://www.ted.com/talks/hans_rosling_the_best_stats_you_ve_ever_seen]

???

This is Hans Rosling's 2006 TED talk. Rosling died in 2017.

They will have watched this for the preparation assignment. Review the answers to the preparation question on Rosling's students (the problem wasn't so much ignorance as preconceptions). Data tests preconceptions.

Here, we do a single, full example that approximates Rosling's visualization.

References:
- Mine's UN votes (https://www.youtube.com/watch?v=OJ1xR0ObhIw)?
- H. Rosling's videos (https://youtu.be/Z8t4k0Q8e8Y or https://www.youtube.com/watch?v=hVimVzgtD6w 0:00-5:15)
- Ken's example https://cs.calvin.edu/courses/data/202/ka37/slides/w02/w2d2-vis.html

---

# Example: Raw Data

.pull-left[
```r
*gapminder
```

.footnote[See: https://www.rdocumentation.org/packages/gapminder]
]

.pull-right[
```
## # A tibble: 1,704 × 6
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1952    28.8  8425333      779.
##  2 Afghanistan Asia       1957    30.3  9240934      821.
##  3 Afghanistan Asia       1962    32.0 10267083      853.
##  4 Afghanistan Asia       1967    34.0 11537966      836.
##  5 Afghanistan Asia       1972    36.1 13079460      740.
##  6 Afghanistan Asia       1977    38.4 14880372      786.
##  7 Afghanistan Asia       1982    39.9 12881816      978.
##  8 Afghanistan Asia       1987    40.8 13867957      852.
##  9 Afghanistan Asia       1992    41.7 16317921      649.
## 10 Afghanistan Asia       1997    41.8 22227415      635.
## # … with 1,694 more rows
```
]

???
This is the raw data, taken from the gapminder package. A tibble is an upgraded R data.frame.

- The dataset is rectangular, as most simple datasets are.
- There are 1,704 rows (observations) and 6 columns (variables), and thus 10,224 values (1,704 × 6).

---

# Example: Data Summary

.pull-left[
```r
gapminder %>%
* summarize(AvgLifeExp = mean(lifeExp))
```
]

.pull-right[
```
## # A tibble: 1 × 1
##   AvgLifeExp
##        <dbl>
## 1       59.5
```
]

???

This computes a single average life expectancy over all 1,704 observations (every country and every year); there is no grouping yet.

---

# Example: Data Aggregation

.pull-left[
```r
gapminder %>%
* group_by(year) %>%
  summarize(AvgLifeExp = mean(lifeExp))
```
]

.pull-right[
```
## # A tibble: 12 × 2
##     year AvgLifeExp
##    <int>      <dbl>
##  1  1952       49.1
##  2  1957       51.5
##  3  1962       53.6
##  4  1967       55.7
##  5  1972       57.6
##  6  1977       59.6
##  7  1982       61.5
##  8  1987       63.2
##  9  1992       64.2
## 10  1997       65.0
## 11  2002       65.7
## 12  2007       67.0
```
]

???

This groups the data by year and computes an average per group.

---

# Example: Data Visualisation

.pull-left[
```r
gapminder %>%
  group_by(year) %>%
  summarize(AvgLifeExp = mean(lifeExp)) %>%
* ggplot() +
* aes(x=year, y=AvgLifeExp) +
* geom_line()
```
]

.pull-right[
<img src="example_files/figure-html/unnamed-chunk-5-1.png" width="100%" />
]

???

This takes the same data, grouped by year, and plots the average over time using a line plot.

---

# Example: Data Visualisation (cf. Rosling)

.pull-left[
```r
library(scales)
gapminder %>%
  filter(year == 2007) %>%
  ggplot() +
  aes(x = gdpPercap, y = lifeExp) +
  geom_point(alpha = .8) +
  aes(color = continent) +
  aes(size = pop) +
  scale_x_continuous(
    breaks = c(400, 4000, 40000),
    trans = "log10") +
  labs(x = "GDP per Capita") +
  labs(y = "Life Expectancy (years)") +
  labs(color = "Continent") +
  labs(size = "Population") +
  scale_size_area(labels=label_comma()) +
  theme_bw() +
  annotation_logticks(sides = "b")
```
]

.pull-right[
<img src="example_files/figure-html/unnamed-chunk-6-1.png" width="100%" />
]

???
This graph approximates a single frame (the year 2007) of Rosling's famous animation. Only summarize it here; we'll demo the code in detail in the RStudio slides.
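
---

# Example: Summary vs. Aggregation on Toy Data

A minimal sketch contrasting the Data Summary and Data Aggregation pipelines on a small, hypothetical tibble. The `toy` data and its values are invented for illustration (they are not from gapminder), and the sketch assumes dplyr is installed. `summarize()` alone collapses every row into one value, while `group_by(year)` first yields one value per year; the last line repeats the rows × columns arithmetic from the raw-data slide.

```r
library(dplyr)

# Hypothetical toy data (invented values, not from gapminder):
# two countries, each observed in two years
toy <- tibble(
  country = c("A", "B", "A", "B"),
  year    = c(1952L, 1952L, 2007L, 2007L),
  lifeExp = c(40, 50, 60, 70)
)

# summarize() alone: one row, the mean over all 4 observations
overall <- toy %>%
  summarize(AvgLifeExp = mean(lifeExp))

# group_by() then summarize(): one row per year
by_year <- toy %>%
  group_by(year) %>%
  summarize(AvgLifeExp = mean(lifeExp))

overall   # AvgLifeExp = 55
by_year   # 1952 -> 45, 2007 -> 65

# Rows times columns gives the number of values: 4 * 3 = 12
nrow(toy) * ncol(toy)
```

The same contrast explains why the gapminder summary returns one row (59.5) while the aggregation returns twelve, one per survey year.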