This lab exercise introduces R and Posit/RStudio using data from the Seattle
pet registration website. We will use Posit’s RStudio Integrated
Development Environment (IDE) to manage data, analysis, and
scripting.
The Dataset
Go to the dataset website (linked above) and do the following:
Review the description of the dataset and, in particular, study
the data dictionary (i.e., “Columns in this Dataset”), which
specifies the data fields stored for each pet registration
entry.
Download (i.e., “Export”) the dataset to your local machine and
load it into MS Excel. Check that the data matches the website’s
description. Verify that the fields listed in the data dictionary match
the columns in the file, and check how many entries there are.
This dataset is a well-formatted, comma-separated-values (CSV) file.
If you have problems downloading the file, you can access this local copy (data from
October 7, 2021).
RStudio
RStudio integrates file editing and R programming. Log in to the
course Posit server (see the main course page for the link) and do the
following:
Console — The Console pane (on the lower left) provides
an R interpreter into which you can enter R commands. Try typing the
following commands:
1 + 1 — This arithmetic expression, and
others like it, should evaluate properly.
library(tidyverse) — This function call
loads the Tidyverse, a collection of R packages that provides basic data
wrangling and analysis functions.
Files — The Files pane (on the lower right) provides a
file browser. For both this lab and future labs, do the following:
Create a “New Folder” for this course with an appropriate name
(e.g., info601) and then, inside that folder, create a
sub-folder named lab01 for this lab. Use this folder (e.g.,
info601/lab01) to store all your lab 1 files and to keep
them separate from the files for your other lab and homework
assignments.
For this lab (and others), you’ll also need data files, so create
a data sub-folder (e.g., info601/lab01/data)
and “upload” the pet registration CSV file linked from the Moodle course
into that folder.
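The same folder layout can also be created from the Console instead of with the “New Folder” button. This is only a minimal sketch: the folder names are the examples used above, and `base_dir` is a hypothetical stand-in (a temporary directory here) for your home folder on the course server.

```r
# Hypothetical base directory. A temporary folder is used so that the
# example is safe to run anywhere; on the course server you would
# likely start from your home folder instead.
base_dir <- tempdir()

# Build the path info601/lab01/data in a platform-independent way.
data_dir <- file.path(base_dir, "info601", "lab01", "data")

# recursive = TRUE also creates info601 and lab01 if they are missing.
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

# Confirm that the folder now exists.
dir.exists(data_dir)
```

Uploading the CSV file itself is still easiest with the “Upload” button in the Files pane.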
Editor — The Editor pane (which will appear on the upper
left) provides a way to view and modify files. Do the following:
Click on the CSV file in the Files pane and select “View File”.
This will open the raw CSV file in the Editor pane.
Be sure that you can answer these questions:
What data is stored in the first line of the CSV file?
How do the other lines of the file compare with the way they are
presented in Excel?
Why are there no spaces around the commas?
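The raw text of a CSV file can also be inspected from the Console, which may help when thinking about these questions. The sketch below is illustrative only: it writes a tiny stand-in file whose column names and rows are made up, not taken from the real registry, and `sample_path`, `raw_lines`, and `fields` are hypothetical names.

```r
# Write a tiny stand-in CSV file. The contents are invented for
# illustration and are not the real registry data.
sample_path <- tempfile(fileext = ".csv")
writeLines(
  c("name,species,zip",
    "Rex,Dog,98101",
    "Tom,Cat,98105"),
  sample_path
)

# Read the file as plain text: one character string per line, no parsing.
raw_lines <- readLines(sample_path)

# Look at the first line of the file.
raw_lines[1]

# Look at every line except the first.
raw_lines[-1]

# Split one line at each comma to see the individual values.
fields <- strsplit(raw_lines[2], ",", fixed = TRUE)[[1]]
fields
```

To try this on the real file, replace `sample_path` with the path to the CSV file you uploaded.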
Environment — The Environment pane (which will appear on
the upper right) provides a view of the R objects currently being stored
by RStudio. Do the following:
Click on the CSV file again in the Files pane, but this time
select “Import Dataset”. This will open a formatted version of the CSV file
in the Editor pane and add the registry dataset to the Environment pane.
Verify that this happens.
Create a new variable by typing x <- 1 in the
Console pane. Verify that this adds a new object named x to
the Environment pane.
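Both steps can also be done entirely from the Console. The sketch below is a rough equivalent, not the code that the “Import Dataset” dialog generates: it uses base R's `read.csv` on a small made-up file, and the names `sample_path` and `registry` are hypothetical.

```r
# Write a tiny stand-in CSV file. The contents are invented for
# illustration and are not the real registry data.
sample_path <- tempfile(fileext = ".csv")
writeLines(
  c("name,species,zip",
    "Rex,Dog,98101",
    "Tom,Cat,98105"),
  sample_path
)

# Parse the CSV file into a data frame; the first line supplies the
# column names by default.
registry <- read.csv(sample_path)

# List the fields, to compare against the data dictionary.
names(registry)

# Count the entries (rows) in the dataset.
nrow(registry)

# Create a new variable, as in the step above.
x <- 1

# Check that both objects now exist in the current R session.
exists("registry")
exists("x")
```

After running this, `registry` and `x` should both be listed in the Environment pane.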
There’s nothing to turn in for this exercise.