Homework 5 - Teams Database

In this homework, we design and build a relational database for an organization. Start by creating a new RMarkdown document named hw5-teams.Rmd using the standard homework format. Include your PostgreSQL commands as non-executed, SQL code chunks.

The purpose of this database is to maintain records on people and teams for an organization.

Data

The database needs to represent information on the following:

People - People are the core of an organization. Each person has a name and a start date (when they joined the organization).
Groups - Every person is assigned to one support group, which has a group name and a discussion topic (e.g., work-life balance).
Teams - Teams are sets of people who work on different tasks. Each person can be on more than on team. Each person has a role on the team (i.e., facilitator, administrator, or NULL).

Design and implement a relational schema using SQL that is appropriate for this database, and then add around a dozen records to this database, making sure that there are:

people who are not on any team, are on one team, and are on multiple teams.
at least two groups and at least two teams

Queries

Write SQL queries to retrieve:

The personal information of all the people in the organization, alphabetized by name.
The name of a particular person along with their support group name and topic.
The names of all the people on a particular team along with the name of the team and their role on that team.

R Access

Finally, load your database into RStudio using dplyr to reproduce the queries you wrote in SQL. Start by loading the required libraries.

library(tidyverse)
library(DBI)

You can load an ElephantSQL database in R by going to the “Details” tab for you ElephantSQL data instance and copying the following information into R.

DB_HOST <- "COPY_THE_SERVER_HERE"
DB_USER <- "COPY_THE_USER_HERE"
DB_PASSWORD <- "COPY_THE_PASSWORD_HERE"

You can then connect to the database and load the data as follows.

db <- DBI::dbConnect(RPostgres::Postgres(), 
                     host = DB_HOST, 
                     port = 5432,
                     user = DB_USER,
                     password = DB_PASSWORD)

dbListTables(db)
YOUR_TABLE <- tbl(db, "YOUR_TABLE_NAME")

Then, you can craft the required dplyr commands.

Conclusion

Because this is not an exploratory analysis, there are no analytical conclusions. Instead, give a one-or-two-paragraph discussion of the relative value of the dataset technologies we’ve used before this unit (e.g., CSV files, spreadsheets) and the relational database technologies we’ve used in this unit. Why would an organization choose one over the other? Which is more common in business organizations?