Relational Data The Relational Model

class: left, top, title-slide

.title[
# Relational Data The Relational Model
]
.author[
### Keith VanderLinden Calvin University
]

---

# The Relational Data Model

.pull-left[
Realistic datasets tend to comprise multiple *tables* that can be too large to easily fit into a computer&rsquo;s main memory. Such datasets have been represented most commonly using the *relational model*, which distinguishes the following:

- Database
- Table
- Record
- Field

*Relational database management systems* (RDBMS) are used to represent and work with relational data.
]

???
The real flights database is much larger than `nycflights13`, which includes only data on the three major NYC airports for the year 2013 (~350K flights).

- Database: A database is a set of tables.
- Tables: A table/relation is a set of records/rows.
- Records: A record/row is a set of data fields/cells.
- Fields: A field/cell stores an atomic value.

This view is analogous to the multi-dataframe datasets we saw in unit 7.

The relational model was developed by E.F. Codd at IBM in the 1970s. The first RDBMS followed soon thereafter.

There are other forms of non-relational datasets we won't address in this class.

---
# Relational Schemata

.pull-left[
Each relation is structured as using a *schema*, which specifies the relation name and its attributes for that relation.

The example on the right gives the schema for a supplier-parts-jobs database, and includes some sample data.

The ID values are called *primary keys*; they uniquely identify each row in the relation.
]
.pull-right[
<pre>
 Supplier(ID, name, loginID, password)
 1 acme acme joshua
 2 ronco ronco ronco
 
 Part(ID, name, price)
 1 birdSeed $5
 2 TNT $100
 3 leadShot $10

Job(ID, name, address)
 1 StudentUnion Calvin
 2 NULL Amway
</pre>
]

???
Explain the dataset.

---
# Relational Schemata

.pull-left[
Each relation is structured as using a *schema*, which specifies the relation name and its attributes for that relation.

The example on the right gives the schema for a supplier-parts-jobs database, and includes some sample data.

The ID values are called *primary keys*; they uniquely identify each row in the relation.

This updated schema includes *foreign keys*, which are used to join the relations together. Foreign keys allow *one-to-many* and *many-to-many* relationships between relations.
]
.pull-right[
<pre>
 Supplier(ID, name, loginID, password)
 1 acme acme joshua
 2 ronco ronco ronco

Part(ID, name, price, supplierID)
 1 birdSeed $5 1
 2 TNT $100 1
 3 leadShot $10 1

Job(ID, name, address)
 1 StudentUnion Calvin
 2 NULL Amway

PartJob(PID, JID, quantity, date )
 1 1 10 10-20-2018
 2 2 2 10-21-2018
 1 2 3 10-21-2018
</pre>
]

???
Explain the dataset.

Give examples of key values, foreign keys, 1-m and m-m relationships.

---
# Normalization

Not all schemata are equally effective. One could imagine storing the example data as follows:

<pre>
SPJ(S-name, loginID, password, P-name, price, J-name, address, quantity, date )
 Acme acme joshua seed $5 Union Calvin 10 10-20-2018
</pre>

???
- We could store the SPJ data in a single table, which would be some sort of "log book".
- There's no need for keys or foreign keys here but there's a problem with redundancy/scalability, which can be seen as we add more data.

---
# Normalization

Not all schemata are equally effective. One could imagine storing the example data as follows:

<pre>
SPJ(S-name, loginID, password, P-name, price, J-name, address, quantity, date )
 Acme acme joshua seed $5 Union Calvin 10 10-20-2018
 Acme acme joshua TNT $100 NULL Amway 2 10-21-2018
 Acme acme joshua shot $10 NULL Amway 5 11-01-2018
 &hellip;
</pre>

This creates unacceptable redundancy.

???
Notice how we've had to repeat information on each line unnecessarily.

---
# Normalization

Not all schemata are equally effective. One could imagine storing the example data as follows:

<pre>
SPJ(S-name, loginID, password, &hellip;, P-name, price, &hellip;, J-name, address, &hellip;, quantity, date )
 Acme acme joshua &hellip; seed $5 &hellip; Union Calvin &hellip; 10 10-20-2018
 " " " " TNT $100 &hellip; NULL Amway &hellip; 2 10-21-2018
 " " " " shot $10 &hellip; " " " 5 11-01-2018
 &hellip;
</pre>

This creates unacceptable redundancy --- see the (unhealthy) use of &ldquo;ditto&rdquo; marks in this data.

???
The repetitions are easier to see with these "ditto" marks. The dittoed data here has to be the same on every line, which is problematic.
- We *waste storage space* storing the same thing over and over again. Computer memory is cheap, but it's not that cheap. Imagine storing dozens of fields for each supplier and having to repeat them every time that supplier sold something.
- We *risk inconsistency* in the database. If the supplier password gets changed, for any reason, in one place, we'll never be certain what the right value is.
- We have *trouble updating* information. If the company changes its password, we need to change that value in many places.

It&rsquo;s generally best to *normalize* relational schemata so that related data items (e.g., supplier name, supplier login, supplier password, &hellip;) are factored out into separate tables with primary keys and records are inter-related using *foreign keys*.

<pre>
Supplier(ID, name, loginID, password, &hellip;)
Part(ID, name, price, supplierID, &hellip;)
Job(ID, name, address, &hellip;)
PJ(PID, JID, quantity, date, &hellip;)
</pre>

???
Redundancy takes more storage space and allows inconsistency.

---
# Example: A Normalized SPJ Schema

.pull-left[
<pre>
Supplier(ID, name, loginID, password)
Part(ID, name, price, supplierID)
Job(ID, name, address)
PJ(PID, JID, quantity, date)
</pre>
]
.pull-right[
![SPJ Schema](images/spj-schema.png)
]

???
- This database schema is stripped down for use as an example, but it shows the key relational elements required for this course.
- Review the tables, records, fields, primary and foreign keys.