DataFlow Tools

Keith VanderLinden
Calvin University

Demos

  • Basic Tools
    • Python Scripts
    • Typer
    • Make
  • SLO Example

Python Scripts

We’re moving code chunks from Jupyter notebooks to Python scripts such as this one.

import pandas as pd

data_df = pd.read_csv("data/test.csv")

# Add a (pointless) column for the appropriate way to address the soldier.
data_df["address"] = data_df["rank"] + " " + data_df["name"]

data_df.to_csv("data/dataset.csv", index=False)

We can execute this on the CLI.

python src/process_script.py

Typer

We can use this equivalent Typer script.

import pandas as pd
import typer

app = typer.Typer()

@app.command()
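def dataset_build(
    # typer.Option(...) declares a required --option; a bare parameter
    # would instead become a positional argument.
    raw_data_filename: str = typer.Option(...),
    dataset_filename: str = typer.Option(...),
):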
    data_df = pd.read_csv(raw_data_filename)
    data_df["address"] = data_df["rank"] + " " + data_df["name"]
    data_df.to_csv(dataset_filename, index=False)

if __name__ == "__main__":
    app()

And execute it on the CLI.

python src/process.py \
    --raw-data-filename "data/test.csv" \
    --dataset-filename="data/dataset.csv"
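
The command line above passes both filenames as named options, in both the space-separated and the `=` form. As a rough illustration of that interface, here is a hand-written stand-in using the standard library's argparse. It is not how Typer is implemented, and `build_parser` is a hypothetical name introduced only for this sketch.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the same two required options as the CLI call above."""
    parser = argparse.ArgumentParser(description="Build the dataset from raw data.")
    parser.add_argument("--raw-data-filename", required=True)
    parser.add_argument("--dataset-filename", required=True)
    return parser


# Parse the same arguments used on the command line above.
args = build_parser().parse_args(
    ["--raw-data-filename", "data/test.csv", "--dataset-filename=data/dataset.csv"]
)

# argparse maps the dashes in option names to underscores in attribute names.
print(args.raw_data_filename, args.dataset_filename)
```

The appeal of Typer is that this boilerplate is derived from the function signature instead of being written by hand.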

Make

We can now orchestrate the dataflow with a Makefile. By default, Make requires each recipe line to begin with a tab character, not spaces.

SHELL = /bin/bash
DATA_DIR  := ./data
SRC_DIR   := ./src

ALL: $(DATA_DIR)/dataset.csv

$(DATA_DIR)/test.csv: $(DATA_DIR)/test.csv.dvc
    dvc pull

$(DATA_DIR)/dataset.csv: $(DATA_DIR)/test.csv $(SRC_DIR)/process.py
    python $(SRC_DIR)/process.py \
        --raw-data-filename $(DATA_DIR)/test.csv \
        --dataset-filename $(DATA_DIR)/dataset.csv

.PHONY: ALL clean
clean:
    rm -f $(DATA_DIR)/test.csv
    rm -f $(DATA_DIR)/dataset.csv
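
Make decides whether to run a recipe by comparing file modification times: a target is rebuilt if it is missing or older than any of its prerequisites. The sketch below illustrates that rule in Python. It is a simplification (real Make also walks the dependency chain and handles phony targets), and `needs_rebuild` is a hypothetical helper, not part of Make or any library.

```python
import os
import tempfile


def needs_rebuild(target: str, prerequisites: list[str]) -> bool:
    """Return True if target is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


with tempfile.TemporaryDirectory() as tmp:
    raw = os.path.join(tmp, "test.csv")
    out = os.path.join(tmp, "dataset.csv")

    # The raw data exists but the dataset has not been built yet.
    open(raw, "w").close()
    os.utime(raw, (1000, 1000))
    missing = needs_rebuild(out, [raw])

    # The dataset is built after the raw data was last modified.
    open(out, "w").close()
    os.utime(out, (2000, 2000))
    fresh = needs_rebuild(out, [raw])

    # The raw data changes again, so the dataset is now stale.
    os.utime(raw, (3000, 3000))
    stale = needs_rebuild(out, [raw])

print(missing, fresh, stale)  # True False True
```

This is why running make twice in a row does no work the second time: nothing upstream of dataset.csv has changed.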

Snapshot

See the live demo.