Demos
Basic Tools
Python Scripts
Typer
Make
SLO Example
We’ll do separate slides for the basic tools and then return here for the SLO example.
The SLO system is more complicated than TagIfAI and the team project, too complicated to run here in real time.
The pre-processing stack requires tools that are hard to install, so I stick with the container, which has a Dockerfile carefully designed to install the tools properly.
It’s slow and it takes lots of memory.
It had a Kafka realtime dataflow, which is just patched here.
./data files - All of these files are DVC controlled, even the ones that can be reproduced. The dataflow is complicated enough that it was hard to remember how it worked.
src/dataset-preprocessor.py - This script uses Fire rather than Typer, and does considerably more text pre-processing than the class examples. We won’t go through all that here, but note the full documentation of the Fire functions.
Makefile - This has the same basic structure as the example, but has 7 processing steps, not counting the model.
I standardized the dataset filenames to help keep things sane.
Walk back through the prerequisite-target structure, starting at the top, with data/dataset.json.dvc.
There are named targets in here, e.g., datasets, which make the Makefile easier to understand, but perhaps aren’t necessary for simpler examples. I PHONY’d them to make sure they always run.
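The walk through the prerequisite-target structure can be sketched as a small recursive traversal. This is only an illustration of how Make resolves a chain of prerequisites, not the actual SLO pipeline: apart from data/dataset.json.dvc, every file name below is a hypothetical placeholder, and build_order is a helper invented for this sketch.

```python
# Hypothetical prerequisite map: each target lists the files it depends on.
# Only data/dataset.json.dvc is named in the notes; the rest are placeholders.
PREREQUISITES = {
    "data/dataset.json": ["data/dataset.json.dvc"],
    "data/dataset-cleaned.json": ["data/dataset.json"],
    "data/dataset-tokenized.json": ["data/dataset-cleaned.json"],
}


def build_order(target, prerequisites, seen=None):
    """Return the targets in the order they must be brought up to date."""
    seen = set() if seen is None else seen
    if target in seen:
        return []
    seen.add(target)
    order = []
    # Resolve every prerequisite first, then the target itself.
    for prereq in prerequisites.get(target, []):
        order.extend(build_order(prereq, prerequisites, seen))
    order.append(target)
    return order
```

Asking for the last (hypothetical) target returns the whole chain, starting from the .dvc file at the top.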
Review the SLO data stack/flow.
Python Scripts
We’re moving code chunks from Jupyter notebooks to Python scripts such as this one.
import pandas as pd

data_df = pd.read_csv("data/test.csv")
# Add a (pointless) column for the appropriate way to address the soldier.
data_df["address"] = data_df["rank"] + " " + data_df["name"]
data_df.to_csv("data/dataset.csv", index=False)
We can execute this on the CLI.
python src/process_script.py
Scripts like this running on the CLI are the basis of production system tooling.
They’re mostly built from what you’d see in a traditional introduction-to-programming course.
You may get a configuration GUI of some sort, but you’ll always be running scripts. The CLI is the foundation here.
Typer
We can use this equivalent Typer script.
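import pandas as pd
import typer

app = typer.Typer()


@app.command()
def dataset_build(
    raw_data_filename: str = typer.Option(...),
    dataset_filename: str = typer.Option(...),
):
    # typer.Option(...) makes each parameter a required --option, matching
    # the CLI call; bare parameters would be positional arguments instead.
    data_df = pd.read_csv(raw_data_filename)
    data_df["address"] = data_df["rank"] + " " + data_df["name"]
    data_df.to_csv(dataset_filename, index=False)


if __name__ == "__main__":
    app()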
And execute it on the CLI.
python src/process.py \
--raw-data-filename "data/test.csv" \
--dataset-filename="data/dataset.csv"
The script in the previous slide works, but it’s not very flexible. It would be better to be able to pass in the input and output files as arguments to a CLI command, which is what Typer does.
Notes
The @app.command() decorator is used to define a CLI command.
Use \ to break the command across lines.
Use double dashes (--) to name the CLI arguments.
Use single dashes (-) to separate words in the CLI arguments, with underscores (_) in the code.
I’ve left out the Annotated version of the Typer script for simplicity. See the example code (linked from Moodle) for what you should do.
There are other Python tools that do this, but Typer is pretty good.
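For comparison, here is a minimal sketch of the same command using argparse from the standard library, one of the other tools alluded to above. It is not the course's example code: it swaps pandas for the csv module so that it runs without extra installs, and the build_parser and main helpers are names chosen for this sketch. It shows the same conventions as the notes: double dashes name the options, and dashes in the option name become underscores in the code.

```python
import argparse
import csv


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the same two options as the Typer example."""
    parser = argparse.ArgumentParser(description="Build the dataset.")
    # Dashes in the option name become underscores in the attribute name,
    # so --raw-data-filename is read back as args.raw_data_filename.
    parser.add_argument("--raw-data-filename", required=True)
    parser.add_argument("--dataset-filename", required=True)
    return parser


def dataset_build(raw_data_filename: str, dataset_filename: str) -> None:
    """Add an "address" column built from the "rank" and "name" columns."""
    with open(raw_data_filename, newline="") as infile:
        rows = list(csv.DictReader(infile))
    for row in rows:
        row["address"] = row["rank"] + " " + row["name"]
    fieldnames = list(rows[0].keys()) if rows else ["rank", "name", "address"]
    with open(dataset_filename, "w", newline="") as outfile:
        writer = csv.DictWriter(outfile, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def main(argv=None) -> None:
    """Parse the CLI arguments (sys.argv by default) and build the dataset."""
    args = build_parser().parse_args(argv)
    dataset_build(args.raw_data_filename, args.dataset_filename)

# In a script, call main() under the usual `if __name__ == "__main__":` guard.
```

Both `--dataset-filename data/dataset.csv` and `--dataset-filename=data/dataset.csv` are accepted, as in the Typer call. Typer generates this kind of boilerplate from the function signature, which is the main reason to prefer it.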
Make
We can now orchestrate the dataflow with a Makefile.
SHELL = /bin/bash
DATA_DIR := ./data
SRC_DIR := ./src

ALL: $(DATA_DIR)/dataset.csv

$(DATA_DIR)/test.csv: $(DATA_DIR)/test.csv.dvc
	dvc pull

$(DATA_DIR)/dataset.csv: $(DATA_DIR)/test.csv
	python $(SRC_DIR)/process.py \
		--raw-data-filename $(DATA_DIR)/test.csv \
		--dataset-filename $(DATA_DIR)/dataset.csv

.PHONY: clean
clean:
	rm -f $(DATA_DIR)/test.csv
	rm -f $(DATA_DIR)/dataset.csv
Talk through the features of the Makefile.
SHELL
Variables
Targets, particularly the default ALL target.
Target-prerequisite pairs, separated by colons.
Commands, indented with tabs (!)
.PHONY, a special target whose prerequisites are targets that don’t correspond to files, thus supporting general CLI commands.
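The rule Make applies to each target-prerequisite pair can be approximated in a few lines of Python. This is a simplified sketch, not Make's actual implementation: needs_rebuild is a name invented here, and it assumes the prerequisites already exist on disk (real Make would first try to build a missing one).

```python
import os
from typing import Iterable


def needs_rebuild(target: str, prerequisites: Iterable[str], phony: bool = False) -> bool:
    """Approximate Make's decision about whether to run a target's commands."""
    # A phony target does not name a file, so its commands always run.
    if phony:
        return True
    # A missing target file must be built.
    if not os.path.exists(target):
        return True
    # Otherwise rebuild only if some prerequisite is newer than the target.
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

With the Makefile above, this is why a second `make` does nothing once data/dataset.csv is newer than data/test.csv, while `make clean` runs every time.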
I’ve also created targets for the following, as suggested by G. Mohandas. Show them in the provided code.
Venv - I do this frequently and can’t remember the details.
Styling - I installed the black, flake8, and isort plugins in VSCode as well, and I’m not sure that I’m thrilled to have these tools auto-formatting my code.
Demo this in the data562-tagifai CLI, without showing the Makefile.
make clean
make
make style
make venv
(Skip really making this.)
Snapshot
See the live demo.