Purpose: to practice using an LLM API from Python — conversations, system prompts, and tool calling — and to build something practical: an assistant that reads a folder of documents and tags them.
Overview ¶
You’ll build a Chatbot class that wraps the OpenAI client library, then use it for a real task: tagging a folder of text files with topic labels. Media companies do this kind of work on video archives; support teams do it on customer emails; you’ll get the basic idea in a few dozen lines.
The lab uses an on-campus LLM server running Qwen3.5-9B, so there’s no API key to set up and no credit card needed.
Setup ¶
Install the openai package: in Thonny, Tools > Manage packages, search for it and click Install.
Download lab13.zip and extract it (right-click → Extract All, or double-click and then drag the folder out) into your class folder. Do not edit files inside the zip — open the extracted folder. Inside you’ll find:
chatbot.py— theChatbotclass (you’ll edit this)tag_documents.py— the tagging script (you’ll edit this)tool_schema.py— helper that builds JSON schemas from function type hints (do not modify)documents/— a folder of short text files to tagtag_definitions.txt— list of tags and descriptionspolicies/— only used in the optional Extension A
Update the header documentation in chatbot.py and tag_documents.py as usual.
Before you start, peek at the Appendix: OpenAI Client Library section of this week’s slides — it has every code pattern you’ll need.
Task 1: A minimal Chatbot class ¶
Open chatbot.py. The chat method has a few blanks marked Task 1 TODO for you to fill in.
Task 2: System prompts ¶
Task 3: Tagging documents ¶
What is tagging? Tagging means assigning short labels — like news, tutorial, or interview — to a piece of content so it can be searched, filtered, or organized. Media companies tag video archives so editors can find “all the product reviews from last year.” Support teams tag customer emails to route them to the right team. You’re going to extract those labels automatically, using the LLM to read each document and decide which tags apply.
Now for the real job. Open tag_documents.py and tag_definitions.txt.
tag_definitions.txt looks something like:
interview: a one-on-one conversation with a guest
tutorial: step-by-step instructions for doing something
news: reporting on recent events
review: evaluation of a product or performance
In documents/ there are several short .txt files. Your job: for each file, use the chatbot to pick one or more tags that apply, then print the results.
Notice that, with an empty system prompt, the model often ends up trying to continue the document or asking what you want to do with it. To get better results, you need to give the model instructions about what you want.
Notice that each file gets a fresh Chatbot (one per file, since each decision should be independent of the others).
The system prompt should look something like:
Act as [role].
Available tags:
[put the contents of TAGS here]
The user will provide a document. Respond with the tags that apply to the document.
Use string concatenation or an f-string to build the final prompt. Test it and see how it works.
Task 4: Calling a Python function from the chatbot ¶
Tool calling lets the model run Python code to get information it doesn’t have. Try this in a scratch file first:
from chatbot import Chatbot
bot = Chatbot()
print(bot.chat("How long until midnight?"))
The model will either make up an answer (confidently wrong!) or apologize for not knowing. Let’s give it a tool so it can find out.
The starter chatbot.py already includes a current_time() function and a second method, chat_with_tools, that runs the tool-call loop. The loop has a blank marked Task 4 TODO where you decide which tool to run.
Optional: structured output with Pydantic ¶
Remember the inconsistent output from Task 3? Structured output is another way to get data you can parse. You describe the shape you want with a Pydantic model, and the OpenAI client forces the model to comply.
Optional extensions ¶
Pick one of these if you have time. (Or both, or neither — Tasks 1-4 are the main event.)
Option A: Customer support chatbot with file-reading tools ¶
Give the chatbot access to a folder of company policy documents, then let it answer customer questions by reading them on demand.
The lab download includes a policies/ folder with four short policy docs (returns, shipping, hours, subscriptions) for a fictional coffee company. You’ll add two tools — list_folder and read_file — both restricted to the policies folder so the model can’t read anything else on your computer.
Option B: Math helper with sympy ¶
Give the chatbot symbolic-math tools via the sympy package (pip install sympy).
def differentiate(expression: str, variable: str) -> str:
"""Compute the derivative of an expression with respect to a variable."""
from sympy import symbols, sympify, diff
var = symbols(variable)
expr = sympify(expression)
return str(diff(expr, var))
Submission ¶
Submit on Moodle under Lab 13. Both partners should submit separately.
Make sure:
- The header comment at the top of each
.pyfile has your name, the date, and (if you worked with a partner) your partner’s name. - Your
tag_documents.pyhas the one-sentence observation from Task 3 filled in at the top. - Your
chatbot.pyhas your favorite persona exchange from Task 2 saved as a comment at the bottom.
Files to submit:
chatbot.pytag_documents.py- If you did Extension A (customer support): a short
support_conversation.txtwith one sample conversation. - If you did Extension B (math helper): the file where you defined your sympy tools (can be a scratch file like
math_helper.py).
This lab was co-written with Claude (Anthropic).
Bonus: fetch your own documents from YouTube
If you want to try tagging real content, the youtube-transcript-api package will fetch the auto-generated transcript for any public YouTube video — no API key required. Install it (pip install youtube-transcript-api), grab the video ID from a YouTube URL (the part after v=), and:
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import TextFormatter
from pathlib import Path
api = YouTubeTranscriptApi()
transcript = api.fetch("dQw4w9WgXcQ") # replace with a real ID
text = TextFormatter().format_transcript(transcript)
Path("documents/my_video.txt").write_text(text)
Pull a handful of different kinds of videos (a tutorial, a product review, an interview, a news clip) and run your tagging script on them. Note: don’t redistribute the transcripts — they’re copyrighted by the video creators.