Lab 13: LLM Chatbot and Document Tagging

Purpose: to practice using an LLM API from Python — conversations, system prompts, and tool calling — and to build something practical: an assistant that reads a folder of documents and tags them.

Overview

You’ll build a Chatbot class that wraps the OpenAI client library, then use it for a real task: tagging a folder of text files with topic labels. Media companies do this kind of work on video archives; support teams do it on customer emails; you’ll get the basic idea in a few dozen lines.

The lab uses an on-campus LLM server running Qwen3.5-9B, so there’s no API key to set up and no credit card needed.

Setup

Install the openai package: in Thonny, choose Tools > Manage packages, search for openai, and click Install.

Download lab13.zip and extract it (right-click → Extract All, or double-click and then drag the folder out) into your class folder. Do not edit files inside the zip — open the extracted folder. Inside you’ll find:

- chatbot.py — the Chatbot class you’ll complete in Tasks 1, 2, and 4
- tag_documents.py — the tagging script for Task 3
- tag_definitions.txt — the available tags and what they mean
- documents/ — short .txt files to tag
- policies/ — policy documents for optional Option A

Update the header documentation in chatbot.py and tag_documents.py as usual.

Before you start, peek at the Appendix: OpenAI Client Library section of this week’s slides — it has every code pattern you’ll need.

Task 1: A minimal Chatbot class

Open chatbot.py. The chat method has a few blanks marked Task 1 TODO for you to fill in.
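The blanks follow the standard conversation-loop pattern. Here is a sketch of that shape — the attribute names, server URL, and model name below are illustrative placeholders, so match them to your actual starter code and the lab's server details:

```python
# Sketch of the chat() pattern (names are illustrative; use your starter code's).
class Chatbot:
    def __init__(self):
        from openai import OpenAI  # imported here, inside the method
        # Placeholder URL and model name — use the values given in the lab.
        self.client = OpenAI(base_url="http://CAMPUS-SERVER/v1", api_key="unused")
        self.model = "qwen"
        self.messages = []  # the running conversation history

    def chat(self, user_text):
        # 1. record the user's message in the history
        self.messages.append({"role": "user", "content": user_text})
        # 2. send the entire history, so the model sees the whole conversation
        response = self.client.chat.completions.create(
            model=self.model, messages=self.messages
        )
        # 3. record the model's reply too, then return it
        reply = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

Appending both the user message and the reply is what gives the bot memory: the next call to chat() resends everything so far.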

Task 2: System prompts
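A system prompt is an instruction the model sees before the conversation starts — "act as a pirate," "answer in one sentence," and so on. The usual pattern is to store it as the first entry in the message history, with role "system" instead of "user". A minimal illustration (build_history is a hypothetical helper for demonstration, not part of the starter code):

```python
# A system prompt is just the first entry in the message list, with role
# "system"; every request then includes it automatically.
def build_history(system_prompt, user_text):
    messages = []
    if system_prompt:  # an empty prompt means no system message at all
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_text})
    return messages

history = build_history("Act as a pirate.", "Hello!")
# history[0] is the system message; history[1] is the user's turn
```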

Task 3: Tagging documents

What is tagging? Tagging means assigning short labels — like news, tutorial, or interview — to a piece of content so it can be searched, filtered, or organized. Media companies tag video archives so editors can find “all the product reviews from last year.” Support teams tag customer emails to route them to the right team. You’re going to extract those labels automatically, using the LLM to read each document and decide which tags apply.

Now for the real job. Open tag_documents.py and tag_definitions.txt.

tag_definitions.txt looks something like:

interview: a one-on-one conversation with a guest
tutorial: step-by-step instructions for doing something
news: reporting on recent events
review: evaluation of a product or performance

In documents/ there are several short .txt files. Your job: for each file, use the chatbot to pick one or more tags that apply, then print the results.

Notice that, with an empty system prompt, the model often ends up trying to continue the document or asking what you want to do with it. To get better results, you need to give the model instructions about what you want.

Notice that each file gets a fresh Chatbot (one per file, since each decision should be independent of the others).

The system prompt should look something like:

Act as [role].

Available tags:
[put the contents of TAGS here]

The user will provide a document. Respond with the tags that apply to the document.

Use string concatenation or an f-string to build the final prompt. Test it and see how it works.
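Putting the pieces together, the script ends up looking something like the sketch below. The function names and the assumption that Chatbot takes the system prompt as a constructor argument are illustrative — adapt them to your actual starter code:

```python
from pathlib import Path

def build_system_prompt(tags_text):
    # Glue the template's three parts together with an f-string.
    return (
        "Act as a librarian.\n\n"  # pick whatever role fits
        f"Available tags:\n{tags_text}\n\n"
        "The user will provide a document. "
        "Respond with the tags that apply to the document."
    )

def tag_folder(folder, make_bot, system_prompt):
    # make_bot is your Chatbot class: a fresh instance per file keeps
    # each tagging decision independent of the others.
    results = {}
    for doc in sorted(Path(folder).glob("*.txt")):
        bot = make_bot(system_prompt)
        results[doc.name] = bot.chat(doc.read_text())
    return results
```

Typical usage would be `tag_folder("documents", Chatbot, build_system_prompt(Path("tag_definitions.txt").read_text()))`, then a loop to print the results.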

Task 4: Calling a Python function from the chatbot

Tool calling lets the model run Python code to get information it doesn’t have. Try this in a scratch file first:

from chatbot import Chatbot
bot = Chatbot()
print(bot.chat("How long until midnight?"))

The model will either make up an answer (confidently wrong!) or apologize for not knowing. Let’s give it a tool so it can find out.

The starter chatbot.py already includes a current_time() function and a second method, chat_with_tools, that runs the tool-call loop. The loop has a blank marked Task 4 TODO where you decide which tool to run.
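A common way to fill a blank like that is a dictionary mapping tool names to functions, then a lookup on the name the model asked for. A sketch, with illustrative names (your starter code's loop structure may differ):

```python
import json
from datetime import datetime

def current_time():
    """Example tool: lets the model learn the current date and time."""
    return datetime.now().isoformat()

# Map each tool name the model might request to the Python function to run.
TOOLS = {"current_time": current_time}

def run_tool_call(tool_call):
    # tool_call.function.name / .arguments is the shape the OpenAI client
    # uses for a requested tool call; arguments arrive as a JSON string.
    func = TOOLS[tool_call.function.name]            # pick the right function
    args = json.loads(tool_call.function.arguments or "{}")
    return str(func(**args))                         # tool results go back as text
```

The dictionary approach scales nicely: adding a tool in Options A and B is one new entry, with no extra if/elif branches.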

Optional: structured output with Pydantic

Remember the inconsistent output from Task 3? Structured output is another way to get data you can parse. You describe the shape you want with a Pydantic model, and the OpenAI client forces the model to comply.
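For the tagging task, the shape is simple: a list of tag strings. A sketch, with an illustrative model name — the commented-out request shows the general pattern of the OpenAI client's parse helper, though I haven't verified it against the campus server:

```python
from pydantic import BaseModel

class TagResult(BaseModel):
    tags: list[str]  # e.g. ["news", "interview"]

# completion = client.beta.chat.completions.parse(
#     model=MODEL,
#     messages=messages,
#     response_format=TagResult,
# )
# result = completion.choices[0].message.parsed  # a TagResult instance
# print(result.tags)                             # a real Python list, no parsing needed
```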

Optional extensions

Pick one of these if you have time. (Or both, or neither — Tasks 1-4 are the main event.)

Option A: Customer support chatbot with file-reading tools

Give the chatbot access to a folder of company policy documents, then let it answer customer questions by reading them on demand.

The lab download includes a policies/ folder with four short policy docs (returns, shipping, hours, subscriptions) for a fictional coffee company. You’ll add two tools — list_folder and read_file — both restricted to the policies folder so the model can’t read anything else on your computer.
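A sketch of the two tools, including the path check that keeps the model inside the folder (the exact names your starter code expects may differ):

```python
from pathlib import Path

POLICY_DIR = Path("policies")

def list_folder():
    """Return the names of the available policy documents."""
    return "\n".join(sorted(p.name for p in POLICY_DIR.glob("*.txt")))

def read_file(filename):
    """Return one policy document, refusing paths that escape the folder."""
    path = (POLICY_DIR / filename).resolve()
    if path.parent != POLICY_DIR.resolve():  # blocks "../secrets.txt" and friends
        return "Error: access denied"
    return path.read_text()
```

The resolve() call normalizes away any ".." tricks before the parent check, so the model can only ever read files directly inside policies/.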

Option B: Math helper with sympy

Give the chatbot symbolic-math tools via the sympy package (pip install sympy).

def differentiate(expression: str, variable: str) -> str:
    """Compute the derivative of an expression with respect to a variable."""
    from sympy import symbols, sympify, diff
    var = symbols(variable)
    expr = sympify(expression)
    return str(diff(expr, var))

Submission

Submit on Moodle under Lab 13. Both partners should submit separately.

Make sure:

- the header documentation in chatbot.py and tag_documents.py is updated
- both files run without errors

Files to submit:

- chatbot.py
- tag_documents.py

This lab was co-written with Claude (Anthropic).


Bonus: fetch your own documents from YouTube

If you want to try tagging real content, the youtube-transcript-api package will fetch the auto-generated transcript for any public YouTube video — no API key required. Install it (pip install youtube-transcript-api), grab the video ID from a YouTube URL (the part after v=), and:

from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import TextFormatter
from pathlib import Path

api = YouTubeTranscriptApi()
transcript = api.fetch("dQw4w9WgXcQ")  # replace with a real ID
text = TextFormatter().format_transcript(transcript)
Path("documents/my_video.txt").write_text(text)

Pull a handful of different kinds of videos (a tutorial, a product review, an interview, a news clip) and run your tagging script on them. Note: don’t redistribute the transcripts — they’re copyrighted by the video creators.