Lab 13: LLM Chatbot and Document Tagging

Purpose: to practice using an LLM API from Python — conversations, system prompts, and tool calling — and to build something practical: an assistant that reads a folder of documents and tags them.

Overview

You’ll build a Chatbot class that wraps the OpenAI client library, then use it for a real task: tagging a folder of text files with topic labels. Media companies do this kind of work on video archives; support teams do it on customer emails; you’ll get the basic idea in a few dozen lines.

The lab uses an on-campus LLM server running Qwen3.5-9B, so there’s no API key to set up and no credit card needed.

Setup

Install the openai package: in Thonny, choose Tools > Manage packages, search for openai, and click Install.

Download lab13.zip and extract it (right-click → Extract All, or double-click and then drag the folder out) into your class folder. Do not edit files inside the zip — open the extracted folder. Inside you’ll find:

- chatbot.py — the Chatbot class you’ll complete in Tasks 1, 2, and 4
- tag_documents.py — the tagging script for Task 3
- tag_definitions.txt — the available tags and what they mean
- documents/ — short .txt files to tag
- policies/ — policy documents for optional Option A

Update the header documentation in chatbot.py and tag_documents.py as usual.

Before you start, peek at the Appendix: OpenAI Client Library section of this week’s slides — it has every code pattern you’ll need.

Task 1: A minimal Chatbot class

Open chatbot.py. The chat method has a few blanks marked Task 1 TODO for you to fill in.
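The blanks follow the standard conversation-loop pattern. Here is a sketch of that shape — the attribute names, server URL, and model name below are illustrative placeholders, so match them to your actual starter code and the lab's server details:

```python
# Sketch of the chat() pattern (names are illustrative; use your starter code's).
class Chatbot:
    def __init__(self):
        from openai import OpenAI  # imported here, inside the method
        # Placeholder URL and model name — use the values given in the lab.
        self.client = OpenAI(base_url="http://CAMPUS-SERVER/v1", api_key="unused")
        self.model = "qwen"
        self.messages = []  # the running conversation history

    def chat(self, user_text):
        # 1. record the user's message in the history
        self.messages.append({"role": "user", "content": user_text})
        # 2. send the entire history, so the model sees the whole conversation
        response = self.client.chat.completions.create(
            model=self.model, messages=self.messages
        )
        # 3. record the model's reply too, then return it
        reply = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

Appending both the user message and the reply is what gives the bot memory: the next call to chat() resends everything so far.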

Task 2: System prompts
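A system prompt is an instruction the model sees before the conversation starts — "act as a pirate," "answer in one sentence," and so on. The usual pattern is to store it as the first entry in the message history, with role "system" instead of "user". A minimal illustration (build_history is a hypothetical helper for demonstration, not part of the starter code):

```python
# A system prompt is just the first entry in the message list, with role
# "system"; every request then includes it automatically.
def build_history(system_prompt, user_text):
    messages = []
    if system_prompt:  # an empty prompt means no system message at all
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_text})
    return messages

history = build_history("Act as a pirate.", "Hello!")
# history[0] is the system message; history[1] is the user's turn
```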

Task 3: Tagging documents

What is tagging? Tagging means assigning short labels — like news, tutorial, or interview — to a piece of content so it can be searched, filtered, or organized. Media companies tag video archives so editors can find “all the product reviews from last year.” Support teams tag customer emails to route them to the right team. You’re going to extract those labels automatically, using the LLM to read each document and decide which tags apply.

Now for the real job. Open tag_documents.py and tag_definitions.txt.

tag_definitions.txt looks something like:

interview: a one-on-one conversation with a guest
tutorial: step-by-step instructions for doing something
news: reporting on recent events
review: evaluation of a product or performance

In documents/ there are several short .txt files. Your job: for each file, use the chatbot to pick one or more tags that apply, then print the results.

Notice that, with an empty system prompt, the model often ends up trying to continue the document or asking what you want to do with it. To get better results, you need to give the model instructions about what you want.

Notice that each file gets a fresh Chatbot (one per file, since each decision should be independent of the others).

The system prompt should look something like:

Act as [role].

Available tags:
[put the contents of TAGS here]

The user will provide a document. Respond with the tags that apply to the document.

Use string concatenation or an f-string to build the final prompt. Test it and see how it works.
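Putting the pieces together, the script ends up looking something like the sketch below. The function names and the assumption that Chatbot takes the system prompt as a constructor argument are illustrative — adapt them to your actual starter code:

```python
from pathlib import Path

def build_system_prompt(tags_text):
    # Glue the template's three parts together with an f-string.
    return (
        "Act as a librarian.\n\n"  # pick whatever role fits
        f"Available tags:\n{tags_text}\n\n"
        "The user will provide a document. "
        "Respond with the tags that apply to the document."
    )

def tag_folder(folder, make_bot, system_prompt):
    # make_bot is your Chatbot class: a fresh instance per file keeps
    # each tagging decision independent of the others.
    results = {}
    for doc in sorted(Path(folder).glob("*.txt")):
        bot = make_bot(system_prompt)
        results[doc.name] = bot.chat(doc.read_text())
    return results
```

Typical usage would be `tag_folder("documents", Chatbot, build_system_prompt(Path("tag_definitions.txt").read_text()))`, then a loop to print the results.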

Task 4: Calling a Python function from the chatbot

Tool calling lets the model run Python code to get information it doesn’t have. Try this in a scratch file first:

from chatbot import Chatbot
bot = Chatbot()
print(bot.chat("How long until midnight?"))

The model will either make up an answer (confidently wrong!) or apologize for not knowing. Let’s give it a tool so it can find out.

The starter chatbot.py already includes a current_time() function and a second method, chat_with_tools, that runs the tool-call loop. The loop has a blank marked Task 4 TODO where you decide which tool to run.
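A common way to fill a blank like that is a dictionary mapping tool names to functions, then a lookup on the name the model asked for. A sketch, with illustrative names (your starter code's loop structure may differ):

```python
import json
from datetime import datetime

def current_time():
    """Example tool: lets the model learn the current date and time."""
    return datetime.now().isoformat()

# Map each tool name the model might request to the Python function to run.
TOOLS = {"current_time": current_time}

def run_tool_call(tool_call):
    # tool_call.function.name / .arguments is the shape the OpenAI client
    # uses for a requested tool call; arguments arrive as a JSON string.
    func = TOOLS[tool_call.function.name]            # pick the right function
    args = json.loads(tool_call.function.arguments or "{}")
    return str(func(**args))                         # tool results go back as text
```

The dictionary approach scales nicely: adding a tool in Options A and B is one new entry, with no extra if/elif branches.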

Optional: structured output with Pydantic

Remember the inconsistent output from Task 3? Structured output is another way to get data you can parse. You describe the shape you want with a Pydantic model, and the OpenAI client forces the model to comply.
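For the tagging task, the shape is simple: a list of tag strings. A sketch, with an illustrative model name — the commented-out request shows the general pattern of the OpenAI client's parse helper, though I haven't verified it against the campus server:

```python
from pydantic import BaseModel

class TagResult(BaseModel):
    tags: list[str]  # e.g. ["news", "interview"]

# completion = client.beta.chat.completions.parse(
#     model=MODEL,
#     messages=messages,
#     response_format=TagResult,
# )
# result = completion.choices[0].message.parsed  # a TagResult instance
# print(result.tags)                             # a real Python list, no parsing needed
```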

Optional extensions

Pick one of these if you have time. (Or both, or neither — Tasks 1-4 are the main event.)

Option A: Customer support chatbot with file-reading tools

Give the chatbot access to a folder of company policy documents, then let it answer customer questions by reading them on demand.

The lab download includes a policies/ folder with four short policy docs (returns, shipping, hours, subscriptions) for a fictional coffee company. You’ll add two tools — list_folder and read_file — both restricted to the policies folder so the model can’t read anything else on your computer.
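A sketch of the two tools, including the path check that keeps the model inside the folder (the exact names your starter code expects may differ):

```python
from pathlib import Path

POLICY_DIR = Path("policies")

def list_folder():
    """Return the names of the available policy documents."""
    return "\n".join(sorted(p.name for p in POLICY_DIR.glob("*.txt")))

def read_file(filename):
    """Return one policy document, refusing paths that escape the folder."""
    path = (POLICY_DIR / filename).resolve()
    if path.parent != POLICY_DIR.resolve():  # blocks "../secrets.txt" and friends
        return "Error: access denied"
    return path.read_text()
```

The resolve() call normalizes away any ".." tricks before the parent check, so the model can only ever read files directly inside policies/.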

Option B: Math helper with sympy

Give the chatbot symbolic-math tools via the sympy package (pip install sympy).

def differentiate(expression: str, variable: str) -> str:
    """Compute the derivative of an expression with respect to a variable."""
    from sympy import symbols, sympify, diff
    var = symbols(variable)
    expr = sympify(expression)
    return str(diff(expr, var))

Submission

Submit on Moodle under Lab 13. Both partners should submit separately.

Make sure:

- the header documentation in chatbot.py and tag_documents.py is updated
- both files run without errors

Files to submit:

- chatbot.py
- tag_documents.py

This lab was co-written with Claude (Anthropic).


Bonus: fetch your own documents from YouTube

If you want to try tagging real content, the youtube-transcript-api package will fetch the auto-generated transcript for any public YouTube video — no API key required. Install it (pip install youtube-transcript-api), grab the video ID from a YouTube URL (the part after v=), and:

from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import TextFormatter
from pathlib import Path

api = YouTubeTranscriptApi()
transcript = api.fetch("dQw4w9WgXcQ")  # replace with a real ID
text = TextFormatter().format_transcript(transcript)
Path("documents/my_video.txt").write_text(text)

Pull a handful of different kinds of videos (a tutorial, a product review, an interview, a news clip) and run your tagging script on them. Note: don’t redistribute the transcripts — they’re copyrighted by the video creators.