# Calvin Course Advisor Bot

## Step 1: Get an LLM running

We'll use a small Gemma 3 model: https://ollama.com/library/gemma3:1b

### Option 1: Run locally

1. Install `ollama`.
2. Start the server: `ollama serve`.
3. Pull the model: `ollama pull gemma3:1b-it-qat`
   - If you have plenty of memory or a good GPU, you can try `gemma3:4b-it-qat` instead.

If you go this route, you can create the client like this:

```python
from openai import OpenAI
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # the client requires a key; Ollama ignores it
```

### Option 2: Use the Google Gemini API (talking the OpenAI protocol)

See the instructions in CS 375 Homework 1.

## Warm-Up: Structured Outputs

We want the model to produce a set of search queries, so we'll ask it to respond in a structured format that's easy for us to parse.

See https://platform.openai.com/docs/guides/structured-outputs?api-mode=chat for an intro, and https://ollama.com/blog/structured-outputs for Ollama's take on the same feature.



In [43]:
import json
import openai

First we'll use Pydantic to declare what type of output we want. Despite the name `BaseModel`, this isn't an AI model; it's basically a data class (a struct) that also validates its fields.

In [83]:
from typing import Literal
from pydantic import BaseModel

class SearchTool(BaseModel):
    tool_name: Literal["search_course_catalog"] = "search_course_catalog"
    thinking: str
    queries: list[str]

example_search = SearchTool(
    thinking="The user wants to know some trivia.",
    queries=[
        "What is the capital of France?",
        "What is the largest mammal?",
    ])
example_search

SearchTool(tool_name='search_course_catalog', thinking='The user wants to know some trivia.', queries=['What is the capital of France?', 'What is the largest mammal?'])

In [115]:
json.dumps(SearchTool.model_json_schema())

'{"properties": {"tool_name": {"const": "search_course_catalog", "default": "search_course_catalog", "title": "Tool Name", "type": "string"}, "thinking": {"title": "Thinking", "type": "string"}, "queries": {"items": {"type": "string"}, "title": "Queries", "type": "array"}}, "required": ["thinking", "queries"], "title": "SearchTool", "type": "object"}'
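
To make the schema concrete, here is a minimal standard-library sketch of what it asks for: a JSON object with a string `thinking` and a list-of-strings `queries`. The helper name `check_search_payload` is hypothetical; in this notebook Pydantic does this validation for us, and far more thoroughly.

```python
import json

def check_search_payload(raw: str) -> dict:
    """Parse raw JSON and check the two required SearchTool fields by hand."""
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    for field in ("thinking", "queries"):  # the schema's "required" list
        if field not in payload:
            raise ValueError(f"missing required field: {field}")
    if not isinstance(payload["thinking"], str):
        raise ValueError("'thinking' must be a string")
    queries = payload["queries"]
    if not isinstance(queries, list) or not all(isinstance(q, str) for q in queries):
        raise ValueError("'queries' must be a list of strings")
    # tool_name is optional because the schema gives it a default value.
    payload.setdefault("tool_name", "search_course_catalog")
    return payload

good = check_search_payload('{"thinking": "trivia", "queries": ["capital of France"]}')
print(good["tool_name"])
```

A payload that omits `queries` raises `ValueError`, which mirrors the validation error Pydantic would report for the same input.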

Now let's ask the LLM to create these queries for us. We'll use `response_format` in the OpenAI API.

We'll need to do some prompt engineering to get reasonable results from this small model; see https://platform.openai.com/docs/guides/text?api-mode=chat for general guidance.

In [120]:
client = openai.OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
MODEL_NAME = "gemma3:1b-it-qat" # change this if you want to use a different model

completion = client.beta.chat.completions.parse(
    model=MODEL_NAME,
    messages=[
        {"role": "system", "content": f"""
Write 10 search queries for a course catalog that would find information relevant to the user's interest. The queries should match titles or descriptions of courses in an undergraduate program.

Example:
Student interest: "art"
Queries: ["art", "photography", "visual rhetoric", "painting", "sculpture", "art history", "graphic design", "digital media", "art theory", "contemporary art"]

Notes:
- Before responding, write a short thought about what kinds of courses might be relevant to the user's interest.
- Ensure that each query would match one or more specific courses.
       
The output should be JSON with the following schema: {json.dumps(SearchTool.model_json_schema())}
"""},
        {"role": "user", "content": "I'm looking for courses related to AI."},
    ],
    response_format=SearchTool,
    temperature=0.5
)

event = completion.choices[0].message.parsed
event

SearchTool(tool_name='search_course_catalog', thinking="Given the topic of AI, I'll focus on courses related to machine learning, deep learning, natural language processing, and potentially robotics.  I'll also include introductory courses to give a broad overview.", queries=['ai', 'machine learning', 'deep learning', 'natural language processing', 'robotics', 'artificial intelligence', 'algorithms', 'neural networks', 'data science', 'computer vision', 'ethics of ai'])
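
Note that the model returned 11 queries even though the prompt asked for 10: the schema constrains the shape of the output, not the length of the list. A small clean-up step can help here. The sketch below is one guess at what clean-up might be useful (the helper `normalize_queries` and its `limit` parameter are hypothetical): it strips whitespace, lowercases, drops empties and duplicates while preserving order, and caps the list length.

```python
def normalize_queries(queries: list[str], limit: int = 10) -> list[str]:
    """Lowercase, strip, de-duplicate (keeping first occurrences), and cap the list."""
    seen = set()
    cleaned = []
    for query in queries:
        q = query.strip().lower()
        if q and q not in seen:
            seen.add(q)
            cleaned.append(q)
    return cleaned[:limit]

raw = ["ai", "Machine Learning", " machine learning ", "", "robotics"]
print(normalize_queries(raw, limit=3))
```

In the notebook this could be applied as `normalize_queries(event.queries)` before searching.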

## Step 2: Make a simple retrieval system

We'll pull course descriptions from a Calvin course search endpoint.

In [85]:
sections_json_url = 'https://app.calvin.edu/coursesearch/AY25FA25_Sections.json'
# TODO: replace with our mirror.

import requests

sections_json = requests.get(sections_json_url)
sections_json.raise_for_status()
sections = sections_json.json()

In [86]:
len(sections)

1020

In [87]:
next(section for section in sections if section['SectionName'].startswith('CS 108'))
#[section for section in sections if 'programming' in section.get('CourseDescription', '').lower()]

{'AcademicLevel': 'Undergraduate',
 'AcademicPeriod': '2025 Fall (09/02/2025-12/18/2025)',
 'Campus': 'Grand Rapids Campus',
 'CourseNumber': '108',
 'DeliveryMode': 'In-Person',
 'CourseDescription': 'An introduction to computing as a problem-solving discipline. A primary emphasis is on programming as a methodology for problem solving, including: the precise specification of a problem, the design of its solution, the encoding of that solution, and the testing, debugging and maintenance of programs. A secondary emphasis is the discussion of topics from the breadth of computing including historical, theoretical, ethical and biblical perspectives on computing as a discipline. Laboratory. Lab fee - see catalog for details.',
 'SectionEndDate': '2025-12-18',
 'EnrolledCapacity': '8/16',
 'SectionHours': '3',
 'InstructionalFormat': 'Lecture',
 'Instructors': 'Rocky Chang （張蛟川）',
 'Locations': 'Science Building 372 - PC Classroom',
 'MeetingPatterns': 'MWF | 11:00 AM - 12:05 PM | 09/02/2025

In [88]:
course_descriptions = {
    section['SectionName'].split('-', 1)[0].strip(): (section["SectionTitle"], section["CourseDescription"])
    for section in sections
    if "CourseDescription" in section
    and section.get('AcademicLevel') == 'Undergraduate'
    and section.get('Campus') == 'Grand Rapids Campus'
}

In [89]:
len(course_descriptions)

430
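
The count drops from 1020 sections to 430 courses partly because the dictionary is keyed by course code, so multiple sections of the same course collapse into one entry, and partly because sections that fail the filters are skipped. The sketch below illustrates the key extraction on made-up section names; the exact `SectionName` format is an assumption inferred from the `split('-', 1)` call above, and `course_code` is a hypothetical helper.

```python
def course_code(section_name: str) -> str:
    """Return everything before the first hyphen, with surrounding whitespace removed."""
    return section_name.split('-', 1)[0].strip()

# Hypothetical section names, for illustration only.
names = [
    "CS 108-A - Introduction to Computing",
    "CS 108-B - Introduction to Computing",
    "BIOL 160-A - Ecological and Evolutionary Systems",
]
codes = {course_code(name) for name in names}
print(sorted(codes))
```

Because later sections overwrite earlier ones under the same key, the stored title and description come from whichever section of a course appears last in the list.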

In [90]:
def search_courses(query: str):
    """
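    Return (course, title, description) tuples for every course whose title
    or description contains the query (case-insensitive substring match).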
    """
    query = query.lower()
    matches = []
    for course, (title, description) in course_descriptions.items():
        if query in title.lower() or query in description.lower():
            matches.append((course, title, description))
    return matches
search_courses("programming")


[('CS 106',
  'Introduction to Scientific Computation And Modeling',
  'An introduction to computing as a tool for science, emphasizing programming as a methodology for problem solving, quantitative data analysis, and simulation in science and mathematics. This includes in silico modeling of natural phenomena, precise specification of a problem, design of its algorithmic solution, testing, debugging, and maintaining software, using scripting to increase scientific productivity, and the use of existing scientific software libraries. A secondary emphasis is the discussion of breadth topics, including historical, theoretical, ethical and biblical perspectives on computing as a discipline. This course provides an alternative to CS 108, providing an introduction to computing focusing on scientific examples and applications. Laboratory. Lab fee - see catalog for details.'),
 ('CS 108',
  'Introduction to Computing',
  'An introduction to computing as a problem-solving discipline. A primary e
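
Plain substring matching is simple, but it can over-match: a query like "art" also matches "department" or "particle". One possible refinement, sketched below on a toy catalog, is to match on word boundaries with `re`. The function name `search_courses_whole_word` and the toy entries are hypothetical; this is not what the notebook uses.

```python
import re

# Toy catalog with the same shape as course_descriptions: code -> (title, description).
toy_catalog = {
    "ART 101": ("Introduction to Art", "Drawing and design fundamentals."),
    "PHYS 133": ("Introductory Physics", "Particle motion and departmental lab work."),
}

def search_courses_whole_word(query: str, catalog: dict) -> list:
    """Match the query only where it appears as a whole word (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    return [
        (course, title, description)
        for course, (title, description) in catalog.items()
        if pattern.search(title) or pattern.search(description)
    ]

print([course for course, _, _ in search_courses_whole_word("art", toy_catalog)])
```

The trade-off is recall: a whole-word search for "program" would no longer match "programming", so substring matching may still be the better default for short LLM-written queries.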

## A Complete Bot



In [105]:
scenario = "I want to do graduate school in computational biology. What undergrad courses could I take to prepare myself?"

messages = [
    {"role": "system", "content": """
You are a course advisor bot that can make search queries to a course catalog. You will help students find courses that match their interests.

The conversation will follow the following format:
1. The user will describe their interests. 
2. The assistant will write 10 search queries that would match titles or descriptions of courses in an undergraduate program. For example, if the student is interested in art, the assistant should query for "art", "photography", "visual rhetoric", etc.
3. The user will respond with a list of courses that match the queries.
4. The assistant will then suggest specific courses relevant to the user's interests.

Notes:
- Before responding, think about the user's interests.
- Ensure that each query would match one or more specific courses in the course catalog.
- Queries should be one or two words long
"""},
        {"role": "user", "content": scenario},
]

completion = client.beta.chat.completions.parse(
    model=MODEL_NAME,
    messages=messages,
    response_format=SearchTool,
    temperature=0.5,
)
event = completion.choices[0].message.parsed
print(event.thinking)
print('; '.join(event.queries))
event


Computational biology is a broad field – it involves things like data analysis, machine learning, and modeling. Let’s start by identifying core areas of study.  I’ll generate 10 search queries to help you find relevant undergraduate courses.
machine learning; data analysis; bioinformatics; statistics; programming; biology; genetics; computational modeling; algorithms; databases


SearchTool(tool_name='search_course_catalog', thinking='Computational biology is a broad field – it involves things like data analysis, machine learning, and modeling. Let’s start by identifying core areas of study.  I’ll generate 10 search queries to help you find relevant undergraduate courses.', queries=['machine learning', 'data analysis', 'bioinformatics', 'statistics', 'programming', 'biology', 'genetics', 'computational modeling', 'algorithms', 'databases'])

In [106]:
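# Find courses that match *any* of the queries.
# Each match is a (course, title, description) tuple, so collecting them
# in a set de-duplicates courses returned by more than one query.
matches = {
    match
    for query in event.queries
    for match in search_courses(query)
}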
len(matches)

33
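
Taking the union treats every match equally. If 33 courses turns out to be too many for a small model to weigh, one option is to count how many distinct queries each course matched and keep the top few. The sketch below assumes a `search` callable with the same return shape as `search_courses`; `rank_by_query_hits`, `top_n`, and the toy data are hypothetical.

```python
from collections import Counter

def rank_by_query_hits(queries, search, top_n=5):
    """Return up to top_n matches, ordered by how many distinct queries hit them."""
    hits = Counter()
    for query in set(queries):  # count each distinct query once
        for match in search(query):
            hits[match] += 1
    # Sort by hit count (descending), then by course code for a stable order.
    ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0][0]))
    return [match for match, _ in ranked[:top_n]]

# Toy catalog and search function, for illustration only.
toy = {
    "CS 106": ("Scientific Computation", "programming and modeling"),
    "BIOL 160": ("Ecological Systems", "population genetics"),
}

def toy_search(query):
    return [(c, t, d) for c, (t, d) in toy.items()
            if query in t.lower() or query in d.lower()]

top = rank_by_query_hits(["programming", "modeling", "genetics"], toy_search)
print([course for course, _, _ in top])
```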

In [107]:
event.model_dump_json()

'{"tool_name":"search_course_catalog","thinking":"Computational biology is a broad field – it involves things like data analysis, machine learning, and modeling. Let’s start by identifying core areas of study.  I’ll generate 10 search queries to help you find relevant undergraduate courses.","queries":["machine learning","data analysis","bioinformatics","statistics","programming","biology","genetics","computational modeling","algorithms","databases"]}'

In [108]:
def format_matches_as_document(matches):
    """
    Format the matches as a plain-text document: a count line followed by
    one "CODE: Title - Description" line per course.
    """
    return f"{len(matches)} total matches:\n" + "\n".join(
        f"{course}: {title} - {description}"
        for course, title, description in matches
    )
formatted_matches = format_matches_as_document(matches)
print(formatted_matches)

33 total matches:
BIOL 160: Ecological and Evolutionary Systems - The basic concepts in ecological and evolutionary biology. Topics include: population ecology and genetics, community ecology, evolutionary processes and speciation, phylogenetics, adaptive biology, ecosystem dynamics, environmental degradation and environmental sustainability, and biodiversity. Students develop critical thinking skills by applying these concepts to biological challenges and problems at local, regional, and global scales. Lectures, in-class activities, and discussions. 
BIOL 161L: Cellular and Genetic Systems Lab - Students use prevailing methods to conduct lab experiments that test the effects of cooking on compounds with nutritional and other health benefits, thereby developing competencies for contemporary cellular and molecular biology research.Corequisite: BIOL 161 . Lab fee. See catalog for details.
CS 106: Introduction to Scientific Computation And Modeling - An introduction to computing as a tool
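
Full descriptions add up quickly, and a 1B-parameter model may struggle with a long document. One option is to shorten each description before formatting. The sketch uses `textwrap.shorten` from the standard library; the function name `format_matches_compact` and the `max_chars` value are hypothetical. It also sorts the matches so the document is deterministic even when `matches` is a set.

```python
import textwrap

def format_matches_compact(matches, max_chars=80):
    """Like format_matches_as_document, but with sorted, truncated descriptions."""
    lines = [
        f"{course}: {title} - "
        f"{textwrap.shorten(description, width=max_chars, placeholder='...')}"
        for course, title, description in sorted(matches)
    ]
    return f"{len(lines)} total matches:\n" + "\n".join(lines)

# One made-up match with an artificially long description.
demo = {(
    "CS 108",
    "Introduction to Computing",
    "An introduction to computing as a problem-solving discipline. " * 3,
)}
print(format_matches_compact(demo, max_chars=60))
```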

In [109]:
messages.append(
    {"role": "assistant", "content": event.model_dump_json()}
)
messages.append(
    {"role": "user", "content": formatted_matches}
)


In [110]:
messages

[{'role': 'system',
  'content': '\nYou are a course advisor bot that can make search queries to a course catalog. You will help students find courses that match their interests.\n\nThe conversation will follow the following format:\n1. The user will describe their interests. \n2. The assistant will write 10 search queries that would match titles or descriptions of courses in an undergraduate program. For example, if the student is interested in art, the assistant should query for "art", "photography", "visual rhetoric", etc.\n3. The user will respond with a list of courses that match the queries.\n4. The assistant will then suggest specific courses relevant to the user\'s interests.\n\nNotes:\n- Before responding, think about the user\'s interests.\n- Ensure that each query would match one or more specific courses in the course catalog.\n- Queries should be one or two words long\n'},
 {'role': 'user',
  'content': 'I want to do graduate school in computational biology. What undergrad 

In [111]:
class CourseRecommendation(BaseModel):
    course_code: str
    reasoning: str

class RecommendTool(BaseModel):
    tool_name: Literal["recommend_course"] = "recommend_course"
    thinking: str
    recommendations: list[CourseRecommendation]


In [112]:
completion = client.beta.chat.completions.parse(
    model=MODEL_NAME,
    messages=messages,
    response_format=RecommendTool,
    temperature=0.5,
)
event = completion.choices[0].message.parsed
print(event.thinking)
for rec in event.recommendations:
    # Fall back to a placeholder if the model names a course that isn't in the catalog.
    title, description = course_descriptions.get(
        rec.course_code, ("Unknown course", "No description available")
    )
    print(f"{rec.course_code}: {title} - {description}")
    print(f"Reasoning: {rec.reasoning}")


Based on the provided information, here’s a breakdown of the recommendations and a suggested approach to prioritizing them:
BIOL 345: Ecosystem Ecology and Management - The lives of human beings and countless other creatures are sustained by the goods and services resulting from the proper functioning of earth's ecosystems. As the human population places increasing pressure on these systems, the need for their careful stewardship and management grows. This course provides a detailed study of ecosystem structure and function, with special emphasis on local ecosystems, and the scientific basis for managing and restoring ecosystems. Specific topics include energy flow and nutrient cycling, biodiversity and endangered species management, conservation genetics, population dynamics, landscape ecology, and human dimensions of ecosystem management.Lectures, laboratories, case studies, and field investigations.
Reasoning: This course focuses on ecosystem ecology and management, a highly relevan
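
A small model may recommend course codes that were never in the catalog, which is why the loop above falls back to "Unknown course". A stricter option is to split recommendations into known and unknown codes before showing them. In this sketch the recommendations are plain `(course_code, reasoning)` tuples rather than `CourseRecommendation` objects, and `split_known_recommendations` and the sample data are hypothetical.

```python
def split_known_recommendations(recommendations, catalog):
    """Separate recommendations whose course code exists in the catalog from the rest."""
    known, unknown = [], []
    for code, reasoning in recommendations:
        # Normalize case and spacing, e.g. "cs  106" -> "CS 106".
        normalized = " ".join(code.upper().split())
        if normalized in catalog:
            known.append((normalized, reasoning))
        else:
            unknown.append((code, reasoning))
    return known, unknown

# Made-up catalog entry and recommendations, for illustration only.
catalog = {"CS 106": ("Introduction to Scientific Computation And Modeling", "description omitted")}
recs = [("cs  106", "Covers scientific programming."), ("BIOL 999", "Sounds relevant.")]
known, unknown = split_known_recommendations(recs, catalog)
print(known)
print(unknown)
```

Unknown codes could be dropped, or fed back to the model in a follow-up message asking it to choose only from the listed matches.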