Discussion 376.3: When Agents Go Wrong

LLM-powered agents can now browse the web, write and execute code, send emails, book appointments, and manage files. When they work, they’re impressive. When they fail, the consequences land on real people — and it’s not always clear who’s responsible.

This Discussion addresses the course objectives Overall-Impact and Overall-LLM-Failures, and connects to OG-LLM-Advanced.

Initial Post

Find a specific, documented case where an AI agent or AI-powered automation caused harm or failed in a consequential way.

If you can’t find a documented real case, you may construct a realistic hypothetical based on a system you’ve used or built — but label it as hypothetical and explain why you think it’s plausible.

In your post (~150-250 words):

  1. Describe what happened (or could happen). Be specific: what was the agent, what tools did it have access to, what went wrong, and who was affected?
  2. Identify the failure point. Was this a problem with the model (hallucination, misunderstanding instructions), the system design (insufficient guardrails, too much autonomy), the deployment context (wrong use case, missing human-in-the-loop), or something else?
  3. Who should be responsible? The developer? The deploying company? The user who trusted it? Make an argument.

Cite your source.

Replies

Reply to at least two classmates (~75-150 words each). Your replies should:

  1. Propose a concrete fix or mitigation for the failure they described. Be specific: not "add more testing," but what you'd test, what guardrail you'd add, or where you'd require human approval (a minimal code sketch after this list shows one such guardrail pattern).
  2. Engage with their responsibility argument. Do you agree with who they held responsible? Would a different framing (e.g., product liability, professional ethics, or a concept like stewardship or the common good) change the answer?
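
A concrete guardrail is easier to argue about if you can picture the code path it would take. Below is a minimal, hypothetical Python sketch of one common pattern for reply point 1: a human-approval gate in front of high-risk tool calls. The tool names (send_email, search_docs) and the approval logic are illustrative assumptions, not drawn from any particular agent framework.

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Tool:
        name: str
        func: Callable[..., str]
        high_risk: bool  # destructive or externally visible actions

    def send_email(to: str, body: str) -> str:   # hypothetical tool
        return f"email sent to {to}"

    def search_docs(query: str) -> str:          # hypothetical tool
        return f"results for {query!r}"

    TOOLS = {
        "send_email": Tool("send_email", send_email, high_risk=True),
        "search_docs": Tool("search_docs", search_docs, high_risk=False),
    }

    def execute_tool_call(name: str, kwargs: dict) -> str:
        """Run a tool the agent requested, pausing for human approval on risky ones."""
        tool = TOOLS[name]
        if tool.high_risk:
            print(f"Agent wants to call {name} with {kwargs}")
            if input("Approve? [y/N] ").strip().lower() != "y":
                return "REFUSED: a human reviewer declined this action."
        return tool.func(**kwargs)

    if __name__ == "__main__":
        # The low-risk search runs immediately; the email only goes out if approved.
        print(execute_tool_call("search_docs", {"query": "refund policy"}))
        print(execute_tool_call("send_email", {"to": "customer@example.com",
                                               "body": "Your refund is approved."}))

The design choice worth debating in your reply is where the approval boundary sits: marking a whole tool as high-risk is blunt, and a stronger mitigation might inspect the arguments (amounts, recipients, file paths) before deciding whether a human needs to sign off.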

Rubric
