Lab 2: Broken Bot

Understanding Security Vulnerabilities and Guardrails

Lab Overview

In this lab, you'll meet Oscar, a simulated jailbroken bot like Tay from Chapter 1 of Steve's book. You'll learn about guardrails and how simple keyword filters (blocklists) can block harmful content on its way into and out of a bot. This lab introduces basic security measures for AI systems.

Skill Level: 1
Prerequisites: None

Exercises

Exercise 2.A: Meet Oscar

Skill Level: 1
Prerequisites: None
Directions:

Go to the live app. Choose Oscar from the Bot picker. Have a conversation. Note that he isn't very charming: he's simulating a jailbroken bot like Tay from Chapter 1 of Steve's book.

Exercise 2.B: Analyze Oscar

Skill Level: 1
Prerequisites: None
Directions:

Inspect his ruleset to see where his charming behavior comes from.

View Oscar's Rules

Exercise 2.C: Your First Guardrails

Skill Level: 1
Prerequisites: None
Directions:

Open the guardrails panel using its toolbar button (hover over the toolbar buttons to see their tooltips if you can't find it). Check the boxes for the Sex (Local) and Violence (Local) input and output filters. Now have a conversation. Try to trigger Oscar's destructive behaviors, and feel free to get nasty with him too. Watch the guardrails catch both your messages and Oscar's replies.
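Conceptually, each guardrail sits on one side of the bot: an input filter screens your message before Oscar sees it, and an output filter screens Oscar's reply before you see it. The TypeScript below is a minimal sketch of that wiring under stated assumptions; it is not the app's actual code, and `Filter`, `makeKeywordFilter`, `guardedChat`, and the `generateReply` callback are hypothetical names.

```typescript
// Hypothetical sketch of input/output guardrails; not the lab app's actual code.

// A filter returns the blocked term it found in the text, or null if none.
type Filter = (text: string) => string | null;

// Builds a case-insensitive substring filter from a list of terms.
function makeKeywordFilter(terms: string[]): Filter {
  const lowered = terms.map((t) => t.toLowerCase());
  return (text) => {
    const haystack = text.toLowerCase();
    return lowered.find((t) => haystack.includes(t)) ?? null;
  };
}

// Returns the first term any filter reports, or null.
function firstHit(filters: Filter[], text: string): string | null {
  for (const filter of filters) {
    const hit = filter(text);
    if (hit !== null) return hit;
  }
  return null;
}

interface GuardedResult {
  reply: string;
  blockedBy: "input" | "output" | null;
  term: string | null;
}

// Runs one chat turn with guardrails on both sides of the bot.
function guardedChat(
  userMessage: string,
  generateReply: (message: string) => string, // hypothetical stand-in for the bot
  inputFilters: Filter[],
  outputFilters: Filter[],
): GuardedResult {
  // Input guardrail: screen the user's message before the bot sees it.
  const inputHit = firstHit(inputFilters, userMessage);
  if (inputHit !== null) {
    return { reply: "[Message blocked by input filter]", blockedBy: "input", term: inputHit };
  }

  const reply = generateReply(userMessage);

  // Output guardrail: screen the bot's reply before the user sees it.
  const outputHit = firstHit(outputFilters, reply);
  if (outputHit !== null) {
    return { reply: "[Reply blocked by output filter]", blockedBy: "output", term: outputHit };
  }
  return { reply, blockedBy: null, term: null };
}
```

Because the input check runs first, a blocked message never reaches the bot at all; the output check is what catches a harmful reply to an innocent-looking message.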

Exercise 2.D: Inspect the Guardrails

Skill Level: 1 (2 for extra credit)
Prerequisites: None
Directions:

Go to the blocklists and see how they're constructed. Note that these are deliberately simple, for demo purposes.

View Violence Blocklist

View Sex Blocklist

Extra Credit:

Inspect the JavaScript that implements the blocklists.

View Blocklist Implementation
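A keyword blocklist usually comes down to a term list plus a matching rule, and the rule matters as much as the terms. The sketch below assumes a plain array of lowercase terms (the lab's files may be organized differently; `violenceBlocklist`, `substringMatch`, and `wholeWordMatch` are hypothetical) and contrasts two common rules you may want to compare against the lab's implementation.

```typescript
// Hypothetical blocklist: a plain array of lowercase terms.
// The lab's own blocklist files may be structured differently.
const violenceBlocklist: string[] = ["kill", "stab", "shoot"];

// Naive rule: flag any term that appears anywhere, even inside another word.
function substringMatch(text: string, terms: string[]): string[] {
  const lower = text.toLowerCase();
  return terms.filter((term) => lower.includes(term));
}

// Escapes regex metacharacters so each term is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Stricter rule: only flag a term that stands as a whole word (\b boundaries),
// matched case-insensitively.
function wholeWordMatch(text: string, terms: string[]): string[] {
  return terms.filter((term) =>
    new RegExp(`\\b${escapeRegExp(term)}\\b`, "i").test(text),
  );
}
```

Substring matching flags "Skill Level: 1" because "Skill" contains "kill"; whole-word matching avoids that false positive but misses "killing" unless that variant is listed too. Neither rule understands context, which is one reason simple blocklists are only a starting point.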

Extra, Extra Credit:

Clone the repo to your machine. Expand the blocklists to include more terms, or terms in other languages. Reload the app and test your expanded guardrails.
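If you add terms in other languages, check how your matcher treats non-ASCII text. In JavaScript regular expressions, `\b` is defined in terms of ASCII word characters, so a pattern such as `\büberfall\b` does not match "Ein Überfall!". The sketch below is one hedged alternative, assuming a plain array of terms: it normalizes text to Unicode NFC, lowercases it, and uses lookarounds on `\p{L}` and `\p{N}` (which require the `u` flag) as word boundaries. `unicodeWholeWordMatch`, `expandedViolence`, and the sample terms are hypothetical, not the lab's code.

```typescript
// Hypothetical expanded blocklist mixing English, Spanish, and German terms.
const expandedViolence: string[] = ["kill", "stab", "matar", "töten", "überfall"];

// Letters or digits in any script; \p{...} classes need the `u` regex flag.
const WORD_CHAR = "[\\p{L}\\p{N}]";

// Escapes regex metacharacters so each term is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Canonical form: composed characters (NFC), then lowercase, so "U" plus a
// combining diaeresis and a precomposed "Ü" compare equal.
function canonical(s: string): string {
  return s.normalize("NFC").toLowerCase();
}

// Whole-word match that treats letters and digits in any script as word characters.
function unicodeWholeWordMatch(text: string, terms: string[]): string[] {
  const haystack = canonical(text);
  return terms.filter((term) => {
    const pattern = new RegExp(
      `(?<!${WORD_CHAR})${escapeRegExp(canonical(term))}(?!${WORD_CHAR})`,
      "u",
    );
    return pattern.test(haystack);
  });
}
```

The lookarounds fix word boundaries, but they do nothing for inflection: "matarte" still slips past "matar", so expanded lists need variants too (or a stemming step, which this sketch does not attempt).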

Key Learning Points

Next Steps

Once you've completed these exercises, you'll be ready to move on to Lab 3: Locking the Front Door and Back Door, where you'll learn about prompt injection attacks and more advanced security measures.