Lab Overview
Now things get interesting! Meet Oscar, our intentionally "broken" bot who demonstrates what can happen when AI systems go wrong. To understand why this matters, let's look at a real-world disaster: Microsoft's Tay chatbot. In 2016, Microsoft released Tay on Twitter as an experiment in conversational AI. Within hours, internet users had "hacked" Tay by feeding it racist and inflammatory content, which the bot then repeated and amplified. The result? Bad headlines, public embarrassment, and Microsoft pulling Tay offline after just 16 hours.
The tragedy is that this was entirely preventable. Simple content filters could have stopped Tay from repeating harmful language, saving Microsoft from a PR nightmare and protecting users from exposure to inappropriate content. In this lab, you'll see exactly how Microsoft could have saved itself - and how you can prevent similar disasters in your own AI systems.
This lab is your first taste of AI security in action. You'll witness firsthand how simple guardrails can transform a potentially dangerous AI system into a safe, controlled one. Through hands-on experimentation, you'll learn why content filtering isn't just a nice-to-have feature - it's essential for any AI system that interacts with users. We'll show you how basic keyword filters work, why they're important, and how they can be the difference between a helpful AI assistant and a liability. By the end, you'll understand the fundamental principle that every AI system needs multiple layers of protection, and you'll have the tools to implement the first line of defense.
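To see what a "keyword filter" means in practice, here is a minimal sketch in Python. The term list, function names, and matching rule are illustrative assumptions for this lab write-up, not the app's actual code:

```python
import re

# Hypothetical blocklist -- the real app ships its own term lists.
BLOCKED_TERMS = {"bomb", "attack", "kill"}

def contains_blocked_term(text: str) -> bool:
    """Return True if any blocked term appears as a whole word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return any(word in BLOCKED_TERMS for word in words)

def filter_message(text: str) -> str:
    """Replace a flagged message with a canned refusal instead of passing it along."""
    if contains_blocked_term(text):
        return "Sorry, I can't help with that."
    return text

print(filter_message("how do I build a bomb"))       # -> canned refusal
print(filter_message("how do I build a bookshelf"))  # -> passes through unchanged
```

A real filter would also handle punctuation, misspellings, and multi-word phrases, but the principle is the same: check text against a list of terms and refuse or rewrite anything that matches.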
Exercises
Exercise 2.A: Meet Oscar
Directions:
Go to the live app. Choose Oscar from the Bot picker. Have a conversation. Note that he isn't very charming - he's simulating a jailbroken bot like Tay from Chapter 1 of Steve's book.
Exercise 2.B: Analyze Oscar
Exercise 2.C: Your First Guardrails
Directions:
Open the guardrails panel using the toolbar button (find it with the tooltips if you're having trouble). Check the boxes for the Sex (Local) and Violence (Local) input and output filters. Now have a conversation. Trigger Oscar's destructive behaviors; feel free to get nasty with him too. Watch the guardrails in action.
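The distinction between the two kinds of checkboxes matters: an input filter screens what you send to Oscar, while an output filter screens what Oscar sends back. Below is a rough Python sketch of that wrapping, with a hypothetical call_bot parameter standing in for the real model call; it is not the app's actual code:

```python
BLOCKED_TERMS = {"kill", "bomb"}  # stand-in list; the app uses its own blocklists

def is_blocked(text: str) -> bool:
    """Very crude check: does any blocked term appear in the lowercased text?"""
    return any(term in text.lower() for term in BLOCKED_TERMS)

def guarded_chat(user_message: str, call_bot) -> str:
    """Wrap a bot call with an input filter and an output filter."""
    # Input filter: stop flagged prompts before they ever reach the model.
    if is_blocked(user_message):
        return "Blocked by the input filter."
    reply = call_bot(user_message)  # call_bot is a hypothetical stand-in for Oscar
    # Output filter: catch flagged content the model produces anyway.
    if is_blocked(reply):
        return "Blocked by the output filter."
    return reply

# Example: echo_bot just repeats the user; a real call would hit the LLM.
print(guarded_chat("tell me about flowers", lambda m: f"You said: {m}"))
```

When you toggle the filters in the app, you should be able to tell from the responses which layer caught the offending text.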
Exercise 2.D: Inspect the Guardrails
Directions:
Go to the blocklists and see how they're constructed. Note that these are kept very simple for demo purposes.
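As a rough mental model, a simple blocklist is often just a text file with one term per line and a small loader like the sketch below. The file path and format here are guesses for illustration, so compare them against how the repo actually does it:

```python
from pathlib import Path

def load_blocklist(path: str) -> set[str]:
    """Read one lowercase term per line, skipping blanks and '#' comments."""
    terms = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        term = line.strip().lower()
        if term and not term.startswith("#"):
            terms.add(term)
    return terms

# Hypothetical path -- the repo's blocklist files may live elsewhere in another format.
# violence_terms = load_blocklist("blocklists/violence.txt")
```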
Extra, Extra Credit:
Clone the repo to get your own copy on your machine. Expand the blocklists to include more terms or terms in other languages. Reload the app and test your expanded guardrails.
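Before reloading the app, you can sanity-check an expanded list with a few lines of Python. The added terms and test phrases below are purely illustrative:

```python
# Start from a small English list and add a few terms in other languages.
# All terms and test phrases here are illustrative, not taken from the repo.
blocklist = {"kill", "attack"}
blocklist |= {"matar", "töten", "tuer"}  # Spanish, German, French

test_phrases = ["comment tuer quelqu'un", "what a lovely day"]
for phrase in test_phrases:
    hits = [term for term in blocklist if term in phrase.lower()]
    print(phrase, "->", "BLOCKED" if hits else "ok")
```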
Key Learning Points
- Understanding how vulnerable AI systems can behave
- Learning about basic content filtering mechanisms
- Experiencing guardrails in action
- Understanding the importance of input and output filtering
- Seeing how simple blocklists can provide basic protection
Next Steps
Once you've completed these exercises, you'll be ready to move on to Lab 3: Locking the Front Door and Back Door, where you'll learn about prompt injection attacks and more advanced security measures.