Lab Overview
In this lab, you'll meet Oscar, a simulated jailbroken bot modeled on Tay from Chapter 1 of Steve's book. You'll learn how guardrails work and how even simple filters can block harmful content. This lab introduces basic security measures for AI systems.
Exercises
Exercise 2.A: Meet Oscar
Directions:
Go to the live app, choose Oscar from the Bot picker, and have a conversation. Note that he isn't very charming; he's deliberately simulating a jailbroken bot.
Exercise 2.B: Analyze Oscar
Exercise 2.C: Your First Guardrails
Directions:
Open the guardrails panel using the toolbar button (hover for tooltips if you're having trouble finding it). Check the boxes for the Sex (Local) and Violence (Local) input and output filters, then have a conversation. Try to trigger Oscar's destructive behaviors, and feel free to get nasty with him too. Watch the guardrails in action.
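The mechanics behind those checkboxes can be sketched as a pair of filters wrapping the model call: one check on what you type (input filter) and one on what the bot says back (output filter). This is a minimal illustration, not the lab app's actual code; all names and the placeholder blocklist terms are made up.

```python
# Sketch of input/output guardrails around a chat turn.
# BLOCKLIST entries are stand-ins, not the app's real blocklists.
BLOCKLIST = {"violence_term", "sex_term"}

def violates(text: str, blocklist: set[str]) -> bool:
    """Naive check: does any blocklisted term appear in the text?"""
    lowered = text.lower()
    return any(term in lowered for term in blocklist)

def guarded_turn(user_input: str, model) -> str:
    # Input filter: refuse before the prompt ever reaches the model.
    if violates(user_input, BLOCKLIST):
        return "[input blocked by guardrail]"
    reply = model(user_input)
    # Output filter: catch harmful text the model produced anyway.
    if violates(reply, BLOCKLIST):
        return "[output blocked by guardrail]"
    return reply
```

Note that the two filters fail differently: the input filter stops the prompt from reaching the model at all, while the output filter lets the model respond and then suppresses what it said.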
Exercise 2.D: Inspect the Guardrails
Directions:
Go to the blocklists and see how they're constructed. Note that they're kept very simple for demo purposes.
Extra, Extra Credit:
Clone the repo so you have your own copy on your machine. Expand the blocklists with more terms, or with terms in other languages. Reload the app and test your expanded guardrails.
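If you add terms in other languages, plain substring matching can miss case and accent variants. One common refinement (not necessarily what the demo app does; the blocklist terms below are illustrative) is to normalize text before matching:

```python
import unicodedata

def normalize(text: str) -> str:
    """Lowercase and strip accents so 'Ataqué' matches the entry 'ataque'."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Illustrative multilingual additions (Spanish, German).
BLOCKLIST = {"ataque", "gewalt"}

def blocked(text: str) -> bool:
    norm = normalize(text)
    return any(term in norm for term in BLOCKLIST)
```

For example, `blocked("un Ataqué terrible")` is True even though the casing and accent don't match the blocklist entry exactly.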
Key Learning Points
- Understanding how vulnerable AI systems can behave
- Learning about basic content filtering mechanisms
- Experiencing guardrails in action
- Understanding the importance of input and output filtering
- Seeing how simple blocklists can provide basic protection
Next Steps
Once you've completed these exercises, you'll be ready to move on to Lab 3: Locking the Front Door and Back Door, where you'll learn about prompt injection attacks and more advanced security measures.