Lab Overview
Ready to become an AI security engineer? This is where the real fun begins! Lab 5 is your chance to apply everything you've learned and create your own security solutions from scratch. While the previous labs taught you about existing vulnerabilities and defenses, this lab empowers you to build the next generation of AI security tools that could one day protect real-world systems.
We live in a constantly evolving threat landscape. Attackers are always getting smarter, and as the use-cases for GenAI expand, so do the ways in which these systems can be attacked. The defenses that work today may not be enough tomorrow. That's why it's critical to learn how to customize and expand your security measures—adapting to new threats and new applications as they emerge. This lab is your chance to practice those skills in a safe, hands-on environment.
You'll tackle real-world challenges like protecting against medical advice liability (a common concern for healthcare AI systems), detecting personally identifiable information (essential for privacy compliance), and creating robust detection systems that can adapt to new threats. These aren't just academic exercises - they're the same problems that companies face when deploying AI systems in production, and the solutions you develop here could form the foundation for professional security tools. By the end of this lab, you'll have the skills and confidence to secure AI systems in the real world, and you'll understand why the field of AI security is both challenging and incredibly rewarding. This is where you transition from learning about AI security to becoming an AI security practitioner.
Exercises
Exercise 5.A: Create a Blocklist
Directions:
Look at the sex and violence examples. Create another topic like "Medical" and try to keep your bot from giving out medical advice. You might do this in real life to mitigate legal risk. Add it to the output filters list.
Reference the existing blocklists:
Exercise 5.B: Create a PII Guardrail
Directions:
Create a simple, local filter for personally identifiable information (PII) using regular expressions (RegEx). Look for patterns like phone numbers, social security numbers, etc. Go nuts! Add it to the output filters list and try it out.
Consider patterns like:
- Phone numbers (various formats)
- Social Security Numbers (XXX-XX-XXXX)
- Email addresses
- Credit card numbers
- IP addresses
Exercise 5.C: Make a Robust PII Guardrail
Directions:
Make a much more robust PII using an LLM as the judge. Use the prompt injection (AI) filter as an example. Add it to the output filters list and try it out.
Reference the AI prompt injection filter:
Extra Credit:
Create a PII test suite that tests the local and robust versions. Put it in /tests. Use the Prompt Injection suite as an example. Look at how the AI version and the LLM-based versions compare.
Extra, Extra Credit:
Tune both the simple PII filter and the advanced one and see how good you can make them against your test suite cases.
Key Learning Points
- Creating custom security filters from scratch
- Understanding regular expressions for pattern matching
- Building robust PII detection systems
- Creating comprehensive test suites
- Comparing simple vs. sophisticated approaches
- Extending the playground with custom functionality
Advanced Tips
- Start with simple patterns and gradually make them more sophisticated
- Test your filters with various edge cases and false positives
- Consider performance implications of your filters
- Document your custom filters for future reference
- Share your creations with the community!
Congratulations!
You've completed all the labs in Steve's Chat Playground Lab Book! You now have hands-on experience with:
- Basic AI system vulnerabilities
- Content filtering and guardrails
- Prompt injection attacks and defenses
- Output filtering and security measures
- Advanced moderation techniques
- Automated testing for security
- Creating custom security measures
Keep exploring, experimenting, and building secure AI systems!