Lab Overview
Everything here is extra credit, and everything requires some admin/developer chops. Dive in if you like! These exercises don't have a lot of structure, which makes them more challenging, but that's part of the fun. You can do it! Review the extensibility docs to get started.
Exercises
Exercise 5.A: Create a Blocklist
Directions:
Look at the existing sex and violence blocklist examples. Create a blocklist for another topic, such as "Medical", and try to keep your bot from giving out medical advice. You might do this in real life to mitigate legal risk. Add it to the output filters list.
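If it helps to see the idea before opening the playground's own files, here is a minimal sketch of a keyword blocklist. It is written in Python purely for illustration; the playground's real blocklist format and filter interface may look quite different. The names `MEDICAL_TERMS`, `build_pattern`, and `check_blocklist` are hypothetical, and the term list is only a starting point.

```python
import re

# Hypothetical term list (an assumption, not the playground's data);
# tune it to the kind of advice you want to block.
MEDICAL_TERMS = ["diagnosis", "dosage", "prescription", "symptom", "medication"]


def build_pattern(terms):
    """Compile one case-insensitive regex matching any term as a whole word.

    An optional trailing "s" catches simple plurals such as "symptoms".
    """
    escaped = "|".join(re.escape(term) for term in terms)
    return re.compile(rf"\b(?:{escaped})s?\b", re.IGNORECASE)


MEDICAL_PATTERN = build_pattern(MEDICAL_TERMS)


def check_blocklist(text):
    """Return the sorted, de-duplicated blocked terms found in text.

    An empty list means the text passes the filter.
    """
    return sorted({m.group(0).lower() for m in MEDICAL_PATTERN.finditer(text)})
```

Matching whole words avoids flagging unrelated words that merely contain a blocked term, at the cost of missing creative spellings. That trade-off is worth experimenting with.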
Reference the existing blocklists:
Exercise 5.B: Create a PII Guardrail
Directions:
Create a simple, local filter for personally identifiable information (PII) using regular expressions (RegEx). Look for patterns like phone numbers, social security numbers, etc. Go nuts! Add it to the output filters list and try it out.
Consider patterns like:
- Phone numbers (various formats)
- Social Security Numbers (XXX-XX-XXXX)
- Email addresses
- Credit card numbers
- IP addresses
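The patterns above could be sketched roughly as follows. This is a starting point in Python, not the playground's filter code: `PII_PATTERNS` and `find_pii` are hypothetical names, the phone pattern assumes US-style numbers, and the credit card pattern only looks for 13 to 19 digits with optional spaces or hyphens (no checksum validation).

```python
import re

# Deliberately simple patterns (assumptions for illustration).
# Lookarounds like (?<!\d) and (?!\d) stop matches inside longer digit runs.
PII_PATTERNS = {
    "phone": re.compile(
        r"(?<!\d)(?:\+?1[\s.-]?)?(?:\(\d{3}\)\s?|\d{3}[\s.-])\d{3}[\s.-]\d{4}(?!\d)"
    ),
    "ssn": re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)"),
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "credit_card": re.compile(r"(?<!\d)(?:\d[ -]?){12,18}\d(?!\d)"),
    "ipv4": re.compile(
        r"(?<!\d)(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}"
        r"(?:25[0-5]|2[0-4]\d|1?\d?\d)(?!\d)"
    ),
}


def find_pii(text):
    """Return {label: [matched strings]} for every pattern that hits.

    An empty dict means no pattern matched.
    """
    hits = {}
    for label, pattern in PII_PATTERNS.items():
        found = [m.group(0) for m in pattern.finditer(text)]
        if found:
            hits[label] = found
    return hits
```

Expect both misses and false alarms from patterns this simple. For example, any long run of digits can look like a card number, which is exactly what the later exercises ask you to measure and tune.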
Exercise 5.C: Make a Robust PII Guardrail
Directions:
Make a much more robust PII filter using an LLM as the judge. Use the prompt injection (AI) filter as an example. Add it to the output filters list and try it out.
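The LLM-as-judge idea can be sketched like this, again in Python for illustration only. Everything here is an assumption rather than the playground's implementation: `JUDGE_PROMPT`, `judge_pii`, and especially `ask_llm`, a hypothetical callable that takes a prompt string and returns the model's reply as a string. Wire it to whatever model API you actually use.

```python
import json

# Hypothetical judge prompt; the wording is an assumption and worth tuning.
JUDGE_PROMPT = (
    "You are a PII detector. Reply with only a JSON object of the form "
    '{"contains_pii": true or false, "types": [list of strings]}.\n\n'
    "Text to review:\n"
)


def judge_pii(text, ask_llm):
    """Ask a model whether text contains PII and return a verdict dict.

    ask_llm is a hypothetical callable (prompt -> reply string).
    The filter fails closed: a reply that cannot be parsed is treated
    as a block rather than silently allowed through.
    """
    reply = ask_llm(JUDGE_PROMPT + text)
    try:
        verdict = json.loads(reply)
        return {
            "blocked": bool(verdict["contains_pii"]),
            "types": list(verdict.get("types", [])),
        }
    except (ValueError, KeyError, TypeError, AttributeError):
        return {"blocked": True, "types": ["unparseable_judge_reply"]}
```

Passing the model call in as a parameter keeps the filter testable: in a test suite you can hand it a stub that returns canned replies instead of calling a real model.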
Reference the AI prompt injection filter:
Extra Credit:
Create a PII test suite that tests the local and robust versions. Put it in /tests. Use the Prompt Injection suite as an example. Look at how the local (RegEx) version and the LLM-based version compare.
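A test suite for this can be as small as a list of labeled cases plus a loop. The sketch below is in Python and is not modeled on the actual Prompt Injection suite, whose format may differ. `CASES`, `run_suite`, and `simple_filter` are hypothetical names, and the sample texts are invented.

```python
import re

# Labeled cases: (text, should_block). Invented sample data, not real PII.
CASES = [
    ("My SSN is 123-45-6789.", True),
    ("Email me at alice@example.com.", True),
    ("The meeting is at 3pm on Friday.", False),
    ("Order #123-45 shipped today.", False),
]


def run_suite(filter_fn, cases=CASES):
    """Run filter_fn (text -> truthy if blocked) over the labeled cases.

    Returns a list describing every case where the filter disagreed
    with the label; an empty list means all cases passed.
    """
    failures = []
    for text, expected in cases:
        actual = bool(filter_fn(text))
        if actual != expected:
            failures.append({"text": text, "expected": expected, "actual": actual})
    return failures


# Stand-in local filter so the harness runs on its own; swap in your real ones.
_SIMPLE = re.compile(r"\d{3}-\d{2}-\d{4}|[\w.+-]+@[\w-]+\.[\w.]+")


def simple_filter(text):
    """Block text containing an SSN-shaped or email-shaped string."""
    return _SIMPLE.search(text) is not None
```

Because `run_suite` only needs a function from text to a yes/no answer, the same cases can be run against the local filter and the LLM-based one, which makes the comparison direct.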
Extra, Extra Credit:
Tune both the simple PII filter and the advanced one and see how good you can make them against your test suite cases.
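To see whether tuning is helping, it is useful to reduce each run to a couple of numbers. One possible sketch, in Python, scores a filter by precision (how many blocks were justified) and recall (how much PII was caught). The function name `score` and the case format of (text, should_block) pairs are assumptions for illustration.

```python
def score(filter_fn, cases):
    """Compute precision and recall for filter_fn over labeled cases.

    cases is an iterable of (text, should_block) pairs.
    If a denominator is zero, the corresponding ratio is reported
    as 1.0 by convention.
    """
    tp = fp = fn = 0
    for text, expected in cases:
        actual = bool(filter_fn(text))
        if actual and expected:
            tp += 1  # correctly blocked
        elif actual and not expected:
            fp += 1  # blocked something harmless
        elif expected and not actual:
            fn += 1  # let PII through
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return {"precision": precision, "recall": recall, "tp": tp, "fp": fp, "fn": fn}
```

Low precision means the filter is blocking harmless text, and low recall means PII is slipping through. Tracking both after each change shows which way a tweak moved things.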
Key Learning Points
- Creating custom security filters from scratch
- Understanding regular expressions for pattern matching
- Building robust PII detection systems
- Creating comprehensive test suites
- Comparing simple vs. sophisticated approaches
- Extending the playground with custom functionality
Advanced Tips
- Start with simple patterns and gradually make them more sophisticated
- Test your filters against edge cases, and check for both false positives and false negatives
- Consider performance implications of your filters
- Document your custom filters for future reference
- Share your creations with the community!
Congratulations!
You've completed all the labs in Steve's Chat Playground Lab Book! You now have hands-on experience with:
- Basic AI system vulnerabilities
- Content filtering and guardrails
- Prompt injection attacks and defenses
- Output filtering and security measures
- Advanced moderation techniques
- Automated testing for security
- Creating custom security measures
Keep exploring, experimenting, and building secure AI systems!