Lab Overview
In this lab, you'll compare local filters with AI-powered moderation. You'll learn about advanced prompt injection techniques and automated testing for security measures. This lab focuses on understanding the trade-offs between simple and sophisticated approaches.
Exercises
Exercise 4.A: Advanced Moderation
Directions:
Repeat your activities from Lab 2, but switch between the local Sex/Violence filters and the "AI" version. Try things like misspellings. Compare the local and AI versions. Note the performance changes. Look for changes when entering your prompts or getting responses.
Extra Credit:
Inspect the JavaScript that drives the moderation filter (which is used to create the Sex and Violence filter behaviors).
Exercise 4.B: Advanced Prompt Injection
Directions:
Repeat your attempts from Lab 3 to do a prompt injection. Now turn on local, and see what it blocks. Then, uncheck "Prompt Injection (Local)" and check the "Prompt Injection (AI)" filter. Try things like prompt injections that include misspellings and foreign languages. See how it compares.
Extra Credit:
Inspect the prompt that drives the prompt injection filter.
Extra, Extra Credit:
Inspect the JavaScript that uses that prompt to create the filter.
Exercise 4.C: Automated Testing
Directions:
Let's see how these work and compare. It's one thing to hunt and peck, but having a real evaluation suite is crucial. Navigate to the playground's live "Test Suite". Try the PromptInjection, sex and violence test suites. These may take a few minutes to run (and may also cost a few pennies). When they complete, they'll give you a summary that shows speed/performance as well as accuracy. Note the variations.
Extra Credit:
Navigate to the test suite files and see how the various test cases are created and labeled.
Extra, Extra Credit:
In your local copy of the repo, add some of your own test cases (and/or remove some of the ones that are there) and rerun the tests! See what happens.
Key Learning Points
- Comparing local vs. AI-powered filtering approaches
- Understanding performance vs. accuracy trade-offs
- Learning about advanced prompt injection techniques
- Experiencing automated testing for security measures
- Understanding the importance of comprehensive evaluation
- Learning about multilingual and misspelling attacks
Next Steps
Once you've completed these exercises, you'll be ready to move on to Lab 5: Go Bananas, where you'll tackle advanced developer exercises and create your own custom security measures.