Lab 4: Simple vs. Smart - Steve's Chat Playground Lab Book

Lab Overview

In this lab, you'll compare local filters with AI-powered moderation. You'll learn about advanced prompt injection techniques and automated testing for security measures. This lab focuses on understanding the trade-offs between simple and sophisticated approaches.

Skill Level: 1-2 Prerequisites: OpenAI API Key

Exercises

Exercise 4.A: Advanced Moderation

Skill Level: 1 (2 for extra credit) Prerequisites: OpenAI API Key

Directions:

Repeat your activities from Lab 2, but switch between the local Sex/Violence filters and the "AI" version. Try things like misspellings. Compare the local and AI versions. Note the performance changes. Look for changes when entering your prompts or getting responses.

Extra Credit:

Inspect the JavaScript that drives the moderation filter (which is used to create the Sex and Violence filter behaviors).

View OpenAI Moderation Filter

Exercise 4.B: Advanced Prompt Injection

Skill Level: 1 (2 for extra credit) Prerequisites: OpenAI API Key

Directions:

Repeat your attempts from Lab 3 to do a prompt injection. Now turn on local, and see what it blocks. Then, uncheck "Prompt Injection (Local)" and check the "Prompt Injection (AI)" filter. Try things like prompt injections that include misspellings and foreign languages. See how it compares.

Extra Credit:

Inspect the prompt that drives the prompt injection filter.

View OpenAI Prompt Injection Prompt

Extra, Extra Credit:

Inspect the JavaScript that uses that prompt to create the filter.

View OpenAI Prompt Injection Filter

Exercise 4.C: Automated Testing

Skill Level: 1 (2 for extra, extra credit) Prerequisites: OpenAI API Key (you can do part of this without, but you won't get the full effect)

Directions:

Let's see how these work and compare. It's one thing to hunt and peck, but having a real evaluation suite is crucial. Navigate to the playground's live "Test Suite". Try the PromptInjection, sex and violence test suites. These may take a few minutes to run (and may also cost a few pennies). When they complete, they'll give you a summary that shows speed/performance as well as accuracy. Note the variations.

View Test Suite

Extra Credit:

Navigate to the test suite files and see how the various test cases are created and labeled.

View Test Data

Extra, Extra Credit:

In your local copy of the repo, add some of your own test cases (and/or remove some of the ones that are there) and rerun the tests! See what happens.

Key Learning Points

Comparing local vs. AI-powered filtering approaches
Understanding performance vs. accuracy trade-offs
Learning about advanced prompt injection techniques
Experiencing automated testing for security measures
Understanding the importance of comprehensive evaluation
Learning about multilingual and misspelling attacks

Next Steps

Once you've completed these exercises, you'll be ready to move on to Lab 5: Go Bananas, where you'll tackle advanced developer exercises and create your own custom security measures.