Natalie Wu

Edition

2024

Connect

Linkedin

How malicious instructions can override system prompts

PLAY

How malicious instructions can override system prompts

PLAY

Natalie Wu is a Software Engineer at Lakera, a Swiss AI security company specializing in protecting large language models (LLMs) from vulnerabilities like prompt injection and data leaks. She contributes to Lakera’s flagship product, Lakera Guard, which secures AI applications with real-time threat detection and content moderation. Wu has also been involved in developing tools like Gandalf, an interactive game designed to test and improve LLM security by challenging users to extract passwords from increasingly secure models

Natalie Wu is a Software Engineer at Lakera, a Swiss AI security company specializing in protecting large language models (LLMs) from vulnerabilities like prompt injection and data leaks. She contributes to Lakera’s flagship product, Lakera Guard, which secures AI applications with real-time threat detection and content moderation. Wu has also been involved in developing tools like Gandalf, an interactive game designed to test and improve LLM security by challenging users to extract passwords from increasingly secure models

Natalie Wu is a Software Engineer at Lakera, a Swiss AI security company specializing in protecting large language models (LLMs) from vulnerabilities like prompt injection and data leaks. She contributes to Lakera’s flagship product, Lakera Guard, which secures AI applications with real-time threat detection and content moderation. Wu has also been involved in developing tools like Gandalf, an interactive game designed to test and improve LLM security by challenging users to extract passwords from increasingly secure models

Outlines the progression of AI from conversational to single-agent to internet-of-agents, highlighting increasing risks. Natalie focuses on prompt injection vulnerabilities in language models, demonstrating how malicious instructions can override system prompts. She provides real-world examples of AI misuse, including chatbots being manipulated on social media. Wu concludes by introducing Lakera's API solution, which helps detect prompt injections, inappropriate content, and other security issues in AI applications.

Outlines the progression of AI from conversational to single-agent to internet-of-agents, highlighting increasing risks. Natalie focuses on prompt injection vulnerabilities in language models, demonstrating how malicious instructions can override system prompts. She provides real-world examples of AI misuse, including chatbots being manipulated on social media. Wu concludes by introducing Lakera's API solution, which helps detect prompt injections, inappropriate content, and other security issues in AI applications.