Greetings,
For almost two years now I've been sounding the alarm on the incredible insecurity of AI-enabled systems. Two weeks ago I gave a very different talk. I'm still warning people about the issues with AI, but this was the first talk where I got to show meaningful and encouraging progress in AI security.
I set out to demonstrate an attack on OpenClaw (the agentic fad released at the end of January, whose website gets 35 million monthly visitors and whose GitHub stars make it the #7 most popular project of all time). The goal was to compromise the system with a single one-shot email.
OpenClaw itself has been riddled with security problems (110 CVEs in 2.5 months, 60 high or critical severity), yet its protections against this particular attack are now actually very good.
In fact, OpenAI's models got 100x better at blocking prompt injections between November and March, and Anthropic's models saw similarly impressive improvements. 🤯
If you want to learn more, including what they're doing right, what you should be doing, and where catastrophic problems remain, check out my talk. SnowFROC should be posting the official live version soon (and we'll put it in the video description), but in the meantime you can watch the clean recording.
- Patrick
PS - We just published an academic paper, "AI Data Blind Spots," in the peer-reviewed ISSA Journal, which you can read if you're an ISSA member.