Security for AI
Self-driving cars, facial recognition software, automated hiring tools, AI chatbots. AI is everywhere. And like any transformative technology, it brings with it a whole new set of security challenges. We’re not just defending codebases anymore. We're securing black boxes full of dynamic decision-making, unpredictable outputs, and hidden vulnerabilities.
So much of security for AI systems is still uncharted territory. The attack surface has exploded, and new threats have emerged seemingly overnight. The security community has responded with tools to help close the gaps. Sometimes making sense of these tools can be confusing.
In this blog, we look at model scanning and AI red teaming, two tools that couldn’t be more different in their approach to securing AI. Let’s break them down by defining what they are, how they work, when to use them, and why they’re both crucial to securing AI systems today.
What is model scanning?
Though model scanning might sound like traditional code scanning, it is entirely different. This is because AI models aren’t made of source code in the way traditional applications are. Instead, they’re composed of weights and parameters stored in serialized files, often in formats like Pickle, PyTorch, or ONNX.
Model scanning focuses on scanning these artifacts to find known vulnerabilities, particularly those that could be introduced during the model serialization and deserialization processes. Why does this matter? Because serialized model files can contain embedded code. When loading a model file, malicious code could be executed, almost like opening an infected email attachment.
Think of it this way: when you download a model from a public repository or share one internally, how do you know it hasn’t been tampered with? Model scanning gives you a yes or no answer. Is this file safe to use, or is it laced with malware? If the file is not safe, you can remediate the issue. Think of model scanning as part of your AI supply chain security plan.
When should you use model scanning?
Generally, you use model scanning in the following instances:
- Before deploying a model trained by a third party or shared across teams
- When downloading open-source models or using pre-trained weights from public hubs
- During routine audits of internal model repositories
- As part of a CI/CD pipeline in AI development workflows
Basically, any time you're moving or loading a model file, scanning should be part of your security hygiene checklist. Think of it like scanning software dependencies, but for AI models.
What is AI red teaming?
While model scanning focuses on the file itself, AI red teaming is all about the behavior of the model.
AI models are complex, probabilistic systems. You can give a model the same input three times and get three different outputs. Unlike traditional systems, you can’t simply trace a line of code to understand what went wrong or what will happen next. That unpredictability makes it incredibly hard to secure.
AI red teaming simulates real-world adversarial attacks to identify how a model behaves in the wild. Think of it like a live fire drill. Red teamers try to provoke harmful, biased, insecure, or manipulated responses from the model by using sophisticated prompting techniques. These attacks help organizations understand how their models might behave when under pressure and what kind of damage they could cause.
By understanding how an AI model behaves, you can start to identify its blind spots, pressure points, and failure modes before a malicious actor does. It allows organizations to build more resilient models by hardening them against known attack patterns, patching behavioral weaknesses, and reinforcing safety mechanisms. In short, red teaming turns unpredictability into insight, giving you a chance to fix issues before they become headlines.
When would you use AI red teaming?
AI red teaming is useful when you want to understand the behavior of the model, as in the following situations:
- Before releasing a model to production that interacts with users (for example, chatbots, recommendation engines, content generation tools)
- To test how your model handles sensitive prompts or attempts to circumvent safety filters
- During model fine-tuning or post-training evaluation
- As part of a larger responsible AI or trust and safety program
If you want to know whether your model could be manipulated into generating harmful content or leaking sensitive information, AI red teaming is your answer.
What’s the difference?
The biggest difference between model scanning and AI red teaming comes down to predictability.
Model scanning is deterministic. You scan a file, and you get a definitive answer: safe or unsafe. It’s like antivirus software in that it’s useful, even essential, but limited to known threats.
AI red teaming is anything but predictable. It’s like hiring an ethical hacker to stress-test your model’s behavior. You don’t know what you’re going to find until you dig in. It requires creativity, context, and a deep understanding of how models work and how they fail.
In short, model scanning looks at static files for known risks. AI red teaming probes live models to uncover unknown vulnerabilities in behavior.
Both are necessary, but they operate in completely different domains of AI risk.
Securing AI in the enterprise
Enterprises can’t afford to pick just one. Model scanning and AI red teaming serve different purposes and protect different layers of your AI stack.
Imagine deploying a language model that was trained on malicious data or manipulated files. Without model scanning, you might not know until it's too late. But even a perfectly clean model can behave in unexpected or harmful ways. Without red teaming, you risk deploying a system that behaves well in tests but collapses under real-world use.
Together, these approaches offer a layered defense. Model scanning ensures you’re not bringing malware into your environment. Red teaming ensures your model behaves as expected, even under stress.
The path to secure AI isn't a single tool or technique. It’s a coordinated strategy. One that includes testing the boundaries of both what a model is and what a model does.
Final thoughts
Securing AI is like trying to hit a moving target in the dark. The landscape is evolving fast, and threats can come from unexpected places. This includes everything from the data you train on, to the file you download, to the prompt an adversary submits.
Model scanning and AI red teaming give you visibility into two very different parts of the risk surface. One protects against known threats lurking in the shadows of your AI supply chain. The other shines a light on unpredictable behavior that could spiral into real-world harm.
Use both. Because when it comes to AI security, it's not just what the model contains. It's what the model does that matters.
How TrojAI can help
Our mission at TrojAI is to enable the secure rollout of AI in the enterprise. We are a comprehensive security platform for AI that protects AI models, applications, and agents. Our best-in-class platform empowers enterprises to safeguard AI systems both at build time and run time. TrojAI Detect automatically red teams AI models, safeguarding model behavior and delivering remediation guidance at build time. TrojAI Defend is an AI application firewall that protects enterprises from real-time threats at run time.
By assessing the risk of AI model behavior during the model development lifecycle and protecting it at run time, we deliver comprehensive security for your AI models and applications.
Want to learn more about how TrojAI secures the largest enterprises globally with a highly scalable, performant, and extensible solution?
Learn more at troj.ai now.