LLM Red Teaming with TrojAI Detect

AI security and auto red teaming

AI is evolving fast, and so are the threats to these systems. Organizations need better ways to test and secure their models before vulnerabilities make it into production. That’s where TrojAI Detect comes in.

We’re introducing new capabilities for auto red teaming AI models, giving you a centralized, automated way to test against real-world threats. With features like a dataset registry for custom test data, a model registry for better tracking and policy management, and automated adversarial testing, TrojAI Detect makes it easier than ever to identify risks, enforce security policies, and ensure model resilience.

Even better? We now map auto red teaming results directly to the 2025 OWASP Top 10 for LLMs. So now you’re not just catching failures, you’re aligning with industry standards to prioritize the risks that matter most.

Let’s dive into the details.

Dataset registry

TrojAI allows you to register your own datasets to test your AI models. This means you can use public, open source, or custom datasets. Choosing the right dataset is crucial when auto red teaming AI models because it ensures that test results are more controlled, comprehensive, and realistic.

AI models behave differently based on their training data distribution. AI applications built on top of these models behave differently based on their real-world inference data distribution. By selecting your own dataset for testing, you can focus on specific security risks, such as prompt injection attacks in LLMs or misclassification risks in image models. Furthermore, public benchmark datasets may not expose application-specific vulnerabilities as they lack application context. Custom datasets allow you to generate adversarial inputs, simulating events like RAG poisoning attacks or jailbreaking attempts, that are tailored to an individual AI application’s weaknesses.

With a centralized repository, TrojAI allows you to use off-the-shelf benchmark datasets that we provide or to bring your own custom datasets, giving you more flexibility and control over your testing to meet your unique needs. This includes production datasets for a particular use case or private benchmark datasets for risks that you care about, such as security, operational, or ethical risks.

Model registry

TrojAI has introduced a new model registry. In modern AI ecosystems, models are frequently dispersed across multiple environments. They may be embedded within distinct services, integrated into various APIs, or even operating unnoticed in shadow AI environments. These APIs could be hosted on hyperscaler platforms such as Amazon SageMaker, deployed within self-managed infrastructure, or orchestrated through bespoke API solutions.

One of the key things about generative AI models given their size, is that you’re typically not hosting the actual model files in the same process as your application code. They’re being served elsewhere. A model registry helps by letting you point to these locations and securely manage access using industry standards.

By having a centralized model registry, you have one place to keep track of all your models, no matter where they’re running. Here’s why that’s powerful:

One-to-many relationships: A single model might be used across multiple applications. Instead of configuring each app separately, you can set a blanket policy at the model level.
Policy inheritance: Any application using a registered model would automatically inherit its policies unless they’re specifically overridden at the application level.
Flexible policy layers: You can have global policies for all models and apps, specific policies for individual models, and eventually, even user-level policies within applications.

This kind of structure simplifies governance, security, and compliance while making it easier to scale AI deployments across an organization. The direction is clear: Model registries are becoming essential for managing AI at scale.

Auto red teaming AI models

Once you’ve selected your datasets, you’ll want to test them against the models you’ve registered. This is when you start red teaming. You can run tests on data as-is or you can apply manipulations on top of it to simulate various attack scenarios.

TrojAI allows you to apply manipulations that simulate adversarial attacks, distributional drift, ethical bias, and more. The idea is to see how these manipulations could impact your application in production before you go live.

TrojAI offers a wide range of transformations out of the box. The real value of these transformations is that they let you automate bounded distributional drift testing over time. You don’t need to spin up an eval Python script or Jupyter Notebook, with hardcoded cruft to load your data and manage your model. It doesn’t scale. TrojAI works across any model and any task. It’s universal, it’s automated, and it’s easily repeatable.

Auto red teaming results

Once you’ve run the tests, you get results that deliver insights. Testing shows you what categories of data your model is vulnerable to and sets thresholds for acceptable performance. You can ask how often is your model passing a certain category of test? Is it failing 50% of the time? 100% of the time? If so, that’s a red flag. It might mean you need a different model, you need to rethink your task, or you need to implement downstream guardrails like TrojAI Defend in your production systems.

You might be wondering, Well, what guardrails do I add? How can I monitor all this traffic effectively? That’s where you need a secondary control, like an AI application firewall, to help manage it for you.

Mapping vulnerabilities to the OWASP Top 10 for LLMs

Once you’ve got your auto red teaming results, TrojAI maps them to the 2025 OWASP Top 10 for LLMs. This is important because at the end of the day, security isn't just about identifying issues, it’s about knowing which issues matter the most and how they fit into industry-standard risks.

The OWASP Top 10 for LLMs lays out the biggest security threats facing large language models. It includes things like prompt injection attacks, training data poisoning, and sensitive information disclosure. If you're conducting penetration tests but not mapping your findings to these well-documented risks, you may have gaps in your security coverage—leaving your system exposed to liability. You might be identifying occasional issues, but are they the vulnerabilities that could critically compromise your system? That’s the question you need to answer with confidence, and you need to be able to systematically answer it again and again as your data and application concept drifts over time.

By mapping your red teaming results to OWASP, you get a clear risk profile, prioritization, and actionable next steps. Once you know where your vulnerabilities sit within the OWASP framework, you can align your mitigation strategies.

At the end of the day, auto red teaming is only as valuable as the insights you pull from it.

How TrojAI can help

TrojAI Detect gives you cutting-edge security for your LLM and GenAI models that automates previously manual processes.

Our mission at TrojAI is to enable the secure rollout of AI in the enterprise. We are a comprehensive AI security platform that protects the behavior of AI/ML and GenAI models and applications. Our best-in-class platform empowers enterprises to safeguard AI applications and models both at build time and run time. TrojAI Detect automatically red teams AI models, safeguarding model behavior and delivering remediation guidance at build time. TrojAI Defend is a firewall for AI that protects enterprises from real-time threats at run time.

By assessing the risk of AI model behaviors during the model development lifecycle and protecting model behavior at run time, we deliver comprehensive security for your AI models and applications.

Want to learn more about how TrojAI secures the largest enterprises globally with a highly scalable, high-performance, and extensible solution?

Visit us at troj.ai now.

LLM Red Teaming with TrojAI Detect

AI security and auto red teaming

Dataset registry

Model registry

Auto red teaming AI models

Auto red teaming results

Mapping vulnerabilities to the OWASP Top 10 for LLMs

How TrojAI can help

The latest from our blog

AI Model Scanning vs. AI Red Teaming

What Is a Data Extraction Attack?

AI Red Teaming: Insights from the Front Lines of GenAI Security

Secure your AI future today.