Why Enterprises Need a Better Way to Evaluate AI Security

June 3, 2026 • Bob Walder • Blog

Enterprise adoption of AI has moved well beyond experimentation. Large language models, retrieval-enabled assistants, and increasingly autonomous agents are being connected to internal data, operational workflows, and decision-making processes across the business. That shift creates opportunity, but it also creates a problem: many organizations are deploying AI faster than they are developing the security discipline required to control it.

For security leaders, the challenge is no longer theoretical. It is practical and immediate. Questions about data exposure, prompt manipulation, delegated authority, policy enforcement, resilience under stress, and auditability are now showing up in real buying decisions. Yet the market for AI security products is still immature. Terminology is inconsistent. Product categories blur together. Demonstrations often show only narrow success cases. What buyers need is a more defensible way to decide which controls are meaningful, which are incomplete, and which claims can actually withstand scrutiny.

In March, NSS Labs published a two-part white paper series on enterprise AI security.

The first paper, AI Security Beyond the Model: What Enterprises Need to Care About — and Why, establishes the core argument that securing the model alone is not enough. In real deployments, the primary risk does not sit neatly inside the model. It emerges in the systems surrounding it: the data sources it can access, the tools it can invoke, the permissions it inherits, the policies that govern it, and the visibility the organization has into how it behaves. The paper examines these issues through the lenses of input integrity, output risk, resilience, policy governance, agentic behavior, observability, and governance, risk, and compliance (GRC).

The second paper, Evaluating Enterprise AI Security: Questions Every Buyer Should Be Able to Answer, takes that architectural and governance framing and turns it into an evaluation framework. It is aimed squarely at CISOs, enterprise buyers, GRC leaders, and security architects who need to assess vendors under real-world conditions. The paper lays out the questions buyers should ask, the warning signs they should watch for, and the criteria they should use when comparing AI security controls for production environments.

A central theme across both papers is the role of runtime guardrails. These are the controls that operate outside the model, where enterprise risk actually materializes. They are responsible for enforcing policy, constraining access, mediating tool use, reducing data leakage, and generating the evidence that security and governance teams will need when something goes wrong. However sophisticated a model may be, those external controls are what determine whether AI can be operated safely at enterprise scale.

Another important point in the series is that AI security cannot be separated from governance. Enterprises do not just need controls that block obvious abuse. They need controls that can be explained, tested, monitored, tuned, and audited. They need to know what decisions were made, which policy triggered them, what data was accessed, which tools were invoked, and whether authority was properly constrained at every step. In short, they need an approach to AI security that is operationally credible, not merely technically impressive.

This work also reflects a broader need in the market. AI security is moving quickly, but rigorous evaluation standards are not keeping pace. That gap makes it harder for buyers to distinguish between sound engineering and attractive narratives. It also makes it harder for vendors with genuinely strong controls to prove their value in a disciplined way. Establishing clearer expectations for testing, transparency, and validation benefits both sides of the market.

NSS Labs believes that enterprise AI security needs the same kind of disciplined, evidence-driven thinking that has long been applied to other areas of cybersecurity. That means moving beyond architecture diagrams and product claims toward repeatable evaluation, realistic testing, and accountable governance. These two papers are intended to help establish that foundation.

This series is not the end of the work. It is the start of a larger effort to define how AI security controls should be evaluated and, ultimately, how they should be tested under independent conditions. As AI systems become more embedded in enterprise operations, the organizations that succeed will not be the ones that move fastest without discipline. They will be the ones that pair innovation with control, and ambition with evidence.

NSS Labs has already begun testing AI Protection Systems (AIPS). If your organization is evaluating AI security products, these papers are intended to provide a clearer starting point: what matters, what to ask, and what good should look like.

If you would like to see an AIPS product tested, now is the time to reach out to us.

Read the white papers: