2026.06.01 AIガバナンス

Why Enterprise AI Projects Fail Security Review: Five Patterns and the New Regulatory Stakes

Many AI projects look strong in internal demonstrations. The model performs well. User feedback is positive. The business case is clear. Then the project reaches the information security committee, and stops moving.

This is not unusual. The gap between an AI system that works and one that clears enterprise security review is larger than most teams plan for, and it tends to surface at the same point in the project lifecycle.

Three concerns typical of where these reviews stall:

“We can demonstrate accuracy, but we cannot demonstrate auditability.”

“The retrieval layer pulls from sources our access controls were never designed to govern.”

“We have no answer for what happens when the model’s behavior shifts after deployment.”

The regulatory landscape surrounding enterprise AI has evolved significantly.

In Japan, the AI Promotion Act (AI推進法) has been fully effective since September 2025, establishing a framework that emphasizes voluntary cooperation and government-led coordination rather than direct penalties. Alongside it, the AI Guidelines for Business (updated to v1.2 in March 2026) continue to provide practical guidance for AI developers, providers, and users.

In the EU, the AI Act takes a different approach, introducing legally binding obligations for certain categories of AI systems. Rules for high-risk systems apply from August 2026, with potential extraterritorial implications for organizations providing AI systems or AI-enabled services into the EU market.

Security review is no longer only an internal hurdle. It is increasingly shaped by external expectations as well, and several of the patterns discussed below may have regulatory implications in addition to operational and security risks.

This article walks through five patterns that commonly cause enterprise AI projects to fail security review, and the habits that separate the projects which pass.

1. The PoC-to-Production Gap, Reframed01

The difficulty of moving AI from proof-of-concept to production is usually framed in technical terms. Accuracy drops on real data. Latency rises. Costs scale unpredictably. These problems are real, but they are not what kills most projects.

What kills most projects is the second gate. A PoC asks one question: does this work? A security review asks a different one: is this safe to release into production? Most teams build to clear the first question and assume the second will follow. It does not follow.

Two columns: what teams build for vs what reviewers check, separated by 'The Gap'. — Where projects build to the left of the gap, and reviews are scored on the right.

2. Five Patterns of Failure02

2-1. Prompts treated as throwaway strings, not versioned artifacts

In many projects, prompts live inside source files, configuration blobs, or copy-pasted between engineers’ notes. Reviewers ask a simple question. Which prompt produced the output that led to this decision, on which date, by whom? If the answer requires reconstructing git history and chat logs, the project has already failed the review.

Versioning prompts is not glamorous engineering. It is what makes the difference between a system reviewers can sign off on and one they cannot.

2-2. Retrieval pipelines that bypass existing access controls

RAG systems often get built as if access control is a downstream concern. A vector database ingests documents from across the enterprise, and the application queries it freely. The assumption is that filtering happens at the chat interface.

Reviewers correctly identify this as a privilege escalation surface. The model can surface information the requesting user has no clearance to see, with no trace, because the access decision was made at a layer that does not know about user identity.

2-3. No audit trail for AI-influenced decisions

When an AI-assisted workflow contributes to a customer-facing decision such as pricing, approval, or recommendation, auditors expect a reconstructable trail. The input, the retrieved context, the model version, the output, the human review step. Most production AI systems log some of these. Few log all of them. Almost none log them in a form auditors can read without engineering help.

2-4. Prompt injection treated as a research problem

Prompt injection is often discussed in academic terms, as if it were a future concern documented in research papers. In production, it is a present concern. Any system that ingests untrusted text such as emails, documents, or web content, and then passes that text to an LLM with privileges, has an active attack surface. Reviewers know this. Treating injection mitigations as optional is one of the fastest ways to fail review.

2-5. Model and retrieval drift with no monitoring

A system that performs well at launch will not necessarily perform well in three months. The model provider may release an update. The retrieval corpus drifts as documents are added and removed. Without continuous monitoring of output quality, retrieval relevance, and downstream business metrics, teams cannot answer the reviewer’s last question. How would you know if this system started to fail?

These patterns are not just operational failures. Each one now maps onto named obligations under the regulations in force. The next section walks through that mapping.

3. What This Means Under the New Rules03

The EU AI Act, applicable from August 2026 for high-risk systems, names concrete technical obligations in its articles on requirements for high-risk AI. Japan’s AI Guidelines for Business v1.2 frame obligations differently, separated across three roles: AI Developer, AI Provider, and AI User. A team building or deploying AI may fall into more than one of these roles, and the failure patterns above translate into named duties under both frameworks.

Failure pattern	EU AI Act (high-risk systems)	AI Guidelines for Business v1.2
Unversioned prompts	Art. 11 (Technical documentation); Art. 12 (Record-keeping)	Transparency obligations across Developer and Provider roles
Missing audit trails	Art. 12 (Record-keeping); Art. 26 (Deployer log retention, minimum six months)	Accountability obligations across all three roles
Prompt injection treated as research	Art. 15 (Accuracy, robustness, cybersecurity)	Technical robustness obligations on Provider role
Drift without monitoring	Art. 72 (Post-market monitoring)	Ongoing review obligations on Provider and User roles

The mapping is not exhaustive. Several patterns cut across multiple articles or roles. The point is that each of these five failures, until recently treated as engineering oversights or matters for internal review, now has an external counterpart with audit and enforcement attached.

4. What Successful Teams Do Differently04

The projects that clear security review tend to share a small number of habits.

They map AI components to existing enterprise controls early. Identity, access management, logging, change management, and incident response are not built from scratch. They are extended to cover the AI stack.
They design for auditability before optimizing for performance. A slightly slower system that can fully explain its decisions reaches production faster than a fast system that cannot.
They treat the retrieval layer as a privileged subsystem. Access controls are enforced at retrieval time, not at the application boundary.
They run security review continuously, not at the end. Reviewers are involved from architecture design, not handed a finished system to approve.

These habits do not slow projects down. They are, in practice, what lets projects move at all.

5. Summary05

In many enterprise environments, AI projects do not stall on model performance. They stall on unresolved questions of security, governance, and compliance.
The PoC-to-production gap is, in practice, a security and auditability gap.
The five most common failure patterns: unversioned prompts, retrieval bypassing access controls, missing audit trails, unaddressed prompt injection, and unmonitored drift.
Japan’s AI Promotion Act, the updated AI Guidelines for Business, and the EU AI Act now make these failures externally consequential, not just internal.
Successful teams treat security as a design constraint, not a final hurdle.

This is the first article in a series on building AI systems ready for enterprise deployment. Future articles will cover each of the five patterns in more depth, beginning with the retrieval and access control layer.

References

Cabinet Office of Japan, “AI Act, full implementation, toward the next phase” (October 2025)
Ministry of Economy, Trade and Industry & Ministry of Internal Affairs and Communications, AI Guidelines for Business v1.2 (March 2026)
European Union, Regulation (EU) 2024/1689 on Artificial Intelligence (AI Act)
NIST, AI Risk Management Framework (AI RMF 1.0)
OWASP, Top 10 for Large Language Model Applications

2026.06.12 AIセキュリティ