Understanding DSPM for AI: The Future of Data Security

Recent developments in AI data security announced at the RSA Conference, with a focus on Commvault, emphasise the importance of integrating Data Security Posture Management (DSPM) into AI systems. The "last mile" of governance, where sensitive data may be mishandled, is a gap that organisations must address to ensure secure AI operations.

Hi everyone, Jen here. Happy Saturday. If you are catching this as part of your weekend reading, I hope you have a coffee in hand. This week, the conversation in the AI world changed significantly during the RSA Conference (RSAC). The industry is moving toward a future where security is "baked into" the data pipeline rather than bolted on, and it is good news to see Commvault addressing these issues. We’ve spent the last eighteen months obsessed with building models, but forward-looking thinkers in the AI industry realise that we have a massive "last mile" problem when it comes to securing the data that fuels them.

The days of 'just restore the latest backup' are over, and businesses expect more from security. It is one thing to have a data strategy on a PowerPoint slide or a flipchart. It is quite another to ensure that a Large Language Model (LLM) doesn’t accidentally serve up private salary details to a junior intern because of a poorly configured vector database. Let's look at how Data Security Posture Management (DSPM) is coming into play to close that gap.

What is DSPM for AI?

DSPM for AI is a security framework that provides real-time visibility and control over sensitive data as it interacts with AI systems. Unlike traditional data security, which focuses on data at rest in databases, DSPM for AI monitors data "in flight". It monitors how the data is used for model training, how it is retrieved during inference, and how it is transformed into AI-generated outputs.

At its core, DSPM for AI answers three questions:

  1. Where is my sensitive data located within the AI pipeline?
  2. Who (or what model) has access to it?
  3. What happens to that data after the AI processes it?
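To make the three questions concrete, here is a minimal, purely illustrative sketch of a DSPM-style inventory. All class and field names are my own assumptions for the example, not any vendor's API; the single regex classifier stands in for a real discovery engine.

```python
import re

# Hypothetical sketch: track the three DSPM questions per record.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # US Social Security numbers

def classify(text: str) -> list[str]:
    """Question 1: flag where sensitive data appears."""
    return ["ssn"] if SSN_PATTERN.search(text) else []

class DataInventory:
    def __init__(self):
        # record_id -> {"labels", "accessors", "outputs"}
        self.records = {}

    def ingest(self, record_id: str, text: str):
        self.records[record_id] = {"labels": classify(text),
                                   "accessors": set(), "outputs": []}

    def grant(self, record_id: str, principal: str):
        """Question 2: who (or what model) can see this record."""
        self.records[record_id]["accessors"].add(principal)

    def log_output(self, record_id: str, destination: str):
        """Question 3: where the data went after AI processing."""
        self.records[record_id]["outputs"].append(destination)

inv = DataInventory()
inv.ingest("hr-001", "Salary review for SSN 123-45-6789")
inv.grant("hr-001", "rag-model-prod")
inv.log_output("hr-001", "summary-cache")
print(inv.records["hr-001"]["labels"])  # ['ssn']
```

Even at this toy scale, the point is that all three answers live in one place: the same record carries its classification, its accessors, and its downstream destinations.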

Interconnected data nodes over a desk representing a secure AI data governance network.

What is the 'Last Mile' of AI Data Governance?

The 'last mile' of AI data governance is the gap between existing data security policies and the actual, real-time usage of data by AI models. Organisations often have governance in place for their "Mile 1" (the source databases). However, they lose control the moment that data is ingested into an AI environment, converted into vectors, or summarised.

In the traditional world, we secured the perimeter. In the AI world, the perimeter is porous. Data moves from structured SQL databases into unstructured vector stores like Pinecone or Milvus. Once it is there, traditional Access Control Lists (ACLs) often disappear. This creates a "governance gap" where sensitive information becomes accessible to anyone with the right prompt.
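One way to close that gap is to carry the source ACL into the vector store as metadata and enforce it at retrieval time. The sketch below uses an in-memory store and cosine scoring as a stand-in for a real vector database (most of which expose some form of metadata filtering); the class and field names are illustrative assumptions, not a product API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class GovernedVectorStore:
    def __init__(self):
        self.items = []  # (embedding, text, allowed_roles)

    def upsert(self, embedding, text, allowed_roles):
        # The ACL travels with the vector instead of staying behind in SQL.
        self.items.append((embedding, text, set(allowed_roles)))

    def query(self, embedding, caller_roles, top_k=3):
        # Enforce the source ACL at retrieval time, not just at the source DB.
        visible = [(cosine(embedding, e), t) for e, t, roles in self.items
                   if roles & set(caller_roles)]
        return [t for _, t in sorted(visible, reverse=True)[:top_k]]

store = GovernedVectorStore()
store.upsert([1.0, 0.0], "Q3 salary bands (HR only)", {"hr"})
store.upsert([0.9, 0.1], "Public holiday calendar", {"hr", "intern"})
print(store.query([1.0, 0.0], caller_roles={"intern"}))
# ['Public holiday calendar']
```

The intern's prompt never gets the salary document into its context, no matter how well-crafted the query, because the filter runs before similarity ranking is returned.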

Without solving this last mile, your AI projects are essentially operating in a "shadow" state, regardless of how well-governed your underlying data warehouses might be. You can read more about the risks of unmanaged AI in my article on Shadow AI: What Every Enterprise Needs to Know (Statistics, Risks & Solutions).

Understanding the 'Two Miles' of the AI Data Journey

To build a resilient AI strategy, we have to look at data in two distinct phases. I call these the "Two Miles" of the AI data journey.

Mile 1: Data AI Uses (The Input)

This involves the data used for training, fine-tuning, or Retrieval-Augmented Generation (RAG). The challenge here is Data Discovery. AI leads need to know if Personally Identifiable Information (PII) or Intellectual Property (IP) has migrated from a secure silo into a model training set. If you don't get this right, you are essentially operating on a principle of "garbage in, gospel out." For a deeper look at why this due diligence matters, see our recent post on The Quiet Revival of Data Due Diligence.

Mile 2: Data AI Produces (The Output)

This is where most organisations fail. When an AI summarises a confidential legal document, that summary itself is new data. If that summary contains leaked PII or trade secrets and is stored in an insecure cache, you have a fresh data breach on your hands. Mile 2 is about Downstream Control: ensuring that the outputs of AI are governed with the same rigour as the inputs.
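A simple way to reason about Mile 2 is lineage: a derived artifact inherits the strictest classification of its sources, and a cache refuses anything above its clearance. The sketch below is a hedged illustration of that idea; the level names and function shapes are my own assumptions.

```python
# Illustrative sensitivity lattice for derived AI outputs.
LEVELS = {"public": 0, "internal": 1, "confidential": 2}

def classify_output(source_labels):
    """An AI output is as sensitive as its most sensitive input."""
    return max(source_labels, key=lambda label: LEVELS[label])

def store_summary(summary, source_labels, cache, allowed_level="internal"):
    """Gate a summary before it enters a cache with a fixed clearance."""
    label = classify_output(source_labels)
    if LEVELS[label] > LEVELS[allowed_level]:
        raise PermissionError(f"{label} output cannot enter an {allowed_level} cache")
    cache[summary[:32]] = (summary, label)

cache = {}
store_summary("Holiday schedule overview", ["public", "internal"], cache)
try:
    store_summary("Summary of legal dispute", ["internal", "confidential"], cache)
except PermissionError as exc:
    print(exc)  # confidential output cannot enter an internal cache
```

The key design choice is that the label is computed from the inputs, not from scanning the summary text: even a perfectly anonymised-looking summary of a confidential document stays confidential by default.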

How Commvault and Satori are Addressing AI Data Risk

One of the most significant announcements at RSAC came from Commvault. They are doubling down on the "Last Mile" by integrating DSPM directly into their cyber resilience platform. A key part of this strategy is their acquisition of Satori, a leader in data access control.

Why does this matter for AI Leads?
Most DSPM tools are good at telling you that you have a problem; the Commvault solution stands out because it provides active enforcement. It can redact sensitive information in real time, before the model ever sees it, by placing a "governance layer" between the data and the AI model.

Commvault’s new capabilities include:

  • Vector Database Coverage: Extending discovery and classification to vector stores, which are the primary storage for RAG-based AI applications.
  • Structured Data Discovery: Identifying sensitive entities across complex, hybrid cloud environments.
  • Real-time Access Management: Automatically applying "least-privilege" access so that AI developers only see the data they absolutely need for model performance.

This move signals a shift from "data backup" to "data intelligence." It is no longer enough simply to recover data. Under today's business and compliance pressures, the organisation must also prove that the data it is recovering is secure and compliant.

Digital gateway illustrating real-time data redaction and automated AI security policy enforcement.

A Checklist for AI Leads

If you are leading an AI initiative, you cannot wait for a breach to think about DSPM. Use this checklist during your next strategy session to evaluate your "Last Mile" readiness:

  1. Inventory your Vector Stores: Do you have a list of every Pinecone, Milvus, or Weaviate instance in your organisation?
  2. Map the Data Flow: Can you trace a piece of sensitive data from your SQL server through to the LLM prompt?
  3. Check for "Entitlement Creep": Are your data scientists using administrative credentials to access production data for "testing"?
  4. Audit AI Outputs: Do you have a mechanism to scan AI-generated summaries for PII leakage?
  5. Review the 'Last Mile' Controls: Are you relying on the AI model's "safety filters" (which can be bypassed) or are you enforcing security at the data layer?
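Item 3, "entitlement creep", is the easiest of these to automate. A hedged sketch: compare what each principal was granted against what they actually accessed over an audit window, and flag unused grants for review. The data shapes and names here are illustrative assumptions, not any product's audit format.

```python
# Hypothetical audit snapshots: grants vs. actual access over a window.
granted = {
    "data-scientist-1": {"sales_db", "hr_db", "prod_admin"},
    "rag-service": {"docs_store"},
}
accessed = {
    "data-scientist-1": {"sales_db"},
    "rag-service": {"docs_store"},
}

def unused_grants(granted, accessed):
    """Return, per principal, grants that were never exercised."""
    return {principal: sorted(grants - accessed.get(principal, set()))
            for principal, grants in granted.items()
            if grants - accessed.get(principal, set())}

print(unused_grants(granted, accessed))
# {'data-scientist-1': ['hr_db', 'prod_admin']}
```

Unused grants are exactly the surplus access a least-privilege policy should revoke before an AI pipeline, or an attacker, exercises them.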

For those just starting their journey, I recommend checking out Commvault's community resources to ensure your basics are covered before moving into advanced DSPM.

Implementation Timelines: What is Available Now?

It is important to be realistic about the technology, as the rate of change is phenomenal.

  • Available Now: Discovery and classification for standard cloud databases and initial support for major vector databases. Basic real-time redaction through tools like Satori.
  • Planned/Upcoming: Deep integration between AI "activity logs" and security alerts, and more granular "intent-based" access controls that can detect if a user is trying to "jailbreak" a model to get to sensitive data.

Security is heading toward being "baked into" the data pipeline rather than bolted on. If you need help navigating these choices, read more about my AI Vision & Strategy Workshops, which help you map out a roadmap that fits your specific risk profile.

Modern office sky-bridge representing a strategic roadmap for AI data governance and resilience.

Frequently Asked Questions about DSPM for AI

How is DSPM for AI different from traditional DSPM?

Traditional DSPM focuses on data at rest in known databases. DSPM for AI extends this to include data in motion within AI pipelines, specifically targeting vector databases, model training sets, and AI-generated outputs. It addresses the unique ways AI models "consume" and "reproduce" data.

Why are vector databases a security risk?

Vector databases store data as mathematical representations (embeddings). Traditional security tools often cannot "read" these embeddings, making it difficult to identify if they contain sensitive information. Without specific DSPM for AI, these databases become a black hole for governance.
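A toy demonstration of why this matters: a regex-based DLP scan finds the sensitive string in raw text but sees nothing in its vector form. The "embedding" here (character-code averages per chunk) is purely illustrative, far simpler than any real model's embeddings, but the opacity it shows is the same.

```python
import re

raw = "Employee SSN 123-45-6789"
# Toy stand-in for an embedding: average character codes per 8-char chunk.
embedding = [sum(map(ord, raw[i:i + 8])) / 8 for i in range(0, len(raw), 8)]

ssn = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
print(bool(ssn.search(raw)))             # True: pattern scan works on raw text
print(bool(ssn.search(str(embedding))))  # False: the vector reveals nothing
```

The information is still recoverable by the model that consumes the vectors, which is exactly why the embeddings need governance of their own.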

Can DSPM for AI prevent prompt injection?

While DSPM isn't a firewall for prompts, it acts as a critical backstop. If a prompt injection attack successfully tricks a model, DSPM ensures the model doesn't have access to retrieve the sensitive data the attacker is looking for, limiting the "blast radius" of an attack.

Does this replace my existing Data Governance framework?

No. It operationalises it. Your governance framework defines the policies (e.g., "Interns cannot see HR data"). DSPM for AI is the technology that enforces that policy in the complex, high-speed environment of an AI application.

Final Thoughts

Solving the last mile of AI governance is a business necessity. As we move from experimental AI to production-grade intelligence, the organisations that win will be the ones that can trust their data.

If you're feeling overwhelmed by the technical debt of "Shadow AI" or unsure how to secure your vector stores, don't ignore it. Start with visibility, and move toward enforcement.

Stay strategic, and Happy Saturday!

Jen Stirrup
Founder, Jen Stirrup Consulting


For more insights on AI strategy and data leadership, visit our homepage or explore our Data Analyst's Toolkit.
