Alexey Piskovatskov 2/12/26 Alexey Piskovatskov 2/12/26

Security Risks to Watch When Implementing RAG AI — What Modern Teams Need to Know - Part One

Retrieval-Augmented Generation (RAG) has rapidly become a cornerstone in modern AI applications. By combining large language models (LLMs) with external knowledge sources — such as document stores, databases, APIs, or enterprise systems — RAG enables more accurate and contextually grounded responses. But this power comes with a unique set of security risks that can undermine trust, compliance, and safety if not properly understood and mitigated. As organizations increasingly adopt RAG for customer service, knowledge work, and enterprise insights, it’s essential to evaluate not only what RAG enables, but the security surface it creates.

1. Data Leakage and Unintended Exposure

One of the most significant risks in RAG systems stems from how contextual data is retrieved and fed to the LLM. Because RAG typically pulls text from internal documents, logs, or databases to augment model outputs:

Sensitive or classification-restricted information can be inadvertently surfaced in responses.
If retrieval filters or access controls are insufficient, unauthorized users may receive information they should not see.
Misconfigured embeddings or vector stores can lead to cross-tenant data bleed, especially in multi-tenant SaaS environments.

Example: A customer support RAG system pulling knowledge from internal HR or legal documents might inadvertently return confidential policy details to an external user if access controls aren’t enforced at the retrieval layer.

2. Unvalidated and Malicious Source Content

RAG systems rely on the assumption that the knowledge base is reliable and trustworthy. However, many real-world datasets are messy, incomplete, or contain adversarial content:

Embedding models may not effectively distinguish trustworthy sources from harmful or manipulated content.
Attackers could poison the knowledge store — e.g., by injecting false product specifications or malicious instructions upstream of the RAG pipeline.
Without validation and provenance tracking, compromised data can propagate into hallucinations or unsafe suggestions.

Trend Alert: As RAG becomes operationalized at scale, threat actors are exploring ways to manipulate vector stores through crafted inputs that disrupt retrieval quality or mislead model reasoning.

3. Lack of Access Control on Vector Stores

Vector databases like ChromaDB, FAISS, or Milvus often don’t include robust, built-in access control layers. This means:

Data retrieved by RAG can bypass traditional authorization checks unless explicitly enforced.
Developers must externally implement RBAC, attribute-based access controls, and request filtering to prevent oversharing.
Fine-grained controls (e.g., “only finance docs for finance users”) are rarely native to vector stores and must be engineered.

Risk: Exposed retrieval interfaces can unintentionally return confidential or regulated data — for example, customer identifiers or trade secrets — if the access model isn’t carefully designed.

4. Model Prompt Injection and Context Manipulation

RAG systems expand the attack surface by incorporating external data into prompts. This opens the door to:

Prompt injection attacks, where an adversary crafts input that alters the model’s reasoning path.
Context poisoning, where malicious content in the knowledge base steers outputs toward erroneous or harmful responses.

Even if the LLM itself is secure and private, the input context provided through retrieval can dictate behavior, making retrieval hygiene and sanitization just as critical as model security.

5. Compliance and Data Governance Challenges

RAG systems often consolidate information from multiple repositories:

This can complicate regulatory compliance (e.g., GDPR, HIPAA, PCI DSS), because data flows across systems without clear audit trails.
In regulated industries, sensitive data might inadvertently enter a training or context pipeline, violating usage policies if proper segmentation is not enforced.
Without clear versioning and data lineage for retrieved content, it’s difficult to demonstrate compliance during audits.

Important: Simply storing data securely is not enough — organizations must also govern how that data is used in retrieval and generation.

6. Insufficient Logging, Monitoring, and Alerting

Traditional security systems focus on network and application logs, but RAG introduces:

New retrieval logs
AI decision paths
External tool invocation logs (e.g., knowledge store queries, API calls)

Without structured observability across these components, it’s difficult to detect anomalous or malicious use patterns. A missing log here could mean an exploitation attempt goes unnoticed.

Alexey Piskovatskov 2/4/26 Alexey Piskovatskov 2/4/26

From Context to Control: How MCP and Contextual Engineering Align with NIST CSF

As enterprises move from experimentation to production-grade AI systems, security frameworks are no longer optional guardrails — they are foundational architecture. In a previous post, we explored how Model Context Protocol (MCP) and contextual engineering enable reliable, scalable AI by structuring how models receive, interpret, and act on information. In this continuation, we examine how those same mechanisms map naturally to the NIST Cybersecurity Framework (CSF) and why that alignment matters for organizations deploying AI in regulated, high-risk environments.

At a high level, NIST CSF provides a shared language for managing cybersecurity risk across five core functions: Identify, Protect, Detect, Respond, and Recover. MCP and contextual engineering do not replace this framework — they operationalize it for AI systems. Together, they create enforceable boundaries around what AI systems know, what they can do, and how their behavior can be monitored, audited, and corrected over time.

Identify: Defining AI Assets, Boundaries, and Risk

The Identify function focuses on understanding assets, dependencies, and risk exposure. In AI systems, this includes models, prompts, data sources, tools, and decision pathways — many of which are dynamic and opaque without intentional design.

MCP enables explicit declaration of context sources, tool permissions, and execution constraints. Contextual engineering formalizes why specific information is included and when it is appropriate. Together, they transform AI context from an implicit prompt blob into a defined system asset that can be inventoried, classified, and risk-assessed. This directly supports NIST requirements around asset management, governance, and risk understanding — especially critical when AI systems interact with financial, healthcare, or identity data.

Protect: Enforcing Least Privilege at the Context Layer

The Protect function is about safeguards — and for AI systems, the most fragile attack surface is often context itself. Over-broad prompts, unrestricted tool access, and uncontrolled memory introduce silent failure modes and security risk.

Contextual engineering applies least privilege principles to AI inputs, ensuring models only receive the minimum information required for a task. MCP reinforces this by constraining tool invocation, parameter scope, and execution rights at runtime. Rather than relying on policy documents or developer discipline, protection becomes enforceable by system design. This mirrors traditional security controls like IAM and network segmentation, but applied at the AI orchestration layer.

Detect: Observability Into AI Decisions and Behavior

Detection requires visibility — and AI systems are notoriously difficult to observe without structured instrumentation. MCP provides standardized hooks for logging context usage, tool calls, and decision pathways, while contextual engineering defines what signals matter.

This enables organizations to detect anomalies such as unexpected data access, abnormal tool usage, or behavioral drift. From a NIST CSF perspective, this supports continuous monitoring, event analysis, and detection processes that are essential for enterprise environments. Importantly, detection here is not limited to infrastructure-level threats; it extends to semantic and behavioral risks unique to AI systems.

Respond: Containing and Correcting AI Failures

When incidents occur, response speed and clarity matter. Poorly structured AI systems make it difficult to isolate failure causes or apply targeted remediation.

By structuring AI behavior through MCP-defined contracts and context layers, organizations can respond surgically — disabling specific tools, revoking context sources, or tightening execution constraints without shutting down entire systems. Contextual engineering ensures response actions do not introduce new ambiguity or unintended consequences. This maps directly to NIST’s emphasis on coordinated response, mitigation, and communication.

Recover: Learning and Improving After AI Incidents

Recovery is not just about restoration; it’s about improvement. For AI systems, this means refining prompts, adjusting context boundaries, updating safeguards, and strengthening controls based on real-world failures.

Because MCP and contextual engineering make AI behavior explicit and inspectable, post-incident analysis becomes actionable rather than speculative. Organizations can evolve their AI systems in a controlled way — strengthening resilience, updating governance rules, and feeding lessons learned back into system design. This closes the loop envisioned by NIST CSF’s recovery function.

Why This Matters for Enterprise AI

The convergence of MCP, contextual engineering, and NIST CSF represents a shift from AI as experimentation to AI as critical infrastructure. Enterprises do not need new security frameworks for AI — they need AI systems that are compatible with the frameworks they already trust.

By treating context as a governed asset and MCP as an enforcement mechanism, organizations can deploy AI systems that are auditable, defensible, and resilient by design. This alignment is what allows AI to move safely into core business workflows — not despite security requirements, but because of them.

Alexey Piskovatskov 1/28/26 Alexey Piskovatskov 1/28/26

Why MCP and Contextual Engineering Are Both Essential for Enterprise Security AI

As enterprises increasingly adopt AI to support security operations—vulnerability management, compliance reviews, incident triage, and access audits—many teams discover the same uncomfortable truth: AI systems fail not because models are weak, but because context is poorly designed and inconsistently delivered. This is where Model Context Protocol (MCP) and contextual engineering come together. Separately, each solves a different class of problems. Together, they form the foundation for secure, reliable, and auditable AI systems in enterprise environments.

MCP provides the infrastructure layer for context. It standardizes how models access external systems such as ticketing tools, code repositories, vulnerability scanners, and identity platforms. In an enterprise security setting, this means AI agents can retrieve information from Jira, GitHub, IAM systems, or compliance repositories through well-defined, permissioned interfaces rather than raw prompt injection or brittle API wrappers. MCP ensures access is controlled, scoped, and structured—critical requirements when dealing with sensitive security data.

However, access alone does not produce trustworthy results. This is where contextual engineering plays a decisive role. Contextual engineering defines what information the model should see, when it should see it, and how it should be framed. In a security review workflow, for example, the model should not ingest every vulnerability ever recorded. Instead, it should be guided to focus on active, high-severity findings, recent code changes, and relevant compliance controls. Contextual engineering enforces relevance, reduces noise, and prevents overgeneralized or speculative outputs.

Consider an AI-powered security assessment agent reviewing cloud infrastructure readiness against NIST CSF. MCP enables secure, read-only access to cloud configuration data, open Jira issues, recent deployment logs, and compliance documentation. Contextual engineering then constrains the model to evaluate only controls applicable to the organization’s architecture, exclude deprecated resources, and ground every recommendation in retrieved evidence. The result is not a generic security checklist, but a tailored, defensible assessment that aligns with enterprise risk priorities.

One of the most critical benefits of combining MCP with contextual engineering is hallucination prevention. In security contexts, hallucinations are not just inconvenient—they are dangerous. MCP ensures the model retrieves real, authoritative data rather than relying on training priors. Contextual engineering ensures the model is required to use that data, cite it, and reason within defined boundaries. This pairing transforms AI from an advisory guesser into a constrained decision-support system.

Together, MCP and contextual engineering also improve governance and auditability. Security teams must be able to explain why a recommendation was made, what data informed it, and who had access to that data. MCP provides traceable, versioned context sources and access logs. Contextual engineering provides structured reasoning paths, explicit assumptions, and documented decision criteria. This alignment is essential for regulated industries such as fintech, healthcare, and government.

As enterprises move toward agentic security workflows—where AI assists with triage, remediation planning, or compliance validation—the need for both MCP and contextual engineering becomes non-negotiable. MCP creates the secure rails; contextual engineering defines the rules of engagement. Without MCP, AI systems become unsafe. Without contextual engineering, they become unreliable. Together, they enable security teams to deploy AI that is not only powerful, but trustworthy by design.

Alexey Piskovatskov 12/8/25 Alexey Piskovatskov 12/8/25

Understanding HNSW in ChromaDB: The Engine Behind High-Performance Vector Search

As Retrieval-Augmented Generation (RAG) becomes a core architectural pattern for modern AI applications, the efficiency of vector search has never been more critical. Developers rely on vector databases not only to store high-dimensional embeddings but also to retrieve relevant information at low latency and high accuracy—especially when working at scale. ChromaDB, one of the most widely used open-source vector databases, achieves this performance through Hierarchical Navigable Small World (HNSW) graphs, a breakthrough data structure for approximate nearest-neighbor (ANN) search.

HNSW is an ANN algorithm that organizes vectors into a multi-layer graph structure. The upper layers form a sparse network that allows long-range “jumps” across embedding space, while the lower layers gradually increase in density and local connectivity. This hierarchical architecture enables the search algorithm to quickly traverse from coarse-grained layers to the fine-grained, densely connected bottom level—ultimately delivering near-logarithmic query complexity. Instead of scanning through millions of embeddings, the system efficiently navigates the graph to locate the nearest neighbors with high recall. This balance of speed and accuracy makes HNSW an ideal fit for latency-sensitive RAG applications, conversational agents, semantic search systems, and any workload relying on rapid vector similarity lookups.

A key aspect of HNSW’s flexibility in ChromaDB lies in the space parameter, which determines the distance function used throughout the index. ChromaDB natively supports several space types, including "cosine" (for directional similarity), "l2" (Euclidean distance), and "ip" (inner product). This choice directly influences retrieval behavior: cosine distance is ideal for normalized embeddings from large language models or sentence transformers; Euclidean distance is a natural fit for geometric embedding spaces; and inner product works well when maximizing alignment between vectors. Because HNSW operates directly within the chosen metric, ChromaDB can adapt to a wide range of embedding models without requiring custom indexing logic or post-processing steps. The result is a vector search engine that aligns closely with the mathematical properties of the underlying embeddings.

Beyond distance metrics, HNSW in ChromaDB exposes configurable parameters such as M (controlling the number of bi-directional links per node), ef_construction (defining graph search depth during index building), and ef (controlling search breadth at query time). These knobs give developers fine-tuned control over the accuracy-performance tradeoff. Higher values increase recall and precision but require more compute resources; lower values favor faster throughput. Because HNSW supports incremental insertion, new vectors can be added without rebuilding the index, making it well suited for dynamic workflows like real-time document ingestion or continuous model updates.

ChromaDB’s integration of HNSW extends beyond raw vector search. It pairs seamlessly with metadata and document-level filters, allowing developers to combine similarity search with structured constraints—such as filtering by category, timestamp, source type, or any custom attributes. In addition, the database’s flexible persistence options and client libraries make it easy to integrate HNSW-powered retrieval inside RAG pipelines, agent architectures, or operational ML workflows. Whether serving as an embedded engine within a Python application or deployed as a distributed service, ChromaDB maintains HNSW’s performance characteristics even as collections scale into millions of entries.

As organizations increasingly leverage RAG to ground LLMs in proprietary knowledge, retrieval speed and accuracy are becoming competitive differentiators. HNSW provides the computational backbone necessary to meet those demands, enabling ChromaDB to deliver fast, flexible, and high-recall vector search at scale. For engineers looking to build high-performance AI systems—from enterprise knowledge assistants to augmented analytics—understanding HNSW is key to unlocking ChromaDB’s full potential.

Alexey Piskovatskov 12/7/25 Alexey Piskovatskov 12/7/25

LangChain vs. LlamaIndex in RAG context

LangChain and LlamaIndex are two of the most widely used frameworks for building Retrieval-Augmented Generation (RAG) systems, but they serve different roles within the pipeline. LangChain provides a comprehensive toolkit for orchestrating the entire RAG workflow—retrieval, prompt construction, tool integrations, agents, and post-processing—while LlamaIndex focuses more deeply on data ingestion, indexing, and retrieval quality. In a typical RAG setup, LangChain functions as the “application orchestrator,” whereas LlamaIndex serves as the “data engine” responsible for building a highly optimized knowledge base.

In the data preparation and indexing stage, LlamaIndex offers advanced features for chunking, metadata extraction, document hierarchies, and hybrid or graph-based index structures. This makes it exceptionally strong when the quality of retrieved information depends on how the knowledge base is constructed. LangChain also supports document loading and embedding, but LlamaIndex is built specifically to give developers fine-grained control over how data is transformed into vector indexes. These indexing-centric capabilities make LlamaIndex especially effective for improving RAG retrieval relevance and precision.

When orchestrating the live retrieval and generation process, LangChain provides greater flexibility and modularity. It excels at building multi-step chains, coordinating multiple retrievers, calling external tools or APIs, routing queries, and composing different prompts. This makes LangChain a strong choice for complex RAG applications that require logic flows, evaluation loops, or agent-style reasoning. While LlamaIndex also supports retrieval pipelines and query engines, its primary focus is ensuring that the data is structured and accessible rather than orchestrating multi-step decision workflows.

Both frameworks integrate seamlessly with modern vector databases, including ChromaDB. ChromaDB is a popular choice for storing embeddings due to its open-source nature, high performance, and flexible metadata filtering. In LangChain, ChromaDB can be plugged in with just a few lines of code as a VectorStore, allowing LangChain chains and agents to retrieve relevant documents efficiently. LlamaIndex also supports ChromaDB as a storage backend, enabling developers to use LlamaIndex’s powerful indexing and query abstractions on top of the same vector database. This means teams can use ChromaDB as a shared, persistent vector layer regardless of whether the orchestration is done through LangChain, LlamaIndex, or a combination of both.

In production RAG deployments, LangChain and LlamaIndex often work side-by-side, and ChromaDB acts as a reliable vector storage layer for both. LlamaIndex can handle the data ingestion, embedding, and index construction, storing vectors inside ChromaDB. LangChain can then use that same ChromaDB instance to retrieve relevant chunks during runtime, build prompts, and drive multi-step reasoning flows. The result is a flexible, scalable, and high-quality RAG system: LlamaIndex optimizes the data and indexing layer, LangChain manages orchestration and logic, and ChromaDB provides a shared high-speed vector store that both can rely on.