Agentic RAG built on Claude Haiku and Sonnet with MCP and AWS Bedrock


To help internal marketing specialists navigate large volumes of information with ease and clarity, Vention has developed an internal agentic RAG system powered by Anthropic AI models and hosted on AWS.

Background

This is an internal Vention project focused on streamlining data retrieval across multiple sources, including Salesforce, internal storage, and marketing systems.

Vention’s marketing team works with large volumes of distributed information, which makes it difficult to access the right data quickly and consistently. To address this, Vention’s AI team designed an agentic RAG system built on Claude.

The platform is intended to help marketers navigate information faster and with greater accuracy, enabling quicker responses to new requests for marketing materials.

Project description

Vention’s marketing team works with large volumes of data from multiple sources:

Bragging lists that highlight the company’s key wins
Proprietary research on client personas and markets
Guidelines for specific tasks and marketing functions
External and internal events
Existing marketing materials

When a request for new marketing materials comes in, the main challenge lies in retrieving the latest and factually accurate data. Information is spread across multiple boards and owned by different teams and contributors, which makes it harder to access the right inputs quickly.


Our solution

Vention’s AI team worked closely with the Marketing department to design a RAG system that would meet the marketing team’s evolving needs over the long term.

The solution included the following modules:

A client web app that marketers used to submit requests and retrieve model outputs
An orchestration layer built on AWS Bedrock Agents, routing requests between Claude Haiku and Sonnet and coordinating tool use
An AWS-hosted data layer, including a data lake, vector database, and MCP server for accurate data retrieval
A two-model (Haiku and Sonnet) RAG core that supported efficient and reliable information retrieval
AWS Lambda functions that handled automations and routed requests across the system

Web app

The web app served as the primary touchpoint between the system and marketing specialists. The interface was designed as a chat-based workspace where users interacted with Claude Haiku using natural language, reviewed their conversation history, and attached files when needed.

Request processing via Haiku

As the most lightweight model in Claude’s lineup, Haiku was used to handle request decomposition and routing across the system.

When a user submitted an input, Haiku broke it down into more specific terms to form a clearer prompt, identified what information was needed to fulfill the request, and retrieved relevant data from storage.

Then, depending on the scenario, one of two paths followed:

If the request focused on fact-checking (for example, “Do we have a case study with company X?”), Haiku returned a direct response.
If the request required a creative output (for example, “Help me build a factual base for a Y deck”), Haiku gathered the relevant information from internal storage, tracked the request, and passed it to Sonnet for generation along with the user’s input and context. After Sonnet generated the output, Haiku returned it to the user.
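The two paths above can be sketched as a small router. This is a minimal illustration, not Vention’s implementation: the intent labels, the `Reply` type, and the four callables (`classify`, `retrieve`, `answer_directly`, `generate`) are hypothetical stand-ins for the Haiku and Sonnet calls, which in the real system run through the Bedrock orchestration layer.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical intent labels; the case study describes only the two paths.
FACT_CHECK = "fact_check"
CREATIVE = "creative"


@dataclass
class Reply:
    model: str  # which model produced the final text
    text: str


def route_request(
    user_input: str,
    classify: Callable[[str], str],
    retrieve: Callable[[str], list[str]],
    answer_directly: Callable[[str, list[str]], str],
    generate: Callable[[str, list[str]], str],
) -> Reply:
    """Send a request down the fact-checking or the creative path."""
    # The lightweight model gathers context for both paths.
    context = retrieve(user_input)
    if classify(user_input) == FACT_CHECK:
        # Fact-checking: the lightweight model answers on its own.
        return Reply("haiku", answer_directly(user_input, context))
    # Creative output: hand the input and context to the larger model.
    return Reply("sonnet", generate(user_input, context))


# Stub callables so the sketch runs without any model access.
def demo_classify(text: str) -> str:
    return FACT_CHECK if text.lower().startswith("do we have") else CREATIVE


def demo_retrieve(text: str) -> list[str]:
    return ["Case study: company X (assumed sample record)"]


def demo_answer(text: str, context: list[str]) -> str:
    return f"Found {len(context)} matching record(s)."


def demo_generate(text: str, context: list[str]) -> str:
    return "Draft factual base built from: " + "; ".join(context)


fact_reply = route_request(
    "Do we have a case study with company X?",
    demo_classify, demo_retrieve, demo_answer, demo_generate,
)
creative_reply = route_request(
    "Help me build a factual base for a Y deck",
    demo_classify, demo_retrieve, demo_answer, demo_generate,
)
```

Passing the model calls in as plain functions keeps the routing rule testable on its own, which is one reasonable way to validate the split before wiring in real model endpoints.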

Internal storage, MCP, caching, and API integrations

To improve awareness and efficiency, the system was designed to integrate with the marketing team’s internal Monday boards and to cache frequent requests (such as bragging lists or recent case studies) for faster retrieval.

A local SQLite database was used to cache frequent queries and return responses directly from cache where available, which reduced API calls and improved response times.
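A cache of this kind could look roughly like the sketch below, which uses Python’s standard `sqlite3` module. The table layout, the query normalization, the `ttl_seconds` expiry, and the names `QueryCache` and `cached_answer` are all assumptions made for illustration; the case study does not describe the actual schema or eviction policy.

```python
import hashlib
import sqlite3
import time
from typing import Callable, Optional


class QueryCache:
    """Cache responses to frequent queries in a local SQLite database."""

    def __init__(self, path: str = ":memory:", ttl_seconds: float = 3600.0):
        # ttl_seconds is an assumed expiry window, not a documented setting.
        self.conn = sqlite3.connect(path)
        self.ttl = ttl_seconds
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "key TEXT PRIMARY KEY, "
            "response TEXT NOT NULL, "
            "created_at REAL NOT NULL)"
        )

    @staticmethod
    def _key(query: str) -> str:
        # Normalize case and whitespace so near-identical queries share a key.
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, query: str) -> Optional[str]:
        row = self.conn.execute(
            "SELECT response, created_at FROM cache WHERE key = ?",
            (self._key(query),),
        ).fetchone()
        if row is None or time.time() - row[1] > self.ttl:
            return None  # cache miss or expired entry
        return row[0]

    def put(self, query: str, response: str) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO cache (key, response, created_at) "
            "VALUES (?, ?, ?)",
            (self._key(query), response, time.time()),
        )
        self.conn.commit()


def cached_answer(cache: QueryCache, query: str, fetch: Callable[[str], str]) -> str:
    """Return a cached response if present; otherwise call fetch and store it."""
    hit = cache.get(query)
    if hit is not None:
        return hit
    response = fetch(query)
    cache.put(query, response)
    return response
```

Here `fetch` stands for whatever produces a fresh response (a model or API call); only cache misses reach it, which is where the saving in API calls comes from.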

Through integration with Monday via an MCP gateway, the agentic RAG system added another layer of context to the Marketing department’s workflows. For example, when a user requested information on security services for a deck, the agent could surface not only the latest materials but also related content in progress that was likely to be completed soon. This helped teams make more informed decisions about what to include.
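To make the gateway idea concrete, the sketch below builds the kind of message an MCP client sends to invoke a tool. MCP runs over JSON-RPC 2.0 and uses the `tools/call` method for tool invocation; everything else here is assumed for illustration, including the tool name `search_board_items`, its `topic` argument, and the `name`/`status` fields of the returned items.

```python
import json
from typing import Any


def build_tool_call(request_id: int, tool_name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": tool_name, "arguments": arguments},
        }
    )


def split_by_status(items: list[dict[str, str]]) -> tuple[list[str], list[str]]:
    """Separate finished materials from work in progress (assumed item shape)."""
    ready = [item["name"] for item in items if item["status"] == "done"]
    in_progress = [item["name"] for item in items if item["status"] != "done"]
    return ready, in_progress


# Hypothetical tool name and arguments for a Monday board search.
request = build_tool_call(1, "search_board_items", {"topic": "security services"})

# Sample items standing in for what such a tool might return.
sample_items = [
    {"name": "Security services one-pager", "status": "done"},
    {"name": "Security audit case study", "status": "in_progress"},
]
ready, in_progress = split_by_status(sample_items)
```

Splitting results by status mirrors the example above: finished materials can go straight into a deck, while in-progress items are flagged as likely additions.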

Key stats

Weeks to develop a PoC

15% rework reduction

40% less time spent on fetching internal information

Results

The system reduced the time spent retrieving information from internal sources by 40%, which allowed marketing specialists to focus more on creative tasks.

Improved access to information increased situational awareness and reduced the time spent adjusting content in marketing materials by about 15%. Greater efficiency carried over into day-to-day workflows, where relevant information became easier to find and use across tasks.

Following these results, the team moved into PoC development and testing, with the next steps focused on validating the Monday integration and preparing for the MVP phase and a broader rollout.

Tech stack

Frontend

Next.js

Cloud

AWS

AWS Lambda

Bedrock Agents

AI

Claude Haiku

Claude Sonnet via AWS Bedrock

Agent architecture

MCP Server

Storage and databases

Amazon S3

OpenSearch Serverless

Integrations

Monday.com

API

MCP

Auth/security

Amazon Cognito

AWS Secrets Manager

FAQs

What is an agentic RAG system?

An agentic RAG system combines retrieval-augmented generation with an orchestration layer that can break down user requests, fetch relevant data, and decide how to process it. Instead of relying on a single prompt-response flow, it routes tasks across models, tools, and data sources to produce more accurate and context-aware outputs.

Why use two Claude models instead of one?

Vention’s system uses Claude Haiku and Sonnet for distinct roles. Haiku handles request decomposition, routing, and data retrieval, where speed and cost efficiency are critical. Sonnet supports more complex reasoning and content generation. The split keeps the system responsive while maintaining high output quality.

How does MCP work in this architecture?

MCP (Model Context Protocol) is an open protocol that standardizes how a model-driven application connects to external systems. It defines how the agent discovers and calls tools such as APIs, databases, and internal platforms like Monday. In Vention’s setup, an MCP server acts as the gateway, enabling structured data retrieval and consistent interaction between the RAG system and connected services.