Retrieval in Kastrax RAG Systems ✅
Retrieval is the critical middle step in any RAG (Retrieval-Augmented Generation) system, responsible for finding the most relevant information to answer user queries. Kastrax provides a comprehensive retrieval system with multiple strategies to ensure high-quality, relevant results.
Retrieval Architecture ✅
Kastrax’s retrieval system is built on a flexible, multi-layered architecture that enables sophisticated information retrieval:
- Query Processing: Transforms user queries into vector embeddings and structured search parameters
- Vector Search: Performs similarity search against embedded documents in vector databases
- Metadata Filtering: Applies structured filters to narrow down results based on document attributes
- Reranking: Refines initial results using more sophisticated relevance algorithms
- Result Processing: Formats and prepares retrieved information for generation
This architecture allows for a range of retrieval strategies, from simple vector similarity search to complex hybrid approaches combining multiple retrieval methods.
How Retrieval Works ✅
The retrieval process in Kastrax follows these key steps:
- Query Embedding: The user’s query is converted to a vector embedding using the same model used for document embeddings
- Vector Similarity: This embedding is compared to stored document embeddings using vector similarity metrics (cosine, dot product, or Euclidean distance)
- Result Ranking: The most similar chunks are retrieved and ranked by similarity score
- Optional Processing: Retrieved chunks can be further processed through:
- Metadata filtering to narrow results based on document attributes
- Reranking to improve relevance using more sophisticated algorithms
- Hybrid retrieval combining vector search with keyword or graph-based approaches
Basic Retrieval ✅
The simplest approach is direct semantic search, which uses vector similarity to find chunks that are semantically similar to the query:
import ai.kastrax.rag.embedding.OpenAIEmbedder
import ai.kastrax.rag.vectordb.PgVectorAdapter
import ai.kastrax.rag.retrieval.RetrievalOptions
// Create an embedder for query embedding
val embedder = OpenAIEmbedder(
options = EmbeddingOptions(
modelName = "text-embedding-3-small",
apiKey = System.getenv("OPENAI_API_KEY")
)
)
// Generate embedding for the query
val query = "What are the main points in the article?"
val queryEmbedding = embedder.embedText(query)
// Create a vector database adapter
val vectorDB = PgVectorAdapter(
connectionString = System.getenv("POSTGRES_CONNECTION_STRING")
)
// Perform semantic search
val results = vectorDB.search(
indexName = "document_embeddings",
embedding = queryEmbedding,
limit = 10,
minScore = 0.7 // Only return results with similarity score >= 0.7
)
// Process the results
results.forEach { result ->
println("Score: ${result.score}")
println("Content: ${result.chunk.content.take(100)}...")
println("Metadata: ${result.chunk.metadata}")
println()
}
Results include both the document content and a similarity score:
// Example search result
SearchResult(
chunk = EmbeddedChunk(
content = "Climate change poses significant challenges...",
embedding = floatArrayOf(...), // Vector embedding
metadata = mapOf(
"source" to "article1.txt",
"page" to 1,
"category" to "environment"
)
),
score = 0.89 // Similarity score (0.0-1.0)
)
Using the Retrieval Engine
Kastrax provides a high-level RetrievalEngine
that simplifies the retrieval process:
import ai.kastrax.rag.retrieval.RetrievalEngine
import ai.kastrax.rag.retrieval.RetrievalOptions
// Create a retrieval engine with the vector database
val retrievalEngine = RetrievalEngine(
vectorDB = vectorDB,
embedder = embedder
)
// Retrieve relevant chunks for a query
val retrievedChunks = retrievalEngine.retrieve(
query = "What are the main points in the article?",
options = RetrievalOptions(
indexName = "document_embeddings",
limit = 5,
minScore = 0.7
)
)
// Process the retrieved chunks
retrievedChunks.forEach { result ->
println("Score: ${result.score}")
println("Content: ${result.chunk.content}")
}
The RetrievalEngine
handles query embedding and search in a single operation, making it easier to implement retrieval in your applications.
Advanced Retrieval Strategies ✅
Metadata Filtering
Metadata filtering allows you to narrow down search results based on document attributes. This is particularly useful when you have documents from different sources, time periods, or with specific characteristics. Kastrax provides a unified MongoDB-style query syntax that works across all supported vector databases.
Here are examples of different filtering approaches:
import ai.kastrax.rag.retrieval.RetrievalEngine
import ai.kastrax.rag.retrieval.RetrievalOptions
// Create a retrieval engine
val retrievalEngine = RetrievalEngine(vectorDB, embedder)
// Simple equality filter
val results1 = retrievalEngine.retrieve(
query = "What are the main points in the article?",
options = RetrievalOptions(
indexName = "document_embeddings",
filter = mapOf(
"source" to "article1.txt" // Only return documents from this source
)
)
)
// Numeric comparison
val results2 = retrievalEngine.retrieve(
query = "What are the main points in the article?",
options = RetrievalOptions(
indexName = "document_embeddings",
filter = mapOf(
"price" to mapOf(
"$gt" to 100 // Only return documents with price > 100
)
)
)
)
// Multiple conditions
val results3 = retrievalEngine.retrieve(
query = "What are the main points in the article?",
options = RetrievalOptions(
indexName = "document_embeddings",
filter = mapOf(
"category" to "electronics",
"price" to mapOf(
"$lt" to 1000 // Price < 1000
),
"inStock" to true
)
)
)
// Array operations
val results4 = retrievalEngine.retrieve(
query = "What are the main points in the article?",
options = RetrievalOptions(
indexName = "document_embeddings",
filter = mapOf(
"tags" to mapOf(
"$in" to listOf("sale", "new") // Tags include either "sale" or "new"
)
)
)
)
// Logical operators
val results5 = retrievalEngine.retrieve(
query = "What are the main points in the article?",
options = RetrievalOptions(
indexName = "document_embeddings",
filter = mapOf(
"$or" to listOf(
mapOf("category" to "electronics"),
mapOf("category" to "accessories")
),
"$and" to listOf(
mapOf("price" to mapOf("$gt" to 50)),
mapOf("price" to mapOf("$lt" to 200))
)
)
)
)
Common Filter Operators
Operator | Description | Example |
---|---|---|
Equality | Direct match | "category" to "electronics" |
$gt | Greater than | "price" to mapOf("$gt" to 100) |
$gte | Greater than or equal | "price" to mapOf("$gte" to 100) |
$lt | Less than | "price" to mapOf("$lt" to 1000) |
$lte | Less than or equal | "price" to mapOf("$lte" to 1000) |
$in | In array | "tags" to mapOf("$in" to listOf("sale", "new")) |
$nin | Not in array | "tags" to mapOf("$nin" to listOf("discontinued")) |
$or | Logical OR | "$or" to listOf(mapOf(...), mapOf(...)) |
$and | Logical AND | "$and" to listOf(mapOf(...), mapOf(...)) |
$not | Logical NOT | "$not" to mapOf("category" to "outdated") |
Common Use Cases for Metadata Filtering
- Source Filtering: Limit results to specific document sources or types
- Date Filtering: Retrieve documents from specific time periods
- Category Filtering: Focus on documents with specific categories or tags
- Numerical Filtering: Filter by numerical ranges (e.g., price, rating, page count)
- Attribute Filtering: Select documents with specific attributes (e.g., language, author, format)
- Combination Filtering: Combine multiple conditions for precise querying
Vector Query Tool
Sometimes you want to give your agent the ability to query a vector database directly. The Vector Query Tool allows your agent to be in charge of retrieval decisions, combining semantic search with optional filtering and reranking based on the agent’s understanding of the user’s needs.
import ai.kastrax.rag.vectordb.PgVectorAdapter
import ai.kastrax.rag.tools.VectorQueryTool
import ai.kastrax.rag.embedding.OpenAIEmbedder
import ai.kastrax.core.agent.Agent
import ai.kastrax.core.agent.AgentBuilder
// Create a vector database adapter
val vectorDB = PgVectorAdapter(
connectionString = System.getenv("POSTGRES_CONNECTION_STRING")
)
// Create an embedder
val embedder = OpenAIEmbedder()
// Create the vector query tool
val vectorQueryTool = VectorQueryTool(
vectorDB = vectorDB,
embedder = embedder,
indexName = "document_embeddings",
description = "Search for information in the knowledge base"
)
// Create an agent with the vector query tool
val agent = AgentBuilder()
.withTools(listOf(vectorQueryTool))
.withModel("gpt-4")
.withSystemPrompt("You are a helpful assistant with access to a knowledge base.")
.build()
// The agent can now use the tool to query the vector database
val result = agent.run("Find information about climate change impacts on agriculture")
When creating the tool, pay special attention to the tool’s name and description - these help the agent understand when and how to use the retrieval capabilities. For example, you might name it “SearchKnowledgeBase” and describe it as “Search through our documentation to find relevant information about specific topics.”
The agent can use this tool to:
- Formulate semantic search queries
- Apply metadata filters
- Adjust the number of results
- Rerank results based on relevance to the user’s question
This approach gives the agent more control over the retrieval process, allowing it to adapt its search strategy based on the conversation context and user needs.
Advanced Retrieval Techniques ✅
Hybrid Search
Hybrid search combines vector similarity search with traditional keyword search to improve retrieval quality. This approach is particularly useful when you need to find documents that are both semantically similar to a query and contain specific keywords.
import ai.kastrax.rag.retrieval.HybridRetrievalEngine
import ai.kastrax.rag.retrieval.HybridRetrievalOptions
// Create a hybrid retrieval engine
val hybridEngine = HybridRetrievalEngine(
vectorDB = vectorDB,
embedder = embedder,
textIndexer = textIndexer // For keyword search
)
// Perform hybrid search
val results = hybridEngine.retrieve(
query = "climate change agriculture drought resistant crops",
options = HybridRetrievalOptions(
indexName = "document_embeddings",
vectorWeight = 0.7, // Weight for vector similarity (0.0-1.0)
textWeight = 0.3, // Weight for text similarity (0.0-1.0)
limit = 10
)
)
Hybrid search is particularly effective for:
- Queries containing specific technical terms or proper nouns
- Situations where exact keyword matching is important
- Improving precision while maintaining recall
Reranking
Reranking improves retrieval quality by applying a more sophisticated relevance model to the initial search results. Kastrax supports multiple reranking approaches:
import ai.kastrax.rag.retrieval.RetrievalEngine
import ai.kastrax.rag.retrieval.RetrievalOptions
import ai.kastrax.rag.reranking.CrossEncoderReranker
import ai.kastrax.rag.reranking.RerankerOptions
// Create a reranker
val reranker = CrossEncoderReranker(
options = RerankerOptions(
modelName = "cross-encoder/ms-marco-MiniLM-L-6-v2",
threshold = 0.7 // Minimum score to include in results
)
)
// Create a retrieval engine with reranking
val retrievalEngine = RetrievalEngine(
vectorDB = vectorDB,
embedder = embedder,
reranker = reranker
)
// Retrieve and rerank results
val results = retrievalEngine.retrieve(
query = "What are the effects of climate change on agriculture?",
options = RetrievalOptions(
indexName = "document_embeddings",
limit = 20, // Retrieve more initial results for reranking
rerank = true // Enable reranking
)
)
Reranking is particularly useful for:
- Improving precision in the final results
- Handling complex or ambiguous queries
- Reducing the impact of embedding model limitations
Contextual Retrieval
Contextual retrieval takes into account the conversation history or user context when retrieving information:
import ai.kastrax.rag.retrieval.ContextualRetrievalEngine
import ai.kastrax.rag.retrieval.ContextualRetrievalOptions
import ai.kastrax.rag.context.ConversationContext
// Create a conversation context
val conversationContext = ConversationContext()
// Add messages to the context
conversationContext.addMessage("user", "Tell me about climate change impacts.")
conversationContext.addMessage("assistant", "Climate change has various impacts including rising temperatures, sea level rise, and extreme weather events.")
// Create a contextual retrieval engine
val contextualEngine = ContextualRetrievalEngine(
vectorDB = vectorDB,
embedder = embedder
)
// Perform contextual retrieval
val results = contextualEngine.retrieve(
query = "What about its effects on agriculture specifically?",
options = ContextualRetrievalOptions(
indexName = "document_embeddings",
context = conversationContext,
contextWeight = 0.3, // Weight for conversation context
limit = 5
)
)
Contextual retrieval is valuable for:
- Maintaining conversation coherence
- Resolving ambiguous references in follow-up questions
- Providing more relevant results based on user interests
Best Practices ✅
Query Formulation
- Be specific: Craft queries that are specific and focused on the information you need
- Include key terms: Include important keywords and technical terms in your queries
- Avoid ambiguity: Phrase queries to minimize ambiguity and multiple interpretations
- Consider query expansion: For important queries, try multiple formulations to improve recall
Retrieval Configuration
- Adjust result limits: Retrieve more results for reranking (e.g., 20-50) but fewer for direct use (e.g., 3-5)
- Use appropriate similarity metrics: Choose the right similarity metric (cosine, dot product, Euclidean) for your embedding model
- Set minimum similarity thresholds: Filter out low-relevance results with minimum score thresholds
- Combine retrieval strategies: Use hybrid search and reranking for important queries
Metadata Management
- Design metadata thoughtfully: Include metadata fields that will be useful for filtering
- Be consistent: Use consistent naming and formatting for metadata fields
- Include hierarchical metadata: Add category/subcategory relationships for more flexible filtering
- Add temporal metadata: Include creation/update dates to filter by recency
Complete RAG Pipeline ✅
Here’s a complete example showing how retrieval fits into the full RAG pipeline:
import ai.kastrax.rag.document.Document
import ai.kastrax.rag.chunking.RecursiveChunker
import ai.kastrax.rag.embedding.OpenAIEmbedder
import ai.kastrax.rag.vectordb.PineconeAdapter
import ai.kastrax.rag.retrieval.RetrievalEngine
import ai.kastrax.rag.reranking.CrossEncoderReranker
import ai.kastrax.rag.generation.LLMGenerator
import ai.kastrax.rag.prompt.PromptTemplate
// 1. Document Processing
val document = Document.fromText("""
# Climate Change and Agriculture
Climate change poses significant challenges to global agriculture.
Rising temperatures and changing precipitation patterns affect crop yields.
Farmers must adapt to these changing conditions to ensure food security.
## Effects on Crop Yields
Studies have shown that for each degree Celsius of warming, there is a
potential reduction in global yields of wheat by 6%, rice by 3.2%,
maize by 7.4%, and soybean by 3.1%.
## Adaptation Strategies
Farmers are implementing various strategies to adapt to climate change:
- Drought-resistant crop varieties
- Improved irrigation systems
- Diversification of crops
- Precision agriculture techniques
""".trimIndent())
// 2. Chunking
val chunker = RecursiveChunker()
val chunks = chunker.chunk(document)
// 3. Embedding
val embedder = OpenAIEmbedder()
val embeddedChunks = embedder.embed(chunks)
// 4. Vector Storage
val vectorDB = PineconeAdapter()
vectorDB.upsert("document_embeddings", embeddedChunks)
// 5. Reranker Setup
val reranker = CrossEncoderReranker()
// 6. Retrieval Engine
val retrievalEngine = RetrievalEngine(
vectorDB = vectorDB,
embedder = embedder,
reranker = reranker
)
// 7. Query and Retrieve
val query = "How does climate change affect wheat production?"
val retrievedChunks = retrievalEngine.retrieve(
query = query,
options = RetrievalOptions(
indexName = "document_embeddings",
limit = 3,
rerank = true
)
)
// 8. Format Context
val context = retrievedChunks.joinToString("\n\n") { it.chunk.content }
// 9. Generate Response
val promptTemplate = PromptTemplate(
template = """
Answer the following question based on the provided context.
Context:
{{context}}
Question:
{{question}}
Answer:
""".trimIndent()
)
val prompt = promptTemplate.format(mapOf(
"context" to context,
"question" to query
))
val generator = LLMGenerator()
val answer = generator.generate(prompt)
// 10. Return Result
println("Query: $query")
println("Answer: $answer")
Conclusion ✅
Effective retrieval is the cornerstone of any successful RAG system. Kastrax provides a comprehensive suite of retrieval capabilities that enable you to find the most relevant information for your users’ queries.
By leveraging advanced techniques like metadata filtering, hybrid search, reranking, and contextual retrieval, you can significantly improve the quality of your RAG system’s responses. Remember to follow best practices for query formulation, retrieval configuration, and metadata management to get the most out of Kastrax’s retrieval capabilities.
The right retrieval strategy depends on your specific use case, data characteristics, and performance requirements. Experiment with different approaches and combinations to find the optimal solution for your application.
Knowledge Graph Retrieval
For documents with complex relationships, knowledge graph retrieval can follow connections between chunks. This approach is particularly useful when:
import ai.kastrax.rag.retrieval.GraphRetrievalEngine
import ai.kastrax.rag.retrieval.GraphRetrievalOptions
import ai.kastrax.rag.graph.KnowledgeGraph
// Create a knowledge graph
val knowledgeGraph = KnowledgeGraph()
// Add nodes and relationships from embedded chunks
embeddedChunks.forEach { chunk ->
knowledgeGraph.addNode(
id = chunk.id,
content = chunk.content,
embedding = chunk.embedding,
metadata = chunk.metadata
)
}
// Add relationships between nodes
knowledgeGraph.addRelationship(
sourceId = "chunk1",
targetId = "chunk2",
type = "references",
weight = 0.8
)
// Create a graph retrieval engine
val graphEngine = GraphRetrievalEngine(
knowledgeGraph = knowledgeGraph,
embedder = embedder
)
// Retrieve information using graph traversal
val results = graphEngine.retrieve(
query = "How does climate change affect agriculture?",
options = GraphRetrievalOptions(
maxHops = 2, // Maximum relationship traversal depth
limit = 5, // Maximum number of results
traversalStrategy = "breadth-first" // BFS or DFS
)
)
Knowledge graph retrieval is particularly valuable when:
- Information is spread across multiple documents
- Documents reference each other
- You need to traverse relationships to find complete answers
- The domain has complex entity relationships