
Contextual Recall

This example shows how to use Kastrax’s Contextual Recall metric to evaluate how well a response incorporates information from the provided context.

Overview

The example shows how to:

  1. Configure the Contextual Recall metric
  2. Evaluate context incorporation
  3. Analyze recall scores
  4. Handle different recall levels

Setup

Environment Setup

Make sure to set up your environment variables:

.env
OPENAI_API_KEY=your_api_key_here
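A missing key usually surfaces only when the first model call fails, so it can help to check for it at startup. The following is a minimal sketch, not part of Kastrax: `requireEnv` is a hypothetical helper name, and it assumes the variables from `.env` have already been loaded into the environment by your runner (for example with Node's `--env-file` flag or a package such as dotenv).

```typescript
// Hypothetical helper (not a Kastrax API): fail fast when a required
// environment variable is missing or blank.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>,
): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage in a Node.js script, before constructing any metric:
//   requireEnv('OPENAI_API_KEY', process.env);
```

Passing the environment in as an argument keeps the helper easy to test without touching the real process environment.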

Dependencies

Import the necessary dependencies:

src/index.ts
import { openai } from '@ai-sdk/openai';
import { ContextualRecallMetric } from '@kastrax/evals/llm';

Example Usage

High Recall Example

Evaluate a response that includes all context information:

src/index.ts
const context1 = [
  'Product features include cloud sync.',
  'Offline mode is available.',
  'Supports multiple devices.',
];

const metric1 = new ContextualRecallMetric(openai('gpt-4o-mini'), {
  context: context1,
});

const query1 = 'What are the key features of the product?';
const response1 =
  'The product features cloud synchronization, offline mode support, and the ability to work across multiple devices.';

console.log('Example 1 - High Recall:');
console.log('Context:', context1);
console.log('Query:', query1);
console.log('Response:', response1);

const result1 = await metric1.measure(query1, response1);
console.log('Metric Result:', {
  score: result1.score,
  reason: result1.info.reason,
});
// Example Output:
// Metric Result: { score: 1, reason: 'All elements of the output are supported by the context.' }

Mixed Recall Example

Evaluate a response that includes some context information:

src/index.ts
const context2 = [
  'Python is a high-level programming language.',
  'Python emphasizes code readability.',
  'Python supports multiple programming paradigms.',
  'Python is widely used in data science.',
];

const metric2 = new ContextualRecallMetric(openai('gpt-4o-mini'), {
  context: context2,
});

const query2 = 'What are Python\'s key characteristics?';
const response2 =
  'Python is a high-level programming language. It is also a type of snake.';

console.log('Example 2 - Mixed Recall:');
console.log('Context:', context2);
console.log('Query:', query2);
console.log('Response:', response2);

const result2 = await metric2.measure(query2, response2);
console.log('Metric Result:', {
  score: result2.score,
  reason: result2.info.reason,
});
// Example Output:
// Metric Result: { score: 0.5, reason: 'Only half of the output is supported by the context.' }

Low Recall Example

Evaluate a response that draws on none of the provided context:

src/index.ts
const context3 = [
  'The solar system has eight planets.',
  'Mercury is closest to the Sun.',
  'Venus is the hottest planet.',
  'Mars is called the Red Planet.',
];

const metric3 = new ContextualRecallMetric(openai('gpt-4o-mini'), {
  context: context3,
});

const query3 = 'Tell me about the solar system.';
const response3 = 'Jupiter is the largest planet in the solar system.';

console.log('Example 3 - Low Recall:');
console.log('Context:', context3);
console.log('Query:', query3);
console.log('Response:', response3);

const result3 = await metric3.measure(query3, response3);
console.log('Metric Result:', {
  score: result3.score,
  reason: result3.info.reason,
});
// Example Output:
// Metric Result: { score: 0, reason: 'None of the output is supported by the context.' }
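One way to read the three example scores is as the fraction of statements in the response that the context supports: three of three, one of two, and zero of one. The sketch below only reproduces that arithmetic. It is an assumption drawn from the example outputs, not the library's confirmed formula; the real metric relies on an LLM judge, and `supportedFraction` and the hand-written verdict arrays are hypothetical.

```typescript
// Illustrative only: reproduces the arithmetic suggested by the example
// outputs. The verdicts are written by hand here; the real metric
// obtains its judgments from the model.
function supportedFraction(verdicts: boolean[]): number {
  if (verdicts.length === 0) return 0; // assumption: empty output scores 0
  const supported = verdicts.filter(Boolean).length;
  return supported / verdicts.length;
}

// Example 1: cloud sync, offline mode, multiple devices (all supported)
const high = supportedFraction([true, true, true]);
// Example 2: "high-level language" is supported, "type of snake" is not
const mixed = supportedFraction([true, false]);
// Example 3: the Jupiter statement is not in the context
const low = supportedFraction([false]);

console.log({ high, mixed, low });
// prints { high: 1, mixed: 0.5, low: 0 }
```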

Understanding the Results

The metric provides:

  1. A recall score between 0 and 1:

    • 1.0: Perfect recall - all context information used
    • 0.7-0.9: High recall - most context information used
    • 0.4-0.6: Mixed recall - some context information used
    • 0.1-0.3: Low recall - little context information used
    • 0.0: No recall - no context information used
  2. A detailed reason for the score, including analysis of:

    • Information incorporation
    • Missing context
    • Response completeness
    • Overall recall quality
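To handle the recall levels listed above in code, a score can be mapped to a label. This is a minimal sketch: `recallLevel` and the `RecallLevel` type are hypothetical names, not part of Kastrax. The listed bands leave gaps (for example between 0.6 and 0.7, or between 0.9 and 1.0), so the sketch assumes each band extends up to the next threshold.

```typescript
// Hypothetical helper (not a Kastrax API): label a recall score using
// the bands listed above.
type RecallLevel = 'perfect' | 'high' | 'mixed' | 'low' | 'none';

function recallLevel(score: number): RecallLevel {
  if (Number.isNaN(score) || score < 0 || score > 1) {
    throw new RangeError(`Score must be between 0 and 1, got ${score}`);
  }
  if (score === 1) return 'perfect';
  if (score >= 0.7) return 'high'; // assumption: covers 0.7 up to, not including, 1
  if (score >= 0.4) return 'mixed'; // assumption: covers 0.4 up to, not including, 0.7
  if (score > 0) return 'low'; // assumption: any nonzero score below 0.4
  return 'none';
}

// Usage with a metric result:
//   const level = recallLevel(result1.score);
```

Where exactly to draw the thresholds is a design choice; adjust them if your application treats borderline scores differently.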




