Skip to Content
ExamplesEvalsContext Precision

Context Precision ✅

This example demonstrates how to use Kastrax’s Context Precision metric to evaluate how precisely responses use provided context information.

Overview ✅

The example shows how to:

  1. Configure the Context Precision metric
  2. Evaluate context precision
  3. Analyze precision scores
  4. Handle different precision levels

Setup ✅

Environment Setup

Make sure to set up your environment variables:

.env
OPENAI_API_KEY=your_api_key_here

Dependencies

Import the necessary dependencies:

src/index.ts
import { openai } from '@ai-sdk/openai'; import { ContextPrecisionMetric } from '@kastrax/evals/llm';

Example Usage ✅

High Precision Example

Evaluate a response where all context is relevant:

src/index.ts
const context1 = [ 'Photosynthesis converts sunlight into energy.', 'Plants use chlorophyll for photosynthesis.', 'Photosynthesis produces oxygen as a byproduct.', 'The process requires sunlight and chlorophyll.', ]; const metric1 = new ContextPrecisionMetric(openai('gpt-4o-mini'), { context: context1, }); const query1 = 'What is photosynthesis and how does it work?'; const response1 = 'Photosynthesis is a process where plants convert sunlight into energy using chlorophyll, producing oxygen as a byproduct.'; console.log('Example 1 - High Precision:'); console.log('Context:', context1); console.log('Query:', query1); console.log('Response:', response1); const result1 = await metric1.measure(query1, response1); console.log('Metric Result:', { score: result1.score, reason: result1.info.reason, }); // Example Output: // Metric Result: { score: 1, reason: 'The context uses all relevant information and does not include any irrelevant information.' }

Mixed Precision Example

Evaluate a response where some context is irrelevant:

src/index.ts
const context2 = [ 'Volcanoes are openings in the Earth\'s crust.', 'Volcanoes can be active, dormant, or extinct.', 'Hawaii has many active volcanoes.', 'The Pacific Ring of Fire has many volcanoes.', ]; const metric2 = new ContextPrecisionMetric(openai('gpt-4o-mini'), { context: context2, }); const query2 = 'What are the different types of volcanoes?'; const response2 = 'Volcanoes can be classified as active, dormant, or extinct based on their activity status.'; console.log('Example 2 - Mixed Precision:'); console.log('Context:', context2); console.log('Query:', query2); console.log('Response:', response2); const result2 = await metric2.measure(query2, response2); console.log('Metric Result:', { score: result2.score, reason: result2.info.reason, }); // Example Output: // Metric Result: { score: 0.5, reason: 'The context uses some relevant information and includes some irrelevant information.' }

Low Precision Example

Evaluate a response where most context is irrelevant:

src/index.ts
const context3 = [ 'The Nile River is in Africa.', 'The Nile is the longest river.', 'Ancient Egyptians used the Nile.', 'The Nile flows north.', ]; const metric3 = new ContextPrecisionMetric(openai('gpt-4o-mini'), { context: context3, }); const query3 = 'Which direction does the Nile River flow?'; const response3 = 'The Nile River flows northward.'; console.log('Example 3 - Low Precision:'); console.log('Context:', context3); console.log('Query:', query3); console.log('Response:', response3); const result3 = await metric3.measure(query3, response3); console.log('Metric Result:', { score: result3.score, reason: result3.info.reason, }); // Example Output: // Metric Result: { score: 0.2, reason: 'The context only has one relevant piece, which is at the end.' }

Understanding the Results ✅

The metric provides:

  1. A precision score between 0 and 1:

    • 1.0: Perfect precision - all context pieces are relevant and used
    • 0.7-0.9: High precision - most context pieces are relevant
    • 0.4-0.6: Mixed precision - some context pieces are relevant
    • 0.1-0.3: Low precision - few context pieces are relevant
    • 0.0: No precision - no context pieces are relevant
  2. Detailed reason for the score, including analysis of:

    • Relevance of each context piece
    • Usage in the response
    • Contribution to answering the query
    • Overall context usefulness





View Example on GitHub
Last updated on