
Answer Relevancy Evaluation

This example demonstrates how to use Kastrax’s Answer Relevancy metric to evaluate how well responses address their input queries.

Overview

The example shows how to:

  1. Configure the Answer Relevancy metric
  2. Evaluate response relevancy to queries
  3. Analyze relevancy scores
  4. Handle different relevancy scenarios

Setup

Environment Setup

Make sure to set up your environment variables:

.env
OPENAI_API_KEY=your_api_key_here
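A missing key usually only surfaces when the first model call fails, so it can help to check for it at startup. The following is a minimal sketch; `requireEnv` is a hypothetical helper, not part of Kastrax or the AI SDK, and it assumes the variables from `.env` have already been loaded into the environment object passed to it.

```typescript
// Hypothetical helper (not a Kastrax API): fail fast when a required
// environment variable is missing or blank.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>,
): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

In a Node.js entry point this could be called as `requireEnv('OPENAI_API_KEY', process.env)` before the metric is constructed.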

Dependencies

Import the necessary dependencies:

src/index.ts
import { openai } from '@ai-sdk/openai';
import { AnswerRelevancyMetric } from '@kastrax/evals/llm';

Metric Configuration

Set up the Answer Relevancy metric with custom parameters:

src/index.ts
const metric = new AnswerRelevancyMetric(openai('gpt-4o-mini'), {
  uncertaintyWeight: 0.3, // Weight for 'unsure' verdicts
  scale: 1, // Scale for the final score
});
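To build intuition for the two options, here is one plausible way they could combine into a final score. This is an illustrative sketch, not Kastrax's confirmed implementation: the source states only that `uncertaintyWeight` weights 'unsure' verdicts and that `scale` scales the final score. The verdict labels, the `sketchRelevancyScore` name, and the rounding to two decimals are all assumptions.

```typescript
// Assumed verdict labels; the source only mentions 'unsure' explicitly.
type Verdict = 'yes' | 'no' | 'unsure';

interface ScoreOptions {
  uncertaintyWeight: number; // partial credit for each 'unsure' verdict
  scale: number; // multiplier applied to the final ratio
}

// Hypothetical scoring rule: full credit for 'yes', partial credit for
// 'unsure', none for 'no', averaged over all verdicts and then scaled.
function sketchRelevancyScore(verdicts: Verdict[], options: ScoreOptions): number {
  if (verdicts.length === 0) return 0;
  let credit = 0;
  for (const verdict of verdicts) {
    if (verdict === 'yes') credit += 1;
    else if (verdict === 'unsure') credit += options.uncertaintyWeight;
  }
  const raw = (credit / verdicts.length) * options.scale;
  return Math.round(raw * 100) / 100; // assumed rounding to two decimals
}
```

Under this assumed rule, raising `uncertaintyWeight` makes the score more forgiving of ambiguous statements, while `scale` only changes the range in which the result is reported.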

Example Usage

High Relevancy Example

Evaluate a highly relevant response:

src/index.ts
const query1 = 'What are the health benefits of regular exercise?';
const response1 =
  'Regular exercise improves cardiovascular health, strengthens muscles, boosts metabolism, and enhances mental well-being through the release of endorphins.';
console.log('Example 1 - High Relevancy:');
console.log('Query:', query1);
console.log('Response:', response1);
const result1 = await metric.measure(query1, response1);
console.log('Metric Result:', {
  score: result1.score,
  reason: result1.info.reason,
});
// Example Output:
// Metric Result: { score: 1, reason: 'The response is highly relevant to the query. It provides a comprehensive overview of the health benefits of regular exercise.' }

Partial Relevancy Example

Evaluate a partially relevant response:

src/index.ts
const query2 = 'What should a healthy breakfast include?';
const response2 =
  'A nutritious breakfast should include whole grains and protein. However, the timing of your breakfast is just as important - studies show eating within 2 hours of waking optimizes metabolism and energy levels throughout the day.';
console.log('Example 2 - Partial Relevancy:');
console.log('Query:', query2);
console.log('Response:', response2);
const result2 = await metric.measure(query2, response2);
console.log('Metric Result:', {
  score: result2.score,
  reason: result2.info.reason,
});
// Example Output:
// Metric Result: { score: 0.7, reason: 'The response is partially relevant to the query. It provides some information about healthy breakfast choices but misses the timing aspect.' }

Low Relevancy Example

Evaluate an irrelevant response:

src/index.ts
const query3 = 'What are the benefits of meditation?';
const response3 =
  'The Great Wall of China is over 13,000 miles long and was built during the Ming Dynasty to protect against invasions.';
console.log('Example 3 - Low Relevancy:');
console.log('Query:', query3);
console.log('Response:', response3);
const result3 = await metric.measure(query3, response3);
console.log('Metric Result:', {
  score: result3.score,
  reason: result3.info.reason,
});
// Example Output:
// Metric Result: { score: 0.1, reason: 'The response is not relevant to the query. It provides information about the Great Wall of China but does not mention meditation.' }
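The three examples above repeat the same measure-and-log steps, so they could be folded into one loop. The sketch below assumes only the result shape visible in those examples (`score` and `info.reason`). `RelevancyMetricLike`, `RelevancyCase`, and `evaluateCases` are hypothetical names introduced for illustration, not Kastrax exports.

```typescript
// Result shape inferred from the examples above: a score plus a reason.
interface RelevancyResult {
  score: number;
  info: { reason: string };
}

// Structural stand-in for any metric exposing measure(input, output).
interface RelevancyMetricLike {
  measure(input: string, output: string): Promise<RelevancyResult>;
}

interface RelevancyCase {
  label: string;
  query: string;
  response: string;
}

// Hypothetical helper: evaluate each case in order and collect the results.
async function evaluateCases(
  metric: RelevancyMetricLike,
  cases: RelevancyCase[],
): Promise<Array<{ label: string; score: number; reason: string }>> {
  const rows: Array<{ label: string; score: number; reason: string }> = [];
  for (const testCase of cases) {
    // Awaiting one case at a time keeps results in input order and avoids
    // sending a burst of concurrent model calls.
    const result = await metric.measure(testCase.query, testCase.response);
    rows.push({
      label: testCase.label,
      score: result.score,
      reason: result.info.reason,
    });
  }
  return rows;
}
```

If the `metric` instance configured earlier matches this shape, it could be passed in directly together with the three query and response pairs.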

Understanding the Results

The metric provides:

  1. A relevancy score between 0 and 1:

    • 1.0: Perfect relevancy - response directly addresses the query
    • 0.7-0.9: High relevancy - response mostly addresses the query
    • 0.4-0.6: Moderate relevancy - response partially addresses the query
    • 0.1-0.3: Low relevancy - response barely addresses the query
    • 0.0: No relevancy - response does not address the query at all
  2. A detailed reason for the score, including analysis of:

    • Query-response alignment
    • Topic focus
    • Information relevance
    • Improvement suggestions
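The score bands listed above can be turned into a small lookup for reports or assertions. This is a sketch under stated assumptions: it presumes `scale: 1`, and because the listed ranges leave gaps (for example between 0.9 and 1.0), it treats each band's lower bound as a threshold. `relevancyBand` is a hypothetical helper, not a Kastrax API.

```typescript
type RelevancyBand = 'perfect' | 'high' | 'moderate' | 'low' | 'none';

// Hypothetical helper: map a 0-1 score to the bands described above.
// Assumption: values falling between the listed ranges (e.g. 0.95 or 0.65)
// are assigned to the band whose lower bound they meet.
function relevancyBand(score: number): RelevancyBand {
  if (!Number.isFinite(score) || score < 0 || score > 1) {
    throw new RangeError(`Score must be between 0 and 1, got ${score}`);
  }
  if (score === 1) return 'perfect';
  if (score >= 0.7) return 'high';
  if (score >= 0.4) return 'moderate';
  if (score > 0) return 'low';
  return 'none';
}
```

With a different `scale`, the score would need to be divided by that scale before being passed to a helper like this.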




