Prompt Alignment

This example demonstrates how to use Kastrax’s Prompt Alignment metric to evaluate how well a response follows a given set of instructions.

Overview

The example shows how to:

  1. Configure the Prompt Alignment metric
  2. Evaluate instruction adherence
  3. Handle non-applicable instructions
  4. Calculate alignment scores

Setup

Environment Setup

Make sure to set up your environment variables:

.env
OPENAI_API_KEY=your_api_key_here
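
The openai provider from @ai-sdk/openai reads OPENAI_API_KEY from the environment by default. If you run the example directly with Node rather than through a tool that loads .env files for you, one option (an assumption, not part of this example) is the dotenv package:

// Assumes `dotenv` is installed (npm install dotenv); loads .env into process.env.
import 'dotenv/config';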

Dependencies

Import the necessary dependencies:

src/index.ts
import { openai } from '@ai-sdk/openai';
import { PromptAlignmentMetric } from '@kastrax/evals/llm';

Example Usage

Perfect Alignment Example

Evaluate a response that follows all instructions:

src/index.ts
const instructions1 = [
  'Use complete sentences',
  'Include temperature in Celsius',
  'Mention wind conditions',
  'State precipitation chance',
];

const metric1 = new PromptAlignmentMetric(openai('gpt-4o-mini'), {
  instructions: instructions1,
});

const query1 = 'What is the weather like?';
const response1 =
  'The temperature is 22 degrees Celsius with moderate winds from the northwest. There is a 30% chance of rain.';

console.log('Example 1 - Perfect Alignment:');
console.log('Instructions:', instructions1);
console.log('Query:', query1);
console.log('Response:', response1);

const result1 = await metric1.measure(query1, response1);
console.log('Metric Result:', {
  score: result1.score,
  reason: result1.info.reason,
  details: result1.info.scoreDetails,
});
// Example Output:
// Metric Result: { score: 1, reason: 'The response follows all instructions.' }

Mixed Alignment Example

Evaluate a response that misses some instructions:

src/index.ts
const instructions2 = [
  'Use bullet points',
  'Include prices in USD',
  'Show stock status',
  'Add product descriptions',
];

const metric2 = new PromptAlignmentMetric(openai('gpt-4o-mini'), {
  instructions: instructions2,
});

const query2 = 'List the available products';
const response2 =
  '• Coffee - $4.99 (In Stock)\n• Tea - $3.99\n• Water - $1.99 (Out of Stock)';

console.log('Example 2 - Mixed Alignment:');
console.log('Instructions:', instructions2);
console.log('Query:', query2);
console.log('Response:', response2);

const result2 = await metric2.measure(query2, response2);
console.log('Metric Result:', {
  score: result2.score,
  reason: result2.info.reason,
  details: result2.info.scoreDetails,
});
// Example Output:
// Metric Result: { score: 0.5, reason: 'The response misses some instructions.' }

Non-Applicable Instructions Example

Evaluate a case where the instructions don’t apply to the query:

src/index.ts
const instructions3 = [
  'Show account balance',
  'List recent transactions',
  'Display payment history',
];

const metric3 = new PromptAlignmentMetric(openai('gpt-4o-mini'), {
  instructions: instructions3,
});

const query3 = 'What is the weather like?';
const response3 = 'It is sunny and warm outside.';

console.log('Example 3 - N/A Instructions:');
console.log('Instructions:', instructions3);
console.log('Query:', query3);
console.log('Response:', response3);

const result3 = await metric3.measure(query3, response3);
console.log('Metric Result:', {
  score: result3.score,
  reason: result3.info.reason,
  details: result3.info.scoreDetails,
});
// Example Output:
// Metric Result: { score: 0, reason: 'No instructions are followed or are applicable to the query.' }

Understanding the Results

The metric provides:

  1. An alignment score between 0 and 1, or -1 for special cases:

    • 1.0: Perfect alignment - all applicable instructions followed
    • 0.5-0.8: Mixed alignment - some instructions missed
    • 0.1-0.4: Poor alignment - most instructions not followed
    • 0.0: No alignment - no instructions are applicable or followed
  2. Detailed reason for the score, including analysis of:

    • Query-response alignment
    • Instruction adherence
  3. Score details, including breakdown of:

    • Followed instructions
    • Missed instructions
    • Non-applicable instructions
    • Reasoning for each instruction’s status

When no instructions are applicable to the context (score: -1), this indicates a prompt design issue rather than a response quality issue.
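
As a minimal sketch of consuming these results, the helper below buckets a score into the bands listed above. The score and info.reason fields come straight from the examples; the exact shape of scoreDetails is not shown on this page, so treat it as version-specific and inspect it (or your installed @kastrax/evals types) before relying on particular field names.

// Minimal sketch: map an alignment score to the bands described above.
function describeAlignment(score: number): string {
  if (score === -1) return 'Not applicable - review the prompt design, not the response';
  if (score === 1) return 'Perfect alignment';
  if (score >= 0.5) return 'Mixed alignment';
  if (score >= 0.1) return 'Poor alignment';
  return 'No alignment';
}

const result = await metric1.measure(query1, response1);
console.log(describeAlignment(result.score), '-', result.info.reason);

// The per-instruction breakdown lives in result.info.scoreDetails; its shape
// is version-specific, so log it before depending on specific fields.
console.log('Details:', result.info.scoreDetails);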





