Running Evals in CI ✅

Running evals in your CI pipeline provides quantifiable metrics for measuring agent quality over time.

Setting Up CI Integration ✅

We support any testing framework that supports ES modules (ESM). For example, you can use Vitest, Jest, or Mocha to run evals in your CI/CD pipeline.

src/kastrax/agents/index.test.ts


import { describe, it, expect } from 'vitest';
import { evaluate } from '@kastrax/evals';
import { ToneConsistencyMetric } from '@kastrax/evals/nlp';
import { myAgent } from './index';

describe('My Agent', () => {
  it('should validate tone consistency', async () => {
    // Evaluate the agent's response to a single input against the metric
    const metric = new ToneConsistencyMetric();
    const result = await evaluate(myAgent, 'Hello, world!', metric);

    // Fail the test unless the metric reports a score of exactly 1
    expect(result.score).toBe(1);
  });
});
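The test above asserts an exact score of 1, which can be strict for a CI gate. As a minimal sketch of an alternative, the helpers below average several results and compare the mean to a minimum. They are hypothetical and not part of @kastrax/evals; the sketch assumes each result exposes a numeric score (as result.score does above), and the 0.8 threshold in the usage note is an arbitrary illustration.

```typescript
// Hypothetical helpers for gating CI on eval scores; not part of @kastrax/evals.
// Assumption: each eval result exposes a numeric `score`, as in the test above.
interface ScoredResult {
  score: number;
}

// Arithmetic mean of the scores; throws on an empty list so that a
// misconfigured test cannot pass without evaluating anything.
function meanScore(results: ScoredResult[]): number {
  if (results.length === 0) {
    throw new Error('meanScore requires at least one result');
  }
  const total = results.reduce((sum, r) => sum + r.score, 0);
  return total / results.length;
}

// True when the mean score is at or above the chosen minimum.
function passesThreshold(results: ScoredResult[], minScore: number): boolean {
  return meanScore(results) >= minScore;
}
```

Inside a test, results collected from several evaluate calls could then be checked with expect(passesThreshold(results, 0.8)).toBe(true) in place of the exact-match assertion.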

You will need to configure testSetup and globalSetup scripts for your testing framework to capture the eval results. These scripts allow the results to be shown in your Kastrax dashboard.

Framework Configuration ✅

Vitest Setup

Add these files to your project to run evals in your CI/CD pipeline:

globalSetup.ts


import { globalSetup } from '@kastrax/evals';
 
export default function setup() {
  globalSetup();
}

testSetup.ts


import { beforeAll } from 'vitest';
import { attachListeners } from '@kastrax/evals';
 
beforeAll(async () => {
  await attachListeners();
});

vitest.config.ts


import { defineConfig } from 'vitest/config'
 
export default defineConfig({
  test: {
    globalSetup: './globalSetup.ts',
    setupFiles: ['./testSetup.ts'],
  },
})
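Only the Vitest setup is documented here. For Jest, which is listed above as a supported framework, an equivalent configuration might look like the sketch below. It rests on several assumptions: the file name jest.config.ts is illustrative, the same globalSetup.ts is reused, testSetup.ts relies on Jest's global beforeAll instead of importing it from vitest, and the project is already configured to run TypeScript and ESM under Jest (not shown). The option names globalSetup and setupFilesAfterEnv are Jest's own; whether the Kastrax scripts behave identically under Jest is not confirmed by this page.

```typescript
// jest.config.ts (hypothetical file; a sketch, not an officially documented setup)
const config = {
  // Jest runs this module's default export once, before all test suites
  globalSetup: './globalSetup.ts',
  // Jest runs these files in each test file after the test framework is
  // installed, so hooks such as beforeAll are available as globals
  setupFilesAfterEnv: ['./testSetup.ts'],
};

export default config;
```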

Storage Configuration ✅

To store eval results in Kastrax Storage and capture them in the Kastrax dashboard, pass your Kastrax instance to attachListeners:

testSetup.ts


import { beforeAll } from 'vitest';
import { attachListeners } from '@kastrax/evals';
import { kastrax } from './your-kastrax-setup';
 
beforeAll(async () => {
  // Store evals in Kastrax Storage (requires storage to be enabled)
  await attachListeners(kastrax);
});

With file storage, evals persist and can be queried later. With memory storage, evals are isolated to the test process.