Running Evals in CI ✅
Running evals in your CI pipeline gives you quantifiable metrics for tracking agent quality over time, so regressions show up as failing checks rather than going unnoticed.
Setting Up CI Integration ✅
Any testing framework that supports ES modules (ESM) will work. For example, you can use Vitest, Jest, or Mocha to run evals in your CI/CD pipeline.
src/kastrax/agents/index.test.ts
import { describe, it, expect } from 'vitest';
import { evaluate } from "@kastrax/evals";
import { ToneConsistencyMetric } from "@kastrax/evals/nlp";
import { myAgent } from './index';

describe('My Agent', () => {
  it('should validate tone consistency', async () => {
    const metric = new ToneConsistencyMetric();
    const result = await evaluate(myAgent, 'Hello, world!', metric);

    expect(result.score).toBe(1);
  });
});
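The test above asserts an exact score of 1. If a metric's score varies between runs, a threshold check may be less brittle in CI. The following is a minimal sketch, not part of @kastrax/evals: `meetsThreshold` and `ScoredResult` are hypothetical names, and the code assumes only that an eval result exposes a numeric `score` (as in the test above) and that scores fall between 0 and 1.

```typescript
// Hypothetical helper, not part of @kastrax/evals.
// Assumption: an eval result has a numeric `score` in the range 0 to 1.
interface ScoredResult {
  score: number;
}

function meetsThreshold(result: ScoredResult, threshold: number): boolean {
  // Reject invalid thresholds early so a typo cannot silently pass CI.
  if (!Number.isFinite(threshold) || threshold < 0 || threshold > 1) {
    throw new RangeError(`threshold must be between 0 and 1, got ${threshold}`);
  }
  // A NaN or infinite score never counts as passing.
  return Number.isFinite(result.score) && result.score >= threshold;
}
```

Inside a test this could be used as `expect(meetsThreshold(result, 0.8)).toBe(true)`. The built-in matcher `expect(result.score).toBeGreaterThanOrEqual(0.8)` gives a similar check without the input validation. The value 0.8 is only an illustration; pick a threshold that fits your metric.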
You will need to configure a testSetup and a globalSetup script for your testing framework to capture the eval results. This allows the results to be shown in your Kastrax dashboard.
Framework Configuration ✅
Vitest Setup
Add these files to your project to run evals in your CI/CD pipeline:
globalSetup.ts
import { globalSetup } from '@kastrax/evals';

export default function setup() {
  globalSetup();
}
testSetup.ts
import { beforeAll } from 'vitest';
import { attachListeners } from '@kastrax/evals';

beforeAll(async () => {
  await attachListeners();
});
vitest.config.ts
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    globalSetup: './globalSetup.ts',
    setupFiles: ['./testSetup.ts'],
  },
});
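This page only documents the Vitest setup. For Jest, a comparable configuration might look like the sketch below. It is an assumption rather than a documented Kastrax setup: it relies on Jest's standard `globalSetup` and `setupFilesAfterEnv` options, reuses the file names from the Vitest example, and presumes your project already runs TypeScript and ESM under Jest.

```typescript
// jest.config.ts (hypothetical; not taken from the Kastrax documentation)
const config = {
  // Runs once before all test suites, like Vitest's globalSetup.
  globalSetup: './globalSetup.ts',
  // Runs in each test file after the test framework is installed,
  // so `beforeAll` is available inside testSetup.ts.
  setupFilesAfterEnv: ['./testSetup.ts'],
};

export default config;
```

With this approach, testSetup.ts would take `beforeAll` from Jest (for example from `@jest/globals`) instead of importing it from `vitest`.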
Storage Configuration ✅
To store eval results in Kastrax Storage and view them in the Kastrax dashboard, pass your Kastrax instance to attachListeners:
testSetup.ts
import { beforeAll } from 'vitest';
import { attachListeners } from '@kastrax/evals';
import { kastrax } from './your-kastrax-setup';

beforeAll(async () => {
  // Store evals in Kastrax Storage (requires storage to be enabled)
  await attachListeners(kastrax);
});
With file storage, evals persist and can be queried later. With memory storage, evals are isolated to the test process.