Skip to Content
DocsEvalsRunning in CI

Running Evals in CI ✅

Running evals in your CI pipeline helps bridge this gap by providing quantifiable metrics for measuring agent quality over time.

Setting Up CI Integration ✅

We support any testing framework that supports ESM modules. For example, you can use Vitest, Jest or Mocha to run evals in your CI/CD pipeline.

src/kastrax/agents/index.test.ts
import { describe, it, expect } from 'vitest'; import { evaluate } from "@kastrax/evals"; import { ToneConsistencyMetric } from "@kastrax/evals/nlp"; import { myAgent } from './index'; describe('My Agent', () => { it('should validate tone consistency', async () => { const metric = new ToneConsistencyMetric(); const result = await evaluate(myAgent, 'Hello, world!', metric) expect(result.score).toBe(1); }); });

You will need to configure a testSetup and globalSetup script for your testing framework to capture the eval results. It allows us to show these results in your kastrax dashboard.

Framework Configuration ✅

Vitest Setup

Add these files to your project to run evals in your CI/CD pipeline:

globalSetup.ts
import { globalSetup } from '@kastrax/evals'; export default function setup() { globalSetup() }
testSetup.ts
import { beforeAll } from 'vitest'; import { attachListeners } from '@kastrax/evals'; beforeAll(async () => { await attachListeners(); });
vitest.config.ts
import { defineConfig } from 'vitest/config' export default defineConfig({ test: { globalSetup: './globalSetup.ts', setupFiles: ['./testSetup.ts'], }, })

Storage Configuration ✅

To store eval results in Kastrax Storage and capture results in the Kastrax dashboard:

testSetup.ts
import { beforeAll } from 'vitest'; import { attachListeners } from '@kastrax/evals'; import { kastrax } from './your-kastrax-setup'; beforeAll(async () => { // Store evals in Kastrax Storage (requires storage to be enabled) await attachListeners(kastrax); });

With file storage, evals persist and can be queried later. With memory storage, evals are isolated to the test process.

Last updated on