ElevenLabs ✅

The ElevenLabs voice implementation in Kastrax provides high-quality text-to-speech (TTS) and speech-to-text (STT) capabilities using the ElevenLabs API.

Usage Example ✅


import { ElevenLabsVoice } from "@kastrax/voice-elevenlabs";
 
// Initialize with default configuration (uses ELEVENLABS_API_KEY environment variable)
const voice = new ElevenLabsVoice();
 
// Initialize with custom configuration (a second instance, so it needs its own name)
const customVoice = new ElevenLabsVoice({
  speechModel: {
    name: 'eleven_multilingual_v2',
    apiKey: 'your-api-key',
  },
  speaker: 'custom-speaker-id',
});
 
// Text-to-Speech
const audioStream = await voice.speak("Hello, world!");
 
// Get available speakers
const speakers = await voice.getSpeakers();

Constructor Parameters ✅

speechModel?:

ElevenLabsVoiceConfig

= { name: 'eleven_multilingual_v2' }

Configuration for text-to-speech functionality.

speaker?:

string

= '9BWtsMINqrJLrRacOk9x' (Aria voice)

ID of the speaker to use for text-to-speech

ElevenLabsVoiceConfig

name?:

ElevenLabsModel

= 'eleven_multilingual_v2'

The ElevenLabs model to use

apiKey?:

string

ElevenLabs API key. Falls back to the ELEVENLABS_API_KEY environment variable when omitted.
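The fallback order described above can be sketched as a small helper. This is an illustration of the documented behavior, not Kastrax's internal code: the names VoiceConfigSketch and resolveConfig are hypothetical, and throwing when no key is found is an assumption about how a missing key might be handled.

```typescript
// Hypothetical mirror of the documented config shape (not the library's own type).
interface VoiceConfigSketch {
  name?: string;
  apiKey?: string;
}

// Resolve the effective configuration: explicit values win, then the
// ELEVENLABS_API_KEY environment variable, then the documented default model.
function resolveConfig(
  config: VoiceConfigSketch = {},
  env: Record<string, string | undefined> = process.env,
): Required<VoiceConfigSketch> {
  const apiKey = config.apiKey ?? env.ELEVENLABS_API_KEY;
  if (!apiKey) {
    // Assumption: fail early rather than send an unauthenticated request.
    throw new Error("No ElevenLabs API key: pass apiKey or set ELEVENLABS_API_KEY");
  }
  return { name: config.name ?? "eleven_multilingual_v2", apiKey };
}
```

Passing the environment as a parameter keeps the helper easy to exercise without touching the real process environment.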

Methods ✅

speak()

Converts text to speech using the configured speech model and voice.

input:

string | NodeJS.ReadableStream

Text to convert to speech. If a stream is provided, its contents are read into text before synthesis.

options?:

object

Additional options for speech synthesis

options.speaker?:

string

Override the default speaker ID for this request

Returns: Promise<NodeJS.ReadableStream>
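Because speak() resolves to a readable stream, a common follow-up is writing the audio to disk. The sketch below assumes only the speak() signature documented above; SpeakCapable and saveSpeech are hypothetical names, and the audio format of the returned stream is not stated on this page, so the output file extension is left to the caller.

```typescript
import { createWriteStream } from "node:fs";
import { pipeline } from "node:stream/promises";

// Minimal shape of the documented speak() method; any object with it works here.
interface SpeakCapable {
  speak(
    input: string | NodeJS.ReadableStream,
    options?: { speaker?: string },
  ): Promise<NodeJS.ReadableStream>;
}

// Synthesize `text` and stream the resulting audio bytes into `outputPath`.
// When `speaker` is given, it overrides the default speaker for this request only.
async function saveSpeech(
  voice: SpeakCapable,
  text: string,
  outputPath: string,
  speaker?: string,
): Promise<void> {
  const audio = await voice.speak(text, speaker ? { speaker } : undefined);
  // pipeline() resolves once the file is fully written and rejects on stream errors.
  await pipeline(audio, createWriteStream(outputPath));
}
```

With a real instance, the call would look like saveSpeech(new ElevenLabsVoice(), "Hello, world!", outputPath), where outputPath is whatever file name suits the format your configuration returns.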

getSpeakers()

Returns a Promise that resolves to an array of available voices, where each entry contains:

voiceId:

string

Unique identifier for the voice

name:

string

Display name of the voice

language:

string

Language code for the voice

gender:

string

Gender of the voice
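The fields above make it straightforward to pick a voice programmatically. The following is a minimal sketch: SpeakerInfo and findSpeaker are hypothetical names, and the exact values used for language and gender (for example "fr" or "female") are assumptions, so inspect the real getSpeakers() output before relying on them.

```typescript
// Shape of one entry as documented for getSpeakers().
interface SpeakerInfo {
  voiceId: string;
  name: string;
  language: string;
  gender: string;
}

// Return the first speaker matching every criterion that was provided.
// Criteria left undefined are ignored, so {} matches the first speaker.
function findSpeaker(
  speakers: SpeakerInfo[],
  criteria: Partial<Pick<SpeakerInfo, "language" | "gender">>,
): SpeakerInfo | undefined {
  return speakers.find(
    (s) =>
      (criteria.language === undefined || s.language === criteria.language) &&
      (criteria.gender === undefined || s.gender === criteria.gender),
  );
}
```

The voiceId of the match can then be passed as options.speaker to speak() to override the default speaker for a single request.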

listen()

Converts audio input to text using the ElevenLabs Speech-to-Text API.

input:

NodeJS.ReadableStream

A readable stream containing the audio data to transcribe

options?:

object

Configuration options for the transcription

The options object supports the following properties:

language_code?:

string

ISO language code (e.g., 'en', 'fr', 'es')

tag_audio_events?:

boolean

Whether to tag audio events like [MUSIC], [LAUGHTER], etc.

num_speakers?:

number

Number of speakers to detect in the audio

filetype?:

string

Audio file format (e.g., 'mp3', 'wav', 'ogg')

timeoutInSeconds?:

number

Request timeout in seconds

maxRetries?:

number

Maximum number of retry attempts

abortSignal?:

AbortSignal

Signal to abort the request

Returns: Promise<string> - A Promise that resolves to the transcribed text
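As a rough illustration of how these options fit together, the sketch below transcribes a local file. It relies only on the listen() signature and option names documented above; ListenCapable and transcribeFile are hypothetical names, deriving filetype from the file extension is an assumption about what the option expects, and AbortSignal.timeout requires a reasonably recent Node.js release.

```typescript
import { createReadStream } from "node:fs";
import { extname } from "node:path";

// Option names as documented for listen(); all are optional.
interface ListenOptions {
  language_code?: string;
  tag_audio_events?: boolean;
  num_speakers?: number;
  filetype?: string;
  timeoutInSeconds?: number;
  maxRetries?: number;
  abortSignal?: AbortSignal;
}

// Minimal shape of the documented listen() method.
interface ListenCapable {
  listen(input: NodeJS.ReadableStream, options?: ListenOptions): Promise<string>;
}

// Stream a local audio file to listen() and give up after `timeoutMs`.
async function transcribeFile(
  voice: ListenCapable,
  path: string,
  timeoutMs = 30_000,
): Promise<string> {
  // Assumption: the extension (without the dot) is a usable filetype value.
  const filetype = extname(path).slice(1).toLowerCase() || undefined;
  return voice.listen(createReadStream(path), {
    filetype,
    abortSignal: AbortSignal.timeout(timeoutMs),
  });
}
```

Other options, such as language_code or num_speakers, could be added to the same object when the audio's language or speaker count is known in advance.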

Important Notes ✅

An ElevenLabs API key is required. Set it via the ELEVENLABS_API_KEY environment variable or pass it in the constructor.
The default speaker is set to Aria (ID: '9BWtsMINqrJLrRacOk9x').
Speech-to-text is available through the listen() method, which uses the ElevenLabs Speech-to-Text API.
Available speakers can be retrieved using the getSpeakers() method, which returns detailed information about each voice including language and gender.