Voice Endpoint Format

Cedar OS provides two approaches for handling voice, depending on your provider configuration:
  1. Mastra/Custom backends: Direct voice endpoint handling
  2. AI SDK/OpenAI providers: Automatic transcription and speech generation

Provider-Specific Voice Handling

Mastra and Custom Backends

When using Mastra or custom backends, Cedar OS sends voice data directly to your voice endpoint. You have full control over:
  • Audio transcription
  • Response generation
  • Text-to-speech synthesis
  • Response format

AI SDK and OpenAI Providers

When using AI SDK or OpenAI providers, Cedar OS automatically:
  1. Transcribes audio using OpenAI’s Whisper model
  2. Generates a text response using the configured LLM
  3. Optionally generates speech using OpenAI’s TTS model (when useBrowserTTS is false)

Request Format (Mastra/Custom)

Cedar OS sends voice data to your endpoint as a multipart/form-data request:
POST /voice
Content-Type: multipart/form-data
Authorization: Bearer YOUR_API_KEY (if configured)

FormData:
- audio: Blob (audio file, typically webm format)
- settings: JSON string containing voice settings
- context: String containing additional context
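
For reference, a request of this shape can be reproduced with a plain fetch call. The following is a minimal sketch, not the exact code Cedar OS runs internally; the endpoint URL, API key, and payload values are placeholders:

// Minimal sketch of the request shape described above; URL and API key are placeholders.
const audioBlob = new Blob([/* recorded audio bytes */], { type: 'audio/webm' });

const formData = new FormData();
formData.append('audio', audioBlob);
formData.append('settings', JSON.stringify({ language: 'en-US', useBrowserTTS: false }));
formData.append('context', JSON.stringify({ messages: [] }));

const response = await fetch('https://your-backend.example.com/voice', {
	method: 'POST',
	headers: { Authorization: 'Bearer YOUR_API_KEY' }, // only when an API key is configured
	body: formData, // the multipart boundary is set automatically
});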

Voice Settings Structure

The settings field contains a JSON object with the following structure:
{
	"language": "en-US",
	"voiceId": "optional-voice-id",
	"pitch": 1.0,
	"rate": 1.0,
	"volume": 1.0,
	"useBrowserTTS": false,
	"autoAddToMessages": true
}
  • language: Language code for speech recognition/synthesis
  • voiceId: Voice identifier for TTS (provider-specific)
  • pitch, rate, volume: Voice modulation parameters
  • useBrowserTTS: Whether to use browser’s built-in TTS
  • autoAddToMessages: Whether to add voice interactions to chat history
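
For type safety when parsing the settings field on your backend, an interface along these lines can help. The interface name is illustrative, not something exported by Cedar OS:

// Illustrative shape of the parsed settings field (names mirror the list above).
interface VoiceSettings {
	language: string; // e.g. 'en-US'
	voiceId?: string; // provider-specific voice identifier
	pitch?: number; // 1.0 = default
	rate?: number; // 1.0 = default
	volume?: number; // 1.0 = default
	useBrowserTTS?: boolean; // play audio with the browser's built-in TTS
	autoAddToMessages?: boolean; // add voice interactions to chat history
}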

Context

The context field contains additional context from the Cedar state, serialized as a string. It may include:
  • Current chat messages
  • Application state
  • User-defined context
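
Since the context arrives as a single string, it is worth parsing it defensively so a malformed or empty value does not crash the endpoint. A small sketch with a hypothetical helper:

// Hypothetical helper: parse the stringified context without throwing on bad input.
function parseContext(raw: string | null): Record<string, unknown> {
	if (!raw) return {};
	try {
		return JSON.parse(raw);
	} catch {
		// The context may arrive as plain text; keep it under a known key instead of failing.
		return { rawContext: raw };
	}
}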

Response Format (All Providers)

Your endpoint can return different types of responses.

1. JSON Response

Return a JSON object with any combination of the fields shown below:
{
	"text": "The assistant's response text",
	"transcription": "What the user said",
	"audioData": "base64-encoded-audio-data",
	"audioUrl": "https://example.com/audio.mp3",
	"audioFormat": "audio/mpeg",
	"usage": {
		"promptTokens": 100,
		"completionTokens": 50,
		"totalTokens": 150
	},
	"object": {
		"type": "action",
		"stateKey": "myState",
		"setterKey": "updateValue",
		"args": ["new value"]
	}
}
All fields are optional:
  • text: The text response from the assistant
  • transcription: The transcribed user input
  • audioData: Base64-encoded audio response
  • audioUrl: URL to an audio file
  • audioFormat: MIME type of the audio
  • usage: Token usage statistics
  • object: Structured response for actions
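
Expressed as a TypeScript type, the response shape might look as follows (the type name is illustrative, not a Cedar OS export):

// Illustrative response type; every field is optional, matching the list above.
interface VoiceEndpointResponse {
	text?: string; // the assistant's reply
	transcription?: string; // what the user said
	audioData?: string; // base64-encoded audio
	audioUrl?: string; // URL to a hosted audio file
	audioFormat?: string; // MIME type, e.g. 'audio/mpeg'
	usage?: {
		promptTokens: number;
		completionTokens: number;
		totalTokens: number;
	};
	object?: {
		type: string; // 'action' for structured state-changing responses
		stateKey: string;
		setterKey: string;
		args: unknown[];
	};
}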

2. Audio Response

Return raw audio data with the appropriate content type:
HTTP/1.1 200 OK
Content-Type: audio/mpeg

[Binary audio data]
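
In a fetch-style handler (for example a Next.js or Mastra API route), returning raw audio means setting the body and content type directly. A minimal sketch, where synthesizeSpeech stands in for whatever TTS service you use:

// Sketch: return synthesized audio directly instead of JSON.
// synthesizeSpeech is a placeholder for your text-to-speech service.
declare function synthesizeSpeech(text: string): Promise<Uint8Array>;

export async function POST(request: Request) {
	const audio = await synthesizeSpeech('This is the assistant response.');
	return new Response(audio, {
		status: 200,
		headers: { 'Content-Type': 'audio/mpeg' },
	});
}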

3. Plain Text Response

HTTP/1.1 200 OK
Content-Type: text/plain

This is the assistant's response.
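
The plain text equivalent, again as a sketch:

// Sketch: return the assistant's reply as plain text.
export async function POST(request: Request) {
	return new Response("This is the assistant's response.", {
		status: 200,
		headers: { 'Content-Type': 'text/plain' },
	});
}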

Implementation Example (Mastra)

Here’s an example of implementing a voice endpoint in a Mastra backend:
import { Agent } from '@mastra/core';

export async function POST(request: Request) {
	const formData = await request.formData();
	const audio = formData.get('audio') as Blob;
	const settings = JSON.parse(formData.get('settings') as string);
	const context = formData.get('context') as string;

	// Process audio (transcription)
	// transcribeAudio is a placeholder for your speech-to-text service
	const transcription = await transcribeAudio(audio);

	// Generate response using your agent
	const agent = new Agent({
		// ... agent configuration
	});

	const response = await agent.generate({
		prompt: transcription,
		context: context,
	});

	// Optionally generate speech (generateSpeech is a placeholder for your TTS service)
	let audioData;
	if (!settings.useBrowserTTS) {
		audioData = await generateSpeech(response.text);
	}

	// Return JSON response
	return Response.json({
		text: response.text,
		transcription: transcription,
		audioData: audioData
			? Buffer.from(audioData).toString('base64')
			: undefined,
		usage: response.usage,
	});
}

Voice Response Handling

Cedar OS provides a unified handleLLMVoice function that processes voice responses consistently across all providers:
  1. Audio Playback: Handles base64 audio data, audio URLs, or browser TTS
  2. Message Integration: Automatically adds transcriptions and responses to chat history
  3. Action Execution: Processes structured responses to trigger state changes

Structured Responses

Cedar OS supports structured responses that can trigger actions in your application:

Action Response

To execute a state change:
{
	"text": "I've updated the value for you.",
	"object": {
		"type": "action",
		"stateKey": "myCustomState",
		"setterKey": "setValue",
		"args": [42]
	}
}
This will call myCustomState.setValue(42) in your Cedar state.
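
On the backend, this simply means including the object field alongside the spoken reply. A sketch with a hypothetical helper (buildActionResponse is not part of Cedar OS):

// Sketch: build an action response that asks Cedar to update client state.
function buildActionResponse(text: string, value: number): Response {
	return Response.json({
		text,
		object: {
			type: 'action',
			stateKey: 'myCustomState', // must match a registered Cedar state key
			setterKey: 'setValue', // must match a custom setter on that state
			args: [value],
		},
	});
}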

Error Handling

Return appropriate HTTP status codes:
  • 200 OK: Successful response
  • 400 Bad Request: Invalid request format
  • 401 Unauthorized: Missing or invalid API key
  • 500 Internal Server Error: Server-side error
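
For example, a voice handler might validate the request before doing any work. A minimal sketch mapping common failures to these status codes:

// Sketch: validate the request and map failures to the status codes above.
export async function POST(request: Request) {
	const authHeader = request.headers.get('Authorization');
	if (!authHeader?.startsWith('Bearer ')) {
		return new Response('Missing or invalid API key', { status: 401 });
	}

	const formData = await request.formData();
	const audio = formData.get('audio');
	if (!(audio instanceof Blob)) {
		return new Response('Expected an audio file in the form data', { status: 400 });
	}

	try {
		// ... transcription, generation, and optional TTS go here ...
		return Response.json({ text: 'OK' });
	} catch {
		return new Response('Internal server error', { status: 500 });
	}
}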

Voice Configuration

Configure voice settings when initializing Cedar:
const store = createCedarStore({
	voiceSettings: {
		language: 'en-US',
		voiceId: 'alloy', // OpenAI voice options: alloy, echo, fable, onyx, nova, shimmer
		useBrowserTTS: false, // Use provider TTS instead of browser
		autoAddToMessages: true, // Add voice interactions to chat
	},
});

Provider-Specific Notes

OpenAI/AI SDK

  • Transcription: Uses Whisper model (whisper-1)
  • Speech: Uses TTS model (tts-1) with configurable voices
  • Audio format: MP3 (audio/mpeg)

Mastra/Custom

  • Full control over transcription and TTS services
  • Can integrate with any speech service (Google, Azure, AWS, etc.)
  • Flexible audio format support