Cedar’s voice system sends audio data to your backend for processing and expects either audio responses or structured JSON responses. This page covers how to implement the backend endpoint to handle voice interactions.

Endpoint Configuration

Automatic Configuration with Mastra

When using the Mastra provider, you can automatically configure the voice endpoint through the provider configuration:
<CedarCopilot
	llmProvider={{
		provider: 'mastra',
		baseURL: 'http://localhost:3000/api',
		voiceRoute: '/chat/voice-execute', // Automatically sets voice endpoint to: http://localhost:3000/api/chat/voice-execute
	}}>
	<YourApp />
</CedarCopilot>
This eliminates the need to manually call setVoiceEndpoint() and ensures consistency between your chat and voice endpoints.

Manual Configuration

For other providers or custom setups, configure the endpoint manually:
const voice = useCedarStore((state) => state.voice);
voice.setVoiceEndpoint('https://your-backend.com/api/voice');

Request Format

The voice system sends a multipart/form-data POST request to your configured endpoint with the following fields:
FormData {
  audio: Blob,      // WebM format with Opus codec
  settings: string, // JSON string of voice settings
  context: string   // JSON string of additional context
}
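
For reference, the request Cedar sends is roughly equivalent to the following browser-side sketch. This is illustrative only; Cedar builds and sends this FormData for you, and recordedBlob stands in for the captured audio:
// Rough equivalent of the request Cedar sends to your endpoint
const formData = new FormData();
formData.append('audio', recordedBlob); // WebM/Opus blob from the recorder
formData.append('settings', JSON.stringify({ language: 'en-US', rate: 1.0 }));
formData.append('context', JSON.stringify({ files: [], state: {} }));

await fetch(voiceEndpoint, {
	method: 'POST',
	body: formData,
});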

Audio Data

  • Format: WebM container with Opus codec
  • Type: Binary blob captured with the MediaRecorder API (see the sketch after this list)
  • Quality: Optimized for speech recognition
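
For context, this is roughly how such a blob is produced in the browser. It is a minimal sketch of what Cedar's recorder does, not the exact implementation:
// Minimal sketch: capture microphone audio as WebM/Opus
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const recorder = new MediaRecorder(stream, { mimeType: 'audio/webm;codecs=opus' });
const chunks = [];

recorder.ondataavailable = (event) => chunks.push(event.data);
recorder.onstop = () => {
	const audioBlob = new Blob(chunks, { type: 'audio/webm' });
	// audioBlob is what arrives in the audio field of the FormData request
};
recorder.start();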

Voice Settings

{
  language: string;     // e.g., 'en-US'
  voiceId?: string;     // Optional voice ID for TTS
  pitch?: number;       // 0.5 to 2.0
  rate?: number;        // 0.5 to 2.0
  volume?: number;      // 0.0 to 1.0
  useBrowserTTS?: boolean;
  autoAddToMessages?: boolean;
}
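
Because settings arrives as a JSON string, parse it and apply defaults on the server. A minimal sketch (the field names match the shape above; the default values are assumptions):
// Parse the settings form field and fall back to sensible defaults
const parseVoiceSettings = (raw) => {
	const parsed = JSON.parse(raw || '{}');
	return {
		language: parsed.language || 'en-US',
		voiceId: parsed.voiceId, // optional; provider default is used if omitted
		pitch: parsed.pitch ?? 1.0,
		rate: parsed.rate ?? 1.0,
		volume: parsed.volume ?? 1.0,
	};
};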

Additional Context

The context includes Cedar’s additional context data (file contents, state information, etc.) that can be used to provide better responses.
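
A common pattern is to fold this context into your system prompt. The exact shape depends on what you have registered in Cedar, so treat the snippet below as illustrative:
// Illustrative only: inject Cedar's additional context into the system prompt
const contextData = JSON.parse(req.body.context || '{}');
const systemMessage = [
	'You are a helpful assistant.',
	`Additional context: ${JSON.stringify(contextData)}`,
].join('\n');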

Response Formats

Your backend can respond in several ways:

1. Direct Audio Response

Return audio data directly with the appropriate content type:
// Response headers
Content-Type: audio/mpeg
// or audio/wav, audio/ogg, etc.

// Response body: Raw audio data
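
In Express, for example, this amounts to setting the header and sending the raw buffer (a minimal sketch; the full examples below show where audioBuffer comes from):
// Minimal sketch of a direct audio response
res.set('Content-Type', 'audio/mpeg');
res.send(audioBuffer);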

2. JSON Response with Audio URL

{
  "audioUrl": "https://example.com/response.mp3",
  "text": "Optional text transcript",
  "transcription": "What the user said"
}
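
This shape is useful when generated audio is hosted on a CDN or object store. The uploadAudioToStorage helper below is hypothetical; substitute your own storage client:
// uploadAudioToStorage is a hypothetical helper that returns a public URL
const audioUrl = await uploadAudioToStorage(audioBuffer, 'response.mp3');

res.json({
	audioUrl,
	text: responseText,
	transcription: transcription.text,
});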

3. JSON Response with Base64 Audio

{
  "audioData": "base64-encoded-audio-data",
  "audioFormat": "mp3", // or "wav", "ogg"
  "text": "Response text",
  "transcription": "User input transcription"
}
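
In Node, the base64 payload can be produced directly from the TTS result (speech here is the OpenAI TTS response used in the implementation examples below):
// Encode the generated audio as base64 so it fits inside a JSON response
const audioBuffer = Buffer.from(await speech.arrayBuffer());

res.json({
	audioData: audioBuffer.toString('base64'),
	audioFormat: 'mp3',
	text: responseText,
	transcription: transcription.text,
});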

4. Structured Response with Actions

{
  "transcription": "Show me the user dashboard",
  "text": "I'll show you the user dashboard",
  "object": {
    "type": "action",
    "stateKey": "ui",
    "setterKey": "navigateTo",
    "args": ["/dashboard"]
  },
  "audioUrl": "https://example.com/response.mp3"
}
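
The object field lets your agent trigger state setters you have registered in Cedar. Below is a sketch of a handler returning such a response; the intent check is deliberately naive, and a real agent would derive the action from structured LLM output:
// Naive illustration: map a recognized intent to a Cedar state-setter action
if (/dashboard/i.test(transcription.text)) {
	return res.json({
		transcription: transcription.text,
		text: "I'll show you the user dashboard",
		object: {
			type: 'action',
			stateKey: 'ui',
			setterKey: 'navigateTo',
			args: ['/dashboard'],
		},
	});
}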

Implementation Examples

Node.js with Express

import express from 'express';
import multer from 'multer';
import OpenAI from 'openai';

const app = express();
const upload = multer();
const openai = new OpenAI();

app.post(
	'/api/chat/voice',
	upload.fields([
		{ name: 'audio', maxCount: 1 },
		{ name: 'settings', maxCount: 1 },
		{ name: 'context', maxCount: 1 },
	]),
	async (req, res) => {
		try {
			const audioFile = req.files.audio[0];
			const settings = JSON.parse(req.body.settings);
			const context = JSON.parse(req.body.context);

			// 1. Transcribe audio to text
			const transcription = await openai.audio.transcriptions.create({
				file: new File([audioFile.buffer], 'audio.webm', {
					type: 'audio/webm',
				}),
				model: 'whisper-1',
				language: settings.language?.split('-')[0] || 'en',
			});

			// 2. Process with your AI agent
			const messages = [
				{ role: 'system', content: 'You are a helpful assistant.' },
				{ role: 'user', content: transcription.text },
			];

			const completion = await openai.chat.completions.create({
				model: 'gpt-4',
				messages: messages,
			});

			const responseText = completion.choices[0].message.content;

			// 3. Convert response to speech
			const speech = await openai.audio.speech.create({
				model: 'tts-1',
				voice: settings.voiceId || 'alloy',
				input: responseText,
				speed: settings.rate || 1.0,
			});

			const audioBuffer = Buffer.from(await speech.arrayBuffer());

			// 4. Return audio response
			res.set({
				'Content-Type': 'audio/mpeg',
				'Content-Length': audioBuffer.length,
			});
			res.send(audioBuffer);
		} catch (error) {
			console.error('Voice processing error:', error);
			res.status(500).json({ error: 'Failed to process voice request' });
		}
	}
);

Python with FastAPI

from fastapi import FastAPI, File, Form, UploadFile
from fastapi.responses import JSONResponse, Response
from openai import OpenAI
import json
import io

app = FastAPI()
client = OpenAI()

@app.post("/api/chat/voice")
async def handle_voice(
    audio: UploadFile = File(...),
    settings: str = Form(...),
    context: str = Form(...)
):
    try:
        # Parse settings and context
        voice_settings = json.loads(settings)
        additional_context = json.loads(context)

        # Read audio data
        audio_data = await audio.read()

        # 1. Transcribe audio
        audio_file = io.BytesIO(audio_data)
        audio_file.name = "audio.webm"

        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            language=voice_settings.get("language", "en")[:2]
        )

        # 2. Process with AI
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": transcription["text"]}
            ]
        )

        response_text = response.choices[0].message.content

        # 3. Generate speech
        speech_response = client.audio.speech.create(
            model="tts-1",
            voice=voice_settings.get("voiceId", "alloy"),
            input=response_text,
            speed=voice_settings.get("rate", 1.0)
        )

        # 4. Return audio
        return Response(
            content=speech_response.content,
            media_type="audio/mpeg"
        )

    except Exception as e:
        return JSONResponse(status_code=500, content={"error": str(e)})

Mastra Agent Integration

When using Cedar-OS with the Mastra provider and voiceRoute configuration, your Mastra backend should handle requests at the specified route. Here’s how to implement the voice handler:
import { Agent } from '@mastra/core';
import { openai } from '@ai-sdk/openai';
import OpenAI from 'openai';

// Separate OpenAI SDK client for transcription (STT) and speech (TTS)
const openaiClient = new OpenAI();

const agent = new Agent({
	name: 'voice-assistant',
	instructions: 'You are a helpful voice assistant.',
	model: openai('gpt-4o'),
});

export async function handleVoiceRequest(
	audioBlob: Blob,
	settings: VoiceSettings,
	context: string
) {
	// 1. Transcribe audio
	const transcription = await openaiClient.audio.transcriptions.create({
		file: new File([audioBlob], 'audio.webm', { type: 'audio/webm' }),
		model: 'whisper-1',
		language: settings.language?.split('-')[0] || 'en',
	});

	// 2. Add context to the conversation
	const contextData = JSON.parse(context);
	const systemMessage = `Additional context: ${JSON.stringify(contextData)}`;

	// 3. Generate response with agent
	const response = await agent.generate([
		{ role: 'system', content: systemMessage },
		{ role: 'user', content: transcription.text },
	]);

	// 4. Convert to speech
	const speech = await openaiClient.audio.speech.create({
		model: 'tts-1',
		voice: settings.voiceId || 'alloy',
		input: response.text,
		speed: settings.rate || 1.0,
	});

	return {
		transcription: transcription.text,
		text: response.text,
		audioData: Buffer.from(await speech.arrayBuffer()).toString('base64'), // base64 so it survives res.json()
		audioFormat: 'mp3',
	};
}

// Example Mastra route setup
// If you configured voiceRoute: '/chat/voice-execute' in Cedar-OS,
// your Mastra backend should handle POST requests to this route:

app.post(
	'/chat/voice-execute',
	upload.fields([
		{ name: 'audio', maxCount: 1 },
		{ name: 'settings', maxCount: 1 },
		{ name: 'context', maxCount: 1 },
	]),
	async (req, res) => {
		const audioFile = req.files.audio[0];
		const settings = JSON.parse(req.body.settings);
		const context = req.body.context;

		const result = await handleVoiceRequest(
			new Blob([audioFile.buffer]),
			settings,
			context
		);

		// Return the structured response
		res.json(result);
	}
);

Error Handling

Your backend should handle various error cases:
app.post('/api/chat/voice', async (req, res) => {
	// Captured during processing so the catch block can fall back to a text-only response
	let responseText;
	let userInput;

	try {
		// ... processing logic (sets userInput from the transcription, responseText from the LLM)
	} catch (error) {
		console.error('Voice processing error:', error);

		// Return appropriate error response
		if (error.code === 'AUDIO_TRANSCRIPTION_FAILED') {
			return res.status(400).json({
				error: 'Could not understand the audio. Please try again.',
				code: 'TRANSCRIPTION_ERROR',
			});
		}

		if (error.code === 'TTS_FAILED') {
			return res.status(500).json({
				error: 'Failed to generate speech response.',
				code: 'TTS_ERROR',
				// Fallback to text-only response
				text: responseText,
				transcription: userInput,
			});
		}

		// Generic error
		res.status(500).json({
			error: 'Internal server error processing voice request',
			code: 'INTERNAL_ERROR',
		});
	}
});

CORS Configuration

Ensure your backend allows CORS for the frontend domain:
import cors from 'cors';

app.use(
	cors({
		origin: ['http://localhost:3000', 'https://your-app-domain.com'],
		methods: ['POST'],
		allowedHeaders: ['Content-Type', 'Authorization'],
	})
);

Performance Considerations

Audio Processing

  • Use streaming transcription for real-time responses
  • Implement audio compression to reduce bandwidth
  • Cache TTS responses for common phrases (see the sketch below)
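
A TTS cache can be as simple as an in-memory map keyed by text and voice. A sketch (swap in Redis or similar for production):
// Simple in-memory TTS cache keyed by voice + text
const ttsCache = new Map();

async function getSpeech(text, voiceId) {
	const key = `${voiceId || 'alloy'}:${text}`;
	if (ttsCache.has(key)) return ttsCache.get(key);

	const speech = await openai.audio.speech.create({
		model: 'tts-1',
		voice: voiceId || 'alloy',
		input: text,
	});
	const buffer = Buffer.from(await speech.arrayBuffer());
	ttsCache.set(key, buffer);
	return buffer;
}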

Response Optimization

  • Stream audio responses when possible
  • Use CDN for serving generated audio files
  • Implement request queuing for high-traffic scenarios
// Example streaming response
app.post('/api/chat/voice-stream', async (req, res) => {
	res.writeHead(200, {
		'Content-Type': 'audio/mpeg',
		'Transfer-Encoding': 'chunked',
	});

	// generateAudioStream is your own streaming TTS helper; transcription comes from the STT step
	const audioStream = await generateAudioStream(transcription);

	audioStream.on('data', (chunk) => {
		res.write(chunk);
	});

	audioStream.on('end', () => {
		res.end();
	});
});

Security Best Practices

  1. Rate Limiting: Prevent abuse of voice endpoints
  2. Authentication: Verify user permissions
  3. Input Validation: Sanitize audio data and settings
  4. Content Filtering: Screen transcriptions for inappropriate content (see the sketch after the rate-limiting example)
import rateLimit from 'express-rate-limit';

const voiceRateLimit = rateLimit({
	windowMs: 15 * 60 * 1000, // 15 minutes
	max: 50, // 50 requests per window
	message: 'Too many voice requests, please try again later.',
});

app.post('/api/chat/voice', voiceRateLimit, handleVoiceRequest);
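
For content filtering (item 4), one option is to screen the transcription before it reaches your agent, for example with OpenAI's moderation endpoint. A sketch, to be adapted to your own policy:
// Screen the transcribed text before generating a response
const moderation = await openai.moderations.create({
	input: transcription.text,
});

if (moderation.results[0].flagged) {
	return res.status(400).json({
		error: 'This request cannot be processed.',
		code: 'CONTENT_FILTERED',
		transcription: transcription.text,
	});
}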

Testing Your Integration

Test your voice endpoint with curl:
# Send a pre-recorded test file (e.g. test-audio.webm) to your voice endpoint
curl -X POST http://localhost:3456/api/chat/voice \
  -F "audio=@test-audio.webm" \
  -F "settings={\"language\":\"en-US\",\"rate\":1.0}" \
  -F "context={\"files\":[],\"state\":{}}" \
  -o response.mp3
Or use the Cedar voice system directly for end-to-end testing:
// With Mastra provider (automatic configuration)
<CedarCopilot
	llmProvider={{
		provider: 'mastra',
		baseURL: 'http://localhost:3456/api',
		voiceRoute: '/chat/voice-execute',
	}}>
	<YourApp />
</CedarCopilot>;

// Or manually configure the endpoint
const voice = useCedarStore((state) => state.voice);
voice.setVoiceEndpoint('http://localhost:3456/api/chat/voice');
await voice.requestVoicePermission();
voice.startListening();