Voice Integration

Cedar OS provides comprehensive voice support that integrates seamlessly with your agent backend. Voice processing is now handled through the agent connection system, providing a unified approach across different providers.

Voice Processing Architecture

As of the latest update, voice processing works out of the box for Mastra, AI SDK, and custom backends:
  • Mastra/Custom backends: Voice data is sent directly to your voice endpoint.
  • AI SDK/OpenAI providers: Voice is transcribed using Whisper, then processed through the LLM.
See the Voice Endpoint Format documentation for implementation details. This feature is still in beta. If you want more details, please check out the repo, book a call at https://calendly.com/jesse-cedarcopilot/30min, or join our Discord.

Quick Start

Initialize Cedar with voice settings (already configured with the built-in Cedar chat input):
import { CedarCopilot } from 'cedar-os';

function App() {
	return (
		<CedarCopilot
			llmProvider={{
				provider: 'openai',
				apiKey: process.env.OPENAI_API_KEY!,
			}}
			voiceSettings={{
				language: 'en-US',
				voiceId: 'alloy', // OpenAI voices: alloy, echo, fable, onyx, nova, shimmer
				useBrowserTTS: false, // Use OpenAI TTS instead of browser
				autoAddToMessages: true, // Add voice interactions to chat history
				pitch: 1.0,
				rate: 1.0,
				volume: 1.0,
				endpoint: '/api/voice', // Optional: Custom voice endpoint
			}}>
			{/* Your app content */}
		</CedarCopilot>
	);
}

Voice Settings

The voiceSettings prop accepts a partial configuration object with the following TypeScript interface:
interface VoiceSettings {
	language: string;                // Required - Language code for speech recognition/synthesis
	voiceId?: string;               // Optional - Voice identifier (provider-specific)
	pitch?: number;                 // Optional - Voice pitch modulation (0.5-2.0)
	rate?: number;                  // Optional - Speech rate (0.5-2.0)
	volume?: number;                // Optional - Audio volume (0.0-1.0)
	useBrowserTTS?: boolean;        // Optional - Use browser's built-in TTS
	autoAddToMessages?: boolean;    // Optional - Add voice interactions to chat
	endpoint?: string;              // Optional - Custom voice endpoint URL
}

// Usage: All properties except 'language' are optional
voiceSettings?: Partial<VoiceSettings>

Default Values

Setting             Type      Default      Description
language            string    'en-US'      Language code for speech recognition/synthesis
voiceId             string    undefined    Voice identifier (provider-specific)
pitch               number    1.0          Voice pitch modulation (0.5-2.0)
rate                number    1.0          Speech rate (0.5-2.0)
volume              number    1.0          Audio volume (0.0-1.0)
useBrowserTTS       boolean   false        Use browser's built-in TTS
autoAddToMessages   boolean   true         Add voice interactions to chat
endpoint            string    undefined    Custom voice endpoint URL
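Since everything except language has a default, a minimal configuration can override just the fields you care about and rely on the defaults above. For example:
<CedarCopilot
	llmProvider={{
		provider: 'openai',
		apiKey: process.env.OPENAI_API_KEY!,
	}}
	voiceSettings={{
		language: 'fr-FR', // Everything else falls back to the defaults above
	}}>
	{/* Your app content */}
</CedarCopilot>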

Provider-Specific Configuration

OpenAI / AI SDK

<CedarCopilot
  llmProvider={{
    provider: 'ai-sdk',
    providers: {
      openai: { apiKey: process.env.OPENAI_API_KEY! },
    },
  }}
  voiceSettings={{
    voiceId: 'nova', // OpenAI voice selection
    useBrowserTTS: false, // Use OpenAI TTS
  }}>
  {/* Your app content */}
</CedarCopilot>

Mastra

<CedarCopilot
  llmProvider={{
    provider: 'mastra',
    baseURL: 'https://your-mastra-api.com',
    apiKey: process.env.MASTRA_API_KEY,
    voiceRoute: '/voice', // Auto-configures voice endpoint
  }}
  voiceSettings={{
    language: 'en-US',
    autoAddToMessages: true,
    // endpoint is auto-configured from voiceRoute
  }}>
  {/* Your app content */}
</CedarCopilot>

Custom Backend

<CedarCopilot
  llmProvider={{
    provider: 'custom',
    // ... custom provider config
  }}
  voiceSettings={{
    endpoint: 'https://your-api.com/voice',
    useBrowserTTS: true, // Use browser TTS if backend doesn't provide audio
  }}>
  {/* Your app content */}
</CedarCopilot>

Overview

Cedar-OS provides a complete voice integration system that enables natural voice conversations with AI agents. The voice system handles audio capture, streaming to backend services, and automatic playback of responses, with seamless integration into the messaging system.

Features

  • 🎤 Voice Capture: Browser-based audio recording with permission management
  • 🔊 Audio Playback: Automatic playback of audio responses from agents
  • 🌐 Streaming Support: Real-time audio streaming to configurable endpoints
  • 🔧 Flexible Configuration: Customizable voice settings (language, pitch, rate, volume)
  • 🎯 State Management: Full integration with Cedar’s Zustand-based store
  • 💬 Message Integration: Automatic addition of voice interactions to chat history
  • 🛡️ Error Handling: Comprehensive error states and recovery
  • 🎨 Visual Indicators: Animated voice status indicators

ChatInput Component with Built-in Voice

The ChatInput component from cedar-os-components comes with voice functionality built-in, providing a seamless voice experience out of the box. When you use this component, voice capabilities are automatically available without additional configuration.

How It Works

The ChatInput component integrates voice through the following key features:
import { useVoice } from 'cedar-os';

// The useVoice hook provides complete access to voice state and controls
const voice = useVoice();

// Available properties
voice.isListening; // Is currently recording audio
voice.isSpeaking; // Is playing audio response
voice.voicePermissionStatus; // 'granted' | 'denied' | 'prompt' | 'not-supported'
voice.voiceError; // Error message if any
voice.voiceSettings; // Current voice configuration

// Available methods
voice.checkVoiceSupport(); // Check browser compatibility
voice.requestVoicePermission(); // Request microphone access
voice.toggleVoice(); // Start/stop recording
voice.startListening(); // Start recording
voice.stopListening(); // Stop recording
voice.updateVoiceSettings(settings); // Update configuration (accepts a partial VoiceSettings object)
voice.resetVoiceState(); // Clean up resources
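
For example, a microphone control outside of ChatInput can be built with the same hook. This is a sketch using only the properties and methods listed above:
import { useVoice } from 'cedar-os';

function MicButton() {
	const voice = useVoice();

	const handleClick = async () => {
		if (!voice.checkVoiceSupport()) return; // Browser lacks required APIs
		if (voice.voicePermissionStatus === 'prompt') {
			await voice.requestVoicePermission(); // Ask for microphone access
		}
		voice.toggleVoice(); // Start or stop recording
	};

	return (
		<button
			onClick={handleClick}
			disabled={voice.voicePermissionStatus === 'denied'}>
			{voice.isListening ? 'Stop' : 'Speak'}
		</button>
	);
}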

ChatInput Implementation Details

The ChatInput component automatically:
  1. Displays a microphone button that changes appearance based on voice state
  2. Shows the VoiceIndicator when voice is active (listening or speaking)
  3. Handles keyboard shortcuts - Press ‘M’ to toggle voice (when not typing)
  4. Manages permissions - Automatically requests microphone access when needed
  5. Provides visual feedback - Button animations and color changes for different states
// The mic button automatically changes appearance
function getMicButtonClass() {
  if (voice.isListening) {
    // Red pulsing animation when recording
    return 'text-red-500 animate-pulse';
  }
  if (voice.isSpeaking) {
    // Green when playing response
    return 'text-green-500';
  }
  if (voice.voicePermissionStatus === 'denied') {
    // Grayed out if permission denied
    return 'text-gray-400 cursor-not-allowed';
  }
  // Default state
  return 'text-gray-600 hover:text-black';
}

Exported Components and Hooks

All voice-related functionality is exported from the main cedar-os package:
// Main exports from 'cedar-os'
import { useVoice } from 'cedar-os'; // Voice control hook
import { VoiceIndicator } from 'cedar-os'; // Voice status component
import { cn } from 'cedar-os'; // Utility for className merging

// The ChatInput component is in cedar-os-components
import { ChatInput } from 'cedar-os-components';
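
Using the built-in component is then minimal. A sketch, assuming ChatInput's default props suffice (check the component's props for customization):
import { CedarCopilot } from 'cedar-os';
import { ChatInput } from 'cedar-os-components';

function App() {
	return (
		<CedarCopilot
			llmProvider={{
				provider: 'openai',
				apiKey: process.env.OPENAI_API_KEY!,
			}}>
			{/* Voice (mic button, indicator, 'M' shortcut) works out of the box */}
			<ChatInput />
		</CedarCopilot>
	);
}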

Quick Start

1. Automatic Configuration with Mastra

The easiest way to set up voice integration is through the Mastra provider configuration:
import { CedarCopilot } from 'cedar-os';

function App() {
	return (
		<CedarCopilot
			llmProvider={{
				provider: 'mastra',
				baseURL: 'http://localhost:3000/api',
				voiceRoute: '/chat/voice-execute', // Automatically configures voice endpoint
			}}>
			<YourVoiceApp />
		</CedarCopilot>
	);
}
When you specify a voiceRoute in your Mastra configuration, Cedar-OS automatically sets the voice endpoint to baseURL + voiceRoute.
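In other words, the endpoint used for voice requests resolves to the concatenation of the two values:
// voiceRoute is appended to baseURL:
// 'http://localhost:3000/api' + '/chat/voice-execute'
//   → 'http://localhost:3000/api/chat/voice-execute'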

2. Manual Configuration

For non-Mastra providers or custom setups, configure the voice endpoint manually:
import { useCedarStore } from 'cedar-os';
import { useEffect } from 'react';

function VoiceChat() {
	const voice = useCedarStore((state) => state.voice);

	useEffect(() => {
		// Configure the voice endpoint manually
		voice.setVoiceEndpoint('http://localhost:3456/api/chat/voice');

		// Cleanup on unmount
		return () => {
			voice.resetVoiceState();
		};
	}, []);

	return (
		<div>
			<button
				onClick={() => voice.toggleVoice()}
				disabled={voice.voicePermissionStatus !== 'granted'}>
				{voice.isListening ? 'Stop Listening' : 'Start Listening'}
			</button>

			{voice.voiceError && <div className='error'>{voice.voiceError}</div>}
		</div>
	);
}

3. Request Microphone Permission

const handleEnableVoice = async () => {
	if (!voice.checkVoiceSupport()) {
		alert('Voice features are not supported in your browser');
		return;
	}

	await voice.requestVoicePermission();

	if (voice.voicePermissionStatus === 'granted') {
		console.log('Voice enabled!');
	}
};

4. Using the Voice Indicator

import { VoiceIndicator, useCedarStore } from 'cedar-os';

function App() {
	const voice = useCedarStore((state) => state.voice);

	return (
		<div>
			<VoiceIndicator voiceState={voice} />
			{/* Rest of your app */}
		</div>
	);
}

Provider-Specific Voice Configuration

Mastra Provider

When using the Mastra provider, voice configuration is streamlined through the provider setup:
<CedarCopilot
	llmProvider={{
		provider: 'mastra',
		baseURL: 'http://localhost:3000/api',
		chatPath: '/chat',
		voiceRoute: '/chat/voice-execute', // Automatically sets voice endpoint
	}}>
	<YourApp />
</CedarCopilot>
Benefits of Mastra voice integration:
  • Automatic endpoint configuration
  • Consistent routing with chat endpoints
  • Built-in context passing
  • Structured response handling

Other Providers

For OpenAI, Anthropic, AI SDK, or custom providers, configure the voice endpoint manually:
const voice = useCedarStore((state) => state.voice);

useEffect(() => {
	// Set your custom voice endpoint
	voice.setVoiceEndpoint('https://your-backend.com/api/voice');
}, []);

Voice State

The voice slice manages comprehensive state for voice interactions:
interface VoiceState {
	// Core state
	isVoiceEnabled: boolean;
	isListening: boolean;
	isSpeaking: boolean;
	voiceEndpoint: string;
	voicePermissionStatus: 'granted' | 'denied' | 'prompt' | 'not-supported';
	voiceError: string | null;

	// Audio resources
	audioStream: MediaStream | null;
	audioContext: AudioContext | null;
	mediaRecorder: MediaRecorder | null;

	// Voice settings
	voiceSettings: {
		language: string; // Required - Language code (e.g., 'en-US')
		voiceId?: string; // Optional - Voice ID for TTS (provider-specific)
		pitch?: number; // Optional - Voice pitch (0.5 to 2.0)
		rate?: number; // Optional - Speech rate (0.5 to 2.0)
		volume?: number; // Optional - Audio volume (0.0 to 1.0)
		useBrowserTTS?: boolean; // Optional - Use browser TTS instead of backend
		autoAddToMessages?: boolean; // Optional - Add voice interactions to messages
		endpoint?: string; // Optional - Custom voice endpoint URL
	};
}
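
Because this slice lives in Cedar's Zustand-based store, individual fields can be selected directly, which limits re-renders to the state a component actually reads. A small sketch:
import { useCedarStore } from 'cedar-os';

function ListeningBadge() {
	// Subscribe to a single field instead of the whole voice slice
	const isListening = useCedarStore((state) => state.voice.isListening);
	return <span>{isListening ? 'Recording…' : 'Idle'}</span>;
}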

Available Actions

Permission Management

  • checkVoiceSupport() - Check if browser supports voice features
  • requestVoicePermission() - Request microphone access

Voice Control

  • startListening() - Start recording audio
  • stopListening() - Stop recording and send to endpoint
  • toggleVoice() - Toggle between listening and idle states

Audio Processing

  • streamAudioToEndpoint(audioData) - Send audio to backend
  • playAudioResponse(audioUrl) - Play audio response

Configuration

  • setVoiceEndpoint(endpoint) - Set the backend endpoint URL
  • updateVoiceSettings(settings) - Update voice configuration
  • setVoiceError(error) - Set error message
  • resetVoiceState() - Clean up and reset all voice state
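
Put together, a manual round-trip with these actions might look like the following sketch (stopListening() handles sending the recording to the configured endpoint):
const voice = useCedarStore((state) => state.voice);

// One-time setup
voice.setVoiceEndpoint('https://your-backend.com/api/voice');
voice.updateVoiceSettings({ language: 'en-US', volume: 0.8 });

// Start recording (e.g. on mic button press)
voice.startListening();

// Later, stop recording — this also sends the captured audio to the endpoint
voice.stopListening();

// When tearing the UI down, release audio resources
voice.resetVoiceState();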

Message Integration

By default, voice interactions are automatically added to the Cedar messages store, creating a seamless conversation history:
// User speech is transcribed and added as a user message
{
  type: 'text',
  role: 'user',
  content: 'Show me the latest reports',
  metadata: {
    source: 'voice',
    timestamp: '2024-01-01T12:00:00Z'
  }
}

// Agent response is added as an assistant message
{
  type: 'text',
  role: 'assistant',
  content: 'Here are your latest reports...',
  metadata: {
    source: 'voice',
    usage: { /* token usage data */ },
    timestamp: '2024-01-01T12:00:01Z'
  }
}
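
Because voice interactions carry source: 'voice' in their metadata, they can be filtered out of the conversation history. A sketch, assuming the messages slice exposes an array at state.messages (check the Messages documentation for the actual shape):
import { useCedarStore } from 'cedar-os';

function VoiceTranscript() {
	// Hypothetical selector — verify the messages slice shape in your version
	const messages = useCedarStore((state) => state.messages);
	const voiceMessages = messages.filter((m) => m.metadata?.source === 'voice');

	return (
		<ul>
			{voiceMessages.map((m, i) => (
				<li key={i}>
					{m.role}: {m.content}
				</li>
			))}
		</ul>
	);
}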

Disabling Message Integration

voice.updateVoiceSettings({
	autoAddToMessages: false,
});

Browser Compatibility

The voice system requires modern browser APIs:
  • navigator.mediaDevices.getUserMedia - Audio capture
  • MediaRecorder API - Audio recording
  • AudioContext API - Audio processing
Supported browsers:
  • Chrome/Edge 47+
  • Firefox 25+
  • Safari 11+
  • Opera 34+
HTTPS is required for microphone access in production environments (localhost is exempt).
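
You can probe for these APIs directly before showing any voice UI. A minimal check, mirroring (but not necessarily identical to) what checkVoiceSupport() tests for:
// True when the required browser APIs are available
const voiceSupported =
	typeof navigator !== 'undefined' &&
	!!navigator.mediaDevices?.getUserMedia &&
	typeof MediaRecorder !== 'undefined' &&
	typeof AudioContext !== 'undefined';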

Examples

Complete Voice Chat Component

import { useCedarStore, VoiceIndicator } from 'cedar-os';
import { useEffect } from 'react';

export function VoiceChat() {
	const voice = useCedarStore((state) => state.voice);

	useEffect(() => {
		voice.setVoiceEndpoint('http://localhost:3456/api/chat/voice');
		return () => voice.resetVoiceState();
	}, []);

	const handleVoiceToggle = async () => {
		if (voice.voicePermissionStatus === 'prompt') {
			await voice.requestVoicePermission();
		}

		if (voice.voicePermissionStatus === 'granted') {
			voice.toggleVoice();
		}
	};

	return (
		<div className='voice-chat'>
			<VoiceIndicator voiceState={voice} />

			<button
				onClick={handleVoiceToggle}
				className={`voice-button ${voice.isListening ? 'listening' : ''}`}
				disabled={voice.voicePermissionStatus === 'denied'}>
				{voice.isListening ? 'Stop' : 'Talk'}
			</button>

			{voice.voiceError && (
				<div className='error-message'>{voice.voiceError}</div>
			)}

			<div className='voice-settings'>
				<select
					value={voice.voiceSettings.language}
					onChange={(e) =>
						voice.updateVoiceSettings({ language: e.target.value })
					}>
					<option value='en-US'>English (US)</option>
					<option value='en-GB'>English (UK)</option>
					<option value='es-ES'>Spanish</option>
					<option value='fr-FR'>French</option>
				</select>
			</div>
		</div>
	);
}