Voice Integration

Cedar OS provides comprehensive voice support that integrates seamlessly with your agent backend. Voice processing is now handled through the agent connection system, providing a unified approach across different providers.

Voice Processing Architecture

As of the latest update, voice processing works out of the box for Mastra, AI SDK, and custom backends:
  • Mastra/Custom backends: Voice data is sent directly to your voice endpoint.
  • AI SDK/OpenAI providers: Voice is transcribed using Whisper, then processed through the LLM.
See the Voice Endpoint Format documentation for implementation details. This feature is still in beta. If you want more details, please check out the repo, book a call at https://calendly.com/jesse-cedarcopilot/30min, or join our Discord.

Quick Start

Initialize Cedar with voice settings (already configured with the built-in Cedar chat input):
import { CedarCopilot } from 'cedar-os';

function App() {
	return (
		<CedarCopilot
			llmProvider={{
				provider: 'openai',
				apiKey: process.env.OPENAI_API_KEY!,
			}}
			voiceSettings={{
				language: 'en-US',
				voiceId: 'alloy', // OpenAI voices: alloy, echo, fable, onyx, nova, shimmer
				useBrowserTTS: false, // Use OpenAI TTS instead of browser
				autoAddToMessages: true, // Add voice interactions to chat history
				pitch: 1.0,
				rate: 1.0,
				volume: 1.0,
				endpoint: '/api/voice', // Optional: Custom voice endpoint
			}}>
			{/* Your app content */}
		</CedarCopilot>
	);
}

Voice Settings

The voiceSettings prop accepts a partial configuration object with the following TypeScript interface:
interface VoiceSettings {
	language: string;                // Required - Language code for speech recognition/synthesis
	voiceId?: string;               // Optional - Voice identifier (provider-specific)
	pitch?: number;                 // Optional - Voice pitch modulation (0.5-2.0)
	rate?: number;                  // Optional - Speech rate (0.5-2.0)
	volume?: number;                // Optional - Audio volume (0.0-1.0)
	useBrowserTTS?: boolean;        // Optional - Use browser's built-in TTS
	autoAddToMessages?: boolean;    // Optional - Add voice interactions to chat
	endpoint?: string;              // Optional - Custom voice endpoint URL
}

// Usage: All properties except 'language' are optional
voiceSettings?: Partial<VoiceSettings>

Default Values

Setting             Type      Default      Description
language            string    'en-US'      Language code for speech recognition/synthesis
voiceId             string    undefined    Voice identifier (provider-specific)
pitch               number    1.0          Voice pitch modulation (0.5-2.0)
rate                number    1.0          Speech rate (0.5-2.0)
volume              number    1.0          Audio volume (0.0-1.0)
useBrowserTTS       boolean   false        Use browser's built-in TTS
autoAddToMessages   boolean   true         Add voice interactions to chat
endpoint            string    undefined    Custom voice endpoint URL
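Since everything except language has a default, a minimal configuration can override just the fields you care about and rely on the defaults above. For example:
<CedarCopilot
	llmProvider={{
		provider: 'openai',
		apiKey: process.env.OPENAI_API_KEY!,
	}}
	voiceSettings={{
		language: 'fr-FR', // Everything else falls back to the defaults above
	}}>
	{/* Your app content */}
</CedarCopilot>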

Provider-Specific Configuration

OpenAI / AI SDK

<CedarCopilot
  llmProvider={{
    provider: 'ai-sdk',
    providers: {
      openai: { apiKey: process.env.OPENAI_API_KEY! },
    },
  }}
  voiceSettings={{
    voiceId: 'nova', // OpenAI voice selection
    useBrowserTTS: false, // Use OpenAI TTS
  }}>
  {/* Your app content */}
</CedarCopilot>

Mastra

<CedarCopilot
  llmProvider={{
    provider: 'mastra',
    baseURL: 'https://your-mastra-api.com',
    apiKey: process.env.MASTRA_API_KEY,
    voiceRoute: '/voice', // Auto-configures voice endpoint
  }}
  voiceSettings={{
    language: 'en-US',
    autoAddToMessages: true,
    // endpoint is auto-configured from voiceRoute
  }}>
  {/* Your app content */}
</CedarCopilot>

Custom Backend

<CedarCopilot
  llmProvider={{
    provider: 'custom',
    // ... custom provider config
  }}
  voiceSettings={{
    endpoint: 'https://your-api.com/voice',
    useBrowserTTS: true, // Use browser TTS if backend doesn't provide audio
  }}>
  {/* Your app content */}
</CedarCopilot>

Overview

Cedar-OS provides a complete voice integration system that enables natural voice conversations with AI agents. The voice system handles audio capture, streaming to backend services, and automatic playback of responses, with seamless integration into the messaging system.

Features

  • 🎤 Voice Capture: Browser-based audio recording with permission management
  • 🔊 Audio Playback: Automatic playback of audio responses from agents
  • 🌐 Streaming Support: Real-time audio streaming to configurable endpoints
  • 🔧 Flexible Configuration: Customizable voice settings (language, pitch, rate, volume)
  • 🎯 State Management: Full integration with Cedar’s Zustand-based store
  • 💬 Message Integration: Automatic addition of voice interactions to chat history
  • 🛡️ Error Handling: Comprehensive error states and recovery
  • 🎨 Visual Indicators: Animated voice status indicators

ChatInput Component with Built-in Voice

The ChatInput component from cedar-os-components comes with voice functionality built-in, providing a seamless voice experience out of the box. When you use this component, voice capabilities are automatically available without additional configuration.

How It Works

The ChatInput component integrates voice through the following key features:
import { useVoice } from 'cedar-os';

// The useVoice hook provides complete access to voice state and controls
const voice = useVoice();

// Available properties
voice.isListening; // Is currently recording audio
voice.isSpeaking; // Is playing audio response
voice.voicePermissionStatus; // 'granted' | 'denied' | 'prompt' | 'not-supported'
voice.voiceError; // Error message if any
voice.voiceSettings; // Current voice configuration

// Available methods
voice.checkVoiceSupport(); // Check browser compatibility
voice.requestVoicePermission(); // Request microphone access
voice.toggleVoice(); // Start/stop recording
voice.startListening(); // Start recording
voice.stopListening(); // Stop recording
voice.updateVoiceSettings(settings); // Update configuration (accepts a partial VoiceSettings object)
voice.resetVoiceState(); // Clean up resources
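
For example, a microphone control outside of ChatInput can be built with the same hook. This is a sketch using only the properties and methods listed above:
import { useVoice } from 'cedar-os';

function MicButton() {
	const voice = useVoice();

	const handleClick = async () => {
		if (!voice.checkVoiceSupport()) return; // Browser lacks required APIs
		if (voice.voicePermissionStatus === 'prompt') {
			await voice.requestVoicePermission(); // Ask for microphone access
		}
		voice.toggleVoice(); // Start or stop recording
	};

	return (
		<button
			onClick={handleClick}
			disabled={voice.voicePermissionStatus === 'denied'}>
			{voice.isListening ? 'Stop' : 'Speak'}
		</button>
	);
}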

ChatInput Implementation Details

The ChatInput component automatically:
  1. Displays a microphone button that changes appearance based on voice state
  2. Shows the VoiceIndicator when voice is active (listening or speaking)
  3. Handles keyboard shortcuts - Press ‘M’ to toggle voice (when not typing)
  4. Manages permissions - Automatically requests microphone access when needed
  5. Provides visual feedback - Button animations and color changes for different states
// The mic button automatically changes appearance
function getMicButtonClass() {
  if (voice.isListening) {
    // Red pulsing animation when recording
    return 'text-red-500 animate-pulse';
  }
  if (voice.isSpeaking) {
    // Green when playing response
    return 'text-green-500';
  }
  if (voice.voicePermissionStatus === 'denied') {
    // Grayed out if permission denied
    return 'text-gray-400 cursor-not-allowed';
  }
  // Default state
  return 'text-gray-600 hover:text-black';
}

Exported Components and Hooks

All voice-related functionality is exported from the main cedar-os package:
// Main exports from 'cedar-os'
import { useVoice } from 'cedar-os'; // Voice control hook
import { VoiceIndicator } from 'cedar-os'; // Voice status component
import { cn } from 'cedar-os'; // Utility for className merging

// The ChatInput component is in cedar-os-components
import { ChatInput } from 'cedar-os-components';
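
Using the built-in component is then minimal. A sketch, assuming ChatInput's default props suffice (check the component's props for customization):
import { CedarCopilot } from 'cedar-os';
import { ChatInput } from 'cedar-os-components';

function App() {
	return (
		<CedarCopilot
			llmProvider={{
				provider: 'openai',
				apiKey: process.env.OPENAI_API_KEY!,
			}}>
			{/* Voice (mic button, indicator, 'M' shortcut) works out of the box */}
			<ChatInput />
		</CedarCopilot>
	);
}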

Quick Start

1. Automatic Configuration with Mastra

The easiest way to set up voice integration is through the Mastra provider configuration:
import { CedarCopilot } from 'cedar-os';

function App() {
	return (
		<CedarCopilot
			llmProvider={{
				provider: 'mastra',
				baseURL: 'http://localhost:3000/api',
				voiceRoute: '/chat/voice-execute', // Automatically configures voice endpoint
			}}>
			<YourVoiceApp />
		</CedarCopilot>
	);
}
When you specify a voiceRoute in your Mastra configuration, Cedar-OS automatically sets the voice endpoint to baseURL + voiceRoute.
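In other words, the endpoint used for voice requests resolves to the concatenation of the two values:
// voiceRoute is appended to baseURL:
// 'http://localhost:3000/api' + '/chat/voice-execute'
//   → 'http://localhost:3000/api/chat/voice-execute'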

2. Manual Configuration

For non-Mastra providers or custom setups, configure the voice endpoint manually:
import { useCedarStore } from 'cedar-os';
import { useEffect } from 'react';

function VoiceChat() {
	const voice = useCedarStore((state) => state.voice);

	useEffect(() => {
		// Configure the voice endpoint manually
		voice.setVoiceEndpoint('http://localhost:3456/api/chat/voice');

		// Cleanup on unmount
		return () => {
			voice.resetVoiceState();
		};
	}, []);

	return (
		<div>
			<button
				onClick={() => voice.toggleVoice()}
				disabled={voice.voicePermissionStatus !== 'granted'}>
				{voice.isListening ? 'Stop Listening' : 'Start Listening'}
			</button>

			{voice.voiceError && <div className='error'>{voice.voiceError}</div>}
		</div>
	);
}

3. Request Microphone Permission

const handleEnableVoice = async () => {
	if (!voice.checkVoiceSupport()) {
		alert('Voice features are not supported in your browser');
		return;
	}

	await voice.requestVoicePermission();

	if (voice.voicePermissionStatus === 'granted') {
		console.log('Voice enabled!');
	}
};

4. Using the Voice Indicator

import { VoiceIndicator, useCedarStore } from 'cedar-os';

function App() {
	const voice = useCedarStore((state) => state.voice);

	return (
		<div>
			<VoiceIndicator voiceState={voice} />
			{/* Rest of your app */}
		</div>
	);
}

Provider-Specific Voice Configuration

Mastra Provider

When using the Mastra provider, voice configuration is streamlined through the provider setup:
<CedarCopilot
	llmProvider={{
		provider: 'mastra',
		baseURL: 'http://localhost:3000/api',
		chatPath: '/chat',
		voiceRoute: '/chat/voice-execute', // Automatically sets voice endpoint
	}}>
	<YourApp />
</CedarCopilot>
Benefits of Mastra voice integration:
  • Automatic endpoint configuration
  • Consistent routing with chat endpoints
  • Built-in context passing
  • Structured response handling

Other Providers

For OpenAI, Anthropic, AI SDK, or custom providers, configure the voice endpoint manually:
const voice = useCedarStore((state) => state.voice);

useEffect(() => {
	// Set your custom voice endpoint
	voice.setVoiceEndpoint('https://your-backend.com/api/voice');
}, []);

Voice State

The voice slice manages comprehensive state for voice interactions:
interface VoiceState {
	// Core state
	isVoiceEnabled: boolean;
	isListening: boolean;
	isSpeaking: boolean;
	voiceEndpoint: string;
	voicePermissionStatus: 'granted' | 'denied' | 'prompt' | 'not-supported';
	voiceError: string | null;

	// Audio resources
	audioStream: MediaStream | null;
	audioContext: AudioContext | null;
	mediaRecorder: MediaRecorder | null;

	// Voice settings
	voiceSettings: {
		language: string; // Required - Language code (e.g., 'en-US')
		voiceId?: string; // Optional - Voice ID for TTS (provider-specific)
		pitch?: number; // Optional - Voice pitch (0.5 to 2.0)
		rate?: number; // Optional - Speech rate (0.5 to 2.0)
		volume?: number; // Optional - Audio volume (0.0 to 1.0)
		useBrowserTTS?: boolean; // Optional - Use browser TTS instead of backend
		autoAddToMessages?: boolean; // Optional - Add voice interactions to messages
		endpoint?: string; // Optional - Custom voice endpoint URL
	};
}
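
Because this slice lives in Cedar's Zustand-based store, individual fields can be selected directly, which limits re-renders to the state a component actually reads. A small sketch:
import { useCedarStore } from 'cedar-os';

function ListeningBadge() {
	// Subscribe to a single field instead of the whole voice slice
	const isListening = useCedarStore((state) => state.voice.isListening);
	return <span>{isListening ? 'Recording…' : 'Idle'}</span>;
}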

Available Actions

Permission Management

  • checkVoiceSupport() - Check if browser supports voice features
  • requestVoicePermission() - Request microphone access

Voice Control

  • startListening() - Start recording audio
  • stopListening() - Stop recording and send to endpoint
  • toggleVoice() - Toggle between listening and idle states

Audio Processing

  • streamAudioToEndpoint(audioData) - Send audio to backend
  • playAudioResponse(audioUrl) - Play audio response

Configuration

  • setVoiceEndpoint(endpoint) - Set the backend endpoint URL
  • updateVoiceSettings(settings) - Update voice configuration
  • setVoiceError(error) - Set error message
  • resetVoiceState() - Clean up and reset all voice state
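
Put together, a manual round-trip with these actions might look like the following sketch (stopListening() handles sending the recording to the configured endpoint):
const voice = useCedarStore((state) => state.voice);

// One-time setup
voice.setVoiceEndpoint('https://your-backend.com/api/voice');
voice.updateVoiceSettings({ language: 'en-US', volume: 0.8 });

// Start recording (e.g. on mic button press)
voice.startListening();

// Later, stop recording — this also sends the captured audio to the endpoint
voice.stopListening();

// When tearing the UI down, release audio resources
voice.resetVoiceState();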

Message Integration

By default, voice interactions are automatically added to the Cedar messages store, creating a seamless conversation history:
// User speech is transcribed and added as a user message
{
  type: 'text',
  role: 'user',
  content: 'Show me the latest reports',
  metadata: {
    source: 'voice',
    timestamp: '2024-01-01T12:00:00Z'
  }
}

// Agent response is added as an assistant message
{
  type: 'text',
  role: 'assistant',
  content: 'Here are your latest reports...',
  metadata: {
    source: 'voice',
    usage: { /* token usage data */ },
    timestamp: '2024-01-01T12:00:01Z'
  }
}
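
Because voice interactions carry source: 'voice' in their metadata, they can be filtered out of the conversation history. A sketch, assuming the messages slice exposes an array at state.messages (check the Messages documentation for the actual shape):
import { useCedarStore } from 'cedar-os';

function VoiceTranscript() {
	// Hypothetical selector — verify the messages slice shape in your version
	const messages = useCedarStore((state) => state.messages);
	const voiceMessages = messages.filter((m) => m.metadata?.source === 'voice');

	return (
		<ul>
			{voiceMessages.map((m, i) => (
				<li key={i}>
					{m.role}: {m.content}
				</li>
			))}
		</ul>
	);
}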

Disabling Message Integration

voice.updateVoiceSettings({
	autoAddToMessages: false,
});

Browser Compatibility

The voice system requires modern browser APIs:
  • navigator.mediaDevices.getUserMedia - Audio capture
  • MediaRecorder API - Audio recording
  • AudioContext API - Audio processing
Supported browsers:
  • Chrome/Edge 47+
  • Firefox 25+
  • Safari 11+
  • Opera 34+
HTTPS is required for microphone access in production environments (localhost is exempt).
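
You can probe for these APIs directly before showing any voice UI. A minimal check, mirroring (but not necessarily identical to) what checkVoiceSupport() tests for:
// True when the required browser APIs are available
const voiceSupported =
	typeof navigator !== 'undefined' &&
	!!navigator.mediaDevices?.getUserMedia &&
	typeof MediaRecorder !== 'undefined' &&
	typeof AudioContext !== 'undefined';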

Examples

Complete Voice Chat Component

import { useCedarStore, VoiceIndicator } from 'cedar-os';
import { useEffect } from 'react';

export function VoiceChat() {
	const voice = useCedarStore((state) => state.voice);

	useEffect(() => {
		voice.setVoiceEndpoint('http://localhost:3456/api/chat/voice');
		return () => voice.resetVoiceState();
	}, []);

	const handleVoiceToggle = async () => {
		if (voice.voicePermissionStatus === 'prompt') {
			await voice.requestVoicePermission();
		}

		if (voice.voicePermissionStatus === 'granted') {
			voice.toggleVoice();
		}
	};

	return (
		<div className='voice-chat'>
			<VoiceIndicator voiceState={voice} />

			<button
				onClick={handleVoiceToggle}
				className={`voice-button ${voice.isListening ? 'listening' : ''}`}
				disabled={voice.voicePermissionStatus === 'denied'}>
				{voice.isListening ? 'Stop' : 'Talk'}
			</button>

			{voice.voiceError && (
				<div className='error-message'>{voice.voiceError}</div>
			)}

			<div className='voice-settings'>
				<select
					value={voice.voiceSettings.language}
					onChange={(e) =>
						voice.updateVoiceSettings({ language: e.target.value })
					}>
					<option value='en-US'>English (US)</option>
					<option value='en-GB'>English (UK)</option>
					<option value='es-ES'>Spanish</option>
					<option value='fr-FR'>French</option>
				</select>
			</div>
		</div>
	);
}