Voice Integration
Cedar OS provides comprehensive voice support that integrates seamlessly with your agent backend. Voice processing is handled through the agent connection system, providing a unified approach across different providers.

Voice Processing Architecture
As of the latest update, voice processing works out of the box for Mastra, AI SDK, and custom backends.
- Mastra/custom backends: voice data is sent directly to your voice endpoint.
- AI SDK/OpenAI providers: voice is transcribed using Whisper, then processed through the LLM.
Quick Start
Initialize Cedar with voice settings (already configured with the built-in Cedar chat input):

Voice Settings
The `voiceSettings` prop accepts a partial configuration object with the following TypeScript interface:
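The shape implied by the defaults table below can be sketched as follows (the field names and defaults come from the table; the interface name itself is illustrative):

```typescript
// Sketch of the voiceSettings shape, derived from the defaults table.
interface VoiceSettings {
  language?: string;           // e.g. 'en-US'
  voiceId?: string;            // provider-specific voice identifier
  pitch?: number;              // 0.5 - 2.0
  rate?: number;               // 0.5 - 2.0
  volume?: number;             // 0.0 - 1.0
  useBrowserTTS?: boolean;     // use the browser's built-in TTS
  autoAddToMessages?: boolean; // add voice interactions to chat history
  endpoint?: string;           // custom voice endpoint URL
}

// The documented default values:
const defaults: VoiceSettings = {
  language: 'en-US',
  pitch: 1.0,
  rate: 1.0,
  volume: 1.0,
  useBrowserTTS: false,
  autoAddToMessages: true,
};
```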
Default Values
| Setting | Type | Default | Description |
|---|---|---|---|
| `language` | `string` | `'en-US'` | Language code for speech recognition/synthesis |
| `voiceId` | `string` | `undefined` | Voice identifier (provider-specific) |
| `pitch` | `number` | `1.0` | Voice pitch modulation (0.5-2.0) |
| `rate` | `number` | `1.0` | Speech rate (0.5-2.0) |
| `volume` | `number` | `1.0` | Audio volume (0.0-1.0) |
| `useBrowserTTS` | `boolean` | `false` | Use the browser's built-in TTS |
| `autoAddToMessages` | `boolean` | `true` | Add voice interactions to chat |
| `endpoint` | `string` | `undefined` | Custom voice endpoint URL |
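Initialization might look like the following sketch. The `voiceSettings` prop name comes from this page; the `CedarCopilot` wrapper and the `llmProvider` shape are assumptions about the setup API, so check the provider tabs for the exact form:

```tsx
import React from 'react';
import { CedarCopilot } from 'cedar-os';

// Sketch: wrap your app and pass a partial voiceSettings object.
export function App({ children }: { children: React.ReactNode }) {
  return (
    <CedarCopilot
      llmProvider={{ provider: 'openai', apiKey: process.env.OPENAI_API_KEY! }}
      voiceSettings={{ language: 'en-US', rate: 1.0, autoAddToMessages: true }}
    >
      {children}
    </CedarCopilot>
  );
}
```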
Provider-Specific Configuration
OpenAI / AI SDK
Mastra
Custom Backend
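As a rough sketch of how the three tabs differ (the object shapes below are assumptions for illustration, not the exact Cedar-OS config API):

```typescript
// OpenAI / AI SDK: audio is transcribed with Whisper, then the text is
// routed through the LLM, so standard provider credentials suffice.
const openaiVoiceConfig = {
  provider: 'openai',
  apiKey: 'sk-placeholder', // hypothetical placeholder
};

// Mastra: audio is sent directly to your Mastra voice route.
const mastraVoiceConfig = {
  provider: 'mastra',
  baseURL: 'http://localhost:4111',
  voiceRoute: '/chat/voice',
};

// Custom backend: point Cedar at any endpoint that accepts audio.
const customVoiceConfig = {
  provider: 'custom',
  voiceEndpoint: 'https://example.com/api/voice',
};
```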
Overview
Cedar-OS provides a complete voice integration system that enables natural voice conversations with AI agents. The voice system handles audio capture, streaming to backend services, and automatic playback of responses, with seamless integration into the messaging system.

Features
- 🎤 Voice Capture: Browser-based audio recording with permission management
- 🔊 Audio Playback: Automatic playback of audio responses from agents
- 🌐 Streaming Support: Real-time audio streaming to configurable endpoints
- 🔧 Flexible Configuration: Customizable voice settings (language, pitch, rate, volume)
- 🎯 State Management: Full integration with Cedar’s Zustand-based store
- 💬 Message Integration: Automatic addition of voice interactions to chat history
- 🛡️ Error Handling: Comprehensive error states and recovery
- 🎨 Visual Indicators: Animated voice status indicators
ChatInput Component with Built-in Voice
The `ChatInput` component from `cedar-os-components` comes with voice functionality built in, providing a seamless voice experience out of the box. When you use this component, voice capabilities are automatically available without additional configuration.
How It Works
The ChatInput component integrates voice through the following key features:

ChatInput Implementation Details

The ChatInput component automatically:
- Displays a microphone button that changes appearance based on voice state
- Shows the VoiceIndicator when voice is active (listening or speaking)
- Handles keyboard shortcuts - Press ‘M’ to toggle voice (when not typing)
- Manages permissions - Automatically requests microphone access when needed
- Provides visual feedback - Button animations and color changes for different states
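The 'M' shortcut behavior above can be sketched as a small predicate (a hypothetical helper for illustration, not the component's actual implementation):

```typescript
// Toggle voice on "M", but never while the user is typing in a text field.
function shouldToggleVoice(key: string, targetTag: string): boolean {
  const isTyping = ['INPUT', 'TEXTAREA'].includes(targetTag.toUpperCase());
  return key.toLowerCase() === 'm' && !isTyping;
}
```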
Exported Components and Hooks
All voice-related functionality is exported from the main `cedar-os` package:
Quick Start
1. Automatic Configuration with Mastra
The easiest way to set up voice integration is through the Mastra provider configuration. When you specify a `voiceRoute` in your Mastra configuration, Cedar-OS automatically sets the voice endpoint to `baseURL + voiceRoute`.
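That derivation is plain string concatenation; the values below are illustrative (4111 is Mastra's default dev-server port):

```typescript
// voiceEndpoint = baseURL + voiceRoute, per the Mastra provider behavior.
const baseURL = 'http://localhost:4111';
const voiceRoute = '/chat/voice';
const voiceEndpoint = `${baseURL}${voiceRoute}`; // 'http://localhost:4111/chat/voice'
```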
2. Manual Configuration
For non-Mastra providers or custom setups, configure the voice endpoint manually:

3. Request Microphone Permission
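For illustration only, this is what a microphone permission request looks like at the browser level; Cedar's `requestVoicePermission()` action (listed under "Available Actions" below) handles this for you, including error states:

```typescript
// Ask the browser for microphone access, then release the device.
async function requestMicAccess(): Promise<boolean> {
  try {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    // Stop the tracks immediately; we only wanted the permission grant.
    stream.getTracks().forEach((track) => track.stop());
    return true;
  } catch {
    return false; // Permission denied or no audio device available
  }
}
```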
4. Using the Voice Indicator
Provider-Specific Voice Configuration
Mastra Provider
When using the Mastra provider, voice configuration is streamlined through the provider setup:
- Automatic endpoint configuration
- Consistent routing with chat endpoints
- Built-in context passing
- Structured response handling
Other Providers
For OpenAI, Anthropic, AI SDK, or custom providers, configure the voice endpoint manually:

Voice State

The voice slice manages comprehensive state for voice interactions:

Available Actions
Permission Management
checkVoiceSupport()
- Check if browser supports voice featuresrequestVoicePermission()
- Request microphone access
Voice Control
startListening()
- Start recording audiostopListening()
- Stop recording and send to endpointtoggleVoice()
- Toggle between listening and idle states
Audio Processing
streamAudioToEndpoint(audioData)
- Send audio to backendplayAudioResponse(audioUrl)
- Play audio response
Configuration
setVoiceEndpoint(endpoint)
- Set the backend endpoint URLupdateVoiceSettings(settings)
- Update voice configurationsetVoiceError(error)
- Set error messageresetVoiceState()
- Clean up and reset all voice state
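The toggle semantics can be sketched as a tiny state transition (simplified for illustration; the real store also tracks a speaking state):

```typescript
// toggleVoice() switches between listening and idle, per "Voice Control" above.
type VoiceState = 'idle' | 'listening' | 'speaking';

function toggleVoice(state: VoiceState): VoiceState {
  return state === 'listening' ? 'idle' : 'listening';
}
```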
Message Integration
By default, voice interactions are automatically added to the Cedar messages store, creating a seamless conversation history.

Disabling Message Integration
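Setting `autoAddToMessages` to `false` (it defaults to `true`, per the settings table) turns this off. `updateVoiceSettings` accepts a partial object; it is modeled here as a plain shallow merge for illustration:

```typescript
// Simplified stand-in for Cedar's updateVoiceSettings: a partial update
// merged over the current settings.
type Settings = { autoAddToMessages: boolean; language: string };

function updateVoiceSettings(current: Settings, patch: Partial<Settings>): Settings {
  return { ...current, ...patch };
}

const next = updateVoiceSettings(
  { autoAddToMessages: true, language: 'en-US' },
  { autoAddToMessages: false }, // stop adding voice interactions to chat
);
```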
Browser Compatibility
The voice system requires modern browser APIs:navigator.mediaDevices.getUserMedia
- Audio captureMediaRecorder
API - Audio recordingAudioContext
API - Audio processing
- Chrome/Edge 47+
- Firefox 25+
- Safari 11+
- Opera 34+
HTTPS is required for microphone access in production environments (`localhost` is exempt).
Next Steps
Backend Integration
Learn how to set up your backend to handle voice requests
Streaming Implementation
Implement real-time audio streaming