
Introduction

The Savants Voice API enables real-time, bidirectional voice conversations between your application and AI. The API uses a hybrid approach combining HTTP for authentication and WebSockets for real-time audio streaming.

Architecture

The API follows a simple two-step process:

Step 1: Authentication
Flutter App  →  [POST /api/websocket-voice/token]  →  Auth Server
Flutter App  ←  [JWT Token Response]  ←  Auth Server
Step 2: Real-time Communication
Flutter App  →  [WebSocket Connect + JWT]  →  WebSocket Server
Flutter App  ←  [savant_voice_connected confirmation]  ←  WebSocket Server
Flutter App  ↔  [Bidirectional Audio Stream]  ↔  WebSocket Server

Flow Description

  1. Authentication Request: The Flutter app sends its API key and device ID to request a JWT
  2. Token Response: The server returns a short-lived JWT (1-minute expiry)
  3. WebSocket Connection: The app connects to the WebSocket endpoint and sends the JWT as its first message
  4. Connection Confirmation: The server confirms with a savant_voice_connected message
  5. Audio Streaming: Bidirectional PCM audio streaming begins
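
The five steps above can be tracked on the client as a small state machine, so that audio is never sent before the server has confirmed the connection. The following is a minimal sketch in Python, used here only for illustration (it is not an official SDK); the names Phase and VoiceSession are hypothetical and do not come from the API.

```python
from enum import Enum, auto


class Phase(Enum):
    """Client-side phases mirroring the documented flow (names are hypothetical)."""
    IDLE = auto()
    TOKEN_REQUESTED = auto()   # step 1: token request sent
    TOKEN_RECEIVED = auto()    # step 2: JWT received
    SOCKET_CONNECTED = auto()  # step 3: WebSocket open, JWT sent
    STREAMING = auto()         # steps 4-5: confirmation received, audio flowing


# Only these forward transitions are allowed.
NEXT_PHASE = {
    Phase.IDLE: Phase.TOKEN_REQUESTED,
    Phase.TOKEN_REQUESTED: Phase.TOKEN_RECEIVED,
    Phase.TOKEN_RECEIVED: Phase.SOCKET_CONNECTED,
    Phase.SOCKET_CONNECTED: Phase.STREAMING,
}


class VoiceSession:
    def __init__(self):
        self.phase = Phase.IDLE

    def advance(self, to):
        """Move to the next phase, rejecting any out-of-order transition."""
        if NEXT_PHASE.get(self.phase) is not to:
            raise RuntimeError(f"cannot go from {self.phase.name} to {to.name}")
        self.phase = to

    def can_send_audio(self):
        """Audio may only be sent once the server has confirmed the connection."""
        return self.phase is Phase.STREAMING
```

A caller would invoke advance(Phase.STREAMING) only after the savant_voice_connected message arrives, and check can_send_audio() before writing microphone data to the socket.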

Base Configuration

Environment   Base URL
Production    https://api.thesavants.ai
WebSocket     wss://api.thesavants.ai/voice-stream

API Endpoints

Authentication

  • POST /api/websocket-voice/token - Request JWT for WebSocket connection
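
As a rough illustration of the token request, the sketch below builds (but does not send) the POST using Python's standard library. The JSON field names apiKey and deviceId are assumptions made for this example; this page only says that an API key and device ID are sent, not how they are named or whether they travel in the body or in headers.

```python
import json
import urllib.request

BASE_URL = "https://api.thesavants.ai"
TOKEN_PATH = "/api/websocket-voice/token"


def build_token_request(api_key, device_id):
    """Build the token request without sending it."""
    # ASSUMPTION: "apiKey" and "deviceId" are hypothetical field names;
    # check the endpoint reference for the real request schema.
    body = json.dumps({"apiKey": api_key, "deviceId": device_id}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + TOKEN_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would perform the call. The shape of the response is not described on this page, so parsing it is left out.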

WebSocket

  • WSS /voice-stream - Real-time audio streaming endpoint

Authentication Flow

  1. Request Token: POST to /api/websocket-voice/token with API key and device ID
  2. Receive JWT: Server returns short-lived (1 minute) JWT token
  3. WebSocket Auth: Send JWT as first message after WebSocket connection
  4. Begin Streaming: Start bidirectional audio streaming
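
Because the token expires after one minute, a client may want to check how old a token is before using it to open a WebSocket, and request a fresh one if it is close to expiry. The sketch below measures age from the moment the token was received locally; it does not decode the JWT. The class name IssuedToken and the 5-second safety margin are assumptions chosen for illustration.

```python
import time

TOKEN_LIFETIME_SECONDS = 60  # documented 1-minute expiry


class IssuedToken:
    """Pairs a JWT string with the local time it was received."""

    def __init__(self, jwt, clock=time.monotonic):
        self.jwt = jwt
        self._clock = clock          # injectable so the logic can be tested
        self.issued_at = clock()

    def is_usable(self, safety_margin=5.0):
        """True while the token is younger than its lifetime minus a margin."""
        # ASSUMPTION: the margin is a client-side choice, not an API rule.
        age = self._clock() - self.issued_at
        return age < TOKEN_LIFETIME_SECONDS - safety_margin
```

If is_usable() returns False, the client would request a new token rather than attempt the WebSocket authentication with a token that may expire in transit.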

Data Flow

Client → Server

  1. Authentication Token (String): JWT for connection authorization
  2. Audio Data (Binary): PCM audio from microphone

Server → Client

  1. Connection Messages (JSON): Status and error messages
  2. Audio Data (Binary): PCM audio from AI
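
Since the server sends JSON for status and errors and binary data for audio, a receive loop can route each incoming frame by its type. The sketch below assumes a WebSocket library that delivers text frames as str and binary frames as bytes, which is common in Python clients; the function name classify_frame is hypothetical.

```python
import json


def classify_frame(frame):
    """Route an incoming WebSocket frame by its payload type."""
    if isinstance(frame, (bytes, bytearray)):
        # Binary frames carry raw PCM audio from the AI.
        return ("audio", bytes(frame))
    if isinstance(frame, str):
        try:
            # Text frames are expected to be JSON status or error messages.
            return ("message", json.loads(frame))
        except json.JSONDecodeError:
            return ("unknown", frame)
    raise TypeError(f"unsupported frame type: {type(frame).__name__}")
```

An "audio" result would go to the playback buffer, while a "message" result would be inspected for the connection confirmation or an error object.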

Audio Specifications

Property      Value
Format        Raw PCM
Sample Rate   16,000 Hz
Bit Depth     16-bit
Channels      1 (Mono)
Endianness    Little-Endian
Encoding      Signed Integer
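
These values imply a data rate of 16,000 samples × 2 bytes × 1 channel = 32,000 bytes per second in each direction. The sketch below derives chunk sizes from that and converts floating-point samples in the range [-1.0, 1.0] to the documented format. The helper names and the choice of a 20 ms chunk in the usage note are illustrative assumptions; the API does not prescribe a chunk size on this page.

```python
import struct

SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2   # 16-bit
CHANNELS = 1           # mono
BYTES_PER_SECOND = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS


def chunk_size_bytes(duration_ms):
    """Number of bytes of PCM audio covering duration_ms milliseconds."""
    samples = SAMPLE_RATE_HZ * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def floats_to_pcm16le(samples):
    """Convert floats in [-1.0, 1.0] to signed 16-bit little-endian PCM."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))          # clamp out-of-range input
        ints.append(int(round(s * 32767)))  # scale to the int16 range
    # "<" selects little-endian, "h" is a signed 16-bit integer.
    return struct.pack("<%dh" % len(ints), *ints)
```

For example, chunk_size_bytes(20) gives 640, so a client sending 20 ms chunks would write 640 bytes per frame.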

Rate Limits

Resource                 Limit
Token Requests           60/minute per device
Concurrent Connections   10 per device
Audio Streaming          No specific limit (bandwidth dependent)
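
To stay under the token request limit, a client can throttle itself before calling the token endpoint. The sketch below uses a sliding one-minute window; whether the server counts requests in a sliding or fixed window is not stated here, so this is a conservative client-side assumption, and the class name SlidingWindowLimiter is hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allows at most max_calls within any window_seconds interval."""

    def __init__(self, max_calls=60, window_seconds=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._clock = clock      # injectable so the logic can be tested
        self._calls = deque()    # timestamps of recent permitted calls

    def try_acquire(self):
        """Return True and record the call if it fits in the window."""
        now = self._clock()
        # Forget calls that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window_seconds:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            return False
        self._calls.append(now)
        return True
```

A client would call try_acquire() before each token request and back off when it returns False.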

Error Handling

All errors follow a consistent format:
{
  "error": {
    "code": "ERROR_CODE",
    "message": "Human-readable description"
  }
}
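
A client can detect this envelope and turn it into a typed exception. The sketch below is one possible approach; the names VoiceApiError and parse_error are hypothetical, and the fallback values used when a field is missing are assumptions rather than documented behavior.

```python
import json


class VoiceApiError(Exception):
    """Carries the code and message from the documented error envelope."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def parse_error(payload):
    """Return a VoiceApiError if payload is an error envelope, else None."""
    try:
        data = json.loads(payload)
    except (TypeError, ValueError):
        return None  # not JSON at all
    if not isinstance(data, dict):
        return None
    err = data.get("error")
    if not isinstance(err, dict):
        return None  # valid JSON, but not an error envelope
    # ASSUMPTION: "UNKNOWN" and "" are illustrative defaults for missing fields.
    return VoiceApiError(str(err.get("code", "UNKNOWN")), str(err.get("message", "")))
```

The function returns the exception instead of raising it, so the caller can decide whether an error should end the session or only be logged.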

SDK Support

Language       Status        Package
Flutter/Dart   ✅ Official    Native implementation
JavaScript     🚧 Planned     Coming soon
Python         🚧 Planned     Coming soon

Security

  • Transport Security: All connections use TLS/SSL encryption
  • Authentication: JWT-based token authentication
  • Token Expiry: The 1-minute token lifespan limits the window for replay attacks
  • No Data Storage: Audio streams are not stored server-side

Versioning

Current API version: v1

The API uses URL-based versioning:
https://your-server.com/api/v1/websocket-voice/token

Support

  • Technical Documentation: This reference guide
  • Integration Support: [email protected]
  • Response Time: 24-48 hours for technical inquiries