Introduction
This page explains how to use the Pragatix API to get a voice response to a written prompt.
We also offer a demo site where you can experience this:
https://audiochat.pragatix.ai/
To integrate this capability into your workflow, use the following API:
Ask & Play
This demo site showcases Pragatix's advanced AI integration and text-to-speech (TTS) capabilities.
It allows users to ask questions and receive spoken answers based on uploaded documents or public data.
Background
Pragatix is a platform designed to provide advanced and private AI services, powered by either local AI models or cloud-based AI models.
Our goal is to deliver customizable, enterprise-grade AI solutions that integrate seamlessly with your existing tools and workflows.
This demo site, https://audiochat.pragatix.ai/, serves as a proof of concept demonstrating:
- AI-driven interaction through natural language queries
- Secure integration with user data and documents
- Chat-to-speech (CTS) playback and voice output generation
Main Components
1. Data ID (content_id or collection_id)
This field identifies the data source (document collection) you want to query. By default, it uses the document "ืืงืฉืช ืจืืฉืืื ืืืื ืืจืืื ืคืจืื.docx".
You can change the Data ID to another document by uploading it to the AGI Dashboard (see below).
If you leave this field empty, the system will answer based on public data from the model's knowledge.
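For illustration, a request that targets a specific collection might carry the Data ID alongside the prompt. The APIKey and Prompt field names below are taken from the streaming example later on this page; the exact name and placement of the content_id field in the request body is an assumption, not confirmed API:

```python
# Hypothetical request body. "content_id" placement is an assumption;
# APIKey/Prompt names come from the streaming example on this page.
payload = {
    "APIKey": "your_api_key_here",
    "Prompt": "Summarize the document",
    "content_id": "your_collection_id",  # omit to fall back to public model knowledge
}
```

Leaving out content_id entirely would correspond to the public-knowledge fallback described above.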
2. Question
Enter your question here.
The system will process your question and provide an answer, both as text and as a WAV audio file.
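A minimal non-streaming call might look like the sketch below. The endpoint URL and the APIKey/Prompt field names are borrowed from the streaming example later on this page; treating the non-streaming response body as ready-to-save WAV bytes is an assumption:

```python
import requests

# Endpoint taken from the streaming example on this page (assumption
# that the non-streaming call uses the same URL).
API_URL = "https://bgd15-firewall.agatdemo.com/firewallapi/v1/tts"

def build_request(api_key, question):
    """Assemble the JSON body for a single (non-streaming) question."""
    return {"APIKey": api_key, "Prompt": question}

def ask(api_key, question, out_path="answer.wav"):
    """POST the question and save the spoken answer (assumed WAV bytes)."""
    resp = requests.post(API_URL, json=build_request(api_key, question))
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)
    return out_path
```

The text portion of the answer would come back in the response as well; its exact shape is not documented here, so the sketch only saves the audio.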
3. Max Words in Answer
Controls the maximum length of the text response (default: 30 words).
Useful when you want concise answers.
4. Audio Output
Once the response is ready, the WAV file will appear here for playback.
If no audio has been generated yet, youโll see the message:
"No audio yet."
To query other documents, please contact us.
Streaming Mode (Chat-To-Speech Live Streaming)
In addition to the standard Chat-To-Speech response (single response with full WAV file), the API now supports streaming output.
Streaming mode allows you to start playing audio before the final answer is ready, providing lower latency and a real-time user experience.
To enable streaming, send "streaming": true in your request.
When streaming mode is enabled, the API returns multiple partial messages instead of a single file response.
Each message contains:
| Field | Description |
|---|---|
| is_final_answer | false for partial chunks, true for the final message |
| data.AnswerText | Partial text for this chunk |
| data.AudioFile | Contains a base64-encoded RAW PCM audio chunk |
```json
{
  "log_level": "Info",
  "is_final_answer": false,
  "has_data": true,
  "data_type": "Json",
  "user_message": "audio_chunk_1_1",
  "data": {
    "AnswerText": "שלום לך, איך אני יכול לעזור?!",
    "AudioFile": {
      "Name": "שלום_לך.raw",
      "MimeType": "audio/raw",
      "Content": "..."
    },
    "SampleRate": 22050
  }
}
```
What You Need to Do as a Client
During streaming:
- Collect each partial audio chunk (AudioFile.Content).
- These chunks are base64-encoded RAW samples (no WAV header). Decode them into your application's preferred internal sample format (e.g., Float32, Int16, PCM bytes, etc.).
- Append the decoded samples to a list or buffer for later reconstruction.
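The steps above can be sketched as follows. The Float32 little-endian sample format matches the reconstruction code in the next section; the fake chunk here is synthetic test data, not real API output:

```python
import base64
import numpy as np

def decode_chunk(content_b64):
    """Decode one AudioFile.Content value (base64 RAW Float32 PCM,
    little-endian, no WAV header) into a sample array."""
    raw_bytes = base64.b64decode(content_b64)
    return np.frombuffer(raw_bytes, dtype='<f4')

# Simulated chunk: in real use, content comes from msg["data"]["AudioFile"]["Content"]
fake = base64.b64encode(np.array([0.0, 0.5, -0.5], dtype='<f4').tobytes())
buffered = []
buffered.append(decode_chunk(fake))  # append each decoded chunk for later reconstruction
```

Buffering decoded samples (rather than raw bytes) avoids having to re-parse them when the stream ends.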
Code Example (Python)
Rebuild WAV From Streaming Chunks
Below is a Python example for merging raw streaming chunks into a valid WAV file:
```python
# Combine all raw audio chunks into a single audio file
import numpy as np
import struct

def float32_chunks_to_wav_bytes(chunks, sample_rate, num_channels=1):
    """
    Exactly mirrors the JavaScript function float32ChunksToWavBlob.
    chunks = list of byte strings containing FLOAT32 PCM little-endian.
    """
    # 1. Decode Float32 LE chunks
    float_chunks = [np.frombuffer(c, dtype='<f4') for c in chunks]

    # 2. Merge samples
    merged = np.concatenate(float_chunks)

    # 3. Convert float32 -> int16 PCM
    clipped = np.clip(merged, -1.0, 1.0)
    pcm16 = (clipped * 32767).astype('<i2')  # little-endian int16

    # WAV constants
    bits_per_sample = 16
    bytes_per_sample = bits_per_sample // 8
    block_align = num_channels * bytes_per_sample
    byte_rate = sample_rate * block_align
    data_size = pcm16.nbytes
    riff_chunk_size = 36 + data_size

    # Build WAV header
    header = struct.pack(
        '<4sI4s'
        '4sIHHIIHH'
        '4sI',
        b'RIFF', riff_chunk_size, b'WAVE',
        b'fmt ',          # fmt chunk
        16,               # PCM fmt chunk size
        1,                # PCM type
        num_channels,
        sample_rate,
        byte_rate,
        block_align,
        bits_per_sample,
        b'data',          # data chunk
        data_size
    )
    return header + pcm16.tobytes()
```
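If you prefer not to hand-pack the RIFF header, Python's standard-library wave module produces the same 16-bit PCM container. A minimal sketch under the same assumptions (mono, Float32 little-endian input chunks):

```python
import io
import wave
import numpy as np

def float32_chunks_to_wav_stdlib(chunks, sample_rate, num_channels=1):
    """Merge RAW Float32 LE chunks into WAV bytes via the stdlib wave module."""
    merged = np.concatenate([np.frombuffer(c, dtype='<f4') for c in chunks])
    pcm16 = (np.clip(merged, -1.0, 1.0) * 32767).astype('<i2')  # int16 LE
    buf = io.BytesIO()
    with wave.open(buf, 'wb') as w:
        w.setnchannels(num_channels)
        w.setsampwidth(2)          # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(pcm16.tobytes())
    return buf.getvalue()
```

Both variants yield a standard 44-byte-header PCM WAV; the stdlib version simply delegates the header bookkeeping.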
Streaming Workflow Example
```python
import requests
import base64
import json

def chat_to_audio_streaming_support():
    firewall_url = r"https://bgd15-firewall.agatdemo.com/firewallapi/v1/tts"
    api_key = "your_api_key_here"
    prompt = "ืืื"
    streaming = True
    chunks = []

    payload = {
        "APIKey": api_key,
        "Prompt": prompt,
        "streaming": streaming
    }

    response = requests.post(firewall_url, json=payload, stream=True)

    for line in response.iter_lines():
        if not line:
            continue
        if line.startswith(b"data: "):
            raw = line[len(b"data: "):]

            # Convert bytes -> string
            try:
                text = raw.decode("utf-8")
            except UnicodeDecodeError:
                continue

            # Parse JSON
            try:
                msg = json.loads(text)
            except json.JSONDecodeError:
                print("Skipping invalid JSON:", text)
                continue

            if not msg.get("is_final_answer", False) and msg.get("has_data", False):
                inner = msg.get("data", {})
                audio = inner.get("AudioFile")
                if audio and "Content" in audio:
                    raw_base64 = audio["Content"]
                    raw_bytes = base64.b64decode(raw_base64)
                    chunks.append(raw_bytes)
                    print(f"Received audio chunk: {len(raw_bytes)} bytes")

    # Combine chunks into a single WAV file
    wav_bytes = float32_chunks_to_wav_bytes(chunks, 22050)
    with open("output.wav", "wb") as f:
        f.write(wav_bytes)
    print("Saved output.wav")
```