Services

Speech/non-speech detection
Determines speech segments in the input audio stream.
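The page does not say how the detector works. As a rough illustration of what "determining speech segments" means, here is a minimal sketch of a frame-energy threshold detector, a common textbook baseline and not necessarily what the service uses. The function name, frame length and threshold are all assumptions chosen for the example.

```python
from typing import List, Tuple


def detect_speech_segments(
    samples: List[float],
    sample_rate: int,
    frame_ms: int = 20,        # assumed frame length
    threshold: float = 0.01,   # assumed mean-energy threshold
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans whose mean frame energy reaches the threshold."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    segments: List[Tuple[float, float]] = []
    start = None  # sample index where the current speech run began
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy >= threshold and start is None:
            start = i  # speech begins at this frame
        elif energy < threshold and start is not None:
            segments.append((start / sample_rate, i / sample_rate))
            start = None  # speech ended at this frame boundary
    if start is not None:
        # Audio ended while still inside a speech run.
        segments.append((start / sample_rate, len(samples) / sample_rate))
    return segments
```

A production detector would typically add smoothing and a trained model; the sketch only shows the input (raw samples) and the output (time spans) of this stage.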
Voice to text (V2T)
Converts detected speech segments into text with time stamps for indexing.
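To make "text with time stamps for indexing" concrete, the following sketch shows one way timestamped words could be turned into a searchable index. The `Word` record and `build_index` helper are hypothetical names for illustration; the actual response format is defined in the API documentation, not here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Word:
    """Hypothetical shape of one recognized word."""
    text: str
    start: float  # seconds from the beginning of the audio
    end: float


def build_index(words: List[Word]) -> Dict[str, List[float]]:
    """Map each lowercased word to the start times at which it was spoken."""
    index: Dict[str, List[float]] = defaultdict(list)
    for w in words:
        index[w.text.lower()].append(w.start)
    return dict(index)
```

With such an index, looking up a word yields the positions in the recording to jump to.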
Text post-processing
Formats the stream of recognized words into a proper form using a set of grammars.
Add your own custom words with every request.
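As an illustration of what post-processing does, here is a minimal sketch that rewrites a stream of recognized words into display text. The rules (digit words, a percent rule, sentence casing) stand in for the service's grammars, which are not described on this page. Treating custom words as a per-request map from spoken form to written form is likewise an assumption.

```python
import re
from typing import Dict, List

# Assumed toy rule set; real grammars would cover far more cases.
NUMBER_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def format_transcript(words: List[str], custom_words: Dict[str, str]) -> str:
    """Apply custom-word and number rules, then sentence-case the result."""
    out = []
    for w in words:
        key = w.lower()
        if key in custom_words:
            out.append(custom_words[key])   # per-request custom spelling wins
        elif key in NUMBER_WORDS:
            out.append(NUMBER_WORDS[key])   # spoken digit to numeral
        else:
            out.append(key)
    text = " ".join(out)
    text = re.sub(r"(\d) percent\b", r"\1%", text)  # "5 percent" becomes "5%"
    if not text:
        return text
    return text[0].upper() + text[1:] + "."
```

For example, `format_transcript(["acme", "grew", "five", "percent"], {"acme": "ACME"})` produces `ACME grew 5%.`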
Processing more than hours of audio every day
Supported languages for V2T
Supporting languages spoken by more than 1.3 billion people
Slavic languages
Bosnian
Bulgarian
Croatian
Czech
Macedonian
Montenegrin
Polish
Russian
Serbian
Slovak
Slovenian
Ukrainian
Other languages
Hungarian
Romanian
Spanish
US English
Accuracy varies depending on task and language. Contact us to try V2T on your own data.
Features
Streaming
All our APIs process audio and return results in real time, so you can react immediately.
Multilingual
V2T is available for 18 languages, with optional live updates, custom language models, and the ability to add words to the dictionary.
Secure
Supports on-premises installations without internet access, at any scale.
Use cases
Telephone call transcription, voice dictation, broadcast monitoring ...
Integration
REST
Batch processing of offline recordings.
WebSocket
Real-time communication inside a web browser.
gRPC
Non-web applications with fast response time requirements.
See our API documentation here.
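The streaming interfaces above (WebSocket and gRPC) receive audio incrementally rather than as one file. As a sketch of the client-side preparation this implies, the helper below splits raw PCM audio into fixed-duration chunks that could then be sent one message at a time. The function name, the 16 kHz 16-bit mono format and the 100 ms chunk size are assumptions for illustration; the formats and message layout the service actually expects are given in its API documentation.

```python
from typing import Iterator


def pcm_chunks(
    audio: bytes,
    sample_rate: int = 16000,  # assumed: 16 kHz
    sample_width: int = 2,     # assumed: 16-bit mono samples
    chunk_ms: int = 100,       # assumed chunk duration
) -> Iterator[bytes]:
    """Yield consecutive slices of `audio`, each covering at most `chunk_ms`."""
    chunk_bytes = sample_rate * sample_width * chunk_ms // 1000
    for offset in range(0, len(audio), chunk_bytes):
        yield audio[offset:offset + chunk_bytes]
```

Smaller chunks reduce latency at the cost of more messages. For offline recordings, the REST interface avoids chunking altogether.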
Deployment
Container
Single machine
High-availability cluster
Cloud