Automatic Speech Recognition Embedded on the Edge - EdgeVUI

Professional Voice AI Integration Tools

The Creoir EdgeVUI™ simplifies the development of Voice UIs by providing advanced Natural Language Understanding (NLU) tools utilizing Automatic Speech Recognition (ASR), Text-to-speech (TTS), and Speech Signal Enhancement (SSE) technologies from Cerence AI. Additionally, the technology enables embedded transcription and analysis, with support for industry-specific custom vocabulary.

If you want to try Creoir speech technologies, request a demo here.

High Accuracy

Proven Cerence AI speech technologies used in more than 500 million cars
Fully SW-based Speech Signal Enhancement (SSE) algorithms reducing environmental noise impact

On-The-Edge Privacy

No audio data sent to cloud or 3rd party
Reliable operation without internet connection

Easy Integration

Fast development cycles with a wide range of Voice UI creation tools
Simple-to-use MQTT and REST-like API customer interface for Linux, Android, and Windows

EdgeVUI™ Software Development Kit

Features and Requirements

Domain-specific, speaker-independent speech recognition and voice feedback

Built-in run-time libraries and API
Tools and documentation
Step-by-step instructions and sample applications
Method for domain-specific Natural Language Understanding

Built-in interface options for Linux, Windows, and Android

Programming language independent action code implementation

Python, C, C++, ReactJS, etc. for Linux and Windows
Kotlin and Java for Android

Simple-to-use customer interface with

MQTT (Linux & Windows)
REST-like API (Android)
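
As an illustration of language-independent action code behind the MQTT interface, the following is a minimal Python sketch of a message dispatcher. The topic name `edgevui/intent`, the JSON payload fields (`intent`, `slots`), and the handler names are assumptions made for this example; the actual topics and message schema are defined by the EdgeVUI™ SDK documentation. The callback uses the `(client, userdata, message)` shape that MQTT client libraries such as paho-mqtt expect, but no broker connection is made here.

```python
import json
from types import SimpleNamespace

# Hypothetical topic name; the real topics come from the SDK documentation.
INTENT_TOPIC = "edgevui/intent"


def set_temperature(slots):
    return f"Setting temperature to {slots.get('value')} degrees"


def lights_on(slots):
    return f"Turning on lights in {slots.get('room', 'all rooms')}"


# Map (assumed) intent names to the action code that handles them.
HANDLERS = {
    "set_temperature": set_temperature,
    "lights_on": lights_on,
}


def dispatch(topic, payload):
    """Decode one message and run the matching action.

    Returns the action's result, or None if the message is ignored.
    """
    if topic != INTENT_TOPIC:
        return None
    try:
        data = json.loads(payload)  # accepts str or UTF-8 bytes
    except ValueError:
        return None  # malformed JSON or invalid encoding
    if not isinstance(data, dict):
        return None
    handler = HANDLERS.get(data.get("intent"))
    if handler is None:
        return None
    return handler(data.get("slots") or {})


def on_message(client, userdata, message):
    """Callback in the shape MQTT client libraries typically expect."""
    result = dispatch(message.topic, message.payload)
    if result is not None:
        print(result)


if __name__ == "__main__":
    # Simulate an incoming message without a broker.
    fake = SimpleNamespace(
        topic=INTENT_TOPIC,
        payload=b'{"intent": "lights_on", "slots": {"room": "kitchen"}}',
    )
    on_message(None, None, fake)
```

Keeping the decoding and dispatch logic separate from the client callback means the same action code can be exercised in unit tests or reused behind a different transport.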

Hardware requirements

Embedded: ARM Cortex @ 1 GHz, 256 MB RAM, 2 GB file system
PC (Linux, Windows): x86-64, SSE2, AVX2
Mobile: Android 8 (API level 26)

Memory requirements

Flash: Data models ~7 MB/language
Executable code ~50 MB depending on features and complexity of Voice UI

RAM Usage

150 – 250 MB depending on features
The required runtime footprint depends on the requested feature set and the specific project
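
The flash figures above can be turned into a rough sizing estimate. The sketch below assumes that both the data models and the executable code count toward flash storage, and treats the approximate 50 MB code size as a fixed default; the function name is hypothetical and the real footprint depends on the features and Voice UI complexity of the project.

```python
MODEL_MB_PER_LANGUAGE = 7  # "~7 MB/language" for data models
EXECUTABLE_MB = 50         # "~50 MB", varies with features and Voice UI complexity


def estimate_flash_mb(num_languages, executable_mb=EXECUTABLE_MB):
    """Rough flash estimate: executable code plus one data model per language."""
    if num_languages < 1:
        raise ValueError("at least one language is required")
    return executable_mb + num_languages * MODEL_MB_PER_LANGUAGE


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"{n} language(s): ~{estimate_flash_mb(n)} MB flash")
```

For example, a three-language product would need roughly 50 + 3 × 7 = 71 MB under these assumptions.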

Cerence AI Speech Technologies

Creoir is a technology partner of Cerence AI, the developer and manufacturer of speech technologies used in nearly 500 million cars worldwide.

The Creoir EdgeVUI™ Software Development Kit uses the following Cerence AI core speech technologies:

Automatic Speech Recognition (ASR)

Automatic Speech Recognition (ASR) language models are available for 40 languages (on request). The advanced Cerence AI speech recognition engine delivers a new level of speaker-independent, continuous speech recognition, with unique features for voice-enabled applications:

Large vocabulary support: Enable embedded recognition for large lists up to millions of items
Wake-up words: Always-listening mode with keyword activation removes the need for a “press to talk” button
Barge-in: Allows the user to speak over spoken dialog prompts and still be recognized
Global language support: Support for over 40 languages provides universal functionality
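
The interaction between wake-up words and barge-in can be pictured as a small dialog state machine. The sketch below is only an illustration of that idea: the state names, event names, and transition table are assumptions for this example and do not describe the Cerence AI engine or the EdgeVUI™ API.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()       # always listening, but only for the wake-up word
    LISTENING = auto()  # capturing and recognizing a command
    PROMPTING = auto()  # playing a spoken dialog prompt


class Event(Enum):
    WAKE_WORD = auto()
    COMMAND_RECOGNIZED = auto()
    USER_SPEECH = auto()
    PROMPT_FINISHED = auto()


TRANSITIONS = {
    (State.IDLE, Event.WAKE_WORD): State.LISTENING,
    (State.LISTENING, Event.COMMAND_RECOGNIZED): State.PROMPTING,
    # Barge-in: the user speaks over the prompt and is heard immediately.
    (State.PROMPTING, Event.USER_SPEECH): State.LISTENING,
    (State.PROMPTING, Event.PROMPT_FINISHED): State.IDLE,
}


def step(state, event):
    """Return the next state; events with no matching transition are ignored."""
    return TRANSITIONS.get((state, event), state)


def run(events, state=State.IDLE):
    """Feed a sequence of events through the state machine."""
    for event in events:
        state = step(state, event)
    return state


if __name__ == "__main__":
    final = run([Event.WAKE_WORD, Event.COMMAND_RECOGNIZED, Event.USER_SPEECH])
    print(final.name)
```

Note that ordinary speech in the idle state changes nothing: without a button, only the wake-up word opens a dialog.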

Natural Text-to-Speech (TTS)

Natural Text-to-Speech (TTS) is available for up to 65 languages and 147 voices (on request). Cerence AI Text-to-Speech (TTS) is a suite of speech output solutions that generates high-quality speech, seamlessly blending dynamic text-to-speech, pre-recorded audio, and tuned prompts. It is optimized to read long texts in a natural, human way. New deep learning-based algorithms deliver smoother output and more natural prosody, resulting in a unique voice experience. Features include:

Emotional TTS: Developers can choose from four different speaking styles: neutral, lively, forceful, and apologetic.
Prosody control: Volume, pitch, speaking rate, and timbre can be changed at run time for more dynamic and lively effects.
Languages and Voices: A truly universal voice portfolio offers 65 languages and 147 voices for creation of global structures using a single engine.
Accuracy: High linguistic accuracy offers correct readout for all types of text, including a large set of personal names.

Speech Signal Enhancement (SSE)

Speech Signal Enhancement (SSE) improves the quality and clarity of spoken voice commands by reducing noise and distortion. In environments where background noise is prevalent, audio processing techniques such as noise suppression through adaptive filtering and spectral subtraction are applied to keep speech recognition reliable. SSE optimizes speech intelligibility in adverse acoustic conditions. Several software algorithms are used in SSE tuning for microphone arrays of 1 to 16 microphones.
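
To make the idea of spectral subtraction concrete, here is a minimal textbook-style sketch in pure Python. It is not the Cerence AI implementation: the function names, the single-frame processing, the spectral floor of 0.02, and the use of pure tones as stand-ins for speech and noise are all assumptions chosen to keep the example small. The noise magnitude spectrum is estimated from a noise-only frame, subtracted from the magnitude spectrum of the noisy frame, and the result is recombined with the noisy phase.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform (fine for short demo frames)."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def spectral_subtract(frame, noise_magnitude, floor=0.02):
    """Subtract an estimated noise magnitude spectrum from one frame.

    The floor keeps a small fraction of each bin's original magnitude so
    that no bin ends up with a negative magnitude.
    """
    cleaned = []
    for bin_value, noise in zip(dft(frame), noise_magnitude):
        magnitude = abs(bin_value)
        reduced = max(magnitude - noise, floor * magnitude)
        cleaned.append(cmath.rect(reduced, cmath.phase(bin_value)))  # keep noisy phase
    return idft(cleaned)


def tone(bin_index, amplitude, n):
    """Pure tone that falls exactly on one DFT bin."""
    return [amplitude * math.sin(2 * math.pi * bin_index * t / n) for t in range(n)]


if __name__ == "__main__":
    n = 32
    speech = tone(4, 1.0, n)   # stand-in for the desired signal
    noise = tone(10, 0.2, n)   # stand-in for stationary background noise
    noisy = [s + v for s, v in zip(speech, noise)]
    noise_magnitude = [abs(v) for v in dft(noise)]  # estimated from a noise-only frame
    cleaned = spectral_subtract(noisy, noise_magnitude)
    error_before = max(abs(a - b) for a, b in zip(noisy, speech))
    error_after = max(abs(a - b) for a, b in zip(cleaned, speech))
    print(f"max error before: {error_before:.4f}, after: {error_after:.4f}")
```

Real systems work on overlapping windowed frames, track the noise estimate over time, and use an FFT rather than this quadratic-time transform.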

Acoustic Speaker Localization: Determines the location and direction of the speaker based on audio signals, analyzing the arrival times, phase, and amplitude of the sound signals.
Wind Buffet Suppression: Removes wind-induced vibration noise from the overall collected audio signal.
Noise Reduction: Minimizes unwanted sounds in an audio signal to increase the clarity of the desired speech signal
Software Based Solution: Creoir EdgeVUI™ SSE is constantly developed with new features and evolving performance, and updates do not require any modifications to HW
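
The arrival-time analysis mentioned under Acoustic Speaker Localization can be illustrated with a two-microphone, far-field textbook model: find the delay between the two channels by cross-correlation, then convert that delay into an angle. This is a sketch under stated assumptions, not the EdgeVUI™ algorithm; the function names, the 16 kHz sample rate, the 0.1 m microphone spacing, and the synthetic test burst are all chosen for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees Celsius


def best_lag(reference, delayed, max_lag):
    """Lag in samples that maximizes the cross-correlation of two channels.

    A positive result means the second channel receives the sound later.
    """
    n = len(reference)
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += reference[i] * delayed[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def arrival_angle_deg(lag_samples, sample_rate, mic_spacing):
    """Angle from broadside (0 = straight ahead) for a far-field source."""
    delay_seconds = lag_samples / sample_rate
    ratio = SPEED_OF_SOUND * delay_seconds / mic_spacing
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding past +/-1
    return math.degrees(math.asin(ratio))


def make_burst(length=64, start=10, size=20):
    """Short decaying sinusoid used as a synthetic test sound."""
    signal = [0.0] * length
    for k in range(size):
        signal[start + k] = math.sin(0.9 * k) * math.exp(-0.1 * k)
    return signal


def delay(signal, samples):
    """Shift a signal later in time by a whole number of samples."""
    return [0.0] * samples + signal[:len(signal) - samples]


if __name__ == "__main__":
    sample_rate = 16000  # Hz (assumed)
    mic_spacing = 0.1    # metres between the two microphones (assumed)
    mic_a = make_burst()
    mic_b = delay(mic_a, 2)  # sound reaches microphone B two samples later
    lag = best_lag(mic_a, mic_b, max_lag=4)
    angle = arrival_angle_deg(lag, sample_rate, mic_spacing)
    print(f"lag: {lag} samples, angle: {angle:.1f} degrees")
```

With only two microphones the result is a direction, not a position; larger arrays combine several such pairwise delays, along with phase and amplitude cues, to resolve location.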