STT Plugins
STT plugins are responsible for converting spoken audio into text
STT
The base STT, this handles the audio in "batch mode" taking a complete audio file, and returning the complete transcription.
Each STT plugin class needs to define the execute()
method taking two arguments:
audio
(AudioData object) - the audio data to be transcribed.lang
(str) - optional - the BCP-47 language code
The bare minimum STT class will look something like
from ovos_plugin_manager.templates.stt import STT
class MySTT(STT):
def execute(audio, language=None):
# Handle audio data and return transcribed text
[...]
return text
StreamingSTT
A more advanced STT class for streaming data to the STT. This will receive chunks of audio data as they become available and they are streamed to an STT engine.
The plugin author needs to implement the create_streaming_thread()
method creating a thread for handling data sent through self.queue
.
The thread this method creates should be based on the StreamThread class. handle_audio_data()
method also needs to be implemented.
Entry point
To make the class detectable as an STT plugin, the package needs to provide an entry point under the mycroft.plugin.stt
namespace.
setup([...],
entry_points = {'mycroft.plugin.stt': 'example_stt = my_stt:mySTT'}
)
Where example_stt
is is the STT module name for the plugin, my_stt is the Python module and mySTT is the class in the module to return.
List of STT plugins
Plugin | Offline | Streaming | Type |
---|---|---|---|
ovos-stt-plugin-fasterwhisper | ✔️ | ❌ | FOSS |
ovos-stt-plugin-whispercpp | ✔️ | ❌ | FOSS |
ovos-stt-plugin-vosk | ✔️ | ❌ | FOSS |
ovos-stt-plugin-chromium | ❌ | ❌ | API (free) |
ovos-stt-plugin-http-server | ❌ | ❌ | API (self hosted) |
ovos-stt-plugin-pocketsphinx | ✔️ | ❌ | FOSS |
ovos-stt-azure-plugin | ❌ | ❌ | API (key) |
neon-stt-plugin-google_cloud_streaming | ❌ | ✔ | API (key) |
neon-stt-plugin-nemo | ✔️ | ✔️ | FOSS |
neon-stt-plugin-nemo-remote | ❌️ | ❌ | API (self hosted) |
Standalone Usage
STT plugins can be used in your owm projects as follows
from speech_recognition import Recognizer, AudioFile
plug = STTPlug()
# verify lang is supported
lang = "en-us"
assert lang in plug.available_languages
# read file
with AudioFile("test.wav") as source:
audio = Recognizer().record(source)
# transcribe AudioData object
transcript = plug.execute(audio, lang)
Plugin Template
from ovos_plugin_manager.templates.stt import STT
# base plugin class
class MySTTPlugin(STT):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
# read config settings for your plugin
lm = self.config.get("language-model")
hmm = self.config.get("acoustic-model")
def execute(self, audio, language=None):
# TODO - convert audio into text and return string
transcript = "You said this"
return transcript
@property
def available_languages(self):
"""Return languages supported by this STT implementation in this state
This property should be overridden by the derived class to advertise
what languages that engine supports.
Returns:
set: supported languages
"""
# TODO - what langs can this STT handle?
return {"en-us", "es-es"}
# sample valid configurations per language
# "display_name" and "offline" provide metadata for UI
# "priority" is used to calculate position in selection dropdown
# 0 - top, 100-bottom
# all other keys represent an example valid config for the plugin
MySTTConfig = {
lang: [{"lang": lang,
"display_name": f"MySTT ({lang}",
"priority": 70,
"offline": True}]
for lang in ["en-us", "es-es"]
}