Audio Service

The audio service is responsible for handling TTS and simple sounds playback

TTS

Two TTS plugins may be loaded at once, if the primary plugin fails for some reason the second plugin will be used.

This allows you to have a lower quality offline voice as fallback to account for internet outages, this ensures your device can always give you feedback

"tts": {
    "pulse_duck": false,

    // plugins to load
    "module": "ovos-tts-plugin-server",
    "fallback_module": "ovos-tts-plugin-mimic",

    // individual plugin configs
    "ovos-tts-plugin-server": {
        "host": "https://tts.smartgic.io/piper",
        "v2": true,
        "verify_ssl": true,
        "tts_timeout": 5,
    }
}

Skill Methods

skills can use self.play_audio, self.acknowledge, self.speak and self.speak_dialog methods to interact with ovos-audio

def play_audio(self, filename: str, instant: bool = False):
    """
    Queue and audio file for playback
    @param filename: File to play
    @param instant: if True audio will be played instantly 
                    instead of queued with TTS
    """
def acknowledge(self):
    """
    Acknowledge a successful request.

    This method plays a sound to acknowledge a request that does not
    require a verbal response. This is intended to provide simple feedback
    to the user that their request was handled successfully.
    """
def speak(self, utterance: str, expect_response: bool = False, wait: Union[bool, int] = False):
    """Speak a sentence.

    Args:
        utterance (str):        sentence mycroft should speak
        expect_response (bool): set to True if Mycroft should listen
                                for a response immediately after
                                speaking the utterance.
        wait (Union[bool, int]): set to True to block while the text
                                 is being spoken for 15 seconds. Alternatively, set
                                 to an integer to specify a timeout in seconds.
    """
def speak_dialog(self, key: str, data: Optional[dict] = None,
                 expect_response: bool = False, wait: Union[bool, int] = False):
    """
    Speak a random sentence from a dialog file.

    Args:
        key (str): dialog file key (e.g. "hello" to speak from the file
                                    "locale/en-us/hello.dialog")
        data (dict): information used to populate sentence
        expect_response (bool): set to True if Mycroft should listen
                                for a response immediately after
                                speaking the utterance.
        wait (Union[bool, int]): set to True to block while the text
                                 is being spoken for 15 seconds. Alternatively, set
                                 to an integer to specify a timeout in seconds.
    """

to play sounds via bus messages emit "mycroft.audio.play_sound" or "mycroft.audio.queue" with data {"uri": "path/sound.mp3"}

PlaybackThread

ovos-audio implements a queue for sounds, any OVOS component can queue a sound for playback.

Usually only TTS speech is queue for playback, but sounds effects may also be queued for richer experiences, for example in a story telling skill

The PlaybackThread ensures sounds don't play over each other but instead sequentially, listening might be triggered after TTS finishes playing if requested in the "speak" message

shorts sounds can be played outside the PlaybackThread, usually when instant feedback is required, such as in the listening sound or on error sounds

You can configure default sounds and the playback commands under mycroft.conf

  // File locations of sounds to play for default events
  "sounds": {
    "start_listening": "snd/start_listening.wav",
    "end_listening": "snd/end_listening.wav",
    "acknowledge": "snd/acknowledge.mp3",
    "error": "snd/error.mp3"
  },

  // Mechanism used to play WAV audio files
  // by default ovos-utils will try to detect best player
  "play_wav_cmdline": "paplay %1 --stream-name=mycroft-voice",

  // Mechanism used to play MP3 audio files
  // by default ovos-utils will try to detect best player
  "play_mp3_cmdline": "mpg123 %1",

  // Mechanism used to play OGG audio files
  // by default ovos-utils will try to detect best player
  "play_ogg_cmdline": "ogg123 -q %1",

NOTE: by default the playback commands are not set and OVOS will try to determine the best way to play a sound automatically

Native playback

Usually playback is triggered in response to originating bus message, eg "recognizer_loop:utterance", this message contains metadata that is used to determine if playback should happen.

message.context may contain a source and destination, playback is only triggered if a message destination is a native_source or if missing (considered a broadcast).

This separation of native sources allows remote clients such as an android app to interact with OVOS without the actual device where ovos-core is running repeating all TTS and music playback out loud

You can learn more about message targeting here

By default, only utterances originating from the speech client are considered native

"Audio": {
    "native_sources": ["debug_cli", "audio"]
}

Transformer Plugins

NEW in ovos-core version 0.0.8

Similarly to audio transformers in ovos-dinkum-listener, the utterance and audio data generated by TTS are exposed to a set of plugins that can transform them before playback

imagem

Dialog Transformers

Similarly to utterance transformers in core, ovos-audio exposes utterance and message.context to a set of plugins that can transform it before TTS stage

The utterance to be spoken is sent sequentially to all transformer plugins, ordered by priority (developer defined), until finally it is sent to the TTS stage

To enable a transformer add it to mycroft.conf

// To enable a dialog transformer plugin just add it's name with any relevant config
// these plugins can mutate utterances before TTS
"dialog_transformers": {
    "ovos-dialog-translation-plugin": {},
    "ovos-dialog-transformer-openai-plugin": {
        "rewrite_prompt": "rewrite the text as if you were explaining it to a 5 year old"
    }
}

TTS Transformers

The audio to be spoken is sent sequentially to all transformer plugins, ordered by priority (developer defined), until finally it played back to the user

NOTE: Does not work with StreamingTTS

To enable a transformer add it to mycroft.conf

// To enable a tts transformer plugin just add it's name with any relevant config
// these plugins can mutate audio after TTS
"tts_transformers": {
    "ovos-tts-transformer-sox-plugin": {
        "default_effects": {
            "speed": {"factor": 1.1}
        }
    }
}

Legacy Audio Service

The legacy audio service supports audio playback via the old mycroft api (@mycroft @ovos)

TIP: HiveMind satellites can also run the legacy audio service and handle playback on the satellite devices

{
    "enable_old_audioservice": true,

    "Audio": {
        "backends": {
          "OCP": {
            "type": "ovos_common_play",
            "disable_mpris": true,
            "manage_external_players": false,
            "active": true
          },
          "simple": {
            "type": "ovos_audio_simple",
            "active": true
          },
          "vlc": {
            "type": "ovos_vlc",
            "active": true
          }
        }
    }
  },
}

NOTE: "enable_old_audioservice" will default to false in ovos-core version 0.1.0

legacy plugins: - ocp - vlc - simple (no https support)

Disabling Classic OCP

Classic OCP is currently the default media playback subsystem for OVOS, scheduled to be replaced by ovos-media in ovos-core version 0.1.0

by default OCP acts as a translation layer for the legacy audio api, but if you want to disable ocp the legacy audio service remains available in a standalone fashion

NOTE: "default-backend" needs to be set if you disable ocp

{
    "enable_old_audioservice": true,
    "disable_ocp": true,
    "Audio": {
        "default-backend": "vlc",
        "backends": {
          "simple": {
            "type": "ovos_audio_simple",
            "active": true
          },
          "vlc": {
            "type": "ovos_vlc",
            "active": true
          }
        }
    }
  },
}

Classic OCP technical details:

  • OCP was developed for mycroft-core under the legacy audio system
  • OCP will pose as a legacy plugin and translate the received bus events to the OCP api
  • OCP is always the default audio plugin, unless you set "disable_ocp": true in config
  • OCP uses the legacy api internally, to delegate playback when GUI is not available (or if configured to do so)

NOTE: "disable_ocp" will default to true in ovos-core version 0.1.0