Model2Vec Intent Pipeline

The Model2Vec Intent Pipeline is an OpenVoiceOS (OVOS) plugin that classifies intents with pretrained Model2Vec models. Because it represents utterances as vectors instead of matching them deterministically, it can recognize intents in cases where traditional engines such as Adapt or Padatious struggle.

Features

Model2Vec-Powered Classification: Uses pretrained Model2Vec models for rich vector-based intent understanding.
Seamless OVOS Integration: Plug-and-play compatibility with existing OVOS intent pipelines.
Multilingual & Language-Specific Models: Offers large multilingual models distilled from LaBSE and smaller, efficient language-specific models ideal for limited hardware (e.g., Raspberry Pi).
Dynamic Intent Syncing: Tracks the Adapt and Padatious intents registered at runtime, so predictions are matched only against intents from currently loaded skills.
Skill-Aware Matching: Classifies only official OVOS skill intents, reducing false positives by ignoring unregistered or personal skill intents.
Supports Partial Translations: Multilingual models allow usage of partially translated skills, provided their dialogs are translated.

Installation

Install the plugin via pip:

pip install ovos-m2v-pipeline

Configuration

Configure the plugin in your mycroft.conf file:

{
  "intents": {
    "ovos-m2v-pipeline": {
      "model": "Jarbas/ovos-model2vec-intents-LaBSE",
      "conf_high": 0.7,
      "conf_medium": 0.5,
      "conf_low": 0.15,
      "ignore_intents": []
    },
    "pipeline": [
      "converse",
      "ovos-m2v-pipeline-high",
      "padatious_high",
      "fallback_low"
    ]
  }
}
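
The position of the stage in the pipeline list above matters, since stages are tried in order. As a minimal sketch, a stage can be added to an existing pipeline list ahead of a chosen reference stage. The helper `insert_stage` is hypothetical and not part of the plugin; it only manipulates a plain Python list that mirrors the "pipeline" array.

```python
from typing import List


def insert_stage(
    pipeline: List[str],
    stage: str = "ovos-m2v-pipeline-high",
    before: str = "padatious_high",
) -> List[str]:
    """Return a copy of `pipeline` with `stage` placed ahead of `before`."""
    if stage in pipeline:
        return list(pipeline)  # already enabled; leave the order untouched
    result = list(pipeline)
    # Fall back to appending when the reference stage is not configured.
    index = result.index(before) if before in result else len(result)
    result.insert(index, stage)
    return result


existing = ["converse", "padatious_high", "fallback_low"]
print(insert_stage(existing))
# ['converse', 'ovos-m2v-pipeline-high', 'padatious_high', 'fallback_low']
```

Returning a copy keeps the original list intact, which makes it easy to compare the old and new order before writing anything back to mycroft.conf.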

Parameters:

model: Local path or Hugging Face repository ID of the pretrained Model2Vec model.
conf_high: Confidence threshold for high-confidence matches (default: 0.7).
conf_medium: Confidence threshold for medium-confidence matches (default: 0.5).
conf_low: Confidence threshold for low-confidence matches (default: 0.15).
ignore_intents: List of intent labels to ignore during matching.
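
One way to read the three thresholds is as nested confidence tiers. The sketch below assumes a prediction reaches a tier when its confidence is greater than or equal to that tier's threshold; the plugin's actual comparison may differ, and the function name `confidence_tier` is hypothetical.

```python
from typing import Optional

# Default thresholds taken from the parameter list above.
DEFAULTS = {"conf_high": 0.7, "conf_medium": 0.5, "conf_low": 0.15}


def confidence_tier(score: float, config: Optional[dict] = None) -> Optional[str]:
    """Return the highest tier whose threshold `score` meets, or None.

    Assumption: a tier is met when score >= threshold.
    """
    cfg = {**DEFAULTS, **(config or {})}
    for tier in ("high", "medium", "low"):
        if score >= cfg[f"conf_{tier}"]:
            return tier
    return None


print(confidence_tier(0.82))  # high
print(confidence_tier(0.60))  # medium
print(confidence_tier(0.10))  # None
```

Under this reading, raising conf_high makes the high-confidence stage stricter, so more utterances fall through to later pipeline stages.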

Note: Model2Vec models are pretrained and do not dynamically learn new skills at runtime.

How It Works

Receives a user utterance as text input.
Predicts intent labels using the pretrained Model2Vec embedding model.
Filters out any intents not associated with currently loaded official OVOS skills.
Returns the highest-confidence matching intent.

This process enhances intent recognition, particularly in cases where traditional parsers like Adapt or Padatious may struggle.
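
The filtering and selection steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the plugin's implementation: `pick_intent` is a hypothetical helper, the prediction scores are supplied as a ready-made dictionary instead of coming from a model, and the intent labels are invented for the example.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def pick_intent(
    predictions: Dict[str, float],
    loaded_intents: Set[str],
    ignore_intents: Iterable[str] = (),
    min_confidence: float = 0.7,
) -> Optional[Tuple[str, float]]:
    """Pick the best-scoring intent among loaded, non-ignored labels."""
    ignored = set(ignore_intents)
    # Keep only labels that belong to loaded skills and are not ignored.
    candidates = {
        label: score
        for label, score in predictions.items()
        if label in loaded_intents and label not in ignored
    }
    if not candidates:
        return None
    label = max(candidates, key=candidates.get)
    score = candidates[label]
    # Assumption: a match counts only at or above the threshold.
    return (label, score) if score >= min_confidence else None


# Hypothetical labels and scores for illustration only.
scores = {"weather:current": 0.91, "music:play": 0.88, "custom:hello": 0.95}
loaded = {"weather:current", "music:play"}
print(pick_intent(scores, loaded))  # ('weather:current', 0.91)
```

Note that the top-scoring label is discarded here because it is not among the loaded intents, which is the behavior the filtering step describes.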

Models Overview

Multilingual Model: Over 500MB, distilled from LaBSE, supports many languages and partially translated skills.
Language-Specific Models: Roughly 10x smaller than the multilingual model, highly efficient, and almost as accurate, which makes them ideal for devices with limited resources.

Models can be specified via local paths or Hugging Face repository IDs. The available models are listed in the OVOS Model2Vec Models collection on Hugging Face.

Training Data

The Model2Vec intent classifier is trained on a diverse, aggregated collection of intent examples from:

OVOS LLM Augment Intent Examples: Synthetic utterances generated by large language models for OVOS skills.
Music Query Templates: Templates focused on music-related intents.
Language-Specific Skill Intents: CSV files extracted from OpenVoiceOS GitLocalize, covering English, Portuguese, Basque, Spanish, Galician, Dutch, French, German, Catalan, Italian, and Danish.

Models are regularly updated with new data to improve performance and language coverage.

Important Usage Notes

Official OVOS Skills Only: The Model2Vec pipeline classifies intents only from official OVOS skills. For personal or custom skills, you should continue to use Adapt and Padatious parsers alongside Model2Vec.
Complementary Pipeline: Model2Vec is designed to augment your intent pipeline, not replace Adapt or Padatious. Using all three together provides the best overall recognition.
Padatious Intent Data & Training: Padatious intent data and example utterances are available in GitLocalize for translations and new model training. The Model2Vec models are continuously updated with this data.
Language Support: The multilingual model (500MB+) supports many languages and works well with partially translated skills, as long as dialogs are localized.
Optimization: Language-specific models are on average 10x smaller and nearly as accurate as the multilingual model, making them ideal for constrained hardware or single-language setups.