A speech recognition library running in the browser thanks to a WebAssembly build of Vosk

Overview

Vosk-Browser

A somewhat opinionated speech recognition library for the browser using a WebAssembly build of Vosk

This library picks up the work done by Denis Treskunov and packages an updated Vosk WebAssembly build as an easy-to-use browser library.

Note: WebAssembly builds can target NodeJS, the browser's main thread, or web workers. This library explicitly compiles Vosk to be used in a WebWorker context. If you want to use Vosk in a NodeJS application, it is recommended to use the official node bindings.

Live Demo

Check out the demo running in-browser speech recognition of microphone input or audio files in 13 languages.

Installation

You can install vosk-browser as a module:

$ npm i vosk-browser

You can also use a CDN like jsDelivr to add the library to your page; it will then be accessible via the global variable Vosk:
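For example, a minimal script tag might look like the following (a sketch only: the dist/vosk.js path matches the package layout seen in the stack traces further down this page, but check jsDelivr for the exact bundle URL and pin a version in production):

<!-- Sketch: verify the exact file path and version on jsDelivr before use -->
<script src="https://cdn.jsdelivr.net/npm/vosk-browser/dist/vosk.js"></script>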


Usage

See the README in ./lib for API reference documentation, or check out the examples folder for some ways of using the library.

Basic example

A minimal example that assumes vosk-browser is loaded via a script tag. It loads the model named model.tar.gz located in the same path as the script and starts listening to the microphone. Recognition results are logged to the console.

async function init() {
    // Load the model archive and create a recognizer that emits recognition events.
    const model = await Vosk.createModel('model.tar.gz');

    const recognizer = new model.KaldiRecognizer();
    recognizer.on("result", (message) => {
        console.log(`Result: ${message.result.text}`);
    });
    recognizer.on("partialresult", (message) => {
        console.log(`Partial result: ${message.result.partial}`);
    });

    // Request a mono microphone stream.
    const mediaStream = await navigator.mediaDevices.getUserMedia({
        video: false,
        audio: {
            echoCancellation: true,
            noiseSuppression: true,
            channelCount: 1,
            sampleRate: 16000
        },
    });

    // Pipe microphone audio into the recognizer through a ScriptProcessorNode.
    const audioContext = new AudioContext();
    const recognizerNode = audioContext.createScriptProcessor(4096, 1, 1);
    recognizerNode.onaudioprocess = (event) => {
        try {
            recognizer.acceptWaveform(event.inputBuffer);
        } catch (error) {
            console.error('acceptWaveform failed', error);
        }
    };
    const source = audioContext.createMediaStreamSource(mediaStream);
    source.connect(recognizerNode);
}

window.onload = init;
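Note: browsers often ignore the sampleRate constraint requested above and run the AudioContext at the hardware rate (typically 44100 or 48000 Hz). If recognition quality is poor, one variation worth trying (a sketch, not part of the original example) is to pass the context's actual sample rate to the recognizer, mirroring the new model.KaldiRecognizer(48000) usage that appears in the issues further down this page:

// Sketch only: assumes the KaldiRecognizer constructor accepts a sample rate,
// as in the `new model.KaldiRecognizer(48000)` call shown in the issues below.
const audioContext = new AudioContext();
const recognizer = new model.KaldiRecognizer(audioContext.sampleRate);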

Todos

  • Support for word/phrase lists in KaldiRecognizer
  • Add example with word/phrase list
  • Write tests
  • Automate npm publish
  • Update to OpenFST 1.8.0
Comments
  • Unable to load model

    Hi,

    Thanks for this work. I am using Chrome. The model file model.tar.gz is placed in the same folder. It never moves past the "Loading..." message!

    how do you start the demo locally?

    Navigated to modern-vanilla directory and launched python3 -m http.server

    can you share the output of the browser console?

    ERROR (VoskAPI:Model():src/model.cc:122) Folder '/vosk/model_tar_gz' does not contain model files. Make sure you specified the model path properly in Model constructor. If you are not sure about relative path, use absolute path specification. put_char @ 82049aad-16de-4cf3-9fcf-0c277f01fe02:41

    opened by raghavendrajain 8
  • Build broken by kaldi repo

    The kaldi repo no longer has an upstream-1.8.0 branch nor a revision 75ecaef39 (thanks, git, for allowing history to be erased). Right now, vosk-browser doesn't build because of these issues.

    opened by Yahweasel 8
  • Online demo created

    Not sure if this is of any use, but I created a small online demo using this tool when I was experimenting with it. You can view it online at

    https://captioner.richardson.co.nz/

    And the source code for it is at: https://github.com/Rodeoclash/captioner

    It might be possible to adapt this for an official demo if you're interested (although it is lacking a few things at the moment, i.e. it only works on video and currently the videos have no audio when playing).

    opened by Rodeoclash 7
  • Recognizer.removeEventListener

    I am currently using Vue.js to run vosk-browser and managed to call the ASR model and Kaldi recognizer by using

    this.recognizer.on("result", (message) => {
        const result = message.result;
        this.full.textContent += result.text + " "
    })
    

    The model is working well, however, I am trying to remove the event listener by using:

    this.recognizer.removeEventListener("result", (message) => {
        const result = message.result;
        this.full.textContent += result.text + " "
    })
    

    Is this the way of doing it?
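    For reference (not from the thread): removeEventListener only removes a listener when it is given the exact same function reference that was registered, so passing a new inline arrow function has no effect. A minimal sketch, assuming the recognizer exposes the addEventListener/removeEventListener pair used elsewhere on this page:

    // Keep a named reference so the identical function can be detached later.
    const onResult = (message) => {
        this.full.textContent += message.result.text + " ";
    };
    this.recognizer.addEventListener("result", onResult);

    // ...later, when the listener is no longer needed:
    this.recognizer.removeEventListener("result", onResult);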

    opened by stevenlimcorn 7
  • Vosk model

    I am new to JavaScript. I wanted to see how the vosk-browser script works using the sample script. I downloaded a Vosk model, zipped it as tar.gz and put it in the same folder as the script. I tried to check for errors using a button onclick event on an HTML page. I got this in Visual Studio Code:

    Setting up persistent storage at /vosk null/4ccd8af6-9ac1-407c-9f6a-436d83146d69:147
    File system synced from host to runtime null/4ccd8af6-9ac1-407c-9f6a-436d83146d69:40

    Am I supposed to create a folder named "vosk"? I really do not understand. Thank you for responding.

    opened by temitopefunmi 6
  • Webpage is not loading

    I have little coding experience, but I followed all the guidelines to launch the demo app from the examples/react folder. I ran npm install, npm build and a few other commands to resolve errors for webpack 5. However, when I finally ran npm run start, vosk-browser failed to launch even though no errors were reported. The page is empty.

    C:\Users\CNata\Downloads\vosk-browser-master\examples\react>npm run start
    
    > [email protected] start
    > react-scripts start
    
    (node:17120) [DEP_WEBPACK_DEV_SERVER_ON_AFTER_SETUP_MIDDLEWARE] DeprecationWarning: 'onAfterSetupMiddleware' option is deprecated. Please use the 'setupMiddlewares' option.
    (Use `node --trace-deprecation ...` to show where the warning was created)
    (node:17120) [DEP_WEBPACK_DEV_SERVER_ON_BEFORE_SETUP_MIDDLEWARE] DeprecationWarning: 'onBeforeSetupMiddleware' option is deprecated. Please use the 'setupMiddlewares' option.
    Starting the development server...
    Compiled successfully!
    
    You can now view vosk-browser-react-demo in the browser.
    
      Local:            http://localhost:3000/vosk-browser
      On Your Network:  http://192.168.56.1:3000/vosk-browser
    
    Note that the development build is not optimized.
    To create a production build, use npm run build.
    
    webpack compiled successfully
    Files successfully emitted, waiting for typecheck results...
    Issues checking in progress...
    No issues found.
    
    opened by Nata0801 5
  • Result event not triggered on file upload

    Hello, I am working on a way to pass audio file to the recognizer all at once.

    I took the React example and edited file-upload.tsx to send the whole file as a buffer to the AudioStreamer "_write" method. The problem is that the "result" event of the recognizer is not fired after processing. The "partialresult" event is called with every word but misses timestamps.

    Here is the implementation of the "onChange" function in file-upload.tsx:

    const onChange = useCallback(
        async ({ file }: UploadChangeParam<UploadFile<any>>) => {
    
          if (
            recognizer &&
            file.originFileObj &&
            file.percent === 100
          ) {
            const fileUrl = URL.createObjectURL(file.originFileObj);
            const _audioContext = audioContext ?? new AudioContext();
            const arr = await fetch(fileUrl).then((res) => res.arrayBuffer());
    
            _audioContext.decodeAudioData(arr, (buffer) => {
              let audioStreamer = new AudioStreamer(recognizer);
              audioStreamer._write(buffer, {
                objectMode: true,
              }, () => {
                console.log('done')
              });
            });
          }
        },
        [audioContext, recognizer]
      );
    

    I have also noticed that when uploading a second file it works well; the result event is triggered and includes data from both files.

    What am I missing? Is there a way to dispatch a "result" event?

    opened by Clement-mim 4
  • Recognizer listens before the event 'result' or 'partialresult' is added

    Hello! If I say "Hello" and then run the code below, I get the result "Hello".

    this.recognizer.addEventListener('partialresult', this.getPartialResult);
    this.recognizer.addEventListener('result', this.getResult);
    

    Expected: the recognizer should start listening only once the event listener is added.

    I am creating a feature in which users press and speak.

    I thought I would disable the audio track when users don't need the microphone, like this, but then the recognizer just pauses.

    this.mediaStream.getAudioTracks().forEach(track => {
        track.enabled = false;
    });
    

    So if the user presses my button again after a long time, the code will run track.enabled = true and the recognizer will continue recognizing the previous (not the current) audio.
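    One alternative worth considering (not from the thread): instead of toggling track.enabled, disconnect the media-stream source from the recognizer node while the button is released, so the recognizer receives no audio at all until it is reconnected. A minimal sketch, assuming a source and recognizerNode wired up as in the README example; pauseRecognition and resumeRecognition are hypothetical helper names:

    // Hypothetical helpers built on the Web Audio graph rather than track.enabled.
    function pauseRecognition() {
        source.disconnect(recognizerNode); // stop feeding audio to the recognizer
    }
    function resumeRecognition() {
        source.connect(recognizerNode); // resume feeding audio
    }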

    Tested on Vue.js

    opened by timpuyda 4
  • Failed to sync file system: Error: FS error

    I am getting the following error in both Chrome and Firefox...

    Failed to sync file system: Error: FS error
    (anonymous) @ fcedf841-34f4-40cb-8bb0-17f857a1d44c:127
    Promise.catch (async)
    handleMessage @ fcedf841-34f4-40cb-8bb0-17f857a1d44c:126
    (anonymous) @ fcedf841-34f4-40cb-8bb0-17f857a1d44c:107
    

    fcedf841-34f4-40cb-8bb0-17f857a1d44c:127 links to the following code:

        class RecognizerWorker {
            constructor() {
                this.recognizers = new Map();
                ctx.addEventListener("message", (event) => this.handleMessage(event));
            }
            handleMessage(event) {
                const message = event.data;
                if (!message) {
                    return;
                }
                if (ClientMessage.isLoadMessage(message)) {
                    console.debug(JSON.stringify(message));
                    const { modelUrl } = message;
                    if (!modelUrl) {
                        ctx.postMessage({
                            error: "Missing modelUrl parameter",
                        });
                    }
                    this.load(modelUrl)
                        .then((result) => {
                        ctx.postMessage({ event: "load", result });
                    })
                        .catch((error) => {                                                       // --- IT'S THIS ERROR THAT IS CATCHING  
                        console.error(error);
                        ctx.postMessage({ error: error.message });
                    });
                    return;
    
    ... etc
    

    Do let me know if more details to reproduce the error are needed.

    Thank you!!

    opened by mattmegarry 4
  • Unable to load model in nodejs

    When I run the following code:

    let Vosk = require("vosk-browser");
    let url = "model.tar.gz";
    async function init() {
      const model = await Vosk.createModel(url);
    }
    init();
    

    I get this error:

    this.worker.addEventListener("message", (event) => this.handleMessage(event));
                       ^
    
    TypeError: this.worker.addEventListener is not a function
        at EventTarget.initialize (/Users/bobby/Desktop/vosk-browser/node_modules/vosk-browser/dist/vosk.js:238:25)
        at new Model (/Users/bobby/Desktop/vosk-browser/node_modules/vosk-browser/dist/vosk.js:235:18)
        at Object.<anonymous> (/Users/bobby/Desktop/vosk-browser/node_modules/vosk-browser/dist/vosk.js:354:27)
        at Generator.next (<anonymous>)
        at /Users/bobby/Desktop/vosk-browser/node_modules/vosk-browser/dist/vosk.js:28:75
        at new Promise (<anonymous>)
        at __awaiter (/Users/bobby/Desktop/vosk-browser/node_modules/vosk-browser/dist/vosk.js:24:16)
        at Object.createModel (/Users/bobby/Desktop/vosk-browser/node_modules/vosk-browser/dist/vosk.js:353:16)
        at init (/Users/bobby/Desktop/vosk-browser/index.js:5:28)
        at Object.<anonymous> (/Users/bobby/Desktop/vosk-browser/index.js:7:1)
    

    My folder structure is

    |
    |-- index.js
    |-- model.tar.gz
    |-- node_modules/
    

    So I would think the program could load the model, but I also get the same error when I set the url to be complete gibberish.

    Thanks for your help

    opened by LittleRobertTables 4
  • Attribution difficult

    The NOTICES file doesn't include all dependent software, but every piece of dependent software requires attribution. This makes it extremely difficult for anyone to put together a correct (and legally mandatory) attribution and license notice. I put this one together, which I believe includes all dependencies: https://raw.githubusercontent.com/Yahweasel/ennuicastr/master/src/vosk-browser-license.js .

    Moreover, I was surprised to find GSL in the mix. GSL is under the GPL (not the LGPL), so if it's being used, then vosk-browser as a whole is licensed under the GPL. That's no problem for my use, but it should be documented somewhere. Weirdly, though, as far as I can tell it's not actually using GSL. The kaldi patch seems to add GSL to the configure script, but doesn't add any uses of GSL. If it was some experiment (perhaps from the original porter of vosk?) it should just be removed, to fix this licensing snafu.

    opened by Yahweasel 4
  • information available in the User Agent string will be reduced

    A page or script is accessing at least one of navigator.userAgent, navigator.appVersion, and navigator.platform. Starting in Chrome 101, the amount of information available in the User Agent string will be reduced. To fix this issue, replace the usage of navigator.userAgent, navigator.appVersion, and navigator.platform with feature detection, progressive enhancement, or migrate to navigator.userAgentData. Note that for performance reasons, only the first access to one of the properties is shown

    opened by praksun 0
  • Delays when transcribing streaming audio

    First of all, excellent work. Vosk is great as it is, and this library makes it even better.

    I am experiencing a heavy delay in transcription (both partial and full results) when pulling in a stream from WebRTC.

    I suspect maybe it is because of the deprecated "createScriptProcessor" and "onaudioprocess" pieces, but I am unsure.

    Here is how I am processing things. If you have any ideas as to why things would be delayed, please let me know. Thank you.

    this.recognizeSpeech = async () => {
        console.log("starting recognizeSpeech");
        let audioContext = this.remoteAudioContext;
        let remoteStream = this.incomingAudioStream;
        //
        const recognizerNode = audioContext.createScriptProcessor(4096, 1, 1);
        const model = await createModel("./softphone/model.tar.gz");
        const recognizer = new model.KaldiRecognizer(48000);
        recognizer.setWords(true);
        recognizer.on("partialresult", function (message) {
          console.log("PARTIAL: " + message.result.partial);
        });
        recognizerNode.onaudioprocess = async (event) => {
          try {
            recognizer.acceptWaveform(event.inputBuffer);
          } catch (error) {
            console.error("acceptWaveform failed", error);
          }
        };
        this.remoteTrack.connect(recognizerNode).connect(audioContext.destination);
      };
    
    opened by scott-vector 3
  • Build output location

    I am able to get the build to complete (when using the modification made in #56), but I cannot find the output files. I run the build by running make in the vosk-browser directory. Where does the build output its files? Do the output files need to be manually extracted from the Docker container?

    opened by stevennyman 2
  • How to create an example of the X-vector of the speaker (voice fingerprint)?

    Hello. First of all, a very big thank you for this project.

    I am trying to create an example with a speaker model to get the X-vector of the speaker (voice fingerprint).

    I am using this example: https://github.com/ccoreilly/vosk-browser/blob/master/examples/words-vanilla/index.js

    const model = await Vosk.createModel('vosk-model-small-en-in-0.4.tar.gz');
    const speakerModel = await Vosk.createSpeakerModel('vosk-model-spk-0.4.zip');
    
    ...
    
    const recognizer = new model.KaldiRecognizer(sampleRate, JSON.stringify(['[unk]', 'encen el llum', 'apaga el llum']));
    recognizer.setSpkModel(speakerModel);
    recognizer.on("result", (message) => {
    	const result = message.result;
    	if(result.hasOwnProperty('spk'))
    		console.info("X-vector:", result.spk);
    });
    

    Speaker identification model: https://alphacephei.com/vosk/models/vosk-model-spk-0.4.zip

    Node.js example: https://github.com/alphacep/vosk-api/blob/master/nodejs/demo/test_speaker.js

    Could you offer some advice, please:

    1. How to load vosk-model-spk-0.4.zip
    2. How to implement methods createSpeakerModel and setSpkModel
    3. How to fetch the X-vector of the speaker (voice fingerprint)?

    Thank you for your answer.

    opened by arbdevml 5
  • AudioWorklet support via SEPIA Web Audio?

    Hi everybody,

    I just saw this project and thought it was very interesting and fits quite well with a library I've just released :slightly_smiling_face:. For my SEPIA Open Assistant project I've built the SEPIA Web Audio Library, which can handle custom audio pipelines with AudioWorklet and Web Worker support. There is pretty good WASM support as well, since the resampler, for example, can use Speex via a WASM module.

    The library has a module that interfaces with Vosk via the SEPIA STT-Server (a WebSocket streaming STT server). Currently I prefer to host Vosk on a Raspberry Pi 4 instead of running it on the client, but I'm pretty sure much of the code could be reused :smiley: .

    Let me know if this sounds interesting to you and I can help to get started!

    opened by fquirin 6