@transcribe/transcriber

v2.0.5

Transcribe speech to text in the browser.

Downloads

172

Transcribe.js

Transcribe speech to text in the browser. Based on a wasm build of whisper.cpp.

Note: This package is browser only. Node.js is not supported. (see this discussion for details)

Packages

All packages are under the @transcribe namespace.

| Package                 | Description                                                                                                    |
| ----------------------- | -------------------------------------------------------------------------------------------------------------- |
| @transcribe/shout       | Wasm build based on whisper.cpp. Contains Module file including the wasm binary and a separate webworker file. |
| @transcribe/transcriber | FileTranscriber and StreamTranscriber for transcribing media files or streams.                                 |

Prerequisite

Webserver

Your webserver must serve the files with the following cross-origin isolation headers:

"Cross-Origin-Embedder-Policy": "require-corp"
"Cross-Origin-Opener-Policy": "same-origin"
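
For local experiments, the two headers can be set by a small Node.js server. The following is a minimal sketch, not part of Transcribe.js: the names `setIsolationHeaders` and `startDevServer`, the default port and the content-type table are assumptions made for this example, and the server is only meant for development.

```javascript
// The two headers from above; together they make the page cross-origin isolated.
const ISOLATION_HEADERS = {
  "Cross-Origin-Embedder-Policy": "require-corp",
  "Cross-Origin-Opener-Policy": "same-origin",
};

// Apply the headers to any Node.js-style response object (hypothetical helper).
function setIsolationHeaders(res) {
  for (const [name, value] of Object.entries(ISOLATION_HEADERS)) {
    res.setHeader(name, value);
  }
  return res;
}

// Hypothetical dev server: serves files below `root` with the headers set.
async function startDevServer(root = ".", port = 8080) {
  const { createServer } = await import("node:http");
  const { readFile } = await import("node:fs/promises");
  const path = await import("node:path");

  const rootDir = path.resolve(root);
  const contentTypes = {
    ".html": "text/html",
    ".js": "text/javascript",
    ".mjs": "text/javascript",
    ".wasm": "application/wasm",
  };

  const server = createServer(async (req, res) => {
    setIsolationHeaders(res);
    try {
      const { pathname } = new URL(req.url, "http://localhost");
      const file = path.join(rootDir, decodeURIComponent(pathname));
      // Refuse anything that resolves outside of the served directory.
      if (!file.startsWith(rootDir + path.sep)) throw new Error("outside root");
      const body = await readFile(file);
      const type = contentTypes[path.extname(file)] ?? "application/octet-stream";
      res.setHeader("Content-Type", type);
      res.end(body);
    } catch {
      res.statusCode = 404;
      res.end("Not found");
    }
  });
  return server.listen(port);
}
```

Calling `startDevServer("/your/project")` starts the server. If you already use a webserver or a bundler dev server, set the same two headers in its configuration instead.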

Browser

Your browser must support SharedArrayBuffer. (browser support)

The default wasm files are built with SIMD enabled. If your browser/device doesn't support SIMD, use the no-simd files instead. Also check out the example code on how to use them. (browser support)
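
Both requirements can be checked at runtime before loading anything heavy. The sketch below is an illustration under assumptions, not an API of Transcribe.js: the helper names are made up for this example, and the SIMD check validates a tiny hand-assembled wasm module that contains v128 instructions. Only the two file names are taken from the installation steps in this README.

```javascript
// Tiny wasm module: one function returning a v128 value built with SIMD opcodes.
// Engines without SIMD support reject it during validation.
const SIMD_PROBE = new Uint8Array([
  0, 97, 115, 109, 1, 0, 0, 0, // magic + version
  1, 5, 1, 96, 0, 1, 123,      // type section: () -> v128
  3, 2, 1, 0,                  // function section
  10, 10, 1, 8, 0,             // code section, one body, no locals
  65, 0, 253, 15, 253, 98, 11, // i32.const 0, i8x16.splat, i8x16.popcnt, end
]);

// True if SharedArrayBuffer is exposed (in browsers this needs the headers above).
function supportsSharedArrayBuffer() {
  return typeof SharedArrayBuffer !== "undefined";
}

// True if the engine accepts wasm SIMD instructions.
function supportsWasmSimd() {
  return typeof WebAssembly === "object" && WebAssembly.validate(SIMD_PROBE);
}

// Pick the build file name to load; file names as copied during installation.
function pickShoutBuild(simdSupported) {
  return simdSupported ? "shout.wasm.js" : "shout.wasm_no-simd.js";
}
```

A page could then load the matching build dynamically, for example with `import("/your/project/" + pickShoutBuild(supportsWasmSimd()))`, and show a message instead of failing when `supportsSharedArrayBuffer()` returns false.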

Model File

You need a ggml model file to run Transcribe.js. You can download one from Hugging Face: https://huggingface.co/ggerganov/whisper.cpp/tree/main . Start with the (quantized) tiny or base models. Larger models probably won't work, but you can try.

Installation

NPM

Install shout wasm and transcriber packages

npm install --save @transcribe/transcriber

Copy the shout.wasm and webworker files to your project directory:

# copy shout wasm
cp node_modules/@transcribe/shout/src/shout/shout.wasm.worker.mjs /your/project
cp node_modules/@transcribe/shout/src/shout/shout.wasm.js /your/project

# optional: copy no-simd build
cp node_modules/@transcribe/shout/src/shout/shout.wasm.worker_no-simd.mjs /your/project
cp node_modules/@transcribe/shout/src/shout/shout.wasm_no-simd.js /your/project

# optional: copy audio-worklets, only needed if you want to use StreamTranscriber
cp -r node_modules/@transcribe/transcriber/src/audio-worklets /your/project

Manual Installation

You can use Transcribe.js without a bundler or package manager. Download the files from this repository, copy the src/* directories to your webserver and include the following in your HTML. Make sure to set the correct paths in the import map.

<!-- set paths to js files -->
<script type="importmap">
  {
    "imports": {
      "@transcribe/shout": "/src/shout/shout.wasm.js",
      "@transcribe/transcriber": "/src/index.js"
    }
  }
</script>

<!-- use type="module" for es6 imports -->
<script type="module">
  import createModule from "/your/project/shout.wasm.js"; // path you copied the file to earlier
  // import createModule from "@transcribe/shout";  // if you use an import map
  import { FileTranscriber } from "@transcribe/transcriber";

  ...
</script>

Usage

For full code examples and advanced usage please see https://www.transcribejs.dev or check out the File Transcriber Example.

import createModule from "/your/project/shout.wasm.js"; // path you copied the file to earlier
// import createModule from "@transcribe/shout";  // if you use an import map
import { FileTranscriber } from "@transcribe/transcriber";

// create new instance
const transcriber = new FileTranscriber({
  createModule, // create module function from emscripten build
  model: "/your/project/ggml-tiny-q5_1.bin", // path to ggml model file
  workerPath: "/your/project", // directory of shout.wasm.worker.mjs copied before
});

// init wasm transcriber worker
await transcriber.init();

// transcribe audio/video file
const result = await transcriber.transcribe("/your/project/my.mp3");

console.log(result);

The result is a JSON object containing the text segments and timestamps.

{
  "result": {
    "language": "en"
  },
  "transcription": [
    {
      "timestamps": {
        "from": "00:00:00,000",
        "to": "00:00:11,000"
      },
      "offsets": {
        "from": 0,
        "to": 11000
      },
      "text": " And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.",
      "tokens": [
        {
          "text": " And",
          "timestamps": {
            "from": "00:00:00,320",
            "to": "00:00:00,350"
          },
          "offsets": {
            "from": 320,
            "to": 350
          },
          "id": 400,
          "p": 0.726615 // probability, i.e. how likely the estimate is correct, 0..1, 1 is best
        },
        // ... one token per word
      ]
    }
  ]
}
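
Because the segment timestamps already use the `HH:MM:SS,mmm` form, a result shaped like the one above maps almost directly onto SubRip (SRT) subtitles. The helpers below are a small sketch that relies only on the fields shown in the example output; `toSrt` and `averageProbability` are hypothetical names and not part of the library.

```javascript
// Build an SRT subtitle string from a transcription result.
function toSrt(result) {
  return result.transcription
    .map((segment, index) => {
      const { from, to } = segment.timestamps;
      // SRT cue: running number, time range, text, then an empty line.
      return `${index + 1}\n${from} --> ${to}\n${segment.text.trim()}\n`;
    })
    .join("\n");
}

// Mean token probability of a segment, or null if it has no tokens.
function averageProbability(segment) {
  const tokens = segment.tokens ?? [];
  if (tokens.length === 0) return null;
  const sum = tokens.reduce((total, token) => total + token.p, 0);
  return sum / tokens.length;
}
```

`averageProbability` can serve as a rough filter, for example to flag segments with a low mean value for manual review. Which threshold makes sense depends on the model and the audio.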

Development

Clone the repository, install dependencies, start the dev server and open http://localhost:9876/examples/index.html in your browser.

git clone https://github.com/transcribejs/transcribe.js
cd transcribe.js
npm install
npm run dev

Types

The library is not written in TypeScript. This way no extra build step is needed during development or in production.

To still provide proper type support, type definitions are generated from the JSDoc comments.

npm run generate-types
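
To illustrate the idea, here is what JSDoc-typed plain JavaScript can look like. This is a generic sketch and not code from the library: `TimeRange` and `durationMs` are hypothetical names. Tools such as the TypeScript compiler can read comments like these and emit `.d.ts` declarations; whether the `generate-types` script works exactly this way is an assumption.

```javascript
/**
 * A time range in milliseconds (hypothetical type for this example).
 * @typedef {Object} TimeRange
 * @property {number} from Start offset in milliseconds.
 * @property {number} to End offset in milliseconds.
 */

/**
 * Length of a time range.
 * @param {TimeRange} range Range to measure.
 * @returns {number} Duration in milliseconds.
 */
function durationMs(range) {
  return range.to - range.from;
}
```

Editors that understand JSDoc will offer completion and type checking for `durationMs` without any compile step, which matches the goal described above.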

Wasm build

The whisper.cpp repository is a git submodule. To get the latest version of whisper.cpp, go into the directory and pull the latest changes from GitHub.

cd shout.wasm/whisper.cpp
git pull origin master

The wasm files are built from shout.wasm/src/whisper.wasm.cpp. If you want to add new functions from whisper.cpp to the wasm build, this is the file to add them to.

I'm pretty sure that this will not compile on every machine/architecture, but I'm no expert in C++. If you know how to optimize the build process, please let me know or create a pull request. Maybe this should be dockerized?

# run cmake to build wasm
npm run wasm:build

# copy emscripten build files to project
npm run wasm:copy

Tests

Unit/functional tests for the Transcriber functions.

npm run test:unit

E2E tests using Playwright. Firefox somehow needs waaaaaay longer during e2e tests than in the "real" browser.

npm run test:e2e

Or use the Playwright UI for details:

npm run test:e2e-ui

Credits

People

Many thanks to the people who supported this project, be it through code, ideas or general testing. I appreciate your time and effort.

Libraries

Also thank you to the creators and contributors of the following open source libraries that were used in this project:

  • whisper.cpp: A C++ implementation of whisper. GitHub Repository
  • emscripten: A toolchain for compiling C and C++ code to WebAssembly. Official Site
  • water.css: A minimal CSS framework for styling HTML. Official Site
  • fft.js: A library for Fast Fourier Transform calculations. GitHub Repository
  • Moattar, Mohammad & Homayoonpoor, Mahdi. (2010). A simple but efficient real-time voice activity detection algorithm. Research Paper
  • vitest: A testing framework built on Vite. Official Site
  • Playwright: A tool for automating browser testing. Official Site

Audio Test Files

  • examples/albert.ogg: Radio Universidad Nacional de La Plata, CC BY-SA 3.0, via Wikimedia Commons
  • examples/jfk.wav: CC BY-SA 3.0, via Wikimedia Commons

Sponsoring

This project is tested with BrowserStack