mecab-web-worker

v0.3.2

A web worker for Japanese morphological analysis in the browser using WASM.

mecab-web-worker

Using MeCab for Japanese segmentation in the browser. Inspired by fugashi's API.

npm install mecab-web-worker

Compatibility notice: Uses Module Workers and the Compression Streams API. These are not available in every major browser.
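A page can check for both features before it tries to create the worker. The following is only a minimal sketch and not part of mecab-web-worker: the helper names `supportsModuleWorkers` and `missingFeatures` are made up for this example, and the module worker probe assumes the common trick of watching whether the browser reads the `type` option.

```javascript
// Hypothetical helpers for illustration; not part of mecab-web-worker.

// Returns true if `new Worker(url, { type: "module" })` appears to be supported.
// Browsers that understand module workers read the `type` option; others ignore it.
function supportsModuleWorkers(scope = globalThis) {
  if (typeof scope.Worker !== "function") return false;
  let typeWasRead = false;
  const probe = {
    get type() {
      typeWasRead = true;
      return "module";
    },
  };
  try {
    // An empty data: URL keeps the probe from loading any real script.
    const worker = new scope.Worker("data:,", probe);
    if (typeof worker.terminate === "function") worker.terminate();
  } catch {
    // Construction may throw; whether the getter was read still counts.
  }
  return typeWasRead;
}

// Lists the features named in the compatibility notice that are missing.
function missingFeatures(scope = globalThis) {
  const missing = [];
  if (!supportsModuleWorkers(scope)) missing.push("Module Workers");
  if (typeof scope.DecompressionStream !== "function") {
    missing.push("Compression Streams API");
  }
  return missing;
}
```

If `missingFeatures()` returns a non-empty array, the page could show a notice instead of calling `MecabWorker.create`.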

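import { MecabWorker } from "mecab-web-worker";

// Create the worker; the argument is the URL of a zipped MeCab dictionary.
const worker = await MecabWorker.create("/unidic-mecab-2.1.2_bin.zip");

// Parse a sentence and log the result.
const result = await worker.parse("和布蕪は、ワカメの付着器の上にある");
console.log(result);

// Parse into individual nodes and log each one.
const nodes = await worker.parseToNodes("和布蕪は、ワカメの付着器の上にある");
for (const node of nodes) {
  console.log(node);
}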

MeCab was compiled to WASM and runs in a background thread via the Web Workers API. You need to provide a dictionary as a URL to a zip file. The corresponding files are available here: https://github.com/leyhline/mecab-web-worker/releases/tag/v0.3.0. After the first download, the zip file is persisted in the browser cache (using CacheStorage) to avoid repeated downloads.
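The caching behaviour described above can be approximated with a cache-first lookup. This is a sketch under assumptions, not the package's actual code: the function name `fetchWithCache`, its options, and the cache name `"dictionary-cache"` are invented for illustration. Only `caches.open`, `cache.match`, `cache.put`, and `fetch` are standard browser APIs.

```javascript
// Hypothetical helper for illustration; not the package's actual implementation.
// `cacheName` is an arbitrary label chosen for this example.
async function fetchWithCache(url, options = {}) {
  const {
    cacheStorage = globalThis.caches,
    fetchFn = globalThis.fetch,
    cacheName = "dictionary-cache",
  } = options;

  const cache = await cacheStorage.open(cacheName);

  // Serve the stored copy if an earlier visit already downloaded the file.
  const cached = await cache.match(url);
  if (cached) return cached;

  const response = await fetchFn(url);
  if (!response.ok) {
    throw new Error(`Download failed with status ${response.status}`);
  }

  // A Response body can be read only once, so store a clone and return the original.
  await cache.put(url, response.clone());
  return response;
}
```

In a browser, `await fetchWithCache("/unidic-mecab-2.1.2_bin.zip")` would hit the network only on the first call. Later calls would resolve from CacheStorage.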

Motivation

I want to build an interactive tool for aligning Japanese text and audio. Since interactivity is easier to accomplish in the browser, I wanted to go full JS instead of putting e.g. Python in the mix. And since the segmentation functionality is easy to separate, I decided to create an npm package that's hopefully as easy to use as Python's fugashi (a great wrapper around MeCab; check it out, cite it, and sponsor Paul's work).

My uninformed self also drew on the knowledge Paul published on his blog, e.g. "An Overview of Japanese Tokenizer Dictionaries", and I use his UniDic distribution. Thanks a lot! I hope to build a better understanding of the theory behind all this at a later date.

Technical Background

MeCab was compiled to WASM using Emscripten without wrapper code in C. See the corresponding GitHub Action for the compiler flags.

However, to access a C struct from MeCab, I had to use pointer arithmetic in JavaScript (see mecab-worker.js:MecabNode), which isn't really elegant. I also wrote a simple unzip function using the Compression Streams API, which works (in Chrome at least) but is not completely correct.
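To give an idea of what pointer arithmetic in JavaScript looks like, here is a minimal sketch that reads a linked list of structs out of a raw memory buffer with `DataView`. The struct layout, the offsets, and the names `readExampleNode` and `readExampleList` are all hypothetical. This is not MeCab's real node struct and not the code in mecab-worker.js. It only assumes a wasm32-style layout with 4-byte little-endian pointers.

```javascript
// Hypothetical C struct, laid out as on wasm32 (4-byte pointers, little-endian):
//   struct example_node {
//     struct example_node *next;    // offset 0
//     const char          *surface; // offset 4
//     uint16_t             length;  // offset 8, number of bytes in `surface`
//   };
const OFFSET_NEXT = 0;
const OFFSET_SURFACE = 4;
const OFFSET_LENGTH = 8;

// Reads one struct at address `ptr` from the raw bytes of the WASM memory.
function readExampleNode(memoryBuffer, ptr) {
  const view = new DataView(memoryBuffer);
  const next = view.getUint32(ptr + OFFSET_NEXT, true); // true = little-endian
  const surfacePtr = view.getUint32(ptr + OFFSET_SURFACE, true);
  const length = view.getUint16(ptr + OFFSET_LENGTH, true);
  // The string is not null-terminated here; `length` says how many bytes to decode.
  const bytes = new Uint8Array(memoryBuffer, surfacePtr, length);
  return { next, surface: new TextDecoder("utf-8").decode(bytes) };
}

// Follows `next` pointers until a null pointer (0) ends the list.
function readExampleList(memoryBuffer, headPtr) {
  const surfaces = [];
  let ptr = headPtr;
  while (ptr !== 0) {
    const node = readExampleNode(memoryBuffer, ptr);
    surfaces.push(node.surface);
    ptr = node.next;
  }
  return surfaces;
}
```

With an Emscripten build, the buffer would typically be the module's linear memory. The downside of this approach is that every offset has to be kept in sync with the C struct definition by hand.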

TODO

  • [x] Support different dictionaries; this isn't hard but wasn't a use case for me personally.
  • [ ] Wrap more of MeCab's functionality, like returning N-best results.
  • [ ] Polyfills for APIs that are not widely supported by browsers.