
baa-lexer

v0.3.1

![](img/baa-sheep-lemmling.svg)

Original image by lemmling on OpenClipArt.org

Baa!

Baa is a highly-optimised tokenizer/lexer written in TypeScript. It is inspired by moo, but completely rewritten.

It accepts most of moo's configuration options, but lacks some features:

  • No support for arrays of keywords.
  • No support for rules that are arrays of rule definitions.
  • No support for regular expressions with the unicode flag.
  • Fewer dynamic checks (e.g. it silently drops all provided regex flags).

Advantages:

  • Compiles to a reusable, concurrency-safe lexer instead of creating an iterable object directly (see "Usage").
  • Different token format.
  • Slightly faster than moo in most benchmarks (and at worst only marginally slower).
  • About 2.2 kB in size.
  • Strong typings, including state names and token types.
  • Understandable code.

Note: This was mostly an exercise for me to practice test-driven development and to think about architecture a bit. In the end, I tried to optimize for speed and build size. I don't think it makes much difference whether you use moo or baa: moo is more popular and may be better supported in the long run. I will use baa in handlebars-ng, though.

Installation

Install the baa-lexer with

npm install baa-lexer

Usage

The examples/ directory shows you how to use baa. One of the simple examples is this:

import { baa } from "baa-lexer";

const lexer = baa({
  main: {
    A: "a",
    FALLBACK: { fallback: true },
    B: "b",
  },
});

for (const token of lexer.lex("a b")) {
  console.log(token);
}

This will print the following tokens:

{ type: 'A',  original: 'a', value: 'a', start: { line: 1, column: 0 }, end: { line: 1, column: 1 } }
{ type: 'FALLBACK', original: ' ', value: ' ', start: { line: 1, column: 1 }, end: { line: 1, column: 2 } }
{ type: 'B', original: 'b', value: 'b', start: { line: 1, column: 2 }, end: { line: 1, column: 3 } }
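
The start and end locations in these tokens come from tracking line and column positions while consuming the matched text. A minimal self-contained sketch of that bookkeeping (an illustration only, not baa's actual TokenFactory):

```typescript
interface Location {
  line: number;
  column: number;
}

// Advance a location across a matched string, resetting the column on newlines.
function advance(loc: Location, text: string): Location {
  let { line, column } = loc;
  for (const ch of text) {
    if (ch === "\n") {
      line++;
      column = 0;
    } else {
      column++;
    }
  }
  return { line, column };
}

// Walking the matches "a", " ", "b" reproduces the positions shown above.
let pos: Location = { line: 1, column: 0 };
for (const value of ["a", " ", "b"]) {
  const start = pos;
  const end = advance(pos, value);
  console.log({ value, start, end });
  pos = end;
}
```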

For a complete list of rules, have a look at the tests.

Using types

If you create a type

interface Typings {
  tokenType: "my" | "token" | "types";
  stateName: "my" | "state" | "names";
}

and pass it as a generic to the baa function, you will get auto-completion for types within the configuration as well as for the "type" field in the created tokens. The following screenshot highlights all places that are type-checked and auto-completed.
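
To illustrate the idea, here is a self-contained sketch of how such a generic constrains a configuration object. The Config type below is a hypothetical stand-in kept local so the snippet compiles on its own; baa-lexer's real types are richer:

```typescript
interface Typings {
  tokenType: "A" | "B";
  stateName: "main";
}

// Hypothetical stand-in for baa's config type (illustration only).
type Config<T extends { tokenType: string; stateName: string }> = Record<
  T["stateName"],
  Partial<Record<T["tokenType"], string>>
>;

// "main", "A", and "B" are auto-completed and type-checked here; a typo such
// as { mian: ... } or { main: { C: "c" } } would be a compile-time error.
const config: Config<Typings> = {
  main: { A: "a", B: "b" },
};

console.log(Object.keys(config.main));
```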

Benchmarks

See performance/ for the exact tests and run them yourself with

yarn perf

These are the results, but be aware that results may vary a lot:

 BENCH  Summary

  moo - performance/moo-baa.bench.ts > moo-baa test: './tests/abab.ts' (+0)
    1.07x faster than baa

  baa - performance/moo-baa.bench.ts > moo-baa test: './tests/fallback.ts' (+0)
    1.19x faster than moo

  baa - performance/moo-baa.bench.ts > moo-baa test: './tests/handlears-ng.ts' (+0)
    1.50x faster than moo

  baa - performance/moo-baa.bench.ts > moo-baa test: './tests/handlears-ng.ts' (1)
    1.25x faster than moo

  baa - performance/moo-baa.bench.ts > moo-baa test: './tests/handlears-ng.ts' (2)
    1.19x faster than moo

  baa - performance/moo-baa.bench.ts > moo-baa test: './tests/json-regex.ts' (+0)
    1.15x faster than moo

  moo - performance/moo-baa.bench.ts > moo-baa test: './tests/json.ts' (+0)
    1.04x faster than baa

Readable / Extendable code

What bothered me most about moo was that it is just one large JavaScript file, and it took me a long while to understand all the optimizations it implements.

I tried to take a more modular approach. Basically, the whole program is divided into:

  • The Lexer: Responsible for creating an IterableIterator of tokens which then manages state transitions. Uses the TokenFactory to create the actual tokens.
  • The Matcher: Finds the next token match. There are different strategies
    • RegexMatcher: Creates a large regex to find the next match
    • StickySingleCharMatcher: Uses an array to map char-codes to rules. Can only find single-char tokens, but this can be done much faster than with Regex.
  • The StateProcessor: Uses the Matcher to find the next match, interleaves matches for fallback and error rules.
  • The TokenFactory: Keeps track of the current location and creates tokens from matches.
  • The mooAdapter: Takes a moo config and combines all those components so that they do what they should.

Advanced usage

You do not have to use the mooAdapter, though: most of the internal components are exposed, so you can use them yourself. You can create a StateProcessor and pass your own Matcher instance to it, or you can create a completely new StateProcessor with completely custom logic.

The program could also be extended to allow a custom TokenFactory that applies whatever token format you need (but I won't do this unless somebody needs it).