doken

v1.0.2

A minimalistic, general purpose tokenizer generator.

Usage

Use npm to install:

$ npm install doken

Import doken and create a tokenizer with regular expression rules:

const {createTokenizer, regexRule} = require('doken')

const tokenizeJSON = createTokenizer({
  rules: [
    regexRule('_whitespace', /\s+/y, {lineBreaks: true}),
    regexRule('brace', /[{}]/y),
    regexRule('bracket', /[\[\]]/y),
    regexRule('colon', /:/y),
    regexRule('comma', /,/y),
    regexRule('string', /"([^"\n\\]|\\[^\n])*"/y),
    regexRule('number', /(-|\+)?\d+(\.\d+)?/y),
    regexRule('boolean', /(true|false)\b/y),
    regexRule('null', /null\b/y)
  ]
})

let tokens = tokenizeJSON(`{"a": "Hello World!"}`)

console.log([...tokens])

API

Rule object

A rule object contains the following fields:

  • type <string> - The type of the token this rule generates. If type starts with an underscore _, the token will not be emitted by the tokenizer.

  • lineBreaks <boolean> (Optional) - Set this property to true if this rule might match line breaks, so that the rows and columns of subsequent tokens are tracked correctly.

  • match <Function> - A function with the following signature:

    (input: string, position: number) =>
      null |
      {
        length: number,
        value: any
      }

    This function tries to match a token of the rule's type at the given position in input. Return null if the input at position does not start a token of this type; otherwise return an object with the fields shown above.

    The length characters of input starting at position form the matched token. You can optionally return a value containing any data you want attached to the token. If value is omitted, it defaults to those same length characters of input.
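As a minimal sketch of the signature above, here is a hand-written rule object that matches a run of ASCII letters. The name identifierRule and the upper-casing of the value are hypothetical choices made for illustration; only the field names type, match, length and value come from this README.

```javascript
// Hypothetical hand-written rule: matches a run of ASCII letters.
// Only the field names (type, match, length, value) come from the README.
const identifierRule = {
  type: 'identifier',
  match(input, position) {
    let end = position
    while (end < input.length && /[A-Za-z]/.test(input[end])) end++

    // No letters at `position`: this rule does not apply here.
    if (end === position) return null

    return {
      length: end - position,
      // Optional: attach custom data instead of the default raw text.
      value: input.slice(position, end).toUpperCase()
    }
  }
}

console.log(identifierRule.match('foo bar', 4)) // { length: 3, value: 'BAR' }
console.log(identifierRule.match('foo bar', 3)) // null
```

A rule shaped like this should be usable in the rules array next to rules created with regexRule, since both expose the same fields.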

Token object

A token will be represented by an object with the following fields:

  • type <string> | null - The type of the token or null if no given rules match the input.
  • length <number> - The length of the token.
  • value <any> - The value generated by the rule.
  • pos <number> - The zero-based position of the first character of the token.
  • row <number> - The zero-based row of the first character of the token.
  • col <number> - The zero-based column of the first character of the token.
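To make the relationship between pos, row and col concrete, the following sketch derives the zero-based row and column from a position. The helper locate is hypothetical and not part of doken, and it assumes that only "\n" starts a new row; the library may track positions differently.

```javascript
// Hypothetical helper, not part of doken: derive zero-based row/col from pos.
// Assumption: a new row starts only after a "\n" character.
function locate(input, pos) {
  let row = 0
  let lineStart = 0

  for (let i = 0; i < pos; i++) {
    if (input[i] === '\n') {
      row++
      lineStart = i + 1
    }
  }

  // The column is the offset from the start of the current row.
  return {pos, row, col: pos - lineStart}
}

console.log(locate('{\n  "a": 1\n}', 4)) // { pos: 4, row: 1, col: 2 }
```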

doken.createTokenizer(options)

  • options <object>
    • rules <Array<Rule>>
    • strategy 'first' | 'longest' (Optional) - Default: 'first'
  • Returns: <Function>

Generates a tokenize function with the following signature:

(input: string) => IterableIterator<Token>

This function tokenizes the given input, yielding the tokens matched by the given rules one at a time.

Set strategy to 'longest' to match the token with the rule that matches the most characters instead of using the rule that matches first.
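The difference between the two strategies can be sketched with a small selection function. This is an illustration and not doken's source: pickMatch and literal are hypothetical names, and keeping the earlier rule on a tie under 'longest' is an assumption.

```javascript
// Hypothetical sketch, not doken's source: choose a match at `position`.
function pickMatch(rules, input, position, strategy = 'first') {
  let best = null

  for (const rule of rules) {
    const result = rule.match(input, position)
    if (result == null) continue

    const candidate = {type: rule.type, ...result}

    // 'first': the earliest rule that matches wins immediately.
    if (strategy === 'first') return candidate

    // 'longest': keep the candidate covering the most characters.
    // Assumption: on a tie, the earlier rule is kept.
    if (best == null || candidate.length > best.length) best = candidate
  }

  return best
}

// Toy rule factory: matches a fixed piece of text.
const literal = (type, text) => ({
  type,
  match: (input, position) =>
    input.startsWith(text, position) ? {length: text.length, value: text} : null
})

// Both rules match at position 0 of "==".
const rules = [literal('assign', '='), literal('equals', '==')]

console.log(pickMatch(rules, '==', 0, 'first').type)   // assign
console.log(pickMatch(rules, '==', 0, 'longest').type) // equals
```

With 'first', rule order acts as priority, so more specific rules need to come earlier; 'longest' makes the result less dependent on ordering.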

doken.regexRule(type, regex[, options])

  • type <string>
  • regex <RegExp>
  • options <object> (Optional)
    • lineBreaks <boolean> (Optional) - Set this property to true if this rule might match line breaks, so that the rows and columns of subsequent tokens are tracked correctly.
    • value <Function> (Optional) - A function that calculates the token value from the match.
    • condition <Function> (Optional) - A function that indicates whether to keep or discard the match.
  • Returns: <Rule>

Returns a rule that attempts to match input string with the given regex.

value can be set to a function (match: RegExpExecArray) => any. The generated token will have the returned value as value.

condition can be set to a function (match: RegExpExecArray) => boolean. Return false to discard the match and continue with the next rule.
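To show how value and condition fit into the rule's match function, here is a rough re-implementation sketch. The function sketchRegexRule is hypothetical and doken's actual regexRule may differ; the sketch assumes the regex carries the sticky (y) flag, as in the usage example, so that it only matches exactly at the given position.

```javascript
// Hypothetical re-implementation for illustration; doken's regexRule may differ.
function sketchRegexRule(type, regex, {lineBreaks = false, value, condition} = {}) {
  return {
    type,
    lineBreaks,
    match(input, position) {
      // A sticky (y) regex only matches exactly at lastIndex.
      regex.lastIndex = position
      const match = regex.exec(input)
      if (match == null) return null

      // A condition returning false discards the match.
      if (condition != null && !condition(match)) return null

      return {
        length: match[0].length,
        // Without a value function, fall back to the matched text.
        value: value != null ? value(match) : match[0]
      }
    }
  }
}

const numberRule = sketchRegexRule('number', /\d+/y, {
  value: match => Number(match[0]),
  condition: match => match[0].length <= 3
})

console.log(numberRule.match('x=42', 2))  // { length: 2, value: 42 }
console.log(numberRule.match('x=42', 0))  // null
console.log(numberRule.match('12345', 0)) // null
```

The last call returns null because the condition rejects matches longer than three digits, which corresponds to moving on to the next rule in a tokenizer.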