lexdoc


Simplified token definition and lexer creation library for use with the Chevrotain parser building toolkit.

Example:

const chevrotain = require('chevrotain')
const LD = require('lexdoc')(chevrotain)

const JsonLexer = LD.build({
  WhiteSpace: {
    pattern: /[ \t\n\r]+/,
    group: LD.SKIPPED,
    line_breaks: true
  },

  NumberLiteral: /-?(0|[1-9]\d*)(\.\d+)?([eE][+-]?\d+)?/,
  StringLiteral: /"(?:[^\\"]|\\(?:[bfnrtv"\\/]|u[0-9a-fA-F]{4}))*"/,

  LCurly:  '{',
  RCurly:  '}',
  LSquare: '[',
  RSquare: ']',
  Comma:   ',',
  Colon:   ':',
  True:    'true',
  False:   'false',
  Null:    'null',
})

JsonLexer.instance // Stores the Chevrotain Lexer instance that was created

JsonLexer.lex(str) // Tokenize the given string. Wrapper around the created
                   // Lexer instance's #tokenize method, with built-in error
                   // handling.

JsonLexer.tokens        // An object containing the created tokens

JsonLexer.tokens.LCurly // Referencing a token
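
As a quick illustration of the API above, here's a hypothetical usage sketch. It assumes #lex returns Chevrotain's usual tokenize result (an object with a tokens array), which its role as a #tokenize wrapper suggests:

const result = JsonLexer.lex('{"a": [1, true, null]}')

// Each token carries its matched text and its token type. WhiteSpace is
// skipped, so only the meaningful tokens remain.
console.log(result.tokens.map(t => t.image))
// => [ '{', '"a"', ':', '[', '1', ',', 'true', ',', 'null', ']', '}' ]

console.log(result.tokens[0].tokenType === JsonLexer.tokens.LCurly) // true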

Installation

npm install lexdoc

Features

Shortened Definitions for Simple Tokens

Tokens which have only a pattern and no other properties can be defined using only their pattern, simplifying many token definitions:

// Using the Chevrotain #createToken API:
const { createToken } = require('chevrotain')

const True = createToken({ name: "True", pattern: /true/ })
const False = createToken({ name: "False", pattern: /false/ })
const Null = createToken({ name: "Null", pattern: /null/ })
const LCurly = createToken({ name: "LCurly", pattern: /{/ })
const RCurly = createToken({ name: "RCurly", pattern: /}/ })

// Using Lexdoc
LD.build({
  True:   /true/,
  False:  /false/,
  Null:   /null/,
  LCurly: /{/,
  RCurly: /}/
})

Notice that the token name is not repeated using Lexdoc, keeping things DRY.

Soft Token References

References to other tokens within a token definition are made using a string containing the token's name, rather than a direct reference to the created token object. This has two advantages:

  • Tokens can be referenced before they are defined.
  • Tokens are defined in order of lexing precedence; order of precedence doesn't have to be specified separately.

Examples:

LD.build({
  Boolean: LD.CATEGORY, // Categories are defined using LD.CATEGORY, a more
  Value:   LD.CATEGORY, // semantic synonym for Chevrotain's Lexer.NA

  // Reserved words
  True: {
    pattern: /true/,
    longer_alt: 'Identifier',   // Note that we reference Identifier before it's defined
    categories: 'Boolean Value'
  },
  False: {
    pattern: /false/,
    longer_alt: 'Identifier',
    categories: 'Boolean Value' // Categories can be referenced using a
  },                            // space-separated string. An array of strings
                                // could also be used.

  Identifier: {
    pattern: /[a-zA-Z][a-zA-Z_]*/,
    categories: 'Value'
  }
})
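
Since LD.CATEGORY maps to Chevrotain's Lexer.NA, category membership can be checked with Chevrotain's tokenMatcher. A minimal sketch, assuming the build result above is assigned to a lexer variable and that category tokens appear in the tokens object alongside the others:

const { tokenMatcher } = require('chevrotain')

const token = lexer.lex('true').tokens[0]

tokenMatcher(token, lexer.tokens.True)    // true
tokenMatcher(token, lexer.tokens.Boolean) // true, via the Boolean category
tokenMatcher(token, lexer.tokens.Value)   // true, via the Value category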

Order of Definition is Order of Precedence

As seen above, when using Lexdoc the order of precedence during lexing is the same as the order in which tokens are defined. In Chevrotain, tokens are defined first and their order of precedence is then specified separately.

Using Chevrotain:

const { createToken, Lexer } = require('chevrotain')

// Identifier must be defined before Select or From so it can be referenced by
// them.
const Identifier = createToken({ name: "Identifier", pattern: /[a-zA-Z]\w*/ })

const Select = createToken({
    name: "Select",
    pattern: /SELECT/,
    longer_alt: Identifier
})
const From = createToken({
    name: "From",
    pattern: /FROM/,
    longer_alt: Identifier
})

// The order in which tokens will be lexed must be specified separately.
const tokens = [
  Select,
  From,
  Identifier // Identifier must be lexed after Select and From
]

const lexer = new Lexer(tokens)

In Lexdoc token precedence during lexing is the same as the order in which tokens are defined:

const lexer = LD.build({
  Select: {
    pattern: /SELECT/,
    longer_alt: 'Identifier'
  },
  From: {
    pattern: /FROM/,
    longer_alt: 'Identifier'
  },
  Identifier: /[a-zA-Z]\w*/
})
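
The longer_alt references matter here. A brief check of the resulting behavior, assuming #lex works as described earlier:

// 'FROM' matches the From token exactly.
console.log(lexer.lex('FROM').tokens[0].tokenType.name)    // 'From'

// 'FROMAGE' starts with 'FROM' but is longer, so the longer_alt reference
// kicks in and the whole word lexes as a single Identifier instead.
console.log(lexer.lex('FROMAGE').tokens[0].tokenType.name) // 'Identifier'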

Multi-mode Lexers

Multi-mode lexers are supported by Lexdoc:

LD.mode('ModeA', {
  TokenA: 'A',
  TokenB: 'B'
})
LD.mode('ModeB', {
  TokenC: 'C',
  TokenD: 'D'
})

LD.defaultMode('ModeB') // The first mode defined is implicitly set to be the
                        // default mode, but any other mode may be explicitly
                        // set as the default using #defaultMode

const lexer = LD.build()
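
Tokens can also move the lexer between modes via Chevrotain's push_mode and pop_mode token properties. A hypothetical sketch of a string-literal mode, assuming mode names can be referenced by string in the same soft style as tokens:

LD.mode('Main', {
  EnterString: { pattern: /"/, push_mode: 'String' },
  Word:        /[a-z]+/
})
LD.mode('String', {
  ExitString: { pattern: /"/, pop_mode: true },
  Chars:      /[^"]+/
})

const lexer = LD.build()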

Built-in XRegExp DSL

Lexdoc has a built-in XRegExp DSL, like the one seen in many of Chevrotain's examples, allowing patterns to be defined once and reused.

Example usage:

// Define multiple fragments at once using #fragments
LD.fragments({
  fragA: 'foo',
  fragB: 'bar'
})

// Fragments can also be defined one at a time using #fragment
LD.fragment('fragC', 'baz')

// Fragments are re-used with the #pattern method
const lexer = LD.build({
  TokenA: LD.pattern('{{fragA}}.*?{{fragB}}'),
  TokenB: LD.pattern('\\({{fragC}}\\)')
})
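
A quick check of the composed patterns, again assuming #lex behaves as in the earlier examples: TokenA expands to /foo.*?bar/ and TokenB to /\(baz\)/.

const result = lexer.lex('foo123bar(baz)')
console.log(result.tokens.map(t => t.tokenType.name)) // [ 'TokenA', 'TokenB' ]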

Full Example

For a full example, see the provided JSON parser example.

Here's a stripped-down example showing only the important stuff:

const JsonLexer = LD.build({
  // Token definitions ...
})

const { Parser } = require('chevrotain')

class JsonParser extends Parser {
  constructor(input, config) {
    super(input, JsonLexer.tokens, config) // Use of lexer object's token list
    const $ = this                         // Conventional alias used in rule bodies

    // Rules ...

    $.RULE('object', () => {
      $.CONSUME(JsonLexer.tokens.LCurly) // Reference to token
      // ...
      $.CONSUME(JsonLexer.tokens.RCurly) // Reference to token
    })

    // Rules ...

    this.performSelfAnalysis()
  }
}

// Instantiate with empty input; actual input is set for each parse below.
const parser = new JsonParser([])

module.exports = function(text) {
  const lexResult = JsonLexer.lex(text) // Actual usage of the lexer
  parser.input = lexResult.tokens
  const value = parser.json()
  return value
}
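
The exported function can then be used like any other module (the path here is illustrative):

const parseJson = require('./json-parser')
const value = parseJson('{"a": [1, true, null]}')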

License

Available under the terms of the MIT license.

Copyright 2022 0E9B061F