universal-lexer

v2.0.6

Universal lexer to which you can pass your own rules for lexical analysis

Universal Lexer

A lexer that can split any text input into tokens, according to the regular expressions you provide.

In computer science, lexical analysis, lexing or tokenization is the process of converting a sequence of characters (such as in a computer program or web page) into a sequence of tokens (strings with an assigned and thus identified meaning). A program that performs lexical analysis may be termed a lexer, tokenizer, or scanner, though scanner is also a term for the first stage of a lexer. A lexer is generally combined with a parser, which together analyze the syntax of programming languages, web pages, and so forth.
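
To make the idea concrete, here is a minimal tokenizer sketch in plain JavaScript. It is not how universal-lexer is implemented; the rule list and the `tokenizeSketch` name are hypothetical and only illustrate turning characters into typed tokens.

```javascript
// Illustrative only: hypothetical rules, not universal-lexer's internals
const rules = [
  { type: 'Number', regex: /\d+/y },
  { type: 'Operator', regex: /[-+*/]/y },
  { type: 'Space', regex: /[ \t]+/y }
]

function tokenizeSketch (input) {
  const tokens = []
  let index = 0

  while (index < input.length) {
    let matched = false

    for (const rule of rules) {
      // Sticky flag (y): a match must start exactly at lastIndex
      rule.regex.lastIndex = index
      const match = rule.regex.exec(input)

      if (match) {
        const end = index + match[0].length
        tokens.push({ type: rule.type, value: match[0], start: index, end: end })
        index = end
        matched = true
        break
      }
    }

    // No rule matched at this position: report where lexing stopped
    if (!matched) {
      return { error: 'Unrecognized token', index: index }
    }
  }

  return { tokens: tokens }
}

console.log(tokenizeSketch('12 + 3'))
```

The first rule that matches at the current position wins, so rule order matters in this sketch.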

Features

  • Supports named groups in regular expressions, so captured values are attached to tokens without extra work
  • Supports post-processing of tokens, so you can extract any additional information you require

How to install

The package is available on NPM as universal-lexer, so you can add it to your project with npm install universal-lexer or yarn add universal-lexer.

What are the requirements?

The code itself is written in ES6 and should work in a Node.js 6+ environment. If you would like to use it in a browser or in an older environment, a transpiled and bundled (UMD) version is also included. You can use universal-lexer/browser in your requires, or UniversalLexer from the global environment (in a browser):

// Load library
const UniversalLexer = require('universal-lexer/browser')

// Create lexer
const lexer = UniversalLexer.compile(definitions)

// ...

How it works

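There are two sets of functions: the build functions return the code of the lexer, and the compile functions return a function you can call directly: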

// Load library
const UniversalLexer = require('universal-lexer')

// Build code for this lexer
const code1 = UniversalLexer.build([ { type: 'Colon', value: ':' } ])
const code2 = UniversalLexer.buildFromFile('json.yaml')

// Compile dynamically a function which can be used
const func1 = UniversalLexer.compile([ { type: 'Colon', value: ':' } ])
const func2 = UniversalLexer.compileFromFile('json.yaml')

There are two ways of passing rules to this lexer: from file or array of definitions.

Pass as array of definitions

Simply pass the definitions to the lexer:

// Load library
const UniversalLexer = require('universal-lexer')

// Create token definition
const Colon = {
  type: 'Colon',
  value: ':'
}

// Build array of definitions
const definitions = [ Colon ]

// Create lexer
const lexer = UniversalLexer.compile(definitions)

A definition can be a more complex object:

// Required fields: 'type' and either `regex` or `value`
{
  // Token name
  type: 'String',

  // String value to look for at the beginning of the remaining input,
  // e.g. 'abc' or '('
  value: 'abc',

  // Regular expression to validate
  // if current token should be parsed as this token
  // Useful i.e. when you require separator after sentence,
  // but you don't want to include it.
  valid: '"',

  // Regular expression flags for 'valid' field
  validFlags: 'i',

  // Regular expression to find current token
  // You can use named groups as well (?<name>expression):
  // Then it will attach this information to token.
  regex: '"(?<value>([^"]|\\.)+)"',

  // Regular expression flags for 'regex' field
  regexFlags: 'i'
}
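
As a rough sketch of how these fields could interact, the snippet below tries a single definition against the beginning of some text. The `matchDefinition` helper is hypothetical and not part of the universal-lexer API; the order of checks and the shape of `data` for `value` rules are assumptions made for illustration.

```javascript
// Hypothetical helper, NOT universal-lexer's API or implementation
function matchDefinition (definition, text) {
  // Optional guard: 'valid' must match at the start for the rule to apply
  if (definition.valid) {
    const guard = new RegExp('^(?:' + definition.valid + ')', definition.validFlags || '')
    if (!guard.test(text)) return null
  }

  // Plain string rule (assumption: the matched string is stored as data.value)
  if (definition.value !== undefined) {
    if (!text.startsWith(definition.value)) return null
    return { type: definition.type, data: { value: definition.value }, length: definition.value.length }
  }

  // Regular expression rule: named groups are copied into the token data
  const regex = new RegExp('^(?:' + definition.regex + ')', definition.regexFlags || '')
  const match = regex.exec(text)
  if (!match) return null

  return { type: definition.type, data: Object.assign({}, match.groups), length: match[0].length }
}

const StringRule = { type: 'String', regex: '"(?<value>[^"]*)"' }
const stringToken = matchDefinition(StringRule, '"abc" rest')
console.log(stringToken)
```

Named capture groups in regular expressions need a JavaScript engine that supports them (Node.js 10 or later).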

Pass YAML file

// Load library
const UniversalLexer = require('universal-lexer')

const lexer = UniversalLexer.compileFromFile('scss.yaml')

For now, the YAML file should contain only a Tokens property with the definitions. Later it may support more advanced features such as macros (for simpler syntax).

Example:

Tokens:
  # Whitespaces

  - type: NewLine
    value: "\n"

  - type: Space
    regex: '[ \t]+'

  # Math

  - type: Operator
    regex: '[-+*/]'

  # Color
  # It has a 'valid' field to make sure that it's not, e.g., "blacker":
  # the rule applies only when no word character follows

  - type: Color
    regex: '(?<value>black|white)'
    valid: '(black|white)[^\w]'
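
The effect of the `valid` guard on the Color rule can be checked with plain regular expressions. This is only a sketch: anchoring both patterns at the start of the text and the `isColorAt` helper are assumptions, not the library's actual internals.

```javascript
// Plain RegExp checks; how universal-lexer applies 'valid' internally is an assumption
const colorRegex = /^(?<value>black|white)/
const colorValid = /^(black|white)[^\w]/

// Hypothetical helper: the rule applies only if both patterns match at the start
function isColorAt (text) {
  return colorValid.test(text) && colorRegex.test(text)
}

console.log(isColorAt('black;'))  // true
console.log(isColorAt('blacker')) // false
console.log(isColorAt('black'))   // false
```

Note that `[^\w]` requires one more character, so in this sketch the guard also rejects a color at the very end of the input. A negative lookahead such as `(black|white)(?!\w)` would be one way to allow that case.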

Processing data

Processing input data after you have created a lexer is straightforward: call the compiled function with the input.

// Load library
const UniversalLexer = require('universal-lexer')

// Create lexer
const tokenize = UniversalLexer.compileFromFile('scss.yaml')

// Tokenize the input
const tokens = tokenize('some { background: code }').tokens

Post-processing tokens

If you would like to do more advanced processing of the parsed tokens, pass a processor function as the second argument to the lexer function:

// Load library
const UniversalLexer = require('universal-lexer')

// Create lexer
const tokenize = UniversalLexer.compileFromFile('scss.yaml')

// That's the 'Literal' definition:
// (String.raw keeps the backslashes exactly as the regular expression needs them)
const Literal = {
  type: 'Literal',
  regex: String.raw`(?<value>([^\t \n;"',{}()\[\]#=:~&\\]|(\\.))+)`
}

// Create processor which will replace all '\X' to 'X' in value
function process (token) {
  if (token.type === 'Literal') {
    token.data.value = token.data.value.replace(/\\(.)/g, '$1')
  }

  return token
}

// Also, you can return a new token
function process2 (token) {
  if (token.type !== 'Literal') {
    return token
  }

  return {
    type: 'Literal',
    data: {
      value: token.data.value.replace(/\\(.)/g, '$1')
    },
    start: token.start,
    end: token.end
  }
}

// Get all tokens...
const tokens = tokenize('some { background: code }', process).tokens
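
To see what such a processor does without running the lexer, the sketch below applies the same unescaping idea as process2 to a hand-written token list. The sample tokens and the `unescapeLiteral` name are made up for illustration.

```javascript
// Hand-written sample tokens; in real use these would come from the lexer
const sampleTokens = [
  { type: 'Literal', data: { value: 'a\\:b' }, start: 0, end: 4 },
  { type: 'Space', data: {}, start: 4, end: 5 }
]

// Return a new token instead of mutating the original one
function unescapeLiteral (token) {
  if (token.type !== 'Literal') return token

  return Object.assign({}, token, {
    data: Object.assign({}, token.data, {
      // Replace every '\X' with 'X'
      value: token.data.value.replace(/\\(.)/g, '$1')
    })
  })
}

const processed = sampleTokens.map(unescapeLiteral)
console.log(processed[0].data.value) // a:b
```

Returning a copy keeps the original tokens untouched, which can make debugging easier when several processing steps are chained.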

Beautified code

If you would like to get the beautified code of the lexer, pass true as the second argument to the compile functions:

UniversalLexer.compile(definitions, true)
UniversalLexer.compileFromFile('scss.yaml', true)

Possible results

On success you will receive a simple object with an array of tokens:

{
  tokens: [
    { type: 'Whitespace', data: { value: '     ' }, start: 0, end: 5 },
    { type: 'Word', data: { value: 'some' }, start: 5, end: 9 }
  ]
}

When something goes wrong, you will get error information instead:

{
  error: 'Unrecognized token',
  index: 1,
  line: 1,
  column: 2
}
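
A caller will usually want to branch on these two result shapes. The helpers below are a hypothetical sketch, not part of universal-lexer: `unwrapResult` throws a readable error for the failure shape, and `positionOf` shows how a 1-based line and column could be derived from a 0-based index, which is consistent with the sample error above.

```javascript
// Hypothetical helper: return the tokens or throw a readable error
function unwrapResult (result) {
  if (result.error) {
    throw new Error(result.error + ' at line ' + result.line + ', column ' + result.column)
  }

  return result.tokens
}

// Hypothetical helper: derive 1-based line and column from a 0-based index
function positionOf (input, index) {
  const linesBefore = input.slice(0, index).split('\n')

  return {
    line: linesBefore.length,
    column: linesBefore[linesBefore.length - 1].length + 1
  }
}

console.log(positionOf('a?b', 1)) // { line: 1, column: 2 }
```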

Examples

For now, you can see an example of JSON semantics in the examples/json.yaml file.

CLI

After installing the package globally (or from within NPM scripts), the universal-lexer command is available:

Usage: universal-lexer [options] output.js

Options:
  --version       Show version number                                  [boolean]
  -s, --source    Semantics file                                      [required]
  -b, --beautify  Should beautify code?                [boolean] [default: true]
  -h, --help      Show help                                            [boolean]

Examples:
  universal-lexer -s json.yaml lexer.js  build lexer from semantics file

Changelog

Version 2

  • 2.0.6 - bugfix for single characters
  • 2.0.5 - fix mistake in README file (post-processing code)
  • 2.0.4 - remove unneeded benchmark dependency
  • 2.0.3 - add unit and E2E tests, fix small bugs
  • 2.0.2 - added CLI command
  • 2.0.1 - fix typo in README file
  • 2.0.0 - optimize it (up to 10x faster) through expression analysis and other improvements

Version 1

  • 1.0.8 - positions in syntax errors now always start from 1
  • 1.0.7 - optimize definitions with "value", make syntax errors developer-friendly
  • 1.0.6 - optimized Lexer performance (20% faster on average)
  • 1.0.5 - fix browser version to be put into NPM package properly
  • 1.0.4 - bugfix for debugging
  • 1.0.3 - add proper sanitization for debug HTML
  • 1.0.2 - small fixes for README file
  • 1.0.1 - added Rollup.js support to build version for browser