tinylex

A simple iterative lexer written in TypeScript

Under development

Install:

npm install tinylex

Import:

const TinyLex = require('tinylex')

Code:

const code = `
#
# Darklord source
#
summon "messenger"

forge harken(msg) {
  messenger(msg || 'All shall flee before me!')
}

craft lieutenants = 12
craft message = "I have " + lieutenants + " servants"

harken.wield(message)
`

Rules:

const KEYWORDS = [
  'summon', 'forge', 'craft', 'wield',
  'if', 'while', 'true', 'false', 'null'
]

const KEYWORD = new RegExp(`^(?:${KEYWORDS.join('|')})`)
const COMMENT = /^\s*(#.*)\n/
const IDENTIFIER = /^[a-z]\w*/
const NUMBER = /^(?:\+|-)?(?:\.)?\d+\.?(?:\d+)?/
const STRING_SINGLE = /^'([^']*)'/
const STRING_DOUBLE = /^"([^"]*)"/
const LOGICAL = /^(?:\|\||&&|==|!=|<=|>=)/
const WHITESPACE = /^\s/

const rules = [
  [COMMENT, 'COMMENT'],
  [KEYWORD, 0],
  [IDENTIFIER, 'IDENTIFIER'],
  [NUMBER, 'NUMBER'],
  [LOGICAL, 0],
  [STRING_DOUBLE, 'STRING'],
  [STRING_SINGLE, 'STRING'],
  [WHITESPACE]
]

Instantiate:

const lexer = new TinyLex(code, rules)

Consume:

for (const token of lexer) {
  console.log(token)
}

or

while (!lexer.done()) {
  console.log(lexer.lex())
}

or

const tokens = [...lexer]
console.log(tokens)

or

const tokens = lexer.tokenize()
console.log(tokens)

Result:

// ------------------------------------------------------------------
// generated tokens
//
[ 'COMMENT', '#' ]
[ 'COMMENT', '# Darklord source' ]
[ 'COMMENT', '#' ]
[ 'SUMMON', 'summon' ]
[ 'STRING', 'messenger' ]
[ 'FORGE', 'forge' ]
[ 'IDENTIFIER', 'harken' ]
[ '(', '(' ]
[ 'IDENTIFIER', 'msg' ]
[ ')', ')' ]
[ '{', '{' ]
[ 'IDENTIFIER', 'messenger' ]
[ '(', '(' ]
[ 'IDENTIFIER', 'msg' ]
[ '||', '||' ]
[ 'STRING', 'All shall flee before me!' ]
[ ')', ')' ]
[ '}', '}' ]
[ 'CRAFT', 'craft' ]
[ 'IDENTIFIER', 'lieutenants' ]
[ '=', '=' ]
[ 'NUMBER', '12' ]
[ 'CRAFT', 'craft' ]
[ 'IDENTIFIER', 'message' ]
[ '=', '=' ]
[ 'STRING', 'I have ' ]
[ '+', '+' ]
[ 'IDENTIFIER', 'lieutenants' ]
[ '+', '+' ]
[ 'STRING', ' servants' ]
[ 'IDENTIFIER', 'harken' ]
[ '.', '.' ]
[ 'WIELD', 'wield' ]
[ '(', '(' ]
[ 'IDENTIFIER', 'message' ]
[ ')', ')' ]
[ 'EOF', 'EOF' ]

Rules

const rules = [
  [COMMENT, 'COMMENT'],       // ['COMMENT', '# Darklord source']
  [KEYWORD, 0],               // ['SUMMON', 'summon']
  [IDENTIFIER, 'IDENTIFIER'], // ['IDENTIFIER', 'harken']
  [NUMBER, 'NUMBER'],         // ['NUMBER', '12']
  [LOGICAL, 0],               // ['||', '||']
  [STRING_DOUBLE, 'STRING'],  // ['STRING', 'messenger']
  [STRING_SINGLE, 'STRING'],  // ['STRING', 'All shall flee...']
  [WHITESPACE]
]

Rules can be specified in the form [RegExp, string|number|function|null|undefined]

RegExp: the match criteria specified as a regular expression object.

string: the name of the token, e.g., 'COMMENT' as in [COMMENT, 'COMMENT']. By default the token content is match group 0 (the lexeme) of the RegExp match. If the RegExp contains a capture group, match group 1 is used instead; the COMMENT rule captures the comment text without the surrounding whitespace, producing the token ['COMMENT', '# Darklord source'], and the string rules, e.g., /^"([^"]*)"/, capture the portion of the match between the quotes. This only works for match group 1.
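
For illustration, here is the group-1-over-group-0 behavior outside the lexer (a minimal sketch, not TinyLex internals):

const STRING_DOUBLE = /^"([^"]*)"/
const match = STRING_DOUBLE.exec('"messenger" and the rest')
// match[0] is the lexeme '"messenger"'; match[1] is the capture.
const content = match[1] !== undefined ? match[1] : match[0]
console.log(['STRING', content]) // [ 'STRING', 'messenger' ]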

number: the index of the match group to use for both the token name and the content, as in [KEYWORD, 0], which produces the token ['SUMMON', 'summon'] (the name is the uppercased lexeme). If your regular expression contains a capture group, you can use it to generate both the name and the value of the token: [SOME_REGEXP, 1].
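
A quick sketch of the same idea (the uppercasing of the name is inferred from the ['SUMMON', 'summon'] output above):

const KEYWORD = /^(?:summon|forge|craft|wield)/
const match = KEYWORD.exec('summon "messenger"')
// Match group 0 supplies both the name and the content.
console.log([match[0].toUpperCase(), match[0]]) // [ 'SUMMON', 'summon' ]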

null|undefined: no token is created from the match; the match is discarded altogether, as in [WHITESPACE], which swallows whitespace with no other effect. The cursor is still advanced by the length of the lexeme (match group 0).
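
Discarding isn't limited to whitespace; a hypothetical variant of the rules above could drop comments the same way:

const rules = [
  [COMMENT],      // match comments but emit no token
  [WHITESPACE],   // likewise swallow whitespace
  // ...the remaining rules as before
]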

function: a function used to create the token, discard the match, and/or advance the cursor by some positive, non-zero integer amount (TinyLex always advances the cursor to avoid infinite loops). Functions here can also push multiple tokens if desired. If the function returns null or undefined, the cursor is advanced by the length of the lexeme (match group 0). If the function returns a number <= 1, the cursor is advanced by one. The function's this context is set to the lexer instance.

// We could use a function to swallow whitespace.
[WHITESPACE, function (match, tokens, chunk) {
  // Advance the cursor by one. If we don't return a number, the
  // cursor is advanced by the size of the lexeme (match group 0),
  // so in this case returning 1 is no different from returning
  // null or undefined.
  return 1
}]
// We could use a function to customize the token in some way.
[LOGICAL, function (match, tokens, chunk) {
  const lexeme = match[0]
  switch (lexeme) {
    case '&&': tokens.push(['OPERATOR', '&&']); break
    case '||': tokens.push(['OPERATOR', '||']); break
    default: tokens.push([lexeme, lexeme])
  }

  // We don't actually need to do this because by default the
  // cursor is advanced by the lexeme length (match group 0).
  return lexeme.length
}]
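
The function form can also push more than one token per match. A hedged sketch (this NEWLINE rule and its token names are made up for illustration):

// Emit both a NEWLINE token and an implicit semicolon for
// every line break the rule consumes.
[/^\n/, function (match, tokens, chunk) {
  tokens.push(['NEWLINE', '\n'])
  tokens.push([';', ';'])
  // Returning nothing advances the cursor by the lexeme length.
}]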

Note: when using a rule function, you must push one or more tokens onto the tokens array unless you intend to discard the match; if nothing is pushed, no token is generated.

The onToken Function

If given, this function is called for every token. It can modify the contents of a token, return an entirely new token, or discard tokens (except for the final EOF token, which can be transformed but not removed). Register it by calling lexer.onToken with a function; the callback is invoked with its this context set to the lexer instance.

const lexer = new TinyLex(code, rules)

// The callback function will have its 'this' context set
// to the lexer instance.
lexer.onToken(function (token, match) {
  // We can return a new token, the original token, a modified
  // version of the given token, or nothing at all - in which case
  // the token will be discarded except for the EOF token which can
  // only be modified or set to null.
  return token
})
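
As a concrete use, the callback can filter tokens out of the stream; this hypothetical example drops comments and passes everything else through:

lexer.onToken(function (token, match) {
  // Returning nothing discards the token (EOF excepted).
  if (token[0] === 'COMMENT') return
  return token
})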

Options

The option onError specifies what to do if a match is not found at the cursor.

tokenize: (default) Tokenize the next single character (it becomes both the token name and content) and advance the cursor by one. This is how punctuation such as ['(', '('] and ['=', '='] appears in the output above even though no rule matches those characters.

ignore: Advance the cursor by one and do nothing else.

throw: Throw an error indicating that a match was not found.

// onError can be 'tokenize', 'throw', or 'ignore'.
const lexer = new TinyLex(code, rules, {onError: 'tokenize'})
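
If you'd rather fail fast on unexpected input, here is a sketch of the throw mode (the exact error type isn't documented, so a plain try/catch is assumed):

const strict = new TinyLex(code, rules, {onError: 'throw'})
try {
  console.log(strict.tokenize())
} catch (err) {
  console.error('no rule matched at the cursor:', err.message)
}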

Note: onError is the only configuration option.