npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

@jlguenego/lexer

v1.5.0

Published

Lexical analyzer in Full Javascript / Typescript

Downloads

46

Readme

https://en.wikipedia.org/wiki/Lexical_analysis

Why ? I wanted a SIMPLE, INTUITIVE, FAST, Full JS lexer, allowing to

  • tokenize correctly string and comment in a language.
  • stropping for being able to write not ambigous grammar in the next phase (syntax analysis)
  • use context for being able to make off-side rules languages, etc.

Module useful for:

  • compilers
  • prettyprinters
  • linters

Install

npm i @jlguenego/lexer

Code Style: Google

Usage

const {Lexer, Group} = require('@jlguenego/lexer');

// Source code to tokenize.
const str = `
var x = 3;
var y = 52;
`;

// declare all the language rules.
const blank = new Rule({
  name: 'blank',
  pattern: /\s+/,
  ignore: true,
});

const keywords = Rule.createKeywordRules(['var']);

const operators = Rule.createGroupRules(Group.OPERATOR, [
  {
    name: 'equal',
    pattern: '=',
  },
]);

const separators = Rule.createGroupRules(Group.SEPARATOR, [
  {
    name: 'semi-column',
    pattern: ';',
  },
]);

const identifier = new Rule({
  name: 'identifier',
  pattern: /\w+/,
  group: Group.IDENTIFIER,
});

// the order is important. Token are applied from first to last.
const rules = [blank, ...keywords, ...operators, ...separators, identifier];

// Do the job.
const tokenSequence = new Lexer(rules).tokenize(str);

// print the output.
console.log('tokenSequence: ', tokenSequence);

This produces the following output:

tokenSequence: [
  {
    name: 'var',
    lexeme: 'var',
    group: 'keywords',
    position: {col: 1, line: 2},
  },
  {
    name: 'identifier',
    lexeme: 'x',
    group: 'identifiers',
    position: {col: 5, line: 2},
  },
  {
    name: 'equal',
    lexeme: '=',
    group: 'operators',
    position: {col: 7, line: 2},
  },
  // ...
];

examples

TODO:

  • show example in famous language (JSON, XML, YAML, Javascript)

Concepts

This module purpose is to tokenize a source code input. In computer science, this process is known under the term lexical analysis. We call it also a lexer.

const tokenSequence = new Lexer(rules).tokenize(str);

The above instruction applies rules to tokenize a source string str.

The rules are specified according the language and applied according a regular expression given in rule.pattern.

During tokenization we define by the word state the source code being tokenized. The state is a sequence of two types of element:

  • source code fragment, called source element, not yet tokenized,
  • and recognized token.

At the beginning the state is an array of one source element reflecting the entire source code. At the end the state must be an array of only tokens, otherwise the source code is not respecting the syntax.

Normally, the source code is tokenized from the beginning to the end of the string (called left to right scan). But tokenizing can be faster if instead of looking from the beginning to the end, we choose to apply successively one rule after another to the current state. The drawback is that certains rules (for instance string, comment) cannot be well correctly tokenized if they are nested together.

Therefore this lexer do both algorithms successively:

  1. Preprocessing: performs the slow and robust method with only the rules marked as preprocess flag.
  2. Main: performs the fast way: applying the rules one after the other to the state.

The recommandation is to mark a rule with the preprocess flag only if the main stage cannot apply the rule correctly. Of course if there is no rules with preprocess flag, no need to run the preprocessing stage.

The preprocess stage applies all rules, and select the rule that will apply at the smallest index of the source string. It is slower than the main stage because there is many rules applied for finally only one selected. The preprocess phase also allows to make rule for stropping.

The main stage applies one rule after the other. This means that the order of rules declaration are important. For instance, the keyword rules should be applied from the longest one to the shortest one (maximal munch rule) The most generic one (identifer, type, etc.) must be applied with very low priority, so it is recommanded to place them at the end of the rule list.

When a rule is applied, its method expand is executed. This method replaces the source element by an array of source elements and tokens. Context is allowed for doing scenario like off-side rule.

Typescript

This module is written in Typescript and ready to be used in Typescript without separate typing module.

Modules

This modules exports both a CommonJS module and ES2015 module.

Thanks

  • Thanks to https://refactoring.guru/ for helping me to refactor code, trying to produce something understandable for the non expert.
  • Thanks to the other lexers I have visited on github. Specially the chevrotain lexer.
  • Thanks to the people at Stanford University releasing for free their compiler course. I tried to speak with the same terminology in this lexer: https://web.stanford.edu/class/archive/cs/cs143/cs143.1128/

Author

Made with :heart: by Jean-Louis GUENEGO [email protected]