npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

occam-parsers

v23.0.32

Published

Occam's parsers.

Downloads

4,427

Readme

Occam Parsers

Occam's parsers.

Contents

Introduction

Three parsers are documented:

  • A BNF parser, actually a variant of extended BNF.
  • A basic parser, for illustrative purposes, and for developing new grammars.
  • The common parser, which can be extended.

All parsers share common functionality. The last two parse content according to rules defined in the aforementioned variant of extended BNF. The BNF parser on the other hand has its rules hard-coded. These rules can be defined in the self same variant that they implement:

  document                 ::=  ( rule | error )+ ;

  rule                     ::=  name opacityModifier? "::=" definitions ";" ;

  name                     ::=  [name] ;

  definitions              ::=  definition ( "|" definition )* ;

  definition               ::=  part+ precedence? ;

  part                     ::=  nonTerminalPart quantifier*

                             |  terminalPart quantifier*
                          
                             ;

  nonTerminalPart          ::=  choiceOfParts

                             |  sequenceOfParts

                             |  ruleName lookAheadModifier?

                             ;

  terminalPart             ::=  significantTokenType

                             |  regularExpression

                             |  terminalSymbol

                             |  endOfLine

                             |  wildcard

                             |  epsilon

                             |  noWhitespace

                             |  startOfContent

                             ;
                          
  startOfContentPart       ::+  "<START_OF_CONTENT>"; 
  
  noWhitespacePart         ::=  "<NO_WHITESPACE>" ;                              

  sequenceOfParts          ::=  "(" part part+ ")" ;

  choiceOfParts            ::=  "(" part precedence? ( "|" part precedence? )+ ")" ;

  ruleName                 ::=  [name] ;

  significantTokenType     ::=  [type] ;

  regularExpression        ::=  [regular-expression] ;

  terminalSymbol           ::=  [string-literal] ;

  precedence               ::=  "(" [number]? ")" ;
  
  endOfLine                ::=  "<END_OF_LINE>" ;
  
  wildcard                 ::=  "." ;

  epsilon                  ::=  "ε" ;

  quantifier               ::=  optionalQuantifier

                             |  oneOrMoreQuantifier

                             |  zeroOrMoreQuantifier

                             ;

  opacityModifier          ::=  <NO_WHITESPACE>( "." | ".." );

  lookAheadModifier        ::=  <NO_WHITESPACE>"..." ;

  optionalQuantifier       ::=  <NO_WHITESPACE>"?" ;

  oneOrMoreQuantifier      ::=  <NO_WHITESPACE>"+" ;

  zeroOrMoreQuantifier     ::=  <NO_WHITESPACE>"*" ;

  error.                   ::=  . ;

Installation

With npm:

npm install occam-parsers

You can also clone the repository with Git...

git clone https://github.com/djalbat/occam-parsers.git

...and then install the dependencies with npm from within the project's root directory:

npm install

You can also run a development server, see the section on building later on.

Examples

There is a small development server that can be run from within the project's directory with the following command:

npm start

The examples will then be available at the following URL:

http://localhost:8888

The source for the examples can be found in the src/example.js file and corresponding src/example folder. You are encouraged to try the examples whilst reading what follows. You can rebuild them on the fly with the following command:

npm run watch-debug

The development server will reload the page whenever you make changes.

One last thing to bear in mind is that this package is included by way of a relative rather than a package import. If you are importing it into your own application, however, you should use the standard package import.

Usage

Import the requisite parser and its corresponding lexer from this package and the occam-lexers package, respectively. Then call their fromNothing(...) factory methods.

import { BasicLexer } from "occam-lexers";
import { BasicParser } from "occam-parsers"

const basicLexer = BasicLexer.fromNothing(),
      basicParser = BasicParser.fromNothing();

const content = `

        ...

      `,
      tokens = basicLexer.tokenise(content),
      node = basicParser.parse(tokens);

...

The tokens returned from the lexers's tokenise(...) method can be passed directly to the parser's parse(...) method, which itself returns a node or null.

Features

Quantifiers

  • * zero or more
  • + one or more
  • ? optional

These bind tightly to the symbols to their left and can be chained. Take note that both the *+ and ?+ chains will cause an infinite loop and must be avoided.

Regular expressions

A regular expression is distinguished by the usual leading and trailing forward slashes:

/\d+/

Matching significant token types

This can be done with a symbol that is identical to the significant token type in question, contained within square brackets. For example:

 name                       ::=   [unassigned] ;

Matching end of line tokens

This can be done with a special <END_OF_LINE> special symbol. For example:

 verticalSpace              ::=   <END_OF_LINE>+ ;

Matching no whitespace

This can be done with the <NO_WHITESPACE> special symbol. For example:

 parenthesisedTerms         ::=   <NO_WHITESPACE>"(" terms <NO_WHITESPACE>")" ;

It is conventional to leave no whitespace between the symbol and its subsequent part.

Sequences of parts

This can be done with the brackets. For example:

 terms                      ::=   term ( "," term )* ;

Choosing between parts

The vertical bar symbol | is overloaded and can be used in conjunction with brackets to choose between parts as opposed to definitions. For example:

 justifiedStatement         ::=   statement ( "by" | "from" ) reference <END_OF_LINE> ;

Look-ahead

Consider the following rules:

  ABC  ::=  AAB BC ;

  AAB  ::=  "a" "b" | "a" ;

   BC  ::=  "b" "c" ;

These will not parse the tokens a, b, c because the first definition of the AAB rule will parse the a and b tokens, leaving only the c token for the BC rule to parse. This situation can be addressed by making the AAB rule look ahead, that is, try each of its definitions in turn until one is found that allows the remainder of the parent rule to parse. The look-ahead modifier is an ellipsis, thus the rules above become:

  ABC  ::=  AAB... BC ;

  AAB  ::=  "a" "b" | "a" ;

   BC  ::=  "b" "c" ;

Now the ABC rule will indeed parse the tokens a, b, c, because the second definition of the AAB rule will be tried after the first definition fails to allow the BC rule name part to parse.

Also bear in mind that look-ahead is carried out to arbitrary depth and this it affects the behaviour of the ?, * and + quantifiers, which become lazy. For example:

ABC  ::=  AAB... ;

AAB  ::=  "a" "b"+ "b" "c" ;

Here the look-ahead modifier on the AAB rule name part forces the + quantifier on the "b" terminal part to be lazy, allowing the following to parse:

a b b b c

Without look-ahead, the "b"+ part would consume all of the b tokens, leaving none for the subsequent "b" terminal part.

It seems that the parser parses in time that is roughly directly proportional to the length of the input. However, on the ohter hand it is most likely that look-ahead takes exponential time given its nested nature. For this reason, look-ahead should be used sparingly.

Building

Automation is done with npm scripts, have a look at the package.json file. The pertinent commands are:

npm run build-debug
npm run watch-debug

The example will then be available at http://localhost:8888 and will reload automatically when changes are made.

Contact