cfgrammar-tool

v1.0.0

Published

2 years ago

Work with context-free grammars. Parsing, string generation, and manipulation.

Downloads

0High
0Medium
0Low

bakkot

context free grammar parser fuzzing

CFGrammar-Tool

A JavaScript library for working with context-free grammars. It's also a node.js module (npm install cfgrammar-tool).

Check out the the demo.

Features

Parsing. The implementation is Earley's algorithm, so arbitrary CFGs are supported without transformation. Optionally keep track of two parses or all parses, so as to catch ambiguity. Note that tracking all parses can take exponential or infinite time (though the latter possibility can be detected in advance).
Generation. Given a grammar, generate a string of length n in its language. All such strings are generated with non-zero probability, and if the grammar is unambiguous and does not contain nullable nonterminals then strings are generated uniformly at random. Requires n^2 preprocessing time, then linear time for each string.

Useful for automatic testing when QuickCheck and its ilk aren't generating sufficiently structured data. For example, test.js contains a CFG for CFGs, which was used to automatically test this very application.

Diagnostics and manipulation. Find/remove unreachable symbols, symbols which do not generate any string, nullable symbols, duplicate rules, unit productions (A -> B), etc.

Example

var cfgtool = require('cfgrammar-tool');
var types = cfgtool.types;
var parser = cfgtool.parser;
var generatorFactory = cfgtool.generator;

var Grammar = types.Grammar;
var Rule = types.Rule;
var T = types.T;
var NT = types.NT;
var exprGrammar = Grammar([
  Rule('E', [NT('E'), T('+'), NT('T')]),
  Rule('E', [NT('T')]),
  Rule('T', [NT('T'), T('*'), NT('F')]),
  Rule('T', [NT('F')]),
  Rule('F', [T('('), NT('E'), T(')')]),
  Rule('F', [T('n')])
]);

parser.parse(exprGrammar, 'n*(n+n)').length > 0; // true
parser.parse(exprGrammar, 'n(n+n)').length > 0; // false

var generator = generatorFactory(exprGrammar);
generator(21); // something like 'n*((n+(n)*n+n+n*n))*n'

TODO

General code cleanup; this was mostly written in a couple of marathon sessions to try to get a tool based on it up, and the haste shows. Strict mode and linting, too.
Normal forms: put a grammar in Chomsky normal form, Greibach normal form, or others.
Import and export: parse and produce BNF and other representations of grammars.
Automatic tokenization. Currently all tokens are implicitly single-character strings, at least on the parsing end, which is often not what you want.
~~Port to a language with a proper type system~~.
~~Put up a demo page on gh-pages.~~

License

Licensed under the MIT license. If you're making public or commercial use of this library, I encourage (but do not require) you to tell me about it!

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

CFGrammar-Tool

Features

Example

TODO

License