@conartist6/rd-parse
v4.0.0-rc.0
Published
Recursive descent parser generator. Define your grammar in pure Javascript.
Downloads
20
Readme
@conartist6/rd-parse
A generic, minimalist, recursive-descent parser which allows you to define your grammar directly in Javascript in a syntax closely related to EBNF. Avoiding the need to compile the grammar makes the resultant parser compact and quick to initialize.
You will be bound by the limitations of recursive descent parsing, for example you must remove left recursion in your grammar.
Usage
npm i rd-parse # or
yarn add rd-parse
Once installed you can import the Parser
class and the grammar combinators and build your parser. Here is an example which uses rd-parse to recognize a subset of valid JSON.
const {
Parser,
Node,
All,
Any,
Star,
Y,
} = require('@conartist6/rd-parse');
// `Y` lets us use a function recursively before we're finished defining it
const dict = Y((dict) => {
// Match empty space and do not use the result
const _ = Ignore(/\s*/);
const string = Node(
All('"', /[^"]*/, '"'),
([text]) => text,
);
const key = string;
// `dict` is the top level grammar returned below
const value = Any(string, dict);
const entry = Node(
All(key, _, ':', _, value),
(entry) => entry,
);
// Define `dict` as the object built from [key, value] entries
return Node(
All('{', _, Star(entry), _, '}'),
Object.fromEntries,
);
});
const dictParser = new Parser(dict);
dictParser.parse('{ "foo": "bar" }');
// { foo: 'bar' }
dictParser.parse('{ "foo": { "bar": "baz" } }');
// { foo: { bar: 'baz' }}
Grammar
The main task in using rd-parse is to define a grammar. A grammar is made up of rules, where a rule may be a token or a combinator. Tokens can be represented as string literals or regex literals, and each token is itself a grammar. For example the /\w/
token is a grammar that matches any single lowercase latin letter.
Making a useful grammar is a process of combining token grammars using combinators such as All
, Optional
, and Plus
. Each combinator returns a new grammar derived from an input grammar or grammars according to the rules documented below:
'foo'
matches if the charactersfoo
are next in the input/[0-9]/
matches if the next character in the input is a digitAll(...grammars)
matches when allgrammars
matchAny(...grammars)
matches when anygrammar
ofgrammars
matchesOptional(grammar)
always matchesIgnore(grammar)
matches whengrammar
matchesPlus(grammar)
matchesgrammar
repeated one or more timesStar(grammar)
matchesgrammar
repeated zero or more times
The last step of parsing - extracting and AST from the grammar - is done with the special Node
combinator, which takes a (stack) => node
callback. The stack is built up by tokens and combinators, and is always transformed to a single value (usually an object) by Node
. The specifics of how the stack grows are documented below:
'foo'
does not add anything to the stack/[0-9]/
addsmatch[0]
to the stack, ornull
All(...grammars)
concatenates stacks from allgrammars
Any(...grammars)
uses stack from first matching grammar ofgrammars
Optional(grammar)
adds to the stack whengrammar
matchesIgnore(grammar)
never adds to the stackPlus(grammar)
concatenates stacks from all repetitions ofgrammar
Star(grammar)
concatenates stacks from all repetitions ofgrammar
Fork
This project is forked from dmaevsky's rd-parse. I wanted the freedom to tweak several aspects of the upstream library at once.
In particular the documentation in this version has been rewritten and several notable changes have been made to the API:
- Undocumented exports are removed, particularly
START
, andUse
. - Regex token changes:
- Capture groups no longer have meaning. Instead the full match result is used.
- required
^
is now prepended to expression for you
Ignore
renamed toIgnoreBetween
.Ignore
is a new rule type.