@multila/multila-parser-generator
v0.2.0
Published
An LR(1) Parser Generator written in TypeScript
Downloads
49
Readme
multila-parser-generator
An LR(1) Parser Generator for the Web written in TypeScript.
Copyright 2022 by Andreas Schwenk
Licensed by GPLv3
Multila Website: https://www.multila.org
Personal Website: https://www.arts-and-sciences.com
Mail: [email protected]
Installation
npm install @multila/multila-parser-generator
Example
The example program in this section defines a trivial interpreter for mathematical terms.
These terms consist of integer constants, operators +
and *
, as well as parentheses.
For example, 2 * (3+4)
is evaluated as 14
.
The grammar for this simple language can be written in BNF as follows:
term = add;
add = add "+" mul | mul;
mul = mul "*" unary | unary;
unary = INT | "(" add ")";
Interpretation can be handled by using a stack:
- Constant integer values are pushed on top of the stack.
- Binary operations, pop the two topmost values from the stack, add them (or multiply them respectively), and push the result onto the stack.
The following code (that can also be found in file test/example.js
) implements the example. Copy it to https://npm.runkit.com/ and run it!
// Import dependencies
var lex = require('@multila/multila-lexer');
const parse = require('@multila/multila-parser-generator');
// Define production rules for bottom-up parsing.
// The identifier right to "->" defines the name of a callback function that
// is called after reduction of the rule.
// The first rule is also called root-rule.
// Instead of using the or operator "|", you may write rules also individually.
const rulesSrc = `
term = add;
add = add "+" mul -> callbackAdd | mul;
mul = mul "*" unary -> callbackMultiply | unary;
unary = INT -> callbackConst | "(" add ")";
`;
// Create an instance of parser the generator and parse rules
const lr1 = new parse.LR1();
lr1.parseRules(rulesSrc);
const table = lr1.calcTable();
console.log(table.toString());
// Define functions that are called after the reduction of rules.
// The parameter is an array of objects of type LexerToken.
const stack = []; // stack of numbers for interpretation
lr1.addCallback('callbackConst', function (terminals) {
// push a constant (integral) value to the stack
stack.push(terminals[0].value);
});
lr1.addCallback('callbackMultiply', function () {
// pop two values from the stack, multiply them and put them
// back onto the stack.
const o2 = stack.pop();
const o1 = stack.pop();
stack.push(o1 * o2);
});
lr1.addCallback('callbackAdd', function () {
// pop two values from the stack, add them and put them
// back onto the stack.
const o2 = stack.pop();
const o1 = stack.pop();
stack.push(o1 + o2);
});
// Source code to be parsed and interpreted.
const src = '2 * (3+4)';
const lexer = new lex.Lexer();
lexer.pushSource('', src);
try {
// Parse
lr1.parse(lexer);
// Print the result (2 * (3+4) = 14)
console.log('result = ' + stack[0]);
} catch (e) {
console.log('Error:' + e);
}
Grammar of rule definitions
Grammar rules can be implemented programmatically or by a domain-specific language (DSL).
The following terminal types are supported:
INT
integer values, e.g.42
.REAL
real values, e.g.3.14
,.1337
HEX
hexadecimal values, e.g.0xAFFE
STR
strings, e.g."Knuth"
ID
identifiers, e.g.pi
,mp3
,imaginary_value
Rule definition with API
- Create a new rule with method
addRule(id:string):Rule
from classLR1
. The identifier is the non-terminal on the left-hand side of the rule. - Use methods
addTerminalItem(t:string)
andaddNonTerminalItem(nt:string)
from classRule
to create items on the right-hand side of the rule. Use a preceding colon (:
) to mark explicit terminal values. Otherwise use one of the terminal types from the table above.
Example
const parse = require('@multila/multila-parser-generator');
const lr1 = new parse.LR1();
// x = INT "a" y;
const rule = lr1.addRule('x');
rule.addTerminalItem('INT');
rule.addTerminalItem(':a'); // precede terminal with a colon
rule.addNonTerminalItem('y');
Rule definition with DSL
Grammar
rules = { rule };
rule = ID "=" rhs { "|" rhs } ";";
rhs = { item } [ "->" ID ];
item = "INT" | "REAL" | "HEX" | "ID" | "STR" | ID | STR;
Example
See introduction example in this README file.
Parsing
For parsing an input program, you may either use method parse(lexer:Lexer)
of class LR1
or write your own LR(1) parser.
Use method getTable()
of class LR1
to get the generated parse table. An stringified output for the example above is listed in the following:
0: action={"INT"->S18, ":("->S5}; goto={add->19, mul->2, unary->1 }
1: action={"END"->R4, ":+"->R4, ":*"->R4}; goto={ }
2: action={":*"->S3, "END"->R2, ":+"->R2}; goto={ }
3: action={"INT"->S18, ":("->S5}; goto={unary->4 }
4: action={"END"->R3, ":+"->R3, ":*"->R3}; goto={ }
5: action={"INT"->S14, ":("->S10}; goto={add->16, mul->7, unary->6 }
6: action={":)"->R4, ":+"->R4, ":*"->R4}; goto={ }
7: action={":*"->S8, ":)"->R2, ":+"->R2}; goto={ }
8: action={"INT"->S14, ":("->S10}; goto={unary->9 }
9: action={":)"->R3, ":+"->R3, ":*"->R3}; goto={ }
10: action={"INT"->S14, ":("->S10}; goto={add->11, mul->7, unary->6 }
11: action={":)"->S15, ":+"->S12}; goto={ }
12: action={"INT"->S14, ":("->S10}; goto={mul->13, unary->6 }
13: action={":*"->S8, ":)"->R1, ":+"->R1}; goto={ }
14: action={":)"->R5, ":+"->R5, ":*"->R5}; goto={ }
15: action={":)"->R6, ":+"->R6, ":*"->R6}; goto={ }
16: action={":)"->S17, ":+"->S12}; goto={ }
17: action={"END"->R6, ":+"->R6, ":*"->R6}; goto={ }
18: action={"END"->R5, ":+"->R5, ":*"->R5}; goto={ }
19: action={":+"->S20, "END"->R0}; goto={ }
20: action={"INT"->S18, ":("->S5}; goto={mul->21, unary->1 }
21: action={":*"->S3, "END"->R1, ":+"->R1}; goto={ }