peg-express
v0.2.2
A PEG-based parser generator for TypeScript
Installation
To install peg-express, run the following command in the project directory:
$ npm install peg-express
Getting Started
To use peg-express, you write a grammar definition and, optionally, semantic actions. Without semantic actions, the generated parser creates a parse tree of the input. You can add semantic actions by extending the generated parser class.
Typical steps to use peg-express are:
- Write a grammar definition in PEG.
- Generate a parser from the grammar definition.
- Write semantic actions by inheriting the generated parser.
- Use the parser.
Step 1: Write a grammar definition in PEG
peg-express uses PEG for grammar definition. PEG is a simple and powerful grammar definition language.
You can write a grammar definition for a simple calculator as follows in `calculator.peg`:
Expression <- Term (('+' / '-') Term)*
Term <- Factor (('*' / '/') Factor)*
Factor <- Number / '(' Expression ')'
Number <- r'\d+'
The start symbol of the grammar is `Expression`, which matches an arithmetic expression like `(1+2)*3`.
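To make the structure of the grammar concrete, here is a hand-written sketch that mirrors the four rules as four methods and evaluates the input directly. It is only an illustration of what the grammar recognizes. It is not the code peg-express generates, and the names `CalculatorSketch` and `evaluate` are made up for this example.

```typescript
// Hand-written illustration of the calculator grammar.
// NOT the parser that peg-express generates; all names here are hypothetical.
class CalculatorSketch {
  private readonly input: string;
  private pos = 0;

  constructor(input: string) {
    this.input = input;
  }

  // Expression <- Term (('+' / '-') Term)*
  expression(): number {
    let value = this.term();
    while (this.peek() === '+' || this.peek() === '-') {
      const op = this.input[this.pos++];
      const rhs = this.term();
      value = op === '+' ? value + rhs : value - rhs;
    }
    return value;
  }

  // Term <- Factor (('*' / '/') Factor)*
  term(): number {
    let value = this.factor();
    while (this.peek() === '*' || this.peek() === '/') {
      const op = this.input[this.pos++];
      const rhs = this.factor();
      value = op === '*' ? value * rhs : value / rhs;
    }
    return value;
  }

  // Factor <- Number / '(' Expression ')'
  factor(): number {
    // Number <- r'\d+' : try the first alternative
    const start = this.pos;
    while (/\d/.test(this.peek() ?? '')) this.pos++;
    if (this.pos > start) {
      return parseInt(this.input.slice(start, this.pos), 10);
    }
    // Second alternative: a parenthesized expression
    this.expect('(');
    const value = this.expression();
    this.expect(')');
    return value;
  }

  atEnd(): boolean {
    return this.pos === this.input.length;
  }

  private peek(): string | undefined {
    return this.input[this.pos];
  }

  private expect(ch: string): void {
    if (this.peek() !== ch) {
      throw new Error(`Expected '${ch}' at position ${this.pos}`);
    }
    this.pos++;
  }
}

function evaluate(input: string): number {
  const parser = new CalculatorSketch(input);
  const value = parser.expression();
  if (!parser.atEnd()) {
    throw new Error('Unexpected trailing input');
  }
  return value;
}

console.log(evaluate('(1+2)*3')); // 9
```

Because `Term` is nested inside `Expression`, multiplication and division bind more tightly than addition and subtraction, which is how the grammar encodes operator precedence.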
Note that you can use regular expressions in the grammar definition.
`r'\d+'` is a regular expression that matches a sequence of one or more digits.
`r` is the prefix of a regular expression literal.
You can write arbitrary JavaScript regular expressions between the prefix `r'` and the closing `'`.
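A parser has to match such a pattern at the current input position rather than anywhere in the string. How peg-express does this internally is not described here; the sketch below shows one common way to do it in JavaScript, using the sticky (`y`) flag. The helper name `matchAt` is hypothetical.

```typescript
// Try to match `pattern` at exactly position `pos` of `input`.
// Returns the matched text, or null if the pattern does not match there.
// Illustrative helper only; not part of the peg-express API.
function matchAt(pattern: string, input: string, pos: number): string | null {
  const re = new RegExp(pattern, 'y'); // 'y' anchors the match at lastIndex
  re.lastIndex = pos;
  const m = re.exec(input);
  return m ? m[0] : null;
}

console.log(matchAt('\\d+', '12+345', 0)); // '12'
console.log(matchAt('\\d+', '12+345', 2)); // null ('+' is not a digit)
console.log(matchAt('\\d+', '12+345', 3)); // '345'
```

Note that the backslash is doubled here only because the pattern is written as a TypeScript string literal; in the grammar file you write `r'\d+'`.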
Step 2: Generate a parser from the grammar definition
To generate a parser from the grammar definition, run the following command:
peg-express <grammar>
`<grammar>` is the path to a grammar definition file.
The above command generates a file named `parser.ts`.
For example, if you saved the grammar definition from Step 1 as `calculator.peg`, you can generate a parser as follows:
peg-express calculator.peg
This generates a parser file `parser.ts` in the current directory.
Step 3: Write semantic actions by inheriting the generated parser
This step is optional. If you do not write semantic actions, the generated parser creates a parse tree of the input. You can write semantic actions by extending the generated parser class.
For example, if you want to evaluate the input expression, you can write semantic actions as follows:
import { Expression, Factor, Parser, Term, NodeTerminal } from './parser';

export class Calculator extends Parser {
  override Expression([term, rest]: Expression) {
    return rest.reduce(
      (acc, [[add], term]): number =>
        add ? acc + term.value : acc - term.value,
      term.value
    );
  }

  override Term([factor, rest]: Term) {
    return rest.reduce(
      (acc, [[mul], factor]): number =>
        mul ? acc * factor.value : acc / factor.value,
      factor.value
    );
  }

  override Factor([number, parenthesizedExpression]: Factor) {
    if (number) {
      return number.value;
    }
    const [, expr] = parenthesizedExpression;
    return expr.value;
  }

  override Number(node: NodeTerminal) {
    return parseInt(node.text, 10);
  }
}
When you write the above code, an IDE such as VS Code can provide code completion and type checking for the semantic actions.
The `Parser` class implements default semantic actions that you can override.
Step 4: Use the parser
You can use the generated parser by instantiating the parser class and calling its `parse()` method.
If you extend the generated parser class, you need to instantiate the extended class.
For example, if you save the semantic actions above as `calculator.ts`, you can use the parser as follows:
import { Calculator } from './calculator';

const parser = new Calculator();
const result = parser.parse('(1+2)*3', 'Expression');
if (result instanceof Error) {
  console.error(result.message);
} else {
  console.log(result.value);
}
The output of the above program is `9`.
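If you call `parse()` in several places, the "error or value" check can be wrapped in small helpers. The sketch below assumes only what the example above shows: the result is either an `Error` or an object with a `value` property. The `ParseResult` type and both helper names are hypothetical, and the real return type may carry more fields.

```typescript
// Assumed result shape, inferred from the usage example; not the library's declared type.
type ParseResult<T> = Error | { value: T };

// Return the semantic value, or rethrow the parse error.
function valueOrThrow<T>(result: ParseResult<T>): T {
  if (result instanceof Error) {
    throw result;
  }
  return result.value;
}

// Return the semantic value, or a fallback when parsing failed.
function valueOr<T>(result: ParseResult<T>, fallback: T): T {
  return result instanceof Error ? fallback : result.value;
}

console.log(valueOrThrow({ value: 9 }));            // 9
console.log(valueOr(new Error('syntax error'), 0)); // 0
```

With the `Calculator` from Step 3, this would be used as `valueOrThrow(parser.parse('(1+2)*3', 'Expression'))`, provided the actual result type is compatible with the assumed shape.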
PEG Syntax
A rule in PEG is defined as follows:
name <- parsing_expression
`name` is the rule name, also called a nonterminal symbol.
`parsing_expression` is a sequence of one or more parsing expressions.
peg-express supports the following parsing expressions:
| Expression         | Example   | Description                                              |
| ------------------ | --------- | -------------------------------------------------------- |
| wildcard           | `.`       | Matches any character.                                   |
| string             | `'abc'`   | Matches the character string `abc`.                      |
| regular expression | `r'abc'`  | Is equivalent to `/abc/` in TypeScript.                  |
| nonterminal symbol | `name`    | Matches the rule `name`.                                 |
| ordered choice     | `e1 / e2` | Matches `e1`, or `e2` if `e1` does not match.            |
| sequence           | `e1 e2`   | Matches `e1` followed by `e2`.                           |
| zero or more       | `e*`      | Matches zero or more `e`.                                |
| one or more        | `e+`      | Matches one or more `e`.                                 |
| optional           | `e?`      | Matches `e` or nothing.                                  |
| grouping           | `(e)`     | Matches `e`.                                             |
| and predicate      | `&e`      | Matches if `e` matches, without consuming input.         |
| not predicate      | `!e`      | Matches if `e` does not match, without consuming input.  |
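Two points in this table tend to surprise people coming from regular expressions or context-free grammars: ordered choice commits to the first alternative that matches, and predicates check the input without consuming it. The following minimal sketch models a few of the operators as plain functions to show that behavior. It is a toy model of standard PEG semantics, not how peg-express is implemented, and every name in it is made up for this example.

```typescript
// A matcher takes the input and a position and returns the new position
// on success, or null on failure. Toy model only.
type Matcher = (input: string, pos: number) => number | null;

const str = (s: string): Matcher => (input, pos) =>
  input.startsWith(s, pos) ? pos + s.length : null;

const anyChar: Matcher = (input, pos) =>
  pos < input.length ? pos + 1 : null;

// e1 e2 ...: every part must match, one after another
const seq = (...parts: Matcher[]): Matcher => (input, pos) => {
  let cur = pos;
  for (const p of parts) {
    const next = p(input, cur);
    if (next === null) return null;
    cur = next;
  }
  return cur;
};

// e1 / e2 ...: the first alternative that matches wins
const choice = (...alts: Matcher[]): Matcher => (input, pos) => {
  for (const p of alts) {
    const next = p(input, pos);
    if (next !== null) return next;
  }
  return null;
};

// e*: repeat until e fails (or stops making progress)
const zeroOrMore = (p: Matcher): Matcher => (input, pos) => {
  let cur = pos;
  for (;;) {
    const next = p(input, cur);
    if (next === null || next === cur) return cur;
    cur = next;
  }
};

// &e and !e: look ahead, never consume
const andPred = (p: Matcher): Matcher => (input, pos) =>
  p(input, pos) !== null ? pos : null;
const notPred = (p: Matcher): Matcher => (input, pos) =>
  p(input, pos) === null ? pos : null;

// Ordered choice: 'a' / 'ab' stops after 'a' even though 'ab' is longer.
console.log(choice(str('a'), str('ab'))('ab', 0)); // 1
console.log(choice(str('ab'), str('a'))('ab', 0)); // 2

// !'x' . matches any single character except 'x'.
const notX = seq(notPred(str('x')), anyChar);
console.log(notX('y', 0)); // 1
console.log(notX('x', 0)); // null

// &'a' succeeds without moving the position.
console.log(andPred(str('a'))('a', 0)); // 0

// ('ab')* consumes as many repetitions as it can.
console.log(zeroOrMore(str('ab'))('ababa', 0)); // 4
```

The practical consequence for grammar authors is that, when one alternative is a prefix of another, the longer one should be listed first.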
Generating a Parser
Generated parsers consist of the following two files:
- `parser.ts`: The generated parser class.
- `SemanticValue.ts`: The types of semantic values for each node in the parse tree.
Parser Class parser.ts
peg-express generates a parser class from a grammar definition.
The generated parser implements default semantic actions that you can override.
A semantic action is a method that is called when a node in the parse tree is matched.
The name of the semantic action is the same as the name of the node in the parse tree.
In other words, the name of the semantic action is the same as the name of the rule in the grammar definition.
The return value of the semantic action is the semantic value of the node in the parse tree.
The parameter of the semantic action is a collection of placeholders, each of which holds the semantic value of a child node in the parse tree.
Its type depends on the right-hand side of the rule in the grammar definition.
For example, if the right-hand side of the rule is `A B`, where `A` and `B` are nonterminal symbols, the parameter of the semantic action is `[item0, item1]: [Value<SemanticValue.A>, Value<SemanticValue.B>]`.
The type of each parsing expression is defined as follows:
| Expression         | Example   | Type                                                       |
| ------------------ | --------- | ---------------------------------------------------------- |
| wildcard           | `.`       | `NodeTerminal`                                             |
| string             | `'abc'`   | `NodeTerminal`                                             |
| regular expression | `r'abc'`  | `NodeTerminal`                                             |
| nonterminal symbol | `name`    | `Value<SemanticValue.name>`                                |
| ordered choice     | `e1 / e2` | `[e1: Type(e1), e2: null] \| [e1: null, e2: Type(e2)]`     |
| sequence           | `e1 e2`   | `[e1: Type(e1), e2: Type(e2)]`                             |
| zero or more       | `e*`      | `Type(e)[]`                                                |
| one or more        | `e+`      | `Type(e)[]`                                                |
| optional           | `e?`      | `Type(e)[]`                                                |
| grouping           | `(e)`     | `Type(e)`                                                  |
| and predicate      | `&e`      | `null`                                                     |
| not predicate      | `!e`      | `null`                                                     |
In the above table, `Type(e)` is the type of the semantic value of the parsing expression `e`.
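As a worked example of how these rows compose, the sketch below spells out the parameter type that the table implies for the rule `Expression <- Term (('+' / '-') Term)*`, then builds a value of that type by hand. `NodeTerminal` and `Value` are declared locally as minimal stand-ins (only the `text` and `value` properties used elsewhere in this README); the real types come from the generated parser and may have more members. `Choice` and `ExpressionItems` are hypothetical names, and `Term` is assumed to have the semantic type `number`.

```typescript
// Minimal stand-ins for the generated types; assumptions for illustration only.
interface NodeTerminal { text: string }
interface Value<T> { value: T }

// ordered choice e1 / e2: exactly one slot is non-null
type Choice<A, B> = [A, null] | [null, B];

// Term (('+' / '-') Term)*
//   sequence     -> tuple
//   zero or more -> array
//   string       -> NodeTerminal
//   nonterminal  -> Value<...>
type ExpressionItems = [
  Value<number>,
  [Choice<NodeTerminal, NodeTerminal>, Value<number>][]
];

function evalExpression([first, rest]: ExpressionItems): number {
  return rest.reduce(
    // `plus` is non-null only when the '+' alternative matched
    (acc, [[plus], term]) => (plus ? acc + term.value : acc - term.value),
    first.value
  );
}

// Hand-built argument corresponding to the input "1+2-4"
const items: ExpressionItems = [
  { value: 1 },
  [
    [[{ text: '+' }, null], { value: 2 }],
    [[null, { text: '-' }], { value: 4 }],
  ],
];

console.log(evalExpression(items)); // -1
```

This is the same destructuring pattern used by the `Expression` action in Step 3: the first slot of the choice tuple tells you which alternative matched.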
Type definitions for semantic values SemanticValue.ts
peg-express generates a type definition file for semantic values.
By default, the type of the semantic value of each node in the parse tree is `any`.
You can override the default type by editing the type definition of the node in `SemanticValue.ts`.
This enables the IDE to provide better type checking when you write semantic actions.
For example, if you want to specify the type of the semantic value of the `Expression` node as `number`, you can modify the type definition for the `Expression` node as follows:
type Expression = number;
When you accidentally write a semantic action that returns a value of a different type, the IDE will notify you of the error.
Currently, `SemanticValue.ts` is overwritten every time you generate a parser, so manual edits to it are lost on regeneration.
In the future, peg-express will support updating the type definition file.
License
MIT License
Feedback
If you have any questions or comments, please contact me at [email protected]