parsil
v1.6.0
Published
A parser combinators library written in TypeScript
Parsil
Description
Parsil is a lightweight and flexible parser combinators library for JavaScript and TypeScript. It provides a set of composable parsers that allow you to build complex parsing logic with ease.
Key Features:
- Composable parsers for building complex parsing logic
- Support for error handling and error reporting
- Extensive library of predefined parsers for common parsing tasks
- Flexible and expressive API for defining custom parsers
- Well-documented and easy to use
Release notes
v1.1.0
v1.2.0
v1.3.0
- Improved type inference in the choice, sequenceOf and exactly parsers using variadic tuple types from TypeScript 4.x
v1.4.0
- New parsers
v1.5.0
- New parser
v1.6.0
- New parser
Installation
Install Parsil using npm:
npm install parsil
Usage
import P from 'parsil';
// Define parsers
const digitParser = P.digits;
const letterParser = P.letters;
const wordParser = P.manyOne(letterParser);
// Parse input
const input = 'Hello123';
const result = wordParser.run(input);
if (!result.isError) {
  console.log('Parsing succeeded:', result.result);
} else {
  console.error('Parsing failed:', result.error);
}
API
Methods
.run
.run
starts the parsing process on an input (which may be a string, TypedArray, ArrayBuffer, or DataView), initializes the state, and returns the result of parsing the input using the parser.
Example
str('hello').run('hello')
// -> {
// isError: false,
// result: "hello",
// index: 5
// }
.fork
Takes an input to parse, and two functions to handle the results of parsing:
- an error function that is called when parsing fails
- a success function that is called when parsing is successful.
The fork method will run the parser on the input and, depending on the outcome, call the appropriate function.
Example
str('hello').fork(
'hello',
(errorMsg, parsingState) => {
console.log(errorMsg);
console.log(parsingState);
return "goodbye"
},
(result, parsingState) => {
console.log(parsingState);
return result;
}
);
// [console.log] Object {isError: false, error: null, target: "hello", index: 5, …}
// -> "hello"
str('hello').fork(
'farewell',
(errorMsg, parsingState) => {
console.log(errorMsg);
console.log(parsingState);
return "goodbye"
},
(result, parsingState) => {
console.log(parsingState);
return result;
}
);
// [console.log] ParseError @ index 0 -> str: Expected string 'hello', got 'farew...'
// [console.log] Object {isError: true, error: "ParseError @ index 0 -> str: Expected string 'hello',…", target: "farewell", index: 0, …}
// "goodbye"
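As a rough illustration of the dispatch described above, .fork can be thought of as .run followed by a branch on isError. The sketch below is not Parsil's implementation: Outcome, RunFn, runStr and forkSketch are hypothetical names, and the outcome shape is reduced to the fields shown in the examples.

```typescript
// Simplified outcome shape, modelled on the objects shown in the examples.
type Outcome<T> =
  | { isError: false; result: T; index: number }
  | { isError: true; error: string; index: number };

// Hypothetical stand-in for a parser's .run method.
type RunFn<T> = (target: string) => Outcome<T>;

// Illustrative str-like runner: succeeds when the target starts with `expected`.
function runStr(expected: string): RunFn<string> {
  return (target) =>
    target.slice(0, expected.length) === expected
      ? { isError: false, result: expected, index: expected.length }
      : { isError: true, error: `Expected string '${expected}'`, index: 0 };
}

// fork = run the parser, then hand the outcome to exactly one callback.
function forkSketch<T, F>(
  run: RunFn<T>,
  target: string,
  onError: (error: string, outcome: Outcome<T>) => F,
  onSuccess: (result: T, outcome: Outcome<T>) => F
): F {
  const outcome = run(target);
  return outcome.isError
    ? onError(outcome.error, outcome)
    : onSuccess(outcome.result, outcome);
}

// forkOk is "hello"; forkFailed is "goodbye".
const forkOk = forkSketch(runStr('hello'), 'hello', () => 'goodbye', (result) => result);
const forkFailed = forkSketch(runStr('hello'), 'farewell', () => 'goodbye', (result) => result);
```

Because both callbacks return the same type, the caller gets a single value back regardless of which branch ran.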
.map
.map
transforms the parser into a new parser that applies a function to the result of the original parser.
Example
const newParser = letters.map(x => ({
  matchType: 'string',
  value: x
}));
newParser.run('hello world')
// -> {
// isError: false,
// result: {
// matchType: "string",
// value: "hello"
// },
// index: 5,
// }
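One way to picture .map is as a wrapper that runs the original parser and transforms only a successful result, leaving errors untouched. The following is a minimal sketch under that assumption, not Parsil's source; MiniParser, lettersSketch and mapSketch are hypothetical names.

```typescript
type Outcome<T> =
  | { isError: false; result: T; index: number }
  | { isError: true; error: string; index: number };

// A parser is modelled here as a plain function from (target, index) to an outcome.
type MiniParser<T> = (target: string, index: number) => Outcome<T>;

// Illustrative letters-like parser: one or more characters in [a-zA-Z].
const lettersSketch: MiniParser<string> = (target, index) => {
  let end = index;
  while (end < target.length && /[a-zA-Z]/.test(target.charAt(end))) end++;
  return end > index
    ? { isError: false, result: target.slice(index, end), index: end }
    : { isError: true, error: 'Expected letters', index };
};

// map: run the parser, then apply fn to the result if parsing succeeded.
function mapSketch<A, B>(parser: MiniParser<A>, fn: (value: A) => B): MiniParser<B> {
  return (target, index) => {
    const out = parser(target, index);
    if (out.isError) return out; // errors pass through unchanged
    return { isError: false, result: fn(out.result), index: out.index };
  };
}

const tagged = mapSketch(lettersSketch, (value) => ({ matchType: 'string', value }));
const taggedOut = tagged('hello world', 0);
// taggedOut.result is { matchType: "string", value: "hello" } and taggedOut.index is 5
```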
.chain
.chain
transforms the parser into a new parser by applying a function to the result of the original parser.
This function should return a new Parser that can be used to parse the next input.
This is used for cases where the result of a parser is needed to decide what to parse next.
Example
const lettersThenSpace = sequenceOf([
letters,
char(' ')
]).map(x => x[0]);
const newParser = lettersThenSpace.chain(matchedValue => {
switch (matchedValue) {
case 'number': return digits;
case 'string': return letters;
case 'bracketed': return sequenceOf([
char('('),
letters,
char(')')
]).map(values => values[1]);
default: return fail('Unrecognised input type');
}
});
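To make the data flow concrete, here is a self-contained sketch of the same idea: the first parser reads a type tag, and the function passed to the chain step picks the parser for whatever follows. It is an illustration only; MiniParser, takeWhile1, tagSketch, failSketch and chainSketch are hypothetical names, not part of Parsil.

```typescript
type Outcome<T> =
  | { isError: false; result: T; index: number }
  | { isError: true; error: string; index: number };
type MiniParser<T> = (target: string, index: number) => Outcome<T>;

// Matches one or more characters accepted by `test`.
const takeWhile1 = (test: (c: string) => boolean, label: string): MiniParser<string> =>
  (target, index) => {
    let end = index;
    while (end < target.length && test(target.charAt(end))) end++;
    return end > index
      ? { isError: false, result: target.slice(index, end), index: end }
      : { isError: true, error: `Expected ${label}`, index };
  };
const lettersSketch = takeWhile1((c) => /[a-zA-Z]/.test(c), 'letters');
const digitsSketch = takeWhile1((c) => /[0-9]/.test(c), 'digits');

// Reads a type tag followed by a single space, e.g. "number ".
const tagSketch: MiniParser<string> = (target, index) => {
  const tag = lettersSketch(target, index);
  if (tag.isError) return tag;
  return target.charAt(tag.index) === ' '
    ? { isError: false, result: tag.result, index: tag.index + 1 }
    : { isError: true, error: "Expected ' '", index: tag.index };
};

const failSketch = (error: string): MiniParser<string> =>
  (_target, index) => ({ isError: true, error, index });

// chain: the result of the first parser decides which parser runs next.
function chainSketch<A, B>(parser: MiniParser<A>, next: (value: A) => MiniParser<B>): MiniParser<B> {
  return (target, index) => {
    const first = parser(target, index);
    if (first.isError) return first;
    return next(first.result)(target, first.index);
  };
}

const typedValue = chainSketch(tagSketch, (tag) =>
  tag === 'number' ? digitsSketch
  : tag === 'string' ? lettersSketch
  : failSketch('Unrecognised input type'));

const numberOut = typedValue('number 42', 0);  // succeeds with result "42" at index 9
const mismatchOut = typedValue('string 42', 0); // fails: letters expected after the tag
```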
.errorMap
.errorMap
is like .map but it transforms the error value. The function passed to .errorMap receives an object containing the current error message (error) and the index (index) at which parsing stopped.
Example
const newParser = letters.errorMap(({error, index}) => `Old message was: [${error}] @ index ${index}`);
newParser.run('1234')
// -> {
// isError: true,
// error: "Old message was: [ParseError @ index 0 -> letters: Expected letters] @ index 0",
// index: 0,
// }
Functions
anyChar
anyChar
matches exactly one UTF-8 character.
Example
anyChar.run('a')
// -> {
// isError: false,
// result: "a",
// index: 1,
// }
anyChar.run('😉')
// -> {
// isError: false,
// result: "😉",
// index: 4,
// }
anyCharExcept
anyCharExcept
takes an exception parser and returns a new parser that matches exactly one character, provided it is not matched by the exception parser.
Example
anyCharExcept (char ('.')).run('This is a sentence.')
// -> {
// isError: false,
// result: 'T',
// index: 1,
// data: null
// }
const manyExceptDot = many (anyCharExcept (char ('.')))
manyExceptDot.run('This is a sentence.')
// -> {
// isError: false,
// result: ['T', 'h', 'i', 's', ' ', 'i', 's', ' ', 'a', ' ', 's', 'e', 'n', 't', 'e', 'n', 'c', 'e'],
// index: 18,
// data: null
// }
bit
bit
parses the bit at the current index from a DataView.
Example
const parser = bit
const data = new Uint8Array([42]).buffer
parser.run(new DataView(data))
// -> {
// isError: false,
// result: 0,
// index: 1,
// }
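The example results in this README (42 yields 0, 234 yields 1 for the first bit) are consistent with bits being read from the most significant bit of each byte. Assuming that ordering, reading a single bit out of a DataView could look roughly like this; readBitSketch is a hypothetical helper, not a Parsil export.

```typescript
// Reads the bit at `bitIndex`, counting from the most significant bit of byte 0.
// The MSB-first ordering is an assumption inferred from the examples.
function readBitSketch(view: DataView, bitIndex: number): number {
  const byteOffset = Math.floor(bitIndex / 8);
  const shift = 7 - (bitIndex % 8);
  return (view.getUint8(byteOffset) >> shift) & 1;
}

const view42 = new DataView(new Uint8Array([42]).buffer);   // 42 is 0b00101010
const view234 = new DataView(new Uint8Array([234]).buffer); // 234 is 0b11101010

const firstBitOf42 = readBitSketch(view42, 0);   // 0
const firstBitOf234 = readBitSketch(view234, 0); // 1

// Collect all eight bits of 42 to show the order in which they are visited.
let bitsOf42 = '';
for (let i = 0; i < 8; i++) bitsOf42 += readBitSketch(view42, i);
// bitsOf42 is "00101010"
```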
between
between
takes 3 parsers, a left parser, a right parser, and a value parser, returning a new parser that matches a value matched by the value parser, between values matched by the left parser and the right parser.
This parser can easily be partially applied with char ('(') and char (')') to create a betweenRoundBrackets parser, for example.
Example
const newParser = between (char ('<')) (char ('>')) (letters);
newParser.run('<hello>')
// -> {
// isError: false,
// result: "hello",
// index: 7,
// }
const betweenRoundBrackets = between (char ('(')) (char (')'));
betweenRoundBrackets (many (letters)).run('(hello world)')
// -> {
// isError: true,
// error: "ParseError @ index 6 -> between: Expected character ')', got ' '",
// index: 6,
// }
char
char
takes a character and returns a parser that matches that character exactly one time.
Example
char ('h').run('hello')
// -> {
// isError: false,
// result: "h",
// index: 1,
// }
choice
choice
is a parser combinator that tries each parser in a given list of parsers, in order,
until one succeeds.
If a parser succeeds, it consumes the relevant input and returns the result.
If no parser succeeds, choice fails with an error message.
Example
const newParser = choice ([
digit,
char ('!'),
str ('hello'),
str ('pineapple')
])
newParser.run('hello world')
// -> {
// isError: false,
// result: "hello",
// index: 5,
// }
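The "first match wins" behaviour can be sketched as a loop that returns the first successful outcome and only fails once every alternative has been tried from the same starting index. This is an illustrative model rather than Parsil's code; MiniParser, strSketch and choiceSketch are hypothetical names, and the error text is made up for the example.

```typescript
type Outcome<T> =
  | { isError: false; result: T; index: number }
  | { isError: true; error: string; index: number };
type MiniParser<T> = (target: string, index: number) => Outcome<T>;

// Illustrative str-like parser.
const strSketch = (expected: string): MiniParser<string> => (target, index) =>
  target.slice(index, index + expected.length) === expected
    ? { isError: false, result: expected, index: index + expected.length }
    : { isError: true, error: `Expected '${expected}'`, index };

// choice: try each parser at the same index, in order; the first success wins.
function choiceSketch<T>(parsers: MiniParser<T>[]): MiniParser<T> {
  return (target, index) => {
    for (const parser of parsers) {
      const out = parser(target, index);
      if (!out.isError) return out;
    }
    return { isError: true, error: 'choice: no parser matched', index };
  };
}

const greeting = choiceSketch([strSketch('!'), strSketch('hello'), strSketch('pineapple')]);
const greetingOut = greeting('hello world', 0); // result "hello", index 5
const noMatchOut = greeting('goodbye', 0);      // isError true, index 0
```

Order matters: if two alternatives can both match, the one listed first decides how much input is consumed.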
coroutine
coroutine
is a parser that allows for advanced control flow and composition of parsers.
Example
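// parserA and parserB stand for any two parsers that produce numbers.
// The callback parameter is named run because yield is a reserved word in strict-mode code.
const parserFn: ParserFn<number> = (run) => {
  const x = run(parserA);
  const y = run(parserB);
  return x + y;
};
const coroutineParser = coroutine(parserFn);
coroutineParser.run(input);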
digit
digit
is a parser that matches exactly one numerical digit /[0-9]/.
Example
digit.run('99 bottles of beer on the wall')
// -> {
// isError: false,
// result: "9",
// index: 1,
// }
digits
digits
matches one or more numerical digits /[0-9]/.
Example
digits.run('99 bottles of beer on the wall')
// -> {
// isError: false,
// result: "99",
// index: 2,
// }
endOfInput
endOfInput
is a parser that only succeeds when there is no more input to be parsed.
Example
const newParser = sequenceOf ([
str ('abc'),
endOfInput
]);
newParser.run('abc')
// -> {
// isError: false,
// result: [ "abc", null ],
// index: 3,
// data: null
// }
newParser.run('')
// -> {
// isError: true,
// error: "ParseError @ index 0 -> endOfInput: Expecting string 'abc', but got end of input.",
// index: 0,
// data: null
// }
everyCharUntil
everyCharUntil
takes a termination parser and returns a new parser which matches every possible character up until a value is matched by the termination parser. When a value is matched by the termination parser, it is not "consumed".
Example
everyCharUntil (char ('.')).run('This is a sentence.This is another sentence')
// -> {
// isError: false,
// result: 'This is a sentence',
// index: 18,
// data: null
// }
// termination parser doesn't consume the termination value
const newParser = sequenceOf ([
everyCharUntil (char ('.')),
str ('This is another sentence')
]);
newParser.run('This is a sentence.This is another sentence')
// -> {
// isError: true,
// error: "ParseError (position 18): Expecting string 'This is another sentence', got '.This is another sentenc...'",
// index: 18,
// data: null
// }
everythingUntil
Note: Between 2.x and 3.x, the definition of everythingUntil changed. In 3.x, what was previously everythingUntil is now everyCharUntil.
everythingUntil
takes a termination parser and returns a new parser which matches every possible numerical byte up until a value is matched by the termination parser. When a value is matched by the termination parser, it is not "consumed".
Example
everythingUntil (char ('.')).run('This is a sentence.This is another sentence')
// -> {
// isError: false,
// result: [84, 104, 105, 115, 32, 105, 115, 32, 97, 32, 115, 101, 110, 116, 101, 110, 99, 101],
// index: 18,
// data: null
// }
// termination parser doesn't consume the termination value
const newParser = sequenceOf ([
everythingUntil (char ('.')),
str ('This is another sentence')
]);
newParser.run('This is a sentence.This is another sentence')
// -> {
// isError: true,
// error: "ParseError (position 18): Expecting string 'This is another sentence', got '.This is another sentenc...'",
// index: 18,
// data: null
// }
exactly
exactly
takes a positive number and returns a function. That function takes a parser and returns a new parser which matches the given parser the specified number of times.
Example
const newParser = exactly (4)(letter)
newParser.run('abcdef')
// -> {
// isError: false,
// result: [ "a", "b", "c", "d" ],
// index: 4,
// data: null
// }
newParser.run('abc')
// -> {
// isError: true,
// error: 'ParseError @ index 0 -> exactly: Expecting 4 letter, but got end of input.',
// index: 0,
// data: null
// }
newParser.run('12345')
// -> {
// isError: true,
// error: "ParseError @ index 0 -> exactly: Expecting 4 letter, but got '1'",
// index: 0,
// data: null
// }
fail
fail
takes an error message string and returns a parser that always fails with the provided error message.
Example
fail('Nope').run('hello world')
// -> {
// isError: true,
// error: "Nope",
// index: 0,
// }
int
int
reads the next n bits from the input and interprets them as a signed integer.
Example
const parser = int(8)
const input = new Uint8Array([-42])
const result = parser.run(new DataView(input.buffer))
// -> {
// isError: false,
// result: -42,
// index: 8,
// }
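Note that new Uint8Array([-42]) stores the byte 214; it is the signed interpretation of those eight bits that yields -42. A minimal sketch of that interpretation follows, assuming two's complement and most-significant-bit-first ordering. readUintSketch and readIntSketch are hypothetical helpers, not Parsil exports.

```typescript
// Reads n bits starting at bitIndex as an unsigned integer (MSB first, by assumption).
function readUintSketch(view: DataView, bitIndex: number, n: number): number {
  let value = 0;
  for (let i = 0; i < n; i++) {
    const position = bitIndex + i;
    const byte = view.getUint8(Math.floor(position / 8));
    const bit = (byte >> (7 - (position % 8))) & 1;
    value = value * 2 + bit;
  }
  return value;
}

// Reinterprets the same n bits as a two's-complement signed integer.
function readIntSketch(view: DataView, bitIndex: number, n: number): number {
  const unsigned = readUintSketch(view, bitIndex, n);
  const half = Math.pow(2, n - 1);
  return unsigned >= half ? unsigned - 2 * half : unsigned;
}

const signedView = new DataView(new Uint8Array([-42]).buffer); // stored as 214 (0b11010110)

const asUnsigned = readUintSketch(signedView, 0, 8); // 214
const asSigned = readIntSketch(signedView, 0, 8);    // -42
const firstNibble = readIntSketch(signedView, 0, 4); // bits 1101 -> -3
```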
letter
letter
is a parser that matches exactly one alphabetical letter /[a-zA-Z]/.
Example
letter.run('hello world')
// -> {
// isError: false,
// result: "h",
// index: 1,
// }
letters
letters
matches one or more alphabetical letters /[a-zA-Z]/.
Example
letters.run('hello world')
// -> {
// isError: false,
// result: "hello",
// index: 5,
// }
lookAhead
lookAhead
takes a look-ahead parser and returns a new parser that matches using the look-ahead parser, but without consuming input.
Example
const newParser = sequenceOf ([
str ('hello '),
lookAhead (str ('world')),
str ('wor')
]);
newParser.run('hello world')
// -> {
// isError: false,
// result: [ "hello ", "world", "wor" ],
// index: 9,
// data: null
// }
many
many
is a parser combinator that applies a given parser zero or more times.
It collects the results of each successful parse into an array, and stops when the parser can no longer match the input.
It doesn't fail when the parser doesn't match the input at all; instead, it returns an empty array.
Example
const newParser = many (str ('abc'))
newParser.run('abcabcabcabc')
// -> {
// isError: false,
// result: [ "abc", "abc", "abc", "abc" ],
// index: 12,
// }
newParser.run('')
// -> {
// isError: false,
// result: [],
// index: 0,
// }
newParser.run('12345')
// -> {
// isError: false,
// result: [],
// index: 0,
// }
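The "zero or more" behaviour amounts to a loop that keeps applying the parser until it fails, and then succeeds with whatever was collected. The sketch below models that under simplified assumptions; MiniParser, strSketch and manySketch are hypothetical names, and the guard against parsers that succeed without consuming input is a design choice of this sketch, not something the README states about Parsil.

```typescript
type Outcome<T> =
  | { isError: false; result: T; index: number }
  | { isError: true; error: string; index: number };
type MiniParser<T> = (target: string, index: number) => Outcome<T>;

const strSketch = (expected: string): MiniParser<string> => (target, index) =>
  target.slice(index, index + expected.length) === expected
    ? { isError: false, result: expected, index: index + expected.length }
    : { isError: true, error: `Expected '${expected}'`, index };

// many: apply the parser repeatedly; a failure ends the loop but is not an error.
function manySketch<T>(parser: MiniParser<T>): MiniParser<T[]> {
  return (target, index) => {
    const results: T[] = [];
    let next = index;
    while (true) {
      const out = parser(target, next);
      if (out.isError) break;          // stop at the first failure
      if (out.index === next) break;   // stop if no input was consumed (avoids an infinite loop)
      results.push(out.result);
      next = out.index;
    }
    return { isError: false, result: results, index: next };
  };
}

const manyAbc = manySketch(strSketch('abc'));
const repeated = manyAbc('abcabcabcabc', 0); // four matches, index 12
const nothing = manyAbc('12345', 0);         // empty array, index 0
```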
manyOne
manyOne
is similar to many, but it requires the input parser to match the input at least once.
Example
const newParser = manyOne (str ('abc'))
newParser.run('abcabcabcabc')
// -> {
// isError: false,
// result: [ "abc", "abc", "abc", "abc" ],
// index: 12,
// }
newParser.run('')
// -> {
// isError: true,
// error: "ParseError @ index 0 -> manyOne: Expected to match at least one value",
// index: 0,
// data: null
// }
newParser.run('12345')
// -> {
// isError: true,
// error: "ParseError @ index 0 -> manyOne: Expected to match at least one value",
// index: 0,
// data: null
// }
one
one
parses the bit at the current index from a DataView and expects it to be 1.
Example
const parser = one
const data = new Uint8Array([234]).buffer
parser.run(new DataView(data))
// -> {
// isError: false,
// result: 1,
// index: 1,
// }
const otherData = new Uint8Array([42]).buffer
parser.run(new DataView(otherData))
// -> {
// isError: true,
// error: "ParseError @ index 0 -> one: Expected 1 but got 0",
// index: 0,
// }
optionalWhitespace
optionalWhitespace
is a parser that matches zero or more whitespace characters.
Example
const newParser = sequenceOf ([
str ('hello'),
optionalWhitespace,
str ('world')
]);
newParser.run('hello world')
// -> {
// isError: false,
// result: [ "hello", " ", "world" ],
// index: 11,
// }
newParser.run('helloworld')
// -> {
// isError: false,
// result: [ "hello", "", "world" ],
// index: 10,
// }
peek
peek
matches exactly one numerical byte without consuming any input.
Example
peek.run('hello world')
// -> {
// isError: false,
// result: 104,
// index: 0,
// data: null
// }
sequenceOf([
str('hello'),
peek
]).run('hello world')
// -> {
// isError: false,
// result: [ "hello", 32 ],
// index: 5,
// data: null
// }
possibly
possibly
takes an attempt parser and returns a new parser which tries to match using the attempt parser. If it is unsuccessful, it returns a null value and does not "consume" any input.
Example
const newParser = sequenceOf ([
possibly (str ('Not Here')),
str ('Yep I am here')
]);
newParser.run('Yep I am here')
// -> {
// isError: false,
// result: [ null, "Yep I am here" ],
// index: 13,
// }
rawString
rawString
matches a string of characters exactly as provided.
Each character in the input string is converted to its corresponding ASCII code and a parser is created for each ASCII code.
The resulting parsers are chained together using sequenceOf to ensure they are parsed in order.
The parser succeeds if all characters are matched in the input and fails otherwise.
Example
const parser = rawString('Hello')
parser.run('Hello')
// -> {
// isError: false,
// result: [72, 101, 108, 108, 111],
// index: 40,
// }
parser.run('World')
// -> {
// isError: true,
// error: "ParseError -> rawString: Expected character H, but got W",
// index: 8,
// }
recursive
recursive
takes a function that returns a parser (a thunk), and returns that same parser. This is needed in order to create recursive parsers because JavaScript is an eager language.
In the following example both the value parser and the matchArray parser are defined in terms of each other, so one must be defined using recursive.
Example
const value = recursive (() => choice ([
matchNum,
matchStr,
matchArray
]));
const betweenSquareBrackets = between (char ('[')) (char (']'));
const commaSeparated = sepBy (char (','));
const spaceSeparated = sepBy (char (' '));
const matchNum = digits;
const matchStr = letters;
const matchArray = betweenSquareBrackets (commaSeparated (value));
spaceSeparated(value).run('abc 123 [42,def] 45')
// -> {
// isError: false,
// result: [ "abc", "123", [ "42", "def" ], "45" ],
// index: 19,
// }
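The reason a thunk helps is that the wrapped function is only called when parsing actually runs, by which time every parser it refers to has been defined. A minimal sketch of that idea, with hypothetical names (MiniParser, recursiveSketch, parens, wrapped) that are not part of Parsil:

```typescript
type Outcome<T> =
  | { isError: false; result: T; index: number }
  | { isError: true; error: string; index: number };
type MiniParser<T> = (target: string, index: number) => Outcome<T>;

// Defers the lookup of the real parser until parse time.
function recursiveSketch<T>(thunk: () => MiniParser<T>): MiniParser<T> {
  return (target, index) => thunk()(target, index);
}

// `wrapped` is declared further down. Writing `const parens = wrapped` here would
// throw at load time; the thunk postpones the reference until parens is run.
const parens: MiniParser<number> = recursiveSketch(() => wrapped);

// Grammar: nesting := '(' nesting ')' | nothing. The result is the nesting depth.
const wrapped: MiniParser<number> = (target, index) => {
  if (target.charAt(index) !== '(') return { isError: false, result: 0, index };
  const inner = parens(target, index + 1);
  if (inner.isError) return inner;
  return target.charAt(inner.index) === ')'
    ? { isError: false, result: inner.result + 1, index: inner.index + 1 }
    : { isError: true, error: "Expected ')'", index: inner.index };
};

const nestedOut = parens('(())', 0);    // result 2, index 4
const unbalancedOut = parens('(()', 0); // isError true
```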
regex
regex
takes a RegExp and returns a parser that matches as many characters as the RegExp matches.
Example
regex(/^[hH][aeiou].{2}o/).run('hello world')
// -> {
// isError: false,
// result: "hello",
// index: 5,
// }
sepBy
sepBy
takes two parsers - a separator parser and a value parser - and returns a new parser that matches zero or more values from the value parser that are separated by values of the separator parser. Because it will match zero or more values, this parser will fail if a value is followed by a separator but NOT another value. If there's no value, the result will be an empty array, not failure.
Example
const newParser = sepBy (char (',')) (letters)
newParser.run('some,comma,separated,words')
// -> {
// isError: false,
// result: [ "some", "comma", "separated", "words" ],
// index: 26,
// }
newParser.run('')
// -> {
// isError: false,
// result: [],
// index: 0,
// }
newParser.run('12345')
// -> {
// isError: false,
// result: [],
// index: 0,
// }
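The trailing-separator rule is easiest to see in code: once a separator has been consumed, a value must follow, whereas failing to find the very first value is simply an empty result. The sketch below models that logic with hypothetical names (MiniParser, charSketch, lettersSketch, sepBySketch); it is not Parsil's implementation.

```typescript
type Outcome<T> =
  | { isError: false; result: T; index: number }
  | { isError: true; error: string; index: number };
type MiniParser<T> = (target: string, index: number) => Outcome<T>;

const charSketch = (expected: string): MiniParser<string> => (target, index) =>
  target.charAt(index) === expected
    ? { isError: false, result: expected, index: index + 1 }
    : { isError: true, error: `Expected character '${expected}'`, index };

const lettersSketch: MiniParser<string> = (target, index) => {
  let end = index;
  while (end < target.length && /[a-zA-Z]/.test(target.charAt(end))) end++;
  return end > index
    ? { isError: false, result: target.slice(index, end), index: end }
    : { isError: true, error: 'Expected letters', index };
};

// sepBy: value (separator value)*, or nothing at all.
function sepBySketch<S, T>(separator: MiniParser<S>, value: MiniParser<T>): MiniParser<T[]> {
  return (target, index) => {
    const results: T[] = [];
    let next = index;
    while (true) {
      const v = value(target, next);
      if (v.isError) {
        // No value at all is fine; a separator with no value after it is an error.
        if (results.length === 0) return { isError: false, result: results, index };
        return v;
      }
      results.push(v.result);
      next = v.index;
      const s = separator(target, next);
      if (s.isError) break; // no further separator: the list is complete
      next = s.index;
    }
    return { isError: false, result: results, index: next };
  };
}

const words = sepBySketch(charSketch(','), lettersSketch);
const wordsOut = words('some,comma,separated,words', 0); // four words, index 26
const noneOut = words('12345', 0);                       // empty array, index 0
const trailingOut = words('some,', 0);                   // isError true
```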
sepByOne
sepByOne
is the same as sepBy, except that it matches one or more occurrences.
Example
const newParser = sepByOne(char (','))(letters)
newParser.run('some,comma,separated,words')
// -> {
// isError: false,
// result: [ "some", "comma", "separated", "words" ],
// index: 26,
// }
newParser.run('1,2,3')
// -> {
// isError: true,
// error: "ParseError @ index 0 -> sepByOne: Expected to match at least one separated value",
// index: 0,
// }
sequenceOf
sequenceOf
is a parser combinator that accepts an array of parsers and applies them
in sequence to the input. If all parsers succeed, it returns an array
of their results.
If any parser fails, it fails immediately and returns the error state of that parser.
Example
const newParser = sequenceOf ([
str ('he'),
letters,
char (' '),
str ('world'),
])
newParser.run('hello world')
// -> {
// isError: false,
// result: [ "he", "llo", " ", "world" ],
// index: 11,
// }
startOfInput
startOfInput
is a parser that only succeeds when the parser is at the beginning of the input.
Example
const mustBeginWithHeading = sequenceOf([
startOfInput,
str("# ")
]);
const newParser = between(mustBeginWithHeading)(endOfInput)(everyCharUntil(endOfInput));
newParser.run('# Heading');
// -> {
// isError: false,
// result: "Heading",
// index: 9,
// data: null
// }
newParser.run(' # Heading');
// -> {
// isError: true,
// error: "ParseError @ index 0 -> startOfInput: Expecting string '# ', got ' #...'",
// index: 0,
// data: null
// }
succeed
succeed
is a parser combinator that always succeeds and produces a constant value. It ignores the input state and returns the specified value as the result.
Example
const parser = succeed(42);
parser.run("hello world");
// Returns:
// {
// isError: false,
// result: 42,
// index: 0
// }
str
str
tries to match a given string against its input.
Example
str('hello').run('hello world')
// -> {
// isError: false,
// result: "hello",
// index: 5,
// }
uint
uint
reads the next n
bits from the input and interprets them as an unsigned integer.
Example
const parser = uint(8)
const input = new Uint8Array([42])
const result = parser.run(new DataView(input.buffer))
// -> {
// isError: false,
// result: 42,
// index: 8,
// }
whitespace
whitespace
is a parser that matches one or more whitespace characters.
Example
const newParser = sequenceOf ([
str ('hello'),
whitespace,
str ('world')
]);
newParser.run('hello world')
// -> {
// isError: false,
// result: [ "hello", " ", "world" ],
// index: 11,
// }
newParser.run('helloworld')
// -> {
// isError: true,
// error: "ParseError 'many1' (position 5): Expected to match at least one value",
// index: 5,
// }
zero
zero
parses the bit at the current index from a DataView and expects it to be 0.
Example
const parser = zero
const data = new Uint8Array([42]).buffer
parser.run(new DataView(data))
// -> {
// isError: false,
// result: 0,
// index: 1,
// }
const otherData = new Uint8Array([234]).buffer
parser.run(new DataView(otherData))
// -> {
// isError: true,
// error: "ParseError @ index 0 -> zero: Expected 0 but got 1",
// index: 0,
// }