@candlelib/wind
v0.5.7
Tokenizer
\ ˈwīnd \ - to raise to a high level [as of excitement or tension]
Install
NPM
npm install --save @candlelib/wind
Usage
note: This script uses ES2015 module syntax and has the extension .mjs. To include it in a project, you may need to run node with the --experimental-modules flag, or use a bundler that supports ES modules, such as rollup.
import wind from "@candlelib/wind"
const sample_string = "The 2345 a 0x3456 + 'a string'";
let lexer = wind(sample_string);
//Example
lexer.text //=> "The"
lexer.n.tx //=> "2345"
lexer.n.text //=> "a"
lexer.assert("a") // the current token is "a", so the lexer advances; a mismatch would throw
lexer.text //=> "0x3456"
lexer.ty == lexer.types.number //=> true
Wind Lexer
import { Lexer } from "@candlelib/wind"
Constructor
new Lexer ( string [ , INCLUDE_WHITE_SPACE_TOKENS ] )
string - The input string to parse.
INCLUDE_WHITE_SPACE_TOKENS - Flag to include white space tokens such as TABS and NEW_LINE.
note: the default export wind has the same form as the Lexer constructor function and is called without the new keyword.
let lexer = wind ( string [ , INCLUDE_WHITE_SPACE_TOKENS ] )
Properties
char (Read-Only) - Number The char offset of the token relative to the line.
CHARACTERS_ONLY - Boolean If true, the Lexer will only produce tokens that are one character in length.
END (Read-Only) - Boolean If true, the Lexer has reached the end of the input string.
IGNORE_WHITE_SPACE - Boolean If true, white_space and new_line tokens will not be generated.
line (Read-Only) - Number The index of the line on which the current token is located.
off - Number The absolute index position of the current token, measured from the beginning of the input string.
p - Wind Lexer A pointer cache to a peeking Lexer.
PARSE_STRING - Boolean If set to true, string tokens will not be generated; instead, the contents of each string will be tokenized individually.
sl - Number The length of the input string. Changing sl will cause the Lexer to stop parsing once off + token_length >= sl.
str - String The string that is being tokenized.
string (Read-Only) - String Returns the result of slice().
string_length (Read-Only) - Number The length of the remaining string to be parsed. Same as lex.sl - lex.off.
text - String The string value of the current token.
tl - Number The size of the current token.
type - Number The current token type. See Types.
types - Object Proxy to the types object.
ch - The first character of the current token.
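The relationship between off, tl, sl and string_length can be sketched with a plain object. This is only an illustration of the bookkeeping described above, not Wind's own code: makeCursor and atLimit are hypothetical names, and the stop condition is assumed to be exactly off + tl >= sl.

```javascript
// A plain object standing in for a lexer; only the bookkeeping fields are modeled.
function makeCursor(str) {
    return {
        str,
        off: 0,          // absolute index of the current token
        tl: 0,           // length of the current token
        sl: str.length,  // parse length; may be lowered to stop early
        // Remaining length, as documented: sl - off.
        get string_length() { return this.sl - this.off; },
        // Assumption: parsing stops once the current token reaches the parse length.
        get atLimit() { return this.off + this.tl >= this.sl; },
    };
}

const cursor = makeCursor("The 2345 a");
cursor.tl = 3;                      // pretend the current token is "The"
console.log(cursor.string_length);  // 10
cursor.sl = 3;                      // limit parsing to the first three characters
console.log(cursor.string_length);  // 3
console.log(cursor.atLimit);        // true
```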
Alias properties
n Property proxy for next().
string Returns the result of slice().
token Property proxy for copy().
tx Proxy for text.
ty Proxy for type.
pos Proxy for off.
pk Property proxy for peek().
Methods
Lexer - assert ( text ) Compares the current token's text value to the argument text. If the values are the same, the lexer advances to the next token. If they are not equal, an error is thrown.
- Returns Lexer to allow method chaining.
Lexer - assertCharacter ( char ) Same as assert(), except that it compares a single character only.
- Returns Lexer to allow method chaining.
Lexer - comment ( [ ASSERT [ , marker ] ] ) Skips to the end of the comment section if the current token is / and the peek token is / or *. If true is passed for the ASSERT argument, an error is thrown if the current token plus the peek token is not /* or //.
- Returns Lexer to allow method chaining.
Lexer - copy ( [ destination ] ) Copies the value of the lexer to destination. destination defaults to a new Wind Lexer.
Lexer - fence ( [ marker ] ) Reduces the input string's parse length by the value of marker.off. marker must be a Wind Lexer that has the same input string as the calling Wind Lexer.
- Returns Lexer to allow method chaining.
Lexer - next ( [ marker ] ) Advances marker to the next token in its input string. Returns marker, or null if the end of the input string has been reached. marker defaults to the calling Wind Lexer object, which means this will be returned if no value is passed as marker.
- Returns Lexer to allow method chaining.
Lexer - peek ( [ marker [ , peek_marker ] ] ) Returns another Wind Lexer that is advanced one token ahead of marker. marker defaults to this and peek_marker defaults to p. A new Wind Lexer is created if no value is passed as peek_marker and marker.p is null.
Lexer - reset ( ) Resets the lexer completely. After this is called, the lexer must be given a new input string before it can begin parsing again.
- Returns Lexer to allow method chaining.
Lexer - resetHead ( ) Resets the lexer to the beginning of the string.
- Returns Lexer to allow method chaining.
Lexer - setString ( string [ , RESET ] ) Changes the input string to string. If the optional RESET argument is true, resetHead() is also called.
- Returns Lexer to allow method chaining.
String - slice ( [ start ] ) Returns a substring of the input string that starts at start and ends at the value of off. If start is undefined, the substring starts at off and ends at sl.
Lexer - sync ( [ marker ] ) Copies the current values of the marker object to the Wind Lexer. marker defaults to the value of p.
- Returns Lexer to allow method chaining.
throw ( message ) Throws a new Error with a custom message and information indicating where in the input string the current token is positioned.
String - toString ( ) Returns the result of slice().
trim ( ) Creates and returns a new Lexer with leading and trailing whitespace and line terminator characters removed from the input string.
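The way next, peek, sync and assert fit together may be easier to see in a toy model. The ToyLexer class below is a hypothetical stand-in that splits on spaces only; it is not Wind's implementation and ignores token types entirely. It mirrors just enough of the behavior described above to show the call pattern.

```javascript
// A toy stand-in that splits on spaces only. NOT Wind's implementation.
class ToyLexer {
    constructor(str) {
        this.str = str;
        this.off = 0;   // start of the current token
        this.tl = 0;    // length of the current token
        this.p = null;  // cached peeking lexer
        this.next();    // position on the first token
    }

    get text() { return this.str.slice(this.off, this.off + this.tl); }

    // Advance `marker` (default: this) to the next space-delimited token.
    next(marker = this) {
        let i = marker.off + marker.tl;
        while (i < marker.str.length && marker.str[i] === " ") i++;
        let j = i;
        while (j < marker.str.length && marker.str[j] !== " ") j++;
        marker.off = i;
        marker.tl = j - i;
        return marker;
    }

    // Copy position state into `destination` (default: a new lexer).
    copy(destination = new ToyLexer(this.str)) {
        destination.str = this.str;
        destination.off = this.off;
        destination.tl = this.tl;
        return destination;
    }

    // Return a lexer positioned one token ahead, reusing the cached `p`.
    peek() {
        this.p = this.copy(this.p || undefined);
        return this.next(this.p);
    }

    // Adopt the position of `marker` (default: the cached peek lexer).
    sync(marker = this.p) {
        marker.copy(this);
        return this;
    }

    // Throw unless the current token equals `text`; otherwise advance.
    assert(text) {
        if (this.text !== text)
            throw new Error(`Expected "${text}" but found "${this.text}" at offset ${this.off}`);
        return this.next();
    }
}

const toy = new ToyLexer("let x = 1");
console.log(toy.peek().text);   // x
console.log(toy.text);          // let (peeking does not advance the lexer)
toy.assert("let").assert("x");  // chaining works because assert returns the lexer
console.log(toy.text);          // =
toy.peek();
toy.sync();                     // jump to the peeked position
console.log(toy.text);          // 1
```

Keeping the peek lexer cached in p avoids allocating a new object on every lookahead, which is presumably why the real peek() accepts a reusable peek_marker.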
Alias Methods
a ( text ) Proxy for assert(text).
aC ( char ) Proxy for assertCharacter(char).
r ( ) Proxy for reset().
s ( [ start ] ) Proxy for slice(start).
Types
There are 10 types of tokens that the Wind Lexer will create. Type identifiers can be accessed through wind.types, Lexer.types, and the types property of Lexer instances. Each type is identified by a power-of-2 value, so several types can be tested at once with a bitwise mask:
(lexer.type & (lexer.types.identifier | lexer.types.symbol)) ? true : false;
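The check above works because each type occupies its own bit. A minimal sketch follows, using made-up values: the numbers Wind actually assigns are not stated here and should be read from lexer.types. The helper isAnyOf is a hypothetical name.

```javascript
// Hypothetical values for illustration only; read the real ones from lexer.types.
const types = {
    number: 1,
    identifier: 2,
    string: 4,
    symbol: 8,
};

// True if `type` matches any of the types combined into `mask`.
function isAnyOf(type, mask) {
    return (type & mask) !== 0;
}

const mask = types.identifier | types.symbol; // 2 | 8 = 10

console.log(isAnyOf(types.identifier, mask)); // true
console.log(isAnyOf(types.number, mask));     // false
```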
types.identifier or types.id Any set of characters beginning with _ | a-z | A-Z, and followed by 0-9 | a-z | A-Z | - | _ | # | $
types.number or types.num Any set of characters beginning with 0-9 | . and followed by 0-9 | .
types.string or types.str A set of characters beginning with either ' or " and ending with a matching ' or "
types.open_bracket or types.ob A single character from the set < | ( | { | [
types.close_bracket or types.cb A single character from the set > | ) | } | ]
types.operator or types.op A single character from the set * | + | < | = | > | \ | & | % | ! | | | ^ | :
types.new_line or types.nl A single newline (LF or NL) character. It may also be CRLF if the input string has Windows-style new lines.
types.white_space or types.ws An uninterrupted set of tab or space characters.
types.symbol or types.sym All other characters not defined by the above, with each symbol token comprising one character.
types.data_link or types.dl A data link ASCII character, followed by two more characters and another data link character.
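As a rough illustration, the identifier and number descriptions above can be transcribed into regular expressions. This is an approximation written from the prose, not the matching code Wind uses, and matchLeading is a hypothetical helper name.

```javascript
// Transcribed from the descriptions above; an approximation, not Wind's own matcher.
const IDENTIFIER = /^[_a-zA-Z][0-9a-zA-Z_#$-]*/;
const NUMBER = /^[0-9.][0-9.]*/;

// Returns the leading identifier or number in `text`, or null if neither matches.
function matchLeading(text) {
    const id = IDENTIFIER.exec(text);
    if (id) return { type: "identifier", text: id[0] };
    const num = NUMBER.exec(text);
    if (num) return { type: "number", text: num[0] };
    return null;
}

console.log(matchLeading("font-size: 12px").text); // font-size
console.log(matchLeading("3.14 rest").text);       // 3.14
console.log(matchLeading("+ 1"));                  // null
```

Note that the number rule as written accepts any run of digits and periods, so an input such as 1.2.3 would match as a single number under this sketch.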