ux-lexer
v0.10.0-alpha1
An extensible lexer written in JavaScript that does not require regular expressions
Extensible Lexer for JavaScript without requiring regular expressions.
This library provides a foundation for writing custom lexers in JavaScript that perform lexical analysis and produce tokens. The basic workflow is:
- Create rules
- Add the rules to a lexer in order of precedence.
- Parse the lexical syntax from a string.
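The three steps above can be sketched without the library. This is a minimal, self-contained illustration of the idea, not ux-lexer's API: `tokenize`, the rule objects, and their `match`/`read` functions are hypothetical names chosen for the example.

```javascript
// Hypothetical stand-in, not the ux-lexer API: each rule decides whether it
// matches at a position and how many characters it consumes.
function isLetter(c) {
  return (c >= "a" && c <= "z") || (c >= "A" && c <= "Z");
}

// 1. Create rules.
var wordRule = {
  name: "WORD",
  match: function (text, i) { return isLetter(text[i]); },
  read: function (text, i) {
    var end = i;
    while (end < text.length && isLetter(text[end])) end++;
    return end - i;
  }
};
var spaceRule = {
  name: "SPACE",
  match: function (text, i) { return text[i] === " "; },
  read: function () { return 1; }
};
var anyRule = {
  name: "ANY",
  match: function () { return true; },
  read: function () { return 1; }
};

// 2. Add the rules in order of precedence.
var rules = [wordRule, spaceRule, anyRule];

// 3. Parse a string into tokens; the first matching rule wins.
function tokenize(text) {
  var tokens = [], i = 0;
  while (i < text.length) {
    for (var r = 0; r < rules.length; r++) {
      if (rules[r].match(text, i)) {
        var length = rules[r].read(text, i);
        tokens.push({ name: rules[r].name, value: text.slice(i, i + length) });
        i += length;
        break;
      }
    }
  }
  return tokens;
}

console.log(tokenize("ab c!"));
```

Because the catch-all rule comes last, every character is consumed by some rule and the loop always advances.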
Key Requirements
- A lexer that does not require regular expressions.
- Runs in multiple JavaScript environments: browsers, web workers, Node.js, Windows RT.
- Keep the library at a bare minimum.
Rationale
Some problems need a better solution than regular expressions for parsing tokens from strings, especially HTML. Regular expressions can be inefficient and hard to read, and they give you only so much control.
Builds
The builds for nodejs, browser, and amd will let you include files as needed. The builds will also include a util.js file that includes all the methods bundled in a single file. The build process creates 3 versions of the scripts, one for CommonJS, one for AMD, and one with closures for the browser.
The source files are included with the npm module inside the src folder, so that developers and script consumers can cherry-pick the functionality they want and include the files as needed for custom builds.
To build and test everything use:
$ grunt ci
There are two different versions of ux-lexer. lexer.js requires ux-util as a dependency, while lexer-all.js bundles only the methods it needs from ux-util and has zero external dependencies.
Browser Distribution
location: dist/browser
The browser distribution will use closures to wrap functionality and it uses the global variable of "ux.util". If you wish to use a method you can do the following:
<script src="/scripts/ux-util/equals.js"></script>
var equals = self.ux.util.equals;
if(equals(left, right))
{
// do something
}
AMD Distribution
location: dist/amd[/lib]
The AMD distribution has the main file in the root, and the rest of the files are placed in the lib folder. This is so that the same require statements work with Node.js and with something like RequireJS in a browser.
CommonJS Distribution
location: lib
The files are located inside of the lib folder.
API
Reader
Provides various methods to read, scan, and peek at various parts of an array-like value or object.
constructor(Array|String enumerable)
Takes an array-like source that can be iterated over and has a zero-based index property accessor.
example
var reader = new Reader("text to reader");
var example = function() {
var argReader = new Reader(arguments);
};
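As a rough illustration of what "array-like" means here, the sketch below reads a string, an array, and an `arguments` object through the same code path. `toValues` and `fromArguments` are hypothetical helpers for this example and are not part of ux-lexer.

```javascript
// Illustrative helper (hypothetical name): collects the items of any
// array-like value using only `length` and zero-based index access.
function toValues(enumerable) {
  var values = [];
  for (var i = 0; i < enumerable.length; i++) {
    values.push(enumerable[i]);
  }
  return values;
}

// `arguments` is array-like too, so it can be read the same way.
function fromArguments() {
  return toValues(arguments);
}

console.log(toValues("abc"));
console.log(toValues([1, 2, 3]));
console.log(fromArguments("one", "two"));
```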
current
Gets the current value or object for the current position.
limit
Gets the number of items in the enumerable. The farthest index the reader can move to is limit - 1.
position
Gets the position/index in the enumerable that the reader is currently pointing to.
data
Gets the enumerable data that the reader was given.
emptyValue
Gets or sets the value that the reader treats as the end of the string or file. This defaults to null.
dispose()
Remove any references that the reader is holding on to. By default, the reader will dispose of the reference to the enumerable and methods that are created in the constructor.
example
var using = require("ux-util/lib/using");
using(new Reader("some text here!"), function(r){
}); // dispose will be called.
var reader = new Reader("some other text");
console.log(reader.peek(0));
reader.dispose();
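The dispose-after-use pattern in the example above can be sketched in plain JavaScript. This is an assumption about how a `using` helper typically behaves (run the callback, then always dispose), not the ux-util implementation; `createResource` is a hypothetical stand-in for a reader.

```javascript
// Hypothetical stand-in for a `using` helper: run the callback, then
// always call dispose(), even when the callback throws.
function using(resource, callback) {
  try {
    return callback(resource);
  } finally {
    resource.dispose();
  }
}

// Hypothetical disposable resource that drops its data reference.
function createResource(data) {
  return {
    data: data,
    disposed: false,
    dispose: function () {
      this.data = null;
      this.disposed = true;
    }
  };
}

var resource = createResource("some text here!");
var length = using(resource, function (r) {
  return r.data.length;
});
console.log(length, resource.disposed);
```

The `finally` block is what guarantees the reference is released on both the normal and the error path.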
next()
Returns the next value in the enumerable. If the environment supports StopIteration, next() will throw it when done. Otherwise it will throw an Error with the message "StopIteration".
example
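var example = function(){
  for(var value of new Reader(arguments)) {
    console.log(value);
  }
};
example("one", "two", "three");
var reader = new Reader("function(arg1, arg2) {}");
try {
  while(true) {
    console.log(reader.next());
  }
} catch(e) {
  // next() signals the end by throwing StopIteration, or an Error with
  // the message "StopIteration" where StopIteration is not supported.
  var isStop = e.message === "StopIteration" ||
    (typeof StopIteration !== "undefined" && e instanceof StopIteration);
  if(!isStop) {
    // log error or rethrow
    throw e;
  }
}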
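The end-of-sequence signalling described for next() can be sketched as follows. `createNext` and `isStopIteration` are hypothetical names for this illustration, and the code is not the library's implementation. It assumes that engines without a `StopIteration` global (which includes current Node.js and browsers) take the `Error` fallback.

```javascript
// Hypothetical stand-in for a reader's next(); not the ux-lexer code.
function createNext(enumerable) {
  var position = -1;
  return function next() {
    if (position + 1 >= enumerable.length) {
      // Legacy engines exposed a StopIteration global; fall back to an Error.
      if (typeof StopIteration !== "undefined") {
        throw StopIteration;
      }
      throw new Error("StopIteration");
    }
    position += 1;
    return enumerable[position];
  };
}

// True when the caught value marks the end of iteration.
function isStopIteration(e) {
  if (typeof StopIteration !== "undefined" && e === StopIteration) {
    return true;
  }
  return e instanceof Error && e.message === "StopIteration";
}

var next = createNext("abc");
var seen = [];
try {
  while (true) {
    seen.push(next());
  }
} catch (e) {
  if (!isStopIteration(e)) {
    throw e; // a real error, not the end-of-sequence signal
  }
}
console.log(seen);
```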
nextValue()
Returns the next value, or the emptyValue if the reader has reached the end of the enumerable.
example
var reader = new Reader("function(arg1, arg2) {}");
var current = null;
while((current = reader.nextValue()) !== reader.emptyValue) {
console.log(current);
}
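Comparing against emptyValue, as the loop above does, matters when the enumerable can contain falsy items such as 0 or an empty string. The sketch below uses a hypothetical `createReader` stand-in (not the library's Reader) to show the difference between the two loop conditions.

```javascript
// Hypothetical stand-in for nextValue(): returns emptyValue at the end.
function createReader(enumerable, emptyValue) {
  var position = -1;
  return {
    emptyValue: emptyValue,
    nextValue: function () {
      if (position + 1 >= enumerable.length) return emptyValue;
      position += 1;
      return enumerable[position];
    }
  };
}

var data = [3, 0, 7];
var v;

// Comparing against emptyValue reads every item, including the falsy 0.
var strict = createReader(data, null), all = [];
while ((v = strict.nextValue()) !== strict.emptyValue) all.push(v);

// A bare truthiness test stops early at the first falsy item.
var loose = createReader(data, null), some = [];
while ((v = loose.nextValue())) some.push(v);

console.log(all, some);
```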
peek(Number position)
Returns the value or object at the specified position if the position is less than the limit, otherwise it returns the emptyValue.
example
var reader = new Reader("ABCDEF");
console.log(reader.peek(3)); // D
console.log(reader.peek(9)); // null
peekAtNext()
Returns the value, object, or emptyValue for the next position in the reader.
example
var reader = new Reader("ABCDED");
reader.next();
var c = reader.peekAtNext();
console.log(c); // B
reset()
Returns the reader to the start position so that the enumerable can be read again.
example
var reader = new Reader("ABCDEF");
var current = null;
while((current = reader.nextValue())) {
console.log(current);
}
reader.reset();
while((current = reader.nextValue())) {
console.log("v2: " + current);
}
scan(Function|Object predicate, [Number position], [Number limit])
Searches a section of the enumerable for values or objects that match the predicate and returns the position of the match.
example
var reader = new Reader("ABCDEFEDCBA");
var position = reader.scan("D");
console.log(position); // 3
position = reader.scan(function scan(c) {
var count = scan.count || 0;
if(c === 'D')
{
if(count === 1)
return true;
scan.count = 1;
}
return false;
});
position = reader.scan("D", 5);
console.log(position); // 7
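A scan of this kind can be sketched as a plain function. This is an illustration under stated assumptions, not the library's code: the standalone `scan` function is hypothetical, `limit` is assumed to be an exclusive end index, and -1 is assumed as the "not found" result, since the text above does not specify either.

```javascript
// Hypothetical standalone scan; assumes `limit` is an exclusive end index
// and that -1 means "no match" (neither is confirmed by the documentation).
function scan(enumerable, predicate, position, limit) {
  var start = position === undefined ? 0 : position;
  var end = limit === undefined
    ? enumerable.length
    : Math.min(limit, enumerable.length);
  // A non-function predicate is compared by strict equality.
  var test = typeof predicate === "function"
    ? predicate
    : function (value) { return value === predicate; };
  for (var i = start; i < end; i++) {
    if (test(enumerable[i])) return i;
  }
  return -1;
}

var text = "ABCDEFEDCBA";
console.log(scan(text, "D"));    // 3
console.log(scan(text, "D", 5)); // 7
console.log(scan(text, function (c) { return c > "E"; })); // 5, the "F"
```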
slice(Number offset, Number limit)
Returns an array of values or objects that starts at the offset position up to the specified limit.
example
var reader = new Reader("ABCDEFEDCBA");
var slice = reader.slice(1,2);
console.log(slice); // ["B","C"];
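Judging by the example above, the second argument behaves as a count of items rather than an end index, which differs from `Array.prototype.slice`. The sketch below shows that difference with a hypothetical `sliceByCount` helper. It is an interpretation of the example, not the library's implementation.

```javascript
// Hypothetical helper: `limit` is treated as a number of items to take,
// matching the reader.slice(1, 2) example that yields ["B", "C"].
function sliceByCount(enumerable, offset, limit) {
  return Array.prototype.slice.call(enumerable, offset, offset + limit);
}

var text = "ABCDEFEDCBA";

// Count-based: two items starting at index 1.
console.log(sliceByCount(text, 1, 2)); // [ 'B', 'C' ]

// Array.prototype.slice treats the second argument as an exclusive end index.
console.log(Array.prototype.slice.call(text, 1, 2)); // [ 'B' ]
```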
to(Number position)
Moves the reader to the specified position.
example
var reader = new Reader("ABCDEFEDCBA");
reader.to(3);
var next = reader.nextValue();
console.log(next); // "E"
LexerRule
Rules determine how characters are consumed and transformed into a token.
LexerRule Example
var IdentifierRule = LexerRule.extend({
  tokenName: "IDENTIFIER",
  value: null,
  position: null,
  match: function(character, reader) {
    // An identifier must start with a letter, "_", or "$".
    if(!Lexer.isLetter(character) && character !== "_" && character !== "$")
      return false;
    this.value = this.position = null;
    var start = reader.position,
        i = start + 1,
        c = null;
    // Look ahead until a space or the end of the input.
    while((c = reader.peek(i)) !== reader.emptyValue && c !== ' ')
    {
      if(!Lexer.isLetterOrDigit(c) && c !== '_')
        return false;
      i++;
    }
    var count = (i - start);
    this.value = reader.slice(start, count).join('');
    // Point at the last matched character so the next read starts after it.
    this.position = i - 1;
    return true;
  },
  createToken: function(reader) {
    var token = {name: this.tokenName, value: this.value, ruleIndex: this.ruleIndex };
    reader.to(this.position);
    this.value = this.position = null;
    return token;
  }
});
var SpaceRule = LexerRule.extend({
  match: function(character, reader) {
    return character === ' ';
  }
});
var AnyRule = LexerRule.extend({
  match: function(character, reader) {
    return character !== ' ';
  },
  next: function(reader) {
    var c = reader.peekAtNext();
    if(c !== ' ' && c !== null)
      return true;
    return false;
  }
});
var enumerable = "$test word hyphen-word";
var reader = new Reader(enumerable),
    c = null,
    rules = [new IdentifierRule(), new SpaceRule(), new AnyRule()],
    tokens = [],
    i = 0,
    l = rules.length;
while((c = reader.nextValue()) !== reader.emptyValue)
{
  // Try the rules in order of precedence; the first match creates the token.
  for(i = 0; i < l; i++)
  {
    var rule = rules[i];
    if(rule.match(c, reader))
    {
      tokens.push(rule.createToken(reader));
      break;
    }
  }
}
console.log(tokens);
symbol
A static property on the constructor that has the same value as the tokenName property. This gives you one statically available constant for the token name.
example
var NewRule = LexerRule.extend({
tokenName: "NEW",
// other stuff
});
// elsewhere
// if token.name === "NEW"
if(tokens[2].name === NewRule.symbol) {
// do something.
}
constructor()
Creates an instance of LexerRule.
tokenName
Gets or sets the name for the token when the rule generates the token object.
createToken(Reader reader)
Returns the token generated by this rule when a match is found.
match(String character, Reader reader)
Returns true when the rule matches on the character(s), otherwise it returns false.
next(Reader reader)
Returns true when the rule matches on the next character(s) in the sequence, otherwise it returns false. This method is used by createToken to generate the value for the token and move the reader forward as needed.
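The interplay between match, next, and createToken can be sketched with plain objects. `createReader` and `anyRule` below are hypothetical stand-ins written for this illustration, and the default createToken behavior shown (append characters while next returns true) is an assumption based on the description above, not the library's source.

```javascript
// Hypothetical minimal reader: tracks a position and the current value.
function createReader(text) {
  var reader = {
    position: -1,
    current: null,
    peekAtNext: function () {
      var i = reader.position + 1;
      return i < text.length ? text[i] : null;
    },
    nextValue: function () {
      reader.current = reader.peekAtNext();
      if (reader.current !== null) reader.position++;
      return reader.current;
    }
  };
  return reader;
}

// Hypothetical rule: matches any non-space run of characters.
var anyRule = {
  tokenName: "ANY",
  match: function (character) { return character !== " "; },
  // Should the character after the current one join the token?
  next: function (reader) {
    var c = reader.peekAtNext();
    return c !== " " && c !== null;
  },
  // Assumed behavior: consume characters while next() keeps returning true.
  createToken: function (reader) {
    var value = reader.current;
    while (this.next(reader)) value += reader.nextValue();
    return { name: this.tokenName, value: value };
  }
};

var reader = createReader("hyphen-word next");
var first = reader.nextValue();
var token = anyRule.match(first, reader) ? anyRule.createToken(reader) : null;
console.log(token); // { name: 'ANY', value: 'hyphen-word' }
```

After the token is created the reader sits on the last consumed character, so the following read returns the space that ended the token.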
extends(Object prototype)
Creates a sub class of LexerRule. This is the preferred way of sub classing the LexerRule and to create rules for the lexer.
Lexer
The base class to inherit from in order to create a customized lexer.
Lexer example
var SimpleLexer = Lexer.extend({
emptyValue: null,
addRules: function() {
this.addRule(new IdentifierRule());
this.addRule(new SpaceRule());
this.addRule(new AnyRule());
}
});
var lexer = new SimpleLexer("var x = new Test();"),
tokens = [],
token = null;
while((token = lexer.nextValue()) !== lexer.emptyValue)
{
if(token.name !== SpaceRule.symbol)
tokens.push(token);
}
console.log(tokens);
constructor(Object enumerable)
The Lexer constructor. It takes an array-like object as the main parameter. This could be a string, array, or arguments.
emptyValue
Gets or sets the empty value, or end-of-sequence marker. This tells the lexer that it has reached the end of known tokens.
initialized
Gets a value that indicates whether or not the Lexer has been initialized.
reader
Gets a reference to the reader for the Lexer.
rules
Gets the array of rules for the lexer. The order of the rules is important because the rules are processed in order. The rule that matches first determines how the token is created.
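To see why rule order matters, consider two rules that both accept the same input. The sketch below is a standalone illustration with hypothetical rule objects and a hypothetical `firstMatch` helper. It does not use the Lexer class.

```javascript
// Returns true when every character of the word is an ASCII letter.
function isLetters(word) {
  if (word.length === 0) return false;
  for (var i = 0; i < word.length; i++) {
    var c = word.charCodeAt(i);
    var letter = (c >= 65 && c <= 90) || (c >= 97 && c <= 122);
    if (!letter) return false;
  }
  return true;
}

// Hypothetical rules: both accept the word "var".
var keywordRule = {
  tokenName: "KEYWORD",
  match: function (word) { return word === "var" || word === "new"; }
};
var identifierRule = {
  tokenName: "IDENTIFIER",
  match: function (word) { return isLetters(word); }
};

// The first rule that matches decides the token name.
function firstMatch(rules, word) {
  for (var i = 0; i < rules.length; i++) {
    if (rules[i].match(word)) return rules[i].tokenName;
  }
  return null;
}

console.log(firstMatch([keywordRule, identifierRule], "var")); // KEYWORD
console.log(firstMatch([identifierRule, keywordRule], "var")); // IDENTIFIER
```

Placing the more specific rule first is what lets it take precedence over the general one.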
slots
Gets the array of positions where tokens were found within the enumerable value that was passed to the lexer for analysis.
addRule(LexerRule rule)
Adds a lexer rule to the lexer in order to find matches and create tokens.
addRules()
An abstract method that subclasses are meant to override in order to add rules to the lexer.
dispose()
Disposes of resources that the lexer is holding onto in order to free up memory.
init()
Initializes the lexer. This is called by the constructor.
iterator()
Returns an iterator for the lexer. This method is intended for iterators in ECMAScript 6; however, it can be used in previous versions of JavaScript.
var lexer = new SimpleLexer("var x ='one';"),
tokens = [];
for(var token of lexer) {
if(token.name !== SpaceRule.symbol)
tokens.push(token);
}
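For reference, the sketch below shows the iteration protocol that `for...of` relies on in the finalized ECMAScript 2015 standard: an object with a `Symbol.iterator` method whose iterator returns `{ value, done }` results. `createTokenSource` is a hypothetical stand-in. Whether this alpha version of the lexer implements the finalized protocol or an earlier draft is not stated here, so treat this as background rather than a description of the library.

```javascript
// Hypothetical token source implementing the standard iteration protocol.
function createTokenSource(tokens) {
  var source = {};
  source[Symbol.iterator] = function () {
    var i = 0;
    return {
      next: function () {
        return i < tokens.length
          ? { value: tokens[i++], done: false }
          : { value: undefined, done: true };
      }
    };
  };
  return source;
}

var source = createTokenSource([
  { name: "ANY", value: "var" },
  { name: "SPACE", value: " " },
  { name: "ANY", value: "x" }
]);

// Skip whitespace tokens, as in the example above.
var kept = [];
for (var token of source) {
  if (token.name !== "SPACE") kept.push(token.value);
}
console.log(kept); // [ 'var', 'x' ]
```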
next()
Returns the next value in the iteration or throws a StopIteration exception. If the environment does not support StopIteration, then an Error with the message "StopIteration" is thrown.
nextValue()
Returns the next value in the iteration or returns the emptyValue when the sequence / loop has ended.
extends(Object prototype)
A static method that sub classes lexer.
isDigit(String character)
A static method that returns true if the character is a digit (0-9), otherwise it returns false.
isLetter(String character)
A static method that returns true if the character is a letter (a-zA-Z), otherwise it returns false.
isLetterOrDigit(String character)
A static method that returns true if the character is a letter or digit (0-9a-zA-Z), otherwise it returns false.
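Character tests like these can be written without regular expressions by comparing character codes. The functions below are a minimal sketch of that approach for ASCII input. They are standalone functions written for this illustration and may differ from the library's own implementation.

```javascript
// Returns the code of a single-character string, or -1 for anything else.
function codeOf(character) {
  if (typeof character !== "string" || character.length !== 1) return -1;
  return character.charCodeAt(0);
}

function isDigit(character) {
  var code = codeOf(character);
  return code >= 48 && code <= 57; // "0" to "9"
}

function isLetter(character) {
  var code = codeOf(character);
  return (code >= 65 && code <= 90) ||  // "A" to "Z"
         (code >= 97 && code <= 122);   // "a" to "z"
}

function isLetterOrDigit(character) {
  return isLetter(character) || isDigit(character);
}

console.log(isDigit("7"), isLetter("q"), isLetterOrDigit("_"));
```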
License
For extends, isPlainObject, isWindow: Copyright 2014 jQuery Foundation and other contributors http://jquery.com/
The MIT License (MIT)
Copyright (c) 2013-2014 Michael Herndon http://dev.michaelherndon.com
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.