ux-lexer

v0.10.0-alpha1

Published

2 years ago

An extensible lexer written in JavaScript that does not require regular expressions

michaelherndon

ux-lexer

version: 0.10.0-alpha1

An extensible lexer for JavaScript that does not require regular expressions.

This library provides a foundation for creating custom lexers in JavaScript. A lexer performs lexical analysis on a string and produces tokens. The basic workflow is:

Create rules.
Add the rules to a lexer in order of precedence.
Parse the lexical syntax from a string.
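
The three steps above can be sketched without the library at all. The following is a minimal, self-contained illustration of the idea, not the ux-lexer API: the names spaceRule, wordRule and tokenize are hypothetical, and each rule simply reports how many characters it consumes.

```javascript
// Hypothetical stand-ins; none of these names come from ux-lexer.

// Step 1: create rules. Each rule inspects the input at index `i` and
// returns the number of characters it consumes (0 means "no match").
var spaceRule = {
    name: "SPACE",
    match: function (text, i) {
        return text.charAt(i) === " " ? 1 : 0;
    }
};

var wordRule = {
    name: "WORD",
    match: function (text, i) {
        var n = 0;
        while (i + n < text.length && text.charAt(i + n) !== " ") {
            n++;
        }
        return n;
    }
};

// Step 2: add the rules in order of precedence.
var rules = [spaceRule, wordRule];

// Step 3: walk the string and let the first matching rule emit a token.
function tokenize(text) {
    var tokens = [], i = 0;
    while (i < text.length) {
        var matched = false;
        for (var r = 0; r < rules.length; r++) {
            var length = rules[r].match(text, i);
            if (length > 0) {
                tokens.push({ name: rules[r].name, value: text.substr(i, length) });
                i += length;
                matched = true;
                break;
            }
        }
        if (!matched) {
            throw new Error("No rule matched at position " + i);
        }
    }
    return tokens;
}

console.log(tokenize("var x"));
```

No regular expression is involved: every decision is a plain character comparison, which is the property the library is built around.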

Key Requirements

A lexer that does not require regular expressions.
Runs in multiple JavaScript environments: browsers, web workers, Node.js, Windows RT.
Keep the library at a bare minimum.

Rationale

Some problems need a better solution than regular expressions for parsing tokens from strings, especially HTML. Regular expressions can be inefficient and hard to read, and they give you only limited control.

Builds

The builds for nodejs, browser, and amd will let you include files as needed. The builds will also include a util.js file that includes all the methods bundled in a single file. The build process creates 3 versions of the scripts, one for CommonJS, one for AMD, and one with closures for the browser.

The source files are included with the npm module inside the src folder. This lets developers and script consumers cherry-pick the functionality they want and include the files as needed for custom builds.

To build and test everything use:

$ grunt ci

There are two different versions of ux-lexer. lexer.js requires ux-util as a dependency, while lexer-all.js bundles only the methods it needs from ux-util and so has zero external dependencies.

Browser Distribution

location: dist/browser

The browser distribution uses closures to wrap functionality and exposes it through the global variable "ux.util". If you wish to use a method, you can do the following:

    <script src="/scripts/ux-util/equals.js"></script>
    <script>
        var equals = self.ux.util.equals;
        if(equals(left, right))
        {
            // do something
        }
    </script>

AMD Distribution

location: dist/amd[/lib]

The AMD distribution has the main file in the root, and the rest of the files are in the lib folder. This is so that the same require statements work both with Node and with something like RequireJS in a browser.

CommonJS Distribution

location: lib

The files are located inside of the lib folder.

API

Reader

Provides methods to read, scan, and peek at parts of an array-like value or object.

constructor(Array|String enumerable)

Takes an array-like source that can be iterated over and has a zero-based index property accessor.

example

    var reader = new Reader("text to reader");

    var example = function() {
        var argReader = new Reader(arguments);
    };

current

Gets the value or object at the current position.

limit

Gets the number of items in the enumerable. The farthest index that the reader can move to is limit - 1.

position

Gets the position/index in the enumerable that the reader is currently pointing to.

data

Gets the enumerable data that the reader was given.

emptyValue

Gets or sets the value that the reader treats as the end of the string or file. This defaults to null.

dispose()

Remove any references that the reader is holding on to. By default, the reader will dispose of the reference to the enumerable and methods that are created in the constructor.

example

    var using = require("ux-util/lib/using");

    using(new Reader("some text here!"), function(r){

    }); // dispose will be called.

    var reader = new Reader("some other text");

    console.log(reader.peek(0));

    reader.dispose();
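
For readers unfamiliar with the pattern, a "using" helper of this kind is usually a thin wrapper around try/finally. The sketch below is an assumption about how such a helper could work; the real ux-util using function may behave differently, and the resource object here is hypothetical.

```javascript
// Hypothetical helper; the real ux-util "using" may behave differently.
function using(resource, callback) {
    try {
        // Hand the resource to the caller's function.
        return callback(resource);
    } finally {
        // Always release the resource, even if the callback throws.
        resource.dispose();
    }
}

// A stand-in resource that records whether it was disposed.
var resource = {
    disposed: false,
    dispose: function () { this.disposed = true; }
};

var result = using(resource, function (r) { return "done"; });
console.log(result, resource.disposed); // done true
```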

next()

Returns the next value in the enumerable. If the environment supports StopIteration, next() throws it when done. Otherwise it throws an Error with the message "StopIteration".

example

    var example = function(){
        for(var value of new Reader(arguments)) {
            console.log(value);
        }
    };

    example("one", "two", "three");

    var reader = new Reader("function(arg1, arg2) {}");
    try {
        while(true) {
            console.log(reader.next());
        }
    } catch(e) {
        // StopIteration only signals the end of the sequence.
        if(e.message !== "StopIteration" &&
            !(typeof StopIteration !== "undefined" && e === StopIteration)) {
            // log error or rethrow
        }
    }

nextValue()

Returns the next value, or the emptyValue if the reader has reached the end of the enumerable.

example

    var reader = new Reader("function(arg1, arg2) {}");
    var current  = null;
    while((current = reader.nextValue()) !== reader.emptyValue) {
        console.log(current);
    }

peek(Number position)

Returns the value or object at the specified position if the position is less than the limit, otherwise it returns the emptyValue.

example

    var reader = new Reader("ABCDEF");
    console.log(reader.peek(3)); // D
    console.log(reader.peek(9)); // null
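
To make the relationship between position, limit, peek and emptyValue concrete, here is a small stand-in reader. TinyReader is hypothetical and is not the ux-lexer Reader; in particular, the starting position of -1 is an assumption made for this sketch.

```javascript
// TinyReader is a hypothetical stand-in, not the ux-lexer Reader.
function TinyReader(data) {
    this.data = data;
    this.limit = data.length;   // number of items
    this.position = -1;         // nothing read yet (an assumption)
    this.emptyValue = null;     // end-of-input marker
}

// Look at any index without moving the reader.
TinyReader.prototype.peek = function (position) {
    return position >= 0 && position < this.limit
        ? this.data[position]
        : this.emptyValue;
};

// Move forward one item and return it, or emptyValue at the end.
TinyReader.prototype.nextValue = function () {
    if (this.position + 1 >= this.limit) {
        return this.emptyValue;
    }
    this.position += 1;
    return this.data[this.position];
};

var tiny = new TinyReader("ABCDEF");
console.log(tiny.peek(3));     // D
console.log(tiny.peek(9));     // null
console.log(tiny.nextValue()); // A
console.log(tiny.position);    // 0
```

Peeking never changes position, which is what lets a rule look ahead and still decline to match.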

peekAtNext()

Returns the value, object, or emptyValue for the next position in the reader.

example

    var reader = new Reader("ABCDED");
    reader.next();
    var c = reader.peekAtNext();
    console.log(c); // B

reset()

Returns the reader to the start position so that the enumerable can be read again.

example

    var reader = new Reader("ABCDEF");
    var current = null;
    while((current = reader.nextValue())) {
        console.log(current);
    }

    reader.reset();
    while((current = reader.nextValue())) {
        console.log("v2: " + current);
    }

scan(Function|Object predicate, [Number position], [Number limit])

Searches the enumerable, or a section of it, for a value or object that matches the predicate and returns the position of the match.

example

    var reader = new Reader("ABCDEFEDCBA");

    var position = reader.scan("D");
    console.log(position); // 3

    // A predicate function can keep state; this one matches the second "D".
    position = reader.scan(function scan(c) {
        var count = scan.count || 0;
        if(c === 'D')
        {
            if(count === 1)
                return true;
            scan.count = 1;
        }
        return false;
    });

    position = reader.scan("D", 5);
    console.log(position); // 7
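
The scanning behavior described above can be approximated with a short helper. scanFrom is a hypothetical function written for illustration only; the return value of -1 when nothing matches is an assumption, since this document does not say what scan returns in that case.

```javascript
// scanFrom is a hypothetical helper, not part of ux-lexer.
// Returns the index of the first match in [position, limit), or -1.
function scanFrom(data, predicate, position, limit) {
    var start = position === undefined ? 0 : position;
    var end = limit === undefined ? data.length : Math.min(limit, data.length);
    // Treat a non-function predicate as a value to compare with ===.
    var test = typeof predicate === "function"
        ? predicate
        : function (item) { return item === predicate; };

    for (var i = start; i < end; i++) {
        if (test(data[i], i)) {
            return i;
        }
    }
    return -1; // assumed "not found" result
}

console.log(scanFrom("ABCDEFEDCBA", "D"));    // 3
console.log(scanFrom("ABCDEFEDCBA", "D", 5)); // 7
console.log(scanFrom("ABCDEFEDCBA", function (c) { return c > "E"; })); // 5
```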

slice(Number offset, Number limit)

Returns an array of values or objects that starts at the offset position up to the specified limit.

example

    var reader = new Reader("ABCDEFEDCBA");

    var slice = reader.slice(1,2);
    console.log(slice); // ["B","C"]

to(Number position)

Moves the reader to specified position.

example

    var reader = new Reader("ABCDEFEDCBA");
    reader.to(3);
    var next = reader.nextValue();
    console.log(next); // "E"

LexerRule

Rules determine how characters are consumed and transformed into a token.

LexerRule Example

    var IdentifierRule = LexerRule.extend({
        tokenName: "IDENTIFIER",
        value: null,
        position: null,
        match: function(character, reader) {
            // An identifier must start with a letter, "_" or "$".
            if(!Lexer.isLetter(character) && character !== "_" && character !== "$")
                return false;

            this.value = this.position = null;
            var start = reader.position,
                i = start + 1,
                c = null;

            // Look ahead until a space or the end of the input.
            while((c = reader.peek(i)) !== reader.emptyValue && c !== ' ')
            {
                if(!Lexer.isLetterOrDigit(c) && c !== '_')
                    return false;
                i++;
            }

            // i now points just past the last character of the identifier.
            var count = (i - start);
            this.value = reader.slice(start, count).join('');
            this.position = i - 1;

            return true;
        },
        createToken: function(reader) {
            var token = {name: this.tokenName, value: this.value, ruleIndex: this.ruleIndex };
            // Move the reader to the last character that was consumed.
            reader.to(this.position);

            this.value = this.position = null;

            return token;
        }
    });
    var SpaceRule = LexerRule.extend({
        match:function(character, reader) {
            return character === ' ';
        }
    });


    var AnyRule = LexerRule.extend({
        match: function(character, reader) {
            return character !== ' ';
        },
        next: function(reader) {
            var c = reader.peekAtNext();
            if(c !== ' ' && c !== null)
                return true;
            return false;
        }
    });

    var enumerable = "$test word hyphen-word";

    var reader = new Reader(enumerable),
        c = null,
        rules = [new IdentifierRule(), new SpaceRule(), new AnyRule()],
        tokens = [],
        i = 0,
        l = rules.length;

    while((c = reader.nextValue()) !== reader.emptyValue)
    {
        // Try the rules in order of precedence for every character.
        for(i = 0; i < l; i++)
        {
            var rule = rules[i];
            if(rule.match(c, reader))
            {
                tokens.push(rule.createToken(reader));
                break;
            }
        }
    }

    console.log(tokens);

symbol

A static property on the constructor that has the same value as the tokenName property. This gives you one statically available constant for the token name.

example

    var NewRule = LexerRule.extend({
        tokenName: "NEW",
        // other stuff
    });

    // elsewhere

    // if token.name === "NEW"
    if(tokens[2].name === NewRule.symbol) {
        // do something.  
    }
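
One way a static symbol could be derived from tokenName is for the subclassing helper to copy it onto the new constructor. The extend function below is a hypothetical sketch of that idea and is not the ux-lexer implementation.

```javascript
// Hypothetical extend helper; not the ux-lexer implementation.
function extend(Base, prototype) {
    function Sub() { Base.apply(this, arguments); }
    Sub.prototype = Object.create(Base.prototype);
    Sub.prototype.constructor = Sub;

    // Copy the supplied members onto the new prototype.
    for (var key in prototype) {
        if (Object.prototype.hasOwnProperty.call(prototype, key)) {
            Sub.prototype[key] = prototype[key];
        }
    }

    // Mirror tokenName on the constructor as a static constant.
    Sub.symbol = prototype.tokenName;
    return Sub;
}

function Rule() {}
var SketchNewRule = extend(Rule, { tokenName: "NEW" });

console.log(SketchNewRule.symbol);          // NEW
console.log(new SketchNewRule().tokenName); // NEW
```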

constructor()

Creates an instance of LexerRule.

tokenName

Gets or sets the name for the token when the rule generates the token object.

createToken(Reader reader)

Returns the token generated by this rule when a match is found.

match(String character, Reader reader)

Returns true when the rule matches on the character(s), otherwise it returns false.

next(Reader reader)

Returns true when the rule matches on the next character(s) in the sequence, otherwise it returns false. This method is used by createToken to generate the value for the token and move the reader forward as needed.
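
The interplay between match, next and createToken might look like the following. This is a sketch under stated assumptions: makeReader and anyRule are hypothetical stand-ins, and the loop inside createToken is only a guess at how a default implementation could use next to consume characters.

```javascript
// Hypothetical stand-in for a reader over a string.
function makeReader(text) {
    return {
        position: -1,
        current: null,
        nextValue: function () {
            if (this.position + 1 >= text.length) { return null; }
            this.position += 1;
            this.current = text.charAt(this.position);
            return this.current;
        },
        peekAtNext: function () {
            return this.position + 1 < text.length
                ? text.charAt(this.position + 1)
                : null;
        }
    };
}

// Hypothetical rule that groups any run of non-space characters.
var anyRule = {
    tokenName: "ANY",
    match: function (character) { return character !== " "; },
    next: function (reader) {
        var c = reader.peekAtNext();
        return c !== " " && c !== null;
    },
    // Assumed behavior: keep consuming while next() reports a match.
    createToken: function (reader) {
        var value = reader.current;
        while (this.next(reader)) {
            value += reader.nextValue();
        }
        return { name: this.tokenName, value: value };
    }
};

var sketchReader = makeReader("hyphen-word rest");
var first = sketchReader.nextValue();
var sketchToken = anyRule.match(first) ? anyRule.createToken(sketchReader) : null;
console.log(sketchToken); // { name: 'ANY', value: 'hyphen-word' }
```

After createToken returns, the stand-in reader sits on the last consumed character, so the following nextValue call yields the space.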

extends(Object prototype)

Creates a subclass of LexerRule. This is the preferred way to subclass LexerRule and to create rules for the lexer.

Lexer

The base class to inherit from in order to create a customized lexer.

Lexer example

    var SimpleLexer = Lexer.extend({
        emptyValue: null,
        addRules: function() {
            this.addRule(new IdentifierRule());
            this.addRule(new SpaceRule());
            this.addRule(new AnyRule());
        }
    });

    var lexer = new SimpleLexer("var x = new Test();"),
        tokens = [],
        token = null;

    while((token = lexer.nextValue()) !== lexer.emptyValue)
    {
        if(token.name !== SpaceRule.symbol)
            tokens.push(token);
    }

    console.log(tokens);

constructor(Object enumerable)

The Lexer constructor. It takes an array-like object as the main parameter. This could be a string, an array, or arguments.

emptyValue

Gets or sets the empty value, which is the end-of-sequence marker. This tells the lexer that it has reached the end of known tokens.

initialized

Gets a value that indicates whether or not the Lexer has been initialized.

reader

Gets a reference to the reader for the Lexer.

rules

Gets the array of rules for the lexer. The order of the rules is important because the rules are processed in order. The rule that matches first determines how the token is created.
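
A small illustration of why rule order matters. The two rules and the classify function below are hypothetical and operate on whole words for brevity; they only demonstrate the first-match-wins principle and are not the ux-lexer API.

```javascript
// Hypothetical rules that classify a whole word; not the ux-lexer API.
var keywordRule = {
    tokenName: "NEW",
    match: function (word) { return word === "new"; }
};
var identifierRule = {
    tokenName: "IDENTIFIER",
    match: function (word) { return word.length > 0; }
};

// Return the token name of the first rule that matches.
function classify(rules, word) {
    for (var i = 0; i < rules.length; i++) {
        if (rules[i].match(word)) {
            return rules[i].tokenName;
        }
    }
    return null;
}

console.log(classify([keywordRule, identifierRule], "new")); // NEW
console.log(classify([identifierRule, keywordRule], "new")); // IDENTIFIER
```

Placing the more specific rule first keeps the general rule from swallowing its input.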

slots

Gets the array of positions where tokens were found within the enumerable value that was passed to the lexer for analysis.

addRule(LexerRule rule)

Adds a lexer rule to the lexer in order to find matches and create tokens.

addRules()

An abstract method that subclasses are meant to override in order to add rules to the lexer.

dispose()

Disposes of resources that the lexer is holding onto in order to free up memory.

init()

Initializes the lexer. This is called by the constructor.

iterator()

Returns an iterator for the lexer. This method is intended for iterators in ECMAScript 6; however, it can also be used in previous versions of JavaScript.

    var lexer  = new SimpleLexer("var x ='one';"),
        tokens = [];

    for(var token of lexer) {

        if(token.name !== SpaceRule.symbol) 
            tokens.push(token);
    }

next()

Returns the next value in the iteration or throws a StopIteration exception. If the environment does not support StopIteration, then an Error with the message "StopIteration" is thrown.

nextValue()

Returns the next value in the iteration or returns the emptyValue when the sequence / loop has ended.
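
The difference between next() and nextValue() comes down to how the end of the sequence is reported: by throwing or by returning a sentinel. The sketch below shows one way to convert the first style into the second. makeIterator and the standalone nextValue function are hypothetical and only model the Error("StopIteration") fallback described above.

```javascript
// Hypothetical iterator whose next() throws Error("StopIteration") at the end.
function makeIterator(items) {
    var index = 0;
    return {
        next: function () {
            if (index >= items.length) {
                throw new Error("StopIteration");
            }
            return items[index++];
        }
    };
}

// nextValue-style wrapper: returns emptyValue instead of throwing.
function nextValue(iterator, emptyValue) {
    try {
        return iterator.next();
    } catch (e) {
        if (e instanceof Error && e.message === "StopIteration") {
            return emptyValue;
        }
        throw e; // unrelated errors still propagate
    }
}

var it = makeIterator(["a", "b"]);
var value, seen = [];
while ((value = nextValue(it, null)) !== null) {
    seen.push(value);
}
console.log(seen); // [ 'a', 'b' ]
```

The sentinel style keeps loops free of try/catch, at the cost of reserving one value that can never appear in the data.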

extends(Object prototype)

A static method that creates a subclass of Lexer.

isDigit(String character)

A static method that returns true if the character is a digit (0-9), otherwise it returns false.

isLetter(String character)

A static method that returns true if the character is a letter (a-zA-Z), otherwise it returns false.

isLetterOrDigit(String character)

A static method that returns true if the character is a letter or digit (0-9a-zA-Z), otherwise it returns false.
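
Character tests like these can be written without regular expressions by comparing character codes. The functions below are a minimal sketch assuming ASCII ranges only; they are hypothetical, and the library's own implementation may differ.

```javascript
// Hypothetical helpers; ux-lexer's own implementation may differ.
function isDigit(character) {
    var code = character.charCodeAt(0);
    return code >= 48 && code <= 57;        // "0".."9"
}

function isLetter(character) {
    var code = character.charCodeAt(0);
    return (code >= 65 && code <= 90) ||    // "A".."Z"
           (code >= 97 && code <= 122);     // "a".."z"
}

function isLetterOrDigit(character) {
    return isLetter(character) || isDigit(character);
}

console.log(isDigit("7"));         // true
console.log(isLetter("_"));        // false
console.log(isLetterOrDigit("z")); // true
```

An empty string yields NaN from charCodeAt(0), so all three functions return false for it.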

License

The MIT License (MIT)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
