@hgargg-0710/regex

v0.1.0

Published

9 months ago

A JavaScript-flavoured regular expression parser, generator and AST-construction API.

hgargg-0710

regex parse generate AST

`regex`

regex is a JavaScript library for parsing, generating and constructing ASTs of regular expressions, following the JavaScript flavour's syntax.

NOTE: the library depends on the parsers.js package for building its parsers.

Installation

npm install @hgargg-0710/regex

Documentation

The package has the following exports:

- parse (function)
- generate (function)
- parser (submodule)
- generator (submodule)
- tree (submodule)
- tokens (submodule)

`parse`

function parse(regex: string): Flags

Takes a string containing a regular expression and returns its AST.

`generate`

function generate(AST: Flags): string

Takes in the given AST node (not necessarily Flags; the full set of accepted node types is too long to express here) and returns a string representing it.

NOTE: partial nodes will give only partial results. For example, passing a PatternEnd will give "$".
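A minimal usage sketch follows. It assumes that parse and generate are named exports of the package entry point (the import form is an assumption, shown only in a comment) and that the input string carries its flags. This README does not state whether generate reproduces the input character for character, so the hypothetical roundTrip helper only reports the comparison. It receives the two functions as arguments, which lets it run here with stand-ins.

```javascript
// Assumed import form (not confirmed by this README):
//   import { parse, generate } from "@hgargg-0710/regex"

// Hypothetical helper: parses a regex string, regenerates it, and reports
// whether the regenerated string equals the original.
function roundTrip(parse, generate, source) {
  const ast = parse(source);
  const regenerated = generate(ast);
  return { ast, regenerated, same: regenerated === source };
}

// Stand-ins so the sketch runs without the package installed. Their node
// shape is made up; with the real package, pass its parse and generate.
const fakeParse = (source) => ({ type: "flags", value: source });
const fakeGenerate = (node) => node.value;

console.log(roundTrip(fakeParse, fakeGenerate, "/ab+c/gi").same); // true with the stand-ins
```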

`parser`

APIs of the individual parsing layers.

| export | description |
| ------------------ | --------------------------------------------------------- |
| ExpressionParser | Function. Parses an Expression, initially tokenizing it |
| boundry | Submodule. Handles parsing of boundaries |
| chars | Submodule. Handles tokenization |
| classes | Submodule. Handles parsing of character classes |
| deflag | Submodule. Handles removal of flags |
| disjunction | Submodule. Handles parsing of disjunction expressions |
| escaped | Submodule. Handles parsing of escape-sequences |
| group | Submodule. Handles recursion within a regular expression |
| nogreedy | Submodule. Handles the "no-greedy" quantifiers |
| quantifier | Submodule. Handles the quantifiers |

The submodules' exports together make up the definition of the parse function.

The layers are applied within the parse function in the following order:

1. deflag
2. chars
3. classes
4. escaped
5. boundry
6. group (recursive, looped)
7. quantifier
8. nogreedy
9. disjunction
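The idea of passing the input through the layers in a fixed order can be sketched as plain function composition. The composeLayers helper and the tracing stand-in layers below are hypothetical and are not the library's code. The sketch also ignores that the group layer is recursive and looped.

```javascript
// Hypothetical helper: applies each layer to the output of the previous one.
function composeLayers(...layers) {
  return (input) => layers.reduce((current, layer) => layer(current), input);
}

// Layer names in the documented order.
const layerNames = [
  "deflag", "chars", "classes", "escaped", "boundry",
  "group", "quantifier", "nogreedy", "disjunction"
];

// Stand-in layers that only record their own name, to make the order visible.
const tracingLayers = layerNames.map((name) => (trace) => [...trace, name]);

const runAll = composeLayers(...tracingLayers);
console.log(runAll([]).join(" -> "));
// deflag -> chars -> classes -> escaped -> boundry -> group -> quantifier -> nogreedy -> disjunction
```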

`deflag`

| export | description |
| --------------- | ------------------------------------------------------------------------------------------------- |
| DeFlag | Function for de-flagging a string containing a regular expression. Returns a Flags object whose .expression field holds the expression's string |
| flagTable | Table mapping flags to their TokenInstances |
| flagInstance | Function based off flagTable. Returns the TokenType of a given flag string |
| identifyFlags | Maps flagInstance over an array of strings |
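To illustrate what de-flagging means, here is a minimal sketch under stated assumptions. It assumes the input is written in regex-literal form (/pattern/flags) and works on a plain string. The splitFlags helper and the FLAG_TYPES table are hypothetical, not the library's DeFlag or flagTable. The type strings are the ones listed for the flag tokens in the tokens section.

```javascript
// Flag letters and their token "type" strings, as listed in the tokens section.
const FLAG_TYPES = {
  d: "indicies",
  g: "global",
  i: "case-insensitive",
  m: "multiline",
  s: "dot-all",
  u: "unicode",
  v: "unicode-sets",
  y: "sticky"
};

// Hypothetical helper: splits "/pattern/flags" into the pattern string and
// the list of flag types. Assumes regex-literal form.
function splitFlags(literal) {
  const lastSlash = literal.lastIndexOf("/");
  if (literal[0] !== "/" || lastSlash === 0) {
    throw new SyntaxError("expected a string of the form /pattern/flags");
  }
  const expression = literal.slice(1, lastSlash);
  const flags = [...literal.slice(lastSlash + 1)].map((flag) => {
    if (!(flag in FLAG_TYPES)) throw new SyntaxError(`unknown flag: ${flag}`);
    return FLAG_TYPES[flag];
  });
  return { expression, flags };
}

const split = splitFlags("/a\\/b/gi");
console.log(split.expression); // a\/b
console.log(split.flags.join(", ")); // global, case-insensitive
```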

`chars`

| export | description |
| --------------------- | --------------------------------------------------------------------------------------- |
| ExpressionTokenizer | A PatternTokenizer for tokenizing the given Pattern with a regular expression in it |
| tokenizerMap | The RegExpMap, on which ExpressionTokenizer is based |

`classes`

| export | description |
| ----------------------- | ----------------------------------------------------------------------------------- |
| CharacterClassParser | Main parser for character classes |
| classLimit | Limits the given stream up to the next RectOp from the current element |
| classMap | TypeMap, on which CharacterClassParser is based |
| HandleClass | The handler for the RectOp token inside the classMap |
| ClassHandler | A multistep function, serving as the main component of HandleClass |
| EscapeInner | A parser function, first component of the ClassHandler. Escapes inner characters |
| HandleEscaped | Handler for the escaped characters, main part of the EscapeInner |
| IdentifyRanges | Second parsing function of ClassHandler. Identifies and parses ranges |
| HandleRange | The main component of IdentifyRanges, parses encountered ranges |
| InClassEscapedHandler | A slightly modified version of the escapedMap from escaped module for escaping |
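Range identification inside a character class can be sketched as follows. This is an illustration only: the identifyRanges helper below is hypothetical, operates on an array of single characters rather than on the library's streams, and skips the escaping step that EscapeInner performs first. The node shape { type, value } is assumed. The "class-range" type string is the one listed for ClassRange in the tokens section.

```javascript
// Hypothetical helper: turns every X-Y triple in the class content into a
// { type: "class-range" } node and leaves all other characters untouched.
// A hyphen at the very start or end of the content stays a plain character.
function identifyRanges(chars) {
  const result = [];
  for (let i = 0; i < chars.length; i++) {
    if (chars[i + 1] === "-" && i + 2 < chars.length) {
      result.push({ type: "class-range", value: [chars[i], chars[i + 2]] });
      i += 2; // skip the hyphen and the upper bound
    } else {
      result.push(chars[i]);
    }
  }
  return result;
}

// Content of the class [a-z_-]: one range, then "_" and a literal "-".
console.log(identifyRanges([..."a-z_-"]));
```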

`escaped`

| export | description |
| -------------------------- | ---------------------------------------------------------------------- |
| EscapedParser | Main parser of the escaped characters |
| escapePreface | The TypeMap, on which EscapedParser is based |
| escapeMap | The ValueMap which defines the global-scope escaping |
| escapedHandler | Creates a function for handling escaped characters based off given map |
| parseBackreference | Returns a Backreference based on given arguments of curr, input |
| parseMultControl | Returns a ControlCharacter of lengths 4-5 based on curr, input |
| parseDoubleControl | Returns a ControlCharacter of length 2 based on curr, input |
| parseSingleControl | Returns a ControlCharacter of length 1 based on curr, input |
| readUnicodeClassProperty | Parses a UnicodeClassProperty based on curr, input |
| readBraced | Reads the given Stream, until a ClBrace is encountered |
| readNamedBackreference | Reads a NamedBackreference based on readIdentifier |
| readUBrace | Reads a sequence of {hhhh} or {hhhhh} where isHex(h) === true |
| readu | Reads a sequence of hhhh, where isHex(h) === true |
| readx | Reads a sequence of hh, where isHex(h) === true |
| isHex | Returns whether a character given is a hexadecimal digit |
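The hex-reading helpers in this table share one idea: read a fixed number of characters and accept them only if each is a hex digit. A minimal sketch on plain strings follows. The isHex here is a re-implementation for illustration and readHexDigits is hypothetical. The library's own functions work on streams and may differ.

```javascript
// Illustrative predicate: true only for a single hexadecimal digit.
const isHex = (char) => /^[0-9a-fA-F]$/.test(char);

// Hypothetical helper: reads exactly `count` hex digits of `text` starting at
// `start`. Returns them, or null when fewer than `count` hex digits follow.
function readHexDigits(text, start, count) {
  const digits = text.slice(start, start + count);
  return digits.length === count && [...digits].every(isHex) ? digits : null;
}

console.log(readHexDigits("\\x41", 2, 2)); // 41 (the hh of \xhh)
console.log(readHexDigits("\\u00e9", 2, 4)); // 00e9 (the hhhh of \uhhhh)
console.log(readHexDigits("\\uZZZZ", 2, 4)); // null
```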

`boundry`

| export | description |
| --------------- | ----------------------------------------------------------------------- |
| BoundryParser | Main parser of the submodule. Separates boundaries into TokenInstances |
| boundryMap | The TypeMap, on which the BoundryParser is based |
| HandleEscaped | Handles the NonWordBoundry TokenInstances |

`group`

| export | description |
| --------------------------- | --------------------------------------------------------------------------------------------------------------- |
| EndParser | The main parser of the submodule. The ExpressionParser ends with it |
| GroupParser | The first parsing layer of the EndParser. Recursive. Handles recursion, groups/captures, look-aheads/-behinds |
| groupMap | The TypeMap, on which the GroupParser is based |
| GroupHandler | The main component of the groupMap |
| nestedBrack | Function for limiting the current-level nested bracket-expression |
| CollectionHandler | Function for handling current collection |
| HandleQMark | Function for handling "collections" starting with ? ((?<!...), (?<...>...), ...) |
| HandleCollectionBase | Function for recursively handling a capture group |
| QMarkHandler | Underlying TableParser of HandleQMark |
| HandleQMarkExclMark | Handles a negative look-ahead |
| HandleQMarkEq | Handles a look-ahead |
| HandleLeftAngular | Handles all "collections" starting with < ((?<...>...), (?<=...), ...) |
| HandleColon | Handles a no-capture group |
| LeftAngularHandler | Underlying TableParser for HandleLeftAngular |
| HandleLeftAngularBase | Handles a named capture |
| HandleLeftAngularExclMark | Handles a negative look-behind |
| HandleLeftAngularEq | Handles a look-behind |
| readIdentifier | Reads an identifier (for the named capture/backreference) |

`quantifier`

| export | description |
| ------------------- | ---------------------------------------------------------------------------- |
| QuantifierParser | Main parser of the submodule. Parses quantifiers |
| QuantifierHandler | A TableParser, main component of the QuantifierParser |
| HandlePlus | Handles a Plus token encountered |
| HandleStar | Handles a Star token encountered |
| HandleQMark | Handles a QMark token encountered |
| BraceHandler | Handles an OpBrace token encountered |
| HandleBraced | Returns a handling function for either one of NtoM, NPlus, or NOnly |
| readNumber | Reads a number from the given Stream (note: up to the first isNaN token) |
| limitBraced | Limits the given Stream up to the point of the first encountered ClBrace |
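The distinction between NOnly, NPlus and NtoM depends on what sits between the braces. The sketch below illustrates it on plain strings. The readDigits and classifyBraced helpers and the min/max fields are hypothetical. The library's readNumber and HandleBraced work on streams and may be structured differently. The type strings are the ones listed for the quantifier tokens in the tokens section.

```javascript
// Reads decimal digits from `text` starting at `pos`, stopping at the first
// non-digit. Returns the digits read and the position right after them.
function readDigits(text, pos) {
  let digits = "";
  while (pos < text.length && text[pos] >= "0" && text[pos] <= "9") {
    digits += text[pos++];
  }
  return { digits, pos };
}

// Hypothetical helper: classifies the text between "{" and "}".
// Returns null when the text is not a valid quantifier body.
function classifyBraced(inner) {
  const first = readDigits(inner, 0);
  if (first.digits === "") return null;
  const min = Number(first.digits);
  if (first.pos === inner.length) return { type: "n-only", min };
  if (inner[first.pos] !== ",") return null;
  const second = readDigits(inner, first.pos + 1);
  if (second.pos !== inner.length) return null;
  return second.digits === ""
    ? { type: "n-plus", min }
    : { type: "n-to-m", min, max: Number(second.digits) };
}

console.log(classifyBraced("3").type); // n-only
console.log(classifyBraced("3,").type); // n-plus
console.log(classifyBraced("3,5").type); // n-to-m
```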

`nogreedy`

| export | description |
| ------------------- | -------------------------------------------------------------- |
| ParseNoGreedy | Main parser of the submodule. Parses NoGreedy tokens |
| noGreedyMap | The TypeMap, on which ParseNoGreedy is based |
| HandleQuantifier | Handler for quantifiers |
| QuantifierHandler | The underlying TableParser-function of HandleQuantifier |
| HandleQMark | Handles QMark following a quantifier (no-greedy quantifiers) |

`disjunction`

| export | description |
| ---------------------- | ----------------------------------------------------------------------------------------------------------------- |
| DisjunctionParser | The main export of the submodule. Parses disjunctions |
| EmptyFixer | First parsing layer of DisjunctionParser. Fixes empty expressions \|\| |
| DisjunctionTokenizer | Second parsing layer of DisjunctionParser. Puts non-Pipe bits of current Stream into DisjunctionArguments |
| DisjunctionDelimiter | Third and final parsing layer of DisjunctionParser. Delimits the Stream based off Pipe tokens |
| hasDisjunctions | Checks whether a given Stream has disjunctions to parse from given point on |
| limitPipe | Limits the given Stream until the moment the next Pipe is encountered |
| skipTilPipes | Skips Stream until a Pipe is discovered |
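The three layers above amount to splitting the current level on pipes while keeping empty branches visible. A minimal sketch of that idea follows. The splitOnPipes helper and the node shape are hypothetical. It works on a flat array of single characters and ignores nesting and escaped pipes, which the library deals with in earlier layers. The "disjunction-arg" and "empty" type strings are the ones listed for the disjunction tokens in the tokens section.

```javascript
// Hypothetical helper: splits a flat array of single characters on "|".
// An empty branch (as in "a||b") is kept as an explicit { type: "empty" } node.
function splitOnPipes(chars) {
  const branches = [[]];
  for (const char of chars) {
    if (char === "|") branches.push([]);
    else branches[branches.length - 1].push(char);
  }
  return branches.map((branch) =>
    branch.length === 0
      ? { type: "empty" }
      : { type: "disjunction-arg", value: branch.join("") }
  );
}

// Three branches: "a", an empty one, and "bc".
console.log(splitOnPipes([..."a||bc"]));
```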

`generator`

Provides regex-generation related exports based off the package's AST

| export | description |
| ------------------------------ | -------------------------------------------------------------------------------------------------- |
| RegexGenerator | The SourceGenerator for the package's AST (generate is based on it) |
| generatorMap | The TypeMap, on which RegexGenerator is based |
| GenerateBackspaceClass | Generates a regex for BackspaceClass |
| GenerateWordBoundry | Generates a regex for WordBoundry |
| GenerateNonWordBoundry | Generates a regex for NonWordBoundry |
| GenerateNewline | Generates a regex for Newline |
| GenerateCarriageReturn | Generates a regex for CarriageReturn |
| GenerateWordClass | Generates a regex for WordClass |
| GenerateNonWordClass | Generates a regex for NonWordClass |
| GenerateFormFeed | Generates a regex for FormFeed |
| GenerateDigitClass | Generates a regex for DigitClass |
| GenerateNonDigitClass | Generates a regex for NonDigitClass |
| GenerateNULClass | Generates a regex for NULClass |
| GenerateVerticalTab | Generates a regex for VerticalTab |
| GenerateHorizontalTab | Generates a regex for HorizontalTab |
| GenerateNonWhitespaceClass | Generates a regex for NonWhitespaceClass |
| GenerateWhitespaceClass | Generates a regex for WhitespaceClass |
| GenerateEmptyExpression | Generates a regex for EmptyExpression |
| GenerateMatchIndicies | Generates a regex for MatchIndicies flag |
| GenerateGlobalSearch | Generates a regex for GlobalSearch flag |
| GenerateCaseInsensitive | Generates a regex for CaseInsensitive flag |
| GenerateMultline | Generates a regex for Multiline flag |
| GenerateDotAll | Generates a regex for DotAll flag |
| GenerateUnicode | Generates a regex for Unicode flag |
| GenerateUnicodeSets | Generates a regex for UnicodeSets flag |
| GenerateSticky | Generates a regex for Sticky flag |
| GeneratePatterStart | Generates a regex for PatternStart |
| GeneratePatternEnd | Generates a regex for PatternEnd |
| GenerateFlags | Generates a regex for Flags |
| GenerateExpression | Generates a regex for Expression |
| GenerateNOnly | Generates a regex for NOnly |
| GenerateNtoM | Generates a regex for NtoM |
| GenerateNPlus | Generates a regex for NPlus |
| GenerateEscaped | Generates a regex for Escaped |
| GenerateBackreference | Generates a regex for Backreference |
| GenerateUnicodeClassProperty | Generates a regex for UnicodeClassProperty |
| GenerateControlCharacter | Generates a regex for ControlCharacter |
| GenerateNamedBackreference | Generates a regex for NamedBackreference |
| GenerateClassRange | Generates a regex for ClassRange |
| GenerateNoGreedy | Generates a regex for NoGreedy |
| GenerateOptional | Generates a regex for Optional |
| GenerateZeroPlus | Generates a regex for ZeroPlus |
| GenerateOnePlus | Generates a regex for OnePlus |
| GenerateClass | Generates a regex for CharacterClass |
| GenerateNegClass | Generates a regex for NegCharacterClass |
| GenerateDisjunction | Generates a regex for Disjunction |
| GenerateDisjunctionArgument | Generates a regex for DisjunctionArgument |
| GenerateNonCaptureGroup | Generates a regex for NonCaptureGroup |
| GenerateCaptureGroup | Generates a regex for CaptureGroup |
| GenerateLookAhead | Generates a regex for LookAhead |
| GenerateLookBehind | Generates a regex for LookBehind |
| GenerateNegLookAhead | Generates a regex for NegLookAhead |
| GenerateNegLookBehind | Generates a regex for NegLookBehind |
| GenerateNamedCapture | Generates a regex for NamedCapture |
| GenerateWildcard | Generates a regex for Wildcard |
| GeneratePipe | Generates a regex for Pipe |
| GenerateComma | Generates a regex for Comma |
| GenerateTrivial | Generates a regex for anything else not in the table already (with a typeof .value === 'string') |
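A generator built on a type-to-function map can be sketched as follows. This is only an illustration of the dispatch idea. The generators table, the generateSketch function, and the node shape ({ type, value }, with children stored in value) are assumptions, not the library's generatorMap or AST layout. The type strings are taken from the tokens section.

```javascript
// Hypothetical table from a node's `type` to the function producing its source.
const generators = new Map([
  ["start", () => "^"],
  ["end", () => "$"],
  ["wildcard", () => "."],
  ["one-plus", (node, generate) => `${generate(node.value)}+`],
  ["expression", (node, generate) => node.value.map((child) => generate(child)).join("")]
]);

function generateSketch(node) {
  const handler = generators.get(node.type);
  if (handler) return handler(node, generateSketch);
  // Fallback in the spirit of GenerateTrivial: nodes whose value is a string.
  if (typeof node.value === "string") return node.value;
  throw new TypeError(`no generator for type "${node.type}"`);
}

// A hand-built stand-in for the AST of ^a+$ (assumed shape).
const sampleAst = {
  type: "expression",
  value: [
    { type: "start" },
    { type: "one-plus", value: { type: "symbol", value: "a" } },
    { type: "end" }
  ]
};

console.log(generateSketch(sampleAst)); // ^a+$
console.log(generateSketch({ type: "end" })); // $ (a partial node gives a partial result)
```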

`tree`

| export | description |
| ------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| RegexStream | A TreeStream for the library's AST (note: accepts THE AST ITSELF) |
| RegexTree | A Tree interface implementation for the library's AST |
| treeMap | The TypeMap, on which RegexTree is based |
| NamedCaptureTree | The function for conversion of a NamedCapture to a Tree |
| ExpressionTree | The function for conversion of an Expression to a Tree |
| FlagTree | The function for conversion of a Flags to a Tree |
| SeveralTree | The function for conversion of NOnly, NtoM and NPlus to a Tree |
| SingleTree | The function for conversion of ZeroPlus, OnePlus, Optional, LookAhead, LookBehind, NegLookAhead, NegLookBehind, NamedBackreference to a Tree |
| ValueTree | The function for conversion of ClassRange, DisjunctionArgument, CharacterClass, NegCharacterClass and Disjunction to a Tree |
| ChildlessTree | The function for conversion of the rest of the tokens to a Tree |

`tokens`

The tokens module has the same submodule structure as the parser module.

| submodule | description |
| ------------- | ------------------------------------------------ |
| boundry | Various boundary tokens |
| chars | Various basic (first-order) tokens |
| classes | Tokens for representation of character classes |
| deflag | Flags and expressions representation tokens |
| disjunction | Disjunction-related tokens |
| escaped | Escape-sequence-related tokens |
| group | Tokens for groups and other recursive structures |
| nogreedy | Tokens for non-greedy quantifiers |
| quantifier | Tokens for quantifiers |

`deflag`

| TokenType/TokenInstance | represents | type |
| --------------------------- | ------------------------------------------------------------------------- | -------------------- |
| MatchIndicies | The d flag | "indicies" |
| GlobalSearch | The g flag | "global" |
| CaseInsensitive | The i flag | "case-insensitive" |
| Multiline | The m flag | "multiline" |
| DotAll | The s flag | "dot-all" |
| Unicode | The u flag | "unicode" |
| UnicodeSets | The v flag | "unicode-sets" |
| Sticky | The y flag | "sticky" |
| Flags | The complete regular expression with flags | "flags" |
| Expression | A partial expression, without flags (can have other Expressions inside) | "expression" |

`chars`

| TokenType | represents | type |
| -------------- | --------------- | ------------ |
| Escape | \\ | "escape" |
| RectOp | [ | "rop" |
| RectCl | ] | "rcl" |
| Hyphen | - | "hyphen" |
| Pipe | \| | "pipe" |
| OpBrack | ( | "opbrack" |
| ClBrack | ) | "clbrack" |
| QMark | ? | "qmark" |
| ExclMark | ! | "emark" |
| Eq | = | "eq" |
| Wildcard | . | "wildcard" |
| Star | * | "star" |
| Plus | + | "plus" |
| OpBrace | { | "opbrc" |
| ClBrace | } | "clbrc" |
| Colon | : | "colon" |
| Comma | , | "comma" |
| LeftAngular | < | "lang" |
| RightAngular | > | "rang" |
| Dollar | $ | "dollar" |
| Xor | ^ | "xor" |
| RegexSymbol | everything else | "symbol" |
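The first-order tokenization described by this table can be sketched as a lookup from character to type string. The CHAR_TYPES map, the tokenizeChars helper and the token shape { type, value } below are assumptions made for illustration. The library's ExpressionTokenizer is built on its own tokenizerMap and may differ.

```javascript
// Character-to-type table copied from the chars token table.
const CHAR_TYPES = new Map([
  ["\\", "escape"], ["[", "rop"], ["]", "rcl"], ["-", "hyphen"], ["|", "pipe"],
  ["(", "opbrack"], [")", "clbrack"], ["?", "qmark"], ["!", "emark"], ["=", "eq"],
  [".", "wildcard"], ["*", "star"], ["+", "plus"], ["{", "opbrc"], ["}", "clbrc"],
  [":", "colon"], [",", "comma"], ["<", "lang"], [">", "rang"], ["$", "dollar"],
  ["^", "xor"]
]);

// Hypothetical tokenizer: one token per character, "symbol" for everything else.
function tokenizeChars(pattern) {
  return [...pattern].map((char) => ({
    type: CHAR_TYPES.get(char) ?? "symbol",
    value: char
  }));
}

console.log(tokenizeChars("a+?").map((token) => token.type).join(" "));
// symbol plus qmark
```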

`classes`

| TokenType | represents | type |
| ------------------- | ----------------------------------- | ----------------- |
| CharacterClass | A character class [...] | "charclass" |
| NegCharacterClass | A negative character class [^...] | "neg-charclass" |
| ClassRange | A character class range X-Y | "class-range" |

`escaped`

| TokenType/TokenInstance | represents | type |
| --------------------------- | ---------------------------------------------------- | ------------------------ |
| ControlCharacter | \cX, \xhh, \uhhhh, \u{hhhh} or \u{hhhhh} | "control-char" |
| Backreference | \N - numeric backreference | "backref" |
| NamedBackreference | \k<name> - named backreference | "named-backref" |
| UnicodeClassProperty | \p{...} - unicode class property | "uniprop" |
| RegexIdentifier | name - identifier in named captures/backreferences | "identifier" |
| CarriageReturn | \r - carriage return | "cr" |
| NonWordBoundry | \B - non-word boundary (outside classes) | "non-word-boundry" |
| WordBoundry | \b - word boundary | "word-boundry" |
| NULClass | \0 - NUL class | "nul-class" |
| FormFeed | \f - form feed | "form-feed" |
| DigitClass | \d - digit class | "digit-class" |
| NonDigitClass | \D - non-digit class | "non-digit-class" |
| WordClass | \w - word class | "word-class" |
| NonWordClass | \W - non-word class | "non-word-class" |
| WhitespaceClass | \s - whitespace class | "whitespace-class" |
| NonWhitespaceClass | \S - non-whitespace class | "non-whitespace-class" |
| HorizontalTab | \t - horizontal tab | "tab" |
| VerticalTab | \v - vertical tab | "vtab" |
| BackspaceClass | \b - backspace | "backspace" |
| Newline | \n - newline | "newline" |
| Escaped | Any other escaped character | "escaped" |

`boundry`

| TokenInstance | represents | type |
| --------------- | ---------- | --------- |
| PatternStart | ^ | "start" |
| PatternEnd | $ | "end" |

`group`

| TokenType | represents | type |
| ---------------- | -------------- | ------------------ |
| CaptureGroup | (...) | "capture" |
| NoCaptureGroup | (?:...) | "non-capture" |
| NamedCapture | (?<name>...) | "named-capture" |
| LookAhead | (?=...) | "lookahead" |
| LookBehind | (?<=...) | "lookbehind" |
| NegLookAhead | (?!...) | "neg-lookahead" |
| NegLookBehind | (?<!...) | "neg-lookbehind" |

`quantifier`

| TokenType | represents | type |
| ----------- | -------------- | ------------- |
| ZeroPlus | ...* | "zero-plus" |
| OnePlus | ...+ | "one-plus" |
| Optional | ...? | "optional" |
| NOnly | ...{...} | "n-only" |
| NPlus | ...{...,} | "n-plus" |
| NtoM | ...{...,...} | "n-to-m" |

`nogreedy`

| export | description | type |
| -------------- | ------------------------------------------------------------------------------------ | ------------ |
| NoGreedy | A TokenType representing no-greedy operators | "nogreedy" |
| isQuantifier | A predicate returning true only for tokens with types from the quantifier module | |
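How a quantifier predicate and the "nogreedy" type might work together can be sketched as follows. Everything here is an assumption made for illustration: the token shape { type, value }, the isQuantifierSketch predicate, and the markNoGreedy pass, which wraps a quantifier token that is directly followed by a "qmark" token. The library's own isQuantifier and ParseNoGreedy may be implemented differently.

```javascript
// Type strings of the quantifier tokens, copied from the quantifier table.
const QUANTIFIER_TYPES = new Set([
  "zero-plus", "one-plus", "optional", "n-only", "n-plus", "n-to-m"
]);

// Illustrative predicate: true only for tokens with a quantifier type.
const isQuantifierSketch = (token) => QUANTIFIER_TYPES.has(token.type);

// Hypothetical pass: a quantifier directly followed by "?" becomes a
// { type: "nogreedy" } node wrapping that quantifier.
function markNoGreedy(tokens) {
  const result = [];
  for (let i = 0; i < tokens.length; i++) {
    const next = tokens[i + 1];
    if (isQuantifierSketch(tokens[i]) && next && next.type === "qmark") {
      result.push({ type: "nogreedy", value: tokens[i] });
      i++; // skip the consumed "?" token
    } else {
      result.push(tokens[i]);
    }
  }
  return result;
}

const lazyPlus = markNoGreedy([
  { type: "one-plus", value: { type: "symbol", value: "a" } },
  { type: "qmark", value: "?" }
]);
console.log(lazyPlus.map((token) => token.type).join(" ")); // nogreedy
```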

`disjunction`

| TokenType/TokenInstance | represents | type |
| --------------------------- | -------------------------------------------- | ------------------- |
| Disjunction | ...\|...\|... | "disjunction" |
| DisjunctionArgument | An element of a Disjunction | "disjunction-arg" |
| EmptyExpression | An empty element of a Disjunction (\|\|) | "empty" |
