@elasticpath/epcc-search-ast-generator-js
v1.7.0
Published
This project contains all the logic for processing [filtering in EPCC](https://documentation.elasticpath.com/commerce-cloud/docs/api/basics/filtering.html). This document provides an introduction into how this conceptually works.
Downloads
1,468
Keywords
Readme
Filtering Introduction.
This project contains all the logic for processing filtering in EPCC. This document provides an introduction into how this conceptually works.
A problem we often have is that we have a string, and want to classify it as being valid or invalid. For instance here are some things you might care about strings:
- Does a string start with a capital letter.
- Is a string a valid email address.
- Is a string a palindrome (written the same forwards and backwards like redivider, civic, radar, level)
- Is a string a valid Java program.
There are different techniques that can be used to check the above, including writing code directly and using a regular expression. But some things are more complicated, like case #3 and #4.
In the case of the last one, you would likely define a grammar which is a set of rules that say what a valid string looks like. A common technique is to proceed in two phases:
- Lexing - Converting the string into chunks of things called tokens, each token representing a some part of the string (e.g., in a java program you might have a token for each keyword). More details are in the
lexer.js
file - Parsing - Write rules for checking the output of the lexer. We use a recursive decent parser, which means that each grammar rule calls a function that checks for a match and may recursively call other functions.
The grammar that we are attempting to parse in Augmented Backus-Naur Form (https://www.rfc-editor.org/rfc/rfc5234) is the following:
(Some of the way this is structured is to make the code easier to read)
root = term *( ":" term )
term /= binary_term
term /= unary_term
term /= vararg_term
binary_term = binary_operator "(" literal "," literal ")"
unary_term = unary_operator "(" literal ")"
vararg_term = vararg_operator "(" literal "," literal *( "," literal ) ")"
binary_operator = "eq" / "like" / "gt" / "ge " / "lt" / "le" / "text"
unary_operator = "is_null"
vararg_operator = "in"
; these literal represent standard unencoded literals.
literal = 1*(ALPHA / DIGIT / "$" / "-" / "_" / "*" / "." / " " / "{" / "}" / "@" / "|")
; we treat binary, unary & vararg operators as literals in cases where people want to do something like eq(foo, ne).
literal /= binary_operator
literal /= unary_operator
literal /= vararg_operator
; 0x22 is a quotation mark
literal /= 0x22 *string_characters 0x22
; there is probably a better way to write this in ABNF involving hex like 0x01-0x21 0x23-0x5B 0x5D-0x7E
string_character = any_character but not " or \
string_character /= escape_sequence
escape_sequence = "\" 0x22
A more interested reader is encouraged to read: Domain-Specific Languages, by Fowler, the two techniques we use here are: (Regex Table Lexer p.239-244, Recursive Descent Parser p.245-255).
Released Package
https://www.npmjs.com/package/@elasticpath/epcc-search-ast-generator-js