nmatch

v0.1.0

Published

3 years ago

combine multiple pattern-matching indices

Downloads

0High
0Medium
0Low

wraugh

rules index wildcard path glob router patterns

NMatch lets you combine multiple pattern matchers.

Say you have a set of URL patterns, like "/foo" and "/foo/{id}", and a set of HTTP method patterns, like "GET", "POST", and "ANY", and a set of user role patterns, like "{ user: "alice" } and { group: "admins" }. You can bundle these into NMatch and then ask it "what's the handler for this user doing this method to this URL?"

You do this in three steps.

First you define what your rules will look like by choosing a set of matchers:

const NMatch = require('nmatch')

const router = new NMatch([
  () => new UrlMatcher(),
  () => new MethodMatcher(),
  () => new UserRoleMatcher()
])

Then you set rules that map patterns to results. Each rule takes one pattern per matcher, plus the result that the rule should return when it matches:

router.set('/foo',       'GET', { group: 'public' }, getFoosHandler)
router.set('/foo/{id}',  'GET', { group: 'public' }, getAFooHandler)
router.set('/foo/{id}',  'PUT', { group: 'admins' }, editFooHandler)
router.set('{default+}', 'ANY', { group: 'public' }, defaultHandler)

Now you can match terms against the ruleset, one term per matcher. NMatch will return the best result that matches all the terms: the first term against the first matcher, the second term against the second matcher, and so on.

const alice = { 'username': 'alice', 'groups': ['public', 'admins'] }
const anon = { 'groups': ['public'] }

tap.is(router.match('/foo/bar', 'GET', anon), getAFooHandler)
tap.is(router.match('/foo/bar', 'PUT', anon), defaultHandler)
tap.is(router.match('/foo/bar', 'PUT', alice), editFooHandler)

API

The NMatch constructor takes for argument an array of functions, each of which should return a new instance of a matcher. They will be called as needed when more patterns are added. Their order is the same as that of the set and match methods' arguments.

const Id = require('nmatch/matchers/id') // "matches" using `===`

const bookNotes = new NMatch([
  () => new Id(), // Author
  () => new Id()  // Title
])

The set method takes as many arguments as you've defined matchers, plus one for the value being set. set will associate that value to the given patterns. What counts as a pattern depends on the kind of matcher you use.

bookNotes.set('Homer', 'Odyssey', 'Really old')
bookNotes.set('James Joyce', 'Ulysses', 'Just here for the puns')

The match method takes as many arguments as you've defined matchers. It will pass the first argument to the first matcher, the second argument to the second matcher, and so on. It returns the best match, or null if no matches are found.

tap.is(bookNotes.match('Homer', 'Odyssey'), 'Really old')
tap.is(bookNotes.match('Homer', 'Ulysses'), null) // no match

The NMatch constructor does some basic validation on its arguments

tap.throws(() => new NMatch(), 'At least one matcher is required')
tap.throws(() => new NMatch([]), 'At least one matcher is required')
tap.throws(() => new NMatch(['herp']), 'Matchers must be functions')
tap.throws(() => new NMatch([() => 'derp']),
  'Matchers must have a `pattern` and a `match` function')
tap.throws(() => new NMatch([() => ({ pattern: () => 1 })]),
  'Matchers must have a `pattern` and a `match` function')
tap.throws(() => new NMatch([() => ({ match: () => 1 })]),
  'Matchers must have a `pattern` and a `match` function')

set and match will check that you pass in the right number of arguments, but they can't tell whether you've given them in the right order. Make sure to be consistent.

tap.throws(() => bookNotes.set('Homer', 'Iliad'),
  'Too few arguments')
tap.throws(() => bookNotes.set('Homer', 'Iliad', 'Trojan war', 'Prequel'),
  'Too many arguments')

tap.throws(() => bookNotes.get('Homer'),
  'Too few arguments')
tap.throws(() => bookNotes.get('Homer', 'Iliad', 'Odyssey'),
  'Too many arguments')

bookNotes.set('Iliad', 'Homer', 'Trojan war') /* Oops, arguments in wrong order,
  but NMatch has no way of knowing */
tap.is(bookNotes.match('Homer', 'Iliad'), null, 'No match: order matters')

Built-in Matchers

NMatch is powered by matchers, objects whose job it is to store patterns and perform matching against them. Let's look at the built-in matchers before discussing how you can build your own.

Id

The most basic matcher is Id, whose "patterns" can be any object, and whose "matching" is just applying the strict equality (===) operator. It's the equivalent of a Map, and in fact uses a Map internally to store patterns and values.

const Id = require('nmatch/matchers/id')

const id = new NMatch([ () => new Id() ])

id.set(123, 123)
id.set('Abc', 'Abc')

tap.is(id.match(123), 123)
tap.is(id.match(456), null)
tap.is(id.match('Abc'), 'Abc')
tap.is(id.match('A'), null)

const arr = [1, 2, 3]
id.set(arr, 'arr')
const obj = { a: 1, b: 2 }
id.set(obj, 'obj')

tap.is(id.match(arr), 'arr')
tap.is(id.match(arr.slice()), null, 'In javascript, `[] !== []`')

tap.is(id.match(obj), 'obj')
tap.is(id.match({ a: 1, b: 2 }), null, 'In javascript, `{} !== {}`')

Calling set a second time with the same pattern overrides the old value

id.set('Abc', 789)
tap.is(id.match('Abc'), 789, 'Used to be "Abc"')

You can use any object for patterns and/or values

const dub = a => 2 * a
id.set('double', dub)
id.set(dub, 2)
tap.is(id.match('double'), dub)
tap.is(id.match(dub), 2)
tap.is(id.match(a => 2 * a), null, 'In javascript, `() => {} !== () => {}`')

Matchers are essentially enhanced Maps: they bind keys to values. The difference is that the key-matching operation for Matchers can be anything, whereas for Maps it's always ===. Still, Maps are so useful that we need a matcher that replicates their behavior. Id is that matcher. It's a simple wrapper around a Map.

class Id {
  constructor () {
    this._map = new Map()
  }

Maps have a get(key) method that returns the value bound to key. Matchers need two such methods: one for pattern matching, and one for getting the value for a pattern (literally). These are the match and pattern methods, respectively.

match is a generator that yields results in order from best match to worst. In our case here we will either yield one value or none: either there's a value at the given key, or there isn't. But more generally, match methods can yield 0 or more results.

  * match (obj) {
    const v = this._map.get(obj)
    if (v !== undefined) {
      yield v
    }
  }

pattern is a getter that's used to bind a value to a pattern. Matchers don't store these values directly; they store "container" objects on which the caller can attach values. This layer of indirection is what allows NMatch to chain matchers. pattern returns the container object associated to the given pattern, first creating it if it doesn't exist.

  pattern (obj) {
    let v = this._map.get(obj)
    if (v === undefined) {
      v = {}
      this._map.set(obj, v)
    }

    return v
  }
}

Paths

Paths matches paths against patterns that can contain wildcards. It makes use of the hierarchical nature of paths to store and match patterns efficiently. If you run a benchmark test please share your results.

const NMatch = require('nmatch')
const Paths = require('nmatch/matchers/paths')

const p = new NMatch([ () => new Paths() ])

Patterns are made up of literals, wildcards, and super-wildcards. Literals match when strings are exactly equal, obv:

p.set('lit', 1)
p.set('lit/foo', 2)
p.set('lit/bar', 3)

tap.equals(p.match('lit/bar'), 3)
tap.equals(p.match('lit/quux'), null)

Wildcards (*) will match anything in a single path segment

p.set('wild/*', 4)
p.set('wild/*/foo', 4.1)

tap.equals(p.match('wild/foo'), 4)
tap.equals(p.match('wild/bar/foo'), 4.1)
tap.equals(p.match('wild/foo/bar'), null)

Super-wildcards (**) will match anything including further path segments

p.set('super/**', 5)

tap.equals(p.match('super/foo'), 5)
tap.equals(p.match('super/foo/bar'), 5)

It's an error if you try to add more patterns after a super-wildcard

tap.throws(() => p.set('super/**/useless suffix', 5))

Wildcards and super-wildcards apply to path segments as a whole; there are no partial string matches. The following example patterns are just literals:

p.set('lit/w*', 6)
p.set('lit/w**', 7)

tap.equals(p.match('lit/wat'), null)
tap.equals(p.match('lit/w*'), 6)
tap.equals(p.match('lit/w**'), 7)

Only patterns can have wildcards. When you call match, the argument is just a literal.

p.set('foo/bar', 8)
p.set('foo/baz', 9)

tap.equals(p.match('foo/*'), null)

When more than one pattern matches, they are sorted in order from tightest match to loosest match. A literal matches more tightly than a wildcard, which in turn matches more tightly than a super-wildcard. NMatch will return the first hit.

p.set('foo/bar', 'foobar')
p.set('foo/*', 'splat')
p.set('foo/*/**', 'splat super')
p.set('foo/**', 'super')

tap.equals(p.match('foo/bar'), 'foobar')
tap.equals(p.match('foo/foo'), 'splat')
tap.equals(p.match('foo/bar/baz'), 'splat super')

Customizing

The default path separator is /, but you can change that

const ns = new NMatch([ () => new Paths('::') ])
ns.set('Foo::*', 1)
tap.equals(ns.match('Foo::Bar'), 1)

You can use a regex to specify the separator

const words = new NMatch([ () => new Paths(/\s+/) ])
words.set('Ho * Ho!', 1)
tap.equals(words.match('Ho   Ho   Ho!'), 1)

Or you can define your own separator function

const urls = new NMatch([ () => new Paths(path => {
  const parts = path.replace(/\?.*/, '').split('/').slice(1)
  return parts.length > 0 ? parts : ['']
})])
urls.set('/foo/bar', 1)
tap.equals(urls.match('/foo/bar?baz=quux'), 1)

You can use a falsy separator to turn off path separation altogether

const lit = new NMatch([ () => new Paths('') ])
lit.set('foo/*', 1)
tap.equals(lit.match('foo/bar'), null)
tap.equals(lit.match('foo/*'), 1)

The default wildcard symbol is *, but you can change that

const madLibs = new NMatch([ () => new Paths(' ', '____') ])
madLibs.set('The ____ brown ____', 1)
tap.equals(madLibs.match('The quick brown fox'), 1)

You can use a regex to specify what counts as a wildcard

const ruby = new NMatch([ () => new Paths('/', /^:\w+/) ])
ruby.set('foos/:id', 1)
tap.equals(ruby.match('foos/123'), 1)

Or you can pass in your own function to determine if a string is a wildcard

const params = ['size', 'color']
const fw = new NMatch([ () => new Paths(':', str => params.includes(str)) ])
fw.set('towels:size:color', 1)
tap.equals(fw.match('towels:large:green'), 1)

You can use a falsy value to turn off wildcards altogether

const quiteLit = new NMatch([ () => new Paths('.', null) ])
quiteLit.set('*.*', 1)
tap.equals(quiteLit.match('cmd.exe'), null)
tap.equals(quiteLit.match('*.*'), 1)

The default super-wildcard symbol is **, but you can change that

const trunc = new NMatch([ () => new Paths(' ', '____', '...') ])
trunc.set('The ____ brown ...', 1)
tap.equals(trunc.match('The quick brown fox jumped'), 1)

You can use a regex to specificy what counts as a super-wildcard

const apiGateway = new NMatch([ () => new Paths('/', /^\{\w+\}$/, /^\{\w+\+\}$/) ])
apiGateway.set('{proxy+}', 1)
tap.equals(apiGateway.match('foo/bar'), 1)

Or you can pass in your own function to determine if a string is a super-wildcard

const anything = new NMatch([ () => new Paths(0, 0, str => true) ])
anything.set('really anything', 1)
tap.equals(anything.match('welp'), 1)

You can use a falsy value to turn off super-wildcards altogether

const http = new NMatch([ () => new Paths('/', '*', false) ])
http.set('*', 1)
http.set('**', 2)
tap.equals(http.match('wild'), 1)
tap.equals(http.match('not/wild'), null)
tap.equals(http.match('**'), 2)

Consider the difference between

find / -path /usr/bin/node

and

ls /usr/bin/node

The first visits every file on the system and checks if their name is "/usr/bin/node". It takes 15s to run on my laptop.

The second takes 3ms to run. It doesn't visit every file. It visits /usr, /usr/bin, and /usr/bin/node. It can do this because directories are organized in a tree structure, and file names represent paths through that tree. It just needs to walk the path one /-delimited node at a time.

Inspired by this, we can build a Paths matcher that stores its patterns in a tree, and does its matching by traversing that tree.

We'll support three kinds of path elements: literals, wildcards, and super-wildcards. Literals are strings that match only the same string, wildcards match any string, and super-wildcards match any string, even across path separators. Take for instance this four pattern set:

| Pattern | "/foo/bar" | "/foo/baz" | "/foo/bar/baz" | | ---------- | ---------- | ---------- | -------------- | | /foo/bar | Match | No match | No match | | /foo/baz | No match | Match | No match | | /foo/* | Match | Match | No match | | /foo/** | Match | Match | Match |

We can represent these four patterns as a tree with six nodes:

/
└── foo/
    ├── bar
    ├── baz
    ├── *
    └── **

Before we get around to writing our matcher, let's implement that Node object. It will handle tree traversal and matching individual path segments.

We'll need a way for a Node to reference a child Node:

const nextKey = Symbol('nmatch/path nextNode')

And we'll need a way to tell whether a Node is the end of a path. This isn't as simple as checking whether it has any children. In the pattern set ("/foo", "/foo/bar"), the Node "/foo" has a child Node, but it's also the end of a path. We need to attach a flag on every node that's a valid endpoint:

const endKey = Symbol('nmatch/path isEnd')

A Node needs to keep a list of its children. We can simply use an Object for this purpose: the key is the child's path part string (e.g. "foo" or "bar"), and the value is the child Node. We could treat * and ** as special keys for wildcards and super-wildcards, but because we let users define custom wildcard tokens (via test functions), it's possible for * or ** to be legitimate literal path segments. So we'll use separate variables to hold those two.

class Node {
  constructor (wildcardTest, superWildcardTest) {
    this._isWildcard = wildcardTest
    this._isSuperWildcard = superWildcardTest
    this._literals = {}
    this._wildcard = undefined
    this._superWildcard = undefined
  }

Matching is simple: we want to find an exact match, a wildcard match, and/or a super-wildcard match, in that order. So we'll ask three questions: "is there an exact match? a wildcard? a super-wildcard?". For any "yes", we yield the matching child Node. If we're at the end of the path, this means yielding the child directly; otherwise it means recursively matching the rest of the path against the child Node.

  * match (pathParts) {
    for (let m of [this._literals[pathParts[0]], this._wildcard, this._superWildcard]) {
      if (m === undefined) continue

      const isLeaf = (pathParts.length === 1 || m === this._superWildcard) &&
        m[endKey] !== undefined
      if (isLeaf) yield m

      const isBranch = pathParts.length > 1 && m[nextKey] !== undefined
      if (isBranch) yield * m[nextKey].match(pathParts.slice(1))
    }
  }

When adding paths to our Paths matcher via its pattern method, we'll need an analogous part method on each Node: a function that returns a handle to the part's container, and creates it as needed. There are two modes: if the part is an endpoint, we just set the endpoint flag on the container. Otherwise we put a new child Node in the container.

  part (part, isEndpoint = false) {
    let [p, partType] = this._getOrCreate(part)

    if (isEndpoint) {
      p[endKey] = true
    } else {
      if (partType === 'super-wildcard') {
        throw new Error('Super wildcards must be the last path element')
      }
      if (p[nextKey] === undefined) {
        p[nextKey] = new Node(this._isWildcard, this._isSuperWildcard)
      }
    }

    return p
  }

Our part containers are just fresh Objects, i.e. {}.

  _getOrCreate (part) {
    let partType = this._parseType(part)
    let p = this._getExisting(part, partType)
    if (p === undefined) {
      if (partType === 'super-wildcard') {
        this._superWildcard = {}
        p = this._superWildcard
      } else if (partType === 'wildcard') {
        this._wildcard = {}
        p = this._wildcard
      } else {
        this._literals[part] = {}
        p = this._literals[part]
      }
    }

    return [p, partType]
  }

  _getExisting (part, type) {
    switch (type) {
      case 'super-wildcard':
        return this._superWildcard
      case 'wildcard':
        return this._wildcard
      case 'literal':
        return this._literals[part]
    }
  }

  _parseType (part) {
    if (this._isSuperWildcard(part)) {
      return 'super-wildcard'
    }
    if (this._isWildcard(part)) {
      return 'wildcard'
    }

    return 'literal'
  }
}

The Node class we've just defined does all the heavy lifting when matching: it's the one that walks the tree and finds matches. The Paths class' main job is to split paths into parts, but that's done with a user-defined function. So there's not much left for it to do.

class Paths {
  constructor (separator = '/', wildcard = '*', superWildcard = '**') {
    this._separate = typeof separator === 'function'
      ? separator : this._getDefaultSeparator(separator)

    const isWildcard = typeof wildcard === 'function'
      ? wildcard : this._getDefaultPartTest(wildcard)

    const isSuperwildcard = typeof superWildcard === 'function'
      ? superWildcard : this._getDefaultPartTest(superWildcard)

    this._pathStart = new Node(isWildcard, isSuperwildcard)
  }

When adding a new pattern, we walk along the tree and create missing Nodes as needed with the Node.part method.

  pattern (pattern) {
    const pathParts = this._separate(pattern)
    var step = this._pathStart

    while (pathParts.length > 1) {
      step = step.part(pathParts.shift())[nextKey]
    }

    return step.part(pathParts.shift(), true)
  }

  * match (path) {
    yield * this._pathStart.match(this._separate(path))
  }

  _getDefaultSeparator (pattern) {
    if (!pattern) {
      return path => [path]
    }

    return path => path.split(pattern)
  }

  _getDefaultPartTest (pattern) {
    if (!pattern) {
      return part => false
    }

    return pattern instanceof RegExp
      ? part => pattern.test(part)
      : part => part === pattern
  }
}

Any

The Any matcher is a wrapper that lets you match more than one instance at a time. You pass its constructor another matcher (e.g. Id or Paths), and you pass its match method an array of instances. It will loop through that array and call match on the underlying matcher, returning the first match it finds. This is useful when you need to support fallbacks or aliases.

const NMatch = require('nmatch')
const Any = require('nmatch/matchers/any')
const Paths = require('nmatch/matchers/paths')

const any = new NMatch([ () => new Any(new Paths()) ])
any.set('/foo/*', 'foo')
any.set('/bar/*', 'bar')

tap.is(any.match(['/foo/123', '/bar/456']), 'foo')
tap.is(any.match(['/bar/456', '/foo/123']), 'bar',
  "Matches are searched for in the order they're given")

tap.is(any.match(['/baz/789', '/quux/012']), null)
tap.is(any.match(['/baz/789', '/quux/012', '/foo/345']), 'foo')

Any.match takes an iterable for argument, and will throw an Error otherwise. It doesn't count strings as iterables.

tap.is(any.match([]), null,
  'Attempting to match nothing at all is the same as finding no match')
tap.throws(() => any.match(123),
  'Argument must be a non-string iterable')
tap.throws(() => any.match('/foo/123'),
  'Argument must be a non-string iterable')

If you don't specify a matcher, Any will default to using Id.

const defaultAny = new NMatch([ () => new Any() ])
defaultAny.set(any, any)
defaultAny.set(tap, tap)
tap.is(defaultAny.match([any, 123, tap]), any)
tap.is(defaultAny.match([123, tap, any]), tap)

All Any does is one type check and one for loop; the real work is delegated to the matcher it wraps.

class Any {
  constructor (matcher = new Id()) {
    this._matcher = matcher
  }

  pattern (pattern) {
    return this._matcher.pattern(pattern)
  }

  * match (values) {
    if (values == null || typeof values[Symbol.iterator] !== 'function' || typeof values === 'string') {
      throw new Error('Argument must be a non-string iterable')
    }
    for (let value of values) {
      yield * this._matcher.match(value)
    }
  }
}

Creating your own Matcher

Any object that implements the following two methods is a matcher:

function pattern (pattern) stores the given pattern in the matcher, and returns a container object on which we can set its value. NMatch.set calls this method.

function * match (instance) yields a list of matches for the given instance, ordered from best match to worst. NMatch.match calls this method.

There's no restriction on the type of argument taken by these functions. The one limitation is that they always get called with a single argument. The rest is up to you, and the semantics are pretty flexible.

For example, here's the implementation of the three matchers used in the intro:

class UrlMatcher extends Paths {
  constructor () {
    super('/', /^\{\w+\}$/, /^\{\w+\+\}$/)
  }
}

class MethodMatcher extends Paths {
  constructor () {
    super(null, 'ANY')
  }
}

class UserRoleMatcher {
  constructor () {
    this._users = new Map()
  }

  pattern (user) {
    let u = this._users.get(user)
    if (u === undefined) {
      u = {}
      this._users.set(user, u)
    }

    return u
  }

  * match (term) {
    for (let [user, u] of this._users.entries()) {
      if (term.groups.includes(user.group)) {
        yield u
      }
    }
  }
}

NMatch stores patterns in a tree-like structure, with the leftmost matcher as the root of the tree, and each subsequent matcher one level down in the tree. If you're working with a lot of patterns, you'll typically want to put your most discriminating matchers first to speed up matching operations.

const valueKey = Symbol('nmatch-value')

class NMatch {
  constructor (matcherMakers) {
    this._checkMatcherMakers(matcherMakers)
    this._matcherMakers = matcherMakers
    this._root = this._matcherMakers[0]()
  }

To set a value, we walk down the pattern tree, creating missing nodes as needed, and set the value on the end node.

  set (...patterns) {
    const value = patterns.pop()
    this._checkParts(patterns)
    let cur = this._root
    const nexts = this._matcherMakers.slice(1)
    let pattern, next
    while ((pattern = patterns.shift(), next = nexts.shift())) {
      cur = cur.pattern(pattern)
      if (cur[valueKey] === undefined) {
        cur[valueKey] = next()
      }
      cur = cur[valueKey]
    }
    cur.pattern(pattern)[valueKey] = value
  }

To match an instance, we also walk down the pattern tree, but each step forward is a matching operation that returns 0 or more child nodes. If the first child node doesn't match, we try the second, and so on until we know for sure whether we have a match or not.

  match (...instances) {
    this._checkParts(instances)

    return this._rMatch(this._root, instances)
  }

  _rMatch (matcher, instances) {
    const rest = instances.slice(1)

    for (let candidate of matcher.match(instances[0])) {
      if (rest.length === 0) {
        return candidate[valueKey]
      }

      let subMatch = this._rMatch(candidate[valueKey], rest)
      if (subMatch !== null) {
        return subMatch
      }
    }

    return null
  }

  _checkMatcherMakers (matcherMakers) {
    if (matcherMakers.length < 1) {
      throw new Error('At least one matcher is required')
    }
    for (let matcher of matcherMakers) {
      if (typeof matcher !== 'function') {
        throw new Error('MatcherMakers must be functions')
      }
      let m = matcher()
      if (typeof m.pattern !== 'function' || typeof m.match !== 'function') {
        throw new Error('MatcherMakers must have a `pattern` and a `match` function')
      }
    }
  }

  _checkParts (parts) {
    if (parts.length !== this._matcherMakers.length) {
      throw new Error('Bad part count. ' +
                `Expected ${this._matcherMakers.length}, have ${parts.length}`)
    }
  }
}

Contributing

This project is deliberately left imperfect to encourage you to participate in its development. If you make a Pull Request that

explains and solves a problem,
follows standard style, and
maintains 100% test coverage

it will be merged: this project follows the C4 process.

To make sure your commits follow the style guide and pass all tests, you can add

./.pre-commit

to your git pre-commit hook.

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

API

Built-in Matchers

Id

Paths

Customizing

Any

Creating your own Matcher

Contributing