elasticsearch-synonyms

v1.0.2

A library for parsing synonyms

Synonyms are hard, let's face it

Well, they aren't really, just check out a thesaurus. The difficulty comes when we use phrases as synonyms. Because Solr and Elasticsearch tokenize on whitespace (' '), phrases are broken up and the results are not what we expect. Like I say, synonyms are hard.

I'm also not worried about case at the moment, so it's RIRO (rubbish in, rubbish out): I expect your parameters to be in the exact case you want.

The following is taken from the Elasticsearch synonym token filter documentation:

# Blank lines and lines starting with pound are comments.

# Explicit mappings match any token sequence on the LHS of "=>"
# and replace with all alternatives on the RHS.  These types of mappings
# ignore the expand parameter in the schema.
# Examples:
i-pod, i pod => ipod,
sea biscuit, sea biscit => seabiscuit

# Equivalent synonyms may be separated with commas and give
# no explicit mapping.  In this case the mapping behavior will
# be taken from the expand parameter in the schema.  This allows
# the same synonym file to be used in different synonym handling strategies.
# Examples:
ipod, i-pod, i pod
foozball , foosball
universe , cosmos

# If expand==true, "ipod, i-pod, i pod" is equivalent
# to the explicit mapping:
ipod, i-pod, i pod => ipod, i-pod, i pod
# If expand==false, "ipod, i-pod, i pod" is equivalent
# to the explicit mapping:
ipod, i-pod, i pod => ipod

# Multiple synonym mapping entries are merged.
foo => foo bar
foo => baz
# is equivalent to
foo => foo bar, baz

There are four permutations of these synonyms:

  • Simple expansion (a,b,c)
  • Simple contraction (a,b,c => a)
  • Genre expansion (a => a,b,c)
  • Explicit mappings (a,b,c => a,b,c)

Simple expansion

Simple expansion (equivalent synonyms) is a list of single words separated by commas. Every term is treated as equal to every other term.

football,soccer,foosball

Searches for soccer would return foosball and football.

Phrases can also be covered by this, as long as the LHS equals the RHS on either side of the fat arrow (=>).

Simple contraction

The key is in the term 'contraction': words on the LHS are replaced by the term(s) on the RHS.

leap,hop => jump

This has to be applied at index time as well as at query time. I think this is because at index time the terms on the left are replaced with the term on the right, so for a search for "hop" to return results, the same contraction has to be applied to the query.
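The reasoning above can be sketched with a toy model. This is not how Elasticsearch analyzes text internally; `rules`, `contractTokens`, `matches` and the sample sentence are all made up for illustration.

```javascript
// Toy model only: a rule table for "leap,hop => jump".
const rules = new Map([['leap', 'jump'], ['hop', 'jump']]);

// Replace every token that has a contraction rule; leave the rest alone.
function contractTokens(tokens) {
  return tokens.map((t) => rules.get(t) ?? t);
}

// Index time: the stored tokens no longer contain "leap".
const indexed = new Set(contractTokens('leap over the fence'.split(' ')));

// A query matches when every query token is among the indexed tokens.
function matches(queryTokens) {
  return queryTokens.every((t) => indexed.has(t));
}

const rawQuery = ['hop'];
const contractedQuery = contractTokens(rawQuery);

console.log(matches(rawQuery));        // false: "hop" was never indexed
console.log(matches(contractedQuery)); // true: "hop" became "jump"
```

Only "jump" ends up in the index, so the query has to go through the same rule before it can hit anything.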

Genre expansion

This sets up genres. For example, a cat is a type of pet. A kitten is a type of cat, which is a type of pet. A dog is a pet, and a puppy is a type of dog.

cat => cat,pet
kitten => kitten,cat,pet
dog => dog,pet
puppy => puppy,dog,pet

Searching 'pet' would return 'cat', 'kitten', 'dog', 'puppy'.
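A toy lookup makes the direction of the rules easier to see. It assumes each term was indexed together with its genres; `genreRules` and `termsMatching` are illustrative names, not part of this library or of Elasticsearch.

```javascript
// Toy model only: the four genre rules from the example above.
const genreRules = {
  cat: ['cat', 'pet'],
  kitten: ['kitten', 'cat', 'pet'],
  dog: ['dog', 'pet'],
  puppy: ['puppy', 'dog', 'pet'],
};

// Terms whose expansion contains the search term, i.e. what a search would hit
// if each term had been indexed together with its genres.
function termsMatching(search) {
  return Object.keys(genreRules).filter((term) => genreRules[term].includes(search));
}

console.log(termsMatching('pet')); // [ 'cat', 'kitten', 'dog', 'puppy' ]
console.log(termsMatching('cat')); // [ 'cat', 'kitten' ]
```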

Explicit mapping

Explicit mappings match any token sequence on the LHS of "=>" and replace it with all alternatives on the RHS. In other words, terms on the left are replaced by the terms on the right. Phrases are a problem here because Elasticsearch tokenizes on whitespace.

a,b,c => a,b,c
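For context, rule strings in these four shapes usually end up in the `synonyms` list of an Elasticsearch synonym token filter. The following is a rough sketch of such an index settings body written as a JavaScript object. The names `my_synonyms` and `my_analyzer` are placeholders, and the choice of the whitespace tokenizer and lowercase filter is an assumption, not something this README prescribes.

```javascript
// Rule strings in the four shapes described above.
const synonyms = [
  'football,soccer,foosball',   // simple expansion
  'leap,hop => jump',           // simple contraction
  'kitten => kitten,cat,pet',   // genre expansion
  'a,b,c => a,b,c',             // explicit mapping
];

// Index settings body; "my_synonyms" and "my_analyzer" are placeholder names.
const indexSettings = {
  settings: {
    analysis: {
      filter: {
        // "expand" only affects rules without "=>" (the simple expansion line).
        my_synonyms: { type: 'synonym', expand: true, synonyms },
      },
      analyzer: {
        my_analyzer: {
          type: 'custom',
          tokenizer: 'whitespace',
          filter: ['lowercase', 'my_synonyms'],
        },
      },
    },
  },
};

console.log(JSON.stringify(indexSettings, null, 2));
```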

Install

npm install elasticsearch-synonyms

const s = require('elasticsearch-synonyms');

Methods

s.expand(array)

Takes an array and returns an explicit mapping as a string: the terms are comma delimited and appear on both sides of the fat arrow.

Turns:

['u s a', 'usa', 'united states of america']

into:

'u s a,usa,united states of america => u s a,usa,united states of america'
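A minimal sketch of the behaviour described above, for readers who want to see it spelled out. `expandSketch` is a hypothetical stand-in written from the example, not the library's source.

```javascript
// Hypothetical stand-in for s.expand, written from the description above.
function expandSketch(terms) {
  const joined = terms.join(',');
  // The same comma-delimited list goes on both sides of the fat arrow.
  return `${joined} => ${joined}`;
}

console.log(expandSketch(['u s a', 'usa', 'united states of america']));
// u s a,usa,united states of america => u s a,usa,united states of america
```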

s.expandString(string)

Takes a string of words separated by spaces and returns a comma-delimited string: 'wood bark tree splinter' becomes 'wood,bark,tree,splinter'.
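Roughly, this amounts to a split and a join. `expandStringSketch` below is a hypothetical stand-in; how the real method treats repeated or leading spaces is not documented, so collapsing them here is an assumption.

```javascript
// Hypothetical stand-in for s.expandString.
function expandStringSketch(words) {
  // Assumption: runs of whitespace count as one separator and empty pieces are dropped.
  return words.split(/\s+/).filter(Boolean).join(',');
}

console.log(expandStringSketch('wood bark tree splinter')); // wood,bark,tree,splinter
```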

s.contract(array, [replacement])

The contract method takes an array and performs a simple contraction (a,b,c => a). If the optional replacement parameter is omitted, it uses the first non-phrase (single word) as the replacement. For example:

['a', 'b b', 'c', 'd']

becomes:

'a,c,d,b b => a'

If every element is a phrase, there is no single word to contract to, so each phrase is expanded instead:

['a a', 'b b', 'c c', 'd d']

becomes:

'a a,b b,c c,d d => a a,b b,c c,d d'
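The rules above can be sketched as follows. `contractSketch` is a hypothetical stand-in inferred from the two examples; in particular, keeping the single-words-first ordering of the LHS when a replacement is passed is an assumption.

```javascript
// Hypothetical stand-in for s.contract, inferred from the two examples above.
function contractSketch(terms, replacement) {
  const isPhrase = (t) => t.includes(' ');
  const words = terms.filter((t) => !isPhrase(t));
  const phrases = terms.filter(isPhrase);
  // The first example lists single words before phrases on the LHS.
  const lhs = [...words, ...phrases].join(',');
  // An explicit replacement wins; otherwise the first single word is used.
  const target = replacement ?? words[0];
  // No replacement and no single word: fall back to expanding every phrase.
  return `${lhs} => ${target ?? lhs}`;
}

console.log(contractSketch(['a', 'b b', 'c', 'd']));       // a,c,d,b b => a
console.log(contractSketch(['a a', 'b b', 'c c', 'd d'])); // a a,b b,c c,d d => a a,b b,c c,d d
console.log(contractSketch(['leap', 'hop'], 'jump'));      // leap,hop => jump
```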

s.genre(object)

The genre method takes a hierarchy object and performs genre expansion (a => a,b,c).

Given the following object:

{
  pet: {
    cat: {
      kitten: 'kitten',
    },
    dog: {
      puppy: 'puppy',
    }
  }
}

Result will be:

cat => cat,pet
kitten => kitten,cat,pet
dog => dog,pet
puppy => puppy,dog,pet

The object must have exactly one common ancestor (a single root key). Every descendant produces one line: the term itself on the LHS, then the fat arrow, then the term followed by its ancestors, nearest first.
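One way to picture this is a depth-first walk that carries the ancestors along. `genreSketch` is a hypothetical stand-in, not the library's source, and returning the rules as one newline-joined string is an assumption.

```javascript
// Hypothetical stand-in for s.genre, written from the example above.
function genreSketch(hierarchy) {
  const lines = [];
  // Walk the tree depth-first, carrying the ancestors nearest-first.
  function walk(node, ancestors) {
    for (const [term, children] of Object.entries(node)) {
      // The root has no ancestors, so it gets no rule of its own.
      if (ancestors.length > 0) {
        lines.push(`${term} => ${[term, ...ancestors].join(',')}`);
      }
      // Leaves are strings (kitten: 'kitten'); only objects have children.
      if (children !== null && typeof children === 'object') {
        walk(children, [term, ...ancestors]);
      }
    }
  }
  walk(hierarchy, []);
  return lines.join('\n');
}

console.log(genreSketch({
  pet: {
    cat: { kitten: 'kitten' },
    dog: { puppy: 'puppy' },
  },
}));
// cat => cat,pet
// kitten => kitten,cat,pet
// dog => dog,pet
// puppy => puppy,dog,pet
```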

s.explicit(array, [array])

Given a single array, the terms are comma delimited on the LHS and duplicated on the RHS:

s.explicit(['g b', 'gb', 'great britain']);
> g b,gb,great britain => g b,gb,great britain

Given two arrays, the second array becomes the RHS:

s.explicit(['g b', 'gb', 'great britain'], ['britain', 'england', 'scotland', 'wales']);
> g b,gb,great britain => britain,england,scotland,wales
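Both call shapes fit in a few lines. `explicitSketch` is a hypothetical stand-in; using a default parameter for the one-array case is just one way to write it.

```javascript
// Hypothetical stand-in for s.explicit.
// With one array the RHS defaults to the LHS; with two, the second is the RHS.
function explicitSketch(lhsTerms, rhsTerms = lhsTerms) {
  return `${lhsTerms.join(',')} => ${rhsTerms.join(',')}`;
}

console.log(explicitSketch(['g b', 'gb', 'great britain']));
// g b,gb,great britain => g b,gb,great britain

console.log(explicitSketch(['g b', 'gb', 'great britain'], ['britain', 'england', 'scotland', 'wales']));
// g b,gb,great britain => britain,england,scotland,wales
```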

s.stringify(array or object)

Takes an array or object and stringifies it. With an object, each top-level value is flattened to a comma-delimited line and the lines are joined with a newline (only the top level is flattened):

{
  a: ['a', 'b'],
  c: ['c', 'd'],
}

becomes:

'a,b\nc,d'
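A sketch of that flattening. `stringifySketch` is a hypothetical stand-in; joining a plain array with commas and passing non-array values through `String()` are assumptions.

```javascript
// Hypothetical stand-in for s.stringify.
function stringifySketch(input) {
  // Assumption: a plain array becomes one comma-delimited line.
  if (Array.isArray(input)) return input.join(',');
  // Only the top-level values are flattened, one line per attribute.
  return Object.values(input)
    .map((value) => (Array.isArray(value) ? value.join(',') : String(value)))
    .join('\n');
}

const out = stringifySketch({ a: ['a', 'b'], c: ['c', 'd'] });
console.log(JSON.stringify(out)); // "a,b\nc,d"
```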

s.stringToArray(string)

Takes a string and splits it on the newline character. Comments (lines starting with #) are removed. This is the starting point for config file processing.
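In code this is roughly a split followed by a filter. `stringToArraySketch` is a hypothetical stand-in; trimming each line and dropping blank lines are assumptions based on the file format quoted in the introduction, where blank lines count as comments.

```javascript
// Hypothetical stand-in for s.stringToArray.
function stringToArraySketch(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    // Blank lines and lines starting with # are comments in the synonym format.
    .filter((line) => line !== '' && !line.startsWith('#'));
}

console.log(stringToArraySketch('# Examples:\nfoo => baz\n\nuniverse , cosmos'));
// [ 'foo => baz', 'universe , cosmos' ]
```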

s.parseFile(string)

Takes a config file (as a string), like the example token filter file in the introduction, and converts it to an object (tokens are expanded by default):

{
  'i-pod': ['ipod', 'i-pod', 'i pod'],
  'i pod': ['ipod', 'i-pod', 'i pod'],
  ipod: ['ipod', 'i-pod', 'i pod'],
  'sea biscuit': ['seabiscuit'],
  'sea biscit': ['seabiscuit'],
  foozball: ['foozball', 'foosball'],
  foosball: ['foozball', 'foosball'],
  universe: ['universe', 'cosmos'],
  cosmos: ['universe', 'cosmos'],
  foo: ['foo bar', 'baz']
}
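The merging that produces an object like this can be sketched as below. `parseFileSketch` is a hypothetical stand-in and only one plausible reading of the behaviour: the `expand` flag, the de-duplication, and the handling of the trailing comma are assumptions, not the library's documented internals.

```javascript
// Hypothetical stand-in for s.parseFile.
function parseFileSketch(text, expand = true) {
  // Split on commas, trim, and drop empty pieces (e.g. after a trailing comma).
  const splitTerms = (s) => s.split(',').map((t) => t.trim()).filter(Boolean);
  const merged = new Map();
  // Append targets to a key's list, skipping ones already present.
  const merge = (key, targets) => {
    if (!merged.has(key)) merged.set(key, []);
    const list = merged.get(key);
    for (const t of targets) if (!list.includes(t)) list.push(t);
  };
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === '' || line.startsWith('#')) continue; // comments
    const [left, right] = line.split('=>');
    const lhs = splitTerms(left);
    // No "=>": equivalent synonyms, expanded to all terms or contracted to the first.
    const rhs = right !== undefined ? splitTerms(right) : expand ? lhs : [lhs[0]];
    for (const key of lhs) merge(key, rhs);
  }
  return Object.fromEntries(merged);
}

const sample = [
  '# Multiple synonym mapping entries are merged.',
  'foo => foo bar',
  'foo => baz',
  'universe , cosmos',
].join('\n');

console.log(parseFileSketch(sample).foo);      // [ 'foo bar', 'baz' ]
console.log(parseFileSketch(sample).universe); // [ 'universe', 'cosmos' ]
```

Repeated entries for the same key are merged rather than overwritten, which matches the "foo => foo bar, baz" equivalence in the token filter file.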

Testing

Run npm run test
