@salo/mongoose-athena

v1.6.4

A Mongoose plugin for weighted search and pagination

mongoose-athena

A plugin to add weighted search and pagination to your schema.

Usage

Install

yarn add @salo/mongoose-athena

Add Athena to your schema:

const athena = require('@salo/mongoose-athena');

MySchema.plugin(athena, {
  fields: [{
    name: 'name',
    prefixOnly: true,
    threshold: 0.3,
    weight: 2
  }, {
    name: 'biography',
    minSize: 4
  }]
});

Then, to use it with weighting you can do:

MySchema.athena({
  query: { /* something to filter the collection */ },
  term: 'Athena',
  sort: 'relevancy', // this is the key to trigger weighting
  page: 1,
  limit: 20
});

This will search name and biography for the term 'Athena'. If the query is sorted by 'relevancy', a confidenceScore is attached to the results. The result looks like so:

{
  docs: [], // matching records in the collection
  pagination: {
    page: Number,
    hasPrevPage: Boolean,
    hasNextPage: Boolean,
    nextPage: Number || null,
    prevPage: Number || null,
    total: Number
  }
}

Or you can use it simply to paginate:

MySchema.athena({
  query: { /* something to filter the collection */ },
  term: 'Athena',
  sort: '-created_at', // this will not add `confidenceScore` to the results
  page: 1,
  limit: 20
});

API

Field options

| Field | Description | Type | Default |
|------------|-------------|---------|---------|
| name | The field name in your collection | String | |
| prefixOnly | Whether to only match from the start of the string or anywhere in the string e.g. 'ob' would match 'bob' with this off but not when it's on | Boolean | false |
| threshold | Value between 0 and 1. It will only count a score if it is greater or equal to this value | Float | 0 |
| minSize | The length of the string to start matching against. e.g. if minSize is 4 then the term 'bob' will not search against the field | Int | 2 |
| weight | A scaling value to multiply scores by so you can weigh certain fields higher/lower than others | Int | 1 |
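As an illustration of the prefixOnly option, here is a minimal sketch of how a search term could be turned into a pattern that matches either from the start of a string or anywhere in it. The helper names `escapeRegExp` and `buildTermPattern` are hypothetical and not part of Athena's API; the plugin's actual matching may work differently.

```javascript
// Hypothetical helpers for illustration only; not Athena's implementation.

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a case-insensitive pattern for the term.
// With prefixOnly on, the pattern is anchored to the start of the string.
function buildTermPattern(term, prefixOnly = false) {
  const escaped = escapeRegExp(term);
  return new RegExp(prefixOnly ? `^${escaped}` : escaped, 'i');
}

// 'ob' matches 'bob' with prefixOnly off, but not with it on.
console.log(buildTermPattern('ob').test('bob'));
console.log(buildTermPattern('ob', true).test('bob'));
```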

Response

| Field | Description | Type |
|-------------------------|-----------------------------------------|---------------|
| docs | Array of matching documents | Array |
| pagination.page | The current page | Int |
| pagination.hasPrevPage | Whether or not there is a previous page | Boolean |
| pagination.hasNextPage | Whether or not there is a next page | Boolean |
| pagination.nextPage | Value of the next page or null | Int \|\| null |
| pagination.prevPage | Value of the previous page or null | Int \|\| null |
| pagination.total | Total number of matching documents | Int |
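To show how the pagination fields relate to one another, here is a minimal sketch that derives them from page, limit and total. `buildPagination` is a hypothetical helper written for this example; it is not taken from Athena's source, and it assumes pages are numbered from 1.

```javascript
// Hypothetical helper for illustration only; not part of Athena's API.
// Derives the pagination object from the current page, page size and total count.
function buildPagination(page, limit, total) {
  // At least one page, even when there are no matching documents.
  const totalPages = Math.max(1, Math.ceil(total / limit));
  const hasPrevPage = page > 1;
  const hasNextPage = page < totalPages;

  return {
    page,
    hasPrevPage,
    hasNextPage,
    nextPage: hasNextPage ? page + 1 : null,
    prevPage: hasPrevPage ? page - 1 : null,
    total
  };
}

// 45 matching documents at 20 per page gives 3 pages.
console.log(buildPagination(1, 20, 45));
```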

How it works

The crux of it lies in the calculateScore method in the helpers directory. This uses the Jaro-Winkler distance to compute how close your search term (e.g. 'Athena') is to the text in your database. Additionally, text is ranked higher if it appears at the start rather than the end of a string, so 'Athena Rogers' will have a higher confidenceScore than 'Rogers Athena'.
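For readers unfamiliar with Jaro-Winkler, here is a textbook sketch of the similarity measure in plain JavaScript. It is not Athena's calculateScore, whose internals are not shown in this README, and its scores may differ from the plugin's depending on normalisation and the library used. The function names `jaro` and `jaroWinkler` and the prefix scale of 0.1 (the conventional default) are assumptions of this example.

```javascript
// Textbook Jaro similarity: a value between 0 (no similarity) and 1 (identical).
function jaro(a, b) {
  if (a === b) return 1;
  if (a.length === 0 || b.length === 0) return 0;

  // Characters only count as matching within this distance of each other.
  const range = Math.max(0, Math.floor(Math.max(a.length, b.length) / 2) - 1);
  const aMatched = new Array(a.length).fill(false);
  const bMatched = new Array(b.length).fill(false);
  let matches = 0;

  for (let i = 0; i < a.length; i++) {
    const start = Math.max(0, i - range);
    const end = Math.min(b.length - 1, i + range);
    for (let j = start; j <= end; j++) {
      if (!bMatched[j] && a[i] === b[j]) {
        aMatched[i] = true;
        bMatched[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;

  // Count matched characters that appear in a different order.
  let outOfOrder = 0;
  let k = 0;
  for (let i = 0; i < a.length; i++) {
    if (!aMatched[i]) continue;
    while (!bMatched[k]) k++;
    if (a[i] !== b[k]) outOfOrder++;
    k++;
  }
  const transpositions = outOfOrder / 2;

  return (matches / a.length + matches / b.length + (matches - transpositions) / matches) / 3;
}

// Jaro-Winkler boosts the Jaro score for strings sharing a prefix of up to 4 characters.
function jaroWinkler(a, b, prefixScale = 0.1) {
  const base = jaro(a, b);
  const maxPrefix = Math.min(4, a.length, b.length);
  let prefix = 0;
  while (prefix < maxPrefix && a[prefix] === b[prefix]) prefix++;
  return base + prefix * prefixScale * (1 - base);
}

console.log(jaroWinkler('MARTHA', 'MARHTA').toFixed(4)); // 0.9611
```

The comparison is case-sensitive as written, so callers that want case-insensitive matching would lowercase both strings first.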

One thing to note is that the search term is not split on spaces, but text in the database is. So, extending our previous example with term = 'Athena Rogers', the text in the database is split into ['Athena', 'Rogers']. The term 'Athena Rogers' doesn't directly match 'Athena' or 'Rogers' (it scores 0.93 and 0.41 respectively), but these scores are accumulated (0.93 + 0.41) and then multiplied by the position in the string and any weighting applied to the field. We could split the search term to get direct matches and higher scores, but this would slow the score calculation down by an order of magnitude, as every part of the search term would need matching against every part of the field. In my testing the current approach lends itself to speed and logical weighting.
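The accumulation described above can be sketched roughly as follows. This is a guess at the general shape, not Athena's actual calculateScore: the function name `accumulateScore`, the pluggable `similarity` argument, and in particular the default `positionFactor` (1 for the first word, 1/2 for the second, and so on) are assumptions made for this example, since the README does not spell out the position multiplier.

```javascript
// Hypothetical sketch; the real scoring lives in Athena's calculateScore helper.
// `similarity` is any function returning a value between 0 and 1,
// for example a Jaro-Winkler implementation.
function accumulateScore(term, text, similarity, options = {}) {
  const {
    weight = 1,
    threshold = 0,
    minSize = 2,
    // Assumed position multiplier: earlier words count for more.
    positionFactor = (index) => 1 / (index + 1)
  } = options;

  // Terms shorter than minSize are not searched against the field.
  if (term.length < minSize) return 0;

  // The field text is split on spaces; the search term is not.
  const tokens = text.split(/\s+/).filter(Boolean);
  const needle = term.toLowerCase();

  let total = 0;
  tokens.forEach((token, index) => {
    const score = similarity(needle, token.toLowerCase());
    // Scores below the threshold are ignored.
    if (score >= threshold) total += score * positionFactor(index);
  });

  return total * weight;
}

// Usage with a trivial exact-match similarity, to show the position effect.
const exact = (a, b) => (a === b ? 1 : 0);
console.log(accumulateScore('Athena', 'Athena Rogers', exact)); // 1
console.log(accumulateScore('Athena', 'Rogers Athena', exact)); // 0.5
```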

Pagination

The pagination is based on mongoose-paginate-v2 and mongoose-aggregate-paginate-v2. Athena's implementation is an amalgamation of both libraries and it transparently determines if the query is an aggregate or not.

// Assumption: `fullNameQuery` is an aggregate, so Athena paginates it as one
const fullNameQuery = MySchema.aggregate();
const result = await MySchema.athena({
  query: fullNameQuery,
  limit: 10
});
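The README does not show how Athena tells an aggregate from a plain filter. One possible approach is duck typing, sketched below: `isAggregate` is a hypothetical helper, and the check relies only on the fact that Mongoose Aggregate objects expose `pipeline()` and `exec()` methods while a plain filter object does not.

```javascript
// Hypothetical helper for illustration only; Athena's own detection may differ.
// Treats anything exposing pipeline() and exec() as an aggregate.
function isAggregate(query) {
  return (
    Boolean(query) &&
    typeof query.pipeline === 'function' &&
    typeof query.exec === 'function'
  );
}

// A stand-in object shaped like an aggregate, so the example runs without Mongoose.
const fakeAggregate = { pipeline: () => [], exec: async () => [] };

console.log(isAggregate(fakeAggregate));    // true
console.log(isAggregate({ name: 'Athena' })); // false
```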

Publishing

  1. Create a feature branch from master.
  2. Open a PR from your feature branch back to master. This can be repeated multiple times between releases.
  3. For each change, update the draft release on GitHub to maintain an accurate changelog.
  4. When you are ready to release the library, check out master, increment the version in package.json and push back to origin.
  5. On GitHub, publish the draft release, ensuring the tag matches the package.json version number. When you publish the tag, the CI should kick in and automatically publish for you.

Testing

Athena currently has 100% test coverage.

Roadmap

  • Make options (e.g. weighting, minSize) configurable outside of the schema definition.
  • Add more robust tests to ensure there aren't regressions in options going to pagination (e.g. select, sort, etc.).

Prior art (and disclaimer)

I'm not an expert in any of these fields and have relied heavily on a few prior projects to reach this point. There's a very high chance there are more efficient ways to accomplish this, and I welcome PRs to help with this!

That said, many thanks to: