relatinator

v1.3.0

A humble library for finding related posts and content. Uses tf-idf under the hood. Primarily aimed at static site generators.

Motivation

My "corner of the internet" was lacking a related posts section, and as usual, my first reflex was to try to find something off the shelf. After a couple of tries, I realized nothing was suitable for this particular use case - a static site for which the related post classification must happen at build time.

One approach would be to use a third-party service (plenty of those out there), but I'm a bit allergic to external dependencies for relatively simple problem spaces (I mean, c'mon, it's a blog, not a rocket ship). So, I decided to build my own.

Full post describing the process here: https://darko.io/posts/build-you-a-related-post-classifier

Features

  • Train a tf-idf classifier with your content
  • Get the top N related posts for a given document
  • Get the top N keywords for a given document id
  • Get top related documents for a given term

Usage

Installation

npm i relatinator

Training

Before you can get related documents, you need to train the TF-IDF with your content. To that end, the library exposes a train function that takes an array of documents as input. A document is defined as an object with an id and content property:

  • id - a unique identifier for the document
  • content - the document's contents, expected to be a single string. You can concatenate any metadata, descriptions, or anything else you might want to use for matching.

import { train } from "relatinator";

const documents = [
  {
    id: "1",
    content: "This is the first document",
  },
  {
    id: "2",
    content: "This is the second document",
  },
  {
    id: "3",
    content: "This is the third document",
  },
];

train(documents);
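
Since content is a single string, any metadata has to be flattened before training. Here is a minimal sketch of one way to do that. The Post shape and the toDocument helper are hypothetical names used for illustration; they are not part of relatinator.

```typescript
// Hypothetical shape of a post in a static site generator; not part of relatinator.
interface Post {
  slug: string;
  title: string;
  description?: string;
  tags?: string[];
  body: string;
}

// The { id, content } shape described above for train().
interface TrainingDocument {
  id: string;
  content: string;
}

// Flatten a post into one string so that the title, description and tags
// contribute to matching alongside the body text. Empty parts are skipped.
function toDocument(post: Post): TrainingDocument {
  const parts = [
    post.title,
    post.description ?? "",
    (post.tags ?? []).join(" "),
    post.body,
  ];
  return {
    id: post.slug,
    content: parts.filter((part) => part.trim().length > 0).join("\n"),
  };
}

const example = toDocument({
  slug: "hello-world",
  title: "Hello World",
  tags: ["intro", "meta"],
  body: "My first post.",
});
// example.content is "Hello World\nintro meta\nMy first post."
```

The resulting objects could then be passed to train as an array. Repeating a field (for example the title) is one possible way to give it more weight, though whether that helps depends on your content.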

Getting related documents

Once you've trained the classifier, you can get related documents for a given document by using the getRelated function. It takes the following arguments:

  • documentToCompare - the content of the document for which you want to get related documents
  • id - the id of the document for which you want to get related documents
  • topN - the number of related documents you want to get

import { train, getRelated } from "relatinator";

const documents = [
  {
    id: "1",
    content: "This is the first document",
  },
  {
    id: "2",
    content: "This is the second document",
  },
  {
    id: "3",
    content: "This is the third document",
  },
];

train(documents);

// Get the top 2 posts related to document "1"
const related = getRelated("This is the first document", "1", 2);
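
For a static site, the lookup would typically run once per post at build time. The sketch below shows one possible way to precompute such an index. The buildRelatedIndex helper is hypothetical, and because the return type of getRelated is not documented here, the lookup function is passed in as a parameter and its result type is left generic. A stub stands in for getRelated so the example runs on its own.

```typescript
type Doc = { id: string; content: string };

// Signature mirroring the three arguments listed above; R is whatever the lookup returns.
type RelatedFn<R> = (documentToCompare: string, id: string, topN: number) => R;

// Hypothetical helper: call the lookup once per document and key the results by id.
function buildRelatedIndex<R>(
  docs: Doc[],
  lookup: RelatedFn<R>,
  topN: number
): Record<string, R> {
  const index: Record<string, R> = {};
  for (const doc of docs) {
    index[doc.id] = lookup(doc.content, doc.id, topN);
  }
  return index;
}

const posts: Doc[] = [
  { id: "1", content: "This is the first document" },
  { id: "2", content: "This is the second document" },
  { id: "3", content: "This is the third document" },
];

// Stand-in for relatinator's getRelated (an assumption for this sketch):
// it simply returns the ids of the other documents, with no ranking.
const stubGetRelated = (_content: string, id: string, topN: number): string[] =>
  posts
    .map((post) => post.id)
    .filter((otherId) => otherId !== id)
    .slice(0, topN);

const relatedIndex = buildRelatedIndex(posts, stubGetRelated, 2);
```

In a real build you would call train first and pass the library's getRelated in place of the stub.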

Getting top keywords for a document

You can also get the top keywords for a given document id by using the getTopTerms function. It takes the following arguments:

  • id - the id of the document you want to get top terms for
  • topN - the number of top terms you want to get

// Assuming you've already trained the classifier
import { getTopTerms } from "relatinator";

getTopTerms("your-doc-id-here", 2);

// Example output:
// -> [{ term: 'term1', tfidf: 0.123 }, { term: 'term2', tfidf: 0.456 }]
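
To give a feel for where such scores come from, here is a rough, self-contained sketch of tf-idf term ranking. It is illustrative only: the tokenize and topTermsSketch functions are hypothetical, the tokenizer is naive, and the weighting (raw term count times log(N / document frequency)) is one common variant. The library's actual tokenization and weighting may differ.

```typescript
type Doc = { id: string; content: string };

// Naive tokenizer (an assumption for this sketch): lowercase, split on non-alphanumerics.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// Score every term of one document with tf * log(N / df) and return the top N.
function topTermsSketch(
  docs: Doc[],
  id: string,
  topN: number
): { term: string; tfidf: number }[] {
  const target = docs.find((doc) => doc.id === id);
  if (!target) return [];

  // Term frequency: how often each term occurs in the target document.
  const counts = new Map<string, number>();
  for (const term of tokenize(target.content)) {
    counts.set(term, (counts.get(term) ?? 0) + 1);
  }

  // Document frequency is computed against the set of terms in each document.
  const termSets = docs.map((doc) => new Set(tokenize(doc.content)));
  const scored = Array.from(counts.entries()).map(([term, tf]) => {
    const df = termSets.filter((terms) => terms.has(term)).length;
    return { term, tfidf: tf * Math.log(docs.length / df) };
  });

  return scored.sort((a, b) => b.tfidf - a.tfidf).slice(0, topN);
}

const sketchDocs: Doc[] = [
  { id: "1", content: "This is the first document" },
  { id: "2", content: "This is the second document" },
  { id: "3", content: "This is the third document" },
];

const sketchTop = topTermsSketch(sketchDocs, "1", 1);
// With this weighting, "first" is the only term unique to document "1",
// so it scores log(3) while terms shared by every document score 0.
```

This is why words that appear in every document contribute little to matching, while words specific to a few documents dominate.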

Getting top related documents for a term

You can also get the top related documents for a given term by using the getTopRelatedDocumentsForTerm function. It takes the following arguments:

  • term - the term you want to get top related documents for
  • topN - the number of top related documents you want to get

import { getTopRelatedDocumentsForTerm } from "relatinator";

getTopRelatedDocumentsForTerm("term", 2);

// Example output:
// -> ["doc-id-1", "doc-id-2"]

Roadmap

  • [x] ~Reduce bundle size (natural isn't very tree-shakeable).~ Externalized it and made it a peer dependency in v1.0.3.
  • [ ] Add practical examples
  • [x] ~Add support for extracting top N keywords from a document (possible utility with automated tagging and linking)~ Added in v1.1.0.
  • [ ] Add summarization support (useful for auto-generated descriptions). Will likely have to use Transformers for this one.
  • [x] ~Migrate to monorepo and add Astro integration~ Added in v1.2.0.

Acknowledgment

If you found it useful, I would be grateful if you could leave a star in the project's GitHub repository.

Thank you.