ramekin

v0.1.4

An open source, real-time trend detection library. This project uses machine learning to detect trends in text (e.g. news stories) over time.

Trends are identified by detecting phrases that start occurring much more frequently than they typically do. Various natural language processing and data science techniques are used to ensure similar words are modelled together (e.g. "cycle", "cycling" and "cyclist" all reduce down to a common word form, such as "cycle").
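As a rough illustration of modelling similar words together, the sketch below counts words by a shared stem. The toyStem and countByStem helpers are hypothetical and deliberately crude; they are not the stemmer ramekin uses, and a real stemmer handles far more cases.

```javascript
// Toy suffix stripper for illustration only (an assumption, not ramekin's stemmer).
const SUFFIXES = ['ing', 'ist', 'es', 's', 'e'];

function toyStem(word) {
  const lower = word.toLowerCase();
  for (const suffix of SUFFIXES) {
    // Keep at least three characters so very short words are left alone.
    if (lower.endsWith(suffix) && lower.length - suffix.length >= 3) {
      return lower.slice(0, -suffix.length);
    }
  }
  return lower;
}

// Count occurrences per stem so that related word forms share one counter.
function countByStem(words) {
  const counts = new Map();
  for (const word of words) {
    const stem = toyStem(word);
    counts.set(stem, (counts.get(stem) || 0) + 1);
  }
  return counts;
}

const counts = countByStem(['cycle', 'cycling', 'cyclist']);
```

Here all three words end up under the same key, so a spike in any of them counts towards one term.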

Documents can be grouped by a subject, so it is possible to detect "localised" trends - for example, 7 articles talking about a new bike from Santa Cruz might be a trend within cycling without registering across all subjects. Similar phrases tend to relate to a particular trend, so hierarchical clustering is used to make sure documents related to the same trend are grouped, rather than creating two "trends" about the same thing. For example, "doping scandal" and "Tour de France" are likely to be about the same thing...allegedly.
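To give a feel for how hierarchical clustering can merge phrases that belong to the same trend, here is a minimal sketch. It assumes phrases are compared by the overlap (Jaccard similarity) of the documents they appear in and merged with single linkage; the function names, the similarity measure and the threshold are all assumptions for illustration, not ramekin's actual implementation.

```javascript
// Jaccard similarity between two sets of document IDs.
function jaccard(a, b) {
  let shared = 0;
  for (const id of a) if (b.has(id)) shared++;
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Single linkage: the best similarity between any pair of member phrases.
function clusterSimilarity(c1, c2, docsByPhrase) {
  let best = 0;
  for (const p of c1) {
    for (const q of c2) {
      best = Math.max(best, jaccard(docsByPhrase.get(p), docsByPhrase.get(q)));
    }
  }
  return best;
}

// Repeatedly merge the two most similar clusters until none reach the threshold.
function clusterPhrases(docsByPhrase, threshold) {
  let clusters = [...docsByPhrase.keys()].map((phrase) => [phrase]);
  for (;;) {
    let best = { score: threshold, i: -1, j: -1 };
    for (let i = 0; i < clusters.length; i++) {
      for (let j = i + 1; j < clusters.length; j++) {
        const score = clusterSimilarity(clusters[i], clusters[j], docsByPhrase);
        if (score >= best.score) best = { score, i, j };
      }
    }
    if (best.i === -1) return clusters;
    const merged = clusters[best.i].concat(clusters[best.j]);
    clusters = clusters.filter((_, k) => k !== best.i && k !== best.j);
    clusters.push(merged);
  }
}

// Made-up document IDs: the first two phrases share most of their documents.
const docsByPhrase = new Map([
  ['doping scandal', new Set(['a', 'b', 'c'])],
  ['tour de france', new Set(['a', 'b', 'c', 'd'])],
  ['ramekin trending', new Set(['x', 'y'])],
]);
const clusters = clusterPhrases(docsByPhrase, 0.5);
```

With this data, "doping scandal" and "tour de france" end up in one cluster, while "ramekin trending" stays on its own.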

Keywords: trending, trends, news, natural language processing, NLP, machine learning, artificial intelligence, data science, hierarchical clustering.

Quick Start

Document format

Documents need to be ingested into a ramekin using the following format:

{
  _id: <Unique ID - can be any format>,
  body: "Text",
  date: <ISO Date format string, or JavaScript date object>,
  subject: <Any object>
}
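For instance, a document in this shape might look like the sketch below. All values are made up, the subject is a plain string here (the format allows any object), and looksLikeDocument is a hypothetical helper for checking the shape before ingesting; it is not part of ramekin.

```javascript
// Example document; every value here is a placeholder.
const exampleDocument = {
  _id: 'story-001',
  body: 'Placeholder text about the Tour de France.',
  date: '2018-07-14T09:30:00.000Z',
  subject: 'Cycling',
};

// Hypothetical shape check (an assumption, not a ramekin API).
function looksLikeDocument(doc) {
  // Accept either a valid Date object or a string that Date can parse.
  const dateOk = doc.date instanceof Date
    ? !Number.isNaN(doc.date.getTime())
    : typeof doc.date === 'string' && !Number.isNaN(Date.parse(doc.date));
  return doc._id !== undefined && typeof doc.body === 'string' && dateOk;
}

const isValid = looksLikeDocument(exampleDocument);
```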

"Hello World" example

This is the simplest example. A very crude data file has been created with random text, and two "constructed" trends - one based on a trend in the "Tech" subject for "Ramekin trending" and another in cycling for "Chris Froome Tour de France".

First, install the npm package for ramekin:

   npm i ramekin

Create a simple script that ingests the data from this file, and detects the trends:

   const Ramekin = require('ramekin');
   const ramekin = new Ramekin();

   // load the example stories
   ramekin.ingestAll({..});

   // detect the trends
   ramekin.trending(...);

In a practical example, you will want to look at the last few days. The following code snippet lets you do this:

  get the last 2 days and pass as options...
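The placeholder above is not spelled out in this README. One way to compute a two-day window is sketched below. The lastDays helper and the option names start and end are assumptions for illustration; check the library source for the options that trending() actually accepts.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: returns the window covering the last `days` days up to `now`.
function lastDays(days, now = new Date()) {
  return {
    start: new Date(now.getTime() - days * MS_PER_DAY),
    end: now,
  };
}

// A fixed "now" keeps the example reproducible; omit it to use the current time.
const trendWindow = lastDays(2, new Date('2024-01-10T00:00:00.000Z'));

// Assumption: the option names are not confirmed by this README.
// ramekin.trending(trendWindow);
```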

Configurable

History Window - how far back to look when determining the typical usage of a particular phrase. During development, we found that 90 days gives a good balance of coverage and computation.

Balance - the ability to configure what constitutes a spike for a trend. It's important that this is kept relative to the particular term used. For example, if the word "cycle" typically occurs in 100 posts per day in the cycling category, an extra 10 occurrences is not massively significant. However, if the phrase "Santa Cruz Hightower" typically occurs once or twice per month (which seems reasonable, given that it's an established product), and Santa Cruz then release a new iteration of the bike so that 10 articles about it appear within a short time period, that would be far more statistically significant.
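The two cases above can be compared with a simple relative score. The sketch below uses a Poisson-style score, (observed - expected) / sqrt(expected), which is an illustrative choice and not necessarily the measure ramekin uses. It also assumes the 10 "Santa Cruz Hightower" articles appear in a single day and that "once or twice per month" averages out to 1.5 per 30 days.

```javascript
// Illustrative spike score: how far the observed count sits above the expected
// count, scaled by the expected spread. Assumes expected > 0.
function spikeScore(observed, expected) {
  return (observed - expected) / Math.sqrt(expected);
}

// "cycle": typically 100 posts per day, now 110.
const cycleScore = spikeScore(110, 100);

// "Santa Cruz Hightower": about 1.5 posts per 30 days, now 10 in one day.
const hightowerScore = spikeScore(10, 1.5 / 30);
```

The same 10 extra articles give a far larger score for the rare phrase than for the common word, which is the relative behaviour described above.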

Roadmap

  • Lint the code with ESLint to an ES8 standard.
  • Modularise clustering.
  • Remove the lodash dependency.
  • Implement/reimplement unit tests in Jasmine.
  • Extract the clustering algorithm to a separate module, or reuse an existing module.
  • Write a blog article about how it works.
  • Improve error handling.
  • Implement the @todos.
  • Tidy up data when it's loaded (e.g. prune documents that have fallen outside the window).
  • Fix filtering so that subject-based trending works.
  • Recalculate trends in the background when a new document is added (configurable).
  • Persistence/redis.
  • Run at scale.

Automated tests:

Unit tests:

  • Ensure similar phrases are modelled as one, e.g. "cycle", "cycling", "cyclist".
  • Provide 100% coverage for each function within the Ramekin class.

Microservice

There will be an (initially insecure) API available for creating ramekins, adding news stories and getting the current trends from them.

  • Able to take a load of docs and give trends (by category).
  • Supply with example data across a range of subjects.
  • DONE: Rank documents for each cluster. The more phrases a document covers, the higher it ranks; if these are equal, rank by the length of the sentence.
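The ranking rule in the last item can be sketched as a two-key sort. The rankDocuments function below is hypothetical, matches phrases by simple substring search, and assumes longer text ranks higher on ties, since the README does not say which direction the length tie-break goes.

```javascript
// Rank documents by the number of cluster phrases they cover, then by body length.
// Assumption: longer bodies win ties; the README does not state the direction.
function rankDocuments(docs, phrases) {
  const scored = docs.map((doc) => {
    const text = doc.body.toLowerCase();
    const covered = phrases.filter((p) => text.includes(p.toLowerCase())).length;
    return { doc, covered };
  });
  scored.sort(
    (a, b) => b.covered - a.covered || b.doc.body.length - a.doc.body.length
  );
  return scored.map((entry) => entry.doc);
}

// Made-up documents for illustration.
const ranked = rankDocuments(
  [
    { _id: 2, body: 'Tour de France starts' },
    { _id: 1, body: 'Doping scandal hits the Tour de France' },
    { _id: 3, body: 'Tour de France route announced today' },
  ],
  ['doping scandal', 'Tour de France']
);
const rankedIds = ranked.map((doc) => doc._id);
```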

Thanks & Credits

Thanks to montemishkin for handing over the ramekin npm package. Thanks to everyone involved with the natural npm package.