
manger

v8.0.5

Published

cache feeds

Downloads

81


manger - cache feeds

The Manger Node.js package caches RSS and Atom formatted XML feeds and implements an interface to query entries by feed and time. The obvious challenge is to build a resilient system in the face of potentially misconfigured servers and malformed feeds. Most of Manger’s API is implemented as streams.

Manger leverages the lexicographical key sort order of LevelDB. The keys are designed to stream feeds or entries in time ranges between now and some user defined point in the past.
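To illustrate why lexicographical key order makes time-range queries cheap, here is a minimal sketch. The key layout (`entry`, URL, zero-padded timestamp, joined by a NUL byte) and the helper names `entryKey` and `range` are assumptions made for this example, not Manger's actual schema. A sorted array stands in for LevelDB.

```javascript
// Hypothetical key layout, NOT Manger's actual schema.
const SEP = '\x00';

// Zero-pad the timestamp so that string order equals numeric order.
function entryKey(url, timestamp) {
  return ['entry', url, String(timestamp).padStart(15, '0')].join(SEP);
}

// Select the keys of one feed in the range (since, now], the way a
// range scan over lexicographically sorted keys would.
function range(sortedKeys, url, since, now) {
  const gt = entryKey(url, since);
  const lte = entryKey(url, now);
  return sortedKeys.filter((key) => key > gt && key <= lte);
}

// A sorted array stands in for the database.
const keys = [
  entryKey('http://b.example/feed', 200),
  entryKey('http://a.example/feed', 3000),
  entryKey('http://a.example/feed', 100),
  entryKey('http://a.example/feed', 20),
].sort();

const hits = range(keys, 'http://a.example/feed', 20, 5000);
console.log(hits.length);
```

Because all keys of one feed share a prefix and the timestamps sort as strings, a single contiguous scan returns exactly the requested time range.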

REPL

There’s a REPL for exploring the API.

$ npm start

> [email protected] start /Users/michael/node/manger
> ./repl.js

manger> const feeds = cache.feeds()
manger> feeds.write('http://rss.art19.com/the-daily')
true
manger> read(feeds, 'title')
manger> 'The Daily'
manger>

Data and Types

void()

null | undefined Absence of any object value, intentional or not.

str()

String() | void() An optional String().

html()

A sanitized HTML String() with a limited set of tags: 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'blockquote', 'p', 'a', 'ul', 'ol', 'li', 'b', 'i', 'strong', 'em', 'code', 'br', 'div', 'pre'.

enclosure()

A related resource of an entry().

  • href str()
  • length Number()
  • type str()

entry()

An individual entry.

  • author str()
  • duration Number() | null The value of the <itunes:duration> tag in seconds or null.
  • enclosure enclosure() | void()
  • id String() A globally unique SHA-1 identifier for this entry, not the original id from the feed.
  • image str()
  • link str()
  • originalURL str() The originally requested URL.
  • subtitle str()
  • summary html() | void()
  • title str()
  • updated str()
  • url str() The URL of this entry’s feed.

Why SHA-1, cryptographic hashing, to produce the id property?

Having a good hash is good for being able to trust your data, it happens to have some other good features, too, it means when we hash objects, we know the hash is well distributed and we do not have to worry about certain distribution issues.

Read more here.
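As a rough sketch of how such an identifier could be derived with Node's built-in crypto module: which values Manger actually feeds into the hash is not stated here, so the inputs (`feedUrl`, `originalId`) and the function name `entryId` are assumptions for illustration only.

```javascript
const crypto = require('node:crypto');

// Hypothetical inputs: the feed URL plus the identifier found in the feed.
// Which fields Manger really hashes is an assumption of this sketch.
function entryId(feedUrl, originalId) {
  return crypto
    .createHash('sha1')
    .update(feedUrl)
    .update('\x00') // separator, so ('ab', 'c') and ('a', 'bc') differ
    .update(originalId)
    .digest('hex');
}

const id = entryId('http://example.com/feed.xml', 'episode-1');
console.log(id.length); // 40 hexadecimal characters
```

The same inputs always produce the same id, so re-parsing a feed does not create duplicate entries under this scheme.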

feed()

One metadata object per XML feed.

  • author str()
  • copyright str()
  • id str()
  • image str()
  • language str()
  • link str()
  • originalURL str()
  • payment str()
  • subtitle str()
  • summary str()
  • title str()
  • ttl str()
  • updated Number()
  • url str()

Creating a Cache

const {createLevelDB, Manger, Opts} = require('manger');

We need a Level database for creating a cache. Additionally, we can pass options or use defaults.

Opts

Options for a Manger instance.

  • cacheSize = 16 * 1024 * 1024 LevelDB cache size for uncompressed blocks.
  • counterMax = 500 Limits the items in the ranks counter.
  • failures = { set, get, has } LRU cache for failures.
  • force = false A flag to bypass cached data entirely.
  • highWaterMark Buffer level when stream.write() starts returning false.
  • isEntry = function (entry) { return true } A function to validate entries.
  • isFeed = function (feed) { return true } A function to validate feeds.
  • objectMode = false Read Object() instead of Buffer().
  • redirects = { set, get, has } LRU cache for redirects.

const db = createLevelDB('/tmp/manger');
const opts = {}; // | new Opts()
const cache = new Manger(db, opts);
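The failures and redirects options are described as LRU caches exposing set, get, and has. A minimal sketch of an object with that shape follows. The class name `TinyLRU` is hypothetical, and whether Manger expects anything beyond these three methods is an assumption.

```javascript
// Hypothetical LRU cache exposing the { set, get, has } shape.
class TinyLRU {
  constructor(max = 100) {
    this.max = max;
    this.map = new Map(); // a Map iterates in insertion order
  }

  has(key) {
    return this.map.has(key);
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    // Re-insert to mark the key as most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.max) {
      // The first key in iteration order is the least recently used.
      this.map.delete(this.map.keys().next().value);
    }
    return this;
  }
}

const failures = new TinyLRU(2);
failures.set('http://a.example/feed', 'ENOTFOUND');
failures.set('http://b.example/feed', 'ETIMEDOUT');
failures.get('http://a.example/feed'); // a is now the most recent
failures.set('http://c.example/feed', 'ECONNRESET'); // evicts b
console.log(failures.has('http://b.example/feed'));
```

An instance like this could presumably be passed as the failures or redirects option, for example `{failures: new TinyLRU(500)}`.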

Querying the Manger Cache

A Manger cache object provides multiple streams to access data. Selecting data works by writing query objects to these streams.

const {Queries, Query} = require('manger');

Query

A Query selects a feed, or the entries of a feed, in the time range between since and Date.now(). Despite what the name suggests, the since date is exclusive: if you pass the updated date of the latest entry received, that entry will not be included in the response.

The Query constructor throws if it is given an invalid URL.
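The exclusive lower bound is easy to get wrong, so here is a small sketch of the selection rule. It uses plain arrays rather than Manger itself. The function name `select` and the numeric `updated` timestamps (milliseconds) are assumptions made for this example.

```javascript
// Hypothetical stand-in for the time-range selection: `since` is
// exclusive, the upper bound `now` is inclusive.
function select(entries, since = 0, now = Date.now()) {
  return entries.filter((entry) => entry.updated > since && entry.updated <= now);
}

const entries = [
  { title: 'first', updated: 1000 },
  { title: 'second', updated: 2000 },
  { title: 'third', updated: 3000 },
];

// Passing the `updated` date of the latest entry already received
// returns only newer entries, never that entry itself.
const fresh = select(entries, 2000, 5000);
console.log(fresh.map((entry) => entry.title));
```

This is what makes it safe for a client to remember only the updated date of its newest entry and pass it back as since on the next request.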

Queries

For convenience, the Queries class transforms JSON to queries which can be piped to feeds() and entries(). The expected JSON input format:

[
  { "url": "http://feeds.5by5.tv/directional" },
  { "url": "http://www.newyorker.com/feed/posts",
    "since": 1433083971124 },
  { "url": "https://www.joyent.com/blog/feed",
    "since": "May 2015" },
  ...
]

Where "since" can be anything Date() is able to parse.
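To make the input format concrete, the following sketch normalizes such JSON into plain objects with a numeric since. It does not use the Queries class. The function name `toQueries`, the default of 0 for a missing since, and the error types are assumptions of this example.

```javascript
// Hypothetical normalizer for the JSON format shown above.
function toQueries(json) {
  return JSON.parse(json).map(({ url, since }) => {
    // The WHATWG URL constructor throws a TypeError for an invalid URL.
    const { href } = new URL(url);
    // Anything Date can parse is accepted; a missing value means "everything".
    const time = since === undefined ? 0 : new Date(since).getTime();
    if (Number.isNaN(time)) {
      throw new RangeError(`Unparsable since: ${since}`);
    }
    return { url: href, since: time };
  });
}

const queries = toQueries(JSON.stringify([
  { url: 'http://example.com/feed' },
  { url: 'http://example.com/posts', since: 1433083971124 },
  { url: 'https://example.com/blog/feed', since: '2015-05-01T00:00:00.000Z' },
]));
console.log(queries);
```

Note that only ISO 8601 strings are parsed consistently across JavaScript engines. Looser formats such as "May 2015" depend on the engine's own date parser.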

Querying Feeds

The distinction between feed and entries might be unclear: a feed models the metadata of an RSS or Atom feed (title, author, published, etc.), while entries are the actual items in the feed. The two are kept separate so that feed metadata is not transmitted repeatedly; after all, Manger tries to reduce round-trips.

cache.feeds()

Returns a Transform stream that transforms queries or URL strings to feeds.

  • write(query() | String())
  • read() Buffer() | String() | Object() | feed()

const feeds = cache.feeds();

feeds.on('readable', () => {
  let data;

  // Arrow functions do not bind `this`, so read from the stream directly.
  while ((data = feeds.read()) !== null) {
    console.log(data);
  }
});

feeds.write(new Query({url: 'http://rss.art19.com/the-daily'}));
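To try the write and read pattern without a database or network access, the sketch below uses a stand-in Transform stream in object mode. `fakeFeeds` and `drain` are hypothetical helpers, and the objects they produce only mimic the shape of a feed(). The real stream returned by cache.feeds() fetches and caches data.

```javascript
const { Transform } = require('node:stream');

// Hypothetical stand-in for cache.feeds(): accepts URL strings or
// objects with a `url` property and emits feed-like objects.
function fakeFeeds() {
  return new Transform({
    objectMode: true,
    transform(query, _encoding, callback) {
      const url = typeof query === 'string' ? query : query.url;
      callback(null, { url, title: `Feed at ${url}` });
    },
  });
}

// Read everything that is currently buffered on the readable side.
function drain(stream) {
  const items = [];
  let data;
  while ((data = stream.read()) !== null) {
    items.push(data);
  }
  return items;
}

const feeds = fakeFeeds();

feeds.on('readable', () => {
  for (const feed of drain(feeds)) {
    console.log(feed.title);
  }
});

feeds.write('http://example.com/a.xml');
feeds.write({ url: 'http://example.com/b.xml' });
feeds.end();
```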

Querying Entries

Access granular ranges of items within multiple feeds at once – using a single stream.

cache.entries()

Returns a Transform stream that transforms queries or URLs to entries.

  • write(Buffer() | String() | Object() | query())
  • read() Buffer() | entry()

const entries = cache.entries();

entries.on('readable', () => {
  let data;

  // Arrow functions do not bind `this`, so read from the stream directly.
  while ((data = entries.read()) !== null) {
    console.log(data);
  }
});

entries.write(new Query({url: 'http://rss.art19.com/the-daily'}));

Updating the Cache

It’s common to update all cached feeds on a regular basis. Manger uses a counter to reason about which feeds to update first.

cache.flushCounter(cb)

  • cb Function(error, count) | void() The callback
    • error Error() | void() The possible error
    • count Number() Total number of cached feeds

A Manger cache keeps an in-memory count of how many times feeds have been accessed. This function flushes the counter to disk, updating the ranks index.
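The relationship between the in-memory counter and the ranks index can be sketched as follows. This is a conceptual model only: the class `RankCounter` and its methods are hypothetical, and it uses Maps where Manger persists the index to disk.

```javascript
// Hypothetical model of an access counter that is flushed into a ranks index.
class RankCounter {
  constructor() {
    this.counter = new Map(); // in-memory access counts since the last flush
    this.ranks = new Map(); // stands in for the ranks index on disk
  }

  // Record one access of a feed.
  hit(url) {
    this.counter.set(url, (this.counter.get(url) || 0) + 1);
  }

  // Merge the counter into the ranks index and reset it.
  // Returns the total number of ranked feeds.
  flush() {
    for (const [url, count] of this.counter) {
      this.ranks.set(url, (this.ranks.get(url) || 0) + count);
    }
    this.counter.clear();
    return this.ranks.size;
  }

  // URLs sorted by rank, highest first, optionally limited.
  ranked(limit = Infinity) {
    return [...this.ranks.entries()]
      .sort((a, b) => b[1] - a[1])
      .slice(0, limit)
      .map(([url]) => url);
  }
}

const counter = new RankCounter();
counter.hit('http://a.example/feed');
counter.hit('http://b.example/feed');
counter.hit('http://b.example/feed');

const before = counter.ranked(); // nothing is ranked before flushing
const total = counter.flush();
const after = counter.ranked();
console.log(before, total, after);
```

In this model, as in the description above, accesses influence the ordering only after a flush.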

cache.update()

Updates all ranked feeds and returns a stream that emits the URLs of updated feeds. This is, of course, a resource-heavy, long-running operation. Feeds are updated in order of popularity, using the rank index; therefore, flushCounter must have been invoked before this method has any effect.

  • read() str()

Removing Feeds from the Cache

Sometimes feeds contain redundant or unwanted items, so you might want to remove them from the cache.

cache.remove(url, cb)

Attempts to remove a feed matching the url from the cache and applies callback without error if this succeeds.

  • url String() The URL of the feed
  • cb Function(error) | void() The callback
    • error Error() | void() The possible error

Observing Cache Contents

cache.list()

A Readable stream of URLs of all feeds currently cached.

  • read() Buffer() | str()

cache.has(url, cb)

Applies callback cb without arguments if a feed with this url is cached.

  • url String() The URL of the feed
  • cb Function(error) | void() The callback
    • error Error() | void() The possible error

cache.ranks(limit)

A stream of URLs sorted by rank (highest first).

  • limit Number() Optionally, limit the number of URLs.

This stream lets you:

  • read() Buffer() | str()

cache.resetRanks(cb)

Resets the ranks index.

Event: 'hit'

  • query() The query that hit the cache.

For each cache hit the Manger cache emits a 'hit' event.
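One possible use of this event is to collect hit statistics per URL. The sketch below uses a stand-in EventEmitter, since it is meant to run without a database. `FakeCache`, `add`, and `lookup` are hypothetical. Only the 'hit' event name and the query argument come from the description above.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical stand-in for a Manger cache that emits 'hit' events.
class FakeCache extends EventEmitter {
  constructor() {
    super();
    this.store = new Set();
  }

  add(url) {
    this.store.add(url);
  }

  // Emits 'hit' with the query if its URL is cached.
  lookup(query) {
    const cached = this.store.has(query.url);
    if (cached) this.emit('hit', query);
    return cached;
  }
}

const cache = new FakeCache();
const hits = new Map();

// Count the cache hits per URL.
cache.on('hit', (query) => {
  hits.set(query.url, (hits.get(query.url) || 0) + 1);
});

cache.add('http://a.example/feed');
cache.lookup({ url: 'http://a.example/feed' });
cache.lookup({ url: 'http://a.example/feed' });
cache.lookup({ url: 'http://b.example/feed' }); // not cached, no event
console.log(hits.get('http://a.example/feed'));
```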

Installation

With npm, do:

$ npm install manger

License

MIT License