post-feed-reader

v1.3.1

Discovers and parses news, blog and podcast posts from any website

A library to fetch news, blog or podcast posts from any site. It works by auto-discovering a post source, which can be an RSS/Atom/JSON feed or the WordPress REST API, then fetching and parsing the list of posts.

It's meant for Node.js, but since it is built on isomorphic JavaScript, it can also work in browsers if the website allows cross-origin requests.

It was originally built for apps that need to list posts in their own UI but don't manage the blog themselves, and so need automatic fallbacks when the blog's technology changes.

Features

Getting Started

Install it with npm or Yarn:

npm install post-feed-reader # or yarn add post-feed-reader

You first need to discover the post source, which returns an object containing a URL to the RSS/Atom/JSON Feed or the WordPress REST API.

Then you can pass the discovered source to getPostList, which will fetch and parse it.

import { discoverPostSource, getPostList } from 'post-feed-reader';

// Looks for metadata pointing to the Wordpress REST API or Atom/RSS Feeds
const source = await discoverPostSource('https://www.nytimes.com');

// Retrieves the posts from the given source
const list = await getPostList(source);

// Logs all post titles
console.log(list.posts.map(post => post.title));

Simple enough, eh? Try it on RunKit

Output

See an example of the post list based on the Mozilla blog.

Options

const source = await discoverPostSource('https://techcrunch.com', {
  // Custom axios instance
  axios: axios.create(...),

  // Whether it will prioritize feeds over the wordpress api
  preferFeeds: false,

  // Custom data source filtering
  canUseSource: (source: DiscoveredSource) => true,

  // Whether it will try to guess wordpress api and feed urls if the auto-discovery process fails
  tryToGuessPaths: false,
  
  // The paths that it will query trying to guess both the Wordpress API or the RSS/Atom/JSON feed
  wpApiPaths: ['./wp-json', '?rest_route=/'],
  feedPaths: ['./feed', './atom', './rss', './feed.json', './feed.xml', '?feed=atom'],
});

const posts = await getPostList(source, {
  // Custom axios instance
  axios: axios.create(...),

  // Whether missing plain text contents will be filled automatically from html contents
  fillTextContents: false,

  // Wordpress REST API only options
  wordpress: {
    // Whether it will include author, taxonomy and media data from the wordpress api
    includeEmbedded: true,

    // Whether it will fetch the blog info, such as the title, description, url and images
    // Setting this to true adds one extra http request
    fetchBlogInfo: false,

    // The amount of items to return
    limit: 10,

    // The search string filter
    search: '',

    // The author id filter
    authors: [...],

    // The category id filter
    categories: [...],

    // The tag id filter
    tags: [...],

    // Any additional querystring parameter for the wordpress api you may want to include
    additionalParams: { ... },
  },
});

Skip the auto-discovery

If you already have the URL of an Atom/RSS/JSON Feed or of the WordPress REST API on hand, you can fetch the posts directly:

import { getFeedPostList, getWordpressPostList } from 'post-feed-reader';

// RSS, Atom or JSON Feed
const feedPosts = await getFeedPostList('https://news.google.com/atom');

// WordPress API
const wpApiPosts = await getWordpressPostList('https://blog.mozilla.org/en/wp-json/');

Pagination

The post list may have pagination metadata attached. You can use it to navigate through pages. Here's an example:

const result = await getPostList(...);

if (result.pagination.next) {
  // There is a next page!
  
  const nextResult = await getPostList(result.pagination.next);
  
  // ...
}

// You can also check for result.pagination.previous, result.pagination.first and result.pagination.last
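
To gather several pages at once, the same check can be run in a loop. The following is a minimal sketch, not part of the library: collectPosts, the Page shape and the maxPages cap are all hypothetical names introduced here, and the fetchPage callback stands in for a call such as getPostList so that the loop can be tried without network access.

```typescript
// Minimal shape assumed for one page of results; the library's real type may differ.
interface Page<T, C> {
  posts: T[];
  pagination?: { next?: C };
}

// Follows pagination.next until it runs out or maxPages pages have been read.
// `fetchPage` is a stand-in for something like `(cursor) => getPostList(cursor)`.
async function collectPosts<T, C>(
  fetchPage: (cursor: C) => Promise<Page<T, C>>,
  first: C,
  maxPages = 5,
): Promise<T[]> {
  const posts: T[] = [];
  let cursor: C | undefined = first;

  for (let page = 0; page < maxPages && cursor !== undefined; page++) {
    const result: Page<T, C> = await fetchPage(cursor);
    posts.push(...result.posts);
    // A missing pagination object or a missing `next` ends the loop.
    cursor = result.pagination?.next;
  }

  return posts;
}
```

The cap on pages is a deliberate choice: a large blog can have hundreds of pages, and each one costs an HTTP request.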

Why support other sources, isn't RSS enough?

RSS is the most widely used feed format on the web, but it lacks information that your application might consider basic, and its specification is a mess: many properties are left vague and up to the implementation, so the way information is formatted differs from feed to feed. For instance, the description can be the full post as HTML, or just an excerpt, or plain text, or even just an HTML link to the post page.

Atom's specification is much more rigid and robust, which makes its data more trustworthy. It's definitely the way to go when it comes to feeds. But it still lacks some properties that can only be fetched through the WordPress REST API.

Since WordPress is by far the most used CMS, supporting its API is a great alternative. The WordPress REST API offers the following, which RSS and Atom feeds do not:

  • Filtering by category, tag and/or author
  • Searching
  • Pagination
  • Featured media
  • Author profile
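
As an illustration of what those extras look like on the wire, here is a small sketch that builds a request URL for the WordPress REST API posts endpoint (wp/v2/posts). The query parameter names (search, author, categories, tags, per_page, page, _embed) are the ones WordPress itself defines; buildPostsUrl and the PostFilters shape are hypothetical helpers written for this example and are not the library's implementation. It also assumes the API root is a path such as /wp-json/ rather than the ?rest_route=/ form.

```typescript
// Hypothetical filter shape for this sketch; not the library's own option type.
interface PostFilters {
  search?: string;
  authors?: number[];
  categories?: number[];
  tags?: number[];
  limit?: number;
  page?: number;
}

// Builds a URL for the WordPress REST API posts endpoint (wp/v2/posts).
function buildPostsUrl(apiRoot: string, filters: PostFilters = {}): string {
  const params: [string, string][] = [];

  if (filters.search) params.push(['search', filters.search]);
  if (filters.authors?.length) params.push(['author', filters.authors.join(',')]);
  if (filters.categories?.length) params.push(['categories', filters.categories.join(',')]);
  if (filters.tags?.length) params.push(['tags', filters.tags.join(',')]);
  if (filters.limit !== undefined) params.push(['per_page', String(filters.limit)]);
  if (filters.page !== undefined) params.push(['page', String(filters.page)]);

  // _embed asks WordPress to inline author, taxonomy and featured media data.
  params.push(['_embed', '1']);

  const query = params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join('&');

  // Drop trailing slashes from the root so the path is joined with exactly one.
  return `${apiRoot.replace(/\/+$/, '')}/wp/v2/posts?${query}`;
}

const exampleUrl = buildPostsUrl('https://blog.mozilla.org/en/wp-json/', {
  search: 'firefox',
  categories: [3, 7],
  limit: 5,
});
// exampleUrl is
// https://blog.mozilla.org/en/wp-json/wp/v2/posts?search=firefox&categories=3%2C7&per_page=5&_embed=1
```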

The JSON Feed format is also just as good as the Atom format, but at the moment very few websites produce it.

How does the auto-discovery work?

  1. Fetches the site's main page
  2. Looks for WordPress API Link headers
  3. Looks for RSS, Atom and JSON Feed <link> metatags
  4. If tryToGuessPaths is set to true, it will query a few common paths to try to find a feed or the WP API.
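
Steps 2 and 3 can be sketched as two small pure functions. This is an illustration under stated assumptions, not the library's code: findWordpressApi, findFeeds, attr and the Candidate type are hypothetical names, the regular expressions are a simplification that a real HTML parser would handle more robustly, and the MIME types listed are the ones commonly used to advertise RSS, Atom and JSON Feed. The rel value https://api.w.org/ is the one WordPress uses in its Link header.

```typescript
interface Candidate {
  kind: 'wordpress-api' | 'feed';
  url: string;
}

// MIME types commonly used to advertise RSS, Atom and JSON Feed.
const FEED_TYPES = ['application/rss+xml', 'application/atom+xml', 'application/feed+json'];

// Reads one quoted attribute from a single HTML tag string.
function attr(tag: string, name: string): string | undefined {
  const match = new RegExp(`\\b${name}\\s*=\\s*["']([^"']*)["']`, 'i').exec(tag);
  return match?.[1];
}

// Step 2: WordPress advertises its REST API root in a Link header
// whose rel is "https://api.w.org/".
function findWordpressApi(linkHeader: string): Candidate[] {
  const found: Candidate[] = [];
  for (const part of linkHeader.split(',')) {
    const url = /<([^>]+)>\s*;\s*rel="?https:\/\/api\.w\.org\/"?/i.exec(part)?.[1];
    if (url) found.push({ kind: 'wordpress-api', url });
  }
  return found;
}

// Step 3: feeds are advertised with <link rel="alternate" type="..." href="..."> tags.
function findFeeds(html: string, baseUrl: string): Candidate[] {
  const found: Candidate[] = [];
  for (const tag of html.match(/<link\b[^>]*>/gi) ?? []) {
    const rel = attr(tag, 'rel')?.toLowerCase();
    const type = attr(tag, 'type')?.toLowerCase();
    const href = attr(tag, 'href');
    if (rel === 'alternate' && type && href && FEED_TYPES.includes(type)) {
      // Relative hrefs are resolved against the page URL.
      found.push({ kind: 'feed', url: new URL(href, baseUrl).href });
    }
  }
  return found;
}
```

Keeping the two lookups separate makes it easy to decide their priority afterwards, which is what an option such as preferFeeds would control.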

Most properties are optional. What am I guaranteed to have?

Nothing.

Yeah, there's no property that is required in all specs, thus we can't guarantee any of them will be present.

But! The most basic properties are very likely to be present, such as guid, title and link.

For all the other properties, it's highly recommended that you implement your own fallbacks. For instance, show a substring of the content when the summary isn't available.
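
A fallback of that kind might look like the sketch below. The property names summary and content and the helper displaySummary are assumptions made for this example; check the actual post object for the names the library uses. The tag stripping is deliberately crude and only meant for short previews.

```typescript
// Hypothetical subset of a post; every field is optional.
interface PostLike {
  title?: string;
  summary?: string;
  content?: string;
}

// Returns the summary when present, otherwise a truncated plain-text
// version of the content, otherwise an empty string.
function displaySummary(post: PostLike, maxLength = 140): string {
  const summary = post.summary?.trim();
  if (summary) return summary;

  // Replace anything that looks like an HTML tag, then collapse whitespace.
  const text = (post.content ?? '')
    .replace(/<[^>]*>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();

  if (text.length <= maxLength) return text;
  return text.slice(0, maxLength).replace(/\s+$/, '') + '...';
}
```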

The library will try its best to fetch the most data available.