© 2024 – Pkg Stats / Ryan Hefner

fetch-filecache-for-crawling

v5.1.1

Implementation of a `fetch` that extends the implementation from `node-fetch` to add an HTTP cache, stored in a local cache folder, for crawling purposes.

Downloads

1,863

Readme

Implementation of fetch with a file-based HTTP cache for crawling purposes

A Node.js module that exports a fetch function, which extends the native Node.js fetch implementation with an HTTP cache stored in a local cache folder.

The code was developed for a particular scenario with specific requirements in mind, and no attempt was made to generalize it. It is published as an npm package mainly to ease reuse across a couple of specific projects.

The module is intended for crawling and makes the following assumptions, which do not necessarily hold in other cases:

  1. The user is only interested in GET requests (although this will be fixed, see #3)
  2. The HTTP headers sent with the request do not matter for the response (although this will be fixed as well, see #3)
  3. The user wants to preserve cached files in a folder, even after the application has finished running. That file cache will be used on the next run of the application to send conditional requests.
  4. The user wants to control the cache expiration strategy through the refresh parameter. By default, the cache follows HTTP expiration rules, but setting the parameter to, e.g., once makes the cache behave completely differently. The ability to tweak that behavior is the module's main added value!
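Assumption 3 depends on HTTP conditional requests: on a later run, the validators saved with a cached response are sent back so that the server can answer 304 Not Modified instead of resending the body. The sketch below shows one way such headers could be derived from stored response headers. The `buildConditionalHeaders` helper and the plain-object header format are assumptions for illustration, not part of the module's API.

```javascript
// Hypothetical helper (not the module's API): derive conditional request
// headers from the response headers stored alongside a cached file.
function buildConditionalHeaders(cachedHeaders) {
  // Normalize header names to lower case so lookups are case-insensitive.
  const headers = Object.fromEntries(
    Object.entries(cachedHeaders).map(([name, value]) => [name.toLowerCase(), value])
  );
  const conditional = {};
  // An ETag validator is sent back in If-None-Match.
  if (headers['etag']) {
    conditional['If-None-Match'] = headers['etag'];
  }
  // A Last-Modified validator is sent back in If-Modified-Since.
  if (headers['last-modified']) {
    conditional['If-Modified-Since'] = headers['last-modified'];
  }
  return conditional;
}

// If the server still holds the same representation, it replies
// 304 Not Modified and the cached body can be reused as-is.
const stored = {
  ETag: '"abc123"',
  'Last-Modified': 'Wed, 21 Oct 2015 07:28:00 GMT'
};
console.log(buildConditionalHeaders(stored));
```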

Installation

Run npm install fetch-filecache-for-crawling.

Usage

const fetch = require('fetch-filecache-for-crawling');

// URLs to crawl, some of which may be identical
const urls = [
  'https://caniuse.com/data.json',
  'https://caniuse.com/data.json'
];

// Fetch all URLs in parallel and log the number of Can I Use entries
Promise.all(urls.map(url =>
  fetch(url, { logToConsole: true })
    .then(response => response.json())
    .then(json => console.log(Object.keys(json.data).length +
      ' entries in Can I Use'))
)).catch(err => console.error(err));

Configuration

On top of the usual fetch options, the following optional parameters can be passed to fetch in the options parameter to change the default behavior:

  • cacheFolder: the name of the cache folder to use. By default, the code caches all files in a folder named .cache.
  • resetCache: set to true to empty the cache folder when the application starts. Defaults to false. Note that the cache folder will only be reset once, regardless of whether the parameter is set to true in subsequent calls to fetch.
  • refresh: the refresh strategy to use for the cache. Values can be one of:
    • force: Always consider that the content in the cache has expired
    • default: Follow regular HTTP rules (this is the default mode)
    • once: Fetch the URL at least once, but consider the cached entry to then be valid throughout the lifetime of the application
    • never: Always consider that the content in the cache is valid
    • an integer: Consider that cache entries are valid for the given period of time (in seconds)
  • logToConsole: set to true to output progress messages to the console. Defaults to false. All messages start with the ID of the request to be able to distinguish between them.
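The refresh values listed above boil down to a rule that decides whether a cached entry must be fetched again. A minimal sketch of that rule follows, assuming each entry records when it was fetched (`fetchedAt`, in milliseconds), whether it was already fetched during the current run (`fetchedThisRun`), and, for the default mode, a freshness lifetime precomputed from the response headers (`maxAgeSeconds`). These field names and the `isExpired` function are hypothetical; the module's actual logic may differ, especially in default mode, where real HTTP freshness rules are more involved.

```javascript
// Hypothetical sketch: decide whether a cached entry should be treated as
// expired under a given refresh strategy. Field names are illustrative only.
function isExpired(entry, refresh, now = Date.now()) {
  const ageSeconds = (now - entry.fetchedAt) / 1000;
  if (refresh === 'force') {
    // Always go back to the network.
    return true;
  }
  if (refresh === 'never') {
    // Always trust the cache.
    return false;
  }
  if (refresh === 'once') {
    // Fetch at least once per application run, then trust the cache.
    return !entry.fetchedThisRun;
  }
  if (Number.isInteger(refresh)) {
    // Entries stay valid for the given number of seconds.
    return ageSeconds >= refresh;
  }
  // 'default': simplified stand-in for HTTP rules, using a freshness
  // lifetime (e.g. from Cache-Control: max-age) stored on the entry.
  return ageSeconds >= (entry.maxAgeSeconds ?? 0);
}

const now = Date.now();
const entry = { fetchedAt: now - 120 * 1000, fetchedThisRun: false, maxAgeSeconds: 60 };
console.log(isExpired(entry, 'force', now));   // true
console.log(isExpired(entry, 'never', now));   // false
console.log(isExpired(entry, 'once', now));    // true: not fetched during this run yet
console.log(isExpired(entry, 3600, now));      // false: 120 s < 3600 s
console.log(isExpired(entry, 'default', now)); // true: 120 s >= 60 s
```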

For instance, you may do:

const fetch = require('fetch-filecache-for-crawling');

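fetch('https://www.w3.org/', {
  resetCache: true,       // empty the cache folder (only once per run)
  cacheFolder: 'mycache', // use a folder named mycache instead of .cache
  logToConsole: true      // print progress messages
}).then(response => {
  // Process the response here
}).catch(err => console.error(err));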
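The example above passes resetCache: true, and the option description notes that the folder is emptied at most once, however many later calls also pass true. That behavior amounts to a once-only guard. The sketch below assumes a module-level flag and Node's `fs.rmSync`/`fs.mkdirSync`; `maybeResetCache` is a hypothetical name, not part of the module's API.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Module-level flag: once set, later reset requests are ignored.
let cacheAlreadyReset = false;

// Hypothetical helper: empty the cache folder the first time a reset is
// requested, and do nothing on subsequent calls.
function maybeResetCache(cacheFolder, resetCache) {
  if (!resetCache || cacheAlreadyReset) {
    return false;
  }
  cacheAlreadyReset = true;
  // Remove the folder and its contents, then recreate it empty.
  fs.rmSync(cacheFolder, { recursive: true, force: true });
  fs.mkdirSync(cacheFolder, { recursive: true });
  return true;
}

// Demo in a temporary directory: only the first call actually resets.
const folder = fs.mkdtempSync(path.join(os.tmpdir(), 'mycache-'));
fs.writeFileSync(path.join(folder, 'entry.json'), '{}');
console.log(maybeResetCache(folder, true)); // true: folder emptied
fs.writeFileSync(path.join(folder, 'entry.json'), '{}');
console.log(maybeResetCache(folder, true)); // false: already reset once
console.log(fs.readdirSync(folder));        // [ 'entry.json' ]
```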

Configuration parameters may also be set for all requests programmatically by calling fetch.setParameter(name, value), where name is the name of the parameter and value is the value to assign to it. Parameters passed in the options of a given call take precedence.
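The precedence rule (per-call options override parameters set through fetch.setParameter, which in turn override the built-in defaults) can be pictured as an object merge. In the sketch below, `defaults`, `globalParams`, `setParameter` and `resolveOptions` are hypothetical stand-ins; only the default values and the precedence order come from the text above.

```javascript
// Built-in defaults, as documented for the module's options.
const defaults = {
  cacheFolder: '.cache',
  resetCache: false,
  refresh: 'default',
  logToConsole: false
};

// Parameters set once for all requests (stand-in for fetch.setParameter).
const globalParams = {};

function setParameter(name, value) {
  globalParams[name] = value;
}

// Later spreads override earlier ones: per-call options take precedence
// over global parameters, which take precedence over defaults.
function resolveOptions(options = {}) {
  return { ...defaults, ...globalParams, ...options };
}

setParameter('cacheFolder', 'mycache');
setParameter('logToConsole', true);
const resolved = resolveOptions({ logToConsole: false });
console.log(resolved.cacheFolder);  // 'mycache' (from setParameter)
console.log(resolved.logToConsole); // false (per-call option wins)
```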

Licensing

The code is available under an MIT license.