archivator

v1.0.2

Archivator

Ever wanted to archive your own copy of articles you enjoyed reading and to be able to search through them?

CURRENT STATUS: This is the frozen v1.x branch. Future work happens on the v3.x-dev branch, but v1.x remains usable as-is; see renoirb/archivator-demo.

Summary

This project is a means to try out ECMAScript 2017 tooling and do something useful. See Challenge below.

The objective of this project is to:

(Note: a check mark :white_check_mark: below denotes that the work has been done and should be usable.)

  • :white_check_mark: Cache the HTML payload of the source Web page URLs we want archived (see src/fetcher.js)
  • :white_check_mark: Store files for each source URL at a consistent path name (see src/normalizer/slugs.js) (see v3.x-dev url-dirname-normalizer)
    • :white_check_mark: Extract assets, download them for archiving purposes (see src/transformer.js at extractAssets and src/normalizer/assets.js) (see v3.x-dev @archivator/archivable)
    • :white_check_mark: Download images ("assets") from Web Pages (see v3.x-dev @archivator/archivable)
    • :white_check_mark: Rename assets in archive and adjust archived version to use cached copies (see src/normalizer/hash.js and src/transformer.js at reworkAssetReference) (see v3.x-dev @archivator/archivable)
    • :white_check_mark: Do not download tracking images and/or ignore inline base64 images
  • Read the link list from different kinds of sources
    • RSS xml document
    • :white_check_mark: CSV file (defaults to archive/index.csv)
  • :white_check_mark: Extract the main content for each article (see src/transformer.js at extractAssets) (see v3.x-dev @archivator/archivable)
  • :white_check_mark: Export into simplified excerpt document (see src/transformer.js at markdownify) (see v3.x-dev @archivator/content-divinator)
  • Add documents into a search index
  • Make a stand-alone bundle using Rollup
  • :white_check_mark: (incomplete) Make it usable as an external module (see renoirb/archivator-demo)
  • :white_check_mark: Make it an NPM package
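
As a rough illustration of the "consistent path name" objective above, here is a minimal sketch of turning a source URL into a directory path. It is not the project's actual src/normalizer/slugs.js; the function name urlToDirname is hypothetical, and the sketch assumes that the query string and fragment are simply ignored.

```javascript
'use strict';

// Hypothetical helper, NOT the project's src/normalizer/slugs.js.
// Turns a source URL into a relative directory path under archive/.
// Assumption: query string and fragment are ignored.
function urlToDirname(sourceUrl) {
  const { hostname, pathname } = new URL(sourceUrl);
  // Drop empty segments so leading, trailing, and doubled slashes vanish.
  const segments = pathname.split('/').filter(Boolean);
  return [hostname, ...segments].join('/');
}

console.log(
  urlToDirname('https://renoirboulanger.com/blog/2015/05/converting-dynamic-site-static-copy/')
);
// prints "renoirboulanger.com/blog/2015/05/converting-dynamic-site-static-copy"
```

The result matches the shape of the paths shown in the terminal output and archive tree in the Use section, although the real normalizer may handle more edge cases.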

Use

Install production-only dependencies.

This assumes you have dist/ compiled (see Build below) and have deleted node_modules/.

npm install --only=production

Edit example.js and add more URLs if you want, then run it:

node example.js

Run through Babel

yarn install

Create a folder archive/ and add an index file listing the pages to read and fetch.

The file is CSV, using a semicolon (;) as the separator. The fields are:

  1. URL to read from
  2. CSS selector for the main part of the content you want to keep
  3. One or more CSS selectors (comma-separated, as CSS already supports) for elements you want removed from the archive (e.g. ads)

// file archive/index.csv
https://renoirboulanger.com/blog/2015/05/converting-dynamic-site-static-copy/;article;
https://renoirboulanger.com/blog/2015/05/add-openstack-instance-meta-data-info-salt-grains/;article;
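
To make the index format concrete, here is a minimal sketch of how such a file could be parsed. It is not the project's own reader: the function parseIndex and the field names url, selector, and truncate are made up for illustration. It also assumes that lines starting with // are labels to skip and that URLs contain no semicolons.

```javascript
'use strict';

// Hypothetical parser for the semicolon-separated index described above.
// The field names (url, selector, truncate) are illustrative only.
function parseIndex(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    // Assumption: skip blank lines and "// ..." label lines.
    .filter((line) => line !== '' && !line.startsWith('//'))
    .map((line) => {
      const [url, selector = '', truncate = ''] = line.split(';');
      return {
        url,
        selector: selector.trim(),
        // The third field may hold several comma-separated selectors.
        truncate: truncate.split(',').map((s) => s.trim()).filter(Boolean),
      };
    });
}

const sample = [
  '// file archive/index.csv',
  'https://renoirboulanger.com/blog/2015/05/converting-dynamic-site-static-copy/;article;',
].join('\n');

console.log(parseIndex(sample));
```

An empty third field, as in the sample rows, yields an empty truncate list, meaning nothing is removed from the archived copy.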

Run fetcher

npm start

You should see the following in the terminal output

...
Archived renoirboulanger.com/blog/2015/05/converting-dynamic-site-static-copy
Archived renoirboulanger.com/blog/2015/05/add-openstack-instance-meta-data-info-salt-grains

And you should see a few files getting created:

  • cache.html: the raw HTML file downloaded from the origin
  • cache.json: a JSON cache of the metadata gathered during the process
  • index.md: the simplified article, converted to Markdown
  • Files whose names are made of letters and numbers are images found in the document

archive/
 `-renoirboulanger.com/
   `-blog/
     `-2015/
       `-05/
         `-add-openstack-instance-meta-data-info-salt-grains/
           |- cache.html
           |- cache.json
           |- 5e6327f278a336349f8bb6b26163dabedb173bcd.png
           |- 881811befc2fa6ad9c8ec058e1be3bd231fdcc1f.png
           |- b69a780dc3278f5d86296d2f219821eeac385f20.jpg
           |- c0e21ae7f0a56374116f08b44087d07ab8710035.png
           |- c3d25fac5b0c573275b15822294e484097edd945
           |- cd5f2a6cfa00a45e755b07013e59cb7c03bb9826.jpg
           |- eb31cca43b832b0016a2211e6e0058b263f4a1c0.png
           |- f6c4338884f46d3942589fcc29611fa68b600bad.png
           |- index.md
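
The image file names in the tree above are 40 hexadecimal characters long, which is the length of a SHA-1 digest. The sketch below shows one way such names could be derived. It is a guess, not the project's src/normalizer/hash.js: the function hashedAssetName is hypothetical, and hashing the asset URL with SHA-1 is an assumption.

```javascript
'use strict';

const crypto = require('crypto');
const path = require('path');

// Hypothetical helper, NOT the project's src/normalizer/hash.js.
// Assumption: the name is the SHA-1 digest of the asset URL string.
function hashedAssetName(assetUrl) {
  // Inline base64 images have nothing to download, so they are skipped.
  if (assetUrl.startsWith('data:')) {
    return null;
  }
  const digest = crypto.createHash('sha1').update(assetUrl).digest('hex');
  // Keep the original extension, if any; URLs without one yield a bare digest.
  const extension = path.posix.extname(new URL(assetUrl).pathname);
  return digest + extension.toLowerCase();
}

console.log(hashedAssetName('https://example.org/img/photo.png'));
console.log(hashedAssetName('https://example.org/avatar'));
```

Because the name depends only on the URL, fetching the same asset twice produces the same file name, which keeps references in the archived copy stable.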

Run tests

npm test

Run xo (coding convention linter)

npm run lint

Build

IMPORTANT: This is no longer supported and is broken; see the note in dist/README.md.

Run in Node.js, as ECMAScript 5 transpiled code.

yarn install
npm run build

This should do the same as running npm start on a modern Node.js (v6+) with Babel:

node dist/cli.js

Challenge

Make an archiving system while learning how to use bleeding edge JavaScript.

  • Use ECMAScript 2017's async/await along with generators (function * (){ /* ... */ yield 'something'; })
  • Figure out how to export into ES5
  • Figure out how to package, test and so on
  • As few dependencies as possible for development
  • (Ideally) No dependencies to run once bundled
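
As a small illustration of the first challenge item, the sketch below combines a generator with async/await. None of it comes from the project: eachUrl, fakeFetch, and archiveAll are hypothetical names, and fakeFetch is a stand-in for a real network call.

```javascript
'use strict';

// Generator: lazily yields one URL at a time from a list.
function* eachUrl(urls) {
  for (const url of urls) {
    yield url;
  }
}

// Stand-in for a real network fetch; resolves after a short delay.
// Hypothetical: the project's src/fetcher.js is not reproduced here.
function fakeFetch(url) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(`<html><!-- ${url} --></html>`), 10);
  });
}

// Async function: pulls URLs from the generator and awaits each fetch in turn.
async function archiveAll(urls, fetchPage = fakeFetch) {
  const archived = [];
  for (const url of eachUrl(urls)) {
    const html = await fetchPage(url);
    archived.push({ url, bytes: html.length });
  }
  return archived;
}

archiveAll(['https://example.org/a', 'https://example.org/b'])
  .then((result) => console.log(result));
```

Awaiting inside the loop fetches pages one after another, which is gentle on the origin server; collecting the promises and awaiting Promise.all would fetch them concurrently instead.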