
mpa-archive

v3.0.18


Multi-Page Application Archive

Crawls a Multi-Page Application into a zip file and serves the Multi-Page Application back from that zip file. An MPA archiver that can also be used as a static site generator.

Installation

npm install -g mpa-archive

Usage

Crawling

mpa http://example.net

Crawls the URL recursively and saves it to example.net.zip. Once done, it displays a report and can serve the files from the zip.

SPA mode

The original idea is to save the HTML generated by JavaScript, so that search engines can index the content of a website that relies on JavaScript. This has the undesired result that some applications, especially SPAs, may not work when served back. To save the original HTML instead of the rendered HTML, use the --spa option, which keeps the original HTML and avoids rewriting links.

mpa https://example.net --spa

Serving

mpa

Creates a server for each zip file in the current directory. The server binds to 0.0.0.0 (so it can also be opened on localhost). The port is random but seeded from the zip file name, so it stays the same across runs.

Features

  • Uses headless Puppeteer
  • Crawls http://example.net with cpu count / 2 threads
  • Displays progress in the console
  • Fetches sitemap.txt and sitemap.xml as seed points
  • Reports HTTP status codes other than 200, 304, 204, 206
  • Crawls on-site URLs only, but fetches external resources
  • Intercepts site resources and saves those too
  • Generates mpa/sitemap.txt and mpa/sitemap.xml
  • Saves site source maps
  • Can resume if the process exits; saves a checkpoint every 250 URLs
  • When serving what has been crawled, a URL not found in the zip is fetched from the source and added to the zip
  • Domain blacklist: https://github.com/potahtml/mpa-archive/blob/master/src/lib/blacklist.js
  • Downloads are saved via fetch requests

Legends

  • 🍳 a URL has been opened in a tab for crawling
  • 🧽 a link on the page has been focused (for when JS modules are preloaded/loaded on focus)
  • ⚠ the response headers contain a status code other than 200, 304, 204, 206
  • 🧭 the document.documentElement.outerHTML has been saved
  • ✔ the response of a fetch request, or the response body of a request made by the tab, has been saved
  • 🔗 a fetch request has been fired
  • 🛑🍳 the opened tab gave an error (it retries via a fetch request)
  • 🛑🔗 a fetch request gave an error

To Consider

  • Save the archive in an incremental compression format that doesn't require re-compressing the whole file when it changes (maybe zip already allows that?)
  • URLs to external resources are not rewritten to local resources; if this were done, resources loaded from the root would break
  • It should maybe crawl the site by clicking links instead of opening a full tab per URL
  • Crawl updates
  • Should scroll the page for content using loading=lazy