htmldumper

HTML dump script for RESTBase APIs like https://rest.wikimedia.org/.

Installation

npm install

Usage: Dumping a single wiki

Usage: node ./bin/dump_wiki
Example: node ./bin/dump_wiki --domain en.wikipedia.org \
  --ns 0 --apiURL http://en.wikipedia.org/w/api.php \
  --saveDir /tmp

Options:
  --apiURL       [required]
  --domain       [required]
  --ns           [required]
  --host         [required] [default: "http://rest.wikimedia.org"]
  -d, --saveDir  Directory to store a dump in (named by domain) [default: no saving]
  --db, --dataBase  SQLite database name [default: no saving]
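
For instance, a run that writes to the SQLite database instead of the filesystem might look like this (a hypothetical invocation, built only from the flags listed above):

node ./bin/dump_wiki --domain en.wikipedia.org \
  --ns 0 --apiURL http://en.wikipedia.org/w/api.php \
  --db someSQLiteDB.db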

Filesystem output

With --saveDir as specified in the example above, a directory structure like this will be created:

/tmp/
  en.wikipedia.org/
    Aaa/
      123456
    Bbbb/
      456768

The directory names for articles are percent-encoded using JavaScript's encodeURIComponent(). On a repeat run with the same --saveDir path, only updated articles are downloaded. Outdated revisions are deleted. These incremental dumps speed up the process significantly, and reduce the load on the servers.
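
For illustration, here is how a couple of made-up article titles map to directory names under that scheme:

// Directory names are article titles run through encodeURIComponent().
console.log(encodeURIComponent('Albert Einstein')); // Albert%20Einstein
console.log(encodeURIComponent('C++'));             // C%2B%2B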

SQLite database output

With --dataBase set to someSQLiteDB.db, a database will be created / updated. The schema currently looks like this:

CREATE TABLE data(
    title TEXT,
    revision INTEGER,
    tid TEXT,
    body TEXT,
    page_id INTEGER,
    namespace INTEGER,
    timestamp TEXT,
    comment TEXT,
    user_name TEXT,
    user_id INTEGER,
    PRIMARY KEY(title ASC, revision DESC)
);
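
As a rough sketch (assuming the sqlite3 npm module and a hypothetical article title), the newest stored revision of a page can be fetched via the (title, revision) primary key:

// Minimal sketch: read the latest stored revision of one article.
var sqlite3 = require('sqlite3');
var db = new sqlite3.Database('someSQLiteDB.db');
db.get(
    'SELECT title, revision, tid, timestamp FROM data ' +
    'WHERE title = ? ORDER BY revision DESC LIMIT 1',
    ['Main_Page'],  // hypothetical title
    function(err, row) {
        if (err) { throw err; }
        console.log(row);
    }
);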

Usage: Dumping all RESTBase wikis

You need to install pixz, which is used for parallel lzma / xz compression:

apt-get install pixz

With pixz installed, follow the usage instructions from the built-in help:

# node bin/dump_restbase --help

Create HTML dumps in a directory

Example usage:
node ./bin/dump_restbase --workDir /tmp --dumpDir /tmp

Options:
  -h, --help     Show help and exit.
  -v, --verbose  Verbose logging
  --workDir      Directory to use for in-progress dump files  [default: "/tmp"]
  --dumpDir      Directory to use for finished dump files     [default: "/tmp"]