wayback-restore

v0.0.14

Website restoration tool written for Node.js.

Wayback Restore JS

This package is experimental.

A website restoration tool written for Node.js.

This package restores a website from web.archive.org. It was developed for Node.js and can also be used in any Electron application.

Install

npm install wayback-restore

Usage

var Wayback = require("wayback-restore");

Wayback.restore({...});
Wayback.downloader({...});
Wayback.snapshot({...});

API

Wayback.restore(options)

restore is a predefined process for restoring an entire website based on the options you specify. You would use this method if you wish to rebuild a website from a point in time.

options

url - string

URL to restore. Ex. https://web.archive.org/web/20150801040409/http://example.com/

If you use url then you do not need to use timestamp and domain.
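The note above holds because a snapshot URL already carries both values. The following is a minimal sketch of that relationship, not the library's own parser; `parseSnapshotUrl` is a hypothetical helper introduced only for illustration.

```javascript
// Hypothetical helper: not part of wayback-restore's API.
// Splits a Wayback Machine snapshot URL into its timestamp and original URL.
function parseSnapshotUrl(snapshotUrl) {
  // Matches .../web/<1 to 14 digit timestamp><optional modifier such as id_>/<original URL>
  const match = /^https?:\/\/web\.archive\.org\/web\/(\d{1,14})[a-z_]*\/(.+)$/.exec(snapshotUrl);
  if (!match) {
    throw new Error('Not a Wayback Machine snapshot URL: ' + snapshotUrl);
  }
  const timestamp = match[1];
  const original = match[2];
  // Add a scheme when the archived URL has none, so URL() can parse it.
  const withScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(original) ? original : 'http://' + original;
  return { timestamp, original, domain: new URL(withScheme).hostname };
}

const parsed = parseSnapshotUrl('https://web.archive.org/web/20150801040409/http://example.com/');
console.log(parsed.timestamp); // 20150801040409
console.log(parsed.domain);    // example.com
```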

timestamp - string

Timestamp to restore.

Ex. 20150801040409

domain - string

Domain to restore.

directory - string

(default: restore) Directory to output into.

max_pages - integer

(default: no limit) Maximum number of pages to download. Leave empty for no limit.

links - boolean

(default: true) Set to true to download links found on the page.

assets - boolean

(default: true) Set to true to download CSS, JS, images.

concurrency - integer

(default: 1) Number of downloads to process at once.

Warning: Setting this value too high might get you blocked from web.archive.org.

log - boolean

(default: false) Set to true to enable logging to a log file.

logFile - string

(default: restore.log) Name of the log file to write. It will be written to options.directory.
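To see the documented defaults in one place, here is a rough sketch that merges user options over them. The default values are copied from the descriptions above; `RESTORE_DEFAULTS` and `resolveRestoreOptions` are hypothetical names, and the library may handle its options differently.

```javascript
// Defaults copied from the option descriptions above; the library's
// internal representation may differ.
const RESTORE_DEFAULTS = {
  directory: 'restore',
  max_pages: undefined, // no limit
  links: true,
  assets: true,
  concurrency: 1,
  log: false,
  logFile: 'restore.log'
};

// Hypothetical helper: merges user options over the documented defaults and
// checks that either `url`, or both `timestamp` and `domain`, are present.
function resolveRestoreOptions(options) {
  const merged = Object.assign({}, RESTORE_DEFAULTS, options);
  const hasUrl = typeof merged.url === 'string' && merged.url !== '';
  const hasPair = Boolean(merged.timestamp) && Boolean(merged.domain);
  if (!hasUrl && !hasPair) {
    throw new Error('Provide either url, or both timestamp and domain');
  }
  return merged;
}

const restoreOptions = resolveRestoreOptions({
  domain: 'example.com',
  timestamp: '20150801040409',
  concurrency: 2
});
console.log(restoreOptions.directory, restoreOptions.concurrency);
```

Keeping concurrency low by default matches the warning above about being blocked by web.archive.org.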

Methods

start
stop
pause
resume

Events

The following events are emitted.

start(callback)

Fired when restoring starts.

.on("start", function() {
    console.log("[STARTED USING]:", this.settings);
})

restoring - (asset)

Fired when a file begins restoring.

.on("restoring", function(asset) {
   console.log("[RESTORING]", asset.original_url);
})

restored - (asset)

Fired when a file has been downloaded.

.on("restored", function(asset) {
   console.log("[RESTORED]", asset.original_url);
})

cdxquery

Fired when the CDX snapshot query has returned.

.on("cdxquery", function(cdx) {
  console.log("Snapshots Found: ", cdx.size);
})

completed - (results)

Fired when the restore process has completed.

.on("completed", function(results) {
    console.log("restoration has completed");
    console.log("url: ", results.url);
    console.log("domain: ", results.domain);
    console.log("timestamp: ", results.timestamp);
    console.log("directory: ", results.directory);
    //console.log("first file: ", results.first_file);
    //console.log("started: ", results.started);
    //console.log("ended: ", results.ended);
    console.log("restored: ", results.restored_count);
    console.log("failed: ", results.failed_count);
    console.log("Runtime:", results.runtime_hms);
})
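
The handlers above can be tried offline with a stand-in emitter. This is only a sketch: `FakeRestore` is a hypothetical class that replays the documented event names in an assumed order. It is not the real restore process and performs no downloads.

```javascript
const { EventEmitter } = require('events');

// Stand-in emitter, NOT the real wayback-restore process. It only replays the
// documented event names so the handler wiring can be tried offline.
class FakeRestore extends EventEmitter {
  constructor(urls) {
    super();
    this.urls = urls;
  }

  start() {
    this.emit('start');
    let restoredCount = 0;
    for (const original_url of this.urls) {
      const asset = { original_url };
      this.emit('restoring', asset); // a file begins restoring
      this.emit('restored', asset);  // the file has been downloaded
      restoredCount += 1;
    }
    this.emit('completed', { restored_count: restoredCount, failed_count: 0 });
    return this;
  }
}

const seen = [];
new FakeRestore(['http://example.com/', 'http://example.com/about'])
  .on('start', () => seen.push('start'))
  .on('restoring', (asset) => seen.push('restoring ' + asset.original_url))
  .on('restored', (asset) => seen.push('restored ' + asset.original_url))
  .on('completed', (results) => seen.push('completed ' + results.restored_count))
  .start();

console.log(seen.join('\n'));
```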

Wayback.downloader(options)

This method downloads all snapshots. Unlike restore, it downloads every snapshot from a point in time, so you can end up with multiple snapshots of the same asset.

options

url - string

A snapshot URL to download.

Wayback.downloader({
    url: 'http://web.archive.org/web/20150531/http://www.example.com'
});

domain - string

A domain to download from.

from - string

Only files on or after the supplied timestamp (e.g. 20150801231334).

Can also be a date such as 20150801.

to - string

Only files on or before the supplied timestamp (e.g. 20150801231334).

Can also be a date such as 20150801.

limit - integer (default: 0)

Limits the number of files to download.

0 = no limit.

exact_url - boolean (default: false)

Downloads only the URL provided, not the full site.

list - boolean (default: false)

Doesn't download any files.

concurrency - integer (default: 1)

Number of files to download at the same time.

only - string|RegExp (default: '')

Only include files matching this filter.

exclude - string|RegExp (default: '')

Excludes files matching this filter.

directory - string (default: '.')

Directory to output into.
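
As an illustration of how the only and exclude options could be applied, here is a minimal sketch. It assumes that a string filter matches when the URL contains it and that a RegExp filter is applied with test(). The README does not state the library's matching rules, and `matchesFilter` and `shouldDownload` are hypothetical helpers.

```javascript
// Hypothetical filter helper, not the library's code. Assumption: a string
// filter matches when the URL contains it; a RegExp filter uses test().
function matchesFilter(filter, url) {
  if (filter instanceof RegExp) {
    filter.lastIndex = 0; // keep global or sticky regexes stateless between calls
    return filter.test(url);
  }
  return url.includes(String(filter));
}

// Keeps a URL when it passes `only` (if set) and is not hit by `exclude` (if set).
// An empty string, the documented default, disables the filter.
function shouldDownload(url, { only = '', exclude = '' } = {}) {
  if (only !== '' && !matchesFilter(only, url)) return false;
  if (exclude !== '' && matchesFilter(exclude, url)) return false;
  return true;
}

const candidateUrls = [
  'http://example.com/index.html',
  'http://example.com/logo.png',
  'http://example.com/css/site.css'
];
const kept = candidateUrls.filter((url) =>
  shouldDownload(url, { exclude: /\.(gif|jpg|jpeg|png|svg)$/i })
);
console.log(kept);
```

Note the escaped dot in the pattern: without the backslash, the dot would match any character before the extension.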

Methods

See Wayback.restore methods.

Wayback.snapshot(options, callback)

Use this to explore snapshots on web.archive.org by querying their CDX server. You can use this to build your own downloader.

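Returns a Promise that resolves with the Assets found. The snapshot example under Examples destructures the resolved value as [snapshots, url], where url is the CDX query URL.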

See https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server#intro-and-usage for more information.

callback (optional) - function(asset)

This optional function is called once for each snapshot found by the search. It receives the snapshot as an Asset.

Example:

Wayback.snapshot(
  {
    url: 'example.com',
    filter: 'statuscode:200',
    collapse: 'digest',
    matchType: 'exact',
    limit: 10,
    from: '20210623',
    to: '20210624'
  },
  (asset) => {
    console.log('Snapshot', asset);
  }
);

options

url

The URL to search for snapshots of. Ex: example.com

matchType

Accepted values: exact, prefix, host, domain.

Defaults to exact.

See: https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server#url-match-scope

output

@deprecated since 0.0.13

Defaults to json.

See: https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server#output-format-json

fl - Array

The fields to return.

Defaults to [ 'urlkey', 'timestamp', 'original', 'mimetype', 'statuscode', 'digest', 'length' ]

See: https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server#field-order

filter (optional)

Defaults to: statuscode:200

See: https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server#filtering

to

Results may be filtered by timestamp using from= and to= params. The ranges are inclusive and are specified in the same 1 to 14 digit format used for wayback captures: yyyyMMddhhmmss

from

Results may be filtered by timestamp using from= and to= params. The ranges are inclusive and are specified in the same 1 to 14 digit format used for wayback captures: yyyyMMddhhmmss

collapse

Defaults to: digest

See: https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server#collapsing

limit

Defaults to CDX's default value.

See: https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server#query-result-limits

Note: fastLatest=true option is not supported.
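
The options above map onto CDX server query parameters. The sketch below builds such a query URL against the public CDX endpoint using the defaults documented in this section. `buildCdxUrl` is a hypothetical helper, and the query that wayback-restore actually sends may differ.

```javascript
// Public CDX endpoint of the Wayback Machine.
const CDX_ENDPOINT = 'https://web.archive.org/cdx/search/cdx';

// Hypothetical helper: turns snapshot-style options into a CDX query URL.
// The defaults mirror the ones documented above; wayback-restore may build
// its query differently.
function buildCdxUrl(options) {
  const params = new URLSearchParams();
  params.set('url', options.url);
  params.set('output', 'json');
  params.set('matchType', options.matchType || 'exact');
  params.set('filter', options.filter || 'statuscode:200');
  params.set('collapse', options.collapse || 'digest');
  const fl = options.fl ||
    ['urlkey', 'timestamp', 'original', 'mimetype', 'statuscode', 'digest', 'length'];
  params.set('fl', fl.join(','));
  for (const key of ['from', 'to']) {
    if (options[key] !== undefined) {
      // from/to use the 1 to 14 digit yyyyMMddhhmmss format.
      if (!/^\d{1,14}$/.test(String(options[key]))) {
        throw new Error(key + ' must be 1 to 14 digits, got: ' + options[key]);
      }
      params.set(key, String(options[key]));
    }
  }
  // Leave `limit` out so the CDX server applies its own default.
  if (options.limit) params.set('limit', String(options.limit));
  return CDX_ENDPOINT + '?' + params.toString();
}

const cdxUrl = buildCdxUrl({ url: 'example.com', limit: 10, from: '20210623', to: '20210624' });
console.log(cdxUrl);
```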

Wayback.Asset (object)

An Asset is an object that gets downloaded or restored and is returned by various events.

Asset.properties

// CDX urlkey
this.key = null;

// the url to restore
this.original_url = "";

// path to local file
this.restored_file = "";

this.timestamp = "";

// restored | failed | unarchived
this.status = RESTORE_STATUS.EMPTY;

// mimetype: html, image, css, js, based on wayback types
this.mimetype = "";

Asset.methods

Asset.getSnapShot(flag)

Set to true to fetch file in its original state, without any processing by the Wayback Machine or waybackpack. You will likely want this to always be true.
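
For context, the Wayback Machine serves a capture in its original state when the `id_` modifier is appended to the timestamp in the snapshot URL. The sketch below shows that URL shape. `snapshotUrl` is a hypothetical helper, and it is an assumption that the flag maps to this modifier inside the library.

```javascript
// Hypothetical helper: builds a Wayback Machine snapshot URL.
// With raw = true it appends the `id_` modifier to the timestamp, which asks
// the Wayback Machine for the capture as archived, without rewritten links
// or the injected toolbar.
function snapshotUrl(timestamp, originalUrl, raw) {
  const modifier = raw ? 'id_' : '';
  return 'https://web.archive.org/web/' + timestamp + modifier + '/' + originalUrl;
}

console.log(snapshotUrl('20150531', 'http://example.com', true));
// https://web.archive.org/web/20150531id_/http://example.com
console.log(snapshotUrl('20150531', 'http://example.com', false));
// https://web.archive.org/web/20150531/http://example.com
```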

Wayback.createAsset(obj)

Helper method to create an Asset object.

obj maps to Asset.properties
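
To make the mapping concrete, the sketch below turns one CDX result row, in the default fl field order, into a plain object shaped like Asset.properties. `assetFromCdxRow` is a hypothetical helper and the row values are made up for illustration. The real Wayback.createAsset may set or normalise fields differently.

```javascript
// Default field order documented for the `fl` option.
const CDX_FIELDS = ['urlkey', 'timestamp', 'original', 'mimetype', 'statuscode', 'digest', 'length'];

// Hypothetical mapping from one CDX result row to an object shaped like
// Asset.properties. The real Wayback.createAsset may set more fields.
function assetFromCdxRow(row) {
  const record = {};
  CDX_FIELDS.forEach((field, index) => {
    record[field] = row[index];
  });
  return {
    key: record.urlkey,
    original_url: record.original,
    restored_file: '', // filled in once the file is written locally
    timestamp: record.timestamp,
    status: null, // assumption: stands in for RESTORE_STATUS.EMPTY
    mimetype: record.mimetype // CDX reports full MIME types; the library may shorten them
  };
}

// Made-up row for illustration only.
const sampleAsset = assetFromCdxRow([
  'com,example)/', '20150801040409', 'http://example.com/', 'text/html', '200', 'ABC123', '1024'
]);
console.log(sampleAsset.key, sampleAsset.original_url, sampleAsset.timestamp);
```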

Wayback.downloadAsset(Asset, directory)

This method will download an Asset to the directory provided.

Asset is an Asset object.

directory is the directory to write the asset content to.

Examples

// Restore from a full snapshot URL.
var restore = Wayback.restore({
    url: "http://web.archive.org/web/20150531/http://example.com"
});

// Restore from a domain and a timestamp.
var restore = Wayback.restore({
    domain: 'example.com',
    timestamp: "20150531"
});

// Restore by passing the snapshot URL as a string.
var restore = Wayback.restore('http://web.archive.org/web/20150531/http://example.com');

// List files captured between two dates without downloading them, skipping images.
Wayback.downloader({
  url: 'https://trufish.org/',
  from: '20181001',
  to: '20201031',
  list: true,
  concurrency: 10,
  exact_url: false,
  // The dot is escaped so it matches a literal "." before the extension.
  exclude: /\.(gif|jpg|jpeg|png|svg)$/i
})
  .on('completed', function (results) {
    console.log('completed');
    console.log(results);
  })
  .start((asset) => {
    console.log('Asset', asset.getSnapshotUrl());
  });

// Query the CDX server for snapshots.
Wayback.snapshot({
  url: 'example.com',
  filter: 'statuscode:200',
  collapse: 'digest',
  matchType: 'exact',
  limit: 10,
  from: '20210623',
  to: '20210624'
})
  .then(([snapshots, url]) => {
    console.log('snapshots', snapshots);
    console.log('# snapshots found: ', snapshots.length);
    console.log('CDX Query URL:', url);
  })
  .catch((error) => {
    console.log('error', error);
  });

More examples are in the example.js file of this repository.

TODO

  • Improve documentation
  • Improve this module's API
  • Create a CLI. Maybe as a separate module?

Need a GUI Application?

Check out Restorizor, a Wayback Machine download application built using Electron and powered by this very same wayback-restore.js module.