npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

mirror-website

v2.0.0

Published

Mirrors websites in a useful way, with corrected links and local copies of assets.

Downloads

21

Readme

Why another mirroring tool?

Have you seen websites lately? Every single one is a special snowflake with unusual requirements if you are going to mirror it as a static site. mirror-website is designed from the beginning with the expectation that you'll need to write some code for those cases. And mirror-website makes it really easy to do that.

Usage

  1. Create your own project, with git init and npm init, in the usual way. Also npm install tools like lodash and cheerio that you may wish to use.
  2. npm install mirror-website
  3. Write your app.js. You'll write preprocessors (which modify the markup first), discoverers (which seek out URLs to be mirrored), and rewriters (your last chance to modify the markup). Here are some examples.

Note the use of $append. Without this, you're replacing all of the standard preprocessors, discoverers and rewriters bundled with mirror-website. You probably don't want to do that.

```javascript
const _ = require('lodash');
const cheerio = require('cheerio');
const fs = require('fs');
const mirror = require('mirror-website');

mirror({

  // Define the sites to be mirrored. We can override properties for
  // individual sites in these objects too, the stuff in "defaults"
  // applies to all of them
  sites: [
    { url: "http://example1.com" },
    { url: "http://example2.com" },
    { url: "http://example3.com" },
    { url: "http://example4.com" }
  ],

  // Config settings for all sites
  defaults: {
    aliases: [
      'some-cdn-that-must-get-mirrored-too-for-every-site.com'
    ],
    // Some special attributes that also carry URLs for this set of sites
    urlAttrs: {
      $append: [ 'altsrc', 'data-original', 'data-original-alt' ]
    },
    preprocessors: {
      $append: [
        // A chance to modify the body before any URL discoverers run
        function(url, body) {
          // modify body, then...
          return body;
        }
      ]
    },
    discoverers: {
      $append: [
        {
          // Discover URLs in special HTML pages that just do browser side redirects.
          // Push them onto the urls array to make sure they get crawled
          type: 'text/html',
          function: function(urls, url, body) {
            const matches = body.match(/location=\'(.*?)\'/);
            if (matches) {
              urls.push(matches[1]);
            }
          }
        }
      ]
    },
    rewriters: {
      $append: [
        {
          // Work around a problem with a redirect page
          type: 'text/html',
          function: function(url, body) {
            if (!url.match(/examples$/)) {
              return body;
            }
            return '<script>location = \'/example/example.htm\';</script>';
          }
        },
        {
          // Replace custom contact forms found on a particular
          // site with a mailto link and instructions to
          // provide the same fields. This logic is specific to
          // the sites I was mirroring that day. The point is
          // to give you the same flexibility

          type: 'text/html',

          function: function(url, body) {
            const $ = cheerio.load(body);
            const $old = $('form.fancy');
            if (!$old.length) {
              return body;
            }
            const mailto = $old.attr('emailto');
            const $link = $('<p style="margin-top: 32px"><a>Please reach out via email to ' + mailto + '.</a></p>');
            $link.find('a').attr('href', 'mailto:' + mailto);
            const $prompt = $('<p style="margin-top: 32px">Be sure to include the following information:</p>');
            const $list = $('<ul></ul>');
            const venue = $('title').text().replace(/^[\s\S]*\|\s*/, '');
            if (venue) {
              var $item = $('<li></li>');
              $item.text('Name of venue: ' + venue);
            } else {
              $item.text('Name of venue');
            }
            $list.append($item);
            $old.find('label').each(function() {
              var $item = $('<li></li>');
              $item.text($(this).text());
              $list.append($item);
            });
            const $new = $('<div></div>');
            $new.append($link);
            $new.append($prompt);
            $new.append($list);
            $old.replaceWith($new);
            return $.html();
          }
        },
      ]
    }
  },

  // Custom callback to modify the config for each site
  // before it actually gets mirrored
  init: function(config) {
    // Change foo.com into just foo, so we do not add domain names
    // when creating folders, and use a "sites/" parent folder
    config.folder = 'sites/' + config.folder.replace(/\.[^/]+$/, '');
  }
}).then(function() {
  console.log('Done!');
})

Note that the configuration object must have a sites property, which should be an array in which every entry has a url property and, optionally, aliases (an array of hostnames considered equivalent to the main one).

You can also set properties like aliases under a defaults key, which applies to every site.

Then run your application:

node app

This creates a subdirectory in the current directory named after the domain name in your URL.

Changelog

2.0.0: abandoned global install in favor of a library for better maintainability and error messages. 1.0.1: request dependency for request-promise.