

web-tree-crawler

A naive web crawler that builds a tree of URLs under a domain using web-tree.

Note: This software is intended for personal learning and testing purposes.

How it works

You pass web-tree-crawler a URL, and it tries to discover and visit as many URLs under that domain as it can within a time limit. When time is up or it has run out of URLs, web-tree-crawler spits out a tree of the URLs it visited. There are several configuration options; see the usage sections below.

Install

npm i web-tree-crawler

CLI

Usage

Usage: [option=] web-tree-crawler <url>

Options:
  format     , f  The output format of the tree (default="string")
  headers    , h  File containing headers to send with each request
  numRequests, n  The number of requests to send at a time (default=200)
  outFile    , o  Write the tree to file instead of stdout
  pathList   , p  File containing paths to initially crawl
  timeLimit  , t  The max number of seconds to run (default=120)
  verbose    , v  Log info and progress to stdout

Examples

Crawl and print tree to stdout

$ h=/path/to/file web-tree-crawler <url>

.com
  .domain
    .subdomain1
      /foo
        /bar
      .subdomain-of-subdomain1
        /baz
          ?q=1
    .subdomain2
...

And to print an HTML tree...

$ f=html web-tree-crawler <url>

...

Crawl and write tree to file

$ o=/path/to/file web-tree-crawler <url>

Wrote tree to file!

Crawl with verbose logging

$ v=true web-tree-crawler <url>

Visited "<url>"
Visited "<another-url>"
...
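Crawl with combined options

Because options are environment-style assignments, they should compose in a single command. An untested sketch using only the flags documented above:

$ f=html o=/path/to/file t=60 web-tree-crawler <url>

Wrote tree to file!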

JS

Usage

/**
 * This is the main exported function that crawls and resolves the URL tree.
 *
 * @param  {String}   url
 * @param  {Object}   [opts = {}]
 * @param  {Object}   [opts.headers]           - headers to send with each request
 * @param  {Number}   [opts.numRequests = 200] - the number of requests to send at a time
 * @param  {String[]} [opts.startPaths]        - paths to initially crawl
 * @param  {Number}   [opts.timeLimit = 120]   - the max number of seconds to run for
 * @param  {Boolean}  [opts.verbose]           - if true, logs info and progress to stdout
 * @param  {}         [opts....]               - additional options for #lib.request()
 *
 * @return {Promise}
 */

Example

'use strict'

const crawl = require('web-tree-crawler')

crawl(url, opts)
  .then(tree => { ... })
  .catch(err => { ... })
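The example above elides the details. Here is a fuller sketch based on the options documented in the JSDoc block; the URL, header, and option values are illustrative, not prescribed by the library:

'use strict'

const crawl = require('web-tree-crawler')

// Option names mirror the JSDoc above; the values are illustrative.
crawl('https://example.com', {
  headers: { 'user-agent': 'web-tree-crawler' }, // sent with each request
  numRequests: 100,                              // requests to send at a time
  startPaths: ['/blog', '/about'],               // paths to crawl first
  timeLimit: 60,                                 // max seconds to run
  verbose: true                                  // log progress to stdout
})
  .then(tree => { console.log(tree) })
  .catch(err => { console.error(err) })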

Test

npm test

Lint

npm run lint

Documentation

npm run doc

Generates the docs and opens them in your browser.

Contributing

Please do!

If you find a bug, want a feature added, or just have a question, feel free to open an issue. You're also welcome to create a pull request addressing an issue. Push your changes to a feature branch and request a merge into develop.

Make sure linting and tests pass and coverage is 💯 before creating a pull request!