npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

linkscanner

v0.4.2

Published

A CLI and node library to scan a source URL and recursively check URLs

Downloads

64

Readme

Linkscanner

npm version Build Status

Linkscanner will recursively scan your site for broken links, including css and javascript files if you ask it to.

usage:

linkscanner

Scans a web page for broken links

Options:
  --help                     Show help                                 [boolean]
  --version                  Show version number                       [boolean]
  --followRedirects, -L      Follow 301 & 302 redirects to their destination and
                             report that result                        [boolean]
  --debug, -d                Print additional debug logging to stderr  [boolean]
  --verbose, -v              Print more information in the output results
                             (formatter dependent)                     [boolean]
  --compact, -c              Print less information in the output results
                             (formatter dependent)                     [boolean]
  --no-progress, -P          Do not display a progress bar             [boolean]
  --progress, -p             display a progress bar                    [boolean]
  --total                    Give the progress bar a hint of approx how many
                             URLs we will scan                          [number]
  --ignore-robots-file       Causes linkscanner to not respect robots file rules
                             like disallow or crawl delay              [boolean]
  --recursive, -r            Recursively crawl all links on the same host
  --XGET                     Always use a GET request when normally would use a
                             HEAD                                      [boolean]
  --user-agent               A user-agent string to be used when sending
                             requests                                   [string]
  --max-concurrency          The maximum number of simultaneous requests going
                             out from your computer                     [number]
  --timeout                  The maximum time to wait (in seconds) for a
                             response before writing a timeout to the results
                                                          [number] [default: 10]
  --headers, -H                                                          [array]
  --formatter, -f, --format  Set the output formatter or format string.
                             Options: console (default), table, json, csv,
                             or format string like "url: %{url_effective}"
                                                                        [string]
  --skip-leaves              Do not issue a HEAD request to leaf urls, simply
                             print them (implies show-skipped)         [boolean]
  --exclude-external, -e     Do not test links that point to other hosts
                                                                       [boolean]
  --show-skipped             Display skipped results in the formatted output
                                                                       [boolean]
  --include, -i              CSS Selector for which HTML elements to inspect.
                             Examples: "a", "link[rel=\"canonical\"]", "img",
                             "script", "form", "iframe", "all"           [array]
  --only                     A content type (or list of content types) to parse.
                             All other content types will not be scanned for
                             links.                                      [array]

example:

▶ linkscanner -r http://gordonburgett.net
200 GET  http://gordonburgett.net/
	26 links found. 0 not checked. 0 broken.
	301 GET  http://gordonburgett.net/albania
	301 GET  http://gordonburgett.net/search
	301 HEAD http://www.gordonburgett.net/index.xml
	301 HEAD http://www.gordonburgett.net/

200 GET  http://gordonburgett.net/sq/
	found on http://gordonburgett.net/
	14 links found. 0 not checked. 0 broken.
	301 GET  http://gordonburgett.net/sq/albania
	301 GET  http://gordonburgett.net/sq/search
	301 HEAD http://www.gordonburgett.net/index.xml
	301 HEAD http://www.gordonburgett.net/sq/

200 GET  http://gordonburgett.net/post/2019/04_summer_projects_2019/
	found on http://gordonburgett.net/
	13 links found. 0 not checked. 0 broken.
	301 GET  http://gordonburgett.net/albania
	301 GET  http://gordonburgett.net/search
	301 HEAD http://www.gordonburgett.net/index.xml
	301 HEAD http://biblehub.com/esv/1_corinthians/3.htm
	301 HEAD https://www.teamalbania.org/blog/2019-02-25-the-plus-in-albania-plus
	301 HEAD http://biblehub.com/esv/1_corinthians/3.htm
	301 HEAD http://www.gordonburgett.net/post/2019/04_summer_projects_2019/

...

When STDOUT is not a TTY (for example, when piping) then the default output formatter is the Table format. This makes it easy to work with using command line tools:

▶ linkscanner  http://gordonburgett.net/ | awk '{ print $3,$1 }'
http://gordonburgett.net/ 200
http://gordonburgett.net/albania 301
http://gordonburgett.net/search 301
http://gordonburgett.net/sq/ 200
http://www.gordonburgett.net/index.xml 301
http://gordonburgett.net/# 200
http://gordonburgett.net/pubkey.txt 200
http://gordonburgett.net/post/2019/04_summer_projects_2019/ 200
http://gordonburgett.net/post/2018/07_acts_29_in_albania/ 200
http://gordonburgett.net/post/2018/05_enrolled_at_hogwarts/ 200
http://gordonburgett.net/post/2018/04_upcoming_summer_project/ 200
http://gordonburgett.net/post/2017/12_mens_retreat/ 200
http://gordonburgett.net/post/2017/10_nothings_over/ 200
http://gordonburgett.net/post/2017/07_making_a_searchable_static_site/ 200
http://gordonburgett.net/post/2017/07_home/ 200
http://gordonburgett.net/post/ 200
http://www.gordonburgett.net/ 301
https://github.com/gburgett 200

The verbose table format is a markdown table:

▶ linkscanner -v http://gordonburgett.net/ | pbcopy

| status | method | url | ms | parent | | ------ | ------ | -------------------------------------------------------------------------------- | ---- | -------------------------------------------------------------------------------- | | 200 | GET | http://gordonburgett.net/ | 86 | | | 301 | HEAD | http://gordonburgett.net/albania | 67 | http://gordonburgett.net/ | | 301 | HEAD | http://gordonburgett.net/search | 52 | http://gordonburgett.net/ | | 200 | HEAD | http://gordonburgett.net/sq/ | 55 | http://gordonburgett.net/ | | 301 | HEAD | http://www.gordonburgett.net/index.xml | 55 | http://gordonburgett.net/ | | 200 | HEAD | http://gordonburgett.net/# | 51 | http://gordonburgett.net/ | | 200 | HEAD | http://gordonburgett.net/pubkey.txt | 50 | http://gordonburgett.net/ | | 200 | HEAD | http://gordonburgett.net/post/2019/04_summer_projects_2019/ | 77 | http://gordonburgett.net/ | | 200 | HEAD | http://gordonburgett.net/post/2018/07_acts_29_in_albania/ | 49 | http://gordonburgett.net/ | | 200 | HEAD | http://gordonburgett.net/post/2018/05_enrolled_at_hogwarts/ | 40 | http://gordonburgett.net/ | | 200 | HEAD | http://gordonburgett.net/post/2018/04_upcoming_summer_project/ | 45 | http://gordonburgett.net/ | | 200 | HEAD | http://gordonburgett.net/post/2017/12_mens_retreat/ | 67 | http://gordonburgett.net/ | | 200 | HEAD | http://gordonburgett.net/post/2017/10_nothings_over/ | 59 | http://gordonburgett.net/ | | 200 | HEAD | http://gordonburgett.net/post/2017/07_making_a_searchable_static_site/ | 51 | http://gordonburgett.net/ | | 200 | HEAD | http://gordonburgett.net/post/2017/07_home/ | 47 | http://gordonburgett.net/ | | 200 | HEAD | http://gordonburgett.net/post/ | 44 | http://gordonburgett.net/ | | 301 | HEAD | http://www.gordonburgett.net/ | 54 | http://gordonburgett.net/ | | 200 | HEAD | https://github.com/gburgett | 1375 | http://gordonburgett.net/ |

To get back to the regular format you can use -f console:

▶ linkscanner  http://gordonburgett.net/ -f console | pbcopy
200 GET  http://gordonburgett.net/
	26 links found. 0 not checked. 0 broken.
	301 HEAD http://gordonburgett.net/albania
	301 HEAD http://gordonburgett.net/search
	301 HEAD http://www.gordonburgett.net/index.xml
	301 HEAD http://www.gordonburgett.net/

Check JS, CSS, and Images with the --include=all option:

▶ linkscanner -i all -f table http://gordonburgett.net
...
200	HEAD	http://maxcdn.bootstrapcdn.com/font-awesome/4.3.0/css/font-awesome.min.css      	  63	http://gordonburgett.net/
200	HEAD	http://gordonburgett.net/css/uno.min.css                                        	 108	http://gordonburgett.net/
200	HEAD	https://github.com/gburgett                                                     	 847	http://gordonburgett.net/
200	HEAD	http://gordonburgett.net/css/custom.css                                         	 121	http://gordonburgett.net/
200	HEAD	https://code.jquery.com/jquery-2.2.4.min.js                                     	  92	http://gordonburgett.net/
200	HEAD	http://gordonburgett.net/js/highlight/styles/vs2015.css                         	 181	http://gordonburgett.net/
200	HEAD	http://gordonburgett.net/js/main.min.js                                         	  92	http://gordonburgett.net/
200	HEAD	http://gordonburgett.net/js/custom.js                                           	  88	http://gordonburgett.net/
200	HEAD	http://gordonburgett.net/js/highlight/highlight.pack.js                         	  87	http://gordonburgett.net/

When scanning application/json responses, by default the scanner looks for URLs in any of the following keys, no matter how deep in the document: links, _links, link, _link, or url.

If you want to scan more keys for links, use the --include option with a jsonpath selector, or use --include all which will scan every key.