broken-links-inspector

v1.4.0

Extract and recursively check all URLs reporting broken ones

Downloads: 102

Broken Links Inspector

This project is heavily inspired by stevenvachon/broken-link-checker.

If you want to use this tool and need any help (instructions, bug fixes, features), open an issue!

Features:

  • inspects a web page and all its URLs, and reports broken ones
  • can run recursively, inspecting all pages within a domain
  • makes requests in parallel and shows a "work in progress" indicator
  • does not check the same URL twice
  • reports OK, TIMEOUT, ERROR CODE or a generic error
  • supports a configurable timeout
  • supports GET and HEAD methods (double-checks with GET if HEAD fails)
  • supports a list of excluded URLs (glob matching) and/or excluded prefixes (e.g. mailto:)
  • can define additional OK codes, such as 999 for LinkedIn
  • supports different reporters, such as colored console output or a JUnit file
  • JUnit report is best used with CI (tested with GitLab)
  • need a feature? Open an issue

How to install and run

npm i -g broken-links-inspector

bli inspect https://dbogatov.org -r -t 2000 -s linkedin --reporters console

# or
# bli inspect file://links.txt
# with a URL per line in a file links.txt
................................................................................
................................................................................
........................
original request
	OK      : https://dbogatov.org/
	OK: 1, skipped: 0, broken: 0
https://dbogatov.org/
	OK      : https://scholar.google.com/citations?user=Mq8ButkAAAAJ
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/docs/resume.pdf
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/docs/cv.pdf
	OK      : https://twitter.com/Dima4ka007
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/vendor/css/merged.css
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/vendor/js/merged.js
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/dmytro-bogatov.jpg
	OK      : https://dbogatov.org/contact
	OK      : https://dbogatov.org/research
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/favicon.ico
	OK      : https://dbogatov.org/publications
	OK      : https://www.googletagmanager.com/gtag/js?id=UA-65293382-4
	OK      : https://stackpath.bootstrapcdn.com/font-awesome/4.7.0/css/font-awesome.min.css
	OK      : https://git.dbogatov.org/dbogatov/research-website/commit/39ecd1a9
	OK      : https://dbogatov.org/projects
	OK      : https://www.facebook.com/dkbogatov
	OK      : https://dbogatov.org/education
	OK      : https://github.com/dbogatov
	OK: 18, skipped: 3, broken: 0
https://dbogatov.org/education
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/config/grades.yml
	OK: 1, skipped: 21, broken: 0
https://dbogatov.org/projects
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/projects/mandelbrot.png
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/projects/matters-proj.png
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/projects/shevastream.png
	OK      : https://github.com/WPIMHTC
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/projects/status-site.png
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/projects/bu-logo.png
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/projects/fabric.png
	OK      : https://github.com/dbogatov/shevastream
	OK      : https://legacy.dbogatov.org/Project/Mandelbrot
	OK      : https://github.com/dbogatov/legacy-website
	OK      : https://github.com/IBM/dac-lib
	OK      : https://github.com/dbogatov/status-site
	OK      : https://github.com/dbogatov/ore-benchmark
	OK      : https://shevastream.com/
	OK      : https://status.dbogatov.org/
	OK      : https://ore.dbogatov.org/
	OK      : http://matters.mhtc.org/
	OK      : https://dbogatov.org/assets/docs/dac-fabric.pdf
	OK: 18, skipped: 21, broken: 0
https://dbogatov.org/publications
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/docs/mqp-paper.pdf
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/docs/econ-paper.pdf
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/docs/ore-presentation.pdf
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/docs/ore-poster.pdf
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/docs/ore-benchmark.pdf
	OK      : http://dispot.korkinlab.org/
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/docs/dac-fabric.pdf
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/docs/dispot.pdf
	OK      : https://hub.docker.com/r/korkinlab/dispot
	OK      : https://github.com/korkinlab/dispot
	OK      : https://digitalcommons.wpi.edu/cgi/viewcontent.cgi?article=2915&context=iqp-all
	OK      : https://dl.acm.org/doi/10.14778/3324301.3324309
	OK      : https://doi.org/10.14778/3324301.3324309
	OK      : https://doi.org/10.1093/bioinformatics/btz587
	OK      : https://academic.oup.com/bioinformatics/article/35/24/5374/5539863
	OK: 15, skipped: 21, broken: 0
https://dbogatov.org/research
	OK      : http://people.cs.georgetown.edu/~kobbi/
	OK      : https://arxiv.org/abs/1706.01552
	OK      : https://www.cs.bu.edu/~reyzin/
	OK      : http://www.cs.bu.edu/~gkollios/
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/collaborators/bjoern.png
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/collaborators/kobi.jpg
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/collaborators/kellaris.jpeg
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/collaborators/lorenzo.png
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/collaborators/leo.png
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/collaborators/adam.jpg
	OK      : http://www.cs.bu.edu/fac/gkollios/
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/collaborators/kollios.png
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/collaborators/pixel.jpg
	OK      : https://www.icloud.com/sharedalbum/
	OK      : https://www.cics.umass.edu/people/oneill-adam
	OK      : https://computerscience.uchicago.edu/people/profile/lorenzo-orecchia/
	OK      : https://midas.bu.edu/
	OK      : https://dblp.org/pers/t/Tackmann:Bj=ouml=rn.html
	OK      : https://dbogatov.org/assets/docs/ore-benchmark.pdf
	OK      : https://dbogatov.org/assets/docs/dac-fabric.pdf
	OK: 20, skipped: 22, broken: 0
https://dbogatov.org/contact
	OK: 0, skipped: 23, broken: 0
OK: 73, skipped: 111, broken: 0

How to use

$ bli inspect -h

Usage: index inspect [options] <url> <file://>

Check links in the given URL or a text file

Options:
  -r, --recursive                             recursively check all links in all URLs within supplied host (ignored for file://) (default: false)
  -t, --timeout <number>                      timeout in ms after which the link will be considered broken (default: 2000)
  -g, --get                                   use GET request instead of HEAD (default: false)
  -s, --skip <globs>                          URLs to skip defined by globs, like '*linkedin*' (default: [])
  --reporters <coma-separated-strings>        Reporters to use in processing the results (junit, console) (default: ["console"])
  --retries <number>                          The number of times to retry TIMEOUT URLs (default: 3)
  --user-agent <string>                       The User-Agent header (default: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15
                                              (KHTML, like Gecko) Version/14.1 Safari/605.1.15")
  --ignore-prefixes <coma-separated-strings>  prefix(es) to ignore (without ':'), like mailto: and tel: (default: ["javascript","data","mailto","sms","tel","geo"])
  --accept-codes <coma-separated-numbers>     HTTP response code(s) (beyond 200-299) to accept, like 999 for linkedin (default: [999])
  --ignore-skipped                            Do not report skipped URLs (default: false)
  --single-threaded                           Do not enable parallelization (default: false)
  -v, --verbose                               log progress of checking URLs (default: false)
  -h, --help                                  display help for command

The return code is 1 if at least one broken link is detected, and 0 otherwise.

-r, --recursive instructs the inspector to keep checking all URLs within the original domain. This is very useful for checking an entire website, such as a personal blog. For example, bli inspect https://yoursite.com -r will check yoursite.com, and if it finds a link such as yoursite.com/contact, it will check that page as well and keep going. It checks all URLs on all pages but does not parse "external" pages.
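The rule described above (check every link, but only parse pages on the original host) can be sketched roughly as follows. This is an illustration only, not the tool's actual code: isInternal and crawl are hypothetical names, and the in-memory site map stands in for real HTTP requests.

```typescript
// Hypothetical sketch of the recursion rule: every link is checked once,
// but only pages on the starting host are parsed for more links.

// True when `link` lives on the same host as `start`.
function isInternal(start: string, link: string): boolean {
  try {
    return new URL(link).host === new URL(start).host;
  } catch {
    return false; // unparsable URLs are never followed
  }
}

// `site` is an in-memory stand-in for the web: page URL -> links found on it.
function crawl(
  start: string,
  site: Map<string, string[]>
): { checked: string[]; parsed: string[] } {
  const checked = new Set<string>([start]);
  const parsed: string[] = [];
  const queue: string[] = [start];

  while (queue.length > 0) {
    const page = queue.shift() as string;
    parsed.push(page);
    for (const link of site.get(page) ?? []) {
      if (checked.has(link)) continue; // never check the same URL twice
      checked.add(link);
      // External pages are checked but never parsed for further links.
      if (isInternal(start, link)) queue.push(link);
    }
  }
  return { checked: Array.from(checked), parsed };
}
```

In this sketch, a link to another host ends up in checked but never in parsed, so links that appear only on external pages are never visited.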

-t, --timeout <number> sets the request timeout in milliseconds. If the timeout is exceeded, the check fails with TIMEOUT.

-g, --get instructs the inspector to use GET requests instead of the default HEAD requests. Even without this option, if a HEAD request fails, the URL is retried with GET.
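A minimal sketch of the HEAD-then-GET behavior, assuming a request function that resolves to an HTTP status code. The names Requester and checkUrl are hypothetical, and the transport is injected so the sketch runs without network access; the tool's real implementation may differ.

```typescript
type Method = "HEAD" | "GET";

// Hypothetical transport: resolves to an HTTP status code,
// rejects on a network error or timeout.
type Requester = (url: string, method: Method) => Promise<number>;

interface CheckResult {
  ok: boolean;
  method: Method; // the method that produced the final verdict
}

async function checkUrl(
  url: string,
  request: Requester,
  forceGet = false // corresponds to the idea behind -g, --get
): Promise<CheckResult> {
  const isOk = (status: number): boolean => status >= 200 && status <= 299;

  if (!forceGet) {
    try {
      if (isOk(await request(url, "HEAD"))) return { ok: true, method: "HEAD" };
    } catch {
      // HEAD failed outright: fall through and double-check with GET.
    }
  }

  try {
    return { ok: isOk(await request(url, "GET")), method: "GET" };
  } catch {
    return { ok: false, method: "GET" };
  }
}
```

Some servers reject HEAD (for example with 405) while serving GET normally, which is why a failed HEAD is treated as inconclusive here rather than as a broken link.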

-s, --skip <comma-separated-globs> is a list of globs or parts of URLs to skip. For example, -s *linkedin* -s hello skips all URLs that contain either linkedin or hello.
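One way to picture the matching described above is to turn each glob into an unanchored regular expression, so that both *linkedin* and a plain fragment like hello match anywhere in the URL. This is a sketch under that assumption; globToRegExp and shouldSkip are hypothetical helpers, and the tool's real glob semantics may differ.

```typescript
// Escape regex metacharacters (except `*`), then turn each glob `*` into `.*`.
// The result is deliberately unanchored, so a plain fragment matches anywhere.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(escaped.replace(/\*/g, ".*"));
}

// True when the URL matches at least one of the skip patterns.
function shouldSkip(url: string, skip: string[]): boolean {
  return skip.some((glob) => globToRegExp(glob).test(url));
}

const skipPatterns = ["*linkedin*", "hello"];
console.log(shouldSkip("https://www.linkedin.com/in/someone", skipPatterns));
console.log(shouldSkip("https://example.org/", skipPatterns));
```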

--reporters <comma-separated-strings> is a list of reporters used to process the results. Currently there are two: console and junit. console prints a colored report to the console. junit produces a junit-report.xml file in the current directory. The JUnit file treats pages as test suites and the URLs on a page as test cases.
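To illustrate the page-to-suite and URL-to-case mapping, here is a rough sketch that builds a JUnit-style XML string. The element and attribute names follow the common JUnit XML layout and are an assumption; the exact contents of the junit-report.xml produced by the tool are not described here. LinkResult, escapeXml and toJUnit are hypothetical names.

```typescript
interface LinkResult {
  url: string;
  status: string; // e.g. "OK", "TIMEOUT", or an HTTP error code as text
}

// Escape characters that are not allowed inside XML attribute values.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Each page becomes a <testsuite>, each URL on it a <testcase>.
function toJUnit(pages: Map<string, LinkResult[]>): string {
  const lines: string[] = ['<?xml version="1.0" encoding="UTF-8"?>', "<testsuites>"];
  pages.forEach((links, page) => {
    const failures = links.filter((link) => link.status !== "OK").length;
    lines.push(
      `  <testsuite name="${escapeXml(page)}" tests="${links.length}" failures="${failures}">`
    );
    for (const link of links) {
      const name = escapeXml(link.url);
      if (link.status === "OK") {
        lines.push(`    <testcase name="${name}"/>`);
      } else {
        // A broken link is reported as a failed test case.
        lines.push(
          `    <testcase name="${name}"><failure message="${escapeXml(link.status)}"/></testcase>`
        );
      }
    }
    lines.push("  </testsuite>");
  });
  lines.push("</testsuites>");
  return lines.join("\n");
}
```

Escaping matters in practice because many URLs contain & in their query strings, which would otherwise produce invalid XML.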

--retries <number> sets the number of times to retry a URL that timed out before declaring it broken.

--user-agent <string> sets the User-Agent header to send (some websites reply with 401 Unauthorized to "bots").

--ignore-prefixes <comma-separated-strings> is a list of prefixes (URL schemes) to skip, such as mailto:. The provided list should not include the colons.
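A small sketch of how such a prefix check could work, using the default list shown in the help output above. hasIgnoredPrefix is a hypothetical helper, and the case-insensitive comparison is an assumption rather than documented behavior.

```typescript
// Default list copied from the help output above (no colons).
const DEFAULT_IGNORED_PREFIXES = ["javascript", "data", "mailto", "sms", "tel", "geo"];

// True when the link starts with one of the prefixes followed by a colon.
// Case-insensitive matching is an assumption of this sketch.
function hasIgnoredPrefix(
  link: string,
  prefixes: string[] = DEFAULT_IGNORED_PREFIXES
): boolean {
  const lower = link.trim().toLowerCase();
  return prefixes.some((prefix) => lower.startsWith(prefix.toLowerCase() + ":"));
}

console.log(hasIgnoredPrefix("mailto:someone@example.org"));
console.log(hasIgnoredPrefix("https://example.org/mailto"));
```

Because the colon is appended during the comparison, the list itself stays colon-free, which matches the note above.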

--accept-codes <comma-separated-numbers> is a list of HTTP codes to consider successful in addition to 200-299, like 999 for LinkedIn.
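The acceptance rule can be sketched as a range check plus a lookup in the extra list. parseCodes and isSuccess are hypothetical helpers written for illustration; only the 200-299 range and the default of 999 come from the help output above.

```typescript
// Parse a comma-separated option value such as "999,403" into numbers,
// dropping anything that is not a valid integer.
function parseCodes(raw: string): number[] {
  return raw
    .split(",")
    .map((part) => Number.parseInt(part.trim(), 10))
    .filter((code) => !Number.isNaN(code));
}

// A response counts as successful if it is 2xx or explicitly accepted.
function isSuccess(status: number, acceptCodes: number[] = [999]): boolean {
  return (status >= 200 && status <= 299) || acceptCodes.includes(status);
}

console.log(isSuccess(999));
console.log(isSuccess(403, parseCodes("999, 403")));
```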

--ignore-skipped excludes skipped URLs from reports.

--single-threaded forces sequential execution (intended for debugging).

-v, --verbose is currently unused.

How to build

npm install # to install dependencies

npm run build # to compile TS (result in ./dist/index.js)

npm run coverage # to run tests and coverage