walk-site (experimental)

A minimal, Puppeteer-based link crawler with concurrency, depth-limiting, extension filtering, and flexible callbacks.

Installation

npm i walk-site

Quick start

import { walkSite } from 'walk-site'

const targetURL = new URL('https://example.com')

await walkSite(targetURL, {
  // Only visit links on the same domain
  onURL: (url) => url.hostname === targetURL.hostname,
  onPage: (page) => {
    console.log('Page title:', page.title)
    console.log('Page content:', page.content)
  },
})

Examples

With depth limit

import { walkSite } from 'walk-site'

const targetURL = new URL('https://example.com')

await walkSite(targetURL, {
  // Visit the initial page and its direct links
  depth: 1,
  onURL: (url) => url.hostname === targetURL.hostname,
  onPage: (page) => {
    console.log('Page title:', page.title)
    console.log('Depth:', page.depth)
  },
})

With concurrency

import { walkSite } from 'walk-site'

const targetURL = new URL('https://example.com')

await walkSite(targetURL, {
  // Visit up to 5 pages concurrently
  concurrency: 5,
  onURL: (url) => url.hostname === targetURL.hostname,
  onPage: (page) => {
    console.log('Page title:', page.title)
  },
})

API Reference

walkSite(targetURL, options)

Crawls links starting from targetURL. Returns a Promise that resolves once all pages have been processed, or rejects on internal errors (unless they are handled by onError).

Parameters

  • targetURL: string | URL The starting URL to crawl.

  • options: WalkSiteOptions Configuration object:

    • depth: number | undefined
      Limits crawl depth. depth = 0 visits only targetURL; depth = 1 includes its children, etc. Default is undefined (no limit).

    • concurrency: number | undefined
      Number of pages processed in parallel. Defaults to 1 (serial crawling).

    • onURL: (url: URL, meta: { href: string; depth: number }) => boolean | void | Promise<boolean | void>
      Called before enqueuing a link. Return false to skip it.

    • onPage: (page: Page) => void | Promise<void>
      Called after navigating to a page. Can be used to extract or process HTML content.

    • onError: (error: unknown, url: URL) => void | Promise<void> | undefined
      Called when an error occurs (e.g., a network failure, or a non-2xx HTTP status if you treat it as an error). If not provided, errors are logged to console.error.

    • extensions: string[] | null | undefined
      File extensions recognized as HTML. Defaults to [".html", ".htm"]. If null, all links are followed. If you pass your own array, it completely overrides the default.

Returns

  • Promise<void> Resolves when the entire crawl finishes (or rejects on internal errors, unless you handle them in onError).
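
The onError and extensions options are not covered by the examples above, so here is a rough sketch based only on the option descriptions in this section; the target URL and the logging are placeholders, and the exact shape of the error values passed to onError is not specified by the package.

import { walkSite } from 'walk-site'

const targetURL = new URL('https://example.com')

await walkSite(targetURL, {
  // Follow every link, regardless of file extension
  extensions: null,
  onURL: (url) => url.hostname === targetURL.hostname,
  onPage: (page) => {
    if (!page.ok) {
      console.warn('Non-2xx response:', page.status, page.href)
      return
    }
    console.log('Crawled:', page.title)
  },
  // Log failures without aborting the whole crawl
  onError: (error, url) => {
    console.error('Failed to crawl', url.href, error)
  },
})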

OnURL Type

type OnURL = (
  url: URL,
  metadata: { href: string; depth: number },
) => boolean | void | Promise<boolean | void>
  • Return false to skip crawling url.
  • Any other return value (including undefined) lets the URL into the crawl queue.
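
As an illustrative sketch (assuming only the signature above), an onURL callback can combine a same-host check with a manual depth cutoff; the hostname and the depth threshold here are arbitrary:

import { walkSite } from 'walk-site'

const targetURL = new URL('https://example.com')

await walkSite(targetURL, {
  // Same-host filter plus a manual depth cutoff, expressed via onURL
  onURL: (url, metadata) => {
    if (url.hostname !== targetURL.hostname) return false
    if (metadata.depth > 2) return false
    // Returning nothing (undefined) lets the URL into the crawl queue
  },
  onPage: (page) => console.log(page.title),
})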

Page Type

type Page = {
  title: string
  url: URL
  href: string
  content: string
  depth: number
  ok: boolean
  status: number
}
  • title: <title> of the page.
  • url: The final URL as a URL object.
  • href: The href string of the link that led to this page.
  • content: The full HTML of the page, as returned by Puppeteer's page.content().
  • depth: Depth relative to the starting URL.
  • ok: true if the HTTP status was in the 2xx range.
  • status: Numeric HTTP status code.
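
To show the Page fields in use, here is a hedged sketch of an onPage handler that snapshots each successfully fetched page to disk; the file-naming scheme is made up for the example.

import { writeFile } from 'node:fs/promises'
import { walkSite } from 'walk-site'

const targetURL = new URL('https://example.com')

await walkSite(targetURL, {
  onURL: (url) => url.hostname === targetURL.hostname,
  onPage: async (page) => {
    // Skip pages that did not come back with a 2xx status
    if (!page.ok) return

    // Derive a flat file name from the final URL (placeholder naming scheme)
    const name = page.url.pathname.replace(/\/+$/, '').replaceAll('/', '_') || 'index'
    await writeFile(`${name}.html`, page.content)

    console.log(`Saved ${page.href} (depth ${page.depth}, status ${page.status})`)
  },
})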