@autogram/url-tools

v2.5.4

Helper classes for validating, filtering, and normalizing URLs in bulk

URL Tools

Processing, normalizing, and de-duplicating large piles of URLs can be a pain, particularly if you're trying to distinguish "real" unique URLs from the many variations that can appear in the wild. URLs with anchor links, query params in different orders, social sharing and analytics campaign cruft, accidental references to staging servers… You get the idea.

URL Tools is a helper library whose sole purpose is making that process just a little less frustrating. It consists of four major pieces:

  • ParsedUrl, a wrapper for the standard WHATWG URL class that mixes in the domain and subdomain parsing from tldts and provides a convenience serializer that preserves the object's individual properties, rather than flattening them to an href the way URL.toJSON() does.
  • A collection of helper functions for URL filtering and normalizing operations, including sorting querystring parameters, stripping social sharing cruft, remapping 'ww1', 'ww2', etc. subdomains to a single canonical one, identifying web vs. non-web URLs, flagging URLs on public hosting like S3, and more.
  • NormalizedUrl, a version of ParsedUrl that applies one of those normalizer functions automatically on instantiation. Functions that accept a URL or ParsedUrl as a parameter can use if (url instanceof NormalizedUrl) to determine whether an incoming URL has already been normalized. Yay.
  • UrlSet, ParsedUrlSet, and NormalizedUrlSet, a trio of Set classes that store, de-duplicate, filter, and normalize piles of URLs in bulk. They're a bit fussy about bulk adding of relative URLs, but you can pass in a 'fallback base url' that helps in some circumstances.
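
To make the serializer point concrete, here is a minimal sketch using only the built-in URL class. It shows how JSON.stringify() flattens a URL to its href, and how copying the individual properties into a plain object keeps them. The urlToObject helper is hypothetical and is not part of @autogram/url-tools; the domain and subdomain fields that ParsedUrl adds via tldts are left out so the sketch has no dependencies.

```javascript
// Sketch only: built-in WHATWG URL class, not ParsedUrl itself.
const url = new URL('https://staging.foo.com/path?b=2&a=1#top');

// URL.prototype.toJSON() returns the href, so the JSON output is one string.
const flattened = JSON.stringify(url);

// Hypothetical helper (not a library function): copy the individual
// properties into a plain object so they survive serialization.
function urlToObject(u) {
  return {
    protocol: u.protocol,
    hostname: u.hostname,
    port: u.port,
    pathname: u.pathname,
    search: u.search,
    hash: u.hash,
  };
}

const preserved = JSON.stringify(urlToObject(url));

console.log(flattened); // a single quoted href string
console.log(preserved); // an object with one key per URL property
```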

Installation

npm install @autogram/url-tools

Usage

ParsedUrl and NormalizedUrl are meant to work as drop-in replacements for the built-in URL class.

import { ParsedUrl } from '@autogram/url-tools';

const p = new ParsedUrl('http://staging.foo.com');
console.log(p.domain);    // 'foo.com';
console.log(p.subdomain); // 'staging';

URL scrubbing and normalization

NormalizedUrl applies a given UrlMutator function after it parses incoming URLs. By default it applies the relatively aggressive UrlMutators.defaultNormalizer, which strips off 'www' subdomains, utm querystring parameters, authentication information, ports, common index pages like index.html and default.aspx, page anchors, and more. It also enforces lowercasing of hostnames and alphabetizes all remaining querystring parameters.

Individual rules are broken out into discrete UrlMutator functions for easy composition of alternate rulesets, and any function that accepts a NormalizedUrl and returns a NormalizedUrl can be passed in to implement custom rules.
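
The composition idea can be sketched without the library. The example below is an illustration written against the built-in URL class: every function name in it (stripWww, stripUtmParams, stripIndexPage, stripAnchor, sortParams, compose) is a hypothetical stand-in, not the library's UrlMutators, and the real rules may differ in detail.

```javascript
// Sketch only: hypothetical mutators on the built-in URL class.
// Each one takes a URL, modifies it, and returns it.
const stripWww = (u) => {
  u.hostname = u.hostname.replace(/^www\./, '');
  return u;
};

const stripUtmParams = (u) => {
  // Copy the keys first so deleting does not disturb iteration.
  for (const key of [...u.searchParams.keys()]) {
    if (key.startsWith('utm_')) u.searchParams.delete(key);
  }
  return u;
};

const stripIndexPage = (u) => {
  u.pathname = u.pathname.replace(/\/(index\.html|default\.aspx)$/i, '/');
  return u;
};

const stripAnchor = (u) => {
  u.hash = '';
  return u;
};

const sortParams = (u) => {
  u.searchParams.sort();
  return u;
};

// Combine mutators into a single ruleset, applied left to right.
const compose = (...mutators) => (u) => mutators.reduce((acc, fn) => fn(acc), u);

const gentle = compose(stripAnchor, sortParams);
const aggressive = compose(stripWww, stripUtmParams, stripIndexPage, stripAnchor, sortParams);

const input = 'http://www.mydomain.com:80/index.html?utm_campaign=foo&search=bar#footer';

console.log(gentle(new URL(input)).href);
// 'http://www.mydomain.com/index.html?search=bar&utm_campaign=foo'
console.log(aggressive(new URL(input)).href);
// 'http://mydomain.com/?search=bar'
```

Note that the built-in parser already drops the default port 80 for http, so neither ruleset needs a port rule for this input.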

import { NormalizedUrl, UrlMutators } from '@autogram/url-tools';

const url = 'http://www.mydomain.com:80/index.html?utm_campaign=foo&search=bar#footer';

// With no overrides, the default normalizer is applied.
const n = new NormalizedUrl(url);
console.log(n.href); // 'https://mydomain.com/?search=bar'

// Replace the class-wide normalizer with a single rule.
NormalizedUrl.normalizer = UrlMutators.stripPort;
const n2 = new NormalizedUrl(url);
console.log(n2.href); // 'http://www.mydomain.com/index.html?utm_campaign=foo&search=bar#footer'

// Pass a one-off normalizer as the third constructor argument.
const n3 = new NormalizedUrl(url, undefined, (url) => new NormalizedUrl('http://total-override.com'));
console.log(n3.href); // 'http://total-override.com'

Batch parsing and de-duplication

UrlSet is the simplest of the collection classes; toss URLs at it, and parsed URLs come out. Any that couldn't be parsed can be found in the urlSet.unparsable property.

import { UrlSet } from '@autogram/url-tools';

const us = new UrlSet([
  'http://example.com',
  'https://127.0.0.1',
  'tel:1-800-555-1212',
  'definitely-not-a-url'
]);

for (const url of us) {
  console.log(url.href);
}
// 'http://example.com', 'https://127.0.0.1', 'tel:1-800-555-1212'

console.log([...us.unparsable]); // ['definitely-not-a-url']
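
For readers curious why a dedicated class helps, here is a minimal sketch of the same idea on top of the built-in URL class. A plain Set of URL objects would not de-duplicate, because two URL instances are never identical even when their hrefs match, so the sketch keys a Map by href instead. SimpleUrlSet is a hypothetical class for illustration and makes no claim about how UrlSet is implemented.

```javascript
// Sketch only: hypothetical class, not the library's UrlSet.
class SimpleUrlSet {
  constructor(inputs = []) {
    this.urls = new Map();       // href -> URL, so duplicates collapse
    this.unparsable = new Set(); // raw strings that failed to parse
    for (const input of inputs) this.add(input);
  }

  add(input) {
    let parsed;
    try {
      parsed = new URL(input); // throws TypeError on invalid input
    } catch {
      this.unparsable.add(input);
      return this;
    }
    this.urls.set(parsed.href, parsed);
    return this;
  }

  get size() {
    return this.urls.size;
  }

  // Iterate over the parsed URL objects in insertion order.
  [Symbol.iterator]() {
    return this.urls.values();
  }
}

const simple = new SimpleUrlSet([
  'http://example.com',
  'HTTP://EXAMPLE.com/', // same URL once the parser lowercases it
  'tel:1-800-555-1212',
  'definitely-not-a-url',
]);

console.log(simple.size);             // 2
console.log([...simple.unparsable]);  // ['definitely-not-a-url']
```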

Filtered and Normalized URL Sets

Both ParsedUrlSet and NormalizedUrlSet accept a UrlFilter function in their constructor options; incoming URLs rejected by that function are shunted to the Set's rejected property (for example, parsedUrlSet.rejected) and not added to the Set proper.

NormalizedUrlSet can rely on NormalizedUrl's aggressive defaults, or you can pass in a UrlMutator function to use as an override.

import { NormalizedUrlSet, UrlFilters, UrlMutators } from '@autogram/url-tools';

const options = {
  urlFilter: UrlFilters.isValidWebUrl,
  normalizer: (u) => UrlMutators.forceProtocol(u, 'https')
};
const ns = new NormalizedUrlSet([
  'http://example.com',
  'https://127.0.0.1',
  'tel:1-800-555-1212',
  'definitely-not-a-url'
], options);

for (const n of ns) {
  console.log(n.href);
}
// 'https://example.com', 'https://127.0.0.1'

console.log([...ns.unparsable]); // ['definitely-not-a-url']
console.log([...ns.rejected]);   // ['tel:1-800-555-1212']
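
The three-way split between accepted, rejected, and unparsable URLs can be sketched with the built-in URL class alone. Everything named here (isWebUrl, forceHttps, sortIntoBuckets) is hypothetical, and the order of operations (parse, then filter, then normalize) is an assumption for illustration; the library's UrlFilters.isValidWebUrl and its internal ordering may differ.

```javascript
// Sketch only: hypothetical helpers, not the library's implementation.

// Assumed filter: treat only http and https URLs as web URLs.
const isWebUrl = (u) => u.protocol === 'http:' || u.protocol === 'https:';

// Assumed normalizer: force the https scheme.
const forceHttps = (u) => {
  u.protocol = 'https:';
  return u;
};

function sortIntoBuckets(inputs, filter, normalizer) {
  const accepted = new Map(); // href -> URL, so duplicates collapse
  const rejected = [];
  const unparsable = [];

  for (const input of inputs) {
    let parsed;
    try {
      parsed = new URL(input);
    } catch {
      unparsable.push(input); // not a URL at all
      continue;
    }
    if (!filter(parsed)) {
      rejected.push(input);   // a URL, but the filter said no
      continue;
    }
    const normalized = normalizer(parsed);
    accepted.set(normalized.href, normalized);
  }

  return { accepted: [...accepted.values()], rejected, unparsable };
}

const buckets = sortIntoBuckets(
  ['http://example.com', 'https://127.0.0.1', 'tel:1-800-555-1212', 'definitely-not-a-url'],
  isWebUrl,
  forceHttps
);

console.log(buckets.accepted.map((u) => u.href)); // ['https://example.com/', 'https://127.0.0.1/']
console.log(buckets.rejected);                    // ['tel:1-800-555-1212']
console.log(buckets.unparsable);                  // ['definitely-not-a-url']
```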