parse-http-url

v0.3.2

A url parser for http requests, compliant with RFC 7230

Another URL parser?

The core url module is great for parsing generic URLs. Unfortunately, the URL of an HTTP request (formally called the request-target), is not just a generic URL. It's a URL that must obey the requirements of the URL RFC 3986 as well as the HTTP RFC 7230.

The problems

The core http module does not validate or sanitize req.url.

The legacy url.parse() function also allows illegal characters to appear.

The newer url.URL() constructor will attempt to convert the input into a properly encoded URL with only legal characters. This is better for the general case. However, the official HTTP spec states:

A recipient SHOULD NOT attempt to autocorrect and then process the request without a redirect, since the invalid request-line might be deliberately crafted to bypass security filters along the request chain.

This means a malformed URL should be treated as a violation of the http protocol. It's not something that should be accepted or autocorrected, and it's not something that higher-level code should ever have to worry about.
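
As a small illustration of the autocorrection described above, the following sketch feeds an illegal request-target to the WHATWG URL constructor. The path and the example.com base are made up for the demonstration; only the URL class from Node's core url module is used.

```javascript
const { URL } = require('url');

// A request-target containing a raw space and a backslash, neither of
// which is legal in an HTTP request-target.
const raw = '/files/my report\\2024';

// The WHATWG parser repairs the input instead of rejecting it: the space
// is percent-encoded and the backslash is rewritten to a forward slash.
const repaired = new URL(raw, 'http://example.com');

console.log(repaired.pathname); // prints "/files/my%20report/2024"
```

The repaired path differs from what the client actually sent, which is exactly the situation the spec warns about: a filter earlier in the request chain may have seen a different path than the one the application ends up processing.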

The severity

It's tempting to use the Robustness Principle as an argument for using the url.URL constructor here. Normally, it can be acceptable to diverge from the spec if the result is harmless and beneficial. However, this is not one of those cases. The strictness of URL correctness exists in the spec explicitly for security reasons, which should be non-negotiable, especially for a large and respected platform such as Node.js.

Adoption into core

Because of backwards compatibility, it's unlikely that the logic expressed in parse-http-url will be incorporated into the core http module. My recommendation is either to incorporate it into http2, which is still considered experimental, or to add it as an alternative function in the core url module. These are just a few examples, but there are many paths forward.

How to use

The function takes a request object as input (not a URL string) because the HTTP spec requires inspection of req.method and req.headers.host in order to properly interpret the URL of a request. If the function returns null, the request should not be processed further: either destroy the connection or respond with 400 Bad Request.
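
To illustrate why the method matters: RFC 7230 defines four forms of request-target, and two of them are only valid for one specific method. The sketch below is a simplified shape check, not the library's actual implementation. classifyRequestTarget is a hypothetical helper, and it performs no character-level validation.

```javascript
// Hypothetical helper, not part of parse-http-url: names the RFC 7230
// request-target form that a (method, target) pair could belong to,
// or returns null when no form can apply.
function classifyRequestTarget(method, target) {
  if (typeof target !== 'string' || target === '') return null;
  if (target === '*') {
    // asterisk-form is only defined for a server-wide OPTIONS request.
    return method === 'OPTIONS' ? 'asterisk-form' : null;
  }
  if (method === 'CONNECT') {
    // CONNECT uses authority-form ("host:port") and nothing else.
    return /^[^\/?#@]+:\d+$/.test(target) ? 'authority-form' : null;
  }
  // origin-form: an absolute path, optionally followed by a query.
  if (target.startsWith('/')) return 'origin-form';
  // absolute-form: a full URI, typically seen in requests to proxies.
  if (/^https?:\/\//i.test(target)) return 'absolute-form';
  return null;
}

console.log(classifyRequestTarget('GET', '/search?q=1'));         // origin-form
console.log(classifyRequestTarget('CONNECT', 'example.com:443')); // authority-form
console.log(classifyRequestTarget('GET', '*'));                   // null
```

The same string can be acceptable for one method and a protocol violation for another, so a parser that only sees the URL string cannot make this decision.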

If the request is valid, the function returns an object with five properties: protocol, hostname, port, pathname, and search. The first three properties are either non-empty strings or null, and are mutually dependent: either all three are null or none of them are. The pathname property is always a non-empty string, and the search property is always a string, which may be empty.

If the first three properties are not null, it means the request was in absolute-form or a valid non-empty Host header was provided.

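// Assumption: the package's main export is the parse function itself.
const parse = require('parse-http-url');

const result = parse(req);
if (result) {
  // Valid request: { protocol, hostname, port, pathname, search }
} else {
  // Invalid request-target: respond with 400 Bad Request
  res.writeHead(400);
  res.end();
}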
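
The same check can be placed in front of an entire application, so that higher-level code never sees a malformed URL. The following is a minimal sketch under stated assumptions: createHandler and onRequest are hypothetical names, and parse is passed in as an argument so the wrapper can be exercised with a stub, without the package installed.

```javascript
// Hypothetical wrapper: rejects invalid requests before the application
// handler runs. `parse` is expected to behave like the function described
// above (a parsed object for valid requests, null otherwise).
function createHandler(parse, onRequest) {
  return (req, res) => {
    const target = parse(req);
    if (!target) {
      // Malformed request-target: reject it rather than autocorrect it.
      res.writeHead(400);
      res.end();
      return;
    }
    // Only validated requests reach the application code.
    onRequest(req, res, target);
  };
}

// Example wiring (assumes the package exports the parse function):
// const http = require('http');
// const parse = require('parse-http-url');
// http.createServer(createHandler(parse, (req, res, target) => {
//   res.end(`You asked for ${target.pathname}${target.search}\n`);
// })).listen(8080);
```

Keeping the rejection in one place means individual route handlers can rely on pathname and search without repeating the null check.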

Unexpected benefits

The goal of parse-http-url was not to create a fast parser, but it turns out this implementation can be roughly 1.5x to 9x faster than the general-purpose parsers in core.

$ npm run benchmark
legacy url.parse() x 371,681 ops/sec ±0.88% (297996 samples)
whatwg new URL() x 58,766 ops/sec ±0.3% (118234 samples)
parse-http-url x 552,748 ops/sec ±0.54% (344809 samples)

Run the benchmark yourself with npm run benchmark.
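
The speedup range quoted above can be derived from the sample output. This short sketch only repeats the arithmetic on the three ops/sec figures shown; results on other machines will differ.

```javascript
// Throughput figures (ops/sec) copied from the sample benchmark output.
const opsPerSec = {
  'legacy url.parse()': 371681,
  'whatwg new URL()': 58766,
  'parse-http-url': 552748,
};

// Express each core parser's throughput relative to parse-http-url.
const baseline = opsPerSec['parse-http-url'];
const speedups = {};
for (const [name, ops] of Object.entries(opsPerSec)) {
  if (name === 'parse-http-url') continue;
  speedups[name] = baseline / ops;
  console.log(`${name}: ${speedups[name].toFixed(2)}x`);
}
```

On these numbers, the ratio is about 1.5 against legacy url.parse() and a little over 9 against new URL().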