parse-http-url

v0.3.2

Published

3 years ago

A url parser for http requests, compliant with RFC 7230

Downloads

0High
0Medium
0Low

joshuawise

url uri http https parser idna dns

parse-http-url

Another URL parser?

The core url module is great for parsing generic URLs. Unfortunately, the URL of an HTTP request (formally called the request-target), is not just a generic URL. It's a URL that must obey the requirements of the URL RFC 3986 as well as the HTTP RFC 7230.

The problems

The core http module does not validate or sanitize req.url.

The legacy url.parse() function also allows illegal characters to appear.

The newer url.URL() constructor will attempt to convert the input into a properly encoded URL with only legal characters. This is better for the general case, however, the official http spec states:

A recipient SHOULD NOT attempt to autocorrect and then process the request without a redirect, since the invalid request-line might be deliberately crafted to bypass security filters along the request chain.

This means a malformed URL should be treated as a violation of the http protocol. It's not something that should be accepted or autocorrected, and it's not something that higher-level code should ever have to worry about.

The severity

It's tempting to use the Robustness Principle as an argument for using the url.URL constructor here. Normally, it can be acceptable to diverge from the spec if the result is harmless and beneficial. However, this is not one of those cases. The strictness of URL correctness exists in the spec explicity for security reasons, which should be non-negotiable—especially for a large and respected platform such as Node.js.

Adoption into core

Because of backwards compatibility, it's unlikely that the logic expressed in parse-http-url will be incorporated into the core http module. My recommendation is to either incorporate it into http2, which is still considered experimental, or as an alternative function in the core url module. These are just a few examples, but there are many paths forward.

How to use

The function takes a request object as input (not a URL string) because the http spec requires inspection of req.method and req.headers.host in order to properly interpret the URL of a request. If the function returns null, the request should not be processed further—either destroy the connection or respond with Bad Request.

If the request is valid, it will return an object with five properties: protocol, hostname, port, pathname, and search. The first three properties are either non-empty strings or null, and are mutually dependant. The path property is always a non-empty string, and the search property is always a possibly empty string.

If the first three properties are not null, it means the request was in absolute-form or a valid non-empty Host header was provided.

const result = parse(req);
if (result) {
  // { protocol, hostname, port, pathname, search }
} else {
  res.writeHead(400);
  res.end();
}

Unexpected benefits

The goal of parse-http-url was not to create a fast parser, but it turns out this implementation can be between 1.5–9x faster than the general-purpose parsers in core.

$ npm run benchmark
legacy url.parse() x 371,681 ops/sec ±0.88% (297996 samples)
whatwg new URL() x 58,766 ops/sec ±0.3% (118234 samples)
parse-http-url x 552,748 ops/sec ±0.54% (344809 samples)

Run the benchmark yourself with npm run benchmark.

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

parse-http-url

Another URL parser?

The problems

The severity

Adoption into core

How to use

Unexpected benefits