parse-http-url
v0.3.2
A url parser for http requests, compliant with RFC 7230
Another URL parser?
The core url module is great for parsing generic URLs. Unfortunately, the URL of an HTTP request (formally called the request-target) is not just a generic URL. It must obey the requirements of both the URI specification, RFC 3986, and the HTTP specification, RFC 7230.
The problems
The core http module does not validate or sanitize req.url.
The legacy url.parse() function also allows illegal characters to appear.
The newer url.URL() constructor will attempt to convert the input into a properly encoded URL with only legal characters. This is better for the general case. However, the official HTTP spec states:
A recipient SHOULD NOT attempt to autocorrect and then process the request without a redirect, since the invalid request-line might be deliberately crafted to bypass security filters along the request chain.
This means a malformed URL should be treated as a violation of the HTTP protocol. It should not be accepted or autocorrected, and higher-level code should never have to worry about it.
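To make the autocorrection concern concrete, here is a minimal sketch of how the WHATWG URL constructor rewrites its input. The request-targets and the example.com base are made up for illustration; they are not taken from this package or its tests.

```javascript
const { URL } = require('url');

// Hypothetical raw request-targets, as they might arrive in req.url.
const base = 'http://example.com';

// "<" and ">" are not legal in a URL path; the constructor percent-encodes
// them instead of rejecting the input.
const withIllegalChars = new URL('/search/<tag>', base);

// Dot segments are collapsed, so the path that gets processed differs from
// the path that was sent on the wire.
const withDotSegments = new URL('/public/../secret', base);

console.log(withIllegalChars.pathname);
console.log(withDotSegments.pathname);
```

In both cases the value that reaches application code is no longer the value a filter earlier in the request chain would have inspected, which is the situation the quoted passage warns about.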
The severity
It's tempting to use the Robustness Principle as an argument for using the url.URL constructor here. Normally, it can be acceptable to diverge from the spec if the result is harmless and beneficial. However, this is not one of those cases. The strictness of URL correctness exists in the spec explicitly for security reasons, which should be non-negotiable, especially for a large and respected platform such as Node.js.
Adoption into core
Because of backwards compatibility, it's unlikely that the logic expressed in parse-http-url will be incorporated into the core http module. My recommendation is either to incorporate it into http2, which is still considered experimental, or to add it as an alternative function in the core url module. These are just two examples; there are many paths forward.
How to use
The function takes a request object as input (not a URL string) because the HTTP spec requires inspection of req.method and req.headers.host in order to properly interpret the URL of a request. If the function returns null, the request should not be processed further: either destroy the connection or respond with 400 Bad Request.
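The dependence on req.method comes from the four request-target forms defined in RFC 7230, section 5.3. The sketch below is a simplified, hypothetical classifier (classifyRequestTarget is not part of this package and is not how it is implemented). It only looks at the overall shape of the target, does not validate individual characters, and assumes that absolute-form targets use the http or https scheme.

```javascript
// Simplified, hypothetical classifier for the request-target forms of
// RFC 7230, section 5.3. Returns the form name, or null if the shape of
// the target does not fit the method.
function classifyRequestTarget(method, target) {
  if (target === '*') {
    // asterisk-form is only used for a server-wide OPTIONS request.
    return method === 'OPTIONS' ? 'asterisk-form' : null;
  }
  if (method === 'CONNECT') {
    // authority-form ("host:port") is only used with CONNECT.
    return /^[^\/\s]+:\d+$/.test(target) ? 'authority-form' : null;
  }
  if (target.startsWith('/')) {
    // origin-form: a path and optional query; the host comes from the Host header.
    return 'origin-form';
  }
  if (/^https?:\/\//i.test(target)) {
    // absolute-form: the target itself carries scheme and host.
    return 'absolute-form';
  }
  return null;
}

console.log(classifyRequestTarget('GET', '/index.html?x=1'));
console.log(classifyRequestTarget('CONNECT', 'example.com:443'));
```

Because an origin-form target carries no host of its own, the Host header is the only place that information can come from, which is why a bare URL string is not enough input.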
If the request is valid, it will return an object with five properties: protocol, hostname, port, pathname, and search. The first three properties are either non-empty strings or null, and are mutually dependent. The pathname property is always a non-empty string, and the search property is always a string, which may be empty.
If the first three properties are not null, it means the request was in absolute-form or a valid non-empty Host header was provided.
    const result = parse(req);
    if (result) {
      // { protocol, hostname, port, pathname, search }
    } else {
      // Invalid request-target: reject it instead of trying to repair it.
      res.writeHead(400);
      res.end();
    }
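One way to keep this check out of application code is to wrap it in a request listener. The sketch below is only an illustration: createListener and onValid are hypothetical names, and the parser is passed in as an argument so that the only thing assumed about it is the behavior described above (null for an invalid request, an object otherwise).

```javascript
// Builds a request listener around a parser function.
// `parse` must return null for an invalid request, as described above.
// `onValid` is a hypothetical callback that receives the parsed parts.
function createListener(parse, onValid) {
  return (req, res) => {
    const result = parse(req);
    if (result === null) {
      // Reject the request instead of trying to repair it.
      res.writeHead(400, { Connection: 'close' });
      res.end();
      return;
    }
    onValid(result, req, res);
  };
}

// Possible wiring, assuming the package's export is the parse function:
// const http = require('http');
// const parse = require('parse-http-url');
// http.createServer(createListener(parse, (url, req, res) => {
//   res.end(url.pathname);
// })).listen(3000);
```

With this arrangement, handlers passed as onValid never see a request whose URL failed validation.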
Unexpected benefits
The goal of parse-http-url was not to create a fast parser, but in the benchmark below this implementation is roughly 1.5 to 9 times faster than the general-purpose parsers in core.
$ npm run benchmark
legacy url.parse() x 371,681 ops/sec ±0.88% (297996 samples)
whatwg new URL() x 58,766 ops/sec ±0.3% (118234 samples)
parse-http-url x 552,748 ops/sec ±0.54% (344809 samples)
Run the benchmark yourself with npm run benchmark.