doc-sniff
v1.0.1
A helper for combating incorrect Content-Type headers, a.k.a. a MIME sniffing module for Node.js
Motivation
So you have made an HTTP request and got back some headers and a response body, but you just don't know whether that innocent Content-Type header tells you what really goes on in its body.
Enter doc-sniff, a much simpler implementation of the WHATWG MIME sniffing algorithm, aimed specifically at responses that can't be easily distinguished via file extensions or magic numbers, e.g. HTML and XML documents.
Install
npm install doc-sniff --save
Usage
var docsniff = require('doc-sniff');
var mime1 = docsniff(false, '<html></html>');
console.log(mime1); // text/html
var mime2 = docsniff('text/html', '<?xml version="1.0" encoding="UTF-8" ?><feed></feed>');
console.log(mime2); // application/atom+xml
var mime3 = docsniff('application/xml; charset=UTF-8', '<?xml version="1.0" encoding="UTF-8" ?><feed></feed>');
console.log(mime3); // application/xml
Currently this module can correct the following MIME types:
- text/html
- text/xml
- text/plain
- application/xml
- application/rss+xml
- application/atom+xml
- application/rdf+xml
- application/octet-stream
It does not attempt to be overzealous in correcting subtypes: as the third example above shows, if the original MIME type is acceptable, it is not replaced.
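The "keep an acceptable type" rule can be pictured roughly as follows. This is not doc-sniff's source code; it is a hypothetical reading of the three usage examples, and the names `XML_TYPES`, `bareType` and `reconcile` are made up for illustration. It assumes that any XML-family type counts as acceptable for an XML document, which the module itself may or may not do.

```javascript
// Hypothetical illustration only; not doc-sniff's actual implementation.
// Types treated as one interchangeable "XML family" in this sketch (an assumption).
const XML_TYPES = [
  'text/xml',
  'application/xml',
  'application/rss+xml',
  'application/atom+xml',
  'application/rdf+xml'
];

// Strip parameters such as "; charset=UTF-8" and lowercase the result.
// A missing header (false, undefined, '') becomes an empty string.
function bareType(header) {
  if (!header) return '';
  return String(header).split(';')[0].trim().toLowerCase();
}

// Keep the declared type when it belongs to the same family as the type
// detected from the body; otherwise trust the detected type.
function reconcile(declaredHeader, detected) {
  const declared = bareType(declaredHeader);
  const bothXml = XML_TYPES.includes(declared) && XML_TYPES.includes(detected);
  return bothXml ? declared : detected;
}
```

Under these assumptions, a declared `application/xml; charset=UTF-8` survives even when the body looks like an Atom feed, while a declared `text/html` is replaced, which matches the behavior shown in the usage examples.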
API
docsniff(type, body);
- type is the content-type header in the response
- body is the response body string
- returns the sniffed content-type as a string
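As a minimal sketch of how the API might be wired to a real response, the hypothetical helper below takes the sniffing function as its first argument, so it can be tried without the package installed. The name `sniffResponse` and the UTF-8 decoding are assumptions, not part of doc-sniff.

```javascript
// Hypothetical wrapper; `sniff` is expected to be the function exported by
// doc-sniff, obtained with require('doc-sniff').
function sniffResponse(sniff, headers, body) {
  // Node lowercases incoming header names. Pass false when the header is
  // absent, as the first usage example does.
  const type = (headers && headers['content-type']) || false;
  // The API expects a string body, so decode Buffers first (UTF-8 assumed).
  const text = Buffer.isBuffer(body) ? body.toString('utf8') : String(body);
  return sniff(type, text);
}
```

With the package installed, a call would look like `sniffResponse(require('doc-sniff'), res.headers, rawBody)`, where `res` and `rawBody` stand for whatever your HTTP client gives you.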
Limits
The WHATWG spec has a much more thorough algorithm and MIME list aimed at browser vendors, but on the server side we are more interested in parsable documents and information extraction. If you encounter a use case not covered by this algorithm, please let us know on GitHub issues.
Like any simple algorithm, this can easily be spoofed, so don't use it for validation; use it only for MIME sniffing incoming documents.
(For better security: mime and mmmagic can handle most file types, but you still need XSS protections and a content whitelist to safely serve content to users.)
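A content whitelist of the kind mentioned above could be as small as the sketch below. The set contents and the name `isAllowed` are hypothetical; passing this check says nothing about whether the content is safe to serve, only that the sniffed type is one you are prepared to process.

```javascript
// Hypothetical whitelist; adjust to the types your application actually handles.
const ALLOWED = new Set([
  'text/html',
  'application/rss+xml',
  'application/atom+xml'
]);

// Returns true only when the sniffed type, with any parameters removed,
// is on the whitelist. Missing or empty values are rejected.
function isAllowed(sniffedType) {
  const bare = String(sniffedType || '').split(';')[0].trim().toLowerCase();
  return ALLOWED.has(bare);
}
```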
License
MIT