url-inspector-abhinavzspace
v2.3.3
Get metadata about any url
url-inspector
Get metadata about any URL.
Limited memory and network usage.
This is a node.js module.
It returns and normalizes information found in http headers or in the resource itself using exiftool (which knows almost everything about files but html), or a sax parser to read oembed, opengraph, twitter cards, schema.org attributes or standard html tags.
Both tools stop inspection once they have gathered enough tags, or once a maximum number of bytes (depending on media type) has been downloaded.
A demo using this module is available, with url-inspector-daemon.
The returned object has these properties:
- url: url of the inspected resource
- title: title of the resource, or filename, or last component of pathname with query
- description: optional; longer description, without title in it
- site: the name of the site, or the domain name
- mime: RFC 7231 mime type of the resource (defaults to Content-Type). The inspected mime type could be more accurate than the http header.
- ext: the extension matching the mime type (not the file extension)
- type: what the resource represents: image, video, audio, link, file, embed, archive
- html: a canonical html representation of the full resource; depending on the type and mime, it could be an img, a, video, audio or iframe tag
- size: optional; Content-Length, discarded when type is embed
- icon: optional; link to the favicon of the site
- width, height: optional; dimensions
- duration: optional
- thumbnail: optional; a URL to a thumbnail, could be a data-uri for embedded images
- source: optional; a URL that can go in a 'src' attribute. For example, a resource can be an html page representing an image type; the URL of the image itself would be stored here. Same thing for audio, video, embed types.
- error: optional; an http error code, or string
- all: an object with all additional metadata that was found
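As an illustration of how these properties might be consumed, here is a minimal sketch that builds a one-line summary from a result object. The summarize helper and the sample object are hypothetical, made up for this example; real values depend on the inspected resource.

```javascript
// Hypothetical helper (not part of url-inspector): builds a one-line
// summary from an object shaped like the properties listed above.
function summarize(obj) {
  // "error" is optional and holds an http error code or a string
  if (obj.error) return 'error: ' + obj.error;
  var parts = [obj.type + ' (' + obj.mime + ')', obj.title];
  // width, height and size are optional, so only add them when present
  if (obj.width && obj.height) parts.push(obj.width + 'x' + obj.height);
  if (obj.size) parts.push(obj.size + ' bytes');
  return parts.join(' | ');
}

// Made-up sample data, for illustration only
var sample = {
  url: 'http://example.com/cat.jpg',
  title: 'cat.jpg',
  site: 'example.com',
  mime: 'image/jpeg',
  ext: 'jpg',
  type: 'image',
  width: 640,
  height: 480,
  size: 12345
};

console.log(summarize(sample));
// prints "image (image/jpeg) | cat.jpg | 640x480 | 12345 bytes"
```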
Installation
npm install url-inspector
Add the -g switch to install the executable.
exiftool executable must be available:
- a package is available for debian/ubuntu: libimage-exiftool-perl and for fedora: perl-Image-ExifTool.
- Otherwise it is installable from http://owl.phy.queensu.ca/~phil/exiftool/
API
var inspector = require('url-inspector');
var url = "http://example.com/"; // placeholder: any URL to inspect
var opts = {
  all: false, // return all available non-normalized metadata
  ua: "Mozilla/5.0", // some oembed providers might not answer otherwise
  nofavicon: false, // disable any favicon-related additional request
  nosource: false, // disable any sub-source inspection for audio, video, image types
  providers: [{ // an array of custom OEmbed providers, or path to a module exporting such an array
    provider_name: "Custom OEmbed provider",
    endpoints: [{
      schemes: ["http:\/\/video\.com\/*"],
      builder: function(urlObj, obj) {
        // can see current obj and override arbitrary props
        obj.embed = "custom embed url";
      }
    }]
  }],
  // new in version 2.3.0: allow file:// urls (false by default)
  file: true
};
inspector(url, opts, function(err, obj) {
  // err is set on failure, obj holds the normalized metadata
});
// or simply
inspector(url, function(err, obj) { /* ... */ });
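The callback form above can be wrapped in a Promise when that fits the calling code better. This is only a sketch: inspectAsync is a hypothetical helper, not something the module exports, and it takes the inspector function as an argument so that it does not assume the module is installed.

```javascript
// Hypothetical helper: wraps a callback-style inspector(url, opts, cb)
// function in a Promise. Assumes cb is called as cb(err, obj).
function inspectAsync(inspector, url, opts) {
  return new Promise(function(resolve, reject) {
    inspector(url, opts || {}, function(err, obj) {
      if (err) reject(err);
      else resolve(obj);
    });
  });
}

// Usage, assuming the module is installed:
// var inspector = require('url-inspector');
// inspectAsync(inspector, 'http://example.com/', { nofavicon: true })
//   .then(function(obj) { console.log(obj.title); })
//   .catch(function(err) { console.error(err); });
```

Passing the function in also makes it easy to substitute a stub in tests.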
Command-line client
inspector-url <url>
inspector-url <filepath>
Low resource usage
network:
- at most a few hundred kilobytes (depending on resource type) are downloaded, and usually much less, depending on connection speed.
- inspection stops as soon as enough metadata is gathered
memory:
- html is inspected using a sax parser, without building a full DOM.
exiftool:
- runs using the streat module, which keeps exiftool always open for performance
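The idea of capping the number of downloaded bytes can be sketched with plain Node.js streams. This is not the module's actual implementation; readCapped is a hypothetical helper that reads at most maxBytes from a readable stream of Buffers and then destroys the stream, so that no more data is pulled from the source.

```javascript
var Readable = require('stream').Readable;

// Hypothetical helper: collects at most maxBytes from a stream of Buffers,
// then destroys the stream and calls done(err, buffer) exactly once.
function readCapped(stream, maxBytes, done) {
  var chunks = [];
  var total = 0;
  var finished = false;
  function finish(err) {
    if (finished) return;
    finished = true;
    stream.destroy(); // stop reading from the source
    done(err, Buffer.concat(chunks));
  }
  stream.on('data', function(chunk) {
    if (finished) return;
    var room = maxBytes - total;
    // keep only the part of the chunk that fits under the cap
    chunks.push(chunk.length > room ? chunk.subarray(0, room) : chunk);
    total += Math.min(chunk.length, room);
    if (total >= maxBytes) finish(null);
  });
  stream.on('end', function() { finish(null); });
  stream.on('error', finish);
}

// Demo with an in-memory stream standing in for an http response
var demo = Readable.from([Buffer.from('hello '), Buffer.from('world')]);
readCapped(demo, 8, function(err, buf) {
  if (err) throw err;
  console.log(buf.toString()); // prints "hello wo"
});
```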
Since version 2.3.0, the file:// protocol is supported: by default through the cli, and through the api by setting the "file" option to true (it is false by default).
License
See LICENSE.
See also
https://github.com/kapouer/url-inspector-daemon
https://github.com/kapouer/node-streat