url-inspector

v8.7.0

Get metadata about any url

Synopsis

npx url-inspector <url>

Description

Get normalized metadata about what a URL mainly represents.

This is a Node.js module.

Sources of information:

  • HTTP response headers
  • embedded tags in binary formats (using exiftool)
  • OpenGraph, Twitter Cards, schema.org, JSON-LD, title and meta tags in HTML pages
  • oEmbed endpoints
  • if a URL mainly wraps a media resource, that media might be inspected too

Inspection stops when enough information has been gathered, or when a maximum number of bytes (depending on media type) have been downloaded.

Format

  • url: url of the inspected resource
  • title: title of the resource, or filename, or last component of pathname with query
  • description: optional longer description, without title in it, and only the first line.
  • site: the name of the site, or the domain name
  • mime: RFC 7231 mime type of the resource (defaults to Content-Type). The inspected mime type could be more accurate than the HTTP header.
  • ext: The file extension, only derived from the mime type. Safe to be used as file extension.
  • what: what the resource represents: page, image, video, audio, file
  • type: how the resource is used: link, image, video, audio, embed. Example: if what:image and mime:text/html, and no html snippet is found, type will be 'link'.
  • html: the html representation of the resource, according to type and use.
  • script: url of a script that must be installed along with the html representation.
  • date: creation or modification date (YYYY-MM-DD format)
  • author: optional credit, author (without the @ prefix and with _ replaced by spaces)
  • keywords: optional array of collected keywords (lowercased words that are not in title words).
  • size: optional Content-Length as integer; discarded when type is embed
  • icon: optional link to the favicon of the site
  • width, height: optional dimensions as integers
  • duration: optional hh:mm:ss string
  • thumbnail: optional URL to a thumbnail, could be a data-uri for embedded images
  • source: optional URL that can go in a 'src' attribute; for example, a resource can be an html page representing an image type. The URL of the image itself would be stored here; same thing for audio, video, embed types.
  • error: optional http error code, or string
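
As a rough illustration of how these fields might be consumed, here is a minimal sketch that turns a result object into an HTML snippet. The helper names `escapeHtml` and `renderMeta` are hypothetical, and the rendering choices (for instance falling back to a plain anchor) are assumptions, not behavior of url-inspector itself.

```javascript
// Hypothetical helpers, not part of url-inspector.
// Field names (type, html, source, url, title) come from the list above.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

function renderMeta(meta) {
  // Prefer the ready-made html representation when one is present.
  if (meta.html) return meta.html;
  const title = escapeHtml(meta.title || meta.url);
  // `source` is the URL meant for a src attribute; fall back to `url`.
  const src = escapeHtml(meta.source || meta.url);
  switch (meta.type) {
    case 'image':
      return `<img src="${src}" alt="${title}">`;
    case 'video':
      return `<video src="${src}" controls></video>`;
    case 'audio':
      return `<audio src="${src}" controls></audio>`;
    default:
      // 'link', or 'embed' without an html snippet: degrade to an anchor.
      return `<a href="${escapeHtml(meta.url)}">${title}</a>`;
  }
}
```

Checking `html` first keeps any snippet the inspector already produced; the switch only covers the case where none is available.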

Install

url-inspector currently requires these external libraries/tools:

  • exiftool
  • libcurl (and libcurl-dev if node-libcurl needs to be rebuilt)

Both are well maintained and available in most Linux distributions.

Usage


import Inspector from 'url-inspector';

const opts = {
 ua: "Mozilla/5.0", // override ua, defaults to somewhat modern browser
 nofavicon: false, // disable additional requests to get a favicon
 nosource: false, // disable main embedded media sub-inspection
 file: true, // local files inspection is only enabled by default when using CLI
 meta: {}, // user-entered metadata, to be merged and normalized
 providers: null // custom providers (module path or array)
};

const inspector = new Inspector(opts);

const obj = await inspector.look(url);

Inspector throws http-errors instances.
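
Because look() can throw, callers may want to fold failures into the same shape as a normal result. The sketch below is one way to do that, assuming only that thrown http-errors instances carry a numeric `statusCode`; the wrapper name `lookOrError` is hypothetical, and `inspector` can be any object with an async `look(url)` method.

```javascript
// Hypothetical wrapper, not part of url-inspector.
async function lookOrError(inspector, url) {
  try {
    return await inspector.look(url);
  } catch (err) {
    // http-errors instances expose a numeric statusCode; for any other
    // error, fall back to its message so `error` is a code or a string.
    const error = typeof err.statusCode === 'number'
      ? err.statusCode
      : String(err.message || err);
    return { url, error };
  }
}
```

The returned object reuses the `url` and `error` fields from the Format section, so downstream code can treat failed and successful lookups uniformly.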

By default, oEmbed providers are:

  • found from a curated list of providers
  • found from a custom list, required from opts.providers
  • discovered in the inspected web pages

It is possible to add custom providers in the options, by passing an array or a path to a module exporting an array.

See src/custom-oembed-providers.js for examples.
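
The entry format url-inspector expects is the one shown in that file and is not reproduced here. To illustrate the general idea of matching a URL against a provider list, the sketch below assumes the layout used by the public oembed.com providers list (`endpoints`, `schemes`, `url`), which may differ from url-inspector's own; the provider data and the function names are made up for the example.

```javascript
// Assumed layout: oembed.com providers.json style, NOT necessarily
// the format url-inspector uses for opts.providers.
const providers = [{
  provider_name: 'Example Video',
  provider_url: 'https://video.example.com/',
  endpoints: [{
    schemes: ['https://video.example.com/watch/*'],
    url: 'https://video.example.com/oembed'
  }]
}];

// Convert an oEmbed scheme with * wildcards into an anchored RegExp.
function schemeToRegExp(scheme) {
  const escaped = scheme
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters
    .replace(/\*/g, '.*');                 // then expand wildcards
  return new RegExp(`^${escaped}$`);
}

// Return the oEmbed request URL for the first matching scheme, or null.
function findEndpoint(list, url) {
  for (const provider of list) {
    for (const endpoint of provider.endpoints) {
      const schemes = endpoint.schemes || [];
      if (schemes.some((s) => schemeToRegExp(s).test(url))) {
        return `${endpoint.url}?format=json&url=${encodeURIComponent(url)}`;
      }
    }
  }
  return null;
}
```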

To normalize an already existing metadata object, including url rewriting done by providers, and other changes in fields, do:

await inspector.norm(obj);

url-inspector uses node-libcurl to make http requests, and exposes it as:

const req = await Inspector.get(urlObj);

where req.abort() stops the request and req.res is the response stream, on which res.statusCode and res.headers are available.
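
A byte budget like the one mentioned in the Description can be sketched on top of such a request object. This is an illustration only: the function name `readAtMost` is hypothetical, and it assumes `req.res` can be consumed with `for await`, as Node.js readable streams can.

```javascript
// Hypothetical helper: read at most maxBytes from req.res, then abort.
// Assumes req.res is async-iterable (true for Node.js readable streams).
async function readAtMost(req, maxBytes) {
  const chunks = [];
  let total = 0;
  for await (const chunk of req.res) {
    const buf = Buffer.from(chunk);
    chunks.push(buf);
    total += buf.length;
    if (total >= maxBytes) {
      req.abort(); // stop downloading once the budget is reached
      break;
    }
  }
  // The last chunk may overshoot the budget, so trim to maxBytes.
  return Buffer.concat(chunks).subarray(0, maxBytes);
}
```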

Proxy support

url-inspector configures http(s) proxies through the proxy-from-env package and environment variables (http_proxy, https_proxy, all_proxy, no_proxy). See the proxy-from-env documentation for details.

License

Open Source, see ./LICENSE.