env-hash
v1.1.0
Produces a hash value representing the state of a typical Node/NPM environment.
By default, the generated hash is a product of the mtimes and content of selected files and directories. Unless configured otherwise, package.json and node_modules are used.
This package is used by unfort to produce hash values that namespace cached data. This enables unfort to aggressively cache data that will be invalidated automatically when the next generated hash differs from the previous.
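The namespacing idea can be sketched roughly as follows. This is a minimal illustration, not unfort's actual cache: `createNamespacedCache` and its methods are hypothetical names, and the hash strings stand in for values returned by env-hash.

```javascript
// Hypothetical helper: an in-memory cache whose keys are prefixed with an
// environment hash. Entries written under an old hash are never read once
// the hash changes, so they are effectively invalidated.
function createNamespacedCache(store = new Map()) {
  return {
    get(hash, key) {
      return store.get(`${hash}:${key}`);
    },
    set(hash, key, value) {
      store.set(`${hash}:${key}`, value);
    },
  };
}

const cache = createNamespacedCache();
// Store a value under the hash of the current environment.
cache.set('3983761008_3107418173', 'resolve:lodash', '/app/node_modules/lodash/index.js');
```

A lookup with the same hash returns the stored value, while a lookup with any other hash misses, which is what makes the invalidation automatic.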
Install
npm install --save env-hash
Example
import envHash from 'env-hash';

envHash().then(hash => {
  console.log(hash);
  // something like 3983761008_3107418173
});
Options
env-hash accepts three options: root, files and directories.
root is the origin directory that is prepended to all relative paths. Defaults to process.cwd()
files is an array of relative or absolute file paths. Defaults to ['package.json']
directories is an array of relative or absolute directory paths. Defaults to ['node_modules']
Options can be specified in an object passed to env-hash:
import envHash from 'env-hash';

envHash({
  // defaults
  root: process.cwd(),
  files: ['package.json'],
  directories: ['node_modules']
}).then(hash => {
  console.log(hash);
  // something like 3983761008_3107418173
});
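How root interacts with relative and absolute paths can be illustrated with Node's path.resolve. This is a sketch of the behavior described above, not env-hash's source; `resolvePaths` is a hypothetical helper, and CommonJS `require` is used so the snippet runs without a build step.

```javascript
const path = require('path');

// Hypothetical helper: relative paths are resolved against root, while
// absolute paths are kept (path.resolve discards earlier segments once it
// meets an absolute one, and normalizes the result).
function resolvePaths(root, paths) {
  return paths.map(p => path.resolve(root, p));
}

// Example: one relative and one absolute entry.
const resolved = resolvePaths(process.cwd(), [
  'package.json',
  path.resolve('elsewhere', 'node_modules'),
]);
console.log(resolved);
```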
Background & trade-offs
A section from the original research and design docs of unfort:
We aggressively cache path resolution of external packages, but, as always, cache invalidation is a pain. To resolve if our cached data is still valid, we need a way to uniquely identify the state of the node_modules package tree.
The most accurate, but slowest, method would be to crawl node_modules and generate a hash from the file tree. This would work well on small code bases, but more typical package trees will introduce multiple seconds of overhead as the tree is crawled. While non-blocking IO would help to avoid blocking the event loop, crawling the tree would still consume most of libuv's thread pool, and any code that depends on the result of the crawl would still have to wait for it.
A similar - and somewhat more performant - approach is to use the same mechanism that NPM uses to walk the tree, e.g. recursively read the package.json, then look in node_modules for more modules, etc. This still has a fair measure of IO overhead though. It also requires you to introspect each package's package.json in some fashion, either hashing the contents, reading the version, or just stat-ing the file.
The simplest - and most performant - solution would be to treat the root package.json as a canonical indicator, and simply hash its content. However, in practice this falls apart as NPM will install packages that are semantic version compatible, but that may not match the exact versions specified in package.json. Additionally, as NPM 3 builds the dependency tree non-deterministically, the state of the node_modules tree can't be relied upon without interrogating it.
A performant approach - that maintains some accuracy - is to do a shallow crawl of the node_modules directory's contents, and build a hash from each directory's name and mtime. This works reasonably well in practice, but does depend on NPM not making any changes further up in the directory structure.
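A shallow crawl of this kind might look like the sketch below. It is an illustration under assumptions rather than the code unfort or env-hash ships: `shallowDirectoryHash` is a hypothetical name, SHA-1 from Node's crypto module is a stand-in for whatever hash function is really used, and synchronous fs calls are used only to keep the example short.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

// Sketch: hash the names and mtimes of a directory's immediate children.
// Nothing below the first level is visited, so nested changes go unnoticed.
function shallowDirectoryHash(dir) {
  const hash = crypto.createHash('sha1');
  // Sort so the result does not depend on readdir ordering.
  for (const name of fs.readdirSync(dir).sort()) {
    // lstat avoids throwing on dangling symlinks.
    const stats = fs.lstatSync(path.join(dir, name));
    hash.update(`${name}:${stats.mtimeMs}\n`);
  }
  return hash.digest('hex');
}
```

Calling `shallowDirectoryHash('node_modules')` twice without touching the directory yields the same string; adding or removing a top-level package changes it.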
Mindful of both performance and accuracy requirements, we'll combine the package.json and shallow crawl approaches to produce a single hash which is then used to namespace cached data. This approach does add a bit of IO overhead, but it seems to work well enough for the purposes of rapidly detecting the state of an environment.
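The file half of the combination, and the joining step, can be sketched as follows. The names `filesHash` and `combineHashes` are hypothetical, SHA-1 is a stand-in hash function, and the assumption that the two halves of the output correspond to files and directories is ours, inferred only from the underscore-separated example output.

```javascript
const crypto = require('crypto');
const fs = require('fs');

// Sketch: hash each file's content together with its mtime.
function filesHash(files) {
  const hash = crypto.createHash('sha1');
  for (const file of files) {
    hash.update(fs.readFileSync(file));
    hash.update(String(fs.statSync(file).mtimeMs));
  }
  return hash.digest('hex');
}

// Join a file part and a directory part into one namespace string.
// Assumption: the real output's two halves play these roles.
function combineHashes(filePart, directoryPart) {
  return `${filePart}_${directoryPart}`;
}
```

A change to either half produces a different combined string, so cached data keyed by the old string is no longer read.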