indexed-tarball
v3.1.7
a tarball with constant-time reads and modifications
A small extension to the tar archive format to support some additional features:
- Constant time random access reads
- Constant time writes (appends)
- Constant time deletions (truncation)
- Multi-file support
This is done by generating a special "index file" that is always appended to the end of the tar archive and maps file paths within the archive to byte offsets.
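To illustrate the idea, here is a minimal in-memory sketch of a path-to-offset index. It does not reproduce this module's actual index format: the names buildArchive and readEntry and the { offset, size } entry shape are assumptions made for the example. It only shows why a lookup does not need to scan the archive.

```javascript
// Hypothetical sketch: not indexed-tarball's real on-disk format.
// Concatenate file contents into one buffer and record where each one starts.
function buildArchive (files) {
  var index = {}
  var chunks = []
  var offset = 0
  Object.keys(files).forEach(function (name) {
    var data = Buffer.from(files[name])
    index[name] = { offset: offset, size: data.length }
    chunks.push(data)
    offset += data.length
  })
  return { data: Buffer.concat(chunks), index: index }
}

// A read is one index lookup plus one slice; no other entries are visited.
function readEntry (archive, name) {
  var entry = archive.index[name]
  if (!entry) return null
  return archive.data.subarray(entry.offset, entry.offset + entry.size)
}

var archive = buildArchive({ 'a.txt': 'first', 'b.txt': 'second' })
console.log(readEntry(archive, 'b.txt').toString()) // prints "second"
```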
Compatibility
Tarballs created with this module are still plain old tar files, and will work with existing utilities.
Usage
var Tarball = require('indexed-tarball')
var through = require('through2')

var tarball = new Tarball('file.tar')

// Write "hello world" into the archive as hello.txt
var t = through()
var ws = tarball.append('hello.txt', done)
t.pipe(ws)
t.end('hello world')

// Called once the append has been persisted to disk
function done () {
  tarball.list(function (err, files) {
    if (err) throw err
    console.log('files', files)
    tarball.read('hello.txt')
      .on('data', function (buf) {
        console.log('data', buf.toString())
      })
  })
}
outputs
files [ 'hello.txt' ]
data hello world
API
var Tarball = require('indexed-tarball')
var tarball = new Tarball('/path/to/file.tar'[, opts])
Creates or opens an indexed tarball. These are compatible with regular tarballs, so no special extension or archiving software is needed.
If opts.multifile is set, further tarballs will be searched for and opened as well. If opts.maxFileSize is also set, it is used to decide when to "overflow" to a new tarball; it defaults to 4 gigabytes. See the "Multi-file support" section below for more details.
var ws = tarball.append(filepath[, size], cb)
Returns a writable stream. Data written to it is appended to the end of the tarball under filepath.
The size of the file may be passed if it is already known. It is used in the multi-tarball case to anticipate when the file would become too large for the filesystem, so that a new tarball can be started before writing. If size is omitted and the appended file goes over the maximum file size for the filesystem, the operation will fail and may result in corruption.
cb is called when the write has been completely persisted to disk.
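When the data comes from a file on disk, its size is known up front and can be passed along. The following is a small sketch of that; appendFromDisk is a hypothetical helper, not part of this module, and it assumes cb follows the usual error-first convention.

```javascript
var fs = require('fs')

// Hypothetical helper: append a file from disk, passing its size up front
// so that the multi-tarball logic can decide where it fits before writing.
function appendFromDisk (tarball, srcPath, destPath, cb) {
  fs.stat(srcPath, function (err, stat) {
    if (err) return cb(err)
    // append(filepath, size, cb) is the documented three-argument form
    var ws = tarball.append(destPath, stat.size, cb)
    fs.createReadStream(srcPath).pipe(ws)
  })
}
```

It would be called as appendFromDisk(tarball, '/some/local/file', 'name-in-archive.txt', cb), where both paths are placeholders.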
var rs = tarball.read(filepath)
Returns a readable stream of the data within the archive named by filepath. If the file doesn't exist in the archive, the stream rs will emit an error err with err.notFound set to true.
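One way to consume that stream is to collect it into a single Buffer and treat a missing file as null rather than a failure. The sketch below relies only on the read() behaviour described above; readToBuffer is a hypothetical helper name, and it assumes the stream emits Buffer chunks as in the usage example.

```javascript
// Hypothetical helper (not part of indexed-tarball): collect a file from the
// archive into a single Buffer, treating a missing file as `null`.
function readToBuffer (tarball, filepath, cb) {
  var chunks = []
  var finished = false
  function finish (err, buf) {
    if (finished) return // make sure cb fires at most once
    finished = true
    cb(err, buf)
  }
  tarball.read(filepath)
    .on('data', function (buf) { chunks.push(buf) })
    .on('error', function (err) {
      // err.notFound is set to true when the path is not in the archive
      if (err.notFound) finish(null, null)
      else finish(err)
    })
    .on('end', function () { finish(null, Buffer.concat(chunks)) })
}
```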
tarball.pop([filepath, ]cb)
Truncates the tarball such that the last file of the archive is dropped. cb is called once the change is persisted to disk.
A filepath can optionally be passed in as a sanity check: an error is returned if the to-be-popped file does not match filepath.
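The semantics can be modelled in memory as follows. This is a conceptual sketch only, not the module's implementation: popLast, the entries array, and the length field are invented for the example, and it throws where the real API would pass an error to cb.

```javascript
// Hypothetical in-memory model. `entries` lists files in append order,
// each with the byte offset at which it starts; `length` is the archive size.
function popLast (archive, filepath) {
  var last = archive.entries[archive.entries.length - 1]
  if (!last) throw new Error('archive is empty')
  // Optional sanity check: refuse to pop if the last file is not the expected one
  if (filepath !== undefined && last.name !== filepath) {
    throw new Error('last file is ' + last.name + ', not ' + filepath)
  }
  archive.entries.pop()
  // Dropping the file amounts to truncating back to where it began
  archive.length = last.offset
  return last.name
}
```

Because only the final entry is ever removed, the cost does not depend on how many files the archive holds.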
tarball.list(cb)
Calls cb with a list of the paths and metadata (byte offsets) of the files within the archive.
tarball.userdata([data, ]cb)
Retrieves or sets the current userdata for the tarball.
indexed-tarball already stores an index in the tarball itself, so you can store arbitrary user data here as well if you'd like.
If data is given, the object is JSON-encoded and stored in the tarball. If only cb is given, the current userdata is retrieved.
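Since setting userdata replaces the stored object, changing a single field means reading, merging, and writing back. A sketch of that pattern follows; updateUserdata is a hypothetical helper, and it assumes error-first callbacks (cb(err, data) when getting, cb(err) when setting), which the text above does not spell out.

```javascript
// Hypothetical helper: merge `patch` into the existing userdata.
// Assumes error-first callbacks: cb(err, data) on get, cb(err) on set.
function updateUserdata (tarball, patch, cb) {
  tarball.userdata(function (err, current) {
    if (err) return cb(err)
    // Object.assign skips a null or undefined `current`
    var next = Object.assign({}, current, patch)
    tarball.userdata(next, function (err) {
      cb(err, err ? undefined : next)
    })
  })
}
```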
Install
With npm installed, run
$ npm install indexed-tarball
Multi-file support
How does it work?
Once a file (e.g. file.tar) reaches opts.maxFileSize (4 gigabytes by default), the next file appended will be written to file.tar.1. Once that fills, file.tar.2, and so forth. Each tarball has its own index file, and the indexes are unioned (think set theory) together so that all files across all tarballs can be read and listed without any file scanning.
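The overflow rule can be sketched as below. This is an illustration under assumptions, not the module's code: tarballPath and pathForAppend are hypothetical names, and sizes stands for the current byte sizes of file.tar, file.tar.1, and so on, in order.

```javascript
// file.tar, file.tar.1, file.tar.2, ...
function tarballPath (base, n) {
  return n === 0 ? base : base + '.' + n
}

// Hypothetical sketch: given the sizes of the tarballs so far,
// pick the path that the next append would be written to.
function pathForAppend (base, sizes, maxFileSize) {
  var last = sizes.length - 1
  if (last < 0) return tarballPath(base, 0)
  // Appends go to the final tarball until it has reached the limit
  return sizes[last] >= maxFileSize
    ? tarballPath(base, last + 1)
    : tarballPath(base, last)
}

console.log(pathForAppend('file.tar', [100, 100], 100)) // prints "file.tar.2"
```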
Caveats?
If there are multiple files with the same name across the multiple tarballs, the file that comes latest in the tarball set wins; the earlier ones are ignored. For example, if foo.tar.3 and foo.tar.7 both contain a file with path bar/bax/quux.txt, the one from foo.tar.7 will always be returned and used.
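The "latest wins" rule falls out naturally if the per-tarball indexes are merged in order. A minimal sketch, assuming each index is a plain object mapping path to byte offset (the module's real index structure may differ, and unionIndexes is a hypothetical name):

```javascript
// Hypothetical sketch: `indexes` is ordered file.tar, file.tar.1, file.tar.2, ...
function unionIndexes (indexes) {
  var merged = {}
  indexes.forEach(function (index, n) {
    Object.keys(index).forEach(function (name) {
      // A later tarball overwrites any earlier entry with the same path
      merged[name] = { tarball: n, offset: index[name] }
    })
  })
  return merged
}

var merged = unionIndexes([
  { 'a.txt': 0, 'b.txt': 512 }, // index of file.tar
  { 'b.txt': 0 }                // index of file.tar.1
])
console.log(merged['b.txt'].tarball) // prints 1
```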
Also, new appends are currently always made to the final tarball in the set. So if you wrote a lot of files and ended up with file.tar and file.tar.1, and then popped all of the files until none were left, future appends would go to file.tar.1, not file.tar. Fixing this is a TODO.
License
MIT