streaming-tarball

v1.0.3

Published

a year ago

Streaming interface for decoding tarballs on modern JavaScript runtimes


shaunpersad

tar tarball streaming webstream stream cloudflare workers deno bun

streaming-tarball

Streaming interface for decoding tarballs on modern JavaScript runtimes (Cloudflare Workers, Deno, etc.).

Features

Extract tarballs of any size without touching the filesystem and with low memory usage.
Handles many tar extensions (ustar, pax, GNU) to support long file names, attributes, etc.
Works anywhere WebStreams are supported.

Installation

npm install streaming-tarball

Usage

Use the extract function to get a readable stream of tar objects. Each object has a header and, if it is a file, a body. Call obj.text() to read a file's full body as text.

import { extract } from 'streaming-tarball';

const response = await fetch('https://github.com/shaunpersad/streaming-tarball/archive/refs/heads/main.tar.gz');
const stream = response.body.pipeThrough(new DecompressionStream('gzip'));

for await (const obj of extract(stream)) {
  console.log(
    'name:', obj.header.name, 
    'type:', obj.header.type, 
    'size:', obj.header.size,
  );
  console.log('text body:', await obj.text());
}
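The examples here pass response.body straight into DecompressionStream, which assumes the request succeeded and returned a body. As a minimal sketch, a small guard can make those assumptions explicit. The gunzipBody name is hypothetical and not part of streaming-tarball; it relies only on the standard Response and DecompressionStream APIs.

```javascript
// Hypothetical helper (not part of streaming-tarball): validates a fetch
// Response and returns its gunzipped body as a ReadableStream of bytes.
function gunzipBody(response) {
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  if (!response.body) {
    throw new Error('Response has no body to decompress');
  }
  return response.body.pipeThrough(new DecompressionStream('gzip'));
}
```

With this in place, `const stream = gunzipBody(await fetch(url));` could stand in for the two setup lines before the extract loop.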

The file bodies are binary streams, so the first example could also be written using the obj.body stream directly:

import { extract } from 'streaming-tarball';

const response = await fetch('https://github.com/shaunpersad/streaming-tarball/archive/refs/heads/main.tar.gz');
const stream = response.body.pipeThrough(new DecompressionStream('gzip'));

for await (const { header, body } of extract(stream)) {
  console.log(
    'name:', header.name, 
    'type:', header.type, 
    'size:', header.size,
  );
  if (body) {
    const subStream = body.pipeThrough(new TextDecoderStream());
    let str = '';
    for await (const chunk of subStream) {
      str += chunk;
    }
    console.log('text body:', str);
  }
}
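Because each body is a stream of binary chunks, it can also be processed without ever holding the whole file in memory. The sketch below totals the bytes of a body chunk by chunk. countBytes is a hypothetical helper, not something the package exports, and it assumes the body is a standard ReadableStream of Uint8Array chunks.

```javascript
// Hypothetical helper (not part of streaming-tarball): totals the bytes in a
// ReadableStream of Uint8Array chunks without retaining any of them.
async function countBytes(body) {
  const reader = body.getReader();
  let total = 0;
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      total += value.byteLength; // the chunk is dropped after counting
    }
  } finally {
    reader.releaseLock();
  }
  return total;
}
```

Inside the extract loop this could be used as `if (body) console.log(header.name, await countBytes(body));`, which also fully consumes the body.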

Because file bodies are sub-streams of the parent stream, each body must be fully consumed before the parent stream can advance to the next object. Each tar object has a discard helper function for when you don't otherwise need the body:

import { extract, TAR_OBJECT_TYPE_FILE } from 'streaming-tarball';

const response = await fetch('https://github.com/shaunpersad/streaming-tarball/archive/refs/heads/main.tar.gz');
const stream = response.body.pipeThrough(new DecompressionStream('gzip'));

for await (const obj of extract(stream)) {
  if (obj.header.type === TAR_OBJECT_TYPE_FILE && obj.header.size < 100_000) {
    console.log(
      'file found:', obj.header.name, 
      'text body:', await obj.text(),
    );
  } else {
    await obj.discard(); // consumes the unused body
  }
}
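The same read-or-discard pattern can be wrapped in a reusable function. The following is a sketch only: collectTextFiles is a hypothetical name, and it assumes nothing beyond the object shape used in the examples above (a header with name, type and size, plus text() and discard() methods).

```javascript
// Hypothetical helper (not part of streaming-tarball). `objects` is any async
// iterable of tar objects shaped like those in the examples above:
// { header: { name, type, size }, text(), discard() }.
async function collectTextFiles(objects, fileType, maxSize) {
  const files = new Map(); // file name -> text content
  for await (const obj of objects) {
    if (obj.header.type === fileType && obj.header.size < maxSize) {
      files.set(obj.header.name, await obj.text());
    } else {
      await obj.discard(); // every skipped body must still be consumed
    }
  }
  return files;
}
```

It could be called as `await collectTextFiles(extract(stream), TAR_OBJECT_TYPE_FILE, 100_000)`. Taking the type and size limit as parameters keeps the helper independent of the package's exports.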

There are many other tar object types. To identify one, compare obj.header.type to the matching string constant; all of them are exported for convenience:

import {
  TAR_OBJECT_TYPE_BLOCK_SPECIAL,
  TAR_OBJECT_TYPE_CHAR_SPECIAL,
  TAR_OBJECT_TYPE_CONTIGUOUS,
  TAR_OBJECT_TYPE_DIRECTORY,
  TAR_OBJECT_TYPE_FIFO,
  TAR_OBJECT_TYPE_FILE,
  TAR_OBJECT_TYPE_GNU_NEXT_LINK_NAME,
  TAR_OBJECT_TYPE_GNU_NEXT_NAME,
  TAR_OBJECT_TYPE_HARD_LINK,
  TAR_OBJECT_TYPE_PAX_GLOBAL,
  TAR_OBJECT_TYPE_PAX_NEXT,
  TAR_OBJECT_TYPE_SYM_LINK,
} from 'streaming-tarball';

Note, however, that you will never see the _GNU_ or _PAX_ object types in practice, because the library consumes them and applies them to the objects they target.
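To check which types actually occur in a given archive, one option is to tally header.type across all objects. This is a minimal sketch: countByType is a hypothetical helper, and it assumes only that each object exposes header.type and discard() as in the examples above.

```javascript
// Hypothetical helper (not part of streaming-tarball): counts how many
// objects of each header.type appear, discarding every body along the way.
async function countByType(objects) {
  const counts = new Map(); // type string -> number of objects
  for await (const obj of objects) {
    const seen = counts.get(obj.header.type) ?? 0;
    counts.set(obj.header.type, seen + 1);
    await obj.discard(); // bodies are not needed for a tally
  }
  return counts;
}
```

The keys of the map returned by `await countByType(extract(stream))` can then be compared against the exported constants.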
