@vascosantos/ipfs-unixfs-importer

v0.0.12

Published

3 years ago

JavaScript implementation of the UnixFs importer used by IPFS

IPFS

ipfs-unixfs-importer

JavaScript implementation of the layout and chunking mechanisms used by IPFS to handle Files

Lead Maintainer

Alex Potsides

Install

> npm install ipfs-unixfs-importer

Usage

Example

Let's create a little directory to import:

> cd /tmp
> mkdir foo
> echo 'hello' > foo/bar
> echo 'world' > foo/quux

And write the importing logic:

const fs = require('fs')
const { importer } = require('ipfs-unixfs-importer')

// Import the files /tmp/foo/bar and /tmp/foo/quux
const source = [{
  path: '/tmp/foo/bar',
  content: fs.createReadStream('/tmp/foo/bar')
}, {
  path: '/tmp/foo/quux',
  content: fs.createReadStream('/tmp/foo/quux')
}]

// You need to create and pass an IPLD Resolver instance (`ipld`)
// https://github.com/ipld/js-ipld-resolver
// Note: `for await` must run inside an async function
for await (const entry of importer(source, ipld, options)) {
  console.info(entry)
}

When run, this prints metadata about each DAGNode in the created tree, ending with the root:

{
  cid: CID, // see https://github.com/multiformats/js-cid
  path: 'tmp/foo/bar',
  unixfs: UnixFS // see https://github.com/ipfs/js-ipfs-unixfs
}
{
  cid: CID, // see https://github.com/multiformats/js-cid
  path: 'tmp/foo/quux',
  unixfs: UnixFS // see https://github.com/ipfs/js-ipfs-unixfs
}
{
  cid: CID, // see https://github.com/multiformats/js-cid
  path: 'tmp/foo',
  unixfs: UnixFS // see https://github.com/ipfs/js-ipfs-unixfs
}
{
  cid: CID, // see https://github.com/multiformats/js-cid
  path: 'tmp',
  unixfs: UnixFS // see https://github.com/ipfs/js-ipfs-unixfs
}

API

const { importer } = require('ipfs-unixfs-importer')

const stream = importer(source, ipld [, options])

The importer function returns an async iterator. It takes a source async iterator that yields objects of the form:

{
  path: 'a name',
  content: (Buffer or iterator emitting Buffers),
  mtime: (Number representing seconds since (positive) or before (negative) the Unix Epoch),
  mode: (Number representing ugo-rwx, setuid, setgid and sticky bit)
}
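
As an illustration of that shape, here is a sketch of a source written as an async generator. The paths, contents and timestamps are made-up values, and `exampleSource` is a hypothetical name. The first entry passes a single Buffer as content, the second an async iterator that emits Buffers.

```javascript
// Hypothetical source: yields two entries in the documented shape
async function * exampleSource () {
  yield {
    path: 'foo/bar',
    content: Buffer.from('hello\n'), // a single Buffer
    mtime: 1577836800, // 2020-01-01T00:00:00Z, in seconds since the Unix Epoch
    mode: 0o644 // rw-r--r--
  }

  yield {
    path: 'foo/quux',
    // an async iterator emitting Buffers
    content: (async function * () {
      yield Buffer.from('wor')
      yield Buffer.from('ld\n')
    })(),
    mtime: -1, // one second before the Unix Epoch
    mode: 0o4755 // rwxr-xr-x with the setuid bit set
  }
}
```

Such a generator would be passed as the first argument: `importer(exampleSource(), ipld, options)`.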

stream will output file info objects as files get stored in IPFS. When stats on a node are emitted they are guaranteed to have been written.

ipld is an instance of the IPLD Resolver

The input's file paths and directory structure will be preserved in the dag-pb created nodes.

options is a JavaScript object that might include the following keys:

wrapWithDirectory (boolean, defaults to false): if true, a wrapping node will be created
shardSplitThreshold (positive integer, defaults to 1000): the number of directory entries above which we decide to use a sharding directory builder (instead of the default flat one)
chunker (string, defaults to "fixed"): the chunking strategy. Supports:
- fixed
- rabin
avgChunkSize (positive integer, defaults to 262144): the average chunk size (rabin chunker only)
minChunkSize (positive integer): the minimum chunk size (rabin chunker only)
maxChunkSize (positive integer, defaults to 262144): the maximum chunk size
strategy (string, defaults to "balanced"): the DAG builder strategy name. Supports:
- flat: flat list of chunks
- balanced: builds a balanced tree
- trickle: builds a trickle tree
maxChildrenPerNode (positive integer, defaults to 174): the maximum children per node for the balanced and trickle DAG builder strategies
layerRepeat (positive integer, defaults to 4): (only applicable to the trickle DAG builder strategy). The maximum repetition of parent nodes for each layer of the tree.
reduceSingleLeafToSelf (boolean, defaults to true): an optimization. When reducing a set of nodes that contains only one node, reduce it to that node.
hamtHashFn (async function(string) Buffer): a function that hashes file names to create HAMT shards
hamtBucketBits (positive integer, defaults to 8): the number of bits at each bucket of the HAMT
progress (function): a function that will be called with the byte length of chunks as a file is added to ipfs.
onlyHash (boolean, defaults to false): Only chunk and hash - do not write to disk
hashAlg (string): multihash hashing algorithm to use
cidVersion (integer, defaults to 0): the CID version to use when storing the data (storage keys are based on the CID, including its version)
rawLeaves (boolean, defaults to false): When a file would span multiple DAGNodes, if this is true the leaf nodes will not be wrapped in UnixFS protobufs and will instead contain the raw file bytes
leafType (string, defaults to 'file'): what type of UnixFS node leaves should be - can be 'file' or 'raw' (ignored when rawLeaves is true)
blockWriteConcurrency (positive integer, defaults to 10): How many blocks to hash and write to the block store concurrently. For small numbers of large files this should be high (e.g. 50).
fileImportConcurrency (number, defaults to 50): How many files to import concurrently. For large numbers of small files this should be high (e.g. 50).

Overriding internals

Several aspects of the importer are overridable by specifying functions as part of the options object with these keys:

chunkValidator (function): Optional function that supports the signature async function * (source, options)
- This function takes input from the content field of imported entries.
- It should yield Buffer objects constructed from the source, or throw an Error if it cannot.
chunker (function): Optional function that supports the signature async function * (source, options) where source is an async generator and options is an options object
- It should yield Buffer objects.
bufferImporter (function): Optional function that supports the signature async function * (entry, ipld, options)
- This function should read Buffers from entry.content and persist them using ipld.put or similar
- entry is the { path, content } entry, where entry.content is an async generator that yields Buffers
- It should yield functions that return a Promise that resolves to an object with the properties { cid, unixfs, size } where cid is a CID, unixfs is a UnixFS entry and size is a Number that represents the serialized size of the IPLD node that holds the buffer data.
- Values will be pulled from this generator in parallel - the amount of parallelisation is controlled by the blockWriteConcurrency option (default: 10)
dagBuilder (function): Optional function that supports the signature async function * (source, ipld, options)
- This function should read { path, content } entries from source and turn them into DAGs
- It should yield a function that returns a Promise that resolves to { cid, path, unixfs, node } where cid is a CID, path is a string, unixfs is a UnixFS entry and node is a DAGNode.
- Values will be pulled from this generator in parallel - the amount of parallelisation is controlled by the fileImportConcurrency option (default: 50)
treeBuilder (function): Optional function that supports the signature async function * (source, ipld, options)
- This function should read { cid, path, unixfs, node } entries from source and place them in a directory structure
- It should yield an object with the properties { cid, path, unixfs, size } where cid is a CID, path is a string, unixfs is a UnixFS entry and size is a Number.
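
As a sketch of the first two overrides, the generators below follow the async function * (source, options) signature described above. They are illustrative rather than the library's own implementations. The sketch assumes that `source` can be consumed with `for await`, that strings and Uint8Arrays are acceptable inputs to convert, and that `options.maxChunkSize` carries the chunk size.

```javascript
// Hypothetical validator: passes Buffers through, converts Uint8Arrays and
// strings to Buffers, and throws on anything else
async function * chunkValidator (source, options) {
  for await (const content of source) {
    if (Buffer.isBuffer(content)) {
      yield content
    } else if (content instanceof Uint8Array || typeof content === 'string') {
      yield Buffer.from(content)
    } else {
      throw new Error('Content was invalid')
    }
  }
}

// Hypothetical chunker: re-slices the incoming Buffers into chunks of
// options.maxChunkSize bytes, with a shorter final chunk if needed
async function * chunker (source, options) {
  const size = options.maxChunkSize
  let pending = Buffer.alloc(0)

  for await (const buf of source) {
    pending = Buffer.concat([pending, buf])

    while (pending.length >= size) {
      yield pending.subarray(0, size)
      pending = pending.subarray(size)
    }
  }

  if (pending.length > 0) {
    yield pending
  }
}
```

Following the description above, both functions would be supplied as part of the options object, for example `importer(source, ipld, { chunkValidator, chunker })`.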

Contribute

Feel free to join in. All welcome. Open an issue!

This repository falls under the IPFS Code of Conduct.

License

MIT
