kpipe-streams
v0.10.2
Multiplexed and Transform streams for kpipe pipelines
Stream Types
| Name | Type | Reads | Writes | Notes |
|--|--|--|--|--|
| transform/gunzip | duplex | buffers | | |
| transform/delineate | duplex | buffers | lines | Split buffer on newlines |
| transform/jsonparse | duplex | lines | objects | JSON => object |
| transform/compact | duplex | objects | objects | Convert objects to arrays of values |
| transform/head | duplex | lines | lines | Only process first N lines |
| transform/progress | duplex | objects or buffers | objects or buffers | Emit `.` to stderr for every 10k objects; emit `.` to stderr for every 10k `\n` (newlines) |
| transform/value | duplex | objects | lines | Extract object value property as a string |
| transform/jsonstringify | duplex | objects | lines | object => JSON |
| transform/lineate | duplex | lines | buffers | Join lines with newlines |
| transform/gzip | duplex | | buffers | |
Lines are strings which are terminated by newlines. Streams operating on lines are assumed to be in object mode: each chunk is a string with the terminating newline removed.
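As an illustration of the buffers-to-lines contract described above, here is a minimal sketch of a delineate-style transform built on Node's `stream.Transform`. It is not the package's implementation: `splitLines` and `createDelineate` are hypothetical names, and the sketch assumes UTF-8 input with `\n` line endings.

```javascript
const { Transform } = require('stream')
const { StringDecoder } = require('string_decoder')

// Hypothetical helper: split `text` on newlines. Returns the complete lines
// and the trailing partial line, which must be carried into the next chunk.
function splitLines (text) {
  const parts = text.split('\n')
  const rest = parts.pop() // '' when `text` ends with a newline
  return { lines: parts, rest }
}

// Hypothetical factory: reads buffers, writes lines (object mode on the
// readable side, one string per chunk, terminating newline removed).
function createDelineate () {
  const decoder = new StringDecoder('utf8') // keeps multi-byte characters intact across chunks
  let carry = ''
  return new Transform({
    readableObjectMode: true,
    transform (chunk, _encoding, callback) {
      const { lines, rest } = splitLines(carry + decoder.write(chunk))
      carry = rest
      for (const line of lines) this.push(line)
      callback()
    },
    flush (callback) {
      // Emit a final line that was not newline-terminated
      const tail = carry + decoder.end()
      if (tail.length > 0) this.push(tail)
      callback()
    }
  })
}
```

Such a transform could be placed between a buffer source and a line consumer, for example `process.stdin.pipe(createDelineate())`.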
Structure of data
The main data format transferred by the datapipe functions is assumed to be streaming JSON. That is, the data are organized as parseable JSON objects, each contained in a single line (non-prettified). Newlines in the stream mark the end of each JSON object. In this way, data can be streamed line by line as strings when simply being transferred, and parsed into objects only when required.
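The one-document-per-line convention can be sketched with plain `JSON.stringify` and `JSON.parse`. The helper names below (`toLine`, `fromLine`, `encodeAll`, `decodeAll`) are hypothetical and only illustrate the format; they are not part of this package.

```javascript
// Serialize one JSON-serializable value as a single line. JSON.stringify
// without an indent argument escapes newlines inside strings, so the
// result never contains a raw newline character.
function toLine (value) {
  return JSON.stringify(value)
}

// Parse one line back into a value.
function fromLine (line) {
  return JSON.parse(line)
}

// Encode several values as newline-terminated lines.
function encodeAll (values) {
  return values.map(toLine).join('\n') + '\n'
}

// Decode newline-delimited text, skipping empty lines.
function decodeAll (text) {
  return text
    .split('\n')
    .filter((line) => line.length > 0)
    .map(fromLine)
}
```

Because a newline can only appear between documents, a consumer that merely forwards data can treat each line as an opaque string and skip parsing entirely.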
Though some stream functions are agnostic to the underlying structure of the stream, the interpretation of the stream as topic events assumes a particular structure, which can be one of the following:
Object form
The event is serialized as a JSON object. If the object contains a property `key` at the root, then its value is used as the topic key for a produced event.
{"prop1":"value1","key":"1","prop2":"value1_detail","prop3":["a","nested","array"]}
Array form
The event is serialized as a JSON array. In this format, the values are distinguished by their order in the array, and the first element of the array is assumed to be the topic key. (Un-keyed data should contain a `null` in the first element position.)
["1","value1","value1_detail",["a","nested","array"],{"a":"nested","b":"object"}]
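The key rules for the two forms can be summarized in a small function. This is a sketch under stated assumptions: `topicKey` is a hypothetical name, and returning `null` for un-keyed or unrecognized events is a choice made for the example, not documented behavior of this package.

```javascript
// Hypothetical helper: return the topic key of a parsed event,
// or null when the event carries no key.
function topicKey (event) {
  // Array form: the first element is the key (null for un-keyed data)
  if (Array.isArray(event)) {
    return event.length > 0 ? event[0] : null
  }
  // Object form: a root-level `key` property is the key
  if (event !== null && typeof event === 'object' &&
      Object.prototype.hasOwnProperty.call(event, 'key')) {
    return event.key
  }
  return null
}

// The two sample events shown above carry the same key
const fromObject = topicKey(JSON.parse('{"prop1":"value1","key":"1","prop2":"value1_detail"}'))
const fromArray = topicKey(JSON.parse('["1","value1","value1_detail"]'))
```

Both `fromObject` and `fromArray` evaluate to the string `"1"`.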
Examples
A note about IO: `stdout` is reserved for transmission of data. All messaging, status, and progress output is emitted to `stderr`.
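To illustrate that separation, here is a minimal sketch of a progress-style pass-through: data chunks continue down the pipeline untouched, while progress marks are written to `stderr` only. The names `makeTicker` and `createProgress` are hypothetical, and the sketch covers only object-mode streams; it is not the package's `transform/progress` implementation.

```javascript
const { Transform } = require('stream')

// Hypothetical helper: returns a function that calls `write('.')`
// once for every `every` invocations.
function makeTicker (every, write) {
  let count = 0
  return function tick () {
    count += 1
    if (count % every === 0) write('.')
  }
}

// Hypothetical pass-through transform for object-mode streams.
// Chunks are forwarded unchanged; progress marks go to stderr,
// so anything reading stdout sees data only.
function createProgress (every = 10000) {
  const tick = makeTicker(every, (mark) => process.stderr.write(mark))
  return new Transform({
    objectMode: true,
    transform (chunk, _encoding, callback) {
      tick()
      callback(null, chunk)
    }
  })
}
```

Keeping the counter in a separate helper makes the counting logic easy to exercise without a real stream or terminal.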
// Read from S3 and write to stdout
require('stream').pipeline(
  // Source: the S3 object at 'path/to/object'
  new (require('./reader'))({
    type: 's3',
    region: 'us-east-1',
    bucket: 'a-bucket'
  })('path/to/object'),
  // Destination: stdout
  new (require('./writer'))({ type: 'stdio' })(),
  // Called once the pipeline has finished or failed
  (err) => {
    if (err) {
      console.error(err)
      process.exit(-1)
    }
  }
)