streamsplit
v2.0.0
Published
Memory/CPU efficient stream splitter by token
Downloads
108
Maintainers
Readme
streamsplit
streamsplit. is a Reactive Programming (rx) module which provides a straight-forward and both memory & CPU efficient way to split readable streams into chunks of data.
This module was built with the main goal of splitting a large MySQL dump .sql
file into multiple files (one per each table). ( npm -> mysqldumpslit
)
Thanks to the great streamsearch library which uses the Boyer-Moore-Horspool algorithm (the same used in the Unix grep
command) the token search is blazing fast and have a very low memory footprint.
This library contains several splitters that will emit
an item each time a chunk between the split token is found.
The Module is written in ES6 transpiled using the wonderful Babel library.
Requirements
- node.js - v0.12 or newer!
Installation
npm install streamsplit
Basic Usage
Base splitter
import StreamSplit from 'streamsplit';
const readStream = getReadableStreamSomehow();
const splitToken = '\n';
StreamSplit.split({
stream: readStream,
token: splitToken
}).subscribe(
range => {
console.log(`start: ${range.start}, end: ${range.end}`);
}
);
Assuming readStream
contains a\nbb\nc
, the code above will output:
start: 0, end: 1
start: 2, end: 4
start: 5, end: 6
Buffer splitter
The Buffer splitter uses the Base splitter but instead of emitting ranges of start+end it will provide a buffer for each chunk. Using it is pretty straight forward:
import toBuffSplit from 'streamsplit/dist/splitters/tobuffer';
const readStream = getReadableStreamSomehow();
const splitToken = '\n';
toBuffSplit({
stream: readStream,
token: splitToken
}).subscribe(
buf => {
console.log(`Buf: '${buf.toString('utf8')}'`);
}
);
Assuming readStream
contains a\nbb\nc
, the code above will output:
Buf: 'a'
Buf: 'bb'
Buf: 'c'
Note: The buffer splitter is memory efficient but if you expect large chunk of data between every split token avoid using it :)
JSON Splitter
Sometimes you have a stream of data which contains several JSON items. for example, if you've a stream "containing":
{"iam": "great"}
{"and": "you"}
{"know": "it"}
Aka a stream containing a json per each line; you could parse it using:
import toJsonSplit from 'streamsplit/dist/splitters/tojson';
const readStream = getReadableStreamSomehow();
const splitToken = '\n';
toJsonSplit({
stream: readStream,
token: splitToken
}).subscribe(
obj => {
console.log(`Obj: ',obj);
}
);
And you'll get it parsed for you.
Need something different?
Thanks to the powerful Rx operators you can manipulate the data the way you prefer.
Lets say you've a very magical stream of data consisting in base64 encoded PNGs separated by a fictional (but magical) separator -LOL-LIMEWIRE+NYAN-CAT-
, you could do the following to get the original image data back
import toBuffSplit from 'streamsplit/dist/splitters/tobuffer';
const readStream = getReadableStreamSomehow();
const splitToken = '-LOL-LIMEWIRE+NYAN-CAT-';
toBuffSplit({
stream: readStream,
token: splitToken
})
.map(base64Buf => new Buffer(base64Buf, 'base64'))
.subscribe(
imageData => {
// imageData contains a single image contente buffer.
}
);
Lets say youve a stream of mixed data separated by ";". Every 'piece' might contain:
- a number
- a string
Ex: '10;lol;haha;13;12;love you!;....
And you want to fetch only the even numbers; then:
import toBuffSplit from 'streamsplit/dist/splitters/tobuffer';
const readStream = getReadableStreamSomehow();
toBuffSplit({
stream: readStream,
token: ';'
})
.map(buf => buf.toString('utf8')) // convert buf to string!
.filter(item => !isNaN(item)) // filter out strings!
.map(parseFloat) // remap item from "string representation of a number" to Number
.filter(n => n % 2 == 0) // filter out odd numbers.
.subscribe(
evenNumber => {
// ..
}
);
If you're unfamiliar with the RX world I suggest you take a look at Getting started with Rx