structurize
v1.4.1
structurize
A text format hinting library: it estimates the format of a given input string and provides a generic parsing API. The typical use case is that you receive an input file from a user and need to estimate what format it is in and how to parse it.
Currently supports the following:
- tar, parsing with tar
- gzip, parsing with zlib
- json, parsing with jsonstream2
- csv, parsing with csv-parse
- tsv, parsing with csv-parse
- xlsx, parsing with xlsx
- querystring, parsing with qs-stream
- WebDistributionLog, parsing with wdl-stream
NOTE: it's generally a bad idea to "guess" the format of a semi-structured text file, because there are endless edge cases. Whenever possible, get an explicit format type from the user, via the file extension or otherwise. The library attempts to cover only the most obvious situations.
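One way to follow this advice is to trust the file extension first and fall back to guessing only when the extension is unknown. The sketch below is an illustration only: `formatFor` and the extension table are hypothetical and not part of structurize, and the format names other than "json" and "csv" are assumed to match the list above. The guessing function is passed in as an argument so the snippet runs on its own.

```javascript
var path = require( "path" );

// Illustrative mapping from file extensions to format names.
// This table is an assumption for the example, not something structurize exports.
var EXTENSIONS = {
    ".tar": "tar",
    ".gz": "gzip",
    ".json": "json",
    ".csv": "csv",
    ".tsv": "tsv",
    ".xlsx": "xlsx"
};

// Hypothetical helper: prefer the explicit extension, otherwise fall back to
// guessFn( sample ), which could be structurize.guess.
function formatFor( filename, sample, guessFn ) {
    var ext = path.extname( filename ).toLowerCase();
    if ( Object.prototype.hasOwnProperty.call( EXTENSIONS, ext ) ) {
        return EXTENSIONS[ ext ];
    }
    return guessFn( sample );
}
```

With this helper, a call such as `formatFor( "report.csv", sample, structurize.guess )` would never inspect `sample`, while a file named `somefile.unknown` would still go through the guesser.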
Usage
$ npm install structurize
var structurize = require( "structurize" );
var type = structurize.guess( '{"hello":"world"}' ); // returns "json"
Then you can use the matching parser library to parse your data, or use the built-in generic parser Transform stream for each format type:
var fs = require( "fs" );
var stream = structurize.parser( type ); // any of the supported text formats, like "json" or "csv"
fs.createReadStream( "somefile.unknown" )
    .pipe( stream )
    .on( "data", function ( obj ) {
        // do something awesome with it
    });
The .parser() function returns a parser stream from external libraries for each supported format type (listed above). However, these libraries are not a direct dependency of structurize. You'll have to npm install them separately.
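Because the parser libraries are installed separately, it can help to check up front which ones are missing. The following is a rough sketch using Node's `require.resolve`; the `missingPackages` helper is hypothetical and not part of structurize, and the package names are simply those from the list of supported formats.

```javascript
// Parser packages named in the list of supported formats above.
// zlib ships with Node, so it always resolves.
var PARSER_PACKAGES = [
    "tar", "zlib", "jsonstream2", "csv-parse", "xlsx", "qs-stream", "wdl-stream"
];

// Hypothetical helper: return the package names that cannot be resolved
// from the current project.
function missingPackages( names ) {
    return names.filter( function ( name ) {
        try {
            require.resolve( name );
            return false;
        } catch ( err ) {
            return true;
        }
    });
}

var missing = missingPackages( PARSER_PACKAGES );
if ( missing.length > 0 ) {
    // Print the install command for whatever is not available yet.
    console.log( "npm install " + missing.join( " " ) );
}
```

In practice you would only install the parsers for the formats you actually expect to receive.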
Wrapping it all together: for the use-case of type-guessing and parsing, this library includes a convenience Transform stream that does both:
fs.createReadStream( "somefile.unknown" )
.pipe( structurize() ) // guess the type based on the first N-bytes, and parse it.
.on( "data", function ( obj ) {
// ...
})
The structurize() stream also lets you modify or filter the parsed output before it is pushed out, by defining a mapper function:
structurize()
.map(function (d) {
d.name = "cookie";
return d; // or return nothing (undefined) to filter it out.
})
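The map-or-filter behavior shown above can be pictured as an object-mode Transform. This is a minimal sketch built on Node's `stream.Transform`, not structurize's actual implementation; `mapStream` is a hypothetical name used only for illustration.

```javascript
var Transform = require( "stream" ).Transform;

// Hypothetical helper: build an object-mode Transform that applies `mapper`
// to every object and drops the ones for which it returns undefined.
function mapStream( mapper ) {
    return new Transform({
        objectMode: true,
        transform: function ( obj, encoding, callback ) {
            var out = mapper( obj );
            if ( out === undefined ) {
                return callback(); // filtered out, nothing is pushed
            }
            callback( null, out ); // pushed downstream
        }
    });
}
```

A stream built this way can sit after any object-producing parser, for example `parser.pipe( mapStream( fn ) )`, which mirrors what `.map()` is described as doing.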
Multi
It's common to have multiple inputs with different formats and to want to parse all of them via a single stream, for example when reading a list of unidentified files. structurize.multi is a helper stream supporting this use case:
var m = structurize.multi()
m.write({ hello: "world" }) // already an object, left as is.
m.write('{"foo":"bar"}') // identified and parsed as a json
m.on( "data", console.log ) // => { "hello": "world" }\n{ "foo": "bar" }
Of course, structurize needs to be able to differentiate between the different files in the stream in order to direct them to different sub-parsers. You can tag your input buffers/strings with a name, such as a filename, to avoid having all of the inputs stream through a single parser:
var buf = new String('{"hello":"world"}')
buf.name = "filename1"
m.write(buf)
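The `new String(...)` wrapper matters here: a primitive string cannot carry extra properties, whereas a String object can. A small helper makes the tagging less repetitive; `tagInput` is a hypothetical name, not part of structurize.

```javascript
// Hypothetical helper: wrap text in a String object so it can carry a name.
// Assigning .name on a primitive string would be silently lost
// (or throw in strict mode).
function tagInput( text, name ) {
    var tagged = new String( text );
    tagged.name = name;
    return tagged;
}

var a = tagInput( '{"hello":"world"}', "filename1" );
var b = tagInput( "a,b\n1,2", "filename2" );
// a and b can now be written to the multi stream, e.g. m.write( a )
```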
Finally, you may want to configure the individual sub-parsers with different options or mappers. The multi stream fires a subparser event whenever a new parser is created, passing its name and the parser object. You can configure this nested structurize stream individually:
m.on("subparser", function( name, s ) {
s.map( ... )
})