
© 2024 – Pkg Stats / Ryan Hefner

@stdlib/utils-dsv-base-parse

v0.2.2

Published

Incremental parser for delimiter-separated values (DSV).

Downloads

49

Readme

DSV Parser


Incremental parser for delimiter-separated values (DSV).

Installation

npm install @stdlib/utils-dsv-base-parse

Usage

var Parser = require( '@stdlib/utils-dsv-base-parse' );

Parser( [options] )

Returns an incremental parser for delimiter-separated values (DSV).

var parse = new Parser();

// Parse a line of comma-separated values (CSV):
parse.next( '1,2,3,4\r\n' ); // => [ '1', '2', '3', '4' ]

// ...

// Parse multiple lines of CSV:
parse.next( '4,5,6\r\n7,8,9\r\n' ); // => [ '4', '5', '6' ], [ '7', '8', '9' ]

// ...

// Parse partial lines:
parse.next( 'a,b' );
parse.next( ',c,d\r\n' ); // => [ 'a', 'b', 'c', 'd' ]

// ...

// Chain together invocations:
parse.next( 'e,f' ).next( ',g,h' ).next( '\r\n' ); // => [ 'e', 'f', 'g', 'h' ]
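Conceptually, incremental parsing means holding on to any trailing partial record until a later chunk completes it. The following is a simplified sketch of that idea using a hypothetical splitComplete helper. It is not how the package is implemented, and it ignores quoting, escapes, comments, and skipped lines.

```javascript
// Hypothetical helper (not part of @stdlib/utils-dsv-base-parse): splits the
// buffered text into complete records and returns any trailing partial text.
// Quoting, escaping, comments, and skipped lines are ignored in this sketch.
function splitComplete( buffer, delimiter, newline ) {
    var records = [];
    var idx = buffer.indexOf( newline );
    while ( idx !== -1 ) {
        records.push( buffer.slice( 0, idx ).split( delimiter ) );
        buffer = buffer.slice( idx + newline.length );
        idx = buffer.indexOf( newline );
    }
    return {
        'records': records,
        'rest': buffer // carried over and prepended to the next chunk
    };
}

// Feed chunks which split records at arbitrary positions:
var chunks = [ 'a,b', ',c,d\r\n4,5', ',6\r\n' ];
var rest = '';
var out = [];
var res;
var i;
for ( i = 0; i < chunks.length; i++ ) {
    res = splitComplete( rest + chunks[ i ], ',', '\r\n' );
    out = out.concat( res.records );
    rest = res.rest;
}
console.log( out );
// => [ [ 'a', 'b', 'c', 'd' ], [ '4', '5', '6' ] ]
```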

The constructor accepts the following options:

  • comment: character sequence appearing at the beginning of a row which demarcates that the row content should be parsed as a commented line. A commented line ends upon encountering the first newline character sequence, regardless of whether that newline character sequence is preceded by an escape character sequence. Default: ''.

  • delimiter: character sequence separating record fields (e.g., ',' for comma-separated values (CSV) and \t for tab-separated values (TSV)). Default: ','.

  • doublequote: boolean flag indicating how quote sequences should be escaped within a quoted field. When true, a quote sequence must be escaped by another quote sequence. When false, a quote sequence must be escaped by the escape sequence. Default: true.

  • escape: character sequence for escaping character sequences having special meaning (i.e., the delimiter, newline, and escape sequences outside of quoted fields; the comment sequence at the beginning of a record and outside of a quoted field; and the quote sequence inside a quoted field when doublequote is false). Default: ''.

  • ltrim: boolean indicating whether to trim leading whitespace from field values. If false, the parser does not trim leading whitespace (e.g., 'a, b, c' parses as [ 'a', ' b', ' c' ]). If true, the parser trims leading whitespace (e.g., 'a, b, c' parses as [ 'a', 'b', 'c' ]). Default: false.

  • maxRows: maximum number of records to process (excluding skipped lines). By default, the maximum number of records is unlimited.

  • newline: character sequence separating rows. Default: '\r\n' (see RFC 4180).

  • onClose: callback to be invoked upon closing the parser. If a parser has partially processed a record upon close, the callback is invoked with the following arguments:

    • value: unparsed partially processed field text.

    Otherwise, the callback is invoked without any arguments.

  • onColumn: callback to be invoked upon processing a field. The callback is invoked with the following arguments:

    • field: field value.
    • row: row number (zero-based).
    • col: field (column) number (zero-based).
    • line: line number (zero-based).
  • onComment: callback to be invoked upon processing a commented line. The callback is invoked with the following arguments:

    • comment: comment text.
    • line: line number (zero-based).
  • onError: callback to be invoked upon encountering an unrecoverable parse error. By default, upon encountering a parse error, the parser throws an Error. When provided an error callback, the parser does not throw and, instead, invokes the provided callback. The callback is invoked with the following arguments:

    • error: an Error object.
  • onRow: callback to be invoked upon processing a record. The callback is invoked with the following arguments:

    • record: an array-like object containing field values. If provided a rowBuffer, the record argument will be the same array-like object for each invocation.
    • row: row number (zero-based).
    • ncols: number of fields (columns).
    • line: line number (zero-based).

    If a parser is closed before fully processing the last record, the callback is invoked with field data for all fields which have been parsed. Any remaining field data is provided to the onClose callback. For example, if a parser has processed two fields and closes while attempting to process a third field, the parser invokes the onRow callback with field data for the first two fields and invokes the onClose callback with the partially processed data for the third field.

  • onSkip: callback to be invoked upon processing a skipped line. The callback is invoked with the following arguments:

    • record: unparsed record text.
    • line: line number (zero-based).
  • onWarn: when strict is false, a callback to be invoked upon encountering invalid DSV. The callback is invoked with the following arguments:

    • error: an Error object.
  • quote: character sequence demarcating the beginning and ending of a quoted field. When quoting is false, a quote character sequence has no special meaning and is processed as normal text. Default: '"'.

  • quoting: boolean flag indicating whether to enable special processing of quote character sequences (i.e., when a quote sequence should demarcate a quoted field). Default: true.

  • rowBuffer: array-like object for storing the field values of the most recently processed record. When provided, the row buffer is reused and is provided to the onRow callback for each processed record. If a provided row buffer is a generic array, the parser grows the buffer as needed. If a provided row buffer is a typed array, the buffer size is fixed and, thus, needs to be large enough to accommodate processed fields. Providing a fixed-length array is appropriate when the number of fields is known prior to parsing. When the number of fields is unknown, providing a fixed-length array may still be appropriate; however, one is advised to allocate a buffer having more elements than is reasonably expected in order to avoid buffer overflow.

  • rtrim: boolean indicating whether to trim trailing whitespace from field values. If false, the parser does not trim trailing whitespace (e.g., 'a ,b ,c' parses as [ 'a ', 'b ', 'c' ]). If true, the parser trims trailing whitespace (e.g., 'a ,b ,c' parses as [ 'a', 'b', 'c' ]). Default: false.

  • skip: character sequence appearing at the beginning of a row which demarcates that the row content should be parsed as a skipped record. Default: ''.

  • skipBlankRows: boolean flag indicating whether to skip over rows which are either empty or contain only whitespace. Default: false.

  • skipRow: callback whose return value indicates whether to skip over a row. The callback is invoked with the following arguments:

    • nrows: number of processed rows (equivalent to the current row number).
    • line: line number (zero-based).

    If the callback returns a truthy value, the parser skips the row; otherwise, the parser attempts to process the row.

    Note, however, that, even if the callback returns a falsy value, a row may still be skipped depending on the presence of a skip character sequence.

  • strict: boolean flag indicating whether to raise an exception upon encountering invalid DSV. When false, instead of throwing an Error or invoking the onError callback, the parser invokes an onWarn callback with an Error object specifying the encountered error. Default: true.

  • trimComment: boolean flag indicating whether to trim leading whitespace in commented lines. Default: true.

  • whitespace: list of characters to be interpreted as whitespace. Default: [ ' ' ].
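The interplay of ltrim, rtrim, and whitespace can be illustrated with a hypothetical trimField helper. This sketch assumes that trimming removes only the characters listed in whitespace from the respective end of a field; it approximates the documented behavior rather than reproducing the parser's internals.

```javascript
// Hypothetical helper (not part of @stdlib/utils-dsv-base-parse) modeling the
// `ltrim`, `rtrim`, and `whitespace` options. Only characters listed in
// `whitespace` (default: [ ' ' ]) are treated as whitespace.
function trimField( field, opts ) {
    var ws = opts.whitespace || [ ' ' ];
    var start = 0;
    var end = field.length;
    if ( opts.ltrim ) {
        while ( start < end && ws.indexOf( field[ start ] ) !== -1 ) {
            start += 1;
        }
    }
    if ( opts.rtrim ) {
        while ( end > start && ws.indexOf( field[ end-1 ] ) !== -1 ) {
            end -= 1;
        }
    }
    return field.slice( start, end );
}

var fields = 'a , b ,\tc'.split( ',' ); // [ 'a ', ' b ', '\tc' ]

// With the default whitespace list, the tab is kept:
var trimmed = fields.map( function trim( f ) {
    return trimField( f, { 'ltrim': true, 'rtrim': true } );
});
console.log( trimmed );
// => [ 'a', 'b', '\tc' ]

// Adding '\t' to the whitespace list also removes the tab:
var tabAware = fields.map( function trim( f ) {
    return trimField( f, { 'ltrim': true, 'rtrim': true, 'whitespace': [ ' ', '\t' ] } );
});
console.log( tabAware );
// => [ 'a', 'b', 'c' ]
```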

The parser does not perform field conversion/transformation and, instead, is solely responsible for incrementally identifying fields and records. Further processing of fields/records is the responsibility of parser consumers who are generally expected to provide either an onColumn callback, an onRow callback, or both.

var format = require( '@stdlib/string-format' );

function onColumn( field, row, col ) {
    console.log( format( 'Row: %d. Column: %d. Value: %s', row, col, field ) );
}

function onRow( record, row, ncols ) {
    console.log( format( 'Row: %d. nFields: %d. Value: | %s |', row, ncols, record.join( ' | ' ) ) );
}

var opts = {
    'onColumn': onColumn,
    'onRow': onRow
};
var parse = new Parser( opts );

parse.next( '1,2,3,4\r\n' ); // => [ '1', '2', '3', '4' ]
parse.next( '5,6,7,8\r\n' ); // => [ '5', '6', '7', '8' ]

// ...
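Because the parser leaves conversion to consumers, an onRow callback is a natural place to transform values. The sketch below uses a hypothetical createCollector helper that converts numeric strings to numbers and copies each record. Copying matters when a rowBuffer is provided, because the same array-like object is passed for every record. The parser's calls are simulated here so that the sketch runs without the package.

```javascript
// Hypothetical consumer (not part of @stdlib/utils-dsv-base-parse): converts
// numeric-looking string fields to numbers and stores a copy of each record.
function createCollector() {
    var rows = [];
    function onRow( record, row, ncols ) {
        var out = [];
        var v;
        var i;
        for ( i = 0; i < ncols; i++ ) {
            v = record[ i ];
            // Convert only non-blank strings which parse as finite numbers:
            if ( typeof v === 'string' && v.trim() !== '' && isFinite( Number( v ) ) ) {
                out.push( Number( v ) );
            } else {
                out.push( v );
            }
        }
        rows.push( out ); // a new array, so later buffer reuse cannot alter it
    }
    return {
        'onRow': onRow,
        'rows': rows
    };
}

// Simulate a parser which reuses the same buffer for successive records:
var c = createCollector();
var buf = [ '1', '2', 'x' ];
c.onRow( buf, 0, 3 );
buf[ 0 ] = '4';
buf[ 1 ] = '5';
buf[ 2 ] = '';
c.onRow( buf, 1, 3 );
console.log( c.rows );
// => [ [ 1, 2, 'x' ], [ 4, 5, '' ] ]
```

With the actual package, the collector's function could be supplied as the onRow option (e.g., new Parser( { 'onRow': c.onRow } )). The numeric conversion rule is an illustrative assumption, not package behavior.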

Upon closing the parser, the parser invokes an onClose callback with any partially processed (i.e., incomplete) field data. Note, however, that the field data may not equal the original character sequence, as escape sequences may have already been removed.

var format = require( '@stdlib/string-format' );

function onClose( v ) {
    console.log( format( 'Incomplete: %s', v ) );
}

var opts = {
    'onClose': onClose
};
var parse = new Parser( opts );

parse.next( '1,2,3,4\r\n' ); // => [ '1', '2', '3', '4' ]

// ...

// Provide an incomplete record:
parse.next( '5,6,"foo' );

// Close the parser:
parse.close();

By default, the parser assumes RFC 4180-compliant, newline-delimited comma-separated values (CSV). To use alternative separators, set the relevant options.

var opts = {
    'delimiter': '--',
    'newline': '%%'
};
var parse = new Parser( opts );

parse.next( '1--2--3--4%%' ); // => [ '1', '2', '3', '4' ]
parse.next( '5--6--7--8%%' ); // => [ '5', '6', '7', '8' ]

// ...
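When generating input for a parser configured with custom separators, fields containing special sequences need quoting. The hypothetical toDSV helper below is a sketch assuming RFC 4180-style quoting (wrap the field in quote sequences and double any embedded quote sequences), which corresponds to the parser's default doublequote behavior.

```javascript
// Hypothetical helper (not part of @stdlib/utils-dsv-base-parse): serializes
// an array of rows, quoting fields which contain a special sequence.
function toDSV( rows, opts ) {
    var d = opts.delimiter;
    var nl = opts.newline;
    var q = opts.quote;
    return rows.map( function serializeRow( row ) {
        return row.map( function serializeField( f ) {
            var special = ( f.indexOf( d ) !== -1 || f.indexOf( nl ) !== -1 || f.indexOf( q ) !== -1 );
            if ( !special ) {
                return f;
            }
            // Wrap in quotes and double any embedded quote sequences:
            return q + f.split( q ).join( q + q ) + q;
        }).join( d ) + nl; // terminate every record with the newline sequence
    }).join( '' );
}

var sep = {
    'delimiter': '--',
    'newline': '%%',
    'quote': '"'
};
var str = toDSV( [ [ '1', '2', '3', '4' ], [ '5', '6', '7', '8' ] ], sep );
console.log( str );
// => 1--2--3--4%%5--6--7--8%%
```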

By default, the parser expects a quote character sequence within a quoted field to be escaped by another quote character sequence (i.e., two consecutive quote sequences represent one literal quote). To parse DSV in which quote character sequences within quoted fields are escaped by an escape character sequence, set doublequote to false and specify the escape character sequence.

// Default parser:
var parse = new Parser();

// Parse DSV using double quoting:
parse.next( '1,"""2""",3,4\r\n' ); // => [ '1', '"2"', '3', '4' ]

// ...

// Create a parser which uses a custom escape sequence within quoted fields:
var opts = {
    'doublequote': false,
    'escape': '\\'
};
parse = new Parser( opts );

parse.next( '1,"\\"2\\"",3,4\r\n' ); // => [ '1', '"2"', '3', '4' ]
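The two escaping modes can be compared with a hypothetical unquote helper that decodes the text between a quoted field's opening and closing quotes. This is a simplified sketch assuming that the character immediately following an escape sequence is taken literally; the package's actual handling may differ in edge cases.

```javascript
// Hypothetical helper (not part of @stdlib/utils-dsv-base-parse): decodes the
// inner text of a quoted field under `doublequote` true or false.
function unquote( inner, opts ) {
    var esc = opts.escape || '';
    var q = opts.quote;
    var out = '';
    var i = 0;
    while ( i < inner.length ) {
        if ( opts.doublequote && inner.startsWith( q + q, i ) ) {
            // Two consecutive quote sequences => one literal quote:
            out += q;
            i += 2 * q.length;
        } else if ( !opts.doublequote && esc && inner.startsWith( esc, i ) ) {
            // Escape sequence => take the following character literally:
            out += inner.charAt( i + esc.length );
            i += esc.length + 1;
        } else {
            out += inner.charAt( i );
            i += 1;
        }
    }
    return out;
}

// Inner text of """2""" with double quoting:
console.log( unquote( '""2""', { 'quote': '"', 'doublequote': true } ) );
// => "2"

// Inner text of "\"2\"" with a backslash escape sequence:
console.log( unquote( '\\"2\\"', { 'quote': '"', 'escape': '\\', 'doublequote': false } ) );
// => "2"
```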

When quoting is true, the parser identifies a quote character sequence at the beginning of a field as the start of a quoted field. To process quote character sequences as normal field text, set quoting to false.

// Default parser:
var parse = new Parser();

parse.next( '1,"2",3,4\r\n' ); // => [ '1', '2', '3', '4' ]

// ...

// Create a parser which treats quote sequences as normal field text:
var opts = {
    'quoting': false
};
parse = new Parser( opts );

parse.next( '1,"2",3,4\r\n' ); // => [ '1', '"2"', '3', '4' ]

To parse DSV containing commented lines, specify a comment character sequence which demarcates the beginning of a commented line.

var opts = {
    'comment': '#'
};
var parse = new Parser( opts );

parse.next( '1,2,3,4\r\n' ); // => [ '1', '2', '3', '4' ]
parse.next( '# This is a commented line.\r\n' ); // comment
parse.next( '9,10,11,12\r\n' ); // => [ '9', '10', '11', '12' ]

To parse DSV containing skipped lines, specify a skip character sequence which demarcates the beginning of a skipped line.

var opts = {
    'skip': '//'
};
var parse = new Parser( opts );

parse.next( '1,2,3,4\r\n' ); // => [ '1', '2', '3', '4' ]
parse.next( '//5,6,7,8\r\n' ); // skipped line
parse.next( '9,10,11,12\r\n' ); // => [ '9', '10', '11', '12' ]
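The comment and skip options both key off the start of a line. A hypothetical classifyLine helper sketches that prefix check; it deliberately ignores escape sequences, leading whitespace, and quoting, all of which the real parser must consider. Because special sequences may not overlap (see Notes), at most one prefix can match, so the order of the checks does not matter.

```javascript
// Hypothetical helper (not part of @stdlib/utils-dsv-base-parse): classifies a
// complete line by prefix. Empty `comment`/`skip` sequences (the defaults)
// disable the corresponding check.
function classifyLine( line, opts ) {
    if ( opts.comment && line.startsWith( opts.comment ) ) {
        return 'comment';
    }
    if ( opts.skip && line.startsWith( opts.skip ) ) {
        return 'skip';
    }
    return 'record';
}

var opts = {
    'comment': '#',
    'skip': '//'
};
var lines = [ '1,2,3,4', '# This is a commented line.', '//5,6,7,8' ];
var kinds = lines.map( function classify( line ) {
    return classifyLine( line, opts );
});
console.log( kinds );
// => [ 'record', 'comment', 'skip' ]
```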

Properties

Parser.prototype.done

Read-only property indicating whether a parser is done, i.e., whether it can no longer process new chunks.

var parse = new Parser();

parse.next( '1,2,3,4\r\n' );

// ...

var b = parse.done;
// returns false

// ...

parse.close();

// ...

b = parse.done;
// returns true

Methods

Parser.prototype.next( chunk )

Incrementally parses the next chunk.

var parse = new Parser();

parse.next( '1,2,3,4\r\n' );

// ...

parse.next( '5,6,7,8\r\n' );

// ...

Parser.prototype.close()

Closes the parser.

var parse = new Parser();

parse.next( '1,2,3,4\r\n' );

// ...

parse.next( '5,6,7,8\r\n' );

// ...

parse.close();

After a parser has been closed, it raises an exception upon receiving any additional chunks.
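The next/close/done contract described above can be sketched with a hypothetical Lifecycle class that tracks state only and performs no parsing. The error message is an illustrative assumption, not the package's.

```javascript
// Hypothetical class (not part of @stdlib/utils-dsv-base-parse) mirroring the
// documented lifecycle: chainable `next`, `close`, and a read-only `done`.
function Lifecycle() {
    this._closed = false;
    this._chunks = [];
}

// Read-only accessor (no setter is defined):
Object.defineProperty( Lifecycle.prototype, 'done', {
    'get': function get() {
        return this._closed;
    }
});

Lifecycle.prototype.next = function next( chunk ) {
    if ( this._closed ) {
        throw new Error( 'parser is closed' );
    }
    this._chunks.push( chunk );
    return this; // returning `this` enables chaining
};

Lifecycle.prototype.close = function close() {
    this._closed = true;
};

var p = new Lifecycle();
p.next( 'a,b' ).next( ',c\r\n' );
console.log( p.done );
// => false
p.close();
console.log( p.done );
// => true
try {
    p.next( '1,2\r\n' );
} catch ( err ) {
    console.log( err.message );
    // => parser is closed
}
```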


Notes

  • Special character sequences (i.e., delimiter, newline, quote, escape, skip, and comment sequences) must all be unique with respect to one another, and no special character sequence is allowed to be a subsequence of another special character sequence. Allowing common subsequences would lead to ambiguous parser states.

    For example, given the chunk '1,,3,4,,', if delimiter is ',' and newline is ',,', is the first ',,' a field with no content or a newline? The parser cannot be certain, hence the prohibition.

  • As specified in RFC 4180, special character sequences must be consistent across all provided chunks. Hence, providing chunks in which, e.g., line breaks vary between \r, \n, and \r\n is not supported.
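The first rule can be checked up front. The hypothetical findConflicts helper below is a sketch that treats "subsequence" as a contiguous substring, as the ',,' example suggests, and ignores empty sequences such as the '' defaults for comment, escape, and skip. It inspects only the sequences present in the options object.

```javascript
// Hypothetical validator (not part of @stdlib/utils-dsv-base-parse): reports
// pairs of special sequences which are equal or contain one another.
function findConflicts( opts ) {
    var names = [ 'delimiter', 'newline', 'quote', 'escape', 'skip', 'comment' ];
    var conflicts = [];
    var a;
    var b;
    var i;
    var j;
    for ( i = 0; i < names.length; i++ ) {
        for ( j = i+1; j < names.length; j++ ) {
            a = opts[ names[ i ] ] || '';
            b = opts[ names[ j ] ] || '';
            // Empty sequences are disabled and cannot conflict:
            if ( a && b && ( a.indexOf( b ) !== -1 || b.indexOf( a ) !== -1 ) ) {
                conflicts.push( [ names[ i ], names[ j ] ] );
            }
        }
    }
    return conflicts;
}

console.log( findConflicts( { 'delimiter': ',', 'newline': ',,' } ) );
// => [ [ 'delimiter', 'newline' ] ]

console.log( findConflicts({
    'delimiter': ',',
    'newline': '\r\n',
    'quote': '"',
    'escape': '\\',
    'comment': '#',
    'skip': '//'
}) );
// => []
```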


Examples

var format = require( '@stdlib/string-format' );
var Parser = require( '@stdlib/utils-dsv-base-parse' );

function onColumn( v, row, col ) {
    console.log( format( 'Row: %d. Column: %d. Value: %s', row, col, v ) );
}

function onRow( v, row, ncols ) {
    console.log( format( 'Row: %d. nFields: %d. Value: | %s |', row, ncols, v.join( ' | ' ) ) );
}

function onComment( str ) {
    console.log( format( 'Comment: %s', str ) );
}

function onSkip( str ) {
    console.log( format( 'Skipped line: %s', str ) );
}

function onWarn( err ) {
    console.log( format( 'Warning: %s', err.message ) );
}

function onError( err ) {
    console.log( format( 'Error: %s', err.message ) );
}

function onClose( v ) {
    console.log( format( 'End: %s', v || '(none)' ) );
}

var opts = {
    'strict': false,
    'newline': '\r\n',
    'delimiter': ',',
    'escape': '\\',
    'comment': '#',
    'skip': '//',
    'doublequote': true,
    'quoting': true,
    'onColumn': onColumn,
    'onRow': onRow,
    'onComment': onComment,
    'onSkip': onSkip,
    'onError': onError,
    'onWarn': onWarn,
    'onClose': onClose
};
var parse = new Parser( opts );

var str = [
    [ '1', '2', '3', '4' ],
    [ '5', '6', '7', '8' ],
    [ 'foo\\,', 'bar\\ ,', 'beep\\,', 'boop\\,' ],
    [ '""",1,"""', '""",2,"""', '""",3,"""', '""",4,"""' ],
    [ '# This is a "comment", including with commas.' ],
    [ '\\# Escaped comment', '# 2', '# 3', '# 4' ],
    [ '1', '2', '3', '4' ],
    [ '//A,Skipped,Line,!!!' ],
    [ '"foo"', '"bar\\ "', '"beep"', '"boop"' ],
    [ ' # 😃', ' # 🥳', ' # 😮', ' # 🤠' ]
];
var i;
for ( i = 0; i < str.length; i++ ) {
    str[ i ] = str[ i ].join( opts.delimiter );
}
str = str.join( opts.newline );

console.log( format( 'Input:\n\n%s\n', str ) );
parse.next( str ).close();

Notice

This package is part of stdlib, a standard library for JavaScript and Node.js, with an emphasis on numerical and scientific computing. The library provides a collection of robust, high performance libraries for mathematics, statistics, streams, utilities, and more.

For more information on the project, filing bug reports and feature requests, and guidance on how to develop stdlib, see the main project repository.

License

See LICENSE.

Copyright

Copyright © 2016-2024. The Stdlib Authors.