npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

xlsx-reader-tmp

v1.1.1

Published

Memory efficinet non-blocking/event based streaming XLSX reader, uses /tmp or custom folder store files while reading

Downloads

1

Readme

xlsx-stream-reader

JavaScript Style Guide

======

This is a fork of https://github.com/daspawn/xlsx-stream-reader#readme

I upgraded some packages and added extra config options to allow you to choose your tmp folder and ignore sheets based on their number.

this is usefull for AWS LAMBDA where /tmp only has 512mb, so you can mount an EFS volume to /mnt/tmp and use that directory with unlimited space.

Memory efficinet minimalist streaming XLSX reader that can handle piped streams as input. Events are emmited while reading the stream.

Example

More examples can be found if example folder

    var stream = new XlsxStreamReader({
        verbose: false,
        formatting: false
    })

Options

|Key|Default Value|Description| |---|---|---| |verbose|true|throw additional exceptions, if false - then pass empty string in that places| |formatting|true|should cells with combined formats be formatted or not| |saxTrim|true|whether or not to trim text and comment nodes| |tmpOptions|{}|an options object to be passed to the "tmp" package, eg. to use a folder other than OS tmp directory { tmpdir: "/mnt/tmp" } | |sheetNumbers|[]|Array with the sheet number to process. This prevents writing to /tmp the sheets what are not needed. Eg. if [1,4] only events for sheets 1 and 4 will be returned. Use only if you're sure the xlsx file will have its internal naming compatible with the order of the sheets. This is not guaranteed by xlsx spec but seems to be followed by excel and most libs that write xlsx, eg. sheet1.xml might represent sheet 3 but 99.99% of the time it will be sheet 1. | |enableCompression|false|If true it used gzip when writing the sheets to /tmp its unclear if this improves or worsens performance, it definaetly requires more cpu/memory, but also less IO to disk.|

const XlsxStreamReader = require("xlsx-stream-reader");

var workBookReader = new XlsxStreamReader();
workBookReader.on('error', function (error) {
    throw(error);
});
workBookReader.on('sharedStrings', function () {
    // do not need to do anything with these, 
    // cached and used when processing worksheets
    console.log(workBookReader.workBookSharedStrings);
});

workBookReader.on('styles', function () {
    // do not need to do anything with these
    // but not currently handled in any other way
    console.log(workBookReader.workBookStyles);
});

workBookReader.on('worksheet', function (workSheetReader) {
    if (workSheetReader.id > 1){
        // we only want first sheet
        workSheetReader.skip();
        return; 
    }
    // print worksheet name
    console.log(workSheetReader.name);

    // if we do not listen for rows we will only get end event
    // and have infor about the sheet like row count
    workSheetReader.on('row', function (row) {
        if (row.attributes.r == 1){
            // do something with row 1 like save as column names
        }else{
            // second param to forEach colNum is very important as
            // null columns are not defined in the array, ie sparse array
            row.values.forEach(function(rowVal, colNum){
                // do something with row values
            });
        }
    });
    workSheetReader.on('end', function () {
        console.log(workSheetReader.rowCount);
    });

    // call process after registering handlers
    workSheetReader.process();
});
workBookReader.on('end', function () {
    // end of workbook reached
});

fs.createReadStream(fileName).pipe(workBookReader);

Beta Warning


This module is currently in use on a live internal business system for product management. That being said this should still be considered beta. More usage and input from users will be needed due to the numerous differences/incompatibilities/flukes I have already run into with XLSX files.

Limitations


The row reader currently returns stored values for formulas (these are normally available) and does not calculate the formula itself. As time permits the row handler will be more capable but was enough for currrent purposes (loading values from large worksheets fast)

Inspiration


Need a simple XLSX file streaming reader to handle large excel sheets but only one available/compatible was by guyonroche/exceljs. The stream reader module at the time was unfinished/unusable and rewrite attempts exposed column shifting I could not solve

More Information


Events are emmited as pertinent parts of the workbook and worksheet are receieved in the stream. Theoretically you could pause the input stream if events are being receieved too fast but this has not been tested

Events can potentially (even though I have not seen it) be receieved out of order, if you receive a worksheet end event while still receieving rows be sure to make sure your number of rows receieved equals the workSheetReader.rowCount

Theoretically you could process an excel sheet as it is being uploaded, depending on the sheet type, but untried (I encountered some XLSX files that have a different zip format that requires having the entire file to read the archive contents properly), but still probably better to save temp first and read streasm from there.

Currently if the zip archive does not have the shared strings at the begining of the archive then the input stream for each sheet is pied into a temp file until the shared string are encountered and processed, then re-read the temp worksheets with the shared strings.

API Information


new XlsxStreamReader()

Create a new XlsxStreamReader object (workBookReader). After attaching handlers you can pipe() your input stream into the reader to begin processing

Event: 'error'

  • error {Error Object}

Emitted if there was an error in processing (may not catch all errors, some may be thrown depending on where the error happened)

Event: 'end'

Emmitted once the XLSX zip parser has closed and all sheets have been processed

Event: 'sharedStrings'

After the workbook shared strings have been parsed this event is emmited. Shared strings are available via array workBookReader.workBookSharedStrings.

Event: 'styles'

After the workbook styles have been parsed this event is emmited. Styles are available via array workBookReader.workBookStyles

Event: 'worksheet'

  • workSheetReader {Object} XlsxStreamReaderWorkSheet object

Emmitted when a worksheet is reached. The sheet number is availble via {Number} workSheetReader.id. You can either process or skip at this point, but you must do one for the processing to the next sheet to continue/finish.

Once event is recieved you can attach worksheet on handlers (end, row) then you would workSheetReader.process(). If you do not want to process a sheet and instead want to skip entirely, you would workSheetReader.skip() without attaching any handlers.

Worksheet Event: 'end'

Emmitted once the end of the worksheet has been reached. The row count is available via {Number} workSheetReader.rowCount

Worksheet Event: 'row'

  • row {Object} Row object

Emmitted on every row encountered in the worksheet. for more details on what is in the row object attributes, see the Row class on MSDN.

For example:

  • row.values: sparse array containing all cell values
  • row.formulas: sparse array containing all cell formulas
  • row.attributes.r: row index
  • row.attributes.ht: Row height measured in point size
  • row.attributes.customFormat: '1' if the row style should be applied.
  • row.attributes.hidden: '1' if the row is hidden

References


Used Modules


Authors


Written by Brian Taber and Kirill Husyatin

DaSpawn's Gratipay

License


MIT