npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

@uwdata/flechette

v1.1.0

Published

Fast, lightweight access to Apache Arrow data.

Downloads

1,006

Readme

Flechette

Flechette is a JavaScript library for reading and writing the Apache Arrow columnar in-memory data format. It provides a faster, lighter, zero-dependency alternative to the Arrow JS reference implementation.

Flechette performs fast extraction and encoding of data columns in the Arrow binary IPC format, supporting ingestion of Arrow data from sources such as DuckDB and Arrow use in JavaScript data analysis tools like Arquero, Mosaic, Observable Plot, and Vega-Lite.

For documentation, see the API Reference. For code, see the Flechette GitHub repo.

Why Flechette?

In the process of developing multiple data analysis packages that consume Arrow data (including Arquero, Mosaic, and Vega), we've had to develop workarounds for the performance and correctness of the Arrow JavaScript reference implementation. Instead of workarounds, Flechette addresses these issues head-on.

  • Speed. Flechette provides better performance. Performance tests show 1.3-1.6x faster value iteration, 2-7x faster array extraction, 7-11x faster row object extraction, and 1.5-3.5x faster building of Arrow columns.

  • Size. Flechette is smaller: ~43k minified (~14k gzip'd) versus 163k minified (~43k gzip'd) for Arrow JS. Flechette's encoders and decoders also tree-shake cleanly, so you only pay for what you need in custom bundles.

  • Coverage. Flechette supports data types unsupported by the reference implementation, including decimal-to-number conversion, month/day/nanosecond time intervals (as used by DuckDB), run-end encoded data, binary views, and list views.

  • Flexibility. Flechette includes options to control data value conversion, such as numerical timestamps vs. Date objects for temporal data, and numbers vs. bigint values for 64-bit integer data.

  • Simplicity. Our goal is to provide a smaller, simpler code base in the hope that it will make it easier for ourselves and others to improve the library. If you'd like to see support for additional Arrow features, please file an issue or open a pull request.

That said, no tool is without limitations or trade-offs. Flechette assumes simpler inputs (byte buffers, no promises or streams), has less strict TypeScript typings, and may have a slightly slower initial parse (as it decodes dictionary data upfront for faster downstream access).

What's with the name?

The project name stems from the French word fléchette, which means "little arrow" or "dart". 🎯

Examples

Load and Access Arrow Data

import { tableFromIPC } from '@uwdata/flechette';

const url = 'https://vega.github.io/vega-datasets/data/flights-200k.arrow';
const ipc = await fetch(url).then(r => r.arrayBuffer());
const table = tableFromIPC(ipc);

// print table size: (231083 x 3)
console.log(`${table.numRows} x ${table.numCols}`);

// inspect schema for column names, data types, etc.
// [
//   { name: "delay", type: { typeId: 2, bitWidth: 16, signed: true }, ...},
//   { name: "distance", type: { typeId: 2, bitWidth: 16, signed: true }, ...},
//   { name: "time", type: { typeId: 3, precision: 1 }, ...}
// ]
// typeId: 2 === Type.Int, typeId: 3 === Type.Float
console.log(JSON.stringify(table.schema.fields, 0, 2));

// convert a single Arrow column to a value array
// when possible, zero-copy access to binary data is used
const delay = table.getChild('delay').toArray();

// data columns are iterable
const time = [...table.getChild('time')];

// data columns provide random access
const time0 = table.getChild('time').at(0);

// extract all columns into a { name: array, ... } object
// { delay: Int16Array, distance: Int16Array, time: Float32Array }
const columns = table.toColumns();

// convert Arrow data to an array of standard JS objects
// [ { delay: 14, distance: 405, time: 0.01666666753590107 }, ... ]
const objects = table.toArray();

// create a new table with a selected subset of columns
// use this first to limit toColumns or toArray to fewer columns
const subtable = table.select(['delay', 'time']);

Build and Encode Arrow Data

import {
  bool, dictionary, float32, int32, tableFromArrays, tableToIPC, utf8
} from '@uwdata/flechette';

// data defined using standard JS types
// both arrays and typed arrays work well
const arrays = {
  ints: [1, 2, null, 4, 5],
  floats: [1.1, 2.2, 3.3, 4.4, 5.5],
  bools: [true, true, null, false, true],
  strings: ['a', 'b', 'c', 'b', 'a']
};

// create table with automatically inferred types
const tableInfer = tableFromArrays(arrays);

// encode table to bytes in Arrow IPC stream format
const ipcInfer = tableToIPC(tableInfer);

// create table using explicit types
const tableTyped = tableFromArrays(arrays, {
  types: {
    ints: int32(),
    floats: float32(),
    bools: bool(),
    strings: dictionary(utf8())
  }
});

// encode table to bytes in Arrow IPC file format
const ipcTyped = tableToIPC(tableTyped, { format: 'file' });

Customize Data Extraction

Data extraction can be customized using options provided to table generation methods. By default, temporal data is returned as numeric timestamps, 64-bit integers are coerced to numbers, map-typed data is returned as an array of [key, value] pairs, and struct/row objects are returned as vanilla JS objects with extracted property values. These defaults can be changed via conversion options that push (or remove) transformations to the underlying data batches.

const table = tableFromIPC(ipc, {
  useDate: true,          // map dates and timestamps to Date objects
  useDecimalBigInt: true, // use BigInt for decimals, do not coerce to number
  useBigInt: true,        // use BigInt for 64-bit ints, do not coerce to number
  useMap: true,           // create Map objects for [key, value] pair lists
  useProxy: true          // use zero-copy proxies for struct and table row objects
});

The same extraction options can be passed to tableFromArrays. For more, see the API Reference.

Build Instructions

To build and develop Flechette locally:

  • Clone https://github.com/uwdata/flechette.
  • Run npm i to install dependencies.
  • Run npm test to run test cases, npm run perf to run performance benchmarks, and npm run build to build output files.