hyparquet

v1.8.0

parquet file parser for javascript

Downloads: 39,993

Dependency free since 2023!

What is hyparquet?

Hyparquet is a lightweight, dependency-free, pure JavaScript library for parsing Apache Parquet files. Apache Parquet is a popular columnar storage format that is widely used in data engineering, data science, and machine learning applications for efficiently storing and processing large datasets.

Hyparquet aims to be the world's most compliant parquet parser. And it runs in the browser.

Parquet Viewer

Try hyparquet online: Drag and drop your parquet file onto hyperparam.app to view it directly in your browser. This service is powered by hyparquet's in-browser capabilities.

Features

  1. Browser-native: Built to work seamlessly in the browser, opening up new possibilities for web-based data applications and visualizations.
  2. Performant: Designed to efficiently process large datasets by only loading the required data, making it suitable for big data and machine learning applications.
  3. TypeScript: Includes TypeScript definitions.
  4. Dependency-free: Hyparquet has zero dependencies, making it lightweight and easy to use in any JavaScript project. Only 9.2kb min.gz!
  5. Highly Compliant: Supports all parquet encodings, compression codecs, and can open more parquet files than any other library.

Why hyparquet?

Existing JavaScript-based parquet readers (like parquetjs) are no longer actively maintained, may not support streaming or in-browser processing efficiently, and often rely on dependencies that can inflate your bundle size. Hyparquet is actively maintained and designed with modern web usage in mind.

Demo

Check out a minimal parquet viewer demo that shows how to integrate hyparquet into a React web application using HighTable.

Quick Start

Node.js Example

To read the contents of a parquet file in a Node.js environment, use asyncBufferFromFile:

const { asyncBufferFromFile, parquetRead } = await import('hyparquet')

// Placeholder path: replace with the path to your parquet file
const filename = 'example.parquet'

await parquetRead({
  file: await asyncBufferFromFile(filename),
  onComplete: data => console.log(data)
})

Note: Hyparquet is published as an ES module, so dynamic import() may be required on the command line.

Browser Example

In the browser, use asyncBufferFromUrl to wrap a URL for reading asynchronously over the network. It is recommended that you filter by row and column to limit fetch size:

const { asyncBufferFromUrl, parquetRead } = await import('https://cdn.jsdelivr.net/npm/hyparquet/src/hyparquet.min.js')

const url = 'https://hyperparam-public.s3.amazonaws.com/bunnies.parquet'
await parquetRead({
  file: await asyncBufferFromUrl({url}),
  columns: ['Breed Name', 'Lifespan'],
  rowStart: 10,
  rowEnd: 20,
  onComplete: data => console.log(data)
})

Advanced Usage

Reading Metadata

You can read just the metadata, including the schema and data statistics, using the parquetMetadata function. For example, to fetch a parquet file from a remote server in the browser and parse its metadata:

import { parquetMetadata } from 'hyparquet'

const res = await fetch(url)
const arrayBuffer = await res.arrayBuffer()
const metadata = parquetMetadata(arrayBuffer)
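
For context on where the metadata lives: a parquet file ends with the serialized metadata, followed by a 4-byte little-endian metadata length and the 4-byte magic string PAR1. The sketch below reads only that trailer from an ArrayBuffer. The readFooterLength helper is hypothetical, written for illustration, and is not part of hyparquet's API.

```javascript
// Hypothetical helper (not part of hyparquet): inspect the parquet trailer.
// Layout at the end of a parquet file: [metadata][4-byte LE length]["PAR1"]
function readFooterLength(arrayBuffer) {
  if (arrayBuffer.byteLength < 8) {
    throw new Error('too short to contain a parquet footer')
  }
  const view = new DataView(arrayBuffer)
  const end = arrayBuffer.byteLength
  // The last 4 bytes must spell the ASCII magic "PAR1"
  const magic = String.fromCharCode(
    view.getUint8(end - 4), view.getUint8(end - 3),
    view.getUint8(end - 2), view.getUint8(end - 1)
  )
  if (magic !== 'PAR1') throw new Error('missing PAR1 magic')
  // The 4 bytes before the magic hold the metadata length, little-endian
  return view.getUint32(end - 8, true)
}
```

Because the length sits at a fixed offset from the end of the file, a reader can locate the metadata without scanning the row data.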

AsyncBuffer

Hyparquet accepts a file argument of type AsyncBuffer, which is like a JavaScript ArrayBuffer except that its slice method may return a Promise<ArrayBuffer>. If you have the entire file in memory, you can pass an ArrayBuffer anywhere an AsyncBuffer is expected.

type Awaitable<T> = T | Promise<T>
interface AsyncBuffer {
  byteLength: number
  slice(start: number, end?: number): Awaitable<ArrayBuffer>
}

You can define your own AsyncBuffer to create a virtual file that can be read asynchronously. In most cases, you should probably use asyncBufferFromUrl or asyncBufferFromFile.
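
As a minimal sketch of a custom AsyncBuffer, the example below wraps an in-memory ArrayBuffer and records every byte range that is requested. The loggingAsyncBuffer name and its requests property are hypothetical, introduced only for this illustration.

```javascript
// Hypothetical helper: wrap an in-memory ArrayBuffer as an AsyncBuffer
// and record every byte range that a reader requests.
function loggingAsyncBuffer(arrayBuffer) {
  const requests = []
  return {
    byteLength: arrayBuffer.byteLength,
    // slice may return a Promise<ArrayBuffer>, per the interface above
    async slice(start, end) {
      requests.push({ start, end })
      return arrayBuffer.slice(start, end)
    },
    requests, // not part of the AsyncBuffer interface; for inspection only
  }
}
```

An object like this could be passed as the file option to parquetRead, after which requests would show which parts of the file were read.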

Authorization

Pass the requestInit option to asyncBufferFromUrl to provide authentication information to a remote web server. For example:

await parquetRead({
  file: await asyncBufferFromUrl({url, requestInit: {headers: {Authorization: 'Bearer my_token'}}}),
  onComplete: data => console.log(data)
})
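
To illustrate why credentials belong on the buffer rather than on a single request, here is a rough sketch of a URL-backed AsyncBuffer that attaches the same requestInit to every HTTP Range request. This is not hyparquet's implementation. The rangeAsyncBuffer function, its byteLength parameter, and the injectable fetchFn are assumptions made for this example, and it assumes the server honors Range headers and that requestInit.headers is a plain object.

```javascript
// Hypothetical sketch, not hyparquet's implementation: an AsyncBuffer backed
// by HTTP Range requests that sends the same requestInit (for example, auth
// headers) on every request. byteLength and fetchFn are assumed parameters.
function rangeAsyncBuffer({ url, byteLength, requestInit = {}, fetchFn = globalThis.fetch }) {
  return {
    byteLength,
    // Assumes a non-negative start; negative offsets are not handled here
    async slice(start, end = byteLength) {
      // HTTP byte ranges are inclusive, while slice's end is exclusive
      const headers = { ...requestInit.headers, Range: `bytes=${start}-${end - 1}` }
      const res = await fetchFn(url, { ...requestInit, headers })
      if (!res.ok) throw new Error(`fetch failed: ${res.status}`)
      return res.arrayBuffer()
    },
  }
}
```

Injecting fetchFn keeps the sketch testable without a network; in a browser the default global fetch would be used.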

Returned row format

By default, the data passed to the onComplete function is an array of rows, where each row is an array of column values. If you would like each row to be an object keyed by column name, set the rowFormat option to 'object'.

import { parquetRead } from 'hyparquet'

await parquetRead({
  file,
  rowFormat: 'object',
  onComplete: data => console.log(data),
})
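
To make the difference between the two shapes concrete, the sketch below converts array-style rows into object-style rows. The rowsToObjects helper is hypothetical and the sample values are made up; it only illustrates the shapes and is not how hyparquet produces them.

```javascript
// Hypothetical helper for illustration only: turn array-style rows into
// object-style rows, given the column names in order.
function rowsToObjects(columnNames, rows) {
  return rows.map(row =>
    Object.fromEntries(columnNames.map((name, i) => [name, row[i]]))
  )
}

// Made-up sample values, shaped like the default (array) row format
const arrayRows = [['Holland Lop', 8], ['Flemish Giant', 6]]
const objectRows = rowsToObjects(['Breed Name', 'Lifespan'], arrayRows)
// objectRows[0] is { 'Breed Name': 'Holland Lop', Lifespan: 8 }
```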

Supported Parquet Files

The parquet format is sprawling: it includes options for a wide array of compression schemes, encoding types, and data structures. Hyparquet supports all parquet encodings: plain, dictionary, rle, bit packed, delta, etc.

Hyparquet is the most compliant parquet parser on earth: it can open more files than pyarrow, rust, and duckdb.

Compression

By default, hyparquet supports uncompressed and snappy-compressed parquet files. To support the full range of parquet compression codecs (gzip, brotli, zstd, etc), use the hyparquet-compressors package.

| Codec         | hyparquet | with hyparquet-compressors |
|---------------|-----------|----------------------------|
| Uncompressed  | ✅        | ✅                         |
| Snappy        | ✅        | ✅                         |
| GZip          | ❌        | ✅                         |
| LZO           | ❌        | ✅                         |
| Brotli        | ❌        | ✅                         |
| LZ4           | ❌        | ✅                         |
| ZSTD          | ❌        | ✅                         |
| LZ4_RAW       | ❌        | ✅                         |

hysnappy

For faster snappy decompression, try hysnappy, which uses WASM for a 40% speed boost on large parquet files.

hyparquet-compressors

You can include support for ALL parquet compressors plus hysnappy using the hyparquet-compressors package.

import { parquetRead } from 'hyparquet'
import { compressors } from 'hyparquet-compressors'

await parquetRead({ file, compressors, onComplete: console.log })

References

  • https://github.com/apache/parquet-format
  • https://github.com/apache/parquet-testing
  • https://github.com/apache/thrift
  • https://github.com/apache/arrow
  • https://github.com/dask/fastparquet
  • https://github.com/duckdb/duckdb
  • https://github.com/google/snappy
  • https://github.com/hyparam/hightable
  • https://github.com/hyparam/hysnappy
  • https://github.com/hyparam/hyparquet-compressors
  • https://github.com/ironSource/parquetjs
  • https://github.com/zhipeng-jia/snappyjs

Contributions

Contributions are welcome! If you have suggestions, bug reports, or feature requests, please open an issue or submit a pull request.

Hyparquet development is supported by an open-source grant from Hugging Face 🤗