npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

@creationix/rando

v3.3.0

Published

A serialization format optimized for fast random access of unstructured data

Downloads

521

Readme

Rando - An Exciting Serialization Format

Bun Tests

| Light Mode | Dark Mode | | ------------------------------------------- | ----------------------------------------- | | Rando Logo for Light | Rando Logo for Dark |

Rando is a new serialization format optimized for fast random access of unstructured data.

| JS | JSON | Rando | Comment | | ---------------------------------------: | ----------------------------------: | :--------------------------- | ------------------- | | 0 | 0 | + | Integers | | -1 | -1 | 1+ | | | 1 | 1 | 2+ | | | -25 | -25 | N+ | | | 2000 | 2000 | -w+ | | | -125000 | -125000 | Z2f+ | | | 8654321 | 8654321 | 121Ly+ | | | 1/3 | 0.3333333333333333 | 2\|3/ | Rational | | -13/7 | -1.8571428571428572 | p\|7/ | | | 1/0 | N/A | 2\|/ | Infinity | | -1/0 | N/A | 1\|/ | -Infinity | | 0/0 | N/A | \|/ | NaN | | 20.24 | 20.24 | _g\|3. | Decimal | | 1e100 | 1e100 | 2\|38. | | | -1e-200 | -1e-200 | 1\|6f. | | | Math.PI | 3.141592653589793 | mkEokiJF2\|t. | | | Math.sqrt(3) | 1.7320508075688772 | 1X4t8mn8q8\|v. | | | true | true | ! | True | | false | false | ~ | False | | null | null | ? | Null | | '' | "" | $ | Empty String | | 'Banana' | "Banana" | Banana@ | B64 String | | 'Hi, World' | "Hi, World" | 9$Hi, World | String | | '🍌' | "🍌" | 4$🍌 | UTF-8 String | | [ 1, 2, 3] | [ 1, 2, 3] | 6;2+4+6+ | Lists | | [ 100, 100, 100 ] | [ 100, 100, 100 ] | 6;1**38+ | Lists with Pointers | | [ 1, 2, 3 ] | [ 1, 2, 3 ] | 6\|3;2+4+6+ | Counted Lists | | { a: 1, b: 2, c: 3 } | {"a":1,"b":2,"c":3} | c:a@2+b@4+c@6+ | Maps | | { a: 1, b: 2, c: 3 } | {"a":1,"b":2,"c":3} | c\|3:a@b@c@2+4+6+ | Counted Maps | | [ { name: 'Alice' }, { name: 'Bob' } ] | [{"name":"Alice"},{"name":"Bob"}] | l\|2;8:8*Alice@9:name@Bob@ | Repeated Keys | | new Map([[1,2],[3,4]]) | N/A | 8\|2:2+6+4+8+ | Non-string Keys | | new Uint8Array([213,231,187]) | N/A | 4=1ee7 | Bytes |

Use Rando anywhere you might use JSON if the following are true:

  • You don't want to always parse everything when reading documents
  • You don't like encoding repeat values (like object keys) over and over.
  • You don't need a human-readable or human-editable format.
  • You might be able to share a list of known values between encoder and decoder.
  • You still need to be able to serilize to text, maybe embed inside JSON as a string.

Basic Usage

First add rando as a dependency to your project:

npm i -S @creationix/rando

Then import it as any module. The npm package also includes a .d.ts package for conveinence.

import { stringify, parse } from "@creationix/rando"

const sampleDoc = {
  person: {
    name: 'John Doe',
    age: 30,
    id: 12345,
    'ai-generated': true,
  },
  list: [1, 2, 3, 4, 5],
  nested: {
    key: 'value',
    nested: {
      key: 'value',
    },
  },
}

const encoded = stringify(sampleDoc)
console.log(encoded)
// 1B|3:person@list@11*H|4:name@age@id@c$ai-generated8$John DoeY+61O+!a;2+4+6+8+a+n|2:b*nested@6*a:key@value@

const decoded = parse(encoded)
console.log(decoded)
// {
//   person: [Getter],
//   list: [Getter],
//   nested: [Getter],
// }

The string output is generally smaller than the JSON equivalent, especially if you have a lot of repeated values. Some documents in the wild end up being 30% their original size!

The decoded value in JavaScript is lazy parsed so it should be very fast even for large documents as only the top-level keys in an object are eagerly decoded. All object values and array values are lazilly decoded on first access and then cached for future access.

Supported Types

Rando can encode anything that JSON can encode:

  • null
  • boolean
  • number
  • string
  • list
  • map

It can also encode:

  • binary (base64 encoded when rando is in text mode)
  • rational numbers (including Infinity, -Infinity and NaN)
  • maps can contain any values, not just strings.

Boolean and Null

These three primitives have a dedicated symbol each.

        true:  `!`
       false:  `~`
        null:  `?`

Numbers

Numbers are encoded using three different types to optimize for common values.

     integer:  zigzag(number) `+`
    rational:  zigzag(numerator) `|` denominator `/`
     decimal:  zigzag(base) `|` zigzag(exponent) `.`

Integer uses zigzag encoding so small negative values are short, but we can encode arbitrary precision using any number of digits.

Rational works great for values like 0.33333333333333333 that encode as 1/3. Also rational allows encoding Infinity, -Infinity, and NaN as 1/0, -1/0 and 0/0.

Decimal powers of 10 as typically written in scientific notation. 10000000000 is 1e10, not powers of 2 like double or float.

Strings and Binary

Rando can encode unicode strings using UTF-8 as well as arbitrary binary data using base64.

Since strings are so common in most data sets a few special types exist to shave the bytes.

  b64-string:  b64String `@`
 utf8-string:  len `$` utf-8-data
       bytes:  len `=` base64-data
       chain:  len `,` ( b64-string | string | bytes | pointer | reference | chain )*

Lists (aka Arrays)

Lists are zero or more arbitrary values that can be encoded in three levels of verboseness.

        list:  len                     `;`             value*
counted-list:  len `|` count           `;`             value*
indexed-list:  len `|` count `|` width `;` array-index value*

The first has O(n) costs to find an item and O(n) costs to count the items.

By adding a count we reduce the cost of counting to O(1)

By also adding the array index we reduce the cost of indexing an item to O(1)

Maps (aka Objects)

Maps are zero or more pairs of arbitrary values that can also be encoded in three levels of verboseness.

Note that keys are not limited to strings.

        map:   len                     `:`            (key value)*
counted-map:   len `|` count           `:`            key* value*
indexed-map:   len `|` count `|` width `:` trie-index key* value*

The first has O(n) costs to find a key, and if the value is huge, this may require fetching multiple blocks.

By adding a count, we're able to move the keys up-front where they are more likely to be contained in the same block.

Then by adding an index, we reduce lookup costs to O(log n) time.

Pointers and References

One method rando can use to speed up decoding and reduce serialized size is to de-duplicate seen or known values.

     pointer:  offset `*`
   reference:  index `&`

A pointer points to a value seen later in the document. Since Rando encodes back to front, these are values that have already been seen while encoding.

To decode a pointer, add offset bytes after the * and decode again.

A reference is a zero based index into an external array of known values provided by the caller of both the encoder and decoder.

It's up to the user of the library to ensure that the same list is used for both sides.

Lazy Block Fetches

Rando is designed on the principle that it might be used for cases where the document is so large that the client doesn't want to download the entire thing before starting to read it.

This is why the various optimizations take into account the blockSize encoding option.

  • Pointers ensure that they never point to data in another block
  • Lists automatically add count if the number of items exceeds the listCountThreshold encoding option.
  • Lists automatically add indices if the encoded values are larger than one block.
  • Maps automatically add count if the encoded key/value pairs are larger than one block.
  • Maps automatically add indices if either the number of keys exceeds the mapCountThreshold encoding option or the encoded keys alone are larger than one block.

With this in mind, rando is designed to be used with a system that has a block-level caching proxy.

  • The decoder will read arbitrary byte slices our of the document as needed
  • The block system will translate these reads to block aligned reads (possibly spanning more than one block in a single read)
  • The block system will cache fetched blocks so that future reads that expand to the same block don't fetch it again.

One common method for hosting these files is a normal HTTP server that supports range requests.