falcon-vis
v0.17.4
`FalconVis` is a JavaScript library that links your own custom visualizations at scale! We also support a variety of data formats for different scales of data (e.g., Apache Arrow, DuckDB WASM, backend servers, and more).
You can cross-filter billions of data entries in the browser with no interaction delay by using the Falcon data index.
`FalconVis` was created by Donny Bertucci and Dominik Moritz because the previous implementation (vega/falcon) could not be used as a library or with custom visualizations.
Examples
Github Pages
| Data | Type | Count | Live Demo |
| ------------------------------------------ | ----------- | ----- | ----------------------------- |
| Movies | Arrow | 3k | Click to open on Github Pages |
| Movies | JSON | 3k | Click to open on Github Pages |
| Movies | DuckDB WASM | 3k | Click to open on Github Pages |
| Flights (with US Map) | DuckDB WASM | 3m | Click to open on Github Pages |
| Flights (comparison with crossfilter fork) | DuckDB WASM | 3m | Click to open on Github Pages |
| Flights (comparison with crossfilter fork) | HeavyAI | 7m | Click to open on Github Pages |
ObservableHQ
| Data | Type | Count | Live Demo |
| ------- | ----------- | ----- | ----------------------------- |
| Flights | Arrow | 1m | Click to open on ObservableHQ |
| Flights | DuckDB WASM | 3m | Click to open on ObservableHQ |
| Flights | DuckDB WASM | 10m | Click to open on ObservableHQ |
Other
| Data | Type | Count | Live Demo |
| --------------------- | --------------------- | ----- | ------------------------------------- |
| Flights (with US Map) | HTTP to DuckDB Python | 10m | Click to open on HuggingFace🤗 Spaces |
Usage
Install `FalconVis` via npm.
npm install falcon-vis
Data
Before you filter your data, you need to tell `FalconVis` about your data.
`FalconVis` currently supports JavaScript objects, Apache Arrow tables, DuckDB WASM, HeavyAI, and HTTP GET requests. Which one to use depends on the size of your data and on whether you want the computation to take place in the browser or on a backend.
| DB | Recommended Data Size | Memory/Computation | Description |
| ----------- | -------------------------------- | ------------------ | ---------------------------------------------------------------------------------------------- |
| `JsonDB` | up to 500k | Browser | Takes JavaScript object |
| `ArrowDB` | up to 1m | Browser | Takes Apache Arrow table |
| `DuckDB` | up to 10m | Browser | Queries DuckDB WASM database |
| `HeavyaiDB` | whatever your backend can handle | Backend | Queries HeavyAI database connection |
| `HttpDB` | whatever your backend can handle | Backend | Sends GET request to a backend server (sends SQL queries and expects arrow tables in response) |

They are all typed as `FalconDB`.
import { JsonDB, ArrowDB, DuckDB, HeavyaiDB, HttpDB } from "falcon-vis";
Linking Views
First, initialize the `FalconVis` instance with your data. This example uses `ArrowDB` with the 1M flights dataset.
import { tableFromIPC } from "apache-arrow";
import { FalconVis, ArrowDB } from "falcon-vis";
// load the flights-1m.arrow data into memory
const buffer = await (await fetch("data/flights-1m.arrow")).arrayBuffer();
const arrowTable = await tableFromIPC(buffer);
// initialize the falcon instance with the data
const db = new ArrowDB(arrowTable);
const falcon = new FalconVis(db);
Next, create views that specify the data dimension and what happens when the cross-filtered counts change (`onChange`). `FalconVis` supports 0D and 1D views.
Note that your specified `onChange` function is called every time the cross-filtered counts change so that you can update your visualization with the new filtered counts.
Distance View
const distanceView = await falcon.view1D({
type: "continuous",
name: "Distance",
bins: 25,
resolution: 400,
});
distanceView.onChange((counts) => {
updateDistanceBarChart(counts);
});
Arrival Delay View
const arrivalDelayView = await falcon.view1D({
type: "continuous",
name: "ArrDelay",
range: [-20, 140],
bins: 25,
resolution: 400,
});
arrivalDelayView.onChange((counts) => {
updateDelayBarChart(counts);
});
Total Count
const countView = await falcon.view0D();
countView.onChange((count) => {
updateCount(count);
});
Link the views together to fetch the initial counts (outputs are shown above).
await falcon.link();
Cross-Filtering Views
Now, you can cross-filter the views by calling `.select()` on a view. `FalconVis` uses the Falcon data index to cross-filter the views.
Falcon works by activating a single view that you plan to interact with. In the background, we compute the Falcon data index when you activate a view. Then, when you `.select()` on an activated view, we fetch the cross-filtered counts for the other views in constant time.
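To give an intuition for why a selection can be answered in constant time, here is a simplified sketch of a prefix-sum index between one active and one passive dimension. It is not the `FalconVis` implementation; `buildIndex`, `toBucket`, `filteredCounts`, and the tiny dataset are hypothetical names and values chosen for illustration.

```javascript
// Simplified sketch of a prefix-sum index (hypothetical, not the library's code).
// Maps a value to a bucket in [0, count), or -1 if it falls outside the range.
function toBucket(value, [min, max], count) {
  if (value < min || value >= max) return -1;
  return Math.min(count - 1, Math.floor(((value - min) / (max - min)) * count));
}

// cum[p][b] = number of rows whose active pixel index is < p and whose passive bin is b
function buildIndex(rows, active, passive) {
  const cum = Array.from(
    { length: active.resolution + 1 },
    () => new Float64Array(passive.bins)
  );
  for (const row of rows) {
    const p = toBucket(row[active.name], active.range, active.resolution);
    const b = toBucket(row[passive.name], passive.range, passive.bins);
    if (p === -1 || b === -1) continue;
    cum[p + 1][b] += 1;
  }
  // running sum along the active (brushed) axis
  for (let p = 1; p <= active.resolution; p++) {
    for (let b = 0; b < passive.bins; b++) cum[p][b] += cum[p - 1][b];
  }
  return cum;
}

// Counts per passive bin for a brush covering pixels [pixelStart, pixelEnd):
// one subtraction per bin, independent of the number of rows.
function filteredCounts(cum, pixelStart, pixelEnd) {
  const out = new Float64Array(cum[0].length);
  for (let b = 0; b < out.length; b++) {
    out[b] = cum[pixelEnd][b] - cum[pixelStart][b];
  }
  return out;
}

const rows = [
  { Distance: 100, ArrDelay: 5 },
  { Distance: 500, ArrDelay: 15 },
  { Distance: 900, ArrDelay: 5 },
  { Distance: 1500, ArrDelay: 25 },
];
const active = { name: "Distance", range: [0, 2000], resolution: 4 };
const passive = { name: "ArrDelay", range: [0, 30], bins: 3 };
const cum = buildIndex(rows, active, passive);
const brushed = filteredCounts(cum, 1, 2); // Distance in [500, 1000)
const everything = filteredCounts(cum, 0, 4); // whole Distance range
```

Building the index touches every row once (the work done at activation), while each later brush only reads two rows of the cumulative table.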
For Example
I directly `.activate()` the `distanceView` from before to prefetch the Falcon data index.
await distanceView.activate();
Then, I can apply a filter with `.select([rangeStart, rangeEnd])` for continuous data.
await distanceView.select([1000, 2000]); // 1k to 2k miles
This automatically cross-filters and updates the counts for the other views in constant time (`onChange` is called for each other view).
In the live example, mouse events call `select()` with the user-selected filters, as shown in the video:
https://github.com/cmudig/falcon/assets/65095341/ab7fa9fc-d51f-4830-89f6-93ac6913a5d3
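One way to connect mouse events to `.select()` is to convert the brushed pixel span into a data range first. The helper below is a minimal sketch under that assumption; `brushToRange` is a hypothetical name, and the 400px width and `[0, 4000]` extent are made-up values.

```javascript
// Hypothetical helper: convert a horizontal brush (in pixels) into a data range.
// widthPx is the width of the chart; [min, max] is the extent of the dimension.
function brushToRange(pixelStart, pixelEnd, widthPx, [min, max]) {
  // normalize direction (the user may drag right-to-left) and clamp to the chart
  const lo = Math.max(0, Math.min(pixelStart, pixelEnd));
  const hi = Math.min(widthPx, Math.max(pixelStart, pixelEnd));
  const unitsPerPixel = (max - min) / widthPx;
  return [min + lo * unitsPerPixel, min + hi * unitsPerPixel];
}

// drag from x=300 back to x=100 on a 400px wide chart
const range = brushToRange(300, 100, 400, [0, 4000]);
// the result could then be passed to an activated view, e.g.
// await distanceView.select(range);
```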
API Reference
# class JsonDB(object)
Takes a JavaScript object and attaches `FalconVis` data index methods to it. Under the hood, it converts into an `ArrowDB` class.
`JsonDB` supports row-wise or column-wise object formats, but the column-wise format is recommended because the row-wise format is converted to column-wise with a copy.
Columns JSON Example
import { JsonDB } from "falcon-vis";
const columnarJson = {
names: ["bob", "billy", "joe"],
ages: [21, 42, 40],
};
const db = new JsonDB(columnarJson); // ⬅️
Rows JSON Example
import { JsonDB } from "falcon-vis";
const rowJson = [
{ name: "bob", age: 21 },
{ name: "billy", age: 42 },
{ name: "joe", age: 40 },
];
const db = new JsonDB(rowJson); // ⬅️, but does a copy over rowJson
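To see why the row-wise format implies a copy, here is a rough sketch of a row-to-column conversion. `rowsToColumns` is a hypothetical helper written for illustration; `JsonDB` performs its own conversion internally and may do it differently.

```javascript
// Hypothetical illustration of converting row-wise objects to a columnar layout.
// Every value is copied into a new per-column array.
function rowsToColumns(rows) {
  const columns = {};
  // use the keys of the first row as the column names
  for (const key of Object.keys(rows[0] ?? {})) {
    columns[key] = rows.map((row) => row[key]);
  }
  return columns;
}

const rowJson = [
  { name: "bob", age: 21 },
  { name: "billy", age: 42 },
  { name: "joe", age: 40 },
];
const columnar = rowsToColumns(rowJson);
// columnar.name and columnar.age are new arrays, one per column
```

If your data already starts out column-wise, passing it directly avoids this extra pass over every row.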
# class ArrowDB(table)
Takes an Apache Arrow `Table` created using the `apache-arrow` package and attaches `FalconVis` data index methods to it.
Example
import { ArrowDB } from "falcon-vis";
import { tableFromIPC } from "apache-arrow";
const buffer = await (await fetch("data/flights-1m.arrow")).arrayBuffer();
const table = await tableFromIPC(buffer);
const db = new ArrowDB(table); // ⬅️
Arrow Shorthand Example
import { ArrowDB } from "falcon-vis";
const db = await ArrowDB.fromArrowFile("data/flights-1m.arrow"); // ⬅️
# class DuckDB(duckdb, table)
Takes a `@duckdb/duckdb-wasm` db and a table name within the db and attaches `FalconVis` data index methods to it.
Example
import { DuckDB } from "falcon-vis";
import * as duckdb from "@duckdb/duckdb-wasm";
// duckdb setup
const JSDELIVR_BUNDLES = duckdb.getJsDelivrBundles();
const bundle = await duckdb.selectBundle(JSDELIVR_BUNDLES);
const worker = await duckdb.createWorker(bundle.mainWorker!);
const logger = new duckdb.ConsoleLogger();
const flightsDb = new duckdb.AsyncDuckDB(logger, worker);
await flightsDb.instantiate(bundle.mainModule, bundle.pthreadWorker);
const c = await flightsDb.connect();
// load parquet file into table called flights
await c.query(
`CREATE TABLE flights
AS SELECT * FROM parquet_scan('${window.location.href}/data/flights-1m.parquet')`
);
await c.close();
const db = new DuckDB(flightsDb, "flights"); // ⬅️
Parquet Shorthand Example
If you just want to load one parquet file, you can use the shorthand method `DuckDB.fromParquetFile()`.
import { DuckDB } from "falcon-vis";
const db = await DuckDB.fromParquetFile("data/flights-1m.parquet"); // ⬅️
# class HeavyaiDB(session, table)
Takes in a session from `@heavyai/connector` with a given table name.
Example
import { HeavyaiDB } from "falcon-vis";
import HeavyaiCon from "@heavyai/connector";
const connector = new HeavyaiCon();
const conn = {
host: "your host url address",
dbName: "db name",
user: "user name",
password: "password",
protocol: "https",
port: 443,
};
const connection = connector
.protocol(conn.protocol)
.host(conn.host)
.port(conn.port)
.dbName(conn.dbName)
.user(conn.user)
.password(conn.password);
const session = await connection.connectAsync();
const tableName = "flights";
const db = new HeavyaiDB(session, tableName); // ⬅️
Session Connection Shorthand
import { HeavyaiDB } from "falcon-vis";
const tableName = "flights";
const db = await HeavyaiDB.connectSession({
host: "your host url address",
dbName: "db name",
user: "user name",
password: "password",
protocol: "https",
port: 443
}, tableName); // ⬅️
# class HttpDB(url, table, encodeQuery?)
`HttpDB` sends SQL queries (built from the table name) over HTTP GET to the url and expects Apache Arrow table bytes in response.
`encodeQuery` is an optional parameter that encodes the SQL query before sending it over HTTP GET. By default it uses the `encodeURIComponent` function on the SQL query so that it can be sent in the url.
Example
import { HttpDB } from "falcon-vis";
const tableName = "flights";
const db = new HttpDB("http://localhost:8000", tableName); // ⬅️
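As a rough illustration of what the default encoding does to a query, the sketch below runs `encodeURIComponent` on an example SQL string. The query text and the `/query/` path are assumptions made for this example; the actual queries `HttpDB` generates and the URL layout your backend expects may differ.

```javascript
// Illustrative only: the SQL text and URL layout below are assumptions.
const tableName = "flights";
const sql = `SELECT COUNT(*) AS total FROM ${tableName} WHERE "Distance" BETWEEN 1000 AND 2000`;

// default-style encoding: spaces and quotes become percent escapes
const encoded = encodeURIComponent(sql);

// hypothetical request URL that a backend route could parse
const requestUrl = `http://localhost:8000/query/${encoded}`;

// on the server side, the original SQL can be recovered by decoding
const decoded = decodeURIComponent(encoded);
```

A custom `encodeQuery` would replace the `encodeURIComponent` step, for instance if your backend expects a different escaping scheme.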
# class FalconVis(db)
The main logic that orchestrates the cross-filtering between views.
Takes in the data (`JsonDB`, `ArrowDB`, `DuckDB`, `HeavyaiDB`, or `HttpDB`).
Example
import { FalconVis } from "falcon-vis";
// given a db: FalconDB
const falcon = new FalconVis(db); // ⬅️
# function falcon.view0D(onChangeCallback?)
Adds a 0D view onto an existing `FalconVis` instance named falcon and describes what to execute when the counts change.
Takes an `onChangeCallback` function that is called whenever the view count changes (after cross-filtering).
Returns a `View0D` instance (you can add more onChange callbacks to it later).
The `onChangeCallback` receives a `View0DState` object as a parameter, containing the updated filtered count and the total count of rows.
interface View0DState {
total: number | null;
filter: number | null;
}
Example
import { FalconVis } from "falcon-vis";
const falcon = new FalconVis(db);
const countView = falcon.view0D((count) => {
console.log(count.total, count.filter); // gets called every cross-filter
}); // ⬅️
Example with multiple and disposable `onChangeCallback`s
import { FalconVis } from "falcon-vis";
const falcon = new FalconVis(db);
// create view0D
const countView = falcon.view0D();
// add onChange callbacks
const disposeA = countView.onChange((count) => {
console.log("A", count.total, count.filter);
}); // ⬅️
const disposeB = countView.onChange((count) => {
console.log("B", count.total, count.filter);
}); // ⬅️
// then can be disposed later to stop listening for onChange
disposeA();
disposeB();
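The dispose pattern shown above can be pictured with a small stand-in object. This is a hypothetical mock written for illustration (`createListeners` and `emit` are not part of `FalconVis`); it only mimics the idea that `onChange` returns a function that stops the callback from being called.

```javascript
// Hypothetical mock of a view's listener handling, for illustration only.
function createListeners() {
  const listeners = new Set();
  return {
    // register a callback and return a function that removes it again
    onChange(callback) {
      listeners.add(callback);
      return () => listeners.delete(callback);
    },
    // call every registered callback with the new state
    emit(state) {
      for (const callback of listeners) callback(state);
    },
  };
}

const mockView = createListeners();
const log = [];
const disposeA = mockView.onChange((count) => log.push(`A:${count.filter}`));
const disposeB = mockView.onChange((count) => log.push(`B:${count.filter}`));

mockView.emit({ total: 3, filter: 2 }); // both A and B are called
disposeA(); // A stops listening
mockView.emit({ total: 3, filter: 1 }); // only B is called
```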
# function falcon.view1D(dimension, onChangeCallback?)
Adds a 1D view onto an existing `FalconVis` instance named falcon and describes what to execute when the counts change. A 1D view is a histogram of the data with counts per bin.
`dimension` is a `Dimension` object that defines which data column to use for the 1D view (more info below).
Takes an `onChangeCallback` function that is called whenever the view count changes (after cross-filtering).
Returns a `View1D` instance (you can add more onChange callbacks to it later).
The dimension can be `type: "categorical"` for discrete values or `type: "continuous"` for ranged values.
A continuous `Dimension` can be defined as follows (with ? marking optional parameters):
interface ContinuousDimension {
/* continuous range of values */
type: "continuous";
/* column name in the data table */
name: string;
/**
* resolution of visualization brushing (e.g., histogram is 400px wide, resolution: 400)
* a smaller resolution than the brush will approximate the counts, but be faster
*/
resolution: number;
/**
* max number of bins to create; the result may have fewer bins
*
* @default computed from the data using Scott's rule
*/
bins?: number;
/**
* forces exactly the specified number of bins
* otherwise, the specified number of bins is used as a suggestion
*
* @default false
*/
exact?: boolean;
/**
* [min, max] extent to limit the range of data values
* @default computed from the data
*/
range?: [number, number];
/* should format for dates */
time?: boolean;
}
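For readers unfamiliar with Scott's rule, the sketch below shows the textbook form: the bin width is `3.49 * stddev * n^(-1/3)` and the bin count follows from the data extent. `scottBinCount` is a hypothetical helper; how `FalconVis` computes its default internally may differ in detail.

```javascript
// Textbook Scott's rule, as a hypothetical helper (not the library's code).
function scottBinCount(values) {
  const n = values.length;
  if (n < 2) return 1;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  // sample variance
  const variance =
    values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (n - 1);
  const binWidth = 3.49 * Math.sqrt(variance) * Math.pow(n, -1 / 3);
  if (binWidth === 0) return 1; // all values identical
  const min = Math.min(...values);
  const max = Math.max(...values);
  return Math.max(1, Math.ceil((max - min) / binWidth));
}

const suggestedBins = scottBinCount([1, 2, 3, 4, 5, 6, 7, 8]);
```

Passing an explicit `bins` value (optionally with `exact: true`) overrides a data-derived default like this one.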
A categorical dimension can be defined as follows:
interface CategoricalDimension {
/* categorical values */
type: "categorical";
/* column name in the data table */
name: string;
/**
* categorical values to include
*
* @default computed from the data
*/
range?: string[];
}
The `onChangeCallback` receives a `View1DState` object with the updated counts per bin as a parameter.
If the view is type continuous:
interface ContinuousView1DState {
/* total counts per bin */
total: Float64Array | null;
/* filtered counts per bin */
filter: Float64Array | null;
/* continuous bins */
bin: { binStart: number; binEnd: number }[] | null;
}
If the view is type categorical:
interface CategoricalView1DState {
/* total counts per bin */
total: Float64Array | null;
/* filtered counts per bin */
filter: Float64Array | null;
/* categorical bin labels */
bin: string[] | null;
}
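Inside an `onChangeCallback`, the parallel arrays in these state objects usually need to be zipped into one record per bar before drawing. The helper below is a minimal sketch of that step; `toBars` and the mock state are hypothetical, and only the field names (`total`, `filter`, `bin`, `binStart`, `binEnd`) come from the interfaces above.

```javascript
// Hypothetical helper: turn a View1DState-shaped object into chart-ready rows.
// Works for both continuous bins ({ binStart, binEnd }) and categorical bins (strings).
function toBars(state) {
  // all three fields may be null before the views are linked
  if (!state.bin || !state.total || !state.filter) return [];
  return state.bin.map((bin, i) => ({
    label: typeof bin === "string" ? bin : `${bin.binStart}-${bin.binEnd}`,
    total: state.total[i],
    filter: state.filter[i],
  }));
}

// mock state shaped like ContinuousView1DState
const mockState = {
  total: new Float64Array([10, 20]),
  filter: new Float64Array([4, 5]),
  bin: [
    { binStart: 0, binEnd: 100 },
    { binStart: 100, binEnd: 200 },
  ],
};
const bars = toBars(mockState);
```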
Initialization
import { FalconVis } from "falcon-vis";
const falcon = new FalconVis(db);
// continuous
const distanceView = await falcon.view1D(
{
type: "continuous",
name: "Distance",
resolution: 400,
bins: 25,
},
(counts) => {
console.log(counts.total, counts.filter, counts.bin); // gets called every cross-filter
}
); // ⬅️
// categorical
const originStateView = await falcon.view1D(
{
type: "categorical",
name: "OriginState",
},
(counts) => {
console.log(counts.total, counts.filter, counts.bin);
}
); // ⬅️
Interaction
# function view.activate()
You must `.activate()` a view before calling `.select()` on it. `.activate()` computes the Falcon index so that subsequent `.select()` calls are fast (constant time). More details on the Falcon index can be found in the paper.
# function view.select(filter)
You can directly interact with your `View1D` (view) instance to filter the dimension and automatically cross-filter all other views on the same `FalconVis` instance.
You have to call `.activate()` once before you start interacting with a view; you do not need to call it again before every `.select()` on that same view.
The index changes when new filters are present, so if you `.activate()` a view, then `.activate()` a different view and filter that view, you have to call `.activate()` again when you come back to the original view.
Continuous view selection:
await distanceView.activate(); // compute Falcon index
await distanceView.select([0, 1000]); // filter to only flights with distance between 0 and 1000 miles
await distanceView.select([600, 800]); // change filter
await distanceView.select(); // deselect all
Categorical view selection:
await originStateView.activate(); // compute Falcon index
await originStateView.select(["CA", "PA", "OR"]); // select California, Pennsylvania, and Oregon
await originStateView.select(["FL"]); // change filter
await originStateView.select(); // deselect all
After each `.select()`, the `onChangeCallback` will be called with the updated counts on all other views.
# function view.detach()
Detach is how you remove your view from the `FalconVis` instance. Note that you directly call this on the view instance, not the `FalconVis` instance.
# function view.attach()
Attach is how you add your view back onto the `FalconVis` instance. Note that you directly call this on the view instance, not the `FalconVis` instance.
# function falcon.link()
The `link` function takes the added views and links them together. This is required before cross-filtering.
`link` also initializes the counts for all views.
Call `link` whenever you add or remove views. Calling `link` once will suffice after adding (or removing) multiple views.
Example
import { FalconVis } from "falcon-vis";
const falcon = new FalconVis(db);
const distanceView = await falcon.view1D(
{
type: "continuous",
name: "Distance",
resolution: 400,
bins: 25,
},
(counts) => {
console.log(counts.total, counts.filter, counts.bin);
}
);
const countView = falcon.view0D((count) => {
console.log(count.total, count.filter);
});
await falcon.link(); // 🔗⬅️
This then calls the `onChangeCallback` for each view with the initial counts, so you will see two console.logs from this particular example to start.
# function falcon.entries(location)
This gives you access to the filtered entries. After cross-filtering, you need to call this manually if you want to extract the filtered entries.
Takes a location defined by
interface Location {
/* defaults to 0 */
offset?: number;
/* defaults to Infinity (all) */
length?: number;
}
Where `offset` refers to the offset in the data table and `length` refers to the number of rows to return.
Note that `offset` refers to the filtered data table, so if you have a filter applied, the offset will be relative to the filtered data table.
Returns an `Iterable<Row>` over the entries in the data table, where `Row` is an object with key names corresponding to the column names in the data table.
Example
import { FalconVis } from "falcon-vis";
const falcon = new FalconVis(db);
const entries = await falcon.entries({
offset: 0,
length: 25,
}); // first 25 entries ⬅️
// print out first 25 distances
for (const entry of entries) {
console.log(entry["Distance"]);
}
You can use `offset` to shift over by 25 (or by whatever amount you want) to get the second 25 entries.
const entries = await falcon.entries({
offset: 25, // start after 25 entries
length: 25,
}); // second 25 entries ⬅️
// print out second 25 distances
for (const entry of entries) {
console.log(entry["Distance"]);
}
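If you page through entries in a table UI, the `offset` and `length` arithmetic can be wrapped in a small helper. This is a minimal sketch; `pageLocation` and `pageCount` are hypothetical names, and the filtered total of 60 is a made-up value (in practice it could come from a 0D view's `filter` count).

```javascript
// Hypothetical helpers for paging through filtered entries.
// Returns a Location-shaped object ({ offset, length }) for a zero-based page index.
function pageLocation(pageIndex, pageSize) {
  return { offset: pageIndex * pageSize, length: pageSize };
}

// Number of pages needed to show all filtered rows.
function pageCount(filteredTotal, pageSize) {
  return Math.ceil(filteredTotal / pageSize);
}

const secondPage = pageLocation(1, 25); // same as { offset: 25, length: 25 }
const pages = pageCount(60, 25); // 60 filtered rows at 25 per page
// the location could then be passed along, e.g.
// const entries = await falcon.entries(secondPage);
```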