gesel

v0.3.3

Client-side search for gene set selections

Client-side gene set enrichment

Overview

Search for interesting gene sets, client-side. This uses JavaScript to run the queries in the browser, improving latency and avoiding the need for a back-end service. Users can test for enrichment in their own list of genes and/or search by text in the set names or descriptions. The queries rely on prebuilt databases containing gene sets of interest - see here for the expectations around the database files.

Installation

gesel can be installed with the usual commands:

npm i gesel

gesel is implemented as an ES6 module, so importing is as simple as:

import * as gesel from "gesel";

See the API documentation for all functions.

Example usage

Given a user-supplied set of genes, what are the overlapping gene sets in our references? First, we need to map our user-supplied genes to gesel's internal identifiers.

let user_supplied = [ "SNAP25", "NEUROD6", "ENSG00000123307", "TSPAN6" ];

/* Seeing if the user-supplied symbols/IDs are found in the human reference. */
let user_supplied_ids = await gesel.searchGenes("9606", user_supplied);

/* Taking the first matching ID for each user-supplied gene name. Applications
 * may prefer to print warnings/errors if there are multiple matches.
 */
let user_supplied_union = [];
for (const x of user_supplied_ids) {
    if (x.length >= 1) {
        user_supplied_union.push(x[0]);
    }
}
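
The comment above notes that applications may prefer to warn about missing or ambiguous matches. A minimal sketch of that bookkeeping is shown below. It assumes only what the loop above relies on (one array of candidate IDs per query, possibly empty). The summarizeMatches helper, its field names and the mock IDs are hypothetical and not part of gesel.

```javascript
// Hypothetical helper, not part of gesel. `matches[i]` holds the candidate IDs
// for `queries[i]`, in the same shape consumed by the loop above.
function summarizeMatches(queries, matches) {
    const ids = new Set();   // first match per query, de-duplicated
    const missing = [];      // queries with no match at all
    const ambiguous = [];    // queries with more than one match
    for (let i = 0; i < queries.length; i++) {
        const found = matches[i];
        if (found.length === 0) {
            missing.push(queries[i]);
            continue;
        }
        if (found.length > 1) {
            ambiguous.push(queries[i]);
        }
        ids.add(found[0]);
    }
    return { ids: Array.from(ids), missing, ambiguous };
}

// Mock data standing in for the output of gesel.searchGenes().
const summary = summarizeMatches(
    ["SNAP25", "NOT_A_GENE", "TSPAN6"],
    [[101], [], [7, 8]]
);
if (summary.missing.length > 0) {
    console.warn("No match for: " + summary.missing.join(", "));
}
if (summary.ambiguous.length > 0) {
    console.warn("Multiple matches for: " + summary.ambiguous.join(", "));
}
```

Unlike the loop above, this sketch also removes duplicate IDs, which matters if two user-supplied names resolve to the same gene.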

Then, we can search for the overlapping sets. This returns an array of objects with the set IDs, the number of overlapping genes, the size of each set and the enrichment p-value based on the hypergeometric distribution:

let overlaps = await gesel.findOverlappingSets("9606", user_supplied_union);
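
For readers who want to see what a hypergeometric enrichment p-value is, the sketch below computes the upper-tail probability of observing at least a given overlap by chance. This is an illustration of the statistic only, not gesel's own implementation. The function names are hypothetical, and the universe size (the total number of genes considered) is an assumption that the caller must supply.

```javascript
// Illustrative only; not gesel's implementation.
// log(n!) by direct summation, adequate for gene-sized inputs.
function logFactorial(n) {
    let total = 0;
    for (let i = 2; i <= n; i++) {
        total += Math.log(i);
    }
    return total;
}

// log of the binomial coefficient C(n, k), assuming 0 <= k <= n.
function logChoose(n, k) {
    return logFactorial(n) - logFactorial(k) - logFactorial(n - k);
}

// P(X >= overlap) when `listSize` genes are drawn without replacement from
// `universeSize` genes, of which `setSize` belong to the gene set.
function hypergeometricUpperTail(overlap, setSize, listSize, universeSize) {
    const maxOverlap = Math.min(setSize, listSize);
    const outside = universeSize - setSize;
    const logDenom = logChoose(universeSize, listSize);
    let p = 0;
    for (let i = Math.max(0, overlap); i <= maxOverlap; i++) {
        if (listSize - i > outside) {
            continue; // not enough genes outside the set for this split
        }
        p += Math.exp(logChoose(setSize, i) + logChoose(outside, listSize - i) - logDenom);
    }
    return Math.min(1, p);
}

// Example: all 5 user genes fall in a set of 5, out of a universe of 10 genes.
const pExample = hypergeometricUpperTail(5, 5, 5, 10);
console.log(pExample);
```

A small p-value indicates that the observed overlap is larger than would be expected from a random gene list of the same size.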

Once we have a set ID, we can query the references to obtain that set's details:

let set_details = await gesel.fetchSingleSet("9606", overlaps[0].id);

Each set also has some associated free text in its name and description. We can do some simple queries via gesel:

let hits = await gesel.searchSetText("9606", "B immune");
let first_hit = await gesel.fetchSingleSet("9606", hits[0]);

// '*' and '?' wildcards are also supported.
let hits2 = await gesel.searchSetText("9606", "B immun*");
let first_hit2 = await gesel.fetchSingleSet("9606", hits2[0]);
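
As a reminder of the conventional meaning of these wildcards, '*' matches any run of characters (including none) and '?' matches exactly one character. The sketch below converts a single wildcard token into a regular expression to show this. It is an illustration only: how gesel tokenizes and matches text is not described here, the wildcardToRegExp helper is hypothetical, and the case-insensitive matching is an assumption.

```javascript
// Hypothetical helper, not part of gesel.
function wildcardToRegExp(pattern) {
    // Escape regex metacharacters other than the two wildcards.
    const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
    // '*' becomes "any run of characters", '?' becomes "exactly one character".
    const body = escaped.replace(/\*/g, ".*").replace(/\?/g, ".");
    // Anchored so the whole token must match; case-insensitivity is an assumption.
    return new RegExp("^" + body + "$", "i");
}

const star = wildcardToRegExp("immun*");
console.log(star.test("immune"), star.test("immunology"), star.test("autoimmune"));

const question = wildcardToRegExp("immun?");
console.log(question.test("immune"), question.test("immunology"));
```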

This can be combined with the output of findOverlappingSets to find all gene sets that overlap the user-supplied set and contain the desired keywords.

let combined = gesel.intersect([ hits, overlaps.map(x => x.id) ]);
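
Conceptually, the intersection keeps only the set IDs that appear in every input array. A minimal sketch of that operation on plain arrays is shown below. The intersectArrays helper is hypothetical and is not claimed to match how gesel.intersect is implemented or how it orders its output.

```javascript
// Hypothetical stand-in that illustrates the idea; not gesel's implementation.
function intersectArrays(arrays) {
    if (arrays.length === 0) {
        return [];
    }
    // Build a lookup for every array after the first.
    const others = arrays.slice(1).map(a => new Set(a));
    // Keep unique values of the first array that occur in all the others.
    return Array.from(new Set(arrays[0])).filter(x => others.every(s => s.has(x)));
}

// Mock set IDs: text hits on one side, overlap hits on the other.
const textHits = [3, 8, 15, 42];
const overlapHits = [42, 7, 8];
const common = intersectArrays([textHits, overlapHits]);
console.log(common);
```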

Overriding the downloader

By default, we use the reference gene sets collated in the feedstock repository. However, users can point gesel to their own references by overriding the URLs before calling any gesel functions. For example, if our prebuilt references are hosted on some other URL:

gesel.referenceBaseUrl("https://some.company.com/prebuilt-gesel-db");
gesel.geneBaseUrl("https://some.company.com/prebuilt-gesel-genes");

More advanced users can override the downloader functions to customize the request and handling of the response. The example below attaches an authorization header to the request; the same approach can be used to cache the response on-disk for faster responsiveness on subsequent visits to an application.

// To set the downloader for the reference files.
gesel.referenceDownload(async (file, start = null, end = null) => {
    const url = gesel.referenceBaseUrl() + "/" + file;
    if (start == null) {
        return fetch(url, { headers: { Authorization: "Bearer XXX" } });
    } else {
        let range_text = "bytes=" + String(start) + "-" + String(end);
        return fetch(url, { headers: { Authorization: "Bearer XXX", Range: range_text } });
    }
});

// To set the downloader for gene information.
gesel.geneDownload(file => {
    const url = gesel.geneBaseUrl() + "/" + file;
    return fetch(url, { headers: { Authorization: "Bearer XXX" } });
});

Implementation details

By default, gesel uses HTTP range requests to efficiently extract slices of data from the databases. This allows us to obtain the identities of genes belonging to a particular gene set, or the identities of the sets containing a particular gene, or the details of a particular gene set or collection, without downloading the entirety of the associated reference files. Only the range indices need to be transferred to the client - as of the time of writing, this amounts to an acceptably small payload (< 2 MB).
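
To make the range-request idea concrete: if the client knows the byte length of each record in a file, the cumulative sums give every record's byte range, and a single Range header then fetches just that record. The sketch below shows this arithmetic. The layout of gesel's files is not described in this section, so the per-record lengths and both helper names are assumptions made for illustration.

```javascript
// Illustrative only; the record lengths and helper names are assumptions.
// Turns per-record byte lengths into cumulative offsets: [0, l0, l0 + l1, ...].
function cumulativeOffsets(lengths) {
    const offsets = [0];
    for (const len of lengths) {
        offsets.push(offsets[offsets.length - 1] + len);
    }
    return offsets;
}

// Builds the Range header value for record `index`, or null if the record is empty.
function rangeHeaderFor(offsets, index) {
    const start = offsets[index];
    const end = offsets[index + 1] - 1; // HTTP byte ranges are inclusive at both ends
    if (end < start) {
        return null;
    }
    return "bytes=" + String(start) + "-" + String(end);
}

// Three records of 10, 5 and 7 bytes; request only the second one.
const recordOffsets = cumulativeOffsets([10, 5, 7]);
const secondRecord = rangeHeaderFor(recordOffsets, 1);
console.log(secondRecord);
```

The resulting string can be passed as the Range header of a fetch() call, in the same way as the custom downloader shown earlier.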

That said, some applications may prefer to download the entire database up-front rather than performing range requests for each query. This may be more performant for batch processing where repeated range requests would unnecessarily increase network activity. In those cases, we provide functions like fetchGenesForAllSets() and options like forceDownload = true to trigger a full download of the relevant file(s) on first use. Subsequent calls to related functions like fetchGenesForSet() will then re-use this data and skip range requests. This approach transfers more data to the client but is still practical - the default human gene set database (containing the Gene Ontology and almost all MSigDB gene sets) is less than 9 MB in size, which is a tolerable payload.

gesel will automatically cache responses in memory to reduce network traffic across the lifetime of a single session. Note that no caching is done across sessions, though users can add their own (e.g., with IndexedDB or the Cache API) by overriding the downloader.
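
As one possible way to add caching across sessions, the sketch below wraps a whole-file downloader with a cache that exposes match() and put(), which is the shape of a browser Cache object. The makeCachedDownloader helper and the cache name in the usage comment are hypothetical. The sketch deliberately handles only whole-file requests, because the Cache API refuses to store partial (206) responses, so range requests would need a different store such as IndexedDB.

```javascript
// Hypothetical helper, not part of gesel.
// toUrl:   maps a file name to its full URL.
// fetchFn: a fetch-like function returning a promise of a Response.
// cache:   any object with async match(url) and put(url, response),
//          for example the result of `await caches.open(...)` in a browser.
function makeCachedDownloader(toUrl, fetchFn, cache) {
    return async function (file) {
        const url = toUrl(file);
        const cached = await cache.match(url);
        if (cached !== undefined) {
            return cached; // served from the cache, no network traffic
        }
        const response = await fetchFn(url);
        // Store only complete, successful responses. A clone is stored
        // because a Response body can be read only once.
        if (response.ok && response.status === 200) {
            await cache.put(url, response.clone());
        }
        return response;
    };
}

// Possible usage in a browser (the cache name "gesel-genes" is an assumption):
// const cache = await caches.open("gesel-genes");
// gesel.geneDownload(
//     makeCachedDownloader(file => gesel.geneBaseUrl() + "/" + file, fetch, cache)
// );
```

Entries stored this way never expire on their own, so an application would need its own policy for invalidating the cache when the reference files are updated.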