nbtx

v0.2.3

nbtx: Jupyter Notebook Transformation Library

nbtx on npm MIT License CI

Transform Jupyter notebook JSON files (*.ipynb) to and from more compact data structures for use in web applications or other contexts where loading component parts (e.g. images, data, etc.) separately is preferred. For example, when a publishing workflow pulls a notebook apart, its images, interactive charts, and other outputs are needed either on disk or through a dedicated web request.

Driving Use Cases

  1. Optimize a notebook for a viewing context, so that initial network payload is small (no images, html, data), allowing large components to be loaded lazily.
  2. Identify and extract known output images, html and other data for other formats (e.g. JATS, LaTeX, Word), where the images and outputs are required to be accessed independently.
  3. Allow for additional, post-processed mimetypes to be added to the transformed notebook (e.g. WebP, thumbnail images) while maintaining a transformation path back to original notebook.

Scope

The scope of this library is currently limited to "minifying" large notebook cell outputs, including stream, error, and mimetype outputs (update_display_data, display_data, execute_result). Large outputs are extracted from the notebook JSON, moved to a cache data structure, and referenced in the notebook by their hash and content_type. This library also provides a function to restore notebook outputs to their original state, given minified outputs and the cached output content.
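The extract-and-restore round trip described above can be sketched without nbtx itself. The following is a minimal illustration, not the library's implementation: the types `CacheEntry`, `Cache`, `MinifiedEntry`, the functions `minifyBundle` and `restoreBundle`, and the `toyHash` placeholder are all hypothetical names, and the sketch decides purely on string length, whereas nbtx also tracks an encoding and handles several output types.

```typescript
// Hypothetical, simplified shapes for illustration; not the types exported by nbtx.
type CacheEntry = [string, { contentType: string }];
type Cache = Record<string, CacheEntry>;
type MinifiedEntry =
  | { content_type: string; hash: string } // large content moved to the cache
  | { content_type: string; content: string }; // small content kept inline
type MimeBundle = Record<string, string>;
type MinifiedBundle = Record<string, MinifiedEntry>;

// Move any value longer than `maxCharacters` into `cache`, leaving a reference behind.
function minifyBundle(
  data: MimeBundle,
  cache: Cache,
  computeHash: (content: string) => string,
  maxCharacters: number,
): MinifiedBundle {
  const result: MinifiedBundle = {};
  for (const contentType of Object.keys(data)) {
    const content = data[contentType];
    if (content.length > maxCharacters) {
      const hash = computeHash(content);
      cache[hash] = [content, { contentType }];
      result[contentType] = { content_type: contentType, hash };
    } else {
      result[contentType] = { content_type: contentType, content };
    }
  }
  return result;
}

// Inverse operation: look each reference up in the cache to rebuild the bundle.
function restoreBundle(minified: MinifiedBundle, cache: Cache): MimeBundle {
  const result: MimeBundle = {};
  for (const contentType of Object.keys(minified)) {
    const entry = minified[contentType];
    if ('hash' in entry) {
      const cached = cache[entry.hash];
      if (!cached) throw new Error(`Missing cache entry for ${entry.hash}`);
      result[contentType] = cached[0];
    } else {
      result[contentType] = entry.content;
    }
  }
  return result;
}

const cache: Cache = {};
// Placeholder only; a real application would supply a proper content hash.
const toyHash = (content: string): string => `len-${content.length}`;
const original: MimeBundle = { 'text/html': 'x'.repeat(50), 'text/plain': 'ok' };
const minified = minifyBundle(original, cache, toyHash, 20);
const restored = restoreBundle(minified, cache);
```

Keeping the cache separate from the minified bundle is what lets a consumer ship the small structure first and resolve the large parts later.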

This library uses existing notebook types defined in nbformat (see docs); the only new types defined in nbtx are for "minified" outputs. However, there are no functions for handling entire notebooks; outputs must be isolated prior to invoking nbtx functions. This choice allows the library to be used in non-notebook contexts (e.g. MyST Markdown), which include output mime-bundles but do not conform to the full notebook specification.

Goals

  • Stay as close as possible to the nbformat for defining outputs.
  • Identify and transform outputs; nbtx does not write files to disk or fetch pieces of a notebook.
  • Identify and extract large stream and error outputs; the length threshold can be customized depending on the use case.

Installation

Install using npm or yarn:

npm install nbtx
# or
yarn add nbtx

Usage

The following example loads a notebook, then iterates through each cell and, if outputs are present, mutates the cells to include minified output objects that reference a separate outputCache:

import fs from 'fs';
import type { MinifiedContentCache, MinifyOptions } from 'nbtx';
import { minifyCellOutput } from 'nbtx';

// Read as UTF-8 so that JSON.parse receives a string rather than a Buffer
const notebook = JSON.parse(fs.readFileSync('my-notebook.ipynb', 'utf-8'));
const outputCache: MinifiedContentCache = {};
// Options for minification, see note on hashing below
const opts: Partial<MinifyOptions> = { computeHash };

notebook.cells.forEach((cell) => {
  if (!cell.outputs?.length) return;
  // Pass opts so that the custom computeHash is actually used
  cell.outputs = minifyCellOutput(cell.outputs, outputCache, opts);
});

You may then handle the outputCache however you want. For example, writing each large output to its own file and updating the cell outputs to point to those files:

import { extFromMimeType, walkOutputs } from 'nbtx';

notebook.cells.forEach((cell) => {
  if (!cell.outputs?.length) return;
  walkOutputs(cell.outputs, (output) => {
    // Skip outputs that were kept inline or have no cached content
    const { hash } = output;
    if (!hash || !outputCache[hash]) return;
    const [content, { contentType, encoding }] = outputCache[hash];
    const filename = `${hash}${extFromMimeType(contentType)}`;
    fs.writeFileSync(filename, content, { encoding: encoding as BufferEncoding });
    // The path can be used, for example in a web-context
    output.path = filename;
  });
});

You may also rehydrate the original notebook from an outputCache:

import { convertToIOutputs } from 'nbtx';

notebook.cells.forEach((cell) => {
  if (!cell.outputs?.length) return;
  cell.outputs = convertToIOutputs(cell.outputs, outputCache);
});

Note: Minifying and restoring notebook outputs may change the structure of output text from a list of strings to a single, newline-delimited string. Both of these formats are acceptable in the notebook types defined by nbformat.
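To make the two text forms concrete, here is a small sketch showing that they carry the same content. The type alias `MultilineString` and the helper `toSingleString` are hypothetical names introduced for illustration; the sketch relies on the nbformat convention that each element of the list form already ends with its own newline.

```typescript
// nbformat allows text as either a single string or a list of strings.
type MultilineString = string | string[];

// Join list-form text into a single string. Each element of the list form
// already carries its own trailing newline, so no separator is added.
function toSingleString(text: MultilineString): string {
  return Array.isArray(text) ? text.join('') : text;
}

const asList: MultilineString = ['first line\n', 'second line\n'];
const asString: MultilineString = 'first line\nsecond line\n';

// Both forms normalize to the same text.
const sameText = toSingleString(asList) === toSingleString(asString);
console.log(sameText);
```

Code that compares a restored notebook against the original may want to normalize both sides this way before checking for equality.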

Hashing function

To stay dependency-free and run easily in the browser, nbtx does not bundle a hashing library. To create the computeHash function, choose an algorithm (for example, md5) and digest the content. If you are in the browser, consider using crypto-js or another hashing library.

import { createHash } from 'crypto';

function computeHash(content: string): string {
  return createHash('md5').update(content).digest('hex');
}

By default nbtx will create a random string for the hash and raise a warning.
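A content-derived hash is worth supplying because identical content then always maps to the same cache key, which a random string cannot guarantee. The sketch below illustrates that property with a dependency-free, synchronous stand-in. It is an assumption made for illustration only: a 32-bit FNV-1a-style hash is not what nbtx prescribes and is too short for real use, where md5 or a SHA digest from a vetted library is the better choice.

```typescript
// Illustrative stand-in only: a 32-bit FNV-1a-style hash over UTF-16 code units.
// It is deterministic and dependency-free, but far too short to rule out
// collisions in real use.
function computeHash(content: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < content.length; i++) {
    hash ^= content.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, keeping 32 bits
  }
  // Convert to an unsigned value and left-pad to 8 hex characters
  return ('00000000' + (hash >>> 0).toString(16)).slice(-8);
}

// Identical content maps to the same key, so repeated outputs share one entry.
const hashedCache: Record<string, string> = {};
for (const content of ['<svg>a</svg>', '<svg>b</svg>', '<svg>a</svg>']) {
  hashedCache[computeHash(content)] = content;
}
console.log(Object.keys(hashedCache).length); // 2: the repeated output is stored once
```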

Data transformation example

Starting with an ipynb JSON document, the following example shows the output transformation for an execute_result whose data bundle holds three mimetypes (html, image, text):

{
  ...,
  "cells": [
    {
      "cell_type": "code",
      ...,
      "outputs": [
        {
          "output_type": "execute_result",
          ...,
          "data": {
            "text/html": ["...veryLargeString\n", "on many lines\n"],
            "image/png": "base64-encoded-data-without-a-header",
            "text/plain": ["alt.VConcatChart(...)"]
          }
        }
      ]
    }
  ],
  ...
}

After minifyCellOutput is called, followed by an optional pass that writes to disk and adds a path (as in the example above), the JSON structure would be:

{
  ...,
  "cells": [
    {
      "cell_type": "code",
      ...,
      "outputs": [
        {
          "output_type": "execute_result",
          ...,
          "data": {
            "text/html": {
              "content_type": "text/html",
              "hash": "29cb113f927eb3abba1b303571caa653",
              // The path isn't added by nbtx, but is a common place to put a URL
              "path": "/static/29cb113f927eb3abba1b303571caa653.html"
            },
            "image/png": {
              "content_type": "image/png",
              "hash": "W5Zulz9J5PLlOkjN2RWMa6CRgJdjxq2r",
              // Known output types are given sensible extensions through `extFromMimeType`
              "path": "/static/W5Zulz9J5PLlOkjN2RWMa6CRgJdjxq2r.png"
            },
            "text/plain": {
              // Small strings are by default not extracted, this can be modified in options
              "content": "alt.VConcatChart(...)",
              "content_type": "text/plain"
            }
          }
        }
      ]
    }
  ],
  ...
}

Viewing and "rehydration" applications can use walkOutputs to download the various parts of a notebook and/or add additional mimetypes to the bundle. For example, they can add transformations that take screenshots of outputs for long-term preservation, or add web-optimized images (e.g. WebP) that were not created in the execution process.

This can be done asynchronously after the first request for the notebook content payload, improving page-load speed and leaving it up to the consuming application which of the mime-bundles to fetch.
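As a rough sketch of that consumer-side choice, the snippet below registers a web-optimized variant next to the original image and then picks a single entry to request. The `BundleEntry` shape is a simplified assumption modelled on the JSON example above, and `pickEntry`, the hashes, and the paths are hypothetical; nbtx does not provide this helper.

```typescript
// Hypothetical, simplified entry shape modelled on the JSON example above.
type BundleEntry = {
  content_type: string;
  hash?: string;
  path?: string; // added by the application, not by nbtx
  content?: string; // present when the output was small enough to stay inline
};
type Bundle = Record<string, BundleEntry>;

// Return the first entry, in order of preference, that is inline or has a path to fetch.
function pickEntry(bundle: Bundle, preferred: string[]): BundleEntry | undefined {
  for (const mimetype of preferred) {
    const entry = bundle[mimetype];
    if (entry && (entry.content !== undefined || entry.path !== undefined)) return entry;
  }
  return undefined;
}

const bundle: Bundle = {
  'image/png': { content_type: 'image/png', hash: 'abc', path: '/static/abc.png' },
  'text/plain': { content_type: 'text/plain', content: 'alt.VConcatChart(...)' },
};
// A post-processing step could register a web-optimized variant alongside the original.
bundle['image/webp'] = { content_type: 'image/webp', hash: 'def', path: '/static/def.webp' };

// Only the chosen entry's path needs to be requested; the others are never downloaded.
const chosen = pickEntry(bundle, ['image/webp', 'image/png', 'text/plain']);
```

Because the original image/png entry is left untouched, the added variant does not interfere with restoring the notebook from the cache.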