npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

unified-doc

v3.3.2

Published

unified document APIs.

Downloads

137

Readme

unified-doc

unified document APIs.

Install

npm install unified-doc

Use

A unified API to easily work with documents of supported content types.

import Doc from 'unified-doc';

// easily initialize a `doc` instance with access to document APIs.
// any supported content type benefits from the same APIs.
const doc = Doc({
  content: '> **some** markdown content',
  filename: 'doc.md',
});

// export source file
expect(doc.file()).toEqual({
  content: '> **some** markdown content',
  extension: '.md',
  name: 'doc.md',
  stem: 'doc',
  type: 'text/markdown',
});

// export file as html
expect(doc.file('.html')).toEqual({
  content: '<blockquote><strong>some</strong> markdown content</blockquote>',
  extension: '.html',
  name: 'doc.html',
  stem: 'doc',
  type: 'text/html',
});

// export file as text (only textContent is extracted)
expect(doc.file('.txt')).toEqual({
  content: 'some markdown content',
  extension: '.txt',
  name: 'doc.txt',
  stem: 'doc',
  type: 'text/plain',
});

// easily search on a doc
expect(doc.search('nt')).toEqual([
  { 
    start: 16,
    end: 18,
    value: 'nt',
    snippet: ['some markdown co', 'nt', 'ent'],
  },
  {
    start: 19,
    end: 21,
    value: 'nt',
    snippet: ['some markdown conte', 'nt', ''],
  },
]);

// retrieve just the textContent of the document
expect(doc.textContent()).toEqual('some markdown content');

// retrieve the `hast` (syntax tree) representation of the document
expect(doc.parse()).toEqual({ // hast tree
  type: 'root',
  children: [...],
});

// compile the document to use the results for rendering
expect(doc.compile()).toBeInstanceOf(VFile); // vfile instance

API

A doc refers to an instance of unified-doc.

Doc(options)

Initialize a doc instance that provides a set of useful methods to working with documents. Any supported content type benefits from the API methods. As of the time of this writing, unified-doc supports the following content types:

  • .html
  • .md
  • parsing most code file content (e.g. .js, .py) into code blocks.

Future content types such as .csv, .xml, .docx, .pdf will be supported.

// initialize as markdown content
const doc1 = Doc({
  content: '> **some** markdown content',
  filename: 'doc.md',
});
// initialize as html content
const doc2 = Doc({
  content: '<blockquote><strong>some</strong> markdown content</blockquote>',
  filename: 'doc.html',
});
// initialize as code content
const doc3 = Doc({
  content: 'var hello = "world";';
  filename: 'doc.js',
});

The doc instance can be configured by providing other properties. This will be elaborated more in the options section.

doc.compile()

Interface

function compile(): VFile;

Returns the results of the compiled content based on the compiler attached to the doc. The results are stored as a VFile, and can be used by various renderers. By default, a HTML string compiler is used, and stringifed HTML is returned by this method.

doc.file([extension])

Interface

function file(extension?: string): FileData;

Returns FileData for the specified extension. This is a useful way to convert and output different file formats. Supported extensions include '.html', '.txt', .md, .xml. If no extension is provided, the source file should be returned. Future extensions can be implemented, providing a powerful way to convert file formats for any supported content type

Example

const doc = Doc({
  content: '> **some** markdown content',
  filename: 'doc.md',
});

// export source file
expect(doc.file()).toEqual({
  content: '> **some** markdown content',
  extension: '.md',
  name: 'doc.md',
  stem: 'doc',
  type: 'text/markdown',
});

// export file as html
expect(doc.file('.html')).toEqual({
  content: '<blockquote><strong>some</strong> markdown content</blockquote>',
  extension: '.html',
  name: 'doc.html',
  stem: 'doc',
  type: 'text/html',
});

// export file as markdown
expect(doc.file('.markdown')).toEqual({
  content: '> **some** markdown content',
  extension: '.md',
  name: 'doc.md',
  stem: 'doc',
  type: 'text/markdown',
});

// export file as text (only textContent is extracted)
expect(doc.file('.txt')).toEqual({
  content: 'some markdown content',
  extension: '.txt',
  name: 'doc.txt',
  stem: 'doc',
  type: 'text/plain',
});

// export file as html-compatible xml
expect(doc.file('.xml')).toEqual({
  content: '<html xmlns="http://www.w3.org/1999/xhtml"><head></head><body><blockquote><p><strong>some</strong> markdown content</p></blockquote></body></html>',
  extension: '.xml',
  name: 'doc.xml',
  stem: 'doc',
  type: 'application/xml',
});

Related interfaces

interface FileData {
  /** file content in string form */
  content: string;
  /** file extension (includes preceding '.') */
  extension: string;
  /** file name (includes extension) */
  name: string;
  /** file name (without extension) */
  stem: string;
  /** mime type of file */
  type: string;
}

doc.parse()

Interface

function parse(): Hast;

Returns the hast representation of the content. This content representation is used internally by the doc, but it can also be used by any hast utility.

Example

import toMdast from 'hast-util-to-mdast';

const hast = doc.parse();
const mdast = toMdast(hast);

doc.search(query[, options])

Interface

function search(
  /** search query string */
  query: string,
  /** algorithm-specific options based on attached search algorithm */
  options?: Record<string, any>,
): SearchResultSnippet[];

Returns SearchResultSnippet based on the provided query string and search options. Uses the searchAlogrithm attached to the doc for when executing a search against the textContent of a doc.

Example

const doc = Doc({
  content: '> **some** markdown content',
  filename: 'doc.md',
});

expect(doc.search('nt')).toEqual([
  { 
    start: 16,
    end: 18,
    value: 'nt',
    snippet: ['some markdown co', 'nt', 'ent'],
  },
  {
    start: 19,
    end: 21,
    value: 'nt',
    snippet: ['some markdown conte', 'nt', ''],
  },
]);

Related interfaces

interface SearchResult {
  /** start offset of the search result relative to the `textContent` of the `doc` */
  start: number;
  /** end offset of the search result relative to the `textContent` of the `doc` */
  end: number;
  /** matched text value in the `doc` */
  value: string;
  /** additional data can be stored here */
  data?: Record<string, any>;
}

interface SearchResultSnippet extends SearchResult {
  /** 3-tuple string representing the [left, matched, right] of a matched search result.  left/right are characters to the left/right of the matched text value, and its length is configurable in `SearchOptions.snippetOffsetPadding` */
  snippet: [string, string, string];
}

doc.textContent()

Interface

function textContent(): string;

Returns the textContent of a doc. This content is the concatenated value of all text nodes under a doc, and is used by many internal APIs (marking, searching).

Example

const doc = Doc({
  content: '> **some** markdown content',
  filename: 'doc.md',
});

expect(doc.textContent()).toEqual('some markdown content');

options

Configure the doc instance with the following options:

Interface

interface Options {
  /** required content must be provided */
  content: string;
  /** required filename must be provided.  This infers the mimeType of the content and subsequently how content will be parsed */
  filename: string;
  /** attach a compiler (with optional options) in the `PluggableList` interface to determine how the content will be compiled */
  compiler?: PluggableList;
  /** an array of `Mark` data that is used by the mark algorithm to insert `mark` nodes with matching offsets */
  marks?: Mark[];
  /** specify new parsers or override existing parsers with this map */
  parsers?: Parsers;
  /** apply plugins to further customize the `doc`.  These plugins are applied after private plugins (hence the 'post' prefix).  Private methods such as `textContent()` and `parse()` will not incorporate `hast` modifications introduced by these plugins. */
  postPlugins?: PluggableList;
  /** apply plugins to further customize the `doc`.  These plugins are applied before private plugins (hence the 'pre' prefix).  Private methods such as `textContent()` and `parse()` may incorporate `hast` modifications introduced by these plugins. */
  prePlugins?: PluggableList;
  /** provide a sanitize schema for sanitizing the `doc`.  By default, the `doc` is safely sanitized.  If `null` is provided as a value, the `doc` will not be sanitized. */
  sanitizeSchema?: SanitizeSchema | null;
  /** attach a search algorithm that implements the `SearchAlgorithm` interface to support custom search behaviors on a `doc` */
  searchAlgorithm?: SearchAlgorithm;
  /** unified search options independent of the attached search algorithm */
  searchOptions?: SearchOptions;
}

Example

The following is an example of a doc with custom configurations.

const doc = Doc({
  content: '> **some** markdown content',
  filename: 'doc.md',
  compiler: [ // attach a custom compiler for custom content rendering
    [customCompiler, customCompilerOptions],
  ],
  marks: [ // content will be marked in appropriate places
    { id: 'a', start: 0, end: 4, classNames: ['class-a']},
    { id: 'b', start: 0, end: 4, style: { background: 'red', color: 'white' } },
  ],
  parsers: {
    'text/html': [parser1, parser2, parser3], // overwrite html parser with a custom multi-step parser
    'application/pdf': [pdfParser],  // a unified pdf parser is in high demand!
  },
  postPlugins: [ // apply custom rehype plugins (post content processing)
    [toc, tocOptions],
  ],
  prePlugins: [ // apply custom rehype plugins (pre content processing)
    [highlight, { ignoreMissing: true }],
  ],
  sanitizeSchema: { // custom sanitization rules
    attributes: { '*': ['style'] }
  },
  searchAlgorithm: customSearchAlgorithm, // attach a custom search algorithm relevant to your needs (e.g. elasticsearch, googlesearch etc)
  searchOptions: { // unified search option behavior
    minQueryLength: 5,
    snippetOffsetPadding: 10,
  },
});

Related interfaces

/**
 * An object used by a mark algorithm to mark text nodes based on text offset positions
 */
interface Mark {
  /** unique ID for mark (required for mark algorithm to work) */
  id: string;
  /** start offset of the mark relative to the `textContent` of the `doc` */
  start: number;
  /** end offset of the mark relative to the `textContent` of the `doc` */
  end: number;
  /** apply optional CSS classnames to marked nodes */
  classNames?: string[];
  /** apply optional dataset attributes (i.e. `data-*`) to marked nodes */
  dataset?: Record<string, any>;
  /** additional data can be stored here */
  data?: Record<string, any>;
  /** apply optional styles to marked nodes */
  style?: Record<string, any>;
}

/**
 * Mapping of mimeTypes to unified parsers.
 * New parsers can be introduced and existing parsers can be overwritten.
 * Parsers are provided in the `PluggableList` interface.
 * Parsers may consist of multiple steps.
 */
interface Parsers {
  [mimeType: string]: PluggableList;
}

// see https://github.com/syntax-tree/hast-util-sanitize
type SanitizeSchema = Record<string, any>;

/**
 * Unified search options affecting any attached search algorithm.
 */
interface SearchOptions {
  /** only execute `doc.search` if the query is at least of the specified length */
  minQueryLength?: number;
  /** return snippets padded with extra characters on the left/right of the matched value based on the specified padding length */
  snippetOffsetPadding?: number;
}

enums

The following enums indicate mimeTypes for supported parsers and extensionTypes for supported file conversion targets.

const extensionTypes = {
  HTML: '.html',
  MARKDOWN: '.md',
  TEXT: '.txt',
};

const mimeTypes = {
  HTML: 'text/html',
  MARKDOWN: 'text/markdown',
};