npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

@atjson/document

v0.30.2

Published

Document definition for AtJSON

Downloads

6,030

Readme

🖋 @atjson/document

The Document is the core component of AtJSON. It's a library that provides tools to create annotations and documents, query them, and convert between them. We'll walk through this in this, describing how to use AtJSON.

🎴 Introduction to Annotations

Annotations are a term used in both writing and technology. In writing, annotations are marks on a page that is informative. They can be copy-editing notes or shorthand that dictates alterations of text. Software systems use them to add extra contextual information to existing data.

When we mention annotations in AtJSON, we primarily mean the software definition. We are marking up ranges of text that add more meaningful information to the text / document. What makes AtJSON a little different, is how we think of annotations.

Typically when dealing with text, we think of markup languages, like HTML, XML, and markdown. AtJSON can slurp up markup and represent it in terms of annotations. The important feature to note is that it does not distinguish between different types of markup. This is a flexible and powerful foundation, since it allows us to seamlessly convert and manipulate annotations in different formats as if they were the same format.

As an example markdown document like the following:

_Roses_ are red,  
_Violets_ are blue.

Can be parsed into an AtJSON document by passing it into a source in AtJSON. A source is kind of like a parser, but is called a source because they use a parser, and normally we don't do any parsing in a source. Here we use AtJSON's built-in Commonmark source, which is a document format that supports all markdown features specified in the CommonMark spec.

import CommonmarkSource from "@atjson/source-commonmark";

CommonmarkSource.fromRaw("*Roses* are red,  \n*Violets* are blue.");

The result is a logical representation of the markdown document:

{
  "content": "*Roses* are red,  \n*Violets* are blue.",
  "annotations": [
    {
      "type": "-commonmark-em",
      "start": 0,
      "end": 7,
      "attributes": {}
    },
    {
      "type": "-atjson-parse-token",
      "start": 0,
      "end": 1,
      "attributes": {
        "-atjson-reason": "em_open"
      }
    },
    {
      "type": "-atjson-parse-token",
      "start": 6,
      "end": 7,
      "attributes": {
        "-atjson-reason": "em_close"
      }
    },
    {
      "type": "-commonmark-hard_break",
      "start": 16,
      "end": 19,
      "attributes": {}
    },
    {
      "type": "-commonmark-em",
      "start": 20,
      "end": 29,
      "attributes": {}
    },
    {
      "type": "-atjson-parse-token",
      "start": 20,
      "end": 21,
      "attributes": {
        "-atjson-reason": "em_open"
      }
    },
    {
      "type": "-atjson-parse-token",
      "start": 28,
      "end": 29,
      "attributes": {
        "-atjson-reason": "em_close"
      }
    }
  ]
}

💁‍♀️ Notes

AtJSON stores annotations with vendor prefixes. These are meant to prevent collisions between different types of documents.

Positions indicate the position *between* characters, not the character boundary. This means { start: 0, end: 1 } is over the first character of the content.

Once the markdown is in AtJSON, we can start modifying the document.

📝 Documents

We skipped right to an example of using AtJSON directly, but let's backtrack a bit to explain the other core concept of AtJSON. Annotations can't live in the ether - they need something to latch onto. So we attach them to a Document. A document is the combination of content + annotations. The content, being for our purposes, text, and the annotations being a bunch of information about the text.

Above, the JSON blob is a raw look at the document.

Documents have associated annotation classes that will turn the JSON into a full annotation class that can be accessed without vendor prefixes.

As an intro into documents, let's create a small document definition that describes addresses:

import Document, { InlineAnnotation } from "@atjson/document";

class Name extends InlineAnnotation {
  static vendorPrefix = "address";
  static type = "name";
}

class AddressLine extends InlineAnnotation {
  static vendorPrefix = "address";
  static type = "line";
}

class PostalCode extends InlineAnnotation {
  static vendorPrefix = "address";
  static type = "postal-code";
}

class City extends InlineAnnotation {
  static vendorPrefix = "address";
  static type = "city";
}

class Region extends InlineAnnotation {
  static vendorPrefix = "address";
  static type = "region";
}

class Address extends Document {
  static schema = [Name, AddressLine, PostalCode, City, Region];
}

This may seem like a weird example to use, but if we pull up our address for Condé Nast's NYC office:

Condé Nast
1 WTC
New York, NY 10006

We can start marking this up with some annotations:

let address = new Address({
  content: "Condé Nast\n1 WTC\nNew York, NY 10006",
  annotations: [],
});

address.addAnnotations(
  new Name({ start: 0, end: 10 }),
  new AddressLine({ start: 11, end: 16 }),
  new City({ start: 17, end: 25 }),
  new Region({ start: 27, end: 29 }),
  new PostalCode({ start: 30, end: 35 })
);

It seems very arbitrary to store data this way. Why would we do this?

Well, we think this provides the most natural input mechanism for people, while providing robust machine reading capabilities. Documents provide tools that manage annotations so when text is added or removed it "just works" as expected. So, if we wanted to change the text here, and we inserted " Entertainment" after "Condé Nast", the annotations would remain intact, and in fact, would cover the correct text!

The flexibility here also means that we can render this address out into something more suitable for Google, such as JSONLD. We can use AtJSON's rendering here to take this document and turn it into something that can show a rich location card in search results. Note that in order to do this, we define a rendering rule for every annotation we've defined thus far for our Address schema:

import Renderer from "@atjson/renderer-hir";

class JSONLDRenderer extends Renderer {
  private jsonld: any;

  constructor() {
    this.jsonld = {
      "@context": "https://schema.org",
      "@type": "Place",
      address: {
        "@type": "PostalAddress",
      },
    };
  }

  *Name() {
    let name = yield;
    this.jsonld.name = name.join("");
  }

  *AddressLine() {
    let streetAddress = yield;
    this.jsonld.address.streetAddress = streetAddress.join("");
  }

  *City() {
    let addressLocality = yield;
    this.jsonld.address.addressLocality = addressLocality.join("");
  }

  *Region() {
    let addressRegion = yield;
    this.jsonld.address.addressRegion = addressRegion.join("");
  }

  *PostalCode() {
    let postalCode = yield;
    this.jsonld.address.postalCode = postalCode.join("");
  }

  *root() {
    yield;
    return this.jsonld;
  }
}

Sending the document that we created through this would result in a JSON-LD object:

{
  "@context": "https://schema.org",
  "@type": "Place",
  "address": {
    "@type": "PostalAddress",
    "addressLocality": "New York",
    "addressRegion": "NY",
    "postalCode": "10006"
  },
  "name": "Condé Nast"
}

This was a bit of a whirlwind, but we think it provides a good example of ways that you can start thinking about how to use annotations!

💁‍♀️ Notes

We call a defined document schema a "source". They're not exactly parsers, since they represent the content in AtJSON in that format. I like to think of them like a source that a journalist has, where they collect information from that source and use it in their reporting.

🔍 How to find data in a document

Annotations in AtJSON can be treated like a little database. It's a little like how jQuery suddenly opened up the possibilities of what could be done on websites because you could do simple and elegant manipulation.

There's 2 levels of querying for documents. The first is partial matches:

doc.where({ type: "-offset-link" });

This query will find all links in a document

💁‍♀️ Notes

The partial match query language requires vendor prefixes

The equivalent version of this using the function syntax would be:

import { Link } from "@atjson/offset-annotations";

doc.where((annotation) => annotation instanceof Link);

💁‍♀️ Notes

The function syntax returns hydrated annotations. You may encounter Unknown annotations here, which are annotations that are unsupported by the document schema. Be careful! 👹

👹A note about Unknown Annotations

AtJSON never deletes or drops data. Instead, it isn't supported or understood by a document. If a feature becomes discontinued, or is migrated from a legacy system, we can choose to ignore this, while retaining the data.

This data is stored as unknown annotations, which are an annotation that stores all the annotation information in the attributes hash. When serialized to JSON, they look like the original annotation, but when rendering they are ignored.

{
  "type": "-html-marquee",
  "start": 0,
  "left": 10,
  "attributes": {
    "-html-style": "rainbow"
  }
}

Turns into:

new UnknownAnnotation({
  start: 0,
  end: 10,
  attributes: {
    type: "-html-marquee",
    attributes: {
      "-html-style": "rainbow",
    },
  },
});