wordmap-usfm

v0.5.4

A utility for injecting alignment data into USFM files

Downloads

14

wordMAP-usfm

This library provides utilities for injecting alignment data from WordMAP into USFM3 files.

Usage

From the command line:

npm i wordmap-usfm -g
wordmap-usfm --help

As a module:

npm i wordmap-usfm

import {align} from 'wordmap-usfm';
...
const alignedUSFM = align(alignmentData, usfmData);

Input

Alignment JSON Data Structure

The intent is to have a single input file that allows full round-trip conversion to USFM3 without any loss. One resource is intended to be the source (primary) text; the other is the target-language text of the USFM file. The target language is what a USFM file would typically show without any alignment data.

In theory, the data structure can be extended to carry other metadata per word or token, and potentially more than two languages, although that may not lend itself well to USFM.

Top-level attributes

The top-level attributes of the data structure are conformsTo:String, metadata:Object, and segments:Array.

{
  "conformsTo": "alignment-0.1",
  "metadata": {...},
  "segments": [...]
}
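As a rough illustration of the top-level shape, the sketch below declares it as a TypeScript type and checks a parsed JSON value against it. The names `AlignmentFile` and `isAlignmentFile` are hypothetical and are not part of wordmap-usfm; the nested attributes are deliberately left loosely typed.

```typescript
// Hypothetical type for the top-level attributes described above.
// The contents of metadata and segments are intentionally left loose.
interface AlignmentFile {
  conformsTo: string;
  metadata: Record<string, unknown>;
  segments: unknown[];
}

// Runtime check that a parsed JSON value has the three top-level attributes
// with the expected basic types.
function isAlignmentFile(value: unknown): value is AlignmentFile {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const metadata = v["metadata"];
  return (
    typeof v["conformsTo"] === "string" &&
    typeof metadata === "object" &&
    metadata !== null &&
    !Array.isArray(metadata) &&
    Array.isArray(v["segments"])
  );
}

const parsed: unknown = JSON.parse(
  '{"conformsTo": "alignment-0.1", "metadata": {}, "segments": []}'
);
console.log(isAlignmentFile(parsed)); // true
```

A check like this only confirms the outer shape; it says nothing about whether the segments themselves are well formed.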

conformsTo

The conformsTo attribute specifies which version of the spec was used to generate the alignment file. We plan to change the alignment specification over time and version it with Semantic Versioning, starting at 0.1 for this release.

"conformsTo": "alignment-0.1",

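A consumer may want to read the version out of conformsTo before processing a file. The sketch below assumes the value always has the form `<name>-<major>.<minor>`, which is inferred from the single example "alignment-0.1" and is not stated by the spec. `parseConformsTo` is a hypothetical helper, not part of wordmap-usfm.

```typescript
interface SpecVersion {
  name: string;
  major: number;
  minor: number;
}

// Assumption: conformsTo looks like "<name>-<major>.<minor>".
// Returns null for anything that does not match that pattern.
function parseConformsTo(conformsTo: string): SpecVersion | null {
  const match = /^([a-z]+)-(\d+)\.(\d+)$/.exec(conformsTo);
  if (match === null) return null;
  const [, name = "", major = "0", minor = "0"] = match;
  return { name, major: Number(major), minor: Number(minor) };
}

console.log(parseConformsTo("alignment-0.1")); // { name: 'alignment', major: 0, minor: 1 }
console.log(parseConformsTo("0.1")); // null
```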
metadata

The top-level metadata attribute stores information about the content of the file.

"metadata": {
    "modified": "1524293704",
    "resources": {...}
  },

metadata.modified

The metadata.modified attribute is a Unix timestamp of the last modification of the file, stored as a string. Update it whenever the content of the file is edited, so that users of the file can tell whether they have the latest version of the data.

"modified": "1524293704"

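One way to produce such a value is sketched below. It assumes the timestamp counts whole seconds, which matches the ten-digit example but is not stated explicitly. `unixTimestamp` and `touch` are hypothetical helpers, not part of wordmap-usfm.

```typescript
interface Metadata {
  modified: string;
  resources: Record<string, unknown>;
}

// Assumption: the timestamp is in whole seconds, serialized as a string.
function unixTimestamp(date: Date = new Date()): string {
  return Math.floor(date.getTime() / 1000).toString();
}

// Returns a copy of the metadata with a refreshed modified value.
function touch(metadata: Metadata, now: Date = new Date()): Metadata {
  return { modified: unixTimestamp(now), resources: metadata.resources };
}

console.log(unixTimestamp(new Date(1524293704 * 1000))); // "1524293704"
```

Calling `touch(metadata)` after every edit keeps the timestamp in step with the content.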
metadata.resources

The metadata.resources attribute is an object whose keys are resource names and whose values are the metadata describing each resource. These keys are used later, in the segments section of the file, to specify which resource the respective content belongs to. The expected attributes for each resource are languageCode:String, name:String, and version:String. This information will be used to generate the headers of the USFM3 output file.

NOTES:

  • One text will be specified as the language of the USFM file, and the other will be aligned to it as USFM3 milestones.
  • For each segment, the text of one resource is used as the raw USFM string of the verse during USFM3 generation.
  • The tokens in the corresponding segment of the other resource are aligned to the tokens found in that raw string.

    "resources": {
      "r0": {
        "languageCode": "el-x-koine",
        "name": "UGNT",
        "version": "0.1"
      },
      "r1": {
        "languageCode": "en",
        "name": "ULT",
        "version": "9"
      }
    }
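Because each resource is expected to carry languageCode, name, and version as strings, a quick completeness check can catch malformed input early. This is a minimal sketch; `missingResourceFields` is a hypothetical helper, not part of wordmap-usfm.

```typescript
interface ResourceMetadata {
  languageCode: string;
  name: string;
  version: string;
}

const REQUIRED_FIELDS = ["languageCode", "name", "version"];

// Lists every "<resourceKey>.<field>" that is absent or not a string.
function missingResourceFields(resources: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const key of Object.keys(resources)) {
    const value = resources[key];
    const record: Record<string, unknown> =
      typeof value === "object" && value !== null
        ? (value as Record<string, unknown>)
        : {};
    for (const field of REQUIRED_FIELDS) {
      if (typeof record[field] !== "string") problems.push(`${key}.${field}`);
    }
  }
  return problems;
}

const resources: Record<string, ResourceMetadata> = {
  r0: { languageCode: "el-x-koine", name: "UGNT", version: "0.1" },
  r1: { languageCode: "en", name: "ULT", version: "9" },
};
console.log(missingResourceFields(resources)); // []
console.log(missingResourceFields({ r0: { languageCode: "en", name: "ULT" } })); // [ 'r0.version' ]
```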

segments

The segments attribute is an array of segments. Each entry groups the corresponding segment of every resource together with the alignments between them.

  "segments": [
    {
      "resources": {...},
      "alignments": [...]
    },
    {
      "resources": {...},
      "alignments": [...]
    },
    ...
  ]

segments[n].resources

The segments[n].resources attribute is an object whose keys correspond to the keys in metadata.resources. In the example at the end of this section, r0 and r1 are the resource keys.
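Since the segment keys are expected to match the keys declared in metadata.resources, a consumer might verify that before processing a segment. The sketch below is one way to do it; `unknownResourceKeys` is a hypothetical helper, not part of wordmap-usfm.

```typescript
// Returns the keys used by a segment that are not declared in metadata.resources.
function unknownResourceKeys(
  declared: Record<string, unknown>,
  segmentResources: Record<string, unknown>
): string[] {
  return Object.keys(segmentResources).filter(
    (key) => !Object.prototype.hasOwnProperty.call(declared, key)
  );
}

const declaredResources = { r0: {}, r1: {} };
console.log(unknownResourceKeys(declaredResources, { r0: {}, r1: {} })); // []
console.log(unknownResourceKeys(declaredResources, { r0: {}, r2: {} })); // [ 'r2' ]
```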

Each value is an object with the attributes text:String, tokens:Array, and metadata:Object.

  • The text attribute holds the raw string of the segment.
  • The tokens attribute is an array of individual tokens as strings.
    • Later spec revisions will include tokens represented as data objects.
  • The metadata attribute is an object that holds data about the segment.
    • Currently, only a contextId attribute is required; it identifies where the segment belongs, such as a verse identifier.
    • Having metadata.contextId on each resource allows alignments to exist between different versification systems.

      "resources": {
        "r0": {
          "text": "Βίβλος γενέσεως Ἰησοῦ Χριστοῦ υἱοῦ Δαυὶδ υἱοῦ Ἀβραάμ.",
          "tokens": ["Βίβλος", "γενέσεως", "Ἰησοῦ", "Χριστοῦ", "υἱοῦ", "Δαυὶδ", "υἱοῦ", "Ἀβραάμ"],
          "metadata": {
            "contextId": "MAT001001"
          }
        },
        "r1": {
          "text": "The book of the genealogy of Jesus Christ, son of David, son of Abraham:",
          "tokens": ["The", "book", "of", "the", "genealogy", "of", "Jesus", "Christ", "son", "of", "David", "son", "of", "Abraham"],
          "metadata": {
            "contextId": "MAT001001"
          }
        }
      }

segments[n].alignments

The segments[n].alignments attribute is an array of individual alignments between tokens of the resources at the same level.

Each alignment is an object with the attributes score:Float, verified:Boolean, and one [key]:Array per resource, where each key matches a resource key.

  • The score attribute holds the alignment tool's confidence in this specific alignment.
  • The verified attribute is a boolean that records whether the alignment was generated or approved by a human.
  • Each remaining attribute is named after a resource key and holds an array of indexes into segments[n].resources[key].tokens for that resource.

The example below shows an alignment of the tokens above. An alignment to null (a token with no counterpart) can be represented by omitting it entirely. Alternatively, it can be represented with indexes on one side and an empty array on the other.

  "alignments": [
    {
      "score": 0.516905944153279,
      "r0": [0],
      "r1": [0, 1],
      "verified": false
    },
    {
      "score": 0.5363691430931895,
      "r0": [1],
      "r1": [3, 4],
      "verified": false
    },
    {
      "score": 0.5372550334365089,
      "r0": [2],
      "r1": [6],
      "verified": false
    },
    {
      "score": 0.4762634342491979,
      "r0": [3],
      "r1": [7],
      "verified": false
    },
    {
      "score": 0.46762244230161903,
      "r0": [4],
      "r1": [8, 9],
      "verified": false
    },
    {
      "score": 0.5253404588058129,
      "r0": [5],
      "r1": [10],
      "verified": false
    },
    {
      "r0": [6],
      "r1": [11, 12],
      "verified": true
    },
    {
      "r0": [7],
      "r1": [13],
      "verified": true
    }
  ]
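To make the index arrays concrete, the sketch below resolves indexes back to token strings and lists the tokens that no alignment mentions, using the r1 tokens and alignments from the example above. `tokensAt` and `unalignedIndexes` are hypothetical helpers, not part of wordmap-usfm.

```typescript
// Maps alignment indexes back to their token strings.
// Throws if an index does not point at an existing token.
function tokensAt(tokens: string[], indexes: number[]): string[] {
  return indexes.map((i) => {
    const token = tokens[i];
    if (token === undefined) {
      throw new RangeError(`token index ${i} is out of range`);
    }
    return token;
  });
}

// Lists token indexes that appear in no alignment, i.e. tokens whose
// alignment to null is represented by omission.
function unalignedIndexes(tokenCount: number, aligned: number[][]): number[] {
  const used: boolean[] = [];
  for (const group of aligned) {
    for (const i of group) used[i] = true;
  }
  const result: number[] = [];
  for (let i = 0; i < tokenCount; i++) {
    if (used[i] !== true) result.push(i);
  }
  return result;
}

const r1Tokens = ["The", "book", "of", "the", "genealogy", "of", "Jesus",
  "Christ", "son", "of", "David", "son", "of", "Abraham"];
const r1Aligned = [[0, 1], [3, 4], [6], [7], [8, 9], [10], [11, 12], [13]];

console.log(tokensAt(r1Tokens, [3, 4])); // [ 'the', 'genealogy' ]
console.log(unalignedIndexes(r1Tokens.length, r1Aligned)); // [ 2, 5 ]
```

In this example the two unaligned r1 tokens are both "of", which have no counterpart token on the r0 side.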

The example below is fabricated to show, in order: many-to-many, many-to-one, one-to-many, one-to-one, none-to-many, one-to-none, one-to-many verified, and many-to-one verified alignments. Unverified alignments are machine aligned; verified alignments are human aligned or human confirmed. Because the data is fabricated, its indexes are illustrative and do not all match the tokens shown earlier.

  "alignments": [
    {
      "score": 0.516905944153279,
      "r0": [0, 1],
      "r1": [1, 4],
      "verified": false
    },
    {
      "score": 0.5363691430931895,
      "r0": [1, 2],
      "r1": [4],
      "verified": false
    },
    {
      "score": 0.5372550334365089,
      "r0": [2],
      "r1": [6, 7],
      "verified": false
    },
    {
      "score": 0.4762634342491979,
      "r0": [3],
      "r1": [7],
      "verified": false
    },
    {
      "score": 0.46762244230161903,
      "r0": [],
      "r1": [8, 9],
      "verified": false
    },
    {
      "score": 0.5253404588058129,
      "r0": [5],
      "r1": [],
      "verified": true
    },
    {
      "r0": [6],
      "r1": [10, 12],
      "verified": true
    },
    {
      "r0": [7, 9],
      "r1": [13],
      "verified": true
    }
  ]
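The categories named for this example follow directly from how many indexes sit on each side of an alignment. The sketch below labels each r0/r1 pair from the fabricated example accordingly; `cardinality` is a hypothetical helper, not part of wordmap-usfm.

```typescript
type Side = "none" | "one" | "many";

// Describes one side of an alignment by the number of indexes it holds.
function side(indexes: number[]): Side {
  if (indexes.length === 0) return "none";
  return indexes.length === 1 ? "one" : "many";
}

// Labels an alignment as "<r0 side>-to-<r1 side>".
function cardinality(source: number[], target: number[]): string {
  return `${side(source)}-to-${side(target)}`;
}

// The r0/r1 index pairs from the fabricated example, in order.
const fabricated: Array<[number[], number[]]> = [
  [[0, 1], [1, 4]],
  [[1, 2], [4]],
  [[2], [6, 7]],
  [[3], [7]],
  [[], [8, 9]],
  [[5], []],
  [[6], [10, 12]],
  [[7, 9], [13]],
];

for (const [r0, r1] of fabricated) {
  console.log(cardinality(r0, r1));
}
// many-to-many, many-to-one, one-to-many, one-to-one,
// none-to-many, one-to-none, one-to-many, many-to-one
```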

Roadmap

  • Support extracting alignment data from USFM3. This will be useful when importing USFM into tC.
  • Support alignments that span verses.
  • Support alignments that span chapters.