@istex/istex-merge

v2.3.1

Library to build merged documents and generate Hal TEIs from them.

istex-merge

Install

npm install @istex/istex-merge

generateMergedDocument

Function to create a merged document from multiple documents with a set of rules.

Prerequisites

Mapping

A mapping example can be found here.

This JSON file's structure is as follows:

{
  "corpusName": true,
  "source": true,
  "sourceId": false,
  "sourceUid": {
    "action": "merge",
    "path": "sourceUids"
  },
  // ...
  "title.default": true,
  "title.en": true,
  "title.fr": true,
  "utKey": false,
  // ...
  "business.duplicates": {
    "action": "merge",
    "id": "sourceUid"
  },
  // ...
  "business.hasFulltext": false,
  "fulltextUrl": true
}

This file describes the fields that will be present in the generated merged document.

The default mapping is exported, which means you can build your own mapping from it without having to recreate everything. This can be done like so:

const { defaultMapping } = require('@istex/istex-merge');

// default: true
defaultMapping.authors = false;

Note: istex-merge can merge data coming from all sources. There are two possible scenarios:

  • Fields with a simple value (like a string): you can specify a path where the merged data will be stored in the final object. In the example above, the sourceUid field is merged and placed into sourceUids (plural, because the value becomes an array).
  • Fields with an array value (like business.duplicates): you must specify an id property (sourceUid in the example above) that is used to tell the values apart and remove potential duplicates when the values are objects.
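
To make the two scenarios concrete, here is a minimal sketch of what such a merge could look like. It is not istex-merge's actual implementation: the helpers mergeSimpleValues and mergeArrayValues are hypothetical, the sample documents are made up, and a flat duplicates field stands in for the nested business.duplicates path to keep the example short.

```javascript
// Hypothetical helpers illustrating the two "merge" scenarios.
// They are NOT part of the istex-merge API.

// Scenario 1: a simple value from each document is collected into an array.
function mergeSimpleValues(docs, field) {
  return docs.map((doc) => doc[field]).filter((value) => value !== undefined);
}

// Scenario 2: array values are concatenated, then deduplicated on an id property.
function mergeArrayValues(docs, field, id) {
  const seen = new Set();
  const merged = [];
  for (const doc of docs) {
    for (const item of doc[field] ?? []) {
      if (seen.has(item[id])) continue; // already merged from another document
      seen.add(item[id]);
      merged.push(item);
    }
  }
  return merged;
}

// Made-up documents, for illustration only.
const docs = [
  { sourceUid: 'hal$1', duplicates: [{ sourceUid: 'crossref$2' }] },
  { sourceUid: 'crossref$2', duplicates: [{ sourceUid: 'hal$1' }, { sourceUid: 'crossref$2' }] },
];

const sourceUids = mergeSimpleValues(docs, 'sourceUid');
const duplicates = mergeArrayValues(docs, 'duplicates', 'sourceUid');
```

In this sketch, sourceUids holds one entry per document, while duplicates keeps only the first object seen for each sourceUid.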

Rules

An example file describing the priority rules can be found here.

This JSON file's structure is as follows:

{
  "priorities": [
    "hal",
    "crossref",
    "pubmed",
    "sudoc"
  ],
  "keys": {
    "corpusName": [/*...*/],
    "source": [/*...*/],
    "sourceId": [/*...*/],
    "sourceUid": [/*...*/],
    // ...
    "title.default": [/*...*/],
    "title.fr": [/*...*/],
    "title.en": [/*...*/],
    "utKey": [/*...*/],
    // ...
    "business.hasFulltext": [/*...*/],
    "fulltextUrl": [/*...*/]
  }
}

The priority mechanism:

  • priorities defines the default priority order. It is applied to every field without a specific priority order.
  • keys.<field> defines a specific priority order for <field>. Use an empty array ([]) to tell istex-merge to use the default priority order.
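
The lookup described above can be sketched as follows. This is only an illustration of the documented behaviour, not the library's code; resolvePriorityOrder is a hypothetical helper and the rules object is a shortened example.

```javascript
// Hypothetical helper, not part of the istex-merge API.
// Returns the priority order that applies to a given field.
function resolvePriorityOrder(rules, field) {
  const specific = rules.keys[field];
  // A missing or empty array means "use the default order".
  return Array.isArray(specific) && specific.length > 0 ? specific : rules.priorities;
}

// Shortened example rules, for illustration only.
const rules = {
  priorities: ['hal', 'crossref', 'pubmed', 'sudoc'],
  keys: {
    'title.default': [],
    abstract: ['pubmed', 'crossref', 'hal'],
  },
};

const titleOrder = resolvePriorityOrder(rules, 'title.default');
const abstractOrder = resolvePriorityOrder(rules, 'abstract');
```

Here titleOrder falls back to the default priorities, while abstractOrder uses the field-specific list.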

The default rules are exported, which means you can build your own rules from them without having to recreate everything. This can be done like so:

const { defaultRules } = require('@istex/istex-merge');

// default: ['sudoc-theses', 'sudoc-ouvrages', 'hal', 'pubmed', 'crossref']
defaultRules.keys.abstract = ['pubmed', 'crossref', 'hal'];
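
Node.js caches required modules, so assigning to defaultRules directly changes the object seen by every other part of the process that requires the package. If that is a concern, one option is to work on a copy. The sketch below uses a stand-in object instead of the real export so that it runs on its own; its values are illustrative and are not the library's actual defaults.

```javascript
// Stand-in for: const { defaultRules } = require('@istex/istex-merge');
// The values below are illustrative, not the library's real defaults.
const defaultRules = {
  priorities: ['hal', 'crossref', 'pubmed', 'sudoc'],
  keys: { abstract: [] },
};

// The rules are plain JSON data, so a JSON round-trip yields an independent deep copy.
const customRules = JSON.parse(JSON.stringify(defaultRules));

// Only the copy is modified; defaultRules stays untouched.
customRules.keys.abstract = ['pubmed', 'crossref', 'hal'];
```

The copy can then be passed as the rules option in place of defaultRules.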

Usage

This library must be integrated in an environment with direct access to the docObjects.

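const { generateMergedDocument, defaultRules, defaultMapping } = require('@istex/istex-merge');

// Documents to merge; each one must contain a "source" field.
const docObjects = [/* {...}, {...}, {...} */];

// Use a specific priority order for the abstract field.
defaultRules.keys.abstract = ['pubmed', 'crossref', 'hal'];

// Disable the authors field in the mapping.
defaultMapping.authors = false;

const mergedDocument = generateMergedDocument(docObjects, { rules: defaultRules, mapping: defaultMapping });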

Example

Considering the following list of documents:

[
  {
    "source": "hal",
    "authors": [],
    "abstract": {
      "fr": "abstract.hal.fr",
      "en": "abstract.hal.en"
    }
  },
  {
    "source": "crossref",
    "authors": [
      "authors.crossref.1",
      "authors.crossref.2"
    ],
    "abstract": {
      "fr": "abstract.crossref.fr",
      "en": "abstract.crossref.en"
    }
  },
  {
    "source": "pubmed",
    "authors": [
      "authors.pubmed.1",
      "authors.pubmed.2"
    ],
    "abstract": {
      "fr": "abstract.pubmed.fr",
      "en": "abstract.pubmed.en"
    }
  },
  {
    "source": "sudoc",
    "authors": [
      "authors.sudoc.1",
      "authors.sudoc.2"
    ],
    "abstract": {
      "fr": "abstract.sudoc.fr",
      "en": "abstract.sudoc.en"
    }
  }
]

Note: The docObjects used to create the merged document MUST contain a source field.

I want to build a merged document according to the following rules:

  • By default, use data coming from "hal", then "crossref", then "pubmed" and finally "sudoc".
  • For abstract.fr, use data coming from "crossref", then "pubmed", then "sudoc" and finally "hal".
  • For abstract.en, use data coming from "pubmed", then "sudoc", then "crossref" and finally "hal".

I therefore use the following rules file:

{
  "priorities": [
    "hal",
    "crossref",
    "pubmed",
    "sudoc"
  ],
  "keys": {
    "authors": [],
    "abstract.fr": [
      "crossref",
      "pubmed",
      "sudoc",
      "hal"
    ],
    "abstract.en": [
      "pubmed",
      "sudoc",
      "crossref",
      "hal"
    ]
  }
}

This gives the following result:

{
  "source": "hal",
  "authors": [
    "authors.crossref.1",
    "authors.crossref.2"
  ],
  "abstract": {
    "fr": "abstract.crossref.fr",
    "en": "abstract.pubmed.en"
  },
  "origins": {
    "authors": "crossref",
    "abstract.fr": "crossref",
    "abstract.en": "pubmed",
    "sources": [
      "hal",
      "crossref",
      "pubmed"
    ]
  }
}

Description:

  • source: the base source
  • origins.<field>: the source whose data istex-merge used for <field>
  • origins.sources: an array listing all the sources used in the merged document
  • If the source at the top of the priority list has no data for a field (in our example, the prioritized source, hal, has no authors), istex-merge goes down the priority list until it finds a source with data for this field.
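
The fallback described in the last point can be sketched like this. It is a simplified illustration under stated assumptions, not istex-merge's real code: pickValue, getPath and isEmpty are hypothetical helpers, and treating empty arrays and empty strings as "no data" is an assumption drawn from the example above.

```javascript
// Hypothetical sketch of the fallback behaviour; not istex-merge's real code.

// Assumption: undefined, null, '' and [] all count as "no data".
function isEmpty(value) {
  return value === undefined || value === null || value === '' ||
    (Array.isArray(value) && value.length === 0);
}

// Read a dotted path such as "abstract.fr" from an object.
function getPath(obj, path) {
  return path.split('.').reduce((current, key) => (current == null ? undefined : current[key]), obj);
}

// Walk the priority order and return the first non-empty value with its origin.
function pickValue(docs, field, order) {
  for (const source of order) {
    const doc = docs.find((candidate) => candidate.source === source);
    const value = doc ? getPath(doc, field) : undefined;
    if (!isEmpty(value)) return { value, origin: source };
  }
  return undefined; // no source has data for this field
}

// Two of the example documents, shortened.
const docs = [
  { source: 'hal', authors: [], abstract: { fr: 'abstract.hal.fr' } },
  { source: 'crossref', authors: ['authors.crossref.1', 'authors.crossref.2'], abstract: { fr: 'abstract.crossref.fr' } },
];

const authors = pickValue(docs, 'authors', ['hal', 'crossref']);
const abstractFr = pickValue(docs, 'abstract.fr', ['hal', 'crossref']);
```

With these inputs, authors comes from crossref because hal has an empty list, while abstractFr comes from hal, which is first in the order and has data.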

generateHalTei

Function to generate a Hal TEI from a merged document.

Prerequisites

Generate a merged document using the generateMergedDocument function.

Usage

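const { generateMergedDocument, generateHalTei } = require('@istex/istex-merge');

// Documents to merge; each one must contain a "source" field.
const docObjects = [/* {...}, {...}, {...} */];

// Build the merged document with the default rules and mapping.
const mergedDocument = generateMergedDocument(docObjects);

// The TEI is returned as a string.
const halTeiAsString = generateHalTei(mergedDocument);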

You can also pass an options object to generateHalTei. This object is passed as is to xmlbuilder2 (the XML builder used by istex-merge). You can find all the available options here. For example, you can use this options object to pretty print the TEI like so:

const prettyPrintedTei = generateHalTei(mergedDocument, { prettyPrint: true });