npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

@treecg/actor-init-typeahead

v1.21.5

Published

TREE+Comunica Typeahead is a query engine for realtime text search over decentralized RDF datasets.

Downloads

2

Readme

TREE+Comunica Typeahead

TREE+Comunica Typeahead is a query engine for realtime text search over decentralized RDF datasets.

Internally, this is a Comunica module based on Comunica SPARQL, and can likewise be configured with additional modules.

Command Line Utilities

If you have installed this project as described on Github sources, you can test this actor, or its configuration files, using the bin/run.jsand bin/run-dynamic.js scripts. The bin/run.js script uses the preconfigured actor at engine-default.js, which gets created while installing the project (see the prepare script in package.json). The other script interprets the configuration file at runtime, making it more suitable for testing configurations.

For example, running bin/run.js https://treecg.github.io/demo_data/cht.nt aarde nederland will use the default configuration to search for the terms 'aarde' and 'nederland' in a TREE collection that is hosted at https://treecg.github.io/demo_data/cht.nt. Running this should yield this output:

# Discovered TREE nodes: 135
[1]
  Subject: https://data.cultureelerfgoed.nl/term/id/cht/d9121f36-5c7e-49ea-8d40-08521f7f8c2b
  # Matching Quads: 1
  Score:   [2,12,-52]
    http://www.w3.org/1999/02/22-rdf-syntax-ns#type
      https://data.cultureelerfgoed.nl/vocab/id/rce#Source
    http://purl.org/dc/terms/created
      2016-09-29T11:46:57Z
    https://data.cultureelerfgoed.nl/vocab/id/rce#isSourceOf
      https://data.cultureelerfgoed.nl/term/id/cht/7be94ac2-6e3f-4f4c-ada2-36f1cbc50457
    http://www.w3.org/2004/02/skos/core#prefLabel
      Bewogen aarde, aardkundig erfgoed in Nederland, 2007
    https://data.cultureelerfgoed.nl/vocab/id/rce#isSourceOf
      https://data.cultureelerfgoed.nl/term/id/cht/078b8a8e-0437-4b06-b498-95b0f0ef41ba
    http://www.w3.org/1999/02/22-rdf-syntax-ns#type
      http://www.w3.org/2004/02/skos/core#Concept
    http://www.w3.org/2004/02/skos/core#broader
      https://data.cultureelerfgoed.nl/term/id/cht/f772d0b1-5828-4786-b77f-37eabc8c19f4
    http://purl.org/dc/terms/creator
      https://cultureelerfgoed.poolparty.biz/user/moutp
    https://data.cultureelerfgoed.nl/vocab/id/rce#hasConceptStatus
      https://data.cultureelerfgoed.nl/term/id/cht/c58475d5-0795-4623-b4be-ea1524f4b4fb
    https://data.cultureelerfgoed.nl/vocab/id/rce#isSourceOf
      https://data.cultureelerfgoed.nl/term/id/cht/b2c54b42-bdc5-4542-9c77-a4e626a7d450
    http://www.w3.org/2004/02/skos/core#inScheme
      https://data.cultureelerfgoed.nl/term/id/cht/7ca1a6b4-85ce-4244-a097-0ad177b5575c
    https://data.cultureelerfgoed.nl/vocab/id/rce#isSourceOf
      https://data.cultureelerfgoed.nl/term/id/cht/0e3a026d-e227-4295-a1a6-0a7af073801b
    https://data.cultureelerfgoed.nl/vocab/id/rce#isSourceOf
      https://data.cultureelerfgoed.nl/term/id/cht/f4e9eec3-2f0e-4682-adf6-82ef8224dc59
    http://schema.semantic-web.at/ppt/inSubtree
      https://data.cultureelerfgoed.nl/term/id/cht/f772d0b1-5828-4786-b77f-37eabc8c19f4

All quads related to the matched subject are returned, while the only the fourth quad matches the input.

Usage within other applications

This package has been developed and tested on Linux and macOS, on Node.JS version 14.

To use this library within your own Javascript project, install it from NPM as follows:

$ npm i @treecg/actor-init-typeahead

You can then create a query engine (with default engine) as follows:

const newEngine = require('@treecg/actor-init-typeahead').newEngine;
const myEngine = newEngine();

An engine may also be created dynamically from a custom configuration file:

const newEngineDynamic = require('@treecg/actor-init-typeahead').newEngineDynamic;
const myEngine = await newEngineDynamic({ configResourceUrl: 'path/to/config.json' });

Alternatively, to use a static version of a custom configuration we recommend to create an NPM script:

"scripts": {
    "prepare": "comunica-compile-config cpath/to/config.json urn:comunica:typeaheadinit > comunica-engine.js"
}

Running npm run prepare will then create a comunica-engine.js file which can be used as:

const myEngine = require('comunica-engine');

Once you have an engine, you can use it to query the Web as follows:

const expectedPredicateValues = {
    'http://www.w3.org/2004/02/skos/core#prefLabel': ['aarde', 'nederland'],
    'http://www.w3.org/2004/02/skos/core#altLabel': ['aarde', 'nederland'],
}

const query = {
    numResults: 5,
    urls: ['https://treecg.github.io/demo_data/cht.nt'],
    treeNodes: [],
    expectedDatatypeValues: {},
    expectedPredicateValues,
};

const results = engine.query(query);
results.on('data', d => {
    console.log(JSON.stringify(d));
});

The return value of the query method is an AsyncIterator, which may be consumed by subscribing to any 'data' events it emits. Each data event contains an IResult object, defined as:

import type * as RDF from 'rdf-js';

interface IResult {
  knownTreeNodes: ITreeNode[];
  rankedSubjects: IRankedSubject[];
}

interface ITreeNode {
  url: string;
  values: TreeValues;
}

interface IRankedSubject {
  score: number[];
  subject: string;
  matchingQuads: RDF.Quad[];
  quads: RDF.Quad[];
}

The returned TREE nodes can be passed on to subsequent query calls, so that not every query has to restart from the collection roots.

Configuration

Like all Comunica modules, this actor can be configured through Components.js. After getting acquainted with this framework, you can start making changes to the configuration files in config/.

For example, if you want to add an additional metric for ranking the found results, have a look at config/sets/rdf-score.jsonld. At the moment this file specifies that three actors are used, with their order set using the beforeActor property.

To test these any changes, either rebuild the engine-default.js file by running npm run prepare, or by using the bin/run-dynamic.js script.

Recipes

To use this actor in a real application, here are a few patterns you might want to reuse.

Close running iterators when a new query is fired

Keep track of the current results iterator, and close it before starting a new one:

iterator.removeAllListeners();
iterator.close();

This will stop the running query from traversing any further data pages.

Reuse discovered tree nodes

let nodes = null;

function query(...) {
    const query = {
        numResults,
        urls: roots,
        treeNodes: nodes,
        expectedDatatypeValues,
        expectedPredicateValues,
      };

    iterator = engine.query(query);
    iterator.on('data', d => {
        nodes = d.knownTreeNodes;
    }
}

This will make subsequent queries more efficient, as it will reuse the discovered tree nodes to bootstrap the node traversal.

Prefetch the dataset roots

Related to the previous point, you can parse the root nodes on page load:

let nodes = null;
function prefetch(urls) {
    engine.prefetch(urls).then((discoveredNodes) => {
        nodes = discoveredNodes;
    })
}

The query engine exposes a prefetch method that only returns the discovered nodes on the specified root URLs.

Normalize your input

Literal normalization actors can be configured to improve the recall, but this only works if the input is normalized in the same way as the found results. The engine exposes a normalizeInput method for this reason, which can be used to normalize a list of search terms to a list of normalized terms:

const normalizedInput = await engine.normalizeInput(['belgië']);

const expectedPredicateValues = {
    'http://www.w3.org/2004/02/skos/core#prefLabel': normalizedInput,
    'http://www.w3.org/2004/02/skos/core#altLabel': normalizedInput,
}

Use a Webworker per data source

Processing a lot of RDF data freeze your web page, so it's a good idea to use a Webworker in any case. But especially if you're querying over multiple datasets, as these can be handled in parallel. Each results object also specifies the scores the result received, so the results from different data sources can be recombined in the main thread. For example of how this can be implemented, see our demo code.