kgx

v0.2.0

Helpful tools for (RDF/Linked Data) Knowledge Graph Exchange and Exploration

kgx - knowledge graph toolkit


status: pretty dynamic, still changing the API when I feel like it

Motivation

Sometimes I work with RDF data. I couldn't find any tools that did all the things I wanted, or generally behaved in a way I found comfortable. So I built this as a place to put the things I kept needing.

Biggest things are probably:

  1. Synchronous. Yes, async is great, especially with async/await, but you're going to have some of the data in memory, and when you do, things are simpler. Let's build the API around that, and then have a module for synchronizing that in-memory data with the remote data. (Maybe with async iterators one could now make async code look as good. I might try that some day.)

  2. Converting to/from JavaScript types. I don't always want to work with graphs, and especially with NamedNodes, Literals, etc, so the API tends to convert freely between "native" representation and RDF representations.

  3. Organize the API around quadstores, aka graphstores, aka databases, aka datasets, aka knowledge bases. We call it a "kb" in the code. You make a kb, you add stuff to it, you look at what's in it, you change stuff, you delete stuff, you mirror it to a server somewhere, etc. (It's a "kb" not a "kg" because it can contain many distinct knowledge graphs and their metadata.)

  4. We use TriG/SPARQL-like strings in the API. Most API calls are not performance sensitive, and using a nice RDF syntax is much easier than putting together some complex JavaScript expression.
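Point 2 can be illustrated with a standalone sketch of the kind of conversion implied. The helper names toNative/fromNative and the RDF/JS-style term objects are assumptions for illustration; kgx's real conversion functions may be named and shaped differently:

```javascript
// Hypothetical toNative/fromNative helpers illustrating native <-> RDF
// conversion over RDF/JS-style term objects. Not kgx's actual API.
const XSD = 'http://www.w3.org/2001/XMLSchema#'

function toNative (term) {
  if (term.termType === 'Literal') {
    const dt = term.datatype && term.datatype.value
    if (dt === XSD + 'integer') return parseInt(term.value, 10)
    if (dt === XSD + 'decimal' || dt === XSD + 'double') return parseFloat(term.value)
    if (dt === XSD + 'boolean') return term.value === 'true'
    return term.value                      // strings and unknown datatypes
  }
  return term                              // NamedNodes, BlankNodes pass through
}

function fromNative (value) {
  const lit = (v, t) => ({ termType: 'Literal', value: String(v),
                           datatype: { termType: 'NamedNode', value: XSD + t } })
  if (typeof value === 'number') return lit(value, Number.isInteger(value) ? 'integer' : 'double')
  if (typeof value === 'boolean') return lit(value, 'boolean')
  if (typeof value === 'string') return lit(value, 'string')
  return value                             // already a term
}
```

With helpers like these the rest of the API can accept and return plain numbers, booleans, and strings, only exposing term objects when you ask for them.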

Example

From example tbl.js

const kgx = require('kgx')
const kb = kgx.memKB()

async function main () {
  const tbl = kb.named('https://www.w3.org/People/Berners-Lee/card#i')
  await kb.fetch(tbl)
  console.log('Got %d triples', [...kb].length)
  // => Got 87 triples

  // maybe: for (const q of kb) console.log(kb.quadAsNQ(q))

  for (const { title, name } of
       kb.query('?tbl foaf:title ?title; foaf:name ?name',
                { bind: { '?tbl': tbl } })) {
    console.log(title, name)
    // => Sir Timothy Berners-Lee
  }
}

main().catch(console.error)

API documentation

Only parts of the API are currently documented, sorry.

See API Documentation


Thoughts / Plans / Notes

From here on out is just a place where I write down ideas, usually after I've built something at a higher level and am thinking about whether I can make a general version to go in kgx.


kgx-server sources...

  • runs a web server to show those sources

kgx-view sources...

  • runs a private kgx server and opens the result in the browser
  • or loads it into the current instance if there is one? at a standard port??

kgx-from-{csv|nt|turtle|jsonld|}

  • web centric, not just parsers
  • include ldfetch, all-your-base, headless-chrome-crawler, metascraper
  • include progress and error reporting
  • include some of the HTML stuff, maybe
  • so, every load of a URL results in at least a Fetch (which might have failed)
  • FETCHID :fetched NG gets put into DG
  • NG :origin <https://google.com:5151>
  • NG :source <https://google.com:5151/foo/bar/baz>

kgx-to-{...}

  • as currently in quadsite
    • shape, format, dateformat, linkformat

library:

  • new kgx.KB()
  • kb.tablify(shape) returns a kgx.Table() .rows, .headings
  • kb.filter(f) -> read-only kb
  • kb.load(src), kb.addSource(src), kb.loader.addSource(src) USE CRAWLER
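As a rough illustration of the tablify idea: the Table shape with .rows and .headings comes from the bullet above, but the function body, the "solutions" input, and everything else here are assumptions:

```javascript
// Guess at what kb.tablify(shape) might return: a Table with .headings
// and .rows. Here "solutions" (plain binding objects) stand in for query
// results; all names are assumptions, not kgx's real API.
function tablify (solutions, headings) {
  return {
    headings,
    rows: solutions.map(s => headings.map(h => s[h]))
  }
}

const t = tablify(
  [{ title: 'Sir', name: 'Tim' }, { title: 'Countess', name: 'Ada' }],
  ['title', 'name'])
// t.rows => [['Sir', 'Tim'], ['Countess', 'Ada']]
```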

kgx-crawler

  • separate process from kgx library, kgx server
    • maybe just reads/writes to local fs
    • https://www.digitalocean.com/community/tutorials/how-to-install-and-secure-redis-on-debian-9
    • https://www.npmjs.com/package/headless-chrome-crawler
    • https://news.ycombinator.com/item?id=16437082
    • https://github.com/brendonboshell/supercrawler + puppeteer
    • https://www.browserless.io/ --- remote puppeteer

Following / Crawling (planned)

kb.crawl({owlImports: true, predicates: true, classes: true})
kb.crawl(['some url', 'some other url'])

How is provenance recorded?

  1. only fetch triples, and graph name is source
  2. only fetch triples, and graph name is linked to source
  3. okay to fetch quads, but .isolate them, then link to source

kb.isolate() returns a modified kb (or modifies in place? Or just operates on a quad list?) where any NamedNode graph names have been replaced by new BlankNodes, and the default graph is placed into a named graph, whose label (another new BlankNode) is returned. isolate() allows multiple datasets to co-exist in one dataset without interacting until/unless we query across graphs.
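A minimal sketch of isolate() operating on a plain quad list. Assumptions: a quad is { subject, predicate, object, graph } with graph === null meaning the default graph; this is not kgx's actual implementation:

```javascript
// Sketch of isolate(): every graph label (including the default graph)
// gets a fresh blank node, so merged datasets can't collide on graph
// names. The default graph's new label is returned for provenance links.
function isolate (quads) {
  let n = 0
  const fresh = new Map()                  // old graph label -> new blank node
  const rename = g => {
    if (!fresh.has(g)) fresh.set(g, '_:g' + n++)
    return fresh.get(g)
  }
  const defaultGraphLabel = rename(null)   // name the default graph first
  return {
    defaultGraphLabel,
    quads: quads.map(q => ({ ...q, graph: rename(q.graph) }))
  }
}
```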

Linking to source is done like:

_:gr332 { <a> <b> _:gr332_1 }
_:gr332_1 { <a> <b> <d> }
:fetch332
    providedDefaultGraph _:gr332;
    providedGraph _:gr332, _:gr332_1;
    completed $time;
    date $time;
    lastModified $time;
    fromURL $url
    .

vs

$url { <a> <b> _:gr332_1 }
_:gr332_1 { <a> <b> <d> }
# optional:
:fetch332
    providedDefaultGraph _:gr332;
    providedGraph _:gr332, _:gr332_1;
    completed $time;
    date $time;
    lastModified $time;
    fromURL $url
    .

Maybe it's an option:

  • defaultGraphName: 'default' | 'source' | 'blank' |

But 'blank' (with isolate()) is the only one that can't get out of control, so it seems like best practice. But it also seems kinda complicated.

It means we kinda want:

kb.crawler.sources = [ { url, lastStarted, lastEnded, **defaultGraphNode** } ]

so you can find the defaultGraphNode. Right? You could also find it via querying. Basically, Crawler maintains a KB where it owns the default graph, keeping it full of metadata about fetches, and all the named graphs are what they are.

On-Demand (Lazy) Data (planned)

kb.provide(pattern, providerFunction)

Add the pair to the set of active providers. The providers are used whenever looking in the kb for data. Can be used to implement overlayKb (unionKB? mergeKB?), and various otherwise-expensive tricks.
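A toy sketch of how such providers could be consulted during lookup. makeKB, match, and the pattern shape (null fields as wildcards) are all assumptions, not kgx's real API:

```javascript
// Toy provider mechanism: providers are consulted on every lookup, in
// addition to stored quads, so data can be produced on demand.
function makeKB () {
  const quads = []
  const providers = []
  const matches = (pat, q) =>
    ['subject', 'predicate', 'object'].every(k => pat[k] == null || pat[k] === q[k])
  return {
    add (q) { quads.push(q) },
    provide (pattern, fn) { providers.push(fn) },
    * match (pattern) {
      for (const q of quads) if (matches(pattern, q)) yield q
      // call every provider; a real implementation would index providers
      // by their registered pattern and skip non-overlapping ones
      for (const fn of providers) {
        for (const q of fn(pattern)) if (matches(pattern, q)) yield q
      }
    }
  }
}
```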

Unclear if we want:

  • provideBindings (Solutions), nice if the pattern has lots of constants in it and maybe some joins. Basically backward chaining. Could make answering some kinds of queries super efficient; you never actually need to turn things into quads.
  • provideQuads, simpler in the simple cases, especially like looking for all quads.
  • provideTriples, even simpler, and lets the system offer provenance pointing to this providerFunction.

Maybe that's settled by an options parameter.

Rules (planned)

Example File ruleset1.js

const ruleset = [
  {
    if: '?person foaf:firstName ?first; foaf:lastName ?last',
    do: v => { v.name = v.first + ' ' + v.last },
    then: '?person foaf:name ?name'
  }
]

ruleset.name = 'Name Vocabulary Conversion'
ruleset.strategy = 'Forward'

module.exports = ruleset

Use like:

kb.addRules(require('ruleset1'))

Variations:

  • if/then, all variables/bnodes match, pure datalog
  • if/then with fresh blank nodes in the then-clause, such as due to a [ ] or ( ) construct; this is now Horn logic (which is Turing complete). Not obvious how to implement backward-chaining with this without FOL-style "Terms". Maybe we make arrays (lists) native, and use them?
  • iff/then, implies the same rule with clauses swapped
  • if/do, just executable, forward only
  • if/do/then, a way to execute builtins to define vars for then (but best to make them side-effect free and using no data except what's in the argument, which is why they are in a separate module in the example)
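The if/do/then variation could be sketched as a single forward pass over pre-computed bindings. applyRule and the bindings shape are assumptions; in the real engine the if-clause would be matched against the kb to produce the bindings:

```javascript
// One forward pass of an if/do/then rule over pre-computed bindings.
// The do-builtin computes derived vars; the then-pattern plus bindings
// together describe the triples to conclude.
function applyRule (rule, bindingsList) {
  return bindingsList.map(v => {
    const bound = { ...v }
    if (rule.do) rule.do(bound)            // side-effect free builtin
    return { pattern: rule.then, bindings: bound }
  })
}

const rule = {
  if: '?person foaf:firstName ?first; foaf:lastName ?last',
  do: v => { v.name = v.first + ' ' + v.last },
  then: '?person foaf:name ?name'
}

const out = applyRule(rule, [{ person: '<tim>', first: 'Tim', last: 'Berners-Lee' }])
// out[0].bindings.name => 'Tim Berners-Lee'
```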

Provenance

Provenance chain can use graph label, at least when only triples are concluded. What happens when you want a rule about provenance, though? Terms/tuples seem much better for this than quads. Like, instead of graph literals, just use lists of triples, where triples are spo lists. But those are harder to search when we don't know the provenance.

Do .isolate, and the .provide and Crawler stuff, help with this? Do it just like that: always output isolated stuff, and link it with the provenance.

(If someone equates the graph labels, we'll get lost, though.)

_:gr007 { <a> <b> _:gr007_1 }
_:gr007_1 { <a> <b> <d> }
:fetch007
    providedDefaultGraph _:gr007;
    providedGraph _:gr007, _:gr007_1;
    completed $time;
    date $time;
    lastModified $time;
    fromRule $ruleID       # this is the only different part
    .

We're going to need views to be near-JS-level performance much of the time, via provide, I think. kb.provide inputs JS objects, kb.view outputs them, and we need to make sure there usually isn't a combinatorial explosion of joins in the middle.

owl:InverseFunctionalProperty

As a special case, this reasoning can be done like:

const kb2 = kgx.owlifp.rewrite(kb1, prefns)

It's equivalent to running the IFP rule and the equate rules, but

  • doesn't use the rules engine or anything sophisticated
  • doesn't chain; so it's only really appropriate for Datatype Properties, where chaining isn't needed
  • picks one of the values and discards the rest (you can give the preferred namespace to keep)
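A standalone sketch of that rewrite over [s, p, o] triples. The ifpRewrite name and signature are assumptions and kgx.owlifp.rewrite's real behavior may differ; the idea is just: subjects sharing a value for an inverse-functional property get merged, keeping the subject from the preferred namespace:

```javascript
// Sketch of a one-shot IFP rewrite (no rules engine, no chaining):
// pick a canonical subject per shared IFP value, rewrite every triple
// to use it, and drop the duplicates that result.
function ifpRewrite (triples, ifp, preferredNS) {
  const canonical = new Map()              // ifp value -> chosen subject
  for (const [s, p, o] of triples) {
    if (p !== ifp) continue
    const cur = canonical.get(o)
    if (cur === undefined ||
        (!cur.startsWith(preferredNS) && s.startsWith(preferredNS))) {
      canonical.set(o, s)
    }
  }
  const alias = new Map()                  // subject -> canonical subject
  for (const [s, p, o] of triples) {
    if (p === ifp) alias.set(s, canonical.get(o))
  }
  const rename = t => alias.get(t) || t
  const seen = new Set()
  const out = []
  for (const [s, p, o] of triples) {
    const t = [rename(s), p, rename(o)]
    const key = JSON.stringify(t)
    if (!seen.has(key)) { seen.add(key); out.push(t) }
  }
  return out
}
```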

Issue: should we use keys instead of IFP? I don't understand the DL-Safe issue in https://www.w3.org/TR/owl2-syntax/#Keys

See:

  • https://www.w3.org/TR/owl2-syntax/#Inverse-Functional_Object_Properties
  • https://www.w3.org/TR/owl2-mapping-to-rdf/

This is used for implementing movable schemas.