neo4jkb

v0.1.35

Published

3 years ago

A graph knowledge base implemented in neo4j.

kengz

neo4j kb knowledge base ai

neo4jKB

Documentation

Read the docs here. Refer to test/ for usage.

Development is still underway, so the documentation will be updated continuously.

Installation

npm i --save neo4jkb

Ensure that you have neo4j installed. From the terminal, run neo4j start, then change your password (if you haven't already) using:

curl -H "Content-Type: application/json" -X POST -d '{"password":"YOUR_NEW_PASSWORD"}' -u neo4j:neo4j http://localhost:7474/user/neo4j/password

You can go to http://localhost:7474/ for the browser GUI.

Backup

Use the neo4j-shell; files will be saved to ${NEO4J_HOME}.

Export:

export-graphml -o backup.graphml -t -r

Import:

import-graphml -i backup.graphml -t

Usage

// import and initialize
var KB = require('neo4jkb')({ NEO4J_AUTH: 'neo4j:neo4j' })

// node label
var labelNode = 'test',
  // nodes A, B
  propA = KB.cons.legalize({ name: 'A', hash_by: 'name' }),
  propB = KB.cons.legalize({ name: 'B', hash_by: 'name' }),

  // edge label
  labelEdge = 'test_next',
  // edge E from (a)-[e]->(b)
  propE = KB.cons.legalize({ name: 'E', hash_by: 'name' })


// build the nodes; addNode returns a promise, so return it directly
function buildNodes() {
  return KB.addNode(
    [[propA, labelNode]],
    [[propB, labelNode]]
  )
  // .then(console.log)
}

// build the edges
function buildEdges() {
  return KB.addEdge(
    // A -> B
    [[propA], [propE, labelEdge], [propB]]
  )
  // .then(console.log)
}

// build the graph: buildNodes, then buildEdges
function buildGraph() {
  return buildNodes()
    .then(buildEdges)
}


buildGraph()
  .catch(console.error)
// A simple graph is built. Go to localhost:7474 to query and see it.

Tests

To run the tests, clone this repo, set the environment variable NEO4J_AUTH=<username>:<password> (or save it in a .env file if you like), then run npm test.

DB migration

Install the neo4j-shell-tools for db migration; use export-graphml -o backup.graphml -t -r and import-graphml -i backup.graphml -t from within neo4j-shell. Files will be saved to ${NEO4J_HOME}.

KB standard (basic)

We use a graph knowledge base (KB) to encode generic knowledge and relationships. It is implemented on a graph database; we chose Neo4j for the purpose. A graph consists of individual nodes connected by edges.

A `node`:

is a unit of knowledge
encodes type of information by Labels - an array of strings.
encodes information by prop - a flat JSON.

An `edge`:

encodes relations between two nodes by a single string Label.
encodes information about the relation by prop - a flat JSON.
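
As a rough illustration of the shapes described above, a node and an edge can be written as plain JavaScript objects. The isFlat helper and the sample values are hypothetical and not part of neo4jKB; the sketch assumes that "flat JSON" means no objects or arrays among the prop values.

```javascript
// Hypothetical helper, not part of neo4jKB: a prop is treated as "flat"
// when none of its values is an object or an array (an assumption).
function isFlat(prop) {
  return Object.keys(prop).every(function (key) {
    var value = prop[key]
    return value === null || typeof value !== 'object'
  })
}

// A node: Labels as an array of strings, plus a flat prop.
var node = {
  labels: ['document'],
  prop: { name: 'document1', hash_by: 'name', hash: 'document1' }
}

// An edge: a single string Label, plus a flat prop.
var edge = {
  label: 'refers',
  prop: { name: 'E', hash_by: 'name', hash: 'E' }
}

console.log(isFlat(node.prop)) // true
console.log(isFlat(edge.prop)) // true
console.log(isFlat({ name: 'A', meta: { nested: true } })) // false
```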

KB constraints:

all nodes and edges must have the following fields in their prop:

hash_by: the field used to hash this node. e.g. name.
hash: the actual hash string, e.g. "document1".
updated_by: The hash-string of the prop's updater.
updated_when: The timestamp of when the author updated the prop. Same format as Date.now().
created_by: The hash-string of the prop's creator. Doesn't show in constrain.js but is built in to KB_builder.js.
created_when: The timestamp of when the author created the prop. Doesn't show in constrain.js but is built in to KB_builder.js.
outside of prop, each node may have zero or more Labels, and each edge must have exactly one Label.
note that although hash_by and hash are required for edges, enforcing uniqueness on them is optional: you can use the hash in your custom query(), but addEdge allows duplicate edge hashes. In fact, addEdge hashes by LabelE and the hashes of the source and target nodes, i.e. there can be only one edge of a given label between two distinct nodes.
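
The constraints above can be sketched as a small check. The field names come from the list; the helpers missingFields and edgeKey are hypothetical and are not the functions in constrain.js or KB_builder.js. The edge key only illustrates the idea of identifying an edge by its label and the hashes of its end nodes.

```javascript
// Required prop fields, as listed in the KB constraints.
var REQUIRED_FIELDS = [
  'hash_by', 'hash',
  'updated_by', 'updated_when',
  'created_by', 'created_when'
]

// Hypothetical helper: names of the required fields absent from a prop.
function missingFields(prop) {
  return REQUIRED_FIELDS.filter(function (field) {
    return !Object.prototype.hasOwnProperty.call(prop, field)
  })
}

// Hypothetical edge identity: one edge per (source hash, label, target hash).
// The '|' separator is an arbitrary choice for this sketch.
function edgeKey(sourceHash, labelE, targetHash) {
  return [sourceHash, labelE, targetHash].join('|')
}

var prop = { name: 'A', hash_by: 'name', hash: 'A' }
console.log(missingFields(prop))
// [ 'updated_by', 'updated_when', 'created_by', 'created_when' ]

console.log(edgeKey('A', 'test_next', 'B')) // A|test_next|B
```

Two edges with the same label between the same pair of nodes produce the same key, which is the sense in which only one such edge can exist.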

Permissible graph operations:

create/update {nodes, edges}
search {nodes, edges} (this is rich and requires data-ordering)
delete {nodes, edges}. Deleting a node also deletes its edges; deleting an edge leaves its nodes unaffected.
set/remove {nodes, edges}
micro properties: degree, component, connectivity
macro properties: neighborhood, shortest distance, span, SCC, partition, etc.
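
The delete semantics in the list above map onto Cypher's DETACH DELETE and DELETE clauses. The following is a minimal sketch of how such query strings could be built; deleteNodeQuery, deleteEdgeQuery and checkLabel are hypothetical names, not the neo4jKB API, and the $hash parameter syntax assumes a newer Neo4j version (older versions used {hash}).

```javascript
// Labels cannot be passed as Cypher parameters, so they are interpolated;
// this check restricts the interpolation to plain identifiers.
function checkLabel(label) {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(label)) {
    throw new Error('invalid label: ' + label)
  }
  return label
}

// Deleting a node also deletes its edges: DETACH DELETE.
function deleteNodeQuery(label) {
  return 'MATCH (n:' + checkLabel(label) + ' {hash: $hash}) DETACH DELETE n'
}

// Deleting an edge leaves both end nodes in place: plain DELETE on e.
function deleteEdgeQuery(labelE) {
  return 'MATCH (a {hash: $source})-[e:' + checkLabel(labelE) +
    ']->(b {hash: $target}) DELETE e'
}

console.log(deleteNodeQuery('test'))
// MATCH (n:test {hash: $hash}) DETACH DELETE n
console.log(deleteEdgeQuery('test_next'))
// MATCH (a {hash: $source})-[e:test_next]->(b {hash: $target}) DELETE e
```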

Authorship

All knowledge must be created by users, thus created_by and updated_by are mandatory fields. We use the user ID as the hash string since it's the only constant hash, and is universal to all adapters. Using the username as hash, despite its convenience, is costly whenever the username changes (update is O(2n)).

As a tradeoff, we will provide an easy lookup function to yield the user node on inputting an ID, or any node with an authorship. For the timestamp, we will provide a chrono method too. (soon)

KB standard (extended)

If global conflict may exist for a hash, localize it per owner-user by <userHash>#<hashStr>.
KB graph paths should represent action patterns, e.g. (user1)-[:assigns]->(task)-[:to]->(user2), so (n)-[:to]->(user) implies n is given to, or belongs to, user, i.e. a path/relationship transition.
use proper English in cypher, e.g. (a)-[:assigns]->(t), then transition by tense: (a)-[:assigned]->(t). Deprecate by past tense. You can also use the continuous tense, like (c)-[:doing]->(t). Preference: (a)-[:prefers]->(sushi:lunch); (a)-[:prefers]->(cold:weather)
advantage: since cypher is so much like English, one can in fact parse a subset of English sentences into cypher.
generic auto context-mapper, then graph constructor, e.g. for a sentence never seen before, NLP parses 'gdoc1 refers gdoc2' in the form (source)-[action]->(outcome) as (gdoc1)-[ref]->(gdoc2), mapping the action to a standard value, e.g. {links, link, refers, refs} => ref, using word2vec and metric closeness. Then parse source and outcome by MATCH and hash.
state transition and causality
add <notes> parses into (kengz)-[adds]->(<notes>).
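
Two of the ideas above, hash localization and sentence-to-path parsing, can be sketched in a few lines. All function names here are hypothetical. Note that the fixed synonym table is a stand-in for the word2vec and metric-closeness step, which this sketch does not implement, and the parser only handles three-word sentences.

```javascript
// Localize a hash per owner-user as <userHash>#<hashStr>.
function localizeHash(userHash, hashStr) {
  return userHash + '#' + hashStr
}

// Stand-in for the word2vec step: a fixed synonym table (an assumption).
var ACTION_SYNONYMS = { links: 'ref', link: 'ref', refers: 'ref', refs: 'ref' }

// Map an action word to its standard value; unknown words pass through.
function normalizeAction(word) {
  var key = word.toLowerCase()
  return Object.prototype.hasOwnProperty.call(ACTION_SYNONYMS, key)
    ? ACTION_SYNONYMS[key]
    : key
}

// Parse "<source> <action> <outcome>" into a cypher-like path pattern.
function sentenceToPattern(sentence) {
  var words = sentence.trim().split(/\s+/)
  if (words.length !== 3) {
    throw new Error('expected "<source> <action> <outcome>"')
  }
  return '(' + words[0] + ')-[:' + normalizeAction(words[1]) + ']->(' + words[2] + ')'
}

console.log(localizeHash('user1', 'document1')) // user1#document1
console.log(sentenceToPattern('gdoc1 refers gdoc2')) // (gdoc1)-[:ref]->(gdoc2)
console.log(sentenceToPattern('gdoc1 links gdoc2')) // (gdoc1)-[:ref]->(gdoc2)
```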

Neo4j Use cases

real time recommendation: by graph expansion and detection of change
master data management: who's reporting to what task and people
fraud detection: can uncover fraud that sits a few steps away from the source, using final in-degree and loops
graph based search
identity and access management

Advantages

intuitiveness
speed
agility: natural model (schema-free, expansive), proper query language

What can be solved by AI exclusively?

quick KB
specific automation/proxy task
secretarial work
thus most vitally, context awareness and semantic understanding

Todo

search engine
add other macro and micro graph property methods
chronos method
permission, belongs_to, context tag, priority level

Changelog

Jan 2016

added mocha using chai library for test; coverage by istanbul.
create/update KB graph methods
search KB graph methods; do whatever you want with the results: <filter> then RETURN|DELETE|DETACH DELETE
delete KB graph methods from search
set/remove KB graph methods
shortest-paths KB graph methods
add/get as unified methods of all above
timestamp in cons.now uses the ISO 8601 format, e.g. 2016-01-22T15:07:25.550Z
list the set of permissible query ops
support NODE_ENV=development: all labels created will be prepended with 'test_'. This allows one to isolate the effects of devs and tests from the KB, as well as easy cleaning post-test.
support sequential transformations
cons.legalize also acts as a quick legal prop constructor
add sorter, picker as transformer methods
add flattenIndex as generic matrix-to-string formatter
add opsHead to opsRe in lib/constrain.js for chaining
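
Two of the changelog items above, the development label prefix and the ISO 8601 timestamp, are simple enough to sketch. The helper names envLabel and isoNow are hypothetical and do not reflect how neo4jKB implements them; the sketch only shows the described behavior.

```javascript
// Hypothetical helper: prepend 'test_' to a label when the environment
// is 'development', otherwise leave the label unchanged.
function envLabel(label, env) {
  return env === 'development' ? 'test_' + label : label
}

// ISO 8601 timestamp, the format the changelog describes for cons.now.
function isoNow() {
  return new Date().toISOString()
}

console.log(envLabel('task', 'development')) // test_task
console.log(envLabel('task', 'production')) // task
console.log(isoNow()) // an ISO 8601 string such as 2016-01-22T15:07:25.550Z
```

In practice the second argument would come from process.env.NODE_ENV.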