neo4jkb
v0.1.35
neo4jKB
A graph knowledge base implemented in neo4j.
Documentation
Read the docs here. Refer to test/ for usage.
Improvement is still underway, so it will be continuously updated.
Installation
npm i --save neo4jkb
Ensure that you have neo4j installed. From the terminal, run `neo4j start`, then change your password (if you haven't already) using `curl -H "Content-Type: application/json" -X POST -d '{"password":"YOUR_NEW_PASSWORD"}' -u neo4j:neo4j http://localhost:7474/user/neo4j/password`. You can go to http://localhost:7474/ for the browser GUI.
Backup
Use the `neo4j-shell`; files will be saved to `${NEO4J_HOME}`:
Export:
export-graphml -o backup.graphml -t -r
Import:
import-graphml -i backup.graphml -t
Usage
// import and initialize
var KB = require('neo4jkb')({ NEO4J_AUTH: 'neo4j:neo4j' })

// node label
var labelNode = 'test',
  // nodes A, B
  propA = KB.cons.legalize({ name: 'A', hash_by: 'name' }),
  propB = KB.cons.legalize({ name: 'B', hash_by: 'name' }),
  // edge label
  labelEdge = 'test_next',
  // edge E from (a)-[e]->(b)
  propE = KB.cons.legalize({ name: 'E', hash_by: 'name' })

// build the nodes; addNode returns a promise
function buildNodes() {
  return KB.addNode(
    [[propA, labelNode]],
    [[propB, labelNode]]
  )
}

// build the edges; addEdge returns a promise
function buildEdges() {
  return KB.addEdge(
    // A -> B
    [[propA], [propE, labelEdge], [propB]]
  )
}

// build the graph: buildNodes first, then buildEdges
function buildGraph() {
  return buildNodes().then(buildEdges)
}

buildGraph()
// A simple graph is built. Go to localhost:7474 to query and see it.
Tests
To run the tests, clone this repo and set the environment variable `NEO4J_AUTH=<username>:<password>` (or save it in a `.env` file if you like), then run `npm test`.
DB migration
Install the neo4j-shell-tools for db migration; use `export-graphml -o backup.graphml -t -r` and `import-graphml -i backup.graphml -t` from within `neo4j-shell`. Files will be saved to `${NEO4J_HOME}`.
KB standard (basic)
We use a graph knowledge base (KB) to encode generic knowledge and relationships. The implementation is through a graph database - we choose Neo4j for the purpose. A graph consists of individual nodes connected with edges.
A node:
- is a unit of knowledge.
- encodes the type of information by `Labels` - an array of strings.
- encodes information by `prop` - a flat JSON.
An edge:
- encodes the relation between two nodes by a single string `Label`.
- encodes information about the relation by `prop` - a flat JSON.
KB constraints:
- all nodes and edges must have the following fields in their `prop`:
  - `hash_by`: the field used to hash this node, e.g. `name`.
  - `hash`: the actual hash string, e.g. `"document1"`.
  - `updated_by`: the hash string of the `prop`'s updater.
  - `updated_when`: the timestamp of when the author updated the `prop`. Same format as `Date.now()`.
  - `created_by`: the hash string of the `prop`'s creator. Doesn't show in `constrain.js` but is built into `KB_builder.js`.
  - `created_when`: the timestamp of when the author created the `prop`. Doesn't show in `constrain.js` but is built into `KB_builder.js`.
- outside of `prop`, each node may have zero or more Labels, and each edge has exactly one Label.
- note that although `hash_by` and `hash` are required for edges, using them is optional, i.e. you can utilize the `hash` in your custom `query()`, but `addEdge` will allow duplicate edge `hash` values. In fact, `addEdge` hashes by `LabelE` and the hashes of the source and target nodes, i.e. there can be only one edge of a given label between two distinct nodes.
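To make the constraints above concrete, here is a minimal sketch of a check for the required `prop` fields, plus a key that identifies an edge by its label and endpoint hashes. The names `missingFields` and `edgeKey` are hypothetical and not part of the neo4jkb API (the library's own prop constructor is `cons.legalize`), and the `|` separator is an assumption made only for this illustration.

```javascript
// Hypothetical helpers for illustration; not part of the neo4jkb API.
// Fields that every node and edge prop must carry, per the list above.
var REQUIRED_FIELDS = [
  'hash_by', 'hash',
  'updated_by', 'updated_when',
  'created_by', 'created_when'
]

// Return the names of the required fields that are absent from prop.
function missingFields(prop) {
  return REQUIRED_FIELDS.filter(function(field) {
    return !Object.prototype.hasOwnProperty.call(prop, field)
  })
}

// Identify an edge by its label and the hashes of its endpoints, mirroring the
// rule that only one edge of a given label exists between two nodes.
// The '|' separator is an assumption of this sketch.
function edgeKey(sourceProp, labelEdge, targetProp) {
  return [sourceProp.hash, labelEdge, targetProp.hash].join('|')
}

var propA = {
  name: 'A', hash_by: 'name', hash: 'A',
  created_by: 'user1', created_when: Date.now(),
  updated_by: 'user1', updated_when: Date.now()
}
var propB = { name: 'B', hash_by: 'name', hash: 'B' }

console.log(missingFields(propA)) // nothing missing
console.log(missingFields(propB)) // the four authorship fields are missing
console.log(edgeKey(propA, 'test_next', propB))
```

Two edges that produce the same key would count as the same edge under the rule above, regardless of what their own `hash` fields say.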
Permissible graph operations:
- create/update `{nodes, edges}`
- search `{nodes, edges}` (this is rich, requires data-ordering)
- delete `{nodes, edges}`. Deleting a node deletes its edges too; deleting an edge does not affect the nodes.
- set/remove `{nodes, edges}`
- micro properties: degree, component, connectivity
- macro properties: neighborhood, shortest distance, span, SCC, partition, etc.
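The delete semantics and the degree micro property can be sketched on a plain in-memory graph. This is only an illustration: neo4jkb itself runs these operations as queries against Neo4j, and `TinyGraph` with all of its methods is a hypothetical stand-in, not the library's API.

```javascript
// Hypothetical in-memory graph for illustration; not how neo4jkb stores data.
function TinyGraph() {
  this.nodes = {} // hash -> prop
  this.edges = [] // list of { from, label, to }
}

TinyGraph.prototype.addNode = function(prop) {
  this.nodes[prop.hash] = prop
}

TinyGraph.prototype.addEdge = function(from, label, to) {
  this.edges.push({ from: from, label: label, to: to })
}

// Deleting a node also deletes every edge that touches it.
TinyGraph.prototype.deleteNode = function(hash) {
  delete this.nodes[hash]
  this.edges = this.edges.filter(function(e) {
    return e.from !== hash && e.to !== hash
  })
}

// Deleting an edge leaves both endpoint nodes in place.
TinyGraph.prototype.deleteEdge = function(from, label, to) {
  this.edges = this.edges.filter(function(e) {
    return !(e.from === from && e.label === label && e.to === to)
  })
}

// Micro property: degree = number of edges incident to the node.
TinyGraph.prototype.degree = function(hash) {
  return this.edges.filter(function(e) {
    return e.from === hash || e.to === hash
  }).length
}

var g = new TinyGraph()
g.addNode({ hash: 'A' })
g.addNode({ hash: 'B' })
g.addNode({ hash: 'C' })
g.addEdge('A', 'test_next', 'B')
g.addEdge('B', 'test_next', 'C')

var degreeBefore = g.degree('B') // B touches both edges
g.deleteNode('A')                // also removes the edge A -> B
var degreeAfter = g.degree('B')
console.log(degreeBefore, degreeAfter, Object.keys(g.nodes))
```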
Authorship
All knowledge must be created by users, thus `created_by` and `updated_by` are mandatory fields. We use the user ID as the hash string since it's the only constant hash and is universal to all adapters. Using the username as the hash, despite its convenience, is costly whenever it changes (the update is `O(2n)`).
As a tradeoff, we will provide an easy lookup function to yield the user node on inputting an ID, or any node with an authorship. For the timestamp, we will provide a chrono method too. (soon)
KB standard (extended)
- If global conflict may exist for a hash, localize it per owner-user by `<userHash>#<hashStr>`.
- A KB graph path should represent the pattern of an action, e.g. `(user1)-[:assigns]->(task)-[:to]->(user2)`, so `(n)-[:to]->(user)` implies `n` is given to, or belongs to, `user`, i.e. path/relationship transition.
- proper English in cypher, e.g. `(a)-[:assigns]->(t)`, then transition by tenses: `(a)-[:assigned]->(t)`. Deprecation by past tense. You can also use the continuous tense, like `(c)-[:doing]->(t)`. Preference: `(a)-[:prefers]->(sushi:lunch); (a)-[:prefers]->(cold:weather)`.
- advantage: if cypher is so much like English, one can in fact parse a subset of English sentences into cypher.
- generic auto context-mapper, then graph constructor. E.g. for a sentence never seen before, NLP parses 'gdoc1 refers gdoc2' in the form `(source)-[action]->(outcome)`, as `(gdoc1)-[ref]->(gdoc2)`. It parses `action` to a standard value, e.g. maps `{links, link, refers, refs} => ref`, using word2vec and metric closeness. Then it parses `source` and `outcome` by `MATCH` and `hash`.
- state transition and causality
- `add <notes>` parses into `(kengz)-[adds]->(<notes>)`.
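A rough sketch of two of the conventions above: localizing a hash per owner, and normalizing an action word before writing the path pattern. All names here (`localizeHash`, `ACTION_SYNONYMS`, `normalizeAction`, `toPattern`) are hypothetical and not part of neo4jkb. A fixed lookup table stands in for the word2vec closeness matching mentioned above, so this is not an implementation of that idea, and the output is only a pattern string, not a complete Cypher query.

```javascript
// Hypothetical helpers for illustration; not part of the neo4jkb API.

// Localize a hash per owner-user as <userHash>#<hashStr>.
function localizeHash(userHash, hashStr) {
  return userHash + '#' + hashStr
}

// A fixed synonym table standing in for word2vec-based matching.
var ACTION_SYNONYMS = {
  links: 'ref', link: 'ref', refers: 'ref', refs: 'ref'
}

// Map an action word to its standard value; unknown words pass through unchanged.
function normalizeAction(action) {
  return Object.prototype.hasOwnProperty.call(ACTION_SYNONYMS, action)
    ? ACTION_SYNONYMS[action]
    : action
}

// Parse a three-word sentence 'source action outcome' into a path pattern string.
function toPattern(sentence) {
  var words = sentence.trim().split(/\s+/)
  if (words.length !== 3) {
    throw new Error('expected exactly three words: source action outcome')
  }
  return '(' + words[0] + ')-[:' + normalizeAction(words[1]) + ']->(' + words[2] + ')'
}

console.log(localizeHash('user1', 'document1'))
console.log(toPattern('gdoc1 refers gdoc2'))
console.log(toPattern('kengz adds notes'))
```

A real parser would need to handle multi-word sources and outcomes and resolve them to existing nodes by `MATCH` and `hash`; the three-word restriction here only keeps the sketch short.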
Neo4j Use cases
- real time recommendation: by graph expansion and detection of change
- master data management: who's reporting to what task and people
- fraud detection: can uncover fraud a few steps away from the source (final in-degree, loops)
- graph based search
- identity and access management
Advantages
- intuitiveness
- speed
- agility: natural model (schema-free, expansive), proper query language
What can be solved by AI exclusively?
- quick KB
- specific automation/proxy task
- secretarial work
- thus most vitally, context awareness and semantic understanding
Todo
- search engine
- add other macro micro graph property methods
- chronos method
- permission, belongs_to, context tag, priority level
Changelog
Jan 2016
- added `mocha` using the `chai` library for tests; coverage by `istanbul`.
- create/update KB graph methods
- search KB graph methods; do whatever you want with the results: `<filter>` then `RETURN|DELETE|DETACH DELETE`
- delete KB graph methods from search
- set/remove KB graph methods
- shortest-paths KB graph methods
- add/get as unified methods of all above
- timestamp in `cons.now` uses the ISO 8601 format, e.g. `2016-01-22T15:07:25.550Z`
- list the set of permissible query ops
- support `NODE_ENV=development`: all labels created will be prepended with 'test_'. This allows one to isolate the effects of devs and tests from the KB, as well as easy cleaning post-test.
- support sequential transformations
- `cons.legalize` also acts as a quick legal prop constructor
- add `sorter, picker` as transformer methods
- add `flattenIndex` as generic matrix-to-string formatter
- add opsHead to opsRe in `lib/constrain.js` for chaining