@peterwmwong/gto
v0.0.7
GTO: Gremlin TypeScript ORM
WARNING: This project is an experiment and has not been put into production yet.
Developer Getting Started
npm ci
npm run test-db-build-docker-image
npm run test-db-start
npm run test
Enforced consistency and correctness
Currently, our repository methods are ad-hoc groupings of raw read/write DB queries, which makes it hard to enforce consistency or correctness.
Example: IngestionRow's rowNumber
The Excel import path added the rowNumber property; the IRI/CSV import path did not. If repositories were object-oriented, with a consistent read/write view of properties, this would not have happened.
Example: IngestionRow's rowNumber PART 2.
Mike and I attempted to set rowNumber in the IRI/CSV import path, but only found out later that it was incorrectly set as a string instead of a number, which effectively caused ingestion row chunking to take forever/blow up.
Benefits from GTO
- Nodes and Edges are created and filtered with the correct properties with the correct types
- Traversals between Nodes and Edges are always correct
  - ex. Prevent accidentally going from Ingestion to IngestionRow through the wrong edge (HAS???)
  - ex. Prevent accidentally using the wrong direction (in? out?)
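Both rowNumber problems and the wrong-edge/wrong-direction problems are the kind of mistake a type checker can reject. The following is a minimal sketch of that idea, not GTO's actual API: `IngestionRowProps`, `createIngestionRow`, `EdgeDef`, `traverseOut`, the `rawData` property, and the `HAS_ROW` edge label are all hypothetical names chosen for illustration.

```typescript
// Hypothetical property schema for an IngestionRow node (not GTO's real API).
interface IngestionRowProps {
  rowNumber: number; // must be a number, never a string
  rawData: string;   // hypothetical second property
}

// Every import path would create rows through this one function, so no path
// can omit rowNumber or pass it with the wrong type.
function createIngestionRow(props: IngestionRowProps): { label: 'IngestionRow'; props: IngestionRowProps } {
  return { label: 'IngestionRow', props };
}

// An edge definition records its label and which node labels it connects.
interface EdgeDef<From extends string, To extends string> {
  label: string;
  from: From;
  to: To;
}

const HAS_ROW: EdgeDef<'Ingestion', 'IngestionRow'> = {
  label: 'HAS_ROW', // hypothetical edge label
  from: 'Ingestion',
  to: 'IngestionRow',
};

// The start node's label must equal the edge's `from` label, so both the
// choice of edge and the direction are checked by the compiler.
function traverseOut<E extends EdgeDef<string, string>>(
  start: { label: E['from'] },
  edge: E,
): string {
  return `${start.label} -[${edge.label}]-> ${edge.to}`;
}

const row = createIngestionRow({ rowNumber: 12, rawData: 'a,b,c' });
const path = traverseOut({ label: 'Ingestion' }, HAS_ROW);

// Both of these fail to compile:
// createIngestionRow({ rowNumber: '12', rawData: 'a,b,c' }); // string is not a number
// traverseOut({ label: 'IngestionRow' }, HAS_ROW);           // wrong direction
```

The start label is typed as `E['from']` rather than as a second generic parameter so that it is checked against the edge instead of being widened to fit it.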
FUTURE IDEA: DB/Query Metrics/Statistics
- Individual
- Aggregate
- What are the longest-running queries?
- What are the most frequent queries?
- What are the biggest queries?
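One way such statistics could be collected is sketched below. It assumes every query is issued through a single layer that can record a sample per execution; `QueryMetrics`, its methods, and the `IngestionRow.byId` query name are hypothetical. Samples are recorded directly here so the example is deterministic; in practice `record` would be called with a duration measured around the awaited Gremlin call.

```typescript
// One recorded execution of a named query (the "individual" view).
interface QuerySample {
  name: string;
  durationMs: number;
  resultCount: number;
}

// Per-query totals (the "aggregate" view).
interface QueryAggregate {
  count: number;
  totalMs: number;
  maxMs: number;
  maxResults: number;
}

class QueryMetrics {
  private samples: QuerySample[] = [];

  record(sample: QuerySample): void {
    this.samples.push(sample);
  }

  // Groups the individual samples by query name.
  aggregate(): Record<string, QueryAggregate> {
    const byName: Record<string, QueryAggregate> = {};
    for (const s of this.samples) {
      const agg = byName[s.name] ?? { count: 0, totalMs: 0, maxMs: 0, maxResults: 0 };
      agg.count += 1;
      agg.totalMs += s.durationMs;
      agg.maxMs = Math.max(agg.maxMs, s.durationMs);
      agg.maxResults = Math.max(agg.maxResults, s.resultCount);
      byName[s.name] = agg;
    }
    return byName;
  }

  // Name of the query with the highest value for the given aggregate field.
  top(field: keyof QueryAggregate): string | undefined {
    const agg = this.aggregate();
    let best: string | undefined;
    for (const name of Object.keys(agg)) {
      if (best === undefined || agg[name][field] > agg[best][field]) best = name;
    }
    return best;
  }
}

const metrics = new QueryMetrics();
metrics.record({ name: 'Ingestion.all', durationMs: 40, resultCount: 10 });
metrics.record({ name: 'Ingestion.all', durationMs: 60, resultCount: 12 });
metrics.record({ name: 'IngestionRow.byId', durationMs: 250, resultCount: 1 });

const slowest = metrics.top('maxMs');        // 'IngestionRow.byId'
const mostFrequent = metrics.top('count');   // 'Ingestion.all'
const biggest = metrics.top('maxResults');   // 'Ingestion.all'
```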
FUTURE IDEA: Automated DB validation
It is still possible for the database's structure to be tampered with outside of the application (JupyterHub, direct '/gremlin' access).
As GTO provides a single source of truth/schema for the DB, we could easily build a script that runs through each GTO Node, Edge, and their properties to make sure the database is still in sync/valid.
- ex. Using `Node.name` and `new Node(g).properties`, query for nodes that don't have all the required properties, have mis-typed properties, have extra properties, etc.
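The per-node check in such a script could look roughly like this. It is a sketch under assumptions: the schema is modeled as a plain map from property name to expected type, and the stored properties are a plain object as they might come back from a value-map style query. `NodeSchema`, `Violation`, `validateNode`, and the `rawData` and `legacyFlag` properties are hypothetical.

```typescript
type PropType = 'string' | 'number' | 'boolean';

// Hypothetical schema shape: property name -> expected type.
type NodeSchema = Record<string, PropType>;

interface Violation {
  property: string;
  problem: 'missing' | 'mistyped' | 'extra';
}

// Compares one node's stored properties against the schema.
function validateNode(schema: NodeSchema, stored: Record<string, unknown>): Violation[] {
  const violations: Violation[] = [];
  // Required properties: present and of the expected type?
  for (const property of Object.keys(schema)) {
    if (!(property in stored)) {
      violations.push({ property, problem: 'missing' });
    } else if (typeof stored[property] !== schema[property]) {
      violations.push({ property, problem: 'mistyped' });
    }
  }
  // Stored properties the schema does not know about.
  for (const property of Object.keys(stored)) {
    if (!(property in schema)) violations.push({ property, problem: 'extra' });
  }
  return violations;
}

const ingestionRowSchema: NodeSchema = { rowNumber: 'number', rawData: 'string' };

const violations = validateNode(ingestionRowSchema, { rowNumber: '12', legacyFlag: true });
// violations, in order:
//   { property: 'rowNumber',  problem: 'mistyped' }
//   { property: 'rawData',    problem: 'missing' }
//   { property: 'legacyFlag', problem: 'extra' }
```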
A more accessible Graph DB
Currently, the learning curve for Product/QA/Developers to access data in the DB is steep for a number of reasons:
- Gremlin Querying
  - Not as widely known as other DB query languages (ex. SQL)
    - Fewer Stack Overflow answers
    - Less documentation
  - Little-to-no tooling support (is this Gremlin query syntactically correct?)
- No Schema
  - Unlike SQL DBs, where out-of-the-box tooling can surface tables, columns (name, type), and relationships between tables, Neptune surfaces none of this.
  - This makes it hard to even know where to begin when trying to access data:
    - What nodes/edges are available?
    - What properties do nodes/edges have, and what are their types (number? string?)
    - Which direction is the edge? (inE? outE? in_? out?)
  - Currently, the structure of the Graph DB is enforced by our code.
    - Even worse, the code currently has no single source of truth for which nodes/edges exist, nor for the properties (name/type) on those nodes/edges.
- Constants
  - Labels for Nodes/Edges and property names are mostly in flat "lists" of constants
  - Incredibly easy to use the wrong constant in the wrong place. Nothing stops you from trying to use `P_VERTEX_TYPE` when querying against an `Edge`.
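The constants problem can be illustrated by tagging each property key with the kind of element it belongs to. This is only a sketch: `PropKey`, `filterVertices`, `filterEdges`, and `P_EDGE_SINCE` are hypothetical, and the string value given to `P_VERTEX_TYPE` is an assumption, since the real constant's value is not shown here. The functions return the text of a Gremlin filter rather than running one.

```typescript
// Each property key carries the kind of element it belongs to.
interface PropKey<Kind extends 'vertex' | 'edge'> {
  kind: Kind;
  name: string;
}

// Assumed value: the real constant's value is not shown in this README.
const P_VERTEX_TYPE: PropKey<'vertex'> = { kind: 'vertex', name: 'vertexType' };
// Hypothetical edge property, for contrast.
const P_EDGE_SINCE: PropKey<'edge'> = { kind: 'edge', name: 'since' };

// Builds the text of a Gremlin vertex filter; only vertex keys are accepted.
function filterVertices(key: PropKey<'vertex'>, value: string): string {
  return `g.V().has('${key.name}', '${value}')`;
}

// Builds the text of a Gremlin edge filter; only edge keys are accepted.
function filterEdges(key: PropKey<'edge'>, value: string): string {
  return `g.E().has('${key.name}', '${value}')`;
}

const vertexQuery = filterVertices(P_VERTEX_TYPE, 'Ingestion');
const edgeQuery = filterEdges(P_EDGE_SINCE, '2020');

// Fails to compile: a vertex key cannot be used in an edge filter.
// filterEdges(P_VERTEX_TYPE, 'Ingestion');
```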
Benefits from GTO
- Single source of truth for a Node/Edge and relationship between Nodes and Edges
- Type/Editor driven querying
  - Type information provides users accurate hints on what's possible and valid
    - ex. `Ingestion.` options - `all`, `byId`
    - ex. `Ingestion.all(g, {` options - `source` (property)
    - ex. `Ingestion.all(g, {source: 'Annotator'}).` options - `having`, `count`, `fetchOne`, `fetchAll`, `IngestionRows`
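How an editor arrives at such hints can be sketched with an in-memory stand-in. The method names `all`, `having`, `count`, `fetchOne`, and `fetchAll` come from the completion lists above; their signatures, the `FakeGraph` type, the `fileName` property, and the whole implementation are assumptions and not GTO's actual code. The `IngestionRows` traversal is left out to keep the sketch short.

```typescript
interface IngestionProps {
  source: string;
  fileName: string; // hypothetical second property
}

// In-memory stand-in for a graph connection (an assumption for this sketch).
type FakeGraph = { ingestions: IngestionProps[] };

class IngestionQuery {
  private readonly items: IngestionProps[];

  constructor(items: IngestionProps[]) {
    this.items = items;
  }

  // Partial<IngestionProps> limits filter keys to real properties, which is
  // what lets an editor suggest `source` inside the braces.
  having(filter: Partial<IngestionProps>): IngestionQuery {
    const keys = Object.keys(filter) as (keyof IngestionProps)[];
    return new IngestionQuery(
      this.items.filter((item) => keys.every((k) => item[k] === filter[k])),
    );
  }

  count(): number {
    return this.items.length;
  }

  fetchOne(): IngestionProps | undefined {
    return this.items[0];
  }

  fetchAll(): IngestionProps[] {
    return this.items.slice();
  }
}

const Ingestion = {
  all(g: FakeGraph, filter: Partial<IngestionProps> = {}): IngestionQuery {
    return new IngestionQuery(g.ingestions).having(filter);
  },
};

const g: FakeGraph = {
  ingestions: [
    { source: 'Annotator', fileName: 'a.csv' },
    { source: 'Excel', fileName: 'b.xlsx' },
    { source: 'Annotator', fileName: 'c.csv' },
  ],
};

const annotatorCount = Ingestion.all(g, { source: 'Annotator' }).count(); // 2

// Fails to compile: `sourc` is not a property of Ingestion.
// Ingestion.all(g, { sourc: 'Annotator' });
```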
Discoveries
Gremlin: GraphTraversalSource, GraphTraversal, Statics (Anonymous Traversal) have different steps.
| Step    | GraphTraversalSource | GraphTraversal | Statics |
|---------|----------------------|----------------|---------|
| E       | ✖                    |                |         |
| V       | ✖                    | ✖              | ✖       |
| addE    | ✖                    | ✖              | ✖       |
| addV    | ✖                    | ✖              | ✖       |
| toList  |                      | ✖              |         |
| iterate |                      | ✖              |         |
| next    |                      | ✖              |         |