simplegraph
v0.1.1
Published
JavaScript kata for directed acyclic graph data structure
Downloads
9
Readme
simplegraph
Makes a fairly simple graph structure (used to be "simple-graph")
update
[19 DEC 2013] ~ v0.1.0 contains several breaking changes ~ if you're using an earlier version please update.
Thanks to Ferry Boender
...for his post on the [dependency resolving algorithm] (http://www.electricmonk.nl/log/2008/08/07/dependency-resolving-algorithm/), and for his ongoing encouragement in this exploration.
travis
testling
rawgit
View the generated test-bundle page on rawgit.
install
npm install simplegraph
git clone git://github.com/dfkaye/simplegraph.git
use
node.js
var simplegraph = require('simplegraph');
browser
window.simplegraph
justify
Everyone should figure out at least one directed acyclic graph data structure in JavaScript. Here's mine.
I'm using this for learning tests in order to work out dependency-loading trivia in another project, and re-learning performance optimizations for scaling with big collections (over 2 million elements in the big-fixture tests).
what makes it simple?
There is no concept of a node or external node lookup map. Rather, every
object on a graph is a graph instance whose edges are other graph instances.
Nothing special - perhaps a bit naïve, but traversal is required by several
methods, and making that simple has been a challenge.
structure
A graph contains a map of edges
(other graphs) by their id. The required
constructor argument is a string id
. The constructor can be called with or
without the new
keyword.
var main = simplegraph('main');
var main = new simplegraph('main');
That returns an object with the following fields:
main.id // string 'main'
main.edges // map of graphs {}
These are the only constructor-created properties.
No graph data element is stored in a graph element - final answer
traversal
[20 DEC 2013 ] API/DOC IN PROGRESS
The traversal API is not ideal, but has the virtue of being testable so far...
Some graph child methods use procedural code (for-loops) rather than iterator functions in order to support IE6-8 - and to execute faster (iterators run just under an order of magnitude slower).
The resolve (traversal) methods use visitor iteration methods internally. The pattern looks like this:
// pass visit function to the visitor method...
var visitor = this.visitor(function visit(edge) {
// process this edge
});
// or define visit directly:
visitor.visit = function (edge) {
// process this edge
};
// define an optional post traversal processing method
visitor.after = function (edge) {
// post-process this edge
};
// run it
this.resolve(visitor);
// inspect it
visitor.results.join('');
// etc.
resolve(visitor?)
Any graph object can be resolved independently with the resolve()
method. The
resolve()
method optionally accepts a visitor
object. If one is not
specified, resolve()
creates one for use internally from the graph on which
resolve()
is first called. The visitor is a collecting parameter which lets
you manage results, terminate processing at specified points, and so on.
If the resolve()
method detects a cycle, it will throw an error.
If no cycle or other error occurs, resolve()
returns a visitor
object.
visitor(fn?)
Any graph object can create a visitor
object. A visitor has an id
set to the
creating graph's id, an ids
array, visited
and visiting
maps, a results
array, and a done()
method.
The visitor()
method can optionally take a visit
function argument. The
visit function will run on the current graph being visited before descending
to edges. The function takes a graph param representing a child or edge.
The visitor object is used internally by the resolve()
method for tracking
visited subgraphs, and for throwing an error
if a cycle is detected.
A fully returned visitor has the following properties:
.id ~ string id of the graph for which visitor is first created
.ids ~ array of graph ids visited
.results ~ collecting parameter array of results
.visited ~ map of ids used internally for cycle detection
.visiting ~ map of ids used internally for cycle detection
.done ~ method that halts further processing or traversals when called
.visit ~ optional iteration function to run when visiting a graph element
.after ~ optional post-processing function to run when a graph's depth
traversal is completed
visitor examples
The following snippet from the remove(id)
method demonstrates a visitor usage.
It calls detach
internally and pushes detached edges to the visitor's results
array:
var visitor = this.visitor(function(edge) {
// uses closure on id param
if (edge.detach(id)) {
// *this* is the visitor
this.results.push(edge);
}
});
return this.resolve(visitor);
which lets you retrieve the visitor's results array:
var results = remove('something').results;
visitor.done()
The following snippet from descendant(id)
shows a call to done()
:
var child = false;
var visitor = this.visitor(function(edge) {
// uses closure on descendant(id) param
var item = edge.edges[id];
if (item) {
child = item;
this.done(); // halt further processing
}
});
this.resolve(visitor);
return child;
visitor.after = function () {}
The following snippet from sort()
shows how to define an after
callback:
var visitor = this.visitor();
visitor.after = function (edge) {
// if the current edge has not been marked as visited yet,
// unshift it to the front of results if it's a leaf,
// otherwise, push it to the end if it has edges
if (!this.visited[edge.id]) {
if (edge.empty()) {
this.results.unshift(edge.id);
} else {
this.results.push(edge.id);
}
}
};
You can define visit
as a method directly on a created visitor
rather than
passing it as a parameter. The following snippet from the list()
method shows
how to define visit
, after
, and some custom or nonce properties:
var visitor = this.visitor();
// assign nonce properties we can use in visit and after functions
visitor.depth = -1;
visitor.INDENT = 2;
// explicit visit assignment
visitor.visit = function (edge) {
// unset depth after() visiting...
this.depth += this.INDENT;
for (var i = 0; i < this.depth; i++) {
this.results.push(' ');
}
if (edge.empty()) {
this.results.push('-');
} else {
this.results.push('+');
}
this.results.push(' ' + edge.id + '\n');
};
// post-processing aop
// unset depth after() visiting...
visitor.after = function (edge) {
this.depth -= this.INDENT;
};
methods
attach(graph)
attach()
accepts a graph object as a child in the current graph.
If adding a graph that is not already a child, attach()
returns the added
child; else it returns false.
var main = simplegraph('main')
var a = simplegraph('a');
main.attach(a);
// => a
main.attach(a); // again
// => false
detach(id)
detach()
accepts a string id for the child in the current graph.
If a child matching the id is found, detach()
returns returns the detached
child; else it returns false.
The graph is removed from the child's parents
array. The child's root
is
set to itself if the parents
array is empty.
main.detach('a');
// => a
main.detach('a'); // again
// => false
empty()
empty()
returns true if a graph has no edges; else returns false.
var main = simplegraph('main');
main.empty()
// true
main.attach(a);
// => a
main.empty()
// false
main.detach('a');
// => a
main.empty()
// true
has(id)
has()
accepts a string id for the target child of a graph. If a child is
found matching the id, the child is returned; else has()
returns false.
main.attach(a);
// => a
main.has('a')
// a
main.detach('a');
// => a
main.has('a')
// false
descendant(id)
descendant()
accepts a string id for the target descendant of a graph. If a
descendant is found matching the id, the found target is returned; else
descendant()
returns false.
// main -> a -> b -> c
main.descendant('c');
// => c
remove(id)
remove()
accepts a string id for the target as a child in the current graph or
as a descendant within any subgraph.
remove()
visits every child and descendant of the current graph. If a
descendant's id matches the given argument, the item is detached from its graph.
remove()
returns a visitor with a results
array of all parent graphs
from which the target item has been detached. This allows for graph "refreshes"
or migrations as necessary.
// main -> a -> b -> c -> d
// a -> c -> e
var visitor = main.remove('c');
visitor.results
// => ['a', 'b']
parents(id) - aka 'fan-in'
parents()
accepts a string id for an item in the graph, and finds all graphs
in subgraph that depend on graph with given id. Found graphs are returned in the
visitor.results
array.
parents()
always returns a visitor
object with a `visitor.results' array.
// main -> a -> b -> c -> d
// a -> c -> e
var visitor = main.parents('c');
visitor.results
// => ['a', 'b']
subgraph() - aka 'fan-out'
subgraph()
traverses a graph's edges and their edges recursively, and returns
all graphs visited in the visitor.results
array. The current graph is not
included as a dependency in the subgraph.
subgraph()
always returns a visitor
object with a `visitor.results' array.
// main -> a -> b
// b -> d
// a -> c -> e
var visitor = main.subgraph();
visitor.results
// => ['a', 'b', 'd', 'c', 'e']
size()
size()
returns the number of edges under a graph.
// main -> a -> b
// b -> d -> e
// a -> c -> d -> e
// c -> e
main.size();
// => 8 arrows
list(depth?)
list()
is really a 'just for show' visitor-based method with both
visitor.visit
and visitor.after
methods - meaning some AOP is creeping into
the resolve()
algorithm, so that definitely needs re-visiting (pun - sorry).
list()
iterates over a graph and returns a string showing the graph and its
subgraph, indented, something like:
main.list()
// =>
sort()
sort()
uses visitor
internally, and returns an array of ids found from the
current graph being 'sorted' in depth-first order.
Call sort()
on any graph element to retrieve the topo-sort for that element's
subgraph.
var main = simplegraph('main');
var c = simplegraph('c');
main.attach(c);
// etc.
var results;
// either
results = c.sort()
// or
results = main.descendant('c').sort();
tests
Using tape to run tests from the node.js command line, and in order to use testling from the github service hook.
tape from the command line
cd ./simplegraph
and either of these:
npm test
node ./test/suite.js
which will run both of these:
node ./test/simple-test.js
node ./test/big-fixture-test.js
The big-fixture-test
generates over 2 million graph items (currently
2,001,001 items). Though it's optimized to run fast, you can avoid it by
running just the simple test:
npm run simple
rawgit test page
In order to verify that the test suite runs locally or on rawgit - and rather
than re-create the tests using jasmine, QUnit, or what-have-you, I now use
browserify to create a test-bundle starting with the
test suite file itself. This pulls in the tape
module and its dependencies
(there are many many many of them), plus the graph module and tests:
cd ./simplegraph
browserify ./test/suite.js -o ./browser-test/bundle.js
Use the alias for that command as defined in package.json:
npm run bundle
The rawgit page includes a dom-console.js
shim that outputs console
statements in the DOM, useful because tape outputs its results to
console.log()
.
View the generated browser test bundle page on rawgit.
TODO
refactor resolve() to take a properties object, a visit function, maybe an after function, to reduce verbose visitor creation gack everywhere
add bulk attach capability ~ possibly a
load
method that runs attachment on a (dare I say it) separate process or worker or iframe... or maybe an excuse to try promises or streams...add serialize() (and deserialize ?) support as part of that (maybe)
get off testling (?) ~ not always reliable
port tape tests to jasmine ~ travis works with jasmine-node
setup test suites for testem.js
split tests into smaller method-specific files maybe
add build task dependency or module dependency example
add code snippets for each method in the README (especially for visitor)
fix recursion performance on edges
- change edges from array to a map
- remove
indexOf
- add
has()
method - recursion by resolve/visitor is a huge problem with the array of edges
implementation ~ using it in attach() for root and parent checking has
doubled creation times for large graphs (15 seconds for 1 million items) ~
removing attach() from big fixture setup altogether cuts creation time from
8 or 9 seconds down to 0.6 (!) ~ using
new
inside big fixture reduces time by another 100ms ~ using it in size() takes 50x longer than procedural looping ~ using an edge map reduces attach() and size() times enough that a big fixture with 2 million items is processed in 2-3s good enough for now
add
empty()
method to replace the for-in fakeoutsrename find() to descendant()
[16 DEC 2013] add topological sort
rename
require
'd graph function tosimplegraph
massive speed up of big-fixture setup using edges directly rather than attach()
rename
dependants()
(items that depend on certain node) asparents
constructor
- unique ID constraint at simplegraph(id) ~ use a map of edge ids instead of an array
- remove
root
andparent
support - gad, what a mistake - support a
root
field so we cansort()
from the top by default - call resolve() on each attach() call for early cycle detection - needed for root/parents
- add the root property and attach/detach logic to reassign it
- reformat the README markdown
- npm publish - [done as simplegraph 12 DEC 2013] [27 SEPT 2013] - simple-graph name was still available in July, but is now taken... renaming to nested-graph [7 OCT 2013]
- rename nested-graph to simplegraph (no hyphen) [12 DEC 2103]
- add data element support - won't do that - final answer
- rethink the remove() method - id vs. graph now uses id
- reconsider visitor(), resolve() and fn argument. detached post-processing fn
- better visitor pattern for resolve done
- evict() - remove all occurrences of item from graph and dependants. added
- rethink success v error -- do we need both ? only adds error field
- reconsider the apply() method -- needed ? detached
- evict() - revisit the arg type - graph vs. id?
- evict() - revisit the return array - make use of the visitor()
- fanIn() - no exception throwing - make better use of visitor, resolve
- rename fanIn() as dependants()
- add dependencies() method 'fan out'
- add size() method
- print() or toString() method for graph and edges only (else big fixture issue)
- rename dependencies() to subgraph()
- rename add() to attach()
- rename remove() to detach()
- rename evict() to delete()
- rename delete() to remove() - because delete is a keyword in certain browsers...
visitor
refactor the visitor - set visitor.id to creating graph.id by default
refactor the visitor, resolve the visitor.walk vs graph.resolve quarrel
- decided in favor of graph.resolve() with a visitor which is really a collecting parameter plus one or two utility methods
add a done() method to terminate traversals instead of visitor.match
add an error() method to send visitor.errors to
- not implemented - opted to throw Errors on cycles after all.
add a before/after distinction, or preprocess-depthprocess-postprocess chain
- done but may re-visit due to creeping AOP and/or inelegant API.
6/2/13 - possible refactoring to try out 24/06/13 - NO - these are bad ideas
let visitor have oncycle or onerror method
let delete, fanIn and others (cycle?) create a visitor with this method and decide what to do
then call resolve which will check visitor for oncycle and add error or call it if necessary.
rename resolve as visit; name cycle as resolve;