@ashnazg/squirrelnado-difftable
v0.0.6
Published
``` const sqrl = require('@ashnazg/squirrelnado'); require('@ashnazg/squirrelnado-psql')(sqrl); require('@ashnazg/squirrelnado-difftable')(sqrl);
Downloads
3
Readme
Usage
const sqrl = require('@ashnazg/squirrelnado');
require('@ashnazg/squirrelnado-psql')(sqrl);
require('@ashnazg/squirrelnado-difftable')(sqrl);
const diff = sqrl.diff({
control: 'postgres://known_good_server/db1',
experiment: 'postgres://audit_this_server/db2',
cache: 'postgres://localhost/tmpdb/some_schema',
write: 'postgres://final_server/db3/reports/single_anomaly_table',
tmp: process.env.HOME + '/sqrl-diff-cache' // all control/experiment tables looked at will be cached as $tmp/{ctl|exp}/schema/table/page-N.jsonl.gz
// !! handle your own expiration of tables out of tmp !!
// to avoid burning DB I/O, sqrl.diff assumes that its existing cache is still usable as long as a folder named $tmp/$mode/schema/table exists.
});
await diff('schema1.table1'); // write table1's anomalies to 'single_anomaly_table'
await diff('schema1'); // scan tables in control. for each table, do the above.
// you can also break the steps out and sequence all the reads at once. the above oneliner breaks out as:
await diff.snapshot('schema1.table1'); // cache both copies of the target(s) into $tmp
await diff.compare('schema1.table1'); // compare each pair, building up and then slicing down a copy of the data in $cache, leaving only the anomalies behind
await diff.write('schema1.table1'); // export that group of anomalies from local cahe to $write's 'single_anomaly_table'
await diff.close();
broken out, the only step that needs access to control/experiment is snapshot, and only write needs access to $write db.
compare always reads from $tmp, and nukes and replaces the state in cache. So you could run snapshots on Monday; delete "$tmp/exp" on Tuesday to force snapshot to re-pull only that
side, and run compare() on Wednesday.
variations
- If experiment does not specify a table, scan schema3 for all tables/views, and look them up in control's schema2 (in this mode, a table name on control is ignored.)
- if write does not specify a table, pick a report name name based on "control"
- (The above "control" URL would put all final records in "db3/reports/schema2_table" if we're in table:table mode, or reports.schema2 if we're in schema:schema mode.)
- you can compare two tables in one DB as well: just use the same URL for control and experiment, and then use 2 params on
diff('experiment.table', 'reference.table')
- when the URLs are the same, the reference copy will load from $tmp/ctl if it exist, else $tmp/exp
index columns
By default, diff only looks for a column named 'id'. If not usable as a unique index, it falls back on the whole raw row as identity.
You can pass in any array of strings as the last param to indicate valid candidaates:
diff('table', ['id', 'src_id'])
diff('table', 'ref_table', ['id', 'src_id'])
diff.compare('table', ['id', 'src_id'])
diff.compare('table', 'ref_table', ['id', 'src_id'])
And an empty list indicates not to use the default of 'id'.