@aggregion/gold-record-find
v2.8.3
gold-record-find
Utility for finding matching rows across multiple data tables using per-column weights and a matching threshold. Its key advantage is that matches are found in O(n) time and memory.
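To make the idea of column weights and a matching threshold concrete, here is a minimal Python sketch of how two rows could be scored. It is an illustration only, not the utility's actual implementation: the function names, the exact-equality comparison, the treatment of empty values, and the sample columns and weights are all assumptions.

```python
from typing import List, Mapping, Tuple

# (column in first table, column in second table, weight), mirroring the
# documented --attribute format column_name1,column_name2,weight.
Attribute = Tuple[str, str, float]


def match_score(row_a: Mapping[str, str],
                row_b: Mapping[str, str],
                attributes: List[Attribute]) -> float:
    """Sum the weights of all attributes whose values agree in both rows."""
    score = 0.0
    for col_a, col_b, weight in attributes:
        value_a = row_a.get(col_a)
        value_b = row_b.get(col_b)
        # Assumption: missing or empty values never count as a match.
        if value_a and value_a == value_b:
            score += weight
    return score


def is_match(row_a: Mapping[str, str],
             row_b: Mapping[str, str],
             attributes: List[Attribute],
             threshold: float) -> bool:
    """Two rows match when their weighted score reaches the threshold."""
    return match_score(row_a, row_b, attributes) >= threshold


# Hypothetical sample data: column names and weights are made up.
attributes = [("email", "mail", 60.0), ("phone", "tel", 30.0), ("city", "town", 10.0)]
crm_row = {"email": "a@example.com", "phone": "555-0100", "city": "Oslo"}
shop_row = {"mail": "a@example.com", "tel": "555-0100", "town": "Bergen"}

score = match_score(crm_row, shop_row, attributes)
matched = is_match(crm_row, shop_row, attributes, threshold=80.0)
```

In this sketch the e-mail and phone columns agree while the city differs, so the pair is accepted at a threshold of 80 and would be rejected at a stricter one. Comparing every pair of rows this way would be quadratic; the sketch shows only the scoring rule, not how the utility reaches linear time.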
Commands:
agg-grf match
Find matching rows
OPTIONS
-O, --fullOutput=fullOutput Output file containing matched id and attributes of each file
-a, --attribute=attribute (required) Attribute to match in format: column_name1,column_name2,weight
-d, --delimiter=delimiter [default: ,] CSV separator character
-f, --file=file (required) Input file
-o, --output=output Output file containing matched id of each file
-t, --threshold=threshold (required) Matching threshold
-q, --quiet Quiet mode (without progress)
-M, --estimatedMemoryUsage Upper bound on memory consumption that the utility tries to stay within. Default: 2GB.
-w, --minWindowSize Minimum size of a matching chunk. Default: unlimited.
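As a usage illustration, the following Python sketch assembles an `agg-grf match` command line from the options listed above. It uses only the documented flags. That `-f` and `-a` are repeated once per input file and per attribute is an assumption, and the file names, column names, weights, and threshold are placeholders; the helper `build_match_args` is hypothetical and not part of the package.

```python
import shlex
from typing import List, Optional, Sequence, Tuple


def build_match_args(files: Sequence[str],
                     attributes: Sequence[Tuple[str, str, int]],
                     threshold: int,
                     output: Optional[str] = None,
                     delimiter: Optional[str] = None,
                     quiet: bool = False) -> List[str]:
    """Build an argument list for `agg-grf match` from documented options."""
    args = ["agg-grf", "match"]
    # Assumption: one -f flag per input file.
    for path in files:
        args += ["-f", path]
    # Assumption: one -a flag per attribute, as column_name1,column_name2,weight.
    for col_a, col_b, weight in attributes:
        args += ["-a", f"{col_a},{col_b},{weight}"]
    args += ["-t", str(threshold)]
    if output is not None:
        args += ["-o", output]
    if delimiter is not None:
        args += ["-d", delimiter]
    if quiet:
        args.append("-q")
    return args


# Placeholder inputs; the scale of weights and threshold is illustrative.
args = build_match_args(
    files=["crm.csv", "shop.csv"],
    attributes=[("email", "mail", 60), ("phone", "tel", 30)],
    threshold=80,
    output="matches.csv",
)
command_line = shlex.join(args)  # shell-safe string for copy and paste
```

The resulting list can be passed to `subprocess.run(args, check=True)` once the tool is installed; building a list rather than a single string avoids shell-quoting problems with unusual file names or delimiters.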
agg-grf generate
Generate test files
OPTIONS
-c, --columnsCount=columnsCount (required) Number of columns
-d, --delimiter=delimiter [default: ,] CSV separator character
-f, --file=file (required) Output file
-l, --linesNumber=linesNumber (required) Number of lines
-r, --matchRate=matchRate (required) Match rate