segmenta
v1.0.37
A fast API for managing and querying arbitrary data segments stored in Redis
Segmenta: a segments api using Redis for storage
What does it do?
Provides a mechanism for storing and retrieving sets of numbers quickly, as well as performing operations on those sets. Currently supported are:
- and: produce the set C of numbers which are in both A and B
  - [ 1, 2, 3 ] and [ 2, 3, 4 ] = [ 2, 3 ]
- or: produce the set C of numbers which are in either A or B
  - [ 1, 2, 3 ] or [ 2, 3, 4 ] = [ 1, 2, 3, 4 ]
- not: produce the set C of numbers which are in A excluding those in B
  - [ 1, 2, 3 ] not [ 2, 3, 4 ] = [ 1 ]
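The three operations above can be illustrated with plain arrays. This is only a sketch of the semantics in ordinary JavaScript; the helper names `and`, `or` and `not` are made up for the illustration and say nothing about how segmenta implements the operations internally.

```javascript
// Illustration only: plain-JavaScript versions of the three set operations.
// These helpers are hypothetical and are not part of the segmenta API.
function and(a, b) {
  const inB = new Set(b);
  return a.filter((n) => inB.has(n)); // keep values present in both
}

function or(a, b) {
  // merge, de-duplicate, and return in ascending numeric order
  return [...new Set([...a, ...b])].sort((x, y) => x - y);
}

function not(a, b) {
  const inB = new Set(b);
  return a.filter((n) => !inB.has(n)); // keep values of a that are absent from b
}

console.log(and([1, 2, 3], [2, 3, 4])); // [ 2, 3 ]
console.log(or([1, 2, 3], [2, 3, 4]));  // [ 1, 2, 3, 4 ]
console.log(not([1, 2, 3], [2, 3, 4])); // [ 1 ]
```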
How to use?
- import or require segmenta
  - javascript:
        const Segmenta = require("segmenta");
  - typescript:
        import Segmenta from "segmenta";
  - the only export of the library is the Segmenta class
- create an instance with options, if required:
        const segmenta = new Segmenta()
        const segmenta = new Segmenta(options)
  - options have the structure:
        {
          redisOptions?: RedisOptions,
          segmentsPrefix?: string,
          bucketSize?: number,
          resultsTTL?: number,
        }
    where:
    - RedisOptions are the options which would be passed to ioredis to initialize (eg host, port, etc)
    - segmentsPrefix is a prefix to apply to all segments keys (defaults to "segments")
    - bucketSize is the max size to use when creating buckets (defaults to 50kb)
    - resultsTTL is the time, in seconds, you'd like result snapshots to live for when not explicitly released (defaults to 1 day)
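As a rough sketch, an options object for the constructor might look like the following. The host, port, prefix and TTL values are placeholders chosen for the example, and `bucketSize` is left out because its unit is not spelled out here beyond the "50kb" default.

```javascript
// Illustrative options object; all values below are assumptions for the example.
const options = {
  // passed through to ioredis (host and port are standard ioredis options)
  redisOptions: { host: "127.0.0.1", port: 6379 },
  // hypothetical prefix; the documented default is "segments"
  segmentsPrefix: "my-app-segments",
  // snapshot lifetime in seconds; 24 hours matches the documented default
  resultsTTL: 60 * 60 * 24,
};

// With the library installed, this would be passed to the constructor:
//   const Segmenta = require("segmenta");
//   const segmenta = new Segmenta(options);
console.log(options.resultsTTL); // 86400
```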
- Populating data
  - add adds ids to the segment
        await segmenta.add("my-segment", [ 1, 2, 3 ]);
        // my-segment now contains [ 1, 2, 3 ]
  - del deletes ids from the segment
        await segmenta.del("my-segment", [ 2, 3 ]);
        // my-segment now contains just [ 1 ]
  - put takes a sequence of add / del commands and performs them in order (useful for batching streaming data)
        await segmenta.put("my-segment", [
          { add: 5 },
          { del: 1 },
          { add: 4 },
          { del: 5 },
          { add: 1 }
        ]);
        // my-segment now contains [ 1, 4 ]
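To see why the put example ends up with [ 1, 4 ], the ordered add / del commands can be replayed against an in-memory set. The `applyCommands` helper below is hypothetical and only mimics the documented ordering; it does not talk to Redis or to segmenta.

```javascript
// Hypothetical helper: replays add / del commands in order against an in-memory set.
function applyCommands(initialIds, commands) {
  const ids = new Set(initialIds);
  for (const command of commands) {
    if ("add" in command) ids.add(command.add);
    if ("del" in command) ids.delete(command.del);
  }
  return [...ids].sort((a, b) => a - b); // ascending numeric order
}

// The segment holds [ 1 ] after the del example above.
const after = applyCommands([1], [
  { add: 5 }, // { 1, 5 }
  { del: 1 }, // { 5 }
  { add: 4 }, // { 4, 5 }
  { del: 5 }, // { 4 }
  { add: 1 }, // { 1, 4 }
]);
console.log(after); // [ 1, 4 ]
```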
- Query
  - results are returned as an object with the shape:
        {
          ids: number[],
          skipped: number,
          count: number,
          resultSetId: string, // used to re-query against snapshot
          total: number
        }
  - simple queries are supported (entire segments), however a DSL is provided for easier querying. The client can perform and, or, and not operations on results if they are retrieved as buffers (see the example in the DSL area below).
  - Simple query, all results returned:
        await segmenta.add("my-new-segment", [ 10, 20, 30 ]);
        const result = await segmenta.query("my-new-segment");
        /* result looks like:
        {
          ids: [ 10, 20, 30 ],
          skipped: 0,
          count: 3,
          resultSetId: "4deee554-da28-4029-8231-98060fa014dc",
          total: 3
        }
        */
  - Paged query:
        await segmenta.add("paged-results-segment", [ 1, 2, 3, 4, 5 ]);
        const result1 = await segmenta.query({ query: "paged-results-segment", skip: 0, take: 2 });
        /* result1 looks like:
        {
          ids: [ 1, 2 ],
          skipped: 0,
          count: 2,
          resultSetId: "63c6a1f0-8aec-4249-9d80-63c5de13b942",
          total: 5
        }
        */
        // the rest of the results can be obtained with:
        const result2 = await segmenta.query({ query: "63c6a1f0-8aec-4249-9d80-63c5de13b942", skip: 2 });
  - Paged results are snapshotted and can be re-queried by using their id (uuid). Snapshots automatically expire after 24 hours (or the number of seconds specified by resultsTTL in your constructor arguments). You may manually dispose of results when you no longer need them:
        const result = await segmenta.query({ query: "my-set", skip: 0, take: 10 });
        await segmenta.dispose(result.resultSetId);
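Putting paging, snapshots and dispose together, a loop over all pages could look roughly like this. The `fetchAllIds` helper is hypothetical; it assumes that `total` reports the full number of matching ids and that a snapshot can be re-queried with both `skip` and `take`, which is inferred from the examples above rather than stated outright.

```javascript
// Hypothetical helper: reads every page of a query, then releases the snapshot.
// `segmenta` is any object exposing the query() and dispose() methods described above.
async function fetchAllIds(segmenta, query, pageSize) {
  // The first request creates the snapshot (take is a positive integer).
  const first = await segmenta.query({ query, skip: 0, take: pageSize });
  const ids = [...first.ids];
  try {
    while (ids.length < first.total) {
      // Later pages are read from the snapshot via its resultSetId.
      const page = await segmenta.query({
        query: first.resultSetId,
        skip: ids.length,
        take: pageSize,
      });
      if (page.ids.length === 0) break; // guard against looping forever
      ids.push(...page.ids);
    }
  } finally {
    // Release the snapshot rather than waiting for resultsTTL to expire it.
    await segmenta.dispose(first.resultSetId);
  }
  return ids;
}
```

Usage would be along the lines of `const ids = await fetchAllIds(segmenta, "paged-results-segment", 2);`. Disposing in a `finally` block keeps snapshots from lingering when a page request throws.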
  - Snapshots are only created when queries are performed with a positive integer skip or take value
  - There is a DSL for querying in a more readable manner:
        await segmenta.add("set1", [ 1, 2 ]);
        await segmenta.add("set2", [ 3, 4, 5 ]);
        await segmenta.add("set3", [ 2, 3, 5, 6, 7 ]);
        await segmenta.add("set4", [ 5, 6 ]);
        // ... some time later ...
        const query = "get where in 'set1' or 'set2' and 'set3' not 'set4'";
        const result1 = await segmenta.query(query);
        // or, with paging options:
        const result2 = await segmenta.query({ query, skip: 10, take: 100 });
        // the query syntax above is analogous to the following
        // more manual query mechanism:
        const set1 = await segmenta.getBuffer("set1");
        const set2 = await segmenta.getBuffer("set2");
        const set3 = await segmenta.getBuffer("set3");
        const set4 = await segmenta.getBuffer("set4");
        // these operations are fast, acting on bitfields in memory.
        const final = set1          // [ 1, 2 ]
          .or(set2)                 // [ 1, 2, 3, 4, 5 ]
          .and(set3)                // [ 2, 3, 5 ]
          .not(set4)                // [ 2, 3 ]
          .getOnBitPositions()
          .values;                  // returns the numeric array for bit positions
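For readers curious about what "acting on bitfields in memory" can mean, here is a minimal sketch that stores one bit per id in a `Uint8Array` and replays the same or / and / not chain. All function names are invented for the illustration, ids are assumed to be non-negative integers, and the actual buffer layout used by segmenta may well differ.

```javascript
// Sketch only: one bit per id, packed into bytes. Not segmenta's real layout.
function toBitfield(ids) {
  const max = ids.length ? Math.max(...ids) : -1;
  const bits = new Uint8Array((max >> 3) + 1);
  for (const id of ids) bits[id >> 3] |= 1 << (id & 7); // set the bit for this id
  return bits;
}

// Applies a byte-wise operation, treating missing bytes as zero.
function combine(a, b, op) {
  const out = new Uint8Array(Math.max(a.length, b.length));
  for (let i = 0; i < out.length; i++) out[i] = op(a[i] || 0, b[i] || 0);
  return out;
}

const bitAnd = (a, b) => combine(a, b, (x, y) => x & y);
const bitOr = (a, b) => combine(a, b, (x, y) => x | y);
const bitNot = (a, b) => combine(a, b, (x, y) => x & ~y); // in a, excluding b

// Converts set bit positions back into a numeric array.
function toIds(bits) {
  const ids = [];
  for (let i = 0; i < bits.length * 8; i++) {
    if (bits[i >> 3] & (1 << (i & 7))) ids.push(i);
  }
  return ids;
}

const s1 = toBitfield([1, 2]);
const s2 = toBitfield([3, 4, 5]);
const s3 = toBitfield([2, 3, 5, 6, 7]);
const s4 = toBitfield([5, 6]);

// Same left-to-right chain as the query above: set1 or set2 and set3 not set4
const finalIds = toIds(bitNot(bitAnd(bitOr(s1, s2), s3), s4));
console.log(finalIds); // [ 2, 3 ]
```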
  - One may also query for counts only:
        await segmenta.query("count where in 'x'");
  - One may also query for results to come back in a random order:
        await segmenta.query("random where in 'x'");
    When paging is requested and a resultSetId is returned, re-querying against that result-set will maintain the original randomized order.
  - Query syntax is quite simple:
        (GET | COUNT | RANDOM) WHERE IN('segment-id') [(AND|OR|NOT) IN('other-segment')]... [MIN {int}] [MAX {int}] [SKIP {int}] [TAKE {int}]
    - segments are identified by strings (single- or double-quoted)
    - three operations are supported: GET, COUNT and RANDOM
    - the results of COUNT look like those of GET, except no segment data is returned. Use the total field in the result to read your count value.
    - boolean operations are run left-to-right
    - operations may be grouped with brackets, in which case they are evaluated first, eg:
          GET WHERE IN('x') AND NOT (IN('y') OR IN('z'))
      - retrieves values which are in 'x' and also not in 'y' or 'z'
    - brackets around segment ids are optional: GET WHERE IN 'x' is equivalent to GET WHERE IN('x')
    - the IN keyword is optional after the first usage: GET WHERE IN 'x' and IN 'y' is equivalent to GET WHERE IN 'x' AND 'y'
    - syntax is case-insensitive: GET WHERE IN 'x' is equivalent to get where in 'x' and Get Where In('x')
    - skip and take can also be set on the query options. When doing so, the skip/take values on query options take precedence. This allows easy re-use of a natural-language query with changing paging values, but also facilitates natural language paging if that is your preference.
    - MIN and MAX set minimum and maximum values to bring back in the result set. These values are inclusive. This may be useful if SKIP doesn't suit your chunking needs and you would rather set a MIN and a TAKE.
    - min and max can also be set on query options. As with skip and take, the query options values for min and max override any natural language specification.
    - segment ids are case-sensitive
      - get where in 'MY-SEGMENT' is NOT equivalent to get where in 'my-segment'
    - segment ids may not contain quotations
      - they must be valid redis keys
    - some queries will never make sense, so expect either strange results or parse errors:
          GET WHERE NOT IN 'x'
      - since the segments are open-ended, this is essentially an infinite set of all numbers, excluding those in segment 'x'
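The MIN plus TAKE style of chunking mentioned above could be sketched as an async generator that advances `min` past the last id of each chunk. `chunksByMin` is a hypothetical helper; it assumes ids are non-negative, that each result returns its ids in ascending order, and that `min` and `take` may be combined on the query options.

```javascript
// Hypothetical helper: walks a query in chunks by moving `min` forward each time.
// Assumes ids are non-negative and returned in ascending order.
async function* chunksByMin(segmenta, query, chunkSize) {
  let min = 0;
  while (true) {
    const result = await segmenta.query({ query, min, take: chunkSize });
    // A positive take creates a snapshot, so release it once the ids are read.
    if (result.resultSetId) await segmenta.dispose(result.resultSetId);
    if (result.ids.length === 0) return;
    yield result.ids;
    // MIN is inclusive, so start the next chunk just past the last id seen.
    min = result.ids[result.ids.length - 1] + 1;
  }
}
```

A caller might consume it with `for await (const ids of chunksByMin(segmenta, "get where in 'x'", 1000)) { ... }`. Unlike skip-based paging, each chunk here is a fresh query, so ids added between chunks can show up in later chunks.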