shashi
v0.2.4
Published
Shashi, a simple module to generate, using pseudo-randomness, a universal family/set of hash functions, which produce integer values within the selected range (a prime number)..
Downloads
13
Maintainers
Readme
Shashi
Shashi, a simple module to generate, using pseudo-randomness, a universal family/set of hash functions, which produce integer values within the selected range (a prime number).
###A random bit of theory
A family H of hash functions is universal if, for any two items in the universe, the probability of collision is as small as possible.
Briefly, assumed that for every input key k ∈ K and for every hash function h ∈ H:
- h(k) map to { 0, 1, .., m−1 }
then, H is universal when, for any two items x != y:
- Prob( h(x) == h(y) ) = 1/m
In other words, chosen h uniformly at random, from a universal set H of hash functions, if h is used to hash n arbitrary keys into m slots, then, for a given key k, we expect that the total number of collisions with k is < n/m.
Universal hash functions could also be used to compute perfect hashing, for a determined set of keys.
Choosing properly a prime p and a constant c:
- p > c*|set of keys| (c >= ~2.09)
We expect to find a perfect hash mapping for values, in a finite time (with high probability), interpreting lists of resulting hashed values as edges to insert in a bipartite or tripartite (hyper) graph G = {V,E}:
- |E| = n (keys)
- |V| = p
we set:
- |V| = c * |E|
then we choose a prime >= 2*n:
- |V| = p = c * |E| = c * n > n * 2
Algo example; we choose |H| = 2, then H = { h0, h1 }:
while ( true ) :
pick-up at random 2 hash functions from H
for every key k ∈ K, add resulting edge and vertices to the 2-graph:
- edge = (v0, v1), with v0 = h0(k), v1 = h1(k)
now testing the 2-graph acyclicity (it could be performed in linear time).
if 2-graph is acyclic:
- break the loop, well done!
- from now on, you can build a perfect hashing function for your set of keys, in a deterministic way.
- otherwise let's gamble!!
- continue loop, picking-up 2 others random hash functions
See also:
###Install
$ npm install shashi [-g]
require:
var Shashi = require( 'shashi' );
###Run Tests
to run all test files, install devDependecies:
$ cd shashi/
# install or update devDependecies
$ npm install
# run tests
$ npm test
to execute a single test file simply do:
$ node test/file-name.js
###Run Benchmarks
run miscellaneous benchmarks for Shashi:
$ cd shashi/
$ npm run bench
# or to run a single file
$ node bench/file-name-bench.js
###Method
Arguments within [ ] are optional.
#####Generate seed sequence using Math.random.
get a method to use a family of hash functions.
/*
* A trivial example, Shashi( 2, 16, 257 ) generates:
* - 2 hash functions, h0 and h1 (indexes 0, 1)
* - every hash fn, expects/accepts at most 16 items to encode
* - the range of items should be: [0,255], then using 1 byte
* per item in the seed sequence.
*
* NOTE:
* - if the p number is not a prime, the next greater prime number will be calculated.
* - if the p number is out of range (> 2^53), you'll get an Error.
*/
Shashi( Number hash_fn, Number items_to_hash, Number prime_for_seed_range ) : Array
It returns an Array composed by:
- a method used to retrieve and execute a particular hash function, using an index (0-k).
- the underlying seed Sequence, used for generating the hashed value (an integer).
/*
* When you only need to regenerate random data for seeding your hash functions,
* you can simply refresh the seed sequence using Sequence#fill or Sequence#parse.
*
* See also https://github.com/rootslab/brando.
*/
Array : [ Function uhash, Sequence seed ]
// universal set of hash functions
uhash : function ( Number fn_index, Buffer input_data [, Number bytes_per_item ] )
#####Generate seed sequence using a random sample.
When a data Buffer is used to fill the seed sequence, the method automatically switches to the callback mode.
Shashi( Number h, Number i, Number p [, Buffer src [, Function cback ] ] ) : undefined
the callback gets 3 arguments:
/*
* NOTE: an error was returned when:
* - the number supplied for the seed sequence range is not a prime.
* - the sample data are not enough to fill the random seed Sequence.
*/
cback : function ( Error e, Function uhash, Sequence seed ) { .. }
See also examples.
MIT License
Copyright (c) 2015 < Guglielmo Ferri : [email protected] >
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the 'Software'), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.