weighted-random-item-sampler

v1.0.2

The WeightedRandomItemSampler class implements a random sampler where the probability of selecting an item is proportional to its weight, with replacement allowed between samples. In other words, an item can be sampled more than once.

For example, given items [A, B] with respective weights [5, 12], item B is 12/5 times as likely to be sampled as item A: the probabilities are 5/17 for A and 12/17 for B.

Weights must be positive numbers, but they need not be natural numbers: floating point weights such as 0.95, 5.4, and 119.83 are also supported.
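To make the proportionality concrete, the following minimal sketch converts a weights array into selection probabilities. The helper `selectionProbabilities` is a hypothetical name introduced here for illustration; it is not part of this package's API.

```typescript
// Hypothetical helper (not part of the package): converts positive weights
// into the selection probability of each respective item.
function selectionProbabilities(weights: ReadonlyArray<number>): number[] {
  const total = weights.reduce((sum, weight) => sum + weight, 0);
  return weights.map(weight => weight / total);
}

// Items [A, B] with respective weights [5, 12].
const [probabilityA, probabilityB] = selectionProbabilities([5, 12]);
// probabilityA equals 5/17 and probabilityB equals 12/17,
// so B is 12/5 times as likely to be sampled as A.
```

The same helper works unchanged for floating point weights, since only the ratio of each weight to the total matters.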

Use case examples include:

  • Distributed Systems: The sampler can assist in distributing workloads among servers based on their capacities or current load, ensuring that more capable servers handle a greater number of tasks.
  • Surveys and Polls: The sampler can be used to select participants based on demographic weights, ensuring a representative sample.
  • Attack Simulation: Randomly select attack vectors for penetration testing based on their likelihood or impact.
  • ML Model Training: Select training samples with weights based on their importance or difficulty to ensure diverse and balanced training data.

If your use case requires sampling each item exactly once without replacement, consider using non-replacement-weighted-random-item-sampler instead.

Key Features :sparkles:

  • Weighted Random Sampling :weight_lifting_woman:: Items are sampled with probability proportional to their weight.
  • With Replacement: Items can be sampled multiple times.
  • Efficiency :gear:: O(log(n)) time and O(1) space per sample, making this class suitable for performance-critical applications where the set of items is large and the sampling frequency is high.
  • Comprehensive documentation :books:: The class is thoroughly documented, enabling IDEs to provide helpful tooltips that enhance the coding experience.
  • Tests :test_tube:: Fully covered by unit tests.
  • TypeScript support.
  • No external runtime dependencies: Only development dependencies are used.
  • ES2020 Compatibility: The tsconfig target is set to ES2020, ensuring compatibility with ES2020 environments.

API :globe_with_meridians:

The WeightedRandomItemSampler class provides the following method:

  • sample: Randomly samples an item, with the probability of selecting a given item being proportional to its weight.

If needed, refer to the code documentation for a more comprehensive description.

Use Case Example: Training Samples for an ML model :man_technologist:

Consider a component responsible for selecting training samples for an ML model. By assigning weights based on the importance or difficulty of each sample, we ensure a diverse and balanced training dataset.

import { WeightedRandomItemSampler } from 'weighted-random-item-sampler';

interface TrainingSampleData {
  // ...
}

interface TrainingSampleMetadata {
  importance: number; // Weight for sampling.
  // ...
}

interface TrainingSample {
  data: TrainingSampleData;
  metadata: TrainingSampleMetadata;
}

class ModelTrainer {
  private readonly _trainingSampler: WeightedRandomItemSampler<TrainingSample>;

  constructor(samples: ReadonlyArray<TrainingSample>) {
    this._trainingSampler = new WeightedRandomItemSampler(
      samples, // Items array.
      samples.map(sample => sample.metadata.importance) // Respective weights array.
    );
  }

  public selectTrainingSample(): TrainingSample {
    return this._trainingSampler.sample();
  }
}

Algorithm :gear:

This section introduces a foundational algorithm, which will later be optimized. For simplicity, we assume all weights are natural numbers (1, 2, 3, ...). A plausible solution with O(1) time per sample and O(sum of weights) space is to allocate an array whose size equals the sum of the weights, and to assign each item to as many cells as its weight. For example, given items A and B with respective weights of 1 and 2, we would allocate one cell for item A and two cells for item B. Sampling then amounts to picking a uniformly random cell. This approach is valid when the number of items and their weights are relatively small. However, it breaks down when weights can be non-natural (e.g., 5.4, 0.23), and it incurs significant memory overhead when the total weight sum is large.
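The foundational approach can be sketched roughly as follows. This is an illustration only, assuming natural-number weights; the class name `ExpandedArraySampler` and its members are hypothetical and do not appear in this package.

```typescript
// Hypothetical illustration of the foundational approach (not the package's code).
// Each item occupies as many cells as its weight; natural-number weights only.
class ExpandedArraySampler<T> {
  private readonly _cells: T[] = [];

  constructor(items: ReadonlyArray<T>, weights: ReadonlyArray<number>) {
    if (items.length === 0 || items.length !== weights.length) {
      throw new Error('Expected non-empty items and weights arrays of equal length');
    }

    items.forEach((item, i) => {
      const weight = weights[i] ?? 0;
      if (!Number.isInteger(weight) || weight < 1) {
        throw new Error(`Weight ${weight} is not a natural number`);
      }
      // Memory grows with the sum of the weights, not with the number of items.
      for (let copy = 0; copy < weight; ++copy) {
        this._cells.push(item);
      }
    });
  }

  public get cellsCount(): number {
    return this._cells.length;
  }

  // O(1): pick a uniformly random cell.
  public sample(): T {
    const index = Math.floor(Math.random() * this._cells.length);
    return this._cells[index] as T;
  }
}
```

For items A and B with weights 1 and 2, this sketch allocates three cells, two of which hold B. A weight such as 5.4 is rejected outright, which is the limitation the prefix sum optimization removes.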

Next, we introduce an optimization over this basic idea. We calculate a prefix sum of the weights, treating each cell in the prefix sum array as the exclusive upper bound of an imaginary half-open range. Using the previous example with items A and B (weights 1 and 2), the prefix sum array is [1, 3]: the first range is [0, 1), while the second range is [1, 3). We can then randomly sample a number (not necessarily a natural number) within the total range [0, 3) and match it to the index of the range that contains it, which corresponds to a specific item. This matching can be performed in O(log n) time using a left-biased binary search to find the leftmost index i such that randomPoint < prefix_sum[i]. The key observation that enables this binary search is that the prefix sum array is strictly increasing, since weights are necessarily positive.
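A minimal sketch of the prefix sum and left-biased binary search described above is shown below. The function names `buildPrefixSums`, `findRangeIndex`, and `sampleIndex` are assumptions made for illustration; the package's internal implementation may differ.

```typescript
// Hypothetical sketch of the optimized approach (names are illustrative only).

// prefixSums[i] is the exclusive upper bound of the i-th half-open range.
function buildPrefixSums(weights: ReadonlyArray<number>): number[] {
  const prefixSums: number[] = [];
  let runningSum = 0;
  for (const weight of weights) {
    if (!(weight > 0)) {
      throw new Error(`Weight ${weight} is not a positive number`);
    }
    runningSum += weight;
    prefixSums.push(runningSum);
  }
  return prefixSums;
}

// Left-biased binary search: returns the leftmost index i such that
// point < prefixSums[i]. If no index qualifies, the last index is returned.
function findRangeIndex(prefixSums: ReadonlyArray<number>, point: number): number {
  let left = 0;
  let right = prefixSums.length - 1;
  while (left < right) {
    const middle = Math.floor((left + right) / 2);
    if (point < (prefixSums[middle] as number)) {
      right = middle; // middle qualifies; a smaller index may qualify too.
    } else {
      left = middle + 1; // middle and everything to its left are ruled out.
    }
  }
  return left;
}

// Draws a random point in [0, total) and maps it to an item index in O(log n).
function sampleIndex(prefixSums: ReadonlyArray<number>): number {
  const total = prefixSums[prefixSums.length - 1] as number;
  return findRangeIndex(prefixSums, Math.random() * total);
}
```

With weights [1, 2], `buildPrefixSums` yields [1, 3]; a point of 0.5 falls in [0, 1) and maps to index 0, while a point of exactly 1 falls in [1, 3) and maps to index 1. Because only comparisons against the prefix sums are involved, non-natural weights need no special handling.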

License :scroll:

Apache 2.0