npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

mongodb-collection-sample

v5.0.0

Published

Sample documents from MongoDB collections.

Downloads

32,785

Readme

mongodb-collection-sample

Sample documents from a MongoDB collection.

Install

npm install --save mongodb-collection-sample

Example

npm install mongodb lodash mongodb-collection-sample
const sample = require('mongodb-collection-sample');
const { MongoClient } = require('mongodb');
const _ = require('lodash');

const client = new MongoClient();

async function main() {
  await client.connect('mongodb://localhost:27017');

  // Generate 1000 documents
  const docs = _range(0, 1000).map(function(i) {
    return {
      _id: 'needle_' + i,
      is_even: i % 2
    };
  });

  // Insert them into a collection.
  await db.collection('haystack').insert(docs);

  const options = {};
  // Size of the sample to capture [default: `5`].
  options.size = 5;

  // Query to restrict sample source [default: `{}`]
  options.query = {};

  // Get a stream of sample documents from the collection.
  const stream = sample(db, 'haystack', options);
  stream.on('error', function(err){
    console.error('Error in sample stream', err);
    return process.exit(1);
  });
  stream.on('data', function(doc){
    console.log('Got sampled document `%j`', doc);
  });
  stream.on('end', function(){
    console.log('Sampling complete!  Goodbye!');
    db.close();
    process.exit(0);
  });
}

main();

Options

Supported options that can be passed to sample(db, coll, options) are

  • query: the filter to be used, default is {}
  • size: the number of documents to sample, default is 5
  • fields: the fields you want returned (projection object), default is null
  • raw: boolean to return documents as raw BSON buffers, default is false
  • sort: the sort field and direction, default is {_id: -1}
  • maxTimeMS: the maxTimeMS value after which the operation is terminated, default is undefined
  • promoteValues: boolean whether certain BSON values should be cast to native Javascript values or not. Default is true

How It Works

Native Sampler

MongoDB version 3.1.6 and above generally uses the $sample aggregation operator:

db.collectionName.aggregate([
  {$match: <query>},
  {$sample: {size: <size>}},
  {$project: <fields>},
  {$sort: <sort>}
])

However, if more documents are requested than are available, the $sample stage is omitted for performance optimization. If the sample size is above 5% of the result set count (but less than 100%), the algorithm falls back to the reservoir sampling, to avoid a blocking sort stage on the server.

Reservoir Sampling

For MongoDB version 3.1.5 and below we use a client-size reservoir sampling algorithm.

  • Query for a stream of _id values, limit 10,000.
  • Read stream of _ids and save sampleSize randomly chosen values.
  • Then query selected random documents by _id.

The two modes, illustrated:

Performance Notes

For peak performance of the client-side reservoir sampler, keep the following guidelines in mind.

  • The initial query for a stream of _id values must be limited to some finite value. (Default 10k)
  • This query should be covered by an index
  • Since there's a limit, you may wish to bias for recent documents via a sort. (Default: {_id: -1})
  • Don't sort on {$natural: -1}: this forces a collection scan!

Queries that include a sort by $natural order do not use indexes to fulfill the query predicate

  • When retrieving docs: batch using one $in to reduce network chattiness.

License

Apache 2