npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

popurelate

v0.0.7

Published

Easily populate and filter documents using data from related collections.

Downloads

50

Readme

Popurelate - Populate related data

Easily populate and filter documents using data from related collections.

Note: Although popurelate supports adding custom engine, it currently provides ready engines only for mongodb and mongoose, so this documentation will only concentrate on them.

But since mongoose already support populate, why is popurelate necessary? Mongoose populations are indeed powerful, but since they perform separate queries on collections after fetching documents from the base collection, you are only able to filter the documents from secondary collections. This design choice does not allow you to filter base documents after populating in database. On the other hand, popurelate sends one aggregation into database and makes it possible to filter after populating.

Let's consider an example.

import mongoose  from "mongoose";
const Schema = mongoose.Schema;

const userSchema = Schema({
  _id: Schema.Types.ObjectId,
  name: String,
  premiumMember: Boolean
});
const User = mongoose.model("User", userSchema);

const bookSchema = Schema({
  title: String,
  outOfStock: Boolean,
  author: { type: Schema.Types.ObjectId, ref:"'User" },
});

const Book = mongoose.model("Book", bookSchema);

If we write the following:

Book.
  find({ outOfStock: false }).
  populate('author').
  limit(5);

This promise will return max 5 books which are not out of stock and in the matched documents, mongoose will populate the authors from users collection. In the end, in the final documents author property will not be represented as ObjectId, but as an document from the User model.

Now, what if we want to fetch only the books that are not out of stock and have premium member authors? Mongoose supports match while populating. At first, we might think that this will do the job:

Book.
  find({ outOfStock: false }).
  populate({
    path: "author",
    match: { premiumMember: true },
  }).
  limit(5);

But this works quite differently. Instead of filtering out books that have premiumMember users as authors, it will filter out the users collections. In the end, mongoose will still return max 5 books but if the author is not a premium member, populated value will be null. Thus, you will get 5 books which will have author property populated only if corresponding user is a premium member.

This is not what we wanted. We wanted 5 in-stock books which have premium member authors.

This is where popurelate comes handy.

First, create popurelate instance, add mongoose engine and list our models.


import { createPopurelation } from "popurelate";
import { defaultEngines } from "popurelate/lib/mongo";

const dbQuery = createPopurelation()
	.addEngine(defaultEngines.mongoose())
	.addDb({
		engine: "mongoose",
		db: null, // no need to pass database in the case of mongoose.
		idField: "_id",
		defaultRequired: false, // whether populating should have same effect as `inner join` instead of `left join`. For optimization purposes, we recommend this to be set to false
		models: {
			users: User, // you can change the names of models for popurelate (the keys are the names). Just make sure that you pass mongoose models as values
			books: Book,
		},
	});

And then define relations:

dbQuery.addRelation("books", {
	author: "users", // if author property was an array, we would write "author[]": "users" instead of "author": "users"
  // you can list other properties too which are related to another model
});

But since we have already defined relations (references) on mongoose schema, popurelate comes with helper function to automatically detect relations from schemas. Instead of redefining relations for popurelate, you can write:

import { addMongooseRelations } from "popurelate/lib/mongo/relations";
addMongooseRelations(dbQuery);

To log relations that popurelate has detected, pass logger function to it

addMongooseRelations(dbQuery, { logger: console.log });

Now we can query books this way:

dbQuery.models("books").
  findMany({ outOfStock: false }).
  populate("author").
  where({ "author.premiumMember": true }).
  limit(5).
  exec();

Now we have achieved the original desire of fetching 5 in-stock books which have premium member user as an author. This is because popurelate constructs an aggregation pipeline for mongo and queries documents in one request, instead of performing additional queries after fetching like mongoose does. Our constructed aggregation uses $lookup aggregation behind the curtain. It avoids you to write huge messy lookup queries.

Deep aggregation

Population can happen multiple times arbitrary deeply. If we have other models and relations defined, deep population might look like this:

dbQuery.models("books").
  findMany({ outOfStock: false }).
  populate("author", {
    populate: {
      "friends[]": true, // since friends is an array
      blockedBook: {
        matchesMany: true, // pass true if it matches many document on joined collection
        populate: {
          "subscribedBy[].id": true // if subscribedBy is an array of objects which have property id defined
        }
      }
    }
  })
  populate("publisher", { // another population
    required: true, // will filter out books that could not join with publishers. Has same effect as preserveNullAndEmptyArrays: false in $unwind pipeline of mongodb 
  }).
  where({ "author.premiumMember": true }).
  limit(5).
  exec();

If the relation is not predefined for popurelate, you can still populate by specifying necessary information:

dbQuery.models("books").
  findMany({ outOfStock: false }).
  populate("bookAuthor", {
    localField: "author", // if not set, it will take provided property as default. In this case "bookAuthor"
    model: "users",
    foreignField: "_id", // if not set, it will take id field of database as default. id field of database is defined while creating popurelate instance
    populate: { // we can still continue deep population
      "friends[]": true,
    }
  }).
  where({ "author.premiumMember": true }).
  limit(5).
  exec();

.count and .withCount

You can get the count of documents:

dbQuery.models("books").
  findMany({ outOfStock: false }).
  populate("author").
  where({ "author.premiumMember": true }).
  count().
  exec();

If you want to get count alongside documents, use withCount instead of count

dbQuery.models("books").
  findMany({ outOfStock: false }).
  populate("author").
  where({ "author.premiumMember": true }).
  withCount().
  exec();

You can pass optional skip and limit to withCount; In this case, you might want to perform rest of the operations after skip and limit. Let's say, we want to populate publishers too. Since this population does not affect the count of matched documents, we can continue building the query after counting is done. The following code achieves this optimization:

dbQuery.models("books").
  findMany({ outOfStock: false }).
  populate("author").
  where({ "author.premiumMember": true }).
  withCount({
    skip: 40,
    limit: 20,
    queryBuilder: q => q.populate("publishers")
  }).
  exec();

This is same as:

dbQuery.models("books").
  findMany({ outOfStock: false }).
  populate("author").
  where({ "author.premiumMember": true }).
  withCount({
    queryBuilder: q => q.skip(40).limit(20).populate("publishers")
  }).
  exec();

this will return object with two properties, count and docs as an array

You might change the name of these fields this way:

dbQuery.models("books").
  findMany({ outOfStock: false }).
  populate("author").
  where({ "author.premiumMember": true }).
  withCount({
    skip: 40,
    limit: 20,
    countKey: "num",
    docsKey: "documents",
    queryBuilder: q => q.populate("publishers") // you can continue querying further. This will not affect the `count` property (in this case `num` property)
  }).
  exec();

Other pipelines

Other than matching, populating, limit, sort, count and withCount, popurelate supports sort, project and addFields pipelines too. You can even use custom pipelines using rawPipeline.

dbQuery.models("books").
  findMany({ outOfStock: false }).
  populate("author").
  where({ "author.premiumMember": true }).
  project({
    title: 1,
    author: 1
  }).
  .addFields({
    userId: "$author._id"
  }).
  sort({ createdAt: -1 }).
  rawPipeline({
    $group: { // since popurelate does not support group method while building query, you can add it as a raw pipeline
      _id: "$userId",
      bookId: { $first: "$_id" }
    }
  })
  exec();

You can define portion of query and use it later:

const query2 = dbQuery.newQueryBuilder(q => q.sort({ createdAt: -1 }).skip(40).limit(20));

dbQuery.models("books").
  findMany({ outOfStock: false }).
  populate("author").
  where({ "author.premiumMember": true }).
  queryBuilder(query2).
  exec();

Optimization

Query optimization is taken very seriously. It is off by default, since it might break when using rawPipeline. You can enable this during creating popurelation:


import { createPopurelation } from "popurelate";
const dbQuery = createPopurelation()
	.addEngine(defaultEngines.mongoose({
    useOptimizer: true,
  }))
  .addDb({
    ......

Consider the query:

dbQuery.models("books").
  findMany({ outOfStock: false }).
  populate("author").
  where({ "author.premiumMember": true }).
  populate("publishers").
  skip(40).
  limit(20).
  exec();

Here we are populating publishers before skip and limit. Our default optimizer reorders it to make query more performant. It will have same effect as:

dbQuery.models("books").
  findMany({ outOfStock: false }).
  populate("author").
  where({ "author.premiumMember": true }).
  skip(40).
  limit(20).
  populate("publishers").
  exec();

This:

dbQuery.models("books").
  findMany({ outOfStock: false }).
  populate("author").
  where({ "author.premiumMember": true }).
  populate("publishers").
  addFields({ authorId: "$author._id" }).
  sort({ createdAt: -1 }).
  withCount({
    skip: 40,
    limit: 20
  }).
  exec();

will have the same effect as:

dbQuery.models("books").
  findMany({ outOfStock: false }).
  populate("author").
  where({ "author.premiumMember": true }).
  sort({ createdAt: -1 }).
  withCount({
    queryBuilder: q => q.sort({ createdAt: -1 }).skip(40).limit(20).populate("publishers").addFields({ authorId: "$author._id" })
  }).
  exec();
dbQuery.models("books").
  findMany({ outOfStock: false }).
  populate("author").
  where({ "publishers": true }).
  populate("publishers").
  project({ authorId: "$author._id" }).
  exec();

Here populating publishers has no effect to final document, so it will be removed by optimizer.

Optimizer touches nested populations too. It tries to execute pipelines that are changing the count of final documents first and moves everything down if moving down does not change final outcome.

Optimizer does not work well on custom pipelines added using rawPipeline method. Optimizer barely optimizes populations that are matching single document and are marked as required (either using reqtuired: true in population method or defaultRequired: true in early stage of creating popurelation) (inner join), since potentially it may affect count of final documents.

You may disable/enable optimizer for single query using optimizer method:

dbQuery.models("books").
  findMany({ outOfStock: false }).
  populate("author").
  where({ "publishers": true }).
  populate("publishers").
  project({ authorId: "$author._id" }).
  optimizer(false)
  exec();

Popurelate for Mongodb

If you are not using mongoose but raw mongo, you can still use popurelate.

const dbQuery = createPopurelation()
	.addEngine(defaultEngines.mongodb({
    useOptimizer: true,
  }))
	.addDb({
		engine: "mongodb",
		db: MongoDatabase,
		idField: "_id",
		defaultRequired: false,
		models: {
			users: 1, // Since we have no actual models, just list name of the collections and pass arbitrary value 
			books: 1,
      companies: 1,
		},
	});

And then add relations, such as:

dbQuery.addRelation("books", {
	author: "users",
  "publishers[]": "companies"
});

Done!