npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

@samsch/nestit

v0.1.2

Published

Nesting function for pg join output

Downloads

3

Readme

PostgreSQL Join data relation hydrator

npm package Github repo

Turns the denormalized output of a normal join into the nested data structure you actually want!

Note that normally you should write your queries to use json_agg and similar PostgreSQL functions rather than doing this aggregation on your client. This tool is for cases where you can't or you find the performance of this to be better.

// Input
// Fields data from `pg`, a little config:
[
	// All fields required
	{
		name: 'users', // Name of table (must be unique)
		id: 'id',  // ID column
		children: ['hobbies'],  // Children relations. Must match `name` of a subsquent table
		foreign: undefined,  // undefined if no parent
	},
	{
		name: 'hobbies',  // Must be in a previous children column
		id: 'id',
		children: [],  // empty array if no children
		foreign: 'user',  // Must be the foreign key field that matches the ID on the parent
	},
]
// and the row data
[
	[
		// users.id
		1,
		// users.name
		'Sam',
		// hobbies.id
		1,
		// hobbies.user
		1,
		// hobbies.name
		'Drawing',
	],
	[
		1,
		'Sam',
		3,
		1,
		'Singing',
	],
	[
		2,
		'Clara',
		2,
		2,
		'Coding',
	],
	[
		2,
		'Clara',
		4,
		2,
		'Skiing',
	],
]

// Output
[
	{
		id: 1,
		name: 'Sam',
		hobbies: [
			{
				id: 1,
				user: 1,
				name: 'Drawing',
			},
			{
				id: 3,
				user: 1,
				name: 'Singing',
			},
		],
	},
	{
		id: 2,
		name: 'Clara',
		hobbies: [
			{
				id: 2,
				user: 2,
				name: 'Coding',
			},
			{
				id: 4,
				user: 2,
				name: 'Skiing',
			},
		],
	}
]

Limitations:

  • Requires your select has the parent relations ordered first
  • Requires some annoying fiddling with knex options and query-response event
  • Requires you to give it relation data.
  • Performance implications

So, it's common for ORMs to do this kinda thing, and they solve these above problems by having the relation data pre-defined in "models", ordering the select statements automatically, adding naming prefixes (and sometimes requiring all fields to be defined on the model), and sometimes by simply making a bunch of separate queries.

Pretty much any of that could be built over this, but since you're getting the lower level tool, you can abstract it the best way for your app.

Usage example

const nestit = require('@samsch/nestit');

Promise.try(() => {
	return knex('users')
		// We need the array rowMode to get duplicated named fields
		.options({ rowMode: 'array' })
		// We need the column data, easiest to just attach it to the row data
		// ...and that's what nestit expects
		.on('query-response', (response, obj) => {
			response.fields = obj.response.fields;
		})
		.select(['users.*', 'posts.*', 'comments.*'])
		.leftJoin('posts', { 'users.id': 'posts.user' })
		.leftJoin('comments', { 'posts.id': 'comments.post' })
		.orderBy(['users.id', 'posts.id', 'comments.id']);
}).then(rows => {
	const result = nestit(rows, [
		{
			name: 'users',
			id: 'id',
			children: ['posts'],
			foreign: undefined,
		},
		{
			name: 'posts',
			id: 'id',
			children: ['comments'],
			foreign: 'user',
		},
		{
			name: 'comments',
			id: 'id',
			children: [],
			foreign: 'post',
		},
	]);

	console.log(result);
});

Performance notes

UPDATED:

Previously my notes here stated that using Nestit was probably a good performance choice for normal amounts of query data. This was based on some flawed tests which didn't have the appropriate data indexes on the foreign key fields. Missing those indexes causes the json_agg queries to take much longer.

With indexes on the foreign keys, using json_agg with various types of joins will generally perform better across a variety of query result sizes and nesting levels than using Nestit with plain join output.

Some very un-scientifically sourced numbers.

The dataset is generated 5000 users, 29810 posts, 177741 comments. About 6 posts per user, 6 comments per post on average. The real data has random amounts of posts and comments per parent row. For the numbers below, I made sure the single record had at least several posts and comments. I'm personally biased a bit against using Nestit instead of json_agg solutions. I think having the database do it is better.

||1 base row|3 base rows|1250 base rows|5000 base rows (full dataset) :-----:|:-----:|:-----:|:-----:|:-----: Simple Join|4.4ms|2.8ms|400ms|1587ms Nestit|.1ms|8.6ms|66ms|332ms Sum (Simple Join + Nestit)|4.5ms|11.3ms|466ms|1919ms Lateral json_agg|1.1ms|2.1ms|243ms|1026ms Select Subquery json_agg|1.1ms|2.2ms|232ms|1002ms Join Subquery json_agg|1.4ms|576ms|640ms|860ms

The queries used can be seen in testing/queries/.

My performance takeaways

Lateral joins are a fairly new feature, but for "normal" amounts of data you'd be dealing with for building user interfaces and such, it has really great performance. Unfortunately, Knex doesn't directly support this yet (https://github.com/knex/knex/issues/3732). Select subqueries are the most practical to use with Knex currently, and in my opinion the easiest to write an automatic nesting abstraction for. On the flipside, the SQL output is the hardest of all of them to read. Join subqueries are easy to write, pretty easy to understand, but they have severe performance issues for small amounts of data beyond a single base row. My guess here is the query planner is optimizing for the full size of the dataset when it knows it's getting more than a single record. This would also explain why this query takes the least time for the full dataset.

If your application is performance sensitive, I would recommend doing your own testing with realistic data. Don't forget indexes on the foreign keys!