rapid-couchdb-backup

v0.3.5

Rapid Backup

A couchdb database backup tool focusing on speed and rate limit controls. It backs up the active docs in a couchdb database to a node stream (which could be a file or something else).

Note that this tool does not back up deleted docs. See limitations.

Rate Limit Controls

This tool starts off with a slow API request rate and increases it until a 429 response code is received. It then lowers its internal limit to stay within the detected rate limit, less the head_room_percent setting. It continues to adjust its internal rate if additional 429 codes are received during the backup. This head room allows other applications to keep accessing the db without hitting the rate limit.

There are also settings to control min/max rate limits as well as maximum pending requests. These settings should prevent the backup from overwhelming couchdb!
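
The adjustment behavior described above could be sketched roughly like this. This is not the lib's actual implementation. The class name AdaptiveRate, its method names, the 10% ramp-up step and the default floor of 1 are all assumptions made for illustration. Only the head_room_percent, min_rate_per_sec and max_rate_per_sec names come from this readme.

```javascript
// Hypothetical sketch of an adaptive rate controller (not the lib's code).
class AdaptiveRate {
	constructor({ min_rate_per_sec = 1, max_rate_per_sec = 50, head_room_percent = 20 } = {}) {
		this.min = min_rate_per_sec;
		this.max = max_rate_per_sec;
		this.headRoom = head_room_percent / 100;
		this.rate = this.min;       // start slow
		this.detectedLimit = null;  // unknown until the first 429
	}

	// Call after a window of successful requests: ramp up by an assumed 10% (at least 1).
	onSuccess() {
		const ceiling = this.detectedLimit === null
			? this.max
			: Math.min(this.max, this.detectedLimit * (1 - this.headRoom));
		const next = Math.max(this.rate + 1, this.rate * 1.1);
		this.rate = Math.max(this.min, Math.min(ceiling, next));
		return this.rate;
	}

	// Call when a 429 arrives: treat the current rate as the real limit,
	// then drop below it by the head room percentage (never below the floor).
	onRateLimited() {
		this.detectedLimit = this.rate;
		const target = this.detectedLimit * (1 - this.headRoom);
		this.rate = Math.max(this.min, Math.min(this.max, target));
		return this.rate;
	}
}
```

With a head room of 20, a 429 received at 50 requests per second would settle this sketch at about 40 requests per second, and later successes would not push it above that.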

Speed

This couchdb backup lib will be much faster than @cloudant/couchbackup if the database has a high percentage of deleted docs. Otherwise it is only faster on large databases, and it's actually slower on very small databases.

| Backup Test | Rapid Backup | Cloudant CouchBackup | Speed Up |
| ----------- | ----------- | ----------- | ----------- |
| XLarge - 0% deleted | 1.8 hrs | 4.8 hrs | 2.7x |
| XLarge - 75% deleted | 34.5 mins | 4.9 hrs | 8.5x |
| Large - 0% deleted | 2.7 mins | 6.0 mins | 2.2x |
| Large - 75% deleted | 52.9 secs | 5.9 mins | 6.7x |
| Small - 0% deleted | 3.4 secs | 2.4 secs | 0.7x (slower) |
| Small - 75% deleted | 1.9 secs | 2.4 secs | 1.3x |

  • XLarge - 22M docs, total size 10GB
  • Large - 581k docs, total size 275MB
  • Small - 2k docs, total size 5MB

Usage

const fs = require('fs'); // needed for the write stream below

// to enable detailed logs pass a logger or the console to the lib,
// else logging is disabled. (warning - there are a lot of logs!)
const rapid = require('rapid-couchdb-backup')(console);

// all options are shown below:
const opts = {
	// [required] the database connection url, including basic auth and port if applicable
	couchdb_url: 'https://username:password@couchdb.example.com:443',

	// [required] the database name to backup
	db_name: 'my-db',

	// [required] the stream to write the backup to.
	write_stream: fs.createWriteStream('./_backup_docs.json'),

	// [optional] the optimal batch read response size in bytes.
	// This will indirectly set the number of docs to batch read per request.
	// Recommended to set this around 256KB - 1MB (the higher the better, usually).
	// Setting this too high may overwhelm couchdb.
	// defaults 131072 (128KB)
	batch_get_bytes_goal: 128 * 1024,

	// [optional] the maximum number of read queries to spawn per second.
	// If the rate limit is unknown, leave blank.
	// This lib will auto detect the real rate limit.
	// It will back off once a 429 response code is found.
	// defaults 50
	max_rate_per_sec: 30,

	// [optional] the maximum number of read queries to be waiting on.
	// Setting this too high may overwhelm couchdb (10-50 seems okay).
	// defaults to Math.floor(max_rate_per_sec / get_doc_batch_size) * 2
	// recommended to leave it blank
	max_parallel_reads: undefined,

	// [optional] how much of the real rate limit should be left for other applications.
	// example: if 20 is set, only 80% of the detected rate limit will be used.
	// defaults 20
	head_room_percent: 18,

	// [optional] the minimum number of read queries per second.
	// when the lib encounters a 429 response code it lowers its internal limit.
	// this setting will create a floor for the internal limit.
	// defaults 50
	min_rate_per_sec: 50,

	// [optional] the maximum amount of time to wait on a read api in milliseconds.
	// defaults 240000 (4 minutes)
	read_timeout_ms: 1000 * 60 * 2,

	// [optional] an IBM Cloud IAM apikey can be provided.
	// if provided, a bearer token authorization header will be used to connect to couch.
	// the access token will be refreshed 5 minutes before it expires.
	// the default iam exchange endpoint is:
	//  - https://identity-3.us-south.iam.cloud.ibm.com/identity/token
	//  - this url can be overwritten with the env var IAM_TOKEN_URL
	iam_apikey: 'asdf',
};

rapid.backup(opts, (errors, date_completed) => {
	console.log('backup completed on:', date_completed);
	if (errors) {
		console.error('looks like we had errors:', JSON.stringify(errors, null, 2));
	}
});

How it Works

The issue with the other backup tools is that they back up the delete history from the _changes feed. That leads to poor performance if you have a ton of deleted docs, and it only gets worse over time (assuming your applications are creating and deleting docs regularly). Each delete is still something those tools must process, so the time for a complete backup keeps growing as deletes accumulate!

The number of deleted docs is mostly irrelevant to this lib. The main variable driving how long a backup will take is the number of docs that are not deleted.

In phase1 the backup will walk the _changes feed and ignore delete entries. It will keep up to X doc ids in memory at a time. In phase2 it will send bulk/batch GET doc apis to receive as many docs as the settings allow. As the docs come in they will be written to the output stream. It will then repeat phase1 and phase2 until all docs are backed up. Once it's done with that, it needs to find out whether any docs were added or edited since the backup started. In phase3 it will walk the _changes feed again, beginning from the point where the backup started. Any new or changed docs will be written to the backup.
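
As a rough illustration of phase1 and phase2, the sketch below filters one _changes response down to active doc ids and then groups those ids into bulk read bodies. The function names are hypothetical, and the use of a { keys: [...] } body (the shape accepted by couchdb's _all_docs endpoint with include_docs=true) is an assumption. This readme does not say which bulk api the lib actually calls.

```javascript
// Phase 1 (sketch): pull the ids of non-deleted docs out of one _changes response.
// Entries for deleted docs carry "deleted": true in the couchdb _changes feed.
function activeDocIds(changesResponse) {
	return changesResponse.results
		.filter((entry) => !entry.deleted)
		.map((entry) => entry.id);
}

// Phase 2 (sketch): split the ids into batches and build one bulk read body per batch.
// Assumed body shape: POST /{db}/_all_docs?include_docs=true with { keys: [...] }.
function buildBulkReads(ids, batchSize) {
	const bodies = [];
	for (let i = 0; i < ids.length; i += batchSize) {
		bodies.push({ keys: ids.slice(i, i + batchSize) });
	}
	return bodies;
}

// Example with a hand-written _changes response:
const changes = {
	results: [
		{ seq: '1', id: 'a', changes: [{ rev: '1-x' }] },
		{ seq: '2', id: 'b', changes: [{ rev: '2-y' }], deleted: true },
		{ seq: '3', id: 'c', changes: [{ rev: '1-z' }] },
	],
	last_seq: '3',
};
const reads = buildBulkReads(activeDocIds(changes), 2);
```

Because deleted entries are dropped before any doc is fetched, the number of bulk reads depends only on the number of active docs.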

Limitations

  • Will only back up active docs. Meaning the deleted doc history is not part of the backup (with the exception of when a delete happens during the backup process).
  • Docs that were deleted during the backup will appear in the beginning of the backup (in the un-deleted state). However, they will be followed by their delete stub at the end of the backup data. Since restoring walks the backup in order, the deleted doc will momentarily appear and then be deleted by the end.
  • Docs that were edited during the backup will appear twice in the backup data. The latest version is the one towards the end of the backup. Since restoring walks the backup in order, the old doc will momentarily appear and then be updated by the end.
  • Does not store doc metadata such as previous revision tokens.
  • Does not back up attachments (this was chosen to preserve compatibility with @cloudant/couchbackup's restore function).
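
The effect of duplicates and delete stubs on a restore can be sketched as an in-order replay. The function name finalState is hypothetical, and the sketch assumes a delete stub is a doc carrying couchdb's standard "_deleted": true field. It only models the end result and is not how @cloudant/couchbackup restores data.

```javascript
// Sketch: replay backup batches in order and return the final state per doc id.
function finalState(batches) {
	const docs = new Map();
	for (const batch of batches) {
		for (const doc of batch) {
			if (doc._deleted) {
				docs.delete(doc._id);   // a delete stub removes the earlier copy
			} else {
				docs.set(doc._id, doc); // a later version overwrites an earlier one
			}
		}
	}
	return docs;
}

// Doc "1" was edited and doc "2" was deleted while the backup was running:
const state = finalState([
	[{ _id: '1', v: 1 }, { _id: '2', v: 1 }],
	[{ _id: '1', v: 2 }, { _id: '2', _deleted: true }],
]);
```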

Backup Structure

Same output as @cloudant/couchbackup. It's a series of bare JSON arrays of doc objects, one array per line.

[{"_id":"1","_rev":"1-1","d":1},{"_id":"2","_rev":"2-2","d":2}...]
[{"_id":"3","_rev":"3-3","d":3},{"_id":"4","_rev":"4-4","d":4}...]
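
Since each line is a standalone JSON array, the backup can be read back one line at a time. A minimal sketch, assuming the whole backup fits in memory as a string (parseBackup is a hypothetical helper name, not part of this lib):

```javascript
// Sketch: parse backup text (one JSON array per line) into a flat list of docs.
function parseBackup(text) {
	const docs = [];
	for (const line of text.split('\n')) {
		if (line.trim() === '') continue; // skip blank lines, e.g. a trailing newline
		for (const doc of JSON.parse(line)) {
			docs.push(doc);
		}
	}
	return docs;
}

const parsed = parseBackup('[{"_id":"1","d":1},{"_id":"2","d":2}]\n[{"_id":"3","d":3}]\n');
```

For large backups it would be better to stream the file line by line (for example with node's readline module) instead of loading it all at once.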

How to Restore

The output format of this backup is compatible with @cloudant/couchbackup. Use that lib to restore.
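
A restore might look roughly like the sketch below. It assumes @cloudant/couchbackup is installed and uses its restore(srcStream, targetUrl, opts, callback) function as described in that lib's readme. The helper name restoreFromFile and the empty options object are assumptions, so check the couchbackup docs for the options and target database requirements of the version you use.

```javascript
const fs = require('fs');

// Sketch: restore a backup file into a target database.
// @cloudant/couchbackup is loaded lazily, so it is only needed when this runs.
function restoreFromFile(backupPath, targetDbUrl, callback) {
	const couchbackup = require('@cloudant/couchbackup');
	const source = fs.createReadStream(backupPath);
	// signature per the couchbackup readme: restore(srcStream, targetUrl, opts, callback)
	couchbackup.restore(source, targetDbUrl, {}, callback);
}
```

It would be called with the file written by the backup, for example restoreFromFile('./_backup_docs.json', targetDbUrl, (err) => { ... }), where targetDbUrl is the full url of the database to restore into.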