

couchimport

Introduction

When populating CouchDB databases, the source data is often a file of JSON documents, or structured CSV/TSV data exported from another database.

couchimport is designed to assist with importing such data into CouchDB efficiently. Simply pipe a file full of JSON documents into couchimport, telling it the URL and database to send the data to.

Note: couchimport used to handle the CSV to JSON conversion itself, but this part is now handled by csvtojsonlines, keeping this package smaller and easier to maintain. The 1.x releases were the last to support CSV/TSV natively; from 2.0 onwards, couchimport is only for pouring JSONL files into CouchDB.

Also note: the companion CSV export utility (couchexport) is now hosted at couchcsvexport.

Installation

Install using npm or another Node.js package manager:

npm install -g couchimport

Usage

couchimport can read JSON docs (one per line) either from stdin, e.g.

cat myfile.json | couchimport

or from a file whose name is passed as the last parameter:

couchimport myfile.json

couchimport's configuration parameters can be stored in environment variables or supplied as command line arguments.

Configuration - environment variables

Simply set the COUCH_URL environment variable e.g. for a hosted Cloudant database

export COUCH_URL="https://myusername:mypassword@myhost.cloudant.com"

and define the name of the CouchDB database to write to by setting the COUCH_DATABASE environment variable e.g.

export COUCH_DATABASE="mydatabase"

Then simply pipe the JSONL data into couchimport:

cat mydata.jsonl | couchimport

Configuration - command-line options

Alternatively, supply the --url and --database values as command-line parameters:

couchimport --url "http://user:password@localhost:5984" --database "mydata" mydata.jsonl

or by piping data into stdin:

cat mydata.jsonl | couchimport --url "http://user:password@localhost:5984" --database "mydata" 

Handling CSV/TSV data

We can use another package, csvtojsonlines, to convert CSV/TSV files into a JSONL stream acceptable to couchimport:

# CSV file ----> JSON lines ---> CouchDB
cat transactions.csv | csvtojsonlines --delimiter ',' | couchimport --db ledger

Generating random data

couchimport can be paired with datamaker to generate any amount of sample data:

# template ---> datamaker ---> 100 JSON docs ---> couchimport ---> CouchDB
echo '{"_id":"{{uuid}}","name":"{{name}}","email":"{{email true}}","dob":"{{date 1950-01-01}}"}' | datamaker -f json -i 100 | couchimport --db people
written {"docCount":100,"successCount":1,"failCount":0,"statusCodes":{"201":1}}
written {"batch":1,"batchSize":100,"docSuccessCount":100,"docFailCount":0,"statusCodes":{"201":1},"errors":{}}
Import complete

or with the template as a file:

cat template.json | datamaker -f json -i 10000 | couchimport --db people

Understanding errors

We know that if we get an HTTP 4xx/5xx response, then all of the documents in that request failed to be written to the database. But because couchimport writes data in bulk, a bulk request may get an HTTP 201 response even though not all of the documents were written: some of the document ids may have been in the database already. So the couchimport output includes counts of the number of documents that were written successfully and the number that failed, plus a tally of the HTTP response codes and individual document error messages:

e.g.

written {"batch":10,"batchSize":1,"docSuccessCount":4,"docFailCount":6,"statusCodes":{"201":10},"errors":{"conflict":6}}

The above log line shows that after the tenth batch of writes, we have written 4 documents and failed to write 6 others. There were six "conflict" errors, meaning that there was a clash of document id or id/rev combination.
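As an illustrative sketch (the database name and file are placeholders), conflict errors typically appear when a file of documents that each carry their own _id is imported twice; the second run fails for every document because those ids already exist:

# first run: documents written
cat mydata.jsonl | couchimport --db mydb
# second run: the same _ids are already in the database, so each write is reported as a "conflict"
cat mydata.jsonl | couchimport --db mydb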

Parallel writes

Older versions of couchimport supported having multiple HTTP requests in flight at any one time, but the new simplified couchimport does not. To achieve the same effect, simply split your file of JSON docs into smaller pieces and run multiple couchimport jobs:

# split the large file into files of 1m lines each
# this will create files xaa, xab, xac etc.
split -l 1000000 massive.txt
# find all files starting with x and using xargs,
# spawn a max of 2 processes at once running couchimport,
# one for each file
find . -name "x*" | xargs -t -I % -P 2 couchimport --db test %

Environment variables reference

  • COUCH_URL - the url of the CouchDB instance (required, or to be supplied on the command line)
  • COUCH_DATABASE - the database to deal with (required, or to be supplied on the command line)
  • COUCH_BUFFER_SIZE - the number of records written to CouchDB per bulk write (defaults to 500, not required)
  • IAM_API_KEY - to authenticate using IBM IAM, set IAM_API_KEY to your API key and a bearer token will be used in the HTTP requests.
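As a minimal sketch (the URL, database name and buffer size below are placeholder values), a typical environment-variable setup might look like this:

export COUCH_URL="http://user:password@localhost:5984"
export COUCH_DATABASE="mydatabase"
# write 1000 documents per bulk request instead of the default 500
export COUCH_BUFFER_SIZE="1000"
cat mydata.jsonl | couchimport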

Command-line parameters reference

You can also configure couchimport using command-line parameters, as shown in the example after this list:

  • --help - show help
  • --url/-u - the url of the CouchDB instance (required, or to be supplied in the environment)
  • --database/--db/-d - the database to deal with (required, or to be supplied in the environment)
  • --buffer/-b - the number of records written to CouchDB per bulk write (defaults to 500, not required)
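For example, the same import expressed entirely with command-line parameters, including a larger bulk-write buffer, might look like this (the values shown are placeholders):

couchimport --url "http://user:password@localhost:5984" --database "mydata" --buffer 1000 mydata.jsonl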

Using programmatically

In your project, add couchimport into the dependencies of your package.json or run npm install --save couchimport. In your code, require the library with

const couchimport = require('couchimport')

and your options are set in an object whose keys are the same as the command-line parameters:

e.g.

const fs = require('fs')
// rs: read JSONL from a file instead of stdin
const opts = { url: "http://localhost:5984", database: "mydb", rs: fs.createReadStream('myfile.json') }
await couchimport(opts)

Note: rs is the read stream from which the data will be read (default: stdin) and ws is the write stream to which the output will be written (default: stdout).