
into-cartodb v3.3.5: for streaming arbitrarily sized streams into CartoDB

into CartoDB

Module for inserting data into CartoDB.

# for use in a node.js package
npm install into-cartodb
# for use on the command line
npm install into-cartodb -g

Command line API

At its simplest, you can just point it at a file with the -f option and it will upload it to the CartoDB table whose name is the file name minus the extension.

into-cartodb -k API_KEY -u USER_NAME -f path/to/file.ext -b 200

If you set the CARTODB_USER_NAME and CARTODB_API_KEY environment variables, you may omit the -k and -u options.

export CARTODB_USER_NAME=USER_NAME
export CARTODB_API_KEY=API_KEY
into-cartodb -f path/to/file.ext

If you need to import into a table that is named differently than the file, use the -t option.

into-cartodb -f path/to/file.ext -t table_name

If you want to stream from stdin, you still need to tell us what kind of file it is; use the -n option to do so.

getfile | into-cartodb -n file.ext

If you don't specify a table name, we default to the file name you gave; otherwise we only care about the extension, so the following are equivalent:

getfile | into-cartodb -n tablename.ext
getfile | into-cartodb -n nobody_cares.ext -t tablename

By default we create a new table. You can either use the --method (-m) flag to specify that you want to append or replace, or use the --append (-a) or --replace (-r) shortcuts.

# all three are equivalent
into-cartodb -f path/to/file.ext -m append
into-cartodb -f path/to/file.ext --append
into-cartodb -f path/to/file.ext -a
# all three are equivalent
into-cartodb -f path/to/file.ext -m replace
into-cartodb -f path/to/file.ext --replace
into-cartodb -f path/to/file.ext -r
# all four are equivalent
into-cartodb -f path/to/file.ext
into-cartodb -f path/to/file.ext -m create
into-cartodb -f path/to/file.ext --create
into-cartodb -f path/to/file.ext -c

Create mode throws an error if the table already exists in CartoDB; replace and append modes throw an error if the table does not exist yet.

By default, data is pushed into CartoDB in batches of 200. Use the -b argument to decrease the batch size if you are running out of memory, or to increase it if the upload is taking too long.

By default we stream to a temp table and then move the data over once we're done. If you don't want that for whatever reason (e.g. really big data), pass the -d option for a direct import; it only works with create and append.

The -p option (for coPy, and because -c and -C are already taken) uses the new Carto copy API.

Supported formats are:

  • .geojson
  • .csv
  • .json
  • .shp
  • .kml
  • .kmz
  • .zip

Caveats:

  • .shp can't be streamed in from stdin; it must come from the file system and it must have a .dbf in the same folder. If it isn't in unprojected WGS84, the .prj file must also be in the same folder.
  • the only geometry supported by .csv and .json is points, encoded in fields named x and y or lat and lon (or lng). These must be WGS84 lat/lons (even for x and y).
  • .json must have an array of objects as the top-level element (i.e. not the same as GeoJSON).
  • shapefiles must not be zipped.
  • .kml: no styles, and only extended data.
  • .kmz: same as .kml, but additionally, like .shp, it must come from the file system, not stdin.
  • a .zip must be a path on the file system (no stdin) and may contain any other format (except .kmz). If the zip has more than one valid file, use the -n parameter to specify which one; otherwise it'll pick the first one it can find.

This tool uploads to a temp table and then inserts the rows into the target table after all rows have been uploaded. This is MUCH faster than importing directly into the table, but if the upload doesn't finish it can lead to two minor issues:

  1. orphaned, invisible temp tables hanging around; you can drop all of these with the -C (aka --cleanup) option from the command line.
  2. a failed creation of a new table will leave a rump table of only 50 rows in CartoDB; you'll need to delete that manually.

Programmatic API

var writeStream = intoCartoDB(user, key, table, options, callback);

user and key are the CartoDB credentials and table is the destination table. options is an object with various config options; if it's a string, it's treated as method, which is one of 'create', 'append', or 'replace' (default 'create') and selects the table creation strategy. callback is called when the data is fully inserted into CartoDB. Note that listening for the stream's finish event is not sufficient for knowing that the data is fully uploaded, because the stream is buffered internally; if the callback is omitted, you can listen for the uploaded event instead. Additionally, an inserted event is emitted when features are successfully inserted; the event object is an integer telling you how many.

Returned stream is an object stream which takes geojson features, geometry may be null.
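A minimal sketch of this API, based on the description above; USER_NAME, API_KEY, and my_table are placeholders, and the require is wrapped so the sketch is readable even without the package installed:

```javascript
// assumes: npm install into-cartodb
var intoCartoDB = null;
try {
  intoCartoDB = require('into-cartodb');
} catch (e) {
  // package not installed; the block below still shows the intended usage
}

// a GeoJSON feature of the shape the returned object stream accepts
var feature = {
  type: 'Feature',
  geometry: { type: 'Point', coordinates: [-73.98, 40.75] },
  properties: { name: 'example point' }
};

if (intoCartoDB) {
  // passing a string for options treats it as the method ('create' here)
  var stream = intoCartoDB('USER_NAME', 'API_KEY', 'my_table', 'create', function (err) {
    if (err) { return console.error('upload failed:', err); }
    console.log('fully uploaded'); // the callback, not 'finish', signals completion
  });
  stream.on('inserted', function (count) {
    console.log('inserted ' + count + ' features');
  });
  stream.write(feature);
  stream.end();
}
```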

Other supported options besides method include:

  • validations: an array of Promise-returning functions, called in order with 3 arguments:

    • the name of the table in CartoDB
    • a map where each entry's key is the field name to be inserted into the main table and the value is the SQL expression used to get it out of the temp table
    • a cartodb-tools database object (same API as knex, minus transactions)
  • batchSize: number of features to insert into CartoDB at a time; defaults to 200, decrease it if you are running out of memory.

  • copy: use the carto copy api

    See below for more info.
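As a sketch of how these options fit together (requireGeometry is a hypothetical custom validation, not part of the package):

```javascript
// Hypothetical custom validation: it receives the temp table name, the field
// map, and a db handle, and must return a Promise; rejecting aborts the
// import. This one never touches the db handle.
function requireGeometry(table, fields, db) {
  if (!fields.has('the_geom')) {
    return Promise.reject(new Error('no geometry was imported into ' + table));
  }
  return Promise.resolve();
}

var options = {
  method: 'append',               // 'create' (default), 'append', or 'replace'
  batchSize: 100,                 // smaller batches if memory is tight
  copy: false,                    // set true to use the carto copy api
  validations: [requireGeometry]  // run in order after the upload
};
// var stream = intoCartoDB(user, key, 'my_table', options, callback);
```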

Validations

If the promise rejects, the temporary table is cleaned up (and the stub table is cleaned up for create operations). The field map works by the key being the name of the field to insert into the table and the value being the SQL expression that reads it out of the temp table. For instance, usually the geometry value is the same as the key, the_geom, but if the geometry needs to be fixed it is instead ST_MakeValid(the_geom) as the_geom.
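As an illustration of that field map (dest_table, temp_table, and the name column are hypothetical; the actual INSERT is issued by the package):

```javascript
// Keys name the destination columns; values are the SQL expressions that
// read them out of the temp table.
var fields = new Map();
fields.set('name', 'name');                                   // copied as-is
fields.set('the_geom', 'ST_MakeValid(the_geom) as the_geom'); // repaired geometry

// Conceptually this map produces something like:
//   INSERT INTO dest_table (name, the_geom)
//   SELECT name, ST_MakeValid(the_geom) as the_geom FROM temp_table;
```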

One validation is included by default; it is used to:

  • transfer the geometry field over if there are any non-null geometries
  • check whether any of the geometries are invalid, and run ST_MakeValid on them if so

The source of that function is:
var Bluebird = require('bluebird');
// the debug namespace here is assumed for illustration
var debug = require('debug')('into-cartodb');

var fixGeom = Bluebird.coroutine(function * fixGeom(table, fields, db) {
  // does the table contain any non-null geometries?
  var hasGeom = yield db(table).select(db.raw('bool_or(the_geom is not null) as hasgeom'));
  hasGeom = hasGeom.length === 1 && hasGeom[0].hasgeom;
  if (hasGeom) {
    debug('has geometry');
    // are the geometries all already valid?
    let allValid = yield db(table).select(db.raw('bool_and(st_isvalid(the_geom)) as allvalid'));
    allValid = allValid.length === 1 && allValid[0].allvalid;
    if (allValid) {
      debug('geometry is all valid');
      fields.set('the_geom', 'the_geom');
    } else {
      debug('has invalid geometry');
      fields.set('the_geom', 'ST_MakeValid(the_geom) as the_geom');
    }
  }
});