npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

@posthog/clickhouse

v1.7.0

Published

Yandex ClickHouse database interface

Downloads

15,021

Readme

Simple and powerful interface for ClickHouse

npm package License: MIT

npm install @posthog/clickhouse

Fork of @apla/clickhouse.

Synopsis

const ClickHouse = require('@posthog/clickhouse')
const ch = new ClickHouse({ host, port, user, password })

const stream = ch.query("SELECT 1", (err, data) => {})
stream.pipe(process.stdout)

// promise interface, not recommended for selects
await ch.querying("CREATE DATABASE test")

Examples:

API

new ClickHouse(options: Options)

Options

| | required | default | description | :--------------- | :------: | :------------ | :---------- | host | ✓ | | Host to connect. | user | | | Authentication user. | password | | | Authentication password. | path (pathname) | | / | Pathname of ClickHouse server. | port | | 8123 | Server port number. | protocol | | 'http:' | 'https:' or 'http:'. | dataObjects | | false | By default (false), you'll receive array of values for each row. If you set dataObjects: true, every row will become an object with format: { fieldName: fieldValue, … }. Alias to format: 'JSON'. | format | | JSONCompact | Adds the FORMAT statement for query if it did not have one. Specifies format of selected or inserted data. See "Formats for input and output data" to find out possible values. | queryOptions | | | Object, can contain any ClickHouse option from Settings, Restrictions and Permissions. See example. | readonly | | false | Tells driver to send query with HTTP GET method. Same as readonly=1 setting. More details. | timeout, headers, agent, localAddress, servername, etc… | | | Any http.request or https.request options are also available.

Options example:
const ch = new ClickHouse({
  host: "clickhouse.msk",
  dataObjects: true,
  readonly: true,
  queryOptions: {
    profile: "web",
    database: "test",
  },
})

clickHouse.query(query, [options], [callback])

Sends a query statement to a server.

query: string

SQL query statement.

options: Options

The same Options, excluding connection options.

callback: (error, result) => void

Will be always called upon completion.

Returns: DuplexStream

It supports .pipe to process records. You should have at least one error handler listening. Via query callback or via stream error event.

| Stream event | Description | ------------ | ----------- | 'error' | Query execution finished with error. If you have both query callback and stream error listener, you'll have error notification in both listeners. | 'metadata' | When a column information is parsed. | 'data' | When a row is available. | 'end' | When entire response is processed. Regardless of whether there is an 'end' listener, the query callback are always called. You should always listen to 'data' event together with 'end' event. "The 'end' event will not be emitted unless the data is completely consumed." If you don't need to handle 'data' event prefer to use only callback or Promise interface.

stream.supplemental

After response is processed, you can read a supplemental response data from it, such as row count.

Examples:

clickHouse.ping(callback)

Sends an empty query. Doesn't requires authorization.

callback: (error, result) => void

Will be called upon completion.

Promise interface

Promise interface is not recommended for INSERT and SELECT queries.

  • INSERT can't do bulk load data with promise interface.
  • SELECT will collect entire query result in the memory. See the Memory size section.

With promise interface query result are parsed synchronously. This means that large query result in promise interface:

  • Will synchronously block JS thread/event loop.
  • May lead to memory leaks in your app due peak GC loads.

Use it only for queries where resulting data size is is known and extremely small. The good cases to use it is DESCRIBE TABLE or EXISTS TABLE

clickHouse.querying(query, [options])

Similar to ch.query(query) but collects entire response in memory and resolves with complete query result. See the Memory size section.

options: Options

The same Options, excluding connection options.

Returns: Promise

Will be resolved with entire query result.

Example of promise interface.

clickHouse.pinging()

Promise interface for .ping.

Returns: Promise

How it works

Bulk data loading with INSERT statements

INSERT can be used for bulk data loading. There is a 2 formats easily implementable with javascript: CSV and TabSeparated/TSV.

CSV is useful for loading from file, thus you can read and .pipe into clickhouse file contents. To activate CSV parsing you should set format driver option or query FORMAT statement to CSV:


var csvStream = fs.createReadStream('data.csv')
var clickhouseStream = ch.query(statement, { format: CSV })

csvStream.pipe(clickhouseStream)

TSV is useful for loading from file and bulk loading from external sources, such as other databases. Only \\, \t and \n need to be escaped in strings; numbers, nulls, bools and date objects need some minor processing. You can send prepared TSV data strings (line ending will be appended automatically), buffers (always passed as is) or Arrays with fields.

Internally, every field will be converted to the format which ClickHouse can accept. Then escaped and joined with delimiter for the particular format. If you ever need to store rows (in arrays) and send preformatted data, you can do it.

ClickHouse also supports JSONEachRow format which can be useful to insert javascript objects if you have such recordset.

const stream = ch.query(statement, { format: 'JSONEachRow' })

stream.write(object) // Do write as many times as possible
stream.end() // And don't forget to finish insert query

Memory size

You can read all the records into memory in single call like this:


var ch = new ClickHouse({ host: host, port: port })
ch.querying("SELECT number FROM system.numbers LIMIT 10", (err, result) => {
  // result will contain all the data you need
})

In this case whole JSON response from the server will be read into memory, then parsed into memory hogging your CPU. Default parser will parse server response line by line and emits events. This is slower, but much more memory and CPU efficient for larger datasets.

Examples

Selecting with stream:

const readableStream = ch.query(
  'SELECT * FROM system.contributors FORMAT JSONEachRow',
  (err, result) => {},
)
const writableStream = fs.createWriteStream('./contributors.json')
readableStream.pipe(writableStream)

Inserting with stream:

const readableStream = fs.createReadStream('./x.csv')
const writableStream = ch.query('INSERT INTO table FORMAT CSV', (err, result) => {})
readableStream.pipe(writableStream)

Insert single row of data:

const ch = new ClickHouse(options)
const writableStream = ch.query(`INSERT INTO table FORMAT TSV`, (err) => {
  if (err) {
    console.error(err)
  }
  console.log('Insert complete!')
})

// data will be formatted for you
writableStream.write([1, 2.22, "erbgwerg", new Date()])

// prepare data yourself
writableStream.write("1\t2.22\terbgwerg\t2017-07-17 17:17:17")

writableStream.end()

Selecting large dataset:

const ch = new ClickHouse(options)
// it is better to use stream interface to fetch select results
const stream = ch.query("SELECT * FROM system.numbers LIMIT 10000000")

stream.on('metadata', (columns) => { /* do something with column list */ })

let rows = []
stream.on('data', (row) => rows.push(row))

stream.on('error', (err) => { /* handler error */ })

stream.on('end', () => {
  console.log(
    rows.length,
    stream.supplemental.rows,
    stream.supplemental.rows_before_limit_at_least, // how many rows in result are set without windowing
  )
})

Inserting large dataset:

const ch = new ClickHouse(options)
// insert from file
const tsvStream = fs.createReadStream('data.tsv')
const clickhouseStream = ch.query('INSERT INTO table FORMAT TSV')

tsvStream.pipe(clickhouseStream)

Settings for connection:

const ch = new ClickHouse({
  host: 'clickhouse.msk',
  queryOptions: {
    database: "test",
    profile: "web",
    readonly: 2,
    force_index_by_date: 1,
    max_rows_to_read: 10 * 1e6,
  },
})

Settings for query:

const ch = new ClickHouse({ host: 'clickhouse.msk' })
const stream = ch.query('INSERT INTO table FORMAT TSV', {
  queryOptions: {
    database: "test",
    insert_quorum: 2,
  },
})

Promise interface:

const ch = new ClickHouse(options)
// Check connection to server. Doesn't requires authorization.
await ch.pinging()
const { data } = await ch.querying("SELECT 1")
// [ [ 1 ] ]
const { data } = await ch.querying("DESCRIBE TABLE system.numbers", { dataObjects: true })
// [ { name: 'number', type: 'UInt64', default_type: '', default_expression: '' } ]

Questions?

Join our Slack community.

We're here to help you with anything PostHog!