target-clickhouse
v2.8.0
Target Clickhouse
A Singer target for ClickHouse, for use with Singer streams generated by Singer taps. It is written in Node.js using singer-node.
Usage
Install
As an npm package on the host
npm install -g target-clickhouse
Docker image
docker pull ghcr.io/biron-bi/target-clickhouse
Run
Create a config file, config.json, with connection information and ingestion parameters:
{
  "host": "localhost",
  "port": 8123,
  "database": "destination_database",
  "username": "user",
  "password": "averysecurepassword"
}
Run target-clickhouse against a Singer tap.
In the following examples:
The state emitted by the target is appended to the end of a state.jsonl file.
The file current_state.json contains the last line of state.jsonl.
The file config.json contains the ClickHouse connection information.
npm package:
<tap-anything> --state current_state.json | target-clickhouse --config config.json >> state.jsonl
Docker:
In this example, the container reads the config file from its /config directory, where the current directory is mounted read-only.
<tap-anything> --state current_state.json | docker run --rm -i -a STDIN -a STDOUT -a STDERR -v "$(pwd):/config:ro" ghcr.io/biron-bi/target-clickhouse --config /config/config.json >> state.jsonl
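Both examples rely on current_state.json holding the last line of state.jsonl before the next run. A minimal TypeScript sketch of that step, assuming each line of state.jsonl is one complete JSON state object as echoed by the target; lastStateLine is a hypothetical helper, not part of target-clickhouse:

```typescript
// Hypothetical helper: returns the last non-empty line of state.jsonl content,
// i.e. the most recent state echoed by the target. Throws if that line is not valid JSON.
function lastStateLine(stateJsonl: string): string | undefined {
  const lines = stateJsonl
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0); // ignore blank lines and the trailing newline
  if (lines.length === 0) {
    return undefined; // no state yet, e.g. on the very first run
  }
  const last = lines[lines.length - 1];
  JSON.parse(last); // sanity check: a truncated final line would throw here
  return last;
}
```

In Node.js the input would be the content of state.jsonl and the result would be written to current_state.json. From a shell, tail -n 1 state.jsonl > current_state.json does the same as long as the file has no trailing blank lines.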
Config.json
Fields that can be set in the config file:
Mandatory fields
host
port
username
password
database
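Before piping a tap into the target, it can help to check a config for the mandatory fields. A minimal TypeScript sketch, assuming config.json has already been parsed into a plain object; ClickhouseConfig and missingMandatoryFields are hypothetical names, and this is not the target's own validation logic:

```typescript
// Hypothetical shape of the mandatory part of config.json, from the list above.
interface ClickhouseConfig {
  host: string;
  port: number;
  username: string;
  password: string;
  database: string;
}

const MANDATORY_FIELDS: (keyof ClickhouseConfig)[] = ["host", "port", "username", "password", "database"];

// Returns the mandatory fields that are absent (or null) in a parsed config object,
// in the order they are listed above.
function missingMandatoryFields(config: Record<string, unknown>): string[] {
  return MANDATORY_FIELDS.filter((field) => config[field] === undefined || config[field] === null);
}
```

For example, missingMandatoryFields(JSON.parse(text)) on a config that only sets host and port reports username, password and database.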
Optional fields
logging_level: Default "INFO"
subtable_separator: Default "__"
translate_values: Whether field values should be parsed again to allow conversion of specific values, e.g. True is accepted as true. Default false
batch_size: Number of records to read before sending them to ClickHouse. Default 100
finalize_concurrency: Number of stream ingestions finalized concurrently. Default 3
extra_active_tables: List of tables considered active even if they are not present in the ACTIVE_STREAMS message. Default []
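A minimal TypeScript sketch of how the documented defaults combine with a user-supplied config. OptionalSettings, DEFAULTS and resolveOptionalSettings are hypothetical names; only the field names and default values come from the list above, and the target's own merging may differ (for instance, here an explicit null in config.json would override the default):

```typescript
// Optional config.json fields with the defaults listed above.
interface OptionalSettings {
  logging_level: string;
  subtable_separator: string;
  translate_values: boolean;
  batch_size: number;
  finalize_concurrency: number;
  extra_active_tables: string[];
}

const DEFAULTS: OptionalSettings = {
  logging_level: "INFO",
  subtable_separator: "__",
  translate_values: false,
  batch_size: 100,
  finalize_concurrency: 3,
  extra_active_tables: [],
};

// Hypothetical helper: fills every optional field missing from the parsed config
// with its documented default; fields that are present are kept as-is.
function resolveOptionalSettings(config: Partial<OptionalSettings>): OptionalSettings {
  return {
    ...DEFAULTS,
    ...config,
    // copy the array so callers never share (and mutate) the DEFAULTS instance
    extra_active_tables: [...(config.extra_active_tables ?? DEFAULTS.extra_active_tables)],
  };
}
```

With { batch_size: 500 }, every other optional field keeps its default, so records would be sent to ClickHouse in batches of 500 while three stream finalizations may run at once.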
Singer specification extensions
The following features are supported but are not part of the standard Singer specification:
- Update schemas: pass the repeatable CLI option --update-streams <stream> to specify streams whose tables (root and children) should be recreated. Repeat the option once per stream.
- Clean first: specify clean_first: true in SCHEMA messages to wipe the table content before each ingestion.
- Cleaning column: specify cleaning_column: "<column_name>" in SCHEMA messages to replace, during ingestion, existing rows whose value in that column matches an incoming record. For instance, if column "date" is the cleaning column and a record with the value "2022-01-01" is encountered, all rows with the value "2022-01-01" are replaced with those contained in the stream.
- All key properties: specify all_key_properties: {props: [], children: {}} in SCHEMA messages to declare primary keys for all children of a root table. This allows each child to create a foreign key to its parent, named with the format _parent_<column>.
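The extensions above can be sketched in TypeScript. The interfaces, the "orders" stream, applyCleaningColumn and parentKeyColumns are hypothetical and written for illustration only. The key names clean_first, cleaning_column and all_key_properties and the _parent_<column> naming come from this README, while placing the extension keys at the top level of the SCHEMA message is an assumption. applyCleaningColumn models the replacement semantics in memory; the target applies them to ClickHouse tables, and its actual queries are not shown here.

```typescript
// Hypothetical typing of the all_key_properties value: {props: [], children: {}}.
interface AllKeyProperties {
  props: string[];
  children: Record<string, AllKeyProperties>;
}

// Assumption: the extension keys sit at the top level of the SCHEMA message,
// next to the standard Singer fields.
interface SchemaMessage {
  type: "SCHEMA";
  stream: string;
  schema: Record<string, unknown>;
  key_properties: string[];
  clean_first?: boolean;
  cleaning_column?: string;
  all_key_properties?: AllKeyProperties;
}

type Row = Record<string, unknown>;

// A tap could emit this as one line on stdout before the stream's RECORD messages.
const message: SchemaMessage = {
  type: "SCHEMA",
  stream: "orders",
  schema: {
    type: "object",
    properties: { id: { type: "integer" }, date: { type: "string" } },
  },
  key_properties: ["id"],
  cleaning_column: "date",
  all_key_properties: { props: ["id"], children: {} },
};
const schemaLine: string = JSON.stringify(message);

// In-memory model of cleaning_column: existing rows whose value in `column` also
// appears in the incoming records are dropped, then the incoming records are added.
function applyCleaningColumn(existing: Row[], incoming: Row[], column: string): Row[] {
  const incomingValues = new Set(incoming.map((record) => record[column]));
  const kept = existing.filter((row) => !incomingValues.has(row[column]));
  return [...kept, ...incoming];
}

// Foreign-key column names a child table would carry for its parent's keys,
// following the _parent_<column> naming described above.
function parentKeyColumns(parentProps: string[]): string[] {
  return parentProps.map((column) => `_parent_${column}`);
}
```

With "date" as the cleaning column, an incoming record dated "2022-01-01" removes every stored row dated "2022-01-01", while rows for other dates are left untouched.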
Sponsorship
Target Clickhouse is written and maintained by Biron https://birondata.com/
Acknowledgements
Special thanks to the people who built
License
Distributed under the AGPLv3