@contiamo/dev
v0.7.0-2-g5ec7b93
Dev environment for contiamo
Contiamo Local Dev Environment
Get the dev environment fast!
Quick overview
Get started:
make docker-auth
make pull
Get the latest versions:
git pull
make pull
Start everything in normal mode:
make start
Stop everything:
make stop
Stop everything and clean up:
make clean
Prepare for Pantheon-external mode (only do this once):
make build
sudo bash -c 'echo "127.0.0.1 metadb" >> /etc/hosts'
Start everything in Pantheon-external mode:
make pantheon-start
- (In Pantheon directory)
env METADB_URL="jdbc:postgresql://localhost:5433/pantheon?user=pantheon&password=test" DATASTORE_TYPE=external sbt run
Enable TLS verify-full mode on port 5435:
- Download the private key for *.dev.contiamo.io:
make get-pg-key
echo "127.0.0.1 pg-localhost.dev.contiamo.io" | sudo tee -a /etc/hosts
make build
make pantheon-start
- You may need to tell your local psql about the IdenTrust root we happen to be using:
curl https://letsencrypt.org/certs/trustid-x3-root.pem.txt > ~/.postgresql/root.crt
psql "[email protected] password=<token> dbname=<project UUID> sslmode=verify-full" -h pg-localhost.dev.contiamo.io -p 5435
Getting started
Prerequisites
Local development is supported via Docker Compose.
Before you start, you must install Docker and Docker-Compose.
Additionally, the development environment requires access to our private Docker registry. To get access, ask the Ops team for permissions. Once permissions have been granted, you must install the gcloud CLI.
Once installed, run
make docker-auth pull
This will attempt to
- authenticate with Google,
- configure your Docker installation to use the new Google credentials, and
- pull the required Docker images.
Starting a fresh environment
Finally, to start the development environment, run
make start
Once the environment has started, you should see a message with a URL and credentials, like this
Dev ui: http://localhost:9898/contiamo/profile
Email: [email protected]
Password: localdev
Starting with the latest localdev snapshot
The above section starts with a completely empty environment. A standard development environment with preconfigured data sources (the internal metadbs) is provided in the project and can be started with
make load-snapshot
The existing environment (if any) will be stopped and destroyed, so be careful. It will then start the db, load the data, and then start the rest of the environment.
The environment
- contains two users, [email protected] and [email protected], both with password localdev
- has all of the datahub metadbs installed, foodmart, alaska, and liftdata
- there are two virtualdbs with two views each. One shows the maintenance tasks inside Hub and the other shows the use of PostGIS queries
- Mr. Lemon is an admin for everything
- Lemon Jr is not an admin and has various permission levels; liftdata is private and not available to Lemon Jr
- There is a basic amount of metadata assigned to the datasources and tables, including custom fields, descriptions, a mix of names, and even one with documentation
This should allow for basic development and testing of most use cases.
Overriding the service images
The image for each service can be overridden using env variables
| variable | value |
|-----------------------|---------------------------------------------------|
| AUTH_IMAGE | eu.gcr.io/dev-and-test-env/idp:dev |
| GRAPHQL_IMAGE | eu.gcr.io/dev-and-test-env/pgql-server:dev |
| UI_IMAGE | eu.gcr.io/dev-and-test-env/contiamo-ui:dev |
| HUB_IMAGE | eu.gcr.io/dev-and-test-env/hub:dev |
| DATASTORE_IMAGE | eu.gcr.io/dev-and-test-env/datastore:dev |
| PANTHEON_IMAGE | eu.gcr.io/dev-and-test-env/pantheon:dev |
| PROFILER_IMAGE | eu.gcr.io/dev-and-test-env/profiler:dev |
| SYNC_INGESTER_IMAGE | eu.gcr.io/dev-and-test-env/sync-ingester:latest |
| SYNC_AGENT_TABLEAU_IMAGE | eu.gcr.io/dev-and-test-env/sync-agent-tableau:dev |
You can manually override the image used by setting the required variable and then restarting the services
export HUB_IMAGE=eu.gcr.io/dev-and-test-env/hub:v1.2.3
make stop start
Integrations
The default environment runs only the core services required to support the Data Source integrations. To enable the demo sign-up service or integration sync-agents for other resource types (like Tableau), you need to enable the optional integration services. To do this, export this env variable
export COMPOSE_FILES="-f docker-compose.yml -f docker-compose-extra.yml"
This will modify the start and stop commands to include the integration services.
Testing PR images
A helper make target is provided that will automatically pull the PR preview images for the specified services and restart the local environment with them.
For example, to test PR 501 for hub together with PR 489 for contiamo-ui, use
make pr-preview services=hub:501,contiamo-ui:489
All other services will use the default images.
To reset to the original state, use
make stop start
End-To-End API testing
The project comes with a suite of end-to-end tests that use the API to verify that the backend services are working as expected. You can run this in any environment by using
make test
This assumes that you have already started the localdev environment using make start or make pr-preview.
Passing S3 credentials for the Federated mode / Datasets
By default, the Datasets feature does not work with external DWH systems (e.g. Redshift, Snowflake). Data transfer to these systems needs to go through mutually accessible object storage. There is a pantheon-datasource-test bucket on S3, but this repo doesn't include credentials for it.
If your scenario requires working with an external DWH, you can pass S3 credentials by setting the DATASETS_AWS_ACCESS_KEY_ID and DATASETS_AWS_SECRET_ACCESS_KEY environment variables. Additionally, the bucket name should be set via the DATASETS_S3_BUCKET variable. An easy way to set these variables is the .env file.
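As a rough sketch, a .env file carrying these settings could look like the following. The credential values are placeholders that must be replaced with real ones, and using pantheon-datasource-test as the bucket name is an assumption based on the bucket mentioned above.

```shell
# .env (sketch): placeholder values, not real credentials
DATASETS_AWS_ACCESS_KEY_ID=replace-with-your-access-key-id
DATASETS_AWS_SECRET_ACCESS_KEY=replace-with-your-secret-access-key
# Assumed bucket name, taken from the bucket mentioned above
DATASETS_S3_BUCKET=pantheon-datasource-test
```

Keep a file like this out of version control so that the credentials are not committed by accident.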
Datasets for testing the Profiler
Two datasets have been pre-created that provide more interesting stats and entity detection profiles. These should be used to test the Profiler and the related UI components.
The datasets are available in ./datasets:
- pii.csv contains PII columns that should be detected during the entity detection profile.
- sales.csv also contains PII data, but is a good sample for the stats report.
Start and add an external data source
We have a couple of data sets available on GCR for internal testing:
- Postgres database that contains a single table, liftdata.
- Postgis (Postgres) database that contains geometry of Alaska regions. The purpose is to test geometry-related operations for Pantheon and PGQL server.
Lift data
After starting the local dev environment, run:
docker run --name liftdata --rm --network dev_default eu.gcr.io/dev-and-test-env/deutschebahn-liftdata-postgres:v1.0.0
In the Data Hub, you can now add the external data source using:
| field | value |
|------------|---------------------|
| HOST | liftdata |
| PORT | 5432 |
| DATABASE | liftdata |
| USER | pantheon |
| PASS | contiamodatahub19 |
When you are done, run
docker kill liftdata
to stop and clean up the database container.
Postgis Alaska regions
After starting the local dev environment, run:
docker run --name alaska --rm --network dev_default eu.gcr.io/dev-and-test-env/alaska-postgis:1.0.0
In the Data Hub, you can now add the data source using:
| field | value |
|------------|---------------------|
| HOST | alaska |
| PORT | 5432 |
| DATABASE | alaska |
| USER | pantheon |
| PASS | contiamodatahub19 |
When you are done, run
docker kill alaska
to stop and clean up the database container.
Stopping
You can always cleanly stop the environment using
make stop
Any data in the databases will be preserved between stop and start.
Adding the metadbs as external data sources
You can add the Data Hub's own metadbs to the Data Hub, meaning you can inspect the internals of the Data Hub from the Data Hub :) . Each of the following databases can be added as a PostgreSQL data source.
| service | db name | host | port | username | password |
|-------------|-------------|----------|--------|------------|------------|
| datastore | datastore | metadb | 5433 | user | localdev |
| hub | hub | metadb | 5433 | user | localdev |
| idp | simpleidp | metadb | 5433 | user | localdev |
| pantheon | pantheon | metadb | 5433 | pantheon | test |
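For a quick check from a terminal, the rows above can also be turned into PostgreSQL connection URIs. The sketch below is only an illustration: metadb_uri is a hypothetical helper name, and the default host localhost assumes that the metadb port 5433 is published on your machine. Pass metadb as the fourth argument when connecting from inside the Docker network.

```shell
# Hypothetical helper: build a PostgreSQL connection URI for one row of the table.
# Arguments: <db name> <username> <password> [host]
# The host defaults to localhost, which is an assumption about the port mapping.
metadb_uri() {
  printf 'postgresql://%s:%s@%s:5433/%s\n' "$2" "$3" "${4:-localhost}" "$1"
}

# Print the URI for the hub database using the credentials from the table.
metadb_uri hub user localdev

# With psql installed and the environment running, one could then try:
#   psql "$(metadb_uri hub user localdev)"
```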
Accessing the metadbs with pgadmin
Go to http://localhost:5050 (The link is on http://localhost:9898/lemonade-shop/configuration page)
Login with the following credentials:
- Email: [email protected]
- Password: admin
Add the metadb server with the following connection info:
- Host name/address: metadb
- Port: 5433
- Username: user
- Password: localdev
- Save password?: ✅
Cleaning up
If you need to reclaim space or want to restart your environment from scratch use
make clean
This will stop your current environment and remove any Docker volumes related to it. This includes any data and metadata in the databases.
As time goes on, Docker will download new images, but it does not automatically garbage collect old images. To do so, run docker system prune.
On Mac, all Docker file system data is stored in a single file of a fixed size, which is 16GB or 32GB by default. You can configure the size of this file by clicking on the Docker Desktop tray icon -> Preferences -> Disk -> move the slider.
Exporting and restoring the database state
You can find the snapshot.sh and restore.sh files in the ./scripts folder.
Both scripts take a single parameter: a filename.
Snapshot
To make an encrypted snapshot from your local dev environment use:
./snapshot.sh localdev.snapshot
This will ask you to set the encryption key and then export the database of each service, applying compression.
The snapshot is encrypted with a symmetric key (AES-128 cipher).
Restore
To erase your local database for each service and restore it to the state of the earlier exported snapshot use:
./restore.sh localdev.snapshot
This will delete all the data you have locally and perform the reverse of the snapshot.sh operation.
IMPORTANT: do not move the scripts out of their ./scripts folder; they use relative paths.
The make load-snapshot target uses the committed localdev.snapshot. You can use the script, as described above, to load any other snapshots.
Tips
- Run make or make help to see all available commands.
- You can also run these commands from a different directory, with e.g. make -C /path/to/dev start.
- The commands in the Makefile are very useful, but there's some extra stuff available if you use docker-compose straight. For instance, get all logs with docker-compose logs --follow, or only datastore worker logs with docker-compose logs --follow ds-worker. Refer to docker-compose.yml for the definitions of the services.
- To use docker-compose without cd'ing to this directory, use e.g. docker-compose -f /path/to/dev/docker-compose.yml logs --follow.
Custom Images
The Compose file supports overriding the Docker tag used for a service by setting several environment variables:
| Server | Environment Variable | Default |
|-------------|----------------------|----------|
| datastore | DATASTORE_TAG | dev |
| idp | IDP_TAG | dev |
| pantheon | PANTHEON_TAG | latest |
| contiamo-ui | CONTIAMOUI_TAG | latest |
Options to Postgres
Using the environment variable POSTGRES_ARGS, you can pass extra arguments to the PostgreSQL daemon. By default, this is set to -c log_connections=on. To log modification statements in addition to connections, start the dev environment with
env POSTGRES_ARGS="-c log_connections=on -c log_statement=mod" make start
You can inspect these logs with docker-compose logs --follow metadb. The four acceptable values for log_statement are none, ddl, mod, and all. Further Postgres options can be found here: https://www.postgresql.org/docs/11/runtime-config.html .
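Since only four values are accepted for log_statement, a small guard can catch typos before the environment starts. The following is a minimal sketch: postgres_args_for is a hypothetical helper, not part of the Makefile, and it only assembles the same argument string shown above.

```shell
# Hypothetical helper: build a POSTGRES_ARGS value for a given log_statement setting.
# Rejects anything other than the four values PostgreSQL accepts.
postgres_args_for() {
  case "$1" in
    none|ddl|mod|all)
      printf '%s\n' "-c log_connections=on -c log_statement=$1"
      ;;
    *)
      echo "unsupported log_statement value: $1" >&2
      return 1
      ;;
  esac
}

# Show the argument string for logging every statement.
postgres_args_for all

# It could then be passed on like this:
#   env POSTGRES_ARGS="$(postgres_args_for all)" make start
```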
Setting up Pantheon Local Development
Local Pantheon debug development is supported by port redirection. To set this up, you first need to run two extra steps.
- Run
make build
This builds the eu.gcr.io/dev-and-test-env/pantheon:redir Docker image, a "pseudo-Pantheon" that forwards everything to your local Pantheon on 127.0.0.1 port 4300. Do not push this image!
- Modify your /etc/hosts file to add 127.0.0.1 metadb
You can easily do this with
sudo bash -c 'echo "127.0.0.1 metadb" >> /etc/hosts'
This ensures that Pantheon can correctly resolve the storage database service.
Running the Pantheon Local Development
Make sure you first set up the prerequisites, and also set up for Pantheon local development.
To start the Pantheon dev environment use
make pantheon-start
This will replace the Pantheon image with a simple port redirection image that will enable transparent redirect of
- http://localhost:9898/pantheon/api/v1/* to http://localhost:4300/api/v1/* ,
- http://localhost:9898/pantheon/jdbc/* to http://localhost:8765/* .
You can then start your local Pantheon debug build, e.g. from your IDE, and have it bind to those ports on localhost. To configure the meta-DB and enable data store from Pantheon, run SBT with
env METADB_URL="jdbc:postgresql://localhost:5433/pantheon?user=pantheon&password=test" DATASTORE_TYPE=external sbt
or set the same environment variables in IntelliJ. You can also use export METADB_URL="jdbc:postgresql://localhost:5433/pantheon?user=pantheon&password=test" DATASTORE_TYPE=external to set the environment variables in the current terminal.
The docker-compose configuration will expose the following ports for use from local Pantheon:
- Nginx web server at 127.0.0.1 port 9898 <-- Use this to access Data Hub including UI, IDP, Pantheon, Datastore.
- PostgreSQL meta-DB at 127.0.0.1 port 5433, username pantheon, password test.
- Datastore manager at 127.0.0.1 port 9191
- Minio (for ingested files) at 127.0.0.1 port 9000
When accessing Pantheon via Nginx on port 9898, you need to prepend /pantheon to Pantheon URLs, for instance: http://localhost:9898/pantheon/api/v1/status . Nginx will strip off the /pantheon prefix, authenticate the request with IDP, and forward the request to Pantheon as /api/v1/status.
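The prefix handling can be illustrated with a small shell function. This is only a sketch of the path mapping described above, not the actual Nginx configuration, and pantheon_upstream_path is a hypothetical name.

```shell
# Hypothetical helper: show which path Pantheon receives for a path requested via Nginx.
# Paths under /pantheon/ lose that prefix; anything else is returned unchanged.
pantheon_upstream_path() {
  case "$1" in
    /pantheon/*) printf '%s\n' "${1#/pantheon}" ;;
    *) printf '%s\n' "$1" ;;
  esac
}

# The status endpoint requested through Nginx on port 9898:
pantheon_upstream_path /pantheon/api/v1/status
```

The printed value is the path as the local Pantheon sees it, which helps when matching requests made through port 9898 against Pantheon's own logs.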
Using the pantheon/test credentials for Postgres, you also have access to
- the metadb database, for datastore,
- collection databases corresponding to a managed DB,
- collection databases corresponding to materializations for a project,
- the simpleidp database.
Running a custom Pantheon in prod mode
You can also run Pantheon in prod mode locally, as follows.
- In sbt shell, run dist.
- From a console, run
docker build -t eu.gcr.io/dev-and-test-env/pantheon:local .
This will download dependencies if they are not cached yet, build a Docker image for Pantheon, and tag it local.
- Run env PANTHEON_TAG=local make start.
Now datastore and metadb will still be available on the usual ports, but Nginx will proxy to a prod-mode Pantheon which runs inside Docker. Pantheon will automatically be run with appropriate environment variables (https://github.com/contiamo/dev/blob/master/docker-compose.yml#L81).
Warning! Do not push this image to GCR. It may accidentally end up being deployed on dev.contiamo.io .
Profiler Server
The Profiler currently lives at http://localhost:8383.