gce-elastic-docker
This package helps you set up Elasticsearch/Kibana clusters on Google Compute Engine. If you're also looking for a similar way to set up Elasticsearch/Kibana locally for development, look at this repo.
Getting Started
There are 5 main stages when using this module.
- creating Elasticsearch/Kibana Docker images locally.
- deploying these images to your Google Container Registry.
- creating a firewall rule to open your Kibana nodes on ports 80/443.
- creating Elasticsearch clusters on Compute Engine from these images.
- fetching the nodes from all your clusters at some time in the future so you can list/update/start/stop/delete them.
Prerequisites
You'll need the curl, gcloud, gcloud beta and docker commands, a Compute Engine project that is set as your gcloud project, and you must also have docker configured to use gcloud as a Docker credential helper.
With that said, I would strongly suggest you try pushing a simple image to your Container Registry via the command line before you use this package. It will teach you some Docker fundamentals and show you where your Container Registry is. Also, if you've never deployed a VM on Compute Engine, you should try that too.
Installation
npm install gce-elastic-docker
Examples
If you would like to run these examples (recommended), copy/paste the files in ./examples and run them, ie node single-node-ex and node multi-node-ex. Also, please make sure you delete the VMs when you are not using them, otherwise your GCE billing account will incur charges. You can find your VMs here.
creating an Elasticsearch Docker image locally
you can view your local images w/ docker images
const ged = require('gce-elastic-docker');

const verbose = true;
const gce_project_id = 'my-project-id'; // replace w/ yours
const es_image_name = `gcr.io/${gce_project_id}/es-image`;

const mk_es_image = async () => {
  await (new ged.Image({
    es_version: '6.4.2',
    name: es_image_name
  })).create(verbose);
};
creating a Kibana Docker image locally
const kib_image_name = `gcr.io/${gce_project_id}/kib-image`;

const mk_kib_image = async () => {
  await (new ged.Image({
    es_version: '6.4.2',
    name: kib_image_name,
    kibana: true
  })).create(verbose);
};
deploying them to Google Container Registry
you can find your registry here and then select your project.
const deploy_es_image = async () => {
  await (new ged.Image({
    es_version: '6.4.2',
    name: es_image_name
  })).deploy(verbose);
};

const deploy_kib_image = async () => {
  await (new ged.Image({
    es_version: '6.4.2',
    name: kib_image_name,
    kibana: true
  })).deploy(verbose);
};
creating a firewall rule for your Kibana nodes
you can find your firewall rules here
const kibana_firewall = 'kibana-firewall';
const kibana_network_tag = 'kibana-network-tag';

const mk_kibana_firewall = async () => {
  await ged.kibana_firewall.create({
    name: kibana_firewall,
    network_tag: kibana_network_tag,
    verbose: verbose
  });
};
creating a single master/data node cluster w/ Kibana
you can find your default gce service account here
// replace w/ yours
const gce_service_acc = '[email protected]';

const mk_node = async () => {
  const child_node = new ged.ChildNode({
    image: kib_image_name,
    cluster_name: 'single-node-cluster',
    name: 'single-node',
    dsize: 10,
    dtype: 'pd-ssd',
    hsize: 500,
    mtype: 'n1-standard-1',
    zone: 'us-west1-a',
    kibana: true,
    service_account: gce_service_acc
  });
  const tasks = child_node.create({
    verbose: verbose,
    kibana_network_tag: kibana_network_tag,
    kibana_users: { 'tom': 'hanks' }
  });
  return await tasks.main.on_end();
};
creating nodes for a 4 node cluster (3 master/data, 1 Kibana)
const mk_cluster = async () => {
  const child_nodes = [0, 1, 2, 3].map(num => {
    return new ged.ChildNode({
      image: !num ? kib_image_name : es_image_name,
      cluster_name: 'my-cluster',
      name: `node-${num}`,
      dsize: 10,
      dtype: 'pd-ssd',
      hsize: 500,
      mtype: 'n1-standard-1',
      zone: 'us-west1-a',
      master: !!num,
      data: !!num,
      kibana: !num,
      service_account: gce_service_acc
    });
  });
  const promises = child_nodes.map(n => {
    return n.partial_create({
      verbose: verbose,
      kibana_network_tag: kibana_network_tag,
      kibana_users: !n.kibana ? {} : { 'tom': 'hanks' }
    });
  });
  return await Promise.all(promises);
};
connecting the 4 nodes to form the cluster
const connect_nodes = async (nodes) => {
  const master_ips = nodes.filter(n => n.master).map(n => n.ip);
  const promises = nodes.map(n => {
    n.set_env({
      'discovery.zen.ping.unicast.hosts': master_ips.toString(),
      'discovery.zen.minimum_master_nodes': 2
    });
    const tasks = n.update({ verbose: verbose });
    return tasks.main.on_end();
  });
  return await Promise.all(promises);
};
tying it all together
const single_node_combo = async () => {
  await mk_es_image();
  await mk_kib_image();
  await deploy_es_image();
  await deploy_kib_image();
  await mk_kibana_firewall();
  const node = await mk_node();
};

const multi_node_combo = async () => {
  await mk_es_image();
  await mk_kib_image();
  await deploy_es_image();
  await deploy_kib_image();
  await mk_kibana_firewall();
  const nodes = await mk_cluster();
  await connect_nodes(nodes);
};
fetching the nodes from your clusters later
const fetch_nodes_for_later = async () => {
  const nodes = await ged.Node.fetch_all(verbose);
  console.log(nodes);
};
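If you want to clean everything up afterwards (see the note about billing above), you can pair fetch_all with the delete method described in the API overview below. A minimal sketch, assuming you really do want to remove every VM this package has made:
const delete_all_nodes = async () => {
  const nodes = await ged.Node.fetch_all(verbose);
  // Node.prototype.delete removes the hosting VM permanently
  await Promise.all(nodes.map(n => n.delete(verbose)));
};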
API overview
Everything that follows can be found on the ged object, ie the object returned by require('gce-elastic-docker'). Fields with ? are optional (typescript) and a bracketed value like [false] denotes a field's default. Entities prefixed with an I indicate an interface. Finally, any param/option named verbose just indicates the operation will run w/ logging.
kibana_password_dir
the directory in the Kibana containers where your Kibana users are.
kibana_password_file
the file in the Kibana containers where your Kibana users are stored.
kibana_users_env_var
the environment variable that has your initial list of Kibana users.
registries
the GCE Container Registries used.
m_types
an object of GCE zones to GCE machine types. ie { 'us-west2-b': [ 'f1-micro', 'g1-small' ... ], 'us-west2-c': [ 'f1-micro', 'g1-small' ... ] ... }
zones
the GCE zones used.
regions
the GCE regions used.
short_regions
shorter versions of ged.regions. use this if you want to include a shorter version of a region in your node names. ged.short_regions['us-west1'] // usw1, ged.short_regions['northamerica-northeast1'] // nane1
kibana_firewall
create(opts): Promise
creates a firewall rule that opens ports 80/443 to ONLY your Kibana nodes.
opts { name: string; network_tag: string; suppress?: boolean; verbose?: boolean; }
name
the name of the firewall rule.
network_tag
the tag to apply the firewall rule to, in other words, the tag that is on all your Kibana VMs.
suppress[true]
ignores the error that says the rule has already been created.
verbose[false]
delete(opts): Promise
removes the firewall rule.
opts { name: string; suppress?: boolean; verbose?: boolean; }
name
the name of the firewall rule.
suppress[true]
ignores the error that says the rule doesn't exist.
verbose[false]
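For example, a minimal sketch of removing the firewall rule created in the examples above (the rule name is whatever you passed to create):
const rm_kibana_firewall = async () => {
  await ged.kibana_firewall.delete({
    name: 'kibana-firewall',
    verbose: true
  });
};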
Image
constructor(opts)
opts { es_version: string; kibana?: boolean; name: string; }
es_version
the Elasticsearch version you want to use.
kibana[false]
should this image have Kibana installed?
name
the name of the image to create. It must follow the format {gce-registry}/{gce-project-id}/{image-name}, ie gcr.io/my-project-id/my-image
prototype.create(verbose?: boolean[false]): Promise
creates a Docker image locally.
prototype.deploy(verbose?: boolean[false]): Promise
deploys the image to your Google Container Registry (make sure the image exists locally).
EndTask
prototype.on_end(): Promise
denotes when a task has finished. it may resolve or reject w/ data. see the specific task to determine when it does resolve w/ data.
FullTask extends EndTask
prototype.on_start(): Promise
denotes when a task has started. it will never reject.
INodeCreateTasks
The following tasks are executed in the order you see when a Node is being created.
{
main: EndTask;
node_create: FullTask;
elastic_ready: FullTask;
kibana_ready: FullTask;
kso_upload: FullTask;
scripts_upload: FullTask;
sm_upload: FullTask;
}
main
concludes when all the other tasks have finished. It will resolve w/ an instance of Node if it is successful, and it will reject w/ the error of the first task that rejects.
node_create
will resolve w/ an instance of Node once the VM has been created.
elastic_ready
concludes when Elasticsearch goes live. it works by submitting gcloud compute ssh curl requests to your Elasticsearch node's VM at regular intervals, waiting until it responds w/ a cluster state >= yellow.
kibana_ready
concludes when Kibana goes live. it works by submitting gcloud compute ssh curl requests to your Kibana node's VM at regular intervals, waiting until it responds w/ a status of 200. If the node isn't a Kibana node, the task will finish immediately.
kso_upload
uploads any Kibana saved_objects given in INodeCreateOpts to the Kibana instance if it's a Kibana container.
- if a saved_object is erroneous, this will reject w/ that error.
- else it will resolve w/ the created saved_objects.
scripts_upload
uploads any scripts given in INodeCreateOpts to the node. if two scripts are uploaded successfully, this task will resolve w/ something like (standard Elasticsearch response)
[{ acknowledged: true }, { acknowledged: true }]
if N scripts are uploaded and 1 of them fails, this task will reject with something like (standard Elasticsearch response)
{ error: { root_cause: [ [Object] ], type: 'illegal_argument_exception', reason: 'unable to put stored script with unsupported lang [painlesss]' }, status: 400 }
sm_upload
uploads any settings/mappings given in INodeCreateOpts to the node. if settings/mappings are uploaded for a users index, you'll get something like (standard Elasticsearch response)
[{ acknowledged: true, shards_acknowledged: true, index: 'users' }]
if N indices are uploaded and 1 of them fails, this task will reject with something like (standard Elasticsearch response)
{ error: { root_cause: [ [Object] ], type: 'illegal_argument_exception', reason: 'Failed to parse value [0] for setting [index.number_of_shards] must be >= 1' }, status: 400 }
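As a rough sketch of how you might monitor these tasks during a create (reusing the child_node and names from the single-node example above):
const create_with_monitoring = async (child_node) => {
  const tasks = child_node.create({
    verbose: true,
    kibana_network_tag: 'kibana-network-tag',
    kibana_users: { 'tom': 'hanks' }
  });
  // on_start never rejects, so these are safe to fire-and-forget
  tasks.node_create.on_start().then(() => console.log('creating the VM...'));
  tasks.elastic_ready.on_start().then(() => console.log('waiting for Elasticsearch...'));
  tasks.kibana_ready.on_start().then(() => console.log('waiting for Kibana...'));
  // main resolves w/ the Node instance, or rejects w/ the first failing task's error
  return await tasks.main.on_end();
};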
INodeUpdateTasks
The following tasks are executed in the order you see when a Node is being updated.
{
main: EndTask;
node_update: FullTask;
elastic_ready: FullTask;
kibana_ready: FullTask;
kso_upload: FullTask;
scripts_upload: FullTask;
sm_upload: FullTask;
}
main
same as in INodeCreateTasks.
node_update
will resolve w/ an instance of Node once the VM has been updated.
elastic_ready
same as in INodeCreateTasks.
kibana_ready
same as in INodeCreateTasks.
kso_upload
same as in INodeCreateTasks.
scripts_upload
same as in INodeCreateTasks.
sm_upload
same as in INodeCreateTasks.
IElasticScript
{
lang: string;
source: string;
}
INodeCreateOpts
{
interval?: number;
kibana_network_tag?: string;
kibana_users?: { [username: string]: string };
kso?: any[];
scripts?: { [name: string]: IElasticScript };
sm?: object;
verbose?: boolean;
}
interval[2000]
interval in milliseconds between consecutive gcloud compute ssh requests. these requests are purely health checks on your Elasticsearch/Kibana statuses. as a good rule, if you are making a cluster of N nodes, set this to 5000 * N.
kibana_network_tag
required if the node you are creating is a Kibana node. if you do not provide this for a Kibana node, an error will be thrown.
kibana_users[{}]
an object of usernames to passwords. these are the users you want to access your Kibana nodes through the browser. { 'meryl': 'streep', 'tom': 'hanks' }
kso[empty array]
an array of Kibana saved_objects. use this when you want to create the Kibana instance for the container w/ charts/dashboards you've previously saved from another Kibana instance. To fetch the saved_objects from a currently running Kibana instance, call its Node.prototype.kibana_saved_objects method. A saved_objects array looks like: [ { "id": "e84e14c0-cdeb-11e8-b958-0b2cbb7f0531", "type": "timelion-sheet", "updated_at": "2018-10-12T08:37:00.919Z", "version": 1, "attributes": { "title": "sheet1", "hits": 0, "description": "", "timelion_sheet": [ ".es(*).title(\"I uploaded this.\")" ], "timelion_interval": "auto", "timelion_chart_height": 275, "timelion_columns": 2, "timelion_rows": 2, "version": 1 } } ]
scripts[{}]
an object of Elasticsearch scripts. the root keys are the script ids and their values are the scripts themselves. { calc_score: { lang: 'painless', source: 'Math.log(_score * 2) + params.my_modifier' } }
sm[{}]
an object of Elasticsearch index settings/mappings. the root keys are the indices and their values are their settings/mappings. { users: { mappings: { _doc: { properties: { name: { type: 'keyword' } } } }, settings: { number_of_shards: 1 } } }
verbose[false]
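For instance, a create opts object that also seeds a stored script and an index uses the same shapes shown above (the script and index contents here are purely illustrative):
const create_opts = {
  verbose: true,
  kibana_network_tag: 'kibana-network-tag',
  kibana_users: { 'tom': 'hanks' },
  scripts: {
    calc_score: { lang: 'painless', source: 'Math.log(_score * 2) + params.my_modifier' }
  },
  sm: {
    users: {
      settings: { number_of_shards: 1 },
      mappings: { _doc: { properties: { name: { type: 'keyword' } } } }
    }
  }
};
const tasks = child_node.create(create_opts); // child_node from the examples above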
INodeUpdateOpts
{
interval?: number;
kso?: any[];
scripts?: { [name: string]: IElasticScript };
sm?: object;
verbose?: boolean;
}
interval[2000]
same as in INodeCreateOpts.
kso[empty array]
same as in INodeCreateOpts.
scripts[{}]
same as in INodeCreateOpts.
sm[{}]
same as in INodeCreateOpts.
verbose[false]
BaseNode
constructor(opts)
opts { name: string; cluster_name: string; master?: boolean; data?: boolean; ingest?: boolean; kibana?: boolean; hsize: number; khsize?: number; max_map_count?: number; env?: {}; labels?: {}; zone: string; mtype: string; dsize: number; dtype: 'pd-standard' | 'pd-ssd'; image: string; service_account: string; }
name
the name for this node and its VM.
cluster_name
the cluster name for this node.
master[true]
is this a master node?
data[true]
is this a data node?
ingest[false]
is this an ingest node?
kibana[false]
is this a node w/ Kibana? MUST be set if it is.
hsize
the heap size in MB you want to give to Elasticsearch. see here
khsize[512]
the max heap size in MB you want to give to your Kibana NodeJS process. this is the value for V8's NODE_OPTIONS=--max-old-space-size.
max_map_count[262144]
see here
env[{}]
any Elasticsearch/Kibana Docker environment variables you want set along w/ their values. You should only read from this; you should not write to it directly. To write to it, use BaseNode.prototype.set_env instead.
labels[{}]
any labels you want set on the VM instance. note, ged is reserved; it's used by this package to identify VMs this package has made. also note, you can only set labels on create; you cannot change or update them later. the reason for this is the nature of the gcloud beta compute instances update-container command: currently, it does not allow you to set environment variables and labels at the same time. updating a container should only be one command; by splitting it into two commands, you run the risk of label/environment variable inconsistency if by rare chance one of the gcloud update calls fails.
zone
the GCE zone you want to place this node's VM in.
mtype
the GCE machine type you want to use for this node's VM.
dsize
the disk size in GB you want for this node's VM. must be at least 10.
dtype
the disk type you want for this node's VM. either pd-ssd or pd-standard.
image
the name of the image in your Google Container Registry to use for this node's VM. If this is a Kibana node, make sure you place a Kibana image here.
service_account
the default GCE service account to use. this is necessary for the VM to pull your image from your Container Registry.
region
auto set by the constructor. determined from the zone you provide.
short_region
auto set by the constructor. determined from the zone you provide.
prototype.set_env(env: {})
call this method when you want to add/delete environment variables on the VM. To delete an environment variable, set its value to null. This only changes the local instance; to persist the changes, you'll need to call Node.prototype.update.
node.set_env({ 'discovery.zen.ping.unicast.hosts': '10.2.3.4, 10.2.3.5', 'a_var_to_remove': null })
the following environment variables are reserved and thus you cannot set them:
[ 'kibana_users', 'ged', 'bootstrap.memory_lock', 'cluster.name', 'ES_JAVA_OPTS', 'network.host', 'node.data', 'node.ingest', 'node.master', 'node.name', 'NODE_OPTIONS' ]
prototype.set_hsize(v: number)
sets the new value you want for the heap size in MB. to persist the change, you'll need to call Node.prototype.update.
prototype.set_khsize(v?: number)
sets the new value you want for the max Kibana heap size in MB. to persist the change, you'll need to call Node.prototype.update. If undefined is given, the default value of 512 is used.
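A minimal sketch of resizing the heaps on an existing node (node here is assumed to be a Node instance, e.g. one returned by Node.fetch_all):
node.set_hsize(1000);  // new Elasticsearch heap size in MB
node.set_khsize(1024); // new max Kibana heap size in MB
// nothing has changed on the VM yet; persist the changes
const tasks = node.update({ verbose: true });
await tasks.main.on_end();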
ChildNode extends BaseNode
prototype.create(opts: INodeCreateOpts): INodeCreateTasks
executes all the tasks found in INodeCreateTasks. Use this method when you deploy single-node clusters.
prototype.partial_create(opts: INodeCreateOpts): Promise<Node>
ONLY creates the VM. Use this when you want to deploy a multi-node cluster. The reason for this is purely to save time. In order to make a cluster, the nodes have to be created first to fetch their internal ips. Once the ips are obtained, the nodes have to be restarted w/ the new ips set as the value of the discovery.zen.ping.unicast.hosts environment variable. This is necessary for the nodes to connect and form a cluster. It makes no sense to call ChildNode.prototype.create and wait for Elastic/Kibana to be ready when you're just going to restart it.
Node extends BaseNode
fetch_all(verbose?: boolean[false]): Promise<Node[]>
fetches all the nodes this package has created. It does so by fetching the VMs w/ the ged label and grabbing the value of their ged environment variable.
constructor(opts)
opts are all the ones found in the BaseNode constructor plus { created: number; ip: string; }
created
when the node's VM was created, in milliseconds since epoch (UTC).
ip
the internal ip set on the VM. this ip does not change between VM starts/stops.
prototype.update(opts: INodeUpdateOpts): INodeUpdateTasks
executes all the tasks found on INodeUpdateTasks. This restarts the VM the node is hosted on.
prototype.start(verbose?: boolean[false])
starts the hosting VM.
prototype.stop(verbose?: boolean[false])
stops the hosting VM.
prototype.restart(verbose?: boolean[false])
stops then starts the hosting VM.
prototype.delete(verbose?: boolean[false])
deletes the hosting VM.
prototype.wait_for_elastic(interval: number[2000], verbose?: boolean[false])
sends health checks to your Elasticsearch node, waiting for a cluster state >= yellow. there is an interval between requests you can specify.
prototype.wait_for_kibana(interval: number[2000], verbose?: boolean[false])
sends health checks to your Kibana node, waiting for a status of 200. there is an interval between requests you can specify.
prototype.exec(cmd: string, verbose?: boolean): Promise
executes the given command in this node's container on the VM. If the container is not available, this will throw.
const resp = await node.exec('curl localhost:9200/_cluster/health'); const status = JSON.parse(resp).status; // yellow | green | red ...
prototype.cluster_health(verbose?: boolean[false]): Promise<{} | undefined>
curls the host on port 9200 and asks for its cluster health. If it succeeds, it resolves with the standard Elasticsearch response. If it fails or gets no response, it resolves with undefined. For example:
{ cluster_name: 'single-node-cluster', status: 'green', timed_out: false, number_of_nodes: 1, number_of_data_nodes: 1, active_primary_shards: 0, active_shards: 0, relocating_shards: 0, initializing_shards: 0, unassigned_shards: 0, delayed_unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 100 }
prototype.cluster_state(verbose?: boolean[false]): Promise<string | undefined>
curls the host on port 9200 and asks for its cluster health state. If it succeeds, it resolves with green | yellow | red. If it fails or gets no response, it resolves with undefined.
prototype.kibana_status(verbose?: boolean[false]): Promise<number | undefined>
curls the host on port 5601 and checks the http status code. If it succeeds, it resolves with a number (like 200). If it fails or gets no response, it resolves with undefined.
prototype.kibana_saved_objects(verbose?: boolean[false]): Promise<[]>
curls the host on port 5601 @ /api/saved_objects/_find and returns the saved_objects array. will throw if this isn't a Kibana node. I believe this api endpoint was added in 6.4, so don't call this if your image was for an es version < 6.4. (see here)
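For example, a rough sketch of copying saved_objects from one running Kibana node to another via update (old_node and new_node are assumed to be Node instances for Kibana nodes):
const copy_saved_objects = async (old_node, new_node) => {
  const kso = await old_node.kibana_saved_objects(true);
  const tasks = new_node.update({ kso: kso, verbose: true });
  return await tasks.main.on_end();
};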
supported versions of Elasticsearch/Kibana
Currently, 5.x and 6.x should work. The only part I have to keep up to date is the set of supported Docker environment variables for Elasticsearch/Kibana. These seem to be updated for each major/minor version.
FAQ
For general insight into how this package works, I strongly recommend you create a single node cluster and set verbose to true.
What can I / can't I update for a node/VM?
- The only things you can update are those fields on BaseNode.prototype that have public setter methods, ie methods which do not start with an _. Currently these are set_env, set_hsize and set_khsize. Most of the things you will need to update for Elasticsearch/Kibana can be done through setting environment variables.
How do I update a node/VM?
- Call the setter method on the Node instance and then call its prototype.update method. This method takes the changes you set on the local instance and commits them to the VM instance. You can optionally monitor the tasks found on the returned INodeUpdateTasks instance, as in the sketch below.
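A minimal sketch of changing an environment variable on a running node (the variable and value here are only illustrative):
// node is assumed to be a Node instance, e.g. one fetched w/ Node.fetch_all
node.set_env({ 'discovery.zen.minimum_master_nodes': 2 });
const tasks = node.update({ verbose: true });
tasks.node_update.on_start().then(() => console.log('updating the VM...'));
const updated = await tasks.main.on_end();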
How are updates persisted?
- Environment variables can be set on a Container VM. All the environment variables you set/update are set/updated there as well. In addition to those you set, this package base64 encodes a stringified version of the Node instance and stores it as an environment variable, specifically the ged environment variable.
How are the VMs this package creates distinguished from my other VMs?
- This package sets the ged label on your instance when the instance is created. This is how Node.fetch_all works.
How safe are my Elasticsearch clusters?
- All non-Kibana nodes are completely isolated. They have external ips, but there are no firewall rules that open access to any port on them. You can use their internal ips from within your VMs to access them. All Kibana nodes are open on ports 80/443 to anyone, however only those that know one of your users' usernames/passwords can access them. Also note that HTTP is redirected to HTTPS, which uses a self-signed SSL cert. the self-signed cert will give you a browser warning, but you can disregard this and proceed because we know it's safe...
How are my Kibana users stored?
when you create a Kibana node, the following occurs to your Kibana users:
{ 'meryl': 'streep', 'tom': 'hanks' }
-> meryl:$apr1$yxz1hI19$9vEAdWWgswnZNmvke7oKG1 tom:$apr1$icMT/wUN$EDXt8IFVlI4mGywx2ZZ.8
-> bWVyeWw6JGFwcjEkeXh6MWhJMTkkOXZFQWRXV2dzd25aTm12a2U3b0tHMSB0b206JGFwcjEkaWNNVC93VU4kRURYdDhJRlZsSTRtR3l3eDJaWi44Cg==
-> meryl:$apr1$yxz1hI19$9vEAdWWgswnZNmvke7oKG1 tom:$apr1$icMT/wUN$EDXt8IFVlI4mGywx2ZZ.8
In other words, your simple json is transformed to meryl:<hash> tom:<hash>, which is base64 encoded and stored as an environment variable on your VMs. When the container starts on your VM, the environment variable is base64 decoded and each user:<hash> is stored on its own line in an htpasswd file Nginx uses. Nginx is what proxies from port 443 to Kibana on port 5601. Note that once the passwords are set the first time, they are NOT reset again.
How do I get into the container?
ssh into the host VM, then
sudo -i
docker ps -a # get the container id
docker exec -it <id-here> bash
How do I see my Kibana users?
get into the container on the VM, then
cat /kibana-users/.htpasswd
How do I change/delete a Kibana user?
get into the container on the VM, then
htpasswd /kibana-users/.htpasswd <username> # update the user's password
htpasswd -D /kibana-users/.htpasswd <username> # delete the user
How do I restart Nginx on my Kibana nodes?
- get into the container on the VM, then
nginx -s stop
nginx
ps -e # to verify
How do I restart Kibana on my Kibana nodes?
- get into the container on the VM, then
ps -e | grep node # get the pid
kill -9 <pid>
/usr/local/bin/kentry.sh --server.host=0.0.0.0 &
Which processes are monitored/restarted on my Kibana nodes?
- There are 3 processes in this container: Nginx, Kibana and Elasticsearch. if Nginx or Kibana stops, the Elasticsearch process is not affected. you will have to manually restart whichever died. if the Elasticsearch process dies, the container will stop, and the hosting OS will automatically restart the container which will restart all 3 processes.
Which processes are monitored/restarted on my Elasticsearch nodes?
- In these containers, there is only 1 process, the Elasticsearch process. If this dies, the container will stop, and the hosting OS will automatically restart the container which will then restart the Elasticsearch process.