elasticsearch-watchdog

v0.1.6

Published

2 years ago

A watchdog for ElasticSearch: monitors the statuses of cluster nodes, restarts them automatically, and keeps the PRIMARY node unique.

tjatse

monitor watchdog cluster health automatic restart

In my situation, millions of documents are indexed into ElasticSearch every day, and our cluster has a large number of nodes. We spent a lot of time making it stable and reliable, but unfortunately it still crashes every few months due to:

Status changes to red or grey.
Multiple primary nodes instead of a unique one (like autocephaly).
Unresponsive nodes (HTTP timeouts, failed handshakes and the like).
Other issues.

What Can Watchdog Do

Monitor the status, health and state of an ElasticSearch cluster and its nodes.
Restart ElasticSearch automatically through OpenSSH.
Check Watchdog statuses at a glance from anywhere, especially on a mobile device.
Make every day a Sunday.
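
The "keep PRIMARY node unique" check can be sketched roughly as below. This is not Watchdog's actual implementation. It assumes each node has been asked for its own view of the cluster (for example via _cluster/state with local=true) and that every parsed response carries a master_node field. The helper names findDistinctMasters and hasUniqueMaster are hypothetical.

```javascript
// Hypothetical helpers, not part of elasticsearch-watchdog.
// `states` is an array of parsed `_cluster/state` responses, one per node.
function findDistinctMasters(states) {
  var seen = {};
  states.forEach(function (state) {
    // A node that could not be queried, or that has no elected master,
    // contributes no id.
    var id = state && state.master_node;
    if (id) {
      seen[id] = true;
    }
  });
  return Object.keys(seen).sort();
}

// More than one distinct master id suggests the cluster has split;
// zero ids means no node reported a master at all.
function hasUniqueMaster(states) {
  return findDistinctMasters(states).length === 1;
}
```

When hasUniqueMaster returns false, a monitor could flag the cluster for a restart or for manual inspection.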

Installation

$ npm install elasticsearch-watchdog -g

Usage

watchdog


  Usage: watchdog [cmd] [file|name]

  Commands:

    pwd <password>            encrypt the password
    encrypt [options] <file>  encrypt the configuration file and save it to disk
    tmpl <name>               render a configuration template
    start [options] <file>    start watching on an ElasticSearch cluster
    stop <uid>                stop watching by `uid`, all the watchdogs will be killed if `uid` is `all`
    restart <uid>             restart watching by `uid`, call the watchdogs back and then send them out for watching again if `uid` is `all`
    ls [options]              list all the watchdogs we have
    web [port]                launch a web GUI, port default by 8088

  Options:

    -h, --help     output usage information
    -v, --version  output the version number
    -r, --root     the root location, you can find all logs here.

  Basic Examples:

    Start a watchdog, by file:
    $ watchdog start watchdog.yml

    Restart the alive watchdog, by uid:
    $ watchdog restart 1001

    Restart all watchdogs:
    $ watchdog restart all

    Stop the watchdog, by uid:
    $ watchdog stop 1001

    Stop all the watchdogs:
    $ watchdog stop all

encrypt

Usage: encrypt [options] <file>

  Options:

    -h, --help  output usage information
    --no-blank  remove the blank line if this option is provided

tmpl

$ watchdog tmpl <file>

<file> is the name of the configuration file. The .yml extension is optional, e.g. $ watchdog tmpl es-server and $ watchdog tmpl es-server.yml are both fine.

start

 Usage: start [options] <file>

  Options:

    -h, --help          output usage information
    --no-daemon         running watchdog as a service, otherwise in the terminal
    -m, --max <number>  maximize retry count when dog has died

stop

$ watchdog stop <uid>

All the watchdogs will be killed if uid is all. Head over to Printf to get more information about uid.

restart

$ watchdog restart <uid>

All the watchdogs will be called back and then sent out for watching again if uid is all. Head over to Printf to get more information about uid.

ls

  Usage: ls [options]

  Options:

    -h, --help   output usage information
    --no-format  print list as JSON without formatting

web

# simple
$ watchdog web [port]

# daemonic
# start
$ nohup watchdog web > /dev/null 2>&1 & echo $! > /path/to/watchdog.pid
# stop
$ kill -9 `cat /path/to/watchdog.pid`

The port of the web interface is optional (8088 by default). For the best viewport on a mobile device, use landscape mode rather than portrait.

GUI:

A RESTful interface is also provided, e.g. http://[domain|ip]:[port]/json.
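
A minimal sketch of reading that JSON endpoint from Node.js, using only the built-in http module. The helper names watchdogJsonUrl and fetchWatchdogJson are hypothetical, and the shape of the returned document is not assumed here, so it is simply parsed and handed back.

```javascript
var http = require('http');

// Build the URL of the JSON endpoint; 8088 is the documented default port.
function watchdogJsonUrl(host, port) {
  return 'http://' + host + ':' + (port || 8088) + '/json';
}

// Fetch and parse the JSON document. Hypothetical helper for illustration.
function fetchWatchdogJson(host, port, callback) {
  http.get(watchdogJsonUrl(host, port), function (res) {
    var body = '';
    res.setEncoding('utf8');
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () {
      var parsed;
      try {
        parsed = JSON.parse(body);
      } catch (err) {
        return callback(err);
      }
      callback(null, parsed);
    });
  }).on('error', callback);
}

// Example (requires `watchdog web` to be running on this machine):
// fetchWatchdogJson('localhost', 8088, function (err, data) {
//   if (err) { return console.error(err.message); }
//   console.log(JSON.stringify(data, null, 2));
// });
```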

Printf

Taking $ watchdog ls as an example, the output is formatted as follows.

name
CLUSTER-SERVER and PERCOLATOR-SERVER are names of the Watchdog.
uid
7707 and 6384 are uids of the Watchdogs, run $ watchdog stop 7707 or $ watchdog restart 7707 to do a stop/restart operation.
colors
red, yellow, grey and green are the statuses of ElasticSearch.
symbols
★ means the primary node, ✩ means a leaf (non-master) node.
dim style
- UNKNOWN [missing status] / 192.168.100.112 [unknown]
  The primary node is unknown, and the status could not be fetched through the _cluster/health / _cluster/state API.
- 192.168.100.166 [error]
  Watchdog could not connect to the server through OpenSSH; check the logs (~/.watchdog/logs/).
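
As a rough illustration of how the colors could be derived, the sketch below maps a parsed _cluster/health response to one of the four colors. It assumes that grey stands for a missing or unrecognized status, which the text above suggests but does not state outright. The healthColor helper is hypothetical and is not Watchdog's own code.

```javascript
// Statuses that the ElasticSearch `_cluster/health` API reports.
var KNOWN_STATUSES = ['green', 'yellow', 'red'];

// Hypothetical helper: map a parsed `_cluster/health` response to a color.
// Anything missing or unrecognized is shown as grey (an assumption).
function healthColor(health) {
  var status = health && health.status;
  return KNOWN_STATUSES.indexOf(status) !== -1 ? status : 'grey';
}
```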

Programmatic

var Watchdog = require('watchdog');

// load configuration.
var monit = Watchdog({
  conf: '/path/to/conf.yml',
  uid: false
});

// listen events.
monit.on('info', function(msg){
  console.log('[INFO]', msg.type, msg.message);
});

// start watching.
monit.watching();

// end it.
// monit.end();

Configuration

Execute $ watchdog tmpl my-es to render a copy of the configuration template, then edit it to meet your requirements. It supports almost all YAML syntax.

So that Watchdog can restart ElasticSearch smoothly, if ElasticSearch is already running, stop the process and start it again using:

$ elasticsearch -d -p /path/to/es.pid [options]

Local environment

If you're running Watchdog and ElasticSearch on the same server, get the IP address by visiting:

http://localhost:9200/_cluster/state

The transport_address of the current server is the address ElasticSearch is bound to. There is no need to provide nodes.ssh.password in the configuration for this node.
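
A small sketch of pulling the addresses out of a parsed _cluster/state response. It assumes the response has a nodes map keyed by node id and a top-level master_node id, and that transport_address is either a plain "host:port" string or the older "inet[/host:port]" form. Only IPv4 addresses are handled, and the helper names are hypothetical.

```javascript
// Pull host and port out of a transport_address string.
// Handles both "inet[/192.168.100.112:9300]" and "192.168.100.112:9300".
function parseTransportAddress(address) {
  var match = /(\d{1,3}(?:\.\d{1,3}){3}):(\d+)/.exec(String(address));
  return match ? { host: match[1], port: Number(match[2]) } : null;
}

// List every node found in a parsed `_cluster/state` response.
function listNodes(state) {
  var nodes = (state && state.nodes) || {};
  return Object.keys(nodes).map(function (id) {
    var addr = parseTransportAddress(nodes[id].transport_address);
    return {
      id: id,
      name: nodes[id].name,
      host: addr && addr.host,
      isMaster: id === state.master_node
    };
  });
}
```

The host value of the local node is the IP address to put in the configuration.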

Examples

Head over to the example or test directories.

Test

$ npm test

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
