juttle-elastic-adapter
v0.7.0
Published
Juttle adapter for Elasticsearch
Downloads
17
Readme
Juttle Elastic Adapter
The Juttle Elastic Adapter enables reading and writing documents using Elasticsearch. It works with Elasticsearch version 1.5.2 (including AWS Elasticsearch Service) and above, such as version 2.1.1.
Examples
Read all documents stored in Elasticsearch timestamped with the last hour:
read elastic -from :1 hour ago: -to :now:
Write a document timestamped with the current time, with one field { name: "test" }
, which you'll then be able to query using read elastic
.
emit -limit 1 | put name="test" | write elastic
Read recent records from Elasticsearch that have field name
with value test
:
read elastic -last :1 hour: name = 'test'
Read recent records from Elasticsearch that contain the text hello world
in any field:
read elastic -last :1 hour: 'hello world'
An end-to-end example is described here and deployed to the demo system demo.juttle.io. The Juttle Tutorial also covers using elastic adapter.
Installation
Like Juttle itself, the adapter is installed as a npm package. Both Juttle and the adapter need to be installed side-by-side:
$ npm install juttle
$ npm install juttle-elastic-adapter
Configuration
The adapter needs to be registered and configured so that it can be used from
within Juttle. To do so, add the following to your ~/.juttle/config.json
file:
{
"adapters": {
"elastic": {
"address": "localhost",
"port": 9200
}
}
}
To connect to an Elasticsearch instance elsewhere, change the address
and port
in this configuration.
The value for elastic
can also be an array of Elasticsearch host locations. Give each one a unique id
field, and read -id
and write -id
will use the appropriate host.
The Juttle Elastic Adapter can also make requests to Amazon Elasticsearch Service instances, which requires a little more configuration. To connect to Amazon Elasticsearch Service, an entry in the adapter config must have {"aws": true}
as well as region
, endpoint
, access_key
, and secret_key
fields. access_key
and secret_key
can also be specified by the environment variables AWS_ACCESS_KEY_ID
and AWS_SECRET_ACCESS_KEY
respectively.
Here's an example Juttle Elastic Adapter configuration that can connect to a local Elasticsearch instance running on port 9200 using read/write elastic -id "local"
and an Amazon Elasticsearch Service at search-foo-bar.us-west-2.es.amazonaws.com
using read/write elastic -id "amazon"
:
{
"adapters": {
"elastic": [
{
"id": "local",
"address": "localhost",
"port": 9200
},
{
"id": "amazon",
"aws": true,
"endpoint": "search-foo-bar.us-west-2.es.amazonaws.com",
"region": "us-west-2",
"access_key": "(my access key ID)",
"secret_key": "(my secret key)"
}
]
}
}
Schema
To read or write data, the adapter has to know the names of the indices storing that data in Elasticsearch. By default, the adapter writes points to an index called juttle
and reads from all indices.
You can choose indices to read and write from with the -index
option, or you can specify an index
for each configured Elasticsearch instance the adapter is connected to.
For schemas that create indices at regular intervals, the adapter supports an indexInterval
option. Valid values for indexInterval
are day
, week
, month
, year
, and none
. With day
, the adapter will use indices formatted ${index}${yyyy.mm.dd}
. With week
, it will use ${index}${yyyy.ww}
, where ww
ranges from 01 to 53 numbering the weeks in a year. With month
, it will use ${index}${yyyy.mm}
, and with year
, it will use ${index}${yyyy}
. With none
, the default, it will use just one index entirely specified by index
. When using indexInterval
, index
should be the non-date portion of each index followed by *
.
Lastly, the adapter expects all documents in Elasticsearch to have a field containing a timestamp. By default, it expects this to be the @timestamp
field. This is configurable with the -timeField
option to read
and write
.
Specifics of using the default Logstash schema are described here, including handling of analyzed vs not_analyzed string fields.
Usage
Read options
In addition to the options below, read elastic
supports field comparisons of form field = value
, that can be combined into filter expressions using AND
/OR
/NOT
operators, and free text search, following the Juttle filtering syntax.
Name | Type | Required | Description | Default
-----|------|----------|-------------|---------
from
| moment | no | select points after this time (inclusive) | none, either -from
and -to
or -last
must be specified
to
| moment | no | select points before this time (exclusive) | none, either -from
and -to
or -last
must be specified
last
| duration | no | select points within this time in the past (exclusive) | none, either -from
and -to
or -last
must be specified
id
| string | no | read from the configured Elasticsearch endpoint with this ID | the first endpoint in config.json
index
| string | no | index(es) to read from | *
indexInterval
| string | no | granularity of an index. valid options: day
, week
, month
, year
, none
| none
type
| string | no | document type to read from | all types
timeField
| string | no | field containing timestamps | @timestamp
idField
| string | no | if specified, the value of this field in each point emitted by read elastic
will be the document ID of the corresponding Elasticsearch document | none
optimize
| true/false | no | optional flag to disable optimized reads, see Optimizations | true
Write options
Name | Type | Required | Description | Default
-----|------|----------|-------------|---------
id
| string | no | write to the configured Elasticsearch endpoint with this ID | the first endpoint in config.json
index
| string | no | index to write to | juttle
indexInterval
| string | no | granularity of an index. valid options: day
week
, month
, year
, none
| none
type
| string | no | document type to write to | event
timeField
| string | no | field containing timestamps | @timestamp
idField
| string | no | if specified, the value of this field on each point will be used as the document ID for the corresponding Elasticsearch document and not stored | none
chunkSize
| number | no | buffer points until chunkSize
have been received or the program ends, then flush | 1024
concurrency
| number | no | number of concurrent bulk requests to make to Elasticsearch (each inserts <= chunkSize
points) | 10
Optimizations
Whenever the elastic adapter can shape the entire Juttle flowgraph or its portion into an Elasticsearch query, it will do so, sending the execution to ES, so only the matching data will come back into Juttle runtime. The portion of the program expressed in read elastic
is always executed as an ES query; the downstream Juttle processors may be optimized as well.
Fully optimized example
read elastic -last :1 hour: -index 'scratch' -type 'tag1' name = 'test'
| reduce count()
This program will form an ES query that asks it do count the documents in scratch
index with document type tag1
whose field name
is set to the value test
, and only a single record (count) will come back from Elasticsearch.
Less optimized example
read elastic -last :1 hour: name = 'test'
| put threshold = 42
| filter value > threshold
In this case, Juttle will issue a query against ES that matches documents whose field name
is set to the value test
(i.e. Juttle will not read all documents from ES, only the once that match the filter expression in read elastic
). However, the rest of the program that filters for values exceeding threshold will be executing in the Juttle runtime, as it isn't possible to hand off this kind of filtering to ES.
List of optimized operations
- any filter expression or full text search as part of
read elastic
(note:read elastic | filter ...
is not optimized) head
ortail
reduce count()
,sum()
, and other built-in reducersreduce by fieldname
(other than reduce by document type)reduce -every :interval:
Optimization and nested objects
There are a few fundamental incompatibilities between Elasticsearch's model for nested object and array fields and Juttle's. This can lead to some odd results for optimized programs. For objects, an optimized reduce by some_object_field
will return null
as the only value for some_object_field
. For arrays, an optimized reduce by some_array_field
will return a separate value for some_array_field
for every element in every array stored in some_array_field
. For results conforming to Juttle's reduce
behavior, disable optimization with read elastic -optimize false
.
In case of unexpected behavior with optimized reads, add -optimize false
option to read elastic
to disable optimizations, and kindly report the problem as a GitHub issue.
Contributing
Want to contribute? Awesome! Don’t hesitate to file an issue or open a pull request. See the common contributing guidelines for project Juttle.