athena-admin
v0.9.5
Migrates Athena table schemas, moves S3 objects so that their keys have a partition key=value prefix, and adds partitions.
$ npm install athena-admin
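const AthenaAdmin = require('athena-admin').AthenaAdmin;
const dbDef = require('./sampledatabase.json');

// await is only valid inside an async function in CommonJS modules.
(async () => {
  const admin = new AthenaAdmin(dbDef);
  await admin.replaceObjects(); // move objects under key=value prefixes
  await admin.migrate();        // create/drop tables or update the schema
  await admin.partition();      // detect and add partitions
})();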
Database definition
Describe the database definition in the following format.
{
  "general": {
    "athenaRegion": "ap-northeast-1",
    "databaseName": "aaaa",
    "saveDefinitionLocation": "s3://saveDefinitionBucket/aaaa.json"
  },
  "tables": {
    "sample_data": {
      "columns": {
        "user_id": "int",
        "some_value": { /* = "struct<score:int,category:string>" */
          "score": "int",
          "category": "string"
        },
        "some_array1": ["string"], /* = array<string> */
        "some_array2": [{ /* = array<struct<aaa:int,bbb:string>> */
          "aaa": "int",
          "bbb": "string"
        }]
      },
      "srcLocation": "s3://src/location/",
      "partition": {
        "prePartitionLocation": "s3://pre/partition/", /* optional */
        "regexp": "(\\d{4})/(\\d{2})/(\\d{2})/", /* optional */
        "keys": [
          {
            "name": "dt",
            "type": "string",
            "format": "{1}-{2}-{3}" /* optional */
          }
        ]
      }
    }
  }
}
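The comments in the definition above show how nested JSON maps to Athena types. The following is a minimal sketch of that mapping, not the library's actual implementation; `toAthenaType` is a hypothetical helper name, and it assumes that a one-element array describes `array<...>` and a plain object describes `struct<...>`.

```javascript
// Hypothetical helper (not part of athena-admin's documented API):
// convert a JSON column definition into an Athena type string.
function toAthenaType(def) {
  if (typeof def === 'string') {
    return def; // primitive type such as "int" or "string"
  }
  if (Array.isArray(def)) {
    // Assumption: the first element describes the element type.
    return `array<${toAthenaType(def[0])}>`;
  }
  // A plain object is treated as struct<name:type,...>.
  const fields = Object.entries(def)
    .map(([name, type]) => `${name}:${toAthenaType(type)}`)
    .join(',');
  return `struct<${fields}>`;
}

const columns = {
  user_id: 'int',
  some_value: { score: 'int', category: 'string' },
  some_array1: ['string'],
  some_array2: [{ aaa: 'int', bbb: 'string' }],
};

// Print one "name type" line per column.
for (const [name, def] of Object.entries(columns)) {
  console.log(`${name} ${toAthenaType(def)}`);
}
```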
general
| Field | Description |
|:-----------|:------------|
| athenaRegion | Region for Athena |
| databaseName | Athena database name |
| saveDefinitionLocation | Location to save the previous definition |
tables
- Each root field name (sample_data in the example) is a table name.
| Field | Description |
|:-----------|:------------|
| columns | Column name and type pairs. struct<> and array<> types can also be written as JSON objects and arrays, so a type can be derived by replacing the values of an actual record with their types. |
| srcLocation | Location to be referenced by Athena |
| partition | Partitions are detected from the key=value prefix of the object keys. If the objects' locations do not have a key=value prefix, replaceObjects() can move them from prePartitionLocation to srcLocation with the prefix added, so that partition() can detect and add the partitions automatically. Each partition uses keys.name as its key, keys.type as its type, and keys.format as its value. {n} in keys.format refers to the n-th group of regexp (e.g. s3://pre/partition/2017/12/01/00/aaa.png => [2017/12/01, 2017, 12, 01]). |
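To illustrate how regexp and keys.format work together, here is a minimal sketch under stated assumptions. `formatPartitionValue` is a hypothetical helper, not a function exported by athena-admin, and the object key is assumed to be relative to prePartitionLocation.

```javascript
// Hypothetical helper: fill a keys.format template with regexp groups.
// {n} is replaced by the n-th element of the match array.
function formatPartitionValue(format, matched) {
  return format.replace(/\{(\d+)\}/g, (_, n) => matched[Number(n)]);
}

// Values taken from the sample definition.
const regexp = new RegExp('(\\d{4})/(\\d{2})/(\\d{2})/');
const key = { name: 'dt', type: 'string', format: '{1}-{2}-{3}' };

// Assumption: the key is relative to prePartitionLocation.
const objKey = '2017/12/01/00/aaa.png';
const matched = objKey.match(regexp);

const value = formatPartitionValue(key.format, matched);
console.log(`${key.name}=${value}`); // prints "dt=2017-12-01"
```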
API
replaceObjects(deletePreObject=true, matchedHandler=(matched, objKey, table)=>matched)
Moves the objects located in prePartitionLocation to srcLocation, adding the partition key=value prefix
(e.g. s3://pre/partition/2017/12/01/00/aaa.png => s3://src/location/dt=2017-12-01/00/aaa.png).
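The key rewrite in the example above can be sketched as follows. This is an illustration only, not the library's code: `destinationUrl` is a hypothetical name, and the sketch assumes that the part of the key matched by regexp is replaced by the key=value prefix while the rest of the key is kept unchanged.

```javascript
// Hypothetical sketch of the URL rewrite; not athena-admin's actual code.
function destinationUrl(srcUrl, table) {
  const { prePartitionLocation, regexp, keys } = table.partition;
  if (!srcUrl.startsWith(prePartitionLocation)) {
    return null; // object is outside prePartitionLocation
  }
  const objKey = srcUrl.slice(prePartitionLocation.length);
  const matched = objKey.match(new RegExp(regexp));
  if (!matched) {
    return null; // key does not carry the expected date parts
  }
  // Build "name=value" for every partition key, joined by "/".
  const prefix = keys
    .map((key) => {
      const value = key.format.replace(/\{(\d+)\}/g, (_, n) => matched[Number(n)]);
      return `${key.name}=${value}`;
    })
    .join('/');
  // Assumption: everything after the matched part is kept as is.
  const remainder = objKey.slice(matched.index + matched[0].length);
  return `${table.srcLocation}${prefix}/${remainder}`;
}

const sampleTable = {
  srcLocation: 's3://src/location/',
  partition: {
    prePartitionLocation: 's3://pre/partition/',
    regexp: '(\\d{4})/(\\d{2})/(\\d{2})/',
    keys: [{ name: 'dt', type: 'string', format: '{1}-{2}-{3}' }],
  },
};

console.log(destinationUrl('s3://pre/partition/2017/12/01/00/aaa.png', sampleTable));
```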
If you need to change the matched values before the objects are moved, pass a matchedHandler.
The following example converts a UTC date and hour to the local time zone
(e.g. 2017/12/01/19 => 2017/12/02/04).
The full code is in /sample.
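const moment = require('moment');

// Expects a regexp with four groups: year, month, day and hour.
const utcToTZ = (matched, objKey, table) => {
  // Leave the match untouched for tables without a dt partition key.
  const existsDt = table.partition.keys.some((key) => key.name === 'dt');
  if (!existsDt) {
    return matched;
  }
  // Parse the matched string as UTC; moment formats it in the local time zone.
  const tz = moment(`${matched[0]} +00:00`, 'YYYY/MM/DD/HH ZZ');
  matched[1] = tz.format('YYYY');
  matched[2] = tz.format('MM');
  matched[3] = tz.format('DD');
  matched[4] = tz.format('HH');
  return matched;
};

// Run inside an async function. false is passed for deletePreObject.
await admin.replaceObjects(false, utcToTZ);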
migrate()
Compares the current definition with the previous definition saved in S3 (saveDefinitionLocation) and, if they differ, creates or drops tables or updates the schema.
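As a rough illustration of this comparison, the sketch below derives which tables would be created, dropped, or updated from two definitions. `planMigration` is a hypothetical function, and comparing tables with JSON.stringify is an assumption made for brevity; the library may compare definitions differently.

```javascript
// Hypothetical sketch: decide what a migration would need to do per table.
function planMigration(previous, current) {
  const prevTables = (previous && previous.tables) || {};
  const currTables = current.tables || {};
  const plan = { create: [], drop: [], update: [] };

  for (const name of Object.keys(currTables)) {
    if (!(name in prevTables)) {
      plan.create.push(name); // new table
    } else if (JSON.stringify(prevTables[name]) !== JSON.stringify(currTables[name])) {
      // Note: JSON.stringify is sensitive to property order.
      plan.update.push(name); // definition changed
    }
  }
  for (const name of Object.keys(prevTables)) {
    if (!(name in currTables)) {
      plan.drop.push(name); // table removed from the definition
    }
  }
  return plan;
}

const previousDef = {
  tables: {
    sample_data: { columns: { user_id: 'int' } },
    old_data: { columns: { id: 'int' } },
  },
};
const currentDef = {
  tables: {
    sample_data: { columns: { user_id: 'string' } },
    new_data: { columns: { id: 'int' } },
  },
};

console.log(planMigration(previousDef, currentDef));
```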
partition()
Runs MSCK REPAIR TABLE. Partitions are detected and added automatically from the objects' key=value prefixes.
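For reference, a repair query for one table could be described with parameters like the ones below, in the shape accepted by Athena's StartQueryExecution API. This is only a sketch: `repairTableParams` is a hypothetical helper, and the result output location is an assumption because the database definition has no field for it.

```javascript
// Hypothetical helper: build StartQueryExecution parameters for one table.
function repairTableParams(dbDef, tableName, outputLocation) {
  return {
    QueryString: `MSCK REPAIR TABLE ${tableName}`,
    QueryExecutionContext: { Database: dbDef.general.databaseName },
    // Assumption: query results are written to a caller-supplied location.
    ResultConfiguration: { OutputLocation: outputLocation },
  };
}

const definition = {
  general: { athenaRegion: 'ap-northeast-1', databaseName: 'aaaa' },
  tables: { sample_data: {} },
};

// One set of parameters per table in the definition.
const allParams = Object.keys(definition.tables).map((tableName) =>
  repairTableParams(definition, tableName, 's3://query-results-bucket/'));

console.log(allParams[0].QueryString);
```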
Article
I made athena-manager, a tool for Athena migration and partitioning - sambaiz-net