glc-data-glue-crawlers-serverless-plugin
v1.0.6
Serverless Plugin to create glue crawlers with correct permissions and tags
Serverless Plugin for GLC glue crawlers
Requirements
Tested with:
- Node.js >= v10
- Serverless Framework >= v1.51
Installation
Install the dependency
Using npm:
npm i -D glc-data-glue-crawlers-serverless-plugin
Using yarn:
yarn add --dev glc-data-glue-crawlers-serverless-plugin
Use the plugin
Add the plugin to your serverless.yml file:
plugins:
- glc-data-glue-crawlers-serverless-plugin
Usage
serverless deploy --stage <yourStage>
Example
Configure the crawler in the custom section of your serverless.yml:
custom:
  glcGlueCrawler:
    name: <crawlerName> # the crawler name you want
    source:
      path: <s3Path> # the S3 path to crawl (e.g. "s3://stats.datalake.${opt:stage}/classified")
      classifier: <classifierName> # optional mapper ("DynamoDbStreamNewField" to crawl only $.new)
      exclusions: [<exclusionPattern1>, <exclusionPattern2>, ...] # optional exclusions (e.g. ["2018/**", "2019/0[1-2]/**"])
    destination:
      database: <databaseName> # the database to add the crawled table to (e.g. "datalake_${opt:stage}")
      tablePrefix: <tablePrefix> # optional prefix (e.g. "lc_") added to the table name, which is the last folder of the crawled S3 path ("classified" for "s3://stats.datalake.${opt:stage}/classified")
    tags:
      Env: "${opt:stage}"
      Bloc: "data"
      App: "datalakehouse"
      Comp: <crawlerName>
      Team: <teamTag> # may be already defined in stackTags
      IsInfraAsCode: "serverless" # may be already defined in stackTags
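As a concrete illustration, the template above might be filled in as follows. This is only a sketch: the crawler name and the team tag are made-up values, and the other values are taken from the "e.g." hints in the comments. Based on the tablePrefix description, the resulting table would presumably be named "lc_classified".

```yaml
custom:
  glcGlueCrawler:
    name: classified-crawler-${opt:stage} # hypothetical crawler name
    source:
      path: "s3://stats.datalake.${opt:stage}/classified"
      classifier: "DynamoDbStreamNewField" # optional, crawl only $.new
      exclusions: ["2018/**", "2019/0[1-2]/**"] # optional
    destination:
      database: "datalake_${opt:stage}"
      tablePrefix: "lc_" # optional
    tags:
      Env: "${opt:stage}"
      Bloc: "data"
      App: "datalakehouse"
      Comp: "classified-crawler" # hypothetical, matches the crawler name
      Team: "my-team" # hypothetical team tag
      IsInfraAsCode: "serverless"
```

Since every path and name uses ${opt:stage}, the stage passed to serverless deploy --stage determines which bucket and database the crawler targets.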
You can also define a list of multiple glue crawlers at once:
custom:
  glcGlueCrawler:
    - name: ${self:service}-${opt:stage}
      source:
        path: "s3://stats.datalake.${opt:stage}/my-lake"
      destination:
        database: "datalake_${opt:stage}"
      tags:
        Env: "${opt:stage}"
        Bloc: "data"
        App: "datalakehouse"
        Comp: "${self:service}"
        Team: "my-team" # may be already defined in stackTags
        IsInfraAsCode: "serverless" # may be already defined in stackTags
    - name: another-crawler-${opt:stage}
      source:
        path: "s3://stats.datalake.${opt:stage}/another-lake"
      destination:
        database: "datalake_${opt:stage}"
      tags:
        Env: "${opt:stage}"
        Bloc: "data"
        App: "datalakehouse"
        Comp: "another-crawler"
        Team: "my-team" # may be already defined in stackTags
        IsInfraAsCode: "serverless" # may be already defined in stackTags
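The "may be already defined in stackTags" comments refer to the Serverless Framework's provider.stackTags setting. A minimal sketch of defining the shared tags once at stack level is shown below. It assumes the plugin picks up stackTags for the crawler as those comments suggest; that behavior is not confirmed here, so keep the tags on each crawler if in doubt.

```yaml
provider:
  name: aws
  stackTags:
    Team: "my-team" # shared across the stack
    IsInfraAsCode: "serverless" # shared across the stack

custom:
  glcGlueCrawler:
    name: ${self:service}-${opt:stage}
    source:
      path: "s3://stats.datalake.${opt:stage}/my-lake"
    destination:
      database: "datalake_${opt:stage}"
    tags: # Team and IsInfraAsCode left out, assuming they come from stackTags
      Env: "${opt:stage}"
      Bloc: "data"
      App: "datalakehouse"
      Comp: "${self:service}"
```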