serverless-aws-glue v0.0.44: Serverless plugin to deploy AWS Glue Jobs
# Serverless Glue

This is a plugin for the Serverless Framework that provides the ability to deploy AWS Glue jobs.
## Install

1. Run `npm install --save-dev serverless-aws-glue`
2. Add `serverless-aws-glue` to the `plugins` section of your `serverless.yml`:

```yml
plugins:
  - serverless-aws-glue
```
## How it works

Before deploying, the plugin creates CloudFormation resources from your configuration and adds them to the Serverless template, so any Glue job deployed with this plugin is part of your stack too.
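As a rough sketch of what this means (the logical ID and the exact properties the plugin emits may differ; this is an illustration, not the plugin's actual output), a configured job could surface in the compiled CloudFormation template like this:

```yml
Resources:
  SuperGlueJob:                 # hypothetical logical ID
    Type: AWS::Glue::Job
    Properties:
      Name: super-glue-job
      Role: arn:aws:iam::000000000:role/someRole
      Command:
        Name: glueetl           # 'glueetl' for spark jobs, 'pythonshell' for python shell jobs
        ScriptLocation: s3://someBucket/glueJobs/test-job.py
      ExecutionProperty:
        MaxConcurrentRuns: 3
```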
## How to configure your Glue jobs

Configure your Glue jobs in the `custom` section like this:

```yml
custom:
  Glue:
    bucketDeploy: someBucket # Required
    s3Prefix: some/s3/key/location/ # Optional, default = 'glueJobs/'
    jobs:
      - job:
          name: super-glue-job # Required
          script: src/glueJobs/test-job.py # Required; the script is named with the part after the last '/' and uploaded to the s3Prefix location
          tempDir: true # Optional true | false
          type: spark # Required spark | pythonshell
          glueVersion: python3-2.0 # Required python3-1.0 | python3-2.0 | python2-1.0 | python2-0.9 | scala2-1.0 | scala2-0.9 | scala2-2.0
          role: arn:aws:iam::000000000:role/someRole # Required
          MaxConcurrentRuns: 3 # Optional
          WorkerType: Standard # Optional Standard | G1.X | G2.X
          NumberOfWorkers: 1 # Optional
          Connections: "RDS-MySQL5.7-Connection1,RDS-MySQL5.7-Connection2" # Optional
          extraPyFilePaths: "/path/to/file1.py,/path/to/file2.py" # Optional
          extraJarPaths: "/path/to/file1.jar,/path/to/file2.jar" # Optional
          additionalModules: "mysql-connector-python==8.0.5,pymongo==3.11.4" # Optional
          sparkUIPath: "s3://path" # Optional
          DefaultArguments: # Optional
            stage: "dev"
            table_name: "test"
```
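The values under `DefaultArguments` are passed to the job as `--key value` command-line arguments, which a Glue script typically reads with `awsglue.utils.getResolvedOptions`. The following is a minimal stand-in so the idea can be seen without a Glue runtime (`get_resolved_options` here is a hypothetical helper for illustration, not part of this plugin or the AWS SDK):

```python
def get_resolved_options(argv, options):
    """Minimal stand-in for awsglue.utils.getResolvedOptions:
    picks '--key value' pairs out of the job's argument list."""
    args = {}
    for name in options:
        flag = "--" + name
        if flag in argv:
            args[name] = argv[argv.index(flag) + 1]
    return args

# Glue invokes the script roughly like this, with DefaultArguments appended:
argv = ["test-job.py", "--stage", "dev", "--table_name", "test"]
args = get_resolved_options(argv, ["stage", "table_name"])
print(args["stage"], args["table_name"])  # prints: dev test
```

In a real Glue job you would import `getResolvedOptions` from `awsglue.utils` and pass it `sys.argv` instead.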
You can define multiple jobs:

```yml
custom:
  Glue:
    bucketDeploy: someBucket
    jobs:
      - job:
          ...
      - job:
          ...
```
## Glue configuration parameters

|Parameter|Type|Description|Required|
|-|-|-|-|
|bucketDeploy|String|S3 bucket name|true|
|s3Prefix|String|S3 key prefix for uploaded scripts; default `glueJobs/`|false|
|jobs|Array|Array of Glue jobs to deploy|true|
## Job configuration parameters

|Parameter|Type|Description|Required|
|-|-|-|-|
|name|String|Name of the job|true|
|script|String|Script path within the project|true|
|tempDir|Boolean|Whether the job needs a temp folder; if true, the plugin creates a bucket for temporary files|false|
|type|String|Job type: `spark` or `pythonshell`|true|
|glueVersion|String|Language and Glue version to use (`[language][version]-[glue version]`). Allowed values: `python3-1.0`, `python3-2.0`, `python2-1.0`, `python2-0.9`, `scala2-1.0`, `scala2-0.9`, `scala2-2.0`|true|
|role|String|ARN of the role used to execute the job|true|
|MaxConcurrentRuns|Double|Maximum concurrent runs of the job|false|
|WorkerType|String|Worker type; defaults to `Standard`|false|
|NumberOfWorkers|Integer|Number of workers|false|
|Connections|String|Database connections (comma-separated for multiple connections)|false|
|extraPyFilePaths|String|Python file paths (comma-separated for multiple files)|false|
|extraJarPaths|String|JAR file paths (comma-separated for multiple files)|false|
|additionalModules|String|Additional Python modules (comma-separated for multiple modules)|false|
|sparkUIPath|String|S3 path for Spark UI logs|false|
|DefaultArguments|Json|Key/value pairs passed to the job as default arguments|false|
## And now?

Just run `serverless deploy`.