cloud-duck
v0.0.17
CDK construct for creating an analysis environment using DuckDB for S3 data
CloudDuck is a CDK construct that provides a simple, easy-to-use analysis environment for S3 data, featuring DuckDB with built-in authentication.
By simply deploying the construct, you can launch a SaaS that provides an analytics dashboard. Access is authenticated with Cognito, so only authorized users can log in.
Use Cases
- When you want analysts to analyze S3 data with DuckDB but prefer not to issue them S3 access credentials.
- When you want to minimize the costs incurred from downloading large amounts of S3 data to local storage.
Architecture
Installation
npm i cloud-duck
Setup
Deploy
You can deploy CloudDuck by adding the following code to your CDK stack.
import { CloudDuck } from 'cloud-duck';
import { Size } from 'aws-cdk-lib';
import * as cognito from 'aws-cdk-lib/aws-cognito';
import * as s3 from 'aws-cdk-lib/aws-s3';

declare const logBucket: s3.IBucket;

new CloudDuck(this, 'CloudDuck', {
  // The S3 buckets to analyze.
  // By default, CloudDuck can access all buckets in the account.
  // To restrict access, list the allowed buckets in targetBuckets.
  targetBuckets: [logBucket],
  // The memory size of the Lambda function
  // Default: 1024 MB
  memory: Size.mebibytes(1024),
  // You can customize the Cognito User Pool.
  // For example, you can require users to use MFA.
  userPoolProps: {
    mfa: cognito.Mfa.REQUIRED,
    mfaSecondFactor: {
      sms: false,
      otp: true,
    },
  },
});
Add user to the Cognito User Pool
Add a user to the Cognito User Pool with the following command.
aws cognito-idp admin-create-user \
--user-pool-id "us-east-1_XXXXX" \
--username "user@example.com" \
--user-attributes Name=email,Value="user@example.com" Name=email_verified,Value=true \
--message-action SUPPRESS \
--temporary-password Password1!
You can also add a user via the AWS Management Console.
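When several analysts need accounts, it can help to generate the command arguments instead of editing them by hand. The following is a minimal sketch, not part of cloud-duck: `buildAdminCreateUserArgs` and the `NewUser` type are hypothetical names, and the helper uses only the flags that appear in the command above. The user pool ID and email addresses are placeholders.

```typescript
// Hypothetical helper (not part of cloud-duck): builds the argument list for
// `aws cognito-idp admin-create-user` using only the flags shown above.
interface NewUser {
  email: string;             // used as both the username and the email attribute
  temporaryPassword: string; // must satisfy the User Pool password policy
}

function buildAdminCreateUserArgs(userPoolId: string, user: NewUser): string[] {
  return [
    "cognito-idp", "admin-create-user",
    "--user-pool-id", userPoolId,
    "--username", user.email,
    "--user-attributes",
    `Name=email,Value=${user.email}`,
    "Name=email_verified,Value=true",
    "--message-action", "SUPPRESS",
    "--temporary-password", user.temporaryPassword,
  ];
}

// Example: print one command per analyst for review.
// The printed line is not shell-quoted; it is for inspection only.
const analysts: NewUser[] = [
  { email: "alice@example.com", temporaryPassword: "Password1!" },
  { email: "bob@example.com", temporaryPassword: "Password1!" },
];
for (const analyst of analysts) {
  const args = buildAdminCreateUserArgs("us-east-1_XXXXX", analyst);
  console.log(["aws"].concat(args).join(" "));
}
```

Returning an array rather than a single string means the arguments can be passed directly to a process-spawning API without going through a shell, which avoids quoting problems with characters such as `!`.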
Access
Access CloudDuck at the CloudFront URL shown in the deploy output.
❯ npx cdk deploy
...
AwsStack.CloudDuckDistributionUrl84FC8296 = https://dosjykpv096qr.cloudfront.net
Stack ARN:
arn:aws:cloudformation:us-east-1:123456789012:stack/AwsStack/dd0960c0-b3d5-11ef-bcfc-12cf7722116f
✨ Total time: 73.59s
Enter the username and password.
The first time you log in, you must change the password.
Play with CloudDuck!
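If the deployment runs in a script, the distribution URL can be pulled out of the captured `cdk deploy` output. This is a rough sketch under the assumption that the URL appears in the output in the same form as in the sample above; `findCloudFrontUrl` is a hypothetical helper, not something cloud-duck provides.

```typescript
// Hypothetical helper: returns the first CloudFront URL found in the text
// printed by `cdk deploy`, or undefined if none is present.
function findCloudFrontUrl(deployOutput: string): string | undefined {
  const match = deployOutput.match(/https:\/\/[a-z0-9]+\.cloudfront\.net/);
  return match ? match[0] : undefined;
}

// Example using the sample output shown above.
const sampleDeployOutput = [
  "AwsStack.CloudDuckDistributionUrl84FC8296 = https://dosjykpv096qr.cloudfront.net",
  "Stack ARN:",
  "arn:aws:cloudformation:us-east-1:123456789012:stack/AwsStack/dd0960c0-b3d5-11ef-bcfc-12cf7722116f",
].join("\n");

console.log(findCloudFrontUrl(sampleDeployOutput));
// prints "https://dosjykpv096qr.cloudfront.net"
```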
Usage
Query
You can query the S3 data with SQL.
SELECT * FROM read_csv_auto('s3://your-bucket-name/your-file.csv');
SELECT * FROM parquet_scan('s3://your-bucket-name/your-file.parquet');
Of course, you can store the result as a new table.
CREATE TABLE new_table AS SELECT * FROM read_csv_auto('s3://your-bucket-name/your-file.csv');
Detailed usage of DuckDB is available in the DuckDB documentation.
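If you generate queries for many files, the choice between `read_csv_auto` and `parquet_scan` can be made from the file extension. The sketch below is one possible approach rather than a CloudDuck feature: `buildS3SelectQuery` and `buildCreateTableQuery` are hypothetical names, and only the two readers used in the examples above are covered.

```typescript
// Hypothetical helper: picks the DuckDB reader used in the examples above
// based on the file extension of an S3 URI.
function buildS3SelectQuery(s3Uri: string): string {
  if (!/^s3:\/\//.test(s3Uri)) {
    throw new Error(`Not an S3 URI: ${s3Uri}`);
  }
  // Double any single quotes so the URI is a valid SQL string literal.
  const literal = s3Uri.replace(/'/g, "''");
  if (/\.parquet$/i.test(s3Uri)) {
    return `SELECT * FROM parquet_scan('${literal}');`;
  }
  if (/\.csv$/i.test(s3Uri)) {
    return `SELECT * FROM read_csv_auto('${literal}');`;
  }
  throw new Error(`Unsupported file type: ${s3Uri}`);
}

// Hypothetical helper: wraps the SELECT in CREATE TABLE ... AS.
// The table name is restricted to a plain identifier to keep the SQL safe.
function buildCreateTableQuery(tableName: string, s3Uri: string): string {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(tableName)) {
    throw new Error(`Invalid table name: ${tableName}`);
  }
  return `CREATE TABLE ${tableName} AS ${buildS3SelectQuery(s3Uri)}`;
}

console.log(buildS3SelectQuery("s3://your-bucket-name/your-file.parquet"));
console.log(buildCreateTableQuery("new_table", "s3://your-bucket-name/your-file.csv"));
```

The generated strings match the hand-written queries above, so they can be pasted into the CloudDuck query editor as-is.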
Persistence
All query results are persisted in individual DuckDB files for each user. Therefore, you can freely save your query results without worrying about affecting other users.
Note
CloudDuck is still under development. Updates may include breaking changes. If you encounter any bugs, please report them via issues.