s3db.js
v3.2.6
Use AWS S3, the world's most reliable document storage, as a database with this ORM.
Another way to create a cheap document-based database with an easy ORM to handle your datasets!
- Motivation
- Usage
- Install
- Quick Setup
- Insights
- Database
- Create a resource
- Resource methods
- Insert one
- Get one
- Update one
- Delete one
- Count
- Insert many
- Get many
- Get all
- Delete many
- Delete all
- List ids
- Resource streams
- Readable stream
- Writable stream
- S3 Client
- Events
- Plugins
- Cost Simulation
- Big Example
- Small example
- Roadmap
Motivation
First of all:
- Nothing is for free, but it can be cheaper.
- I'm not responsible for your AWS costs strategy; use s3db.js at your own risk.
- Please, do not use it in production!
Let's go!
You might know AWS's S3 product for its high availability and its cheap pricing rules. I'll show you another clever and funny way to use S3.
AWS allows you to define Metadata for every single file you upload into your bucket. This attribute is limited to 2kb and must use UTF-8 encoding. As this encoding varies the byte width per symbol, you may fit roughly 500 to 2,000 chars of metadata storage. Follow the docs at AWS S3 User Guide: Using metadata.
There is another management subset of data called tags that is used globally as [key, value] params. You can assign 10 tags per object, where each key may be at most 128 unicode chars long and each value up to 256 chars. With those key-values we can use 2.5kb more of data; unicode will allow you to use up to 2,500 more chars. Follow the official docs at AWS User Guide: Object Tagging.
With all this set, each object gives you up to 4.5kb of free storage space.
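Because attribute data ultimately lives in S3 object metadata, it can help to sanity-check an object's serialized size against the 2kb metadata budget before writing. A minimal sketch; the helper and limit constant are illustrative, not part of the s3db.js API:

```javascript
// Illustrative helper (not part of s3db.js): estimate whether a serialized
// object fits into S3's 2 KB user-defined metadata budget.
const METADATA_LIMIT_BYTES = 2 * 1024;

function fitsInMetadata(obj) {
  const serialized = JSON.stringify(obj);
  // Buffer.byteLength counts actual UTF-8 bytes, not characters
  const bytes = Buffer.byteLength(serialized, "utf8");
  return { bytes, fits: bytes <= METADATA_LIMIT_BYTES };
}
```

For example, a small object like `{ name: "Ana" }` uses only a handful of bytes, while a 5,000-char string will not fit.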
Check the cost simulation section below for a deep cost dive!
Let's give it a try! :)
Usage
You may check the snippets below or go straight to the Examples section!
Install
npm i s3db.js
# or
yarn add s3db.js
Quick setup
The S3db client uses connection-string params.
import { S3db } from "s3db.js";
const {
AWS_BUCKET,
AWS_ACCESS_KEY_ID,
AWS_SECRET_ACCESS_KEY,
} = process.env
const s3db = new S3db({
uri: `s3://${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}@${AWS_BUCKET}/databases/mydatabase`
});
s3db
.connect()
.then(() => console.log("connected!"));
If you use the dotenv package:
import * as dotenv from "dotenv";
dotenv.config();
import { S3db } from "s3db.js";
Insights
This ORM implementation simulates a document repository. Because s3db.js uses aws-sdk's S3 api, all requests are GET/PUT on key=value resources, so the best-case scenario is to access it like a document store.
For better use of the cache and listing, the best ID format is sequential ids with leading zeros (e.g. 00001, 00002, 00003) due to S3's internal key-sorting method. But you will need to manage this incremental ID on your own.
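The zero-padded sequential ids suggested above can be generated with a tiny helper; the counter persistence (shown here as a plain in-memory variable) is up to you:

```javascript
// Zero-pad a sequential counter so that S3's lexicographic key ordering
// matches numeric ordering (00001, 00002, ..., 99999).
function padId(n, width = 5) {
  return String(n).padStart(width, "0");
}

// s3db.js does not manage this counter for you; persist it yourself.
let counter = 0;
const nextId = () => padId(++counter);
```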
Database
Your s3db.js client can be initialized with these options:
| option | optional | description | type | default |
| :---------: | :------: | :-------------------------------------------------: | :-------: | :---------: |
| cache | true | Persist searched data to reduce repeated requests | boolean | undefined |
| parallelism | true | Number of simultaneous tasks | number | 10 |
| passphrase | true | Your encryption secret | string | undefined |
| ttl | true | (Coming soon) TTL to your cache duration in seconds | number | 86400 |
| uri | false | A url as your S3 connection string | string | undefined |
Config example:
const {
AWS_BUCKET = "my-bucket",
AWS_ACCESS_KEY_ID = "secret",
AWS_SECRET_ACCESS_KEY = "secret",
AWS_BUCKET_PREFIX = "databases/test-" + Date.now(),
} = process.env;
const uri = `s3://${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}@${AWS_BUCKET}/${AWS_BUCKET_PREFIX}`;
const options = {
uri,
parallelism: 25,
passphrase: fs.readFileSync("./cert.pem"),
};
s3db.connect()
This method must always be invoked before any operation takes place. It will interact with AWS' S3 api and check the items below:
- With the current credentials:
  - Check if the client has access to the S3 bucket.
  - Check if the client has access to bucket life-cycle policies.
- With the defined database:
  - Check if there is already a database at this connection string.
  - If a database is found, download its metadata and load each Resource definition.
  - Else, generate an empty metadata file into this prefix and mark this as a new database from scratch.
Metadata file
s3db.js will generate a /s3db.json file at the pre-defined prefix with this structure:
{
  // file version
  "version": "1",

  // previously defined resources
  "resources": {
    // definition example
    "leads": {
      "name": "leads",

      // resource options
      "options": {},

      // resource defined schema
      "schema": {
        "name": "string",
        "token": "secret"
      },

      // rules to simplify metadata usage
      "mapper": {
        "name": "0",
        "token": "1"
      }
    }
  }
}
Create a resource
Resources are definitions of data collections.
// resource
const attributes = {
utm: {
source: "string|optional",
medium: "string|optional",
campaign: "string|optional",
term: "string|optional",
},
lead: {
fullName: "string",
mobileNumber: "string",
personalEmail: "email",
},
};
const resource = await s3db.createResource({
name: "leads",
attributes,
});
Resources' names cannot prefix each other, like leads and leads-copy! S3's api lists keys using prefix notation, so every time you list leads, all keys of leads-copy will appear as well.
Attributes
s3db.js uses the fastest-validator package to define and validate your resource. A few examples:
const attributes = {
// few simple examples
name: "string|min:4|max:64|trim",
email: "email|nullable",
mobile: "string|optional",
count: "number|integer|positive",
currency: "currency|symbol:R$",
createdAt: "date",
website: "url",
id: "uuid",
ids: "array|items:uuid|unique",
// s3db defines a custom type "secret" that is encrypted
token: "secret",
// nested data works as well
geo: {
lat: "number",
long: "number",
city: "string",
},
// may have multiple definitions.
address_number: ["string", "number"],
};
Reference:
You may just use the reference:
const Leads = s3db.resource("leads");
Limitations:
As we need to store the resource definition within a JSON file, the best way to keep your definitions intact is to use the string-based shorthand definitions in your resource definition.
By design, the resource definition will strip all functions in attributes to avoid eval() calls.
The fastest-validator
starts with the params below:
// fastest-validator params
{
useNewCustomCheckerFunction: true,
defaults: {
object: {
strict: "remove",
},
},
}
Resource methods
Consider resource
as:
const resource = s3db.resource("leads");
Insert one
// data
const insertedData = await resource.insert({
id: "[email protected]", // if not defined, an id will be generated!
utm: {
source: "abc",
},
lead: {
fullName: "My Complex Name",
personalEmail: "[email protected]",
mobileNumber: "+5511234567890",
},
invalidAttr: "this attribute will disappear",
});
// {
// id: "[email protected]",
// utm: {
// source: "abc",
// },
// lead: {
// fullName: "My Complex Name",
// personalEmail: "[email protected]",
// mobileNumber: "+5511234567890",
// },
// }
If an id attribute is not defined, s3db.js will use nanoid to generate a random unique id!
Get one
const obj = await resource.get("[email protected]");
// {
// id: "[email protected]",
// utm: {
// source: "abc",
// },
// lead: {
// fullName: "My Complex Name",
// personalEmail: "[email protected]",
// mobileNumber: "+5511234567890",
// },
// }
Update one
const obj = await resource.update("[email protected]", {
lead: {
fullName: "My New Name",
mobileNumber: "+5511999999999",
},
});
// {
// id: "[email protected]",
// utm: {
// source: "abc",
// },
// lead: {
// fullName: "My New Name",
// personalEmail: "[email protected]",
// mobileNumber: "+5511999999999",
// },
// }
Delete one
await resource.delete(id);
Count
await resource.count();
// 101
Insert many
You may bulk insert data with a friendly method that receives a list of objects.
const objects = new Array(100).fill(0).map((v, k) => ({
id: `bulk-${k}@mymail.com`,
lead: {
fullName: "My Test Name",
personalEmail: `bulk-${k}@mymail.com`,
mobileNumber: "+55 11 1234567890",
},
}));
await resource.insertMany(objects);
Keep in mind that one request is sent for each object created. There is an option to change the number of simultaneous connections your client will handle.
const s3db = new S3db({
parallelism: 100, // default = 10
});
This method uses supercharge/promise-pool
to organize the parallel promises.
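A promise pool keeps a rolling window of at most parallelism requests in flight, refilling as each promise settles. A simplified (fixed-batch) model of that bound, for intuition only:

```javascript
// Simplified model of bounded concurrency: slice the work into groups of
// at most `parallelism` items. (A real promise pool such as
// supercharge/promise-pool refills the window as each promise settles,
// rather than waiting for a whole batch to finish.)
function chunk(items, parallelism) {
  const batches = [];
  for (let i = 0; i < items.length; i += parallelism) {
    batches.push(items.slice(i, i + parallelism));
  }
  return batches;
}
```

With 100 objects and the default parallelism of 10, roughly 10 waves of requests hit S3.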
Get many
await resource.getMany(["id1", "id2", "id3"]);
// [
// obj1,
// obj2,
// obj3,
// ]
Get all
const data = await resource.getAll();
// [
// obj1,
// obj2,
// ...
// ]
Delete many
await resource.deleteMany(["id1", "id2", "id3"]);
Delete all
await resource.deleteAll();
List ids
const ids = await resource.listIds();
// [
// 'id1',
// 'id2',
// 'id3',
// ]
Resource streams
As we need to request the metadata of each id to return its attributes, a better way to handle a huge amount of data may be to use streams.
Readable stream
const readableStream = await resource.readable();
readableStream.on("id", (id) => console.log("id =", id));
readableStream.on("data", (lead) => console.log("lead.id =", lead.id));
readableStream.on("end", () => console.log("end"));
Writable stream
const writableStream = await resource.writable();
writableStream.write({
lead: {
fullName: "My Test Name",
personalEmail: "[email protected]",
mobileNumber: "+55 11 1234567890",
},
});
S3 Client
s3db.js has a proxied S3 client named S3Client. It brings a few handy, less verbose functions to deal with AWS S3's api.
import { S3Client } from "s3db.js";
const client = new S3Client({ connectionString });
Each method has a :link: link to the official aws-sdk
docs.
getObject :link:
const { Body, Metadata } = await client.getObject({
key: `my-prefixed-file.csv`,
});
// AWS.Response
putObject :link:
const response = await client.putObject({
key: `my-prefixed-file.csv`,
contentType: "text/csv",
metadata: { a: "1", b: "2", c: "3" },
body: "a;b;c\n1;2;3\n4;5;6",
});
// AWS.Response
headObject :link:
const { Metadata } = await client.headObject({
key: `my-prefixed-file.csv`,
});
// AWS.Response
deleteObject :link:
const response = await client.deleteObject({
key: `my-prefixed-file.csv`,
});
// AWS.Response
deleteObjects :link:
const response = await client.deleteObjects({
keys: [`my-prefixed-file.csv`, `my-other-prefixed-file.csv`],
});
// AWS.Response
listObjects :link:
const response = await client.listObjects({
prefix: `my-subdir`,
});
// AWS.Response
count
Custom made method to make it easier to count keys within a listObjects loop.
const count = await client.count({
prefix: `my-subdir`,
});
// 10
getAllKeys
Custom made method to make it easier to return all keys in a subpath within a listObjects loop.
All returned keys will have their full path replaced with the current "scope" path.
const keys = await client.getAllKeys({
prefix: `my-subdir`,
});
// [
// key1,
// key2,
// ...
// ]
Events
The 3 main classes S3db, Resource and S3Client are extensions of Javascript's EventEmitter.
| S3Database | S3Client | S3Resource | S3Resource Readable Stream |
| ---------- | ------------- | ---------- | -------------------------- |
| error | error | error | error |
| connected | request | insert | id |
| | response | get | data |
| | response | update | |
| | getObject | delete | |
| | putObject | count | |
| | headObject | insertMany | |
| | deleteObject | deleteAll | |
| | deleteObjects | listIds | |
| | listObjects | getMany | |
| | count | getAll | |
| | getAllKeys | | |
S3Database
error
s3db.on("error", (error) => console.error(error));
connected
s3db.on("connected", () => {});
S3Client
Using this reference for the events:
const client = s3db.client;
error
client.on("error", (error) => console.error(error));
request
Emitted when a request is generated to AWS.
client.on("request", (action, params) => {});
response
Emitted when a response is received from AWS.
client.on("response", (action, params, response) => {});
getObject
client.on("getObject", (options, response) => {});
putObject
client.on("putObject", (options, response) => {});
headObject
client.on("headObject", (options, response) => {});
deleteObject
client.on("deleteObject", (options, response) => {});
deleteObjects
client.on("deleteObjects", (options, response) => {});
listObjects
client.on("listObjects", (options, response) => {});
count
client.on("count", (options, response) => {});
getAllKeys
client.on("getAllKeys", (options, response) => {});
S3Resource
Using this reference for the events:
const resource = s3db.resource("leads");
error
resource.on("error", (err) => console.error(err));
insert
resource.on("insert", (data) => {});
get
resource.on("get", (data) => {});
update
resource.on("update", (attrs, data) => {});
delete
resource.on("delete", (id) => {});
count
resource.on("count", (count) => {});
insertMany
resource.on("insertMany", (count) => {});
getMany
resource.on("getMany", (count) => {});
getAll
resource.on("getAll", (count) => {});
deleteAll
resource.on("deleteAll", (count) => {});
listIds
resource.on("listIds", (count) => {});
Plugins
Anatomy of a plugin:
const MyPlugin = {
setup(s3db: S3db) {},
start() {},
};
We have an example of a costs simulator plugin here!
Cost simulation
S3's pricing deep dive:
- Data volume [1 GB x 0.023 USD]: relates to the total volume of storage used, but in this implementation we just upload 0-byte files.
- GET requests [1,000 GET requests in a month x 0.0000004 USD per request = 0.0004 USD]: every read request.
- PUT requests [1,000 PUT requests for S3 Standard Storage x 0.000005 USD per request = 0.005 USD]: every write request.
- Data transfer [Internet: 1 GB x 0.09 USD per GB = 0.09 USD]: every byte sent out to the internet.
Check by yourself the pricing page details at https://aws.amazon.com/s3/pricing/ and https://calculator.aws/#/addService/S3.
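The per-request math above is easy to script. An illustrative estimator using the example rates quoted in this section (rates change; always verify against the AWS pricing page):

```javascript
// Illustrative cost estimator (USD) based on the example rates above.
const RATES = {
  putPerRequest: 0.000005, // S3 Standard PUT
  getPerRequest: 0.0000004, // S3 Standard GET
  transferPerGB: 0.09, // data transfer out to the internet
};

function estimateCost({ puts = 0, gets = 0, transferGB = 0 }) {
  return (
    puts * RATES.putPerRequest +
    gets * RATES.getPerRequest +
    transferGB * RATES.transferPerGB
  );
}
```

For instance, `estimateCost({ puts: 101000000 })` lands at roughly 505 USD, matching the big example below.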
Big example
Let's try to simulate a big project where you have a database with a few tables:
- pageviews: 100,000,000 lines of 100 bytes each
- leads: 1,000,000 lines of 200 bytes each
const Fakerator = require("fakerator");
const fake = Fakerator("pt-BR");
const pageview = {
  ip: fake.internet.ip(),
  domain: fake.internet.url(),
  path: fake.internet.url(),
  query: `?q=${fake.lorem.word()}`,
};
const lead = {
name: fake.names.name(),
mobile: fake.phone.number(),
email: fake.internet.email(),
country: "Brazil",
city: fake.address.city(),
state: fake.address.countryCode(),
address: fake.address.street(),
};
If you write the whole database of:
- pageviews:
- 100,000,000 PUT requests for S3 Standard Storage x 0.000005 USD per request = 500.00 USD (S3 Standard PUT requests cost)
- leads:
- 1,000,000 PUT requests for S3 Standard Storage x 0.000005 USD per request = 5.00 USD (S3 Standard PUT requests cost)
It will cost 505.00 USD, once.
If you want to read the whole database:
- pageviews:
- 100,000,000 GET requests in a month x 0.0000004 USD per request = 40.00 USD (S3 Standard GET requests cost)
- (100,000,000 × 100 bytes) ÷ (1024 × 1000 × 1000) ≅ 10 GB; Internet: 10 GB x 0.09 USD per GB = 0.90 USD
- leads:
- 1,000,000 GET requests in a month x 0.0000004 USD per request = 0.40 USD (S3 Standard GET requests cost)
- (1,000,000 × 200 bytes) ÷ (1024 × 1000 × 1000) ≅ 0.19 GB, rounded up to 1 GB; Internet: 1 GB x 0.09 USD per GB = 0.09 USD
It will cost 41.39 USD, once.
Small example
Let's save some JWT tokens following RFC 7519.
const tokens = await s3db.createResource({
  name: "tokens",
  attributes: {
    iss: "url|max:256",
    sub: "string",
    aud: "string",
    exp: "number",
    email: "email",
    name: "string",
    scope: "string",
    email_verified: "boolean",
  },
});

async function generateToken() {
  const token = createTokenLib(...);

  await tokens.insert({
    id: token.jti || md5(token),
    ...token,
  });

  return token;
}

async function validateToken(token) {
  const id = token.jti || md5(token);

  if (!validateTokenSignature(token, ...)) {
    await tokens.delete(id);
    throw new Error("invalid-token");
  }

  return tokens.get(id);
}
Roadmap
Tasks board can be found at this link!
Feel free to interact and PRs are welcome! :)