s3db.js
v3.2.6
Use AWS S3, the world's most reliable document storage, as a database with this ORM.
Another way to create a cheap document-based database with an easy ORM to handle your datasets!
- Motivation
- Usage
- Install
- Quick Setup
- Insights
- Database
- Create a resource
- Resource methods
- Insert one
- Get one
- Update one
- Delete one
- Count
- Insert many
- Get many
- Get all
- Delete many
- Delete all
- List ids
- Resource streams
- Readable stream
- Writable stream
- S3 Client
- Events
- Plugins
- Cost Simulation
- Big Example
- Small example
- Roadmap
Motivation
First of all:
- Nothing is for free, but it can be cheaper.
- I'm not responsible for your AWS costs strategy; use s3db.js at your own risk.
- Please, do not use it in production!
Let's go!
You might know AWS's S3 product for its high availability and its cheap pricing rules. I'll show you another clever and funny way to use S3.
AWS allows you to define Metadata for every single file you upload into your bucket. This attribute is limited to 2kb and must use UTF-8 encoding. As this encoding varies the byte width per symbol, you may fit roughly 500 to 2,000 chars of metadata storage. Follow the docs at AWS S3 User Guide: Using metadata.
There is another management subset of data called tags that is used globally as [key, value] params. You can assign 10 tags per object, where each key may be at most 128 unicode chars long and each value up to 256 chars. With those key-values we can use 2.5kb more of data; unicode will allow you to use up to 2,500 more chars. Follow the official docs at AWS User Guide: Object Tagging.
With all this set, each object gives you up to 4.5kb of free storage space.
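Because attribute data ultimately lives in S3 object metadata, it can help to sanity-check an object's serialized size against the 2kb metadata budget before writing. A minimal sketch; the helper and limit constant are illustrative, not part of the s3db.js API:

```javascript
// Illustrative helper (not part of s3db.js): estimate whether a serialized
// object fits into S3's 2 KB user-defined metadata budget.
const METADATA_LIMIT_BYTES = 2 * 1024;

function fitsInMetadata(obj) {
  const serialized = JSON.stringify(obj);
  // Buffer.byteLength counts actual UTF-8 bytes, not characters
  const bytes = Buffer.byteLength(serialized, "utf8");
  return { bytes, fits: bytes <= METADATA_LIMIT_BYTES };
}
```

For example, a small object like `{ name: "Ana" }` uses only a handful of bytes, while a 5,000-char string will not fit.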
Check the cost simulation section below for a deep cost dive!
Let's give it a try! :)
Usage
You may check the snippets below or go straight to the Examples section!
Install
npm i s3db.js
# or
yarn add s3db.js
Quick setup
The S3db client uses connection-string params.
import { S3db } from "s3db.js";
const {
AWS_BUCKET,
AWS_ACCESS_KEY_ID,
AWS_SECRET_ACCESS_KEY,
} = process.env
const s3db = new S3db({
uri: `s3://${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}@${AWS_BUCKET}/databases/mydatabase`
});
s3db
.connect()
.then(() => console.log("connected!"));
If you use the dotenv package:
import * as dotenv from "dotenv";
dotenv.config();
import { S3db } from "s3db.js";
Insights
This ORM implementation simulates a document repository. Because s3db.js uses aws-sdk's S3 api, all requests are GET/PUT on key=value resources, so the best-case scenario is to access it like a document store.
For better use of the cache and listing, the best ID format is sequential ids with leading zeros (e.g. 00001, 00002, 00003) due to S3's internal key-sorting method. But you will need to manage this incremental ID on your own.
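The zero-padded sequential ids suggested above can be generated with a tiny helper; the counter persistence (shown here as a plain in-memory variable) is up to you:

```javascript
// Zero-pad a sequential counter so that S3's lexicographic key ordering
// matches numeric ordering (00001, 00002, ..., 99999).
function padId(n, width = 5) {
  return String(n).padStart(width, "0");
}

// s3db.js does not manage this counter for you; persist it yourself.
let counter = 0;
const nextId = () => padId(++counter);
```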
Database
Your s3db.js client can be initialized with these options:
| option | optional | description | type | default |
| :---------: | :------: | :-------------------------------------------------: | :-------: | :---------: |
| cache | true | Persist searched data to reduce repeated requests | boolean | undefined |
| parallelism | true | Number of simultaneous tasks | number | 10 |
| passphrase | true | Your encryption secret | string | undefined |
| ttl | true | (Coming soon) TTL to your cache duration in seconds | number | 86400 |
| uri | false | A url as your S3 connection string | string | undefined |
Config example:
const {
AWS_BUCKET = "my-bucket",
AWS_ACCESS_KEY_ID = "secret",
AWS_SECRET_ACCESS_KEY = "secret",
AWS_BUCKET_PREFIX = "databases/test-" + Date.now(),
} = process.env;
const uri = `s3://${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}@${AWS_BUCKET}/${AWS_BUCKET_PREFIX}`;
const options = {
uri,
parallelism: 25,
passphrase: fs.readFileSync("./cert.pem"),
};
s3db.connect()
This method must always be invoked before any operation takes place. It will interact with AWS' S3 api and check the items below:
- With the current credentials:
  - Check if the client has access to the S3 bucket.
  - Check if the client has access to bucket life-cycle policies.
- With the defined database:
  - Check if there is already a database at this connection string.
  - If a database is found, download its metadata and load each Resource definition.
  - Else, generate an empty metadata file into this prefix and mark this as a new database from scratch.
Metadata file
s3db.js will generate a /s3db.json file at the pre-defined prefix with this structure:
{
  // file version
  "version": "1",

  // previously defined resources
  "resources": {
    // definition example
    "leads": {
      "name": "leads",

      // resource options
      "options": {},

      // resource defined schema
      "schema": {
        "name": "string",
        "token": "secret"
      },

      // rules to simplify metadata usage
      "mapper": {
        "name": "0",
        "token": "1"
      }
    }
  }
}
Create a resource
Resources are definitions of data collections.
// resource
const attributes = {
utm: {
source: "string|optional",
medium: "string|optional",
campaign: "string|optional",
term: "string|optional",
},
lead: {
fullName: "string",
mobileNumber: "string",
personalEmail: "email",
},
};
const resource = await s3db.createResource({
name: "leads",
attributes,
});
Resources' names cannot prefix each other, like leads and leads-copy! S3's api lists keys using prefix notation, so every time you list leads, all keys of leads-copy will appear as well.
Attributes
s3db.js uses the fastest-validator package to define and validate your resource. A few examples:
const attributes = {
// few simple examples
name: "string|min:4|max:64|trim",
email: "email|nullable",
mobile: "string|optional",
count: "number|integer|positive",
currency: "currency|symbol:R$",
createdAt: "date",
website: "url",
id: "uuid",
ids: "array|items:uuid|unique",
// s3db defines a custom type "secret" that is encrypted
token: "secret",
// nested data works as well
geo: {
lat: "number",
long: "number",
city: "string",
},
// may have multiple definitions.
address_number: ["string", "number"],
};
Reference:
You may just use the reference:
const Leads = s3db.resource("leads");
Limitations:
As we need to store the resource definition within a JSON file, the best way to keep your definitions intact is to use the string-based shorthand definitions in your resource definition.
By design, the resource definition will strip all functions in attributes to avoid eval() calls.
The fastest-validator
starts with the params below:
// fastest-validator params
{
useNewCustomCheckerFunction: true,
defaults: {
object: {
strict: "remove",
},
},
}
Resource methods
Consider resource
as:
const resource = s3db.resource("leads");
Insert one
// data
const insertedData = await resource.insert({
id: "[email protected]", // if not defined, an id will be generated!
utm: {
source: "abc",
},
lead: {
fullName: "My Complex Name",
personalEmail: "[email protected]",
mobileNumber: "+5511234567890",
},
invalidAttr: "this attribute will disappear",
});
// {
// id: "[email protected]",
// utm: {
// source: "abc",
// },
// lead: {
// fullName: "My Complex Name",
// personalEmail: "[email protected]",
// mobileNumber: "+5511234567890",
// },
// }
If an id attribute is not defined, s3db.js will use nanoid to generate a random unique id!
Get one
const obj = await resource.get("[email protected]");
// {
// id: "[email protected]",
// utm: {
// source: "abc",
// },
// lead: {
// fullName: "My Complex Name",
// personalEmail: "[email protected]",
// mobileNumber: "+5511234567890",
// },
// }
Update one
const obj = await resource.update("[email protected]", {
lead: {
fullName: "My New Name",
mobileNumber: "+5511999999999",
},
});
// {
// id: "[email protected]",
// utm: {
// source: "abc",
// },
// lead: {
// fullName: "My New Name",
// personalEmail: "[email protected]",
// mobileNumber: "+5511999999999",
// },
// }
Delete one
await resource.delete(id);
Count
await resource.count();
// 101
Insert many
You may bulk insert data with a friendly method that receives a list of objects.
const objects = new Array(100).fill(0).map((v, k) => ({
id: `bulk-${k}@mymail.com`,
lead: {
fullName: "My Test Name",
personalEmail: `bulk-${k}@mymail.com`,
mobileNumber: "+55 11 1234567890",
},
}));
await resource.insertMany(objects);
Keep in mind that one request is sent for each object created. There is an option to change the number of simultaneous connections your client will handle.
const s3db = new S3db({
parallelism: 100, // default = 10
});
This method uses supercharge/promise-pool
to organize the parallel promises.
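A promise pool keeps a rolling window of at most parallelism requests in flight, refilling as each promise settles. A simplified (fixed-batch) model of that bound, for intuition only:

```javascript
// Simplified model of bounded concurrency: slice the work into groups of
// at most `parallelism` items. (A real promise pool such as
// supercharge/promise-pool refills the window as each promise settles,
// rather than waiting for a whole batch to finish.)
function chunk(items, parallelism) {
  const batches = [];
  for (let i = 0; i < items.length; i += parallelism) {
    batches.push(items.slice(i, i + parallelism));
  }
  return batches;
}
```

With 100 objects and the default parallelism of 10, roughly 10 waves of requests hit S3.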
Get many
await resource.getMany(["id1", "id2", "id3"]);
// [
// obj1,
// obj2,
// obj3,
// ]
Get all
const data = await resource.getAll();
// [
// obj1,
// obj2,
// ...
// ]
Delete many
await resource.deleteMany(["id1", "id2", "id3"]);
Delete all
await resource.deleteAll();
List ids
const ids = await resource.listIds();
// [
// 'id1',
// 'id2',
// 'id3',
// ]
Resource streams
As we need to request the metadata of each id to return its attributes, a better way to handle a huge amount of data may be to use streams.
Readable stream
const readableStream = await resource.readable();
readableStream.on("id", (id) => console.log("id =", id));
readableStream.on("data", (lead) => console.log("lead.id =", lead.id));
readableStream.on("end", () => console.log("end"));
Writable stream
const writableStream = await resource.writable();
writableStream.write({
lead: {
fullName: "My Test Name",
personalEmail: "[email protected]",
mobileNumber: "+55 11 1234567890",
},
});
S3 Client
s3db.js has a proxied S3 client named S3Client. It brings a few handy, less verbose functions to deal with AWS S3's api.
import { S3Client } from "s3db.js";
const client = new S3Client({ connectionString });
Each method has a :link: link to the official aws-sdk
docs.
getObject :link:
const { Body, Metadata } = await client.getObject({
key: `my-prefixed-file.csv`,
});
// AWS.Response
putObject :link:
const response = await client.putObject({
key: `my-prefixed-file.csv`,
contentType: "text/csv",
metadata: { a: "1", b: "2", c: "3" },
body: "a;b;c\n1;2;3\n4;5;6",
});
// AWS.Response
headObject :link:
const { Metadata } = await client.headObject({
key: `my-prefixed-file.csv`,
});
// AWS.Response
deleteObject :link:
const response = await client.deleteObject({
key: `my-prefixed-file.csv`,
});
// AWS.Response
deleteObjects :link:
const response = await client.deleteObjects({
keys: [`my-prefixed-file.csv`, `my-other-prefixed-file.csv`],
});
// AWS.Response
listObjects :link:
const response = await client.listObjects({
prefix: `my-subdir`,
});
// AWS.Response
count
Custom made method to make it easier to count keys within a listObjects loop.
const count = await client.count({
prefix: `my-subdir`,
});
// 10
getAllKeys
Custom made method to make it easier to return all keys in a subpath within a listObjects loop.
All returned keys will have their full path replaced with the current "scope" path.
const keys = await client.getAllKeys({
prefix: `my-subdir`,
});
// [
// key1,
// key2,
// ...
// ]
Events
The 3 main classes S3db, Resource and S3Client are extensions of Javascript's EventEmitter.
| S3Database | S3Client | S3Resource | S3Resource Readable Stream |
| ---------- | ------------- | ---------- | -------------------------- |
| error | error | error | error |
| connected | request | insert | id |
| | response | get | data |
| | response | update | |
| | getObject | delete | |
| | putObject | count | |
| | headObject | insertMany | |
| | deleteObject | deleteAll | |
| | deleteObjects | listIds | |
| | listObjects | getMany | |
| | count | getAll | |
| | getAllKeys | | |
S3Database
error
s3db.on("error", (error) => console.error(error));
connected
s3db.on("connected", () => {});
S3Client
Using this reference for the events:
const client = s3db.client;
error
client.on("error", (error) => console.error(error));
request
Emitted when a request is generated to AWS.
client.on("request", (action, params) => {});
response
Emitted when a response is received from AWS.
client.on("response", (action, params, response) => {});
getObject
client.on("getObject", (options, response) => {});
putObject
client.on("putObject", (options, response) => {});
headObject
client.on("headObject", (options, response) => {});
deleteObject
client.on("deleteObject", (options, response) => {});
deleteObjects
client.on("deleteObjects", (options, response) => {});
listObjects
client.on("listObjects", (options, response) => {});
count
client.on("count", (options, response) => {});
getAllKeys
client.on("getAllKeys", (options, response) => {});
S3Resource
Using this reference for the events:
const resource = s3db.resource("leads");
error
resource.on("error", (err) => console.error(err));
insert
resource.on("insert", (data) => {});
get
resource.on("get", (data) => {});
update
resource.on("update", (attrs, data) => {});
delete
resource.on("delete", (id) => {});
count
resource.on("count", (count) => {});
insertMany
resource.on("insertMany", (count) => {});
getMany
resource.on("getMany", (count) => {});
getAll
resource.on("getAll", (count) => {});
deleteAll
resource.on("deleteAll", (count) => {});
listIds
resource.on("listIds", (count) => {});
Plugins
Anatomy of a plugin:
const MyPlugin = {
setup(s3db: S3db) {},
start() {},
};
We have an example of a costs simulator plugin here!
Cost simulation
S3's pricing deep dive:
- Data volume [1 GB x 0.023 USD]: relates to the total volume of storage used, but in this implementation we just upload 0-byte files.
- GET requests [1,000 GET requests in a month x 0.0000004 USD per request = 0.0004 USD]: every read request.
- PUT requests [1,000 PUT requests for S3 Standard Storage x 0.000005 USD per request = 0.005 USD]: every write request.
- Data transfer [Internet: 1 GB x 0.09 USD per GB = 0.09 USD]: every byte sent out to the internet.
Check by yourself the pricing page details at https://aws.amazon.com/s3/pricing/ and https://calculator.aws/#/addService/S3.
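The per-request math above is easy to script. An illustrative estimator using the example rates quoted in this section (rates change; always verify against the AWS pricing page):

```javascript
// Illustrative cost estimator (USD) based on the example rates above.
const RATES = {
  putPerRequest: 0.000005, // S3 Standard PUT
  getPerRequest: 0.0000004, // S3 Standard GET
  transferPerGB: 0.09, // data transfer out to the internet
};

function estimateCost({ puts = 0, gets = 0, transferGB = 0 }) {
  return (
    puts * RATES.putPerRequest +
    gets * RATES.getPerRequest +
    transferGB * RATES.transferPerGB
  );
}
```

For instance, `estimateCost({ puts: 101000000 })` lands at roughly 505 USD, matching the big example below.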
Big example
Let's try to simulate a big project where you have a database with a few tables:
- pageviews: 100,000,000 lines of 100 bytes each
- leads: 1,000,000 lines of 200 bytes each
const Fakerator = require("fakerator");
const fake = Fakerator("pt-BR");
const pageview = {
  ip: fake.internet.ip(),
  domain: fake.internet.url(),
  path: fake.internet.url(),
  query: `?q=${fake.lorem.word()}`,
};
const lead = {
name: fake.names.name(),
mobile: fake.phone.number(),
email: fake.internet.email(),
country: "Brazil",
city: fake.address.city(),
state: fake.address.countryCode(),
address: fake.address.street(),
};
If you write the whole database of:
- pageviews:
- 100,000,000 PUT requests for S3 Standard Storage x 0.000005 USD per request = 500.00 USD (S3 Standard PUT requests cost)
- leads:
- 1,000,000 PUT requests for S3 Standard Storage x 0.000005 USD per request = 5.00 USD (S3 Standard PUT requests cost)
It will cost 505.00 USD, once.
If you want to read the whole database:
- pageviews:
- 100,000,000 GET requests in a month x 0.0000004 USD per request = 40.00 USD (S3 Standard GET requests cost)
- (100,000,000 × 100 bytes) ÷ (1024 × 1000 × 1000) ≅ 10 GB; Internet: 10 GB x 0.09 USD per GB = 0.90 USD
- leads:
- 1,000,000 GET requests in a month x 0.0000004 USD per request = 0.40 USD (S3 Standard GET requests cost)
- (1,000,000 × 200 bytes) ÷ (1024 × 1000 × 1000) ≅ 0.19 GB, rounded up to 1 GB; Internet: 1 GB x 0.09 USD per GB = 0.09 USD
It will cost 41.39 USD, once.
Small example
Let's save some JWT tokens following RFC 7519.
const tokens = await s3db.createResource({
  name: "tokens",
  attributes: {
    iss: "url|max:256",
    sub: "string",
    aud: "string",
    exp: "number",
    email: "email",
    name: "string",
    scope: "string",
    email_verified: "boolean",
  },
});

async function generateToken() {
  const token = createTokenLib(...);

  await tokens.insert({
    id: token.jti || md5(token),
    ...token,
  });

  return token;
}

async function validateToken(token) {
  const id = token.jti || md5(token);

  if (!validateTokenSignature(token, ...)) {
    await tokens.delete(id);
    throw new Error("invalid-token");
  }

  return tokens.get(id);
}
Roadmap
Tasks board can be found at this link!
Feel free to interact and PRs are welcome! :)