
s3db.js

v3.2.6

Published

Use AWS S3, the world's most reliable document storage, as a database with this ORM.

Downloads

71

Readme

s3db.js


Another way to create a cheap document-based database, with an easy ORM to handle your dataset!

  1. Motivation
  2. Usage
    1. Install
    2. Quick Setup
    3. Insights
    4. Database
    5. Create a resource
  3. Resource methods
    1. Insert one
    2. Get one
    3. Update one
    4. Delete one
    5. Count
    6. Insert many
    7. Get many
    8. Get all
    9. Delete many
    10. Delete all
    11. List ids
  4. Resource streams
    1. Readable stream
    2. Writable stream
  5. S3 Client
  6. Events
  7. Plugins
  8. Cost Simulation
    1. Big Example
    2. Small example
  9. Roadmap

Motivation

First of all:

  1. Nothing is for free, but it can be cheaper.
  2. I'm not responsible for your AWS costs strategy; use s3db.js at your own risk.
  3. Please, do not use in production!

Let's go!

You might know AWS's S3 product for its high availability and its cheap pricing rules. I'll show you another clever and fun way to use S3.

AWS allows you to attach metadata to every file you upload to your bucket. This metadata must fit within a 2 KB limit and uses UTF-8 encoding. Since UTF-8 characters vary in byte width, that gives you roughly 500 to 2,000 characters of metadata storage. Follow the docs at AWS S3 User Guide: Using metadata.

There is another subset of management data called tags, used globally as [key, value] pairs. You can assign up to 10 tags per object: each key may be at most 128 Unicode characters long and each value up to 256. Those key-value pairs give us roughly another 2.5 KB of data, or up to 2,500 more single-byte characters. Follow the official docs at AWS User Guide: Object Tagging.

With all of this set up, each object gives you up to about 4.5 KB of free storage.
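To see why the 2 KB metadata limit translates to a 500-2,000 character range, you can measure UTF-8 byte widths directly in Node (a quick sketch, not part of s3db.js):

```javascript
// UTF-8 encodes each code point in 1 to 4 bytes, so the character
// budget inside a fixed 2 KB limit depends on which characters you use.
const utf8Bytes = (s) => Buffer.byteLength(s, "utf8");

const ascii = "a".repeat(2000); // 1 byte per char -> exactly 2 KB
const emoji = "🙂".repeat(500); // 4 bytes per char -> also 2 KB

console.log(utf8Bytes(ascii)); // 2000
console.log(utf8Bytes(emoji)); // 2000
```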

Check the cost simulation section below for a deep cost dive!

Let's give it a try! :)


Usage

You may check the snippets below or go straight to the Examples section!

Install

npm i s3db.js

# or

yarn add s3db.js

Quick setup

Our S3db client uses connection-string params.

import { S3db } from "s3db.js";

const {
  AWS_BUCKET,
  AWS_ACCESS_KEY_ID,
  AWS_SECRET_ACCESS_KEY,
} = process.env

const s3db = new S3db({
  uri: `s3://${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}@${AWS_BUCKET}/databases/mydatabase`
});

s3db
  .connect()
  .then(() => console.log("connected!"));

If you use the dotenv package, load it before importing s3db.js:

import * as dotenv from "dotenv";
dotenv.config();

import { S3db } from "s3db.js";
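Under the hood the connection string is a plain URL, so its pieces can be illustrated with the standard WHATWG URL class. This is only a sketch of the format (the credentials and bucket name are made up, and s3db.js's real parser may differ):

```javascript
// Decompose an s3db.js-style connection string:
// s3://ACCESS_KEY:SECRET_KEY@BUCKET/PREFIX
const uri = "s3://AKIAEXAMPLE:supersecret@my-bucket/databases/mydatabase";
const parsed = new URL(uri);

const accessKeyId = decodeURIComponent(parsed.username);     // access key
const secretAccessKey = decodeURIComponent(parsed.password); // secret key
const bucket = parsed.hostname;                              // bucket name
const prefix = parsed.pathname.slice(1);                     // database prefix
```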

Insights

  • This ORM implementation simulates a document repository. Since s3db.js uses aws-sdk's S3 API, all requests are GET/PUT on key=value resources, so the best-case scenario is document-style access: one object per key.

  • For better cache use and listing, the best id format is sequential ids with leading zeros (e.g. 00001, 00002, 00003), because S3 sorts keys lexicographically. You will need to manage this incremental id on your own.
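A zero-padded sequential id generator is a few lines of JavaScript. This is an illustrative sketch of the idea (the counter here lives in memory; a real deployment would need to persist it somewhere):

```javascript
// Generate sequential ids with leading zeros so that S3's
// lexicographic key ordering matches numeric order.
function makeIdGenerator(width = 5) {
  let counter = 0;
  return () => String(++counter).padStart(width, "0");
}

const nextId = makeIdGenerator();
nextId(); // "00001"
nextId(); // "00002"
```

Without the padding, "10" would sort before "2"; with it, "00002" correctly sorts before "00010".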

Database

Your s3db.js client can be initiated with options:

| option      | optional | description                                         | type    | default   |
| :---------: | :------: | :-------------------------------------------------: | :-----: | :-------: |
| cache       | true     | Persist searched data to reduce repeated requests   | boolean | undefined |
| parallelism | true     | Number of simultaneous tasks                        | number  | 10        |
| passphrase  | true     | Your encryption secret                              | string  | undefined |
| ttl         | true     | (Coming soon) TTL to your cache duration in seconds | number  | 86400     |
| uri         | false    | A url as your S3 connection string                  | string  | undefined |

Config example:

import fs from "fs";

const {
  AWS_BUCKET = "my-bucket",
  AWS_ACCESS_KEY_ID = "secret",
  AWS_SECRET_ACCESS_KEY = "secret",
  AWS_BUCKET_PREFIX = "databases/test-" + Date.now(),
} = process.env;

const uri = `s3://${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}@${AWS_BUCKET}/${AWS_BUCKET_PREFIX}`;

const options = {
  uri,
  parallelism: 25,
  passphrase: fs.readFileSync("./cert.pem"),
};

const s3db = new S3db(options);

s3db.connect()

This method must always be invoked before any other operation takes place. It interacts with AWS's S3 API and checks the items below:

  1. With the current credentials:
    • Check if the client has access to the S3 bucket.
    • Check if the client has access to the bucket's life-cycle policies.
  2. With the defined database:
    • Check if there is already a database at this connection string.
      • If a database is found, download its metadata and load each Resource definition.
      • Else, generate an empty metadata file at this prefix and mark it as a new database created from scratch.

Metadata file

s3db.js will generate a file /s3db.json at the pre-defined prefix with this structure:

{
  // file version
  "version": "1",

  // previously defined resources
  "resources": {
    // definition example
    "leads": {
      "name": "leads",

      // resource options
      "options": {},

      // resource defined schema
      "schema": {
        "name": "string",
        "token": "secret"
      },

      // rules to simplify metadata usage
      "mapper": {
        "name": "0",
        "token": "1"
      }
    }
  }
}
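The `mapper` trades attribute names for short keys before writing to S3 metadata, squeezing more data into the 2 KB limit. The round-trip below is an illustrative approximation of the idea, not s3db.js's actual code:

```javascript
// Map attribute names to short keys and back, as the metadata
// "mapper" rules describe.
const mapper = { name: "0", token: "1" };

function toMetadata(obj, mapper) {
  return Object.fromEntries(
    Object.entries(obj).map(([key, value]) => [mapper[key], value])
  );
}

function fromMetadata(meta, mapper) {
  // Invert the mapper to restore the original attribute names.
  const reverse = Object.fromEntries(
    Object.entries(mapper).map(([key, value]) => [value, key])
  );
  return Object.fromEntries(
    Object.entries(meta).map(([key, value]) => [reverse[key], value])
  );
}

toMetadata({ name: "Ana", token: "abc" }, mapper);
// -> { "0": "Ana", "1": "abc" }
```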

Create a resource

Resources are definitions of data collections.

// resource
const attributes = {
  utm: {
    source: "string|optional",
    medium: "string|optional",
    campaign: "string|optional",
    term: "string|optional",
  },
  lead: {
    fullName: "string",
    mobileNumber: "string",
    personalEmail: "email",
  },
};

const resource = await s3db.createResource({
  name: "leads",
  attributes,
});

Resource names must not be prefixes of one another, e.g. leads and leads-copy! S3's API lists keys by prefix, so every time you list leads, all keys of leads-copy will appear as well.
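The collision is easy to see with a plain prefix filter, which is effectively what S3's ListObjects does:

```javascript
// Keys belonging to "leads-copy" also match the prefix "leads",
// so listing "leads" would pick them up too.
const keys = ["leads/id1", "leads/id2", "leads-copy/id1"];

const listed = keys.filter((key) => key.startsWith("leads"));
// -> all three keys, including the "leads-copy" one
```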

Attributes

s3db.js uses the fastest-validator package to define and validate your resources. A few examples:

const attributes = {
  // few simple examples
  name: "string|min:4|max:64|trim",
  email: "email|nullable",
  mobile: "string|optional",
  count: "number|integer|positive",
  currency: "currency|currencySymbol:R$",
  createdAt: "date",
  website: "url",
  id: "uuid",
  ids: "array|items:uuid|unique",

  // s3db defines a custom type "secret" that is encrypted
  token: "secret",

  // nested data works as well
  geo: {
    lat: "number",
    long: "number",
    city: "string",
  },

  // may have multiple definitions.
  address_number: ["string", "number"],
};
Reference:

You may just use the reference:

const Leads = s3db.resource("leads");
Limitations:

As the resource definition must be stored within a JSON file, the best way to keep your definitions intact is to use the string-based shorthand notation.

By design, the resource definition will strip all functions from attributes to avoid eval() calls.

The fastest-validator starts with the params below:

// fastest-validator params
{
  useNewCustomCheckerFunction: true,
  defaults: {
    object: {
      strict: "remove",
    },
  },
}
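In practice, `strict: "remove"` means attributes not declared in the schema are dropped before the object is stored. The hand-rolled approximation below only illustrates that behavior; fastest-validator does the real work:

```javascript
// Keep only the attributes that appear in the schema,
// mimicking fastest-validator's strict: "remove" default.
function stripUnknown(obj, schema) {
  return Object.fromEntries(
    Object.entries(obj).filter(([key]) => key in schema)
  );
}

const schema = { name: "string", token: "secret" };
stripUnknown({ name: "Ana", token: "t1", invalidAttr: "x" }, schema);
// -> { name: "Ana", token: "t1" }
```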

Resource methods

Consider resource as:

const resource = s3db.resource("leads");

Insert one

// data
const insertedData = await resource.insert({
  id: "[email protected]", // if not defined, an id will be generated!
  utm: {
    source: "abc",
  },
  lead: {
    fullName: "My Complex Name",
    personalEmail: "[email protected]",
    mobileNumber: "+5511234567890",
  },
  invalidAttr: "this attribute will disappear",
});

// {
//   id: "[email protected]",
//   utm: {
//     source: "abc",
//   },
//   lead: {
//     fullName: "My Complex Name",
//     personalEmail: "[email protected]",
//     mobileNumber: "+5511234567890",
//   },
//   invalidAttr: "this attribute will disappear",
// }

If no id attribute is defined, s3db.js will use nanoid to generate a random unique id!

Get one

const obj = await resource.get("[email protected]");

// {
//   id: "[email protected]",
//   utm: {
//     source: "abc",
//   },
//   lead: {
//     fullName: "My Complex Name",
//     personalEmail: "[email protected]",
//     mobileNumber: "+5511234567890",
//   },
// }

Update one

const obj = await resource.update("[email protected]", {
  lead: {
    fullName: "My New Name",
    mobileNumber: "+5511999999999",
  },
});

// {
//   id: "[email protected]",
//   utm: {
//     source: "abc",
//   },
//   lead: {
//     fullName: "My New Name",
//     personalEmail: "[email protected]",
//     mobileNumber: "+5511999999999",
//   },
// }

Delete one

await resource.delete(id);

Count

await resource.count();

// 101

Insert many

You may bulk insert data with a friendly method that receives a list of objects.

const objects = new Array(100).fill(0).map((v, k) => ({
  id: `bulk-${k}@mymail.com`,
  lead: {
    fullName: "My Test Name",
    personalEmail: `bulk-${k}@mymail.com`,
    mobileNumber: "+55 11 1234567890",
  },
}));

await resource.insertMany(objects);

Keep in mind that one request is sent per object created. There is an option to change the number of simultaneous connections your client will handle.

const s3db = new S3db({
  parallelism: 100, // default = 10
});

This method uses supercharge/promise-pool to organize the parallel promises.
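The idea behind the pool can be sketched in a few lines: run at most `parallelism` tasks at a time, starting a new one as each finishes. This is an illustrative reimplementation, not supercharge/promise-pool's actual code:

```javascript
// Run `worker` over `items`, keeping at most `parallelism`
// promises in flight at once. Results preserve input order.
async function promisePool(items, parallelism, worker) {
  const results = new Array(items.length);
  let next = 0;

  async function runner() {
    // Each runner pulls the next unclaimed index until none remain.
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }

  await Promise.all(
    Array.from({ length: Math.min(parallelism, items.length) }, runner)
  );
  return results;
}
```

With `resource` from the examples above, usage would look like `await promisePool(objects, 10, (o) => resource.insert(o))`.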

Get many

await resource.getMany(["id1", "id2", "id3"]);

// [
//   obj1,
//   obj2,
//   obj3,
// ]

Get all

const data = await resource.getAll();

// [
//   obj1,
//   obj2,
//   ...
// ]

Delete many

await resource.deleteMany(["id1", "id2", "id3"]);

Delete all

await resource.deleteAll();

List ids

const ids = await resource.listIds();

// [
//   'id1',
//   'id2',
//   'id3',
// ]

Resource streams

As we need to request the metadata of each id to return its attributes, a better way to handle a huge amount of data is to use streams.

Readable stream

const readableStream = await resource.readable();

readableStream.on("id", (id) => console.log("id =", id));
readableStream.on("data", (lead) => console.log("lead.id =", lead.id));
readableStream.on("end", () => console.log("end"));

Writable stream

const writableStream = await resource.writable();

writableStream.write({
  lead: {
    fullName: "My Test Name",
    personalEmail: "bulk-1@mymail.com",
    mobileNumber: "+55 11 1234567890",
  },
});

S3 Client

s3db.js has a proxied S3 client named S3Client. It brings a few handy, less verbose functions to deal with AWS S3's API.

import { S3Client } from "s3db.js";

const client = new S3Client({ connectionString });

Each method has a :link: link to the official aws-sdk docs.

getObject :link:
const { Body, Metadata } = await client.getObject({
  key: `my-prefixed-file.csv`,
});

// AWS.Response
putObject :link:
const response = await client.putObject({
  key: `my-prefixed-file.csv`,
  contentType: "text/csv",
  metadata: { a: "1", b: "2", c: "3" },
  body: "a;b;c\n1;2;3\n4;5;6",
});

// AWS.Response
headObject :link:
const { Metadata } = await client.headObject({
  key: `my-prefixed-file.csv`,
});

// AWS.Response
deleteObject :link:
const response = await client.deleteObject({
  key: `my-prefixed-file.csv`,
});

// AWS.Response
deleteObjects :link:
const response = await client.deleteObjects({
  keys: [`my-prefixed-file.csv`, `my-other-prefixed-file.csv`],
});

// AWS.Response
listObjects :link:
const response = await client.listObjects({
  prefix: `my-subdir`,
});

// AWS.Response
count

Custom made method to make it easier to count keys within a listObjects loop.

const count = await client.count({
  prefix: `my-subdir`,
});

// 10
getAllKeys

Custom made method to make it easier to return all keys in a subpath within a listObjects loop.

All returned keys have their full path replaced by a path relative to the current "scope" prefix.

const keys = await client.getAllKeys({
  prefix: `my-subdir`,
});

// [
//   key1,
//   key2,
//   ...
// ]

Events

The 3 main classes S3db, Resource, and S3Client are extensions of Node.js's EventEmitter.

| S3Database | S3Client      | S3Resource | S3Resource Readable Stream |
| ---------- | ------------- | ---------- | -------------------------- |
| error      | error         | error      | error                      |
| connected  | request       | insert     | id                         |
|            | response      | get        | data                       |
|            | getObject     | update     |                            |
|            | putObject     | delete     |                            |
|            | headObject    | count      |                            |
|            | deleteObject  | insertMany |                            |
|            | deleteObjects | deleteAll  |                            |
|            | listObjects   | listIds    |                            |
|            | count         | getMany    |                            |
|            | getAllKeys    | getAll     |                            |

S3Database

error

s3db.on("error", (error) => console.error(error));

connected

s3db.on("connected", () => {});

S3Client

Using this reference for the events:

const client = s3db.client;

error

client.on("error", (error) => console.error(error));

request

Emitted when a request is generated to AWS.

client.on("request", (action, params) => {});

response

Emitted when a response is received from AWS.

client.on("response", (action, params, response) => {});

getObject

client.on("getObject", (options, response) => {});

putObject

client.on("putObject", (options, response) => {});

headObject

client.on("headObject", (options, response) => {});

deleteObject

client.on("deleteObject", (options, response) => {});

deleteObjects

client.on("deleteObjects", (options, response) => {});

listObjects

client.on("listObjects", (options, response) => {});

count

client.on("count", (options, response) => {});

getAllKeys

client.on("getAllKeys", (options, response) => {});

S3Resource

Using this reference for the events:

const resource = s3db.resource("leads");

error

resource.on("error", (err) => console.error(err));

insert

resource.on("insert", (data) => {});

get

resource.on("get", (data) => {});

update

resource.on("update", (attrs, data) => {});

delete

resource.on("delete", (id) => {});

count

resource.on("count", (count) => {});

insertMany

resource.on("insertMany", (count) => {});

getMany

resource.on("getMany", (count) => {});

getAll

resource.on("getAll", (count) => {});

deleteAll

resource.on("deleteAll", (count) => {});

listIds

resource.on("listIds", (count) => {});

Plugins

Anatomy of a plugin:

const MyPlugin = {
  setup(s3db: S3db) {},
  start() {},
};

We have an example of a costs simulator plugin here!


Cost simulation

S3's pricing deep dive:

  • Data volume [1 GB x 0.023 USD]: relates to the total storage volume used, but in this implementation we only upload 0-byte files, so it is negligible.
  • GET requests [1,000 GET requests in a month x 0.0000004 USD per request = 0.0004 USD]: every read request.
  • PUT requests [1,000 PUT requests for S3 Standard Storage x 0.000005 USD per request = 0.005 USD]: every write request.
  • Data transfer [Internet: 1 GB x 0.09 USD per GB = 0.09 USD]: every byte sent out to the internet.

Check by yourself the pricing page details at https://aws.amazon.com/s3/pricing/ and https://calculator.aws/#/addService/S3.

Big example

Let's simulate a big project where you have a database with a few tables:

  • pageviews: 100,000,000 lines of 100 bytes each
  • leads: 1,000,000 lines of 200 bytes each
const Fakerator = require("fakerator");
const fake = Fakerator("pt-BR");

const pageview = {
  ip: fake.internet.ip(),
  domain: fake.internet.url(),
  path: fake.internet.url(),
  query: `?q=${fake.lorem.word()}`,
};

const lead = {
  name: fake.names.name(),
  mobile: fake.phone.number(),
  email: fake.internet.email(),
  country: "Brazil",
  city: fake.address.city(),
  state: fake.address.countryCode(),
  address: fake.address.street(),
};

If you write the whole database of:

  • pageviews:
    • 100,000,000 PUT requests for S3 Standard Storage x 0.000005 USD per request = 500.00 USD (S3 Standard PUT requests cost)
  • leads:
    • 1,000,000 PUT requests for S3 Standard Storage x 0.000005 USD per request = 5.00 USD (S3 Standard PUT requests cost)

It will cost 505.00 USD, once.

If you want to read the whole database:

  • pageviews:
    • 100,000,000 GET requests in a month x 0.0000004 USD per request = 40.00 USD (S3 Standard GET requests cost)
    • (100,000,000 × 100 bytes) ÷ (1024 × 1000 × 1000) ≅ 10 GB; Internet: 10 GB x 0.09 USD per GB = 0.90 USD
  • leads:
    • 1,000,000 GET requests in a month x 0.0000004 USD per request = 0.40 USD (S3 Standard GET requests cost)
    • (1,000,000 × 200 bytes) ÷ (1024 × 1000 × 1000) ≅ 0.19 GB; Internet: 1 GB x 0.09 USD per GB = 0.09 USD

It will cost 41.39 USD, once.
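The arithmetic in these examples can be scripted. The prices are the ones quoted in this README, and data transfer is rounded up to whole GB as in the estimates above:

```javascript
// Pricing constants quoted in this README's cost examples.
const PUT_PRICE = 0.000005;  // USD per PUT request
const GET_PRICE = 0.0000004; // USD per GET request
const GB_PRICE = 0.09;       // USD per GB transferred out

// Writing costs one PUT per row.
const writeCost = (rows) => rows * PUT_PRICE;

// Reading costs one GET per row plus outbound transfer,
// rounded up to whole GB like the estimates above.
function readCost(rows, bytesPerRow) {
  const gb = Math.ceil((rows * bytesPerRow) / (1024 * 1000 * 1000));
  return rows * GET_PRICE + gb * GB_PRICE;
}

writeCost(100_000_000) + writeCost(1_000_000);         // ≈ 505 USD
readCost(100_000_000, 100) + readCost(1_000_000, 200); // ≈ 41.39 USD
```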

Small example

Let's store some JWT tokens following RFC 7519.

await s3db.createResource({
  name: "tokens",
  attributes: {
    iss: "url|max:256",
    sub: "string",
    aud: "string",
    exp: "number",
    email: "email",
    name: "string",
    scope: "string",
    email_verified: "boolean",
  },
});

async function generateToken() {
  const token = createTokenLib(...);

  await resource.insert({
    id: token.jti || md5(token),
    ...token,
  });

  return token;
}

async function validateToken(token) {
  const id = token.jti || md5(token);

  if (!validateTokenSignature(token, ...)) {
    await resource.delete(id);
    throw new Error("invalid-token");
  }

  return resource.get(id);
}

Roadmap

Tasks board can be found at this link!

Feel free to interact and PRs are welcome! :)