durafetch-server

v0.3.0

Download *all* data stored in Cloudflare Durable Objects to a SQLite database.

Durafetch

Durafetch allows you to download your Cloudflare Durable Object state into a local SQLite database file.

It consists of two npm packages:

  1. durafetch-server

     • This repo: the JS code you import into your Cloudflare Worker.
     • Wraps functions such as fetch to keep a list of Durable Object IDs.
     • Works on localhost for use during development.

  2. durafetch

     • Repo: durafetch

     • A CLI client that:

       • Downloads the list of Durable Object IDs.
       • Determines which objects have new data since the last run.
       • Connects to each Durable Object directly via WebSocket and downloads only the changes since the last download.
       • Writes them to a local SQLite database.

     • Usage:

       • npm install --global durafetch
       • durafetch --config-file config.json

Why use Durafetch?

As an admin UI

Durable Objects have no admin UI, and no way to observe their state other than the provided JS APIs.

This makes development difficult: you cannot see what is stored in your Durable Objects.

Durafetch gives you a SQL interface onto the state of your system so you can observe it during development and in production.

For queries

Durable Objects are distributed by nature, but it is often useful to collect their state into a central database so you can query it as a single datastore. SQLite gives you a SQL query engine with JSON functions.

For backup and restore

There is no built-in method to extract data from Durable Objects; Durafetch provides one.

There is presently no method for restoring; this may be added later.

Adding Durafetch to your Cloudflare Worker

  1. npm install durafetch-server
  2. Create a durafetch-with-config.ts file.
     • This passes the worker_name to Durafetch.
     • Import the Durafetch functions from this file.
  3. Add DURAFETCH_DO to wrangler.toml, along with the subdomain routes and the DURAFETCH_AUTH env var.
  4. Add wrap_worker_env() to the worker's fetch handler, along with the external Durafetch API handler.
  5. Add wrap_durable_object(this) to any Durable Object you want to download data from.
  6. Add 127.0.0.1 durafetch_{worker_name}.localhost to /etc/hosts so that ws://durafetch_{worker_name}.localhost:1234 connects to workerd locally during development.
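The subdomain convention used in the steps above can be sketched as a pure routing check. The function name and structure here are illustrative, not durafetch-server's API; only the durafetch_{worker_name} host pattern comes from the setup itself:

```typescript
// Illustrative sketch: decide whether a request targets the Durafetch
// external API, based on the durafetch_{worker_name} subdomain convention.
// This mirrors the routing described above; it is not durafetch's own code.
function isDurafetchRequest(url: string, workerName: string): boolean {
  const host = new URL(url).hostname;
  return host.startsWith(`durafetch_${workerName}.`);
}

// The same check covers production and the /etc/hosts localhost mapping:
console.log(isDurafetchRequest("https://durafetch_worker-1.your-domain.com/x", "worker-1")); // true
console.log(isDurafetchRequest("ws://durafetch_worker-1.localhost:1234/", "worker-1"));      // true
console.log(isDurafetchRequest("https://your-domain.com/", "worker-1"));                     // false
```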

Setting up subdomain routing

Each worker has its own Durafetch external API (CF service bindings are not used). The Durafetch CLI fetches data from each of them and writes them all to the same SQLite DB.

The Durafetch external API is reachable on a subdomain: durafetch_{your_worker_name}.your-domain.com. Cloudflare provisions HTTPS certificates automatically for first-level subdomains, and the wildcard route lets you send any subdomain to your worker.

Add this to your wrangler.toml:

# Note: "wrangler dev" in version 3+ rewrites localhost URLs to match the zone_name when running locally.
# - This breaks subdomain routing as it replaces "http://durafetch_{your_worker_name}.localhost:1234" with "http://your-domain.com".
# - A temp fix is to add these routes using the CF web UI and remove them from your wrangler.toml file.
routes = [
    { pattern = "*.your-domain.com/*", zone_name = "your-domain.com" }
]

Add your-domain.com to Cloudflare DNS

Add a CNAME record:

| Type  | Name | Content                     | Proxy Status |
|-------|------|-----------------------------|--------------|
| CNAME | *    | can.be.anything.example.com | Proxied      |

Because this is "Proxied", the Content target is ignored and CF DNS returns the IP of your worker.

Using the CLI client to download data to a SQLite db

  1. npm install durafetch
  2. Save a JSON config like this (worker-1 is the name of your worker):

     {
         "db_file": "./del/db.sqlite",
         "servers": [
             {
                 "ws_url": "ws://durafetch_worker-1.localhost:8720",
                 "auth_token": "secret_http_auth_bearer_token_replace_this_with_more_than_40_chars"
             }
         ]
     }

  3. Start the client: durafetch --config-file ./config.json
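For reference, the config file above has the following shape. This is an illustrative TypeScript type, not one exported by durafetch:

```typescript
// Illustrative type for the CLI config file; durafetch does not export this.
interface DurafetchConfig {
  db_file: string;
  servers: Array<{ ws_url: string; auth_token: string }>;
}

// Parsing the sample config from the step above:
const config: DurafetchConfig = JSON.parse(`{
  "db_file": "./del/db.sqlite",
  "servers": [
    { "ws_url": "ws://durafetch_worker-1.localhost:8720",
      "auth_token": "secret_http_auth_bearer_token_replace_this_with_more_than_40_chars" }
  ]
}`);

console.log(config.servers[0].ws_url); // ws://durafetch_worker-1.localhost:8720
```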

Scalability

Durafetch has been designed with scalability in mind:

  • You should be able to extract more than 128MB of data (a single Worker has 128MB of RAM) because WebSockets stream the key/values instead of buffering them in RAM.
  • WebSockets connect directly to each Durable Object; they do not go via a proxy Durable Object, which would become a bottleneck.
  • Only changes are downloaded: Durafetch will not re-read key/value data it has already downloaded.
    • This minimizes requests, CPU time and costs.
  • Minimal data copies.
    • The values of storage.put("key", "value") are not copied on every change; when the CLI client downloads data, it reads the current state directly from the Durable Object.
      • Values are not sent to intermediate storage (like R2 or another Durable Object) on change, which reduces request/storage costs.
    • Changed keys are recorded on each write; this is what allows the "download only changes" logic to work.
      • The assumption is that keys are generally much smaller than values.
    • When using Durafetch, the number of write requests is doubled: each write triggers a second write that records the key(s) written, along with an integer write_id.
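The "record changed keys per write" idea above can be sketched with an in-memory stand-in for Durable Object storage. All names and structure here are illustrative; durafetch-server's real implementation wraps the Durable Object storage API:

```typescript
// Illustrative sketch of the write-log idea described above. A real
// Durable Object would use this.state.storage; a Map stands in here.
class WriteLoggedStorage {
  private data = new Map<string, string>();
  private changedKeys: Array<{ write_id: number; key: string }> = [];
  private writeId = 0;

  put(key: string, value: string): void {
    this.data.set(key, value);
    // The "second write": record which key changed, with an integer
    // write_id, so a client can fetch only keys changed since its last run.
    this.changedKeys.push({ write_id: ++this.writeId, key });
  }

  // Incremental download: current values for keys changed after sinceWriteId.
  // Values are read directly from current state, never copied per change.
  changesSince(sinceWriteId: number): Array<{ key: string; value: string }> {
    const keys = new Set(
      this.changedKeys.filter(c => c.write_id > sinceWriteId).map(c => c.key)
    );
    return [...keys].map(key => ({ key, value: this.data.get(key)! }));
  }
}

const s = new WriteLoggedStorage();
s.put("a", "1");
s.put("b", "2");
s.put("a", "3"); // overwrite; only the latest value is read at download time
console.log(s.changesSince(1)); // keys "b" and "a", with a's latest value "3"
```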

Please create an issue if you encounter any problems.

Security

  • In production only HTTPS/WSS is allowed.
  • Requests to the external Durafetch API must use an Authorization: Bearer HTTP header with the secret token set as an env var.
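The bearer-token check can be sketched as below. This is illustrative only; durafetch-server's actual handler may differ:

```typescript
// Illustrative sketch of the Authorization check described above.
// A production check should also use a constant-time comparison.
function isAuthorized(authHeader: string | null, secret: string): boolean {
  return authHeader === `Bearer ${secret}`;
}

console.log(isAuthorized("Bearer s3cret", "s3cret")); // true
console.log(isAuthorized("Bearer wrong", "s3cret"));  // false
console.log(isAuthorized(null, "s3cret"));            // false
```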

To do

  • Client

    • [ ] Detect writes to the SQLite database and write them back to the remote object.
    • [ ] Optionally keep a SQLite write audit history to make deletes visible and allow syncing to other systems.
    • [ ] Export/restore CLI.
    • [ ] Convert polling to a bidirectional WebSocket when receiving the current list of Durable Objects.
  • Server

    • [ ] Compress/Garbage collect the "changed keys" history stored in each Durable Object.
    • [ ] Regex filter to include/exclude by class/name/key.