refile v0.2.11

Redis-based content publisher

This service archives JSON documents from Redis to disk-based BLOB storage for publication.

Use case

The intended use case is for publishing cacheable data to the web. Structured data is stored in Redis for simplicity and in-memory speed. However, to reduce RAM requirements, large collections of JSON documents are archived to disk-based storage. Those documents are typically retrieved via HTTP e.g. using Nginx.

Config

See lib/config.js

module.exports = {
    description: 'Utility to archive Redis JSON keys to BLOB storage.',
    required: {
        blobStore: {
            description: 'the BLOB store options e.g. directory for file storage',
            default: 'data/'
        },
        blobStoreType: {
            description: 'the BLOB store type',
            default: 'fs-blob-store'
        },
        host: {
            description: 'the Redis host',
            default: 'localhost'
        },
        port: {
            description: 'the Redis port',
            default: 6379
        },
        snapshot: {
            description: 'the snapshot ID for recovery',
            default: 1
        },
        outq: {
            description: 'the output queue for archived keys',
            required: false
        },
        expire: {
            description: 'the expiry to set on archived keys',
            unit: 'seconds',
            example: 60,
            required: false
        },
        action: {
            description: 'the action to perform on archived keys if expire not set',
            options: ['delete'],
            required: false
        },
    }
}

Note that if outq is set, then the processed key is pushed to that queue, and whatever consumes that queue takes responsibility for expiring or deleting the archived keys.

if (config.outq) {
    multi.lpush(config.outq, key);
} else if (config.expire) {
    multi.expire(key, config.expire);
} else if (config.action === 'delete'){
    multi.del(key);
}

Otherwise, if expire is set, then once the key has been extracted to BLOB storage, it is set to expire.

Otherwise, if action is set to delete, then the key is deleted.

Usage

The application sets some JSON data in Redis:

redis-cli set user:evanxsummers '{"twitter": "@evanxsummers"}'

The application then pushes the updated key to refile:key:q:

redis-cli lpush refile:key:q user:evanxsummers

This utility will read the JSON content from Redis and write it to BLOB storage.

The intention is that the documents are retrieved via HTTP sourced from that BLOB storage, rather than from Redis.

A document that has been deleted can similarly be pushed to this queue:

redis-cli del user:evanxsummers
redis-cli lpush refile:key:q user:evanxsummers

where in this case, refile will remove the JSON file from the BLOB store.

Files

In the case of the key user:evanxsummers the following files are written to storage:

data/key/498/user-evanxsummers.json.gz
data/sha/858/858cc063aaa86d463676b39889fc317562b7bb1a.user-evanxsummers.json.gz
data/time/2017-02-14/01h12m20/998/user-evanxsummers.json.gz

where the file in data/key/ is the current version of the document to be published via HTTP.

Key files

Note that the path is split with / so that when a plain file system is used as BLOB storage, e.g. served by Nginx, the number of files per subdirectory remains practical.

In the case of data/key/ the path is prefixed by the first three hex digits of the SHA-1 of the key itself:

evan@dijkstra:~$ echo -n 'user:evanxsummers' | sha1sum | cut -b1-3
498

Also note that any non-alphanumeric characters, including colons, are replaced with a dash, hence the file name user-evanxsummers.json.gz for the key user:evanxsummers.

Immutable historical files

Additionally, two historical versions are stored:

  • a copy named according to the SHA of the contents i.e. content addressable
  • a copy named by the timestamp when the content is archived

These two files are intended to be immutable facts, i.e. not overwritten by subsequent updates. The SHA files are intended for versioning, and the timestamped copies are useful for debugging.
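The timestamped path shown under Files appears to encode the UTC date, an HHhMMmSS segment, and the milliseconds. A sketch of a builder under that assumption follows; both the interpretation of the final path segment and the timeFilePath helper are assumptions, not taken from the refile source:

```javascript
// Hypothetical sketch assuming the timestamped path encodes the UTC date,
// an HHhMMmSS segment, and the milliseconds, as in
// data/time/2017-02-14/01h12m20/998/user-evanxsummers.json.gz
function timeFilePath(key, date) {
    const iso = date.toISOString(); // e.g. '2017-02-14T01:12:20.998Z'
    const day = iso.slice(0, 10);
    const hms = `${iso.slice(11, 13)}h${iso.slice(14, 16)}m${iso.slice(17, 19)}`;
    const ms = iso.slice(20, 23);
    const name = key.replace(/[^a-zA-Z0-9]/g, '-');
    return `data/time/${day}/${hms}/${ms}/${name}.json.gz`;
}
```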

$ zcat data/time/2017-02-14/01h12m20/998/user-evanxsummers.json.gz | jq
{
  "twitter": "@evanxsummers"
}

Incidentally, the compressed content can be streamed as-is by the HTTP server, assuming the client accepts gzip encoding.

Snapshots

The SHA and timestamp for each archival are recorded in Redis against the current snapshot ID. That data in Redis, together with the files above, should be sufficient to create a snapshot, e.g. for recovery.

Another service will publish a specified snapshot from the BLOB store, by looking up the corresponding SHA (version) from Redis for that document and snapshot. Such a service can be useful for a rollback/forward strategy.
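A minimal sketch of that lookup is below. It is hypothetical, assuming the bluebird-promisified redis client used elsewhere in this README and the data/sha/ naming shown under Files:

```javascript
// Hypothetical lookup for the snapshot-publishing service described above:
// resolve the content SHA for a key in a given snapshot, then build the
// blob path following the data/sha/ naming convention.
async function snapshotBlobPath(client, snapshot, key) {
    const sha = await client.hgetAsync(`refile:${snapshot}:sha:h`, key);
    if (!sha) return null; // key is absent from this snapshot
    const name = key.replace(/[^a-zA-Z0-9]/g, '-');
    return `data/sha/${sha.slice(0, 3)}/${sha}.${name}.json.gz`;
}
```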

The following related services are planned:

  • delete an older snapshot, including related SHA files
  • recover a specific snapshot to BLOB storage
  • redirecting web server for a specific snapshot i.e. to the appropriate SHA file
  • proxying web server for a specific snapshot

Docker

You can build as follows:

docker build -t refile https://github.com/evanx/refile.git

For a sample deployment script with the following docker run command, see https://github.com/evanx/refile/blob/master/bin/redeploy.sh

docker run --name refile -d \
  --restart unless-stopped \
  --network=host \
  -v $home/volumes/refile/data:/data \
  -e NODE_ENV=$NODE_ENV \
  -e host=localhost \
  -e expire=2 \
  refile

where

  • the host's Redis instance is used since --network=host
  • the host's filesystem is used relative to a specified $home directory
  • refiled keys are expired after two seconds.

Test

See test/run.sh https://github.com/evanx/refile/blob/master/test/run.sh

redis-cli -h $encipherHost -p 6333 set user:evanxsummers '{"twitter":"evanxsummers"}'
redis-cli -h $encipherHost -p 6333 lpush refile:key:q user:evanxsummers
appContainer=`docker run --name refile-app -d \
  --network=refile-network \
  -v $HOME/tmp/volumes/refile/data:/data \
  -e host=$encipherHost \
  -e port=6333 \
  evanxsummers/refile`

The test script builds:

  • an isolated network refile-network
  • an isolated Redis instance named refile-redis
  • two spiped containers to test encrypt/decrypt tunnels
  • the prebuilt image evanxsummers/refile
  • a host volume $HOME/volumes/refile/data

evan@dijkstra:~/refile$ sh test/run.sh
...
/home/evan/volumes/refile/data/time/2017-02-17/20h28m53/919/user-evanxsummers.json.gz
/home/evan/volumes/refile/data/key/498/user-evanxsummers.json.gz
/home/evan/volumes/refile/data/sha/814/8148962a123c3b629a8b78d70052a14d71563694.user-evanxsummers.json.gz/hom...
{"twitter":"evanxsummers"}

Implementation

See lib/main.js

We monitor the refile:key:q input queue.

    const blobStore = require(config.blobStoreType)(config.blobStore);
    while (true) {
        const key = await client.brpoplpushAsync('refile:key:q', 'refile:busy:key:q', 1);    
        ...        
    }

We record the following in Redis:

multi.hset(`refile:modtime:h`, key, timestamp);
multi.hset(`refile:sha:h`, key, sha);
multi.hset(`refile:${config.snapshot}:sha:h`, key, sha);
multi.zadd(`refile:${config.snapshot}:key:${key}:z`, timestamp, sha);

where the SHA of the key is stored against the snapshot, and the historical SHAs for a specific key are recorded in a sorted set scored by timestamp.

If the specified Redis key does not exist, we can assume it was deleted. In this case we record the following in Redis:

multi.hset(`refile:modtime:h`, key, timestamp);
multi.hdel(`refile:sha:h`, key);
multi.hdel(`refile:${config.snapshot}:sha:h`, key);
multi.zadd(`refile:${config.snapshot}:key:${key}:z`, timestamp, timestamp);

where we delete the current entries for this key and add the timestamp to a sorted set, for point-in-time recovery.

Application archetype

Incidentally, lib/index.js uses the redis-app-rpf application archetype.

require('redis-app-rpf')(require('./spec'), require('./main'));

where we extract the config from process.env according to the spec and invoke our main function.
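A minimal sketch of how such spec-driven extraction might work follows; this is assumed behavior, not the actual redis-app-rpf code:

```javascript
// Hypothetical sketch: build a config object from process.env according to
// a spec shaped like lib/config.js above, falling back to defaults and
// failing fast on missing required settings.
function configFromEnv(spec, env) {
    const config = {};
    for (const [name, meta] of Object.entries(spec.required)) {
        if (env[name] !== undefined) {
            config[name] = env[name];
        } else if (meta.default !== undefined) {
            config[name] = meta.default;
        } else if (meta.required !== false) {
            throw new Error(`missing required config: ${name}`);
        }
    }
    return config;
}
```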

See https://github.com/evanx/redis-app-rpf.

This provides lifecycle boilerplate to reuse across similar applications.