vercast
v0.2.9
Published
VERsion Controlled Application STate: A version control system for application state.
Downloads
31
Readme
Overview
Vercast (VERsion Controlled Application STate) is an application framework that is also a version control system. How do the two connect? Well, Vercast is based on the notion that the state of an application, the "data" that it holds, is important. It is not less important than its source code. For source code we use version control, so why not use it for state?
Version Control for Source Code
So why do we use version control for source code? We do this mainly for two reasons. The first is to allow users to to make any change to the code without being afraid to break things. If they do, they can always go back to a previous version and do things right. The second reason is to allow multiple users to make changes to the code in parallel, while keeping the code consistent. Code changes often cross source-file boundaries. For example, a code change may consist of adding a function definition in one file, and a use of that function in another. These changes need to be kept consistent for the program to compile and work properly. Version control allows such changes to move together from one developer to another, keeping the program consistent.
The Problem with State
For many years, relational databases were the de-facto standard in maintaining application state. If you go to the "hello world" tutorial of any decent application framework, you'll see just that. Relational databases needed the same notion of consistency source code does, just that instead of consistency between files, relational databases require consistency between records, because one "object" (e.g., a customer invoice) may contain a few of these (e.g., one for each purchased item). To allow these records to remain consistent, relational databases started using transactions. Transactions allow for database operations to have the look and fill as if they were made in sequence, although some of them were made in parallel.
As the Web grew, big applications started getting bottlenecked by transactions. Developers of such big applications then turned to other consistency models, and the NoSQL movement was born.
The NoSQL Movement
NoSQL is a common name for databases that came to address the problems that are common in the relational world. The main two problems were avilablility and performance scalability, which were addressed mostly by introducing more relaxed consistency models. The main idea is: less required consistency means less coordination needed, means less change for something to go wrong (hence more availability) and generally, less things to do (performance). Roughly speaking, the new consistency models adopted by NoSQL databases abandoned the concept of a transaction, and consider each change as standing on its own.
The problem is, the world did not change. The world still holds dependencies between different pieces of data. Sometimes it is easy to capture these pieces together (e.g., consider the entire customer invoice as a single "thing"), but sometimes it is more difficult. For these times, we are developing Vercast.
How Vercast Changes Things
Vercast is a version control system for application state. It allows developers to implement classes of objects that can be placed under version control. If the state of an entire application is built from these classes, the entire state can be branched off and merged back. As with source-code, different users may see a different state, but the state is always consistent with itself. Vercast is intended to provide availability and scalable performance without harming consistency.
How Vercast Works
Vercast is a Javascript framework, running over Node.js. It allows users (application developers) to implement classes which we call versionable classes. These are similar to classes in most object-oriented languages, except that instead of responding to method invocations or messages, as they are called in Smalltalk, versionable class instances respond to patches. The main difference is that patches need to be reversible. For this, they must convey not only what needs to change, but also what was the state before. If the state before applying the patch is different than what the patch assumes, the class needs to report a conflict.
In most OO programming languages and frameworks, objects have identities, and their value changes with time. This means that if you ask one object the same question twice, you may get a different answer each time. Vercast has no objects. It has object versions. Each version has an ID, and querying the same version with the same question will always yield the same answer. When we apply a patch to a version, a new version may be created. That new version (although can be treated as a new version of the same object), has a new ID. Versions may reference other versions. This is how complex structures can be built. Patches can propagate from one version to another, creating new versions along its path. Eventually a new version of the entire application state is created.
Vercast tracks the versions of the application state using a version graph. This is a graph in the mathematical sense, where the nodes are versions of the state, and the edges are the patches that lead from one version to another. This graph is useful for merging two versions. This is done by finding the lowest common ancestor (LCA) of these two versions in the graph, and applying the path that leads to one of these versions, to the other version.
Branches are Vercast's way of tracking the current version. With Vercast, there is no global sense of a current state, but each user / data-center / anything else can have its own branch, with its own current state. Branches are updated atomically, and can be temporarily unavailable from time to time. However, these failures may delay synchronization, but not harm the availability of the entire system.
Under the Hood
Vercast is designed to use different NoSQL databases to store versions, version graphs and branches. It is designed to use different NoSQL databases in parallel, leveraging each database's advantages. These databases are hooked to Vercast through drivers, which need to implement a simple interface (one per each kind of database).
Our aim is to provide a practical framework for applications. For this, we use a few optimizations to achieve reasonable performance with respect to CPU usage, disk usage and the amount of I/O. This work is still in progress, but given a reasonable implementation of the application, our algorithms can guarantee the same big-O complexity as when using regular NoSQL databases to store application state, given that we do track history.
Development
Vercast is a work in progress. We currently have a basic prototype that demonstrates the basic functionality. One can define classes and send patches to them, use branches and merge the state. We currently do not have real database drivers. Instead, we have in-memory stubs that allow us to play around with it.
Getting Started
For reference on the Vercast API please see the spec.
Pre-requisites:
- Install Node.js.
- We use mocha for BDD, so you'll need it as well.
Then you just clone the repo and run:
> mocha
from inside the main directory.
Then you can play around with it... Have fun :-)