VoltDB Introduces Extreme Transaction Processing Database System

By: Charles Babcock

VoltDB, a new Michael Stonebraker database system, is designed for extreme transaction processing, such as that encountered in equities trading or a website experiencing a surge in sales.

VoltDB became a commercially supported product May 25.

VoltDB is faster than other online transaction processing systems because it dispenses with many of the conventions of today's general-purpose relational systems, according to Andy Ellicott, VP of marketing, in an interview.

Both the system and its data reside in server memory, or more likely, memories, since it was designed to distribute itself automatically over a cluster. It doesn't make calls to disk, one of the time-consuming steps relational systems go through as they load data from tables. In this respect, it resembles a combination of high-speed, in-memory database systems, such as Oracle's TimesTen, and the key value stores, such as Cassandra and MongoDB, that deal with massive volumes of data by working as a distributed system on a cluster.

"It will perform 40 times faster than Oracle," claimed Andy Ellicott, a long-term Stonebraker business partner.

But Ellicott said VoltDB's innovations, unlike those of Cassandra and MongoDB, don't abandon the key data integrity features of relational transaction systems. A VoltDB system will pass the ACID test of Jim Gray, the database researcher who was lost at sea: each transaction maintains atomicity, consistency, isolation and durability, and a query always returns the same answer, regardless of its timing.

Cassandra and MongoDB assure their users that, eventually, as the system does its work, any jet-lagged data will end up being right, even if a rare query gets a wrong answer. Cassandra, CouchDB and MongoDB are not designed as transaction systems but as sorting, filtering and cleansing systems for massive amounts of unstructured data. They are also useful in read-only situations. VoltDB appropriates some of their distributed features, while keeping transactions intact.

VoltDB eliminates multi-threaded approaches to a piece of work. Instead, it latches a single thread to a transaction and allows it to run to completion, without getting caught in line behind other threads. In VoltDB, transactions are also embedded in the system as stored procedures, where they can be activated and run efficiently. That of course requires knowing the nature of the transaction ahead of time.
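
The run-to-completion idea can be illustrated with a minimal Python sketch. It is not VoltDB code and does not use VoltDB's API; the class `SingleThreadedPartition` and the procedures `deposit` and `transfer` are hypothetical names invented for this illustration. Procedures are registered ahead of time, and a single worker thread executes them one at a time, so no locks are needed on the data itself.

```python
import queue
import threading


class SingleThreadedPartition:
    """Hypothetical model: one thread owns the data and runs each
    pre-registered procedure to completion before starting the next."""

    def __init__(self):
        self.data = {}
        self.procedures = {}
        self._inbox = queue.Queue()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def register(self, name, fn):
        # Procedures must be known before they are called.
        self.procedures[name] = fn

    def call(self, name, *args):
        # Callers on any thread enqueue a request and wait for its result.
        done = threading.Event()
        result = {}
        self._inbox.put((name, args, result, done))
        done.wait()
        if "error" in result:
            raise result["error"]
        return result["value"]

    def _run(self):
        # The only thread that ever touches self.data.
        while True:
            name, args, result, done = self._inbox.get()
            try:
                result["value"] = self.procedures[name](self.data, *args)
            except Exception as exc:
                result["error"] = exc
            finally:
                done.set()


def deposit(data, account, amount):
    data[account] = data.get(account, 0) + amount
    return data[account]


def transfer(data, src, dst, amount):
    # Check and update happen with no other procedure interleaved.
    if data.get(src, 0) < amount:
        return False
    data[src] -= amount
    data[dst] = data.get(dst, 0) + amount
    return True
```

Many client threads can call `transfer` at once, but the balance check and the two updates are never interleaved with another procedure. The sketch does not roll back a procedure that fails partway through, which a real transaction system would have to do.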

In some ways, VoltDB resembles the open source database MySQL running with another piece of open source code, Memcached, which manages the random access memories of a server cluster as a single resource. But MySQL tends to be sharded, or divided up into a series of separate database systems, each running on a server. The application programmer has to cope with the nature of a sharded system and figure out how to distribute the database workload across it, according to a frequently asked questions document on the VoltDB site.

VoltDB, on the other hand, engages in a specific form of partitioning that keeps the database looking the same to the application, regardless of how many servers it has been spread across. No new work of distributing transactions gets handed to the application programmer. In an example, Ellicott said each server might be divided into three partitions; partitions are able to execute transactions either autonomously or in coordination with other partitions.
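
A rough Python sketch shows how partitioning can stay invisible to the application. It is an illustration only, not VoltDB's implementation: the class `PartitionedStore`, the CRC32 hash routing and the method names are all assumptions. The three partitions per server follow Ellicott's example.

```python
import zlib


class PartitionedStore:
    """Hypothetical store that hides partitioning from the caller."""

    def __init__(self, servers, partitions_per_server=3):
        # Each server hosts several independent partitions.
        self.partitions = [
            {"server": server, "rows": {}}
            for server in servers
            for _ in range(partitions_per_server)
        ]

    def _partition_for(self, key):
        # Deterministic hash routing (an assumption for this sketch).
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        return self.partitions[index]

    def put(self, key, value):
        # Single-partition operation: only one partition is involved.
        self._partition_for(key)["rows"][key] = value

    def get(self, key):
        return self._partition_for(key)["rows"].get(key)

    def count(self):
        # Multi-partition operation: combines results from every partition.
        return sum(len(p["rows"]) for p in self.partitions)
```

The calling code is the same whether the store is built with two servers or 20; only the constructor argument changes. `put` and `get` touch a single partition, while `count` stands in for work that has to be coordinated across all of them.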

VoltDB will outperform a distributed MySQL or centralized large relational database server by a factor of 45X, Ellicott claimed. Asked if that performance had been captured in a TPC benchmark, Ellicott said it had shown up in VoltDB's own testing; TPC performance proof was still to come.

VoltDB borrows some of the thinking of the key value store systems by replicating data to a second location on the cluster and, if the implementer chooses, to a third location, preferably at a distance "wider than a hurricane" from the core system. Straightforward data replication takes place instead of logging each step of a transaction, so that the data can be rebuilt in the event of natural disaster or system failure. If a core VoltDB system should fail, all the transactions and their data would exist at another location, where they could be reactivated. Logging is another resource hog that can be eliminated in a modern transaction design, Ellicott said.
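
The trade of logging for replication can be pictured with a small Python sketch. It is a toy model under stated assumptions, not VoltDB's actual mechanism: the class `ReplicatedTable` and its methods are hypothetical, and each copy is a plain in-memory dictionary standing in for a separate server.

```python
class ReplicatedTable:
    """Hypothetical table that keeps several in-memory copies in step
    instead of writing a log to disk."""

    def __init__(self, copies=2):
        if copies < 1:
            raise ValueError("at least one copy is required")
        self.copies = [{} for _ in range(copies)]
        self.alive = [True] * copies

    def write(self, key, value):
        # Apply the write to every surviving copy before returning.
        for index, copy in enumerate(self.copies):
            if self.alive[index]:
                copy[key] = value

    def fail(self, index):
        # Simulate losing a server: its memory contents are gone.
        self.alive[index] = False
        self.copies[index].clear()

    def read(self, key):
        # Serve the read from the first copy that is still alive.
        for index, copy in enumerate(self.copies):
            if self.alive[index]:
                return copy.get(key)
        raise RuntimeError("no surviving copy")
```

With three copies, the table survives the loss of any two. Once every copy is gone the data is lost, which is the case for placing one copy at a distance from the others.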

Like the key value store systems, VoltDB scales up by adding commodity servers to the cluster. It is able to exploit multi-core servers, Ellicott said. VoltDB, tested on "TPC-like" transactions, executed 560,000 a second. (TPC results are usually given in a number per minute.) On a 12-node cluster, VoltDB executed 1.3 million online game transactions per second, according to Ellicott.
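
To put the per-second figures alongside the per-minute numbers that TPC results usually use, a few lines of Python do the arithmetic. The function names are made up for this example. The per-node figure is only an average that assumes load was spread evenly across the 12 nodes, which the article does not state, and neither number is an audited TPC result.

```python
def per_minute(transactions_per_second):
    # Convert a per-second rate to a per-minute rate.
    return transactions_per_second * 60


def per_node(transactions_per_second, nodes):
    # Average rate per node, assuming evenly distributed load.
    return transactions_per_second / nodes


tpc_like_per_minute = per_minute(560_000)        # 33,600,000 per minute
game_per_node = per_node(1_300_000, 12)          # about 108,333 per second per node
```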

VoltDB, the system, comes out of the H-Store project, a research effort by MIT, Yale and Brown. Stonebraker is an adjunct professor at MIT. VoltDB, the company, located in Billerica, Mass., comes out of a predecessor Stonebraker firm, Vertica. VoltDB was commissioned to implement a different set of ideas from Vertica's column-oriented database system. Stonebraker previously worked on a complex event processing system, Streambase, and the Illustra object-relational system, bought by Informix in 1997, now part of IBM. He was a principal of Relational Technology Inc. when it fielded the Ingres system versus Oracle.

VoltDB is available both as GPL open source code for free download and as a commercially supported subscription for $15,000 a year per four-server cluster. Over the past six months, 150 customers have been using a beta version of the system.

VoltDB is a 12-person company, with Stonebraker serving as CTO; Scott Jarr as president; and Bobbi Heath as VP of engineering.
