My Take on IT: One in a Million, PUT

Here's the performance result for PUT operations. See the previous post for information on the setup. A PUT operations updates the value of an existing key. One million keys are stored. The PUT operations are run one-by-one by a single thread. 10,000 or 100,000 PUT operations (random keys) are made in sequence and timed. The time per PUT is recorded.

This result should be taken with a grain of salt! The default settings are used. For MySQL 5.5 this means that the InnoDB storage engine is used and it flushes to disk for every write (innodb_flush_log_at_trx_commit=1) before it returns to the client. Presumably, this is the case for Riak too, while MongoDB seems to return to the client without flushing data to disk.

It would be interesting to dig further into this... As is always the case with performance testing. My ugly source code is available upon request.

As often is the case, I/O limits performance, not the CPU. When the data is cached and available in-process, a PUT operation only needs to take in the order of 10 us (BergDB, Berkeley, BerkeleyT). Thus, a throughput in the order of 100,000 transactions/s is achieved. Prevayler is a bit slower. This is likely because of the overhead of serializing Java objects.

The in-process databases write data for every transaction, but they do not flush the data all the way to physical disk storage for every transaction. It is impossible to do if one wants to achieve a throughput in the order of 100,000 transactions/s. The default approach by BergDB is to flush data to disk every 0.1 seconds.

There are many important things to say about disk flushing, how to define "durable", and what I like to call durability control. Hopefully, I will be able to address these important topics in future posts.

My Take on IT

January 16, 2014

One in a Million, PUT

No comments:

Post a Comment