January 10, 2014

One in a Million

GET performance of MongoDB, Riak, BergDB, MySQL, and more

Ola Rende (a colleague at Citerus) and I will give a presentation on a few NoSQL databases on January 21st in Stockholm. You are welcome to attend! Contact Citerus for details and to sign up.

While preparing, I wanted to write some code to test the databases for one specific use case. I chose a simple test that works for any database that can store key-value pairs.

First, the database is populated with one million 4-byte keys with corresponding 4-byte values. The value is the same as the key. The keys are the integers from 0 to 999,999 encoded in binary. They are added in random order. The key set happens to be dense (all non-negative integers < 1,000,000), but this is just a coincidence. The database must store the entries in a way that would allow any set of integers as keys.
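A minimal sketch of how such a key set could be generated in Java. It assumes a big-endian 4-byte encoding (the post only says "encoded in binary") and a fixed random seed. The class and method names are hypothetical, and the actual write is left out because every database has its own put/insert API.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class KeyGenerator {
    static final int KEY_COUNT = 1_000_000;

    // Encode an int as 4 bytes; ByteBuffer is big-endian by default.
    static byte[] encode(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Decode 4 big-endian bytes back to an int.
    static int decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    // The integers 0..n-1 in random order (seeded, so runs are repeatable).
    static List<Integer> shuffledKeys(int n, long seed) {
        List<Integer> keys = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            keys.add(i);
        }
        Collections.shuffle(keys, new Random(seed));
        return keys;
    }

    public static void main(String[] args) {
        List<Integer> keys = shuffledKeys(KEY_COUNT, 42L);
        for (int k : keys) {
            byte[] key = encode(k);
            byte[] value = key.clone(); // the value is the same as the key
            // A real test would pass key and value to the database's put/insert call here.
        }
        System.out.println("Prepared " + keys.size() + " entries");
    }
}
```

Because the keys are shuffled before insertion, a database cannot benefit from receiving them in sorted order.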

Then the GET performance of the database is tested by reading 100,000 randomly selected entries. This is repeated three times and the best result (lowest average GET time) is recorded. This means the database cache should be warm, and presumably there is no need to read from disk during the second and third runs. The access is made from a Java process on my computer. My setup: Ubuntu 12.04 LTS, Toshiba Z930 laptop, i7 processor, SSD disk, Oracle JVM 1.7. The in-process databases run inside the same JVM; the others run as a separate process on localhost. Unless noted otherwise, I used each database's default settings.
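The measurement can be sketched as a small harness. This is not the code used for the post: it assumes a hypothetical `KeyValueStore` adapter interface with integer keys, pre-generates the random keys so that random-number generation is not timed, and keeps the best of several runs as described above. The `TreeMap` in `main` only stands in for a database.

```java
import java.util.Random;
import java.util.TreeMap;

class GetBenchmark {
    // Hypothetical adapter interface; each database would need its own implementation.
    interface KeyValueStore {
        Integer get(int key);
    }

    // Written after each run so the JIT cannot optimize the lookups away.
    static volatile long sink;

    // Performs `runs` rounds of `getsPerRun` random GETs over keys 0..keyCount-1
    // and returns the best (lowest) average GET time in nanoseconds.
    static double bestAverageGetNanos(KeyValueStore store, int keyCount,
                                      int getsPerRun, int runs, long seed) {
        Random random = new Random(seed);
        double best = Double.MAX_VALUE;
        for (int r = 0; r < runs; r++) {
            int[] keys = new int[getsPerRun];
            for (int i = 0; i < getsPerRun; i++) {
                keys[i] = random.nextInt(keyCount);
            }
            long checksum = 0;
            long start = System.nanoTime();
            for (int key : keys) {
                Integer value = store.get(key);
                if (value != null) {
                    checksum += value;
                }
            }
            long elapsed = System.nanoTime() - start;
            sink = checksum;
            best = Math.min(best, (double) elapsed / getsPerRun);
        }
        return best;
    }

    public static void main(String[] args) {
        final TreeMap<Integer, Integer> map = new TreeMap<>();
        for (int i = 0; i < 1_000_000; i++) {
            map.put(i, i);
        }
        double nanos = bestAverageGetNanos(key -> map.get(key), 1_000_000, 100_000, 3, 1L);
        System.out.printf("Best average GET time: %.3f us%n", nanos / 1000.0);
    }
}
```

Taking the minimum rather than the mean of the runs favors the warm-cache case, which matches the intent of repeating the test three times.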

Here is the result:


Conclusions

For a data set that fits in cache, the tested in-process databases are on the order of 100 times faster than the out-of-process databases (MySQL, MongoDB, Riak). This is not surprising: on my computer, I measured the network round trip (TCP, one byte back and forth) between two processes on localhost to be 50 µs.
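One way to reproduce a round-trip measurement like this is a one-byte echo over a loopback TCP connection. The sketch below is an assumption about the method, since the post does not show its code. It disables Nagle's algorithm with `setTcpNoDelay(true)` so each single byte is sent immediately. Results will vary by machine, OS, and JVM.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class LoopbackRoundTrip {
    // Average time, in microseconds, for one byte to go to an echo server
    // on localhost and back, over `iterations` round trips.
    static double averageRoundTripMicros(int iterations) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) { // port 0: any free port
            Thread echo = new Thread(() -> {
                try (Socket s = server.accept()) {
                    s.setTcpNoDelay(true);
                    InputStream in = s.getInputStream();
                    OutputStream out = s.getOutputStream();
                    int b;
                    while ((b = in.read()) != -1) { // -1 when the client closes
                        out.write(b);
                        out.flush();
                    }
                } catch (Exception e) {
                    // Connection closed or failed; the echo thread simply ends.
                }
            });
            echo.setDaemon(true);
            echo.start();

            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                client.setTcpNoDelay(true); // send single bytes without delay
                InputStream in = client.getInputStream();
                OutputStream out = client.getOutputStream();
                long start = System.nanoTime();
                for (int i = 0; i < iterations; i++) {
                    out.write(i & 0xFF);
                    out.flush();
                    if (in.read() != (i & 0xFF)) {
                        throw new IllegalStateException("unexpected echo");
                    }
                }
                return (System.nanoTime() - start) / 1000.0 / iterations;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        averageRoundTripMicros(10_000); // warm-up run for the JIT
        System.out.printf("Average round trip: %.1f us%n", averageRoundTripMicros(10_000));
    }
}
```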

For many applications, all the databases tested can be considered fast. Riak responds in a little more than 1 ms, which is acceptable for many applications. Note that the tests are run with client and server on localhost. The network round-trip overhead could of course be around 10 ms for computers farther apart. Also, if the data did not fit in cache, disk access times could be as high as 10 ms and would limit performance. Consider using SSD disks!
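The reasoning above can be turned into a rough latency model: every remote GET pays one network round trip, and each cache miss adds a disk read. This is a back-of-the-envelope sketch; the 1 µs cached-lookup cost and the hit ratios are assumed inputs, not measurements from this test.

```java
class LatencyModel {
    // Rough expected GET latency in microseconds: one network round trip,
    // plus a cached lookup on a hit or a disk read on a miss.
    static double expectedGetMicros(double networkRoundTripMicros,
                                    double cacheHitRatio,
                                    double cachedLookupMicros,
                                    double diskReadMicros) {
        double missRatio = 1.0 - cacheHitRatio;
        return networkRoundTripMicros
                + cacheHitRatio * cachedLookupMicros
                + missRatio * diskReadMicros;
    }

    public static void main(String[] args) {
        // Illustrative inputs: 50 us localhost round trip, 10 ms (10,000 us) disk read,
        // 1 us cached lookup (assumed).
        System.out.println("localhost, all cached:   " + expectedGetMicros(50, 1.0, 1, 10_000));
        System.out.println("10 ms network, cached:   " + expectedGetMicros(10_000, 1.0, 1, 10_000));
        System.out.println("localhost, 90% hit rate: " + expectedGetMicros(50, 0.9, 1, 10_000));
    }
}
```

With a 50 µs round trip, a 10 ms disk read, and a 90% hit ratio, the misses alone add about 1 ms to the average GET, which is why fitting the data in cache matters so much.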

Of course, the result should not be seen as an overall evaluation of the databases. For example, Riak scales horizontally and provides high availability, and BergDB supports historic queries. There is much, much more to these databases than what is tested here.

Other comments:
  • IO limits performance. Most likely, network or disk IO will limit the performance of your database setup. When data is cached and available in-process, lookups can be done within a few microseconds, while disk and network access times are often on the order of milliseconds.
  • Consider an in-process database. If you want an application database, not an integration database, an in-process database may be a performant alternative.
  • Little benefit from all-in-memory databases? I question the benefits of all-in-memory databases. The in-process databases Berkeley DB and BergDB perform on par with the all-in-memory solutions (TreeMap, Prevayler). So why use an all-in-memory database, with its slow startup (all data must be read from disk into RAM at startup, which may take a long time)?
  • You might need 1000 servers to beat one. Riak and Cassandra may scale horizontally and linearly, but a single server can have a very impressive throughput for some use cases. So, don't get a huge cluster of servers if you only need one server. See The LMAX Architecture by Martin Fowler as an example.
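The latency gap translates directly into throughput for a caller that issues one request at a time. By Little's law, throughput equals the number of requests in flight divided by the latency per request. The sketch below uses illustrative latencies of about 1 µs (in-process, cached) and about 1 ms (remote); a real server also handles many concurrent clients, so this bounds a single sequential caller rather than a whole cluster.

```java
class ThroughputEstimate {
    // Little's law: throughput = requests in flight / latency per request.
    static double requestsPerSecond(int requestsInFlight, double latencyMicros) {
        return requestsInFlight * 1_000_000.0 / latencyMicros;
    }

    public static void main(String[] args) {
        // Illustrative: one sequential caller, ~1 us per in-process GET
        // versus ~1 ms per remote GET.
        System.out.println("in-process: " + requestsPerSecond(1, 1.0) + " GET/s");
        System.out.println("remote:     " + requestsPerSecond(1, 1000.0) + " GET/s");
    }
}
```

With those inputs, one sequential caller reaches roughly 1,000,000 GETs per second in-process versus roughly 1,000 against a remote store.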

Per database comments

TreeMap. This is the java.util.TreeMap class. Not a real database, but included for comparison. Access to it is made in a synchronized block.
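`TreeMap` is not thread-safe, so concurrent access has to be guarded. Below is a sketch of one way to wrap it in synchronized blocks. The class name and the choice to lock on the map itself are assumptions; `Collections.synchronizedSortedMap` gives similar per-call locking out of the box.

```java
import java.util.TreeMap;

class SynchronizedTreeMapStore {
    private final TreeMap<Integer, Integer> map = new TreeMap<>();

    // Every access locks on the map, so only one thread touches it at a time.
    void put(int key, int value) {
        synchronized (map) {
            map.put(key, value);
        }
    }

    Integer get(int key) {
        synchronized (map) {
            return map.get(key);
        }
    }

    int size() {
        synchronized (map) {
            return map.size();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final SynchronizedTreeMapStore store = new SynchronizedTreeMapStore();
        // Two writers insert even and odd keys concurrently.
        Thread even = new Thread(() -> { for (int i = 0; i < 100_000; i += 2) store.put(i, i); });
        Thread odd = new Thread(() -> { for (int i = 1; i < 100_000; i += 2) store.put(i, i); });
        even.start();
        odd.start();
        even.join();
        odd.join();
        System.out.println(store.size() + " entries, get(12345) = " + store.get(12345));
    }
}
```

An uncontended lock is cheap on a modern JVM, so it adds little to the sub-microsecond lookup time measured for a single-threaded test.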

Prevayler. Prevayler 2.6. Prevayler is an all-in-memory database. All data is kept in memory and must be read into RAM at startup. The data is stored in one big serializable Java object. Since it is stored as a Java object in the JVM, random access is as fast as it gets: it takes less than one microsecond to get a value given its key. This is the same performance as a TreeMap; in fact, a TreeMap is used to store the data for this performance test.
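To illustrate why an all-in-memory design reads fast but starts slowly, here is a simplified sketch of the snapshot idea: the whole data set is one serializable object, and restoring it means deserializing every entry before the first GET can be served. This is not Prevayler's API (Prevayler also journals each transaction); the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.TreeMap;

class SnapshotStore {
    // All data lives in one serializable object graph.
    private TreeMap<Integer, Integer> data = new TreeMap<>();

    synchronized void put(int key, int value) {
        data.put(key, value);
    }

    // Reads are plain in-memory lookups.
    synchronized Integer get(int key) {
        return data.get(key);
    }

    // Serialize the entire map as a single object.
    synchronized byte[] snapshot() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(data);
        }
        return bytes.toByteArray();
    }

    // "Startup": every entry must be deserialized before the store is usable.
    @SuppressWarnings("unchecked")
    static SnapshotStore restore(byte[] snapshot) throws IOException, ClassNotFoundException {
        SnapshotStore store = new SnapshotStore();
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(snapshot))) {
            store.data = (TreeMap<Integer, Integer>) in.readObject();
        }
        return store;
    }

    public static void main(String[] args) throws Exception {
        SnapshotStore store = new SnapshotStore();
        for (int i = 0; i < 1_000_000; i++) {
            store.put(i, i);
        }
        byte[] image = store.snapshot();
        long start = System.nanoTime();
        SnapshotStore restored = restore(image);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Restored " + image.length + " bytes in " + millis
                + " ms; get(42) = " + restored.get(42));
    }
}
```

Restore time grows with the size of the data set, whereas a disk-backed store such as Berkeley DB can serve its first GET without reading everything.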

BergDB. BergDB is a database I created. When data is cached, the GET time is on par with what is offered by an all-in-memory solution like Prevayler or TreeMap.

Berkeley. Berkeley DB, Java Edition, 5.0.97 is a stable, high-performance in-process database. "BerkeleyT" is Berkeley DB used with transactions enabled. When comparing performance, note that BergDB and Prevayler always support transactions (they cannot be disabled).

MySQL. MySQL 5.5 with default settings (InnoDB storage engine, isolation level: repeatable read). The officially supported JDBC Java Driver is used. To save time, only 100k keys were used. The actual GET time for 1M keys could be somewhat higher.

MongoDB. MongoDB with their official Java driver. Default settings.

Riak. Latest stable release of Riak with Java client 1.4.2. For this database, I used only 100k key-value entries to save some time, so the actual GET time for 1M keys could be somewhat higher.
