January 10, 2014

One in a Million

GET performance of MongoDB, Riak, BergDB, MySQL, and more

Ola Rende (a colleague at Citerus) and I will hold a presentation of a few NoSQL databases on January 21st in Stockholm. Welcome to participate! Contact Citerus for details and to sign-up.

While preparing, I wanted to write some code to test the databases for one specific use case. I chose a simple test that works for any database that can store key-value pairs.

First, the database is populated with one million 4-byte keys with corresponding 4-byte values. The value is the same as the key. The keys are the integers from 0 to 999,999 encoded in binary. They are added in random order. The key set happens to be dense (all non-negative integers < 1,000,000), but this is just a coincidence. The database must store the entries in a way that would allow any set of integers as keys.

Then the GET performance of the database is tested. This is done by accessing 100,000 of the entries selected randomly. This is repeated three times and the best result (best average GET time) is recorded. This means that the cache of the database should be filled and presumably, there is no need to read from disk for the 2nd and 3rd run. The access is made from a Java process on my computer. My setup: Ubuntu 12.04 LTS, Toshiba Z930 laptop, i7 processor, SSD disk, JVM 1.7 from Oracle. The database is either run in-process for the in-processes databases, or as a separate process on localhost. By default, I used the default settings of the databases.

Here is the result:


Conclusions

For a data set that fits into cache, the tested in-process databases are in the order of 100 times faster that the out-of-process databases (MySQL, MongoDB, Riak). This is not surprising. On my computer, I measured the network round-trip (TCP, one byte back and forth) between two processes on localhost to be 50 us.

For many applications, all databases tested can be considered fast. Riak responds in a little more than 1 ms which is acceptable to many applications. Note that the tests are run with client and server on localhost. The network round-trip overhead could of course be 10 ms or something for computers further away from each other. Also, if data would not fit into cache, the disk access time may be as high as 10 ms and it would limit the performance. Consider using SSD disks!

Of, course the result should not be seen as some overall evaluation of the databases. For example, Riak scales horizontally and provides high-availability and BergDB supports historic queries. There is much, much more to these databases then what is tested here.

Other comments:
  • IO limits performance. Most likely, the network or the disk IO will limit the performance of your database setup. When data is cached and available in-process, lookups can be done within a few microseconds, while disk and network access times often are in the order of milliseconds.
  • Consider an in-process database. If you want an application database, not an integration database, an in-process database may be a performant alternative.
  • Little benefits of all-in-memory databases? I question the benefits of all-in-memory databases. The in-process databases Berkeley DB and BergDB perform on par with the all-in-memory solutions (TreeMap, Prevayler). So why use an all-in-memory database with its problem of a slow startup (all data must be read from disk to RAM at startup time which may take a long time)?
  • You might need 1000 servers to beat one. Riak and Cassandra may scale horizontally and linearly, but a single server can have a very impressive throughput for some use cases. So, don't get a huge cluster of servers if you only need one server. See The LMAX Architecture by Martin Fowler as an example.

Per database comments

TreeMap. This is the java.util.TreeMap class. Not a real database, but included for comparison. Access to it is made in a synchronized block.

Prevayler. Prevayler 2.6. Prevayler is an all-in-memory database. All data is stored in memory and must be read to RAM at startup. The data is stored in one big serializable Java object. Since it is stored as a Java object in the JVM, the performance for random access is optimal. It takes less than one microsecond to get a value given its key. This is the same performance as a TreeMap; actually, a TreeMap is used to store the data for this performance test.

BergDB. BergDB is a database I created. When data is cached, the GET time is on par with what is offered by an all-in-memory solution like Prevayler or TreeMap.

Berkeley. Berkeley DB, Java Edition, 5.0.97 is a stable, high-performance in-process database. "BerkeleyT" is Berkeley DB used with transactions enabled. When comparing performance, note that BergDB and Prevayler always supports transactions (cannot be disabled).

MySQL. MySQL 5.5 with default settings (InnoDB storage engine, isolation level: repeatable read). The officially supported JDBC Java Driver is used. To save time, only 100k keys were used. The actual GET time for 1M keys could be somewhat higher.

MongoDB. MongoDB with their official Java driver. Default settings.

Riak. Latest stable release of Riak with Java client 1.4.2. For this database, I used only 100k key-value entries to save some time. So the actual GET time for 1M keys could be somewhat higher.

17 comments:

  1. Java is the best programming language that are serving as a entry point for fresher like me. The content you have provided here tells me that clearly. This will be useful for my training program. Thanks for sharing this useful information here. You are running a great blog though.

    JAVA J2EE Training in Chennai | JAVA Training in Chennai | web designing course in chennai

    ReplyDelete
    Replies
    1. I have read your blog its very attractive and impressive. I like it your blog.

      Java Training in Chennai Core Java Training in Chennai Core Java Training in Chennai

      Java Online Training Java Online Training Core Java 8 Training in Chennai Core java 8 online training JavaEE Training in Chennai Java EE Training in Chennai

      Delete
    2. I've read some just right stuff here. Certainly price bookmarking for revisiting. I surprise how a lot effort you put to make such a excellent informative website. study in australia consultant in jalandhar

      Delete
  2. Responsive desing can yield a very good revenue to a business. It has been discovered since the usage of multiple devices increase. The content furnished above too tells the same. Thanks for sharing this information in here. Please keep bloging content like this.

    Web designing course in chennai | Web designing training | PHP Training in Chennai

    ReplyDelete
  3. Thanks for sharing those useful basic programming stuff’s, it helps me to explore my knowledge in programming...if you want to switch your career in developing area you should know the basic of programming’s for that you have to read python because it was the first programming language for more visit.
    python training in chennai|Python Course in Chennai

    ReplyDelete
  4. This information is impressive; I am inspired with your post writing style & how continuously you describe this topic. After reading your post, thanks for taking the time to discuss this, I feel happy about it and I love learning more about this topic.
    Regards,
    sas training in Chennai|sas course in Chennai|sas courses in chennai

    ReplyDelete
  5. Cloud has beome the common word that is being used by most of the professional these day. The reason for relying on this technology is security. Your content too lecture the same. Thanks for sharing this worth able information in here. Keep blogging article like this.

    Hadoop Training Chennai
    | Hadoop training institutes in chennai | Manual testing training in Chennai

    ReplyDelete
  6. your data base concepts are really good and it is interesting too thanks for sharing those information.


    sas training in chennai

    ReplyDelete


  7. this blog is new and informative , it is really interesting and useful too , thanks for sharing this information.


    sas training in chennai

    ReplyDelete
  8. HTML5 is the recently arrived most trending technology that has bright future. If you are interested in studying HTML5 training visit this website.
    html5 training in chennai | html5 training chennai | html5 course in chennai

    ReplyDelete
  9. The strategy you have posted on this technology helped me to get into the next level and had lot of information in it. The Struts, Spring, Hibernate are the advanced level of programming language which are most widely used.
    hibernate training in chennai | hibernate training

    ReplyDelete
  10. Ethical hacking describes hacking performed by a company or individual to help them to identify potential threats on a computer or network.
    Ethical hacking Course in Chennai | Ethical hacking Training in Chennai

    ReplyDelete
  11. Very useful content thanks for sharing such a informative content which provided me the required information on the various technology.
    AngularJS Training in Chennai | AngularJS course in Chennai

    ReplyDelete
  12. Well Said, you have furnished the right information that will be useful to anyone at all time. Thanks for sharing your Ideas.
    Web Designing Course in Chennai | web designing training in chennai

    ReplyDelete
  13. Nice interesting information on the latest arrived technology which helped me to get update according to the recent trends.
    Salesforce Training in Chennai | Salesforce Course in Chennai

    ReplyDelete
  14. Very interesting content which helps me to get the in depth knowledge about the technology. To know more details about the course visit this website.
    Digital marketing course in Chennai | Digital marketing training in Chennai

    ReplyDelete
  15. Wonderful blog.. Thanks for sharing informative blog.. its very useful to me..

    iOS Training in Chennai

    ReplyDelete