July 25, 2014

BergDB 2014.1

I made a release of BergDB today. bergdb-2014.1.zip can be downloaded from bergdb.com.

The release is focused on stability, clean code, and an improved API. The software is getting stable. There are no known issues and 650 unit tests that say it works (85 per cent code coverage). The API is getting more stable between releases and the binary database file format is not changing much.

So, in my opinion, it is now production ready for many types of development projects.

July 20, 2014

Database Mathematics

We need mathematical rigor in the theory of databases. The current situation is that we have a very weak foundation for database theory. Not even basic concepts such as atomicity and durability have generally accepted definitions. Instead, some ill-defined (yet often somewhat useful) concepts are taken for granted.

As mentioned before in the post ACID Does Not Make Sense, the concept of ACID transactions is ill-defined. This blog also has a post (The CAP Theorem Is Not a Theorem) on the so-called CAP Theorem, where I agree with Mark Burgess that it is not a theorem at all. Brewer's conjecture has not been proven with mathematical rigor.

Data storage is mathematics. I believe it is possible to describe the function of modern databases in rigorous mathematical terms using existing or mostly existing mathematics. I wish I had the resources and the skills myself to develop the much needed database mathematics (or "data storage theory", "data mathematics", "database theory").

The need for rigor is even greater now that a more diverse set of database products is available. Yes, I am talking about the NoSQL movement. The relational model of SQL databases has a useful mathematical background. However, a shrinking share of databases uses the relational model introduced by Edgar F. Codd in 1969.

Durability could be defined using probability theory. What is the probability that we can read a value written to a database after one second, one hour, one day, and after ten years? What is acceptable?
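As a minimal sketch of what such a definition might look like, assume (and this is only an assumption for illustration, not a model of any real product) that a value is stored on a number of independent replicas, that each replica fails at a constant rate, and that lost replicas are never repaired. The function name and parameters below are hypothetical.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def survival_probability(t_seconds, failures_per_year, replicas=1):
    """Probability that a written value is still readable after t_seconds.

    Assumed model (illustrative only): each replica has an exponentially
    distributed lifetime with the given failure rate, replicas fail
    independently, and nothing is repaired or re-replicated.
    """
    t_years = t_seconds / SECONDS_PER_YEAR
    # P(one replica is lost by time t) = 1 - exp(-rate * t).
    # expm1 keeps precision when rate * t is very small.
    p_one_lost = -math.expm1(-failures_per_year * t_years)
    # The value is lost only if every replica is lost.
    return 1.0 - p_one_lost ** replicas


if __name__ == "__main__":
    horizons = {
        "one second": 1,
        "one hour": 3600,
        "one day": 24 * 3600,
        "ten years": 10 * SECONDS_PER_YEAR,
    }
    for label, seconds in horizons.items():
        p = survival_probability(seconds, failures_per_year=0.05, replicas=3)
        print(f"{label}: {p:.12f}")
```

With a definition of this kind, "durable" stops being a yes-or-no property and becomes a number that can be compared against a stated requirement for each time horizon.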

Atomicity would also benefit from a mathematical treatment. With global atomicity and transactions with the serializable isolation level (SQL speak), the database state evolves through a sequence of distinct states. A transaction is then a mathematical function that takes the database from one state to the next. I have a draft blog entry on the different types of atomicity that are used in common database products and programming languages. We will see if and when it will be published.
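The idea of a transaction as a function from state to state can be sketched in a few lines. This is a toy illustration under stated assumptions, not how BergDB or any other database is implemented: the state is a plain dictionary of account balances, and the names deposit, transfer, and apply_all are hypothetical.

```python
def deposit(account, amount):
    """Return a transaction: a function from one state to the next."""
    def tx(state):
        new_state = dict(state)  # never modify the previous state
        new_state[account] = new_state.get(account, 0) + amount
        return new_state
    return tx


def transfer(src, dst, amount):
    """Return a transaction that moves amount from src to dst, or aborts."""
    def tx(state):
        if state.get(src, 0) < amount:
            return state  # abort: the state is left exactly as it was
        new_state = dict(state)
        new_state[src] -= amount
        new_state[dst] = new_state.get(dst, 0) + amount
        return new_state
    return tx


def apply_all(initial_state, transactions):
    """Apply transactions one at a time and return every state in order."""
    states = [initial_state]
    for tx in transactions:
        states.append(tx(states[-1]))
    return states


if __name__ == "__main__":
    history = apply_all({}, [
        deposit("a", 10),
        transfer("a", "b", 4),
        transfer("b", "a", 100),  # aborts: b holds too little
    ])
    for number, state in enumerate(history):
        print(number, state)
```

Each transaction either produces a complete new state or returns the old one untouched, so no observer ever sees a half-applied transfer. That is the all-or-nothing property expressed as ordinary function application.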

There is so much more to say and think about this. It would be interesting to get in contact with a mathematician who would be interested in making a contribution in this field. And maybe I missed something in the existing literature? Maybe there is significant work on this already?