July 20, 2014

Database Mathematics

We need mathematical rigor in the theory of databases. Today, database theory rests on a very weak foundation. Not even basic concepts such as atomicity and durability have generally accepted definitions. Instead, some ill-defined (yet often somewhat useful) concepts are simply taken for granted.

As mentioned in the earlier post "ACID Does Not Make Sense", the concept of ACID transactions is ill-defined. This blog also has a post ("The CAP Theorem Is Not a Theorem") on the so-called CAP Theorem, where I agree with Mark Burgess that it is not a theorem at all. Brewer's conjecture has not been proven with mathematical rigor.

Data storage is mathematics. I believe it is possible to describe the function of modern databases in rigorous mathematical terms, using mathematics that already exists or mostly exists. I wish I had the resources and the skills to develop the much-needed database mathematics (or "data storage theory", "data mathematics", "database theory") myself.

The need for rigor is even greater now that a more diverse set of database products is available. Yes, I am talking about the NoSQL movement. The relational model of SQL databases has a useful mathematical background. However, a shrinking share of databases use the relational model introduced by Edgar F. Codd in 1969.

Durability could be defined using probability theory. What is the probability that we can still read a value written to a database after one second, one hour, one day, or ten years? And what probability is acceptable?
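To make the idea concrete, here is a minimal sketch of what such a probabilistic definition could look like. It rests on assumptions that are mine and not taken from any real product: each replica is lost independently, replica lifetimes are exponentially distributed with a constant failure rate, and lost replicas are never repaired. The function names and the failure rate used below are hypothetical.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def loss_probability(t_seconds, failures_per_year, replicas=1):
    """Probability that ALL replicas of a value are lost within t_seconds.

    Assumptions (illustrative only): independent replicas, exponentially
    distributed lifetimes with a constant rate, and no repair of lost replicas.
    """
    rate_per_second = failures_per_year / SECONDS_PER_YEAR
    # 1 - exp(-rate * t), computed with expm1 to stay accurate for tiny t.
    p_one_replica_lost = -math.expm1(-rate_per_second * t_seconds)
    return p_one_replica_lost ** replicas


def survival_probability(t_seconds, failures_per_year, replicas=1):
    """Probability that the value is still readable after t_seconds."""
    return 1.0 - loss_probability(t_seconds, failures_per_year, replicas)


if __name__ == "__main__":
    horizons = {
        "one second": 1,
        "one hour": 3600,
        "one day": 24 * 3600,
        "ten years": 10 * SECONDS_PER_YEAR,
    }
    for label, seconds in horizons.items():
        # 0.02 failures per replica-year is a made-up rate, not a measurement.
        p_loss = loss_probability(seconds, failures_per_year=0.02, replicas=3)
        print(f"{label}: probability of loss = {p_loss:.3e}")
```

With a definition of this kind, "durable" stops being a yes-or-no label. A database could instead state a loss probability per time horizon, and the question "what is acceptable?" becomes a threshold on that number.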

Atomicity would also benefit from a mathematical treatment. With global atomicity and transactions at the serializable isolation level (in SQL speak), the database state evolves through a sequence of distinct states. A transaction is then a mathematical function that takes the database from one state to the next. I have a draft blog entry on the different types of atomicity used in common database products and programming languages. We will see if and when it gets published.
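The "transaction as a function" view can be sketched in a few lines. This is only an illustration under simplifying assumptions of my own: the whole database state is a small dictionary, transactions run strictly one after another, and an aborted transaction returns the state unchanged. The names `transfer` and `run_serially` are hypothetical.

```python
def transfer(src, dst, amount):
    """Build a transaction: a pure function from one database state to the next."""
    def txn(state):
        if state[src] < amount:
            # Abort: the transaction has no effect, so the state is unchanged.
            return state
        new_state = dict(state)  # copy, so the previous state is never mutated
        new_state[src] -= amount
        new_state[dst] += amount
        return new_state
    return txn


def run_serially(initial_state, transactions):
    """Apply transactions one at a time and return every state along the way."""
    states = [initial_state]
    for txn in transactions:
        states.append(txn(states[-1]))
    return states


if __name__ == "__main__":
    history = run_serially(
        {"alice": 100, "bob": 0},
        [transfer("alice", "bob", 30), transfer("alice", "bob", 100)],
    )
    for number, state in enumerate(history):
        print(number, state)
```

Each entry in the returned list is one of the distinct states mentioned above. No intermediate state, such as money withdrawn but not yet deposited, is ever observable, and that is the property an atomicity definition would have to capture.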

There is much more to say and think about here. It would be interesting to get in touch with a mathematician who wants to contribute to this field. And maybe I have missed something in the existing literature? Maybe there is significant work on this already?
