December 13, 2013

"ACID" Does Not Make Sense

OK, that title is a bit provocative, but it seems to have caught your attention. A more sensible title would have been "A Critique of the ACID Properties".

Even though I have studied databases for some time, I have always had a problem with the commonly referenced ACID concept. It never really made sense to me, at least not as four independent properties. So I decided to study its roots.

The Transaction Concept in [Haerder] and [Gray1]

"ACID" was first mentioned in [Haerder], 1983 (Principles of Transaction-Oriented Database Recovery). Let me quote:
Atomicity. It must be of the all-or-nothing-type described above, and the user must, whatever happens, know which state he or she is in.

Consistency. A transaction reaching its normal end (EOT, end of transaction), thereby committing its results, preserves the consistency of the database. In other words, each successful transaction by definition commits only legal results. This condition is necessary for the fourth property, durability.

Isolation. Events within a transaction must be hidden from other transactions running concurrently. If this were not the case, a transaction could not be reset to its beginning for the reasons sketched above. The techniques that achieve isolation are known as synchronization, and since Gray et al. [1976] there have been numerous contributions to this topic of database research [Kohler 1981].

Durability. Once a transaction has been completed and has committed its results to the database, the system must guarantee that these results survive any subsequent malfunctions. [...].

These four properties, atomicity, consistency, isolation, and durability (ACID), describe the major highlights of the transaction paradigm, which has influenced many aspects of development in database systems. We therefore consider the question of whether the transaction is supported by a particular system to be the ACID test of the system's quality.
The authors of [Haerder] say that the section quoted above relies on the concept of a transaction in [Gray1]. In that paper, Jim Gray writes: "The transaction concept derives from contract law". And further:
The transaction concept emerges with the following properties:
Consistency: the transaction must obey legal protocols.
Atomicity: either it happens or it does not; either all are bound by the contract or none are.
Durability: once a transaction is committed, it cannot be abrogated.
Haerder and Reuter added "Isolation" to the three properties introduced by Gray.

Trying to make sense

Neither [Gray1] nor [Haerder] provides an exact definition of the transaction properties, and I believe this was never the intention of the authors.

I have two main issues with the standard use of the term "ACID" as defined by [Haerder] (or Wikipedia).
  1. The ACID properties are not well-defined. There are multiple interpretations. A more stringent formal definition would be useful. These properties should and can be defined with mathematical precision.
  2. My preferred interpretation of "atomicity" would imply the "isolation" property.
I interpret the all-or-nothing notion of atomicity to mean that a reader of the database will see either all or none of the updates that a transaction makes to the database. Let's call this interpretation Atomic1.

The current text from Wikipedia on Atomicity says:
In an atomic transaction, a series of database operations either all occur, or nothing occurs. A guarantee of atomicity prevents updates to the database occurring only partially, which can cause greater problems than rejecting the whole series outright. In other words, atomicity means indivisibility and irreducibility.
The text says that either all operations occur or nothing occurs and that atomicity means indivisibility. To me, this wording implies isolation of transactions, since a transaction cannot see the partial updates made by another transaction.

Another interpretation of atomicity (let's call it Atomic2) is that a transaction either completely survives a database crash or is completely ignored when the database restarts. While the database is running, atomicity (Atomic2) means that eventually either all updates of the transaction are visible to all readers or none of them is visible to any reader. However, while the transaction is running, readers may see uncommitted updates. With this interpretation of atomicity, the isolation property makes sense. We could then have transactions that have the atomicity property, but not necessarily the isolation property.
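
The difference between the two interpretations can be illustrated with a minimal Python sketch. The two store classes and the transfer scenario are hypothetical, made up for illustration only; they are single-threaded toys and model neither crash recovery nor any real database. The "reader" is simulated by a read in the middle of the transaction.

```python
class Atomic2Store:
    """Writes hit shared state at once; abort restores a snapshot (no isolation)."""

    def __init__(self, data):
        self.data = dict(data)
        self._undo = None

    def begin(self):
        self._undo = dict(self.data)    # kept so that abort can undo everything

    def write(self, key, value):
        self.data[key] = value          # immediately visible to any reader

    def commit(self):
        self._undo = None

    def abort(self):
        self.data = self._undo          # all-or-nothing in the end
        self._undo = None

    def read(self, key):
        return self.data[key]


class Atomic1Store:
    """Writes are buffered; readers only ever see committed state."""

    def __init__(self, data):
        self.committed = dict(data)
        self._pending = None

    def begin(self):
        self._pending = dict(self.committed)

    def write(self, key, value):
        self._pending[key] = value      # invisible until commit

    def commit(self):
        self.committed = self._pending  # one reference swap publishes all writes
        self._pending = None

    def abort(self):
        self._pending = None

    def read(self, key):
        return self.committed[key]


def transfer_and_observe(store):
    """Move 10 from a to b; return the total a reader sees mid-way and at the end."""
    store.begin()
    store.write("a", store.read("a") - 10)
    mid_total = store.read("a") + store.read("b")   # simulated concurrent reader
    store.write("b", store.read("b") + 10)
    store.commit()
    return mid_total, store.read("a") + store.read("b")
```

Starting from a = 100 and b = 0, the reader of the Atomic2 store sees a total of 90 in the middle of the transfer (a dirty read), while the reader of the Atomic1 store sees 100 throughout. Both stores end up in the same final state.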

With the Atomic2 interpretation, the isolation property is meaningful and the ANSI SQL isolation levels are useful. However, there are multiple problems with those levels; see [Gray2]. The definitions of the isolation levels are ambiguous, and the set of isolation levels is biased towards pessimistic concurrency control (locking). These isolation levels are sometimes not useful for optimistic concurrency control.

The simplest solution would be for databases that claim to support transactions to always use the isolation level SERIALIZABLE. Then, the isolation property can be dropped and ACD (with the Atomic1 interpretation) would be the only properties of interest. There would be no inconsistent reads: no dirty reads, non-repeatable reads, or phantom reads. This would obviously make the world a little bit simpler for us programmers. I will probably write about the feasibility of not relaxing the isolation property of transactions in a later blog post.

The database I am working on, BergDB, only supports the strongest isolation level: SERIALIZABLE. Actually, the only way to change the state of the database is through strong transactions. With optimistic concurrency control (STM), there is little to gain by introducing the read anomalies that can occur due to lack of isolation between transactions (dirty reads, non-repeatable reads, and phantom reads).
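
To make the idea of optimistic concurrency control concrete, here is a minimal Python sketch. It is not BergDB's implementation or API; the class `OptimisticDB`, its methods, and the `deposit` helper are hypothetical names invented for this example. It assumes the whole database is one immutable value guarded by a single version counter, which is the coarsest possible form of the technique: a transaction computes a new state from a snapshot without holding a lock, and the commit succeeds only if no other transaction has committed in the meantime.

```python
import threading


class OptimisticDB:
    """Whole-database optimistic concurrency control (illustrative sketch)."""

    def __init__(self, state):
        self._state = state              # treated as immutable by convention
        self._version = 0
        self._lock = threading.Lock()    # guards only the short validate-and-swap

    def snapshot(self):
        with self._lock:
            return self._version, self._state

    def try_commit(self, read_version, new_state):
        with self._lock:
            if self._version != read_version:
                return False             # someone else committed first: conflict
            self._state = new_state
            self._version += 1
            return True

    def run(self, transaction):
        """Apply a pure function state -> state, retrying on conflict."""
        while True:
            version, state = self.snapshot()
            if self.try_commit(version, transaction(state)):
                return


def deposit(amount):
    # Returns a transaction that builds a new state instead of mutating the old one.
    return lambda state: {**state, "balance": state["balance"] + amount}
```

Because every successful commit is validated against the exact state it was computed from, the committed transactions form a single serial order, and readers of `snapshot()` never observe uncommitted work. A real system would track read and write sets at a finer granularity to reduce conflicts.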

ACD transaction

Let us consider the original proposal by Gray with three essential properties of a transaction: atomicity, consistency, durability. Let us further consider a database with a global state that evolves over time from one state to the next by executing a transaction. Such transactions are easy to define in a precise way. Below, a transaction, f, is defined as a mathematical function that takes the database from one state to the next.
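
One way to write such a definition down is sketched here in LaTeX. This is a sketch under assumptions, not a canonical formulation: the state set \mathcal{S}, the legality predicate C, and the indexing of the transactions f_i are notation chosen for illustration.

```latex
\begin{align*}
  &\text{Database:}    && S_0, S_1, S_2, \ldots \in \mathcal{S} \\
  &\text{Transaction:} && f_i : \mathcal{S} \to \mathcal{S}, \qquad S_i = f_i(S_{i-1}), \quad i \ge 1 \\
  &\text{Atomicity:}   && \text{a reader observes only some } S_i, \text{ never a state between } S_{i-1} \text{ and } S_i \\
  &\text{Consistency:} && C(S_{i-1}) \implies C(S_i), \qquad C : \mathcal{S} \to \{\text{true}, \text{false}\} \\
  &\text{Durability:}  && \text{once } S_i \text{ exists, it can still be read at any later time}
\end{align*}
```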


The first "property", atomicity, is part of the definition of the database and transactions. It implies that a reader can only read the distinct database states S0, S1, ... There is no visible state between those states. The other two properties are independent. We may have a transaction that is not consistent or one that is not durable. However, according to the definition, it would not be a "transaction" then.
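
The same model can be sketched as a small Python program. Everything here is hypothetical and for illustration only: the `Database` class, the `no_negative_balances` predicate standing in for "legal state", and the `transfer` helper. Durability is deliberately left out, since the history lives only in memory.

```python
class Database:
    """A database as a sequence of states S0, S1, ... (illustrative model)."""

    def __init__(self, initial_state, is_legal):
        if not is_legal(initial_state):
            raise ValueError("initial state is not legal")
        self._states = [dict(initial_state)]   # history S0, S1, ...
        self._is_legal = is_legal

    def read(self):
        # Readers get a copy of the latest complete state, never a partial one.
        return dict(self._states[-1])

    def execute(self, f):
        # f is a pure function from one state to the next.
        candidate = f(self.read())
        if not self._is_legal(candidate):
            raise ValueError("result is not a legal state, so f is not a transaction")
        self._states.append(candidate)
        return len(self._states) - 1           # index i of the new state S_i


def no_negative_balances(state):
    return all(balance >= 0 for balance in state.values())


def transfer(src, dst, amount):
    def f(state):
        new_state = dict(state)
        new_state[src] -= amount
        new_state[dst] += amount
        return new_state
    return f
```

With an initial state of a = 100 and b = 0, `execute(transfer("a", "b", 30))` produces S1, while a transfer of 500 is rejected and leaves the latest state untouched. Atomicity needs no separate mechanism in this model: a reader can only ever obtain one of the stored states.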

The above definition of durability is intentionally vague. It is an important concept worthy of its own discussion. There is no such thing as a 100 per cent durability guarantee. Data could always be lost. Instead, the probability that a transaction result can be read in the future is a more accurate (but more complicated) model.

Conclusion

In my opinion, the original definitions of the ACID properties of database transactions are imprecise and confusing. There are multiple interpretations, and there seems to be no precise mathematical definition of them (please tell me if you know of one). I would prefer to work with the original three properties proposed by Gray: atomicity, consistency and durability. This set of properties can easily be defined in a precise way. An attempt to do so was made in this article.

References

[Gray1] The Transaction Concept, Jim Gray, 1981.

[Gray2] A Critique of ANSI SQL Isolation Levels, Berenson, Bernstein, Gray et al., 1995.

[Haerder] Principles of Transaction-Oriented Database Recovery, Haerder and Reuter, 1983.

3 comments:

  3. I see this is an old blog, but I'd love to hear views on Consistency. It is variously defined as being about integrity (but that is wrong because integrity constraints are dealt with elsewhere by the DBMS), enforcing business rules (that is wrong, a transaction can violate business rules if you program it to do so - nothing in ACID will stop that) and legal states (the approach taken here). That leaves the question: What is a legal state? For me, consistency means that any read from the DB will see the data that the transaction intended it to see. Atomicity and isolation both work to ensure that happens by making sure reads don't see partial results. Consistency adds to atomicity (in my opinion) by requiring that all the work needed by a transaction is protected from being read.

    Do you agree?
