From Transactional Consistency to Real-Time Consistency

Luigi Scappin
5 min read · May 2, 2021

Data consistency was designed for transactions, but nowadays it is also key for real-time data management.

A historical perspective

Why is data consistency a key requirement for implementing transactions?

Let’s take a classic example: if money is debited from one account, it must also be credited to another account. In addition we have to make sure that either both operations are successfully completed, or both are rolled back.

At the same time, we must ensure that a query running between the two operations cannot see the debit from the first account without the credit to the second.

We have to make sure that data are always consistent, at any point in time.

This matters even for simple operations like changing the address of a customer in a database. Changing the city and the street should not be separate activities; otherwise a query running in between might find the new city with the old street.
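The bank-transfer example above can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 module (the table name, account names and `transfer` helper are invented for the example): both the debit and the credit happen inside one transaction, and an error rolls both back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Debit src and credit dst atomically: both succeed or both roll back."""
    try:
        # 'with conn' opens a transaction, commits on success,
        # and rolls back if an exception is raised inside the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
            cur = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,))
            if cur.fetchone()[0] < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
    except ValueError:
        pass  # the debit was rolled back together with the never-applied credit

transfer(conn, "alice", "bob", 30)   # succeeds: balances become 70 / 30
transfer(conn, "alice", "bob", 500)  # fails: both balances stay unchanged
```

A query running after either call will always see balances that sum to 100: it can never observe the debit without the credit.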

In the era of analytics and big data, companies started to create consistent and static copies of data into Data Warehouses and Data Lakes, usually for analysing and “extracting value” from data.

Nowadays the concept of the “data driven company” is evolving beyond just “understanding” static data to make decisions, moving towards “using” real-time data to effectively conduct new business.

Building new modern applications that can leverage real-time, ever-changing data in a consistent way is going to be the new big challenge.

BASE is too code centric, ACID is not enough

During the last decade, BASE consistency became trendy among developers.

BASE stands for “Basically Available, Soft state, Eventually consistent”.

  • Basically Available: Rather than enforcing immediate consistency, BASE databases will ensure availability of data by spreading and replicating it across the nodes of a database cluster.
  • Soft State: Due to the lack of immediate consistency, data values may change over time. The BASE model breaks with the concept of a database that enforces its own consistency, delegating that responsibility to developers.
  • Eventually Consistent: The fact that BASE does not enforce immediate consistency does not mean that consistency can never be achieved, especially if data is not changing frequently. However, until consistency is reached, data reads are still possible, even though the results may be inaccurate.
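The three BASE properties can be seen in a toy simulation. This is a deliberately simplified sketch (the `Replica` and `EventuallyConsistentStore` classes are invented for illustration): a write is acknowledged by one replica immediately and propagated to the others later, so a read from another replica may return stale data until replication catches up.

```python
class Replica:
    """A node holding a local copy of the data."""
    def __init__(self):
        self.store = {}

    def read(self, key):
        return self.store.get(key)

class EventuallyConsistentStore:
    """Writes land on one replica and propagate to the others asynchronously."""
    def __init__(self, n):
        self.replicas = [Replica() for _ in range(n)]
        self.pending = []  # replication log not yet applied everywhere

    def write(self, key, value):
        self.replicas[0].store[key] = value  # acknowledged immediately
        self.pending.append((key, value))    # replicated later (soft state)

    def replicate(self):
        """Apply pending writes to all replicas (eventual consistency)."""
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica.store[key] = value
        self.pending.clear()

store = EventuallyConsistentStore(3)
store.write("city", "Milan")
# Before replication, a read from another replica is stale:
print(store.replicas[1].read("city"))  # None
store.replicate()
print(store.replicas[1].read("city"))  # Milan
```

It is exactly this window between the write and the replication that the application developer has to reason about under BASE.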

So the BASE consistency model requires experienced developers who know how to deal with its limitations, and developers must certainly spend a significant amount of time and effort to avoid inaccurate data reads.

Relying on developer code to enforce the data consistency of an application is really too “code centric”, while a “data centric” approach is gaining more and more traction (please see my previous article “From Code Centric to Data Centric”).

I’m not claiming that BASE consistency is wrong outright: it can be good enough when data does not change, or when values are never updated but only appended, as happens with social network feeds, which usually keep huge amounts of data in BASE-consistent, very scalable databases.

For most applications, though, BASE consistency is not enough: we need more data consistency, we need ACID consistency.

ACID stands for “Atomic, Consistent, Isolated, and Durable”.

  • Atomic: Each transaction must be either totally carried out or rolled back to before the transaction started.
  • Consistent: A query will always find only consistent data, at any point in time.
  • Isolated: Intermediate changes produced by a transaction that has not yet completed must not be visible to other transactions or queries.
  • Durable: Changes related to completed transactions will persist even in the case of outages.
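Isolation in particular is easy to demonstrate with SQLite, which uses ACID semantics by default. In this sketch (file name and account data are invented for the example), a second connection reading the database while the first holds an uncommitted update still sees the old value; only after the commit does the change become visible.

```python
import os
import sqlite3
import tempfile

# Use a file-backed database so two connections see the same data.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

writer = sqlite3.connect(path)
writer.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
writer.execute("INSERT INTO accounts VALUES ('alice', 100)")
writer.commit()

reader = sqlite3.connect(path)

# Start a transaction on the writer but do not commit yet.
writer.execute("UPDATE accounts SET balance = 50 WHERE name = 'alice'")

# The uncommitted change is isolated: the reader still sees the old value.
(before,) = reader.execute(
    "SELECT balance FROM accounts WHERE name = 'alice'").fetchone()
print(before)  # 100

writer.commit()

# After the commit, the new value is visible to everyone.
(after,) = reader.execute(
    "SELECT balance FROM accounts WHERE name = 'alice'").fetchone()
print(after)  # 50
```

How a database provides this guarantee efficiently, without blocking readers, is precisely where the implementation details get hard.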

Sounds great, doesn’t it? Yes, sure… but the devil is in the (implementation) details!

Implementing ACID capabilities without impacting concurrency, parallelism, performance and scalability can be very difficult.

The risk is to “serialize” all data accesses, limiting concurrent access to data and thus hurting performance as well as scalability.

Very sophisticated row-level locking algorithms and isolation level methods have been patented by leading database vendors, and although some patents have expired, providing a solid and non-blocking implementation of an ACID database is still very challenging for any database vendor.

One of the challenges for ACID databases is the ability to scale out. Databases that scale out usually do not implement ACID capabilities, or disable them when available, in order to improve both performance and scalability.

However, leveraging modern low-latency network technologies and protocols like RDMA, database vendors have been able to implement grid architectures, like that of Exadata, that are able to scale out while preserving ACID consistency.

Finally, as stated by the CAP theorem, a distributed system cannot simultaneously enforce consistency, remain highly available, and tolerate a network partition. Thus a consistent, reactive and highly available database cannot be globally distributed, due to data gravity.

This is true; however, in my article “How a Distributed Data Mesh can be both Data Centric and Event Driven” I presented a simple way to implement a distributed data mesh of consistent databases, building an event-driven architecture based on event streaming and on the single-writer principle.
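The single-writer principle mentioned above can be sketched very simply. This is a hypothetical illustration (the `EventLog` class, topic names and writer ids are invented for the example): each topic has exactly one writer that appends events to an ordered log, and every other database rebuilds its own consistent local copy by replaying that log, so no two nodes ever compete to update the same data.

```python
from collections import defaultdict

class EventLog:
    """An append-only log enforcing one writer per topic."""
    def __init__(self):
        self.events = defaultdict(list)
        self.owner = {}  # topic -> the single writer allowed to append

    def append(self, topic, writer_id, event):
        # The first writer to touch a topic becomes its single writer.
        owner = self.owner.setdefault(topic, writer_id)
        if owner != writer_id:
            raise PermissionError(f"{writer_id} is not the writer for {topic}")
        self.events[topic].append(event)

    def replay(self, topic):
        """Consumers rebuild consistent local state by replaying in order."""
        return list(self.events[topic])

log = EventLog()
log.append("customers", "crm-db", {"id": 1, "city": "Milan"})

# Another database trying to write to the same topic is rejected:
try:
    log.append("customers", "billing-db", {"id": 1, "city": "Rome"})
except PermissionError:
    pass

print(log.replay("customers"))  # [{'id': 1, 'city': 'Milan'}]
```

Because writes to each topic are totally ordered by a single owner, every consumer that replays the log converges to the same consistent state without distributed locking.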

Summary

Exploiting real-time data is becoming more and more critical for the business.

Using real-time, ever-changing data in a consistent way is going to be the big challenge in developing new digital applications.

Data consistency is back, evolving from transactional consistency to real-time consistency, leveraging the same technology fundamentals while improving scalability and distribution, as well as performance, semantics and access interfaces.
