Hey guys,
I've developed a serializability implementation for MySQL Cluster (NDB Cluster), and you are invited to peer-review it. I believe it is the fifth one in commercial database systems, after MySQL InnoDB's 2PL, PostgreSQL's Serializable Snapshot Isolation, Google Spanner's isolation level (I give a proof in Appendix D of my article; the Google folks may not have known this), and CockroachDB's timestamp-based serializability implementation. The aim is to support database applications that are consistent, large (which usually implies a distributed architecture) and high-performance, a combination that is daunting for those who care about consistency and serializability. This solution to the serializability problem is a 2nd-tier one, which means it doesn't require any coding. So as long as you can manage a MySQL Cluster, you can readily deploy and test your application with it.
This on-going project is hosted @(URL Removed by Staff)
I have also set up a discussion site @ (URL Removed by Staff), in addition to the one on GitHub.
I am posting this here since Google's Spanner is also discussed in my article.
Come check it out if you are interested. Your help is highly appreciated!
The github site for my project is: https://github.com/creamyfish/conflict_serializability
It seems I can't leave a link in this forum. Please search for the following keyword to find my project on GitHub:
creamyfish/conflict_serializability
The subsection 'Durability of consistency' will be pushed up in a few days!
The subsection 'Durability of consistency' has just been pushed up to my GitHub site. Serializability guarantees consistency only if application execution finishes. This subsection, on the other hand, deals with various failure scenarios during execution: system-wide crashes, replication failures and split-brain situations. I am trying to give more confidence to MySQL Cluster users who are willing to adopt my method.
Section 'Type D of the Serializability Theorem' will be pushed up in a few days. It's a major update!
The section 'Type D of the Serializability Theorem' is up. This starts the second part of the article. In the first part, we tried to use type B of the Serializability Theorem to interpret conflicts down to field level, hoping that conflict-based resource contention would be minimized. Unfortunately, the existence of tuple versions in type B implies that updates to a tuple are serialized, even if two updates write non-overlapping subsets of the tuple's fields; this in turn implies tuple locks in a lock-based system and isolation rules like First-Committer-Wins in a lock-free system. This is entirely unnecessary for the writes just mentioned, but we can't get around it, since the current generation of OLTP systems is still tuple-based.
Type C and type D of the Serializability Theorem are based on a tuple-version-free, field-version-only database system, which we hope future generations of OLTP systems will be built upon. If that can be assumed, the dream of minimizing conflict-based resource contention will become a reality. We'll try to provide a road map to bring this dream to life in the next three sections, with this one being the foundation of the latter two.
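To make the tuple-level versus field-level distinction concrete, here is a minimal sketch in Python. It is not taken from my article or from any database engine; the Write record and the two predicate functions are hypothetical names made up for illustration. It only shows that two updates to the same tuple always collide under a tuple-version view, while under a field-version view they collide only if their written field sets overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    """A hypothetical record of one update; not part of any real engine."""
    txn: str            # transaction id
    table: str
    key: int            # primary key of the tuple being updated
    fields: frozenset   # names of the fields this update writes


def conflicts_tuple_level(a: Write, b: Write) -> bool:
    # Tuple-version view: any two updates of the same tuple by
    # different transactions conflict, whatever fields they touch.
    return a.txn != b.txn and (a.table, a.key) == (b.table, b.key)


def conflicts_field_level(a: Write, b: Write) -> bool:
    # Field-version view: same tuple AND at least one common field.
    return conflicts_tuple_level(a, b) and bool(a.fields & b.fields)


w1 = Write("T1", "account", 42, frozenset({"balance"}))
w2 = Write("T2", "account", 42, frozenset({"email"}))
w3 = Write("T3", "account", 42, frozenset({"balance", "email"}))

print(conflicts_tuple_level(w1, w2))  # same tuple, so they conflict
print(conflicts_field_level(w1, w2))  # disjoint fields, so they do not
print(conflicts_field_level(w1, w3))  # both write 'balance', so they do
```

In a tuple-based system the first check is the one that effectively applies, which is why T1 and T2 above would be serialized even though they never touch the same field.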
The section 'Generic generalization of Serializable Snapshot Isolation to a distributed database system' will be up in a few days. It is a major update!
The section 'Generic generalization of Serializable Snapshot Isolation to a distributed database system' is up.
This section paves the way for generalizing Serializable Snapshot Isolation to a distributed database system that can take advantage of type D of the Serializability Theorem. A technical path that uses the First-Committer-Wins rule is then explored. Another technical path, which employs the First-Updater-Wins rule, needs to be deferred until the next section, since it requires a field-level-capable locking system for type D of the Serializability Theorem to apply. The generalization is generic for the following reasons: a generic 'Clock Condition' is explored so that various distributed clock algorithms satisfying it can be deployed for the underlying Snapshot Isolation; various ways of implementing the underlying Snapshot Isolation are demonstrated so that an implementer can choose from them or be inspired by them; and a flag system to eliminate 'dangerous structures' in Cahill's Serializable Snapshot Isolation algorithm is explored so that an implementer has another option besides the hybrid system deployed in PostgreSQL. A characteristic of this technical path is that it doesn't depend on a locking system.
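For readers who haven't met the First-Committer-Wins rule before, here is a minimal single-process sketch in Python. It is only an illustration under simplifying assumptions, not the algorithm from my article: Store, Txn and Aborted are hypothetical names, a single integer counter stands in for the distributed clock, and reads are left out. A transaction takes its snapshot timestamp at begin, buffers its writes, and at commit aborts if any item it wrote has been committed by someone else after that snapshot. No locks are taken anywhere.

```python
class Aborted(Exception):
    """Raised when a transaction loses under First-Committer-Wins."""


class Store:
    def __init__(self):
        self.clock = 0          # stand-in for a distributed clock (assumption)
        self.data = {}          # item -> latest committed value
        self.last_commit = {}   # item -> commit timestamp of latest version

    def begin(self):
        return Txn(self, self.clock)


class Txn:
    def __init__(self, store, start_ts):
        self.store = store
        self.start_ts = start_ts   # snapshot timestamp
        self.writes = {}           # buffered writes, applied at commit

    def write(self, item, value):
        self.writes[item] = value

    def commit(self):
        s = self.store
        # First-Committer-Wins: abort if a concurrent transaction has
        # already committed a newer version of anything we wrote.
        for item in self.writes:
            if s.last_commit.get(item, 0) > self.start_ts:
                raise Aborted(f"write-write conflict on {item!r}")
        s.clock += 1
        for item, value in self.writes.items():
            s.data[item] = value
            s.last_commit[item] = s.clock
        return s.clock


# Usage: items are (table, key, field) triples, i.e. field granularity.
store = Store()
t1, t2 = store.begin(), store.begin()
t1.write(("account", 42, "balance"), 100)
t2.write(("account", 42, "balance"), 200)
t1.commit()                     # first committer wins
try:
    t2.commit()
except Aborted as exc:
    print("t2 aborted:", exc)
```

Because the items in the usage example are (table, key, field) triples, two concurrent transactions writing different fields of the same tuple would both commit here; keying the check on (table, key) alone would give the tuple-level behaviour instead.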
If a field-level-capable locking system is available, then besides the First-Updater-Wins technical path for Serializable Snapshot Isolation, we may also implement a pessimistic system and a hybrid system on top of it. If things go smoothly, I will present all of these in about 6 months. By then, we'll have four technical paths to achieve serializability in a distributed database system, and I'll recommend them to implementers of the relevant distributed database systems, hoping that they can implement at least one of them at the 3rd tier to give us more options at the 2nd tier.
But I don't know if or when this will happen. Even if it does, the serializability implementation I've developed for MySQL Cluster might still win on performance, for the following two reasons:
Most of the relevant distributed database systems are disk-based, while MySQL Cluster is memory-based.
The serializability implementation I've developed for MySQL Cluster is based on a READ-COMMITTED isolation level, which in general allows more concurrency than those based on Snapshot Isolation, like Serializable Snapshot Isolation.
So it looks like we need to hang on to 2nd-tier solutions for at least a while. To make this easier, I will review TiDB and mount a serializability implementation on its READ-COMMITTED isolation level in a few weeks. I will also provide more details on mounting such an implementation on MySQL InnoDB's READ-COMMITTED isolation level, hoping this trio will carry us through until dawn.
After I finish the major parts of this work, I will also provide better software so that it is more accessible to 2nd-tier developers.