196 points by tripdout 13 hours ago | 17 comments
refibrillator 11 hours ago
The CAP theorem is great, though it does omit latency, which leads you to its logical extension, PACELC [1]:

If there is a network Partition you must choose Availability or Consistency, Else the tradeoff is Latency vs Consistency.

This offers practical intuition for certain design choices.

For example, Google Spanner [2] is a distributed DB that offers globally consistent reads/writes, but to achieve high performance (low latency) nodes must synchronize with multiple extremely accurate reference clocks (GPS & atomic) and follow a complex two-phase commit protocol that ensures transactional linearizability using the lower bounds and uncertainty of timestamps.
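
To make the latency cost concrete, here is a minimal sketch of the commit-wait idea, assuming a TrueTime-style clock API (the interval function and the uncertainty value are illustrative stand-ins, not Spanner's actual interface):

    import time

    EPSILON = 0.007  # assumed clock uncertainty bound, in seconds

    def truetime_now():
        # Hypothetical TrueTime-style clock: returns an interval
        # [earliest, latest] guaranteed to contain the true time.
        t = time.time()
        return (t - EPSILON, t + EPSILON)

    def commit():
        # Pick a commit timestamp no earlier than any possible "now".
        _, latest = truetime_now()
        commit_ts = latest
        # Commit-wait: block until commit_ts is definitely in the past,
        # so no later transaction can be assigned a smaller timestamp.
        # Tighter clocks mean a smaller EPSILON and lower commit latency.
        while truetime_now()[0] < commit_ts:
            time.sleep(0.001)
        return commit_ts  # only now is it safe to acknowledge the write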

As another example, your multicore CPU leverages a cache coherency protocol that faces remarkably similar tradeoffs. Perhaps others have made this connection before…it does feel like some sort of universal law of physics.

[1] https://en.m.wikipedia.org/wiki/PACELC_theorem

[2] https://static.googleusercontent.com/media/research.google.c...

danielheath 8 hours ago
Is a Partition meaningfully distinguishable from especially high Latency?
Retr0id 1 hour ago
I'm sure I'm far from the first, but I've also made that connection before. I think I phrased it "modern CPUs are just very small distributed systems"
ojkelly 1 hour ago
> Perhaps others have made this connection before…it does feel like some sort of universal law of physics.

This has been on my mind too, and I can’t help but think the fundamental concept underpinning it all is causality.

After diving into Spanner and reading about “consistency as logical monotonicity” (CALM [0]), I’m convinced there’s definitely something to databases beyond CAP.

I’ve yet to find a simple and clear law or theorem that captures what you’re hinting at, but it feels like we must be getting close. This has been bouncing around my head for a few years, since I first wrote a toy CRDT DB.

It seems to show up anywhere we have more than one system with independent memory (a place where state can be held) that needs to maintain a shared representation of, or fact about, something.

Within one memory (akin to a reference frame in quantum physics), we can do anything we want. We can mutate state without any interaction or interference from the outside world. This also sounds like a pure function. By itself, this is academic, theoretical; to the outside world it does not exist. If a tree falls in the woods…

So if we want to interact with any other systems, we then need to coordinate, and the question is how.

The issue and pattern seem to rhyme everywhere: CPUs, HDDs, SSDs, file systems, networks, UIs, people, teams, etc. The best possible collaboration seems to be that which requires the least coordination. Money is a good example here: someone can string together a series of products from companies that know nothing about each other, by coordinating with money as a means of exchange. Not to mention being able to buy complex technology with no idea of the supply chain behind it. I don’t have to coordinate with a mine to use a computer that contains something from said mine.

It sort of looks like distributed databases build a virtual memory on top of many smaller memories, which need to share some parts with each other to protect the system as a whole.

New information may need a few writes before it can be considered a new fact. I think this is an implementation detail, in that it’s irrelevant to the observer (who has no control over it).

This isn’t eventual consistency, which is perhaps the cruder form of this, where the implementation detail above is wide open for all to see. Instead, new information is available immediately; it’s just that your definition of “immediately” is different from the database’s.

It follows, then, that as an observer of the system you cannot violate causality in any way by learning information from the future while you are still in the past.

My understanding from Spanner is that when you ask for a read, you provide or are issued a timestamp which provides the causal information to determine what you are allowed to know.

The system can then maintain both a true and correct representation of what it knows, and an incomplete working memory of what it is trying to know next (the writes which need to be committed into multiple memories).

Memory being anything from RAM, SSD, carrier pigeon, lines in sand, etc.

I think where this breaks most of our brains is that it’s a universe simulation.

And both time and causality are fundamental invariants of the stability of the system, but are leaky abstractions that we need to deal with.

In CALM this is abstracted into what is effectively entropy. If your program never moves backward in time / loses entropy, it’s CALM (I think). In earlier works I think Lamport and vector clocks were used; in Spanner it’s based on very precise measurements of our own time, where the maximum speed of information (i.e. the speed of light) is the greater of the smallest unit of time we can measure (the precision of the GPS clock) and the time it takes for new data to become available in the system.
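
For reference, the Lamport clock mentioned above fits in a few lines; a minimal sketch of the classic rules (increment on local events and sends, and on receive jump past the sender’s stamp):

    class LamportClock:
        # Logical clock: orders events by causality, not wall time.
        def __init__(self):
            self.time = 0

        def local_event(self):
            self.time += 1
            return self.time

        def send(self):
            # Stamp outgoing messages with the incremented time.
            self.time += 1
            return self.time

        def receive(self, msg_time):
            # Jump past the sender's stamp so the receive event is
            # ordered after the send; causality is never violated.
            self.time = max(self.time, msg_time) + 1
            return self.time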

The other part where this differs from the real world is that the speed of information, the latency of a request, is different for reads and writes. Not true in the quantum world, where everything is a write (I think). Then, consider that in our own reference frame we can do a huge amount of work while waiting for a db read/write, something that would violate the speed of light if not in our virtualised world.

We cannot break causality in the world we live and breathe in, but we do it all the time in our simulated ones.

[0] https://arxiv.org/abs/1901.01930

shepherdjerred 9 hours ago
CAP doesn't omit latency. Availability essentially is latency.

PACELC only says that even during normal operation you still need to make a tradeoff between availability and latency, whereas CAP only deals with a tradeoff during a partition event.

tux3 3 hours ago
The A in CAP doesn't mean what people think it means: it has nothing to do with nodes being up, or crashes, or latency, or SLAs, or any abnormal dysfunction.

Availability in CAP is purely a software decision; it's an algorithmic property. Roughly speaking, it is the decision to continue accepting requests during a partition, possibly sacrificing Consistency. If you refuse requests during a partition, you preserve Consistency and lose Availability. If you keep accepting requests during a partition, it's the opposite.

High latency is considered a partition in CAP. A hardware error, network trouble, a bug, a crash, an unreachable machine: none of these is ever an Availability issue.
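
To make that concrete: the choice is literally a branch in the request handler. A toy sketch (the node API here is hypothetical):

    class ServiceUnavailable(Exception):
        pass

    def handle_read(node, key, mode="CP"):
        # "mode" is the design decision CAP forces during a partition.
        if node.partitioned_from_quorum():
            if mode == "CP":
                # Refuse the request: Consistency preserved, Availability lost.
                raise ServiceUnavailable("cannot reach quorum")
            # mode == "AP": answer from local, possibly stale, state.
            return node.local_store.get(key)
        # No partition: an ordinary consistent read.
        return node.quorum_read(key)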

shepherdjerred 32 minutes ago
> If you refuse requests during a partition, you preserve Consistency and lose Availability. If you keep accepting requests during a partition, it's the opposite.

I agree

> High latency is considered a partition in CAP.

Can you support this claim? I don't think this is true unless you're specifically talking about high latency between nodes rather than latency between the sender and the system.

rtpg 9 hours ago
Availability's definition:

> every request received by a non-failing node in the system must result in a response

I don't believe that encodes latency

Instead, here's consistency:

> any read operation that begins after a write operation completes must return that value, or the result of a later write operation

"after a write operation completes" feels like where latency kicks in? Because within that space you can play around with completing a write operation to get what you want.

shepherdjerred 9 hours ago
It's impossible to distinguish between high latency and unavailability. You can model unavailability as infinite latency.

In a real system you can wait for a partition event to resolve itself before replying, which preserves consistency at the cost of latency (availability).

nextaccountic 9 hours ago
> In a real system you can wait for a partition event to resolve itself before replying, which preserves consistency at the cost of latency (availability).

This is true, but PACELC states that even if there is no partition you still have a consistency vs latency tradeoff (because the processes that guarantee consistency eat up latency in the form of network roundtrips).
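
A back-of-the-envelope sketch of that cost (the numbers are made up for illustration):

    LOCAL_READ_MS = 0.5       # assumed: answered by the local replica alone
    INTER_NODE_RTT_MS = 30.0  # assumed: round trip to remote replicas

    def read_latency_ms(consistent):
        if not consistent:
            return LOCAL_READ_MS  # fast, but possibly stale
        # A quorum read contacts replicas in parallel and waits for a
        # majority, so it pays at least one inter-node round trip.
        return LOCAL_READ_MS + INTER_NODE_RTT_MS

    print(read_latency_ms(False))  # 0.5  -> latency-favoring (EL)
    print(read_latency_ms(True))   # 30.5 -> consistency-favoring (EC)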

shepherdjerred 9 hours ago
100%, I don't think what we're saying conflicts. There is _always_ a tradeoff between consistency and availability whether you're thinking of PAC or PACELC. PACELC just makes it explicit that this tradeoff exists whether you're partitioned or not.
mrfox321 2 hours ago
simultaneity is relative.

That's all.

stevebmark 9 hours ago
I prefer the walkthrough of the CAP theorem in “Designing Data Intensive Applications,” which says that the CAP “theorem” is an undefined and generally unhelpful concept in distributed systems. “Available” has no formal definition.

And it’s confusing that it’s three letters, because it’s not “choose two”. It’s not that a system is “partition tolerant”; it’s that if there’s a network partition, you choose availability or consistency. And obviously you choose availability for the distributed systems you most commonly encounter.

dijit 1 hour ago
> And obviously you choose availability for the distributed systems you most commonly encounter.

Failing safe (rather than failing open) is generally regarded as the most acceptable thing we can do as CRUD programmers; that’s true. However, it’s categorically untrue that this is a foregone conclusion.

There are many cases (especially in financial services) where failing safe is a much preferable option, and having retry logic in the application is much preferred.

yas_hmaheshwari 5 hours ago
I also like how he says that the theorem was a good theorem for its time but is now not that relevant.

Maybe PACELC is a better theorem, maybe not - but I guess a three-letter acronym will always rule over a six-letter one.

sethammons 1 hour ago
Esp since CAP is easily pronounced. "PACELC"? Is it "Pace-elk"? "Pak-Elk"? "Pacel-C"?
tajd 1 hour ago
For the interested reader, there's a paper talking about this by Martin Kleppmann (the author of DDIA): https://arxiv.org/abs/1509.05393
jeremyjh 1 hour ago
The formal definition for Available is given in TFA.
puzzledobserver 6 hours ago
One thing that I've always wondered about: Is the CAP theorem really making a non-trivial claim?

If a distributed system is consistent and available, it must return the last written value, and successfully do this all the time. How can it do this if the system is partitioned, and the node receiving the last write was unable to propagate this to other nodes?

The proof described in this website appears to confirm this. Am I missing something?

wesselbindt 5 hours ago
You're not missing something. In the original paper the proof is half a page long (and still worth a read, very low investment, decent payoff). I think the value of the CAP theorem is not so much in its statement (CP or AP, choose one), but rather in its proof. It helps you cook up similar arguments/reasonings for more interesting situations and constraints.
dijit 1 hour ago
I think it’s also very interesting to point out that very often people try to choose all three but end up choosing only one inadvertently.

We should also point out that it’s not a foregone conclusion that you will even be able to achieve two of three.

CAP theorem really is a best case scenario.

vzaliva 10 hours ago
The text left me wanting a more formal treatment of the theorem. I went ahead and found a paper which does just that:

https://dl.acm.org/doi/10.1145/564585.564601 PDF: https://users.ece.cmu.edu/~adrian/731-sp04/readings/GL-cap.p...

fithisux 7 hours ago
I was always wondering the same. Thank you for the references. Above I asked for a mechanized proof.
magnio 10 hours ago
The "proof" is kind of weird. We assume there exists a system that has all three of CAP, but how can we assume that system has the layout in the post with two servers and one client?
sushibowl 9 hours ago
The proof is not really formal, but you could view the shown system as a minimal subset of an arbitrarily shaped large system.

For the system to be distributed it must have at least two nodes, and to be available all nodes must respond to requests. So however the rest of the system is shaped, the proof still holds.

dmurray 6 hours ago
The minimal subset is too small. It should have a second client, to avoid solutions like "the client stores a copy of all the messages it sent, so it can detect the stale value".
jeremyjh 58 minutes ago
The theorem has nothing to do with what the client knows. It’s only concerned with the servers’ responses.
evertedsphere 6 hours ago
what if the graph remains connected when you remove that link, so the two nodes can communicate through other nodes?
dmurray 6 hours ago
Partitioning is defined so that all, or at least arbitrarily many, of the communication links may be down.

In practice you're absolutely right and one approach to distributed systems is to make partition tolerance less important by having lots of redundant links instead.

jascha_eng 11 hours ago
This is trivial, but the problem is that consistency especially is not a binary choice. Heck, even non-distributed systems can give you varying consistency guarantees; that's the whole point of the isolation levels of most RDBMSes.

It's good to visualize that there is a trade-off to be made. However, that trade-off does not have to be binary. You can get a bit more consistency for a little less partition tolerance or availability. All of Designing Data Intensive Applications is about those trade-offs.

dilyevsky 9 hours ago
Consistency in CAP and Consistency in ACID have entirely separate meanings.
gpderetta 5 hours ago
The C in ACID has a different definition of Consistency, yet the combination of guarantees given by ACID should also imply Consistency in the distributed system sense, right? I.e. a distributed database that claims to be ACID cannot sacrifice consistency [edit: at least for some isolation levels].
anonymousDan 4 hours ago
Isolation (the I in ACID) is more closely related to the notion of consistency in the distributed system community.
anonymousDan 4 hours ago
Consistency in the case of CAP refers to linearizability.
whatshisface 12 hours ago
How about this:

1. A read is returned with a map of the network the replying node has become consistent with.

2. A client that is not satisfied with the map can read again from the other partition.

These two steps get around the proof by changing one of its assumptions (that the operation to read a value can only involve a request to one node).

anderskaseorg 12 hours ago
You’re certainly allowed to make the client a more active participant in your consensus protocol, but then it needs to play by the same rules if you want the system to have guarantees. For example, you need to handle network partitions between clients and some servers, and you need to be able to reconcile multiple reads from servers that might have seen different sets of writes. The CAP theorem still applies to the system as a whole.
Dylan16807 10 hours ago
Do you have a plan for doing writes in that scenario? It's not proper availability without writes.

If the partitions are configured for consistency, and they can't hear from each other, then at most one will accept writes. If the client relays metadata between the partitions to make the write happen in both, then you don't actually have a partition anymore.

Also in practical terms, the nodes almost always have better contact with each other than the client has to the nodes. The situation where the client can connect to both sides of a partition is unlikely enough that if you add code for it you're probably getting negative value. It wastes time and can add scary bugs.

fallingsquirrel 12 hours ago
Consider the following sequence* of events:

1. client A reads from partition X

2. client A is unsatisfied with the network map, and requests another read from partition Y

3. (meanwhile) client B writes a value to partition X

4. client A reads from partition Y, sees the value is the same as partition X's (stale) value, and accepts the value as consistent

This is the same kind of behavior you might get if your servers used e.g. a buggy version of Raft or something. You can't get around the proof by just relabeling some of your server nodes as client nodes.

* In the spirit of distributed systems, I use this term loosely :)
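
For anyone who wants to poke at it, the interleaving fits in a few lines (a toy model, with two dicts standing in for the partitions):

    # Toy model: each partition is a dict that cannot see the other.
    X = {"k": "v0"}  # partition X
    Y = {"k": "v0"}  # partition Y

    a_first = X["k"]            # 1. client A reads "v0" from X
                                # 2. A is unsatisfied, plans a read from Y
    X["k"] = "v1"               # 3. client B writes "v1" to X; never reaches Y
    a_second = Y["k"]           # 4. A reads "v0" from Y
    assert a_first == a_second  # the values match, so A accepts "v0"...
    # ...but the latest committed value is X["k"] == "v1": a stale read.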

Dylan16807 12 hours ago
The value is now stale, but it was correct at some point between the read starting and the read finishing, right?

That can happen with any read. Even without any partitions. Can't it?

In that case I don't see the problem.

lmm 11 hours ago
> The value is now stale, but it was correct at some point between the read starting and the read finishing, right?

Not necessarily. Maybe both versions of it were from partial writes that were never committed, so your invariants are violated (if we're talking about e.g. a credit account A and debit account B scenario).

> That can happen with any read. Even without any partitions. Can't it?

Depends on your isolation level. If your system has serializable transactions then it's supposed to give you a history equivalent to one where all transactions were executed serially, for example.

Dylan16807 10 hours ago
> Not necessarily. Maybe both versions of it were from partial writes that were never committed, so your invariants are violated (if we're talking about e.g. a credit account A and debit account B scenario).

I'm pretty sure the scenario above is looking at committed writes.

If you're reading uncommitted writes, you're not really in the market for consistency to begin with. (Or you could be handling consistency by waiting to see if the transaction succeeds or fails, making sure it would fail if the data you read got backed out. But in that situation nothing goes wrong here.)

> Depends on your isolation level. If your system has serializable transactions then it's supposed to give you a history equivalent to one where all transactions were executed serially, for example.

Even then, a new write can happen before your "read finished" packet arrives at the client, making the read stale. Your entire transaction is now doomed to fail, but you won't know until you try to start committing it.

For pure read operations, I'm not convinced it's a proper stale read unless the value was stale before the read operation started.

lmm 9 hours ago
> I'm pretty sure the scenario above is looking at committed writes.

> If you're reading uncommitted writes, you're not really in the market for consistency to begin with.

But what does "committed" mean when you're only reading from one partition in a partitioned scenario? You literally can't tell whether what you're reading is committed or not (or rather, you have to build your own protocol for when a write is considered committed).

> For pure read operations, I'm not convinced it's a proper stale read unless the value was stale before the read operation started.

I think you can get a read that is half from a prior stale operation and half from a subsequent uncommitted operation, or something on those lines.

Dylan16807 8 hours ago
> But what does "committed" mean when you're only reading from one partition in a partitioned scenario?

I would say you can't make new commits in that situation? I don't know; I didn't make up the scenario. I think you need to figure out your own answer and/or get clarification from fallingsquirrel if you want to talk about that kind of problem.

> I think you can get a read that is half from a prior stale operation and half from a subsequent uncommitted operation, or something on those lines.

What's the full timeline for that?

If you're specifically talking about the ABA problem, that's trivial to fix with a generation counter.
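
For concreteness, a minimal sketch of that fix: keep a counter next to the value, so A->B->A is distinguishable from "unchanged".

    class VersionedCell:
        # Value plus a generation counter.
        def __init__(self, value):
            self.value = value
            self.gen = 0

        def read(self):
            return self.value, self.gen

        def compare_and_swap(self, expected_gen, new_value):
            # Succeeds only if nothing was written since the generation
            # we read, even if the value itself has cycled back.
            if self.gen != expected_gen:
                return False
            self.value = new_value
            self.gen += 1
            return True

    cell = VersionedCell("A")
    _, g = cell.read()
    print(cell.compare_and_swap(g, "B"))  # True
    print(cell.compare_and_swap(g, "C"))  # False: the generation moved on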

lmm 10 minutes ago
> I think you need to figure out your own answer and/or get clarification from fallingsquirrel if you want to talk about that kind of problem.

Right. The point is that fallingsquirrel hasn't solved the hard part of the problem.

> If you're specifically talking about the ABA problem, that's trivial to fix with a generation counter.

Maybe, but you need to specify that that's what you're doing, and it may come with undesirable consequences.

satisfice 12 hours ago
How can it know what it is consistent with? Anything could have happened since the last update. All it knows is that it most recently got an update at a certain time. It may or may not be consistent.
dmitry-vsl 12 hours ago
So, it's impossible to transfer a value from one machine to another if there's no network connection between them? How did this extremely trivial observation become a well-known theorem?
crdrost 8 hours ago
So, when it was originally phrased, the primary thing that you would have learned about databases was that they enable ACID transactions. (And if you were a practitioner you would have learned about the various isolation levels and dirty reads and dirty writes.)

But if you wanted to go from this level to implementation, typically you could get a prototype working, but it would be slow. When things are slow, the WD-40 of the programming world is to add a cache. And this is where you get the quip that there are only two hard problems in computing, cache invalidation and naming things. (The “and off-by-one errors” is a later addition.) The problem is that cache invalidation shows up as consistency bugs in your database. Someone does a rare combination of commits and rollbacks and some cache doesn't get wiped quite as fast as it needs to, or is wiped overoptimistically causing a pull from uncommitted data, and your isolation level has dropped to READ UNCOMMITTED.

The CAP theorem was originally raised as a conjecture, something like, “once you shard the database, I don't think there's any way to solve these cache problems without one of the replicas just going silent for arbitrarily long pauses while it tries to at least partially synchronize its cache with the other shards.” Phrased that way, you can understand why it was a conjecture, it relies on nobody at MIT having a super clever way to deal with caches and cleverly routing the synchronization around the sharding.

BUT, you can make that statement for many different reasons, and this was not for Pedagogical Reasons, its point was rather Evangelism! The author was attempting to introduce the idea of Eventual Consistency, and gain adoption by ditching all of the wisdom about ACID transactions. This antagonism was deliberate, eventual consistency became the E in a rival acronym BASE. And so the argument was that we could explore a new corner of design-space.

It was later that someone decided they could prove it by coming up with a universal subgraph, “whatever connection you've got, it has to contain this: two nodes, fighting over one value, with a network connection possibly passing through other nodes but we can abstract all that away.” And then you have a proof, and then you have a bunch of people comparing the proof to the stated claims of various database vendors, and finding that over and over they claim to be both shardable with high availability among the shards, and to support ACID transactions that keep everything consistent. It turns out those statements are usually made assuming a happy path!

(You also get Paxos and Raft, “here is how to get consistency without arbitrary latency on two-phase commit via majority vote”, and the Jepsen blog “you said this had consistency level X, let’s fuzz it and see if we can generate a counterexample”, and some interesting exceptions like Datomic saying “this one part is not scalable and it's a single point of failure to sacrifice P for the CAP theorem’s sake, but in exchange we can simplify our C and A guarantees so that you can scale the reads of the system consistently.”)

lmm 11 hours ago
Because, sadly, a lot of system designers want to believe they have a way around it.
dang 10 hours ago
Related:

An Illustrated Proof of the CAP Theorem - https://news.ycombinator.com/item?id=17528817 - July 2018 (71 comments)

ljsprague 12 hours ago
Is availability sacrificed more often than the other two?
shepherdjerred 9 hours ago
It depends purely on the application you're writing. Many applications prefer availability over consistency, but the application is built to cope with inconsistent data.
anonymousDan 4 hours ago
The only way to sacrifice P is to have a single-node (i.e. non-replicated) system. So in practice it's a choice between C and A.
wmf 12 hours ago
Properties are only sacrificed during partitions. Due to Murphy's Law, if you assume partitions never happen, they will happen more often; but if you can tolerate partitions, they happen really rarely.
anderskaseorg 12 hours ago
There are plenty of systems that sacrifice consistency even while the network is fully connected, in the name of performance—for example, DNS, or any system with a caching proxy server.
wmf 11 hours ago
Yeah, CAP is about the best possible behavior a system can have but you can always do worse.
Groxx 11 hours ago
Not really. Both are all over because they address vastly different (literally incompatible) needs.

There is a fair bit of industry hype in the past decade or so around eventually consistent systems, because a lot of components of a business can use them, and that opens up a lot of extremely desirable performance options that can be fine tuned to the moon. But they're a real nightmare to use (i.e. often literally impossible) for anything that needs to be correct, so they are often just used in addition to the more classical consistency-oriented databases like postgres.

mrybczyn 12 hours ago
Thank you! That makes a lot more sense than several other descriptions of the proof I have read (and promptly forgotten!)
lysecret 9 hours ago
I always think of the capital asset pricing theorem first.
fithisux 7 hours ago
Is there a formalization of this proof? Lean, Coq, .... ?
peter_d_sherman 10 hours ago
>"G2 returns v0 to our client after the client had already written v1 to G1. This is inconsistent."

This is absolutely true IF AND ONLY IF client and servers (and data) have no notion of time...

If client and servers are, say, synchronized to a central clock, include a timestamp with any data updated, include timestamps with any response/"done" messages, and check those timestamps for accuracy (the client should perform timestamp checking as well as the servers), then the client could check the different servers it later connects with, to see if they had been updated appropriately or not.

If later-connected-to-servers that should have been updated were not appropriately updated, then the client (if client and server are exchanging data and timestamp information) could decide how to proceed at that point.

If a later-connected-to-server was determined to be inconsistent, then depending upon how the client code was written, it could do many things to mitigate the problem. It could notify other servers that it knows about of data inconsistency on the inconsistent server(s), for example...

Hmm, now that I think about it, on any distributed database, for any given row of data, there should not be one, but TWO timestamp fields, one for the time that the data was updated on the first server it was updated on, i.e., the one the client connected to.

The second timestamp field would be for the time that a secondary given distributed server received the data and processed it...

Those two fields could be separated by mere nanoseconds of time (if distributed servers are tightly coupled), but it could be up to weeks if a secondary server was knocked offline for a long enough time period...

But see, if the client software is engineered properly, and contains the client's history, then the client (or main server, or server-multiplexing proxy) can check the veracity of any arbitrary distributed server it connects to by asking it to "read me back this data", getting its timestamps, and comparing them with the local copies...

All of that, coupled with databases that do not delete/overwrite old records when data is updated, but rather keep a log of all updates with a timestamp (aka "append-only" aka "immutable" databases, such as InfluxDB, Apache Cassandra, CouchDB, SQL Server Temporal Tables, and I think that RethinkDB may have done something like that in the past), should equate to, at the very least, the client being able to know if a given server in a distributed system was updated properly (is consistent), or not...
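
A minimal sketch of the two-timestamp record and the "read me back this data" check described above (the replica API and field names are hypothetical):

    from dataclasses import dataclass

    @dataclass
    class Row:
        value: str
        origin_ts: float   # when the first server accepted the write
        applied_ts: float  # when this replica applied it (>= origin_ts)

    def client_verify(replica, key, local_history):
        # Read the row back and compare it against the client's own copy.
        row = replica.read(key)        # hypothetical replica API
        expected = local_history[key]  # the client's last-known write
        if row is None or row.origin_ts < expected.origin_ts:
            # The replica is behind: inconsistent. The client decides what
            # to do next (flag it, read elsewhere, or re-send the write if
            # it has the data and the authority to do so).
            return False
        return True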

If it were engineered properly, the client itself could determine what to do under such anomalous conditions...

It could even take it upon itself to update the server properly IF it contained a copy of the correct most up-to-date data AND had the authority to do so... which wouldn't be the case for some types of applications (i.e., banking, due to necessary security constraints), but might be the case for other types of applications (i.e., a distributed social media app, where the client is posting to the user's own account and has total permission to update any of the user's data as needed...)

Maybe a better question to ask when dealing with the CAP theorem is not so much "is it true", so much as "what kind of distributed database requires consistency such that the CAP theorem is required?" (i.e., banking), and "what kind of distributed database doesn't require consistency such that the CAP theorem isn't required?" (i.e., distributed social media over a plethora of distributed servers)...

If we see that CAP is required in only some specific types of database/application contexts, then perhaps those specific databases/applications/distributed systems should be individually separated from non-CAP-requiring ones...

This being said, I think Eric Brewer is a brilliant Computer Scientist, and I have tremendous respect for him and the CAP Theorem...

raincole 11 hours ago
What if the client just reads from every server and compares the timestamps?
googledocsftw 9 hours ago
Reading from every server is not necessarily possible unless we assume no network partitions.
af3d 10 hours ago
Unfortunately, timestamps are not immutable. The system clock can be miscalibrated or even deliberately manipulated. It is, however, something of an open question. CAP is still a bit lacking in terms of rigorous mathematical proof (albeit the arguments are pretty convincing).
coolThingsFirst 10 hours ago
It violates the consistency rule. It must get the value that was written.
DarkmSparks 9 hours ago
The problem with the example is it never actually visualises a partition, because the client can always reach both servers.

An actual partition is when clients and servers are partitioned and can't talk to each other.

Solving for the situation presented is fairly trivial, and is pretty much exactly what blockchain systems solve.