CAP Theorem
The CAP Theorem is a classic problem in Distributed Systems, first postulated by Eric Brewer in (Brewer 2000), and formally proved in (Gilbert and Lynch 2002).
Definitions
For (Brewer 2000), a distributed system has some very specific properties and behaviour, being a collected set of Nodes that all share data. A limitation of such systems happens when a write request is followed by a read request.
┌────────┐ │ Client │ └────────┘ [R] ^ | [W] | v ┌−−−−−−−−−−−−−−−−−−−−−−−−−┐ ╎ System ╎ ╎ ┌────┐ ╎ ╎ ┌──────────│ a0 │──┐ ╎ ╎ │ └────┘ │ ╎ ╎ │ │ │ ╎ ╎ ┌────┐ ┌────┐ │ ╎ ╎ │ a3 │───────│ a1 │ │ ╎ ╎ └────┘ └────┘ │ ╎ ╎ │ │ ╎ ╎ ┌────┐ │ ╎ ╎ │ a2 │──┘ ╎ ╎ └────┘ ╎ └−−−−−−−−−−−−−−−−−−−−−−−−−┘
Read | Write |
---|---|
A client can read data from the system by talking to any Node | A client can write data to the system by talking to any Node |
The Theorem states that for any given pair of requests this kind of system can only guarantee two out of three properties:
- Consistency
- The system can read data that is (at least) as fresh as what has been just written.
- Avaliability
- Every request received by a non-failing node in the system must result in a response.
- Partition Tolerance
- The network will be allowed to lose arbitrarily many messages sent from one node to another.
Idea of the Proof
Here is a sketch of the proof provided in (Gilbert and Lynch 2002), consider the same system as before, but focus on a pair of requests (R/W) and two nodes.
[W] ┌──────────────────────────┐ v │ ┌−−−−−−−−−−−−−−−−−−−−−−┐ │ ╎ System ╎ │ ╎ ╎ │ ╎ ┌────┐ ┌────┐ ╎ [R] ┌──────┐ ╎ │ a0 │ ────── │ a1 │ ╎────>│Client│ ╎ └────┘ └────┘ ╎ └──────┘ ╎ ╎ └−−−−−−−−−−−−−−−−−−−−−−┘
Assume, via contraction, that such a system does follow all CAP
properties. Given a network partition between a0
and a1
, the followig will happen:
[W] ┌──────────────────────────┐ v │ ┌−−−−−−−−−−−−−−−−−−−−−−┐ │ ╎ System ╎ │ ╎ ╎ │ ╎ ┌────┐ ┌────┐ ╎ [R] ┌──────┐ ╎ │ a0 │ ──//── │ a1 │ ╎─??─>│Client│ ╎ └────┘ └────┘ ╎ └──────┘ ╎ ╎ └−−−−−−−−−−−−−−−−−−−−−−┘
a1
will never receive an update froma0
, which will break its avaliability promises (by never delivering the new result)a1
sends stale data back to the client, breaking the consistency property.