Skip to content

CAP is not just for your head.

August 24, 2013

Today i like to write about an important theorem in distributed computer systems. I’m sure you notice the subject of this post is about CAP  theorem (also known as Brewer’s theorem). Eric brewer is the man who proposed CAP theorem in 2000.

CAP is the acronym of three words:

Consistency: All nodes must read the latest changed data, In the other word every node in our distributed system should read same data. If a write operation occured in one of the nodes, Reading same data from the other node must return the latest write ( When system received something newer, then must not return any of the older data items )

Availability: There must not be any request what is blocked with any of nodes, All of the requests must have a response about the status of request.

Partition Tolerance: The system continues its convenient tasks even any of the messages lost or there are some failure parts in system.

CAP theorem is about the impossibility of having all of these attributes together in a system.  ٍEvery distributed system at most can have two of these three attributes. Most of the references introduce CAP as an triangle which a distributed system can have just two of its angles.

CAP Triangle

CAP Triangle

Examples of Consistency + Availability are:

  • Single-node Databases
  • Cluster Databases
  • LDAP
  • xFS file system

Examples of Consistency + Partition Tolerance are:

  • Distributed Databases
  • Distributed Locking
  • Majority Locking

Examples of Availability + Partition Tolerance are:

  • Coda
  • Web caching
  • DNS

You can read the formal proof of CAP theorem in : Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-tolerant Web. It is not very hard and with reading this article can clear everything for you.

But in this post i proof this concept with some justifications.

Assumption: In a simple distributed system we have two nodes: NODE A & NODE B.  a client writes “DataItem1″ on NODE A and at the same time a client request to read the “DataItem1″ on NODE B.

Assume we have CA environment, so all the data in all nodes are consistent and all of the nodes can execute every query, If all the messages between nodes fail then a query to node B cannot have the latest value of data item. As you see there are some situations that we can not have “CA” environment with “P”.

Assume we have CP environment, so all the data in all nodes are consistent and there is partition tolerance attribute. Now if before writing “DataItem1″ on NODE A connection between two nodes break, Requesting to node B cannot execute our query, So web missed availability. Node B wants to synch its data with NODE A but the connection is broken so response cannot be available.

At last assume we have PA environment.  So every request from nodes will have a response and partition tolerance permits our system to continue its tasks with any message and system failures. If client writes “DataItem1″ on NODE A and in the same time other client send request for “DataItem1″  to Node B and the connection between two nodes are broken then client will read old version of “DataItem1″.

Note: It is possible to have delay in communication and  synchronizing between the nodes, It is the most important reason which a PA system cannot have consistency at all. In these environments we have partial consistency between our nodes.

About these ads
One Comment

Trackbacks & Pingbacks

  1. CAP is not just for your head. | IT World Web

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: