The design of blockchain networks is guided by several tradeoffs. Overall, any blockchain will be able to achieve, at most, two of the following: Decentralisation, Security, or Scalability. However, the design of blockchains can be divided into further trade-offs to enhance different aspects of its functionality. The additional trade-offs blockchains face are outlined in the following article. Depending on the properties on which the network has settled, participants will face certain advantages and drawbacks. Furthermore, every blockchain aims to solve specific problems. By building a deep understanding of blockchain design principles, it is possible to evaluate the viability of specific blockchain solutions.
Availability vs. Consistency
Take, for example, a decentralized system that is trying to be consistent and available. Essentially, the network should be distributed (e.g. not centralized) and all the nodes should always agree on the same value (consistency). Ultimately, both characteristics should be present at the same time, regardless of any network partition. This makes sense when put it in terms of existing networks such as Ethereum. Ethereum holds billions of dollars of value on a decentralized network. The decentralized collection of nodes maintains the state of the network and the ownership history for digital assets. It would be catastrophic if Ethereum lacked consistency. If half of the network recorded a history different than the other half, then the platform would have essentially forked and would exist in two separate states.
However, theoretically, maintaining this consistency is not possible if a part of the network is suddenly partitioned and unable to communicate with the network as a whole. Either the network ensures that all the available nodes return some response (this response will only be visible within the partition: Availability and no Consistency) OR some nodes are ignored, but those that are messaging are all relaying the same information (Consistency and no Availability — Availability is compromised for part of the network in order to keep consistency within another part). So essentially, the network chooses one of three options. First, the network can ignore a selection of the nodes or include all the messages but recognize that they won’t be agreeing on the same value (Availability, no Consistency). Second, the network can freeze up and simply accept all the incoming messages. It would then stay that way until the messages became consistent again. Or lastly, the network could simply continue working within one partition.
For example, imagine a group of 10 friends texting about their movie plans. They must decide both on a location, a movie, and a time. Ultimately, this network is fault tolerant. The friends can still go see the movie even if there is a network fault (a phone dies, breaks, gets stolen, or the friend simply stops responding). However, the friends will need to choose either consistency or availability to deal with a fault. Let’s say three of the friends stop responding. If the group chooses consistency, they would simply wait for those friends to respond. If the group chooses availability, they would simply accept the “partition” and plan with those friends that responded.
The outcomes of the scenarios, discussed above, are dependent on the principals of the FLP and CAP theorems discussed below. Depending on the trade-offs in a network, it is possible to predict the behaviour of nodes and the state of the system in a scenario where a network partition occurs.
The FLP Impossibility Theorem
The FLP Impossibility Theorem (from here on referred to as the FLP Theorem, or simply as FLP) relates to distributed consensus, during which a number of nodes must agree on the same value. Any decentralized network (NEO, Ethereum, EOS, etc.) has a consensus mechanism. Ideally, within a decentralised system, all nodes would produce the same output, deliver messages, and agree on the same value.
The FLP Theorem is characterised by three properties:
- Safety: All nodes agree on a valid value proposed by one of the nodes. Nodes will never decide on different values. This is, for example, the state of the network or the next block proposal.
- Liveness: Nodes eventually reach an agreement, i.e. the system must make progress.
- Fault-tolerance: the solution must continue to work during network failures. (1) “Fault refers to a defect at the lowest level of abstraction” (4)
“FLP Theorem: In an asynchronous network, it is not possible to achieve safety and liveness when there may be one crash failure.” (1) Failure is defined as the malfunctioning of entities within the network, which can result in system stagnation by slowing down consensus. Node failure occurs when nodes are malicious, under attack and therefore unable to respond to messages, or entirely disconnected and offline.
Any decentralised system can have, at most, two out of the three properties stated in the FLP theorem. If a system is fault-tolerant, it is designed to handle failure. A fault tolerant system can handle crashes, whereas a non-fault tolerant system cannot. Any decentralized network must be fault tolerant.
Failure can occur within an asynchronous system, but since there is no feedback loop of messages, the system cannot aptly react to the failure. In simpler terms, take the following example: Two nodes are attempting to communicate. However, the message is not successfully communicated. Node 1 is not able to know why Node 2 is unresponsive. This may be because Node 1 is partitioned and unable to connect to its peers, Node 2 is offline, or because either of them is compromised through an attack. Node 1 cannot aptly react to this failure and must simply continue on with the consensus round (or halt depending on the network definitions). Therefore, only a certain percentage of nodes can be compromised in each operating round if consensus of the majority is still to be achieved.
Note that an asynchronous system is not fault-tolerant by design. A network would have to be implemented in certain measurements to enable fault tolerance, in order to detect when nodes are offline. Byzantine fault-tolerant systems ensure that up to ⅓ of all nodes can be malicious. Resulting, if the other ⅔ of nodes are able to communicate properly, they will be able to reach consensus eventually (Liveness and Fault Tolerance). In contrast, if a network is unable to deal with the failure of a subset of nodes, it will not be able to achieve liveness nor safety, because “only one potential failure is necessary to eliminate consensus guarantees.” (1)
Also called Brewer theorem. CAP relates to distributed storage systems, “which present illusions of single-copy data to large numbers of end users.” (1) The theorem considers three properties in distributed, replicated storage systems. Each letter in CAP represents the first letter of each property.
- Consistency: Each node sees and has access to the same data, at all times.
- Availability: Data for any request is always available.
- Partition tolerance: The system is always operational, even if a subset of nodes fails to operate.
A blockchain cannot achieve all three of the above for the same reasons as stated for FLP. Most would settle on availability and partition tolerance. “An example of a network partition is when two nodes can’t talk to each other, but there are clients that are able to talk to either one or both of those nodes.”(6) Generally, a network should not compromise on partition tolerance since any network is vulnerable to partitions. This rule is equivalent to the requirement for a decentralized network to be fault tolerant as defined by the FLP Theorem.
Ultimately, nodes do not have to always have a consistent view of the network in order to make and validate transactions. Therefore, nodes in an available system, can make transactions and ensure consensus regarding their validity. The system will still be able to evolve, even if a subset of nodes do not have the latest view of the network (i.e. they are not consistent with the rest of the network). In contrast, if the network is consistent, nodes will have access to the same responses. Without availability, nodes will not have access to all responses.
FLP vs. CAP
Both the CAP and the FLP theorem make designers and users aware of the architecture and the functionality of a decentralized system. Independent of the design choices made, any network is only able to achieve, at most, two of the three objectives outlined within each theorem. The overlapping property between both theorems is a fault or partition tolerance. A system has to be able to achieve either safety or liveness (in the case of FLP — CAP defines it as Availability or Consistency) in the event of an adversarial attack on the network. Therefore, most systems would choose to implement fault tolerance and periodically adopt one of the other two properties.
Did you like our analysis of blockchain design trade-offs, and would like to be informed of future releases?