Partitions in decentralized networks
Distributed ledger technology connects computers (nodes) throughout a decentralized network. Ideally, this collection of nodes is always cohesive in their vision of the network and in their communication efforts. Networks need effective communication for a variety of reasons. First, communication allows the network to effectively evolve. New transactions come through, changing the overall state of the network. Nodes communicate the state change to ensure that there is always a collective vision of the network. This collective vision is how most blockchains gain their security and immutability. Without effective communication, the network either can’t evolve or evolves to separate states.
A breakdown in network communication is known as a partition. Partitions occur if a network is split up into two or more sub-segments. Other than in scenarios such as sharding (intentional network divisions to increase scalability), partitions are unintentional. In some cases, participants in a partition might not even realize they are fractured from the larger network since communication within the partition continues as normal. Network partitions are an unfortunate reality for decentralized networks. As such, networks must make determinations for how they deal with inevitable partitions. Depending on the network, there are several methods a network can employ to continue operating in spite of a network partition. Within this article, we will discuss partition scenarios and the possible solutions based on the CAP and FLP theories discussed in the previous article.
Once a partition occurs, there are a variety of options for handling partitions and facilitating recovery. Based on the inherent trade-offs as dictated by the CAP Theorem, the priority must be either to maximise consistency or availability, depending on the anticipated trade-off.
A good example of a network partition is known as a “Firewall Attack,” earning its name from the internet firewall that countries such as China employ. Imagine that China suddenly shuts down the incoming and outgoing Bitcoin communications. This certainly is possible given the capabilities of their firewall and their ability to censor various communications. As a result, the system would have two siloed networks (one within China, and one outside of China). Conceivably, neither network would realize what had happened and communication within each partition would continue as usual. If a user had access to both networks, they could spend the same bitcoin twice. The communication wouldn’t extend outside of the network, and, as such, one account outside of the network wouldn’t be debited for a transaction within the network. This would result in the potential for double spending, whereby the same coin could be spent on both networks. Relating to the CAP Theorem, this would be an example of availability of the network but no consistency for users. The networks remain available, but it does not maintain a consistent view of the network.
In the case of network partitions, the system must make a decision either to:
- cancel the operation and thus decrease availability, or
- proceed with the operation and risk inconsistency. (7)
However, in order to implement any decision, the network must know that there has been a partition. Since nodes are able to go offline, it is often unclear if the peers of a node are offline, if the node has been attacked, or if the network is partitioned.
As previously mentioned, the frequency of partitions and the partition tolerance of a network depends highly upon its design choices. Semi-synchronous networks can account for partitions better than asynchronous networks. Through the implementation of consensus boundaries, like a common clock in the network or Byzantine agreement, the network is able to measure node gossip and thus, more effectively identify abnormalities. In scenarios where a bound time is set, the system will enter a partition mode more often than “when the network is merely slow and not actually partitioned,” (7) because a subset of nodes might not be able to adopt the current state of the network within the time boundary. Resulting, these nodes will have a different view of the network than the other nodes which have the latest state. Nodes in this situation have two option: they can either reattempt to communicate to the other part of the network an indefinite amount of times or they can move on to the next consensus round. The former implies that the network chooses consistency over availability. In contrast, if nodes continue gossiping and issuing transactions within their partition, they have chosen availability. Asynchronous blockchains have to rely on the network not to partition in order to maintain security. In other words, semi-synchronous networks create a time-bound during which the network accepts incoming messages. Assuming not all messages have been received, the network must make a decision. Either the network chooses availability and moves on to the next consensus round, ignoring the missing messages and ensuring the network continues to work; or, the network chooses consistency and stays in limbo until the messages are received.
Generally, if the network is split into two partitions (within China and outside of China), both continue to operate. Several possibilities exist when the partition ends and the network attempts to reunite the fractured state. Once the partition is resolved, generally the bigger partition is chosen and the smaller one is disregarded. This is similar to the Bitcoin protocol’s decision to always refer to the chain with more computing power. Resulting, if the within-China partition has more transactions than the outside-China, then the former is chosen and latter disregarded. “If there were a worldwide network partition where the Bitcoin network was divided more or less in half, things would get pretty ugly; Bitcoin would fork off into two separate streams of data, one for each region. If they later regained connectivity, the longer stream would ‘win,’ erasing all the data in the shorter stream.” (8) Note that a partition will only create new blocks in proportion to the amount of mining power available. Therefore, small partitions will experience the network partition far more drastically – the transaction speed would slow down dramatically.
However, to avoid destroying a significant amount of the transaction history when the network reunites, networks adopt various solutions. One option is known as the “Last Writer Wins” which only processes messages after a specific timestamp. As such, the first submitted messages are the first to process. Imagine this like an overbooked airplane; only as many passengers fit into the plane as there are seats available. If there are too many passengers, some will be left behind. A similar model can be implemented with transactions, whereby certain transactions do not get appended into the final state of the network. For example, a network has two partitions of a and b, and each partition has processed 50 transactions since the partition occurred. The network’s implemented rule designates, that in the case of a partition, in order to reduce the network latency of processing x transactions at once, only 50% of the transactions can be processed. Therefore, just like an overbooked flight, the network has to choose which 50 transactions of the 100 transactions to include. A reasonable approach in this scenario is through the use of a timestamp. Only the transactions, which have been included in the partition the earliest get included in the main chain. This can prevent double spending as well since there will be a clear ordering of transactions according to their timestamp.
Another partition recovery solution is achieved by comparing the two partitions and canceling out the “same” transactions while merging the two. Resulting, the transactions from the China partition will be compared with the transactions from the non-China partition. If any of the nodes in the network had access to both partitions and participated in double-spending, the network will either cancel both transactions or remove either of the transactions. Resulting, large chunks of transactions may become irrelevant if they rely on previous, double spent, transactions. Therefore, canceling both transactions would be the fairest option, otherwise, one partition may be favored over the other. While this solution offers security advantages, it is very time consuming and inefficient to check the validity of every transaction. In addition, it risks destroying large chunks of both blockchains.
Furthermore, a third option exists. A blockchain can have a consensus round to agree on which partition is valid. Once the majority of the network has agreed on the validity of one of the partitions, all other partitions are erased. In this scenario, the largest partitions will most likely win the voting as it benefits participants to vote for their own partition. If their partition loses, their transactions will be destroyed. As such, the larger chain generally has the most votes. Note that this path of partition resolution might not work in the case of an equal network split, whereby both partitions are more or less the same size. Both partitions would be unlikely to gain the majority of votes.
Most blockchains settle for partition tolerance and either availability or consistency. If the network settles for consistency, the network will halt until the partition is resolved. This would make it easier to rejoin both networks since no new transactions would have occurred. In contrast, if the network settles for availability, nodes will still be able to make transactions. Resulting, once the network rejoins the partitions, a solution must be taken in order to determine which transactions to approve on both chains. In both scenarios, in order to react to a network partition, the network has to have mechanisms in place which identify the partition in the first place. The main objective for networks is to prevent double spending at the time of the partition and to prevent one partition from permanently forking from the rest of the network. The partition must be recognized and stitched back together without permanently destroying the network.