BABE
Authors: Handan Kilinc Alper
1. Overview
In Polkadot, we produce relay chain blocks using our Blind Assignment for Blockchain Extension protocol, abbreviated BABE. BABE assigns block production slots using roughly the randomness cycle from Ouroboros Praos [2].
In brief, all block producers have a verifiable random function (VRF) key, which they register with the locked stake. These VRFs produce secret randomness, which determines when they produce blocks. A priori, there is a risk that block producers could grind through VRF keys to bias results, so VRF inputs must include public randomness created only after the VRF key. We therefore have epochs in which we create fresh public on-chain randomness by hashing together all the VRF outputs revealed in block creation during the epoch. In this way, we cycle between private but verifiable randomness and collaborative public randomness.
The main differences of BABE from Ouroboros Praos [2] are the best chain selection mechanism and slot synchronization assumption i.e.:
- BABE's best chain selection is based on GRANDPA and longest chain.
- Block producers in BABE do not have access to a central authority (e.g., Network Time Protocol (NTP)) to count slots instead, they construct their own clock to follow the slots.
2. BABE
In BABE, we have sequential non-overlapping epochs , each of which consists of a number of sequential block production slots () up to some bound . At the beginning of an epoch, we randomly assign each block production slot to a "slot leader", often one party or no party, but sometimes more than one party. These assignments are initially secrets known only to the assigned slot leader themselves, but eventually they publicly claim their slots when they produce a new block in one.
Each party has as session key containing at least two types of secret/public key pair:
- a verifiable random function (VRF) key , and
- a signing key for blocks .
We favor VRF keys being relatively long-lived because new VRF keys cannot be used until well after creation and submission to the chain. Yet, parties should update their associated signing keys from time to time to provide forward security against attackers who might exploit from creating slashable equivocations. There are more details about the session key available here.
Each party keeps a local set of blockchains . All these chains have some common blocks, at least the genesis block, up until some height.
We assume that each party has a local buffer that contains a set of transactions to be added to blocks. All transactions in a block are validated with a transaction validation function before entering this buffer.
In BABE, we would like to achieve that each validator has the same chance to be selected as a block producer on a slot. Therefore, we define the probability that a validator is selected on a slot as
where is a constant parameter and is the number of validators.
In order to achieve the equality of validators in BABE, we define a threshold parameter as in [2] for the slot assignment:
where is the length of the VRF's first output (randomness value).
BABE consists of three phases:
1st: Genesis Phase
In this phase, we manually produce the unique genesis block.
The genesis block contain a random number for use during the first two epochs for slot leader assignments. Session public keys of initial validators are (), ).
2nd: Normal Phase
We assume that each validator divided their timeline in slots after receiving the genesis block. They determine the current slot number according to their timeline as explained in Section 4. Similarly, when a new validator joins to BABE after the genesis block, this validator divides his timeline into slots.
In normal operation, each slot leader should produce and publish a block. All other nodes attempt to update their chain by extending with new valid blocks they observe.
We suppose each validator has a set of chains in the current slot in the epoch and has a best chain selected in by our selection scheme in Section 3, and the length of is .
Each validator produces a block if he is the slot leader of . If the first output () of the following VRF computation is less than the threshold then he is the slot leader.
If is the slot leader, generates a block to be added on in slot . The block should at least contain the slot number , the hash of the previous block , the VRF output , transactions , and the signature . updates with the new block and sends .
In any case (being a slot leader or not being a slot leader), when receives a block produced by a validator , it validates the block with . must at least check the followings in order to validate the block:
if (signature verification),
if the validator is the slot leader: and (verification with the VRF's verification algorithm).
if there exists a chain with the header ,
if the transactions in are valid.
If the validation process goes well, adds to . Otherwise, it ignores the block.
At the end of the slot, decides the best chain with the chain selection rule we give in Section 3.
3rd: Epoch Update
Starting from first slot, in every slots, the new epoch starts.
Before starting a new epoch , validators should obtain the new epoch randomness and active validators set for the new epoch.
The validator set for the epoch has to be included to the relay chain until the end of the last block of the epoch so that they are able to actively participate the block production in epoch . So, a new validator can actively join the block production at earliest two epochs later after included to relay chain.
A fresh randomness for the epoch is computed as in Ouroboros Praos [2]: Concatenate all the VRF outputs of blocks in epoch (let us assume the concatenation is ). Then the randomness in epoch :
The reason of including a validator after two epochs later is to make sure that the VRF keys of the new validators added to the chain before the randomness of the epoch that they are going to be active is revealed.
3. Best Chain Selection
Given a chain set and the parties current local chain , the best chain algorithm eliminates all chains which do not include the finalized block by GRANDPA. Let's denote the remaining chains by the set . If we do not have a finalized block by GRANDPA, then we use the probabilistic finality in the best chain selection algorithm (the probabilistically finalized block is the block which is block before than the last block of ).
We do not use the chain selection rule as in Ouroboros Genesis [3] because this rule is useful for parties who become online after a period of time and do not have any information related to current valid chain (for parties always online the Genesis rule and Praos is indistinguishable with a negligible probability). Thanks to Grandpa finality, the new comers have a reference point to build their chain so we do not need the Genesis rule.
4. Clock Adjustment (Relative Time Algorithm)
It is important for parties to know the current slot for the security and completeness of BABE. For this, validators can use their computer clocks which is adjusted by the Network Time Protocol. However, in this case, we need to trust servers of NTP. If an attack happens to one of these servers than we cannot claim anymore that BABE is secure. Therefore, we show how a validator realizes the notion of slots without using NTP. Here, we assume we have a partial synchronous network meaning that any message sent by a validator arrives at most -slots later. is an unknown parameter.
Each party has a local clock and this clock is not updated by any external source such as NTP or GPS. When a validator receives the genesis block, it stores the arrival time as as a reference point of the beginning of the first slot. We are aware that the beginning of the first slot is not the same for everyone. We assume that the maximum difference of start time of the first slot between validators is at most . Then each party divides their timeline in slots and update periodically its local clock with the following algorithm.
Median Algorithm: The median algorithm is run by all validators in the end of sync-epochs (we note that epoch and sync-epoch are not related). The first sync-epoch () starts just after the genesis block is released. The other sync-epochs () start when the slot number of the last (probabilistically) finalized block is which is the smallest slot number such that where is the slot number of the last (probabilistically) finalized block in the sync-epoch . Here, is the parameter of the chain quality (CQ) property. If the previous epoch is the first epoch then . We define the last (probabilistically) finalized block as follows: Retrieve the best blockchain according to the best chain selection rule, prune the last blocks of the best chain, then the last (probabilistically) finalized block will be the last block of the pruned best chain. Here, is defined according to the common prefix property.
The details of the protocol is the following: Each validator stores the arrival time of valid blocks constantly according to its local clock. In the end of a sync-epoch, each validator retrieves the arrival times of valid and finalized blocks which has a slot number where
- if .
- if .
Let's assume that there are such blocks that belong to the current sync-epoch and let us denote the stored arrival times of blocks in the current sync-epoch by whose slot numbers are , respectively. A validator selects a slot number and runs the median algorithm which works as follows:
for i = 0 to n:
a_i = sl - sl'_i
store t_i + a_i * T to lst
lst = sort (lst)
return median(lst)
In the end, the validator adjusts its clock by mapping to the output of the median algorithm.
The following image with chains explains the algorithm with an example in the first epoch where and :
Lemma 1: (The difference between outputs of median algorithms of validators) Asuming that is the maximum network delay, the maximum difference between start time is at most .
Proof Sketch: Since all validators run the median algorithm with the arrival time of the same blocks, the difference between the output of the median algorithm of each validator differs at most .
Lemma 2: (Adjustment Value) Assuming that the maximum total drift on clocks between sync-epochs is at most and , the maximum difference between the new start time of a slot and the old start time of is at most .
This lemma says that the block production may stop at most at the beginning of the new synch-epoch.
Proof Sketch: With the chain quality property, we can guarantee that more than half of arrival times of the blocks used in the median algorithm sent on time. Therefore, the output of all validators' median algorithm is the one which is sent on time. The details of the proof is in Theorem 1 our paper Consensus on Clocks.
Having small enough is important not to slow down the block production mechanism a while after a sync-epoch. For example, (a very extreme example) we do not want to end up with a new clock that says that we are in the year 2001 even if we are in 2019. In this case, honest validators may wait 18 years to execute an action that is supposed to be done in 2019.
Temporarily Clock Adjustment
For validators who were offline at some point during one synch-epoch, they can adjust their clock temporarily (till the next synch epoch) with the following algorithm.
1. Case: If was online at some point of a synch-epoch and when he becomes online if his clock works well, he should continue to collect the arrival time of valid blocks and produce his block according to his clock as usual. A block is considered as valid in this case if it is not equivocated, if the block is sent by the right validator and if its slot number belong to the current synch epoch. In the end of the synch-epoch, if he has collected arrival time of valid blocks he runs the median algorithm with these blocks. If it has less than blocks it should wait till collecting arrival time of valid blocks. We note that he does not run the median algorithm not only with the arrival time of the finalized blocks.
2. Case: If was online at some point of a synch-epoch and when he becomes online if his clock does not work anymore, he should continue to collect the arrival time of valid blocks. He can adjust his clock according to e.g., the arrival time of the last finalized block in GRANDPA to continue to produce block. He can use this clock till collecting valid blocks. After collecting valid blocks he should readjust his clock according to the output of the median algorithm with these valid blocks.
With the temporary clock adjustment, we can guarantee that the difference between this new clock and an honest parties clock is at most .
We note that during one sync-epoch the ratio of such offline validators should not be more than 0.05 otherwise it can affect the security of the relative time algorithm.
5. Security Analysis
(If you are interested in parameter selection and practical results based on the security analysis, you can directly go to the next section) BABE is the same as Ouroboros Praos except for the chain selection rule and clock adjustment. Therefore, the security analysis is similar to Ouroboros Praos with few changes.
Definitions
We give the definitions of security properties before jumping to proofs.
Definition 1 (Chain Growth (CG)) [1,2]: Chain growth with parameters and ensures that if the best chain owned by an honest party at the onset of some slot is , and the best chain owned by an honest party at the onset of slot is , then the difference between the length of and is greater or equal than/to .
The honest chain growth (HCG) property is a weaker version of CG which is the same definition with the restriction that and are assigned to honest validators. The parameters of HCG are and instead of and in the CG definition.
Definition 2 (Existential Chain Quality (ECQ)) [1,2]: Consider a chain possessed by an honest party at the onset of a slot . Let and be two previous slots for which . Then contains at least one block generated by an honest party.
Definition 2 (Chain Density (CD)): The CD property with parameters ensures that any portion of a final blockchain spanning between rounds and contains more honest blocks.
Definition 3 (Common Prefix) Common prefix with parameters ensures that any chains possessed by two honest parties at the onset of the slots are such satisfies where denotes the chain obtained by removing the last blocks from , and denotes the prefix relation.
With using these properties, we show that BABE has persistence and liveness properties. Persistence ensures that, if a transaction is seen in a block deep enough in the chain, it will stay there and liveness ensures that if a transaction is given as input to all honest players, it will eventually be inserted in a block, deep enough in the chain, of an honest player.
Security Proof of BABE
We analyze BABE with the NTP protocol and with the Median algorithm.
We first prove that BABE (both versions) satisfies chain growth, existential chain quality and common prefix properties in one epoch. We also show the chain density property for the BABE with median. Then, we prove that BABE is secure by showing that BABE satisfies persistence and liveness in multiple epochs.
In Polkadot, all validators have equal stake (the same chance to be selected as slot leader), so the relative stake is for each validator where is the total number of validators. We assume that the ratio of honest validators is and the ratio of validators sending on time is .
We use notation (resp. ) to show the probability of an honest validator (resp. a malicious validator) is selected. Similarly, we use (resp. ) to show the probability of only honest validators (resp. malicious validators) are selected. is the probability of having an empty slot (no validator selected).
The probability of having timely validator is
and probability of having non-timely validator is .
The validators in BABE with NTP are perfectly synchronized (i.e., the difference between their clocks is 0). On the other hand, the validators in BABE with the median algorithm have their clocks differ at most . In BABE with the NTP, any honest validator builds on top of an honest block generated in slot for sure if the block arrives all validators before starting the next non-empty slot . We call these slots good slots. In BABE with NTP, a slot is good if it is assigned to only honest validators and the next slots are empty. However, it is different in BABE with the median algorithm because of the clock difference between validators. If a slot is assigned to an honest validator that has the earliest clock, in order to make her to build on top of blocks of all previous honest slots for sure, we should make sure that this validator sees all blocks of the previous slots before generating her block. We can guarantee this if previous slots are empty. Also, if a slot is assigned to an honest validator that has the latest clock, we should make sure that the next honest block producers see the block of the latest validator before generating her block. We can guarantee this if the next slots are empty. We use in our analysis below.
Theorem 1: BABE with NTP satisfies HCG property with parameters where and in slots with probability .
Proof: We need to count the honest and good slots (i.e., the slot assigned to at least one honest validator and the next slots are empty) (Def. Appendix E.5. in Genesis) to show the HCG property. The best chain grows one block in honest slots. If honest slots out of slot are less than , the HCG property is violated. The probability of having an honest and good slot is .
We find below the probability of less than slots are honest slots. From Chernoff bound we know that
BABE with median satisfies HCG property with parameters where and in slots with probability .
Theorem 2 (Chain Densisty) Chain desisty property is satisfied with in BABE with probability where and .
Proof: We first find the minimum difference between the number of honest slots and the number of malicious slots in slots belonging one synch-epoch. For this, we need to find the minimum number of honest slots and a maximum number of honest slots .
We can show with the Chernoff bound that for all
and for all
So, . Let's denote where
Assume that the last block of the previous sync-epoch is . So, we only consider the chains that are constructed on top of . Consider a chain which has finalized blocks spanned in subslots and . The longest subchain produced between and is because of the honest chain growth among the chains constructed on top . The longest subchain with more malicious blocks than the honest blocks is possible with malicious blocks and honest blocks. However, this chain can never beat the longest subchain produced at the end of except with probability . This means that there is not any subchain that has more malicious block and can be finalized except with a negligible probability. Therefore, all finalized chains in a synch epoch has more honest slots.
We note that we need the chain densisty property only for the BABE with the median algorithm.
Theorem 3 (Existential Chain Quality): Let and let . Then, the probability of an adversary violates the ECQ property with parameters with probability at most in BABE with NTP.
Proof (sketch): If proportion of a chain does not include any honest blocks, it means that the malicious slots are more than the good and honest slots between the slots that spans these blocks. Since the probability of having good and honest slots is greater than , having more bad slots falls exponentially with . Therefore, the ECQ property is broken in slots at most with the probability .
Let and let . Then, the probability of an adversary violates the ECQ property with parameters with probability at most in BABE with median.
Theorem 4 (Common Prefix): Let and let , the adversary violates the common prefix property with parammeter in slots with probability at most in BABE with NTP. We should have the condition for BABE with median.
Overall Results:
According to Lemma 10 in Genesis chain growth is satisfied with
and chain quality is satisfied with
Theorem 5 (Persistence and Liveness BABE with NTP): Assuming that and given that is the ECQ parameter, is the CP parameter, , , the epoch length is BABE with NTP is persistent and live.
Proof (Sketch): The overall result says that . The best chain at the end of an epoch grows at least blocks in one epoch thanks to the chain growth.
Since , the last block of includes at least one honest block. Therefore, the randomness includes one honest randomness and the adversary can have at most slots to change the randomness. This grinding effect can be upper-bounded by where is the hashing power [2]. The randomness generated by an epoch is finalized at latest one epoch later thanks to the common prefix property. Similary, the session key update which is going to be used in three epochs later is finalized one epoch later before a randomness of the epoch where the new key are going to be used starts to leak. Therefore, BABE with NTP is persistent and live.
Theorem 6 (Persistence and Liveness BABE with the Median Algorithm): Assuming that and where , , the clock difference is between honest valdators is at most , BABE with median is persistent and live given that given that is the ECQ parameter, is the CP parameter, , .
These results are valid assuming that the signature scheme with account key is EUF-CMA (Existentially Unforgible Chosen Message Attack) secure, the signature scheme with the session key is forward secure, and VRF realizing is realizing the functionality defined in [2].
6. Practical Results
In this section, we find parameters of two versions of BABE to achieve the security in BABE.
We fix the lifetime of the protocol as seconds. We denote the slot time by (e.g., seconds). The lifetime of the protocol in terms of slots is . The maximum network delay is .
BABE with the NTP
- Define and . Let if . Otherwise, let
- Decide the parameter such that the condition is satisfied. If there is not any such , then consider to increase (honest validator assumption) or decrease (more optimistic network assumption).
- Set up a security bound to define the probability of an adversary to break BABE in e.g., 3 years. Of course, very low is better for the security of BABE but on the other hand it may cause to have very long epochs and long probabilistic finalization. Therefore, I believe that setting is reasonable enough in terms of security and performance.
- Set (e.g., 0.5) and find and to set the epoch length such that . For this we need an initial value and find and that satisfies the three equations below:
From Theorem 6, we want that the best chain grows at least blocks. Therefore, we need
We need slots to guarantee blocks growth for the ECQ property. So, we need:
Lastly, we need the following as given in the Overall Result:
Iterate to find that satisfy above conditions until :
- Let (The CQ property parameter) We note that is the optimal value that minimizes .
- (to satisfy the condition in Theorem 1)
- (from Equation (1) and (3))
- (from Equation (1) and (2))
After finding such that , let the epoch length .
The parameters below are computed with the code in https://github.com/w3f/research/blob/master/experiments/parameters/babe_NTP.py. In this code, we choose the parameter not only according to security conditions but also according to having in expectation twice more single leader than multiple leaders.
-################### PARAMETERS OF BABE WITH NTP ###################
c = 0.52, slot time T = 6
It is secure in 3 years with a probability 0.99523431732
It is resistant to (6 - block generation time) second network delay
-~~ Common Prefix Property ~~
k = 140
It means: Prune the last 140 blocks of the best chain. All the remaining ones are probabilistically finalized
-~~ Epoch Length ~~
Epoch length should be at least 1440 slots,2.4 hours
If we want more network resistance, , the parameters should be selected as follows:
-################### PARAMETERS OF BABE WITH NTP ###################
c = 0.22, slot time T = 6
It is secure in 3 years with probability 0.996701592969
It is resistant to (12 - block generation time) second network delay
-~~ Common Prefix Property ~~
k = 172
It means: Prun the last 172 blocks of the best chain. All the remaining ones are probabilistically finalized
-~~ Epoch Length ~~
Epoch length should be at least 4480 slots, 7.46666666667 hours
BABE with the Median Algorithm
Define , , and in Theorem 2.
Define and . Let
Decide the parameter such that the condition and is satisfied. If there is not any such , then consider increasing (honest validator assumption) or or decreasing (more optimistic network assumption).
Do the rest as in BABE with NTP.
Finding synch-epoch length
- Set with respect to Theorem 2.
The parameters below are computed with the code in https://github.com/w3f/research/blob/master/experiments/parameters/babe_median.py
-############## PARAMETERS OF BABE WITH THE MEDIAN ALGORITHM ##############
c = 0.38, slot time T = 6
It is secure in 3 years with probability 0.99656794973
It is resistant to 2.79659722222 second network delay and 0.198402777778 seconds drift in one sync-epoch
-~~ Common Prefix Property ~~
k = 140 It means: Prune the last 140 blocks of the best chain. All the remaining ones are probabilistically finalized
-~~ Epoch Length ~~
Sync-Epoch length should be at least 2857 slots, 4.76166666667 hours
Epoch length should be at least 2000 slots,3.33333333333 hours
-~~ Offline validators' parameters for clock adjustment ~~
for temporarily clock adjustment.
Offline validators should collect
Some Notes about clock drifts: http://www.ntp.org/ntpfaq/NTP-s-sw-clocks-quality.htm#AEN1220 All computer clocks are not very accurate because the frequency that makes time increase is never exactly right. For example the error about 0.001% make a clock be off by almost one second per day. Computer clocks drift because the frequency of clocks varies over time, mostly influenced by environmental changes such as temperature, air pressure or magnetic fields, etc. Below, you can see the experiment in a non-air conditioned environment on linux computer clocks. 12 PPM correspond to one second per day roughly. I seems that in every 10000 second the change on the clocks are around 1 PPM (i.e., every 3 hours the clocks drifts 0.08 seconds.). We can roughly say that the clock drifts around 1 second per day. If we have sync epoch around 12 hours it means that we have 0.5 second drift and
Figure. Frequency Correction within a Week
References
[1] Kiayias, Aggelos, et al. "Ouroboros: A provably secure proof-of-stake blockchain protocol." Annual International Cryptology Conference. Springer, Cham, 2017.
[2] David, Bernardo, et al. "Ouroboros praos: An adaptively-secure, semi-synchronous proof-of-stake blockchain." Annual International Conference on the Theory and Applications of Cryptographic Techniques. Springer, Cham, 2018.
[3] Badertscher, Christian, et al. "Ouroboros genesis: Composable proof-of-stake blockchains with dynamic availability." Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2018.
[4] Aggelos Kiayias and Giorgos Panagiotakos. Speed-security tradeoffs in blockchain protocols. Cryptology ePrint Archive, Report 2015/1019, 2015. http://eprint.iacr.org/2015/1019