Computing a balanced solution
This is a technical note with algorithmic considerations related to the validator election protocol under NPoS. We consider a scenario where a committee of validators has already been elected, and we explore the best way to assign the nominators' stake to them. The reader should already be familiar with our research paper, and in particular with the concept of balanced solutions defined in it. Although we prove in that paper that balanced solutions can be computed efficiently, we give few details there about how to do so; we present those details in this note.
After establishing some notation, we introduce the balancing problem and explain why this is exactly the problem we need to solve. We then establish two algorithmic ways to solve the balancing problem, namely 1) using parametric flow algorithms, and 2) using a heuristic called star balancing, and we compare them.
1. Notation
We consider an instance of NPoS consisting of a bipartite graph $(N\cup A, E)$, where $N$ is the set of nominators, $A$ is a committee of elected validators of size $k$, with $k \ll |N|$, and there is an edge $nv\in E$ whenever nominator $n$ approves of validator $v$. We are also given a vector $s\in\mathbb{R}^N_{\geq 0}$ of nominator stakes, where $s_n$ is the stake of nominator $n$. An edge weight vector $w\in\mathbb{R}^E$ is feasible if it is component-wise non-negative and observes the constraints: $\sum_{v:\ nv\in E} w_{nv} \leq s_n$ for each nominator $n$. We say that $w$ is tight if the previous inequality is tight for each nominator $n$ that has at least one neighbor in $A$.
Let $B\in\{0,1\}^{A\times E}$ be the node-edge incidence matrix for the validator set $A$. For any $w\in\mathbb{R}^E$, the total support that $w$ assigns to each validator in $A$ is given by the vector $Bw\in\mathbb{R}^A$, so that for any validator $v\in A$, its support $(Bw)_v = \sum_{n:\ nv\in E} w_{nv}$ is the total amount of stake that $w$ assigns to $v$ from the nominators.
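To fix ideas, the following minimal Python sketch shows one possible way to represent an instance and to compute the support vector $Bw$ and the tightness check. The names `Instance`, `supports` and `is_tight` are illustrative choices of ours, not part of any reference implementation.

```python
from dataclasses import dataclass

@dataclass
class Instance:
    stakes: dict   # nominator n -> stake s_n
    edges: dict    # nominator n -> list of approved validators (its neighbors in A)

def supports(w):
    """Support vector Bw: validator v -> total stake assigned to v by w.

    w is a dict mapping edges (n, v) to non-negative weights w_nv."""
    supp = {}
    for (n, v), weight in w.items():
        supp[v] = supp.get(v, 0.0) + weight
    return supp

def is_tight(inst, w, tol=1e-9):
    """Check that every nominator with at least one neighbor distributes
    exactly its full stake, i.e. sum_v w_nv = s_n."""
    for n, stake in inst.stakes.items():
        if inst.edges.get(n):
            total = sum(w.get((n, v), 0.0) for v in inst.edges[n])
            if abs(total - stake) > tol:
                return False
    return True
```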
Given an instance as above, the balancing problem consists of finding a tight vector $w$ that minimizes the squared $\ell_2$ norm of the support vector, i.e. that minimizes the value
$$\|Bw\|_2^2 = \sum_{v\in A} (Bw)_v^2.$$
Clearly, an optimal solution to this problem corresponds precisely to a balanced solution, as defined in our paper.
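As a small worked example, suppose $A=\{u,v\}$ and there are two nominators: $n_1$ with stake $s_{n_1}=4$ approving only $u$, and $n_2$ with stake $s_{n_2}=10$ approving both. Tightness forces $w_{n_1 u}=4$ and $w_{n_2 u}+w_{n_2 v}=10$, so the supports are $(4+x,\ 10-x)$ with $x:=w_{n_2 u}$. The objective $(4+x)^2+(10-x)^2$ is minimized at $x=3$, giving equal supports $(7,7)$ and value $98$; by contrast, the tight vector with $x=10$ gives supports $(14,0)$ and value $196$.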
2. Algorithms
There are three possible ways to solve the balancing problem:
- Via convex programming: it can be solved with numerical methods for convex quadratic programs, but this is too computationally expensive to consider any further.
- Via parametric flow algorithms: We show in the research paper that the balancing problem can potentially be solved in time $\tilde{O}(|E|\cdot k)$ using some advanced techniques for parametric flow problems.
- Via a simple combinatorial heuristic: the star balancing heuristic starts with any tight vector and converges to an optimal vector by following a local weight-balancing rule. It executes in time $O(|E|\cdot k^2)$, ignoring logarithmic factors.
At first glance, the worst-case complexity bound for technique 2 is much better than that for technique 3. However, we point out that Babenko et al. (2007) studied a parametric max flow problem closely related to the balancing problem and performed experimental evaluations of both of these techniques, over real data for an application in revenue optimization as well as over synthetic data. They concluded that the performance of star balancing is actually comparable to that of parametric flow algorithms, except for instances with degenerate graph topologies. In fact, they conjecture that these two techniques have similar complexities whenever the underlying graph has moderately good expansion properties.
In view of this, and of the fact that star balancing is vastly easier to implement than the algorithm based on parametric flow, we suggest that star balancing be used for NPoS.
3. The star balancing heuristic
Star balancing is a combinatorial randomized algorithm that outputs a solution arbitrarily close to optimal with high probability (this is what is known as a polynomial-time randomized approximation scheme, or PRAS). We remark that a different analysis of this algorithm can be found in Tarjan et al. (2006). We show the following.
Theorem: For any fixed parameters $\varepsilon, \delta > 0$, the star balancing algorithm returns a tight weight vector whose objective value has a probability at least $1-\delta$ of being within a multiplicative factor at most $1+\varepsilon$ from minimal, and runs in time
$$O\left(|E|\cdot k^2\cdot \log k\cdot\log\frac{k}{\varepsilon\delta}\right).$$
Algorithm: Star balancing.
Consider an instance $(N\cup A, E, s)$ of the balancing problem. For each nominator $n\in N$, let $A_n\subseteq A$ be its set of neighbors in $A$.
Fix constants $\varepsilon, \delta > 0$. The algorithm starts with an arbitrary tight vector $w$, and improves it iteratively by performing $r$ rounds, where we will give a precise value for $r$ and prove that $r\in O(|N|\cdot k^2\cdot\log(k/\varepsilon\delta))$.
1. Find any tight vector $w$.
2. Repeat $r$ times:
   a. Select a nominator $n\in N$ uniformly at random.
   b. Modify the weights of the edges incident to $n$, keeping $w$ tight and observing the non-negativity constraints, so that the supports of the validators in $A_n$ are as close to each other as possible, i.e. so that the updated vector $w'$ minimizes $\sum_{v\in A_n}(Bw')_v^2$ among all tight vectors that differ from $w$ only on the edges incident to $n$ (a water-filling step; see the sketch below).
3. Return $w$.
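To make the procedure concrete, here is a minimal Python sketch of the algorithm, reusing the `Instance` and `supports` helpers from Section 1. The water-filling routine `balance_star` and the other names are illustrative assumptions of ours rather than a reference implementation; step 2.b is implemented by sorting, which matches the running-time analysis below.

```python
import math
import random

def balance_star(stake, base):
    """Step 2.b (water-filling): split `stake` into non-negative weights x_v
    with sum(x) == stake, minimizing sum_v (base_v + x_v)^2, where base_v is
    the support of neighbor v excluding this nominator's contribution."""
    bs = sorted(base)
    # Find the water level t such that sum_v max(0, t - base_v) == stake.
    used, j = 0.0, 0
    while j + 1 < len(bs):
        step = (bs[j + 1] - bs[j]) * (j + 1)  # stake needed to reach level bs[j+1]
        if used + step >= stake:
            break
        used += step
        j += 1
    t = bs[j] + (stake - used) / (j + 1)
    return [max(0.0, t - b) for b in base]

def num_rounds(num_nominators, k, eps, delta):
    """Round count r from the analysis below: r >= k^2 |N| ln(k/(eps*delta))."""
    return math.ceil(k * k * num_nominators * math.log(k / (eps * delta)))

def star_balancing(inst, rounds):
    """Random star-balancing rounds, starting from a trivial tight vector."""
    nominators = [n for n in inst.stakes if inst.edges.get(n)]
    # Step 1: a tight start; each nominator puts all stake on its first neighbor.
    w = {(n, inst.edges[n][0]): inst.stakes[n] for n in nominators}
    supp = supports(w)
    for _ in range(rounds):
        n = random.choice(nominators)  # step 2.a
        neigh = inst.edges[n]
        base = [supp.get(v, 0.0) - w.get((n, v), 0.0) for v in neigh]
        for v, b, x in zip(neigh, base, balance_star(inst.stakes[n], base)):
            w[(n, v)] = x              # step 2.b
            supp[v] = b + x
    return w                           # step 3

# On the toy instance of Section 1, a few rounds already reach the
# balanced supports (7, 7) with high probability:
inst = Instance(stakes={"n1": 4, "n2": 10},
                edges={"n1": ["u"], "n2": ["u", "v"]})
print(supports(star_balancing(inst, rounds=10)))  # ~ {'u': 7.0, 'v': 7.0}
```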
Running time: Consider a round of the algorithm. If nominator $n$ is selected, the running time of the round is $O(|A_n|\cdot\log k)$, dominated by sorting the supports of the neighborhood $A_n$ in step 2.b, assuming that floating-point arithmetic operations take constant time. Hence, the average running time per round is proportional to $\frac{1}{|N|}\sum_{n\in N}|A_n|\cdot\log k = \frac{|E|}{|N|}\cdot\log k$. Together with the bound on $r$, we obtain a global running time of
$$O\left(r\cdot\frac{|E|}{|N|}\cdot\log k\right) = O\left(|E|\cdot k^2\cdot\log k\cdot\log\frac{k}{\varepsilon\delta}\right).$$
Analysis: For each $i\in\{1,\ldots,r\}$, let $w^i$ be the state of weight vector $w$ at the end of the $i$-th round, and let $w^0$ be the initial vector. Let $w^*$ be an optimal solution. Let's start with an easy observation.
Lemma 1: $\|Bw^0\|_2^2 \leq k\cdot\|Bw^*\|_2^2$.
Proof: Recall that the objective value to minimize is $\|Bw\|_2^2$. As both $w^0$ and $w^*$ are tight, the $\ell_1$ norms of their support vectors are equal, namely $\|Bw^0\|_1 = \|Bw^*\|_1 = \sum_{n\in N} s_n$ (summing over the nominators with at least one neighbor in $A$). Hence
$$\|Bw^0\|_2^2 \leq \|Bw^0\|_1^2 = \|Bw^*\|_1^2 \leq k\cdot\|Bw^*\|_2^2,$$
where the last inequality follows from the Cauchy-Schwarz inequality applied to the $k$ coordinates of $Bw^*$. $\square$
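The factor $k$ in Lemma 1 can essentially be attained. For instance, if a single nominator with stake $s_n$ approves all $k$ validators, the initial tight vector may assign the full stake to one validator, giving $\|Bw^0\|_2^2 = s_n^2$, while the balanced solution splits it evenly, giving $\|Bw^*\|_2^2 = k\cdot(s_n/k)^2 = s_n^2/k$.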
Next we show that, in expectation, the progress in objective value achieved in each round is proportional to the difference between the current and optimal values.
Lemma 2: For each round that starts with vector $w$ and ends with vector $w'$, the expected objective value of $w'$ is such that
$$\|Bw\|_2^2 - \mathbb{E}\left[\|Bw'\|_2^2\right] \geq \frac{1}{k^2\cdot|N|}\left(\|Bw\|_2^2 - \|Bw^*\|_2^2\right).$$
Proof: We fix a round, and for notational convenience we drop the superscripts $i$ and $i+1$ within the scope of this proof. In particular, we let $w$ be the initial vector, and let $w_n$ be the final vector in the case that nominator $n$ is picked in the round. Clearly, the expected progress in objective value equals the average progress
$$\frac{1}{|N|}\sum_{n\in N}\left(\|Bw\|_2^2 - \|Bw_n\|_2^2\right).$$
To lower bound the latter, it is sufficient to exhibit a different family of weight vectors $\{\hat{w}_n\}_{n\in N}$ such that $\|Bw_n\|_2^2 \leq \|B\hat{w}_n\|_2^2$ for each $n\in N$, and then bound the average progress when moving from $w$ to a member of that family.
Define the vector $f := w^* - w \in \mathbb{R}^E$. The following is a necessary technical observation whose proof we delay temporarily.
Lemma 3: $\|f\|_2^2 \leq k^2\cdot\|Bf\|_2^2$.
Consider the decomposition of vector $f$ as $f = \sum_{n\in N} f_n$, where $f_n$ is the restriction of $f$ over the edges incident to nominator $n$, and define the family of weight vectors $\{\hat{w}_n := w + \frac{1}{k^2} f_n\}_{n\in N}$. Each $\hat{w}_n$ is tight and non-negative, since over the edges incident to $n$ it is a convex combination of the tight vectors $w$ and $w^*$, and it coincides with $w$ elsewhere. We have $\|Bw_n\|_2^2 \leq \|B\hat{w}_n\|_2^2$ for all $n\in N$ as desired, because by construction (step 2.b. of the algorithm), $w_n$ is precisely the vector of minimum objective value among all tight vectors that differ from $w$ only at the edges incident to $n$. Hence, it only remains to bound the average progress in objective value with respect to the new family.
For a fixed $n\in N$, we have
$$\|Bw\|_2^2 - \|B\hat{w}_n\|_2^2 = -\frac{2}{k^2}(Bw)^\top(Bf_n) - \frac{1}{k^4}\|Bf_n\|_2^2 = -\frac{2}{k^2}(Bw)^\top(Bf_n) - \frac{1}{k^4}\|f_n\|_2^2,$$
where the last equality holds because the edges incident to $n$ go to pairwise distinct validators, so $\|Bf_n\|_2 = \|f_n\|_2$.
Thus, the average progress over all $n\in N$ is
$$\frac{1}{|N|}\sum_{n\in N}\left(\|Bw\|_2^2 - \|B\hat{w}_n\|_2^2\right) = \frac{1}{k^2|N|}\left(-2(Bw)^\top(Bf) - \frac{1}{k^2}\|f\|_2^2\right) \geq \frac{1}{k^2|N|}\left(-2(Bw)^\top(Bf) - \|Bf\|_2^2\right) = \frac{1}{k^2|N|}\left(\|Bw\|_2^2 - \|Bw^*\|_2^2\right),$$
where the inequality comes from Lemma 3, and the last equality uses $Bw^* = Bw + Bf$. This completes the proof of Lemma 2. $\square$
Proof of Lemma 3: We interpret $f$ as a flow over the network $(N\cup A, E)$. As both $w$ and $w^*$ are tight, there is flow preservation over all nominators. Let $A_s, A_t\subseteq A$ be respectively the sets of sources and sinks, i.e. the sets of validators with net excess and net demand. By the flow decomposition theorem, there exists a decomposition $f = \sum_{v\in A_s} f^v$ into single-source subflows, where $f^v$ has $v$ as its single source. We can assume that this decomposition generates no cycles by adjusting the choice of the optimal solution $w^*$: circular flow leaves $Bf$ and the nominators' tightness unchanged, so it can be removed from $w^*$ without affecting its feasibility or optimality.
Consider one of these subflows $f^v$. Its edge support looks like a directed acyclic graph (DAG) with single root $v$. We arrange the edges on this DAG by levels, where the level of an edge is the length of the longest path from $v$ ending in this edge. These levels start at 1 for the edges incident to $v$, and go up to at most $2k$, because any simple path alternates between a nominator and a validator and there are only $k$ validators. We now split $f^v$ by levels, $f^v = \sum_{i=1}^{2k} f^{v,i}$, where $f^{v,i}$ is the restriction of $f^v$ over the edges at level $i$. Since the excess in node $v$ is $(Bf)_v$ and no other node in the DAG has any excess, the sum of the edge weights within each level is at most $|(Bf)_v|$. Therefore,
$$\|f^v\|_2^2 = \sum_{i=1}^{2k}\|f^{v,i}\|_2^2 \leq \sum_{i=1}^{2k}\|f^{v,i}\|_1^2 \leq 2k\cdot(Bf)_v^2.$$
Putting things together, we get
$$\begin{aligned} \|f\|_2^2 &= \Big\|\sum_{v\in A_s} f^v\Big\|_2^2 \\ &\leq |A_s|\sum_{v\in A_s} \|f^v\|_2^2 \\ &\leq 2k\,|A_s|\sum_{v\in A_s} (Bf)_v^2 \\ &\leq 2k\,|A_s|\cdot \|Bf\|_2^2, \end{aligned}$$
where the first inequality is an application of the Cauchy-Schwarz inequality.
In a similar manner, working with sinks instead of sources, we can obtain the bound $\|f\|_2^2 \leq 2k\,|A_t|\cdot\|Bf\|_2^2$. Summing up these two bounds and dividing by two, we get
$$\|f\|_2^2 \leq k\,(|A_s|+|A_t|)\cdot\|Bf\|_2^2 \leq k^2\cdot\|Bf\|_2^2,$$
where we used that $|A_s|+|A_t| \leq |A| = k$. This proves the claim. $\square$
For each round $i\in\{0,\ldots,r\}$, consider the random variable $D_i := \|Bw^i\|_2^2 - \|Bw^*\|_2^2$, which represents how far from optimal the current solution is in terms of objective value. We now use Lemma 2 to show that $D_i$ decays exponentially fast in expectation.
Lemma 4: For any $i\in\{0,\ldots,r\}$, the expected value of $D_i$ observes
$$\mathbb{E}[D_i] \leq k\cdot e^{-i/(k^2|N|)}\cdot\|Bw^*\|_2^2.$$
Proof: A reformulation of Lemma 2 gives $\mathbb{E}[D_{i+1} \mid D_i] \leq \left(1 - \frac{1}{k^2|N|}\right)D_i$. By induction and linearity of expectation, this implies that
$$\mathbb{E}[D_i] \leq \left(1 - \frac{1}{k^2|N|}\right)^i D_0 \leq e^{-i/(k^2|N|)}\cdot D_0,$$
where we used the inequality $1-x\leq e^{-x}$. Finally, $D_0 = \|Bw^0\|_2^2 - \|Bw^*\|_2^2 \leq (k-1)\cdot\|Bw^*\|_2^2 \leq k\cdot\|Bw^*\|_2^2$ by Lemma 1. $\square$
Recall now that we want the objective value of the output solution to be within a multiplicative factor $1+\varepsilon$ of the optimal value $\|Bw^*\|_2^2$ with probability at least $1-\delta$, i.e. $D_r \leq \varepsilon\cdot\|Bw^*\|_2^2$. The next lemma completes the analysis of the algorithm and the proof of the main theorem.
Lemma 5: If $r \geq k^2\cdot|N|\cdot\ln\left(\frac{k}{\varepsilon\delta}\right)$, then $\Pr[D_r \leq \varepsilon\cdot\|Bw^*\|_2^2] \geq 1-\delta$.
Proof: By Lemma 4 and the choice of value $r$, it follows that
$$\mathbb{E}[D_r] \leq k\cdot e^{-r/(k^2|N|)}\cdot\|Bw^*\|_2^2 \leq \varepsilon\delta\cdot\|Bw^*\|_2^2.$$
As the variable $D_r$ is non-negative, we can use Markov's inequality:
$$\Pr[D_r > \varepsilon\cdot\|Bw^*\|_2^2] \leq \frac{\mathbb{E}[D_r]}{\varepsilon\cdot\|Bw^*\|_2^2} \leq \delta,$$
which is the claim.