5. A Phragmén-like Heuristic

The sequential Phragmén’s method is fast, seems to work well in practice, and gives a solution that satisfies the Proportional Justified Representation (PJR) axiom. However, we identify two problems with it. First, it does not give a constant factor approximation to the maximin support objective. Second, it lacks versatility, in the sense that it cannot be used to improve upon an arbitrary solution given as input. We describe a heuristic closely related to the sequential Phragmén’s method, which takes as input an arbitrary partial solution, defines scores over the validator candidates, and uses them to add a new validator to the solution, removing another validator if necessary.

Checking if a committee satisifies the PJR property is NP-hard. However, we define a stronger property called PJR(\(d\)), which depends on a parameter \(d\). We show that it implies PJR (for an adecuate value of \(d\)), and that it can checked efficiently, using our heuristic. We also build from our heuristic a “PJR(\(d\))-enabler”; that is, an algorithm that takes as input a solution of minimum support \(d\), performs some swaps using the heuristic, and returns a solution of minimum support at least \(d\) and observing PJR(\(d\)) (and also PJR). Finally, we provide an efficient factor-3.15 approximation algorithm, which starts with an empty set and alternates between using our heuristic to elect a new validator, and running a flow balancing algorithm to improve the support distribution on the current set.

Notation

Recall that an instance of the NPoS election problem is a graph \((N\cup V, E)\) representing the trust relations between nominators and validator candidates, a vector \(b\in\mathbb{R}_{\geq 0}^N\) of nominator budgets, and a target number \(m\) of validators to elect.

A support distribution from nominators to validators is represented by a vector \(w\in\mathbb{R}_{\geq 0}^E\) of edge weights. Such a vector is called affordable if, besides non-negativity constraints, it observes the budget constraints \(\sum_{v\in V: nv\in E} w_{nv}\leq b_n\) for each nominator \(n\in N\). Furthermore, it is called maximally affordable with respect to a validator set \(S\subseteq V\) if \(\sum_{v\in S: \ nv\in E} w_{nv} = b_n\) for each \(n\in N\) having at least one neighbor in \(S\). By a solution we mean a pair \((S,w)\) where \(S\subseteq V\) with \(|S|\leq m\) and \(w\) is maximally affordable for \(S\). Our algorithm will return solutions of size \(m\), but we will often consider solutions of smaller sizes in intermediate steps and in the analyses.

Given an edge weight vector \(w\) and a validator \(v\in V\), the support of \(v\) relative to \(w\) is \(supp_w(v):=\sum_{n: nv\in E} w_{nv}\). For a solution \((S,w)\), we extend this definition to \(supp_w(S):=\min_{v\in S} supp_w(v)\). The maximin support objective is maximizing \(supp_w(S)\) over all feasible solutions \((S,w)\) to the given instance.

The basic heuristic

Suppose that \((S,w)\) is a maximally affordable solution with \(|S|\leq m\), and we wish to add a new candidate \(v'\in V\setminus S\) and give it some support \(d\). To do this, in general we will need to reduce the support of any other candidate \(v \in S\) who has a neighbor nominator \(n\) in common with \(v'\). When we do this, we want to ensure that we do not reduce the support of \(v\) below \(d\) (assuming it was peviously above \(d\)). A simple way to ensure this is to reduce the weight on edge \(nv\) from \(w_{nv}\) to \(w_{nv}\cdot(d/supp_w(v))\), and assign the difference to edge \(nv'\). That way, it is clear that even if all nominators \(n\) supporting \(v\) are also neighbors of \(v'\), the new support of \(v\) does not go below \(d\).

Thus, if for each \(n\in N\) and \(d\geq 0\) we define the nominator’s slack as

\[\begin{split}\begin{align} slack_w(n,d):= & \begin{cases} b_n & \text{if }\nexists v\in S: nv \in E\\ \sum_{v\in S: \ nv\in E, \ supp_w(v)> d} w_{n,v}(1-d/supp_w(v)) & \text{otherwise } \end{cases}\\ = & b_n - \sum_{v\in S: \ nv\in E} w_{nv} \cdot\min\Big\{1, \frac{d}{supp_w(v)}\Big\} \end{align}\end{split}\]

and for each \(v'\in V\setminus S\) and \(d\geq 0\) we define that candidate’s pre-score as

\[prescore_w(v',d) := \sum_{n\in N: \ nv' \in E} slack_w(n,d)\]

then we can add \(v'\) to the solution with support \(prescore_w(v,d)\), while not making any other validator’s support decrease from over \(d\) to under \(d\). In particular, if \(prescore_w(v',d)\geq d\), the new solution \((S \cup \{v'\},w')\) has \(supp_{w'}(S \cup \{v'\}) \geq \min\{supp_w(S),d\}\).

The resulting heuristic, which adds a new candidate to the initial solution, is formalized below.

Algorithm: InsertCandidate\((S,w,v',d)\)

  1. Let \(S'\leftarrow S \cup \{v\}\) and \(w'\leftarrow w\).

  2. For all \(n\) with \(nv' \in E\):

    • If \(\nexists v\in S\) with \(nv\in E\), set \(w'_{nv'}\leftarrow b_n\), otherwise set \(w'_{nv'}\leftarrow 0\);

    • For all \(v\in S\) with \(w_{nv} > 0\) and \(supp_w(v) > d\): increase \(w'_{nv'}\) by \(w_{nv}(1-d/supp_w(v))\), and set \(w'_{nv} \leftarrow w_{nv}(d/supp_w(v))\);

  3. Return \((S',w')\).

The next result follows from the definitions and the discussion above and its proof is skipped.

Lemma 1: If we run InsertCandidate(\(S,w,v',d\)) for some maximally affordable solution \((S,w)\) with \(|S|\leq m\), \(v'\in V\setminus S\) and \(d\geq 0\) to get \((S',w')\), then

  • \((S',w')\) is maximally affordable,

  • \(supp_{w'}(v')=prescore_w(v',d)\),

  • for all \(v\in S\) we have that \(supp_{w'}(v)\geq \min\{d, supp_w(v)\}\), and consequently if \(prescore_w(v',d)\geq d\), we obtain

\[supp_{w'}(S') \geq \min \{d, supp_{w}(S) \},\]
  • the running time of the algorithm is \(O(|E|\cdot |S|)=O(|E|\cdot m)\).

How high can we make \(d\) and have the property \(prescore_w(v',d)\geq d\) hold? We define \(score_w(v')\) to be the maximum \(d\) such that \(prescore_w(v',d) \geq d\). Our heuristic now becomes apparent.

Heuristic:Given a maximally affordable solution \((S,w)\) with \(|S|\leq m\), find the candidate \(v'\in V\setminus S\) maximizing \(score_w(v')\) and execute InsertCandidate(\(S,w,v',score_w(v')\)), so that the new solution \((S',w')\) observes

\[supp_{w'}(S')\geq \min\{supp_w(S),\max_{v'\in V\setminus S} score_w(v')\}.\]

This is the core idea of our method. In the remainder of the section we establish how to find the candidate with the largest score efficiently.

Fix a candidate \(v'\in V\setminus S\), and consider the function \(f(d):=prescore_w(v',d)-d\) in the interval \([0,\infty)\). Notice that this function is continuous and strictly monotone decreasing, and that \(score_w(v')\) corresponds to its unique root. We can therefore approximate this root with binary search, as binary search works on any monotonic function. However, we can do better. Sort the set of support values \(\{supp_w(v): \ v\in S\}=\{d_1,d_2,\cdots,d_k\}\) so that \(d_1<d_2<\cdots<d_k\), for some \(k\leq |S|\), and note that \(prescore_w(v',d)\) is piecewise linear with respect to \(d\), namely it is linear in each of the intervals \([0,d_1), \ [d_1, d_2), \cdots, [d_k, \infty)\). By treating \(prescore_w(v',d)\) as a linear function in the neighborhood of \(d^*:=score_w(v')\), and solving for \(f(d^*)=0\), we obtain

\[score_w(v') =\frac{\sum_{n: \ nv'\in E} (b_n - \sum_{v \in S: \ supp_w(v) \leq d^*} w_{nv})} {1+\sum_{n: \ nv' \in E} \sum_{v \in S: \ supp_w(v) > d^*} w_{nv} /supp_w(v) }.\]

The interesting thing about the previous identity is that the right-hand side stays constant if we replace \(d^*=score_w(v')\) by any other value \(d\) within the same interval, among the above-defined intervals. This motivates us to define the following score function, for any \(v'\in V\setminus S\) and \(d\geq 0\): \(score_w(v',d) :=\frac{\sum_{n: \ nv'\in E} (b_n - \sum_{v \in S: \ supp_w(v) \leq d} w_{nv})} {1+\sum_{n: \ nv' \in E} \sum_{v \in S: \ supp_w(v) > d} w_{nv} /supp_w(v) }.\)

Function \(score(v',d)\) is very similar to, but algorithmically more convenient than function \(prescore(v',d)\). We remark that for \(d=0\), the expression for \(1/score_w(v',0)\) corresponds to the notion of score in sequential Phragmén’s method. Hence, the latter can be seen as a special case of our approach.

Lemma 2. Fix a maximally affordable solution \((S,w)\) and a candidate \(v'\in V\setminus S\):

(i) Function \(score_w(v',d)\) is piece-wise constant with respect to \(d\), namely it is constant in each of the intervals \([0,d_1), \ [d_1, d_2), \cdots, [d_k, \infty)\), where the values \(d_1<d_2<\cdots <d_k\) constitute the set \(\{supp_w(v): \ v\in S\}\).

(ii) \(d^*:=score_w(v')\) is the unique root of function \(g(d):= score_w(v',d) - d\); moreover, \(g(d)\) is strictly positive for all \(d<d^*\) and strictly negative for all \(d>d^*\).

(iii) The above defined value \(d^*\) is equal to \(\max_{d\geq 0} score_w(v',d) \).

Proof. Let \(num(d)\) and \(denom(d)\) be respectively the numerator and the denominator in the definition of \(score_w(v',d)\).

It is easy to check that if \(d_i<d_j\) are two values such that \(\nexists v\in S\) with \(d_i \leq supp_w(v)< d_j\), then both \(num(d)\) and \(denom(d)\) stay constant in the interval \([d_i,d_j)\). This proves point (i).

Now consider function \(g(d):=score_w(v',d) - d\):

\[\begin{split}\begin{align} g(d)&:= score_w(v',d)-d \\ &=\frac{num(d) - d\cdot denom(d)}{denom(d)} \\ &= \frac{\sum_{n: \ nv'\in E}\Big(b_n -\sum_{v\in S} w_{nv}\cdot \min\{1,d/supp_w(v)\}\Big) - d}{denom(d)}\\ &= \frac{prescore_w(v',d)-d}{denom(d)} = \frac{f(d)}{denom(d)}. \end{align}\end{split}\]

As \(denom(d)\) is always strictly positive for \(d\geq 0\), we have that functions \(g(d)\) and \(f(d)\) have the same roots and the same signs, and we already argued that function \(f(d):=prescore_w(v'd)-d\) is strictly monotone decreasing and has \(d^*=score_w(v')\) as its only root. This proves point (ii).

To prove point \((iii)\), let \(d^*:=score_w(v')=score_w(v',d^*)\), and consider first a value \(d<d^*\). If \(score_w(v',d)=d^*\) then there is nothing to prove. Otherwise, we have that

\[\begin{align} \frac{num(d) - num(d^*)}{denom(d) - denom(d^*)} = \frac{\sum_{n: \ nv'\in E} \sum_{v\in S: \ d<supp_w(v) \leq d^*} w_{nv}}{\sum_{n: \ nv'\in E} \sum_{v\in S: \ d<supp_w(v) \leq d^*} \frac{w_{nv}}{supp_w(v)}}\leq d^*. \end{align}\]

Therefore,

\[num(d) - num(d^*) \leq d^*( denom(d) - denom(d^*)) = d^*\cdot denom(d) - num(d^*),\]

where we used the fact that \(d^*\cdot denom(d^*)=num(d^*)\) because \(d^*=score_w(v',d^*)=num(d^*)/denom(d^*)\). From the previous inequality, we conclude that \(d^* \geq num(d)/denom(d) = score(v',d)\), as desired.

Similarly, if \(d>d^*\) and \(score_w(v',d)\neq d^*\), then

\[\begin{align} \frac{num(d^*) - num(d)}{denom(d^*) - denom(d)} = \frac{\sum_{n: \ nv'\in E} \sum_{v\in S: \ d^*<supp_w(v) \leq d} w_{nv}}{\sum_{n: \ nv'\in E} \sum_{v\in S: \ d^*<supp_w(v) \leq d} \frac{w_{nv}}{supp_w(v)}}> d^*. \end{align}\]

Therefore,

\[num(d^*) - num(d) > d^*( denom(d^*) - denom(d)) = num(d^*) - d^*\cdot denom(d),\]

and thus we conclude that \(d^* > num(d)/denom(d) = score(v',d)\).

\(\square\)

Corollary 3. Fix a maximally affordable solution \((S,w)\). Then,

(i) Function \(\max_{v'\in V\setminus S} score_w(v',d)\) is constant in each of the above-defined intervals \([0,d_1), \ [d_1, d_2), \cdots, [d_k, \infty)\).

(ii) \(d^*:= \max_{v'\in V\setminus S} score_w(v')\) is the unique root of function \(h(d):=\max_{v'\in V\setminus S} score_w(v',d) - d\); moreover, \(h(d)\) is strictly positive for each \(d<d^*\) and strictly negative for each \(d>d^*\).

(iii) The above defined point \(d^*\) is equal to \( \max_{d\geq 0} \max_{v'\in V\setminus S} score_w(v',d)\).

This corollary easily follows from the previous lemma, and its proof is skipped. Notice that the value \(d^*\) is precisely what we need to find in our heuristic, and point (ii) establishes that we can find it with binary search. We define the explicit algorithm next.

Algorithm. CalculateMaxScore\((S,w)\)

  1. Compute the values \(0=d_0<d_1<d_2<\cdots<d_k\), where \(\{d_1,\cdots, d_k\} = \{supp_w(v): \ v\in S\}\).

  2. Let \(i_{lo}=0\), \(i_{hi}=k\), \(i=0\).

  3. For each \(n\in N\), compute \(b_n - \sum_{v \in S: \ supp_w(v) \leq d_i} w_{nv}\), and \(\sum_{v \in S: \ supp_w(v) > d_i} w_{nv} /supp_w(v)\).

  4. For each \(v'\in V\setminus S\), compute \(score_w(v',d_i)= \frac{\sum_{n\in N: \ nv'\in E} (b_n - \sum_{v \in S: \ supp_w(v) \leq d_i} w_{nv})}{1+\sum_{n\in N: \ nv'\in E} \sum_{v \in S: \ supp_w(v) > d_i} w_{nv} /supp_w(v) }.\)

  5. Let \((d_{\max}, v_{\max})=(\max, \arg\max)_{v'\in V\setminus S} score_w(v',d_i)\).

  6. Let \(i'\) be the highest value such that \(d_{max}\geq d_{i'}\), and set \(i_{\lo}\leftarrow \max\{i_{lo}, i'\}\). If \(d_{\max < d_i}\), set \(i_{\hi}\leftarrow i-1\).

  7. If \(i_{lo}< i_{hi}\), set \(i\leftarrow \lceil (i_{lo}+i_{hi})/2 \rceil\) and go back to 3.; else, return \((d_{\max}, v_{\max})\).

Lemma 4: CalculateMaxScore\((S,w)\) returns \(\max_{v'\in V\setminus S} score_w(v')\) and a \(v'\) that attains that score, in \(O(\log |S|)=O(\log m)\) iterations, where each iteration takes time \(O(|E|)\).

Proof: It is easy to verify that each iteration of the algorithm above executes in time \(O(|E|)\). Let \(d^*:=\max_{v'\in V\setminus S} score_w(v')=\max_{d\geq 0} \max_{v'\in V\setminus S} score_w(v',d)\). From point (i) in the corollary, we can reduce our search for \(d^*\) to only the evaluations of function \(\max_{v'\in V\setminus S} score_w(v',d)\) over the \(O(m)\) points \(d_i\). Moreover, by point (ii), we can perform binary search, so that if \(d_{\max}:=\max_{v'\in V\setminus S} score_w(v',d_i)\) is larger than \(d_i\) then we can restrict our search to values \(d\) larger than the current \(d_i\), and otherwise we can restrict our search to values smaller than the current \(d_i\). This shows that only \(O(\log m)\) iterations are performed. Finally, by point (iii) we can also restrict our search to values \(d\) larger than the current \(d_{\max}\), thus speeding up our search even more. To finish the proof, we remark that we choose to initialize the index \(i\) to zero because it seems to speed up the search in many implementations, but it can be initialized to any other value between \(i_{lo}\) and \(i_{hi}\).

\(\square\)

When we have the score, we can insert candidate \(v_{\max}\) to the current solution \((S,w)\) using InsertCandidate(\(S,w,v_{\max},d_{\max}\)), thus obtaining a new solution \((S',w')\) with \(supp_{w'}(S')\geq \min\{supp_w(S), \max_{v'\in V\setminus S} score_w(v')\}\), as desired.

(Parameterised) Proportional Justified Representation.

We can generalise the PJR property to our weighted votes setting and consider adding a parameter. For each nominator \(n\in N\), let \(V_n\subseteq V\) be the subset of candidates that are trusted by \(n\), i.e. \(V_n:=\{v\in V: \ nv\in E\}\).

Definition: A committee \(S\) (of any size) satisfies Proportional Justified Representation with parameter \(d\) (PJR(\(d\)) for short) if there is no subset \(N'\subseteq N\) of nominators and integer \(t>0\) such that:

a) \(\sum_{n\in N'} b_n \geq t\cdot d\),

b) \(|\cap_{n\in N'} V_n|\geq t\), and

c) \(|S\cap (\cup_{n\in N'} V_n)|<t\).

In other words, if there is a set \(N'\) of nominators who can “afford” to provide a support of \(d\) to each one of \(t\) commonly trusted candidates, they will indeed be represented by at least \(t\) candidates in \(S\) (though not necessarily commonly trusted). Notice that if a committee satisfies PJR(\(d\)), then it also satisfies PJR(\(d'\)) for each \(d'\geq d\), so the property gets stronger as \(d\) decreases. Notice also that a committee \(S\) satisfies the standard version of PJR if and only if it satisfies PJR(\(d\)) for \(d=\sum_{n\in N}b_n / |S|\).

Checking whether a committee \((S,w)\) satisfies standard PJR is known to be NP-hard. However, for any \(d\geq 0\) we can efficiently check whether \(\max_{v'\in V\setminus S} score_w(v')<d\), and this in turn implies PJR(\(d\)).

Lemma 5: If a set \(S\) (of any size) does not satisfy PJR(\(d\)) for some parameter \(d\) then, for any maximally affordable edge weight vector \(w\in\mathbb{R}^E_{\geq 0}\), there must be a candidate \(v'\in V\setminus S\) with \(prescore_w(v',d)\geq d\), and consequently \(score_w(v')\geq d\).

Proof: If \(S\) does not satisfy PJR(\(d\)), there must be a subset \(N'\subseteq N\) of nominators with a) \(\sum_{n\in N'}b_n \geq t\cdot d\), b) \(|\cap_{n\in N'} V_n|\geq t\), and c) \(|S\cap (\cup_{n\in N'} V_n)| \leq t-1\). Therefore, the set \((\cap_{n\in N'} V_n)\setminus S\) must be non-empty; let \(v'\) be a candidate in it. Fix a maximally affordable weight vector \(w\); we claim that \(prescore_w(v',d)\geq d\). We have

\[\begin{split}\begin{align} prescore_w(v',d) &= \sum_{n\in N: \ v'\in V_n} slack_w(n,d)\\ &\geq \sum_{n\in N'} slack_w(n,d)\\ &= \sum_{n\in N'} \Big(b_n - \sum_{v\in S\cap V_n} w_{nv}\cdot \min\{1, d/supp_w(v)\}\Big) \\ &\geq \sum_{n\in N'}b_n - \sum_{v\in S\cap(\cup_{n\in N'} V_n)}\min\{1,d/supp_w(v)\} \cdot \sum_{n\in N'} w_{nv}\\ &\geq t\cdot d - \sum_{v\in S\cap(\cup_{n\in N'} V_n)}\min\{1,d/supp_w(v)\} \cdot supp_w(v)\\ &\geq t\cdot d - \sum_{v\in S\cap(\cup_{n\in N'} V_n)}d\\ & \geq d\cdot (t - |S\cap(\cup_{n\in N'} V_n)|) \\ & \geq d, \end{align}\end{split}\]

where we used fact a) on the fifth line, and fact c) on the last line. This proves that \(prescore_w(v,d) \geq d\). The fact that \(score_w(v) \geq d\) follows from the definition of score. \(\square\)

Algorithm. TestThatImpliesPJR(\(S,w,d\))

  1. For each \(v \in S\) compute \(supp_w(v)\).

  2. For each \(n\in N\), compute \(slack(n,d)=b_n - \sum_{v\in S\cap V_n} w_{n,v}\cdot \min\{1,d/supp_w(v)\}.\)

  3. For each \(v'\in V\setminus S\), compute \(prescore_w(v',d)= \sum_{n\in N: \ v'\in V_n} slack(n,d) \; ,\) and if \(prescore_w(v',d)\geq d\), return false.

  4. Return true.

Lemma 6: Algorithm TestThatImpliesPJR(\(S,w,d\)) runs in time \(O(|E|)\), and if it returns true for a given solution \((S,w)\), then \(S\) satisfies PJR(\(d\)).

Local Search for provable PJR

From the previous lemma, it follows that for a maximally affordable solution \((S,w)\) and for any \(d\geq 0\), either the solution satisfies PJR(\(d\)), or there is a new candidate which can be inserted with support at least \(d\), and can be used to replace another candidate with low support. This observation naturally gives rise to the following local search procedure, for which we fix a small constant \(\varepsilon>0\).

Algorithm. LocalSearchForPJR(\(S,w, \varepsilon\))

  1. Let \((d_{current}, v) \leftarrow (\min, \arg\min)_{v\in S} supp_w(v)\).

  2. Let \(d_{next}\leftarrow \min\{(1+\varepsilon)\cdot d_{current}, \sum_{n\in N} b_n / m\}\).

  3. Let \((v_{\max}, d_{\max})\leftarrow CalculateMaxScore(S,w)\).

  4. If \(d_{\max}<d_{next}\), return \((S,w)\).

  5. Remove \(v\) from \(S\), and set \(w_{nv}\leftarrow 0\) for each \(n\in N\).

  6. Let \((S,w)\leftarrow InsertCandidate(S,w,v_{\max}, d_{\max})\), to add \(v_{\max}\) to \((S,w)\).

  7. Go to 1.

Theorem 7. If we run LocalSearchForPJR(\(S,w,\varepsilon\)) on any maximally affordable solution \((S,w)\) of size \(m\) and support \(d:=supp_w(S)\), then it returns a maximally affordable solution \((S',w')\) of size \(m\):

i) with \(d':=supp_{w'}(S')\geq d\), and

ii) satisfying PRJ(\(d''\)), where \(d'':=\min\{(1+\varepsilon)\cdot d', \sum_{n\in N} b_n / m\}\), and so also satisfying standard PJR.

Moreover, if the input has a \(c\)-factor approximation guarantee for the maximin support objective, for some parameter \(c>1\), then the algorithm performs \(m\cdot O(1+ varepsilon^{-1}\log c)\) iterations, where each iteration executes in time \(O(|E|\cdot m)\).

Proof. By the correctness of algorithm InsertCandidate(\(S,w,v_{\max}, d_{\max}\)), at the beginning of each iteration we have a maximally affordable solution of size \(m\), where the new \(d_{current}\) is larger than the minimum between \(d_{current}\) and \(d_{next}\) in the previous iteration. Now, notice that \(d_{current}\) can never be greater than \(\sum_{n\in N} b_n/m\), as the sum of supports cannot exceed the sum of budgets, and thus it is always the case that \(d_{next}\geq d_{current}\). This shows that the minimum support of the current solution never decreases throughout the iterations, and proves point i).

Next, if the algorithm eventually finalizes and outputs a solution \((S',w')\) at step 4. with minimum support \(d'\), then the current value of \(d_{next}\) is \(d''\) as defined in ii). By correctness of the algorithm CalculateMaxScore(\(S,w\)) we know that \(\max_{v'\in V\setminus S'} score_{w'}(v')<d''\), and so by Lemma 5 it must satisfy PJR(\(d'\)), and since \(d'\leq \sum_{n\in N} b_n/m\) then it also satisfies standard PJR.

It is easy to see that the complexity of each iteration is dominated by the execution of algorithm InsertCandidate(\(S,w,v_{\max}, d_{\max}\)) at step 6., with a running time of \(O(|E|\cdot m)\). So, it only remains to prove that the algorithm terminates after only \(O(m\varepsilon^{-1}\log c)\) iterations. To do that, we analyze the evolution of the parameter \(d_{current}\): First, we argue that if \(d_{current}=\sum_{n\in N}b_n/m\), then the algorithm terminates in the current iteration. This is because all nominators have zero slack, so all candidates in \(V\setminus S'\) have zero score as well, and \(d_{\max}=0\) so the condition at step 4. is fulfilled.

Next, we claim that if the value of \(d_{current}\) at iteration \(i\) is \(d^i_{current}\), then either the algorithm terminates at iteration \(i+m\) at the latest, or \(d^{i+m}_{current}\geq (1+\varepsilon)\cdot d^i_{current}\). This is because \(d^i_{next}=\min\{(1+\varepsilon)\cdot d^i_{current}, \sum_{n\in N} b_n/m\}\), and in each iteration after \(i\) we are removing one candidate of least support while not adding any candidate with support under \(d^i_{next}\). As there are only \(m\) candidates, we can do this at most \(m\) times before the minimum support reaches the value \(d^i_{next}\). This value is either at least \((1+\varepsilon)\cdot d^i_{current}\), or it is \(\sum_{n\in N} b_n/m\) and in the latter case we terminate immediately.

Finally, if the algorithm executes a total of \(t\) iterations before returning a solution with minimum support \(d'\), and the minimum support \(d\) of the input is \(c\)-approximation to the maximin support problem, then

\[c\cdot d \geq d'=d_{current}^t\geq (1+\varepsilon)^{\lfloor \frac{t-1}{m}\rfloor} \cdot d_{current}^1= (1+\varepsilon)^{\lfloor \frac{t-1}{m}\rfloor} \cdot d.\]

We conclude that \(\lfloor\frac{t-1}{m}\rfloor \log_{1+\varepsilon} c\), and so \(t \leq m(1+\log_{1+\varepsilon} c)=m\cdot O(1+ \varepsilon^{-1} \log c)\)

\(\square\)

Factor 3.15 approximation algorithm

We propose a greedy algorithm that starts with an empty set and runs \(m\) iterations, where each iteration uses our heuristic to insert a new validator and then runs a weight redistribution algorithm over the current set.

In particular, for a given solution \((S,w)\) with \(|S|\leq m\), we run a weight rebalancing algorithm that computes an \(\varepsilon\)-approximation of the min-norm max-flow (MNMF) weight vector for set \(S\). We formalize this definition below.

Definition: For a non-empty validator set \(S\subseteq V\) and a constant \(\varepsilon>0\), an edge weight vector \(w\in\mathbb{R}_{\geq 0}^E\) is an \(\varepsilon\)-MNMF for \(S\) if

(i) \(w\) is maximally affordable, i.e. \(\sum_{v\in S: \ nv\in E} w_{nv}=b_n\) for each \(n\in N\) having at least one neightbor in \(S\),

(ii) for any \(n\in N\) and any two neighbors \(v,v'\in S\) of it, if \(w_{nv}>0\) then \(supp_w(v)\leq (1+\frac{\varepsilon}{5\cdot|S|})supp_{w}(v')\), and

(iii) For all affordable \(w'\), \(supp_{w'}(S)\leq (1+\epsilon)supp_w(S)\).

In our note on the min-norm max-flow problem, we provide more information about \(\varepsilon\)-MNMF vectors and present an algorithm MNMF(\(S,w,\varepsilon\)) that returns an \(\varepsilon\)-MNMF for \(S\) in polynomial time, and where the input vector \(w\) is optional.

Consider the following algorithm.

Algorithm. BalanceBetweenHeuristic()

  1. Initialise \(S\) to the empty set and \(w\) to the empty vector;

  2. For \(i\) from \(1\) to \(m\):

    • Let \((v,d)\leftarrow CalculateMaxScore(S,w)\);

    • Update \((S,w)\leftarrow InsertCandidate(S,w,v,d)\);

    • Update \(w \leftarrow MNMF(S,w,\varepsilon)\);

  3. Return \((S,w)\).

Analysis

The main result of the section is showing that the previous algorithm offers a \(3.15\cdot (1+\varepsilon)\)-factor approximation, and satisfies PJR.

Theorem 8: The procedure \(BalanceBetweenHeuristic()\) returns a solution \((S, w)\) for which \(supp_{w}(S) \geq d^*/(3.15\cdot (1+\varepsilon))\), where \(d^*\) is the maximin support across all solutions of the given NPoS election instance. Moreover, the solution satisfies PJR(\((1+\varepsilon)\cdot supp_{w}(S)\)) and, if \(\varepsilon \leq 1/m\), PJR.

We begin with a couple of needed technical result.

Observation. For any \(0\leq \varepsilon\leq 1\) and any \(m\geq 1\), we have the inequality

\[\Big(1+\frac{\varepsilon}{5m}\Big)^{m} \leq 1+\frac{\varepsilon}{4}.\]

Proof: The inequality \(1+x\leq e^x\) holds for any real \(x\). Replacing \(x\) by \(\varepsilon/(5m)\) and raising both sides to the power \(m\), we obtain

\[\Big(1+\frac{\varepsilon}{5m}\Big)^m \leq (e^{\frac{\varepsilon}{5m}})^m = e^{\frac{\varepsilon}{5}}.\]

Finally, since the function \(f(\varepsilon):=e^{\varepsilon/5}\) on the right-hand side is convex, within the range \(0\leq \varepsilon \leq 1\) it can be upper bounded by the linearization \(1+(e^{1/5} -1)\varepsilon\). It can be checked that \(e^{1/5} - 1 \leq 1/4\), and the claim follows.

\(\square\)

Proposition 9: Let \((S,w)\) be a solution where \(|S|< m\) and \(w\) is an \(\epsilon\)-MNMF of \(S\) for some \(0\leq \varepsilon \leq 1\), and let \(d^*\) be as in Theorem 8. Then, there exists a non-empty set \(T\subseteq V\setminus S\) with the property that for each \(0\leq a\leq 1\), there is a set of nominators \(N_a\subseteq N\) such that

a) each \(n\in N_a\) has a neighbor in \(T\),

b) \(\sum_{n\in N_a}\geq (1-a)\cdot |T|\cdot d^*\), and

c) For each \(v\in S\) such that \(w_{nv}>0\) for some \(n\in N_a\), we have that \(supp_{w}(v)\geq \frac{a\cdot d^*}{1+\varepsilon/4}\).

Proof. Let \(m':=|S|\), where \(m'<m\). Let \((S^*, w^*)\) be an optimal size-\(m\) solution to the maximin support problem, so that \(supp_{w^*}(S^*)=d^*\). We define set \(T\) as \(T:=S^*\setminus S\), which is clearly non-empty. Fix a parameter \(0\leq a\leq 1\).

We have by the previous observation that \((1+\frac{\varepsilon}{5m'})^{m'} \leq 1+ \frac{\varepsilon}{4}\). By the pigeonhole principle, there must be an integer \(0\leq i\leq m'\) such that \(\nexists v\in S\) with

\[a\cdot d^*\Big(1+\frac{\varepsilon}{5m'}\Big)^{i}\Big(1+\frac{\varepsilon}{4}\Big)^{-1} \leq supp_w(v) <a\cdot d^*\Big(1+\frac{\varepsilon}{5m'}\Big)^{i+1}\Big(1+\frac{\varepsilon}{4}\Big)^{-1}.\]

Let \(d_l\) and \(d_u\) be respectively the lower and upper bounds above, and notice that \(a\cdot d^*/(1+ \varepsilon/4) \leq d_l\leq a\cdot d^*\) . Define the sets \(S_l\) and \(S_u\) as containing the validators \(v\) such that \(supp_w(v)< d_l\) and \(supp_w(v)\geq d_u\), respectively. By the definition of \(d_l\) and \(d_u\), we know that \(S_l\) and \(S_u\) partition the set \(S\).

Going forward, partition the set \(N=N_l \cup N_u\), where \(N_l\) contains the \(n\in N\) that have a neighbor in \(S_l\), and \(N_u\) contains those that do not. We highlight some properties of these sets.

* $\nexists n\in N_l, v\in S_u$ such that $w_{n,v}>0$: Assuming otherwise that there is such a pair, the fact that $n$ is in $N_l$ implies that it has a neighbor $v'\in S_l$. But then we can apply the definition of $\varepsilon$-MNMF to obtain $supp_w(v)/supp_w(v')\leq 1+\epsilon/(5m')=d_u/d_l$, which contradicts the definitions of $S_l$ and $S_u$.

* By the fact that $w$ is maximally affordable, and that each nominator in $N_l$ has neighbors in $S_l$ but gives no support to $S_u$, we have

\(\sum_{n\in N_l} b_n \leq \sum_{v\in S_l} supp_w(v)< |S_l|\cdot d_l.\)

* If we define $N_a\subseteq N_u$ as those $n\in N_u$ that have a neighbor in $T$, then claim a) becomes evident, and claim c) follows from the fact that $N_a$ has no neighbors in $S_l$ (by def. of $N_u$) and all of its neighbors in $S_u$ have a support of at least $d_u>d_l\geq ad^*/(1+\epsilon/4)$. Hence, it only remains to prove claim b).

Assume without loss of generality that \(w^*\) provides a support of exactly \(d^*\) to each \(v\in S^*\), by possibly capping its edge weights, and define the edge weight vector \(w'\) by capping \(w\) arbitrarily such that \(supp_{w'}(v)=d_l\) if \(v\in S_u\), and \(0\) otherwise (these vectors are affordable but in general not maximally affordable). Define \(f:=w^*-w'\in\mathbb{R}^E\), which we consider as a flow over the network induced by \(N\cup S\cup S^*\).

Clearly, the net excess of set \(N\) relative to \(f\) is \(md^* - |S_u|d_l\\), the net excess of set $N_l$ is at most $\sum_{n\in N_l} b_n < |S_l|d_l$, and the net demand of set $S_u$ is \(|S^*\cap S_u|(d^*-d_l)\). By subtracting the last two terms from the first one, we obtain that the flow going from \(N_u\) to \(S^*\setminus S_u\) is at least

\[\begin{split}\begin{align} &md^* - |S_u|d_l - |S_l|d_l - |S^*\cap S_u|(d^*-d_l) \\ &=md^* - m'd_l - |S^*\cap S_u|(d^*-d_l) \\ & \geq m(d^*-d_l) - |S^*\cap S_u|(d^*-d_l) \\ &\geq |S^* \setminus S|(d^*-d_l) \\ &\geq |T|d^*(1-a). \end{align}\end{split}\]

A key observation now is that none of the flow originating at \(N_u\) can pass by, or end in, \(S_l \cup N_l\). This is because there are no edges between \(N_u\) and \(S_l \cup N_l\), by definition of \(N_u\); and even though the flow can pass by \(S_u\), there is no flow from \(S_u\) to \(S_l\cup N_l\) in \(f=w^*-w'\) because \(w'\) provides no flow from \(S_l\cup N_l\) to \(S_u\). Therefore, the formula above is actually a lower bound on the flow going from \(N_u\) to \(T=S^*\setminus S\). Finally, we notice that if we decompose flow \(f\) into simple paths, any path from \(N_u\) to \(T\) must have the last edge originating in \(N_u\), or more specifically in \(N_a\). This proves that \(\sum_{n\in N_a} b_n \geq |T|d^*(1-a)\), which is claim b), and concludes the proof of the proposition.

\(\square\)

Lemma 6: Given a solution \((S,w)\) with \(|S| \leq m\) that satisfies Condition 1, there exists a \(v \notin S\) with \(score_w(v) \geq d^*/4(1+\epsilon/2)\).

Proof: Apply Proposition 1 and set \(a=1/2\), then by using that for any \(a_1,\dots, a_n\), there exists an \(a_i\) with \(a_i \geq \sum_i a_i/n\), there is a \(v \in T\) such that the set \(A_{v,d^*/2}:=\Big\{n\in N \ | \ n,v \in E \text{ and }\forall v'\in V, w_{n,v'} > 0 \Rightarrow supp_{w}(v') \geq \frac{d^*}{2(1+\epsilon/2)} \Big\}\)

has \(\sum_{n \in A_{v,d^*/2}} b_n \geq d^*/2\). Now for any \(n \in A_{v,d^*/2}\), \(slack(n,d^*/4(1+\epsilon/2))= \sum_{v:(n,v) \in E} w_n,v (1-d^*/4(1+\epsilon/2) supp_w(v)) \geq b_n/2\) and so \(prescore(v,d^*/4(1+\epsilon/2)) \geq \sum_{n \in A_{v,d^*/2}} slack(n, d^*/4(1+\epsilon/2)) \geq d^*/4 > d^*/4(1+\epsilon/2)\). Thus \(score_w(v) \geq d^*/4(1+\epsilon/2)\).

We can do better than this by using different \(a\).

Lemma 7 Given a solution \((S,w)\) with \(|S| \leq m\) that satisfis Condition 1, there exists a \(v \notin S\) with \(score_w(v) \geq d^*/3.15(1+epsilon/4)\).

Proof: The following will be crucial:

Lemma 8 Consider a finite sum $ \sum_i f(x_i) a_i$, where \(f\) is strictly increasing with derivative \(f'(x)\), \(a_i \geq 0\) for all \(i\) and for some \(y \leq \min_i x_i\), \(f(y) = 0\), then \(\sum_i f(x_i) a_i = \int_{y}^infty f'(x) (\sum_{i:x_i \geq x} a_i) dx\)

Proof: We can write the sum as a Lebesgue integral over the measure with weights \(a_i\), obtaining that: \(\sum_i f(x_i) a_i = \int_0^\infty (\sum_{i:f(x_i) >t} a_i ) dt\) The conditions on \(f\) are enough for it to be invertible with derivative \(df^{-1}/dt=1/f'(f^{-1}(t))\) and \(f^{-1}(0)=y\), so we can substiture \(x=f^{-1} t\) into the above to obtain: \begin{align} \sum_i f(x_i) a_i & = \int_0^\infty (\sum_{i:f(x_i)} >t a_i ) dt 92 &= \int_y^\infty f’(x) (\sum_{i:f(x_i) >f(x)} a_i ) dx 92 &= \int_y^\infty f’(x) (\sum_{i:x_i > x} a_i ) dx 92 &= \int_y^\infty f’(x) (\sum_{i:x_i \geq x} a_i ) dx \end{align}

For sume \(0 \leq b \leq 1\), to be determined later, we consider \(\sum_{v \in T} prescore(w,b d^*)\), we have \(\sum_{v \in T} prescore(w,b d^*) \geq \sum_{n: \exists v \in T, (n,v) \in E} slack(n, b d^*) \geq \sum_{n \in N_b} slack(n,b d^*) \; .\)

Now for \(n \in N_b\), let \(supp(n)= \max_{v':w_{n,v'} > 0} supp_w(n)\) and \(supp(n)=\infty\) if \(w_{n,v'}=0\) for all \(v'\). We certainly have \(supp(n) \geq b d^*\) from the definition of \(N_b\). More generally for \(n \in N_b\), \(n \in N_a\) if and only if \(supp(n) \geq a d^*\). We have \(slack(n, b d^*) = b_n - \sum_{v' \in S} w_n,v bd^*/ supp_w(v') \geq b_n (1- bd^*/supp(n))\) and so using Lemma 8, \begin{align} \sum_{v \in T} prescore(w,b d^) & \geq \sum_{n \in N_b} slack(n,b d^) 92 & \geq \sum_{n \in N_b} b_n (1- bd^/supp(n)) 92 &= \int_{b d^}^{infty} (bd^/x^2) (\sum_{n \in N_{x/d^}} b_n) dx 92 & \geq \int_{b}^{1} (b/a^2) (1-a)|T| d^/(1+\epsilon/2) da 92 & = (bd^|T|/(1+\epsilon/2) \int_{b}^1 1/a^2 - 1/a da 92 &= (bd^|T|/(1+\epsilon/2) (1/b-1+ln b) 92 &= (d^|T|/(1+\epsilon/3) (1-b+bln b) \end{align} So for any \(b\) with \(b(2-ln b) \leq 1\), we have that there is an \(v \in T\) with \begin{align} prescore_w(b, b d^/(1+\epsilon/4)) & \geq prescore_w(b, b d^) 92 & \geq (d^/(1+\epsilon/4) (1-b+bln b) 92 & \geq b d^/(1+\epsilon/4) \end{align} In particular, this holds for \(b=1/3.15\). Thus there exists a \(v \notin S\) with \(score_w(v) \geq d^*/3.15(1+\epsilon/4)\).

Now we can prove Theorem 1

Proof of Theorem 1: For a set \(S\), we define \(d^*_S\) to be the maximum over afforable \(w\) of \(supp_w(S)\).In order not to lose error with condition 1 (ii), we need:

Lemma 9: Let \((S,w)\) be a solution with \(|S| \leq m\) that satisfies Condition 1. Let \((S',w')\) be a solution of any size with \(S \subseteq S'\) and for all \(v \in S\) with \(supp_{w'}(v) < sepp_w(v)\), we have \(supp_{w'}(v) \geq d\) for some \(d\). Then \(d^*_{S'}\) defined similarly has \(d^*_{S'} \geq \min \{d^*_S, d/(1+\epsilon/4)\}\).

Proof: Note that \((1+\epsilon/12m)^{m+1}\) \leq (1+\epsilon/4)$. Since there are at most \(m\) values of \(supp_w(v)\) for \(v \in S\), by the pidgeon hole principle, there must be an \(x=d/(1+\epsilon/12m)^i\) for \(1 \leq i \leq m\) such that no \(v \in S\) has \(x < supp_w(v) \leq x(1+\epsilon/12m)\). Let \(X \subseteq S\) be the set of \(v \in S\) with \(supp_v(w) \leq x\).

Let \(N_X=\{n:\exists v \in X, (n,v) \in E\}\). By condition 1, any \(n \in N_X\) has \(w_{n,v'}=0\) for \(v' \notin X\). By our construction, we also have \(w'_{n,v'} =0\) for \(v \notin X\).

If \(X\) is empty, then \(d^*_S \geq supp_w(S) \geq x \geq d/(1+\epsilon/3)\) and the construction gives a \(w'\) with \(d^*_{S'} \geq supp_{w'}(S') \geq d/(1+\epsilon/3)=\min \{d^*_S, d/(1+\epsilon/3)\}\) and we are done.

If \(X\) is non-empty, then we claim that \(d^*_S=d^*_{S'}=d^*_X\). Let \(w^*_S\),\(w^*_{S'}\) and \(w^*_X\) be affordable assignments that achieve these. Then by optimality of \(d^*_X\), \(d^*_S \leq supp_{w^*_S}(X) \leq d^*_X\) and similarly \(d^*_{S'} \leq d^*_X\). On the other hand, we have that \(|X| d^*_X \leq \sum_{v \in X} supp_{w^*_X} \leq \sum_{n \in N_X} b_n = \sum_{v \in X} supp_{w} (X) \leq |X| x\) Now consider modifying \(w\) or \(w'\) by setting \(w_{n,v}\) for $v \in X, \(n \in N_X\) to be \((w^*_X)_{n,v}\). Since \(n \in n_X\) had \(w_{n,v}=w'_{n,v} =0\) for \(v \notin X\), this is still affordable and the supports for \(v \notin X\) remain the same, that is \(> x\). So these have minimum support \(d^*_X\), and we have \(d^*_{S'}=d^*_S\), which gives the the lemma.

Now we claim inductively that \(d^*_{S_i} \geq d^*/3.15(1+\epsilon/4)^2\). Lemma 7 implies that there is a \(v \notin S_{i-1}\) with \(score_w(v) \geq d^*/3.15(1+\epsilon/4)\). We add this to \(w\), while not reducing the support of any \(v\) to below \(d^*/3.15(1+\epsilon/4)\). Lemma 9 gives that \(d_{S_i} \geq d^*/3.15(1+\epsilon/4)^2\) The induction gives that \(d^*_{S_m} \geq d^*/3.15(1+\epsilon/4)^2\). Stare balances gives a \(w_m\) that satiusfies condtion 1 (ii) and so \(supp_{w_m}(S_m) \geq d^*_{S_m}/(1+\epsilon/4) \geq d^*/3.15(1+\epsilon/4)^3 \geq d^*/3.15(1+\epsilon)\).

For the PJR claim, suppose that \(S_m\) does not satify PJR(\(d\)) for some \(d\). Then nor do any \(S_i\), since they are subsets of \(S_m\). So by Lemma 5, for every \(0 \leq i \leq m-1\), there is a \(v_i \notin S_i\) with \(score_{w_i}(v_i) \geq d\). The same argument as above gives that \(supp_{w_m}(S_m) \geq d/(1+\epsilon/4)^2 < d/(1+\epsilon)\). It follows that \(S_m\) does satisfy PJR(\(supp_{w_m}(S_m)(1+epsilon)\)).

Suppose that \(S_m\) does not satisfy PJR but \(\epsilon \leq 1/m\). Then by Lemma 5, there exists a \(v \notin S\) with \(score_{w_n}(v) \geq \sum_n b_n/m\). By Lemmas 2 and 1, we can run InsertNewCandidate(\(w_n, score_{w_n}(v)\)), to obatian a \(w'\) such that \(\supp_{w'}(S_m \cup \{v\}) = \min \{ score_{w_n}(v), supp_{w_m}(S_m) \}\). But since \(|S_m \cup \{v\}|=m+1\), we must have \(\supp_{w'}(S_m \cup \{v\}) \leq \sum_n b_n/(m+1)\). Thus we get that \(supp_{w_m}(S_m) \leq \sum_n b_n/(m+1)\). So if \(\epsilon \leq 1/m\), then \(supp_{w_m}(S_m) (1+\epsilon) \leq supp_{w_m(S_m) (m+1)/m \leq sum_n bn /m\) and since PJR (\(d\)) implies PJR(\(d'\)) for \(d' > d\), we have PJR. This means that \(\epsilon \leq 1/m\) implies PJR.