# 5. A Phragmén-like Heuristic
The [sequential Phragmén's method](4. Sequential Phragmén’s method.md) is fast, seems to work well in practice, and gives a solution that satisfies the Proportional Justified Representation (PJR) axiom. However, we identify two problems with it. First, it does not give a constant factor approximation to the [maximin support objective](3. The maximin support problem.md). Second, it lacks versatility, in the sense that it cannot be used to improve upon an arbitrary solution given as input. We describe a heuristic closely related to the sequential Phragmén's method, which takes as input an arbitrary partial solution, defines scores over the validator candidates, and uses them to add a new validator to the solution, removing another validator if necessary.
Checking if a committee satisifies the PJR property is NP-hard. However, we define a stronger property called PJR($d$), which depends on a parameter $d$. We show that it implies PJR (for an adecuate value of $d$), and that it can checked efficiently, using our heuristic. We also build from our heuristic a "PJR($d$)-enabler"; that is, an algorithm that takes as input a solution of minimum support $d$, performs some swaps using the heuristic, and returns a solution of minimum support at least $d$ and observing PJR($d$) (and also PJR). Finally, we provide an efficient factor-3.15 approximation algorithm, which starts with an empty set and alternates between using our heuristic to elect a new validator, and running a flow balancing algorithm to improve the support distribution on the current set.
## Notation
Recall that an instance of the NPoS election problem is a graph $(N\cup V, E)$ representing the trust relations between nominators and validator candidates, a vector $b\in\mathbb{R}_{\geq 0}^N$ of nominator budgets, and a target number $m$ of validators to elect.
A support distribution from nominators to validators is represented by a vector \(w\in\mathbb{R}_{\geq 0}^E\) of edge weights. Such a vector is called *affordable* if, besides non-negativity constraints, it observes the budget constraints \(\sum_{v\in V: nv\in E} w_{nv}\leq b_n\) for each nominator \(n\in N\). Furthermore, it is called *maximally affordable* with respect to a validator set \(S\subseteq V\) if \(\sum_{v\in S: \ nv\in E} w_{nv} = b_n\) for each \(n\in N\) having at least one neighbor in \(S\). By a *solution* we mean a pair \((S,w)\) where \(S\subseteq V\) with \(|S|\leq m\) and \(w\) is maximally affordable for $S$. Our algorithm will return solutions of size \(m\), but we will often consider solutions of smaller sizes in intermediate steps and in the analyses.
Given an edge weight vector \(w\) and a validator \(v\in V\), the *support* of $v$ relative to $w$ is \(supp_w(v):=\sum_{n: nv\in E} w_{nv}\). For a solution $(S,w)$, we extend this definition to \(supp_w(S):=\min_{v\in S} supp_w(v)\). The *maximin support* objective is maximizing \(supp_w(S)\) over all feasible solutions \((S,w)\) to the given instance.
## The basic heuristic
Suppose that $(S,w)$ is a maximally affordable solution with $|S|\leq m$, and we wish to add a new candidate $v'\in V\setminus S$ and give it some support $d$. To do this, in general we will need to reduce the support of any other candidate $v \in S$ who has a neighbor nominator $n$ in common with $v'$. When we do this, we want to ensure that we do not reduce the support of $v$ below $d$ (assuming it was peviously above $d$). A simple way to ensure this is to reduce the weight on edge $nv$ from $w_{nv}$ to $w_{nv}\cdot(d/supp_w(v))$, and assign the difference to edge $nv'$. That way, it is clear that even if all nominators $n$ supporting $v$ are also neighbors of $v'$, the new support of $v$ does not go below $d$.
Thus, if for each $n\in N$ and $d\geq 0$ we define the nominator's slack as
\begin{align}
slack_w(n,d):= & \begin{cases}
b_n & \text{if }\nexists v\in S: nv \in E\\
\sum_{v\in S: \ nv\in E, \ supp_w(v)> d} w_{n,v}(1-d/supp_w(v)) & \text{otherwise }
\end{cases}\\
= & b_n - \sum_{v\in S: \ nv\in E} w_{nv}
\cdot\min\Big\{1, \frac{d}{supp_w(v)}\Big\}
\end{align}
and for each $v'\in V\setminus S$ and $d\geq 0$ we define that candidate's pre-score as
$$prescore_w(v',d) := \sum_{n\in N: \ nv' \in E} slack_w(n,d)$$
then we can add $v'$ to the solution with support $prescore_w(v,d)$, while not making any other validator's support decrease from over $d$ to under $d$. In particular, if $prescore_w(v',d)\geq d$, the new solution $(S \cup \{v'\},w')$ has \(supp_{w'}(S \cup \{v'\}) \geq \min\{supp_w(S),d\}\).
The resulting heuristic, which adds a new candidate to the initial solution, is formalized below.
**Algorithm: InsertCandidate\((S,w,v',d)\)**
1. Let \(S'\leftarrow S \cup \{v\}\) and $w'\leftarrow w$.
2. For all $n$ with $nv' \in E$:
* If $\nexists v\in S$ with $nv\in E$, set \(w'_{nv'}\leftarrow b_n\), otherwise set \(w'_{nv'}\leftarrow 0\);
* For all $v\in S$ with $w_{nv} > 0$ and $supp_w(v) > d$: increase \(w'_{nv'}\) by \(w_{nv}(1-d/supp_w(v))\), and set \(w'_{nv} \leftarrow w_{nv}(d/supp_w(v))\);
3. Return \((S',w')\).
The next result follows from the definitions and the discussion above and its proof is skipped.
**Lemma 1:** If we run InsertCandidate($S,w,v',d$) for some maximally affordable solution $(S,w)$ with $|S|\leq m$, $v'\in V\setminus S$ and $d\geq 0$ to get $(S',w')$, then
* \((S',w')\) is maximally affordable,
* \(supp_{w'}(v')=prescore_w(v',d)\),
* for all $v\in S$ we have that \(supp_{w'}(v)\geq \min\{d, supp_w(v)\}\), and consequently if \(prescore_w(v',d)\geq d\), we obtain
$$supp_{w'}(S') \geq \min \{d, supp_{w}(S) \},$$
* the running time of the algorithm is $O(|E|\cdot |S|)=O(|E|\cdot m)$.
How high can we make $d$ and have the property $prescore_w(v',d)\geq d$ hold? We define $score_w(v')$ to be the maximum $d$ such that $prescore_w(v',d) \geq d$. Our heuristic now becomes apparent.
**Heuristic:** *Given a maximally affordable solution $(S,w)$ with $|S|\leq m$, find the candidate $v'\in V\setminus S$ maximizing $score_w(v')$ and execute InsertCandidate($S,w,v',score_w(v')$), so that the new solution $(S',w')$ observes*
$$supp_{w'}(S')\geq \min\{supp_w(S),\max_{v'\in V\setminus S} score_w(v')\}.$$
This is the core idea of our method. In the remainder of the section we establish how to find the candidate with the largest score efficiently.
Fix a candidate $v'\in V\setminus S$, and consider the function $f(d):=prescore_w(v',d)-d$ in the interval \([0,\infty)\). Notice that this function is continuous and strictly monotone decreasing, and that $score_w(v')$ corresponds to its unique root. We can therefore approximate this root with binary search, as binary search works on any monotonic function. However, we can do better. Sort the set of support values \(\{supp_w(v): \ v\in S\}=\{d_1,d_2,\cdots,d_k\}\) so that $d_1 d^*} w_{nv} /supp_w(v) }.$$
The interesting thing about the previous identity is that the right-hand side stays constant if we replace $d^*=score_w(v')$ by any other value $d$ within the same interval, among the above-defined intervals. This motivates us to define the following score function, for any $v'\in V\setminus S$ and $d\geq 0$:
$$score_w(v',d)
:=\frac{\sum_{n: \ nv'\in E} (b_n - \sum_{v \in S: \ supp_w(v) \leq d} w_{nv})}
{1+\sum_{n: \ nv' \in E} \sum_{v \in S: \ supp_w(v) > d} w_{nv} /supp_w(v) }.$$
Function $score(v',d)$ is very similar to, but algorithmically more convenient than function $prescore(v',d)$. We remark that for $d=0$, the expression for $1/score_w(v',0)$ corresponds to the notion of score in sequential Phragmén's method. Hence, the latter can be seen as a special case of our approach.
**Lemma 2.** Fix a maximally affordable solution $(S,w)$ and a candidate $v'\in V\setminus S$:
(i) Function $score_w(v',d)$ is piece-wise constant with respect to $d$, namely it is constant in each of the intervals $[0,d_1), \ [d_1, d_2), \cdots, [d_k, \infty)$, where the values $d_1d^*\).
(iii) The above defined value \(d^*\) is equal to \(\max_{d\geq 0} score_w(v',d) \).
*Proof.* Let $num(d)$ and $denom(d)$ be respectively the numerator and the denominator in the definition of $score_w(v',d)$.
It is easy to check that if $d_id^*\) and \(score_w(v',d)\neq d^*\), then
\begin{align}
\frac{num(d^*) - num(d)}{denom(d^*) - denom(d)}
= \frac{\sum_{n: \ nv'\in E} \sum_{v\in S: \ d^* d^*.
\end{align}
Therefore,
$$num(d^*) - num(d) > d^*( denom(d^*) - denom(d)) = num(d^*) - d^*\cdot denom(d),$$
and thus we conclude that $d^* > num(d)/denom(d) = score(v',d)$.
$\square$
**Corollary 3.** Fix a maximally affordable solution $(S,w)$. Then,
(i) Function \(\max_{v'\in V\setminus S} score_w(v',d)\) is constant in each of the above-defined intervals $[0,d_1), \ [d_1, d_2), \cdots, [d_k, \infty)$.
(ii) \(d^*:= \max_{v'\in V\setminus S} score_w(v')\) is the unique root of function \(h(d):=\max_{v'\in V\setminus S} score_w(v',d) - d\); moreover, $h(d)$ is strictly positive for each \(dd^*\).
(iii) The above defined point \(d^*\) is equal to \( \max_{d\geq 0} \max_{v'\in V\setminus S} score_w(v',d)\).
This corollary easily follows from the previous lemma, and its proof is skipped. Notice that the value \(d^*\) is precisely what we need to find in our heuristic, and point (ii) establishes that we can find it with binary search. We define the explicit algorithm next.
**Algorithm. CalculateMaxScore\((S,w)\)**
1. Compute the values \(0=d_0 d_i} w_{nv} /supp_w(v)$.
4. For each $v'\in V\setminus S$, compute $$score_w(v',d_i)= \frac{\sum_{n\in N: \ nv'\in E} (b_n - \sum_{v \in S: \ supp_w(v) \leq d_i} w_{nv})}{1+\sum_{n\in N: \ nv'\in E} \sum_{v \in S: \ supp_w(v) > d_i} w_{nv} /supp_w(v) }.$$
5. Let \((d_{\max}, v_{\max})=(\max, \arg\max)_{v'\in V\setminus S} score_w(v',d_i)\).
6. Let $i'$ be the highest value such that \(d_{max}\geq d_{i'}\), and set \(i_{\lo}\leftarrow \max\{i_{lo}, i'\}\). If \(d_{\max < d_i}\), set \(i_{\hi}\leftarrow i-1\).
7. If $i_{lo}< i_{hi}$, set $i\leftarrow \lceil (i_{lo}+i_{hi})/2 \rceil$ and go back to 3.; else, return \((d_{\max}, v_{\max})\).
**Lemma 4:** CalculateMaxScore\((S,w)\) returns \(\max_{v'\in V\setminus S} score_w(v')\) and a $v'$ that attains that score, in $O(\log |S|)=O(\log m)$ iterations, where each iteration takes time $O(|E|)$.
*Proof:*
It is easy to verify that each iteration of the algorithm above executes in time $O(|E|)$. Let \(d^*:=\max_{v'\in V\setminus S} score_w(v')=\max_{d\geq 0} \max_{v'\in V\setminus S} score_w(v',d)\). From point (i) in the corollary, we can reduce our search for \(d^*\) to only the evaluations of function \(\max_{v'\in V\setminus S} score_w(v',d)\) over the $O(m)$ points $d_i$. Moreover, by point (ii), we can perform binary search, so that if \(d_{\max}:=\max_{v'\in V\setminus S} score_w(v',d_i)\) is larger than $d_i$ then we can restrict our search to values $d$ larger than the current $d_i$, and otherwise we can restrict our search to values smaller than the current $d_i$. This shows that only $O(\log m)$ iterations are performed. Finally, by point (iii) we can also restrict our search to values $d$ larger than the current $d_{\max}$, thus speeding up our search even more. To finish the proof, we remark that we choose to initialize the index $i$ to zero because it seems to speed up the search in many implementations, but it can be initialized to any other value between $i_{lo}$ and $i_{hi}$.
$\square$
When we have the score, we can insert candidate $v_{\max}$ to the current solution $(S,w)$ using InsertCandidate(\(S,w,v_{\max},d_{\max}\)), thus obtaining a new solution \((S',w')\) with \(supp_{w'}(S')\geq \min\{supp_w(S), \max_{v'\in V\setminus S} score_w(v')\}\), as desired.
## (Parameterised) Proportional Justified Representation.
We can generalise the PJR property to our weighted votes setting and consider adding a parameter. For each nominator $n\in N$, let $V_n\subseteq V$ be the subset of candidates that are trusted by $n$, i.e. \(V_n:=\{v\in V: \ nv\in E\}\).
**Definition:** A committee $S$ (of any size) satisfies Proportional Justified Representation with parameter $d$ (PJR($d$) for short) if there is no subset $N'\subseteq N$ of nominators and integer $t>0$ such that:
a) $\sum_{n\in N'} b_n \geq t\cdot d$,
b) $|\cap_{n\in N'} V_n|\geq t$, and
c) $|S\cap (\cup_{n\in N'} V_n)|0$.
**Algorithm.** LocalSearchForPJR($S,w, \varepsilon$)
1. Let \((d_{current}, v) \leftarrow (\min, \arg\min)_{v\in S} supp_w(v)\).
2. Let \(d_{next}\leftarrow \min\{(1+\varepsilon)\cdot d_{current}, \sum_{n\in N} b_n / m\}\).
3. Let \((v_{\max}, d_{\max})\leftarrow CalculateMaxScore(S,w)\).
4. If $d_{\max}1$, then the algorithm performs $m\cdot O(1+ varepsilon^{-1}\log c)$ iterations, where each iteration executes in time $O(|E|\cdot m)$.
*Proof.* By the correctness of algorithm InsertCandidate($S,w,v_{\max}, d_{\max}$), at the beginning of each iteration we have a maximally affordable solution of size $m$, where the new $d_{current}$ is larger than the minimum between $d_{current}$ and $d_{next}$ in the previous iteration. Now, notice that $d_{current}$ can never be greater than $\sum_{n\in N} b_n/m$, as the sum of supports cannot exceed the sum of budgets, and thus it is always the case that $d_{next}\geq d_{current}$. This shows that the minimum support of the current solution never decreases throughout the iterations, and proves point i).
Next, if the algorithm eventually finalizes and outputs a solution $(S',w')$ at step 4. with minimum support $d'$, then the current value of $d_{next}$ is $d''$ as defined in ii). By correctness of the algorithm CalculateMaxScore($S,w$) we know that $\max_{v'\in V\setminus S'} score_{w'}(v')0$, an edge weight vector \(w\in\mathbb{R}_{\geq 0}^E\) is an $\varepsilon$-MNMF for $S$ if
(i) $w$ is maximally affordable, i.e. \(\sum_{v\in S: \ nv\in E} w_{nv}=b_n\) for each $n\in N$ having at least one neightbor in $S$,
(ii) for any $n\in N$ and any two neighbors $v,v'\in S$ of it, if $w_{nv}>0$ then $supp_w(v)\leq (1+\frac{\varepsilon}{5\cdot|S|})supp_{w}(v')$, and
(iii) For all affordable $w'$, $supp_{w'}(S)\leq (1+\epsilon)supp_w(S)$.
In our note on [the min-norm max-flow problem](2. The min-norm max-flow problem.md), we provide more information about $\varepsilon$-MNMF vectors and present an algorithm MNMF($S,w,\varepsilon$) that returns an $\varepsilon$-MNMF for $S$ in polynomial time, and where the input vector $w$ is optional.
Consider the following algorithm.
**Algorithm.** BalanceBetweenHeuristic()
1. Initialise $S$ to the empty set and $w$ to the empty vector;
2. For $i$ from $1$ to $m$:
* Let $(v,d)\leftarrow CalculateMaxScore(S,w)$;
* Update $(S,w)\leftarrow InsertCandidate(S,w,v,d)$;
* Update $w \leftarrow MNMF(S,w,\varepsilon)$;
3. Return $(S,w)$.
### Analysis
The main result of the section is showing that the previous algorithm offers a $3.15\cdot (1+\varepsilon)$-factor approximation, and satisfies PJR.
**Theorem 8:** The procedure $BalanceBetweenHeuristic()$ returns a solution $(S, w)$ for which \(supp_{w}(S) \geq d^*/(3.15\cdot (1+\varepsilon))\), where \(d^*\) is the maximin support across all solutions of the given NPoS election instance. Moreover, the solution satisfies PJR(\((1+\varepsilon)\cdot supp_{w}(S)\)) and, if $\varepsilon \leq 1/m$, PJR.
We begin with a couple of needed technical result.
**Observation.** For any $0\leq \varepsilon\leq 1$ and any $m\geq 1$, we have the inequality
$$\Big(1+\frac{\varepsilon}{5m}\Big)^{m} \leq 1+\frac{\varepsilon}{4}.$$
**Proof:** The inequality \(1+x\leq e^x\) holds for any real $x$. Replacing $x$ by $\varepsilon/(5m)$ and raising both sides to the power $m$, we obtain
$$\Big(1+\frac{\varepsilon}{5m}\Big)^m \leq (e^{\frac{\varepsilon}{5m}})^m = e^{\frac{\varepsilon}{5}}.$$
Finally, since the function \(f(\varepsilon):=e^{\varepsilon/5}\) on the right-hand side is convex, within the range $0\leq \varepsilon \leq 1$ it can be upper bounded by the linearization $1+(e^{1/5} -1)\varepsilon$. It can be checked that \(e^{1/5} - 1 \leq 1/4\), and the claim follows.
$\square$
**Proposition 9:** Let $(S,w)$ be a solution where $|S|< m$ and $w$ is an $\epsilon$-MNMF of $S$ for some $0\leq \varepsilon \leq 1$, and let $d^*$ be as in Theorem 8. Then, there exists a non-empty set $T\subseteq V\setminus S$ with the property that for each $0\leq a\leq 1$, there is a set of nominators $N_a\subseteq N$ such that
a) each $n\in N_a$ has a neighbor in $T$,
b) \(\sum_{n\in N_a}\geq (1-a)\cdot |T|\cdot d^*\), and
c) For each $v\in S$ such that $w_{nv}>0$ for some $n\in N_a$, we have that \(supp_{w}(v)\geq \frac{a\cdot d^*}{1+\varepsilon/4}\).
*Proof.* Let \(m':=|S|\), where $m'0$: Assuming otherwise that there is such a pair, the fact that $n$ is in $N_l$ implies that it has a neighbor $v'\in S_l$. But then we can apply the definition of $\varepsilon$-MNMF to obtain $supp_w(v)/supp_w(v')\leq 1+\epsilon/(5m')=d_u/d_l$, which contradicts the definitions of $S_l$ and $S_u$.
Next, by the fact that $w$ is maximally affordable, and that each nominator in $N_l$ has neighbors in $S_l$ but gives no support to $S_u$, we have
$$\sum_{n\in N_l} b_n \leq \sum_{v\in S_l} supp_w(v)< |S_l|\cdot d_l.$$
Finally, if we define $N_a\subseteq N_u$ as those $n\in N_u$ that have a neighbor in $T$, then claim a) becomes evident, and claim c) follows from the fact that $N_a$ has no neighbors in $S_l$ (by def. of $N_u$) and all of its neighbors in $S_u$ have a support of at least $d_u>d_l\geq ad^*/(1+\epsilon/4)$. Hence, it only remains to prove claim b).
Assume without loss of generality that $\(w^*\) provides a support of exactly \(d^*\) to each \(v\in S^*\), by possibly capping its edge weights, and define the edge weight vector $w'$ by capping $w$ arbitrarily such that $supp_{w'}(v)=d_l$ if $v\in S_u$, and $0$ otherwise (these vectors are affordable but in general not maximally affordable). Define \(f:=w^*-w'\in\mathbb{R}^E\), which we consider as a flow over the network induced by \(N\cup S\cup S^*\).
Clearly, the net excess of set $N$ relative to $f$ is $md^* - |S_u|d_l$, the net excess of set $N_l$ is at most $\sum_{n\in N_l} b_n < |S_l|d_l$, and the net demand of set $S_u$ is \(|S^*\cap S_u|(d^*-d_l)\). By subtracting the last two terms from the first one, we obtain that the flow going from $N_u$ to \(S^*\setminus S_u\) is at least
\begin{align}
&md^* - |S_u|d_l - |S_l|d_l - |S^*\cap S_u|(d^*-d_l) \\
&=md^* - m'd_l - |S^*\cap S_u|(d^*-d_l) \\
& \geq m(d^*-d_l) - |S^*\cap S_u|(d^*-d_l) \\
&\geq |S^* \setminus S|(d^*-d_l) \\
&\geq |T|d^*(1-a).
\end{align}
A key observation now is that none of the flow originating at $N_u$ can pass by, or end in, $S_l \cup N_l$. This is because there are no edges between $N_u$ and $S_l \cup N_l$, by definition of $N_u$; and even though the flow can pass by $S_u$, there is no flow from $S_u$ to $S_l\cup N_l$ in \(f=w^*-w'\) because $w'$ provides no flow from $S_l\cup N_l$ to $S_u$. Therefore, the formula above is actually a lower bound on the flow going from $N_u$ to \(T=S^*\setminus S\). Finally, we notice that if we decompose flow $f$ into simple paths, any path from $N_u$ to $T$ must have the last edge originating in $N_u$, or more specifically in $N_a$. This proves that \(\sum_{n\in N_a} b_n \geq |T|d^*(1-a)\), which is claim b), and concludes the proof of the proposition.
$\square$
**Lemma 6:** Given a solution $(S,w)$ with $|S| \leq m$ that satisfies Condition 1, there exists a $v \notin S$ with $score_w(v) \geq d^*/4(1+\epsilon/2)$.
**Proof:** Apply Proposition 1 and set $a=1/2$, then by using that for any $a_1,\dots, a_n$, there exists an $a_i$ with $a_i \geq \sum_i a_i/n$, there is a $v \in T$ such that the set
$$A_{v,d^*/2}:=\Big\{n\in N \ | \ n,v \in E
\text{ and }\forall v'\in V, w_{n,v'} > 0
\Rightarrow supp_{w}(v') \geq
\frac{d^*}{2(1+\epsilon/2)} \Big\}$$
has $\sum_{n \in A_{v,d^*/2}} b_n \geq d^*/2$. Now for any $n \in A_{v,d^*/2}$, $slack(n,d^*/4(1+\epsilon/2))= \sum_{v:(n,v) \in E} w_n,v (1-d^*/4(1+\epsilon/2) supp_w(v)) \geq b_n/2$ and so $prescore(v,d^*/4(1+\epsilon/2)) \geq \sum_{n \in A_{v,d^*/2}} slack(n, d^*/4(1+\epsilon/2)) \geq d^*/4 > d^*/4(1+\epsilon/2)$. Thus $score_w(v) \geq d^*/4(1+\epsilon/2)$.
We can do better than this by using different $a$.
**Lemma 7** Given a solution $(S,w)$ with $|S| \leq m$ that satisfis Condition 1, there exists a $v \notin S$ with $score_w(v) \geq d^*/3.15(1+epsilon/4)$.
**Proof:** The following will be crucial:
**Lemma 8** Consider a finite sum $ \sum_i f(x_i) a_i$, where $f$ is strictly increasing with derivative $f'(x)$, $a_i \geq 0$ for all $i$ and for some $y \leq \min_i x_i$, $f(y) = 0$, then
$$\sum_i f(x_i) a_i = \int_{y}^infty f'(x) (\sum_{i:x_i \geq x} a_i) dx$$
**Proof:** We can write the sum as a Lebesgue integral over the measure with weights $a_i$, obtaining that:
$$\sum_i f(x_i) a_i = \int_0^\infty (\sum_{i:f(x_i) >t} a_i ) dt$$
The conditions on $f$ are enough for it to be invertible with derivative $df^{-1}/dt=1/f'(f^{-1}(t))$ and $f^{-1}(0)=y$, so we can substiture $x=f^{-1} t$ into the above to obtain:
\begin{align*}
\sum_i f(x_i) a_i & = \int_0^\infty (\sum_{i:f(x_i)} >t a_i ) dt \\
&= \int_y^\infty f'(x) (\sum_{i:f(x_i) >f(x)} a_i ) dx \\
&= \int_y^\infty f'(x) (\sum_{i:x_i > x} a_i ) dx \\
&= \int_y^\infty f'(x) (\sum_{i:x_i \geq x} a_i ) dx
\end{align*}
For some $0 \leq b \leq 1$, to be determined later, we consider $\sum_{v \in T} prescore(w,b d^*)$, we have
$$\sum_{v \in T} prescore(w,b d^*) \geq \sum_{n: \exists v \in T, (n,v) \in E} slack(n, b d^*) \geq \sum_{n \in N_b} slack(n,b d^*) \; .$$
Now for $n \in N_b$, let $supp(n)= \max_{v':w_{n,v'} > 0} supp_w(n)$ and $supp(n)=\infty$ if $w_{n,v'}=0$ for all $v'$. We certainly have $supp(n) \geq b d^*$ from the definition of $N_b$. More generally for $n \in N_b$, $n \in N_a$ if and only if $supp(n) \geq a d^*$. We have
$$slack(n, b d^*) = b_n - \sum_{v' \in S} w_n,v bd^*/ supp_w(v') \geq b_n (1- bd^*/supp(n))$$
and so using Lemma 8,
\begin{align*} \sum_{v \in T} prescore(w,b d^*) & \geq \sum_{n \in N_b} slack(n,b d^*) \\
& \geq \sum_{n \in N_b} b_n (1- bd^*/supp(n)) \\
&= \int_{b d^*}^{infty} (bd^*/x^2) (\sum_{n \in N_{x/d^*}} b_n) dx \\
& \geq \int_{b}^{1} (b/a^2) (1-a)|T| d^*/(1+\epsilon/2) da \\
& = (bd^*|T|/(1+\epsilon/2) \int_{b}^1 1/a^2 - 1/a da \\
&= (bd^*|T|/(1+\epsilon/2) (1/b-1+ln b) \\
&= (d^*|T|/(1+\epsilon/3) (1-b+bln b)
\end{align*}
So for any $b$ with $b(2-ln b) \leq 1$, we have that there is an $v \in T$ with
\begin{align*} prescore_w(b, b d^*/(1+\epsilon/4)) & \geq prescore_w(b, b d^*) \\
& \geq (d^*/(1+\epsilon/4) (1-b+bln b) \\
& \geq b d^*/(1+\epsilon/4)
\end{align*}
In particular, this holds for $b=1/3.15$. Thus there exists a $v \notin S$ with $score_w(v) \geq d^*/3.15(1+\epsilon/4)$.
$\square$
Now we can prove Theorem 1
**Proof of Theorem 1:** For a set $S$, we define $d^*_S$ to be the maximum over afforable $w$ of $supp_w(S)$.In order not to lose error with condition 1 (ii), we need:
**Lemma 9:** Let $(S,w)$ be a solution with $|S| \leq m$ that satisfies Condition 1. Let $(S',w')$ be a solution of any size with $S \subseteq S'$ and for all $v \in S$ with $supp_{w'}(v) < sepp_w(v)$, we have $supp_{w'}(v) \geq d$ for some $d$. Then $d^*_{S'}$ defined similarly has $d^*_{S'} \geq \min \{d^*_S, d/(1+\epsilon/4)\}$.
**Proof:** Note that $(1+\epsilon/12m)^{m+1}$ \leq (1+\epsilon/4)$. Since there are at most $m$ values of $supp_w(v)$ for $v \in S$, by the pidgeon hole principle, there must be an $x=d/(1+\epsilon/12m)^i$ for $1 \leq i \leq m$ such that no $v \in S$ has $x < supp_w(v) \leq x(1+\epsilon/12m)$. Let $X \subseteq S$ be the set of $v \in S$ with $supp_v(w) \leq x$.
Let $N_X=\{n:\exists v \in X, (n,v) \in E\}$. By condition 1, any $n \in N_X$ has $w_{n,v'}=0$ for $v' \notin X$.
By our construction, we also have $w'_{n,v'} =0$ for $v \notin X$.
If $X$ is empty, then $d^*_S \geq supp_w(S) \geq x \geq d/(1+\epsilon/3)$ and the construction gives a $w'$ with $d^*_{S'} \geq supp_{w'}(S') \geq d/(1+\epsilon/3)=\min \{d^*_S, d/(1+\epsilon/3)\}$ and we are done.
If $X$ is non-empty, then we claim that $d^*_S=d^*_{S'}=d^*_X$. Let $w^*_S$,$w^*_{S'}$ and $w^*_X$ be affordable assignments that achieve these. Then by optimality of $d^*_X$, $d^*_S \leq supp_{w^*_S}(X) \leq d^*_X$ and similarly $d^*_{S'} \leq d^*_X$. On the other hand, we have that
$$|X| d^*_X \leq \sum_{v \in X} supp_{w^*_X} \leq \sum_{n \in N_X} b_n = \sum_{v \in X} supp_{w} (X) \leq |X| x$$
Now consider modifying $w$ or $w'$ by setting $w_{n,v}$ for $v \in X, $n \in N_X$ to be $(w^*_X)_{n,v}$. Since $n \in n_X$ had $w_{n,v}=w'_{n,v} =0$ for $v \notin X$, this is still affordable and the supports for $v \notin X$ remain the same, that is $> x$. So these have minimum support $d^*_X$, and we have $d^*_{S'}=d^*_S$, which gives the the lemma.
Now we claim inductively that $d^*_{S_i} \geq d^*/3.15(1+\epsilon/4)^2$. Lemma 7 implies that there is a $v \notin S_{i-1}$ with $score_w(v) \geq d^*/3.15(1+\epsilon/4)$. We add this to $w$, while not reducing the support of any $v$ to below $d^*/3.15(1+\epsilon/4)$. Lemma 9 gives that $d_{S_i} \geq d^*/3.15(1+\epsilon/4)^2$
The induction gives that $d^*_{S_m} \geq d^*/3.15(1+\epsilon/4)^2$. Stare balances gives a $w_m$ that satiusfies condtion 1 (ii) and so $supp_{w_m}(S_m) \geq d^*_{S_m}/(1+\epsilon/4) \geq d^*/3.15(1+\epsilon/4)^3 \geq d^*/3.15(1+\epsilon)$.
For the PJR claim, suppose that $S_m$ does not satify PJR($d$) for some $d$. Then nor do any $S_i$, since they are subsets of $S_m$. So by Lemma 5, for every $0 \leq i \leq m-1$, there is a $v_i \notin S_i$ with $score_{w_i}(v_i) \geq d$. The same argument as above gives that $supp_{w_m}(S_m) \geq d/(1+\epsilon/4)^2 < d/(1+\epsilon)$. It follows that $S_m$ does satisfy PJR($supp_{w_m}(S_m)(1+epsilon)$).
Suppose that $S_m$ does not satisfy PJR but $\epsilon \leq 1/m$. Then by Lemma 5, there exists a $v \notin S$ with $score_{w_n}(v) \geq \sum_n b_n/m$. By Lemmas 2 and 1, we can run InsertNewCandidate($w_n, score_{w_n}(v)$), to obatian a $w'$ such that $\supp_{w'}(S_m \cup \{v\}) = \min \{ score_{w_n}(v), supp_{w_m}(S_m) \}$. But since $|S_m \cup \{v\}|=m+1$, we must have $\supp_{w'}(S_m \cup \{v\}) \leq \sum_n b_n/(m+1)$. Thus we get that $supp_{w_m}(S_m) \leq \sum_n b_n/(m+1)$. So if $\epsilon \leq 1/m$, then $supp_{w_m}(S_m) (1+\epsilon) \leq supp_{w_m(S_m) (m+1)/m \leq sum_n bn /m$ and since PJR
($d$) implies PJR($d'$) for $d' > d$, we have PJR. This means that $\epsilon \leq 1/m$ implies PJR.