
Pseudorandomness from Shrinkage

RUSSELL IMPAGLIAZZO,
University of California, San Diego
RAGHU MEKA,
University of California, Los Angeles
DAVID ZUCKERMAN,
University of Texas at Austin


One powerful theme in complexity theory and pseudorandomness in the past few decades has been the use of lower bounds to give pseudorandom generators (PRGs). However, the general results using this hardness vs. randomness paradigm suffer from a quantitative loss in parameters, and hence do not give nontrivial implications for models where we don't know super-polynomial lower bounds but do know lower bounds of a fixed polynomial. We show that when such lower bounds are proved using random restrictions, we can construct PRGs which are essentially best possible without in turn improving the lower bounds.

More specifically, say that a circuit family has shrinkage exponent $\Gamma$ if a random restriction leaving a $p$ fraction of variables unset shrinks the size of any circuit in the family by a factor of $p^{\Gamma + o(1)}$. Our PRG uses a seed of length $s^{1/(\Gamma + 1) + o(1)}$ to fool circuits in the family of size $s$. By using this generic construction, we get PRGs with polynomially small error for the following classes of circuits of size $s$ and with the following seed lengths:

  1. For de Morgan formulas, seed length $s^{1/3+o(1)}$;
  2. For formulas over an arbitrary basis, seed length $s^{1/2+o(1)}$;
  3. For read-once de Morgan formulas, seed length $s^{0.234\ldots}$;
  4. For branching programs of size $s$, seed length $s^{1/2+o(1)}$.

The previous best PRGs known for these classes used seeds of length bigger than $n/2$ to output $n$ bits, and worked only for size $s=O(n)$ [8].

CCS Concepts: • Theory of computation → Pseudorandomness and derandomization; Expander graphs and randomness extractors; Circuit complexity;

Additional Key Words and Phrases: Shrinkage, random restrictions, average-case lower bounds

ACM Reference format:
Russell Impagliazzo, Raghu Meka, and David Zuckerman. 2019. Pseudorandomness from Shrinkage. J. ACM 66, 2, Article 11 (March 2019), 16 pages. https://doi.org/10.1145/3230630

1 INTRODUCTION

Two of the most important general challenges for complexity are to prove constructive lower bounds for non-uniform measures of computational complexity such as circuit size, and to show that randomized algorithms have efficient deterministic simulations. The “Hardness vs. Randomness” paradigm [1, 6, 7, 21, 23, 36] shows that these questions are linked. More precisely, these results show how to use any problem that is hard for a class of circuits to create a pseudorandom generator (PRG) for the same class of circuits. This PRG can then be used to construct a relatively efficient deterministic version of any probabilistic algorithm with a corresponding complexity. This has been used to create unconditional PRGs for circuit classes with known lower bounds, such as $AC_0$, as well as for obtaining conditional results—implications between the existence of hard problems and derandomization for classes where no strong lower bounds are known. In the converse direction, it is easy to see that any PRG for a circuit class immediately gives a corresponding lower bound for the class. Somewhat more surprisingly, it has been shown that any efficient deterministic simulation of some probabilistic algorithms would yield circuit lower bounds [14, 17, 35]. This hardness vs. randomness connection is one of the most important tools in computational complexity. It formalizes the intuition that efficient algorithms for “meta-computational problems”, where the input is a computational device from a certain class, are linked to our ability to prove lower bounds for that class.

However, being so general comes at a quantitative price. Ideally, the stretch of a PRG (the output length as a function of the input length) equals the known lower bound. But in the hardness-to-randomness constructions, there are a number of stages that each lose a large polynomial factor. In particular, this means that, for example, a quadratic or cubic circuit lower bound for a class does not immediately give any nontrivial PRG. For completely generic, “black-box”, reductions between a hard problem and a PRG, some of these costs are inherent [5, 11, 28, 34]. This is especially an issue for models where super-linear but not super-polynomial bounds are known, such as Boolean formulas.

In this work, we show a general method for obtaining tight “hardness-to-randomness” results from the proofs of lower bounds, rather than as a black-box consequence of the lower bounds. In particular, our methods apply to lower-bound proofs that involve restricting some of the inputs to the circuit. Our construction goes in two stages. We start with a lower bound proved by the following kind of shrinkage argument: if we restrict a size $s$ circuit leaving a $p$ fraction of variables unset, the expected size of the restricted circuit is $O(p^{\Gamma } s)$. The best $\Gamma$ for which this holds is known as the “shrinkage exponent” of the circuit class. The first stage of our construction is to derandomize the shrinkage argument, showing that there is a distribution with similar shrinkage properties that can be sampled with few bits. This stage of our argument is general, but not totally generic. While the same general construction and analysis ideas work in a variety of models, the details depend on the model. Then we show how to go from such a distribution on restrictions to a PRG. This part is generic, being identical for all models, and is closely related to the generator from [24]. The total number of bits $r$ used by the generator is roughly $s^{1/(1+\Gamma)}$ times the number of bits needed to sample from the distribution on restrictions.

Every generator using $r$ bits to fool tests of size $s=s(r)$ immediately gives a problem requiring size $\Omega (s(r))$ to compute in the model. So, if our function $s(r)$ is close to the known lower bounds, this shows that we have essentially converted all of the “hardness” in the lower bound to “randomness”. This is indeed the case for a variety of natural models of computation. For Boolean formulas over the de Morgan basis, we give a generator with $s(r) = r^{3-o(1)}$, almost matching the known lower bound of $s(n) = \Omega (n^3 / (\log ^2 n \log \log n))$ due to Håstad ([12] with a polylogarithmic improvement by Tal [31], based on earlier work by [4, 15, 25, 30]). To avoid technicalities, we assume that the size $s$ is at least the number of input variables $n$ in the following statements:

Theorem 1.1. For any constant $c\gt 0$, there is an explicit PRG using a seed of length $s^{1/3} \cdot 2^{O(\log ^{2/3} s)} = s^{1/3+o(1)}$ random bits that $s^{-c}$-fools formulas of size $s$ over the de Morgan basis.

For Boolean formulas over an arbitrary basis, our generator has stretch $s(r) = r^{2-o(1)}$, almost matching the Khrapchenko bound of $s(n) = \Omega (n^2)$ [18].

Theorem 1.2. For any constant $c\gt 0$, there is an explicit PRG using a seed of length $s^{1/2} \cdot 2^{O(\log ^{1/2} s)} = s^{1/2 + o(1)}$ random bits that $s^{-c}$-fools formulas of size $s$ over an arbitrary basis.

For branching programs, with size being the total number of edges, we get a similar bound.

Theorem 1.3. For any constant $c \gt 0$, there is an explicit PRG using a seed of length $s^{1/2} \cdot 2^{O(\log ^{1/2} s)} = s^{1/2 + o(1)}$ random bits that $s^{-c}$-fools branching programs of size at most $s$.

Note that, similar to the case of arbitrary formulas, the above nearly matches the known size lower bounds for branching programs.

We also consider the case of read-once formulas over the de Morgan basis. Here, there is no sensible notion of lower bound, since all functions computable in the model have size exactly $n$, but the notion of shrinkage is still defined. The optimal shrinkage exponent for such read-once de Morgan formulas was shown by [13] and [25] to be $\Gamma = \log 2 / \log (\sqrt {5}-1) = 3.27\ldots ;$ using this result, we get a PRG with stretch $s(r) = r^{4.27\ldots - o(1)}$.

Theorem 1.4. For any constant $c\gt 0$, there is an explicit PRG using a seed of length $s^{1/(\Gamma + 1)} \cdot 2^{O(\log ^{2/3} s)} = s^{1/(\Gamma + 1)+o(1)}$ random bits that $s^{-c}$-fools read-once formulas of size $s$ over the de Morgan basis, where $\Gamma = \log 2 / \log (\sqrt {5}-1) = 3.27\ldots .$

Any substantial improvement in our PRGs would thus yield better lower bounds than what is currently known.

Our results dramatically improve previous work. The only directly comparable PRG was by Bogdanov et al. [8], who constructed a PRG using a $(1-\Omega (1))n$ bit seed to output $n$ bits that fool read-once formulas and read-once branching programs, where the order of the bits is unknown beforehand.

There has been significant work on read-once branching programs where the order of the bits is known in advance (e.g., [16, 21, 24]), but that is a much simpler model and the generators of [16] and [21] are known to fail if the order of bits is unknown [8].

1.1 Outline of Constructions

Our techniques build upon those of [16, 22, 24]. The intuition behind all of these PRGs is to exploit communication bottlenecks in the computation. Suppose the random inputs to a computation can be partitioned into two subsets $X$ and $Y$, and the computation can be simulated with $k$ bits of communication between these two subsets. Then, given the communication pattern, the two sets of bits have high entropy conditioned on each other. So, instead of using independent bits for the two sides, we can use a randomness extractor to convert the conditional entropy of $X$ given the communication into random bits to be used in place of $Y$.

Our construction follows the same basic intuition. The key insight is that shrinkage under random restrictions is a form of communication bottleneck, between $X$, the set of variables with values specified by the restriction $\rho$, and $Y$, the set of variables left unrestricted (and chosen later). Consider a one-way protocol where the player knowing $X$ has to send a message allowing the $Y$-player to compute the function $f$. What this message exactly needs to specify is the restricted function $f_{\rho }$. If the circuit size of $f_{\rho }$ is small, much smaller than the size of $X$, the message can be the circuit computing the restricted function, showing low communication.

Most of the previous constructions were for computational models like read-once branching programs, where one had an explicit description of which sets $X$ and $Y$ had a communication bottleneck, and there was a hierarchical structure available on such sets so that the construction could be defined recursively. Here, we do not have either one, but we know the bottleneck occurs for most sets $X$ and their complements. Instead of explicitly partitioning the variables into blocks with low communication, we randomly sample sets that exhibit this bottleneck until all variables are covered. So far, we are not able to utilize recursion, which blocks us from making the seed size subpolynomial (and hence proving superpolynomial lower bounds).

More concretely, consider the case of read-once width $w$ branching programs, where the bits may be read in any order (as opposed to some fixed order, which is the setting of Nisan [22]). In this arbitrary-order case, we show that the Nisan-Zuckerman PRG [24], without recursion, gives a PRG with seed length $\tilde{O}(\sqrt {n})$. Recall that this PRG uses an extractor $E:\lbrace 0,1\rbrace ^s \times \lbrace 0,1\rbrace ^d \rightarrow \lbrace 0,1\rbrace ^m$ and is defined by $G(x,y_1,\ldots ,y_t) = E(x,y_1) E(x,y_2) \ldots E(x,y_t)$, where $x \in \lbrace 0,1\rbrace ^s$ and $y_i \in \lbrace 0,1\rbrace ^d$.

To see that this works, suppose a branching program accepts a uniform input with significantly different probability than the output of $G$. By a hybrid argument, changing some $Z_i = E(X,Y_i)$ to uniform must change the probability significantly (note that $Y_i$ are independent of each other). However, if we fix all bits except the $m$ bits corresponding to $Z_i$, we are left with a read-once branching program on these $m$ bits. There are at most $w^{wm}$ such branching programs on $m$ bits. Thus, if we condition on a typical such branching program for these $m$ bits, $X$ still has min-entropy at least $s-2mw\log w$. As long as this exceeds the min-entropy requirement of the extractor, the extractor output is close to uniform, contradicting the assumption of significantly different acceptance probabilities. We can set $t=m=\sqrt {n}$, $s=4mw\log w$, and $d=O(\log n)$.
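For concreteness, the seed-length accounting under this parameter setting (for constant width $w$) works out as
\[ \underbrace{s}_{|x|} \;+\; \underbrace{t \cdot d}_{|y_1| + \cdots + |y_t|} \;=\; 4\sqrt {n}\, w \log w \;+\; \sqrt {n} \cdot O(\log n) \;=\; \tilde{O}(\sqrt {n}), \]
while the output length is $t \cdot m = n$.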

For general branching programs, we need to handle variables that are read many times, which we can do by pseudorandomly permuting the output of the above generator. However, for general formulas and to get a general reduction, we need to extend the above generator. We do this by combining the extractor outputs with pseudorandom restrictions that shrink with high probability (and leave every bit unfixed with the same probability). Specifically, for a restriction $\rho$ that leaves $m$ bits unfixed, we can define the random variable $V_\rho \in \lbrace 0,1\rbrace ^n$ that takes the values of $\rho$ for the fixed bits and the values of $E(X,Y_\rho)$ for the unfixed bits. We do this for enough independent pseudorandom restrictions that with high probability every coordinate has some $\rho$ which leaves that coordinate unfixed (via a coupon collector bound). The PRG output is the XOR of all these $V_\rho$.

In fact, the above achieves the desired bounds only when the shrinkage exponent $\Gamma = 1$. For larger shrinkage exponents, we must also apply a $k$-wise independent distribution to $E(X,Y_\rho)$.

Derandomized Shrinkage Bounds. To use our main generator construction, we need a family of random restrictions that can be sampled with few random bits and still causes the circuits to shrink. For branching programs and formulas over an arbitrary basis (shrinkage exponent $\Gamma = 1$), these are not too hard to get by taking $O(\log n)$-wise independent random restrictions. For formulas over the de Morgan basis and read-once formulas, getting such restrictions is far trickier.

The first difficulty we face is that Håstad's original proof [12] only shows shrinkage in expectation and does not give a high-probability bound for the formulas to shrink. We get around this difficulty as follows: Let $f$ be a formula over the de Morgan basis. We first show shrinkage under restrictions for which the probability of being unset is $p = n^{-\alpha }$ for some $\alpha = o(1)$ and which are $k=n^{o(1)}$-wise independent. By repeating this process independently, we get shrinkage for all values of $p$ (both in the known lower bounds and in our PRG construction we need $p \sim n^{-1/(\Gamma + 1)}$). To do this, we decompose the target formula $f$ into $O(n/\ell)$ subformulas $g_i$ of size at most $\ell$, for a suitable $\ell \lt k$. Since each $g_i$ now has size at most $\ell \lt k$, the behavior of $g_i$ should be the same under $k$-wise independent restrictions as under truly random restrictions. Thus, we can roughly expect each $g_i$ to shrink by $p^{\Gamma }$ in expectation.

For read-once formulas, the events that the different $g_i$ shrink are independent, and hence, by a Chernoff bound, with high probability the total shrinkage is as promised. For the read-$t$ case (each variable appears at most $t$ times in the formula), we partition the subformulas into $t\ell + 1$ color classes, such that within a color class the subformulas are on disjoint sets of variables. We can then proceed as in the read-once case. For the general case, we condition on heavy variables (the ones that appear many times) in a subtle way and reduce to the read-$t$ case.

1.2 Related Work

Independently and concurrently with this work, Komargodski and Raz [19] showed an average-case lower bound for de Morgan formulas nearly matching Andreev's [4] worst-case lower bound: they give an explicit function on $n$ variables such that any de Morgan formula of size $n^{2.499}$ agrees with the function on at most a $1/2+\varepsilon$ fraction of the inputs, where $\varepsilon = \exp (-n^{\Omega (1)})$ is exponentially small. In the course of showing their result, Komargodski and Raz also show that shrinkage happens with high probability as opposed to in expectation, which compares to Lemmas 4.2 and 4.8 in this work. However, the corresponding results in [19] work with truly random restrictions and achieve an exponent of 1.5 for de Morgan formulas.

1.3 Subsequent Work

Subsequent to our work, Komargodski et al. [20] used ideas from our article and [19] to improve the average-case lower bound for de Morgan formulas. Specifically, they gave an explicit function on $n$ variables such that any de Morgan formula of size $n^{3-o(1)}/r^2$ agrees with the function on at most $1/2 + 2^{-r}$ fraction of the inputs.

Trevisan and Xue [32] used ideas related to ours to give a derandomized switching lemma and an improved PRG for ${\rm AC}^0$.

Progress has also been made on PRGs fooling small-width read-once branching programs [9, 26, 29], where the order of the bits is unknown beforehand. However, our PRG fools the much larger class of branching programs of a given size.

2 PRELIMINARIES

We start with some definitions and notations.

As mentioned in the introduction, our generator is motivated by the pseudorandom generator for small space machines of Nisan and Zuckerman [24]. As in their article, our construction will make use of extractors for linear min-entropy sources.

Definition 2.1. We say $E:\lbrace 0,1\rbrace ^N \times \lbrace 0,1\rbrace ^d \rightarrow \lbrace 0,1\rbrace ^m$ is a $(k,\varepsilon)$-extractor if for every random variable $X$ over $\lbrace 0,1\rbrace ^N$ with $H_\infty (X) \ge k$, and $Y \in _u \lbrace 0,1\rbrace ^d$, $E(X,Y)$ is $\varepsilon$-close to the uniform distribution on $\lbrace 0,1\rbrace ^m$. We say $E$ is explicit if it can be computed in time $\mathrm{poly}(N,d)$.

We use the following explicit extractor as given by the work of Zuckerman [37], with a modest lower bound on $\varepsilon$ eliminated by the work of Guruswami et al. [10].

Theorem 2.2. For any $\varepsilon \gt 0$, there is an explicit function $E:\lbrace 0,1\rbrace ^N \times \lbrace 0,1\rbrace ^d \rightarrow \lbrace 0,1\rbrace ^m$ that is an $(N/2,\varepsilon)$-extractor with $m = N/4$ and $d = O(\log (N/\varepsilon))$.

We will use well-known constructions of $k$-wise independent random variables.

Lemma 2.3. For every $k \ge 2$, there is a PRG $G_k:\lbrace 0,1\rbrace ^{r_k} \rightarrow \lbrace 0,1\rbrace ^{n}$ generating a $k$-wise independent distribution with seed length $r_k = 1+\lfloor k/2 \rfloor \log (2n)$.
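To make this concrete, here is a minimal Python sketch of a classical (but seed-length-suboptimal) construction of a $k$-wise independent distribution: evaluate a uniformly random polynomial of degree less than $k$ over $\mathrm{GF}(2^8)$ at distinct nonzero points and keep the least significant bit of each value. This uses $8k$ seed bits and handles $n \le 255$, rather than the $\lfloor k/2 \rfloor \log (2n)$-bit seed of the lemma; the function names are ours, for illustration only.

    import os

    def gf_mul(a, b):
        # Multiplication in GF(2^8) modulo the (irreducible) AES
        # polynomial x^8 + x^4 + x^3 + x + 1.
        r = 0
        while b:
            if b & 1:
                r ^= a
            b >>= 1
            a <<= 1
            if a & 0x100:
                a ^= 0x11B
        return r

    def kwise_bits(k, n):
        """n <= 255 k-wise independent bits: a random polynomial of
        degree < k over GF(2^8), evaluated at the distinct points
        1..n; evaluations at any k distinct points are independent
        and uniform over the field, so their low bits are k-wise
        independent uniform bits."""
        assert n <= 255
        coeffs = list(os.urandom(k))          # the 8k-bit seed
        bits = []
        for i in range(1, n + 1):
            v = 0
            for c in coeffs:                  # Horner evaluation at i
                v = gf_mul(v, i) ^ c
            bits.append(v & 1)
        return bits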

We also use the following large-deviation bounds for $k$-wise independent random variables. The first is an easy corollary of Theorem 4 in [27].

Lemma 2.4. Let $a_1,\ldots ,a_n \ge 0$ with $\max _i a_i = m$, and suppose $X_1,\ldots ,X_n \in \lbrace 0,1 \rbrace$ are $k$-wise independent indicator random variables with $\Pr [X_i =1] = p$. Let $X=\sum _i a_i X_i$ and $\mu = \mathsf {E}[X]$. Then, $\mathsf {Pr}[\, X \ge 2 k (m + \mu)\,] \le 2^{-k}$.

When the expectation is small, the following simple lemma sometimes gives better bounds:

Lemma 2.5. Suppose $X_1,\ldots ,X_n \in \lbrace 0,1 \rbrace$ are $k$-wise independent indicator random variables with $\Pr [X_i =1] = p$. Let $X=\sum _i X_i$ and $\mu = \mathsf {E}[X] = np$. Then, $\mathsf {Pr}[X \ge k] \le \mu ^k/k!$.

To see this, note that if $X \ge k$, then some $k$ of the $X_i$ all equal 1. By $k$-wise independence and a union bound over all $k$-subsets, this probability is at most ${\tbinom{n}{k}}p^k \le (np)^k/k! = \mu ^k/k!$.

3 PSEUDORANDOM GENERATORS FROM SHRINKAGE

We now describe our main construction which allows us to use classical lower bound arguments based on random restrictions to get pseudorandom generators (PRGs). Our main result will apply to any class of functions with nontrivial “shrinkage exponent”. We next define this central notion.

Definition 3.1. Let $\mathcal {F}$ be a class of functions with an associated size function $s(\cdot)$ and let $\mathcal {D}$ be a $p$-regular distribution on $\lbrace 0,1,*\rbrace ^n$, i.e., one in which each coordinate of a restriction $\rho \leftarrow \mathcal {D}$ is left unset ($*$) with probability exactly $p$. We say $\mathcal {F}$ has shrinkage exponent $\Gamma$ with respect to $\mathcal {D}$ if, for all $f \in \mathcal {F}$,
\[ \underset{\rho \leftarrow \mathcal {D}}{\mathsf {E}}\left[s(f_\rho)\right] = O\left(p^{\Gamma } s(f) + 1\right). \]
Definition 3.2. We say $\mathcal {F}$ has $\varepsilon$-shrinkage exponent $\Gamma$ w.r.t. $\mathcal {D}$ if there exists a constant $c$ such that for all $f \in \mathcal {F}$,
\[ \underset{\rho \leftarrow \mathcal {D}}{\mathsf {Pr}}\left[s(f_\rho) \gt c\, (p^{\Gamma } s(f) + 1)\cdot \log (1/\varepsilon)\right] \le \varepsilon . \]

The shrinkage exponent is a classical concept in complexity theory, with its origins going back to the very first lower bounds of Subbotovskaya [30]. The best lower bounds we know for several important complexity classes, such as read-once formulas and de Morgan formulas, are based on estimating the shrinkage exponent of the associated class. This connection can be summarized in the following informal statement:

If a class $\mathcal {F}$ has shrinkage exponent $\Gamma$, then there is an explicit Boolean function $h:\lbrace 0,1\rbrace ^n \rightarrow \lbrace 0,1\rbrace$ that cannot be computed by functions in $\mathcal {F}$ of size at most $n^{\Gamma + 1}/\mathrm{poly}(\log n)$.

Our main result shows that with some additional guarantees on the behavior of $\mathcal {F}$ under random restrictions, one can actually get very strong average-case lower bounds and PRGs for $\mathcal {F}$. Our construction and its analysis are quite different from those of Andreev, and give the first pseudorandom generators with $o(n)$ seed-length for several well-studied classes of functions such as read-once formulas, de Morgan formulas, and branching programs of linear size.

Theorem 3.3. Fix $\varepsilon \gt 0$ and let $\mathcal {F}$ be a class of functions from $\lbrace 0,1\rbrace ^n$ to $\lbrace 0,1\rbrace$ with an associated size function $s(\cdot)$. Fix $s \gt 0$ and let $p = 1/s^{1/(\Gamma + 1)}$. Let $\mathcal {D}_p$ be a $p$-regular distribution on $\lbrace 0,1,*\rbrace ^n$ such that $\mathcal {F}$ has $\varepsilon$-shrinkage exponent $\Gamma$ w.r.t. $\mathcal {D}_p$. Then, there exists an explicit pseudorandom generator $G:\lbrace 0,1\rbrace ^r \rightarrow \lbrace 0,1\rbrace ^n$ that $\delta$-fools all functions of size at most $s$ in $\mathcal {F}$ for $\delta = O(\varepsilon \cdot r)$ and has seed-length

\[ r = O\big((R(s) + \log (s/\varepsilon)) \cdot \log (n/\varepsilon)\cdot s^{1/(\Gamma +1)}\big), \]
where $R(s)$ denotes the number of bits needed to efficiently sample from $\mathcal {D}_p$.

Here is a high-level description of the generator. We use the restriction family $\mathcal {D}_p$ to sample $t$ restrictions $\rho _1,\ldots ,\rho _t$ so that together, the set of active ($*$) variables in them covers $[n]$ (with high probability). We next have to choose the assignments for the active variables in the restrictions. Instead of choosing these variables independently (which would lead to no gains in the number of bits used), we use a single string $X$ and independent seeds $Y_1,\ldots ,Y_t$ (which are much shorter) to set the values for the unassigned variables in the restrictions according to $E(X,Y_1),E(X,Y_2),\ldots ,E(X,Y_t)$ where $E(\;)$ is an explicit extractor as given by Theorem 2.2. Our PRG then outputs the XOR of these $t$ strings. If the shrinkage exponent $\Gamma \gt 1$, then the extractor output isn't long enough, and we actually use $G_k(E(X,Y_i))$ in place of $E(X,Y_i)$, where $G_k$ generates a suitably chosen $k$-wise independent distribution. In order to just have one construction, we actually use $G_k$ regardless of $\Gamma$.

More formally, fix $p = 1/s^{1/(1+\Gamma)}$ and $t = \lceil \log (n/\varepsilon)/p\rceil$. For $k$ to be chosen later, let $G_k:\lbrace 0,1\rbrace ^{r_k} \rightarrow \lbrace 0,1\rbrace ^{n}$ be a PRG generating a $k$-wise independent distribution on $\lbrace 0,1\rbrace ^n$, from Lemma 2.3. Let $N \ge 4r_k$ and let $E:\lbrace 0,1\rbrace ^N \times \lbrace 0,1\rbrace ^d \rightarrow \lbrace 0,1\rbrace ^{r_k}$ be an explicit extractor that works for entropy rate at least $1/2$ sources with $d = O(\log (N/\varepsilon))$ and error at most $\varepsilon$ as given by Theorem 2.2.

We now describe our PRG by giving a randomized algorithm to compute its output.

  1. Sample $t$ independent restrictions $\rho _1, \rho _2,\ldots ,\rho _t$ from $\mathcal {D}_p$.
  2. Sample $X \in _u \lbrace 0,1\rbrace ^N$ and $Y_1,\ldots ,Y_t \in _u \lbrace 0,1\rbrace ^d$ independently.
  3. For $1 \le i \le t$, let $Z_i = G_k(E(X,Y_i))$.
  4. For $1 \le i \le t$, define $V_i \in \lbrace 0,1\rbrace ^n$ by $(V_i)_j = (Z_i)_j$ if $j \in A(\rho _i)$, the set of coordinates left unset by $\rho _i$, and $(V_i)_j = (\rho _i)_j$ otherwise.
  5. Output $V \equiv G(\rho _1,\ldots ,\rho _t,X,Y_1,\ldots ,Y_t) = V_1 \oplus V_2 \oplus \cdots \oplus V_t$, where $\oplus$ denotes bit-wise XOR.
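To make the steps concrete, here is a minimal Python sketch of steps 3 through 5. The restriction sampler for $\mathcal {D}_p$, the extractor $E$, and the $k$-wise generator $G_k$ are treated as supplied black boxes, and the function and variable names are ours, for illustration only; a restriction is represented as a length-$n$ list over $\lbrace 0, 1, \texttt{'*'}\rbrace$, mirroring the $\lbrace 0,1,*\rbrace ^n$ notation above.

    def prg_output(rhos, X, Ys, E, G_k, n):
        """Steps 3-5: output V = V_1 xor ... xor V_t, where V_i agrees
        with rho_i on its fixed coordinates and with G_k(E(X, Y_i)) on
        the unset ('*') coordinates A(rho_i)."""
        V = [0] * n
        for rho, Y in zip(rhos, Ys):
            Z = G_k(E(X, Y))                              # step 3
            Vi = [Z[j] if rho[j] == '*' else rho[j]       # step 4
                  for j in range(n)]
            V = [v ^ vi for v, vi in zip(V, Vi)]          # step 5
        return V

The restrictions $\rho _1,\ldots ,\rho _t$ and the seeds $X, Y_1,\ldots ,Y_t$ together form the seed of the generator, exactly as in step 5 above.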

We will show that for $N = \tilde{O}(p^\Gamma s)$ sufficiently large, functions in $\mathcal {F}$ of size at most $s$ cannot distinguish $V$ from a truly random string. We will do so by a hybrid argument. To this end, let $Z_i^{\prime }$ be independent uniformly random strings in $\lbrace 0,1\rbrace ^n$ and with $\rho _1,\ldots ,\rho _t$ as in the definition of $V$, define $U_i \in \lbrace 0,1\rbrace ^n$, $1 \le i \le t$, by $(U_i)_j = (Z_i^{\prime })_j$ if $j \in A(\rho _i)$ and $(\rho _i)_j$ otherwise.

For $0 \le i \le t$, let $W_i = U_1 \oplus \cdots \oplus U_{i} \oplus V_{i+1} \oplus V_{i+2} \oplus \cdots \oplus V_t$. Then, $W_0 \equiv V$ and $W_t = U_1 \oplus \cdots \oplus U_t$. We first observe that $W_t$ is $\varepsilon$-close to the uniform distribution on $\lbrace 0,1\rbrace ^n$.

Claim 3.4. The distribution of $W_t$ is $\varepsilon$-close to the uniform distribution on $\lbrace 0,1\rbrace ^n$.

Observe that if $\cup _{i=1}^t A(\rho _i) = [n]$, then $W_t$ is exactly the uniform distribution on $\lbrace 0,1\rbrace ^n$. Thus, it suffices to bound the probability that $\cup _{i=1}^t A(\rho _i) \ne [n]$. Now, as $\mathcal {D}_p$ is $p$-regular, for every $i \in [t]$, $j \in [n]$, $\mathsf {Pr}[j \in A(\rho _i)] = p$ and these events are independent for different $i \in [t]$. Thus, $\mathsf {Pr}[j \notin \cup _{i=1}^t A(\rho _i)] = (1-p)^t \le e^{-pt} \le \varepsilon /n$, by our choice of $t = \lceil \log (n/\varepsilon)/p\rceil$. The claim now follows by a union bound over $j \in [n]$.

From the above claim, it suffices to show that $\mathcal {F}$ cannot distinguish $W_0$ from $W_t$. Fix an $f \in \mathcal {F}$ with $s(f) \le s$ and $i \ge 1$. We will show that $f$ cannot distinguish between $W_{i-1}$ and $W_{i}$.

Claim 3.5. For $i \ge 1$, $|\mathsf {Pr}[f(W_{i-1}) = 1] - \mathsf {Pr}[f(W_i) = 1]| \le 3 \varepsilon$.

Let $W = U_1 \oplus \cdots \oplus U_{i-1} \oplus V_{i+1} \oplus \cdots \oplus V_t$. Then, $W_{i-1} \equiv W \oplus V_{i}$ and $W_{i} = W \oplus U_{i}$. The intuition behind our argument is as follows: Note that $W$ does not depend on $\rho _i, Y_i$. Let $f_W:\lbrace 0,1\rbrace ^n \rightarrow \lbrace 0,1\rbrace$ be given as $f_W(x) = f(x \oplus W)$. Now, by our assumption about the shrinkage exponent of $\mathcal {F}$, for any fixing of $W$, with high probability over the choice of $\rho _i$, $s(f_W \lceil _{\rho _i}) \le c p^{\Gamma } s \log (1/\varepsilon)= s_0$. Let $\mathcal {E}$ denote this event. Observe that conditioned on $\mathcal {E}$, the restricted function $f_W \lceil _{\rho _i}$ can be described with roughly $O(s_0\log s_0)$ bits (as it has size at most $s_0$). We then argue that under conditioning on $\mathcal {E}$ (which is independent of $Y_i$), $X$ has min-entropy at least $N - O(s_0\log s_0)$ even when given the function $g = f_W \lceil _{\rho _i}$. Therefore, for $N$ sufficiently large, $E(X,Y_i)$ is $\varepsilon$-close to a uniformly random string on $\lbrace 0,1\rbrace ^{r_k}$, so that $G_k(E(X,Y_i))$ is $\varepsilon$-close to a $k$-wise independent distribution on $\lbrace 0,1\rbrace ^n$. Finally, as $g$ depends only on at most $s(f_W \lceil _{\rho _i}) \le s_0$ variables, we get that $g$ cannot distinguish $V_i$ from a truly random string if $k \ge s_0$. We now formalize this intuition.

For brevity, let $H$ denote the random variable $f_W \lceil _{\rho _i}:\lbrace 0,1\rbrace ^{A(\rho _i)} \rightarrow \lbrace 0,1\rbrace$. Observe that $H$ is independent of $Y_i$ and the domain of $f_W \lceil _{\rho _i}$ is independent of $X$ (it is a random variable over $\rho _i$ as well as $W$). We abstract the two essential properties of the random restriction family $\mathcal {D}_p$ that we shall exploit.

Fact 3.6. With probability at least $1-\varepsilon$, $s(H) \le s_0$. In particular, with probability at least $1-\varepsilon$: (1) $H$ can be described by $c s_0 \log s_0$ bits for a constant $c$ and (2) $H$ is fooled by $s_0$-wise independent distributions.

By our assumption about $\mathcal {D}_p$, for every $W$, $\mathsf {Pr}_{\rho _i}[s(f_W \lceil _{\rho _i}) \gt s_0] \le \varepsilon$. The claim now follows as the number of functions in $\mathcal {F}$ of size at most $s_0$ is $s_0^{O(s_0)}$ and any function of size $s_0$ has at most $s_0$ relevant variables.

Let $\mathcal {F}_i^{\prime }$ denote the set of all possible values for $H$ obtained under arbitrary settings of $W$ and $\rho _i$. Let $\mathcal {F}_i = \lbrace g \in \mathcal {F}_i^{\prime }: \mathsf {Pr}_{W,\rho _i}[H = g] \ge \varepsilon /s_0^{c s_0}\rbrace$. Let $\mathcal {E}^{\prime }$ denote the event that the conclusions of Fact 3.6 hold and let $\mathcal {E}$ be the event that $H \in \mathcal {F}_i$ and $\mathcal {E}^{\prime }$. Note that conditioned on $\mathcal {E}^{\prime }$, the number of possibilities for $H$ is at most $s_0^{cs_0}$ as $H$ is described completely by $c s_0 \log s_0$ bits. Therefore,

\begin{equation} 1 - \mathsf {Pr}[\mathcal {E}] \le \mathsf {Pr}[\lnot \mathcal {E}^{\prime }] + \mathsf {Pr}[ (H \notin \mathcal {F}_i) \wedge \mathcal {E}^{\prime }] \le \varepsilon + s_0^{c s_0} \cdot \varepsilon /s_0^{cs_0} \le 2\varepsilon , \end{equation}
for $c$ a sufficiently large constant.

In the remaining argument, we condition on the event $\mathcal {E}$. From the above equation, this will only affect our error estimate by an additive $2\varepsilon$. Fix an element $g \in \mathcal {F}_i$. Then, as the random function $H$ equals $g$ with probability at least $\varepsilon /s_0^{cs_0}$, conditioning on this event cannot change the min-entropy of $X$ much:

\[ H_\infty (X | H = g) \ge N - \log (1/\varepsilon) - c s_0 \log s_0. \]

Recall that $H$ is independent of $Y_i$. Thus, by the definition of the extractor, for $N \ge 2c s_0\log s_0 + 2\log (1/\varepsilon)$, $E(X,Y_i)$ is $\varepsilon$-close to the uniform distribution on $\lbrace 0,1\rbrace ^{r_k}$ even conditioned on $H = g$. In particular, $Z_i = G_k(E(X,Y_i))$ is $\varepsilon$-close to a $k$-wise independent distribution. Therefore, even conditioned on $H = g$, $(V_i)_{A(\rho _i)} = (Z_i)_{A(\rho _i)}$ is $\varepsilon$-close to $k$-wise independent. Finally, note that $f(W_{i-1}) = (f_W \lceil _{\rho _i})((V_i)_{A(\rho _i)}) = H((V_i)_{A(\rho _i)})$ and similarly, $f(W_i) = H((U_i)_{A(\rho _i)})$. Thus, for $k \ge s_0$,
\[ \big |\, \mathsf {Pr}[H((V_i)_{A(\rho _i)}) = 1 \mid H = g] - \mathsf {Pr}[H((U_i)_{A(\rho _i)}) = 1 \mid H = g] \,\big | \le \varepsilon . \]

As the above is true for every $g \in \mathcal {F}_i$, it follows that
\[ \big |\, \mathsf {Pr}[f(W_{i-1}) = 1 \mid \mathcal {E}] - \mathsf {Pr}[f(W_i) = 1 \mid \mathcal {E}] \,\big | \le \varepsilon . \]

Combining Equation (3.1) and the above equation, we get
\[ \big |\, \mathsf {Pr}[f(W_{i-1}) = 1] - \mathsf {Pr}[f(W_i) = 1] \,\big | \le \varepsilon \cdot \mathsf {Pr}[\mathcal {E}] + \mathsf {Pr}[\lnot \mathcal {E}] \le \varepsilon + 2\varepsilon = 3\varepsilon . \]

The claim now follows.

Combining Claims 3.4 and 3.5 and summing over $1 \le i \le t$, we get that
\[ \Big |\, \mathsf {Pr}[f(V) = 1] - \underset{U \in _u \lbrace 0,1\rbrace ^n}{\mathsf {Pr}}[f(U) = 1] \,\Big | \le 3\varepsilon t + \varepsilon = O(\varepsilon t). \]

Let us now estimate the seed-length of the generator. To generate $V$, we need to sample $\rho _1,\ldots ,\rho _t$ and $X, Y_1,\ldots ,Y_t$, for a total of

\begin{equation*} r = (R(s) + d)\, t + N = O\big ((R(s) + \log (s_0/\varepsilon))\cdot \log (n/\varepsilon)/p\big) + O(s_0 \log s_0). \end{equation*}

Substituting $s_0 = c\, p^{\Gamma } s\log (1/\varepsilon)$ in the above equation gives us the theorem. The above calculation explains our choice of $p = 1/s^{1/(\Gamma + 1)}$: we want to balance out $1/p$ and $p^{\Gamma } s$.
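Explicitly, with $p = s^{-1/(\Gamma +1)}$ the two competing terms coincide:
\[ \frac{1}{p} = s^{1/(\Gamma + 1)} \qquad \text{and} \qquad p^{\Gamma } s = s^{-\Gamma /(\Gamma +1)} \cdot s = s^{1/(\Gamma + 1)}, \]
so both the number of restrictions (roughly $\log (n/\varepsilon)/p$) and the entropy requirement $N = \tilde{O}(p^{\Gamma } s)$ scale as $s^{1/(\Gamma +1)}$ up to lower-order factors.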

We next use Theorem 3.3 to get PRGs for specific classes of functions.

4 PRGS FOR FORMULAS

A formula is a tree where each leaf is labeled by a literal (either a variable or its negation) and each internal node by an operation of constant arity. Any such tree naturally defines a function $f$. Let $L(f)$ denote the formula size (number of leaves) of $f$. We assume, without loss of generality, that $L(f) = n$, the number of variables, since we can always add dummy variables otherwise. A de Morgan formula is a binary tree with the set of allowed operations being $\lbrace \wedge ,\vee \rbrace$.

The simplest case for our arguments is formulas over an arbitrary basis, since these have shrinkage exponent 1. More challenging are de Morgan formulas. It has been known for many years that the shrinkage exponent for such general formulas is 2 [12] and that for read-once formulas (no variable appears more than once) it is $\log 2 / \log (\sqrt {5}-1) = 3.27\ldots$ [13]. In this section, we show that even pseudorandom restrictions using $n^{o(1)}$ random bits achieve essentially the same shrinkage with high probability. This will be shown in Lemmas 4.2 and 4.8. We then use Theorem 3.3 to get Theorems 1.2, 1.1, and 1.4.

In our arguments, we will often have to handle “heavy” variables—variables that appear in many leaves. The following lemma shows that any $s$ variables can only increase the formula size by a factor of about $2^s$:

Lemma 4.1. Let $f$ be any formula, and let $H$ denote any subset of the variables. For each $h \in \lbrace 0,1 \rbrace ^H$, let $\rho _h$ denote the restriction formed by setting variables in $H$ to $h$, leaving all other variables unfixed. Then $L(f) \le \sum _{h \in \lbrace 0,1 \rbrace ^H} (L(f \lceil _{\rho _h}) + |H|) \le 2^{|H|} (\max _{h \in \lbrace 0,1 \rbrace ^H} L(f \lceil _{\rho _h})+|H|)$.

Let $\mathsf {I}_{H=h}(x)$ denote the formula of size $|H|$ which is true iff all variables in $H$ are set to $h$. Then $f = \bigvee _{h \in \lbrace 0,1 \rbrace ^H} ((\mathsf {I}_{H=h}(x)) \wedge (f \lceil _{\rho _h}) )$.
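For instance, with a single heavy variable $H = \lbrace x_1 \rbrace$, this is just the Shannon expansion
\[ f = (\bar{x}_1 \wedge f \lceil _{x_1=0}) \vee (x_1 \wedge f \lceil _{x_1=1}), \]
which gives $L(f) \le (L(f \lceil _{x_1=0}) + 1) + (L(f \lceil _{x_1=1}) + 1)$, matching the claimed bound for $|H| = 1$.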

We begin with formulas over an arbitrary basis.

4.1 Arbitrary Basis

Here Lemma 4.1 and concentration bounds imply shrinkage with $\Gamma = 1$ for $O(\log n)$-wise independent restrictions.

Lemma 4.2. For any constant $c$ and formula $f$ with $L(f)=n$, a $(p=1/\sqrt {n})$-regular $c\log n$-wise independent random restriction $\rho$ yields

\[ \Pr \left[ L(f \lceil _{\rho }) \ge 2^{3\sqrt {c\log n}} pn\right] \le 2n^{-c}. \]

Let $k=c\log n$. The formula $f$ depends on at most $n$ variables $x_1,\ldots ,x_n$. Let variable $x_i$ appear as a leaf $n_i$ times, so $n=L(f) = \sum _i n_i$. For $\alpha$ to be chosen later, call $x_i$ heavy if $n_i \ge p^{1-\alpha }n$ and light otherwise. Then, for $H$ the set of heavy variables, $|H| \le p^{\alpha - 1}$. Let $H(\rho)$ denote the heavy variables set to * by $\rho$, and $h(\rho) = |H(\rho)|$. Define a new restriction $\rho ^{\prime }$ with $\rho ^{\prime }(i) = \rho (i)$ for $i \not\in H(\rho)$, and adversarially chosen in $\lbrace 0,1 \rbrace$ otherwise. Lemma 4.1 implies that $L(f \lceil _{\rho }) \le 2^{1+h(\rho)} L(f \lceil _{\rho ^{\prime }})$. We bound

\begin{equation*} \Pr \left[ L(f \lceil _{\rho }) \ge 2^{h+3} k \cdot p^{1-\alpha }n\right] \le \Pr [h(\rho) \ge h] + \Pr \left[L(f \lceil _{\rho ^{\prime }}) \ge 4 k\, p^{1-\alpha } n\right]. \end{equation*}

Define random variables $X_i = 1$ if $\rho (x_i) = *$, and 0 otherwise. We bound the first term with Lemma 2.5. Here, we have $\mu = |H| p \le p^\alpha$. Thus, as long as $h\alpha \ge 2c,$ this term will contribute at most $n^{-c}$.

For the second term, we need only consider light variables $L$ and apply Lemma 2.4. Now the coefficients are the $n_i$. Note that $\mu \le pn$ and $m = \max _{x_i \in L} n_i \lt p^{\alpha - 1}n$, so $m + \mu \le 2p^{1-\alpha }n$. Hence Lemma 2.4 bounds the second term by $2^{-k} \le n^{-c}$.

Setting $h= \lceil 2c/\alpha \rceil$, we minimize $2^h (1/p)^\alpha$ by setting $\alpha = 2\sqrt {c/\log n}$.
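For completeness, the arithmetic in this last step is
\[ 2^{h} (1/p)^{\alpha } \le 2^{2c/\alpha + 1} \cdot n^{\alpha /2} = 2^{2c/\alpha + (\alpha /2) \log n + 1}, \]
and at $\alpha = 2\sqrt {c/\log n}$ both exponent terms equal $\sqrt {c \log n}$, so the extra factor is $2^{2\sqrt {c\log n} + 1} \le 2^{3\sqrt {c\log n}}$, as claimed.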

This allows us to set $\Gamma =1$ in Theorem 3.3, yielding Theorem 1.2.

4.2 De Morgan Basis

We follow the high-level intuition described in the introduction by writing each formula as a composition of smaller subformulas (these, as we will see, are obtained essentially by looking at specific nodes of the original formula). One subtle issue we face in carrying out the approach is that the subformulas $g_i$ in our decomposition will have some overlapping nodes, which in turn forces some additional constraints on these nodes. We next show that these additional constraints can be removed with only a minor loss. Throughout this section, we assume that $\Gamma$ denotes the shrinkage exponent for the class of formulas under consideration—$\Gamma = 2$ for general formulas and $\Gamma = 3.27\ldots$ for read-once formulas.

The lemma below intuitively says that we can write any formula $f$ as a function of smaller formulas $g_i$ such that the size of any restriction $\rho$ of $f$, $f \lceil _{\rho }$ can be bounded as a sum of the sizes of the restrictions $g_i \lceil _{\rho ^{\prime }}$ for a suitably defined $\rho ^{\prime }$. In doing so, we will introduce some new variables which will be unrestricted under $\rho ^{\prime }$ and the restriction $\rho ^{\prime }$ will agree with $\rho$ on the variables of $f$. We will also ensure that each of the formulas $g_i$ only depends on at most two newly introduced variables.

Lemma 4.3. For any positive $\ell$ and any formula $f$ on a set of variables $X$ with $L(f) \ge \ell$, there exist at most $6n/\ell$ formulas $g_i$ with $L(g_i) \le \ell$, where the $g_i$ may depend on variables outside $X$, such that the following holds: For any restriction $\rho$, $L(f \lceil _{\rho }) \le \sum _i L(g_i \lceil _{\rho ^{\prime }})$, where $\rho ^{\prime }(j) = \rho (j)$ for $j \in X$, and $\rho ^{\prime }(j) = *$ otherwise. Moreover, each $g_i$ depends on at most two variables outside $X$ (called special variables).

This follows from the following claim:

Claim 4.4. Any binary tree of size $n \ge \ell$ can be decomposed into at most $6n/\ell$ subtrees of size at most $\ell$, such that each subtree has at most two other subtree children. Here, subtree $T_1$ is a child of subtree $T_2$ if there exist nodes $t_1 \in T_1$, $t_2 \in T_2$, such that $t_1$ is a child of $t_2$.

Proceed inductively, using the well-known fact that any binary tree of size $s$ can be divided into two edge-disjoint subtrees, each of size between $s/3$ and $2s/3$. This results in subtrees of size between $\ell /3$ and $\ell$, and hence there are at most $3n/\ell$ of them. For each subtree $T$ with more than two subtree children, find a subtree $T^{\prime }$ of $T$ with exactly two subtree children, and divide $T$ into $T^{\prime }$ and $T\setminus T^{\prime }$. Note that $T\setminus T^{\prime }$ now has one fewer subtree child. Continue doing this until all subtrees have at most two subtree children. This process takes at most as many steps as the original number of subtrees, and hence the total number of subtrees is at most $6n/\ell$, as desired.

View the formula $f$ as a tree. By Claim 4.4, we can decompose $f$ into subformulas $g_i$, where each input to $g_i$ is either an ordinary variable in $X$ or a special input: the output of some other $g_j$. In each $g_i$, replace these special inputs with distinct, new variables not in $X$. The total number of new variables is at most the number of subformulas.
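As an illustration of the balanced-split step driving Claim 4.4, here is a short Python sketch (the naming is ours, and it implements only the first phase, not the bookkeeping that caps the number of subtree children at two). It assumes formulas over binary operations, so every internal node has exactly two children.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Node:
        left: Optional['Node'] = None
        right: Optional['Node'] = None

    def size(t):
        # Number of leaves of the binary tree rooted at t.
        return 1 if t.left is None else size(t.left) + size(t.right)

    def balanced_subtree(t):
        """Return a subtree with between n/3 and 2n/3 leaves, where
        n = size(t) >= 2: walk toward the heavier child until its size
        drops to <= 2n/3; since its parent had more than 2n/3 leaves,
        the heavier child keeps more than n/3."""
        n = size(t)
        cur = t
        while True:
            heavy = cur.left if size(cur.left) >= size(cur.right) else cur.right
            if size(heavy) <= 2 * n / 3:
                return heavy
            cur = heavy

Removing the returned subtree and recursing on both pieces until every piece has at most $\ell$ leaves yields pieces of size between $\ell /3$ and $\ell$, hence at most $3n/\ell$ of them, as in the proof above.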

We'd now like to show that restricting by $\rho ^{\prime }$ is not much worse than restricting by $\rho$, i.e., requiring a few variables to be * does not hurt the restricted size too much. We want to show this simply using results about restrictions by $\rho$ as a black box. For general formulas, this follows from Lemma 4.1. However, for read-once formulas, we need a different method. This method involves replacing these special variables by relatively short formulas which are unlikely to get fixed. We show that such read-once formulas exist using a result of Valiant [33] on computing the majority function by monotone formulas.

Lemma 4.5. For any $0 \lt p,\varepsilon \lt 1$, there exists a read-once formula $h$ of size at most $(\log (1/\varepsilon)/p)^{4}$ such that a $p$-regular (truly) random restriction fixes $h$ with probability less than $\varepsilon$.

We shall use Valiant's result on computing the majority function using monotone formulas [33]. His main result is a probabilistic construction of monotone formulas for majority. The formulas he constructs come from a distribution on read-once formulas of size $\mathrm{poly}(1/p)$ such that, if the inputs have bias $1/2+ p$ (of being 1), the formulas almost always output 1, and if the inputs have bias $1/2 - p$, they almost always output 0. He then boosts the error probability to be exponentially small in $n$. We don't need that last step.

The point is that, if a monotone formula has the above property, then it is resistant to restrictions leaving a $p$ fraction of bits unrestricted. This is because, if we go back and set the unset bits to 1, we get random bits biased towards 1 as inputs, and if we set them to 0, we get random inputs biased towards 0. Since the output has to change with high probability, the formula cannot be constant after the restriction.

The precise bound one gets from Valiant's arguments is $O((\log (1/\varepsilon))^2/p^{3.27\ldots}) \lt (\log (1/\varepsilon)/p)^4$.

Lemma 4.6. Suppose that, for all formulas $f$ of size at least $\ell _0$ and a $p$-regular random restriction $\rho$, $\mathsf {E}[L(f \lceil _{\rho })] \le \gamma \, p^{\Gamma } L(f)$. Suppose $g$ is a formula with $w$ special variables with $L(g) \ge \ell _0$. Let $\rho ^{\prime }$ be a $p$-regular restriction with the constraint that the special variables in $g$ be unrestricted. Then, for $r = (\log (1/\varepsilon)/p)^{4}$ with $\varepsilon = 1/(w\, p^{\Gamma }L(g))$,
\[ \mathsf {E}[L(g \lceil _{\rho ^{\prime }})] \le 2\gamma \, p^{\Gamma } (L(g) + w r). \]

Construct a new formula $g^{\prime }$ by replacing each special variable in $g$ by the formula $h$ given in Lemma 4.5 for $\varepsilon = 1/(w\, p^{\Gamma }L(g))$, on disjoint sets of variables. Let $A$ denote the event that none of these formulas $h$ is fixed, so $\mathsf {Pr}[\lnot A] \le w\varepsilon = 1/(p^{\Gamma } L(g)) \le 1/2$ (assuming $p^{\Gamma } L(g) \ge 2$). The key observation is that, conditioned on $A$, we have $L(g^{\prime } \lceil _{\rho }) \ge L(g \lceil _{\rho ^{\prime }})$. Therefore,
\[ \mathsf {E}[L(g \lceil _{\rho ^{\prime }})] \le \mathsf {E}[L(g^{\prime } \lceil _{\rho }) \mid A] \le \frac{\mathsf {E}[L(g^{\prime } \lceil _{\rho })]}{\mathsf {Pr}[A]} \le 2\gamma \, p^{\Gamma } L(g^{\prime }). \]
Thus, for $r = (\log (1/\varepsilon)/p)^{4}$, the size bound of Lemma 4.5 gives $L(g^{\prime }) \le L(g) + w r$, so
\[ \mathsf {E}[L(g \lceil _{\rho ^{\prime }})] \le 2\gamma \, p^{\Gamma } (L(g) + w r). \]
The lemma follows.

We next show that $k$-wise independent restrictions shrink formulas in which no variable is read too many times with high probability.

Lemma 4.7. There is a large enough constant $c$ such that for any $\varepsilon \gt 0$, $p \ge n^{-1/(4\Gamma)}$, $t \le p^{9\Gamma }n/(c\log (n/\varepsilon))$, and any read-$t$ formula $f$ on $n$ variables with $L(f) = n$, a $p$-regular $(k=c\log (n/\varepsilon)/p^\Gamma )$-wise independent restriction $\rho$ yields $\Pr [ L(f \lceil _{\rho }) \ge 60\, p^\Gamma n] \le \varepsilon$.

Set $\ell = 1/p^{4\Gamma }$. Use Lemma 4.3 to get the formulas $g_i$. By Lemma 4.1 and Lemma 4.6, for any $g_i$, we have $\mathsf {E}[L(g_i \lceil _{\rho ^{\prime }})] = O(p^{\Gamma } \ell)$.

Form a graph where the vertices are the $g_i$, with an edge between $g_i$ and $g_j$ if they share a variable. This graph has $m$ vertices, where $n/\ell \le m \le 6n/\ell$, and degree at most $\ell t$. Hence, we can divide the vertices into independent sets of size at least $s=\lfloor m/(\ell t + 1) \rfloor \ge c\log (n/\varepsilon)/(2p^\Gamma)$.

For any such independent set $I$, note that the $Y_i = L(g_i \lceil _{\rho ^{\prime }})/\ell$, $i \in I$, are independent random variables in the range $[0,1]$. Hence, we can apply Lemma 2.4 for large enough $c$ to show that
\[ \mathsf {Pr}\Big [\, \sum _{i \in I} Y_i \ge 10\, p^{\Gamma } |I| \,\Big ] \le \varepsilon /n . \]

Thus, by a union bound over the independent sets, with probability at least $1-\varepsilon$ no such event occurs. The lemma follows because, in that case,
\[ L(f \lceil _{\rho }) \le \sum _i L(g_i \lceil _{\rho ^{\prime }}) = \ell \sum _i Y_i \le 10\, p^{\Gamma } \ell \cdot m \le 60\, p^{\Gamma } n . \]

We now remove the assumption that the formula is read-$t$, leading to our final derandomized shrinkage bound.

Lemma 4.8. For any constant $c \ge 11$, any $p \ge n^{-1/\Gamma }$, and any formula $f$ on $n$ variables with $L(f) = n$, there is a $p$-regular distribution on restrictions $\rho$ using a $2^{O(\log ^{2/3} n)}$ bit seed such that

\[ \Pr \,\left[ L(f \lceil _{\rho }) \ge 2^{3c\log ^{2/3} n} \cdot p^\Gamma n\,\right] \le n^{-c}. \]

Before proving the lemma we first note that combining the lemma with Theorem 3.3, and the shrinkage exponent estimates of [12] and [13] directly implies Theorem 1.1 and Theorem 1.4, respectively.

We now prove Lemma 4.8. We will implement this $p$-regular restriction as a sequence of $r$ $q$-regular restrictions, where $p=q^r$ and $q=n^{-\alpha }$ for some $\alpha$ only slightly sub-constant. For each of the $r$ rounds of restriction, we will have a set of at most $n^{6\alpha }$ heavy variables, which can change in each round. We handle the heavy variables by conditioning on the values of the restrictions applied to the heavy variables for the current round and six rounds ahead, and then applying Lemma 2.5. We handle all other variables with Lemma 4.7. Note that the shrinkage exponents proved in [12] and [13] have an extra polylogarithmic term. However, the extra factor when restricting by $q=n^{-\alpha }$ is $\mathrm{polylog}(n)$, so the total extra factor is $(\mathrm{polylog}(n))^r$, which can be absorbed into the $2^{O(\log ^{2/3} n)}$ term. We now formalize this.

Set $q=p^{1/r}$ for an $r \ge 11$ such that $q=n^{-\alpha }$ for $\alpha$ to be chosen later. Let $k_0=n^{10\alpha }$, and $k=rk_0$. Our pseudorandom $p$-regular restriction will be the composition of $r$ independent restrictions $\rho _i$, where each $\rho _i$ is a $k$-wise independent $q$-regular restriction.

The analysis proceeds in rounds. Let $X_0=X$ denote the $n$ variables for $f$. Let $X_i$ denote the unfixed variables in round $i$, and let $n_i = |X_i|$. Let $f_0 = f$, and let $f_i$ denote the restricted formula after $i$ rounds. Call a variable $x_j$ heavy in round $i$ if $x_j$ appears more than $t_i = L(f_i)/n^{10\alpha }$ times in $f_i$. Letting $H_i$ denote the heavy variables in round $i$, we see that $|H_i| \le n^{10\alpha }$. Let $H = \cup H_i$, and $Y_i = X_i \setminus H$. Let $p_i = p/q^i$.

We now condition on the heavy variables in a somewhat subtle way. In round 1, we condition on all of $\rho _1$, as well as the values of all $\rho _i$ on the variables $H_1$. Now all the $\rho _i$ on $H_1$ determine $\rho$ on $H_1$. Since all $\rho _i$ are $k$-wise independent, so is $\rho$. Since $k \ge |H_1|$, Lemma 2.5 implies that

\begin{equation} \Pr [\,\hbox{$\rho $ leaves at least $s$ variables from $H_1$ unfixed}\,] \le (n^{10\alpha }p)^s \le n^{-\alpha s}. \end{equation}

Now, conditioned on all $\rho _i$ on $H_1$, $\rho _1$ remains $k_0$-wise independent on $X\setminus H_1$. Suppose $\rho$ leaves $s_1 \lt s$ variables unfixed from $H_1$. Then for each setting $\tau$ of these $s_1$ variables, Lemma 4.7 implies that

\begin{equation} \Pr \left[ L(f \lceil _{\rho _1 \cup \rho (H) \cup \tau }) \ge 60\, q^\Gamma n\right] \le \exp (-n^{\Omega (\alpha)}). \end{equation}
Combining (4.1) and (4.2) with Lemma 4.1, we obtain
\[ \Pr \left[ L(f \lceil _{\rho _1 \cup \rho (H)}) \ge 2^{s+6} q^\Gamma n\right] \le 2n^{-\alpha s}. \]

We continue in this manner, in round $i$ fixing $\rho _i$ as well as all $\rho _j$, $j \ge i$, on $H_i$. We do this for $r-11$ rounds; as in (4.1), the $p$ becomes $p_i$ and we need to ensure that $n^{10\alpha } p_i \le n^{-\alpha }$. Thus,

\[ \Pr [ L(f \lceil _{\rho }) \ge (2^{s+6} q^\Gamma)^{r-11}\cdot n] \le 2(r-11)n^{-\alpha s}. \]
Since $q^{\Gamma r} = p^\Gamma$, the extra factor we lose in the formula size beyond $p^\Gamma$ is at most $2^{(s+6)(r-11)}n^{11\alpha }$. To make the error at most $n^{-c}$, we set $s=2c/\alpha$. Since $p \ge 1/n$, we have $r \le 1/\alpha$. Thus, the extra factor is at most $2^{rs}n^{11\alpha } = 2^{2c/\alpha ^2 + 11\alpha \log n}$. To minimize this exponent (up to constants), we set $\alpha = (\log n)^{-1/3}$. We restrict $c \ge 11$ in the lemma statement so that $2c+11 \le 3c$. Note that the seed-length is $O(k \log n)$ which is $2^{O(\log ^{2/3} n)}$ as stated in the lemma.
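Spelling out the final exponent minimization: at $\alpha = (\log n)^{-1/3}$,
\[ 2^{rs}\, n^{11\alpha } = 2^{2c/\alpha ^2 + 11\alpha \log n} = 2^{(2c + 11) \log ^{2/3} n} \le 2^{3c \log ^{2/3} n} \quad \text{for } c \ge 11, \]
which is exactly the extra factor stated in the lemma.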

5 PRGS FOR BRANCHING PROGRAMS

We now apply our main generator construction to get PRGs for branching programs with seed-length $s^{1/2 + o(1)}$ for branching programs of size $s$. We first formally define branching programs.

An $n$-variable branching program (BP) $M$ is a directed acyclic graph with the following properties:

  • There are three special vertices—start which has no incoming edges and two terminal vertices accept, reject which have no outgoing edges.
  • Every vertex in the graph is labeled with a variable from $\lbrace x_1,\ldots ,x_n\rbrace$.
  • Every non-terminal vertex has two outgoing edges labeled $\lbrace 0,1\rbrace$.

The size of $M$, $s(M)$, is defined as the number of vertices in $M$. The length of $M$ is defined as the length of the longest path from start to either of the terminals. We say $M$ is read-once if no two vertices in a path from start to the terminals have the same label.

A branching program $M$ as above, naturally induces a function $M:\lbrace 0,1\rbrace ^n \rightarrow \lbrace 0,1\rbrace$ that on input $x \in \lbrace 0,1\rbrace ^n$, traverses the graph according to $x$ and outputs 1 if the computation reaches accept and 0 otherwise.
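To fix the semantics, here is a minimal Python sketch of evaluating a BP in this representation; the dictionary encoding and names are ours, for illustration only.

    # Each non-terminal vertex maps to (variable index,
    # successor on bit 0, successor on bit 1).
    def evaluate(program, x):
        """Traverse the DAG from 'start', following the edge labeled by
        the value of the queried variable; output 1 iff 'accept' is
        reached."""
        v = 'start'
        while v not in ('accept', 'reject'):
            var, out0, out1 = program[v]
            v = out1 if x[var] else out0
        return 1 if v == 'accept' else 0

    # Example: a 2-variable BP computing x0 AND x1.
    and_bp = {
        'start': (0, 'reject', 'u'),
        'u':     (1, 'reject', 'accept'),
    }
    assert evaluate(and_bp, [1, 1]) == 1 and evaluate(and_bp, [1, 0]) == 0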

We shall construct an explicit pseudorandom generator for branching programs of size at most $s$ with seed-length $s^{1/2+o(1)}$ and error $1/\mathrm{poly}(s)$. Previously, only PRGs with seed-length $\Omega (n)$ were known even for the special case of layered read-once branching programs (ROBPs) with each layer constrained to have a constant number of nodes (constant width). For the more restricted class of oblivious ROBPs (these are ROBPs where the labeling of the layers is known a priori) of length at most $T$ and width at most $W$, Nisan [22] and Impagliazzo et al. [16] gave PRGs with seed-length $O((\log T)(\log (T/\varepsilon) + \log W))$ to get error $\varepsilon$.

We now prove Theorem 1.3. The arguments here are basically the same as those from Section 4.1. To this end, we first show an analogue of Lemma 4.1 for branching programs.

Let $f$ be any BP, and let $H$ denote any subset of the variables. For each $h \in \lbrace 0,1 \rbrace ^H$, let $\rho _h$ denote the restriction formed by setting variables in $H$ to $h$, leaving all other variables unfixed. Then, $s(f) \le 2^{|H|} \cdot (\,\max _{h \in \lbrace 0,1 \rbrace ^H} s(f \lceil _{\rho _h}) + 2)$.

Build a complete decision tree $T$ of depth $|H|$ so that leaves correspond to specific assignments for the variables in $H$. We can now obtain a BP $M_f$ for $f$ by appending a BP for $f \lceil _{\rho _h}$ to the leaf of $T$ corresponding to the assignment $h$. Clearly, the resulting BP has size as stated.

For any constant $c$ and BP $f$ with $s(f)=s$, a $(p=1/\sqrt {s})$-regular $c\log s$-wise independent random restriction $\rho$ yields

\[ \Pr \left[ s(f \lceil _{\rho }) \ge 2^{3\sqrt {c\log s}}\cdot p\,s\right] \le 2\, s^{-c}. \]

Let $k=c\log s$. The BP $f$ depends on at most $s$ variables $x_1,\ldots ,x_s$. For $i \le s$, let the number of vertices labeled $x_i$ be $n_i$. Then, $s = s(f) = \sum _i n_i$. We now repeat the calculations from Lemma 4.2 for this setting of the $n_i$’s.

For $\alpha$ to be chosen later, call $x_i$ heavy if $n_i \ge p^{1-\alpha } s$ and light otherwise. Then for $H$ the set of heavy variables, $|H| \le p^{\alpha - 1}$. Let $H(\rho)$ denote the heavy variables set to * by $\rho$, and $h(\rho) = |H(\rho)|$. Define a new restriction $\rho ^{\prime }$ with $\rho ^{\prime }(i) = \rho (i)$ for $i \not\in H(\rho)$, and adversarially chosen in $\lbrace 0,1 \rbrace$ otherwise. The BP analogue of Lemma 4.1 above implies that $s(f \lceil _{\rho }) \le 2^{h(\rho)+1} s(f \lceil _{\rho ^{\prime }})$. We bound

\begin{equation*} \Pr \left[ s(f \lceil _{\rho }) \ge 2^{h+3} k \cdot p^{1-\alpha } s\right] \le \Pr [h(\rho) \ge h] + \Pr \left[s(f \lceil _{\rho ^{\prime }}) \ge 4 k\, p^{1-\alpha } s\right]. \end{equation*}

Define random variables $X_i = 1$ if $\rho (x_i) = *$, and 0 otherwise. We bound the first term with Lemma 2.5. Here, we have $\mu = |H| p \le p^\alpha$. Thus, as long as $h\alpha \ge 2c,$ this term will contribute at most $s^{-c}$.

For the second term, we need only consider light variables $L$ and apply Lemma 2.4. Now the coefficients are the $n_i$. Note that $\mu \le p s$ and $m = \max _{x_i \in L} n_i \lt p^{\alpha - 1} s$, so $m + \mu \le 2p^{1-\alpha } s$. Hence, Lemma 2.4 bounds the second term by $2^{-k} \le s^{-c}$.

Setting $h= \lceil 2c/\alpha \rceil$, we minimize $2^h (1/p)^\alpha$ by setting $\alpha = 2\sqrt {c/\log s}$.

This allows us to set $\Gamma =1$ in Theorem 3.3, yielding Theorem 1.3.

ACKNOWLEDGMENTS

We are grateful to Avi Wigderson for useful discussions and suggestions.

REFERENCES

Footnotes

A preliminary version of this article appeared in the 53rd Annual IEEE Symposium on Foundations of Computer Science, 2012.

Russell Impagliazzo was supported in part by Simons Foundation, the Ellentuck Fund, the Friends of the Institute for Advanced Study, the National Science Foundation under grants DMS-0835373, CCF-121351, and CCF-0832797, and by a Simons Investigator Award.

Raghu Meka was supported in part by the National Science Foundation under grants CCF-1553605 and DMS-0835373.

David Zuckerman was supported in part by the National Science Foundation under grants CCF-0916160, CCF-1526952, CCF-1705028, and DMS-0835373, and by a Simons Investigator Award.

Authors’ addresses: R. Impagliazzo, University of California, San Diego; email: russell@cs.ucsd.edu; R. Meka, University of California, Los Angeles; email: raghum@cs.ucla.edu; D. Zuckerman, University of Texas at Austin; email: diz@cs.utexas.edu.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

©2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
0004-5411/2019/03-ART11 $15.00
DOI: https://doi.org/10.1145/3230630

Publication History: Received December 2017; revised October 2018; accepted November 2018