Chapter 10: Borrowed Judgment

Thesis: When you lack the capacity to verify, bring it in from outside. Either place a trusted judge inside the loop (an oracle), or use many mutually independent and unreliable judges and trust their agreement (redundancy / consensus).

The previous pair of moves still worked on yourself, shrinking the unknown. But sometimes what you lack is not information but the power of judgment itself: you simply have no capacity to render a reliable verdict on the matter before you. The response in this pair of moves is to stop looking inside yourself and to borrow judgment from elsewhere. There are two ways to borrow. Either you bring in a judge you trust (an oracle), or you assemble many judges who do not trust one another and trust their agreement (redundancy).

The Oracle in the Loop: Bringing In a Judge

The pure form of the first move: at the decision point where you lack the capacity to verify, insert an external judge and let it deliver the verdict you cannot.

The plainest version is the human-in-the-loop of Chapter 5, the expert consultation, the hard case escalated upward. But the most profound form of this move hides in two seemingly unrelated places.

One is interactive theorem proving. De Bruijn's AUTOMATH of 1970¹, Edinburgh LCF (Gordon, Milner, and Wadsworth, 1979²), down to today's Coq (Bertot and Castéran, 2004³), all embody the same division of labor: the human supplies the flash of proof insight that the machine cannot produce (the oracle), and the machine scrupulously checks every step (certificate checking). The oracle does the "finding," the machine does the "verifying," meshing exactly with the asymmetry of Chapter 2.

The other is more astonishing: the interactive proof of complexity theory. How can a computationally feeble verifier, faced with a powerful but untrustworthy prover, elicit a reliable answer to a question it cannot compute on its own? The answer given by Goldwasser, Micali, and Rackoff in 1989⁶ and by Babai in 1985⁷ is: through repeated interrogation plus random challenge. The verifier throws out random questions it could not itself predict, and if the prover is lying, sooner or later it will give itself away on some challenge. Shamir's astonishing $\mathrm{IP}=\mathrm{PSPACE}$ of 1992¹⁵ shows that by this method alone, "interrogating an untrustworthy oracle," a weak verifier can reliably adjudicate an enormous class of problems; Blum and Kannan's "programs that check their work" of 1995¹⁷ and Goldwasser et al.'s "computation for mortals" of 2015¹⁹ belong to the same lineage. This is the purest mathematization of "borrowed judgment": even if the oracle is untrustworthy, so long as you interrogate it cleverly, you can still wring reliability out of it.

The unifying idea is this: manufacture a reliability you do not possess on your own by bringing in an external judge. Its standard failure mode is just as plain: the oracle is itself unreliable or biased. The arbiter you bring in may simply be a wrong arbiter, and the question "who verifies the oracle" carries you into a regress with no retreat.

Redundancy: Synthesizing Reliability from Many Unreliable Parts

The second move changes direction: instead of bringing in one trustworthy judge, it convenes many untrustworthy ones and trusts their agreement.

Its theoretical foundation has two cornerstones. Von Neumann proved in 1956⁴ that one can assemble computation of arbitrary reliability out of components that are themselves error-prone, by stacking redundancy. Condorcet's jury theorem of 1785⁵ supplies its arithmetic: if every judge is slightly better than chance (accuracy $p>\tfrac12$) and they are independent of one another, then the probability that the majority vote is correct tends toward certainty as the number of judges grows,

$$P_N\to 1\quad(N\to\infty).$$

The cross-domain forms of this move spread very wide. In distributed systems it is Byzantine fault tolerance: the Byzantine generals problem of Pease, Shostak, and Lamport (1980)⁸ and Lamport et al. (1982)⁹ seeks consensus under the condition that some nodes may act maliciously (adversarially), and the classic threshold is that the number of nodes must satisfy $n\ge 3f+1$ in order to tolerate $f$ traitors; Castro and Liskov's PBFT of 1999¹¹ built it into a practical system (while the impossibility theorem of Fischer, Lynch, and Paterson, 1985¹⁰, marks out its boundary). In machine learning it is the ensemble: the ensemble methods of Hansen and Salamon (1990)²⁰ and Dietterich (2000)²², and Breiman's random forest of 2001²³, use a crowd of weak models voting to beat a single strong one. In crowds it is the "wisdom of crowds" (Surowiecki, 2004²⁵). In 1906, at a country fair, Galton recorded the independent guesses of about eight hundred villagers on the weight of an ox; not one of them guessed it exactly, yet the average of all the estimates was 1197 pounds, and the true weight of the ox was 1198 pounds: the whole crowd together was off by almost nothing at all. Hong and Page even proved in 2004²⁴ that under suitable conditions a diverse group of ordinary problem solvers can beat a group of experts. In science it is peer review and replicated experiment (Chapter 3); in engineering it is RAID and quorum; in medicine it is the second opinion.

But this move has one crucial precondition that must be stressed with a whole section: independence. Redundancy holds only when failures are uncorrelated. Average many estimates, and the variance falls with the number of judges,

$$\mathrm{Var}(\bar X)=\frac{\sigma^2}{N};$$

but once there is a positive correlation $\rho$ among these judgments, the variance no longer tends to zero. It sticks at a floor,

$$\mathrm{Var}(\bar X)=\rho\,\sigma^2+\frac{(1-\rho)\,\sigma^2}{N}\ \xrightarrow{N\to\infty}\ \rho\,\sigma^2.$$

The correlation floor of redundancy: once correlation is present, piling on more judges cannot get past it

Correlation cancels the entire value of redundancy at a stroke. Stack as many judges as you like, and you still cannot get past this floor set by correlation. This is no abstraction: Knight and Leveson's famous experiment of 1986¹³, which had many programmers independently write programs to the same specification, expected their errors to be mutually uncorrelated, but found that they stumbled in the same places, because human beings, facing the same hard point, make the same mistakes (Eckhardt and Lee had predicted this theoretically as early as 1985¹²). Groupthink, flawed training data drawn from a common source, common-mode failure: all are this floor making itself visible. This is the standard failure mode of redundancy: believing in independence where in truth there is correlation.

The Confluence of the Two Moves, and an Echo Across Chapters

Put the two moves side by side: one brings in a single, expensive oracle, the other synthesizes many cheap, independent judgments, and both borrow a power of judgment you do not possess on your own. The lever they share is to supply yourself with the verifying capacity you lack; their failure modes, too, stand opposed in pairs: the single oracle may be wrong, the many judgments may be secretly correlated.

There is an episode in mathematics that ties this pair of moves together with the certificate of the previous chapter. Appel and Haken's proof of the four color theorem in 1977²⁶ drew lasting controversy because it relied on computer exhaustion, which amounted to asking the mathematical community to trust an oracle. Later Gonthier in 2008²⁷ redid it as a machine-checkable formal proof, and Hales's team²⁸ did the same for the Kepler conjecture: they converted "trusting an oracle" into "checking a certificate." What MacKenzie²⁹ tracks in Mechanizing Proof is precisely how this trust shifts among people, machines, and social processes; DeMillo et al.'s¹⁴ thesis that a proof is, at bottom, a social process comes down in the end to placing the credibility of mathematics on the redundancy of human judgment.

One thing must be seen clearly, though: up to this point, the first two pairs of moves, compressing the unknown and borrowing judgment, are still pursuing the same thing, the truth of the object. They still want to know whether the matter is right or wrong. The next pair of moves does something more radical: it stops demanding that truth.

Next chapter: 11. Swap the Problem →← 9. Compressing the Unknown

References

Waypoints: 1. historical scientific judgment; 2. theoretically studied material; 3. how science progresses; 4. how to live in an unverifiable world. This section was checked source by source.

N. G. de Bruijn (1970). "The mathematical language AUTOMATH, its usage, and some of its extensions." In Symposium on Automatic Demonstration. Springer (Lecture Notes in Mathematics 125), pp. 29-61. [2] De Bruijn introduced AUTOMATH, one of the earliest formal languages able to let an entire body of mathematics be checked step by step by a machine, with the human writing the proof and the machine verifying that it is sound. It is the earliest engineered specimen of this chapter's "oracle in the loop": the human supplies the idea, the machine does nothing but scrupulously check, and the reader can see in it the source of the division of labor between "finding" and "verifying."
M. Gordon, R. Milner, C. Wadsworth (1979). Edinburgh LCF: A Mechanized Logic of Computation. Springer (Lecture Notes in Computer Science 78). [2] This book proposed the LCF interactive proof system, whose design has been deeply influential: a small trusted kernel guarantees the reliability of every inference step, and no number of proof tactics can ever get around it. It supplies the classic paradigm for this chapter's account of "certificate checking," and the reader can see how the "trusted checker" is contracted into a part as small and as reliable as possible.
Y. Bertot, P. Castéran (2004). Interactive Theorem Proving and Program Development. Coq'Art: The Calculus of Inductive Constructions. Springer (Texts in Theoretical Computer Science, EATCS Series). [2] This is the authoritative tutorial for the Coq proof assistant, explaining systematically how to construct and machine-check proofs interactively atop the calculus of inductive constructions. It carries the tradition represented by the previous two entries into contemporary practice and is the tool foundation for this chapter's later formalization work on the four color theorem and the Kepler conjecture; the reader who wants to understand "human gives the idea, machine verifies every step" hands-on can start here.
J. von Neumann (1956). "Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components." In C. E. Shannon and J. McCarthy, eds., Automata Studies (Annals of Mathematics Studies 34). Princeton University Press, pp. 43-98. [2] Von Neumann here proves that one can assemble computation of arbitrarily reliable performance out of components that are themselves error-prone, by stacking redundancy and majority voting. This is one cornerstone of this chapter's "redundancy" move, supplying the earliest rigorous argument for "synthesizing reliability from many unreliable parts," and the reader should read its core idea of how redundancy drives down the error rate.
Marquis de Condorcet (1785). Essai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix. Imprimerie Royale, Paris. [2][4] In this work on voting, Condorcet gives the famous jury theorem: if every judge is slightly better than chance and they are independent of one another, the probability that the majority vote is correct tends toward certainty as numbers grow. It supplies the arithmetic skeleton of this chapter's redundancy move and also plants its weak point in advance; the reader should note the theorem's dependence on the premise of "independence."
S. Goldwasser, S. Micali, C. Rackoff (1989). "The Knowledge Complexity of Interactive Proof Systems." SIAM Journal on Computing, 18(1), pp. 186-208. [2] This paper founded the theoretical framework of interactive and zero-knowledge proofs: a verifier of limited computational power, through repeated interrogation plus random challenge, can wring a reliable verdict out of an untrustworthy prover. It is the purest mathematical source of this chapter's "interrogating an untrustworthy oracle," and the reader should read how it uses randomness to force out a truthful answer.
L. Babai (1985). "Trading Group Theory for Randomness." In Proceedings of the 17th Annual ACM Symposium on Theory of Computing (STOC), pp. 421-429. [2] Babai here independently proposed the Arthur-Merlin class of interactive proofs with randomness, staking out the same theoretical territory at almost the same moment as the previous entry. It reinforces this chapter's central idea: the random challenge is the key weapon by which a weak verifier subdues a strong and untrustworthy prover, and the reader can read it alongside the previous entry for the complementary perspective.
M. Pease, R. Shostak, L. Lamport (1980). "Reaching Agreement in the Presence of Faults." Journal of the ACM, 27(2), pp. 228-234. [2] This paper first rigorously characterized how to reach consensus when some nodes may behave arbitrarily maliciously, giving the famous threshold: to tolerate $f$ traitors, the number of nodes must satisfy $n\ge 3f+1$. It is the source of this chapter's redundancy move in its adversarial version within distributed systems, and the reader should read the impossibility argument behind this threshold.
L. Lamport, R. Shostak, M. Pease (1982). "The Byzantine Generals Problem." ACM Transactions on Programming Languages and Systems, 4(3), pp. 382-401. [2] This paper used the famous "Byzantine generals" metaphor to recast the result of the previous entry as a parable, and from then on "Byzantine fault tolerance" became the common name for adversarial consensus. It is the signature text for this chapter's idea of seeking agreement from many mutually distrustful judges, and the reader can read how it dresses an abstract threshold in an intuitively clear story.
M. J. Fischer, N. A. Lynch, M. S. Paterson (1985). "Impossibility of Distributed Consensus with One Faulty Process." Journal of the ACM, 32(2), pp. 374-382. [2] This famous FLP impossibility theorem proves that in a fully asynchronous system, even if only one process may crash, there exists no deterministic consensus algorithm guaranteed to terminate. It marks out the boundary for this chapter's redundancy move, and the reader should read how it shows that consensus is not unconditionally available, thereby understanding why later practical systems must rely on extra assumptions to get around it.
M. Castro, B. Liskov (1999). "Practical Byzantine Fault Tolerance." In Proceedings of the 3rd USENIX Symposium on Operating Systems Design and Implementation (OSDI), pp. 173-186. [2] The PBFT algorithm proposed in this paper was the first to turn Byzantine fault tolerance from theory into a system that runs in a real asynchronous network with acceptable performance. It is the key step by which this chapter's redundancy move descended from paper to engineering, and the reader can read how it kept the $n\ge 3f+1$ threshold while squeezing the overhead into a practical range, and can also recognize the direct precursor of later blockchain consensus.
D. E. Eckhardt, L. D. Lee (1985). "A Theoretical Basis for the Analysis of Multiversion Software Subject to Coincident Errors." IEEE Transactions on Software Engineering, SE-11(12), pp. 1511-1517. [2] This paper argued theoretically that multiversion software, even when developed independently by different people, need not have independent errors: facing the same hard point, different versions tend to fail together, making the gain from redundancy far lower than the independence assumption would predict. It supplies a theoretical prediction of this chapter's "correlation floor" ahead of the experiment, and the reader should read how it characterizes common-cause error.
J. C. Knight, N. G. Leveson (1986). "An Experimental Evaluation of the Assumption of Independence in Multiversion Programming." IEEE Transactions on Software Engineering, SE-12(1), pp. 96-109. [2] This is the famous experiment: many programmers were set to write programs independently to the same specification, expecting the errors to be mutually uncorrelated, but it was found that they stumbled together at the same hard points, and the independence assumption was refuted by experience. It supplies the empirical confirmation of the theoretical prediction of the previous entry, and is the most persuasive instance of this chapter's failure mode of "believing in independence where in truth there is correlation."
R. A. De Millo, R. J. Lipton, A. J. Perlis (1979). "Social Processes and Proofs of Theorems and Programs." Communications of the ACM, 22(5), pp. 271-280. [3][4] This famous and controversial paper argues that what makes a mathematical proof credible is not the mechanical correctness of formal derivation but the social process by which the mathematical community repeatedly tests, propagates, and accepts it, and on this basis it questions the prospects of formal program verification. It supports this chapter's view that credibility is in the end placed on the redundancy of human judgment, and the reader should read its argument that a proof is, at bottom, a social process.
A. Shamir (1992). "IP = PSPACE." Journal of the ACM, 39(4), pp. 869-877. [2] Shamir proved how astonishing the power of interactive proof is: by interrogating an untrustworthy prover alone, a weak verifier can reliably adjudicate the whole of PSPACE, an enormous class of problems, that is, $\mathrm{IP}=\mathrm{PSPACE}$. It is the most forceful mathematical footnote to this chapter's "borrowed judgment," and the reader should read how it delimits the ceiling reachable by "cleverly interrogating an oracle."
C. Lund, L. Fortnow, H. Karloff, N. Nisan (1992). "Algebraic Methods for Interactive Proof Systems." Journal of the ACM, 39(4), pp. 859-868. [2] This paper introduced the algebraic method of arithmetizing Boolean formulas and then checking them with polynomials, and it was precisely this technical groundwork that led directly to the proof of $\mathrm{IP}=\mathrm{PSPACE}$ in the previous entry. It reveals the concrete mechanism of "clever interrogation": translating the verification problem into an algebraic identity that can be spot-checked at random. It stands as further reading around this chapter's lineage, and the reader can follow it into this set of techniques.
M. Blum, S. Kannan (1995). "Designing Programs That Check Their Work." Journal of the ACM, 42(1), pp. 269-291. [2] This paper proposed the idea of the "program checker": let a program, when it gives a result, carry an independent and cheap checking routine that verifies whether this particular output is correct, without trusting the program itself. It brings the spirit of interactive proof to everyday computation, and for this chapter it is the model of the idea of "not trusting the producer, only checking its product"; the reader should read the construction of its checker.
S. Arora, C. Lund, R. Motwani, M. Sudan, M. Szegedy (1998). "Proof Verification and the Hardness of Approximation Problems." Journal of the ACM, 45(3), pp. 501-555. [2] This is one of the core papers of the famous PCP theorem: any proof can be rewritten into a special format such that the verifier need only randomly spot-check a constant number of bits in it to judge its truth or falsity with high confidence. It pushes "spot-checking is enough" to the extreme and is the theoretical pinnacle of "how a weak verifier can efficiently check an enormous proof," a natural piece of further reading around this chapter; the reader should read the astonishing conclusion of its probabilistically checkable proofs.
S. Goldwasser, Y. T. Kalai, G. N. Rothblum (2015). "Delegating Computation: Interactive Proofs for Muggles." Journal of the ACM, 62(4), Article 27. [2][4] This paper makes interactive proof genuinely serve "mortals": a user of limited computational power outsources a computation to a powerful but untrustworthy server, then checks whether the result is correct at a cost far smaller than recomputing it. It is where this chapter's ideas land in the age of cloud computing, and the reader should read how it turns "delegating computation for the weak, verifiably" into a practically feasible protocol.
L. K. Hansen, P. Salamon (1990). "Neural Network Ensembles." IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(10), pp. 993-1001. [2] This paper showed, relatively early, that combining several independently trained neural networks to vote can give an overall accuracy significantly higher than any single network. It is the beginning of this chapter's redundancy move within machine learning, and the reader should read how it carries "majority voting lowers error" from logic circuits over to learning models, and again lands on the dependence on decorrelating member errors.
A. Krogh, J. Vedelsby (1995). "Neural Network Ensembles, Cross Validation, and Active Learning." In Advances in Neural Information Processing Systems 7. MIT Press, pp. 231-238. [2] This paper gives the classic decomposition of ensemble error: the overall error of the ensemble equals the average error of the members minus the disagreement among the members. It gives a machine-learning version of the precise "correlation floor" formula that chimes with this chapter, and the reader should read how it shows mathematically that the more diverse the members are, the more each errs in its own way, the more the ensemble is worth.
T. G. Dietterich (2000). "Ensemble Methods in Machine Learning." In Multiple Classifier Systems (MCS 2000). Springer (Lecture Notes in Computer Science 1857), pp. 1-15. [2] This is a widely influential survey that sorts out why ensemble methods work and explains it from three angles: statistical, computational, and representational. It is a convenient entry point for the reader to survey the whole landscape of this chapter's redundancy move within machine learning, gathering the scattered techniques of voting, bagging, and boosting under one framework to be understood together.
L. Breiman (2001). "Random Forests." Machine Learning, 45(1), pp. 5-32. [2] Breiman's random forest cultivates a crowd of mutually decorrelated decision trees through double randomization over samples and features, then votes to synthesize a powerful and robust predictor. It is one of the most successful practical specimens of the redundancy move, and its significance for this chapter is that its whole power comes precisely from deliberately manufactured independence; the reader can read how it actively drives down the correlation among members.
L. Hong, S. E. Page (2004). "Groups of Diverse Problem Solvers Can Outperform Groups of High-Ability Problem Solvers." Proceedings of the National Academy of Sciences, 101(46), pp. 16385-16389. [2][3][4] Hong and Page argue, by way of a formal model, that under suitable conditions a group made up of diverse ordinary problem solvers can beat a group of homogeneous experts, because the difference in perspective that diversity brings is itself a resource. It generalizes this chapter's intuition that "independence and diversity are the heart of redundancy" to human groups, and the reader should read its thesis of "diversity beats ability" and its boundaries.
J. Surowiecki (2004). The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies, and Nations. Doubleday. [3][4] This widely circulated book by Surowiecki argues that under conditions of diversity, independence, decentralization, and a proper aggregation mechanism, the collective judgment of a crowd often beats that of an individual expert, and he repeatedly stresses that once independence is lost and convergence sets in, the crowd turns stupid. It brings this chapter's redundancy move to the everyday and social level, and the reader should read its repeated insistence on "the preconditions under which the wisdom of crowds holds."
K. Appel, W. Haken (1977). "Every Planar Map Is Four Colorable. Part I: Discharging." Illinois Journal of Mathematics, 21(3), pp. 429-490. [1][3] This is the proof of the four color theorem, the first major mathematical proof in history to depend essentially on computer exhaustion of a great many cases, and for that reason it set off a long-running argument over "whether one can trust a machine conclusion that no human hand can check case by case." For this chapter it is the signature case of "trusting an oracle" and its cost, and the reader should read how this argument forced out the demand for a machine-checkable certificate.
G. Gonthier (2008). "Formal Proof: The Four-Color Theorem." Notices of the American Mathematical Society, 55(11), pp. 1382-1393. [2][3] Gonthier used Coq to redo the four color theorem as a fully formalized proof that can be checked step by step by machine, thereby converting the "please trust the computer" predicament of the previous entry into "checking a certificate." It is a key point of contrast for this chapter, and the reader should read how it demonstrates that demoting the output of an untrustworthy oracle to an independently verifiable certificate dissolves the controversy along with it.
T. Hales et al. (2017). "A Formal Proof of the Kepler Conjecture." Forum of Mathematics, Pi, 5, article e2. [2][3] Hales's team, after many years, used formal proof systems to machine-check the much-disputed proof of the Kepler conjecture all the way through, settling doubts that even peer review could not fully guarantee. It belongs to the same lineage as the previous entry, and its significance for this chapter is to confirm once more that when a proof grows too large for human checking, shifting trust from the oracle to a checkable certificate is the way back to certainty.
D. MacKenzie (2001). Mechanizing Proof: Computing, Risk, and Trust. MIT Press. [1][3][4] This work of sociological history by MacKenzie tracks the rise of computer proof and formal verification, examining how "proof" and "certainty" are repeatedly defined and shifted among mathematicians, machines, and social processes. It supplies this chapter with a connecting perspective, placing oracle, certificate, and redundancy alike within a larger narrative of how trust is established and ceded, and the reader should read its examination of "how mechanized proof changed what we trust."

← 9. Compressing the Unknown11. Swap the Problem →

Chapter 10: Borrowed Judgment

The Oracle in the Loop: Bringing In a Judge

Redundancy: Synthesizing Reliability from Many Unreliable Parts

The Confluence of the Two Moves, and an Echo Across Chapters

References

Open with WeChat