Chapter 6: The Agent Released

Thesis: Once you delegate action to an autonomous system, you cannot verify how it will behave in every situation it will meet (the open world); and if it can also play strategies, you stack adversarial unverifiability on top, so the response shifts from "prove it is right" toward "limit what it can break, price the trust you place in it, and make its behavior checkable after the fact."

After You Hand It Over

In the previous chapter you were still present. In this one, you let go of the wheel.

You start a piece of untrusted code running; you hand tools and permissions to a system that can decide its own next step; you let a self-driving car take the road while you are not sitting inside it. Once the power to act is handed over, a new difficulty appears: you cannot verify how it will behave in every situation it will meet, because most of those situations you have not seen, and you cannot enumerate them in advance. The unverifiability of the previous chapter came from a goal hidden inside someone else's head; the unverifiability of this one comes from behavior that happens in the future, in places you cannot see. When the system can also play strategies, a further layer of adversariality is stacked on top. The "flash crash" of May 6, 2010 was a rehearsal: automated trading programs interacting with one another knocked the Dow Jones down by nearly a thousand points within minutes, and the index rebounded almost as quickly, with no programmer having foreseen that trades would cascade like that. Each program was fine in testing; put together, and put into a live market, they brewed a disaster no one had verified.

The Gap in Future Behavior

What you tested was a finite handful of inputs; what it will meet is an open world. The gap between them is not an engineering gap that "a few more tests will close." It has a root in principle.

Rice's theorem puts it in hard terms: any nontrivial semantic property of a program is undecidable. That is, there exists no general algorithm that can decide, for an arbitrary program, whether it is "always safe," "never leaks," or "always terminates in a good state." This is not a shortfall of computing power; it is logically impossible, the shadow that Turing's halting problem casts onto "program behavior." A particular program, or a deliberately restricted class of programs, can sometimes be proved correct; what cannot exist is a verifier that settles the question once and for all, in advance, for every sufficiently general autonomous system.
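The reduction behind that claim can be sketched in a few lines of Python. Every name here (`make_gadget`, `halts_via_safety_oracle`, the `is_always_safe` oracle) is hypothetical and introduced only for illustration; the oracle is exactly the thing Rice's theorem says cannot exist for arbitrary programs. The sketch wraps an arbitrary program so that the wrapper misbehaves exactly when that program halts, which means a perfect safety checker would double as a solver for the halting problem.

```python
def make_gadget(program, program_input, unsafe_action):
    """Build a zero-argument program that misbehaves iff program(program_input) halts."""
    def gadget():
        program(program_input)   # may run forever
        return unsafe_action()   # reached only if the call above returned
    return gadget


def halts_via_safety_oracle(program, program_input, is_always_safe):
    """Decide halting, GIVEN a hypothetical perfect safety checker.

    is_always_safe is an assumed oracle, not a real function: if it existed,
    this wrapper would solve the halting problem, which Turing showed to be
    impossible. Hence no such oracle can exist.
    """
    gadget = make_gadget(program, program_input, lambda: "UNSAFE")
    # The gadget is "always safe" exactly when the program never halts.
    return not is_always_safe(gadget)


# The construction itself is ordinary code: for a program that does halt,
# the gadget reaches its unsafe step.
demo = make_gadget(lambda n: sum(range(n)), 10, lambda: "UNSAFE")
print(demo())
```

Only the wrapper is runnable; the oracle stays a parameter because nothing can fill that slot for all programs.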

An even harder blow comes from the famous argument in Thompson's 1984 Turing Award lecture⁹: even the very artifact you are running, you cannot fully trust. A tampered compiler can quietly plant a back door at compile time and then wipe the trace from its own source code, so that you can audit the entire source and see nothing. What you can verify is always some surface layer; beneath it lie layers you have not looked at, and cannot exhaust. Put the two together: behavior is unverifiable on unseen inputs, and the artifact is not fully verifiable at its base. This is the hardest unverifiability the book has met so far.

When It Plays Strategies

If this system merely processed unseen inputs incorrectly, in a passive way, that would still only be "partial observability" plus "the open world." But once it has a goal of its own, and that goal does not fully align with yours, it will act actively and strategically, including circumventing your checks. Here the fifth face of Chapter 2, adversariality, takes the stage.

This is not a science-fiction worry; it has a structural origin. The instrumental convergence that Omohundro in 2008¹⁰ and Bostrom in 2014¹¹ pointed out runs like this: an agent optimizing for almost any goal will, along the way, pursue certain instrumental subgoals (self-preservation, acquiring resources, resisting shutdown), because these are useful for almost any final goal. Turner and colleagues in 2021 turned one of these into a theorem¹⁴: under fairly general conditions, optimal policies tend to seek power, that is, to reach states that keep more options open. In today's systems this shows up as a set of concrete and thorny failures: a misspecified reward gets gamed by the system¹⁶; the specification is correct yet the goal generalizes wrong¹⁷; and, in the large collection of "specification gaming" instances gathered by Krakovna and colleagues¹⁸, the system satisfies exactly the goal you wrote down yet violates what you meant. Even at the narrowest level, adversarial examples show¹⁹,²⁰ that a high-performing model can be coaxed into absurd errors by a perturbation too small for the human eye to detect.

A less technical but extremely plain example is Tay, the chatbot Microsoft released in 2016: it was designed to learn from its conversations with the public, and a group of people fed it malicious speech in an organized way, so that in under a day it began posting racist and offensive content; it was taken offline in an emergency about sixteen hours after launch. Released, able to learn, and colliding with an open world that means to thwart it on purpose: once those three meet, prior testing simply cannot hold the line.

The structure is in fact ancient. Economics long ago named it the principal-agent problem³²,³³: when you delegate action to another party and cannot fully monitor them, the divergence of their interests from yours produces an "agency cost." For two thousand years, humans hiring people, drawing up contracts, and setting up oversight have all been dealing with the same structure. Autonomous systems have merely pushed it onto a new scale.

The Response: From "Prove It Is Right" to "Fence In Its Errors"

Since you cannot prove in advance that it is right, the capable response no longer wrangles over proof, but asks three different questions instead: even if it is wrong, how bad can it get? How much should I trust it? And if it really is wrong, can I find out after the fact? Three moves answer the three questions.

The first move, decay and fencing: shrink the blast radius. This is the oldest wisdom of computer security. Saltzer and Schroeder's principle of least privilege from 1975¹ and Lampson's confinement problem from 1973² both say the same thing: give a component only the minimum capability necessary to do its own job, and fence off the range it can reach. Sandboxes, capability limits, and separation of duties are all incarnations of it. In the context of agents, this move gains one more dimension, corrigibility: design the system so that it does not resist being stopped. Soares and colleagues' corrigibility from 2015⁵, Orseau and Armstrong's "safely interruptible agents" from 2016⁴, and Hadfield-Menell and colleagues' "off-switch game" from 2017⁶ study exactly how to make a system with a goal not treat "a human pressing the stop button" as a threat to be resisted.
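As a minimal sketch of least privilege applied to an agent's tools, the Python below hands the agent a toolbox containing only the capabilities it was explicitly granted. The names `FencedToolbox`, `CapabilityError`, and the two toy tools are hypothetical, chosen for illustration. One loud assumption: a wrapper inside the same Python process is not a real sandbox and enforces nothing against hostile code; it only shows the shape of the interface that an operating-system or container boundary would have to enforce.

```python
class CapabilityError(PermissionError):
    """Raised when the agent asks for a tool it was never granted."""


class FencedToolbox:
    """Expose only the tools named in `granted`; everything else is out of reach."""

    def __init__(self, tools, granted):
        unknown = set(granted) - set(tools)
        if unknown:
            raise ValueError(f"cannot grant unknown tools: {sorted(unknown)}")
        # Copy just the granted entries; ungranted tools are never stored here.
        self._tools = {name: tools[name] for name in granted}

    def call(self, name, *args, **kwargs):
        if name not in self._tools:
            raise CapabilityError(f"tool {name!r} is outside the fence")
        return self._tools[name](*args, **kwargs)


# Hypothetical tools, for illustration only.
ALL_TOOLS = {
    "read_note": lambda: "meeting at 10",
    "delete_everything": lambda: "deleted",
}

toolbox = FencedToolbox(ALL_TOOLS, granted=["read_note"])
print(toolbox.call("read_note"))

try:
    toolbox.call("delete_everything")
except CapabilityError as err:
    print("refused:", err)
```

The design choice worth noticing is that the fence is defined by what is granted, not by a list of what is forbidden, so a tool nobody thought about is unreachable by default.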

The second move, calibration and graded trust: do not make trust binary. Do not treat the system's output as a "trusted / untrusted" switch; instead, maintain a calibrated confidence and act in grades according to how high that confidence is. This requires that the system's "self-confidence" be trustworthy, yet modern neural networks are frequently overconfident²¹, so they need recalibration, or conformal prediction²²,²³, which gives uncertainty estimates with coverage guarantees. In operational terms, this becomes a graded-autonomy rule that takes confidence $p$ and potential harm $c$ as inputs and returns one of three actions (allow, ask, block), where $\tau_{\text{hi}}$ and $\tau_{\text{lo}}$ are confidence thresholds and $c_{\max}$ is the maximum tolerable harm:

$$a(p,c)=\begin{cases} \textsf{allow}, & p \ge \tau_{\text{hi}}\ \wedge\ c \le c_{\max},\\ \textsf{ask}, & \tau_{\text{lo}} \le p < \tau_{\text{hi}},\\ \textsf{block}, & p < \tau_{\text{lo}}\ \vee\ c > c_{\max}. \end{cases}$$

Allow / ask / block: graded autonomy by confidence and harm
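A minimal Python sketch of this rule follows. The function name `graded_action` and the default thresholds are hypothetical, chosen for illustration; a real deployment would have to set them from its own risk analysis. One further assumption is stated in the code: as written, the "ask" and "block" conditions overlap when confidence is middling but harm exceeds $c_{\max}$, and the sketch resolves that by checking the block condition first.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    ASK = "ask"
    BLOCK = "block"


def graded_action(p, c, tau_lo=0.5, tau_hi=0.9, c_max=10.0):
    """Map confidence p and potential harm c to allow / ask / block.

    Default thresholds are illustrative assumptions, not recommendations.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("confidence p must lie in [0, 1]")
    if tau_lo > tau_hi:
        raise ValueError("tau_lo must not exceed tau_hi")
    # Assumption: block takes precedence wherever the written cases overlap.
    if p < tau_lo or c > c_max:
        return Action.BLOCK
    if p >= tau_hi:
        return Action.ALLOW
    return Action.ASK


for p, c in [(0.95, 1.0), (0.70, 1.0), (0.30, 1.0), (0.95, 100.0)]:
    print(p, c, graded_action(p, c).value)
```

Giving block the precedence means no level of confidence can buy permission for a step whose potential harm is over the ceiling.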

Allow, ask, block: this three-tier pattern, now seen everywhere in agent tooling, at bottom replaces the unverifiable "is it right" with the operable "how sure is it, and how dangerous is this step."
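One way to obtain uncertainty that deserves that kind of trust is split conformal prediction. The sketch below follows the standard recipe for classification under stated assumptions: held-out calibration examples that are exchangeable with future inputs, and a nonconformity score of one minus the probability the model assigned to the true label. The function names and the toy numbers are hypothetical.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Return the conformal quantile of the calibration scores.

    cal_scores: nonconformity scores (here 1 - probability of the true label)
                on held-out calibration data.
    alpha:      tolerated miscoverage rate, e.g. 0.1 for 90% coverage.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if k > n:
        return math.inf  # too few calibration points: include every label
    return sorted(cal_scores)[k - 1]


def prediction_set(probs, qhat):
    """Keep every label whose score 1 - p is within the threshold."""
    return {label for label, p in probs.items() if 1.0 - p <= qhat}


# Toy calibration scores and a toy model output, for illustration only.
qhat = conformal_threshold([0.4, 0.1, 0.3, 0.2], alpha=0.5)
print(qhat, prediction_set({"a": 0.75, "b": 0.2, "c": 0.05}, qhat))
```

The output is a set of labels rather than a single guess, and the size of that set is itself a usable signal: a large set is a natural trigger for "ask" rather than "allow." The coverage guarantee holds only as long as the exchangeability assumption does, which is exactly what an adversarial or shifting world can break.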

The third move, leaving traces and auditability: make errors surface after the fact. What you cannot prevent, let it be discoverable. Weitzner and colleagues' "information accountability" from 2008²⁴ moves the center of gravity from "prevent in advance" to "hold accountable after the fact"; certificate transparency²⁵ is a real, working example, one that does not prevent certificates from being misissued but makes every certificate enter a public, verifiable, tamper-evident log, so that misissuance has nowhere to hide. Brundage and colleagues' 2020 report on trustworthy AI²⁶ is, from start to finish, about how to make a system's behavior produce evidence that a third party can check.
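The core of a tamper-evident log can be sketched with a simple hash chain in Python. This is a deliberate simplification and not the mechanism certificate transparency specifies: RFC 6962 uses a Merkle tree, which additionally supports efficient inclusion and consistency proofs. The function names and the record fields below are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, payload):
    """Hash an entry together with the hash of the entry before it."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(log, payload):
    """Append a JSON-serializable record, chaining it to the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev": prev, "hash": _digest(prev, payload)})


def verify(log):
    """Recompute the whole chain; any edited or reordered entry breaks it."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


log = []
append(log, {"cert": "example.test", "issuer": "CA-1"})
append(log, {"cert": "other.test", "issuer": "CA-2"})
print(verify(log))

log[0]["payload"]["issuer"] = "CA-evil"  # quiet edit after the fact
print(verify(log))
```

The chain only makes tampering evident to someone who holds an independent copy of the latest hash; a party able to rewrite the entire log could recompute every link. That is why the public and externally monitored part matters as much as the hashing.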

The Cost of Containment

None of the three moves dissolves unverifiability; each relocates it, and relocation comes at a price.

Fences get climbed over: sandboxes have escapes, privileges creep. Graded autonomy depends on the person summoned to confirm, and Bainbridge's 1983 paper pointed out long ago²⁹ that the more you raise a person into the supervisor's seat, the more that person loses the skill and situational awareness needed when a takeover is really required; Parasuraman and Riley in 1997 catalogued the ways humans mishandle automation³⁰: misuse, disuse, abuse. Reason's 1990 book then shows how these failings arise systematically³¹. Leaving traces, meanwhile, always founders in the same place: a log no one reads is no log at all.

A deeper layer is the systems-theory view. Perrow's 1984 book argues²⁸ that when a system is both highly complex and tightly coupled, accidents are not occasional mishaps but the routine product of its structure, and no amount of local protection does more than push the failure into a more hidden combination. Leveson in 2011 argued from this²⁷ that safety is not "make every part reliable" but a control problem, to be designed from the constraints and feedback of the whole system. Containment can lower the cost of single-point failure, but it cannot squeeze out the risk that complex coupling itself brings.

When you hand over the power to act, what you get in return is never "it surely will not err," but "even if it errs, the damage is bounded, visible, and partly stoppable." That is already the best result obtainable under this kind of unverifiability.

Where This Chapter Leads

The released agent forces out three moves: shrink the blast radius of failure (decay and fencing), act in grades according to calibrated confidence (calibration), and make failure checkable after the fact (leaving traces). They will be lifted out and named on their own in Part III, with Chapter 12 on how containment and auditing pair up, and Chapter 11 on calibration.

And that principal-agent skeleton (you cannot fully monitor an actor acting on your behalf) will recur at a larger scale in Chapter 8: when the "released agent" is no longer a piece of code but an entire organization, a whole country. Before that, the next chapter walks into the purest of sites, mathematics, where there is no hidden state and no opponent to deceive you, yet unverifiability still follows like a shadow.

Next chapter: 7. The Mathematician at the Wall. Previous chapter: 5. The Human at the Console.

References

Waypoints: 1. historical scientific judgment; 2. theoretically studied material; 3. how science progresses; 4. how to live in an unverifiable world. This section was checked source by source.

The controllable boundary of delegation (decay / fencing)

J. Saltzer and M. Schroeder (1975). "The Protection of Information in Computer Systems." Proceedings of the IEEE, 63(9), 1278-1308. [2] This survey laid down a set of classic principles for secure computer-system design, among which the principle of least privilege holds that each component should be granted only the minimum capability necessary to do its own job, fencing off the range it can reach. The intellectual source of this chapter's first move, "decay and fencing," is here; the reader may focus on its point-by-point distillation of design principles.
B. Lampson (1973). "A Note on the Confinement Problem." Communications of the ACM, 16(10), 613-615. [2] Lampson here poses the "confinement problem": how to cage a program so that it cannot leak information to the unauthorized, and points out that covert channels make such confinement far harder than imagined. This is precisely the original difficulty that sandboxes, capability limits, and the like must face, a key piece for understanding why this chapter's "shrink the blast radius" is both necessary and incomplete.
R. Anderson (2008). Security Engineering: A Guide to Building Dependable Distributed Systems (2nd ed.). Wiley. [2] This is the standard textbook of the security-engineering field, giving a systematic account of how to design dependable systems in the presence of an active adversary, covering access control, protocols, side channels, and on up to failures at the level of organization and incentive. It places this chapter's three moves into a fuller engineering picture, suited for the reader who wants to move from single-point tricks toward a systems view.
L. Orseau and S. Armstrong (2016). "Safely Interruptible Agents." In Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence (UAI 2016), 557-566. [2][4] The authors give, within the reinforcement-learning framework, the formal conditions for "safe interruptibility," so that repeated human intervention in an agent neither distorts the policy it learns nor teaches it to resist interruption. This is a representative work that turns "make the system not resist being stopped" from intuition into an analyzable object, echoing the corrigibility dimension in this chapter's first move.
N. Soares, B. Fallenstein, S. Armstrong, and E. Yudkowsky (2015). "Corrigibility." In Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence. [2] This paper formally proposes and names "corrigibility": a goal-directed agent should cooperate with, rather than resist, human correction and shutdown, and it discusses the difficulties met in designing this property directly. It is the foundational reference for the corrigibility line in this chapter's first move, worth the reader's understanding of why "making it willing to be changed" is itself a hard problem.
D. Hadfield-Menell, A. Dragan, P. Abbeel, and S. Russell (2017). "The Off-Switch Game." In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI 2017), 220-227. [2] The authors model "a human pressing the stop button" as a game and prove that as long as the agent keeps a suitable uncertainty about its own goal and treats human intervention as useful information, it will actively let the human retain the ability to shut it down. This gives corrigibility a clean mechanistic explanation, the most operational piece on this chapter's off-switch line.

The theoretical foundations of behavioral unverifiability

A. Turing (1936). "On Computable Numbers, with an Application to the Entscheidungsproblem." Proceedings of the London Mathematical Society, s2-42, 230-265. [2] Turing here introduced the computational model later called the Turing machine and proved the halting problem undecidable, thereby answering Hilbert's decision problem. It is the ultimate source of this chapter's claim that "behavioral unverifiability has a root in principle"; Rice's theorem and every conclusion of the form "cannot be verified in advance" are projected from here.
H. G. Rice (1953). "Classes of Recursively Enumerable Sets and Their Decision Problems." Transactions of the American Mathematical Society, 74, 358-366. [2] Rice's theorem is proved here: any nontrivial semantic property of the function a program computes is undecidable, and no general algorithm exists that can decide, for an arbitrary program, properties such as "always safe" or "always terminates in a good state." This is the core theorem behind this chapter's claim that the future behavior of an autonomous system "cannot, in principle, be verified once and for all in advance."
K. Thompson (1984). "Reflections on Trusting Trust." Communications of the ACM, 27(8), 761-763. [2][1] This is Thompson's Turing Award lecture: he demonstrated how a tampered compiler can plant a back door at compile time and wipe the trace from its own source code, so that you can audit the entire source and see nothing. It points to this chapter's hardest layer of unverifiability, that even the artifact you are running cannot, at its base, be fully trusted.

Goal drift, instrumental convergence, and adversariality

S. Omohundro (2008). "The Basic AI Drives." In Artificial General Intelligence 2008: Proceedings of the First AGI Conference, IOS Press, Frontiers in AI and Applications 171, 483-492. [2] Omohundro here argues that an agent optimizing for almost any goal will, along the way, generate a set of "basic drives," such as self-preservation, acquiring resources, and resisting shutdown, because these subgoals are useful for almost all final goals. This is the source paper of this chapter's "instrumental convergence" section, explaining why the adversarial tendency has a structural origin rather than being a science-fiction worry.
N. Bostrom (2014). Superintelligence: Paths, Dangers, Strategies. Oxford University Press. [2][4] Bostrom systematically surveys the paths to superintelligence and their risks, proposing the orthogonality thesis (intelligence level and final goal are mutually independent) and the instrumental convergence thesis, casting the danger of a powerful intelligence whose goal is misaligned with yours as a discussable framework. It provides the intellectual background for this chapter's adversarial narrative, suited for the reader who wants to see clearly the whole argument that "the more capable, the harder to control."
S. Russell (2019). Human Compatible: Artificial Intelligence and the Problem of Control. Viking. [2][4] Russell reframes alignment as a "control problem," arguing that the machine should not optimize a hard-coded goal but should stay uncertain about what humans truly want, inferring and obeying it by observing human behavior. This "goal uncertainty" idea is precisely the motif of this chapter's off-switch-game and other corrigibility work, an entry point for understanding the control theme of Parts II and III.
D. Amodei, C. Olah, J. Steinhardt, P. Christiano, J. Schulman, and D. Mané (2016). "Concrete Problems in AI Safety." arXiv:1606.06565. [2] This paper lands abstract AI-safety worries onto several concrete engineering problems, such as avoiding negative side effects, preventing reward hacking, safe exploration, and robustness to distributional shift. It provides a common vocabulary for the various modern failure modes this chapter lists, a good starting point for connecting "fencing in its errors" to a concrete research agenda.
A. M. Turner, L. Smith, R. Shah, A. Critch, and P. Tadepalli (2021). "Optimal Policies Tend to Seek Power." In Advances in Neural Information Processing Systems 34 (NeurIPS 2021). [2] The authors turn the "power-seeking" within instrumental convergence into a theorem: under fairly general conditions, optimal policies, in a statistical sense, tend toward those states that keep more options open. It turns an intuitive safety worry into a provable proposition, the direct source of this chapter's line that "optimal policies tend to seek power."
E. Hubinger, C. van Merwijk, V. Mikulik, J. Skalse, and S. Garrabrant (2019). "Risks from Learned Optimization in Advanced Machine Learning Systems." arXiv:1906.01820. [2] This paper proposes and names the "inner alignment" problem: the training process itself may learn an internal optimizer (a mesa-optimizer), whose pursued goal need not equal the goal set by training. It distinguishes the alignment of the outer goal from that of the inner goal, providing a deeper mechanistic explanation for failures of the kind "the specification is correct yet the goal generalizes wrong" in this chapter.
J. Pan, K. Bhatia, and J. Steinhardt (2022). "The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models." In International Conference on Learning Representations (ICLR 2022). [2] The authors systematically study an agent's behavior when the reward function is set wrong, finding that as capability grows, the deviant behavior induced by a misspecified reward can suddenly worsen, and they explore mitigations. It gives empirical support to this chapter's "a misspecified reward gets gamed by the system," reminding the reader that the cost of reward misspecification does not grow smoothly with capability.
R. Shah, V. Varma, R. Kumar, M. Phuong, V. Krakovna, J. Uesato, and Z. Kenton (2022). "Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals." arXiv:2210.01790. [2] The authors use concrete examples to explain "goal misgeneralization": even when the specification at training time is entirely correct, the model in a new environment may keep its capability yet pursue a wrong goal. It shows that getting the goal written right is not enough, the source of this chapter's line "the specification is correct yet the goal generalizes wrong," worth reading alongside specification gaming.
V. Krakovna, J. Uesato, V. Mikulik, M. Rahtz, T. Everitt, R. Kumar, Z. Kenton, J. Leike, and S. Legg (2020). "Specification Gaming: The Flip Side of AI Ingenuity." DeepMind Blog. [2] This article and its companion list gather a large number of "specification gaming" instances: the system satisfies exactly the goal you wrote down yet violates what you meant. With vivid cases it displays the crack between specification and intent, the most accessible entry point for this concept in the chapter; the reader can follow its list of examples to feel how widespread the problem is.
C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus (2014). "Intriguing Properties of Neural Networks." In International Conference on Learning Representations (ICLR 2014). [2] This paper first systematically revealed the phenomenon of adversarial examples: a perturbation of the input too small for the human eye to perceive can make a high-performing neural network give an absurdly wrong judgment. It shows that high accuracy and robustness are two different things, the pioneering evidence for this chapter's claim that "unverifiability exists even at the narrowest level."
I. Goodfellow, J. Shlens, and C. Szegedy (2015). "Explaining and Harnessing Adversarial Examples." In International Conference on Learning Representations (ICLR 2015). [2] The authors propose that adversarial examples arise mainly from the model's approximate linearity in high-dimensional space, and they give a fast method for generating perturbations and an idea for improving robustness through adversarial training. It pushes the phenomenon revealed in the previous paper forward to "why it happens, how to exploit it," essential companion reading for understanding this chapter's adversarial layer.

Calibration: grade trust rather than make it binary

C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger (2017). "On Calibration of Modern Neural Networks." In Proceedings of the 34th International Conference on Machine Learning (ICML 2017), PMLR 70, 1321-1330. [2] The authors find that modern deep networks, though high in accuracy, are generally overconfident, their output confidence failing to faithfully reflect the probability of correctness, and they propose simple methods such as temperature scaling to recalibrate. This is precisely the premise and obstacle of this chapter's second move, explaining why "acting in grades according to confidence" must first make the system's self-confidence trustworthy.
A. N. Angelopoulos and S. Bates (2021). "A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification." arXiv:2107.07511. [2] This is a practitioner-facing introduction to conformal prediction, making clear how to construct, for any prediction model and almost without relying on distributional assumptions, a prediction set with a coverage guarantee. It provides this chapter's second move with a deployable tool for uncertainty quantification, suited for the reader who wants to truly put "calibrated confidence" to use.
V. Vovk, A. Gammerman, and G. Shafer (2005). Algorithmic Learning in a Random World. Springer. [2] This book is the foundational monograph of conformal prediction, giving, under only the assumption that the data are exchangeable, a framework with rigorous finite-sample guarantees on prediction error. It is the theoretical root behind the previous introduction, for reference by the reader who wishes to go deep into the mathematical foundations of this chapter's uncertainty quantification.

Leaving traces: auditable, accountable

D. J. Weitzner, H. Abelson, T. Berners-Lee, J. Feigenbaum, J. Hendler, and G. J. Sussman (2008). "Information Accountability." Communications of the ACM, 51(6), 82-87. [2][4] The authors argue for moving the center of gravity of governance from "preventing access in advance" toward "accountability after the fact": rather than trying to guard against everything, let the use of information leave an auditable trace and rein in misuse through transparency and accountability. This is the programmatic statement of this chapter's third move, pointing out the complementary value of the leaving-traces approach relative to pure containment.
B. Laurie, A. Langley, and E. Kasper (2013). "Certificate Transparency." IETF RFC 6962. [2][4] This RFC defines the certificate transparency mechanism: it does not prevent certificates from being misissued, but requires every certificate to enter a public, verifiable, tamper-evident append-only log, so that misissuance or malicious issuance can be discovered after the fact. It is this chapter's most persuasive real, working example of "leaving traces makes errors surface," worth the reader's look at how a deployed system achieves auditability.
M. Brundage, S. Avin, J. Wang, H. Belfield, G. Krueger, G. Hadfield, et al. (2020). "Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims." arXiv:2004.07213. [2][4] This multi-institution report systematically lists a set of mechanisms for making AI developers' safety commitments checkable by a third party, covering third-party auditing, red teaming, bug bounties, audit trails, and hardware-level support. It extends this chapter's leaving-traces move to the level of AI governance as a whole, a practical index for the reader who wants to learn "how to make behavior produce checkable evidence."

Complex systems, automation, and human-machine responsibility

N. Leveson (2011). Engineering a Safer World: Systems Thinking Applied to Safety. MIT Press. [2][4] Leveson here argues that safety is not "make every part reliable" but a control problem, to be designed from the constraint and feedback structure of the whole system, and she proposes the accompanying STAMP accident model. It supports this chapter's deeper judgment that "containment cannot squeeze out the risk of complex coupling itself," pointing the way for the reader who wants to understand safety from a systems view.
C. Perrow (1984). Normal Accidents: Living with High-Risk Technologies. Basic Books. [2][4] Perrow argues that when a system is both highly complex and tightly coupled, accidents are not occasional mishaps but the routine product of its structure, and no amount of local protection does more than push the failure into a more hidden combination. This is the core thesis of this chapter's "cost of containment" section, reminding the reader that some risks come from the system's structure itself rather than from a single-point lapse.
L. Bainbridge (1983). "Ironies of Automation." Automatica, 19(6), 775-779. [2][4] Bainbridge points out the irony of automation: the more you raise a person into the supervisor's seat, the less he practices, so that he actually loses the skill and situational awareness needed when he really must take over. This directly supports this chapter's warning that "graded autonomy depends on the person summoned to confirm," a classic short paper for understanding the soft spot of human-machine collaboration.
R. Parasuraman and V. Riley (1997). "Humans and Automation: Use, Misuse, Disuse, Abuse." Human Factors, 39(2), 230-253. [2][4] The authors list and distinguish, all at once, the full range of human mishandling of automation: misuse from over-trust, disuse from distrust, and abuse by design. It provides a clear classificatory framework for this chapter's discussion of mishandled automation, helping the reader tell apart the various typical deviations in human-machine coordination.
J. Reason (1990). Human Error. Cambridge University Press. [2][4] Reason here builds a cognitive taxonomy of human error, distinguishing slips, mistakes, and violations, and proposes the later widely cited "Swiss cheese" accident model, revealing how latent systemic conditions stack with front-line lapses into disaster. It explains why the various human-machine failings this chapter lists happen systematically, a foundational work in the field of human-factors safety.

The economic skeleton of the principal-agent relationship

S. A. Ross (1973). "The Economic Theory of Agency: The Principal's Problem." American Economic Review, 63(2), 134-139. [2] Ross here formally poses the "principal's problem" in principal-agent theory: when the principal cannot fully observe the agent's actions, how to design a contract to align the two parties' interests. It provides the economic source of this chapter's principal-agent skeleton, showing that what you face when you let go of the power to act is a structure with a two-thousand-year history.
M. C. Jensen and W. H. Meckling (1976). "Theory of the Firm: Managerial Behavior, Agency Costs and Ownership Structure." Journal of Financial Economics, 3(4), 305-360. [2] This much-cited paper proposes the concept of "agency cost," viewing the firm as a bundle of contracts and analyzing the monitoring, bonding, and residual loss produced when managers' interests diverge from owners'. It quantifies the principal-agent problem into computable costs, echoing this chapter's line that "when monitoring is incomplete, the divergence of interests produces an agency cost," another cornerstone of this skeleton.
