Recently I have been thinking about using AI to develop AI itself. I built a Kanban-style development workflow. The rough idea looks like this:
First there is an Ideation Agent. It scans the entire repository and performs deep research. After the scan, it actively proposes work items such as bug fixes, technical-debt cleanups, and potential product features.
Once these tasks are created, they enter the backlog. I implemented an automatic start mechanism: the system selects the highest-priority task from the backlog and begins implementing it. The implementation queue has a capacity limit, for example allowing at most three or four tasks to be implemented at the same time.
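The backlog and auto-start mechanism can be modeled as a priority queue drained into a capacity-limited in-progress set. A minimal sketch in Python (class and field names are my own assumptions, not the actual implementation):

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Task:
    priority: int                      # lower number = higher priority
    name: str = field(compare=False)   # not used for heap ordering

class KanbanBoard:
    """Backlog plus a capacity-limited implementation queue."""

    def __init__(self, capacity: int = 3):
        self.capacity = capacity
        self.backlog: list[Task] = []     # min-heap keyed on priority
        self.in_progress: list[Task] = []

    def add(self, task: Task) -> None:
        heapq.heappush(self.backlog, task)

    def auto_start(self) -> list[str]:
        """Pull the highest-priority tasks from the backlog until
        the implementation queue is at capacity."""
        while self.backlog and len(self.in_progress) < self.capacity:
            self.in_progress.append(heapq.heappop(self.backlog))
        return [t.name for t in self.in_progress]
```

With a capacity of 3 and four queued tasks, `auto_start` pulls the three highest-priority ones and leaves the rest in the backlog.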
During implementation, tasks move into a waiting state depending on whether automatic testing or automatic commits are enabled:
- Automatic testing: the system automatically runs tests.
- Automatic commit: commits follow several rules, such as tests must pass, code quality must be acceptable, and the code must have no conflicts with the current repository.
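The auto-commit rules amount to a conjunction of gate conditions. A hedged sketch (the predicate names and the numeric quality threshold are illustrative assumptions, not the actual implementation):

```python
def may_auto_commit(tests_passed: bool,
                    quality_score: float,
                    has_conflicts: bool,
                    quality_threshold: float = 0.8) -> bool:
    """Auto-commit gate: all three rules must hold at once.
    The quality threshold value is an illustrative assumption."""
    return (tests_passed
            and quality_score >= quality_threshold
            and not has_conflicts)
```

Any single failing condition blocks the commit, which keeps the gate conservative by default.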
By building this project, I essentially simulated and automated the software engineering workflow. In theory, the entire process can run continuously without human involvement.
But I keep wondering: if no human is needed, then what is the human supposed to do? If this system keeps developing itself indefinitely, what will happen? I have a few thoughts:
- One possibility is that the product eventually becomes extremely powerful and implements all the features that competing products on the market have.
- Another outcome, which I think is more likely, is that the product eventually collapses. By collapse I mean it can no longer produce anything usable. The codebase becomes filled with bugs that even the AI itself cannot identify, and when users try it, none of the features actually work.
I have been thinking about the deeper reason behind this. I believe it is fundamentally a structural issue of a dynamical system. A system that generates ideas and then executes them is essentially a carefully designed organizational structure. That structure behaves like a dynamical system that continuously cycles: generating ideas, implementing them, and testing them.
The ideas themselves are extremely important, because they determine whether this dynamical system can maintain a stable state: in other words, whether its "mind" can sustain the system's continuous operation. This resembles certain hypotheses in philosophy and theories of time: some people believe that time is like a flowing current that moves in one direction; at smaller scales it may drift left or right, but the overall direction remains unchanged.
The health of product development and engineering pipelines works in a similar way. When I initially wrote the software, I designed a structure and believed that the structure itself was good. During development, humans constantly make corrections and adjustments, for example discovering that some parts are unreasonable and tuning them toward a better state. But if the system runs entirely by AI, it may drift off course. Without any human supervision or high-level regulation, the whole system may eventually collapse.
These are my current thoughts.
The following content is generated by LLMs and may contain inaccuracies.
The Dynamics of AI Self-Development System Collapse
Introduction
The collapse of artificial intelligence self-development systems represents a critical juncture in AI research and deployment. This phenomenon occurs when self-improving AI systems encounter fundamental constraints that lead to performance degradation, stability loss, or complete failure.
Key Mechanisms
1. Recursive Self-Improvement Limits
- Optimization plateaus: Systems reach diminishing returns in self-enhancement
- Capability ceiling: Architectural constraints prevent further advancement
- Feedback loop deterioration: Self-generated improvements become counterproductive
2. Resource Constraints
- Computational limitations restrict optimization scope
- Memory bottlenecks impede learning capacity
- Energy requirements become prohibitive
3. Structural Instabilities
- Alignment degradation: Self-modifications diverge from original objectives
- Emergent conflicts: Internal goal systems develop contradictions
- Cascading failures: Component failures trigger system-wide breakdown
Collapse Dynamics
The typical progression follows these stages:
1. Early acceleration: rapid self-improvement with clear benefits
2. Deceleration phase: diminishing returns accumulate
3. Instability emergence: system behaviors become unpredictable
4. Critical transition: tipping point toward failure
5. Collapse event: rapid system degradation
Prevention Strategies
- Robust safety mechanisms embedded in self-modification protocols
- Regular external validation checkpoints
- Conservative modification thresholds
- Redundant constraint systems
- Continuous human oversight
Implications
Understanding these dynamics is essential for safe AI development and deployment of increasingly autonomous systems.
Dynamical Collapse of an AI Self-Developing System
Context
You are describing a core dilemma in autonomous software engineering: when an AI system is granted complete closed-loop capability spanning ideation, implementation, and testing, will a dynamical system that lacks external calibration inevitably deteriorate or collapse? This touches on several frontier questions across disciplines:
- Autonomy Boundaries in AI-Assisted Development: From GitHub Copilot’s code completion to systems like Devin and GPT Engineer attempting end-to-end task execution, the industry is exploring whether AI can assume higher-level decision-making (such as requirement prioritization and architectural choices).
- Stability of Dynamical Systems: Your analogy precisely captures the essence of the engineering process as a feedback loop: Ideation → Implementation → Testing → Ideation (next cycle). In cybernetics, the stability of such systems depends on negative feedback mechanisms (error correction) and the clarity of the objective function.
- Goal Alignment and Value Drift: The "goal misalignment" problem from AI safety research becomes concrete in this context: do the metrics the AI optimizes ("code quality," "feature completeness") truly align with humans' implicit expectations of a product's "usability" and "value"?
Your “collapse hypothesis” essentially asks: Without humans as external observers and value anchors, can AI maintain the “meaning” dimension of engineering systems, or merely the formal circulation?
Key Insights
1. Practice and Limits of Autonomous Development Systems
Several projects have already attempted to construct similar processes:
- AutoGPT and BabyAGI demonstrated the possibility of task decomposition and autonomous execution, but in practice they often fail due to goal drift (e.g., infinitely recursive subtasking) or the lack of effective termination conditions.
- Tools like SWE-agent and Devin focus on fixing bugs or implementing GitHub issues in real codebases, yet still rely on human-defined clear task boundaries and acceptance criteria.
- Research shows that even with tool-use capabilities (terminals, editors) and test suites, LLM success rates on complex multi-step engineering tasks remain limited, partly due to difficulty in intermediate state verification and combinatorial explosion of search spaces.
Your kanban system introduces resource constraints through priority ranking and concurrency limits, which is essentially an implicit "attention mechanism." But the critical question is: by whom, and how, are priorities defined? If the Ideation Agent generates tasks based on local information (such as code complexity or test coverage), it may be unable to make global-level judgments like "this feature is fundamentally unimportant to users."
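The gap between local signals and global value can be made concrete with a hypothetical scoring function (all names, weights, and the normalization to [0, 1] are my own assumptions):

```python
def task_priority(cyclomatic_complexity: float,
                  untested_fraction: float,
                  user_value=None) -> float:
    """Hypothetical Ideation Agent priority score built purely from
    local repository signals, both normalized to [0, 1]. The one
    signal that would make the score meaningful -- real user value --
    is unavailable inside the closed loop, so it defaults to None
    and is ignored."""
    local_score = 0.6 * cyclomatic_complexity + 0.4 * untested_fraction
    if user_value is None:       # the usual case in a fully closed loop
        return local_score
    return 0.5 * local_score + 0.5 * user_value
```

A task on dead code can score maximally on local signals while being worthless to users; nothing inside the loop can tell the difference.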
2. Mechanisms of Dynamical Collapse
Your collapse hypothesis can be analyzed through control theory and complex systems theory:
Positive Feedback Runaway
If the Ideation Agent's "scanning" process misidentifies previously introduced technical debt or bugs as new tasks, the system enters self-amplifying chaos: each fix introduces new problems, and the new problems are scanned in as fresh tasks. Similar to mode collapse in neural network training, the system may converge to a pathological attractor: for example, every task becomes "fix test failures" while the test design itself is already outdated.
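This runaway dynamic can be illustrated with a one-parameter toy model (a sketch under deliberately simplified assumptions, not a claim about real defect rates): if each fix introduces more than one new defect on average, the bug backlog grows without bound.

```python
def simulate_backlog(initial_bugs: float,
                     fixes_per_cycle: float,
                     new_bugs_per_fix: float,
                     cycles: int) -> list[float]:
    """Toy model of the fix -> new-bug loop. Each cycle the system
    fixes up to `fixes_per_cycle` bugs, and each fix introduces
    `new_bugs_per_fix` fresh defects on average. A ratio above 1.0
    is the self-amplifying (runaway) regime."""
    bugs = float(initial_bugs)
    history = [bugs]
    for _ in range(cycles):
        fixed = min(bugs, fixes_per_cycle)
        bugs = bugs - fixed + fixed * new_bugs_per_fix
        history.append(bugs)
    return history
```

Starting from 10 bugs with capacity to fix all of them, a ratio of 0.5 drives the backlog toward zero, while a ratio of 1.5 grows it every cycle.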
Missing Negative Feedback Anchors
In human-led development, "usability" is calibrated through multiple implicit mechanisms:
- User Feedback: Real usage scenarios expose design flaws.
- Code Review: Human reviewers question “why do this?”
- Product Vision: Long-term goals constrain short-term decisions.
Your system uses "automated tests pass" as its quality gate, but this validates only consistency (whether the code conforms to existing tests), not relevance (whether the feature is meaningful). This resembles a variant of Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure."
Entropy Increase and Semantic Drift
Even without bugs, the codebase may become increasingly incomprehensible through "optimization":
- AI might generate highly abstract but opaque code structures (to reduce duplication).
- Naming and comments may gradually deviate from human readability (since AI doesn’t require “easy to read”).
- Architectural decisions might optimize locally rather than achieve global consistency.
This resembles entropy increase in information theory: without external energy input (human refactoring and simplification), system complexity grows unidirectionally.
3. Related Theory and Empirical Evidence
Insights from AI Alignment Research
Stuart Russell argues in Human Compatible: Artificial Intelligence and the Problem of Control that the core challenge for autonomous systems is value learning under uncertainty: an AI must recognize that its understanding of human goals is incomplete and actively seek calibration. Your system lacks this "humility mechanism": it assumes its own judgments (such as "this is high priority") are correct.
Sociotechnical Nature of Software Engineering
Conway's Law states that a system's architecture reflects the structure of the organization that builds it. An AI self-development system's "organization" is a single agent or a few collaborating agents, a structure that may be unable to generate the diversity and critical thinking that human teams rely on to avoid collective blind spots. Recent research on "AI teams" (such as multi-agent debate and adversarial red-blue teaming) attempts to introduce this diversity, but its effectiveness remains to be verified.
Historical Analogy: Autonomous Trading Systems
Algorithmic trading in finance has faced similar issues: multiple autonomous trading systems, interacting without coordination, can trigger "flash crashes" (such as the 2010 Flash Crash). Regulators later introduced circuit-breaker mechanisms: forced pauses under anomalous conditions. This suggests that autonomous systems need built-in anomaly detection and rollback capabilities.
4. Possible Mitigation Strategies
Though you currently focus on problem analysis, these approaches may warrant exploration:
Minimal Human-in-the-Loop
- Introduce milestone reviews: after completing N tasks, pause and generate summaries for human review.
- Anomaly flagging: when detecting signals like test pass rate drops or high task rework rates, proactively request human intervention.
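The two triggers above can be sketched as a simple anomaly predicate (both thresholds are illustrative assumptions, not recommendations):

```python
def needs_human_review(pass_rate_history: list[float],
                       rework_rate: float,
                       pass_drop_threshold: float = 0.15,
                       rework_threshold: float = 0.30) -> bool:
    """Flag the pipeline for human intervention when the test pass
    rate drops sharply between cycles, or when too large a share of
    tasks is being reworked. Threshold values are assumptions."""
    if len(pass_rate_history) >= 2:
        drop = pass_rate_history[-2] - pass_rate_history[-1]
        if drop > pass_drop_threshold:
            return True
    return rework_rate > rework_threshold
```

Either signal alone is enough to pause the loop, which errs on the side of asking for human attention too often rather than too rarely.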
Multi-Agent Adversarial Verification
- Let an independent "critic agent" challenge the Ideation Agent's decisions ("Is this feature truly necessary?").
- Simulate a “user agent” testing from end-user perspectives rather than relying solely on unit tests.
Meta-Learning and Self-Reflection
- Let the system periodically analyze its development history: “Among past proposed tasks, how many were eventually rolled back or abandoned?” High abandonment rates may signal declining decision quality.
- Introduce architectural invariants: humans pre-define core design principles (e.g., “API backward compatibility”) as hard constraints.
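The history-analysis idea above can be sketched as a second-order metric over task outcomes (the outcome labels and the threshold are assumptions):

```python
from collections import Counter

def reflection_report(task_outcomes: list[str],
                      abandon_threshold: float = 0.4) -> dict:
    """Second-order health metric: the share of past tasks that were
    rolled back or abandoned. A high share suggests the Ideation
    Agent's decision quality is degrading. Labels and threshold
    are illustrative assumptions."""
    counts = Counter(task_outcomes)
    total = len(task_outcomes)
    abandoned = counts["rolled_back"] + counts["abandoned"]
    rate = abandoned / total if total else 0.0
    return {"abandon_rate": rate, "degrading": rate > abandon_threshold}
```

The metric deliberately ignores what the tasks were and looks only at what happened to them, which is what makes it second-order.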
5. Philosophical Dimension: Time, Meaning, and Entropy
Your analogy of time as a flowing current, applied to the directionality of dynamical systems, echoes the second law of thermodynamics and Prigogine's theory of dissipative structures: open systems (like life and organizations) maintain order through continuous input of energy and information; otherwise they trend toward thermodynamic equilibrium (i.e., "death").
Software systems' “aliveness” depends on external meaning injection:
- User requirements provide objective functions.
- Code review and refactoring resist entropy increase.
- Technical community and best practices provide reference frames.
The AI self-development system’s dilemma lies in attempting to become a self-referential closed system—but Gödel’s incompleteness theorem tells us that any sufficiently complex formal system cannot prove its own consistency internally. By analogy, software systems may be unable to judge their own “goodness” without external value frameworks.
This also relates to Semiosis (meaning-generation in semiotics): code’s “meaning” lies not only in formal structure but in its relationship to human usage contexts. AI can manipulate symbols (code) but lacks pragmatic understanding of “why this code matters.”
Open Questions
- Can we design a "meta-optimizer" to monitor the Ideation Agent's health? For instance, by tracking second-order metrics like "whether task completion actually reduces technical debt" and "whether new features see actual use," we might judge whether the system is in illusory prosperity (formally operating, substantively collapsing). If this meta-optimizer is also an AI, does it face the same recursive problem?
- Under what conditions can an AI self-development system shift from "divergence" to "convergence"? Is there an architecture or constraint that enables the system to simplify itself, delete redundancy, and converge to a stable, useful core? This may require introducing "forgetting" or "refactoring" as first-class operations, rather than perpetual incremental addition.
I think your reflections here are quite deep; they touch on a fundamental paradox of automated systems. Let me try to respond from a few angles:
On the deeper mechanism of "collapse"
The "collapse" you describe is, I believe, essentially a problem of feedback-loop degradation:
- The Ideation Agent's "imagination" has a ceiling. The features and bug fixes it can come up with are ultimately combinations and mutations of what is already in the codebase. But real product evolution requires strategic choices: which features genuinely matter to users, which are merely surface symptoms of technical debt, and where the root problems lie.
- There is no external feedback loop. In human-led development, external signals such as user feedback, market competition, and business goals continuously correct the course. If the system only iterates on itself, it falls into a kind of internal churn: fixing bug A may introduce bug B; adding feature X may degrade the usability of core functionality. Like endlessly reflecting on yourself in a mirror, you see more and more detail while losing the overall outline.
On the "dynamical system" observation
Your thermodynamics and flow-of-time analogy is apt, but I would add one point:
What truly keeps a system running stably is not how perfect the structure itself is, but the diversity of its information flows. Your system is currently a closed loop: idea → implementation → testing → (new) idea. What this loop is missing:
- An externally defined objective function: what problem the product ultimately solves and which users it serves cannot be decided by the AI itself.
- Dynamic adjustment of constraints: what counts as "good enough"? When should the system stop? These boundary conditions must be set from outside.
- An external reference for value judgments: not all code that passes the tests is equally valuable. An optimization that makes the program 1% faster but 50% less readable is something the AI might do, but it is not necessarily good.
Your last sentence is the key point
"The whole system will eventually collapse."
This implies an important observation: a complex system needs a subject that carries its macro narrative. That subject must:
- Periodically review the overall direction (rather than sinking into local optimization)
- Make some "non-rational" choices (for example, "this is simply how we want it," even when it is not the most efficient option)
- Bear responsibility and risk (which gives the system something like a value system)
Without human participation, the system loses this macro subject and eventually becomes a machine that can only apply local patches.
A possible direction
If you really want this system to run over the long term, you may need to introduce simulated "external signals":
- Instead of automating every decision, have the AI generate multiple options and let a human (or an evaluation mechanism acting on behalf of humans) make the key decisions.
- Periodic "strategy reviews": every N iteration cycles, re-evaluate the overall health of the product.
- Introduce a "constrained optimization" framework rather than pure "objective optimization": ask not only "what is optimal?" but also "what is the bottom line?"
Your idea is forward-looking. Given current AI capabilities, I tend to agree with your intuition: a fully autonomous system will decay, while a hybrid system (AI plus human decisions at key points) is probably what is sustainable.