Changkun's Blog
Language-Centric AI While Human Cognition Shifts Toward Visual-Spatial Thinking

Published at: 2026-02-16

From a Sapir-Whorf perspective, one could argue that LLMs excel because they simulate the linear structure of language and, by extension, the structure of reasoning itself. This aligns nicely with a Wittgenstein-style view in which thought is fundamentally language-bound, or at least becomes intelligible only through language.

For a long time, I almost fully believed this framing.

That confidence began to erode when I started paying closer attention to Generation Z, who are growing up fully immersed in modern digital environments. Several patterns appear consistently: 1) less reliance on linear, language-centric reasoning; 2) stronger dependence on visual representations; 3) communication patterns that are compositional and spatial rather than sequential.

This feels like a fundamental shift in cognitive structure, where thinking seems less anchored in linear linguistic narratives and more scaffolded by external systems that manage sequencing, memory, and coherence on the user’s behalf. In other words, modern software increasingly carries the burden of maintaining linear structure.

It’s well known that LLMs and adjacent technologies have begun to offload key cognitive processes, and research (e.g., https://arxiv.org/abs/2506.08872) has raised concerns that younger generations fail to develop certain critical-thinking skills our generation considered crucial. I think this offloading enables rapid context switching across tasks, ideas, and modalities. The effect aligns closely with findings from our prior work on short-form video consumption (https://arxiv.org/abs/2302.03714), where fragmented attention patterns reshape how intentions are formed, sustained, and abandoned. For adults, especially those trained in long-horizon, language-heavy problem solving, this dynamic may lead to a paradoxical outcome: while LLMs dramatically increase efficiency, they also make it harder to retain stable intentions, increasing the risk of cognitive overload rather than reducing it.

What is less clear, and more interesting, is whether the same overload applies to younger generations. I tend to believe our education system needs substantial adaptation and redesign (https://doi.org/10.3389/feduc.2025.1504726) to help younger people’s cognitive systems adapt more naturally to this environment, moving away from linear, language-dominated intelligence toward forms that are more visual, spatial, and externally coordinated (https://www.emerald.com/oth/article-abstract/9/6/1/318200/Digital-Natives-Digital-Immigrants-Part-2-Do-They). If that shift occurs, we may be witnessing a divergence: LLMs doubling down on linear linguistic reasoning, while human cognition gradually moves elsewhere. If that divergence holds, the long-term question is no longer whether LLMs “think like humans,” but whether humans will continue to think in the way language-centric AI systems are optimized to emulate.

The following content is generated by LLMs and may contain inaccuracies.

Context

This idea sits at the intersection of cognitive science, AI alignment, and generational media studies. It challenges the dominant framing that positions LLMs as successful because they mirror human reasoning. Instead, it raises the possibility of a cognitive bifurcation: AI systems crystallizing around early-20th-century models of linear, language-bound thought (Wittgenstein’s Tractatus) just as younger cohorts develop intelligence shaped by visual-spatial interfaces, distributed cognition, and algorithmic curation. This tension matters now because education systems, workplace norms, and AI design philosophies still assume a stable, language-first model of competence—one that may be eroding.

Key Insights

  1. Offloading vs. Atrophy: The cognitive offloading literature distinguishes between functional offloading (tools extend capacity) and structural offloading (tools replace internal processes). Your short-form video research documents fragmented attention as a symptom of structural offloading, where algorithmic feeds manage sequencing and LLMs handle coherence. This aligns with findings that GPS reliance degrades hippocampal spatial memory (Javadi et al., Nature Communications, 2017)—not just convenience, but neuroplastic adaptation. The critical-thinking concern you cite may reflect not deficiency but incommensurability: Gen Z’s compositional, multimodal problem-solving doesn’t map cleanly onto linear essay-based assessment.

  2. Divergence, Not Convergence: Prensky’s “Digital Natives” framework is dated but prescient here. Modern interfaces—TikTok, Figma, spatial canvases—privilege configurational over sequential reasoning. If cognition co-evolves with its media (McLuhan, Understanding Media), then LLMs optimizing for linguistic coherence may be solving yesterday’s problem. This echoes concerns in HCI about mode confusion when tools embody outdated mental models.

Open Questions

  • If younger users develop visual-spatial reasoning that LLMs cannot replicate, will human-AI collaboration require new interface paradigms—perhaps spatial or diagrammatic—that translate between modalities rather than defaulting to text?
  • Could educational systems paradoxically widen the cognitive gap by forcing Gen Z into language-centric evaluation schemes, making them less competitive in contexts where LLMs excel, while also failing to validate their native strengths?

© 2008 - 2026 Changkun Ou. All rights reserved.