Changkun's Blog

Science and art, life in between.

Changkun Ou

Human-AI interaction researcher, engineer, and writer.

Bridging HCI, AI, and systems programming. Building intelligent human-in-the-loop optimization systems. Informed by psychology, sociology, cognitive science, and philosophy.



Large Language Models as Optimization Tools

Published at: 2026-02-17

LLMs as Optimizers:

  • Large Language Models as Optimizers https://arxiv.org/abs/2309.03409
  • When Large Language Models Meet Optimization https://www.sciencedirect.com/science/article/abs/pii/S2210650224002013?via%3Dihub
  • Large Language Models to Enhance Bayesian Optimization https://arxiv.org/abs/2402.03921
  • Cooperative Design Optimization through Natural Language Interaction https://arxiv.org/abs/2508.16077
  • Language-Based Bayesian Optimization Research Assistant (BORA) https://arxiv.org/abs/2501.16224
  • LILO: Bayesian Optimization with Interactive Natural Language Feedback https://arxiv.org/abs/2510.17671
  • Bayesian Optimization of High-dimensional Outputs with Human Feedback https://openreview.net/pdf?id=2fHwkHskpo

The following content is generated by LLMs and may contain inaccuracies.

Context

This collection addresses a paradigm shift in optimization: using Large Language Models not as tools to be optimized, but as optimizers themselves. Traditional derivative-based methods fail when gradients are unavailable or expensive to compute—a common constraint in hyperparameter tuning, experimental design, and real-world engineering. By framing optimization as a natural language reasoning task, researchers are exploring whether LLMs' pattern recognition and contextual understanding can rival or augment classical methods like Bayesian optimization. This matters now because LLMs have demonstrated surprising competence in mathematical reasoning, and their ability to incorporate domain knowledge through prompting offers a potential escape from local optima traps that plague blind search algorithms.

Key Insights

LLMs as meta-optimizers outperform hand-crafted heuristics in prompt engineering. Yang et al.’s OPRO framework demonstrates that LLMs can iteratively refine solutions by conditioning on historical performance—achieving up to 50% improvement over human-designed prompts on reasoning benchmarks. This suggests LLMs excel when the optimization landscape can be encoded linguistically, exploiting their pre-trained semantic knowledge rather than relying solely on numerical gradients.
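The iterative loop described above can be sketched in a few lines. This is a hedged illustration, not OPRO's actual implementation: `call_llm` and `score` are hypothetical stubs standing in for a real model query and a real benchmark evaluation, and the meta-prompt format is simplified.

```python
# Minimal sketch of an OPRO-style step: the LLM is shown a scored history
# of candidate prompts and asked to propose a better one.

def call_llm(meta_prompt: str) -> str:
    # Stub: a real system would send meta_prompt to an actual LLM.
    return "Let's think step by step."

def score(prompt: str) -> float:
    # Stub objective: a real system would measure task accuracy on a benchmark.
    return len(prompt) / 100.0

def opro_step(history: list[tuple[str, float]]) -> tuple[str, float]:
    # Sort ascending so the best solutions appear last (OPRO's recency bias).
    lines = [f"prompt: {p!r}  score: {s:.2f}"
             for p, s in sorted(history, key=lambda x: x[1])]
    meta_prompt = (
        "Below are prompts with their scores, lowest first.\n"
        + "\n".join(lines)
        + "\nWrite a new prompt that achieves a higher score."
    )
    candidate = call_llm(meta_prompt)
    return candidate, score(candidate)

history = [("Solve the problem.", 0.18), ("Answer carefully.", 0.17)]
candidate, s = opro_step(history)
history.append((candidate, s))
print(candidate, s)
```

The key design point is that the optimizer never sees gradients: the entire optimization trajectory is serialized into the meta-prompt, and the model's semantic prior does the proposal step.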

Hybrid systems combining LLMs with Bayesian optimization show complementary strengths. LLAMBO integrates LLMs for zero-shot warm-starting and surrogate modeling in early search stages, while BORA uses LLMs to inject domain knowledge from literature into experimental design. These approaches address Bayesian optimization’s sample inefficiency in high dimensions by leveraging LLMs' ability to reason about plausible regions—though they inherit LLMs' hallucination risks when proposing scientifically implausible candidates.
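A toy illustration of the warm-starting idea, in the spirit of LLAMBO but not its actual code: `llm_warm_start` is a stub for an LLM proposing plausible initial hyperparameters, and a simple inverse-distance surrogate stands in for a Gaussian process.

```python
import random

# LLM-proposed points seed the search; a cheap surrogate-based loop refines them.

def objective(x: float) -> float:
    # Toy 1-D objective to maximize (unknown to the optimizer); optimum at 0.3.
    return -(x - 0.3) ** 2

def llm_warm_start() -> list[float]:
    # Stub: a real system would ask an LLM for plausible values given a
    # task description, exploiting its prior over sensible hyperparameters.
    return [0.1, 0.25, 0.5]

def surrogate(x: float, observations: list[tuple[float, float]]) -> float:
    # Inverse-distance weighted mean of observed values, plus an exploration
    # bonus that grows with distance to the nearest observation (UCB-like).
    dists = [(abs(x - xi), yi) for xi, yi in observations]
    nearest = min(d for d, _ in dists)
    weights = [1.0 / (d + 1e-6) for d, _ in dists]
    mean = sum(w * y for w, (_, y) in zip(weights, dists)) / sum(weights)
    return mean + 0.5 * nearest

random.seed(0)
obs = [(x, objective(x)) for x in llm_warm_start()]  # warm start
for _ in range(20):
    cands = [random.random() for _ in range(100)]
    x_next = max(cands, key=lambda x: surrogate(x, obs))
    obs.append((x_next, objective(x_next)))

best_x, best_y = max(obs, key=lambda t: t[1])
print(best_x, best_y)
```

The warm start matters because the surrogate's first proposals are anchored near the LLM's guesses; with an uninformative prior the same loop would spend its early budget exploring blindly.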

Natural language interfaces democratize expert-level optimization but introduce cognitive tradeoffs. Niwa et al.’s cooperative framework enables designers to steer optimization mid-flight through conversational input, matching performance of automated methods with lower cognitive load. However, the explainability gains (LLMs narrating their reasoning) compete with potential over-reliance on plausible-sounding but suboptimal suggestions—a tension between human agency and algorithmic efficiency.
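The conversational steering pattern can be sketched as a constraint injected mid-search. This is an illustrative reduction, not Niwa et al.'s system: `parse_feedback` is a stub for the LLM step that turns an utterance into a constraint on the candidate space.

```python
# A user utterance is translated into a predicate that filters candidates,
# so the human steers the search without touching the objective itself.

def parse_feedback(utterance: str):
    # Stub NL-to-constraint translation; a real system would use an LLM.
    if "smaller" in utterance:
        return lambda x: x < 0.4
    return lambda x: True

def objective(x: float) -> float:
    # Toy objective to maximize; unconstrained optimum at 0.6.
    return -(x - 0.6) ** 2

candidates = [i / 10 for i in range(11)]
constraint = parse_feedback("try smaller values")
feasible = [x for x in candidates if constraint(x)]
best = max(feasible, key=objective)
print(best)  # best point satisfying the user's steer
```

Note the tension the paragraph describes is visible even here: the steer excludes the true optimum at 0.6, trading objective value for conformance to the user's stated preference.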

Open Questions

  • When do LLMs' semantic biases help versus harm search? If pre-training data over-represents certain solution types, could LLM-guided optimization systematically miss unconventional optima in scientific discovery tasks?

  • Can we quantify the sample efficiency frontier between pure BO and LLM-augmented methods? Under what dimensionality, evaluation cost, and prior knowledge regimes does linguistic contextualization outweigh the risk of premature convergence to plausible-but-local solutions?


Have thoughts on this?

I'd love to hear from you — questions, corrections, disagreements, or anything else.

hi@changkun.de
© 2008 - 2026 Changkun Ou. All rights reserved.