Reliability of Large AI Models in Multi-Turn Conversations
AI Chatbots Get "Dumber" the Longer You Chat? It May Not Be Your Imagination
Sou Hu Cai Jing· 2026-02-21 14:26
Core Insights
- A recent Microsoft study confirms that even the most advanced large language models suffer a significant decline in reliability during multi-turn conversations [1][3]
- The phenomenon, termed getting "lost in conversation," reveals a systemic flaw in these models [3]

Performance Metrics
- The models' success rate on single-prompt tasks can reach 90%, but drops to roughly 65% when the same tasks are broken down into multi-turn dialogues [6]
- While the models' core capability decreases by only about 15% in multi-turn scenarios, their "unreliability" surges by 112% [7][8]

Behavioral Mechanisms
- Two behaviors drive the decline. The first is "premature generation": models attempt a final answer before fully understanding the user's needs, and those early errors compound across the conversation [10]
- The second, "answer inflation," appears in multi-turn dialogues: response lengths grow by 20% to 300%, introducing additional assumptions and "hallucinations" that contaminate subsequent reasoning [10]

Model Limitations
- Even next-generation reasoning models equipped with additional "thinking tokens," such as OpenAI o3 and DeepSeek R1, showed no significant improvement in multi-turn conversations [12]
- Current benchmarks focus mainly on idealized single-turn scenarios and neglect real-world model behavior, a challenge for developers who rely on AI for complex dialogue workflows [12]
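To see how capability can drop only modestly while unreliability more than doubles, it helps to separate best-case performance from run-to-run spread. The sketch below is a minimal illustration under assumed definitions (capability as a high percentile of repeated-run scores, unreliability as the gap between high and low percentiles); the score distributions are simulated, not the study's data.

```python
# Hedged sketch: decomposing repeated-run scores into "capability"
# (best-case performance) and "unreliability" (spread between runs).
# The percentile choices and simulated numbers are assumptions for
# illustration, not the study's exact methodology or data.
import random


def capability_and_unreliability(scores):
    """Capability ~ 90th-percentile score; unreliability ~ 90th - 10th."""
    s = sorted(scores)
    p10 = s[int(0.10 * (len(s) - 1))]
    p90 = s[int(0.90 * (len(s) - 1))]
    return p90, p90 - p10


random.seed(0)
# Simulated per-run scores (0-100) for the same tasks: single-turn runs
# cluster tightly near a high mean; multi-turn runs have a lower mean
# and a much wider spread.
single_turn = [random.gauss(90, 5) for _ in range(100)]
multi_turn = [random.gauss(65, 15) for _ in range(100)]

cap_s, unrel_s = capability_and_unreliability(single_turn)
cap_m, unrel_m = capability_and_unreliability(multi_turn)
print(f"single-turn: capability={cap_s:.0f}, unreliability={unrel_s:.0f}")
print(f"multi-turn:  capability={cap_m:.0f}, unreliability={unrel_m:.0f}")
```

With a wider multi-turn spread, the capability figure falls only moderately while the unreliability figure grows several-fold, which is the qualitative pattern the 15% vs 112% comparison describes.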