Large Model Hallucination

DeepSeek R1 Hallucination Rate Cut by Up to 50%; Users Call for an R2 Model
Di Yi Cai Jing· 2025-05-29 14:10
Core Insights
- The updated R1 model from DeepSeek has significantly improved its capabilities, particularly in reducing the "hallucination" rate, which previously stood at around 21% [1][4].

Model Performance
- The new R1 model has achieved top-tier performance in various benchmark tests, surpassing all domestic models and nearing the performance of international leaders like o3 and Gemini-2.5-Pro [4].
- The hallucination rate has been reduced by approximately 45%-50% in tasks such as rewriting, summarization, and reading comprehension, providing more accurate and reliable results [4][18].
- In the AIME 2025 test, the model's accuracy improved from 70% to 87.5% in complex reasoning tasks [18].

Model Features and Capabilities
- The updated R1 model can generate longer and more structured pieces of writing, including essays, novels, and prose, while aligning more closely with human writing styles [18].
- The model's coding capabilities have also seen significant enhancements, performing nearly on par with OpenAI's o3-high model in code testing environments [18].
- The new model has a parameter count of 685 billion and supports a context length of 128K in the open-source version [19].

Future Developments
- There is considerable anticipation in the industry for the next-generation R2 model, with users expressing their eagerness for its release [19].
- DeepSeek has not commented on speculations regarding the R2 model, but the ongoing competition in the foundational model space remains intense [19].
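As a rough check on what a 45%-50% relative reduction would mean in absolute terms, the sketch below applies it to the roughly 21% baseline cited above. It assumes the reduction applies to that same baseline, which the summary does not state explicitly; the function name `reduced_rate` is illustrative only.

```python
def reduced_rate(baseline: float, relative_reduction: float) -> float:
    """Return the rate left after cutting `baseline` by a relative fraction."""
    if not 0.0 <= relative_reduction <= 1.0:
        raise ValueError("relative_reduction must be between 0 and 1")
    return baseline * (1.0 - relative_reduction)


# Assumption: the ~21% figure is the baseline the 45%-50% reduction applies to.
baseline = 0.21
low_cut, high_cut = 0.45, 0.50

best_case = reduced_rate(baseline, high_cut)   # larger cut, lower remaining rate
worst_case = reduced_rate(baseline, low_cut)   # smaller cut, higher remaining rate
print(f"Implied hallucination rate: {best_case:.2%} to {worst_case:.2%}")
```

Under that assumption the remaining rate would fall between roughly 10.5% and just under 12%, which is a halving at best rather than an elimination of hallucinations.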
Buffett Just Retired, and His "Stand-In" Is Already Here to Help Everyone Trade Stocks?
Sou Hu Cai Jing· 2025-05-18 16:18
Group 1
- Warren Buffett, at the age of 94, announced his retirement, but his investment strategies can still be accessed through an AI tool called AI Hedge Fund, which incorporates strategies from nine renowned investors, including Buffett and his mentors [1][2]
- The AI Hedge Fund has gained significant popularity, with users eager to test its effectiveness in the stock market, particularly in the A-share market [2][4]
- Initial tests of the AI Hedge Fund showed promising results, with a hypothetical short position on Apple yielding a profit of approximately $140,000 if $1 million was invested based on the AI's predictions [4][8]

Group 2
- The AI Hedge Fund allows users to select investment strategies from various renowned investors, with Buffett's analysis indicating concerns about Apple's financial health, including a debt-to-equity ratio of 4.2 and a current ratio of 0.9, leading to a bearish signal [6][7]
- In a five-stock test, Buffett's predictions were accurate for four out of five stocks, demonstrating a high accuracy rate, although combining multiple investors' strategies reduced the accuracy to three correct predictions out of five [11][15]
- The AI Hedge Fund includes a backtesting feature that allows users to validate the effectiveness of strategies using historical data, although the results may vary between predictions and actual outcomes [16][23]

Group 3
- The AI Hedge Fund requires users to configure APIs for data access, with costs associated with using OpenAI's services, highlighting the financial investment needed to utilize the tool effectively [17][19]
- The tool's predictive capabilities are based on defining the investment habits of various investors and using a large model to analyze current market conditions, although the predictions can be inconsistent [22][26]
- The AI Hedge Fund is primarily intended for educational and research purposes, emphasizing the importance of understanding the reasoning behind investment decisions rather than blindly following AI-generated predictions [28][30]
Roundup: Key News from Yesterday and This Morning (May 18)
news flash· 2025-05-18 00:17
Domestic News
- The upgraded Zhuque-2 Y2 carrier rocket was launched successfully [4]
- The J-10CE fighter jet has gained significant attention from global military enthusiasts after its first combat performance [4]
- The Shenzhen Stock Exchange will host the 2025 Global Investor Conference from May 19 to 20 [4]
- Tianjin is guiding social capital to establish angel investment funds and venture capital funds focused on the AI sector [4]
- Ant Group's CTO He Zhengyu stated that the source of large model hallucinations is a lack of data [4]
- In March, China reduced its holdings of US Treasury bonds by $18.9 billion, dropping to third place among holders, while the UK rose to second [4]
- CATL announced the official production launch of its battery production base in Shandong [4]
- Guangzhou has raised mortgage rates by 10 basis points, with multiple banks already implementing the change [4]

International News
- Trump stated that he is not in a hurry to reach an agreement regarding India's proposal to reduce US tariffs [3]
- Vietnam and the US held their first ministerial direct talks [3]
- Japanese media reported that Japan is considering providing subsidies for Tesla charging stations in tariff negotiations with the US [3]
- Hamas is willing to release some personnel in exchange for a two-month ceasefire, according to Israeli sources [3]
- The US has proposed a 5% tax on remittances sent by non-citizens [4]
- A 6.0 magnitude earthquake occurred in central Peru, with a depth of 100 kilometers [4]
- European Central Bank's Schnabel emphasized the need for caution in interest rate measures [4]
Li Yanhong Says DeepSeek Has a High Hallucination Rate. Is That True?
36Kr· 2025-05-02 04:29
Core Insights
- The article discusses the hallucination problem in large language models (LLMs), particularly focusing on DeepSeek-R1, which has a high hallucination rate compared to its predecessor and other models [2][6][13]
- Li Yanhong criticizes DeepSeek-R1 for its limitations, including high hallucination rates, slow performance, and high costs, sparking discussions about the broader issues of hallucinations in AI models [2][6][19]
- The hallucination phenomenon is not unique to DeepSeek, as other models like OpenAI's o3/o4-mini and Alibaba's Qwen3 also exhibit significant hallucination issues [3][8][13]

Summary by Sections

Hallucination Rates
- DeepSeek-R1 has a hallucination rate of 14.3%, significantly higher than DeepSeek-V3's 3.9%, a nearly fourfold increase [6][7]
- Other models, such as Qwen-QwQ-32B-Preview, show even higher hallucination rates at 16.1% [6][7]
- OpenAI's o3 model has a hallucination rate of 33%, nearly double that of its predecessor o1, while the lightweight o4-mini model reaches 48% [8][10]

Industry Response
- The AI industry is grappling with the persistent issue of hallucinations, which complicates the development of more advanced models [13][19]
- Companies are exploring various methods to mitigate hallucinations, including retrieval-augmented generation (RAG) and strict data quality control [20][22][23]
- Despite advancements in certain areas, such as multimodal outputs, hallucinations remain a significant challenge in generating long texts or complex visual scenarios [18][19]

Implications of Hallucinations
- Hallucinations are increasingly seen as a common trait among advanced models, raising questions about their reliability and user trust, especially in professional or high-stakes contexts [17][27]
- The phenomenon of hallucinations may also contribute to creativity in AI, as they can lead to unexpected and imaginative outputs [24][26]
- The acceptance of hallucinations as an inherent characteristic of AI models suggests a need for a paradigm shift in how AI is perceived and utilized [27]
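The rate comparison between DeepSeek-R1 and DeepSeek-V3 can be checked with a few lines of arithmetic. The sketch below uses only the percentages quoted in the summary; the dictionary layout and the `ratio` helper are illustrative. The summary does not say whether the DeepSeek, Qwen, and OpenAI figures come from the same benchmark, so the code compares only the two DeepSeek models.

```python
# Hallucination rates (percent) exactly as quoted in the summary above.
rates = {
    "DeepSeek-V3": 3.9,
    "DeepSeek-R1": 14.3,
    "Qwen-QwQ-32B-Preview": 16.1,
    "OpenAI o3": 33.0,
    "OpenAI o4-mini": 48.0,
}


def ratio(model: str, reference: str) -> float:
    """How many times higher `model`'s rate is than `reference`'s."""
    return rates[model] / rates[reference]


r1_vs_v3 = ratio("DeepSeek-R1", "DeepSeek-V3")
gap_points = rates["DeepSeek-R1"] - rates["DeepSeek-V3"]

print(f"R1 is {r1_vs_v3:.2f}x V3, a gap of {gap_points:.1f} percentage points")
```

The ratio comes out a little under 3.7, so the "fourfold" figure is a rounded-up description of the gap.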
AI Developments Roundup: Meta Open-Sources Llama 4, OpenAI Launches Pioneers Program
China Post Securities· 2025-04-15 10:50
- The report introduces the Llama 4 model series, which includes Llama 4 Scout, Llama 4 Maverick, and Llama 4 Behemoth, highlighting their advanced multimodal capabilities and efficiency through the MoE (Mixture of Experts) architecture[10][11][12]
- Llama 4 Scout features 16 experts with 17 billion activated parameters, supports a 10M context window, and is optimized for single H100 GPU deployment, achieving state-of-the-art (SOTA) performance in various benchmarks[11][12]
- Llama 4 Maverick employs 128 routed experts and a shared expert, activating only a subset of total parameters during inference, which reduces service costs and latency. It also incorporates post-training strategies like lightweight SFT, online RL, and DPO to balance model intelligence and conversational ability[12][14]
- The CoDA method is introduced to mitigate hallucination in large language models (LLMs) by identifying overshadowed knowledge through mutual information calculations and suppressing dominant knowledge biases. This method significantly improves factual accuracy across datasets like MemoTrap, NQ-Swap, and Overshadow[23][25][29]
- The KG-SFT framework enhances knowledge manipulation in LLMs by integrating external knowledge graphs. It includes components like Extractor (NER and BM25 for entity and triple extraction), Generator (HITS algorithm for generating explanatory text), and Detector (NLI models for detecting knowledge conflicts). KG-SFT demonstrates superior performance, especially in low-data scenarios, with a 14% accuracy improvement in English datasets[45][47][52]
- DeepCoder-14B-Preview, an open-source code reasoning model, achieves competitive performance with only 14 billion parameters. It utilizes GRPO+ for stable training, iterative context length extension, and the verl-pipeline for efficient reinforcement learning. The model achieves a Pass@1 accuracy of 60.6% on LiveCodeBench and a Codeforces score of 1936, placing it in the 95.3rd percentile[53][61][64]
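The report quotes a Pass@1 figure without saying how it was computed. As a minimal sketch, the code below implements the widely used unbiased pass@k estimator, 1 - C(n - c, k) / C(n, k), where n samples are generated per problem and c of them pass the tests. The sample counts in the example are hypothetical and are not taken from the DeepCoder evaluation.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed, k: attempt budget.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical counts: 10 samples for one problem, 6 of which pass.
example = pass_at_k(n=10, c=6, k=1)
print(f"pass@1 for the hypothetical problem: {example:.3f}")
```

For k = 1 the estimator reduces to c / n, the fraction of samples that pass; a benchmark-level score would average this value over all problems.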
Are ERP Vendors About to Be Disrupted En Masse?
Huxiu APP· 2025-03-27 10:21
Core Viewpoint
- The traditional ERP systems are expected to decline, but the industry itself will not die. The emergence of AI Agents is set to disrupt the traditional SaaS landscape, leading to a new generation of SaaS solutions that leverage AI capabilities [3][5].

Group 1: Industry Transformation
- The introduction of DeepSeek's strong reasoning capabilities and low-cost, open-source models is anticipated to bring significant disruption to the SaaS industry [4].
- Microsoft CEO's prediction that "AI Agents will replace all SaaS" is becoming a reality, with AI Agents expected to first impact B2B scenarios [5][6].
- Traditional SaaS vendors are urged to adapt to these changes or risk being eliminated from the competitive landscape [4][7].

Group 2: Application in Enterprises
- Use cases for AI Agents in enterprises include automating complex internal processes, such as financial operations and contract management, which can significantly enhance efficiency [9][10].
- Companies like Yonyou have begun implementing AI Agents across various departments, allowing employees with minimal technical background to create intelligent assistants quickly [9][10].
- AI Agents can learn from historical data and improve their accuracy in tasks like revenue recognition, demonstrating the potential for self-learning and efficiency gains in business operations [14][16].

Group 3: Market Dynamics
- The emergence of DeepSeek has altered the competitive dynamics between enterprise service providers and large model vendors, allowing for localized deployment and training of models [19][20].
- The software service providers are now in a stronger position, leveraging their industry expertise to drive innovation and create new applications [20].
- The stock prices of SaaS companies like Yonyou and Kingdee have risen in anticipation of the positive impact of AI Agents on their performance, indicating a potential market recovery for these firms [21].
AI Fabrications Are Flooding the Chinese Internet
Huxiu APP· 2025-03-05 10:03
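This article is from the WeChat official account 阑夕 (Lanxi), by 阑夕. Header image: AI-generated.

DeepSeek-R1 is genuinely useful, but after going viral it became an AI tool in everyone's hands, and it has seriously polluted the information environment of the Chinese internet. The problem may be hard to avoid, but it deserves attention.

Over the past week, among the viral articles I came across, at least three were generated by DeepSeek-R1 and riddled with factual errors, yet they were convincing enough to pass as genuine, so many friends believed them and shared them with great emotion.

The first case is this highly upvoted answer on Zhihu (by user 提莫吃蘑菇, upvoted by 7,036 people):

"My cousin is a concept artist at a game company in Hangzhou, where working overtime until ten is enough for people to go on Maimai, curse the company, and land on the trending list. But last year he jumped to Light Chaser Animation (追光动画), where the team left work at 2 a.m. for three months straight to finish 《白蛇2》 (White Snake 2), and yet his WeChat Moments showed work photos every day captioned 'painful but happy.' I asked him whether he had been brainwashed, and he walked me through two calculations:

At the game company, working overtime on revisions, the client can make you change Nezha's Red Armillary Sash (混天绫) from red to fluorescent pink; on an animated film team, the eye acting you draw can directly decide where an audience of tens of millions will cry. This gap in creative say, compared with Ao Bing (敖丙) and the shrimp soldiers and crab generals ..."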