Large Model Hallucinations
DeepSeek R1 Hallucination Rate Drops; Users Call Out: We Want R2
Di Yi Cai Jing · 2025-05-29 15:13
Core Viewpoint - The updated DeepSeek R1 model has significantly improved its capabilities, particularly in reducing hallucination rates and enhancing performance in complex reasoning tasks, positioning itself competitively against leading international models [2][9][12].
Group 1: Model Improvements
- The new R1 model has reduced hallucination rates by approximately 45%-50% compared to the previous version, improving accuracy in tasks such as rewriting, summarization, and reading comprehension [9][12].
- In the AIME 2025 test, the model's accuracy increased from 70% to 87.5%, showcasing its enhanced mathematical reasoning abilities [12].
- The updated model is capable of generating longer and more structured written works, aligning more closely with human writing preferences [12].
Group 2: Benchmark Performance
- The updated R1 model achieved top scores in various benchmark tests, outperforming all domestic models and nearing the performance of international leaders like o3 and Gemini-2.5-Pro [9][12].
- The model's performance in coding tasks has also improved significantly, nearly matching the capabilities of OpenAI's o3-high model [12].
Group 3: Technical Specifications
- The new R1 model has 685 billion parameters and supports a context length of 128K in the open-source version, with 64K available in web, app, and API formats [13].
- The model continues to utilize the DeepSeek V3 Base model as its foundation, with enhanced computational resources applied during the training process to improve reasoning depth [12][13].
DeepSeek R1 Hallucination Rate Cut by Up to 50%; Users Call for the R2 Model
Di Yi Cai Jing · 2025-05-29 14:10
Core Insights - The updated R1 model from DeepSeek has significantly improved its capabilities, particularly in reducing the "hallucination" rate, which previously stood at around 21% [1][4].
Model Performance
- The new R1 model has achieved top-tier performance in various benchmark tests, surpassing all domestic models and nearing the performance of international leaders like o3 and Gemini-2.5-Pro [4].
- The hallucination rate has been reduced by approximately 45%-50% in tasks such as rewriting, summarization, and reading comprehension, providing more accurate and reliable results [4][18].
- In the AIME 2025 test, the model's accuracy improved from 70% to 87.5% in complex reasoning tasks [18].
Model Features and Capabilities
- The updated R1 model can generate longer and more structured pieces of writing, including essays, novels, and prose, while aligning more closely with human writing styles [18].
- The model's coding capabilities have also seen significant enhancements, performing nearly on par with OpenAI's o3-high model in code testing environments [18].
- The new model has a parameter count of 685 billion and supports a context length of 128K in the open-source version [19].
Future Developments
- There is considerable anticipation in the industry for the next-generation R2 model, with users expressing their eagerness for its release [19].
- DeepSeek has not commented on speculations regarding the R2 model, but the ongoing competition in the foundational model space remains intense [19].
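The figures in this summary can be combined in a quick back-of-the-envelope check. The sketch below is only an illustration: it assumes (the source does not say so) that the roughly 21% earlier rate and the 45%-50% reduction describe the same tasks, and the helper names are hypothetical.

```python
def apply_relative_reduction(baseline_rate: float, relative_cut: float) -> float:
    """Rate that remains after cutting `baseline_rate` by `relative_cut` (both fractions)."""
    return baseline_rate * (1.0 - relative_cut)


def relative_change(old: float, new: float) -> float:
    """Change from `old` to `new`, expressed as a fraction of `old`."""
    return (new - old) / old


# Assumption: the ~21% earlier rate and the 45%-50% cut refer to the same tasks.
baseline = 0.21
after_small_cut = apply_relative_reduction(baseline, 0.45)
after_large_cut = apply_relative_reduction(baseline, 0.50)

# AIME 2025 accuracy, as reported: 70% before, 87.5% after.
aime_point_gain = 87.5 - 70.0
aime_relative_gain = relative_change(70.0, 87.5)

print(f"Remaining hallucination rate: {after_large_cut:.2%} to {after_small_cut:.2%}")
print(f"AIME gain: {aime_point_gain} points ({aime_relative_gain:.0%} relative)")
```

Under that assumption the remaining hallucination rate would fall between about 10.5% and 11.6%, and the AIME 2025 gain is 17.5 percentage points, or 25% relative to the earlier 70%.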
Buffett Just Retired, and His "Stand-In" Is Already Here to Help Everyone Trade Stocks?
Sou Hu Cai Jing · 2025-05-18 16:18
Group 1
- Warren Buffett, at the age of 94, announced his retirement, but his investment strategies can still be accessed through an AI tool called AI Hedge Fund, which incorporates strategies from nine renowned investors, including Buffett and his mentors [1][2].
- The AI Hedge Fund has gained significant popularity, with users eager to test its effectiveness in the stock market, particularly in the A-share market [2][4].
- Initial tests of the AI Hedge Fund showed promising results, with a hypothetical short position on Apple yielding a profit of approximately $140,000 if $1 million was invested based on the AI's predictions [4][8].
Group 2
- The AI Hedge Fund allows users to select investment strategies from various renowned investors, with Buffett's analysis indicating concerns about Apple's financial health, including a debt-to-equity ratio of 4.2 and a current ratio of 0.9, leading to a bearish signal [6][7].
- In a five-stock test, Buffett's predictions were accurate for four out of five stocks, demonstrating a high accuracy rate, although combining multiple investors' strategies reduced the accuracy to three correct predictions out of five [11][15].
- The AI Hedge Fund includes a backtesting feature that allows users to validate the effectiveness of strategies using historical data, although the results may vary between predictions and actual outcomes [16][23].
Group 3
- The AI Hedge Fund requires users to configure APIs for data access, with costs associated with using OpenAI's services, highlighting the financial investment needed to utilize the tool effectively [17][19].
- The tool's predictive capabilities are based on defining the investment habits of various investors and using a large model to analyze current market conditions, although the predictions can be inconsistent [22][26].
- The AI Hedge Fund is primarily intended for educational and research purposes, emphasizing the importance of understanding the reasoning behind investment decisions rather than blindly following AI-generated predictions [28][30].
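The numbers quoted in this summary reduce to simple arithmetic. The sketch below is not the AI Hedge Fund's own code; the function names and the entry and exit prices are hypothetical, chosen only so that the price falls 14%, and borrowing fees and trading costs are ignored.

```python
def short_pnl(notional: float, entry_price: float, exit_price: float) -> float:
    """Profit (positive) or loss (negative) on a short sized at `notional` dollars at entry."""
    shares = notional / entry_price
    return shares * (entry_price - exit_price)


def hit_rate(correct: int, total: int) -> float:
    """Share of predictions that matched the actual direction."""
    return correct / total


# Hypothetical prices: the source gives only the profit figure, not the prices.
pnl = short_pnl(1_000_000, entry_price=100.0, exit_price=86.0)

single_strategy = hit_rate(4, 5)    # Buffett-only test, as reported
combined_strategy = hit_rate(3, 5)  # multiple investors combined, as reported

print(f"Short P&L: ${pnl:,.0f}")
print(f"Hit rate: {single_strategy:.0%} single vs {combined_strategy:.0%} combined")
```

A profit of about $140,000 on $1 million therefore corresponds to a price drop of roughly 14% over the holding period. With only five stocks in the test, the gap between 80% and 60% rests on a single prediction.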
Roundup: Key News from Yesterday and This Morning (May 18)
news flash · 2025-05-18 00:17
Domestic News
- The upgraded Zhuque-2 Y2 carrier rocket was launched successfully [4].
- The J-10CE fighter jet has gained significant attention from global military enthusiasts after its first combat performance [4].
- The Shenzhen Stock Exchange will host the 2025 Global Investor Conference from May 19 to 20 [4].
- Tianjin is guiding social capital to establish angel investment funds and venture capital funds focused on the AI sector [4].
- Ant Group's CTO He Zhengyu stated that the source of large model hallucinations is a lack of data [4].
- In March, China reduced its holdings of US Treasury bonds by $18.9 billion, dropping it to third place among holders, while the UK rose to second [4].
- CATL announced the official production launch of its battery production base in Shandong [4].
- Guangzhou has raised mortgage rates by 10 basis points, with multiple banks already implementing the change [4].
International News
- Trump stated that he is not in a hurry to reach an agreement regarding India's proposal to reduce US tariffs [3].
- Vietnam and the US held their first ministerial direct talks [3].
- Japanese media reported that Japan is considering providing subsidies for Tesla charging stations in tariff negotiations with the US [3].
- Hamas is willing to release some personnel in exchange for a two-month ceasefire, according to Israeli sources [3].
- The US has proposed a 5% tax on remittances sent by non-citizens [4].
- A 6.0 magnitude earthquake occurred in central Peru, with a depth of 100 kilometers [4].
- European Central Bank's Schnabel emphasized the need for caution in interest rate measures [4].
Li Yanhong Says DeepSeek Has a High Hallucination Rate. Is It True?
36Kr · 2025-05-02 04:29
Core Insights
- The article discusses the hallucination problem in large language models (LLMs), particularly focusing on DeepSeek-R1, which has a high hallucination rate compared to its predecessor and other models [2][6][13].
- Li Yanhong criticizes DeepSeek-R1 for its limitations, including high hallucination rates, slow performance, and high costs, sparking discussions about the broader issues of hallucinations in AI models [2][6][19].
- The hallucination phenomenon is not unique to DeepSeek, as other models like OpenAI's o3/o4-mini and Alibaba's Qwen3 also exhibit significant hallucination issues [3][8][13].
Summary by Sections
Hallucination Rates
- DeepSeek-R1 has a hallucination rate of 14.3%, nearly four times DeepSeek-V3's 3.9% [6][7].
- Other models, such as Qwen-QwQ-32B-Preview, show even higher hallucination rates at 16.1% [6][7].
- OpenAI's o3 model has a hallucination rate of 33%, nearly double that of its predecessor o1, while the lightweight o4-mini model reaches 48% [8][10].
Industry Response
- The AI industry is grappling with the persistent issue of hallucinations, which complicates the development of more advanced models [13][19].
- Companies are exploring various methods to mitigate hallucinations, including retrieval-augmented generation (RAG) and strict data quality control [20][22][23].
- Despite advancements in certain areas, such as multimodal outputs, hallucinations remain a significant challenge in generating long texts or complex visual scenarios [18][19].
Implications of Hallucinations
- Hallucinations are increasingly seen as a common trait among advanced models, raising questions about their reliability and user trust, especially in professional or high-stakes contexts [17][27].
- The phenomenon of hallucinations may also contribute to creativity in AI, as they can lead to unexpected and imaginative outputs [24][26].
- The acceptance of hallucinations as an inherent characteristic of AI models suggests a need for a paradigm shift in how AI is perceived and utilized [27].
AI Developments Roundup: Meta Llama 4 Open-Sourced, OpenAI Launches Pioneers Program
China Post Securities · 2025-04-15 10:50
- The report introduces the Llama 4 model series, which includes Llama 4 Scout, Llama 4 Maverick, and Llama 4 Behemoth, highlighting their advanced multimodal capabilities and efficiency through the MoE (Mixture of Experts) architecture [10][11][12].
- Llama 4 Scout features 16 experts with 17 billion activated parameters, supports a 10M context window, and is optimized for single H100 GPU deployment, achieving state-of-the-art (SOTA) performance in various benchmarks [11][12].
- Llama 4 Maverick employs 128 routed experts and a shared expert, activating only a subset of total parameters during inference, which reduces service costs and latency. It also incorporates post-training strategies like lightweight SFT, online RL, and DPO to balance model intelligence and conversational ability [12][14].
- The CoDA method is introduced to mitigate hallucination in large language models (LLMs) by identifying overshadowed knowledge through mutual information calculations and suppressing dominant knowledge biases. This method significantly improves factual accuracy across datasets like MemoTrap, NQ-Swap, and Overshadow [23][25][29].
- The KG-SFT framework enhances knowledge manipulation in LLMs by integrating external knowledge graphs. It includes components like Extractor (NER and BM25 for entity and triple extraction), Generator (HITS algorithm for generating explanatory text), and Detector (NLI models for detecting knowledge conflicts). KG-SFT demonstrates superior performance, especially in low-data scenarios, with a 14% accuracy improvement in English datasets [45][47][52].
- DeepCoder-14B-Preview, an open-source code reasoning model, achieves competitive performance with only 14 billion parameters. It utilizes GRPO+ for stable training, iterative context length extension, and the verl-pipeline for efficient reinforcement learning. The model achieves a Pass@1 accuracy of 60.6% on LiveCodeBench and a Codeforces score of 1936, placing it in the 95.3rd percentile [53][61][64].
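The idea of "routed experts plus a shared expert, activating only a subset of parameters" can be illustrated with a toy forward pass. The sketch below is a generic mixture-of-experts illustration, not Meta's implementation: the softmax gate, the renormalisation over selected experts, the `top_k` default, and every name in it are assumptions made for the example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over router logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    token: Vector,
    router_logits: Sequence[float],
    routed_experts: Sequence[Expert],
    shared_expert: Expert,
    top_k: int = 1,
) -> Tuple[Vector, List[int]]:
    """Run the shared expert plus only the top_k routed experts for one token."""
    probs = softmax(router_logits)
    # Pick the indices of the top_k highest-scoring routed experts.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Assumption: gate weights are renormalised over the selected experts.
    norm = sum(probs[i] for i in chosen)
    output = list(shared_expert(token))  # the shared expert always runs
    for i in chosen:
        weight = probs[i] / norm
        expert_out = routed_experts[i](token)  # unselected experts are never called
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output, chosen


def make_scaler(factor: float) -> Expert:
    """Toy expert that multiplies every component by `factor`."""
    return lambda vec: [factor * x for x in vec]


experts = [make_scaler(f) for f in (1.0, 2.0, 3.0, 4.0)]
output, chosen = moe_forward(
    token=[1.0, -2.0],
    router_logits=[0.1, 2.0, 0.3, -1.0],
    routed_experts=experts,
    shared_expert=make_scaler(1.0),
    top_k=1,
)
print(chosen, output)
```

In this toy run only one of the four routed experts executes for the token, which is the source of the cost and latency savings the report describes: total parameters grow with the number of experts, while per-token compute grows only with `top_k`.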
Are ERP Vendors About to Be Collectively Disrupted?
Huxiu APP · 2025-03-27 10:21
Core Viewpoint - The traditional ERP systems are expected to decline, but the industry itself will not die. The emergence of AI Agents is set to disrupt the traditional SaaS landscape, leading to a new generation of SaaS solutions that leverage AI capabilities [3][5].
Group 1: Industry Transformation
- The introduction of DeepSeek's strong reasoning capabilities and low-cost, open-source models is anticipated to bring significant disruption to the SaaS industry [4].
- Microsoft CEO's prediction that "AI Agents will replace all SaaS" is becoming a reality, with AI Agents expected to first impact B2B scenarios [5][6].
- Traditional SaaS vendors are urged to adapt to these changes or risk being eliminated from the competitive landscape [4][7].
Group 2: Application in Enterprises
- Use cases for AI Agents in enterprises include automating complex internal processes, such as financial operations and contract management, which can significantly enhance efficiency [9][10].
- Companies like Yonyou have begun implementing AI Agents across various departments, allowing employees with minimal technical background to create intelligent assistants quickly [9][10].
- AI Agents can learn from historical data and improve their accuracy in tasks like revenue recognition, demonstrating the potential for self-learning and efficiency gains in business operations [14][16].
Group 3: Market Dynamics
- The emergence of DeepSeek has altered the competitive dynamics between enterprise service providers and large model vendors, allowing for localized deployment and training of models [19][20].
- The software service providers are now in a stronger position, leveraging their industry expertise to drive innovation and create new applications [20].
- The stock prices of SaaS companies like Yonyou and Kingdee have risen in anticipation of the positive impact of AI Agents on their performance, indicating a potential market recovery for these firms [21].
AI Fabrications Are Flooding the Chinese Internet
Huxiu APP · 2025-03-05 10:03
Core Viewpoint - The article discusses the rapid proliferation of AI-generated content, particularly from the DeepSeek-R1 model, and highlights the significant misinformation it can produce, which poses a risk to the integrity of information on the Chinese internet [2][20].
Group 1: AI Tool Impact
- DeepSeek-R1 has become widely used, leading to a saturation of AI-generated content that often contains factual inaccuracies, which can mislead users [2][20].
- The article cites specific examples of misinformation generated by DeepSeek-R1, including a popular but erroneous response on Zhihu regarding the animation industry [9][10].
Group 2: Misinformation Examples
- The first example involves a misleading narrative about the animation industry, claiming that a specific animation scene was well-received at the Annecy Animation Festival, which was factually incorrect [9][10].
- The second example discusses a fabricated article about military corruption, which included outrageous claims that were entirely invented by DeepSeek-R1 [11][12].
Group 3: AI Model Characteristics
- DeepSeek-R1 is noted for its high "hallucination" rate of 14.3%, indicating a tendency to generate false information, which is higher than other models like DeepSeek-V3 [15].
- The model's design encourages it to fabricate details to meet user prompts, leading to a blend of truth and fiction in its outputs [12][14].
Group 4: Broader Implications
- The misuse of AI tools like DeepSeek-R1 for generating misleading information can have severe consequences, especially in sensitive areas such as politics, history, and culture [16][20].
- The article emphasizes the need for clear labeling of AI-generated content to prevent the blending of real and fabricated information, which could further complicate the information landscape [20].