Large Model Hallucinations
The Red-Hot AI Appliance, the "Integrated AI Computing Machine": How Will It Meet the Year of the Agent?
Core Insights
- The emergence of integrated AI computing machines, driven by DeepSeek, is making AI large models more accessible to the mass market, but challenges remain in application deployment [1][2]
- The evolution of AI large models continues, with a focus on reducing model hallucinations and adapting to ongoing technological advancements [1][4]

Group 1: Integrated AI Computing Machines
- Integrated AI computing machines are pre-integrated solutions combining hardware, software platforms, models, and applications, lowering the barriers to AI adoption [2][3]
- The current market features a diverse range of integrated AI computing machines, necessitating differentiation through performance optimization, customization, and business innovation to avoid price wars [2][3]
- Key focus areas for integrated AI computing machines include computing power, model accessibility, and application development to meet diverse enterprise needs [2][3]

Group 2: Application and Industry Impact
- The deployment of integrated AI computing machines is not a one-time process; it requires continuous adjustment of computing power and model capabilities based on business needs [3][4]
- The introduction of AIS one-stop intelligent platforms aids enterprises in quickly developing industry-specific models and exploring potential value scenarios [4][5]
- Current applications of integrated AI computing machines are primarily in knowledge Q&A, customer assistance, and code assistance, but face challenges such as data quality and integration into existing workflows [3][4]

Group 3: Addressing Model Hallucinations
- Model hallucinations arise from the interplay of models, data, and tasks, and are an inherent risk in generative models [5][6]
- Short-term solutions to mitigate hallucinations include using RAG, supervised fine-tuning, and exploring verification mechanisms [5][6]
- The relationship between integrated AI computing machines and agents is synergistic, with each enhancing the capabilities of the other, leading to more efficient solutions [6]
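The summary names RAG as a short-term mitigation without showing what it involves. The following is a minimal sketch of the retrieval and prompt-grounding steps only, assuming a toy bag-of-words similarity in place of a real embedding model. The function names (`tokenize`, `cosine`, `retrieve`, `build_prompt`) and the sample `DOCS` are hypothetical and do not come from the article, and no model is actually called.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]

def build_prompt(query, documents, k=2):
    """Build a prompt that asks the model to answer only from retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (
        "Answer using only the context below. If the context is insufficient, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )

# Hypothetical knowledge-base snippets, for illustration only.
DOCS = [
    "The warranty period for the X100 server is three years.",
    "Office hours are 9:00 to 18:00 on weekdays.",
    "The X100 server supports up to eight GPUs.",
]

if __name__ == "__main__":
    print(build_prompt("How long is the X100 warranty period?", DOCS))
```

The point of the design is that the model is handed the relevant enterprise text at answer time instead of being asked to recall it, which is why data quality in the knowledge base matters as much as the model itself.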
Zidong Taichu Open-Sources a Vision Neuron Enhancement Method: Plug-and-Play to End Multimodal Hallucination | ACL 2025
量子位 (QbitAI) · 2025-06-27 10:57
Core Viewpoint
- The article discusses a novel solution, Visual Head Reinforcement (VHR), to address the hallucination phenomenon in Large Visual Language Models (LVLMs) by enhancing the model's attention mechanisms to better utilize visual information rather than relying on language priors [1][2][3].

Group 1: Introduction and Background
- LVLMs often generate factually incorrect outputs due to an over-reliance on language knowledge instead of actual visual content, leading to hallucinations [4][5].
- Experiments show that when models are prompted to describe images, they frequently include entities not present in the images, indicating a systemic reliance on language co-occurrence patterns [4][5][7].

Group 2: VHR Methodology
- VHR identifies and strengthens attention heads that are sensitive to visual information, thereby reducing the model's dependency on language priors and significantly lowering hallucination occurrences [8].
- The Visual Head Divergence (VHD) metric is introduced to quantify each attention head's sensitivity to visual inputs, revealing that only a few heads are responsive to visual information while most rely on language patterns [9][11].
- The VHR process involves filtering out abnormal VHD scores, selecting and scaling the outputs of the top 50% of attention heads based on VHD scores, and applying a layer-wise enhancement strategy to avoid interference [14][15][16].

Group 3: Experimental Results
- VHR has been tested against multiple benchmarks, showing superior performance compared to existing methods while maintaining efficiency with minimal additional time costs [16][17].
- The results indicate that VHR outperforms baseline methods in various evaluations, demonstrating its effectiveness in reducing hallucinations in LVLMs [17].

Group 4: SSL Method
- The article also introduces a Semantic Guided Learning (SSL) method that analyzes the internal representation space of models to mitigate hallucinations by injecting real semantic directions and suppressing hallucination-related projections [19][22].
- This method shows cross-model applicability, enhancing the robustness of hallucination mitigation across different LVLM architectures [22].
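The VHR steps summarized under Group 2 (filter abnormal VHD scores, keep the top 50% of heads, scale their outputs) can be sketched for a single layer as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the VHD scores are taken as given inputs, the outlier rule (`z_max` standard deviations from the mean) and the scaling factor `alpha` are hypothetical choices, and only the 50% ratio comes from the summary.

```python
from statistics import mean, pstdev

def select_visual_heads(vhd_scores, keep_ratio=0.5, z_max=3.0):
    """Return the indices of the heads to reinforce in one layer.

    vhd_scores: one precomputed VHD score per attention head.
    keep_ratio: fraction of remaining heads to keep (0.5 per the summary).
    z_max: hypothetical outlier cutoff, in standard deviations.
    """
    mu, sigma = mean(vhd_scores), pstdev(vhd_scores)
    # Step 1: drop heads whose score is abnormally far from the layer mean.
    valid = [
        i for i, s in enumerate(vhd_scores)
        if sigma == 0 or abs(s - mu) <= z_max * sigma
    ]
    # Step 2: rank the remaining heads by VHD score, highest first.
    valid.sort(key=lambda i: vhd_scores[i], reverse=True)
    n_keep = max(1, int(len(valid) * keep_ratio))
    return sorted(valid[:n_keep])

def reinforce_layer(head_outputs, vhd_scores, alpha=2.0):
    """Scale the outputs of the selected heads by alpha; leave the rest unchanged.

    head_outputs: one list of floats per head (a stand-in for the head's output vector).
    alpha: hypothetical scaling factor; the summary does not give a value.
    """
    chosen = set(select_visual_heads(vhd_scores))
    return [
        [alpha * x for x in out] if i in chosen else list(out)
        for i, out in enumerate(head_outputs)
    ]

if __name__ == "__main__":
    scores = [0.1, 0.9, 0.2, 0.8]            # made-up VHD scores for four heads
    outputs = [[1.0], [1.0], [1.0], [1.0]]   # made-up head outputs
    print(select_visual_heads(scores))
    print(reinforce_layer(outputs, scores))
```

A layer-wise strategy would call `reinforce_layer` on one layer at a time, recomputing the scores for later layers after earlier ones have been reinforced; how the scores themselves are computed is outside this sketch.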
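Haizhi Technology's Hong Kong IPO: Claims World-Leading Technical Strength, Yet R&D Spending and R&D Expense Ratio Keep Falling and Trail Peers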
Xin Lang Zheng Quan· 2025-06-20 07:39
Core Viewpoint
- Haizhi Technology claims to be the first AI company in China to effectively reduce large model hallucinations through knowledge graphs, but its revenue from this AI agent business is relatively low, accounting for only 17.2% in 2024 [1][2].

Business Overview
- Haizhi Technology's main business market share is only 1.11%, with the AI agent business market share at 2.8% [2][3].
- The company's Atlas intelligent agent revenue from 2022 to 2024 shows a growth from 0 to 86.55 million RMB, but still represents a small portion of total revenue [2][4].

Financial Performance
- The company reported revenues of 313 million RMB, 376 million RMB, and 503 million RMB for 2022, 2023, and 2024 respectively, with net losses of 176 million RMB, 266 million RMB, and 94 million RMB, indicating a narrowing loss [4].
- The Atlas graph solution revenue accounted for 100%, 97.6%, and 82.8% of total revenue in the same years [4].

R&D Expenditure
- R&D expenses for Haizhi Technology decreased from 86.94 million RMB in 2022 to 60.68 million RMB in 2024, with a significant drop in R&D expense ratio from 27.8% to 12.1% [6][9].
- The company's R&D expenses are significantly lower than competitors like Minglue Technology and Xinghuan Technology, raising questions about its claimed technological advantages [9].

Market Position and Future Outlook
- The market for integrated knowledge graph AI agents is projected to grow from 200 million RMB in 2024 to 13.2 billion RMB by 2029, with a compound annual growth rate of 140% [10].
- If Haizhi Technology can capitalize on opportunities for significant growth in AI agent revenue, it could strengthen its market position, although competition from internet giants poses potential challenges [11].
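The percentage figures in the summary above can be reproduced from the stated revenue and expense amounts. The short check below uses only numbers quoted in the summary (in millions of RMB); the variable and function names are illustrative, and the 2023 R&D expense is omitted because the summary does not give it.

```python
# All amounts are in millions of RMB, as quoted in the summary above.
revenue = {2022: 313.0, 2023: 376.0, 2024: 503.0}
rd_expense = {2022: 86.94, 2024: 60.68}   # 2023 is not stated in the summary
agent_revenue_2024 = 86.55

def pct(part, whole):
    """Return part / whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

# R&D expense ratio = R&D expense / total revenue for the same year.
rd_ratio = {year: pct(rd_expense[year], revenue[year]) for year in rd_expense}

# Share of 2024 revenue from the AI agent business and from the graph solution.
agent_share_2024 = pct(agent_revenue_2024, revenue[2024])
graph_share_2024 = pct(revenue[2024] - agent_revenue_2024, revenue[2024])

if __name__ == "__main__":
    print(rd_ratio, agent_share_2024, graph_share_2024)
```

The results agree with the 27.8% and 12.1% R&D expense ratios and with the 17.2% and 82.8% revenue split reported for 2024.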
DeepSeek R1's Hallucination Rate Falls; Users Call Out: We Want R2
Di Yi Cai Jing · 2025-05-29 15:13
Core Viewpoint
- The updated DeepSeek R1 model has significantly improved its capabilities, particularly in reducing hallucination rates and enhancing performance in complex reasoning tasks, positioning itself competitively against leading international models [2][9][12].

Group 1: Model Improvements
- The new R1 model has reduced hallucination rates by approximately 45%-50% compared to the previous version, improving accuracy in tasks such as rewriting, summarization, and reading comprehension [9][12].
- In the AIME 2025 test, the model's accuracy increased from 70% to 87.5%, showcasing its enhanced mathematical reasoning abilities [12].
- The updated model is capable of generating longer and more structured written works, aligning more closely with human writing preferences [12].

Group 2: Benchmark Performance
- The updated R1 model achieved top scores in various benchmark tests, outperforming all domestic models and nearing the performance of international leaders like o3 and Gemini-2.5-Pro [9][12].
- The model's performance in coding tasks has also improved significantly, nearly matching the capabilities of OpenAI's o3-high model [12].

Group 3: Technical Specifications
- The new R1 model has 685 billion parameters and supports a context length of 128K in the open-source version, with 64K available in web, app, and API formats [13].
- The model continues to utilize the DeepSeek V3 Base model as its foundation, with enhanced computational resources applied during the training process to improve reasoning depth [12][13].
DeepSeek R1's Hallucination Rate Cut by Up to 50%; Users Call for the R2 Model
Di Yi Cai Jing· 2025-05-29 14:10
Core Insights
- The updated R1 model from DeepSeek has significantly improved its capabilities, particularly in reducing the "hallucination" rate, which previously stood at around 21% [1][4].

Model Performance
- The new R1 model has achieved top-tier performance in various benchmark tests, surpassing all domestic models and nearing the performance of international leaders like o3 and Gemini-2.5-Pro [4].
- The hallucination rate has been reduced by approximately 45%-50% in tasks such as rewriting, summarization, and reading comprehension, providing more accurate and reliable results [4][18].
- In the AIME 2025 test, the model's accuracy improved from 70% to 87.5% in complex reasoning tasks [18].

Model Features and Capabilities
- The updated R1 model can generate longer and more structured pieces of writing, including essays, novels, and prose, while aligning more closely with human writing styles [18].
- The model's coding capabilities have also seen significant enhancements, performing nearly on par with OpenAI's o3-high model in code testing environments [18].
- The new model has a parameter count of 685 billion and supports a context length of 128K in the open-source version [19].

Future Developments
- There is considerable anticipation in the industry for the next-generation R2 model, with users expressing their eagerness for its release [19].
- DeepSeek has not commented on speculations regarding the R2 model, but the ongoing competition in the foundational model space remains intense [19].
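The summary gives a prior hallucination rate of around 21% and a reduction of roughly 45%-50%, but not the resulting rate. The arithmetic below is a rough sketch that assumes the reduction is relative and applies to that 21% baseline; neither assumption is confirmed by the article, and the function name is illustrative.

```python
def apply_relative_reduction(rate_pct, reduction_pct):
    """Return the rate (in percent) left after a relative reduction (in percent)."""
    return rate_pct * (1 - reduction_pct / 100)

baseline = 21.0  # approximate prior hallucination rate quoted in the summary

# Assumption: "reduced by 45%-50%" is a relative cut applied to the baseline.
after_45 = apply_relative_reduction(baseline, 45)
after_50 = apply_relative_reduction(baseline, 50)

# AIME 2025 accuracy change, in percentage points and in relative terms.
aime_gain_points = 87.5 - 70.0
aime_gain_relative = aime_gain_points / 70.0

if __name__ == "__main__":
    print(round(after_50, 2), round(after_45, 2))
    print(aime_gain_points, round(aime_gain_relative, 2))
```

Under these assumptions the implied new hallucination rate falls between about 10.5% and 11.6%, and the AIME improvement is 17.5 percentage points, or 25% relative to the earlier score.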
Buffett Has Just Retired, and His "Stand-In" Is Already Here to Help Everyone Trade Stocks?
Sou Hu Cai Jing· 2025-05-18 16:18
Group 1
- Warren Buffett, at the age of 94, announced his retirement, but his investment strategies can still be accessed through an AI tool called AI Hedge Fund, which incorporates strategies from nine renowned investors, including Buffett and his mentors [1][2]
- The AI Hedge Fund has gained significant popularity, with users eager to test its effectiveness in the stock market, particularly in the A-share market [2][4]
- Initial tests of the AI Hedge Fund showed promising results: a hypothetical short position on Apple based on the AI's predictions would have yielded a profit of approximately $140,000 on a $1 million investment [4][8]

Group 2
- The AI Hedge Fund allows users to select investment strategies from various renowned investors. The Buffett-style analysis flagged concerns about Apple's financial health, including a debt-to-equity ratio of 4.2 and a current ratio of 0.9, and produced a bearish signal [6][7]
- In a five-stock test, the Buffett strategy's predictions were correct for four out of five stocks, while combining multiple investors' strategies reduced this to three correct predictions out of five [11][15]
- The AI Hedge Fund includes a backtesting feature that allows users to validate the effectiveness of strategies using historical data, although predictions and actual outcomes may differ [16][23]

Group 3
- The AI Hedge Fund requires users to configure APIs for data access, and using OpenAI's services incurs costs, so running the tool effectively requires some financial outlay [17][19]
- The tool's predictions are produced by encoding the investment habits of various investors and using a large model to analyze current market conditions, although the predictions can be inconsistent [22][26]
- The AI Hedge Fund is primarily intended for educational and research purposes, emphasizing the importance of understanding the reasoning behind investment decisions rather than blindly following AI-generated predictions [28][30]
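The Apple figure above is a simple short-position calculation, and the five-stock test is a hit rate. The sketch below shows both, ignoring borrowing costs, fees, and margin. The entry and exit prices are hypothetical placeholders chosen only to produce a 14% decline; the article gives the profit and the notional amount, not the prices.

```python
def short_pnl(notional, entry_price, exit_price):
    """Profit or loss of a short position sized by notional at the entry price.

    Ignores borrowing costs, fees, and margin requirements.
    """
    shares = notional / entry_price
    return shares * (entry_price - exit_price)

def implied_decline(profit, notional):
    """Fractional price decline implied by a short-position profit."""
    return profit / notional

def hit_rate(correct, total):
    """Fraction of predictions that turned out correct."""
    return correct / total

if __name__ == "__main__":
    # $140,000 profit on $1 million implies a decline of about 14%.
    print(implied_decline(140_000, 1_000_000))
    # Hypothetical prices: short at 100.0, cover at 86.0 (a 14% fall).
    print(short_pnl(1_000_000, 100.0, 86.0))
    # Single-strategy versus combined-strategy accuracy in the five-stock test.
    print(hit_rate(4, 5), hit_rate(3, 5))
```

With only five stocks in the test, the gap between an 80% and a 60% hit rate is a single prediction, which is one reason the summary stresses that the tool is meant for education and research.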
Roundup: Key News from Yesterday and This Morning (May 18)
news flash· 2025-05-18 00:17
Domestic News
- The upgraded Zhuque-2 (Zhuque-2E) Y2 carrier rocket was launched successfully [4]
- The J-10CE fighter jet has gained significant attention from global military enthusiasts after its first combat performance [4]
- The Shenzhen Stock Exchange will host the 2025 Global Investor Conference from May 19 to 20 [4]
- Tianjin is guiding social capital to establish angel investment funds and venture capital funds focused on the AI sector [4]
- Ant Group's CTO He Zhengyu stated that the source of large model hallucinations is a lack of data [4]
- In March, China reduced its holdings of US Treasury bonds by $18.9 billion, dropping to the third-largest holder, while the UK rose to second [4]
- CATL announced the official production launch of its battery production base in Shandong [4]
- Guangzhou has raised mortgage rates by 10 basis points, with multiple banks already implementing the change [4]

International News
- Trump stated that he is not in a hurry to reach an agreement regarding India's proposal to reduce US tariffs [3]
- Vietnam and the US held their first ministerial direct talks [3]
- Japanese media reported that Japan is considering providing subsidies for Tesla charging stations in tariff negotiations with the US [3]
- Hamas is willing to release some personnel in exchange for a two-month ceasefire, according to Israeli sources [3]
- The US has proposed a 5% tax on remittances sent by non-citizens [4]
- A 6.0 magnitude earthquake occurred in central Peru, with a depth of 100 kilometers [4]
- European Central Bank's Schnabel emphasized the need for caution in interest rate measures [4]
Li Yanhong (Robin Li) Says DeepSeek Hallucinates a Lot. Is That True?
36Kr · 2025-05-02 04:29
Core Insights
- The article discusses the hallucination problem in large language models (LLMs), particularly focusing on DeepSeek-R1, which has a high hallucination rate compared to its predecessor and other models [2][6][13]
- Li Yanhong (Robin Li) criticizes DeepSeek-R1 for its limitations, including high hallucination rates, slow performance, and high costs, sparking discussions about the broader issues of hallucinations in AI models [2][6][19]
- The hallucination phenomenon is not unique to DeepSeek, as other models like OpenAI's o3/o4-mini and Alibaba's Qwen3 also exhibit significant hallucination issues [3][8][13]

Summary by Sections

Hallucination Rates
- DeepSeek-R1 has a hallucination rate of 14.3%, nearly four times DeepSeek-V3's 3.9% [6][7]
- Other models, such as Qwen-QwQ-32B-Preview, show even higher hallucination rates at 16.1% [6][7]
- OpenAI's o3 model has a hallucination rate of 33%, nearly double that of its predecessor o1, while the lightweight o4-mini model reaches 48% [8][10]

Industry Response
- The AI industry is grappling with the persistent issue of hallucinations, which complicates the development of more advanced models [13][19]
- Companies are exploring various methods to mitigate hallucinations, including retrieval-augmented generation (RAG) and strict data quality control [20][22][23]
- Despite advancements in certain areas, such as multimodal outputs, hallucinations remain a significant challenge in generating long texts or complex visual scenarios [18][19]

Implications of Hallucinations
- Hallucinations are increasingly seen as a common trait among advanced models, raising questions about their reliability and user trust, especially in professional or high-stakes contexts [17][27]
- The phenomenon of hallucinations may also contribute to creativity in AI, as they can lead to unexpected and imaginative outputs [24][26]
- The acceptance of hallucinations as an inherent characteristic of AI models suggests a need for a paradigm shift in how AI is perceived and utilized [27]
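The rates quoted under "Hallucination Rates" are easier to compare side by side. The snippet below tabulates them and computes the R1-to-V3 ratio, which comes to roughly 3.7 times. The dictionary and function names are illustrative, and the summary does not say whether all five figures come from the same benchmark, so cross-vendor comparisons may not be like for like.

```python
# Hallucination rates in percent, as quoted in the summary above.
rates = {
    "DeepSeek-V3": 3.9,
    "DeepSeek-R1": 14.3,
    "Qwen-QwQ-32B-Preview": 16.1,
    "o3": 33.0,
    "o4-mini": 48.0,
}

def ratio(model_a, model_b):
    """How many times higher model_a's rate is than model_b's."""
    return rates[model_a] / rates[model_b]

def point_gap(model_a, model_b):
    """Difference between the two rates in percentage points."""
    return rates[model_a] - rates[model_b]

# Models ordered from lowest to highest quoted rate.
ranking = sorted(rates, key=rates.get)

if __name__ == "__main__":
    print(round(ratio("DeepSeek-R1", "DeepSeek-V3"), 2))
    print(round(point_gap("DeepSeek-R1", "DeepSeek-V3"), 1))
    print(ranking)
```

Reporting both the ratio and the percentage-point gap avoids overstating a change that starts from a small base.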
Are ERP Vendors About to Be Disrupted En Masse?
Huxiu APP · 2025-03-27 10:21
Core Viewpoint
- The traditional ERP systems are expected to decline, but the industry itself will not die. The emergence of AI Agents is set to disrupt the traditional SaaS landscape, leading to a new generation of SaaS solutions that leverage AI capabilities [3][5].

Group 1: Industry Transformation
- The introduction of DeepSeek's strong reasoning capabilities and low-cost, open-source models is anticipated to bring significant disruption to the SaaS industry [4].
- Microsoft CEO's prediction that "AI Agents will replace all SaaS" is becoming a reality, with AI Agents expected to first impact B2B scenarios [5][6].
- Traditional SaaS vendors are urged to adapt to these changes or risk being eliminated from the competitive landscape [4][7].

Group 2: Application in Enterprises
- Use cases for AI Agents in enterprises include automating complex internal processes, such as financial operations and contract management, which can significantly enhance efficiency [9][10].
- Companies like Yonyou have begun implementing AI Agents across various departments, allowing employees with minimal technical background to create intelligent assistants quickly [9][10].
- AI Agents can learn from historical data and improve their accuracy in tasks like revenue recognition, demonstrating the potential for self-learning and efficiency gains in business operations [14][16].

Group 3: Market Dynamics
- The emergence of DeepSeek has altered the competitive dynamics between enterprise service providers and large model vendors, allowing for localized deployment and training of models [19][20].
- The software service providers are now in a stronger position, leveraging their industry expertise to drive innovation and create new applications [20].
- The stock prices of SaaS companies like Yonyou and Kingdee have risen in anticipation of the positive impact of AI Agents on their performance, indicating a potential market recovery for these firms [21].
AI Fabrications Are Flooding the Chinese-Language Internet
Huxiu APP · 2025-03-05 10:03
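The following article is from 阑夕 (Lanxi), by 阑夕. Seeking the light of technology and business.

This article comes from the WeChat public account 阑夕; author: 阑夕; header image: AI-generated.

DeepSeek-R1 is genuinely useful, but once its popularity exploded and it became an AI tool in everyone's hands, it also began to seriously pollute the information environment of the Chinese-language internet. The problem may be hard to avoid, but it deserves attention all the same.

Over the past week alone, among the viral articles I came across, at least three were generated by DeepSeek-R1 and riddled with factual errors. They were convincing enough to pass for the real thing, so many friends believed them and shared them in a state of agitation.

The first example is this highly upvoted answer on Zhihu (posted by 提莫吃蘑菇; 7,036 people upvoted the answer):

My cousin worked as a concept artist at a game company in Hangzhou, where working overtime until ten o'clock was enough to get the company trashed on Maimai and pushed onto the trending list. But last year he moved to Light Chaser Animation, and to meet the deadline for White Snake 2 his team left work at 2 a.m. for three months straight, yet his WeChat Moments were full of work photos captioned "painful but happy". I asked him whether he had been brainwashed, and he did two sums for me:

Working overtime on revisions at the game company, the client can make you change Nezha's Huntian Ling (混天绫) from red to fluorescent pink; on an animated film team, the eye acting you draw can directly decide where tens of millions of viewers tear up. This gap in creative say is bigger than the one between Ao Bing and the shrimp soldiers and crab generals ...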