AI Sweeps Medical Q&A Benchmarks, but Has It Really Won? Oxford Team Pins Down AI's Clinical Shortcomings
36Kr· 2025-05-13 08:04
Core Insights
- The Oxford University study challenges the reliability of AI models in real-world medical scenarios, despite their high performance in controlled environments [1][3][11]
- The research indicates that while AI models like GPT-4o and Llama 3 perform well in isolated tasks, their effectiveness diminishes when interacting with real users [5][10][12]

Group 1: Study Design and Methodology
- The study involved 1,298 participants who were presented with ten real medical scenarios to assess their decision-making regarding symptoms and treatment options [3][5]
- Participants were divided into groups, with one group using AI assistance and the other relying on personal knowledge or search engines [5][10]
- The AI models demonstrated high accuracy in identifying diseases and suggesting treatment options when evaluated independently [3][5]

Group 2: Interaction Challenges
- When real users interacted with AI, the accuracy of disease identification dropped to 34.5%, indicating a significant gap in practical application [5][7]
- The study found that users often failed to provide complete information, leading to misdiagnoses by the AI [7][11]
- Communication between users and AI was identified as a critical failure point, with users misunderstanding or not following AI recommendations [7][9]

Group 3: Implications for AI in Healthcare
- The findings suggest that high scores in controlled tests do not translate to effective real-world applications, highlighting the complexities of human-AI interaction [11][12]
- The study emphasizes the need for improved communication strategies between AI systems and users to enhance the practical utility of AI in medical settings [12]
- The research serves as a reminder that integrating AI into healthcare requires addressing the challenges of human behavior and communication, rather than focusing solely on technological advancements [12]
Meta to start selling its Ray-Ban smart glasses in India from May 19
TechCrunch· 2025-05-13 07:27
Group 1
- Meta's Ray-Ban smart glasses will be available for sale in India starting May 19 at a price of ₹29,990 (approximately $353) [1]
- The smart glasses support Meta AI, which can answer questions, translate audio and video live, send messages, and make calls [1]
- Approximately 2 million pairs of the smart glasses have been sold since their launch in 2023 [2]

Group 2
- The smart glasses currently support live translation for English, French, Italian, and Spanish, but do not yet support Indian languages [2]
- The glasses can connect to music apps such as Spotify, Amazon Music, Shazam, and Apple Music in India [2]
Express | OpenAI's First VC Backer Strikes Again: Khosla Bets $17.5 Million on "Lightweight AI" Startup Fastino, Making AI Training Affordable
Z Potentials· 2025-05-08 05:33
Image source: Fastino

Tech giants often tout trillion-parameter AI models that require massive, expensive GPU clusters, but Fastino is taking a very different approach. The Palo Alto-based startup says it has invented a new type of AI model architecture designed to be small and task-specific. Its models are small enough to be trained on low-end gaming GPUs worth less than $100,000 in total.

The approach is drawing attention. Fastino says it has raised a $17.5 million seed round led by Khosla Ventures, the venture firm that was OpenAI's first venture investor. That brings the startup's total funding to nearly $25 million. Last November it raised $7 million in a pre-seed round led by Microsoft's venture arm M12 and Insight Partners.

"Our models are faster and more accurate, cost a fraction of flagship models to train, and outperform them on specific tasks," said Ash Lewis, Fastino's CEO and co-founder.

Fastino has developed a suite of small models that it sells to enterprise customers. Each model focuses on a specific task a company might need, such as redacting sensitive data or summarizing corporate documents. Fastino has not yet disclosed early metrics or users, but says its performance has impressed early adopters. For example, ...
68-Page Paper Hammers the Large-Model Arena Again! Llama 4 Was Privately Tested in 27 Versions Before Release, with Only the Best Score Reported
量子位· 2025-05-02 04:36
Core Viewpoint
- The credibility of large model rankings, particularly the Chatbot Arena, has been called into question due to systemic issues highlighted in a recent paper titled "The Leaderboard Illusion" [2][3]

Group 1: Issues Identified
- The paper identifies four main issues with the current ranking system [8]
- First, selective reporting and private testing by major model providers (e.g., Meta, Google, Amazon) allow them to disclose only the best-performing versions of their models [10][11]
- This "best-of-N" strategy inflates rankings, since testing multiple variants and publishing only the top one significantly increases the expected score (a toy simulation of this effect follows this summary) [13][14]
- Second, data access is unequal: major providers receive a disproportionate share of user feedback compared to open-source models [23]
- Third, using Arena data for training can lead to significant performance improvements, with a noted increase in win rates as Arena training data usage rises [24][25]
- Fourth, many models are "silently deprecated," with 205 of 243 public models effectively abandoned, which undermines the reliability of rankings [27][28]

Group 2: Recommendations and Responses
- The research team provided five improvement suggestions to enhance the ranking system's credibility [30]
- The official response from LMArena acknowledged some issues but defended the ranking system's integrity, emphasizing that it reflects community preferences [6][34]
- Alternative platforms like OpenRouter are suggested as potential options for more reliable model comparisons [36][37]
- The paper's findings have prompted a reconsideration of relying solely on one ranking system, highlighting the need for diverse benchmarks [35]
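To make the best-of-N concern concrete, here is a minimal, self-contained Python sketch. The ratings and noise values are hypothetical illustrations, not data from the paper: every variant has the same underlying skill, yet publishing only the best of N noisy measurements inflates the expected reported score.

```python
import random
import statistics

def observed_arena_score(true_skill: float, noise_sd: float = 15.0) -> float:
    """One noisy leaderboard measurement of a single model variant."""
    return random.gauss(true_skill, noise_sd)

def expected_best_of_n(true_skill: float, n_variants: int, trials: int = 20_000) -> float:
    """Average reported score when a provider privately tests n variants
    and publishes only the highest-scoring one."""
    best_scores = [
        max(observed_arena_score(true_skill) for _ in range(n_variants))
        for _ in range(trials)
    ]
    return statistics.mean(best_scores)

if __name__ == "__main__":
    random.seed(0)
    TRUE_SKILL = 1200.0  # hypothetical Elo-style rating, identical for all variants
    for n in (1, 3, 10, 27):
        print(f"variants tested: {n:>2}  "
              f"expected published score: {expected_best_of_n(TRUE_SKILL, n):.1f}")
```

Under these assumed numbers, reporting the best of 27 variants yields a published score roughly 30 points above the model's true rating, purely from measurement noise.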
Li Yanhong Says DeepSeek Has a High Hallucination Rate — Is That True?
36Kr· 2025-05-02 04:29
Core Insights
- The article discusses the hallucination problem in large language models (LLMs), focusing on DeepSeek-R1, which has a high hallucination rate compared with its predecessor and other models [2][6][13]
- Li Yanhong criticizes DeepSeek-R1 for its limitations, including a high hallucination rate, slow performance, and high costs, sparking broader discussion of hallucinations in AI models [2][6][19]
- The hallucination phenomenon is not unique to DeepSeek; models such as OpenAI's o3/o4-mini and Alibaba's Qwen3 also exhibit significant hallucination issues [3][8][13]

Summary by Sections

Hallucination Rates
- DeepSeek-R1 has a hallucination rate of 14.3%, significantly higher than DeepSeek-V3's 3.9%, a nearly fourfold increase [6][7]
- Other models, such as Qwen-QwQ-32B-Preview, show even higher hallucination rates, at 16.1% [6][7]
- OpenAI's o3 model has a hallucination rate of 33%, nearly double that of its predecessor o1, while the lightweight o4-mini model reaches 48% [8][10]

Industry Response
- The AI industry is grappling with the persistent issue of hallucinations, which complicates the development of more advanced models [13][19]
- Companies are exploring various mitigation methods, including retrieval-augmented generation (RAG) and strict data quality control; a minimal RAG sketch follows this summary [20][22][23]
- Despite advances in areas such as multimodal output, hallucinations remain a significant challenge when generating long texts or complex visual scenarios [18][19]

Implications of Hallucinations
- Hallucinations are increasingly seen as a common trait of advanced models, raising questions about reliability and user trust, especially in professional or high-stakes contexts [17][27]
- Hallucinations may also contribute to creativity in AI, as they can lead to unexpected and imaginative outputs [24][26]
- Accepting hallucinations as an inherent characteristic of AI models suggests a needed shift in how AI is perceived and used [27]
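As a rough illustration of the retrieval-augmented generation idea mentioned above, the following Python sketch ranks candidate passages with a toy bag-of-words similarity and builds a prompt that instructs the model to answer only from the retrieved context. The documents, similarity measure, and prompt wording are illustrative assumptions, not any vendor's actual pipeline; a real RAG system would use a dense embedding model and an actual LLM call.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a production system would use a dense encoder."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_grounded_prompt(question: str, documents: list[str], top_k: int = 2) -> str:
    """Retrieve the most relevant passages and ask the model to answer only from them."""
    q_vec = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q_vec, embed(d)), reverse=True)
    context = "\n".join(f"- {doc}" for doc in ranked[:top_k])
    return (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say you do not know.\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

if __name__ == "__main__":
    docs = [
        "DeepSeek-V3 was measured at a 3.9% hallucination rate on a summarization benchmark.",
        "DeepSeek-R1 was measured at a 14.3% hallucination rate on the same benchmark.",
        "GB200 NVL72 is an NVIDIA rack-scale server system.",
    ]
    print(build_grounded_prompt("What hallucination rate was measured for DeepSeek-R1?", docs))
```

Grounding the prompt in retrieved text does not eliminate hallucination, but it gives the model source material to stay anchored to and makes unsupported answers easier to detect.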
A Shocking Scandal in the AI World: Is Meta Confirmed to Be Gaming the Scores? A Top Leaderboard's Dark Side Exposed, Stanford and MIT Researchers Lash Out
猿大侠· 2025-05-02 04:23
Core Viewpoint
- The LMArena ranking system is under scrutiny for potential manipulation by major AI companies, with researchers alleging that these companies have exploited the system to inflate their models' scores [1][2][12]

Group 1: Allegations of Manipulation
- A recent paper from researchers at institutions including Stanford and MIT claims that AI companies are gaming the LMArena rankings, using tactics that boost their scores at the expense of competitors [2][12]
- The paper analyzed 2.8 million battles across 238 models from 43 providers, finding that preferential policies benefiting certain companies led to overfitting on specific metrics rather than genuine AI advancement [13][14]
- Researchers noted that a lack of transparency in the testing mechanism allowed some companies to test multiple model variants privately and selectively withdraw low-scoring ones, creating a biased ranking system [16][17]

Group 2: Data Disparities
- Closed-source commercial models, such as those from Google and OpenAI, participated in LMArena far more frequently than open-source models, leading to long-term inequality in data access [27][30]
- Google's and OpenAI's models accounted for approximately 19.2% and 20.4% of all user battle data on LMArena, respectively, while 83 open-source models collectively represented only 29.7% [33]
- Data availability can significantly affect model performance, with estimates suggesting that even limited additional data could yield up to a 112% relative performance improvement [36][37]

Group 3: Proposed Changes
- The paper outlines five changes needed to restore trust in LMArena: full disclosure of all tests, limits on the number of variants, fairness in model removal, equitable sampling, and greater transparency [40]
- LMArena's management has been urged to revise its policies to address these concerns and improve the integrity of the ranking system [38][39]

Group 4: Official Response
- LMArena has responded to the allegations, claiming that the paper contains numerous factual errors and misleading statements, and asserting that it strives to treat all model providers fairly [41][42]
- The organization emphasized that its policies regarding model testing and ranking have been publicly shared and that it has consistently aimed to maintain transparency [50][51]

Group 5: Future Directions
- Andrej Karpathy, a prominent figure in AI, expressed skepticism about LMArena's integrity and recommended OpenRouterAI as a potential alternative ranking platform that may be less susceptible to manipulation [51][56]
- LMArena's evolution from a student project to a widely scrutinized ranking system highlights the challenge of maintaining objectivity amid increasing corporate interest and investment in AI technologies [58][60]
CoreWeave Deploys NVIDIA GB200 Servers at Scale
news flash· 2025-04-17 06:00
Core Point
- CoreWeave has become one of the first cloud service providers to deploy NVIDIA's GB200 NVL72 systems at scale, with Cohere, IBM, and Mistral AI as initial users [1]

Performance Improvement
- According to the latest MLPerf benchmark tests, the new systems offer a 2- to 3-fold performance improvement over the previous-generation H100 chips, significantly enhancing large-model training and inference capabilities [1]
A World First! NVIDIA's "Favorite Child" CoreWeave Deploys GB200 Servers at Scale
硬AI· 2025-04-16 09:52
Test results show that, compared with previous-generation NVIDIA Hopper GPUs, GB200 NVL72 servers helped Cohere achieve up to a 3x performance improvement when training a 100-billion-parameter model; IBM and Mistral AI have also become the first users of CoreWeave's GB200 cloud service.

"Enterprises and organizations around the world are racing to turn reasoning models into agentic AI applications that will change how people work and play."

Author | Li Xiaoyin    Editor | 硬AI

CoreWeave has once again seized the lead, becoming the first to deploy NVIDIA's GB200 systems as AI giants rush to follow. NVIDIA announced on its blog today that AI cloud provider CoreWeave has become one of the first cloud service providers to deploy NVIDIA GB200 NVL72 systems at scale; Cohere, IBM, and Mistral AI are the first users.

According to the latest MLPerf benchmarks, these systems deliver a 2-3x performance improvement over the previous-generation H100 chips, significantly accelerating large-model training and inference.

CoreWeave CEO Michael Intrator said the achievement demonstrates both the company's engineering strength and speed of execution, and its focus on next-generation AI development:

"CoreWeave is built to move faster — time and again we ...
NVIDIA Dynamo Open-Source Library Accelerates and Scales AI Reasoning Models
Globenewswire· 2025-03-18 18:17
Core Insights
- NVIDIA has launched NVIDIA Dynamo, an open-source inference software aimed at enhancing AI reasoning models' performance and cost efficiency in AI factories [1][3][13]
- The software is designed to maximize token revenue generation by orchestrating inference requests across a large fleet of GPUs, significantly improving throughput and reducing costs [2][3][4]

Performance Enhancements
- NVIDIA Dynamo doubles the performance and revenue of AI factories using the same number of GPUs when serving Llama models on the NVIDIA Hopper platform [4]
- The software's intelligent inference optimizations can increase the number of tokens generated per GPU by over 30 times when running the DeepSeek-R1 model [4]

Key Features
- NVIDIA Dynamo includes several innovations, such as a GPU Planner for dynamic GPU management, a Smart Router to minimize costly recomputation, a Low-Latency Communication Library for efficient data transfer, and a Memory Manager for cost-effective data handling (a toy sketch of the cache-aware routing idea follows this summary) [14][15]
- The platform supports disaggregated serving, allowing the different computational phases of large language models to be optimized independently across different GPUs [9][14]

Industry Adoption
- Major companies like Perplexity AI and Together AI plan to leverage NVIDIA Dynamo for enhanced inference-serving efficiency and to meet the compute demands of new AI reasoning models [8][10][11]
- The software supports various frameworks, including PyTorch and NVIDIA TensorRT, facilitating its adoption across enterprises, startups, and research institutions [6][14]
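The "Smart Router" idea — sending a request to the worker that can reuse the most already-computed KV cache so the prefill does not have to be redone — can be illustrated with a toy Python sketch. This is a conceptual simplification under assumed data structures, not Dynamo's actual API; a real router also has to weigh load balancing, cache eviction, and token-level prefix matching.

```python
from dataclasses import dataclass, field

@dataclass
class Worker:
    """A hypothetical inference worker that keeps KV caches for previously seen prompt prefixes."""
    name: str
    cached_prefixes: set = field(default_factory=set)

def reusable_prefix_length(worker: Worker, prompt: str) -> int:
    """Length of the longest cached prefix of `prompt` this worker could reuse."""
    return max((len(p) for p in worker.cached_prefixes if prompt.startswith(p)), default=0)

def route(prompt: str, workers: list) -> Worker:
    """Send the request to the worker with the most reusable KV cache."""
    best = max(workers, key=lambda w: reusable_prefix_length(w, prompt))
    best.cached_prefixes.add(prompt)  # after serving, this worker caches the full prompt
    return best

if __name__ == "__main__":
    workers = [Worker("gpu-node-0"), Worker("gpu-node-1")]
    workers[0].cached_prefixes.add("You are a helpful assistant.")  # a shared system prompt
    chosen = route("You are a helpful assistant. Summarize this earnings report.", workers)
    print(f"routed to {chosen.name}")  # gpu-node-0: it can skip recomputing the system-prompt prefill
```

Routing by cache overlap is why prefix-heavy workloads — shared system prompts, multi-turn chats — benefit most from this kind of orchestration.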
Express | NVIDIA Is Building an AI Empire: From GPU Hegemon to Startup Harvester
Z Potentials· 2025-03-17 13:14
Core Insights
- Nvidia has dramatically capitalized on the AI revolution, with significant increases in revenue, profitability, and cash reserves since the launch of ChatGPT two years ago [1]
- The company has accelerated investments in AI startups, reinforcing its market position in GPUs and CUDA [2]

Investment Activities
- In 2024, Nvidia participated in 49 rounds of financing for AI companies, a substantial increase from 34 rounds in 2023 and only 38 rounds over the previous four years [3]
- Nvidia's corporate investment aims to support startups considered "game changers and market creators" in order to expand the AI ecosystem [3]

Notable Investments
- Nvidia invested $100 million in OpenAI during a $6.6 billion funding round, raising OpenAI's valuation to $157 billion [5]
- The company also participated in a $6 billion funding round for Elon Musk's xAI, despite prior commitments not to invest in direct competitors [5]
- Nvidia was a major investor in Inflection's $1.3 billion funding round, which led to a significant technology licensing deal with Microsoft [6]
- In May 2024, Nvidia co-invested in a $1 billion round for Scale AI, which provides data-labeling services for training AI models, raising the company's valuation to nearly $14 billion [6]

Million-Dollar Club
- Nvidia participated in a $686 million funding round for Crusoe, a startup building data centers for major tech companies [7]
- In February 2024, Nvidia joined a $675 million funding round for Figure AI, raising the company's valuation to $2.6 billion [7]

Over $100 Million Transactions
- Nvidia participated in a $155 million funding round for Ayar Labs, which focuses on developing optical interconnect technology to enhance AI computing efficiency [12]
- The company also invested in Weka's $140 million funding round, raising the company's valuation to $1.6 billion [13]