机器之心
vLLM Team Officially Announces a Startup: $150 Million Raised, Tsinghua Special Scholarship Winner Kaichao You Joins as Co-Founder
机器之心· 2026-01-23 00:45
Editor | 泽南

vLLM, the cornerstone of large-model inference, has now become a startup. In the early hours of Friday, Beijing time, news broke that Inferact, an AI startup founded by the creators of the open-source software vLLM, has officially launched, raising $150 million (roughly 1 billion RMB) in a seed round at a valuation of $800 million.

The company believes the biggest challenge facing the AI industry going forward is not building new models, but running existing models at low cost and with high reliability.

Unsurprisingly, Inferact is built around the open-source project vLLM, launched in 2023 to help enterprises run AI models efficiently on data-center hardware.

[GitHub page screenshot: vllm-project/vllm, 68.2k stars, 12.8k forks, 1.7k issues, 1.4k pull requests] ...
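For context on what the project does, here is a minimal offline-inference sketch using vLLM's public Python API; the model name is just the quickstart placeholder, and any Hugging Face checkpoint that vLLM supports could be substituted:

```python
# Minimal sketch of vLLM's offline inference API (quickstart-style usage).
from vllm import LLM, SamplingParams

prompts = [
    "Explain why PagedAttention reduces KV-cache memory waste.",
    "Summarize the benefits of continuous batching in one sentence.",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

# LLM() loads the weights and builds the inference engine.
llm = LLM(model="facebook/opt-125m")

# generate() batches all prompts through the engine in one call.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```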
A Quick Tour Through the History of "3D Scene Representation" in Robotics
机器之心· 2026-01-23 00:45
Core Viewpoint
- The article discusses the rapid development of robotics and the need for robots to understand the world much as humans do, focusing on the various scene representation methods used in robotics [2][4].

Group 1: Historical Development of 3D Scene Representation
- The integration of deep learning, computer graphics, and robotics has led to significant advances, with Neural Radiance Fields (NeRF), 3D Gaussian Splatting, and foundation models emerging as promising routes toward general embodied intelligence [8].

Group 2: Types of Scene Representation
- Point Cloud: Represents scenes using discrete 3D points obtained from LiDAR or camera sensors [10].
- Voxel: Discretizes 3D space into regular cubic grids, storing information such as density and occupancy [10].
- Mesh: Constructs continuous geometric representations of scenes through triangulated surfaces, offering higher detail [10].
- Signed Distance Function (SDF): Represents the distance from spatial points to object surfaces for a continuous geometric representation (see the sketch at the end of this entry) [10].

Group 3: Applications in Robotics
- In mapping and localization, existing methods have achieved remarkable SLAM results, with neural scene representations enabling more precise and dense modeling that benefits obstacle avoidance [15].
- In the manipulation module, traditional methods excel in real-time performance and computational efficiency for grasping tasks, while neural-network-based representations show better generalization on complex tasks [15].
- Navigation tasks benefit from neural scene representations, which provide accurate environmental reconstruction and better integration of semantic and language information for complex navigation [16].

Group 4: Challenges and Future Directions
- The article identifies three main challenges:
  1. End-to-end general networks versus modular systems, highlighting the limitations of modular intelligence in generalization and transferability [19].
  2. Data scarcity in robotics compared with large language models, which hinders the development of neural scene representations and foundation models [20].
  3. Real-time performance bottlenecks in deploying neural scene representations, with a focus on cloud-based versus onboard deployment strategies [21].

Group 5: Contributions and Resources
- The article provides a comprehensive and up-to-date review of scene representation methods in robotics, detailing the advantages of different representations for each module [22].
- It highlights future research directions to address current technical limitations and encourages further advances in this rapidly evolving field [22].
- An open-source GitHub project has been launched to compile relevant articles and to keep adding new research findings in robotics [22].
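As a concrete illustration of the SDF representation listed above (not taken from the surveyed papers), the following is a minimal sketch of a signed distance function for a sphere; the function and variable names are illustrative:

```python
import numpy as np

def sphere_sdf(points: np.ndarray, center: np.ndarray, radius: float) -> np.ndarray:
    """Signed distance from each query point to a sphere surface.

    Negative inside the object, zero on the surface, positive outside,
    which is the sign convention that makes SDFs useful for collision checks.
    """
    return np.linalg.norm(points - center, axis=-1) - radius

# Query a few points against a unit sphere at the origin.
queries = np.array([[0.0, 0.0, 0.0],   # center  -> -1.0 (inside)
                    [1.0, 0.0, 0.0],   # surface ->  0.0
                    [2.0, 0.0, 0.0]])  # outside ->  1.0
print(sphere_sdf(queries, center=np.zeros(3), radius=1.0))
```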
Hallucination Rate Under 3%: Wang Xiaochuan Makes the Doctor's Version of DeepSeek Free
机器之心· 2026-01-22 11:00
Editor | 泽南

In healthcare, a field with extremely low tolerance for error, large models are no longer "imagining" answers out of thin air; they have become rigorous, reliable, and able to cite sources and search. Baichuan's newly released model marks a milestone breakthrough.

On Thursday, Baichuan Intelligence officially released its next-generation large model, Baichuan-M3 Plus, aimed at medical application developers and pushing medical reasoning in real-world scenarios to a new level. Alongside the model release, the 百小应 app and web version connected to M3 Plus also went live.

No large model in the AI field has previously reached medical-scenario accuracy as high as M3 Plus, and Baichuan has also substantially improved inference efficiency. The release of M3 Plus marks AI in healthcare crossing the key threshold of being trusted enough to use, good enough to use, and affordable enough to use.

Baichuan founder and CEO Wang Xiaochuan said that within this vertical domain, M3 Plus can already be regarded as a doctor's version of ChatGPT or DeepSeek; as the strongest-performing and most inference-efficient model, it can be deployed at scale for AI-assisted healthcare.

The world's lowest hallucination rate: from merely looking right to actually being right

Doctors and patients have long held contradictory attitudes toward AI: people hope it can shoulder heavy workloads, yet fear it will "talk nonsense with a straight face." Trust is the last wall AI must break through to enter medicine.

At the launch event, Baichuan's head of model technology, 鞠 ...
Tsinghua Yao Class Alumnus Zhuang Liu's Team Delivers Again: Normalization-Free Transformers Evolve in Performance
机器之心· 2026-01-22 11:00
Editors | 陈陈, 冷猫

The normalization-free Transformer from Zhuang Liu's team has a new version.

LayerNorm has long been a near-standard component of the Transformer architecture, but it has clear drawbacks, such as high compute and memory-access costs, especially during large-model inference.

"Normalization-free" Transformers have therefore been a long-standing goal for researchers, but progress has been blocked by two difficulties: unstable training, and performance clearly below that of normalized models.

The new paper proposes a very simple activation layer, Derf (Dynamic erf), which allows normalization-free Transformers not only to train stably but also to outperform standard LayerNorm Transformers across multiple settings.

Zhuang Liu shared the result on his X account, describing it as a new paper on stronger normalization-free Transformers: the team proposes Derf (Dynamic erf), a structurally very simple point-wise layer. With Derf, Transformers that use no normalization layers at all can train stably and, in practice, already surpass traditional Transformers that rely on LayerNorm and ...
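The paper's exact formulation is not reproduced in this digest. As a rough sketch only, if Derf follows the same template as the team's earlier Dynamic Tanh (DyT) layer with erf substituted for tanh, a point-wise replacement for LayerNorm might look like the following; the class name, parameterization, and initialization are all assumptions:

```python
import torch
import torch.nn as nn

class Derf(nn.Module):
    """Hypothetical point-wise "Dynamic erf" layer (a sketch, not the paper's code).

    Assumes the DyT template with erf in place of tanh:
        y = gamma * erf(alpha * x) + beta
    where alpha is a learnable scalar and gamma, beta are per-channel vectors.
    """

    def __init__(self, dim: int, alpha_init: float = 1.0):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(alpha_init))  # scalar gain on the input
        self.gamma = nn.Parameter(torch.ones(dim))            # per-channel scale
        self.beta = nn.Parameter(torch.zeros(dim))            # per-channel shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # erf bounds activations smoothly, playing the stabilizing role that
        # LayerNorm's statistics would otherwise provide, at point-wise cost.
        return self.gamma * torch.erf(self.alpha * x) + self.beta

# Used where a Transformer block would otherwise place nn.LayerNorm(dim).
layer = Derf(dim=768)
print(layer(torch.randn(2, 16, 768)).shape)  # torch.Size([2, 16, 768])
```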
Apple Enters the AI Pin Race, Possibly Taking On OpenAI: Can It Break the "E-Waste" Curse?
机器之心· 2026-01-22 11:00
Core Viewpoint
- Apple is reportedly developing an AI-driven wearable "Pin" device, which is still in the early stages of development and may not be released until 2027 [1].

Group 1: Product Specifications
- The device is expected to be similar in size to the AirTag, featuring a "thin, flat circular" design made of aluminum and glass. It will include a standard lens and a wide-angle lens for environmental sensing, three microphones, a speaker, a physical button on the side, and support for wireless charging, potentially using a magnetic induction charging interface similar to the Apple Watch [3].

Group 2: Market Context and Competition
- Apple's entry into the AI Pin hardware market could revitalize interest, especially given the challenges previously faced by companies like Humane, which aimed to create a smartphone replacement but suffered significant product failures and was ultimately acquired by HP for $116 million [5][10].
- Humane's AI Pin, launched in November 2023 at a price of $699 with a monthly subscription fee of $24, saw disappointing sales of only 10,000 units by summer 2024, far below its target of 100,000 units [7][8].
- The failure of Humane's product was attributed to multiple factors, including immature technology, high development costs, and an exorbitant price point [10].

Group 3: Future Prospects
- The AI hardware market is seen as poised for growth, with AI wearable devices such as AI glasses and AI headphones being developed as potential next-generation interaction points [10].
- Apple is reportedly accelerating development of its AI Pin to compete with OpenAI's upcoming wearable device, which is expected to launch in the second half of 2026 [10][11].
- OpenAI has hinted at a new hardware device that promises simplicity and ease of use, although specific details remain undisclosed [11].
Meta's New Models Are Coming, but Who Takes the Fall for Llama 4? A Joint Report From 1,300-Plus Authors Arrives
机器之心· 2026-01-22 08:13
Core Insights
- Meta's newly established AI team has delivered its first key models internally this month, as stated by CTO Andrew Bosworth, who described the models as "very good" [1].
- The company is developing a text AI model codenamed Avocado, expected to be released in Q1, and an image and video AI model codenamed Mango [1].
- A technical report titled "Llama 4 Herd: Architecture, Training, Evaluation, and Deployment Notes" has been uploaded to arXiv, reviewing the data and technical achievements claimed by the Meta Llama 4 series [1][5].

Summary by Sections

Technical Report Overview
- The report includes contributions from over 1,300 authors, indicating a collaborative effort from the Llama 4 team, despite some contributors having left Meta [4].
- It emphasizes that the document is an independent investigation of publicly available materials, with benchmark values attributed to model cards [4].

Model Performance and Limitations
- The report highlights a gap between the architectural capabilities of the models and their actual deployment performance, particularly regarding context length [4][7].
- It notes that while the architecture supports a context length of 10 million tokens, practical deployment often limits this due to hardware constraints [7].

Controversies and Criticisms
- The report addresses criticisms of the Llama 4 series, particularly the discrepancies between leaderboard performance and real-world application [8][11].
- It notes that the experimental variant submitted to the LMArena leaderboard differs from the publicly released version, leading to accusations of "gaming AI benchmarks" [11].
- Marketing claims made in announcements should be distinguished from rigorous model card benchmark results, as some statements are categorized as "marketing-facing claims" [11].

Model Variants and Features
- The report summarizes the released model variants, including Llama 4 Scout and Llama 4 Maverick, detailing their architectures, active parameters, modalities, and supported languages [9][10].
- It also discusses the training disclosures and deployment limitations observed in major service environments [12].
AAAI Outstanding Papers Announced! HKUST, Tongji, Zhejiang Normal University, and Other Chinese Universities Among the Winners
机器之心· 2026-01-22 08:13
Editors | 张倩, 陈陈

The AAAI 2026 website has just announced this year's Outstanding Paper awards (equivalent to best paper). Five papers were selected, three of them led by Chinese teams, with authors from the Hong Kong University of Science and Technology (Guangzhou), Westlake University, Zhejiang University, Tongji University, Zhejiang Normal University, City University of Hong Kong, and other Chinese institutions.

AAAI, organized by the Association for the Advancement of Artificial Intelligence, is one of the longest-running and broadest top-tier international academic conferences in AI, is a CCF-recommended Class A international conference, and is held annually.

AAAI 2026 is being held January 20-27 in Singapore, with 23,680 total submissions and 4,167 accepted papers, for an acceptance rate of 17.6%.

The award-winning papers are detailed below.

In recent years, advances in vision-language-action (VLA) models have enabled robotic agents to combine multimodal understanding with action execution. However, empirical analysis shows that existing VLA models still have clear difficulty allocating visual attention to target regions, and their attention tends to be scattered.

To guide effective grounding of visual attention on the correct targets, the authors propose ReconVLA, a reconstructive VLA model with an implicit alignment paradigm.

Paper 1: ReconVLA: Reconstructive Visio ...
Refuse to Fall Behind as a Developer: Build Your 10x-Productivity Toolbox With TRAE Skills
机器之心· 2026-01-22 04:05
Core Insights
- The article emphasizes the emergence of "Skill" as a pivotal concept in the AI programming field, marking a transition toward "experience assetization" and the standardization of professional capabilities [3][44].
- The introduction of Skill allows complex instructions and resources to be encapsulated into reusable professional skill packages, enhancing productivity across various work scenarios [3][8].

Group 1: Definition and Functionality of Skill
- Skill is defined as a "professional skill package," represented by a SKILL.md file that contains the detailed instructions, automation scripts, and template resources necessary for specific tasks [10][15].
- The dynamic calling mechanism of Skill addresses the core pain point of token consumption and task focus in AI programming, allowing for efficient use of resources [15][16].

Group 2: Evolution and Integration of Skill
- The integration of Skill into platforms like TRAE signifies a shift from AI tools as assistants to digital employees, enabling developers to create reusable workflows [7][8].
- TRAE's Skill functionality allows users to easily configure and utilize skills, even with no coding background, thus democratizing access to advanced AI capabilities [19][21].

Group 3: Practical Applications and Impact
- The article illustrates how Skill can significantly enhance productivity, with examples showing TRAE's ability to automate tasks and improve efficiency in real-world scenarios [18][24].
- The potential for Skill to serve as a personal digital assistant is highlighted, enabling users to streamline tasks such as file management and content generation [40][41].

Group 4: Future Outlook and Opportunities
- The article suggests that mastering and building a personal "skill library" will be crucial for developers to adapt to the evolving AI landscape and achieve significant productivity gains [44].
- TRAE's recent promotional offerings aim to lower the barriers for users to experiment with Skill, encouraging broader adoption and innovation within the community [41][42].
Are the First-Tier Large Models Safe? Fudan, the Shanghai Innovation Institute, and Others Release a Frontier Model Safety Report Covering Six Leading Models
机器之心· 2026-01-22 04:05
As large language models accelerate toward multimodal and agentic forms, traditional safety evaluation systems centered on single dimensions can no longer cover the complex risk landscape of the real world. In 2026, as model capabilities keep leaping forward, developers and users are increasingly focused on one core question: how safe are frontier large models, really?

Against this backdrop, a research team from Fudan University, the Shanghai Innovation Institute, Deakin University, and the University of Illinois Urbana-Champaign jointly released this safety evaluation report. Covering six frontier models, GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5, it builds a unified safety evaluation framework spanning three core scenarios: language, vision-language, and image generation, providing a systematic, panoramic picture of the safety capabilities of today's mainstream large models. The evaluation design combines four key dimensions into a multi-level, multi-faceted safety assessment system.

Through this comprehensive evaluation, the report reveals the safety boundaries of frontier large models across application scenarios, threat models, and regulatory contexts, offering a reference for industrial deployment and policymaking.

Paper link: https://arxiv.org/pdf/2601.10527

Project page: https://xsafeai.github.io/AI-safety-report/

Statement: ...
Call for Papers Opens for the Second CV4CHL Workshop at CVPR 2026: Safeguarding Children's Futures With Large AI Models
机器之心· 2026-01-22 03:13
In recent years, with the rapid development of multimodal large language models, embodied AI, and related technologies, applications have landed in nearly every aspect of daily life, yet AI and computer vision technology targeting child development, health, and education is still in its infancy.

CV4CHL (Workshop on Computer Vision for Children), the successor to the first Workshop on AI for Children (AI4CHL) at ICLR 2025, will be held during CVPR 2026 and is organized by PediaMed AI (儿医智能), North America's first pediatric AI startup, together with the University of Illinois Urbana-Champaign, the Hong Kong University of Science and Technology (Guangzhou), ETH Zurich, Shenzhen Children's Hospital, and other well-known universities and research institutes. Its goal is to further bring together multidisciplinary perspectives on AI and computer vision solutions for children and pediatrics, filling key gaps in the field.

The workshop aims to build an interdisciplinary bridge, gathering computer vision researchers, large-model experts, pediatricians, psychologists, and educators to discuss innovative applications and ethical challenges of frontier computer vision and AI in children's scenarios.

The workshop will feature multiple keynote talks; during the event, PediaMed AI will also release related pediatric AI products and, together with the University of Illinois Urbana-Champaign, organize a roundtable discussion on future directions for children's AI, jointly ...