LeCun's startup, with zero products, valued at 24.7 billion; he responds to Saining Xie joining
量子位· 2026-01-23 07:44
Group 1
- The core viewpoint of the article is that Yann LeCun, after leaving Meta, is launching a new company called Advanced Machine Intelligence (AMI), focusing on world models rather than large language models (LLMs) as the path to human-level intelligence [9][17][20]
- LeCun criticizes Meta's product development decisions, stating that while its research is acceptable, its product execution has been poor, particularly under Mark Zuckerberg's leadership [2][3][15]
- AMI aims to be an open-source platform, contrasting with the recent Silicon Valley trend towards closed-source models, which LeCun regards as a misguided approach [11][13][16]

Group 2
- The company will initially focus on research and development, specifically on world models, which LeCun argues are essential for building intelligent systems [17][19]
- LeCun emphasizes that LLMs are not equivalent to AI and that understanding the real world, something LLMs struggle to do, is crucial for achieving human-like intelligence [21][22][23]
- AMI is seeking to raise €30 million in funding, with an initial goal of €3.5 million for early financing and a first-round target of €5 million; the 24.7 billion figure in the headline refers to the company's reported valuation despite having no product yet, not the amount raised [45][46][50]

Group 3
- The company has already attracted interest from potential investors, including Cathay Innovation and Hiro Capital, indicating a shift in venture-capital logic towards valuing founders over products [52][53][54]
- LeCun is actively recruiting talent, including former Meta executives, to strengthen AMI's capabilities [40][42]
- The ultimate goal of AMI is to become a leading supplier of intelligent systems, with a focus on practical applications of world models and planning capabilities [38][39]
The vLLM team starts a company with a 1.05 billion RMB seed round! Tsinghua Special Scholarship winner Kaichao You joins
量子位· 2026-01-23 05:03
Core Insights
- The article covers the founding of Inferact, a new company started by the core team behind the open-source inference framework vLLM, which has raised $150 million in seed funding at a valuation of $800 million [1][2][7]

Funding and Market Trends
- The $150 million seed round marks a new high for AI infrastructure funding and is one of the largest seed rounds in history [2]
- Investors highlight a shift in focus from training to inference as AI applications mature, with a growing need to run existing models cheaply and reliably [4][9]

Company Mission and Strategy
- Inferact aims to address the "inference bottleneck" by building a next-generation commercial engine for large-scale deployment challenges [5]
- The company plans a dual approach: supporting vLLM as an independent open-source project while developing commercial products that improve hardware efficiency for AI model deployment [12][14]

Technology and Market Validation
- vLLM has already been deployed in real-world industrial environments, including Amazon's core shopping application, validating its stability under high concurrency [10][11]
- Demand for low-cost, reliable operation of existing models has outpaced expectations for new model development [9]

Founding Team and Expertise
- CEO Simon Mo has a background in machine-learning systems design and was an early engineer at Anyscale, bringing experience in turning research into industrial-grade products [26][27]
- Co-founder Woosuk Kwon, a PhD from UC Berkeley, contributed key innovations to vLLM, including the PagedAttention algorithm [30][31]
- The team also includes Kaichao You, a Tsinghua Special Scholarship winner, along with experienced advisors from academia and industry, strengthening the company's technical and strategic capabilities [33][36]
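For readers unfamiliar with what vLLM actually provides, the sketch below shows its public offline-inference Python API in minimal form; the model name, prompts, and sampling settings are illustrative placeholders rather than anything taken from the article.

```python
# Minimal vLLM offline-inference sketch (model name and prompts are placeholders).
from vllm import LLM, SamplingParams

# The engine loads weights once and manages the KV cache with PagedAttention,
# so many requests can be batched efficiently on the same GPU.
llm = LLM(model="facebook/opt-125m")

sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

prompts = [
    "Explain in one sentence what an inference engine does.",
    "Why is serving a model different from training it?",
]

# generate() runs the prompts through continuous batching and returns
# one RequestOutput per prompt.
for output in llm.generate(prompts, sampling):
    print(output.prompt, "->", output.outputs[0].text.strip())
```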
Goodbye "god's-eye view": robots lock onto 3D targets precisely from just a few images, SOTA on a new benchmark
量子位· 2026-01-23 05:03
Core Insights
- The article discusses the challenges embodied intelligent agents face in understanding 3D environments from limited, sparse visual data, and proposes a new task, Multiview 3D Referring Expression Segmentation (MV-3DRES), to address these issues [4][10][30]

Group 1: Problem Statement
- Embodied intelligent agents often lack a comprehensive view of their surroundings, relying on sparse RGB images that lead to incomplete and noisy 3D reconstructions [2][9]
- Existing 3D referring segmentation methods are based on idealized assumptions of dense and reliable point-cloud inputs, which do not reflect real-world conditions [3][9]

Group 2: Proposed Solution
- A new solution, MVGGT (Multimodal Visual Geometry Grounded Transformer), is introduced; it uses a dual-branch architecture combining geometric and language features to enhance 3D scene understanding and segmentation [4][11]
- The architecture pairs a frozen geometric-reconstruction branch that provides stable 3D geometric priors with a trainable multimodal branch that integrates language instructions with visual features [13][15]

Group 3: Optimization Strategy
- The research identifies a core optimization challenge, Foreground Gradient Dilution (FGD), in which the sparse representation of target instances complicates training [20][18]
- To address this, the team introduces the PVSO (Per-View No-Target Suppression Optimization) strategy, which amplifies meaningful gradient signals from effective views while suppressing misleading signals from no-target views (see the sketch after this summary) [22][18]

Group 4: Experimental Results
- The team developed a benchmark dataset, MVRefer, to evaluate the MV-3DRES task, simulating scenarios with eight randomly collected sparse views [23][24]
- Experimental results demonstrate that MVGGT significantly outperforms existing baseline methods across various metrics, particularly in challenging scenarios where target pixel ratios are low [25][26]

Group 5: Practical Implications
- The work emphasizes the practical significance of aligning 3D grounding with real-world perception conditions, providing new directions for enhancing the perception capabilities of embodied intelligence in constrained environments [30]
- The research team invites further exploration and improvements based on the established benchmark to advance the field of sparse perception in embodied intelligence [30]
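The summary above does not include implementation details for PVSO, but the per-view weighting idea it describes can be sketched roughly as follows. This is a hypothetical PyTorch illustration under assumed tensor shapes and an assumed weighting rule, not the paper's actual code.

```python
# Hypothetical sketch of per-view loss weighting in the spirit of PVSO:
# views that actually contain the referred target keep their full gradient,
# while no-target views are down-weighted so their background-only signal
# cannot dilute the sparse foreground gradients.
import torch
import torch.nn.functional as F

def per_view_weighted_loss(logits, masks, no_target_weight=0.1):
    """logits, masks: (num_views, H, W); masks are float 0/1 ground truth."""
    losses, weights = [], []
    for view_logits, view_mask in zip(logits, masks):
        loss = F.binary_cross_entropy_with_logits(view_logits, view_mask)
        has_target = view_mask.sum() > 0
        # Amplify signal from views that see the target, suppress the rest.
        weights.append(1.0 if has_target else no_target_weight)
        losses.append(loss)
    weights = torch.tensor(weights)
    return (torch.stack(losses) * weights).sum() / weights.sum()

# Example with 8 sparse views of a 32x32 prediction map.
logits = torch.randn(8, 32, 32, requires_grad=True)
masks = torch.zeros(8, 32, 32)
masks[2, 10:14, 10:14] = 1.0  # the target is visible in only one view
per_view_weighted_loss(logits, masks).backward()
```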
A GPU company heavily backed by Tencent is heading for an IPO! Enflame Technology (燧原科技) has its IPO application accepted, planning to raise 6 billion RMB, all in on R&D
量子位· 2026-01-23 05:03
henry, from Aofeisi. 量子位 | Official account: QbitAI

The first newly accepted A-share IPO application of 2026 goes to a domestic GPU maker. The Shanghai Stock Exchange has just disclosed that the STAR Market IPO application of Shanghai Enflame Technology Co., Ltd. (上海燧原科技股份有限公司) has been accepted, with planned fundraising of 6 billion RMB.

Review timeline: Accepted (2026-01-22) → Inquiry → Listing committee meeting → Registration submitted → Registration result

Project information from the SSE filing page:
| Item | Detail |
| --- | --- |
| Full company name | 上海燧原科技股份有限公司 (Shanghai Enflame Technology Co., Ltd.) |
| Short name | 燧原科技 (Enflame Technology) |
| Acceptance date | 2026-01-22 |
| Funds to be raised (100 million RMB) | 60.00 |
| Review status | Accepted |
| Last updated | 2026-01-22 |
| Sponsor institution | ... |
QbitAI (量子位) is hiring editors and writers
量子位· 2026-01-22 11:13
Editorial team, from Aofeisi. 量子位 | Official account: QbitAI

The AI boom is still surging, but if you don't yet know how to take part... why not join 量子位? We are a content platform centered on tracking new developments in AI. After eight years, we have top-tier influence, broad and well-recognized industry resources, and one of the best vantage points for observing and learning at the frontier of this era.

We are currently hiring in three directions, and we hope you are (or can become) a content expert in one of them. All positions are full-time, based in Zhongguancun, Beijing.

The roles cover:
- AI industry: infrastructure-layer innovation, including chips, AI Infra, and cloud computing;
- AI finance: venture capital and earnings reports in the AI field, tracking capital flows across the industry chain;
- AI products: progress in AI applications and hardware devices.

Experienced hires: editor, lead writer, and editor-in-chief levels, matched to ability. Campus hires: fresh graduates; internships accepted, with the possibility of conversion to full-time.

- Stand at the crest of the AI wave: be among the first to encounter the latest AI technologies and products and build a complete mental map of the field.
- Master new AI tools: apply new AI technologies and tools to your work to boost efficiency and creativity.
- Build personal influence: write exclusive original content, build name recognition, and become an opinion leader in AI.
- Expand your industry network: interact closely with leading figures in AI and attend major tech events and launches, broadening your industry horizons.
- Gain ...
Topping the global AI creation communities two and a half years after founding, with a Chinese team behind it "selling emotion"??
量子位· 2026-01-22 11:13
Xifeng, from Aofeisi. 量子位 | Official account: QbitAI

The world's top AI creation community has changed hands! This time it is a platform that sells "emotion" and "taste": its global traffic now exceeds that of Midjourney, Leonardo, and Civitai, with more than 50 million registered users, over 30 million monthly visits, ARR above US$50 million, and users generating more than 20 million images plus 500,000 videos on the platform in a single day.

The platform is called SeaArt. It offers full-pipeline multimodal AI creation, covering image, video, audio, and digital-human generation. SeaArt also ships ComfyUI visual workflows, a large model library, and LoRA training and sharing, all wrapped in a community interaction ecosystem. It is not a single tool but is positioned as a "mass-market creation and consumption platform for the AI era."

Just recently, the team behind it struck while the iron was hot and launched SeaVerse, an all-modality creation and consumption platform that amounts to SeaArt 2.0, with the ambition of helping every creator build a personal IP for the AI era.

We could not help but wonder: with AI model capabilities converging quickly, what keeps SeaArt attracting so many users, and what can SeaVerse, as the 2.0 version, actually do? QbitAI quickly put it through a hands-on test. Surprisingly, this community that took off overseas turns out to be made in China.

SeaV ...
The strongest large models' visual abilities still fall short of a six-year-old's
量子位· 2026-01-22 11:13
Core Insights
- The current state of visual reasoning in AI models is still significantly behind human capabilities, with the best model, Gemini 3 Pro Preview, only slightly outperforming a three-year-old child and lagging 20% behind a six-year-old child [2][10]
- The performance of Gemini 3 Pro Preview is the highest among existing models, with a score of 49.7%, while other leading models like GPT-5.2 and Claude 4.5 Opus show even poorer results [6][14]
- The article emphasizes the need for future models to rebuild visual capabilities from the ground up rather than relying on language-based translations of visual problems [11]

Performance Comparison
- Among closed-source models, Gemini 3 Pro Preview leads with 49.7%, followed by GPT-5.2 at 34.4% and Doubao-Seed-1.8 at 30.2% [14]
- Other models such as Qwen3-VL-Plus, Grok-4, and Claude-4.5-Opus scored significantly lower, indicating a general underperformance in visual reasoning tasks [15]
- The best-performing open-source model, Qwen3VL-235B-Thinking, achieved a score of 22.2%, still far behind the top closed-source systems [16]

Challenges in Visual Reasoning
The article identifies four core challenges faced by multimodal large language models (MLLMs) in visual reasoning:
1. **Lack of Non-verbal Fine Details**: MLLMs struggle to accurately describe fine visual details that cannot be easily expressed in language [25]
2. **Loss of Manifold Consistency**: MLLMs often fail to maintain perceptual consistency over long distances, leading to errors in tasks involving spatial relationships [31]
3. **Spatial Imagination**: MLLMs have difficulty constructing stable three-dimensional representations from two-dimensional images, which affects their ability to perform mental transformations [39]
4. **Visual Pattern Induction**: MLLMs tend to focus on counting attributes rather than understanding the underlying changes in visual examples, limiting their ability to generalize from few examples [47]

Proposed Solutions
The research suggests two potential directions for improving visual reasoning:
1. **Reinforcement Learning with Verifiable Rewards (RLVR)**: this approach showed an overall accuracy improvement of 4.8 percentage points after fine-tuning, particularly in fine-grained discrimination and spatial-perception tasks (a generic sketch of the RLVR idea follows below) [56][58]
2. **Generative Model Approaches**: the study introduces BabyVision-Gen, which evaluates generative models like NanoBanana-Pro, GPT-Image-1.5, and Qwen-Image-Edit, highlighting that while success rates are still low, some models exhibit explicit visual-thinking capabilities [60][62]

Future Directions
- The article concludes that overcoming the "language bottleneck" in visual reasoning is crucial, advocating for unified architectures that retain high-fidelity visual representations during reasoning [68][70]
- Models like Bagel and Sora 2 demonstrate the potential for generative methods to serve as advanced forms of reasoning, emphasizing the importance of robust visual semantic understanding [71]
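As a rough illustration of the RLVR direction mentioned above, the snippet below shows the core ingredient: a reward that can be checked programmatically instead of being predicted by a learned reward model. The exact-match checker and the surrounding comments are generic assumptions, not the benchmark's actual setup.

```python
# Generic sketch of a verifiable reward for RLVR-style fine-tuning.
# In a full pipeline, sampled answers to visual-reasoning questions would be
# scored by this checker, and a policy-gradient method (e.g. PPO or GRPO)
# would raise the probability of responses that earn reward 1.0.

def verifiable_reward(model_answer: str, ground_truth: str) -> float:
    """Return 1.0 only when the final answer matches the checkable label."""
    return 1.0 if model_answer.strip().lower() == ground_truth.strip().lower() else 0.0

print(verifiable_reward(" B ", "b"))  # 1.0 -> reinforced
print(verifiable_reward("C", "b"))    # 0.0 -> not reinforced
```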
New large-model infra breakthrough! Tencent Hunyuan open-sources an LLM inference operator library, lifting inference throughput by 30%
量子位· 2026-01-22 11:13
Contributed by the Hunyuan AI Infra team. 量子位 | Official account: QbitAI

In the large-model race, compute is no longer just about stacking GPUs; it is about squeezing out efficiency. Facing the pain point that inference cards such as the H20 struggle to reach full performance under mainstream operator libraries, Tencent's Hunyuan AI Infra team has officially open-sourced HPC-Ops, a production-grade, high-performance core operator library for LLM inference.

The library is built from scratch with CUDA and CuTe. Through an abstracted engineering architecture, deep adaptation to the GPU microarchitecture, and instruction-level optimization, it lowers the barrier to low-level operator development and pushes core operator performance close to the hardware peak, delivering a significant performance breakthrough.

In real-world scenarios, HPC-Ops raises inference QPM by 30% for Hunyuan models and by 17% for DeepSeek models. At the single-operator level, HPC-Ops achieves up to a 2.22x speedup on Attention over FlashInfer/FlashAttention, up to 1.88x on GroupGEMM over DeepGEMM, and up to 1.49x on FusedMoE over TensorRT-LLM.

Mainstream operator libraries need better-matched low-level support

In the era of large models, computational efficiency has become a key bottleneck for AI applications and development. Mainstream operator libraries (such as FlashInfer and DeepGEMM) are mostly optimized first for high-end training cards such as the NVIDIA H800, but due to practical constraints, ...
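The article quotes single-operator speedups but does not show HPC-Ops' API, so the snippet below only illustrates how such per-operator comparisons are typically measured, using PyTorch's built-in scaled-dot-product attention as a stand-in kernel; the shapes, iteration counts, and choice of kernel are assumptions, and HPC-Ops itself is not invoked.

```python
# Generic single-operator timing harness (PyTorch SDPA stands in for the
# kernel under test; this does not call HPC-Ops).
import torch
import torch.nn.functional as F

def bench_attention(batch=8, heads=32, seq=4096, dim=128, iters=50):
    q, k, v = (torch.randn(batch, heads, seq, dim, device="cuda", dtype=torch.float16)
               for _ in range(3))
    for _ in range(5):  # warm-up so one-time CUDA init is excluded from timing
        F.scaled_dot_product_attention(q, k, v)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        F.scaled_dot_product_attention(q, k, v)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # mean milliseconds per call

if __name__ == "__main__":
    # A "2.22x speedup" claim means baseline_ms / optimized_ms == 2.22 for the
    # same shapes; swapping the kernel inside the loop is how that ratio is read.
    print(f"attention: {bench_attention():.3f} ms/iter")
```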
Universities have started using AI for admissions
量子位· 2026-01-22 07:37
Wenle, from Aofeisi. 量子位 | Official account: QbitAI

We had only heard of AI screening resumes and running interviews for hiring; now even university admissions work is being handed over to AI...

Several US universities have recently started using AI review, and the most surprising case is Virginia Tech, which uses AI to vet students' application materials. This directly saved about 8,000 hours of manual work and let admissions decisions go out a month earlier than usual.

"AI selection" keeps spreading

Signs of AI-based selection appeared long ago; the wind has simply shifted toward the ivory tower. People have long complained that many companies not only use AI to screen resumes but even run AI video interviews, judging personality by analyzing candidates' facial expressions and tone. Unilever, the well-known consumer goods giant, for example, once rolled out an AI interview system at scale, asking applicants to talk to a camera while a back-end model analyzed their performance and decided who stayed in the process.

Now AI admissions has arrived too. It may also have been forced into existence. In recent years, as many US universities made the SAT/ACT optional, the application bar dropped and application numbers exploded. The workload facing admissions offices is no longer something a few extra cups of coffee can fix (doge).

Take Virginia Tech's fall 2025 admissions: it originally planned to admit only about 7,085 new students but received more than 57,000 applications. If it were only a matter of skimming a resume, fine ...
The strongest AI products of 2025, all in one article | QbitAI Think Tank's annual AI 100
量子位· 2026-01-22 07:37
Core Viewpoint
- The article highlights the transformation of China's AI product ecosystem in 2025, marking it as the "Year of AI Applications," where the focus shifts from mere functionality to system reconstruction driven by advancements in underlying models, user demand, and business model evolution [5][6]

Group 1: AI Product Landscape
- The 2025 AI market in China is marked by major AI companies such as Zhipu and MiniMax going public, indicating a maturing market [3]
- The "AI 100" product list released by Quantum Bit Think Tank (量子位智库) categorizes AI products into three main segments: "Flagship AI 100," "Innovative AI 100," and the top products from ten popular sectors [7][29]
- The "Flagship AI 100" focuses on the strongest AI products of 2025, showcasing those that have achieved significant technological breakthroughs and practical application value [8][29]

Group 2: User Engagement and Market Trends
- The top five AI products on the web account for over 62% of monthly active users (MAU), while the top five on mobile apps represent over 65% of daily active users (DAU) [12]
- AI general assistants and AI office platforms remain the most popular sectors, significantly outpacing other categories in user scale [12]
- The "Innovative AI 100" aims to identify products with potential for explosive growth in 2026, highlighting emerging trends across AI sectors [13][16]

Group 3: Sector-Specific Insights
- The article identifies ten key AI application sectors, including AI browsers, AI agents, AI smart assistants, and AI education, each featuring top-three products that exemplify innovation and engineering excellence [19][23]
- The evaluation of these sectors serves as a retrospective on the AI application market in 2025, emphasizing the competitive landscape and user engagement [24]

Group 4: Evaluation Methodology
- The "AI 100" list employs a dual assessment system combining quantitative and qualitative metrics, focusing on user data, growth, and long-term development potential [26]
- Quantitative metrics include user scale, growth, and engagement, while qualitative assessment considers technology, market space, and user experience [26]