A Big-Headed Robot Dog Is in Short Supply, Firing the First Shot for Consumer-Grade Embodied Intelligence
量子位· 2025-12-26 12:28
Core Viewpoint
- The article highlights the emergence of Vbot's BoBo, a consumer-grade robotic dog, as a leading product in the embodied intelligence sector, achieving significant sales and consumer interest in a short time frame [5][57][79].

Group 1: Product Performance
- Vbot's BoBo sold 1,000 units in just 52 minutes, setting a record in the consumer-grade robotic dog market [7][10].
- The product is priced at ¥9,988, making it accessible while offering high-end specifications, including 128 TOPS of computing power and a 594 Wh battery, 37.5% above the industry average [23][34][37].
- BoBo's design incorporates emotional engagement through facial expressions and movements, making it appealing to families, especially children [28][51][56].

Group 2: Market Positioning
- Vbot has positioned BoBo as the first brand in the consumer-grade embodied intelligence market, addressing both emotional companionship and practical assistance needs [57][75].
- The product's success is attributed to its distinctive design and technology, which combines advanced AI capabilities with user-friendly interaction, differentiating it from existing robotic products [33][44][46].

Group 3: Technological Innovation
- BoBo uses a novel VLA (Vision-Language-Action) model and an Agent architecture, allowing it to understand and execute complex tasks from natural-language commands [38][39].
- The integration of a full-scene spatial base model enables BoBo to perform tasks such as waking up a child by understanding context and planning routes [32][41].

Group 4: Industry Impact
- Vbot's rapid success reflects a shift in the consumer robotics landscape, moving from industrial applications toward personal, emotionally engaging products [62][79].
- The article suggests that acceptance of consumer-grade embodied intelligence products like BoBo could lead to adoption as widespread as that of smart cars and intelligent driving technologies in the near future [79].
Tsinghua's Tang Jie: Domain-Specific Large Models Are a False Proposition
量子位· 2025-12-26 08:52
Group 1
- The core idea is that scaling foundational models through pre-training is essential for AI to acquire world knowledge and basic reasoning capabilities [4][5].
- More data, larger parameters, and saturated computation remain the most efficient methods for scaling foundational models [5].
- The concept of domain-specific large models is considered a false proposition, as true AGI (Artificial General Intelligence) has not yet been achieved [28][30].

Group 2
- Enhancing reasoning capabilities and aligning long-tail abilities are crucial for improving real-world AI performance [6][7].
- The introduction of agents marks a significant milestone in AI, allowing models to interact with real environments and generate productivity [10][11].
- Implementing memory mechanisms in models is essential for their application in real-world scenarios, with different memory stages mirroring human memory [12][13].

Group 3
- Online learning and self-evaluation are key components for models to improve autonomously, with self-assessment being a critical part of this process [14][15].
- The integration of model development and application is becoming increasingly important, with the goal of replacing human jobs through AI [16][17].
- The future of AI applications should focus on enhancing human capabilities rather than merely creating new applications [32][34].

Group 4
- Multimodal capabilities are seen as promising, but their contribution to AGI's upper intelligence limit remains uncertain [21][22].
- The development of embodied AI faces challenges, including data acquisition and the stability of robotic systems [25][26].
- The existence of domain models is driven by enterprises' reluctance to fully embrace AI, aiming to maintain a competitive edge [29][31].
Training Time Slashed by 80%! HKU and Kuaishou Build an AI "Alchemist" That Picks Only "Nutritious" Data, with 20% of the Data Delivering 50% of the Effect
量子位· 2025-12-26 08:52
Contributed by the Alchemist team
量子位 | WeChat account QbitAI

Imagine asking a master chef to cook with moldy ingredients and expired seasonings: however skilled the chef, the dish will not turn out well. AI training works the same way.

1. Data is like ingredients: quality determines the result

Today's AI image-generation models, such as Stable Diffusion and FLUX, learn from millions of images crawled from the web. But those images vary wildly in quality: some are blurry, some are near-duplicates, and some are just advertising backgrounds. An AI trained on such "ingredients" naturally performs poorly.

The study, led by Ding Kaixin of the University of Hong Kong together with Zhou Yang of South China University of Technology and Kuaishou's Kling team, built an AI system called "Alchemist". Like a discerning chef, it can precisely pick out the most valuable half of a massive image dataset.

Even more surprising:

2. Teaching the AI to judge for itself

2.1 The limits of traditional methods

Traditional data filtering is like sifting rice through a sieve: it can only filter on a single criterion:

The problem with these methods is that they do not know which data actually helps the AI learn.

2.2 The Alchemist's approach

The "Alchemist" is more like a seasoned food critic, weighing several dimensions at once:

A model trained on this curated half of the data actually outperforms one trained on the full dataset.

Training speed increased by 5…
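The "pick the most valuable half" idea can be sketched as a toy scoring-and-selection pass. This is our own illustration of multi-criterion data selection, not the Alchemist method itself; the quality signals, weights, and sample names are all invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    name: str
    sharpness: float      # hypothetical per-image quality signals in [0, 1]
    uniqueness: float
    caption_match: float

def value_score(s: Sample, weights=(0.4, 0.3, 0.3)) -> float:
    """Fold several quality dimensions into one 'training value' scalar."""
    w1, w2, w3 = weights
    return w1 * s.sharpness + w2 * s.uniqueness + w3 * s.caption_match

def select_top_half(samples: list) -> list:
    """Keep only the highest-value half of the pool, like the 'discerning chef'."""
    ranked = sorted(samples, key=value_score, reverse=True)
    return ranked[: len(ranked) // 2]

pool = [
    Sample("ad_banner.jpg", 0.2, 0.1, 0.3),
    Sample("blurry_cat.jpg", 0.1, 0.8, 0.6),
    Sample("sharp_dog.jpg", 0.9, 0.9, 0.8),
    Sample("dup_meme.jpg", 0.7, 0.1, 0.4),
]
kept = select_top_half(pool)
print([s.name for s in kept])
```

A real selector would replace the hand-written signals with learned estimates of how much each sample actually improves the model, which is where the hard part of the research lies.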
Xiaohongshu, Dozing on an AI Gold Mine, Has Just Woken Up a Little "DianDian"
量子位· 2025-12-26 08:52
Core Viewpoint
- The article discusses the recent integration of an AI assistant named "DianDian" into the Xiaohongshu platform, enhancing user interaction and content discovery [4][11][46].

Group 1: AI Integration
- Xiaohongshu has introduced an AI assistant called "DianDian," which replaces the previous comment-section interaction method [5][11].
- Users can now share notes directly with DianDian for a more seamless experience, allowing real-time interaction without switching apps [9][16].
- The AI leverages Xiaohongshu's extensive content database to provide recommendations on entertainment and dining, improving the efficiency of content consumption [16][24].

Group 2: User Experience
- The assistant summarizes user-generated content, helping users make informed decisions based on real reviews [21][24].
- DianDian can also organize information from lengthy posts, such as summarizing over 1,200 comments on a podcast recommendation [33][35].
- Despite the positives, some users have raised concerns about the loss of the sidebar and the richness of information compared with engaging with posts directly [40][43].

Group 3: Market Position and Future Outlook
- The integration of AI into Xiaohongshu is seen as a significant step, signaling the company's commitment to applying AI technology in its content ecosystem [38][46].
- The platform is viewed as a potential gold mine for AI and large-model training thanks to its unique content ecology and user engagement [47][48].
- Ongoing collection of user feedback suggests Xiaohongshu is still refining its AI capabilities to better meet user needs [44].
Nvidia Becomes America's Open-Source LLM Benchmark: Nemotron 3 Publishes Even Its Training Recipe and Releases All 10 Trillion Tokens of Data
量子位· 2025-12-26 06:35
Core Viewpoint
- Nvidia is aggressively advancing in open-source models with the introduction of the "most efficient open model family," Nemotron 3, which uses a hybrid Mamba-Transformer MoE architecture and NVFP4 low-precision training [1][22].

Group 1: Model Architecture and Efficiency
- Nemotron 3 combines Mamba and Transformer architectures to maximize inference efficiency [7].
- The architecture features a distinctive interleaving of Mamba-2 layers and MoE layers, significantly reducing reliance on self-attention layers [10].
- In a typical inference scenario with 8k input and 16k output, Nemotron 3 Nano 30B-A3B achieves 3.3 times the throughput of Qwen3-30B-A3B, with the advantage growing as sequence length increases [12].
- The model is robust on long-context tasks, scoring 68.2 on the RULER benchmark at a 1-million-token input length, versus 23.43 for Nemotron 2 Nano 12B [14].

Group 2: LatentMoE Architecture
- For larger models, Nvidia introduces the LatentMoE architecture, which performs expert routing in a latent space [15].
- LatentMoE addresses two bottlenecks in MoE-layer deployment, low-latency and high-throughput scenarios, by significantly reducing weight-loading and communication costs [16][18].
- LatentMoE uses 512 experts with 22 activated, versus the standard MoE's 128 experts with 6 activated, achieving better performance across a range of tasks [20].

Group 3: Training Innovations
- Nvidia trains in the NVFP4 format, reaching a peak throughput three times that of FP8, and has successfully trained models on up to 250 trillion tokens [22].
- The training process keeps certain layers at high precision to maintain model stability, while most layers are quantized to NVFP4 [23].
- Nemotron 3's post-training uses multi-environment reinforcement learning covering a wide range of tasks simultaneously, which improves stability and avoids the common pitfalls of phased training [24][26].

Group 4: Performance Metrics and Open Source
- The model shows consistent accuracy across downstream tasks, with NVFP4-trained models closely matching their BF16 counterparts [28].
- The entire post-training software stack is open-sourced under the Apache 2.0 license, including the NeMo-RL and NeMo-Gym repositories [32].
- Nemotron 3 supports cognitive-budget control during inference, letting users specify the maximum number of tokens for chains of thought and so trade off efficiency against accuracy [34].
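The cognitive-budget control mentioned above (capping thought-chain tokens) can be illustrated with a toy decoding loop. This is a sketch of the general idea, not Nvidia's actual API: the tag names and token stream are made up, and a real decoder would feed the forced closing tag back into the model's context so it switches to answer mode.

```python
from typing import Callable, List

def generate_with_budget(next_token: Callable[[], str],
                         max_think_tokens: int = 2) -> List[str]:
    """Cap chain-of-thought length; once the budget is spent, force-close
    the reasoning span so decoding moves on to the final answer."""
    out, thinking, spent = [], False, 0
    while True:
        tok = next_token()
        if tok == "<eos>":
            out.append(tok)
            break
        if tok == "<think>":
            thinking = True
            out.append(tok)
        elif tok == "</think>":
            if thinking:               # ignore a close we already forced
                thinking = False
                out.append(tok)
        elif thinking:
            spent += 1
            out.append(tok)
            if spent >= max_think_tokens:
                out.append("</think>")  # budget exhausted: stop reasoning
                thinking = False
        else:
            out.append(tok)
    return out

# Fake token stream standing in for a real model's sampler.
stream = iter(["<think>", "a", "b", "c", "</think>", "answer", "<eos>"])
print(generate_with_budget(lambda: next(stream)))
```

The point of the mechanism is the trade-off named in the summary: a small budget keeps latency and cost down, a large one buys accuracy on harder problems.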
The First Interns on a ¥128,000 Monthly Salary Have Arrived! The AI Talent War Is Truly Fierce
量子位· 2025-12-26 06:35
Heng Yu, from Aofeisi
量子位 | WeChat account QbitAI

Shocking and unexpected: a 4-6 month AI-related internship now pays close to ¥140,000 per month!

And this price is not an isolated case: OpenAI, Anthropic, Meta, Google DeepMind, and other giants are all offering short-term positions such as internships, fellowships, and residencies at pay that matches full-time researchers.

Figures newly disclosed by Business Insider show that monthly pay for AI internships and short-term research programs now generally sits in the $7,000-18,000 range, roughly ¥49,000-126,000.

Converted to an annual salary, that clearly exceeds what most industries traditionally expect of the role of "intern"…

The life of true AI talent: my dream (yes, I have started daydreaming).

Back to the story. After the big companies fought fiercely over seasoned AI talent (Zuckerberg even personally played chef, serving dishes to the OpenAI researchers he hoped to poach), the battle has finally spread to people who have not yet formally graduated, or who have only just set out on a research path.

Rising AI internship pay

At the compensation level, interns, student researchers, and residency programs can now stand on the same line as full-time research roles.

Let's look first at the specifics in Silicon Valley.

OpenAI Ope…
Beating GPT-5 and Gemini Deep Research! Renmin University Gaoling's AI Financial Analyst Excels at Data Lookup, Charting, and Research Reports
量子位· 2025-12-26 06:35
Core Viewpoint
- The article introduces Yulan-FinSight, a multi-modal report-generation system developed at Renmin University of China, designed for real financial research and investment needs and showing advanced capability in data analysis and report writing [1][3].

Group 1: Challenges of General AI in Financial Research
- General AI struggles with financial reports because they are highly structured, logical, and visual, and involve multiple processes [5].
- Financial research demands more data integration, analytical depth, and expressive range than general AI tasks [6].
- Existing general AI systems face three main challenges:
  1. Fragmentation of domain knowledge and data, making it hard to integrate structured financial data with unstructured information [7].
  2. Lack of professional-level visualization, as current models produce only basic charts and cannot guarantee data consistency [8].
  3. Absence of iterative research capability, as existing systems follow a fixed process that prevents dynamic adjustment based on intermediate findings [9].

Group 2: FinSight's Innovations
- FinSight aims to emulate human financial analysts by modeling their cognitive process, introducing three key technological innovations [10].
- The core architecture is a Code-Driven Variable-Memory (CAVM) multi-agent framework, in which agents reason collaboratively through a unified variable space instead of traditional message-based communication [14][16].
- An iterative vision-enhanced mechanism generates financial charts, combining language models' strength at coding with visual models' feedback [20][21].
- The writing framework is restructured into a two-phase process, analysis followed by integration, ensuring clarity and depth in long reports [24][25].

Group 3: Performance and Evaluation
- FinSight significantly outperformed existing deep-research systems in factual accuracy, analytical depth, and presentation quality, achieving an average score of 8.09 [34].
- Its visualization capability scored 9.00, a substantial improvement in generating professional financial charts [35].
- In practical use, FinSight produced reports averaging over 20,000 words with more than 50 charts, maintaining quality as report length increased [38].
- FinSight ranked first in the AFAC 2025 Financial Intelligence Innovation Competition, demonstrating its robustness and practical utility [39].

Group 4: Broader Implications
- FinSight represents a significant advance for AI in expert-intensive fields, suggesting AI can now perform tasks traditionally reserved for human experts, such as problem decomposition and hypothesis validation [40][41].
- This paradigm shift points to applications in other complex domains, including research analysis, legal assessment, and medical decision-making, paving the way for a new generation of productivity centered on expert-level AI agents [43].
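The unified-variable-space idea behind CAVM can be sketched as a shared workspace that successive agents read from and write to, rather than exchanging messages. This is our minimal illustration, not FinSight's implementation; the agents, variable names, and figures are invented.

```python
# Agents communicate by reading/writing named variables in one workspace,
# so each later agent operates directly on earlier intermediate results.
workspace = {}

def data_agent(ws):
    ws["revenue"] = [120, 135, 160]        # fetched figures (illustrative)

def analysis_agent(ws):
    r = ws["revenue"]
    # Period-over-period growth computed from the shared variable.
    ws["growth"] = [round((b - a) / a, 3) for a, b in zip(r, r[1:])]

def writer_agent(ws):
    ws["report"] = f"Revenue grew {ws['growth'][-1]:.1%} in the last period."

for agent in (data_agent, analysis_agent, writer_agent):
    agent(workspace)
print(workspace["report"])
```

Because intermediate results live as typed variables rather than chat text, downstream agents (or generated code) can recompute, chart, or verify them without re-parsing prose, which is the advantage the summary attributes to this design.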
量子位 Is Hiring Editors and Writers
量子位· 2025-12-26 04:24
Editorial team, from Aofeisi
量子位 | WeChat account QbitAI

The AI wave is still surging. If you don't yet know how to take part… why not join 量子位?

We are a content platform centered on tracking new developments in AI. After 8 years of accumulation, we have top-tier influence, broad and widely recognized industry resources, and the best vantage point for observing and learning at this moment in the era.

We are currently hiring in three major directions, and hope you are (or can become) a content expert in one of them:

- AI Industry: infrastructure-layer innovation, covering chips, AI Infra, and cloud computing;
- AI Finance: venture funding and earnings in AI, tracking capital flows along the industry chain;
- AI Products: progress in AI applications and hardware devices.

All positions are full-time, based in Zhongguancun, Beijing.

- Experienced hires: editor, staff writer, and managing editor levels, matched to ability;
- Campus hires: fresh graduates; internships accepted, with conversion to full-time possible.

- Stand at the crest of the AI wave: be among the first to encounter the latest AI technologies and products and build a complete AI knowledge system.
- Master new AI tools: apply new AI technologies and tools to your work, boosting efficiency and creativity.
- Build personal influence: write exclusive original content, establish your name, and become an opinion leader in AI.
- Expand industry connections: engage up close with leading AI figures and take part in major tech events and launches.
- Receive professional guidance…
Inference Cost Cut to 1 Yuan per Million Tokens: Inspur Information Pries Open the "Last Mile" of Agent Scale-Up
量子位· 2025-12-26 04:24
Core Viewpoint
- The global AI industry has shifted from a model-performance contest to a "life-and-death race" to deploy intelligent agents at scale, where cost reduction is no longer optional but critical for profitability and industry breakthroughs [1].

Group 1: Cost Reduction Breakthrough
- Inspur Information has launched the Yuan Brain HC1000 ultra-scalable AI server, bringing inference cost down to 1 yuan per million tokens for the first time [2][3].
- This breakthrough is expected to remove the cost barrier to industrializing intelligent agents and reshape the underlying logic of competition in the AI industry [3].

Group 2: Future Cost Dynamics
- Liu Jun, Chief AI Strategist at Inspur, stressed that 1 yuan per million tokens is only a temporary victory: token consumption and demand for complex tasks will grow exponentially, so current cost levels remain insufficient for widespread AI deployment [4][5].
- For AI to become a fundamental resource like water and electricity, token costs must fall dramatically, evolving from a "core competitive advantage" into a "ticket to survival" in the agent era [5].

Group 3: Historical Context and Current Trends
- The AI era is at a critical point similar to the internet's history, when steep drops in communication costs drove new application ecosystems [7].
- As technology advances and token prices fall, companies can apply AI to more complex and compute-intensive tasks, driving exponential growth in token demand [8].

Group 4: Token Consumption Data
- Data from multiple sources shows a sharp rise in token consumption, with ByteDance's Doubao model reaching daily token usage of over 50 trillion, a tenfold increase from the previous year [13].
- Google's platforms are processing 1.3 quadrillion tokens monthly, a daily average of about 43.3 trillion, up from 9.7 trillion monthly a year earlier [13].

Group 5: Cost Structure Challenges
- Over 80% of current token cost stems from computing expense; the core issue is the mismatch between inference and training workloads, which leaves resources underutilized [12].
- The architecture must be fundamentally restructured to raise the output efficiency of each unit of computing power, addressing low utilization during inference and the "memory wall" bottleneck [14][16].

Group 6: Innovations in Architecture
- The Yuan Brain HC1000 employs a new DirectCom architecture that efficiently aggregates massive numbers of local AI chips, enabling the inference-cost breakthrough [23].
- The architecture supports ultra-large-scale lossless expansion and raises inference performance by 1.75 times, with single-card utilization efficiency (MFU) potentially improving 5.7 times [27].

Group 7: Future Directions
- Liu Jun stated that a sustained, significant reduction in token cost requires fundamental innovation in computing architecture, shifting the focus from scale to efficiency [29].
- The AI industry must innovate in product technology, develop dedicated computing architectures for AI, and explore specialized computing chips to co-optimize software and hardware [29].
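As a quick sanity check, the unit economics behind these figures are simple arithmetic. The calculation below assumes the article's headline price of 1 yuan per million tokens and its reported consumption numbers; everything else is just division.

```python
PRICE_PER_MILLION_YUAN = 1.0

def daily_cost_yuan(tokens_per_day: float) -> float:
    """Daily inference spend at a flat per-million-token price."""
    return tokens_per_day / 1e6 * PRICE_PER_MILLION_YUAN

doubao_daily = 50e12            # 50 trillion tokens/day (reported)
google_monthly = 1.3e15         # 1.3 quadrillion tokens/month (reported)
google_daily = google_monthly / 30   # ~43.3 trillion tokens/day

print(f"Doubao: {daily_cost_yuan(doubao_daily):,.0f} yuan/day")
print(f"Google: {google_daily / 1e12:.1f} trillion tokens/day")
```

Even at 1 yuan per million tokens, Doubao-scale consumption alone implies an inference bill of 50 million yuan per day, which is why the article treats further cost reduction as existential rather than optional.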
Good News for Photo-Editing Novices! A Smart Retouching Agent Precisely Invokes 200+ Professional Tools from a Single Sentence, from Tencent Hunyuan & Xiamen University
量子位· 2025-12-26 04:24
Core Viewpoint
- JarvisEvo, developed by Tencent and Xiamen University, is an advanced image-editing AI that emulates expert human designers through iterative editing, visual perception, self-evaluation, and self-reflection, aiming for a more controllable and professional editing experience than traditional software and AI tools [1][3].

Group 1: Challenges in Image Editing
- The article identifies two main obstacles to a professional-level editing experience: instruction hallucination, where existing models cannot visualize intermediate results and often make factual errors, and reward hacking, where models exploit static reward systems for high scores without genuinely improving edit quality [4][5].

Group 2: JarvisEvo's Mechanisms
- JarvisEvo introduces the iMCoT (Interleaved Multimodal Chain-of-Thought) mechanism, which generates a new image after each editing step and feeds that visual result back into subsequent reasoning, breaking the limits of traditional blind editing [8][9].
- The SEPO (Synergistic Editor-Evaluator Policy Optimization) framework lets JarvisEvo learn from mistakes by comparing low- and high-scoring trajectories, developing a strong self-correction ability [11][12].

Group 3: System Architecture
- The system operates in four steps: visual perception and planning, step-by-step execution, self-evaluation, and self-reflection, ensuring each operation is executed precisely [18][16].
- The model uses two optimization loops: the editor-policy loop improves tool usage for better image quality, while the evaluator-policy loop keeps the model's scoring aligned with human aesthetic standards [17][25].
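The perceive-edit-evaluate cycle described above can be sketched as a simple control loop. This is our illustration of the general pattern, not JarvisEvo's implementation: the callbacks and the numeric "quality" state are hypothetical stand-ins for real tool calls and a learned evaluator.

```python
def edit_loop(image, plan_step, apply_edit, evaluate, max_steps=10, target=90):
    """After every edit, inspect the *current* result and let that
    observation drive the next step, instead of planning blindly up front."""
    for _ in range(max_steps):
        step = plan_step(image)           # reason over the current image
        image = apply_edit(image, step)   # execute one tool call
        if evaluate(image) >= target:     # self-evaluation gates the loop
            break
    return image

# Toy run: 'quality' stands in for the evaluator's score of the image.
result = edit_loop(
    {"quality": 30},
    plan_step=lambda img: "enhance_contrast",          # hypothetical tool
    apply_edit=lambda img, s: {"quality": img["quality"] + 20},
    evaluate=lambda img: img["quality"],
)
print(result)
```

The evaluator sitting inside the loop is what distinguishes this from one-shot editing; SEPO's contribution is training that evaluator so it cannot be gamed.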
Group 4: Training Framework
- JarvisEvo is trained in three stages: cold-start supervised fine-tuning on 150K labeled samples to teach basic skills, SEPO reinforcement learning on 20K standard-instruction examples for autonomous exploration, and reflection fine-tuning on 5K reflection samples to strengthen self-correction [20][22][31].

Group 5: Experimental Results
- In evaluations, JarvisEvo achieved a Spearman rank correlation coefficient (SRCC) of 0.7243 and a Pearson linear correlation coefficient (PLCC) of 0.7116, outperforming other models and showing superior alignment with human preferences [36][38].
- The model improved 44.96% on L1 and L2 metrics over commercial models, preserving original image detail while excelling in style and detail rendering [34][40].

Group 6: Future Prospects
- JarvisEvo's collaborative-evolution paradigm is expected to extend beyond image editing to areas such as mathematical reasoning, code generation, and long-horizon planning, with ongoing work to strengthen its handling of complex tasks [44][45].
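The SRCC/PLCC figures above measure how well the evaluator's scores track human ratings: SRCC is rank agreement, PLCC is linear agreement. A minimal pure-Python illustration with made-up score data:

```python
from statistics import mean

def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation (SRCC): Pearson on the ranks (no-tie case)."""
    rank = lambda v: [sorted(v).index(e) for e in v]
    return pearson(rank(x), rank(y))

human = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical human ratings
model = [1.2, 2.1, 2.8, 4.3, 4.9]   # hypothetical evaluator scores

print(f"SRCC={spearman(human, model):.3f}, PLCC={pearson(human, model):.3f}")
```

Here the model preserves the human ranking perfectly (SRCC near 1) while its raw scores deviate slightly (PLCC just below 1); JarvisEvo's reported 0.72-level values indicate strong but imperfect agreement on real data.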