量子位
Robot folds clothes for 120 minutes straight! Five SOTA results with just 0.9B parameters | Open-sourced by Tsinghua AIR & Shanghai AI Lab
量子位· 2025-10-18 07:33
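Tsinghua University's Institute for AI Industry Research (AIR) and Shanghai Artificial Intelligence Laboratory have jointly released X-VLA, a general-purpose cross-embodiment foundation model for embodied AI. Through a novel Soft-Prompt mechanism, an efficient framework design, and a tailored training paradigm, it significantly improves pretraining efficiency and model performance.
Contributed by the X-VLA team 量子位 | WeChat official account: QbitAI
Robots are getting fiercely competitive too! This one not only folds clothes but keeps at it for two hours straight, with no assistance at any point.
More importantly, X-VLA is the first fully open-source model (public data, code, and weights) to complete a 120-minute unassisted autonomous clothes-folding task, and with only 0.9B parameters it sets new performance records across five authoritative simulation benchmarks.
[Results table, truncated in the source. Column groups: Methods | Size | Simpler (VM, VA, WidowX) | LIBERO (Spatial, Object, Goal, Long, Avg) | Calvin (ABC -> ...) | RoboTwin-2.0 | VLABench | NAVSIM] ...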
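AI artists keep drawing six fingers? University of Adelaide, Meituan, and SJTU present the first systematic quantification of counting hallucinations in diffusion models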
量子位· 2025-10-18 07:33
Core Viewpoint
- The article discusses hallucinated samples produced by diffusion probabilistic models (DPMs) in image generation, focusing on one specific type of hallucination: the "counting hallucination" [1][2].
Group 1: Research Background
- Although hallucinations are common in DPMs, there has been no systematic method for quantifying these factual errors, which has held back the development of highly reliable generative models [2].
- A research team from the University of Adelaide, Meituan, and Shanghai Jiao Tong University has carried out a systematic study of counting hallucinations in diffusion models [2][3].
Group 2: Key Questions and Dataset
- The team posed several key questions about how to quantify counting hallucinations and how effective common optimization techniques are [3][7].
- They constructed the CountHalluSet dataset suite, which comprises three datasets of increasingly complex countable objects: ToyShape, SimObject, and RealHand [10].
Group 3: Findings and Experiments
- Increasing the number of sampling steps can reduce counting hallucination rates on synthetic datasets but may increase them on real datasets due to overfitting [19].
- Higher-order ODE solvers can lower overall failure rates but may raise counting hallucination rates, indicating a trade-off in the model's sensitivity to counting constraints [20][21].
- The complexity of object shapes correlates with the severity of counting hallucinations: more complex structures lead to higher error rates [26].
Group 4: Correlation Analysis
- The correlation between counting hallucination rates and FID scores varies with the dataset and solver type, suggesting that FID may not reliably reflect factual consistency [30][32].
- Non-counting failure rates showed a stable and significant correlation with FID across conditions, indicating that FID is better at assessing overall visual consistency than specific factual features [32].
Group 5: Proposed Solution
- The team proposed a Joint-Diffusion Model (JDM) that incorporates structural constraints during the diffusion process to guide the model toward generating the correct number of objects [33][35].
- This approach improves the semantic consistency and visual credibility of generated results, effectively mitigating counting hallucinations [35].
Group 6: Future Directions
- The work opens avenues for exploring higher-order factual consistency in generative models, extending the analysis to more complex hallucination types and integrating abstract knowledge into the diffusion process [37].
- The ultimate goal is to turn generative models from mere creative tools into reliable world models applicable in critical fields that demand high accuracy [37].
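To make the two metrics in the summary above concrete, here is a minimal sketch of how a counting hallucination rate and a non-counting failure rate might be tallied over a batch of generated samples. The `Sample` record, its fields, and the rule that separates the two failure types are illustrative assumptions, not the study's actual evaluation pipeline.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # All fields are hypothetical labels for one generated image.
    valid: bool          # image is otherwise well-formed (no non-counting failure)
    observed_count: int  # countable units found in the image, e.g. fingers
    expected_count: int  # factually correct count, e.g. 5 fingers per hand


def failure_rates(samples):
    """Return (counting_hallucination_rate, non_counting_failure_rate).

    Assumption: a sample is a counting hallucination only if it is
    otherwise valid but shows the wrong number of objects.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    non_counting = sum(1 for s in samples if not s.valid)
    counting = sum(
        1 for s in samples if s.valid and s.observed_count != s.expected_count
    )
    return counting / n, non_counting / n


# Toy batch: one six-fingered hand, one malformed image, two correct hands.
batch = [
    Sample(True, 5, 5),
    Sample(True, 6, 5),
    Sample(False, 0, 5),
    Sample(True, 5, 5),
]
rates = failure_rates(batch)
```

Keeping the two rates separate mirrors the summary's point that FID tracked non-counting failures consistently while its relationship to counting errors varied by dataset and solver.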
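AI video generation products this quarter: multimodal input becomes standard as players compete on one-stop generation | QbitAI Think Tank AI 100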
量子位· 2025-10-18 07:33
Core Insights
- The article highlights rapid growth and competition in the AI video generation sector, with significant advances in technology and user engagement metrics [3][6][7].
Group 1: Market Trends
- Sora 2 passed 1 million downloads in just five days, indicating a surge of interest in AI video generation [3].
- Major companies such as Google are launching competing products like Veo 3.1, which focuses on audio generation and is expected to intensify market competition further [4].
- Integrating visual models with world models is making AI-generated videos more realistic, allowing the creation of intricate 3D physical scenes [6].
Group 2: Technological Advancements
- The latest AI 100 list from QbitAI Think Tank shows AI video generation evolving along diverse technical paths, with multimodal input becoming standard [7].
- Output quality has improved significantly: video lengths have extended from seconds to minutes, resolutions reach 2K and 4K, and frame rates go up to 60fps [7].
- User data reflects this trend, with five AI video generation products exceeding 200,000 visits [8].
Group 3: Product Highlights
- The article details several leading AI video generation products, including:
  - **Jimeng AI**: over 11 million downloads, with visits up 27% to approximately 9.5 million [9].
  - **Kling AI**: monthly visits to the web version surpassing 1 million, indicating strong user engagement [9].
  - **RoboNeo**: a Meitu product focused on image and video generation with a comprehensive workflow [10].
Group 4: Competitive Landscape
- The competitive landscape features various companies, each with distinct offerings:
  - **Jimeng AI**: a one-stop AI creation platform with advanced video generation capabilities [15].
  - **Tencent Hunyuan 3D**: a platform for creating immersive 3D content [18].
  - **Kling AI**: a creative productivity platform with robust video generation features [20].
- Other notable products include **Sea Cucumber AI**, **Drawing Ideas**, and **Medeo**, each adding to the range of capabilities in the AI video generation market [24][56].
After retiring at 61, Huawei HiSilicon's founding president now teaches at Fudan, Peking University, and Tsinghua
量子位· 2025-10-18 07:33
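Jay, reporting from Aofeisi 量子位 | WeChat official account: QbitAI
It turns out that Xu Wenwei, the founding president of Huawei HiSilicon who retired quietly, has taken on a new role: university teacher.
In recent coverage of the opening of the first AI class at Tsinghua Wudaokou, Xu appeared as a professor and taught the entrepreneur students a course titled "Enterprise Innovation in the AI Era."
Reportedly, Professor Xu drew on the story of Huawei's breakthrough into the European market to explain vividly how innovation relates to business, and shared a substantive innovation methodology with the entrepreneurs.
This is hard-won experience, accumulated step by step over a packed career, from a man who served as Huawei board director, director of the Scientist Advisory Committee, president of the Institute of Strategic Research, president of Strategy Marketing, president of the enterprise business, IRB director, president of the European region, president of HiSilicon Semiconductor, and more.
Xu Wenwei was born in Changzhou, Jiangsu, in September 1963. He graduated from Southeast University in 1990 and joined Huawei a year later, beginning a career that ran for more than three decades.
His record at the company is distinguished. Among other things, he led development of the company's first central-office program-controlled switch, first chip, first GSM system, and first cloud data center core switch; proposed the Innovation 2.0 strategy; published ten grand challenge problems for mathematics in 2020; and set up R&D in frontier technologies such as photonic computing and glasses-free 3D display.
He retired quietly in 2024, at age 61.
A new life after a quiet retirement
Starting in April 2023, Huawei launched a new round of senior leadership ...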
In memory of Professor Chen-Ning Yang! He changed the course of China's AI and computer industry
量子位· 2025-10-18 04:45
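Editorial team, reporting from Aofeisi 量子位 | WeChat official account: QbitAI
A giant has fallen!
According to Xinhua News Agency, Chen-Ning Yang died of illness in Beijing on October 18, 2025, at the age of 103.
Born in Hefei, Anhui, in 1922, Yang was a Chinese theoretical physicist.
He entered National Southwestern Associated University in 1938, enrolled in the graduate school of Tsinghua University in 1942, and received a Master of Science degree in 1944. In 1945 he went to the United States on a Tsinghua government scholarship to study at the University of Chicago, where he stayed on to work after receiving his doctorate in 1948.
He joined the Institute for Advanced Study in Princeton in 1949, becoming a permanent member in 1952 and a professor in 1955. In 1966 he became Albert Einstein Professor at the State University of New York at Stony Brook, where he founded the Institute for Theoretical Physics (now the C. N. Yang Institute for Theoretical Physics) and worked until 1999. From 1986 he served by invitation as Distinguished Professor-at-Large at the Chinese University of Hong Kong. From 1997 he was honorary director of the newly founded Center for Advanced Study at Tsinghua University (now the Institute for Advanced Study), and from 1999 a professor at Tsinghua University.
His best-known contribution to physics is the theory of parity non-conservation in weak interactions, which he proposed with Tsung-Dao Lee in 1956.
The theory fundamentally changed physicists' understanding of the basic symmetries of nature and is regarded as a major milestone of twentieth-century physics.
For later generations of Chinese scholars, Yang was also a great educator.
Since the 1970s, Yang had ...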
Growth stalls for general-purpose products as vertical niches become the market's new answer | Quarterly AI 100 data analysis
量子位· 2025-10-18 02:07
Core Insights
- The "AI 100" list has been released, showing a highly competitive landscape for AI products in which both internet giants and startups are optimizing user experience to capture market share [2][4].
App Sector AI Product Status
- Growth of web-based AI products has stagnated: total visits and monthly active users (MAU) remain flat at 600 million and 130 million respectively, while leading products show slight declines [6].
- The growth engine has shifted from general-purpose leading products to niche products in specialized segments, with new applications in emerging fields such as AI health gaining significant traction [6].
- Notable user growth for comprehensive office agents and industry-specific agents, such as Kouzi Space and RoboNeo, indicates that the value of agent products is being validated [6].
User Scale Top 10 Products
- The top 10 AI products by cumulative app downloads as of September 2025:
  1. Quark: ~251 million
  2. Doubao: ~233 million
  3. Kimi: ~92 million
  4. DeepSeek: ~77 million
  5. Xingtou: ~77 million
  6. Jimeng AI: ~76 million
  7. QQ Browser: ~74 million
  8. Tencent Yuanbao: ~67 million
  9. Meitu Xiuxiu: ~41 million
  10. NetEase Youdao Dictionary: ~40 million
- A total of 23 products have more than 10 million downloads [7].
User Growth Top 10 Products
- The top 10 products by new downloads in September 2025:
  1. Doubao: ~27 million
  2. Quark: ~23 million
  3. Jimeng AI: ~12 million
  4. Tencent Yuanbao: ~11 million
  5. QQ Browser: ~8.1 million
  6. Xingtou: ~7 million
  7. Xingge: ~6.7 million
  8. NetEase Youdao Dictionary: ~5.6 million
  9. AQ: ~5 million
  10. Kimi: ~4.8 million
- Total new downloads of AI apps exceeded 166 million in September, a 27% increase from June [9][10].
User Activity Top 10 Products
- The top 10 products by daily active users (DAU) in September 2025:
  1. WPS: ~61 million
  2. QQ Browser: ~52 million
  3. Doubao: ~33 million
  4. DeepSeek: ~26 million
  5. Quark: ~22 million
  6. Meitu Xiuxiu: ~18 million
  7. Tencent Yuanbao: ~17 million
  8. Kuaidui: ~12 million
  9. NetEase Mail Master: ~7.3 million
  10. NetEase Youdao Dictionary: ~6.5 million
- Average daily usage of AI apps reached nearly 300 million, up nearly 50% since June [11][12].
App Sector Analysis
- Concentration among the top AI products has weakened, with noticeable increases in downloads and daily active users for mid-tier products [14].
- The market share of the top 5 products has fallen from over 60% in Q2 to below 50% [15].
- Doubao and Quark are the only two products with more than 20 million new downloads in September, leading the market by a wide margin [16].
Web Sector AI Product Status
- The top 10 web-based AI products by total visits in September 2025:
  1. DeepSeek: ~115 million
  2. Doubao: ~85 million
  3. Quark: ~82 million
  4. Baidu AI Search: ~44 million
  5. Tencent Docs: ~41 million
  6. Kimi: ~30 million
  7. Tongyi: ~29 million
  8. WPS Office: ~25 million
  9. Tencent Yuanbao: ~22 million
  10. Baidu Wenku: ~17 million
- The top three products account for 47% of total visits to web-based AI products [18].
User Activity Top 10 Products (Web)
- The top 10 products by unique visitors in September 2025:
  1. Quark: ~19 million
  2. Baidu AI Search: ~13 million
  3. DeepSeek: ~13 million
  4. Doubao: ~10 million
  5. Baidu Wenku: ~8.6 million
  6. Tongyi: ~7.1 million
  7. Tencent Docs: ~6.2 million
  8. WPS: ~4.9 million
  9. Kimi: ~4.6 million
  10. Zhihu Zhidao: ~3.4 million
- There are 19 products with MAU above 1 million, with Baidu AI Search showing significant growth [21][23].
User Engagement Top 10 Products (Web)
- The top 10 products by average visits per user in September 2025:
  1. Mogao Design: 9.5
  2. DeepSeek: 9.1
  3. Doubao: 8.3
  4. Tencent Yuanbao: 8.2
  5. Wenxiaobai: 8.0
  6. Moke AI: 7.5
  7. Modao AI: 6.8
  8. Tencent Docs: 6.6
  9. Xiangzhi HaiSnap: 6.6
  10. Kimi: 6.5
- The user engagement top 10 is dominated by AI office-efficiency and intelligent-assistant applications [25][26].
Web Sector Analysis
- Total visits to web-based AI products exceeded 600 million in September, up from 570 million in June, while total active users remained stable at approximately 124 million [27].
- The threshold for entering the top 10 by visits and by active users has fallen, indicating a shift in user engagement dynamics [27].
- The emergence of AI agents is diverting traffic from traditional web-based products, with agent products gaining significant traction [33].
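The 47% share quoted for the top three web products can be checked against the visit figures listed above. The sketch below is a quick sanity check rather than the report's own method: it treats the rounded "~" figures as exact and assumes total visits of exactly 600 million, whereas the summary only says the total "exceeded 600 million".

```python
# Approximate September 2025 web visits, in millions, as listed in the summary.
visits_millions = {"DeepSeek": 115, "Doubao": 85, "Quark": 82}

# Assumption: the summary says total visits "exceeded 600 million";
# 600 is used here as a round stand-in for the true total.
total_visits_millions = 600

top3_millions = sum(visits_millions.values())
share_percent = round(top3_millions / total_visits_millions * 100)

print(f"Top three: {top3_millions}M of {total_visits_millions}M = {share_percent}%")
```

Under these assumptions the top three sum to 282 million visits, which matches the reported 47% share.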
What has Jensen Huang been investing in during 2025? 50 deals, with 32 companies closing the loop across the industry chain
量子位· 2025-10-18 02:07
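Jay, reporting from Aofeisi 量子位 | WeChat official account: QbitAI
NVIDIA is not only growing fast itself; its investments in AI have now taken off as well.
The latest data show that in the first three quarters of 2025, NVIDIA took part in 50 AI-related venture investments, already more than the 48 it made in all of 2024.
Of these, companies building AI infrastructure such as data centers and compute are level with model developers at 31% each. Application companies have also grown in number this year, reaching a quarter of the total, and the rest went to embodied AI and autonomous driving.
In short, it is investing across the board.
More striking still, this does not count NVIDIA's own VC arm, NVentures, which has also been very active this year with 21 deals (in 2022, when it was just getting started, it made only 1).
As of this September, NVentures had backed 4 unicorns: Hippocratic AI, Field AI, Abridge, and Synthesia.
Jensen Huang's AI investment map also offers a glimpse of which trends and sectors NVIDIA and Huang, positioned at the core of AI's foundation layer, are bullish on.
NVIDIA's AI club
By 量子位's count, Huang's AI club already has 32 members.
By funding size, they fall roughly into three tiers.
Billion-dollar tier
OpenAI: In October 2024, Huang placed his first bet on OpenAI, the company that set off the current AI wave, with a 1 ...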
Tackling data scarcity in spatial intelligence: Insta360 open-sources a DiT-based panoramic generation model, with an online demo
量子位· 2025-10-18 02:07
Core Insights
- The article introduces DiT360, a panoramic image generation model based on the Diffusion Transformer (DiT) architecture, which addresses the scarcity of high-quality panoramic data in spatial intelligence [2][11][50].
Group 1: DiT360 Model Overview
- DiT360 uses a hybrid training framework that combines limited panoramic data with a large volume of high-quality perspective images, significantly improving both the realism and the geometric consistency of generated images [4][12][50].
- The model can generate high-resolution panoramic images (2048×1024) across various environments, with better detail and realism than existing methods [11][30].
Group 2: Challenges in Panoramic Image Generation
- Generating panoramic images means overcoming geometric challenges such as seamless stitching and polar distortion, compounded by the scarcity and limited quality of real panoramic data [8][9][10].
- Existing approaches either break a panorama into multiple planar views or generate it directly on a spherical surface; both suffer from boundary inconsistency and distortion [9][10].
Group 3: Training Mechanisms
- DiT360 employs a multi-level hybrid training mechanism that improves the diversity and realism of generated results through image-level and feature-level strategies [12][17].
- The image-level approach includes panorama refinement and perspective image guidance to improve the structural quality of panoramic data and facilitate cross-domain knowledge transfer [14][16].
Group 4: Performance Evaluation
- DiT360 outperforms various state-of-the-art methods in visual quality and geometric consistency, achieving leading scores across multiple evaluation metrics [30][32][36].
- User studies indicate that DiT360 is preferred for realism and overall quality, with preference rates of 63.8% and 80.9% respectively, significantly higher than competing methods [38][39].
Group 5: Future Applications
- DiT360's hybrid training strategy can be extended to applications such as panoramic video generation, VR/AR content creation, and dynamic scene simulation, improving the realism and spatial consistency of generated scenes [51][52].
The latest quarterly top 100 AI products | QbitAI Think Tank AI 100
量子位· 2025-10-17 11:30
Core Insights
- The latest "AI 100" lists reveal shifts among the leading AI products and highlight emerging players in the market, covering both established and innovative products [1].
Group 1: Flagship 100
- Leading AI products have seen usage decline on both web and app compared with June, yet the overall landscape remains stable [2].
- The top web products maintain a strong presence, with total visits and MAU exceeding 80% and 70% respectively, featuring products like DeepSeek, Doubao, and Quark [2].
- The app segment is also stable, with notable products including WPS, QQ Browser, and Doubao, which has accumulated over 230 million downloads [2].
- A total of 35 new products entered the flagship list, 18 of them promoted from the previous innovation list, covering a diverse range of AI applications [2][3].
Group 2: Innovation 100
- The innovation list serves as a forward-looking index of fast-growing AI products with distinctive designs, unlike the flagship list's focus on established products [7].
- This quarter's innovation list includes 56 new entrants, primarily from sectors such as comprehensive AI agents, AI education, and AI entertainment [9][8].
- Competition among innovative products will hinge on user engagement, product functionality, and effective marketing strategies [13].
Group 3: Market Trends
- The current phase is marked by intense competition among AI products, driven by new workflows and long-term user retention strategies [14].
- Key themes include "scene segmentation" and "hyper-personalization," which are seen as critical for improving user experience and differentiating products [16].
- Companies must navigate the evolving landscape by focusing on user-specific needs and operational efficiency to stay relevant [15].
Look what Baidu's Wenxin assistant has become
量子位· 2025-10-17 11:30
Core Insights
- The article highlights recent advances in Baidu's AI capabilities, focusing on the comprehensive upgrade of its AI assistant, Wenxin, and new AI video generation features [3][4][10].
Baidu's AI Upgrades
- Baidu has launched eight new multimodal creative capabilities, including real-time generation of long videos, a significant upgrade over previous models [3][4].
- The Wenxin assistant has become faster and smarter, running five times faster than competitors while keeping operating costs lower [11][34].
- The assistant now offers a wide range of functions, including real-time answers to travel queries and 24/7 AI medical consultations [12][13].
User Engagement and Features
- The Wenxin assistant is designed as a personal AI partner that can handle complex tasks such as market analysis and homework help [14][15].
- It supports various creative outputs, including long videos, images, and music, making AI creation accessible to everyday users [16][19].
- The assistant can generate videos with over 30 special effects and lets users modify video content interactively in real time [21][29].
Market Position and Strategy
- Baidu's AI search ranks first in the industry by user scale and technical capability, with daily AIGC generation exceeding ten million [4][6].
- The company emphasizes rapid execution and continuous iteration of its AI models to keep a competitive edge in a saturated market [34][36].
- Baidu's strategy targets both the B2B and B2C markets, drawing on its extensive product ecosystem to strengthen its AI offerings [36][39].
Future Developments
- Baidu plans to launch an AI podcast feature by the end of October and continues to develop interactive digital personas for deeper user engagement [24][26].
- The company aims to refine its AI ecosystem further, ensuring a comprehensive range of services that meets diverse user needs [40][41].