量子位
Karpathy: Reinforcement Learning Is Terrible, but Everything Else Is Worse
量子位· 2025-10-18 09:30
Group 1
- The core viewpoint of the article is that achieving Artificial General Intelligence (AGI) will take at least another decade, as current AI systems need significant improvements to reach their full potential [5][10][28]
- Karpathy emphasizes that existing AI systems lack maturity, multi-modal capabilities, and the ability to learn continuously, which are essential for them to function effectively in collaboration with humans [8][9][10]
- He critiques the current state of Large Language Models (LLMs), stating that they have cognitive deficiencies and overestimate their capabilities, requiring substantial enhancements [16][18]

Group 2
- Karpathy argues that reinforcement learning is more flawed than commonly perceived, as it reinforces all steps taken in reaching a correct answer, regardless of their validity, leading to inefficient learning [20][21][23]
- He believes that AGI will not lead to a sudden leap in productivity but will follow a gradual growth pattern, similar to the historical 2% GDP growth trend observed with the internet [25][29]
- The lengthy development of autonomous driving technology is attributed to the high stakes involved, where even minor errors can have severe consequences, necessitating extensive reliability improvements [30][32][33]

Group 3
- As a full-time educator, Karpathy aims to establish a leading-edge educational institution that offers a unique mentorship experience, focusing on personalized learning and advanced AI education [34][36]
- He highlights the importance of tailored teaching methods, which current LLMs cannot replicate, emphasizing the need for human instructors to provide appropriate challenges to students [36][38]
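The credit-assignment complaint in Group 2 can be made concrete with a toy sketch. It assumes a purely outcome-based reward, where the only signal is whether the final answer was right; the function name `outcome_credit` and the example trajectory are hypothetical and are not taken from the article.

```python
def outcome_credit(steps, final_correct):
    """Give every step the same reward, based only on the final answer."""
    reward = 1.0 if final_correct else 0.0
    return [(step, reward) for step in steps]


# Hypothetical trajectory: one step is a mistake that later gets corrected.
trajectory = [
    "set up the equation",
    "take a wrong detour",
    "backtrack and fix it",
    "state the final answer",
]

credits = outcome_credit(trajectory, final_correct=True)
for step, reward in credits:
    # The wrong detour is reinforced exactly as much as the useful steps.
    print(f"{reward:+.1f}  {step}")
```

Because the reward is attached to the whole trajectory, the mistaken step receives the same positive signal as the steps that actually helped, which is the inefficiency the summary describes.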
AI Bridges First- and Third-Person Vision, Setting a New SOTA in Cross-View Visual Understanding | ICCV 2025 Highlight
量子位· 2025-10-18 09:30
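Contributed by the ObjectRelator team
量子位 | WeChat official account QbitAI

A key step toward deploying embodied intelligence: AI now has a kind of "synesthesia" between first-person and third-person views!

INSAIT, Fudan University, and other institutions have jointly proposed the ObjectRelator framework, which lets AI accurately match the same object across different viewpoints and achieve unified cross-view representation and understanding.

In experiments, ObjectRelator significantly outperformed all baseline models on both the Ego (first-person view) to Exo (third-person view) task and the Exo to Ego task, setting a new SOTA.

Ego→Exo results look like this:

Exo→Ego also aligns well:

The work has been accepted to ICCV 2025 as a Highlight paper, and the code is open source.

The gap between Ego and Exo

Acquiring human skills requires switching fluidly between the two viewpoints. When we watch someone else's demonstration, we try to imagine ourselves performing the same actions. Yet this cross-view understanding remains a major challenge for computers and robots, and it holds back key areas such as robot learning and VR interaction.

The first-person view is highly immersive and captures interaction details well, so it can precisely depict the dynamic interaction between the subject and the environment. However, its field of view is limited and its footage is unstable, which makes it hard to convey the full scene. ...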
量子位 Internship Hiring | AI Academic Editor Intern, On-Site or Remote
量子位· 2025-10-18 09:30
Core Viewpoint
- The article emphasizes the rapid updates in the AI academic field and the recruitment of an intern to assist in managing the latest AI research papers and findings [1].

Group 1: Recruitment Information
- The company is looking for a quick and attentive academic editor intern focused on AI to help with content organization and submission of recent AI research papers [1][2].
- The intern will be involved in editing AI and computer science academic papers, assisting in content selection, abstract extraction, and media dissemination [5].
- The internship requires a minimum commitment of three months, with flexible working arrangements, and offers a stipend and recommendation opportunities [5][6].

Group 2: Company Overview
- Quantum Bit (量子位) has over 2.3 million subscribers on WeChat and more than 7 million users across the internet, with a daily reading volume exceeding 2 million [3].
- The company is recognized as a top media outlet in the AI and technology sector, receiving accolades from various mainstream media platforms [4][8].
- Quantum Bit is a strategic partner in major industry conferences and is part of influential organizations in the AI field, enhancing its credibility and reach [8].

Group 3: Company Culture and Values
- The company promotes a culture driven by curiosity, encouraging individuals to explore and share new technological advancements [10][11].
- It values diverse educational backgrounds, focusing on curiosity and the ability to act on it rather than specific academic qualifications [9][10].
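A First-of-Its-Kind "AI + Human" Dual-Safeguard Model! Baidu Health Just Launched a 24/7 AI Health Manager That Can Chat, Knows Its Stuff, and Manages Your Care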
量子位· 2025-10-18 07:33
Core Viewpoint
- Baidu Health has launched a 24/7 AI health manager that integrates AI with human verification to enhance the medical consultation experience, marking a significant shift from traditional medical services to a more interactive and reliable health management system [3][6][68].

Group 1: AI Health Manager Features
- The AI health manager offers a comprehensive service that includes health education, diagnosis, treatment recommendations, and health record management, all within the Baidu app [6][12].
- It features a unique "AI + human" dual verification model, where AI-generated medical advice is confirmed by real doctors, ensuring higher safety and reliability for users [6][26].
- The AI can conduct multi-turn conversations and accurately identify 127 types of skin issues, achieving a 98% accuracy rate in interpreting various medical documents [21][22].

Group 2: User Experience and Accessibility
- Users can access the AI health manager directly through the Baidu app without needing to download a separate application, making it more user-friendly [10][11].
- The AI serves as a "smart health partner," capable of managing all aspects of medical consultations, including purchasing medication and booking appointments [12][27].
- The system integrates over 300,000 quality doctor resources and authoritative hospital rankings to assist users in selecting the right medical services [29].

Group 3: Data and Model Architecture
- Baidu Health has established a robust data pipeline supported by 360,000 doctors, ensuring the quality and reliability of the medical data used by the AI [40][42].
- The model architecture includes multi-modal capabilities and online reinforcement learning, allowing the AI to continuously evolve and improve its decision-making processes [52][57].
- The AI's knowledge base includes over 2 million medical journal articles and 14 million authoritative health resources, enabling it to provide up-to-date medical information [54].

Group 4: Industry Impact and Future Vision
- The introduction of the AI health manager signifies a transition in the healthcare industry from traditional doctor-patient interactions to a model where services proactively engage with users [70].
- Baidu Health aims to create a comprehensive health ecosystem that connects users with medical professionals and resources, enhancing the overall healthcare experience [64][75].
- The company envisions becoming the preferred health content and decision-making platform for the public, leveraging AI technology to make healthcare more accessible and trustworthy [73][75].
A Robot Folds Clothes for 120 Minutes Straight! Five SOTA Results with Only 0.9B Parameters | Open-Sourced by Tsinghua AIR & Shanghai AI Lab
量子位· 2025-10-18 07:33
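Tsinghua University's Institute for AI Industry Research (AIR) and Shanghai AI Laboratory have jointly released X-VLA, a general-purpose cross-embodiment embodied foundation model. Through a novel Soft-Prompt mechanism, an efficient framework design, and a customized training paradigm, it significantly improves pretraining efficiency and model performance.

Contributed by the X-VLA team
量子位 | WeChat official account QbitAI

Robots are getting fiercely competitive too! This one not only folds clothes but keeps at it for two hours straight, with no assistance at any point.

More importantly, X-VLA is the first fully open-source model (public data, code, and parameters) to complete a 120-minute unassisted autonomous clothes-folding task, and with only 0.9B parameters it sets new performance records across five authoritative simulation benchmarks.

[Table: benchmark comparison by method and model size. Columns: Simpler (VM, VA, WidowX); LIBERO (Spatial, Object, Goal, Long, Avg); Calvin (ABC -> ...); RoboTwin-2.0; VLABench; NAVSIM] ...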
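Why Do AI Artists Keep Drawing Six Fingers? University of Adelaide, Meituan, and SJTU Systematically Quantify Counting Hallucinations in Diffusion Models for the First Time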
量子位· 2025-10-18 07:33
Core Viewpoint
- The article discusses the challenges of hallucination samples generated by diffusion probabilistic models (DPMs) in image generation tasks, particularly focusing on a specific type of hallucination called "counting hallucination" [1][2].

Group 1: Research Background
- Despite the prevalence of hallucination issues in DPMs, there has been a lack of systematic methods to quantify these factual errors, hindering the development of high-reliability generative models [2].
- A research team from the University of Adelaide, Meituan, and Shanghai Jiao Tong University has conducted a systematic study on counting hallucinations in diffusion models [2][3].

Group 2: Key Questions and Dataset
- The research team posed several key questions regarding the quantification of counting hallucinations and the effectiveness of common optimization techniques [3][7].
- They constructed the CountHalluSet dataset suite, which includes three datasets with increasing complexity of countable objects: ToyShape, SimObject, and RealHand [10].

Group 3: Findings and Experiments
- The study revealed that increasing sampling steps can reduce counting hallucination rates in synthetic datasets but may increase them in real datasets due to overfitting [19].
- The research found that higher-order ODE solvers can lower overall failure rates but may increase counting hallucination rates, indicating a trade-off in model sensitivity to counting constraints [20][21].
- The study identified that the complexity of object shapes correlates with the severity of counting hallucinations, with more complex structures leading to higher rates of errors [26].

Group 4: Correlation Analysis
- The correlation between counting hallucination rates and FID scores varies depending on the dataset and solver type, suggesting that FID may not reliably reflect factual consistency [30][32].
- Non-counting failure rates showed a stable and significant correlation with FID across conditions, indicating that FID is more effective in assessing overall visual consistency rather than specific factual features [32].

Group 5: Proposed Solution
- The research team proposed a Joint-Diffusion Model (JDM) that incorporates structural constraints during the diffusion process to guide the model in generating the correct number of objects [33][35].
- This approach enhances the semantic consistency and visual credibility of generated results, effectively mitigating counting hallucination issues [35].

Group 6: Future Directions
- The work opens avenues for exploring higher-order factual consistency in generative models, extending the analysis to more complex hallucination types and integrating abstract knowledge into the diffusion process [37].
- The ultimate goal is to transform generative models from mere creative tools into reliable world models applicable in critical fields requiring high accuracy [37].
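The two quantities compared in Group 4 can be sketched in a few lines. This is a minimal illustration, not the paper's evaluation code: it assumes the counting hallucination rate is simply the fraction of samples whose detected object count differs from the expected one, and every number below is a made-up placeholder rather than a result from the study.

```python
from math import sqrt


def counting_hallucination_rate(counts, expected):
    """Fraction of samples whose object count differs from the expected count."""
    if not counts:
        raise ValueError("need at least one sample")
    wrong = sum(1 for c in counts if c != expected)
    return wrong / len(counts)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Placeholder data: detected finger counts for eight generated hands.
finger_counts = [5, 5, 6, 5, 4, 5, 5, 6]
rate = counting_hallucination_rate(finger_counts, expected=5)
print(f"counting hallucination rate: {rate:.3f}")

# Placeholder per-setting rates and FID scores, chosen only to exercise pearson().
example_rates = [0.10, 0.20, 0.30]
example_fid = [12.0, 15.0, 18.0]
print(f"correlation with FID: {pearson(example_rates, example_fid):.2f}")
```

Computing the correlation separately for each dataset and solver is what would reveal the pattern the summary reports, where FID tracks non-counting failures more reliably than counting errors.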
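Quarterly AI Video Generation Products: Multimodal Input Becomes Standard as Players Compete on One-Stop Generation | Quantum Bit Think Tank AI 100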
量子位· 2025-10-18 07:33
Core Insights
- The article highlights the rapid growth and competition in the AI video generation sector, with significant advancements in technology and user engagement metrics [3][6][7].

Group 1: Market Trends
- Sora 2 has achieved over 1 million downloads in just five days, indicating a surge in interest in AI video generation [3].
- Major companies like Google are launching competitive products such as Veo 3.1, focusing on audio generation, which is expected to further intensify market competition [4].
- The integration of visual models with world models is enhancing the realism of AI-generated videos, allowing for the creation of intricate 3D physical scenes [6].

Group 2: Technological Advancements
- The latest AI 100 list from Quantum Bit Think Tank shows a diverse technological evolution in AI video generation, with multi-modal input becoming standard [7].
- Output quality has significantly improved, with video lengths extending from seconds to minutes, and resolutions reaching 2K and 4K, with frame rates up to 60fps [7].
- User data reflects this trend, with five AI video generation products exceeding 200,000 visits, showcasing the growing demand [8].

Group 3: Product Highlights
- The article details several leading AI video generation products, including:
  - **Jimeng AI**: Over 11 million downloads, with a 27% increase in visits, reaching approximately 9.5 million [9].
  - **Keling AI**: Web version monthly visits surpassing 1 million, indicating strong user engagement [9].
  - **RoboNeo**: A product from Meitu, focusing on image and video generation with a comprehensive workflow [10].

Group 4: Competitive Landscape
- The competitive landscape features various companies, each with unique offerings:
  - **Jimeng AI**: A one-stop AI creation platform with advanced video generation capabilities [15].
  - **Tencent Hunyuan 3D**: A platform for creating immersive 3D content [18].
  - **Keling AI**: A creative productivity platform with robust video generation features [20].
- Other notable products include **Sea Cucumber AI**, **Drawing Ideas**, and **Medeo**, each contributing to the diverse capabilities in the AI video generation market [24][56].
After Retiring at 61, HiSilicon's Founding President Now Teaches at Fudan, Peking University, and Tsinghua
量子位· 2025-10-18 07:33
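Jay, reporting from Aofeisi
量子位 | WeChat official account QbitAI

Xu Wenwei, the founding president of Huawei's HiSilicon who retired quietly, has a new identity: university teacher.

In recent coverage of the opening of the first cohort of Tsinghua Wudaokou's AI program, Xu appeared as a professor and taught the entrepreneur students a course titled "Enterprise Innovation in the AI Era."

Reportedly, Professor Xu drew on the story of Huawei's breakthrough into the European market to explain the relationship between innovation and business, and shared a substance-packed innovation methodology with the entrepreneurs.

These are hard-won lessons, accumulated step by step over a packed career, from a former Huawei board director, chair of the Scientist Advisory Committee, president of the Institute of Strategic Research, president of Strategy Marketing, president of the Enterprise Business, IRB director, president of the European region, and president of HiSilicon Semiconductor...

Xu Wenwei was born in Changzhou, Jiangsu, in September 1963. He graduated from Southeast University in 1990 and joined Huawei a year later, beginning a career that ran for more than thirty years.

His record there was distinguished. Among other things, he led development of the first central-office program-controlled switch, the first chip, the first GSM system, and the first cloud data center core switch; proposed the Innovation 2.0 strategy; released ten major challenge problems in mathematics in 2020; and laid out research into frontier technologies such as photonic computing and glasses-free 3D display...

He retired quietly in 2024, at the age of 61.

A new life after a quiet retirement

Starting in April 2023, Huawei launched a new round of senior leadership ...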
Farewell, Professor Chen Ning Yang! China's AI and Computer Industry Is Different Because of Him
量子位· 2025-10-18 04:45
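Editorial team, reporting from Aofeisi
量子位 | WeChat official account QbitAI

A giant has fallen!

According to Xinhua News Agency, Chen Ning Yang (杨振宁) died of illness in Beijing on October 18, 2025, at the age of 103.

Yang, a Chinese theoretical physicist, was born in Hefei, Anhui, in 1922.

He was admitted to National Southwestern Associated University in 1938, entered the graduate school of Tsinghua University in 1942, and received a Master of Science degree in 1944. In 1945 he went to the United States on a Tsinghua University government scholarship and studied at the University of Chicago, where he stayed on to work after receiving his doctorate in 1948.

In 1949 he joined the Institute for Advanced Study in Princeton, becoming a permanent member in 1952 and a professor in 1955. In 1966 he became the Albert Einstein Professor at the State University of New York at Stony Brook, where he founded the Institute for Theoretical Physics (now the C. N. Yang Institute for Theoretical Physics) and worked until 1999. From 1986 he served by invitation as Distinguished Professor-at-Large at the Chinese University of Hong Kong. From 1997 he was honorary director of the newly established Center for Advanced Study at Tsinghua University (now the Institute for Advanced Study), and from 1999 he was a professor at Tsinghua University.

His best-known contribution to physics came in 1956, when he and Tsung-Dao Lee (李政道) jointly proposed the theory of parity non-conservation in weak interactions.

The theory fundamentally changed how physicists understand the basic symmetries of nature and is regarded as an important milestone of 20th-century physics.

For later generations of Chinese scholars, Yang was also a great educator.

Since the 1970s, Yang ...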
General-Purpose Products Stall as Vertical Niches Become the Market's New Answer | Reading the Quarterly AI 100 Data
量子位· 2025-10-18 02:07
Core Insights
- The "AI 100" list has been released, indicating a highly competitive landscape for AI products, with both internet giants and startups optimizing user experiences to capture market share [2][4].

APP Sector AI Product Status
- There is a stagnation in growth for web-based AI products, with total visits and monthly active users (MAU) remaining flat at 600 million and 130 million respectively, while leading products show slight declines [6].
- Growth engines have shifted from general-purpose leading products to niche, highly segmented products, with new applications in emerging fields like AI health gaining significant traction [6].
- Notable growth in user numbers for comprehensive office agents and industry-specific agents, such as Kouzi Space and RoboNeo, indicates a validation of agent product value [6].

User Scale Top 10 Products
- The top 10 AI products by cumulative app downloads as of September 2025 include:
  1. Quark: ~251 million
  2. Doubao: ~233 million
  3. Kimi: ~92 million
  4. DeepSeek: ~77 million
  5. Xingtou: ~77 million
  6. Jimeng AI: ~76 million
  7. QQ Browser: ~74 million
  8. Tencent Yuanbao: ~67 million
  9. Meitu Xiuxiu: ~41 million
  10. NetEase Youdao Dictionary: ~40 million
- A total of 23 products have downloads exceeding 10 million [7].

User Growth Top 10 Products
- The top 10 products by new downloads in September 2025 include:
  1. Doubao: ~27 million
  2. Quark: ~23 million
  3. Jimeng AI: ~12 million
  4. Tencent Yuanbao: ~11 million
  5. QQ Browser: ~8.1 million
  6. Xingtou: ~7 million
  7. Xingge: ~6.7 million
  8. NetEase Youdao Dictionary: ~5.6 million
  9. AQ: ~5 million
  10. Kimi: ~4.8 million
- Total new downloads for AI apps exceeded 166 million in September, a 27% increase from June [9][10].

User Activity Top 10 Products
- The top 10 products by daily active users (DAU) in September 2025 include:
  1. WPS: ~61 million
  2. QQ Browser: ~52 million
  3. Doubao: ~33 million
  4. DeepSeek: ~26 million
  5. Quark: ~22 million
  6. Meitu Xiuxiu: ~18 million
  7. Tencent Yuanbao: ~17 million
  8. Kuaidui: ~12 million
  9. NetEase Mail Master: ~7.3 million
  10. NetEase Youdao Dictionary: ~6.5 million
- The average daily usage of AI apps reached nearly 300 million, with a nearly 50% increase since June [11][12].

APP Sector Analysis
- The concentration of top AI products has weakened, with noticeable increases in downloads and daily active users for mid-tier products [14].
- The market share of the top 5 products has decreased from over 60% in Q2 to below 50% [15].
- Doubao and Quark are the only two products with new downloads exceeding 20 million in September, leading the market significantly [16].

Web Sector AI Product Status
- The top 10 web-based AI products by total visits in September 2025 include:
  1. DeepSeek: ~115 million
  2. Doubao: ~85 million
  3. Quark: ~82 million
  4. Baidu AI Search: ~44 million
  5. Tencent Docs: ~41 million
  6. Kimi: ~30 million
  7. Tongyi: ~29 million
  8. WPS Office: ~25 million
  9. Tencent Yuanbao: ~22 million
  10. Baidu Wenku: ~17 million
- The top three products account for 47% of total web-based AI product visits [18].

User Activity Top 10 Products (Web)
- The top 10 products by unique visitors in September 2025 include:
  1. Quark: ~19 million
  2. Baidu AI Search: ~13 million
  3. DeepSeek: ~13 million
  4. Doubao: ~10 million
  5. Baidu Wenku: ~8.6 million
  6. Tongyi: ~7.1 million
  7. Tencent Docs: ~6.2 million
  8. WPS: ~4.9 million
  9. Kimi: ~4.6 million
  10. Zhihu Zhidao: ~3.4 million
- There are 19 products with MAU exceeding 1 million, with Baidu AI Search showing significant growth [21][23].

User Engagement Top 10 Products (Web)
- The top 10 products by average visits per user in September 2025 include:
  1. Mogao Design: 9.5
  2. DeepSeek: 9.1
  3. Doubao: 8.3
  4. Tencent Yuanbao: 8.2
  5. Wenxiaobai: 8.0
  6. Moke AI: 7.5
  7. Modao AI: 6.8
  8. Tencent Docs: 6.6
  9. Xiangzhi HaiSnap: 6.6
  10. Kimi: 6.5
- The top 10 in user engagement is dominated by AI office efficiency and intelligent assistant applications [25][26].

Web Sector Analysis
- Total visits for web-based AI products exceeded 600 million in September, showing growth from 570 million in June, while total active users remained stable at approximately 124 million [27].
- The threshold for the top 10 products in visits and active users has decreased, indicating a shift in user engagement dynamics [27].
- The emergence of AI agents is diverting traffic from traditional web-based products, with agent products gaining significant traction [33].
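Two of the web-sector figures above can be cross-checked from the rounded numbers in the summary. The sketch below is only a sanity check under stated assumptions: it takes the approximate top-three visit counts and treats the "exceeded 600 million" total as exactly 600 million, so the results are approximate. The variable names are illustrative.

```python
# Approximate September 2025 web visits, in millions, as listed in the summary.
top3_visits_m = {"DeepSeek": 115, "Doubao": 85, "Quark": 82}

# Assumption: use 600 as the total, although the summary says "exceeded 600 million".
total_visits_sept_m = 600
total_visits_june_m = 570

# Share of all web visits captured by the top three products.
top3_share = sum(top3_visits_m.values()) / total_visits_sept_m

# Relative growth in total visits from June to September.
visit_growth = (total_visits_sept_m - total_visits_june_m) / total_visits_june_m

print(f"top-3 share of visits: {top3_share:.0%}")
print(f"June-to-September growth in visits: {visit_growth:.1%}")
```

The top-three share works out to 282 / 600, which matches the 47% quoted in the summary, and the growth in total visits comes to roughly 5%, far smaller than the 27% rise in app downloads over the same period.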