量子位
AI workhorses "learn by doing"! Shanghai AI Lab and partners launch a new self-evolving framework for agents
量子位· 2025-10-21 23:50
Core Viewpoint
- The article introduces the MUSE framework, which enhances LLM agents by letting them accumulate experience and evolve continuously, addressing the challenges of long-horizon tasks and memory limitations [1][5].

Group 1: MUSE Framework Overview
- MUSE stands for Memory-Utilizing and Self-Evolving; it creates a closed-loop system in which LLM agents learn from experience and evolve over time [5].
- The framework is built around a hierarchical memory module that organizes different levels of experience: strategic, procedural, and tool memory [7][8].

Group 2: Key Mechanisms of MUSE
- First, the hierarchical memory module lets agents retain and apply historical knowledge, overcoming the "forgetfulness" of traditional LLMs [7].
- Second, self-reflection has agents evaluate their own task execution and convert raw execution trajectories into structured experiences, refining their standard operating procedures (SOPs) [10][11].
- Third, self-evolution enables continuous improvement through a cycle of planning, execution, reflection, and experience extraction [13][15].

Group 3: Experimental Results
- MUSE achieved state-of-the-art (SOTA) performance on the TAC benchmark with a score of 51.78%, surpassing existing methods built on larger models [16].
- Accumulated experience improves performance over time, showing the framework's potential for long-horizon productivity tasks [19].

Group 4: Future Prospects
- MUSE signals a new phase of experience-driven lifelong learning for AI agents, moving beyond static evaluation of models [29].
- Future directions include optimizing memory, enriching experience sources, integrating human feedback, and developing comprehensive evaluation standards for long-horizon tasks [30][31].
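The three mechanisms above (hierarchical memory, self-reflection, self-evolution) form a loop that can be sketched in a few lines. This is a minimal sketch assuming a plan/execute/reflect/extract decomposition; the class and function names are illustrative assumptions, not MUSE's actual API:

```python
# Illustrative sketch of a MUSE-style closed loop; names are assumptions.

class HierarchicalMemory:
    """Three levels of experience, as described in the article."""
    def __init__(self):
        self.strategic = []   # high-level strategies / SOPs
        self.procedural = []  # step-by-step procedures
        self.tool = []        # tool-usage notes

    def retrieve(self, task):
        # Naive retrieval: return everything; a real system would rank
        # entries by relevance to the task.
        return self.strategic + self.procedural + self.tool

def run_task(task, memory, plan, execute, reflect, extract):
    """One plan -> execute -> reflect -> extract-experience cycle."""
    experience = memory.retrieve(task)
    steps = plan(task, experience)          # planning conditioned on memory
    trajectory = [execute(s) for s in steps]
    lessons = reflect(task, trajectory)     # self-reflection on the raw trace
    for level, entry in extract(lessons):   # structured experience -> memory
        getattr(memory, level).append(entry)
    return trajectory
```

Each call to `run_task` leaves the memory richer than before, which is the sense in which the agent "evolves" across tasks.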
Xunfei's latest earnings report: net profit surges 202%
量子位· 2025-10-21 09:05
Core Viewpoint
- Keda Xunfei's latest quarterly report shows significant growth in revenue and profit, driven by advances in AI technology and its industrial application [1][2].

Financial Performance
- Revenue reached 6.078 billion yuan in Q3 2025, a year-on-year increase of 10.02% [4].
- Net profit attributable to shareholders reached 172.25 million yuan, up 202.40% year-on-year [4].
- Net profit excluding non-recurring items was 26.24 million yuan, up 76.5% year-on-year [4].
- Net operating cash flow was strong at 895 million yuan, up 25.19% [6].

Business Operations
- The two core profit indicators show improved profitability in the company's main business [5].
- For the first three quarters of 2025, total revenue reached 16.99 billion yuan, up 14.41% year-on-year, while the net loss of 67 million yuan narrowed by 80.6% compared with the prior year [8][9].

AI Technology and Market Position
- Advances in AI large models have become a key driver of revenue growth, with significant progress in core technology, product deployment, and ecosystem development [13].
- The "Xunfei Spark" model received critical upgrades and outperforms competitors in capabilities including mathematics and translation [14][15].
- The company leads the industry in both the number and value of large-model project bids won, with Q3 wins totaling 545 million yuan, more than the second- through fifth-ranked competitors combined [16].

Research and Development
- Keda Xunfei continues to increase R&D investment, planning to raise up to 4 billion yuan through an A-share issuance to fund the Spark education model and a computing platform [18][19].
Ecosystem Growth
- The AI ecosystem shows strong growth, with 690,000 new large-model developers and a total of 1.22 million ecosystem developers [17].
The embedding black box is history! This new framework has models "explain first, then learn the embedding"
量子位· 2025-10-21 09:05
Core Insights
- The article introduces GRACE, a new explainable generative embedding framework developed by researchers from multiple universities, aimed at addressing the limitations of traditional text embedding models [1][6].

Group 1: Background and Limitations
- Text embedding models have evolved from BERT to a range of newer models that map text into vector spaces for tasks like semantic retrieval and clustering [3].
- A common flaw is treating large language models as "mute encoders" that output vectors without explaining why two texts are similar [4].
- This black-box representation becomes a bottleneck in tasks requiring high interpretability and robustness, such as question-answer matching and cross-domain retrieval [5].

Group 2: GRACE Framework Overview
- GRACE recasts "contrastive learning" as "reinforcement learning", redefining the meaning of the contrastive learning signal [6].
- The framework has the model generate explanations (rationales) for a text before learning its embedding, so that it produces logically and semantically consistent reasoning [7][25].
- GRACE consists of three key modules:
  1. Rationale-Generating Policy, which generates explanatory reasoning chains for input texts [8].
  2. Representation Extraction, which combines the input and its rationale to compute the final embedding [9].
  3. Contrastive Rewards, which redefines the contrastive learning objective as a reward function for reinforcement learning updates [11].

Group 3: Training Process
- GRACE can be trained both supervised and unsupervised, using labeled query-document pairs and self-alignment techniques [12][18].
- In the supervised phase, the model learns semantic relationships from a dataset of 1.5 million samples [13].
- The unsupervised phase generates multiple rationales for each text, encouraging consistent representations across different explanations [17].
Group 4: Experimental Results
- GRACE was evaluated across 56 datasets spanning retrieval, pair classification, and clustering, showing significant improvements over baseline models [19][20].
- The results indicate that GRACE enhances embedding quality without sacrificing generative ability, while providing transparent representations that users can inspect [25][27].

Group 5: Conclusion
- Overall, GRACE represents a paradigm shift toward embedding models that can explain their own understanding process, improving both performance and interpretability [28].
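The "Contrastive Rewards" module described above can be pictured as the usual InfoNCE-style contrastive score reread as a per-sample reward for the policy that generated the rationale. The toy sketch below illustrates that reading only; the function names and exact reward form are assumptions, not GRACE's published implementation:

```python
# Toy InfoNCE-style reward: high when the embedding computed from
# (input + rationale) is close to the positive document and far from
# negatives. Names and reward form are illustrative assumptions.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contrastive_reward(query_emb, pos_emb, neg_embs, tau=0.05):
    """Log-probability of picking the positive among all candidates,
    usable as a scalar reward for an RL update on the rationale policy."""
    pos = math.exp(cosine(query_emb, pos_emb) / tau)
    negs = sum(math.exp(cosine(query_emb, n) / tau) for n in neg_embs)
    return math.log(pos / (pos + negs))
```

A rationale that moves the query embedding toward its matching document earns a reward near 0, while one that moves it toward a negative earns a strongly negative reward, which is the signal the policy would be updated on.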
"Most beautiful product manager" Song Ziwei: first product of her AI hardware startup revealed
量子位· 2025-10-21 09:05
Core Viewpoint
- The article covers the entrepreneurial venture of Song Ziwei, a former product manager at vivo, who is entering the AI smart hardware market with an "AI makeup mirror" as the first product [1][2][4].

Group 1: Company Overview
- Song Ziwei's startup, "Wei Guang Dian Liang", completed its angel round in September, with investors including Zhongke Chuangxing and Jiuhua Venture Capital [4][5].
- The financing will mainly fund AI hardware R&D, application software development, and team building to accelerate technological innovation and market expansion [5][6].

Group 2: Product Focus
- The company aims to build AI hardware that is fashionable and appealing to young users, combining AI Agent technology with high-frequency life scenarios [7][9].
- Its first product in development is an "AI makeup mirror", positioned to break from earlier generations of smart mirrors criticized for lacking real intelligence [18][22].

Group 3: Market Context
- The smart makeup mirror market is not new: interest first surged in 2017, but many products were deemed overpriced and underperforming [18][21].
- Competitors such as the domestic brand Jiayao have explored intelligent features like AI voice interaction and skin detection, achieving international sales success [23][25].

Group 4: Technological Advancements
- Advances in AI multimodal capabilities over the past two years could expand what a makeup mirror can do, enabling virtual makeup try-on and personalized makeup suggestions based on various factors [27][28].
- The competitive edge of future AI makeup mirrors will rest on the underlying algorithms and cloud software, potentially leading to a Hardware-as-a-Service (HaaS) model [29][30].
Group 5: Entrepreneurial Background
- Song Ziwei, born in 1994 and a physics graduate of Shanghai University, previously worked at Huawei and vivo, where she gained recognition as a product manager [34][35][36].
- Her rise to fame began at the 2019 iQOO Neo launch, where her expertise and stage presence drew wide attention and earned her the nickname "the most beautiful product manager" [37][40].
- After leaving vivo and a brief stint at Li Auto, she moved into entrepreneurship, with a clear vision for her startup soon after her departure [44][48][50].
On the ground at IROS: Unitree, Hesai, and 自变量 cross swords in Hangzhou, with Meituan convening from center stage
量子位· 2025-10-21 05:41
Core Viewpoint
- Meituan positioned itself as a leader in the robotics industry, emphasizing the integration of technology and retail to improve service efficiency and quality [1][8][11].

Group 1: Event Highlights
- IROS Day 1 featured prominent figures from robotics, including Meituan Vice President Mao Yinian alongside esteemed professors and CEOs [3][4].
- The event's theme, "Robotics for Better Life", reflects a consensus among embodied-intelligence teams that technology should solve real-world problems rather than be an end in itself [4][5].

Group 2: Meituan's Strategy
- Meituan's strategic evolution from "retail" to "retail + technology" underscores the role of technology in enhancing retail scenarios [8].
- The company focuses on autonomy, using drones and autonomous delivery vehicles to reshape the retail landscape [11][12][13].

Group 3: Technological Insights
- Embodied intelligence is identified as a core technology paradigm for the next 5 to 10 years, with Meituan leading in this area [10].
- The event showcased advances in drone delivery and autonomous vehicles, underscoring Meituan's distinctive market position [16][14].

Group 4: Theoretical Discussions
- Discussions of the first principles of embodied intelligence pointed to the need to balance physical laws with AI learning capabilities [54][60].
- High-quality, diverse data, rather than sheer data volume, was emphasized as crucial for improving model performance in robotics [49][50].

Group 5: Future Perspectives
- The future of robotics was envisioned as a collaborative space in which robots possess curiosity and can coexist with humans, with potential for green intelligence [100][102].
- The dialogue on the ideal form of robots expressed aspirations for machines with their own desires and the capacity to explore and learn [99][101].
Apple AI picks Mamba: better than Transformer on agent tasks
量子位· 2025-10-21 05:41
Core Viewpoint
- The article examines advances in AI models, focusing on Mamba, which shows potential to surpass Transformer models in efficiency and generalization on long tasks and multi-interaction agent tasks [1][10].

Group 1: Transformer Limitations
- Transformer models, though capable, incur computational costs that grow quadratically with input sequence length, making them inefficient for long documents [4][5].
- For instance, processing 1,000 words means handling roughly 1 million word-pair relationships; for documents of tens of thousands of words, the burden reaches into the billions [5].

Group 2: Mamba Model Advantages
- Mamba, a state space model (SSM), uses a lightweight design without global attention, instead maintaining a continuously updated internal state to understand its input [7][10].
- This yields three significant advantages: computation that grows linearly with sequence length, support for streaming processing, and stable memory usage that does not grow significantly on longer sequences [13].

Group 3: Performance Enhancements with Tools
- External tools let Mamba handle complex tasks more effectively: in multi-digit addition, Mamba with pointer tools reaches near-100% accuracy after training only on 5-digit addition, while Transformers struggle at 20 digits [15].
- In code debugging, Mamba's ability to simulate an interactive debugging process yields markedly higher accuracy than Transformers on complex codebases [15].
- Combining Mamba with external tools compensates for its memory limitations, improving efficiency and performance on agent tasks [16][18].
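The linear-cost claim follows directly from the SSM recurrence: each token performs one fixed-size state update instead of attending to every earlier token. A scalar toy version (deliberately simplified, not Mamba's actual selective-state parameterization) makes this concrete:

```python
# Toy linear state-space recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.
# One O(1) update per token gives O(n) total cost and O(1) state memory,
# versus the O(n^2) pairwise relationships of full attention.
def ssm_scan(inputs, a=0.9, b=0.1, c=1.0):
    h, ys = 0.0, []
    for x in inputs:        # streaming-friendly: tokens processed one by one
        h = a * h + b * x   # fixed-size state update, no lookback
        ys.append(c * h)
    return ys
```

Because the state `h` has constant size, the same loop also explains the streaming and stable-memory advantages listed above: nothing about the update depends on how many tokens came before.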
ChatGPT hit too: Amazon server outage takes down half the internet
量子位· 2025-10-21 03:38
Core Points
- Amazon's AWS server outage caused widespread disruption across internet services, affecting ChatGPT and many other platforms [2][10].
- The outage originated in the us-east-1 region, which is critical to AWS's global services, producing over 6.5 million user reports of problems [3][4].
- The incident highlighted the vulnerabilities of internet infrastructure, particularly the risks of centralized cloud services [39].

Group 1: Impact on Services
- Affected services ranged widely, including Docker, npm, Zoom, Slack, Epic Games, PlayStation, Netflix, and Disney+ [11][14][16].
- Educational platforms such as Duolingo and Canvas were also hit, preventing students from accessing their assignments [17].
- The disruption extended offline, affecting ride-hailing apps, fast-food chains such as McDonald's and Starbucks, and airline operations [23][24].

Group 2: Technical Details
- The root cause was identified as a DNS resolution issue linked to an internal monitoring subsystem within AWS [33][34].
- us-east-1 hosts a large share of AWS core services and infrastructure, making it especially prone to triggering widespread outages [36][39].
- Previous us-east-1 outages show a recurring pattern of extensive service disruption, indicating a persistent vulnerability [38].

Group 3: Recommendations for Developers
- Developers are encouraged to build resilience into their service deployments to mitigate the impact of such outages [40].
- Multi-region setups and failover strategies help avoid total dependency on a single region like us-east-1 [41].
- The technical complexity and cost of these strategies are relatively low, suggesting current deployment practices deserve a rethink [43].
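The multi-region failover advice can be sketched as a simple wrapper that tries regions in order. This is a minimal sketch of the pattern only; the region list and the `fetch` callable are illustrative assumptions, not an AWS SDK API:

```python
# Minimal failover sketch: try a primary region, fall back to others,
# so no single region (e.g. us-east-1) is a total dependency.
def fetch_with_failover(fetch, regions=("us-east-1", "us-west-2", "eu-west-1")):
    """Call `fetch(region)` for each region in order; return the first
    success, or raise if every region fails."""
    last_error = None
    for region in regions:
        try:
            return fetch(region)
        except Exception as exc:  # in practice, catch the SDK's error types
            last_error = exc
    raise RuntimeError(f"all regions failed: {last_error}")
```

Real deployments would add timeouts, health checks, and data replication across regions, but the control flow above is the core of the idea the article recommends.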
Long-sequence inference no longer stalls! PKU-Huawei KV cache management framework delivers a 4.7x inference speedup
量子位· 2025-10-21 03:38
Contributed by the LouisKV team
量子位 | 公众号 QbitAI

Peking University and Huawei have jointly introduced a new approach to KV cache management, with inference 4.7x faster than the previous SOTA!

When large models process long sequences, the KV cache's memory footprint grows linearly with sequence length, which has become a severe bottleneck for model deployment.

To address this, a research team from Peking University and Huawei proposed LouisKV, an efficient KV cache retrieval framework designed for long-input, long-output, and other long-sequence scenarios.

Through an innovative semantics-aware retrieval strategy and a decoupled, fine-grained management mechanism, it achieves up to 4.7x inference acceleration with almost no loss of model accuracy, offering a new solution to the LLM long-sequence inference bottleneck.

Key Insights

Traditionally, academia and industry have proposed many KV cache optimization schemes, among which KV Cache Retrieval is one of the most promising directions.

These methods offload the full KV cache to the larger-capacity CPU memory and, at inference time, retrieve only the most critical KV subset back to the GPU for computation, effectively relieving GPU memory pressure.

However, existing KV retrieval methods still face a dual bottleneck of efficiency and accuracy:

To design a more efficient retrieval strategy, the research team first ran experiments analyzing the access patterns of critical KVs across different long-sequence tasks, arriving at two key insights. ...
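The KV Cache Retrieval idea described above (keep the full cache in CPU memory, pull back only the most relevant subset to the GPU) can be sketched as a top-k selection over cached keys. The dot-product scoring and function names below are illustrative assumptions, not LouisKV's actual semantics-aware strategy:

```python
# Sketch of KV cache retrieval: score each cached (key, value) pair
# against the current query and keep only the top-k. The full cache
# stands in for data offloaded to CPU; only the returned subset would
# be transferred to the GPU for attention computation.
import heapq

def retrieve_topk_kv(cpu_kv_cache, query, k=4, score=None):
    """Return the k cached (key, value) pairs most relevant to `query`."""
    if score is None:
        # Assumed relevance measure: dot product of query and key.
        score = lambda q, key: sum(a * b for a, b in zip(q, key))
    return heapq.nlargest(k, cpu_kv_cache, key=lambda kv: score(query, kv[0]))
```

The efficiency/accuracy trade-off the article mentions lives in this selection: a cheap score keeps retrieval fast, while a score that tracks semantic relevance determines whether the critical KVs actually make it back to the GPU.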
Registration for the AI annual rankings is open! Five awards seeking the pioneering forces of the AI+ era
量子位· 2025-10-21 03:38
From the Organizing Committee
量子位|公众号 QbitAI

To let more practitioners feel the leap of the intelligence wave, and to offer applause and encouragement to fellow travelers, we are officially opening registration for the "2025 Artificial Intelligence Annual Rankings".

This is the 8th year of the 量子位 AI annual rankings. Over eight years we have witnessed technological breakthroughs and deployments, industrial integration and reshaping, and wave after wave of companies, people, and products driving the era forward.

In an age where AI is redefining everything, intelligent technology is no longer a single tool but a driving force for the co-evolution of industry and society. Through this annual selection we hope to discover and honor the explorers and practitioners who truly lead change and push boundaries.

The selection covers three dimensions, companies, products, and people, with five award categories. Companies are warmly invited to register! Detailed selection criteria and registration instructions follow.

Company rankings
- 2025 AI Annual Leading Company: open to China's AI field, honoring the companies with the strongest overall capability. Entry requirements: Selection criteria:
- 2025 AI Annual Promising Startup: focused on China's ...

Product rankings
- 2025 AI Annual Outstanding Product
- 2025 AI Annual Outstanding Solution

People rankings
- 2025 AI Annual Person in Focus

Let us witness the stars of the year together and light the way forward.
A hundred billion ChatGPT tokens take out 5,000 McKinsey consultants
量子位· 2025-10-21 03:38
Core Insights
- McKinsey received an award from OpenAI for being a major client in token consumption, raising questions about the traditional consulting model now that it relies on AI-generated content [1][3][4].
- The consulting industry is undergoing a significant transformation as firms like McKinsey and BCG adopt AI to improve operational efficiency and redefine their service offerings [5][19].

AI Integration in Consulting Firms
- McKinsey has been proactive in AI adoption, acquiring QuantumBlack in 2015, which has since evolved into its AI-native consulting division [7][10][13].
- Its internal AI, Lilli, lets consultants automate PPT generation and streamline research, with over 70% of employees using it [14][18].
- BCG has developed multiple internal AI tools, with nearly 90% of its employees using AI in daily work, signaling a competitive push in AI integration [21][25].

Workforce Changes and Challenges
- McKinsey has laid off over 5,000 employees, roughly 10% of its workforce, attributed to pandemic-era overexpansion and AI's impact on job roles [27][28][30].
- AI now handles about 30% of information-gathering tasks, raising productivity but also concerns about the future of entry-level positions [32][33][56].
- Entry-level hiring is declining, with a 54% drop in recruitment of junior consultants as firms prioritize experienced hires [60][63].

Emergence of AI-Driven Startups
- New AI-driven companies are offering alternatives to traditional consulting services, targeting small and mid-sized enterprises that cannot afford established firms like McKinsey [49][52].
- These startups use AI to automate consulting processes, posing a competitive threat to incumbents with cost-effective, immediate solutions [41][53].

The Future of Consulting
- The consulting industry is undergoing a fundamental transformation, with AI replacing traditional roles and altering the career trajectory of new consultants [55][72].
- Despite these challenges, human consultants are still expected to be needed for complex problem-solving and insights that AI cannot replicate [69][70].