量子位
AI Q&A, "filmed" for you directly! From Kuaishou Kling & City University of Hong Kong
量子位· 2025-11-22 03:07
Core Insights
- The article introduces a novel AI model called VANS, which generates videos as answers instead of traditional text responses, aiming to bridge the gap between understanding and execution in tasks [3][4][5].

Group 1: Concept and Motivation
- The motivation behind this research is to utilize video, which inherently conveys dynamic physical-world information that language struggles to describe accurately [5].
- The traditional approach to "next event prediction" has primarily focused on text-based answers, whereas VANS proposes a new task paradigm in which the model generates a video as the response [8][9].

Group 2: Model Structure and Functionality
- VANS consists of a visual language model (VLM) and a video diffusion model (VDM), optimized through a joint strategy called Joint-GRPO, which enhances collaboration between the two models [19][24].
- The workflow involves two main steps: perception and reasoning, in which the input video is encoded and analyzed, followed by conditional generation, in which the model creates a video based on the generated text caption and visual features [20].

Group 3: Optimization Process
- The optimization process is divided into two phases: first, enhancing the VLM to produce captions that are visually representable, and second, refining the VDM to ensure the generated video aligns semantically with the caption and the context of the input video [25][28].
- Joint-GRPO acts as a director, ensuring that the "thinker" (VLM) and the "artist" (VDM) work in harmony, improving their outputs through mutual feedback [34][36].

Group 4: Applications and Impact
- VANS has two significant applications: procedural teaching, where it can provide customized instructional videos based on user input, and multi-future prediction, allowing creative exploration of various hypothetical scenarios [37][41].
- The model has shown superior performance in benchmarks, significantly outperforming existing models on metrics such as ROUGE-L and CLIP-T, indicating its effectiveness in both semantic fidelity and video quality [46][47].

Group 5: Experimental Results
- Comprehensive evaluations demonstrate that VANS excels in procedural teaching and future-prediction tasks, achieving nearly three times the event-prediction accuracy of the best existing models [44][46].
- Qualitative results highlight VANS's ability to accurately visualize fine-grained actions, showcasing its advanced semantic understanding and visual generation capabilities [50][53].

Conclusion
- The research on Video-as-Answer represents a significant advance in video generation technology, moving beyond entertainment to practical applications and enabling more intuitive interaction with machines and knowledge [55][56].
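The two-stage workflow described in the summary (a "thinker" VLM handles perception and reasoning; an "artist" VDM handles conditional generation) can be sketched in a few lines. This is a toy illustration only: `ToyVLM`, `ToyVDM`, and `video_as_answer` are illustrative stand-ins, not the actual VANS interfaces.

```python
# Toy sketch of a Video-as-Answer pipeline: VLM produces a text caption,
# VDM conditions on that caption plus visual features to make the answer
# video. All names and behavior here are hypothetical stand-ins.

class ToyVLM:
    def perceive_and_reason(self, video_frames, question):
        # Real VANS encodes and reasons over the input video; this stub
        # just fabricates a caption from the inputs.
        return f"caption for {len(video_frames)} frames answering: {question}"

class ToyVDM:
    def generate(self, caption, visual_features):
        # Real VANS runs a video diffusion model conditioned on caption
        # and visual features; this stub returns a placeholder record.
        return {"caption": caption, "conditioned_on": len(visual_features)}

def video_as_answer(video_frames, question, vlm, vdm):
    caption = vlm.perceive_and_reason(video_frames, question)   # step 1: perception + reasoning
    return vdm.generate(caption, visual_features=video_frames)  # step 2: conditional generation

answer = video_as_answer(["f0", "f1", "f2"], "what happens next?", ToyVLM(), ToyVDM())
print(answer["conditioned_on"])  # → 3
```

The point of the sketch is the data flow: the caption produced in step 1 is the bridge that Joint-GRPO optimizes, since both the VLM (to emit visually representable captions) and the VDM (to stay faithful to them) are scored against it.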
Homegrown AI takes International Physics Olympiad gold — 12 golds and 1 silver across 13 top competitions. Key point: it's open source
量子位· 2025-11-22 03:07
Core Insights
- The article discusses the achievements of the P1 model family developed by the Shanghai Artificial Intelligence Laboratory, particularly the P1-235B-A22B model, which has excelled in various physics competitions, including the International Physics Olympiad (IPhO) 2025, where it became the first open-source model to reach the gold-medal threshold [1][3][37].

Group 1: Model Performance
- P1-235B-A22B scored 21.2 out of 30 on the IPhO 2025 theoretical exam, ranking third overall, just behind Gemini-2.5-Pro and GPT-5 [3][37].
- On the HiPhO benchmark, which includes 13 top physics competitions, the average score of P1-235B-A22B improved from 35.9 to 38.4 after integrating the PhysicsMinions framework, surpassing Gemini-2.5-Pro (37.7) and GPT-5 (37.4) [5][38].
- In the Chinese Physics Olympiad (CPhO) 2025, P1-235B-A22B achieved a score of 227 out of 320, significantly higher than the human gold medalist's score of 199 [6][41].

Group 2: Training Methodology
- The model was trained using a multi-stage reinforcement learning process, formalizing physics problem-solving as a sequential decision-making task [19][20].
- A high-quality dataset of 5,065 physics problems was constructed, including 4,126 from Olympiads and 939 from textbooks, covering five major fields and 25 subfields [11][13].
- The training utilized a novel Group Sequence Policy Optimization (GSPO) method to enhance learning efficiency and address the sparsity of rewards in physics problem-solving [20][23].

Group 3: Open Source and Collaboration
- The entire pipeline, from model architecture to evaluation datasets and the intelligent-agent framework, has been made fully open source [9].
- The PhysicsMinions framework, consisting of three interacting modules (Visual Studio, Logic Studio, and Review Studio), was designed to enhance the reasoning quality of the model [30][33].
- The collaborative approach within PhysicsMinions allows continuous improvement of answers through a structured review process [30][33].

Group 4: Competitive Edge
- P1-235B-A22B achieved 12 gold medals and 1 silver across the 13 competitions, ranking it among the top models in the field [34][38].
- The lightweight model P1-30B-A3B also performed well, securing 8 gold, 4 silver, and 1 bronze medal, placing it third among open-source models [38].
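One ingredient shared by GRPO-family methods such as the GSPO variant mentioned above is group-relative reward normalization: sample a group of answers per problem, score each one, and normalize its reward against the group's mean and standard deviation so that even sparse 0/1 correctness rewards yield a usable learning signal. The sketch below shows only that normalization step, as a generic illustration; it is not the laboratory's actual training code, and the 0/1 reward is an assumed example.

```python
# Group-relative advantage computation, the normalization idea behind
# GRPO-style methods. Each sampled answer's reward is centered and
# scaled by its own group's statistics.

import statistics

def group_advantages(rewards):
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)   # population std over the group
    if std == 0:                       # all answers equally good/bad:
        return [0.0 for _ in rewards]  # no signal, zero advantage
    return [(r - mean) / std for r in rewards]

# Four sampled solutions to one physics problem, scored 0/1 for correctness.
adv = group_advantages([1.0, 0.0, 0.0, 1.0])
print(adv)  # → [1.0, -1.0, -1.0, 1.0]
```

The zero-std guard matters in practice: on problems where every sampled answer fails (a common case with sparse physics rewards), the group provides no gradient signal rather than a divide-by-zero.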
The first academician born in the 1980s comes from PKU's School of Mathematical Sciences
量子位· 2025-11-22 03:07
鱼羊, from 凹非寺 | 量子位 QbitAI official account

The "golden generation" of Peking University's School of Mathematical Sciences now has an academician among its ranks.

Liu Ruochuan (刘若川), a member of the class of 1999, currently a Boya Distinguished Professor at Peking University and vice dean of its School of Mathematical Sciences, has just been elected an academician of the Chinese Academy of Sciences.

△ Image source: official website of the PKU School of Mathematical Sciences

Born in 1980 and elected at age 44, he is the youngest of this year's newly elected academicians of the two academies, and the first academician born in the 1980s.

Besides completing both his bachelor's and master's studies at PKU, Liu returned there to teach as early as 2012, and has said he wants to "produce the best research results in China."

The first academician born in the 1980s

Liu's mathematical talent showed early in his student years: he won a gold medal at the 40th International Mathematical Olympiad (IMO) in 1999, and that same year entered Peking University's School of Mathematical Sciences through the recommendation track.

At PKU, Liu studied under Professor Tian Gang and completed the undergraduate and master's programs in five years, receiving a Bachelor of Science in 2002 and a Master of Science in 2004.

After earning his PhD from MIT in 2008, Liu went to Paris Diderot University (Paris 7) for postdoctoral research. He returned to Peking University in 2012, teaching first at the Beijing International Center for Mathematical Research and then at the School of Mathematical Sciences.

Liu's main research areas are arithmetic geometry and algebraic number theory.

He has achieved a series of outstanding results at contemporary frontiers of mathematics, including p-adic Hodge theory, p-adic automorphic forms, and algebraic K-theory, and in particular has done pioneering work on noncommutative p-adic Hodge theory.

He ...
Register early! MEET2026's latest speaker lineup announced — let's talk AI
量子位· 2025-11-22 03:07
Core Insights
- The article emphasizes the transformative impact of artificial intelligence (AI) on various industries, marking the beginning of a new era in 2025 [1].
- The MEET2026 Intelligent Future Conference will focus on cutting-edge technologies and industry advancements related to AI [2][3].
- The conference will feature discussions on key topics such as reinforcement learning, multimodal AI, chip computing power, and AI applications across industries [4].

Event Details
- The MEET2026 conference will take place on December 10, 2025, at the Beijing Jinmao Renaissance Hotel, with registration now open [105][107].
- The event aims to attract thousands of technology professionals and millions of online viewers, establishing itself as a significant annual technology business summit [107].

Key Themes and Reports
- The conference will highlight the intersection of academic advancements and commercial applications, showcasing leading technological achievements from various sectors [5].
- An annual AI ranking and trend report will be released during the conference, focusing on influential companies, products, and individuals in the AI industry [6][102][103].
- The 2025 AI Trend Report will analyze ten significant AI trends based on technology maturity, implementation status, and potential value [104].

Notable Speakers
- The conference will feature prominent figures from academia and industry, including experts from Tsinghua University, JD.com, Xiaomi, and other leading organizations [12][17][22][30][40][44][48][53][58][63][67][75][80][86][90][95].
- These speakers will share insights on the latest developments in AI and its applications across various fields [5][107].
A conversation with Fan Haoqiang: before the 1-billion-yuan round, we hand-built a 5,000-yuan "budget" rig
量子位· 2025-11-21 09:00
Core Viewpoint
- The article discusses the emergence of a new player in the field of embodied intelligence, tracing the founding team's journey from their earlier AI experience to their current venture, which focuses on practical applications in logistics and robotics [4][5][20].

Group 1: Company Formation and Background
- The founding team of Yuanli Lingji consists of veterans from the AI 1.0 era, specifically from the company Megvii, bringing extensive experience in moving AI from laboratory settings to industrial applications [5][6].
- The initial inspiration for the startup came from the realization that many previously imported robotics components are now available domestically, providing a solid material foundation for development [9][10].
- The company was officially established in March 2025 after a year of experimentation and prototype development [17][18].

Group 2: Business Focus and Strategy
- Yuanli Lingji aims to penetrate the logistics sector, focusing on high-frequency, rule-based tasks such as sorting and distribution, leveraging its self-developed multimodal embodied-intelligence model [20][21].
- Within ten months of its founding, the company had demonstrated basic delivery capabilities and completed proof of concept (POC) in logistics scenarios [22][25].
- The founders emphasize that hardware, AI, and application scenarios are equally critical to the success of robotics in industrial settings [26][60].

Group 3: Technological Development and Innovation
- The company is developing its own hardware to meet industrial standards, focusing on reliability, consistency, and ease of maintenance, with plans to release a new generation of embodied robots [27][28].
- The founding team has a strong AI background, having achieved significant milestones in various applications, which positions it well for the current AI 2.0 landscape [30][32].
- Yuanli Lingji has released several open-source tools and platforms, including Dexbotic and Robochallenge, to lower barriers for researchers and developers in embodied intelligence [38][44][50].

Group 4: Market Perspective and Future Outlook
- The company acknowledges the market's current caution, with potential industrial clients in a phase of observation and exploratory investment [60][62].
- Drawing on their AI 1.0 experience, the founders believe the development and adoption of the technology will follow a long cycle, and they are committed to a patient, steady growth strategy [65][66].
- Yuanli Lingji aims to contribute to standardization and open collaboration in embodied intelligence, fostering a community that can innovate collectively [47][58].
ChatGPT is getting into social
量子位· 2025-11-21 09:00
henry, from 凹非寺 | 量子位 QbitAI official account

Altman is in a hurry right now.

While Google next door fires off "big moves" with zero cooldown, ChatGPT has shipped a rather unremarkable "small feature": group chat.

About the only thing it adds is a smarter chatbot that can read the room and decide whether to speak up or stay muted.

Interestingly, not long ago at the 2025 Developer Day, Altman firmly declared that OpenAI would "never build an American WeChat, never do social." Barely a month later, he has reversed course.

This update looks, from any angle, like "the neighbors are all doing it, so maybe we should tidy up and follow suit?" It feels a bit like making features for their own sake...

You can't say this GPT-flavored group chat is identical to group chats on WeChat and QQ, but you can say it is strikingly similar.

ChatGPT's version of group chat

Here is what happened: right around Google's release of Nano Banana Pro, ChatGPT rolled out a free group chat feature across all plans (Free, Go, Plus, and Pro).

OpenAI says the feature received consistently positive feedback in some pilot regions.

Much like creating a group on typical social apps, tapping the icon at the top right of a conversation in the ChatGPT app creates a group; a shared link can invite up to 20 people, and group members can forward the invitation further.

Likewise, joining a GPT group chat involves setting an avatar, a group nickname, and profile details. All group chats appear in the left sidebar, which also supports management and group-nickname settings, ...
Register early! MEET2026's latest speaker lineup announced — let's talk AI
量子位· 2025-11-21 06:29
Core Viewpoint
- The article emphasizes the transformative impact of artificial intelligence (AI) on various industries and society as a whole, highlighting the upcoming MEET2026 conference as a platform to explore these advancements and trends in AI technology [1][3].

Group 1: Conference Overview
- The MEET2026 Intelligent Future Conference will focus on cutting-edge technologies and industry developments, particularly in AI [2].
- The theme of the conference is "Symbiosis Without Boundaries, Intelligence to Ignite the Future," aiming to explore how AI transcends industry, discipline, and scenario boundaries [3].
- Key topics of discussion will include reinforcement learning, multimodal AI, chip computing power, AI applications in various industries, and AI's global expansion [4].

Group 2: Notable Speakers
- The conference will feature prominent figures such as Zhang Yaqin, a renowned scientist and former president of Baidu who has made significant contributions to digital video and AI [12][13].
- Sun Maosong, Executive Vice President of the Tsinghua University AI Research Institute, will also be a key speaker, known for his leadership in national research projects [17].
- Other notable speakers include Wang Zhongyuan, Director of the Beijing Academy of Artificial Intelligence, and He Xiaodong, Senior Vice President of JD Group, both recognized for their contributions to AI research and applications [21][30].

Group 3: AI Trends and Reports
- The conference will feature the release of the "Artificial Intelligence Annual List" and the "Annual AI Trends Report," which are expected to provide insights into the most influential companies, products, and individuals in the AI sector [6][102].
- The "Annual AI Trends Report" will identify and analyze ten significant AI trends based on technology maturity, current applications, and potential value [104].

Group 4: Event Details
- The MEET2026 conference is scheduled for December 10, 2025, at the Beijing Jinmao Renaissance Hotel, with registration now open for attendees [105].
- The event is recognized as a major technology business summit, attracting thousands of industry professionals and millions of online viewers each year and serving as a key indicator of trends in the intelligent-technology sector [107].
Mind-blowing! The whole internet is testing Nano Banana Pro — netizens ask: what on earth is inside this model?
量子位· 2025-11-21 06:29
Core Insights
- Google has launched Nano Banana Pro, a powerful image-generation model that has garnered significant attention and excitement across the internet [10][11].
- The model integrates multi-modal understanding capabilities from Gemini 3 Pro and Google's extensive knowledge base, allowing it to comprehend real-world semantics and physical logic [12].

Features and Capabilities
- Users can access Nano Banana Pro for free through the Gemini application, although free accounts have usage limits; subscribers to Google AI Plus, Pro, and Ultra enjoy higher quotas [13].
- The model supports high-resolution outputs, including 2K and 4K, and can generate complex professional charts, enhancing its utility for various applications [15][46].
- It has improved text-rendering capabilities, allowing multi-language support and direct translation of text within images [15].

User Experience and Performance
- Initial tests demonstrated the model's ability to create detailed and aesthetically pleasing visual outputs, such as exploded views of bicycle components and scenes with dolls [14][20].
- The model's performance is influenced by the specificity of user prompts, with clearer instructions leading to better results [23].
- Users have reported a surge in creative applications of Nano Banana Pro, showcasing its versatility in generating illustrations, infographics, and even comic strips [28][34][42].

Industry Impact
- The launch of Nano Banana Pro is seen as a significant advance in AI-generated imagery, pushing the boundaries of what is possible in this field [26].
- Google CEO Sundar Pichai has endorsed the model, highlighting its advanced image generation and editing capabilities, which are designed to meet the needs of professionals in various industries [46].
A 4K super-resolution agent retoucher arrives: revive every blurry photo with one click
量子位· 2025-11-21 06:29
Core Insights
- The article discusses the development of 4KAgent, an AI-based system designed to intelligently restore and upscale images to 4K resolution, addressing the limitations of traditional image-enhancement methods [3][6][28].

Group 1: Technology Overview
- 4KAgent utilizes a multi-agent design to create a tailored pathway for each image toward 4K resolution, enhancing visual perception [6][7].
- The system incorporates a perception agent that analyzes image content and degradation information, generating a restoration plan based on various quality metrics [10][11].
- The restoration agent employs an "execution-reflection-rollback" mechanism to iteratively optimize the restoration process, ensuring high-quality outputs [12][16].

Group 2: Functionality and Features
- 4KAgent supports nine different restoration tasks, utilizing state-of-the-art models to generate multiple candidate images for evaluation [13][14].
- A face-restoration module is integrated to specifically enhance facial details, ensuring high-quality results for images containing human faces [18].
- A configuration module allows users to customize preferences for different restoration scenarios without requiring additional training [20].

Group 3: Performance and Testing
- 4KAgent has been extensively tested across 11 different super-resolution tasks and 26 benchmark datasets, demonstrating superior detail and accuracy in restored images [21][27].
- In challenging scenarios, such as 16x upscaling, 4KAgent consistently produces highly detailed and realistic textures, showcasing its effectiveness in various applications [25][27].
- The system exhibits excellent generalization, performing well across diverse fields including natural scenes, portraits, AI-generated content, and scientific imaging [28].
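The "execution-reflection-rollback" mechanism described above can be sketched generically: apply a candidate restoration step (execution), score the result (reflection), keep it only if quality improves, and otherwise discard it (rollback). Everything below, including `restore_with_rollback` and the toy scoring function, is an illustrative stand-in, not 4KAgent's actual models or code.

```python
# Generic execute-reflect-rollback loop: each step is tried, scored,
# and kept only when it improves on the best result so far.

def restore_with_rollback(image, steps, score):
    best, best_score = image, score(image)
    for step in steps:
        candidate = step(best)          # execution: try a restoration step
        s = score(candidate)            # reflection: assess the result
        if s > best_score:              # keep only genuine improvements...
            best, best_score = candidate, s
        # ...otherwise roll back by discarding the candidate
    return best

# Toy example: "images" are numbers, and quality is closeness to 10.
steps = [lambda x: x + 4, lambda x: x - 7, lambda x: x + 3]
result = restore_with_rollback(2, steps, score=lambda x: -abs(x - 10))
print(result)  # → 9 (the x - 7 step was rolled back)
```

The key property, which carries over to the real system, is monotonicity: because failed steps are discarded rather than accumulated, the pipeline's quality score never decreases as more candidate steps are tried.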
One rallying call, and most of the embodied-robotics world showed up! Zhiyuan Research Institute: stop hoarding — whoever contributes more data gets a better-performing robot brain
量子位· 2025-11-21 06:29
Core Insights
- The article discusses the significant impact of the "Embodied Intelligence Martial Arts Conference" held by Zhiyuan Research Institute, which gathered major players in the robotics industry to address data-sharing and collaboration challenges [2][4][6].

Group 1: Zhiyuan's Role and Strategy
- Zhiyuan Research Institute aims to be the "Android" of the embodied-intelligence era, focusing on creating a collaborative ecosystem rather than competing directly in the market [5][21].
- The institute leverages its non-profit status to break down data silos, encouraging companies to share valuable data through mutual agreements [6][10].
- By providing a neutral platform, Zhiyuan positions itself as a "wall breaker," facilitating cooperation between the academic and industrial sectors [9][11].

Group 2: Addressing Industry Pain Points
- The robotics industry faces significant challenges due to data silos, where data from one type of robot cannot be utilized by another, leading to inefficiencies [7][8].
- Zhiyuan has introduced open-source, high-quality real-world data, addressing the industry's need for better data [15].
- The launch of the RoboXstudio development platform and the CoRobot data framework streamlines the development process for startups, allowing them to focus on product innovation [16][17].

Group 3: Standardization and Evaluation
- The lack of standardized evaluation metrics in robotics has led to discrepancies between demo performances and real-world applications [18][20].
- Zhiyuan has established the RoboChallenge committee to create quantifiable, traceable evaluation standards for robotic models [20].
- This initiative aims to ensure that all robotic models can be assessed fairly, promoting transparency and reliability in the industry [20].

Group 4: Future Vision and Ecosystem Development
- Zhiyuan envisions a future in which robot development is as simple as building with blocks, emphasizing the need for a robust foundational framework [24][25].
- The institute is focused on creating a comprehensive system for embodied intelligence, including advances in the RoboBrain and Emu models to enhance learning and understanding [23][26].
- By pooling industry data and establishing standards, Zhiyuan aims to become a fundamental resource for the embodied-intelligence sector, akin to an essential utility [26][29].