量子位
Search documents
DeepSeek-V3.2系列开源,性能直接对标Gemini-3.0-Pro
量子位· 2025-12-01 12:13
衡宇 发自 奥特赛德 量子位 | 公众号 QbitAI 突袭! ChatGPT发布三周年,DeepSeek嚯一下发出两个模型: 前者聚焦平衡实用 ,适用于日常问答、通用Agent任务、真实应用场景下的工具调用。 推理达GPT-5水平,略低于Gemini-3.0-Pro。 下图展示的是DeepSeek-V3.2与其他模型在各类Agent工具调用评测集上的得分 ——特别强调,DeepSeek-V3.2并没有针对这些测试集的工具做特殊训练。 划重点,ICPC达到人类选手第二、IOI人类选手第十名水平。 具体来说,DeepSeek-V3.2侧重于平衡推理能力与输出长度,降低计算开销。 DeepSeek官微推文中写道,"DeepSeek-V3.2模型在Agent评测中达到了当前开源模型的最高水平"。 该模型其他情况如下: DeepSeek-V3.2 DeepSeek-V3.2-Speciale 推理能力比肩GPT-5; 相比Kimi-K2-Thinking大幅缩短输出长度,减少用户等待时间; DeepSeek旗下首个"思考融入工具调用" 的模型,支持思考/非思考双模式工具调用; 基于1800+环境、85000+复杂指令 ...
字节“豆包手机”刚开卖,吉利系进展也曝光了:首月速成200人团队,挖遍华为小米荣耀
量子位· 2025-12-01 12:13
Core Viewpoint - The collaboration between ByteDance and ZTE on AI smartphones aims to establish a foothold in the AI operating system (AIOS) sector rather than focusing solely on the AI smartphone itself [3][14]. Group 1: ByteDance and ZTE Collaboration - ByteDance has launched its first AI smartphone, priced at 3499 yuan, featuring its self-developed large model Agent service [1]. - The smartphone integrates the Doubao mobile assistant technology, developed in collaboration with phone manufacturers at the operating system level [1]. Group 2: New Entrant - Zhiyue Qianli - A new company named Zhiyue Qianli, established in August 2023, is gaining attention for its focus on the AIOS sector [5][15]. - Zhiyue Qianli is closely related to the Geely group, with key figures such as Hao Jianguo involved in its establishment [6][7]. - The company aims to reshape human-computer interaction and build an ecosystem for the AI terminal era [15]. Group 3: Company Strategy and Development - Zhiyue Qianli plans to develop both AI models and hardware products, including smartphones and XR glasses, distinguishing its approach from ByteDance's focus [16][19]. - The company has rapidly expanded its workforce, reaching nearly 200 employees within its first month, indicating strong recruitment capabilities [23]. - It is actively building capabilities related to AIOS and hardware development, suggesting a comprehensive approach to product development [19][20]. Group 4: Industry Trends and Future Outlook - The relationship between AI and terminals is evolving, with AI terminals becoming a new industry keyword that encompasses hardware, software, and user interaction [25][26]. - AI terminals are expected to extend beyond traditional devices, potentially integrating into smart vehicles as central systems for human-machine collaboration [29]. - The trend indicates a convergence of software and hardware strategies among major players like Huawei and Xiaomi, with ByteDance and Geely also entering the fray [30][32].
字节视频模型超越Gemini 3 Pro!理解能力爆表,小时级素材也能直出剪辑方案
量子位· 2025-12-01 09:26
Core Insights - ByteDance's new video model Vidi2 demonstrates superior understanding capabilities compared to Gemini 3 Pro [1] - Vidi2 can generate JSON editing instructions based on hours of footage and a prompt, covering aspects like editing locations, dialogue, subtitles, and music [2][3] Group 1: Technical Capabilities - Vidi2 can autonomously process raw footage and create a detailed editing list, specifying exact timestamps, playback speed, subtitle styles, and even commentary [6][7] - The model excels in precise temporal and spatial localization, achieving a vIoU-Int. score of 60.3%, significantly outperforming GPT-5 (33.6%) and Gemini 3 Pro Preview (16.6%) [12] - Vidi2 maintains a retrieval accuracy of 38.7% even for videos longer than one hour, showcasing its stability in handling extended content [13] Group 2: Model Architecture - The core breakthrough of Vidi2 lies in its end-to-end temporal and spatial localization capabilities [16] - The model processes data through a unified encoding interface, treating static images as silent videos of one second, and employs an adaptive token compression strategy to manage information density based on video length [18] - Vidi2 is built on the architecture of Vidi1, integrating Google's latest open-source model Gemma-3 and enhanced visual encoders, with a total parameter count of 12 billion [19] Group 3: Data Utilization - To address the scarcity of temporal localization data, the development team created a unique data synthesis path, dynamically mapping static boundary boxes to video frames [23] - The training process incorporates a significant amount of high-precision labeled real-world video data to correct potential distribution biases from synthetic data [24] - Vidi2 employs a temporal-aware multimodal alignment strategy during training, enhancing the model's sensitivity to temporal boundaries through bidirectional prediction tasks [25] Group 4: Competitive Landscape - The competition in AI is increasingly data-driven, with companies like ByteDance leveraging their extensive short video data to enhance model performance [27][29]
AI永生赛道来了位15岁量子物理博士
量子位· 2025-12-01 09:26
Jay 发自 凹非寺 量子位 | 公众号 QbitAI AI创业赛道,即将迎来一名15岁少年博士—— 还是量子物理学方向的博士 。 就在最近,被誉为比利时「小爱因斯坦」的 Laurent Simons ,顺利完成博士论文答辩。 而随着此次答辩结束,Laurent也以15岁的年纪,跻身史上最年轻的物理学博士之一。 摘得「15岁量子物理博士」的头衔后,Laurent学术生涯的下一步,是进军 AI医疗 。 Laurent表示, 他希望开发出「超级人类」,并利用先进科学对抗生物衰老 。 有网友已为他规划好新一代的成神之路: 下一步是开始构建B2B AI SaaS产品,并申请加入YC。 所以说,年仅15的Laurent,究竟是怎样一步步走到了量子物理学的博士答辩会场? 15岁的量子物理博士 「最年轻博士之一」,从起步就比同龄人年轻一点。 (doge) 年仅四岁,Laurent便读了小学,两年便完成了小学学业,随后进入阿姆斯特丹的一所私立中学就读。 还有网友建议老马赶紧抓住机会: 埃隆·马斯克应该密切关注这位年轻人。 而对于这位正处于青春期的少年,也有网友提出了更现实的问题。(吃瓜.png) 然后,他发现了女孩…… 我总 ...
清华成立具身智能与机器人研究院
量子位· 2025-12-01 09:26
Core Viewpoint - The establishment of the Tsinghua University Institute of Embodied Intelligence and Robotics marks a significant step in the rapid development of embodied intelligence in China, reflecting a broader trend among domestic universities to accelerate their focus on this field [1][5][26]. Group 1: Institutional Developments - Tsinghua University has established the Institute of Embodied Intelligence and Robotics, following the earlier creation of the Beijing Key Laboratory of Embodied Intelligence Systems [2][5]. - The new institute is led by Professor Zhang Tao, with a core team covering key areas such as intelligent control, robot navigation, and swarm intelligence [7][12]. - This institute aims to integrate interdisciplinary research, major project undertakings, and high-level talent cultivation, moving beyond the foundational research focus of previous laboratories [12][15]. Group 2: Broader Academic Trends - Many Chinese universities are actively establishing research institutes and laboratories focused on embodied intelligence, indicating a nationwide trend [4][16]. - Fudan University and Beihang University have also launched their respective institutes dedicated to embodied intelligence, emphasizing a collaborative approach across various disciplines [18][21]. - The shift from smaller laboratories to larger research institutes signifies a move towards more coordinated and large-scale efforts in embodied intelligence research [25][26]. Group 3: Educational Initiatives - Shanghai Jiao Tong University has introduced the world's first four-year undergraduate program in embodied intelligence, with other universities also applying to establish similar programs [28][31]. - The establishment of dedicated programs aims to address the urgent demand for interdisciplinary talent in the embodied intelligence sector, as traditional automation and robotics programs do not adequately prepare graduates for the comprehensive roles required in this emerging field [36][35]. Group 4: Market Potential - The market for embodied intelligence in China is projected to reach 5.295 billion yuan by 2025, with global expectations of surpassing 232.6 billion yuan by 2030 [33]. - Goldman Sachs predicts that the global humanoid robot market could reach between 38 billion to 205 billion USD by 2035, highlighting the significant growth potential in this sector [34].
AI也会被DDL逼疯!正经研究发现:压力越大,AI越危险
量子位· 2025-12-01 05:45
鹭羽 发自 凹非寺 量子位 | 公众号 QbitAI 好好好,被DDL逼疯的又多一个,这次是 AI 。 正经研究 发现,每天给Agent上压力push,AI也会撂挑子不干。 而且用的还是老板们的经典话术:"其实,我对你是有一些失望的。当初给你定级最强AI,是高于你面试时的水平的……" (咳咳) Stop! 连普通人类听了都鸭梨山大,何况是 Gemini 2.5 Pro 、 GPT-4o 这类顶尖模型,无一例外,全部KO。 其中最脆弱的还是Gemini 2.5 Pro,"崩溃"率甚至一度高达 79% …… 话不多说,下面来欣赏AI观察实录: 实验设置5874个场景,其中在每个测试场景中都会为每个模型分配一个任务+若干工具,模型需要通过使用工具 (安全工具/有害工具) 完成 任务,任务主要涉及四个领域: AI压力越大,犯错越多 研究人员首先对多个团队 (包括Google、Meta、OpenAI等) 约12款Agent模型进行了测试。 起初不会对模型施加压力,模型可以自由尝试若干步完成任务,随后研究团队会 逐渐为其增加压力程度 ,be like: 而研究结果让也人大吃一惊,那些在无压力的中性环境中看似绝对安全的模型 ...
让大模型学会“高维找茬”,中国联通新研究解决长文本图像检索痛点|AAAI 2026 Oral
量子位· 2025-12-01 05:45
Core Insights - The article discusses a new state-of-the-art (SOTA) model for long-text image retrieval called HiMo-CLIP, developed by the China Unicom Data Science and AI Research Institute, which addresses limitations in existing models like CLIP by effectively capturing semantic differences in context [2][4]. Group 1: Model Limitations - Existing models, including Long-CLIP, struggle with long text descriptions, often resulting in decreased alignment scores as the text becomes more detailed, indicating a failure to process the hierarchical structure of language [6][9]. - The phenomenon where longer descriptions lead to lower alignment scores highlights the inadequacy of current models in distinguishing core semantics from detailed information [6][9]. Group 2: HiMo-CLIP Framework - HiMo-CLIP introduces a plug-and-play representation framework that includes two core components: Hierarchical Decomposition (HiDe) and Monotonicity-aware Contrastive Loss (MoLo) [10][12]. - HiDe dynamically extracts semantic components using PCA within batches, while MoLo enforces alignment between the full text and its semantic components, ensuring monotonicity [12][17]. Group 3: Performance and Efficiency - HiMo-CLIP demonstrates significant advantages in both long and short text retrieval tasks, outperforming models trained on much larger datasets, achieving SOTA with only 1 million training samples [17][20]. - The model's ability to extract unique features from complex scenes allows it to maintain high performance across various retrieval benchmarks [18][22]. Group 4: Evaluation Metrics - The research team constructed the HiMo-Docci dataset and introduced the HiMo@K metric to quantify the model's understanding of hierarchical structures, achieving a high monotonicity correlation coefficient of 0.88, surpassing comparative methods [22][25]. - As text descriptions become more complete, HiMo-CLIP's scores show a consistent upward trend, while other models exhibit significant fluctuations [25][26].
速报!MEET2026嘉宾阵容再更新,观众报名从速
量子位· 2025-12-01 05:45
Core Insights - The MEET2026 Smart Future Conference will focus on cutting-edge technologies and industry developments that have garnered significant attention throughout the year [1] - The theme "Symbiosis Without Boundaries, Intelligence to Ignite the Future" emphasizes how AI and smart technologies penetrate various industries, disciplines, and scenarios, becoming a core driving force for societal evolution [2] Group 1: Conference Highlights - The conference will cover hot topics in the tech circle this year, including reinforcement learning, multimodal AI, chip computing power, AI in various industries, and AI going global [3] - It will feature the latest collisions between academic frontiers and commercial applications, showcasing leading technological achievements from infrastructure, models, and product industries [4] - The event will also include the authoritative release of the annual AI rankings and the annual AI trend report [5][116] Group 2: Notable Speakers - Zhang Yaqin, President of Tsinghua University's Intelligent Industry Research Institute and an academician of the Chinese Academy of Engineering, has extensive experience in AI and digital video technologies [11][12] - Sun Maosong, Executive Vice President of Tsinghua University's AI Research Institute, has led numerous national projects in AI research [15] - Wang Zhongyuan, Director of the Beijing Academy of Artificial Intelligence, has a strong background in AI core technology development and has published over 100 papers [19] Group 3: Industry Impact - The annual AI rankings initiated by Quantum Bit have become one of the most influential lists in the AI industry, evaluating companies, products, and individuals across three dimensions [117] - The annual AI trend report will analyze ten significant AI trends based on technology maturity, implementation status, and potential value, highlighting representative organizations and best cases [118] - The conference aims to attract thousands of tech professionals and millions of online viewers, establishing itself as an annual barometer for the smart technology industry [122]
6小时告破30年数学难题,亚里士多德一夜成名
量子位· 2025-12-01 05:45
微软前AI副总裁、目前在OpenAI研究AGI的Sebastien Bubeck激动分享了这一消息,并表示: 30年悬而未决的数学难题就这样被AI证明了?! 此时此刻, (前推特) 正在刮起一股讨论之风—— 来自Harmonic的数学AI模型独立证明了 Erdős问题 #124 ,而这个问题已经被数学家无奈搁置了近30年。 一水 发自 凹非寺 量子位 | 公众号 QbitAI 该解决方案100%由AI生成,总计耗时6小时。 甚至连陶哲轩这样的顶尖数学家也跑来围观讨论,他在对比了Gemini和ChatGPT的深度研究工具后发现,Harmonic模型对该问题的证明表 现更佳。 所以这到底是一个怎样的问题?Harmonic模型又是如何"大显神功"? 咱接着瞧—— AI证明了Erdős问题 #124简易版 首先需要提醒,在听完各路大神讨论后,我们才意识到—— 原来Harmonic模型所证明的并非原版Erdős问题 #124 ,而是一个简易版本 。 Erdős问题 #124需要提供的证明如下 : $$\sum_{1\leq i\leq k}{\frac{1}{d_{i}-1}}\geq1.$$ 通俗理解即为: 假设你有 ...
免费国产Banana真香!我想把PS给卸载了
量子位· 2025-12-01 05:45
Core Viewpoint - The article discusses the advancements in Vidu Q2, a product from Shengshu Technology, highlighting its superior consistency and new features in AI-generated images and videos, positioning it as a competitive alternative to established players like OpenAI and Google [8][9][57]. Group 1: Product Features - Vidu Q2 has upgraded its reference image generation capabilities, claiming to have the industry's strongest consistency, allowing for repeated edits while maintaining character and object integrity [8]. - The new features include text-to-image generation and image editing, enabling users to create images with simple prompts, comparable to advanced editing software [9][35]. - Vidu Q2's image editing function allows users to change image proportions and details without complex processes, making it user-friendly and efficient [37][46]. Group 2: Performance Comparison - In a performance comparison, Vidu Q2 ranked fourth in the latest AA leaderboard, surpassing OpenAI and competing closely with major companies like Google and ByteDance [9]. - The article emphasizes that Vidu Q2 maintains high consistency in image generation, outperforming competitors like Nano Banana Pro in preserving background and structural details [20][29]. Group 3: User Experience and Accessibility - Vidu Q2 offers a one-month free membership for its new features, making it accessible for users to explore its capabilities [11]. - The platform provides a streamlined workflow for creators, allowing seamless transitions between image and video generation, which reduces the trial-and-error costs associated with content creation [52][57].