量子位
New Doubao model has Guo Degang shouting "meltdown literature": (This job) I quit! I quit! I quit!!!
量子位· 2025-10-16 06:11
Core Viewpoint - The article discusses advances in AI voice technology from Volcano Engine (火山引擎), focusing on upgrades to the Doubao voice synthesis and voice replication models that improve emotional expression and contextual understanding in AI-generated speech [5][11][41].
Group 1: AI Voice Technology Upgrades
- Volcano Engine has upgraded its Doubao voice synthesis model to version 2.0, which allows for better emotional expression and understanding of dialogue [7][11].
- The upgrade includes two main models, Doubao voice synthesis model 2.0 and Doubao voice replication model 2.0, enabling AI to replicate voices and understand emotional nuances [7][8].
- The new models can interpret user instructions regarding emotions, dialects, tones, and speech rates, significantly improving the quality of AI-generated speech [12][21].
Group 2: Contextual Understanding and Emotional Expression
- The models can now incorporate context from previous dialogue, enhancing the coherence and emotional depth of the generated speech [12][23].
- Reading complex formulas aloud has also improved: the Doubao model reaches around 90% accuracy on complex formulas from school subjects, compared with less than 50% for similar models [24][25].
- The advancements allow for a more human-like interaction, moving from merely sounding human to understanding human emotions and context [11][41].
Group 3: Technological Innovations and Applications
- The Doubao large model 1.6 has been upgraded to support adjustable thinking lengths, allowing users to balance effectiveness, latency, and cost [30][33].
- Volcano Engine has introduced a Smart Model Router, which matches each user task to the most suitable model and reduces costs by up to 71% in cost-priority mode [39][41].
- The technology has been applied in various commercial scenarios, enhancing user experiences in products from companies like Xiaomi and OPPO, and improving responses to complex requests on platforms like Dongchedi [45][46].
Group 4: Growth and Infrastructure
- The daily token usage of the Doubao large model has surged from 120 billion to over 30 trillion, a 253-fold increase in just over a year [47][48].
- This growth is supported by Volcano Engine's AI cloud infrastructure, which provides the computational power and high-quality data needed for model training and inference [48].
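The growth figures in Group 4 can be sanity-checked with a few lines of arithmetic. This is only a sketch: the three constants come from the summary, while the function and variable names are made up for illustration. "Over 30 trillion" is treated as a lower bound.

```python
# Sanity check of the token-usage growth quoted in the summary.
START_TOKENS_PER_DAY = 120_000_000_000        # 120 billion (from the summary)
END_TOKENS_LOWER_BOUND = 30_000_000_000_000   # "over 30 trillion", used as a lower bound
REPORTED_FOLD = 253                           # fold increase quoted in the summary


def fold_increase(start: int, end: int) -> float:
    """Return how many times larger `end` is than `start`."""
    return end / start


# Fold increase if daily usage were exactly 30 trillion tokens.
lower_bound_fold = fold_increase(START_TOKENS_PER_DAY, END_TOKENS_LOWER_BOUND)

# Daily usage implied by the reported 253-fold figure.
implied_end_tokens = REPORTED_FOLD * START_TOKENS_PER_DAY

print(f"fold at exactly 30 trillion: {lower_bound_fold:.0f}x")
print(f"daily tokens implied by 253x: {implied_end_tokens / 1e12:.2f} trillion")
```

Exactly 30 trillion would be a 250-fold increase, so the 253-fold figure corresponds to roughly 30.36 trillion tokens per day, which is consistent with "over 30 trillion".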
With Tim Cook in Beijing, an Android "AiPhone" goes on sale right under his nose at 4,499 yuan!
量子位· 2025-10-16 01:33
Core Viewpoint - The article covers the launch of the Honor Magic8 series, highlighting its AI capabilities and hardware improvements and positioning it as a competitive alternative to Apple's iPhone, particularly in the AI smartphone market [3][52].
Group 1: Product Features
- The Honor Magic8 series was officially launched with a starting price of 4,499 yuan [3].
- It features a new YOYO AI system that can learn and evolve, enhancing user interaction and functionality [4][24].
- The series includes a standard version and a Pro version, maintaining a familiar design while introducing new color options inspired by Song Dynasty ceramics [9][12].
Group 2: Battery and Performance
- The Magic8 series has a battery capacity exceeding 7,000 mAh, the largest in Honor's history, along with 120 W fast charging [15].
- It is powered by a 3 nm processor and runs MagicOS 10.0, with the Pro version achieving an AnTuTu score of over 4.28 million, a record in the smartphone industry [16][19][20].
Group 3: AI Capabilities
- The YOYO AI system is built on a self-evolving model, allowing it to learn from user interactions and improve over time [24][25].
- YOYO can assist with online shopping, making it easier for users to find the best deals [26][28].
Group 4: Camera System
- The Magic8 Pro features a 200 MP camera with advanced low-light capabilities and a new stabilization system to reduce motion blur [33][42].
- The camera system is designed to perform well in various lighting conditions, competing with high-end models like the iPhone 17 Pro [38][40].
Group 5: Future Developments
- Honor also introduced the MagicPad3 Pro tablet, which shares the same high-performance processor as the Magic8 series [45][46].
- A future AI terminal, the ROBOT PHONE, is expected to be unveiled in 2026 [50].
AI uncovers a potential new cancer therapy! Google and Yale team up on the immune system's cold-tumor problem
量子位· 2025-10-16 01:33
Core Viewpoint - The article discusses an advance in cancer treatment research from a collaboration between Google and Yale, centered on a new AI model called Cell2Sentence-Scale 27B, which is used to find ways of enhancing immune signals in cold tumors, a challenging area in cancer immunotherapy [1][2][4].
Group 1: AI and Cancer Treatment
- The Cell2Sentence-Scale 27B model has been developed to identify drugs that can enhance immune signals in specific immune environments, addressing the issue of cold tumors that evade immune detection [4][12].
- The model has been made available to the research community, promoting collaboration and further research in the field [5].
Group 2: Cold Tumors Explained
- Cold tumors are characterized by a lack of immune signals, making them difficult for the immune system to recognize and attack [7][10].
- Unlike hot tumors, which attract immune cells, cold tumors can suppress immune activity and disguise their presence [8][9].
Group 3: Model Testing and Findings
- The model simulated two immune environments, one with low levels of interferon and another completely devoid of immune signals, and tested over 4,000 drugs in them [14][16].
- The most promising candidate identified was the CK2 inhibitor silmitasertib, which showed potential when combined with low-dose interferon to enhance antigen presentation, a critical step for immune recognition of tumors [16][17].
Is "importance sampling" not so "important"? Kuaishou and Tsinghua's ASPO tackles importance sampling weight mismatch
量子位· 2025-10-15 10:20
Core Insights
- Reinforcement Learning (RL) has become a crucial component in the post-training phase of Large Language Models (LLMs) like ChatGPT and DeepSeek [1].
- A significant issue has emerged as model parameter counts grow: the importance sampling (IS) mechanism may not be as beneficial as previously thought [2][5].
- The research team from Kuaishou and Tsinghua University identified a deep-rooted "weight mismatch" phenomenon in existing outcome-supervised RL paradigms, leading to overconfident models and potential issues like entropy collapse and premature convergence [2][6].
Importance Sampling Issues
- Importance sampling is intended to correct the distribution difference between the old and new policies, allowing models to reuse old data without deviating from the target distribution [5].
- In small-scale RL, IS is effective; however, it fails in outcome-supervised RL for large language models [6].
- Experiments showed that in GRPO-style algorithms, IS did not provide the expected benefits and instead contributed to training instability [7].
Weight Mismatch and Self-Reinforcing Loops
- The research revealed that the advantage values in outcome-supervised RL are inaccurate, because different tokens contribute differently to the final answer [8].
- The average IS weight for positive-advantage tokens is higher than for negative-advantage ones, leading to a decrease in entropy [9].
- In these algorithms, IS has shifted from being a correction term to acting as a token-level weight, creating a self-reinforcing loop that keeps strengthening high-scoring tokens while neglecting low-probability ones [11][12].
ASPO Algorithm Introduction
- The proposed ASPO (Asymmetric Importance Sampling Policy Optimization) algorithm addresses these issues by inverting the IS weights for positive-advantage tokens, allowing low-probability tokens to receive stronger updates [3][18].
- ASPO incorporates a Dual-Clipping mechanism to manage extreme values resulting from the inverted weights, ensuring stability while maintaining effective gradient flow [20].
Experimental Results
- ASPO demonstrated significant advantages on various benchmarks, including mathematical reasoning and code generation tasks, outperforming traditional methods [24].
- The average performance improvement was 12.5% for mathematical tasks and 17.0% for code generation tasks, with smoother training curves and reduced entropy collapse [26].
- ASPO achieved notable results on the LiveCodeBench v5 benchmark, indicating an advantage over mainstream RL methods [26][27].
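The weight inversion described under "ASPO Algorithm Introduction" can be illustrated with a minimal sketch. It is not the paper's exact formulation: the summary does not give the formulas, so the reciprocal form, the symmetric clipping range, and the `cap` value below are assumptions, and all function names are hypothetical. The sketch only shows the direction of the change, namely that a positive-advantage token whose probability has dropped gets a larger weight instead of a smaller one.

```python
import math


def standard_is_weight(logp_new: float, logp_old: float) -> float:
    """Token-level importance ratio pi_new / pi_old, computed from log-probabilities."""
    return math.exp(logp_new - logp_old)


def asymmetric_weight(logp_new: float, logp_old: float,
                      advantage: float, cap: float = 3.0) -> float:
    """Illustrative asymmetric weight (assumed form, not the paper's equation).

    Positive-advantage tokens use the inverted ratio; other tokens keep the
    standard ratio. The result is clipped to [1/cap, cap] as a simple stand-in
    for the Dual-Clipping step mentioned in the summary.
    """
    ratio = standard_is_weight(logp_new, logp_old)
    weight = 1.0 / ratio if advantage > 0 else ratio
    return min(max(weight, 1.0 / cap), cap)


# Two positive-advantage tokens: one whose probability rose (0.4 -> 0.6)
# and one whose probability fell (0.10 -> 0.05).
rose = (math.log(0.6), math.log(0.4))
fell = (math.log(0.05), math.log(0.10))

for name, (new, old) in (("rose", rose), ("fell", fell)):
    print(name,
          round(standard_is_weight(new, old), 3),
          round(asymmetric_weight(new, old, advantage=1.0), 3))
```

With the standard ratio, the token that already gained probability is weighted 1.5 and the one that lost probability is weighted 0.5; with the inverted form the weights become roughly 0.667 and 2.0, and the clip keeps a very small ratio from producing an unbounded weight.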
Sora 2 loses its shine! This Chinese AI video model can already generate as you watch, with fast generation and strong interactivity
量子位· 2025-10-15 10:20
Core Viewpoint - The article argues that Baidu's Steam Engine marks a significant leap in AI video generation, moving from traditional short-clip creation to real-time, interactive, long-form video production and thereby changing the creative process [5][9][44].
Group 1: Technological Advancements
- Baidu's Steam Engine is the first to achieve integrated audio and video generation in Chinese, a milestone in the AI video generation field [5][61].
- The model supports real-time interaction, allowing users to pause and modify video generation on the fly, in contrast to existing models that require lengthy waits for output [6][15][42].
- The use of autoregressive diffusion models enables low-cost, real-time generation and interaction, improving the efficiency and quality of video output [45][47].
Group 2: User Experience and Accessibility
- Users can generate long videos by uploading a single image and providing a prompt, which sharply lowers the barrier to entry for video creation [18][56].
- The platform allows real-time previews and modifications, enabling a more engaging and participatory creative process [49][56].
- The system is designed for non-professionals, making it accessible to a broader audience without requiring extensive video editing skills [55][58].
Group 3: Market Position and Future Implications
- Baidu's Steam Engine has achieved the highest score on the VBench-I2V global ranking for video generation models, positioning it as a leader in the AI video generation market [61][62].
- The advancements mark a shift from fragmented video generation to continuous storytelling, pointing to a phase of AI content creation that emphasizes collaboration and interactivity [63][64].
- The technology is expected to extend to various sectors, including e-commerce, live streaming, education, and film production [58][59].
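AI plays jigsaw puzzles and its visual understanding soars: an annotation-free post-training paradigm for multimodal large models that moves beyond text-centric training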
量子位· 2025-10-15 10:20
Core Insights
- The article emphasizes the importance of a vision-centric approach in post-training for multimodal large models, highlighting the potential of visual self-supervised learning to improve the understanding of visual information [1].
- A novel post-training task called Visual Jigsaw is introduced, which focuses on reordering visual data without relying on additional annotations or visual generation modules [1].
Visual Jigsaw Method Overview
- Visual Jigsaw is a general ordering task: the data (images, videos, 3D) is divided into segments that are then shuffled, and the model's goal is to predict the correct order [5].
- Training uses the reinforcement learning algorithm GRPO to optimize the model [5].
Reward Mechanism
- A tiered reward is used to score the model's predictions: a fully correct prediction receives a reward of 1, a partially correct one is rewarded in proportion to the fraction of correct positions, scaled by a discount factor, and invalid outputs receive no reward [6].
Task Design for Different Visual Modalities
- **Image Jigsaw**: Images are divided into equal-sized sub-images in 2D space, and the model must restore the correct spatial order [7].
- **Video Jigsaw**: Videos are segmented into equal-length clips, and the model needs to reconstruct the original temporal order [8].
- **3D Jigsaw**: Points at different depths are sampled from an RGB-D image and marked with shuffled indices, and the model must order them from nearest to farthest [9].
Experimental Results
- The effectiveness of Visual Jigsaw was validated across image, video, and 3D modalities, showing significant improvements in fine-grained perception and understanding, spatial understanding from monocular images, and compositional visual reasoning [10][11].
- For **Image Jigsaw**, models showed stable improvements across multiple vision-centric benchmarks, with better fine-grained perception and understanding [10][11].
- For **Video Jigsaw**, the method significantly improved the model's ability to understand temporal relationships and overall video comprehension [14].
- For **3D Jigsaw**, notable gains were observed in depth estimation tasks and overall 3D spatial reasoning [15].
Conclusion
- Visual Jigsaw is a lightweight, verifiable, and annotation-free self-supervised post-training paradigm that strengthens visual perception in multimodal large models and encourages further exploration of vision-focused self-supervised and weakly supervised tasks [16].
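The shuffling step and the tiered reward described above can be sketched as follows. This is a minimal illustration under assumptions: the summary does not give the discount factor or the exact validity check, so `discount=0.2`, the permutation check, and the function names are hypothetical, and the "pieces" are plain list items rather than image patches or video clips.

```python
import random


def make_jigsaw(pieces, seed=0):
    """Shuffle `pieces`; return the shuffled list and the ground-truth order.

    order[k] is the original index of the piece now at position k.
    """
    rng = random.Random(seed)
    order = list(range(len(pieces)))
    rng.shuffle(order)
    return [pieces[i] for i in order], order


def jigsaw_reward(predicted, truth, discount=0.2):
    """Tiered reward: 1 if exact, discounted partial credit, 0 if invalid.

    `predicted` is assumed to be a list of integers already parsed from the
    model output. It is valid only if it is a permutation of 0..n-1.
    """
    n = len(truth)
    if len(predicted) != n or set(predicted) != set(range(n)):
        return 0.0                      # invalid output: no reward
    if list(predicted) == list(truth):
        return 1.0                      # fully correct ordering
    matches = sum(p == t for p, t in zip(predicted, truth))
    return discount * matches / n       # proportional, discounted credit


shuffled, order = make_jigsaw(["A", "B", "C", "D"], seed=1)
print(jigsaw_reward(order, order))          # exact prediction
print(jigsaw_reward([0, 0, 1, 2], order))   # not a permutation, so invalid
```

Because the reward is computed from the shuffle itself, no human annotation is needed, which is what makes the task verifiable and usable with an RL algorithm such as GRPO.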
Boston Dynamics' robot dog is back! "Five legs" working in concert
量子位· 2025-10-15 10:20
Core Insights - The article discusses advances in Boston Dynamics' Spot robot, which can lift and manipulate a tire weighing 15 kg in just 3.7 seconds, showcasing its dynamic whole-body manipulation capabilities [3][31].
Group 1: Dynamic Whole-Body Manipulation
- The method combines sampling and learning for dynamic whole-body manipulation, using reinforcement learning and sampling-based control to coordinate the arm, legs, and torso [11][12].
- A hierarchical control approach divides the control problem into two complementary layers: a low level for direct motor torque control and a high level for task-specific strategies [12][13].
Group 2: Task Execution and Control Strategies
- For tasks like tire alignment and stacking, the system uses sampling-based control to simulate possible futures and pick out good strategies [14].
- Reinforcement learning is applied to maintain stability during rolling tasks, capturing the necessary dynamic features and reactive control mechanisms [15][26].
Group 3: Performance and Efficiency
- Spot's tire manipulation goes beyond what traditional static assumptions allow: the 15 kg tire is heavier than the robot's peak lifting capacity of 11 kg [35].
- Dynamic coordination of its movements lets the robot efficiently perform tasks that were previously limited to slower, static methods [36][33].
Group 4: Simplification of Control Problems
- Separating high-level and low-level control significantly simplifies the problem, allowing the high-level controller to focus on task completion without reasoning about joint torques or stability constraints [37][38].
- The learned motion abstractions let the high-level controller operate in a simplified action space, improving computational feasibility and task execution efficiency [38].
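The idea of sampling-based control, simulating many candidate futures and acting on the best one, can be sketched with a toy random-shooting planner. Everything here is an assumption made for illustration: the 1D point model, the velocity-command action space, the quadratic cost, and all names are hypothetical and are not taken from Boston Dynamics' controller. The velocity command stands in for the simplified high-level action space, with the low-level controller assumed to track it.

```python
import random


def rollout_cost(x, actions, goal, dt):
    """Simulate a 1D point under velocity commands; sum squared distance to goal."""
    cost = 0.0
    for a in actions:
        x = x + a * dt
        cost += (x - goal) ** 2
    return cost


def plan_first_action(x, goal, rng, num_samples=64, horizon=5, dt=0.1):
    """Random shooting: sample action sequences, keep the cheapest one,
    and return only its first action (receding-horizon style)."""
    best_cost = float("inf")
    best_first = 0.0
    for _ in range(num_samples):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        cost = rollout_cost(x, actions, goal, dt)
        if cost < best_cost:
            best_cost, best_first = cost, actions[0]
    return best_first


def run(start=0.0, goal=1.0, steps=40, dt=0.1, seed=0):
    """Replan at every step and apply the first action of the best sample."""
    rng = random.Random(seed)
    x = start
    for _ in range(steps):
        x += plan_first_action(x, goal, rng, dt=dt) * dt
    return x


final_position = run()
print(f"final distance to goal: {abs(final_position - 1.0):.3f}")
```

The planner never reasons about torques or balance; it only scores outcomes in the simplified action space, which mirrors the division of labor between the high-level and low-level controllers described in Group 4.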
Applications are open for the annual AI rankings! Five awards in search of the pioneers of the AI+ era
量子位· 2025-10-15 10:20
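From the organizing committee, reporting from 凹非寺. 量子位 | WeChat official account QbitAI
To let more practitioners feel the leap of the intelligence wave, and to give applause and encouragement to more peers on the same road, we are officially opening applications for the "2025 AI Annual Rankings".
The selection covers three dimensions (companies, products, and people) and sets up five award categories. Companies are welcome to apply! Let us witness the stars of the year together and light the way forward.
Company rankings
- 2025 AI Annual Leading Enterprise
- 2025 AI Annual Promising Startup
Product rankings
- 2025 AI Annual Outstanding Product
- 2025 AI Annual Outstanding Solution
People rankings
- 2025 AI Annual Person in Focus
The detailed selection criteria and application method are as follows.
2025 AI Annual Leading Enterprise: open to China's AI field, this award selects the companies with the strongest overall capabilities.
Eligibility:
1. Registered in China, or with a main business primarily serving the Chinese market;
2. The main business belongs to AI and related industries, or AI has been widely applied in the main business, and the company holds an industry-leading position in its segment;
3. Has mature products or services that are already in use by real customers and recognized by the market;
4. Over the past year, in technology ...
2025 AI Annual Promising Startup: focused on innovation and entrepreneurship in China's AI field, this award selects the AI startups with the greatest investment value and growth potential.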
Tencent releases an ultra-low-cost AI training method! 120 yuan of results crushes a 70,000-yuan fine-tuning approach
量子位· 2025-10-15 06:27
Core Viewpoint - Tencent proposes a new method for improving large model agents called Training-Free GRPO, which significantly reduces costs and improves performance without updating model parameters [1][5][11].
Group 1: Methodology
- Training-Free GRPO improves performance by learning brief experiences that are embedded in prompts, eliminating the need for parameter updates [2][11].
- The model parameters stay frozen while an external knowledge base is updated dynamically to optimize performance [14][22].
- The method keeps the core logic of traditional GRPO but turns it into a non-parametric, inference-time process [13].
Group 2: Experimental Results
- Experiments show that the DeepSeek-V3.1-Terminus model with Training-Free GRPO improves significantly on mathematical reasoning and web search tasks [4][25].
- Compared with fine-tuning a 32B model, Training-Free GRPO requires less training data and costs far less: approximately $18 versus over $10,000 for traditional methods [5][28].
- On AIME24 and AIME25, the model's performance improved from 80.0% to 82.7% and from 67.9% to 73.3%, respectively, a clear gain from very few training samples [28].
Group 3: Performance Evaluation
- The method achieved a Pass@1 score of 67.8% on the WebWalkerQA benchmark, up from the baseline score of 63.2% [35].
- The learned experiences help the model avoid redundant tool calls and make decisions more efficiently [31][30].
- The effectiveness of Training-Free GRPO depends on the underlying model's reasoning and tool-use capabilities, as shown by its lower performance on less capable models [40].
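The methodology above, a frozen model plus an external experience base that is injected into prompts, can be sketched as follows. This is a hedged illustration rather than Tencent's implementation: the class and function names, the prompt layout, and the example lesson are all hypothetical, and the step where a lesson is distilled from a group of rollouts (done by the language model itself in such a method) is replaced here by a hand-written string.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Rollout:
    answer: str
    reward: float


@dataclass
class ExperienceBase:
    """External knowledge base; the model's weights are never touched."""
    lessons: list = field(default_factory=list)

    def add(self, lesson: str) -> None:
        # Skip exact duplicates so the prompt does not grow needlessly.
        if lesson not in self.lessons:
            self.lessons.append(lesson)

    def build_prompt(self, question: str) -> str:
        bullet_list = "\n".join(f"- {lesson}" for lesson in self.lessons)
        return f"Lessons from earlier attempts:\n{bullet_list}\n\nQuestion: {question}"


def group_relative_advantages(rollouts):
    """GRPO-style baseline: each reward minus the mean reward of its group."""
    baseline = mean([r.reward for r in rollouts])
    return [r.reward - baseline for r in rollouts]


def has_learning_signal(rollouts):
    """A group is informative only if its rollouts did not all score the same."""
    return len({r.reward for r in rollouts}) > 1


group = [Rollout("x = 4", 1.0), Rollout("x = 5", 0.0),
         Rollout("x = 4", 1.0), Rollout("x = -4", 0.0)]
base = ExperienceBase()
if has_learning_signal(group):
    # Hypothetical lesson; in practice it would be distilled by the model
    # from comparing high- and low-reward rollouts in the group.
    base.add("Substitute the candidate back into the equation before answering.")
print(base.build_prompt("Solve 2x + 1 = 9."))
```

Only the prompt changes between iterations, so each "training" step costs a handful of inference calls, which is in line with the low cost reported in Group 2.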
Chinese vendors sweep the top 5 open-source models
量子位· 2025-10-15 06:27
Core Insights - The article highlights the rapid rise of Chinese open-source large models, notably Alibaba's Qwen series and DeepSeek, which have had a profound impact on the open-source community since the second half of 2024 [1][6][20].
Model Rankings
- Chinese open-source models have moved from followers to leaders in the field, as shown by the LMArena rankings, where models like GLM-4.6 and DeepSeek-v3.2 closely trail top proprietary models such as GPT-5 and Gemini-2.5-pro [7][10].
- Qwen3-max-preview has reached the top three in the rankings, although it is not yet open-sourced [8].
Performance in Various Domains
- In text generation, Chinese models like DeepSeek-R1/V3.1 and GLM-4.6 compete closely with leading proprietary models [10].
- In web development tasks, models such as DeepSeek-R1-0528 and Qwen3-Coder have also made the top ten [11].
- In the visual domain, Tencent's Hunyuan-vision-1.5 and Qwen3 are among the strongest models, with the open-sourcing of Hunyuan-vision-1.5 still at the planning stage [12].
Popularity and Downloads
- Qwen3 is noted as one of the most downloaded models, leading among open-source models at the scale of hundreds of billions of parameters [18].
- The most popular model currently is DeepSeek-R1, indicating strong user engagement and preference [17].
Industry Trends
- The article suggests that the shift in dominance among open-source models is not just about who leads but may reshape the global innovation landscape [21].
- The momentum is increasingly recognized as coming from China, pointing to a potential shift in the global AI development paradigm [20].