量子位
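No retraining, plug-and-play, zero performance loss: Ant Group and Nanyang Technological University debut a fine-tuning safety framework that keeps models both safe and efficient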
量子位· 2025-11-19 06:20
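Contributed by the EnchTable team | 量子位 (WeChat official account: QbitAI)
Recent research shows that fine-tuning can severely weaken a model's safety alignment; in other words, the more capable a model becomes, the more dangerous it can be. Now a model's safety awareness can be restored in one step, with no retraining. To address the problem, Ant Group and Nanyang Technological University have jointly introduced EnchTable, a model safety alignment framework that lets a model keep its safety awareness after fine-tuning. Using two core techniques, safety distillation and interference-aware merging, it achieves the best balance between safety and utility across multiple model architectures and tasks, and even surpasses the official Instruct safety models in resistance to attacks. It is also plug-and-play and does not affect model performance at all. Details follow.
Safety alignment is "transferable"
A number of incidents have recently surfaced in which fine-tuned models lost safety capability. The root problem is that current safety alignment mechanisms do not keep working as a model is fine-tuned. The research team's view is that safety alignment is itself a highly transferable kind of knowledge. This means there is no need to "relearn" safety on every fine-tuned model; instead, "safety" can be treated as an independent knowledge module, "extracted" from an already aligned model and then "injected" into another. This finding turns the problem from "expensive retraining" into "efficient ...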
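The "extract, then inject" idea can be illustrated with simple weight arithmetic. The sketch below is not EnchTable's actual algorithm, which the excerpt does not describe. It assumes a task-arithmetic style approach in which a "safety vector" is the difference between an aligned model's weights and its base model's weights, and that vector is added, scaled, to a fine-tuned model. The function names, the toy weights, and the `scale` parameter are all hypothetical.

```python
from typing import Dict, List

# Toy stand-in for model tensors: layer name -> flat list of parameters.
Weights = Dict[str, List[float]]


def extract_safety_vector(aligned: Weights, base: Weights) -> Weights:
    """Hypothetical 'extraction': per-parameter difference aligned - base."""
    return {name: [a - b for a, b in zip(aligned[name], base[name])] for name in base}


def inject_safety_vector(finetuned: Weights, safety: Weights, scale: float = 1.0) -> Weights:
    """Hypothetical 'injection': add the scaled safety vector to the fine-tuned weights."""
    return {
        name: [w + scale * s for w, s in zip(finetuned[name], safety[name])]
        for name in finetuned
    }


base = {"layer0": [0.0, 1.0, 2.0]}
aligned = {"layer0": [0.5, 1.0, 1.5]}    # base plus safety alignment
finetuned = {"layer0": [0.0, 3.0, 2.0]}  # base plus task fine-tuning, alignment lost

safety = extract_safety_vector(aligned, base)
patched = inject_safety_vector(finetuned, safety, scale=1.0)
print(safety)   # {'layer0': [0.5, 0.0, -0.5]}
print(patched)  # {'layer0': [0.5, 3.0, 1.5]}
```

In this toy case the task update and the safety update touch different parameters, so they do not collide. When they overlap, a naive sum can damage either capability, which is presumably the situation the framework's interference-aware merging step is meant to handle.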
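Hundreds of millions raised, revenue over 100 million yuan! The hidden champion of the embodied-AI track, frequently on Jensen Huang's radar, comes into view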
量子位· 2025-11-19 06:20
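Hengyu, reporting from Aofeisi | 量子位 (WeChat official account: QbitAI)
Just now, an AI company's funding round set the industry buzzing. Why? Because it is closely tied to embodied intelligence and inseparable from world models, the path toward physical AI. More precisely, the company that closed the round is a key supply-chain player in both ecosystems: a simulation and synthetic data company. 量子位 has learned that Lightwheel AI (光轮智能), a simulation and synthetic data company, has just completed Series A and A+ rounds totaling hundreds of millions of yuan. The disclosed investors include institutional investors such as 东方富海 (Oriental Fortune Capital) and 九派资本, as well as industry players such as 三七互娱 (37 Interactive Entertainment) and 琥珀资本. Existing shareholder 辰韬资本 also continued to add to its position. Equally notable are its customers: NVIDIA, Google, Alibaba, and ByteDance; Figure AI, 1X Technology, 智元机器人 (AgiBot), and 银河通用 (Galbot); and Toyota, BOSCH, BYD, Geely... Single-handedly, it ties together the entire AI ecosystem. Reports say that this company, the only technology company in the world focused exclusively on simulation and synthetic data, has passed 100 million yuan in revenue. As the first company in the world to bring generative AI into simulation technology, Lightwheel AI was founded by Xie Chen (谢晨), a well-known figure in the field who previously led simulation at NVIDIA, Cruise, and NIO. Its most recent moment in the spotlight came from a debut conversation with Jensen Huang's daughter Madison Huang, on the red-hot topic of physical AI... Physical AI is what Jensen Huang, in 2025 ...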
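Has a homegrown tool finally tamed the long-standing headache of translating papers, reports, and contracts? After a head-to-head review of three top translation tools, this one proved remarkably solid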
量子位· 2025-11-19 06:20
Core Viewpoint
- The article discusses the advantages of Baidu's "Document Translation" tool, particularly in academic settings, highlighting its superior translation accuracy, formatting preservation, and integrated AI assistance compared to competitors like Google Translate and DeepL [1][3][59].
Translation Capability Comparison
- Baidu's "Document Translation" offers specialized translation models for over 10 professional fields, including academic papers, legal documents, and news, making it more user-friendly for specific needs compared to Google and DeepL, which lack such differentiation [8][17].
- The tool boasts a professional translation accuracy rate of 90%, effectively capturing the nuances of academic language, which is crucial for users dealing with complex terminologies [17][22].
AI Assistance Features
- The integrated AI assistant in Baidu's tool can summarize content, answer specific questions about the text, and provide explanations for technical terms, enhancing the user experience significantly [26][30][36].
- Users can interact with the AI to clarify difficult sections of the text, making the translation process more intuitive and less daunting [28][32].
Formatting and Editing Capabilities
- Baidu's "Document Translation" excels in maintaining the original document's formatting, achieving a near 1:1 restoration of the layout, which is critical for academic papers that often include complex structures like tables and figures [43][46].
- The tool allows for extensive post-translation editing, enabling users to modify text directly within the translated document, which is not supported by DeepL and is limited in Google Translate [52][55].
Overall User Experience
- The comprehensive features of Baidu's translation tool cater to the needs of students and professionals, making it a preferred choice for those who require efficient and accurate translations without the hassle of manual corrections [57][58].
- The article concludes that Baidu's "Document Translation" is the closest to an ideal translation tool, effectively integrating into the workflow of users in academic and professional environments [59][60].
Gemini 3 wins over Altman and Musk, yet Google's CEO is worried about an AI bubble
量子位· 2025-11-19 05:02
Core Viewpoint
- Google CEO Sundar Pichai has expressed concerns about a potential "AI bubble," indicating that the current investment frenzy in AI may contain "irrational factors" and that no company will be immune if the bubble bursts [3][29].
Group 1: AI Investment Trends
- Major tech companies are significantly increasing their investments in AI, with Meta projecting capital expenditures between $70 billion and $72 billion for 2025, up from previous estimates [10].
- Microsoft reported total capital expenditures of $34.9 billion for the quarter ended September 30, exceeding both analyst expectations and the previous quarter's figure [15].
- Alphabet, Google's parent company, has raised its capital expenditure forecast for the year from $85 billion to between $91 billion and $93 billion, nearly double its capital expenditures for 2024 [18].
Group 2: AI Valuations and Financial Performance
- Nvidia has become the first company to surpass a market capitalization of $5 trillion, highlighting the financial impact of the AI boom [20].
- OpenAI's valuation has surged to $500 billion following a secondary share sale, a 67% increase from its previous valuation of $300 billion [23].
- Despite its high valuation, OpenAI reported a quarterly loss of $11.5 billion, which has negatively impacted Microsoft's financial performance, reducing its net profit and EPS by $3.1 billion and $0.41 per share, respectively [24][26].
Group 3: Cautionary Perspectives
- Pichai has drawn parallels between the current AI investment climate and the internet bubble of 2000, suggesting that while there is excitement around AI, it is essential to recognize the potential for over-investment [29].
- He emphasized the importance of not blindly trusting AI outputs and advocated for the use of additional tools, such as Google Search, to verify information [33][35].
- Pichai acknowledged the rapid pace of technological development and the need for responsible management of its potential harmful effects, stating that companies must be both bold and responsible [37].
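Co-authored by Zhou Jingren: Tongyi Lab open-sources a self-evolving agent system that teaches models to "self-reflect," letting even a 14B model punch above its weight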
量子位· 2025-11-19 05:02
Core Insights
- The article discusses the launch of AgentEvolver, a self-evolving intelligent agent system open-sourced by Alibaba's Tongyi Lab, which significantly enhances the performance of AI models in complex tasks [2][4].
Performance Improvement
- AgentEvolver has improved the average completion rate of a 14B model from 29.8% to 57.6%, nearly doubling its performance [4].
- In a smaller 7B model, the average completion rate increased from 15.8% to 45.2%, demonstrating the framework's versatility across different model sizes [5].
- The system has shown the ability to outperform larger models (e.g., 32B models) in specific tasks after optimization [5].
Learning Efficiency
- AgentEvolver converges quickly, requiring significantly fewer training steps to reach 90% of baseline model performance: 55.6% fewer steps in AppWorld tasks and 66.7% fewer in BFCL tasks [7][8].
- This efficiency leads to reduced training time and computational costs [8].
Cross-Domain Generalization
- Models trained on synthetic data maintain high performance when applied to new, unseen domains, indicating strong cross-domain generalization capabilities [9][11].
- For instance, a model trained on AppWorld tasks performed well on BFCL tasks with minimal performance degradation [10].
Self-Evolution Mechanism
- AgentEvolver uses an automated data, exploration, and feedback process to achieve self-evolution, driven by three core mechanisms: self-questioning, self-navigating, and self-attributing [13][20].
- The self-questioning mechanism allows the system to generate challenging tasks autonomously, breaking reliance on external data [21][23].
- The self-navigating mechanism enhances exploration efficiency by leveraging past experiences to guide current decision-making [24][28].
- The self-attributing mechanism provides fine-grained feedback on each action taken, improving sample efficiency in strategy optimization [30][33].
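The headline numbers for AgentEvolver are easier to compare as ratios. The short calculation below uses only the percentages quoted in the summary; the 900-step baseline is a hypothetical figure chosen to make the step savings concrete, and the helper names are illustrative.

```python
def relative_gain(before: float, after: float) -> float:
    """Ratio of the new completion rate to the old one."""
    return after / before


def steps_needed(baseline_steps: int, reduction: float) -> int:
    """Training steps remaining after cutting `reduction` (a fraction) from the baseline."""
    return round(baseline_steps * (1.0 - reduction))


# Completion rates (percent) quoted in the summary.
gain_14b = relative_gain(29.8, 57.6)
gain_7b = relative_gain(15.8, 45.2)
print(f"14B: {gain_14b:.2f}x, 7B: {gain_7b:.2f}x")  # 14B: 1.93x, 7B: 2.86x

# Hypothetical baseline of 900 steps to reach the 90% mark; only the
# reduction percentages come from the summary.
print(steps_needed(900, 0.556))  # AppWorld: 400
print(steps_needed(900, 0.667))  # BFCL: 300
```

On these figures the 14B model does roughly double, while the 7B model's gain is closer to a tripling.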
Google gets a head start on L3 AI: Gemini works for 40 minutes straight while agents automatically generate and review a hundred ideas
量子位· 2025-11-19 01:37
Core Insights
- Google is advancing towards L3 AI with its Gemini system, which can autonomously execute tasks for extended periods, marking a significant step in AI development [27][30][32].
Group 1: Gemini's Capabilities
- Gemini can continuously operate for 40 minutes on a single task, showcasing its ability to handle complex processes [2][19].
- The system generates over 100 creative ideas based on user input, which are then evaluated and ranked by multiple agents, providing structured feedback [3][15].
- Users only need to make final decisions, as the exploration and iteration processes are managed by the agents, significantly reducing the time spent on refining outputs [4][11].
Group 2: Multi-Agent System
- The multi-agent competition system integrates long-term thinking and adversarial generation, enhancing the quality of outputs by utilizing time effectively [10][12].
- This system allows for a comprehensive generation, competition, and selection process, resulting in a well-rounded set of ideas presented to users [15][20].
- Gemini for Enterprise includes applications for creative generation and collaborative research, demonstrating its versatility in different contexts [18][26].
Group 3: Future of AI
- The development of L3 AI is characterized by the ability to autonomously run tasks over extended periods, with Gemini's capabilities aligning closely with this definition [30][32].
- Speculations suggest that future agents may be able to operate for even longer durations, potentially up to 3 hours by next year [33].
- As collaborative research features evolve, Gemini may reach L4 AI status, further enhancing its capabilities [37].
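The generate, evaluate, and rank flow described above can be sketched in a few lines. This is a toy illustration, not Google's implementation: `generate_ideas` stands in for the generation agents, the two judges are arbitrary deterministic scoring functions, and averaging the scores is an assumed aggregation rule.

```python
from statistics import mean
from typing import Callable, List, Tuple

Judge = Callable[[str], float]  # a judge maps an idea to a score


def generate_ideas(topic: str, n: int) -> List[str]:
    """Hypothetical stand-in for the generation agents (a real system would call a model)."""
    return [f"{topic}: idea {i}" for i in range(1, n + 1)]


def rank_ideas(ideas: List[str], judges: List[Judge], top_k: int) -> List[Tuple[str, float]]:
    """Average each idea's score across all judges and return the top_k, best first."""
    scored = [(idea, mean(judge(idea) for judge in judges)) for idea in ideas]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy deterministic judges, purely illustrative.
judges: List[Judge] = [
    lambda idea: float(len(idea) % 7),
    lambda idea: float(sum(ord(ch) for ch in idea) % 5),
]

ideas = generate_ideas("launch campaign", 100)
shortlist = rank_ideas(ideas, judges, top_k=5)
for idea, score in shortlist:
    print(f"{score:.1f}  {idea}")
```

The user-facing step in the summary corresponds to reading `shortlist`: the agents do the generation and scoring, and the person only chooses among the survivors.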
Google's Gemini 3 turns GPT-5.1 into a unit of measurement! Even Musk and Altman are impressed
量子位· 2025-11-19 01:37
Core Insights
- Google Gemini 3 Pro shows significant advancements over its predecessor, Gemini 2.5 Pro, outperforming GPT-5.1 and Claude 4.5 in nearly all benchmark tests, including academic reasoning and visual reasoning puzzles [1][2].
Benchmark Performance
- In "Humanity's Last Exam," Gemini 3 Pro scored 37.5% without tools and 45.8% with search and code execution, compared to 21.6% for Gemini 2.5 Pro [2].
- For the ARC-AGI-2 visual reasoning puzzles, Gemini 3 Pro achieved 31.1%, a substantial increase from 4.9% in Gemini 2.5 Pro [2].
- In mathematics, Gemini 3 Pro scored 95.0% in AIME 2025 without tools and achieved a perfect score of 100% with code execution [2].
- The LiveCodeBench Pro benchmark saw Gemini 3 Pro with an Elo Rating of 2,439, significantly higher than Gemini 2.5 Pro's 1,775 [2].
Model Evolution
- The Gemini series has evolved significantly, with each generation addressing the shortcomings of the previous one. The first generation established multimodal capabilities, while the second focused on decision-making and planning [15][18].
- Gemini 2.5 introduced a reasoning engine for deeper reasoning and problem-solving, leading to the current generation, which integrates multimodal, reasoning, and agent capabilities [19][20].
User Interaction and Usability
- Gemini 3 Pro is designed to understand user intent better, allowing for more straightforward interactions without the need for complex prompts [21].
- The model can seamlessly process text, images, videos, audio, and code, enhancing its usability across various applications [23].
Development Platform
- Google introduced the Antigravity platform alongside Gemini 3 Pro, aimed at simplifying the development process for AI agents, allowing developers to focus on higher-level tasks [29][33].
- Antigravity supports multiple models, including third-party options, and has attracted significant developer interest due to its generous rate limits [33].
Future Developments
- A more advanced version, Gemini 3 Deep Think, is in development, promising further enhancements in capabilities [13][14].
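To give the Elo gap some intuition, the snippet below applies the standard Elo expected-score formula to the two LiveCodeBench Pro ratings quoted above. Whether the benchmark uses exactly this 400-point logistic scale is an assumption; the function name is illustrative.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings quoted in the summary: Gemini 3 Pro vs. Gemini 2.5 Pro.
p = elo_expected_score(2439, 1775)
print(f"{p:.3f}")  # 0.979
```

Under that assumption, a 664-point gap corresponds to the higher-rated model being expected to win about 98% of head-to-head comparisons.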
In 30 seconds, I cloned Alipay with Ant's Lingguang (doge)
量子位· 2025-11-18 09:00
Core Viewpoint
- Ant Group has launched a new all-modal general AI assistant called Lingguang, which aims to provide a comprehensive solution for generating interactive applications and content across various formats, including 3D, audio, video, and more [1][3].
Group 1: Features of Lingguang
- Lingguang allows users to create a personalized app in as little as 30 seconds, offering editable, interactive, and shareable content [3].
- The app includes three main functionalities: Lingguang Dialogue, Lingguang Flash Apps, and Lingguang Open Eye [5].
- The Lingguang Dialogue feature simplifies complex questions into clear answers, showcasing its ability to generate structured and visually appealing content [7][11].
Group 2: User Experience
- Users can interact with the app to generate content, such as a comprehensive guide on world models, with well-organized text and visuals [10][11].
- The app supports various creative modifications, allowing users to animate images and customize styles easily [13][15].
- The Lingguang Flash Apps feature enables users to create simple applications, such as a virtual pet simulator, demonstrating the app's playful and engaging nature [19][20].
Group 3: Technical Aspects
- Lingguang utilizes a multi-agent architecture for content generation, where each modality has a dedicated agent to collaborate dynamically [35].
- The app's coding capabilities are designed to be user-friendly, connecting static front-end interfaces with backend model calls [37].
- The Lingguang Open Eye feature employs AGI camera technology for real-time object recognition and understanding, supporting various creative modes [39].
Group 4: Comparison with Qianwen
- Lingguang (from Ant Group) and Qianwen (from Alibaba) differ significantly in their underlying models and focus areas; Lingguang emphasizes all-modal generation and lightweight applications, while Qianwen focuses on traditional dialogue scenarios and deep thinking capabilities [40][42].
- Lingguang is more suited for diverse interactions, while Qianwen is better for text processing and office workflow assistance [42][43].
Group 5: Ant Group's Strategic Direction
- Ant Group is expanding its presence in the AGI space, with initiatives like the AI Medical Assistant AQ and the establishment of Lingbo Technology for robotics and AI interaction [44][47].
- The company aims to transform into an AI-driven tech firm, leveraging its financial services expertise while focusing on low-threshold, multi-modal applications for consumer-facing scenarios [50].
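The "one dedicated agent per modality" design mentioned under Technical Aspects can be pictured as a simple registry that routes a request to each modality's agent. The sketch below is an assumption about the general shape of such a system, not Lingguang's actual architecture; every name in it is hypothetical and the agents return placeholder strings instead of calling a model.

```python
from typing import Callable, Dict, List


# Hypothetical modality agents: each takes a request and returns a content stub.
def text_agent(request: str) -> str:
    return f"[text] {request}"


def image_agent(request: str) -> str:
    return f"[image] {request}"


def audio_agent(request: str) -> str:
    return f"[audio] {request}"


# Registry mapping a modality name to the agent responsible for it.
AGENTS: Dict[str, Callable[[str], str]] = {
    "text": text_agent,
    "image": image_agent,
    "audio": audio_agent,
}


def generate(request: str, modalities: List[str]) -> List[str]:
    """Route one request to the agent registered for each requested modality."""
    outputs = []
    for modality in modalities:
        agent = AGENTS.get(modality)
        if agent is None:
            raise ValueError(f"no agent registered for modality: {modality}")
        outputs.append(agent(request))
    return outputs


parts = generate("guide to world models", ["text", "image"])
print(parts)  # ['[text] guide to world models', '[image] guide to world models']
```

A registry keeps each modality's logic isolated, so supporting a new format means registering one more agent without touching the routing code.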
Call for 2025 AI Deployment Cases | 量子位智库 (QbitAI Think Tank)
量子位· 2025-11-18 09:00
Core Insights
- The article emphasizes the transformative potential of AI technology in enhancing social innovation, production efficiency, and quality of life [1]
- It highlights the need for precise identification of application areas and timely insights to leverage the benefits of AI advancements [2]
Group 1: AI Trends and Reports
- The "Top Ten Trends Series Report" has been published annually for five years, summarizing and forecasting technology trends, and is recognized as a key reference in the tech industry [3]
- Starting in 2024, the report will focus on identifying ten AI trends that are showing significant potential, including developments in new architectures, reasoning capabilities, world models, spatial intelligence, and multi-modal applications [3]
- The report aims to help stakeholders recognize technological changes and engage in innovation, thereby riding the wave of transformation [3]
Group 2: Collaboration and Participation
- The report seeks to involve more technology partners from various sectors, including research, investment, and entrepreneurship, to share insights and predictions about the AI field [7]
- Participating partners will be recognized as official collaborators in the "2025 Annual AI Trends Report," gaining media exposure and acknowledgment for their products and cases [8]
- The report is set to be released at the "2026 MEET Intelligent Future Conference" in December [9]
Group 3: Call for Contributions
- The article invites contributions from various entities, including research institutes, venture capital firms, and tech startups, to share their insights on AI trends and noteworthy institutions, products, and cases [10]
- The deadline for submissions is November 20, 2025 [12]
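AI video enters the "acceleration" era: 30% faster generation plus on-the-fly detail edits, a lifeline for both the wait-and-see crowd and the reroll crowd!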
量子位· 2025-11-18 06:00
Core Insights
- The article discusses the launch of the upgraded version of "拍我AI" (PixVerse) called V5 Fast, which significantly enhances video generation speed and introduces a new editing feature called "Modify" that allows for real-time video editing and customization [7][49][51].
Group 1: Video Generation and Editing Features
- The V5 Fast version improves video generation speed by over 30%, allowing users to produce a 5-second high-definition video in under a minute [49][50].
- The "Modify" feature enables users to make precise edits to videos without needing to regenerate the entire content, addressing a major pain point in the current AI video market [9][10][13].
- Users can now replace elements within videos, such as characters and backgrounds, while maintaining the overall consistency and quality of the original footage [18][20][23].
Group 2: Market Demand and User Experience
- The demand for editable AI video content has become a pressing need in the market, as traditional AI video tools often require complete regeneration for minor changes [8][9].
- The new capabilities allow both professional content creators and everyday users to have greater control over their video projects, making the technology more accessible [45][51].
- The article emphasizes that AI video creation is evolving from a one-time generation process to a more iterative and user-friendly experience, enabling users to refine and enhance their videos easily [45][52].
Group 3: Company Growth and Industry Impact
- The company behind "拍我AI" has seen significant growth, with over 100 million users and a monthly active user count exceeding 16 million, reflecting its rapid commercialization and adoption [51].
- The recent funding round of 100 million RMB and multiple model iterations highlight the company's commitment to innovation and maintaining a competitive edge in the AI video generation space [50][51].
- The advancements in video generation speed and editing capabilities position the company as a leader in the AI video market, catering to both individual and commercial needs [51][52].