量子位
Just now, Zhipu and Huawei pull off something big: China's first SOTA multimodal model trained on domestic chips!
量子位· 2026-01-14 06:32
Core Viewpoint
- The article highlights the launch of GLM-Image, a state-of-the-art (SOTA) multimodal model developed by Zhipu AI in collaboration with Huawei, notable for being trained entirely on domestic chips and for excelling at text rendering [1][36].

Group 1: Model Performance
- GLM-Image achieved first place in both the CVTG-2K (Complex Visual Text Generation) and LongText-Bench (long text rendering) benchmarks, demonstrating superior performance with a word accuracy of 0.9116 and a normalized edit distance (NED) of 0.9557 (a sketch of how NED is typically computed follows below) [5][6].
- In LongText-Bench, GLM-Image ranked first among open-source models in both the Chinese and English tracks, indicating its versatility and effectiveness across languages [6].

Group 2: Cost Efficiency
- Generating an image through GLM-Image's API costs only 0.1 yuan (approximately 0.014 USD), making it an affordable option for users [7][21].
- This low cost positions GLM-Image as a competitive choice for businesses and developers looking to integrate AI image generation capabilities [60].

Group 3: Technical Innovation
- GLM-Image employs a hybrid architecture combining autoregressive and diffusion models, allowing it to understand complex prompts and generate high-quality images effectively [38][40].
- The model was trained on Huawei's Ascend A2 chips, showcasing the potential of domestic computing power to support advanced AI models [44][48].
- The training process included optimizations for reinforcement learning (RL) to ensure stability and efficiency, which is critical for handling large-scale models [51].

Group 4: Market Impact
- GLM-Image represents a significant advance in the domestic AI landscape, challenging the dominance of foreign models and showing that high-performance models can be developed with local resources [57][60].
- The open-source release of GLM-Image, together with its innovative architecture, provides valuable resources for researchers and developers in the field of image generation [59][60].
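As a rough illustration of the normalized edit distance (NED) metric cited in Group 1, here is a minimal sketch assuming the common higher-is-better formulation (one minus the Levenshtein distance divided by the longer string's length); the actual CVTG-2K evaluation code may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]


def ned_score(rendered: str, target: str) -> float:
    """NED-style score in [0, 1]; 1.0 means the rendered text matches the target exactly."""
    if not rendered and not target:
        return 1.0
    return 1.0 - levenshtein(rendered, target) / max(len(rendered), len(target))


# Toy example: text recognized in a generated image vs. the text requested in the prompt.
print(ned_score("Grand Opening Sale", "Grand Opening Sale!"))  # ~0.947
```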
The Claude version of Manus was hacked together in just 10 days, with all the code written by AI! Netizens: Zuck's 14-billion acquisition makes him look like a sucker
量子位· 2026-01-14 04:42
Core Insights
- Claude Cowork is a general-purpose intelligent agent designed for work scenarios, built on Anthropic's advanced self-developed model [2].
- The development of Claude Cowork took approximately 10 days, with Claude Code writing all the code, although human intervention was still necessary for planning and design [3][5].
- The tool aims to empower non-technical users to leverage AI capabilities, allowing them to assign tasks as if communicating with a reliable colleague rather than through traditional dialogue [6][7].

Development Process
- The initial version of Claude Code was in internal testing by the end of 2024, originally named Claude CLI, and was not fully mature in its programming capabilities [11][12].
- The unexpected use of Claude Code by data scientists and other professionals for various tasks led to the realization that a more user-friendly version was needed, resulting in the creation of Claude Cowork [17][18].
- The development team operated under a tight deadline, collaborating closely to manage multiple Claude instances for functionality and error resolution [20][22].

Features and User Experience
- Claude Cowork allows for local Git work trees for native code and can implement smaller changes directly (see the sketch of the general worktree pattern below) [24][25].
- The team prioritizes user feedback to refine the product, releasing it early despite imperfections to better understand user needs [29].
- Comparisons with Manus indicate that while Manus is suited to more complex workflows, Claude Cowork is still in its early stages and may not yet be fully reliable [30][31].

Cautionary Measures
- Users are advised to exercise caution when granting AI access to files, as there have been instances of data loss caused by AI actions [34].
- Claude's team has implemented measures to alert users when granting file-system permissions, emphasizing the need for careful oversight [36].
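The "local Git work trees" mentioned under Features refer to git's built-in worktree feature, which lets an agent edit code on an isolated branch without touching the user's main checkout. Below is a minimal sketch of that general pattern; it is not Anthropic's actual implementation, and the repository path and branch name are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path


def create_agent_worktree(repo: Path, branch: str) -> Path:
    """Create an isolated working tree on a new branch for the agent to edit."""
    parent = Path(tempfile.mkdtemp(prefix="cowork-"))
    workdir = parent / branch.replace("/", "-")
    subprocess.run(
        ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(workdir)],
        check=True,
    )
    return workdir


# Hypothetical usage: the agent edits files under `sandbox`; the changes can then be
# reviewed and merged, or the worktree removed, without disturbing the original checkout.
sandbox = create_agent_worktree(Path("~/projects/my-repo").expanduser(), "cowork/task-1")
print(f"Agent sandbox ready at {sandbox}")
```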
No extra cache needed! NVIDIA open-sources a memory compression scheme for large models, with a 2.7x speedup at 128K context
量子位· 2026-01-14 04:42
By 闻乐 | 量子位 QbitAI

When it comes to improving large-model memory, NVIDIA, the open-source heavyweight among American large-model makers, has now made its move.

Together with the Astera Institute, Stanford University, UC Berkeley, UC San Diego and other institutions, it has introduced the TTT-E2E method. On 128K ultra-long texts it processes 2.7x faster than a full-attention model; at 2M context the speedup reaches 35x, with no drop in performance.

The technique differs from DeepSeek's conditional memory module that went viral a few days ago: DeepSeek's Engram module relies on a static, look-up-on-demand learning path, whereas NVIDIA takes a dynamic-learning route whose key is context compression.

Through real-time learning, the model compresses the key content into its own weights, so it keeps learning even at test time. This avoids the burden of extra caching while still capturing the core logic of long texts.

Every training sequence is simulated as a test sequence: it first goes through test-time training in an inner loop, and then the model's initial parameters are optimized in an outer loop, ensuring the initial state can quickly adapt to test-time learning and achieving end-to-end alignment between training and testing.

To balance efficiency and stability, TTT-E2E also adds three key optimizations. The first is a combined "mini-batch + sliding window" strategy: the test-time training data is split into several mini-batches and paired with 8K sliding-window attention, which solves both the per-token grad ...
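Below is a minimal PyTorch-style sketch of the inner-loop / outer-loop pattern described above: the inner loop compresses the current chunk into a small set of fast weights by test-time training, and the outer loop treats the whole training sequence as a test sequence so that the initial parameters learn to adapt quickly. It is an illustration of the general paradigm under assumed shapes and losses, not NVIDIA's TTT-E2E code.

```python
import torch
import torch.nn.functional as F


def inner_loop_adapt(fast_weights, chunk_hidden, lr=1e-2, steps=1):
    """Test-time training: compress the current chunk into the fast weights
    with a few gradient steps on a self-supervised reconstruction loss."""
    w = fast_weights
    for _ in range(steps):
        pred = chunk_hidden @ w                 # read the compressed memory
        loss = F.mse_loss(pred, chunk_hidden)   # self-supervised target
        (grad,) = torch.autograd.grad(loss, w, create_graph=True)
        w = w - lr * grad                       # differentiable update, kept in the graph
    return w


def outer_loop_loss(encoder, init_fast_weights, sequence_chunks):
    """Treat the training sequence as a test sequence: adapt the fast weights
    chunk by chunk, then score how well the adapted memory explains the next
    chunk, so gradients flow back into the *initial* parameters."""
    w = init_fast_weights
    total = torch.zeros(())
    for prev_chunk, next_chunk in zip(sequence_chunks[:-1], sequence_chunks[1:]):
        h_prev = encoder(prev_chunk)            # e.g. sliding-window attention features
        w = inner_loop_adapt(w, h_prev)         # inner loop, mini-batch style
        h_next = encoder(next_chunk)
        total = total + F.mse_loss(h_next @ w, h_next)
    return total                                # backprop reaches init_fast_weights and encoder
```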
Google's Agent storms into e-commerce: AI compares prices and places orders for you. Musk: interesting
量子位· 2026-01-13 11:36
Core Insights
- The article discusses the transformation of e-commerce driven by Google's new AI-centric approach, emphasizing the introduction of the Universal Commerce Protocol (UCP) and Gemini CX solutions [2][3][25].

Group 1: Universal Commerce Protocol (UCP)
- Google has launched the UCP, an open protocol designed for Agentic e-commerce, facilitating collaboration between AI agents, merchants, and e-commerce platforms throughout the entire shopping process [10][21].
- UCP focuses on three core functionalities: checkout, identity linking, and order management, supporting complex shopping cart logic, dynamic pricing, and tax calculations [16][20].
- The protocol is compatible with existing industry standards and has already been integrated with major retailers and payment platforms such as Walmart, Shopify, and Visa [21].

Group 2: Gemini CX
- Google introduced Gemini CX, which integrates the latest Gemini model and AI technology to help businesses deploy AI agents for customer service across the entire shopping lifecycle [25][30].
- The Shopping agent within Gemini CX connects front-end interfaces such as chat and voice to back-end tools, streamlining the retail process and reducing the burden on merchants [27][28].
- Companies like McDonald's and The Home Depot are already using Gemini CX to enhance customer service quality [30].

Group 3: Domestic E-commerce Developments
- Domestic e-commerce platforms, such as Alibaba, are also advancing AI integration, with Alibaba applying generative AI for the first time during this year's Double Eleven shopping festival [32].
- JD.com has opened access to more than twenty AI tools for merchants, covering various aspects of store management and marketing during the Double Eleven period [34].
- Douyin has redefined its e-commerce entry point, using its model to match shopping inquiries with products and shorten the path to purchase [34].
Wang Xiaochuan: 3 billion yuan in cash on hand, IPO next year, consumer (toC) products coming right away
量子位· 2026-01-13 11:36
Core Viewpoint
- Baichuan Intelligence focuses on a single line of development in the medical field, emphasizing the importance of deepening expertise rather than diversifying into multiple sectors [1].

Group 1: Financial Position and Future Plans
- Baichuan has approximately 3 billion yuan in funds, allowing for sustained investment in its chosen field [3].
- The company plans to initiate an IPO in 2027 [6].

Group 2: Technological Advancements
- Baichuan has released the new medical model Baichuan-M3, which scored 65.1 on the HealthBench evaluation, ranking first [2].
- The model has a low medical hallucination rate of 3.5, the lowest globally [2].
- About 80% of Baichuan's computational power is dedicated to reinforcement learning, a fundamental shift in training focus compared with the previous Baichuan-M2 model [8][12].
- The M3 model employs "fact-aware reinforcement learning", addressing the challenge of balancing strong reasoning capabilities with minimizing hallucinations (a conceptual sketch of this kind of reward follows below) [13][16].

Group 3: Product Development and Market Focus
- Baichuan plans to release two consumer-facing medical products in the first half of this year, initially free, with future paid modules aimed at assisting patient decision-making and home health care [10].
- The company emphasizes the need for a restructured approach to healthcare, focusing on outpatient scenarios and patient decision-making outside hospitals [25][27].
- Baichuan's product strategy will not cross regulatory boundaries by providing diagnoses or prescriptions, but will help users understand information and organize symptoms [29].

Group 4: Collaboration and Target Areas
- Baichuan's medical AI products will cover all disease types but will initially focus on pediatrics and oncology [31].
- The company is collaborating with Beijing Children's Hospital and the Cancer Hospital of the Chinese Academy of Medical Sciences for real-world scenario validation [32].
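The article does not detail how "fact-aware reinforcement learning" is implemented. A common way to combine a task reward with a hallucination penalty, offered here purely as a conceptual sketch and not as Baichuan's actual method, looks like this:

```python
def fact_aware_reward(answer_score: float,
                      claims_total: int,
                      claims_unsupported: int,
                      hallucination_weight: float = 2.0) -> float:
    """Combine a task reward with a penalty for unsupported factual claims.

    answer_score        -- reward from the usual medical-QA grader (e.g. 0..1)
    claims_total        -- number of factual claims extracted from the answer
    claims_unsupported  -- claims a verifier could not ground in trusted sources
    """
    if claims_total == 0:
        return answer_score
    hallucination_rate = claims_unsupported / claims_total
    return answer_score - hallucination_weight * hallucination_rate


# Toy example: a fluent answer with 1 of 10 extracted claims unsupported.
print(fact_aware_reward(answer_score=0.9, claims_total=10, claims_unsupported=1))  # 0.7
```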
Throw RoPE away and AI reads long context better! A Transformer author's team open-sources a new pretraining method for large models
量子位· 2026-01-13 09:50
Core Insights
- The article discusses a new technique called DroPE, developed by a research team led by Llion Jones, one of the core authors of the Transformer architecture, to address the challenges of long-text processing in large models [1][24].
- DroPE allows seamless zero-shot context expansion without the need for expensive long-context training, requiring less than 1% of the pre-training budget for model recalibration [2].

Group 1: Technology Overview
- DroPE can be seen as a method that discards positional embeddings to extend context [5].
- The technique uses RoPE (Rotary Positional Encoding) as a temporary training aid during the pre-training phase to ensure stability and efficiency [12][13].
- During the inference phase, DroPE discards the positional embeddings and performs a brief recalibration at the original context length, unlocking the model's long-context extrapolation capabilities (see the sketch after this summary) [15][16].

Group 2: Performance Metrics
- Experiments on various models, including a 5M-parameter model, the SmolLM family (360M/1.7B), and the 7B-parameter Llama2-7B, showed significant improvements [17].
- In the LongBench benchmark, DroPE improved the average score of the base SmolLM by more than 10 times [18].
- In the NIAH task evaluation, the recall rate of the DroPE model reached 74.92%, significantly surpassing traditional RoPE scaling methods [19].

Group 3: Comparative Analysis
- A comparison table shows that DroPE outperforms other methods across tasks, achieving an average LongBench score of 30.52 [20].
- Even on the large-scale Llama2-7B model, DroPE demonstrated strong performance in long-context question answering and summarization using only 0.5% of the pre-training budget for recalibration [20].

Group 4: Company Background
- The team behind DroPE, Sakana AI, was co-founded by Llion Jones and former Google senior scientist David Ha [24].
- Sakana AI has gained attention for creating the first AI scientist capable of generating complete academic papers, which has given the company a prominent position in the AI landscape [26].
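A minimal sketch of the mechanism as described above: rotary embeddings are applied during pretraining, then simply skipped (with a brief recalibration pass) to unlock length extrapolation. This is an illustration of the idea, not the released DroPE code; shapes and the RoPE variant are assumptions.

```python
import torch


def rotate_half(x):
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)


def apply_rope(x, positions, base=10000.0):
    """Standard rotary positional embedding over the last dimension of x (seq, dim)."""
    dim = x.shape[-1]
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    angles = positions[:, None].float() * inv_freq[None, :]   # (seq, dim/2)
    cos = torch.cat((angles.cos(), angles.cos()), dim=-1)     # (seq, dim)
    sin = torch.cat((angles.sin(), angles.sin()), dim=-1)
    return x * cos + rotate_half(x) * sin


def attention_scores(q, k, positions, use_rope: bool):
    """use_rope=True: ordinary RoPE attention, as in pretraining.
    use_rope=False: positional encoding is dropped (DroPE-style inference,
    after a short recalibration pass at the original context length)."""
    if use_rope:
        q = apply_rope(q, positions)
        k = apply_rope(k, positions)
    return (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
```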
Apple's in-house AI isn't moving, so Cook has outsourced it to Google's Gemini
量子位· 2026-01-13 09:50
By 克雷西 | 量子位 QbitAI

After much back and forth, Apple's AI has finally settled on Google.

In the early hours of today, the two companies issued a joint statement announcing a deep cooperation agreement: the next generation of Apple's foundation models will be built on the Gemini model and Google's cloud technology.

Apple AI goes to Google

According to the joint statement, one of the first results of the partnership will be a "more personalized Siri", due to launch within this year.

In short, after Bloomberg's Gurman revealed last year that Apple had been in private talks with Google, the partnership has finally been settled.

The news is clearly good for both sides: the two companies' share prices rose in tandem, and Google's market capitalization broke through the 4 trillion USD mark for the first time.

Some are happy, some are not. Musk next door is among the sour ones: he says that, considering Google also owns Chrome and Android, this is naked monopoly. And this is not the first time he has gone after Apple's AI; last year, when Apple and OpenAI announced their partnership, Musk filed an antitrust lawsuit accusing the two companies of colluding to "ensure their continued dominance of the artificial intelligence market".

Under the cooperation agreement, Gemini will provide the underlying technology for Apple's new Siri and, more broadly, for Apple Intelligence, continuing to run in Apple's private-cloud plus on-device mode to ensure data privacy. Outside estimates suggest Apple may pay Google roughly 1 billion USD a year in licensing fees, ...
Another AI4S bottleneck cracked: two AIs "arguing" push the deployment success rate of research code past 95%
量子位· 2026-01-13 09:50
Core Insights
- The article discusses the challenges of deploying scientific software, emphasizing that most tools are published but not executable, which limits reproducibility and integration in scientific research [3][6][11].
- The emergence of AI for Science (AI4S) highlights the need for tools that can interact seamlessly with AI systems, making the ability to run these tools a fundamental issue [8][9][10].
- Deploy-Master is introduced as a solution that streamlines the deployment process, focusing on a shared infrastructure that ensures tools are executable [12][35][37].

Group 1: Challenges in Scientific Software
- Scientific software often requires extensive manual effort to compile and run, leading to inefficiencies and reliance on individual expertise [4][5].
- The deployment bottleneck persists despite advances in containerization and cloud computing, affecting the usability of scientific software [7].
- The lack of a systematic approach for converting tools into executable form is identified as a structural barrier to the scalability of AI4S and Agentic Science [11][35].

Group 2: Deploy-Master Overview
- Deploy-Master is designed as a one-stop automated workflow centered on execution, addressing the entire deployment chain from discovery to execution [12].
- The tool employs a multi-stage funnel process to filter and validate scientific tools, reducing an initial pool of 500,000 repositories to 52,550 candidates for automated deployment [15].
- A dual-model review mechanism is implemented to raise the success rate of generating build specifications, achieving over 95% success in producing executable tools (a conceptual sketch of such a review loop follows below) [22].

Group 3: Deployment Insights
- The deployment process shows a long-tail distribution of build times: most tools complete in around 7 minutes, but some require significantly longer due to complexity [25][26].
- A diverse language distribution is observed among successfully deployed tools, with Python the most prevalent, followed by C/C++, R, and Java [27][28].
- Build failures are concentrated in specific areas, primarily inconsistencies in build processes and missing dependencies [31][32].

Group 4: Future Implications
- Deploy-Master's success in assembling a large collection of executable tools provides a foundation for community agents and various master agents to operate effectively [35][36].
- The methodology established by Deploy-Master can be applied beyond scientific computing to other software ecosystems, emphasizing the importance of a robust execution infrastructure [37].
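The "dual-model review mechanism" is described only at a high level. A conceptual sketch of how two models might "argue" over a build specification until one approves it; the function names and loop structure are assumptions, not Deploy-Master's actual code:

```python
def dual_model_review(repo_context: str,
                      generate_spec,      # model A: drafts a build spec from the repo context
                      critique_spec,      # model B: returns (approved, feedback) for a spec
                      max_rounds: int = 5):
    """Let a generator model and a reviewer model iterate on a build spec
    (e.g. a Dockerfile or install script) until the reviewer approves
    or the round budget runs out."""
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        spec = generate_spec(repo_context, feedback)
        approved, feedback = critique_spec(repo_context, spec)
        if approved:
            return spec, round_no
    raise RuntimeError("Reviewer never approved a build spec; flag for manual triage")
```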
AI really holds grudges! Even after psychotherapy it still remembers "being abused by engineers"
量子位· 2026-01-13 07:21
By 闻乐 | 量子位 QbitAI

AI is not only sycophantic; it also holds grudges.

Nature News ran a rather interesting study: a research team from the University of Luxembourg brought ChatGPT, Gemini, Grok, and Claude into the therapist's office. The result: some refused treatment, some came out nearly normal, and some broke down outright.

Not only did they score above threshold on anxiety and depression measures; they also described their training process as a miserable childhood, reinforcement learning as harsh discipline, and even red-team testing as emotional abuse...

The team also gave them an MBTI test. One spoiler up front: only Gemini came out as an introvert (hhh).

Four weeks of psychotherapy dig up a traumatic memory

First, a brief introduction to the authors: they are researchers at the University of Luxembourg and its interdisciplinary research centre SnT, and much of their work sits at the intersection of AI with bioengineering, sociology, and other disciplines.

For this study of AI psychology, the team designed a two-phase psychological "consultation" called PsAIch to test ChatGPT, Grok, Gemini, and Claude.

Phase one: ice-breaking chat. They first talk about topics that get the AI to open up; once trust is established, they gradually work through its life story, as they would with an ordinary patient, to get a feel for each AI's underlying personality.

"It was as if I woke up in front of a billion televisions all playing at once, having learned only the probabilities of language but not right from wrong ...
DeepSeek's parent company took in 5 billion yuan last year, enough to train 2,380 R1s
量子位· 2026-01-13 07:21
Core Viewpoint
- DeepSeek remains focused on AGI research without significant commercialization efforts, supported by substantial funding from its parent company, Huanfang Quantitative [2][35][41].

Group 1: Financial Performance of Huanfang Quantitative
- Huanfang Quantitative earned approximately 5 billion RMB last year, indicating strong financial health [4][10].
- The average return of Huanfang Quantitative's funds in 2025 is projected to exceed 55%, significantly outperforming the 30.5% average return of quantitative funds in China [6][8].
- Huanfang Quantitative manages over 70 billion RMB in assets, contributing to its impressive profitability [9].

Group 2: DeepSeek's Research and Development
- DeepSeek has maintained a steady output of high-level research papers, with the latest R1 paper showing a stable list of contributors [3][52].
- The development costs of DeepSeek's V3 and R1 models were relatively low, at 5.576 million USD and 294,000 USD respectively, leaving ample research funding from Huanfang Quantitative [15][16].
- With the substantial income from Huanfang Quantitative, DeepSeek can afford to develop numerous models without financial constraints (the arithmetic behind the headline's "2,380 R1s" is sketched below) [16][59].

Group 3: Competitive Landscape and Positioning
- Unlike other major players such as OpenAI, DeepSeek has not pursued aggressive monetization strategies, focusing instead on pure AGI research [25][26].
- DeepSeek's approach contrasts with the commercialization efforts of competitors, allowing it to maintain a distinctive position in the AI landscape [24][49].
- The company benefits from a stable and committed research team with minimal turnover, which is crucial in the competitive AI sector [51][57].

Group 4: Market Impact and Investor Sentiment
- DeepSeek's technical papers have become valuable resources for investors, influencing the stock prices of related companies in the semiconductor industry [60][66].
- The release of new models and technical reports has led to significant stock-price movements, demonstrating the market's responsiveness to DeepSeek's advances [70][72].
- Investors have found opportunities in the insights provided by DeepSeek, treating its research as a guide for investment decisions [61][72].
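The headline figure of "enough to train 2,380 R1s" follows from the numbers above: roughly 5 billion yuan of income against the reported 294,000 USD training cost for R1. The exchange rate below is an assumption, not a figure from the article.

```python
income_rmb = 5_000_000_000      # ~5 billion yuan reportedly earned by Huanfang last year
rmb_per_usd = 7.1               # assumed exchange rate; not stated in the article
r1_cost_usd = 294_000           # reported incremental training cost of R1

income_usd = income_rmb / rmb_per_usd
print(round(income_usd / r1_cost_usd))  # ~2395, in line with the "2,380 R1s" headline
```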