ByteDance-Tsinghua agent writes CUDA kernels automatically, 2.11x faster than torch.compile
量子位· 2026-03-03 07:02
Core Insights
- ByteDance Seed and the Tsinghua AIR team have jointly built an AI system capable of generating high-performance GPU code [1]
- The newly open-sourced CUDA Agent achieved the best results to date on the GPU kernel optimization benchmark KernelBench, with a 98.8% pass rate and a 2.11x geometric-mean speed-up over torch.compile [2][28]

Group 1: GPU Kernel Optimization
- GPU kernel optimization has traditionally been difficult, requiring deep understanding of GPU architecture, the memory hierarchy, and thread scheduling [6]
- The performance of model training and inference depends heavily on the quality of the underlying CUDA kernels [7]
- Existing AI-assisted approaches have not fundamentally improved kernel optimization capability, being either training-free iterative optimizers or fixed execution-feedback loops [8]

Group 2: CUDA Agent Development
- CUDA Agent is an end-to-end large-scale reinforcement learning system designed to learn how to generate and optimize high-performance CUDA kernels [9]
- Its training data was constructed through a three-phase process, yielding 6,000 synthetic training tasks [10][14]
- The training pipeline includes a robust anti-cheating mechanism to guarantee the integrity of the generated tasks [12]

Group 3: Training Methodology
- The training environment uses a ReAct-style interaction loop, with performance profiling and validation ensuring that generated kernels beat torch.compile by at least 5% [17]
- A milestone-based discrete reward mechanism is used so that rewards reflect the true quality of the generated kernels [22]
- Training is split into multiple phases to keep long-context reinforcement learning stable, reaching a context window of 128K tokens [23][27]

Group 4: Performance Evaluation
- CUDA Agent significantly outperformed commercial models, with a 96.8% "faster" rate against torch.compile and a 2.11x geometric-mean speed-up [28][30]
- On Level-1 and Level-2 tasks CUDA Agent achieved a 100% pass rate; on Level-3 tasks it reached a 94% pass rate and beat torch.compile on 90% of tasks [29][30]
- The gap between CUDA Agent and leading commercial models such as Claude Opus 4.5 and Gemini 3 Pro is substantial, especially on the hardest tasks [30]

Group 5: Open Source Contribution
- The team has simultaneously open-sourced the training dataset CUDA-Agent-Ops-6K, including the complete filtering process and contamination-control scheme, to support future research on reinforcement-learning-based CUDA kernel optimization [32]
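The milestone-based discrete reward in Group 3 can be sketched as follows. The exact milestones and reward values below are illustrative assumptions; only the "at least 5% faster than torch.compile" bar comes from the article, and the team's actual reward function is not public.

```python
# A minimal sketch of a milestone-style discrete reward for generated kernels.
# Thresholds and reward values are illustrative assumptions, not the team's.

def kernel_reward(compiled_ok: bool, correct: bool, speedup: float) -> float:
    """Map a candidate kernel's evaluation results to a discrete reward.

    `speedup` is runtime(torch.compile) / runtime(candidate); the article
    says a kernel only counts as "faster" when it beats torch.compile by
    at least 5%, i.e. speedup >= 1.05.
    """
    if not compiled_ok:   # milestone 0: the kernel must build
        return 0.0
    if not correct:       # milestone 1: outputs must match the reference
        return 0.1
    if speedup < 1.05:    # milestone 2: correct but not (sufficiently) faster
        return 0.5
    return 1.0            # milestone 3: correct and >=5% faster

print(kernel_reward(True, True, 2.11))  # a 2.11x kernel earns the full reward
```

Discrete milestones like these avoid rewarding a kernel for raw speed before it is even correct, which is one way to discourage reward hacking in this setting.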
@Everyone: in 2026 you really need to get hands-on with AI | Annual AI summit
量子位· 2026-03-03 07:02
Core Viewpoint
- The article emphasizes AI's transition from a niche technology to a mainstream tool now widely adopted in everyday life, marking a significant shift in accessibility and application [2][5][18]

Group 1: AI's Mainstream Adoption
- AI has evolved from a topic of interest in the tech community to a household name, especially after the Spring Festival, indicating widespread acceptance [2]
- AI's presence in daily tasks such as cooking, cleaning, and healthcare shows its integration into everyday life [3][5]
- The upcoming 2026 China AIGC Industry Summit aims to accelerate this transition by bringing together AI entrepreneurs, developers, and users to explore practical applications [5][12]

Group 2: Summit Details
- The 2026 China AIGC Industry Summit will cover the entire generative AI industry chain, featuring both technology pioneers and application explorers [9]
- More than 60 industry leaders will share insights, with significant online engagement expected, including over 350,000 live viewers and millions of total impressions [12]
- The agenda includes discussions on why adopting AI is now necessary and on practical integration across sectors such as healthcare and gaming, highlighting real-world experience [13][14]

Group 3: Recognition of AIGC Enterprises
- Noteworthy AIGC enterprises and products will be evaluated on their performance and feedback over the past year, with results announced at the summit [19]
- The evaluation will be grounded in objective data and expert opinion to ensure credibility and professionalism [19]
- Millions of industry professionals will be invited to witness the recognition of outstanding companies in the AIGC space [20]
This MWC became China's AI home turf: Xiaomi pulled AI out of the chat box to take over the physical world
量子位· 2026-03-03 04:25
Core Viewpoint
- The global AI competition is shifting from technical exploration to practical application, with real-world deployment mattering more than model size and parameter counts [1][2]

Group 1: AI Landscape and Competition
- The competitive landscape is evolving toward who can effectively deploy AI in real-world scenarios [2][11]
- China is emerging as a leader in this phase thanks to its vast application scenarios, rich data density, and comprehensive hardware ecosystem [3][11]
- The recent MWC event showcased China's AI capabilities and its advances in practical applications [4][25]

Group 2: Xiaomi's AI Innovations
- At MWC, Xiaomi demonstrated deep AI integration across its "people, vehicles, and home" ecosystem, showcasing AI's role in everyday life [6][15]
- The company introduced Miloco, a system that uses AI as a unified decision-making hub for smart homes, improving the user experience through automation [17][23]
- Miloco's capabilities include automatic recognition of user behavior and seamless coordination among devices, a significant step toward practical AI applications [18][24]

Group 3: Technological Foundations
- Xiaomi's MiMo model sits in the top tier of global open-source models, providing a robust foundation for AI applications in real-world environments [8][29]
- The company plans to invest roughly 75 billion yuan in AI development in 2025, focusing on integrating model capabilities with its hardware ecosystem [43][44]
- The synergy between Xiaomi's AI capabilities and its extensive hardware ecosystem is crucial for making AI effective in physical spaces [46][49]

Group 4: Market Trends and Future Outlook
- The AI industry is transitioning from benchmark-driven competition to a focus on systemic capability and real-world application [54][56]
- China's AI advances are increasingly visible across sectors including transportation and energy, with practical deployments becoming commonplace [58][60]
- Xiaomi's comprehensive ecosystem positions it favorably in the global market, enabling seamless integration of AI into daily life and scalable real-world applications [66][68]
The unorthodox data trick works: pre-training a multimodal large model with text data alone
量子位· 2026-03-03 04:25
Core Viewpoint
- The article discusses a groundbreaking approach to building multimodal large models (MLLMs), arguing that high-quality image-text pairs are not necessary for pre-training and challenging a long-held industry belief [1][3]

Group 1: Theoretical Foundation
- The ReVision method rests on "representation alignment" rather than paired data, relying on the shared representation space established by multimodal contrastive learning [4]
- Pre-training creates "semantic topology consistency": images and texts are mapped into a high-dimensional embedding space where the relative distances among semantic concepts are preserved despite differences in absolute position [8]
- The systematic geometric offset between the image and text distributions can be corrected using statistics computed from unpaired data, enabling cross-modal interchangeability without expensive paired data [8]

Group 2: Understanding the Modality Gap
- Previous research misunderstood the modality gap, treating it as isotropic; it is actually anisotropic with specific geometric characteristics [11][14]
- The ReVision team decomposes the gap into two components, a stable bias and anisotropic residuals, showing the gap is not random but has a defined shape that can be replicated mathematically [13][15]

Group 3: Breakthrough in Data Utilization
- The method bypasses expensive paired data by geometrically aligning representations, letting the model learn from the distribution shape of image data [16]
- Only two low-cost inputs are needed: a large volume of unpaired text and the statistical distribution of unpaired images, allowing any text data to be mathematically transformed into visual signals [17]

Group 4: Implementation Strategy
- The ReAlign strategy has three steps: anchor alignment to fix basic positional offsets, trace alignment to handle the anisotropic component, and centroid alignment for final adjustment, so that text features closely resemble visual features without using any real images [19][20][22]

Group 5: Advantages of Unpaired Data
- Unpaired text supplies a wealth of semantic knowledge, overcoming the scarcity and cleaning cost of high-quality paired data [25]
- Long unpaired texts let the model learn complex world knowledge and reasoning, extending its understanding beyond image features alone [26]
- Models pre-trained with 2 million pure texts outperform those trained with 1 million real image-text pairs, at only 74% of the training cost [27][28]

Conclusion
- ReVision opens new avenues for training multimodal large models, demonstrating that paired data is not a hard constraint and that vast amounts of unpaired text can serve as effective visual training material [30]
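The anchor/centroid idea behind ReAlign can be illustrated with a toy moment-matching sketch on synthetic data. This is a crude analogue, not the paper's actual algorithm: the `realign` function, the embedding shapes, and the per-dimension statistics are all assumptions, and the trace-alignment step for the anisotropic residual is not modeled here.

```python
# Toy sketch: shift unpaired text embeddings toward the image distribution
# using only distribution statistics, never paired (image, text) examples.
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for CLIP-style embeddings of *unpaired* text and images:
# the image cloud is shifted and rescaled relative to the text cloud.
text_emb = rng.normal(loc=0.0, scale=1.0, size=(1000, 64))
img_emb = rng.normal(loc=0.5, scale=1.3, size=(800, 64))

def realign(text: np.ndarray, img: np.ndarray) -> np.ndarray:
    """Move text features toward the image distribution by matching
    per-dimension first and second moments (a crude analogue of the
    anchor + centroid steps)."""
    t_mu, t_sd = text.mean(axis=0), text.std(axis=0)
    i_mu, i_sd = img.mean(axis=0), img.std(axis=0)
    # whiten per dimension, then re-color with the image statistics
    return (text - t_mu) / (t_sd + 1e-8) * i_sd + i_mu

pseudo_visual = realign(text_emb, img_emb)
gap_before = np.linalg.norm(text_emb.mean(0) - img_emb.mean(0))
gap_after = np.linalg.norm(pseudo_visual.mean(0) - img_emb.mean(0))
print(f"centroid gap: {gap_before:.3f} -> {gap_after:.6f}")
```

After the transform, the centroid of the pseudo-visual cloud coincides with the image centroid, which is the sense in which text features can stand in for visual ones during pre-training.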
GPT-5.4 accidentally leaked! OpenAI's newest model targets breakthroughs in these two capabilities
量子位· 2026-03-03 04:25
西风 from Aofeisi
量子位 | WeChat official account QbitAI

GPT-5.4 leaked?

People woke up to find this screenshot spreading widely:

In a pull request for OpenAI's coding assistant Codex, the string "GPT-5.4" appeared directly, including a /Fast command for its fast mode.

And this is not the first time traces of GPT-5.4 have surfaced.

A few days ago, an OpenAI developer submitted a pull request on GitHub and accidentally leaked it in the change notes for a version check:

Behind the still-in-development view_image_original_resolution feature flag, original-resolution support was added to the view_image interface. When the flag is enabled and the target model is gpt-5.4 or newer...

Shortly afterwards, "gpt-5.4" was hastily changed to "gpt-5.3-codex".

In addition, a GPT-5.4 entry has also appeared in Codex's model drop-down menu:

All these signs seem to suggest that GPT-5.4 is not far off.

A 2-million-token context window?

Beyond that, rumors claim GPT-5.4 will ship with a 2-million-token context window, enabling persistent memory over extremely long content.

Netizens point out that supporting "remembering very long content without forgetting it over time" would dramatically inflate the amount of data the model must cache at inference time, which is itself a formidable technical challenge.

And in the leaked pull ...
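To see why a 2-million-token window strains the inference cache, here is a back-of-envelope KV-cache estimate. Every hyperparameter below (layer count, number of KV heads, head dimension, fp16 cache) is an assumption chosen purely for illustration; nothing about GPT-5.4's architecture is known.

```python
# Back-of-envelope KV-cache size for a single 2M-token sequence.
# All model hyperparameters here are illustrative assumptions.

def kv_cache_gib(tokens: int, layers: int, kv_heads: int,
                 head_dim: int, bytes_per_elem: int = 2) -> float:
    """GiB needed to cache keys AND values (factor of 2) across all layers."""
    return tokens * layers * kv_heads * head_dim * 2 * bytes_per_elem / 2**30

# A hypothetical 80-layer model with 8 KV heads of dimension 128, fp16 cache:
print(f"{kv_cache_gib(2_000_000, 80, 8, 128):.0f} GiB")  # ~610 GiB
```

Even with grouped-query attention (few KV heads), a single long sequence can demand hundreds of GiB of cache, which is why netizens call persistent 2M-token memory a hard systems problem.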
量子位 (QbitAI) is hiring editors and writers
量子位· 2026-03-03 04:25
Core Viewpoint
- Amid the ongoing AI boom, the article invites readers to join 量子位 (QbitAI), a company focused on tracking AI advancements that has established itself as a leading content platform in the industry [1]

Group 1: Job Opportunities
- The company is hiring in three directions: AI Industry, AI Finance, and AI Product, with positions open to both experienced professionals and fresh graduates [2][4]
- Positions are full-time and based in Beijing, with roles at various levels open for application [2][4]

Group 2: Job Responsibilities
- AI Industry: covers innovations in infrastructure, including chips, AI infrastructure, and cloud computing [6]
- AI Finance: tracks venture capital and financial reporting in the AI sector, monitoring capital movements within the industry [6]
- AI Product: focuses on AI applications and hardware advances [6]

Group 3: Benefits and Growth Opportunities
- Employees can engage with the latest AI technologies, raise their productivity with new tools, and build personal influence by creating original content [6]
- The company offers competitive salaries and comprehensive benefits including social insurance, meal allowances, and performance bonuses [6]

Group 4: Company Growth Metrics
- By 2025, 量子位 had over 2.4 million WeChat subscribers and more than 7 million users across all platforms, with daily reading volume exceeding 2 million [12]
- Third-party data platforms rank it as the top new-media outlet in the AI and frontier-technology sector [12]
The world's first large-model earnings report! MiniMax forecasts three super PMFs for 2026 as the AI platform company sets out
量子位· 2026-03-03 01:59
Core Insights
- MiniMax has released its first annual report since its IPO, showing strong financial growth and offering a window into the commercialization of large models in the AI industry [2][4][23]
- Revenue for 2025 reached $79.04 million, up 158.9% year-on-year, with over 70% coming from international markets [4][8]
- The company runs a dual-driven business model spanning consumer (C-end) AI-native products and an enterprise (B-end) open platform, producing a stable and predictable revenue stream [6][13]

Financial Performance
- Adjusted net loss for the past year was $250 million, but the loss rate has narrowed significantly, indicating improving profitability [5][18]
- Gross profit for 2025 was $20.08 million, up a striking 437% year-on-year, with gross margin rising from -24.7% in 2023 to 25.4% in 2025 [14][15]
- R&D expenses were $250 million, a 33.8% increase over the previous year, while spending efficiency improved: the ratio of R&D expenses to total revenue fell from 619% in 2024 to 320% in 2025 [19]

Product Development and Market Position
- MiniMax has built comprehensive R&D capabilities across modalities including language, video, voice, and music, and iterates its models rapidly [23][24]
- The company has shipped multiple iterations of its language models, with M2.5 noted for its efficiency and integration into mainstream productivity tools [32][42]
- MiniMax aims to transition from a large-model company to an AI platform company, treating intelligence density and model throughput as its key metrics [47][49]

Future Outlook
- The company anticipates strong growth in demand for multimodal models, expecting a substantial increase in token volume [46]
- MiniMax is actively developing the M3 and Hailuo 3 series models to optimize reasoning architecture and computational efficiency, positioning itself for the shift from the "tool era" to the "ecosystem era" in AI [51][52]
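As a quick arithmetic cross-check of the figures above (all inputs are taken from the report summary; the small discrepancy in the R&D ratio presumably reflects rounding in the source):

```python
# Sanity-check the reported 2025 ratios from the raw dollar figures.
revenue = 79.04       # USD millions, 2025 revenue
gross_profit = 20.08  # USD millions
rnd = 250.0           # USD millions, R&D expenses

gross_margin = gross_profit / revenue * 100
rnd_ratio = rnd / revenue * 100

print(f"gross margin: {gross_margin:.1f}%")  # matches the reported 25.4%
print(f"R&D / revenue: {rnd_ratio:.0f}%")    # close to the reported ~320%
```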
This year's most noteworthy AI rankings are here! Applications open today
量子位· 2026-03-03 01:59
Organizing Committee, from Aofeisi
量子位 | WeChat official account QbitAI

China's generative AI is entering the deep waters of industrialization.

Over the past two years, AI has gone from a "new technology" to a "new tool," and from a "new tool" to a reality every enterprise must face. It is changing not only content production but also R&D efficiency, marketing, team collaboration, and even decision-making processes.

On the occasion of the 4th China AIGC Industry Summit, 量子位 will, based on the performance of and feedback on generative AI companies and products over the past year, combined with observations and predictions about 2026 technologies and scenarios, select:

- AIGC Companies to Watch in 2026
- AIGC Products to Watch in 2026

The "AIGC Companies to Watch in 2026" list will recognize the AI companies with the most innovative, most forward-looking, or most scalable deployment potential.

[Eligibility]
1. The company is incorporated in China or its main business is in China;
2. Its main business is generative AI or related, or it has widely applied AI to its main business;
3. It has performed outstandingly in technology/products or commercialization over the past year.

量子位 will combine in-depth company research with the opinions of dozens of well-known industry experts; results will be announced at the China AIGC Industry Summit in May 2026, where 量子位 will also invite millions of industry practitioners to witness these honors together.

△ Scan the code to nominate a product

How to apply
Applications open today and close on April 27; final results will be announced in May at the China AIGC Industry Summ ...
Want to get into VLA but don't know where to start? NTU & CUHK open-source the "ultimate recipe": from backbone to frequency-domain modeling, every step backed by experiments
量子位· 2026-03-02 16:00
Core Insights
- The article presents VLANeXt, a new model built by systematically analyzing the design space of Vision-Language-Action (VLA) models across 12 key dimensions, yielding a comprehensive "recipe" for effective model design [1][5][20]
- VLANeXt significantly outperforms state-of-the-art (SOTA) methods, including 7B-parameter models, achieving a 10% higher success rate under previously unseen conditions such as lighting changes and new camera angles [1][23]

Group 1: Background and Motivation
- The rise of large foundation models has highlighted the potential of VLA models, which leverage rich visual and language understanding for scalable robot learning [5]
- The VLA research landscape is fragmented, with many models claiming superior performance but no unified evaluation framework, necessitating a return to fundamental design principles [5]

Group 2: Model Development Process
- The team started from a baseline similar to RT-2, using LLaMA as the backbone and a simple architecture for action modeling [7]
- Key enhancements included an independent policy module, deeper policy modeling, and action chunking to improve inference speed and model performance [9][11]

Group 3: Foundational Components
- Decoupling the language and action spaces and using an independent policy head significantly outperformed reusing text tokens for action classification [9]
- The policy architecture was deepened to 29 layers to better capture action distributions, aligned with the backbone of the vision-language model (VLM) [9]

Group 4: Perception Essentials
- Redundant historical visual information did not improve performance, so only the current frame's image is used [14]
- Multi-view inputs, including third-person and wrist perspectives, provide complementary geometric cues and improve action accuracy [14]

Group 5: Action Modeling Perspectives
- The team explored world models for action learning but rejected them due to the added training time, opting for more efficient modeling techniques [16]
- They introduced frequency-domain modeling via the discrete cosine transform (DCT) to improve action prediction without significant additional training cost [16]

Group 6: Experimental Results
- VLANeXt delivered superior performance across benchmarks including LIBERO and LIBERO-plus, with an average score of 99.0 on LIBERO [21][22]
- Its robustness was validated on real-world tasks, showing strong adaptability in both single-arm and bimanual scenarios without specialized pre-training [25]
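The frequency-domain idea in Group 5 can be sketched in a few lines: a smooth action chunk is compressed by keeping only its low-frequency DCT coefficients, so the policy can predict a handful of coefficients instead of every timestep. The chunk length, test signal, and coefficient count below are illustrative assumptions, not VLANeXt's actual configuration.

```python
# Sketch: DCT-based compression of a smooth action trajectory.
import numpy as np

def dct2(x: np.ndarray) -> np.ndarray:
    """Orthonormal DCT-II of a 1-D signal (pure NumPy)."""
    n = len(x)
    k = np.arange(n)
    # basis[freq, time] = cos(pi * (2*time + 1) * freq / (2n))
    basis = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    scale = np.full(n, np.sqrt(2 / n)); scale[0] = np.sqrt(1 / n)
    return scale * (basis @ x)

def idct2(c: np.ndarray) -> np.ndarray:
    """Inverse of dct2 (orthonormal DCT-III)."""
    n = len(c)
    k = np.arange(n)
    basis = np.cos(np.pi * (2 * k[:, None] + 1) * k[None, :] / (2 * n))
    scale = np.full(n, np.sqrt(2 / n)); scale[0] = np.sqrt(1 / n)
    return basis @ (scale * c)

# A smooth 16-step action chunk (e.g. one joint's target positions).
t = np.linspace(0, 1, 16)
traj = 0.5 * np.sin(2 * np.pi * t) + 0.1 * t

coeffs = dct2(traj)
coeffs[6:] = 0.0          # keep only the 6 lowest-frequency coefficients
recon = idct2(coeffs)
print(np.max(np.abs(recon - traj)))  # small reconstruction error
```

Because robot trajectories are smooth, most of their energy sits in the low frequencies, so a short coefficient vector reconstructs the chunk with little error.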
A group of young researchers in Shanghai built an academic OpenClaw
量子位· 2026-03-02 16:00
Core Viewpoint
- The article covers the launch of "Da Sheng," a highly capable intelligent agent developed by the Shanghai Institute of Intelligent Science together with Fudan University, aimed at transforming scientific research through advanced AI capabilities [4][5]

Group 1: AI Capabilities and Applications
- Da Sheng can autonomously carry out research tasks such as analyzing single-cell transcriptomics data and generating the corresponding experimental designs, cutting the time required from weeks to minutes [2][19]
- It closes the loop in the life sciences, linking computational models with real-world biological experiments and improving efficiency 3 to 4 times over traditional methods [19]
- Its multimodal understanding lets it process complex scientific data such as RNA sequences and molecular structures and generate high-performance experimental designs without extensive conversion to text [20][26]

Group 2: Innovations in Scientific Research
- The agent integrates dry- and wet-lab workflows, addressing a major pain point in the life sciences where computational predictions often fail to translate into practical experiments [13][19]
- Da Sheng has participated in space-related scientific computation, successfully deploying a weather model in orbit, a notable advance in remote scientific data processing [30][33]
- Its capabilities extend to the humanities and social sciences, where it facilitates deep, Socratic-style discussion to sharpen students' critical thinking [36][38]

Group 3: Development and Infrastructure
- Da Sheng is backed by an infrastructure of over 400 scientific models and 22PB of high-value data, accumulated through collaborative efforts over the past year [40]
- Its architecture includes a multi-branch memory system that isolates information effectively, so both successful and failed experiments contribute to the overall knowledge base [50][54]
- A skills system of over 300 reusable skills, distilled from real research experience, strengthens its practical application across scientific fields [60]

Group 4: Safety and Security Measures
- Da Sheng incorporates a comprehensive safety framework balancing high autonomy, security, and resource efficiency, ensuring safe operation in collaborative environments [66][69]
- It executes code in a sandbox with real-time auditing, minimizing data leakage while maintaining high performance [69][71]

Group 5: Future Directions and Competitions
- The upcoming AI4S Intelligent Agent CNS Challenge will engage teams in building agents that tackle top-tier scientific problems, promoting the integration of AI into advanced research [84][87]
- The initiative aims to reduce researchers' repetitive workload so they can focus on more complex scientific questions [87][89]
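The multi-branch memory idea in Group 3 can be illustrated with a toy sketch: each research thread writes to its own isolated branch, while failed experiments remain queryable as lessons rather than being discarded. The class and method names below are hypothetical, not Da Sheng's actual API.

```python
# Toy multi-branch agent memory: per-thread isolation, shared lesson index.
from dataclasses import dataclass, field

@dataclass
class MemoryBranch:
    name: str
    records: list = field(default_factory=list)

    def log(self, outcome: str, success: bool) -> None:
        self.records.append({"outcome": outcome, "success": success})

class AgentMemory:
    def __init__(self) -> None:
        self.branches: dict[str, MemoryBranch] = {}

    def branch(self, name: str) -> MemoryBranch:
        # Each research thread writes only to its own branch (isolation).
        return self.branches.setdefault(name, MemoryBranch(name))

    def lessons(self) -> list:
        # Failed experiments are retained, not discarded: they inform later runs.
        return [r for b in self.branches.values() for r in b.records
                if not r["success"]]

mem = AgentMemory()
mem.branch("rna-seq").log("batch-effect correction removed signal", success=False)
mem.branch("weather").log("model deployed on orbital compute", success=True)
print(len(mem.lessons()))  # 1 failed-experiment record retained
```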