量子位
Let AI Be the "Action Director": Tencent Open-Sources Its Hunyuan Motion Model, Which Understands Vague Instructions and Generates High-Quality 3D Character Animation
量子位· 2026-01-14 11:19
Core Viewpoint
- The article discusses the limitations of 3D character animation creation caused by the scarcity of high-quality motion assets, and presents Tencent's solution, HY-Motion 1.0, which aims to revolutionize motion generation through advanced data processing and model training techniques [1][2]

Group 1: Core Technology
- The HY-Motion 1.0 model is built on over 3,000 hours of industrial-grade refined motion data, which is essential for supporting its 1-billion-parameter performance [4]
- The data engine integrates sources including monocular video motion capture, optical motion capture, and expressive hand-keyed animation assets from artists, balancing model generalization against generation quality [6]
- A standardized data processing pipeline covers data cleaning, normalization, and a closed-loop labeling process using VLMs and LLMs to enrich the diversity of motion descriptions [6][7]

Group 2: Generation Pipeline
- A dedicated LLM prompt-engineering module converts user prompts into structured action descriptions and precise timing, improving the controllability of generated motions [7][8]
- The model undergoes two-stage fine-tuning: first transforming vague multilingual instructions into structured English descriptions, then optimizing via reinforcement learning to improve semantic consistency and timing [7][13]

Group 3: Model Design
- The core architecture of HY-Motion 1.0 combines a Diffusion Transformer (DiT) with Flow Matching, employing a dual-stream-to-single-stream structure for effective multimodal integration [10][12]
- Techniques such as "semantic pollution prevention" and "local constraints" ensure logical consistency and physical continuity in long-sequence generation [12]

Group 4: Training Process
- Training includes large-scale pretraining on 3,000 hours of data, high-quality fine-tuning on 400 hours of selected data, and reinforcement learning to enhance motion realism and semantic alignment [15][16]
- The model achieved an SSAE score of 78.6%, significantly outperforming state-of-the-art models in instruction adherence [17]

Group 5: Community Engagement and Applications
- Since its open-source release, HY-Motion 1.0 has gained traction among game developers, AI designers, and animators, who have integrated it into mainstream AI workflows such as ComfyUI [26][27]
- The model offers a cost-effective AI solution for creators without expensive motion-capture equipment, enabling high-quality motion asset production [33][34]
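The DiT-plus-Flow-Matching design described in Group 3 rests on a simple training objective: sample a point on the straight-line path between noise and data, and regress a velocity field onto that path's constant velocity. Below is a minimal sketch of how that objective is constructed, on toy vectors; this is a conceptual illustration, not HY-Motion's code, and the toy "motion" data is a hypothetical stand-in.

```python
import numpy as np

# Flow-matching objective sketch: noise x0, data x1, and a linear
# interpolation x_t between them with constant target velocity x1 - x0.
rng = np.random.default_rng(0)
batch, dim = 256, 4

x0 = rng.standard_normal((batch, dim))        # noise samples
x1 = rng.standard_normal((batch, dim)) + 3.0  # toy "motion" data
t = rng.uniform(size=(batch, 1))              # random times in [0, 1]

x_t = (1 - t) * x0 + t * x1   # straight-line interpolation
v_target = x1 - x0            # velocity the network would regress onto

# Sanity check: integrating the true velocity from time t to 1
# reconstructs the data exactly, confirming the path construction.
x1_recon = x_t + (1 - t) * v_target
assert np.allclose(x1_recon, x1)
print("flow-matching path check passed")
```

A trained model `v(x_t, t)` would be fit with a squared error against `v_target`; at generation time, integrating the learned velocity field carries pure noise to a sample.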
The "AI 100" List Opens for Applications: The AI Product "Annual Gathering" Goes On | Quantum Bit Think Tank
量子位· 2026-01-14 08:10
Core Insights
- The article discusses the emergence of numerous keywords in the AI product sector by 2025, highlighting transformative AI products that are leading the market [4]
- The "AI 100" list by Quantum Bit Think Tank aims to evaluate and recognize the top AI products in China, reflecting the industry's evolution and future trends [4][12]

Group 1: AI 100 List Overview
- The "AI 100" list is divided into three main categories: "Flagship AI 100," "Innovative AI 100," and the top three products in ten popular sub-sectors [6]
- The "Flagship AI 100" will focus on the strongest AI products of 2025, showcasing those that have achieved significant technological breakthroughs and practical application value [7]
- The "Innovative AI 100" aims to identify products expected to emerge in 2026, representing cutting-edge AI technology and potential industry disruptors [8]

Group 2: Sub-sector Focus
- The ten hottest sub-sectors for the top three products include AI browsers, AI agents, AI smart assistants, AI workstations, AI creation, AI education, AI healthcare, AI entertainment, Vibe Coding, and AI consumer hardware [9]

Group 3: Application and Evaluation
- The evaluation of the "AI 100" list employs a dual assessment system combining quantitative and qualitative metrics, focusing on user data and long-term development potential [13]
- Quantitative metrics include user scale, growth, activity, and retention, while qualitative metrics consider technology, market space, design, monetization potential, team background, and growth speed [13]
QbitAI Is Hiring Editors and Writers
量子位· 2026-01-14 08:10
Editorial Dept, from Aofeisi
QbitAI | WeChat official account QbitAI

The AI wave is still surging, but if you don't yet know how to take part in it... then why not join QbitAI? We are a content platform centered on tracking new developments in AI. After eight years of building, we now have top-tier influence, broad and widely recognized industry resources, and the best vantage point for observing and learning at the frontier of the era.

We are currently hiring in three directions, and we hope you are (or can become) a content expert in one of them. All positions are full-time, based in Zhongguancun, Beijing. Positions at every seniority level are open; you are welcome to apply based on your background and experience.

AI industry direction:
- Experienced hires: editor, lead writer, and editor-in-chief levels, matched to ability;
- Campus hires: new graduates, with internships accepted and conversion to full-time possible.

By joining us, you can:
- Stand at the crest of the AI wave: be the first to encounter the latest AI technologies and products, and build a complete AI knowledge system.
- Master new AI tools: apply new AI technologies and tools in your work to boost efficiency and creativity.
- Build personal influence: write exclusive original content, establish your reputation, and become an opinion leader in the AI field.
- Expand your industry network: engage up close with leading figures in AI, and take part in major tech events and product launches.
- Receive professional guidance: new graduates will have an editor-in-chief-level editor as a mentor, providing one-on-one ...
Google Wants Its Own "AI TikTok" Too! The New Veo 3.1 Natively Supports Vertical Video with High-Quality 4K Resolution
量子位· 2026-01-14 08:10
Core Viewpoint
- Google has officially entered the AI short-video arena with the upgrade of Veo 3.1, enhancing video generation quality and introducing vertical and 4K formats [1][11][12]

Group 1: Features of Veo 3.1
- The upgraded Veo 3.1 allows users to generate videos from a single vertical image and a simple prompt, showcasing its creative capabilities [3][14]
- It supports a native 9:16 vertical video format optimized for mobile platforms like YouTube, and resolution has increased from 720p to 4K [15][12]
- Consistency has significantly improved, ensuring characters maintain their appearance across different scenes [16][26]
- Element-fusion capabilities have been enhanced, allowing coherent video generation from simple descriptions of characters, objects, and backgrounds [20][21]

Group 2: Market Context and Competition
- Google is not the first to pursue vertical AI video; competitors like OpenAI and Disney have also made strides in this area [33][40]
- OpenAI's Sora app, which mimics TikTok, faced challenges with user retention, highlighting the operational difficulties of managing such platforms [36][37]
- Google benefits from its comprehensive operational capabilities, leveraging platforms like YouTube to create a closed-loop ecosystem for content creation and distribution [38][39]

Group 3: Industry Trends
- The trend toward vertical AI video is becoming increasingly evident, with various players in the industry recognizing its importance [42][43]
- Domestic AI players in China are also exploring similar video-generation applications, indicating growing interest in this format [44][46]
Remarkable: This New Technology Compresses Video Down to 0.02%!
量子位· 2026-01-14 08:10
Jin Lei, from Aofeisi
QbitAI | WeChat official account QbitAI

Thanks to AI: a video that is natively 1 GB can now be watched after transmitting only about 200 KB of data. The video compression ratio has been pushed to 0.02%, while the picture remains high-definition, coherent, and rich in detail.

You might ask: what is this actually good for? Imagine you are aboard an ocean freighter in the middle of the Pacific, with only one or two bars of satellite signal; even refreshing a social feed leaves the loading spinner turning for ages. With this AI technology, even in such an extreme environment you could watch a high-definition live broadcast of the World Cup.

This new research comes from the China Telecom Artificial Intelligence Research Institute (TeleAI): Generative Video Compression (GVC). As a centrally administered state-owned enterprise and a globally leading integrated intelligent information service operator, China Telecom not only owns communication network infrastructure spanning sea, land, air, and space, but also has the ability to deeply integrate cutting-edge AI with real communication scenarios. This unique "cloud-network convergence + AI-native" advantage makes it possible for GVC to move from the lab into real extreme environments such as ocean-going ships and emergency-response sites.

So how does this research achieve that, and what changes can it bring to everyday life? Read on. Indeed, the physics of video transmission have, in a sense, been rewritten.

Trading compute for bandwidth

Before introducing this technology, we first need to talk about the current ...
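The headline numbers can be sanity-checked directly: 200 KB out of 1 GB works out to roughly the quoted 0.02% ratio. A quick arithmetic check (binary units assumed; the article does not specify decimal vs. binary):

```python
# Sanity check of the claimed compression ratio: ~1 GB of source
# video delivered as ~200 KB of transmitted data.
original_bytes = 1 * 1024**3     # 1 GiB source video
transmitted_bytes = 200 * 1024   # 200 KiB actually sent

ratio = transmitted_bytes / original_bytes
print(f"compression ratio: {ratio:.4%}")  # ~0.0191%, i.e. about 0.02%
```

With decimal units (10^9 and 2×10^5 bytes) the ratio is 0.02% exactly, so the figure holds either way.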
Just Now, Zhipu and Huawei Pull Off Something Big: China's First SOTA Multimodal Model Trained on Domestic Chips!
量子位· 2026-01-14 06:32
Core Viewpoint
- The article highlights the launch of GLM-Image, a state-of-the-art (SOTA) multimodal model developed by Zhipu AI in collaboration with Huawei, notable for being trained entirely on domestic chips and for excelling at text rendering [1][36]

Group 1: Model Performance
- GLM-Image achieved first place in both the CVTG-2K (Complex Visual Text Generation) and LongText-Bench (long-text rendering) benchmarks, demonstrating superior performance with a word accuracy of 0.9116 and a normalized edit distance (NED) of 0.9557 [5][6]
- On LongText-Bench, GLM-Image ranked first among open-source models in both Chinese and English scores, indicating its versatility and effectiveness across languages [6]

Group 2: Cost Efficiency
- Generating an image via GLM-Image's API costs only 0.1 yuan (approximately 0.014 USD), making it an affordable option for users [7][21]
- This low cost positions GLM-Image as a competitive choice for businesses and developers looking to integrate AI image generation capabilities [60]

Group 3: Technical Innovation
- GLM-Image employs a hybrid architecture combining autoregressive and diffusion models, allowing it to understand complex prompts and generate high-quality images effectively [38][40]
- The model was trained on Huawei's Ascend A2 chips, showcasing the potential of domestic computing power for supporting advanced AI models [44][48]
- The training process included optimizations for reinforcement learning (RL) to ensure stability and efficiency, which is critical at this model scale [51]

Group 4: Market Impact
- GLM-Image represents a significant advancement in the domestic AI landscape, challenging the dominance of foreign models and proving that high-performance models can be developed using local resources [57][60]
- The open-source nature of GLM-Image, along with its innovative architecture, provides valuable resources for researchers and developers in the field of image generation [59][60]
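The normalized edit distance (NED) metric on which GLM-Image reports 0.9557 measures how close a rendered text string is to its target. A common convention is 1 minus the Levenshtein distance divided by the longer string's length, so 1.0 means a perfect match; a minimal sketch under that assumed convention (the benchmark's exact normalization may differ):

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance over two rows.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def ned(a: str, b: str) -> float:
    # 1.0 = identical strings, 0.0 = maximally different.
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

print(ned("hello world", "hello word"))  # one edit off: ~0.909
```

Against a rendered string that drops one character out of eleven, NED is about 0.91, so scores near 0.95 correspond to near-verbatim text rendering.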
Claude's Version of Manus Built in Just 10 Days, with All the Code Written by AI! Netizens: Zuckerberg's 14-Billion Acquisition Looks Like a Rip-Off
量子位· 2026-01-14 04:42
Core Insights
- Claude Cowork is a general-purpose intelligent agent designed for work scenarios, built on Anthropic's advanced self-developed model [2]
- The development of Claude Cowork took approximately 10 days, with Claude Code writing all the code, although human intervention was still necessary for planning and design [3][5]
- The tool aims to empower non-technical users to leverage AI capabilities, allowing them to assign tasks as if communicating with a reliable colleague rather than through traditional dialogue [6][7]

Development Process
- The initial version of Claude Code was in internal testing by the end of 2024, originally named Claude CLI, and was not fully mature in programming capabilities [11][12]
- The unexpected use of Claude Code by data scientists and other professionals for varied tasks revealed the need for a more user-friendly version, resulting in the creation of Claude Cowork [17][18]
- The development team operated under a tight deadline, collaborating closely to manage multiple Claude instances for feature work and error resolution [20][22]

Features and User Experience
- Claude Cowork uses local Git worktrees for native code and can implement smaller changes directly [24][25]
- The team prioritizes user feedback to refine the product, releasing it early despite imperfections to better understand user needs [29]
- Comparisons with Manus indicate that while Manus is suited to more complex workflows, Claude Cowork is still in its early stages and may not yet be fully reliable [30][31]

Cautionary Measures
- Users are advised to exercise caution when granting AI access to files, as there have been instances of data loss due to AI actions [34]
- Claude's team has implemented measures to alert users when granting file-system permissions, emphasizing the need for careful oversight [36]
No Extra Cache Needed! Nvidia Open-Sources a Memory-Compression Scheme for Large Models, 2.7x Faster on 128K Contexts
量子位· 2026-01-14 04:42
Core Viewpoint
- Nvidia has introduced the TTT-E2E method in collaboration with several research institutions to enhance memory capabilities in large models, significantly improving processing speed and efficiency on long texts [1][2]

Group 1: TTT-E2E Method Overview
- TTT-E2E processes 128K-token texts 2.7 times faster than full-attention models and achieves a 35-fold speedup on 2M contexts, without compromising performance [3]
- Unlike the recently popular DeepSeek memory module, TTT-E2E employs dynamic learning through context compression rather than a static learning path [5][6]
- The method allows real-time learning, compressing key content into the model weights so the model maintains a learning state during testing [7][8]

Group 2: Technical Implementation
- TTT-E2E is based on a standard Transformer with sliding-window attention, making it easy to deploy without relying on complex architectures [11]
- The core idea recasts long-text modeling from an architecture-design problem into a "continual learning" task [12]
- During testing, the model predicts the next word from the current context and updates its parameters through gradient descent, dynamically compressing information into its weights [13]

Group 3: Training and Optimization
- The training phase utilizes meta-learning to prepare the model for test-time learning, treating each training sequence as if it were a test sequence [14]
- TTT-E2E incorporates three key optimizations: mini-batch processing combined with sliding windows, precise update strategies focusing on specific layers, and a dual-MLP design to balance absorbing new context against retaining pre-trained knowledge [16][17]

Group 4: Performance and Limitations
- Experimental data show TTT-E2E performs comparably to or better than full-attention Transformers on test loss, while inference latency stays constant regardless of context length [19][23]
- On tasks requiring precise recall of details, TTT-E2E underperforms full-attention models because its memory compression filters out seemingly irrelevant details [25][26]
- The meta-learning process in the training phase is currently slower than standard pretraining methods [27]

Group 5: Research and Development
- The project is led by Yu Sun, a postdoctoral researcher at Stanford, who aims to enable AI systems to learn continuously like humans [29][30]
- The code and related papers for TTT-E2E have been fully open-sourced [28]
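The test-time-learning loop described above, predicting the next step and taking a gradient step so the context is compressed into weights rather than cached as tokens, can be sketched with a toy online least-squares learner. This is a conceptual stand-in only, not the TTT-E2E architecture (which updates MLP layers inside a Transformer with sliding-window attention); the linear model and synthetic stream are hypothetical.

```python
import numpy as np

# Toy test-time training: "fast weights" W are updated by SGD as the
# stream arrives, so the pattern hidden in the context ends up stored
# in W instead of in a growing token cache.
rng = np.random.default_rng(0)
d = 8
A_true = rng.standard_normal((d, d))  # pattern hidden in the stream
W = np.zeros((d, d))                  # fast weights, learned at test time
lr = 0.05

losses = []
for _ in range(200):
    x = rng.standard_normal(d)
    y = A_true @ x                    # "next token" implied by context
    err = W @ x - y
    losses.append(float(err @ err))
    W -= lr * 2.0 * np.outer(err, x)  # one SGD step on squared error

# Nothing from the stream is stored except W, so per-step cost stays
# flat no matter how long the context grows.
print(f"early loss {np.mean(losses[:10]):.1f} -> late loss {np.mean(losses[-10:]):.6f}")
```

The sharp drop from early to late loss is the point of the technique: the stream's structure has been absorbed into the weights, which also explains the noted weakness, since fine details not useful for prediction get compressed away.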
Google's Agent Charges into the E-Commerce Race: AI Directly Helps Compare Prices and Place Orders; Musk: "Interesting"
量子位· 2026-01-13 11:36
Core Insights
- The article discusses the transformation of e-commerce driven by Google's new AI-centric approach, emphasizing the introduction of the Universal Commerce Protocol (UCP) and Gemini CX solutions [2][3][25]

Group 1: Universal Commerce Protocol (UCP)
- Google has launched the UCP, an open protocol designed for agentic e-commerce, facilitating collaboration between AI agents, merchants, and e-commerce platforms throughout the entire shopping process [10][21]
- UCP focuses on three core functionalities: checkout, identity linking, and order management, supporting complex shopping-cart logic, dynamic pricing, and tax calculations [16][20]
- The protocol is compatible with existing industry standards and has already been integrated with major retailers and payment platforms like Walmart, Shopify, and Visa [21]

Group 2: Gemini CX
- Google introduced Gemini CX, which integrates the latest Gemini model and AI technology to assist businesses in deploying AI agents for customer service across the entire shopping lifecycle [25][30]
- The shopping agent within Gemini CX connects front-end interfaces like chat and voice to back-end tools, streamlining the retail process and reducing the burden on merchants [27][28]
- Companies like McDonald's and The Home Depot are already utilizing Gemini CX to enhance customer-service quality [30]

Group 3: Domestic E-commerce Developments
- Domestic e-commerce platforms such as Alibaba are also advancing AI integration, with Alibaba applying generative AI for the first time during this year's Double Eleven shopping festival [32]
- JD.com has opened access to over twenty AI tools for merchants, covering various aspects of store management and marketing during the Double Eleven period [34]
- Douyin has redefined its e-commerce entry point, using its model to match shopping inquiries with products and shorten the path to purchase [34]
Wang Xiaochuan: 3 Billion Yuan in Cash on Hand, an IPO Next Year, and a toC Product Launching Imminently
量子位· 2026-01-13 11:36
Core Viewpoint
- Baichuan Intelligent focuses on a single line of development in the medical field, emphasizing the importance of deepening expertise rather than diversifying into multiple sectors [1]

Group 1: Financial Position and Future Plans
- Baichuan has approximately 3 billion yuan in funds, allowing for sustained investment in its chosen field [3]
- The company plans to initiate an IPO in 2027 [6]

Group 2: Technological Advancements
- Baichuan has released the new medical model Baichuan-M3, which scored 65.1 on the HealthBench evaluation, ranking first [2]
- The model has a low medical hallucination rate of 3.5, the lowest globally [2]
- About 80% of Baichuan's computational power is dedicated to reinforcement learning, which has fundamentally changed the training focus from the previous Baichuan-M2 model [8][12]
- The M3 model employs "fact-aware reinforcement learning," addressing the challenge of balancing strong reasoning capabilities with minimizing hallucinations [13][16]

Group 3: Product Development and Market Focus
- Baichuan plans to release two consumer-facing medical products in the first half of this year, initially free, with future paid modules aimed at assisting patient decision-making and home health care [10]
- The company emphasizes the need for a restructured approach to healthcare, focusing on outpatient scenarios and patient decision-making outside of hospitals [25][27]
- Baichuan's product strategy will not cross regulatory boundaries by providing diagnoses or prescriptions, but will help users understand information and organize symptoms [29]

Group 4: Collaboration and Target Areas
- Baichuan's medical AI products will cover all disease types but will initially focus on pediatrics and oncology [31]
- The company is collaborating with Beijing Children's Hospital and the Cancer Hospital of the Chinese Academy of Medical Sciences for real-world scenario validation [32]