A video in 2 seconds on a single GPU! Tsinghua teams up with Shengshu to open-source TurboDiffusion: video's DeepSeek moment is here
量子位· 2025-12-25 11:51
Core Viewpoint
- The article discusses the introduction of TurboDiffusion, an open-source framework developed by Tsinghua University's TSAIL lab and Shengshu Technology, which accelerates video generation by up to 200 times while maintaining high quality [2][3][39]

Group 1: Speed and Efficiency
- TurboDiffusion generates a 5-second 480P video in just 1.9 seconds on a single RTX 5090 GPU, down from roughly 184 seconds [3][13]
- For a 720P video, the framework generates content in 24 seconds, a substantial improvement over previous models [12]
- The enhancements enable near-real-time generation, cutting the delay for high-quality 1080P videos from 900 seconds to 8 seconds [16][39]

Group 2: Technical Innovations
- TurboDiffusion combines four key technologies to optimize video generation: SageAttention, Sparse-Linear Attention (SLA), rCM step distillation, and W8A8 quantization [22][24][32]
- SageAttention2++ reduces the computational load of the attention mechanism, delivering a 3-5x speedup while halving memory usage [25][27]
- SLA focuses computation on important pixels and maintains linear complexity, yielding further speedups when combined with SageAttention [28][29]

Group 3: Industry Impact
- The advances are expected to cut cloud inference costs significantly, enabling service to 100 times more users with the same computational power [42]
- The technology is compatible with domestic AI chip architectures, promoting self-sufficiency in China's AI infrastructure [42]
- The framework opens new possibilities for real-time video editing, interactive video generation, and automated short-film production, potentially leading to innovative product forms in the AIGC sector [42]
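To make the W8A8 technique named above concrete, here is a toy sketch of 8-bit symmetric quantization of both weights and activations, with an integer dot product dequantized at the end. All names are hypothetical illustrations; the real TurboDiffusion kernels run fused int8 matmuls on GPU tensors, not Python lists.

```python
# Illustrative W8A8 sketch: quantize weights AND activations to int8 codes,
# accumulate in integers (int32 on real hardware), dequantize once per output.

def quantize_sym(values, bits=8):
    """Per-tensor symmetric quantization: returns int codes and a scale."""
    qmax = 2 ** (bits - 1) - 1               # 127 for int8
    scale = max(abs(v) for v in values) / qmax or 1.0
    codes = [round(v / scale) for v in values]
    return codes, scale

def w8a8_dot(weights, activations):
    """Dot product computed on int8 codes, dequantized at the end."""
    wq, w_scale = quantize_sym(weights)
    aq, a_scale = quantize_sym(activations)
    acc = sum(w * a for w, a in zip(wq, aq))  # pure-integer accumulation
    return acc * w_scale * a_scale            # single dequantization step

w = [0.5, -1.2, 0.03, 0.9]
x = [1.0, 0.25, -0.6, 0.1]
exact = sum(wi * xi for wi, xi in zip(w, x))
print(exact, w8a8_dot(w, x))   # quantized result closely tracks the fp value
```

The point of the sketch is why W8A8 helps speed: the inner loop touches only small integers, halving memory traffic and enabling fast int8 tensor-core paths, at the cost of a small, bounded rounding error.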
Vector retrieval under fire! Fu Cong and Zhejiang University release the IceBerg Benchmark: HNSW is not optimal, and the evaluation system is seriously biased
量子位· 2025-12-25 11:51
Core Insights
- The integration of multimodal data into RAG and agent frameworks is a hot topic in the LLM application field, with vector retrieval being the most natural recall method for multimodal data [1]
- There is a misconception that vector retrieval methods have been standardized, particularly the use of HNSW, which does not perform well in many downstream tasks [1]
- A new benchmark called IceBerg has been introduced to evaluate vector retrieval algorithms on downstream semantic tasks rather than traditional metrics like Recall-QPS, challenging past industry perceptions [1]

Group 1: Misconceptions in Vector Retrieval
- Many believe that vector retrieval methods are standardized, leading to a reliance on HNSW without considering its performance in real-world tasks [1]
- The evaluation systems used in the past only scratch the surface of the complexities involved in vector retrieval [1]
- A significant disparity exists between the perceived effectiveness of vector retrieval methods and their actual performance in downstream tasks [7]

Group 2: Case Studies and Findings
- On a large-scale facial verification dataset (Glink360K), facial recognition accuracy saturated before Recall reached 99%, indicating a disconnect between distance metrics and actual task performance [5]
- NSG, a state-of-the-art vector retrieval algorithm, shows absolute advantages in distance-metric recall but underperforms RaBitQ on downstream semantic tasks [5]
- Different metric spaces can lead to vastly different outcomes in downstream tasks, highlighting the importance of metric selection in vector retrieval [6]

Group 3: Information Loss and Model Limitations
- An information-loss funnel model is proposed to illustrate how information is lost at each stage of the embedding process, leading to discrepancies in expected outcomes [7]
- The capacity of representation models directly affects embedding quality, with generalization errors and learning objectives impacting performance [10][11]
- Many models do not prioritize learning a good metric space, which can lead to significant information loss during the embedding process [13]

Group 4: Metric and Algorithm Selection
- The choice of metric (Euclidean vs. inner product) can have a substantial impact on results, especially when using generative representation models [15]
- Vector retrieval methods, categorized into space partitioning and graph-based indexing, perform differently depending on data distribution [17]
- The IceBerg benchmark reveals a reshuffling of vector retrieval algorithm rankings, demonstrating that HNSW is not always the top performer in downstream tasks [18]

Group 5: Automation and Future Directions
- IceBerg provides an automated algorithm-selection tool that helps users choose the right method without extensive background knowledge [21]
- Statistical indicators can reveal the affinity of embeddings to metrics and algorithms, facilitating automated decision-making [23]
- The research team calls for future vector retrieval studies to focus on task-metric compatibility and the development of unified vector retrieval algorithms [25]
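The metric-selection point above can be shown with a two-line toy example: the nearest neighbour under Euclidean distance and under inner product can be different vectors, so recall measured in one metric says little about a downstream task scored in the other. The data here is made up purely for illustration.

```python
# Toy demonstration that Euclidean NN and inner-product NN can disagree:
# a large-norm vector can win on inner product while losing on distance.

def euclidean_nn(query, db):
    """Nearest neighbour by squared Euclidean distance."""
    return min(db, key=lambda v: sum((q - x) ** 2 for q, x in zip(query, v)))

def inner_product_nn(query, db):
    """Nearest neighbour by maximum inner product."""
    return max(db, key=lambda v: sum(q * x for q, x in zip(query, v)))

query = (1.0, 0.0)
db = [(0.9, 0.1),    # closest in Euclidean distance
      (3.0, 0.0)]    # larger norm, so it wins on inner product
print(euclidean_nn(query, db))      # -> (0.9, 0.1)
print(inner_product_nn(query, db))  # -> (3.0, 0.0)
```

An index tuned for high recall in one metric can therefore rank results that a downstream task, implicitly scored in the other metric, considers wrong, which is exactly the evaluation gap IceBerg targets.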
Is 2,500 yuan a month expensive for a director-level AI digital employee?
量子位· 2025-12-25 11:51
Core Insights
- A profound transformation in corporate structure is occurring in Silicon Valley, where AI agents are evolving from mere tools to autonomous colleagues, significantly impacting the real estate industry [1]
- The shift from traditional software to AI-driven digital employee teams is redefining business processes and operational costs [2][3]

Group 1: AI in Real Estate
- The real estate sector, characterized by high capital intensity and complex decision-making, is becoming a breakthrough area for AI applications [3]
- Deep Intelligence has launched the "Real Estate AI-Ready" strategy, introducing a digital employee team that covers decision-making, marketing, and service scenarios [3][4]
- The digital employees can produce comprehensive market analysis reports with accuracy and efficiency comparable to senior analysts, at a fraction of the cost [3][11]

Group 2: Cost Efficiency and Workforce Transformation
- Traditional real estate marketing teams require 6-8 personnel with monthly costs exceeding 150,000 yuan, while digital employees can cover the same functions for around 2,500 yuan, reducing labor costs by over 90% [11]
- Future organizational structures will focus on maximizing AI efficiency, allowing human experts to concentrate on high-value tasks while digital employees handle standardized, time-consuming work [11]

Group 3: Unique Advantages of Specialized AI
- Unlike general AI models, Deep Intelligence's digital employees are designed to meet specific job requirements, performing complex tasks without the need for extensive training [12][13]
- The proprietary AI space developed by Deep Intelligence integrates industry knowledge, business processes, and private data, creating a robust operational foundation for real estate applications [13][16]

Group 4: Industry Trends and Future Outlook
- China's digital employee market is projected to reach 4.12 billion yuan in 2024, growing 85.3% year-on-year, indicating a strong trend toward AI integration across industries [19]
- Companies that leverage AI to enhance their intellectual capacity will have a competitive edge over those relying solely on human resources [20][21]
量子位 (QbitAI) is hiring editors and writers
量子位· 2025-12-25 11:51
Editorial team, from Aofeisi. 量子位 | WeChat official account QbitAI

The AI wave is still surging, but if you don't yet know how to take part in it... why not join 量子位? We are a content platform centered on tracking new developments in AI. After 8 years, we have top-tier influence, broad and widely recognized industry resources, and the best vantage point for observing and learning at this turning point of the era.

We are currently hiring in three directions, and hope you are (or can become) a content expert in one of them:
- AI industry: infrastructure-layer innovation, including chips, AI Infra, and cloud computing;
- AI finance: venture capital and earnings in the AI field, tracking capital flows along the industry chain;
- AI product: progress in AI applications and hardware endpoints.

All positions are full-time, based in Zhongguancun, Beijing. Roles at every seniority level are open; applicants are welcome to apply based on their background and experience.

Who we are hiring:
- Experienced hires: editor, lead writer, and editor-in-chief levels, matched to ability;
- Campus hires: fresh graduates; internships with conversion to full-time are accepted.

By joining us, you can:
- Stand at the crest of the AI wave: be the first to encounter the latest AI technologies and products, and build a complete AI knowledge system;
- Master new AI tools: apply new AI technologies and tools in your work to boost efficiency and creativity;
- Build personal influence: through wri ...
Demystifying the Agent deployment dilemma! 93% of enterprise projects stall in the last mile from POC to production | AWS's Chen Xiaojian @ MEET2026
量子位· 2025-12-25 06:08
Core Insights
- The true value of Agents lies not in their impressive demonstrations but in their ability to operate effectively in production environments. Data indicates that over 93% of enterprise Agent projects get stuck in the transition from Proof of Concept (POC) to production [1][17]

Group 1: Agent Development and Challenges
- A successful Agent requires three essential modules: the model (brain), code (logic), and tools (connecting to the physical world). The effective integration of these three components presents the greatest engineering challenge [7][9]
- The transition from POC to production is hindered primarily by data-quality discrepancies and a lack of engineering capabilities [7][17]
- The best time for model customization is during the foundational model training phase, much as humans learn languages more effectively at a young age [21][23]

Group 2: Engineering and Deployment Solutions
- To address deployment and production challenges, the company has introduced Amazon Bedrock AgentCore, a comprehensive toolbox designed to manage foundational infrastructure dynamically [20]
- Strands Agents simplifies the development process, achieving complex functionality with significantly less code [13][30]
- New features support TypeScript and edge-device deployment, expanding the applicability of Agents across platforms [15][30]

Group 3: Automation and Workflow Integration
- Large models open new possibilities for workflow automation; Amazon Nova Act integrates large-model capabilities with engineering functionality for end-to-end automation [29]
- Automation success rates with Nova Act can exceed 80%, outperforming traditional RPA tools [29]

Group 4: Case Studies and Industry Impact
- Blue Origin has built over 2,700 internal Agents using Bedrock and Strands Agents, improving delivery efficiency by 75% and design quality by 40% [30]
- Sony's internal "Data Ocean" platform serves over 57,000 internal users and processes up to 150,000 inference requests daily, with model fine-tuning improving compliance-review efficiency 100-fold [30]
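The three-module structure described above (model, code, tools) can be sketched as a minimal control loop. The model here is a hard-coded stub that emits canned tool calls; a production system, such as one built on Bedrock AgentCore or Strands Agents, would call an LLM instead. All names, the toy tool, and the transcript format are hypothetical.

```python
# Minimal agent loop: "code" drives the loop, the "model" picks actions,
# and "tools" touch the outside world.

def stub_model(history):
    """Stand-in for the LLM: decide the next action from the transcript."""
    if not any(m.startswith("tool:") for m in history):
        return {"action": "call_tool", "tool": "lookup", "arg": "Q3 revenue"}
    return {"action": "finish", "answer": history[-1].removeprefix("tool:")}

TOOLS = {"lookup": lambda arg: f"{arg} = 1.2B"}   # toy tool registry

def run_agent(task, max_steps=5):
    history = [f"user:{task}"]
    for _ in range(max_steps):                     # the "code" module
        step = stub_model(history)                 # the "model" module
        if step["action"] == "finish":
            return step["answer"]
        result = TOOLS[step["tool"]](step["arg"])  # the "tools" module
        history.append(f"tool:{result}")
    return "gave up"

print(run_agent("What was Q3 revenue?"))  # -> "Q3 revenue = 1.2B"
```

The POC-to-production gap the talk describes lives almost entirely outside this loop: authentication for the tools, observability over the transcript, and retries when the model or a tool fails, which is the infrastructure AgentCore is pitched as handling.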
ByteDance Seed releases its strongest math model: a "draft-sketching" trick turns IMO silver into gold
量子位· 2025-12-25 06:08
Core Insights
- ByteDance's latest mathematical reasoning model, Seed Prover 1.5, achieved a gold-medal score at IMO 2025 by solving five problems in 16.5 hours, scoring 35 points and meeting this year's gold-medal threshold [1][3]
- This performance matches that of Google's Gemini, which was certified as an IMO gold medalist in July [3]
- The model has not been open-sourced yet, but a technical report has been released, highlighting the performance improvements brought by large-scale reinforcement learning [5][19]

Model Performance
- Seed Prover 1.5 significantly outperforms its predecessor, which took three days to solve four of six problems for a silver medal [3]
- The model also set new state-of-the-art (SOTA) records on the North American undergraduate mathematics competition, the Putnam [4]

Technical Innovations
- A new architecture called Agentic Prover lets the model reason in formal mathematics rather than natural language, ensuring more reliable results [10][12]
- A Sketch Model simulates how human mathematicians draft proofs, breaking complex problems into manageable sub-goals [22][23]
- A multi-agent collaborative system enhances efficiency and success rates by recursively calling the Sketch Model on difficult lemmas [25][28]

Reinforcement Learning and Efficiency
- The proof success rate improved from 50% to nearly 90% as reinforcement learning training steps increased [19]
- In comparative tests, Seed Prover 1.5 required significantly less computational resources while outperforming previous models on high-difficulty datasets [19][20]

Conclusion
- The research comes from ByteDance's Seed AI4Math team, showcasing advances in mathematical reasoning through innovative model architectures and training methodologies [30]
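The sketch-then-prove loop described above can be caricatured in a few lines: a "sketch" step splits a goal into sub-goals, and sub-goals that cannot be closed directly are recursively re-sketched. This is a toy stand-in; the real Seed Prover 1.5 operates on formal (Lean-style) statements with an LLM proposing the sketches, and every name below is hypothetical.

```python
# Toy recursive sketch-and-prove: decompose a goal, close leaves, recurse.

KNOWN_LEMMAS = {"lemma A", "lemma B"}   # pretend library of provable facts

def sketch(goal):
    """Stand-in Sketch Model: split a compound goal into sub-goals."""
    return goal.split(" and ") if " and " in goal else [goal]

def prove_directly(goal):
    """Stand-in leaf prover: only known lemmas succeed immediately."""
    return goal in KNOWN_LEMMAS

def prove(goal, depth=0, max_depth=3):
    if prove_directly(goal):
        return True
    if depth >= max_depth:
        return False                     # budget exhausted
    subgoals = sketch(goal)
    if subgoals == [goal]:               # sketching made no progress
        return False
    return all(prove(g, depth + 1, max_depth) for g in subgoals)

print(prove("lemma A and lemma B"))  # True: both sub-goals close
print(prove("open conjecture"))      # False: no decomposition, no proof
```

The key property mirrored here is that the overall proof succeeds only if every sub-goal closes, which is why the report emphasizes raising the per-lemma success rate via reinforcement learning.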
LeCun and Hassabis clash in a clash of titans, and Musk picks a side
量子位· 2025-12-25 00:27
Core Viewpoint
- The article discusses a heated debate between AI experts Yann LeCun and Demis Hassabis regarding the nature of intelligence, focusing on the concept of "general intelligence" and its implications for artificial intelligence development [3][8][30]

Group 1: Debate Overview
- Yann LeCun argues that the idea of "general intelligence" is nonsensical, asserting that human intelligence is highly specialized rather than universal [9][13]
- Demis Hassabis counters LeCun's claims, stating that human brains exhibit significant generality and complexity, and that general intelligence is a valid concept [17][22]
- The debate has attracted considerable attention, with notable figures like Elon Musk publicly supporting Hassabis [5][7]

Group 2: Key Arguments
- LeCun emphasizes that human intelligence was shaped by evolutionary pressure to adapt to specific environments, producing specialized skills rather than general capabilities [14][36]
- Hassabis argues that the brain's complexity allows for general intelligence, and that with sufficient resources any computable task can be learned, akin to a Turing machine [18][24]
- Both experts agree on the importance of world models in AI development, but they differ in how they interpret and apply the concept [50][42]

Group 3: Future Directions
- LeCun plans to establish a new company, Advanced Machine Intelligence Labs, focusing on world models, with a target valuation of €3 billion (approximately ¥24.7 billion) [43]
- Hassabis highlights that Google DeepMind is also prioritizing world models, emphasizing the understanding of causal relationships and interactions within the world [47][49]
- While the two experts may appear to be discussing different aspects of intelligence, they are ultimately addressing the same fundamental question of how to achieve artificial general intelligence (AGI) [41][42]
Jensen Huang spends $20 billion to take away the "core TPU team"
量子位· 2025-12-25 00:27
Mengchen, from Aofeisi. 量子位 | WeChat official account QbitAI

Jensen Huang didn't take Christmas Eve off: news of a record-setting $20 billion chip deal rocked Silicon Valley.

NVIDIA announced a $20 billion cash deal with AI chip startup Groq. The news immediately set off a market frenzy, since this would be the largest transaction in NVIDIA's history, far exceeding the $7 billion acquisition of Mellanox in 2019.

But just hours later, the picture changed. NVIDIA and Groq both issued statements clarifying the nature of the deal: it is not an acquisition. Groq wrote on its official blog:

"We have signed a non-exclusive technology licensing agreement with NVIDIA."

NVIDIA was equally explicit:

"We are not acquiring the company Groq; we are only licensing its technology and integrating Groq's products into future products."

It seems Huang has also learned the "acqui-hire" play: pay heavily to strip a company of its talent and core assets while avoiding antitrust triggers. So what exactly did the $20 billion buy?

The "acqui-hire" wave sweeping Silicon Valley

The answer: a technology license, plus an entire core team.

The first to break the news was Groq's major investor Disruptive, whose CEO revealed that NVIDIA had agreed to acquire Groq's assets for $20 billion in cash, with the deal moving very quickly. Just this September, Groq had closed a $750 million round at a valuation of about $6.9 billion.

Since Groq's founding in 2016 ...
Cracking the memory problem in long-video generation: HKU and Kuaishou Kling's MemFlow designs dynamic adaptive long-term memory, ending rapid forgetting and narrative confusion
量子位· 2025-12-25 00:27
Core Viewpoint
- The article discusses the challenges of AI-generated long videos, particularly issues with narrative coherence and character consistency, and introduces MemFlow, a new memory mechanism designed to address these problems [1][2][3]

Group 1: Challenges in AI Video Generation
- AI-generated long videos often suffer from narrative inconsistencies, such as characters appearing different after a scene change or the AI confusing multiple characters [1]
- Traditional models use a "chunk generation" strategy, which makes it difficult to maintain continuity across video segments [4][6]
- Existing memory strategies have significant limitations, including remembering only the first segment, fixed-size memory compression, and independent processing of segments, all of which contribute to narrative disjointedness [5][6]

Group 2: Introduction of MemFlow
- MemFlow is a novel adaptive memory mechanism that enhances AI's long-term memory and narrative coherence, aiming to resolve the aforementioned issues [3][7]
- It establishes a dynamic memory system that maintains visual consistency and narrative clarity, even in complex scenarios with multiple characters [8][9]

Group 3: Mechanisms of MemFlow
- MemFlow employs two core designs, Narrative Adaptive Memory (NAM) and Sparse Memory Activation (SMA), which allow efficient retrieval of relevant visual memories while reducing computational load [11]
- NAM intelligently retrieves the most relevant memories based on current prompts, while SMA activates only the most critical information, enhancing both speed and quality of video generation [11]

Group 4: Performance Evaluation
- MemFlow achieved a quality-consistency score of 85.02 and an aesthetic score of 61.07, outperforming other models on long-video generation tasks [13][14]
- The model maintained high semantic consistency throughout the video, particularly in the later segments, which is crucial for narrative coherence [15][17]
- For subject and background consistency, MemFlow scored 98.01 and 96.70 respectively, showcasing its ability to maintain visual unity amid complex narrative changes [18][17]

Group 5: Visual Comparisons and Efficiency
- Visual comparisons highlighted MemFlow's superiority in maintaining character consistency and avoiding narrative confusion, where other models struggled with character drift and inconsistencies [19][21][23]
- MemFlow operates efficiently on a single NVIDIA H100, achieving a real-time inference speed of 18.7 FPS with minimal performance loss compared to baseline models [25]

Group 6: Future Implications
- MemFlow represents a significant advance in AI video generation, transitioning from simple video creation to complex narrative storytelling [26][27]
- The work points toward AI systems capable of understanding, remembering, and coherently narrating stories, marking the dawn of a new era in AI video creation [28]
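The two MemFlow mechanisms named above can be sketched with a toy retriever: pick the stored segments most relevant to the current prompt (the NAM idea) and let only the top-k of them into the generation step (the SMA idea). Embeddings here are plain keyword-count vectors and all names are hypothetical; the real system retrieves in a learned latent space inside the diffusion model.

```python
# Toy NAM/SMA sketch: score memory entries against the prompt, keep top-k.

def embed(text, vocab):
    """Hypothetical stand-in embedding: keyword counts over a fixed vocab."""
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve_memory(prompt, memory_bank, vocab, k=2):
    """Return the k stored segments most similar to the current prompt."""
    q = embed(prompt, vocab)
    ranked = sorted(memory_bank, key=lambda m: dot(q, embed(m, vocab)),
                    reverse=True)
    return ranked[:k]     # sparse activation: only top-k enter attention

vocab = ["knight", "dragon", "castle", "market"]
memory = ["the knight enters the castle",
          "a dragon circles the castle",
          "merchants argue at the market"]
# Both castle scenes outrank the irrelevant market scene for this prompt.
print(retrieve_memory("the knight fights the dragon", memory, vocab, k=2))
```

Capping attention at k entries is what keeps cost roughly constant as the video grows, which is the efficiency claim behind the 18.7 FPS figure.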
After topping the open-source charts with a coding model, Zhipu's GLM team faced three hours of grilling
量子位· 2025-12-24 12:46
Core Viewpoint
- The article discusses the release of the new model GLM-4.7 by Z.ai, which has surpassed GPT-5.2 in the WebDev ranking, marking a significant achievement in the open-source large model space [1][2]

Model Performance and Optimization
- The improvements in GLM-4.7 are primarily attributed to advances made during the post-training phase, particularly in supervised fine-tuning (SFT) and reinforcement learning (RL) [8]
- The design of GLM-4.7 accounts for hardware limitations, aiming for high performance on consumer-grade graphics cards while maintaining reasoning capability close to that of 30-billion-parameter models [9]
- A complex pre-training data pipeline was established, involving multi-source data collection and rigorous cleaning to enhance model quality [11]

Model Application Scenarios and Functions
- GLM-4.7 shows significant improvements on programming tasks, with optimizations for coding languages like Python and JavaScript as well as lesser-known languages [16]
- The model has enhanced creative-writing capabilities, producing more nuanced and engaging text, and introduces a feature called "Interleaved Thinking" to improve decision-making in complex tasks [21]

Technical Methods and Tools
- The Slime framework aims to address the inefficiency and stability issues in large-model reinforcement learning, providing developers with tools to replicate high alignment effects [27]
- The team emphasizes transparency in its data collection and processing pipeline, which has earned respect within the open-source community [28]

Future Commitments and Market Position
- Z.ai has committed to maintaining its open-source ethos even after potential IPO plans, recognizing the importance of the open-source ecosystem to its growth [46]
- The competitive pricing of GLM-4.7 has attracted attention, with users noting its affordability compared to models like Codex and Claude Code [47]