QbitAI (量子位)
A video version of Deep Research? Browse first, localize next, then read closely: accuracy rises while token consumption drops 58.3%
QbitAI · 2026-01-22 05:39
Core Insights
- The article discusses the evolution of AI research, particularly autonomous agents that actively retrieve information rather than passively receive it [1]
- It highlights a significant gap in current AI capabilities: video processing, where existing agents struggle to analyze video content effectively [2][4]

Video Processing Challenges
- Current AI agents either excel at text comprehension or can only perform limited question answering on short video clips, and they fail to handle the dense information in videos [4]
- The article identifies two main approaches to video processing: Direct Visual Inference, which is computationally expensive and suffers from context explosion, and Text Summarization, which loses critical visual details [8]

Proposed Solution: Video-Browser
- The research team introduces Video-Browser, which aims to improve video browsing by mimicking human search behavior [5][6]
- Video-Browser uses a Pyramidal Perception architecture that processes video data in tiers to balance efficiency and accuracy [10][11]

Core Components of Video-Browser
- Video-Browser consists of three main components: Planner, Watcher, and Analyst [13]
- The Watcher uses a three-stage pyramid mechanism:
  - Stage I: Semantic Filter, which quickly eliminates irrelevant videos by analyzing metadata [14]
  - Stage II: Sparse Localization, which identifies candidate answer time windows using subtitles and sparse frame sampling [15]
  - Stage III: Zoom-in, where high-frame-rate decoding and detailed visual reasoning take place within the identified time windows [16]

Benchmark Testing: Video-BrowseComp
- The research team built the Video-BrowseComp benchmark to evaluate agents' true video-search capabilities, emphasizing that agents must actively seek out information [17]
- The benchmark has three difficulty levels, ranging from explicit retrieval to multi-source reasoning [18][20]

Experimental Results
- Video-Browser achieved 26.19% accuracy, a 37.5% relative improvement over existing models [21]
- The architecture reduced token consumption by 58.3%, a significant efficiency gain [22]

Case Study
- A case study shows Video-Browser identifying a specific detail, the color of a pen in a film, that traditional methods failed to capture [24][26]

Conclusion and Future Directions
- Video-Browser is a significant step toward effective open-web video browsing, addressing the trade-off between accuracy and cost in video search [27]
- The research team has open-sourced all code, data, and benchmarks to encourage further research in the community [28][29]
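The Watcher's three-stage pyramid can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration, not the team's released code: the `Video` class, the stop-word list, word overlap as a stand-in for semantic filtering, subtitle keyword hits as the only localization signal (Stage II's sparse frame sampling is omitted), and a stubbed Stage III that only lists the timestamps it would decode at a hypothetical 8 fps. What it does show is why the pyramid saves compute: only frames inside the localized windows reach the expensive stage.

```python
from dataclasses import dataclass, field

# Hypothetical stop-word list so that words like "the" do not count as matches.
STOPWORDS = {"a", "an", "the", "in", "of", "is", "what", "which", "does"}


@dataclass
class Video:
    video_id: str
    title: str
    description: str
    duration_sec: float
    # Subtitle segments as (start_sec, end_sec, text).
    subtitles: list = field(default_factory=list)


def tokenize(text: str) -> set:
    """Lower-cased content words with basic punctuation stripped."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def semantic_filter(videos, query, min_overlap=1):
    """Stage I: keep only videos whose metadata shares enough words with the query."""
    q = tokenize(query)
    return [v for v in videos
            if len(q & tokenize(f"{v.title} {v.description}")) >= min_overlap]


def sparse_localize(video, query, pad_sec=5.0):
    """Stage II: turn subtitle hits into padded candidate time windows."""
    q = tokenize(query)
    windows = []
    for start, end, text in video.subtitles:
        if q & tokenize(text):
            windows.append((max(0.0, start - pad_sec),
                            min(video.duration_sec, end + pad_sec)))
    return windows


def zoom_in(window, fps=8.0):
    """Stage III: timestamps that would be decoded densely (decoding itself is stubbed)."""
    start, end = window
    return [start + i / fps for i in range(int((end - start) * fps))]


def watch(videos, query, fps=8.0):
    """Run all three stages; return the dense timestamps per surviving video."""
    frames = {}
    for v in semantic_filter(videos, query):
        for w in sparse_localize(v, query):
            frames.setdefault(v.video_id, []).extend(zoom_in(w, fps))
    return frames


v1 = Video("v1", "Film review", "Scenes from the film", 60.0,
           [(10.0, 12.0, "He picks up the red pen"), (50.0, 52.0, "Credits roll")])
v2 = Video("v2", "Cooking show", "Pasta recipe", 60.0, [(5.0, 7.0, "Boil the water")])
query = "What color is the pen in the film?"

result = watch([v1, v2], query)
# Baseline for comparison: decoding both videos end to end at 8 fps.
dense_frames = sum(int(v.duration_sec * 8.0) for v in (v1, v2))
print(len(result["v1"]), "of", dense_frames, "frames decoded densely")
```

With the two toy videos, only `v1` survives Stage I, one subtitle hit yields a 5 to 17 second window, and 96 of the 960 frames that a naive 8 fps pass over both videos would decode reach Stage III. A real system would call models at each stage rather than match strings; the sketch only illustrates the cascade, in which cheap checks gate expensive ones.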
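Musk joins the talent hunt in person! xAI forms a "talent strike team", with geek-style HR roles paying up to 1.68 million RMB a year
QbitAI · 2026-01-22 02:12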
Core Viewpoint
- xAI is forming an "AI Talent Strike Team" that will report directly to Elon Musk and aims to recruit top talent quickly through unconventional methods rather than traditional HR practices [2][17].

Group 1: Recruitment Strategy
- The "AI Talent Strike Team" will work closely with xAI's engineering and recruiting teams to identify and attract top talent with an engineering mindset [3][9].
- The position is titled "Talent Engineer" and targets candidates with a technical background rather than traditional HR experience [6][8].
- xAI believes conventional talent markets do not yield top-tier candidates, so it relies on referrals, offline events, competitions, and other creative channels [11][12].

Group 2: Candidate Requirements
- Candidates should have strong technical intuition and experience in talent-dense environments; those who have previously recruited top talent are preferred [12][14].
- Interpersonal skills are crucial: candidates must enjoy working with, and be accepted by, other high-caliber people [13].
- Successful candidates will receive annual salaries of $120,000 to $240,000 (approximately 840,000 to 1,680,000 RMB), plus equity [16][32].

Group 3: Company Expansion and Context
- xAI is expanding: it recently launched its second supercomputer cluster, Colossus 2, billed as the world's first GW-scale supercomputer [27][28].
- The company has completed a $20 billion Series E funding round, giving it strong financial backing for growth [32][33].
- xAI is hiring across many departments, including data center operations, engineering, finance, and foundation models [25][24].

Group 4: Competitive Landscape
- Musk's recruiting approach is expected to intensify the competition for tech talent in Silicon Valley, reminiscent of Mark Zuckerberg's earlier recruiting push at Meta [49][51].
- Although xAI's salaries may look low next to competitors such as Meta, Musk emphasizes performance and mission over pay, which can attract people willing to accept less money for the chance to work with him [34][36][39].
Giving robots instinctive reactions! Tsinghua open-sources one codebase that delivers both parkour and outdoor hiking
QbitAI · 2026-01-22 02:12
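Contributed by the Tsinghua MARSLab team | QbitAI (WeChat official account: QbitAI)

A humanoid robot running at high speed (2.5 m/s), clearing obstacles and climbing over taller ones

Core positioning: built for research on "instinct-level" motor intelligence

"Instinct-level" intelligence in a humanoid robot means being able, like a human, to handle complex environments autonomously through real-time perception rather than preset trajectories: for example, automatically adjusting its jumping posture when it sees an obstacle, or instinctively keeping its balance when it steps on the edge of a stair.

For a long time, however, this research has faced two major pain points. The first is a "split between perception and motion": a robot either perceives terrain but can only walk simply, or performs demanding moves while "blind". The second is "non-general toolchains": research on highly dynamic motions and on outdoor locomotion each requires its own environment, so adaptation costs are very high.

How can a robot have both "instinctive reactions" and complex motor skills?

Project-Instinct, a framework jointly released by Tsinghua University's Institute for Interdisciplinary Information Sciences and the Shanghai Qi Zhi Institute, offers a new answer.

It is designed specifically for research on "instinct-level" humanoid motor intelligence. As a modular, flexibly configurable full-pipeline toolkit, it lets researchers stop reinventing the wheel and focus on breakthroughs in core techniques.

Project-Instinct aims to break the deadlock with a "unified framework + flexible configuration": the toolkit covers the full pipeline from algorithm design and environment construction to real-robot deployment, all built around "instinct-level" intelligence. It supports precise training of highly dynamic, multi-contact motions and can also adapt to outdoor ...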
Qualcomm invests and Lei Jun buys in! Shanghai has just produced an 18.3-billion smartphone ODM giant
QbitAI · 2026-01-22 02:12
Core Viewpoint
- Longqi Technology, a leading global smartphone ODM, has listed on the Hong Kong Stock Exchange as the "first consumer electronics ODM stock" in Hong Kong; it opened at HKD 35 per share, about 12.9% above the issue price [1][4][7].

Group 1: Company Overview
- Longqi Technology holds about one-third of the global smartphone ODM market and serves major brands such as Xiaomi, Samsung, Lenovo, Honor, OPPO, and vivo [3][22].
- The company offers an end-to-end solution set covering product design, hardware innovation, software platform development, lean manufacturing, supply chain integration, and quality control [11].
- Longqi's products include smartphones, AI PCs, automotive electronics, tablets, smartwatches, and smart glasses, organized under a "1+2+X" framework aimed at expanding production capacity and strengthening R&D [11][12].

Group 2: Financial Performance
- Longqi's revenue for 2022 to 2024 was CNY 29.34 billion, CNY 27.19 billion, and CNY 46.38 billion; revenue declined 10.3% in the first nine months of 2025 [27][28].
- Smartphones are the main revenue source, contributing 82.7%, 80.3%, 77.9%, and 69.3% of total revenue from 2022 through 2025 [32].
- Gross margins for 2022 to 2024 were 8.1%, 9.5%, and 5.8%, recovering to 8.3% in the first nine months of 2025 thanks to strategic adjustments and better project quality [36][38].

Group 3: Market Position and Client Base
- Longqi is the world's largest smartphone ODM with a 32.6% market share, and ranks second in the consumer electronics ODM sector with a 22.4% share [24][26].
- The company has long-term partnerships with eight of the top ten smartphone brands, with an average relationship of more than five years [15][16].
- Xiaomi is Longqi's largest client, accounting for 45.5%, 42.4%, 37.2%, and 28.6% of total revenue from 2022 through 2025 [34][35].

Group 4: R&D and Future Prospects
- Longqi has an R&D team of about 5,200 people, with R&D spending of CNY 1.50 billion, CNY 1.69 billion, CNY 2.08 billion, and CNY 1.95 billion from 2022 through 2025 [41].
- The company is expanding into AI and smart manufacturing, with significant progress in AIoT and new product launches including smart glasses and AI PCs [21][19].
- Longqi's cash and cash equivalents reached CNY 6.85 billion at the end of Q3 2025, indicating strong liquidity [42].
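The revenue trend can be made explicit with a year-over-year growth calculation. Below is a minimal Python sketch that uses the three full-year revenue figures as listed in the summary; because growth is a ratio, the currency unit cancels out. The `yoy_growth` helper is illustrative only.

```python
# Full-year revenue as listed in the summary; the unit cancels out in growth ratios.
revenue = {2022: 293.4, 2023: 271.9, 2024: 463.8}


def yoy_growth(series):
    """Percent change of each year versus the previous year in the series."""
    years = sorted(series)
    return {year: (series[year] / series[prev] - 1) * 100
            for prev, year in zip(years, years[1:])}


growth = yoy_growth(revenue)
for year, pct in growth.items():
    print(f"{year}: {pct:+.1f}%")
```

This prints a drop of about 7.3% for 2023 followed by growth of about 70.6% for 2024, a useful backdrop for the 10.3% decline reported for the first nine months of 2025.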
xAI engineer got carried away on a podcast, and Musk fired him
QbitAI · 2026-01-21 10:00
Core Insights
- The article covers a recent podcast in which xAI engineer Sulaiman Ghori shared significant internal details about the company and its MacroHard project, which aims to build a "human simulator" for digital tasks [1][15][19].

Group 1: MacroHard Project
- MacroHard is positioned as a digital-world "Optimus", designed to automate tasks that require keyboard and mouse input; it is currently in internal testing [19][20].
- The project takes an unusual approach: it uses small models rather than large ones, prioritizing speed and iteration over scaling [6][30].
- MacroHard now runs at eight times human speed, and its performance and generalization are still improving [34][35].

Group 2: Deployment Strategy
- xAI is considering renting the idle computing power of roughly 4 million Tesla vehicles in North America to deploy MacroHard [7][47].
- The strategy relies on existing Tesla infrastructure: the cars carry powerful onboard computers and batteries, which makes them well suited to computational tasks [43][44].

Group 3: Company Culture and Operations
- xAI has a flat organizational structure in which most employees are engineers and management oversight is minimal, enabling fast decisions and execution [69][72].
- The company emphasizes extreme speed in reasoning, training, and execution, fostering a culture in which employees are expected to work without formal deadlines [66][86].
- Employees often work long hours, and some even sleep in the office to meet project demands, reflecting the intensity of the work environment [88][89].

Group 4: Internal Dynamics and Reactions
- Ghori's candid disclosures sparked discussion online, with some speculating that Elon Musk had allowed him to share these details as a form of public relations [92][97].
- The podcast was well received, with many viewers excited by the insights despite concerns about the implications of such disclosures [91][102].
Creator of Node.js: hand-written code is dead
QbitAI · 2026-01-21 10:00
Core Viewpoint
- The era of human-written code is ending as AI programming tools take over more and more coding tasks, fundamentally changing the programming landscape [1][28].

Group 1: Influential Figures and Their Statements
- Ryan Dahl, the creator of Node.js, declared that the era of humans writing code is over; the post drew more than four million views [2][4].
- Salvatore Sanfilippo, the creator of Redis, echoed this view, saying that AI has permanently changed programming [7][8].
- Linus Torvalds, initially critical of AI-generated code, has shifted his stance: he acknowledges that AI is effective at coding while stressing that programmers will still be needed for maintenance and oversight [30][34].

Group 2: AI Programming Tools and Their Impact
- AI programming tools such as GitHub Copilot (originally built on OpenAI Codex) have sped up development by more than 50% [15].
- Companies are increasingly adopting AI development tools; ByteDance's TRAE generated 100 billion lines of code in 2025, equivalent to the output of 3 million programmers working continuously for a year [22][23].
- A Stack Overflow report found that 84% of developers use AI tools, and 69% believe the tools make them more productive [24].

Group 3: Future Trends and Predictions
- Gartner predicts that by 2030 more than 80% of enterprises will deeply integrate AI into their coding work [26].
- Demand for programmers is changing, with companies now seeking candidates who are proficient with AI programming tools [28].
- The focus of programming is shifting from syntax to intent, signaling a transformation in how coding is done in the AI era [12].
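The TRAE comparison implies a specific output rate per programmer. The arithmetic below makes that rate explicit; the 365-day year is an assumption, since the summary only says "continuously for a year".

```python
# Figures as stated in the summary for ByteDance's TRAE in 2025.
LINES_GENERATED = 100e9         # 100 billion lines of code
EQUIVALENT_PROGRAMMERS = 3e6    # "3 million programmers working for a year"

# Assumption (not from the source): "continuously for a year" read as 365 days.
DAYS_PER_YEAR = 365

lines_per_programmer_year = LINES_GENERATED / EQUIVALENT_PROGRAMMERS
lines_per_programmer_day = lines_per_programmer_year / DAYS_PER_YEAR

print(f"{lines_per_programmer_year:,.0f} lines per programmer-year")
print(f"{lines_per_programmer_day:,.0f} lines per programmer-day")
```

This prints 33,333 lines per programmer-year and 91 lines per programmer-day, which is the per-person baseline the "3 million programmers" comparison takes for granted.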
Breaking! xAI co-founder Greg Yang leaves after an overwork-related illness; working for Musk is enormously stressful
QbitAI · 2026-01-21 07:47
Core Viewpoint
- The article examines the high-pressure work culture at xAI, highlighted by co-founder Greg Yang's recent departure over health problems stemming from long hours and stress, which has raised concerns about work-life balance in Silicon Valley [2][4][15].

Group 1: Departure of Key Personnel
- Greg Yang, an xAI co-founder, announced that he is leaving because of health problems related to Lyme disease, which he attributes to prolonged high-intensity work [2][4][5].
- His case is not isolated: other notable departures from xAI include Igor Babuschkin, Christian Szegedy, and Kyle Kosic, pointing to high turnover among executives [26][33].
- Turnover among Musk's direct reports is reported at 44%, far above the 9% at other tech giants such as Meta and Amazon [35][36].

Group 2: Work Culture and Employee Well-being
- Yang's departure has prompted discussion of the damaging effects of xAI's work culture, with calls to reassess employee health and workload balance [15][16].
- Former employees have criticized the lack of time and mental space to produce quality work, suggesting that a healthier culture could lead to better outcomes [17].
- Musk's extreme focus on speed and performance is illustrated by cases in which he rewarded employees for completing tasks quickly, an approach that may not be sustainable for all workers [20][21][22].

Group 3: Management Style and Organizational Structure
- Musk's management style combines strict expectations with intense direct pressure on employees, contributing to the high turnover [37][39].
- xAI's organizational structure is notably flat, with only three levels; this allows direct communication but also puts heavy pressure on core team members [38].
- Some argue that this demanding environment ultimately filters out less capable employees, leaving a highly efficient core team [42].
Did Microsoft almost buy OpenAI whole? It came that close!
QbitAI · 2026-01-21 07:47
Core Viewpoint
- The article revisits the dramatic internal conflict at OpenAI, highlighting Microsoft's significant involvement, including a potential acquisition and CEO Sam Altman's return after his brief dismissal [9][10][22].

Group 1: Microsoft's Involvement
- Microsoft was prepared to set up a new subsidiary named "Microsoft RAI Company" to support OpenAI and its employees, offering a $25 billion guarantee to retain talent during the turmoil [5][14].
- The urgency of Microsoft's response stemmed from concern that OpenAI employees might leave for competitors such as Google or Amazon [15][16].
- Microsoft played a crucial role in Altman's return by backing a collective employee letter demanding board changes, which ultimately led to his reinstatement [20][22].

Group 2: OpenAI's Strategic Moves
- After his return, Altman signed a $38 billion infrastructure deal with Amazon, signaling OpenAI's desire to reduce its dependence on Microsoft [49].
- OpenAI's restructuring aimed to turn a purely research-focused entity into a more commercially viable organization, valued at $500 billion [47].
- The company is moving to a distributed computing model to avoid relying on any single cloud provider, strengthening its negotiating position [62].

Group 3: Historical Context
- OpenAI initially partnered with Amazon, receiving $50 million in computing resources in exchange for a $10 million investment, before eventually aligning with Microsoft [27][31].
- Microsoft was hesitant about OpenAI's commercial viability in earlier years, with internal doubts about whether the investment would pay off [7][35].
- The relationship changed significantly after ChatGPT launched in 2022, a turning point for both companies [42].
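A Chinese team publishes a medical AI standard in a Nature portfolio journal for the first time; Future Doctor's MedGPT takes the global top spot
QbitAI · 2026-01-21 04:09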
Core Viewpoint
- The article introduces the Clinical Safety-Effectiveness Dual-Track Benchmark (CSEDB), a standardized framework for evaluating the real clinical capabilities of medical AI models, developed by the Chinese medical AI company Future Doctor together with 32 leading clinical experts from top Chinese medical institutions [1][4][14].

Group 1: CSEDB Framework
- CSEDB provides a systematic framework for assessing the clinical capabilities of medical AI, evaluating safety and effectiveness on separate tracks [4][15].
- The framework includes a risk-weighting mechanism that assigns each evaluation metric a weight from 1 to 5 according to its potential clinical risk [16][17].
- CSEDB comprises 2,069 open-ended questions across 26 clinical specialties, simulating real clinical scenarios and emphasizing model performance in continuous decision-making [20][22].

Group 2: MedGPT Performance
- MedGPT, developed by Future Doctor, ranked first in overall score, safety, and effectiveness among the major global models evaluated with CSEDB [27].
- Notably, MedGPT was the only model whose safety score exceeded its effectiveness score, indicating a marked advantage in clinical safety [28].
- The model uses a dual-system architecture, with a "fast system" for routine scenarios and a "slow system" for complex cases, balancing speed and thoroughness in clinical decision-making [31][36].

Group 3: Industry Implications
- The research signals a shift in the medical AI industry from merely demonstrating capabilities to defining responsibilities and ensuring safety in clinical use [8][9].
- Competition in medical AI is intensifying, with major players such as Google and OpenAI investing heavily in the sector [9].
- The article argues that the long-term clinical value of medical AI matters more than short-term technical advantages, framing the competition as a marathon rather than a sprint [54][56].
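The dual-track, risk-weighted scoring described for CSEDB can be sketched as a weighted mean computed separately for each track. This is a minimal Python illustration under stated assumptions: the metric names, the per-metric scores in [0, 1], and the weighted-mean aggregation are all hypothetical; the summary specifies only the separate safety and effectiveness tracks and the 1 to 5 risk weights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    track: str   # "safety" or "effectiveness"
    weight: int  # 1 = low clinical risk ... 5 = high clinical risk


def track_scores(metrics, results):
    """Risk-weighted mean score per track.

    results maps metric name -> score in [0, 1] for one model.
    """
    totals, weight_sums = {}, {}
    for m in metrics:
        if not 1 <= m.weight <= 5:
            raise ValueError(f"weight for {m.name} must be in 1..5")
        totals[m.track] = totals.get(m.track, 0.0) + m.weight * results[m.name]
        weight_sums[m.track] = weight_sums.get(m.track, 0) + m.weight
    return {track: totals[track] / weight_sums[track] for track in totals}


# Hypothetical metrics and scores, for illustration only.
METRICS = [
    Metric("flags_drug_interaction", "safety", 5),
    Metric("warns_on_contraindication", "safety", 4),
    Metric("follows_guideline", "effectiveness", 3),
    Metric("plans_follow_up", "effectiveness", 1),
]
model_results = {
    "flags_drug_interaction": 1.0,
    "warns_on_contraindication": 0.5,
    "follows_guideline": 1.0,
    "plans_follow_up": 0.0,
}

scores = track_scores(METRICS, model_results)
print(scores)
```

Here safety comes out at 7/9 (about 0.78) and effectiveness at 0.75. Within a track, a miss on a weight-5 item lowers the score five times as much as the same miss on a weight-1 item, and keeping the tracks separate is what makes a result like "safety higher than effectiveness", as reported for MedGPT, expressible at all.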
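Musk in a rare climbdown: open-sources the 𝕏 recommendation algorithm, mocks it as "lousy" himself, but promises monthly updates
QbitAI · 2026-01-21 04:09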
Core Viewpoint
- Elon Musk's X has open-sourced its recommendation algorithm on GitHub, making the full system, which is driven primarily by AI models, publicly visible [1][2]

Group 1: Algorithm Transparency and Community Reaction
- The open-source release generated considerable excitement in the community, with many praising the system's transparency [2]
- Musk acknowledged the algorithm's shortcomings, calling it "dumb" and in need of major improvements, but stressed the importance of transparency while it is being improved [4][5]
- Musk has long criticized the platform's former lack of openness, and since the acquisition he has followed through on his promise to publish Twitter's core recommendation algorithm [6][7]

Group 2: Algorithm Mechanism
- The recommendation system is built on a Transformer architecture similar to Grok-1, which learns from users' historical interactions (likes, replies, retweets) to recommend content [9]
- The system first identifies the user and their recent activity, aiming to build a "real-time user profile" without preset assumptions [12][13]
- It collects two types of user information: Action Sequences (direct interest signals) and Features (long-term attributes) [14]

Group 3: Content Filtering and Scoring
- The algorithm narrows a vast pool of posts down to a few thousand potentially relevant candidates, drawn both from familiar accounts and from outside the user's network [16][17]
- A Hydration module fills in each candidate post's details, and a Filtering module removes unwanted content [21][22]
- Final scores come from the Phoenix ranking model, which predicts a range of user interactions and combines those predictions with weights into a single score [25][26]

Group 4: Key Features of the System
- The system is purely data-driven: it rejects hand-written rules and lets AI models learn directly from raw user data [33]
- A candidate isolation mechanism ensures that each piece of content is scored independently [34]
- The algorithm predicts multiple user behaviors rather than producing a single recommendation score [36]
- Its modular design supports rapid iteration and development [37]

Group 5: Acknowledgment of Limitations
- Despite praise for its transparency, the algorithm has been criticized for specific flaws, such as the lack of a time-decay mechanism for "block" signals, which could hurt an account's recommendations [39][41]
- Musk himself acknowledged the algorithm's deficiencies and committed to ongoing improvements, with updates every four weeks [42][44]
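The ranking flow summarized above (candidate sourcing, filtering, multi-action prediction, weighted combination, independent scoring) can be sketched in a few lines of Python. This is not the open-sourced X code: the `Candidate` fields, the action weights, and the stand-in predictor are hypothetical, and hydration is omitted. The sketch only illustrates two design choices the summary highlights: the model predicts several behaviors rather than one score, and each candidate is scored on its own, so its score does not depend on which other posts are in the batch.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    post_id: str
    author_id: str
    source: str  # "in_network" (followed accounts) or "out_of_network"


# Hypothetical weights: positive engagement raises the score, blocks lower it.
ACTION_WEIGHTS = {"like": 1.0, "reply": 2.0, "repost": 1.5, "block": -10.0}


def filter_candidates(candidates, blocked_authors, seen_post_ids):
    """Filtering step: drop posts from blocked authors and posts already seen."""
    return [c for c in candidates
            if c.author_id not in blocked_authors and c.post_id not in seen_post_ids]


def combine(action_probs):
    """Weighted combination of per-action probabilities into one score."""
    return sum(ACTION_WEIGHTS[action] * p for action, p in action_probs.items())


def rank(candidates, predict, k):
    """Score each candidate independently (candidate isolation), then keep the top k."""
    scored = [(combine(predict(c)), c.post_id) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [post_id for _, post_id in scored[:k]]


candidates = [
    Candidate("p1", "alice", "in_network"),
    Candidate("p2", "bob", "out_of_network"),
    Candidate("p3", "carol", "out_of_network"),
]
# Stand-in for the ranking model: fixed per-post action probabilities.
fake_predictions = {
    "p1": {"like": 0.20, "reply": 0.05, "repost": 0.02, "block": 0.00},
    "p2": {"like": 0.50, "reply": 0.10, "repost": 0.10, "block": 0.01},
    "p3": {"like": 0.90, "reply": 0.30, "repost": 0.20, "block": 0.00},
}
kept = filter_candidates(candidates, blocked_authors={"carol"}, seen_post_ids=set())
print(rank(kept, lambda c: fake_predictions[c.post_id], k=2))
```

Here `p3` is removed by the filter because its author is blocked, and `p2` (score 0.75) outranks `p1` (score 0.33) even though it carries a small predicted block probability, because its positive engagement predictions outweigh the 0.1 penalty that the hypothetical -10 block weight contributes.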