Does OpenClaw code break more with every change? New research EvoClaw reveals: Agents succeed at continuous development only 13.37% of the time
量子位· 2026-03-25 04:58
Core Insights
- By the end of 2025, AI programming had shifted from auxiliary tools such as Copilot to an Agent era in which AI leads and humans supervise [1]
- The emergence of OpenClaw in early 2026 is turning Agents from single-task executors into long-running systems, which requires software interfaces to keep iterating on themselves based on real-world interactions [2]

Group 1: AI Programming Evaluation
- Current top models handle isolated tasks such as writing a function or fixing a bug well, but struggle in continuous software evolution scenarios, where scores drop from above 80% to below 40% [6]
- Existing AI programming benchmarks tend to overestimate Coding Agents because they evaluate independent tasks, whereas real software evolution is a continuous process [8][10]
- The EvoClaw benchmark introduces an evaluation paradigm in which the AI must autonomously complete multiple interdependent tasks within the same codebase, exposing weaknesses that only appear over continuous iteration [10]

Group 2: EvoClaw Benchmark Design
- EvoClaw assesses how well AI handles software evolution using a milestone-based approach, which aggregates code submissions into cohesive units while preserving the dependencies between tasks [17]
- The metrics include Recall (completeness of functionality implementation) and Precision (reliability of modifications), combined into an overall score using F1 [29][31]
- The EvoClaw dataset spans five major programming languages and covers real development cycles across multiple release intervals [27]

Group 3: Performance Analysis
- In continuous evaluation, top models such as Claude Opus 4.6 reach a maximum score of only 38.03%, a significant drop compared with independent evaluation [34]
- The analysis shows that while Recall continues to grow, Precision quickly saturates, so overall performance stagnates as task complexity increases [42]
- Even with unlimited iteration opportunities, AI models eventually hit a performance ceiling and cannot fully resolve all tasks because of accumulated technical debt [40][44]

Group 4: Future Directions
- The findings suggest that current AI models behave more like on-demand code generators than full engineering solutions: they do not proactively manage technical debt or overall project governance [54]
- Models differ clearly: the GPT and Claude series show steady improvement in continuous evolution, while the Gemini series struggles to sustain performance [54]
- The future of AI programming lies in moving from passive code generation to active restructuring and long-term planning, so that AI can work like a seasoned engineer with a holistic view of the project [54]
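The Recall, Precision, and F1 scoring described under Group 2 can be sketched as follows. This is a minimal illustration, not EvoClaw's actual scoring code: the `MilestoneResult` fields and the way Recall and Precision are derived from test counts are assumptions made for the example, since the summary only says that Recall measures completeness and Precision measures reliability of modifications.

```python
from dataclasses import dataclass


@dataclass
class MilestoneResult:
    """Hypothetical per-milestone test tallies (field names are illustrative)."""
    target_tests: int       # tests that define the milestone's new functionality
    target_passed: int      # how many of those pass after the agent's change
    regression_tests: int   # tests that passed before the agent's change
    regression_passed: int  # how many of those still pass afterwards


def recall(r: MilestoneResult) -> float:
    """Completeness: share of the required functionality that now works."""
    return r.target_passed / r.target_tests if r.target_tests else 0.0


def precision(r: MilestoneResult) -> float:
    """Reliability: share of previously working behaviour left intact (assumed definition)."""
    return r.regression_passed / r.regression_tests if r.regression_tests else 0.0


def f1(r: MilestoneResult) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    p, rc = precision(r), recall(r)
    return 2 * p * rc / (p + rc) if (p + rc) else 0.0


# An agent that implements everything but breaks half of the existing tests.
result = MilestoneResult(target_tests=10, target_passed=10,
                         regression_tests=50, regression_passed=25)
print(f"recall={recall(result):.2f} precision={precision(result):.2f} f1={f1(result):.2f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two values, so a saturated Precision caps the combined score even while Recall keeps rising, which is consistent with the stagnation described under Group 3.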
量子位 (Quantum Bit) Editor and Writer Recruitment
量子位· 2026-03-25 04:58
Core Viewpoint
- The article points to the ongoing AI boom and invites readers to join "Quantum Bit," which tracks AI advancements and has established itself as a leading content platform in the industry [1].

Group 1: Job Opportunities
- The company is hiring in three main directions: AI Industry, AI Finance, and AI Product, with positions open to both experienced professionals and fresh graduates [2][4].
- Positions are full-time and based in Beijing, with roles open at various levels [2][4].

Group 2: Job Responsibilities
- **AI Industry Direction**: Focuses on innovations in infrastructure, including chips, AI infrastructure, and cloud computing [6].
- **AI Finance Direction**: Tracks venture capital and financial reports in the AI sector and monitors capital movements within the industry [6].
- **AI Product Direction**: Concentrates on advances in AI applications and hardware [6].

Group 3: Benefits and Growth Opportunities
- Employees get to work with the latest AI technologies, improve their efficiency with new AI tools, and build personal influence by writing original content [6].
- The company offers competitive salaries and benefits including social insurance, meal allowances, and performance bonuses [6].

Group 4: Company Growth Metrics
- As of 2025, Quantum Bit reports over 2.4 million subscribers on WeChat, more than 7 million users across all platforms, and a daily reading volume exceeding 2 million [12].
- Third-party data platforms rank the company as the top new media outlet in the AI and frontier technology sectors [12].
DeepSeek is urgently hiring for Agent roles! 17 positions opened at once, heavy Vibe Coding users preferred
量子位· 2026-03-25 04:58
Core Insights
- DeepSeek has opened 17 recruitment positions, with the roles centered on developing Agent capabilities [1][2]
- The recruitment strategy indicates a shift from foundational model research to the productization of Agent technologies [23][24]

Group 1: Recruitment Focus
- The core research positions emphasize Agent development, covering algorithm research, data evaluation, and infrastructure [2][6]
- Several job descriptions state a preference for candidates with experience using AI programming tools such as Claude Code, Cursor, and Copilot [4]
- The full-stack developer role covers high-concurrency server and API system architecture, data processing pipelines, and Agent infrastructure [19][20][21]

Group 2: Agent Talent Requirements
- DeepSeek is looking for Agent talent that can enhance model capabilities through new methods and paradigms, particularly in applications of reinforcement learning [6]
- The Agent data evaluation expert role focuses on constructing evaluation datasets that accurately distinguish model capabilities [9]
- The infrastructure engineer position is tasked with building the foundation for Agent operations and integrating external tools into the internal reinforcement learning infrastructure [13]

Group 3: Product and System Architecture
- A dedicated product manager role for Agent strategy has been created, requiring familiarity with core Agent mechanisms and industry trends [15][16]
- The full-stack developer role also covers building a next-generation container scheduling and isolation platform to support large-scale AI Agent operations [17][18]
- Taken together, the openings lay out a complete Agent technology stack, aiming for a closed loop from data production to model iteration [24][28]

Group 4: Industry Context
- Previous reports indicated that DeepSeek is developing advanced Agent functionality in its AI models, with plans to release a competitive product by Q4 2025 [29]
- The R1 reasoning model has reportedly achieved benchmark performance comparable to OpenAI's products, challenging the notion that significant investment is necessary for model development [30]
Is WorkBuddy on a rampage? A team of AI experts works for me while I play cyber shrimp foreman in WeChat!
量子位· 2026-03-25 04:58
Core Viewpoint
- The article discusses the transformative capabilities of Tencent's WorkBuddy, an AI-powered tool that automates content creation and management for social media, allowing users to efficiently produce and manage content without extensive prior experience [7][71].

Group 1: WorkBuddy Features
- WorkBuddy integrates various industry-specific AI experts to streamline the content creation process, enabling users to automate tasks such as topic selection and content writing [8][71].
- The platform supports multiple communication tools, including WeChat, allowing users to interact with the AI seamlessly without additional installations [55][57].
- Users can set up automated tasks for content delivery, such as regular updates on AI news, which enhances efficiency and reduces manual workload [30][36].

Group 2: Content Creation Process
- The content creation process is divided into several stages, including topic selection, content writing, and platform-specific adaptations, all managed by different AI experts [40][72].
- The article highlights the ability of WorkBuddy to generate tailored content for various social media platforms, ensuring that the style and format align with each platform's audience [44][46].
- The platform's experts provide comprehensive strategies for content distribution, audience engagement, and performance monitoring, which are crucial for building a successful social media presence [47][50].

Group 3: User Experience and Benefits
- Users can experience a significant reduction in the complexity of deploying AI tools, as WorkBuddy minimizes the need for technical configurations and model management [67][71].
- The article emphasizes the accessibility of WorkBuddy, noting that users can operate it from mobile devices, making it convenient for on-the-go management [55][75].
- The platform offers incentives such as free credits for new users and rewards for content contributions, encouraging engagement and exploration of its features [69][70].
@Everyone, 2026 is the year you really need to start using AI yourself | Annual AI Summit
量子位· 2026-03-25 04:58
Core Viewpoint
- The article emphasizes the transition of AI from a niche technology to a mainstream tool that is now widely adopted in everyday life, marking a significant shift in its accessibility and application [2][5][18].

Group 1: AI's Mainstream Adoption
- AI has evolved from being a topic of interest in the tech community to becoming a household name, especially after the Spring Festival, indicating its widespread acceptance [2][5].
- The presence of AI in various daily tasks, such as cooking, cleaning, and healthcare, showcases its integration into everyday life [3][5].
- The upcoming 2026 China AIGC Industry Summit aims to facilitate this transition by encouraging participation from AI entrepreneurs, developers, and users to explore practical applications of AI [5][9].

Group 2: 2026 China AIGC Industry Summit
- The summit will focus on the entire industry chain of generative AI, featuring both technology pioneers and application explorers, with over 60 industry leaders expected to share insights [9][12].
- The agenda includes two main sessions: one discussing the necessity of adopting AI and showcasing successful case studies, and the other exploring the integration of AI across various sectors like healthcare and gaming [13][14].
- The event is anticipated to attract significant attention, with over a thousand attendees expected on-site and more than 3.5 million viewers online [12].

Group 3: Recognition of AIGC Enterprises and Products
- The article mentions that Quantum Bit will evaluate generative AI enterprises and products based on their performance over the past year, with results to be announced at the summit [19].
- The evaluation will be grounded in real data submitted by companies and insights from industry experts, ensuring credibility and objectivity [19].
- The summit will also serve as a platform to honor outstanding AIGC enterprises and products, inviting millions of industry professionals to witness the recognition [20].
VLA, stop "zoning out": plug-and-play boost to visual generalization, 18% improvement over Pi0.5
量子位· 2026-03-24 23:52
Core Insights
- The article discusses the development of DeepVision-VLA, a visual enhancement framework for robot operations, which addresses the issue of visual information degradation in deep action prediction models [6][7][24].

Group 1: Research Findings
- The research team found that the reliance on key visual tokens decreases as the layers of the VLA model deepen, leading to a decline in sensitivity to critical visual information during action prediction [4][11][21].
- DeepVision-VLA incorporates a Vision-Language Mixture-of-Transformers (VL-MoT) framework and Action-Guided Visual Pruning (AGVP) strategy to enhance the model's ability to focus on task-relevant visual areas [8][24][26].

Group 2: Performance Metrics
- In simulations using the RLBench simulator, DeepVision-VLA achieved an average success rate of 83%, which is an 18% improvement over the baseline model Pi0.5 [8][35].
- In real-world tasks, DeepVision-VLA reached a 91.7% average success rate, demonstrating enhanced precision and stability in complex operations [43].

Group 3: Experimental Validation
- The model was tested under various conditions, including unseen backgrounds and lighting, and maintained stable performance, indicating robust visual modeling capabilities [46][48].
- The experiments showed that even with significant visual token removal in deeper layers, the impact on action prediction was limited, confirming the model's improved efficiency in utilizing visual information [25][30].
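The general idea behind action-guided pruning of visual tokens can be illustrated with a small top-k selection. This is only a sketch under stated assumptions and not the AGVP algorithm from the paper: the function name `prune_visual_tokens`, the use of a single relevance score per token, and the `keep_ratio` parameter are all hypothetical.

```python
def prune_visual_tokens(tokens, action_attention, keep_ratio=0.5):
    """Keep the visual tokens that score highest for the action being predicted.

    tokens           -- list of visual tokens (any objects)
    action_attention -- one relevance score per token (hypothetical: for example,
                        attention mass from action queries averaged over heads)
    keep_ratio       -- fraction of tokens to keep (illustrative default)
    """
    if len(tokens) != len(action_attention):
        raise ValueError("need exactly one score per token")
    k = max(1, round(len(tokens) * keep_ratio))
    # Rank token indices by score, highest first, and keep the top k.
    ranked = sorted(range(len(tokens)), key=lambda i: action_attention[i], reverse=True)
    kept = sorted(ranked[:k])  # restore the original spatial order
    return [tokens[i] for i in kept], kept


# Six image patches; the gripper and the cup matter most for the action.
patches = ["sky", "table", "gripper", "cup", "wall", "floor"]
scores = [0.02, 0.15, 0.40, 0.35, 0.03, 0.05]
kept_tokens, kept_idx = prune_visual_tokens(patches, scores, keep_ratio=0.5)
print(kept_tokens)  # ['table', 'gripper', 'cup']
```

Sorting the kept indices back into their original order is a design choice for the sketch: it keeps positional information meaningful for any layer that consumes the pruned sequence.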
After 14 months "in stealth" since leaving Tesla, Yang Shuo's startup finally shows its hand: redefining the robot training paradigm
量子位· 2026-03-24 23:52
Core Viewpoint
- Yang Shuo, co-founder and CTO of Mondo Robotics, had stayed out of public view since leaving Tesla's Optimus team more than a year ago. He has now unveiled the company's work on a new model called DiT4DiT, which trains robots from video to improve their action capabilities and adaptability across scenarios [1][2].

Group 1: DiT4DiT Model Overview
- DiT4DiT is an end-to-end model that integrates video diffusion and action diffusion into a cascaded framework for robot learning [9].
- The model uses a design called "intermediate denoising," which extracts key features during video generation to guide the robot's action decisions without waiting for a complete video output [11][12].
- The model achieves a 98.6% average success rate on the LIBERO benchmark, a state-of-the-art result [30].

Group 2: Key Design Features
- The two critical designs are intermediate denoising and a three-timestep scheme, which together allow efficient training of both the video generation and the action prediction tasks [10][25].
- Intermediate denoising extracts features from a specific layer during the denoising stages, so the robot learns to understand physical interactions rather than depending on a fully sharp video [19][22].
- The three-timestep scheme lets the video model and the action model operate independently yet cohesively, improving convergence speed by 7 times and data efficiency by over 10 times [29].

Group 3: Practical Applications and Performance
- DiT4DiT has been deployed on the Unitree G1 humanoid robot, completing tasks such as flower arrangement and drawer interactions, outperforming pre-trained models and showing strong deployment potential on robot edge chips [41][42][43].
- The model's design allows it to adapt quickly to new objects and scenarios, addressing a limitation of traditional vision-language-action models, which struggle with dynamic physical understanding [36][40].
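The "intermediate denoising" idea described above, taking features from partway through video denoising and handing them to an action model without finishing the video, can be illustrated with a toy loop. Everything here is a stand-in chosen for the example: `denoise_step`, `video_features`, `action_head`, the step counts, and the halving rule are hypothetical and do not reflect DiT4DiT's real architecture or interfaces.

```python
def denoise_step(latent):
    """Toy denoiser standing in for one diffusion-transformer pass.

    Returns the next latent and the features exposed to the action model
    (here simply the same values, to keep the sketch small).
    """
    hidden = [x * 0.5 for x in latent]
    return hidden, hidden


def video_features(noisy_latent, total_steps=10, extract_at=3):
    """Run only the first `extract_at` of `total_steps` denoising steps."""
    latent, features, steps_run = noisy_latent, None, 0
    for step in range(total_steps):
        latent, features = denoise_step(latent)
        steps_run = step + 1
        if steps_run == extract_at:
            break  # stop early: the finished video is never produced
    return features, steps_run


def action_head(features):
    """Toy action model: maps features to a 2-D action (hypothetical)."""
    return [sum(features) / len(features), max(features) - min(features)]


features, steps_run = video_features([8.0, -4.0, 2.0, 0.0])
action = action_head(features)
print(steps_run, features, action)
```

In this toy, the action is computed after 3 of 10 denoising steps, which is the point of the design as summarized: the action model consumes partially denoised features, so inference does not pay for a complete video.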
OpenAI shuts down Sora! From legend to exit in 25 months
量子位· 2026-03-24 23:52
Core Viewpoint
- OpenAI has announced the discontinuation of its Sora video generation platform, which has led to significant backlash and skepticism regarding the reliability of AI products from the company [1][3][6].

Group 1: Sora's Development and Termination
- Sora was initially launched with high expectations, showcasing advanced video generation capabilities based on over 200 IPs from Disney, Marvel, Pixar, and Star Wars [5][10].
- The platform experienced rapid growth, reaching the top of the App Store in the US shortly after its launch, but faced declining interest as competition increased and copyright issues arose [11][12].
- OpenAI's decision to shut down Sora is part of a broader strategic shift as the company prepares for an IPO, focusing on commercial and code development functionalities [18][21].

Group 2: Impact on Partnerships and Future Directions
- Following the termination of Sora, Disney has ended all collaborations with OpenAI, including a planned $1 billion investment [6][7].
- OpenAI is reallocating resources towards developing productivity tools for businesses and individuals, indicating a shift in focus away from consumer-facing video generation [21][22].
- The research efforts related to Sora will now pivot towards long-term studies in robotics and world simulation, with the product department rebranded as "AGI Deployment" [23].

Group 3: The AI Video Landscape in China
- Despite Sora's closure, the AI video generation sector is thriving in China, with companies like ByteDance and Kuaishou leading the market [24][25].
- Kuaishou's AI video platform has reported monthly revenues exceeding $20 million, demonstrating strong user engagement and a successful business model [26].
- The Chinese market is characterized by a dense ecosystem of short video and e-commerce industries, providing a rich data environment for AI development [31][32].
After the lobster craze, has the globalization opportunity for AI startups arrived? | Offline salon registration
量子位· 2026-03-24 11:03
Core Insights
- The article discusses the potential of AI startups in the context of globalization, emphasizing the need for these companies to identify the right applications, scenarios, and channels from the outset [2][30]
- It highlights the ongoing trend of AI startups seeking to expand globally, driven by the rapid reduction of information gaps and the emergence of new entrepreneurial waves [1][30]

Group 1: Event Overview
- A salon event is organized featuring leading global practitioners from companies like Xiaoying Technology, FluxA, Google, JD, Agora, and Meshy, who will share reusable experiences in going global [4]
- The event aims to facilitate discussions on the real logic of global AI entrepreneurship, welcoming participants at various stages of their international journey [6]

Group 2: Startup Presentations
- Leewow allows users to design products freely, covering various creative themes and offering items like T-shirts and bags [12]
- Brain Recording focuses on consumer-grade non-invasive brain-machine interfaces for sports and cognitive health, developing products like the Nuromova smart sports headband [14]
- MeetaVista aims to create an entry point for AI in the real world, integrating technologies like naked-eye 3D and spatial AI terminals for applications in retail and education [16]
- DataElem specializes in the application of large models, with products like BISHENG and Clawith aimed at enterprise-level services [17]

Group 3: Key Personnel
- Founders and key personnel from various startups are highlighted, including:
  - Shen Xingdong from Leewow, a young entrepreneur with significant early-stage funding [13]
  - Zhang Haotian from Brain Recording, with a background in industrial design [15]
  - Song Chongguo from MeetaVista, a former executive with extensive experience in strategy and digital growth [16]
  - Yutong from DataElem, focusing on international business development [18]
  - Lin Xiaodong from Xiaoying Technology, who has successfully led products in the Indian market [20]

Group 4: Discussion Topics
- The salon will cover topics such as the current state of AI globalization, the implications of the "lobster and agent" trend, and the true barriers to entry for AI startups in international markets [30]
Jensen Huang's bombshell hot takes: AGI has already been achieved, Ilya was wrong, and there will be 1 billion programmers
量子位· 2026-03-24 08:47
Core Viewpoint
- The article discusses the recent statements made by Jensen Huang, CEO of NVIDIA, regarding the achievement of Artificial General Intelligence (AGI) and its implications for the future of technology and society [1][2][3].

Group 1: AGI and Future Outlook
- Huang asserts that AGI has already been achieved, emphasizing that this is not merely speculation but a conclusion drawn from various dimensions including technology, society, and human nature [3][8].
- He introduces the concept of OpenClaw as a transformative product in the token era, likening it to the iPhone, suggesting that intelligence will become a tradable commodity in the form of tokens [8][120].
- Huang predicts that the number of programmers will increase from 30 million to 1 billion, as coding becomes less about writing and more about problem-solving [210].

Group 2: Scaling Laws and Data
- Huang believes that pre-training has not reached its peak, and synthetic data will continue to expand the scale of data available for AI [4][18].
- He argues that reasoning is a complex process that cannot be simplified to lightweight computations, contrasting it with pre-training which he likens to reading [6][20].
- The next scaling law, termed "agentic scaling," involves the creation of agentic individuals capable of generating and utilizing vast amounts of data [22][24].

Group 3: Energy and Data Center Design
- Huang highlights the inefficiencies in current energy grid designs, suggesting that data centers should be rethought to utilize idle power more effectively [46][50].
- He proposes that data centers should be designed to gracefully degrade performance during peak energy demands, rather than requiring constant maximum output [57][58].
- Huang emphasizes the importance of collaboration with power companies to create flexible energy supply agreements [56][59].

Group 4: Cultural and Competitive Landscape
- Huang notes that a significant portion of AI researchers are based in China, attributing this to a strong educational system and a competitive environment across various provinces [76][80].
- He describes the cultural factors that contribute to rapid knowledge sharing and innovation in China, including the importance of relationships and open communication among engineers [85][86].
- Huang believes that the rise of AI will not eliminate jobs but will transform them, enhancing the roles of professionals across various fields [226][237].

Group 5: Management Philosophy
- Huang's management style focuses on collaboration and open communication, often involving large groups in problem-solving discussions [128][130].
- He emphasizes the importance of curiosity and continuous learning in leadership, encouraging a culture of shared insights and collective decision-making [131][135].
- Huang believes that maintaining humility and a willingness to learn from others is crucial for effective leadership [158][161].
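The graceful-degradation proposal under Group 3 (Energy and Data Center Design) can be made concrete with a small power-capping sketch. This is an illustration of the general idea only, not anything Huang or NVIDIA has specified: the `throttle` function, the job names, the per-job minimum power levels, and the proportional-sharing rule are all assumptions made for the example.

```python
def throttle(jobs, power_budget_kw):
    """Scale job power down proportionally when demand exceeds the budget.

    jobs -- dict of job name -> (requested_kw, min_kw); values are hypothetical
    Returns a dict of job name -> allocated_kw.
    """
    requested = sum(req for req, _ in jobs.values())
    if requested <= power_budget_kw:
        # Enough power for everyone: run at full requested power.
        return {name: float(req) for name, (req, _) in jobs.items()}
    floor = sum(mn for _, mn in jobs.values())
    if floor > power_budget_kw:
        raise ValueError("budget is below the sum of minimum power levels")
    # Give each job its minimum, then share the remaining headroom in
    # proportion to how much each job asked for above its minimum.
    headroom = power_budget_kw - floor
    flexible = requested - floor
    return {name: mn + (req - mn) * headroom / flexible
            for name, (req, mn) in jobs.items()}


jobs = {"training": (600, 300), "inference": (400, 300)}
normal = throttle(jobs, power_budget_kw=1200)  # off-peak: full power
peak = throttle(jobs, power_budget_kw=800)     # grid peak: degrade, do not stop
print(normal, peak)
```

During the simulated peak, both jobs keep running at reduced power and the total matches the 800 kW budget, rather than the facility demanding its 1000 kW maximum at all times.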