量子位

America's Liang Wenfeng Has Arrived
量子位· 2025-07-11 06:16
Core Viewpoint - Harmonic AI, co-founded by Vlad Tenev and Tudor Achim, aims to develop an AI system capable of solving complex mathematical problems, striving for Mathematical Superintelligence (MSI) [3][20].

Group 1: Company Overview
- Harmonic AI has raised $100 million in Series B funding, bringing its valuation to approximately $875 million [4][17].
- The company was co-founded by Vlad Tenev, who previously founded Robinhood Markets, and Tudor Achim, an expert in AI and large-model training [5][15].
- Robinhood, under Tenev's leadership, reached a market cap of around $22.7 billion and reported revenue of $927 million with a net profit of $336 million in Q1 2025 [8][12].

Group 2: Funding and Valuation
- Harmonic AI's Series A raised $75 million, led by Sequoia Capital, at a post-money valuation of $325 million [15].
- The recent Series B was led by Kleiner Perkins, with participation from Paradigm and Ribbit Capital, among others [16].
- The company intentionally set its valuation below the $1 billion "unicorn" threshold, prioritizing long-term growth over short-term valuation targets [18][19].

Group 3: Product Development
- Harmonic AI announced its first model, Aristotle, which formalizes natural-language problems into formal representations, enhancing collaboration with mathematicians (a toy Lean formalization follows this summary) [20].
- The model's performance improved from 83% to 90% on the MiniF2F benchmark, which spans mathematical problems of varying difficulty [23].
- The ultimate goal is an AI system whose mathematical capabilities surpass human abilities, addressing challenges such as the "hallucination" problem in AI [26][28].
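Aristotle targets formal mathematics, where a statement written in a proof assistant is machine-checked rather than taken on trust. As a rough illustration of what "formalizing a natural-language problem" means, here is a toy MiniF2F-style statement in Lean 4 with Mathlib; the theorem and proof are our own example, not Harmonic's output.

```lean
import Mathlib

-- Toy example only (not output from Harmonic's Aristotle): the word problem
-- "If 2x + 3 = 7, find x" becomes a theorem the Lean kernel can certify, so a
-- correct formalization cannot hallucinate an answer.
theorem toy_minif2f (x : ℝ) (h : 2 * x + 3 = 7) : x = 2 := by
  linarith
```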
Speed Up the H100 by 33%-50% with No CUDA Code: New Work from a Flash Attention Author Goes Viral
量子位· 2025-07-11 06:16
西风, reporting from 凹非寺 | 量子位 QbitAI (WeChat account: QbitAI)

No CUDA code needed, and the H100 gets 33%-50% faster! A new project from Tri Dao, one of the authors of Flash Attention and Mamba, has gone viral. Together with two Princeton CS PhD students, he has released QuACK, a new SOL (speed-of-light) memory-bound kernel library built with CuTe-DSL and written entirely in Python, without a single line of CUDA C++.

On an H100 with 3 TB/s of memory bandwidth, it runs 33%-50% faster than heavily optimized libraries such as PyTorch's torch.compile and Liger.

Tri Dao says that getting memory-bound kernels to "the speed of light" is no mysterious trick; it just requires getting a few details right.

"I like Phil Tillet's view that different tools, such as torch.compile, Triton, CUDA, and PTX, each trade off productivity against performance. But CuTe-DSL and similar Python-based DSLs may change that picture, even though they are still at an early stage. And perhaps soon we can have large language models generate these kernels!"

Once published, the work drew attention from a number of prominent figures. Vijay, a senior architect on NVIDIA's CUTLASS team, reposted it, boasting that the CuTe-DSL his team built gets every detail ...
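For context on the "speed of light" framing: a memory-bound kernel can never run faster than the time needed to move its bytes across the memory bus, so kernels are judged by how close they get to that bound. The sketch below is not QuACK or CuTe-DSL code; it is a plain PyTorch illustration, using the 3 TB/s bandwidth figure quoted above and a softmax workload of our own choosing, of how such an efficiency number is measured against a torch.compile baseline.

```python
import torch

# Plain-PyTorch illustration of the "speed of light" bound for a memory-bound
# kernel. This is NOT QuACK or CuTe-DSL code; the softmax workload is our own
# choice, and 3 TB/s is the H100 bandwidth figure quoted in the article above.

H100_BANDWIDTH = 3e12  # bytes per second, as quoted above

def speed_of_light_ms(num_bytes: int, bandwidth: float = H100_BANDWIDTH) -> float:
    """Lower bound on runtime: every byte must cross the memory bus once."""
    return num_bytes / bandwidth * 1e3

@torch.compile  # the kind of heavily optimized baseline QuACK is compared against
def fused_softmax(x: torch.Tensor) -> torch.Tensor:
    return torch.softmax(x, dim=-1)

if __name__ == "__main__":
    x = torch.randn(8192, 8192, device="cuda", dtype=torch.bfloat16)
    # Softmax reads and writes the tensor once: 2 * numel * 2 bytes for bf16.
    ideal_ms = speed_of_light_ms(2 * x.numel() * x.element_size())

    fused_softmax(x)                      # warm-up / compilation
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(100):
        fused_softmax(x)
    end.record()
    torch.cuda.synchronize()
    measured_ms = start.elapsed_time(end) / 100

    print(f"speed of light: {ideal_ms:.3f} ms  "
          f"measured: {measured_ms:.3f} ms  "
          f"fraction of light: {ideal_ms / measured_ms:.1%}")
```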
CAS's New "CO₂-to-Sugar" Result Sparks Heated Discussion Online: No Photosynthesis Required, a "Prerequisite Technology for Deep Space"
量子位· 2025-07-11 06:16
Core Viewpoint - The article discusses a groundbreaking technology developed by the Tianjin Institute of Industrial Biotechnology, which enables the conversion of carbon dioxide (CO₂) into sugar through a synthetic pathway, bypassing traditional agricultural methods [1][5][24].

Summary by Sections

Technology Overview
- The new method converts CO₂ to methanol and then to sucrose, effectively circumventing the need for photosynthesis in plants [1][5].
- Energy consumption is significantly lower, requiring only 2 ATP per sucrose compared with 37 ATP in the natural pathway [6].

Conversion Efficiency
- The conversion efficiency of the new method reaches 86%, with a sucrose yield of 5.7 g/L [7].
- The research also enables the synthesis of starch and oligosaccharides from low-carbon molecules [10].

Process Steps
- The conversion consists of two main steps:
  1. Converting CO₂ to C1 raw materials using existing technologies [13][14].
  2. Transforming C1 molecules into sucrose through an engineered biotransformation system, raising the catalytic efficiency of key enzymes by 3 to 71 times [15][18].

Implications
- The technology could address major global challenges such as climate change and food security by reducing the need for arable land and freshwater [31].
- The potential to synthesize essential carbohydrates directly from CO₂ could revolutionize food production and agricultural practices [26][28].

Future Prospects
- If successful, the method could eliminate the need for traditional crop cultivation, allowing staple foods to be produced in controlled environments [24][26].
- The research team aims to reproduce, in an engineered system, the complex pathways plants evolved over millions of years, providing a sustainable alternative to current agricultural practices [29].
Documents Become Narrated Presentation Videos in Seconds: Open-Source Agent Approaches Human Level on Business Reports and Academic Papers
量子位· 2025-07-11 04:00
Core Viewpoint - PresentAgent is a multimodal AI agent designed to automatically convert structured or unstructured documents into presentation videos with synchronized voiceovers and slides, aiming to replicate human-like information delivery [1][3][22].

Group 1: Functionality and Process
- PresentAgent generates tightly synchronized visual content and voice explanations, simulating human-style presentations for document types such as business reports, technical manuals, policy briefs, and academic papers [3][21].
- The system uses a modular generation framework: semantic chunking of the input document, layout-guided slide generation, rewriting key information into spoken text, and synchronizing voice with slides to produce a coherent presentation video (a minimal pipeline sketch follows this summary) [11][20].
- The process involves several steps: document processing, structured slide generation, synchronized subtitle creation, and voice synthesis, ultimately outputting a video that combines slides and narration [13][14].

Group 2: Evaluation and Performance
- The team evaluated the system on a test set of 30 pairs of human-made document-presentation videos across various fields, using a dual-path strategy that assesses content understanding and quality through vision-language models [21][22].
- PresentAgent performed close to human level on all evaluation metrics, including content fidelity, visual clarity, and audience comprehension, showing its potential for turning static text into dynamic, accessible presentation formats [21][22].
- The results indicate that combining language models, visual layout generation, and multimodal synthesis can yield an explainable and scalable automated presentation-generation system [23].
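To make the four-stage framework concrete, here is a minimal, self-contained Python sketch. Every helper name in it is a hypothetical placeholder standing in for PresentAgent's LLM-, layout-, and TTS-backed components; this is not the project's actual API.

```python
from dataclasses import dataclass

# Minimal sketch of the four-stage pipeline described above. All helpers
# (chunk_document, generate_slide, to_spoken_text, build_presentation) are
# hypothetical placeholders, not PresentAgent's real interfaces.

@dataclass
class Segment:
    heading: str
    body: str

def chunk_document(text: str) -> list[Segment]:
    """Stage 1: semantic chunking. Here a naive split on blank lines; the real
    system segments by topic with a language model."""
    segments = []
    for block in text.split("\n\n"):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if lines:
            segments.append(Segment(heading=lines[0], body=" ".join(lines[1:])))
    return segments

def generate_slide(seg: Segment) -> dict:
    """Stage 2: layout-guided slide generation (title plus bullet placeholders)."""
    return {"title": seg.heading, "bullets": [seg.body]}

def to_spoken_text(seg: Segment) -> str:
    """Stage 3: rewrite the segment as spoken-style narration."""
    return f"Let's turn to {seg.heading}. {seg.body}"

def build_presentation(text: str) -> list[tuple[dict, str]]:
    """Stage 4: pair each slide with its narration. A full system would then run
    TTS on the narration and mux audio with rendered slides into a video."""
    return [(generate_slide(s), to_spoken_text(s)) for s in chunk_document(text)]

if __name__ == "__main__":
    doc = "Q2 results\nRevenue grew 12% year over year.\n\nOutlook\nWe expect steady growth."
    for slide, narration in build_presentation(doc):
        print(slide["title"], "->", narration)
```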
Perception Error Rate Cut by 30.5%: An Implicit Perception Loss Makes Models "Open Their Eyes" | UIUC & Alibaba Tongyi
量子位· 2025-07-11 04:00
Core Viewpoint - The article introduces PAPO (Perception-Aware Policy Optimization), a new reinforcement learning algorithm developed by the University of Illinois Urbana-Champaign and Alibaba's Tongyi Lab, which integrates perception into the learning process to strengthen multimodal reasoning [1][3].

Group 1: Introduction of PAPO
- PAPO addresses the limitations of existing reinforcement learning algorithms such as GRPO, which excel at text reasoning but struggle in multimodal settings because they make inadequate use of visual information [2][3].
- The algorithm introduces an implicit perception loss that relies on internal supervisory signals, allowing multimodal models to learn perception alongside reasoning [3][6].

Group 2: Error Analysis and Findings
- A systematic error analysis revealed that the primary bottleneck in multimodal reasoning is the accuracy of visual perception rather than logical reasoning capability [6][7].
- An analysis of 200 error cases from a Qwen2.5-VL-3B model trained with GRPO showed that 67% of errors were due to perception inaccuracies, while only 18% were reasoning errors [9][14].

Group 3: Technical Innovations of PAPO
- PAPO's core innovations are a perception information gain ratio and the maximization of a KL divergence that pushes the model toward different output distributions for original and corrupted images (a hedged sketch follows this summary) [19][20].
- The complete objective function is a simple extension of GRPO that adds the KL divergence term [21].

Group 4: Experimental Validation
- Comprehensive evaluations on eight multimodal reasoning benchmarks showed that PAPO consistently outperformed GRPO, with an overall average improvement of 4.4% and a 30.5% reduction in perception errors [26][28].
- PAPO converged faster and trained more stably than GRPO, showing gains as early as 25 training steps [29][30].

Group 5: Visual Dependency Analysis
- An analysis of mainstream multimodal reasoning benchmarks indicated that models can answer many tasks correctly even without the visual input, i.e., the visual dependency of these tasks is limited [50][51].
- PAPO showed its largest gains, nearly 8%, on high-visual-dependency tasks, while maintaining consistent improvements on medium- and low-dependency tasks [53][54].

Group 6: Practical Applications
- Several application cases illustrate PAPO's effectiveness on complex geometric problems, such as accurately computing relationships in right triangles and distinguishing between different objects [55][63][64].
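The sketch below illustrates the implicit perception loss described in Group 3 under our own assumptions: compute the KL divergence between the model's output distributions for the original and a corrupted image, and subtract a weighted copy of it from the GRPO loss so that training maximizes the divergence. The function names, corruption strategy, and weight are illustrative, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

# Hedged sketch of an implicit perception loss: make the policy's token
# distribution on the original image diverge from its distribution on a
# corrupted (e.g. masked) image, and fold that KL term into the GRPO loss.
# Names and the weight gamma are assumptions, not the paper's implementation.

def perception_kl(logits_original: torch.Tensor,
                  logits_corrupted: torch.Tensor) -> torch.Tensor:
    """KL( p(. | original image) || p(. | corrupted image) ), batch-averaged."""
    logp_orig = F.log_softmax(logits_original, dim=-1)
    logp_corr = F.log_softmax(logits_corrupted, dim=-1)
    return F.kl_div(logp_corr, logp_orig, reduction="batchmean", log_target=True)

def papo_objective(grpo_loss: torch.Tensor,
                   logits_original: torch.Tensor,
                   logits_corrupted: torch.Tensor,
                   gamma: float = 0.01) -> torch.Tensor:
    """Maximizing the perception KL is expressed by subtracting it from the loss."""
    return grpo_loss - gamma * perception_kl(logits_original, logits_corrupted)

# Toy usage with random logits standing in for a VLM's outputs
# (shape: batch x generated tokens x vocabulary).
logits_orig = torch.randn(4, 32, 5000)
logits_corr = torch.randn(4, 32, 5000)
print(papo_objective(torch.tensor(1.23), logits_orig, logits_corr))
```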
Reward Models Can Scale Too: Shanghai AI Lab Tackles a Reinforcement Learning Weak Spot with a New Policy Discriminative Learning Paradigm
量子位· 2025-07-11 04:00
Core Viewpoint - The article introduces Policy Discriminative Learning (POLAR), a new reward-modeling paradigm that strengthens the post-training phase of large language models (LLMs) and addresses the limitations of traditional reward models in reinforcement learning [1][3][4].

Group 1: Challenges in Reward Modeling
- The design and training of reward models have been a bottleneck for post-training effectiveness and model capability [2].
- Traditional reward models lack systematic pre-training and scaling methods, preventing them from improving in step with additional compute [2].

Group 2: Introduction of POLAR
- POLAR decouples reward modeling from absolute preferences and scales efficiently, adapting to various customized needs based on reference answers [3][5].
- POLAR can assign different scores to the same model outputs depending on the reference style, without retraining the reward model [7].

Group 3: Training Methodology of POLAR
- POLAR is trained in two stages, pre-training and preference fine-tuning, using a contrastive objective that measures the distance between the training policy and the target policy (a toy sketch follows this summary) [21][22].
- The pre-training stage uses large amounts of automatically synthesized data, which makes it highly scalable [22][23].

Group 4: Performance and Scaling Effects
- POLAR exhibits scaling effects: validation loss decreases as a power law as model parameters and compute increase [28][29].
- In preference-evaluation experiments, POLAR outperforms state-of-the-art reward models, with significant gains across tasks, particularly STEM-related ones [32][34].
- POLAR's ability to learn subtle distinctions between policy models improves the generalization of reward signals in real-world applications [35].
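A toy sketch, under stated assumptions, of the contrastive objective described in Group 3: candidates generated by the same policy as the reference answer are treated as positives and candidates from other policies as negatives, so the reward model learns to measure how close a candidate's generating policy is to the target policy. The miniature encoder and scorer below are placeholders, not POLAR's architecture or training recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy sketch of policy-discriminative reward modeling: score a candidate higher
# when it looks like it came from the same policy that produced the reference.
# The bag-of-tokens encoder and bilinear scorer are placeholders only.

class PolicyDiscriminativeRM(nn.Module):
    def __init__(self, vocab_size: int = 32_000, dim: int = 256):
        super().__init__()
        self.encode = nn.EmbeddingBag(vocab_size, dim)   # stand-in text encoder
        self.score = nn.Bilinear(dim, dim, 1)            # candidate-vs-reference score

    def forward(self, reference_ids: torch.Tensor, candidate_ids: torch.Tensor) -> torch.Tensor:
        return self.score(self.encode(candidate_ids), self.encode(reference_ids)).squeeze(-1)

# One contrastive pre-training step: a candidate sampled from the same policy
# as the reference is the positive; a candidate from a different policy is the
# negative. No human preference labels are needed at this stage.
rm = PolicyDiscriminativeRM()
reference = torch.randint(0, 32_000, (8, 64))   # token ids from policy A
positive  = torch.randint(0, 32_000, (8, 64))   # another sample from policy A
negative  = torch.randint(0, 32_000, (8, 64))   # sample from policy B
logits = torch.stack([rm(reference, positive), rm(reference, negative)], dim=1)
loss = F.cross_entropy(logits, torch.zeros(8, dtype=torch.long))
loss.backward()
print(float(loss))
```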
Hugging Face Enters Embodied AI Robotics: Over a Million RMB in Sales Within 5 Hours, Starting at $299
量子位· 2025-07-11 04:00
Core Viewpoint - Hugging Face, known as the "GitHub of AI," has launched an open-source desktop robot called Reachy Mini, which sold over €130,000 (approximately ¥1.09 million) within five hours of its release [1][2].

Group 1: Product Overview
- Reachy Mini comes in two versions: a wired version priced at $299 and a wireless version at $499 [1][2].
- The robot stands 28 cm tall, weighs 1.5 kg, and features a movable head and rotating body, compact enough to sit beside a computer [2][3].
- Despite its small size, Reachy Mini ships with a complete system framework covering structural design and AI integration [5].

Group 2: Technical Specifications
- The robot has six degrees of freedom in head movement, full-body rotation, animated antennas, a wide-angle camera, multiple microphones, and a 5-watt speaker [6].
- The wireless version uses a Raspberry Pi 5 as its compute core, supports Wi-Fi and battery power, has four microphones versus two on the Lite version, and adds an accelerometer for richer interaction [6].

Group 3: Community and Ecosystem
- Users can access over 15 preset actions from Hugging Face's platform for quick exploration and learning, and can upload and share new robot behaviors with the community [7].
- Reachy Mini can run Python and will support JavaScript and Scratch in the future, functioning as a small robot workstation [8].

Group 4: Company Strategy and Market Position
- Hugging Face, valued at $4.5 billion, primarily facilitates the sharing of machine learning models and datasets [10].
- The company is increasingly interested in robotics, launching the LeRobot project in May 2024 to provide open-source robot models and tools and to lower the barrier to robot development [11].
- In April 2025, Hugging Face acquired Pollen Robotics and introduced a humanoid robot named HopeJR, priced at $3,000 [13].

Group 5: Market Challenges and Perspectives
- Low-cost robots like Reachy Mini offer affordability and easy data collection, but face limits in application scenarios and practical utility [19].
- Some community members are skeptical about the long-term viability of low-cost robots, noting that as products iterate, standard humanoid robots may move into higher price brackets [20].
This Cohort of 985-University Graduates Excels at Livestream Selling: 50+ Products in Hot Demand Across the Web
量子位· 2025-07-11 04:00
Core Viewpoint - The article highlights the innovative approach of students from the "Technology Small Courtyard" program at China Agricultural University, who showcased their agricultural products during graduation season and attracted significant online attention and sales [1][5][19].

Group 1: Graduation and Product Launch
- During graduation season, students sold over 50 types of agricultural products, attracting 30 million online viewers [1][5].
- Products included domestically grown durians, pear juice, and other distinctive agricultural items, sold through livestreams on platforms such as Pinduoduo [3][5].

Group 2: Technology Small Courtyard Project
- The "Technology Small Courtyard" project, initiated by an academician, provides technology services to smallholder farmers and has expanded to over 1,800 courtyards nationwide [7][8].
- Students conduct hands-on agricultural research addressing real-world farming problems, such as cultivating domestic durians with innovative fertilization techniques [9][10].

Group 3: Economic Impact and Community Engagement
- The project has significantly increased local agricultural productivity; one village's income quadrupled thanks to improved farming practices [16].
- The graduation event showcased not only products but also the students' commitment to their communities and the agricultural sector [17][19].

Group 4: Pinduoduo's Role and Support
- Pinduoduo has set up a dedicated section for the Technology Small Courtyard on its platform, giving students' agricultural achievements an online showcase [18].
- The company has invested in various initiatives, including research funding and digital training for local farmers, strengthening the overall agricultural ecosystem [25][30].

Group 5: Broader Agricultural Strategy
- Pinduoduo's approach reflects a shift from merely selling products to building a comprehensive agricultural support system that addresses multiple challenges in the sector [28][29].
- The company aims to create a sustainable agricultural network connecting technology, talent, and market access, thereby strengthening the resilience of the agricultural industry [31].
Zuckerberg Offers 1.4 Billion RMB to Get Him to Switch Employers; Cook Didn't Even Try to Keep Him
量子位· 2025-07-11 00:34
Core Viewpoint - The article discusses the significant pay package Meta offered to Pang Ruoming, head of Apple's foundation model team, highlighting the competitive nature of talent acquisition in the AI industry and its implications for both companies and the broader market [1][6][8].

Group 1: Salary Package Details
- Pang Ruoming's total compensation package from Meta is reported to be around $200 million, equivalent to approximately 1.4 billion RMB [2][4].
- The package includes base salary, signing bonuses, and stock options that vest over time, rather than a straightforward annual salary [4][9].
- Within Apple, only CEO Tim Cook's compensation could match the offer, indicating the extreme competition for top AI talent [3][8].

Group 2: Talent Acquisition Strategies
- Meta's approach to hiring top AI talent involves aggressive offers with minimal time for consideration, likened to a "limited-time auction" designed to force quick acceptance [5][10].
- Earlier high-profile hires, such as Yu Jiahui from OpenAI, also received substantial packages, illustrating the escalation of compensation in the AI sector [5][8].
- Meta's strategy may disrupt internal company cultures, since large pay disparities can breed dissatisfaction among existing employees [12][14].

Group 3: Industry Context
- Salaries at Meta's "superintelligence lab" are significantly higher than those for other engineering positions at the company and than typical Silicon Valley pay [19][25].
- The highest-paid software engineers at Meta earn around $480,000 annually, while AI research scientists average $170,000 to $230,000 [16][18].
- By comparison, Google's software engineers earn approximately $340,000, with research scientists among the highest earners [22][24].

Group 4: Pang Ruoming's Background
- Pang Ruoming graduated from Shanghai Jiao Tong University and holds advanced degrees from USC and Princeton [29][30].
- He previously worked at Google for 15 years, contributing to major projects that strengthened Google's data systems and machine learning frameworks [31][34].
- At Apple he led a team of around 100 people developing the core AI models underpinning Apple's AI features [43][46].
He Single-Handedly Carries 90% of Google's AI Promotion; Pichai Really Poached a Rare Talent
量子位· 2025-07-10 08:00
Core Viewpoint - Logan Kilpatrick, a key figure in Google's AI marketing efforts, is responsible for 90% of the company's AI promotional work, having moved from OpenAI to Google [3][22].

Group 1: Logan Kilpatrick's Role and Background
- Kilpatrick is recognized as Google's AI "promotional expert," actively engaging with the developer community on platforms like X [2][3].
- At just 27 years old, he had already worked at NASA and Apple before joining OpenAI as Developer Relations Lead [7][8].
- His experience at OpenAI taught him ecosystem building and developer engagement, earning him the nickname "LoganGPT" among developers [10][11].

Group 2: Transition to Google and Responsibilities
- Kilpatrick joined Google in 2024, where he was tasked with developing the AI Studio platform and integrating it with Google Cloud [12][14].
- Following a significant internal talent migration, his team was moved under DeepMind, enhancing collaboration between research and development [19][20].
- He has been instrumental in promoting Google's Gemini model series, which has over 400 million monthly active users, though it still lags behind ChatGPT's 500 million weekly active users [23].

Group 3: Marketing Challenges and Strategies
- Google's diverse product offerings complicate its marketing and can confuse developers and users [24][25].
- Kilpatrick acknowledges that Google needs to improve its marketing to better communicate ongoing innovations [26][27].
- His approach involves direct engagement with developers, which has been well received and contrasts with traditional marketing channels [28][36].

Group 4: Investment Activities
- Beyond his role at Google, Kilpatrick has invested in over 50 startups, reflecting his active involvement in the tech ecosystem [39].