QbitAI
Musk's fastest AI model has arrived
QbitAI · 2025-09-15 05:57
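henry, reporting from Aofeisi | QbitAI (WeChat official account: QbitAI) We won't claim it's the strongest, but it is definitely the fastest! xAI has just released Grok 4 Fast, with generation speeds of up to 75 tokens per second, 10 times faster than the standard version! The GIF below makes the gap easy to see (prompt: "solve the trapping rain water leetcode problem using python, just give me the answer"): while Grok 4 on the left is still saying "let me think about it," Grok 4 Fast is already asking "what's the next question?" Is speed really the one unbeatable skill in AI, as the martial-arts saying goes? Next, let's look at how Grok 4 Fast performs in practice. Hands-on tests by users: Judging from users' tests, Grok 4 Fast is indeed astonishingly fast. In one test, it solved a classic LeetCode problem in under 2 seconds. It isn't limited to Python: asked to write a linked list in C, it also finished in 8 seconds. Beyond coding problems, it answers questions such as "When will quantum computers replace classical computers?" almost instantly. (prompt: "write a linked list in the C programming la ...")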
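The demo prompt asks for the LeetCode "Trapping Rain Water" problem in Python. For readers who want a reference point, here is a minimal sketch of the standard two-pointer solution; it is a common textbook approach, not the code either Grok model actually produced, which the article does not reproduce.

```python
def trap(height: list[int]) -> int:
    """Return how many units of water the elevation map `height` traps."""
    left, right = 0, len(height) - 1
    left_max = right_max = 0
    water = 0
    while left < right:
        # The lower side bounds the water level, so advance from that side.
        if height[left] < height[right]:
            left_max = max(left_max, height[left])
            water += left_max - height[left]
            left += 1
        else:
            right_max = max(right_max, height[right])
            water += right_max - height[right]
            right -= 1
    return water


print(trap([0, 1, 0, 2, 1, 0, 1, 3, 2, 1, 2, 1]))  # 6
```

The water above any bar is limited by the lower of the tallest bars to its left and right, so moving inward from the lower side keeps that side's running maximum a valid bound. This gives O(n) time and O(1) extra space.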
As long as a scientific task can be scored, AI can achieve SOTA results | Google's latest paper
QbitAI · 2025-09-15 05:57
Core Viewpoint - The article discusses a new AI system developed by Google that assists scientists in creating expert-level empirical software, achieving state-of-the-art (SOTA) results across various scientific fields [10][12][30]. Group 1: AI System Development - The AI system utilizes a combination of Large Language Models (LLMs) and tree search algorithms to systematically improve software quality metrics [10][17]. - It addresses the slow and labor-intensive process of developing empirical software, which often takes years to complete [14][15]. - The system can automatically create empirical software for quantifiable tasks, significantly enhancing the efficiency of scientific research [17][24]. Group 2: Performance and Achievements - In bioinformatics, the system discovered 40 novel methods for single-cell data analysis, outperforming top human-developed methods on public leaderboards [25][30]. - In epidemiology, it generated 14 models that surpassed the CDC ensemble and all other individual models for forecasting COVID-19 hospitalizations [10][30]. - The system also produced state-of-the-art software for geospatial analysis, neural activity prediction in zebrafish, time series forecasting, and numerical solutions of integrals [10][30]. Group 3: Methodology and Innovation - The AI system enhances code mutation capabilities by injecting research ideas from highly cited papers, textbooks, and search engine results [21][24]. - It generates numerous candidate software solutions and employs tree search algorithms to filter and optimize these candidates [17][24]. - The integration of complex research ideas allows the system to explore a vast solution space, leading to the discovery of high-quality solutions [24][30]. Group 4: Community Response and Implications - The article notes that the introduction of AI in scientific research has sparked discussions about the appropriateness of delegating research authority to AI [32]. 
- There are concerns regarding the reliability of AI-generated results and the need for human oversight in the verification process [32][40].
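The summary describes an LLM that rewrites candidate programs and a tree search that keeps improving a score. That loop can be sketched as below under loudly stated assumptions: `score` is a hypothetical task metric, `mutate` is a hypothetical random perturbation standing in for the LLM's code rewrites (which, per the summary, can be seeded with ideas from papers and search results), and the UCB-style selection rule is our choice for illustration, not necessarily the one used in Google's paper.

```python
from __future__ import annotations

import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    candidate: float  # stand-in for a candidate program (here a single number)
    score: float      # task metric; higher is better
    visits: int = 0
    children: list[Node] = field(default_factory=list)


def score(candidate: float) -> float:
    """Hypothetical scorable task: the best possible score, 0.0, is at candidate == 3.0."""
    return -(candidate - 3.0) ** 2


def mutate(candidate: float, rng: random.Random) -> float:
    """Hypothetical stand-in for an LLM rewriting a program: a random perturbation."""
    return candidate + rng.gauss(0.0, 1.0)


def tree_search(root_candidate: float, iterations: int = 200,
                c: float = 1.0, seed: int = 0) -> Node:
    rng = random.Random(seed)
    root = Node(root_candidate, score(root_candidate))
    nodes = [root]
    best = root
    for t in range(1, iterations + 1):
        # UCB-style choice: prefer high-scoring nodes, with a bonus for rarely expanded ones.
        parent = max(
            nodes,
            key=lambda n: n.score + c * math.sqrt(math.log(t + 1) / (n.visits + 1)),
        )
        parent.visits += 1
        child_candidate = mutate(parent.candidate, rng)
        child = Node(child_candidate, score(child_candidate))
        parent.children.append(child)
        nodes.append(child)
        if child.score > best.score:
            best = child
    return best


best = tree_search(0.0)
print(f"best candidate {best.candidate:.3f}, score {best.score:.4f}")
```

Because every child is scored before it joins the tree, any task with an automatic metric can be plugged into this loop by replacing `score` and `mutate`.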
Google overtakes ChatGPT on the back of Nano Banana! It tops Apple's App Store as users go wild
QbitAI · 2025-09-15 05:57
Core Viewpoint - Google's Gemini has surpassed ChatGPT in app rankings, driven by the popularity of the image generation tool Nano Banana, which has significantly increased user engagement and application downloads [1][81]. Group 1: Gemini's Rise - Gemini has achieved top rankings not only in the US but also in countries like India, Canada, and Morocco, indicating its global appeal [3]. - The application gained 23 million new users in less than a month, and Nano Banana has been used to edit over 500 million images [5][4]. - DeepMind's CEO has praised Nano Banana as the best among similar products, highlighting its effectiveness [6]. Group 2: Competitive Landscape - The competition between Google and OpenAI dates back to OpenAI's founding, which Musk intended as a counterweight to Google's dominance in AI [82]. - Google has faced setbacks in the past, including criticism of its Bard AI and other tools that fell short of user expectations [84][85]. - The Gemini series has allowed Google to address its technological shortcomings and integrate AI into core products like Search, Chrome, and YouTube, reaching billions of users [86]. Group 3: Market Impact - Gemini's ascent in the App Store marks a pivotal moment in the AI application landscape, signaling a shift in user acceptance and market dynamics [90]. - An AI application's App Store ranking is seen as a benchmark of its influence and of the evolving competitive landscape [90]. - Musk's recent accusations that Apple manipulates app rankings highlight the competitive tensions in the AI space, and Gemini's rise is viewed as a potential counterpoint to the alleged market control [91][92].
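Tencent Hunyuan upgrades the fine-tuning paradigm for AI image generation, optimizing across the entire diffusion trajectory and raising human evaluation scores by 300%
QbitAI · 2025-09-15 03:59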
Core Viewpoint - The article discusses advancements in AI image generation, specifically focusing on the introduction of two key methods, Direct-Align and Semantic Relative Preference Optimization (SRPO), which significantly enhance the quality and aesthetic appeal of generated images [5][14]. Group 1: Current Challenges in Diffusion Models - Existing diffusion models face two main issues: limited optimization steps leading to "reward hacking," and the need for offline adjustments to the reward model for achieving good aesthetic results [4][8]. - The optimization process is constrained to the last few steps of the diffusion process due to high gradient computation costs [8]. Group 2: Direct-Align Method - Direct-Align method allows for the recovery of original images from any time step by pre-injecting noise, thus avoiding the limitations of optimizing only in later steps [5][10]. - This method enables the model to recover clear images from high noise states, addressing the gradient explosion problem during early time step backpropagation [11]. - Experiments show that even at just 5% denoising progress, Direct-Align can recover a rough structure of the image [11][19]. Group 3: Semantic Relative Preference Optimization (SRPO) - SRPO redefines rewards as text-conditioned signals, allowing for online adjustments without additional data by using positive and negative prompt words [14][16]. - The method enhances the model's ability to generate images with improved realism and aesthetic quality, achieving approximately 3.7 times and 3.1 times improvements, respectively [16]. - SRPO allows for flexible style adjustments, such as brightness and cartoon style conversion, based on the frequency of control words in the training set [16]. Group 4: Experimental Results - Comprehensive experiments on the FLUX.1-dev model demonstrate that SRPO outperforms other methods like ReFL, DRaFT, and DanceGRPO across multiple evaluation metrics [17]. 
- In human evaluations, the share of images rated excellent rose from 8.2% to 38.9% for realism and from 9.8% to 40.5% for aesthetic quality after SRPO training [17][18]. - Notably, just 10 minutes of SRPO training allowed FLUX.1-dev to surpass the latest open-source model FLUX.1.Krea on the HPDv2 benchmark [19].
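Two ideas in the summary lend themselves to a small numeric sketch. The first, as we read Direct-Align, rests on an identity: if the noise added to a clean image is chosen ahead of time, the clean image can be recovered in one step from any noise level by inverting the forward process. The second, as we read SRPO, is a reward defined relative to a pair of prompts: the reward under a prompt with a positive control word minus the reward under a prompt with a negative one. The schedule (`alpha_t = 1 - t`, `sigma_t = t`, as in flow-matching models), the toy vectors, and `toy_reward` are all hypothetical; no real reward model or FLUX.1-dev code is involved.

```python
import random


def add_noise(x0, eps, alpha_t, sigma_t):
    """Forward step: x_t = alpha_t * x0 + sigma_t * eps, element by element."""
    return [alpha_t * a + sigma_t * e for a, e in zip(x0, eps)]


def recover_x0(x_t, eps, alpha_t, sigma_t):
    """Invert the forward step; this only works because eps is known in advance."""
    return [(xt - sigma_t * e) / alpha_t for xt, e in zip(x_t, eps)]


def toy_reward(image, text):
    """Hypothetical reward model: favors bright images when the text asks for "bright"."""
    brightness = sum(image) / len(image)
    return brightness if "bright" in text else -brightness


def relative_reward(reward_fn, image, prompt, positive_word, negative_word):
    """SRPO-style signal (our reading): reward with a positive control word minus a negative one."""
    return (reward_fn(image, f"{positive_word} {prompt}")
            - reward_fn(image, f"{negative_word} {prompt}"))


rng = random.Random(0)
x0 = [0.2, -0.5, 0.9]                      # toy "image"
eps = [rng.gauss(0.0, 1.0) for _ in x0]    # noise drawn ahead of time, so it is known
t = 0.95                                   # an early, very noisy step
x_t = add_noise(x0, eps, alpha_t=1.0 - t, sigma_t=t)
print(recover_x0(x_t, eps, alpha_t=1.0 - t, sigma_t=t))  # matches x0 up to rounding error
print(relative_reward(toy_reward, x0, "a cat", "bright", "dark"))
```

Because the relative reward is computed from text prompts alone, changing the control words changes the training signal without collecting new preference data, which matches the summary's point about online adjustment.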
New open-source model reproduces o3-style visual reasoning, achieving deep thinking without massive training
QbitAI · 2025-09-15 03:59
Core Viewpoint - The article discusses the development of Mini-o3, an advanced visual language model (VLM) that performs multi-round visual reasoning and, unlike previous models, can sustain deep reasoning across dozens of steps [1][2][15]. Group 1: Model Development - Mini-o3 was developed jointly by ByteDance and the University of Hong Kong and is designed to perform long-horizon visual search without extensive training resources [13]. - Although it is trained with a limit of 6 interaction rounds, the model can extend its reasoning to dozens of rounds at test time, showcasing its advanced multi-modal reasoning abilities [2][15]. Group 2: Key Design Features - Mini-o3 incorporates three critical design elements: the VisualProbe dataset for exploratory reasoning, an iterative data collection process for diverse reasoning strategies, and a super-round masking strategy to balance training efficiency with test-time scalability [17][19][34]. - The VisualProbe dataset consists of thousands of visual search challenges specifically designed for deep reasoning tasks, enhancing the model's training [17][38]. Group 3: Training Phases - Mini-o3 is trained in two phases: a cold-start supervised fine-tuning (SFT) phase to activate multi-round tool usage, and a reinforcement learning (RL) phase to optimize interaction rounds [19][25]. - The cold-start SFT phase uses a small number of manually constructed samples to generate diverse reasoning trajectories, yielding approximately 6,000 cold-start reasoning paths [24][46]. Group 4: Performance Evaluation - Mini-o3 outperforms existing models in visual search tasks, achieving the best performance across various benchmarks, including VisualProbe, V*Bench, and HR-Bench [43][44]. - The model's performance is attributed to its ability to maintain complex and deep reasoning trajectories, with significant improvements noted in challenging tasks [44][48].
Group 5: Experimental Insights - Experiments indicate that removing RL data leads to a performance drop of about 8.6 points on VisualProbe-Hard, highlighting the importance of challenging RL samples for encouraging complex reasoning [45]. - The super-round masking technique effectively enhances RL performance, particularly in multi-round interaction scenarios, by stabilizing the training process and enabling extended reasoning during testing [48]. Group 6: Conclusion and Future Directions - The technical framework of Mini-o3 provides practical guidance for the development of multi-round interactive multi-modal models and their applications in reinforcement learning [52]. - The research team has made all related code open-source, promoting further exploration and development in this field [53].
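The summary does not spell out how super-round masking works. One plausible reading, which we assume here, is that trajectories cut off at the training turn limit without a final answer are excluded from the policy-gradient loss instead of receiving a negative advantage, so the policy is not pushed toward short answers just to stay under the limit. A minimal sketch with GRPO-style group-normalized advantages (also an assumption) follows; `Trajectory`, `masked_advantages`, and `policy_loss` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Trajectory:
    reward: float    # e.g. 1.0 if the final answer is correct, else 0.0
    num_turns: int   # tool-use rounds actually taken
    finished: bool   # True if a final answer was given before the limit


def masked_advantages(group: list[Trajectory], max_turns: int = 6) -> list[tuple[float, float]]:
    """Return (advantage, mask) per trajectory in one sampled group.

    Advantages are normalized over the whole group (one possible choice).
    mask == 0.0 marks trajectories that ran out of turns without answering;
    they contribute no gradient instead of being punished.
    """
    rewards = [tr.reward for tr in group]
    mu = mean(rewards)
    sd = pstdev(rewards) or 1.0  # all-equal rewards: avoid dividing by zero
    result = []
    for tr in group:
        truncated = not tr.finished and tr.num_turns >= max_turns
        result.append(((tr.reward - mu) / sd, 0.0 if truncated else 1.0))
    return result


def policy_loss(group, logprobs, max_turns=6):
    """REINFORCE-style loss: masked trajectories are simply dropped from the sum."""
    pairs = masked_advantages(group, max_turns)
    kept = sum(mask for _, mask in pairs) or 1.0
    return -sum(mask * adv * lp for (adv, mask), lp in zip(pairs, logprobs)) / kept


group = [
    Trajectory(reward=1.0, num_turns=3, finished=True),
    Trajectory(reward=0.0, num_turns=6, finished=False),  # hit the 6-round cap
    Trajectory(reward=0.0, num_turns=4, finished=True),
]
for adv, mask in masked_advantages(group):
    print(f"advantage {adv:+.3f}  mask {mask}")
```

Here the truncated trajectory still counts toward the group mean and standard deviation; excluding it from normalization as well is an equally reasonable variant.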
TensorFlow, the former king, is dead
QbitAI · 2025-09-15 00:30
Core Viewpoint - The article discusses the decline of TensorFlow as an open-source framework, contrasting it with the rapid rise of PyTorch and other emerging projects in the AI open-source ecosystem [3][8][54]. Group 1: Decline of TensorFlow - TensorFlow's community activity peaked and has since fallen to its lowest level ever, lower even than when the project was first released [3][10]. - Wang Xu, vice-chairman of Ant Group's open-source technology committee, announced that TensorFlow had been removed from the latest open-source landscape map, indicating its diminishing relevance [6][8]. - The decline of TensorFlow reflects a broader trend in the AI open-source landscape, where project lifecycles are now measured in days rather than years [10][53]. Group 2: Open-Source Project Dynamics - The latest open-source landscape map (version 2.0) shows significant turnover, with 39 new projects added and 60 existing projects removed, indicating a rapid evolution in the ecosystem [17][18]. - Projects that fail to maintain community engagement or lag in iteration speed are at risk of being excluded from the landscape [19][20][21]. - The competitive nature of the AI open-source ecosystem emphasizes the need for continuous innovation and effective community management to sustain project viability [24]. Group 3: New Paradigms in Open Source - The definition and operational model of open source are evolving, with some high-activity projects not adhering to traditional open-source licenses [26][30]. - The operational attributes of open source are becoming more pronounced, with platforms like GitHub serving as critical channels for product release and community engagement [31]. - New AI open-source projects are increasingly adopting customized licensing terms to balance community benefits with commercial interests, indicating a shift towards a more pragmatic approach to open source [32][33].
Group 4: Competitive Landscape - The focus of competition in the AI ecosystem has shifted from broad functionality to performance optimization, particularly in model serving and inference efficiency [35][44]. - The decline in activity for agent frameworks suggests a transition from exploratory phases to more practical, performance-driven applications [41][42]. - The emergence of high-performance inference engines highlights the importance of optimizing model serving to reduce operational costs and enhance application viability [43][44]. Group 5: Global Contribution Dynamics - The global AI open-source landscape is characterized by a "dual center" model, with the U.S. and China as the primary contributors, each excelling in different technological domains [46][49]. - U.S. developers lead in infrastructure contributions, while Chinese developers show strong growth in application innovation, driven by local market demands [51][52]. - The evolving contribution dynamics reflect a shift towards application-driven innovation, with real-world needs shaping the development of AI tools and solutions [50].
The Smart Expo that drew 350,000 visitors, all in one article
QbitAI · 2025-09-14 07:30
Core Viewpoint - The 2025 Chongqing Smart Expo showcased the latest advancements in the smart industry, featuring over 550 domestic and international companies and more than 3,000 innovative products, attracting over 350,000 visitors [1][3]. Group 1: Main Themes - The main theme of the expo is artificial intelligence, with two core focuses: "Artificial Intelligence +" and "Smart Connected New Energy Vehicles" [5]. - Five major sectors highlighted include smart robotics, low-altitude economy, smart home, smart driving, and digital cities [5]. Group 2: Key Exhibitors and Technologies - Huawei showcased its comprehensive digital transformation solutions, emphasizing its self-developed Kunpeng processors and Ascend AI hardware, which can enhance business performance by 10% to 30% [10]. - Tencent presented its modular embodied intelligence open platform, TAIROS, and demonstrated interactive AI applications across its suite of apps, including QQ and WeChat [12][18]. - iFlytek focused on consumer products, including AI learning machines and intelligent office tools [20]. Group 3: Telecommunications Companies - China Unicom introduced a "three-in-one" system for AI infrastructure, technology, and industry, showcasing collaborative robotics and AI-driven industrial management [24]. - China Mobile highlighted its smart connected vehicles and AI intelligent terminals, integrating 5G technology with smart home ecosystems [27]. - China Telecom's Tianyi Cloud featured a quantum computing model and advanced cloud services, showcasing its leadership in quantum technology [31]. Group 4: State-Owned Enterprises - State Grid displayed nine self-developed chips, addressing the "bottleneck" issue in chip technology with capabilities ranging from 0.1 to 256 TOPS [33]. - Sinopec presented a miniature model of an intelligent factory, demonstrating advanced robotics and drone inspection systems [35]. 
- PetroChina presented a model of its first exploration well drilled deeper than 10,000 meters and launched an app tailored to the energy and chemical industry [39]. Group 5: Academic Contributions - Chongqing University developed a digital twin system for coal mines, successfully tested in real-world conditions [41]. - Chongqing Jiaotong University showcased an intelligent inspection system for tunnels, integrating cloud and edge computing [45]. - Chongqing Normal University presented advanced brain imaging and brain-computer interface technologies [49]. Group 6: Smart Home Innovations - Xiaomi and Haier displayed comprehensive smart home solutions, integrating various smart devices for enhanced user experience [79][81]. - Midea showcased its smart kitchen ecosystem, emphasizing climate control and energy efficiency [87]. - Various AI-powered pet care products were introduced, including smart feeding and health tracking devices [96][99]. Group 7: Low-Altitude Economy - The expo featured a dedicated low-altitude economy area showcasing drones and air taxis, with DJI presenting its FLYCART 100, which can carry 80 kg [103][104]. - Xunyi Technology has built urban air logistics networks in collaboration with major delivery platforms, focusing on medical supply delivery [112][114]. - The concept of "air taxis" was highlighted, with companies like GAC and VoloCity planning to launch electric air taxis for urban transport [122][125]. Group 8: Smart Connected Vehicles - The expo emphasized smart connected new energy vehicles, with Tesla showcasing its latest models, including the Model Y L with a range of 751 km [130][132]. - Various automakers, including Changan and BYD, presented their advancements in AI integration and autonomous driving technologies [152][173]. - The focus on "smart driving" reflects the industry's shift towards enhancing vehicle safety and interactivity through AI and IoT technologies [173].
Academic research: now you can just ask Baidu AI
QbitAI · 2025-09-14 07:30
Core Viewpoint - Baidu Academic is transforming into a comprehensive "Research platform" that covers the entire lifecycle of academic papers, from searching and reading to creating and editing, aiming to become the first one-stop AI academic platform in the industry [1][2][29]. Group 1: Features of the New Platform - The platform will include AI academic search, AI literature summarization, AI reading, and paper mapping, enhancing the efficiency of literature collection and research [1][3][7]. - Users can input keywords to find relevant literature and use AI Q&A for summarization, significantly reducing time spent switching between different PDFs [9][10]. - The literature mapping feature allows users to visualize classic literature, research hotspots, and development trajectories in their field within minutes [10][12]. Group 2: Reading and Writing Support - The literature summarization function supports batch uploading of up to 100 files and generates structured summaries in 30 seconds, enabling researchers to grasp core content quickly [13][14]. - The AI reading feature can accurately restore the layout of foreign-language literature and provide automatic translations for a smoother reading experience [15][16]. - For the writing phase, a topic recommendation function suggests valuable, innovative research directions based on existing literature [16][19]. Group 3: Academic Resource Integration - Baidu Academic has partnered with professional data analysis platforms like SPSSPRO, allowing for a seamless process from data acquisition to analysis and result presentation [22][23]. - Baidu Academic has now indexed 690 million documents, the most of any platform worldwide, adds more than 420,000 new documents daily, and covers 97% of Chinese-language literature [31][34]. - The platform aims to lower research barriers and enhance academic content dissemination by covering all professional fields classified by the Ministry of Education [33][34].
Group 4: Academic Community Engagement - Baidu Academic has created profiles for 4.2 million scholars, including renowned academicians, facilitating information exchange within the academic community [36][38]. - The vision of upgrading the "academic foundation" to a "global academic ecosystem engine" is becoming increasingly feasible as the academic ecosystem continues to improve [38][40].
What? A math challenge Terence Tao couldn't finish in 18 months was completed by this "AI Gauss" in three weeks
QbitAI · 2025-09-14 05:05
Core Viewpoint - The new AI agent named Gauss has demonstrated remarkable capabilities by solving a mathematical challenge in just three weeks, a task that took renowned mathematicians 18 months to make progress on [2][4][8]. Group 1: Gauss and Its Capabilities - Gauss is developed by a company called Math and is the first AI agent capable of assisting top mathematicians in formal verification through autoformalization [5]. - The process of formalization involves converting human-written mathematical content into a machine-readable format, allowing for verification of correctness [6]. - Gauss has generated approximately 25,000 lines of Lean code, which includes over a thousand theorems and definitions, a task that typically requires years to complete [10][11]. Group 2: Comparison with Historical Projects - The largest historical formalization projects have taken up to ten years and produced around 500,000 lines of code, while Gauss's output is significantly faster and more efficient [12]. - In comparison, the standard mathematical library Mathlib, which contains about 2 million lines of code and 350,000 theorems, took over 600 contributors eight years to develop [13]. Group 3: Technical Infrastructure and Future Plans - To support Gauss's operations, Math collaborated with Morph Labs to develop the Trinity infrastructure, which involves thousands of concurrent agents, each with its own Lean environment, consuming several terabytes of cluster memory [14]. - The Math team anticipates that Gauss will significantly reduce the time required to complete large mathematical projects and plans to increase the total amount of formalized code by 100 to 1,000 times within the next 12 months [15][16]. Group 4: Insights from Mathematicians - Mathematician Terence Tao highlighted the importance of clearly defining both explicit and implicit goals in formalization projects, especially as powerful AI tools change the dynamics of project execution [18][19]. 
Group 5: Company Background - The founder of Math, Christian Szegedy, is recognized for his contributions to the field, including co-authoring the influential paper on Batch Normalization, a key technology for scaling deep learning [21][24][26].
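To make "formalization" concrete: it means restating mathematics so that a proof assistant such as Lean can check every step. The toy snippets below assume a recent Lean 4 toolchain (where the `omega` tactic is built in) and use only the core library, no Mathlib. They have nothing to do with Gauss's actual output; they only show what machine-checkable statements and proofs look like.

```lean
-- The informal statement "adding natural numbers is commutative", stated so Lean can check it.
-- The proof is a term: the core lemma `Nat.add_comm` applied to `a` and `b`.
theorem my_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A statement closed by an automated decision procedure for linear arithmetic
-- instead of a hand-written proof term.
example (n : Nat) : 2 * n = n + n := by
  omega
```

If any step were wrong, Lean would reject the file; that mechanical check is what makes the tens of thousands of generated lines trustworthy without a human rereading each one.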
A robot takes a job in a laundromat and starts earning money! Built by a former Apple AI executive
QbitAI · 2025-09-14 05:05
Core Viewpoint - The article discusses the introduction of a laundry folding robot named Isaacs, developed by Weave Robotic, which is designed to automate the labor-intensive task of folding clothes in laundromats, marking a significant advancement in household robotics [1][3][4]. Group 1: Company Overview - Weave Robotic was founded by former Apple team members, indicating a strong background in technology and innovation [4][15]. - The company completed three rounds of financing even before the official product launch, showcasing investor confidence in its potential [4]. Group 2: Technology and Functionality - Isaacs is not just a folding robot; it is a versatile household robot capable of performing various tasks, including organizing items and home security in the future [12][14]. - The robot operates on a three-tiered technological framework, which includes a self-trained vision-language-action (VLA) model for precisely identifying and folding clothes [10][18]. - Isaacs completes about 70% of folding autonomously, with humans intervening only when necessary [18]. Group 3: Operational Process - The workflow begins with the laundromat handling washing and drying; Isaacs then takes over the folding, which is labor-intensive and must meet a certain standard of neatness [5][8]. - Specific folding standards are established, ensuring that items like shirts are folded uniformly and neatly, with attention to details such as collar orientation [6][7]. Group 4: Future Prospects - The company plans to expand Isaacs' functionalities beyond folding clothes to include various household tasks, addressing diverse family needs [14]. - Privacy considerations are integrated into the design, with features that allow the robot to shut down its camera when idle [14].