机器之心
TIME's 2025 AI 100 list is out: Ren Zhengfei, Liang Wenfeng, Wang Xingxing, James Peng, Xue Lan and others selected, with Chinese influence on full display
机器之心· 2025-08-29 04:34
机器之心 report. 机器之心 editorial team.

TIME has just released its list of the 100 most influential people in AI for 2025. The list features many familiar scholars and entrepreneurs, and a pleasant surprise this year is the larger number of Chinese faces, many of them appearing on an AI list for the first time. Well-known AI leaders among the honorees include Huawei founder Ren Zhengfei, DeepSeek CEO Liang Wenfeng, Unitree Robotics CEO Wang Xingxing, Pony.ai CEO James Peng, Meta Chief AI Officer Alexandr Wang, Tsinghua University professor Xue Lan, and Stanford professor Fei-Fei Li, among others. A partial list of honorees is compiled below; for the full list, see: https://time.com/collections/time100-ai-2025/

More Chinese faces

Leaders

Ren Zhengfei, founder of Huawei
Ren Zhengfei has driven Huawei's long-term, high-intensity investment in AI, aiming to build a fully self-reliant and controllable technology stack. Under his strategic direction, the company has launched the Ascend series of AI chips as a compute foundation, the MindSpore deep learning framework, and the Pangu large models that empower industries across the board, securing its competitiveness in the intelligent era and laying the groundwork for a critical, independent AI ...
Google's Nano Banana takes the internet by storm: a look at the team behind it
机器之心· 2025-08-29 04:34
Core Viewpoint
- Google DeepMind has introduced the Gemini 2.5 Flash Image model, which features native image generation and editing capabilities, enhancing user interaction through multi-turn dialogue and maintaining scene consistency, marking a significant advancement in state-of-the-art (SOTA) image generation technology [2][30].

Team Behind the Development
- Logan Kilpatrick, a senior product manager at Google DeepMind, leads the development of Google AI Studio and Gemini API, previously known for his role at OpenAI and experience at Apple and NASA [6][9].
- Kaushik Shivakumar, a research engineer at Google DeepMind, focuses on robotics and multi-modal learning, contributing to the development of Gemini 2.5 [12][14].
- Robert Riachi, another research engineer, specializes in multi-modal AI models, particularly in image generation and editing, and has worked on the Gemini series [17][20].
- Nicole Brichtova, the visual generation product lead, emphasizes the integration of generative models in various Google products and their potential in creative applications [24][26].
- Mostafa Dehghani, a research scientist, works on machine learning and deep learning, contributing to significant projects like the development of multi-modal models [29].

Technical Highlights of Gemini 2.5
- The model showcases advanced image editing capabilities while maintaining scene consistency, allowing for quick generation of high-quality images [32][34].
- It can creatively interpret vague instructions, enabling users to engage in multi-turn interactions without lengthy prompts (see the API sketch after this summary) [38][46].
- Gemini 2.5 has improved text rendering capabilities, addressing previous shortcomings in generating readable text within images [39][41].
- The model integrates image understanding with generation, enhancing its ability to learn from various modalities, including images, videos, and audio [43][45].
- The introduction of an "interleaved generation mechanism" allows for pixel-level editing through iterative instructions, improving user experience [46][49].

Comparison with Other Models
- Gemini aims to integrate all modalities towards achieving artificial general intelligence (AGI), distinguishing itself from Imagen, which focuses on text-to-image tasks [50][51].
- For tasks requiring speed and cost-effectiveness, Imagen remains a suitable choice, while Gemini excels in complex multi-modal workflows and creative scenarios [52].

Future Outlook
- The team envisions future models exhibiting higher intelligence, generating results that exceed user expectations even when instructions are not strictly followed [53].
- There is excitement around the potential for future models to produce aesthetically pleasing and functional visual content, such as accurate charts and infographics [53].
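For readers who want to try the multi-turn editing workflow described above, the sketch below shows one plausible way to drive it from Python. It assumes the google-genai SDK and a model identifier along the lines of gemini-2.5-flash-image-preview; the exact model id, prompts, and file names are illustrative assumptions, not details confirmed by the article.

```python
# A minimal sketch of multi-turn image generation and editing through a chat
# session, assuming the google-genai Python SDK. The model id below is an
# assumption; check Google's documentation for the current identifier.
from io import BytesIO

from google import genai
from PIL import Image

client = genai.Client()  # expects GOOGLE_API_KEY in the environment
MODEL_ID = "gemini-2.5-flash-image-preview"  # assumed id for Gemini 2.5 Flash Image


def save_image_parts(response, filename: str) -> None:
    """Write any inline image data returned by the model to disk."""
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            Image.open(BytesIO(part.inline_data.data)).save(filename)


# Turn 1: generate an initial image from a text prompt.
chat = client.chats.create(model=MODEL_ID)
first = chat.send_message("A watercolor illustration of a lighthouse at dusk")
save_image_parts(first, "lighthouse_v1.png")

# Turn 2: refine the same scene with a short follow-up instruction; the chat
# history is what lets the model keep the scene consistent across edits.
second = chat.send_message("Keep the same scene, but make it daytime and add seagulls")
save_image_parts(second, "lighthouse_v2.png")
```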
ICCV 2025 Highlight | A new paradigm for 3D ground-truth generation: automated semantic occupancy annotation for open driving scenes!
机器之心· 2025-08-29 00:15
Both the first author and the corresponding author of this paper are from the VDIG (Visual Data Interpreting and Generation) Lab at Peking University's Wangxuan Institute of Computer Technology. The first author is Peking University PhD student Zhou Xiaoyu, and the corresponding author is his doctoral advisor, associate researcher Wang Yongtao. In recent years the VDIG Lab has published a series of major results at top venues including IJCV, CVPR, AAAI, ICCV, ICML, and ECCV, has repeatedly won first- and second-place prizes in major domestic and international computer vision competitions, and collaborates widely with well-known universities and research institutions at home and abroad.

This article introduces AutoOcc, the latest work from Wang Yongtao's team at the Wangxuan Institute and collaborators. For open autonomous-driving scenes, the work proposes an efficient, high-quality open-ended framework for annotating 3D semantic occupancy ground truth. Without any human annotation, it surpasses existing automated semantic occupancy annotation and prediction pipelines, while showing strong generality and generalization. The paper has been accepted to ICCV 2025 as a Highlight.

Paper title: AutoOcc: Automatic Open-Ended Semantic Occupancy Annotation via Vision-Language Guided Gaussian Splatting ...
Grok's code model is here: free for a limited time and extremely fast
机器之心· 2025-08-29 00:15
机器之心 report. Editor: Zenan.

Three times faster than GPT-5, and six times cheaper. On Thursday, Elon Musk's xAI officially released its latest code model, Grok Code Fast 1.

grok-code-fast-1 is a language model trained from scratch on a brand-new architecture. To lay a solid foundation, xAI built a pretraining corpus rich in programming-related content, and for post-training it curated high-quality datasets reflecting real-world pull requests and coding tasks. Earlier this week the model had already been quietly launched on some platforms under the codename Sonic.

In its blog post and model card, xAI describes some of the new model's characteristics, but details about the architecture, data, and fine-tuning are sparse. xAI's inference and supercomputing teams developed a number of innovations that significantly speed up serving of the code model, creating a distinctly responsive experience: before a user has finished reading the first paragraph of the AI's reasoning trace, the model has already invoked dozens of tools. xAI also put substantial effort into cache optimization; when running on partner platforms, cache hit rates typically exceed 90%. Throughout training, xAI worked closely with launch partners to refine and optimize the model's behavior on their platforms. According to xAI, grok-code-fast-1 is already proficient with common tools such as grep, the terminal, and file editing, ...
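Since grok-code-fast-1 is served on xAI and partner platforms, a basic call from Python might look like the sketch below. The base URL, environment variable, and prompt are assumptions for illustration; consult xAI's API documentation for current details and pricing.

```python
# A minimal sketch of calling grok-code-fast-1 through an OpenAI-compatible
# chat-completions endpoint. The base URL and environment variable name are
# assumptions; verify them against xAI's documentation.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],  # assumed env var holding the xAI key
    base_url="https://api.x.ai/v1",     # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="grok-code-fast-1",
    messages=[
        {"role": "system", "content": "You are a fast, tool-using coding assistant."},
        {
            "role": "user",
            "content": "Write a Python function that parses a unified diff and "
                       "returns the list of changed file paths.",
        },
    ],
)

print(response.choices[0].message.content)
```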
Duke University and Zoom release LiveMCP-101: GPT-5 performs best but stays below 60%, and a logarithmic pattern in closed-source models' token efficiency draws attention
机器之心· 2025-08-28 10:40
Core Insights
- The article discusses the introduction of LiveMCP-101, the first evaluation benchmark specifically designed for MCP-enabled Agents in real dynamic environments, consisting of 101 meticulously crafted tasks across various domains such as travel planning, sports entertainment, and software engineering [2][5][27]
- The study reveals that even the most advanced models have a success rate of less than 60% on this benchmark, highlighting significant challenges faced by current LLM Agents in practical deployment [2][5][27]

Research Background and Motivation
- The emergence of external tool interaction capabilities has become central to AI Agents, allowing them to engage dynamically with the real world [5]
- Existing benchmarks are limited as they focus on single-step tool calls and synthetic environments, failing to capture the complexity and dynamism of real-world scenarios [5]
- User queries in reality often involve detailed context and specific constraints, necessitating precise reasoning across multiple tool calls [5]

Evaluation Framework
- The benchmark includes 101 high-quality tasks, covering 41 MCP servers and 260 tools, categorized into Easy, Medium, and Hard difficulty levels [6]
- A Reference Agent mechanism is established to ensure stable and reproducible results by strictly following predefined execution plans [9]
- A dual scoring mechanism is employed, utilizing LLM-as-judge to assess both the results and execution trajectories of the tested agents (see the sketch after this summary) [11]

Key Findings
- Among 18 evaluated models, GPT-5 leads with a 58.42% overall success rate, while performance significantly declines with task difficulty [14]
- The study identifies a strong correlation between execution quality and task success rates, emphasizing the importance of "process correctness" [17]
- Systematic failure modes are categorized into three main types, with planning and orchestration errors being the most prevalent [20]

Comparison with Existing Work
- LiveMCP-101 offers a more realistic assessment by incorporating a larger tool pool and interference tools, exposing robustness issues under long contexts and selection noise [23]
- The benchmark's detailed execution plans and scoring methods provide a clearer differentiation among model capabilities [24]
- The framework allows for precise identification of errors in planning, parameters, or post-processing, guiding engineering optimizations [25]
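To make the dual-scoring idea above concrete, here is a rough sketch of how an LLM-as-judge might grade both the final result and the execution trajectory against a reference run and then combine the two numbers. The prompt wording, weighting, judge model, and JSON schema are all illustrative assumptions, not the benchmark's actual protocol.

```python
# A rough sketch of LLM-as-judge dual scoring: grade the agent's final answer
# and its tool-call trajectory against a reference run, then combine the two.
# Prompt, weights, and judge model are illustrative assumptions.
import json
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

JUDGE_PROMPT = """You are grading an MCP-enabled agent on one task.
Reference answer: {ref_answer}
Reference tool-call trajectory: {ref_traj}
Agent answer: {agent_answer}
Agent tool-call trajectory: {agent_traj}
Return JSON with keys "result_score" and "trajectory_score", each between 0 and 1."""


def dual_score(ref_answer, ref_traj, agent_answer, agent_traj, w_result=0.5):
    """Return a combined score weighting result correctness and process correctness."""
    prompt = JUDGE_PROMPT.format(
        ref_answer=ref_answer,
        ref_traj=ref_traj,
        agent_answer=agent_answer,
        agent_traj=agent_traj,
    )
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder judge model, not the paper's choice
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    scores = json.loads(reply.choices[0].message.content)
    return w_result * scores["result_score"] + (1 - w_result) * scores["trajectory_score"]
```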
Google wins again: after nano banana was "forced" to change its name, netizens came up with 7 incredible ways to use it
机器之心· 2025-08-28 10:40
Core Viewpoint
- Google has claimed the AI image editing model "nano banana" as its own, renaming it "Gemini-2.5-flash-image"; the model has gained significant popularity, comparable to the excitement generated by GPT-4o [2][5].

Group 1: Model Features and Capabilities
- The Gemini-2.5-flash-image model is faster, cheaper, and more capable in image generation and editing compared to competitors, receiving widespread praise as the best AI photo editor [5].
- Users can experience the model for free through Gemini applications and Google AI Studio, allowing for easy image uploads and text prompts [5][10].
- The model can create isometric models by easily isolating buildings or objects, transforming night scenes into daytime images while adding missing architectural details [9][12].

Group 2: Innovative Use Cases
- Users have developed various creative applications, such as generating location-based augmented reality experiences by annotating real-world images [15][18].
- The model can produce multiple views of a subject in a consistent isometric perspective, useful for product modeling and industrial design [12].
- It can generate detailed natural landscape images based on digital elevation models (DEMs), accurately reflecting terrain features [26].

Group 3: Fashion and Style Applications
- The model allows users to upload outfit photos and instantly generate a clothing list, appealing to fashion enthusiasts [27].
- It can also transform the outfits of both real and animated characters, although some minor inaccuracies may occur [31].

Group 4: Creative Content Generation
- Users can create storyboard frames for films by uploading character portraits and providing simple prompts, showcasing versatility in style [37].
- The model can recognize hand-drawn content and generate complex action scenes based on specified poses [40].
- It can convert photographs into black-and-white manga styles while adding dynamic effects, and even create humorous comic panels based on prompts [43][44].

Group 5: Restoration and Enhancement
- The model excels at restoring old photographs and adding color to black-and-white images, demonstrating its capabilities in traditional photo editing tasks [50].
Just updated, the global AI Top 100: five Chinese products in the top 20, ChatGPT beset on all sides, and vibe coding the dark horse
机器之心· 2025-08-28 09:33
Core Insights
- The report presents the fifth edition of the "Top 100 Gen AI Consumer Applications" by Andreessen Horowitz, highlighting the competitive landscape in AI applications across web and mobile platforms [2][5].

Group 1: Rankings and Competitors
- OpenAI's ChatGPT remains the top application, but competitors like Google's Gemini, xAI's Grok, and Meta AI are rapidly closing the gap [3][4].
- The report includes two separate rankings: Web Top 50 and Mobile Top 50, with a total of 100 consumer AI products [5].
- In the web rankings, only 11 new applications entered the list, a decrease from 17 newcomers in March 2025 [9][10].
- Conversely, the mobile rankings saw 14 new entrants, attributed to the cleanup of "ChatGPT imitation apps" in app stores, allowing original products to gain traction [10][11].

Group 2: Chinese Market Influence
- Chinese AI applications are making significant strides, with several products entering the global market [4].
- In the web rankings, three applications primarily serve Chinese users, while in the mobile rankings, 22 out of 50 applications originate from Chinese companies, predominantly in the image and video sectors [15][21].

Group 3: Google's Expanding AI Portfolio
- Google has four products debuting in the rankings, indicating a growing AI product matrix [22][25].
- Gemini ranks second in both web and mobile categories, with its traffic reaching approximately 12% of ChatGPT's [27].
- Google Labs, featuring various AI products, saw a significant traffic increase of over 13% following the launch of Veo 3 [27].

Group 4: Emerging Trends and User Engagement
- The vibe coding sector is gaining traction, with platforms showing impressive user retention rates, indicating long-term growth potential [38][39].
- The report identifies 14 companies that have consistently appeared in the rankings, showcasing the diversity of consumer AI usage [46][48].

Group 5: Potential Future Leaders
- The report highlights potential future leaders in the AI space, with companies like Lovable and Pixverse making significant advancements in their respective categories [56][57].
Yuan Stone Technology officially releases Wen Xiaobai 5, with performance closing in on GPT-5
机器之心· 2025-08-28 09:33
Core Viewpoint
- The article highlights the launch of the new AI model "Wen Xiaobai 5" by Yuan Stone Technology, which is positioned as a strong competitor to GPT-5, showcasing significant advancements in various AI capabilities and practical applications [2][8][22].

Group 1: Model Performance and Comparison
- Wen Xiaobai 5 achieved a score of 64.7 on the AA-Index, surpassing Gemini 2.5 Pro and becoming the domestic AI model closest to GPT-5 [8].
- In STEM capabilities, Wen Xiaobai 5 scored 86, closely approaching GPT-5's performance [13].
- The model scored 17.7 on HLE (Humanity's Last Exam), indicating strong capabilities in understanding and reasoning [14].
- For coding ability, Wen Xiaobai 5 excelled with a score of 79.2 on LiveCodeBench, showcasing its end-to-end problem-solving skills [17].
- On the Instruction Following Benchmark (IFBench), it scored 58.1, indicating robust generalization when following new instructions [19].

Group 2: Practical Applications
- Wen Xiaobai 5 is designed for a wide range of applications, including academic knowledge, writing, office tasks, role-playing, programming, analysis, and healthcare [24].
- The model acts as a professional assistant, efficiently managing tasks such as organizing meeting materials and tracking projects [26].
- It can analyze large datasets for decision-making in marketing and operational analysis, enhancing user efficiency [27].
- The model supports immersive role-playing scenarios, allowing users to engage in various character interactions [30].
- In academic research, Wen Xiaobai 5 assists in parsing complex information and providing structured knowledge frameworks [31].

Group 3: Accessibility and Future Developments
- Wen Xiaobai 5 is now available to all users through its official website and app updates [4].
- The API collaboration channel for Wen Xiaobai 5 is set to open soon, inviting partnerships and integrations [34].
AAAI-26 submissions explode: nearly 30,000 papers, 20,000 from China, and the review system is close to buckling
机器之心· 2025-08-28 04:33
Core Insights
- The AAAI-2026 conference has received an unprecedented number of submissions, with nearly 29,000 papers submitted, of which around 20,000 (approximately two-thirds) are from China [2][5]
- The total number of unique authors submitting papers exceeds 75,000, indicating a significant increase in participation [4]
- The review process is facing challenges due to the high volume of submissions, with about 23,000 papers entering the review process, nearly double the number from AAAI-25 [5][6]

Submission Statistics
- The main technical track of AAAI-2026 received close to 29,000 submissions, with Chinese submissions accounting for approximately 20,000 [2]
- The top three research keywords for submissions are computer vision (nearly 10,000 papers), machine learning (around 8,000 papers), and natural language processing (over 4,000 papers) [5]
- The number of emails received by the organizing team has surpassed five times the total for AAAI-25, peaking at 400 emails per day [4][5]

Review Process and Quality Assurance
- To manage the increased demand, AAAI has recruited over 28,000 committee members, nearly tripling the size of the committee from AAAI-25 [6]
- AAAI is actively investigating potential ethical issues in the review process and has established committees to ensure integrity and accountability [7]
- AI-assisted review experiments have shown promising early results, including tools to detect collusion among reviewers [8]

Trends in AI Research
- The surge in submissions reflects a broader trend of increasing participation from Chinese researchers in AI, with China becoming a dominant force in the field [17][20]
- Reports indicate that the proportion of papers authored by Chinese researchers at top AI conferences has significantly increased over the past decade [20][22]
- By 2024, eight of the top 20 institutions ranked by accepted papers at leading AI conferences are from China, highlighting the country's growing influence [24][25]
EMNLP 2025 | LightThinker, a new method for dynamically compressing CoT reasoning, has arrived
机器之心· 2025-08-28 04:33
Core Viewpoint
- The article discusses LightThinker, a method that makes large language model (LLM) reasoning more efficient by compressing intermediate reasoning steps, reducing memory usage and computational cost while maintaining accuracy [6][27].

Group 1: LightThinker Overview
- LightThinker mimics human cognitive processes by dynamically compressing lengthy reasoning steps into concise representations, significantly reducing the number of tokens stored in the context window [6][27].
- The approach follows a cycle of generating, compressing, and discarding information, which keeps the context small and addresses memory overload and slow computation [14][27].

Group 2: Methodology
- The first step is data reconstruction: training data is modified to include "compression instructions" that teach the model when to compress information [10].
- The second step is attention modification, using a "Thought-based Attention Mask" to control what the model can attend to during reasoning, ensuring it focuses on essential information (a minimal sketch of this idea follows the summary) [12].
- The third step is dynamic reasoning, where the model learns to reason coherently from compact summaries rather than the lengthy original thoughts [14][17].

Group 3: Experimental Results
- LightThinker was tested on four datasets and two different models, showing a 70% reduction in peak memory usage and a 26% decrease in reasoning time while maintaining accuracy [21][27].
- The results indicate that LightThinker strikes a balance between accuracy and efficiency compared to traditional models [24][27].

Group 4: Limitations
- The current method has limitations on mathematical tasks because its data reconstruction relies on rules rather than semantic understanding, which can lose information during compression [33].
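As a rough illustration of the attention-modification step, the sketch below builds a causal attention mask in which tokens generated after a compression step can attend to the prompt and to the compact gist tokens, but not to the raw thought tokens that were compressed away. Segment labels, shapes, and the masking rule are assumptions for illustration, not LightThinker's exact implementation.

```python
# A toy construction of a "thought-based" attention mask: later tokens may see
# the prompt and the gist (compressed) tokens, but not the raw thought tokens.
# Segment labels and the masking rule are illustrative assumptions.
import torch

PROMPT, THOUGHT, GIST, AFTER = 0, 1, 2, 3  # per-token segment labels


def thought_mask(segments: torch.Tensor) -> torch.Tensor:
    """segments: (seq_len,) integer tensor of segment labels in document order.
    Returns a (seq_len, seq_len) boolean mask where True means attention is allowed."""
    n = segments.numel()
    causal = torch.tril(torch.ones(n, n)).bool()          # standard causal mask
    key_is_thought = (segments == THOUGHT).unsqueeze(0)   # shape (1, n)
    query_is_after = (segments == AFTER).unsqueeze(1)     # shape (n, 1)
    # Tokens generated after compression must not attend to raw thought tokens.
    blocked = query_is_after & key_is_thought
    return causal & ~blocked


# Example: 3 prompt tokens, 4 thought tokens, 2 gist tokens, 3 post-compression tokens.
segments = torch.tensor([PROMPT] * 3 + [THOUGHT] * 4 + [GIST] * 2 + [AFTER] * 3)
print(thought_mask(segments).int())
```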