机器之心

ICML 2025 | Can Large Models Ask the Right Questions When Information Is Incomplete?
机器之心· 2025-07-24 04:08
Core Insights
- The article emphasizes the importance of Active Reasoning (AR) in extending the capabilities of Large Language Models (LLMs) beyond Passive Reasoning (PR) [1][2][3][4][7][10][55]
- It introduces AR-Bench, a benchmark designed to evaluate the active reasoning capabilities of LLMs in real-world scenarios [7][19][55]
Group 1: Active Reasoning
- Active Reasoning (AR) is defined as a model's ability to actively seek out information through questioning and interaction, in contrast to Passive Reasoning (PR), which relies on complete information [3][4][15][18]
- The need for AR arises in practical applications such as medical diagnosis and detective work, where information is often incomplete [3][14][15]
- The article identifies the core challenge of AR as asking the right questions to gather critical information [4][18]
Group 2: AR-Bench
- AR-Bench is introduced as a systematic tool for assessing LLMs' active reasoning capabilities by simulating real-world information-gathering scenarios [19][20][55]
- It consists of three task types, Situation Puzzles (SP), Guessing Numbers (GN), and Dynamic Conversations (DC), each testing different reasoning abilities [21][22][25]
- The evaluation framework includes both outcome assessment and process assessment, focusing on the quality of the questions posed and the effectiveness of information retrieval [25]
Group 3: Findings on LLM Performance
- Current LLMs, including advanced models such as GPT-4o, show significant deficiencies in active reasoning, achieving only 35% accuracy on GN tasks [28][34]
- Even state-of-the-art active reasoning methods do not improve model performance on AR-Bench [33]
- Humans significantly outperform existing LLMs on active reasoning tasks, indicating a gap in model capabilities [34][55]
Group 4: Recommendations for Future Work
- The article suggests several directions for enhancing active reasoning, including collecting high-quality fine-tuning datasets and developing more reliable validation methods for search approaches [56][60]
- It emphasizes the need for further research to enable LLMs to ask effective questions and solve real-world problems [55][60]
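The Guessing Numbers (GN) task is the most mechanical of the three. That makes it a convenient place to show what "asking the right question" means in practice.

The sketch below assumes Bulls-and-Cows-style rules: the secret is a 4-digit code with distinct digits. Each guess returns two counts: digits in the right position, and digits present but misplaced. AR-Bench's actual rules and feedback format may differ.

An active reasoner keeps every hypothesis that is consistent with the feedback so far and asks only about those. Guessing the first surviving hypothesis is an illustrative policy chosen here, not the benchmark's baseline.

```python
from itertools import permutations


def feedback(secret: str, guess: str) -> tuple[int, int]:
    """Return (exact, misplaced) for two codes made of distinct digits (assumed GN rules)."""
    exact = sum(s == g for s, g in zip(secret, guess))
    common = len(set(secret) & set(guess))
    return exact, common - exact


def solve(secret: str, length: int = 4) -> list[str]:
    """Ask questions (guesses) until the secret is found; return every guess made."""
    # Hypothesis space: all codes of `length` distinct digits.
    candidates = ["".join(p) for p in permutations("0123456789", length)]
    history = []
    while True:
        guess = candidates[0]  # naive policy: ask about the first consistent hypothesis
        history.append(guess)
        answer = feedback(secret, guess)
        if answer == (length, 0):
            return history
        # Keep only hypotheses that would have produced the same answer.
        # The secret always survives and the wrong guess never does, so the loop terminates.
        candidates = [c for c in candidates if feedback(c, guess) == answer]


print(len(solve("9876")), "guesses")
```

A smarter questioner would choose the guess that splits the remaining hypotheses most evenly. That is exactly the "ask informative questions" skill the benchmark probes.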
5x Faster Inference, Unlocking Autoregressive Potential: Apple's New Work Lets LLMs Predict the Future
机器之心· 2025-07-24 04:08
Core Viewpoint
- The article discusses advances in language models, focusing on a new framework from Apple researchers that lets autoregressive models perform multi-token prediction, significantly improving inference speed while maintaining generation quality [7][8][9]
Group 1: Advances in Language Models
- Recent progress in language models is attributed to the availability of large-scale text data and the effectiveness of autoregressive training [2]
- Autoregressive models predict each token from the preceding context. This is a clear advantage during training, but it makes inference expensive because tokens must be generated one after another [5][6]
Group 2: New Framework Development
- Apple researchers developed a framework that enables pre-trained autoregressive language models to perform multi-token prediction, achieving speedups of up to 5.35x on code and math tasks and roughly 2.5x on general tasks [7]
- This could significantly reduce AI operating costs and make it feasible for powerful real-time assistants to run smoothly on lightweight devices [9]
Group 3: Research Findings
- The researchers confirmed that language models can generate multiple tokens in a single inference step, a promising route to faster generation [11]
- The study explored whether truly non-autoregressive language models can be trained, leading to a training algorithm that minimally alters the existing autoregressive framework while achieving efficient multi-token generation [13][14]
Group 4: Experimental Results
- Experiments on the Tulu3-8B model showed that the proposed multi-token generation algorithm achieved speedups of roughly 1.5x to 5.2x across tasks, with the largest gains on programming and math tasks [46]
- Introducing mask tokens and a lightweight sampling module let the model use its full depth and representational capacity, outperforming existing multi-token prediction methods [23][24]
Group 5: Future Directions
- Future research could test whether the method also works during pre-training or downstream task adaptation [53]
- Another promising direction is applying diffusion-based generation to multi-token prediction, aiming to balance efficiency and quality [53]
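Multi-token generation only pays off if wrong guesses can be caught cheaply. A common approach, borrowed from speculative decoding, is to accept the longest prefix of the drafted tokens that the ordinary autoregressive model would also have produced. This is an assumption for illustration; the article does not spell out Apple's verification step.

The function and toy model below are hypothetical. In a real system all checks happen in one batched forward pass; this sketch calls `next_token` sequentially for clarity.

```python
from typing import Callable, List


def verify_drafts(prefix: List[int], drafts: List[int],
                  next_token: Callable[[List[int]], int]) -> List[int]:
    """Return the tokens to append: the longest prefix of `drafts` that matches greedy
    autoregressive decoding, plus one token produced by the model itself."""
    accepted: List[int] = []
    for d in drafts:
        target = next_token(prefix + accepted)
        if d != target:
            accepted.append(target)  # replace the first wrong draft and stop
            return accepted
        accepted.append(d)
    # Every draft matched: the verifier also yields the following token for free.
    accepted.append(next_token(prefix + accepted))
    return accepted


def toy_model(seq: List[int]) -> int:
    """Hypothetical 'model' whose greedy next token is (last token + 1) mod 10."""
    return (seq[-1] + 1) % 10


print(verify_drafts([0], [1, 2, 3], toy_model))  # [1, 2, 3, 4]: all drafts accepted
print(verify_drafts([0], [1, 5, 3], toy_model))  # [1, 2]: second draft rejected
```

The number of tokens gained per verification step drives the speedup. The first call yields four tokens for one (batched) check; the second yields only two.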
How Can Verifiable Agentic Workflows Be Achieved? MermaidFlow Opens a New Paradigm for Safe, Robust Agent Workflows
机器之心· 2025-07-24 03:19
Core Viewpoint
- The article discusses advances in Multi-Agent Systems (MAS) and introduces "Agentic Workflow" as a key concept for autonomous decision-making and collaboration among intelligent agents, highlighting the emergence of structured, verifiable workflow frameworks such as MermaidFlow [1][4][22]
Group 1: Introduction to Multi-Agent Systems
- The development of large language models is driving AI agents to evolve from single capabilities toward complex system-level collaboration, making MAS a focal point in both academia and industry [1]
- Leading teams, including Google and Shanghai AI Lab, are launching innovative Agentic Workflow projects to enhance the autonomy and intelligence of agent systems [2]
Group 2: Challenges in Current Systems
- Existing systems face significant challenges, including no guarantee of rationality, insufficient verifiability, and difficulty of intuitive expression, which hinder the reliable implementation and large-scale deployment of MAS [3]
Group 3: Introduction of MermaidFlow
- MermaidFlow, developed by researchers from Singapore's A*STAR and Nanyang Technological University, aims to move agent systems toward structured evolution and safe verifiability [4]
- Traditional workflows are often expressed as imperative code such as Python scripts or JSON trees, which leads to three core bottlenecks: opaque structure, difficult verification, and hard debugging [7][10]
Group 4: Advantages of MermaidFlow
- MermaidFlow introduces a structured graph language that models an agent's behavior plan as a clear, verifiable flowchart, improving the interpretability and reliability of workflows [8][12]
- The structured representation makes agent definitions, dependencies, and data flows visible, which eases debugging and optimization [11][14]
Group 5: Performance and Evolution
- MermaidFlow achieves a success rate above 90% in generating executable, structurally sound workflows, significantly improving the controllability and robustness of agent systems compared with traditional methods [18]
- The framework supports safe evolutionary optimization through a structured approach, allowing modular adjustments while enforcing semantic constraints [16][19]
Group 6: Conclusion
- As MAS and large-model AI continue to evolve, structured, verifiable, and efficient workflows are crucial for agent research, and MermaidFlow provides a foundation for effective collaboration processes [22]
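Part of what makes a flowchart representation verifiable is that structural checks become simple graph operations. The sketch below is a hypothetical validator, not MermaidFlow's own checker.

It accepts only a small subset of Mermaid edge syntax: `A --> B`, with optional `[label]` node text and `|edge label|`. It rejects workflows that contain a cycle and otherwise returns an execution order.

```python
import re
from collections import defaultdict, deque

# One edge per line: Src[optional label] -->|optional edge label| Dst[optional label]
EDGE = re.compile(r"^\s*(\w+)(?:\[[^\]]*\])?\s*-->\s*(?:\|[^|]*\|\s*)?(\w+)(?:\[[^\]]*\])?\s*$")


def parse_flowchart(text):
    """Collect node ids and directed edges from a (restricted) Mermaid flowchart."""
    nodes, edges = set(), []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or line.startswith(("flowchart", "graph", "%%")):
            continue  # header lines and comments carry no edges
        m = EDGE.match(line)
        if not m:
            raise ValueError(f"unsupported line: {line!r}")
        src, dst = m.group(1), m.group(2)
        nodes.update((src, dst))
        edges.append((src, dst))
    return nodes, edges


def topological_order(nodes, edges):
    """Kahn's algorithm: return an execution order, or raise if the workflow has a cycle."""
    indeg = {n: 0 for n in nodes}
    succ = defaultdict(list)
    for s, d in edges:
        succ[s].append(d)
        indeg[d] += 1
    queue = deque(sorted(n for n, k in indeg.items() if k == 0))
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    if len(order) != len(nodes):
        raise ValueError("workflow contains a cycle")
    return order


workflow = """
flowchart TD
    Plan[Planner] --> Code[Coder]
    Code -->|draft| Review[Reviewer]
    Review --> Done
"""
print(topological_order(*parse_flowchart(workflow)))  # ['Plan', 'Code', 'Review', 'Done']
```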
FISHER, the First Multimodal Industrial Signal Foundation Model, Open-Sources Its Weights; from Tsinghua, SJTU, and Others
机器之心· 2025-07-24 03:19
Core Viewpoint
- The article introduces FISHER, the first multimodal industrial signal foundation model. It was developed by researchers from Tsinghua University, Shanghai Jiao Tong University, Beijing Huakong Zhijia Technology Co., Ltd., and North China Electric Power University to unify the modeling of heterogeneous industrial signals [1][3][5]
Research Background
- As more sensors are installed on industrial equipment, analyzing industrial signals efficiently has become difficult because of their significant heterogeneity. This is summarized as the M5 problem: multi-modal, multi-sampling-rate, multi-scale, multi-task, and few faults [3][4]
Research Motivation
- Although industrial signals look different on the surface, their intrinsic features and semantic information are similar, suggesting that a single model can unify the modeling of heterogeneous industrial signals. FISHER exploits these similarities to strengthen its representations [5][7]
FISHER Model Introduction
- FISHER handles industrial signals at any sampling rate by using sub-bands as modeling units and assembling whole signals from them like building blocks. It uses the Short-Time Fourier Transform (STFT) to compute input features and focuses on the high-frequency components that are crucial for fault detection [9][10]
Model Architecture
- FISHER consists of a ViT encoder and a CNN decoder trained with "teacher-student" self-distillation pre-training. The model processes 80% of the masked sub-bands and combines them with the unmasked portions for output [12][13]
Experimental Results
- FISHER's three versions outperformed baseline models by at least 3.91%, 4.34%, and 5.03% on the RMIS benchmark, demonstrating strong generalization. In anomaly detection FISHER performed slightly below BEATs, while in fault diagnosis it significantly surpassed all baselines [19][22]
Performance Analysis
- FISHER's performance curve is consistently above those of the baseline systems, indicating more effective pre-training and scaling. The article suggests that data cleaning will be crucial for scaling up the training of signal foundation models [22][23]
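The sub-band idea can be illustrated with the standard library alone. The sketch makes three assumptions:
- a fixed frame duration, so every STFT bin spans the same number of hertz at any sampling rate;
- a hypothetical fixed sub-band width, `band_hz`;
- as a result, a signal sampled at a higher rate simply yields more sub-bands rather than a different input format.

The random 80% sub-band mask mirrors the masking ratio mentioned above. The framing parameters, window, and masking procedure are illustrative assumptions, not FISHER's published configuration. The naive DFT costs O(N^2) per frame and is only meant for tiny demos.

```python
import cmath
import math
import random


def stft_magnitude(signal, frame_len, hop):
    """Magnitude STFT with a periodic Hann window, via a naive DFT (tiny demos only)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        # Keep only the non-negative frequency bins 0 .. frame_len // 2.
        frames.append([
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(frame_len // 2 + 1)
        ])
    return frames  # [num_frames][frame_len // 2 + 1]


def split_subbands(frames, sample_rate, frame_len, band_hz):
    """Group bins into sub-bands of an (assumed) fixed width `band_hz` in hertz."""
    bin_hz = sample_rate / frame_len
    bins_per_band = max(1, round(band_hz / bin_hz))
    n_bins = len(frames[0])
    return [[frame[i:i + bins_per_band] for frame in frames]
            for i in range(0, n_bins, bins_per_band)]


def mask_subbands(subbands, ratio=0.8, seed=0):
    """Pick the indices of `ratio` of the sub-bands to hide (e.g. from a student network)."""
    rng = random.Random(seed)
    n_mask = round(ratio * len(subbands))
    return sorted(rng.sample(range(len(subbands)), n_mask))


def demo(sample_rate, frame_sec=0.1, band_hz=100.0, tone_hz=100.0):
    """One second of a pure tone; fixed frame duration => each bin spans 1 / frame_sec Hz."""
    frame_len = round(sample_rate * frame_sec)
    signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(sample_rate)]
    frames = stft_magnitude(signal, frame_len, hop=frame_len // 2)
    return frames, split_subbands(frames, sample_rate, frame_len, band_hz)


for sr in (800, 1600):
    frames, bands = demo(sr)
    print(sr, "Hz ->", len(bands), "sub-bands, masked:", mask_subbands(bands))
```

Doubling the sampling rate here roughly doubles the number of 100 Hz sub-bands (5 vs. 9), while each sub-band keeps the same shape.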
Trump Admits He Once Wanted to Break Up Nvidia, Signs the Most Aggressive "AI Action Plan" Yet, Deregulating the Entire Industry
机器之心· 2025-07-24 03:19
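Trump: "Jensen Huang has 100% of AI." This industry is best left alone.

机器之心 report. Editors: 泽南, +0

Trump said he plans to accelerate the development of American artificial intelligence, opening the door for companies to develop the technology free of regulatory constraints and safeguards. He added, however, that AI needs to shed "partisan bias."

US AI Action Plan page: https://www.ai.gov/action-plan
Document: https://www.whitehouse.gov/wp-content/uploads/2025/07/Americas-AI-Action-Plan.pdf

The development of AI in the US may soon accelerate even further.

On Wednesday local time this week, US President Trump detailed a new "AI Action Plan" containing policy guidelines to encourage the growth of the US AI industry.

The 28-page AI Action Plan outlines more than 90 policy actions for this fast-moving technology, which government officials say could be implemented next year.

In 2023, Trump's predecessor Joe Biden signed an executive order requiring safety standards to govern the federal government's use of AI, but Trump revoked that order on his first day in office this January.

To this end, Trump signed three executive orders ...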
This Is the True IMO Olympiad Champion: A Perfect Score, 3 Golds in 5 Attempts, Just Admitted to MIT
机器之心· 2025-07-23 10:36
Core Viewpoint
- The article highlights the strong performance of AI models, particularly ByteDance's Seed Prover, at the International Mathematical Olympiad (IMO). It sets this alongside the achievements of human contestant Warren Bei, who scored a perfect 42/42, showcasing the intersection of AI and human intelligence in mathematics [3][4][5]
Group 1: AI Performance
- ByteDance's Seed Prover solved 4 of the 6 IMO problems for a score of 30 points, a result recognized as silver-medal level [4]
- The article emphasizes growing interest in, and progress on, AI's formal mathematical reasoning in competitive settings such as the IMO [3][4]
Group 2: Warren Bei's Achievements
- Warren Bei, an 11th-grade student from Canada, achieved a perfect score of 42/42 at the IMO, a feat accomplished by only five contestants worldwide this year [5][6]
- He has competed at the IMO for five years, winning three gold medals and two silver medals, reflecting consistent improvement and dedication [9][15]
- He has also won the Canadian Mathematical Olympiad (CMO) multiple times, starting at a young age, which has established him as a prominent figure in the mathematics community [16][17]
Group 3: Personal Insights and Future Aspirations
- Warren Bei says his passion for mathematics lies in the process of problem-solving rather than in the awards themselves [18]
- He keeps an open mind about his future, considering various academic paths while emphasizing the importance of understanding the practical applications of mathematics [12][13]
- He approaches mathematical challenges philosophically, treating intuition and perseverance as the keys to overcoming difficulties [19]
ICCV High-Scoring Paper | Kling's ReCamMaster Goes Viral Overseas, Showing Hollywood Blockbusters from a Whole New Angle
机器之心· 2025-07-23 10:36
Core Viewpoint
- The article introduces ReCamMaster, a video generation model that lets users re-render an existing video along a new camera trajectory. It addresses common problems video creators face, such as equipment limitations and shaky footage [2][17]
Group 1: ReCamMaster Overview
- Users can upload any video and specify a new camera path to re-shoot it, improving the quality of video production [2]
- The model has significant applications in 4D reconstruction, video stabilization, autonomous driving, and embodied intelligence [3][17]
Group 2: Innovation and Methodology
- ReCamMaster's main innovation is a new video-conditioning paradigm: after patchifying, the condition video and the target video are concatenated along the time (frame) dimension. This yields substantial performance gains over previous methods [11][17]
- The model achieves near-product-level performance when re-rendering a single video, demonstrating the potential of video generation models for this task [13][17]
Group 3: MultiCamVideo Dataset
- The MultiCamVideo dataset, built with Unreal Engine 5, consists of 13,600 dynamic scenes, each captured by 10 cameras along different trajectories, for a total of 136,000 videos and 112,000 unique camera paths [13]
- The dataset features 66 different characters, 93 types of actions, and 37 high-quality 3D environments, providing a rich resource for research on camera-controlled video generation and 4D reconstruction [13][17]
Group 4: Experimental Results
- In experimental comparisons, ReCamMaster significantly outperformed baseline methods [15][17]
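The frame-dimension conditioning can be shown at the level of tensor shapes. The sketch patchifies raw pixel frames as a simplification; real video generation models typically patchify latent representations instead. It then stacks the condition frames before the target frames, so the token sequence grows along time rather than along channels.

`patchify`, `frame_concat_tokens`, and `make_video` are illustrative names, not ReCamMaster's code.

```python
def patchify(frame, p):
    """Split an H x W frame (list of rows) into flattened p x p patches, in row-major order."""
    h, w = len(frame), len(frame[0])
    if h % p or w % p:
        raise ValueError("frame height and width must be divisible by the patch size")
    return [[frame[i + di][j + dj] for di in range(p) for dj in range(p)]
            for i in range(0, h, p) for j in range(0, w, p)]


def frame_concat_tokens(cond_video, target_video, p):
    """Patchify every frame, then place the condition frames before the target frames
    along the time axis: [T_cond + T_tgt][patches_per_frame][p * p]."""
    cond = [patchify(f, p) for f in cond_video]
    tgt = [patchify(f, p) for f in target_video]
    return cond + tgt


def make_video(num_frames, h, w, offset):
    """Tiny synthetic video whose pixel values encode their position (for inspection)."""
    return [[[offset + t * h * w + r * w + c for c in range(w)] for r in range(h)]
            for t in range(num_frames)]


cond = make_video(3, 4, 4, offset=0)
target = make_video(3, 4, 4, offset=1000)  # in a diffusion model: the noisy target being denoised
tokens = frame_concat_tokens(cond, target, p=2)
print(len(tokens), len(tokens[0]), len(tokens[0][0]))  # 6 4 4
```

A channel-wise alternative would keep 3 frames and double the feature width of each patch. Concatenating along frames instead lets ordinary attention over the sequence relate every target frame to every condition frame.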
Nearly 3 Million New Users: China's AI Music Tool Mureka Gets a Major V7 Upgrade, and We Used It to Recreate an "Indian Viral Hit"
机器之心· 2025-07-23 08:57
Core Viewpoint
- The article discusses the rapid advance of AI-generated music, focusing on Mureka V7, a new music model from Kunlun Wanwei that significantly surpasses its predecessors and competitors on several performance metrics [6][8][51]
Group 1: Mureka V7 Performance
- Mureka V7 has been released as the strongest domestic (Chinese) music model, outperforming the overseas AI music platform Suno on key metrics such as average performance rating and overall audio quality [6][8]
- Compared with its predecessor Mureka V6, Mureka V7 substantially improves music quality, including melody and arrangement, as well as the realism of vocals and instruments [7][8]
- Reported metrics for Mureka V7 include an average performance rating of 57.7%, mixing quality of 39.0%, and vocal realism of 70.0% [8]
Group 2: Features and Innovations
- Mureka V7 lets users upload audio or video links to create songs that mimic specific artists, enhancing personalization in music creation [12][13]
- The model can analyze user-uploaded music and generate original works in a similar style, demonstrating its versatility [17]
- Mureka V7 can also generate music videos alongside audio, expanding its creative offerings [20]
Group 3: MusiCoT Technology
- Mureka V7 uses an optimized version of MusiCoT, which structures music creation in a way that mirrors the human creative process [25][28]
- MusiCoT lets the model generate music with clear structure and coherence, improving overall output quality [29][33]
- The technology performs strongly in both subjective and objective evaluations, setting a new industry standard [32][34]
Group 4: Voice Model Development
- Kunlun Wanwei also introduced Mureka TTS V1, a speech model that generates customizable voices from user-defined characteristics [39][40]
- This model surpasses competitors on various aspects of voice synthesis, indicating a strong position in the voice generation market [41]
- Mureka TTS V1 can create voices for film, gaming, advertising, and other applications, broadening its market potential [45]
Group 5: Industry Trends
- The article notes an industry shift toward commercializing AI models, with vertical models such as music and video generation becoming the new competitive arena [47][48]
- Kunlun Wanwei's strategy follows this trend, aiming to build a comprehensive ecosystem for AI-generated content across multiple domains [49][50]
- Mureka's growing user base, with nearly 3 million new users since March, highlights its acceptance and impact on music creation [51]
Wireless Synthetic Data Helps Break the Data Bottleneck for Physics-Aware Large Models; SynCheck Wins a Top-Conference Best Paper Award
机器之心· 2025-07-23 08:57
Core Insights
- The article discusses the importance of wireless perception for embodied intelligence and spatial intelligence, emphasizing its ability to overcome the limits of traditional sensing and enhance human-machine interaction [1]
Group 1: Wireless Perception Technology
- Wireless perception is becoming a key technology that lets machines "see" through physical barriers and detect subtle environmental changes, reshaping human-machine interaction [1]
- By capturing how wireless signals are reflected, the technology can perceive movements and actions from several meters away [1]
Group 2: Challenges in Data Acquisition
- A major challenge in building large models that understand physical principles (such as electromagnetism and acoustics) is the scarcity of relevant data, since existing models learn mainly from text and images [2]
- Real-world data collection alone cannot meet the vast data requirements of large models [2]
Group 3: SynCheck Innovation
- The SynCheck framework, developed by researchers from Peking University and the University of Pittsburgh, produces synthetic data whose quality approaches that of real data, addressing the data scarcity problem [3]
- The work received the best paper award at MobiSys 2025 [3]
Group 4: Quality Metrics for Synthetic Data
- The research introduces two quality metrics for synthetic data: affinity (similarity to real data) and diversity (coverage of the real data distribution) [5]
- It establishes a theoretical framework for evaluating synthetic data quality, moving beyond earlier methods that relied on visual cues or specific datasets [7]
Group 5: Performance Improvements with SynCheck
- SynCheck delivered a 4.3% performance gain even in the worst case, where traditional methods caused a 13.4% drop [13]
- Under optimal conditions, gains reached up to 12.9%, and the filtered synthetic data showed higher affinity while keeping diversity comparable to the original data [13]
Group 6: Future Directions
- The research team aims to rethink training paradigms for wireless large models by diversifying data sources and exploring efficient pre-training task architectures [18]
- The goal is a universal pre-training framework for various wireless perception tasks that integrates synthetic and other diverse data sources to support embodied intelligence systems [18]
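The article does not give SynCheck's formulas. The sketch below therefore uses deliberately simple stand-ins to make the two notions concrete:
- affinity of a synthetic sample: 1 / (1 + distance to its nearest real sample);
- diversity: the fraction of real samples that have a synthetic neighbour within a given radius.

These definitions and the threshold-based filter are assumptions for illustration only. The toy data shows how dropping an outlier raises affinity without reducing coverage.

```python
import math


def affinity(sample, real):
    """Stand-in affinity: 1 / (1 + distance from `sample` to its nearest real sample)."""
    return 1.0 / (1.0 + min(math.dist(sample, r) for r in real))


def coverage(synthetic, real, radius):
    """Stand-in diversity: share of real samples with a synthetic neighbour within `radius`."""
    if not synthetic:
        return 0.0
    hits = sum(1 for r in real if min(math.dist(r, s) for s in synthetic) <= radius)
    return hits / len(real)


def filter_synthetic(synthetic, real, min_affinity):
    """Keep only synthetic samples that are close enough to the real data."""
    return [s for s in synthetic if affinity(s, real) >= min_affinity]


real = [(0, 0), (1, 0), (0, 1), (1, 1)]
synthetic = [(0.1, 0), (1, 0.1), (0, 0.9), (5, 5)]  # (5, 5) is an outlier
kept = filter_synthetic(synthetic, real, min_affinity=0.5)
print(kept)  # [(0.1, 0), (1, 0.1), (0, 0.9)]
print(coverage(synthetic, real, 0.2), coverage(kept, real, 0.2))  # 0.75 0.75
```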
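10,000-Word Research Report on the Quark Health Model Surfaces: China's First! Inside the Deep Engineering Behind a Chief-Physician-Level "AI Brain"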
机器之心· 2025-07-23 08:57
Core Insights
- The Quark Health Model has passed assessments in 12 core medical disciplines, making it the first AI model in China to reach this milestone and demonstrating its advanced capabilities in healthcare [1][3]
Group 1: Research Summary
- Building high-performance reasoning models for healthcare remains challenging despite rapid progress in general AI models. The Quark Health Model team established an end-to-end process that improves performance and interpretability by clearly defining data sources and learning methods [3][5]
- The team treats high-quality thinking data (Chain-of-Thought, CoT) as the foundational material for strengthening the model's reasoning through reinforcement learning [5][6]
Group 2: Data Production Lines
- The Quark Health Model uses two parallel data production lines, one for verifiable data and one for non-verifiable data, giving a systematic approach to data quality and model training [6][17]
- The first production line covers cold-start data and model fine-tuning. It uses high-quality data generated by state-of-the-art language models, which medical professionals then validate for accuracy and reliability [19][24]
Group 3: Reinforcement Learning and Training
- The reinforcement learning phase is critical for strengthening the model's reasoning, focusing on generating diverse, high-quality outputs through iterative training and data selection [24][26]
- Training incorporates several mechanisms to evaluate and improve reasoning quality, including preference reward models and verification systems that check the accuracy and relevance of outputs [33][38]
Group 4: Quality Assessment and Challenges
- The Quark Health Model handles the complexity of multi-solution, multi-path scenarios in healthcare with an evaluation system that recognizes the value of diverse reasoning paths and outputs [31][32]
- Training includes strategies to mitigate "cheating" behaviors, so that outputs are not only well-structured but also medically accurate and reliable [40][42]
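The split between verifiable and non-verifiable data suggests a reward function that routes each sample to a different scorer. The sketch below is hypothetical and makes three assumptions:
- responses wrap their reasoning in `<think>...</think>` tags;
- the verifier is a case-insensitive exact match against a reference answer;
- the preference reward model is a function supplied by the caller.

The format guard, which gives zero reward when no reasoning is shown, is one simple example of an anti-"cheating" rule. It is not the Quark team's actual mechanism.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Sample:
    prompt: str
    response: str             # assumed format: "<think>reasoning</think>final answer"
    reference: Optional[str]  # set only for verifiable items (e.g. multiple choice)


def split_response(response: str) -> Tuple[Optional[str], str]:
    """Return (reasoning, answer); reasoning is None when the assumed tags are missing."""
    start, end = response.find("<think>"), response.find("</think>")
    if start == -1 or end < start:
        return None, response.strip()
    reasoning = response[start + len("<think>"):end].strip()
    return reasoning, response[end + len("</think>"):].strip()


def reward(sample: Sample, preference_score: Callable[[str, str], float]) -> float:
    reasoning, answer = split_response(sample.response)
    if not reasoning:  # format guard: an answer with no visible reasoning earns nothing
        return 0.0
    if sample.reference is not None:  # verifiable line: rule-based exact match
        return 1.0 if answer.lower() == sample.reference.strip().lower() else 0.0
    # Non-verifiable line: defer to a (hypothetical) preference reward model, clipped to [0, 1].
    return max(0.0, min(1.0, preference_score(sample.prompt, answer)))


score_stub = lambda prompt, answer: 0.7  # stand-in for a learned preference reward model
print(reward(Sample("Q1", "<think>Symptoms point to B.</think>B", "B"), score_stub))
print(reward(Sample("Q2", "<think>Weigh the risks first.</think>Rest and fluids.", None), score_stub))
```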