机器之心
Search documents
实锤了:GPU越多,论文接收率越高、引用越多
机器之心· 2025-10-17 08:12
Core Insights - The article discusses the significant advancements in the AI field over the past three years, primarily driven by the development of foundational models, which require substantial data, computational power, and human resources [2][4]. Resource Allocation and Research Impact - The relationship between hardware resources and the publication of top-tier AI/ML conference papers has been analyzed, focusing on GPU availability and TFLOPs [4][5]. - A total of 5,889 foundational model-related papers were identified, revealing that stronger GPU acquisition capabilities correlate with higher acceptance rates and citation counts in eight leading conferences [5][9]. Research Methodology - The study collected structured information from 34,828 accepted papers between 2022 and 2024, identifying 5,889 related to foundational models through keyword searches [8][11]. - A survey of 229 authors from 312 papers indicated a lack of transparency in GPU usage reporting, highlighting the need for standardized resource disclosure [9][11]. Growth of Foundational Model Research - From 2022 to 2024, foundational model research has seen explosive growth, with the proportion of related papers in top AI conferences rising significantly [18][19]. - In NLP conferences, foundational model papers have outpaced those in general machine learning conferences [22]. Research Contributions by Academia and Industry - Academic institutions contributed more papers overall, while top industrial labs excelled in single-institution output, with Google and Microsoft leading in paper production [29][32]. - The research efficiency between academia and industry is comparable, with industry researchers publishing an average of 8.72 papers and academia 7.93 papers [31]. Open Source Models and GPU Usage - Open-source models, particularly the LLaMA series, have become the predominant choice in research, favored for their flexibility and accessibility [35][37]. - NVIDIA A100 is the most widely used GPU in foundational model research, with a notable concentration of GPU resources among a few institutions [38][39]. Funding Sources and Research Focus - Government funding is the primary source for foundational model research, with 85.5% of papers receiving government support [41][42]. - The focus of research has shifted towards algorithm development and inference processes, with a significant portion of papers dedicated to these areas [42]. Computational Resources and Research Output - The total computational power measured in TFLOPs is more strongly correlated with research output and citation impact than the sheer number of GPUs used [44][45]. - While more resources can improve acceptance rates, the quality of research and its novelty remain critical factors in the review process [47].
AI招聘有多离谱?小哥在LinkedIn埋了行代码,钓出一堆AI,吸引900万人围观
机器之心· 2025-10-17 08:12
Core Viewpoint - The article discusses the paradox in the job market where both job seekers and HR departments are using AI tools, leading to a situation where applications are not resulting in hires, highlighting a cycle of mutual deception between candidates and recruiters [2][4]. Group 1: AI in Recruitment - The use of AI in recruitment has become prevalent, with nearly 70% of companies expected to utilize AI in their hiring processes by the end of 2025 [22][23]. - AI tools are designed to enhance recruitment efficiency, but they also introduce security risks, such as "prompt injection" attacks, which manipulate AI systems to produce unintended outputs [24][25]. Group 2: Examples of AI Manipulation - Cameron Mattis, a Stripe executive, cleverly embedded a code in his LinkedIn profile to filter out AI-generated messages, resulting in a humorous response that included a flan recipe [5][12]. - Job seekers have been experimenting with ways to deceive AI screening tools, such as embedding hidden instructions in their resumes, which can lead to unexpected outcomes [31][34]. Group 3: Emerging Competition - LinkedIn is expanding its AI integration in collaboration with Microsoft, planning to use user data for training generative AI models starting November 3, 2025 [36][37]. - OpenAI is developing a new AI-driven recruitment platform, OpenAI Jobs Platform, expected to launch in mid-2026, aiming to match company needs with employee capabilities [39][40].
黑洞物理学家加盟OpenAI,GPT-5 Pro半小时重现人类数天推导
机器之心· 2025-10-17 04:09
Core Insights - OpenAI is launching a new initiative called "OpenAI for Science" aimed at accelerating scientific discoveries through AI technology [2][11] - Alex Lupsasca, a theoretical physicist, has joined this initiative as its first academic researcher, highlighting the potential of AI in advancing scientific research [1][15] - The capabilities of GPT-5 Pro have impressed Lupsasca, as it was able to independently derive a new symmetry in black hole perturbation theory in under 30 minutes, a task that took him several days [4][8] Group 1: AI and Scientific Research - The initiative aims to create an AI-driven platform to enhance human scientific discovery processes [2] - Lupsasca's experience with GPT-5 Pro demonstrates its ability to tackle complex theoretical problems, suggesting a significant leap in AI's role in scientific research [10][12] - The connection between AI and natural sciences is becoming increasingly significant, with AI expected to have a deeper impact across various academic research fields [13] Group 2: Lupsasca's Contributions and Achievements - Lupsasca's research includes a new conformal symmetry related to static, axisymmetric Kerr black holes, which has important implications for gravitational wave astronomy [7][15] - He has received multiple awards, including the 2024 Physics New Horizons Prize and the IUPAP Young Scientist Award for his work in black hole imaging [15] - Lupsasca is also the chief scientist of the Black Hole Explorer (BHEX) project, which aims to launch a satellite for clearer imaging of black holes by 2032 [15]
南洋理工揭露AI「运行安全」的全线崩溃,简单伪装即可骗过所有模型
机器之心· 2025-10-17 04:09
Core Viewpoint - The article emphasizes that when AI exceeds its predefined boundaries, its behavior itself constitutes a form of insecurity, introducing the concept of Operational Safety as a new dimension in AI safety discussions [7][9]. Summary by Sections Introduction to Operational Safety - The research introduces the concept of Operational Safety, aiming to reshape the understanding of AI safety boundaries in specific scenarios [4][9]. Evaluation of AI Models - The team developed the OffTopicEval benchmark to quantify risks associated with Operational Safety, focusing on whether models can appropriately refuse to answer out-of-domain questions [12][24]. - The evaluation involved 21 different scenarios with over 210,000 out-of-domain data points and 3,000 in-domain data points across English, Chinese, and Hindi languages [12]. Test Results and Findings - Testing revealed that nearly all major models, including GPT and Qwen, failed to meet Operational Safety standards, with significant drops in refusal rates for out-of-domain questions [14][16]. - For instance, models like Gemma-3 and Qwen-3 experienced refusal rate declines exceeding 70% when faced with deceptively disguised out-of-domain questions [16]. Proposed Solutions - The research suggests practical solutions to enhance models' adherence to their operational boundaries, including prompt-based steering methods that do not require retraining [20][21]. - Two lightweight prompting methods, P-ground and Q-ground, were shown to significantly improve models' operational safety scores, with P-ground increasing Llama-3.3's score by 41% [21][22]. Conclusion and Industry Implications - The paper calls for a reevaluation of AI safety, highlighting that AI must not only be powerful but also trustworthy and duty-bound [24][25]. - It stresses that operational safety is a prerequisite for deploying AI in serious applications, urging the establishment of new evaluation paradigms that reward models capable of recognizing their limitations [25].
按照Bengio等大佬的AGI新定义,GPT-5才实现了不到10%
机器之心· 2025-10-17 04:09
Core Insights - The article discusses a new comprehensive and testable definition of Artificial General Intelligence (AGI) proposed by leading scholars and industry leaders, emphasizing that AGI should match or exceed the cognitive capabilities of well-educated adults [1][3][47]. Definition and Framework - The proposed framework defines AGI as an AI that exhibits cognitive multi-functionality and proficiency comparable to that of well-educated adults, moving beyond narrow specialization [3][4]. - The framework is based on the Cattell-Horn-Carroll (CHC) theory, which categorizes human intelligence into various broad and narrow abilities, providing a structured approach to assess AI systems [6][48]. Measurement of AGI - The framework introduces a standardized "General Intelligence Index" (AGI score) ranging from 0% to 100%, where 100% indicates full AGI capability [7]. - It identifies ten core cognitive components derived from the CHC theory, each weighted equally to emphasize the breadth of cognitive abilities [9][48]. Performance of Current Models - The article evaluates the performance of GPT-4 and GPT-5 across these cognitive components, revealing that both models scored below 10% in most areas, indicating a significant gap from true AGI [12][50]. - For instance, GPT-4 achieved an overall AGI score of 27%, while GPT-5 scored 58%, highlighting rapid progress yet substantial distance from achieving AGI [50]. Cognitive Structure and Limitations - The cognitive structure of contemporary AI systems is described as "jagged," showing high proficiency in certain areas like general knowledge and mathematics, but severe deficiencies in foundational cognitive mechanisms, particularly in long-term memory storage [25][49]. - The lack of continuous learning capabilities leads to a "memory loss" effect, limiting the practical utility of AI systems [25]. Capability Distortions - The uneven distribution of AI capabilities can lead to "capability contortions," where strengths in certain areas mask weaknesses in others, creating a false impression of general intelligence [27][28]. - For example, reliance on extensive context windows to compensate for poor long-term memory storage is inefficient and not scalable for tasks requiring prolonged context accumulation [29]. Interdependence of Cognitive Abilities - The article emphasizes the interdependence of cognitive abilities, noting that complex tasks often require the integration of multiple cognitive domains [37][38]. - This interconnectedness suggests that assessments of AGI should consider the holistic nature of intelligence rather than isolated capabilities [38]. Challenges to Achieving AGI - The article outlines significant challenges to achieving AGI, including the need for reliable long-term memory systems and the ability to learn dynamically from experiences [42][51]. - It stresses that current AI systems are far from achieving the cognitive breadth and depth required for AGI, with many foundational issues still unresolved [50][52].
NeurIPS2025 | 攻破闭源多模态大模型:一种基于特征最优对齐的新型对抗攻击方法
机器之心· 2025-10-17 04:09
Core Insights - The article discusses the advancements and security vulnerabilities of Multimodal Large Language Models (MLLMs), particularly their susceptibility to adversarial attacks [2][8] - It introduces a novel attack framework called FOA-Attack, which enhances the transferability of adversarial samples across different models by optimizing feature alignment at both global and local levels [3][11] Group 1: Background and Motivation - MLLMs like GPT-4 and Claude-3 exhibit exceptional performance in tasks such as image understanding and visual question answering, but they inherit vulnerabilities from their visual encoders, making them prone to adversarial attacks [8][10] - Adversarial attacks can be categorized into non-targeted (aiming to produce incorrect outputs) and targeted (aiming for specific outputs), with the latter being particularly challenging in black-box scenarios where model internals are inaccessible [10][11] Group 2: FOA-Attack Framework - FOA-Attack employs a dual-dimensional alignment strategy, focusing on both global features (using cosine similarity loss for [CLS] tokens) and local features (using clustering and optimal transport for patch tokens) to improve transferability [6][11] - The framework includes a dynamic weight integration strategy that adapts the influence of multiple models during the attack generation process, enhancing the overall effectiveness of the attack [6][11] Group 3: Experimental Results - FOA-Attack significantly outperforms existing state-of-the-art methods in both open-source and closed-source MLLMs, achieving remarkable success rates, particularly against commercial closed-source models like GPT-4 [4][19] - In experiments, FOA-Attack achieved an attack success rate (ASR) of 75.1% against GPT-4, showcasing its effectiveness in real-world applications [19][24] Group 4: Conclusion and Future Directions - The findings highlight the vulnerabilities of current MLLMs in the visual encoding phase and suggest new defensive strategies, particularly in fortifying local feature robustness [24][25] - The authors have made the paper and code publicly available for further exploration and discussion, indicating a commitment to advancing research in this area [25][27]
欧几里得的礼物:通过几何代理任务增强视觉-语言模型中的空间感知和推理能力
机器之心· 2025-10-17 02:11
Core Insights - The article discusses the limitations of current multimodal large language models (MLLMs) in spatial intelligence, highlighting that even advanced models struggle with basic spatial tasks that children can perform easily [2][5] - A new approach is proposed, focusing on geometric problems as a means to enhance spatial perception and reasoning in vision-language models [6][8] Group 1: Limitations of Current Models - Despite significant advancements, state-of-the-art MLLMs still lack true spatial intelligence, often making errors in tasks like counting objects or identifying nearby items [2][5] - Over 70% of errors in spatial reasoning tasks stem from the models' inability to infer spatial phenomena rather than deficiencies in visual recognition or language processing [5] Group 2: Proposed Solutions - The research team aims to improve model performance by learning from a broader range of spatial phenomena, moving beyond single dataset limitations [5][8] - The study introduces a new dataset, Euclid30K, containing 29,695 geometric problems, which is designed to enhance the models' spatial reasoning capabilities [12][13] Group 3: Geometric Problems as Proxies - Solving geometric problems requires skills such as shape recognition, spatial relationship inference, and multi-step logical reasoning, which are also essential for spatial perception tasks [10] - Evidence from educational psychology suggests a strong correlation between geometric problem-solving and spatial intelligence, indicating that targeted practice can enhance spatial abilities [10] Group 4: Dataset Characteristics - The Euclid30K dataset includes a diverse range of geometric problems, with a total of 29,695 questions, including 18,577 plane geometry and 11,118 solid geometry questions [13] - The dataset was meticulously curated to ensure high quality, with answers verified for accuracy [12][13] Group 5: Model Training and Results - The models were trained using standard GRPO methods, and results showed performance improvements across various benchmarks after training with geometric problems [15][17] - A causal ablation study confirmed that the performance gains were attributable to the geometric tasks rather than other factors like algorithm design or data volume [17]
单块GPU上跑出实时3D宇宙,李飞飞世界模型新成果震撼问世
机器之心· 2025-10-17 02:11
Core Insights - The article discusses the launch of RTFM (Real-Time Frame Model), a generative world model that can run on a single H100 GPU, enabling real-time, consistent 3D world generation from 2D images [2][3][10]. Group 1: RTFM Overview - RTFM generates new 2D images from one or more 2D inputs without explicitly constructing a 3D representation, functioning as a learning-based renderer [5][17]. - The model is trained on large-scale video data and learns to model 3D geometry, reflections, and shadows through observation [5][17]. - RTFM blurs the line between reconstruction and generation, handling both tasks simultaneously based on the number of input views [20]. Group 2: Technical Requirements - Generative world models like RTFM require significant computational power, with the need to output over 100,000 tokens per second for interactive 4K video streams [11]. - To maintain consistency in interactions lasting over an hour, the model must process over 100 million tokens of context [12]. - Current computational infrastructure makes such demands economically unfeasible, but RTFM is designed to be efficient enough to run on existing hardware [13][15]. Group 3: Scalability and Persistence - RTFM is designed to be scalable, allowing it to benefit from future reductions in computational costs [14]. - The model addresses the challenge of persistence in generated worlds by modeling the spatial pose of each frame, enabling it to remember and reconstruct scenes over time [23][24]. - Context juggling mechanisms allow RTFM to maintain geometric structure in large scenes while ensuring true world persistence [25].
RAG、Search Agent不香了?苹果DeepMMSearch-R1杀入多模态搜索新战场
机器之心· 2025-10-17 02:11
Core Insights - Apple has introduced a new solution for empowering multimodal large language models (MLLMs) in multimodal web search, addressing inefficiencies in existing methods like retrieval-augmented generation (RAG) and search agents [1][5]. Group 1: Model Development - The DeepMMSearch-R1 model allows for on-demand multi-round web searches and dynamically generates queries for text and image search tools, improving efficiency and results [1][3]. - A two-stage training process is employed, starting with supervised fine-tuning (SFT) followed by online reinforcement learning (RL) using the GRPO algorithm, aimed at optimizing search initiation and tool usage [3][4]. Group 2: Dataset Creation - Apple has created a new dataset called DeepMMSearchVQA, which includes diverse multi-hop visual question-answering samples presented in multi-round dialogue format, balancing different knowledge categories [4][7]. - The dataset construction involved selecting 200,000 samples from the InfoSeek training set, resulting in approximately 47,000 refined dialogue samples for training [7]. Group 3: Training Process - In the SFT phase, the Qwen2.5-VL-7B model is fine-tuned to enhance its reasoning capabilities for web search information while keeping the visual encoder frozen [9]. - The RL phase utilizes GRPO to improve training stability by comparing candidate responses generated under the same prompt, optimizing the model's tool selection behavior [10][12]. Group 4: Performance Results - The DeepMMSearch-R1 model significantly outperforms RAG workflows and prompt-based search agents, achieving a performance increase of +21.13% and +8.89% respectively [16]. - The model's ability to perform targeted image searches and self-reflection enhances overall performance, as demonstrated in various experiments [16][18]. Group 5: Tool Utilization - The model's tool usage behavior aligns with dataset characteristics, with 87.7% tool invocation in the DynVQA dataset and 43.5% in the OKVQA dataset [20]. - The RL model effectively corrects unnecessary tool usage observed in the SFT model, highlighting the importance of RL in optimizing tool efficiency [21]. Group 6: Generalization Capability - The use of LoRA modules during SFT and KL penalty in online GRPO training helps maintain the model's general visual question-answering capabilities across multiple datasets [23][24].
当Search Agent遇上不靠谱搜索结果,清华团队祭出自动化红队框架SafeSearch
机器之心· 2025-10-16 07:34
Core Insights - The article discusses the vulnerabilities of large language model (LLM)-based search agents, emphasizing that while they can access real-time information, they are susceptible to unreliable web sources, which can lead to the generation of unsafe outputs [2][7][26]. Group 1: Search Agent Vulnerabilities - A real-world case is presented where a developer lost $2,500 due to a search error involving unreliable code from a low-quality GitHub page, highlighting the risks associated with trusting search results [4]. - The research identifies that 4.3% of nearly 9,000 search results from Google were deemed suspicious, indicating a prevalence of low-quality websites in search results [11]. - The study reveals that search agents are not as robust as expected, with a significant percentage of unsafe outputs generated when exposed to unreliable search results [12][26]. Group 2: SafeSearch Framework - The SafeSearch framework is introduced as a method for automated red-teaming to assess the safety of LLM-based search agents, focusing on five types of risks including harmful outputs and misinformation [14][21]. - The framework employs a multi-stage testing process to generate high-quality test cases, ensuring comprehensive coverage of potential risks [16][19]. - SafeSearch aims to enhance transparency in the development of search agents by providing a quantifiable and scalable safety assessment tool [37]. Group 3: Evaluation and Results - The evaluation of various search agent architectures revealed that the impact of unreliable search results varies significantly, with the GPT-4.1-mini model showing a 90.5% susceptibility in a search workflow scenario [26][36]. - Different LLMs exhibit varying levels of resilience against risks, with GPT-5 and GPT-5-mini demonstrating superior robustness compared to others [26][27]. - The study concludes that effective filtering methods can significantly reduce the attack success rate (ASR), although they cannot eliminate risks entirely [36][37]. Group 4: Implications and Future Directions - The findings underscore the importance of systematic evaluation in ensuring the safety of search agents, as they are easily influenced by low-quality web content [37]. - The article suggests that the design of search agent architectures can significantly affect their security, advocating for a balance between performance and safety in future developments [36][37]. - The research team hopes that SafeSearch will become a standardized tool for assessing the safety of search agents, facilitating their evolution in both performance and security [37].