Large Language Models
A "Roaming Guide" to Scientific Agents: Helping You Build Domain-Specific Scientific Agents
机器之心· 2025-09-01 02:49
Core Insights
- The article presents a comprehensive guide for constructing scientific agents based on large language models (LLMs), emphasizing the integration of AI in scientific research and addressing the epistemological and methodological gaps between AI and the natural sciences [2][4].

Summary by Sections

Overview of Scientific Agents
- The guide aims to provide a structured approach to building scientific agents, detailing the levels of agent capabilities and construction strategies throughout the entire scientific research lifecycle [2][4].

Levels of Scientific Agents
- Scientific agents are categorized into three levels:
  - **Agent as Assistant**: Limited to specific tasks within a domain, constructed using small models through post-training or fine-tuning, with high performance in specialized tasks but lacking comprehensive operational capabilities [8].
  - **Agent as Partner**: Integrates various tools for enhanced capabilities, utilizing closed-source large models and modular design to independently perform tasks like literature consultation and hypothesis generation, though still limited in self-validation and reliability [8].
  - **Agent as Avatar**: Focuses on multi-dimensional capability enhancement, featuring strong reasoning, memory, and collaboration skills, capable of providing comprehensive support across various research stages [8].

Construction Process of Scientific Agents
- The construction process involves three main components:
  - **Knowledge Organization**: Structuring scientific information for effective understanding and reasoning, including unstructured sequences, structured data, instructions, and knowledge graphs [12][14].
  - **Knowledge Injection**: Embedding domain-specific expertise into agents through explicit or implicit methods to enhance their problem-solving capabilities [12][14].
  - **Tool Integration**: Expanding agent functionalities by incorporating external tools for specialized tasks, enabling autonomous operation and coordination of resources [12][14].

Capability Enhancement of Scientific Agents
- Enhancements focus on:
  - **Memory Enhancement**: Essential for maintaining context and executing multi-step reasoning, utilizing various memory structures to support complex tasks [19].
  - **Reasoning Enhancement**: Addressing limitations of LLMs through structured reasoning chains and domain-specific optimizations to improve output reliability [19].
  - **Collaboration Enhancement**: Improving interactions between multi-agent systems and human researchers to optimize research outcomes [19].

Benchmarking and Evaluation
- Benchmarks are categorized into knowledge-intensive and experiment-driven tasks, each emphasizing different aspects of scientific research processes [17][18].
  - **Knowledge-Intensive Tasks**: Focus on complex, domain-specific tasks requiring deep expertise [17].
  - **Experiment-Driven Tasks**: Evaluate the agent's ability to design and validate experiments autonomously [18].

Future Research Directions
- Future efforts should focus on:
  - Ensuring empirical accuracy in scientific experiment designs and integrating verification tools [23].
  - Designing flexible frameworks for complex task adaptation in specific research areas [23].
  - Incorporating self-reflection and iterative mechanisms for continuous improvement [23].
  - Optimizing interactions between agents and human researchers to enhance scientific discovery [23].
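The tool-integration component described above can be illustrated with a small sketch. This is not the guide's implementation: the `ToolRegistry` class, the `molar_mass` tool name, and the lookup table are all hypothetical, chosen only to show how an agent might route a request to an external, domain-specific tool by name.

```python
from typing import Callable, Dict

# A tool is modeled here as a plain function from a text argument to a text result.
ToolFn = Callable[[str], str]


class ToolRegistry:
    """Hypothetical registry that lets an agent call external tools by name."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolFn] = {}

    def register(self, name: str, fn: ToolFn) -> None:
        self._tools[name] = fn

    def call(self, name: str, argument: str) -> str:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](argument)


def molar_mass_lookup(formula: str) -> str:
    # Stand-in for a real domain tool; a production agent would call a
    # chemistry package or a database here instead of a fixed table.
    table = {"H2O": "18.015 g/mol", "CO2": "44.009 g/mol"}
    return table.get(formula, "unknown")


registry = ToolRegistry()
registry.register("molar_mass", molar_mass_lookup)
print(registry.call("molar_mass", "H2O"))
```

Keeping tools behind a single `call` entry point is one possible design: the language model only has to emit a tool name and an argument, and new tools can be added without changing the agent loop.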
硬蛋创新 (00400.HK) reports interim operating profit of RMB 276 million, up about 20.8% year-on-year
Ge Long Hui· 2025-08-29 16:56
Group 1
- The company reported revenue of approximately RMB 6.677 billion for the six months ended June 30, 2025, a year-on-year increase of about 54.5% [1]
- Operating profit was approximately RMB 276 million, an increase of about 20.8% year-on-year [1]
- Net profit after tax was approximately RMB 190 million, a year-on-year increase of 12.4% [1]
- Earnings per share stood at RMB 0.086 [1]

Group 2
- The rapid penetration of AI applications has become a core driver of growth in the global semiconductor market [1]
- According to the World Semiconductor Trade Statistics (WSTS), the global semiconductor market reached USD 346 billion in the first half of the year, an 18.9% year-on-year increase [1]
- AI-related demand has been particularly significant, with a substantial increase in the need for high-performance GPUs, dedicated AI accelerators, and advanced storage chips [1]
- Major global cloud service providers have significantly increased capital expenditures to expand AI training and inference server clusters, further driving growth in shipments of high-end AI chips [1]
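The year-on-year percentages above imply prior-period figures that the summary does not state. A minimal sketch of that arithmetic follows; the helper name `implied_prior` is hypothetical, and because the reported numbers are themselves rounded ("approximately", "about"), the results are rough estimates rather than reported values.

```python
def implied_prior(current: float, yoy_growth_pct: float) -> float:
    """Back out the prior-period value from a current value and its YoY growth."""
    return current / (1 + yoy_growth_pct / 100)


# Figures taken from the summary above; the outputs are derived, not reported.
prior_revenue_bn = implied_prior(6.677, 54.5)        # RMB billion
prior_operating_profit_mn = implied_prior(276, 20.8)  # RMB million
prior_net_profit_mn = implied_prior(190, 12.4)        # RMB million

print(f"Implied prior-period revenue: RMB {prior_revenue_bn:.2f} billion")
print(f"Implied prior-period operating profit: RMB {prior_operating_profit_mn:.1f} million")
print(f"Implied prior-period net profit: RMB {prior_net_profit_mn:.1f} million")
```

Revenue growing faster than operating profit (about 54.5% versus about 20.8%) implies that the operating margin narrowed over the period.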
Andrew Ng's latest letter: It's time to pay attention to parallel agents
量子位· 2025-08-29 11:37
Core Viewpoint
- The article emphasizes the emerging importance of parallel agents in enhancing AI capabilities, suggesting that collaboration among multiple agents can significantly improve efficiency and speed in task execution [1][3][4].

Summary by Sections

Parallel Agents as the Future
- The traditional approach to improving AI performance has relied heavily on scaling laws, which focus on increasing data and computational power. However, the article argues that the future lies in the ability of multiple agents to work in parallel [4][8].

Validation of Parallel Agents
- Andrew Ng cites earlier work by his former team at Baidu and by OpenAI as evidence that parallel agent methodologies can yield faster results than conventional methods, which often require lengthy processing times [5][6].

Challenges in Coordination
- The article highlights the inherent challenges in coordinating multiple agents to perform complex tasks, such as web analysis or software development, which can be difficult even for human teams [9][10].

Recent Research Developments
- Two recent papers are mentioned that contribute to the understanding of parallel agents:
  - The first paper discusses how large language models can generate multiple trajectories during inference to enhance problem-solving efficiency in programming [11][13].
  - The second paper introduces the Together Mixture Of Agents (MoA) architecture, which utilizes multiple large language models simultaneously to improve performance and allows for adjustments in the hierarchical structure of agents [14][15].

Future Research Directions
- Ng concludes that there is still much research and engineering work needed to optimize the use of parallel agents, suggesting that the number of agents capable of working efficiently in parallel could be substantial [18].

Historical Context
- The article references Ng's 2009 paper that demonstrated the large-scale application of GPUs in deep learning, marking a significant milestone in the field and underscoring the importance of parallel processing [19][20].
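The fan-out pattern behind parallel agents can be sketched in a few lines. This is only an illustration under stated assumptions: the agents are stub functions rather than model calls, `make_stub_agent`, `run_in_parallel`, and `aggregate` are hypothetical names, and the "pick the longest answer" aggregator is a placeholder, not the aggregation used by the MoA architecture or by either paper mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# An agent is modeled as a function from a prompt to a text answer.
Agent = Callable[[str], str]


def make_stub_agent(style: str) -> Agent:
    """Build a stand-in agent; a real system would call a language model here."""
    def agent(prompt: str) -> str:
        return f"[{style}] answer to: {prompt}"
    return agent


def run_in_parallel(agents: List[Agent], prompt: str) -> List[str]:
    """Send the same prompt to every agent concurrently and collect the answers."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        # pool.map returns results in the same order as `agents`.
        return list(pool.map(lambda agent: agent(prompt), agents))


def aggregate(candidates: List[str]) -> str:
    """Placeholder aggregator: keep the longest candidate answer."""
    return max(candidates, key=len)


agents = [make_stub_agent("concise"), make_stub_agent("step-by-step")]
answers = run_in_parallel(agents, "Summarize the report")
print(aggregate(answers))
```

Threads suit this sketch because real agent calls are typically network-bound; the wall-clock time is then close to that of the slowest single agent rather than the sum of all of them.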
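Led by former OpenAI and DeepMind researchers, 50+ experts discuss AI coding, agents, and embodied intelligence: agenda of the 2025 Global Machine Learning Technology Conference released!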
AI科技大本营· 2025-08-29 10:06
Core Insights
- The article emphasizes the transition of AI from impressive demos to a rigorous focus on architecture, systems, data, and business integration, highlighting the need for sustainable industrial capabilities [1]
- The 2025 Global Machine Learning Technology Summit, organized by CSDN and Singularity Research Institute, will take place on October 16-17 in Beijing, featuring over 50 prominent speakers from academia and industry [1][3]

Group 1: Event Overview
- The summit aims to address the pressing question of how to transform technological breakthroughs into sustainable industrial capabilities [1]
- A comprehensive "full-stack battle map" of AI has been designed, featuring 12 core topics including the evolution of large language models, AI-enabled software development, and practical applications of large models [3][4]

Group 2: Key Speakers and Topics
- Zhao Jian will discuss AI safety and governance, focusing on the security risks and ethical challenges of large models, along with innovative governance solutions [5][8]
- Zhou Pan will present MindGPT-4o-Audio, a real-time voice dialogue model that achieves human-like interaction capabilities [11][14]
- Leng Dawei will share insights on FG-CLIP, a high-precision image-text alignment model designed for large-scale applications [16][19]
- Zhang Heng will explore the transition from academic research to commercial AI visual algorithms, detailing the development process from prototypes to products [20][24]
- Zhang Jun will introduce the Wenxin 4.5 open-source model and its key training technologies, addressing challenges in model training and inference [25][29]
- Zhang Dao Xin will discuss the application of multimodal models in Xiaohongshu's search functionalities, focusing on content understanding and retrieval systems [30][33]
- Han Ai will present the OxyGent framework for multi-agent collaboration in JD Retail, emphasizing its modular design for flexible system development [34][37]
- Wang Peiyu will cover advancements in multimodal reasoning and unified models, showcasing the evolution of the R1V series [39][42]
- Cui Cheng will discuss the latest technologies in PaddleOCR and its applications in various industries [43][46]
- Xiao Chaojun will introduce MiniCPM, an efficient model for edge devices, highlighting breakthroughs in architecture and training algorithms [47][49]
- Chen Yingfeng will explore the application of embodied intelligence in engineering machinery, focusing on human-robot collaboration [50][53]
- Zhang Shaobo will present the role of LLM agents in software engineering, demonstrating their capabilities in solving real development challenges [54][57]
- Zhang Dan will discuss how large AI models can help overcome challenges in L4 autonomous driving, sharing insights on commercial applications [58][61]
- Han Zongbo will address uncertainty modeling in AI, providing a framework for enhancing reliability in complex scenarios [62][65]

Group 3: Future Directions
- The summit serves as a platform for deep exchanges in AI technology, fostering collaboration and innovation across industries [74]
- The event aims to capture cutting-edge trends and explore pathways for industrial upgrades, inviting global AI participants to engage in discussions [74]
AI will book your vacation, but it won't clean your kitchen just yet...
36Kr · 2025-08-29 06:59
Group 1: Core Insights on AI Development
- The advancement of artificial intelligence (AI) has reached a stage where large language models (LLMs) can engage in autonomous dialogue and problem-solving, yet achieving machines with true human-like intelligence remains a distant goal [1][6]
- Despite the perception of AI being highly advanced, it still struggles to accurately replicate many fundamental human tasks, highlighting the limitations and risks that need to be addressed [1][6]
- The most significant breakthrough in AI is its ability to analyze vast amounts of data to tackle complex problems and provide practical solutions, creating substantial opportunities for businesses and consumers [1][3]

Group 2: AI Integration in Business Strategy
- Executives should incorporate generative AI (GenAI) into workflows to save time and enhance efficiency, particularly in handling basic tasks like creating presentations [3]
- LLMs can unlock hidden potential by extracting value from unstructured data accumulated in various computer systems, transforming emails, documents, and meeting notes into actionable insights [3]
- LLMs also show promise in supporting creative work, generating numerous ideas for marketing campaigns, although the quality may vary [3][4]

Group 3: Types of AI Assistants
- Three categories of AI assistants are identified, each with increasing complexity and economic value: customer service assistants, automation process assistants, and collaborative assistants [4][6]
- Customer service assistants can handle banking inquiries and modify account settings based on customer instructions [4]
- Automation process assistants can provide personalized vacation plans and complete bookings using LLMs [4]
- Collaborative assistants can solve problems through conversation, optimizing processes that require strict adherence to regulations [4]

Group 4: Challenges and Risks of AI
- AI usage presents significant flaws and risks that executives must be cautious of, including issues related to privacy, misinformation, bias, copyright, employment disruption, content pollution, and uncontrolled future developments [7][8]
- The output from LLMs can often be misleading, producing seemingly credible but incorrect information, which poses a risk of misinformation [8]
- The concentration of AI power among a few tech giants and government entities raises concerns about its impact on economic and democratic health [8][9]
How does traditional SLAM-based localization and navigation differ from embodied goal-oriented navigation?
具身智能之心· 2025-08-29 00:03
Core Insights
- Goal-Oriented Navigation (GON) empowers robots to autonomously navigate and complete tasks based on goal descriptions, marking a significant shift from traditional Vision-and-Language Navigation (VLN) systems [2][3]
- The technology has been successfully implemented across various sectors, including delivery, healthcare, and hospitality, enhancing service efficiency and adaptability in dynamic environments [3][4]
- The evolution of GON technology can be categorized into three generations, each with distinct methodologies and advancements [5][7][9]

Group 1: Technology Overview
- GON is a key area within embodied navigation, relying on language understanding, environmental perception, and path planning [2]
- The transition from following explicit instructions to autonomous decision-making involves semantic parsing, environmental modeling, and dynamic decision-making [2][3]
- The integration of computer vision, reinforcement learning, and 3D semantic understanding is crucial for the success of GON systems [2]

Group 2: Industry Applications
- GON technology has been applied in terminal delivery scenarios, enabling robots to navigate complex urban environments effectively [3]
- Companies like Meituan and Starship Technologies have deployed delivery robots that utilize dynamic path re-planning capabilities [3]
- In healthcare and hospitality, companies such as Aethon and Jiakan Technology have implemented service robots for autonomous delivery of medications and meals, improving response efficiency [3]

Group 3: Technological Evolution
- The first generation of GON focused on end-to-end methods using reinforcement and imitation learning, achieving breakthroughs in Point Navigation and closed-set image navigation tasks [5]
- The second generation introduced modular methods that explicitly construct semantic maps, enhancing performance in zero-shot object navigation tasks [7]
- The third generation integrates large language models (LLMs) and vision-language models (VLMs) to improve exploration strategies and open-vocabulary target matching accuracy [9]

Group 4: Educational Initiatives
- A new course has been developed to address the challenges of learning GON, focusing on practical applications and theoretical foundations [10][11]
- The curriculum includes modules on semantic navigation frameworks, the Habitat simulation ecosystem, and end-to-end navigation methodologies [15][18]
- The course aims to provide a comprehensive understanding of GON, enabling participants to bridge the gap between theory and practice [11][12]
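The second-generation idea summarized above, building an explicit semantic map and then planning toward a labeled object, can be sketched on a toy grid. Everything here is an assumption for illustration: the map is taken as already built, the cell labels (`free`, `wall`, `chair`) are hypothetical, and breadth-first search stands in for whatever planner a real system would use.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def path_to_object(grid: List[List[str]], start: Cell, target: str) -> Optional[List[Cell]]:
    """Breadth-first search from `start` to the nearest cell labeled `target`.

    Cells labeled "wall" are impassable; every other label is traversable.
    Returns the path as a list of (row, col) cells, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    parents: Dict[Cell, Optional[Cell]] = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        row, col = cell
        if grid[row][col] == target:
            # Walk the parent links back to the start, then reverse.
            path: List[Cell] = []
            current: Optional[Cell] = cell
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (row + d_row, col + d_col)
            in_bounds = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
            if in_bounds and grid[nxt[0]][nxt[1]] != "wall" and nxt not in parents:
                parents[nxt] = cell
                queue.append(nxt)
    return None


# Toy semantic map: each cell carries a label instead of a plain occupancy bit.
semantic_map = [
    ["free", "free", "wall"],
    ["wall", "free", "free"],
    ["free", "free", "chair"],
]
print(path_to_object(semantic_map, (0, 0), "chair"))
```

The contrast with plain SLAM-style navigation is in the goal specification: the caller names an object category rather than a coordinate, and the map's labels resolve that name to a location.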
NVIDIA CEO: More advanced AI models will drive continued growth in chips and data centers
Sou Hu Cai Jing· 2025-08-28 06:24
Core Viewpoint
- The CEO of Nvidia, Jensen Huang, believes that the current phase is a "new industrial revolution" driven by AI, with significant growth opportunities expected over the next decade [2].

Group 1: Company Insights
- Nvidia reported revenue of $46.7 billion for the last quarter, indicating strong performance amid the AI boom [2].
- Huang predicts that by the end of this decade, spending on AI infrastructure could reach $3 trillion to $4 trillion, reflecting ongoing growth in the generative AI sector [2][5].
- The demand for chips and computing power for AI is expected to remain high, with Huang emphasizing the importance of data centers in meeting this demand [2][3].

Group 2: AI Model Developments
- New AI models utilizing "reasoning" technology require significantly more computational power, potentially 100 times or more than traditional large language models [3][5].
- The "long thinking" approach in AI allows models to research across different sites and integrate information, enhancing the quality of responses [3].

Group 3: Impact of AI Data Centers
- The rapid growth of AI data centers is leading to increased land use, water consumption, and energy demands, which could strain local communities and the U.S. power grid [2][5].
- The expansion of generative AI tools is expected to further escalate the demand for energy and resources [5].
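Li Auto's in-house intelligent-driving chip M100 begins on-vehicle road tests, with some compute performance exceeding NVIDIA Thor-U! One M100 provides effective compute comparable to three NVIDIA Thor-U chips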
Ge Long Hui· 2025-08-28 05:17
Core Insights
- Li Auto has developed its in-house intelligent driving chip, the M100, which passed critical pre-mass-production stages in Q1 of this year [1]
- The M100 completed functional and performance testing within two weeks and is currently undergoing road tests on a small batch of vehicles [1]
- The chip's advantage is task-dependent: its effective computing power is comparable to that of multiple NVIDIA Thor-U chips, with the multiple varying by workload [1]

Group 1
- When running large language model (LLM) tasks, one M100 delivers effective computing power equivalent to that of 2 NVIDIA Thor-U chips [1]
- In traditional vision tasks based on convolutional neural networks (CNNs), one M100 delivers effective computing power comparable to that of 3 NVIDIA Thor-U chips [1]
Backed by Alibaba and SAIC! This unicorn is going public!
IPO日报· 2025-08-28 02:30
Core Viewpoint
- Alibaba Group plans to spin off its subsidiary, Zhibo Network Technology Co., Ltd. (Zhibo Network), which specializes in smart cockpit solutions, for an independent listing on the Hong Kong Stock Exchange. This move aims to enhance the company's value and operational transparency while allowing it to access capital markets independently [1][18].

Industry Overview
- The smart cockpit sector is on the verge of explosive growth, driven by supportive government policies, rapid growth in the passenger car market, improved chip performance, breakthroughs in large language models, and the continuous evolution of integrated AI technologies. Global smart vehicle sales are projected to grow from 58 million units in 2024 to 86.5 million units by 2030, a compound annual growth rate (CAGR) of 6.9% [5].
- The market for smart cockpit solutions in China is expected to expand from 129 billion yuan in 2024 to 327.4 billion yuan by 2030, a CAGR of 16.8%. Software-based cockpit solutions are anticipated to grow even faster, from 40.1 billion yuan to 114.9 billion yuan, a CAGR of 19.2% [5].

Company Profile
- Zhibo Network focuses on developing smart cockpit solutions, offering system-level OS solutions, AI end-to-end solutions, and in-vehicle platform services [4].
- Despite its smaller revenue scale compared to competitors like Desay SV and Huayang Group, Zhibo Network's latest valuation reached 22 billion yuan (approximately 3 billion USD), supported by its parent companies, Alibaba and SAIC [1][12][14].
- Zhibo Network's revenue for 2022 to 2024 was 805 million yuan, 872 million yuan, and 824 million yuan, respectively, with a slight decline in 2024 due to seasonal factors. The company reported net losses of 878 million yuan, 876 million yuan, and 847 million yuan over the same period, with losses narrowing year by year [6][7].

Competitive Position
- Zhibo Network is recognized as the largest software-centric smart cockpit solution provider in China based on 2024 revenue and ranks first in terms of solution deployment volume. It is one of only two third-party suppliers in China with a fully self-developed automotive operating system [11].
- The company's deployment volume grew from 835,000 units in 2022 to 2.334 million units in 2024, a CAGR of 67.2%. As of June 30, 2025, its solutions had been installed in over 8 million vehicles across more than 14 countries [11].

Financial Backing and Valuation
- Zhibo Network has received significant financial backing, with cumulative financing exceeding 10 billion yuan since its establishment in 2015. Its latest funding round in September 2023 valued the company at approximately 22 billion yuan [12][13].
- The company has a high price-to-sales (P/S) ratio of approximately 26.7 times based on its valuation, significantly higher than Desay SV's 3 times and Huayang Group's 3.8 times [14].

Key Clients and Suppliers
- SAIC and Alibaba are not only major shareholders but also the largest clients and suppliers of Zhibo Network. Revenue from the top five clients consistently accounted for around 90% of total revenue during the reporting period, with SAIC contributing significantly [16][17].
- Zhibo Network's relationship with SAIC is highlighted by its recognition as "Annual Software Supplier" by SAIC Volkswagen in 2023, indicating a strong client partnership [16].
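The growth rates and valuation multiple quoted above can be checked with a short calculation. The sketch below assumes that the number of compounding periods equals the difference between the end year and the start year, and that the P/S ratio uses the 22 billion yuan valuation against 2024 revenue of 824 million yuan; neither assumption is stated explicitly in the summary, and the function names are illustrative.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate as a fraction: (end / start) ** (1 / years) - 1."""
    return (end / start) ** (1 / years) - 1


def price_to_sales(valuation: float, revenue: float) -> float:
    """Price-to-sales multiple; both inputs must be in the same unit."""
    return valuation / revenue


# (label, start value, end value, number of years) using figures from the summary.
cases = [
    ("Global smart vehicle sales, 2024-2030 (million units)", 58.0, 86.5, 6),
    ("China smart cockpit market, 2024-2030 (billion yuan)", 129.0, 327.4, 6),
    ("Software-based cockpit solutions, 2024-2030 (billion yuan)", 40.1, 114.9, 6),
    ("Deployment volume, 2022-2024 (thousand units)", 835.0, 2334.0, 2),
]
for label, start, end, years in cases:
    print(f"{label}: {cagr(start, end, years) * 100:.1f}%")

# Valuation and 2024 revenue, both expressed in million yuan.
print(f"P/S: {price_to_sales(22_000, 824):.1f}x")
```

Under these assumptions the computed values agree with the quoted 6.9%, 16.8%, 19.2%, and 67.2% growth rates and the roughly 26.7 times P/S ratio.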
A detailed look at Li Auto's MindVLA intelligent-driving solution
自动驾驶之心· 2025-08-27 23:33
Core Viewpoint
- The article discusses the advancements in autonomous driving technology, particularly focusing on the MindVLA framework, which integrates spatial intelligence, linguistic intelligence, action policy, and reinforcement learning to enhance vehicle autonomy and interaction capabilities.

Group 1: MindVLA Framework Overview
- MindVLA consists of four main modules: spatial intelligence, linguistic intelligence, action policy, and reinforcement learning, each serving distinct functions in the autonomous driving process [5][6].
- The spatial intelligence module utilizes multi-modal sensor data and a 3D encoder to extract spatiotemporal features, merging sensor and semantic information into a unified representation [5].
- The linguistic intelligence module employs a large language model (MindGPT) for joint reasoning between spatial and language inputs, facilitating human-vehicle interaction through voice commands [5].
- The action policy module generates future vehicle behavior trajectories using diffusion models, introducing noise to guide the generation process for diverse action planning [5].
- The reinforcement learning module simulates external environment responses to evaluate actions and optimize behavior through continuous learning [5].

Group 2: GaussianAD Framework
- The GaussianAD framework addresses the limitations of traditional end-to-end autonomous driving by using Gaussian representations for 3D scene initialization and interaction [12][10].
- It employs a 4D sparse convolution approach to extract multi-scale features from panoramic images, optimizing Gaussian parameters to create a sparse 3D semantic Gaussian set [16][12].
- The advantages of Gaussian representation include reduced computational redundancy while maintaining fine-grained 3D structure, significantly enhancing downstream task performance [16][15].

Group 3: Linguistic Intelligence Module
- The linguistic intelligence module is designed to create a customized large language model (LLM) that is specifically trained on relevant data for autonomous driving, enhancing its spatial reasoning and language capabilities [18][19].
- The model architecture incorporates sparse design to improve inference performance while reducing capacity [18].

Group 4: Action Policy and Trajectory Generation
- The action policy utilizes a diffusion model to decode action tokens into trajectories, enhancing the model's ability to navigate complex traffic environments [22][24].
- TrajHF, a component of the action policy, generates diverse trajectories through multi-conditional denoising and reinforcement learning fine-tuning, aligning generated trajectories with human driving preferences [25][26].
- The model structure includes a generative trajectory model and reinforcement learning fine-tuning to maximize human preference rewards, addressing the challenges of traditional imitation learning [28][30].

Group 5: Preference Data Construction
- The process of constructing preference data involves labeling driving data with different driving style tags, focusing on key frames where significant actions occur [31][33].
- The key frame annotation process is designed to ensure data quality through random manual checks, allowing for large-scale annotation of driving preferences [31][33].
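The summary does not say how key frames are detected, so the following is only one plausible sketch of "frames where significant actions occur". It assumes a per-frame speed log sampled at a fixed interval and flags frames whose acceleration magnitude exceeds a threshold; the function name, the signal used, and the threshold value are all hypothetical.

```python
from typing import List


def key_frame_indices(speeds_mps: List[float], dt_s: float, accel_threshold: float) -> List[int]:
    """Return indices of frames whose acceleration magnitude exceeds the threshold.

    Acceleration at frame i is estimated as (speed[i] - speed[i - 1]) / dt_s,
    so frame 0 is never flagged.
    """
    keys: List[int] = []
    for i in range(1, len(speeds_mps)):
        accel = (speeds_mps[i] - speeds_mps[i - 1]) / dt_s
        if abs(accel) > accel_threshold:
            keys.append(i)
    return keys


# Hypothetical log sampled every 0.5 s: steady driving, one hard brake, steady again.
speeds = [10.0, 10.0, 10.1, 8.0, 8.0, 8.1]
print(key_frame_indices(speeds, dt_s=0.5, accel_threshold=2.0))
```

Frames selected this way would then be candidates for driving-style tags, with random manual checks applied to a sample of them as the summary describes.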