Large Language Models
A survey of nearly 300 works: the development of manipulation tasks viewed through "high-level planning and low-level control"
具身智能之心· 2026-01-06 00:32
Core Insights
- The survey reviews the transformative advances in robotic manipulation driven by rapid progress in visual, language, and multimodal learning, emphasizing how large foundation models strengthen robots' perception and semantic representation capabilities [1][2]

Group 1: High-Level Planning
- High-level planning clarifies action intentions, organizes action sequences, and allocates environmental attention, providing structured guidance for low-level execution [4]
- Its core components are task decomposition and decision guidance, integrating multimodal information to answer "what to do" and "in what order" [4]
- Task planning based on large language models (LLMs) maps natural language to task steps, with methods such as SayCan and Grounded Decoding improving executable skill selection and planning capability [5]
- Multimodal large language models (MLLMs) move beyond text-only input by integrating visual and language reasoning; models such as PaLM-E and VILA show strong performance on embodied tasks [8]
- Code generation techniques convert plans into executable programs, improving the precision of language-based plans through methods such as Code as Policies and Demo2Code [9]
- Motion planning uses LLMs and VLMs to generate continuous motion targets, linking high-level reasoning with low-level trajectory optimization [10]
- Affordance learning establishes intrinsic associations between perception and action across geometric, visual, semantic, and multimodal dimensions [11]
- 3D scene representation transforms environmental perception into structured action proposals, bridging perception and action through techniques such as Gaussian splatting [12]

Group 2: Low-Level Learning Control
- Low-level control translates high-level plans into precise physical actions, addressing the "how to do it" side of robotic manipulation [14]
- Learning strategies for skill acquisition fall into three main types, including pre-training and model-free reinforcement learning [16]
- Input modeling defines how robots perceive the world, emphasizing the integration of multimodal signals through reinforcement learning and imitation learning [18]
- Visual-action models use both 2D and 3D visual inputs to improve action generation, while vision-language-action models integrate semantic, spatial, and temporal information [19]
- Additional modalities such as tactile and auditory signals improve robustness in contact-rich manipulation scenarios [20]

Group 3: Challenges and Future Directions
- Despite significant technological progress, robotic manipulation faces four core challenges: the lack of universal architectures, data and simulation bottlenecks, insufficient multimodal physical interaction, and safety and collaboration issues [23][27][28][29]
- Future research directions include developing a "robotic brain" with flexible modal interfaces, establishing autonomous data collection mechanisms, enhancing multimodal physical interaction, and ensuring safety in human-robot collaboration [30]
- The review calls for a unified framework integrating high-level planning and low-level control, focused on overcoming the data-efficiency, physical-interaction, and safety-collaboration bottlenecks that stand between laboratory demonstrations and real-world deployment [31]
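The SayCan-style selection mentioned under high-level planning — weighting an LLM's language score for each skill by an affordance estimate from the current state — can be sketched as a toy scoring loop. All skill names, scores, and the instruction below are invented for illustration; this is not the actual SayCan implementation.

```python
# Toy sketch of SayCan-style skill selection: pick the skill that maximizes
# p_LLM(skill | instruction) * p_affordance(skill | state).
# Every value here is a hypothetical stand-in for real model outputs.

def select_skill(llm_scores, affordances):
    """Return the skill with the highest combined language-and-affordance score."""
    return max(llm_scores, key=lambda s: llm_scores[s] * affordances.get(s, 0.0))

# Hypothetical scores for the instruction "put the apple in the drawer".
llm_scores = {"pick(apple)": 0.6, "open(drawer)": 0.3, "wipe(table)": 0.1}
affordances = {"pick(apple)": 0.9, "open(drawer)": 0.8, "wipe(table)": 0.9}

print(select_skill(llm_scores, affordances))  # combined: 0.54 vs 0.24 vs 0.09
```

The point of the product is that a skill the LLM prefers linguistically can still lose to one that is actually feasible in the scene, which is the grounding effect the survey describes.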
How to deal with different types of generative AI users
36Kr· 2025-12-19 03:54
Core Insights
- Understanding user perspectives on AI is crucial for designing effective tools based on large language models (LLMs) [1]
- User research should not be overlooked; assumptions about user experiences can lead to product failures [1]

User Categories
- Unaware Users: do not think about AI and do not see its relevance to their lives, leaving them with limited understanding of the underlying technology [2]
- Avoidant Users: hold a negative view of AI and approach it with skepticism and distrust, which can damage brand relationships [3]
- AI Enthusiasts: hold high, often unrealistic expectations, believing AI can handle all tedious tasks or provide perfect answers [4]
- Informed AI Users: hold a realistic perspective, tend to have higher information literacy, and take a "trust but verify" approach [5]

User Expectations and Experiences
- Many users lack knowledge about how LLMs work and may carry unrealistic expectations formed by earlier experiences with powerful tools [6]
- Emotional responses and information levels combine into user profiles that shape how people perceive and interact with AI technologies [7]
- The unique qualitative aspects of generative AI polarize user reactions in a way other technologies do not [8]

Non-Determinism and Complexity
- Generative AI introduces non-determinism, breaking the reliability users expect from technology, which can undermine trust [9]
- The "black box" nature of generative AI makes it difficult for users to understand how models arrive at specific outputs, hindering acceptance [10]

Autonomy and User Control
- The increasing autonomy of generative AI tools can create anxiety, especially when users are unaware of AI's involvement in a task [11]
- Users may struggle to recognize AI-generated content, raising concerns about distinguishing AI outputs from human-generated material [11]

Product Development Implications
- Building products around generative AI is feasible, but it requires careful weighing of risks against potential rewards [12]
- Thorough user research is essential to understand the distribution of user profiles and plan product features accordingly [13]
- Training users on the solution is critical to set realistic expectations and address potential concerns [13]

User Adoption Strategies
- Companies should respect user preferences; some users will refuse generative AI tools for reasons ranging from safety concerns to lack of interest [14]
- Effective communication and thorough testing can improve adoption over time, but imposing AI tools on users is counterproductive [14]

Conclusion
- Designing generative AI products requires a deep understanding of user interactions and expectations, since the impact on user relationships can be significant [15]
Medical AI faces a major test: Nanyang Technological University releases the first benchmark for LLM electronic health record processing
36Kr· 2025-12-16 03:05
Core Insights
- Researchers from Nanyang Technological University have developed the EHRStruct benchmark to evaluate the ability of large language models (LLMs) to process structured electronic health records (EHRs) [1][2]
- The benchmark comprises 11 core tasks organized by clinical scenario, cognitive level, and functional category, totaling 2,200 samples [1][2]
- Findings indicate that general-purpose models outperform medical-specific models, with data-driven tasks showing stronger performance [1][8]

Benchmark Overview
- EHRStruct is the first comprehensive benchmark for assessing LLMs' capabilities on structured EHRs, created jointly by computer scientists and medical experts [1][2]
- The 11 tasks are split into data-driven and knowledge-driven scenarios, covering both understanding and reasoning levels [3][4]

Task Categories
- The tasks fall into six typical categories: information retrieval, data aggregation, arithmetic computation, clinical identification, diagnostic assessment, and treatment planning [4][5]
- Data-driven tasks include filtering, aggregation, and arithmetic reasoning, while knowledge-driven tasks focus on clinical code identification and predictive assessment [3][4]

Evaluation Process
- The evaluation systematically assesses 20 LLMs, using 200 question-answer samples per task and testing a range of input formats [11][10]
- The benchmark also supports in-depth experiments on specific models, including few-shot prompting and fine-tuning [11]

Key Findings
- General-purpose LLMs, particularly the Gemini series, outperform medical-specific models on structured EHR tasks [14][8]
- Data-driven tasks yield better results overall, while knowledge-driven tasks, especially diagnostic assessment, remain challenging for existing models [15][17]
- The EHRMaster framework, combined with Gemini, significantly improves performance on both data-driven and knowledge-driven tasks [20][19]

Future Directions
- The EHRStruct 2026 challenge has been launched to provide a standardized platform for evaluating LLMs' structured EHR processing capabilities [2]
- Collaboration with international conferences is anticipated to facilitate submission of research reports and papers based on the challenge [2]
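A benchmark of the shape described above (a fixed set of tasks, 200 QA samples per task, answers scored against gold labels) reduces to a per-task accuracy loop. The task names, sample format, and lookup-table "model" below are assumptions for illustration, not the EHRStruct harness itself.

```python
# Minimal sketch of per-task accuracy evaluation over question-answer samples.
# Task names, samples, and the model stub are hypothetical; EHRStruct's actual
# harness, prompts, and scoring rules are not reproduced here.

def evaluate(model, tasks):
    """Return per-task accuracy: fraction of samples where the model matches gold."""
    scores = {}
    for task, samples in tasks.items():
        correct = sum(1 for question, gold in samples if model(question) == gold)
        scores[task] = correct / len(samples)
    return scores

# A trivial stand-in "model" answering from a lookup table.
answers = {"max heart rate in record 12?": "142",
           "ICD code for type 2 diabetes?": "E11"}
model = lambda q: answers.get(q, "")

tasks = {
    "data_aggregation": [("max heart rate in record 12?", "142")],
    "clinical_identification": [("ICD code for type 2 diabetes?", "E11"),
                                ("ICD code for hypertension?", "I10")],
}
print(evaluate(model, tasks))  # the stub misses one clinical_identification sample
```

Real structured-EHR evaluation adds answer normalization (units, code formats) before the equality check, which a sketch like this omits.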
Is Llama dead? Meta (META.US) to launch new large AI model "Avocado" early next year
Zhi Tong Cai Jing· 2025-12-09 13:46
Core Viewpoint
- Meta plans to release a new large language model (LLM) named "Avocado" in Q1 2026 to compete with companies such as Google and OpenAI [1]

Group 1: Product Development
- "Avocado" is seen as the successor to Meta's Llama series, which has faced development challenges [1]
- Avocado is expected to be a proprietary model, unlike the current open-source Llama models that allow public access and modification [1]

Group 2: Strategic Decisions
- In June, Meta executives, including Mark Zuckerberg, discussed reducing investment in the Llama series and considered adopting models developed by competitors such as OpenAI and Anthropic [1]
- Meta restructured its AI department to streamline its organization for faster AI product development in response to competition [1]

Group 3: Financial Investments
- This summer, Meta invested nearly $15 billion to acquire a stake in Scale AI and appointed its CEO, Alexandr Wang, as Chief AI Officer [1]
Taking on Google's new weapon: OpenAI is developing a new AI model, "Garlic"
Zhi Tong Cai Jing· 2025-12-03 08:41
Core Insights
- OpenAI is developing a large language model codenamed "Garlic" to counter Google's advances in AI, particularly its Gemini 3 model [1][2]
- The new model is expected to be released as GPT-5.2 or GPT-5.5, potentially as early as early next year [2]
- OpenAI's leadership acknowledges the need to improve ChatGPT's quality amid intensifying competition in the AI space [2]

Development and Performance
- OpenAI's Chief Researcher, Mark Chen, said Garlic has performed well in internal evaluations, particularly on programming and reasoning tasks, outperforming Gemini 3 and Anthropic's Opus 4.5 [1][3]
- Garlic is distinct from another model in development, "Shallotpeat," which also aims to challenge Gemini 3; Garlic incorporates lessons learned from Shallotpeat's pre-training phase [3][4]
- Improvements in pre-training have addressed key issues, allowing OpenAI to inject knowledge into a smaller model rather than relying solely on larger ones [4]

Future Steps and Enhancements
- Garlic will go through several steps before release, including post-training, in which it is exposed to more curated data to build specialized knowledge [5]
- OpenAI is also working on a larger and better model based on the experience gained from the Garlic project [5]
When a large language model computes "2+2"
36Kr· 2025-11-28 07:12
Core Insights
- The article explores the distinctive cognitive processes of large language models (LLMs) and how they differ from human understanding, using arithmetic operations like 2+2 as the running example [2][6][8]

Group 1: Arithmetic and Language Models
- LLMs do not perform arithmetic in the traditional sense; they convert numbers into vectors and seek coherence in language patterns rather than calculating sums [2][6]
- The process by which an LLM arrives at the answer "4" is described as a search for coherence in a high-dimensional space, not a mathematical computation [3][8]

Group 2: Understanding and Patterns
- The article draws parallels between LLM and human cognition, suggesting that both rely on patterns and relationships rather than strict rules [4][6]
- Children learn arithmetic through associative patterns before grasping numerical concepts, much as LLMs operate [4][6]

Group 3: The Illusion of Understanding
- The concept of "anti-intelligence" is introduced: LLMs can appear intelligent because their outputs are fluent, while lacking genuine understanding [5][7]
- The coherence LLMs produce can mislead humans into believing comprehension lies behind the responses, highlighting a shared obsession with coherence in both machines and humans [7][8]
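The "pattern recall rather than calculation" idea above can be caricatured with a toy next-token predictor that answers "2+2=" by returning the most frequent continuation seen in a tiny corpus. This is an illustration of the article's framing only, not how a real transformer works (which operates on learned vector representations, not string lookup).

```python
# Toy caricature of next-token prediction: complete "2+2=" by recalling the
# most frequent continuation in a tiny "training corpus", not by adding.
from collections import Counter

corpus = ["2+2=4", "2+2=4", "2+2=4", "2+2=5", "3+3=6"]

def predict_continuation(prefix, corpus):
    """Return the most common completion observed after `prefix` in the corpus."""
    completions = Counter(s[len(prefix):] for s in corpus if s.startswith(prefix))
    return completions.most_common(1)[0][0]

print(predict_continuation("2+2=", corpus))  # "4" wins by frequency, not arithmetic
```

The toy makes the article's point concrete: the answer "4" emerges because it is the most coherent continuation of the pattern, and a sufficiently skewed corpus would make the same mechanism confidently answer "5".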
Morgan Stanley: roughly 500,000 external TPU sales by Google would add about $13 billion to 2027 Google Cloud revenue and lift earnings per share by about 3%
Ge Long Hui· 2025-11-27 02:33
Group 1
- The core viewpoint is that Google's external sales of approximately 500,000 TPUs could add about $13 billion to Google Cloud revenue by 2027, an increase of roughly 11%, and about $0.37 to earnings per share, roughly 3% growth [1]
- If Google Cloud's growth continues to accelerate and the company's semiconductor market expansion succeeds, this would help sustain a high valuation for the stock [1]

Group 2
- On industry scale: with Nvidia expected to ship around 8 million GPUs by 2027, Google's external TPU sales of 500,000 to 1 million units remain a reasonable range [3]
- Uncertainty remains over Google's overall strategy for promoting external TPU sales; investor focus is on its business model, pricing strategy, and the types of workloads TPUs can handle [3]
- This year, Google has spent approximately $20 billion with Nvidia on large-language-model-related computing versus only around $1 billion on TPUs, suggesting a potential capital reallocation next year, although overall AI chip demand is unlikely to produce a "winner-takes-all" outcome [3]
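The deltas above imply rough baselines that can be back-solved as a sanity check on the reported figures (this is simple arithmetic on the article's numbers, not Morgan Stanley's actual model):

```python
# Back-solve the baselines implied by the reported deltas:
# +$13B Google Cloud revenue at ~11% growth, and +$0.37 EPS at ~3% growth.
revenue_increase = 13e9          # dollars
revenue_growth_rate = 0.11
eps_increase = 0.37              # dollars per share
eps_growth_rate = 0.03

implied_cloud_revenue_base = revenue_increase / revenue_growth_rate  # ~$118B
implied_eps_base = eps_increase / eps_growth_rate                    # ~$12.3/share

print(round(implied_cloud_revenue_base / 1e9, 1))  # 118.2 (billions of dollars)
print(round(implied_eps_base, 2))                  # 12.33
```

Both implied baselines are plausible 2027 magnitudes for Google Cloud revenue and Alphabet EPS, which is why the two growth-rate claims are mutually consistent.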
Morgan Stanley: roughly 500,000 external TPU sales by Google would lift 2027 earnings per share by about 3%
Ge Long Hui· 2025-11-27 02:15
Core Insights
- Morgan Stanley analysts estimate that Google's external sales of approximately 500,000 TPUs could increase Google Cloud revenue by about $13 billion, an approximate growth rate of 11% by 2027, with earnings per share rising by about $0.37, or roughly 3% [1]

Group 1
- The potential for Google Cloud revenue growth is linked to the successful expansion of its semiconductor market presence [1]
- Analysts suggest that if Google Cloud's business growth accelerates, it will help maintain a high valuation for the company's stock [1]
- The estimated range for external TPU sales is considered reasonable, particularly against Nvidia's expected GPU shipments of around 8 million units by 2027 [1]

Group 2
- There is uncertainty regarding Google's overall strategy for promoting external TPU sales; key investor concerns center on its business model, pricing strategy, and the types of workloads TPUs can support [1]
- This year, Google has spent approximately $20 billion with Nvidia on large-language-model-related computing versus around $1 billion on TPUs, indicating a potential adjustment in capital allocation next year [1]
- Overall demand for AI chips is unlikely to produce a "winner-takes-all" outcome, suggesting a competitive landscape [1]
A sip of VC | a16z in conversation with AI leaders: how far can AI's "brute force" path go? Only by being fundamentally human can it truly understand what people want
Z Potentials· 2025-11-22 03:21
Core Insights
- The discussion highlights the rapid advance of AI technology and its potential to create a new wave of independent entrepreneurs, transforming the software development landscape [5][30]
- Opinions diverge on the timeline and feasibility of Artificial General Intelligence (AGI): some experts expect imminent breakthroughs, while others remain skeptical [9][19]

AI Development Status and the Path to AGI
- Adam D'Angelo argues there are no fundamental challenges the brightest minds cannot solve in the coming years, citing significant progress in reasoning models and code generation [3][8]
- Amjad Masad compares the current AI evolution to historical revolutions, suggesting humanity is undergoing a transformation that may not be easily defined [4][27]
- D'Angelo expects the next five years to produce a drastically different world, contingent on resolving current limitations in AI context and usability [8][10]

Economic Transformation and the Future Societal Landscape
- D'Angelo predicts that AI's economic impact could push GDP growth far beyond 4-5% if AI can perform tasks at lower cost than human labor [21]
- Masad raises concerns about second-order effects on the job market, particularly the automation of entry-level jobs while expert roles persist [22][23]
- As AI automates more tasks, the nature of work will shift, potentially increasing demand for roles that leverage human creativity and emotional intelligence [24][25]

Technological Landscape Evolution and the Entrepreneurial Ecosystem
- D'Angelo is excited by the rise of independent entrepreneurs enabled by AI technologies, which let individuals bring ideas to fruition without large teams [28][30]
- The discussion touches on the balance between large-scale companies and new market entrants, suggesting both can coexist and thrive in the evolving landscape [32][36]
- Masad highlights AI's importance in programming: as these tools improve, they will democratize software development, allowing more people to build complex applications [44]

Future Challenges and Closing Thoughts
- The conversation reflects on the cultural implications of increased reliance on AI, particularly for knowledge sharing and collaboration among employees [49]
- D'Angelo and Masad both stress the need for ongoing research and innovation to unlock AI's full potential and address the challenges of its integration into society [41][42]
Super Micro Computer (SMCI.US) edges higher after launching new AMD Instinct MI350 series GPU-optimized server solutions
Zhi Tong Cai Jing· 2025-11-20 16:20
Core Viewpoint
- Super Micro Computer (SMCI.US) has launched its latest AMD Instinct MI350 series GPU-optimized server solutions, expanding its product offerings in high-performance computing and AI infrastructure [1]

Group 1: Product Launch
- The new systems are designed for enterprises that need the high-end compute of AMD Instinct MI355X GPUs but must operate in air-cooled environments [1]
- The new servers deliver up to 4x the AI training performance of the previous generation [1]
- Inference performance jumps by up to 35x, greatly improving capability for deploying large language models (LLMs), generative AI, and scientific computing [1]