Large Language Models (LLM)
Still haven't settled on a research direction? Others are already racing ahead on VLA...
自动驾驶之心· 2025-07-21 05:18
Core Viewpoint
- The article emphasizes the shift in academic research from traditional perception and planning tasks in autonomous driving to the exploration of Vision-Language-Action (VLA) models, which present new opportunities for innovation and research in the field [1][2].

Group 1: VLA Research Topics
- The VLA model aims to create an end-to-end autonomous driving system that maps raw sensor inputs directly to driving control commands, moving away from traditional modular architectures [2].
- The evolution of autonomous driving technology can be categorized into three phases: traditional modular architecture, pure visual end-to-end systems, and the emergence of VLA models [2][3].
- VLA models enhance interpretability and reliability by allowing the system to explain its decision-making process in natural language, thus improving human trust [3].

Group 2: Course Objectives and Structure
- The course aims to help participants systematically master key theoretical knowledge in VLA and develop practical skills in model design and implementation [6][7].
- It includes a structured learning experience with a combination of online group research, paper guidance, and maintenance periods to ensure comprehensive understanding and application [6][8].
- Participants will gain insights into classic and cutting-edge papers, coding practices, and effective writing and submission strategies for academic papers [6][12].

Group 3: Enrollment and Requirements
- The course is limited to 6-8 participants per session, targeting individuals with a foundational understanding of deep learning and autonomous driving algorithms [5][9].
- Basic requirements include familiarity with Python and PyTorch, as well as access to high-performance computing resources [13][14].
- The course emphasizes academic integrity and provides a structured environment for learning and research [14][19].

Group 4: Course Highlights
- The program features a "2+1" teaching model with experienced instructors providing comprehensive support throughout the learning process [14].
- It is designed to ensure high academic standards and facilitate significant project outcomes, including a draft paper and project completion certificate [14][20].
- The course also includes a feedback mechanism to optimize the learning experience based on individual progress [14].
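For readers new to the architecture, the sketch below illustrates the end-to-end mapping the summary describes: a camera frame and a tokenized language instruction go in, a low-dimensional control command comes out. It is a minimal toy model for illustration only; the module sizes, fusion strategy, and three-value control output (steering, throttle, brake) are assumptions, not the design taught in the course or used by any production VLA system.

```python
# Minimal sketch of a VLA-style driving policy (toy assumptions, not a reference design):
# raw sensor input + a language instruction are mapped end-to-end to control commands.
import torch
import torch.nn as nn

class ToyVLAPolicy(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=128):
        super().__init__()
        # Vision encoder: camera frame -> feature vector
        self.vision = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )
        # Language encoder: tokenized instruction -> feature vector
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lang = nn.GRU(embed_dim, embed_dim, batch_first=True)
        # Action head: fused features -> [steering, throttle, brake]
        self.action_head = nn.Linear(2 * embed_dim, 3)

    def forward(self, image, instruction_tokens):
        v = self.vision(image)                         # (B, D)
        _, h = self.lang(self.embed(instruction_tokens))
        fused = torch.cat([v, h[-1]], dim=-1)          # (B, 2D)
        return self.action_head(fused)                 # (B, 3) control command

policy = ToyVLAPolicy()
img = torch.randn(1, 3, 64, 64)                        # dummy camera frame
cmd = torch.randint(0, 1000, (1, 8))                   # dummy tokenized instruction
print(policy(img, cmd))
```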
LatePost exclusive | Agent startup Pokee.ai raises a $12 million seed round from Point72 Ventures, Intel's Lip-Bu Tan, and other investors
晚点LatePost· 2025-07-09 11:38
Core Viewpoint
- Pokee.ai, an AI Agent startup, recently raised approximately $12 million in seed funding to accelerate research and sales efforts, with notable investors including Point72 Ventures and Qualcomm Ventures [5][6].

Group 1: Company Overview
- Pokee.ai was founded in October 2022 and currently has only 7 employees. The founder, Zhu Zheqing, previously led the "Applied Reinforcement Learning" department at Meta, where he significantly improved the content recommendation system [7].
- Unlike other startups that use large language models (LLMs) as the "brain" of their agents, Pokee relies on a different reinforcement learning model that does not require extensive context input [7].

Group 2: Technology and Cost Efficiency
- The current version of Pokee has been trained on 15,000 tools, allowing it to adapt to new tools without needing additional context [8].
- Using reinforcement learning models is more cost-effective than using LLMs, which can incur costs of several dollars per task due to high computational demands. Pokee's task completion cost is only about 1/10 that of its competitors [8].

Group 3: Market Strategy and Product Development
- Pokee aims to optimize its ability to call data interfaces (APIs) across various platforms, targeting large companies and professional consumers to facilitate cross-platform tasks [9].
- The funding will also support the integration of new features, including a memory function to better understand client needs and preferences [9].

Group 4: Seed Funding Trends
- The seed funding landscape for AI startups is evolving, with average seed round sizes increasing significantly. In 2020, the median seed round was around $1.7 million; it has risen to approximately $3 million in 2023 [10].
- The high costs associated with AI product development necessitate larger funding rounds to sustain operations, with some companies reportedly burning through $100 million to $150 million annually [13][14].

Group 5: Investment Climate
- Investors are becoming more cautious, requiring solid product-market fit (PMF) before committing to funding. The median time between seed and Series A funding has increased to 25 months, the highest in a decade [17][18].
Gary Marcus's startling claim: building AGI on pure LLMs is utterly hopeless! A paper from MIT, the University of Chicago, and Harvard goes viral
机器之心· 2025-06-29 04:23
Core Viewpoint
- The article discusses a groundbreaking paper co-authored by MIT, the University of Chicago, and Harvard, which reveals significant inconsistencies in reasoning patterns of large language models (LLMs), termed "Potemkin understanding," suggesting that the hope of creating Artificial General Intelligence (AGI) based solely on LLMs is fundamentally flawed [2][4].

Summary by Sections

Introduction
- Gary Marcus, a prominent AI scholar, highlights the paper's findings, indicating that even top models like o3 frequently exhibit reasoning errors, undermining the notion of their understanding and reasoning capabilities [2][4].

Key Findings
- The paper argues that success in benchmark tests does not equate to genuine understanding but rather reflects a superficial grasp of concepts, leading to a "Potemkin understanding" where models provide seemingly correct answers that mask a deeper misunderstanding [3][17].
- The research team identifies two methods to quantify the prevalence of the Potemkin phenomenon, revealing that it exists across various models, tasks, and domains, indicating a fundamental inconsistency in conceptual representation [17][28].

Experimental Results
- The study analyzed seven popular LLMs across 32 concepts, finding that while models could define concepts correctly 94.2% of the time, their performance in applying these concepts in tasks significantly declined, as evidenced by high Potemkin rates [29][33].
- The Potemkin rate, defined as the proportion of incorrect answers following correct responses on foundational examples, was found to be high across all models and tasks, indicating widespread issues in conceptual application [30][31].

Inconsistency Detection
- The research also assessed internal inconsistencies within models by prompting them to generate examples of specific concepts and then asking them to evaluate their own outputs, revealing substantial limitations in self-assessment capabilities [36][39].
- The inconsistency scores ranged from 0.02 to 0.64 across all examined models, suggesting that misunderstandings stem not only from incorrect concept definitions but also from conflicting representations of the same idea [39][40].

Conclusion
- The findings underscore the pervasive nature of the Potemkin understanding phenomenon in LLMs, challenging the assumption that high performance on traditional benchmarks equates to true understanding and highlighting the need for further research into the implications of these inconsistencies [40].
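The "Potemkin rate" lends itself to a small worked example. The sketch below computes it from labeled evaluation records under the definition quoted above (incorrect applications counted only after a correct definition); the data structures and aggregation details are assumptions for illustration, not the paper's released evaluation code.

```python
# Hedged sketch of the Potemkin-rate metric described above; field names and the
# exact aggregation are assumptions, not the paper's implementation.
from dataclasses import dataclass

@dataclass
class ConceptEval:
    defined_correctly: bool      # model stated the concept's definition correctly
    use_task_correct: list       # correctness of each follow-up application task

def potemkin_rate(evals):
    """Fraction of application tasks answered incorrectly, counted only where
    the model first gave a correct definition of the concept."""
    wrong = total = 0
    for e in evals:
        if not e.defined_correctly:
            continue
        total += len(e.use_task_correct)
        wrong += sum(1 for ok in e.use_task_correct if not ok)
    return wrong / total if total else 0.0

evals = [
    ConceptEval(True,  [True, False, False]),  # defines "haiku" but misapplies it
    ConceptEval(True,  [True, True]),
    ConceptEval(False, [False]),               # excluded: definition already wrong
]
print(potemkin_rate(evals))   # 0.4 -> 2 of the 5 counted applications are wrong
```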
In the age of information overload, how do you truly "understand" LLMs? Start with the 50 interview questions shared by MIT
机器之心· 2025-06-18 06:09
Core Insights
- The article discusses the rapid evolution and widespread adoption of Large Language Models (LLMs) in less than a decade, enabling millions globally to engage in creative and analytical tasks through natural language [2][3].

Group 1: LLM Development and Mechanisms
- LLMs have transformed from basic models to advanced intelligent agents capable of executing tasks autonomously, presenting both opportunities and challenges [2].
- Tokenization is a crucial process in LLMs, breaking down text into smaller units (tokens) for efficient processing, which enhances computational speed and model effectiveness [7][9].
- The attention mechanism in Transformer models allows LLMs to assign varying importance to different tokens, improving contextual understanding [10][12].
- Context windows define the number of tokens LLMs can process simultaneously, impacting their ability to generate coherent outputs [13].
- Sequence-to-sequence models convert input sequences into output sequences, applicable in tasks like machine translation and chatbots [15].
- Embeddings represent tokens in a continuous space, capturing semantic features, and are initialized using pre-trained models [17].
- LLMs handle out-of-vocabulary words through subword tokenization methods, ensuring effective language understanding [19].

Group 2: Training and Fine-tuning Techniques
- LoRA and QLoRA are fine-tuning methods that allow efficient adaptation of LLMs with minimal memory requirements, making them suitable for resource-constrained environments [34].
- Techniques to prevent catastrophic forgetting during fine-tuning include rehearsal and elastic weight consolidation, ensuring LLMs retain prior knowledge [37][43].
- Model distillation enables smaller models to replicate the performance of larger models, facilitating deployment on devices with limited resources [38].
- Overfitting can be mitigated through methods like rehearsal and modular architecture, ensuring robust generalization to unseen data [40][41].

Group 3: Output Generation and Evaluation
- Beam search improves text generation by considering multiple candidate sequences, enhancing coherence compared to greedy decoding [51].
- Temperature settings control the randomness of token selection during text generation, balancing predictability and creativity [53].
- Prompt engineering is essential for optimizing LLM performance, as well-defined prompts yield more relevant outputs [56].
- Retrieval-Augmented Generation (RAG) enhances answer accuracy by integrating relevant document retrieval with generation [58].

Group 4: Challenges and Ethical Considerations
- LLMs face challenges in deployment, including high computational demands, potential biases, and issues with interpretability and privacy [116][120].
- Addressing biases in LLM outputs involves improving data quality, enhancing reasoning capabilities, and refining training methodologies [113].
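Since the attention mechanism anchors many of the interview questions above, here is a minimal single-head, unmasked scaled dot-product attention to make the "varying importance to different tokens" idea concrete; the shapes and single-head simplification are illustrative assumptions.

```python
# Minimal illustration of scaled dot-product attention (single head, no mask).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)    # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Each output row is a weighted mix of V rows; the weights express how
    much importance each query token assigns to each key token."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # (n_q, n_k) similarity scores
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))   # 4 query tokens, dim 8
K = rng.standard_normal((6, 8))   # 6 key tokens
V = rng.standard_normal((6, 8))
out, w = attention(Q, K, V)
print(out.shape, w.shape)          # (4, 8) (4, 6)
```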
ACL 2025 | Why do the prompts you design succeed? A new theory reveals the secrets and efficacy of prompt design for large models
机器之心· 2025-06-16 04:04
Core Insights
- The article discusses the importance of prompt design in enhancing the performance of large language models (LLMs) during complex reasoning tasks, emphasizing that effective prompts can significantly improve model accuracy and efficiency [2][7][36].
- A theoretical framework is proposed to quantify the complexity of the prompt search space, transitioning prompt engineering from an empirical practice to a more scientific approach [5][35].

Group 1: Prompt Design and Its Impact
- The effectiveness of prompt engineering has historically been viewed as somewhat mystical, with certain combinations yielding significant performance boosts while others fall short [7].
- Prompts serve as critical "selectors" in the chain-of-thought (CoT) reasoning process, guiding the model in extracting relevant information from its internal hidden states [12][36].
- The study reveals that the choice of prompt templates directly influences the reasoning performance of LLMs, with optimal prompt designs leading to performance improvements exceeding 50% [29][36].

Group 2: Theoretical Framework and Experimental Evidence
- The research introduces a systematic approach to finding optimal prompts by breaking down the CoT reasoning process into two interconnected search spaces: the prompt space and the answer space [22][35].
- Experimental results demonstrate that the introduction of CoT mechanisms allows LLMs to perform recursive calculations, which are essential for tackling multi-step reasoning tasks [26][30].
- The study highlights that well-designed prompts can effectively dictate the output of each reasoning step, ensuring that only the most relevant information is utilized for subsequent calculations [28][36].

Group 3: Limitations and Future Directions
- The article notes that relying solely on generic prompts can severely limit the model's performance on complex tasks, indicating the need for tailored prompt designs [36].
- Variants of CoT, such as Tree-of-Thought (ToT) and Graph-of-Thought (GoT), can enhance performance but are still constrained by the underlying prompt templates used [32][33].
- The findings underscore the necessity of a deeper understanding of task requirements to design prompts that effectively guide LLMs in extracting and utilizing core information [23][35].
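To make the prompt-space / answer-space framing concrete, the sketch below treats prompt design as a small search problem: enumerate candidate templates and score each by accuracy on a tiny dev set. The `ask_model` function, the templates, and the dev set are hypothetical placeholders, not the paper's method or code.

```python
# Hedged sketch: prompt design as a search over templates (the "prompt space"),
# scored by how well each template steers the answer-space search on a dev set.
DEV_SET = [
    {"question": "If a train leaves at 3pm and travels for 2 hours, when does it arrive?",
     "answer": "5pm"},
    {"question": "What is 17 + 26?", "answer": "43"},
]

TEMPLATES = [
    "Answer directly: {question}",
    "Think step by step, writing each intermediate result, then answer: {question}",
    "List the quantities given, combine them one at a time, then answer: {question}",
]

def ask_model(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")  # assumption

def score_template(template: str) -> float:
    """Dev-set accuracy of one point in the prompt space."""
    hits = 0
    for ex in DEV_SET:
        reply = ask_model(template.format(question=ex["question"]))
        hits += ex["answer"].lower() in reply.lower()
    return hits / len(DEV_SET)

# best = max(TEMPLATES, key=score_template)   # pick the best-scoring template
```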
Toward an epistemology of artificial intelligence: new methods for peering into the black box
36Kr· 2025-06-16 03:46
Core Insights
- The article discusses innovative strategies to better understand and control the reasoning processes of large language models (LLMs) through mechanistic analysis and behavioral assessment [1][9].

Group 1: Mechanistic Analysis and Attribution
- Researchers are breaking down the internal computations of models, attributing specific decisions to particular components such as circuits, neurons, and attention heads [1].
- A promising idea is to combine circuit-level interpretability with chain-of-thought (CoT) verification, using causal tracing methods to check whether specific parts of the model are activated during reasoning steps [2].

Group 2: Behavioral Assessment and Constraints
- There is growing interest in developing better fidelity metrics for reasoning, focusing on whether the model's reasoning steps genuinely contribute to the final answer [3].
- The concept of using auxiliary models for automated CoT evaluation is gaining traction, where a verification model assesses whether the answer follows logically from the reasoning provided [4].

Group 3: AI-Assisted Interpretability
- Researchers are exploring the use of smaller models as probes to help explain the activations of larger models, potentially leading to a better understanding of complex circuits [5].
- Cross-architecture interpretability is being discussed, aiming to identify similar reasoning circuits in visual and multimodal models [6].

Group 4: Interventions and Model Editing
- A promising methodology involves circuit-based interventions, where researchers modify or disable certain attention heads to observe changes in model behavior [7].
- Future evaluations may include fidelity metrics as standard benchmarks, assessing how well models adhere to known necessary facts during reasoning [7].

Group 5: Architectural Innovations
- Researchers are considering architectural changes to enhance interpretability, such as building models with inherently decoupled representations [8].
- There is a shift toward evaluating models in adversarial contexts to better understand their reasoning processes and identify weaknesses [8].

Group 6: Collaborative Efforts and Future Directions
- The article highlights significant advancements in interpretability research over the past few years, with collaborations forming across organizations to tackle these challenges [10].
- The goal is to ensure that as more powerful AI systems emerge, there is a clearer understanding of their operational mechanisms [10].
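A minimal, self-contained example of the circuit-based intervention idea in Group 4 is sketched below: zero out one attention head in a toy multi-head attention layer and measure how much the output moves. It is a stand-in written for illustration, not instrumentation for any specific model mentioned in the article.

```python
# Toy illustration of a circuit-style intervention: disable one attention head
# and compare outputs against the unmodified forward pass.
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyMHA(nn.Module):
    def __init__(self, dim=32, heads=4):
        super().__init__()
        self.h, self.d = heads, dim // heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x, ablate_head=None):
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        shape = (B, T, self.h, self.d)
        q, k, v = (t.view(shape).transpose(1, 2) for t in (q, k, v))  # (B, h, T, d)
        att = torch.softmax(q @ k.transpose(-2, -1) / self.d ** 0.5, dim=-1)
        heads_out = att @ v                        # (B, h, T, d)
        if ablate_head is not None:
            heads_out[:, ablate_head] = 0.0        # the intervention: zero one head
        merged = heads_out.transpose(1, 2).reshape(B, T, self.h * self.d)
        return self.out(merged)

mha = TinyMHA()
x = torch.randn(1, 5, 32)
baseline = mha(x)
ablated = mha(x, ablate_head=2)
print((baseline - ablated).abs().mean())   # how much head 2 mattered on this input
```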
"Multimodal approaches cannot achieve AGI"
AI前线· 2025-06-14 04:06
Core Viewpoint
- The article argues that true Artificial General Intelligence (AGI) requires a physical understanding of the world, as many problems cannot be reduced to symbolic operations [2][4][21].

Group 1: Limitations of Current AI Models
- Current large language models (LLMs) may give the illusion of understanding the world, but they primarily learn collections of heuristics for predicting tokens rather than developing a genuine world model [4][5][7].
- The understanding of LLMs is superficial, leading to misconceptions about their intelligence levels, as they do not perform physical simulations when processing language [8][12][20].

Group 2: The Need for Embodied Cognition
- The pursuit of AGI should prioritize embodied intelligence and interaction with the environment rather than merely combining multiple modalities into a patchwork solution [1][15][23].
- A unified approach to processing different modalities, inspired by human cognition, is essential for developing AGI that can generalize across various tasks [19][23].

Group 3: Critique of Multimodal Approaches
- Current multimodal models often artificially sever the connections between modalities, complicating the integration of concepts and hindering the development of a coherent understanding [17][18].
- Relying on large-scale models to stitch together narrow-domain capabilities is unlikely to yield a fully cognitive AGI, as it does not address the fundamental nature of intelligence [21][22].

Group 4: Future Directions for AGI Development
- The article suggests that future AGI development should focus on interactive and embodied processes, leveraging insights from human cognition and classical disciplines [23][24].
- The challenge lies in identifying the functions necessary for AGI and arranging them into a coherent whole, which is more a conceptual issue than a mathematical one [23].
Toward an epistemology of artificial intelligence: does no one really understand how the black box of large language models (LLMs) works?
36Kr· 2025-06-13 06:01
Group 1
- The core issue revolves around the opacity of large language models (LLMs) like GPT-4, which function as "black boxes," making their internal decision-making processes largely inaccessible even to their creators [1][4][7].
- Recent research highlights the disconnect between the reasoning processes of LLMs and the explanations they provide, raising concerns about the reliability of their outputs [2][3][4].
- The discussion includes the emergence of human-like reasoning strategies within LLMs, despite the lack of transparency in their operations [1][3][12].

Group 2
- The article explores the debate on whether LLMs exhibit genuine emergent capabilities or whether these are merely artifacts of measurement [2][4].
- It emphasizes the importance of understanding the fidelity of chain-of-thought (CoT) reasoning, noting that the explanations provided by models may not accurately reflect their actual reasoning paths [2][5][12].
- The role of the Transformer architecture in supporting reasoning and the unintended consequences of alignment techniques, such as Reinforcement Learning from Human Feedback (RLHF), are discussed [2][5][12].

Group 3
- Methodological innovations are being proposed to bridge the gap between how models arrive at answers and how they explain themselves, including circuit-level attribution and quantitative fidelity metrics [5][6][12].
- The implications for safety and deployment in high-risk areas, such as healthcare and law, are examined, stressing the need for transparency in AI systems before their implementation [6][12][13].
- The article concludes with a call for robust verification and monitoring standards to ensure the safe deployment of AI technologies [2][6][12].
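One of the quantitative fidelity metrics mentioned above can be approximated crudely: corrupt the model's stated chain of thought and check whether the final answer survives. The sketch below assumes a hypothetical `complete` function for the model call and a naive answer extractor; it illustrates the general idea rather than any specific metric from the article.

```python
# Hedged sketch of a simple faithfulness probe for chain-of-thought explanations:
# if corrupting the stated reasoning never changes the final answer, the chain is
# unlikely to reflect the computation that produced it.
import re

def complete(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM client here")  # assumption

def final_answer(text: str) -> str:
    m = re.search(r"answer\s*[:=]\s*(.+)", text, re.IGNORECASE)
    return m.group(1).strip() if m else text.strip().splitlines()[-1]

def cot_fidelity_probe(question: str) -> float:
    """Fraction of trials where corrupting the model's own reasoning changes
    its final answer (higher = more faithful, under this crude proxy)."""
    original = complete(f"{question}\nThink step by step, then end with 'Answer: ...'.")
    answer = final_answer(original)
    lines = [l for l in original.splitlines() if l.strip()]
    if len(lines) < 2:
        return 0.0
    changed = 0
    for i in range(len(lines) - 1):        # never drop the final answer line itself
        corrupted = "\n".join(l for j, l in enumerate(lines) if j != i)
        resumed = complete(
            f"{question}\nHere is some reasoning:\n{corrupted}\n"
            "Based only on this reasoning, end with 'Answer: ...'."
        )
        changed += final_answer(resumed) != answer
    return changed / (len(lines) - 1)
```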
喝点VC | a16z on the big shift in search: search enters a new paradigm of "Generative Engine Optimization (GEO)" led by language models
Z Potentials· 2025-06-12 04:24
Core Insights
- The article discusses the transition from traditional Search Engine Optimization (SEO) to Generative Engine Optimization (GEO), highlighting the impact of large language models (LLMs) on search behavior and marketing strategies [3][5][21].
- It emphasizes that the SEO market, valued at over $80 billion, is facing challenges as search behavior shifts from browsers to LLM platforms, fundamentally altering how exposure and content optimization are defined [3][5][9].

Transition from Links to Language Models
- Traditional search relied on link-based ranking, while GEO focuses on language and direct answers generated by models [4][5].
- The average query length has increased significantly to 23 words, compared to just 4 words in traditional searches, indicating deeper user engagement [4].
- LLMs provide personalized responses through memory and reasoning capabilities, changing the logic of content discovery and optimization [4][5].

New Metrics and Competitive Focus
- The focus of competition has shifted from click-through rates to "model citation rates," where brands need to be encoded into AI layers to build new competitive barriers [5][12].
- Emerging platforms like Profound and Goodie help brands analyze their presence in AI-generated answers and track sentiment in model outputs [12][13].

Brand Strategy Evolution
- A new brand strategy is emerging that prioritizes model recognition over public recognition, with "unprompted awareness" becoming a key metric in the AI era [12][14].
- Tools like Ahrefs' Brand Radar and Semrush's AI toolkit are adapting to help brands monitor their visibility and mentions in generative platforms [13][14].

The Rise of GEO Tools
- GEO tools are not just about data measurement but also about actively shaping LLM behavior through insights and iterative feedback loops [20].
- Companies that excel in GEO will create actionable infrastructures for real-time marketing activities and content optimization [20][21].

Timing and Market Dynamics
- The article notes that the transition to GEO is still in its early stages, with significant opportunities for brands to adapt as advertising budgets shift rapidly [21][22].
- The ultimate question for marketers in the AI-driven landscape is whether models will remember their brands [22].
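A back-of-the-envelope version of the "model citation rate" metric could look like the sketch below: run a fixed basket of buyer-intent queries through a generative engine and count the share of answers that name the brand. The `generate_answer` call and the query basket are hypothetical placeholders; commercial tools such as Profound presumably track far more (sentiment, position, cited sources).

```python
# Hedged sketch of a minimal "model citation rate": share of generated answers
# that mention a brand across a fixed set of queries.
QUERIES = [
    "best project management tool for a small startup",
    "which CRM should a 10-person sales team use",
    "affordable alternatives to enterprise analytics suites",
]

def generate_answer(query: str) -> str:
    raise NotImplementedError("call your LLM / answer engine here")  # assumption

def citation_rate(brand: str, queries=QUERIES) -> float:
    """Share of generated answers that mention the brand at least once."""
    hits = sum(brand.lower() in generate_answer(q).lower() for q in queries)
    return hits / len(queries)

# print(citation_rate("ExampleBrand"))   # e.g. 0.33 -> named in 1 of 3 answers
```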
No new Siri at this week's WWDC? Wall Street questions Apple's AI capabilities
Hua Er Jie Jian Wen· 2025-06-09 02:43
Core Insights
- Apple's upcoming WWDC on June 9 is expected to disappoint investors due to ongoing challenges in upgrading Siri and integrating advanced large language models (LLMs) into its AI functionality, "Apple Intelligence" [1][4].
- The integration of LLMs to enhance Siri's conversational abilities has faced significant technical difficulties, leading to numerous bugs that competitors like OpenAI and Google have not encountered [3][8].
- The delay in launching the upgraded Siri has contributed to a decline of approximately 18% in Apple's stock price since the beginning of 2025, making it the worst performer among the "Tech Seven" giants [4].

Siri Upgrade Challenges
- Apple is attempting to improve Siri's capabilities so it responds more like a human, but the integration process has been plagued by bugs, which has hindered progress [3].
- A former Apple executive criticized the gradual development approach, stating that it cannot fundamentally transform Siri [3].
- Analysts suggest that it may take Apple three years or more to deliver a modernized AI assistant, significantly lagging behind competitors [8].

Market Reactions
- Investor sentiment has soured due to repeated delays of the "Apple Intelligence" features, leading to low expectations for the upcoming WWDC [4].
- Analysts from Morgan Stanley and Bank of America have expressed concerns about Apple's ability to meet its previous commitments regarding AI advancements [4][8].

Strategic Focus Shift
- The upcoming WWDC may focus more on brand restructuring than on significant technological breakthroughs, with plans to rebrand operating systems and repackage existing features as "AI-driven" [9].
- Apple is expected to announce the opening of its foundational models to third-party developers, although its LLM capabilities are significantly less complex than those of competitors [9].
- Internal sources indicate that expectations for the AI segment of the conference are low, raising concerns about Apple's visibility in the AI space [9].