Large Language Models
No professional background, yet he built a company valued at $700 million
Hu Xiu· 2025-09-15 04:49
Core Insights
- Legora is rapidly growing in the legal tech sector, having expanded from Europe to the US and partnered with 250 law firms, including top firms like Cleary Gottlieb and Goodwin [1][2]
- The company recently raised $80 million in Series B funding, achieving a valuation of $675 million, positioning itself as a strong competitor to Harvey [2]
- The founder, Max Junestrand, emphasizes the importance of humility and collaboration with early partners to navigate the rapidly changing legal industry [3]

Product Overview
- Legora's product consists of a web application and a Word plugin, integrating AI functionalities into Microsoft Word [4]
- The web application has evolved from a simple chat feature to a sophisticated intelligent agent capable of managing complex workflows [5][6]
- The "Tabular Review" feature allows users to input multiple documents and queries for simultaneous processing, addressing the complexities of legal documents [9][10]

Sales Strategy
- Legora adopts a "win-win" approach in sales, positioning itself as a long-term partner for law firms needing to adopt new technologies [18][20]
- The company recognizes that many legal services are similar, leading to price pressures and a need for efficiency, which drives firms to adopt new technologies [21][22]
- Law firms are motivated to become leaders in adopting technology to maintain their competitive edge [23][24]

Competitive Landscape
- Legora competes with established legal tech companies but believes that the rapid pace of AI development allows it to outpace larger firms in product delivery [41][44]
- The company has successfully built a team of around 100 employees, significantly increasing its development speed compared to larger competitors [45][46]
- Law firms are increasingly reluctant to commit to long-term contracts, preferring shorter agreements that allow for flexibility in technology adoption [46][47]

Future Outlook
- The role of lawyers is expected to shift towards being reviewers rather than executors, managing AI outputs and ensuring quality [51][52]
- The company aims to be a strategic partner for law firms, helping them navigate the transformation brought about by AI [61]
- Junestrand advises new entrants in the legal tech space to avoid being tied to single suppliers and to find unique niches that AI cannot easily penetrate [63][64]

Recruitment and Culture
- Legora prioritizes hiring individuals with entrepreneurial backgrounds, fostering a culture of creativity and problem-solving [70][72]
- The company has expanded from 10 to 100 employees in a year, emphasizing the importance of hiring proactive team members who can leverage AI for greater efficiency [67][68]
Farewell to the tedium of ROS, an easy-to-use and easy-to-learn robot learning system: Huawei Noah's open-source Python framework for robot learning
机器之心· 2025-09-15 04:00
Figure 1: The overall framework of Ark

In recent years, robotics has achieved remarkable breakthroughs on the hardware side: from the DARPA Robotics Challenge to the first humanoid-robot freestyle fighting exhibition, the progress has been striking. Yet robots' autonomous capabilities still lag well behind the pace of machine learning.

The key bottleneck behind this gap lies in software: existing robotics stacks have a steep learning curve, still rely heavily on C/C++ for low-level development, and suffer from fragmented toolchains and complex hardware integration. By contrast, the ecosystem powering modern AI is Python-centric, well documented, and easy to use; the two stand in stark contrast.

To meet these challenges, researchers from Huawei Noah's Ark Lab, TU Darmstadt (Germany), University College London (UK), Imperial College London, and the University of Oxford have jointly released Ark, a Python-based robot development framework that supports rapid prototyping and makes it easy to deploy new algorithms on both simulated and real robot systems.

Ark is deeply compatible with mainstream machine learning workflows: it can collect and preprocess data from simulation or from real robots, and it supports policy training with cutting-edge imitation learning methods such as ACT and Diffusion Policy. The framework adopts an OpenAI Gym-style main interface, which greatly lowers the barrier to entry for machine learning researchers and eases integration and experimentation (a minimal sketch of such an interface follows below) ...
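As a rough illustration of what a Gym-style main interface looks like in practice, here is a minimal, self-contained sketch built on the `gymnasium` API; the environment class, its spaces, and its dynamics are invented for the example and are not Ark's actual API.

```python
# A minimal Gym-style robot environment sketch (hypothetical; not Ark's actual API).
import gymnasium as gym
import numpy as np


class ToyArmEnv(gym.Env):
    """A 2-DoF toy arm: observations are joint angles, actions are joint velocity commands."""

    def __init__(self):
        self.observation_space = gym.spaces.Box(low=-np.pi, high=np.pi, shape=(2,), dtype=np.float32)
        self.action_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
        self._q = np.zeros(2, dtype=np.float32)  # joint angles

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._q = self.np_random.uniform(-0.1, 0.1, size=2).astype(np.float32)
        return self._q.copy(), {}

    def step(self, action):
        self._q = np.clip(self._q + 0.05 * action, -np.pi, np.pi)  # integrate the velocity command
        reward = -float(np.linalg.norm(self._q))  # drive the arm toward the zero pose
        terminated = bool(np.linalg.norm(self._q) < 1e-2)
        return self._q.copy(), reward, terminated, False, {}


env = ToyArmEnv()
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
```

The appeal of this interface style is that the same `reset`/`step` loop works unchanged whether the backend is a simulator or real hardware, which is exactly what makes policy-training code portable.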
As a line of research, VLA at least offers a possible way out of the endless stream of corner cases!
自动驾驶之心· 2025-09-15 03:56
Core Viewpoint
- VLA (Vision-Language-Action) is emerging as a mainstream keyword in autonomous driving, with new players rapidly entering the field and industrial production accelerating, while academia continues to innovate and compete [1][2]

Summary by Sections

1. VLA Research and Development
- The VLA model represents a shift from traditional modular architectures to a unified end-to-end model that directly maps raw sensor inputs to driving control commands, addressing previous bottlenecks in autonomous driving technology (a toy sketch of such a model follows this summary) [3][4]
- Traditional modular architectures (L2-L4) have clear advantages in terms of logic and independent debugging but suffer from cumulative error effects and information loss, making them less effective in complex traffic scenarios [4][5]

2. VLA Model Advantages
- The introduction of VLA models leverages the strengths of large language models (LLMs) to enhance interpretability, reliability, and the ability to generalize to unseen scenarios, thus overcoming limitations of earlier models [5][6]
- VLA models can explain their decision-making processes in natural language, improving transparency and trust in autonomous systems [5][6]

3. Course Objectives and Structure
- The course aims to provide a systematic understanding of VLA, helping participants develop practical skills in model design and research paper writing, while also addressing common challenges faced by newcomers in the field [6][7]
- The curriculum includes 12 weeks of online group research, followed by 2 weeks of paper guidance and 10 weeks of paper maintenance, focusing on both theoretical knowledge and practical coding skills [7][8]

4. Enrollment and Requirements
- The program is designed for a small group of 6 to 8 participants, targeting individuals with a foundational understanding of deep learning and basic programming skills [11][16]
- Participants are expected to engage actively in discussions and complete assignments on time, maintaining academic integrity throughout the course [20][29]

5. Course Highlights
- The course offers a comprehensive learning experience with a multi-faceted teaching approach, including guidance from experienced mentors and a structured evaluation system to track progress [23][24]
- Participants will gain access to essential resources, including datasets and baseline codes, to facilitate their research and experimentation [24][25]
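To make the "raw sensor inputs to driving control commands" mapping concrete, here is a toy PyTorch sketch of a VLA-style model that fuses camera features with an instruction embedding and emits a normalized steering/acceleration command. All module names, sizes, and the fusion scheme are illustrative assumptions, not any specific published architecture.

```python
# Toy VLA-style model: fuse vision features and a language instruction into a control command.
# Purely illustrative; layer sizes and the fusion scheme are assumptions, not a published design.
import torch
import torch.nn as nn


class ToyVLA(nn.Module):
    def __init__(self, vocab_size=1000, d_model=128):
        super().__init__()
        self.vision = nn.Sequential(  # tiny CNN encoder for a single camera frame
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, d_model),
        )
        self.text = nn.EmbeddingBag(vocab_size, d_model)  # bag-of-tokens instruction encoder
        self.policy = nn.Sequential(  # action head: [steering, acceleration]
            nn.Linear(2 * d_model, d_model), nn.ReLU(), nn.Linear(d_model, 2), nn.Tanh(),
        )

    def forward(self, image, instruction_tokens):
        fused = torch.cat([self.vision(image), self.text(instruction_tokens)], dim=-1)
        return self.policy(fused)  # normalized [steer, accel] in [-1, 1]


model = ToyVLA()
image = torch.randn(1, 3, 128, 128)   # one camera frame
tokens = torch.tensor([[17, 42, 7]])  # hypothetical tokenized instruction
action = model(image, tokens)         # -> tensor of shape (1, 2)
```

Real VLA systems replace both encoders with a pretrained vision-language backbone, but the end-to-end shape (sensors and language in, control out) is the same.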
Cutting the KV cache budget to 1.5%! They used an evolutionary algorithm to slash large-model memory usage
机器之心· 2025-09-14 05:16
机器之心 report | Editor: Zhang Qian

With only 1.5% of the memory budget, performance can surpass that of a model using the full KV cache, meaning the inference cost of large language models can be cut substantially. This EvolKV result offers a new angle on memory optimization for real-world deployment.

The key-value cache (KV cache) has become a core technique for fast large-model inference. It works like a "memory bank," storing previously computed results for reuse so the same content does not have to be recomputed every time.

But this memory bank has a problem: the longer the input text, the more storage it needs, and the model becomes very slow when processing long texts.

To cope with these challenges, existing KV cache compression methods rely mainly on rule-based heuristics; current methods can be grouped into three paradigms.

While these methods are effective at reducing memory usage, they fail to account for two key issues:
- Relying solely on rule-based, layer-wise allocation of the KV cache budget, which may prevent task-relevant information from being optimally preserved.

To address these limitations, Bohan Yu of the University of Chinese Academy of Sciences and the Institute of Automation, Chinese Academy of Sciences, together with Yekun Chai of ETH Zurich, inspired by (Chai et al., 2022), use an evolutionary algorithm to search for the optimal KV cache allocation directly based on task performance (a simplified sketch of such a search loop follows below).

Image source: https://x.com/rohanp ...
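To illustrate the idea of searching KV cache allocations by task performance, here is a simplified evolutionary-search sketch in Python: a population of per-layer budget vectors is mutated and selected by a fitness function. The `evaluate` function is a stand-in you would replace with a real task-evaluation harness; the whole loop is a generic sketch of the approach, not EvolKV's actual algorithm.

```python
# Simplified evolutionary search over per-layer KV cache budgets.
# Not EvolKV's actual algorithm; a generic (mu + lambda)-style sketch of the idea.
import random

NUM_LAYERS = 32
TOTAL_BUDGET = 512          # total KV entries to distribute across layers
POP, GENS, CHILDREN = 8, 20, 16


def evaluate(budgets):
    """Stub fitness: run the model keeping `budgets[i]` KV entries at layer i
    and return a task score (higher is better). Replace with a real harness."""
    # Placeholder scoring that just makes the loop runnable.
    return sum(b * (i + 1) for i, b in enumerate(budgets)) + random.gauss(0, 1)


def mutate(budgets):
    """Move a small amount of budget from one random layer to another."""
    child = list(budgets)
    src, dst = random.sample(range(NUM_LAYERS), 2)
    delta = min(child[src], random.randint(1, 8))
    child[src] -= delta
    child[dst] += delta  # total budget stays constant
    return child


# Initialize a population of near-uniform allocations, then evolve.
uniform = [TOTAL_BUDGET // NUM_LAYERS] * NUM_LAYERS
population = [mutate(uniform) for _ in range(POP)]
for _ in range(GENS):
    children = [mutate(random.choice(population)) for _ in range(CHILDREN)]
    population = sorted(population + children, key=evaluate, reverse=True)[:POP]

print("best per-layer budgets:", max(population, key=evaluate))
```

The key design choice this sketch mirrors is that fitness comes from downstream task performance rather than a hand-written rule, which is exactly what the rule-based heuristics above cannot capture.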
AI solves math problems using only the last token
量子位· 2025-09-14 05:05
Core Insights
- The research indicates that in mental arithmetic tasks, the majority of calculations are concentrated on the last token, rather than being distributed across all tokens, suggesting that global information access is not necessary for specific tasks like mental arithmetic [1][11]

Group 1: Research Methodology
- Researchers employed Context-Aware Mean Ablation (CAMA) and attention-based peeking techniques to conduct a series of ablation experiments on models like Llama-3-8B [2][22]
- The experiments aimed to identify the "minimum computation" required for models to perform well by systematically removing or altering parts of the model [3]
- A sparse subgraph termed "All-for-One" (AF1) was identified, which allows efficient computation with minimal layers and limited information transfer [4][5]

Group 2: Model Structure and Functionality
- In the AF1 structure, initial layers (L_wait) do not perform calculations related to their own values but instead focus on general preparatory tasks [7]
- Information is transferred to the last token through intermediate layers (L_transfer), which then independently performs the final calculations (see the mask-building sketch after this summary) [8][9]
- This separation of general computation and input-specific computation highlights the model's efficiency in handling arithmetic tasks [10]

Group 3: Experimental Findings
- The experiments revealed that Llama-3-8B requires only the first 14 layers for general computation, followed by 2 layers for information transfer, with the remaining layers dedicated to the last token's self-computation [24][26]
- AF1_llama demonstrated high fidelity across eight tasks, maintaining performance levels close to the original model [28][29]
- The importance of specific attention heads in arithmetic calculations was confirmed, with the model retaining approximately 95% accuracy even after removing nearly 60 heads, indicating redundancy in attention heads [30]

Group 4: Generalization and Limitations
- AF1_llama was tested for its ability to generalize to other arithmetic forms, showing high accuracy in direct arithmetic tasks but failing in tasks requiring semantic understanding, such as word problems and Python code [32][34]
- Similar AF1-like subgraphs were found in Pythia and GPT-J models, although these models exhibited shorter waiting periods and less clear performance boundaries compared to Llama [35][36]

Group 5: Contributions and Innovations
- This research contributes to the understanding of arithmetic reasoning and cross-token computation mechanisms in large language models [37]
- The methodologies introduced, CAMA and ABP, offer innovative approaches that could extend beyond arithmetic tasks to broader applications [37]
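To make the layer banding concrete, the following NumPy sketch builds per-layer attention masks in the AF1 spirit: "wait" layers restrict each token to itself, "transfer" layers additionally let the last token read every position, and the remaining layers leave the last token computing alone. The band boundaries follow the figures quoted above for Llama-3-8B, but the masking scheme itself is an illustrative simplification, not the paper's exact CAMA/ABP procedure.

```python
# Illustrative AF1-style attention masks (True = attention allowed).
# The band boundaries match the quoted Llama-3-8B figures; the scheme is a simplification.
import numpy as np

SEQ_LEN = 8
L_WAIT, L_TRANSFER = 14, 2  # first 14 layers "wait", the next 2 transfer to the last token


def af1_mask(layer, seq_len=SEQ_LEN):
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    np.fill_diagonal(mask, True)            # every token may always attend to itself
    if L_WAIT <= layer < L_WAIT + L_TRANSFER:
        mask[-1, :] = True                  # transfer band: last token reads every position
    elif layer >= L_WAIT + L_TRANSFER:
        mask[-1, :] = False
        mask[-1, -1] = True                 # suffix band: last token computes alone
    return mask                             # wait band: diagonal only


for layer in (0, 14, 20):
    print(f"layer {layer:2d}: last-token row ->", af1_mask(layer)[-1].astype(int))
```

Printing the last-token row for one layer in each band shows the three regimes directly: self-only, read-everything, then self-only again.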
Meta open-sources the MobileLLM-R1 models: under 1B parameters, surpassing Qwen3 with 1/10 of the training
机器之心· 2025-09-13 08:54
Core Viewpoint
- Meta AI has officially released the MobileLLM-R1 series, which includes efficient sub-billion parameter language models optimized for on-device use cases, demonstrating significant performance improvements compared to existing open-source models [4][8]

Group 1: Model Performance and Features
- The MobileLLM-R1 series includes three base models: MobileLLM-R1-140M, MobileLLM-R1-360M, and MobileLLM-R1-950M, which are not general chat models but are supervised fine-tuned (SFT) for specific tasks such as mathematics, programming (Python, C++), and scientific questions [6][8]
- The largest model, MobileLLM-R1-950M, was pre-trained using approximately 2 trillion high-quality tokens, achieving performance comparable to models trained on 36 trillion tokens, such as Qwen3 0.6B [8]
- MobileLLM-R1-950M outperforms existing models in various benchmarks, achieving five times higher accuracy on the MATH benchmark compared to the Olmo 1.24B model and twice as high as the SmolLM2 1.7B model [10]

Group 2: Model Architecture and Efficiency
- The architecture of the MobileLLM-R1 models includes varying layers and parameters, with MobileLLM-R1-950M having 22 layers and 949 million parameters, while the smaller models have 15 layers and 140 million to 360 million parameters (captured in the config sketch after this summary) [14]
- The models are designed for text input and output, with a context length of 4k for base models and 32k for final models, supporting a vocabulary size of 128k [15]

Group 3: Research and Development Team
- The development of the MobileLLM-R1 series was led by a team of researchers, including Zechun Liu, Ernie Chang, and Changsheng Zhao, who have extensive backgrounds in natural language processing and model optimization [18][21][30]
- The project took a year to develop, focusing on efficient deployment and optimization of large language models for resource-constrained environments [18][22]
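For quick reference, the architecture figures quoted above can be collected into a small config sketch; only values stated in the article (layers, parameters, context lengths, vocabulary size) are included, and the dataclass itself is just a convenience for this summary, not anything shipped with the models.

```python
# Config sketch capturing only the MobileLLM-R1 figures stated above.
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str
    layers: int
    params_millions: int
    base_context: int = 4_096     # 4k context for base models, per the article
    final_context: int = 32_768   # 32k context for final models
    vocab_size: int = 128_000     # "128k" vocabulary as stated; exact count may differ


SPECS = [
    ModelSpec("MobileLLM-R1-140M", layers=15, params_millions=140),
    ModelSpec("MobileLLM-R1-360M", layers=15, params_millions=360),
    ModelSpec("MobileLLM-R1-950M", layers=22, params_millions=949),
]

for spec in SPECS:
    print(f"{spec.name}: {spec.layers} layers, ~{spec.params_millions}M params")
```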
100 rounds of tool calls: an 8B small model can handle complex long-horizon search too, newly open-sourced by MiniMax & HKUST
36Kr· 2025-09-12 12:25
Core Insights
- The core issue with current network search agents is not the model parameters but the lack of sufficiently challenging training data [1][5][6]
- The proposed method, WebExplorer, aims to create high-quality QA pairs that enable smaller models to outperform larger ones in complex search tasks [1][8][19]

Group 1: Training Data Quality
- High-quality training data is scarce, which limits the performance of existing open-source network agents in complex search tasks [5][6]
- The development of high-capacity network search agents fundamentally relies on improving the quality of training data [6][19]

Group 2: WebExplorer Methodology
- WebExplorer employs a two-stage approach, Model-Based Exploration and Iterative Query Evolution, to create challenging QA pairs (a stubbed sketch of the evolution loop follows this summary) [8][10]
- The first stage allows the model to autonomously explore the information space, while the second stage increases query difficulty by removing clear clues and introducing strategic ambiguity [10][12]

Group 3: Performance and Results
- The WebExplorer-8B model, trained using the new QA dataset, supports long-horizon reasoning with a context length of 128K and up to 100 tool calls, achieving state-of-the-art performance among models of similar size [3][16]
- The model demonstrated significant performance improvements, with accuracy dropping from 86.6% to 67.1% for strong commercial models, indicating the effectiveness of the evolutionary process in creating complex queries [15][19]

Group 4: Generalization and Application
- WebExplorer's QA pair synthesis method shows effective generalization across different benchmarks and domains, even outside STEM fields [19]
- The approach highlights the potential for smaller models to excel in complex tasks through carefully designed data synthesis methods and training strategies, which is crucial for AI applications in resource-constrained environments [19]
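To sketch the shape of the two-stage recipe, here is a stubbed Python version of iterative query evolution: each round asks a model to strip explicit clues from a question, and the rewrite is kept only if a baseline solver now fails more often. The `llm` and `solver` functions are toy placeholders for real model calls, and the loop is an assumption about the general form of the method, not WebExplorer's actual pipeline.

```python
# Stubbed sketch of iterative query evolution (not WebExplorer's actual pipeline).
# `llm` and `solver` are toy placeholders for real model calls.
import random


def llm(prompt: str) -> str:
    """Placeholder rewrite step; a real version would call a language model."""
    question = prompt.rsplit("\n", 1)[-1]
    return question + " (explicit clues removed)"


def solver(question: str) -> str:
    """Placeholder baseline; a real version would run a search-agent attempt."""
    # Toy behavior: longer (more obfuscated) questions are answered less reliably.
    return "gold" if random.random() > len(question) / 200 else "miss"


def evolve_query(question: str, answer: str = "gold", rounds: int = 3, trials: int = 20):
    """Keep a rewrite only if the baseline solver's accuracy drops."""
    for _ in range(rounds):
        harder = llm(
            "Rewrite this question so it keeps the same unique answer but removes "
            f"explicit clues and adds strategic ambiguity:\n{question}"
        )
        old_acc = sum(solver(question) == answer for _ in range(trials)) / trials
        new_acc = sum(solver(harder) == answer for _ in range(trials)) / trials
        if new_acc < old_acc:
            question = harder  # accept the harder variant
    return question


print(evolve_query("Which 2019 paper introduced the T5 model?"))
```

Gating each rewrite on a measured accuracy drop is what turns "make it harder" from a vague prompt into a difficulty curriculum, mirroring the 86.6% to 67.1% accuracy drop reported above.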
博实结 (301608) - 301608 Investor Relations Activity Record, September 12, 2025
2025-09-12 11:23
Group 1: Company Overview
- The company specializes in the research, production, and sales of IoT intelligent products, focusing on communication, positioning, and AI technologies [1]
- It aims to become a global expert in IoT intelligent application solutions, adhering to the mission of "empowering everything with wisdom" [1]
- In 2024, the company achieved a revenue of CNY 1.402 billion, a year-on-year increase of 24.85%, and a net profit of CNY 176 million, an increase of 0.81% [1]

Group 2: Recent Performance
- In the first half of 2025, the company reported a revenue of CNY 805 million, a year-on-year increase of 20.17%, and a net profit of CNY 108 million, an increase of 19.07% [2]

Group 3: Cloud Management Platform
- The cloud management platform addresses the fragmented nature of the IoT market by providing standardized solutions that can be customized for diverse industry needs [2]
- The platform has been enhanced with the local deployment of the Deepseek large language model and the Tongyi Qianwen video analysis model, improving user experience and technical capabilities [2]

Group 4: Smart Sleep Terminal
- The smart sleep terminal uses an ODM business model, allowing it to integrate seamlessly into existing home environments without requiring changes [2]
- It tracks and analyzes users' sleep patterns in real time, adjusting temperature based on individual needs to enhance sleep quality [2]
- The primary markets for the smart sleep terminal include North America, Europe, the Middle East, and East Asia, with plans to enter the domestic market following product certification [2]
Official Claude post: How do you build a good tool for an Agent?
Founder Park· 2025-09-12 10:06
Core Insights
- Anthropic has introduced new features in Claude that allow direct creation and editing of various mainstream office documents, expanding AI's application scenarios in practical tasks [2]
- The company emphasizes the importance of designing intuitive tools for uncertain, reasoning AI rather than traditional programming methods [4]
- A systematic evaluation of tools using real and complex tasks is essential to validate their effectiveness [5]

Group 1
- The focus is on creating integrated workflow tools rather than isolated functionalities, which significantly reduces the reasoning burden on AI [6]
- Clear and precise descriptions of tools are crucial for AI to understand their purposes, enhancing the success rate of tool utilization [7]
- The article outlines key principles for writing high-quality tools, emphasizing the need for systematic evaluation and collaboration with AI to improve tool performance [13][36]

Group 2
- Tools should be designed to reflect the unique affordances of AI agents, allowing them to perceive potential actions differently than traditional software [15][37]
- The article suggests building a limited number of well-designed tools targeting high-impact workflows, rather than numerous overlapping functionalities [38]
- Naming conventions and namespaces are important for helping AI agents choose the correct tools among many options (see the tool-definition sketch after this summary) [40]

Group 3
- Tools should return meaningful context to AI, prioritizing high-information signals over technical identifiers to improve task performance [43]
- Optimizing tool responses for token efficiency is crucial, with recommendations for pagination and filtering to manage context effectively [48]
- The article advocates for prompt engineering in tool descriptions to guide AI behavior and improve performance [52]

Group 4
- The future of tool development for AI agents involves shifting from predictable, deterministic patterns to non-deterministic approaches [54]
- A systematic, evaluation-driven method is essential for ensuring that tools evolve alongside increasingly powerful AI agents [54]
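As an illustration of the naming, namespacing, and description principles above, here is a hypothetical tool definition in the shape used by the Anthropic Messages API (`name`, `description`, `input_schema`); the tool and every field in it are invented for the example.

```python
# Hypothetical tool definition in the Anthropic Messages API shape.
# The tool, its name, and all its fields are invented for illustration.
search_contracts_tool = {
    "name": "contracts_search",  # namespaced as <domain>_<action> to disambiguate among many tools
    "description": (
        "Search the firm's contract repository by keyword. Returns at most "
        "`limit` matches per page, each with a human-readable title, "
        "counterparty, and signing date. Use `page` to fetch further results."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Keywords to search for."},
            "limit": {"type": "integer", "minimum": 1, "maximum": 20, "default": 10},
            "page": {"type": "integer", "minimum": 1, "default": 1},
        },
        "required": ["query"],
    },
}
```

Note how the description doubles as prompt engineering: it tells the agent not just what the tool does but how to paginate, which is exactly the kind of guidance the article recommends embedding in tool definitions.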
Claude's secret: how smart an AI is depends on what tools you give it | Jinqiu Select
锦秋集· 2025-09-12 08:48
Core Insights
- Anthropic has introduced new features in Claude that allow direct creation and editing of various mainstream office files, expanding AI's application in practical tasks [1]
- The company emphasizes a shift in mindset towards designing tools for AI agents rather than traditional coding practices [3]
- The effectiveness of AI agents is heavily reliant on the quality and design of the tools provided to them [8]

Group 1: Tool Design Principles
- The core principle is to design intuitive and user-friendly tools for uncertain, reasoning AI, rather than focusing solely on input-output like traditional programming [3]
- Tools should be evaluated through real and complex tasks to ensure they meet practical needs and can identify genuine issues [4]
- It is more beneficial to create integrated workflow tools that handle multi-step tasks rather than offering a collection of fragmented API functionalities [5]

Group 2: Tool Evaluation and Improvement
- Clear and precise descriptions of tools are crucial, as they are the only means for AI to understand their purpose [6]
- The process of building and testing tool prototypes should involve comprehensive evaluations to measure performance and iteratively improve the tools [15][21]
- Engaging AI agents in the evaluation process can help analyze results and refine tools effectively [33]

Group 3: Effective Tool Usage
- Selecting the right tools is essential; more tools do not necessarily lead to better outcomes, and tools should be designed with the unique capabilities of AI agents in mind [36]
- Tools should be organized into namespaces to avoid confusion among AI agents when selecting which tool to use [39]
- Returning meaningful context from tools is important, prioritizing high-information signals over technical identifiers (a response-shaping sketch follows this summary) [42]

Group 4: Future Outlook
- The approach to building effective tools for AI agents must evolve from predictable, deterministic patterns to non-deterministic models [54]
- A systematic, evaluation-driven method for improving tools will ensure that as AI agents become more powerful, the tools they use will also evolve accordingly [54]
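To illustrate "high-information signals over technical identifiers" together with token-efficient pagination, here is a small sketch of a tool handler that returns only readable fields and pages its results; the record shape and field names are assumptions made for the example.

```python
# Sketch: shape a tool response for an agent -- readable fields, paginated, token-frugal.
# The record shape and field names are assumptions for illustration.

RECORDS = [
    {"id": "c9f2-01", "title": "MSA with Acme Corp", "counterparty": "Acme Corp",
     "signed": "2024-03-18", "raw_blob": "...thousands of tokens of OCR text..."},
    # ... more records ...
]


def search_contracts(query: str, limit: int = 10, page: int = 1) -> dict:
    hits = [r for r in RECORDS if query.lower() in r["title"].lower()]
    start = (page - 1) * limit
    page_hits = hits[start:start + limit]
    return {
        # High-signal, human-readable fields only; opaque IDs and raw blobs are omitted.
        "results": [
            {"title": r["title"], "counterparty": r["counterparty"], "signed": r["signed"]}
            for r in page_hits
        ],
        "page": page,
        "has_more": start + limit < len(hits),  # lets the agent decide whether to paginate
    }


print(search_contracts("acme"))
```

The design choice worth noting is what the handler withholds: the opaque `id` and the raw OCR blob would cost tokens without helping the agent decide its next step, so the response carries only fields the agent can act on.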