A New Paradigm for Enterprise Information Queries on Business Inquiry Platforms: Waterdrop Credit's (水滴信用) Enterprise Query MCP
Sou Hu Cai Jing· 2025-07-16 17:19
Core Insights
- Traditional business inquiry platforms face significant challenges, including information silos, operational inefficiencies, high understanding thresholds, and a lack of deep insights, which hinder effective decision-making [1][2][6]

Group 1: Traditional Business Inquiry Platform Challenges
- Information fragmentation leads to users needing to navigate multiple platforms for data retrieval, resulting in time-consuming processes and potential oversight of critical information [1]
- Operational inefficiencies arise from the cumbersome keyword search and filtering processes, which do not meet the demands for rapid responses [1]
- High understanding thresholds exist due to the presentation of raw data without sufficient analysis, placing a heavy cognitive burden on users [1]
- The lack of insightful analysis limits the ability to derive deeper insights, predict trends, or provide decision-making support, resulting in underutilization of data value [1]

Group 2: Waterdrop Credit's MCP Solution
- Waterdrop Credit introduces a multi-type enterprise information query MCP that leverages large model technology to transform the business inquiry experience [2][6]
- The MCP allows for natural language interaction, enabling users to express queries in everyday language, which the system can accurately interpret and analyze [10]
- The platform features a panoramic data architecture that integrates diverse data sources, breaking down information silos and enabling comprehensive enterprise profiling [12]
- Dynamic intelligent reports can be generated based on user queries, enhancing efficiency from data retrieval to decision support [14]
- The MCP represents a shift from traditional information repositories to intelligent hubs, facilitating proactive insights and decision-making support [16]
The DeepSeek-Style "AI Flavor" Is Getting Harder and Harder to Stand
36氪· 2025-07-16 13:37
The following article comes from Zhimian AI (直面AI), author Xiao Jinya. Zhimian AI: focusing on frontier technology for an early look at the future. "AI jargon" is reshaping our language system. By Xiao Jinya | Editor: Xiao Yang | Source: Zhimian AI (ID: faceaibang) | Cover image: Unsplash

We asked ChatGPT to write an opening for this article:

Language is becoming ever more familiar, and ever more foreign. In our algorithm-driven era, expression is no longer just a bridge between people; it has also become a resonator between humans and machines. As more and more people use AI tools to assist their writing and communication, our language is quietly changing. Not suddenly, not drastically, but through a subtle convergence: vocabulary grows standardized, tone trends neutral, emotion is trimmed to just the right measure. You may have already noticed it: some sentences sound "a little off," too tidy, too fluent, too much like AI. People call this style "AI flavor." This is not merely a question of style; it concerns how we view creation, how we trust one another, and even how we define "human" expression. A seemingly simple yes-or-no question thus emerges: was this passage written by a person, or by AI? And a deeper question follows: if we ourselves begin to talk like AI, where should the boundary of human expression be drawn?

Having ChatGPT discuss AI flavor in full AI flavor has a certain flavor all its own ...
ICML 2025 Outstanding Papers Announced: 8 Winners, Including Researchers from Nanjing University
自动驾驶之心· 2025-07-16 11:11
Core Insights
- The article discusses the recent ICML 2025 conference, highlighting the award-winning papers and the growing interest in AI research, evidenced by the increase in submissions and acceptance rates [3][5].

Group 1: Award-Winning Papers
- A total of 8 papers were awarded this year, including 6 outstanding papers and 2 outstanding position papers [3].
- The conference received 12,107 valid paper submissions, with 3,260 accepted, resulting in an acceptance rate of 26.9%, a significant increase from 9,653 submissions in 2024 [5].

Group 2: Outstanding Papers
- **Paper 1**: Explores masked diffusion models (MDMs) and their performance improvements through adaptive token decoding strategies, achieving a solution accuracy increase from less than 7% to approximately 90% in logic puzzles [10].
- **Paper 2**: Investigates the role of predictive technologies in identifying vulnerable populations for government assistance, providing a framework for policymakers [14].
- **Paper 3**: Introduces CollabLLM, a framework enhancing collaboration between humans and large language models, improving task performance by 18.5% and user satisfaction by 17.6% [19].
- **Paper 4**: Discusses the limitations of next-token prediction in creative tasks and proposes new methods for enhancing creativity in language models [22][23].
- **Paper 5**: Reassesses conformal prediction from a Bayesian perspective, offering a practical alternative for uncertainty quantification in high-risk scenarios [27].
- **Paper 6**: Addresses score matching techniques for incomplete data, providing methods that perform well in both low-dimensional and high-dimensional settings [31].

Group 3: Outstanding Position Papers
- **Position Paper 1**: Proposes a dual feedback mechanism for peer review in AI conferences to enhance accountability and quality [39].
- **Position Paper 2**: Emphasizes the need for AI safety to consider the future of work, advocating for a human-centered approach to AI governance [44].
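Paper 5 reassesses conformal prediction from a Bayesian angle; as background, the standard split-conformal baseline it departs from can be sketched in a few lines. The data and the linear "model" below are synthetic stand-ins, not from the paper:

```python
import numpy as np

def split_conformal_interval(cal_residuals, alpha=0.1):
    """Return the residual quantile giving (1 - alpha) marginal coverage.

    cal_residuals: |y - y_hat| on a held-out calibration set.
    """
    n = len(cal_residuals)
    # Finite-sample-corrected quantile level used by split conformal prediction.
    q_level = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(cal_residuals, min(q_level, 1.0), method="higher")

# Synthetic example: a hypothetical model that predicts y_hat = 2x.
rng = np.random.default_rng(0)
x_cal = rng.uniform(0, 1, 500)
y_cal = 2 * x_cal + rng.normal(0, 0.1, 500)
residuals = np.abs(y_cal - 2 * x_cal)

q = split_conformal_interval(residuals, alpha=0.1)
# The 90%-coverage interval for a new input x is then [2x - q, 2x + q].
print(float(q))
```

The Bayesian reassessment in the paper replaces this frequentist quantile step with posterior-based reasoning; the sketch only shows the baseline being reassessed.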
Small Models Strike Back! Qiu Xipeng's Team at Fudan & 创智 Builds a "World-Aware" Embodied Agent, with Code and Data Fully Open-Sourced!
具身智能之心· 2025-07-16 09:12
Core Viewpoint
- The article discusses the introduction of the World-Aware Planning Narrative Enhancement (WAP) framework, which significantly improves the performance of large vision-language models (LVLMs) in embodied planning tasks by integrating world knowledge into the data and reasoning chain [2][17].

Group 1: Introduction
- LVLMs are becoming central in embodied planning, but existing methods often rely on environment-agnostic imitation learning, leading to poor performance in unfamiliar scenarios [2].
- The WAP framework has shown a success rate increase from 2% to 62.7% on the EB-ALFRED benchmark, surpassing models like GPT-4o and Claude-3.5-Sonnet, highlighting the importance of world perception in high-level planning [2][17].

Group 2: Related Work
- WAP differs from existing approaches by explicitly binding instruction-environment context at the data level and relying solely on visual feedback without privileged information [4].

Group 3: Technical Method
- The framework injects four-dimensional cognitive narratives (visual, spatial, functional, syntactic) into the data layer, allowing the model to understand the environment before reasoning deeply [6].
- It employs closed-loop observation (only RGB + instructions) and a three-stage curriculum learning approach to develop environmental understanding and long-term reasoning capabilities [6][12].

Group 4: Experiments
- The performance comparison on EmbodiedBench (EB-ALFRED) shows that the WAP approach significantly enhances success rates across various task categories, with Qwen2.5-VL achieving a 60.7-percentage-point increase in average success rate [14].
- The WAP framework demonstrates a notable improvement in long-term task success rates, achieving 70% compared to previous models [14][16].

Group 5: Conclusion and Future Work
- WAP effectively incorporates world knowledge into the data and reasoning processes, allowing smaller open-source LVLMs to outperform commercial models in pure visual closed-loop settings [17].
- Future work includes expanding to dynamic industrial/outdoor scenes and exploring self-supervised narrative evolution for data-model iterative improvement [21].
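The summary mentions a three-stage curriculum for building up environmental understanding before long-horizon reasoning. The general idea of such a schedule (not WAP's actual implementation; stage names and difficulty scores here are hypothetical) can be sketched as:

```python
# Hypothetical three-stage curriculum: samples carry a difficulty score in
# [0, 1] and are grouped into stages of increasing difficulty, so training
# sees easy (perception-level) data before long-horizon planning data.
def build_curriculum(samples,
                     stages=(("perception", 0.33),
                             ("grounding", 0.66),
                             ("long_horizon", 1.0))):
    """Split samples into ordered stages by per-sample difficulty."""
    ordered = sorted(samples, key=lambda s: s["difficulty"])
    schedule, lo = [], 0.0
    for name, hi in stages:
        batch = [s for s in ordered
                 if lo <= s["difficulty"] < hi
                 or (hi == 1.0 and s["difficulty"] == 1.0)]
        schedule.append((name, batch))
        lo = hi
    return schedule

samples = [{"id": i, "difficulty": d}
           for i, d in enumerate([0.1, 0.5, 0.9, 0.2, 0.7])]
for stage, batch in build_curriculum(samples):
    print(stage, [s["id"] for s in batch])
```

A trainer would then run one training phase per stage, in order, rather than sampling uniformly from the whole dataset.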
ICCV 2025 Full-Score Paper: One Model Unifies Spatial Understanding and Active Exploration
具身智能之心· 2025-07-16 09:12
Core Insights
- The article discusses the transition of artificial intelligence from the virtual internet space to the physical world, emphasizing the challenge of enabling agents to understand three-dimensional spaces and align natural language with real environments [3][40]
- A new model proposed by a collaborative research team aims to unify spatial understanding and active exploration, allowing agents to build cognitive maps of their environments through dynamic exploration [3][40]

Group 1: Model Overview
- The proposed model integrates exploration and visual grounding in a closed-loop process, where understanding and exploration are interdependent and enhance each other [10][14]
- The model consists of two main components: online spatial memory construction and spatial reasoning and decision-making, optimized under a unified training framework [16][22]

Group 2: Exploration and Understanding
- In the exploration phase, the agent accumulates spatial memory through continuous RGB-D perception, actively seeking potential target locations [12][21]
- The reasoning phase involves reading from the spatial memory to identify relevant candidate areas based on task instructions, utilizing cross-attention mechanisms [22][23]

Group 3: Data Collection and Training
- The authors propose a hybrid strategy for data collection, combining real RGB-D scan data with virtual simulation environments to enhance the model's visual understanding and exploration capabilities [25]
- The dataset constructed includes over 900,000 navigation trajectories and millions of language descriptions, covering various task types such as visual guidance and goal localization [25]

Group 4: Experimental Results
- The MTU3D model was evaluated on four key tasks, demonstrating significant improvements in success rates compared to existing methods, with a notable increase of over 20% in the GOAT-Bench benchmark [28][29]
- In the A-EQA task, the model improved the performance of GPT-4V, increasing its success rate from 41.8% to 44.2%, indicating its potential to enhance multimodal large models [32][33]

Group 5: Conclusion
- The emergence of MTU3D represents a significant advancement in embodied navigation, combining understanding and exploration to enable AI to autonomously navigate and complete tasks in real-world environments [40]
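The reasoning phase is described as reading from spatial memory with cross-attention. A generic single-head cross-attention readout (the dimensions and random embeddings below are illustrative, not MTU3D's actual architecture) looks like:

```python
import numpy as np

def cross_attention(query, memory, d_k):
    """One task query attends over accumulated spatial-memory entries.

    query:  (1, d_k) embedding of the task instruction (hypothetical)
    memory: (n, d_k) spatial-memory entries, one per remembered location
    """
    scores = query @ memory.T / np.sqrt(d_k)   # (1, n) scaled similarities
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over memory slots
    return weights, weights @ memory           # attention map + readout vector

rng = np.random.default_rng(0)
memory = rng.normal(size=(6, 8))   # six remembered candidate areas, dim 8
query = rng.normal(size=(1, 8))    # one instruction embedding
weights, readout = cross_attention(query, memory, d_k=8)
print(weights.shape, readout.shape)
```

The attention weights rank candidate areas by relevance to the instruction, which is the mechanism the summary credits for identifying "relevant candidate areas."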
Latest Report | TrendForce Humanoid Robot Industry Research: 3Q25 Quarterly Report Released
TrendForce集邦· 2025-07-16 09:05
Core Insights
- The humanoid robot industry is gaining global attention, with significant advancements in technology, capital, and application scenarios expected to accelerate commercialization by Q3 2025 [1]

Group 1: Major Manufacturer Dynamics
- Manufacturers are focusing on the application value of humanoid robots, aiming for versions that can perform complex tasks, are easy to deploy, and can adapt to factory and household environments [2]
- Key players such as Tesla, Boston Dynamics, Agility Robotics, Hexagon, Figure AI, Fourier Intelligence, and Lajuj Robot are continuously upgrading their hardware and software [2]

Group 2: Key Component Analysis
- The cost breakdown of humanoid robots shows that the motion layer accounts for 55% of the BOM cost, the cognitive layer for 23%, the sensing layer for 15%, and the power layer for 7% [3]
- Different manufacturers and suppliers are exploring various application solutions and technological routes, particularly in embedded large language models (LLM) and machine vision [3]

Group 3: Quarterly Trend Outlook
- The current trend in humanoid robot development emphasizes software advancements leading hardware improvements, with a focus on LLM and simulation training platforms expected to be highlighted in the second half of 2025 [4]
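The reported BOM shares (motion 55%, cognition 23%, sensing 15%, power 7%) sum to exactly 100%, so per-layer costs follow directly from any assumed total. A quick sketch, where the $50,000 total BOM is a purely hypothetical figure, not from the report:

```python
# BOM shares as reported in the summary above; the total is an assumption.
BOM_SHARES = {"motion": 0.55, "cognition": 0.23, "sensing": 0.15, "power": 0.07}

def layer_costs(total_bom):
    """Allocate a total BOM cost across layers by their reported shares."""
    return {layer: round(total_bom * share, 2)
            for layer, share in BOM_SHARES.items()}

# Sanity check: shares cover the whole BOM.
assert abs(sum(BOM_SHARES.values()) - 1.0) < 1e-9

costs = layer_costs(50_000)  # hypothetical $50k humanoid BOM
print(costs)
```

Under that assumption the motion layer alone would account for $27,500, which is why the report's component analysis concentrates on actuation.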
Understanding AI Agents in One Article: A Must-Read for Entrepreneurs. Take AI Seriously
混沌学园· 2025-07-16 09:04
Core Viewpoint
- The essence of AI Agents lies in reconstructing the "cognition-action" loop, iterating on human cognitive processes to enhance decision-making and execution capabilities [1][4][41].

Group 1: Breakthroughs in AI Agents
- The breakthrough of large language models (LLMs) is fundamentally about decoding human language, enabling machines to possess near-human semantic reasoning abilities [2].
- AI Agents transform static "knowledge storage" into dynamic "cognitive processes," allowing for more effective problem-solving [4][7].
- The memory system in AI Agents plays a crucial role, with short-term memory handling real-time context and long-term memory encoding user preferences and business rules [10][12][13].

Group 2: Memory and Learning Capabilities
- The dual memory mechanism allows AI Agents to accumulate experience, evolving from passive tools to active cognitive entities capable of learning from past tasks [14][15].
- For instance, in customer complaint handling, AI Agents can remember effective solutions for specific complaints, optimizing future responses [15].

Group 3: Tool Utilization
- The ability to call tools is essential for AI Agents to expand their cognitive boundaries, enabling them to access real-time data and perform complex tasks [17][20].
- In finance, AI Agents can utilize APIs to gather market data and provide precise investment advice, overcoming the limitations of LLMs [21][22].
- The diversity of tools allows AI Agents to adapt to various tasks, enhancing their functionality and efficiency [26][27].

Group 4: Planning and Execution
- The planning module of AI Agents addresses the "cognitive entropy" of complex tasks, enabling them to break down tasks into manageable components and monitor progress [28][30][32].
- After completing tasks, AI Agents can reflect on their planning and execution processes, continuously improving their efficiency and effectiveness [33][35].

Group 5: Impact on Business and Society
- AI Agents are redefining the underlying logic of enterprise software, emphasizing collaboration between human intelligence and machine capabilities [36][37].
- The evolution from tools to cognitive entities signifies a major shift in how AI can enhance human productivity and decision-making [39][41].
- As AI technology advances, AI Agents are expected to play significant roles across various sectors, including healthcare and education, driving societal progress [44][45].

Group 6: Practical Applications and Community
- The company has developed its own AI Agent and established an AI Innovation Institute to assist enterprises in effectively utilizing AI for cost reduction and efficiency improvement [46][48].
- The institute offers practical tools and methodologies derived from extensive real-world case studies, enabling businesses to integrate AI into their operations [51][58].
- Monthly collaborative learning sessions serve as a reflection mechanism, allowing participants to convert theoretical knowledge into actionable solutions [60][62].
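The "cognition-action" loop described above, dual memory plus tool calling, can be sketched minimally. The tool names and the hard-coded planning rule below are hypothetical stand-ins for what would in practice be an LLM call:

```python
# Minimal agent sketch: short-term memory holds per-task context, long-term
# memory consolidates which tool solved which task type, and tools extend the
# agent beyond what a bare language model can do. All names are illustrative.
class MiniAgent:
    def __init__(self, tools):
        self.tools = tools        # tool name -> callable
        self.short_term = []      # real-time context for the current task
        self.long_term = {}       # remembered solutions, keyed by task type

    def run(self, task_type, payload):
        self.short_term.append(("task", task_type, payload))
        if task_type in self.long_term:          # reuse a remembered solution
            tool_name = self.long_term[task_type]
        else:                                    # "plan": pick a tool by type
            tool_name = "lookup" if task_type == "query" else "notify"
        result = self.tools[tool_name](payload)
        self.long_term[task_type] = tool_name    # consolidate for next time
        self.short_term.clear()                  # short-term memory is per-task
        return result

agent = MiniAgent(tools={
    "lookup": lambda p: f"data for {p}",
    "notify": lambda p: f"sent: {p}",
})
print(agent.run("query", "AAPL"))   # first run: tool chosen by the "planner"
print(agent.run("query", "MSFT"))   # second run: reuses long-term memory
```

The second call skips planning entirely, which is the "accumulating experience" behavior Group 2 attributes to the dual memory mechanism.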
A New Product Every 7 Weeks: Just How Intense Is OpenAI? A Departing Employee's Long Retrospective on What It's Really Like Inside
Founder Park· 2025-07-16 07:07
After multiple core members left to start their own companies, and after Meta recently poached core researchers one after another, what is OpenAI really like on the inside?

Former OpenAI employee Calvin French-Owen has written a "departure retrospective" that records in detail his real experience working at OpenAI, the company's distinctive internal work structure, and its research culture. Calvin French-Owen was previously a startup founder; he joined OpenAI in May of last year and, during his tenure, drove the Codex project from prototype to official launch.

In this article we see a side of OpenAI different from what outside media report; there are good aspects, and there is also some "chaos."

- The outside world tends to imagine OpenAI as a highly centralized, tightly coordinated super-team. But the real OpenAI is more like a cluster of countless small teams advancing in parallel, with no unified roadmap and little synchronized cadence; research directions are often "bottom-up."
- Inside the company, researchers are regarded as "mini CEOs." Everyone has a strong inclination to push their own ideas forward independently and see what comes of them. ...
A Hardcore Hand-Built AI Desk Pet! Connected to GPT-4o, It Understands Speech and Interacts, and the Design Is Reproducible
量子位· 2025-07-16 07:02
Core Viewpoint
- The article discusses the creation of an AI pet named Shoggoth, inspired by the Pixar lamp robot, which utilizes GPT-4o and 3D printing technology to interact with humans in a pet-like manner [1][48].

Group 1: AI Pet Development
- Shoggoth is designed to communicate and interact with users, potentially replacing traditional stuffed toys as childhood companions [5][52].
- The robot's structure is simple, featuring a base with three motors and a 3D-printed conical head, along with a flexible tentacle system inspired by octopus grabbing strategies [8][10].
- The robot can adapt to various object sizes and weights, capable of handling items up to 260 times its own weight [8].

Group 2: Control and Interaction Mechanisms
- Shoggoth employs a dual-layer control system: low-level control using preset actions and high-level control utilizing GPT-4o for real-time processing of voice and visual events [25][26].
- The robot's perception includes hand tracking and tentacle tip tracking, using advanced models like YOLO for 3D triangulation [30][33].
- A 2D mapping system simplifies the control of tentacle movements, allowing users to manipulate the robot via a computer touchpad [22][24].

Group 3: Technical Challenges and Solutions
- Initial designs faced issues with cable entanglement, which were addressed by adding a cable spool cover and calibration scripts to improve tension control [14][16][17].
- The design also required reinforcement of the "spine" structure to prevent sagging under its own weight [18].
- The final model successfully transitioned from simulation to real-world application, validating the effectiveness of the control strategies implemented [38].

Group 4: Creator Background
- The creator, Matthieu Le Cauchois, is an ML engineer with a background in reinforcement learning, speech recognition, and NLP, having previously founded an AI company [39][41].
- His work includes various innovative projects, showcasing his expertise in machine learning and robotics [46][48].
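One way a 2D touchpad point could map onto three base motors is by projecting the desired bend direction onto three cables spaced 120 degrees apart. This is a hedged sketch of that general idea; the angles, gain, and geometry are hypothetical and Shoggoth's actual mapping may differ:

```python
import math

# Hypothetical geometry: three cables at 120-degree spacing around the base.
CABLE_ANGLES = [90, 210, 330]   # degrees; one per base motor

def touchpad_to_cables(x, y, gain=1.0):
    """Map a 2D bend direction (x, y) to three cable length offsets.

    A negative offset shortens the cable on the side the tentacle bends
    toward; the three offsets sum to roughly zero so total cable is conserved.
    """
    offsets = []
    for angle in CABLE_ANGLES:
        a = math.radians(angle)
        offsets.append(-gain * (x * math.cos(a) + y * math.sin(a)))
    return offsets

# Bending straight "up" (toward cable 0) shortens cable 0 and pays out the
# other two symmetrically.
up = touchpad_to_cables(0.0, 1.0)
print([round(o, 3) for o in up])
```

Collapsing control to two touchpad coordinates like this is what makes the tentacle directly drivable by hand, matching the role the article gives the 2D mapping system.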
TACTILE-VLA: Activating the Physical Knowledge in VLA Models for Tactile Generalization (Latest from Tsinghua University)
自动驾驶之心· 2025-07-16 04:05
The following article comes from 具身智能之心; shared here for academic purposes only.

The key cause of this limitation is the absence of tactile perception. Unlike the high-level semantic information provided by vision and language, tactile sensing conveys the rich, local, and dynamic feedback of physical interaction, such as friction, compliance, and material properties, which is critical for contact-involved manipulation tasks. Although prior research has attempted to incorporate touch into robotic systems, most of it treats touch as a supplementary sensing modality and fails to truly integrate it deeply with the decision-making process. To address this problem, Tactile-VLA aims to deeply fuse vision, language, action, and tactile perception, activating the physical knowledge implicit in VLA models to achieve precise force control and generalization in contact-intensive tasks.

Background and Core Problem

Vision-language-action models, with their strong semantic understanding and cross-modal generalization capabilities, have become the core driving force behind the development of general-purpose robot agents. These models rely on pretrained vision-language backbone networks to interpret abstract instructions and, across multiple ...