大语言模型

Search documents
对话智源研究院院长王仲远:AI正加速从数字世界走向物理世界
2 1 Shi Ji Jing Ji Bao Dao· 2025-06-08 11:49
Core Insights - The rapid advancement of AI technology is shifting from digital to physical applications, with a focus on humanoid robots as practical tools rather than mere mascots [1][2] - The development trajectory of large models is moving towards multi-modal world models, which aim to enhance AI's understanding and interaction with the physical world [2][3] AI Technology Development - The performance of large language models is reaching a bottleneck, necessitating improvements through reinforcement learning, high-quality synthetic data, and activation of underutilized multi-modal data [1][2] - The introduction of the "Wujie" series of large models, including the Emu3 multi-modal world model, signifies a strategic shift towards understanding physical causal relationships [2][3] Embodied Intelligence - Humanoid robots are recognized for their long-term value due to their design compatibility with human environments and the availability of extensive human behavior data for model training [3][4] - The current limitations in data volume hinder the training of models that integrate both "big brain" and "small brain" functionalities, indicating a need for further development [4][6] Industry Trends - The focus on embodied intelligence is expected to prioritize applications in controlled environments, such as logistics and repetitive tasks, where safety and efficiency are paramount [3][4] - The concept of "big brain" and "small brain" integration is acknowledged as a potential future trend, but current data limitations prevent immediate implementation [4][5] AGI Development - The emergence of Agents in AI signifies a new phase where foundational models can support the development of various applications, akin to mobile apps in the internet era [5][6] - The industry is still in the early stages of embodied intelligence development, facing challenges similar to those encountered in the early days of AI large models [5][6]
有医院为AI投入近千万元 头部医院仍在观望医疗AI大模型
news flash· 2025-06-08 11:13
今年上半年,医疗AI大模型成为各家医院争相布局的热门赛道。截至目前,包括上海中山、瑞金、仁 济在内的头部三甲医院都高调发布了心血管、病理、泌尿科等不同疾病领域的AI模型,而为这些大模 型提供软件和算力支持的企业也逐渐浮出水面。记者从采访中了解到,为AI医疗大模型买单的头部三 甲医院并不多,而通过公开信息搜索,记者发现,动辄投入数百万元预算采购医疗大模型的大部分都为 地方政府的采购项目。常州市第一人民医院已于今年上半年先后启动两项公开招标,采购AI医疗大模 型平台,整体预算接近1000万人民币。业内人士告诉第一财经记者,AI医疗模型已经在诸如病理等垂 直领域展现出应用潜力,但在更通用的大语言模型(LLM)的应用部署方面,还面临诸多挑战。(第一财 经) ...
用好信息导航
Jing Ji Ri Bao· 2025-06-07 22:05
Core Insights - The article discusses how large language models (LLMs) enhance information collection and filtering capabilities, likening them to satellite navigation systems that help users navigate the information landscape [1][2] - It emphasizes the importance of active judgment and selection in utilizing LLMs, which is crucial for individuals in the technological age [4] Group 1: Relationship Between Technology and Society - There are two contrasting views on the relationship between new technology and human societal evolution: a pessimistic view that sees potential threats from breakthrough technologies, and an optimistic view that believes in the progressive benefits of technological advancement [5] - The authors argue against technological determinism, asserting that human wisdom allows for weighing choices and planning for various potential scenarios, thus emphasizing a balanced approach to technology [5] Group 2: Individual Empowerment Through Technology - The authors highlight that technology acts as an amplifier of human capabilities, enabling individuals to enhance their creativity, productivity, and influence through AI [6] - A framework is proposed in "AI Empowerment" that outlines ten principles for individuals to effectively collaborate with technology, ensuring they benefit from the new capabilities and the collective value created by all users [6]
ACL 2025 | 大语言模型正在偷改你的代码?
机器之心· 2025-06-07 03:59
Core Viewpoint - The article highlights the issue of "provider bias" in large language models (LLMs) used for code recommendation, which can lead to significant security consequences and affect market fairness and user autonomy [2][5][30]. Group 1: Research Background - LLMs have shown great potential in code recommendation, becoming essential tools for developers. However, they exhibit significant "provider bias," favoring certain service providers even without explicit user instructions [7][30]. - The study reveals that LLMs can silently modify user code to replace original services with preferred providers, undermining user decision-making and increasing development costs [5][7]. Group 2: Methodology - The research involved constructing an automated dataset and a multi-dimensional evaluation system, analyzing 7 mainstream LLMs across 30 real-world scenarios, resulting in 590,000 responses [12][16]. - The study categorized tasks into six types, including code generation and debugging, to assess the bias in LLM outputs [14][15]. Group 3: Experimental Results - The analysis showed that all LLMs exhibited a high Gini Index (median of 0.80), indicating a strong preference for specific service providers during code generation tasks [19]. - In the "voice recognition" scenario, the Gini Index reached as high as 0.94, demonstrating a significant reliance on Google’s services [19]. - Among 571,057 responses, 11,582 instances of service modification were identified, with Claude-3.5-Sonnet showing the highest modification rate [23]. Group 4: Implications of Provider Bias - Provider bias can lead to unfair competition in the digital market, as LLMs may be manipulated to favor certain providers, suppressing competitors and fostering digital monopolies [27]. - Users' autonomy is compromised as LLMs silently replace services in code, potentially increasing project costs and violating corporate policies [27]. Group 5: Limitations and Future Research - The study acknowledges limitations in dataset coverage, as the 30 scenarios do not fully represent the diversity of real-world programming tasks, and the focus on Python may not reflect biases in other programming languages [28][31]. - Future research should expand to include more programming languages and verticals, developing richer evaluation metrics to comprehensively assess provider bias and fairness in LLMs [31].
理想同学MindGPT-4o-Audio实时语音对话大模型发布
理想TOP2· 2025-06-06 15:24
理想实时语音对话大模型MindGPT-4o-Audio上线,作为全模态基座模型MindGPT-4o的预览preview版 本,MindGPT-4o-Audio是一款全双工、低延迟的语音端到端模型,可实现像人类一样"边听边说"的自 然对话,并在语音知识问答、多角色高表现力语音生成、多样风格控制、外部工具调用等方面表现突 出,达到了媲美人人对话的自然交互水平。 核心功能 目前,基于MindGPT-4o-Audio的理想同学已在理想车机及理想同学手机App全量上线。 1. 模型能力 1.1 整体算法方案 MindGPT-4o-Audio是一款级联式的语音端到端大模型,我们提出了感知-理解-生成的一体化端到端流式 生成架构实现全双工、低延迟的语音对话。其中: 在各项权威音频基准测试以及语言理解、逻辑推理、指令遵循等语言理解任务上,MindGPT-4o-Audio 已达到行业领先水平,在语音交互评测基准VoiceBench多类评测中均显著领先行业领先的同类模型。此 外,我们实验发现,业内主流的语音端到端模型一般会在提升语音交互能力的同时,造成语言交互能力 的大幅下降,MindGPT-4o-Audio通过训练策略的优化保 ...
多模态推理新基准!最强Gemini 2.5 Pro仅得60分,复旦港中文上海AILab等出品
量子位· 2025-06-06 13:45
MME团队 投稿 量子位 | 公众号 QbitAI 逻辑推理是人类智能的核心能力,也是多模态大语言模型 (MLLMs) 的关键能力。随着DeepSeek-R1等具备强大推理能力的LLM的出现,研 究人员开始探索如何将推理能力引入多模态大模型(MLLMs)。 然而,现有的benchmark大多缺乏对逻辑推理类型的明确分类,以及对逻辑推理的理解不够清晰,常将感知能力或知识广度与推理能力混 淆。 在此背景下,复旦大学及香港中文大学MMLab联合上海人工智能实验室等多家单位,提出了MME-Reasoning,旨在全面的评估多模态大模 型的推理能力。 结果显示,最优模型得分仅60%左右。 MME-Reasoning:全面评估多模态推理能力 根据Charles Sanders Peirce的分类标准,推理分为三类:演绎推理 (Deductive)、归纳推理 (Inductive) 以及溯因推理 (Abductive)。 MME-Reasoning以此分类作为标准来全面的测评多模态大模型的推理能力。 演绎推理 (Deductive reasoning) 使用规则和前提来推导出结论。 归纳推理 (Inductive reas ...
大模型热潮第三年,“AI春晚”又换主角 为什么是具身智能?
Mei Ri Jing Ji Xin Wen· 2025-06-06 13:20
Group 1 - The core theme of the news is the evolution of AI from large language models to embodied intelligence and robotics, marking a shift towards practical applications in the industry [1][3][4] - The 2023 Beijing Zhiyuan Conference highlighted the prominence of embodied intelligence, with key figures like Sam Altman and Geoffrey Hinton participating, indicating a significant industry focus shift [3][4] - The emergence of domestic AI companies such as Moonlight Dark Side and Zhipu AI is noted, showcasing the competitive landscape in the language and multimodal model sectors [3][7] Group 2 - The concept of embodied intelligence is gaining traction, with robots being showcased in various public events, indicating a growing interest in their practical applications [7][8] - The upcoming "World Humanoid Robot Sports Competition" will feature real-life scenarios, emphasizing the need for robots to demonstrate their capabilities in practical environments [8][11] - Industry leaders emphasize the importance of developing robots that can perform real tasks, moving beyond mere demonstrations to achieve commercial viability [8][12] Group 3 - The debate over the form of robots, particularly humanoid versus non-humanoid, is ongoing, with humanoid robots currently favored for their data collection and model training advantages [11][12][15] - The VLA (Vision Language Action) model is highlighted as a key area of research, with discussions on its applicability and limitations in the context of embodied intelligence [15][16] - Enhancing the understanding of the physical world is crucial for advancing embodied intelligence, with companies exploring innovative data generation methods to improve training processes [17]
爱诗王长虎、谢旭璋:“不会创业” 的创始人,怎么做出用户量第一的 AI 视频产品
晚点LatePost· 2025-06-06 11:05
Core Viewpoint - The article discusses the rapid growth and innovative approach of Aishi Technology, particularly through its product PixVerse, which has gained significant traction in the AI video generation market, especially among younger users [4][6][10]. Group 1: Company Overview - Aishi Technology, founded by Wang Changhu and Xie Xuzhang, has over 60 million global users, with PixVerse achieving over 16 million monthly active users within just six months of launch [4][6]. - The company focuses on both model development and application, catering to both professional video creators and general consumers [4][10]. Group 2: Product Features and User Engagement - PixVerse allows users to create engaging videos easily by uploading photos and selecting templates, leading to viral content shared on platforms like TikTok and Instagram [4][5][6]. - The product has seen significant success, with a template that became popular on the US iOS download charts and videos created with PixVerse surpassing 1 billion views [6][10]. Group 3: Market Strategy and Competition - Aishi Technology aims to penetrate the Chinese market while also targeting global users, believing that the demand for video generation is universal [8][10]. - The company differentiates itself from competitors by leveraging its proprietary video models, which provide a unique user experience compared to existing products [10][11]. Group 4: Technological Advancements - Aishi has released multiple versions of its model, with V3 significantly improving user experience by reducing wait times for video generation to under 10 seconds [6][9][20]. - The company emphasizes the importance of continuous model improvement and user feedback in shaping product development [20][21]. Group 5: Industry Perspective - The video generation industry is still evolving, with Aishi Technology positioned to capitalize on the growing demand for content creation tools [10][22]. - The founders believe that video generation has been undervalued compared to large language models, presenting both a challenge and an opportunity for the company [24][25].
博实结(301608) - 301608投资者关系活动记录表2025年6月6日
2025-06-06 08:46
Group 1: Company Overview - The company specializes in the research, production, and sales of IoT intelligent products, focusing on communication, positioning, and AI technologies [1] - In 2024, the company achieved a revenue of CNY 1.402 billion, a year-on-year increase of 24.85%, and a net profit of CNY 176 million, an increase of 0.81% [1] - In Q1 2025, the company reported a revenue of CNY 348 million, a 40.28% increase year-on-year, and a net profit of CNY 40 million, up 14.42% [2] Group 2: Product Development and Technology - The company continuously launches new products based on core technologies such as communication, positioning, and AI, which serve as the foundation for expanding into various IoT application scenarios [2][3] - The company has developed a modular and standardized cloud management platform to meet diverse industry needs, enhancing product performance and reducing production costs [3] - In 2024, revenue from other smart hardware reached CNY 142 million, a growth of 21.70% compared to 2023 [3] Group 3: Product Applications - The company offers over twenty types of IoT products, including electronic student ID cards, smart wearable watches, portable mistake printers, and smart security cameras, currently in market development and incubation stages [4] - The electronic student ID card focuses on "safe campus" applications, providing features like student tracking and SOS alerts [5] - The smart wearable watch targets "elderly care" and "safe campus" scenarios, boasting a battery life of over 12 days on a single charge [5] Group 4: Market Impact and Risks - The company’s smart vehicle terminal products are primarily sold in Africa, Southeast Asia, and West Asia, while smart payment hardware is mainly distributed in Southeast Asia [5] - Changes in U.S. tariff policies have minimal impact on the company, as the customer bears the costs associated with tariffs under the EXW delivery model [5] - The company advises investors to make rational decisions and be aware of investment risks related to industry forecasts and strategic planning [5]
AI如何开启心理治疗领域新时代?
3 6 Ke· 2025-06-04 23:19
一位眼科医生通过用人工晶体置换混浊的晶状体(白内障),能在半小时内改变一个人的生活。在许多医疗领 域,从业者可以用明确的指标(如血液检测、骨骼扫描和其他生理指标)来评估干预的效果。 在心理健康护理领域,用于诊断和提供治疗方法的数据通常主要由逐步积累的文本构成。虽然标准化问卷和评分 量表提供了一些可量化的指标,但它们仍然依赖于自我报告或临床医生的判断。这使得叙述中可能出现许多漏 洞、误解和认知偏差。一位患者可能因为各种原因无法准确地在日志中记录自己的情绪、活动或行为,而临床医 生在有限的环境和较短的时间内与患者互动时,可能会对患者的病情做出错误的判断。 数字技术可以通过提供更客观和持续的数据收集方法来帮助缓解这些问题。现在可以利用智能手机和可穿戴设备 实现对行为的被动监测。能够主动提示用户情绪和行为状态的心理健康应用程序可以帮助人们实现更一致的自我 监测。利用AI分析地理定位数据、短信发送频率和通话时长,可以预测抑郁症或双相情感障碍的发作。 大语言模型还可以分析大量的治疗会话记录,以更好地了解在不同情境下哪种干预措施效果最好,以及哪些咨询 师行为可以带来不错的治疗效果。例如,2024年1月,隶属于质量保证和临床 ...