Artificial General Intelligence (AGI)
A new global benchmark for multimodal reasoning: Zhipu's visual reasoning model GLM-4.5V officially launched and open-sourced
Zheng Quan Ri Bao Wang· 2025-08-12 08:46
Group 1
- Beijing Zhipu Huazhang Technology Co., Ltd. (Zhipu) launched GLM-4.5V, a 100B-class open-source visual reasoning model with 106 billion total parameters and 12 billion active parameters [1][2]
- GLM-4.5V is described as a significant step towards Artificial General Intelligence (AGI) and achieves state-of-the-art (SOTA) performance across 41 public visual multimodal benchmarks, covering image, video, and document understanding as well as GUI agent tasks [2][5]
- The model has a "thinking mode" switch that lets users choose between quick responses and deep reasoning, trading speed against answer quality [5][6]

Group 2
- GLM-4.5V consists of a visual encoder, an MLP adapter, and a language decoder. It supports 64K multimodal long contexts and uses 3D convolution to process video more efficiently [6]
- Training follows a three-stage strategy of pre-training, supervised fine-tuning (SFT), and reinforcement learning (RL), which together strengthen complex multimodal understanding and reasoning [6][7]
- API calls are priced at 2 yuan per million input tokens and 6 yuan per million output tokens, which the article presents as cost-effective for enterprises and developers [5]
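To make the quoted API pricing concrete, here is a minimal Python sketch that estimates the cost of a single request. Only the two prices (2 yuan and 6 yuan per million tokens) come from the summary above; the helper name `estimate_cost_yuan` and the request sizes are hypothetical, and actual billing rules (rounding, tiers, image tokens) may differ.

```python
from decimal import Decimal

# Prices quoted in the summary above, in yuan per one million tokens.
INPUT_YUAN_PER_MILLION = Decimal("2")
OUTPUT_YUAN_PER_MILLION = Decimal("6")
TOKENS_PER_MILLION = Decimal("1000000")


def estimate_cost_yuan(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request in yuan (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) / TOKENS_PER_MILLION * INPUT_YUAN_PER_MILLION
    output_cost = Decimal(output_tokens) / TOKENS_PER_MILLION * OUTPUT_YUAN_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Illustrative request sizes, not measured values.
    print(estimate_cost_yuan(50_000, 2_000))
```

Because output tokens cost three times as much as input tokens at these prices, long generated answers dominate the bill sooner than long prompts do.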
Musk accuses Apple of "favoritism"
Zheng Quan Shi Bao· 2025-08-12 04:59
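Musk has suddenly spoken out.

On August 11 local time in the United States, Tesla CEO Elon Musk posted on social media that Apple is allegedly using restrictive measures that make it impossible for any AI company other than OpenAI to reach the top of its App Store rankings, calling this "an unequivocal antitrust violation." Musk said his company xAI would take immediate legal action.

Clearly, xAI and OpenAI are locked in fierce competition in artificial intelligence.

After Musk threatened legal action against Apple, Sam Altman reposted Musk's post on X and said: "I have heard allegations that Musk manipulates X to benefit himself and his own companies and to harm his competitors and people he doesn't like, and that allegation is shocking. I hope someone will investigate; I and many others would like to know what has been happening. OpenAI will stay focused on building great products."

According to CCTV, as of August 11, OpenAI's ChatGPT ranked first and xAI's Grok ranked second among productivity apps in Apple's US App Store.

Earlier, Musk sued OpenAI last year in both state and federal court, alleging that it had broken its nonprofit commitment and shifted toward commercialization, and asked the court to block OpenAI's restructuring. Musk has also publicly criticized Sam Altman on multiple occasions.

xAI, founded by Musk in 2023, is an artificial intelligence ...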
Zhipu launches GLM-4.5V, the world's strongest open-source multimodal model at the 100B scale: SOTA on 41 benchmarks
IPO早知道· 2025-08-12 01:52
Core Viewpoint
- The article discusses the launch of GLM-4.5V, a state-of-the-art open-source visual reasoning model from Zhipu, which the company presents as a significant step towards Artificial General Intelligence (AGI) [3][4].

Group 1: Model Overview
- GLM-4.5V has 106 billion total parameters, of which 12 billion are active, and is designed for multimodal reasoning, which the article describes as essential for AGI [3][4].
- The model builds on the earlier GLM-4.1V-Thinking and shows improved performance across visual tasks including image, video, and document understanding [4][6].

Group 2: Performance Metrics
- On 41 public multimodal benchmarks, GLM-4.5V achieved state-of-the-art (SOTA) results, outperforming other models on tasks such as general visual question answering (VQA) and visual grounding [5][6].
- Reported scores include 88.2 on MMBench v1.1 for general VQA and 91.3 on RefCOCO-avg for visual grounding [5].

Group 3: Technical Features
- The model combines a visual encoder, an MLP adapter, and a language decoder. It supports 64K multimodal long contexts and uses 3D convolution to process video more efficiently [6][8].
- It is trained in three stages: pre-training, supervised fine-tuning (SFT), and reinforcement learning (RL), which together improve its multimodal understanding and reasoning [8].

Group 4: Practical Applications
- Zhipu has built a desktop assistant application that uses GLM-4.5V for real-time screen capture and a range of visual reasoning tasks [8][9].
- The company aims to support developers through open-source model releases and API services, encouraging new applications of multimodal models [9].
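The summary says 3D convolution makes video processing more efficient but gives no configuration. The Python sketch below only illustrates the general mechanism: a convolution that also strides over the time axis emits fewer visual tokens than patching every frame independently. The patch size (14), temporal kernel (2), frame count, and resolution are all assumptions chosen for illustration, not GLM-4.5V's actual settings.

```python
def conv_output_size(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Standard convolution output length along one axis."""
    if size + 2 * padding < kernel:
        raise ValueError("input is smaller than the kernel")
    return (size + 2 * padding - kernel) // stride + 1


def tokens_per_frame_patching(frames: int, height: int, width: int, patch: int) -> int:
    """Token count when each frame is split into non-overlapping 2D patches."""
    rows = conv_output_size(height, patch, patch)
    cols = conv_output_size(width, patch, patch)
    return frames * rows * cols


def tokens_3d_patching(frames: int, height: int, width: int, patch: int, temporal: int) -> int:
    """Token count when a 3D kernel also merges `temporal` consecutive frames."""
    steps = conv_output_size(frames, temporal, temporal)
    rows = conv_output_size(height, patch, patch)
    cols = conv_output_size(width, patch, patch)
    return steps * rows * cols


if __name__ == "__main__":
    # Hypothetical clip: 16 frames at 224x224 pixels.
    print(tokens_per_frame_patching(16, 224, 224, patch=14))
    print(tokens_3d_patching(16, 224, 224, patch=14, temporal=2))
```

Under these assumed settings the temporal stride of 2 halves the number of tokens passed to the language decoder, which is one plausible reading of the efficiency claim.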
Trading accumulated time for breakthroughs: Moonshot AI focuses on artificial general intelligence
Jing Ji Ri Bao· 2025-08-11 22:12
Core Insights
- Moonshot AI, based in Beijing, is gaining attention for its open-source model Kimi K2, which ranked fifth globally upon its launch in July 2025 [1]
- The company's mission is to explore the limits of intelligence and make AI universally accessible [1]

Company Overview
- Founded in April 2023 by a team with extensive experience in natural language processing (NLP), Moonshot AI aims to discover transformative possibilities in artificial intelligence [1]
- The company has approximately 300 employees, a significant portion of whom were born in the 1990s [2]

Product Development
- Kimi K2, a trillion-parameter model, can handle long texts, supporting up to 200,000 Chinese characters [2][5]
- The Kimi intelligent assistant launched in October 2023, followed by several product releases, including the Kimi browser assistant and Kimi-Researcher [2]

Technical Innovations
- Kimi K2's architecture handles complex tasks at lower cost because only 32 billion parameters are active at a time [3]
- The model has excelled in various benchmarks, particularly in programming, tool use, and mathematical reasoning [6]

User Engagement
- Kimi K2's long-text capability drove a sharp increase in adoption, with user numbers growing from hundreds of thousands to tens of millions in 2024 [5]
- The model is designed to be user-friendly, so that non-programmers can use its capabilities effectively [7]

Future Aspirations
- Moonshot AI aims to create a general-purpose AI that surpasses human intelligence, focusing on versatile skills that reinforce one another [8]
- The company emphasizes building a strong foundation model before releasing products, to ensure robust performance and capabilities [8]
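The cost claim rests on simple arithmetic: if only a fraction of the parameters participate in each forward pass, per-token compute scales with the active count rather than the total. The Python sketch below computes that fraction from the figures in the summary. Treating "trillion-parameter" as exactly 1e12 is an assumption, the helper name `active_fraction` is hypothetical, and the summary does not name the architecture, so this does the arithmetic only.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Share of parameters used per token (hypothetical helper)."""
    if total_params <= 0:
        raise ValueError("total_params must be positive")
    if not 0 <= active_params <= total_params:
        raise ValueError("active_params must lie between 0 and total_params")
    return active_params / total_params


# Figures from the summary; 1e12 is a rounded reading of "trillion-parameter".
KIMI_K2_TOTAL = 1e12
KIMI_K2_ACTIVE = 32e9

if __name__ == "__main__":
    share = active_fraction(KIMI_K2_TOTAL, KIMI_K2_ACTIVE)
    print(f"{share:.1%} of parameters active per token")
```

On these rounded numbers roughly one parameter in thirty is active for any given token, which is why a very large model can still be comparatively cheap to run.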
Zhipu announces that its visual reasoning model GLM-4.5V is officially live and open-sourced
Feng Huang Wang· 2025-08-11 14:14
Core Insights
- The article discusses the launch of GLM-4.5V, an open-source visual reasoning model from Zhipu AI with 106 billion total parameters and 12 billion active parameters [1]
- The model is positioned as the best-performing open-source model in its class, achieving state-of-the-art (SOTA) performance across 41 public multimodal benchmarks [1]
- API calls are priced at 2 yuan per million input tokens and 6 yuan per million output tokens [1]

Company Overview
- Zhipu AI built GLM-4.5V on its flagship text model GLM-4.5-Air, continuing the technical line established by GLM-4.1V-Thinking [1]
- The model is designed to handle tasks including image, video, and document understanding as well as GUI agent functions [1]

Industry Context
- Multimodal reasoning is identified as a crucial capability for achieving artificial general intelligence (AGI), allowing AI to perceive, understand, and make decisions as humans do [1]
- Vision-language models (VLMs) are highlighted as the core foundation for multimodal reasoning [1]
Can AI really transform a company from the ground up? Hundun AI Institute (混沌AI院) announces a major product upgrade
混沌学园· 2025-08-11 12:04
Core Viewpoint
- The article discusses the emergence of "AI Business Studies" as a response to business leaders' confusion about how to turn AI technology into commercial value, especially amid rapid advances such as GPT-5 [1][6][11].

Summary by Sections

AI Business Studies Introduction
- "AI Business Studies" aims to turn AI from a novelty into a practical tool for cost, efficiency, and growth problems in business [6][7].
- The concept is grounded in real business needs and favors actionable methods over abstract technical concepts [9].

Transition from Technology to Business Necessity
- The AI sector has moved past technology showcases and is now focused on practical applications across industries [6].
- Despite the proliferation of AI tools, many businesses and individuals have yet to fully realize the benefits of AI [6].

GPT-5 Release Insights
- The release of GPT-5 brings significant technical advances, including better problem-solving and fewer factual inaccuracies [14][20].
- GPT-5 automatically switches between models based on user needs, which the article says improves efficiency by at least 40% [19].
- The model's programming capabilities allow rapid application development, significantly reducing development time and cost [20][21].

AGI and Business Opportunities
- The article discusses Artificial General Intelligence (AGI) and its relevance to business, arguing that AGI should be measured by performance in real-world job scenarios rather than on theoretical benchmarks [26][27].
- AGI is viewed as a collective of specialized AI systems rather than a single omnipotent entity [30].

Barriers to AI Implementation in Enterprises
- Companies face three main barriers to AI adoption: lack of awareness among leaders, absence of practical methodologies, and a shortage of skilled personnel to execute AI projects [33][40].
- The article outlines a three-step method for AI implementation: break business goals down into actionable tasks, match AI tools to those tasks, and establish performance metrics [38].

Hundun AI Institute's Approach
- The second phase of the Hundun AI Institute emphasizes a structured approach to AI education, built around a framework that spans business functions and roles [45][46].
- The institute promotes a team-based model for executing AI projects, which it says raises the likelihood of successful implementation [51].

Community and Resource Sharing
- The article highlights community and resource sharing as a way to accelerate AI adoption, with the Hundun AI Institute fostering a network of over 500 companies for collaboration and innovation [58][60].

Course Structure and Learning Outcomes
- The six-month course at the Hundun AI Institute guides participants through a structured learning process so that they leave with actionable AI projects and methodologies [63][67].
The wolf really has arrived: the "first wave of AI job shock" is here, and it is aimed squarely at young people
36 Ke· 2025-08-11 04:03
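AI is becoming one of the main drivers of layoffs, and unemployment among new graduates and young tech workers is rising.

On Sunday, August 10, Bank of America's Hartnett said that the unemployment rate among US graduates has surged from 4.0% in December 2023 to 8.1%, and that AI has begun to disrupt the US job market.

According to the latest data from the human resources firm Challenger, Gray & Christmas, in the first seven months of this year alone the loss of more than 10,000 US jobs was directly linked to the adoption of generative AI. The firm considers AI to have become one of the top five causes of layoffs this year.

Overall, US companies have announced more than 806,000 job cuts so far in 2025, the highest total for the same period since 2020. The technology sector has been hit hardest, with more than 89,000 jobs cut; since 2023, at least 27,000 tech jobs have been replaced by AI automation.

Entry-level jobs bear the brunt

AI is directly replacing a large number of entry-level jobs. Data from Handshake, a job-search platform aimed at Gen Z, show that postings for entry-level positions (especially corporate roles) fell 15% year over year. Over the past two years, the number of employers mentioning AI in job postings has surged 400%.

Most of the work being "taken over" by AI was previously done by junior employees. Botelho, an associate professor of organizational behavior at the Yale School of Management, said junior employees are hit hardest, and that many newly graduated hires do knowledge-intensive ...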
China's first full-stack, hands-on tutorial on embodied "brain + cerebellum" algorithms
具身智能之心· 2025-08-11 00:14
Core Viewpoint
- The pursuit of Artificial General Intelligence (AGI) highlights embodied intelligence as a key direction, focusing on how intelligent agents interact with and adapt to physical environments [1][6].

Industry Analysis
- Over the past two years, numerous star teams in embodied intelligence have emerged and founded valuable companies such as Xinghaitu, Galaxy General, and Zhujidongli, driving advances in embodied "brain" and "cerebellum" technologies [3].
- Major domestic companies such as Huawei, JD.com, Tencent, Ant Group, and Xiaomi are investing and collaborating to build an embodied-intelligence ecosystem, while Tesla and US investment institutions are backing companies such as Wayve and Apptronik in autonomous driving and warehouse robotics [5].

Technological Evolution
- The development of embodied intelligence has progressed through several stages:
  - The first stage focused on grasp pose detection, which struggled with complex tasks because it lacked context modeling [6].
  - The second stage involved behavior cloning, which let robots learn from expert demonstrations but exposed weaknesses in generalization and in multi-target scenarios [6].
  - The third stage introduced Diffusion Policy methods, which improved stability and generalization in task execution through sequence modeling [7].
  - The fourth stage, emerging in 2025, explores combining VLA models with reinforcement learning and tactile sensing to overcome limitations in feedback and future prediction [8].

Product and Market Development
- The evolution from grasp pose detection to behavior cloning and advanced VLA models marks a shift towards agents that can perform complex tasks in open environments, leading to a surge in product development across industrial, home, dining, and healthcare settings [9].
- Demand for engineering and system capabilities is rising as the industry moves from research to deployment, which requires higher engineering standards [12].

Educational Initiatives
- A comprehensive curriculum has been developed to help learners master the full range of embodied intelligence algorithms, from basic tasks to advanced models such as VLA and its integrations [9][12].
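The behavior-cloning stage in the evolution described above amounts to supervised learning on expert demonstrations: collect (state, action) pairs from an expert and fit a policy that reproduces them. The Python sketch below is a deliberately tiny instance of that idea, using a one-dimensional linear policy fitted by closed-form least squares. The expert rule, the demonstration states, and all function names are hypothetical; real systems use neural-network policies over images and joint states.

```python
from typing import List, Tuple


def expert_policy(state: float) -> float:
    """Hypothetical expert: a fixed linear control rule used to generate demos."""
    return -2.0 * state + 0.5


def fit_linear_policy(states: List[float], actions: List[float]) -> Tuple[float, float]:
    """Behavior cloning as least squares: fit action = w * state + b."""
    n = len(states)
    if n < 2 or n != len(actions):
        raise ValueError("need at least two matching (state, action) pairs")
    mean_s = sum(states) / n
    mean_a = sum(actions) / n
    var_s = sum((s - mean_s) ** 2 for s in states)
    if var_s == 0:
        raise ValueError("states must not all be identical")
    cov = sum((s - mean_s) * (a - mean_a) for s, a in zip(states, actions))
    w = cov / var_s
    b = mean_a - w * mean_s
    return w, b


if __name__ == "__main__":
    # Demonstrations cover only states in [-1, 1].
    demo_states = [-1.0, -0.5, 0.0, 0.5, 1.0]
    demo_actions = [expert_policy(s) for s in demo_states]
    w, b = fit_linear_policy(demo_states, demo_actions)
    print(w, b)
```

The cloned policy is only as good as the states the demonstrations cover. Here the expert is linear, so the fit extrapolates, but with a nonlinear expert a policy trained on states in [-1, 1] would have no guarantee outside that range, which is the generalization weakness the summary attributes to this stage.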
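Chen Tianqiao teams up with Tsinghua professor Dai Jifeng to debut the strongest open-source AI model project, going all out to build the next DeepSeek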
Tai Mei Ti APP· 2025-08-10 23:52
Core Insights
- The article discusses the establishment of a new company focused on Artificial General Intelligence (AGI), led by Dai Jifeng, a professor at Tsinghua University, and Chen Tianqiao, an entrepreneur and philanthropist [2][12]
- The MiroMind AI team has launched the MiroMind Open Deep Research (Miro ODR) project, a high-performance, fully open-source deep research initiative that achieved a GAIA score of 82.4, surpassing models such as OpenAI's Deep Research and Manus [3][4]

Company Overview
- MiroMind aims to create a collaborative environment for building AI rather than providing AI directly, emphasizing community involvement in development [5][12]
- The company has released four sub-projects on platforms such as GitHub and Hugging Face, with monthly updates planned to improve the deep research model [5][10]

Technical Achievements
- MiroMind ODR's GAIA score of 82.4 makes it, according to the article, the strongest open-source deep research model available [4][9]
- The project includes MiroFlow, MiroThinker, MiroVerse, and MiroTrain, each covering a different aspect of deep research and model training [10][12]

Research and Development Focus
- The MiroMind team is concentrating on three areas: AI-driven business decision-making, innovative content distribution algorithms, and AI services tailored to aging and youth demographics [12]
- The MiroMind-M1 model, with 7 billion parameters, has shown stronger performance on mathematical reasoning tasks than other models [9][11]

Leadership and Background
- Dai Jifeng has a strong academic and professional background in AI and computer vision, having published over 80 papers and received more than 60,000 citations [7][8]
- Chen Tianqiao has a history of investing in brain-computer interface technologies and aims to provide long-term, stable investment in hard-tech innovation [15][17]
OpenAI releases its next-generation AI model GPT-5
Xin Lang Cai Jing· 2025-08-10 23:02
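Source: Science and Technology Daily

Science and Technology Daily (reporter Liu Xia): The New York Times and other US media reported on the 7th that OpenAI officially launched its next-generation artificial intelligence (AI) model, GPT-5, that day. The company claims it is its "smartest, fastest, and most useful AI model to date," performing especially well at answering health questions and quickly writing computer code.

According to OpenAI's website, GPT-5 marks the first time a reasoning model powers the free version of ChatGPT; the model can "think" through complex questions before answering. OpenAI CEO Sam Altman said at a press briefing that GPT-5 is a "major upgrade" over previous systems, and that the new technology responds faster, is more accurate, and produces significantly fewer "hallucinations" (fabricated information).

In addition, according to test data released by OpenAI, GPT-5's rate of factual errors in responses is about 45% lower than GPT-4's and 80% lower than that of earlier models. The new model also offers more precise writing suggestions for document work and can help interpret medical checkup reports, though the company specifically states that it "will not replace medical professionals."

Notably, when asked in the demo the classic question of how Bernoulli's principle applies to an aircraft wing, GPT-5 still gave the common but not fully accurate explanation, which suggests the model has room to improve. Altman acknowledged that although GPT-5 is an important milestone on the road to artificial general intelligence (AGI), achieving machine intelligence comparable to the human brain still ...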