机器之心

Just Now, a Mysterious Model Went Viral! Netizens: Is OpenAI About to Open-Source?
机器之心· 2025-07-02 10:40
Core Viewpoint
- OpenRouter has introduced a new model named "Cypher Alpha," which supports a context of 1 million tokens and is available for free, raising speculation about its origin, particularly regarding OpenAI [2][6][10].

Group 1: Model Features
- Cypher Alpha is a cloaked model designed to gather user feedback and is an all-purpose model that supports long-context tasks, including code generation [9].
- The model is free to use, with no costs associated with input or output tokens [9].
- It was created on July 1, 2025, and is intended for real-world applications [9].

Group 2: Speculations and Reactions
- Many users speculate that Cypher Alpha may be a new model from OpenAI, given the naming convention and similarities to previous models [6][7][10].
- Some notable figures in the tech community suggest it could be related to GPT-5 or an open-source model, while others speculate it might be from Elon Musk's Grok, although this was quickly dismissed due to performance inconsistencies [11][15].
- User feedback indicates a mixed reception, with some praising its performance in coding and reasoning tasks, while others note that it struggles with complex mathematical and logical outputs [18][21].
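Since the model is served through OpenRouter, trying it out typically amounts to a single POST against OpenRouter's OpenAI-compatible chat-completions endpoint. A minimal sketch follows; the model slug `"openrouter/cypher-alpha:free"` and the placeholder API key are assumptions for illustration, not confirmed identifiers, so check OpenRouter's model list before use.

```python
import json
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(prompt: str, api_key: str, model: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat-completions request."""
    payload = {
        "model": model,  # hypothetical slug; verify against OpenRouter's model list
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request(
    "Hello!", api_key="sk-or-...", model="openrouter/cypher-alpha:free"
)
print(req.full_url)
# urllib.request.urlopen(req) would perform the actual call; omitted here.
```

Because the endpoint follows the OpenAI request schema, the same payload works with any OpenAI-compatible client library by pointing its base URL at OpenRouter.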
Livestream Preview: "Unboxing" Huawei Pangu's First Open-Source Large Models
机器之心· 2025-07-02 10:40
This Monday, the open-source camp welcomed another heavyweight player: Huawei Pangu. This time, the newcomer announced the open-sourcing of two large models in one go: "Pangu Embedded," a 7-billion-parameter dense model, and "Pangu Pro MoE," a 72-billion-parameter mixture-of-experts model. It even open-sourced its Ascend-based model inference technology along with them.

| Repository | Stars | Description |
| --- | --- | --- |
| pangu-pro-moe | 108 | Pangu Pro MoE (72B-A16B): an Ascend-native Mixture of Grouped Experts model |
| pangu-embedded | 37 | Pangu Embedded (7B): an efficient 7B model that flexibly switches between fast and slow thinking |
| ascend-inference-cluster | 115 | Technical notes on inference deployment of ultra-large-scale MoE models on Ascend |
| ascend-inference-system | 40 | Ascend Pangu inference system technology |

Overall, neither of these models is a pushover: on the May SuperCLUE leaderboard, Pangu Pro ...
How Do AI Agents Differ From Traditional Chatbots? How Should They Be Evaluated? This 30-Page Survey Explains It All
机器之心· 2025-07-02 07:03
The paper's authors include Zhu Jiachen, Rui Renting, Shan Rong, Zheng Congmin, Xi Yunjia, Lin Jianghao, Liu Weiwen, Yu Yong, and Zhang Weinan from Shanghai Jiao Tong University, along with Zhu Menghui, Chen Bo, and Tang Ruiming from Huawei's Noah's Ark Lab. The first author, Zhu Jiachen, is a PhD student at Shanghai Jiao Tong University whose research interests center on large-model reasoning and personalized agents. The corresponding author, Zhang Weinan, is a professor at Shanghai Jiao Tong University whose research covers reinforcement learning, data science, robot control, and recommendation and search.

Since the advent of the Transformer, the NLP field has undergone a disruptive transformation. Large language models have dramatically improved text understanding and generation, becoming the foundation of modern AI systems. And now AI keeps pushing forward: a new generation of AI Agents with autonomous decision-making and complex interaction capabilities is rapidly on the rise.

Unlike earlier LLM chatbots that could only converse, AI Agents can access the internet, call all kinds of APIs, and flexibly adjust their strategies based on feedback from real environments. AI Agents thus gain the ability to perceive their environment and make autonomous decisions, breaking through the limits of the traditional "question-and-answer" mode: they can proactively execute tasks and handle complex scenarios, truly becoming reliable intelligent assistants at the user's side.

In this wave of AI Agents, everyone can have an AI Agent of their own. But how do you measure whether your AI Agent is strong enough? A flood of Agent evaluation methods keeps emerging, ...
Musk Pitching Labubu? This AI Tool Built by Two Tongji Alumni Is Out to Upend the Ad Industry
机器之心· 2025-07-02 07:03
Reported by 机器之心. Editor: Yang Wen

Turning Musk into a livestream pitchman in seconds.

Remember HeyGen, the tool that had Taylor Swift speaking fluent Chinese and Guo Degang performing crosstalk in English? It has just added a new "product placement" feature: with only a headshot and a product photo, it can have anyone pitch any product.

For example, have Gal Gadot, Taylor Swift, and Ivanka each hold a Labubu, a Pepsi, or a classic Gucci bag while delivering an ad script; the expressions, lip sync, and gestures all look remarkably natural and fluid.

Or have the Mona Lisa or the Girl with a Pearl Earring sell products online.

One netizen even generated both the person and the product images entirely with AI to make a promo video; strip off the HeyGen watermark, post it on social media, and it could fool quite a few people.

Hands-on experience

Many netizens who saw these cases said this will redefine the advertising industry.

HeyGen is an AI video generation platform, but unlike Kling, Jimeng, or Runway, it focuses on digital-human video production. Users simply enter a text script to generate a high-quality virtual-avatar video in one click, with support for multiple languages and dialects.

HeyGen has also explored a variety of other features. Video Podcast, for instance, generates a two-host AI video podcast from just a website link or a PDF document. Another is Interactive Avatar, which lets us interact with various virtual ...
A $100 Million Seed Round Sets a New Silicon Valley Embodied-Intelligence Funding Record! Zhou Xian, Xu Zhenjia, Li Minchen and Other Chinese Researchers Co-Found a Startup
机器之心· 2025-07-02 00:54
Core Viewpoint
- The article discusses the emergence of Genesis AI, a company focused on embodied intelligence, which aims to automate physical labor and address the disparity between advancements in AI's cognitive capabilities and its physical applications [2][5][35].

Group 1: Company Overview
- Genesis AI recently raised $105 million in seed funding, marking the largest seed round in the embodied intelligence sector to date [5][6].
- The founding team consists of top talents from prestigious institutions such as Mistral AI, NVIDIA, Google, Apple, CMU, MIT, and Stanford, with expertise in physical simulation, graphics, robotics, and large-scale AI model training [12][32].
- The company is linked to the well-known Genesis project, a generative physics engine developed over two years by CMU and over 20 research labs, designed for general robotics and embodied AI applications [8][10].

Group 2: Technology and Goals
- Genesis AI aims to create a high-density talent organization to achieve advanced physical intelligence and automate physical labor [35].
- The company is addressing the "data curse" prevalent in the physical intelligence field by developing a scalable universal data engine that integrates high-precision physical simulations, multimodal generative AI, and large-scale real robot data [36][39].
- Their simulation system is fully self-developed, capable of generating high-quality synthetic data while also employing an efficient and scalable real-world data collection system, creating a "synthetic data + real data" dual-engine model [39][40].

Group 3: Future Expectations
- The company aspires to become a leading force in the physical intelligence domain, similar to OpenAI, and is expected to release its next milestone by the end of the year [41][42].
Poaching From a Close Partner: Cursor Hires Away Two Core Claude Code Contributors
机器之心· 2025-07-02 00:54
Core Viewpoint
- The AI industry is experiencing intense talent competition, highlighted by Anysphere's recruitment of key personnel from Anthropic, which may complicate their existing partnership [1][2][3].

Group 1: Talent Acquisition
- Anysphere has successfully recruited Boris Cherny and Cat Wu from Anthropic, both of whom played significant roles in the development of Claude Code [4][5].
- Boris Cherny, the lead developer of Claude Code, will take on the role of Chief Architect and Engineering Lead at Anysphere, while Cat Wu will serve as Product Lead [5].

Group 2: Financial Performance
- Anthropic's annual revenue has reached $4 billion, translating to a monthly revenue of approximately $333 million, marking a nearly fourfold increase since the beginning of the year [7].
- Anysphere's annual recurring revenue has surpassed $500 million, with a monthly income of about $42 million, more than doubling from $200 million just three months prior [11].

Group 3: Market Dynamics
- Competition in the AI programming market has intensified, with major players like OpenAI, Google DeepMind, and Amazon entering the space following the successful launch of Anthropic's AI programming product, Claude Code [12].
- The recruitment of core personnel from Anthropic by Anysphere could introduce new dynamics in this rapidly evolving market [13].
In the Era of Large Models, Where Are Vision Generalist Models Headed?
机器之心· 2025-07-02 00:54
Core Viewpoint
- The article discusses the evolution of Vision Generalist Models (VGM) in the context of the rise of multimodal large models, emphasizing the need for a distinct focus on visual data despite the shift towards integrating visual modalities with language models [1][2].

Group 1: VGM Overview
- VGM aims to create a unified framework capable of handling various visual tasks and modalities, similar to the success of large language models in natural language processing [7].
- VGM's key capability is its ability to process multimodal inputs, including images, point clouds, and videos, through a shared representation method [7][8].
- The model supports multiple visual tasks simultaneously, allowing for parallel processing within a single framework [8].

Group 2: Data, Tasks, and Evaluation
- VGM utilizes large and diverse datasets for training and evaluation, covering various types of visual data to support multimodal learning [9].
- Visual tasks are categorized into four types: image tasks, geometric tasks, time series tasks, and other visual-related tasks [9].
- Modern evaluation methods focus on cross-task generalization and multimodal processing capabilities, differing from traditional single-task assessments [9].

Group 3: Model Design Paradigms
- Existing VGM design paradigms focus on unifying different visual modality inputs and diverse task outputs, primarily categorized into encoding-based frameworks and sequence-to-sequence frameworks [12][13].
- Encoding-based frameworks create a shared feature space for different input modalities, while sequence-to-sequence frameworks are suitable for tasks with variable-length inputs and outputs [12][13].

Group 4: Current Progress and Future Directions
- Current VGM research has made significant progress in unified processing of multiple tasks and modalities but faces challenges in optimizing framework design and improving training efficiency [16].
- Data acquisition and annotation remain bottlenecks for VGM development, with future research likely focusing on automated annotation techniques and large-scale unsupervised learning methods [16].
- Despite challenges, VGM shows extensive potential in practical applications, extending beyond traditional visual tasks to complex multimodal tasks across various fields such as intelligent surveillance, autonomous driving, and robotics [16].
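The encoding-based paradigm described above can be illustrated with a toy sketch. Everything here (encoder shapes, the random projections standing in for learned weights) is invented for illustration; the point is only the structure: each modality gets its own encoder into one shared feature space, and a single task head reads from that space regardless of input modality.

```python
import numpy as np

rng = np.random.default_rng(0)
D_SHARED = 16  # dimensionality of the shared feature space

# Fixed random projections stand in for learned encoder/head weights.
W_IMAGE = rng.standard_normal((3, D_SHARED))   # image encoder projection
W_POINTS = rng.standard_normal((3, D_SHARED))  # point-cloud encoder projection
W_CLS = rng.standard_normal((D_SHARED, 10))    # shared 10-way task head

def encode_image(pixels: np.ndarray) -> np.ndarray:
    """(H, W, 3) image -> shared space via global average pooling."""
    return pixels.mean(axis=(0, 1)) @ W_IMAGE

def encode_point_cloud(points: np.ndarray) -> np.ndarray:
    """(N, 3) point cloud -> shared space via per-axis mean (order-invariant)."""
    return points.mean(axis=0) @ W_POINTS

def classify(feature: np.ndarray) -> int:
    """One task head consumes the shared representation for every modality."""
    return int(np.argmax(feature @ W_CLS))

image = rng.random((8, 8, 3))
cloud = rng.random((32, 3))
z_img, z_pts = encode_image(image), encode_point_cloud(cloud)
assert z_img.shape == z_pts.shape == (D_SHARED,)  # both land in the same space
print(classify(z_img), classify(z_pts))
```

In a real VGM, the encoders would be learned networks and the heads would cover detection, segmentation, and other tasks, but the design choice is the same: unification happens at the representation, not at the raw input.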
ICML 2025 Spotlight | Zhu Jun's Group at Tsinghua & NVIDIA Propose DDO: A New Training Paradigm for Diffusion/Autoregressive Models That Sets a New Image-Generation SOTA
机器之心· 2025-07-01 09:34
Core Viewpoint
- The article discusses a novel optimization paradigm for visual generative models called Direct Discriminative Optimization (DDO), which enhances the performance of likelihood-based generative models by treating them as implicit discriminators, thus overcoming limitations of traditional maximum likelihood estimation (MLE) methods [1][8].

Background on Likelihood-Based Generative Models
- Diffusion models and autoregressive models have become dominant in image generation, characterized by their stability, diversity, and scalability [4].
- These models estimate the log-likelihood of data explicitly, but they face challenges such as the "mode covering" problem, which can lead to blurry or distorted outputs [6].

DDO Methodology
- DDO introduces a training objective that incorporates reverse KL divergence to focus on density concentration around real data, improving generation fidelity without adding extra networks [7][11].
- The method utilizes a target model and a frozen reference model to construct an implicit discriminator, allowing for direct application to diffusion and autoregressive models [11].

Performance Improvements
- DDO significantly enhances the generation quality of existing models, achieving state-of-the-art results across various standard image generation tasks [12][13].
- For instance, FID scores improved from 1.58 to 0.97 for ImageNet 64×64 and from 1.85 to 1.30 for CIFAR-10 without guidance [18].

Compatibility and Efficiency
- DDO does not require changes to the network structure or increase inference costs, and it is compatible with existing guidance methods like Classifier-Free Guidance (CFG) [21].
- The method allows for performance enhancement through self-play, leading to continuous improvement in FID metrics [19].

Future Prospects
- The principles behind DDO may extend beyond visual generation to align with language models, suggesting a unified alignment paradigm for multimodal generation tasks [22][23].
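The implicit-discriminator idea can be sketched as follows (schematic only; the paper's exact objective, weighting, and time-dependence may differ). With target model $p_\theta$ and frozen reference model $p_{\mathrm{ref}}$, the log-likelihood ratio serves as a discriminator score that is trained, GAN-style, to separate real data from reference samples:

```latex
r_\theta(x) = \log \frac{p_\theta(x)}{p_{\mathrm{ref}}(x)}, \qquad
\mathcal{L}(\theta)
  = -\,\mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log \sigma\big(r_\theta(x)\big)\big]
  - \mathbb{E}_{x \sim p_{\mathrm{ref}}}\big[\log \sigma\big(-r_\theta(x)\big)\big]
```

The first term pulls density onto real data (the reverse-KL-style pressure noted above), while the second penalizes mass that the reference places on its own samples. Because $r_\theta$ is computed from the two models' own likelihoods, no separate discriminator network is required, consistent with the summary.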
Free Dinner! Join the 机器之心 Talent Dinner at ICML 2025 in Canada
机器之心· 2025-07-01 09:34
Core Viewpoint
- The AI field continues to develop rapidly in 2025, with significant breakthroughs in image and video generation technologies, particularly through diffusion models that enhance image synthesis quality and enable synchronized audio generation in video content [1][2].

Group 1: AI Technology Advancements
- The use of diffusion models has led to unprecedented improvements in image synthesis quality, enhancing resolution, style control, and semantic understanding [2].
- Video generation technology has evolved, exemplified by Google's Veo 3, which achieves native audio synchronization, marking a significant advancement in video generation capabilities [2].

Group 2: Academic Collaboration and Events
- The ICML conference, a leading academic event in the AI field, will take place from July 13 to July 19, 2025, in Vancouver, Canada, showcasing top research achievements [4].
- The "Yunfan・ICML 2025 AI Talent Meetup" is organized to facilitate informal discussions among professionals, focusing on cutting-edge technologies and talent dialogue [5][7].

Group 3: Event Details
- The meetup will feature various engaging activities, including talks by young scholars, talent showcases, interactive experiences, institutional presentations, and networking dinners, aimed at fostering discussions on key issues in technology and application [7][8].
- The event is scheduled for July 15, 2025, from 16:00 to 20:30, with a capacity of 200 participants [8].
A Dark Horse on the SuperCLUE Reasoning Leaderboard: Is ZTE Actually an AI Company?
机器之心· 2025-07-01 05:01
Core Viewpoint
- ZTE Corporation, a long-established ICT company, has successfully entered the AI sector, achieving notable recognition in AI reasoning competitions, particularly with its NebulaCoder-V6 model, which ranked first on the SuperCLUE reasoning leaderboard [2][4][6].

Group 1: ZTE's AI Strategy
- ZTE has made significant investments in AI, recognizing its potential to transform the telecommunications industry, particularly with the advent of 6G technology [8][10].
- The company has established multiple AI-focused teams and research initiatives, including the Nebula large language model, to enhance its capabilities in AI infrastructure and applications [11][12].
- ZTE's internal operations have already integrated AI, with the Nebula model generating 1.5 billion tokens daily and contributing 30% of the company's code generation [13][14].

Group 2: Nebula Model's Success
- The Nebula model's success in the SuperCLUE competition is attributed to its efficient training optimization strategies, which include pre-training, supervised fine-tuning, and reinforcement learning [16][39].
- A novel knowledge graph construction method, DASER, was implemented to enhance the model's knowledge accuracy by addressing knowledge gaps and errors during pre-training [20][23].
- The supervised fine-tuning phase utilized critique learning to improve the model's understanding of complex instructions, resulting in higher accuracy in reasoning tasks [25][31].

Group 3: Transition from ICT to AI
- ZTE's transition from an ICT giant to an AI-focused enterprise is facilitated by its extensive experience in data processing, system optimization, and engineering practices [44][46].
- The company possesses a unique advantage in integrating hardware and software capabilities, allowing it to effectively support the entire AI ecosystem, from hardware development to industry applications [47][48].
- ZTE's existing product ecosystem is undergoing AI transformation, which is expected to create significant market opportunities and accelerate technological advancements [48][49].