量子位
Apple decides to build its own AI search engine, codenamed WKA
量子位· 2025-09-04 01:13
Core Viewpoint - Apple plans to launch its own AI search engine, named "World Knowledge Answers", in spring 2026, aiming to compete directly with ChatGPT and Perplexity [8][9].
Group 1: AI Search Engine Development
- Apple is preparing a "counterattack" in the AI space by transforming Siri into a new AI-driven search assistant [7].
- The new system will be integrated into Siri, allowing users to ask questions and receive concise answers generated by an AI summarization system [9][10].
- Apple is considering a partnership with Google to use Google's models for some Siri functionality [11][12].
Group 2: Market Reaction
- Following the announcement of the search engine plans, Apple's stock price rose by 3.8%, the largest single-day increase in nearly a month [5].
Group 3: Talent Acquisition and Retention
- Apple is facing a talent crisis, having lost 10 AI team members in a short period, including key figures from its foundation model team [18][19].
- Although acquisition discussions with Perplexity have halted, Apple may still pursue other acquisitions to bolster its AI talent pool [16][17].
Group 4: Strategic Partnerships
- Apple and Google have a long-standing partnership in internet search, with Google paying Apple approximately $20 billion annually [13].
- A formal agreement has been reached for Apple to evaluate and test Google's AI models to support Siri [14].
World models: Tencent Hunyuan climbs to the top of the leaderboard
量子位· 2025-09-03 07:30
Core Viewpoint - Tencent's HunyuanWorld-Voyager model has been released as open source, showing significant advances in 3D scene generation and immersive experiences and outperforming existing models on the WorldScore benchmark [1][3][45].
Group 1: Model Features and Innovations
- HunyuanWorld-Voyager is described as the industry's first model supporting native 3D reconstruction for long-distance roaming; it generates consistent roaming scenes and exports video directly to 3D formats [4][24].
- The model introduces a new "roaming scene" feature that is more interactive than traditional 360° panoramic images, letting users navigate within the scene using mouse and keyboard [10][11].
- It supports various applications, including video scene reconstruction, 3D object texture generation, and video style customization, demonstrating its spatial intelligence potential [27].
Group 2: Technical Framework
- The model incorporates scene depth prediction into the video generation process, combining spatial and feature information to support native 3D memory and scene reconstruction [29].
- It uses a unified architecture to generate aligned RGB and depth video sequences, ensuring global scene consistency [33].
- A scalable data construction engine automates video reconstruction, producing large-scale, diverse training data without manual annotation [34].
Group 3: Performance Metrics
- On the WorldScore benchmark, HunyuanWorld-Voyager scored 77.62, ranking first in overall capability and surpassing existing open-source methods [36].
- For video generation quality, the model reached a PSNR of 18.751 and an SSIM of 0.715, indicating that it can produce highly realistic video sequences [39].
- In subjective quality assessments, HunyuanWorld-Voyager received the highest ratings for visual authenticity [44].
Group 4: Deployment and Open Source
- Running the model at 540p resolution requires a peak GPU memory of 60 GB [47].
- Tencent is accelerating its open-source initiatives, including the release of various models and frameworks, contributing to the broader AI landscape [48].
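The PSNR figure quoted above can be read with the standard definition in mind. The following is a minimal sketch of that definition, not the evaluation code used for HunyuanWorld-Voyager; the function name `psnr`, the flat pixel lists, and the 8-bit `max_value` default are assumptions made for illustration.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio (in dB) between two equal-length pixel sequences.

    Illustrative only: real evaluations operate on full frames, not flat lists.
    """
    if len(reference) != len(generated) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ref = [100, 100, 100, 100]
    gen = [110, 90, 100, 100]
    print(f"PSNR: {psnr(ref, gen):.2f} dB")
```

Higher values mean the generated frame is closer to the reference; SSIM is a separate, structure-aware metric and is not sketched here.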
GPT-5 helps Terence Tao solve another hard problem
量子位· 2025-09-03 07:30
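Yishui, reporting from Aofeisi. QbitAI | WeChat official account QbitAI
GPT-5 has helped Terence Tao solve another hard problem! The news comes from Tao's own latest post, in which he points out one more scenario where AI can shine: semi-automated literature search.
Put simply, Tao is using AI plus database matching to help solve hard problems in mathematics. The AI not only saved time and effort but also produced excellent results. As Tao excitedly put it: this is the first proof-of-concept result of the project linking Erdős problems with the OEIS.
The details follow.
AI acts as a "locator" in solving hard math problems
The story goes back to a key figure: Paul Erdős, the famous 20th-century Hungarian mathematician. Over his lifetime he collaborated with more than 500 mathematicians and published about 1,525 mathematical papers, a volume that remains unmatched to this day. He also left behind a large pile of still-unsolved problems, known as "Erdős problems".
One large class of these problems is particularly tricky: they do not ask "what is the result?" but "is the result a rational number (one that can be written as a fraction)?"
In general, answering such questions precisely faces two main difficulties.
The first difficulty: the formula is simple to write down but extremely complex to evaluate, almost impossible by hand. It does not directly "prove" whether a number is irrational; instead, the sequence is computed to a high-precision decimal, ...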
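Tencent Hunyuan's latest open-source model becomes the "strongest translator": first place in 30 languages at an international machine translation competition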
量子位· 2025-09-03 05:49
Core Viewpoint - Tencent's Hunyuan-MT-7B model has achieved significant success in international translation competitions, demonstrating advanced capabilities across multiple languages and dialects, and has been open-sourced for broader accessibility [1][2][4].
Group 1: Model Performance and Achievements
- Hunyuan-MT-7B won first place in 30 out of 31 language pairs in the WMT2025 competition, showing its strength in both high-resource and low-resource languages [4][29].
- The model supports 33 languages and 5 dialects, making it a comprehensive lightweight translation solution [1].
- On the Flores200 evaluation dataset, Hunyuan-MT-7B outperformed other models of similar size and showed competitive results against larger models [6][9].
Group 2: Technical Innovations
- The model is built on a complete training paradigm that includes pre-training, supervised fine-tuning, and reinforcement learning, leading to superior translation performance [11][12].
- The Shy framework, which incorporates synergy-enhanced policy optimization, changes traditional optimization approaches by using a systematic design with two main components: foundational model development and ensemble strategies [15][19].
- The GRPO algorithm, a key innovation in the Shy framework, reduces gradient variance and improves sample efficiency, enhancing training stability and model convergence [21][24].
Group 3: Deployment and Usability
- Hunyuan-MT-7B is designed for high computational efficiency, allowing faster inference and lower operational costs than larger models [30].
- The model's open-source nature promotes transparency and allows further improvements by the research community, lowering the technical barriers to participating in machine translation advances [31].
Group 4: Broader Implications
- The methodologies and frameworks developed for Hunyuan-MT-7B can serve as a reference for optimizing other specialized fields, promoting a shift from general to specialized technology applications [33].
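The summary does not spell out how GRPO is applied here. As a rough illustration of the group-relative idea commonly associated with GRPO, the sketch below scores several candidate translations of one source sentence and normalizes each reward against its own group, which is one way a baseline can reduce gradient variance. The function name, the use of the population standard deviation, and the `eps` term are assumptions, not details taken from the Hunyuan-MT-7B report.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled outputs.

    Each advantage is (reward - group mean) / (group std + eps), so the
    group itself serves as the baseline (assumed formulation).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical quality scores for four candidate translations of one sentence.
    scores = [0.62, 0.71, 0.55, 0.80]
    for score, adv in zip(scores, group_relative_advantages(scores)):
        print(f"reward={score:.2f} advantage={adv:+.3f}")
```

Candidates that score above the group average receive positive advantages and are reinforced; those below are discouraged.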
Official Nano Banana prompting guide arrives, with complete code examples
量子位· 2025-09-03 05:49
Core Viewpoint - The article discusses the rising popularity of the Nano-banana tool, highlighting its features and the official guidelines Google has released to help users apply it effectively [1][8].
Group 1: Features of Nano-banana
- Nano-banana lets users generate high-quality images from text descriptions, edit existing images with text prompts, and create new scenes from multiple images [15].
- The tool supports iterative refinement, so users can adjust an image step by step until they reach the desired outcome [15].
- It can accurately render text in images, making it suitable for logos, charts, and posters [15].
Group 2: Guidelines for Effective Use
- Google emphasizes providing detailed scene descriptions rather than just listing keywords, which yields better and more coherent images [9][10].
- Users are encouraged to think like photographers, considering camera angles, lighting, and fine details to achieve realistic images [19][20].
- The article provides specific prompt structures for various image types, including photorealistic shots, stylized illustrations, product photography, and comic panels [20][24][35][43].
Group 3: Examples and Applications
- The article showcases images generated by Nano-banana, such as a cat dining in a luxurious restaurant under a starry sky, demonstrating the tool's ability to create detailed and imaginative scenes [14][17].
- It also includes code snippets for developers to integrate the image generation capabilities into their applications [21][29][35].
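The original code snippets are not reproduced in this summary. As a stand-in for the "describe the scene, think like a photographer" guideline, here is a small sketch that assembles a descriptive prompt from separate fields. The helper `build_scene_prompt` and its sentence template are hypothetical; they are not part of Google's guide or of any API.

```python
def build_scene_prompt(subject, setting, lighting, camera, details=()):
    """Compose a full-sentence scene description instead of a keyword list.

    Hypothetical helper: the field names and template are illustrative only.
    """
    sentences = [
        f"A photorealistic shot of {subject} in {setting}.",
        f"The scene is lit by {lighting}.",
        f"Captured with {camera}.",
    ]
    if details:
        sentences.append("Fine details: " + ", ".join(details) + ".")
    return " ".join(sentences)


if __name__ == "__main__":
    prompt = build_scene_prompt(
        subject="a cat dining at a table",
        setting="a luxurious restaurant under a starry sky",
        lighting="warm candlelight",
        camera="a low-angle 85mm portrait lens",
        details=["linen napkin", "reflections on crystal glasses"],
    )
    print(prompt)
```

The resulting string would then be passed as the text prompt to whichever image generation client is in use.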
Large models get smarter with a "slightly worse memory"! Goldfish Loss randomly drops tokens so AI stops rote memorization
量子位· 2025-09-03 05:49
Core Viewpoint - The article introduces a new method called "Goldfish Loss" that keeps large language models from memorizing training data verbatim, so they learn language patterns while reducing the risk of overfitting [1][4].
Group 1: Goldfish Loss Concept
- Goldfish Loss encourages models to forget specific details by randomly omitting a small portion of tokens during loss calculation [3][6].
- This prevents the model from reproducing the training data word for word, while still enabling it to generate coherent text [4][9].
- The approach uses a hashing-based masking strategy to keep the omitted tokens consistent during training [8][14].
Group 2: Comparison with Traditional Methods
- Unlike traditional regularization methods such as Dropout, which introduce noise randomly, Goldfish Loss uses a static mask that omits the same tokens across training iterations [11][19].
- This consistency prevents the model from memorizing complete training sequences, because it cannot piece together omitted tokens from different training instances [12][14].
Group 3: Experimental Results
- In extreme scenarios, standard training led the model to memorize 84 out of 100 articles, while Goldfish Loss resulted in no memorization [22][24].
- In standard training scenarios, Goldfish Loss also significantly reduced the model's tendency to reproduce training data verbatim [24].
- Performance tests showed no systematic difference in overall capability between models trained with Goldfish Loss and those trained with the standard loss [26].
Group 4: Implications and Considerations
- The core of Goldfish Loss is ignoring certain tokens during gradient calculation, so the model may need to process more data to compensate for the omitted information, which could affect computational efficiency [28].
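The hashing-based masking described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the drop rate `k`, the context width `context`, the use of SHA-256, and the rule "drop when the hash is divisible by `k`" are all choices made for this example.

```python
import hashlib


def goldfish_mask(token_ids, k=4, context=3):
    """Return one bool per token; True means the token contributes to the loss.

    The decision for position i depends only on the `context` tokens before it,
    so the same passage is masked the same way every time it is seen
    (assumed scheme for illustration).
    """
    mask = []
    for i in range(len(token_ids)):
        if i < context:
            mask.append(True)  # too little preceding context to hash
            continue
        window = token_ids[i - context:i]
        digest = hashlib.sha256(",".join(map(str, window)).encode()).digest()
        # Drop roughly one token in k, deterministically.
        mask.append(int.from_bytes(digest[:8], "big") % k != 0)
    return mask


def masked_mean_loss(per_token_losses, mask):
    """Average the per-token losses over the tokens that were kept."""
    kept = [loss for loss, keep in zip(per_token_losses, mask) if keep]
    return sum(kept) / len(kept) if kept else 0.0


if __name__ == "__main__":
    ids = [17, 4, 99, 23, 8, 42, 15, 7, 61, 3]
    mask = goldfish_mask(ids)
    print(mask)
    print(masked_mean_loss([1.0] * len(ids), mask))
```

Because the mask is a function of the local context rather than a fresh random draw, a duplicated document loses the same tokens in every copy, which is what separates this from Dropout-style noise.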
Solving task dependencies in multi-agent collaboration with "causal planning" | HKUST (Guangzhou) & Tencent
量子位· 2025-09-03 05:49
Core Viewpoint - The article discusses the challenges that traditional single-agent systems face in long-horizon, multi-step collaborative tasks, highlighting the need for a distributed agent framework with global planning and causal dependency management [1][2].
Group 1: CausalMACE Method
- CausalMACE is proposed by a research team from the Hong Kong University of Science and Technology (Guangzhou) and Tencent. It integrates causal reasoning into open-world multi-agent systems to provide a scalable engineering solution for complex task collaboration [2].
- The method introduces a "global causal task graph", allowing the AI to learn "if-then" logic and enabling dynamic adjustment and clear division of labor among agents [5][6].
Group 2: Framework Components
- The CausalMACE framework consists of three main components: Judger, Planner, and Worker [7].
- Judger (the "referee") verifies the legality of actions in real time and reports success or failure, ensuring all agents operate under the same game rules [11].
- Planner (the "chief engineer") breaks complex tasks into smaller sub-tasks and drafts a rough flowchart based on game rules, then refines it through causal reasoning so that task dependencies remain valid [12][14].
- Worker (the "dispatch room") uses depth-first search to split the causal graph into multiple production lines and calculates a "busy index" for real-time task reassignment among agents [16].
Group 3: Experimental Results
- CausalMACE improves both completion rate and efficiency on benchmark tasks such as construction, cooking, and escape rooms, with up to a 12% increase in task completion rate and up to a 1.5-fold efficiency improvement over baseline methods [17].
- On the VillagerBench benchmark tasks, CausalMACE outperformed AgentVerse and VillagerAgent across various metrics [18].
Group 4: Author Information
- The lead author of the paper is Professor Wang Hao, an assistant professor and doctoral supervisor at the Hong Kong University of Science and Technology (Guangzhou), with a research background in generative AI models and 3D reconstruction [19][20].
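The Worker step described above (depth-first search over the causal graph, then a "busy index" for reassignment) can be sketched roughly as below. The summary does not define the busy index, so the ratio used here (agents assigned per task on a line) is purely an assumption, as are the function names and the toy task graph.

```python
def split_into_lines(graph, roots):
    """Depth-first traversal: every root-to-leaf path becomes one production line.

    `graph` maps a task to the tasks that causally depend on it.
    """
    lines = []

    def dfs(node, path):
        path = path + [node]
        children = graph.get(node, [])
        if not children:
            lines.append(path)
            return
        for child in children:
            dfs(child, path)

    for root in roots:
        dfs(root, [])
    return lines


def least_busy_line(lines, agents_per_line):
    """Pick the line with the lowest busy index (assumed: agents per task)."""
    def busy_index(i):
        return agents_per_line[i] / len(lines[i])

    return min(range(len(lines)), key=busy_index)


if __name__ == "__main__":
    # Hypothetical construction task graph.
    graph = {
        "gather_wood": ["craft_planks"],
        "craft_planks": ["build_wall", "build_door"],
    }
    lines = split_into_lines(graph, ["gather_wood"])
    print(lines)
    print("assign next free agent to line", least_busy_line(lines, [2, 1]))
```

A free agent is sent to the line with the lowest index; a real system would also need to deduplicate shared prefixes such as `gather_wood` across lines.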
Just in: Unitree Robotics (宇树科技) sets its IPO timeline!
量子位· 2025-09-03 05:49
Core Viewpoint - Unitree's IPO timeline has been officially set, with plans to submit application documents between October and December 2025, a significant milestone in the company's growth that is drawing attention across the robotics sector [1][2][5].
Group 1: IPO Progress
- Unitree is actively preparing for its IPO. It completed a shareholding reform earlier this year and changed its name to "Hangzhou Yushutech Co., Ltd." [8].
- The company's registered capital increased from 2.889 million to 364 million RMB, a 125-fold increase, and it completed a Series C financing round at a valuation of 10 billion RMB [9].
- The company has formally started its IPO process by filing for listing guidance with the Zhejiang Securities Regulatory Bureau [11].
Group 2: Revenue Structure
- For 2024, Unitree's revenue is expected to consist of approximately 65% from quadruped robots, 30% from humanoid robots, and 5% from component sales [4].
- The quadruped robots are used mainly in research, education, and consumer settings, while the humanoid robots are used exclusively in research and education [4].
Group 3: Financial Performance
- Unitree is one of the few profitable companies in the robotics sector, having been profitable for five consecutive years since 2020 [16].
- Annual revenue has surpassed 1 billion RMB, reflecting strong market performance and investor interest [17].
Group 4: Product Development
- Unitree has launched several key products, including the quadruped robot Laikago and the humanoid robot H1, which showcases the company's technological capabilities [20][29].
- The Go1 quadruped robot has shipped more than 50,000 units, capturing 60% of the global consumer quadruped robot market [26].
Group 5: Market Position and Recognition
- Unitree has established itself as a leading player in the robotics industry, and its products have gained wide recognition, including appearances on major platforms such as CCTV [36][41].
- The company's innovative approach and successful fundraising have made it a significant contributor to the development of the Chinese robotics industry [47].
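Apple's robotics research lead has also been poached by Zuckerberg! A Zhejiang University alumnus, he takes the top robotics technology role at Meta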
量子位· 2025-09-03 03:20
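Kelexi (克雷西), reporting from Aofeisi. QbitAI | WeChat official account QbitAI
Apple has lost another four AI researchers, three of whom are of Chinese descent.
According to Gurman, the Bloomberg reporter who has long covered Apple, Jian Zhang, the lead AI researcher of Apple's robotics research group, is moving to Meta, and Meta has confirmed the news. Beyond the confirmed departure of Jian Zhang, Gurman also reported that three members of the foundation model team are about to leave.
Apple's slow progress in AI has cost team members their confidence, and they are leaving one after another. Including team lead Ruoming Pang (庞若鸣), Apple's AI organization has lost 10 members within a few weeks.
Four Apple AI researchers depart
Besides being confirmed by Meta, Jian Zhang's departure is also reflected on his own LinkedIn page, which has been updated accordingly.
At Apple, he was head of robotics research under the AIML team. This robotics research group is separate from Apple's robotics product development division, which was merged into Apple's hardware engineering division earlier this year.
The research Jian Zhang led at Apple focused on robot intelligence and human-robot interaction. It produced a number of representative open papers and prototype systems, laying down a complete technology stack for Apple's robotics work, from perception and motion to emotional expression.
Jian Zhang received his bachelor's degree from Zhejiang University and his PhD from Purdue University in the United States. After his PhD he stayed on to teach for one year, during which he chose to join Apple, where he has now been for ten years. ...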
Claude maker's valuation soars 300%! Among global unicorns, ByteDance ranks third and it ranks fourth
量子位· 2025-09-03 01:42
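Luyu (鹭羽), reporting from Aofeisi. QbitAI | WeChat official account QbitAI
Goodness, the people building large models really are going wild.
Case in point: Anthropic has officially announced the completion of a new funding round. It is a Series F raising $13 billion, with the latest valuation reaching $183 billion, a surge of nearly 300% over the previous round. This once again sets a new industry record.
Bear in mind that Anthropic was founded only four years ago. Yet with this round it has become the world's fourth most valuable startup, behind only SpaceX, OpenAI, and ByteDance.
Beyond the funding, more has been disclosed about Anthropic's ability to generate revenue on its own. In just half a year, the company's annualized revenue jumped from $1 billion to more than $5 billion. Within that, its core AI coding business, Claude Code, has passed $500 million in annual revenue.
In fact, since Anthropic opened the new round, market enthusiasm has run high, pushing both the amount raised and the valuation steadily upward until the round closed at a massive $13 billion.
But with no substantial increase in the total pool of capital, this may also be a sign of the Matthew effect among large-model companies: the companies that raise the most keep getting more, while those that cannot raise find it ever harder to do so.
$13 billion in a single round, valuation soaring 300% to $183 billion
Anthropic's Series F totals $13 billion, one of the largest funding events for an AI company to date. The company's total valu ...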