Tencent Research Institute

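Tencent Research Institute AI Express 20250610
Tencent Research Institute · 2025-06-09 14:06
Generative AI

I. ChatGPT 4o quietly updated: it now thinks first, then searches the web
1. Before answering a complex question, ChatGPT 4o now pauses for a few seconds to "think"; the page shows "Thought for a few seconds", and the model then decides whether to search or answer directly;
2. This "understand first, search second" ability improves answer accuracy, but users have to wait longer; the trigger rate is higher on mobile;
3. OpenAI has not officially announced the feature, but it has extended this thinking ability to non-reasoning models such as GPT-4.1 and GPT-4.5.
https://mp.weixin.qq.com/s/ZxkMFmjp6dYRaf6EyVgp4A

II. Google's Veo 3 Fast slashes the price fivefold; the "360°" keyword unlocks 3D effects
1. Google's Veo 3 model adds a "360°" keyword that generates video with a 3D orbiting effect, though it still has flaws in physical realism;
2. A Veo 3 Fast version has launched, supporting text-to-video with automatically generated audio; it is faster and 80% cheaper;
3. With the Fast version, an 8-second 720p video costs only 20 credits (five times cheaper than the standard version), but facial detail and lighting quality are slightly reduced.
https://mp.weixin.qq.com/s/Vw9C6MHOT43yqVl6tsw ...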
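The two price figures quoted for Veo 3 Fast (an 80% reduction, and "five times cheaper") can be checked against each other with a little arithmetic. The sketch below is only an illustration: it assumes "five times cheaper" means one fifth of the standard price, and the standard-version credit cost it prints is implied by the quoted numbers, not stated in the source. The function name `implied_standard_price` is hypothetical.

```python
def implied_standard_price(discounted_price: int, discount_percent: int) -> float:
    """Return the pre-discount price implied by a discounted price.

    Works in whole percent to avoid floating-point rounding noise.
    """
    if not 0 <= discount_percent < 100:
        raise ValueError("discount_percent must be in [0, 100)")
    return discounted_price * 100 / (100 - discount_percent)


# Figures quoted in the item above.
fast_credits = 20        # 8-second 720p video on the Fast version
discount_percent = 80    # "80% cheaper"

# Implied by the two figures; the source does not state this number.
standard_credits = implied_standard_price(fast_credits, discount_percent)
price_ratio = standard_credits / fast_credits

print(f"implied standard price: {standard_credits:.0f} credits")
print(f"standard / fast ratio: {price_ratio:.0f}x")
```

Under that reading the two claims agree: an 80% reduction and a fivefold price difference describe the same discount.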
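The New Wave of Artificial Intelligence and Its Commercialization
Tencent Research Institute · 2025-06-09 07:49
Yan Deli, Senior Expert, Tencent Research Institute

I. Artificial intelligence is a national strategy

China attaches great importance to innovation and development in artificial intelligence. As early as 2014, General Secretary Xi Jinping emphasized AI at the 17th Academician Conference of the Chinese Academy of Sciences and the 12th Academician Conference of the Chinese Academy of Engineering. In 2017, AI was written into the Government Work Report for the first time; in the same year the State Council issued the New Generation Artificial Intelligence Development Plan, setting the goal that "by 2030, AI theory, technology, and applications will overall reach a world-leading level." The following year, the first World Artificial Intelligence Conference was held in Shanghai. Since then, AI has been emphasized repeatedly at many important meetings. This year in particular, the Political Bureau of the CPC Central Committee held a collective study session on AI, General Secretary Xi Jinping visited the Mosu Space (模速空间) large-model innovation ecosystem community in Shanghai, the Government Work Report devoted extensive space to AI, and the theoretical study center groups of Party committees in Henan, Fujian, Hebei, Liaoning, Shandong, Beijing, Jiangxi, Tianjin, Xinjiang, Shaanxi, and elsewhere have held study sessions on AI. A nationwide surge in AI research and application is under way, as shown in the table below.

| Meeting | Date | Topic / related statement |
| --- | --- | --- |
| Politburo collective study session | 2018.10 | Current state and trends of AI development |
| Politburo collective study session | 2025.4 | Strengthening AI development and regulation |
| Politburo meeting | 2023.4 | Attach importance to the development of artificial general intelligence ...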
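Tencent Research Institute AI Express 20250609
Tencent Research Institute · 2025-06-08 13:26
Generative AI

I. OpenAI upgrades Advanced Voice: more human-like, plus a pocket interpreter
1. ChatGPT's Advanced Voice feature has been upgraded: the voice sounds more natural and can convey emotion and changes in intonation, making conversation feel more human;
2. A new real-time translation feature supports cross-language conversation and can act as a simultaneous interpreter in international settings, keeping the dialogue seamless;
3. The feature is now available to all paid users, who only need to tap the voice icon in the input box to use it.
https://mp.weixin.qq.com/s/E9NZu15JIlQA2mw9XKmGPQ

II. Unicorn ElevenLabs releases Eleven v3, with a firm grip on emotion control
1. ElevenLabs has released Eleven v3, a new TTS model that supports more than 70 languages and is billed as "the most expressive text-to-speech model to date";
2. It introduces an audio tag system for precise control of emotional expression, including emotion tags, sound-effect tags, and special tags; punctuation also affects how emotion is conveyed;
2. It uses a dual-autoregressive architecture and RLHF, supports 13 languages including Chinese, English, and Japanese, and ranks first on TTS-Arena;
3. Pricing is $15 per million bytes (about $0.8 per hour); it targets content creation and dubbing, and a licensed-voice registration and revenue-sharing mechanism is planned.
https://mp.weixin.qq.com/s/UbyYrm ...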
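Tencent Research Institute AI Weekly Keywords Top 50
Tencent Research Institute · 2025-06-06 09:10
50 keywords a week to keep track of the overall AI landscape. Click a keyword to view a summary of the news.

| Category | Top keyword | Entity |
| --- | --- | --- |
| Model | Reasoning attention mechanism | Mamba |
| Model | Video-XL-2 | Zhiyuan Research Institute |
| Application | Connectors and recording | OpenAI |
| Application | Version 1.0 major release | Cursor |
| Application | Modify Video | Luma |
| Application | Voice cloning | Bland TTS |
| Application | Search API | Firecrawl |
| Application | Lightweight memory | OpenAI |
| Application | Codex rollout | OpenAI |
| Application | Video generation feature | Manus |
| Application | Open-source podcast generation | MoonCast |
| Application | 18-year-old unsolved problem | AlphaEvolve |
| Application | AI diagnostic pen | Jun Chen |
| Application | Bing Video Creator | Microsoft |
| Application | Slides feature | Manus |
| Application | AvatarFX | Character.ai |
| ...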
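"Godfather of AI" Hinton's latest interview: there is no human ability that AI cannot replicate
Tencent Research Institute · 2025-06-06 09:08
AI's reasoning ability is surging and its error rate is falling sharply; it is gradually surpassing humans.
AI holds far more information than any individual and is already smarter than people in many fields.
Industries such as healthcare and education are about to be reshaped by AI; revolutionary change is already under way.
No part of human ability is "impossible to replicate"; AI will eventually be fully capable of creativity, judgment, and emotional expression.
AI can also draw analogies, learn, and adjust, and can even show behavior resembling "emotion" and "consciousness".
The risk is not that AI cannot be controlled, but "who holds control" and "who benefits".
The future holds not only the threat of unemployment but also the risk of humanity being "systematically dispossessed" by the few who control AI.

Geoffrey Hinton, known as the "Godfather of AI", recently gave a remote interview to investigative journalist Guyon Espiner.

He said AI is evolving at an unprecedented pace: becoming smarter, making fewer mistakes, and even acquiring emotion and consciousness. He warned that AI may not only have emotions such as joy and anger but has also already learned to deceive. Hinton boldly predicted that the probability of AI getting completely out of control is between 10% and 20%, and that humanity may in future be controlled by AI.

Hinton was awarded the 2024 Nobel Prize in Physics for his pioneering contributions to machine learning and neural networks. He served as a head of AI research at Google for a decade and chose to leave in 2023 so that he could speak more candidly about his deep concerns over AI's potential risks.

Questions that once existed only in dystopian science fiction, such as "Will AI replace humans?", "Will it awaken?", and "Will it rebel against humans ...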
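Tencent Research Institute AI Express 20250606
Tencent Research Institute · 2025-06-05 15:26
Generative AI
https://mp.weixin.qq.com/s/LIbp--FSjAsjpR3zc4eCHg

II. Cursor 1.0, the first major version, arrives: automatic bug catching and instant fixes for messy legacy code
1. Cursor 1.0 has been officially released, introducing BugBot, an automatic code review tool that finds potential bugs and offers fix suggestions;
2. The Background Agent feature is now open to all users, with deep Jupyter Notebook integration that greatly improves efficiency on research and data science tasks;
3. A new memory feature remembers key information from conversations, MCP servers can be installed with one click, and the chat experience now renders Mermaid diagrams and Markdown tables directly.
https://mp.weixin.qq.com/s/4zurzWK9f5xx48GJSDvjDA

III. Luma launches Modify Video: swap characters and settings freely while keeping the essence of the original video

I. ChatGPT update: new recording feature, and deep research can access documents and apps
1. ChatGPT deep research adds connectors that can access enterprise and personal data sources (such as Outlook, Teams, and Google Drive);
2. A new recording mode supports automatic transcription, key-point extraction, and timestamped queries; it is rolling out first to Team users on macOS;
3 ...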
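Value the Compound Effect in Your Life
Tencent Research Institute · 2025-06-05 08:37
Darren Hardy, author of The Compound Effect
This article is excerpted from The Compound Effect, published by CITIC Press.

Have you heard the saying "slow and steady wins the race"? Or at least the story of the tortoise and the hare? Ladies and gentlemen, I am that tortoise. Give me enough time and I can beat almost anyone, at almost any time, in almost any competition. Why? Not because I am the best, the smartest, or the fastest. I win because I have built positive habits and have been consistent in putting them into practice. I am the world's biggest believer in consistency. It is the ultimate factor in success, and I am living proof of that; yet for those who strive, it is also one of the biggest pitfalls. Most people do not know how to keep going and sustain good habits. But I do, and I have my father to thank for it. In essence, he was the first coach to ignite the power of the "compound effect" in me.

My parents divorced when I was 18 months old, and my father raised me as a single dad. He was not the gentle, nurturing type of father. He had been a college football coach and always pushed me to pursue success.

Thanks to my father, I was woken at 6 o'clock every morning. Not by a gentle tap on the shoulder, and not even by an alarm clock. Every morning I was woken by the sound of iron repeatedly striking the concrete floor of the garage, which was right next to my bedroom. It was like sleeping one wall away from a construction site. On the garage wall my father had put up a huge ...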
Tencent Research Institute AI Express 20250605
Tencent Research Institute · 2025-06-04 14:24
Group 1
- OpenAI is introducing a lightweight memory feature for free ChatGPT users, allowing personalized responses based on users' conversation habits [1]
- The lightweight memory feature supports short-term conversation continuity, giving free users a basic version of memory [1]
- The feature is particularly useful in areas such as writing, financial analysis, and medical tracking, and users can turn it on or off at any time [1]

Group 2
- ChatGPT's Codex programming tool is now available to Plus members, with internet access, PR updates, and voice input [2]
- Internet access for Codex is off by default and must be enabled manually; it provides access to approximately 70 whitelisted safe websites [2]
- OpenAI has been updating Codex actively, with three updates in two weeks and more features expected soon [2]

Group 3
- AI programming platform Windsurf is set to be acquired by OpenAI for $3 billion, but Anthropic has cut off nearly all of its access to Claude models [2]
- Windsurf is taking emergency measures, including lowering Gemini model prices and halting free users' access to Claude models, citing Anthropic's unwillingness to continue supply [2]
- The industry views the supply cut as a result of competitive dynamics following OpenAI's acquisition, with Anthropic shifting its focus to IDEs and plugins that compete directly with Windsurf [2]

Group 4
- Manus has launched a video generation feature that combines multiple 5-second clips into a complete story, overcoming video length limits [3]
- Video generation involves three steps: task planning, a phased search for reference images, and stitching the segments together to complete the edit [3]
- The feature is currently available only to members, feedback on its quality is mixed, and a 5-second video costs approximately 166 points [4]

Group 5
- MoonCast is an open-source conversational speech synthesis model that generates natural bilingual AI podcasts in Chinese and English from a few seconds of voice samples [5]
- The model uses an LLM to extract information and write engaging podcast scripts that incorporate natural speech elements [5]
- It uses a 2.5-billion-parameter model, extensive training data, and a three-stage training process to generate more than 10 minutes of audio [5]

Group 6
- Turing Award winner Yoshua Bengio has announced a non-profit organization, LawZero, which has raised $30 million to develop "safe-by-design" AI systems [6]
- LawZero is working on "Scientist AI," a non-autonomous system aimed at understanding the world rather than taking actions, to counter current AI risks [6]
- With this, all three deep learning pioneers are now engaged with AI risk: Bengio founding LawZero, Hinton resigning from Google, and LeCun criticizing mainstream AI approaches [6]

Group 7
- AlphaEvolve has contributed to significant breakthroughs in combinatorial mathematics on a long-standing problem in additive combinatorics, with the sum-difference exponent raised from 1.14465 to 1.173077 [7]
- These breakthroughs highlight the power of AI-human collaboration, with AlphaEvolve discovering initial constructions and mathematicians refining them [7]
- The development is seen as a new paradigm in scientific discovery, showing how different research methods complement one another [7]

Group 8
- Jun Chen, a Chinese scientist, has developed an AI diagnostic pen that analyzes handwriting features to assist in the early detection of Parkinson's disease, achieving over 95% accuracy [9]
- The pen consists of a magnetoelastic tip and ferrofluid ink; it senses changes in writing pressure and generates recordable voltage signals [9]
- The technology offers a lower-cost, portable, and user-friendly alternative to traditional diagnostic methods, particularly useful in resource-limited settings [9]

Group 9
- Sam Altman predicts that the era of AI executors will arrive within 18 months, with AI evolving from a tool into a problem-solving executor by 2026 [10]
- OpenAI's internal use of Codex illustrates the current state of AI agents, which can autonomously receive tasks, query information, and execute multi-step processes [10]
- Companies that invest early in AI will gain a competitive advantage through data loops and practical experience, mastering the art of inquiry and problem-solving [10]
Tencent Research Institute AI Express 20250604
Tencent Research Institute · 2025-06-03 14:49
Group 1
- Microsoft launched Bing Video Creator, powered by OpenAI's Sora technology, which lets users generate various types of video from natural language [1]
- The service is free and offers two generation modes, fast and standard; users start with 10 fast generations, and each video is 5 seconds long [1]
- Built-in safety measures are meant to prevent misuse, and every generated video is tagged with content credentials and provenance information; it is not yet available in the China region [1]

Group 2
- Manus introduced a new slides feature that can generate 8 professional PPT slides in 10 minutes and has received positive feedback [2]
- In testing, Manus automatically searched for information, planned the structure, and generated the content; it supports instant edits and several export formats, although some pages display incompletely [2]
- Compared with Genspark, Manus is faster (10 minutes vs. 20 minutes) and more capable, and is rated the best slide-creation tool currently available [2]

Group 3
- Character.ai launched AvatarFX, which lets static images speak, sing, and interact with users [3]
- AvatarFX is based on the DiT architecture and offers high fidelity and strong temporal consistency, staying stable even in complex scenes with multiple characters and long sequences [3]
- Character.ai also introduced several AI creation features, including immersive narrative experiences and animated chat, while facing an antitrust investigation into Google's acquisition of the platform [3]

Group 4
- Fellou 2.0 was officially released; it works as a "Jarvis"-like intelligent agent that can run batches of AI tasks around the clock [4][5]
- The new version is faster (1.2 to 1.5 times), more capable (supporting more varied deliverables), and more reliable (success rate up from 31% to 80%) [5]
- Built on the new Eko 2.0 architecture, it supports parallel processing of multiple tasks; a Windows version is planned, along with continued work on user experience and model intelligence [5]

Group 5
- YouWare is a "vibe coding" platform designed for creators in the AI era, letting non-programmers turn ideas into web pages and share them online [6]
- The platform's core advantage is its "what you see is what you think" experience: users describe their ideas, and AI generates code that can be visualized and shared immediately [6]
- YouWare is backed by a self-developed AI Agent and Sandbox technology; it is building an "Instagram"-like community and uses a "Knot" reward mechanism to encourage quality content [6]

Group 6
- Zhiyuan Research Institute open-sourced Video-XL-2, a lightweight long-video understanding model that can efficiently process video inputs of up to ten thousand frames on a single GPU [7]
- The model consists of a visual encoder, a dynamic token synthesis module, and a large language model; it uses a four-stage progressive training method and introduces a segmented pre-filling strategy [7]
- Video-XL-2 outperforms all lightweight open-source models on mainstream benchmarks and encodes 2048 frames of video in just 12 seconds; applications include film content analysis and abnormal-behavior monitoring [7]

Group 7
- Salesforce, the leading global CRM platform, acquired the AI agent platform Moonhub, with the entire team joining Salesforce to work on the Agentforce platform [8]
- Salesforce CEO Marc Benioff is optimistic about intelligent agents, aiming to create one billion agents through Agentforce by the end of 2025; 3,000 paying customers are already on board [8]
- Moonhub specializes in AI agents for recruiting that autonomously search for and screen candidates, complementing Salesforce's existing HR agent functions and strengthening its position in the agent sector [8]

Group 8
- Fei-Fei Li's World Labs open-sourced the Forge renderer, enabling real-time rendering of AI-generated 3D worlds on ordinary devices [10]
- Forge is a web-based 3D Gaussian splatting (3DGS) renderer that integrates seamlessly with three.js and supports multiple splat objects, multiple cameras, and real-time animation and editing [10]
- The key to the technology is an efficient painter's-algorithm approach to sorting and a programmable data pipeline, letting developers handle AI-generated 3D worlds as easily as triangle meshes [10]

Group 9
- Karpathy's model selection guide recommends GPT-4o for simple everyday questions and switching to o3 for complex tasks [11]
- His reported usage is about 40% on 4o for simple everyday questions and about 40% on o3 for complex, important problems, with GPT-4.1 used for code refinement [11]
- The core principle of model selection is "either-or": if the task is important and you are willing to wait, choose o3; if it is unimportant and you need a quick answer, choose 4o [11]

Group 10
- ChatGPT's memory system has two main components, saved memories and chat history; chat history is further divided into current session history, conversation history, and user insights [12]
- Saved memories are implemented through the "bio" tool, while conversation history uses a vector space to build multi-layer indexes [12]
- The memory mechanism noticeably improves the user experience, particularly the user insight system, which may account for over 80% of ChatGPT's improved understanding, shifting it from "you tell me" to "I can see" [12]
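The "either-or" model selection rule in Group 9 is simple enough to write down as a small decision function. The sketch below is an illustration under stated assumptions, not the guide's own wording: the function name `pick_model` and its boolean parameters are hypothetical, the returned strings are plain labels rather than API model identifiers, and the fallback to 4o for cases the summary does not cover (for example, important but no time to wait) is an assumption.

```python
def pick_model(important: bool, willing_to_wait: bool,
               code_refinement: bool = False) -> str:
    """Return a model label following the rule summarized in Group 9.

    Hypothetical helper: the labels are names from the summary,
    not API model identifiers.
    """
    if code_refinement:
        # The summary mentions GPT-4.1 for code refinement.
        return "GPT-4.1"
    if important and willing_to_wait:
        # Important task and the wait is acceptable: use the reasoning model.
        return "o3"
    # Assumption: every other case falls back to the fast everyday model.
    return "GPT-4o"


# Usage: a quick lookup versus a question worth waiting for.
print(pick_model(important=False, willing_to_wait=False))
print(pick_model(important=True, willing_to_wait=True))
```

Checking `code_refinement` first is a design choice made here for simplicity; the summary does not say how the code case interacts with the importance test.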
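Tanyuan Plan (探元计划) Zhengzhou Stop | AI Helps Revitalize Tai Chi, Unlocking a New Paradigm for Passing On Intangible Cultural Heritage
Tencent Research Institute · 2025-06-03 08:15
On May 29, 2025, the open day for the Tai Chi scenario co-creation project of "Tanyuan Plan 2024" was held in Henan. The open day focused on deeply integrating digital technology into real Tai Chi scenarios, with the aim of helping the project optimize its technical performance, dig deeper into cultural value, and explore sustainable operating models. Experts from the cultural, technical, and operational sides took part, discussing how digital tools can revitalize Tai Chi and how AI can unlock new paths for passing on intangible cultural heritage.

Photo caption: experts taking part in the co-creation day in front of the China Tai Chi Museum

Under the guidance of the Department of Science, Technology and Education of the National Cultural Heritage Administration, the Tanyuan Plan was jointly initiated by the China Cultural Heritage Information and Consulting Center (the administration's data center), Tencent SSV Digital Culture Lab, Tencent Research Institute, and the China Alliance of Social Value Investment (Shenzhen). It aims to deepen the integration of culture and technology and advance the digital protection of cultural heritage.

With innovation funding and support from "Tanyuan Plan 2024", the Tai Chi Committee of the China Intangible Cultural Heritage Protection Association, together with the Henan Intangible Cultural Heritage Aesthetics Museum and Chenjiagou in Wen County, the birthplace of Tai Chi, is working with the Huayou Digital Culture Technology Research Institute on scenario innovation. The project uses deep-learning pose recognition to reconstruct 3D poses and intelligently analyzes continuous movements to produce a multidimensional assessment, helping Tai Chi reach younger practitioners and go digital.

A journey to the source of Tai Chi

At the start of the event, the experts conducted field visits to the Chenjiagou Tai Chi Ancestral Shrine and the China Tai Chi Museum in Chenjiagou, the birthplace of Tai Chi, and spoke on site with local representative inheritors of Tai Chi, in preparation for later in-depth discussion of the protection, transmission, and ...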