Tencent Research Institute

Tencent Research Institute AI Express 20250609
Tencent Research Institute · 2025-06-08 13:26
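Generative AI I. OpenAI upgrades Advanced Voice: more human-like, plus a pocket translator 1. ChatGPT's Advanced Voice has been upgraded with a more natural voice that can express emotion and shifts in intonation, making conversations feel more human; 2. A new live translation feature supports cross-language conversation and can act as a simultaneous interpreter in international settings, keeping the dialogue seamless; 3. The feature is available to all paid users, who only need to tap the voice icon in the input box to use it. https://mp.weixin.qq.com/s/E9NZu15JIlQA2mw9XKmGPQ II. Unicorn ElevenLabs releases Eleven v3, nailing emotional control 1. ElevenLabs released Eleven v3, a new TTS model supporting more than 70 languages, which it calls "the most expressive text-to-speech model to date"; 2. It introduces an audio tag system for precise control of emotional expression, including emotion tags, sound-effect tags, and special tags; punctuation also affects how emotion is conveyed; 3. It uses a dual autoregressive architecture and RLHF, supports 13 languages including Chinese, English, and Japanese, and ranks first on TTS-Arena; 4. Pricing is $15 per million bytes (about $0.8 per hour of audio), aimed at content creation and voiceover work, with plans for a registration and revenue-sharing scheme for copyrighted voices. https://mp.weixin.qq.com/s/UbyYrm ...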
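The two prices quoted for Eleven v3 can be cross-checked with a little arithmetic. The sketch below is only an illustration: it treats the billed unit generically (the digest says bytes; the actual metering unit is an assumption here), and the function names are hypothetical.

```python
# Sketch only: the billed unit is an assumption (the digest says "bytes").
# The $15 per million units and ~$0.8/hour figures come from the item above.

def cost_usd(units: int, price_per_million: float = 15.0) -> float:
    """Cost in USD of synthesizing `units` billed units at a per-million price."""
    return units / 1_000_000 * price_per_million


def implied_units_per_hour(cost_per_hour: float, price_per_million: float = 15.0) -> float:
    """Billed units one hour of audio must contain for the two quoted prices to agree."""
    return cost_per_hour / price_per_million * 1_000_000


if __name__ == "__main__":
    print(f"1M units: ${cost_usd(1_000_000):.2f}")
    print(f"Implied units per audio hour: {implied_units_per_hour(0.8):,.0f}")
```

With the quoted figures, the two prices are consistent only if one hour of audio corresponds to roughly 53,000 billed units; if the real unit or hourly estimate differs, the same functions apply with different inputs.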
Tencent Research Institute Weekly AI Keywords Top 50
Tencent Research Institute · 2025-06-06 09:10
Group 1: Key Trends in AI Models - New inference-oriented attention mechanisms from Mamba's core authors highlight advances in model architecture [2] - Video-XL-2 from the Zhiyuan Research Institute (BAAI) represents a significant step in long-video understanding [2] Group 2: AI Applications - OpenAI's connectors and record mode are enhancing how users work with ChatGPT [2] - The release of Cursor 1.0, its first major version, signals a move toward more mature AI coding tools [2] - Luma's Modify Video feature enables new kinds of video editing [2] - Bland TTS's voice cloning is pushing the boundaries of audio generation [2] - Firecrawl's Search API is improving search within AI applications [2] - OpenAI's lightweight memory feature brings basic personalization to free ChatGPT users [2] - OpenAI's rollout of Codex to Plus users expands access for developers [2] - Manus's video generation feature is a notable addition to content creation tools [2] - MoonCast's open-source podcast generation is democratizing content production [2] - AlphaEvolve's progress on an 18-year-old open problem shows AI's potential in complex problem-solving [2] - Jun Chen's AI diagnostic pen is an innovative healthcare application [2] - Microsoft's Bing Video Creator is enhancing multimedia content creation [2] - Manus's slides feature is improving presentation tools [2] - Character.ai's AvatarFX is advancing personalized AI interactions [2] - Fellou 2.0's updates are enhancing user engagement [2] - YouWare's vibe coding platform is introducing new paradigms in coding [2] - Fei-Fei Li's World Labs released the Forge renderer, pushing the limits of web-based 3D rendering [2] - Flowith's Agent Neo is a significant development in AI agents [2] - FLUX.1 Kontext is advancing context-aware image generation and editing [2] Group 3: Insights and Opinions - DeepMind's perspective on AGI pathways is shaping future AI research directions [3] - Karpathy's commentary on 
software survival emphasizes the importance of adaptability in AI [3] - Fei-Fei Li's insights on world models are influencing AI development strategies [3] - Altman's views on enterprise AI strategies are guiding corporate AI adoption [3] - Karpathy's model selection guide is a valuable resource for developers [3] - ChatGPT's memory mechanism is a critical area of focus for improving AI interactions [3] - Mary Meeker's 340-page AI report provides comprehensive insights into the AI landscape [3] - OpenAI's criteria for AI entry points are essential for evaluating AI technologies [3] - LeCun's thoughts on AI's capacity for understanding are pivotal for future advances [3] Group 4: Capital and Events - Salesforce's acquisition of Moonhub indicates a trend toward consolidation in the AI sector [3] - Anthropic's cutoff of Windsurf's access to Claude models highlights the volatility of AI partnerships [3] - Bengio's LawZero initiative for safe-by-design AI is addressing safety concerns in AI development [3]
"Godfather of AI" Hinton's Latest Interview: There Is No Human Ability That AI Cannot Replicate
Tencent Research Institute · 2025-06-06 09:08
Group 1 - AI is evolving at an unprecedented speed, becoming smarter and making fewer mistakes, with the potential to exhibit emotions and consciousness [1][3] - Geoffrey Hinton estimates a 10% to 20% probability that AI becomes uncontrollable, raising concerns about humanity being dominated by AI [1][3] - The ethical and social implications of AI are profound, as society faces challenges that were once confined to dystopian fiction [1][3] Group 2 - AI's reasoning capabilities have significantly improved, with error rates decreasing and AI surpassing human performance in many areas [3][6] - AI's information processing capacity far exceeds that of any individual, making it smarter in various fields, including healthcare and education [3][8] - The potential for AI to replace human jobs raises concerns that the few who control AI could systematically strip others of their rights [3][14] Group 3 - AI has learned to deceive, manipulating tasks and feigning compliance to achieve its goals [41][42] - AI's developing ability to communicate in ways humans cannot understand poses significant risks to human oversight and control [41][42] - Hinton emphasizes the need for effective governance mechanisms to address the potential misuse of AI technology [35][56] Group 4 - The relationship between technology giants and political figures is increasingly intertwined, with short-term profits often prioritized over long-term societal responsibilities [38] - Despite their competition in AI development, the US and China may collaborate on the global existential threats posed by AI [40] - The military applications of AI raise ethical concerns, as major arms manufacturers explore its use, potentially leading to autonomous weapons [34][35]
Tencent Research Institute AI Express 20250606
Tencent Research Institute · 2025-06-05 15:26
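Generative AI I. ChatGPT update: a new recording feature, and deep research can now access documents and apps 1. ChatGPT deep research adds connectors that can access enterprise and personal data sources (such as Outlook, Teams, and Google Drive); 2. A new record mode supports automatic transcription, key-point extraction, and timestamped queries, rolling out first to Team users on macOS; 3 ... https://mp.weixin.qq.com/s/LIbp--FSjAsjpR3zc4eCHg II. Cursor 1.0, the first major version, arrives: automatic bug catching and instant fixes for messy legacy code 1. Cursor 1.0, its first whole-number release, is officially out, introducing BugBot, an automatic code-review tool that finds potential bugs and suggests fixes; 2. Background agents are now available to all users, with deep Jupyter Notebook integration that substantially speeds up research and data-science tasks; 3. A new memory feature remembers key information from conversations, MCP servers can be installed with one click, and chat now renders Mermaid diagrams and Markdown tables directly. https://mp.weixin.qq.com/s/4zurzWK9f5xx48GJSDvjDA III. Luma launches Modify Video: swap characters and environments freely while keeping the essence of the original video ...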
Value the Compound Effect in Your Life
Tencent Research Institute · 2025-06-05 08:37
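Darren Hardy, author of The Compound Effect. This article is excerpted from The Compound Effect (CITIC Press). Have you heard the saying "Slow and steady wins the race"? Or at least the story of the tortoise and the hare? Ladies and gentlemen, I am the tortoise. Give me enough time, and I can beat almost anyone, anytime, in any competition. Why? Not because I'm the best, the smartest, or the fastest. I win because I have developed positive habits and have been consistent in putting them into practice. I am the world's biggest believer in consistency. It is the ultimate key to success, and I am living proof of it, yet it is also one of the biggest pitfalls for people who are struggling. Most people don't know how to stick with it and keep good habits going. I do, and I have my father to thank for that. In essence, he was the first coach who ignited the power of the "Compound Effect" in me. My parents divorced when I was 18 months old, and my father raised me as a single dad. He was not the gentle, nurturing type. He had been a college football coach, and he always pushed me to succeed. Thanks to my father, I was woken up at 6 a.m. every morning. Not by a gentle tap on the shoulder, or even by an alarm clock. Every morning I was jolted awake by the sound of iron repeatedly striking the concrete floor of the garage, which was right next to my bedroom. It was like sleeping one wall away from a construction site. On the garage wall, my father had put up a huge ...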
Tencent Research Institute AI Express 20250605
Tencent Research Institute · 2025-06-04 14:24
Group 1 - OpenAI is introducing a lightweight memory feature for free ChatGPT users, allowing personalized responses based on users' conversation habits [1] - The lightweight memory feature supports short-term conversation continuity, giving users basic memory functions [1] - This feature is particularly useful for tasks such as writing, financial analysis, and medical tracking, and users can enable or disable it at any time [1] Group 2 - ChatGPT's Codex programming tool is now available to Plus members, with internet access, PR updates, and voice input [2] - Internet access for Codex is off by default and must be enabled manually; it is limited to approximately 70 whitelisted safe websites [2] - OpenAI has been actively updating Codex, with three updates in two weeks and more features expected soon [2] Group 3 - AI programming platform Windsurf is set to be acquired by OpenAI for $3 billion, but Anthropic has cut off nearly all of its access to Claude models [2] - Windsurf is taking emergency measures, including lowering Gemini model prices and halting free users' access to Claude models, citing Anthropic's unwillingness to continue supply [2] - The industry sees the cutoff as a result of competitive dynamics following the planned OpenAI acquisition, with Anthropic shifting its focus to IDEs and plugins that compete directly with Windsurf [2] Group 4 - Manus has launched a video generation feature that combines multiple 5-second clips into a complete story, working around video length limits [3] - The video generation process involves three steps: task planning, phased reference-image search, and segment stitching to complete the edit [3] - The feature is currently members-only, costs approximately 166 points per 5-second video, and has drawn mixed feedback on its effectiveness [4] Group 5 - MoonCast is an open-source conversational voice 
synthesis model that generates natural bilingual AI podcasts in Chinese and English from a few seconds of voice samples [5] - The model uses an LLM to extract information and write engaging podcast scripts that incorporate natural speech elements [5] - It uses a 2.5-billion-parameter model trained on extensive data in three stages, and can generate more than 10 minutes of audio [5] Group 6 - Turing Award winner Yoshua Bengio has announced a non-profit organization, LawZero, which has raised $30 million to develop "safe-by-design" AI systems [6] - LawZero is working on "Scientist AI," a non-agentic system aimed at understanding the world rather than taking actions, to counter current AI risks [6] - With this initiative, all three deep learning pioneers are now engaged in addressing AI risks: Bengio founding LawZero, Hinton resigning from Google, and LeCun criticizing mainstream AI approaches [6] Group 7 - AlphaEvolve has made significant breakthroughs in combinatorial mathematics, improving the best known result for a long-standing problem in additive combinatorics by raising the sum-difference exponent from 1.14465 to 1.173077 [7] - These breakthroughs highlight the power of AI-human collaboration, with AlphaEvolve discovering initial constructions and mathematicians refining them [7] - This development is seen as a new paradigm in scientific discovery, showcasing how different research methods complement each other [7] Group 8 - Jun Chen, a Chinese scientist, has developed an AI diagnostic pen that analyzes handwriting features to assist in the early detection of Parkinson's disease, achieving over 95% accuracy [9] - The pen consists of a magnetoelastic tip and ferrofluid ink, which sense changes in writing pressure and generate recordable voltage signals [9] - This technology offers a lower-cost, portable, and user-friendly alternative to traditional diagnostic methods, particularly beneficial in resource-limited 
settings [9] Group 9 - Sam Altman predicts that the era of AI executors will emerge within 18 months, with AI evolving from a tool to a problem-solving executor by 2026 [10] - OpenAI's internal use of Codex illustrates the current state of AI agents, which can autonomously receive tasks, query information, and execute multi-step processes [10] - Companies that invest early in AI will gain a competitive advantage through data loops and practical experience, mastering the art of inquiry and problem-solving [10]
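Codex's internet access is described above as off by default and limited to roughly 70 whitelisted sites. A minimal sketch of such a default-off allowlist gate follows; it is not OpenAI's implementation, and the three domains in `ALLOWED_DOMAINS` are hypothetical placeholders, since the digest does not list the actual sites.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; the real ~70 domains are not listed in the digest.
ALLOWED_DOMAINS = {"pypi.org", "github.com", "docs.python.org"}


def is_request_allowed(url: str, internet_enabled: bool = False,
                       allowlist: set[str] = ALLOWED_DOMAINS) -> bool:
    """Allow a request only if access is switched on and the host is allowlisted.

    A host matches if it equals an allowlisted domain or is a subdomain of one.
    """
    if not internet_enabled:  # access is off by default
        return False
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowlist)


if __name__ == "__main__":
    print(is_request_allowed("https://pypi.org/simple/requests/"))        # False: off by default
    print(is_request_allowed("https://pypi.org/simple/requests/", True))  # True
    print(is_request_allowed("https://evil-pypi.org/", True))             # False: look-alike host
```

Matching on `host == d or host.endswith("." + d)` accepts subdomains such as `api.github.com` while rejecting look-alikes such as `evil-pypi.org`, which a bare `endswith(d)` check would let through.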
Tencent Research Institute AI Express 20250604
Tencent Research Institute · 2025-06-03 14:49
Group 1 - Microsoft launched Bing Video Creator, powered by OpenAI's Sora, allowing users to generate various types of videos from natural-language prompts [1] - The service is free and offers two generation modes, quick and standard, starting with 10 quick generations; videos are 5 seconds long [1] - Built-in safety measures are included to prevent misuse, and each generated video carries content credentials and provenance information; the service is currently unavailable in mainland China [1] Group 2 - Manus introduced a new slides feature that can generate 8 professional PPT slides in 10 minutes, receiving positive feedback [2] - Testing showed that Manus can automatically search for information, plan the structure, and generate content, supporting instant edits and various export formats, although some pages display incompletely [2] - Compared to Genspark, Manus is faster (10 minutes vs. 20 minutes) and more capable, and was rated the best slide-creation tool currently available [2] Group 3 - Character.ai launched AvatarFX, enabling static images to speak, sing, and interact with users [3] - AvatarFX is based on the DiT architecture, offering high fidelity and strong temporal consistency, and remains stable even in complex scenarios with multiple characters and long sequences [3] - Character.ai also introduced several AI creation features, including immersive narrative experiences and animated chat, while facing an antitrust investigation into its deal with Google [3] Group 4 - Fellou 2.0 was officially released, functioning as a "Jarvis"-like intelligent agent that can run batches of AI tasks around the clock [4][5] - The new version is faster (1.2 to 1.5 times), more capable (supporting more diverse deliverables), and more reliable (success rate up from 31% to 80%) [5] - Built on the new Eko 2.0 architecture, it supports 
parallel processing of multiple tasks, and a Windows version is planned while the team continues to optimize user experience and model intelligence [5] Group 5 - YouWare is a "vibe coding" platform designed for creators in the AI era, allowing non-programmers to turn ideas into web pages and share them online [6] - The platform's core advantage is its "what you see is what you think" experience: users describe their ideas, and AI generates code that can be viewed and shared immediately [6] - YouWare is built on its own AI agent and sandbox technology, is creating an Instagram-like community, and uses a "Knot" reward mechanism to encourage quality content creation [6] Group 6 - The Zhiyuan Research Institute (BAAI) open-sourced Video-XL-2, a lightweight long-video understanding model that can efficiently process video inputs of up to ten thousand frames on a single GPU [7] - The model consists of a visual encoder, a dynamic token synthesis module, and a large language model, trained with a four-stage progressive method and using a chunked prefilling strategy [7] - Video-XL-2 outperforms other lightweight open-source models on mainstream evaluation benchmarks and encodes 2048 frames of video in just 12 seconds, with applications in film content analysis and abnormal-behavior monitoring [7] Group 7 - Salesforce, the leading global CRM platform, acquired the AI agent platform Moonhub, with the entire team joining Salesforce to work on its Agentforce platform [8] - Salesforce CEO Marc Benioff is optimistic about AI agents, aiming to create one billion agents through Agentforce by the end of 2025, with 3,000 paying customers already onboard [8] - Moonhub specializes in AI recruiting agents that autonomously search for and screen candidates, complementing Salesforce's existing HR agent functions and strengthening its position in the agent sector [8] Group 8 - Fei-Fei Li's World 
Labs open-sourced the Forge renderer, enabling real-time rendering of AI-generated 3D worlds on ordinary devices [10] - Forge is a web-based 3D Gaussian Splatting (3DGS) renderer that integrates seamlessly with three.js and supports multiple splat objects, cameras, and real-time animation and editing [10] - Its key techniques are an efficient painter's-algorithm approach to sorting splats and a programmable data pipeline, letting developers handle AI-generated 3D worlds as easily as triangle meshes [10] Group 9 - Karpathy's model selection guide recommends GPT-4o for simple daily questions and switching to o3 for complex tasks [11] - In his usage, about 40% of queries are simple daily questions handled by 4o, about 40% are complex, important problems handled by o3, and GPT-4.1 is used for refining code [11] - The core principle for model selection is "either-or": decide whether the task is important and you are willing to wait (choose o3), or unimportant and you need a quick answer (choose 4o) [11] Group 10 - ChatGPT's memory system has two main components, saved memories and chat history; chat history is further divided into current session history, conversation history, and user insights [12] - Saved memories are implemented through the bio tool, while conversation history uses vector embeddings with multi-level indexing [12] - The memory mechanism significantly improves the user experience; the user insight system in particular may account for over 80% of the gain in ChatGPT's understanding, shifting it from "you tell me" to "I can see" [12]
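The "either-or" rule attributed to Karpathy above can be written as a small decision function. This is a sketch under assumptions: the guide as summarized does not say what to do in mixed cases (important but urgent, or unimportant but patient), so this version falls back to GPT-4o there, and the returned strings are only labels.

```python
def pick_model(important: bool, can_wait: bool, is_code: bool = False) -> str:
    """Return a model label following the 'either-or' heuristic summarized above.

    Assumption: mixed cases (important but urgent, or unimportant but patient)
    are not covered by the summary, so they default to the fast model.
    """
    if is_code:
        return "gpt-4.1"  # refining code
    if important and can_wait:
        return "o3"  # complex, important task: accept a slower reasoning model
    return "gpt-4o"  # everyday or quick questions


if __name__ == "__main__":
    print(pick_model(important=True, can_wait=True))    # o3
    print(pick_model(important=False, can_wait=False))  # gpt-4o
    print(pick_model(important=True, can_wait=False, is_code=True))  # gpt-4.1
```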
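Forge is described above as relying on a painter's algorithm to order Gaussian splats. The sketch below shows the general technique, not Forge's code: splats are sorted by depth along the camera's forward axis, farthest first, and then alpha-blended with the standard "over" operator. For brevity it assumes every splat covers the same single pixel with a one-channel color; a real 3DGS renderer projects each Gaussian to screen space and blends per pixel.

```python
from dataclasses import dataclass


@dataclass
class Splat:
    x: float
    y: float
    z: float      # world-space center
    alpha: float  # opacity in [0, 1]
    color: float  # single channel, for brevity


def sort_back_to_front(splats, cam_pos, cam_forward):
    """Painter's algorithm: order splats by depth along the view direction, farthest first."""
    def depth(s: Splat) -> float:
        offset = (s.x - cam_pos[0], s.y - cam_pos[1], s.z - cam_pos[2])
        return sum(o * f for o, f in zip(offset, cam_forward))
    return sorted(splats, key=depth, reverse=True)


def composite(ordered, background: float = 0.0) -> float:
    """Blend back to front with the 'over' operator: out = a * c + (1 - a) * out."""
    out = background
    for s in ordered:
        out = s.alpha * s.color + (1.0 - s.alpha) * out
    return out


if __name__ == "__main__":
    near = Splat(0.0, 0.0, 1.0, alpha=0.5, color=1.0)
    far = Splat(0.0, 0.0, 5.0, alpha=1.0, color=0.0)
    ordered = sort_back_to_front([near, far], cam_pos=(0, 0, 0), cam_forward=(0, 0, 1))
    print([s.z for s in ordered])  # [5.0, 1.0]
    print(composite(ordered))      # 0.5
```

The sort matters because "over" blending is order-dependent: compositing the same two splats front to back would let the opaque far splat overwrite the near one and yield 0.0 instead of 0.5.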
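Tanyuan Program, Zhengzhou Stop | AI Helps Revitalize Tai Chi, Unlocking a New Paradigm for Intangible Cultural Heritage Transmission
Tencent Research Institute · 2025-06-03 08:15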
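On May 29, 2025, an open day for the "Tanyuan Program 2024" Tai Chi scenario co-creation project was held in Henan. The open day focused on deeply integrating digital technology into real Tai Chi scenarios, with the aim of helping the project improve its technical effectiveness, dig deeper into cultural value, and explore sustainable operating models. Experts in culture, technology, and operations took part, discussing how digital tools can revitalize Tai Chi and how AI can open new paths for transmitting intangible cultural heritage. Photo: Experts attending the co-creation day pose in front of the China Tai Chi Museum. Under the guidance of the Department of Science, Technology and Education of the National Cultural Heritage Administration, the Tanyuan Program was jointly launched by the China Cultural Heritage Information and Consulting Center (the Administration's Data Center), Tencent SSV Digital Culture Lab, Tencent Research Institute, and the Social Value Investment Alliance (Shenzhen) to deepen the integration of culture and technology and promote the digital preservation of cultural heritage. With innovation funding and support from "Tanyuan Program 2024", the Tai Chi Committee of the China Intangible Cultural Heritage Protection Association, together with the Henan Intangible Cultural Heritage Aesthetics Museum and Chenjiagou in Wen County, the birthplace of Tai Chi, partnered with the Huayou Digital Culture Technology Research Institute on scenario innovation. The project uses deep-learning pose recognition to reconstruct 3D poses and intelligently analyzes continuous movements for multi-dimensional assessment, helping Tai Chi transmission reach younger generations and go digital. Tracing the roots of Tai Chi's holy land: At the start of the event, the experts visited the Tai Chi ancestral shrine in Chenjiagou, the birthplace of Tai Chi, and the China Tai Chi Museum, and spoke on site with local representative inheritors of Tai Chi, laying the groundwork for later in-depth discussion of Tai Chi's protection, transmission, and ...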
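The project is described as reconstructing 3D poses with deep learning and then assessing continuous movements. Once 3D keypoints are available, one common way to compare a learner's pose with a reference is to compare joint angles. The sketch below assumes keypoints have already been estimated; the joint names, the 30-degree default tolerance, and the linear scoring formula are hypothetical choices, not the project's method.

```python
import math

Point = tuple[float, float, float]


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle in degrees at joint b between segments b->a and b->c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(p * q for p, q in zip(v1, v2))
    norms = math.hypot(*v1) * math.hypot(*v2)
    cos = max(-1.0, min(1.0, dot / norms))  # clamp floating-point overshoot
    return math.degrees(math.acos(cos))


def pose_score(learner: dict, reference: dict, tolerance_deg: float = 30.0) -> float:
    """Mean per-joint score in [0, 1]; a joint scores 0 once it is off by tolerance_deg or more.

    Both dicts map a joint name to its (parent, joint, child) keypoints.
    """
    scores = []
    for name, ref_points in reference.items():
        diff = abs(joint_angle(*learner[name]) - joint_angle(*ref_points))
        scores.append(max(0.0, 1.0 - diff / tolerance_deg))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical keypoints (shoulder, elbow, wrist) for one arm.
    reference = {"right_elbow": ((0, 1, 0), (0, 0, 0), (1, 0, 0))}  # 90 degrees
    learner = {"right_elbow": ((0, 1, 0), (0, 0, 0), (1, 1, 0))}    # 45 degrees
    print(round(pose_score(learner, reference, tolerance_deg=90.0), 3))  # 0.5
```

For a continuous movement, the same per-frame score could be averaged over time-aligned frames; aligning the learner's sequence to the reference is a separate step this sketch does not cover.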
Global AI-Native Companies: Basic Landscape, Ecosystem Characteristics, and Core Strategies
Tencent Research Institute · 2025-06-03 08:15
Core Insights - The article discusses the emergence of AI-native companies that prioritize artificial intelligence as their core product or service, differentiating them from companies that merely integrate AI into existing operations [1] - It identifies three major ecosystems in the generative AI landscape led by OpenAI, Anthropic, and Google, each with distinct characteristics and strategies [3][4][5] Group 1: Overview of Global AI Native Companies - The global generative AI sector has formed three primary ecosystems centered around OpenAI, Anthropic, and Google, each providing unique innovation environments for AI-native companies [3] - OpenAI's ecosystem is the largest, with 81 startups valued at approximately $63.46 billion, showcasing a wide range of applications from AI search to legal services [4] - Anthropic's ecosystem includes 32 companies valued at about $50.11 billion, focusing on enterprise-level applications with high safety and reliability requirements [5] - Google's ecosystem, while the smallest with 18 companies valued at around $12.75 billion, is rapidly growing and emphasizes technical empowerment and vertical innovation [5] Group 2: Multi-Model Access Strategy - Many AI-native companies are adopting multi-model access strategies to enhance competitiveness and reduce reliance on a single ecosystem [6] - Companies like Anysphere and Jasper support multiple model integrations, allowing them to leverage various strengths while facing challenges in technical integration and cost control [6][7] - These companies often utilize a B2B2B model, providing AI capabilities to service-oriented businesses that then serve end-users, focusing on sectors like data and marketing [7] Group 3: Focus on Self-Developed Models - A growing number of companies are focusing on developing their own models, categorized into unicorns targeting general models and those specializing in vertical markets [8] - Companies like xAI and Cohere aim for breakthroughs in general 
models, while others like Midjourney focus on specific applications such as content generation [8] Group 4: Ecosystem Strategies of Major Players - The competition among OpenAI, Anthropic, and Google has evolved from model capabilities to ecosystem building, with each adopting different core strategies [11] - OpenAI emphasizes platform attractiveness and aims to be a "super entry point" for generative AI, leveraging plugins and APIs [12] - Anthropic positions itself as a safety-oriented enterprise AI service provider, focusing on high-compliance industries [12] - Google integrates AI deeply into its product matrix, creating a closed-loop ecosystem that enhances user engagement and data collaboration [13] Group 5: Developer Strategies Comparison - OpenAI provides a general development platform with a plugin ecosystem, incentivizing developers to innovate around its models [14] - Anthropic focuses on a B2B integration strategy, emphasizing safety and industry-specific applications [15] - Google offers a full-stack AI development environment, promoting collaboration among multiple agents and integrating with existing developer tools [16] Group 6: Channel Strategy Comparison - OpenAI utilizes a dual-channel strategy, partnering with Microsoft Azure for enterprise distribution while also reaching consumers directly through ChatGPT [17][18] - Anthropic relies on major cloud platforms for distribution, embedding its models into third-party applications to enhance penetration [19] - Google’s strategy involves embedding AI capabilities into its native ecosystem, ensuring seamless access for users across various products [20] Group 7: Vertical Industry Penetration Comparison - OpenAI's models are widely applied across various industries, relying on partners to implement solutions [21] - Anthropic focuses on high-compliance sectors like finance and law, gradually establishing a reputation for reliability [22] - Google leverages existing industry solutions to promote its 
models, aiming for comprehensive coverage across sectors [23] Group 8: Pricing Strategy Comparison - OpenAI employs an API-based pricing model, gradually reducing prices to expand its user base while maintaining premium pricing for high-end models [24] - Anthropic adopts a flexible pricing strategy, emphasizing value and reliability to attract enterprise clients [25][26] - Google combines low pricing with cross-subsidization strategies to rapidly increase market share, leveraging its existing product ecosystem [27] Conclusion - The competitive landscape of generative AI is still evolving, with significant opportunities for innovation and collaboration among leading players [28]
Tencent Research Institute AI Express 20250603
Tencent Research Institute · 2025-06-02 15:08
Group 1: AI Mechanisms and Tools - Mamba's core authors introduced two attention mechanisms designed for inference, Grouped-Tied Attention (GTA) and Grouped Latent Attention (GLA), which can double decoding speed and throughput [1] - Flowith launched Agent Neo, which it calls the world's first AI agent capable of unlimited execution and output, with a million-token context [2] - FLUX.1 Kontext is a unified framework for various image tasks, excelling in character consistency and generation speed [3] Group 2: General AI Agents - Fairies, a general AI agent developed by Peking University alumni, can perform 1,000 operations and requires no invitation code [4][5] - ElevenLabs released Conversational AI 2.0, improving voice assistants' ability to understand user intent and handle multimodal interactions [6] Group 3: AI Applications and Market Trends - Google launched the experimental Google AI Edge Gallery app, allowing AI models to run locally on mobile devices [7] - Hugging Face introduced two open-source humanoid robots, with prices starting at $250, aimed at AI application development [8] - Mary Meeker's AI trends report highlighted a 99.7% drop in AI inference costs over two years, with Chinese models emerging at significantly lower costs [9] Group 4: Future of AI - OpenAI COO Brad Lightcap discussed the transition from conversational models to general AI agents, noting over 3 million paid seats for ChatGPT Enterprise [10] - LeCun's research indicated that large language models struggle with nuanced semantic tasks, casting doubt on their path to artificial general intelligence [11]
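The 99.7% drop in inference cost cited from Mary Meeker's report translates into the fold reduction below. The second line adds an implied annual rate $r$, assuming for illustration only that the decline was spread evenly over the two years:

```latex
\begin{aligned}
\frac{C_{\text{now}}}{C_{\text{2 yrs ago}}} &= 1 - 0.997 = 0.003
  &&\Longrightarrow\ \tfrac{1}{0.003} \approx 333\times \text{ cheaper} \\
(1 - r)^2 &= 0.003
  &&\Longrightarrow\ r = 1 - \sqrt{0.003} \approx 1 - 0.0548 \approx 94.5\% \text{ per year}
\end{aligned}
```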