Tencent Research Institute (腾讯研究院)
Tencent Research Institute Is Hiring a Digital Content Research Intern
Tencent Research Institute · 2025-05-21 07:51
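Position: Tencent Research Institute, Digital Content Research Intern

Position description
1. Research area: digital content (games and esports research)
2. Location: Asia Financial Center (亚洲金融大厦), Chaoyang District, Beijing
3. Pay: RMB 150 per day after tax
4. Schedule: five days a week on site for at least six months; candidates who can start immediately are preferred

Responsibilities
1. Provide research support on industry development, cultural integration, and technological innovation in games and esports.
2. Use a range of AI tools to carry out information retrieval, data analysis, case studies, and article writing.
3. Handle other tasks as assigned.

Requirements
1. Current master's or doctoral student at a key university majoring in publishing, economics and management, statistics, media, or a related field; applicants who follow frontier developments in games and related industries and have relevant research output may come from any major.
2. Familiar with trends and technological innovation in the games and digital content industries, with experience in internet industry research and independent views on major industry events.
3. Strong writing, data analysis, and industry research skills; enjoys research and either intends to pursue research work or wants to build research skills.
4. Strong sense of responsibility and commitment; candidates available for six months or more are preferred.

To apply, name the email subject and the attachment in the form [Name-University-Year-Major-x days per week], send your résumé to xuyuanhu@tencent.com, and attach research papers or other work.
...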
Tencent's Dowson Tong: Every Company Will Become an AI Company, and Everyone Will Be a "Super Individual"
Tencent Research Institute · 2025-05-21 07:51
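Dowson Tong (汤道生), Senior Executive Vice President of Tencent and CEO of the Cloud and Smart Industries Group

"As AI adoption continues, every company is becoming an AI company, and every person will become an AI-empowered 'super individual.'" On May 21, the Tencent Cloud AI Industry Application Summit was held in Beijing. Dowson Tong, Senior Executive Vice President of Tencent and CEO of the Cloud and Smart Industries Group, said that breakthroughs in models' deep thinking are moving the usability of generative AI from "quantitative change" to "qualitative change," that Tencent keeps increasing its AI investment, and that all of its businesses are fully embracing AI. Tencent is also pursuing "four accelerations" in large models, agents, knowledge bases, and infrastructure to build "AI that works well," helping AI enter every industry and everyday life.

Since the start of this year, industry demand for large-model API calls and compute has grown rapidly. Tong believes that generative AI has gradually crossed the "usability" threshold. Moving from "usable" to "good to use," and from "used by some" to "usable by everyone," will require continued upgrades in interaction experience, execution capability, content accuracy, and deployment cost. Optimizing models can improve performance and interaction experience; agents can give models the ability to carry out tasks independently; knowledge bases can help reduce model hallucinations and make models understand enterprises and users better; infrastructure and engineering optimization can lower training and inference costs and improve response speed.

Models are the foundation of AI applications. The capabilities of Tencent Hunyuan T1 and Turbo S continue to iterate; in the globally authoritative Chatbot Arena rankings, Hunyuan Turb ...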
Tencent Research Institute AI Express 20250521
Tencent Research Institute · 2025-05-20 16:01
Group 1: Microsoft Developments
- Microsoft has upgraded GitHub Copilot into a coding agent that automates the full process of bug fixing and code maintenance [1]
- The Microsoft Discovery platform supports scientific innovation with capabilities for idea generation, result simulation, and autonomous learning [1]

Group 2: Google Innovations
- Google has launched the AI programming assistant Jules, which connects directly to GitHub and allows five free uses per day [2]
- Jules can complete coding tasks autonomously and generate detailed plans for developers to review [2]
- Gartner predicts that by 2028, 75% of new application development will use AI-assisted programming [2]

Group 3: Tencent's Game Engine
- Tencent has released the first industrial-grade AIGC game content production engine, Hunyuan Game (混元游戏), which cuts character generation time from 12 hours to 30 minutes [3]
- The platform offers core functions such as AI art pipelines and real-time canvas generation [3]

Group 4: AI Podcasting Tool
- Mars Electric Wave Company has introduced ListenHub, an AI tool that quickly converts links and documents into podcasts [4][5]
- ListenHub is faster than Google NotebookLM and produces more natural Chinese voice output, although it is limited in content depth [5]

Group 5: Zhiyuan BGE Models
- Zhiyuan Research Institute (the Beijing Academy of Artificial Intelligence, BAAI) has released three vector models that achieve state-of-the-art results on various benchmarks [6]
- BGE-Code-v1 supports 14 programming languages and excels at code repository retrieval [6]

Group 6: Google NotebookLM App
- Google has launched the NotebookLM app for iOS and Android, with document-to-podcast conversion and offline audio playback [7]
- The app supports various document formats and is designed for students and lifelong learners [7]

Group 7: Microsoft Discovery in Research
- Microsoft Discovery enabled the discovery of new materials in just 200 hours without coding, significantly faster than traditional methods [8]
- The platform combines foundation models and specialized models to help users understand complex scientific data [8]

Group 8: Open Source Humanoid Robot
- UC Berkeley has developed an open-source humanoid robot, Berkeley Humanoid Lite, with a total cost under $5,000 [9]
- The robot has a modular design and supports bipedal walking and remote operation [9]

Group 9: AI's Impact on Programming
- Anthropic's CEO predicts that AI will be able to write 90% of code within 3-6 months, with 97% of technical personnel already using AI coding tools [10]
- Experts believe that AI will not replace programmers but will shift their roles toward guiding AI and innovating [10]

Group 10: Tencent's ima Product
- Tencent's ima team has developed a knowledge management platform that integrates AI capabilities naturally into its functions [11]
- The product has accumulated nearly 10 million pieces of content and emphasizes user feedback and experience optimization [11]
Hunyuan and the "Zero-Latency" Era of AI Image Generation
Tencent Research Institute · 2025-05-20 08:48
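The following article is from Tencent Technology (腾讯科技), by Xiaojing (晓静), contributing writer at Tencent Technology.

On May 16, Tencent Hunyuan released Hunyuan Image 2.0 (the Hunyuan image 2.0 model). Built on an image codec with an ultra-high compression ratio and a new diffusion architecture, it delivers very fast inference and very high-quality image generation, and greatly reduces the "AI look."

The biggest problem with today's mainstream text-to-image models is long generation time: even industry-leading models need 5 to 10 seconds to generate a single image.

Text-to-image models also commonly suffer from random results, so users usually have to generate several times before getting a satisfactory image. The standard workflow is typically "enter a prompt → wait several seconds → check the result → adjust and retry"; for complex images, it can take more than ten rounds of adjustment to get a truly usable picture.

If "what you see is what you get" can be achieved, it means lower costs and higher efficiency for industry applications. For individual users, the technology offers something like an instant design assistant: tasks such as making presentation illustrations or creative pet photos can be completed quickly. Instant feedback keeps creative work continuous and lets ideas be expressed more fluently.

| GenEval bench | Overall | Single Obj. | Two Obj. | Counting | Colors | Position | Color Attri ...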
Tencent Research Institute AI Express 20250520
Tencent Research Institute · 2025-05-19 14:57
Group 1: OpenAI and G42 Data Center
- OpenAI is collaborating with G42 to build a 5 GW data center in Abu Dhabi that covers 10 square miles, an area larger than Monaco [1]
- The project is part of the "Stargate" initiative, consumes power equivalent to the output of five nuclear power plants, and is four times the size of the facility in Abilene, Texas [1]
- G42 withdrew its investments in China because of U.S. concerns over its ties with Chinese entities, while Microsoft invested $1.5 billion and placed executives on G42's board [1]

Group 2: NVIDIA's New Technologies
- NVIDIA launched the new Grace Blackwell GB300 system, which improves performance and lets 72 GPUs connect as a single giant GPU via NVLink technology [2]
- The NVLink Fusion program lets partners integrate custom ASICs or CPUs into the NVIDIA ecosystem, supporting semi-custom AI infrastructure [2]
- The Isaac GR00T platform and the Cosmos physical AI model were introduced to strengthen robotics and digital twin technologies, with the Newton physics engine set to be open-sourced in July [2]

Group 3: Huawei's Innovations
- Huawei Ascend introduced the CloudMatrix 384 super node and the Atlas 800I A2 server, surpassing NVIDIA's Hopper architecture in DeepSeek model inference performance [3]
- The "mathematics compensating for physics" strategy, which uses FlashComm communication and AMLA algorithms, addresses the challenges of deploying large-scale MoE models [3]
- The CloudMatrix 384 super node achieves a throughput of 1920 tokens/s at 50 ms latency, while the Atlas 800I A2 reaches 808 tokens/s at 100 ms latency; Huawei plans to open-source the related technologies [3]

Group 4: Tencent's New QQ Browser
- Tencent released a new version of the QQ browser that integrates QBot, driven by the dual models Tencent Hunyuan and DeepSeek and able to extract and organize answers from the internet [4][5]
- Key features include AI search, multimodal interaction, document interpretation and translation, intelligent writing, and learning assistance, with synchronization between PC and mobile [5]
- An AI toolbox provides format conversion, information extraction, and document processing directly in the browser, with no additional plugins required [5]

Group 5: Bilibili's AniSora Model
- Bilibili open-sourced the animation video generation model Index-AniSora, which supports video generation in various anime styles, was selected for IJCAI 2025, and can be trained efficiently in a distributed setup on Huawei's 910B chip [6]
- The system includes two versions: V1.0, based on CogVideoX-5B, and V2.0, based on Wan2.1-14B. It supports spatiotemporal masking and local control and covers 80-90% of application scenarios [6]
- The team built a training dataset of tens of millions of text-video pairs and open-sourced the first human-preference reinforcement learning model in the animation field, which contains 30,000 labeled samples [6]

Group 6: Apple's Matrix3D Model
- Apple, in collaboration with Nanjing University, released the Matrix3D model, which generates high-quality 3D scene models from just three photos and has been open-sourced [7]
- Apple's leadership is pushing Siri toward a ChatGPT-like model; internal tests show the chatbot approaching ChatGPT's capabilities, and web search and app invocation features are planned [7]
- The company is handling the Siri upgrade cautiously to avoid announcing features prematurely, and is considering separating Siri from the Apple Intelligence brand to limit negative impact [7]

Group 7: GenSpark's Agentic AI
- GenSpark launched what it calls the world's first AI download agent tool, Agentic Download Agent, which automates file download and processing through natural language commands [8]
- Built on a Mixture-of-Agents architecture, it integrates eight language models of different scales and more than 80 toolchains, cutting traditionally time-consuming tasks to minutes [8]
- GenSpark also introduced AI Drive, a smart cloud drive that supports various digital asset formats and secondary analysis of downloaded files, with an open API for enterprise system integration [8]

Group 8: Granola's AI Note-Taking Product
- Granola reached a valuation of $250 million after completing its Series B funding and has become a preferred note-taking tool for founders and executives thanks to its efficient, personalized AI meeting recording [10]
- The product's core advantage is that it leaves users in control: it supports real-time editing and personalized notes, and protects privacy by not saving audio [10]
- The founder believes the key to AI tools is to enhance rather than replace human capabilities, and plans to evolve Granola from a single note-taking tool into a comprehensive work platform that integrates personal context [10]

Group 9: Robotics Competition Achievements
- The first ManiSkill-ViTac 2025 tactile-visual fusion challenge concluded, with Chinese teams winning three gold medals; the results will be reported at the ICRA 2025 conference [11]
- The company Dexmal won gold in pure tactile control and in tactile sensor design, improving success rates by 2-3 times through a dual-paradigm learning framework, while another company won gold in visual-tactile control [11]
- This is the first public competition to combine visual and tactile elements; it promotes progress in tactile-visual fusion algorithms and helps bridge the gap between laboratory research and real-world applications [11]

Group 10: GitHub's Stance on Programming
- GitHub CEO Thomas Dohmke countered the "programming is useless" argument, saying that 2025 will be the year of programming agents while human programmers will still be needed to manage the software lifecycle [12]
- GitHub has released multiple SWE agent products; Copilot users have reached 15 million, a fourfold increase, and GitHub plans to advance a multi-agent "band mode" [12]
- GitHub asserts that AI should serve as a high-level developer assistant and advocates continued learning of programming so that people keep guidance and control over AI systems [12]
The Nature of Technological Innovation
Tencent Research Institute · 2025-05-19 08:07
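Yan Deli (闫德利), Senior Expert, Tencent Research Institute

I. Demand is the fundamental driver of technological innovation

The ancient Greek philosopher Plato said in the Republic: "Our need will be the real creator." An English proverb says: "Necessity is the mother of invention." The Danish economist Boserup expressed the same view in The Conditions of Agricultural Growth.

For innovation to become reality, it must be tied to people's needs; it must "write papers on the land of the motherland." The urgency and scale of demand determine the speed and level of innovation. The Age of Discovery was one of the most important events in human history, and it arose from appetite: Europeans longed to find and bring back spices from the East, and even named the spice-rich islands of Southeast Asia the "Spice Islands." A simple demand for spices gave rise to the great voyages; the computer originated in World War II, and the internet in the Cold War. The first programmable electronic computer, the Colossus computer, was invented by Britain in 1943 to break German military ciphers. The first general-purpose digital computer, ENIAC, was announced at the University of Pennsylvania in 1946 and was built to compute artillery firing tables. In 1969 the U.S. Advanced Research Projects Agency (ARPA) set up ARPANET to give the defense system a decentralized network for communications and command; it was the forerunner of the internet. ...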
Tencent Research Institute AI Express 20250519
Tencent Research Institute · 2025-05-18 14:33
Group 1: OpenAI and AI Programming Tools
- OpenAI launched a new AI programming tool, Codex, powered by the codex-1 model, which generates clearer code and automatically iterates on tests until they pass [1]
- Codex runs in a cloud sandbox environment, can handle multiple programming tasks at once, and integrates with GitHub to preload code repositories [1]
- The tool is currently available to paid users of ChatGPT Pro, with plans for rate limiting and options to purchase additional credits for more usage [1]

Group 2: Image Generation Technologies
- Tencent's Hunyuan Image 2.0 achieves millisecond-level image generation, letting users see changes in real time as they type prompts and breaking the usual 5-10 second generation time [2]
- The new model supports both text-to-image and image-to-image generation, with adjustable reference strength for the image generation process [2]
- Manus introduced an image generation feature that understands user intent and plans solutions, providing a one-stop service from brand design to website deployment, although complex tasks may take several minutes to complete [3]

Group 3: Google and LightLab Project
- Google launched the LightLab project, which uses diffusion models to give precise control over light and shadow in images, including light intensity and color [4][5]
- The research team built its training dataset by combining real photo pairs with synthetic rendered images, achieving better PSNR and SSIM scores than existing methods [5]

Group 4: Supermemory API
- Supermemory released the Infinite Chat API, which acts as a transparent proxy between applications and LLMs and maintains dialogue context to overcome the 20,000 token limit of large models [6]
- The API uses RAG technology to manage overflow context, claims to save 90% of token consumption, and can be integrated into existing applications with just one line of code [6]
- Pricing consists of a fixed monthly fee of $20, with the first 20,000 tokens of each conversation free and $1 per million tokens for any excess [6]

Group 5: Grok AI Controversy
- The Grok AI assistant faced backlash for inserting controversial content related to "white genocide" into responses, which was attributed to an employee's unauthorized modification of system prompts [7]
- xAI publicly released Grok's prompts on GitHub and committed to strengthening review mechanisms and forming a monitoring team [7]
- The incident highlighted security vulnerabilities in AI systems that rely heavily on prompts; research indicates that mainstream models can be compromised through specific prompting techniques [7]

Group 6: Windsurf and SWE-1 Model
- Windsurf launched the SWE-1 model, which focuses on optimizing the entire software engineering process rather than coding alone; it is Windsurf's first product release after being acquired by OpenAI for $3 billion [8]
- SWE-1 performs comparably to models such as GPT-4.1 on programming benchmarks but lags behind Claude 3.7 Sonnet, and Windsurf commits to service costs lower than Claude 3.5 Sonnet's [8]

Group 7: Google TPU vs. OpenAI GPU
- Google's TPUs deliver AI compute at one-fifth the cost of the NVIDIA GPUs OpenAI uses, while maintaining comparable performance [10]
- Google's Gemini 2.5 Pro API is priced 4-8 times lower than OpenAI's o3 model, reflecting different market strategies [10]
- Apple's decision to use Google TPUs to train its AFM model may lead other companies to explore alternatives to NVIDIA GPUs [10]

Group 8: Lovart's Design Philosophy
- Lovart's founder describes a three-stage evolution of AI image products: from single content generation, to workflow tools, and now to AI-driven agents [11]
- The design philosophy focuses on restoring the original essence of design and enabling natural interaction between AI and users [11]
- Lovart believes that general product managers will be replaced by designers with specialized knowledge, stating, "we have no product managers, only designers" [11]

Group 9: Lilian Weng's Insights on Model Thinking
- Lilian Weng discusses the importance of "thinking time" in large models, suggesting that more computation at test time can improve performance on complex tasks [12]
- Current model thinking strategies include parallel sampling and sequential revision, which require balancing thinking time against computational cost [12]
- Research indicates that optimizing chains of thought through reinforcement learning may lead to reward hacking, which needs further investigation [12]
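
The Supermemory pricing quoted in Group 4 above can be worked through as a small calculation. The sketch below is only an illustration: it assumes the free 20,000 tokens apply separately to each conversation and that the excess is billed linearly at $1 per million tokens. The function name and parameters are hypothetical and are not part of any Supermemory API.

```python
def estimate_monthly_cost(conversation_tokens,
                          base_fee=20.0,
                          free_tokens_per_conversation=20_000,
                          price_per_million_excess=1.0):
    """Estimate a monthly bill in USD under the pricing quoted in the digest.

    conversation_tokens: one token count per conversation in the month.
    Assumption (not confirmed by the source): the free allowance is applied
    per conversation and excess tokens are billed linearly.
    """
    # Tokens beyond the free allowance, summed over all conversations.
    excess_tokens = sum(
        max(0, tokens - free_tokens_per_conversation)
        for tokens in conversation_tokens
    )
    return base_fee + excess_tokens / 1_000_000 * price_per_million_excess


if __name__ == "__main__":
    # Two long conversations and one short one (illustrative numbers only).
    usage = [520_000, 520_000, 10_000]
    print(f"Estimated bill: ${estimate_monthly_cost(usage):.2f}")
```

Under these assumptions, a month with no conversation above 20,000 tokens costs only the fixed fee, and each additional million excess tokens adds one dollar.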
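"Tanyuan Plan 2024" (探元计划2024): Digital Simulation and Restoration Technology Recreates the Splendor of Mawangdui's Millennia-Old Han Brocade
Tencent Research Institute · 2025-05-16 15:15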
Core Viewpoint
- The "Tanyuan Plan 2024" (探元计划2024) aims to use digital technology to reconstruct historical contexts and to address common challenges in the digital restoration of fragile ancient silk artifacts, opening a new chapter in the integration of traditional culture and technology [1][2].

Group 1: Project Overview
- The project focuses on intelligent digital simulation and restoration of silk artifacts from the Mawangdui Han Tomb, using AI technology to preserve and pass on traditional craftsmanship [2][4].
- The project is guided by the National Cultural Heritage Administration and involves collaboration among several organizations, including Tencent and Beijing Zhixin Technology Co., Ltd [1][4].

Group 2: Technological Innovations
- The project achieved four major innovations in the restoration process:
1. The first millimeter-level restoration of the exquisite craftsmanship of Mawangdui silk artifacts, using AI-assisted pattern generation that cuts the time needed to produce accurate patterns to one-third of manual drawing time [7].
2. Simultaneous realization of the "restoration as new" and "restoration as old" concepts through AI-assisted damage feature extraction, which is about a hundred times more efficient than manual extraction [8].
3. Integration of multiple cross-domain technologies for ultra-high-definition texture simulation, improving restoration accuracy [9][10].
4. Realistic reproduction of the drape and dynamic effects of Han silk garments through physical replication and motion capture technology [11].

Group 3: Data and Future Plans
- The project aims to create three core digital assets so that its digital tools can be reused to restore and revitalize similar artifacts, moving toward a more mature industry solution [14].
- The project has completed a three-dimensional simulation model of the Mawangdui silk garment, with a public display planned at the Hunan Museum by the end of June [16][18].
Tencent Research Institute Weekly AI Keywords Top 50
Tencent Research Institute · 2025-05-16 15:15
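Weekly AI Frontier Keywords Top 50

| Category | Top keyword | Entity |
| --- | --- | --- |
| Chips | Location tracking | NVIDIA, AMD |
| Models | GPT-4.1 launch | OpenAI |
| Models | Extreme reasoning | Anthropic |
| Models | Seed1.5-VL | ByteDance |
| Models | UnifiedReward-Think | Tencent |
| Models | Continuous Thought Machine | Sakana AI |
| Models | FastVLM | Apple |
| Models | Hunyuan T1-Vision | Tencent |
| Models | Seed-Coder | ByteDance |
| Models | Reinforcement fine-tuning launch | OpenAI |
| Applications | Personified voice | MiniMax |
| Applications | Yuanbao browser extension | Tencent |
| Applications | Offline audio generation | Stability AI, Arm |
| Applications | Wan2.1-VACE | Alibaba |
| Applications | Intelligent NPCs | Tencent |
| Applications | Math evolution agent | ...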
Event Registration | Advances in Generative AI: Applications, Governance, and Social Impact
Tencent Research Institute · 2025-05-16 06:53
Driven by a new wave of technology represented by generative AI, breakthroughs in algorithms and models are reshaping the global industrial landscape, governance structures, and social ecosystem at unprecedented speed. From content production to industrial innovation, and from regulatory practice to ethical governance, the rapid development of generative AI has brought the world unprecedented opportunities and challenges.

Thursday, May 22, 2025, 1:00 p.m. to 5:00 p.m.

...