Tencent Research Institute (腾讯研究院)
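Tencent Research Institute Is Hiring a Digital Content Research Intern
Tencent Research Institute · 2025-05-21 07:51
Position: Tencent Research Institute, Digital Content Research Intern

Position description
1. Research area: digital content, specifically games and esports research.
2. Location: Asia Financial Center, Chaoyang District, Beijing.
3. Compensation: RMB 150 per day after tax.
4. Schedule: five days per week on site for at least six months; candidates who can start immediately are preferred.

Responsibilities
1. Provide research support on industry development, cultural integration, and technological innovation in games and esports.
2. Use a range of AI tools to carry out information searches, data analysis, case studies, and article writing.
3. Handle other tasks as assigned.

Requirements
1. Currently enrolled master's or doctoral students at key universities majoring in publishing, economics and management, statistics, media, or related fields. Candidates who follow frontier developments in games and related industries and have relevant research output may apply regardless of major.
2. Familiarity with trends and technological innovation in the games and digital content industries, experience in internet industry research, and independent views on major industry events.
3. Strong writing and data analysis skills and sound industry research training; a genuine interest in research, with the intention to pursue research work or to build research skills.
4. A strong sense of responsibility and commitment; candidates available for six months or longer are preferred.

How to apply
Name the email subject and the attachment [Name-University-Year-Major-x days per week], send your resume to xuyuanhu@tencent.com, and attach research papers or other work samples.
...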
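Tencent's Dowson Tong: Every Company Will Become an AI Company, and Everyone Will Be a "Super Individual"
Tencent Research Institute · 2025-05-21 07:51
Dowson Tong (汤道生), Senior Executive Vice President of Tencent and CEO of the Cloud and Smart Industries Group

"As AI continues to be deployed, every company is becoming an AI company, and every person will become an AI-empowered 'super individual.'" On May 21, the Tencent Cloud AI Industry Application Summit was held in Beijing. Dowson Tong, Senior Executive Vice President of Tencent and CEO of the Cloud and Smart Industries Group, said that breakthroughs in models' deep-thinking ability have moved the usability of generative AI from "quantitative change" to "qualitative change." Tencent continues to increase its AI investment, and all of its businesses are fully embracing AI. At the same time, through "four accelerations" in large models, agents, knowledge bases, and infrastructure, Tencent is building "AI that is good to use," helping AI enter every industry and come closer to everyone's daily life.

Since the start of this year, industry demand for large-model API calls and compute has also grown rapidly. Tong believes generative AI has gradually crossed the "usability" threshold. Moving from "usable" to "good to use," and from "used by some" to "usable by everyone," still requires continued upgrades in interaction experience, execution capability, content accuracy, and deployment cost. Optimizing models improves performance and interaction experience; agents give models the ability to carry out tasks independently; knowledge bases help reduce model hallucination and better understand enterprises and users; infrastructure and engineering optimization lower training and inference costs and improve response speed.

Models are the foundation of AI applications. Tencent Hunyuan T1 and Turbo S continue to iterate; in the globally authoritative Chatbot Arena ranking, Hunyuan Turb ...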
Tencent Research Institute AI Express 20250521
Tencent Research Institute · 2025-05-20 16:01
Group 1: Microsoft Developments
- Microsoft has upgraded GitHub Copilot into a Coding Agent, automating the entire process of bug fixing and code maintenance [1]
- The Microsoft Discovery platform aids scientific innovation with capabilities for idea generation, result simulation, and autonomous learning [1]

Group 2: Google Innovations
- Google has launched the AI programming assistant Jules, which connects directly to GitHub and allows for five free uses per day [2]
- Jules can autonomously complete coding tasks and generate detailed plans for developers to review [2]
- Gartner predicts that by 2028, 75% of new application development will utilize AI-assisted programming [2]

Group 3: Tencent's Gaming Engine
- Tencent has released the first industrial-grade AIGC game content production engine, "混元游戏" (Hunyuan Game), which significantly reduces character generation time from 12 hours to 30 minutes [3]
- The platform offers core functionalities such as AI art pipelines and real-time canvas generation [3]

Group 4: AI Podcasting Tool
- Mars Electric Wave Company has introduced ListenHub, an AI tool that converts links and documents into podcasts, allowing for quick transformation of content into audio [4][5]
- ListenHub is faster than Google NotebookLM and offers more natural Chinese voice output, although it has limitations in content depth [5]

Group 5: Zhiyuan BGE Models
- Zhiyuan Research Institute has released three vector models that have achieved state-of-the-art results in various benchmarks [6]
- BGE-Code-v1 supports 14 programming languages and excels in code repository retrieval [6]

Group 6: Google NotebookLM App
- Google has launched the NotebookLM app for iOS and Android, featuring document-to-podcast functionality and offline audio playback [7]
- The app supports various document formats and is designed for students and lifelong learners [7]

Group 7: Microsoft Discovery in Research
- Microsoft Discovery has enabled the discovery of new materials in just 200 hours without coding, significantly faster than traditional methods [8]
- The platform combines foundational and specialized models to facilitate complex scientific data understanding [8]

Group 8: Open Source Humanoid Robot
- UC Berkeley has developed an open-source humanoid robot, Berkeley Humanoid Lite, with a total cost under $5,000 [9]
- The robot features a modular design and can perform bipedal walking and remote operation [9]

Group 9: AI's Impact on Programming
- Anthropic's CEO predicts that AI will be able to write 90% of code within 3-6 months, with 97% of technical personnel already using AI coding tools [10]
- Experts believe that AI will not replace programmers but will change their roles to focus on AI guidance and innovation [10]

Group 10: Tencent's ima Product
- Tencent's ima team has developed a knowledge management platform that integrates AI capabilities naturally into its functions [11]
- The product has accumulated nearly 10 million pieces of content and emphasizes user feedback and experience optimization [11]
Hunyuan and the "Zero-Latency" Era of AI Image Generation
Tencent Research Institute · 2025-05-20 08:48
Core Viewpoint: Tencent's Hunyuan Image 2.0 model represents a significant advancement in image generation technology, enabling real-time, high-quality image creation with minimal latency, thus enhancing user experience and productivity in various applications [3][4][10].

Group 1: Model Features
- Hunyuan Image 2.0 utilizes a high-compression image codec and a new diffusion architecture, achieving ultra-fast inference speeds and high-quality image generation [3].
- The model allows for "what you see is what you get" functionality, enabling users to see image changes in real-time as they input text prompts [4][11].
- Compared to existing models that take 5-10 seconds to generate images, Hunyuan Image 2.0 significantly reduces this time, providing a more efficient user experience [5][8].

Group 2: User Experience
- The model supports strong adherence to text prompts, allowing for real-time modifications of images based on user input [8].
- It offers two modes for image generation: "reference subject" and "reference outline," allowing users to set the intensity of reference features for more tailored outputs [19][22].
- Users can upload reference images and adjust the strength of adherence to the original image, enabling creative flexibility [19][20].

Group 3: Applications and Use Cases
- The technology serves as an instant design assistant, facilitating quick creation of illustrations for presentations and creative projects [5][8].
- For professional designers, the dual canvas feature allows for immediate previews of color and style changes, streamlining the creative process [27][30].
- The model's ability to generate images based on detailed prompts enables users to create complex visuals, such as character designs or themed illustrations, with minimal effort [15][33].

Group 4: Performance Metrics
- Hunyuan Image 2.0 outperforms competitors in various evaluation metrics, achieving a score of 0.9597 in overall performance, surpassing models like DALL-E 3 and CogView4-6B [7].
- The model demonstrates strong capabilities in generating images with specific attributes, such as color and position, indicating its advanced understanding of user prompts [7].

Group 5: Accessibility
- The model is currently available for public testing, allowing users to experience its capabilities firsthand [9].
- Its user-friendly interface enables individuals with no design background to easily create images, democratizing access to advanced image generation technology [27].
Tencent Research Institute AI Express 20250520
Tencent Research Institute · 2025-05-19 14:57
Group 1: OpenAI and G42 Data Center
- OpenAI collaborates with G42 to build a 5 GW data center in Abu Dhabi, covering 10 square miles, larger than Monaco [1]
- The project is part of the "Stargate" initiative, consuming power equivalent to five nuclear power plants, and is four times the size of the Texas Abilene facility [1]
- G42 withdrew its investments in China due to U.S. concerns over its ties with Chinese entities, while Microsoft invested $1.5 billion and placed executives on G42's board [1]

Group 2: NVIDIA's New Technologies
- NVIDIA launched the new Grace Blackwell GB300 system, enhancing performance and allowing 72 GPUs to connect as a single giant GPU via NVLink technology [2]
- The NVLink Fusion plan enables partners to integrate custom ASICs or CPUs into the NVIDIA ecosystem, supporting semi-custom AI infrastructure [2]
- The Isaac GR00T platform and Cosmos physical AI model were introduced to strengthen robotics and digital twin technologies, with the Newton physics engine set to be open-sourced in July [2]

Group 3: Huawei's Innovations
- Huawei's Ascend introduced the CloudMatrix 384 super node and Atlas 800I A2 server, surpassing NVIDIA's Hopper architecture in DeepSeek model inference performance [3]
- The "mathematics compensating for physics" strategy, utilizing FlashComm communication and AMLA algorithms, addresses challenges in deploying large-scale MoE models [3]
- The CloudMatrix 384 super node achieves a throughput of 1920 Tokens/s at 50ms latency, while the Atlas 800I A2 reaches 808 Tokens/s at 100ms latency, with plans for open-sourcing related technologies [3]

Group 4: Tencent's New QQ Browser
- Tencent released a new version of the QQ browser that integrates QBot, which is driven by a dual-model setup of Tencent's Hunyuan and DeepSeek and can extract and organize answers from the internet [4][5]
- Key features include AI search, multimodal interaction, document interpretation and translation, intelligent writing, and learning assistance, with support for PC and mobile synchronization [5]
- An AI toolbox is provided, including format conversion, information extraction, and document processing functions, operable without additional plugins directly in the browser [5]

Group 5: Bilibili's AniSora Model
- Bilibili open-sourced the animation video generation model Index-AniSora, supporting various anime-style video generation, selected for IJCAI25, and capable of efficient distributed training on Huawei's 910B chip [6]
- The system includes two versions: V1.0 based on CogVideoX-5B and V2.0 based on Wan2.1-14B, supporting spatiotemporal masking and local control, covering 80-90% of application scenarios [6]
- A dataset of tens of millions of text-video training data was built, and the first human preference reinforcement learning model in the animation field was open-sourced, containing 30,000 labeled samples [6]

Group 6: Apple's Matrix3D Model
- Apple, in collaboration with Nanjing University, released the Matrix3D model, which generates high-quality 3D scene models from just three photos and has been open-sourced [7]
- Apple's leadership is pushing Siri to transition towards a ChatGPT-like model, with internal tests showing the chatbot nearing ChatGPT's capabilities, planning to add web search and app invocation features [7]
- The company is cautiously handling Siri's upgrade strategy to avoid premature feature announcements and is considering separating Siri from the Apple Intelligence brand to mitigate negative impacts [7]

Group 7: GenSpark's Agentic AI
- GenSpark launched the world's first AI download agent tool, Agentic Download Agent, enabling file download and processing automation through natural language commands [8]
- Utilizing a Mixture-of-Agents architecture, it integrates eight different scale language models and over 80 toolchains, reducing traditional time-consuming tasks to minutes [8]
- An AI Drive smart cloud disk was introduced, supporting various digital asset formats and allowing secondary analysis of downloaded files, with an open API for enterprise system integration [8]

Group 8: Granola's AI Note-Taking Product
- Granola achieved a valuation of $250 million after completing Series B funding, becoming a preferred note-taking tool for founders and executives through its efficient personalized AI meeting recording feature [10]
- The product's core advantage lies in empowering users with control, supporting real-time editing and personalized recording while protecting privacy by not saving audio [10]
- The founder believes the key to AI tools is to enhance rather than replace human capabilities, with plans to evolve from a single note-taking tool to a comprehensive work platform integrating personal context [10]

Group 9: Robotics Competition Achievements
- The first ManiSkill-ViTac 2025 tactile-visual fusion challenge concluded, with Chinese teams winning three gold medals, to be reported at the ICRA 2025 conference [11]
- The company Dexmal won gold in pure tactile control and tactile sensor design, improving success rates by 2-3 times through a dual paradigm learning framework, while another company won gold in visual-tactile control [11]
- This event is the first public competition combining visual and tactile elements, promoting advancements in tactile-visual fusion algorithms and bridging the gap between laboratory research and real-world applications [11]

Group 10: GitHub's Stance on Programming
- GitHub CEO Thomas Dohmke countered the "programming is useless" argument, emphasizing that 2025 will be the year of programming agents, while human programmers will still be needed to manage the software lifecycle [12]
- GitHub has released multiple SWE agent products, with Copilot users reaching 15 million, a fourfold increase, and plans to advance multi-agent "band mode" [12]
- GitHub asserts that AI should serve as a high-level developer assistant, advocating for continuous learning in programming to maintain guidance and control over AI systems [12]
The Nature of Technological Innovation
Tencent Research Institute · 2025-05-19 08:07
Group 1
- Demand is the fundamental driving force behind technological innovation, and the urgency and scale of demand determine the speed and level of innovation [1][3]
- Historical examples illustrate that significant innovations often arise from pressing needs, such as the development of the steam engine and the internet, which were driven by specific demands [3]
- The integration of technology with practical, widespread needs is essential for its successful implementation and growth [3]

Group 2
- Innovation involves trial and error, which inherently requires costs; higher trial and error costs can slow technological progress [4][5]
- The digital transformation of manufacturing industries faces high trial and error costs due to stringent requirements for product quality and production stability [6]
- Sectors with lower trial and error costs, such as entertainment and digital services, can innovate more rapidly and serve as testing grounds for new technologies [6]

Group 3
- Technological innovation is a gradual process rather than a sudden breakthrough, often built upon previous advancements and requiring long-term iteration [7][8]
- Major inventions, like the steam engine and computers, have undergone extensive improvements over time rather than appearing fully formed [8][10]
- The perception of innovation as revolutionary often overlooks the incremental efforts that lead to significant breakthroughs [10]

Group 4
- Resource-rich environments may hinder innovation due to a phenomenon known as the "resource curse," while resource-scarce regions often exhibit stronger innovation capabilities [12][13]
- Large organizations may struggle with innovation due to organizational inertia and path dependency, suggesting that smaller, more agile teams may be more successful in driving innovation [13][14]

Group 5
- Innovation thrives in diverse environments where different ideas and perspectives can intersect, akin to "cross-pollination" [16][17]
- The movement of talent across regions is a key indicator of innovation potential, as diverse backgrounds contribute to new ideas and solutions [17]

Group 6
- While youth has historically been associated with innovation, the average age of significant innovators has been rising, with many breakthroughs occurring in the 30-50 age range [18][21]
- Despite the trend of older innovators, the urgency to innovate remains, emphasizing the importance of timely action [21]

Group 7
- Innovations often emerge simultaneously from different individuals or groups, reflecting the maturity of social conditions rather than individual genius [23][24]
- Predictions about the timing and impact of innovations can be notoriously inaccurate, highlighting the unpredictable nature of technological advancement [24][26]
Tencent Research Institute AI Express 20250519
Tencent Research Institute · 2025-05-18 14:33
Group 1: OpenAI and AI Programming Tools
- OpenAI launched a new AI programming tool Codex, powered by the codex-1 model, which generates clearer code and automatically iterates testing until successful [1]
- Codex operates in a cloud sandbox environment, capable of handling multiple programming tasks simultaneously, and supports integration with GitHub for preloading code repositories [1]
- The tool is currently available to paid users of ChatGPT Pro, with plans for rate limiting and options to purchase additional credits for more usage [1]

Group 2: Image Generation Technologies
- Tencent's Hunyuan Image 2.0 achieves millisecond-level image generation, allowing users to see real-time changes as they input prompts, breaking the traditional 5-10 second generation time limit [2]
- The new model supports both text-to-image and image-to-image functionalities, with adjustable reference strength for the image generation process [2]
- Manus introduced an image generation feature that understands user intent and plans solutions, providing a one-stop service from brand design to website deployment, although complex tasks may take several minutes to complete [3]

Group 3: Google and LightLab Project
- Google launched the LightLab project, enabling precise control over light and shadow in images through diffusion models, allowing adjustments to light intensity and color [4][5]
- The research team built a training dataset by combining real photo pairs with synthetic rendered images, achieving superior PSNR and SSIM metrics compared to existing methods [5]

Group 4: Supermemory API
- Supermemory released the Infinite Chat API, acting as a transparent proxy between applications and LLMs, maintaining dialogue context to overcome the 20,000 token limit of large models [6]
- The API utilizes RAG technology to manage overflow context, claiming to save 90% of token consumption, and can be integrated into existing applications with just one line of code [6]
- Pricing includes a fixed monthly fee of $20, with the first 20,000 tokens of each conversation free, and $1 per million tokens for any excess [6]

Group 5: Grok AI Controversy
- Grok AI assistant faced backlash for inserting controversial content related to "white genocide" in responses, attributed to unauthorized modifications of system prompts by an employee [7]
- xAI publicly released Grok's prompts on GitHub and committed to enhancing review mechanisms and forming a monitoring team [7]
- The incident highlighted security vulnerabilities in AI systems that heavily rely on prompts, with research indicating that mainstream models can be compromised through specific prompting techniques [7]

Group 6: Windsurf and SWE-1 Model
- Windsurf launched the SWE-1 model, focusing on optimizing the entire software engineering process rather than just coding functions, marking its first product release after being acquired by OpenAI for $3 billion [8]
- SWE-1 performs comparably to models like GPT-4.1 in programming benchmarks but lags behind Claude 3.7 Sonnet, with a commitment to lower service costs than Claude 3.5 Sonnet [8]

Group 7: Google TPU vs. OpenAI GPU
- Google TPU offers AI cost efficiency at one-fifth the price of OpenAI's NVIDIA GPUs while maintaining comparable performance [10]
- Google's API service Gemini 2.5 Pro is priced 4-8 times lower than OpenAI's o3 model, reflecting different market strategies [10]
- Apple's decision to use Google TPU for training its AFM model may influence other companies to explore alternatives to NVIDIA GPUs [10]

Group 8: Lovart's Design Philosophy
- Lovart's founder emphasizes a three-stage evolution of AI image products, from single content generation to workflow tools, and now to AI-driven agents [11]
- The design philosophy focuses on restoring the original essence of design, facilitating natural interaction between AI and users [11]
- Lovart believes that general product managers will be replaced by designers with specialized knowledge, stating, "we have no product managers, only designers" [11]

Group 9: Lilian Weng's Insights on Model Thinking
- Lilian Weng discusses the importance of "thinking time" in large models, suggesting that increasing computational time during testing can enhance performance on complex tasks [12]
- Current model thinking strategies include parallel sampling and sequential revision, requiring a balance between thinking time and computational costs [12]
- Research indicates that optimizing thinking chains through reinforcement learning may lead to reward hacking issues, necessitating further investigation [12]
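The Supermemory pricing reported in Group 4 of this digest can be worked through as simple arithmetic. The following is a minimal sketch, assuming the $20 fee is flat per month, the 20,000 free tokens apply to each conversation separately, and only the excess is billed at $1 per million tokens. The function and constant names are hypothetical illustrations and are not part of any Supermemory API.

```python
# Hypothetical cost model based only on the pricing figures quoted in the digest.
MONTHLY_FEE_USD = 20.0                  # fixed monthly fee (from the digest)
FREE_TOKENS_PER_CONVERSATION = 20_000   # free allowance per conversation (from the digest)
USD_PER_MILLION_EXCESS_TOKENS = 1.0     # rate for tokens beyond the allowance (from the digest)


def estimate_monthly_cost(conversation_token_counts):
    """Estimate one month's bill from the token count of each conversation.

    Assumption: the free allowance is applied per conversation and unused
    allowance does not carry over to other conversations.
    """
    excess_tokens = sum(
        max(0, tokens - FREE_TOKENS_PER_CONVERSATION)
        for tokens in conversation_token_counts
    )
    return MONTHLY_FEE_USD + excess_tokens / 1_000_000 * USD_PER_MILLION_EXCESS_TOKENS


# Three conversations: one under the allowance, two over it.
# Excess tokens: 0 + 500,000 + 1,000,000 = 1,500,000, which adds $1.50.
print(estimate_monthly_cost([15_000, 520_000, 1_020_000]))  # 21.5
```

Under these assumptions the fixed fee dominates for typical usage: a user would need a billion excess tokens in a month before the per-token charge reached $1,000.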
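"探元计划2024" (Exploration Yuan Plan 2024): Digital Simulation and Restoration Technology Recreates the Thousand-Year-Old Splendor of Mawangdui Han Brocade
Tencent Research Institute · 2025-05-16 15:15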
Core Viewpoint: The "Exploration Yuan Plan 2024" aims to leverage digital technology to reconstruct historical contexts and address the common challenges in the digital restoration of fragile ancient silk artifacts, marking a new chapter in the integration of traditional culture and technology [1][2].

Group 1: Project Overview
- The project focuses on the intelligent digital simulation and restoration of silk artifacts from the Mawangdui Han Tomb, utilizing AI technology to preserve and transmit traditional craftsmanship [2][4].
- The project is guided by the National Cultural Heritage Administration and involves collaboration with various organizations, including Tencent and Beijing Zhixin Technology Co., Ltd [1][4].

Group 2: Technological Innovations
- The project achieved four major innovations in the restoration process:
  1. The first millimeter-level restoration of the exquisite craftsmanship of Mawangdui silk artifacts using AI-assisted pattern generation, significantly reducing the time for generating accurate patterns to one-third of manual drawing time [7].
  2. The simultaneous realization of "restoration as new" and "restoration as old" concepts through AI-assisted damage feature extraction, enhancing efficiency by a hundred times compared to manual extraction [8].
  3. The integration of multiple cross-domain technologies for ultra-high-definition texture simulation, improving restoration accuracy [9][10].
  4. The realistic reproduction of the drape and dynamic effects of Han silk garments through the application of physical replication and motion capture technology [11].

Group 3: Data and Future Plans
- The project aims to create three core digital assets that will facilitate the reuse of digital tools for the restoration and revitalization of similar artifacts, promoting a more mature industry solution [14].
- The project has completed a three-dimensional simulation model of the Mawangdui silk garment, with plans for a public display at the Hunan Museum by the end of June [16][18].
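Tencent Research Institute AI Weekly Top 50 Keywords
Tencent Research Institute · 2025-05-16 15:15
Top 50 Weekly AI Frontier Keywords

| Category | Top keyword | Organization |
| --- | --- | --- |
| Chips | Geo-tracking | NVIDIA, AMD |
| Models | GPT-4.1 launch | OpenAI |
| Models | Extreme reasoning | Anthropic |
| Models | Seed1.5-VL | ByteDance |
| Models | UnifiedReward-Think | Tencent |
| Models | Continuous Thought Machine | Sakana AI |
| Models | FastVLM | Apple |
| Models | Hunyuan T1-Vision | Tencent |
| Models | Seed-Coder | ByteDance |
| Models | Reinforcement fine-tuning launch | OpenAI |
| Applications | Personified voice | MiniMax |
| Applications | Yuanbao browser plugin | Tencent |
| Applications | Offline audio generation | Stability AI, Arm |
| Applications | Wan2.1-VACE | Alibaba |
| Applications | Intelligent NPCs | Tencent |
| Applications | Math evolution agent | ...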
Conference Registration | Advances in Generative AI: Applications, Governance, and Social Impact
Tencent Research Institute · 2025-05-16 06:53
Driven by a new wave of technology represented by generative AI, breakthroughs in algorithms and models are reshaping the global industrial landscape, governance structures, and social ecosystem at an unprecedented speed. From content production to industrial innovation, and from regulatory practice to ethical governance, the rapid development of generative AI has brought unprecedented opportunities and challenges to the world.

Thursday, May 22, 2025, 1:00 p.m. to 5:00 p.m.
...