Tencent Research Institute
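Exploration Plan (探元计划) Hong Kong Stop | AI Empowers Historical Tracing, Decoding the Chinese Cultural Heritage of Kowloon Walled City
Tencent Research Institute · 2025-05-23 07:47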
Core Viewpoint
- The "Exploration Plan 2024" aims to integrate culture and technology to promote the digital preservation of cultural heritage. Its "In Kowloon City, Witness Hong Kong" project highlights the historical significance of Kowloon Walled City and its cultural narratives [3][10].

Group 1: Project Overview
- The "In Kowloon City, Witness Hong Kong" project is a collaboration between Hong Kong United Publishing Group, Electronic Publishing Co., and Huacui Starlight (Beijing) Intelligent Technology Co. It uses large-model agents and 3D virtual spaces to recreate the cultural essence of Kowloon Walled City [3][4].
- The project was selected from 81 cultural demand scenarios as one of the six key cultural co-creation scenes under the "Exploration Plan 2024" [4].

Group 2: Technological Innovations
- The project team is developing a multimodal knowledge agent that supports biliterate and trilingual interaction, making it easier for users to engage with the historical culture of Kowloon Walled City [4].
- An AI interactive narrative game is being designed to create immersive learning experiences and encourage public interest in the history of Kowloon Walled City [4].
- A 3D virtual space of Kowloon Walled City will be built so that users can experience different historical periods and cultural customs [4].

Group 3: Expert Insights and Discussions
- Experts from cultural institutions, universities, and other sectors discussed how technology and culture can work together to improve cultural dissemination and user engagement [11].
- The discussions emphasized a shift from one-way cultural output to a collaborative, shared approach that uses gamification and user-generated content to stimulate cultural transmission [11].
- The project aims to create sustainable development models by integrating educational and cultural tourism resources, with local schools and Kowloon Walled City Park as pilot sites [11].

Group 4: Future Events and Exhibitions
- The results of the "In Kowloon City, Witness Hong Kong" project will be showcased at the Shenzhen Cultural Expo from May 22 to 26 and at the Hong Kong Book Fair from July 16 to 22 [13].
The Next Direction of the Large-Model Wave: Ten Takeaways from AI Ascent 2025
Tencent Research Institute · 2025-05-23 07:47
Core Insights
- AI is expected to create trillion-dollar market opportunities, with all the necessary elements in place for an imminent explosion in AI development [3][7]
- The leap in AI capabilities such as coding points to an "era of abundance" in which labor becomes cheap and plentiful, while "taste" may become the new scarce asset [3][9]
- The number of foundational large models will be limited, with companies investing more in reinforcement learning to enhance model capabilities [3][4]

Group 1
- AI models may become sparser and more specialized, focusing on different areas of expertise and allowing dynamic resource allocation [4][17]
- Intelligent agents will work more capably, with better memory and self-guidance that enable longer autonomous operation [5][18]
- User engagement with AI products may evolve into a new business model in which personal background information is used to log in to multiple AI services [6][22]

Group 2
- Innovation in the AI era is occurring at the blurred line between model research and product development, which favors a bottom-up exploration approach [4][21]
- Organizations that develop software products will face challenges from AI code generation and will need structural and operational changes [5][24]
- Companies need to adopt a "stochastic mindset" to manage the uncertainties of AI, shifting from strict rule-driven approaches to dynamic adaptability [5][8]

Group 3
- Competition in AI applications is expected to intensify, leading to the formation of an "agent economy" [6][9]
- Startups should focus on solving complex problems that require human involvement and build data flywheels linked to specific business metrics [8][9]
- AI's impact on the economy will be profound, reshaping companies and the overall economic landscape [8][9]

Group 4
- OpenAI emphasizes maintaining organizational agility and aims to become a "core AI subscription" service [10][12]
- Models are believed to have 10-100x room for further improvement, with a focus on reinforcement learning to enhance model capabilities [10][11]
- The vision includes an AI application ecosystem that provides powerful tools and services for developers and users [12][13]

Group 5
- Google's approach focuses on hardware-software synergy to advance model development, and it predicts significant advances in AI capabilities within the next few years [14][15]
- Future models may use mixture-of-experts designs to improve computational efficiency and support continuous learning [17][18]
- AI's transformative potential in scientific research is highlighted, with expectations that AI will replace traditional simulation methods [18][19]

Group 6
- Anthropic advocates a bottom-up approach to AI product development, putting user needs ahead of technical showcases [20][21]
- The next generation of AI products will focus on autonomous agents capable of long-running operation and better collaboration [22][23]
- The rise of AI-generated content will require new standards for content traceability and security [22][24]
Tencent Research Institute AI Express 20250523
Tencent Research Institute · 2025-05-22 15:09
Group 1: OpenAI Innovations
- OpenAI's Responses API now supports MCP services, allowing developers to connect external services with simple configuration and significantly reducing development complexity [1]
- The updated API strengthens security controls through the allowed_tools parameter and permission management, so that agents use tools safely [1]
- New features include image generation, Code Interpreter, file search, background mode, reasoning summaries, and encrypted reasoning items [1]

Group 2: Microsoft's Magentic-UI
- Microsoft launched the open-source web agent project Magentic-UI, which can browse the web, read and write files, and execute code automatically, with user monitoring and control [2]
- The system uses a collaborative planning and execution mechanism: it generates task plans for user confirmation and allows real-time intervention during execution [2]
- The project integrates technologies such as neural style engines, component DNA mapping, and performance prediction for intelligent style conversion and component reuse [2]

Group 3: Mistral's Devstral Model
- Mistral, in collaboration with All Hands AI, released the open-source language model Devstral, which has 24 billion parameters and can run on a single RTX 4090 or a Mac with 32GB of RAM [3]
- Devstral scored 46.8% on the SWE-Bench Verified benchmark, outperforming GPT-4.1-mini and other open-source models and showing strong code understanding and problem-solving abilities [3]
- The model is released under the Apache 2.0 license, which permits commercial use, with API pricing set at $0.10 per million input tokens and $0.30 per million output tokens [3]

Group 4: xAI's Live Search API
- xAI introduced the Live Search API, which gives Grok real-time data access so it can retrieve the latest information from the X platform, web content, and breaking news [4][5]
- The API offers flexible search controls, including enabling or disabling search, limiting the number of results, and specifying time ranges and domains, and it can be combined with DeepSearch to display the reasoning process [5]
- A Python SDK is available, with free beta testing until June 5, 2025, allowing developers to implement real-time information queries and research assistance [5]

Group 5: OpenAI's Acquisition of Jony Ive's Team
- OpenAI acquired AI device startup io for $6.5 billion, gaining a hardware team led by former Apple Chief Design Officer Jony Ive, with the deal expected to close by summer [6]
- io is developing new forms of AI devices aimed at reducing screen time, including headphones, wearables, and AI home devices, with a projected release in 2026 [6]
- The associated company LoveFrom will continue to operate independently while taking on more design responsibilities for OpenAI, including the ChatGPT interface and voice interaction products [6]

Group 6: Kunlun Wanwei's Skywork Super Agents
- Kunlun Wanwei launched Skywork Super Agents, which integrates five expert agents and one general agent for one-stop generation of documents, PPTs, and spreadsheets [7]
- The product is built on deep research technology, supporting deep information retrieval and traceable content generation at only 40% of OpenAI's cost, and the framework has been open-sourced [7]
- System features include automated requirement clarification, information tracing, and a personal knowledge base that lets users upload various file formats [7]

Group 7: Microsoft's Aurora Model
- Microsoft introduced Aurora, the first large-scale atmospheric foundation model, trained on millions of hours of atmospheric data and computing 5000 times faster than the most advanced numerical forecasting systems [8]
- Aurora excels at predicting air quality, wave patterns, tropical cyclone trajectories, and high-resolution weather, and it stays accurate even in data-scarce regions and extreme weather [8]
- The model uses a 3D Swin Transformer architecture and can be fine-tuned for different application areas, with a training cycle of only 4-8 weeks; future expansion into ocean circulation and seasonal weather prediction is planned [8]

Group 8: Gartner's Principles for Intelligent Applications
- Gartner expects GenAI to turn enterprise software from auxiliary tools into intelligent agents and outlines five principles for building intelligent applications: adaptive experience, embedded intelligence, autonomous orchestration, interconnected data, and composable architecture [9]
- Intelligent applications emphasize personalized experiences and proactive services, carry out cross-system tasks through natural language interaction, and embed AI capabilities deeply in business logic for process optimization [9]
- Enterprises need to invest evenly across the five principles while upgrading foundational data, processes, architecture, and experiences, so that intelligent applications move from pilot demonstrations to value at scale [9]

Group 9: a16z's Insights on AI Programming
- The AI coding market has become the second-largest AI market after chatbots, valued at approximately $3 trillion, with developers adopting the tools quickly because they are early technology adopters [10]
- AI programming will not completely replace traditional programming; understanding foundational abstractions and system architecture remains crucial, and developer roles are shifting towards product management or QA engineering [10]
- New demographics and methods are fostering a new software paradigm, similar to the WordPress era: AI lowers the barrier to "writing code," yet the depth and complexity of software development still require professional knowledge [10]
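The Group 1 item above describes attaching an external MCP service through configuration and narrowing it with allowed_tools. A minimal sketch of what such a tool entry might look like as a request payload is given below. The field names (`type`, `server_label`, `server_url`, `allowed_tools`, `require_approval`) follow the publicly documented shape of the Responses API MCP tool as best understood here and should be checked against the current reference. The server label, URL, tool names, model name, and both helper functions are hypothetical placeholders. The code only builds the payload and does not call any service.

```python
import json
from typing import Any


def build_mcp_tool(server_label: str, server_url: str,
                   allowed_tools: list[str]) -> dict[str, Any]:
    """Build one MCP tool entry (hypothetical helper, not part of any SDK)."""
    # Design choice of this sketch: insist on an explicit allow-list so an
    # agent never receives every tool the server exposes by accident.
    if not allowed_tools:
        raise ValueError("allowed_tools must name at least one tool")
    return {
        "type": "mcp",
        "server_label": server_label,
        "server_url": server_url,
        # De-duplicate and sort so the payload is stable across runs.
        "allowed_tools": sorted(set(allowed_tools)),
        # Ask for confirmation before every tool call.
        "require_approval": "always",
    }


def build_request(model: str, prompt: str,
                  tools: list[dict[str, Any]]) -> dict[str, Any]:
    """Assemble the request body that would be sent to the API."""
    return {"model": model, "input": prompt, "tools": tools}


# Placeholder values: the server, its URL, and its tools are invented.
tool = build_mcp_tool(
    "example_docs",
    "https://mcp.example.com/mcp",
    ["search_docs", "read_page", "search_docs"],
)
request_body = build_request("gpt-4.1", "Summarize the install guide.", [tool])
print(json.dumps(request_body, indent=2))
```

Keeping the allow-list explicit mirrors the security emphasis in the summary: the agent sees only the tools named in the payload, and every call still waits for approval.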
Andrew Ng: How to Build Your Career in AI
Tencent Research Institute · 2025-05-22 09:35
Core Insights
- The article presents coding for artificial intelligence as a new literacy skill, akin to reading and writing [7][8]
- It outlines three key steps for career development in AI: learning foundational skills, working on projects, and finding a job [11][12]
- It discusses the technical skills needed for promising AI careers, including machine learning, deep learning, and software development [15][16]

Group 1: Importance of Coding and AI Skills
- Coding is becoming essential for effective communication between humans and machines as AI applications spread across industries [8][9]
- Foundational skills in AI include machine learning techniques such as linear regression and neural networks, along with the underlying mathematics [17][18]
- Continuous learning and adapting to new technologies are crucial in the rapidly evolving field of AI [19][20]

Group 2: Project Work and Career Development
- Project work deepens skills, builds a portfolio, and creates impact, all of which are vital for career advancement in AI [12][13]
- Identifying valuable projects involves understanding business problems, brainstorming AI solutions, and evaluating their feasibility [26][30]
- A supportive community is essential for navigating the challenges of project work and career transitions in AI [14][33]

Group 3: Job Search Strategies
- The job search process in AI typically involves researching roles, preparing for interviews, and using one's network to find opportunities [46][58]
- Informational interviews can provide valuable insight into specific roles and companies, helping candidates understand the skills required [52][54]
- A strong portfolio of projects that shows skill progression helps when seeking employment in AI [40][45]

Group 4: Overcoming Challenges
- Many people in the AI field experience imposter syndrome, which can hinder their confidence and growth [10][70]
- The article encourages embracing the learning journey and recognizing that mastery comes with time and experience [70]
Tencent Research Institute AI Express 20250522
Tencent Research Institute · 2025-05-21 15:01
Group 1
- Google Veo 3 features audio-visual synchronization, generating video, dialogue, lip movements, and sound effects based on prompts, providing a complete audio-visual experience [1]
- Gemini Diffusion generates text at a speed of 2000 tokens per second, capable of producing 10,000 tokens in 12 seconds, utilizing diffusion technology for rapid iteration and error correction [2]
- Tencent's TurboS ranks among the top eight globally, with improvements in reasoning and coding capabilities, and introduces new models for visual reasoning and voice communication [3]

Group 2
- ByteDance launches the Doubao voice podcast model, enabling rapid conversion from text to dual-dialogue podcasts, addressing traditional AI podcast challenges [4][5]
- Google introduces the Flow AI editing tool, supporting video generation and editing with various input methods, allowing for the export of high-quality video content [6]
- Google collaborates with Xreal to launch Project Aura smart glasses, featuring real-time translation and visual search capabilities, built on the Gemini platform [7]

Group 3
- NVIDIA's DreamGen project allows robots to learn autonomously in a generated "dream world," significantly improving success rates in various robotic applications [8]
- The FaceAge AI model predicts biological age from facial photos, showing significant correlations with cancer patient outcomes, though it has limitations in training data diversity [10]
- Microsoft's CPO emphasizes the shift in product management towards prompt-based development, highlighting the importance of taste and editing skills in the AI era [11]

Group 4
- The discussion on the implications of AI solving all problems raises concerns about human purpose and values in a future where traditional work may no longer be necessary [12]
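The two Gemini Diffusion figures quoted in Group 1 (2000 tokens per second, and 10,000 tokens in 12 seconds) describe different rates, which a short calculation makes explicit. The sketch below only does the arithmetic on the quoted numbers; the function names are illustrative, and reading the gap as fixed overhead outside raw generation is an assumption, not something the summary states.

```python
def seconds_at_rate(tokens: int, tokens_per_second: float) -> float:
    """Time needed to emit `tokens` at a constant generation rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return tokens / tokens_per_second


def effective_rate(tokens: int, seconds: float) -> float:
    """Average tokens per second over a whole run."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return tokens / seconds


# Figures quoted in the summary.
QUOTED_RATE = 2_000      # tokens per second
QUOTED_TOKENS = 10_000   # tokens produced
QUOTED_SECONDS = 12      # wall-clock seconds

ideal_seconds = seconds_at_rate(QUOTED_TOKENS, QUOTED_RATE)
average_rate = effective_rate(QUOTED_TOKENS, QUOTED_SECONDS)

print(f"time at quoted rate: {ideal_seconds:.1f} s")        # 5.0 s
print(f"average rate over 12 s: {average_rate:.0f} tok/s")  # 833 tok/s
```

At the quoted rate the run would take 5 seconds, so the 12-second figure corresponds to an average of roughly 833 tokens per second. One plausible reading is that 2000 tokens per second is a peak or sampling-only rate, while the 12 seconds covers the whole request.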
Tencent Research Institute Is Hiring a Digital Content Research Intern
Tencent Research Institute · 2025-05-21 07:51
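Position: Digital Content Research Intern, Tencent Research Institute

Job description
1. Research area: digital content, specifically games and esports research
2. Location: Asia Financial Center (亚洲金融大厦), Chaoyang District, Beijing
3. Pay: RMB 150 per day after tax
4. Schedule: on site 5 days a week for at least 6 months; candidates who can start immediately are preferred

Responsibilities
1. Provide research support on industry development, cultural integration, and technological innovation in games and esports.
2. Use a range of AI tools to carry out information lookup, data analysis, case studies, and article writing.
3. Handle other tasks as assigned.

Requirements
1. Current master's or doctoral students at key universities majoring in publishing, economics and management, statistics, media, or similar fields; candidates who follow frontier developments in games and related industries and have relevant research output may come from any major.
2. Familiarity with trends and technological innovation in the games and digital content industries, experience in internet-industry research, and independent views on major industry events.
3. Strong writing or data-analysis skills and industry research literacy; enjoys research and either intends to work in research or wants to build research skills.
4. A strong sense of responsibility and commitment; candidates able to intern for 6 months or more are preferred.

To apply, name the email subject and the attachment 【Name-University-Year-Major-x days per week】, send your resume to xuyuanhu@tencent.com, and attach research papers or other work.

...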
Tencent's Dowson Tong: Every Company Will Become an AI Company, and Everyone Will Be a "Super Individual"
Tencent Research Institute · 2025-05-21 07:51
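Dowson Tong, Senior Executive Vice President of Tencent and CEO of the Cloud and Smart Industries Group

"As AI continues to be put to work, every company is becoming an AI company, and every person will become an AI-empowered 'super individual.'" On May 21, the Tencent Cloud AI Industry Application Summit was held in Beijing. Dowson Tong, Senior Executive Vice President of Tencent and CEO of the Cloud and Smart Industries Group, said that breakthroughs in models' deep thinking have moved the usability of generative AI from quantitative change to qualitative change, that Tencent keeps increasing its AI investment, and that all of its businesses are fully embracing AI. Through "four accelerations" in large models, agents, knowledge bases, and infrastructure, Tencent is also building "AI that is good to use," helping AI reach every industry and everyone's daily life.

Since the start of this year, industry call volume for large-model APIs and demand for compute have also grown quickly. Tong believes generative AI has gradually crossed the "usability" threshold. Moving from "usable" to "good to use," and from "used by some" to "usable by everyone," still requires continued upgrades in interaction experience, execution capability, content accuracy, and deployment cost. Optimizing models improves performance and the interaction experience; agents give models the ability to carry out tasks independently; knowledge bases help reduce model hallucinations and give models a better understanding of enterprises and users; infrastructure and engineering optimization lower training and inference costs and improve response speed.

Models are the foundation of AI applications. Tencent's Hunyuan T1 and Turbo S continue to iterate, and in the authoritative global Chatbot Arena ranking, Hunyuan Turb ...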
Tencent Research Institute AI Express 20250521
Tencent Research Institute · 2025-05-20 16:01
Group 1: Microsoft Developments
- Microsoft has upgraded GitHub Copilot into a Coding Agent, automating the entire process of bug fixing and code maintenance [1]
- The Microsoft Discovery platform supports scientific innovation with capabilities for idea generation, result simulation, and autonomous learning [1]

Group 2: Google Innovations
- Google has launched the AI programming assistant Jules, which connects directly to GitHub and allows five free uses per day [2]
- Jules can complete coding tasks autonomously and generate detailed plans for developers to review [2]
- Gartner predicts that by 2028, 75% of new application development will use AI-assisted programming [2]

Group 3: Tencent's Gaming Engine
- Tencent has released Hunyuan Game (混元游戏), described as the first industrial-grade AIGC game content production engine, which cuts character generation time from 12 hours to 30 minutes [3]
- The platform offers core functions such as AI art pipelines and real-time canvas generation [3]

Group 4: AI Podcasting Tool
- Mars Electric Wave Company has introduced ListenHub, an AI tool that converts links and documents into podcasts, quickly turning content into audio [4][5]
- ListenHub is faster than Google NotebookLM and offers more natural Chinese voice output, although its content depth is limited [5]

Group 5: BAAI BGE Models
- BAAI (Beijing Academy of Artificial Intelligence, 智源研究院) has released three embedding models that achieved state-of-the-art results on various benchmarks [6]
- BGE-Code-v1 supports 14 programming languages and excels at code repository retrieval [6]

Group 6: Google NotebookLM App
- Google has launched the NotebookLM app for iOS and Android, featuring document-to-podcast functionality and offline audio playback [7]
- The app supports various document formats and is designed for students and lifelong learners [7]

Group 7: Microsoft Discovery in Research
- Microsoft Discovery enabled the discovery of new materials in just 200 hours without coding, significantly faster than traditional methods [8]
- The platform combines foundational and specialized models to support understanding of complex scientific data [8]

Group 8: Open Source Humanoid Robot
- UC Berkeley has developed an open-source humanoid robot, Berkeley Humanoid Lite, with a total cost under $5,000 [9]
- The robot has a modular design and supports bipedal walking and remote operation [9]

Group 9: AI's Impact on Programming
- Anthropic's CEO predicts that AI will be able to write 90% of code within 3-6 months, with 97% of technical personnel already using AI coding tools [10]
- Experts believe that AI will not replace programmers but will shift their roles towards guiding AI and innovating [10]

Group 10: Tencent's ima Product
- Tencent's ima team has developed a knowledge management platform that integrates AI capabilities naturally into its functions [11]
- The product has accumulated nearly 10 million pieces of content and emphasizes user feedback and experience optimization [11]
Hunyuan and the "Zero-Latency" Era of AI Image Generation
Tencent Research Institute · 2025-05-20 08:48
Core Viewpoint
- Tencent's Hunyuan Image 2.0 model represents a significant advancement in image generation technology, enabling real-time, high-quality image creation with minimal latency, thus enhancing user experience and productivity in various applications [3][4][10].

Group 1: Model Features
- Hunyuan Image 2.0 utilizes a high-compression image codec and a new diffusion architecture, achieving ultra-fast inference speeds and high-quality image generation [3].
- The model allows for "what you see is what you get" functionality, enabling users to see image changes in real-time as they input text prompts [4][11].
- Compared to existing models that take 5-10 seconds to generate images, Hunyuan Image 2.0 significantly reduces this time, providing a more efficient user experience [5][8].

Group 2: User Experience
- The model supports strong adherence to text prompts, allowing for real-time modifications of images based on user input [8].
- It offers two modes for image generation: "reference subject" and "reference outline," allowing users to set the intensity of reference features for more tailored outputs [19][22].
- Users can upload reference images and adjust the strength of adherence to the original image, enabling creative flexibility [19][20].

Group 3: Applications and Use Cases
- The technology serves as an instant design assistant, facilitating quick creation of illustrations for presentations and creative projects [5][8].
- For professional designers, the dual canvas feature allows for immediate previews of color and style changes, streamlining the creative process [27][30].
- The model's ability to generate images based on detailed prompts enables users to create complex visuals, such as character designs or themed illustrations, with minimal effort [15][33].

Group 4: Performance Metrics
- Hunyuan Image 2.0 outperforms competitors in various evaluation metrics, achieving a score of 0.9597 in overall performance, surpassing models like DALL-E 3 and CogView4-6B [7].
- The model demonstrates strong capabilities in generating images with specific attributes, such as color and position, indicating its advanced understanding of user prompts [7].

Group 5: Accessibility
- The model is currently available for public testing, allowing users to experience its capabilities firsthand [9].
- Its user-friendly interface enables individuals with no design background to easily create images, democratizing access to advanced image generation technology [27].
Tencent Research Institute AI Express 20250520
Tencent Research Institute · 2025-05-19 14:57
Group 1: OpenAI and G42 Data Center
- OpenAI is collaborating with G42 to build a 5 GW data center in Abu Dhabi that covers 10 square miles, an area larger than Monaco [1]
- The project is part of the "Stargate" initiative, consumes power equivalent to five nuclear power plants, and is four times the size of the Abilene, Texas facility [1]
- G42 withdrew its investments in China because of U.S. concerns over its ties with Chinese entities, while Microsoft invested $1.5 billion and placed executives on G42's board [1]

Group 2: NVIDIA's New Technologies
- NVIDIA launched the new Grace Blackwell GB300 system, which improves performance and lets 72 GPUs connect as a single giant GPU via NVLink technology [2]
- The NVLink Fusion plan lets partners integrate custom ASICs or CPUs into the NVIDIA ecosystem, supporting semi-custom AI infrastructure [2]
- The Isaac GR00T platform and the Cosmos physical AI model were introduced to strengthen robotics and digital twin technologies, with the Newton physics engine set to be open-sourced in July [2]

Group 3: Huawei's Innovations
- Huawei's Ascend introduced the CloudMatrix 384 super node and the Atlas 800I A2 server, surpassing NVIDIA's Hopper architecture in DeepSeek model inference performance [3]
- The "mathematics compensating for physics" strategy, which uses FlashComm communication and AMLA algorithms, addresses the challenges of deploying large-scale MoE models [3]
- The CloudMatrix 384 super node achieves a throughput of 1920 tokens/s at 50 ms latency, while the Atlas 800I A2 reaches 808 tokens/s at 100 ms latency, with plans to open-source the related technologies [3]

Group 4: Tencent's New QQ Browser
- Tencent released a new version of QQ Browser that integrates QBot, driven by a dual-model setup of Tencent's Hunyuan and DeepSeek and able to extract and organize answers from the internet [4][5]
- Key features include AI search, multimodal interaction, document interpretation and translation, intelligent writing, and learning assistance, with synchronization between PC and mobile [5]
- An AI toolbox provides format conversion, information extraction, and document processing functions that run directly in the browser without additional plugins [5]

Group 5: Bilibili's AniSora Model
- Bilibili open-sourced the animation video generation model Index-AniSora, which supports generation of various anime-style videos, was accepted at IJCAI25, and can be trained efficiently in distributed fashion on Huawei's 910B chip [6]
- The system comes in two versions: V1.0 based on CogVideoX-5B and V2.0 based on Wan2.1-14B, supporting spatiotemporal masking and local control and covering 80-90% of application scenarios [6]
- A dataset of tens of millions of text-video training pairs was built, and the first human-preference reinforcement learning model in the animation field was open-sourced, containing 30,000 labeled samples [6]

Group 6: Apple's Matrix3D Model
- Apple, in collaboration with Nanjing University, released the Matrix3D model, which generates high-quality 3D scene models from just three photos and has been open-sourced [7]
- Apple's leadership is pushing Siri towards a ChatGPT-like model; internal tests show the chatbot nearing ChatGPT's capabilities, and web search and app invocation features are planned [7]
- The company is handling Siri's upgrade strategy cautiously to avoid announcing features prematurely and is considering separating Siri from the Apple Intelligence brand to limit negative impact [7]

Group 7: GenSpark's Agentic AI
- GenSpark launched Agentic Download Agent, billed as the world's first AI download agent tool, which automates file download and processing through natural language commands [8]
- Using a Mixture-of-Agents architecture, it integrates eight language models of different scales and over 80 toolchains, cutting traditionally time-consuming tasks to minutes [8]
- An AI Drive smart cloud drive was introduced that supports various digital asset formats and allows secondary analysis of downloaded files, with an open API for enterprise system integration [8]

Group 8: Granola's AI Note-Taking Product
- Granola reached a valuation of $250 million after completing Series B funding and has become a preferred note-taking tool for founders and executives thanks to its efficient, personalized AI meeting recording feature [10]
- The product's core advantage is giving users control: it supports real-time editing and personalized recording while protecting privacy by not saving audio [10]
- The founder believes the key to AI tools is to enhance rather than replace human capabilities, with plans to evolve from a single note-taking tool into a comprehensive work platform that integrates personal context [10]

Group 9: Robotics Competition Achievements
- The first ManiSkill-ViTac 2025 tactile-visual fusion challenge concluded with Chinese teams winning three gold medals, and the results will be presented at the ICRA 2025 conference [11]
- The company Dexmal won gold in pure tactile control and tactile sensor design, improving success rates by 2-3 times through a dual-paradigm learning framework, while another company won gold in visual-tactile control [11]
- This event is the first public competition combining visual and tactile elements, promoting advances in tactile-visual fusion algorithms and bridging the gap between laboratory research and real-world applications [11]

Group 10: GitHub's Stance on Programming
- GitHub CEO Thomas Dohmke countered the "programming is useless" argument, stating that 2025 will be the year of programming agents while human programmers will still be needed to manage the software lifecycle [12]
- GitHub has released multiple SWE agent products, with Copilot users reaching 15 million, a fourfold increase, and plans to advance a multi-agent "band mode" [12]
- GitHub asserts that AI should serve as a high-level developer assistant and advocates continuous learning in programming to maintain guidance and control over AI systems [12]