Tencent Research Institute AI Digest 20251216
Tencent Research Institute · 2025-12-15 16:22

Group 1: Manus 1.6 Release
- Manus 1.6 Max has shifted from "auxiliary tool" to "independent contractor," lifting user satisfaction by 19.2%; it can independently complete complex Excel financial modeling and data analysis [1]
- New mobile development features support end-to-end app development: users describe their requirements and receive runnable iOS and Android applications [1]
- The new Design View enables localized image editing, precise text rendering, and multi-layer composition, addressing the uncontrollability of AI-generated images [1]

Group 2: OpenAI Circuit-Sparsity Model
- OpenAI has released a circuit-sparsity model with only 0.4 billion parameters that forces 99.9% of weights to zero (only 0.1% remain non-zero), targeting model interpretability [2]
- The sparse model forms compact, readable "circuits" roughly 16 times smaller than those of comparable dense models, though it runs 100 to 1,000 times slower [2]
- The research team proposed a "bridge network" approach that inserts encoder-decoder pairs between sparse and dense models, enabling interpretable behavior editing of existing large models [2]

Group 3: Thinking Machines Product Update
- Thinking Machines, founded by former OpenAI CTO Mira Murati, has opened access to Tinker, an API that lets developers fine-tune language models [3]
- The update adds fine-tuning support for Kimi K2 Thinking (designed for long-chain reasoning) and visual input via Qwen3-VL (available in 30B and 235B variants) [3]
- A new OpenAI API-compatible inference interface lets users integrate with any platform that supports the OpenAI API, simplifying the post-training workflow for LLMs [3]

Group 4: NotebookLM Integration with Gemini
- NotebookLM is now officially integrated with Gemini: users can add NotebookLM notebooks as data sources for Q&A inside Gemini conversations [4]
- Gemini acts as a "hub" connecting multiple NotebookLM notebooks, working around NotebookLM's lack of notebook merging and enabling simultaneous queries across notebooks [4]
- NotebookLM content can now be combined with online information for mixed "personal data + global information" analysis, integrating NotebookLM into Google's core AI product line [4]

Group 5: Tongyi's Model Releases
- Tongyi Bailing has upgraded the Fun-CosyVoice3 model, halving initial latency and doubling mixed Chinese-English recognition accuracy; it supports 9 languages and 18 dialects with cross-lingual voice cloning and emotion control [5]
- The Fun-ASR model reaches 93% accuracy in noisy environments, supports lyrics and rap recognition, covers 31 freely mixable languages, and cuts first-word latency to 160 ms [5][6]
- The open-source Fun-CosyVoice3-0.5B provides zero-shot voice cloning, while the lightweight Fun-ASR-Nano-0.8B offers lower inference cost [6]

Group 6: Zoom's AI Claims
- Zoom claims a score of 48.1% on the Humanity's Last Exam (HLE) benchmark, 2.3 percentage points above Google Gemini 3 Pro's 45.8% [7]
- The company uses a "federated AI approach" that combines its own small language model with open- and closed-source models from OpenAI, Anthropic, and Google, selecting among outputs with a "Z-scorer" scoring system [7]
- The score has not appeared on the official HLE leaderboard, and on the same day Sup AI announced a score of 52.15%; the claim signals Zoom's ambition to become the AI hub of enterprise workflows [7]

Group 7: Gemini 3's CFA Exam Performance
- Recent research shows reasoning models passing all three levels of the CFA exam: Gemini 3.0 Pro set a record 97.6% on Level 1, while GPT-5 led Level 2 at 94.3% [8]
- On Level 3, Gemini 2.5 Pro scored 86.4% on multiple-choice questions and Gemini 3.0 Pro reached 92.0% on open-ended questions, a marked improvement over previous years [8]
- Experts caution that passing exams does not equal practical competence: AI still struggles with ethics questions and cannot replace analysts' strategic thinking and client communication [8]

Group 8: OpenEvidence Valuation Surge
- OpenEvidence is raising a $250 million equity round at a $12 billion post-money valuation, double its valuation from two months ago [9]
- The company earns revenue by selling chatbot advertising space to pharmaceutical companies, with annual ad revenue of roughly $150 million (tripled since August) and gross margins above 90% [9]
- An OffCall survey indicates about 45% of U.S. doctors use OpenEvidence, which answers roughly 20 million questions per month; its grounding in medical journals makes it more accurate than general-purpose chatbots [9]

Group 9: OpenAI's Sora Development Insights
- OpenAI built the Android version of Sora in just 28 days with a team of 4 engineers working alongside the Codex AI agent, consuming about 5 billion tokens, with roughly 85% of the code generated by AI [10]
- The team used an "exploration-validation-federation" workflow: Codex handled heavy coding while engineers focused on architecture, user experience, and quality control, achieving a 99.9% crash-free rate [10]
- Codex now accounts for about 70% of OpenAI's internal pull requests each week and can monitor its own training runs and handle user feedback, a self-improving loop of "AI iterating AI" [10]
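The extreme weight sparsity described in Group 2 (99.9% zeros, 0.1% non-zero) can be illustrated with simple magnitude pruning: keep only the largest-magnitude fraction of weights and zero the rest. This is a minimal sketch of what "enforcing sparsity" means numerically, not OpenAI's actual training procedure, which shapes sparsity during training rather than pruning after the fact.

```python
import numpy as np

def enforce_sparsity(weights: np.ndarray, density: float = 0.001) -> np.ndarray:
    """Keep only the largest-magnitude `density` fraction of weights; zero the rest.

    Illustrative magnitude pruning -- not OpenAI's circuit-sparsity training method.
    """
    k = max(1, int(weights.size * density))
    # Threshold at the k-th largest absolute value.
    threshold = np.partition(np.abs(weights).ravel(), -k)[-k]
    mask = np.abs(weights) >= threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(1000, 1000))           # a dense 1M-parameter "layer"
w_sparse = enforce_sparsity(w, density=0.001)
print(f"non-zero fraction: {np.count_nonzero(w_sparse) / w_sparse.size:.4%}")
```

With 0.1% density, only ~1,000 of the 1,000,000 weights survive; the appeal of such a layer is that each remaining connection can be inspected individually, which is what makes the resulting "circuits" readable.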
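Zoom has not published how the "Z-scorer" in Group 6 works. As a hedged illustration of one plausible reading, candidate outputs from several models could each be scored by multiple judges, the scores standardized per judge (z-scores), and the candidate with the highest combined z-score selected. Everything here (the function name, the judge scores) is hypothetical:

```python
import statistics

def zscore_select(candidates: list[str], scores_by_judge: list[list[float]]) -> str:
    """Pick the candidate with the highest total z-score across judges.

    `scores_by_judge`: one score list per judge, one score per candidate.
    Hypothetical sketch -- Zoom's actual Z-scorer internals are unpublished.
    """
    totals = [0.0] * len(candidates)
    for scores in scores_by_judge:
        mean = statistics.fmean(scores)
        spread = statistics.pstdev(scores) or 1.0  # avoid division by zero
        for i, s in enumerate(scores):
            totals[i] += (s - mean) / spread       # standardize per judge
    return candidates[max(range(len(candidates)), key=totals.__getitem__)]

answers = ["draft A", "draft B", "draft C"]
judge_scores = [[0.6, 0.9, 0.5],   # e.g. an in-house small model's ratings
                [0.7, 0.8, 0.4]]   # e.g. an external frontier model's ratings
print(zscore_select(answers, judge_scores))  # prints "draft B"
```

Standardizing per judge keeps a generous scorer from dominating a strict one, which is the usual motivation for z-scoring in ensemble selection.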
