Tencent Research Institute AI Digest 20260121
Tencent Research Institute · 2026-01-20 16:03
Group 1
- Musk has fulfilled his promise by open-sourcing the X platform's new recommendation algorithm, which is 100% AI-driven and removes hand-crafted features and rules [1]
- The algorithm uses the Thunder and Phoenix engines to build the feed, predicting 15 types of user behavior and combining them with weighted scoring, where a reply to the author is weighted 75 times a like [1]
- Negative feedback such as blocking and reporting sharply reduces visibility, while dwell time and genuine interaction are the core metrics, so even small accounts can gain exposure and a large follower base confers less advantage [1]

Group 2
- Zhipu AI has open-sourced the lightweight model GLM-4.7-Flash, with 30 billion total parameters but only 3 billion activated, aimed at local programming and intelligent assistants, with free API access [2]
- The model is the first to adopt DeepSeek's MLA architecture, supports a 200K context window, and scores 59.2 on the SWE-bench code-repair benchmark [2]
- Local deployment tests show it running at 43 tokens per second on Apple's M5 chip, with compatibility for HuggingFace, vLLM, and Huawei's Ascend NPUs [2]

Group 3
- MiniMax has unveiled Agent 2.0, positioned as an "AI-native workspace": a desktop application that bridges local and cloud seamlessly, operating on local files and launching web-automation tasks [3]
- The Expert Agents feature encapsulates private knowledge and industry SOPs into vertical-domain expert avatars, raising general expertise scores from 70 to as high as 100 [3]
- Users can customize Expert Agents for a closed loop from research to delivery; desktop versions are available for both Windows and Mac [3]

Group 4
- StepFun (Jieyue Xingchen) has open-sourced the multimodal small model Step3-VL-10B, which with only 10 billion parameters matches and in places surpasses models such as GLM-4.6V (106 billion) and Qwen3-VL (235 billion) across evaluations [4]
- The model combines strong visual perception, deep logical reasoning, and on-device agent interaction, reaching top-tier performance on the AIME math competition [4]
- Training used full-parameter joint pre-training on 1.2 trillion tokens of data, more than 1,400 reinforcement-learning iterations, and an innovative PaCoRe parallel coordinated-reasoning mechanism; both Base and Thinking versions are open-sourced [4]

Group 5
- Moonshot AI (月之暗面, "Dark Side of the Moon") is raising a new round at a $4.8 billion valuation, up $500 million from the $4.3 billion valuation announced just 20 days earlier, with the round expected to close soon [5]
- The company holds over 10 billion yuan in cash and is in no hurry to go public, planning to time an IPO as a lever to accelerate AGI development [5]

Group 6
- Superparameter Technology has launched COTA, a game agent driven entirely by a large model, achieving professional-level performance in FPS games with a visible reasoning chain [6]
- It uses a dual-system hierarchical architecture that mimics human fast and slow thinking: a Commander makes strategic decisions while an Operator executes actions at millisecond latency, cutting response time to 100 ms [6]
- The product validates the feasibility of large models in high-frequency competitive gaming and offers reference ideas for embodied intelligence and other real-world problems [6]

Group 7
- Microsoft CEO Satya Nadella said at the Davos Forum that mastering model orchestration is essential to building a competitive edge in the AI era [7]
- Scaling AI requires improving "token efficiency per dollar per watt" on the supply side, while the demand side requires companies to drive transformation across concepts, capabilities, and data [7]
- True "enterprise sovereignty" means converting unique experience and knowledge into proprietary AI models so that core value does not flow to model providers [7]

Group 8
- a16z's analysis finds that while ChatGPT dominates with 800-900 million weekly active users, Gemini is growing at 155%, pointing to a "winner-takes-most" market in AI assistants [8]
- OpenAI's new shopping, task, and learning experiences pushed through the ChatGPT interface have not truly broken through, constrained by a chatbox interface that cannot deliver a best-in-class product experience [8]
- Successful AI products such as Replit, Suno, and Character AI share a distinct, focused interface, suggesting that startup opportunities lie in deep optimization for specific workflows [8]

Group 9
- Anthropic's research team has found that model personalities can be quantified, with a dominant dimension, the "assistant axis," measuring how far a model operates in "intelligent assistant" mode [9]
- Intervening along the assistant axis can control role-playing willingness, significantly reducing harmful-response rates and defending against persona jailbreak attacks [9]
- The proposed "activation ceiling" technique can cut persona-jailbreak success rates by nearly 60% without significantly impairing model performance, opening a new path for human control of AI [9]
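The weighted behavior-scoring described in Group 1 can be sketched as a simple ranker: predict a probability for each of several user behaviors, multiply by per-behavior weights (positive for engagement, negative for blocks and reports), and sort posts by the resulting score. This is a minimal illustrative sketch, not X's actual algorithm: the behavior names and every weight except the stated 75:1 reply-to-like ratio are assumptions.

```python
# Illustrative sketch of a weighted engagement-scoring ranker as summarized
# in Group 1. All behavior names and weights are hypothetical except the
# digest's stated 75x reply-to-like ratio; this is NOT X's real algorithm.

# Relative weight per predicted behavior (assumed values, sign indicates
# whether the behavior boosts or suppresses visibility).
WEIGHTS = {
    "like": 1.0,
    "reply_to_author": 75.0,   # stated: 75x the weight of a like
    "repost": 10.0,            # assumed
    "dwell_time": 5.0,         # assumed: time spent is a core metric
    "block": -100.0,           # assumed: negative feedback cuts visibility
    "report": -150.0,          # assumed
}

def score_post(predicted_probs: dict) -> float:
    """Weighted sum of predicted behavior probabilities for one post."""
    return sum(WEIGHTS.get(behavior, 0.0) * p
               for behavior, p in predicted_probs.items())

def rank_feed(posts: dict) -> list:
    """Order candidate post IDs by descending engagement score."""
    return sorted(posts, key=lambda pid: score_post(posts[pid]), reverse=True)

if __name__ == "__main__":
    candidates = {
        "small_account_post": {"like": 0.10, "reply_to_author": 0.05},
        "big_account_post":   {"like": 0.40, "block": 0.02},
    }
    # The small account's post scores higher on predicted genuine
    # interaction, illustrating how follower count alone no longer wins.
    print(rank_feed(candidates))
```

Note how the large negative weights make even a small predicted probability of a block or report outweigh a high like probability, matching the digest's point that negative feedback dominates visibility.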