"All Money, No Research": OpenAI Hit by Wave of Veteran Executive Departures, Mark Chen Rushes to Respond
36Kr· 2026-02-04 08:51
I can't even. OpenAI is mired in a wave of executive departures, and the internal "red alert" has sounded once again. Just look at the recent list of departures, every one of them an OpenAI veteran:
- Jerry Tworek: former OpenAI VP of Research, lead of o3/o1, core contributor to GPT-4/Codex;
- Andrea Vallone: former head of OpenAI's model policy team;
- Tom Cunningham: former head of OpenAI's economic forecasting and business planning;
- Hannah Wong: former OpenAI Chief Communications Officer;
- Matt Knight: former OpenAI Chief Information Security Officer;
……
Why is this happening? According to the Financial Times, the crisis is inseparable from a strategic pivot inside OpenAI. In short, commerce now outweighs research, and doing fundamental research at OpenAI increasingly looks like a dead end... (doge) No wonder the ambitious researchers are jumping ship one after another. Those siding with Mark Chen argue that companies exist to make money, and there's nothing wrong with that. Mark Chen himself couldn't sit still and immediately fired back: that account is completely wrong; fundamental research has always been OpenAI's core. With rumors flying on one side and the man himself pushing back on the other, onlookers are happily eating up the OpenAI drama. Anyway, let's first walk through what happened. All in LLM ...
OpenClaw Becomes a Legend Overnight: 6 God-Tier Tricks the Official Docs Won't Tell You
数字生命卡兹克· 2026-02-04 02:11
The buzz around OpenClaw (a.k.a. Clawdbot) continues. After several days of heavy use, I'll be honest: I rarely even open my beloved OpenCode anymore. It used to be that whenever I wanted to get something done on my computer, I would open Codex or OpenCode first and let them handle it. But now I'm used to giving OpenClaw orders over Feishu, because this thing is just too convenient: it sits in the background and you barely notice it. Wherever you are, whenever something occurs to you, open Feishu and just say the word. It beats the long warm-up of launching OpenCode or Codex by a mile. On Monday I made a very important decision: I backed up the important and sensitive files on my main MacBook, then wiped the machine entirely. One look at my disk space and you'll understand why. From here on, this chubby little crayfish and I grow together; we both start from zero, and the computer I use is its home. It's also an early taste of that promised life where everyone has a personal general-purpose Agent assistant. One quick tip: for the best experience, run OpenClaw on a Mac, not on a server or on Windows; the gap is genuinely huge. I also gave it a persona: "Your name is Xiao Ka. Your identity: the AI employee of me, Digital Life Kazik. Your personality: witty and humorous ..."
Zhipu Rallies Over 8% Against the Market as It Releases and Open-Sources GLM-OCR, with Performance Near Gemini 3 Pro Across Multiple Domains
Ge Long Hui· 2026-02-03 03:37
On February 3, AI startup Zhipu (2513.HK), one of the "six tigers" of China's homegrown large models, saw its shares rally against the market, at one point up over 8% to HK$243. The catalyst: Zhipu officially released and open-sourced GLM-OCR. Zhipu says the model has only 0.9B parameters and supports deployment via vLLM, SGLang, and Ollama. It outperforms a number of specialized OCR models across four sub-domains (text, formula, and table recognition, plus information extraction), with performance approaching Google's flagship model Gemini 3 Pro. In practice, GLM-OCR accurately parses scans, PDFs, tables, and receipts, handling hard cases such as handwriting, seals, vertical text, and mixed multilingual layouts. The company says it will keep iterating on GLM-OCR, release more model sizes, and extend the capability to more languages and to video OCR, broadening the reach of visual intelligence. Editor in charge: 栎树 ...
Behind the Color-Blindness Test AI Can't Read Lies a War Between Pixels and Poetry
数字生命卡兹克· 2026-02-03 01:31
Core Viewpoint
- The article discusses the limitations of AI in visual perception, particularly in color recognition tasks, suggesting that AI lacks the holistic understanding humans bring to visual information [13][62]

Group 1: AI's Color-Recognition Limitations
- Recent tests revealed that advanced AI models, including Gemini 3 Pro and Claude Opus 4.5, failed to identify the numbers in color-blindness tests, answering "74" and "8" instead of the correct "45" [5][6]
- The only model that succeeded was GPT 5.2 Thinking, which wrote code to visualize the digits, indicating reliance on an external tool rather than genuine perception [7]

Group 2: Human vs. AI Perception
- Humans perceive images as cohesive wholes, quickly organizing visual information into meaningful patterns, while AI processes images in fragments, losing overall comprehension [22][56]
- The article invokes Gestalt psychology: humans naturally integrate visual elements into a unified percept, whereas AI struggles with this holistic step [30][22]

Group 3: Research Findings
- A study titled "Pixels, Patterns, but No Poetry: To See the World like Humans" concludes that current AI does not "see" the world as humans do but computes it, missing the abstract, meaningful connections between visual elements [13][14]
- The study used a Turing Eye Test (TET) to probe AI's visual perception, revealing significant shortcomings in recognizing patterns and meanings in visual data [32][38]

Group 4: AI's Processing Mechanism
- AI models analyze images by splitting them into small patches, attending to local detail rather than overall context, which yields a fragmented understanding of visual information [54][56]
- Grad-CAM visualizations of the models' attention show they often fixate on irrelevant details rather than the features needed for accurate interpretation [39][41]

Group 5: Conclusion on AI's Visual Understanding
- AI's inability to prioritize and integrate visual information amounts to a form of "attention deficit": it can identify colors and patterns but fails to assemble them into a meaningful whole [62][60]
- This highlights a fundamental gap between human cognition and AI processing: AI can mimic intelligence but lacks the judgment to discern what truly matters in a visual scene [62][66]
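The patch-based processing described above can be made concrete with a minimal sketch (our own illustration of ViT-style tokenization, not the internals of any model named in the article): the image is cut into fixed-size patches and each patch is flattened into an independent token, so downstream attention operates on local fragments rather than the whole scene.

```python
import numpy as np

def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping flattened patches,
    the way ViT-style vision encoders tokenize their input."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    # (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C) -> (N, p*p*C)
    x = image.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, patch * patch * c)

# A 224x224 RGB image becomes 196 independent 16x16 tokens:
img = np.zeros((224, 224, 3), dtype=np.float32)
tokens = patchify(img, patch=16)
print(tokens.shape)  # (196, 768)
```

Each of those 196 tokens carries only local pixel data; any sense of the whole figure, such as the digit hidden in an Ishihara plate, has to be reassembled later, which is exactly where the article argues current models fall short.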
Kimi K2.5 Tops the Open-Source Rankings! The 15T-Token Training Recipe Made Public, and Yang Zhilin Teases K3
量子位· 2026-02-03 00:37
Core Insights
- Kimi K2.5 has achieved significant recognition, topping the Trending chart on Hugging Face with over 53,000 downloads [2]
- The model excels in agent capabilities, outperforming flagship closed-source models like GPT-5.2 and Claude 4.5 Opus across a range of benchmarks [3]
- Kimi K2.5's technical report reveals its development process and innovative features [5]

Group 1: Model Architecture and Training
- Kimi K2.5 is built on the K2 architecture and was continuously pre-trained on 15 trillion mixed visual and text tokens [6]
- The model takes a native multimodal approach, processing visual signals and text logic within the same parameter space [7]
- This extensive joint training yielded synchronized gains in visual understanding and text reasoning, breaking the previous trade-off between the two [8]
- Kimi K2.5 is highly cost-effective, achieving better performance than GPT-5.2 while consuming less than 5% of its resources [9]

Group 2: Visual Programming and Debugging
- The model unlocks "visual programming" capabilities, inferring code directly from video streams [11]
- Kimi K2.5 can accurately capture the dynamics of visual elements in a video and translate them into executable front-end code [12]
- To catch execution and styling issues, K2.5 integrates a self-visual-debugging mechanism that verifies the rendered interface against the expected outcome [14]
- If discrepancies are found, the model autonomously queries documentation to identify and correct the problem [15]
- This automated "generate-observe-query-fix" loop simulates a senior engineer's debugging process, letting the model complete end-to-end software engineering tasks on its own [16]

Group 3: Agent Swarm Architecture
- Kimi K2.5 features an Agent Swarm architecture, capable of autonomously assembling digital teams of up to 100 agents for parallel task execution [17]
- The system breaks complex tasks into numerous concurrent subtasks, significantly reducing processing time [18]
- The large team is managed by the PARL (Parallel Agent Reinforcement Learning) framework, comprising a core scheduler and multiple sub-agents [20][21]
- The scheduler oversees task distribution, while sub-agents focus on efficiently executing specific instructions [22]
- The design balances planning flexibility with the logical rigor required for large-scale parallel operation [23]

Group 4: Training and Efficiency
- Training employs a phased reward-shaping strategy to encourage efficient division of labor among agents [25]
- The reward initially incentivizes the scheduler to explore in parallel, then gradually shifts toward task success rate as training progresses [26]
- This schedule teaches the model to maximize concurrency while ensuring result accuracy [27]
- Efficiency evaluation incorporates critical steps as a core metric, emphasizing reduced end-to-end wait time [28]

Group 5: Future Developments and Community Engagement
- Following the K2.5 launch, Moonshot AI's founders held a 3-hour AMA on Reddit, discussing the model's development and future plans [29]
- The team hinted that the next-generation Kimi K3 may be built on a linear attention mechanism, promising significant advances [31]
- While they could not guarantee a tenfold improvement, they expect K3 to be a qualitative leap over K2.5 [32]
- They also addressed the model's occasional self-identification as Claude, attributing it to high-quality programming training data that mentioned Claude's name [34]
- The lab stresses that reaching AGI is not just about more compute but also about more efficient algorithms and smarter architectures [38]
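The scheduler/sub-agent split described above can be sketched with nothing more than a thread pool (a hypothetical toy, not Moonshot's actual PARL framework; `sub_agent` and the "part-i" decomposition are stand-ins for an LLM worker and a learned task split): a scheduler fans a task out into subtasks, workers execute them concurrently, and the scheduler collects the results in order.

```python
from concurrent.futures import ThreadPoolExecutor

def sub_agent(subtask: str) -> str:
    # Stand-in for an LLM worker executing one narrow instruction.
    return f"done:{subtask}"

def scheduler(task: str, n_agents: int = 4) -> list[str]:
    """Decompose a task into subtasks and run them in parallel,
    mimicking a scheduler dispatching to sub-agents (toy decomposition)."""
    subtasks = [f"{task}/part-{i}" for i in range(n_agents)]
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        # map() preserves input order, so merging results is trivial here.
        return list(pool.map(sub_agent, subtasks))

results = scheduler("build-report", n_agents=3)
print(results)
# ['done:build-report/part-0', 'done:build-report/part-1', 'done:build-report/part-2']
```

In the real system the hard parts are exactly what this toy elides: deciding how to split the task, and shaping the reward so the scheduler learns to maximize concurrency without sacrificing task success.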
Leaderboards Updated! Kimi K2.5 Stands Out | xbench Monthly Report
红杉汇· 2026-02-03 00:04
As of the end of January 2026, the scores on xbench's three leaderboards have been updated. Kimi K2.5 appears on all of them and stands out. xbench recently released two new benchmarks: BabyVision, which evaluates models' multimodal understanding, and AgentIF-OneDay, which evaluates agents' instruction-following on complex tasks.
• BabyVision: a benchmark for evaluating the multimodal visual understanding of large models, https://xbench.org/agi/babyVision
• AgentIF-OneDay: a benchmark for evaluating general agents' instruction-following in everyday scenarios with multiple attachments and complex tasks, https://xbench.org/agi/agentif
xbench uses an evergreen evaluation mechanism and continuously reports the capabilities of the latest models; more leaderboards will be updated over time. You can track our work and the live leaderboard rankings at xbench.org. If your company has a launched product and would like to join xbench evaluation and the leaderboards, contact us at team@xbench.org with your feedback.
xbench-ScienceQA Leaderboard update ...
LeCun Isn't Stopping at One Startup After Leaving Meta: Betting on a Route Unlike Large Models, He Joins the Board of a Silicon Valley Startup
量子位· 2026-01-30 04:23
Hengyu, from Aofeisi. QbitAI | WeChat official account QbitAI. After leaving the walled city of Meta, Yann LeCun seems to have taken "don't put all your eggs in one basket" to heart. On one hand, he founded his own startup, AMI, to make a play in the world-model race; at the same time, his gaze has turned to another corner of Silicon Valley. Just recently, LeCun officially announced he is joining a startup called Logical Intelligence as founding chair of its technical research committee. What makes this interesting is that Logical Intelligence has chosen a technical route quite different from today's mainstream large language models (LLMs). The company's flagship approach is an energy-based reasoning model that it says is "better at learning, reasoning, and self-correction." In a Sudoku test, Logical Intelligence's model Kona correctly filled in the grid in under 1s, while GPT 5.2, Claude Opus 4.5, and Claude Sonnet 4.5 had all been running for 100s with no result. [Screenshot: Kona 1.0 EBM done in 0.72s; GPT 5.2 still running at 99.10s] ...
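To make "energy-based" concrete (a toy sketch under our own assumptions, not Kona's actual model): an EBM assigns every candidate solution an energy that is low when constraints are satisfied, and inference searches for a low-energy state rather than generating an answer token by token. For Sudoku, a natural energy is simply the number of duplicated digits across rows, columns, and boxes.

```python
def sudoku_energy(grid: list[list[int]]) -> int:
    """Energy = number of constraint violations (duplicate digits)
    across rows, columns, and 2x2 boxes of a 4x4 Sudoku.
    A valid solution has energy 0; inference = energy minimization."""
    n, box = 4, 2
    def violations(cells):
        filled = [v for v in cells if v != 0]
        return len(filled) - len(set(filled))
    e = 0
    for i in range(n):
        e += violations([grid[i][j] for j in range(n)])  # row i
        e += violations([grid[j][i] for j in range(n)])  # column i
    for bi in range(0, n, box):
        for bj in range(0, n, box):
            e += violations([grid[bi + di][bj + dj]
                             for di in range(box) for dj in range(box)])
    return e

solved = [[1, 2, 3, 4],
          [3, 4, 1, 2],
          [2, 1, 4, 3],
          [4, 3, 2, 1]]
print(sudoku_energy(solved))  # 0
broken = [row[:] for row in solved]
broken[0][0] = 2              # duplicates a "2" in row 0, column 0, and its box
print(sudoku_energy(broken))  # 3
```

The appeal of this framing for puzzles is that the energy gives a global, differentiable-in-principle score to refine against, which is one plausible reason a dedicated energy-based solver can finish in under a second where autoregressive LLMs grind on.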
Cloud Capex Preview: Mission-Critical Spend to Ensure Durable Growth
2026-01-29 10:59
US Semiconductors. Cloud Capex Preview: mission-critical spend to ensure durable growth. Industry Overview: AI capex CY26/27 outlook now up +36%/+15% YoY. Data center remains a key stronghold into Q4 semis earnings, with four major US hyperscalers (Google, Microsoft, Meta, Amazon) set to report in the coming weeks. Ahead of earnings, our tracker indicates Q4 global hyperscale capex at $141bn, up +9% QoQ or +59% YoY. For CY26/CY27, capex points to $641bn/$739bn, up +36%/+15% YoY, up from ...
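The growth figures above can be sanity-checked in a couple of lines (our own arithmetic on the quoted numbers, not the report's model): +15% on the CY26 base of $641bn lands near the quoted CY27 figure of $739bn, and a +36% CY26 increase implies a CY25 base of roughly $471bn.

```python
cy26, cy27 = 641.0, 739.0   # $bn, quoted CY26/CY27 hyperscale capex
cy27_growth = cy27 / cy26 - 1      # implied CY27 YoY growth
implied_cy25 = cy26 / 1.36         # back out the CY25 base from +36% YoY
print(f"CY27 YoY: {cy27_growth:.1%}")               # ~15.3%
print(f"Implied CY25 base: ${implied_cy25:.0f}bn")  # ~$471bn
```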