Multimodal Reasoning
Late Night: The $3 Trillion Giant Surges
Shang Hai Zheng Quan Bao· 2025-11-19 15:45
[Stock-app screenshot: Alphabet Class A (GOOGL) quoted at 302.860, up 18.580 (+6.54%) from a previous close of 284.280; intraday high 303.680, low 286.630; market cap $3.65 trillion; P/E 29.41.] ...
Gemini 3.0 Released: From "Tool Assistance" to "Proactive Agent", What Google Did
Tai Mei Ti APP· 2025-11-19 00:32
Reportedly, Gemini 3 will be integrated into the Gemini app, Google's AI search products AI Mode and AI Overviews, and its enterprise offerings. The model becomes available to some subscription users starting Tuesday and will roll out more broadly over the coming weeks.

On the November 2025 earnings call, Google CEO Sundar Pichai had already confirmed the Gemini 3 release plan, stressing at the time: "Further progress on frontier models takes more time; we pursue iteration speed, but above all we must ensure significant capability breakthroughs." This deliberate, quality-first strategy is fully reflected in Gemini 3's product form: it is not a light fine-tune of 2.5 Pro but a comprehensive rebuild spanning architecture, capabilities, and ecosystem.

Reasoning is at the core of how AI solves complex problems, and Gemini 3 delivers a double breakthrough here: an across-the-board lift in baseline performance and the productization of reasoning modes. On baseline reasoning, Gemini 3 Pro set new highs on multiple authoritative benchmarks: 91.9% accuracy on GPQA Diamond (graduate-level reasoning), 37.5% on Humanity's Last Exam (multi-step logical reasoning) without tools, and an industry-leading 72.1% on SimpleQA Verified (factual accuracy). These figures mean that in scenarios requiring deep thinking, such as scientific research and professional consulting, the model's reliability reaches ...
Gemini 3 Officially Released
小熊跑的快· 2025-11-19 00:09
Core Insights
- Google has officially launched Gemini 3, the most powerful multimodal understanding model to date, enhancing interactive experiences and reasoning capabilities [1][4]
- Gemini 3 Pro and Gemini 3 Deep Think are key versions, with the latter showing superior performance in reasoning tasks [4][10]

Performance Metrics
- Gemini 3 Pro achieved a score of 1501 Elo, ranking first on the LMArena leaderboard (see the Elo sketch below), and demonstrated doctoral-level reasoning with a 37.5% score on Humanity's Last Exam [1][3]
- In various benchmarks, Gemini 3 Pro outperformed previous models, achieving 91.9% on GPQA Diamond and 23.4% on MathArena Apex [3][4]
- Gemini 3 Deep Think further improved performance, scoring 41.0% on Humanity's Last Exam and 93.8% on GPQA Diamond [4]

Multimodal Capabilities
- Gemini 3 is designed to seamlessly integrate information across text, images, videos, audio, and code, pushing the boundaries of multimodal reasoning [6]
- It can generate interactive learning materials and analyze performance in various activities, such as sports [7]

Developer Tools and Platforms
- Gemini 3 enhances developer efficiency through vibe coding and agentic coding, leading to significant improvements in software development tasks [8][10]
- Google Antigravity, a new development platform, allows developers to build in a task-oriented manner, transforming AI into a proactive partner [9][10]

User Experience
- Google AI Ultra subscribers can access Gemini's advanced capabilities, enabling more effective long-term planning and task execution [11]
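For readers unfamiliar with the leaderboard figures, the "1501 Elo" number is a rating derived from pairwise human preference votes. As a rough intuition only, the classic Elo expected-score formula below shows what a rating gap implies about head-to-head win probability; LMArena's published methodology reportedly fits a Bradley-Terry model and reports scores on an Elo-like scale, so this is not their exact procedure, and the 1450 opponent rating is a made-up placeholder rather than any model's published score.

```python
# Rough intuition for leaderboard "Elo" scores: a rating gap maps to an
# expected head-to-head preference rate. This is the classic Elo formula,
# not LMArena's exact methodology; the 1450 opponent rating is a placeholder.

def elo_expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B under classic Elo."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

if __name__ == "__main__":
    # A roughly 50-point gap corresponds to about a 57% pairwise win rate.
    print(f"{elo_expected_win_rate(1501.0, 1450.0):.3f}")  # ~0.573
```

Read this way, even a gap of several dozen rating points translates into only a modest pairwise preference margin, which is why small leaderboard differences are often treated with caution.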
Gemini 3 Arrives Late at Night: Outmuscling GPT-5.1, the Google Era of Large Models Is Here
36Ke· 2025-11-19 00:04
Core Insights
- The release of Gemini 3 has generated significant anticipation within the AI community, marking a pivotal moment for Google in the AI landscape [1][4][5]
- Gemini 3 is positioned as a major step towards AGI, showcasing advanced multimodal understanding and interaction capabilities [6][10]
- The model has set new SOTA standards in various AI benchmarks, outperforming its predecessor Gemini 2.5 Pro and competing models like Claude Sonnet 4.5 and GPT-5.1 [7][8]

Model Performance
- Gemini 3 Pro achieved a groundbreaking Elo score of 1501 on the LMArena Leaderboard, demonstrating exceptional reasoning capabilities [7]
- In key benchmarks, Gemini 3 Pro scored 37.5% in Humanity's Last Exam (no tools), 91.9% in GPQA Diamond, and 23.4% in MathArena Apex, establishing new standards in academic reasoning and mathematics [8]
- The model excelled in multimodal reasoning, scoring 81% in MMMU-Pro and 87.6% in Video-MMMU, indicating its proficiency in understanding complex scientific charts and dynamic video streams [7][8]

Interaction and Usability
- Gemini 3 Pro has improved interaction quality, providing concise and direct responses, thus acting as a true thinking partner [9]
- The Deep Think mode enhances reasoning and multimodal understanding, achieving scores of 41.0% in Humanity's Last Exam and 93.8% in GPQA Diamond [10][13]
- The model supports various learning modalities, allowing users to learn through text, images, videos, and code, thus broadening its application [14][15]

Development and Integration
- Gemini 3 is designed to help developers transform ideas into reality, excelling in zero-shot generation and interactive web UI rendering [16]
- The model ranks first in the WebDev Arena with an Elo score of 1487, showcasing its capabilities in web development tasks [16]
- Google Antigravity, a new development platform, allows developers to leverage Gemini 3 for building applications with enhanced interactivity and visual effects [24][17]

Market Impact and Adoption
- Gemini 3 is now available to general users and developers through various platforms, indicating a strategic move to enhance user engagement [19]
- The model's pricing structure is based on context length, with specific rates for tasks under and over 200k tokens (see the cost sketch below) [21]
- Google has seen a resurgence in market confidence, with significant user engagement metrics, including 2 billion monthly active users for AI Overviews and 650 million for Gemini applications [34][36]
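The pricing item above gives only the 200k-token tier boundary, not the rates themselves. As a minimal sketch, assuming a simple two-tier scheme keyed on prompt length, a cost estimator might look like the following; the dollar figures and the choice to key the tier on prompt tokens are placeholder assumptions for illustration, not Google's published Gemini 3 prices.

```python
# Hypothetical sketch of tiered, context-length-based API pricing.
# The 200k-token threshold comes from the article; the dollar rates below are
# placeholder values, NOT Google's published Gemini 3 prices.

LONG_CONTEXT_THRESHOLD = 200_000  # tokens, per the article

# (input $/1M tokens, output $/1M tokens) -- placeholder values
SHORT_CONTEXT_RATES = (2.00, 12.00)
LONG_CONTEXT_RATES = (4.00, 18.00)

def estimate_cost(prompt_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost under the assumed two-tier scheme."""
    in_rate, out_rate = (LONG_CONTEXT_RATES
                         if prompt_tokens > LONG_CONTEXT_THRESHOLD
                         else SHORT_CONTEXT_RATES)
    return prompt_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

if __name__ == "__main__":
    print(f"${estimate_cost(50_000, 2_000):.2f}")    # short-context request
    print(f"${estimate_cost(350_000, 2_000):.2f}")   # long-context request
```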
2025 Global Machine Learning Technology Conference: Full Agenda Released, with Top Speaker Lineup and Attendance Guide
AI科技大本营· 2025-10-14 11:14
Core Insights
- The 2025 Global Machine Learning Technology Conference will be held on October 16-17 in Beijing, featuring prominent figures from the AI industry, including researchers from OpenAI and other leading tech companies [1][3][11].

Group 1: Conference Overview
- The conference will gather experts from top tech companies and research institutions to discuss cutting-edge topics such as large models, intelligent agent engineering, and multimodal reasoning [3][12].
- Keynote speakers include Lukasz Kaiser, a co-creator of GPT-5 and GPT-4, and Li Jianzhong, Vice President of CSDN, who will present insights on AI industry paradigms and the evolution of large models [4][5].

Group 2: Key Presentations
- Li Jianzhong will present on "Large Model Technology Insights and AI Industry Paradigm Insights," focusing on the technological evolution driven by large models [4].
- Michael Wong will discuss the "AI Platform Paradox," analyzing why many open-source AI ecosystems have failed and how to create a thriving environment [4].

Group 3: Roundtable Discussions
- A roundtable titled "Core Issues in the AI Industry Paradigm Shift" will feature discussions among industry leaders on the evolution of AI paradigms and the challenges of technology implementation [10].
- Participants include Li Jianzhong, Wang Bin from Xiaomi, and other notable scientists, fostering a high-density exchange of ideas [10].

Group 4: Afternoon Sessions
- The afternoon sessions on October 16 will cover topics including the evolution of large language models, intelligent agent engineering, and AI-enabled software development [12][18].
- Notable speakers include experts from ByteDance, Tencent, and other leading firms, sharing their latest breakthroughs and insights [13][19].

Group 5: Second Day Highlights
- The second day will feature multiple specialized sessions on embodied intelligence, AI infrastructure, and practical applications of large models [18][19].
- Key presentations will include discussions on the next generation of AI agents and the integration of AI technologies across industries [20][22].
Farewell, Human Champions: AI Sweeps the Astronomy Olympiad, with GPT-5 Scoring 2.7 Times as High as the Gold Medalists
36Ke· 2025-10-12 23:57
Core Insights
- AI models GPT-5 and Gemini 2.5 Pro achieved gold medal levels in the International Olympiad on Astronomy and Astrophysics (IOAA), outperforming human competitors in theoretical and data analysis tests [1][3][10]

Performance Summary
- In the theoretical exams, Gemini 2.5 Pro scored 85.6% overall, while GPT-5 scored 84.2% [4][21]
- In the data analysis exams, GPT-5 achieved a score of 88.5%, significantly higher than Gemini 2.5 Pro's 75.7% [5][31]
- The performance of the AI models in IOAA 2025 was remarkable, with GPT-5 scoring 86.8%, which is 443% above the median, and Gemini 2.5 Pro scoring 83.0%, 323% above the median [22]

Comparative Analysis
- The AI models consistently ranked among the top performers, with GPT-5 and Gemini 2.5 Pro surpassing the best human competitors in several years of the competition [40][39]
- The models demonstrated strong capabilities in physics and mathematics but struggled with geometric and spatial reasoning, particularly in the 2024 exams, where geometry questions were predominant [44][45]

Error Analysis
- The primary sources of errors in the theoretical exams were conceptual mistakes and geometric/spatial reasoning errors, which accounted for 60-70% of total score losses [51][54]
- In the data analysis exams, errors were more evenly distributed across categories, with significant issues in plotting and interpreting graphs [64]

Future Directions
- The research highlights the need for improved multimodal reasoning capabilities in AI models, particularly in spatial and temporal reasoning, to enhance their performance in astronomy-related problem solving [49][62]
Meta Just Poached Tsinghua Alumnus Yang Song From OpenAI
36氪· 2025-09-26 13:35
Core Viewpoint
- The recent hiring of Yang Song, a key figure in diffusion models and an early contributor to DALL·E 2, by Meta Superintelligence Labs (MSL) signals a strategic move in the AI competition, enhancing MSL's talent pool and research capabilities [2][3][11]

Group 1: Talent Acquisition and Team Structure
- Yang Song's addition to MSL strengthens the team's "dual-core" structure, with one leader managing overall strategy and the other focusing on critical paths in research [16]
- The team composition is becoming clearer, with a more structured division of research responsibilities [17]
- Since summer, over 11 researchers from OpenAI, Google, and Anthropic have joined MSL, indicating a high-frequency recruitment strategy [20]

Group 2: Industry Trends and Dynamics
- The rapid turnover of talent among top AI labs is becoming more common, reflecting a shift towards project compatibility and team dynamics as key factors in employment decisions [25]
- The relationship between researchers and labs is evolving into a "mutual pursuit," where both parties seek alignment in goals and capabilities [47]
- The competition for AI talent is intensifying, with increasing demands on researchers to understand cross-modal capabilities and complete data workflows [48]

Group 3: Research Focus and Strategic Alignment
- Yang Song's research on diffusion models aligns closely with MSL's strategic direction, aiming to develop universal models that can understand various data forms [28][30]
- The integration of Yang Song's expertise is expected to enhance MSL's ability to create a comprehensive AI product system, accelerating the formation of a complete technical loop from modeling to execution [32][41]
- Meta is not only attracting top talent but is also working to transform these capabilities into organizational and product-level resources [44]
Breaking: Meta Just Poached Tsinghua Alumnus Song Yang From OpenAI
36Ke· 2025-09-25 11:56
Core Insights
- Meta has successfully recruited Song Yang, a key figure in diffusion models and an early contributor to DALL·E 2 technology, to lead research at Meta Superintelligence Labs (MSL) [1][12][29]
- This recruitment signals a strategic shift for Meta, indicating a move towards a more collaborative team structure rather than relying solely on individual talent [12][13]

Group 1: Team Dynamics
- The combination of Song Yang and Shengjia Zhao represents a transition for MSL from a focus on individual excellence to a more coordinated team approach [12][13]
- Both individuals share a strong academic background, having studied at Tsinghua University and Stanford, and have significant experience at OpenAI [13][14]
- The team structure is becoming clearer, with defined roles that enhance research efficiency and collaboration [13][29]

Group 2: Talent Acquisition Trends
- Meta's recruitment pace has accelerated, with over 11 researchers from OpenAI, Google, and Anthropic joining MSL since summer [14][18]
- There is a notable trend of talent movement among top AI labs, indicating that project alignment and team culture are becoming critical factors in employment decisions [14][18]
- The departure of some researchers, such as Aurko Roy, highlights the competitive nature of talent retention in the AI sector [14][18]

Group 3: Strategic Focus
- Song Yang's research aligns closely with MSL's strategic direction, particularly in multimodal reasoning and the development of general models that can process various data types [18][29]
- His expertise in diffusion models is expected to enhance MSL's capabilities in generative AI, contributing to a more integrated research approach [18][28]
- The ongoing evolution of AI projects necessitates a deeper understanding of cross-modal interactions and the integration of research into practical applications [29]
Alibaba Open-Sources Flagship Qwen3-VL Models in Two Versions
Di Yi Cai Jing· 2025-09-25 06:08
Core Insights
- Alibaba has launched the upgraded Qwen3-VL series, the most powerful visual understanding models in the Qwen series to date [1]
- The flagship model, Qwen3-VL-235B-A22B, has been open-sourced, featuring both Instruct and Thinking versions [1]
- The Instruct version outperforms or matches the performance of Gemini 2.5 Pro in several mainstream visual perception evaluations [1]
- The Thinking version achieves state-of-the-art (SOTA) performance across various multimodal reasoning benchmarks [1]
Zidong Taichu 4.0 Released: Domestic Large Models Move Toward a New Stage of "Seeing, Recognizing, and Thinking Simultaneously"
Di Yi Cai Jing· 2025-09-19 16:08
Core Insights
- The first fully domestically developed deep reasoning model, "Zidong Taichu" 4.0, was launched in Wuhan on September 19, showcasing advanced multimodal reasoning capabilities that surpass GPT-5, particularly in complex reasoning and tool usage with visual inputs [1][4]

Model Development
- "Zidong Taichu" 4.0 represents a significant evolution from its predecessor 3.0, transitioning from "pure text reasoning" to "fine-grained multimodal semantic reasoning," achieving a threefold leap in capabilities [3][5]
- The model can perform complex reasoning tasks, such as determining the number of shots needed to win a snooker game by analyzing images of the game table [3]

Performance Metrics
- In video multimodal applications, "Zidong Taichu" 4.0 can deeply understand 180-minute long videos, achieving state-of-the-art performance across six tasks, including video Q&A and content summarization [4]
- The model's reasoning speed has improved by approximately 15% compared to version 3.0, enhancing its application in industrial settings, such as high-precision laser welding technology [4][6]

Technological Innovations
- The model incorporates three core technological innovations: low-cost data synthesis for real events, critical multi-round reflective learning, and difficulty-sensitive adaptive reinforcement learning, which collectively enhance training efficiency and reasoning performance by about 15% [5][6]

Industry Impact
- The launch of the "Zidong Taichu Cloud" platform aims to convert the model's technological advantages into industrial value, providing comprehensive support for enterprises in computing power, application development, and implementation [6]
- The platform is positioned as the first natively collaborative cloud for multimodal large models in China, facilitating the integration of AI capabilities into core business operations [6]

Economic Context
- The current era is characterized as a computing-power economy, in which computing power, data, and algorithms are the key resources driving the digital economy, highlighting the need for rapid iteration and widespread application of AI technologies [6]