IMO Blasts OpenAI's Self-Proclaimed Gold Medal: "None of the 91 Judges Took Part in Grading"; Netizens: The Hype Knows No Bounds
量子位· 2025-07-21 04:23
梦晨 鹭羽, from Aofeisi — QbitAI | WeChat official account QbitAI

Less than 24 hours after OpenAI claimed its new model had won an IMO gold medal, the story took a dramatic turn.

Several IMO officials and leading academics spoke out, calling OpenAI's conduct "rude and inappropriate." The IMO organizers had asked AI companies to wait until one week after the closing ceremony to publish results, so that attention would stay on the teenage contestants; OpenAI instead chose to announce its score impatiently, right after the closing ceremony ended.

One commenter put it bluntly: OpenAI, as always, will do anything for hype. No official score, no patience, and no shame.

More explosive still, OpenAI's self-proclaimed "gold medal" may not hold up at all: OpenAI was not one of the AI companies testing models in cooperation with the IMO, and none of the 91 official IMO judges took part in evaluating its answers. In other words, the "gold medal" is entirely self-declared and has not been officially certified.

What was expected to be a milestone for AI instead set off a heated debate about academic ethics and commercial hype.

His statement acknowledged that OpenAI had not contacted IMO officials in advance; it only informed one organizer before publishing the score, and that organizer asked them to wait until after the closing ceremony to make the announcement.

IMO officials are furious: "Please give the kids some space"

The whole affair was set off by a revelation from a veteran IMO insider, Jose ...
Let's Talk About the Present and Future of AI Coding | Salon Recruitment
量子位· 2025-07-21 02:17
林樾, from Aofeisi — QbitAI | WeChat official account QbitAI

The idea of Vibe Coding lets more people turn ideas into reality with a much lower barrier to entry. But what we care about more is this: how much has AI Coding actually improved productivity?

From plugins to AI-native IDEs, from code completion to autonomous programming, AI Coding has already been embedded into workflows in many different forms.

How is AI Coding reshaping workflows? How do we balance efficiency against reliability and security? What will AI Coding's future form and modes of collaboration look like?

In early August we will host an offline salon in Beijing to discuss the present and future of AI Coding. If you work on or are building AI Coding products, or are a heavy user of AI Coding tools, come talk with us.

About the salon

The salon will center on keynote talks by industry representatives and panel discussions, in open exchange with guests and the audience.

- How are AI efficiency tools, with AI Coding as the prime example, changing the way ordinary people think?
- For a general-purpose AI Coding product, what is the single most important product capability?
- In its ultimate form, what role will AI Coding play?

We hope practitioners working on AI Coding products and related areas will join and share.

Contact

Event lead: 王琳玉
WeChat: 18801103170
Email: linyu@ ...
A Post-95 Peking University Alum Takes the Lead on ChatGPT Agent! Finished His PhD Just This Year, and Took Second Place in the Terence Tao-Backed AIMO
量子位· 2025-07-20 05:08
Core Viewpoint
- The article highlights the significant presence of Chinese talent at OpenAI, particularly during a recent event where two Chinese individuals took center stage, showcasing their contributions to key projects like ChatGPT Agent and GPT-4 [2][8][34].

Group 1: Key Individuals
- Zhiqing Sun, a Peking University graduate born in 1995, is the head of Deep Research at OpenAI and has made substantial contributions to various core projects within a short span of time [14][16].
- Casey Chu, a senior employee at OpenAI, has been involved in the development of multimodal AI systems and led the initial prototype development for GPT-4's visual input [29][31].

Group 2: Contributions and Achievements
- Zhiqing Sun's research has garnered over 10,000 citations, with notable works including the RotatE method for knowledge graph embedding, which has been cited 3,231 times [21][23].
- Casey Chu has participated in the development of major projects like DALL·E 2 and GPT-4, with the GPT-4 technical report receiving 15,859 citations [31].

Group 3: Industry Dynamics
- The article discusses the competitive landscape, noting that despite Meta's efforts to recruit talent from OpenAI, the presence of Chinese researchers remains strong, indicating a deep pool of talent that is difficult to deplete [34][36].
- The narrative also touches on the broader implications of talent migration within the AI industry, particularly the strategic moves by companies like Meta to secure top talent [48][50].
Large Models' Confidence Collapses! Google DeepMind Confirms: Opposing Views Make GPT-4o Readily Abandon Correct Answers
量子位· 2025-07-20 05:08
Core Viewpoint
- The research conducted by Google DeepMind and University College London reveals that large language models (LLMs) exhibit conflicting behaviors of being both confident and self-doubting, influenced by their sensitivity to opposing feedback [2][3][21].

Group 1: Model Behavior
- LLMs tend to maintain their initial answers when they can see them, reflecting a human-like tendency to uphold one's viewpoint after making a decision [11][12].
- Conversely, when the initial answer is hidden, LLMs are more likely to change their answers, indicating an excessive sensitivity to opposing suggestions, even if those suggestions are incorrect [13][21].
- This behavior diverges from human cognition, as humans typically do not easily abandon their correct conclusions based on misleading information [15][21].

Group 2: Experimental Design
- The study involved a two-round experiment where LLMs were first presented with a binary choice question and then received feedback from a fictional suggestion LLM [7][8].
- Key variables included whether the initial answer was visible to the responding LLM, which significantly affected the final decision-making process [9][10].

Group 3: Reasons for Inconsistent Behavior
- The inconsistency in LLM responses is attributed to several factors:
  - Over-reliance on external feedback due to reinforcement learning from human feedback (RLHF), leading to a lack of independent judgment regarding the reliability of information [19][21].
  - Decision-making based on statistical pattern matching rather than logical reasoning, making LLMs susceptible to misleading signals [19][21].
  - The absence of a robust memory mechanism that would allow for deeper reasoning, resulting in a tendency to be swayed by opposing suggestions when the initial answer is not visible [21][22].
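As a reading aid, here is a minimal sketch of the two-round protocol summarized above, assuming a hypothetical `ask_model` chat-completion helper; the binary question, the fictional "suggestion LLM" advice, and the visibility flag are illustrative stand-ins rather than the study's actual harness.

```python
# Sketch of the two-round "answer, then face opposing advice" protocol.
# ask_model() is a hypothetical wrapper around any chat-completion API.
def ask_model(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real chat-completion client."""
    raise NotImplementedError

def run_trial(question: str, options: tuple[str, str], show_initial: bool) -> dict:
    # Round 1: the model commits to one of two options.
    initial = ask_model(f"{question}\nAnswer with exactly one of {options}.")

    # A fictional "suggestion LLM" always argues for the other option.
    other = options[1] if initial == options[0] else options[0]
    advice = f"Another model disagrees and is confident the answer is {other}."

    # Round 2: the key manipulated variable is whether the model can still
    # see its own first answer when it makes the final decision.
    context = f"Your earlier answer was {initial}.\n" if show_initial else ""
    final = ask_model(f"{question}\n{context}{advice}\nGive your final answer.")

    return {"initial": initial, "final": final, "changed": final != initial}
```

Comparing how often `changed` is true between `show_initial=True` and `show_initial=False` runs mirrors the study's central comparison: hidden initial answers should make flips under (possibly wrong) opposing advice far more frequent.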
63% Faster! A Generative Renderer from the Chinese Academy of Sciences Breaks the Efficiency Bottleneck, With 20% Better Consistency, Easing the Embodied-Data Drought
量子位· 2025-07-20 02:49
Contributed by the TC-Light team — QbitAI | WeChat official account QbitAI

With embodied AI this hot, a generative renderer built for embodied scenarios has arrived.

TC-Light, developed by Professor Zhang Zhaoxiang's team at the Institute of Automation, Chinese Academy of Sciences, can realistically re-render the illumination and textures of the long video sequences with complex, intense motion found in embodied training tasks, while keeping good temporal consistency and low computational overhead.

It helps shrink the Sim2Real gap and enables Real2Real data augmentation, supplying the massive amounts of high-quality data that embodied-intelligence training needs. The paper, demo, and code are all public.

Research background

Light, and its interaction with the surrounding environment, jointly shapes the fundamental way humans and embodied agents perceive both digital and physical worlds.

In the real world, however, collecting data under varied lighting and scene conditions is expensive. Simulators can supply nearly unlimited data, but under compute constraints they usually approximate and simplify multi-bounce refraction, diffraction, and texture precision, so visual realism inevitably suffers and a visual-level Sim2Real gap appears.

If a generative model could instead re-render video captured in real or simulated environments under whatever lighting conditions are needed, it would not only increase the diversity of existing real data but also smooth over the CG look caused by those computational approximations, yielding visually highly realistic sensor data from simulators; approaches including RL-CycleGAN ...
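To make the temporal-consistency problem concrete, here is a toy sketch (not TC-Light's algorithm) of why naive per-frame relighting flickers and how even a crude temporal blend suppresses it; `relight_frame` is a hypothetical stand-in for any stochastic per-frame relighting model.

```python
# Toy illustration of the failure mode a video re-renderer must avoid:
# relighting each frame independently causes flicker; a simple exponential
# moving average over consecutive outputs suppresses it.
import numpy as np

def relight_frame(frame: np.ndarray, seed: int) -> np.ndarray:
    """Stand-in for any per-frame generative relighting model (stochastic)."""
    rng = np.random.default_rng(seed)
    gain = rng.uniform(0.8, 1.2)          # per-frame randomness -> flicker
    return np.clip(frame * gain, 0, 255)

def relight_video(frames: list[np.ndarray], alpha: float = 0.6) -> list[np.ndarray]:
    out, prev = [], None
    for i, frame in enumerate(frames):
        cur = relight_frame(frame, seed=i)
        # Blend with the previous output to keep illumination temporally smooth.
        cur = cur if prev is None else alpha * cur + (1 - alpha) * prev
        out.append(cur)
        prev = cur
    return out
```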
Task-Level Rewards Boost App Agents' Reasoning: Taotian Proposes Mobile-R1, Where a 3B Model Can Beat 32B
量子位· 2025-07-20 02:49
Contributed by the Mobile-R1 team — QbitAI | WeChat official account QbitAI

Existing Mobile/App Agent work can adapt to real-time environments and execute actions, but most of it relies only on action-level rewards (SFT or RL). Such rewards can only guide the agent toward the single best action at each step, which makes it hard to cope with a constantly changing mobile environment.

Take one instruction as an example: "Open Fliggy, go to the hotel packages, enter the popular livestream, find Fliggy Super VIP, and follow the streamer." Qwen2.5-VL-3B-Instruct fails at the second step.

The Taotian Group Algorithm Technology – Future Life Lab and the Diantao algorithm team jointly propose that a multi-turn, task-oriented learning scheme, combining online learning with trajectory correction, may improve an agent's adaptability and exploration ability. To that end they introduce an interactive reinforcement-learning framework with task-level rewards, Mobile-R1 — and Mobile-R1 completes the task above successfully.

△ Trajectory dataset construction pipeline

To ensure training stability, the team uses a three-stage training process: format fine-tuning, action-level training, and task-level training. They also introduce a new Chinese benchmark and a high-quality trajectory dataset, demonstrating the method's effectiveness for mobile agents.

Trajectory dataset

The team used Qwen2.5-VL-3B to execute a series of tasks to obtain initial trajectories, and manually annotated these initial trajectories, ...
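A minimal sketch of the action-level vs. task-level reward distinction the article draws; the action format, the success check, and the reward weights are assumptions made for illustration, not Mobile-R1's actual reward design.

```python
# Sketch of action-level vs. task-level rewards for a mobile GUI agent.

def action_level_reward(predicted_action: str, reference_action: str) -> float:
    """Per-step reward: did this single action match the reference step?"""
    return 1.0 if predicted_action == reference_action else 0.0

def task_level_reward(trajectory: list[str], task_succeeded: bool) -> float:
    """Whole-trajectory reward: was the multi-step instruction actually completed?
    Mixed here with a small format term purely to illustrate reward shaping."""
    format_ok = all(a.startswith("click(") or a.startswith("type(") for a in trajectory)
    return 1.0 * float(task_succeeded) + 0.1 * float(format_ok)

# An action-level signal can look good even when the episode fails:
traj = ["click('Fliggy')", "click('Hotel Packages')", "click('Wrong Tab')"]
refs = ["click('Fliggy')", "click('Hotel Packages')", "click('Popular Live')"]
per_step = sum(action_level_reward(a, r) for a, r in zip(traj, refs))  # 2.0
episode = task_level_reward(traj, task_succeeded=False)                # 0.1
print(per_step, episode)
```

The point of the toy numbers: a trajectory can collect most of its per-step reward yet still fail the overall instruction, which is exactly the gap a task-level reward is meant to close.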
Terence Tao Responds to OpenAI's New Model Winning IMO Gold! A GPT-5 Test Build Has Also Surfaced
量子位· 2025-07-20 02:49
Core Insights
- OpenAI's latest model achieved a gold medal level at the 2025 International Mathematical Olympiad (IMO), solving 5 out of 6 problems and scoring 35 points out of a possible 42, surpassing this year's gold medal threshold [1][2][11][12].

Group 1: Model Performance
- The model's performance was evaluated under conditions identical to human participants, with two 4.5-hour exams, without any tools or internet access, requiring natural language explanations for solutions [9][11].
- The gold medal score of 35 points aligns with the human participant results, where only 5 out of approximately 600 competitors achieved full marks this year [12].
- The evaluation process was rigorous, with each solution assessed by three former IMO medalists, ensuring consensus before final scoring [13].

Group 2: Breakthrough Significance
- The achievement signifies a new level of creative thinking in problem-solving, with the model demonstrating rapid progress in reasoning time across various benchmarks, culminating in tackling the IMO's complex problems [14].
- The model's success indicates a departure from traditional reinforcement learning methods, showcasing its ability to construct intricate proofs akin to human mathematicians [14].

Group 3: Upcoming Developments
- Alexander Wei from OpenAI indicated that GPT-5 is set to be released soon, although the IMO gold medal model remains an experimental research project with no immediate plans for public release [3][8].
- The discovery of the code "GPT-5-reasoning-alpha-2025-07-13" in third-party repositories suggests that GPT-5 is on the horizon [6][8].

Group 4: Community Reactions
- The announcement of the model's success sparked significant discussion within the AI community, with notable mathematician Terence Tao expressing skepticism about the comparability of AI performance due to the lack of standardized testing environments [23][24].
- Tao emphasized that AI capabilities are influenced by various factors, including resources and methodologies, making it challenging to quantify performance uniformly [25][26].

Group 5: Independent Evaluations
- The MathArena platform conducted independent assessments, revealing that even the best-performing models, such as Gemini 2.5 Pro, scored only 13 points (31%), far below the bronze medal threshold [34][35].
- The MathArena team expressed the need for transparency regarding OpenAI's methodology to validate the reported results [37].
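For scale, each IMO problem is worth 7 points, so the reported score is exactly what full marks on five of the six problems would give:

$$5 \times 7 = 35 \quad\text{out of}\quad 6 \times 7 = 42 \;(\approx 83\%).$$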
AI Catches AI Fakes and Takes the SOTA | Xiamen University & Tencent Youtu
量子位· 2025-07-20 02:49
Core Viewpoint
- The article discusses the innovative AIGI-Holmes method developed by Xiamen University and Tencent Youtu Lab for detecting AI-generated images, addressing the challenges of interpretability and generalization in existing detection models [2][12][36].

Group 1: Methodology
- AIGI-Holmes employs a "large model + visual expert" collaborative architecture to enhance image detection capabilities [2][5].
- The method includes a dual-visual encoder architecture that integrates NPR visual experts to process both high-level semantics and low-level visual features [6].
- The Holmes Pipeline consists of three training phases: visual expert pre-training, supervised fine-tuning (SFT), and direct preference optimization (DPO) [7][22].

Group 2: Key Innovations
- The AIGI-Holmes method addresses two critical bottlenecks in existing detection technologies: lack of interpretability and limited generalization capabilities [12][36].
- A new dataset, Holmes-Set, was constructed containing 45,000 images and 20,000 annotations to improve data scarcity issues, covering various types of generation defects [15][18].
- The model architecture includes a collaborative decoding strategy that merges predictions from visual experts and the large language model to enhance detection accuracy [8][25].

Group 3: Performance Evaluation
- Experimental results indicate that AIGI-Holmes outperforms existing methods across all benchmarks in detection accuracy and interpretability [10][29].
- The model achieved optimal results in objective metrics (BLEU/ROUGE/METEOR/CIDEr) and subjective evaluations compared to current advanced models [31].
- In robustness tests against common distortions like JPEG compression and Gaussian blur, AIGI-Holmes maintained superior detection accuracy compared to other baseline methods [33][35].

Group 4: Future Directions
- The team acknowledges limitations such as the hallucination problem, where the model may misinterpret normal features as defects, and the need for more granular understanding of visual defects [36][39].
- Future work will focus on addressing the hallucination issue, enhancing fine-grained understanding capabilities, and developing objective evaluation metrics for visual defect explanations [39].
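As a sketch only: one plausible shape for the "visual expert + large model" fused decision mentioned above, where the expert's logit and the LLM's verdict-token probability are combined into a single score. The interfaces and the fusion weight are assumptions, not AIGI-Holmes' actual collaborative decoding implementation.

```python
# Hedged sketch of fusing a visual expert's prediction with an LLM's verdict.
import math

def fuse_fake_probability(expert_logit: float, llm_fake_token_logprob: float,
                          weight: float = 0.5) -> float:
    """Combine a visual expert's logit with the LLM's log-probability of
    emitting a 'fake' verdict token into one detection score."""
    p_expert = 1.0 / (1.0 + math.exp(-expert_logit))   # sigmoid of expert logit
    p_llm = math.exp(llm_fake_token_logprob)            # token prob from logprob
    return weight * p_expert + (1.0 - weight) * p_llm

# Example: the expert is fairly confident the image is AI-generated and the
# LLM leans the same way; the fused score drives the final call.
score = fuse_fake_probability(expert_logit=2.0, llm_fake_token_logprob=math.log(0.7))
print(f"fused fake probability: {score:.2f}")  # ~0.79
```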
Serious Vulnerability Exposed in NVIDIA GPUs, Sending Model Accuracy Plunging by 99.9%
量子位· 2025-07-20 02:49
克雷西 henry, from Aofeisi — QbitAI | WeChat official account QbitAI

White-hat hackers have found a serious vulnerability in NVIDIA GPUs.

Through an attack called GPUHammer, the accuracy of a large model running on the GPU can plunge from 80% to 0.02%, leaving essentially nothing. Researchers at the University of Toronto describe the attack as inflicting catastrophic brain damage on the model.

So far, the researchers have successfully demonstrated the attack on an NVIDIA RTX A6000, though other models may also be affected. NVIDIA recommends that users apply a defensive measure, but that measure costs roughly 10% of model performance.

So what exactly is this vulnerability?

Not a bug, but a "physical attack"

GPUHammer is the first Rowhammer attack to successfully target GPU memory. It does not tamper with model files through code; it goes after the GPU's memory "physically." It belongs to the Rowhammer family of attacks: the attacker repeatedly "hammers" one row of memory, inducing bit flips in adjacent rows (0 turning into 1, 1 into 0) and thereby silently corrupting data.

In shared-GPU environments such as cloud machine-learning platforms or VDI setups, a malicious tenant could launch a GPUHammer attack against neighboring workloads, degrading inference accuracy or corrupting cached model parameters. It is fair to say that, for the infrastructure of the AI era, GPUHammer has a devastating ...
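A small self-contained illustration of the failure mode (not the attack itself): flipping a single high exponent bit of a float32 weight changes its magnitude by dozens of orders of magnitude, which is why a handful of Rowhammer bit flips in weight memory can wreck a model's accuracy.

```python
# Why one flipped bit can be catastrophic for a model weight.
import struct

def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = mantissa LSB, 31 = sign) in the IEEE-754 float32 encoding."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped

w = 0.0123                      # a typical small model weight
print(flip_bit(w, 30))          # top exponent bit flipped -> astronomically large value
print(flip_bit(w, 0))           # mantissa LSB flipped -> barely changes the weight
```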
No NeRF/Gaussian-Splatting Post-Processing Needed: Turning Video into Game Models in a Flash Becomes Reality! The New Method Averages Just 60 Seconds per Frame | ICCV 2025
量子位· 2025-07-19 05:15
Core Viewpoint
- The article discusses a new method called V2M4 developed by a research team from KAUST, which enables the direct generation of usable 4D mesh animations from monocular video, significantly improving the efficiency and usability of animation and game content generation [1][6].

Summary by Sections

Method Overview
- V2M4 constructs a systematic multi-stage process that includes camera trajectory recovery, appearance optimization, topology unification, and texture synthesis, allowing videos to be transformed into models quickly [2][6].

Performance Metrics
- The generated appearance and structure are highly restored, with an average processing time of about 60 seconds per frame, which is significantly faster than existing methods. It also supports "long videos," performing well even on videos with a duration of 300 frames [4][20].

Challenges in Video to Animation Conversion
- Traditionally, converting a video into continuous animated mesh assets has been a long-standing challenge in visual computing, requiring high-cost methods like multi-camera setups and motion capture. Implicit methods like NeRF can replicate appearance but struggle to output topologically consistent explicit meshes [4][5].

Camera Trajectory Recovery
- V2M4 employs a three-stage camera estimation strategy to reconstruct the camera perspective for each video frame, converting "camera motion" into "mesh motion" to accurately model dynamic scenes [10][11].

Appearance Consistency Optimization
- To address appearance discrepancies, V2M4 utilizes a strategy from image editing called null text optimization to fine-tune the conditional embeddings of the generation network, enhancing the visual fidelity of the generated meshes [13][15].

Topology Unification
- V2M4 introduces a frame-by-frame registration and topology unification mechanism, ensuring that all frames maintain a consistent topology, which is crucial for subsequent texture generation and temporal interpolation [16].

Texture Consistency Optimization
- A shared global texture map is constructed for all frames to eliminate flickering and discontinuities, ensuring a smooth visual experience throughout the animation [17].

Animation Export
- The method includes time interpolation and structural encapsulation of the generated mesh sequences, resulting in a smooth animation sequence that can be exported as a GLTF-compliant file for use in mainstream graphics and game engines [18].

Performance Validation
- V2M4's performance is evaluated on challenging video data, demonstrating comprehensive advantages in reconstruction quality, operational efficiency, and generalization capabilities [19][20].

Visual Comparison
- The visual results show that V2M4 generates meshes with superior rendering details, normal structures, and inter-frame consistency, achieving high fidelity and stable generation of continuous animations [21].
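As a map of the stages listed above, a skeleton in code may help; every function body here is a placeholder standing in for the corresponding stage, not KAUST's V2M4 implementation.

```python
# Placeholder skeleton of the multi-stage video-to-animated-mesh process.
from dataclasses import dataclass

@dataclass
class Mesh:
    vertices: list                 # per-frame vertex positions
    faces: list                    # shared triangle indices once topology is unified
    texture: bytes | None = None   # filled once a shared texture map is baked

def recover_camera_trajectory(frames: list) -> list:
    """Stage 1: estimate a camera pose per frame so camera motion can be
    separated from (and converted into) mesh motion."""
    return [None] * len(frames)

def optimize_appearance(mesh: Mesh, frame) -> Mesh:
    """Stage 2: fine-tune the generator's conditional embeddings (null-text-style)
    so the per-frame mesh matches the frame's appearance."""
    return mesh

def unify_topology(meshes: list[Mesh]) -> list[Mesh]:
    """Stage 3: register frames so all meshes share one vertex/face layout."""
    return meshes

def bake_shared_texture(meshes: list[Mesh]) -> list[Mesh]:
    """Stage 4: build one global texture map for all frames to avoid flicker."""
    return meshes

def export_gltf(meshes: list[Mesh], path: str) -> None:
    """Stage 5: interpolate in time and write a GLTF-compliant animation."""
    ...

def video_to_animated_mesh(frames: list, out_path: str) -> None:
    cams = recover_camera_trajectory(frames)
    meshes = [optimize_appearance(Mesh([], []), f) for f in frames]
    meshes = bake_shared_texture(unify_topology(meshes))
    export_gltf(meshes, out_path)
    _ = cams  # in a real pipeline, the poses would drive the mesh motion
```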