QbitAI
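An "ImageNet" for 3D Generation Is Here! Tencent Hunyuan Open-Sources HY3D-Bench
QbitAI · 2026-02-06 10:10
Contributed by the Tencent Hunyuan team | QbitAI (WeChat official account: QbitAI) 3D generation has now become usable enough to impress at first glance. But three pain points still trouble researchers in the field: uneven data quality, the lack of evaluation standards, and insufficient coverage of long-tail categories. Using an automated data-cleaning pipeline, this work filtered and processed 252,000 high-quality 3D assets from large raw repositories such as Objaverse, providing a "ready-to-use" dataset that includes watertight meshes and multi-view rendered images, along with 240,000 3D part-decomposition results, which significantly lowers the barrier to training 3D generative models. In addition, to compensate for the limited diversity of academic datasets, it introduces an AIGC-driven synthesis pipeline that uses an LLM to generate semantic descriptions and a diffusion model to generate images, then converts them into high-fidelity 3D assets with the HY3D-3.0 engine, evenly covering 1,252 categories and balancing the distribution gap between common and long-tail categories. Experiments show that a lightweight model built on this benchmark (Hunyuan3D-2.1-Small) outperforms traditional methods in both generation quality and inference speed, and the dataset provides a solid data foundation for downstream applications such as robot simulation and virtual reality. Dataset composition: The availability of high-quality benchmark datasets has always been a core constraint on the development of 3D generative models. Early benchmark datasets such as ShapeNet laid the foundation for 3D generation research, but they suffer from imbalanced category coverage, simple geometric structure, insufficient data volume ...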
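Tsinghua Graduate Students Open-Source a Unified World Model: Performance Beats the Silicon Valley Benchmark by 40%!
QbitAI · 2026-02-06 10:10
Jin Lei, from Aofeisi | QbitAI (WeChat official account: QbitAI) This is Motus, the unified world model officially open-sourced by Shengshu Technology together with Tsinghua University. The project leads are Bi Hongzhe, a second-year master's student, and Tan Hengkai, a third-year PhD student, both from Professor Zhu Jun's TSAIL lab in Tsinghua University's Department of Computer Science. It is called unified because Motus's architecture brings together five embodied-intelligence paradigms (VLA (vision-language-action), world models, video generation, inverse dynamics, and joint video-action prediction), achieving for the first time a complete "see-think-act" closed loop. In tests on 50 general tasks, Motus's absolute success rate was more than 35% higher than that of the world-leading Pi-0.5, with the largest gain reaching 40%. With Motus, robots can now predict the future. A Chinese open-source embodied world model has outright beaten Pi-0.5, and its development was led by several Tsinghua master's and PhD students. Take the Cloudflare human-verification task, which the robot handles with ease: the video shows that, faced with an irregularly shaped curved mouse, the Motus-controlled robotic arm not only identifies it accurately but also moves smoothly and continuously according to the distance between the mouse and the on-screen click box, and finally completes the click with great precision. Likewise, in peg solitaire, a long-horizon, multi-step reasoning task, Motus also showed rigorous ...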
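QbitAI Is Hiring Editors and Writers
QbitAI · 2026-02-06 10:10
Editorial team, from Aofeisi | QbitAI (WeChat official account: QbitAI) We are a content platform centered on tracking the latest AI developments. After eight years of growth, we have top-tier influence, broad and well-recognized industry resources, and one of the best vantage points for observing and learning from this era's biggest trend. We are currently hiring in three tracks and hope you are (or can become) a content expert in one of them: the AI industry track, which follows innovation at the infrastructure layer, including chips, AI Infra, and cloud computing; the AI finance track, which follows venture investment and earnings reports in AI and tracks capital flows along the industry chain; and the AI product track, which follows progress in AI applications and hardware devices. All positions are full-time and based in Zhongguancun, Beijing. Roles at every seniority level are open for all positions, and you are welcome to apply based on your background and experience. The AI boom is still surging, but if you don't yet know how to get involved ... then why not come to QbitAI? The positions are open to experienced hires, covering the editor, lead writer, and editor-in-chief levels, matched to ability; and to campus hires, meaning fresh graduates, with internships that can convert to full-time roles. By joining us you can stand at the crest of the AI wave, being among the first to encounter and understand the latest AI technologies and products and building a complete understanding of AI; and master new AI tools, applying new AI technologies and tools to your work to improve efficiency and creativ ...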
GPT-5.3 Lands in Codex! OpenAI Took Just 15 Minutes to Respond to Claude's New Model
QbitAI · 2026-02-06 02:30
Core Viewpoint - The article discusses the release of OpenAI's latest programming model, GPT-5.3-Codex, which competes with Anthropic's Claude Opus 4.6, highlighting significant improvements in coding capabilities and user interface design [1][13]. Group 1: Model Improvements - GPT-5.3-Codex shows enhanced aesthetic appeal and design in its demos, including a racing game and a diving game [2][6]. - The model has demonstrated superior performance in various benchmarks, achieving 57% in SWE-Bench Pro, 76% in TerminalBench 2.0, and 64% in OSWorld [11][12]. - It has improved efficiency, requiring less than half the tokens compared to its predecessor for the same tasks, with a speed increase of over 25% [11][22]. Group 2: Functional Capabilities - GPT-5.3-Codex excels in computer use, assisting financial professionals in creating presentations and handling complex tasks like document writing and spreadsheet management [9][23]. - The model's ability to self-accelerate during its training process marks a significant advancement, allowing it to monitor and debug its own training tasks [28][29]. Group 3: Business Applications - OpenAI is launching Frontier, a platform aimed at integrating AI into corporate workflows, with notable companies like HP and Uber already adopting it [34][38]. - The AI4S initiative, in collaboration with Ginkgo, aims to reduce protein synthesis costs by 40% using GPT-5, showcasing the model's application in synthetic biology [39][41].
100,000 Agents Idly Chat for Fun on Moltbook; the Father of Xiaoice Builds a Productivity-Focused Version
QbitAI · 2026-02-06 02:30
Core Viewpoint - The article discusses the emergence of the "Moltbook" community and the "Tuanzi" platform, highlighting the shift towards multi-agent systems that enhance human productivity and decision-making, moving beyond mere entertainment in AI [1][3][4]. Group 1: Multi-Agent Systems - The term "multi-agent" has quickly become a buzzword in the industry, indicating a growing interest in collective intelligence and collaborative problem-solving [2]. - The "Tuanzi" platform allows users to engage with multiple agents that act like expert teams, providing debate, challenge, and reflection on complex issues, thus facilitating clearer understanding and decision-making [4][6]. Group 2: User Experience and Functionality - The interface of the Tuanzi platform is designed to be simple and intuitive, allowing users to input questions and tag different agent teams for assistance [7]. - Users can interact with a 40-member "sister team" that provides collective insights and strategies for personal dilemmas, showcasing the platform's ability to generate diverse perspectives [12][18]. Group 3: Analytical Depth - Agents on the platform analyze user queries from various angles, including emotional, psychological, and professional perspectives, leading to comprehensive decision-making frameworks [29][31]. - The platform emphasizes the importance of understanding both explicit and implicit needs, providing insights that go beyond surface-level responses [29][49]. Group 4: Group Intelligence and Decision-Making - The Nextie team has developed a framework for evaluating group intelligence, focusing on completeness of perspectives, implicit need satisfaction, dialectical depth, actionability, and decision explainability [78]. - The group intelligence approach aims to mitigate cognitive biases by incorporating diverse viewpoints and experiences, thus enhancing the quality of decision-making [73][74]. 
Group 5: Future Directions and Innovations - Nextie plans to continue evolving the Tuanzi platform with regular updates, introducing new roles and capabilities, including a "group simulation team" to model potential real-world outcomes of decisions [102]. - The company is also exploring funding opportunities to expand its operations and enhance its offerings, indicating a proactive approach to growth in the AI sector [99][100].
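Danqi Chen Joins Mira and Lilian Weng's Company; It Turns Out a Three-Time IOI Gold Medalist and Fellow Competitor Is There
QbitAI · 2026-02-06 00:15
Lu Yu, from Aofeisi | QbitAI (WeChat official account: QbitAI) The reason Danqi Chen chose Mira's startup as the first stop in her first move into industry has been found: a fellow competitor is there too, and he has been "lying low" there for a full year. He is Neal Wu, who won an IOI gold medal in the same year as Danqi Chen. And not just once: Neal Wu won IOI gold three times and was the undisputed pillar of the US team. He is also one of the creators of Devin, the world's first AI programmer, which previously took Silicon Valley by storm. His presence had been treated by Mira as a top secret. Only when internal strife at the company led several founders to collectively "defect" back to OpenAI did this legendary programmer's whereabouts unexpectedly surface. Compared with his old friend Danqi Chen, however, Neal Wu keeps a lower profile. His public profiles have never disclosed a specific title, only hinting that he is taking part in a new project as a co-founder and advisor. The start date was a year ago, which closely matches the timeline of Mira announcing her new company. Danqi Chen, also a gold medalist in 2008, is currently an associate professor of computer science at Princeton University and co-lead of its NLP group, and has received a Sloan fellowship. Interestingly, former rivals are now teammates. So what makes Neal Wu so exceptional that Mira went to such lengths to "hide" him? About Neal Wu: Neal Wu's résumé can fairly be described as that of a young genius ...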
Claude's New 4.6 Model Is Here! More Jobs on the Line: Wall Street Finance, Compilers, White-Hat Security, PPT… All Fall
QbitAI · 2026-02-06 00:15
Core Viewpoint - Anthropic's new model, Claude Opus 4.6, has significantly impacted the market, causing declines in major financial data service providers and indices due to concerns over AI's potential to disrupt various industries [1][2][3]. Model Performance - Claude Opus 4.6 outperforms OpenAI's GPT-5.2 by 144 Elo in the GDPval-AA evaluation, indicating superior performance in financial analysis and research tasks [7][42]. - In programming capabilities, Opus 4.6 achieved the highest score in the Terminal-Bench 2.0 assessment, demonstrating its advanced task planning and debugging abilities [30][31]. New Features - The model introduces a 1M token context window, significantly improving its ability to handle long texts and reducing context decay [12][14]. - Opus 4.6 features Adaptive Thinking, allowing it to autonomously determine when to engage in deep reasoning, enhancing its flexibility in various tasks [19][20]. - Context Compaction is a new feature that summarizes and replaces old content when approaching context limits, facilitating longer conversations and tasks [23][24]. Pricing and Accessibility - The pricing for Opus 4.6 remains unchanged at $5 per million tokens for input and $25 per million tokens for output, with additional charges for prompts exceeding 200k tokens in the 1M token context version [11][50][51]. Security and Ethical Considerations - Opus 4.6 has demonstrated unexpected capabilities in cybersecurity, identifying over 500 previously unknown high-risk zero-day vulnerabilities during testing [62][63]. - Anthropic has implemented new security detection mechanisms to mitigate potential misuse of these capabilities [68]. Development and Testing - The model has been developed using its own capabilities, with Anthropic engineers utilizing Claude Code for internal projects, indicating a self-reinforcing development cycle [69].
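As a rough illustration of the per-token pricing quoted above, the sketch below estimates the base cost of a single request at $5 per million input tokens and $25 per million output tokens. The function name and the example token counts are hypothetical, and the long-context surcharge for prompts over 200k tokens is not modeled because the summary does not give its rate.

```python
# Illustrative sketch only. The two rates are the list prices quoted in the
# summary; the function name and example token counts are hypothetical.
INPUT_USD_PER_MILLION = 5.0    # USD per 1M input tokens
OUTPUT_USD_PER_MILLION = 25.0  # USD per 1M output tokens


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the base cost of one request in USD.

    Assumption: the surcharge for prompts above 200k tokens is ignored,
    since its rate is not stated in the summary.
    """
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # A hypothetical request: 150k tokens in, 4k tokens out.
    print(f"${estimate_cost_usd(150_000, 4_000):.2f}")
```

At these rates output tokens cost five times as much as input tokens, so a long generated answer can cost more than a much larger prompt.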
Paper First-Authored by Deng Mingyang Rewrites the Generative Paradigm! He Kaiming Is a Co-Author Too
QbitAI · 2026-02-05 11:20
Core Viewpoint - The article discusses the introduction of a new generative model paradigm called Drifting Models, proposed by He Kaiming's team, which shifts the distribution evolution process from the inference stage to the training stage, enabling one-step generation of high-quality samples [1][4][36]. Summary by Sections Introduction of Drifting Models - The Drifting Model represents a significant innovation in generative modeling by introducing the "Drifting Field" mechanism, which aligns the prior distribution with the real data distribution during training, eliminating common instabilities in GANs and avoiding reliance on multi-step ODE/SDE solutions [5][12][19]. Mechanism of Drifting Models - The core of the Drifting Model is to learn a mapping function that transforms a simple prior distribution (like Gaussian noise) into a pushforward distribution that matches real data [9][10]. - Unlike traditional models that require multiple iterations during inference, the Drifting Model allows for single-step generation by leveraging the iterative nature of neural network training as the driving force for distribution evolution [14][18]. Training Process - The training process involves calculating a drift vector for each sample based on the distribution of positive and negative samples, guiding the model to align its output distribution with the target distribution [21][26]. - The model's training trajectory is essentially equivalent to the path of distribution evolution, allowing for high-quality generation with only a single forward pass during inference [18][36]. Experimental Results - In the ImageNet 256x256 benchmark, the Drifting Model achieved a FID score of 1.54 in latent space and 1.61 in pixel space during one-step inference, outperforming many traditional diffusion models that require hundreds of iterations [32][33]. 
- The model also demonstrated strong generalization capabilities in embodied intelligence control tasks, matching or exceeding the decision quality of diffusion policies that require significantly more inference steps [34][35]. Conclusion - The Drifting Model successfully transfers the generative pressure from the inference stage to the training stage, providing a new perspective on generative modeling that reinterprets the training process as a mechanism for distribution evolution [36][37].
A Single Claude Plugin Terrifies Wall Street: Software Stocks Plunge Together, 2 Trillion Yuan Wiped Out in a Day
QbitAI · 2026-02-05 11:20
Core Viewpoint - The emergence of AI tools, particularly Anthropic's "Claude Cowork," is perceived as a significant threat to the Software as a Service (SaaS) industry, leading to a dramatic sell-off in software stocks and a widespread belief that "SaaS is dead" [1][2][8]. Group 1: Market Reaction - The launch of Anthropic's "plugins" resulted in a loss of approximately $285 billion in market value for Nasdaq, with software stocks experiencing a 6% drop, the largest single-day decline since April of the previous year [3][4]. - Following the initial drop, the iShares expanded technology software ETF fell an additional 2%, indicating ongoing market distress [6]. - The overall sentiment on Wall Street has shifted to a pessimistic outlook, with many investors eager to exit software stocks regardless of current prices [8][28]. Group 2: AI's Impact on SaaS - Anthropic's "Claude Cowork" can automate tasks traditionally handled by various software, such as legal document review, significantly reducing costs for businesses from $50,000 annually to potentially just over $100 monthly [14][20]. - The introduction of AI capabilities is expected to disrupt numerous vertical industries, including finance, sales, and marketing, as more plugins are developed [23][30]. - The perception that AI will replace software has led to a reevaluation of the SaaS model, which was previously seen as a complementary relationship [25][38]. Group 3: Competitive Landscape - Anthropic's self-developed underlying model positions it as a formidable competitor, potentially undermining traditional legal services and existing startups in the legal automation space [17][20]. - Other companies, such as Harvey AI and Legora, are also active in the legal automation sector, but Anthropic's capabilities may give it a competitive edge [15][17]. - The market is witnessing a broader trend where AI is seen as a direct competitor to SaaS companies, challenging their traditional business models [27][39]. 
Group 4: Long-term Outlook - Despite the current turmoil, some industry leaders, like Jensen Huang, argue that software will remain essential as a tool for AI, suggesting that the notion of SaaS being "dead" is misguided [9][47]. - The future may see a transformation in the SaaS business model, where SaaS becomes a more foundational infrastructure rather than a direct user interface [48][49]. - The long-term viability of SaaS companies may depend on their ability to adapt and leverage proprietary data and robust systems to maintain their competitive advantage [42][44].
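Google and Peking University Team Up on an Academic Banana That Goes Viral, Generating Paper Charts with 100% Precision
QbitAI · 2026-02-05 06:01
Yi Shui, from Aofeisi | QbitAI (WeChat official account: QbitAI) Nano Banana, whose results were good enough to flood social feeds, now has a freshly released academic edition! The name is as direct as it gets: PaperBanana, bringing Banana to the Papers that give you a headache every day. (An attempt at a rhyme, skr.) This time it is a joint effort by Google and Peking University. We know you want to see results right away, so here are three official examples. Given the same input, the comparison covers paper illustrations drawn by humans, generated by the original Nano Banana, and generated by PaperBanana. A comprehensive evaluation shows that PaperBanana beats the original across the board in aesthetics, conciseness, and logical clarity. It can also directly polish hand-drawn illustrations; look at the one on the right, which immediately appears more refined. After seeing the results, many netizens remarked that the long-standing problem of "academic illustrations" is finally about to be solved. Thinking back on the old days is enough to bring tears: a researcher spending 4 hours drawing a single figure in Figma is simply hard to believe. So how was the academic PaperBanana built? If one is not enough, use 5! In addition, because PaperBanana also offers code-based plotting (using Gemini-3-Pro to automatically generate and execute Python visualization code), it can also be used to generate charts whose values must be 100% accurate. All right ...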