量子位
Silicon Valley blackout knocks out Google's robotaxis; Musk taunts to their face: Tesla was just fine
量子位· 2025-12-22 09:30
一凡 (Yifan) from 凹非寺
量子位 | WeChat official account QbitAI

A single large-scale blackout has exposed a weakness of the world's No. 1 robotaxi player. Only days after reports that its valuation had topped $100 billion, Waymo was brought to a complete standstill by a local power outage, its cars stuck in the middle of the road, jamming city traffic; videos of the scene spread widely.

Musk was quick to twist the knife, noting that his own robotaxis were unaffected. On the surface, the L2 incremental route that Tesla represents seems to have won a small round... at any rate, Musk sees this as a moment that demonstrates its superiority.

On the robotaxi battlefield, Musk's every move this year has pushed the autonomous-driving contest to new heights, with more players on both sides of the Pacific entering the field, advancing along the "Tesla route" and vying with the "Waymo route" for the holy grail of autonomous driving.

So the question is: how does a power outage affect Waymo's robotaxis?

Local blackout, Waymo offline

The shutdown traces back to a fire at a San Francisco substation, which caused a large-scale local blackout reportedly affecting the power supply of 130,000 residents. Worse still, the widespread outage knocked out the traffic lights, triggering a fleet-wide standstill of Waymo's driverless cars. When it rains, it pours: already-chaotic traffic became even more congested with the driverless cars blocking the roads. Waymo had to bring in tow trucks overnight to haul the vehicles away, and announced a local service suspension; it is still unclear when service will resume.

So why does a power outage force Waymo to suspend operations? The first clue is in the official response, which revealed the op ...
Fully self-developed simulation GPU solver × physical measurement factory benchmarked against reality: building an embodied synthetic-data SuperApp to accelerate the embodied simulation ecosystem | 光轮智能 (Lightwheel) @ MEET2026
量子位· 2025-12-22 08:01
Core Insights
- The transition from large model intelligence in the "language world" to embodied intelligence in the "physical world" highlights the importance of simulation as a foundational infrastructure for practical applications [1]
- The scale of embodied intelligence is significantly larger than that of text and visual models due to the more complex and realistic data dimensions involved [2]
- The core of the embodied intelligence era is not the algorithms themselves, but the effectiveness and scalability of the data they rely on, with simulation being the only viable solution to the data challenge [3]

Simulation Infrastructure
- The company is developing a comprehensive simulation infrastructure that includes measurement, generation, and solving capabilities to address industry pain points such as unrealistic simulations and unreliable Sim2Real transitions [3][15]
- The simulation ecosystem is anchored in real industry needs, with the creation of high-fidelity synthetic data assets being essential for training embodied intelligence [5]
- The company has established the world's largest remote-operation data collection factory, a large-scale RL training platform (LW-BenchHub), and the first industrial-grade robot evaluation platform (RoboFinals) to support the transition of embodied intelligence from the lab to the real world [6]

Data Opportunities
- The data opportunities in embodied world models are estimated to be 1000 times greater than those in large language models, due to the complexity of interactions and feedback mechanisms required in physical environments [14]
- Pre-training for traditional large models draws on existing data, whereas embodied intelligence faces a significant pre-training demand because real-world instances are scarce [17][18]

Challenges and Solutions
- Past simulation failures are attributed to three main issues: unrealistic physics, visual distortion of assets, and inaccurate interaction behaviors [19][20]
- The company has developed a "measurement, generation, solving" triad solution to create a simulation factory that aligns closely with the physical world, eliminating reliance on guesswork [21][23]
- Accurate parameter identification is crucial for ensuring that simulated robots behave consistently with their real-world counterparts, thereby bridging the Sim2Real gap [33]

Ecosystem and Commercialization
- A robust ecosystem is essential for the sustainable development of simulation platforms, with the company focusing on creating "killer applications" to support ongoing evolution [39][40]
- The company's applications include a global remote-operation data collection factory, a large-scale RL training system, and the RoboFinals evaluation platform, which has become a leading standard for assessing robotic models [40][45]
Upending the natural order! Gemini Flash outperforms Pro: "the Pareto frontier has flipped"
量子位· 2025-12-22 08:01
Core Insights
- Gemini 3 Flash outperforms its predecessor Gemini 2.5 Pro and even the flagship Gemini 3 Pro on several benchmarks, scoring 78% on SWE-Bench Verified versus Gemini 3 Pro's 76.2% [1][6][9]
- Flash's performance on the AIME 2025 mathematics benchmark is notable, scoring 99.7% with code execution enabled, indicating advanced mathematical reasoning [7][8]
- The article argues for a shift in how flagship models are perceived: smaller, optimized models like Flash can outperform larger ones, challenging the belief that bigger models are inherently better [19][20]

Benchmark Performance
- On Humanity's Last Exam, Flash scored 33.7% without tools, closely trailing Pro's 37.5% [7][8]
- Other results: 90.4% on GPQA Diamond (scientific knowledge), 95.2% on AIME 2025 without tools, and 81.2% on MMMU-Pro (multimodal understanding) [8]
- Flash runs three times faster than Gemini 2.5 Pro with 30% lower token consumption, priced at $0.50 per million input tokens and $3.00 per million output tokens [9]

Strategic Insights
- Google's team indicates that the Pro model's role is to "distill" its capabilities into Flash, with the focus on optimizing performance and cost [10][12][13]
- Scaling laws are evolving: the emphasis is shifting from merely increasing parameters to enhancing reasoning through advanced training techniques [15][16]
- Post-training is highlighted as a major area for future development, with substantial room for improvement remaining on open-ended tasks [17][18]

Paradigm Shift
- Flash's emergence has sparked debate over the "parameter supremacy" theory, demonstrating that smaller, more efficient models can achieve superior performance [19][21]
- Advanced reinforcement learning techniques in Flash are cited as a key factor in its success, showing that scaling model size is not the only path to greater capability [20][22]
- The article concludes with a call to reconsider the blind admiration for flagship models in favor of a more nuanced view of model performance [23]
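The per-token rates quoted above translate directly into request cost with simple arithmetic. A minimal sketch (the two rates are from the summary; the example token counts are hypothetical):

```python
# Hedged sketch: estimate a Gemini 3 Flash request cost from the rates
# quoted above ($0.50 per million input tokens, $3.00 per million output
# tokens). The example token counts below are hypothetical.

def flash_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    INPUT_RATE = 0.50 / 1_000_000   # dollars per input token
    OUTPUT_RATE = 3.00 / 1_000_000  # dollars per output token
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A hypothetical request: 10k tokens in, 2k tokens out.
cost = flash_cost_usd(10_000, 2_000)
print(f"${cost:.4f}")  # 0.005 + 0.006 = $0.0110
```

Output tokens dominate here despite being fewer, since the output rate is six times the input rate; that asymmetry is why the claimed 30% token reduction matters for cost.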
Letting AI think while it paints, like a human artist: CUHK & Meituan teach models to "take one step, look one step"
量子位· 2025-12-22 04:41
Core Viewpoint
- The article introduces a new paradigm called Thinking-while-Generating (TwiG), which interleaves textual reasoning with visual generation to help models produce complex images and videos, addressing the limitations of existing models in handling spatial relationships and object interactions [5][19]

Group 1: Existing Challenges
- Current diffusion and autoregressive models, such as FLUX.1 and Emu3, struggle to render complex spatial relationships and interactions accurately, often misplacing objects or getting quantities wrong [1]
- Two main approaches have been explored previously: "Think-before-Generation," which lacks flexibility, and "Think-after-Generation," which incurs high computational cost and latency [4]

Group 2: Introduction of TwiG
- TwiG lets models pause during the generation process to evaluate and plan the next steps, mimicking how human artists work [5][7]
- The framework breaks visual generation into a "generate-think-regenerate" cycle, enabling models to inject reasoning at multiple points in the creative process [7]

Group 3: Core Dimensions of TwiG
The framework rests on three key dimensions:
1. **When to Think**: the model builds a "thinking schedule" from the user prompt, organizing generation into three stages aligned with the semantic structure of the image [8]
2. **What to Say**: at each pause, the model generates a "thought chain" that guides the next steps more precisely than a traditional prompt [9]
3. **How to Refine**: after completing a section, the model self-reflects and corrects mistakes immediately, improving efficiency [10]

Group 4: Empirical Research and Results
- The research team validated the TwiG framework on a unified multimodal model (Janus-Pro) across several stages of testing [12]
- **Zero-Shot Performance**: the TwiG-ZS variant showed strong "think-while-generating" behavior without any parameter updates, outperforming baseline models on multiple dimensions [13][14]
- **Supervised Fine-Tuning (SFT)**: a 50K-sample dataset was used for SFT, improving the coherence and controllability of the generated thought chains [16]
- **Reinforcement Learning (RL)**: the TwiG-RL model, optimized with a dedicated RL strategy, performed competitively against existing models such as Emu3 and FLUX.1 on key metrics [17]

Group 5: Conclusions and Future Implications
- TwiG marks a shift in how visual generation models operate, emphasizing the need for logical reasoning within the generation process [19]
- Key conclusions: explicit reasoning is necessary for complex logic, local corrections are more efficient than complete rewrites, and reinforcement learning plays a critical role in strengthening model capabilities [20]
- The TwiG framework is designed to be compatible with diffusion models, suggesting applications in more complex settings such as video generation and 3D modeling [21]
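The "generate-think-regenerate" cycle described above can be sketched as a simple control loop. Everything below (the function names, the three-stage schedule, the string-based "canvas") is an illustrative assumption standing in for real model calls, not the paper's actual API:

```python
# Illustrative sketch of a Thinking-while-Generating style loop.
# All helpers are hypothetical stand-ins for model invocations.

def plan_schedule(prompt: str) -> list[str]:
    """'When to think': split generation into semantic stages (assumed three)."""
    return ["layout", "objects", "details"]

def think(prompt: str, canvas: str, stage: str) -> str:
    """'What to say': produce a thought chain guiding the next stage."""
    return f"plan {stage} for '{prompt}' given {canvas or 'a blank canvas'}"

def generate(canvas: str, thought: str) -> str:
    """Extend the partial result according to the current thought."""
    return canvas + f"[{thought}]"

def refine(canvas: str) -> str:
    """'How to refine': self-reflect and locally correct the latest stage."""
    return canvas  # identity here; a real model would patch local errors

def twig(prompt: str) -> str:
    canvas = ""
    for stage in plan_schedule(prompt):  # pause points interleaved with generation
        thought = think(prompt, canvas, stage)
        canvas = generate(canvas, thought)
        canvas = refine(canvas)
    return canvas

print(twig("a cat on a red chair"))
```

The key structural point the sketch captures is that thinking happens inside the loop, after partial output exists, rather than once before ("Think-before-Generation") or once after ("Think-after-Generation").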
MiniMax's Hailuo video team open-sources for the first time: tokenizers also obey a clear scaling law
量子位· 2025-12-22 04:41
Core Viewpoint
- The MiniMax Hailuo (海螺) video team has introduced a new scalable visual tokenizer pre-training framework (VTP) that addresses the limitations of traditional tokenizers in supporting high-quality generative outputs, emphasizing understanding over mere pixel reconstruction [5][15][58]

Group 1: Traditional Tokenizer Limitations
- Traditional tokenizers focus on pixel-level reconstruction, which does not necessarily translate into better generation quality, leading to a saturation point where additional compute yields diminishing returns [4][15]
- The "pre-training scaling problem" means better reconstruction accuracy can paradoxically produce worse generation performance, because traditional methods overlook high-level semantic understanding [12][15]

Group 2: VTP's Approach and Innovations
- VTP shifts the focus from pixel-level reconstruction to a holistic understanding of visual semantics, integrating multiple representation-learning methods to strengthen the tokenizer's capabilities [26][30]
- The framework employs a multi-task loss function that combines understanding, reconstruction, and generation, allowing the tokenizer to produce semantically rich latent representations that improve downstream model performance [34][35]

Group 3: Empirical Findings and Performance Metrics
- Injecting "understanding" into the tokenizer significantly enhances generation quality, with empirical evidence of a positive correlation between understanding capability and generation performance [40][41]
- The VTP model achieved 78.2% zero-shot classification accuracy on ImageNet, surpassing the original CLIP's 75.5%, and exhibited superior reconstruction and generation capabilities compared with existing models [44]

Group 4: Scaling Law and Industry Implications
- VTP reveals a scaling law for tokenizers: performance improves with more compute, data, and parameters, challenging the traditional view that scaling benefits apply only to the main model [50][54]
- The findings suggest that investing in tokenizer development is crucial for overall generative system performance, positioning the tokenizer as a core component worthy of long-term industry investment [58]
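The multi-task objective described above combines three loss terms into one training signal. A minimal sketch of that combination (the weights, values, and function shape here are hypothetical illustrations, not VTP's published formulation):

```python
# Hedged sketch of a VTP-style multi-task tokenizer objective:
# total = weighted sum of understanding, reconstruction, and generation
# losses. The weights and example values below are hypothetical.

def total_loss(l_understand: float, l_reconstruct: float, l_generate: float,
               w_u: float = 1.0, w_r: float = 1.0, w_g: float = 1.0) -> float:
    """Combine the three per-task losses into one training objective."""
    return w_u * l_understand + w_r * l_reconstruct + w_g * l_generate

# Example: equal weights over three dummy per-task loss values.
print(total_loss(0.5, 0.25, 0.25))  # 1.0
```

The design point is that the reconstruction term no longer trains alone: an understanding term (e.g. a CLIP-style alignment loss, per the summary's comparison) pulls the latent space toward semantics even when pixel error is already low.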
The world has long suffered under SaaS; enterprise AI must let "results" do the talking
量子位· 2025-12-22 04:41
Core Viewpoint
- The article discusses the shift from traditional SaaS models to RaaS (Result as a Service) in the AI industry, highlighting the challenges and opportunities in deploying AI solutions for enterprises [2][35]

Group 1: Challenges in SaaS and AI Deployment
- Service providers are struggling with high inference costs and inconsistent delivery quality, leading to a decline in the attractiveness of SaaS in the AI era [2][8]
- Traditional paths for deploying AI involve high upfront costs and significant trial-and-error expenses, which deter many potential customers from adopting AI solutions [11][15]
- The complexity of integrating new AI systems with existing infrastructure adds to the challenges faced by enterprises [12][17]

Group 2: Emergence of RaaS
- RaaS is seen as a promising alternative to SaaS, focusing on paying for results rather than just tools, which aligns better with customer needs [39][40]
- The Results Cloud by BaiRongYunChuang offers a comprehensive solution that includes infrastructure, an operating system, and an application store, addressing the pain points of traditional AI deployment [16][34]
- RaaS encourages a collaborative relationship between service providers and clients, transforming the dynamic from a client-vendor relationship to a partnership [42][44]

Group 3: Results Cloud Architecture
- The Results Cloud is structured in three layers: BaiJi (infrastructure), BaiGong (operating system), and BaiHui (application store), each serving a specific purpose in the AI deployment process [19][29]
- BaiJi provides a marketplace for AI infrastructure, offering pre-packaged models and computing power without exposing the underlying complexity to clients [20][21]
- BaiGong acts as a central hub that filters and optimizes the combination of models and computing resources, significantly reducing decision-making costs for clients [25][26]

Group 4: Performance Measurement and Compensation
- The Results Cloud aligns the performance metrics of AI employees with those of human employees, allowing for a more straightforward evaluation of effectiveness [46]
- Compensation models for AI employees can include task-based pricing, value-sharing agreements, or fixed salaries, ensuring that clients only pay for actual results [48][49]
- This approach mitigates concerns about upfront costs, encouraging clients to trial AI solutions without financial risk [52]

Group 5: Ecosystem Development
- BaiRongYunChuang emphasizes the importance of building an ecosystem for AI solutions, inviting third-party developers to contribute to the platform [57][59]
- The company aims to create a "Silicon-based Productivity Alliance" to foster collaboration and innovation in the AI space [59][60]
- By leveraging its established technology and client base, BaiRongYunChuang seeks to open market opportunities for developers and enhance the overall AI ecosystem [62][63]
量子位 (QbitAI) is hiring editors and writers
量子位· 2025-12-22 04:41
Core Viewpoint
- The article emphasizes the ongoing AI boom and invites individuals to join the company "Quantum Bit," which focuses on tracking AI advancements and has established itself as a leading content platform in the industry [1]

Group 1: Job Opportunities
- The company is hiring for three main directions: AI Industry, AI Finance, and AI Product, with positions available for both experienced professionals and fresh graduates [2][4]
- Positions are full-time and based in Beijing, with roles open at various levels [2][4]

Group 2: Job Responsibilities
- AI Industry: innovations in infrastructure, including chips, AI infrastructure, and cloud computing [6]
- AI Finance: tracking venture capital and financial reports in the AI sector, monitoring capital movements within the industry [6]
- AI Product: the application and hardware advancements of AI [6]

Group 3: Benefits and Growth Opportunities
- Employees will have the chance to engage with the latest AI technologies, enhance their work efficiency through new AI tools, and build personal influence by creating original content [6]
- The company offers competitive salaries and comprehensive benefits, including social insurance, meal allowances, project performance bonuses, and overtime compensation [6]

Group 4: Company Growth Metrics
- By 2025, Quantum Bit aims to have over 2.4 million subscribers on WeChat and more than 7 million users across all platforms, with daily reading volume exceeding 2 million [12]
- Quantum Bit is recognized as the top new-media outlet in the AI and frontier technology sector according to third-party data platforms [12]
AI infra truly built for large models must understand models, systems, and industry at the same time | 商汤大装置 (SenseTime SenseCore)'s 宣善明 (Xuan Shanming) @ MEET2026
量子位· 2025-12-22 01:40
Core Viewpoints
- The company's core strategy is "1+X": "1" represents core businesses including large devices, large models, and AI applications, while "X" encompasses innovative businesses such as smart driving, healthcare, and retail [6][10]

AI Infrastructure Development
- AI infrastructure must not only address the availability of computing power but also ensure efficient, stable, and scalable support for models and industries [3][4]
- The company's total computing power has reached 32,000 PetaFLOPS, showcasing its commitment to building a robust AI infrastructure [6][13]

Energy Efficiency and Carbon Reduction
- The AI computing center runs a power-consumption prediction system that can accurately forecast power needs within 15 minutes, achieving a 7% annual reduction in electricity costs and over 3,000 tons of annual carbon reduction [6][21]
- The center's Power Usage Effectiveness (PUE) has reached 1.267, with a 15% improvement in overall computing efficiency [21]

Collaboration and Resource Sharing
- The company has launched the "SenseTime Computing Power Mall" in collaboration with over ten domestic manufacturers, allowing clients to freely combine and allocate diverse domestic computing resources and industry model services [6][22]
- The platform supports seamless deployment of algorithms across various chips, enhancing the overall capabilities of the PaaS platform [22]

Industry Applications and Partnerships
- The company has established partnerships with top-tier research institutions and various industries, including internet technology, AIGC, and traditional sectors, providing comprehensive end-to-end solutions [25][26]
- Notable collaborations include working with major clients in traditional industries to develop industry-specific AI models, demonstrating the feasibility of AI applications even in complex traditional sectors [29][30]
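The PUE figure cited above is defined as total facility energy divided by IT-equipment energy, so 1.267 means roughly 26.7% overhead (cooling, power conversion) on top of the compute itself. A one-line check, with hypothetical energy figures chosen to reproduce the quoted ratio:

```python
# PUE = total facility energy / IT equipment energy.
# The kWh figures below are hypothetical, chosen so the ratio matches
# the 1.267 quoted in the summary.

it_energy_kwh = 1_000_000
total_energy_kwh = 1_267_000
pue = total_energy_kwh / it_energy_kwh
print(round(pue, 3))  # 1.267
```

An ideal data center would approach PUE 1.0 (all energy reaching the IT load); values below ~1.3 are generally considered efficient for large facilities.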
The AI sports coach is here! A Chinese team builds SportsGPT, making the intelligent leap from numerical assessment to professional coaching
量子位· 2025-12-22 01:40
Core Insights
- Most "intelligent" sports systems remain at the "scoring + visualization" stage, lacking actionable insights for athletes and coaches [1]
- The SportsGPT framework aims to provide a complete intelligent loop from "motion assessment" to "professional diagnosis" and "training prescription" [5][37]

Group 1: Limitations of Current Models
- General large models like GPT-5 struggle with specialized sports biomechanics analysis due to their lack of fine-grained visual perception, leading to generic and sometimes physically infeasible suggestions [3][9]
- A comparative evaluation shows that SportsGPT outperforms other models in accuracy (3.80) and feasibility (3.77), indicating its unique advantage in generating precise, actionable training guidance [8][9]

Group 2: Motion Analysis Techniques
- MotionDTW is a two-stage time-series alignment algorithm designed for sports motion analysis, addressing traditional DTW's limitations by constructing a high-dimensional feature space [10][21]
- The algorithm employs a weighted multimodal feature space to eliminate errors caused by differences in athletes' bodies, and incorporates dynamic features like angular velocity to enhance the representation of motion phases [12][18]

Group 3: Diagnostic Capabilities
- KISMAM serves as a bridge between raw biomechanical data and interpretable diagnostics, establishing a quantitative benchmark based on data from 100 youth sprinters [25][26]
- The model quantifies deviations from standard thresholds and constructs a high-dimensional mapping matrix to capture the complex relationships between motion anomalies and technical issues [28][30]

Group 4: Training Guidance
- SportsRAG, built on a large external knowledge base, enhances the generation of training guidance by integrating domain knowledge with diagnostic results, ensuring actionable recommendations [33][34]
- Removing the RAG module significantly reduces the feasibility of the model's outputs, demonstrating its critical role in transforming diagnostic insights into professional training prescriptions [34]

Group 5: Conclusion
- The SportsGPT framework represents a significant advance in intelligent sports training, moving from mere data presentation to executable, expert-level guidance [37]
- It establishes a new standard in smart sports by effectively addressing motion analysis, diagnosis, and training instruction end to end [37]
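Classical dynamic time warping, which MotionDTW extends, aligns two motion sequences by minimizing cumulative frame-to-frame distance. A minimal single-feature sketch of that base algorithm (MotionDTW's weighted multimodal feature space and two-stage design are not reproduced here):

```python
# Minimal classic DTW over 1-D feature sequences, shown only to
# illustrate the alignment idea that MotionDTW builds on.

def dtw_distance(a: list[float], b: list[float]) -> float:
    """Return the minimal cumulative alignment cost between a and b."""
    INF = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])          # local frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # match frames
    return cost[n][m]

# Identical sequences align at zero cost; a time-stretched copy stays cheap,
# which is why DTW suits comparing motions performed at different speeds.
print(dtw_distance([0, 1, 2, 3], [0, 1, 2, 3]))    # 0.0
print(dtw_distance([0, 1, 2, 3], [0, 0, 1, 2, 3]))  # 0.0
```

In a real sports setting each "frame" would be a feature vector (joint angles, angular velocities) with a weighted distance, which is the part MotionDTW redesigns.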
Rapid-fire breakdown of MiniMax's IPO prospectus! A world-leading large model at just 1% of OpenAI's cost — the young fist really is to be feared
量子位· 2025-12-21 15:10
Core Viewpoint
- MiniMax, a leading AI model unicorn, has successfully passed the Hong Kong Stock Exchange listing hearing, signaling its IPO ambitions amid discussions about a bubble in large AI models like OpenAI [1][3]

Group 1: Company Overview
- MiniMax has raised over $1.5 billion in funding within four years, attracting investments from notable firms such as miHoYo, Alibaba, Tencent, and others [3][62]
- The company has a global presence, serving over 200 countries, with 70% of its revenue coming from international markets [6][42]
- MiniMax aims to achieve Artificial General Intelligence (AGI) and views scalability as a core driver toward this goal [8][7]

Group 2: Technological Advancements
- MiniMax is one of the few companies that invested in multimodal model development from its inception [10]
- The company has released several models, including the M1 and M2 text models, with M2 achieving top rankings in performance and cost efficiency [16][17]
- MiniMax has also developed leading voice, music, and video models, with its video model Hailuo ranking in the top tier of international tests [20][25][26]

Group 3: Financial Performance
- MiniMax's revenue surged from $346,000 in 2023 to $30.52 million in 2024, marking a 782.2% increase [39]
- By the first nine months of 2025, revenue reached $53.44 million, significantly surpassing the previous year's full-year total [40]
- Gross margin improved from -24.7% in 2023 to 23.3% in the first nine months of 2025 [45][46]

Group 4: Operational Efficiency
- R&D expenses have increased significantly, but spending efficiency has improved: training-related cloud computing costs fell from over 1365% of revenue in 2023 to 266.5% in 2025 [52][54]
- The company holds cash reserves of $1.102 billion, sufficient to sustain operations for over 53 months without additional fundraising [58][59]
- MiniMax's team is young, with an average age of 29, and a high proportion of R&D personnel, which contributes to its innovative and efficient operational model [70][71]
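The 53-month runway claim above implies an average monthly burn rate. A quick back-of-envelope check (simple division, assuming constant burn, which is a simplification of how runway is actually computed in a prospectus):

```python
# Back-of-envelope: runway (months) = cash reserve / monthly burn.
# From the figures above: $1.102B cash and "over 53 months" of runway
# imply a burn of at most about $20.8M per month (assumed constant).

cash_usd = 1.102e9
runway_months = 53
implied_monthly_burn = cash_usd / runway_months
print(f"${implied_monthly_burn / 1e6:.1f}M per month")  # ≈ $20.8M
```

Since the prospectus says "over 53 months," $20.8M/month is an upper bound on the implied burn, consistent with the cost-efficiency narrative of the article.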