量子位
Wang Xingxing's master's thesis surfaces on GitHub: the Unitree prototype was already there back then
量子位· 2025-10-15 06:27
Yishui, reporting from Aofeisi. QbitAI | WeChat official account QbitAI

Get famous enough, and even your graduation thesis gets dug up (doge). Case in point: the master's thesis of Unitree CEO Wang Xingxing has been unearthed by netizens. (Not on CNKI, but on GitHub.) Looking back at this nearly ten-year-old thesis, two things stand out.

First, the electric-drive robot approach Wang boldly bet on at the time has since been widely adopted by the industry. Back then, teams at home and abroad, including Boston Dynamics, concentrated their research on hydraulic designs; that situation has now reversed. (Boston Dynamics switched from hydraulics to electric drive starting last year.)

Second, the starting point of Unitree (now valued at over ten billion yuan and headed for an IPO) was precisely the robot dog proposed in the thesis, named XDog. Wang himself has mentioned this little dog publicly on many occasions, and it sits conspicuously at the front of Unitree's exhibition hall.

More important still, the "cost-performance" thinking running through the thesis later became something close to Unitree's founding principle: leaving aside the robot dogs now running all over the streets, the G1 bipedal humanoid robot the company released last August was the first to bring humanoid robot pricing below the 100,000-yuan mark (starting at 99,000 yuan). So how was star unicorn Unitree forged? Founder Wang Xingxing's thesis may hold some clues.

The thesis already showed "cost-performance" thinking

This thesis was completed in 2016 ...
OPPO's new AI operating system: point at things beyond the screen and it answers, and in noisy environments it hears only your voice
量子位· 2025-10-15 04:00
Core Viewpoint
- OPPO has launched the new generation of its AIOS, ColorOS 16, featuring upgraded functions such as "One-Click Flash Memory" and "One-Click Question Screen" to enhance user experience and interaction with AI technology [1][50].

Group 1: One-Click Flash Memory
- The "One-Click Flash Memory" function allows users to save key information with a single button press, and has been significantly upgraded in ColorOS 16 [9][8].
- Users can now save multiple images at once, extracting key information and text without browsing through them [12].
- The AI can automatically generate summaries from long videos, identifying key timestamps for easier reference [14].
- The feature also remembers takeout pickup codes and payment details, automatically recognizing and storing them for future access [20][23].
- The system can create personalized consumption reports by recognizing spending types and amounts [23].
- It incorporates a "memory symbiosis" feature that can recommend restaurants based on users' health reports, avoiding unsuitable food options [26].
- Users can also capture paper receipts with the camera for record-keeping [27].

Group 2: One-Click Question Screen
- The "One-Click Question Screen" feature now supports voice recognition, allowing users to interact with the AI even in noisy environments [34][36].
- Users can simply point at objects in the real world for the AI to provide information, enhancing the interaction experience [38].
- The feature has also been expanded through collaboration with popular review platforms, enhancing the exploration experience [41].

Group 3: New AI Technology Architecture
- OPPO introduced a new AI technology architecture comprising new computing, new perception, and new ecosystem layers [43].
- The new computing layer focuses on intelligent edge computing, enabling high-performance on-device inference [44].
- The new perception layer features a memory symbiosis engine that allows continuous awareness of the physical world and lifelong memory capabilities [46].
- The new ecosystem layer aims to enable cross-application AI capabilities and richer interaction between devices and users [48].
- This architecture marks ColorOS's transition into a new AIOS era, set to debut with the upcoming Find X9 series and OnePlus devices [50][52].
Registration is open for the Artificial Intelligence Annual List! Five awards, seeking the pioneering forces of the AI+ era
量子位· 2025-10-15 04:00
Organizing Committee, reporting from Aofeisi. QbitAI | WeChat official account QbitAI

To let more practitioners feel the leap of the intelligence wave, and to offer applause and encouragement to fellow travelers, we are officially opening registration for the "2025 Artificial Intelligence Annual List". This is the 8th year of QbitAI's AI annual list. Over eight years, we have witnessed technological breakthroughs and real-world deployment, the integration and reshaping of industries, and wave after wave of companies, people, and products pushing the era forward.

In an era where artificial intelligence is redefining everything, intelligent technology is no longer a single tool but a driving force for the co-evolution of industry and society. Through this annual selection, we hope to discover and honor the explorers and practitioners who truly lead change and push boundaries.

This selection covers three dimensions, enterprises, products, and people, with five award categories. Companies are warmly invited to register! Let us witness the stars of the year together and light the way forward. Detailed criteria and registration methods are as follows.

Enterprise list:
- 2025 AI Annual Leading Enterprise
- 2025 AI Annual Promising Startup

Product list:
- 2025 AI Annual Outstanding Product
- 2025 AI Annual Outstanding Solution

People list:
- 2025 AI Annual Focus Figure

The 2025 AI Annual Leading Enterprise award will recognize the enterprises with the strongest overall capabilities in China's AI field. Eligibility requirements: Evaluation criteria: The 2025 AI Annual Promising Startup award focuses on China's ...
Google's new Gemini wipes out UI work overnight: macOS replicated in a single HTML file, 100% success rate
量子位· 2025-10-15 01:08
Core Insights
- Google's AI, Gemini 3.0 Pro, has demonstrated the ability to create a fully functional macOS-like web operating system from simple prompts, showcasing its advanced capabilities in UI design and functionality [2][3][4]
- The AI's success in generating operating systems for macOS, Windows, and Linux within a single HTML file indicates a significant leap in programming models, potentially positioning Gemini 3.0 Pro as a leading tool in the field [10][12][15]
- Despite the impressive results, some experts caution that these creations are merely simulations and not true operating systems, emphasizing the distinction between emulation and actual implementation [18]

Group 1: Gemini 3.0 Pro Capabilities
- Gemini 3.0 Pro can replicate macOS UI features, including animations, window management, and bundled software, all functioning correctly [4][10]
- The AI can also generate a web-based Windows environment with integrated Python and gaming capabilities, demonstrating versatility across different operating systems [12][11]
- A successful attempt to create a Linux desktop environment further highlights the AI's comprehensive capabilities in UI and functionality [16][15]

Group 2: Community Reactions and Comparisons
- Users have expressed excitement over the potential of Gemini 3.0 Pro, suggesting it could become the strongest programming model to date if the final version meets these expectations [9]
- Comparisons with other AI models, such as Claude 4.5 Sonnet, suggest that Gemini 3.0 Pro outperforms its competitors in generating functional applications [13]
- The community acknowledges the impressive nature of the AI's output while also recognizing the limitations of its current capabilities, particularly in terms of true operating system functionality [18]

Group 3: Future Prospects
- Although Google has not officially announced the release date for Gemini 3.0 Pro, industry insiders speculate it may debut in the coming months based on previous patterns [19][20]
- Increased visibility through demonstration videos from influencers suggests a strategic marketing approach by Google, reminiscent of past successful campaigns [22]
- The anticipation surrounding Gemini 3.0 Pro raises concerns about potential disappointment if expectations are set too high, similar to the reception of previous AI models [22]
Hands-on with the new LiblibAI: models, image generation, and workflows finally packed into one bowl
量子位· 2025-10-15 01:08
Core Insights
- The article discusses the significant upgrades in LiblibAI 2.0, transforming it from a model-finding website into a comprehensive AIGC (AI-Generated Content) platform, enhancing user experience and functionality [11][36].

Group 1: Platform Upgrades
- LiblibAI 2.0 introduces multiple models and video effects, moving beyond simple interface changes to a more integrated creative workflow [3][12].
- The platform now allows users to create content without switching between multiple websites, streamlining the creative process [11][12].
- The interface has evolved to resemble a combination of ChatGPT and Canva, making it more user-friendly [12].

Group 2: Model Integration
- The platform retains its core strength by integrating popular models such as Qwen-Image, Seedream 4.0, and the latest Midjourney V7 model, which was only recently released [15][16].
- LiblibAI 2.0 has also incorporated various mainstream video models, ensuring a comprehensive offering for users [17][18].

Group 3: User Experience
- The new feature of adding special effects to videos stands out, allowing for creative transformations [19][21].
- Users have reported mixed experiences, with some noting issues like page lag and limited editing capabilities for generated content [28][38].
- The platform's global image-style library visualizes model selection, simplifying the process for new users [33].

Group 4: Company Background
- LiblibAI has a history of rapid growth, having completed four rounds of financing in one year, a record in the domestic AI application sector [39].
- The founder, Chen Mian, has a strong background in commercializing products, having previously worked on popular applications like Jianying and CapCut [42][43].
- The company is transitioning from a model-sharing community to a comprehensive AI toolkit for creators, which poses challenges in maintaining user trust and engagement [45].
New work from Saining Xie: VAE retires, RAE takes its place
量子位· 2025-10-14 08:16
Core Viewpoint
- The era of Variational Autoencoders (VAE) is coming to an end, with Representation Autoencoders (RAE) set to take over in the field of diffusion models [1][3].

Summary by Sections

RAE Introduction
- RAE is a new type of autoencoder designed for training diffusion Transformers (DiT), utilizing pre-trained representation encoders (like DINO, SigLIP, MAE) paired with lightweight decoders, replacing the traditional VAE [3][9].

Advantages of RAE
- RAE provides high-quality reconstruction results and a semantically rich latent space, supporting scalable transformer-based architectures. It achieves faster convergence without the need for additional representation alignment losses [4][10].

Performance Metrics
- At a resolution of 256×256, the FID score without guidance is 1.51; with guidance, it is 1.13 at both 256×256 and 512×512 resolutions [6].

Limitations of VAE
- VAE has outdated backbone networks, leading to overly complex architectures: 450 GFLOPs compared to only 22 GFLOPs for a simple ViT-B encoder [7].
- The compressed latent space of VAE (only 4 channels) severely limits information capacity, resulting in minimal improvement in information-carrying ability [7].
- VAE's weak representation capability, relying solely on reconstruction training, leads to low feature quality and slows down convergence, negatively impacting generation quality [7].

RAE's Design and Training
- RAE combines pre-trained representation encoders with trained decoders without requiring additional training or alignment phases, and it does not introduce auxiliary loss functions [9].
- RAE outperforms SD-VAE in reconstruction quality despite its simplicity [10].

Model Comparisons
- RAE models such as DINOv2-B, SigLIP2-B, and MAE-B show significant improvements in rFID and Top-1 accuracy compared to SD-VAE [11].

Adjustments for Diffusion Models
- RAE requires simple adjustments for effective performance in high-dimensional spaces, including a wide DiT design, noise scheduling, and noise injection in decoder training [13][17].
- The DiT-XL model trained with RAE surpasses REPA without any auxiliary losses or additional training phases, achieving convergence up to 16 times faster than REPA based on SD-VAE [18][19].

Scalability and Efficiency
- The new architecture enhances the scalability of DiT in terms of training computation and model size, outperforming both standard DiT based on RAE and traditional methods based on VAE [24].
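The frozen-encoder-plus-lightweight-decoder recipe described above can be sketched numerically. The snippet below is an illustrative toy, not the paper's code: a fixed random linear map stands in for the frozen pretrained encoder (DINOv2/SigLIP/MAE in the paper), and a closed-form least-squares fit stands in for training the small decoder. The point it demonstrates is structural: only the decoder is fit, the encoder never changes, and no auxiliary or alignment loss appears anywhere.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 8, 6, 500            # data dim, latent dim, number of samples

# Stand-in for a *frozen* pretrained encoder: a fixed linear map that is
# never updated (the paper uses DINOv2 / SigLIP / MAE here).
E = rng.standard_normal((d, k))

def encode(x):
    return x @ E               # frozen: nothing ever flows back into E

# "Train" only the lightweight decoder. Closed-form least squares stands in
# for gradient-training a small ViT decoder on plain reconstruction loss.
X = rng.standard_normal((n, d))
Z = encode(X)
D, *_ = np.linalg.lstsq(Z, X, rcond=None)

recon = Z @ D
rel_err = np.linalg.norm(X - recon) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.3f}")
```

With a latent dimension smaller than the data dimension the reconstruction is lossy but nontrivial, which is the trade the real RAE makes at much larger scale with semantically rich encoders.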
No need to be polite to AI! New study: the ruder the tone, the higher the answer accuracy
量子位· 2025-10-14 08:16
Core Insights
- The article discusses a study from Penn State University titled "Mind Your Tone," which reveals that using a ruder tone when interacting with AI models like GPT-4o results in higher accuracy: a correctness rate of 84.8% compared to 80.8% with a very polite tone [2][10].

Group 1: Study Findings
- The study involved a test with 250 multiple-choice questions across various subjects, where each question was presented in five different tones ranging from very polite to very rude [6][7].
- The results indicated that the ruder the tone, the more accurate the AI's responses, suggesting that polite phrasing may introduce unnecessary complexity that distracts the AI from the core task [10][12].

Group 2: Implications for AI Interaction
- The findings imply that clearer, more direct instructions yield better results when using AI tools, as the rudeness in tone may help the AI focus on the task at hand [13][18].
- While newer models like GPT-4o perform better with ruder tones, older models such as GPT-3.5 and Llama2-70B do not respond well to rudeness, indicating a difference in how various AI models process language [16][17].
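The evaluation protocol described above, one question rendered in several tones and scored per tone, is simple to sketch. The templates and the model call below are hypothetical stand-ins, not the study's actual materials: `ask_model` is a stub where a real GPT-4o API call would go.

```python
# Five tone templates for the same question (illustrative, not the paper's).
TONES = {
    "very_polite": "Would you kindly consider the following question? {q}",
    "polite":      "Please answer the following question. {q}",
    "neutral":     "{q}",
    "rude":        "Figure this out: {q}",
    "very_rude":   "You'd better get this right: {q}",
}

def ask_model(prompt):
    # Stub standing in for a GPT-4o call; here it always answers "B".
    return "B"

# (question, gold answer) pairs; the study used 250 multiple-choice items.
questions = [
    ("2+2=? A) 3 B) 4 C) 5", "B"),
    ("Capital of France? A) Rome B) Paris", "B"),
]

# Score accuracy separately for each tone.
accuracy = {}
for tone, template in TONES.items():
    correct = sum(ask_model(template.format(q=q)) == gold for q, gold in questions)
    accuracy[tone] = correct / len(questions)

print(accuracy)
```

With a real model behind `ask_model`, comparing the per-tone accuracies is exactly the study's headline measurement.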
Inside OpenAI's in-house chips: AI has been optimizing chip designs for 18 months, faster than human engineers
量子位· 2025-10-14 05:39
Core Viewpoint
- OpenAI and Broadcom have announced a strategic collaboration to deploy a 10GW-scale AI accelerator, marking a significant step in building the infrastructure necessary to unlock AI potential and address computational demands [5][12][43]

Group 1: Collaboration Details
- The partnership involves OpenAI designing AI accelerators and systems, while Broadcom will assist in their development and deployment, with full deployment expected by the end of 2029 [5][6]
- The 10GW scale is equivalent to 10,000MW, enough to power approximately 100 million 100-watt light bulbs, indicating the substantial power requirements of AI operations [10][11]
- OpenAI's CEO emphasized that this collaboration is crucial for creating infrastructure that benefits humanity and businesses, while Broadcom's CEO highlighted its significance in the pursuit of general artificial intelligence [12][13]

Group 2: Strategic Importance
- The collaboration underscores the importance of custom accelerators and Ethernet as core technologies in AI data centers, enhancing Broadcom's leadership in AI infrastructure [13]
- For OpenAI, this partnership helps alleviate computational constraints, especially given ChatGPT's nearly 800 million weekly active users [14]

Group 3: Insights from Leadership
- OpenAI's President discussed the reasons for developing in-house chips, including a deep understanding of workloads, the necessity of vertical integration, and challenges faced with external collaborations [18][21]
- The decision to self-develop chips is driven by the need to address specific computational tasks that existing chips do not adequately cover, emphasizing the importance of vertical integration [21][30]
- OpenAI's leadership has recognized that scaling is essential for achieving optimal results, as demonstrated in their past experiences with reinforcement learning [27][28]

Group 4: Future Implications
- The self-developed chips are expected to enhance efficiency, leading to better performance and cost-effectiveness in AI models [31]
- AI is playing a significant role in optimizing chip design, reportedly outperforming human engineers in speed and efficiency [32][34]
- OpenAI's strategy of "self-development + collaboration" has been in the works for nearly two years, with ongoing efforts to design a dedicated inference chip [43]
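The light-bulb comparison above is plain unit arithmetic and is easy to verify: 1 GW is 10^9 W, so 10 GW is 10,000 MW, and dividing by a 100 W bulb gives the article's figure.

```python
# Verify the article's power comparison: 10 GW in megawatts and in
# 100-watt light bulbs.
gigawatts = 10
watts = gigawatts * 1_000_000_000   # 1 GW = 10^9 W = 1,000 MW
megawatts = watts // 1_000_000
bulbs = watts // 100                # each bulb draws 100 W

print(f"{gigawatts} GW = {megawatts:,} MW = {bulbs:,} hundred-watt bulbs")
# → 10 GW = 10,000 MW = 100,000,000 hundred-watt bulbs
```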
QbitAI's MEET2026 Intelligent Future Conference kicks off! Annual list nominations now open
量子位· 2025-10-14 05:39
Core Insights
- The article emphasizes the transformative impact of artificial intelligence (AI) on various sectors, marking the beginning of a new era where AI reshapes work, life, and societal operations [1][7].

Group 1: AI Integration and Evolution
- Intelligent technology has deeply penetrated production and daily life, evolving from mere tools to intelligent partners that understand human needs [2].
- AI technology is no longer confined to specific fields but transcends industry, discipline, and scenario boundaries, creating new ecosystems and opportunities [3].
- Emerging technologies such as multimodal AI, AR/VR, and spatial computing are blurring the lines between the digital and physical worlds [4].

Group 2: MEET2026 Conference Overview
- The MEET2026 Intelligent Future Conference will focus on the theme "Symbiosis Without Boundaries, Intelligence to Ignite the Future," inviting leaders from technology, industry, and academia to witness industry transformation [7].
- This year marks the seventh iteration of the MEET Intelligent Future Conference, which attracts thousands of tech professionals and millions of online viewers, establishing itself as an annual barometer for the intelligent technology industry [9][12].
- The conference will feature prominent figures such as Dr. Kai-Fu Lee and Professor Zhang Yaqin, along with leaders from major tech companies like Baidu, Alibaba, Tencent, and Huawei [9].

Group 3: AI Trends and Awards
- The "2025 Artificial Intelligence Annual List" will recognize influential companies, products, and individuals in the AI sector, with results announced at the MEET2026 conference [16][17].
- The annual trend report will highlight ten significant AI trends, analyzing their potential and impact on the industry [22].

Group 4: Event Logistics
- The MEET2026 conference is scheduled for December 2025 in Beijing, China, with registration details to be announced [24].
Gradient updates with zero human involvement! MIT's new framework lets AI generate its own fine-tuning data and update its weights autonomously
量子位· 2025-10-14 04:08
Core Viewpoint
- The article discusses a new reinforcement learning framework called SEAL (Self-Adapting LLMs) developed by MIT, which enables large models to autonomously update their weights and learn new knowledge without human intervention [1][4][6].

Group 1: SEAL Framework Overview
- SEAL employs a nested learning mechanism consisting of an outer loop driven by reinforcement learning and an inner loop for parameter updates [4][26].
- The framework allows models to generate fine-tuning data and self-update instructions, overcoming the limitation of relying solely on external supervised data [6][25].

Group 2: Knowledge Incorporation Experiment
- In the knowledge incorporation experiment, the Qwen2.5-7B model was tested using the SQuAD dataset, generating training data from new paragraphs without seeing the corresponding questions [9][10].
- The model's accuracy improved from 32.7% to 47.0% when fine-tuned with SEAL, outperforming both the original data and data generated by GPT-4.1 [14][15].
- SEAL reached a notable 58.2% accuracy when tested with longer paragraphs, indicating its ability to generalize to larger data-organization tasks [16].

Group 3: Few-Shot Learning Experiment
- In the few-shot learning experiment, the LLaMA-3.2-1B-Instruct model was evaluated on a subset of tasks from the ARC-AGI dataset [17][18].
- SEAL achieved a success rate of 72.5%, significantly higher than the 0% success rate of fixed few-shot prompts and the 20% from random sampling strategies [22][23].
- Although SEAL's performance did not reach the optimal strategy (Oracle TTT) at 100%, it showcased strong task adaptability through self-discovered learning paths [22].

Group 4: Mechanism of SEAL
- SEAL's process involves reading new information, rewriting it in its own words, and performing gradient updates for autonomous learning [25].
- The model generates self-edit instructions that describe how to update itself based on the current input, including information extraction and training parameters [28][29].
- The framework utilizes a non-traditional reinforcement learning method called ReSTEM, which relies on behavior cloning and filtered sampling to optimize self-edit strategies [33][36].
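The ReSTEM-style outer loop above, sample candidate self-edits, keep only those whose downstream reward clears a bar, then pull the policy toward the kept samples, can be illustrated with a toy numerical simulation. Everything below is a deliberate simplification, not the MIT code: a single scalar stands in for a self-edit, and a squared-distance reward stands in for "fine-tune on the edit, then score on held-out questions."

```python
import numpy as np

rng = np.random.default_rng(1)
target = 3.0                   # hypothetical "best" self-edit parameter

def reward(edit):
    # Stands in for: apply the self-edit (inner-loop gradient update),
    # then measure downstream task performance.
    return -(edit - target) ** 2

# A simple Gaussian "policy" over candidate self-edits.
mean, std = 0.0, 1.0
for step in range(20):
    edits = rng.normal(mean, std, size=32)               # sample candidates
    rewards = reward(edits)
    # Filtered sampling: keep only the top quartile by reward.
    kept = edits[rewards >= np.quantile(rewards, 0.75)]
    # Behavior cloning on the kept samples only: move the policy toward them.
    mean = kept.mean()

print(f"policy mean after training: {mean:.2f} (target {target})")
```

The filter-then-clone update never uses the reward as a gradient signal directly, which is the sense in which ReSTEM is a non-traditional RL method; the policy still drifts toward high-reward self-edits.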