量子位
Chinese Vendors Sweep the Top 5 Open-Source Models
量子位· 2025-10-15 06:27
Core Insights
- The article highlights the significant rise of Chinese open-source large models, with notable mentions of Alibaba's Qwen series and DeepSeek, which have had a profound impact on the open-source community since the second half of 2024 [1][6][20].

Model Rankings
- Chinese open-source models have moved from being followers to leaders in the field, as evidenced by their positions in the LMArena rankings, where models like GLM-4.6 and DeepSeek-v3.2 are closely following top proprietary models such as GPT-5 and Gemini-2.5-pro [7][10].
- Qwen3-max-preview has reached the top three in the rankings, although it is not yet open-sourced [8].

Performance in Various Domains
- In the text generation domain, Chinese models like DeepSeek-R1/V3.1 and GLM-4.6 are competing closely with leading proprietary models [10].
- In web development tasks, models such as DeepSeek-R1-0528 and Qwen3-Coder have also made it to the top ten [11].
- In the visual domain, Tencent's Hunyuan-vision-1.5 and Qwen3 are among the strongest open-source models, with Hunyuan-vision-1.5 still in the planning phase for open-sourcing [12].

Popularity and Downloads
- Qwen3 is noted as one of the most downloaded models, leading among open-source models when scaled to hundreds of billions of parameters [18].
- The most popular model currently is DeepSeek-R1, indicating strong user engagement and preference [17].

Industry Trends
- The article suggests that the shift in dominance within the open-source model landscape is not just about who leads but may redefine the global innovation landscape [21].
- The driving force behind this momentum is increasingly recognized as coming from China, indicating a potential shift in the global AI development paradigm [20].
Wang Xingxing's Master's Thesis Surfaces on GitHub: Unitree's Prototype Was Already There
量子位· 2025-10-15 06:27
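By Yishui, from Aofeisi. 量子位 | WeChat official account QbitAI

Once you're famous, even your graduation thesis gets dug up (doge). Sure enough, netizens have unearthed the master's thesis of Wang Xingxing, CEO of Unitree Robotics. (It was found on GitHub, not on CNKI.)

Looking back at this nearly ten-year-old thesis, two points stand out.

First, the electric-drive robot design that Wang boldly bet on has since been widely accepted by the industry. At the time, teams in China and abroad, Boston Dynamics included, concentrated their research on hydraulic designs; today the situation has reversed. (Boston Dynamics began switching from hydraulic to electric drive last year.)

Second, Unitree (now valued at 10 billion yuan and about to IPO) actually began with XDog, the small robot dog proposed in the thesis. Wang has mentioned this dog publicly on several occasions, and it sits prominently at the very start of Unitree's exhibition hall.

More importantly, the "value for money" thinking in the thesis later became practically the foundation of Unitree's business. Leaving aside the robot dogs that are now everywhere, the G1 bipedal humanoid robot the company released last August brought humanoid robot prices down to the 100,000-yuan mark for the first time (starting at 99,000 yuan).

So how was the star unicorn Unitree forged? Founder Wang Xingxing's thesis may offer some clues.

The thesis already shows "value for money" thinking about robots

The thesis was completed in 2016 ...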
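OPPO's New AI Operating System Steps Beyond the Screen: Point at Anything and It Answers, and It Hears Only Your Voice in Noisy Environments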
量子位· 2025-10-15 04:00
Core Viewpoint
- OPPO has launched the new generation of AIOS, ColorOS 16, featuring upgraded functionalities such as "One-Click Flash Memory" and "One-Click Question Screen" to enhance user experience and interaction with AI technology [1][50].

Group 1: One-Click Flash Memory
- The "One-Click Flash Memory" function allows users to save key information with a single button press, which has been significantly upgraded in ColorOS 16 [9][8].
- Users can now save multiple images at once, extracting key information and text without the need to browse through them [12].
- The AI can automatically generate summaries from long videos, identifying key timestamps for easier reference [14].
- This feature also enables users to remember takeout codes and payment details, automatically recognizing and storing them for future access [20][23].
- The system can create personalized consumption reports by recognizing spending types and amounts [23].
- It incorporates a "memory symbiosis" feature, which can recommend restaurants based on users' health reports, avoiding unsuitable food options [26].
- Users can also capture paper receipts using the camera for record-keeping [27].

Group 2: One-Click Question Screen
- The "One-Click Question Screen" feature has been updated to support voice recognition, allowing users to interact with AI even in noisy environments [34][36].
- Users can simply point at objects in the real world for the AI to provide information, enhancing the interaction experience [38].
- This feature has been expanded to include collaboration with popular review platforms, enhancing the exploration experience [41].

Group 3: New AI Technology Architecture
- OPPO introduced a new AI technology architecture that includes new computing, new perception, and new ecosystem layers [43].
- The new computing aspect focuses on intelligent edge computing, enabling high-performance inference capabilities [44].
- The new perception layer features a memory symbiosis engine that allows for continuous awareness of the physical world and lifelong memory capabilities [46].
- The new ecosystem aims to facilitate cross-application AI capabilities and enhance interaction between devices and users [48].
- This architecture marks the transition of ColorOS into a new AIOS era, set to debut with the upcoming Find X9 series and OnePlus devices [50][52].
Applications Now Open for the Annual AI List! Five Awards Seeking the Pioneers of the AI+ Era
量子位· 2025-10-15 04:00
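By the Organizing Committee, from Aofeisi. 量子位 | WeChat official account QbitAI

To let more practitioners feel the leap of the intelligence wave, and to offer applause and encouragement to more fellow travelers, we are officially opening applications for the "2025 Artificial Intelligence Annual List".

This is the 8th year of the 量子位 AI annual list. Over eight years, we have witnessed technological breakthroughs and real-world deployment, the convergence and reshaping of industries, and wave after wave of companies, people, and products that pushed the era forward.

In an era in which AI is redefining everything, intelligent technology is no longer a single tool but a driving force for the co-evolution of industry and society. Through this annual selection, we hope to discover and honor the explorers and practitioners who are truly leading change and pushing boundaries.

The selection sets up five awards across three dimensions: companies, products, and people. Companies are welcome to apply! Let us witness the stars of the year together and light the way forward.

Company list
- 2025 AI Leading Company of the Year
- 2025 AI Promising Startup of the Year

Product list
- 2025 AI Outstanding Product of the Year
- 2025 AI Outstanding Solution of the Year

People list
- 2025 AI Focus Person of the Year

Detailed selection criteria and application instructions follow.

2025 AI Leading Company of the Year
Open to China's AI sector, this award selects the companies with the strongest overall capabilities. Eligibility: Selection criteria:

2025 AI Promising Startup of the Year
Focused on Chinese ...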
Google's New Gemini Upends UI Overnight: macOS Recreated in a Single HTML File, with a 100% Success Rate
量子位· 2025-10-15 01:08
Core Insights
- Google's AI, Gemini 3.0 Pro, has demonstrated the ability to create a fully functional macOS-like web operating system from simple prompts, showcasing its advanced capabilities in UI design and functionality [2][3][4].
- The AI's success in generating operating systems for macOS, Windows, and Linux within a single HTML file indicates a significant leap in programming models, potentially positioning Gemini 3.0 Pro as a leading tool in the field [10][12][15].
- Despite the impressive results, some experts caution that these creations are merely simulations and not true operating systems, emphasizing the distinction between emulation and actual implementation [18].

Group 1: Gemini 3.0 Pro Capabilities
- Gemini 3.0 Pro can replicate macOS UI features, including animations, window management, and bundled software, all functioning correctly [4][10].
- The AI can also generate a web-based Windows environment with integrated Python and gaming capabilities, demonstrating versatility across different operating systems [12][11].
- A successful attempt to create a Linux desktop environment further highlights the AI's comprehensive capabilities in UI and functionality [16][15].

Group 2: Community Reactions and Comparisons
- Users have expressed excitement over the potential of Gemini 3.0 Pro, suggesting it could become the strongest programming model to date if the final version meets these expectations [9].
- Comparisons with other AI models, such as Claude 4.5 Sonnet, reveal that Gemini 3.0 Pro outperforms its competitors in generating functional applications [13].
- The community acknowledges the impressive nature of the AI's output while also recognizing the limitations of its current capabilities, particularly in terms of true operating system functionality [18].

Group 3: Future Prospects
- Although Google has not officially announced the release date for Gemini 3.0 Pro, industry insiders speculate it may debut in the coming months based on previous patterns [19][20].
- Increased visibility through demonstration videos from influencers suggests a strategic marketing approach by Google, reminiscent of past successful campaigns [22].
- The anticipation surrounding Gemini 3.0 Pro raises concerns about potential disappointment if expectations are set too high, similar to the reception of previous AI models [22].
Hands-On with the New LiblibAI: Models, Image Generation, and Workflows Finally in One Bowl
量子位· 2025-10-15 01:08
Core Insights
- The article discusses the significant upgrades in LiblibAI 2.0, transforming it from a model-finding website to a comprehensive AIGC (AI-Generated Content) platform, enhancing user experience and functionality [11][36].

Group 1: Platform Upgrades
- LiblibAI 2.0 introduces multiple models and video effects, moving beyond simple interface changes to a more integrated creative workflow [3][12].
- The platform now allows users to create content without switching between multiple websites, streamlining the creative process [11][12].
- The interface has evolved to resemble a combination of ChatGPT and Canva, making it more user-friendly [12].

Group 2: Model Integration
- The platform retains its core strength by integrating popular models such as Qwen-Image, Seedream 4.0, and the latest Midjourney V7 model, which was only recently released [15][16].
- LiblibAI 2.0 has also incorporated various mainstream video models, ensuring a comprehensive offering for users [17][18].

Group 3: User Experience
- The new feature of adding special effects to videos has been highlighted as a standout capability, allowing for creative transformations [19][21].
- Users have reported mixed experiences, with some noting issues like page lag and limited editing capabilities for generated content [28][38].
- The platform's ability to visualize model selection through a global image style library simplifies the process for new users [33].

Group 4: Company Background
- LiblibAI has a history of rapid growth, having completed four rounds of financing in one year, setting a record in the domestic AI application sector [39].
- The founder, Chen Mian, has a strong background in commercializing products, previously working with popular applications like Jianying and CapCut [42][43].
- The company is transitioning from a model-sharing community to a comprehensive AI toolkit for creators, which poses challenges in maintaining user trust and engagement [45].
Saining Xie's New Work: VAE Retires, RAE Takes Over
量子位· 2025-10-14 08:16
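By Shiling, from Aofeisi. 量子位 | WeChat official account QbitAI

Has the once-dominant VAE finally been sentenced to "retirement"? The latest research from Saining Xie's team gives an answer: the VAE era is over, and RAE will carry the baton forward.

RAE (Representation Autoencoders) is a new type of autoencoder for training Diffusion Transformers (DiT). Its core design pairs a pretrained representation encoder (such as DINO, SigLIP, or MAE) with a trained lightweight decoder, replacing the VAE (variational autoencoder) that conventional diffusion models rely on.

The new structure delivers high-quality reconstructions and a semantically rich latent space, while supporting scalable transformer-based architectures. The method also converges faster without requiring an additional representation-alignment loss.

Using a DiT variant equipped with a lightweight, wide DDT head, the team achieved strong image generation results on ImageNet:
- At 256×256 resolution, FID = 1.51 with no guidance;
- At 256×256 and 512 ...

Let's take a closer look.

VAE retires, RAE takes over

Although Diffusion Transformers have advanced considerably, most models still rely on the older 2021 SD-VAE to build their latent space. This raises several core problems: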
No Need to Be Polite to AI! New Study: The Ruder the Tone, the More Accurate the Answer
量子位· 2025-10-14 08:16
Core Insights
- The article discusses a study from Penn State University titled "Mind Your Tone," which reveals that using a ruder tone when interacting with AI models like GPT-4o results in higher accuracy in responses, with a correctness rate of 84.8% compared to 80.8% when using a very polite tone [2][10].

Group 1: Study Findings
- The study involved a test with 250 multiple-choice questions across various subjects, where each question was presented in five different tones ranging from very polite to very rude [6][7].
- The results indicated that the ruder the tone, the more accurate the AI's responses, suggesting that polite phrasing may introduce unnecessary complexity that distracts the AI from the core task [10][12].

Group 2: Implications for AI Interaction
- The findings imply that clearer and more direct instructions yield better results when using AI tools, as the rudeness in tone may help the AI focus on the task at hand [13][18].
- The article notes that while newer models like GPT-4o perform better with ruder tones, older models such as GPT-3.5 and Llama2-70B do not respond well to rudeness, indicating a difference in how various AI models process language [16][17].
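The tone comparison described in this summary can be sketched as a small evaluation harness. This is a minimal sketch under stated assumptions, not the study's code: the tone prefixes, the `ask_model` callable, and the helper names are hypothetical, and only the five-level tone scale and the 84.8% / 80.8% figures come from the summary.

```python
from collections import defaultdict

# Hypothetical tone prefixes; the study's actual wording is not given in this summary.
TONE_PREFIXES = {
    "very_polite": "Would you be so kind as to answer the following question? ",
    "polite": "Please answer the following question. ",
    "neutral": "",
    "rude": "Figure this out if you can. ",
    "very_rude": "Answer this, and try not to get it wrong. ",
}


def build_variants(question):
    """Return one prompt per tone for a single base question."""
    return {tone: prefix + question for tone, prefix in TONE_PREFIXES.items()}


def accuracy_by_tone(items, ask_model):
    """Score a model separately for each tone.

    items: non-empty list of (question, correct_choice) pairs.
    ask_model: hypothetical callable that takes a prompt string and
               returns the chosen option, such as "B".
    """
    correct = defaultdict(int)
    for question, answer in items:
        for tone, prompt in build_variants(question).items():
            if ask_model(prompt) == answer:
                correct[tone] += 1
    return {tone: correct[tone] / len(items) for tone in TONE_PREFIXES}


def percentage_point_gap(acc_a, acc_b):
    """Difference between two accuracies, expressed in percentage points."""
    return round((acc_a - acc_b) * 100, 1)
```

Passing the reported accuracies to `percentage_point_gap(0.848, 0.808)` gives a gap of 4.0 percentage points between the two tones quoted above. Keeping the base question fixed and varying only the prefix is what lets a difference in accuracy be attributed to tone rather than to question content.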
Inside OpenAI's In-House Chip Effort: It Began Using AI to Optimize Chip Design 18 Months Ago, Faster Than Human Engineers
量子位· 2025-10-14 05:39
Core Viewpoint
- OpenAI and Broadcom have announced a strategic collaboration to deploy 10 GW of AI accelerators, marking a significant step in building the infrastructure necessary to unlock AI potential and address computational demands [5][12][43].

Group 1: Collaboration Details
- The partnership involves OpenAI designing AI accelerators and systems, while Broadcom will assist in their development and deployment, with full deployment expected by the end of 2029 [5][6].
- The 10 GW scale is equivalent to 10,000 MW, enough to power approximately 100 million 100-watt light bulbs, indicating the substantial power requirements for AI operations [10][11].
- OpenAI's CEO emphasized that this collaboration is crucial for creating infrastructure that benefits humanity and businesses, while Broadcom's CEO highlighted its significance in the pursuit of artificial general intelligence [12][13].

Group 2: Strategic Importance
- The collaboration underscores the importance of custom accelerators and Ethernet as core technologies in AI data centers, enhancing Broadcom's leadership in AI infrastructure [13].
- For OpenAI, this partnership helps alleviate computational constraints, especially given the nearly 800 million active users of ChatGPT each week [14].

Group 3: Insights from Leadership
- OpenAI's President discussed the reasons for developing in-house chips, including a deep understanding of workloads, the necessity of vertical integration, and challenges faced with external collaborations [18][21].
- The decision to self-develop chips is driven by the need to address specific computational tasks that existing chips do not adequately cover, emphasizing the importance of vertical integration [21][30].
- OpenAI's leadership has recognized that scaling is essential for achieving optimal results, as demonstrated in their past experiences with reinforcement learning [27][28].

Group 4: Future Implications
- The self-developed chips are expected to enhance efficiency, leading to better performance and cost-effectiveness in AI models [31].
- AI is playing a significant role in optimizing chip design, reportedly outperforming human engineers in speed and efficiency [32][34].
- OpenAI's strategy of "self-development + collaboration" has been in the works for nearly two years, with ongoing efforts to design a dedicated inference chip [43].
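The power comparison in Group 1 is plain unit arithmetic, which the short sketch below reproduces. The constant and variable names are illustrative; the 10 GW and 100-watt figures are taken from the summary.

```python
WATTS_PER_MEGAWATT = 1_000_000
WATTS_PER_GIGAWATT = 1_000_000_000

deployment_gw = 10   # planned accelerator deployment, from the summary
bulb_watts = 100     # wattage of the light bulb used in the comparison

# Convert to watts once, then derive both comparison figures from it.
deployment_watts = deployment_gw * WATTS_PER_GIGAWATT
deployment_mw = deployment_watts // WATTS_PER_MEGAWATT
bulbs_powered = deployment_watts // bulb_watts

print(f"{deployment_gw} GW = {deployment_mw:,} MW")
print(f"Equivalent to {bulbs_powered:,} bulbs at {bulb_watts} W each")
```

This prints `10 GW = 10,000 MW` and `Equivalent to 100,000,000 bulbs at 100 W each`, matching the figures quoted in the summary.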
量子位 "MEET2026 Intelligent Future Conference" Launches! Annual List Nominations Now Open
量子位· 2025-10-14 05:39
Core Insights
- The article emphasizes the transformative impact of artificial intelligence (AI) on various sectors, marking the beginning of a new era where AI reshapes work, life, and societal operations [1][7].

Group 1: AI Integration and Evolution
- Intelligent technology has deeply penetrated production and daily life, evolving from mere tools to intelligent partners that understand human needs [2].
- AI technology is no longer confined to specific fields but transcends industry, discipline, and scenario boundaries, creating new ecosystems and opportunities [3].
- Emerging technologies such as multimodal, AR/VR, and spatial computing are blurring the lines between the digital and physical worlds [4].

Group 2: MEET2026 Conference Overview
- The MEET2026 Intelligent Future Conference will focus on the theme "Symbiosis Without Boundaries, Intelligence to Ignite the Future," inviting leaders from technology, industry, and academia to witness industry transformation [7].
- This year marks the seventh iteration of the MEET Intelligent Future Conference, which attracts thousands of tech professionals and millions of online viewers, establishing itself as an annual barometer for the intelligent technology industry [9][12].
- The conference will feature prominent figures such as Dr. Kai-Fu Lee and Professor Zhang Yaqin, along with leaders from major tech companies like Baidu, Alibaba, Tencent, and Huawei [9].

Group 3: AI Trends and Awards
- The "2025 Artificial Intelligence Annual List" will recognize influential companies, products, and individuals in the AI sector, with results announced at the MEET2026 conference [16][17].
- The annual trend report will highlight ten significant AI trends, analyzing their potential and impact on the industry [22].

Group 4: Event Logistics
- The MEET2026 conference is scheduled for December 2025 in Beijing, China, with registration details to be announced [24].