QbitAI (量子位)
Netizens Go Wild with Gemini 3! The Barrier to Building with AI Really Has Dropped to Zero
QbitAI · 2025-11-20 04:09
Core Viewpoint - The article highlights the innovative capabilities of Gemini 3 Pro, showcasing its ability to generate various applications and games through simple prompts, indicating a significant leap in AI technology compared to its predecessor, Gemini 2.5 Pro [9].
Group 1: Application Generation
- Users can create retro-filtered photos instantly by simply posing and clicking a button [3].
- The AI has enabled the rapid expansion of a platform akin to "4399 mini-games," demonstrating its versatility in generating creative and interactive content [9][10].
- Various games and applications can be generated, including a 2D parkour game and a 3D interactive water physics scene, showcasing the AI's capability to produce complex environments and interactions [26][28].
Group 2: User-Generated Content
- Users have shared their creations, such as an Xbox One controller SVG and a 3D Pac-Man game, illustrating the community's engagement and creativity [12][14].
- The AI can transform simple sketches into interactive applications, such as a house layout design from a floor plan [34].
- The ability to generate entire mobile application UI interfaces from basic prompts has been highlighted, emphasizing the ease of use and accessibility for users [38].
Group 3: Tools and Utilities
- Users can create useful tools like a screen recording application that provides real-time prompts based on spoken instructions, enhancing productivity for online meetings and presentations [42][44].
- The AI's capability to adjust video ratios and generate creative video ideas further supports users in content creation without the need for expensive software [45][41].
- The article encourages users to explore their creativity and share their ideas, fostering a collaborative environment for innovation [48].
The First AI Hardware Company Zhu Xiaohu Invested In Closes Another Funding Round
QbitAI · 2025-11-20 00:30
Core Insights
- Gyges Labs has completed a Pre-A+ funding round, attracting investment from Granite Asia and 璀璨资本, following a previous Pre-A round led by GSR Ventures (金沙江创投) [2][4].
- The company aims to create AI glasses that prioritize everyday wearability, emphasizing a "Glass First" philosophy, which contrasts with the trend of bulky, feature-heavy smart glasses [6][7][10].
- Gyges Labs' display technology, DigiWindow, allows for a lightweight design (35 grams) and discreet information display, improving the user experience without compromising on aesthetics [11][14][15].
Funding and Company Background
- Gyges Labs raised significant capital in its Pre-A+ round, building on a previous multi-million RMB Pre-A round [2][4].
- The leadership team includes experienced professionals from Silicon Valley and major tech companies, adding to the company's credibility and innovation potential [4].
Product Philosophy and Design
- The company focuses on minimalism in design, opting for a lightweight and unobtrusive product that can be worn daily without drawing attention [10][14].
- The absence of a camera in the AI glasses reflects a strategic choice to avoid issues related to privacy, battery life, and usability [19][21][22].
AI Functionality and User Experience
- Gyges Labs emphasizes "active AI" capabilities, such as real-time translation and contextual information display, which aim to integrate seamlessly into users' daily lives [26][27].
- The design philosophy prioritizes user comfort and social acceptance, avoiding features that could lead to awkward situations in social settings [27].
Future Vision and Product Development
- Gyges Labs envisions a broader ecosystem of wearable devices, with plans to introduce a smart ring in 2026, expanding its AI capabilities beyond glasses [33][35].
- The company's ultimate goal is to empower users through various wearable technologies, enhancing their capabilities and experiences [35].
The "Strongest Embodied VLA Model": What Exactly Makes It So Strong?
QbitAI · 2025-11-20 00:30
Core Insights - The article discusses the breakthrough of the robot foundation model π*0.6, which showcases its capabilities in performing complex tasks with a success rate exceeding 90% [2][10].
Group 1: Model Overview
- π*0.6 is the latest VLA (Vision-Language-Action) model, building on the previous π0.5, and introduces a novel training method called RECAP [8][10].
- The RECAP method allows robots to learn from their mistakes, shifting from traditional imitation learning to a more intuitive learning approach [3][29].
Group 2: RECAP Methodology
- RECAP consists of three main stages: guidance through human demonstration, correction through expert intervention, and practice through autonomous experience [7][12].
- The model utilizes a value function to evaluate actions, which helps in identifying advantageous actions and improving learning efficiency [19][22].
Group 3: Training Process
- The training process involves offline reinforcement learning using diverse data sources, including human demonstrations and autonomous attempts, to train the value function and policy [20][22].
- The model's architecture has been enhanced, with the backbone expanding from Gemma (2.6B) to Gemma3 (4B) and Action Expert parameters increasing to 860M [25].
Group 4: Performance Evaluation
- In tests involving complex tasks like folding clothes and making espresso, RECAP doubled the throughput and reduced failure rates by approximately 50% compared to models using only supervised fine-tuning [27].
- The model demonstrated high stability, successfully performing tasks for extended periods without human intervention [28].
Group 5: Learning from Failures
- The ability of the model to learn from failures is highlighted as a significant advancement, allowing it to extract effective learning signals from imperfect experiences [29][56].
- This approach opens new avenues for future research in robotics, emphasizing the importance of learning from real-world execution rather than solely relying on ideal demonstrations [56].
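The idea in Group 2, using a value function to flag advantageous actions, can be sketched roughly as follows. This is a generic advantage estimate (return-to-go minus predicted value), not the formulation used for π*0.6, which the summary does not spell out. The function names, the reward convention (-1 per step until success), and the zero threshold are all assumptions made for illustration.

```python
def returns_to_go(rewards, gamma=1.0):
    """Discounted sum of future rewards from each step of one episode."""
    out = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        out.append(running)
    out.reverse()
    return out


def advantage_labels(rewards, values, gamma=1.0, threshold=0.0):
    """Label each step as advantageous when the observed return beats the
    value function's prediction for that state (hypothetical criterion)."""
    rtg = returns_to_go(rewards, gamma)
    advantages = [g - v for g, v in zip(rtg, values)]
    labels = [a > threshold for a in advantages]
    return labels, advantages


# Toy episode: -1 per step until the task finishes (assumed reward scheme).
rewards = [-1.0, -1.0, -1.0, 0.0]
# Hypothetical value predictions for the four visited states.
values = [-4.0, -2.0, -0.5, 0.0]
labels, advantages = advantage_labels(rewards, values)
print(labels, advantages)
```

In this toy episode only the first step did better than the value function expected, so only that step would be marked as advantageous. A policy could then be conditioned on or weighted by such labels, though how RECAP does this is not stated in the summary.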
Build an AI App by Hand in Three Lines of Code! Ant Group's OceanBase Open-Sources Its First AI Database
QbitAI · 2025-11-19 09:01
Core Insights
- OceanBase has launched its first AI-native database, seekdb, designed to meet the demands of the AI era, allowing developers to build AI applications with just three lines of code [8][9][19].
- The database aims to address the challenges enterprises face in integrating multimodal data for AI applications, which often suffer from fragmentation and complexity [11][12][19].
- seekdb features a hybrid search capability that combines vector retrieval, full-text search, and scalar filtering, improving both speed and accuracy [14][19].
Group 1: OceanBase Overview
- OceanBase is a self-developed distributed relational database by Ant Group, launched in 2010, and has evolved over 15 years to become a leading domestic database [3][4].
- The database has over 4,000 global customers and has achieved an average annual growth rate of over 100% for five consecutive years [4].
- As of May 2025, OceanBase has built an active community of over 25,000 developers, with cumulative downloads exceeding one million [5].
Group 2: seekdb Features
- seekdb supports unified storage and retrieval of various data types, including scalar, vector, text, JSON, and GIS, facilitating complex queries without cross-system calls [14].
- The database is designed for easy deployment, requiring only 1 CPU core and 2GB of memory, and can be installed with a single command [16].
- seekdb is open-sourced under the Apache 2.0 license, allowing users to freely use, modify, and extend the software [17].
Group 3: AI Integration
- OceanBase's CEO emphasizes that the real bottleneck in AI is not the models but the data, particularly in high-sensitivity scenarios like finance and government [19].
- seekdb is positioned as a real-time entry layer for integrating large models with private data, aiming to simplify the data architecture for AI applications [20][21].
- The new OceanBase 4.4 version integrates transaction processing, analytical processing, and AI capabilities into a single core, enhancing distributed scalability and high availability [22].
Group 4: Additional Tools
- OceanBase has also released a series of tools alongside seekdb, forming a complete toolchain for AI applications, covering data management, retrieval, analysis, and memory [23].
- PowerRAG is an enterprise-level retrieval-augmented generation solution that simplifies the process of building AI applications like knowledge bases and intelligent customer service [24].
- PowerMem is designed to efficiently manage and recall user interaction context, achieving a top score in the LoCoMo Benchmark while significantly reducing token consumption [26][27].
Group 5: Strategic Vision
- OceanBase's strategy focuses on unifying data across different systems and formats through a multi-load, multi-modal, and hybrid cloud architecture [29].
- The goal is to provide enterprises with a single database core capable of handling transactions, analysis, search, and AI inference, streamlining operations and reducing complexity [31].
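To make the hybrid search idea concrete, here is a toy in-memory sketch that combines the three ingredients the summary names: a scalar filter, a vector similarity score, and a full-text score. It does not use seekdb and does not reflect its API. The document layout, the `year` field, the word-overlap text score, and the `alpha` weighting are all assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def keyword_score(query, text):
    """Fraction of query words that appear in the text (toy full-text score)."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def hybrid_search(docs, query_vec, query_text, min_year, alpha=0.5, k=2):
    """Scalar filter first, then rank by a weighted vector + text score."""
    candidates = [d for d in docs if d["year"] >= min_year]
    scored = [
        (alpha * cosine(query_vec, d["vec"])
         + (1 - alpha) * keyword_score(query_text, d["text"]), d["id"])
        for d in candidates
    ]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


docs = [
    {"id": "a", "vec": [1.0, 0.0], "text": "vector database search", "year": 2025},
    {"id": "b", "vec": [0.0, 1.0], "text": "cooking recipes", "year": 2025},
    {"id": "c", "vec": [1.0, 0.0], "text": "vector database search", "year": 2019},
]
print(hybrid_search(docs, [1.0, 0.0], "vector search", min_year=2024))
```

Document "c" matches the query perfectly but is removed by the scalar filter before ranking. Doing all three steps in one engine is what lets such a query avoid cross-system calls.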
New Work from Kaiming He's Team: Diffusion Models May Have Been Used the Wrong Way
QbitAI · 2025-11-19 09:01
Core Viewpoint - The article discusses a new paper by He Kaiming that challenges the mainstream approach to diffusion models by advocating for a return to the original purpose of denoising, suggesting that models should directly predict clean images instead of noise [2][5][6].
Summary by Sections
Diffusion Models
- Diffusion models have become increasingly complex over the years, often focusing on predicting noise rather than the clean images they were originally designed to denoise [4][6].
- The new paper emphasizes that since diffusion models are fundamentally denoising models, they should directly perform denoising [5][6].
Manifold Hypothesis
- The article explains the manifold hypothesis, stating that natural images exist on a low-dimensional manifold within a high-dimensional pixel space, while noise is uniformly distributed across the entire space [7][9].
- This distinction leads to challenges when neural networks attempt to fit high-dimensional noise, requiring significant model capacity and often resulting in training failures [9].
JiT Architecture
- The proposed architecture, JiT (Just image Transformers), is a simplified model that processes images directly without relying on complex components like VAE or tokenizers [10][11].
- JiT operates by taking raw pixel data, dividing it into large patches, and setting the output target to predict clean image blocks [12].
Experimental Results
- Experimental results indicate that while predicting noise and predicting original images perform similarly in low-dimensional spaces, traditional noise prediction models fail in high-dimensional spaces, while JiT remains robust [14].
- JiT demonstrates excellent scalability, maintaining high-quality generation even when input dimensions are significantly increased [15][17].
- The JiT architecture achieved state-of-the-art FID scores of 1.82 and 1.78 on ImageNet datasets of 256x256 and 512x512, respectively, without relying on complex components or pre-training [18][19].
Research Focus
- The primary research direction of He Kaiming includes representation learning, generative models, and their synergistic effects, aiming to build intelligent visual systems that understand the world beyond human perception [21].
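The difference between predicting noise and predicting the clean image can be illustrated with a small sketch. It assumes a simple linear interpolation between clean data and noise; the summary does not state the paper's schedule, so the mixing rule, the function names, and the convention that `t = 1` means clean data are assumptions. The point is only that the two targets are algebraically linked through the noisy input, while the network's regression target differs.

```python
def add_noise(x0, eps, t):
    """Noisy sample z_t = t * x0 + (1 - t) * eps (assumed linear schedule)."""
    return [t * x + (1 - t) * e for x, e in zip(x0, eps)]


def eps_from_x0(z_t, x0_pred, t):
    """Recover the implied noise from a clean-image prediction."""
    return [(z - t * x) / (1 - t) for z, x in zip(z_t, x0_pred)]


def x0_from_eps(z_t, eps_pred, t):
    """Recover the implied clean image from a noise prediction."""
    return [(z - (1 - t) * e) / t for z, e in zip(z_t, eps_pred)]


def mse(pred, target):
    """Mean squared error, the usual regression loss for either target."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


x0 = [0.5, -0.25]   # toy "clean image" with two pixels
eps = [1.0, 2.0]    # toy noise sample
t = 0.5
z_t = add_noise(x0, eps, t)

# x-prediction: the network output is compared against the clean image.
loss_x = mse([0.4, -0.2], x0)
# eps-prediction: the network output is compared against the noise.
loss_eps = mse([0.9, 2.1], eps)
print(z_t, eps_from_x0(z_t, x0, t), x0_from_eps(z_t, eps, t), loss_x, loss_eps)
```

Under the manifold argument summarized above, the clean-image target lies on a low-dimensional structure while the noise target fills the whole pixel space, which is why the choice of regression target matters even though the two are interconvertible.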
To Talk AI, of Course You Come to the QbitAI MEET Conference!
QbitAI · 2025-11-19 06:20
Core Insights
- The article emphasizes the transformative impact of artificial intelligence (AI) on various industries, marking the beginning of a new era in 2025 [1].
- The MEET2026 Intelligent Future Conference will focus on cutting-edge technologies and industry advancements related to AI [2][3].
- The conference theme "Symbiosis Without Boundaries, Intelligence to Ignite the Future" highlights AI's role in driving societal evolution [3].
Event Details
- The conference will cover hot topics in the tech circle, including reinforcement learning, multimodal AI, chip computing power, AI in various industries, and AI going global [4].
- It will feature a collision of academic frontiers and commercial applications, showcasing leading technological achievements from infrastructure, models, and products [5].
- The event will also include the authoritative release of the annual AI rankings and trends report [6].
Notable Speakers
- The conference will host prominent figures such as Zhang Yaqin, a world-class scientist and entrepreneur in AI and digital video [12][13].
- Sun Maosong, Executive Vice President of Tsinghua University's AI Research Institute, will also be a key speaker [17].
- Other notable speakers include Wang Zhongyuan, Zhao Junbo, and Liu Fanping, all recognized for their contributions to AI and technology [21][27][48].
AI Rankings and Trends Report
- The "Artificial Intelligence Annual Rankings" initiated by QbitAI has become one of the most influential rankings in the AI industry, evaluating companies, products, and individuals [60].
- The "2025 Annual AI Trends Report" will focus on the main themes of technological development, identifying ten significant AI trends and analyzing their potential value [61].
Conference Logistics
- The MEET2026 Intelligent Future Conference will take place at the Beijing Jinmao Renaissance Hotel, with registration now open for attendees [62].
- The event aims to attract thousands of tech professionals and millions of online viewers, establishing itself as an annual barometer for the intelligent technology industry [64].
"Japan's OpenAI" Hits a Record Valuation! Founded by One of the Eight Transformer Authors, and Jensen Huang Invested Too
QbitAI · 2025-11-19 06:20
Core Viewpoint - Sakana AI has reached a record valuation of approximately 400 billion yen (about 2.635 billion USD) following its recent Series B funding round, making it the highest-valued startup in Japan's history [4][42].
Group 1: Company Overview
- Sakana AI was founded in July 2023 and has quickly gained attention for its approach to AI, particularly in developing nature-inspired intelligence models [6][20].
- The company was co-founded by Llion Jones, one of the authors of the Transformer paper, and David Ha, a former senior scientist at Google Brain [7][16].
Group 2: Funding and Valuation
- The recent Series B round raised 20 billion yen (approximately 135 million USD), bringing the company's valuation to around 400 billion yen [4][5].
- The investment consortium includes major players like Nvidia, Khosla Ventures, and NEA, as well as major Japanese companies such as Mitsubishi UFJ and Shikoku Electric Power [5].
Group 3: Technological Innovation
- Sakana AI aims to develop AI models inspired by natural evolution, focusing on efficiency and performance while reducing computational costs [20][21].
- The company has introduced "The AI Scientist," a comprehensive AI system capable of automating the entire scientific research process, including generating and publishing academic papers [27][28].
Group 4: Research and Development
- The AI Scientist has evolved, with its second version successfully passing peer review at the ICLR conference, demonstrating its capability to generate high-quality research [38][42].
- Sakana AI has maintained a rapid research output, releasing multiple studies and innovations on a monthly basis, further solidifying its position in the AI landscape [42][44].
Group 5: Market Comparison
- In comparison to OpenAI's valuation, Sakana AI's growth trajectory positions it as the closest equivalent to a "Japanese version of OpenAI," despite its unique approach [43][45].
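No Retraining + Plug-and-Play + Zero Performance Loss: Ant Group × Nanyang Technological University Debut a Fine-Tuning Safety Framework That Keeps Models Both Safe and Efficient
QbitAI · 2025-11-19 06:20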
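Contributed by the EnchTable team
QbitAI | WeChat official account QbitAI
A model's safety awareness can now be restored in one click, with no retraining required.
Recent research shows that fine-tuning seriously weakens a model's safety alignment. In other words, the more capable the model becomes, the more dangerous it can be.
In response, Ant Group and Nanyang Technological University have introduced EnchTable, a model safety alignment framework that lets a model keep its safety awareness after fine-tuning.
Through two core techniques, safety distillation and interference-aware merging, it achieves the best balance between safety and utility across multiple model architectures and tasks, and its resistance to attacks even surpasses that of the official Instruct safety models.
It is also plug-and-play and does not affect model performance at all.
Details follow.
Safety alignment is "transferable"
A number of incidents involving degraded safety in fine-tuned models have surfaced recently. The root problem is that current safety alignment mechanisms do not stay in effect as a model is fine-tuned.
The research team's view is that safety alignment is itself a highly transferable kind of knowledge.
This means there is no need to "relearn" safety on every fine-tuned model. Instead, "safety" can be treated as an independent knowledge module, "extracted" from an already aligned model and then "injected" into another model.
This finding turns the problem from "expensive retraining" into "efficient ...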
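The "extract, then inject" idea can be pictured with a simple weight-difference sketch. This is generic task-vector arithmetic, swapped in purely for illustration. It is not EnchTable's safety distillation or interference-aware merging, whose details are not given in the excerpt. The parameter dictionaries, function names, and the `scale` factor are hypothetical.

```python
def safety_vector(aligned, base):
    """'Extract' safety as the weight difference between an aligned model
    and its unaligned base (illustrative stand-in, not EnchTable's method)."""
    return {
        name: [a - b for a, b in zip(aligned[name], base[name])]
        for name in base
    }


def inject(finetuned, vector, scale=1.0):
    """'Inject' the extracted vector into a fine-tuned model's weights."""
    return {
        name: [w + scale * d for w, d in zip(finetuned[name], vector[name])]
        for name in finetuned
    }


# Toy models with a single two-element parameter each (hypothetical values).
base = {"w": [1.0, 2.0]}
aligned = {"w": [1.5, 2.0]}
finetuned = {"w": [3.0, 4.0]}

vec = safety_vector(aligned, base)
merged = inject(finetuned, vec, scale=1.0)
print(vec, merged)
```

A plain addition like this can disturb the weights the fine-tuning relied on, which is presumably the kind of interference that an interference-aware merge is meant to limit.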
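Hundreds of Millions Raised, Revenue Over 100 Million! The Hidden Champion of the Embodied AI Track, Repeatedly Noticed by Jensen Huang, Surfaces
QbitAI · 2025-11-19 06:20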
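Heng Yu, reporting from Aofeisi
QbitAI | WeChat official account QbitAI
An AI company's funding round has just set off heated discussion in the industry.
Why? Because it is closely tied to embodied intelligence, and inseparable from the world models that lead toward physical AI. More precisely, the company that closed the round is a key supply-chain company in the ecosystem around both: a simulation and synthetic data company.
QbitAI has learned that the simulation and synthetic data company Lightwheel AI (光轮智能) has just completed Series A and A+ rounds totaling several hundred million yuan.
The investors disclosed this time include institutional investors such as 东方富海 and 九派资本, as well as industry players such as 三七互娱 and 琥珀资本. Existing shareholder 辰韬资本 also continued to increase its stake.
Its customers are drawing just as much attention: NVIDIA, Google, Alibaba, and ByteDance; Figure AI, 1X Technology, AgiBot (智元机器人), and Galbot (银河通用); and Toyota, BOSCH, BYD, Geely, and others. Single-handedly, it links up the entire AI ecosystem.
According to reports, this company, the only technology company in the world focused solely on simulation and synthetic data, has already passed the 100 million yuan mark in revenue.
As the first company in the world to bring generative AI into simulation technology, Lightwheel AI was founded by Xie Chen, a well-known figure in the field who previously led simulation at NVIDIA, Cruise, and NIO.
His most recent moment in the spotlight came from a debut conversation with Jensen Huang's daughter, Madison Huang, on the much-hyped topic of physical AI...
Physical AI is what Jensen Huang in 2025 ...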
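Has a Domestic Tool Solved the Perennial Headache of Translating Papers, Reports, and Contracts? After a Head-to-Head Review of Three Top Translation Tools, This One Is Remarkably Solid
QbitAI · 2025-11-19 06:20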
Core Viewpoint - The article discusses the advantages of Baidu's "Document Translation" tool, particularly in academic settings, highlighting its superior translation accuracy, formatting preservation, and integrated AI assistance compared to competitors like Google Translate and DeepL [1][3][59].
Translation Capability Comparison
- Baidu's "Document Translation" offers specialized translation models for over 10 professional fields, including academic papers, legal documents, and news, making it more user-friendly for specific needs compared to Google and DeepL, which lack such differentiation [8][17].
- The tool boasts a professional translation accuracy rate of 90%, effectively capturing the nuances of academic language, which is crucial for users dealing with complex terminologies [17][22].
AI Assistance Features
- The integrated AI assistant in Baidu's tool can summarize content, answer specific questions about the text, and provide explanations for technical terms, enhancing the user experience significantly [26][30][36].
- Users can interact with the AI to clarify difficult sections of the text, making the translation process more intuitive and less daunting [28][32].
Formatting and Editing Capabilities
- Baidu's "Document Translation" excels in maintaining the original document's formatting, achieving a near 1:1 restoration of the layout, which is critical for academic papers that often include complex structures like tables and figures [43][46].
- The tool allows for extensive post-translation editing, enabling users to modify text directly within the translated document, which is not supported by DeepL and is limited in Google Translate [52][55].
Overall User Experience
- The comprehensive features of Baidu's translation tool cater to the needs of students and professionals, making it a preferred choice for those who require efficient and accurate translations without the hassle of manual corrections [57][58].
- The article concludes that Baidu's "Document Translation" is the closest to an ideal translation tool, effectively integrating into the workflow of users in academic and professional environments [59][60].