量子位
Jensen Huang Is Hooked on Nano Banana, Raves About It to Hassabis: "Who Could Possibly Not Like It?"
量子位· 2025-09-18 04:20
Core Viewpoint: Jensen Huang, CEO of NVIDIA, expresses his admiration for the AI product Nano Banana, highlighting its appeal and functionality [1][2][4].
Group 1: Jensen Huang's Views on AI
- Huang believes that artificial intelligence is the greatest opportunity to close the technology gap and that it should be accessible to everyone [8].
- He uses AI tools to boost his work efficiency, saying they help him remember tasks and improve the quality of his work [10].
- Huang uses several AI tools, including ChatGPT, Grok, and Gemini, and chooses among them by task [11][12].
Group 2: Nano Banana's Features and Popularity
- Nano Banana has introduced a new feature that lets users upload photos and generate stickers effortlessly [14][15].
- The feature is built on Gemini's Canvas functionality and lets users choose from nine styles without writing a prompt [18].
- Since its launch, Nano Banana has become hugely popular and has driven Gemini's rapid growth: 23 million new users in less than a month and more than 500 million images edited [23][24].
Alibaba's Tongyi DeepResearch Now Tops the Open-Source Agent Model Leaderboard
量子位· 2025-09-18 04:20
Core Viewpoint: Alibaba has open-sourced its first deep research agent model, Tongyi DeepResearch, which outperforms existing models such as OpenAI's Deep Research and DeepSeek-V3.1 on several authoritative evaluation sets [1][3].
Data Strategy
- The model's gains are attributed to a multi-stage data strategy that generates high-quality training data without relying on expensive manual annotation [4][5].
- The team introduced Agentic CPT for incremental pre-training, which gives the agent a solid foundation [6].
- A systematic, scalable data synthesis scheme creates a positive feedback loop for data generation [7].
Data Construction
- An open-world knowledge memory was built from a wide range of knowledge documents, web-crawled data, knowledge graphs, and trajectory data from post-training [8].
- Three types of action data were synthesized from diverse question styles and historical trajectory data, which enables broad exploration of the reasoning-action space [9].
Post-training Data
- The team developed a fully automated synthetic data pipeline whose datasets surpass manually annotated data in quality [11][12].
- A new process extracts information from real website data, which preserves authentic data structures while increasing question complexity [14].
Reasoning Modes
- Tongyi DeepResearch supports both a native ReAct Mode and a Heavy Mode for complex, multi-step research tasks [15][18].
- The IterResearch paradigm breaks a task into a series of research rounds, which lets the agent keep its cognitive focus and reason at high quality [20].
Training Process
- The team built a new training pipeline that chains Agentic CPT, Agentic SFT, and Agentic RL, establishing a new paradigm for training agent models [25][27].
- The team stressed that data quality and training-environment stability matter more than algorithmic factors for the success of reinforcement learning projects [37][39].
Application Deployment
- Tongyi DeepResearch already powers multiple internal Alibaba applications, including the Amap (Gaode) travel agent, which brings complex query capabilities into the app [42][43].
- A simulated training environment was built to avoid the high cost and inconsistency of developing against real-time web APIs [44].
Legal AI Application
- Tongyi Farui, a legal AI agent, aims to provide professional legal services using an innovative agent architecture and iterative planning for complex reasoning tasks [46].
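A ReAct-style agent interleaves three things until it produces an answer: reasoning (a "thought"), tool calls (an "action"), and tool results (an "observation"). The Python below is a minimal, hypothetical sketch of such a loop and is not Tongyi DeepResearch's implementation. Several pieces are illustrative assumptions: the `policy` function stands in for the language model, the `search` tool stands in for real web search, and the `Step` record is an invented trajectory format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    thought: str
    action: Optional[str]  # tool name, or None when the agent gives its final answer
    action_input: str      # tool argument, or the final answer when action is None
    observation: str = ""


def react_loop(question: str,
               policy: Callable[[str, List[Step]], Step],
               tools: Dict[str, Callable[[str], str]],
               max_steps: int = 8) -> str:
    """Run a thought -> action -> observation loop until the policy answers."""
    history: List[Step] = []
    for _ in range(max_steps):
        step = policy(question, history)  # the model proposes the next step
        if step.action is None:           # no tool call: action_input is the answer
            return step.action_input
        tool = tools.get(step.action)
        step.observation = tool(step.action_input) if tool else f"unknown tool: {step.action}"
        history.append(step)              # the whole trajectory is fed back to the model
    return "no answer within step budget"


# Hypothetical stand-ins for an LLM policy and a web-search tool.
def toy_policy(question: str, history: List[Step]) -> Step:
    if not history:
        return Step("I should look this up.", "search", question)
    return Step("The last observation answers the question.", None, history[-1].observation)


toy_tools = {"search": lambda q: "Paris" if "France" in q else "not found"}

print(react_loop("What is the capital of France?", toy_policy, toy_tools))  # → Paris
```

This simple loop keeps the full trajectory in context, and that context keeps growing as a long task proceeds. The summary above describes IterResearch as splitting a task into rounds to keep the agent focused. The sketch does not attempt that.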
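A Chinese Large Model Lands on the Cover of Nature for the First Time! DeepSeek Reveals for the First Time That Training R1 Cost Only About 2 Million Yuan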
量子位· 2025-09-18 00:51
Core Insights:
- DeepSeek has become the first Chinese large-model company to appear on the cover of Nature, with founder Liang Wenfeng as corresponding author [2][3].
- The R1 model is recognized for its innovative approach. It uses a pure reinforcement learning framework that substantially improves performance on reasoning tasks [19][20].
Group 1: Achievements and Recognition
- DeepSeek-R1 is the first major large language model to undergo peer review, a significant milestone for the field [5].
- The model has 3,596 citations on Google Scholar and 10.9 million downloads from Hugging Face, which indicates wide adoption [7].
- R1 cost about $294,000 to train, far below competitors' budgets, which often exceed $10 million. This challenges the notion that top-tier AI models require huge investments [12][13].
Group 2: Training and Data
- DeepSeek-R1-Zero was trained on 512 H800 GPUs for 198 hours, and the total training cost of R1 came to about $294,000 [10][11].
- The R1 dataset covers five data types (Math, Code, STEM, Logic, and General), totaling 126,000 prompts [15][18].
- Training combined cold-start data, reinforcement learning, and supervised fine-tuning, which strengthened the model's reasoning [25][26].
Group 3: Performance Metrics
- DeepSeek-R1-Zero's pass@1 score on AIME 2024 rose from 15.6% to 71.0% during training [21].
- Compared with other leading models, DeepSeek-R1 performed competitively across benchmarks including MATH-500 and LiveCodeBench [23][30].
- Smaller models distilled from DeepSeek-R1 outperformed the same base models trained directly with reinforcement learning, which demonstrates the effectiveness of the approach [29].
Group 4: Safety and Transparency
- DeepSeek has released a detailed safety assessment of R1, which rates the model's inherent safety as moderate and comparable to GPT-4o [18][22].
- The company has embraced transparency by open-sourcing the weights of DeepSeek-R1 and DeepSeek-R1-Zero on Hugging Face, encouraging community engagement [30].
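The headline training cost is GPU-hours multiplied by a rental price. The Python sketch below reproduces that arithmetic under one stated assumption: a rental rate of $2 per H800 GPU-hour, which the summary above does not give. The sketch also computes pass@1 as the average correctness over k sampled answers per problem. That is a common definition, but assuming it matches DeepSeek's exact evaluation protocol is only an assumption.

```python
def training_cost_usd(num_gpus: int, hours: float, usd_per_gpu_hour: float) -> float:
    """Rental-style estimate: number of GPUs x wall-clock hours x hourly price."""
    return num_gpus * hours * usd_per_gpu_hour


def pass_at_1(samples_correct: list[list[bool]]) -> float:
    """pass@1 averaged over problems. Each problem contributes the fraction
    of its k sampled answers that were correct."""
    per_problem = [sum(s) / len(s) for s in samples_correct]
    return sum(per_problem) / len(per_problem)


ASSUMED_RATE = 2.0  # hypothetical USD per H800 GPU-hour
run = training_cost_usd(512, 198, ASSUMED_RATE)
print(f"512 GPUs x 198 h = {512 * 198:,} GPU-hours -> ${run:,.0f}")
# At the same assumed rate, a $294,000 total corresponds to this many GPU-hours:
print(f"implied total GPU-hours: {294_000 / ASSUMED_RATE:,.0f}")

# Two problems, four samples each: 3/4 and 1/4 correct -> pass@1 = 0.5
print(pass_at_1([[True, True, True, False], [False, True, False, False]]))
```

Under that assumed rate, the 198-hour run alone comes to about $203,000 of the reported $294,000 total.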
AI Dominates the ICPC World Finals! A GPT-5-Based System Solves All 12 Problems to Finish First, Leaving Humans to Fight for Third
量子位· 2025-09-18 00:51
Core Insights:
- The article covers the strong showing of AI systems at the 2025 International Collegiate Programming Contest (ICPC), where OpenAI's GPT-5-based system and Google's Gemini 2.5 model excelled at complex programming problems [2][9][18].
Group 1: AI Performance in ICPC
- OpenAI's system, which combines GPT-5 with an experimental reasoning model, solved all 12 problems in under five hours for a perfect score [9][10].
- Google's Gemini 2.5 Deep Think model solved 10 of the 12 problems, a gold-medal-level result that ranked second overall [3][18].
- The competition featured 139 top teams from nearly 3,000 universities in 103 countries [5].
Group 2: Problem-Solving Challenges
- No university team solved one especially hard problem, "Problem C," yet both Gemini and OpenAI's system solved it [7][20].
- Gemini's approach assigned priority values to the storage units and used dynamic programming to find the optimal configuration for distributing the liquid [25][26].
Group 3: Technological Advancements
- AI models' reasoning capabilities have improved markedly over the past year, making them smarter, faster, and more cost-effective [17].
- Gemini's success is attributed to a combination of pre-training, post-training, novel reinforcement learning techniques, and multi-step reasoning [27][28].
Group 4: Future Directions
- OpenAI's vice president of research indicated that after ICPC, the focus may shift to applying AI to real-world scientific and engineering problems, a new frontier for AI applications [30][32].
The Doubao Large Model Hits the Road! SAIC Roewe Is First to Reach a New Turning Point for AI Smart Cabins
量子位· 2025-09-17 12:09
Core Viewpoint: The article discusses the arrival of the Doubao deep thinking model in cars. It highlights the model's role in turning vehicles into intelligent spaces that deliver personalized experiences through AI [1][12][32].
Group 1: Doubao Deep Thinking Model
- The model is central to understanding user needs and executing the right actions, serving as a bridge between user intent and vehicle response [10][20].
- The model can recognize complex commands and the intentions behind them. This makes in-car interaction more natural and human-like, going beyond simple command-and-response exchanges [11][19].
Group 2: AI Smart Cabin Concept
- AI smart cabins are an emerging concept, and a flood of newly introduced features has caused confusion about what counts as a true AI smart cabin [6][12].
- A genuine AI smart cabin should proactively sense user needs, interpret ambiguous dialogue, and carry out tasks autonomously, rather than merely responding to direct commands [8][10].
Group 3: Collaboration with SAIC Roewe
- The model's first in-car deployment is with SAIC Roewe, a notable collaboration between a traditional automotive giant and an internet company [3][12].
- SAIC Roewe has a large data advantage, robust hardware interfaces, and an innovative culture, which make it a strong partner for deploying advanced AI models in vehicles [27][30].
Group 4: User Experience Enhancements
- With the Doubao model integrated, the vehicle acts as a personal assistant that advises on vehicle functions and answers a wide range of questions [15][16].
- The model's memory retains user preferences and past interactions, enabling personalized recommendations and better service [16][22].
Group 5: Industry Implications
- The Doubao deep thinking model marks a turning point for the automotive industry, as vehicles become intelligent entities capable of deep thinking and interaction [12][32].
- The shift reflects a broader industry trend of integrating AI to boost user engagement and redefine human-vehicle interaction [20][34].
Tencent Reveals That Yuanbao Is Now a Top-3 App
量子位· 2025-09-17 11:06
Core Viewpoint: Tencent is making significant progress with AI products in both the consumer and business markets. It is showing strong user engagement and technical advances while expanding its global infrastructure with a major investment in Saudi Arabia [1][19][24].
Group 1: Consumer Product Developments
- Tencent Yuanbao is now one of the top three AI-native applications in China, and its daily question volume now matches what was once the total for an entire month [5][4].
- The AI meeting-summary feature in Tencent Meeting grew its user base by more than 150% in one year [8].
- Tencent Hunyuan has released more than 30 models in a year, and the Hunyuan 3D model has passed 2.6 million downloads [10][12].
Group 2: Business Integration and Applications
- Tencent is successfully carrying its consumer products over into business use. For example, Tencent Cloud CodeBuddy now generates 50% of Tencent's new internal code [18].
- Companies such as Midea and AstraZeneca are using Tencent's AI capabilities to improve operational efficiency and service delivery [18].
Group 3: Global Expansion and Investment
- Tencent Cloud is not merely exporting products. It is taking a proven ecosystem abroad, including audio/video technology and its mini-program platform [20][21].
- The company announced a $150 million investment to build a new data center in Saudi Arabia, strengthening its global digital infrastructure [24][19].
- Tencent's strategy focuses on raising industrial efficiency through intelligent solutions and growing revenue through global expansion [27].
Xiaohongshu Unveils Its AI Technology System for the First Time, Going All Out for Its Largest Campus Recruitment Ever
量子位· 2025-09-17 11:06
Core Insights: Xiaohongshu announced its largest-ever campus recruitment drive for 2026, opening eight major job categories, with the number of technical positions growing 2.5-fold [1][3].
Group 1: Recruitment and Talent Development
- The company is in a rapid-growth phase, and new businesses and functions require a large influx of talent [3].
- Xiaohongshu places great value on the potential and growth of campus recruits. Past recruits quickly became key business staff, which reinforces the company's commitment to investing in campus hiring and training [3][42].
- The "Shu Guang Plan" is a two-year growth program for all campus hires, designed to help them quickly understand the company culture and integrate into the organization [46][50].
Group 2: AI Technology System
- Xiaohongshu's AI technology system has five key components, which support a UGC community of more than 350 million monthly active users [10][8].
- The AI infrastructure supports the efficient operation of AI models and technologies, improving user experience and content accuracy [16].
- The search and recommendation algorithms emphasize community interaction and personalization rather than relying on traditional keyword matching alone [15][23].
Group 3: Career Guidance and Skills Development
- During the livestream, experts stressed that for young job seekers, potential matters more than experience, and they highlighted the value of learning and dedication [34][35].
- The speakers discussed balancing cutting-edge research with practical application in AI, noting that commercial applications offer more opportunities than academic exploration [38].
- Xiaohongshu encourages recruits to find their interests and build unique value while staying aware of developments across the industry [39].
Zhihuijun's Robot Steals the Show: World-First Performance of the "Webster Flip Every Real Man Should Master"
量子位· 2025-09-17 11:06
Core Viewpoint: The Lingxi X2 has become the first robot in the world to complete a Webster flip, a complex acrobatic maneuver that showcases advanced robotics capabilities [1][7].
Group 1: Robot Capabilities
- The Lingxi X2 stands about 1.3 meters tall and has 25 to 31 degrees of freedom. Its head was removed for the Webster flip, which cost it 2 degrees of freedom [13][14].
- The robot can run and perform other basic movements. It can also traverse varied terrain without a navigation system, thanks to autonomous obstacle avoidance [16][19].
- Executing the Webster flip meant overcoming major challenges: highly complex dynamics, real-time perception and feedback, and demanding hardware reliability [23][24].
Group 2: Technological Innovations
- The achievement is credited to the Lingchuan platform, an AI-enhanced tool for creating robot motions and expressions that supports both designing movements and further developing them [20][19].
- The robot's motion skills come from a reinforcement learning strategy trained on human video data, which ensures precise execution in real-world settings [24].
Group 3: Future Developments
- The Lingxi X2 series also includes the Lingxi X2-W and the Lingxi X2-N, which target different operational needs such as task intelligence and adaptability to varied terrain [26][34].
- The company plans to scale up Lingxi X2 production in the second half of 2025, with output expected to reach several thousand units by the end of 2026 [36].
AI Instantly "Clips" the Part You Want from Live Video! Feed It Text, Images, or Video Clips and It Understands Right Away | ICCV 2025
量子位· 2025-09-17 11:06
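Contributed by the OVG-HQ team | QbitAI (WeChat official account: QbitAI)
Still spending ages searching live video for one specific event? The latest technique makes it effortless.
Imagine security surveillance: a few figures flit briefly across the frame, and with the new technique, the precise clip of that "suspicious gathering" can be pulled up within seconds.
Or picture a VR training ground. You put on a VR headset to practice shooting, having first entered this request in a phone app: "Locate actions similar to the demonstration in this video" (a clip of Curry's perfect three-pointer). Once training starts, the headset quietly analyzes the first-person video stream in the background on every shot. When your motion, release, and arc closely resemble Curry's three-pointer, the headset immediately highlights that segment in the virtual interface.
Without further suspense: this is a new task proposed by a research team from Shenzhen MSU-BIT University and the University of Adelaide.
It is called Online Video Grounding with Hybrid-modal Queries (OVG-HQ).
In plain terms, the technique lets a system work while a stream is live or being recorded. It uses the "clues" you provide, such as text, reference images, demonstration video clips, or combinations of these, to instantly find the complete event you care about in the real-time video stream and cut it out precisely.
The paper has been accepted to ICCV 2025.
"Offline" is the key weakness: mainstream techniques must wait until a video has finished recording before they can work. Analyzing it after the fact is hindsight and cannot meet security's need for second-level ...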
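In the online setting, the system must decide as each frame arrives whether the queried event has started or ended, without seeing future frames. The Python sketch below makes that concrete with a generic threshold baseline; it is not the OVG-HQ method. It assumes that a hypothetical upstream model has already scored each frame against the text, image, or video query, giving a relevance score in [0, 1]. The sketch emits a segment as soon as the score falls back below the threshold.

```python
from typing import Iterable, Iterator, Optional, Tuple


def online_segments(scores: Iterable[float],
                    threshold: float = 0.5,
                    min_len: int = 2) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) frame indices (end inclusive) as soon as each
    above-threshold run closes. Only past and current frames are used."""
    start: Optional[int] = None
    t = -1
    for t, s in enumerate(scores):
        if s >= threshold and start is None:
            start = t                  # the event appears to begin
        elif s < threshold and start is not None:
            if t - start >= min_len:   # ignore flickers shorter than min_len frames
                yield (start, t - 1)
            start = None
    if start is not None and t + 1 - start >= min_len:
        yield (start, t)               # the stream ended mid-event


# Hypothetical per-frame relevance scores from a query-conditioned model.
stream = [0.1, 0.7, 0.8, 0.9, 0.2, 0.6, 0.1, 0.75, 0.8]
print(list(online_segments(stream)))  # → [(1, 3), (7, 8)]
```

Because this is a generator, a caller can act on each segment the moment it closes rather than after the video ends, which is exactly what the offline approaches described above cannot do. A fixed threshold is crude at event boundaries, and a real system would localize boundaries more carefully.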
$39 Billion: The World's Highest Embodied-AI Valuation Has Arrived! NVIDIA Keeps Raising Its Bet
量子位· 2025-09-17 11:06
Core Viewpoint: Since parting ways with OpenAI, Figure has made major technical and financing progress, reaching a post-money valuation of $39 billion, the highest in the embodied intelligence sector to date [2][32].
Financing and Valuation
- Figure has raised more than $1 billion in its Series C round, bringing its post-money valuation to $39 billion [2][32].
- The round was led by Parkway Venture Capital, with participation from notable investors including NVIDIA, Brookfield Asset Management, and Qualcomm Ventures [4].
Strategic Focus Areas
- The new funding will support Figure's development in three core areas [8].
- The first area is large-scale deployment of humanoid robots in homes and commercial settings, including expanding production capacity at Figure's BotQ manufacturing facility [9].
- The second area is building next-generation GPU infrastructure to accelerate training and simulation for the Helix model [21].
- The third area is launching advanced data-collection projects to improve the robots' understanding of, and performance in, complex environments [21].
Technological Advancements
- Figure has introduced the Helix architecture, a vision-language-action model that lets robots perceive, understand, and act much as humans do [17].
- Helix consists of two systems that communicate with each other and are trained end to end, so a single unified model can handle a wide range of tasks [18].
- The new funding will further enhance Helix, which is designed to optimize the performance of embodied AI systems [20].
Company Background
- Figure was founded in May 2022 by serial entrepreneur Brett Adcock [22].
- The company drew attention in the humanoid robotics sector after raising $675 million in a Series B round in February 2024, at a valuation of $2.6 billion [22].
- After ending its partnership with OpenAI, Figure chose to vertically integrate its AI models, focusing on an end-to-end AI model built for its own robot hardware [30][28].
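Figure describes Helix as two cooperating systems: a slower vision-language system that interprets the scene and the command, and a faster visuomotor policy that produces motor actions. The Python toy below only illustrates that dual-rate pattern, under stated assumptions. All of the following are hypothetical stand-ins rather than Figure's code or published parameters: the 1-in-25 update ratio, the `slow_planner` and `fast_controller` functions, and the one-number "latent".

```python
from typing import List

Latent = List[float]  # hypothetical representation handed from the slow system to the fast one


def slow_planner(instruction: str) -> Latent:
    """Stand-in for an expensive vision-language model that turns a command into a goal."""
    return [1.0] if "raise" in instruction else [0.0]


def fast_controller(latent: Latent, joint_pos: float, gain: float = 0.2) -> float:
    """Stand-in for a cheap visuomotor policy: move a fraction of the way toward the goal."""
    return joint_pos + gain * (latent[0] - joint_pos)


def run(instruction: str, ticks: int = 100, slow_every: int = 25) -> float:
    joint = 0.0
    latent: Latent = []
    for t in range(ticks):
        if t % slow_every == 0:
            latent = slow_planner(instruction)  # the slow system refreshes the goal occasionally
        joint = fast_controller(latent, joint)  # the fast system acts every tick on the latest latent
    return joint


print(round(run("raise the arm"), 3))  # → 1.0
```

The point of the split is that the fast loop never waits for the slow model: it always acts on the most recent latent. In Helix, both parts are learned networks trained end to end, which this toy does not model.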