The 1.3 m Unitree G1 nails a layup! HKUST unlocks the world's first real-world basketball robot demo
量子位· 2025-11-24 09:30
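henry, reporting from Aofeisi | QbitAI (WeChat official account: QbitAI)
A 1.3 m "little potato" of a robot can pull off a remarkably smooth three-step layup. Don't get the wrong idea: this Unitree G1 isn't entering the NBA draft just yet, but with its newly unlocked "real-world basketball" skill, a starting spot in the "Village BA" (村BA) shouldn't be far off.
Reportedly, this is the world's first robot demo to perform basketball moves in a real-world setting, and it comes from a research team at the Hong Kong University of Science and Technology. The team has not yet released full technical details, but given their earlier work on getting robots to "play basketball", this demo is most likely a refinement built on that prior research. Let's take a closer look.
SkillMimic-v2
First up is SkillMimic-V2: Learning Robust and Generalizable Interaction Skills from Sparse and Noisy Demonstrations, accepted at SIGGRAPH 2025. Data collected through motion capture and similar methods currently tends to suffer from the following flaws:
- Sparse: the demonstrations cover only a limited set of skill variations and lack transition trajectories between skills.
- Disconnected: individual skill clips are independent of one another, with no natural links between them.
- Noisy: the data contains physically infeasible ...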
Terence Tao's hands-on test: I used Gemini to crack a problem that had stumped the field for years in ten minutes
量子位· 2025-11-24 07:30
Core Viewpoint - Working with the AI model Gemini, mathematician Terence Tao obtained a proof of a long-standing mathematical problem in just ten minutes, showcasing the potential of AI in mathematical proofs [1][3][25].
Group 1: Problem Overview
- The problem addressed is Erdős Problem #367, proposed by Paul Erdős, which involves the 2-full part of an integer n and the existence of a constant for sufficiently large n [12][14].
- The problem requires verifying the existence of a limit superior under specific conditions [16].
Group 2: AI's Role in the Solution
- Terence Tao used Gemini Deep Think to complete the proof, which took only ten minutes, demonstrating the efficiency of AI in mathematical reasoning [19][20].
- Following the AI's proof, Tao spent an additional thirty minutes converting the AI's p-adic algebraic proof into a more elementary argument [21].
Group 3: Collaborative Efforts
- Two days later, Boris Alexeev used Harmonic's Aristotle tool to formalize the proof, which took two to three hours [24].
- The problem was ultimately resolved through collaboration between Gemini and human mathematicians, highlighting the synergy between AI and human expertise [25].
Group 4: Future Implications
- This is not the first time Tao has used AI for mathematical work, indicating a growing trend of AI assisting in mathematical proofs [29].
- Advances in AI's mathematical reasoning suggest that future mathematics will involve more experimental approaches rather than solely theoretical ones [30].
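For readers unfamiliar with the term used in the problem overview, the 2-full part of a positive integer n is the product of the prime powers in its factorization whose exponent is at least 2; equivalently, n divided by the product of the primes that divide it exactly once. The following is a minimal sketch of that definition only. The function name `two_full_part` is an illustrative choice, and the code is not taken from the article or from the proof.

```python
def two_full_part(n: int) -> int:
    """Return the 2-full part of n (illustrative helper, not from the article).

    This is the product of p**a over all prime powers p**a that exactly
    divide n with exponent a >= 2.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            power = 1
            exponent = 0
            # Strip every factor of p from n, tracking p**exponent.
            while n % p == 0:
                n //= p
                power *= p
                exponent += 1
            if exponent >= 2:
                result *= power
        p += 1
    # Whatever remains (if > 1) is a prime that divided the input exactly
    # once, so it does not contribute to the 2-full part.
    return result
```

For example, 360 = 2^3 · 3^2 · 5, so its 2-full part is 2^3 · 3^2 = 72, while a squarefree number such as 30 has 2-full part 1.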
Altman admits Google threatens OpenAI! New model "Shallotpeat" coming soon
量子位· 2025-11-24 07:30
Core Insights
- The competitive landscape in the AI sector is shifting, with Google gaining an edge over OpenAI due to advancements in their AI models, particularly Gemini 3 Pro and Nano Banana Pro [2][12][17]
- OpenAI's CEO, Sam Altman, acknowledged internal concerns about Google's progress and its potential impact on OpenAI's financial performance [4][15][18]
- OpenAI is facing significant financial pressures, with projected revenues of $13 billion in 2023 but anticipated costs exceeding $100 billion in the coming years [18][20]
Group 1
- Google has successfully repositioned itself in the AI market, leveraging its extensive resources and infrastructure to surpass OpenAI [26][35]
- The key to Google's success lies in its model pre-training capabilities, which have outperformed OpenAI's efforts in this area [27][28][33]
- OpenAI is aware of its need to improve pre-training processes and plans to release a new model, "Shallotpeat," to address these challenges [32][30]
Group 2
- Google's financial strength, with over $70 billion in free cash flow generated in the last four quarters, contrasts sharply with OpenAI's financial model, which relies heavily on external funding [19][20]
- The AI competition is evolving from a focus on individual model breakthroughs to a comprehensive stack approach, where Google benefits from its integrated infrastructure and product ecosystem [34][39]
- Google's ability to rapidly scale its services and integrate AI into its existing platforms provides it with a significant distribution advantage over competitors [37][39]
Over a million downloads within 4 days of launch; Ant Group CTO: Lingguang aims to be the "Alipay" of the AGI era
量子位· 2025-11-24 05:30
Core Insights
- The article highlights the rapid success of the AI application "Lingguang," which achieved over one million downloads within four days and two million downloads shortly thereafter, surpassing other global AI products in growth rate [1][2]
- Lingguang is positioned as a transformative product in the AGI era, aiming to be the next "Alipay" for AI applications, focusing on efficiency rather than entertainment [4][12]
Group 1: Product Development and Strategy
- Lingguang's development was influenced by the emergence of DeepSeek, which provided confidence to Ant Group in pursuing AGI initiatives [5][6]
- The product is designed to be user-friendly, lowering barriers to access AI technology, similar to how QR code payments revolutionized the internet era [12]
- The core capabilities of Lingguang include dialogue, flash applications, and visual recognition, all aimed at maximizing efficiency for users [15][18]
Group 2: Market Positioning and Competition
- Lingguang is not seen as a competitor to other AI applications like Qianwen but rather as a collaborative partner in the AGI space [21][28]
- The AGI market is still in its early stages, with significant growth potential, making the launch of new AI applications timely [26]
- Ant Group emphasizes a cooperative approach in the AGI landscape, focusing on shared growth rather than direct competition [28][29]
Group 3: Future Vision and Goals
- Ant Group aims to establish a representative product in the AGI era, similar to its previous successes with Alipay and other financial products [30][34]
- The long-term vision includes creating a comprehensive ecosystem where Lingguang serves as a versatile assistant, AQ as a health manager, and other products contribute to a digital financial landscape [33][34]
- The company believes that focusing on a larger vision and collaborative efforts will lead to success in the evolving AGI market [29][30]
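Yuandong Tian and Karpathy champion new Nano Banana tricks: papers turned into comics, handwritten solutions that pass for the real thing; Google is winning big this round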
量子位· 2025-11-24 05:30
Group 1
- The article highlights the innovative use of Nano Banana Pro integrated with NotebookLM, transforming academic papers into engaging comics, making them more accessible and enjoyable to read [1][7]
- Users have discovered new functionalities of Nano Banana Pro, such as its ability to mimic human handwriting and accurately interpret handwritten notes and diagrams, enhancing its utility in educational contexts [8][17]
- The article mentions that the former AI director of Tesla, Andrej Karpathy, has endorsed the use of Nano Banana Pro for solving problems, noting its accuracy in recognizing handwritten solutions [8][11]
Group 2
- Google has made significant advancements in AI with the release of Gemini and Nano Banana, leading to a surge in its stock price and market capitalization, surpassing Microsoft [24]
- The article references a humorous incident involving Google's CEO, who addressed a long-standing meme about the placement of cheese in a hamburger emoji, showcasing the company's commitment to refining its AI capabilities [27][32]
- The advancements in AI technology, as demonstrated by the ability to understand physical world logic and spatial positioning, signify a substantial leap in AI's capabilities, marking Google's return to a leading position in the industry [32][33]
What exactly is the "Cambrian" that Saining Xie, Fei-Fei Li, and LeCun are building?
量子位· 2025-11-24 03:39
Core Viewpoint - The article discusses the emergence and significance of "Cambrian-S," a new AI model focused on spatial perception, aiming to enhance how artificial intelligence understands and interacts with the world [2][6][8].
Group 1: Overview of Cambrian-S
- Cambrian-S is not about creating silicon-based chips but rather about enabling AI to genuinely perceive the world [2].
- The model excels in multi-modal video processing, particularly in spatial reasoning tasks, achieving state-of-the-art (SOTA) results in short video spatial reasoning [6][41].
- The model's architecture includes a predictive perception module that allows it to anticipate the next frame in a video, improving efficiency and reducing GPU memory consumption [44].
Group 2: Development and Breakthroughs
- The development of Cambrian-S followed a series of breakthroughs, including the evaluation of over 20 visual encoders to identify their strengths and suitable application scenarios [11].
- A spatial visual aggregator (SVA) was designed to efficiently integrate multi-source visual features while maintaining high processing quality [11].
- The team created a high-quality training dataset, filtering from 10 million to 7 million entries to enhance model interaction capabilities [13].
- They established the CV-Bench benchmark to address the inadequacies in existing visual capability assessments [15].
- The optimal training strategy was identified, demonstrating that two-stage training and unfreezing visual encoders significantly enhance model performance [17].
Group 3: Concept of Hyper-Perception
- The team introduced the concept of "hyper-perception," which emphasizes the need for AI to not only recognize objects but also understand their spatial relationships and predict their future states [20][23].
- This concept is crucial for developing true multi-modal intelligence, as it allows AI to comprehend continuous video sequences rather than isolated images [25].
Group 4: Testing and Performance
- The team developed the VSI-SUPER benchmark to evaluate AI's spatial perception capabilities through tasks like long-term spatial memory and continuous counting [26][30].
- Current models, such as Gemini-Live and GPT-Realtime, showed poor performance in these tests, with accuracy rates below 15% for 10-minute videos [31].
- The Cambrian-S model family, with parameters ranging from 0.5 billion to 7 billion, achieved over 30% improvement in spatial memory accuracy compared to open-source models [41][34].
Register early! MEET2026 announces its latest speaker lineup; come talk AI
量子位· 2025-11-24 03:39
Core Viewpoint - The article emphasizes the transformative impact of artificial intelligence (AI) on various industries and society as a whole, highlighting the upcoming MEET2026 conference as a platform to explore these advancements and trends in AI technology [1][3].
Group 1: Conference Overview
- The MEET2026 Intelligent Future Conference will focus on cutting-edge technologies and industry developments, particularly in AI [2].
- The theme of the conference is "Symbiosis Without Boundaries, Intelligence to Ignite the Future," aiming to explore how AI transcends industry, discipline, and scenario boundaries [3].
- Key topics of discussion will include reinforcement learning, multimodal AI, chip computing power, AI applications in various industries, and AI's global expansion [4].
Group 2: Notable Speakers
- The conference will feature prominent figures such as Zhang Yaqin, a leading scientist in digital video and AI, and former president of Baidu [12][13].
- Sun Maosong, Executive Vice President of the Tsinghua University AI Research Institute, will also be a key speaker, known for his leadership in national research projects [17].
- Other notable speakers include Wang Zhongyuan, Director of the Beijing Academy of Artificial Intelligence, and He Xiaodong, Senior Vice President of JD Group, who has extensive experience in multimodal intelligence [21][30].
Group 3: AI Trends and Reports
- The conference will unveil the "Artificial Intelligence Annual List" and the "Annual AI Trend Report," which are anticipated to provide insights into the most influential companies, products, and individuals in the AI sector [6][102].
- The 2025 AI Annual List will evaluate candidates across three dimensions: companies, products, and individuals, with results announced at the conference [103].
- The 2025 Annual AI Top Ten Trends Report will analyze significant AI trends based on technological maturity, current applications, and potential value, highlighting representative organizations and best cases [104].
Group 4: Event Details
- The MEET2026 conference is scheduled for December 10, 2025, at the Beijing Jinmao Renaissance Hotel, with registration now open [105].
- The event is recognized as a significant technology business summit, attracting thousands of industry professionals and millions of online viewers each year [107].
The top design agent now supports Nano Banana Pro! One prompt turns BlackPink into a "Cuihua" from China's Northeast
量子位· 2025-11-24 03:39
Core Insights - The article discusses the integration of Lovart, a leading design agent, with Nano Banana Pro, highlighting its impact on design processes and user experience [1][7][34].
Group 1: Lovart and Nano Banana Pro Integration
- Lovart has officially integrated with Nano Banana Pro, enhancing its capabilities for designers [1].
- The integration allows users to create designs with simple prompts, making the design process accessible even for those without technical skills [3][10].
- Lovart's annual recurring revenue (ARR) surpassed $30 million within two months of its official launch, indicating strong market demand [8].
Group 2: Features and Functionalities
- Lovart supports multi-modal context processing, allowing users to edit images and generate videos seamlessly within a single canvas [9][21].
- The new Touch Edit feature enables users to modify specific elements in a design without disrupting the overall structure, improving the editing experience [24][36].
- Users can process up to 14 images simultaneously, streamlining the design workflow [13][21].
Group 3: Practical Applications
- Lovart can generate professional-quality presentations and complex visual content quickly, reducing the time spent on AI training [47][61].
- The platform allows for the easy modification of generated content, ensuring that users can refine their designs as needed [60][64].
- Lovart's ability to link various models for image and video generation enhances its versatility and usability for different creative tasks [65][71].
Group 4: User Incentives
- Users who subscribe to the Basic plan or higher before November 30 will receive a year of unlimited access to Nano Banana Pro at no cost, promoting user engagement [72].
Hangzhou's Ant Group invests in a Tencent-affiliated embodied intelligence company
量子位· 2025-11-23 10:33
Core Viewpoint - Ant Group has invested in a Tencent-backed embodied intelligence company, Stardust Intelligence, which recently completed a multi-hundred million yuan A++ round of financing, indicating strong market interest and confidence in its innovative technology [1][3][5].
Financing and Valuation
- Stardust Intelligence has successfully completed its A++ round of financing, led by Ant Group and Guokai Investment, with participation from existing investor Jinqiu Fund, raising several hundred million yuan [5][6].
- Following this round, Stardust Intelligence has achieved a valuation of 2 billion yuan, joining the ranks of high-valuation startups in the embodied intelligence sector [4].
Company Background and Technology
- Founded in December 2022, Stardust Intelligence focuses on a unique technology route involving rope-driven AI robots, which differ from traditional rigid robots by using flexible ropes for movement [13][17].
- The rope-driven robots are designed to mimic human muscle function, allowing for greater flexibility and adaptability in various operational environments, making them suitable for tasks requiring dexterity and human collaboration [19][23].
Product Development and Market Applications
- Stardust Intelligence has made significant strides in product development, showcasing the Astribot S1, capable of performing tasks like folding clothes and cooking, and recently launching several new products aimed at commercial service scenarios [25][27].
- The company has established partnerships with major firms such as ByteDance, Tencent, and JD, and has deployed its robots across sectors including research, cultural tourism, and logistics, securing thousands of orders [35].
Team and Leadership
- The core team of Stardust Intelligence includes experienced professionals from Tencent's Robotics X lab, with CEO Lai Jie having over 16 years of experience in robotics research and development [40][41].
- The founding team's diverse backgrounds in technology and business from leading companies like Google and Huawei contribute to the rapid implementation of their rope-driven technology [48][49].
Future Outlook
- CEO Lai Jie emphasizes that the real challenge lies ahead in scaling the deployment of robots in open environments, aiming to integrate AI robots into everyday life as reliable productivity nodes [50].
"In the early days it was all done by hand": an AI note-taking unicorn comes clean
量子位· 2025-11-23 10:33
Core Viewpoint - The article discusses the controversial history of Fireflies, a leading AI note-taking company, revealing that in its early days the company relied on manual note-taking rather than actual AI technology, raising concerns about privacy and trust in AI applications [2][7][21].
Group 1: Company Background
- Fireflies, a top AI meeting assistant, has achieved a valuation of $1 billion and serves over 500,000 organizations, including 75% of Fortune 500 companies [5][36].
- The company has experienced rapid growth, with its user base increasing eightfold in the past 18 months, making it one of the fastest-growing AI applications globally [35].
Group 2: Early Operations
- In its initial phase, Fireflies' founders took notes by hand during meetings while presenting the service to clients as AI, which they later admitted was a strategy to save on development costs [10][14][17].
- The founders sat in on over 100 meetings, often struggling to stay awake, transcribing notes manually and sending them to clients shortly afterward [18][19].
Group 3: Privacy and Trust Issues
- The revelation of manual note-taking has sparked significant backlash, with critics highlighting the potential privacy violations and the ethical implications of having humans listen in on confidential meetings [23][27].
- Concerns have been raised about the trustworthiness of Fireflies, as clients expect AI-driven solutions rather than human involvement in sensitive discussions [26][28].
Group 4: Current Operations and Future Outlook
- Fireflies has used fully automated note-taking since 2017, addressing earlier concerns about privacy and manual participation [28][39].
- The company has been profitable since 2023 without relying on new funding, indicating a sustainable business model that contrasts with many other AI startups [44][45].