量子位
Boston Dynamics' Robot Finally Has a Brain! Unfazed Even When Humans Deliberately Trip It Up
量子位· 2025-08-22 02:30
Core Viewpoint - Boston Dynamics has upgraded its Atlas robot with end-to-end AI capabilities, allowing it to understand natural language commands, plan actions autonomously, and handle unexpected situations [1][8].
Group 1: Atlas Robot Capabilities
- The new version, Atlas MTS, can recognize a box and open it even when the lid is closed [2].
- It can accurately detect changes in the position of objects, such as a box being moved [4].
- Atlas can find missing items and place them correctly into boxes, showcasing its advanced perception [6].
- The robot can respond autonomously to unexpected situations, such as parts falling or lids not being closed [21][22].
- It can learn any action that a human can demonstrate, including tying knots and folding chairs [23].
Group 2: Technical Innovations
- The upgrade was developed in collaboration with Toyota Research Institute and is based on a Large Behavior Model (LBM) [8].
- The LBM uses a diffusion Transformer model with 450 million parameters that converts inputs such as images and natural language into action commands for Atlas [17].
- Combining a model predictive controller with a VR interface allows precise control over a wide range of tasks, from fine motor skills to full-body movements [19][20].
Group 3: Transition to Electric Drive
- Boston Dynamics has retired the hydraulic version of Atlas and released an all-electric version, which is more cost-effective and integrates better with AI systems [28][29].
- Electric drive systems offer higher precision and lower energy consumption, and are more compatible with AI learning frameworks [30].
- The transition to electric drive has enabled Boston Dynamics to keep introducing new movements and capabilities for the robot [31][36].
Group 4: Competitive Landscape
- The article also mentions the Chinese company Unitree (宇树, "Yushu"), which has used electric drive technology in its robots from the start, iterating rapidly and gaining global recognition [39].
- Unitree's product lineup includes several humanoid robots with different specifications and price points, reflecting its commitment to electric drive [41].
Group 5: Future Outlook
- The combination of electric drive technology and AI algorithms is expected to usher in a new era for electric robots [44].
OpenAI's No. 1 "Traitor" Turns Out to Be Self-Taught in AI???
量子位· 2025-08-22 02:30
Core Viewpoint - The article traces the journey of Tom Brown, co-founder of Anthropic, from self-taught AI enthusiast to key industry figure who now challenges his former employer, OpenAI, through the success of Anthropic's model Claude 3.5 Sonnet [1][2][16].
Group 1: Tom Brown's Journey
- Tom Brown initially struggled academically, particularly in linear algebra, but decided to teach himself AI after leaving his job [2][35].
- He followed a structured six-month self-study plan that included online courses and practical projects, which eventually led to a position at OpenAI [36][38].
- At OpenAI, Brown played a significant role in the development of GPT-3, focusing on scaling and model architecture improvements [41][45].
Group 2: Anthropic's Competitive Position
- Anthropic, founded by former OpenAI employees, has gained significant market share, now holding 32% of the market, and excels particularly in programming capabilities [17][20].
- The release of Claude 3.5 Sonnet marked a turning point for Anthropic, allowing it to compete directly with OpenAI's offerings [16][13].
- Recent developments include the expansion of Claude's context window to 1 million tokens, directly challenging OpenAI's GPT-5 [25][24].
Group 3: Industry Dynamics
- Competition between Anthropic and OpenAI has intensified, with both companies rapidly releasing new models and features [24][26].
- OpenAI's market share has declined by 25%, while Anthropic has positioned itself as a leader in certain AI applications [17][20].
- The article highlights strategic moves by both companies, including API access restrictions and model upgrades, indicating a fierce rivalry [21][22][24].
Group 4: Career Advice from Tom Brown
- Tom Brown offers five career tips for aspiring professionals: prioritize networking, seek mentorship, demonstrate value, gain hands-on experience, and embrace risk-taking [48].
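Tsinghua, Peking, and Zhejiang Universities Lead the New Academician Candidates for the Two Academies! Youngest Nominee Is 39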
量子位· 2025-08-22 02:30
Core Viewpoint - The selection of academicians by the Chinese Academy of Sciences and the Chinese Academy of Engineering reflects the development trends in various scientific fields, with a notable increase in candidates for the 2025 election, particularly in artificial intelligence and emerging disciplines [6][14].
Summary by Sections
Candidate Numbers
- The number of candidates for the Chinese Academy of Sciences has increased to 639, a growth of approximately 9.6% compared to 2023 [7][8].
- The Chinese Academy of Engineering has 660 candidates, an increase of about 0.8% from 2023 [8].
Discipline Distribution
- The distribution of candidates across disciplines shows a focus on life sciences, engineering, and emerging technologies, with specific numbers listed for each discipline [8][10].
- The adjustment in discipline allocation indicates a shift towards prioritizing new and interdisciplinary fields, particularly artificial intelligence [13][21].
Artificial Intelligence Focus
- Artificial intelligence is transitioning from a subfield of information technology to an independent discipline, with its own dedicated nomination slot in the 2025 selection guidelines [15][17].
- The 2025 guidelines for the Chinese Academy of Engineering highlight robotics as a priority discipline, a significant shift in focus from previous years [19][21].
Notable Candidates and Recommendations
- Several prominent figures in the artificial intelligence field are among the candidates, including researchers from leading universities and institutions [24].
- Recommendations for candidates come from notable academicians, emphasizing the importance of mentorship and recognition within the academic community [26][27].
Zuck's "Hundred-Million Club" Welds Its Doors Shut! Meta Reportedly Freezes Hiring and Bans Internal Transfers
量子位· 2025-08-22 00:59
Core Viewpoint - Meta has recently frozen hiring in its Superintelligence Labs, indicating a significant organizational restructuring amid rising tensions between new and existing employees over salary disparities and cultural clashes [1][6][8].
Group 1: Organizational Changes
- Meta's Superintelligence Labs has been restructured into four independent groups, focusing on high-risk innovations, product applications, infrastructure, and foundational AI research [11][15].
- Under the hiring freeze, any exception requires approval from the new Chief AI Officer, Alexandr Wang, reflecting a shift in recruitment strategy [6][10].
Group 2: Recruitment and Internal Tensions
- Meta had previously recruited aggressively, hiring over 50 new employees from top AI companies, but this has led to internal friction over compensation and cultural integration [4][7][8].
- Existing employees have expressed dissatisfaction with the pay differences, and some researchers have threatened to resign [7][8].
Group 3: Financial Performance and Market Context
- Despite the hiring freeze, Meta's AI investments have shown positive results, with Q2 2025 revenue reaching $47.52 billion, a 22% year-over-year increase, and net profit of $18.34 billion, up 36% [19][20].
- The company is facing scrutiny over rising costs and investor concerns, prompting a strategic reassessment of its AI initiatives [20][22].
Group 4: Industry Perspective
- The current climate in the tech industry is marked by concerns over an "AI bubble," with reports indicating that 95% of companies see no return on AI investments [14][17].
- Meta's AI-driven advertising systems have improved engagement metrics, suggesting that its investments are yielding tangible benefits, in contrast to broader industry trends [18].
Zhihui Jun's New Big Move: Secondary Development for Robots Now Has Zero Barrier to Entry!
量子位· 2025-08-22 00:59
Core Viewpoint - The article discusses the launch of the "LinkCraft" platform by AgiBot (智元机器人, "Zhiyuan Robotics"), which aims to lower the barriers to secondary development of robots, allowing creators to program and customize robot actions without extensive technical knowledge [1][5][6].
Group 1: Platform Overview
- LinkCraft is positioned as an AI-powered platform for creators and developers, enabling robots to express themselves like humans and allowing creators to direct robot actions freely [7].
- The platform simplifies robot programming by providing modular tools for action customization, reducing the complexity previously associated with it [10][11].
Group 2: Features and Functionality
- Users can select from a library of standardized actions or upload videos of human movements, which the AI then converts into robot actions [20][21].
- The platform allows real-time preview and adjustment of robot actions, making it accessible to non-experts [16][22].
- LinkCraft can analyze audio inputs and generate corresponding robot actions based on the emotional tone of the audio [24].
- Users can input text to generate speech and specify accompanying actions for the robot, enabling a more interactive performance [26].
Group 3: Future Developments
- The beta version of LinkCraft is set to launch in October, with plans to expand compatibility to more hardware beyond the currently supported humanoid robot, Lingxi X2 [14][13].
- A new prototype robot, Lingxi X2-W, was showcased, featuring advanced capabilities and a compact design [27][29].
Integrated Multi-Person Video Generation with Sound! Marketing Videos Made with Baidu's Latest AI Now Cost 1.4 Yuan per 5 Seconds
量子位· 2025-08-21 11:10
Core Viewpoint - Baidu has shifted its stance on video generation models and is now aggressively developing its MuseSteamer (蒸汽机) video generation model, which has recently been upgraded to version 2.0 with a focus on integrated multi-person audio and video generation [1][21].
Summary by Sections
Product Features
- MuseSteamer 2.0 excels in complex camera movements and storytelling capabilities, with improved video quality [2].
- The model can generate detailed visuals, including intricate features like scales and makeup on characters, and can create humorous scenarios [3].
- Users can try the product through Baidu search or the "绘想" (Huixiang) platform [5].
- There are four versions of MuseSteamer 2.0: Turbo, Lite, Pro, and Audio, with varying pixel quality and features [6].
- The pricing is competitive, with the Turbo audio version priced at 2.5 yuan per second, and a limited-time offer of 1.4 yuan for 5 seconds [8].
Technical Innovations
- The model achieves integrated multi-person audio and video generation with millisecond precision in aligning voice with lip movements and expressions [17].
- It employs a Latent Multi-Modal Planner technology to coordinate multiple roles and emotions, ensuring coherent storytelling [17].
- The model is deeply adapted to Chinese scenarios, achieving over 98% accuracy in rendering Chinese speech details and emotional expressions [18].
- It generates film-quality visuals through precise dynamic characterization of subjects [19].
- Camera control is sophisticated, using professional lens techniques to align visual details with creative intent [20].
Market Strategy
- Baidu's development of MuseSteamer is driven by strong demand from its internal applications, including search, content distribution, and commercial needs [21][26].
- The model is already widely used within Baidu's mobile ecosystem, enhancing multi-modal experiences across various platforms [22].
- Example applications include creative marketing videos for brands like Volkswagen and Yili, showcasing the model's capabilities in real-world scenarios [24][25].
vivo Is First to Launch a Chinese-Made Answer to Vision Pro: Two-Thirds of Apple's Weight, Expected to Cost One-Third of Apple's Price
量子位· 2025-08-21 11:10
Core Viewpoint - The article discusses the launch of vivo's first MR headset, the Chinese-made vivo Vision Exploration Edition, highlighting its lightweight design, advanced features, and potential market impact compared to competitors like Apple's Vision Pro [1][10].
Group 1: Product Features
- The vivo Vision weighs only 398g, significantly lighter than Apple's Vision Pro at 600g, making it comparable to wearing headphones [3][12].
- The headset supports magnetic prescription lenses for users with myopia, covering 100 to 1000 degrees (in the Chinese convention, roughly -1.00 to -10.00 diopters) [6][39].
- It features eye-hand interaction and connects seamlessly with vivo smartphones and PCs [7][41].
- The device is powered by the second-generation Snapdragon XR2+ chip, enhancing performance [8].
Group 2: Design and Comfort
- The design focuses on reducing weight, which is crucial for comfort, wear duration, and portability, as heavy headsets can cause discomfort [16][17].
- The ergonomic design is based on human facial measurements to minimize pressure points, resulting in less facial indentation after prolonged use [24][22].
- The headset's dimensions (height: 83mm, thickness: 40mm) allow it to fit easily into small bags, appealing to users who prioritize portability [21].
Group 3: Display and Interaction
- The vivo Vision uses Micro-OLED displays with "dual 8K" resolution, specified as 3840*3552*2, covering 94% of the DCI-P3 color gamut [27][28].
- It offers a 180-degree field of view and can simulate a 120-inch virtual screen at a distance of 100 meters [30][32].
- The device supports ultra-low-latency color passthrough with a delay of just 13ms, enabling real-time interaction with the environment [35][36].
Group 4: Market Position and Future Goals
- vivo is the first Chinese smartphone manufacturer to pursue the MR headset route, positioning the device as a bridge between the physical and digital worlds [50][51].
- The long-term vision includes using MR technology as a foundation for future household robots, addressing challenges in perception and decision-making in unstructured environments [51][52].
- The headset is not yet available to consumers, with plans for experiential stores in major cities like Beijing, Shanghai, and Guangzhou [53].
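As a quick sanity check on the figures above, the following sketch recomputes the weight ratio behind the headline's "two-thirds of Apple's weight" claim and the pixel count implied by 3840*3552*2. The variable names are illustrative, and reading the trailing *2 as one panel per eye is an assumption the summary does not state explicitly.

```python
# Figures quoted in the summary; names are illustrative only.
VIVO_WEIGHT_G = 398
VISION_PRO_WEIGHT_G = 600
PANEL_WIDTH_PX = 3840
PANEL_HEIGHT_PX = 3552
NUM_PANELS = 2  # assumption: the "*2" means one panel per eye

# Ratio of the two quoted weights (the headline rounds this to 2/3).
weight_ratio = VIVO_WEIGHT_G / VISION_PRO_WEIGHT_G

# Pixel counts implied by the quoted resolution.
pixels_per_panel = PANEL_WIDTH_PX * PANEL_HEIGHT_PX
total_pixels = pixels_per_panel * NUM_PANELS

print(f"weight ratio: {weight_ratio:.3f}")
print(f"pixels per panel: {pixels_per_panel:,}")
print(f"total pixels: {total_pixels:,}")
```

The ratio comes out within about one percent of two-thirds, which is consistent with the headline.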
How Sun Yat-sen University's Peking University-Educated President "Intercepted" Zhang Yitang Midway
量子位· 2025-08-21 07:15
Core Viewpoint - The article discusses the return of renowned mathematician Zhang Yitang to China, specifically to Sun Yat-sen University, after over 40 years abroad, highlighting the competitive nature of academic recruitment in China and Zhang's significant contributions to mathematics [2][3][4].
Group 1: Zhang Yitang's Academic Journey
- Zhang Yitang, a prominent mathematician known for his work on the twin prime conjecture, has recently joined Sun Yat-sen University as the chief scientist of the newly established Hong Kong Advanced Institute [2][3].
- Prior to this, he was a tenured professor at the University of California, Santa Barbara, and had been contemplating a return to China for several years due to various international factors [3][4].
- His decision to join Sun Yat-sen University was somewhat unexpected, as he had other institutions lined up, but the university managed to secure his commitment at the last moment [4].
Group 2: Contributions to Mathematics
- Zhang gained international recognition at the age of 58 for his groundbreaking paper "Bounded Gaps Between Primes," which made significant progress on the twin prime conjecture [12][16].
- His research proved the existence of infinitely many pairs of prime numbers with gaps smaller than 70 million, a historic advance in number theory [11][12].
- The achievement was particularly notable because many experts had considered the problem out of reach, showcasing Zhang's unique approach and capabilities in mathematics [13][14].
Group 3: Personal Background and Philosophy
- Born in 1955 in Shanghai, Zhang displayed exceptional mathematical talent from a young age, independently proving the Pythagorean theorem at the age of 10 [18][20].
- He faced significant challenges in his early career, including difficulty finding academic positions in the U.S. after completing his Ph.D., which led him to work in a restaurant temporarily [28][33].
- Zhang emphasizes passion for mathematics over material success, stating that he values being able to continue his mathematical work regardless of his circumstances [34].
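For readers who want the result from Group 2 in symbols, it is usually stated as a bound on the limit inferior of consecutive prime gaps. Here $p_n$ denotes the $n$-th prime, a notation introduced for this note and not used in the summary:

```latex
% Zhang's bounded-gaps result: infinitely many consecutive primes
% differ by less than 70 million.
\liminf_{n \to \infty} \left( p_{n+1} - p_n \right) < 7 \times 10^{7}

% The twin prime conjecture is the assertion that the same quantity equals 2.
\liminf_{n \to \infty} \left( p_{n+1} - p_n \right) = 2
```

The first line is what was proved; the second remains open, which is why the result counts as progress on the conjecture and not a proof of it.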
Context as Memory! HKU and Kuaishou Propose a Scene-Consistent Interactive Video World Model, with Memory Rivaling Genie 3 and Released Earlier!
量子位· 2025-08-21 07:15
Core Viewpoint - The article discusses a new framework called "Context-as-Memory," developed by a research team from the University of Hong Kong and Kuaishou, which significantly improves scene consistency in interactive long video generation by making efficient use of historical context frames [8][10][19].
Summary by Sections
Introduction to Context-as-Memory
- The framework addresses scene inconsistency in AI-generated videos by using a memory retrieval system that selects relevant historical frames to maintain continuity [10][19].
Types of Memory in Video Generation
- Two types of memory are identified: dynamic memory for short-term actions and behaviors, and static memory for scene-level and object-level information [12][13].
Key Concepts of Context-as-Memory
- Long video generation requires long-term historical memory to maintain scene consistency over time [15].
- Memory retrieval is crucial: directly using all historical frames is computationally expensive, so a memory retrieval module is needed to filter useful information [15].
- Context memory is created by concatenating selected context frames with the input, allowing the model to reference historical information during frame generation [15][19].
Memory Retrieval Method
- The model employs a camera trajectory-based search method to select context frames that overlap significantly with the current frame's visible area, improving both computational efficiency and scene consistency [20][22].
Dataset and Experimental Results
- A dataset was created using Unreal Engine 5, containing 100 videos with 7601 frames each, to evaluate the effectiveness of the Context-as-Memory method [23].
- Experimental results show that Context-as-Memory outperforms baseline and state-of-the-art methods in memory capability and generation quality, demonstrating its effectiveness in maintaining long video consistency [24][25].
Generalization of the Method
- Generalization was tested using images in various styles as initial frames, confirming the method's strong memory capabilities in open-domain scenarios [26][27].
Research Team and Background
- The research was a collaboration between the University of Hong Kong, Zhejiang University, and Kuaishou, led by PhD student Yu Jiwen under Professor Liu Xihui [28][33].
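The camera trajectory-based retrieval described under "Memory Retrieval Method" can be illustrated with a toy example. This is a minimal sketch under strong simplifying assumptions, not the authors' implementation: cameras live in a 2D ground plane, each sees a circular sector, and overlap is estimated by sampling points in the current view. `CameraPose`, `overlap_score`, `retrieve_context`, and all default parameters are hypothetical names and values chosen for illustration.

```python
import math
from dataclasses import dataclass

EPS = 1e-9  # tolerance so points on the sector boundary count as visible


@dataclass(frozen=True)
class CameraPose:
    x: float
    y: float
    yaw_deg: float  # viewing direction in the ground plane


def in_view(pose, px, py, fov_deg=90.0, max_range=10.0):
    """Return True if point (px, py) lies inside the camera's view sector."""
    dx, dy = px - pose.x, py - pose.y
    dist = math.hypot(dx, dy)
    if dist > max_range + EPS:
        return False
    if dist == 0.0:
        return True
    angle = math.degrees(math.atan2(dy, dx))
    # Signed angular difference wrapped into [-180, 180).
    diff = (angle - pose.yaw_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= fov_deg / 2.0 + EPS


def sample_view_points(pose, fov_deg=90.0, max_range=10.0, n_angles=9, n_ranges=5):
    """Sample a grid of points covering the camera's view sector."""
    points = []
    for i in range(n_angles):
        a = math.radians(pose.yaw_deg - fov_deg / 2.0 + fov_deg * i / (n_angles - 1))
        for j in range(1, n_ranges + 1):
            r = max_range * j / n_ranges
            points.append((pose.x + r * math.cos(a), pose.y + r * math.sin(a)))
    return points


def overlap_score(current, candidate):
    """Fraction of the current view's sample points also seen by the candidate."""
    points = sample_view_points(current)
    visible = sum(in_view(candidate, px, py) for px, py in points)
    return visible / len(points)


def retrieve_context(current, history, k=2, min_overlap=0.3):
    """Indices of up to k history frames whose views overlap the current view most."""
    scored = [(overlap_score(current, pose), i) for i, pose in enumerate(history)]
    kept = [(s, i) for s, i in scored if s >= min_overlap]
    kept.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in kept[:k]]


# Toy trajectory: same view, opposite view, and a slightly shifted view.
history = [CameraPose(0, 0, 0), CameraPose(0, 0, 180), CameraPose(1, 0, 10)]
current = CameraPose(0, 0, 0)
print(retrieve_context(current, history))
```

In this toy setup the frame facing the opposite direction shares no visible area with the current view and is filtered out, which is the intuition behind retrieving only a few relevant frames and skipping the rest of the history.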
Peking University's ChatExcel Secures New Investment in the Ten-Million Range
量子位· 2025-08-21 07:15
Core Viewpoint - ChatExcel has recently completed its angel round financing, securing nearly ten million from Shanghai Changlei Capital and Wuhan Donghu Angel Fund, aimed at accelerating product development and global market expansion [2][15].
Group 1: Company Development
- ChatExcel is the first generative AI Excel and data analysis agent in China, allowing users to operate Excel spreadsheets through chat [6].
- The platform has made significant progress in AI spreadsheet processing and DataAgent technology, serving over one million users [10][9].
- The company has integrated into the ecosystems of major firms like Huawei, Lenovo, HP, and Alibaba Cloud, supporting commercial growth [12].
Group 2: Product Features and Upgrades
- ChatExcel covers four main modules: Excel processing, data computation, data analysis, and chart generation, making it usable at all skill levels [7].
- Recent updates include mobile H5 and desktop client support; an "enterprise version" with SSO, local deployment, and API calls; and a 300% increase in processing speed with a 50% improvement in model effectiveness [17][20][19].
- The tool now supports various data sources, including Excel files, databases, and web data, enabling comprehensive data analysis [34].
Group 3: Future Plans
- The company plans to use the new funding to build an AI DataAgent that enhances data flow and creates a commercial closed loop [14].
- ChatExcel is actively iterating on the product and plans to introduce more features in the coming months to improve intelligence and user experience [28].