具身智能之心
The community is preparing a series of interviews: on job hunting, pursuing a PhD, switching research directions...
具身智能之心· 2025-11-01 05:40
Core Insights
- The article highlights growing opportunities in the embodied intelligence sector, with both funding and job openings up from the previous year [1][2]
- The community is preparing interviews with industry leaders to offer job-hunting and research advice to newcomers [1][2]

Group 1: Community Engagement
- The community is organizing interviews with experienced professionals to share their career paths and industry insights [1]
- There is a focus on building a closed loop for knowledge sharing across industry, academia, and job opportunities [2][5]
- The community has established a referral mechanism for job placements with companies in the embodied intelligence sector [11]

Group 2: Educational Resources
- A comprehensive technical roadmap for beginners outlines the essential skills and knowledge areas [7]
- Numerous open-source projects and datasets relevant to embodied intelligence have been compiled, giving newcomers quick access [12][26]
- Organized learning paths cover topics such as reinforcement learning, multi-modal models, and robotic navigation [12][40]

Group 3: Industry Insights
- The community hosts roundtable discussions and live streams on ongoing challenges and developments in the embodied intelligence industry [5]
- A collection of industry reports and research papers keeps members informed of the latest advancements and applications [19]
- Members come from renowned universities and leading companies in the field, fostering a rich environment for knowledge exchange [11][15]
Recruiting a partner for the VLA+RL direction!
具身智能之心· 2025-10-31 04:00
Core Insights
- The article announces the recruitment of a lecturer for an online course focused on VLA (Vision-Language-Action) models and RL (Reinforcement Learning) [1][2]
- The community aims to deepen understanding and knowledge sharing in embodied intelligence, specifically around VLA and RL [3]

Recruitment Requirements
- Candidates should have a research background in VLA and RL, preferably holding a PhD or being a doctoral student, with publications at top conferences [2]
- Practical industry experience, including hands-on debugging on real machines, is also desired [2]

Community Overview
- The company, "Embodied Intelligence Heart" (具身智能之心), positions itself as the first comprehensive technical exchange community in China focused on VLA and RL [3]
- The community has attracted a significant number of people interested in these research areas [3]

Compensation and Resources
- The company offers above-industry-average compensation along with access to extensive industry resources [4]
History made again! Nvidia's market value tops $5 trillion overnight!
具身智能之心· 2025-10-31 00:04
Core Viewpoint
- Nvidia has become the first company in history to surpass a market valuation of $5 trillion, a significant milestone for the tech industry [2][4][11]

Group 1: Market Performance
- On October 29, Nvidia's stock rose 5.44%, hitting an intraday high of $212.19 and closing at $207.04 per share, for a market capitalization of $5.03 trillion (sanity-checked below) [3][11]
- Since the beginning of 2025, Nvidia's stock has surged 56%, outpacing other major tech companies [6][40]
- Nvidia's market value now exceeds the combined market capitalizations of competitors such as AMD, Intel, and Qualcomm, as well as entire sectors of the S&P 500 [6][7]

Group 2: Growth Trajectory
- Nvidia's market value has climbed from $1 trillion to $5 trillion in just two and a half years, a pace unmatched by other tech giants [10][24]
- The company first reached a $1 trillion valuation in May 2023, hit $3 trillion in June 2024, and crossed $4 trillion a little over a year later [23][24]
- By contrast, Microsoft took nearly six years to grow from $1 trillion to $4 trillion, and Apple more than seven [17][20]

Group 3: Strategic Developments
- The latest surge is attributed to announcements at the GTC developer conference, where CEO Jensen Huang unveiled several technological advances and partnerships [26][40]
- Highlights included plans to build new supercomputers with the U.S. Department of Energy and the Blackwell chip series, whose production is expected to ramp up significantly [27][28]
- Nvidia's new open system architecture, Nvidia NVQLink, aims to accelerate the development of quantum supercomputers, further positioning the company at the forefront of technological innovation [29][32]

Group 4: Future Outlook
- Nvidia expects cumulative revenue from upcoming products, including the Blackwell and Rubin chip platforms, to reach $500 billion by the end of next year [32][34]
- The company plans to invest up to $100 billion in AI data centers in collaboration with OpenAI, signaling a strong commitment to expanding its AI infrastructure [40][41]
- Nvidia's growth is closely tied to surging demand for compute driven by AI, with its GPUs central to the infrastructure of leading AI companies [40]
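As a quick check on the market figures above, the reported closing price and valuation together imply the share count, and the daily gain implies the previous close. A minimal arithmetic sketch (the derived numbers are computed here, not sourced from the article):

```python
# Sanity-check the reported figures: price, market cap, and daily move.
close_price = 207.04   # USD per share, Oct 29 close (from the article)
market_cap = 5.03e12   # USD market capitalization (from the article)
daily_gain = 0.0544    # reported 5.44% rise on Oct 29

implied_shares = market_cap / close_price
print(f"Implied shares outstanding: {implied_shares / 1e9:.1f}B")  # ~24.3B

prev_close = close_price / (1 + daily_gain)
print(f"Implied previous close: ${prev_close:.2f}")  # ~$196.36
```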
OmniDexGrasp revealed: foundation models + force feedback, a general-purpose recipe that lets robots understand instructions and grasp dexterously
具身智能之心· 2025-10-31 00:04
Core Insights
- The article presents the OmniDexGrasp framework, which tackles dexterous grasping by combining foundation models with force-feedback control to achieve grasping that is both generalizable and physically feasible [1][2][21]

Group 1: Challenges in Dexterous Grasping
- Current solutions face a dilemma: data-driven approaches struggle to generalize because of limited datasets, while foundation models often fail to translate abstract knowledge into physical actions [2]
- The core issue is balancing generalization against physical feasibility, which leads to failures on new objects or in complex scenarios [2]

Group 2: OmniDexGrasp Framework
- OmniDexGrasp uses a three-stage approach: generating human grasping images, transferring the action to the robot, and applying force-feedback control, bridging the gap between abstract knowledge and physical execution (sketched in code below) [4][21]
- The framework retains the generalization of foundation models while ensuring physical feasibility through precise action transformation and control strategies [4]

Group 3: Key Modules of OmniDexGrasp
- Module 1 generates human grasping images that show the robot how an object should be grasped, with flexible input designs for different user needs [6][8]
- Module 2 translates human grasping images into robot actions, aligning human intent with robot capabilities through a three-step transfer strategy [9][12]
- Module 3 applies force-feedback control for stable, safe grasping, adapting to each object's physical properties and preventing damage during the grasp [12][13]

Group 4: Experimental Results
- OmniDexGrasp achieved an average success rate of 87.9% across six core grasping tasks, significantly outperforming traditional methods [15]
- In comparative tests it showed superior generalization, especially on novel objects, with success rates far exceeding existing solutions [16][18]

Group 5: Future Directions
- The authors suggest future work on integrating multi-modal observations and deeper control tasks, aiming for end-to-end general manipulation [22]
- OmniDexGrasp's potential extends beyond grasping to broader manipulation tasks, indicating its versatility in robotic applications [20]
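The three stages in Group 2 map onto a straightforward control flow. A minimal sketch with the stage-1 and stage-2 models injected as callables, since the interfaces shown here are hypothetical illustrations of the stage boundaries rather than the paper's actual API:

```python
def omnidexgrasp_pipeline(rgb_obs, instruction, robot,
                          generate_human_grasp, transfer_to_robot,
                          force_limit=10.0):
    """Sketch of OmniDexGrasp's three stages (hypothetical interfaces).

    `generate_human_grasp` stands in for the stage-1 foundation model and
    `transfer_to_robot` for the stage-2 human-to-robot action transfer;
    both are injected because the paper's actual APIs are not shown here.
    """
    # Stage 1: a foundation model imagines a *human* hand grasping the
    # target, conditioned on the current image and the language instruction.
    human_grasp_img = generate_human_grasp(rgb_obs, instruction)

    # Stage 2: retarget the imagined human grasp to the robot hand and
    # plan a reaching motion toward the pre-grasp pose.
    action = transfer_to_robot(human_grasp_img, robot)

    # Stage 3: close the fingers under force feedback so the grasp is
    # stable but does not crush the object.
    robot.execute(action)
    force = 0.0
    while force < force_limit and not robot.contact_stable():
        robot.close_fingers(step=0.01)        # small position increments
        force = robot.read_fingertip_force()  # feedback signal
    return robot.contact_stable(), force
```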
For interns and new grads: JD's embodied intelligence algorithm positions are open for applications
具身智能之心· 2025-10-31 00:04
Unit: Embodied Intelligence Lab, JD Explore Academy
Location: Yizhuang, Beijing
Type: Internship / campus recruitment
Resume submission: zhangtianle14@jd.com or liyihang18@jd.com

[Responsibilities]
- Collect and normalize video and robot-manipulation data.
- Deploy and test VLA model algorithms in simulation and on real robots.

[Qualifications]
- Bachelor's degree or above in artificial intelligence, computer science, automation, machine learning, or a related field.
- Familiarity with VLA model training and testing (including but not limited to pi0, pi0.5, Gr00t N1.5, and OpenVLA).
- Proficiency in Python/C++ and the PyTorch deep learning framework.
- Ability to analyze and solve problems independently, with strong collaboration and communication skills.
- Experience deploying VLA models on real robots is a plus.

[Responsibilities]
- R&D of vision-language-action (VLA) model algorithms, including model architecture design, data utilization, and training methods.
- R&D of algorithms for synthesizing simulation data across virtual and real environments, involving ...
New research from Alibaba: unifying VLA and world models
具身智能之心· 2025-10-31 00:04
Core Insights
- The article covers WorldVLA, a unified framework that integrates vision-language-action (VLA) models with world models to strengthen AI's understanding of the world [2][5]

Group 1: Framework and Model Integration
- WorldVLA shows significant performance gains over standalone action models and world models, demonstrating a mutual enhancement effect [3][20]
- The framework combines the strengths of both model types to predict future images and generate actions, addressing the limitations each exhibits in isolation [5][6]

Group 2: Model Architecture and Training
- WorldVLA uses three independent tokenizers to encode images, text, and actions, with a compression ratio of 16 and a codebook size of 8192 [9]
- It introduces a novel attention mask for action generation that allows multiple actions to be generated in parallel while preserving the integrity of the generated sequence (see the mask sketch below) [12][13]

Group 3: Performance Metrics and Results
- Benchmarks show WorldVLA outperforming discrete action models even without pre-training, with notable gains across performance metrics [20][22]
- Performance correlates positively with image resolution: 512×512 inputs yield significant improvements over 256×256 [22][24]

Group 4: Mutual Benefits of Model Types
- The world model enhances the action model by providing a deeper understanding of environmental physics, which is crucial for precision tasks [26][27]
- Conversely, the action model improves the world model's visual understanding, leading to more effective action generation [18][31]
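One plausible reading of "parallel generation while maintaining sequence integrity" in Group 2 is a causal mask in which action tokens see the observation/text prefix but are blinded to one another, so one action's error cannot propagate into the next. A minimal sketch of that masking idea (an illustration, not WorldVLA's verified implementation):

```python
import torch

def action_attention_mask(n_prefix: int, n_action: int) -> torch.Tensor:
    """Causal-mask variant where action tokens attend to the image/text
    prefix but not to each other, so a chunk of actions can be decoded
    in parallel without later actions conditioning on earlier ones.
    True = attention allowed.
    """
    n = n_prefix + n_action
    mask = torch.tril(torch.ones(n, n, dtype=torch.bool))  # standard causal mask
    mask[n_prefix:, n_prefix:] = False  # block action-to-action attention
    idx = torch.arange(n)
    mask[idx, idx] = True               # keep self-attention for every token
    return mask

print(action_attention_mask(n_prefix=4, n_action=3).int())
```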
ICCV 2025 | Mamba-3VL: a single model conquers 18 classes of heterogeneous tasks, redefining the capability frontier of large embodied-intelligence models
具身智能之心· 2025-10-30 10:00
Core Insights
- The article introduces Mamba-3VL, which brings state-space modeling into 3D vision-language learning to address task adaptability in embodied intelligence [2][3][18]
- Mamba-3VL handles 18 heterogeneous tasks across multiple domains, a significant advance for the field [3][11][17]

Summary by Sections

1. Core Method Innovations
- Mamba-3VL introduces three key technical breakthroughs to overcome the limitations of traditional embodied models, particularly Transformer-based architectures [3][5]
- A multi-modal Mamba Mixer module efficiently fuses 3D point clouds, visual data, and language inputs, improving spatial-relationship modeling [5][6]
- A dynamic position-encoding mechanism, IDPA, combines geometric priors with semantic modulation to meet the precision requirements of different tasks [6][9]
- A unified query-decoding framework produces flexible outputs across tasks without rebuilding modules [6][10]

2. Comprehensive Task Coverage
- Mamba-3VL supports 18 distinct tasks grouped into four major dimensions, spanning both foundational and advanced embodied interaction [11][12]
- Tasks include basic 3D perception, language reasoning, instance segmentation, and higher-level interaction and planning [11][14]

3. Performance and Generalization
- The model sets new records on key benchmarks, handling large-scale 3D data with linear computational complexity (illustrated below) [15][16]
- Mamba-3VL achieves state-of-the-art results across tasks from dense description generation to robotic operations, indicating strong generalization [15][17]

4. Research Significance
- These advances point toward general embodied intelligence, with applications in robotics, autonomous driving, virtual reality, and smart-home control [17][18]
- The ability to adapt to 18 heterogeneous tasks without extensive retraining paves the way for future multi-task embodied intelligence [20]
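The linear-complexity claim in Section 3 traces to the Mamba backbone's state-space recurrence, which processes a length-L sequence in O(L·d) time instead of self-attention's O(L²). A minimal sequential sketch of a diagonal linear SSM scan, illustrative of the mechanism only, not Mamba-3VL's selective, hardware-optimized implementation:

```python
import torch

def ssm_scan(x, A, B, C):
    """Sequential scan of a diagonal linear state-space model.

    h_t = A * h_{t-1} + B * x_t   (elementwise, diagonal A)
    y_t = C * h_t

    x: (L, d) input sequence; A, B, C: (d,) per-channel parameters.
    Cost is O(L * d), linear in sequence length, unlike attention's O(L^2).
    """
    L, d = x.shape
    h = torch.zeros(d)
    ys = []
    for t in range(L):
        h = A * h + B * x[t]   # state update
        ys.append(C * h)       # readout
    return torch.stack(ys)     # (L, d)

y = ssm_scan(torch.randn(128, 16),
             A=torch.full((16,), 0.9), B=torch.ones(16), C=torch.ones(16))
print(y.shape)  # torch.Size([128, 16])
```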
The Embodied Intelligence Heart exchange groups are live! Covering VLA, RL, navigation, data collection, and more
具身智能之心· 2025-10-30 10:00
Group 1
- A technical exchange group focused on embodied intelligence has been established, inviting participants from a range of subfields [1]
- The group spans nearly 20 sub-directions, including humanoid robots, quadrupeds, and robotic arms, and areas such as VLA, large models, VLN, reinforcement learning, mobile manipulation, multimodal perception, simulation, and data collection [1]
- Participants are encouraged to collaborate and discuss both technology and industry developments [1]
Deploy ACT and pi0 on it: a cost-effective robotic arm built for the embodied field is here!
具身智能之心· 2025-10-30 03:43
A lightweight, cost-effective robotic arm built for embodied-intelligence research

Still struggling to choose hardware for embodied intelligence work? Arms that are too expensive are out of reach, and the cheap ones are hard to use and hard to learn?

Don't worry, Imeta-Y1 is here: a lightweight, cost-effective robotic arm designed for newcomers and research beginners. Whether you are a student, an educator, or a developer just entering robotics, Imeta-Y1 helps you validate algorithms and develop projects at low cost and high efficiency.

It is especially friendly to beginners:
✅ A fully open-source toolchain plus code samples cover the whole pipeline, from data collection to model deployment;
✅ Python and C++ interfaces let you get started quickly in whichever language you prefer;
✅ Compatibility with ROS1/ROS2 and a provided URDF model enable seamless switching between simulation and the real machine (see the example below);
✅ 24-hour after-sales response keeps you unstuck while you learn.

The arm combines high-precision motion control, low-power design, and an open hardware/software architecture. It supports seamless joint debugging from simulation to the real machine and ships with a full open-source SDK and toolchain, helping users move quickly through algorithm validation, data collection, model training, and deployment. Its compact structure and modular interfaces make it especially suitable for developing and promoting embedded-AI and robot-learning platforms.

| Body weight | 4.2 kg | Rated payload | 3 kg | Degrees of freedom | 6 |
| --- | --- | ...
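Since the arm ships with ROS1/ROS2 support and a URDF, a natural first smoke test is commanding a joint trajectory through a standard ROS controller. A minimal rospy sketch; the controller topic and joint names are assumptions for illustration, not Imeta-Y1's documented interface:

```python
#!/usr/bin/env python
import rospy
from trajectory_msgs.msg import JointTrajectory, JointTrajectoryPoint

def send_home_pose():
    rospy.init_node("imeta_y1_demo")
    # Topic and joint names are hypothetical; check the vendor's
    # ROS driver and URDF for the real ones.
    pub = rospy.Publisher("/arm_controller/command",
                          JointTrajectory, queue_size=1)
    rospy.sleep(1.0)  # give the publisher time to connect

    traj = JointTrajectory()
    traj.joint_names = [f"joint{i}" for i in range(1, 7)]  # 6 DOF
    point = JointTrajectoryPoint()
    point.positions = [0.0] * 6                  # all-zero "home" pose
    point.time_from_start = rospy.Duration(3.0)  # reach it in 3 s
    traj.points.append(point)
    pub.publish(traj)

if __name__ == "__main__":
    send_home_pose()
```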
At nearly 500 pages, the most comprehensive diffusion-model handbook yet, covering all three mainstream perspectives in one book
具身智能之心· 2025-10-30 00:03
Core Insights
- The article discusses a comprehensive guide to diffusion models, which have reshaped generative AI across images, audio, video, and 3D environments [3][5][6]
- It argues for a structured treatment of diffusion models, since researchers otherwise struggle to piece the concepts together from scattered papers [4][10]

Summary by Sections

Introduction to Diffusion Models
- Diffusion models are framed as a gradual transformation over time, in contrast to traditional generative models that learn a direct mapping from noise to data [12]
- Their development is traced through three main perspectives: variational, score-based, and flow-based methods, which offer complementary frameworks for understanding and implementing diffusion modeling [12][13]

Fundamental Principles of Diffusion Models
- The origins of diffusion models are traced to foundational ideas: variational autoencoders (VAE), score-based methods, and normalizing flows [14][15]
- The chapter shows how these methods can be unified under a continuous-time framework, highlighting their mathematical equivalence [17]

Core Perspectives on Diffusion Models
- The core picture consists of a forward process that adds noise and a reverse process that denoises (written out below) [22]
- The variational view learns the denoising process through variational objectives [23]
- The score-based view learns score functions that guide denoising [23]
- The flow-based view describes generation as a continuous transformation from a simple prior distribution to the data distribution [23][24]

Sampling from Diffusion Models
- Sampling proceeds by a distinctive coarse-to-fine refinement, which creates a trade-off between quality and efficiency [27][28]
- Techniques for improving sampling efficiency and quality include classifier guidance and numerical solvers [29]

Learning Fast Generative Models
- The book explores methods for directly learning fast generators that approximate the diffusion process, improving speed and scalability [30]
- Distillation-based methods, in which a student model mimics a slower teacher to achieve faster sampling, are highlighted [30][31]

Conclusion
- The book aims to establish a lasting theoretical framework for diffusion models, centered on continuous-time dynamical systems connecting simple priors to data distributions [33]
- Understanding the principles behind, and connections among, the different methods is presented as the key to designing and improving next-generation generative models [36]
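For concreteness, the forward/reverse picture in "Core Perspectives" is conventionally written as follows (standard DDPM and score-SDE notation from the literature, not quoted from the book itself):

```latex
% Discrete forward (noising) process with schedule \beta_t,
% \alpha_t = 1 - \beta_t, \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s:
q(x_t \mid x_{t-1}) = \mathcal{N}\!\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\big),
\qquad
q(x_t \mid x_0) = \mathcal{N}\!\big(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t) I\big).

% Continuous-time limit: a forward SDE and its reverse-time counterpart,
% which depends on the score \nabla_x \log p_t(x) learned by the model:
dx = f(x,t)\,dt + g(t)\,dw,
\qquad
dx = \big[f(x,t) - g(t)^2 \nabla_x \log p_t(x)\big]\,dt + g(t)\,d\bar{w}.
```

The three perspectives differ mainly in how they parameterize and train the reverse process: the variational view bounds the data likelihood, the score view regresses the score function directly, and the flow view learns the deterministic transport between prior and data.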