From Dishwasher to "Godmother of AI": She Has Predicted the Next Decade Again
36Kr · 2026-01-13 07:31
As the "Godmother of AI," Fei-Fei Li's judgments on AI serve as a bellwether for the global technology industry. At the end of 2025, she published a ten-thousand-word essay, and the discussion it sparked once again set Silicon Valley alight.

She is convinced that the next decade of AI belongs to "spatial intelligence": if AI cannot understand an object's depth, distance, occlusion, and gravity, it can never be truly "embodied." Language, she argues, is a tool for describing the world, not the world itself.

On January 6, Li took the stage at CES 2026, where she again stressed the limitation that "large language models are ultimately constrained by language itself."

"A fly has no trillion-scale parameters, yet it dodges obstacles at high speed in cluttered spaces and lands with precision." This line about spatial intelligence has become a popular meme online. At the end of her essay, Li writes: "Without spatial intelligence, our dream of truly intelligent machines will never be complete. This quest is my North Star."

That star holds special meaning for Li. It dates back to her teens, to a revelation about the natural world during a wilderness hike. During her PhD at Caltech, inspired by cognitive neuroscience, she began working on how to teach computers to recognize objects. She later built the ImageNet dataset, driving a leap forward in computer vision, an idea she drew from the Cambrian explosion and the origins of biological vision.

At the end of 2025, World Labs, the company Li founded, released Marble, its first commercial "world model." The ...
A Roundup of Fei-Fei Li's Team's 2025 Research: A Panorama from Visual Understanding to Embodied Intelligence
自动驾驶之心 · 2025-11-07 00:05
Core Insights
- The research team led by Professor Fei-Fei Li at Stanford University has made significant advances in artificial intelligence, focusing on human-centered AI and its applications across domains [2][3][19].
- The team's work takes a holistic approach to AI, integrating perception, modeling, reasoning, and decision-making to build intelligent systems that can understand and reconstruct the world [3][19].

Research Achievements in 2025
- In generative modeling, the team developed a framework that improves the transfer of knowledge from 2D to 3D environments, showing stronger generalization and scalability [3][19].
- In embodied intelligence, the team integrated affordance learning and action constraints so that robots can generalize across tasks and environments [3][19].
- Research on semantic reasoning and human-machine understanding strengthened model consistency in dynamic environments and improved alignment between visual and language inputs [3][19].
- The team also contributed to AI governance and social responsibility, advocating policy assessment and safety frameworks for frontier AI technologies [3][19].

Specific Research Contributions
- The MOMAGEN framework tackles the efficient generation of demonstration data for multi-step robotic tasks, markedly improving data diversity and generalization from minimal real data [5][7] (a hedged augmentation sketch appears after this digest).
- The Spatial Mental Modeling study introduces MINDCUBE, a benchmark for evaluating how well visual-language models construct spatial mental models from limited views, highlighting the importance of internal spatial-structure representation [9][10] (a hedged evaluation-loop sketch appears after this digest).
- The UAD framework extracts affordance knowledge from large-scale models without supervision, improving robotic manipulation in open environments with no manual labeling [10][12].
- The Grafting method enables efficient exploration of diffusion-transformer designs without retraining, achieving high-quality generation with minimal compute [12][14] (a hedged grafting sketch appears after this digest).
- The NeuHMR framework improves 3D human motion reconstruction by using neural rendering, with better robustness and accuracy in complex scenarios [14][16].
- The BEHAVIOR ROBOT SUITE provides a comprehensive platform for real-world robotic manipulation tasks, demonstrating dual-arm coordination and precise navigation [16][18].
- The MOMA-QA dataset and SGVLM model advance video question answering with fine-grained temporal and spatial reasoning, significantly outperforming prior methods [18][19].
- The Gaussian Atlas framework transfers knowledge from 2D diffusion models to 3D generation tasks, bridging the gap between the two domains [18][19].

Keywords for 2025
- Cognition, Generation, Embodiment, Transfer, Explainability [20]
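To make the MOMAGEN bullet concrete, here is a minimal sketch of one way demonstration augmentation for multi-step tasks can work: jitter the scene layout of a few real demonstrations to produce many synthetic variants. The trajectory format and the function augment_demo are hypothetical illustrations, not the paper's actual pipeline.

```python
import random

# Hypothetical sketch: expand a handful of real multi-step demonstrations into
# many synthetic variants by perturbing object placements per sub-step. A real
# system would also replan the robot motion for each perturbed layout.
def augment_demo(seed_demo: list[dict], num_variants: int = 100) -> list[list[dict]]:
    """Each step is a dict like {'action': ..., 'object_pose': (x, y, theta)} (assumed format)."""
    variants = []
    for _ in range(num_variants):
        variant = []
        for step in seed_demo:
            x, y, theta = step["object_pose"]
            # Jitter the scene so a policy trained on the variants sees
            # diverse layouts while the sub-task sequence stays fixed.
            jittered = (x + random.uniform(-0.05, 0.05),
                        y + random.uniform(-0.05, 0.05),
                        theta + random.uniform(-0.2, 0.2))
            variant.append({"action": step["action"], "object_pose": jittered})
        variants.append(variant)
    return variants
```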
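For the MINDCUBE bullet, the sketch below shows the general shape of a benchmark loop that scores a visual-language model on spatial questions posed over sparse views. The dataset fields and the model.answer interface are assumptions for illustration; they are not the benchmark's actual API.

```python
from dataclasses import dataclass

@dataclass
class SpatialItem:
    views: list[str]   # paths to the limited camera views given to the model
    question: str      # e.g. "Is the mug left of the laptop as seen from view 2?"
    answer: str        # gold label

def evaluate(model, items: list[SpatialItem]) -> float:
    """Score a visual-language model on spatial questions over sparse views (hypothetical interface)."""
    correct = 0
    for item in items:
        # The model must fuse the separate views into one internal spatial
        # layout before answering; reading each view in isolation is not enough.
        prediction = model.answer(images=item.views, prompt=item.question)
        correct += int(prediction.strip().lower() == item.answer.lower())
    return correct / len(items)
```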
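Finally, for the Grafting bullet, here is a minimal sketch of the underlying idea as described in the summary: edit one block of a pretrained model while freezing the rest, so a candidate architecture can be assessed without full retraining. The helper graft_block and the toy model are illustrative assumptions, not the method's actual implementation.

```python
import torch.nn as nn

def graft_block(model: nn.Sequential, index: int, new_block: nn.Module) -> nn.Sequential:
    """Swap model[index] for new_block and freeze all other weights (illustrative)."""
    blocks = list(model.children())
    blocks[index] = new_block
    grafted = nn.Sequential(*blocks)
    for i, block in enumerate(grafted):
        for p in block.parameters():
            # Only the grafted block adapts; the pretrained weights stay fixed,
            # so evaluating the new design needs only a light calibration pass.
            p.requires_grad = (i == index)
    return grafted

# Example: swap block 3 of a toy stack for a replacement layer.
toy = nn.Sequential(*[nn.Linear(16, 16) for _ in range(6)])
grafted = graft_block(toy, 3, nn.Linear(16, 16))
```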