Spatial Supersensing
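Morning Brief | Next-Generation iPhone Air Release Delayed / SanDisk Prices Surge 50% / JPMorgan Chase CEO: In the Future, Developed Countries Will Need to Work Only Three and a Half Days a Week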
Sou Hu Cai Jing· 2025-11-11 00:45
Group 1: Apple and iPhone Air
- Apple has postponed the release of the next-generation iPhone Air because of weak sales since the current model launched in September 2025, and has not given a new release timeline [5][6]
- At only 5.6mm thick, the iPhone Air compromises on battery capacity and cameras, offering only a single rear camera at a price of $999 [5]
- The iPhone 17 Pro offers better value with a triple-camera system and longer battery life, highlighting the difficulty Apple faces in positioning a fourth model alongside the standard and Pro lines [5]
Group 2: NAND Flash Market
- SanDisk has raised NAND flash contract prices by 50% because of supply constraints, and the market is expected to keep trending upward [21]
- The NAND flash market is in a supply-demand imbalance that is expected to last until at least the end of 2026 [21]
Group 3: AI and Business Trends
- According to a McKinsey report, 88% of companies have adopted AI, but only 39% have seen an increase in earnings before interest and taxes (EBIT), indicating a gap between efficiency gains and profitability [45][46]
- High-performing companies benefit more from AI: 50% of them plan AI-driven transformative change, compared with only 14% of average companies [46]
- Demand for AI-related roles is rising while traditional roles face replacement pressure, and 32% of companies expect their total workforce to shrink over the next year [46]
Group 4: Robotics and AI Developments
- The first international robot debate competition concluded with Songyan Power winning the championship, showcasing robots' potential in both physical and intellectual domains [34]
- Researchers have proposed a new AI framework, "Cambrian-S," to improve spatial perception and long-term memory in AI systems, indicating a shift toward more advanced AI capabilities [40]
Tencent Research Institute AI Express 20251111
Tencent Research Institute· 2025-11-10 16:30
Group 1: Generative AI Developments
- The OpenRouter platform has launched an anonymous model, Polaris Alpha, believed to be a variant of GPT-5.1, with a knowledge cutoff of October 2024, a context window of up to 256K tokens, and a maximum single-response output of 128K tokens [1]
- Polaris Alpha performs smoothly on office and programming tasks, shows typical GPT characteristics, and supports an NSFW mode [1]
- The model is currently free to use via the API and performs well at building small programming games and designing web pages; GPT-5.1 is expected to be officially released in mid-November [1]
Group 2: Multi-Modal Intelligence
- Researchers including Yann LeCun have proposed a new multimodal paradigm called Cambrian-S, centered on "spatial supersensing" and presented as the first step in exploring spatial supersensing in video [2]
- The work lays out a four-level development path for multimodal intelligence (semantic perception, streaming event cognition, 3D spatial cognition, and predictive world modeling) and introduces the VSI-SUPER benchmark for evaluating spatial supersensing [2]
- Cambrian-S uses latent frame prediction, with a "surprise" signal driving memory management and event segmentation; despite using smaller models, it outperforms Gemini on spatial cognition tasks [2]
Group 3: AI Programming Tools
- Meituan has launched an AI IDE called CatPaw, featuring code completion, agent-driven Q&A and code generation, built-in browser preview and debugging, and project-level analysis [3]
- CatPaw's core engine is Meituan's in-house LongCat model; it supports major programming languages such as Python, C++, and Java, and is currently free [3]
- Over 80% of Meituan's internal developers use CatPaw weekly, AI-generated code accounts for about 50% of newly submitted code, and a Windows version is expected soon [3]
Group 4: Chinese AI IDE Launch
- YunSi Intelligence has introduced Vinsoo, billed as the world's first AI IDE equipped with a cloud-based security agent, and claims it surpasses products such as Cursor and Codex that use Claude [4]
- Vinsoo reports breakthroughs in long-context engineering algorithms, supporting effective context lengths in the millions of tokens and up to eight agents running simultaneously [4]
- The new Beta 3.0 version supports one-click cloud publishing, mobile use, and team collaboration; the company is led by a founding team born after 2000 who graduated from top universities in China and the U.S. [4]
Group 5: Open-Source Audio Editing Model
- Jieyue Xingchen (StepFun) has released Step-Audio-EditX, the first open-source LLM-grade audio editing model, which uses natural-language instructions to precisely control the emotion, speaking style, and paralinguistic features of audio [5]
- The model uses a unified LLM framework and a "dual-codebook" audio tokenizer, and supports zero-shot text-to-speech, iterative editing, and bilingual use [5]
- With about 3 billion parameters, it runs on a single 32GB GPU and achieves higher accuracy in emotion and style control than closed-source models such as MiniMax and Doubao [5]
Group 6: AI Glasses Launch
- Baidu has officially launched the Xiaodu AI Glasses Pro, priced at 2,299 yuan (2,199 yuan during the Double Eleven promotion); the glasses weigh 39 grams and have a 12-megapixel wide-angle camera [6]
- They integrate multimodal AI models for photography, music recognition, AI translation (including real-time translation), object recognition, note-taking, and audio recording [6]
- Like Xiaomi's AI glasses, they are plain AI glasses rather than the more advanced AI+AR glasses currently on the market [6]
Group 7: Robotics Innovation
- Galaxy General has introduced DexNDM, a neural dynamics model for dexterous hands that achieves stable multi-axis rotation of a wide range of objects and can use tools such as screwdrivers and hammers [8]
- DexNDM decomposes hand-object interaction down to the joint level, and its training process yields stable manipulation across tasks and object shapes without requiring successful examples [8]
- The technology has been applied to teleoperation systems: operators issue high-level commands via VR controllers while DexNDM autonomously handles fine control at the finger level [8]
Group 8: Insights on AI Entrepreneurship
- A YC partner argues that AI tools cannot replace a founder's own sales ability, and that AI should first target quick-to-implement entry points in traditional industries rather than aim for full automation [9]
- In early-stage entrepreneurship, the core competitive advantage is "learning speed" rather than scale, so founders should validate ideas quickly with small customers [9]
- AI sales development representatives (SDRs) are effective only when a well-functioning sales process already exists; for AI tools to work, founders must first be clear about their target audience and how they will capture its attention [9]
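The Cambrian-S item in Group 2 says a "surprise" signal drives memory management. Below is a minimal sketch of that general idea, not the Cambrian-S implementation. It assumes each video frame has already been encoded into a latent vector, and it uses a hypothetical persistence predictor (the next latent is predicted to equal the current one) with L2 prediction error as the surprise score. Surprising frames open a new memory slot; predictable frames are folded into the latest slot as a running mean, so uneventful stretches of video are compressed. The function names and the threshold are illustrative assumptions.

```python
import numpy as np


def surprise(predicted: np.ndarray, observed: np.ndarray) -> float:
    """Surprise score: L2 distance between the predicted and observed latent."""
    return float(np.linalg.norm(predicted - observed))


def build_memory(latents, threshold: float) -> list:
    """Surprise-gated memory over a stream of per-frame latent vectors.

    Hypothetical illustration: the "predictor" simply assumes the next latent
    equals the current one; a real system would use a learned prediction head.
    """
    latents = np.asarray(latents, dtype=float)
    if len(latents) == 0:
        return []
    memory = [latents[0].copy()]  # the first frame always opens a slot
    counts = [1]                  # number of frames merged into each slot
    for t in range(1, len(latents)):
        predicted = latents[t - 1]  # persistence predictor (assumption)
        if surprise(predicted, latents[t]) > threshold:
            # Unexpected frame: keep it as a new memory slot.
            memory.append(latents[t].copy())
            counts.append(1)
        else:
            # Predictable frame: fold it into the latest slot as a running mean.
            counts[-1] += 1
            memory[-1] += (latents[t] - memory[-1]) / counts[-1]
    return memory
```

For example, five toy 2-D latents that stay near `[0, 0]` and then jump to near `[5, 5]` collapse into two memory slots, one per "scene". Swapping the persistence predictor for a learned one changes only which frames count as surprising, not the gating logic.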
Saining Xie, Fei-Fei Li, and LeCun Jointly Propose a New Paradigm for Multimodal LLMs: "Spatial Supersensing" Arrives
Synced (Jiqizhixin)· 2025-11-10 03:53
Core Insights
- The article discusses a new research result, "Cambrian-S," which represents a significant step in exploring supersensing in video [1][4]
- It builds on the earlier "Cambrian-1" work on enhancing AI's visual representation learning [2]
Group 1: Definition and Importance of Supersensing
- Supersensing is defined as the way a digital entity truly experiences the world: absorbing endless input streams and learning continuously [4][5]
- The research argues that "supersensing" capabilities must be established before "superintelligence" can be developed [4]
Group 2: Development Path of Multimodal Intelligence
- The team outlines a development path for multimodal intelligence, identifying video as the ultimate medium of human experience and a direct projection of real-life experience [6]
- They divide the evolution of multimodal intelligence into several stages, from language-only understanding to predictive world modeling [9]
Group 3: Benchmarking Supersensing
- The researchers conducted a two-part study to establish benchmarks for measuring supersensing, finding that existing benchmarks focus mainly on language understanding and semantic perception and neglect advanced spatial and temporal reasoning [14][25]
- They introduced a new benchmark, VSI-SUPER, designed specifically to measure spatial intelligence in continuous scenarios [15][26]
Group 4: Challenges in Current Models
- Current models, including Gemini-2.5-Flash, struggle with tasks that require genuine spatial cognition and long-term memory, pointing to a fundamental gap in the current paradigm [35][38]
- Advanced models performed notably poorly on VSI-SUPER, underscoring the difficulty of integrating continuous sensory experience [35][36]
Group 5: Predictive Sensing as a New Paradigm
- The researchers propose predictive sensing as the way forward: models learn to predict their sensory inputs and build internal world models to handle unbounded visual streams [42][43]
- The approach is inspired by theories of human cognition, emphasizing selective retention of sensory input and the ability to predict incoming stimuli [42][44]
Group 6: Case Studies and Results
- Case studies show that surprise-driven event segmentation improves performance on VSI-SUPER [49][53]
- The surprise-driven method outperformed existing models and generalized better [55][57]
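Group 6 reports that surprise-driven event segmentation improved results on VSI-SUPER. The sketch below is a minimal illustration of that general technique, not the authors' code: it marks an event boundary wherever the predicted latent for the next frame differs sharply from the observed one. Assumptions: per-frame latent vectors are already available, surprise is measured as cosine distance, and the `predict` argument defaults to an identity function standing in for a learned predictor. All names and the threshold are hypothetical.

```python
from typing import Callable, List, Tuple

import numpy as np


def cosine_surprise(predicted: np.ndarray, observed: np.ndarray) -> float:
    """Surprise as cosine distance (1 - cosine similarity) between latents."""
    denom = float(np.linalg.norm(predicted) * np.linalg.norm(observed))
    if denom == 0.0:
        return 0.0  # treat all-zero latents as unsurprising
    return 1.0 - float(np.dot(predicted, observed)) / denom


def segment_events(
    latents,
    threshold: float,
    predict: Callable[[np.ndarray], np.ndarray] = lambda z: z,
) -> List[Tuple[int, int]]:
    """Split a latent stream into events at high-surprise frames.

    `predict` maps the previous latent to a guess for the next one; the
    default identity function is a placeholder for a learned predictor.
    Returns half-open (start, end) frame-index ranges.
    """
    latents = np.asarray(latents, dtype=float)
    if len(latents) == 0:
        return []
    boundaries = [0]
    for t in range(1, len(latents)):
        if cosine_surprise(predict(latents[t - 1]), latents[t]) > threshold:
            boundaries.append(t)  # prediction failed: a new event starts here
    boundaries.append(len(latents))
    return list(zip(boundaries[:-1], boundaries[1:]))
```

Returning index ranges rather than pooled features keeps segmentation separate from whatever per-event summary a downstream model builds. Raising `threshold` yields fewer, longer events; because cosine distance never exceeds 2, a threshold above 2 treats the whole stream as one event.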