Whoa: embodied intelligence and brain-machine interfaces have joined forces in rehabilitation medicine
量子位· 2026-01-30 02:23
Core Viewpoint
- The article discusses the integration of brain-machine interfaces (BMIs) and embodied intelligence in rehabilitation, highlighting the potential for these technologies to enhance patient recovery and redefine the roles of healthcare professionals and robots in medical settings [6][7][10].

Group 1: Brain-Machine Interface and Embodied Intelligence
- The concept of combining BMIs with embodied intelligence is presented as a groundbreaking approach to rehabilitation, allowing robots to assist patients based on their brain signals [6][7].
- The integration of BMIs can potentially enable patients to control robotic devices through thought, enhancing the effectiveness of rehabilitation training [23][25].
- The article emphasizes that the future of rehabilitation may involve not only robots assisting doctors but also patients becoming "cyborgs" [8][10].

Group 2: Technological Advancements
- Recent advancements in BMI technology, including lighter and more modular hardware, have made it feasible for large-scale deployment in clinical settings [31][36].
- The development of large models has improved the processing of complex brain signals, allowing for more accurate intention recognition [32][34].
- The article notes that the combination of these technological advancements has laid the groundwork for BMIs to actively participate in rehabilitation [35][36].

Group 3: Clinical Applications and Future Directions
- Fourier's introduction of the "smart rehabilitation port" in 2020 has been a significant step in integrating advanced technologies into rehabilitation practices [11][12].
- The article outlines a strategic initiative to create a large-scale BMI data set to enhance the training of large models for better intention recognition [40][41].
- The potential for robots to serve as experimental platforms for understanding brain functions is highlighted, suggesting that they could facilitate research that is difficult to conduct directly on human subjects [64][65].

Group 4: Expert Insights and Discussions
- Experts in the roundtable discussion emphasize the importance of leveraging intelligent devices to enhance the capabilities of healthcare professionals, rather than replacing them [49][56].
- The conversation also touches on the need for a comprehensive understanding of brain functions to improve the design of intelligent systems that can effectively interact with humans [51][62].
- The integration of BMIs, embodied intelligence, and AI is seen as a pathway to achieving significant advancements in both medical applications and broader societal impacts [60][63].
Musk reportedly merging SpaceX and xAI! A $1.5 trillion valuation, with rockets in one hand and AI in the other
量子位· 2026-01-30 02:23
Ahead of SpaceX's IPO, Musk is making another big move! According to Reuters, Musk is pushing to merge SpaceX and xAI through a stock swap.

克雷西 from 凹非寺 | 量子位 QbitAI

This strategic consolidation comes at a critical moment, just before SpaceX's blockbuster public listing planned for later this year. Once the deal closes, Musk's rocket-launch capability will be fully joined with the Grok AI models under a single commercial roof. It also continues Musk's consistent strategy of unifying his business empire: he previously folded the social platform X into xAI by similar means.

SpaceX and xAI to merge via stock swap

According to the Reuters report, people familiar with the matter say SpaceX and xAI are in merger talks and plan to combine through a share exchange. As part of the deal, some xAI executives may opt for cash rather than stock; no final agreement has been signed so far. Still, the deal has made concrete progress: to facilitate it, Musk has already set up two entities in Nevada. This flurry of capital maneuvers traces back to December 9, 2025, when Bloomberg first reported that SpaceX was quietly preparing to go public. Nevada corporate filings show that the two entities, both named "K2 Merger Sub", were established on the 21st; one of them, a limited liability company, will take SpaceX and …
A multimodal video generation benchmark that tops the industry SOTA, and Kunlun Tiangong just open-sourced it
量子位· 2026-01-29 08:27
Core Viewpoint
- The article discusses the launch and capabilities of the AI model SkyReels-V3 by Kunlun Tiangong, highlighting its advanced features in video generation and its open-source nature, which is seen as a significant technological advancement in the AI field [3][4][10].

Group 1: Model Features
- SkyReels-V3 is a multi-modal video generation model capable of generating videos from text and images, extending video lengths, and creating virtual avatars [7][9].
- The model aims to eliminate the stiffness and disjointedness often associated with AI-generated videos, achieving a new level of realism and coherence [9][10].
- It supports various video formats and resolutions, allowing for seamless transitions and maintaining visual quality across different aspect ratios [19][45].

Group 2: Technical Innovations
- SkyReels-V3 addresses common issues in AI video generation, such as the scarcity of high-quality training data, computational limitations, and a lack of understanding of physical laws [33][36].
- The model employs a "one core, multiple branches" architecture, utilizing a multi-modal in-context learning framework for differentiated fine-tuning across tasks [37][38].
- It incorporates advanced techniques like cross-frame pairing for data construction, multi-reference condition fusion for detail control, and mixed training strategies to enhance generalization [39][42][45].

Group 3: Performance Metrics
- In comparative evaluations, SkyReels-V3 outperformed other models in terms of reference image consistency, instruction adherence, and visual quality [46][47].
- The model's video extension capabilities go beyond simple frame addition, employing intelligent semantic understanding to create coherent narrative continuations [49][54].
- It also features a virtual avatar model that can generate synchronized audio-visual content, supporting multi-character interactions and long video generation [55][60].

Group 4: Industry Context
- The AI video generation sector is transitioning from mere technical demonstrations to a competitive landscape focused on commercial applications, with SkyReels-V3 standing out for its multi-modal capabilities and precision [64][65].
- Kunlun Tiangong's strategic focus on self-developed technologies and a diverse model matrix positions it as a leader in the AI space, with applications spanning various domains [68][70].
- The company has successfully launched multiple AI products catering to different consumer needs, establishing a sustainable cycle of technology, user engagement, and product innovation [73][74].
量子位 (Quantum Bit) is hiring editors and writers
量子位· 2026-01-29 08:27
Core Viewpoint
- The article emphasizes the ongoing AI boom and invites individuals to join the company "Quantum Bit," which focuses on tracking AI advancements and has established itself as a leading content platform in the industry [1].

Group 1: Job Opportunities
- The company is hiring for three main directions: AI Industry, AI Finance, and AI Product, with positions available for both experienced professionals and fresh graduates [2][4].
- Positions are open for various levels, including editors, lead writers, and chief editors, with a focus on matching roles to individual capabilities [6].

Group 2: Job Responsibilities
- **AI Industry Direction**: Responsibilities include tracking innovations in infrastructure, such as chips, AI infrastructure, and cloud computing, as well as interpreting technical reports from conferences [6][7].
- **AI Finance Direction**: Focuses on venture capital, financial reports, and capital movements within the AI industry, requiring strong analytical skills and a passion for interviews [11].
- **AI Product Direction**: Involves monitoring AI applications and hardware developments, producing in-depth evaluations of AI products, and engaging with industry experts [11].

Group 3: Benefits and Growth
- Employees can expect to gain exposure to the latest AI technologies, enhance their work efficiency through new tools, and build personal influence in the AI field [6].
- The company offers competitive salaries, comprehensive benefits, and a supportive environment for professional growth, including mentorship from senior editors [6][12].

Group 4: Company Impact
- By 2025, Quantum Bit aims to have over 2.4 million subscribers on WeChat and more than 7 million users across platforms, with a daily reading volume exceeding 2 million [12].
- The company is recognized as the top new media outlet in the AI and frontier technology sector according to third-party data platforms [12].
A world model this awesome is actually open source!
量子位· 2026-01-29 08:27
Core Viewpoint
- Ant Group's LingBot-World represents a significant advancement in the field of embodied intelligence, integrating memory, interactivity, and continuity in a fully open-source world model, which has garnered considerable attention online [12][30].

Group 1: LingBot-World Features
- LingBot-World allows continuous generation and interaction for up to 10 minutes, achieving visual effects comparable to DeepMind's Genie 3 but over longer time spans [3][11].
- Users can control the perspective in real time using keyboard and mouse, similar to playing an AAA game, while the agent can autonomously plan and execute actions within the generated world [5][6].
- The model maintains high consistency and memory, allowing it to infer the behavior of objects even when they are out of view, adhering to real-world physical laws [9][10][11].

Group 2: Technical Innovations
- LingBot-World's development involved a mixed data engine, utilizing both real-world videos and synthetic data from Unreal Engine to teach the model causal relationships [16][17].
- The model employs a three-stage evolution strategy, starting with pre-training for video generation, followed by training to understand physical laws, and finally integrating interactive data to enhance memory capabilities [21][24].
- A novel causal attention mechanism and few-step distillation technology were introduced to reduce inference time to under one second, achieving real-time playability at 16 frames per second [26].

Group 3: Strategic Implications
- The release of LingBot-World, along with LingBot-Depth and LingBot-VLA, indicates Ant Group's strategic focus on creating a comprehensive infrastructure for embodied intelligence [30][32].
- The integration of perception (LingBot-Depth), decision-making (LingBot-VLA), and simulation (LingBot-World) creates a closed-loop system that enhances the capabilities of robots in virtual environments [41][42].
- This open-source approach aims to provide reusable and standardized infrastructure for various industries, including gaming, AIGC, and autonomous driving, suggesting potential future expansions [43].
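The causal attention mentioned in Group 2 can be illustrated with a minimal NumPy sketch: each frame token attends only to itself and earlier frames, the property that lets a world model emit frames in a streaming, real-time fashion. This is a generic masked-attention toy with invented dimensions, not LingBot-World's published mechanism.

```python
import numpy as np

def causal_frame_attention(q, k, v):
    """Scaled dot-product attention in which frame t can only attend
    to frames <= t (lower-triangular mask). Illustrative toy, not
    LingBot-World's actual implementation."""
    t, d = q.shape
    scores = q @ k.T / np.sqrt(d)              # (t, t) similarities
    future = np.triu(np.ones((t, t), dtype=bool), 1)
    scores[future] = -np.inf                   # hide frames from the future
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)         # row-wise softmax
    return w @ v, w

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                    # 4 frame tokens, dim 8
out, weights = causal_frame_attention(x, x, x)
print(weights[0])                              # first frame attends only to itself
```

Because the mask forbids looking ahead, earlier frames never need recomputation when a new frame is appended, which is what makes low-latency, frame-by-frame decoding feasible.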
Large models have learned to drag the progress bar when watching videos! New Alibaba research moves video reasoning beyond guesswork to evidence-chain thinking | ICLR 2026
量子位· 2026-01-29 08:27
Core Insights
- The research team from Alibaba's Future Life Lab highlights that the effectiveness of models in video reasoning tasks is significantly influenced by how they are taught to "think" [1]
- They propose a high-quality video reasoning dataset called ReWatch and a state-of-the-art model named ReWatch-R1, which can "rewatch" videos like humans to enhance reasoning capabilities [1]

Group 1: ReWatch Dataset
- The ReWatch dataset consists of 10,000 videos, 170,000 question-answer pairs, and 135,000 reasoning chains, addressing three main issues in existing training data: rough video descriptions, overly simplistic Q&A, and a heavy reliance on textual common sense rather than video content [2][4]
- Key features of the ReWatch dataset include:
  1. High-fidelity temporal captions that provide detailed event descriptions with precise timestamps, forming a solid factual basis for complex reasoning [2]
  2. High-difficulty video Q&A that ensures questions depend on video details, preventing models from relying on guessing or common sense [2]
  3. Video-grounded reasoning chains that simulate human behavior of "rewatching and confirming" through a multi-agent framework, ensuring reasoning steps are closely tied to video content [2]

Group 2: ReWatch-R1 Model
- The training of the ReWatch-R1 model employs a SFT+RL paradigm with an innovative reward mechanism that emphasizes the importance of the reasoning process [6]
- The core of the training method is the process reward mechanism (GRPO with O&R Reward), which supervises and rewards the model's intermediate reasoning steps rather than just the final answer [6][8]
- The process reward is calculated based on:
  1. Observation Reward, which evaluates the accuracy of the model's observations against high-fidelity captions [8]
  2. Reasoning Reward, which assesses the effectiveness of the model's reasoning actions based solely on its observations [8]

Group 3: Experimental Results and Insights
- ReWatch-R1 has achieved state-of-the-art performance across five mainstream video reasoning benchmarks, significantly outperforming all comparable open-source models [9]
- A key insight from the research is that reinforcement learning (RL) is crucial for unlocking the "thinking" potential of models, as it allows for a substantial performance leap in the reasoning mode compared to the direct answering mode [11][12]
- The study emphasizes that explicit, step-by-step reasoning processes supported by evidence are vital for tackling complex video tasks, with RL being the key to fostering this capability [12][14]
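A group-relative reward of this shape is straightforward to sketch. The toy below combines a final-answer reward with the two process terms and normalizes totals within the rollout group, GRPO-style; the 0.5 weights and all scores are assumptions for illustration, not ReWatch-R1's published configuration.

```python
import numpy as np

def grpo_advantages(answer_r, obs_r, reason_r, w_obs=0.5, w_reason=0.5):
    """Group-relative advantages: each rollout's total reward (final
    answer plus process terms) is normalized against the group's mean
    and std. The weights here are illustrative assumptions."""
    total = (np.asarray(answer_r, dtype=float)
             + w_obs * np.asarray(obs_r, dtype=float)
             + w_reason * np.asarray(reason_r, dtype=float))
    return (total - total.mean()) / (total.std() + 1e-8)

# Four rollouts for one video question: answer correctness, observation
# accuracy against captions, and reasoning-effectiveness scores in [0, 1].
adv = grpo_advantages([1, 0, 1, 0],
                      [0.9, 0.2, 0.5, 0.1],
                      [0.8, 0.1, 0.6, 0.0])
print(adv)  # rollout 0 (correct and well-grounded) gets the largest advantage
```

Rewarding the intermediate observation and reasoning terms, rather than only the final answer, is what steers the policy toward evidence-grounded chains instead of lucky guesses.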
OpenAI's top reasoning researcher starts a company: building AI that "keeps learning for life", starting with a 7 billion RMB raise
量子位· 2026-01-29 05:03
Core Viewpoint
- Jerry Tworek, a key figure in AI model reasoning, has founded a new company called Core Automation, focusing on "continuous learning" in AI models, and plans to raise $1 billion (approximately 7 billion RMB) for this venture [1][15][20].

Company Background
- Jerry Tworek played a crucial role in the development of OpenAI's reasoning capabilities and has a strong theoretical and mathematical background, having completed a master's degree in mathematics at the University of Warsaw [4][6][9].
- Before joining OpenAI in 2019, he worked in quantitative research, which shaped his interest in reinforcement learning [7][9].

Focus on Continuous Learning
- The new company aims to address the challenge of how models can continuously learn from new data and experiences, rather than remaining static after deployment [12][15].
- Tworek believes that current mainstream models are limited to a "train and deploy" approach, which does not adapt to new situations encountered in real-world applications [12][22].

Implementation Strategy
- Core Automation plans to develop a new architecture that does not rely on Transformers and aims to integrate the training process into a continuous system, allowing models to learn while in operation [17][20].
- The goal is to enable AI models to learn from ongoing experiences while retaining previously acquired knowledge [16][22].

Industry Context
- The continuous learning approach is gaining traction, with other companies and academic institutions also exploring similar directions, such as Ilya Sutskever's SSI and new methodologies from Google Research [24][28].
- The industry consensus suggests that achieving Artificial General Intelligence (AGI) requires models to possess capabilities akin to biological systems, including continuous evolution and self-optimization, making continuous learning a critical aspect [23][24].

Future Outlook
- The ambition to raise $1 billion reflects the high expectations for the potential of continuous learning in AI, with industry experts predicting that 2026 could be a pivotal year for this field [31].
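The "learn while in operation, without forgetting" goal is easy to demonstrate at toy scale. The sketch below streams examples into an online linear model and rehearses a few stored past examples on every update, a classic replay-buffer hedge against catastrophic forgetting; it is a generic textbook technique with invented names, not Core Automation's undisclosed architecture.

```python
import random
import numpy as np

class ReplayLearner:
    """Minimal continual-learning sketch: an online linear model that
    replays stored past examples with every new one to guard against
    forgetting. Generic illustration only."""
    def __init__(self, dim, lr=0.05, buffer_size=256, replay_k=4):
        self.w = np.zeros(dim)
        self.lr, self.buffer, self.size, self.k = lr, [], buffer_size, replay_k

    def _step(self, x, y):
        grad = (self.w @ x - y) * x              # squared-error gradient
        self.w -= self.lr * grad

    def learn(self, x, y):
        self._step(x, y)                         # learn from the new example
        for xb, yb in random.sample(self.buffer, min(self.k, len(self.buffer))):
            self._step(xb, yb)                   # rehearse old examples
        if len(self.buffer) < self.size:
            self.buffer.append((x, y))

rng = np.random.default_rng(1)
model = ReplayLearner(dim=2)
true_w = np.array([2.0, -1.0])
for _ in range(2000):
    x = rng.normal(size=2)
    model.learn(x, true_w @ x)                   # endless stream of (x, y) pairs
print(np.round(model.w, 2))                      # converges toward [2., -1.]
```

The real research problem is doing this at model scale without the stored-data crutch; the replay buffer here simply makes the retention requirement concrete.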
After getting a hard time from Claude, the MoltBot author: MiniMax M2.1 is the best open-source model
量子位· 2026-01-29 05:03
Core Viewpoint
- The article discusses the rise and impact of Moltbot, a tool that automates workflows and enhances productivity for developers, highlighting its practical applications and the excitement it has generated in the tech community [1][2][3][4].

Group 1: Moltbot's Features and Applications
- Moltbot has been utilized by developers to automate various tasks, such as writing blogs, tracking work hours, and generating customized reports, showcasing its versatility and efficiency [3][4].
- Developers have integrated Moltbot with tools like Notion and Toggl, allowing for seamless workflow management and automation of routine tasks [4].
- The tool's ability to evolve, such as developing voice features and personalized designs, has surprised users and enhanced its functionality [3].

Group 2: Market Response and Competition
- The demand for Moltbot has led to the rapid launch of cloud services by major providers like Alibaba Cloud and Tencent Cloud, which offer environments for running Moltbot [6][7].
- Competitors in the market are emerging, with one tool claiming to provide zero-configuration deployment and extensive compatibility with various applications [9][10].

Group 3: Developer Insights and Future Prospects
- Peter Steinberger, the creator of Moltbot, shared insights on his journey into AI development, emphasizing the importance of passion and experimentation in creating innovative tools [12][14][17].
- The project has gained significant traction, with a growing community and interest from investors, indicating a strong market potential for personal AI agents [36][39].
- Steinberger believes that the future of AI tools will involve more personalized and user-friendly interactions, potentially leading to a shift in how applications are developed and utilized [50][51].
Google's Alpha family lands another Nature cover! Setting a new genome-prediction SOTA and precisely pinpointing distal pathogenic mutations
量子位· 2026-01-29 02:30
Core Viewpoint
- Google DeepMind's new model, AlphaGenome, expands AI's predictive capabilities to the complex realm of the human genome, achieving state-of-the-art (SOTA) performance in genomic predictions [1][9].

Group 1: Model Capabilities
- AlphaGenome can simultaneously predict 11 different gene regulatory processes, capturing complex interactions within genes [3][11].
- The model accurately analyzes gene splicing mechanisms, identifying how a single gene can produce multiple proteins and when errors occur that lead to diseases [4][8].
- It has demonstrated the ability to predict mutations related to diseases, such as accurately reconstructing pathogenic mutations in the TAL1 gene associated with leukemia [6][23].

Group 2: Performance Metrics
- AlphaGenome has achieved SOTA performance in 22 out of 24 evaluations related to genomic trajectory predictions and outperformed existing models in 25 out of 26 direct disease association tasks [14][9].
- The model's predictive performance includes a 49% success rate in identifying regulatory directions for GWAS-related variants, significantly surpassing traditional methods [21].

Group 3: Technical Architecture
- The model employs a hybrid architecture combining CNN and Transformer technologies, allowing for high-precision genomic predictions [30][31].
- AlphaGenome's input window extends to 1 million base pairs, enabling it to cover most interactions between remote enhancers and promoters [36].
- The training process utilizes a large-scale dataset covering both human and mouse genomes, ensuring the model learns universal rules of gene regulation across different physiological environments [37][38].

Group 4: Training Strategy
- AlphaGenome implements a two-phase training strategy to balance generalization and inference efficiency, including a pre-training phase with strict cross-validation and a distillation phase for model refinement [40][41].
- The training incorporates rigorous data augmentation strategies to enhance the model's robustness against unseen mutations [43].
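The hybrid pattern described in Group 3 (convolutions to summarize local sequence motifs, attention to connect distant positions) can be sketched as a NumPy toy. Every dimension and name below is invented for illustration; AlphaGenome's actual architecture, base-resolution outputs, and training are far more elaborate.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_hot(seq):
    """Encode a DNA string as a (length, 4) one-hot matrix."""
    idx = {"A": 0, "C": 1, "G": 2, "T": 3}
    out = np.zeros((len(seq), 4))
    out[np.arange(len(seq)), [idx[b] for b in seq]] = 1.0
    return out

def conv_tokens(x, w, stride=4):
    """CNN stage: a strided convolution summarizes short motifs into
    coarser tokens, shrinking the sequence axis by `stride`."""
    win = w.shape[0]
    n = (x.shape[0] - win) // stride + 1
    return np.stack([np.tensordot(x[i*stride:i*stride+win], w, axes=2)
                     for i in range(n)])

def self_attention(t):
    """Transformer stage: every token attends to every other, so
    distant positions (e.g. enhancer vs. promoter) can interact."""
    scores = t @ t.T / np.sqrt(t.shape[1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ t

seq = "".join(rng.choice(list("ACGT"), size=64))
x = one_hot(seq)                       # (64, 4) one-hot bases
filters = rng.normal(size=(4, 4, 8))   # window 4, 4 bases, 8 channels
tokens = conv_tokens(x, filters)       # (16, 8) coarse motif tokens
out = self_attention(tokens)           # (16, 8) context-mixed tokens
print(tokens.shape, out.shape)
```

Shortening the sequence axis with strided convolutions before applying attention is the standard trick for making attention affordable over very long inputs such as a 1-million-base-pair window.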
Musk sprints toward robot mass production and decisively discontinues Tesla's luxury models! 2026 capital expenditure will be "very large"
量子位· 2026-01-29 02:30
鱼羊 from 凹非寺 | 量子位 QbitAI

Musk is serious about turning Tesla into a "robot" company: on the latest earnings call, he announced that Tesla will end production of the luxury Model S and Model X in the second quarter of 2026.

"Now is the time for the Model S and Model X to exit the stage with dignity, because we are moving toward a future built on autonomous driving."

The official Optimus account was even more explicit, posting a video with the caption: "The Model S and X will live on through me."

Musk revealed that after the Model S/X line at Tesla's Fremont, California factory is converted into an Optimus line, robot output will reach one million units per year.

The end of Tesla's "sentimental" models

The retirement of the Model S/X did not come out of nowhere. After all, Musk said back in 2019 that the two models, launched in 2012 and 2015 respectively, were kept in production purely for "emotional reasons". The numbers are blunt: in recent years Tesla has sold roughly 1.7 million vehicles annually, of which the Model S/X account for only a few percentage points, and their sales keep declining. The goal is clear: to free up production lines for Tesla's Optimus robot. The earnings report shows Tesla's fourth-quarter net income was $840 million, down 61% year over year. …