Newly Listed Moore Threads Set to Unveil Its Next-Generation GPU Architecture
机器之心· 2025-12-09 03:17
Core Viewpoint
- The MUSA Developer Conference (MDC 2025) will be held on December 19-20, 2025, in Beijing, focusing on the development of full-function GPUs and aiming to explore breakthroughs in domestic computing power and the creation of a new autonomous computing ecosystem [2][4].

Group 1: Conference Overview
- MDC 2025 is the first domestic conference dedicated to full-function GPUs, emphasizing the themes of creation, connection, and convergence [2].
- The conference aims to gather global developers, technology leaders, and industry pioneers to discuss technological self-reliance and industrial upgrades [2].
- The event will showcase the MUSA technology system and its full-stack capabilities, promoting the integration of GPU technology across industries [2][4].

Group 2: Main Forum Highlights
- The main forum will position intelligent computing as a core engine of digital transformation across industries, with a keynote by Zhang Jianzhong, founder and CEO of Moore Threads, detailing the company's full-stack development strategy centered on MUSA [4].
- A new-generation GPU architecture will be unveiled, along with a comprehensive layout of product systems, core technologies, and industry solutions [4].
- The forum will also share practical applications and ecosystem progress in AI computing, graphics rendering, and scientific computing [4].

Group 3: Technical Sessions
- More than 20 technical sub-forums will cover key areas such as intelligent computing, graphics computing, AI infrastructure, and developer tools [6].
- A "Moore Academy" will be established to empower developers through systematic technical sharing, resource integration, and talent cultivation [6].

Group 4: Interactive Experience
- A 1,000-square-meter immersive "MUSA Carnival" will feature diverse themed exhibition areas covering cutting-edge technologies and popular application scenarios [8].
- The carnival will include interactive live demonstrations, letting attendees experience innovations in AI, digital twins, and more [8][11].

Group 5: Company Vision
- Moore Threads aims to provide accelerated computing infrastructure and one-stop solutions to support digital transformation across industries [26].
- The company aspires to become a leading GPU enterprise with international competitiveness, focusing on the integration of AI and digital-twin technologies [26].
Can ICLR 2026 Recover? 50 of 300 Scanned Submissions Contain Hallucinations, and Even Citations of example.com Passed Review
机器之心· 2025-12-08 10:11
机器之心 report. Editors: Du Wei, Panda. ICLR's troubles are not over yet. For ICLR 2026, it has lately been one wave after another. First, a third-party statistical analysis of review comments found that 21% of them were entirely AI-generated; then an OpenReview data exposure affected more than 10,000 ICLR 2026 submissions. Today, yet another embarrassment in ICLR 2026's reviewing has come to light. The AI-generated-content detection platform GPTZero scanned 300 submitted papers and found that 50 of them contained at least one clearly hallucinated citation. Some of the hallucinated citations are absurd to the point of disbelief, as if the submitters never checked their work at all. In the example below, shared on X by GPTZero CTO and co-founder Alex Cui, the citation link the submitters provided is the default placeholder example.com! In another example, the author list is just a string of capital letters. More worryingly, these submissions containing hallucinated content had already passed peer review by 3-5 domain experts, the vast majority of whom failed to spot the fake citations. This means that, absent outside intervention, these submissions could well be accepted by ICLR. Some submissions ...
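A failure mode as blatant as the example.com citation above is trivially machine-detectable. The sketch below is our own illustrative check (not GPTZero's detector): it scans a reference list for URLs pointing at placeholder domains.

```python
import re

# Hypothetical illustration, not GPTZero's method: flag references whose
# URLs use a placeholder domain such as example.com.
PLACEHOLDER_DOMAINS = {"example.com", "example.org", "example.net"}

def suspicious_citations(references):
    """Return the references whose URLs point at a placeholder domain."""
    flagged = []
    for ref in references:
        for host in re.findall(r"https?://([^/\s]+)", ref):
            if host.lower().removeprefix("www.") in PLACEHOLDER_DOMAINS:
                flagged.append(ref)
                break
    return flagged

refs = [
    "Smith et al. (2024). A real paper. https://arxiv.org/abs/2401.00001",
    "ABCDEFG (2025). A fabricated paper. https://example.com",
]
print(suspicious_citations(refs))  # flags only the fabricated entry
```

A real screening pipeline would go further (resolving DOIs, checking titles against bibliographic databases), but even this one-pass heuristic would have caught the case shared by Alex Cui.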
AAAI 2026 Oral | Are Phone Sensors Leaking Your Privacy? PATN Guards Privacy in Real Time
机器之心· 2025-12-08 10:11
Core Viewpoint
- The article discusses the development of a privacy-protection framework called PATN, which safeguards user privacy while preserving the utility of mobile sensor data, addressing the critical privacy risks of sensor data collection [2][3].

Group 1: Introduction to PATN
- PATN is a predictive adversarial transformation network designed to protect privacy in mobile sensor data by applying small perturbations that do not affect data semantics or temporal structure [3].
- The framework addresses real-time protection and temporal misalignment through two core technologies: a generative network that immediately predicts and applies future perturbations, and a history-aware top-k optimization strategy [3][10].

Group 2: Technical Challenges
- Two key challenges in existing privacy-protection methods are identified: real-time perturbation generation and the temporal misalignment between defense and attack [7][8].
- Real-time perturbation generation requires creating future perturbations instantaneously as data is produced, ensuring continuous privacy protection without waiting for complete sequences [7].
- Temporal misalignment requires perturbations to effectively cover target windows even when there is a time offset between attacks and defenses [8].

Group 3: Methodology of PATN
- PATN uses open-source privacy-inference models and their gradients to predict future perturbations from historical sensor data, balancing privacy protection with data fidelity [10].
- The training phase optimizes three types of losses: adversarial effectiveness, temporal robustness, and smoothness regularization [10].
- The perturbation range is strictly limited to 5% of each sensor dimension's mean or standard deviation, keeping the perturbations imperceptible to users [12].

Group 4: Performance Evaluation
- PATN was evaluated on two mobile sensor datasets, MotionSense and ChildShield, demonstrating superior real-time protection compared to traditional methods [15].
- In experiments, PATN achieved an Attack Success Rate (ASR) of 40.11% and an Equal Error Rate (EER) of 41.65% on MotionSense, significantly outperforming existing baseline methods [14][15].
- The framework maintains high data usability for downstream tasks such as behavior recognition and gait detection, so privacy protection does not compromise application performance [18].

Group 5: Future Directions
- Future work will extend PATN's applicability to black-box models and cover a broader range of sensitive attributes [19].
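The 5% imperceptibility budget described above is easy to picture as a per-dimension clipping step. This is an illustrative sketch under our own assumptions (array shapes and the choice of the standard deviation as the reference scale), not the authors' code:

```python
import numpy as np

# Sketch: constrain a perturbation so each sensor dimension stays within
# 5% of that dimension's standard deviation, as in PATN's budget.
def clip_perturbation(delta, sensor_data, ratio=0.05):
    """delta, sensor_data: arrays of shape (time, dims)."""
    bound = ratio * sensor_data.std(axis=0, keepdims=True)  # per-dimension budget
    return np.clip(delta, -bound, bound)

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))                # toy 3-axis sensor stream
delta = rng.normal(scale=0.5, size=x.shape)  # raw (too large) perturbation
delta_c = clip_perturbation(delta, x)        # now within the 5% budget
```

In PATN itself the perturbation is produced by a trained generator and this constraint is enforced during optimization; the clipping above only shows the budget's effect.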
Stanford's Hottest CS Course: Students Aren't Allowed to Write Code Themselves, They Must Use AI
机器之心· 2025-12-08 10:11
Core Insights
- Stanford University's new course "The Modern Software Developer" (CS146S) teaches software development without hand-writing code, emphasizing the use of AI tools like Cursor and Claude [2][5]
- The course has become immensely popular, with over 200 students on the waiting list, reflecting growing concern about navigating an AI-driven world [5]

Course Overview
- The course spans 10 weeks and is the first to concentrate on AI software principles and practices, combining practicality with engagement [8]
- Prerequisites include programming experience equivalent to CS111; completed coursework in advanced mathematics and machine learning is recommended [9]

Weekly Breakdown
- **Week 1**: Introduction to coding LLMs and AI development, covering LLM fundamentals and effective prompting techniques [10]
- **Week 2**: Internal structure of programming agents, including architecture and function-calling mechanisms [11]
- **Week 3**: AI-integrated development environments, emphasizing context management and code understanding [12]
- **Week 4**: Managing agent autonomy and human-agent collaboration [13]
- **Week 5**: Integrating AI with modern terminal capabilities, including command-line enhancements [14]
- **Week 6**: AI in testing and security, focusing on secure coding practices and automated test generation [14]
- **Week 7**: Evaluating AI code-system reliability and automated documentation generation [14]
- **Week 8**: Automation in UI and app building, enabling rapid prototyping [15]
- **Week 9**: Managing deployed AI systems, including monitoring and fault response [15]
- **Week 10**: Future directions in AI software engineering, exploring new coding paradigms and industry trends [15][16]

Instructor Background
- Mihail Eric, the course instructor, is an engineer and educator with experience in the Stanford NLP group and a focus on machine learning and software engineering practices [19][20]
From Minutes of Waiting to a 20x Speedup: LightX2V Rewrites the Speed Ceiling of AI Video Generation
机器之心· 2025-12-08 04:27
Core Viewpoint
- The LightX2V project has surged in popularity in the ComfyUI community, topping 1.7 million downloads in a single month and enabling creators to generate high-quality videos in real time on consumer-grade graphics cards [2][7].

Group 1: Technology and Performance
- LightX2V builds a comprehensive inference technology stack aimed at low-cost, highly real-time video generation, approaching 1:1 real-time generation [2][7].
- The project features a dual-core algorithm, Phased DMD step distillation and LightVAE, which together compress the video diffusion process from 40-50 steps to just 4 while maintaining temporal consistency and motion detail [10][11].
- LightVAE is designed for the dual demands of throughput and resolution in video generation, reducing encoding and decoding overhead while maintaining high visual quality [12].

Group 2: System Optimization
- After algorithmic compression, LightX2V applies a full-stack inference framework for further performance, running efficiently in both single-card and multi-card deployments [14][16].
- Key technologies include low-bit operators, sparse attention, and feature caching, which together bring memory requirements below 8GB, allowing entry-level consumer cards to run the system [21].

Group 3: Ecosystem and Applications
- LightX2V supports a range of mainstream video-generation models and is integrated with ComfyUI, letting users access accelerated inference through a familiar graphical interface [19][21].
- The project serves needs from individual creators to enterprise-level applications, covering functionality such as image-to-video and text-to-video generation [19][21].
- LightX2V is compatible with a variety of hardware, including both NVIDIA and domestic AI chips, facilitating localized and large-scale deployments [21].
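Of the optimizations listed above, feature caching is the easiest to illustrate: between adjacent diffusion steps, an expensive block's input often barely changes, so its output can be reused. The toy sketch below is our own simplification (class name, drift metric, and threshold are all assumptions, not LightX2V's API):

```python
import numpy as np

# Toy feature cache across diffusion steps: skip an expensive block when
# its input has barely drifted since the last real computation.
class CachedBlock:
    def __init__(self, fn, tol=0.05):
        self.fn, self.tol = fn, tol
        self.last_in = None
        self.last_out = None
        self.calls = 0                  # how often fn actually ran

    def __call__(self, x):
        if self.last_in is not None:
            drift = np.linalg.norm(x - self.last_in) / (np.linalg.norm(self.last_in) + 1e-8)
            if drift < self.tol:
                return self.last_out    # cache hit: reuse stale features
        self.calls += 1
        self.last_in, self.last_out = x.copy(), self.fn(x)
        return self.last_out

block = CachedBlock(lambda x: x * 2.0)  # stand-in for an expensive layer
x = np.ones(8)
for step in range(10):
    y = block(x + 1e-4 * step)          # inputs change only slightly per step
print(block.calls)                      # far fewer than 10 real calls
```

The trade-off is the one the article implies: a looser tolerance saves more compute but risks visible artifacts, which is why production systems pair caching with distillation rather than relying on it alone.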
The Evolution from DeepSeek V3 to V3.2, All in One Article
机器之心· 2025-12-08 04:27
Core Insights
- DeepSeek has released two new models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, which have generated significant interest and discussion in the AI community [2][5][11]
- The evolution from DeepSeek V3 to V3.2 includes various architectural improvements and new mechanisms aimed at enhancing performance and efficiency [10][131]

Release Timeline
- The initial release of DeepSeek V3 in December 2024 did not create immediate buzz, but the subsequent release of the DeepSeek R1 model changed the landscape, making DeepSeek a popular alternative to proprietary models from companies like OpenAI and Google [11][14]
- The release of DeepSeek V3.2-Exp in September 2025 was a preparatory step for the V3.2 model, focused on establishing the infrastructure needed for deployment [17][49]

Model Types
- DeepSeek V3 was initially launched as a base model, while DeepSeek R1 was developed as a specialized reasoning model through additional training [19][20]
- The industry trend has shifted from hybrid reasoning models to specialized models; DeepSeek appears to reverse this, moving from specialized (R1) back to hybrid models (V3.1 and V3.2) [25]

Evolution from V3 to V3.1
- DeepSeek V3 used a mixture-of-experts (MoE) architecture and multi-head latent attention (MLA) to optimize memory usage during inference [29][30]
- DeepSeek R1 relied on Reinforcement Learning with Verifiable Rewards (RLVR) to enhance reasoning capabilities, particularly on tasks permitting symbolic verification [37][38]

Sparse Attention Mechanism
- DeepSeek V3.2-Exp introduced a new sparse attention mechanism that significantly improves training and inference efficiency, especially in long-context scenarios [49][68]
- The DeepSeek Sparse Attention (DSA) mechanism lets the model selectively attend to relevant past tokens, reducing computational complexity from quadratic to linear [68]

Self-Verification and Self-Correction
- DeepSeekMath V2, released shortly before V3.2, introduced self-verification and self-correction techniques to improve accuracy on mathematical reasoning tasks [71][72]
- Self-verification uses a verifier model to assess the quality of generated proofs, while self-correction lets the model iteratively improve its outputs based on that feedback [78][92]

DeepSeek V3.2 Architecture
- DeepSeek V3.2 keeps the architecture of its predecessor, V3.2-Exp, while incorporating improvements aimed at better overall performance across tasks including mathematics and coding [107][110]
- The training process was refined with updates to the RLVR framework, integrating new reward mechanisms for different task types [115][116]

Performance Benchmarks
- DeepSeek V3.2 has shown competitive performance across benchmarks, achieving notable results on mathematical tasks and outperforming several proprietary models [127]
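The core idea behind DSA, each query attending only to a small set of selected past tokens, can be illustrated with a toy top-k variant of causal attention. This is our own simplified stand-in, not DeepSeek's kernel (which uses a learned lightning indexer to pick tokens and fused GPU kernels to make selection cheap):

```python
import numpy as np

# Toy sketch of selective attention: each query token attends only to its
# top_k highest-scoring earlier tokens instead of the full prefix.
def topk_causal_attention(q, k, v, top_k=4):
    T, d = q.shape
    out = np.zeros_like(v)
    for t in range(T):
        scores = q[t] @ k[: t + 1].T / np.sqrt(d)  # causal prefix scores
        keep = np.argsort(scores)[-top_k:]          # indices of the top_k tokens
        w = np.exp(scores[keep] - scores[keep].max())
        w /= w.sum()                                # softmax over the sparse set
        out[t] = w @ v[keep]
    return out

rng = np.random.default_rng(0)
q = rng.normal(size=(16, 8))
k = rng.normal(size=(16, 8))
v = rng.normal(size=(16, 8))
out = topk_causal_attention(q, k, v)
```

Once the selector itself is cheap, the per-token attention cost depends on top_k rather than on the full context length, which is where the quadratic-to-linear claim in the summary comes from.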
Viral "Toddler Scolding Puppy" Videos Watched by Millions Turn Out to Be AI-Generated | Tutorial Included
机器之心· 2025-12-07 04:33
Core Viewpoint
- The article discusses the rise of AI-generated videos of children interacting with pets, highlighting their emotional appeal and the technology behind their creation [15][20].

Group 1: AI Video Generation
- AI tools like Sora 2, Veo 3.1, and Kling Video 2.6 can produce highly realistic videos from carefully written prompts [10][12].
- Sora 2 shows significant improvements in physical realism, detail presentation, and audio synchronization over its previous version [12].

Group 2: Popularity and User Engagement
- Videos of children and pets have gone viral on social media, drawing likes in the thousands and views in the millions [7].
- Despite initial popularity, Sora's user retention is alarmingly low: only 10% of users remain on the first day, dropping to 1% by day 30 [21][23].

Group 3: User Behavior and Platform Dynamics
- New social applications often see a surge in initial activity, but many users leave once they have assessed the platform's long-term value [23].
- Sora's dual identity as both a creative tool and a social platform complicates retention, since most content is AI-generated rather than authentic social interaction [27][29].
DeepSeek or Gemini: Which Provides Better Emotional Support? Quwan and Peking University Deliver a Dynamic Emotional-Trajectory Evaluation
机器之心· 2025-12-07 04:33
Core Viewpoint
- The paper "Detecting Emotional Dynamic Trajectories: An Evaluation Framework for Emotional Support in Language Models," co-authored by Quwan Technology and Peking University, has been accepted to AAAI 2026, highlighting the importance of emotional support in human-AI interactions and the need for a new evaluation framework for language models [2][3].

Research Background
- Emotional support is a core capability in human-AI interactions, yet existing evaluations of large language models (LLMs) rely on short, static dialogues, failing to capture the dynamic, long-term nature of emotional support [5].
- Evaluating the emotional capabilities of LLMs is crucial for self-developed models, as emotional-support dialogue has evolved from emotion recognition and generation to broader human-centered tasks like role-playing and casual chat [5].

Proposed Framework
- The team introduced ETrajEval, a new evaluation framework designed to systematically assess LLMs' ability to provide emotional support in long-term dialogues [6].

Key Contributions
1. The framework addresses two main limitations of existing evaluation methods: the lack of long-term, dynamic interaction, and an overemphasis on model-centered response quality [8].
2. It adopts a user-centered perspective, focusing on users' emotional trajectories throughout the interaction process [9].
3. Three trajectory-level metrics were proposed: Average Emotional Level (BEL), Emotional Trajectory Variability (ETV), and Emotional Centroid Position (ECP), which together represent the dynamic changes in user emotional states [11].

Experimental Analysis
- The team constructed a dataset of 328 interaction environments and 1,152 disruptive events to simulate real emotional changes and assess model adaptability in evolving contexts [14].
- Psychological theories were used to constrain model responses, encouraging supportive behaviors aligned with validated therapeutic principles [14].
- The framework was validated through extensive assessments of leading models, revealing significant differences in their long-term emotional-support capabilities [15].

Findings
- Top open-source and closed-source models show no significant difference in overall emotional-support capability [16].
- Models designed for role-playing did not outperform general-purpose LLMs at maintaining positive emotional states [17].
- Models exhibited stronger long-term emotional support in English dialogues than in Chinese ones [17].

Visualization and Analysis
- Emotional-centroid visualizations revealed that models with higher BEL and ETV scores are strong at guiding users toward stable positive emotional states [21].
- Trajectory visualizations indicated that models with higher ETV scores effectively help users recover from low emotional states, confirming the team's earlier assertions [22].

Conclusion
- The proposed emotional dynamic trajectory framework offers a comprehensive, multidimensional evaluation of LLMs' emotional-support capabilities, achieving high consistency with human evaluations [28].
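To make the three trajectory-level metrics concrete, here is a sketch computing them from a per-turn emotion score sequence. These are plausible formalizations under our own assumptions; the paper's exact definitions of BEL, ETV, and ECP may differ:

```python
import numpy as np

# Sketch of the three trajectory-level metrics (our formalizations, not
# necessarily the paper's): mean level (BEL), variability of turn-to-turn
# change (ETV), and a time-weighted centroid of emotion mass (ECP).
def trajectory_metrics(levels):
    """levels: one positive user emotion score per dialogue turn."""
    levels = np.asarray(levels, dtype=float)
    bel = levels.mean()                                   # average emotional level
    etv = np.diff(levels).std() if len(levels) > 1 else 0.0
    t = np.arange(len(levels))
    ecp = float((t * levels).sum() / levels.sum())        # later emotion shifts ECP right
    return bel, etv, ecp

bel, etv, ecp = trajectory_metrics([1, 2, 3, 4])
```

Whatever the exact formulas, the point of trajectory-level metrics is the one the summary makes: a single response-quality score cannot distinguish a model that steadily lifts a user's mood from one that merely answers each turn well.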
The Mystery of LLM Reinforcement-Learning Instability, Unraveled by the Qwen Team from a "First-Order Approximation" Perspective
机器之心· 2025-12-07 04:33
Core Insights
- Reinforcement Learning (RL) has become a key technology paradigm for enhancing the complex reasoning and problem-solving capabilities of Large Language Models (LLMs) [2]
- The main challenge in RL for LLMs is the mismatch between sequence-level rewards and token-level optimization objectives, raising concerns about theoretical soundness and training stability [2][5]
- A new RL formulation proposed by Alibaba's Qwen team optimizes the expected sequence-level reward via a surrogate token-level objective treated as a first-order approximation [2][11]

Methodology
- The team models an autoregressive LLM as a policy π_θ and works with sequence-level rewards, assigning a scalar reward R(x, y) to the entire response y [6]
- They avoid value-function methods because constructing a general, scalable, and reliable value model is difficult [7]
- Directly optimizing the expected sequence-level reward is challenging due to numerical differences between training and inference [9]

Key Findings
- The team conducted extensive experiments using a 30-billion-parameter MoE model, consuming hundreds of thousands of GPU hours [4]
- On-policy training with importance-sampling correction achieved the highest training stability [10]
- In off-policy updates, both clipping and Routing Replay are essential for maintaining stability; removing either degrades performance [23]

Experimental Results
- The MiniRL algorithm, which incorporates importance sampling, demonstrated the best performance and stability during training [22]
- Removing the importance-sampling correction led to rapid collapse and a sharp decrease in entropy, confirming its critical role in the first-order approximation [22]
- Different cold-start initialization methods yielded similar final performance, suggesting the focus should be on the RL methods themselves rather than initialization details [27]
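The importance-sampling correction with clipping mentioned above can be sketched with a generic PPO-style surrogate. This is a textbook form under our own assumptions, not the Qwen team's exact objective: per-token probability ratios reweight data sampled from a slightly older policy so the update remains a valid first-order step for the current one.

```python
import numpy as np

# Generic clipped importance-sampling surrogate (PPO-style sketch, not the
# team's exact objective). logp_new/logp_old are per-token log-probs under
# the current and the sampling policy; advantage is the per-token signal
# (e.g. a sequence-level reward broadcast to every token).
def clipped_surrogate(logp_new, logp_old, advantage, eps=0.2):
    ratio = np.exp(logp_new - logp_old)               # token-level importance ratio
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped).sum()       # pessimistic lower bound

logp_old = np.array([-1.0, -2.0, -0.5])
logp_new = np.array([-0.9, -2.1, -0.5])               # small train/infer mismatch
adv = np.full(3, 1.5)                                 # sequence reward broadcast to tokens
loss_term = clipped_surrogate(logp_new, logp_old, adv)
```

Dropping the ratio (i.e. setting it to 1 regardless of the mismatch) is exactly the ablation the summary describes as collapsing: the surrogate then stops being a first-order approximation of the sequence-level objective.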
Two LLMs Spar Head-to-Head and Reasoning Takes Off: Cornell Team Releases a GAN-Like Training Method for Large Models
机器之心· 2025-12-07 02:52
Core Insights
- The article discusses PasoDoble, a new GAN-like training framework aimed at enhancing the reasoning capabilities of large language models (LLMs) through adversarial training without external supervision [3][41].

Group 1: PasoDoble Framework
- PasoDoble consists of two models: a Proposer, which generates challenging questions with reference answers, and a Solver, which attempts to solve those questions [3][9].
- During training, the Proposer generates question-answer pairs based on knowledge sampled from a knowledge base, while the Solver generates multiple answers for each question [9][10].
- The framework relies on no supervisory signal at any point in training, making it fully unsupervised [3][7].

Group 2: Performance Improvements
- PasoDoble yields significant gains on mathematical tasks: Qwen3-1.7B-Base improves by approximately 13 percentage points on average and Qwen3-4B-Base by about 16 [7][28].
- Gains are more pronounced at larger model sizes, demonstrating the scalability of the approach [28][41].

Group 3: Reward Mechanism
- The Proposer's reward mechanism encourages difficult and diverse questions, with rewards based on the difficulty and novelty of the questions generated [12][13].
- The Solver trains solely on correctness rewards, comparing each generated answer with the reference answer provided by the Proposer [22][23].
- The effectiveness of these reward mechanisms shows in the large performance gap between random rewards and PasoDoble's structured rewards [35][37].

Group 4: Experimental Results
- Detailed results across mathematical benchmarks show that PasoDoble significantly enhances model performance, particularly on competition math tasks [28][29].
- Models trained with PasoDoble consistently outperform baselines, with notable accuracy improvements across benchmarks [28][34].

Group 5: Future Directions
- Future research will extend PasoDoble beyond mathematics, e.g. to code generation and factual question answering, and investigate broader multi-model training paradigms [41].
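The two reward signals described above can be sketched schematically. This is our own simplification, not the authors' implementation (the novelty term is omitted, and the difficulty shaping is a guess): the Solver is rewarded for matching the Proposer's reference answer, while the Proposer earns most reward for questions that are hard but still solvable.

```python
# Schematic sketch of PasoDoble's two reward signals (our simplification).
def solver_rewards(answers, reference):
    """Correctness reward per sampled Solver answer."""
    return [1.0 if a == reference else 0.0 for a in answers]

def proposer_reward(answers, reference):
    """Reward the Proposer for hard-but-solvable questions; the novelty
    component described in the article is omitted here."""
    frac_correct = sum(solver_rewards(answers, reference)) / len(answers)
    if frac_correct in (0.0, 1.0):
        return 0.0              # unsolvable or trivial questions earn nothing
    return 1.0 - frac_correct   # harder (but solvable) questions earn more

answers = ["4", "4", "5", "6"]              # Solver samples several answers
r_solver = solver_rewards(answers, "4")     # [1.0, 1.0, 0.0, 0.0]
r_proposer = proposer_reward(answers, "4")  # 0.5
```

The zero reward at both extremes is what keeps the adversarial loop productive, mirroring the GAN analogy in the headline: the Proposer gains nothing from questions the Solver always fails or always solves, so the curriculum stays at the edge of the Solver's ability.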