AAAI 2026 Oral | Are phone sensors leaking your privacy? PATN guards privacy in real time
机器之心· 2025-12-08 10:11
Core Viewpoint
- The article discusses PATN, a privacy protection framework that safeguards user privacy while preserving the utility of mobile sensor data, addressing the critical privacy risks of sensor data collection [2][3].

Group 1: Introduction to PATN
- PATN is a predictive adversarial transformation network that protects privacy in mobile sensor data by applying small perturbations that do not affect data semantics or temporal structure [3].
- The framework tackles real-time protection and temporal misalignment through two core technologies: a generative network that instantly predicts and applies future perturbations, and a history-aware top-k optimization strategy [3][10].

Group 2: Technical Challenges
- Two key challenges in existing privacy protection methods are identified: real-time perturbation generation and the temporal misalignment between defense and attack [7][8].
- Real-time perturbation generation means creating future perturbations instantaneously as data is produced, ensuring continuous privacy protection without waiting for complete sequences [7].
- Temporal misalignment requires perturbations to effectively cover target windows even when there is a time offset between attacks and defenses [8].

Group 3: Methodology of PATN
- PATN uses open-source privacy inference models and their gradients to predict future perturbations from historical sensor data, balancing privacy protection with data fidelity [10].
- The training phase optimizes three types of losses: adversarial effectiveness, temporal robustness, and smoothness regularization [10].
- The perturbation range is strictly limited to 5% of the mean or standard deviation of each sensor dimension, keeping perturbations imperceptible to users [12].

Group 4: Performance Evaluation
- PATN was evaluated on two mobile sensor datasets, MotionSense and ChildShield, demonstrating superior real-time protection compared to traditional methods [15].
- In experiments, PATN achieved an Attack Success Rate (ASR) of 40.11% and an Equal Error Rate (EER) of 41.65% on the MotionSense dataset, significantly outperforming existing baselines [14][15].
- The framework maintains high data usability for downstream tasks such as behavior recognition and gait detection, so privacy protection does not compromise application performance [18].

Group 5: Future Directions
- Future work will extend PATN to black-box models and cover a broader range of sensitive attributes [19].
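The 5% per-dimension perturbation budget described above can be sketched as a simple clipping step. This is a minimal NumPy illustration, not PATN's actual code; the budget here is taken against the per-dimension standard deviation (the article says mean or standard deviation), and the raw perturbation is just random noise standing in for the output of PATN's generative network.

```python
import numpy as np

def clip_perturbation(delta, sensor_window, eps=0.05):
    """Clip a perturbation so each sensor dimension stays within
    eps (5%) of that dimension's standard deviation over the window."""
    budget = eps * sensor_window.std(axis=0)          # shape: (n_dims,)
    return np.clip(delta, -budget, budget)            # broadcast per dimension

# Toy 3-axis accelerometer window: 100 samples x 3 dimensions.
rng = np.random.default_rng(0)
window = rng.normal(0.0, 1.0, size=(100, 3))
raw_delta = rng.normal(0.0, 0.5, size=(100, 3))       # unconstrained perturbation
safe_delta = clip_perturbation(raw_delta, window)

# The protected stream is the raw signal plus the bounded perturbation.
protected = window + safe_delta
```

Because the budget is tied to each dimension's own statistics, noisy high-variance axes tolerate larger perturbations while quiet axes stay nearly untouched, which is what keeps the change imperceptible.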
Stanford's hottest CS course: students aren't allowed to hand-write code, they must use AI
机器之心· 2025-12-08 10:11
Core Insights
- Stanford University's new course "The Modern Software Developer" (CS146S) teaches software development without hand-writing code, emphasizing AI tools like Cursor and Claude [2][5].
- The course has become immensely popular, with over 200 students on the waiting list, reflecting growing concern about navigating an AI-driven world [5].

Course Overview
- The course spans 10 weeks and is the first to concentrate on AI software principles and practices, combining practicality with engagement [8].
- Prerequisites include programming experience equivalent to CS111, with completed coursework in advanced mathematics and machine learning recommended [9].

Weekly Breakdown
- **Week 1**: Introduction to coding LLMs and AI development, covering LLM fundamentals and effective prompting techniques [10].
- **Week 2**: Internal structure of programming agents, including architecture and function-calling mechanisms [11].
- **Week 3**: AI-integrated development environments, emphasizing context management and code understanding [12].
- **Week 4**: Managing agent autonomy and human-agent collaboration [13].
- **Week 5**: Integrating AI with modern terminal capabilities, including command-line enhancements [14].
- **Week 6**: Applying AI to testing and security, focusing on secure coding practices and automated test generation [14].
- **Week 7**: Evaluating the reliability of AI code systems and automated documentation generation [14].
- **Week 8**: Automating UI and app building, enabling rapid prototyping [15].
- **Week 9**: Managing deployed AI systems, including monitoring and fault response [15].
- **Week 10**: Future directions in AI software engineering, exploring new coding paradigms and industry trends [15][16].

Instructor Background
- Mihail Eric, the course instructor, is an engineer and educator with experience in the Stanford NLP group and a focus on machine learning and software engineering practices [19][20].
From minute-long waits to 20× speedups: LightX2V rewrites the speed ceiling for AI video generation
机器之心· 2025-12-08 04:27
Core Viewpoint
- The LightX2V project has surged in popularity in the ComfyUI community, topping 1.7 million downloads in a single month and enabling creators to generate high-quality videos in real time on consumer-grade graphics cards [2][7].

Group 1: Technology and Performance
- LightX2V uses a comprehensive inference technology stack aimed at low-cost, highly real-time video generation, achieving near 1:1 real-time generation [2][7].
- The project features a dual-core algorithm, Phased DMD step distillation and LightVAE, which together compress the video diffusion process from 40-50 steps to just 4 while preserving temporal consistency and motion details [10][11].
- LightVAE is designed for the dual demands of throughput and resolution in video generation, reducing encoding and decoding overhead while maintaining high visual quality [12].

Group 2: System Optimization
- After algorithmic compression, LightX2V applies a full-stack inference framework to boost performance, making it efficient for both single-card and multi-card deployments [14][16].
- Key technologies include low-bit operators, sparse attention, and feature caching, which together cut memory requirements to below 8GB, allowing entry-level consumer cards to run the system [21].

Group 3: Ecosystem and Applications
- LightX2V supports a range of mainstream video generation models and is integrated with ComfyUI, letting users access accelerated inference through a familiar graphical interface [19][21].
- The project serves needs ranging from individual creators to enterprise applications, enabling functions such as image-to-video and text-to-video generation [19][21].
- LightX2V is compatible with a variety of hardware, including NVIDIA and domestic Chinese AI chips, facilitating localized and large-scale deployments [21].
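The latency effect of step distillation can be sketched with a bare iterative denoising loop whose step count drops from ~50 to 4. This is purely schematic: `toy_step` is a hypothetical stand-in for a real diffusion denoiser, and the point is only that wall-clock cost scales with the number of model calls.

```python
def generate(latent, denoise_step, num_steps):
    """Run a generic iterative denoising loop for num_steps steps."""
    for t in reversed(range(num_steps)):
        latent = denoise_step(latent, t)
    return latent

# A toy "denoiser" that records how many times it is called.
calls = []
def toy_step(x, t):
    calls.append(t)
    return x * 0.9

baseline = generate(1.0, toy_step, num_steps=50)   # undistilled: 50 model calls
calls.clear()
distilled = generate(1.0, toy_step, num_steps=4)   # distilled: 4 model calls
# With per-step cost roughly constant, 50 -> 4 steps alone gives ~12.5x
# fewer model calls, before kernel- or memory-level optimizations add more.
```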
The road from DeepSeek V3 to V3.2, all in one article
机器之心· 2025-12-08 04:27
Core Insights
- DeepSeek has released two new models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, generating significant interest and discussion in the AI community [2][5][11].
- The evolution from DeepSeek V3 to V3.2 includes various architectural improvements and new mechanisms aimed at enhancing performance and efficiency [10][131].

Release Timeline
- The initial release of DeepSeek V3 in December 2024 did not create immediate buzz, but the subsequent release of the DeepSeek R1 model changed the landscape, making DeepSeek a popular alternative to proprietary models from companies like OpenAI and Google [11][14].
- The release of DeepSeek V3.2-Exp in September 2025 served as a preparatory step for V3.2, establishing the infrastructure needed for deployment [17][49].

Model Types
- DeepSeek V3 launched as a base model, while DeepSeek R1 was developed into a specialized reasoning model through additional training [19][20].
- The industry has shifted from hybrid reasoning models toward specialized ones; DeepSeek appears to be reversing this trend, moving from a specialized model (R1) to hybrid models (V3.1 and V3.2) [25].

Evolution from V3 to V3.1
- DeepSeek V3 used a mixture-of-experts (MoE) architecture and multi-head latent attention (MLA) to optimize memory usage during inference [29][30].
- DeepSeek R1 relied on Reinforcement Learning with Verifiable Rewards (RLVR) to strengthen reasoning, particularly on tasks that admit symbolic verification [37][38].

Sparse Attention Mechanism
- DeepSeek V3.2-Exp introduced a sparse attention mechanism that significantly improves training and inference efficiency, especially in long-context scenarios [49][68].
- DeepSeek Sparse Attention (DSA) lets the model selectively attend to relevant past tokens, reducing computational complexity from quadratic to linear in context length [68].

Self-Verification and Self-Correction
- DeepSeekMath V2, released shortly before V3.2, introduced self-verification and self-correction techniques to improve accuracy on mathematical reasoning tasks [71][72].
- Self-verification uses a verifier model to assess the quality of generated proofs, while self-correction lets the model iteratively improve its outputs based on that feedback [78][92].

DeepSeek V3.2 Architecture
- DeepSeek V3.2 keeps the architecture of its predecessor, V3.2-Exp, while adding refinements aimed at improving performance across tasks including mathematics and coding [107][110].
- The training process has been refined with updates to the RLVR framework, integrating new reward mechanisms for different task types [115][116].

Performance Benchmarks
- DeepSeek V3.2 shows competitive results across benchmarks, with notable performance on mathematical tasks, outperforming several proprietary models [127].
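The top-k selection behind DSA-style sparse attention can be illustrated with a minimal NumPy sketch: a cheap "indexer" scores all past tokens, and full attention is computed only over the k highest-scoring ones. This shows the general top-k sparse-attention pattern, not DeepSeek's actual implementation; the dot-product indexer here is an assumed stand-in.

```python
import numpy as np

def topk_sparse_attention(q, keys, values, index_scores, k):
    """Attend only to the k past tokens with the highest indexer scores.

    q: (d,) query; keys/values: (n, d); index_scores: (n,) cheap
    relevance scores from a lightweight indexer."""
    top = np.argsort(index_scores)[-k:]               # indices of the k best tokens
    logits = keys[top] @ q / np.sqrt(q.shape[0])      # full attention over k tokens only
    weights = np.exp(logits - logits.max())           # numerically stable softmax
    weights /= weights.sum()
    return weights @ values[top]

rng = np.random.default_rng(1)
n, d, k = 1024, 64, 32
q = rng.normal(size=d)
keys = rng.normal(size=(n, d))
values = rng.normal(size=(n, d))
scores = keys @ q                                     # stand-in for a learned indexer
out = topk_sparse_attention(q, keys, values, scores, k)
# Per query the attention cost is O(k*d) instead of O(n*d): with k fixed,
# total cost over a sequence grows linearly with context length n
# rather than quadratically.
```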
The "toddler lecturing a puppy" videos watched by millions went viral, and they're all AI-generated | tutorial included
机器之心· 2025-12-07 04:33
Core Viewpoint
- The article discusses the rise of AI-generated videos of interactions between children and pets, highlighting their emotional appeal and the technology behind their creation [15][20].

Group 1: AI Video Generation
- AI tools like Sora2, Veo3.1, and Keling Video 2.6 can produce highly realistic videos from specific prompts [10][12].
- Sora2 shows significant improvements in physical realism, detail rendering, and audio synchronization over its predecessor [12].

Group 2: Popularity and User Engagement
- Videos of children and pets have gone viral on social media, drawing likes in the thousands and views in the millions [7].
- Despite its initial popularity, Sora's user retention is alarmingly low: only 10% of users remain after the first day, dropping to 1% by day 30 [21][23].

Group 3: User Behavior and Platform Dynamics
- New social applications often see a surge of initial activity, but many users leave once they have assessed the platform's long-term value [23].
- Sora's dual identity as both a creative tool and a social platform complicates retention, since most content is AI-generated rather than authentic social interaction [27][29].
DeepSeek or Gemini: who provides better emotional support? Quwan × Peking University deliver a dynamic emotional-trajectory evaluation
机器之心· 2025-12-07 04:33
Core Viewpoint
- The paper "Detecting Emotional Dynamic Trajectories: An Evaluation Framework for Emotional Support in Language Models", co-authored by Quwan Technology and Peking University, has been accepted to AAAI 2026, highlighting the importance of emotional support in human-AI interaction and the need for a new evaluation framework for language models [2][3].

Research Background
- Emotional support is a core capability in human-AI interaction, yet existing evaluations of large language models (LLMs) rely on short, static dialogues and fail to capture the dynamic, long-term nature of emotional support [5].
- Evaluating emotional capabilities is crucial for in-house models, as emotional support dialogue has expanded from emotion recognition and generation to broader human-centered tasks such as role-playing and casual chat [5].

Proposed Framework
- The team introduces ETrajEval, an evaluation framework designed to systematically assess LLMs' ability to provide emotional support over long-term dialogues [6].

Key Contributions
1. The framework addresses two main limitations of existing evaluation methods: the lack of long-term, dynamic interaction, and an overemphasis on model-centered response quality [8].
2. It adopts a user-centered perspective, focusing on users' emotional trajectories throughout the interaction [9].
3. Three trajectory-level metrics are proposed: Average Emotional Level (BEL), Emotional Trajectory Variability (ETV), and Emotional Centroid Position (ECP), which together capture the dynamic changes in user emotional states [11].

Experimental Analysis
- The team constructed a dataset of 328 interaction environments and 1,152 disruptive events to simulate realistic emotional shifts and assess model adaptability in evolving contexts [14].
- Psychological theories were used to constrain model responses, encouraging supportive behaviors aligned with validated therapeutic principles [14].
- The framework was validated through extensive assessments of leading models, revealing significant differences in their long-term emotional support capabilities [15].

Findings
- Top open-source and closed-source models show no significant difference in overall emotional support capability [16].
- Models designed for role-playing did not outperform general-purpose LLMs at maintaining positive emotional states [17].
- Models showed stronger long-term emotional support in English dialogues than in Chinese dialogues [17].

Visualization and Analysis
- Emotional centroid visualizations show that models with higher BEL and ETV scores are strong at guiding users toward stable positive emotional states [21].
- Emotional trajectory visualizations indicate that models with higher ETV scores effectively help users recover from low emotional states, confirming the team's earlier assertions [22].

Conclusion
- The proposed emotional dynamic trajectory analysis framework offers a comprehensive, multidimensional evaluation of LLMs' emotional support capabilities and achieves high consistency with human evaluations [28].
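Trajectory-level metrics of this kind can be sketched from a per-turn emotion-score sequence. The exact definitions of BEL, ETV, and ECP belong to the paper; the formulas below (mean level, variability as standard deviation, and a time-weighted centroid) are plausible stand-ins for illustration only, as is the assumed [-1, 1] score scale.

```python
import statistics

def trajectory_metrics(emotions):
    """Summarize a per-turn emotion-score trajectory.

    emotions: list of scalar scores, one per user turn, assumed in
    [-1, 1] with positive values meaning better mood (illustrative scale).
    """
    n = len(emotions)
    bel = statistics.fmean(emotions)     # average emotional level
    etv = statistics.pstdev(emotions)    # trajectory variability
    # Illustrative "centroid": a time-weighted mean where later turns
    # weigh more, so late-dialogue recoveries raise the score.
    weights = [(i + 1) / n for i in range(n)]
    ecp = sum(w * e for w, e in zip(weights, emotions)) / sum(weights)
    return {"BEL": bel, "ETV": etv, "ECP": ecp}

# A user who starts low and steadily recovers:
recovering = [-0.8, -0.4, 0.0, 0.4, 0.8]
m = trajectory_metrics(recovering)
# BEL is 0 for this symmetric trajectory, yet ECP is positive because
# the later, happier turns carry more weight -- the trajectory view
# distinguishes recovery from a flat neutral mood.
```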
The mystery of unstable LLM reinforcement learning, solved by the Qwen team from a "first-order approximation" perspective
机器之心· 2025-12-07 04:33
Core Insights
- Reinforcement Learning (RL) has become a key technology paradigm for enhancing the complex reasoning and problem-solving capabilities of Large Language Models (LLMs) [2].
- The main challenge in RL for LLMs is the mismatch between sequence-level rewards and token-level optimization objectives, raising concerns about theoretical soundness and training stability [2][5].
- A new RL formulation proposed by Alibaba's Qwen team optimizes the expected sequence-level reward through a surrogate token-level objective that serves as its first-order approximation [2][11].

Methodology
- The team defines an autoregressive LLM by a policy π_θ and focuses on sequence-level rewards, where a scalar reward R(x, y) is assigned to the entire response y [6].
- Value-function methods are avoided because constructing a general, scalable, and reliable value model is difficult [7].
- Directly optimizing the expected sequence-level reward is challenging due to numerical differences between training and inference [9].

Key Findings
- The team conducted extensive experiments on a 30-billion-parameter MoE model, consuming hundreds of thousands of GPU hours [4].
- On-policy training with importance-sampling correction achieved the highest training stability [10].
- In off-policy updates, both clipping and Routing Replay are essential for maintaining training stability; removing either degrades performance [23].

Experimental Results
- The MiniRL algorithm, which incorporates importance sampling, showed the best performance and stability during training [22].
- Removing the importance-sampling correction led to rapid collapse and a sharp drop in entropy, confirming its critical role in the first-order approximation [22].
- Different cold-start initialization methods yielded similar final performance, indicating that the focus should be on the RL methods themselves rather than initialization details [27].
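The combination of importance-sampling correction and clipping described above follows the familiar PPO-style pattern; here is a minimal per-token sketch of that pattern (illustrative only, not the team's actual objective, and the sequence-level reward is assumed to be broadcast to tokens as an advantage-like signal).

```python
import math

def clipped_is_loss(logp_new, logp_old, advantage, eps=0.2):
    """Per-token surrogate loss with importance-sampling correction.

    logp_new/logp_old: log-probs of the sampled token under the current
    and behavior policies; advantage: advantage-style scalar derived
    from the sequence-level reward (assumed broadcasting scheme)."""
    ratio = math.exp(logp_new - logp_old)         # importance weight
    clipped = max(min(ratio, 1 + eps), 1 - eps)   # keep updates in a trust region
    # Pessimistic PPO-style objective: take the worse of the two terms,
    # so large off-policy ratios cannot produce unbounded updates.
    return -min(ratio * advantage, clipped * advantage)

# When the two policies agree (ratio = 1), the loss is just -advantage:
same_policy = clipped_is_loss(-1.0, -1.0, 0.5)
# A strongly off-policy token (ratio = e^2.5 ~ 12.2) gets clipped to 1.2,
# which is the mechanism that prevents the collapse seen when the
# correction is removed:
off_policy = clipped_is_loss(-0.5, -3.0, 0.5)
```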
Two LLMs face off and reasoning takes flight: Cornell team releases a GAN-like training method for large models
机器之心· 2025-12-07 02:52
Core Insights
- The article discusses PasoDoble, a new GAN-like training framework that enhances the reasoning capabilities of large language models (LLMs) through adversarial training without external supervision [3][41].

Group 1: PasoDoble Framework
- PasoDoble consists of two models: a Proposer, which generates challenging questions with reference answers, and a Solver, which attempts to solve them [3][9].
- During training, the Proposer generates question-answer pairs from knowledge sampled out of a knowledge base, while the Solver generates multiple answers for each question [9][10].
- The framework relies on no supervisory signal at any point in training, making it fully unsupervised [3][7].

Group 2: Performance Improvements
- PasoDoble yields significant gains on mathematical tasks: Qwen3-1.7B-Base improves by roughly 13 percentage points on average and Qwen3-4B-Base by about 16 percentage points [7][28].
- Results across models show the gains grow with model size, demonstrating the scalability of the approach [28][41].

Group 3: Reward Mechanism
- The Proposer's reward encourages generating difficult and diverse questions, based on the difficulty and novelty of the questions produced [12][13].
- The Solver is trained solely on correctness rewards: each generated answer is compared against the reference answer provided by the Proposer [22][23].
- The effectiveness of these reward mechanisms is highlighted by the large performance gap between random rewards and the structured rewards of the PasoDoble framework [35][37].

Group 4: Experimental Results
- Detailed results across mathematical benchmarks show that PasoDoble significantly enhances model performance, particularly on competition math tasks [28][29].
- Models trained with PasoDoble consistently outperform baselines, with notable accuracy improvements across benchmarks [28][34].

Group 5: Future Directions
- Future research will extend PasoDoble beyond mathematics to domains such as code generation and factual question answering, and explore broader multi-model training paradigms [41].
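The Solver's correctness reward can be sketched as a simple match check of sampled answers against the Proposer's reference answer. This is a minimal illustration: real systems normalize answers far more carefully, and the difficulty-reward shape for the Proposer below (peaking at a mid-range pass rate, zero for trivially easy or unanswerable questions) is an assumed stand-in for the paper's actual scheme.

```python
def solver_rewards(sampled_answers, reference):
    """1.0 for each answer matching the Proposer's reference, else 0.0."""
    norm = lambda s: s.strip().lower()   # crude normalization (illustrative)
    return [1.0 if norm(a) == norm(reference) else 0.0 for a in sampled_answers]

def proposer_difficulty_reward(rewards):
    """Illustrative Proposer reward: highest when the Solver's pass rate
    is mid-range, i.e. the question is hard but not impossible."""
    pass_rate = sum(rewards) / len(rewards)
    if pass_rate in (0.0, 1.0):          # trivially easy or unanswerable
        return 0.0
    return 1.0 - abs(pass_rate - 0.5) * 2

answers = ["42", "41", "42 ", "7"]       # four Solver samples for one question
r = solver_rewards(answers, "42")        # [1.0, 0.0, 1.0, 0.0]
d = proposer_difficulty_reward(r)        # pass rate 0.5 -> maximal difficulty reward
```

Tying the Proposer's reward to the Solver's pass rate is what creates the adversarial curriculum: as the Solver improves, previously hard questions become easy and stop paying off, pushing the Proposer toward harder ones.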
The top figure behind the M-series chips is preparing to leave; Apple's executive exodus is spiraling out of control
机器之心· 2025-12-07 02:52
机器之心 Report

In recent days, the turmoil over Apple's executive changes has been nonstop.

On December 1, John Giannandrea, Apple's senior vice president responsible for machine learning and AI strategy, officially announced his retirement. He will serve as a company advisor until his formal retirement, expected in spring 2026.

At the same time, Apple announced that prominent AI researcher Amar Subramanya has joined the company as vice president of AI, reporting to senior vice president Craig Federighi.

This marks a shift in how Apple positions AI: it is no longer a standalone division reporting directly to Cook, but now sits under software engineering.

According to reports, a large part of the reason Dye is leaving is disappointment with Apple's slow progress in AI.

As is well known, the main pillars keeping Apple competitive are its outstanding industrial and aesthetic design and its industry-leading chip design. Unfortunately, Apple's design chief has already decided to leave, and the "father of Apple silicon" is now also considering leaving.

Bloomberg reports that Johny Srouji, Apple's senior vice president of hardware technologies, has told Tim Cook that he is "seriously considering" leaving Apple for another company in the near future.

As early as January this year ...
More non-consensus: can Test-time Scaling keep working miracles through brute force?
机器之心· 2025-12-07 01:30
What non-consensus views surround Test-time Scaling? What are the limits of the popular Sequential and Parallel routes? Why does Test-time Scaling need "Better Search"? How does "temperature" affect scaling results? Which "Where" dimensions of Test-time Scaling need improvement? ...

机器之心PRO · Member Newsletter, Week 49
--- This week we unpack ③ noteworthy AI & Robotics industry stories ---

1. More non-consensus: can Test-time Scaling keep working miracles through brute force?

2. Skills vs MCP: which is the "HTTP moment" for large models?
A year on, does the community still dispute how MCP should be positioned? With an average of 25 users per developer, is MCP today mostly developers entertaining themselves? True to its name, are Skills here to kill MCP? Is what MCP can do but Skills cannot of little practical use for now? Does MCP's large-scale adoption depend on the arrival of the next "WeChat Mini Program"-style entry point? ...

3. From rejecting single-model AGI to responding to the open-source challenge: how is OpenAI building the "strongest platform"? Why, in the face of commercial reality, has the once-gospel "single-model AGI" completely ...