Burning Through 3 Billion Tokens in 4 Months: A "Rookie" Programmer Ships 50+ Products as 3.6 Million People Look On
机器之心· 2026-01-03 04:13
Core Insights
- The article discusses the evolution of programming in the age of AI, arguing that coding is no longer a tedious process but an engaging experience in which individuals collaborate with AI to build the projects they want [2][7].

Group 1: New Paradigm of Programming
- The traditional view of programming as a skill requiring deep technical knowledge is being challenged by the rise of AI, which lets individuals code without extensive prior experience [2][6].
- Ben Tossell, a developer with limited coding skills, has used AI to process 3 billion tokens, demonstrating that the ability to navigate systems now matters more than traditional coding skills [3][7].
- The concept of "vibe coding" is introduced: programming can be approached with a more intuitive mindset, in the spirit of the no-code movement [6][32].

Group 2: Practical Applications and Projects
- Tossell has redesigned his personal website and completed around 50 projects using AI, showcasing the practical benefits of this new coding paradigm [10][11].
- He works primarily through a command-line interface (CLI), which he finds more efficient than graphical interfaces because it gives a clearer view of the coding process [13][20].
- Tossell emphasizes end-to-end testing in his projects to catch bugs early, signaling a shift toward quality assurance in the development process [18].

Group 3: Learning and Adaptation
- Tossell's approach to learning programming has shifted from traditional methods to systems thinking, letting him understand the components of a project more effectively [27].
- The article highlights the importance of asking fundamental questions to deepen understanding, which has helped Tossell break down barriers in his coding journey [30][25].
- The flexibility of AI tools enables rapid prototyping and iteration, allowing developers to experiment without significant emotional or financial investment [34][35].

Group 4: Future of Programming
- The article posits that the future of programming lies in collaboration with AI, where individuals focus on providing context and prompts rather than mastering complex syntax [24][36].
- Tossell believes anyone with a desire to explore technology can program, as the barriers to entry have been dramatically lowered [36].
- The rapid feedback loop that AI tools provide is expected to produce an explosion of innovative projects, transforming the software development landscape [35].
WeChat Forges a Diffusion Language Model: 3x Faster than vLLM-Deployed AR Models, over 10x in Low-Entropy Scenarios
机器之心· 2026-01-03 04:13
The crux of the problem: most diffusion language models use bidirectional attention, which is incompatible with the standard KV-cache mechanism, so the advantage of parallel prediction never translates into real speed gains (a minimal illustration of why causal attention is cache-friendly follows below).

Recently, Tencent's WeChat AI team proposed WeDLM (WeChat Diffusion Language Model), the first diffusion language model whose inference speed surpasses a comparable AR model under the optimized conditions of an industrial-grade inference engine (vLLM). By performing diffusion-style decoding under standard causal attention, WeDLM achieves more than a 3x speedup over vLLM-deployed AR models on tasks such as mathematical reasoning, and over 10x in low-entropy scenarios, while maintaining or even improving generation quality.

Introduction

Autoregressive (AR) generation is the mainstream decoding paradigm of today's large language models, but its token-by-token nature limits inference efficiency. Diffusion language models (Diffusion LLMs) offer an alternative by recovering multiple mask tokens in parallel; in practice, however, existing diffusion models struggle to beat highly optimized AR inference engines (such as vLLM) on speed.

Paper title: WeDLM: Reconciling Diffusion Language Models ...
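To make the KV-cache point concrete, here is a minimal, generic sketch (my own toy construction, not code from the WeDLM paper): under a causal mask, the attention output at each position depends only on earlier keys and values, so decoding one token at a time against a cache reproduces full-sequence attention exactly. With bidirectional attention, every new token would change the outputs of all earlier positions, invalidating the cache.

```python
# Minimal NumPy sketch (generic; not from the WeDLM paper) of why causal
# attention is KV-cache friendly: earlier positions never attend to later
# tokens, so cached keys/values stay valid as decoding proceeds.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def causal_attention(q, k, v):
    """Full-sequence causal attention: position i sees keys 0..i only."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores[np.triu(np.ones_like(scores, dtype=bool), k=1)] = -np.inf
    return softmax(scores) @ v

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(5, 8)) for _ in range(3))

# Incremental decoding with a KV cache: one query at a time against the
# cached prefix -- exactly the access pattern engines like vLLM optimize.
outs = []
for t in range(5):
    scores = q[t : t + 1] @ k[: t + 1].T / np.sqrt(8)
    outs.append(softmax(scores) @ v[: t + 1])

assert np.allclose(causal_attention(q, k, v), np.vstack(outs))
print("incremental decoding with a cache matches full causal attention")
```

The assertion holds because masking makes row t of the full computation identical to the t-th cached step; a bidirectional model has no such invariant, which is the incompatibility the paragraph above describes.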
LeCun Still Has Papers Coming Out of Meta: An "Ultimate Guide" to Physical Planning with JEPA
机器之心· 2026-01-03 04:13
Editor | Panda

The AI field has long harbored a grand dream: creating agents that understand the physical world as intuitively as humans do and that handle never-before-seen tasks and environments with ease.

Traditional reinforcement learning methods are often clumsy, needing countless rounds of trial and error and massive samples to learn even a little, which is disastrous in real-world settings where reward signals are sparse.

To break this deadlock, researchers proposed the "world model": let the agent build a physics simulator in its head and rehearse by predicting future states.

In recent years, generative models that render gorgeous pixels have proliferated, but for physical planning, dwelling on irrelevant details (such as the drift of background smoke) is often inefficient. The real challenge is extracting the abstract essence from tangled raw visual input.

This brings us to the protagonist of this work: JEPA-WM (Joint-Embedding Predictive World Model), whose core idea is sketched below.

As the name suggests, the model is closely related to Yann LeCun's JEPA (Joint-Embedding Predictive Architecture). Indeed it is, and LeCun himself is one of the paper's authors. More interestingly, his listed affiliation in this paper is Meta FAIR. Could this be his last paper at Meta?

| Adrien Bardes |
| --- |
| Met ... |
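For readers unfamiliar with the joint-embedding predictive idea, here is a rough, generic sketch (a toy under my own assumptions, not the JEPA-WM architecture or training code): an encoder maps observations into an abstract embedding space, a predictor forecasts the next embedding, and the loss lives in embedding space against a stop-gradient target, so the model never has to reconstruct pixels such as that background smoke. Real JEPA-style systems add further machinery (e.g., EMA target encoders and action conditioning) to prevent representation collapse; the stop-gradient here only gestures at that.

```python
# Generic joint-embedding predictive sketch (PyTorch; a toy under my own
# assumptions, not the JEPA-WM architecture or training code).
import torch
import torch.nn as nn

class TinyJEPA(nn.Module):
    def __init__(self, obs_dim=64, emb_dim=32):
        super().__init__()
        # Encoder maps raw observations into an abstract embedding space.
        self.encoder = nn.Sequential(nn.Linear(obs_dim, emb_dim), nn.GELU())
        # Predictor forecasts the next state's embedding; a real world
        # model would also condition on the action taken.
        self.predictor = nn.Sequential(
            nn.Linear(emb_dim, emb_dim), nn.GELU(), nn.Linear(emb_dim, emb_dim)
        )

    def loss(self, obs_t, obs_next):
        z_t = self.encoder(obs_t)
        with torch.no_grad():                # stop-gradient target: the loss
            z_next = self.encoder(obs_next)  # is computed on embeddings,
        return nn.functional.mse_loss(self.predictor(z_t), z_next)  # not pixels

model = TinyJEPA()
obs_t, obs_next = torch.randn(16, 64), torch.randn(16, 64)
print(model.loss(obs_t, obs_next).item())    # loss for one toy batch
```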
Terence Tao: AI Is Ushering Mathematics into an "Industrial" Era, and Mathematicians Can Be "General Contractors" Too
机器之心· 2026-01-03 01:35
Core Insights
- The article discusses the transformation of mathematical research driven by AI and formal proof languages such as Lean, moving away from traditional methods toward a more industrialized approach [1][2].

Group 1: Transformation of Mathematical Research
- Terence Tao argues that traditional mathematical research faces a paradigm shift as AI and formal proof systems reduce repetitive work and enhance collaboration [2][5].
- Large language models (LLMs) and automated formalization are making tedious tasks easier, freeing mathematicians to focus on harder problems [2][9].
- The modularization of research is expected to let non-experts, or "citizen mathematicians," contribute to advanced research, accelerating progress in the field [2][29].

Group 2: Changes in Collaboration and Roles
- The future of mathematics may resemble software engineering, with roles such as "architects" or project managers emerging to oversee large collaborative projects [2][23].
- Tao emphasizes collaboration, noting that the traditional model of solitary research is insufficient for the complexity of modern mathematical problems [25][26].
- Formal tools and AI are expected to enable seamless collaboration among people with varying skill sets, allowing a more efficient division of labor in mathematical research [27][28].

Group 3: Impact of Formalization on Mathematical Thinking
- Formalization is changing how mathematicians think, helping them surface implicit assumptions and refine definitions, which leads to clearer and more concise writing (a toy Lean example follows below) [10][12].
- Formalization encourages a new style of proof writing that is more modular and easier to follow, in contrast to traditional linear proofs [12][13].
- Tao notes that formalization allows a more precise understanding of where mathematical tools apply, potentially leading to breakthroughs in multiple areas [15][16].

Group 4: Future of Mathematical Research
- The article predicts a future in which mathematicians' roles expand to include project management and coordination of large-scale research efforts, rather than solely individual contributions [29][30].
- As tools and collaboration methods evolve, the barriers to entry for mathematical research are expected to fall, letting a broader range of people engage with the field [30][31].
- AI's potential to handle repetitive tasks in mathematical research is seen as a way to unlock new levels of productivity and creativity among mathematicians [32][34].
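As a flavor of what formalization looks like in practice, here is a toy Lean 4 example (illustrative only, unrelated to any of Tao's projects): even a one-line lemma must state its hypotheses explicitly, and the kernel checks every step, which is exactly what makes proofs modular and safely composable by contributors who have never met.

```lean
-- Toy Lean 4 example (illustrative; not from any project discussed above).
-- Every hypothesis is explicit, and the kernel verifies each step, which
-- is what lets strangers safely build on one another's lemmas.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```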
Sebastian Raschka's 10,000-Word Year-End Review: 2025, the Year of "Reasoning Models"
机器之心· 2026-01-02 09:30
Core Insights
- The AI field continues to evolve rapidly, with significant advances in reasoning models and algorithms such as RLVR and GRPO, marking 2025 as a pivotal year for large language models (LLMs) [1][4][19]
- DeepSeek R1's introduction shifted the focus from merely stacking parameters to enhancing reasoning capabilities, demonstrating that high-performance models can be built at a fraction of previously estimated costs [9][10][12]
- The importance of human-AI collaboration is emphasized, with reflection on the boundaries of that partnership and AI's evolving role across tasks [1][4][66]

Group 1: Reasoning Models and Algorithms
- 2025 has been characterized as a "year of reasoning," with the RLVR and GRPO algorithms gaining prominence in LLM development (a simplified GRPO sketch follows below) [5][19]
- DeepSeek R1's release showed that reasoning behavior can be developed through reinforcement learning, improving the accuracy of model outputs [6][19]
- The estimated training cost of the DeepSeek R1 model, around $5.576 million, is far lower than previous assumptions, resetting cost expectations for advanced model training [10][12]

Group 2: Focus Areas in LLM Development
- The key focus areas of LLM development have shifted year over year, with 2025 emphasizing RLVR and GRPO after earlier years' focus on RLHF and LoRA techniques [20][22][24]
- A trend of "benchmaxxing" has emerged: benchmark scores are overemphasized at the expense of LLMs' real-world applicability [60][63]
- Integrating tools into LLM training has improved performance, allowing models to access external information and reduce hallucination rates [54][56]

Group 3: Architectural Trends
- LLM architectures are converging on mixture-of-experts (MoE) layers and efficient attention mechanisms, pointing toward more scalable and efficient models [43][53]
- Despite these advances, the traditional Transformer architecture remains dominant, with progress coming from efficiency improvements and engineering adjustments [43][53]

Group 4: Future Directions
- Future work is expected to expand RLVR beyond mathematics and coding, incorporating reasoning evaluation into training signals [25][27]
- Continual learning is anticipated to gain traction, tackling challenges such as catastrophic forgetting while improving model adaptability [31][32]
- Domain-specific data is highlighted as critical for LLMs to gain a foothold in individual industries, with proprietary data a major concern for companies [85][88]
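To ground the GRPO mention, here is a simplified sketch of its group-relative advantage (my own minimal rendering; the full algorithm wraps this in a clipped, KL-regularized PPO-style objective): several completions are sampled per prompt, scored by a verifier (the RLVR setting), and each completion's advantage is its reward normalized within its own group, which removes the need for a learned value network.

```python
# Simplified sketch of GRPO's group-relative advantage (my rendering;
# the full algorithm wraps this in a clipped, KL-regularized objective).
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6):
    """rewards: (num_prompts, samples_per_prompt) -- one row per prompt,
    one column per sampled completion, scored by a verifier (RLVR)."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)   # no learned value network

# Four completions of one math prompt; only the second passed the checker.
rewards = torch.tensor([[0.0, 1.0, 0.0, 0.0]])
print(group_relative_advantages(rewards))   # ~ [[-0.5, 1.5, -0.5, -0.5]]
```

In the example, only the completion that passed the verifier receives a positive advantage; the others are pushed down relative to the group mean.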
KAN Author Ziming Liu: AI Is Still Waiting for Its "Newton"
机器之心· 2026-01-02 05:00
Core Viewpoint
- The article likens the current state of AI research to an early stage of physics, the Tycho era: a wealth of observational data but no systematic understanding of the underlying principles [1][8].

Group 1: Current State of AI Research
- AI research remains in an observational phase, focused primarily on performance metrics rather than on understanding the underlying phenomena [3][9].
- The pursuit of short-term performance has accumulated a significant "cognitive debt," as the field has skipped the critical step of understanding [3][9].
- Academic publishing culture favors "perfect stories" or large performance improvements, leaving valuable but fragmented observational work neglected [5][12].

Group 2: Call for a New Approach
- AI research needs a more accessible and inclusive phenomenological approach, one that neither demands immediate applicability nor requires a complete narrative [17][21].
- This approach should emphasize controllability through toy models, multi-perspective characterization, and curiosity-driven exploration [21][22].
- The article urges researchers to document observations and collaborate more broadly, moving away from the fragmented nature of current AI research communities [22].

Group 3: Challenges in Developing an AI Phenomenology
- The development of an AI phenomenology is hindered by high publication standards that recognize only universally applicable or surprising phenomena [15][16].
- Many interesting phenomena are discarded because they cannot be shaped into a publishable format, losing potentially valuable insights [14][22].
- A shift in mindset is needed to build a robust understanding of AI phenomena, akin to the evolution physics once underwent [7][9].
Autoregression Can Power Strong Vision Models Too? NEPA Ushers in the "Next-Embedding Prediction" Era, with Saining Xie Among the Authors
机器之心· 2026-01-02 05:00
Core Viewpoint
- The article discusses a new visual pre-training approach, Next-Embedding Predictive Autoregression (NEPA), which shifts the paradigm from learning representations to learning models and performs strongly on visual tasks, much as language models do on text [2][18].

Group 1: NEPA Overview
- NEPA is a minimalist approach that predicts the next feature patch of an image, akin to how a language model predicts the next word [20].
- The method uses causal masking and a stop-gradient to keep predictions stable without requiring complex architectures (a minimal sketch follows below) [17][25].
- NEPA is competitive on benchmarks such as ImageNet-1K, reaching 83.8% Top-1 accuracy with ViT-B and 85.3% with ViT-L, surpassing several state-of-the-art methods [29].

Group 2: Methodology and Architecture
- The architecture uses a standard Vision Transformer (ViT) backbone with causal attention masking, directly predicting future patch embeddings from past ones [22].
- Unlike pixel-level reconstruction methods, NEPA requires no separate decoder, simplifying the model design [22].
- Training segments images into patches, encodes them into vectors, and predicts the next patch, with a stop-gradient preventing the model from "cheating" [25].

Group 3: Performance and Applications
- NEPA transfers well, reaching 48.3% and 54.0% mIoU on the ADE20K semantic segmentation task, showing it learns the rich semantic features needed for dense prediction [29].
- The model adapts to various downstream tasks by simply swapping the classification head, demonstrating its versatility [30].
- Visual analysis reveals that NEPA learns long-range, object-centered attention patterns, ignoring background noise and focusing on semantically relevant regions [37].
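Here is a minimal sketch of the next-embedding prediction objective as summarized above (illustrative assumptions throughout, not the authors' code): patch embeddings pass through a causally masked Transformer layer, and each position is trained to match the stop-gradient embedding of the following patch.

```python
# Minimal next-embedding prediction sketch (PyTorch; illustrative
# assumptions only, not the NEPA reference implementation).
import torch
import torch.nn as nn

num_patches, patch_pixels, dim = 16, 32, 64
embed = nn.Linear(patch_pixels, dim)   # patch pixels -> embedding
block = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
# Additive causal mask: -inf above the diagonal, so patch i only sees 0..i.
causal = torch.full((num_patches, num_patches), float("-inf")).triu(1)

patches = torch.randn(8, num_patches, patch_pixels)  # (batch, patch, pixel)
z = embed(patches)                     # per-patch embeddings
h = block(z, src_mask=causal)          # causal self-attention over patches

# Each position predicts the NEXT patch's embedding; the stop-gradient
# (detach) on the target helps keep the encoder from collapsing.
pred, target = h[:, :-1], z[:, 1:].detach()
loss = nn.functional.mse_loss(pred, target)
loss.backward()
print(f"toy NEPA-style loss: {loss.item():.4f}")
```

Note how the design mirrors the summary: no pixel decoder is needed because both prediction and target live in embedding space.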
Letting the Model Find Key Frames and Visual Cues Itself: Xiaohongshu's Video-Thinker Cracks the Video Reasoning Impasse
机器之心· 2026-01-02 03:12
Core Insights
- The article discusses advances in video reasoning brought by the "Thinking with Videos" paradigm, specifically the Video-Thinker model, which strengthens a model's ability to autonomously navigate and understand temporal sequences in videos [2][6][10].

Group 1: Model Development and Methodology
- Video-Thinker integrates "temporal grounding" and "visual captioning" into the model's chain of thought, removing the reliance on external tools and letting the model autonomously locate key frames and extract visual cues [2][10].
- The research team built the Video-Thinker-10K dataset of 10,000 high-quality samples and used a two-phase "supervised fine-tuning + reinforcement learning" training strategy to strengthen the model's self-exploration and self-correction capabilities [3][10].
- The 7-billion-parameter model reached state-of-the-art (SOTA) performance on multiple challenging video reasoning benchmarks, clearly surpassing existing baselines [3][22].

Group 2: Data Quality and Training Process
- High-quality training data is crucial for complex reasoning, so six major datasets were merged into Video-Thinker-10K, pairing precise temporal annotations with detailed visual descriptions [12][13].
- Training uses a structured thinking format in which the model emits labels such as <time> and <caption>, enforcing a strict "locate - perceive - reason" sequence (a hypothetical reward sketch follows below) [16][18].
- The reinforcement learning phase, using Group Relative Policy Optimization (GRPO), let the model explore and optimize its reasoning strategies, producing emergent cognitive behaviors akin to human metacognition [19][22].

Group 3: Performance Evaluation
- Video-Thinker-7B shows clear advantages across video reasoning benchmarks, setting a new SOTA among 7-billion-parameter models [25][29].
- Both in-domain and out-of-domain evaluations show the model generalizes effectively to unseen scenarios [24][29].
- The model reached 43.22% accuracy on the Video-Holmes benchmark and 80.69% on VRBench, outperforming previous models by notable margins [29][30].

Group 4: Key Findings and Implications
- The model's success is attributed to its internal grounding and captioning capabilities, which quantitative assessment found superior to those of baseline models [32][36].
- Relying on external tools can hurt performance: experiments showed that simple plug-and-play tools degraded, rather than enhanced, the model's reasoning [34][35].
- Video-Thinker's strategy of internalizing core capabilities, rather than depending on huge parameter counts and datasets, represents a new paradigm in video reasoning with potential applications across industries [39].
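As one hedged illustration of how such a structured format might be rewarded during the GRPO phase (hypothetical: only the <time> and <caption> tags come from the article; the <answer> tag, function name, and reward scheme are my assumptions):

```python
# Hypothetical format-reward sketch for the structured output described
# above (only the <time>/<caption> tags appear in the article; the
# <answer> tag and everything else here are illustrative assumptions).
import re

STRUCTURE = re.compile(
    r"<time>.+?</time>.*?<caption>.+?</caption>.*?<answer>.+?</answer>",
    re.DOTALL,
)

def format_reward(response: str) -> float:
    """Reward 1.0 only if the response follows the strict
    locate -> perceive -> reason order; a GRPO setup would combine
    this with an answer-correctness reward."""
    return 1.0 if STRUCTURE.search(response) else 0.0

resp = ("<time>12.0s-18.5s</time>"
        "<caption>A hand slides a key under the doormat.</caption>"
        "<answer>Under the doormat.</answer>")
print(format_reward(resp))  # -> 1.0
```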
Meta's Big Move: Freeing Agents from the Bottleneck of Human Knowledge, SSR-Tier Research on the Road to Autonomous AI
机器之心· 2026-01-02 03:12
Core Viewpoint
- Meta is pursuing the ambitious goal of "superintelligent" AI: autonomous AI systems that surpass human expert level. The initiative has drawn skepticism from experts such as Yann LeCun, who considers the current path to superintelligence impractical [1].

Group 1: SSR Methodology
- Self-play SWE-RL (SSR) is introduced as a new approach to training superintelligent software agents that can learn and improve without existing problem descriptions or human supervision [2][4].
- SSR leverages self-play, in the spirit of AlphaGo, letting software agents interact with real code repositories to autonomously generate learning experiences [2][4].
- The SSR framework needs minimal human data: it assumes only access to sandboxed code repositories with source code and dependencies, with no manually annotated issues or test cases [4].

Group 2: Bug Injection and Repair Process
- SSR involves two roles: a bug-injection agent that introduces bugs into a codebase and a bug-solving agent that generates patches to fix them (a hypothetical sketch of the loop follows below) [8][9].
- The bug-injection agent produces artifacts that deliberately introduce bugs, which are then checked for consistency to ensure they are reproducible [9][11].
- The bug-solving agent generates final patches for the defined bugs, with success judged by the results of the tests associated with those bugs [11][12].

Group 3: Performance Evaluation
- Experiments show SSR achieves stable, continuous self-improvement even without task-specific training data, indicating that large language models can strengthen their software engineering capabilities by interacting with raw code repositories [17].
- SSR beats traditional reinforcement learning baselines on two benchmarks, by +10.4% and +7.8% respectively, underscoring the value of self-generated learning tasks over manually constructed data [17].
- Ablation studies show the self-play mechanism is crucial: it continuously generates a dynamic task distribution that enriches the training signal [19][20].

Group 4: Implications for AI Development
- SSR is a significant step toward autonomous AI systems that learn and improve without direct human supervision, addressing fundamental scalability limits in current AI development [21][22].
- The ability of large language models to mine meaningful learning experiences from real-world software repositories opens training possibilities beyond human-curated datasets, potentially yielding more diverse and challenging scenarios [22].
- As AI systems grow more capable, learning autonomously from real-world environments becomes essential for intelligent agents that can solve complex problems effectively [25].
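A hedged sketch of the two-role loop described above (structure inferred from this summary; every name and API here is a hypothetical placeholder, not Meta's implementation):

```python
# Hypothetical sketch of the SSR self-play loop (structure inferred from
# the summary above; all names and APIs are assumptions, not Meta's code).
from dataclasses import dataclass

@dataclass
class BugArtifact:
    patch: str            # diff that injects the bug into the repo
    failing_tests: list   # tests expected to fail once the bug is applied

def self_play_round(repo, injector, solver, run_tests):
    # Role 1: the bug-injection agent proposes a bug. The consistency
    # check keeps it only if the listed tests reproducibly fail.
    bug = injector.inject(repo)
    broken = repo.apply(bug.patch)
    if not set(bug.failing_tests) <= set(run_tests(broken).failures):
        return None  # unverifiable bug: discard, no training signal
    # Role 2: the bug-solving agent sees only the broken repo (not the
    # injected diff) and proposes a fix; reward is binary test success.
    fix = solver.solve(broken, bug.failing_tests)
    reward = 1.0 if run_tests(broken.apply(fix)).failures == [] else 0.0
    return bug, fix, reward
```

The consistency check in role 1 mirrors the summary's requirement that injected bugs be reproducible before they count as training tasks.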
The "Drop Out and Found a Startup" Wave Sweeps Silicon Valley Again, but the Real Variable Was Never the Degree
机器之心· 2026-01-02 03:12
Core Viewpoint
- The trend of "dropping out to start a company" is gaining traction in Silicon Valley, with many founders wearing their dropout status as a positive credential in the venture capital community [3][4].

Group 1: The Trend of Dropping Out
- More founders at Y Combinator's Demo Day are highlighting their dropout stories, which have become a badge of honor signaling commitment to entrepreneurship [4].
- The urgency to ride the AI startup boom is driving some students to abandon their studies, believing a degree could hurt their chances of raising funding [5].
- Some investors are skeptical of the extreme dropout trend, arguing that a university's network and brand retain real value, even for those who never graduate [7].

Group 2: Perspectives on Age and Experience
- While many young founders drop out, some investors prefer older founders whose wisdom comes from experience and setbacks, viewing that as the more valuable trait [8].
- Despite the trend, many leading AI entrepreneurs still finish their degrees, a sign that formal education retains value in the industry [9].

Group 3: The Nature of Dropping Out
- The meaning of "dropping out" has shifted: many dropouts simply continue their entrepreneurial work in resource-rich environments [10].
- Ultimately, success depends on a founder's ability to marshal the right resources and networks at the right time, not on holding a degree [12].