Agile guru: large AI models are completely rewriting the rules of programming, and the change upends everyone's assumptions
程序员的那些事· 2025-09-05 01:08
Core Viewpoint
- The emergence of large language models (LLMs) represents a transformative change in software development, comparable to the shift from assembly language to the first generation of high-level programming languages [5][10].

Group 1: Impact of LLMs on Programming
- LLMs not only raise the level of abstraction in programming but also compel a reevaluation of what it means to program with non-deterministic tools [7][10].
- The transition from deterministic to non-deterministic programming paradigms expands the dimensions of programming practice [8][10].

Group 2: Historical Context of Programming Languages
- High-level programming languages (HLLs) introduced a new level of abstraction, allowing programmers to think in terms of sequences, conditions, and iterations rather than specific machine instructions [8][9].
- Despite advances in programming languages, the fundamental nature of programming did not change significantly until the advent of LLMs [6][9].

Group 3: Embracing Non-Determinism
- The introduction of non-deterministic abstractions means that results from LLMs cannot be reliably reproduced, in contrast with the consistent outcomes of traditional programs (illustrated in the sketch below) [10][13].
- The industry is undergoing a radical transformation as developers learn to navigate this non-deterministic environment, which is unprecedented in the history of software development [13].
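To make the determinism contrast concrete, here is a minimal, self-contained Python sketch using toy logits rather than any real model API: greedy decoding at temperature 0 always returns the same token, while temperature sampling, as used by LLM decoders, can return different tokens on identical inputs.

```python
import math
import random

def sample_next_token(logits: dict[str, float], temperature: float) -> str:
    """Pick the next token from toy logits; temperature controls randomness."""
    if temperature == 0:
        # Greedy decoding: deterministic, the highest-logit token always wins.
        return max(logits, key=logits.get)
    # Softmax with temperature: the same prompt can yield different tokens per call.
    scaled = {tok: val / temperature for tok, val in logits.items()}
    z = sum(math.exp(v) for v in scaled.values())
    weights = [math.exp(v) / z for v in scaled.values()]
    return random.choices(list(scaled), weights=weights, k=1)[0]

toy_logits = {"sorted": 2.0, "reversed": 1.5, "shuffled": 0.3}
print([sample_next_token(toy_logits, 0.0) for _ in range(3)])  # identical every run
print([sample_next_token(toy_logits, 1.0) for _ in range(3)])  # may vary run to run
```

Even at temperature 0, real serving stacks can introduce non-determinism through batching and floating-point reduction order, which reinforces the article's point.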
The most aggressive recruiter isn't OpenAI: this HR startup, embroiled in an espionage case, is hiring engineers at a furious pace
36Kr· 2025-09-04 08:22
Group 1
- The U.S. tech job market has undergone significant changes since the launch of ChatGPT in November 2022, with some positions experiencing drastic declines while others remain in high demand [1].
- The largest wave of layoffs in U.S. history began in 2023, impacting the IT job market, but hiring activity is gradually recovering, albeit with limited new positions [2].
- The average tenure of software engineers at major tech companies has increased significantly, indicating a hiring slowdown and a reluctance among employees to change jobs [6][80].

Group 2
- Demand for AI engineers has surged since mid-2023, making it the hottest position in the tech industry, with a notable increase in job openings [29].
- Major tech companies such as Apple, IBM, and Amazon lead in job openings, with Apple posting the most at 2,177 positions [13].
- Over half of the open positions are at senior levels, yet vacancies for senior engineers have nonetheless declined, prompting some to apply for lower-level positions [21][24].

Group 3
- The San Francisco Bay Area remains the dominant hub for tech jobs, accounting for nearly 20% of global tech job openings with a total of 9,072 positions [72][74].
- The average tenure at major tech companies has increased by about two years over the past three years, reflecting a more stable workforce amid hiring slowdowns [80].
- Internal mobility among major tech firms is prevalent: companies primarily hire from each other, leading to longer tenures [85].

Group 4
- Remote job opportunities have decreased, with the share of remote positions falling from 25% to 20% over the past year, although AI engineering roles still show a slight increase in remote opportunities [98][100].
- Salaries for remote positions have generally declined by 10-15% as supply exceeds demand, making high-paying remote jobs a rare privilege [102].
Kitchen-R: A Mobile Manipulation Robot Benchmark for Jointly Evaluating High-Level Task Planning and Low-Level Control
具身智能之心· 2025-08-25 00:04
Core Viewpoint
- The article introduces the Kitchen-R benchmark, a unified evaluation framework for task planning and low-level control in embodied AI that addresses the fragmentation of current benchmarks [4][6][8].

Group 1: Importance of Benchmarks
- Benchmarks are crucial in fields such as natural language processing and computer vision for assessing model progress [7].
- In robotics, simulator-based benchmarks like Behavior-1K are common, providing both model evaluation and training capabilities [7].

Group 2: Issues with Existing Benchmarks
- Current benchmarks for high-level language instruction and low-level robot control are fragmented, leading to incomplete assessments of integrated systems [8][9].
- High-level benchmarks often assume perfect execution of atomic tasks, while low-level benchmarks rely on simple single-step instructions [9].

Group 3: Kitchen-R Benchmark Features
- Kitchen-R fills a critical gap in embodied AI research by providing a comprehensive testing platform that closely simulates real-world scenarios [6][8].
- It includes a digital-twin kitchen environment and over 500 language instructions, and supports the mobile ALOHA robot [9][10].
- The benchmark supports three evaluation modes: independent evaluation of planning modules, independent evaluation of control strategies, and, most critically, full-system integration evaluation [9][10].

Group 4: Evaluation Metrics
- Kitchen-R is designed with offline independent-evaluation and online joint-evaluation metrics to ensure comprehensive measurement of system performance [16][20].
- Key metrics include Exact Match (EM) for task-planning accuracy and Mean Squared Error (MSE) for trajectory-prediction accuracy; a sketch of both follows below [20][21].

Group 5: Baseline Methods
- Kitchen-R provides two baseline methods: a VLM-driven task-planning baseline and a Diffusion Policy low-level control baseline [43][49].
- The VLM planning baseline improves planning accuracy through contextual examples and constrained generation [47][48].
- The Diffusion Policy baseline integrates visual features and robot states to predict future actions [49][52].

Group 6: Future Directions
- Kitchen-R can expand to more complex scenarios, such as multi-robot collaboration and dynamic environments, promoting the application of language-guided mobile manipulation robots in real-world settings [54].
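As a concrete reading of the two metrics named above, here is a hedged Python sketch; the exact definitions in Kitchen-R may differ (for instance, EM might be computed per step rather than per plan), so the plan format and array shapes are illustrative assumptions.

```python
import numpy as np

def exact_match(predicted_plan: list[str], reference_plan: list[str]) -> float:
    """EM for task planning: 1.0 only if every step matches in order (assumed definition)."""
    return float(predicted_plan == reference_plan)

def trajectory_mse(pred: np.ndarray, ref: np.ndarray) -> float:
    """MSE between predicted and reference trajectories of shape (T, D)."""
    return float(np.mean((pred - ref) ** 2))

plan_pred = ["go_to(counter)", "pick(cup)", "place(sink)"]
plan_ref  = ["go_to(counter)", "pick(cup)", "place(sink)"]
print(exact_match(plan_pred, plan_ref))        # 1.0

traj_pred = np.array([[0.0, 0.0], [0.5, 0.1], [1.0, 0.2]])
traj_ref  = np.array([[0.0, 0.0], [0.4, 0.1], [1.1, 0.3]])
print(trajectory_mse(traj_pred, traj_ref))     # small positive value
```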
Briefing | $5 million seed round: Paradigm packs spreadsheets with over 5,000 AI agents
Z Potentials· 2025-08-19 15:03
Core Insights
- Paradigm has launched a product that integrates AI agents into spreadsheets, aiming to improve the management of CRM data traditionally stored in spreadsheets [3][4].
- The company has raised $5 million in seed funding led by General Catalyst, bringing total funding to $7 million [3].
- Paradigm's platform features over 5,000 AI agents that can autonomously gather information and populate spreadsheets [3][4].

Funding and Product Development
- Paradigm completed a $5 million seed round, with total funding reaching $7 million [3].
- The product is currently in closed beta, with plans for continuous iteration based on user feedback [3].

Target Market and User Base
- Early adopters include consulting firms such as Ernst & Young, AI chip startups, and AI programming companies [4].
- The platform attracts a diverse user base of consultants, sales professionals, and finance personnel, and uses a tiered subscription model based on usage [3][4].

Competitive Landscape
- Paradigm positions itself not as a competitor in the AI-driven spreadsheet market but as a new type of AI-driven workflow [5].
- Other companies, such as Quadratic, are also working on integrating AI into spreadsheets, with Quadratic having raised over $6 million [4].
Open-source diffusion LLMs beat autoregressive models for the first time! SJTU and UCSD launch D2F, with 2.5x the throughput of LLaMA3
机器之心· 2025-08-18 03:22
Core Insights
- The article introduces Discrete Diffusion Forcing (D2F), which for the first time makes open-source diffusion large language models (dLLMs) faster at inference than autoregressive (AR) models, reaching up to 2.5 times the throughput of LLaMA3 on benchmarks such as GSM8K [2][6][22].

Group 1: Challenges and Solutions
- Existing dLLMs face challenges such as the lack of a complete KV-cache mechanism and insufficient parallelism, resulting in slower inference than AR models [2][8].
- D2F addresses these challenges with a hybrid autoregressive-diffusion paradigm, optimizing model architecture, training methods, and inference strategies [11][12].

Group 2: D2F Design Features
- D2F adopts block-level causal attention for compatibility with KV caching, allowing KV states to be reused and reducing redundant computation (see the sketch below) [12][15].
- The model employs asymmetric distillation and structured noise scheduling to efficiently transfer knowledge from a pre-trained teacher model to the D2F student, strengthening its parallel-decoding capability [18].

Group 3: Inference Mechanism
- D2F introduces a pipelined parallel decoding algorithm that maintains a dynamic decoding window with semi-activated and fully-activated block states to balance throughput and quality [20][21].
- The model achieves speedups of up to 50 times over the original dLLMs while maintaining average performance [22].

Group 4: Performance Metrics
- D2F offers a superior performance-efficiency trade-off and can adapt to different scenarios by adjusting decoding parameters, achieving over four times the throughput of AR models on specific tasks [25].
- Comparative tests show D2F-LLaDA reaching 52.5 tokens per second, a 7.3-fold increase over baseline methods [23].

Group 5: Future Directions
- D2F's success points to a promising path for further research on parallel decoding, with potential future work including real-time serving and hybrid parallel processing [28].
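Since the article does not give implementation details, here is a minimal PyTorch sketch of how a block-level causal attention mask can be constructed: positions attend bidirectionally within their own block and causally across blocks, which is what makes the KV states of completed blocks reusable. The block size and layout are assumptions, not D2F's actual configuration.

```python
import torch

def block_causal_mask(seq_len: int, block_size: int) -> torch.Tensor:
    """Boolean mask (True = may attend): full attention within a block,
    causal attention across blocks, so finished blocks' KV states are reusable."""
    idx = torch.arange(seq_len)
    block_id = idx // block_size
    # Query i may attend to key j iff j's block does not come after i's block.
    return block_id.unsqueeze(1) >= block_id.unsqueeze(0)

mask = block_causal_mask(seq_len=8, block_size=4)
print(mask.int())
# Tokens in block 0 see only block 0; tokens in block 1 see blocks 0 and 1,
# including bidirectional attention among themselves.
```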
A long read! The first survey on self-evolving agents: the road toward artificial superintelligence
自动驾驶之心· 2025-07-31 23:33
Core Insights
- The article discusses the transition from static large language models (LLMs) to self-evolving agents that adapt and learn continuously from interactions with their environment, with artificial superintelligence (ASI) as the long-term goal [3][5][52].
- It frames self-evolving agents around three fundamental questions: what to evolve, when to evolve, and how to evolve, providing a structured framework for understanding and designing such systems [6][52].

Group 1: What to Evolve
- Self-evolving agents can improve components such as models, memory, tools, and workflows to enhance performance and adaptability [14][22].
- Agent evolution is organized into four pillars: the cognitive core (model), context (instructions and memory), external capabilities (tool creation), and system architecture [22][24].

Group 2: When to Evolve
- Self-evolution occurs in two main temporal modes: intra-test-time self-evolution, during task execution, and inter-test-time self-evolution, between tasks (a toy sketch of the latter follows below) [26][27].
- Three basic learning paradigms are relevant to self-evolution: in-context learning (ICL), supervised fine-tuning (SFT), and reinforcement learning (RL) [27][28].

Group 3: How to Evolve
- Methods for self-evolution include reward-based evolution, imitation and demonstration learning, and population-based approaches [32][36].
- The article highlights the importance of continuous learning from real-world interactions, seeking feedback, and adjusting strategies in dynamic environments [30][32].

Group 4: Evaluation of Self-evolving Agents
- Evaluating self-evolving agents poses unique challenges, requiring assessments that capture adaptability, knowledge retention, and long-term generalization [40].
- The article calls for dynamic evaluation methods that reflect the ongoing evolution and diverse contributions of agents in multi-agent systems [40][51].

Group 5: Future Directions
- Deploying personalized self-evolving agents is identified as a critical goal, focusing on accurately capturing user behavior and preferences over time [43].
- Challenges include preventing self-evolving agents from reinforcing existing biases and developing adaptive evaluation metrics that reflect their dynamic nature [44][45].
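To ground the what/when/how framing, the following toy Python sketch shows one possible shape of inter-test-time self-evolution: the agent records reward feedback between tasks into an evolvable memory. Everything here (the `AgentState` fields, the reward rule) is illustrative, not taken from the survey.

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    instructions: str                         # evolvable context ("what to evolve")
    memory: list[str] = field(default_factory=list)

def run_task(state: AgentState, task: str) -> tuple[str, float]:
    # Toy stand-in for acting in an environment: succeed once a similar
    # failure is already stored in memory (crude learning from experience).
    seen_before = any(task in note for note in state.memory)
    return f"attempted {task}", (1.0 if seen_before else 0.0)

def evolve_between_tasks(state: AgentState, tasks: list[str]) -> AgentState:
    """Inter-test-time self-evolution: update memory from reward feedback after each task."""
    for task in tasks:
        trace, reward = run_task(state, task)
        if reward < 1.0:
            # Store the failed trace so later tasks can condition on it (in-context learning).
            state.memory.append(f"task={task} reward={reward:.2f} trace={trace}")
    return state

state = evolve_between_tasks(AgentState("be helpful"), ["open door", "open door"])
print(state.memory)  # the first failure is recorded; the second attempt succeeds
```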
Privacy and fairness in large models show a "seesaw" effect, and the optimal balancing rule has just been found | Renmin University & Shanghai AI Lab
量子位· 2025-07-27 11:57
Core Insights
- Research from Renmin University and Shanghai AI Lab reveals that strengthening privacy protection in large language models (LLMs) can cause a significant drop in fairness, with declines of up to 45% [1][8].
- The study identifies a "seesaw effect" caused by coupled neurons that encode both fairness and privacy, which conflict during model optimization [1][10].

Group 1: Ethical Challenges in LLMs
- The concept of an "alignment tax" describes the trade-off where optimizing for alignment-related goals often sacrifices other foundational capabilities such as general knowledge and reasoning [3].
- As LLMs are increasingly deployed in critical sectors such as healthcare, finance, and education, ensuring that models maintain fairness and privacy has become essential [4][5].
- Users expect LLMs to protect privacy while also ensuring fairness, but achieving both simultaneously is challenging [7].

Group 2: SPIN Methodology
- SPIN is introduced as a training-free method that precisely suppresses roughly 0.00005% of key neurons to improve both fairness and privacy [2][12].
- The approach has three steps: identifying critical neurons, locating the coupled neurons that affect both fairness and privacy, and suppressing them to decouple the two objectives (a sketch follows below) [13][15][16].
- SPIN delivers significant improvements in fairness and privacy metrics across various models, outperforming traditional fine-tuning methods [17][18][19].

Group 3: Performance and Robustness
- SPIN allows zero-cost deployment, requiring only a one-time neuron scan and no additional computation at inference [20].
- The method remains effective even when exposed to harmful training data, maintaining stable improvements in fairness and privacy [26][31].
- SPIN's effectiveness is validated on multiple benchmarks, showing that it improves alignment without sacrificing the model's general capabilities [21][22].

Group 4: Broader Implications
- The principles behind SPIN may extend to other ethical conflicts in AI, such as balancing safety and utility [37].
- The research highlights the importance of understanding neuron-level interactions for building more responsible AI systems [12][37].
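The article does not publish SPIN's code, but the three steps above suggest roughly the following shape. In this hedged PyTorch sketch, per-neuron importance scores for each objective are assumed to be given (e.g., from gradient attribution), the coupled neurons are the intersection of both top sets, and suppression zeroes their output weights once, with no retraining.

```python
import torch

def coupled_neuron_mask(fair_scores: torch.Tensor,
                        priv_scores: torch.Tensor,
                        top_frac: float = 5e-7) -> torch.Tensor:
    """Mark neurons ranking in the top fraction for BOTH objectives as 'coupled'."""
    k = max(1, int(top_frac * fair_scores.numel()))
    fair_top = torch.zeros_like(fair_scores, dtype=torch.bool)
    priv_top = torch.zeros_like(priv_scores, dtype=torch.bool)
    fair_top[fair_scores.topk(k).indices] = True
    priv_top[priv_scores.topk(k).indices] = True
    return fair_top & priv_top

def suppress(weight: torch.Tensor, mask: torch.Tensor) -> None:
    """Zero out the output rows of coupled neurons (one-time, training-free edit)."""
    with torch.no_grad():
        weight[mask] = 0.0

# Hypothetical usage with random scores standing in for real attribution values:
n = 4096
fair, priv = torch.rand(n), torch.rand(n)
w = torch.randn(n, 1024)                     # one layer's output projection
suppress(w, coupled_neuron_mask(fair, priv, top_frac=0.001))
```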
HKUST and others propose LOVON: a new paradigm for open-world, all-domain target tracking with legged robots!
具身智能之心· 2025-07-27 09:37
Core Viewpoint
- The article introduces the LOVON framework, which integrates large language models, open-vocabulary visual detection, and precise language-motion mapping to strengthen the navigation capabilities of legged robots in dynamic, unstructured environments [4][6][23].

Group 1: LOVON Framework Overview
- LOVON targets long-range multi-target navigation for legged robots in complex environments, overcoming the limitations of traditional methods that struggle with real-time visual disturbances and target loss [3][6].
- The framework combines the task-planning capabilities of large language models with open-vocabulary visual detection, enabling robots to efficiently navigate and track dynamic targets in open-world scenarios [4][6][10].

Group 2: Key Features of LOVON
- LOVON consists of three core modules that close the loop between language, vision, and motion, enhancing the robot's ability to perform complex tasks [10].
- The framework uses Laplacian-variance filtering to stabilize visual processing, improving the rate of clear detection frames by 25% during robot movement (a sketch of the filter follows below) [12][13].
- Adaptive execution logic lets robots respond to unexpected situations, such as target loss or external interference, by switching to search mode or seamlessly executing new commands [14][16].

Group 3: Performance Metrics
- In simulated environments, LOVON achieved a success rate (SR) of 1.00, significantly outperforming traditional methods such as EVT, which reached an SR of 0.94 [19].
- Training is remarkably efficient, requiring only 1.5 hours versus 360 hours for the best competing model, TrackVLA, a 240-fold improvement [19][20].

Group 4: Practical Applications
- LOVON's plug-and-play design allows easy deployment on mainstream legged-robot platforms, supporting applications in home services, industrial inspection, and field research [21][24].
- The framework demonstrates strong open-world adaptation, multi-target long-range tracking, robustness in dynamic environments, and resistance to interference, making it suitable for diverse real-world scenarios [24].
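Laplacian-variance filtering is a standard blur-detection technique, so a plausible version of LOVON's filter looks like the OpenCV sketch below; the threshold value is a tunable assumption, not a number from the paper.

```python
import cv2

def is_sharp(frame, threshold: float = 100.0) -> bool:
    """Blur detection via variance of the Laplacian: a blurry frame has few
    edges, so the Laplacian response has low variance. Threshold is a guess."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var() >= threshold

# Drop motion-blurred frames before running the open-vocabulary detector:
# sharp_frames = [f for f in camera_stream if is_sharp(f)]
```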
HKUST & Beijing Humanoid propose LOVON: a new paradigm for open-world, all-domain target tracking with legged robots!
机器之心· 2025-07-25 04:29
Core Viewpoint
- The LOVON framework represents a significant advance in robotics, enabling legged robots to autonomously navigate complex, dynamic environments by integrating large language models, open-vocabulary visual detection, and precise language-motion mapping [2][5][20].

Group 1: Introduction to LOVON
- LOVON addresses long-range multi-target navigation in open environments, overcoming the limitations of traditional methods that struggle with real-time visual disturbances and target loss [1][5].
- It combines the task-planning capabilities of large language models with open-vocabulary visual detection and a language-motion mapping model, enabling efficient navigation in dynamic, unstructured settings [2][5].

Group 2: Core Modules of LOVON
- LOVON integrates three core modules to close the loop between language, vision, and motion, enhancing the robot's navigation capabilities [9].
- The framework employs Laplacian-variance filtering to stabilize visual processing, improving the detection rate of clear frames by 25% during robot movement [11][12].
- Adaptive execution logic allows robots to respond to unexpected situations, such as target loss or external disturbances, by switching to search mode or seamlessly executing new commands [13][15].

Group 3: Performance Metrics
- In simulation environments such as GymUnreal, LOVON achieved a success rate of 1.00, significantly outperforming traditional methods, which reached 0.94 [18].
- Training requires only 1.5 hours compared to 360 hours for the best competing model, a 240-fold improvement [18].

Group 4: Real-World Applications
- LOVON has been successfully deployed on several legged-robot platforms, including the Unitree Go2, B2, and H1-2, demonstrating plug-and-play capability without extensive customization [19].
- The framework is poised to transform applications in smart homes, industrial inspection, and field research, providing robust support for diverse tasks [20][21].

Group 5: Key Features
- LOVON demonstrates exceptional open-world adaptability, enabling robots to recognize a wide range of objects in unfamiliar environments [23].
- It excels at multi-target long-range tracking, executing complex tasks smoothly and without interruption [23].
- It exhibits strong robustness in dynamic environments, maintaining stable tracking of moving targets across varied terrain [23].
- Its anti-interference capability allows it to quickly reacquire targets and continue tasks after disruptions [23].
Making VLMs a better fit for robots: small VLMs can also exhibit strong visual planning capabilities
具身智能之心· 2025-07-15 13:49
Core Insights
- The article discusses the potential of large language models (LLMs) in robotic program planning, highlighting their ability to generate coherent action sequences while noting that they often lack the sensory detail required for physical execution [3][4].
- It introduces SelfReVision, a framework that improves small visual language models (VLMs) through self-distillation without external supervision, strengthening their planning capabilities in real-world scenarios [4][9].

Research Background
- LLMs show promise in generating action sequences but often lack the precision required for robotic tasks because of their reliance on human-centric training data [3].
- Visual language models (VLMs) could address these limitations, but existing methods either require specialized simulation environments or are costly to train and deploy [3].

Methodology
- SelfReVision is a self-improvement framework that lets small VLMs enhance their own performance through iterative self-critique and revision [4][6].
- The framework operates in three stages: critique, revise, and verify, enabling models to generate and refine plans based on self-assessment (a sketch of the loop follows below) [4][10].

Experimental Setup
- Two types of experiments evaluated SelfReVision's planning capabilities: image-based program planning and embodied-agent tasks [11].
- Evaluation metrics included coverage, ordering, completeness, overall quality, and a new metric called image groundedness [12].

Key Results
- SelfReVision significantly outperformed baseline models across metrics, achieving an average win rate of 68% on the PLACES dataset and 72% on the SIMULATION dataset [13].
- Larger models benefited more from SelfReVision, with an average gain of 74% for models with 12 billion parameters or more [13].

Comparison with Other Methods
- SelfReVision showed clear advantages over methods such as Best-of-N and PaliGemma, with improvements of 60% in most settings compared to modest gains from Best-of-N [17].
- Against GPT-4o, SelfReVision's plans had at least a 25% higher win rate for models with 12 billion parameters or more, indicating its effectiveness at strengthening smaller models [17].

Ablation Studies
- The complete Criticize-Revise-Verify (CRV) process showed the strongest performance, with average win rates of 68.3% on PLACES and 71.9% on SIMULATION [18].
- Variants of the process showed significant performance drops, underscoring the importance of the verification step in filtering out suboptimal revisions [18].

Application in Embodied-Agent Tasks
- In challenging block-manipulation scenarios, SelfReVision yielded a 26% improvement for the Gemma 12B model and 17% for the Gemma 27B model [21].
- In hierarchical tasks, SelfReVision plans led to a 70% success rate in generating trajectories, surpassing the 61% success rate of baseline models [21].
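Based on the three stages described above, here is a hedged Python sketch of what a critique-revise-verify loop can look like; the `vlm` callable is a placeholder for any text-in/text-out model, and the prompts and stopping rule are illustrative rather than SelfReVision's actual templates.

```python
from typing import Callable

def self_revision(vlm: Callable[[str], str], task: str, plan: str,
                  max_rounds: int = 3) -> str:
    """Critique-revise-verify loop (sketch): the model improves its own plan,
    and the verify step keeps a revision only if it is judged better."""
    for _ in range(max_rounds):
        critique = vlm(f"Task: {task}\nPlan: {plan}\nList concrete flaws in this plan.")
        if "no flaws" in critique.lower():
            break  # the model considers its own plan adequate
        revised = vlm(f"Task: {task}\nPlan: {plan}\nFlaws: {critique}\nRewrite the plan.")
        verdict = vlm(f"Task: {task}\nA: {plan}\nB: {revised}\nAnswer 'A' or 'B': which is better?")
        if verdict.strip().upper().startswith("B"):
            plan = revised  # verification filters out suboptimal rewrites
    return plan
```

The verify step mirrors the ablation finding above: dropping it lets worse revisions overwrite good plans, which is why the full CRV variant performs best.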