LLaMA Series Models

Ethnic Chinese Researcher's $200 Million Annual Pay Breaks the Mold as the AI Race Runs Hot and Cold
Sou Hu Cai Jing· 2025-07-11 06:03
Group 1
- Meta has offered Ruoming Pang, a prominent AI/ML expert from Apple, an annual compensation package of more than $200 million to strengthen its newly established "Superintelligence Labs" [4][8]
- Pang's package exceeds Apple CEO Tim Cook's compensation of $74.6 million and approaches the earnings of sports stars such as Cristiano Ronaldo and Stephen Curry [4]
- Most of the package consists of stock options, signing bonuses, and performance-based incentives, which unlock only after years of service and after Meta meets its market-value growth targets [4]

Group 2
- Microsoft has laid off 15,000 employees, including 9,000 in its third round of layoffs, as part of a cost-cutting strategy amid a significant increase in AI infrastructure investment [5][7]
- The layoffs reflect a broader trend in the tech industry, where companies are restructuring to focus resources on AI: Amazon has cut 27,000 jobs, and other firms such as Google and IBM are also reducing staff [7]
- The shift towards AI is displacing traditional IT roles: 40% of the positions affected by Microsoft's layoffs were software engineers, indicating a significant transformation in the workforce [5][7]

Group 3
- Meta's recruitment of Pang is part of a larger strategy to enhance its capabilities in large language models and intelligent assistants, addressing concerns about its AI progress compared to competitors [9]
- Apple is reportedly considering abandoning its in-house large language model development in favor of technologies from Anthropic or OpenAI because of slow internal progress, which has led several key AI engineers to leave [9]
- The competition for AI talent is intensifying, with Meta actively recruiting from leading tech firms to fill gaps in its AI research and development [9]
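Precisely Steering Large-Model Generation and Reasoning: A New Method From Zhejiang University and Tencent Tries to Inject a "Behavior-Targeting Agent"
QbitAI (量子位) · 2025-06-05 10:28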
Core Viewpoint
- The article discusses the dilemma in controlling large AI models, emphasizing the need for a balance between intelligence and compliance, and proposes the Steering Target Atoms (STA) method as a way to create AI that is both smart and obedient [1][6].

Method & Experimental Results
- The STA method allows for "atomic-level" behavior editing of large models, enhancing robustness and safety in output control [2].
- Traditional methods often couple safety defenses with general intelligence, leading to potential performance trade-offs. The STA method addresses this by intervening at the internal neuron level, identifying and adjusting specific neurons associated with harmful behaviors while preserving those linked to correct responses [4][5].
- The STA method has been tested on models like Gemma and LLaMA, showing superior detoxification performance without significant negative impact on general performance [10].

Experimental Setup
- The research involved manipulating target atom directions and amplitudes to regulate model behavior, with extensive testing on various model configurations [9].

Key Experimental Results
- The STA method outperformed other techniques in detoxification while maintaining general performance, as shown in the comparative results table [10].

Steering Vectors vs. Prompt Engineering
- The article compares Steering Vectors with traditional prompt engineering, highlighting that Steering is more robust against jailbreak attacks and allows for finer control [12][13].

Cognitive Intervention in Large Models
- The research also explored cognitive interventions in larger models like DeepSeek-R1, enhancing reasoning capabilities by amplifying weights of neurons associated with "thinking" [16][18].
- The findings indicate that while Steering techniques may lack the convenience of prompts, they offer more robust and precise intervention effects [18].

Open Source Contribution
- The research team has made some intervention methods open source to encourage further exploration in the field of safe and controllable large models [19].
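As a rough illustration of the neuron-level steering summarized above, the Python sketch below builds a sparse steering vector from a handful of units and adds it to a hidden state with a chosen amplitude. It is a toy under stated assumptions, not the STA authors' implementation: the summary does not say how target atoms are identified, so the mean-difference selection rule, the function names `select_target_atoms` and `apply_steering`, and the example activations are all hypothetical.

```python
import numpy as np


def select_target_atoms(toxic_acts, safe_acts, top_k):
    """Build a sparse steering vector from the top_k most discriminative units.

    toxic_acts, safe_acts: arrays of shape (n_examples, n_units) holding
    activations recorded on harmful and harmless inputs (hypothetical data).
    The mean-difference rule is an assumption made for this sketch.
    """
    if top_k < 0:
        raise ValueError("top_k must be non-negative")
    diff = toxic_acts.mean(axis=0) - safe_acts.mean(axis=0)
    # Indices of the units whose mean activation differs the most.
    keep = np.argsort(-np.abs(diff))[:top_k]
    steer = np.zeros_like(diff)
    steer[keep] = diff[keep]  # every other unit is left untouched
    return steer


def apply_steering(hidden, steer, amplitude):
    """Shift a hidden state along the steering direction.

    A negative amplitude moves away from the 'toxic' direction.
    """
    return hidden + amplitude * steer


if __name__ == "__main__":
    toxic = np.array([[1.0, 0.0, 5.0, 0.0], [1.0, 0.2, 7.0, 0.0]])
    safe = np.array([[1.0, 0.0, 1.0, 0.0], [1.0, 0.0, 1.0, 0.4]])
    steer = select_target_atoms(toxic, safe, top_k=1)
    print(apply_steering(np.ones(4), steer, amplitude=-1.0))
```

Keeping the vector sparse is what makes the edit targeted: only the selected units move, which mirrors the summary's point about adjusting neurons tied to harmful behavior while preserving the rest. In a real model the same shift would be applied to a layer's activations during the forward pass.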
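Chinese AI Models Surge Across the Board as the "AI Large Model Technology System Comprehensive Open-Source Influence Ranking" Is Released
AI科技大本营 (AI Tech Base Camp) · 2025-04-18 05:53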
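When people hear "large model," the first thing that comes to mind is usually "the model itself": the thing that can chat, write code, and draw. But a large model is far more than "a program that outputs results." Behind it stands a large and complex technology stack: large-scale, high-quality, diverse data; advanced model architectures and training strategies; the system capabilities that make deployment possible, such as inference serving and resource scheduling; and an indispensable scientific evaluation mechanism. A large model is better understood as a "technical community" made up of models, data, systems, evaluation platforms, and other elements, rather than a stack of isolated modules.
Against the backdrop of closed-source technical barriers and high commercial entry costs, open-source large models are rising rapidly and have become an important force in making AI technology widely accessible. But with new open-source AI model technologies appearing constantly, how should we choose among them? What are the strengths and weaknesses of each model technology system?
To give a systematic picture of open-source development across the global large-model ecosystem, CSDN and several partner institutions released the "AI Large Model Technology System Comprehensive Open-Source Influence Ranking" on April 18 at the 2025 Global Machine Learning Technology Conference (ML-Summit 2025). The ranking evaluates the contribution and influence of open-source large-model technology systems worldwide, aiming to give the industry a reference point and keep open-source innovation moving forward.
Note: "large model" here mainly refers to model architectures from decoder-only onward, including ...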
Turing Award Winner LeCun: Human Intelligence Is Not General Intelligence, and Next-Generation AI May Be Non-Generative
QbitAI (量子位) · 2025-04-14 09:09
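By Yishui, from Aofeisi. QbitAI | WeChat official account QbitAI
Human intelligence is not general intelligence.
Our brains are a product of evolution. They are good only at solving problems that are useful for survival, not at truly "general" computation...
LeCun, Meta's chief AI scientist and a Turing Award winner, made these remarks in a recent podcast episode.
He said the term AGI (artificial general intelligence) is highly misleading: human intelligence is by nature not general but highly specialized.
More interestingly, while everyone else is talking about generative AI, he made a bold prediction based on intuition: the next breakthrough in AI may come from non-generative approaches.
He also brought up DeepSeek again, saying frankly that people like him who have worked in AI for a long time were not actually surprised by this newcomer that took Silicon Valley by storm.
Throughout the episode, LeCun and the two hosts covered topics ranging from the limitations of large language models (LLMs) to the next paradigm shift in AI research, focusing on concepts such as reasoning, planning, and world modeling.
QbitAI has translated and edited some of the questions without changing their original meaning.
"Next-generation AI may be non-generative"
Q: What do you make of diminishing returns on one side and companies betting heavily on generative AI on the other?
LeCun: There is no doubt that generative AI is useful, especially for things like coding assistants. People have recently been discussing agent systems, but they are not yet fully reliable. From ...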