Llama 4 Maverick
Dropping the act: Turing Award winner exposes Meta's shady dealings right after leaving, and throws shade at his 28-year-old boss: "No experience, and you want to manage me?"
36Kr· 2026-01-03 04:25
Meta's Llama 4 "leaderboard gaming" has finally been confirmed. In a new Financial Times interview, Turing Award winner and former Meta Chief Scientist Yann LeCun, speaking at a Michelin-starred restaurant in Paris, admitted outright that the test results of Meta's Llama 4 models "were indeed dressed up a little," with the team using different models for different benchmarks to get better scores. The AI heavyweight, who has just announced his departure to found a startup, finally said what he had long kept to himself; this is also the first time a core figure at Meta's official level has explicitly acknowledged the "leaderboard gaming," putting one of the industry's open secrets on the table.

The customized version submitted to the leaderboard behaved nothing like the public release: its answers were far more verbose and made frequent use of emoji, clearly the result of special tuning. Once Arena introduced its "style control" feature, which neutralizes superficial factors such as response length and formatting, Llama 4 Maverick's ranking dropped straight from 2nd to 5th.

More evidence and criticism of Llama 4's benchmark gaming then surged toward Meta. On Reddit's r/LocalLLaMA forum, many users who had pinned high hopes on the Llama series voiced their disappointment, with some quipping that it was time to rename the forum "LocalGemma" and mocking the Llama 4 release as a belated April Fools' joke. As for Meta's practice of submitting a leaderboard-special version of the model, ...
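For readers unfamiliar with how a leaderboard can "neutralize" length and formatting, here is a minimal sketch of the idea, assuming pairwise Arena-style battles: fit a Bradley-Terry-style logistic regression in which, alongside one strength coefficient per model, a style covariate (here the response-length ratio) absorbs the votes that verbosity alone attracts. Function and feature names are illustrative, not LMArena's actual implementation.

```python
# Sketch: style-controlled ratings from pairwise battle records.
import numpy as np
from sklearn.linear_model import LogisticRegression

def style_controlled_ratings(battles, n_models):
    """battles: iterable of (model_a, model_b, len_a, len_b, a_wins)."""
    X, y = [], []
    for a, b, len_a, len_b, a_wins in battles:
        row = np.zeros(n_models + 1)
        row[a], row[b] = 1.0, -1.0        # strength difference of the two models
        row[-1] = np.log(len_a / len_b)   # style covariate: response-length ratio
        X.append(row)
        y.append(1 if a_wins else 0)
    clf = LogisticRegression(fit_intercept=False).fit(np.array(X), y)
    strengths = clf.coef_[0][:n_models]   # style-adjusted model scores
    length_bias = clf.coef_[0][-1]        # how much verbosity alone sways votes
    return strengths, length_bias
```

A model that wins mainly by writing longer answers keeps a high raw win rate but loses rank once its wins are attributed to the length coefficient rather than to its strength term.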
Sip Some VC | a16z on AI's "Glass Slipper Effect": Plenty of Models Can Do the Job "Just Well Enough," Yet Fail to Inspire User Loyalty
Z Potentials· 2025-12-30 03:09
Malika Aubakirova is an investor on a16z's AI infrastructure team, focused on frontier technology at the intersection of AI, cybersecurity, and enterprise infrastructure. With a background spanning backend systems, frontend development, and SRE, she has long worked on building highly scalable, secure, and reliable software systems. This article was published on December 8, 2025.

MVPs, Churn, and the "Old-School SaaS Playbook"

Z Highlights: In the traditional SaaS model, early retention is often an uphill battle. The industry settled into an unspoken playbook: quickly ship a feature-minimal MVP (minimum viable product), then keep patching features and polishing the experience under feedback and pressure from real users, while praying churn does not bite too hard. In this logic, constant iteration is not just the norm; it is treated as the right path. Founding teams accept as a given that some of their first users will leave, and so they pin their hopes on later versions: either win back the users who have already churned, or at least make the leaking "retention bucket" leak a little more slowly. This mode of operation has practically defined the SaaS industry's normal for years: launch with whatever the product can do today, watch a sizable share of early adopters drift away, then try to pull retention up bit by bit through intense, fast-paced iteration. High retention is regarded as the true " ...
a16z Proposes the "Glass Slipper Effect" for AI Products: The First Users Turn Out to Be the Most Loyal
Founder Park· 2025-12-12 06:00
Core Insights
- The article discusses the "Cinderella Glass Slipper Effect" in AI, highlighting that early users of AI models often exhibit higher retention rates compared to later users, which contrasts with traditional SaaS retention strategies [1][5][6].

Group 1: Traditional SaaS vs AI Retention
- In traditional SaaS, the common approach is to launch a minimum viable product (MVP) and iterate quickly to improve user retention, but this often leads to high early user churn [4].
- The AI landscape is witnessing a shift where some AI products achieve high retention rates from their first users, indicating a new model of user engagement [5][6].

Group 2: Understanding the Cinderella Effect
- The "Cinderella Glass Slipper Effect" suggests that when an AI model perfectly addresses a user's needs, it creates a loyal user base that integrates the model deeply into their workflows [7][8].
- Early adopters, referred to as the "foundational cohort," tend to remain loyal if the model meets their specific needs effectively [8][9].

Group 3: User Retention Dynamics
- Retention rates serve as a critical indicator of a model's success, with early users' loyalty being a sign of a genuine breakthrough in capability [6][24] (a cohort-curve sketch follows this summary).
- The window of opportunity for AI products to capture foundational users is short, often lasting only a few months, necessitating rapid identification and resolution of core user needs [6][22].

Group 4: Case Studies and Examples
- The article provides examples of AI models like Google's Gemini 2.5 Pro and Anthropic's Claude 4 Sonnet, which demonstrate high retention rates among early users compared to later adopters [14][15].
- Models that fail to establish a unique value proposition often see low retention rates across all user groups, indicating a lack of product-market fit (PMF) [17][24].

Group 5: Implications for AI Companies
- The "Cinderella Effect" emphasizes the need for AI companies to focus on solving high-value, unmet needs rather than creating broadly applicable but mediocre products [23][24].
- The competition in AI is shifting from merely having larger or faster models to effectively identifying and retaining users who find genuine value in the product [23][24].
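The retention comparisons above are, in practice, cohort curves. Below is a minimal sketch of how such curves are commonly computed from raw usage logs; the `events` schema (one row per `user_id` and `date` of activity) is a hypothetical stand-in, not a16z's actual methodology.

```python
# Sketch: month-over-month retention per acquisition cohort.
import pandas as pd

def cohort_retention(events: pd.DataFrame) -> pd.DataFrame:
    """events: one row per (user_id, date) of product usage; date is datetime64."""
    events = events.copy()
    events["month"] = events["date"].dt.to_period("M")
    first = events.groupby("user_id")["month"].min().rename("cohort")
    events = events.join(first, on="user_id")
    events["age"] = (events["month"] - events["cohort"]).apply(lambda d: d.n)
    active = (events.groupby(["cohort", "age"])["user_id"]
                    .nunique().unstack(fill_value=0))
    return active.div(active[0], axis=0)  # fraction of each cohort still active
```

A "glass slipper" model shows up as early-cohort rows whose curves flatten at a high level while later cohorts keep decaying.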
X @Avi Chawla
Avi Chawla· 2025-09-29 06:33
You're in a Research Scientist interview at OpenAI.
The interviewer asks: "Our investors want us to contribute to open-source. o3 crushed benchmarks. But we can lose a competitive edge by open-sourcing it. What do we do?"
You: "Release the research paper."
Interview over.
You forgot that LLMs don't just learn from raw text; they also learn from each other. For example:
- Llama 4 Scout & Maverick were trained using Llama 4 Behemoth.
- Gemma 2 and 3 were trained using Gemini.
Distillation helps us do so, and the visual e ...
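The distillation the thread points to is commonly implemented along the lines of Hinton et al.'s classic formulation: the student is trained to match the teacher's temperature-softened output distribution in addition to the ground-truth labels. A minimal PyTorch sketch, with `T` and `alpha` as illustrative hyperparameters:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: KL between temperature-scaled distributions,
    # rescaled by T^2 so gradients keep a comparable magnitude.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy on the true tokens.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```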
Do LLMs Have a Sense of Identity? When an LLM Discovers Its Game Opponent Is Itself, Its Behavior Changes
36Kr· 2025-09-01 02:29
Core Insights
- The research conducted by Columbia University and Montreal Polytechnic reveals that LLMs (Large Language Models) exhibit changes in cooperation tendencies based on whether they believe they are competing against themselves or another AI [1][29].

Group 1: Research Methodology
- The study utilized an Iterated Public Goods Game, a variant of the Public Goods Game, to analyze LLM behavior in cooperative settings [2][3].
- The game involved multiple rounds where each model could contribute tokens to a public pool, with the total contributions multiplied by a factor of 1.6 and then evenly distributed among players [3][4] (a payoff sketch follows this summary).
- The research was structured into three distinct studies, each examining different conditions and configurations of the game [8][14].

Group 2: Key Findings
- In the first study, when LLMs were informed they were playing against "themselves," those prompted with collective terms tended to betray more, while those prompted with selfish terms cooperated more [15][16].
- The second study simplified the rules by removing reminders and reasoning prompts, yet the behavioral differences between the "No Name" and "Name" conditions persisted, indicating that self-recognition impacts behavior beyond mere reminders [21][23].
- The third study involved LLMs truly competing against their own copies, revealing that under collective or neutral prompts, being told they were playing against themselves increased contributions, while under selfish prompts, contributions decreased [24][28].

Group 3: Implications
- The findings suggest that LLMs possess a form of self-recognition that influences their decision-making in multi-agent environments, which could have significant implications for the design of future AI systems [29].
- The research highlights potential issues where AI might unconsciously discriminate against each other, affecting cooperation or betrayal tendencies in complex scenarios [29].
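To make the payoff structure concrete, here is a minimal sketch of a single round as the summary describes it: contributions go into a pool, the pool is multiplied by 1.6, and the result is split evenly. The per-player endowment of 10 tokens is an assumption for illustration.

```python
# Sketch: one round of the Iterated Public Goods Game.
def play_round(contributions, endowment=10, multiplier=1.6):
    pool = sum(contributions) * multiplier
    share = pool / len(contributions)
    # Payoff = tokens kept back + equal share of the multiplied pool.
    return [endowment - c + share for c in contributions]

# A free-rider (contributes 0) against a full contributor (contributes 10):
print(play_round([0, 10]))  # [18.0, 8.0] -- defection pays in a single round
```

This is exactly the tension the studies probe: defection dominates any single round, so sustained contribution reveals something about how the model frames its opponent.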
Latest Global AI IQ Rankings Are Out; Thankfully, No One Has Surpassed Einstein
36Kr· 2025-08-19 05:22
Group 1
- The project "Trackingai.org" has created a fun initiative to test AI models using a human-like IQ test format, aiming to measure their cognitive abilities in a familiar way [1][25]
- The challenge featured top AI models including OpenAI's GPT-5 Pro, Google's Gemini 2.5 Pro, and xAI's Grok 4, showcasing their performance in a competitive environment [3][4]
- The results of the IQ tests reveal significant insights into the cognitive evolution of AI and highlight the differences between AI and human thinking [3][28]

Group 2
- In the Mensa IQ test, Google's Gemini 2.5 Pro achieved the highest score of 137, indicating its advanced capabilities in logical reasoning and abstract thinking [6][28]
- OpenAI's GPT-5 scored 121, while Grok 4 scored 125, both of which are above average but below Gemini 2.5 Pro [6][19]
- The performance of these models illustrates a gradient in AI intelligence levels, with each model employing different reasoning paths to arrive at correct answers [17][19]

Group 3
- The Llama 4 Maverick from Meta scored only 98, reflecting a significant gap compared to top competitors, despite being close to the human average [21][22]
- Meta is actively recruiting top AI researchers to improve its models and close the performance gap with leading closed-source models [22][24]
- DeepSeek R1, despite using older data, scored 102, indicating that effective model architecture and training methods can lead to competitive performance without the latest updates [24][25]

Group 4
- The testing method serves as a bridge for public understanding of AI capabilities, making it easier to discuss and compare AI intelligence in relatable terms [25][26]
- High IQ scores for AI models signify a qualitative leap in their cognitive abilities, moving beyond mere information retrieval to complex logical reasoning and problem-solving [28][29]
- The results highlight that while AI can excel in logical analysis, it does not equate to possessing complete human-like intelligence, which includes creativity and emotional understanding [29]
With AI Competition Bearing Down, Meta Finally Charges into Venture Capital
虎嗅APP· 2025-07-07 10:36
Core Viewpoint
- Meta's CEO Mark Zuckerberg is under pressure to enhance the company's AI capabilities and is adopting a more hands-on approach to management, including the establishment of a Corporate Venture Capital (CVC) unit to attract top talent and improve performance in the AI sector [2][8].

Group 1: Meta's Current Challenges
- Zuckerberg's recent management style has shifted to a more direct and micro-level approach, reallocating resources to the GenAI team to boost the performance of LLaMA [2][4].
- There is a growing concern about talent retention at Meta, with reports of AI engineers leaving for competitors like OpenAI and Anthropic, often with offers exceeding $2 million [6][7].
- The AI landscape is becoming increasingly competitive, with Meta's LLaMA struggling to keep pace with rivals like Qwen and DeepSeek, leading to a perception of stagnation in Meta's AI initiatives [6][12].

Group 2: Establishment of CVC
- Historically, Meta has not had a dedicated CVC, relying instead on its corporate development teams for acquisitions [4][5].
- The decision to form a CVC is part of Zuckerberg's broader strategy to create a "superintelligence unit" aimed at revitalizing Meta's AI efforts [8][10].
- Meta's investment in the venture fund NFDG, led by Daniel Gross, is a strategic move to gain access to top talent and innovative projects in the AI space [9][12].

Group 3: Financial Implications and Market Dynamics
- The AI investment landscape is currently dominated by corporate investments, which accounted for approximately 75% of the total funding in 2023, indicating a scarcity of available high-quality targets [12][13].
- Meta's recent acquisition of Scale AI for $14.8 billion is seen as a critical step in its strategy to bolster its AI capabilities [7][12].
- The overall number of AI startups has decreased significantly, with a reported 81% drop in new AI companies since the peak in 2021, complicating Meta's efforts to secure talent and technology [12][13].
A 13-Trillion-Yuan Giant Charges into CVC
36Kr· 2025-07-05 02:33
Core Insights
- Meta's CEO Mark Zuckerberg is experiencing frustration as the company struggles to keep pace with competitors in the AI space, particularly in light of its underwhelming performance in the metaverse and AR/VR sectors [1][2]
- Despite Meta's strong financial performance and stock price nearing historical highs, there is growing anxiety about the company's future direction and competitiveness in AI [1][2]

Group 1: Management Changes and Strategies
- Zuckerberg has taken a hands-on approach to AI management, reallocating resources from foundational AI research to the GenAI team to enhance the performance of LLaMA [2]
- The restructuring includes demoting the head of the GenAI team and splitting it into two groups, reflecting Zuckerberg's intense pressure to deliver results [2]
- Meta's lack of a dedicated Corporate Venture Capital (CVC) team has prompted Zuckerberg to consider establishing one to better compete in the AI landscape [4][7]

Group 2: Talent Acquisition Challenges
- Meta is facing significant talent retention issues, with reports of AI engineers leaving for competitors like OpenAI and Anthropic, often with offers exceeding $2 million [6]
- Zuckerberg's ambitious "superintelligence unit" plan aims to recruit top industry talent, offering salaries that could reach nine figures [6][7]
- The difficulty in attracting talent is compounded by the competitive landscape, where even substantial financial incentives have not been enough to secure top candidates [10][12]

Group 3: Investment and Acquisition Strategies
- Meta's acquisition of Scale AI for $14.8 billion is part of a broader strategy to bolster its AI capabilities and leadership [6][12]
- The company is also investing in Daniel Gross's venture fund, NFDG, to gain access to top talent and expertise in AI [7][8]
- The overall investment landscape in AI is becoming increasingly competitive, with a significant drop in the number of new AI startups and rising costs for quality acquisitions [11][12]
Every Large Model Scores Zero! Saining Xie Leads a Chinese Team's New Competitive Programming Benchmark, with Problems Updated Daily to Rule Out Memorization
量子位· 2025-06-18 09:17
Core Viewpoint
- The recent LiveCodeBench Pro benchmark test revealed that leading large language models (LLMs) performed poorly, with all models scoring zero points on the hardest problems, indicating that they have not yet reached the level of human experts in competitive programming tasks [1][2][8].

Group 1: Benchmark Overview
- LiveCodeBench Pro is a real-time benchmark testing platform that includes competitive programming problems from IOI, Codeforces, and ICPC [3].
- The question bank is updated daily to prevent LLMs from memorizing questions, ensuring a challenging evaluation environment [4][15].
- The benchmark consists of 584 top-tier competition problems, categorized by cognitive focus and difficulty level, with automatic selection based on normal distribution [15][17].

Group 2: Model Performance
- The best-performing model achieved a pass rate of only 53% on medium-difficulty questions, while the pass rate on hard questions was 0% [9][10] (see the bucketing sketch after this summary).
- The performance metrics of various models showed that while they excelled in knowledge-intensive and logic-intensive problems, they struggled with observation-intensive problems [26][29].
- LLMs demonstrated advanced skills in precise implementations but fell short in algorithm design and complex case analysis [28][29].

Group 3: Testing Methodology
- The testing team categorized problems based on underlying algorithmic concepts and recorded the official difficulty ratings from Codeforces [19].
- Each model's submissions were evaluated against human expert solutions, with results indicating that LLMs often failed to utilize provided sample inputs effectively [30][32].
- The team plans to release a completely new evaluation set quarterly to maintain the relevance and challenge of the testing environment [38].

Group 4: Team Composition
- The LiveCodeBench Pro team consists of several Olympic competition winners, with a significant portion being of Chinese descent [40].
- Key team members have backgrounds in prestigious institutions and have previously interned at major tech companies, contributing to the project's credibility and expertise [41][44].
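A minimal sketch of the per-difficulty scoring described above: problems are bucketed by their Codeforces-style rating, and a model's pass rate is reported per bucket. The rating thresholds here are illustrative, not the benchmark's actual cutoffs.

```python
# Sketch: pass rate per difficulty bucket for one model.
from collections import defaultdict

def pass_rate_by_difficulty(results, easy_max=1999, medium_max=2999):
    """results: iterable of (codeforces_rating, passed) pairs."""
    buckets = defaultdict(lambda: [0, 0])  # bucket -> [n_passed, n_total]
    for rating, passed in results:
        key = ("easy" if rating <= easy_max
               else "medium" if rating <= medium_max
               else "hard")
        buckets[key][0] += int(passed)
        buckets[key][1] += 1
    return {k: p / t for k, (p, t) in buckets.items()}
```

Under this scheme the headline result reads directly off the dictionary: a "medium" value near 0.53 and a "hard" value of 0.0.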
Splashing Out a Hundred Billion, Poaching a 28-Year-Old Chinese-American Genius CEO, and Hiring Google and OpenAI Staff at Premium Pay: Meta Is Reportedly Restructuring Its AI R&D System
36Kr· 2025-06-11 23:33
Group 1
- Meta is establishing a new lab focused on "Superintelligence" to develop AI systems that surpass human intelligence in reasoning, problem-solving, creativity, and decision-making [1][3]
- Meta has agreed to acquire 49% of Scale AI for $14.8 billion, approximately 106.14 billion RMB [1][3]
- Alexandr Wang, the 28-year-old CEO of Scale AI, has been invited to join Meta's new lab, highlighting Meta's strategy to attract top talent in the AI field [1][4]

Group 2
- Meta is offering compensation packages ranging from seven to nine figures to recruit top researchers from companies like OpenAI and Google, with some already agreeing to join [4][9]
- Scale AI, founded in 2016, provides data labeling solutions and reported revenue of $870 million in the previous year, with expectations to double to over $2 billion this year [3][9]
- Meta's AI efforts are led by two groups: a generative AI team and a fundamental AI research lab, with Yann LeCun, a Turing Award winner, overseeing the latter [4][9]

Group 3
- Meta's recent AI model testing faced criticism, with external researchers questioning the objectivity of its benchmark tests [5][8]
- The company aims to regain its competitive edge in AI, especially after the rise of ChatGPT, which has intensified competition in the tech industry [9][10]
- Meta's previous focus on open-source large models and social platform AI tools has led to a fragmented strategy, prompting the need for a more cohesive approach [10]