Llama 3
X @Avi Chawla
Avi Chawla· 2025-08-23 19:32
RT Avi Chawla (@_avichawla): The growth of LLM context length with time:
- GPT-3.5-turbo → 4k tokens
- OpenAI GPT4 → 8k tokens
- Claude 2 → 100k tokens
- Llama 3 → 128k tokens
- Gemini → 1M tokens
Let's understand how they extend the context length of LLMs: ...
How Many Tough Industry Problems Can GPT-5 Crack?
Core Insights
- OpenAI has officially launched GPT-5, which CEO Sam Altman describes as its most intelligent, fastest, and most useful model to date [1][2]
Model Highlights
- GPT-5 is a fusion model that automatically adjusts its thinking depth based on the complexity of the question [2][7]
- It has set record scores on several industry benchmarks, including 94.6% accuracy on the AIME 2025 math test, 84.2% in multimodal understanding, and 46.2% on the HealthBench Hard medical test [4]
- The model significantly reduces the "hallucination" problem and is more honest about its capabilities [2][7]
Programming Capabilities
- GPT-5 shows marked improvements in programming, scoring 74.9% on the SWE-bench Verified test and 88% on the Aider polyglot test [4]
- It can generate complex code quickly, as demonstrated by creating a complete French learning game in seconds [4]
Medical Applications
- GPT-5 is touted as the most accurate model for medical queries, enhancing patient understanding and decision-making [6]
- It is designed to complement, not replace, doctors by improving patient knowledge and communication [6]
Commercialization Strategy
- OpenAI has raised $8.3 billion at a valuation of $300 billion, and its annual recurring revenue has increased from $10 billion to $13 billion [8]
- The launch of GPT-5 comes amid intense global AI competition, with other companies such as Google and Meta also advancing their models [8]
Market Positioning
- OpenAI is actively expanding into enterprise and government markets, offering ChatGPT enterprise versions to federal agencies at a symbolic price [8][9]
- The company has signed a $200 million contract with the U.S. Department of Defense to explore AI applications in various fields [9]
Competitive Landscape
- In the enterprise AI market, OpenAI holds a 25% share, behind Anthropic (32%) and ahead of Google (20%) [10]
- GPT-5's ability to solve complex problems may create differentiated economic value in high-margin sectors such as strategic consulting and investment analysis [10]
Overnight, He Hollowed Out Silicon Valley
36Kr· 2025-07-18 12:41
Core Viewpoint
- Meta is aggressively recruiting top talent in the AI field, offering exorbitant salaries and bonuses to attract key personnel from competitors like OpenAI and Google, aiming to build a leading AI team and enhance its capabilities in artificial intelligence [4][7][23].
Group 1: Meta's Recruitment Strategy
- Meta announced the launch of a supercomputer cluster with a capacity of 1 gigawatt and plans for a 5-gigawatt computing power cluster, indicating a significant investment in AI infrastructure [4].
- The company has been actively poaching talent, with reports of offers reaching up to $200 million for individuals like the former head of Apple's foundational research team [7][21].
- Meta's recruitment strategy mirrors high-profile sports transfers, where significant financial incentives are used to secure top talent, akin to how Qatar invested in European football clubs [10][11].
Group 2: Talent Acquisition and Implications
- The recruitment of top AI talent has raised concerns within the industry, with OpenAI's CEO expressing feelings of loss and urgency to recalibrate their compensation structure [8][24].
- Meta's approach includes a focus on psychological tactics, taking advantage of OpenAI's internal challenges and employee burnout to lure away key personnel [24][26].
- The company has successfully recruited several prominent figures from OpenAI and other tech firms, significantly enhancing its AI capabilities in data, model training, and multimodal learning [32][29].
Group 3: Challenges and Cultural Issues
- Despite the aggressive recruitment, there are underlying issues within Meta's AI department, including a lack of clear mission and a culture of fear among employees, which could hinder long-term success [38][40].
- Historical examples suggest that relying solely on financial incentives to attract talent can lead to instability and high turnover rates, as seen in other industries [40][42].
- The potential for a "bubble" in talent acquisition is highlighted, with large-scale hiring often being a sign of underlying problems within the organization [42][44].
AI "Reading" Is Now Legal: US Court Rules Purchased Books May Be Used to Train AI Without Authors' Consent
QbitAI· 2025-06-26 03:43
Core Viewpoint
- The recent U.S. court ruling allows AI companies like Anthropic to use legally purchased books for training AI without needing the authors' permission, citing "transformative use" under the Fair Use principle, which promotes technological innovation and public interest [2][3][14].
Group 1: Court Ruling Details
- The court's decision marks the first recognition of AI companies' rights to use books, significantly reducing copyright risks associated with AI training data [3].
- The ruling specifies that while the use of legally purchased books for AI training is permissible, the use of pirated books does not qualify as fair use and remains subject to copyright infringement claims [15][17].
- The case originated from accusations by three authors against Anthropic for using both legally purchased and pirated books to train its AI model, Claude [6][13].
Group 2: Background on Anthropic
- Anthropic's co-founder Ben Mann downloaded 196,000 copyrighted books from a piracy site in 2021 and later amassed at least 5 million copies from other sources [7][8].
- Despite recognizing the legal risks of using pirated content, Anthropic retained all pirated copies until March 2023, when they began training Claude with a subset of books from their digital library [9][10].
- In February 2024, Anthropic shifted to legally procuring and scanning books, purchasing millions of physical copies [11].
Group 3: Implications and Reactions
- The ruling has sparked discussions about whether AI can be equated with human reading and understanding, and how creators can protect their intellectual property [19].
- Similar cases in the past, such as Google Books and GitHub Copilot, have set precedents for the application of fair use in AI training, indicating a trend in favor of technological innovation over copyright restrictions [23][32].
- The outcome of this case may influence ongoing litigation involving OpenAI and Meta, as it reflects a judicial inclination towards supporting AI companies in their use of copyrighted materials [34].
AI Weekly (Week 23, 2025): OpenAI Announces GPT-5 Roadmap, Tencent Upgrades Its Enterprise Large Model Knowledge Base - 20250613
Guoxin Securities· 2025-06-13 09:11
Investment Rating
- The report maintains an "Outperform" rating for the internet sector, indicating expected performance above the market index by over 10% [3][36].
Core Insights
- The report highlights the overall stability of earnings in the internet sector following the first quarter disclosures, despite ongoing fierce competition in the e-commerce industry. Companies are either continuing to offer discounts to merchants or increasing investments in instant retail to seek new growth [2][32].
- In the AI sector, major players are benefiting from their business scenarios such as cloud computing and advertising, although short-term AI agent developments still require refinement. The Hang Seng Technology Index is currently in a period of fluctuation, with recommendations for defensive stocks like Tencent Music and NetEase, which have stable earnings and low valuations [2][32].
Company Dynamics
- OpenAI has publicly announced the roadmap for GPT-5 and launched new features for ChatGPT Enterprise [17].
- Google is testing a new AI search display method to guide users back to traditional link-clicking paths [19].
- Meta has opened commercial access to Llama 3, integrating deeply with AWS to capture the enterprise market [20].
- NVIDIA reaffirmed its leading position in AI infrastructure at the GTC conference, emphasizing the importance of edge-side inference capabilities [21].
- Amazon is enhancing its advertising business with generative AI tools aimed at automating brand content creation [22].
- Tencent Cloud has upgraded its enterprise large model knowledge base, integrating new models and network search capabilities [23].
- ByteDance has announced the open-sourcing of its unified multimodal understanding and generation model, BAGEL [24].
Underlying Technologies
- Microsoft has officially included AI model safety assessments in Azure Foundry, evaluating around 1,900 models for content risk [26].
- Google has updated the Gemini 2.5 Pro preview model, showing significant improvements in performance metrics [27].
- The Zhiyuan Research Institute has released the "Wujie" series of large models, reflecting the evolution of AI from the digital to the physical world [28].
- Alibaba has open-sourced a new vector model series, Qwen3-Embedding, enhancing its capabilities in natural language processing [29].
Industry Policies
- The Ministry of Industry and Information Technology is promoting the development of the AI industry and its integration into new industrialization efforts, emphasizing the need for a supportive ecosystem [30].
- A draft policy from Chengdu aims to promote high-quality development in the AI industry, focusing on innovation, industry capacity enhancement, and application expansion [31].
Meta makes major investment in Scale AI, takes in CEO
TechXplore· 2025-06-13 08:10
Core Insights
- Meta has invested more than $10 billion in Scale AI, valuing the startup at more than $29 billion, and has brought on its CEO, Alexandr Wang, to strengthen its artificial intelligence initiatives [3][4].
- The partnership aims to deepen collaboration in producing data for AI models, with Wang joining Meta to work on superintelligence efforts [4][5].
- Scale AI, founded in 2016, has grown to over 1,500 employees and focuses on leveraging AI for businesses, governments, and labs [5][6].
Investment Details
- Meta's investment is part of a strategic partnership to enhance AI capabilities amid competition with companies like OpenAI, Google, and Microsoft [4].
- The investment will allow Scale AI to accelerate innovation, strengthen partnerships, and distribute proceeds to equity holders [9].
- After the investment deal closes, Meta will hold a minority stake in Scale AI, although the exact percentage has not been disclosed [9].
Leadership Changes
- Alexandr Wang will move to Meta while remaining on Scale AI's board of directors, and other employees, referred to as "Scaliens," will join him [6].
- Jason Droege, a tech industry veteran and co-founder of Uber Eats, will take over as the new CEO of Scale AI [9].
AI Capabilities
- Scale AI has developed an AI model called "Defense Llama," based on Meta's Llama 3 model, tailored for U.S. national security missions [7][8].
- The model is designed to assist in military and intelligence operations by assessing scenarios and answering tactical questions [8].
- Scale AI is committed to ongoing collaboration with the defense community to ensure the effectiveness of its AI solutions [8].
1.93-bit DeepSeek-R1 Beats Claude 4 Sonnet at Coding and Runs Without a GPU
QbitAI· 2025-06-10 04:05
Core Viewpoint
- The article discusses the performance and advancements of the DeepSeek-R1 (0528) model, highlighting its programming capabilities and efficiency improvements compared to previous versions and competitors.
Group 1: Model Performance
- The latest version R1-0528 achieved a score of 71.4 on the Aider programming leaderboard, surpassing Claude 4 Opus and the previous R1 version [5][2]
- R1-0528 shows significant improvements in gaming performance, particularly in Tetris, where it outperformed o4-mini and ranked just below o3 [21][24][28]
- The model's performance in Candy Crush was also notable, scoring 548 points, which is nearly 20 points higher than o4-mini [32]
Group 2: Model Optimization and Size
- The 1.93-bit version of R1 has a file size reduced by over 70% compared to the original 8-bit version, making it more lightweight and efficient [3][9]
- Unsloth has developed multiple quantized versions of R1, with the smallest being 1.66-bit at 162GB, which is nearly 80% smaller than the 8-bit version [9][10]
- The team recommends using the 2.4-bit and 2.7-bit versions for a better balance between size and performance [14]
Group 3: Team and Other Models
- Unsloth's team focuses on fine-tuning models for better efficiency, having worked on various models including Qwen, Phi, Mistral, and Llama, achieving at least a 50% reduction in memory usage and a 50% increase in speed [16][17]
- Unsloth has also introduced a distilled Qwen3-8B model based on R1-0528, claiming it can match the performance of Qwen3-235B and is adaptable to various configurations [19]
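The size reductions quoted in Group 2 can be sanity-checked with simple arithmetic. The Python sketch below is illustrative only and rests on two assumptions that are not in the article: that file size scales linearly with the average bits per weight, and that metadata and mixed-precision layers can be ignored. The function names are hypothetical helpers, not part of any Unsloth tooling.

```python
def reduction_from_bits(quant_bits: float, base_bits: float = 8.0) -> float:
    """Fractional size reduction if file size scaled purely with bits per weight.

    Assumption: uniform average bit width, no metadata overhead.
    """
    if quant_bits <= 0 or base_bits <= 0:
        raise ValueError("bit widths must be positive")
    return 1.0 - quant_bits / base_bits


def implied_original_gb(quantized_gb: float, reduction: float) -> float:
    """Baseline size implied by a quantized size and a stated fractional reduction."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return quantized_gb / (1.0 - reduction)


# Bit widths and the 162 GB figure are the ones quoted in the summary above.
for bits in (1.93, 1.66):
    print(f"{bits}-bit vs 8-bit: about {reduction_from_bits(bits):.1%} smaller")

# "Nearly 80% smaller" is treated as an upper bound of 0.80, so the result
# is an upper bound on the 8-bit file size, not a measured value.
bound_gb = implied_original_gb(162.0, 0.80)
print(f"Implied 8-bit size: under roughly {bound_gb:.0f} GB")
```

Under these assumptions the bit-width ratios alone are consistent with the article's "over 70%" and "nearly 80%" wording; real quantized files can deviate because some layers are kept at higher precision.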
Meta Plans to Double Down on AI, Reportedly Set to Invest More Than $10 Billion in Scale AI
Sou Hu Cai Jing· 2025-06-09 17:19
Group 1
- Meta is in discussions with Scale AI about a potential investment of $10 billion or more, which could be its largest AI investment to date and one of the largest private financings globally [2]
- Scale AI, founded in 2016 by Alexandr Wang, specializes in providing high-quality data annotation services for AI models, serving major clients like Microsoft and OpenAI [3]
- In 2023, Scale AI achieved revenue of $870 million and is projected to exceed $2 billion in 2024; a previous funding round in which Meta participated valued the company at $13.8 billion [4]
Group 2
- Scale AI has developed a military AI model called "Defense Llama" based on Meta's Llama 3, indicating a deep technical collaboration between the two companies [5]
- Meta's significant investment in Scale AI reflects its strategy to gain a strong foothold in the data supply chain, which is crucial for training AI models and optimizing algorithms [6]
- The potential $10 billion investment underscores Meta's commitment to AI and could significantly reshape its AI strategy and the global AI industry landscape [7]