Scaling Laws
Anthropic Co-founder: Building Claude Code, Lessons From GPT-3 & LLM System Design
Y Combinator· 2025-08-19 14:00
Anthropic's Early Days and Mission
- Anthropic started with seven co-founders, facing initial uncertainty about product development and success, especially compared to OpenAI's $1 billion in funding [1][46][50]
- The company's core mission is to ensure AI is aligned with humanity, focusing on responsible AI development and deployment [45][49]
- A key aspect of Anthropic's culture is open communication and transparency, with "everything on Slack" and "all public channels" [44]

Product Development and Strategy
- Anthropic initially focused on building training infrastructure and securing compute resources [50]
- The company built a Slackbot version of an early Claude model nine months before ChatGPT, but hesitated to release it as a product due to uncertainty about its impact and a lack of serving infrastructure [51][52]
- Anthropic's Claude 3.5 Sonnet model gained significant traction, particularly for coding tasks, becoming a preferred choice for startups in YC batches [55]
- Anthropic invested in making its models good at code, leading to emergent behavior and a high market share in coding-related tasks [56]
- Claude Code was developed as an internal tool to assist Anthropic's engineers, later becoming a successful product for agentic use cases [68][69]
- Anthropic emphasizes building the best possible API platform for developers, encouraging external innovation on top of its models [70][77]

Compute Infrastructure and Scaling
- The AI industry is experiencing a massive infrastructure buildout, with spending on AGI compute increasing roughly 3x per year (a back-of-the-envelope sketch of what that rate implies follows this summary) [83]
- Power is identified as a major bottleneck for data center construction, especially in the US, highlighting the need for faster data center permitting and construction [85]
- Anthropic uses GPUs, TPUs, and Trainium chips from multiple manufacturers to optimize for performance and capacity [86][87]

Advice for Aspiring AI Professionals
- Taking more risks and working on projects that excite and impress oneself is crucial for success in the AI field [92]
- Extrinsic credentials like degrees and time at established tech companies are becoming less relevant than intrinsic motivation and impactful work [92]
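A back-of-the-envelope illustration of what roughly 3x-per-year compute-spend growth implies (a minimal sketch; the 3x rate is from the talk, while the normalized starting point and four-year horizon are arbitrary assumptions):

```python
# Spending that grows roughly 3x per year compounds quickly: after n
# years it reaches 3**n times the starting level.
start = 1.0  # normalized spend in year 0 (assumption)
for year in range(1, 5):  # four-year horizon chosen arbitrarily
    print(f"year {year}: {start * 3 ** year:.0f}x initial spend")
# Prints 3x, 9x, 27x, 81x: four years at 3x/year is ~81x the baseline.
```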
In Depth | Sam Altman: Founders Shouldn't Build What's Core to OpenAI's Roadmap; Many Fields Remain Worth Exploring, and Persistent Deep Focus Can Grow a Company Bigger Than OpenAI
Z Potentials· 2025-07-03 03:13
Core Insights
- The conversation highlights the importance of decisive action and gathering talented individuals around ambitious goals, particularly in the context of OpenAI's early days and its focus on AGI [3][5][6]
- The discussion emphasizes the current state of AI technology, including the rapid advancements in model capabilities and the lag in product development, as well as the potential for future innovations [7][8][9]
- The dialogue also touches on the future of human-computer interaction, the role of AI in scientific progress, and the potential for a new industrial era driven by AI and robotics [15][27][29]

Group 1: Early Decisions and Talent Gathering
- One of the most crucial decisions for OpenAI was simply committing to the project, despite initial doubts about the feasibility of AGI [3]
- Attracting top talent was facilitated by presenting a unique and ambitious vision that few others were pursuing at the time [5]
- OpenAI started small, with only eight people, and initially focused on producing quality research rather than on a clear business model [6]

Group 2: Current State of AI Technology
- There is a significant gap between the capabilities of AI models and the products available, indicating a "product lag" [7]
- The cost of using models like GPT-4o is expected to decrease rapidly, enhancing accessibility and widening potential applications [7]
- OpenAI plans to open-source a powerful model soon, which could surprise many users with its capabilities [7]

Group 3: Future Innovations and Human-Computer Interaction
- The introduction of memory features in AI is seen as a step toward more personalized and proactive AI assistants [8]
- The future of human-computer interaction is envisioned as a "melted interface," where AI seamlessly manages tasks with minimal user intervention [21][22]
- Integrating AI with real-world data sources is crucial for enhancing user experiences and operational efficiency [11]

Group 4: Industrial and Scientific Progress
- The conversation suggests that the next industrial revolution could be driven by AI and robotics, with the potential to automate many sectors [15][16]
- AI is expected to significantly accelerate scientific discovery, which could lead to sustainable economic growth and improvements in human life [27]
- The relationship between energy and AI is highlighted, emphasizing the need for sustainable energy solutions to support advanced AI operations [29][30]

Group 5: Entrepreneurial Advice and Market Opportunities
- Current technological shifts present a unique opportunity for startups to innovate and adapt quickly, leveraging the evolving landscape [23]
- Founders are encouraged to focus on unique ideas rather than following trends, as true innovation often comes from exploring uncharted territory [17][18]
- The importance of resilience and a long-term vision in entrepreneurship is emphasized, particularly in the face of skepticism [19][32]
OpenAI's Approach Questioned, Meta Researcher: Superintelligence Simply Cannot Be Built This Way
36Kr· 2025-06-20 12:00
Core Insights
- The pursuit of "superintelligence" represents a significant ambition among leading AI companies like Meta, OpenAI, and Google DeepMind, with substantial investments being made in this direction [1][3][4]
- Sam Altman of OpenAI suggests that building superintelligence is primarily an engineering challenge, indicating a belief in a feasible path to achieving it [3][4]
- Meta AI researcher Jack Morris argues that the current approach of combining large language models (LLMs) and reinforcement learning (RL) may not be sufficient to construct superintelligence [1][2]

Group 1: Current Approaches and Challenges
- Morris outlines three potential methods for building superintelligence: purely supervised learning (SL), RL from human validators, and RL from automated validators [2]
- Integrating non-text data into models is believed not to enhance overall performance, the argument being that human-written text carries intrinsic value that raw sensory input does not [2][6]
- A "data wall" or "token crisis" is emerging: the supply of text data for training LLMs is becoming a concern, prompting extensive efforts to scrape and transcribe data from every available source [8][19]

Group 2: Learning Algorithms and Their Implications
- The two primary learning methods identified as candidates for superintelligence are SL and RL, with SL being more stable and efficient for initial training (a minimal sketch contrasting the two objectives follows this summary) [10][22]
- The hypothesis that superintelligence could emerge from SL alone is challenged by the limitations of current models, which may not exhibit human-level general intelligence despite excelling at specific tasks [15][16]
- The combination of SL and RL is proposed as the more viable path, leveraging human feedback or automated systems to refine model outputs [20][22][28]

Group 3: Future Directions and Speculations
- Whether RL can effectively transfer learning across varied tasks remains uncertain, raising questions about how far this approach can scale toward superintelligence [34]
- The competitive landscape among AI companies is likely to intensify as they seek to develop the most effective training environments for LLMs, potentially leading to breakthroughs in superintelligence [34]
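A minimal sketch contrasting the two objectives Morris discusses, supervised next-token prediction versus RL against a validator (the toy vocabulary, the fake model, and the "ends with c" validator are invented stand-ins for a real LLM and reward signal):

```python
import math
import random

# Toy vocabulary and a fixed stand-in for an LLM's next-token distribution.
VOCAB = ["a", "b", "c"]

def model_probs(context):
    # Deterministic fake distribution derived from the context; a real
    # system would run a neural network here.
    rng = random.Random(hash(tuple(context)) % 2**32)
    raw = [rng.random() for _ in VOCAB]
    z = sum(raw)
    return {tok: r / z for tok, r in zip(VOCAB, raw)}

# Path 1: supervised learning (SL).
# Objective: maximize log-likelihood of human-written next tokens,
# i.e. minimize cross-entropy against the training data.
def sl_loss(context, target_token):
    return -math.log(model_probs(context)[target_token])

# Path 2: RL from a validator (human or automated).
# Objective: maximize expected reward over sampled completions.
def validator_reward(sequence):
    # Hypothetical automated validator: rewards sequences ending in "c".
    return 1.0 if sequence[-1] == "c" else 0.0

def rl_objective(context, n_samples=1000):
    # Monte Carlo estimate of expected reward under the model's policy.
    probs = model_probs(context)
    weights = [probs[t] for t in VOCAB]
    total = 0.0
    for _ in range(n_samples):
        tok = random.choices(VOCAB, weights=weights)[0]
        total += validator_reward(context + [tok])
    return total / n_samples

print("SL loss on one token:", round(sl_loss(["a"], "b"), 3))
print("Expected validator reward:", rl_objective(["a"]))
```

The contrast is the crux of the article's argument: SL optimizes against fixed human text (and so inherits the data wall), while RL optimizes against whatever a validator will reward, which is only as general as the validator itself.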
Lex Fridman Talks With Google's CEO: Having Caught Up, What Does Google Plan to Do Next?
Founder Park· 2025-06-06 15:03
Core Insights
- Google has made significant strides in the AI competition, particularly with the launch of Gemini 2.5, positioning itself on par with OpenAI [1][4]
- The future of Google Search is envisioned as integrating advanced AI models that enhance the user experience by surfacing valuable content through multi-path retrieval [4][13]
- The company describes the current moment as the AJI ("Artificial Jagged Intelligence") phase, indicating notable progress but also existing limitations in AI capabilities [4][42]

Group 1: AI Development and Integration
- Google aims to deploy its strongest models in Search, executing multi-path retrieval for each query to deliver valuable content (a hypothetical sketch of such query fan-out follows this summary) [4][13]
- Approximately 30% of code is generated with the assistance of AI prompts, contributing to a roughly 10% increase in overall engineering efficiency [32][34]
- The company is focused on integrating AI seamlessly into its products, with plans to migrate AI Mode to the main search page [4][18]

Group 2: Search and Advertising Evolution
- The traditional search interface is evolving, with AI becoming an auxiliary layer that provides context and summaries while still directing users to human-created content [14][19]
- AI Mode is currently being tested by millions of users, showing promising early indicators of engagement and satisfaction [15][18]
- Future advertising strategies will be rethought to align with AI capabilities, ensuring that ads are presented in a natural and unobtrusive manner [16][17]

Group 3: Challenges and Future Outlook
- Scaling laws remain effective, but the company acknowledges that limits on computational power constrain model deployment [29][30]
- Augmented reality (AR) is seen as the next significant interaction paradigm, with Project Astra crucial to the Android XR ecosystem [36][38]
- The company anticipates that while AGI may not be achieved by 2030, significant advances will occur across many dimensions of AI [42][44]
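A hypothetical sketch of what multi-path retrieval (query fan-out) could look like; the interview does not describe Google's actual implementation, and the index names below are invented:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical retrieval paths; the names and behavior are invented
# placeholders, not any real search backend.
def web_index(query):   return [f"web:{query}", "shared:doc"]
def news_index(query):  return [f"news:{query}", "shared:doc"]
def video_index(query): return [f"video:{query}"]

PATHS = [web_index, news_index, video_index]

def multi_path_retrieve(query):
    # Fan the query out to every path in parallel, then merge and
    # de-duplicate the results while preserving order.
    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(lambda path: path(query), PATHS))
    seen, merged = set(), []
    for batch in batches:
        for doc in batch:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged

print(multi_path_retrieve("gemini 2.5"))
```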
AI Monthly: Musk Accelerates the GPU Race; Have Large Models Really Hit a Wall? The Hype Shifts to Agents
晚点LatePost· 2024-12-11 14:30
A new column, launching on a trial basis.

By 贺乾明 | Edited by 黄俊杰

By November, more and more people were saying that the path that made OpenAI appeared to have hit a wall:

Multiple media outlets reported that Google, OpenAI, Anthropic, and other companies, in developing their next-generation models, all failed to improve model capabilities as dramatically as in previous years.

Marc Andreessen, founding partner of Silicon Valley VC firm a16z and an investor in OpenAI and several other large-model companies, said: "We're adding (GPUs) at the same rate, and there's no intelligence gain at all."

OpenAI co-founder and former chief scientist Ilya Sutskever said: "The 2010s were the age of scaling; now we're back in the age of wonder and discovery once again."

Executives at these companies have denied the "hitting a wall" claim, and there is evidence they are still looking for ways to break through; after all, the push to build ever-larger compute centers has not slowed and is even accelerating.

At the same time, they are pouring more resources into large-model applications. From OpenAI and Anthropic to Google and Microsoft, and on to venture firms, everyone is treating agents, systems in which a large model interprets human instructions and orchestrates databases and tools to complete complex tasks, as the next decisive battleground (a minimal sketch of this loop follows this summary).

In November, ChatGPT marked its second anniversary, yet OpenAI itself stayed relatively ...
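A minimal sketch of the agent loop described above, in which a model repeatedly picks a tool, observes the result, and finishes with an answer (the fake model, the single database tool, and the numbers are invented for illustration; no specific vendor's implementation is implied):

```python
# A toy agent loop: the "model" chooses a tool until it can answer.
def fake_llm(prompt):
    # Stand-in for a real model call; decides on a tool or a final answer.
    if "db_result" not in prompt:
        return {"action": "query_db", "input": "sales_2024"}
    return {"action": "finish", "input": "Sales grew 12% in 2024."}

TOOLS = {
    # Hypothetical database tool returning a canned observation.
    "query_db": lambda arg: f"db_result({arg}): +12% YoY",
}

def run_agent(task, max_steps=5):
    prompt = task
    for _ in range(max_steps):
        step = fake_llm(prompt)
        if step["action"] == "finish":
            return step["input"]
        observation = TOOLS[step["action"]](step["input"])
        prompt += "\n" + observation  # feed tool output back to the model
    return "gave up"

print(run_agent("Summarize 2024 sales."))
```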
A Video Generation Model and 3 Billion Daily Interactions: Notes From MiniMax's First Offline Event
晚点LatePost· 2024-09-02 15:40
"如果我们在竞争中打不赢,就应该被淘汰,没有其他选择。 文丨程曼祺 由 MiniMax 视频生成大模型制作的短片《魔法硬币》,MiniMax 称其中每个场景都由大模型生成,未经任何修改。 发布会所在的 "西岸漩心" 被巨大的螺旋式阶梯环绕,游人可沿着步道一直走到顶层露台,眺望浦东风景。这 是一条上升、平缓,然后再上升、平缓,最终达到顶点的路。此时 AI 领域似乎也处在螺旋中的相对平缓期。 当 MiniMax 创始人闫俊杰放映完由视频生成模型制作的动画短片后,观众席传来数声尖叫。至少 3 位在场的 投资人说, 视频生成模型是他们当天最在意的成果 。 但视频生成模型本身不新鲜了,自 OpenAI 年初发布 Sora,数家中国公司跟进这一方向。 "期货" 也在成为行业关键词:GPT-5、GPT-4o 的语音视频功能、Sora……它们要么上线晚于预期,要么亮相多 时后仍未大规模公测。据我们了解,国内 "六小龙"(MiniMax、月之暗面、智谱 AI、百川智能、零一万物、 阶跃星辰 6 家大模型独角兽)今年的基础模型或多模态模型的更新时点也多晚于原计划。 发布结束后,闫俊杰被问起如何看待技术进展放缓。他说,一条上升、平 ...
谭熠 (Tan Yi), Among China's First Fusion Entrepreneurs: It Always Gives You Hope Just When You Despair | TECH TUESDAY
晚点LatePost· 2024-07-30 13:15
"核聚变永远还有 50 年是对的,现在不到 10 年可能也是对的。" 文丨 贺乾明 编辑丨程曼祺 "如果核聚变发电就是实现不了呢?" 听到这个问题,在清华大学研究核聚变 20 多年的谭熠沉默了几秒,然后笑了起来。他觉得这个问题 "根本没道理",因为核聚变 "从科学上是可行的"。 70 多年前的曼哈顿工程期间,科学家就了解核聚变原理。二战结束后,美国很快就用它造出了氢弹。但用核聚变发电的研究几经起伏,冷战后几乎停滞了 20 多年。 情况在 2021 年发生变化 ,美国的核聚变公司 Helion 宣布把等离子体加热到 1 亿摄氏度,实现原本只有政府项目才能做到的壮举;从麻省理工分拆的核聚变 公司 CFS 开发出形成更强磁场的高温超导磁体,把低成本建造能实现核聚变装置可能性大幅提高。 核聚变创业热潮出现:OpenAI 联合创始人山姆·阿尔特曼、PayPal 联合创始人彼得·蒂尔、比尔·盖茨、乔治·索罗斯等硅谷科技名流和富豪,以及 Google、DFJ 等机构在短时间里朝核聚变行业投资了 30 多亿美元,是美国政府数年来累计拨款的数倍。 这一年,谭熠创办核聚变公司星环聚能,担任首席科学家,在 2022 年 6 月拿到 ...
Llama 3 Released: The Highlight Is the "Small" Models
晚点LatePost· 2024-04-19 16:05
Searching for Scaling Laws all over again.

By 贺乾明 | Edited by 黄俊杰

Much like a person learning and growing, every brand-new large model has to learn "knowledge" from vast amounts of text before it can solve problems one by one.

Google trained its 7-billion-parameter open-source Gemma model on 6 trillion tokens (6 trillion "words") of text. Microsoft-backed Mistral trained a 7.3-billion-parameter model that has "seen" 8 trillion tokens of text.

Training a model with fewer than 10 billion parameters on data of this scale already counts as a relatively data-heavy approach in the industry. By the strategy DeepMind researchers proposed, if cost-effectiveness is the concern, a model of this size needs to see only about 200 billion tokens of text. Many first-tier Chinese startups' models of the same scale used just 1 to 2 trillion tokens (a back-of-the-envelope comparison of these tokens-per-parameter ratios follows this excerpt).

Meta CEO Mark Zuckerberg was not content with that: he sent the next generation of open-source models straight to "cram school," drilling them on far more exercises to raise their abilities. In the Llama 3 series Meta released last night, the 8-billion-parameter model used 15 trillion tokens of training data, more than double what Google's model saw and ten times what many smaller companies' products use.

According to the data Meta released, in ...
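A back-of-the-envelope comparison of the tokens-per-parameter ratios behind the figures above (a minimal sketch; reading the DeepMind strategy as roughly 20 tokens per parameter is the common gloss of the Chinchilla result, not a number stated in this article):

```python
# Tokens-per-parameter ratios from the figures quoted in the excerpt.
# The "compute-optimal" row is an assumption: ~200B tokens for a
# ~10B-parameter model, i.e. roughly 20 tokens per parameter.
models = {
    "Gemma 7B (Google)":          (7.0e9, 6e12),
    "Mistral 7.3B":               (7.3e9, 8e12),
    "Compute-optimal ~10B model": (1.0e10, 2e11),
    "Llama 3 8B (Meta)":          (8.0e9, 15e12),
}
for name, (params, tokens) in models.items():
    print(f"{name}: {tokens / params:,.0f} tokens per parameter")
# Llama 3 8B sees ~90x more tokens per parameter than the compute-optimal
# rule of thumb: spending extra training compute to get a stronger small
# model that is cheaper to serve at inference time.
```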