Large Language Models (LLMs)
"The probability is about 0 to 1%": IBM's CEO pours cold water on AGI, asserting AI data center investments cannot earn a return
Sou Hu Cai Jing· 2025-12-03 14:40
Core Viewpoint
- The debate over whether AI data center investments are overheated is intensifying on Wall Street and in Silicon Valley, with major tech companies announcing significant capital expenditure plans, raising concerns about the potential returns on these investments [1][2].

Group 1: Investment Plans
- Major tech companies have announced substantial investments in data centers: Meta plans to invest over $600 billion over the next three years, Microsoft $80 billion by 2025, Google $75 billion, and Apple $500 billion over four years, potentially pushing global data center and AI infrastructure investment above $5 trillion over the next five years [1].
- IBM's CEO Arvind Krishna expressed skepticism about the returns on these investments, stating that current infrastructure costs make it impossible to earn a return on the promised multi-trillion dollar outlays [2][4].

Group 2: Cost Analysis
- Krishna calculated that filling a 1 gigawatt data center costs approximately $80 billion, implying total capital expenditure of about $8 trillion if tech companies pursue a combined capacity of 100 gigawatts [4].
- He emphasized that this level of investment would require around $800 billion in profits just to cover interest payments, before accounting for depreciation of equipment, particularly AI chips with their rapid obsolescence [4].

Group 3: Comparison to Past Bubbles
- Krishna compared the current AI investment frenzy to the internet bubble of the early 2000s, noting that while fiber optics had long-term utility, AI hardware such as GPUs has a much shorter lifespan, necessitating expensive upgrades roughly every five years [5].
- He acknowledged that while some infrastructure can last, the rapid pace of advancement in AI hardware raises questions about the sustainability of current investments [5].
Group 4: AGI Potential
- Krishna put the probability that current technology can achieve Artificial General Intelligence (AGI) at roughly 0 to 1%, contrasting sharply with optimistic statements from other tech leaders [6][8].
- He believes that achieving AGI will require significant advances beyond current large language models (LLMs) and emphasizes the need to integrate hard knowledge with AI technologies [8].

Group 5: IBM's Strategic Focus
- IBM has chosen not to compete in the consumer AI market, focusing instead on enterprise solutions, where it can leverage its long-standing reputation for data protection and reliability [9].
- The company is actively hiring while others in the tech sector are laying off employees, as it aims to enhance productivity through AI tools [9].

Group 6: Quantum Computing Outlook
- Krishna predicts that quantum computing could reach practical scale within three to five years, with an estimated market value of $400 billion to $700 billion annually [9].
- He gave a probabilistic timeline for when quantum computing might deliver significant commercial value, suggesting a higher likelihood of breakthroughs within four to five years [10].

Group 7: Industry Perspectives
- Krishna's views reflect broader skepticism within the tech industry about the disconnect between current investment levels and realistic return expectations, while still acknowledging AI's transformative potential for enterprise productivity [11].
- The ongoing debate highlights differing beliefs about the future of AI and AGI, with some companies betting heavily on becoming market leaders through substantial investments [12].
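Krishna's cost arithmetic can be reproduced in a few lines. The per-gigawatt cost and the 100 GW target are figures from the article; the 10% cost of capital is our assumption, inferred from his "$800 billion of interest on $8 trillion" claim rather than stated by him.

```python
# Back-of-the-envelope check of Krishna's data center math.
# The 10% cost of capital is an inferred assumption, not his stated figure.
cost_per_gw = 80e9          # ~$80 billion to fill a 1 GW data center
target_capacity_gw = 100    # industry-wide buildout he assumes

total_capex = cost_per_gw * target_capacity_gw
print(f"Total capex: ${total_capex / 1e12:.1f} trillion")          # $8.0 trillion

cost_of_capital = 0.10      # assumed rate implied by $800B interest on $8T
annual_interest = total_capex * cost_of_capital
print(f"Interest alone: ${annual_interest / 1e9:.0f} billion/yr")  # $800 billion
```

Note that this covers interest only; depreciation of rapidly obsolescing AI chips, which Krishna also stresses, would come on top.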
MediaTek: best in 23 years
半导体芯闻· 2025-11-28 10:46
Group 1
- Media reports indicate that MediaTek has partnered with Alphabet's unit to design Tensor Processing Units (TPUs), which are seen as potential competitors to NVIDIA's chips in the AI application field [1]
- MediaTek is known for smartphone chips, but faces pressure on gross margins due to uncertain demand, intense competition, and high R&D costs; however, AI-related news has provided some relief for its stock price, which has still declined by approximately 1.4% this year [1]
- Morgan Stanley analysts upgraded MediaTek's rating from "Equal Weight" to "Overweight," citing that the growth of Google TPUs should offset headwinds in the smartphone market in the long term [1]

Group 2
- UBS analysts raised their 2027 sales forecast for MediaTek's TPUs from $1.8 billion to $4 billion, predicting that these chips will account for 20% of the company's operating profit by 2028, contingent on MediaTek's execution with Google [2]
- Recent interest has been fueled by reports that Meta is discussing the adoption of Google TPUs in data centers by 2027; UBS believes MediaTek has further growth potential in additional ASIC projects with Meta [2]
- Overall, foreign investors remain optimistic about MediaTek, with 23 firms maintaining a "Buy" rating and 10 firms a "Hold" rating, while no firms have issued a "Sell" rating; analysts from Macquarie Group express a preference for investing in MediaTek and other Google partners over NVIDIA's supply chain [2]
How to get your data ready for AI
36Kr· 2025-11-11 01:29
Core Insights
- The emergence of agent-based AI is fundamentally transforming the big data paradigm, requiring a proactive approach to data integration into specialized intelligent computing platforms rather than the traditional reactive methods [1]
- This shift is leading to a re-evaluation of data modeling and storage, as modern AI can leverage significantly smaller datasets compared to traditional machine learning [1]

Group 1: Changes in Data Interaction
- The way data is utilized is evolving, with non-technical users increasingly interacting directly with data through AI agents, moving from a builder-centric to an interactor-centric model [2][4]
- Existing SaaS applications are integrating natural language interactions more seamlessly, allowing users to create applications based on their needs [4][6]

Group 2: Data Engineering Principles
- Data engineers must rethink ETL/ELT processes, focusing on context rather than strict normalization, as AI agents can interpret data without extensive preprocessing [7][9]
- The importance of data organization is emphasized over mere data collection, as quality examples for context-based learning are more valuable than large quantities of data [10][12]

Group 3: Infrastructure and Management
- AI agents require infrastructure that supports both data perception and action, necessitating clear interfaces and documentation for effective tool usage [15][17]
- The management of AI-generated artifacts is crucial, as these outputs become part of the data ecosystem and must adhere to industry standards and regulations [20][21]

Group 4: Observability and Training
- Establishing a feedback loop between observability and training is essential for enhancing AI agent performance, requiring a platform to monitor data quality and model performance [22][24]
- Data engineers' roles are evolving to include maintaining decision logs and managing agent-generated code as versioned artifacts for future analysis and training [26][29]
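Group 3's point about clear interfaces can be made concrete: an agent can only use a data tool well when its signature and docstring describe it precisely. A minimal sketch of such an agent-facing tool; all names here (`describe_table`, `TableSummary`) are invented for illustration, not from the article.

```python
# Sketch of an agent-facing data tool: a typed signature plus a docstring
# an AI agent can read to decide when and how to call it.
from dataclasses import dataclass

@dataclass
class TableSummary:
    name: str
    row_count: int
    columns: list[str]

def describe_table(name: str, rows: list[dict]) -> TableSummary:
    """Summarize a table so an agent can decide whether to query it.

    Args:
        name: logical table name as exposed to the agent.
        rows: the table contents as a list of records.
    """
    columns = sorted({key for row in rows for key in row})
    return TableSummary(name=name, row_count=len(rows), columns=columns)

summary = describe_table("orders", [{"id": 1, "amount": 9.5}, {"id": 2}])
print(summary)  # TableSummary(name='orders', row_count=2, columns=['amount', 'id'])
```

The typed return value matters as much as the docstring: a structured summary is something downstream agent steps can parse reliably, unlike free-form prose.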
WeChat and Tsinghua's continuous autoregressive model CALM: a new paradigm shifts from "discrete tokens" to "continuous vectors"
机器之心· 2025-11-07 06:02
Core Insights
- The article discusses a new method called the Continuous Autoregressive Language Model (CALM), proposed by Tencent WeChat AI and Tsinghua University, which aims to improve the efficiency of large language models (LLMs) by predicting multiple tokens as a continuous vector instead of one token at a time [3][11][12].

Group 1: Efficiency Challenges of LLMs
- The efficiency issues of LLMs stem from their reliance on discrete token sequences for autoregressive prediction, leading to high computational costs and low information density per token [8][10].
- The information density of discrete tokens is low, with a 32K vocabulary size yielding only 15 bits of information per token, creating a direct bottleneck in efficiency [10][11].
- The transition from discrete to continuous representations allows for a significant reduction in the number of generation steps, enhancing computational efficiency while maintaining performance [12][21].

Group 2: Implementation of CALM
- CALM employs a high-fidelity autoencoder to compress K tokens into a continuous vector, achieving over 99.9% reconstruction accuracy [11][21].
- The model's architecture includes a generative head that outputs the next continuous vector based on the hidden states from a Transformer, facilitating efficient single-step generation [24][25].
- The design of CALM allows for a more stable input signal by first decoding the predicted vector back into discrete tokens before further processing [26].

Group 3: Performance Evaluation
- The Brier Score is introduced as a new evaluation metric for the model's performance, which can be estimated using Monte Carlo methods and is applicable to both traditional and new language models [29][32].
- Experimental results indicate that CALM models, such as CALM-M with 371M parameters, require significantly fewer training and inference FLOPs than traditional Transformer models while achieving comparable performance [37][38].
Group 4: Future Directions
- The article highlights potential research directions, including enhancing the autoencoder's semantic understanding, exploring more robust end-to-end architectures, and developing efficient sampling algorithms to reduce inference costs [43][45].
- A new scaling law incorporating semantic bandwidth K is suggested as a macro-level research direction to further optimize language model efficiency [44].
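The Monte Carlo angle on the Brier Score mentioned in Group 3 rests on an algebraic identity: for a predicted distribution p and true class y, Brier(p, y) = Σ_i (p_i − 1[i=y])² = Σ_i p_i² − 2·p_y + 1. For a model you can only sample from, Σ_i p_i² is estimated by the collision rate of two independent samples and p_y by the hit rate against the true label. The sketch below is our own illustration of that idea, not the paper's implementation.

```python
# Monte Carlo estimate of the Brier score for a sample-only model.
import random

def brier_mc(sample, y_true, n=10_000, seed=0):
    """Estimate Brier = sum_i p_i^2 - 2*p_y + 1 from samples alone:
    P(x1 == x2) estimates sum_i p_i^2, P(x1 == y_true) estimates p_y."""
    rng = random.Random(seed)
    collisions = hits = 0
    for _ in range(n):
        x1, x2 = sample(rng), sample(rng)
        collisions += (x1 == x2)
        hits += (x1 == y_true)
    return collisions / n - 2 * hits / n + 1

# Toy sampler with P(A) = 0.8, P(B) = 0.2; true class "A".
# Exact Brier = (0.8 - 1)^2 + 0.2^2 = 0.08.
sampler = lambda rng: "A" if rng.random() < 0.8 else "B"
print(round(brier_mc(sampler, "A"), 3))  # close to 0.08
```

Because both estimators are unbiased, the same procedure scores a standard softmax model and an implicit generative head on an equal footing, which is what makes the metric usable across both model families.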
NeurIPS 2025 Spotlight | Precise filtering via selective knowledge distillation: AdaSPEC, an accelerator for speculative decoding, has arrived
机器之心· 2025-11-06 03:28
Core Insights
- The article discusses the introduction of AdaSPEC, an innovative selective knowledge distillation method aimed at enhancing speculative decoding in large language models (LLMs) [3][9][16]
- AdaSPEC focuses on improving the alignment between draft models and target models by filtering out difficult-to-learn tokens, thereby increasing the overall token acceptance rate without compromising generation quality [3][11][16]

Research Background
- LLMs excel in reasoning and generation tasks but face high inference latency and computational costs due to their autoregressive decoding mechanism [6]
- Traditional acceleration methods like model compression and knowledge distillation often sacrifice generation quality for speed [6]

Method Overview
- AdaSPEC employs a selective token filtering mechanism that allows draft models to concentrate on "easy-to-learn" tokens, enhancing their alignment with target models [3][9]
- The method utilizes a two-stage training framework: first, it identifies difficult tokens using a reference model, and then it filters the training dataset to optimize the draft model [11][12]

Experimental Evaluation
- The research team conducted systematic evaluations across various model families (Pythia, CodeGen, Phi-2) and tasks (GSM8K, Alpaca, MBPP, CNN/DailyMail, XSUM), demonstrating consistent and robust improvements in token acceptance rates [14]
- Key experimental results indicate that AdaSPEC outperforms the current optimal DistillSpec method, with token acceptance rates increasing by up to 15% across different tasks [15]

Summary and Outlook
- AdaSPEC represents a precise, efficient, and universally applicable paradigm for accelerating speculative decoding, paving the way for future research and industrial deployment of efficient LLM inference [16]
- The article suggests two potential avenues for further exploration: dynamic estimation mechanisms for token difficulty, and applying AdaSPEC to multimodal and reasoning-based large models [17]
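The two-stage idea in the method overview, scoring tokens with a reference model and then keeping only the ones the draft can plausibly learn, can be sketched schematically. The loss values, the keep ratio, and the gap criterion below are illustrative stand-ins; the actual method works with per-token cross-entropy losses during distillation training.

```python
# Schematic of AdaSPEC-style selective filtering: drop the tokens a draft
# model is unlikely to learn, then distill only on the rest.

def select_tokens(draft_losses, ref_losses, keep_ratio=0.75):
    """Keep the keep_ratio fraction of positions with the smallest
    draft-minus-reference loss gap (the "easy to learn" tokens)."""
    gaps = [d - r for d, r in zip(draft_losses, ref_losses)]
    k = max(1, int(len(gaps) * keep_ratio))
    order = sorted(range(len(gaps)), key=lambda i: gaps[i])
    return sorted(order[:k])

# Toy per-token losses: positions 1, 3, 5 have large gaps (hard to learn).
draft = [0.9, 3.5, 0.4, 2.8, 0.7, 5.1, 1.0, 0.6]
ref   = [0.8, 1.0, 0.3, 1.1, 0.6, 1.2, 0.9, 0.5]
kept = select_tokens(draft, ref)
print(kept)  # positions the draft model is distilled on
```

Capacity freed from hopeless tokens is spent on learnable ones, which is why acceptance rates can rise even though the draft model sees less training data overall.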
Not enough hard Codeforces problems to grind? Saining Xie and colleagues built an AI problem setter that generates original programming problems
36Kr· 2025-10-20 08:15
Core Insights
- The article discusses the importance of training large language models (LLMs) to generate high-quality programming competition problems, emphasizing that creating problems requires deeper algorithmic understanding than merely solving them [2][3][30]
- The research introduces AutoCode, a framework that automates the entire lifecycle of problem creation and evaluation for competitive programming, utilizing a closed-loop, multi-role system [3][30]

Group 1: Problem Creation and Evaluation
- The ability to create programming competition problems is more challenging than solving them, as it requires a profound understanding of underlying algorithm design principles and data structures [2]
- Existing testing datasets for programming competitions have high false positive rates (FPR) and false negative rates (FNR), which can distort the evaluation environment [2][14]
- AutoCode employs a robust Validator-Generator-Checker framework to ensure high-quality input generation and minimize errors in problem evaluation [5][8][30]

Group 2: Performance Metrics
- AutoCode achieved a consistency rate of 91.1% in problem evaluation, significantly higher than previous methods, which did not exceed 81.0% [17]
- The framework reduced FPR to 3.7% and FNR to 14.1%, representing approximately a 50% decrease compared to state-of-the-art techniques [17][19]
- In a more challenging benchmark with 720 recent Codeforces problems, AutoCode maintained a consistency of 98.7%, validating its effectiveness on modern, difficult problems [19]

Group 3: Novel Problem Generation
- The team developed a novel problem generation framework that utilizes a dual verification protocol to ensure correctness without human intervention [23]
- The process begins with a "seed problem," which is modified to create new, often more challenging problems, with a focus on generating high-quality reference solutions [23][24]
- The dual verification protocol successfully filtered out 27% of error-prone problems, increasing the accuracy of reference solutions from 86% to 94% [24][30]

Group 4: Findings on LLM Capabilities
- LLMs can generate solvable problems that they themselves cannot solve, indicating a limitation in their creative capabilities [27][29]
- The findings suggest that LLMs excel in "knowledge recombination" rather than true originality, often creating new problems by combining existing frameworks [32]
- The difficulty increase of newly generated problems is typically greater than that of the seed problems, with optimal quality observed when seed problems are of moderate difficulty [32]
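The Validator-Generator-Checker division of labor can be illustrated on a stand-in problem ("print the sum 1..n", with 1 ≤ n ≤ 100). Everything below is a toy sketch of the roles, not AutoCode itself; in the real framework each role is driven by an LLM and the problems are far richer.

```python
# Toy Validator-Generator-Checker loop in the spirit of AutoCode.
import random

def validator(case):                 # reject malformed / out-of-bounds inputs
    return isinstance(case, int) and 1 <= case <= 100

def generator(rng, k=50):            # mix boundary cases with random ones
    return [1, 100] + [rng.randint(1, 100) for _ in range(k)]

def reference(n):                    # trusted reference solution
    return n * (n + 1) // 2

def checker(case, got):              # verdict against the reference output
    return got == reference(case)

def judge(solution, seed=0):
    rng = random.Random(seed)
    cases = [c for c in generator(rng) if validator(c)]
    return all(checker(c, solution(c)) for c in cases)

print(judge(lambda n: sum(range(n + 1))))  # correct solution -> True
print(judge(lambda n: n * n // 2))         # buggy solution   -> False
```

The validator keeps bad test inputs from producing false negatives, while a diverse generator shrinks false positives by making it harder for a wrong solution to pass, which mirrors the FPR/FNR reductions reported above.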
Not enough hard Codeforces problems to grind? Saining Xie and colleagues built an AI problem setter that generates original programming problems
机器之心· 2025-10-20 04:50
Core Insights
- The article discusses the importance of training large language models (LLMs) to generate high-quality programming problems, which is crucial for advancing their capabilities towards artificial general intelligence (AGI) [1][3].

Group 1: Problem Creation and Evaluation
- Creating programming competition problems requires a deeper understanding of algorithms compared to merely solving them, as competition problems have strict standards to evaluate underlying algorithm design principles [2].
- The ability to generate better problems will lead to more rigorous benchmarks for competitive programming, as existing datasets often suffer from high false positive and false negative rates [2][21].
- The AutoCode framework, developed by the LiveCodeBench Pro team, automates the entire lifecycle of creating and evaluating competitive programming problems using LLMs [3][7].

Group 2: Framework Components
- The AutoCode framework consists of a Validator, Generator, and Checker, ensuring that inputs adhere to problem constraints and minimizing false negatives [8][10].
- The Generator employs diverse strategies to create a wide range of inputs, aiming to reduce false positive rates, while the Checker compares outputs against reference solutions [12][14].
- A dual verification protocol is introduced to ensure correctness without human intervention, significantly improving the quality of generated problems [29].

Group 3: Performance Metrics
- The AutoCode framework achieved a consistency rate of 91.1% with a false positive rate of 3.7% and a false negative rate of 14.1%, marking a significant improvement over previous methods [21][22].
- In a more challenging benchmark with 720 recent Codeforces problems, AutoCode maintained a consistency of 98.7%, validating its effectiveness on modern, difficult problems [24].
- The framework's performance was further validated through ablation studies, confirming the effectiveness of its components [26].
Group 4: Novel Problem Generation
- The team established a new problem generation framework that builds on robust test case generation, introducing a dual verification protocol to ensure correctness [29].
- LLMs can generate solvable problems that they themselves cannot solve, indicating a strength in knowledge recombination rather than original innovation [34].
- The quality of generated problems is assessed based on difficulty and the increase in difficulty compared to seed problems, providing reliable indicators of problem quality [34][38].

Group 5: Conclusion
- The AutoCode framework represents a significant advancement in using LLMs as problem setters for competitive programming, achieving state-of-the-art reliability in test case generation and producing new, competition-quality problems [36].
- Despite the model's strengths in algorithmic knowledge recombination, it struggles to introduce truly novel reasoning paradigms or flawless example designs [37].
Express | AI voice transforms market research: Keplar raises a $3.4 million seed round led by Kleiner Perkins
Z Potentials· 2025-09-22 03:54
Core Insights
- Keplar is a market research startup that uses voice AI technology to conduct customer interviews, offering faster and cheaper analysis reports than traditional market research firms [3][4]
- The company recently raised $3.4 million in seed funding led by Kleiner Perkins, with participation from SV Angel, Common Metal, and South Park Commons [3]
- Keplar's platform allows businesses to set up research projects in minutes, transforming product-related questions into interview guides [4]

Company Overview
- Founded in 2023 by Dhruv Guliani and William Wen, Keplar emerged from a founder incubation program [3]
- The startup aims to replace traditional market research methods, which rely on manual surveys and interviews, with conversational AI [4]
- Keplar's AI voice researcher can directly contact existing customers if granted access to the client's CRM system, producing reports and presentations similar to those from traditional research firms [5]

Technology and Innovation
- Advances in large language models (LLMs) have made it feasible for voice AI to conduct realistic conversations, often leading participants to forget they are interacting with AI [5]
- Keplar's clients include notable companies such as Clorox and Intercom, indicating its growing presence in the market [5]

Competitive Landscape
- Keplar is not the only AI company targeting the market research sector; competitors include Outset, which raised $17 million in Series A funding, and Listen Labs, which secured $27 million from Sequoia Capital [5]
From few-shot to thousand-shot! MachineLearningLM equips LLM in-context learning with a "machine learning engine"
机器之心· 2025-09-16 04:01
Core Insights
- The article discusses the limitations of large language models (LLMs) in in-context learning (ICL) and introduces a new framework called MachineLearningLM that significantly enhances LLM performance on various classification tasks without requiring downstream fine-tuning [2][7][22].

Group 1: Limitations of Existing LLMs
- Despite their extensive world knowledge and reasoning capabilities, LLMs struggle with ICL when faced with numerous examples, often plateauing in performance and being sensitive to example order and label biases [2].
- Previous methods relied on limited real task data, which restricted models' ability to generalize to new tasks [7].

Group 2: Innovations of MachineLearningLM
- MachineLearningLM introduces a continued pre-training framework that allows LLMs to learn from thousands of examples directly through ICL, achieving superior accuracy on binary and multi-class tasks across various fields [2][22].
- The framework utilizes a large synthetic dataset of over 3 million tasks generated through structural causal models (SCMs), ensuring no overlap with downstream evaluation sets and thus providing a fair assessment of model generalization [7][11].

Group 3: Methodology Enhancements
- The research incorporates a two-tier filtering mechanism using Random Forest models to enhance training stability and interpretability, addressing inconsistent task quality [11][12].
- MachineLearningLM employs efficient context example encoding strategies, such as compact table formats instead of verbose natural language descriptions, which improves data handling and inference efficiency [15][20].

Group 4: Performance Metrics
- The model's performance improves continuously as the number of in-context examples grows, achieving average accuracy that surpasses benchmark models such as GPT-5-mini by approximately 13 to 16 percentage points on various classification tasks [22][24].
- In MMLU benchmark tests, MachineLearningLM retains its original conversational and reasoning capabilities while achieving competitive zero-shot and few-shot accuracy [24][25].

Group 5: Application Potential
- The advances in many-shot in-context learning and numerical modeling position MachineLearningLM for broader applications in finance, healthcare, and scientific computing [26][28].
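The compact-table encoding idea from Group 3 is easy to picture: serialize many in-context examples as terse delimited rows rather than full sentences, so far more examples fit in the same context budget. The exact delimiter and layout below are our illustration, not the paper's format.

```python
# Sketch of a compact tabular prompt encoding for many-shot ICL.

def encode_examples(features, rows, labels):
    """Serialize labeled records as pipe-delimited rows under a header."""
    lines = ["|".join(features + ["label"])]
    for row, y in zip(rows, labels):
        lines.append("|".join(str(row[f]) for f in features) + f"|{y}")
    return "\n".join(lines)

features = ["age", "income", "owns_home"]
rows = [
    {"age": 34, "income": 72000, "owns_home": 1},
    {"age": 21, "income": 18000, "owns_home": 0},
]
prompt = encode_examples(features, rows, labels=["approve", "deny"])
print(prompt)
# age|income|owns_home|label
# 34|72000|1|approve
# 21|18000|0|deny
```

A row like `34|72000|1|approve` costs a handful of tokens, whereas the equivalent sentence ("a 34-year-old homeowner earning $72,000 was approved") costs several times more, which is what makes thousand-example prompts feasible.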
Do LLMs have a sense of identity? When an LLM discovers its game opponent is itself, its behavior changes
36Kr· 2025-09-01 02:29
Core Insights
- Research conducted by Columbia University and Polytechnique Montréal reveals that LLMs (Large Language Models) change their cooperation tendencies depending on whether they believe they are playing against themselves or another AI [1][29].

Group 1: Research Methodology
- The study utilized an Iterated Public Goods Game, a variant of the Public Goods Game, to analyze LLM behavior in cooperative settings [2][3].
- The game involved multiple rounds in which each model could contribute tokens to a public pool; total contributions were multiplied by a factor of 1.6 and then evenly distributed among the players [3][4].
- The research was structured as three distinct studies, each examining different conditions and configurations of the game [8][14].

Group 2: Key Findings
- In the first study, when LLMs were informed they were playing against "themselves," those prompted with collective terms tended to betray more, while those prompted with selfish terms cooperated more [15][16].
- The second study simplified the rules by removing reminders and reasoning prompts, yet the behavioral differences between the "No Name" and "Name" conditions persisted, indicating that self-recognition affects behavior beyond mere reminders [21][23].
- The third study had LLMs actually play against copies of themselves, revealing that under collective or neutral prompts, being told they were playing against themselves increased contributions, while under selfish prompts, contributions decreased [24][28].

Group 3: Implications
- The findings suggest that LLMs possess a form of self-recognition that influences their decision-making in multi-agent environments, which could have significant implications for the design of future AI systems [29].
- The research highlights the risk that AIs might unconsciously discriminate against one another, affecting tendencies toward cooperation or betrayal in complex scenarios [29].
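The payoff rule described in the methodology can be written out directly. The 1.6 multiplier and even split are from the article; the per-round endowment of 10 tokens and the "keep what you don't contribute" accounting are standard public-goods-game assumptions we add for a complete example.

```python
# Payoff rule of a public goods game round: pooled contributions are
# multiplied by 1.6 and split evenly; each player keeps the rest of a
# 10-token endowment (endowment accounting is our assumption).

def payoffs(contributions, endowment=10, multiplier=1.6):
    share = multiplier * sum(contributions) / len(contributions)
    return [endowment - c + share for c in contributions]

# Two players: mutual cooperation beats mutual defection,
# but a lone defector outearns a lone cooperator.
print(payoffs([10, 10]))  # [16.0, 16.0]
print(payoffs([0, 0]))    # [10.0, 10.0]
print(payoffs([10, 0]))   # [8.0, 18.0]
```

This is the usual social-dilemma tension: total welfare is maximized by full contribution, yet each individual round rewards free-riding, which is exactly the cooperate-or-betray choice the studies measure.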