BERT
Google's AI Backstory: Twenty Hidden Years, and a Breakneck 365 Days
36Kr · 2025-11-27 12:13
A year ago, Google was still cast as a company in "midlife crisis" in Silicon Valley's narrative. Just one year later, the story has changed completely. Gemini 3 has swept the major leaderboards, and the "banana" model Nano Banana Pro has pushed the precision and imagination of AI image generation to a new level. Earnings figures are the most direct footnote to this counterattack: as of the third quarter, the Gemini app's monthly active users surpassed 650 million, up sharply from the 450 million reported the previous quarter. Which raises the question: why has Google suddenly become so strong? In fact, this is no sudden eruption but an "elephant turning around": with unprecedented resolve and efficiency, Google is converting decades of accumulated AI technology reserves into product strength. Stretch the timeline further and a more striking thread emerges: from Larry Page's early vision of the "ultimate search engine," to the "cat paper," to DeepMind and the TPU, Google's AI investments over more than two decades run through nearly all the key milestones of modern deep learning. In the decade before the Transformer paper was published, almost every well-known AI researcher in the world had worked at Google at some point. This full-stack accumulation of technology and density of talent built Google a moat far deeper than imagined. The clues were buried long in advance, and the thread runs a thousand miles: Google's emphatic counterattack today was already embedded in its twenty-year investment puzzle. This year, let us retrace Google's ...
Diffusion Isn't Dead, and BERT Lives Forever. Karpathy's Late-Night Reflection: Should the Autoregressive Era End?
36Kr · 2025-11-05 04:44
Core Insights
- The article discusses Nathan Barry's approach to transforming BERT into a generative model via a diffusion process, arguing that BERT's masked language modeling can be viewed as a special case of text diffusion [1][5][26].

Group 1: Model Transformation
- Nathan Barry's research indicates that BERT can be adapted for text generation by modifying its training objective, specifically through a dynamic masking rate that varies from 0% to 100% [13][27].
- Diffusion models, initially successful in image generation, are applied to text by introducing noise and then iteratively denoising it, which aligns with the principles of masked language modeling [8][11].

Group 2: Experimental Validation
- Barry validated the idea on RoBERTa, a refined version of BERT, demonstrating that it can generate coherent text after being fine-tuned with a diffusion-style objective [17][21].
- Even without optimization, the RoBERTa Diffusion model produced surprisingly coherent outputs, indicating potential for further enhancements [24][25].

Group 3: Industry Implications
- Diffusion models could challenge existing autoregressive generators like GPT, suggesting a possible shift in the landscape of language modeling [30][32].
- The generative capabilities of language models can be significantly improved through new training objectives, opening avenues for future research and development [28][30].
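The generation loop described above, starting from a fully masked sequence and unmasking a growing fraction of positions at each denoising step, can be sketched as a toy sampler. This is an illustrative reconstruction, not Barry's code: `predict_fn` stands in for a real RoBERTa masked-LM forward pass, and the `canned` predictor below is purely hypothetical, used only to show the control flow.

```python
import random

MASK = "<mask>"

def diffusion_generate(seq_len, predict_fn, num_steps=4, seed=0):
    """Toy masked-diffusion sampler: begin with an all-<mask> sequence
    (100% noise) and, step by step, fill in positions until the mask
    rate reaches 0%. A real run would call a fine-tuned RoBERTa to
    pick each token; here predict_fn is an arbitrary stand-in."""
    rng = random.Random(seed)
    tokens = [MASK] * seq_len
    for step in range(1, num_steps + 1):
        # Schedule: after step k, roughly k/num_steps of positions are unmasked.
        target_unmasked = round(seq_len * step / num_steps)
        masked_positions = [i for i, t in enumerate(tokens) if t == MASK]
        to_fill = target_unmasked - (seq_len - len(masked_positions))
        for i in rng.sample(masked_positions, max(0, to_fill)):
            tokens[i] = predict_fn(tokens, i)  # "denoise" one position
    return tokens

# Hypothetical predictor: a real model would score the whole vocabulary
# given the partially unmasked context.
canned = ["the", "cat", "sat", "on", "mats"]
out = diffusion_generate(5, lambda toks, i: canned[i])
print(out)  # → ['the', 'cat', 'sat', 'on', 'mats']
```

Training mirrors this loop in reverse: instead of BERT's fixed 15% mask rate, examples are masked at rates sampled across the whole 0-100% range so the model learns to denoise at every noise level.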
Yang Hongxia, Former Large-Model Lead at Alibaba and ByteDance, Starts a Company: Large-Model Pretraining Is Not a Compute Race for a Few Top Players | AI Emergence Exclusive
Sohu Caijing · 2025-10-30 08:35
Core Insights
- Yang Hongxia, a key figure in large-model research from Alibaba and ByteDance, has launched a new AI company, InfiX.ai, focusing on decentralized model training and innovation in the AI space [1][15][36].
- InfiX.ai aims to democratize access to large-model training, allowing small and medium enterprises, research institutions, and individuals to participate in the process [4][16][19].

Company Overview
- InfiX.ai was founded by Yang Hongxia after her departure from ByteDance, with a focus on model-related technologies [1][15].
- The company quickly assembled a team of 40 people in Hong Kong, leveraging the region's strong talent pool and funding opportunities [3][15].

Technological Innovations
- InfiX.ai is developing a decentralized approach to large-model training, in contrast to the centralized models dominated by major institutions [4][16].
- The company has released the world's first FP8 training framework, which increases training speed and reduces memory consumption compared with the commonly used FP16/BF16 formats [7][10].
- InfiX.ai's model-fusion technology allows different domain-specific models to be integrated, reducing resource waste and enhancing knowledge sharing [10][16].

Market Positioning
- The company is targeting challenging fields, particularly healthcare and cancer detection, to demonstrate the capabilities of its models [15][41].
- The approach is gaining traction, with increasing interest from investors and a shift in industry perception of decentralized model training [15][36].

Future Vision
- Yang Hongxia envisions a future where every organization has its own expert model, facilitated by model fusion across domains and geographical boundaries [16][19].
- The company aims to make model training accessible and affordable, fostering a collaborative environment for AI development [16][19].
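A rough illustration of why FP8 matters for the memory budget (this is back-of-the-envelope arithmetic, not InfiX.ai's actual accounting, which the summary does not detail): weights and gradients drop from 2 bytes per parameter in BF16 to 1 byte in FP8, while full-precision optimizer state typically stays fixed.

```python
def training_memory_gib(num_params, weight_bytes, grad_bytes, optim_bytes=8):
    """Back-of-the-envelope training memory: weights + gradients + Adam
    moments (two FP32 values per parameter). Ignores activations, FP32
    master weights, and framework overhead, so real numbers run higher."""
    return num_params * (weight_bytes + grad_bytes + optim_bytes) / 1024**3

params = 7e9  # a hypothetical 7B-parameter model
bf16 = training_memory_gib(params, weight_bytes=2, grad_bytes=2)
fp8 = training_memory_gib(params, weight_bytes=1, grad_bytes=1)
print(f"BF16: {bf16:.1f} GiB  FP8: {fp8:.1f} GiB")
```

The saving shown here comes only from weight and gradient bytes; FP8's throughput gains on hardware with native FP8 tensor cores are a separate effect.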
The Embedding Black Box Is History! A New Framework Has Models "Explain First, Then Learn the Embedding"
量子位· 2025-10-21 09:05
Core Insights
- The article introduces GRACE, a new explainable generative embedding framework developed by researchers from multiple universities, aimed at addressing the limitations of traditional text embedding models [1][6].

Group 1: Background and Limitations
- Text embedding models have evolved from BERT to various newer models, mapping text into vector spaces for tasks like semantic retrieval and clustering [3].
- A common flaw in these models is treating large language models as "mute encoders," which output vectors without explaining the similarity between texts [4].
- This black-box representation becomes a bottleneck in tasks requiring high interpretability and robustness, such as question-answer matching and cross-domain retrieval [5].

Group 2: GRACE Framework Overview
- GRACE recasts "contrastive learning" as "reinforcement learning," redefining the meaning of contrastive learning signals [6].
- The framework emphasizes generating explanations (rationales) for text before learning embeddings, allowing the model to produce logical and semantically consistent reasoning [7][25].
- GRACE consists of three key modules:
  1. Rationale-Generating Policy, which generates explanatory reasoning chains for input texts [8].
  2. Representation Extraction, which combines the input and its rationale to compute the final embedding [9].
  3. Contrastive Rewards, which redefines the contrastive-learning objective as a reward function for reinforcement-learning updates [11].

Group 3: Training Process
- GRACE can be trained in both supervised and unsupervised manners, utilizing labeled query-document pairs and self-alignment techniques [12][18].
- In the supervised phase, the model learns semantic relationships from a dataset of 1.5 million samples [13].
- The unsupervised phase generates multiple rationales for each text, encouraging consistent representations across different explanations [17].

Group 4: Experimental Results
- GRACE was evaluated across 56 datasets in various tasks, showing significant performance improvements over baseline models in retrieval, pair classification, and clustering [19][20].
- The results indicate that GRACE enhances embedding capabilities without sacrificing generative abilities, and provides transparent representations that users can inspect [25][27].

Group 5: Conclusion
- Overall, GRACE represents a paradigm shift in embedding models, moving toward a framework that can explain its understanding process, thus enhancing both performance and interpretability [28].
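A reward of the general shape described for the Contrastive Rewards module above can be sketched as a log-softmax over embedding similarities (an InfoNCE-style signal). This is an assumption about the form, not the paper's exact reward function: higher reward when the rationale-conditioned query embedding is closer to the positive document than to the negatives.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (assumes non-zero norms)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contrastive_reward(query_emb, pos_emb, neg_embs, temperature=0.05):
    """InfoNCE-style reward: log-probability of the positive document
    under a softmax over temperature-scaled similarities. In an RL view,
    this scalar scores the rationale that produced query_emb."""
    sims = [cosine(query_emb, pos_emb)] + [cosine(query_emb, n) for n in neg_embs]
    logits = [s / temperature for s in sims]
    m = max(logits)  # stabilize the log-sum-exp
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[0] - log_z  # log-softmax of the positive; at most 0
```

The reward is always non-positive and approaches 0 as the positive pair dominates the negatives, which makes it a natural drop-in objective for a policy-gradient update over rationale generation.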
X @THE HUNTER ✴️
GEM HUNTER 💎· 2025-09-23 16:57
Cryptocurrency Trends
- The post identifies a list of trending cryptocurrencies, including DOG, TOSHI, ASTER, APEX, MOMO, TRUMP, WLFI, PUMP, SUN, UFD, TROLL, BERT, NMR, BITCOIN, and BLESS [1].
- The post acknowledges that the list is incomplete and seeks community input to identify missing cryptocurrencies [1].
Zhang Xiaojun in Conversation with OpenAI's Yao Shunyu: A System That Generates New Worlds
Founder Park· 2025-09-15 05:59
Core Insights
- The article discusses the evolution of AI, particularly the transition to the "second half" of AI development, emphasizing the importance of language and reasoning in creating more generalizable AI systems [4][62].

Group 1: AI Evolution and Language
- The concept of AI has evolved from rule-based systems to deep reinforcement learning, and now to language models that can reason and generalize across tasks [41][43].
- Language is highlighted as a fundamental tool for generalization, allowing AI to tackle a variety of tasks by leveraging reasoning capabilities [77][79].

Group 2: Agent Systems
- The definition of an "Agent" has expanded to include systems that interact with their environment and make decisions based on reasoning, rather than just following predefined rules [33][36].
- Language agents represent a significant shift, as they can perform tasks in more complex environments, such as coding and internet navigation, which were previously challenging for AI [43][54].

Group 3: Task Design and Reward Mechanisms
- The current bottleneck lies in defining effective tasks and environments for AI training, rather than in model training itself [62][64].
- Intrinsic rewards, based on outcomes rather than processes, are proposed as a key factor for successful reinforcement-learning applications [88][66].

Group 4: Future Directions
- The future of AI development combines enhancing agent capabilities through better memory systems and intrinsic rewards with exploring multi-agent systems [88][89].
- AI's potential to generalize across tasks is highlighted, with coding and mathematical tasks serving as prime examples of areas where it can excel [80][82].
LeCun's Team Reveals the Nature of LLM Semantic Compression: Extreme Statistical Compression at the Cost of Detail
量子位· 2025-07-04 01:42
Core Viewpoint
- The article discusses the differences in semantic compression strategies between large language models (LLMs) and human cognition, highlighting that LLMs favor statistical compression while humans prioritize detail and context [4][17].

Group 1: Semantic Compression
- Semantic compression allows efficient organization of knowledge and quick categorization of the world [3].
- A new information-theoretic framework was proposed to compare the compression strategies of humans and LLMs [4].
- The study reveals fundamental differences in compression efficiency and semantic fidelity between LLMs and humans, with LLMs leaning toward extreme statistical compression [5][17].

Group 2: Research Methodology
- The research team established a robust human concept-classification benchmark based on classic cognitive-science studies, covering 1,049 items across 34 semantic categories [5][6].
- The dataset provides category membership information and human "typicality" ratings, reflecting deep structures in human cognition [6][7].
- Over 30 LLMs were selected for evaluation, with parameter counts ranging from 300 million to 72 billion, ensuring a fair comparison against the human benchmark [8].

Group 3: Findings and Implications
- LLMs' concept classifications align with human semantic categories significantly better than chance, validating their basic capabilities in semantic organization [10][11].
- However, LLMs struggle with fine-grained semantic distinctions, indicating a mismatch between their internal concept structures and human intuitive category assignments [14][16].
- LLMs prioritize reducing redundant information, while humans emphasize adaptability and richness, maintaining contextual integrity [17].

Group 4: Research Contributors
- The research was conducted collaboratively by Stanford University and New York University, with Chen Shani as the lead author [19][20].
- Yann LeCun, a prominent figure in AI and a co-author of the study, has significantly influenced the evolution of AI technologies [24][25][29].
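The tradeoff the framework formalizes, compression efficiency versus semantic fidelity, can be made concrete with a toy example (the paper's actual information-theoretic objective is not reproduced here): coarser categories cost fewer bits to describe but lose within-category detail, and the two strategies differ in which side of that tradeoff they favor.

```python
import math
from collections import Counter

def description_length_bits(assignment):
    """Entropy of the cluster assignment: the 'statistical compression'
    side. Fewer, more uniform clusters mean fewer bits per item."""
    counts = Counter(assignment)
    n = len(assignment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def distortion(items, assignment):
    """Mean squared distance to each cluster centroid: the 'semantic
    fidelity' side. Lumping distinct items together raises it."""
    clusters = {}
    for x, c in zip(items, assignment):
        clusters.setdefault(c, []).append(x)
    total = 0.0
    for members in clusters.values():
        centroid = sum(members) / len(members)
        total += sum((x - centroid) ** 2 for x in members)
    return total / len(items)

# Toy 1-D "items" forming two obvious groups.
items = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
coarse = [0, 0, 0, 0, 0, 0]  # one concept: minimal bits, high distortion
fine = [0, 0, 0, 1, 1, 1]    # two concepts: more bits, low distortion
```

In this vocabulary, an extreme statistical compressor prefers `coarse` (zero bits of assignment entropy), while a fidelity-preserving strategy pays the extra bit for `fine` to keep the two groups distinct.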
A Roundup of the Important LLM Papers Since the 2017 Transformer
机器之心· 2025-06-29 04:23
Core Insights
- The article discusses Andrej Karpathy's concept of "Software 3.0," where natural language becomes the new programming interface and AI models execute specific tasks [1][2].
- It emphasizes the transformative impact of this shift on developers, users, and software design paradigms, indicating that a new computational framework is being constructed [2].

Development of LLMs
- The evolution of large language models (LLMs) has accelerated since the introduction of the Transformer architecture in 2017, leading to significant advancements in the GPT series and multimodal capabilities [3][5].
- Key foundational papers that established today's AI capabilities are reviewed, highlighting the transition from traditional programming to natural-language interaction [5][6].

Foundational Theories
- "Attention Is All You Need" (2017) introduced the Transformer architecture, which relies solely on self-attention mechanisms, revolutionizing natural language processing and computer vision [10][11].
- "Language Models are Few-Shot Learners" (2020) demonstrated the capabilities of GPT-3, establishing the "large model + large data" scaling law as a pathway to more general artificial intelligence [13][18].
- "Deep Reinforcement Learning from Human Preferences" (2017) laid the groundwork for reinforcement learning from human feedback (RLHF), crucial for aligning AI outputs with human values [15][18].

Milestone Breakthroughs
- The "GPT-4 Technical Report" (2023) details a large-scale, multimodal language model that exhibits human-level performance across various benchmarks, emphasizing the importance of AI safety and alignment [26][27].
- The LLaMA models (2023) demonstrated that smaller models trained on extensive datasets can outperform larger models, promoting a new approach to model efficiency [27][30].

Emerging Techniques
- Chain-of-thought prompting enhances reasoning in LLMs by guiding them to articulate their thought processes before arriving at conclusions [32][33].
- "Direct Preference Optimization" (2023) simplifies the alignment of language models by directly utilizing human preference data, making it a widely adopted method in industry [34][35].

Important Optimizations
- The PagedAttention mechanism improves memory management for LLMs, significantly enhancing throughput and reducing memory usage during inference [51][52].
- The Mistral 7B model showcases how smaller models can achieve high performance through innovative architecture, influencing the development of efficient AI applications [55][56].
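The Direct Preference Optimization objective mentioned above has a compact closed form. A per-pair sketch follows; the log-probabilities here are toy scalars, whereas a real implementation would sum token log-probs of each full response under the trained policy and a frozen reference model.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair:
        -log sigmoid(beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l)))
    The margin compares how much the policy has raised the chosen
    response's log-prob, relative to the reference, versus the rejected one."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

The loss falls as the policy favors the chosen response over the rejected one relative to the reference, which is how DPO replaces the separate reward model and RL loop of RLHF with a single supervised objective.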