LeCun's Team Reveals the Essence of LLM Semantic Compression: Extreme Statistical Compression Sacrifices Detail
QbitAI· 2025-07-04 01:42
Core Viewpoint
- The article discusses the differences in semantic compression strategies between large language models (LLMs) and human cognition, highlighting that LLMs focus on statistical compression while humans prioritize detail and context [4][17].

Group 1: Semantic Compression
- Semantic compression allows efficient organization of knowledge and quick categorization of the world [3].
- A new information-theoretic framework was proposed to compare the strategies of humans and LLMs in semantic compression [4].
- The study reveals fundamental differences in compression efficiency and semantic fidelity between LLMs and humans, with LLMs leaning towards extreme statistical compression [5][17].

Group 2: Research Methodology
- The research team established a robust human concept classification benchmark based on classic cognitive science studies, covering 1,049 items across 34 semantic categories [5][6].
- The dataset provides category affiliation information and human ratings of "typicality," reflecting deep structures in human cognition [6][7].
- Over 30 LLMs were selected for evaluation, with parameter sizes ranging from 300 million to 72 billion, ensuring a fair comparison with human cognitive benchmarks [8].

Group 3: Findings and Implications
- The study found that LLMs' concept classification results align significantly better with human semantic classification than random levels, validating LLMs' basic capabilities in semantic organization [10][11].
- However, LLMs struggle with fine-grained semantic differences, indicating a mismatch between their internal concept structures and human intuitive category assignments [14][16].
- The research highlights that LLMs prioritize reducing redundant information, while humans emphasize adaptability and richness, maintaining context integrity [17].

Group 4: Research Contributors
- The research was conducted collaboratively by Stanford University and New York University, with Chen Shani as the lead author [19][20].
- Yann LeCun, a prominent figure in AI and a co-author of the study, has significantly influenced the evolution of AI technologies [24][25][29].
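The compression-versus-fidelity tradeoff the study describes can be illustrated with a toy sketch of my own (not the paper's actual framework): treat the entropy of a category scheme as its compression cost, and the spread of items within each category as the detail it discards. The function names and the 1-D "embeddings" below are illustrative assumptions.

```python
import math

def entropy(labels):
    # Shannon entropy (bits) of a category assignment: lower = more compressed.
    n = len(labels)
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def distortion(points, labels):
    # Mean squared distance to the category centroid: higher = more detail lost.
    clusters = {}
    for p, c in zip(points, labels):
        clusters.setdefault(c, []).append(p)
    total = 0.0
    for members in clusters.values():
        centroid = sum(members) / len(members)
        total += sum((p - centroid) ** 2 for p in members)
    return total / len(points)

# Toy 1-D "embeddings": a fine-grained scheme vs an extreme compressor.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
fine = ["a", "a", "a", "b", "b", "b"]    # keeps detail, costs more bits
coarse = ["x", "x", "x", "x", "x", "x"]  # one category: maximal compression

print(entropy(fine), distortion(points, fine))      # higher cost, low distortion
print(entropy(coarse), distortion(points, coarse))  # zero cost, high distortion
```

An "extreme statistical compressor" in this toy sense drives the entropy term to zero while letting distortion grow, which is the direction the article says LLMs lean relative to humans.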
A Rundown of the Important Papers in the LLM Field Since the 2017 Transformer
Jiqizhixin· 2025-06-29 04:23
Core Insights
- The article discusses Andrej Karpathy's concept of "Software 3.0," where natural language becomes the new programming interface, and AI models execute specific tasks [1][2].
- It emphasizes the transformative impact of this shift on developers, users, and software design paradigms, indicating a new computational framework is being constructed [2].

Development of LLMs
- The evolution of Large Language Models (LLMs) has accelerated since the introduction of the Transformer architecture in 2017, leading to significant advancements in the GPT series and multimodal capabilities [3][5].
- Key foundational papers that established today's AI capabilities are reviewed, highlighting the transition from traditional programming to natural language interaction [5][6].

Foundational Theories
- The paper "Attention Is All You Need" (2017) introduced the Transformer architecture, which relies solely on self-attention mechanisms, revolutionizing natural language processing and computer vision [10][11].
- "Language Models are Few-Shot Learners" (2020) demonstrated the capabilities of GPT-3, establishing the "large model + large data" scaling law as a pathway to more general artificial intelligence [13][18].
- "Deep Reinforcement Learning from Human Preferences" (2017) laid the groundwork for reinforcement learning from human feedback (RLHF), crucial for aligning AI outputs with human values [15][18].

Milestone Breakthroughs
- The "GPT-4 Technical Report" (2023) details a large-scale, multimodal language model that exhibits human-level performance across various benchmarks, emphasizing the importance of AI safety and alignment [26][27].
- The release of LLaMA models (2023) demonstrated that smaller models trained on extensive datasets could outperform larger models, promoting a new approach to model efficiency [27][30].
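The self-attention mechanism at the heart of "Attention Is All You Need" can be sketched as scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. Below is a minimal pure-Python illustration; the toy dimensions and values are my own, and real implementations use tensor libraries, multiple heads, and learned projections.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# One query attending over two key/value pairs; the query matches the
# first key, so the output mixes the values with more weight on V[0].
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(attention(Q, K, V))
```

Because the weights come from a softmax, each output row is a convex combination of the value vectors, which is what lets the model softly "look at" every position at once.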
Emerging Techniques
- The "Chain-of-Thought Prompting" technique enhances reasoning in LLMs by guiding them to articulate their thought processes before arriving at conclusions [32][33].
- "Direct Preference Optimization" (2023) simplifies the alignment process of language models by directly utilizing human preference data, making it a widely adopted method in the industry [34][35].

Important Optimizations
- The "PagedAttention" mechanism improves memory management for LLMs, significantly enhancing throughput and reducing memory usage during inference [51][52].
- The "Mistral 7B" model showcases how smaller models can achieve high performance through innovative architecture, influencing the development of efficient AI applications [55][56].
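The DPO objective mentioned above can be sketched as a single loss over log-probabilities: the policy is trained to widen its preference margin for the chosen answer over a frozen reference model, with no separate reward model. A minimal sketch assuming the standard form of the loss; the numeric log-probabilities below are made up for illustration.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    # DPO: -log sigmoid(beta * ((logp_w - ref_w) - (logp_l - ref_l)))
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss falls as the policy prefers the chosen answer more strongly
# than the reference model does.
weak = dpo_loss(-10.0, -10.0, -10.0, -10.0)   # no preference: margin 0
strong = dpo_loss(-8.0, -12.0, -10.0, -10.0)  # chosen up, rejected down
print(weak, strong)
```

In practice the log-probabilities come from summing per-token log-probs of whole responses under the policy and reference models; the scalar `beta` controls how far the policy may drift from the reference.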
AI Research Under the ESG Framework (Part 1): Multi-Dimensional Investment Efficiency Gains and Ethical Risk Prevention
ZHESHANG SECURITIES· 2025-06-05 14:23
Group 1: AI and ESG Investment Infrastructure
- AI is expected to significantly enhance ESG investment infrastructure by addressing challenges such as high compliance costs and difficulties in data acquisition and analysis [2]
- AI can help regulatory bodies reduce tracking costs and improve the implementation of ESG policies through dynamic monitoring and cross-validation systems [2]
- Companies can utilize AI tools like knowledge graphs to analyze policies and automate compliance reporting, thereby lowering compliance costs and encouraging ESG practices [2]

Group 2: AI's Role in Investment Strategy and Marketing
- Traditional ESG data faces issues like low update frequency and high processing costs; AI can streamline data collection and analysis, providing timely insights for investors [3]
- Machine learning algorithms can assist in constructing and selecting factor strategies, optimizing risk-return profiles for investors [3]
- Generative AI can significantly reduce marketing costs by generating marketing strategies and content, enhancing investor engagement [3]

Group 3: Responsible AI and Ethical Risk Management
- The integration of responsible AI principles with ESG frameworks can help identify companies with ethical risks associated with AI, aiding investors in risk management [4]
- AI's dual impact on environmental, social, and governance aspects necessitates a robust ethical risk analysis framework to mitigate potential negative consequences [4]
- Investors can leverage communication with companies to gather information on AI governance measures, enhancing their understanding of associated risks [4]

Group 4: Risk Considerations
- Potential risks include slower-than-expected economic recovery, instability of AI models, and fluctuations in market sentiment and preferences [5]
AI Wave Chronicles | Wang Sheng: Seek a Window of Opportunity; AI Startups Should Not Fight Giants for Turf
Bei Ke Cai Jing· 2025-05-30 02:59
Core Insights
- Beijing is emerging as a strategic hub in the AI large model sector, driven by technological innovation and a supportive ecosystem for breakthroughs [1]
- The role of angel investors is crucial in the AI industry, providing essential support to startups and helping them take their first steps [4]
- The AI large model wave has gained momentum globally since 2023, with early investments in generative models proving to be prescient [5][6]

Group 1: AI Development and Investment Trends
- The AI large model trend is characterized by a shift from previous waves focused on computer vision and autonomous driving to the current emphasis on AI agents and embodied intelligence [5][6]
- Investors are increasingly favoring experienced founders with strong academic and research backgrounds, as seen in the case of companies like DeepMind and the Tsinghua NLP team [12][16]
- The emergence of open-source models like Llama has accelerated competition among AI companies, allowing them to shorten development timelines [13]

Group 2: Investment Strategies and Market Dynamics
- Angel investors are focusing on a select number of projects, often operating in a "water under the bridge" manner, avoiding fully marketized projects [14][15]
- The investment landscape is divided between long-term oriented funds that prioritize innovation and those focused on immediate revenue generation [21][22]
- The success of companies like DeepSeek highlights the challenges faced by startups in competing with established giants, as the consensus around large models has solidified post-ChatGPT [26][27]

Group 3: Entrepreneurial Characteristics and Market Challenges
- Current AI entrepreneurs are predominantly scientists or technical experts, forming a close-knit community that is easier to identify and engage with [18][19]
- The academic foundation of AI startups is critical, as many successful ventures are built on decades of research and development from their respective institutions [16][20]
- The market is witnessing a shift where the ability to innovate is becoming more important than merely having financial resources, as the previous model of "buying capability" is no longer sustainable [27][28]
DeepSeek Technology Origins and Frontier Exploration Report
Zhejiang University· 2025-05-22 01:20
Zhejiang University DS Series Special Topic: DeepSeek Technology Origins and Frontier Exploration. Speaker: Zhu Qiang, College of Computer Science and Technology, Zhejiang University; Collaborative Innovation Center for Artificial Intelligence (Zhejiang University). https://person.zju.edu.cn/zhuq

Outline: 1. Language Models; 2. Transformer; 3. ChatGPT; 4. DeepSeek; 5. New-Generation Intelligent Agents

Language models, the ultimate goal: language modeling means computing, for any word sequence, the probability that the sequence forms a sentence. We deal with language models every day, for example when ranking candidate continuations such as "I saw a cat", "I saw a cat on the chair", "I saw a cat running after a dog", "I saw a car", "I saw a cat in my dream".

Language models, the basic task: encoding, i.e. letting computers understand human language. One-hot Encoding represents each word of "She is my mom" as a vector containing a single 1 with all other entries 0. What are the drawbacks of One-hot Encoding? Word Embedding addresses them ("A bottle of tez ...").
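The slide's two ideas, scoring word sequences and one-hot word encoding, can be sketched together. This is a toy illustration of my own; the tiny corpus and the bigram model are assumptions, not material from the talk.

```python
from collections import defaultdict

corpus = [
    "I saw a cat",
    "I saw a cat on the chair",
    "I saw a cat running after a dog",
]

# One-hot encoding: each word becomes a vector with a single 1.
vocab = sorted({w for s in corpus for w in s.split()})
def one_hot(word):
    return [1 if w == word else 0 for w in vocab]

# A bigram model estimates P(sentence) as a product of P(next | previous).
counts = defaultdict(lambda: defaultdict(int))
for s in corpus:
    words = ["<s>"] + s.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1

def sentence_prob(sentence):
    p = 1.0
    words = ["<s>"] + sentence.split()
    for prev, nxt in zip(words, words[1:]):
        total = sum(counts[prev].values())
        if total == 0 or counts[prev][nxt] == 0:
            return 0.0  # unseen bigram: zero under this toy model
        p *= counts[prev][nxt] / total
    return p

print(sentence_prob("I saw a cat"))  # seen continuation: nonzero
print(sentence_prob("I saw a car"))  # unseen continuation: zero
```

The one-hot drawback the slide hints at is visible here: every pair of one-hot vectors is equally distant, so "cat" is no closer to "dog" than to "chair", which is exactly what dense word embeddings fix.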
10 Key Moments in AI History, Explained in One Article!
Ji Qi Ren Quan· 2025-05-06 12:30
Core Viewpoint
- By 2025, artificial intelligence (AI) has transitioned from a buzzword in tech circles to an integral part of daily life, impacting various industries through applications like image generation, coding, autonomous driving, and medical diagnosis. The evolution of AI is marked by significant breakthroughs and challenges, tracing back to the Dartmouth Conference in 1956, leading to the current technological wave driven by large models [1].

Group 1: Historical Milestones
- The Dartmouth Conference in 1956 is recognized as the birth of AI, where pioneers gathered to explore machine intelligence, laying the foundation for AI as a formal discipline [2][3].
- In 1957, Frank Rosenblatt developed the Perceptron, an early artificial neural network that introduced the concept of optimizing models using training data, which became central to machine learning and deep learning [4][6].
- ELIZA, created in 1966 by Joseph Weizenbaum, was the first widely recognized chatbot, demonstrating the potential of AI in natural language processing by simulating human-like conversation [7][8].
- The rise of expert systems in the 1970s, such as Dendral and MYCIN, showcased AI's ability to perform specialized tasks in fields like chemistry and medical diagnosis, establishing its application in professional domains [9][11].
- IBM's Deep Blue defeated world chess champion Garry Kasparov in 1997, marking a significant milestone in AI's capability to outperform humans in strategic decision-making [12][14].
- The 1990s to 2000s saw a shift towards data-driven algorithms in AI, emphasizing the importance of machine learning [15].
- The emergence of deep learning in 2012, particularly through the work of Geoffrey Hinton, revolutionized AI by utilizing multi-layer neural networks and backpropagation techniques, leading to significant advancements in model training [17][18].
- The introduction of Generative Adversarial Networks (GANs) in 2014 by Ian Goodfellow transformed the field of generative models, enabling the creation of realistic synthetic data [20].
- AlphaGo's victory over Lee Sedol in 2016 highlighted AI's potential in complex games requiring intuition and strategic thinking, further pushing the boundaries of AI capabilities [22].
- The development of large language models began with the introduction of the Transformer architecture in 2017, leading to models like GPT-3, which demonstrated emergent abilities and set the stage for the current AI landscape [24][26].
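Rosenblatt's perceptron rule mentioned above, adjusting weights only when an example is misclassified, can be sketched in a few lines. The toy OR dataset and learning rate below are my own choices for illustration.

```python
def train_perceptron(data, epochs=10, lr=1.0):
    # Rosenblatt's rule: on a mistake, nudge weights toward the correct label.
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), label in data:  # label is +1 or -1
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else -1
            if pred != label:
                w[0] += lr * label * x1
                w[1] += lr * label * x2
                b += lr * label
    return w, b

# Linearly separable toy data: logical OR.
data = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_perceptron(data)
predict = lambda x1, x2: 1 if w[0] * x1 + w[1] * x2 + b > 0 else -1
print([predict(x1, x2) for (x1, x2), _ in data])
```

On linearly separable data this procedure is guaranteed to converge; its famous failure on XOR is part of what triggered the first "AI winter" before multi-layer networks and backpropagation revived the field.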
The First AI That Can Accurately Answer Rocket-Engine Questions Is Here! Musk: Grok 3.5 Launches Next Week [With an Analysis of the Current State of the Large-Model Industry]
Sou Hu Cai Jing· 2025-05-04 09:57
Grok was developed by xAI, the company Musk formed by combining the social platform X (formerly Twitter) with an artificial intelligence team; Grok is its core product.

Grok's breakthrough lies in its "opinionated" conversational philosophy. Unlike the "neutralized" voice of traditional AI, Grok is built on the "maximally truth-seeking" idea Musk champions and is trained on X's massive data, combining real-time responsiveness, a humorous style, and a mechanism for handling controversial topics.

Looking back: in July 2024, Musk disclosed that Grok 3's training used 100,000 NVIDIA H100 chips. On January 3, 2025, he announced that Grok 3 would be released soon; on January 27, the version began internal testing on a standalone platform and on X; on February 18, xAI officially launched Grok 3; on February 20, Grok 3 opened to the public for free and quickly topped the free-app download chart on Apple's App Store.

A large model is a neural network model with an extremely large number of parameters (typically over one billion). Inspired by the structure of the human brain's nervous system, these models consist of artificial neurons (nodes) and the connections between them; by adjusting the weights of these connections, the network learns and adapts to patterns in the input data.

Today, global large models are undergoing a paradigm shift from a "performance race" to "value creation"; the balance between technical breakthroughs and ethical constraints, innovation in open-source ecosystems and business models, and the fusion of vertical scenarios with cross-industry knowledge will determine ...
Is a Green Card Blocking Asian AI Talent's Path Up the Silicon Valley Ladder?
36Kr· 2025-04-28 11:23
Core Viewpoint
- The article highlights the precarious situation faced by skilled immigrants in the U.S. tech industry, particularly in light of tightening immigration policies, as exemplified by the case of Kai Chen, a prominent AI researcher who was forced to leave the U.S. after her green card application was denied [2][4][5].

Group 1: Impact of Immigration Policies
- The tightening of immigration policies under the Trump administration has created a new barrier for skilled workers in the tech industry, particularly affecting those on H1B visas [2][6][18].
- Kai Chen's experience reflects a broader trend where even highly qualified individuals with significant contributions to their companies can suddenly find themselves at risk of deportation [4][5][6].
- The article notes that over 1,000 international students have had their visas revoked, illustrating the widespread impact of these policies across various sectors [16][18].

Group 2: Demographics and Contributions of Asian Talent
- Asian representation in major U.S. AI companies is significant, with Asians making up 45.7% of Google's workforce, surpassing the percentage of white employees [7][9].
- The article emphasizes that while Asian talent, particularly of Indian and Chinese descent, has been rising in the tech industry, they often face challenges in career advancement due to office politics [9][10].
- Despite these challenges, the AI sector has provided new opportunities for Asian professionals, allowing them to leverage their technical skills for career growth [10][11].

Group 3: Future Prospects for Talent
- The article discusses the potential for skilled workers like Kai Chen to seek opportunities outside the U.S., as companies in Europe and China are actively recruiting top talent [19][20].
- Major Chinese tech firms are launching initiatives to attract high-end talent, indicating a shift in where skilled professionals may choose to work in the future [20][21].
- The narrative suggests that while the U.S. has historically been a magnet for talent, the current political climate may lead to a redistribution of skilled workers globally [19][22].