Large Language Models
AI and Knowledge Graphs: An Overview of Knowledge Graphs in Artificial Intelligence
36Kr · 2025-05-30 03:48
Core Insights
- Knowledge Graphs (KGs) are structured networks of entities and their relationships, providing a powerful tool for semantic understanding and data integration in artificial intelligence [1][2][3]
- The concept of Knowledge Graphs was popularized by Google in 2012, building on decades of research in semantic networks and ontologies [1][8]
- Future innovations will focus on automating the construction of Knowledge Graphs, enhancing reasoning capabilities, and integrating them closely with AI models [1][9]

Definition and Structure
- Knowledge Graphs represent knowledge as a network of entities (nodes) and their relationships (edges), allowing for flexible data modeling [2]
- Each node corresponds to a real-world concept identified by a unique ID or URI, while edges represent specific relationships between entities [2]

Role in Artificial Intelligence
- Knowledge Graphs play a crucial role in machine reasoning and semantic understanding by providing structured background knowledge for AI systems [3][4]
- They facilitate knowledge integration by linking information from multiple sources, creating a unified view [3][5]
- Knowledge Graphs enhance semantic richness, improving the performance of AI technologies such as machine learning and natural language processing [3][5]

Significance and Benefits
- Knowledge Graphs embed knowledge into AI systems, reducing the need for extensive training data by providing prior knowledge [5][6]
- They improve transfer learning by allowing AI systems to apply knowledge across different tasks without retraining [6]
- Knowledge Graphs contribute to explainable AI by providing transparent representations of facts and their connections, enhancing trust in AI decisions [6][7]

Data Integration and Interoperability
- Knowledge Graphs use shared vocabularies and identifiers to achieve interoperability between systems, acting as a common language for data integration [7]
- They are essential for building large-scale AI systems, as demonstrated by Google's use of Knowledge Graphs to enhance search results [7]

Historical Evolution
- The term "Knowledge Graph" gained popularity in 2012, but its underlying concepts date back to the 1960s and semantic networks [8]
- The development of standards such as RDF and OWL has facilitated the interconnection of data on the web, laying the groundwork for modern Knowledge Graphs [8]

Recent Developments
- From 2023 to 2025, significant progress is expected in integrating Knowledge Graphs with large language models (LLMs) to enhance reasoning capabilities [9][10]
- Research is focused on using LLMs as external knowledge sources for Knowledge Graphs, improving factual accuracy and the handling of complex queries [10][11]

Emerging Trends
- The collaboration between Knowledge Graphs and LLMs is a key research area, aiming to combine symbolic reasoning with neural language understanding [16]
- There is a growing emphasis on domain-specific Knowledge Graphs, particularly in fields like biomedicine and law, which require customized ontologies and algorithms [16]
- Advances in Knowledge Graph embedding techniques are expected to address challenges related to dynamic knowledge and multimodal data integration [12][16]
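The node-and-edge structure described above can be made concrete with a minimal triple store. This is a toy sketch, not any production KG system; the Q-style IDs follow Wikidata's convention, and the relations are invented for illustration:

```python
# Minimal knowledge-graph sketch: entities are nodes identified by unique
# IDs, and relationships are directed, labeled edges stored as triples.
# IDs follow Wikidata's convention; relations are invented for illustration.

triples = [
    ("Q90",  "capital_of",  "Q142"),  # Paris -> France
    ("Q142", "member_of",   "Q458"),  # France -> European Union
    ("Q90",  "instance_of", "city"),
]

labels = {"Q90": "Paris", "Q142": "France", "Q458": "European Union"}

def neighbors(subject, relation):
    """Return all objects linked to `subject` by `relation`."""
    return [o for s, r, o in triples if s == subject and r == relation]

# Two-hop query: which union is the country containing Paris a member of?
country = neighbors("Q90", "capital_of")[0]
print(labels[neighbors(country, "member_of")[0]])  # -> European Union
```

Chaining `neighbors` calls like this is the simplest form of the graph traversal that gives KGs their reasoning and integration value: each hop follows an explicit, labeled edge rather than an opaque learned association.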
HKMA and HKUST Sign Memorandum of Cooperation to Promote Cybersecurity Innovation in Hong Kong's Financial Industry
Zhitong Finance · 2025-05-29 03:26
Core Viewpoint
- The Hong Kong Monetary Authority (HKMA) and the Hong Kong University of Science and Technology (HKUST) Business School have signed a memorandum of cooperation to enhance collaboration in cybersecurity research, addressing the needs of the Hong Kong financial industry [1][2]

Group 1: Collaboration Details
- The memorandum establishes a strategic cooperation framework focused on cybersecurity, aiming to promote relevant research and knowledge growth [1]
- The collaboration will utilize advanced technologies such as large language models to explore innovative supervisory technology (Suptech) and regulatory technology (Regtech) solutions [1]
- The goal is to enhance the HKMA's regulatory capabilities and strengthen the financial sector's cybersecurity resilience [1]

Group 2: Objectives and Impact
- The partnership aims to develop practical application solutions, increase industry awareness of emerging threats, and cultivate cybersecurity professionals to support the ongoing development of the financial industry [1]
- HKMA and HKUST will actively engage with financial institutions to validate research outcomes and gain deeper insights into the evolving cybersecurity needs and challenges faced by the industry [1]
- The collaboration is expected to contribute to the resilience of Hong Kong's financial ecosystem by addressing real-world cybersecurity challenges [2]
Joe Tsai: Most Robots Don't Need to Look Human; For Young People, Choosing a Boss Matters More Than Choosing a Job
Sohu Finance · 2025-05-26 03:36
Source: Lieyun Network (猎云网)

The 5th BEYOND International Technology Innovation Expo (BEYOND Expo 2025) was held from May 21 to 24. On May 24, at the closing ceremony, Alibaba Group Chairman Joe Tsai (蔡崇信) appeared and noted that Alibaba has made some adjustments to its organizational structure.

Tsai said Alibaba will focus on several core businesses: first, e-commerce; second, cloud computing; and third, ensuring that artificial intelligence permeates every aspect of the business, both customer-facing and internal.

Tsai also shared his views on employment for young people. He believes young people should work in order to gain more skills and knowledge, and that this is the meaning of work.

He added that combining robotics with artificial intelligence brings very exciting things to mind: for example, a robot that makes coffee for you, or one that comes to your home and cleans the floors. But he also believes that most intelligent robots in the world do not need to look like humans.

As an example, he asked: if you want a robot to clean your carpet, or to come home and clean your kitchen or living room, do you really want something that looks like a human? "I would be scared. I just want something that looks like a vacuum cleaner and can intelligently clean the room."

"When we talk about robots, we always think of the movies we watched as children. They all look like humans, but they are clearly not human. Now, are we really working toward machines that are exactly like humans? I think this is actually a matter of technology. There are also many ...
Tencent Hunyuan TurboS Technical Report Fully Disclosed for the First Time: 560B-Parameter Hybrid Mamba Architecture with Adaptive Long-Short Chain-of-Thought Fusion
AI前线· 2025-05-22 19:57
Core Viewpoint
- Tencent's Hunyuan TurboS model ranks 7th globally in the latest Chatbot Arena evaluation, showcasing its advanced capabilities and innovative architecture [1][2]

Group 1: Model Architecture and Innovations
- Hunyuan TurboS employs a hybrid Transformer-Mamba architecture, balancing performance and efficiency by combining Mamba's long-sequence processing with the Transformer's contextual understanding [2][7]
- The model features 128 layers and an innovative interleaved module pattern of "AMF" (Attention → Mamba2 → FFN) and "MF" (Mamba2 → FFN) blocks, maintaining high computational efficiency at a total of 560 billion parameters [7][14]
- An adaptive long-short thinking-chain mechanism allows the model to dynamically switch between quick-response and deep-thinking modes based on problem complexity, optimizing resource allocation [2][7]

Group 2: Training and Evaluation
- The model was trained on a dataset of 16 trillion tokens, significantly improving its performance over previous iterations [10][13]
- Hunyuan TurboS achieved an overall score of 1356 in the LMSYS Chatbot Arena, placing it in the top 7 of the 239 models evaluated [2][49]
- The model demonstrated strong performance across various benchmarks, particularly excelling in multi-task capabilities and multilingual support, ranking first in Chinese, French, and Spanish [4][42]

Group 3: Post-Training Strategies
- The post-training process includes four key modules: Supervised Fine-Tuning (SFT), Adaptive Long-Short CoT Fusion, Multi-Round Deliberation Learning, and Two-Stage Large-Scale Reinforcement Learning [8][22]
- SFT data was meticulously curated across multiple themes, ensuring high-quality training samples [24][26]
- The adaptive long-short CoT fusion method allows the model to choose between long and short reasoning chains based on task complexity, enhancing its reasoning capabilities [26][29]

Group 4: Performance Metrics
- Hunyuan TurboS outperformed many leading models in key areas such as mathematical reasoning, logical reasoning, and knowledge-intensive tasks, particularly in Chinese evaluations [41][42]
- The model achieved cost-effective output generation, using only 52.8% of the tokens consumed by comparable models while maintaining performance [43][45]
- The model's architecture and training optimizations yielded a 1.8x inference speedup over pure Transformer MoE models [47]
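The "AMF"/"MF" interleaving described above can be pictured as a repeating block pattern expanded into a flat stack of modules. The following is a rough illustration only, not Tencent's implementation; the cycling rule and module names are assumptions based on the report summary:

```python
# Sketch of an "AMF" / "MF" interleaved layer stack, as described for
# Hunyuan TurboS. The alternation rule is an illustrative assumption,
# not Tencent's actual architecture code.

def build_layer_pattern(num_layers: int, pattern=("AMF", "MF")) -> list:
    """Expand an interleaved block pattern into a flat list of module names.

    "AMF" -> Attention, Mamba2, FFN; "MF" -> Mamba2, FFN.
    Blocks are cycled until `num_layers` modules have been emitted.
    """
    name = {"A": "Attention", "M": "Mamba2", "F": "FFN"}
    layers = []
    i = 0
    while len(layers) < num_layers:
        for ch in pattern[i % len(pattern)]:
            if len(layers) == num_layers:
                break
            layers.append(name[ch])
        i += 1
    return layers

print(build_layer_pattern(10))
```

The design intuition is that only some blocks pay the quadratic cost of attention, while the cheaper Mamba2 blocks carry most of the long-sequence processing.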
New Work from Kaiming He et al.: Simplicity Wins; Replacing Instantaneous Velocity with Average Velocity Improves One-Step Generation by 70%
量子位· 2025-05-21 06:31
Core Viewpoint
- The article discusses the introduction of a new model called MeanFlow, which uses average velocity to achieve a one-step generation framework, significantly improving on the state of the art (SOTA) in image generation tasks [1][5][10]

Group 1: Model Development
- The MeanFlow model is trained from scratch without any pre-training, distillation, or curriculum learning, achieving a Fréchet Inception Distance (FID) score of 3.43, a notable improvement over previous one-step diffusion/flow models [3][10][13]
- The model introduces the concept of average velocity to represent flow fields, in contrast to the instantaneous velocity used in flow-matching methods [5][9]

Group 2: Experimental Results
- Experiments on ImageNet at a resolution of 256×256 demonstrated that MeanFlow achieved a 50% to 70% relative advantage in FID over previous state-of-the-art methods [13][19]
- The model's performance was evaluated through an ablation study covering various configurations and their corresponding FID scores, with the best results achieved under specific parameter settings [15][19]

Group 3: Scalability and Comparison
- The MeanFlow model exhibits good scalability with respect to model size, with different configurations yielding FID scores competitive with other generative models [16][19]
- A comparison with other generative models indicates that MeanFlow significantly narrows the gap between one-step diffusion/flow models and their multi-step predecessors [19][20]

Group 4: Research Team and Background
- The research was conducted by a team from MIT and CMU, including PhD student Geng Zhengyang and other students of Kaiming He [21][22][23]
- The team aims to bridge the gap between generative modeling and simulation in physics, addressing multi-scale simulation problems [20]
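The instantaneous-versus-average velocity distinction at the heart of MeanFlow can be written out explicitly. The notation below is a reconstruction from the summary's description, not a quotation of the paper:

```latex
% Flow matching learns the instantaneous velocity $v(z_t, t)$ of a
% probability-flow path $z_t$. MeanFlow instead models the average
% velocity over an interval $[r, t]$:
%
%   u(z_t, r, t) = \frac{1}{t - r} \int_r^t v(z_\tau, \tau)\, d\tau
%
% A single evaluation of $u$ then jumps the whole interval at once:
%
%   z_r = z_t - (t - r)\, u(z_t, r, t)
%
% which is why a well-trained $u$ supports one-step generation (one jump
% across the full time range), whereas integrating $v$ requires many small
% solver steps.
```

The appeal of this formulation is that the expensive ODE integration is folded into the learned quantity itself, rather than performed at sampling time.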
A Troubling Outlook: Apple (AAPL.US) Reportedly Suffering Repeated Setbacks in AI
Zhitong Finance · 2025-05-18 23:53
Core Insights
- Apple's ongoing struggles in the AI sector may jeopardize its dominance in the smartphone market and threaten its broader ambitions in robotics and next-generation hardware [1]
- Despite initial optimism following the hiring of John Giannandrea in 2018 to lead AI strategy, Apple has failed to keep pace with competitors in generative AI and large language models [1][3]

Group 1: AI Strategy and Developments
- In 2024, Apple announced "Apple Intelligence," promising smarter writing tools, summarization features, and an upgraded Siri, but the rollout has faced delays and internal testing issues [2]
- Apple's slow progress in AI is attributed to a reluctance to make large-scale investments, internal cultural resistance, and strict data-privacy policies that limit AI model training [3]
- Apple is undergoing a restructuring, with leadership of Siri and related product development shifting from John Giannandrea to Mike Rockwell, head of the Vision Pro headset project [3]

Group 2: Future Outlook and Challenges
- Engineers are working on a complete overhaul of Siri's architecture to create a new system based on large language models, with internal testing of a proprietary chatbot aimed at matching ChatGPT's capabilities [4]
- Apple plans to differentiate Siri from the broader "Apple Intelligence" brand to repair its damaged reputation, while adopting a conservative approach at the 2025 WWDC, focusing on incremental improvements rather than groundbreaking features [4]
- Despite significant challenges, insiders believe Apple has the potential to catch up thanks to its hardware-integration advantages, large global user base, and brand influence, although many acknowledge that Apple can no longer afford to be a "latecomer" in the AI field [4]
[China Stories] Russian Expert: China-Russia AI Cooperation Transcends the "Small Yard, High Fence," Building a Fair New World Order in Science and Technology
Huanqiu.com · 2025-05-10 05:18
Kolonin also noted that the rapid development of AI has raised concerns about the misuse of AI and of artificial general intelligence (AGI). Some countries exploit their dominant position in the AI field to coerce other countries and to obstruct their cooperation with states regarded as threats. In view of this, countries that hope to build a fair world order need to deepen cooperation with one another, for example within the BRICS framework, upholding the principle of mutual benefit and jointly promoting the improvement of the global science and technology governance system.

Kolonin stressed that the Russian scientific community is open to working hand in hand with China and other like-minded countries to promote the coordinated development and effective governance of AI and AGI worldwide. Other countries are welcome to take part in open activities such as the Russian AGI community's seminars, as well as joint conferences such as Mathematical AI, and the parties look forward to gradually refining a joint strategy for managing AI technologies.

According to related reports, Timofei Bordachev, program director of the Valdai Discussion Club, an international debate club composed of leading foreign experts, likewise pointed out that AI is a frontier field of science and technology in which both China and Russia possess the relevant technology and talent. Through cooperation in this field, the two countries can set an example of scientific and technological collaboration and contribute to the emancipation of Global South countries in science, culture, and education. This will not only open up entirely new areas of cooperation for the two countries but also give real impetus to South-South cooperation, which is crucial for building a more balanced and just world order. ...
Beyond Copper and Fiber: A Third Option
半导体行业观察· 2025-05-08 01:49
Core Viewpoint
- The article discusses the limitations of copper and fiber interconnects in next-generation data centers and introduces a third solution, e-Tube, which aims to support the growing demands of AI workloads and data-bandwidth requirements [1][10][16]

Group 1: Challenges in Data Center Expansion
- Data center AI accelerator clusters face increasing complexity due to the emergence of new technologies, particularly generative AI and large language models (LLMs), which are pushing data bandwidth beyond what traditional interconnects can carry, rapidly doubling to 800G and soon reaching 1.6T [1]
- The need for improved performance, cost control, and energy efficiency presents significant challenges for network operators [4]

Group 2: Limitations of Current Technologies
- Data centers currently rely on 400G and 800G network equipment, using copper cables for short distances and fiber optics for long distances, but both technologies are approaching their respective limits at terabit interconnect speeds [3][6]
- Copper cables, while cost-effective and reliable over short distances, suffer from channel loss due to the skin effect, limiting their transmission range and scalability in high-density data centers [3][6]

Group 3: Transition to Optical Interconnects
- Large-scale enterprises are shifting toward optical interconnects, such as Active Optical Cables (AOCs), which can provide connections over several kilometers but come with increased complexity, power consumption, and cost, potentially up to five times that of copper cables [8]
- Optical technologies are less reliable because their performance varies with temperature and their optical components eventually fail; they can also introduce significant latency [8]

Group 4: Introduction of e-Tube Technology
- The e-Tube platform offers a scalable multi-terabit interconnect solution that uses plastic waveguides to carry radio-frequency data, overcoming the limitations of copper and fiber optics [10][12]
- e-Tube cables, made from low-density polyethylene (LDPE), can efficiently transmit data without the high-frequency losses associated with copper, supporting data rates from 56G to 224G and beyond [12]

Group 5: Advantages of e-Tube
- e-Tube technology is claimed to deliver a tenfold increase in cable reach, a fivefold reduction in weight, a twofold decrease in thickness, a threefold reduction in power consumption, and a thousandfold decrease in latency, all while reducing costs by a factor of three [14]
- The technology is positioned as an ideal alternative to copper cables as data centers transition to 1.6T and 3.2T speeds, offering unique power efficiency and compatibility with existing network infrastructure [14][16]
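The skin-effect loss mentioned above grows with frequency because current crowds into an ever-thinner surface layer of the conductor, raising its effective resistance. A back-of-the-envelope calculation using the standard skin-depth formula and textbook constants for copper (not figures from the article) shows why terabit-class signaling strains copper:

```python
import math

# Skin depth: delta = sqrt(2 * rho / (omega * mu)), the depth at which
# current density falls to 1/e of its surface value. A thinner skin depth
# at higher frequency means higher effective resistance and channel loss.
RHO_COPPER = 1.68e-8       # resistivity of copper, ohm-meters (textbook value)
MU_0 = 4 * math.pi * 1e-7  # vacuum permeability, H/m (copper is ~non-magnetic)

def skin_depth_m(freq_hz: float) -> float:
    """Skin depth in meters for copper at the given frequency."""
    omega = 2 * math.pi * freq_hz
    return math.sqrt(2 * RHO_COPPER / (omega * MU_0))

for f in (1e6, 1e9, 56e9):  # 1 MHz, 1 GHz, and a 56G-class symbol rate
    print(f"{f/1e9:>7.3f} GHz -> skin depth {skin_depth_m(f)*1e6:.2f} um")
```

At 1 GHz the skin depth is already only about 2 µm, so almost none of the conductor's cross-section carries current, which is the physical root of the range limits described above.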
Uber (UBER)
2025-05-07 15:20
Summary of Uber's Q1 2025 Earnings Call

Company Overview
- **Company**: Uber Technologies, Inc. (UBER.US)
- **Date**: May 7, 2025

Key Points

Financial Performance
- Uber reported a strong Q1 2025: total bookings and trip volume both increased, adjusted EBITDA reached $1.9 billion, a 35% year-over-year increase, and free cash flow hit a record $2.3 billion [1][2]
- Monthly active users grew by 14% to 170 million, trip volume increased by 18%, and global retention rates are at an all-time high [2]

Autonomous Vehicle Initiatives
- Uber partnered with Waymo to deploy approximately 100 autonomous vehicles in Austin, achieving high utilization rates and positive consumer feedback, with average utilization exceeding 99% of that of human drivers [3][4]
- Plans to expand the autonomous vehicle fleet in Austin and other regions such as Atlanta are underway [4]

Pricing Strategy and Market Dynamics
- Uber observed that price elasticity remains in line with past trends, where a $1 price increase reduces transaction volume, but consumers are adapting to stable pricing [5]
- The competitive landscape in the U.S. ride-hailing market is intense, with competitors such as Bolt and DK&D in international markets, yet Uber maintains a leading position [6]

Growth Outlook
- Uber anticipates stronger revenue and profitability growth in Q2 2025, setting a solid foundation for the peak season in the second half of the year [7]
- The company is focused on providing high-quality services and has established clear strategies and ambitious goals for future growth [7]

Delivery Business Performance
- The gross margin of Uber's delivery business expanded to 3.7%, a 70-basis-point increase year-over-year, driven by advertising revenue and economies of scale [3][10]
- The delivery business showed strong profitability with a contribution margin of 9% in Q1, indicating robust growth potential in the grocery and retail sectors [10]

Insurance Costs and Innovations
- Uber expects moderate increases in insurance costs in 2025 but aims to relieve cost pressure through innovation and policy adjustments [3][11]
- The company is implementing driver-behavior scoring to enhance safety and reduce insurance costs, and has received positive feedback [11]

Macroeconomic Environment
- The macroeconomic environment has not produced significant changes in audience growth, and the frequency of service usage remains stable [12][13]
- Uber's diverse service categories, including dining and transportation, are less affected by macroeconomic uncertainty [13]

International Market Developments
- In Europe, Uber has achieved a leading position in the UK food-delivery market through organic growth, with France and Germany identified as key markets for future expansion [16]

Emerging Market Opportunities
- Sparser mobility markets present growth opportunities for Uber, with 20% of trips now coming from these areas, which are growing faster than urban core markets [18][19]
- Uber plans to launch in hundreds of new cities by 2025, focusing on achieving sustainable profitability in these markets [18]

Future of Autonomous Driving
- The autonomous driving sector is evolving, with companies like Waymo leading the way, and Uber is collaborating with various partners to develop and deploy autonomous technologies in Europe [11][15]

Conclusion
- Uber's strategic focus on enhancing service quality, expanding autonomous vehicle initiatives, and navigating competitive pressures positions the company for continued growth and profitability in the evolving mobility landscape [7][19]
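The year-over-year growth rates quoted in the call imply the prior-year baselines. The following is my own arithmetic from the summary's figures, not numbers disclosed by Uber:

```python
# Back out the implied Q1 2024 baselines from the growth rates quoted above.
# These derived figures are my own arithmetic, not disclosed by Uber.

adj_ebitda_q1_2025 = 1.9e9   # dollars, +35% YoY
mau_q1_2025 = 170e6          # monthly active users, +14% YoY
delivery_margin_2025 = 3.7   # percent, +70 bp YoY

implied_ebitda_2024 = adj_ebitda_q1_2025 / 1.35
implied_mau_2024 = mau_q1_2025 / 1.14
implied_margin_2024 = delivery_margin_2025 - 0.70

print(f"implied Q1 2024 adj. EBITDA: ${implied_ebitda_2024 / 1e9:.2f}B")
print(f"implied Q1 2024 MAUs: {implied_mau_2024 / 1e6:.0f}M")
print(f"implied Q1 2024 delivery gross margin: {implied_margin_2024:.1f}%")
```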
ICML 2025 | Massive Values in Attention Mechanisms: The Key to Cracking Contextual Understanding in Large Language Models
机器之心· 2025-05-06 04:11
Core Insights
- The article discusses a significant phenomenon in large language models (LLMs): the concentration of massive values in the self-attention mechanism, particularly in the query (Q) and key (K) representations, which is crucial for contextual knowledge understanding [1][3][4]

Research Highlights
- The study reveals that massive values are highly concentrated in Q and K, contrary to the expectation that each attention head operates independently; this consistency across multiple layers and heads is demonstrated visually [3][4]
- The massive-value phenomenon is observed specifically in models using Rotary Position Embedding (RoPE), such as LLaMA, Qwen, and Gemma, while models without RoPE, such as GPT-2 and OPT, do not exhibit this pattern [4]
- The research establishes a direct link between the presence of massive values in Q and K and the ability to understand contextual knowledge [4]

Key Findings
1. **Concentration of Massive Values**: Massive values are highly concentrated in specific regions of each attention head, indicating a surprising level of consistency [3][4]
2. **Impact on Contextual Knowledge Understanding**: The presence of massive values is critical for understanding contextual knowledge, as demonstrated through destructive experiments that reset these values to their average [5][6]
3. **Quantization Techniques**: Quantization methods that handle massive values specially, such as AWQ and SmoothQuant, preserve contextual knowledge understanding better than methods that do not [7]
4. **Origin of the Concentration Phenomenon**: The concentration of massive values is attributed to RoPE, which affects the low-frequency regions of Q and K, causing the phenomenon to appear from the early layers of the model onward [8]

Experimental Results
- The experiments reveal a stark contrast in the impact of massive values across knowledge tasks:
- **Resilience in Parametric Knowledge Retrieval**: Tasks relying on parametric knowledge decline by only 15-20% in accuracy when massive values are disrupted, maintaining 76%-88% accuracy [10]
- **Catastrophic Decline in Contextual Knowledge Tasks**: Tasks requiring contextual understanding suffer drastic performance drops, with accuracy on key-retrieval tasks plummeting from 100% to near 0% when massive values are disrupted [11]
- **Control Experiments**: When only non-massive values are disrupted, task performance remains stable, confirming the unique importance of massive values for contextual understanding [12]

Future Directions
- The research opens several avenues for further exploration, including enhancing or adjusting the distribution of massive values to improve contextual understanding, examining the universality of the phenomenon across different architectures, and designing targeted quantization methods that protect the massive values tied to contextual understanding [16]
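The "destructive experiment" described above (reset the massive values, leave everything else intact) can be sketched in a few lines. This is a toy illustration: the median-based outlier threshold and the example vector are my assumptions, not the paper's exact protocol:

```python
import statistics

# Toy sketch of the destructive experiment: locate the "massive values" in
# a Q/K-like vector and reset them to the average of the remaining entries.
# The threshold rule (|x| > ratio * median magnitude) is an illustrative
# assumption; the paper's exact selection criterion may differ.

def disrupt_massive_values(values, ratio=5.0):
    """Replace entries whose magnitude exceeds `ratio` times the median
    magnitude with the mean of the remaining (non-massive) entries."""
    med = statistics.median(abs(v) for v in values)
    threshold = ratio * med
    normal = [v for v in values if abs(v) <= threshold]
    fill = sum(normal) / len(normal)
    return [fill if abs(v) > threshold else v for v in values]

# A mock query vector: mostly small entries plus two "massive" outliers.
q = [0.1, -0.2, 0.15, 42.0, -0.05, 0.12, -39.0, 0.08]
print(disrupt_massive_values(q))
```

A median (rather than mean) threshold is used here because the outliers themselves would inflate a mean-based cutoff; in the paper's setting, re-running a retrieval task before and after this operation is what exposes the collapse in contextual accuracy.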