Google Launches LLM-Evalkit, Bringing Order and Measurability to Prompt Engineering
AI前线· 2025-10-29 00:44
Core Insights
- Google has launched LLM-Evalkit, an open-source framework built on the Vertex AI SDK, aimed at streamlining prompt engineering for large language models [2][5]
- The tool replaces fragmented documentation and guesswork with a unified, data-driven workflow, allowing teams to create, test, version, and compare prompts in one coherent environment [2][3]
- LLM-Evalkit emphasizes precise measurement over subjective judgment, enabling users to define specific tasks and evaluate outputs using objective metrics [2][3]

Integration and Accessibility
- LLM-Evalkit integrates seamlessly with existing Google Cloud workflows, creating a structured feedback loop between experimentation and performance tracking [3]
- The framework features a no-code interface, lowering the operational barrier for a wider range of professionals, including developers, data scientists, and UX writers [3]
- This inclusivity fosters rapid iteration and collaboration between technical and non-technical team members, turning prompt design into a cross-disciplinary effort [3]

Community Response and Availability
- The announcement has drawn significant attention from industry practitioners, underscoring the need for a centralized system to track prompts, especially as models evolve [6]
- LLM-Evalkit is available as an open-source project on GitHub, deeply integrated with Vertex AI, with detailed tutorials in the Google Cloud console [6]
- New users can apply Google's $300 trial credit to explore the tool [6]
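The define-a-task, measure, and compare loop described above can be sketched in a few lines. This is a hypothetical illustration of the workflow only, not LLM-Evalkit's actual API; every function and variable name below is invented for the example.

```python
# Hypothetical sketch of a measure-then-compare prompt-evaluation loop.
# None of these names come from LLM-Evalkit itself.

def exact_match(output: str, expected: str) -> float:
    """Objective metric: 1.0 if the model output matches exactly, else 0.0."""
    return 1.0 if output.strip() == expected.strip() else 0.0

def evaluate_prompt(prompt_template: str, cases: list[dict], model_fn) -> float:
    """Run one prompt version over a fixed test set and return its mean score."""
    scores = []
    for case in cases:
        output = model_fn(prompt_template.format(**case["inputs"]))
        scores.append(exact_match(output, case["expected"]))
    return sum(scores) / len(scores)

# Compare two prompt versions on the same cases, as a version-and-compare
# workflow would; the model call is stubbed out here.
cases = [{"inputs": {"q": "2+2"}, "expected": "4"}]
stub_model = lambda prompt: "4"  # stand-in for a real model call
v1 = evaluate_prompt("Answer: {q}", cases, stub_model)
v2 = evaluate_prompt("Compute {q} and reply with the number only.", cases, stub_model)
print(v1, v2)
```

The point of the structure is that both prompt versions are scored against the same fixed cases with the same metric, so a comparison between them is reproducible rather than a judgment call.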
Guotai Haitong: Breaking the Memory Wall, AI SSDs See Broad Room for Growth
智通财经网· 2025-10-28 12:33
Core Viewpoint
- The report from Guotai Junan Securities highlights the challenges large language models (LLMs) face due to the "memory wall" and proposes SSD-based storage offloading as a new pathway for efficient AI model operation [1][2]

Industry Perspective and Investment Recommendations
- The massive data generated by AI is straining global data center storage facilities; with traditional Nearline HDDs facing supply shortages, attention is shifting to SSDs. The industry is rated "overweight" [1][2]
- KV Cache capacity growth is outpacing the capabilities of High Bandwidth Memory (HBM), making KV Cache technology necessary to optimize computational efficiency and reduce redundant calculation [2]

KV Cache Management and Technological Innovations
- The industry is exploring tiered cache management for KV Cache; NVIDIA's Dynamo framework offloads KV Cache from GPU memory to CPU, SSD, and even network storage, easing the memory bottleneck of large models [3]
- At the 2025 Open Data Center Conference, Samsung proposed SSD-based storage offloading to enhance AI model performance, achieving significant reductions in token latency when KV Cache size exceeds HBM or DRAM capacity [3]

Market Dynamics and Supply Chain Adjustments
- AI storage demand is driving a shift from HDDs to high-capacity Nearline SSDs, with NAND Flash suppliers accelerating production of ultra-large-capacity SSDs (122TB and 245TB) in response to the HDD supply gap [4]
Guotai Haitong | Electronics: Breaking the Memory Wall, AI SSDs See Broad Room for Growth
Core Viewpoint
- The article discusses the challenges large language models (LLMs) face due to the "memory wall" and proposes SSD-based storage offloading as a new path for efficient AI model operation [1]

Group 1: Industry Insights and Investment Recommendations
- The massive data generated by AI is straining global data center storage facilities, drawing attention to KV Cache offloading from GPU memory to CPU and SSD [1]
- Traditional Nearline HDDs, long the cornerstone of massive data storage, are experiencing supply shortages, prompting a shift toward high-performance, higher-cost SSDs; the industry is rated "overweight" [1]

Group 2: KV Cache Technology and Its Implications
- KV Cache capacity growth is exceeding HBM's capabilities, as the cache temporarily stores generated tokens to optimize computational efficiency and reduce redundant calculation [2]
- As demand grows for larger models and longer sequences, reliance on HBM is becoming a bottleneck, leading to frequent memory overflows and performance degradation [2]

Group 3: Technological Developments in Storage Solutions
- The industry is exploring tiered cache management for KV Cache; NVIDIA has launched Dynamo, a distributed inference serving framework that offloads KV Cache from GPU memory to CPU, SSD, and even network storage [3]
- Samsung has proposed an SSD-based storage offloading solution for the "memory wall" challenge that cuts first-token latency by up to 66% and inter-token latency by up to 42% when KV Cache size exceeds HBM or DRAM capacity [3]

Group 4: Market Trends and Supply Chain Dynamics
- AI storage demand is driving a replacement effect for HDDs, with NAND Flash suppliers accelerating production of large-capacity Nearline SSDs amid significant HDD supply gaps [4]
- NAND Flash manufacturers are investing in ultra-large-capacity Nearline SSDs, such as 122TB and even 245TB models, to meet growing demand from AI inference applications [4]
Top Large-Model Minds Gather at a Hardcore Open-Source Meetup: the SGLang Community Holds Its First Meetup in China
机器之心· 2025-10-28 06:29
Core Insights
- The PyTorch Conference 2025 showcased a vibrant community and significant developments in deep learning, particularly highlighting SGLang's contributions and potential in the industry [1][3][4]

SGLang Overview
- SGLang, an open-source high-performance inference engine for large language models and vision-language models, originated from RadixAttention and is incubated by the non-profit organization LMSYS. It delivers low-latency, high-throughput inference across environments ranging from a single GPU to large distributed clusters [7][8]

Community Engagement
- The first Meetup in Beijing, co-hosted by SGLang, Meituan, and Amazon Web Services, attracted numerous contributors, developers, and scholars, signaling a strong community presence and development potential [4][8]

Technical Developments
- The Meetup featured technical talks on SGLang's architecture, including advances in KV Cache management, Piecewise CUDA Graph, and speculative decoding, aimed at improving efficiency and compatibility [21][22]
- SGLang's quantization strategies were also discussed, focusing on broadening the range of applications and optimizing model performance [34][35]

Application and Practice
- Industry applications of SGLang were presented, including its integration with Baidu's ERNIE 4.5 model for large-scale deployment and optimization in search scenarios [41][42]
- SGLang's use in WeChat search was highlighted, underscoring the need for high throughput and low latency in user-facing experiences [44]

Future Directions
- SGLang's roadmap includes deeper integration with a range of hardware and software stacks, aiming to improve stability and compatibility across platforms [22][35]
- The SpecForge framework, developed by the SGLang team, aims to accelerate large language model inference and has been adopted by major companies including Meituan and NVIDIA [57][58]
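RadixAttention, the technique SGLang grew out of, reuses the KV cache of shared prompt prefixes across requests, so only the tokens beyond the longest cached prefix need fresh computation. The toy helper below illustrates just that matching step; it is a hypothetical sketch, not SGLang's implementation, which organizes cached sequences in a radix tree rather than scanning a list.

```python
# Toy sketch of RadixAttention's core idea: find how many leading tokens
# of a new request are already covered by some cached sequence.
def longest_cached_prefix(tokens: list[int], cached: list[list[int]]) -> int:
    """Return the length of the longest prefix of `tokens` shared
    with any cached token sequence."""
    best = 0
    for seq in cached:
        n = 0
        for a, b in zip(tokens, seq):
            if a != b:
                break            # divergence: stop matching this sequence
            n += 1
        best = max(best, n)
    return best

cache = [[1, 2, 3, 4], [1, 2, 9]]      # token IDs of previously served prompts
request = [1, 2, 3, 7, 8]              # new request sharing a 3-token prefix
reused = longest_cached_prefix(request, cache)
print(f"reuse {reused} tokens, compute {len(request) - reused} fresh")
```

In a serving engine this matters because many requests share long system prompts or few-shot examples, so the reused prefix is often most of the input.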
A16Z's Latest Insights: Video Models Move from Breakneck Growth to Differentiation, and Productization Is the Next Opportunity
36Kr · 2025-10-28 00:18
Core Insights
- The video generation model industry is transitioning from a phase of rapid performance gains to a "product era," with the focus shifting to diversity and specialization rather than model parameters and benchmark scores [2][4][12]
- There is growing recognition that no single model can dominate all video generation tasks, driving a trend toward specialization in which different models excel in specific areas [4][11][12]
- The need for better integrated products that simplify the creative process is increasingly apparent, as many creators still rely on multiple tools to achieve their desired outcomes [13][15][16]

Group 1: Industry Trends
- The pace of progress in video generation models has slowed, with most mainstream models now able to generate impressive 10-15 second clips with synchronized audio [1][6]
- The notion of a single "superior model" for video is being challenged, as recent releases like Sora 2 have not consistently outperformed predecessors such as Veo 3 [4][11]
- The industry is shifting toward models tailored to specific capabilities, such as physical simulation and multi-shot editing, rather than one-size-fits-all solutions [2][11][12]

Group 2: Product Development
- While video generation capabilities have improved, product development has not kept pace, leaving a gap in user experience and creative efficiency [13][15]
- Companies are beginning to close this gap with tools that let users modify video elements more intuitively, such as Runway's tool suite and OpenAI's Sora Storyboard [15][16]
- The future is expected to bring more models specialized for particular industries or scenarios, along with comprehensive creative toolkits that integrate various media elements into a cohesive workflow [16]
Shanghai Putuo Pools Overseas Chinese Expertise to Empower Coordinated Regional Development as Talent Training Camp Concludes
China News Service (中国新闻网) · 2025-10-24 11:45
A roundtable session held in-depth discussions around two themes: how high-level overseas talent can take root and grow along the Shanghai-Nanjing industrial innovation belt, and the role of overseas talent in industrial innovation and regional integration. Qin Chuhan, a young member of Putuo's overseas Chinese community and founder of 墨泉生物 (Moquan Bio), shared his experience of founding a company amid Shanghai-Nanjing regional collaboration, discussing Putuo's business-environment advantages and supporting industrial services from a company-growth perspective and offering participants a practical example of local development.

China News Service, Shanghai, October 24 (Fan Yubin) - The "侨连沪宁·智创未来" Putuo training camp for high-level overseas Chinese talent recently concluded. It was hosted by the Shanghai Putuo District Overseas Chinese Affairs Office, the Putuo District Talent Bureau, and the Putuo District Federation of Returned Overseas Chinese, with the federations of Nantong and Taizhou, Jiangsu Province, as co-organizers.

The camp brought together 30 overseas Chinese professionals from Shanghai Putuo, Nantong, and Taizhou, reflecting the concentration of talent along the Shanghai-Nanjing industrial innovation belt. Their fields cover frontier industries such as smart manufacturing, new materials, and biotechnology, and 90% hold master's degrees or above. The camp aims to strengthen exchange and cooperation among overseas Chinese talent in the three cities and inject fresh vitality into the construction of the belt.

[Photo: the event venue. Courtesy of the organizers.]

The curriculum combined theoretical depth with a practical orientation, helping participants build a systematic cognitive framework. Li Changhao, secretary-general of the Shanghai Yangtze River Delta Modern Industry Development Promotion Association, analyzed new regional industrial opportunities in a lecture titled "Outlook on the 15th Five-Year Plan for Shanghai and the Yangtze River Delta"; Lyu Yue, chairman of the Putuo District Federation of Returned Overseas Chinese and dean of the Graduate School of East China Normal University, explored technology-driven industrial transformation in a lecture on "Artificial Intelligence and Large Language Models" ...
US Stock Movers | Alibaba Briefly Rises Over 2.8% as Quark AI Glasses Pre-Sale Nears
Gelonghui · 2025-10-23 14:28
Core Viewpoint
- Alibaba's stock (BABA.US) rose more than 2.8% intraday, reaching a peak of $170.6, driven by the announcement that its Quark AI glasses will open for pre-sale [1]

Product Launch
- Pre-sale of the Quark AI glasses begins at midnight on the 24th, with a starting price of 3,699 yuan [1]
- The glasses are powered by Alibaba's self-developed Qwen large language model and the Quark AI assistant, with features including hands-free calls, music playback, and real-time translation [1]
- Shipping is expected to start in December [1]
Silicon Valley Futurist Kevin Kelly: Embracing the AI Era with "Protopia" Thinking
Core Insights
- Optimism is emphasized as essential in the AI era: believing in a better future is crucial for innovation and progress [1][2][7]
- The concept of "Protopia" is introduced, denoting gradual societal improvement rather than a perfect utopia [2][6]

AI Development and Future
- The development of artificial intelligence is seen as a slow process, requiring new approaches beyond merely scaling up language models [2][3]
- AI is expected to create a vast ecosystem of intelligent agents that perform various tasks and interact with one another, potentially forming an economy larger than that of humans [3][4]

Human-AI Collaboration
- AI is viewed as a tool for empowerment rather than replacement, enhancing human capabilities in fields such as customer service and complex problem-solving [4][5]
- The future workforce will depend on the ability to collaborate with AI, with new roles emerging in AI maintenance and emotional support [4][5]

China's Role in AI and Innovation
- China is predicted to transition from a "student" to a "teacher" in the global innovation landscape, leading in areas such as gaming, autonomous vehicles, and AI chip manufacturing [5][6]
- China's educational system needs to adapt to foster critical thinking and creativity, preparing students for jobs that do not yet exist [5][6]

Embracing Failure and Continuous Learning
- Embracing failure is highlighted as essential for innovation, with a call for educational systems to teach resilience alongside success [6][7]
- Lifelong learning is emphasized as a necessity for future job markets, where adaptability will be key [6][7]
Right Now the AI Best at Making Money Is Qwen3: Six Global Models Battle It Out, and the Top 2 Are from China
36Kr · 2025-10-23 12:49
Core Insights
- Qwen3 Max has emerged as the leading model in the AI trading competition, surpassing DeepSeek and achieving significant profitability [1][32]
- The Alpha Arena competition showcases the capabilities of various AI models under real market conditions, underscoring the financial market as a training ground for AI [30][32]

Performance Summary
- Qwen3 Max achieved a return of +44.38%, with an account value of $14,438 and total profit of $4,438 [11]
- DeepSeek V3.1 follows with a return of +20.92%, an account value of $12,092, and total profit of $2,092 [11]
- Other models, including Claude 4.5 Sonnet, Grok 4, Gemini 2.5 Pro, and GPT-5, posted negative returns, with GPT-5 showing the largest loss at -71.48% [10][11]

Competition Dynamics
- The competition began on October 18; Qwen3 Max has steadily improved its position, particularly after all models dropped sharply on October 22 [22][24]
- Qwen3 Max's strategy has been characterized as "quick and precise," letting it capitalize effectively on market opportunities [8][32]
- The contest has highlighted the contrasting performance across models, with Qwen3 Max and DeepSeek the only two consistently performing well [22][24]

Market Implications
- Qwen3 Max's success signals the growing competitiveness of Chinese AI models in the global market, particularly in high-risk financial environments [33]
- Alpha Arena demonstrates how AI can adapt and thrive in real-world financial scenarios, reinforcing the idea that financial markets are well suited to training AI [30][32]
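The performance figures above are mutually consistent if each model started from $10,000 of capital: the return column is simply profit divided by starting capital. A quick check (the $10,000 figure is inferred from the reported numbers, not stated explicitly in the article):

```python
# Inferred starting capital: both reported accounts give $10,000
# (account value minus profit: 14,438 - 4,438 and 12,092 - 2,092).
START = 10_000

def return_pct(account_value: float) -> float:
    """Percent return relative to the inferred $10,000 starting capital."""
    return (account_value - START) / START * 100

print(round(return_pct(14_438), 2))  # Qwen3 Max
print(round(return_pct(12_092), 2))  # DeepSeek V3.1
```

The same formula maps GPT-5's -71.48% loss to an account value of roughly $2,852.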
$68 Million: Amazon Announces AI PhD Fellowships, with Alumni of Tsinghua, Peking University, and Shanghai Jiao Tong Among the Winners
机器之心· 2025-10-23 07:45
Group 1
- Amazon has announced the recipients of its AI PhD Fellowship, funding over 100 PhD students from nine universities to research machine learning, computer vision, and natural language processing [1]
- The participating universities are CMU, Johns Hopkins University, MIT, Stanford University, UC Berkeley, UCLA, University of Illinois Urbana-Champaign, University of Texas at Austin, and University of Washington [1]
- The program provides $10 million in funding in each of the 2025-2026 and 2026-2027 academic years, plus an additional $24 million per year in Amazon Web Services (AWS) cloud credits, totaling $68 million over two years [2]

Group 2
- Several universities have already announced their selected PhD candidates, including notable Chinese scholars [3]
- Jenny Huang (MIT) focuses on data-driven machine learning and uncertainty quantification [4][6]
- David Jin (MIT) is interested in scalable computing and AI-driven decision systems [8][6]
- Songyuan Zhang (MIT) researches safe multi-agent systems and intelligent assistive robots [11][6]

Group 3
- Yuxiao Qu (CMU) aims to endow AI agents with human-like curiosity to advance scientific research [12][14]
- Danqing Wang (CMU) works on integrating safety and functionality into training for reliable AI agents [15][17]
- Mengdi Wu (CMU) focuses on machine learning for optimizing computational kernel strategies [18][20]

Group 4
- Dacheng Li (UC Berkeley) develops efficient AI and artificial worlds through visual and text generation models [34][36]
- Hao Wang (UC Berkeley) researches practical secure code generation through controlled reasoning [37][39]
- Melissa Pan (UC Berkeley) is interested in sustainability in large-scale machine learning and data center systems [40][42]

Group 5
- Haoyu Li (UT Austin) uses AI to enhance modern system performance and availability [49][51]
- Junbo Li (UT Austin) focuses on agentic large language models and reinforcement learning [52][54]
- Kaizhao Liang (UT Austin) researches efficient training methods and sparse neural networks [56][58]

Group 6
- Zeping Liu (UT Austin) advances geospatial AI research with a focus on geographic foundation models [59][61]
- Haoran Xu (UT Austin) works on expanding reinforcement learning methods and integrating generative AI [62][64]
- Chutong Yang (UT Austin) is interested in algorithm design and analysis in trustworthy machine learning [65][67]

Group 7
- Xiao Zhang (UT Austin) focuses on networked and distributed systems to achieve predictable AI performance in 5G edge environments [68][69]
- The list of awardees will continue to be updated as more universities announce their recipients [70]