Qwen2.5
Primeton Information: To Date, Company Products Have Integrated Open-Source Models Including Qwen2.5, Qwen3.0, and QwQ-32B
Ge Long Hui· 2025-11-26 09:41
Ge Long Hui, November 26 — Primeton Information (688118.SH) stated on the investor-interaction platform that its products have passed product-ecosystem integration certification with Alibaba Cloud's proprietary cloud (专有云) products. To date, the company's products have integrated open-source models including Qwen2.5, Qwen3.0, and QwQ-32B. ...
Primeton Information: Company Products Have Integrated Open-Source Models Including Qwen2.5, Qwen3.0, and QwQ-32B
Mei Ri Jing Ji Xin Wen· 2025-11-26 09:41
(Source: National Business Daily) NBD AI Flash — An investor asked on the investor-interaction platform: In what areas does Primeton Information cooperate with Alibaba-affiliated businesses? Have its products passed Alibaba Cloud's proprietary-cloud (专有云) ecosystem integration certification and integrated open-source models such as Qwen2.5? Primeton Information (688118.SH) responded on the platform on November 26 that its products have passed product-ecosystem integration certification with Alibaba Cloud's proprietary cloud products. To date, the company's products have integrated open-source models including Qwen2.5, Qwen3.0, and QwQ-32B. ...
Taobao Finally Takes the Knife to Search
Hu Xiu APP· 2025-11-11 23:53
Core Viewpoint
- The article examines the rapid adoption of AI on Alibaba's Taobao platform, highlighting how AI tools improved user experience and operational efficiency during the 2025 Double Eleven shopping festival [4][11][30].

Group 1: AI Integration in E-commerce
- Taobao's AI tools, such as "Xiao Wan Assistant," have significantly improved sales performance, with some brands reporting an over 35% increase in orders after adopting AI-driven strategies [4][11].
- The establishment of the "Search and Promotion Intelligent Product Division" under Zhang Kaifu marks a strategic shift toward AI-driven upgrades of Taobao's search and recommendation systems [7][12].
- Customer complaints about search on social media created urgency for immediate improvements to the search experience [8][10].

Group 2: Challenges and Strategic Focus
- The search-experience problems stem from 22 years of accumulated complexity in Taobao's search engine, necessitating a comprehensive upgrade spanning business, technology, and supply-chain collaboration [9][19].
- The team identified three focus areas for AI evolution in 2025: upgrading the search and promotion systems, improving efficiency for merchants, and launching new AI-driven shopping products for consumers [16][21].
- The transition to AI-driven systems requires a complete overhaul of the existing product database to make it compatible with AI technologies, a significant undertaking [20][21].

Group 3: Organizational Changes and Talent Development
- The internal structure has shifted to support AI initiatives, with flexible project teams that can innovate without being constrained by traditional metrics [24][25].
- A major recruitment drive has targeted young talent from local universities, emphasizing creativity and technical skills in the AI domain [26][27].
- New hires go through a systematic training program to ensure they can contribute effectively to AI projects [27].

Group 4: Performance Metrics and Future Outlook
- As of November 8, 2025, the AI-driven search and promotion capabilities had delivered a 12% increase in advertising ROI and a 20% improvement in search relevance on complex queries [29][30].
- Despite early successes, educating traditional merchants about AI tools remains a challenge, requiring ongoing support and training [31].
- The company treats AI as a long-term strategic focus, with plans for increased investment and further development of AI capabilities in the coming years [32][33].
New Work from Tsinghua's Tang Jie: Can Large Models Play Guandan?
Liang Zi Wei· 2025-09-10 10:01
Core Viewpoint
- The research indicates that large models can effectively play a variety of card games, demonstrating their capabilities in complex decision-making scenarios [2][4][52].

Group 1: Model Performance
- Models vary in performance across card games, with fine-tuned models showing superior results compared to API-based and base models [3][40].
- Among API-based models, GPT-4o performs best overall, while GLM-4 demonstrates strong capability in games such as DouDizhu and GuanDan [39][40].
- Fine-tuned models, particularly GLM4-9B-Chat-mix, excel across multiple games, including DouDizhu, GuanDan, and Uno, indicating their versatility [42][40].

Group 2: Game Selection and Learning Methodology
- The research team selected eight popular card games based on their complexity and the availability of high-quality models and data [8].
- The learning process generated high-quality interaction data through teacher models and opponents, from which the large language models learned effectively (a hedged sketch of this distillation step follows after this summary) [14][16].
- Game complexity determined the number of training instances collected; more complex games like DouDizhu and GuanDan required larger datasets [20][21].

Group 3: Inter-Game Influence
- Models trained on similar games can enhance each other's performance, while games with significantly different rules may cause performance conflicts [52][49].
- For instance, models trained on GuanDan performed well on DouDizhu, suggesting positive skill transfer between these games [45].

Group 4: Generalization and Capability
- Training on card games can degrade the models' general capabilities, but mixing general data into the training process mitigates this [56][54].
- The mixed-training approach recovered some general capability, demonstrating the balance between specialized game skills and broader knowledge [56].
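To make the distillation step above concrete, here is a minimal sketch of serializing a teacher policy's game play into chat-style SFT examples. The `env` and `teacher` objects are hypothetical stand-ins for the game engines and teacher models described in the article; the actual prompts, state rendering, and filtering rules may differ.

```python
# Hedged sketch: turn teacher-policy card-game trajectories into
# chat-style SFT examples. `env` and `teacher` are hypothetical
# stand-ins for the game engines and teacher models summarized above.

def collect_sft_data(env, teacher, n_episodes=1000):
    data = []
    for _ in range(n_episodes):
        state, done = env.reset(), False
        while not done:
            action = teacher.act(state)  # teacher model picks a move
            data.append({
                "messages": [
                    # rules, current hand, and play history rendered as text
                    {"role": "user", "content": env.render_text(state)},
                    {"role": "assistant", "content": env.action_to_text(action)},
                ]
            })
            state, done = env.step(action)
    return data  # optionally keep only winning episodes before fine-tuning
```

Filtering to winning trajectories is one common way such pipelines retain only high-quality interaction data, consistent with the article's emphasis on data quality.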
Self-Search Reinforcement Learning (SSRL): Agentic RL's Sim2Real Moment
Ji Qi Zhi Xin· 2025-09-02 01:27
Core Insights
- The article discusses the development and effectiveness of SSRL (Self-Search Reinforcement Learning) in improving the training efficiency and stability of Search Agents built on large language models (LLMs) [6][28]
- SSRL outperforms traditional methods that rely on external search engines, achieving effective transfer from simulation to real-world applications (Sim2Real) [6][28]

Group 1
- SSRL uses structured prompts and format rewards to extract world knowledge from the models themselves, improving performance across benchmarks and reducing hallucination (a hedged sketch of such a format reward follows after this summary) [2][6]
- The research highlights the high cost and inefficiency of current RL training for Search Agents, including both full-real and semi-real search approaches [7][13]
- SSRL increases training efficiency by roughly 5.6x while training rewards keep rising without collapse [31][32]

Group 2
- Experiments show that models trained with SSRL outperform those relying on external engines, particularly in real-world search scenarios, underscoring the importance of integrating real-world knowledge [28][31]
- The findings suggest that combining self-generated knowledge with real-world knowledge can further improve model performance, particularly through entropy-guided search strategies [34]
- Combining SSRL with TTRL (Test-Time Reinforcement Learning) improves generalization and effectiveness, with performance gains of up to 67% on certain tasks [38][39]
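As a concrete illustration of the "structured prompts and format rewards" idea, below is a minimal reward sketch in which a format check gates the answer reward. The tag schema (`<think>`/`<search>`/`<information>`/`<answer>`) and the scoring values are illustrative assumptions, not the paper's exact design.

```python
import re

# Hedged sketch of a format-gated reward for self-search rollouts.
# Tag names and reward values are illustrative assumptions.
TRAJECTORY = re.compile(
    r"<think>.*?</think>\s*"                 # internal reasoning
    r"(?:<search>.*?</search>\s*"            # self-generated query ...
    r"<information>.*?</information>\s*)*"   # ... and self-generated "results"
    r"<answer>(.*?)</answer>",               # final answer
    re.DOTALL,
)

def reward(completion: str, gold: str) -> float:
    """Malformed trajectories score 0; well-formed ones earn a small
    format bonus plus full credit for a correct final answer."""
    m = TRAJECTORY.fullmatch(completion.strip())
    if m is None:
        return 0.0
    correct = m.group(1).strip().lower() == gold.strip().lower()
    return 1.0 if correct else 0.1
```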
Wu Wei: China's Tech Rise Sounds the Clarion Call of AI Equality
Huan Qiu Wang Zi Xun· 2025-09-01 22:53
Group 1
- Time magazine's 2025 Global AI Influence List features several Chinese entrepreneurs and scholars, a marked increase in representation and diversity compared with previous years [1]
- The rise of Chinese figures on the list reflects the rapid development of China's AI industry and its growing presence on the international stage, as well as the global trend toward "de-geographicalization" in technology [1]
- DeepSeek's open-source technology path contributes to a more inclusive global technology landscape, enhancing the openness of and participation in the AI industry [1]

Group 2
- Southeast Asia is actively seizing opportunities from the "de-geographicalization" wave in AI; the region's digital economy is projected to reach $2 trillion by 2030, with the AI market expected to exceed $580 billion [2]
- Countries such as Singapore, Malaysia, and Indonesia are implementing national AI strategies and attracting major investments from large tech companies, signaling a shift toward technological self-sufficiency [2]
- The rise of local innovation in developing countries is seen as a way to break external technological monopolies and empower these nations as creators of AI technology [2]

Group 3
- Although top AI talent remains concentrated in the U.S., Chinese talent now accounts for 38% of researchers at top U.S. AI institutions, surpassing the 37% share of local talent [3]
- The growth of homegrown talent and the return of overseas scholars point to a promising future for China's talent strategy of local cultivation and repatriation [3]
- China's AI industry is characterized by a systematic innovation paradigm driven by top-level policy, autonomous innovation, and a commitment to long-termism [3]

Group 4
- The performance gap between Chinese and U.S. large models has narrowed dramatically, from 17.5% in 2023 to just 0.3% [4]
- China's distinctive advantages in open-source ecosystem building and vertical application innovation contributed to this rapid advance [4]
- The success of China's AI rise is attributed to an open, symbiotic ecosystem that fosters talent and continuous innovation, offering a valuable model for global AI development [4]
Alibaba Open-Sources Three Large Models with Performance Rivaling the International State of the Art
Sou Hu Cai Jing· 2025-08-21 00:10
Core Insights
- Alibaba has open-sourced three self-developed large models—Qwen2.5, Qwen2-VL, and Qwen-Audio—marking a significant advance in AI technology and showcasing China's strength in foundational AI [2][4][6]

Technical Breakthroughs
- The three models span text, vision, and audio, forming a "full-modal" technology matrix. Qwen2.5 matches or surpasses international benchmarks such as GPT-4 on tasks like mathematical reasoning and code generation, reaching 87.3% accuracy on the MMLU benchmark [4]
- Qwen2-VL focuses on multi-modal visual understanding; its dynamic resolution adaptation improves efficiency by 40% when processing high-resolution medical images or complex scenes, and it achieves 78.6% accuracy on the VQAv2 dataset [4]
- Qwen-Audio advances audio processing, raising recognition accuracy in noisy environments to 92%, a 15-percentage-point improvement over the previous model, and supporting applications in intelligent customer service and accessibility [4]

Open Source Ecosystem
- Alibaba fully open-sourced the model weights and training code along with detailed technical white papers, in contrast to the closed-source API model of some international companies (a minimal local-inference example follows after this summary). The Qwen series collected over 20,000 GitHub stars within 48 hours, with developers from 120 countries submitting more than 3,000 optimization suggestions [5]
- This open-source strategy is seen as a new way for tech giants to build technological moats: it attracts global developers to improve the ecosystem while enabling value capture through cloud services and customized development [5]

Industry Impact
- Open-sourcing these models signals a shift in China's AI technology from "following innovation" to "leading breakthroughs," establishing a competitive position against Western foundational AI models [6]
- A flourishing open-source ecosystem is expected to accelerate the democratization of AI, enabling developers in regions such as Africa and Southeast Asia to build localized solutions and reshape the global AI industry landscape [6]
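For readers who want to try the open-sourced weights, the snippet below shows one common way to run Qwen2.5 locally with Hugging Face transformers. The model ID follows the public Hugging Face naming; check the official model card for exact hardware requirements and licensing.

```python
# Minimal local-inference example for the open-sourced Qwen2.5 weights,
# using the standard Hugging Face transformers chat workflow.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"  # public HF model ID; verify on the model card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize what dynamic resolution adaptation does."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```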
ARPO: Agentic Reinforced Policy Optimization, Letting Agents Explore One More Step at Critical Moments
Ji Qi Zhi Xin· 2025-08-09 06:02
Core Viewpoint
- The article introduces Agentic Reinforced Policy Optimization (ARPO), a method designed to improve the performance of large language models (LLMs) in multi-round interactions by addressing the uncertainty and under-exploration that arise during tool usage [3][41]

Group 1: Research Motivation and Background
- Agentic Reinforcement Learning (RL) has emerged from the need for LLMs to engage in dynamic multi-round interactions with external tools, moving from static problem-solving to an interactive agent-environment reasoning paradigm [8]
- Existing Agentic RL methods often undervalue multi-round interaction because of sparse rewards and tool overuse, leaving fine-grained exploration of tool usage untapped [8][41]
- The study observes a significant entropy (uncertainty) spike right after tool calls, an exploration opportunity that current methods do not fully exploit [14][16]

Group 2: ARPO Methodology
- ARPO introduces an entropy-driven adaptive rollout strategy that increases exploration during high-entropy tool-usage phases, yielding more diverse reasoning paths (a hedged sketch of the branching decision follows after this summary) [11][20]
- The method proceeds in four steps: initializing a global rollout, monitoring entropy changes, adaptively branching on entropy spikes, and applying termination conditions to end the rollout [24][27]
- ARPO adds advantage attribution estimation so the model better internalizes the value differences of tool usage at each step [28][30]

Group 3: Experimental Results
- ARPO outperforms existing sample-level RL methods, achieving better results with only half the tool-call budget across 13 challenging benchmarks, demonstrating its efficiency for training multi-round reasoning agents [21][41]
- It shows consistent gains on metrics such as Pass@3 and Pass@5, particularly in dynamic, multi-round tasks [37][39]
- In comparative tests, ARPO achieves higher accuracy than GRPO and DAPO on tasks including deep search and knowledge-intensive reasoning [41][42]

Group 4: Future Directions
- Future work may extend ARPO to multi-modal tasks, going beyond text-based reasoning to images and video [42]
- Integrating a broader range of external tools could further improve complex-task performance through optimized tool-usage strategies [42]
- Scaling ARPO to larger models and deploying it in dynamic, real-time environments could further improve its practical value and cost-effectiveness [42]
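To illustrate the entropy-driven branching decision described above, here is a minimal sketch. The window size, threshold, and branching policy are illustrative assumptions, not the paper's reported hyperparameters.

```python
import torch
import torch.nn.functional as F

# Hedged sketch of ARPO-style entropy monitoring after a tool call.
# The threshold and window length are illustrative assumptions.

def mean_token_entropy(logits: torch.Tensor) -> float:
    """Average Shannon entropy over a window of next-token distributions.
    logits: (window_len, vocab_size)."""
    logp = F.log_softmax(logits, dim=-1)
    return float(-(logp.exp() * logp).sum(dim=-1).mean())

def should_branch(post_tool_logits: torch.Tensor,
                  baseline_entropy: float, delta: float = 0.3) -> bool:
    """Branch extra partial rollouts when entropy right after a tool
    response jumps well above the trajectory's running baseline."""
    return mean_token_entropy(post_tool_logits) - baseline_entropy > delta

# Inside the rollout loop: after appending each tool response, score the
# next few decoding steps; if should_branch(...) is True, fork k sibling
# continuations from this prefix, subject to a global rollout budget.
```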
Supervised Learning Isn't Dead: One Question and Five Hours of Training Takes Off! Chinese Scholars' New Method Unlocks LLM Reasoning at 20x Training Efficiency
Liang Zi Wei· 2025-08-04 07:00
Core Viewpoint
- The article discusses One-Shot Critique Fine-Tuning (One-Shot CFT), a breakthrough method that strengthens the reasoning of large language models (LLMs) with minimal data and compute, outperforming traditional reinforcement learning (RL) and small-scale supervised fine-tuning (SFT) approaches [1][3][14]

Group 1: One-Shot CFT Methodology
- One-Shot CFT teaches models to reason by analyzing the quality of answers rather than merely imitating them, providing a deeper learning signal [3][12]
- The pipeline selects one representative task, generates multiple answers with various models, and has a stronger model critique those answers; the critiques serve as the supervision signal for training (a hedged sketch of this data-construction step follows after this summary) [4][5]
- The entire training run needs only one question, multiple answers, and critiques, taking approximately 5 GPU hours, significantly less than RL methods [5][14]

Group 2: Performance and Results
- In experiments, Qwen2.5-Math-7B gained 15% accuracy after One-Shot CFT fine-tuning on a single question, surpassing both RL models and fully supervised fine-tuned models trained on tens of thousands of samples [9][10]
- The method performed strongly across mathematical and logical reasoning tasks, with accuracy gains of 10% to 16% on specific sub-tasks [10][11]
- One-Shot CFT proved stable and reproducible across tasks and model configurations, indicating robustness [11][13]

Group 3: Advantages of One-Shot CFT
- The method emphasizes critical learning: models come to understand why answers are right or wrong, deepening learning relative to traditional SFT [12]
- It introduces multi-perspective input by generating multiple answers and critiques for a single task, more closely mimicking how humans learn [12]
- The critique-based training signal generalizes well, reducing the risk of overfitting and transferring more easily to new tasks [12]

Group 4: Accessibility and Practical Implications
- The low computational cost makes One-Shot CFT accessible to individual researchers, resource-limited labs, and startups as a cost-effective way to improve reasoning capabilities [14][15]
- The entire pipeline is open-source, including training scripts, model parameters, and datasets, which greatly lowers the barrier to replication and experimentation [17]
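A minimal sketch of the one-question pipeline described above follows. The helpers `sample_answers` and `write_critique` are hypothetical stand-ins for calls to the candidate models and the stronger critic model; the actual prompts and filtering live in the open-source release.

```python
# Hedged sketch of the One-Shot CFT data pipeline: one question, many
# answers, one critique per answer. `sample_answers` and `write_critique`
# are hypothetical stand-ins for the actual model calls.

def build_cft_dataset(question, candidate_models, critic, n_per_model=8):
    examples = []
    for model in candidate_models:
        for answer in sample_answers(model, question, n=n_per_model):
            critique = write_critique(critic, question, answer)
            examples.append({
                "prompt": (
                    f"Question:\n{question}\n\n"
                    f"Candidate solution:\n{answer}\n\n"
                    "Critique this solution step by step."
                ),
                "completion": critique,  # the critique is the supervision signal
            })
    return examples  # then run standard SFT (~5 GPU hours per the article)
```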