Big Tech's Data Moat Broken! Shanghai Jiao Tong University's Fully Open-Source Search Agent OpenSeeker Debuts
机器之心· 2026-03-31 12:19
Core Insights
- OpenSeeker, developed by a research team at Shanghai Jiao Tong University, is the first deep search agent to be fully open-sourced together with its complete training data, breaking the data monopoly held by large companies [2][28].
- The model demonstrates that high-quality data synthesis can achieve state-of-the-art (SOTA) performance without relying on extensive computational resources [2][28].

Group 1: Model Development
- OpenSeeker uses a distinctive high-quality data synthesis approach to overcome the data bottleneck that typically favors large enterprises [6][28].
- The model requires only 11.7k synthetic samples and a single round of supervised fine-tuning (SFT) to achieve competitive results across benchmarks [17][28].

Group 2: Training Methodology
- Training a deep search agent hinges on two critical steps: constructing challenging question-answer tasks and generating high-quality solution trajectories [7][8].
- OpenSeeker employs a fact-based question construction method built on real web structures, ensuring the model engages in genuine multi-hop reasoning [9][10][11].
- A dynamic denoising trajectory synthesis method is introduced to strengthen core information extraction in noisy environments [12][15].

Group 3: Performance Metrics
- OpenSeeker scored 48.4% on the BrowseComp-ZH leaderboard, surpassing the 46.7% achieved by Alibaba's extensively trained Tongyi DeepResearch [17][18].
- Across benchmarks, the model scores 29.5 on BrowseComp, 48.4 on BrowseComp-ZH, 74.0 on xbench, and 59.4 on WideSearch [18].

Group 4: Data Quality and Challenges
- OpenSeeker's synthetic data is substantially harder than existing benchmarks, averaging 46.35 tool calls per trajectory and 76.1k tokens in length [20][25].
- In volume-controlled comparisons, OpenSeeker's data quality remains clearly superior to that of Alibaba's models across a range of metrics [20][21].

Group 5: Community Impact
- The open-source release of OpenSeeker is seen as a pivotal moment for the field, giving researchers a solid foundation for exploring next-generation search agents [24][28].
- Community responses emphasize the importance of data transparency and the ability to innovate without the constraints of data gatekeeping [26][29].
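The fact-based question construction described above, chaining facts along real web link structure so that answering requires genuine multi-hop reasoning, can be illustrated with a toy sketch. Everything here (the `PAGES` graph, `build_multi_hop_question`) is hypothetical for illustration and is not OpenSeeker's actual pipeline:

```python
# Toy "web": each page holds one (subject, relation, object) fact and
# links onward to other pages. A stand-in for real crawled structure.
PAGES = {
    "band_page":  {"fact": ("band X", "formed in", "1999"),   "links": ["city_page"]},
    "city_page":  {"fact": ("band X", "formed in city", "Y"), "links": ["river_page"]},
    "river_page": {"fact": ("city Y", "lies on river", "Z"),  "links": []},
}

def build_multi_hop_question(start: str, hops: int) -> dict:
    """Chain facts along real link structure so the answer can only be
    reached by following every hop, not by a single lookup."""
    chain, page = [], start
    for _ in range(hops):
        entry = PAGES[page]
        chain.append(entry["fact"])
        if not entry["links"]:
            break
        page = entry["links"][0]
    # The final object in the chain is the answer; the question
    # composes the intermediate relations into one query.
    answer = chain[-1][2]
    question = " -> ".join(f"{s} {r} ?" for s, r, _ in chain)
    return {"question": question, "answer": answer, "hops": len(chain)}
```

Because each intermediate object is hidden behind a "?", a solver must resolve hop 1 before it can even pose hop 2, which is what makes such questions resistant to single-shot retrieval.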
xbench Leaderboard Update! DeepSeek V3.2 Catches Up to GPT-5.1 | xbench Monthly Report
红杉汇· 2025-12-05 00:06
Core Insights
- The latest xbench-ScienceQA leaderboard has been released, featuring new models from six companies; Gemini 3 Pro achieves state-of-the-art (SOTA) performance, while DeepSeek V3.2 matches GPT-5.1 in score at high cost-effectiveness [1][2][6]
- xbench will introduce two new benchmarks to evaluate agents' instruction-following capabilities and models' multimodal understanding [1]

Model Performance Summary
- **Gemini 3 Pro**: Scored 71.6, up from Gemini 2.5 Pro's 59.4, with a BoN of 85; average response time is 48.62 seconds, and answering 500 questions costs approximately $3 [3][6]
- **DeepSeek V3.2**: Scored 62.6, matching GPT-5.1, with a BoN of 81; answering 500 questions costs only $2 for the Speciale version and $1.3 for the Thinking version [6]
- **Claude Opus 4.5**: Scored 55.2 with a fast 13-second average response time, improving on its predecessor [6]
- **Kimi K2 Thinking**: Scored 51.8 with a BoN of 76, a slight improvement [6]

New Model Developments
- **DeepSeek V3.2**: Introduces a Sparse Attention mechanism to enhance long-context performance while reducing computational complexity; it also adds a scalable reinforcement learning framework to improve reasoning and instruction following [10][12]
- **Gemini 3**: A new multimodal model from Google DeepMind that excels in reasoning depth and multimodal understanding, reaching a top score of 1501 Elo on LMArena [13]
- **Nano Banana Pro**: A new image generation model that integrates advanced reasoning capabilities with real-time knowledge, allowing complex image synthesis [14]
- **Claude Opus 4.5**: Anthropic's flagship model, excelling in code generation and human-computer interaction, with high performance on real-world software engineering tasks [15][16]
- **GPT-5.1**: An important OpenAI iteration that enhances conversational fluency and complex-task reasoning, introducing adaptive reasoning mechanisms [17]
- **Tongyi DeepResearch**: Designed for deep research tasks, combining mid-training and post-training frameworks to enhance agent capabilities and achieving competitive performance with a smaller model [19]
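The summary names DeepSeek V3.2's Sparse Attention only at a high level; its exact mechanism is not described here. As a hedged illustration of the general idea, a generic top-k sparse attention sketch shows how each query can attend to only its `keep` best-scoring keys, cutting long-context cost:

```python
import math

def topk_sparse_attention(q, keys, values, keep):
    """Generic top-k sparse attention: each query attends only to its
    `keep` highest-scoring keys, so per-query cost tracks `keep` rather
    than full sequence length. Illustrative only, not V3.2's design."""
    d = len(q)
    # Scaled dot-product scores against every key.
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    # Keep only the top-`keep` keys; the rest are dropped from the softmax.
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:keep]
    m = max(scores[i] for i in top)                      # for numerical stability
    weights = {i: math.exp(scores[i] - m) for i in top}
    z = sum(weights.values())
    # Weighted sum of the surviving value vectors.
    out = [0.0] * len(values[0])
    for i, w in weights.items():
        for j, vj in enumerate(values[i]):
            out[j] += (w / z) * vj
    return out
```

With `keep` equal to the number of keys, this reduces to ordinary dense attention; shrinking `keep` is what trades a little fidelity for much less computation on long contexts.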
Alibaba Cloud's AI Boom: Can This Momentum Drive Even Higher Growth?
ZACKS· 2025-09-22 18:15
Core Insights
- Alibaba's cloud computing unit grew a significant 26% year-over-year in Q1 of fiscal 2026, reflecting strong enterprise adoption of its AI models [1][9]
- The company is making substantial investments in AI infrastructure, which has increased capital expenditures and cash outflows but positions it to benefit from the growing AI market [2][9]
- Recent innovations in Alibaba's AI model portfolio, including the open-sourcing of Wan2.2-Animate and the release of Tongyi DeepResearch, are enhancing its AI ecosystem [3][4]

Financial Performance
- The cloud division's growth is attributed to technological advancements and strategic partnerships, such as the collaboration with S&P Global to provide AI-ready datasets [4]
- Despite the strong growth, the company faces challenges in maintaining profitability due to price reductions in AI infrastructure aimed at user acquisition [5]
- Alibaba's shares have gained 94.2% year-to-date, outperforming the broader Internet-Commerce industry's 14.9% growth [9][10]

Competitive Landscape
- While Alibaba's cloud growth is notable, global competitors retain significant advantages: Microsoft Azure grew 39% and Amazon Web Services 17.5% [7]
- Microsoft and Amazon also have deeper capital resources and established enterprise relationships, which complicate Alibaba's international expansion efforts [7]

Valuation and Estimates
- Alibaba's stock currently trades at a forward Price/Earnings ratio of 17.41X versus the industry's 25.33X, indicating potential undervaluation [13]
- The Zacks Consensus Estimate for fiscal 2026 earnings is $8.09 per share, reflecting a 10.21% year-over-year decline [16]
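The valuation figures above can be sanity-checked with simple arithmetic: forward P/E is price divided by expected next-year earnings per share, so the multiple and the EPS estimate together imply a price level. A minimal sketch, using only the article's figures (the helper name is illustrative):

```python
def implied_price(forward_pe: float, eps_estimate: float) -> float:
    """Forward P/E = price / expected EPS, so price = P/E * EPS."""
    return forward_pe * eps_estimate

# Figures from the article: 17.41X forward P/E, $8.09 FY2026 EPS estimate.
price = implied_price(17.41, 8.09)  # 17.41 * 8.09 = 140.8469, i.e. ~$140.85
```

The same identity run with the industry's 25.33X multiple is what makes the gap read as potential undervaluation: the same earnings stream priced at the industry multiple would imply a substantially higher price.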