Open-source models
Alibaba released N new models in one go; let us pay tribute to the god of open source.
数字生命卡兹克· 2025-09-24 05:28
Core Viewpoint
- Alibaba's recent cloud conference showcased a comprehensive range of new AI models, indicating a significant investment in AI technology and a commitment to building a robust AI ecosystem [1][64].
Group 1: New Model Releases
- The Qwen3-Max model was introduced as a direct competitor to top models like GPT-5 and Claude Opus 4, featuring over 1 trillion parameters and trained on 36 trillion tokens [3][6].
- Qwen3-Max has two versions: the Instruct version for general use and a more advanced Thinking version, which is not yet publicly available [8][15].
- The Wan2.5 model was launched, enhancing capabilities for audio-visual synchronization, allowing users to generate videos from images and audio [20][32].
- Qwen3-VL, a powerful visual language model, supports a context of 256K tokens and can be extended to 1 million tokens, outperforming some competitors in specific tasks [33][37].
- Qwen3-Omni, an end-to-end multimodal model, supports various input types and languages, showcasing Alibaba's extensive capabilities in AI [45][48].
Group 2: Performance and Capabilities
- Qwen3-Max achieved top scores in various AI benchmarks, including a perfect score in challenging math reasoning competitions [11][15].
- The models demonstrate advanced reasoning and agent capabilities, allowing them to perform complex tasks and interact with tools effectively [40][41].
- The new models are designed to enhance user experience in applications such as digital content creation and real-time translation, with low latency and high accuracy [49][59].
Group 3: Additional Innovations
- Alibaba introduced several other models, including Qwen3-Coder-Plus for improved coding efficiency and Fun-ASR for advanced speech recognition [54][57].
- The company is also focusing on safety with models like Qwen3Guard, aimed at ensuring AI security in real-time applications [60].
- The overall strategy reflects Alibaba's ambition to create a comprehensive AI ecosystem that spans various modalities and applications [68][70].
On the road to artificial superintelligence, Wu Yongming says Alibaba's goal is to build the operating system of the AI era
Di Yi Cai Jing· 2025-09-24 03:29
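Second, he judges that the AI Cloud is the next-generation computer. Computing power is rapidly shifting from CPU-centric computing to GPU-centric AI computing driven by large models. The new computing paradigm requires denser compute, more efficient networks, and larger clusters, and only hyperscale infrastructure and full-stack technical depth can carry that demand. He believes there may be only five or six super cloud computing platforms in the world in the future.
AGI is not the end point. Wu Yongming believes AI will pass through three stages before ultimately growing into artificial superintelligence. The first stage is the emergence of intelligence, in which AI learns from humans. The second stage is autonomous action, in which AI assists humans; we are only at the beginning of this stage, and in the future there may be more agents and robots than the world's population working alongside humans. The third stage is self-iteration, in which AI surpasses humans. Crossing into this stage requires two elements: AI will gradually connect to nearly all scenarios and data in the physical world, and models will be able to learn on their own, obtaining new data through continuous interaction with the real world to achieve self-iteration and intelligence upgrades.
On the road to this transformation, Wu Yongming made several predictions. First, he believes large models will be the next-generation operating system: in future interactions between the physical and digital worlds, large models will play the role that operating systems play today. Every industry and every user will carry out tasks through tools built on large models, and natural language may well be the programming language of the AI era.
Wu Yongming believes that large models will eventually run on all computing devices, and on this basis Alibaba is committed to open source ...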
Alibaba Cloud CTO Zhou Jingren: Tongyi Qianwen has open-sourced 300+ models, with cumulative downloads exceeding 600 million
Xin Lang Ke Ji· 2025-09-24 02:59
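Sina Tech, morning of September 24: At the 2025 Apsara Conference (Yunqi Conference), Zhou Jingren, Chief Technology Officer of Alibaba Cloud Intelligence Group, revealed in his talk that Tongyi Qianwen has so far released more than 300 open-source models, covering all sizes and all modalities, and that downloads of these open-source models have surpassed 600 million. At the conference, Alibaba Cloud also released a number of new models, including Qwen3-VL. According to Zhou Jingren, Tongyi Wanxiang has so far generated more than 390 million images and more than 70 million videos. (Wen Meng) ... Editor: Jiang Yuhan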
Secretly providing model testing for OpenAI, OpenRouter has built a "gateway system" for LLMs
海外独角兽· 2025-09-23 07:52
Core Insights
- The article discusses the differentiation of large model companies in Silicon Valley, highlighting OpenRouter as a key player in model routing, which has seen significant growth in token usage [2][3][6].
Group 1: OpenRouter Overview
- OpenRouter was established in early 2023, providing a unified API Key for users to access various models, including mainstream and open-source models [6].
- The platform's token usage surged from 405 billion tokens at the beginning of the year to 4.9 trillion tokens by September, marking an increase of over 12 times [2][6].
- OpenRouter addresses three major pain points in API calls: lack of a unified market and interface, API instability, and balancing cost with performance [7][9].
Group 2: Model Usage Insights
- OpenRouter's model usage reports have sparked widespread discussion in the developer and investor communities, becoming essential reading [3][10].
- The platform provides insights into user data across different models, helping users understand model popularity and performance [10].
Group 3: Founder Insights
- Alex Atallah, the founder of OpenRouter, believes that the large model market is not a winner-takes-all scenario, emphasizing the need for developers to control model routing based on their requests [3][18].
- Atallah draws parallels between OpenRouter and his previous venture, OpenSea, highlighting the importance of integrating disparate resources into a cohesive platform [19][20].
Group 4: OpenRouter Functionality
- OpenRouter functions as a model aggregator and marketplace, allowing users to manage over 470 models through a single interface [31].
- The platform employs intelligent load balancing to route requests to the most suitable providers, enhancing reliability and performance [37].
- OpenRouter aims to empower developers by providing a unified view of model access, allowing them to choose the best models based on their specific needs [34][35].
Group 5: Future Directions
- OpenRouter is exploring the potential of personalized models based on user prompts while ensuring user data remains private unless opted in for recording [52][55].
- The platform aims to become the best reasoning layer for agents, providing developers with the tools to create intelligent agents without being locked into specific suppliers [58][60].
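The routing idea summarized above (one interface in front of many providers, with requests steered toward a suitable and currently reliable one) can be illustrated with a minimal sketch. This is not OpenRouter's actual algorithm or API. The `Provider` class, the prices, and the cheapest-first fallback policy are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Provider:
    """A hypothetical upstream endpoint serving one model."""
    name: str
    cost_per_1k_tokens: float       # illustrative price, not real data
    call: Callable[[str], str]      # takes a prompt, returns a completion
    healthy: bool = True


class AllProvidersFailed(RuntimeError):
    """Raised when every candidate provider errored or was unhealthy."""


def route(prompt: str, providers: List[Provider]) -> str:
    """Try healthy providers from cheapest to priciest, falling back on errors."""
    candidates = sorted(
        (p for p in providers if p.healthy),
        key=lambda p: p.cost_per_1k_tokens,
    )
    errors = []
    for provider in candidates:
        try:
            return f"[{provider.name}] {provider.call(prompt)}"
        except Exception as exc:    # a real router would catch narrower errors
            provider.healthy = False    # skip this provider on later requests
            errors.append(f"{provider.name}: {exc}")
    raise AllProvidersFailed("; ".join(errors) or "no healthy providers")


def flaky(prompt: str) -> str:
    """Stand-in for an unstable upstream API."""
    raise TimeoutError("upstream timed out")


def echo(prompt: str) -> str:
    """Stand-in for a working upstream API."""
    return prompt.upper()


providers = [
    Provider("cheap-but-flaky", 0.10, flaky),
    Provider("stable", 0.25, echo),
]
result = route("hello", providers)
print(result)  # [stable] HELLO
```

The cheaper provider is tried first, fails, and is marked unhealthy, so the request falls through to the second one. A production router would presumably also weigh latency, rate limits, and health recovery, which this sketch leaves out.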
Zhu Xiaohu: Moving out of China and pretending not to be a Chinese AI startup is useless
Hu Xiu· 2025-09-20 14:15
Group 1
- The discussion highlights the impact of DeepSeek and Manus on the AI industry, emphasizing the importance of open-source models in China and their potential to rival closed-source models in the US [3][4][5]
- The conversation indicates that the open-source model trend is gaining momentum, with Chinese models already surpassing US models in download numbers on platforms like Hugging Face [4][5]
- The competitive landscape is shifting towards "China's open-source vs. America's closed-source," with the establishment of an open-source ecosystem being beneficial for China's long-term AI development [6][7]
Group 2
- Manus is presented as a case study for Go-to-Market strategies, illustrating that while Chinese entrepreneurs have strong product capabilities, they often lack effective market entry strategies [10][11]
- Speed is identified as a critical barrier for AI application companies, with the need to achieve rapid growth to outpace competitors [11][12]
- Token consumption is discussed as a significant cost indicator, with Chinese companies focusing on this metric due to lower willingness to pay among domestic users [12][13][14]
Group 3
- The AI coding sector is characterized as a game dominated by large companies, with high token costs making it challenging for startups to compete effectively [15][16]
- The conversation suggests that AI coding is not a viable area for startups due to the lack of customer loyalty among programmers and the high costs associated with token consumption [16][18]
- Investment in vertical applications rather than general-purpose agents is preferred, as the latter may be developed by model manufacturers themselves [20]
Group 4
- The discussion on robotics emphasizes investment in practical, value-creating robots rather than aesthetically pleasing ones, with examples of successful projects like a boat-cleaning robot [21][22]
- The importance of combining functionality with sales capabilities in robotic applications is highlighted, as this can lead to a more favorable ROI [22][23]
Group 5
- The conversation stresses the need for AI hardware companies to focus on simplicity and mass production rather than complex features, as successful hardware must be deliverable at scale [28][29]
- The potential for new hardware innovations in the AI era is questioned, with a belief that significant breakthroughs may still be years away [30][31]
Group 6
- The dialogue addresses the challenges of globalization for Chinese companies, noting that successful market entry in the US requires a deep understanding of local dynamics and compliance [36][37]
- The importance of having a local sales team for B2B applications in the US is emphasized, as relationships play a crucial role in sales success [38][39]
Group 7
- The conversation highlights the risks associated with high valuations, which can limit a company's flexibility and increase pressure for performance [42][43]
- The discussion suggests that IPOs for Chinese companies may increasingly occur in Hong Kong rather than the US, as liquidity issues persist in the market [46][48]
Group 8
- The need for startups to operate outside the influence of large companies is emphasized, with a call for rapid growth and innovation in the AI sector [49][53]
- The potential for AI startups to achieve significant scale quickly is acknowledged, but the conversation warns that the speed of evolution in the AI space may outpace traditional exit strategies [52][53]
Top-tier open-source models Qwen3 and DeepSeek-V3.1 have both been "taken in" by the leader in cloud computing
机器之心· 2025-09-19 10:43
Core Insights
- Amazon Web Services (AWS) is enhancing its AI capabilities by integrating new models into its Amazon Bedrock and Amazon SageMaker platforms, allowing users to choose from a diverse range of AI models [2][5][39]
- The recent addition of two significant domestic models, Qwen3 and DeepSeek-V3.1, showcases AWS's commitment to providing a comprehensive ecosystem for AI development [3][7][11]
- AWS emphasizes the importance of model choice, asserting that no single model can address all challenges, and advocates for a multi-model approach to meet complex real-world demands [5][39]
Summary by Sections
Model Integration
- AWS has recently integrated OpenAI's new open-source models into its AI platforms, alongside the domestic models Qwen3 and DeepSeek-V3.1, which are now available globally on Amazon Bedrock [2][3][4]
- The integration of these models reflects AWS's agility in the global AI competition and its strategy of offering diverse options to developers and enterprises [5][7]
Qwen3 Model
- Qwen3, developed by Alibaba, is a new generation model that excels in reasoning, instruction following, multilingual support, and tool invocation, significantly reducing deployment costs and hardware requirements [9][10]
- The model features a hybrid architecture, supporting both MoE and dense configurations, which enhances its performance across various applications [10][13]
- Qwen3 supports a context window of 256K tokens, expandable to 1 million tokens, allowing it to handle extensive codebases and long conversations effectively [10]
DeepSeek-V3.1 Model
- DeepSeek-V3.1 is recognized for its efficient reasoning capabilities and competitive pricing, making it a popular choice for enterprises [11][12]
- AWS is the first overseas cloud provider to offer a fully managed version of DeepSeek, enhancing its service offerings [12][16]
- The model supports both thinking and non-thinking modes, improving adaptability and efficiency in various applications [14]
Performance and User Experience
- Both Qwen3 and DeepSeek models have demonstrated strong performance in practical tests, showcasing their capabilities in code generation and complex reasoning tasks [19][23][31]
- The Amazon Bedrock platform currently hosts 249 models, providing users with a wide array of options for different applications, from general dialogue to code assistance [16]
Strategic Vision
- AWS's strategy, encapsulated in the "Choice Matters" philosophy, aims to empower customers with the freedom to select and customize models according to their specific needs [39][40]
- This approach not only enhances innovation potential but also positions AWS as a neutral and reliable infrastructure provider in the AI landscape [40][41]
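The context-window figures quoted for Qwen3 above (256K tokens natively, expandable to 1 million) suggest a simple pre-flight check before sending a long codebase or conversation to a model. The sketch below is illustrative only. Reading "256K" as 256 × 1024 tokens, the output reservation of 8,192 tokens, and the function name `plan_context` are all assumptions, not values taken from the article or from any vendor documentation.

```python
NATIVE_WINDOW = 256 * 1024     # "256K" read as 262,144 tokens (assumption)
EXTENDED_WINDOW = 1_000_000    # "1 million tokens" as stated in the summary


def plan_context(prompt_tokens: int, reserved_for_output: int = 8_192) -> str:
    """Classify a request by the window it would need.

    Returns "native", "extended", or "chunk" (too large even for the
    extended window, so the input would have to be split).
    """
    if prompt_tokens < 0 or reserved_for_output < 0:
        raise ValueError("token counts must be non-negative")
    needed = prompt_tokens + reserved_for_output
    if needed <= NATIVE_WINDOW:
        return "native"
    if needed <= EXTENDED_WINDOW:
        return "extended"
    return "chunk"


for tokens in (50_000, 300_000, 1_200_000):
    print(tokens, plan_context(tokens))
```

Reserving room for the model's output matters because the window is typically shared between prompt and completion; the exact accounting depends on the serving platform.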
Tongyi DeepResearch makes a stunning debut! Performance on par with OpenAI, with model, framework, and solution fully open-source
机器之心· 2025-09-18 01:01
Core Insights
- The article discusses the advancements of Tongyi DeepResearch, highlighting its transition from basic conversational capabilities to sophisticated research functionalities, achieving state-of-the-art (SOTA) results across multiple benchmarks while being fully open-source [1][3].
Data Strategy
- The improvement in model capabilities is attributed to a multi-stage data strategy designed to generate high-quality training data without relying on expensive manual annotations [5].
- The team introduced Agentic Continual Pre-training (CPT) to establish a solid foundation for the model, utilizing a systematic and scalable data synthesis approach [6].
- The data generation process involves restructuring and constructing questions based on a wide array of knowledge documents, web crawler data, and knowledge graphs, creating an open-world knowledge memory anchored by entities [6].
Reasoning Modes
- Tongyi DeepResearch features both a native ReAct Mode and a Heavy Mode for managing complex multi-step research tasks [11].
- In ReAct Mode, the model excels in a standard thinking-action-observation cycle, supporting extensive interaction rounds with a context length of 128K [12].
- Heavy Mode employs a new IterResearch paradigm to deconstruct tasks into research rounds, allowing the agent to maintain cognitive focus and high-quality reasoning [13][14].
Training Methodology
- The training process integrates Agentic CPT, Supervised Fine-Tuning (SFT), and Reinforcement Learning (RL), establishing a new paradigm for agent model training [17][20].
- The team customized RL algorithms based on GRPO, ensuring that learning signals align with the model's current capabilities, and implemented strategies to enhance training stability [21].
- Dynamic indicators during training show significant learning effects, with rewards consistently increasing, indicating effective exploration and adaptation [23].
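The thinking-action-observation cycle attributed to ReAct Mode above can be sketched as a small loop. This is a generic illustration of the ReAct pattern, not Tongyi DeepResearch's implementation. The `Policy` type, the scripted policy, the `search` tool, and the round limit are hypothetical stand-ins for what would be an LLM call and real tools.

```python
from typing import Callable, Dict, List, Tuple

# A policy maps the transcript so far to (thought, action, argument).
# In a real agent this would be an LLM call; here it is a scripted stand-in.
Policy = Callable[[List[str]], Tuple[str, str, str]]


def react_loop(question: str, policy: Policy,
               tools: Dict[str, Callable[[str], str]],
               max_rounds: int = 8) -> Tuple[str, List[str]]:
    """Run think -> act -> observe until the policy emits a 'finish' action."""
    transcript = [f"Question: {question}"]
    for _ in range(max_rounds):
        thought, action, argument = policy(transcript)
        transcript.append(f"Thought: {thought}")
        if action == "finish":
            transcript.append(f"Answer: {argument}")
            return argument, transcript
        tool = tools.get(action)
        observation = tool(argument) if tool else f"unknown tool '{action}'"
        transcript.append(f"Action: {action}({argument})")
        transcript.append(f"Observation: {observation}")
    return "no answer within round limit", transcript


def scripted_policy(transcript: List[str]) -> Tuple[str, str, str]:
    """Toy policy: look something up once, then answer with what was observed."""
    prefix = "Observation: "
    observations = [line for line in transcript if line.startswith(prefix)]
    if not observations:
        return ("I should look this up.", "search", "context length")
    return ("I have what I need.", "finish", observations[-1][len(prefix):])


toy_tools = {"search": lambda query: f"stub result for '{query}'"}
answer, transcript = react_loop(
    "What is the context length?", scripted_policy, toy_tools
)
print(answer)  # stub result for 'context length'
```

Because every thought, action, and observation is appended to a single growing transcript, long tasks consume context quickly, which is presumably why the summary stresses the 128K context length for this mode.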
Application Deployment
- Tongyi DeepResearch has empowered various internal applications within Alibaba, including the creation of a simulated training environment to reduce development costs and improve speed [27].
- The team developed a stable and efficient tool sandbox to ensure reliable tool calls during agent training and evaluation [27].
- The collaboration with the Gaode (Amap) app focuses on enhancing complex query experiences in navigation and local services, showcasing the practical application of agent capabilities [28].
Legal Intelligence
- Tongyi Farui serves as a legal intelligence agent, providing professional legal services such as legal Q&A, case law retrieval, and document drafting, leveraging innovative agent architecture [30].
- The performance metrics of Tongyi Farui indicate superior quality in answer points, case citations, and legal references compared to other models [31].
Research Contributions
- The Tongyi DeepResearch team has consistently published technical reports, contributing to the open-source community and advancing the field of deep research agents [33].
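From Apple acquisition rumors to ASML spending 1.3 billion to become the largest shareholder: uncovering Mistral AI's technical and business playbook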
36Kr· 2025-09-12 07:35
Core Insights
- Apple is reportedly considering acquiring Mistral AI, which could become its largest acquisition in history, as it seeks to enhance its AI capabilities, particularly in improving Siri's performance [3][15]
- ASML has led a €1.3 billion investment in Mistral AI's Series C funding round, making it the largest shareholder and establishing a strategic partnership, further elevating Mistral AI's profile in the tech industry [1][2][17]
- Mistral AI, founded in April 2023, has rapidly gained attention in the AI sector, achieving significant funding milestones and a valuation surge to $14 billion [1][2]
Company Overview
- Mistral AI was founded by three young talents from top institutions like DeepMind and Meta, showcasing a strong team background [1][4]
- The company has achieved remarkable funding success, including a record €105 million seed round and subsequent rounds totaling €1.7 billion, leading to a valuation increase from €5.8 billion to €14 billion in just over a year [2][26]
Technological Strengths
- Mistral AI offers a diverse range of models, including lightweight and multimodal technologies, which have garnered significant industry attention [5][8]
- The Mistral 7B model, with 7 billion parameters, demonstrates superior performance in complex reasoning and coding tasks, while the Mixtral 8×7B model has outperformed larger models in benchmark tests [8][10]
- The company is also advancing multimodal technology with the Pixtral Large model, which integrates image understanding and text generation for various applications [9][10]
Open Source and Community Engagement
- Mistral AI emphasizes open-source development, allowing global developers to access and improve its models, fostering a collaborative ecosystem [10][13]
- The open-source approach contrasts with many competitors, enhancing Mistral AI's reputation and community support [13][26]
Strategic Partnerships and Market Position
- ASML's collaboration with Mistral AI aims to integrate advanced AI models into semiconductor manufacturing processes, enhancing efficiency and performance [16][17]
- Mistral AI's unique position as a leading European AI company makes it a strategic asset amid growing concerns over reliance on American AI technologies [24][25]
Wang Xingxing speaks out: "We are still on the eve of explosive growth"
Group 1: AI Development Insights
- The AI field is still in its early stages, with significant growth expected soon, as highlighted by Wang Xingxing, CEO of Unitree Robotics (Yushu Technology) [2]
- Challenges in high-quality data collection and model algorithms are present, particularly in the integration of multimodal data and robot control [2]
- The era of innovation and entrepreneurship in AI is seen as promising, with lower barriers for young innovators [2]
Group 2: Open Data and Resources
- Open data and computational resources are essential for advancing AI, as stated by Wang Jian, founder of Alibaba Cloud [3]
- The shift from code open-sourcing to resource openness marks a revolutionary change in AI competition [3]
- The launch of the "Three-body Computing Constellation" with 12 satellites aims to process data in space, facilitating deep space exploration [3]
Group 3: AI in Healthcare
- Ant Group's CEO, Han Xinyi, emphasizes the importance of combining AI with human expertise in healthcare, focusing on personalized and precise recommendations [4]
- The dual nature of healthcare as a low-frequency behavior and health management as a high-frequency need creates fertile ground for AI applications [4]
- AI is expected to serve as an assistant to doctors, enhancing their capabilities rather than replacing them [4]
Group 4: AI Business Opportunities
- The upcoming year is anticipated to witness a significant explosion in AI applications, with new entrepreneurial opportunities emerging [5]
- The distinction between B2B and B2C AI ventures is noted, with the U.S. focusing more on B2B and China excelling in B2C [5]
- Differentiation in AI lies in creating unique user experiences beyond the AI technology itself [5]
Turing Award laureate, Wang Jian, Han Xinyi, Wang Xingxing, and others share their latest views
Zhong Guo Ji Jin Bao· 2025-09-11 11:10
Core Insights
- The 2025 Bund Conference gathered 550 guests from 16 countries to discuss the future of AI and innovation, featuring prominent figures like Richard Sutton and Wang Jian [1]
Group 1: AI Development and Trends
- Richard Sutton emphasized that AI is entering an "experience era" focused on continuous learning, with potential far exceeding previous capabilities [2]
- Sutton also noted that fears surrounding AI, such as bias and job loss, are exaggerated and often fueled by those who profit from such narratives [2]
- Wang Jian highlighted the shift from code open-source to resource open-source as a revolutionary change in AI, making the choice between open and closed models a key competitive factor [4]
Group 2: Infrastructure and Economic Impact
- Zhang Hongjiang pointed out that AI is driving large-scale infrastructure expansion, with significant capital expenditures expected, such as over $300 billion in AI-related spending by major tech companies in the U.S. by 2025 [6]
- He also mentioned that the AI data center industry has seen a construction boom, which will positively impact the power ecosystem and economic growth [6]
Group 3: AI in Healthcare
- Ant Group's CEO, Han Xinyi, stated that AI will not replace doctors but will serve as a valuable assistant, enhancing the capabilities of specialists and supporting grassroots healthcare [9][11]
- Han identified three core challenges for AI in healthcare: high-quality data, mitigating hallucinations, and addressing ethical concerns [11]
Group 4: Challenges in AI Implementation
- Wang Xingxing of Unitree Robotics (Yushu Technology) expressed optimism about the AI landscape but acknowledged that practical applications of AI still face significant challenges, particularly in aligning video generation with robotic control [13]
- He noted that the barriers to innovation have lowered, creating a favorable environment for young entrepreneurs to leverage AI tools for new ideas [14]