ByteDance suddenly open-sources Seed-OSS: 512K context, 4x the length of mainstream models! Reasoning scores set new records
量子位· 2025-08-21 02:36
Core Viewpoint
- ByteDance has launched an open-source large model named Seed-OSS-36B, featuring 36 billion parameters, which aims to compete with existing models like OpenAI's GPT-OSS series [1][3][4]

Model Features
- Seed-OSS-36B has a native context window of 512K tokens, four times the 128K offered by mainstream models like DeepSeek V3.1, allowing it to handle long-document tasks such as legal document review and long report analysis [5][6][8]
- The model introduces a "Thinking Budget" mechanism that lets users set a token limit on the model's reasoning depth, adjustable to task complexity [9][10][12]
- The architecture comprises 36 billion parameters across 64 layers and uses RoPE position encoding, the GQA attention mechanism, RMSNorm normalization, and the SwiGLU activation function [13][14]

Performance Metrics
- Seed-OSS-36B-Base scored 65.1 on the MMLU-Pro benchmark, outperforming Qwen2.5-32B-Base, which scored 58.5 [16]
- The model scored 87.7 on the BBH reasoning benchmark, a new record for open-source models, and performed strongly on math and coding tasks [17][18]
- The instruction-tuned Seed-OSS-36B-Instruct scored 91.7 on the AIME24 math competition, ranking just below OpenAI's OSS-20B [20]

Development Background
- The ByteDance Seed team, established in 2023, aims to build advanced AI foundation models and has released several impactful projects, including Seed-Coder and BAGEL, which address various AI tasks [21][22][23]
- The team has also developed VeOmni, a distributed training framework, and Seed LiveInterpret, an end-to-end simultaneous interpretation model [24][25]

Open Source Contribution
- With the release of Seed-OSS, ByteDance adds a significant player to the domestic open-source base-model landscape, promoting further advances in AI technology [26]
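The RoPE position encoding listed among Seed-OSS's architecture choices rotates each even/odd pair of embedding dimensions by an angle proportional to the token's position, so relative offsets between tokens show up directly in query-key dot products. A minimal NumPy sketch of the idea (illustrative only, not ByteDance's implementation):

```python
import numpy as np

def rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embedding to x of shape (seq_len, dim).

    Dimension pair (2i, 2i+1) at position p is rotated by the angle
    p * base**(-2i/dim); rotation preserves vector norms, and the
    angle difference between two positions depends only on their offset.
    """
    seq_len, dim = x.shape
    half = dim // 2
    pos = np.arange(seq_len)[:, None]                # (seq_len, 1)
    freqs = base ** (-np.arange(half) * 2.0 / dim)   # (half,)
    angles = pos * freqs                             # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                  # even / odd dims
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Because position 0 gets a zero rotation, the first token's embedding passes through unchanged, and every row keeps its norm.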
Tencent's big move!
Zhong Guo Ji Jin Bao· 2025-06-27 15:11
Core Insights
- Tencent Hunyuan has launched its first open-source hybrid reasoning model, Hunyuan-A13B, the industry's first 13B-level MoE open-source hybrid reasoning model, with performance comparable to leading open-source models of the same architecture [2][4][6]

Group 1: Model Features and Performance
- Hunyuan-A13B has 80 billion total parameters with only 13 billion activated, offering faster inference speed and higher cost-effectiveness [4][6]
- The model shows strong general capability, achieving high scores on authoritative industry benchmarks and excelling in particular at agent tool invocation and long-text understanding [4][6]
- In practical applications, Hunyuan-A13B lets developers choose between fast and slow thinking modes, enhancing output flexibility [6]

Group 2: Open Source and Industry Trends
- The model is available on open-source platforms like GitHub and Hugging Face, with API support on Tencent Cloud for quick deployment [4][6]
- Open-sourcing of large models is accelerating among major internet companies, with Tencent, Alibaba, and ByteDance among those releasing multiple open-source models this year [8][9]
- A report indicates that over 50% of global enterprises have adopted open-source AI technologies, highlighting the shift toward lower-cost, high-quality AI solutions [9]

Group 3: Future Developments
- Tencent plans to release more models of varying sizes and features, including dense models from 0.5B to 32B parameters and various MoE models, to meet diverse enterprise needs [9]
- The company aims to keep strengthening the open-source ecosystem by sharing practical technologies and innovations [6][9]
Tencent's big move!
China Fund News· 2025-06-27 15:00
Core Viewpoint
- Tencent's Hunyuan-A13B is the first open-source MoE model at the 13B parameter level, offering significant performance improvements and cost advantages for developers in the AI industry [4][6]

Group 1: Model Features and Performance
- Hunyuan-A13B has 80 billion total parameters with 13 billion active, outperforming other leading open-source models in inference speed and cost-effectiveness [4][5]
- The model supports flexible thinking modes, allowing either quick, efficient outputs or deeper, more comprehensive reasoning [5]
- It is friendly to individual developers, requiring only a single mid-range GPU for deployment, and integrates seamlessly with mainstream open-source inference frameworks [5][10]

Group 2: Industry Trends and Open Source Movement
- The open-source trend in AI is accelerating, with major tech companies like OpenAI, Google, and Alibaba releasing over 10 open-source models since March 2023 [8][9]
- The performance of open-source models continues to improve, with platforms like Hugging Face frequently updating their model rankings [8]
- Companies are increasingly adopting open-source AI technologies, with over 50% of enterprises reportedly utilizing these solutions for data, models, and tools [9][10]

Group 3: Future Developments
- Tencent plans to release more models of varying sizes and features, contributing to the growth of the open-source ecosystem [6][10]
- Future releases will include a range of mixed-reasoning models from 0.5B to 32B parameters, as well as multimodal foundation models for images, video, and 3D [10]
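The efficiency claim behind both Hunyuan-A13B summaries, 80B total parameters but only 13B active, comes from MoE routing: a learned gate picks a few experts per token, so most expert weights sit idle on any given forward pass. A minimal top-k routing sketch in NumPy (illustrative only; expert counts and layer shapes here are invented, not Hunyuan's architecture):

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route each token to its top-k experts and mix their outputs.

    x:       (tokens, dim) input activations
    gate_w:  (dim, n_experts) router weights
    experts: list of (w_in, w_out) weight pairs, one ReLU FFN per expert
    Only k of len(experts) expert FFNs run per token, which is why
    "active" parameters are a small fraction of total parameters.
    """
    logits = x @ gate_w                              # (tokens, n_experts)
    topk = np.argsort(logits, axis=1)[:, -k:]        # chosen expert indices
    sel = np.take_along_axis(logits, topk, axis=1)   # softmax over chosen only
    weights = np.exp(sel - sel.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for j, e in enumerate(topk[t]):
            w_in, w_out = experts[e]
            out[t] += weights[t, j] * (np.maximum(x[t] @ w_in, 0.0) @ w_out)
    return out
```

With, say, 64 experts and k=2, each token touches roughly 1/32 of the expert weights; Hunyuan-A13B's 13B-of-80B ratio reflects the same principle, though the article does not give its expert configuration.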
AI Weekly Report (2025 Week 23): OpenAI Announces the GPT-5 Roadmap, Tencent Upgrades Its Enterprise LLM Knowledge Base - 20250613
Guoxin Securities· 2025-06-13 09:11
Investment Rating
- The report maintains an "Outperform" rating for the internet sector, indicating expected performance above the market index by over 10% [3][36]

Core Insights
- The report highlights the overall stability of earnings in the internet sector following first-quarter disclosures, despite fierce competition in e-commerce; companies are either continuing to offer discounts to merchants or increasing investment in instant retail to seek new growth [2][32]
- In AI, major players are benefiting from business scenarios such as cloud computing and advertising, although short-term AI-agent products still need refinement; the Hang Seng Technology Index is in a period of fluctuation, and the report recommends defensive stocks like Tencent Music and NetEase, which have stable earnings and low valuations [2][32]

Company Dynamics
- OpenAI has publicly announced the GPT-5 roadmap and launched new features for ChatGPT Enterprise [17]
- Google is testing a new AI search display method to guide users back to traditional link-clicking paths [19]
- Meta has opened commercial access to Llama 3, integrating deeply with AWS to capture the enterprise market [20]
- NVIDIA reaffirmed its leading position in AI infrastructure at the GTC conference, emphasizing the importance of edge-side inference capabilities [21]
- Amazon is enhancing its advertising business with generative AI tools aimed at automating brand content creation [22]
- Tencent Cloud has upgraded its enterprise large-model knowledge base, integrating new models and web-search capabilities [23]
- ByteDance has announced the open-sourcing of its unified multimodal understanding and generation model, BAGEL [24]

Underlying Technologies
- Microsoft has officially included AI model safety assessments in Azure Foundry, evaluating around 1,900 models for content risk [26]
- Google has updated the Gemini 2.5 Pro preview model, showing significant improvements in performance metrics [27]
- The Zhiyuan Research Institute has released the "Wujie" series of large models, reflecting the evolution of AI from the digital to the physical world [28]
- Alibaba has open-sourced a new vector-model series, Qwen3-Embedding, enhancing its capabilities in natural language processing [29]

Industry Policies
- The Ministry of Industry and Information Technology is promoting the development of the AI industry and its integration into new industrialization efforts, emphasizing the need for a supportive ecosystem [30]
- A draft policy from Chengdu aims to promote high-quality development of the AI industry, focusing on innovation, industrial capacity enhancement, and application expansion [31]
Evaluating image-editing models' reasoning from a knowledge-type perspective: every model struggles with "procedural reasoning"
量子位· 2025-06-13 05:07
Core Viewpoint
- The article discusses KRIS-Bench, a benchmark for evaluating the reasoning capabilities of image-editing models, built around a structured knowledge-acquisition process similar to human learning [2][3][16]

Group 1: KRIS-Bench Overview
- KRIS-Bench is a collaborative effort involving multiple prestigious institutions aimed at assessing AI's reasoning abilities in image editing [2]
- The benchmark categorizes knowledge into three types: Factual Knowledge, Conceptual Knowledge, and Procedural Knowledge, allowing AI to face progressively complex editing challenges [4][8]
- It spans 7 reasoning dimensions and 22 typical editing tasks, ranging from basic to advanced difficulty [6]

Group 2: Evaluation Metrics
- KRIS-Bench introduces a four-dimensional automated evaluation system that scores editing outputs on Visual Consistency, Visual Quality, Instruction Following, and Knowledge Plausibility [10][11][13]
- The evaluation set comprises 1,267 image-instruction pairs, meticulously curated by experts to ensure diverse data sources [12]

Group 3: Model Performance Insights
- The benchmark tests 10 models (3 closed-source, 7 open-source), revealing performance gaps particularly in procedural reasoning and natural-science tasks [14][16]
- Closed-source models like GPT-Image-1 lead in performance, while open-source models like BAGEL-Think show improvements in knowledge plausibility through enhanced reasoning processes [17]
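An automated four-dimension scoring pipeline like KRIS-Bench's ultimately has to fold its per-dimension ratings into one comparable number per edit. A small sketch of such an aggregation (the equal weights and the 0-100 scale are assumptions for illustration; the article does not specify how KRIS-Bench combines its dimensions):

```python
def aggregate_score(visual_consistency, visual_quality,
                    instruction_following, knowledge_plausibility,
                    weights=(0.25, 0.25, 0.25, 0.25)):
    """Combine four per-dimension ratings into one score.

    Each argument is a 0-100 rating for one edited image; the equal
    weights are an illustrative assumption, not KRIS-Bench's formula.
    """
    dims = (visual_consistency, visual_quality,
            instruction_following, knowledge_plausibility)
    if not all(0 <= d <= 100 for d in dims):
        raise ValueError("dimension scores must lie in [0, 100]")
    return sum(w * d for w, d in zip(weights, dims))
```

Keeping the four dimensions separate until this final step is what lets a benchmark report, as here, that a model is strong on instruction following yet weak on knowledge plausibility.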
Tencent Research Institute AI Express 20250526
Tencent Research Institute· 2025-05-25 15:57
Group 1: Nvidia's Blackwell GPU
- Nvidia's share of China's AI chip market has plummeted from 95% to 50% due to U.S. export controls, allowing domestic chips to capture market share [1]
- In response, Nvidia has launched a new "stripped-down" Blackwell GPU priced between $6,500 and $8,000, significantly below the H20's range of $10,000 to $12,000 [1]
- The new chip uses GDDR7 memory with a bandwidth of approximately 1.7 TB/s to comply with export-control restrictions [1]

Group 2: AI Developments and Innovations
- Claude 4 employs a reinforcement learning with verifiable rewards (RLVR) paradigm, achieving breakthroughs in programming and mathematics, where clear feedback signals exist [2]
- AI agents are currently limited by insufficient reliability, but software-engineering agents capable of independent work are expected to emerge by next year [2]
- By the end of 2026, AI is predicted to possess sufficient "self-awareness" to execute complex tasks and assess its own capabilities [2]

Group 3: Veo3 Video Generation Model
- Google I/O introduced the Veo3 video generation model, which produces smooth, realistic animation with synchronized audio and improved physical logic [3]
- Veo3 accurately renders complex scene details, including fluid dynamics, textures, and character movement, and supports various camera styles and effects [3]
- As a creative tool, Veo3 approaches cinematic quality, supporting non-verbal sound effects and multilingual narration, and has raised discussion about how difficult real and fake videos are to distinguish [3]

Group 4: OpenAI o3 Model
- The OpenAI o3 model discovered a remote 0-day vulnerability (CVE-2025-37899) in the Linux kernel's SMB implementation, outperforming Claude Sonnet 3.7 in benchmark tests [4]
- In tests on 3,300 lines of code, o3 successfully identified known vulnerabilities in 8 of 100 runs, with a false-positive ratio of approximately 1:4.5, a reasonable signal-to-noise ratio [4]
- o3 independently discovered a new use-after-free (UAF) vulnerability and surpassed human experts in insight, indicating that large language models have reached practical levels in vulnerability research [5]

Group 5: ByteDance's BAGEL Model
- ByteDance has open-sourced the multimodal model BAGEL, which offers GPT-4o-level image generation and integrates image understanding, generation, editing, and 3D generation into a single 7B-parameter model [6]
- BAGEL employs a MoT architecture with two expert models and an independent visual encoder, showing a clear emergence order of capabilities: multimodal understanding appears first, followed by complex editing [6]
- In various benchmarks BAGEL outperformed most open-source and closed-source models; it supports image reasoning, complex image editing, and perspective synthesis, and has been released under the Apache 2.0 license on Hugging Face [6]

Group 6: Tencent's "Wild Friends Plan"
- Tencent SSV's "Wild Friends Plan" mini-program has been upgraded with AI species recognition and intelligent Q&A, identifying biological species from user-uploaded photos and providing expert knowledge [7]
- Beyond species names, the feature answers in-depth questions about biological habits and migration patterns through natural-language dialogue, translating technical terms into everyday language [7]
- The "Shenzhen Biodiversity Puzzle" public-participation activity has launched; user-uploaded images and interactive content will be used for model training, contributing to population surveys and habitat protection [7]

Group 7: OpenAI's AI Hardware
- OpenAI's first AI hardware, developed in collaboration with Jony Ive, is reported to be a neck-worn device resembling an iPod Shuffle, with no screen but a camera and microphone [8]
- The device aims to transcend screen limitations and enable more natural interaction, connecting to smartphones and PCs, with mass production expected in 2027 [8]
- Similar AI wearables are already on the market, but users have concerns about privacy and practicality, with some suggesting AI glasses would be a better option [8]

Group 8: AI Scientist Team's Breakthrough
- The world's first AI scientist team identified a drug candidate, Ripasudil, for treating dry age-related macular degeneration (dAMD) within 2.5 months, a significant scientific achievement [10]
- The team developed the Robin multi-agent system, which automated the scientific-discovery process by combining Crow, Falcon, and Finch agents for literature review, experimental design, and data analysis [10]
- The AI identified treatment pathways previously unconsidered by humans, driving the research framework while humans only executed experiments, showcasing a new paradigm of AI-driven scientific discovery [10]

Group 9: AI Product Development Insights
- The best AI products often grow "bottom-up" rather than being planned, discovering potential through foundational experiments and reshaping product-development paths [11]
- As AI-generated content becomes mainstream, the core question will shift from "was this AI-generated" to content provenance, credibility, and verifiability [11]
- AI has profoundly changed how work gets done: 70% of Anthropic's internal code is generated by Claude, shifting efficiency bottlenecks to "non-engineering" areas [11]

Group 10: Future of AI Applications
- The best AI applications have yet to be invented; the current state of the field is likened to alchemy, where no one knows exactly what will work [12]
- Generality and usability should develop in parallel rather than in opposition, with Character.AI focusing on building products that are both usable and highly general [12]
- AI technology is expected to advance rapidly within 1-3 years; the value of large language models lies in translating limited training into broad application, with computational capacity, rather than data scale, as the key constraint [12]