Large Model R&D
New progress in Zhipu's A-share push: Guotai Haitong added as an additional IPO guidance institution, guidance filing re-registered
Xin Lang Cai Jing· 2026-02-13 00:29
Public records show that Zhipu was founded in 2019 as a spin-off of technology from Tsinghua University, making it one of the earliest Chinese companies devoted to large-model R&D. On January 8, 2026, Zhipu completed its listing on the Hong Kong Stock Exchange, becoming the "world's first large-model stock."

Since listing, Zhipu's share price has climbed amid volatility, and its market capitalization now approaches HK$180 billion. As of the close on February 12, 2026, Zhipu traded at HK$402 per share, for a total market capitalization of HK$179.2 billion.

Recently, the China Securities Regulatory Commission (CSRC) updated the IPO guidance progress information for Zhipu on its official website. According to the CSRC disclosure, Zhipu withdrew the guidance filing it submitted in April 2025 and completed a new guidance filing registration.

Unlike last time, Zhipu's IPO guidance is now handled by two brokerages rather than one: CICC alone has been replaced by Guotai Haitong Securities together with CICC. The new guidance filing report also discloses that Zhipu intends to target the Shanghai Stock Exchange's STAR Market.

Following its H-share listing, Zhipu is pressing ahead with its A-share listing plan. ...
Liang Wenfeng's High-Flyer Quant returned 57% last year, ranking second among 10-billion-yuan-plus quant funds!
21世纪经济报道· 2026-01-14 08:38
Core Viewpoint
- The article highlights the impressive performance of High-Flyer Quant, which achieved an average return of 56.55% in 2025, ranking second among quantitative private funds in China, and emphasizes the financial support it provides to DeepSeek for AI model development [1][2].

Group 1: Company Performance
- High-Flyer Quant's average return over the past three years is 85.15%, and over the past five years it is 114.35% [1].
- The firm currently manages over 70 billion yuan, maintaining its position in the top tier of China's private quantitative investment sector [1].
- Estimated revenue from management fees and performance commissions for the previous year could exceed 700 million USD, based on a 1% management fee and a 20% performance commission [2].

Group 2: DeepSeek Development
- DeepSeek, founded in July 2023, is focused on general artificial intelligence and is primarily funded by the research budget of High-Flyer Quant [2].
- The V4 model, an iteration of the V3 model slated for release around the Spring Festival in February, is reported to surpass current leading models in programming capabilities [3].
- DeepSeek's V3 model had a total training cost budget of 5.57 million USD [2].

Group 3: Industry Context
- Competitors in the AI model space, such as Zhipu and MiniMax, have reported significant R&D expenditures, with Zhipu's cumulative investment reaching approximately 4.4 billion yuan and MiniMax's around 316 million yuan [3].
- The Italian antitrust authority concluded an investigation into DeepSeek regarding user warnings about potential misinformation, indicating regulatory scrutiny in the AI sector [4].
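The fee estimate above can be reproduced as a back-of-the-envelope calculation. The return figure comes from the article; the AUM (taken here as roughly 70 billion yuan), the 1%/20% fee structure, and the exchange rate are all assumptions, not disclosed figures:

```python
# Back-of-the-envelope fee-revenue estimate (all inputs are assumptions,
# except the 56.55% return reported in the article)
aum_yuan = 70e9            # assumed assets under management, ~70 billion yuan
gross_return = 0.5655      # 2025 average return reported in the article
mgmt_fee_rate = 0.01       # assumed 1% management fee
perf_fee_rate = 0.20       # assumed 20% performance commission
usd_per_yuan = 1 / 7.2     # rough exchange-rate assumption

mgmt_fee = aum_yuan * mgmt_fee_rate                 # fee on assets managed
perf_fee = aum_yuan * gross_return * perf_fee_rate  # cut of the year's gains
total_usd = (mgmt_fee + perf_fee) * usd_per_yuan
print(f"estimated fee revenue: {total_usd / 1e6:.0f} million USD")
```

Even with these rough inputs the total comfortably clears the 700-million-USD mark cited in the article, with the performance commission dwarfing the management fee.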
Tencent AI Lab Deputy Director Yu Dong departs as the Hunyuan team's leadership transition continues | 智能涌现 exclusive
36Kr· 2025-12-29 06:02
Core Insights
- The departure of Yu Dong, former Deputy Director of Tencent AI Lab, is attributed to personal development reasons, marking a significant change in Tencent's AI leadership [1]
- Yu Dong has been a key figure in Tencent's AI development since joining in 2017, contributing to advancements in speech processing, natural language processing, and digital human technologies [2][3]
- Tencent is actively recruiting new talent and restructuring its AI model development resources to enhance competitiveness in the rapidly evolving AI landscape [4][5]

Group 1
- Yu Dong's expertise in speech processing and deep learning, along with his leadership in applying deep learning to speech recognition, has been pivotal for Tencent [3]
- During his tenure, Yu led research teams that published hundreds of papers and advanced the application of NLP and speech technologies within Tencent's business [2][3]
- The "Hunyuan" model, which Yu contributed to, is part of Tencent's broader strategy to integrate AI capabilities across various departments [2][4]

Group 2
- Following Yu Dong's departure, Tencent is focusing on talent acquisition, having recently brought in former OpenAI researcher Yao Shunyu to strengthen its AI capabilities [4]
- Tencent is consolidating its AI model development resources to address inefficiencies caused by previously dispersed teams, aiming for a more focused approach [5]
- The establishment of new departments within Tencent's Technology Engineering Group (TEG) is part of a strategic move to clarify roles and enhance model development [5]
Tencent upgrades its large-model R&D structure, establishing new departments including AI Infra and AI Data
Xin Lang Cai Jing· 2025-12-17 08:54
Core Insights
- Tencent has upgraded its large model research and development structure by establishing new departments: AI Infra, AI Data, and Data Computing Platform, to enhance its core capabilities in large model development [1][2]
- Vincesyao has been appointed as the Chief AI Scientist of the CEO/President's Office and will oversee both the AI Infra and Large Language Model departments, reporting to Tencent's President Liu Chiping [1][2]

Department Responsibilities
- The AI Infra department will focus on building technical capabilities for large model training and inference platforms, emphasizing distributed training and high-performance inference services to create a competitive edge in large model AI infrastructure [2]
- The upgraded AI Data and Data Computing Platform departments will be responsible for constructing the data and evaluation systems for large models, as well as developing a data intelligence integration platform for big data and machine learning [2]
- Wang Di continues as the Deputy General Manager of the Large Language Model department, reporting to Vincesyao, while Liu Yuhong and Chen Peng have been appointed as heads of the AI Data and Data Computing Platform departments, respectively, both reporting to Vice President Jiang Jie [2]
Breaking! Anthropic bans Chinese-controlled companies from using Claude entirely: wherever you are, don't expect to get around it!
菜鸟教程· 2025-09-05 07:04
Core Viewpoint
- The new policy announced on September 5, 2025, restricts access to Claude services for Chinese companies and entities with significant Chinese capital, impacting their ability to develop competitive AI models [1][9]

Group 1: Policy Implications
- The policy applies to mainland Chinese companies and overseas subsidiaries with over 50% Chinese ownership, as well as entities using Claude indirectly through cloud services or third-party platforms [1]
- The restrictions are not limited to direct users of Claude but also cover companies that access the service indirectly, regardless of their registration location [9]

Group 2: Competitive Concerns
- There are concerns that Chinese companies could use subsidiaries to access Claude for military or intelligence applications, potentially accelerating their own AI model development to compete with U.S. and allied tech firms [5]
- Anthropic has chosen to prioritize security over profit, advocating for stricter export controls and enhanced domestic infrastructure for AI development [6]

Group 3: Industry Impact
- The sudden shutdown of Claude's API could halt ongoing projects for multinational businesses, prompting a shift toward developing domestic AI models and ensuring compliance and security [10]
- As external access becomes increasingly restricted, the focus shifts to developing indigenous solutions to maintain competitiveness in the AI landscape [11]
Zhipu GLM-4.5 team's late-night reveal: longer context coming, small models on the way, and new models promised soon!
AI前线· 2025-08-29 08:25
Core Insights
- The GLM-4.5 model focuses on expanding context length and improving its hallucination prevention capabilities through an effective Reinforcement Learning from Human Feedback (RLHF) process [6][10][11]
- Future development will prioritize reasoning, programming, and agent capabilities, with plans to release smaller parameter models [6][50][28]

Group 1: GLM-4.5 Development
- The team behind GLM-4.5 includes key contributors who have worked on various significant AI projects, establishing a strong foundation for the model's development [3]
- The choice of GQA over MLA in the architecture was made for performance considerations, with specific weight initialization techniques applied [12][6]
- There is an ongoing effort to extend the model's context length, with potential releases of smaller dense or mixture-of-experts (MoE) models in the future [9][28]

Group 2: Model Performance and Features
- GLM-4.5 has demonstrated superior performance in tasks that do not require long text generation compared to other models such as Qwen 3 and Gemini 2.5 [9]
- The model's effective RLHF process is credited for its strong performance in preventing hallucinations [11]
- The team is exploring the integration of reasoning models and believes that reasoning and non-reasoning models will coexist and complement each other in the long run [16][17]

Group 3: Future Directions and Innovations
- The company plans to focus on developing smaller MoE models and enhancing the capabilities of existing models to handle more complex tasks [28][50]
- There is an emphasis on improving data engineering and the quality of training data, which is crucial for model performance [32][35]
- The team is also considering the development of multimodal models, although current resources are primarily focused on text and vision [23][22]

Group 4: Open Source vs. Closed Source Models
- The company believes that open-source models are closing the performance gap with closed-source models, driven by advancements in resources and data availability [36][53]
- The team acknowledges that while open-source models have made significant strides, they still face challenges in computational and data resources compared to leading commercial models [36][53]

Group 5: Technical Challenges and Solutions
- The team is exploring various technical aspects, including efficient attention mechanisms and the potential for integrating image-generation capabilities into language models [40][24]
- There is a recognition of the importance of fine-tuning and optimizing the model's writing capabilities through improved tokenization and data-processing techniques [42][41]
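The GQA-over-MLA choice the team cites is easiest to see in code: grouped-query attention (GQA) lets several query heads share one key/value head, shrinking the K/V tensors that dominate inference memory. The NumPy sketch below illustrates the mechanism only; shapes, names, and head counts are illustrative assumptions, not GLM-4.5's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(x, wq, wk, wv, n_heads, n_kv_heads):
    """x: (seq, d_model). n_heads query heads share n_kv_heads K/V heads."""
    seq, d_model = x.shape
    head_dim = d_model // n_heads
    group = n_heads // n_kv_heads        # query heads per shared K/V head
    q = (x @ wq).reshape(seq, n_heads, head_dim)
    k = (x @ wk).reshape(seq, n_kv_heads, head_dim)  # K/V are the small part
    v = (x @ wv).reshape(seq, n_kv_heads, head_dim)  # that gets cached
    # Broadcast each K/V head across its group of query heads
    k = np.repeat(k, group, axis=1)
    v = np.repeat(v, group, axis=1)
    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(head_dim)
    out = np.einsum("hqk,khd->qhd", softmax(scores), v)
    return out.reshape(seq, d_model)

# Toy example: 8 query heads sharing 2 K/V heads
rng = np.random.default_rng(0)
seq, d_model, n_heads, n_kv_heads = 5, 64, 8, 2
head_dim = d_model // n_heads
x = rng.normal(size=(seq, d_model))
wq = rng.normal(size=(d_model, d_model))
wk = rng.normal(size=(d_model, n_kv_heads * head_dim))
wv = rng.normal(size=(d_model, n_kv_heads * head_dim))
out = grouped_query_attention(x, wq, wk, wv, n_heads, n_kv_heads)
```

With 8 query heads sharing 2 K/V heads, the cached K and V tensors are 4x smaller per token than in standard multi-head attention, which is the kind of inference-performance consideration the team refers to.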