Multimodal
Former MSRA researcher Tan Xu leaves Moonshot AI (月之暗面) for Tencent Hunyuan as AI talent accelerates its return to big tech
Sou Hu Cai Jing· 2025-08-23 12:10
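According to a report by 让互联网飞一会儿, Tan Xu, former Principal Research Manager at Microsoft Research Asia, recently joined the Tencent Hunyuan team, where he is responsible for frontier research on multimodality. Tan Xu carries weight in both academia and industry: at Microsoft Research his work focused on generative AI and on speech, audio, and video content generation; his papers have been cited more than ten thousand times, and his research has been deployed at scale in core products such as Azure and Bing. He has also served repeatedly as a reviewer for top international conferences such as NeurIPS and is well regarded in academia. The deeper reading is that China's large-model race is at a turning point, moving "from unchecked growth to resource concentration." Early startups seized the narrative high ground through storytelling, fundraising, and speed, but as competition moves into the deep water of data, compute, and deployment ecosystems, their first-mover advantage is fading fast. Big tech companies, with their capital, compute infrastructure, and application scenarios, are steadily drawing in the top talent and technical directions. Tan Xu's choice is in some sense a microcosm of this trend: once the race enters the knockout stage, an individual who wants to keep producing results in multimodality may have little option but to attach to a big company in order to secure research continuity and a path to productization. Notably, Tan Xu joined the Chinese large-model startup Moonshot AI (月之暗面) only in August of last year, where he was responsible for developing an end-to-end speech model. Reportedly, the company's multimodal research had quietly been under way for several months before he joined. At the end of last year, ...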
Shixiang AGI Observations: LLM paths diverge, non-technical moats in AI products, and the Agent "freshness window"
海外独角兽· 2025-08-22 04:06
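Interview: Guangmi Li, Xiaojun Zhang. 「全球大模型季报」 (Global LLM Quarterly Report) is an AI-industry observation column from 「海外独角兽」 and 「张小珺 Jùn|商业访谈录」. Each quarter, Shixiang CEO Guangmi Li and business writer Xiaojun Zhang review the important signals in the LLM field and look ahead.
• Intelligence and product both matter: ChatGPT has many non-technical moats, whereas Coding or model companies have only technical moats;
• Building an AI product is a lot like mining: the freshness window is critical, and that window is clearly getting shorter;
• ChatGPT's Deep Research and Anthropic's Claude Code were the first to deliver an L4-level experience, in information search and software development respectively;
• Taken to the extreme, a Coding company that does not build its own model will have no advantage in the future, when competition will come down to cost.
A thought experiment: if you had to join one AI company for the next 4 years, or pick one good CEO, what would you choose? Leave your answer in the comments.
01. Models begin to diverge
Guangmi Li: I hope everyone takes away three key messages from this episode: 1. Large models are diverging and converging. In Q2 2025 the burst of activity in large models worldwide was stronger than ever, and Silicon Valley's model companies began to diverge into different areas; for example, apart from Google Gemini and OpenAI, ...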
ByteDance unexpectedly open-sources Seed-OSS: a 512K context window, 4x the mainstream length, and record-setting reasoning
量子位· 2025-08-21 02:36
Core Viewpoint
- ByteDance has released an open-source large model, Seed-OSS-36B, with 36 billion parameters, positioned against existing models such as OpenAI's GPT-OSS series [1][3][4].

Model Features
- Seed-OSS-36B has a native 512K context window, four times the 128K offered by mainstream models such as DeepSeek V3.1, which lets it handle long-input tasks such as legal document review and long report analysis [5][6][8].
- A "Thinking Budget" mechanism lets users cap the number of tokens the model spends on reasoning and adjust that cap to task complexity [9][10][12].
- The architecture has 36 billion parameters and 64 layers, and uses RoPE position encoding, GQA attention, RMSNorm normalization, and the SwiGLU activation function [13][14].

Performance Metrics
- Seed-OSS-36B-Base scored 65.1 on the MMLU-Pro benchmark, ahead of Qwen2.5-32B-Base at 58.5 [16].
- The model scored 87.7 on the BBH reasoning benchmark, a new record for open-source models, and performed strongly on math and coding tasks [17][18].
- The instruction-tuned version, Seed-OSS-36B-Instruct, scored 91.7 on the AIME24 math competition benchmark, just below OpenAI's OSS-20B [20].

Development Background
- The ByteDance Seed team, established in 2023, aims to build advanced AI foundation models and has released several notable projects, including Seed-Coder and BAGEL, which address a range of AI tasks [21][22][23].
- The team has also developed VeOmni, a distributed training framework, and Seed LiveInterpret, an end-to-end simultaneous interpretation model [24][25].

Open Source Contribution
- With Seed-OSS, ByteDance adds a significant entrant to China's open-source base model landscape, which should encourage further progress in AI technology [26].
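Two of the architecture components named above, RMSNorm and SwiGLU, have compact textbook definitions. The block below is a minimal pure-Python sketch of those standard formulas, not ByteDance's implementation: the function names, the toy dimensions, and all weight values are assumptions chosen for illustration.

```python
import math

def rms_norm(x, gain, eps=1e-6):
    """RMSNorm: divide x by its root-mean-square, then apply a per-feature gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * g for v, g in zip(x, gain)]

def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = [silu(v) for v in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)

# Toy sizes for illustration only (model dim 2, hidden dim 3); all values are made up.
x = [3.0, 4.0]
normed = rms_norm(x, gain=[1.0, 1.0], eps=0.0)
w_gate = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_up = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
w_down = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
out = swiglu_ffn(normed, w_gate, w_up, w_down)
```

After normalization the mean of the squared features is 1, which is the property RMSNorm is designed to enforce before the gain is applied. A production model would run the same operations on tensors with learned weights.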
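Tens of millions of dollars raised: former Bilibili VP's startup, after he walked out of the ICU, now has more than 8 million users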
Sou Hu Cai Jing· 2025-08-17 21:36
Core Insights
- Binson, an internet industry veteran, founded a new AI companionship product called "Doudou Game Partner," which gained 8 million users during its testing phase and has raised several funding rounds totaling tens of millions of dollars [1][28].
- The product aims to provide practical in-game help as well as companionship, setting itself apart from traditional virtual pets by offering strategic advice and real-time game support [3][5].
- Binson's experience of a life-threatening accident has shaped his view of why emotional connection and companionship matter in AI products [1][11].

Product Overview
- "Doudou Game Partner" is an AI companion that assists users while they play, offering strategic insights and reminders during gameplay [3][5].
- The AI supports a range of popular games with tailored advice and emotional engagement, so it feels more like a gaming coach than a simple virtual pet [5][9].
- Voice interaction lets users engage without taking their attention off the game [5][11].

Market Positioning
- The company targets a large user base of "at least tens of millions, even hundreds of millions" of users, reflecting the potential size of the gaming market [11][67].
- Binson believes the AI companionship market will expand as loneliness in society increases, and positions the product as a source of emotional support [39][48].

Technology and Development
- The product uses visual language models (VLM) and real-time inference to improve user interaction and experience [31][34].
- The team continues to improve the AI's understanding and contextual awareness, with a focus on long-term user engagement and emotional connection [37][38].

User Engagement and Feedback
- The company emphasizes user satisfaction and monitors retention and engagement to gauge users' emotional connection with the AI [46].
- Users have said they are willing to wait for further improvements, which indicates strong demand despite the product's current limitations [29][28].

Competitive Landscape
- Binson acknowledges competition from both game developers and larger tech companies, but believes the focus on emotional companionship and cross-game support sets "Doudou Game Partner" apart [47][48].
- The company has built a strong emotional bond with its users, which it sees as a significant competitive advantage [49][50].

Future Outlook
- The company plans to expand beyond gaming, potentially bringing AI companionship into users' offline lives, for example by managing daily tasks [27][39].
- Binson envisions a future in which AI companionship is a standard part of life and addresses users' emotional needs in many contexts [39][48].
How far are we from a true embodied-intelligence large model?
2025-08-13 14:56
Summary of Conference Call Notes

Industry Overview
- The discussion covers the humanoid robot industry and stresses the importance of the model side in humanoid robot development, even though the market is currently focused on hardware [1][2][4].

Key Points and Arguments
1. **Importance of Large Models**: Multi-modal large models are seen as essential for giving humanoid robots intelligent capabilities, and this is the underlying logic of the current wave of humanoid robot development [2][4].
2. **Data Collection Challenges**: The stagnation in model development is attributed to insufficient data collection: with few robots operating in factories, early data has not yet been monetized [3][16].
3. **Role of Tesla**: Tesla is highlighted as a crucial player, because standardized hardware is a precondition for effective data collection and model improvement [3][4][16].
4. **Data Flywheel Concept**: A data flywheel is critical to the rapid growth of large models, and it requires a solid hardware foundation [4][16].
5. **Model Development Trends**: Model development follows three main lines: multi-modality, higher action frequency, and stronger reasoning capabilities [5][11][12].
6. **Model Evolution**: The progression from SayCan to RT-1, RT-2, and Helix shows steadily growing capabilities, including the integration of more input modalities and higher action execution frequencies [6][10][11].
7. **Training Methodology**: Model training is compared to human learning: pre-training on low-quality data, followed by fine-tuning on high-quality real-world data [13][14].
8. **Data Quality and Collection**: Real-world data is considered the highest quality but is hard to collect efficiently, while simulation data is easier to obtain but may lack realism [15][17].
9. **Motion Capture Technology**: The discussion covers the role of motion capture in data collection and the advantages and disadvantages of the various methods [18][19].
10. **Future Directions**: Large models are expected to integrate more modalities and to move toward world models, which the industry broadly agrees on [21][22].

Additional Important Content
- **Industry Players**: Galaxy General and Xinjing are mentioned as key players in model development, with Galaxy General focusing on fully simulated data [22][23].
- **Market Recommendations**: Investment recommendations focus on motion capture equipment, cameras, and humanoid robot control systems, with specific companies highlighted [26].

This summary captures the main insights from the conference call and gives an overview of the humanoid robot industry's current state and future directions.
Hardware is only the entry ticket: behind AI wearables' million-unit sales, software and use cases are the ultimate battleground
AI前线· 2025-08-12 07:22
Core Viewpoint
- Integrating AI into hardware is essential for creating valuable services and improving user experience, and marks a shift toward a collaborative, tool-oriented era for large models [1][4][15].

Group 1: AI Hardware Development
- AI hardware will excel in scenarios where traditional hardware falls short, and the integration of software and hardware is the key to getting there [4][15].
- Successful products attract top talent, which is crucial for building competitive offerings [4][15].
- Companies such as Plaud and Rokid gained an early advantage by recognizing real user needs and investing in product development before the rise of large models [6][7].

Group 2: Market Dynamics and User Engagement
- Plaud's crowdfunding success came from a combination of genuine user demand and strong design appeal, which is critical for hardware products [7][8].
- AI integration has raised market recognition of hardware, and many manufacturers are looking for ways to embed AI in their products [8][9].
- Hardware is evolving toward lightweight designs that suit a broader user base, including children and the elderly [9].

Group 3: Competitive Landscape
- The competitive edge lies in gathering contextual information effectively, which is essential for differentiating software capabilities [11][12].
- Large companies often overlook the hardware sector because of its difficulty, which creates openings for startups [12][16].
- The core value of integrated software and hardware in AI applications is a seamless user experience, which requires comprehensive team capabilities [12][13].

Group 4: Technical Challenges and Innovations
- Multi-modal interaction poses significant technical challenges, particularly in understanding user intent and context [17][19].
- Integrating multiple data types (audio, visual, etc.) is crucial for improving AI's understanding of user interactions [19][20].
- Protecting user privacy and data security is paramount as multi-modal capabilities expand [23][20].

Group 5: Future Outlook and Market Education
- The AI hardware market is still in its early stages and will need patience and user education to drive adoption [26][28].
- The ultimate form of smart wearable devices will be lightweight and unobtrusive, a natural part of daily life [33].
- User trust is critical to the success of AI hardware, because users must feel secure in sharing their data [37].
Event registration: AI video models, products, and growth in practice | 42章经
42章经· 2025-08-10 14:04
Core Insights
- The article announces an upcoming online event on AI video technology, in which industry practitioners will share hands-on experience with models, products, and growth strategies in the AI video sector [10].

Group 1: Event Overview
- The online event takes place on August 16 from 10:30 AM to 12:30 PM and will be hosted on Tencent Meeting [7][8].
- Attendance is limited to 100 participants, with preference given to applicants who provide thoughtful responses and have relevant backgrounds [10].

Group 2: Guest Speakers and Topics
- Dai Gaole, lead of model products at Luma AI, will discuss the technical paths and future capabilities of video models and world models [2].
- Xie Xuzhang, co-founder of Aishi Technology, will share the key decisions that took Pixverse to 60 million users in two years, including the evolution of its visual models [3][4].
- Xie Juntao, former growth product lead at OpusClip, will focus on customer acquisition, conversion strategies, user retention, and data-driven decision-making in video creation products [5].
Everything about AI Infra | 42章经
42章经· 2025-08-10 14:04
Core Viewpoint
- The rise of large models has created significant opportunities for AI infrastructure (AI Infra) professionals and marks a pivotal moment for the industry [7][10][78].

Group 1: Understanding AI Infra
- AI Infra covers both hardware and software. Hardware includes AI chips, GPUs, and switches; software falls into three layers: IaaS, PaaS, and an optimization layer for training and inference frameworks [3][4][5].
- Current demand for AI Infra is driven by the unprecedented compute and data processing requirements of large models, much like the early days of search engines [10][11].

Group 2: Talent and Industry Dynamics
- The industry needs both new engineers and traditional Infra professionals, because the field rewards accumulated knowledge and experience [14].
- AI Infra professionals are increasingly recognized for their role in optimizing model performance and reducing costs [78][81].

Group 3: Performance Metrics and Optimization
- Key performance indicators for AI Infra include model response latency, data processing efficiency per GPU, and overall cost reduction [15][36].
- Optimizing AI Infra can produce significant cost savings, as the example of improving GPU utilization shows [18][19].

Group 4: Market Opportunities and Challenges
- Third-party companies can add value by offering API marketplaces, but they must differentiate themselves to avoid being overshadowed by cloud providers and model companies [22][24].
- Integrating hardware and model development is essential for building competitive advantages in AI Infra [25][30].

Group 5: Future Trends and Innovations
- AI models may see breakthroughs in multi-modal capabilities, along with potentially significant reductions in training and inference costs [63][77].
- Open-source models are expected to drive advances in AI Infra, although too much focus on optimizing existing models risks stifling innovation [69][70].

Group 6: Recommendations for Professionals
- AI Infra professionals should align closely with either model development or hardware design to maximize their impact and opportunities [82].
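The link between GPU utilization and cost mentioned under Group 3 can be made concrete with a little arithmetic. The sketch below is an illustration under stated assumptions, not the article's own example: the function names, the hourly price, and the utilization figures are all hypothetical.

```python
def cost_per_useful_gpu_hour(hourly_price, utilization):
    """Effective price of one hour of useful work on a GPU that is only partly busy."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return hourly_price / utilization

def relative_saving(util_before, util_after):
    """Fraction of cost saved for the same useful work; independent of the hourly price."""
    return 1.0 - util_before / util_after

# Hypothetical numbers: a GPU rented at 2.0 per hour, utilization raised from 40% to 50%.
before = cost_per_useful_gpu_hour(2.0, 0.40)
after = cost_per_useful_gpu_hour(2.0, 0.50)
saving = relative_saving(0.40, 0.50)
```

Under these assumed figures, a ten-point rise in utilization lowers the cost of the same useful work by about a fifth, which is why utilization is a natural optimization target for an Infra team.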
Competing for the second half of AI as commercialization of AI applications scales up: fund managers' latest views
券商中国· 2025-08-10 10:21
Core Viewpoint
- The article argues that AI is entering a virtuous cycle that runs from compute investment to cloud service consumption to commercialization revenue, with the scaling of AI applications as the key driver [1][2].

AI Application Commercialization
- This year is seen as pivotal for the commercialization and scaling of AI applications, with significant growth in both domestic and international markets [3].
- Notable milestones include Cursor reaching $500 million in ARR, Anthropic's ARR rising from $1 billion to nearly $4 billion, and OpenAI surpassing $10 billion in annualized revenue, an 80% increase from last year [3].
- In China, Kuaishou's AI product passed $10 million in ARR within 10 months, while daily token usage of ByteDance's model has grown 137-fold since launch [3].

Market Sentiment and Trends
- Fund managers note that AI features are increasingly part of daily work and life; explosive growth in token usage points to rapid user adoption [4].
- Attention has shifted from event-driven catalysts to actual commercialization progress, and many U.S. AI companies have raised their performance expectations on AI-driven growth [4][5].

B2B and B2C Empowerment
- AI applications are focused on both B2B (business) and B2C (consumer) use: B2B applications aim to cut costs and raise efficiency, while B2C applications improve user experience through hardware integration [5].
- By 2025, over 25% of global AI tools are expected to be applied in areas such as code generation and customer service, steering enterprise spending toward AI [5].

Evolution of AI Agents
- AI Agents are becoming a crucial entry point for human-computer interaction, and advances in models such as GPT-5 are extending their capabilities [6][7].
- AI Agents have evolved from basic tools into systems capable of complex tasks, and 2025 is predicted to be the "year of the agent" [6][7].

Future Growth Engines
- Although AI hardware applications have progressed more slowly, there is optimism that edge AI innovations such as smart glasses and smart homes will drive the next growth cycle [10][11].
- The smart glasses market is projected to grow significantly, with sales expected to reach 1 million units by 2027, representing a market opportunity of 100 billion yuan [11].
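The revenue figures quoted under AI Application Commercialization imply a few numbers the summary does not state directly. The following is a rough arithmetic sketch using only the figures quoted above; the helper names are hypothetical, and "nearly $4 billion" is rounded to 4.0 for illustration.

```python
def growth_multiple(before, after):
    """How many times larger the later figure is than the earlier one."""
    return after / before

def implied_prior(current, pct_increase):
    """Back out the earlier figure from the current one and a fractional increase."""
    return current / (1.0 + pct_increase)

# Figures in billions of US dollars, taken from the summary above.
anthropic_multiple = growth_multiple(1.0, 4.0)   # $1B to nearly $4B ARR
openai_prior = implied_prior(10.0, 0.80)         # $10B after an 80% increase
```

Read this way, Anthropic's ARR roughly quadrupled, and an 80% increase to $10 billion implies OpenAI's annualized revenue was in the region of $5.6 billion a year earlier.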
CITIC Securities: with GPT-5 released, it recommends positioning in AI computing chips and related areas within US tech stocks
Core Insights
- OpenAI's release of GPT-5 has drawn significant attention in capital markets because of its notable gains in reasoning and its competitive pricing relative to other leading models such as Gemini 2.5 Pro [1].
- GPT-5 has performed strongly in specialized applications such as programming and healthcare, which points to substantial room for market expansion [1].
- Rapid iteration by model developers such as OpenAI is intensifying competition among tech giants in frontier models, driving explosive growth in compute and unlocking complex application scenarios [1].

Industry Recommendations
- In US tech stocks, the recommendation is to focus on infrastructure and AI applications, particularly AI computing chips, HBM, AI networking equipment, IDC, foundational and application software, and internet services [1].