Multimodal

Investors Debate Agent Investing: Weighing the Paths of General-Purpose and Vertical Agents
Guo Ji Jin Rong Bao· 2025-09-13 13:09
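As technology leaps from large models to multimodality, AI agents, and embodied intelligence, the industry stands at the intersection of a technological singularity and a commercial breakout. On September 12, at the "Global AI Investment Outlook: The AI Startup Boom and China's Opportunity" forum of the 2025 Inclusion Conference on the Bund, AI founders and investors debated the deployment prospects of AI agents and the investment logic behind them. Jiang Daxin, founder and CEO of StepFun (阶跃星辰), argued that agents are rapidly penetrating verticals such as finance, healthcare, and education, and that competition in next-generation smart hardware will center on devices that "get things done, are always present, have memory, and can evolve." "As world models are built, agents will eventually move from the digital world into the physical world. Once there, they can learn from experience, learn autonomously, and work with human scientists to discover physical laws that humans have not yet discovered." With agent-company valuations not yet anchored and commercialization only just out of the starting blocks, investors have chosen different paths. Agents today fall roughly into two categories, general-purpose and vertical: the former has a higher ceiling but carries higher investment risk, while the latter offers limited room for outsized returns. Huang Hai, senior director of strategic investment at Ant Group, said candidly that both general-purpose and vertical agents have room to grow. The agents Ant has invested in so far are mainly vertical, and the key selection criteria are "a large enough market, strong willingness to pay, and the ability to build a moat at a certain stage." He added that Ant keeps a close watch on compute and other underlying infrastructure in its investments, and that in the future Toke ...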
Keling (可灵) vs. Jimeng (即梦): A First Look at "Multimodal"
Tai Mei Ti APP· 2025-09-11 05:33
Core Insights
- The article discusses the current state of AI video generation platforms in China, focusing on two leading platforms: Keling and Jimeng [1]
- It walks through the process of making a film with AI, highlighting AI's roles in scriptwriting, storyboarding, and directing [5][10][18]
- It weighs the strengths and weaknesses of the two platforms in generating videos, particularly in terms of creativity and fidelity [35][42]

Group 1: AI Video Generation Process
- The first step uses AI as a screenwriter to create the script, showing that AI handles text-based tasks effectively [7][8]
- The second step uses AI as an artist to create storyboards; image quality varies, and in some cases the model misunderstands instructions [12][14]
- The third step has AI direct the video; initial results can be impressive, but inconsistencies and logical errors become apparent in later outputs [18][20][24]

Group 2: Performance of AI Platforms
- Keling is better at understanding abstract concepts and at artistic interpretation, and often produces videos that reflect the intended themes [36][38]
- Jimeng excels in image fidelity and stability, keeping the visual quality of generated videos consistent [43][44]
- Both platforms struggle to simulate physical realism and to maintain narrative coherence, leading to issues such as "memory loss" within short video segments [31][50]

Group 3: Technical and Cost Considerations
- Current AI video generation technology struggles to balance fidelity and creativity, and limits on video length constrain what the content can express [50][52]
- Costs can be significant: basic configurations are priced at 1 yuan per video for Jimeng and 2 yuan for Keling, so high-quality output may require additional investment [59][60]
- Patience is needed, as generating visually appealing films with AI may take time and repeated adjustments [62]
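To make the cost point concrete, here is a minimal budgeting sketch. The per-video prices are the ones quoted in the summary; the shot count and the number of attempts per shot are hypothetical placeholders, chosen only to show how repeated adjustments multiply the bill.

```python
# Rough budget sketch for an AI-generated short film.
# Per-video prices come from the summary; shots and attempts are hypothetical.
PRICE_PER_VIDEO_YUAN = {"Jimeng": 1.0, "Keling": 2.0}


def film_budget(platform: str, shots: int, attempts_per_shot: int) -> float:
    """Total cost in yuan if every shot needs `attempts_per_shot` generations."""
    return PRICE_PER_VIDEO_YUAN[platform] * shots * attempts_per_shot


for platform in PRICE_PER_VIDEO_YUAN:
    cost = film_budget(platform, shots=30, attempts_per_shot=5)
    print(f"{platform}: {cost:.0f} yuan")
```

Under these assumed numbers the basic-configuration price is multiplied 150 times over, which is one way to read the article's remark that quality output may require additional investment.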
CITIC Securities: AI Browsers and Emotional-Companionship Apps, Where the Giants Keep Investing, Show Potential Worth Watching
Xin Lang Cai Jing· 2025-09-08 00:44
Core Insights
- A CITIC Securities report indicates that overseas AI applications were accelerating as of July 2025, with significant growth in token processing volumes and in annual recurring revenue (ARR) for top AI applications [1]

Group 1: Token Processing Volumes
- Google's token processing volume reached 980 trillion in July, double the May figure [1]
- Microsoft's Azure AI Foundry processed 310 trillion tokens in Q2, up 210% quarter over quarter [1]

Group 2: Annual Recurring Revenue (ARR)
- Total ARR for the top 100 overseas AI applications reached $39.3 billion in July, a 17.3% increase from May [1]

Group 3: Application Trends
- AI coding and multimodal applications remain the hottest areas, with products such as Lovable, Replit, Pixverse, and Nano Banana gaining traction [1]
- AI browsers and emotional-companionship applications, in which major players continue to invest, show noteworthy potential [1]
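The growth figures above imply earlier baselines that the summary does not state. The sketch below backs them out, assuming that "up X%" means the current value equals the earlier value times (1 + X/100) and that "double" means +100%. The derived baselines are arithmetic consequences of the quoted numbers, not figures from the report.

```python
# Back-of-the-envelope check of the growth figures quoted above.
# Inputs are the summary's numbers; the implied baselines are derived, not reported.

def baseline_from_growth(current: float, growth_pct: float) -> float:
    """Return the earlier value implied by `current` after `growth_pct` percent growth."""
    return current / (1 + growth_pct / 100)


# Google: 980 trillion tokens in July, described as double the May volume (+100%).
google_may = baseline_from_growth(980, 100)

# Azure AI Foundry: 310 trillion tokens in Q2, up 210% quarter over quarter.
azure_q1 = baseline_from_growth(310, 210)

# Top-100 overseas AI apps: $39.3B ARR in July, up 17.3% from May.
arr_may = baseline_from_growth(39.3, 17.3)

print(f"Implied Google May volume: {google_may:.0f} trillion tokens")
print(f"Implied Azure Q1 volume:   {azure_q1:.0f} trillion tokens")
print(f"Implied May ARR:           ${arr_may:.1f}B")
```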
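DataCanvas (九章云极) COO Shang Mingdong: Compute Utilization Is Below 30%, and the Root Cause Is "Stacking Hardware" Rather Than "Prioritizing Operations" | Ten Talks on the Imagination of Intelligent Computing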
Leiphone (雷峰网)· 2025-09-02 10:09
Core Viewpoint
- The cloud computing industry, particularly intelligent computing, faces challenges such as underutilized computing power and the inefficiency of traditional leasing models, and needs new operating strategies to optimize resource usage and costs [3][4][6].

Group 1: Industry Challenges
- Average utilization of computing power across the industry is below 30%, leading to significant waste [3].
- The traditional bare-metal leasing model locks clients into fixed time and resource boundaries, making it hard for smaller companies to access the resources they need [3][16].
- Many companies in the industry struggle with project arbitrage and short-term profit chasing, a result of immature business models and regulatory environments [7].

Group 2: Operational Strategies
- Computing power should be treated as an operated service rather than a one-time product delivery, with the emphasis on continuous usage and consumption [4][9].
- The "Alaya NeW" intelligent computing center operating system aims to optimize hardware management and support a diverse ecosystem, improving cost efficiency [6][10].
- Flexible, elastic computing services are crucial for meeting clients' diverse needs, particularly as demand for AI applications grows [13][19].

Group 3: Market Dynamics
- Competition in the intelligent computing market is intensifying, and major cloud providers need to stay cost-competitive while building robust ecosystems [23].
- A retail model for computing power, in which clients pay for actual usage rather than fixed leases, is gaining traction [11][15].
- Demand for inference computing power is expected to grow significantly, driven by the spread of AI applications across industries [26][27].

Group 4: Future Outlook
- The intelligent computing industry is at a crossroads, with opportunities for innovation in service delivery and resource management [29].
- The evolution toward multimodal AI applications points to more integrated and versatile computing solutions [28].
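The contrast between fixed leasing and usage-based billing can be sketched with simple arithmetic. Only the 30% utilization figure comes from the summary; the two hourly prices below are hypothetical placeholders, and the functions are an illustration of the reasoning, not anything described in the interview.

```python
# Illustrative comparison of fixed leasing vs. usage-based billing.
# All prices are hypothetical; only the 30% utilization figure comes from the summary.

def effective_cost_per_used_hour(lease_rate: float, utilization: float) -> float:
    """Cost per GPU-hour actually used when every leased hour is paid for."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return lease_rate / utilization


def break_even_utilization(lease_rate: float, usage_rate: float) -> float:
    """Utilization above which a fixed lease is cheaper than pay-per-use."""
    return lease_rate / usage_rate


LEASE_RATE = 10.0   # hypothetical price per leased GPU-hour
USAGE_RATE = 20.0   # hypothetical price per GPU-hour under usage-based billing
UTILIZATION = 0.30  # industry average cited above (stated as "below 30%")

effective = effective_cost_per_used_hour(LEASE_RATE, UTILIZATION)
break_even = break_even_utilization(LEASE_RATE, USAGE_RATE)

print(f"Effective lease cost per used hour: {effective:.2f}")
print(f"Break-even utilization: {break_even:.0%}")
```

With these assumed prices, a client running at 30% utilization pays more per useful hour under the lease than under pay-per-use, even though the pay-per-use list price is twice as high. That is the intuition behind the retail model described in Group 3.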
Google's Nano Banana Goes Viral
Huafu Securities· 2025-08-31 05:19
Investment Rating
- The industry rating is "Outperform the Market," meaning the industry's overall return is expected to exceed the market benchmark index by more than 5% over the next 6 months [14].

Core Insights
- The report highlights the rapid rise of Google's Nano Banana model, which has become the leading image generation and editing model, scoring 1362 on the LMArena platform, well ahead of its competitors [3].
- Nano Banana's capabilities include cross-image consistency, multi-image fusion, fine-grained conversational and instruction-based editing, and stronger semantic understanding drawn from Gemini's world knowledge [3].
- Pricing is competitive at $30 per million tokens, or approximately $0.039 per image, preserving the "high cost-performance + low latency" profile of the Flash series [3].
- Identified application scenarios include design work, creative design for social media, image restoration, and integration with external tools for AI video and 3D generation [4].
- Overseas platforms such as Adobe and Figma have quickly integrated Nano Banana, validating its productivity gains [4].
- Google's Veo3 has emerged as the top video generation model, producing high-definition video together with audio, and is widely available across Gemini, Flow, and Vertex AI [5].
- The report is positive on the multimodal field, pointing in particular to the synergy between Google Veo3 and YouTube's copyright ecosystem [6].

Summary by Sections

Industry Dynamics
- Nano Banana was officially released on August 26, 2025, and quickly established itself as the most advanced image generation and editing model [3].
- Its capabilities are being put to use across sectors including branding, e-commerce, and social media content creation [4].

Investment Recommendations
- The report recommends watching companies in AI image applications, such as Wanxing Technology (Wondershare) and Meitu, as well as video application companies such as Kuaishou and Bilibili [8].
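The two price points quoted for Nano Banana can be related with a short calculation. The sketch below takes both figures from the summary and derives the number of tokens billed per image. That derived value (about 1,300) is an inference from the rounded per-image price, not a number stated in the report, and `batch_cost` is a hypothetical helper.

```python
# Relating the two quoted price points: $30 per million tokens and ~$0.039 per image.
PRICE_PER_MILLION_TOKENS_USD = 30.0  # from the summary
PRICE_PER_IMAGE_USD = 0.039          # from the summary (approximate)

price_per_token = PRICE_PER_MILLION_TOKENS_USD / 1_000_000

# Implied tokens billed per image (derived here, not stated in the summary).
implied_tokens_per_image = PRICE_PER_IMAGE_USD / price_per_token


def batch_cost(num_images: int, tokens_per_image: float = implied_tokens_per_image) -> float:
    """Estimated cost in USD for a batch of generated images (hypothetical helper)."""
    return num_images * tokens_per_image * price_per_token


print(f"Implied tokens per image: {implied_tokens_per_image:.0f}")
print(f"Cost of 1,000 images: ${batch_cost(1000):.2f}")
```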
After a Year Out of Sight, Kimi's Yang Zhilin in His Latest Conversation: "Standing at the Beginning of Infinity"
Cyzone (创业邦)· 2025-08-30 03:19
Core Viewpoint
- The article discusses the evolution of AI through the Kimi K2 model developed by Moonshot AI, highlighting the remaining challenges and the philosophical side of problem-solving in AI development [4][5][12].

Group 1: Kimi K2 Model Development
- Kimi K2, built on the MoE (mixture-of-experts) architecture, is an open-source model oriented toward coding and agentic interaction with the digital world [4][5].
- Its release in July 2025 brought Moonshot AI back to public attention after a period of relative silence from founder Yang Zhilin [4][5].
- Development shifted from pre-training plus supervised fine-tuning to pre-training plus reinforcement learning, which significantly changed how the company operates [27][28].

Group 2: Philosophical Insights
- Yang Zhilin describes human civilization as a continuous process of conquering problems and expanding the boundaries of knowledge, drawing on David Deutsch's book "The Beginning of Infinity" [5][12].
- The idea that every solved problem raises new questions is central to his view of AI development as an open-ended journey of exploration and innovation [5][12].

Group 3: Technical Innovations
- K2 aims to maximize token efficiency, that is, to learn more from the same amount of data, which matters because high-quality data is growing slowly [29][30].
- The Muon optimizer significantly improves token efficiency, letting the model learn more from the same data than with traditional optimizers such as Adam [30][31].
- The model can carry out complex tasks over extended periods without human intervention, showing the potential for end-to-end automation in AI applications [17][44].

Group 4: Agentic Capabilities
- K2 is characterized as an agentic model: it handles multi-turn interactions and uses tools to connect with the external world, which strengthens its problem-solving [43][44].
- Multi-agent systems are highlighted as a way to improve task execution and collaboration among agents, enabling more complex problem-solving [22][44].
- Generalization remains a challenge for agent models, and work continues on improving their adaptability to different tasks and environments [34][46].
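Saining Xie Recalls His OpenAI Interview Seven Years Ago: Whiteboard Coding, a Five-Hour Session, and Dark Outside by the Time It Ended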
Synced (机器之心)· 2025-08-29 09:53
Core Insights
- The article discusses AI researchers' interview experiences at major tech companies, highlighting differences in interview style and in what each company focuses on [1][9][20].

Group 1: Interview Experiences
- Lucas Beyer, a researcher with extensive experience at top AI firms, started a poll about memorable interview experiences at companies such as Google, Meta, and OpenAI [2][20].
- Saining Xie said his interviews at several AI companies were unforgettable, singling out a rigorous two-hour marathon at DeepMind that involved solving over 100 math and machine learning problems [5][6].
- The Meta interview was described as more academic, built around discussions with prominent researchers rather than coding alone [6][7].

Group 2: Company-Specific Insights
- The Google Research interview was likened to an academic job interview, with the emphasis on research discussion rather than coding challenges alone [7].
- OpenAI's process involved a lengthy session on a reinforcement learning problem, reflecting the company's commitment to deep research engagement [8][9].
- The interview questions mirror each company's research priorities, such as Meta's focus on computer vision and OpenAI's emphasis on reinforcement learning [9][20].

Group 3: Notable Interviewers and Candidates
- Figures such as John Schulman and Noam Shazeer are mentioned as interviewers, indicating the caliber of talent involved in hiring at these firms [7][9].
- Candidates shared memorable moments from their interviews, such as solving complex problems on napkins or getting into deep discussions of research topics [19][20].
Top-Level Design Sets the Direction! "AI+" Anchors the Pace of Development
Guo Ji Jin Rong Bao· 2025-08-27 11:17
Core Viewpoint
- The State Council's "Opinions on Deepening the Implementation of the 'Artificial Intelligence+' Action" aims to accelerate AI development in China, moving from experimental exploration to value creation, with a three-step plan for integrating AI into key sectors by 2035 [1]

Group 1: Development Goals
- By 2027, AI is expected to be widely and deeply integrated with six key sectors, with application penetration of new-generation intelligent terminals and agents above 70% and a significant increase in the scale of the core AI industry [1]
- By 2030, that application penetration is projected to exceed 90%, with AI becoming a crucial growth driver for the economy and strengthening public governance [1]
- By 2035, China aims to have fully entered a new stage of intelligent economy and intelligent society, supporting the realization of socialist modernization [1]

Group 2: Key Actions and Support Capabilities
- The "Opinions" set out six key actions: AI in science and technology, industrial development, consumer quality enhancement, public welfare, governance capabilities, and global cooperation [1]
- Eight foundational support capabilities are emphasized, including improving model capabilities, innovating in data supply, and strengthening talent development [1]

Group 3: Industry Perspectives
- Experts note that China's rich application scenarios and the strong digital foundations of its advanced manufacturing will support the sustainable development of AI technologies [2]
- The release of the "Opinions" is seen as timely, with calls for coordinated efforts across regions and sectors to ensure effective implementation [2]
- Companies such as StepFun (阶跃星辰) are aligning their technology and industry strategies with the goals of the "Opinions," focusing on foundation model development and multimodal applications [3]

Group 4: Corporate Insights
- Lenovo's chairman emphasizes the shift toward human-machine collaboration as a new norm in enterprise operations, advocating holistic transformation rather than fragmented upgrades [4]
- The potential risks of AI development are acknowledged, with a call for companies to pursue AI innovation while ensuring it meets human needs and adheres to ethical standards [4]
Yang Hongxia: Covering the "Last Mile" of Large Models So That AI Is No Longer Just a "Toy for the Rich"
Sou Hu Cai Jing· 2025-08-26 19:05
Core Insights
- The article discusses the large AI investment gap between US and Chinese tech companies, with US firms projected to invest roughly five times as much as their Chinese counterparts in 2025 [7][8]
- It highlights the challenges and opportunities in AI development, particularly in healthcare and the application of generative AI [16][22]

Investment Disparity
- Over the past five years, US tech giants such as Microsoft and Amazon have collectively spent 5.36 trillion RMB on AI, while leading Chinese companies such as Tencent and Alibaba have invested only 630 billion RMB [7][8]
- In 2025, US companies are projected to invest around 2.5 trillion RMB in AI, compared with approximately 500 billion RMB from Chinese firms [8]

AI Model Development
- OpenAI describes its latest model, GPT-5, as its best yet, but it reportedly lacks the emotional interaction and imagination of its predecessor, GPT-4o [3][4]
- Multimodal AI remains a significant challenge: current models struggle to accurately extract and correlate image and text data [4][5]

Healthcare Applications
- The Hong Kong Polytechnic University is developing a specialized small language model for cancer treatment, working with major hospitals to strengthen AI's role in complex medical diagnoses [16][22]
- The focus is on an AI that can assist with cancer patient follow-ups and streamline processes such as target area delineation in radiation therapy [22][23]

Future Prospects
- The article argues that Chinese companies need to invest in AI with more confidence, and suggests that future breakthroughs may lie in deeper industrial applications rather than only internet-based solutions [12][13]
- There is optimism about overcoming current limits on AI capabilities, particularly through localized data and specialized healthcare applications [20][21]
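The investment figures above support two different ratios, which is easy to miss on a first read. The quick check below uses only the quoted amounts (converted to trillion RMB); the ratios themselves are derived here and are approximate because the 2025 figures are stated as rough projections.

```python
# Ratios implied by the investment figures quoted above (amounts in trillion RMB).
us_five_year, cn_five_year = 5.36, 0.63  # cumulative, past five years
us_2025, cn_2025 = 2.5, 0.5              # projected for 2025 (approximate)

ratio_five_year = us_five_year / cn_five_year
ratio_2025 = us_2025 / cn_2025

print(f"Five-year ratio: {ratio_five_year:.1f}x")
print(f"2025 ratio:      {ratio_2025:.1f}x")
```

On these numbers the cumulative five-year gap is wider (about 8.5x) than the projected 2025 gap (about 5x), so the "five times" headline figure refers to the single-year projection.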
Up to 8x More Efficient: Tencent Games Releases a Professional Game AI Model, So Artists No Longer Have to "Grind" So Hard on Animation
36Kr· 2025-08-26 01:52
Core Insights
- The article highlights advances in AI technology in the gaming industry, showcased at the recent Devcom developer conference held alongside Gamescom in Cologne. Major companies including Microsoft, Tencent, Google, and Meta presented over 20 talks on how AI can raise game art production efficiency and fit into traditional workflows [1][3].

Group 1: AI Tools and Solutions
- Tencent Games launched VISVISE, an AI-driven end-to-end game creation solution with tools for animation production, model creation, digital asset management, and intelligent NPCs, aimed at relieving the repetitive, labor-intensive work in game art development [3][8].
- The MotionBlink tool within VISVISE automatically completes animation sequences from minimal user input, cutting animation production time from several days to seconds [3][15].
- The GoSkinning tool, also part of VISVISE, automates skinning for 3D models, improving efficiency in animation skinning tasks by up to 60%, and has been used in popular games such as "PUBG Mobile" and "Peacekeeper Elite" [8][24].

Group 2: Challenges in Game Art Production
- In traditional game art production, asset creation consumes 50%-60% of the time, with 3D modeling and animation the most labor-intensive steps. Their complexity often causes inefficiency, particularly in skinning and animation adjustments [9][10].
- Traditional methods such as manual keyframing and motion capture are time-consuming and need extensive correction, which is where AI tools can streamline the process [10][11].

Group 3: Development and Future of AI in Gaming
- Tencent's development of VISVISE was driven by real production needs, and its exploration of AI in gaming began as early as 2016. The system was officially launched in 2024, integrating AI tools tailored to different parts of game creation [24][26].
- AI in gaming is seen as a critical area for future development, with the potential to improve NPC interactions and create more immersive experiences. The relationship between games and AI is described as symbiotic: games serve as both a testing ground and a catalyst for AI advances [29][30][32].