Multimodal Models
While Every Robot Maker Competes on Limbs and Brains, He Has Spent Ten Years on One Thing: the Face | 「锦供参考」 Vol.04
锦秋集· 2026-03-03 12:43
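Over the past few years, abroad and at home, from Boston Dynamics to Figure, from Tesla's Optimus to a wave of Chinese embodied-intelligence companies, nearly everyone has been competing on stronger limbs and smarter brains. But one founder has chosen a completely different route. For ten years he has done only one thing: the robot's face.
His name is Hu Yuhang. Known online as U航, he is the founder of 首形科技. In Hu's view, the most critical interface between humans and robots is not necessarily the limbs, nor necessarily the brain, but trust.
In human society, the fastest way to build trust has never been language or action, but the face. The human brain is highly sensitive to faces from birth. We can recognize a face within a few hundred milliseconds and quickly judge the other person's emotion, attitude, and even intent. People can project feelings onto a face, yet find it hard to become attached to a cold machine. "It is hard for people to fall in love with anything other than a person." This intuitive, first-glance appeal makes the face the only switch for building trust and emotional projection.
This is also the non-consensus route 首形科技 has followed since its founding. The choice is not a marketing gimmick; it comes from Hu Yuhang's ten years of sustained work.
Some call his story a "comeback"; others call him "controversial". But few have truly seen the resilience behind the glamour, his steady judgment, and his all-out, fast, precise, and decisive drive.
What we and 2 million "electronic shareholders" have invested in is not his "traffic" but his "I want to win so badly" ...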
Gains Across the Board! Positive Factors Lift Confidence as A-Shares Reopen; Institutions Favor These Two Main Themes
Guang Zhou Ri Bao· 2026-02-24 02:49
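On February 24, A-shares had their first trading day of the Year of the Horse, and all three major indexes rose: the Shanghai Composite Index opened up 1.15%, the Shenzhen Component Index opened up 1.52%, and the ChiNext Index opened up 1.7%. Nonferrous metals, oil and gas, and computing power sector indexes led the gains.
Several brokerages are optimistic about the post-holiday trend. "As a series of conditions are met and uncertainties are resolved, we suggest investors regroup and prepare for the first upward cycle of the Year of the Horse," said Liu Chenming, chief strategy analyst at GF Securities.
Yang Chao, chief strategy analyst at China Galaxy Securities, likewise said that after the Spring Festival holiday, with policy expectations, liquidity support, and industry-trend catalysts, the market is more likely than not to fluctuate upward. For some time ahead, the A-share market may be driven chiefly by policy catalysts, with capital contesting policy-guided industry themes and thematic opportunities, showing "rotating policy hot spots and rapid style switches."
Zhang Qiyao, chief strategy analyst at the Industrial Securities Economic and Financial Research Institute, holds that A-shares released some risk by adjusting along with overseas assets before the holiday and are about to enter a high-win-rate window, and he remains positive on a new upward leg for A-shares after the holiday.
| Market | Net capital inflow | Advance/decline distribution |
| --- | --- | --- |
| Shanghai Composite Index | Shenzhen Component Index | STAR Composite Index |
| 4129.13 | 14313.86 | 1830.15 |
| +47.06 +1.15% +213.67 +1.52% ...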
Positive on AI Value Realization and Commercialization Over the Full Year
Core Viewpoint
- The year 2026 is identified as a critical year for the commercialization and value realization of AI technologies, following a period of model competition and application exploration from 2023 to 2025 [3].
Market Review
- During the period from February 9 to February 13, 2026, the CSI 300 Index increased by 0.36%, while the Computer Index rose by 4.35% [2].
AI Commercialization
- Anthropic is recognized as one of the fastest companies in AI commercialization, recently raising $30 billion in a Series G funding round, leading to a valuation of $380 billion [3].
- Anthropic's ARR (Annual Recurring Revenue) reached $1 billion by the end of 2023, projected to grow to $10 billion by the end of 2024, and has already reached $14 billion by February 2026 [3].
- Claude Code has become a significant growth driver for Anthropic, with its ARR surpassing $2.5 billion and a fourfold increase in enterprise subscriptions since early 2026 [3].
- OpenAI has disbanded its internal "Mission Alignment" team and reduced its computing expenditure target to $600 billion, with projected total revenue exceeding $280 billion by 2030, indicating a shift towards commercial priorities [3].
Multimodal Models
- The year 2026 is anticipated to be a pivotal moment for multimodal models, with significant advancements expected in video and audio capabilities [4].
- OpenAI's initial Sora model, launched in February 2024, is compared to a breakthrough moment in video technology, with subsequent models expected to enhance narrative control and audio support [4].
- The introduction of various models, such as Veo 3.1 and Seedance 2.0, is expected to drive down costs while improving capabilities, fostering growth in creative sectors like film, gaming, and advertising [4].
Investment Recommendations
- The company maintains two key judgments: 2026 will be crucial for AI commercialization, and multimodal models are likely to experience significant advancements [5].
- Recommended AI application companies include Kingsoft Office, Hehe Information, Dingjie Shuzhi, and others, with beneficiaries in the multimodal field such as Wanxing Technology and Meitu [5].
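As a rough cross-check of the Anthropic figures quoted above, the Python sketch below derives the implied valuation-to-ARR multiple and the growth multiples between the stated ARR milestones. All inputs are the numbers from the summary; the function names are illustrative, and the multiple is a plain ratio, not a valuation method attributed to the report.

```python
# Figures as quoted in the summary, in billions of US dollars.
VALUATION_B = 380.0
ARR_BY_DATE_B = {
    "end of 2023": 1.0,
    "end of 2024": 10.0,
    "February 2026": 14.0,
}


def valuation_to_arr(valuation_b: float, arr_b: float) -> float:
    """Simple ratio of valuation to annual recurring revenue."""
    return valuation_b / arr_b


def growth_multiples(series: dict) -> dict:
    """Ratio of each milestone to the one before it, in insertion order."""
    labels = list(series)
    return {
        f"{prev} -> {cur}": series[cur] / series[prev]
        for prev, cur in zip(labels, labels[1:])
    }


multiple = valuation_to_arr(VALUATION_B, ARR_BY_DATE_B["February 2026"])
steps = growth_multiples(ARR_BY_DATE_B)

print(f"Valuation / ARR: {multiple:.1f}x")
for label, m in steps.items():
    print(f"{label}: {m:.1f}x")
```

On these inputs the valuation works out to roughly 27 times the February 2026 ARR, which gives a sense of how much future growth the funding round prices in.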
Weekly View: Positive on AI Value Realization and Commercialization Over the Full Year
KAIYUAN SECURITIES· 2026-02-23 10:45
Investment Rating
- The industry investment rating is "Positive" (maintained) [1]
Core Insights
- 2026 is seen as a pivotal year for AI to achieve value realization and commercialization, with major companies focusing on this transition [4][10]
- Anthropic is recognized as one of the fastest-commercializing large model companies, recently raising $30 billion in Series G funding, pushing its valuation to $380 billion [4][10]
- The ARR (Annual Recurring Revenue) of Anthropic reached $14 billion by February 2026, with significant growth driven by Claude Code [4][10]
- OpenAI has shifted its focus from AGI ideals to commercial priorities, reducing its computational spending target to $600 billion and projecting total revenue to exceed $280 billion by 2030 [4][10]
- Multimodal models are anticipated to reach a "DeepSeek (DS) moment" in 2026, enhancing capabilities while significantly reducing costs, benefiting sectors like film, gaming, and advertising [5][11]
Summary by Sections
Market Review
- During the period from February 9 to February 13, 2026, the CSI 300 index increased by 0.36%, while the computer index rose by 4.35% [3][13]
Investment Recommendations
- Key recommendations for AI applications include companies such as Kingsoft Office, Hehe Information, Dingjie Shuzhi, and others, with beneficiaries in the multimodal field including Wanxing Technology, Huitian Ruisheng, and others [6][12]
Weekly View: Positive on AI Value Realization and Commercialization Over the Full Year-20260223
KAIYUAN SECURITIES· 2026-02-23 07:56
Investment Rating
- The investment rating for the computer industry is "Positive" (maintained) [1]
Core Viewpoints
- The year 2026 is seen as a critical year for AI to achieve value realization and commercialization, with major companies like Anthropic leading in commercialization speed and significant revenue growth [4][10]
- Multi-modal models are expected to reach a "DeepSeek (DS) moment" in 2026, enhancing capabilities while significantly reducing costs, which will benefit sectors like film, gaming, and advertising [5][11]
Summary by Sections
Market Review
- During the period from February 9 to February 13, 2026, the CSI 300 index increased by 0.36%, while the computer index rose by 4.35% [3][13]
Industry Dynamics
- The AI sector is transitioning from model competition to application exploration, with a focus on commercialization in 2026 [4][10]
- Anthropic's Claude model has shown impressive growth, with annual recurring revenue (ARR) reaching $14 billion by February 2026, driven by its enterprise subscription growth [4][10]
- OpenAI has shifted its focus from AGI ideals to commercial priorities, with projected revenues exceeding $280 billion by 2030 [4][10]
Investment Recommendations
- Key AI application companies recommended include Kingsoft Office, Hehe Information, Dingjie Shuzhi, and others, with beneficiaries in the multi-modal field such as Wanxing Technology and Meitu [6][12]
Alibaba Releases Qwen3.5: Performance Rivals Gemini 3 at Just 1/18 of Its Token Price
Xin Lang Cai Jing· 2026-02-16 09:13
Core Insights
- Alibaba has launched the new generation large model Qwen3.5-Plus, claiming it rivals Gemini 3 Pro and is the strongest open-source model globally [1][4]
- The Qwen3.5-Plus model features a total of 397 billion parameters, with only 17 billion activated, outperforming the trillion-parameter Qwen3-Max model while reducing deployment memory usage by 60% and significantly enhancing inference efficiency [1][4]
- The API pricing for Qwen3.5-Plus is set at 0.8 yuan per million tokens, which is only 1/18th of the cost of Gemini 3 Pro [1][4]
Model Architecture and Performance
- Qwen3.5 represents a generational leap from pure text models to native multimodal models, utilizing a mixed token pre-training approach that includes visual and text data [1][4]
- The model has been trained with a substantial increase in multilingual, STEM, and reasoning data, allowing it to acquire denser world knowledge and reasoning logic [1][4]
- Qwen3.5 achieves top-tier performance with less than 40% of the parameters of the Qwen3-Max model, excelling in inference, programming, and agent intelligence evaluations [1][4]
Benchmark Performance
- In the MMLU-Pro knowledge reasoning evaluation, Qwen3.5 scored 87.8, surpassing GPT-5.2 [2][5]
- The model achieved 88.4 in the PhD-level GPQA assessment, outperforming Claude 4.5 [2][5]
- Qwen3.5 set a record with a score of 76.5 in the instruction-following IFBench, and it also exceeded Gemini 3 Pro and GPT-5.2 in various agent evaluations [2][5]
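The parameter and pricing claims in the Qwen3.5 summary can be sanity-checked with some quick arithmetic. The Python sketch below uses only the figures quoted above; the helper names and the 50-million-token workload are hypothetical, the Gemini 3 Pro price is merely what the stated 1/18 ratio implies (it is not quoted in the source), and treating "trillion-parameter" as exactly 1,000 billion is an assumption.

```python
# Figures quoted in the summary.
TOTAL_PARAMS_B = 397.0         # total parameters, billions
ACTIVE_PARAMS_B = 17.0         # parameters activated per token, billions
QWEN_PRICE_YUAN_PER_M = 0.8    # yuan per million tokens
PRICE_RATIO = 18               # the comparison model is said to cost 18x as much

# Assumption: "trillion-parameter" Qwen3-Max taken as exactly 1,000 billion.
QWEN3_MAX_PARAMS_B = 1000.0


def fraction(part: float, whole: float) -> float:
    """Share of `whole` represented by `part`."""
    return part / whole


def token_cost(tokens: int, price_per_million: float) -> float:
    """Cost in yuan for a given token count at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million


active_share = fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
size_vs_max = fraction(TOTAL_PARAMS_B, QWEN3_MAX_PARAMS_B)
implied_reference_price = QWEN_PRICE_YUAN_PER_M * PRICE_RATIO

workload_tokens = 50_000_000   # hypothetical workload
qwen_cost = token_cost(workload_tokens, QWEN_PRICE_YUAN_PER_M)
reference_cost = token_cost(workload_tokens, implied_reference_price)

print(f"Active share of parameters: {active_share:.1%}")
print(f"Size relative to Qwen3-Max (assumed 1T): {size_vs_max:.1%}")
print(f"Implied reference price: {implied_reference_price:.1f} yuan per million tokens")
print(f"Cost of workload: {qwen_cost:.0f} yuan vs {reference_cost:.0f} yuan")
```

Under these assumptions only about 4% of the parameters are active per token, and the total size is just under 40% of a one-trillion-parameter model, which is consistent with the "less than 40%" claim.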
ByteDance Went All Out This Spring Festival! Seedance2.0 and Doubao 2.0 Arrive Back to Back: Everything You Need to Know
Sou Hu Cai Jing· 2026-02-14 14:21
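Reposted from 《硅星人Pro》 Author | Wang Zhaoyang Email | wangzhaoyang@pingwest.com
Priced at just 1/4 of Gemini 3 Pro, with top-tier multimodal understanding and reasoning, and providing the underlying support for the phenomenal Seedance2.0, the unified base model Doubao 2.0 has finally arrived.
It is one of the most anticipated models of late. Even with the AI world this lively over the Spring Festival, you have to admit that ByteDance has so far captured most of the attention.
First came the stunning debut of Seedance2.0: social networks filled with the striking videos it produced, it was described as "killing the game" and ending AIGC's childhood, and many compared it with last year's DeepSeek effect. Next came the Seedream model, which rivals Nano Banana and has made major progress in understanding and reasoning. And just now, Doubao 2.0, the base model that provides the underlying intelligence for the previous two, made its final appearance.
The Doubao large model 2.0 series (Doubao-Seed-2.0) offers several model choices: three general-purpose multimodal models, Pro, Lite, and Mini, plus a Code model for developers (Doubao-Seed-2.0-Code), to meet the differing latency and cost needs of enterprises and users across scenarios.
With that, ByteDance's entire Doubao large model family is complete. A triple strike, attention ...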
Seedance 2.0 Fully Launched as ByteDance Officially Joins the Spring Festival Model Battle
36Kr· 2026-02-12 09:53
Core Insights
- ByteDance has officially launched Seedance 2.0, a video model that supports multi-modal input, marking its entry into the competitive landscape of video generation technology during the Spring Festival model battle [1][2].
Group 1: Product Features
- Seedance 2.0 utilizes a unified multi-modal audio-video generation architecture, allowing inputs from text, images, audio, and video [2].
- The model supports mixed modal input, enabling users to input up to 9 images, 3 video clips, 3 audio segments, and natural language instructions simultaneously [3].
- Compared to its predecessor, version 1.5, Seedance 2.0 emphasizes improved generation quality, complex interactions, and high usability in dynamic scenes, adhering more closely to physical laws [6].
Group 2: User Experience
- Users can generate a 5-second video in approximately 2 hours, with a deduction of 40 points from their account for each video generated, and the system offers two free acceleration opportunities [4].
- The model allows for video editing capabilities, enabling users to modify specific segments, characters, actions, or plots during the generation process [8].
- Seedance 2.0 supports the generation of multi-shot videos up to 15 seconds long, enhancing its applicability in film and advertising sectors while reducing content production costs [9].
Group 3: Performance Comparison
- ByteDance claims that Seedance 2.0 significantly outperforms competitors like OpenAI's Sora 2 Pro and Kuaishou's Keling 3.0 in terms of stability, instruction adherence, and audio-visual synchronization [16].
- In multi-modal task performance, Seedance 2.0 excels in instruction adherence and multi-modal compliance, ranking among the top tier in the industry for editing consistency and dynamic quality [17].
- The model demonstrates strong performance in maintaining consistency in character representation and voice restoration, although there is still room for improvement in multi-character consistency and complex editing effects [18].
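The mixed-modal input limits and the per-video point cost described above lend themselves to a small client-side check. The Python sketch below is purely illustrative: the `MixedModalRequest` class, the `validate` and `points_needed` helpers, and the file names are hypothetical and do not correspond to any real Seedance API; only the limits (9 images, 3 video clips, 3 audio segments) and the 40-point deduction come from the summary.

```python
from dataclasses import dataclass, field

# Limits quoted in the summary; everything else in this sketch is hypothetical.
MAX_IMAGES = 9
MAX_VIDEO_CLIPS = 3
MAX_AUDIO_SEGMENTS = 3
POINTS_PER_VIDEO = 40


@dataclass
class MixedModalRequest:
    """Hypothetical container for one mixed-modal generation request."""
    prompt: str
    images: list = field(default_factory=list)
    video_clips: list = field(default_factory=list)
    audio_segments: list = field(default_factory=list)


def validate(request: MixedModalRequest) -> list:
    """Return a list of problems; an empty list means the request fits the limits."""
    problems = []
    if not request.prompt.strip():
        problems.append("a natural language instruction is required")
    checks = [
        ("images", request.images, MAX_IMAGES),
        ("video clips", request.video_clips, MAX_VIDEO_CLIPS),
        ("audio segments", request.audio_segments, MAX_AUDIO_SEGMENTS),
    ]
    for name, items, limit in checks:
        if len(items) > limit:
            problems.append(f"too many {name}: {len(items)} > {limit}")
    return problems


def points_needed(videos_to_generate: int) -> int:
    """Points deducted for generating the given number of videos."""
    return videos_to_generate * POINTS_PER_VIDEO


ok_request = MixedModalRequest("A fox runs through snow", images=["fox.png"])
bad_request = MixedModalRequest("Montage", images=[f"img{i}.png" for i in range(10)])

print(validate(ok_request))
print(validate(bad_request))
print(points_needed(3))
```

Returning a list of problems rather than raising on the first one lets a caller report every violated limit in a single pass.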
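AI Product Evaluation and Experience Report Series: Multimodal Models Reach Their DeepSeek Moment; a Supply Revolution Will Redefine the Content Creation Paradigm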
Huachuang Securities· 2026-02-12 04:16
Investment Rating
- The report maintains a "Recommendation" rating for the industry, expecting the industry index to rise more than 5% over the next 3-6 months compared to the benchmark index [67].
Core Insights
- The report highlights a significant transformation in the content creation paradigm due to advancements in multi-modal models, particularly with the release of new video generation models like Kuaishou's Keling 3.0 and ByteDance's Seedance 2.0, which enhance precision and controllability in video production [3][11].
- The supply-side revolution is expected to reshape the cost structure of content production, with a notable reduction in marginal costs due to improved capabilities in image and video generation [3][58].
- The report emphasizes the potential for AI video generation technology to catalyze downstream opportunities in content IP, content copyright, and AI application tools, as well as the demand for cloud services and computing power [3][58].
Summary by Sections
Industry Basic Data
- The industry comprises 138 listed companies with a total market value of approximately 22,562.85 billion and a circulating market value of about 20,757.76 billion [5].
Relative Index Performance
- The absolute performance over 1 month, 6 months, and 12 months is 8.9%, 31.9%, and 42.7% respectively, while the relative performance is 9.6%, 16.8%, and 21.6% [6].
Model Updates
- Keling 3.0 and Seedance 2.0 represent significant upgrades in video generation models, focusing on physical realism, narrative consistency, and enhanced understanding of complex text instructions [11][41].
- Keling 3.0 introduces integrated editing capabilities, allowing for controlled modifications to generated content without the need for complete regeneration [29][33].
- Seedance 2.0 enhances the precision of video production, addressing common issues such as consistency and complex action replication [41][50].
Investment Opportunities
Investment Opportunities - The report identifies several investment opportunities in the content IP sector, including companies like Zhongwen Online, Yuedu Group, and Shanghai Film, as well as in content copyright and AI video production tools [58][59].
Spring Festival Entertainment Plus AI Empowerment: Media Sector Surges Across the Board; Watch the Gaming ETF (516010) and Film ETF (516620)
Mei Ri Jing Ji Xin Wen· 2026-02-11 01:28
Group 1
- The media sector experienced a significant surge, with the gaming ETF (516010) rising over 5% and the film ETF (516620) hitting a temporary limit up, driven by increased expectations for entertainment consumption during the Spring Festival and the catalytic effect of AI video models [1][3]
- The launch of ByteDance's Seedance 2.0 platform allows for automatic scene planning and sound effect integration, achieving near "indistinguishable" movie-level output, indicating a high level of AI video capability [3]
- The current valuation of the gaming sector remains attractive, with a clear logic for a "product year" in 2026, as the core companies' PE valuations have not surpassed previous highs for 2025 and 2026, providing a favorable risk-reward profile [4][5]
Group 2
- The domestic gaming market is projected to exceed 350 billion yuan in sales revenue for the first time in 2025, with a year-on-year growth of 7.68%, and a record high of 1,771 game licenses issued, laying a solid foundation for the product year in 2026 [4]
- The upcoming 2026 Spring Festival is expected to be the longest in history, with many manufacturers well-prepared, indicating the gaming industry is set to enter a peak season [4]
- The gaming sector is currently experiencing a triple resonance of high valuation attractiveness, a product year, and the Spring Festival peak, warranting continued attention [5]
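The gaming-market figures above imply a prior-year baseline that can be backed out from the growth rate. The Python sketch below uses the 350 billion yuan threshold and the 7.68% year-on-year growth from the summary; because the summary says sales "exceed" 350 billion yuan, the results should be read as lower bounds. The function name is illustrative.

```python
# Figures quoted in the summary.
SALES_2025_FLOOR_B_YUAN = 350.0   # 2025 sales exceed this level, billions of yuan
YOY_GROWTH_PCT = 7.68             # year-on-year growth, percent


def implied_prior_year(current: float, yoy_growth_pct: float) -> float:
    """Prior-year value implied by a current value and its year-on-year growth."""
    return current / (1 + yoy_growth_pct / 100)


prior_year_floor = implied_prior_year(SALES_2025_FLOOR_B_YUAN, YOY_GROWTH_PCT)
increase_floor = SALES_2025_FLOOR_B_YUAN - prior_year_floor

print(f"Implied 2024 sales: at least {prior_year_floor:.2f} billion yuan")
print(f"Implied increase: at least {increase_floor:.2f} billion yuan")
```

Dividing by one plus the growth rate, rather than subtracting 7.68% of the 2025 figure, is what recovers the earlier year's base correctly.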