Multimodal Large Models
Discrete Tokenization: A Key Cornerstone of Multimodal Large Models, First Systematic Survey Released
机器之心· 2025-08-05 18:56
Core Insights
- The article discusses advances in Discrete Tokenization for Multimodal Large Language Models (LLMs), emphasizing its role in transforming various modalities into discrete representations that LLMs can process effectively [2][39].
- A comprehensive survey has been released, detailing the technical landscape, challenges, and future research directions in Discrete Tokenization for Multimodal LLMs [2][39].

Multimodal LLMs and Discrete Tokenization
- Recent breakthroughs in Large Language Models (LLMs) have led to their application in a wide range of text tasks, prompting interest in extending their capabilities to non-text modalities such as images, audio, and video [2].
- Discrete Tokenization has emerged as a key solution. It uses techniques such as Vector Quantization (VQ) to compress high-dimensional continuous inputs into compact discrete tokens, enhancing cross-modal understanding and generation [2][39].

Systematic Review and Methodologies
- The article presents the first systematic review of Discrete Tokenization for Multimodal LLMs, organizing content by input data modality and modality combination, from early single-modal to multi-modal tokenization methods [2][39].
- Eight core categories of Vector Quantization methods are identified: VQ, RVQ, PQ, AQ, FSQ, LFQ, BSQ, and Graph Anchor-Relation Tokenization, each with characteristics suited to different modalities and tasks [8][9][14].

Challenges and Future Directions
- Key challenges in Discrete Tokenization include codebook collapse, information loss during quantization, difficulties in gradient propagation, and issues with granularity and semantic alignment [12][36].
- Future research may focus on adaptive quantization, unified frameworks, biologically inspired codebooks, cross-modal generalization, and enhanced interpretability [37][36].

Applications in Single and Multimodal Tasks
- Discrete Tokenization has been widely applied in single-modal tasks such as image retrieval, audio encoding, and video representation, allowing LLMs to process non-text modalities effectively [20][22].
- In multimodal tasks, it serves as a semantic bridge, enabling models to handle complex inputs across different modalities and facilitating tasks such as cross-modal retrieval and generation [27][30].
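
To make the quantization step concrete, here is a minimal sketch of plain VQ (nearest-codebook lookup) and RVQ (quantizing the leftover residual with a second codebook), two of the method categories named above. The codebooks, the input vector, and all function names are made-up toy values for illustration and are not taken from the survey; real tokenizers learn their codebooks and operate on encoder outputs.

```python
# Toy sketch of VQ and residual VQ (RVQ) in pure Python.
# All codebooks and inputs below are hypothetical illustration values.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(vector, codebook):
    """VQ: return the index (the discrete token) of the nearest codebook entry."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))

def residual_quantize(vector, codebooks):
    """RVQ: quantize, subtract the chosen code, then quantize what is left."""
    residual = list(vector)
    tokens = []
    for codebook in codebooks:
        index = quantize(residual, codebook)
        tokens.append(index)
        residual = [r - c for r, c in zip(residual, codebook[index])]
    return tokens, residual

def reconstruct(tokens, codebooks):
    """Sum the selected code vectors to approximate the original input."""
    out = [0.0] * len(codebooks[0][0])
    for index, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[index])]
    return out

coarse = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]   # first-stage codebook
fine = [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]]   # second-stage codebook

x = [1.3, 0.9]
vq_token = quantize(x, coarse)
rvq_tokens, leftover = residual_quantize(x, [coarse, fine])
approx = reconstruct(rvq_tokens, [coarse, fine])

print("VQ token:", vq_token)
print("RVQ tokens:", rvq_tokens)
print("RVQ reconstruction:", approx)
```

In this toy case the second stage lowers the reconstruction error relative to single-stage VQ, which illustrates the information-loss trade-off listed among the challenges: more stages or larger codebooks preserve more detail at the cost of more tokens.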
Heavy R&D Spending to "Embrace" the AI Era: Security Leader Hikvision's Market Value Heads Toward 300 Billion Yuan
Mei Ri Jing Ji Xin Wen· 2025-08-03 07:41
Core Viewpoint
- Hikvision performed strongly in the first half of 2025, with growth in both revenue and net profit, indicating a successful transition toward AI and IoT solutions [1][3][6].

Financial Performance
- In the first half of 2025, Hikvision achieved revenue of 41.818 billion yuan, a year-on-year increase of 1.48% [1][3].
- Net profit attributable to shareholders was 5.657 billion yuan, a year-on-year increase of 11.71% [1][3].
- Operating cash flow improved from -190 million yuan in the same period last year to 5.34 billion yuan, a 2917.5% increase [3].

Business Structure
- The traditional security business remains the core, but the innovative businesses have emerged as a "second growth curve," contributing 11.766 billion yuan in revenue, a 13.92% increase, and accounting for 28.14% of total revenue [3].
- Key innovative segments include Hikrobot, Ezviz, Hikvision Automotive Electronics, and Hikmicro (海康微影), which have established leading positions in their respective fields [3].

Strategic Transition
- Hikvision is transitioning from a "security equipment leader" to an "AIoT solution provider," with a focus on leveraging AI breakthroughs for business growth [1][6].
- The company has invested over 50 billion yuan in R&D since 2020, with R&D expenses accounting for 13.56% of revenue in the first half of 2025 [6][8].

Market Challenges
- The traditional security business faces shrinking market demand and increased government fiscal pressure, leading to a decline in the domestic revenue contribution [4].
- Internationally, Hikvision's business has been affected by its placement on the U.S. entity list and by restrictions in key markets such as Canada, although the overall revenue impact remains limited [5].

AI Innovations
- Hikvision has launched hundreds of AI model products across sectors including industrial manufacturing and traffic management, enhancing operational efficiency and safety [7][8].
- The company's AI innovations are seen as a key driver of its market valuation, with market capitalization approaching 300 billion yuan [8].
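
As a quick sanity check on the percentages quoted above, the arithmetic can be reproduced from the rounded figures in the summary. This is only a sketch: the variable names are illustrative, and it assumes the year-on-year cash flow change is measured against the absolute value of the prior-period figure, which the summary does not state.

```python
# Figures as quoted in the summary, in billions of yuan.
total_revenue = 41.818
innovative_revenue = 11.766
cash_flow_prior = -0.19      # "-190 million yuan", rounded in the summary
cash_flow_current = 5.34

# Share of total revenue contributed by the innovative businesses.
share_pct = innovative_revenue / total_revenue * 100

# Assumed convention: change relative to the absolute prior-period value.
cash_flow_change_pct = (cash_flow_current - cash_flow_prior) / abs(cash_flow_prior) * 100

print(f"Innovative business share: {share_pct:.2f}%")
print(f"Operating cash flow change: {cash_flow_change_pct:.1f}%")
```

The revenue share matches the quoted 28.14%. The cash flow change computed from the rounded inputs comes out near 2910%, close to the quoted 2917.5%; the small gap is plausibly due to rounding in the summary's inputs, though that is an assumption.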
Interview with Luo Jianlan of AgiBot (智元机器人): Data Collection, Simulation, Scenarios, and Engineering in Embodied Intelligence
自动驾驶之心· 2025-08-01 16:03
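具身智能之心 was invited to the "智启具身" forum at WAIC 2025 and had the opportunity to interview Dr. Luo Jianlan, chief scientist of AgiBot. Below are the questions Dr. Luo highlighted and discussed during the interview.

Discussion on embodied intelligence data

1. Everyone knows that data is the fuel for improving intelligence, and that sensors are the key to collecting data. What are AgiBot's plans for sensor R&D and procurement? How will you make product data more usable?

Luo Jianlan: We are already working with several sensor suppliers, focusing on the joint development of visuo-tactile and high-density sensors. At the same time, we are building a cross-platform data collection API that maps task semantics in a unified way and provides standardized, trainable data input for model training.

2. You said earlier that world models are quite useful, and that once a world model is in place, adding some collected data can improve it. After that step, how far is it from real application? What hurdles remain between collecting the data and deploying it?

Luo Jianlan: Performance is still a hurdle. A robot has to perform at a very high level to become truly useful. In your home, whether it is a robot that sweeps the floor or one that loads the dishwasher, it needs a 95% success rate across 1 million households, and that is a very hard problem.

3. Sergey Levine recently published an article proposing the "Sporks of AGI" view, that simulation will hinder the scaling of embodied intelligence. I would like to know ...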
From Figma to the Global Rise of Chinese Vertical Applications
格隆汇APP· 2025-08-01 05:27
Group 1
- Figma is revolutionizing design productivity, targeting a $33 billion full-process product development ecosystem, starting from a $2.2 billion front-end design software market [2].
- Figma's core product leverages lightweight design, community proliferation, and collaborative work to gain traction in the global design tools market [2].
- The company is integrating AI programming capabilities into collaborative platforms, aiming for a future of "no-code development" [4].

Group 2
- The global AI application landscape is on the verge of a breakthrough, with multi-modal large language models (MLLM) emerging as a key evolution point [5][6].
- Multi-modal applications are proving to monetize better than pure text products, with companies such as OpenAI and Anthropic achieving significant annual recurring revenue (ARR) [7].
- Midjourney and Runway are examples of companies successfully monetizing multi-modal capabilities, with Midjourney generating $500 million annually and Runway exceeding one million paid users [7].

Group 3
- Chinese companies are leading in video generation within multi-modal applications, with firms such as Meitu, Kuaishou, and Ruqi Software achieving over $100 million in annual revenue [8].
- Meitu's AI design tool has reached 25% market penetration in Southeast Asian e-commerce, while Kuaishou's video generation tool reached an ARR of over $100 million within 10 months [8].

Group 4
- Technology exports offer premium opportunities, as overseas users show a higher willingness to pay for AI services than domestic users [9].
- Figma's comprehensive coverage of the design process creates an ecosystem advantage, while domestic companies need to establish dual barriers in vertical fields [10].
- The Chinese government is supporting AI application development through initiatives such as the "Digital China Construction 2025 Action Plan" [10].

Group 5
- The rise of Figma and multi-modal large models signifies a paradigm shift in productivity tools, requiring both foundational architecture innovation and deep dissection of vertical scenarios [12].
- Companies that can convert technological advantages into global market share are expected to emerge as new commercial legends in the AI landscape [12].
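Kuang Ziping (邝子平) in Conversation with Yin Qi (印奇): Only a Closed-Loop Business Model Can Keep Driving Technological Progress, and Hardware Opportunities in the AI Era Are Huge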
IPO早知道· 2025-08-01 04:12
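Qiming Venture Partners (启明创投) hopes that by hosting a distinctive sub-forum it can bring good, useful information to the field and to the AI industry as a whole.

This article is original content from IPO早知道. Author: Stone Jin. WeChat official account: ipozaozhidao

According to IPO早知道, the 2025 World Artificial Intelligence Conference (WAIC) "Qiming Venture Partners Entrepreneurship and Investment Forum: Venture Investment Opens a Resonance Cycle of AI Technology and Applications," hosted by Qiming Venture Partners, was held on July 28 in the Blue Hall of the Shanghai Expo Center.

In the dialogue session, Yin Qi, chairman of Qianli Technology (千里科技), and Kuang Ziping held a themed conversation titled "The Evolution of 'AI + Devices': Large Models Empowering Device Evolution and Industry Restructuring."

In fact, Qiming Venture Partners was one of the earliest institutional investors in Megvii (旷视科技), and Kuang Ziping and Yin Qi have known each other through AI for twelve or thirteen years.

In this conversation, Yin Qi shared the two core trends he expects to see in AI devices, as well as reflections on his two entrepreneurial experiences.

Excerpts from the conversation follow:

The next 3 years will be a very interesting 3 years for AI + devices

Kuang Ziping: We met because of AI, and today we are on stage together again because of AI. More than ten years have gone by in a flash. You are now chairman of Qianli Technology and are also doing many other things in AI. How about telling us what you have been busy with lately?

In his welcome speech, Kuang Ziping, founding managing partner of Qiming Venture Partners, said that as one of the earliest and most broadly invested in the AI field in China ...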
The Global Transformation of the Communication Cloud Industry, Driven by AI
Ai Rui Zi Xun· 2025-07-30 01:18
Investment Rating
- The report indicates a cautious outlook for the global internet communication cloud market, with a projected market size of approximately $6.8 billion in 2024 and a new growth phase anticipated in the next 2-3 years [3][15].

Core Insights
- The development of AI is transforming the communication cloud industry into key infrastructure for human and machine interaction, driven by the need for reliability, real-time communication, and multi-modal capabilities [10][11].
- Developer demand is increasingly focused on security, intelligence, and openness, with a shift from basic communication services to AI-enabled solutions [6][25].
- The report highlights the mutual empowerment of AI and communication, suggesting that the two will evolve together to enhance interaction methods and application scenarios [10][11].

Summary by Sections

01 New Infrastructure for the AI Era
- The report emphasizes the significance of the internet communication cloud as foundational infrastructure in the AI era, facilitating immersive AI interactions and meeting the demand for reliable, real-time communication [10][11].

02 Technology Evolution of the Internet Communication Cloud
- Technology evolution in the communication cloud sector is marked by a focus on security upgrades and compliance with data privacy regulations, which are becoming essential for global market entry [30][31].

03 Competitive Landscape and Representative Companies
- The competitive landscape is shifting toward comprehensive AI capabilities, with top players focusing on integrating AI with communication services to enhance user experience and meet compliance requirements [59][64].

04 Development Trends and Outlook
- The integration of GenAI is expected to drive the development of multi-modal interactions, with communication cloud vendors optimizing transmission quality to serve new application scenarios [5][51].
2025: The Global Transformation of the Communication Cloud Industry, Driven by AI
艾瑞咨询· 2025-07-28 09:04
Core Insights
- The global internet communication cloud market is projected to reach approximately $6.8 billion in 2024, with a new growth cycle expected in the next 2-3 years, driven by AI applications [1][7].
- AI and communication are mutually empowering, transforming communication infrastructure into immersive AI interaction platforms [4][40].

Market Overview
- The global internet communication cloud market is expected to grow to $6.8 billion in 2024, with a slowdown in growth due to the maturity of AI application scenarios and macroeconomic challenges [7][11].
- The current penetration rate of AI in the cloud communication market is around 15%, with room for growth in new application scenarios such as AI companionship and customer service [7][36].

Technological Focus
- Developers increasingly demand security, intelligence, and openness in communication cloud services, driven by regulatory requirements and the need for data privacy [2][14].
- Communication cloud services are shifting from basic information transmission to AI interaction hubs, focusing on scenario-based empowerment and data value extraction [2][24].

Development Trends
- The integration of GenAI is driving the convergence of text, voice, and video interactions, prompting communication cloud providers to improve transmission quality for new use cases [3][43].
- Future competition will center on "multimodal large models × scenario-based services," reshaping human-computer interaction paradigms [3][40].

Domestic Market Characteristics
- The Chinese internet application market is entering a phase of refined operations, with enterprises focusing on enhancing product competitiveness through stable and reliable communication services [11][36].
- Although potential blockbuster AI applications are being explored, the market remains dominated by "model as application" approaches without significant breakthroughs [11][36].

International Market Characteristics
- Global demand for communication cloud services is converging on security, intelligence, and openness, influenced by regional policy environments and user behaviors [14][19].
- In mature markets such as Europe and North America, data privacy and compliance are top priorities, while emerging markets focus on localized adaptations and innovative scenarios [14][19].

Security Upgrades
- Over 82% of countries are establishing or strengthening data privacy regulations, making compliance a cornerstone of global market entry [17][19].
- Demand for self-controlled communication platforms is rising due to geopolitical tensions, necessitating a focus on data security and compliance with local laws [19][22].

Smart Upgrades
- Communication cloud providers are concentrating on core communication capabilities while integrating third-party AI models to meet customer demand for generative AI capabilities [24][26].
- The transition from auxiliary tools to immersive human-computer interaction is underway, with initial breakthroughs expected in scenarios that place lower demands on accuracy and real-time response [26][29].

Open Upgrades
- The openness of communication cloud platforms is reflected in both the product and ecosystem dimensions, enabling developers to customize functionality and improve efficiency [29][33].
- As businesses globalize, cross-platform compatibility will become a critical consideration for developers, requiring stable communication functions across devices and systems [29][36].

Industry Trends
- The integration of large models and security technologies is becoming a key focus for communication cloud providers, strengthening their capabilities in a competitive landscape [33][40].
- The future of communication cloud services will involve leveraging multimodal large models and wearable hardware to create new interaction paradigms and maximize data value [43][45].
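"Six Little AI Tigers" Battle Escalates: StepFun (阶跃星辰) Sprints Toward 1 Billion Yuan in Revenue as Large Models Enter an Era of Commercial Competition | Focus on WAIC 2025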
Hua Xia Shi Bao· 2025-07-28 04:19
Core Viewpoint
- The company aims to achieve an annual revenue target of 1 billion yuan, the highest among the "AI Six Tigers" so far, despite not yet reaching profitability [2][3].

Group 1: Revenue and Business Model
- The company signed contracts worth several hundred million yuan in the first half of the year, indicating strong revenue potential [3].
- Revenue comes primarily from applying large models on end devices in key sectors such as automotive, mobile phones, and IoT devices, with significant partnerships established [3].
- The company has collaborated with over half of the leading domestic smartphone manufacturers and has launched an AI smart cockpit in partnership with Geely [3].

Group 2: Model Development and Technology
- The newly released Step 3 model emphasizes generality and multi-modal capabilities, allowing for better adaptability across applications [4][6].
- The Step 3 model has achieved a performance efficiency of up to 300% on domestic chips compared to competitors, showcasing cost optimization efforts [7].
- The company has formed the "MoCore Ecological Innovation Alliance" with nearly 10 chip and infrastructure manufacturers to strengthen the integration of chips, models, and platforms [7].

Group 3: Funding and Future Plans
- The company is seeking new funding, with Shanghai State-owned Capital Investment Co., Ltd. participating in its latest financing round [4].
- There are no immediate plans for an IPO, and only one of the "AI Six Tigers" has initiated the process [5].
- The company remains open to using various chip technologies, including NVIDIA, to ensure competitive performance in model development [8][9].
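About 80% of Medical Institutions Worldwide Are Deploying or Planning to Deploy Generative AI Tools as AI Reshapes the Entire Healthcare Industry Chain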
Group 1
- The core viewpoint of the articles is that artificial intelligence (AI) is fundamentally reshaping the global healthcare industry, with approximately 80% of medical institutions deploying or planning to implement generative AI tools [2][3].
- AI is becoming the core engine driving leapfrog development in the healthcare sector, enabling new applications in clinical diagnosis, drug and device development, and hospital management [1][2].
- The integration of AI technologies into healthcare is leading to a new paradigm characterized by intelligent, precise, and personalized medicine [1].

Group 2
- The rapid development of AI technology is profoundly reshaping the entire healthcare industry chain, with significant advances from research labs to clinical applications and hospital management systems [2].
- Data barriers, regulatory ethics, and technical standards are emerging as major obstacles to the development of AI in healthcare [3].
- Trust issues and the "black box" nature of algorithms are identified as the biggest barriers to applying AI in healthcare, necessitating transparent and inclusive systems [3].
Summit Dialogue with "Godfather of AI" Hinton: Countries Should Extensively Research and Share Techniques for Making AI Kind
Core Insights
- The dialogue between Geoffrey Hinton and Zhou Bowen at the World Artificial Intelligence Conference highlighted advances in AI, particularly multimodal models and their potential consciousness [1][4][5].
- Hinton emphasized the importance of training AI to be both intelligent and kind, suggesting that different techniques are required for each [6][7].

Group 1: AI Consciousness and Learning
- Hinton argues that current multimodal chatbots possess a form of consciousness, challenging traditional definitions of subjective experience [4][5].
- He believes that intelligent agents can learn from their own experiences, potentially acquiring knowledge beyond human capabilities [6][7].

Group 2: Training AI for Kindness
- Hinton suggests that while it is possible to develop AI that is both smart and kind, the methodologies for achieving these traits differ significantly [6][7].
- He advocates international collaboration in sharing techniques that promote AI kindness, even if countries are reluctant to share methods for enhancing intelligence [6][7].

Group 3: Advice for Young Scientists
- Hinton encourages young researchers to explore areas where "everyone is wrong," as this can lead to significant breakthroughs [2][10].
- He stresses the importance of perseverance in pursuing new ideas, even in the face of skepticism from mentors [2][10].

Group 4: AI's Role in Scientific Advancement
- Hinton acknowledges the clear benefits of AI in scientific research, citing examples such as protein folding and weather prediction, where AI has outperformed traditional methods [8][9].
- He believes that AI will continue to drive progress across scientific fields by enhancing predictive capabilities [8][9].