Multimodal Large Language Models
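Launching a financial trading AI Agent that watches the market intelligently around the clock, this Singapore financial company raises $10 million | 早起看早期
36Kr · 2025-05-12 23:56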
Core Viewpoint - RockFlow, a Singapore-based AI fintech company, has completed a $10 million A1 funding round to enhance its AI technology and launch its financial AI agent, Bobby [3][4].
Group 1: Company Overview
- RockFlow operates five offices globally and covers over 30 countries in nine languages, previously receiving tens of millions in investments from top-tier Silicon Valley funds [4].
- The company launched TradeGPT, the world's first trading AI product, in April 2023, which utilizes multimodal LLM capabilities to analyze vast market information and price-volume data [4].
Group 2: Product Development
- RockFlow is developing an AI agent architecture tailored for financial investment scenarios, leveraging cutting-edge technologies such as multimodal large language models (LLM), Fin-Tuning, RAG, Multi-Agent, and CoT [4][5].
- The AI agent aims to enhance understanding and generation capabilities, efficiently process multi-source data, and provide precise financial analysis and investment recommendations [4][5].
Group 3: Investment Process
- In investment trading scenarios, RockFlow's AI agent simplifies traditional complex processes into four core steps: real-time information acquisition, analysis, trading strategy construction, and order execution [5].
- The AI agent monitors market dynamics and analyzes extensive data, including financial metrics and social media sentiment, to present personalized real-time trading opportunities [5][6].
Group 4: User Interaction
- Users can express their needs in natural language, allowing the AI agent to generate personalized investment configurations and trading strategies based on their profit goals and risk preferences [6].
- The AI agent can also create complex conditional orders and automate investment tasks, assisting users in managing profits and losses effectively [6].
Group 5: Future Outlook
- Bobby, the financial AI agent product, is set to launch globally soon, with a team comprising experts from AI, financial mathematics, and investment trading [6].
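The four steps and the conditional orders described in the RockFlow summary can be made concrete with a small sketch. This is an illustration only, not RockFlow's implementation: the `ConditionalOrder` class, the `run_pipeline` function, the ticker `XYZ`, and the in-memory price list are all hypothetical, and order execution is simulated.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ConditionalOrder:
    """Hypothetical order that fires when the price crosses a trigger level."""
    symbol: str
    side: str             # "buy" or "sell"
    quantity: int
    trigger_price: float
    direction: str        # "above" or "below"

    def is_triggered(self, last_price: float) -> bool:
        if self.direction == "above":
            return last_price >= self.trigger_price
        return last_price <= self.trigger_price


def run_pipeline(quotes: Iterable[float], order: ConditionalOrder) -> Optional[dict]:
    """Walk the four steps over an in-memory quote stream (assumed data source)."""
    for price in quotes:                        # 1. real-time information acquisition
        triggered = order.is_triggered(price)   # 2. analysis of the new data point
        if triggered:                           # 3. strategy: act once the condition holds
            return {                            # 4. order execution (simulated fill)
                "symbol": order.symbol,
                "side": order.side,
                "quantity": order.quantity,
                "fill_price": price,
            }
    return None


# A stop-loss style order: sell 10 shares of the made-up ticker XYZ at or below 95.
stop_loss = ConditionalOrder("XYZ", "sell", 10, 95.0, "below")
fill = run_pipeline([101.0, 99.5, 94.8, 96.0], stop_loss)
```

Here `fill` records a simulated sale at 94.8, the first quote at or below the trigger. A real agent would replace the price list with a live feed and the returned dictionary with a broker call.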
Li Auto's MCAF reshapes the visual cognition paradigm for assisted driving
Li Auto TOP2 (理想TOP2) · 2025-04-25 12:43
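The following article is from AcademicDaily, written by AcademicDaily. AcademicDaily is a technical exchange platform that tracks, recommends, and interprets large-model and other AI results, dedicated to disseminating and sharing frontier technology.
Inside Li Auto, MCAF is called the "third eye" of autonomous driving. It is compatible with Li Auto's in-house Mind GPT-3o and BEV large models, with no retraining required. MCAF is a multimodal coarse-to-fine attention focusing framework whose core purpose is to resolve the key bottleneck in long-video understanding.
Current video understanding has significant shortcomings in handling long videos (>5 minutes): mainstream methods (such as Video-MLLM) rely on global compression or uniform sampling, which causes loss of detail and redundant computation. MCAF targets this problem directly, using multimodal hierarchical attention and a temporal expansion mechanism to balance information retention against computational efficiency; this is its core value.
On the Video-MME dataset, whose videos average 60 minutes in length, MCAF outperforms other agent-based methods (such as VideoTree and DrVideo) by about 3-5 percentage points. Unlike VideoTree and similar methods, which need an additional reward model to assess confidence, MCAF uses a single LLM to complete the generate-evaluate-adjust loop. This simplifies the architecture (for example, the code implementation needs only one LLM interface) and avoids the compatibility problems of coordinating multiple models, making it better suited to real deployment. However, on NEx ...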
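The coarse-to-fine focusing and the single-LLM generate-evaluate-adjust loop described in the MCAF excerpt can be sketched roughly as follows. This is a toy illustration under stated assumptions, not MCAF's actual algorithm: per-frame relevance scores are assumed to be given, `focus_frames` and `answer_with_reflection` are hypothetical names, and the `llm` argument is any callable that returns an answer together with a self-assessed confidence.

```python
from typing import Callable, List, Tuple


def focus_frames(scores: List[float], segment_len: int, top_k: int, expand: int) -> List[int]:
    """Coarse step: rank fixed-length segments by mean relevance and keep top_k.
    Fine step: widen each kept segment by `expand` neighbours on both sides
    (a stand-in for temporal expansion) and return every frame index inside."""
    n_seg = (len(scores) + segment_len - 1) // segment_len
    seg_scores = []
    for s in range(n_seg):
        chunk = scores[s * segment_len:(s + 1) * segment_len]
        seg_scores.append(sum(chunk) / len(chunk))
    ranked = sorted(range(n_seg), key=lambda s: seg_scores[s], reverse=True)[:top_k]
    keep = set()
    for s in ranked:
        keep.update(range(max(0, s - expand), min(n_seg, s + expand + 1)))
    frames: List[int] = []
    for s in sorted(keep):
        frames.extend(range(s * segment_len, min(len(scores), (s + 1) * segment_len)))
    return frames


def answer_with_reflection(
    scores: List[float],
    llm: Callable[[List[int]], Tuple[str, float]],
    segment_len: int = 4,
    max_rounds: int = 3,
    threshold: float = 0.8,
) -> Tuple[str, List[int]]:
    """One callable generates the answer and reports confidence (evaluate);
    low confidence widens the temporal window for the next round (adjust)."""
    answer, frames = "", []
    for expand in range(max_rounds):
        frames = focus_frames(scores, segment_len, top_k=1, expand=expand)
        answer, confidence = llm(frames)
        if confidence >= threshold:
            break
    return answer, frames


# Stub model: it is only confident once frame 9 is among the frames it sees.
def stub_llm(frames: List[int]) -> Tuple[str, float]:
    return ("found", 0.9) if 9 in frames else ("unsure", 0.2)


relevance = [0.1] * 4 + [0.9] * 4 + [0.1] * 8   # 16 frames, segment 1 looks most relevant
answer, used_frames = answer_with_reflection(relevance, stub_llm)
```

In this run the first round looks only at frames 4 to 7, the stub reports low confidence, and the second round widens the window to frames 0 to 11, which contains frame 9. No separate reward model is involved, which is the point the excerpt makes about the single-LLM loop.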
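10x throughput gain with no performance loss: a KV cache quantization strategy for multimodal models arrives, plug-and-play with no changes to the original model
QbitAI · 2025-04-03 02:12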
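Contributed by the CalibQuant team. QbitAI | WeChat official account QbitAI
A 10x throughput improvement on InternVL-2.5, with almost no loss in model performance. CalibQuant, the latest 1-bit KV cache quantization scheme for multimodal large models, is here. By combining post-scaling and calibration methods, it significantly reduces GPU memory and compute costs and can be used directly without modifying the original model.
Plug-and-play, seamless integration
Multimodal large language models have shown excellent performance across a wide range of applications. However, their computational overhead at deployment remains a key bottleneck. The KV cache improves inference efficiency to some extent by trading GPU memory for computation, but as the KV cache grows, memory usage keeps rising and throughput becomes severely limited.
To address this challenge, the authors propose CalibQuant, a simple yet efficient visual KV cache quantization strategy that greatly reduces memory and compute overhead. Specifically, CalibQuant introduces an extreme 1-bit quantization scheme and applies post-scaling and calibration techniques designed for the intrinsic patterns of the visual KV cache, preserving efficiency without sacrificing model performance.
Using Triton for runtime optimization, the authors achieve a 10x throughput improvement on the InternVL-2.5 model. The method is plug-and-play and can be seamlessly integrated into various existing multi ...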
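To make the idea of 1-bit KV cache quantization with post-scaling more tangible, here is a minimal pure-Python sketch. It is not the CalibQuant implementation: the per-channel min/max scheme, the function names, and the toy numbers are assumptions, the calibration step is omitted, and a real kernel would pack the bits and run on the GPU (the excerpt mentions Triton). The sketch only shows why the scale can be applied after the dot product instead of dequantizing every cached key first.

```python
from typing import List, Tuple


def quantize_1bit(keys: List[List[float]]) -> Tuple[List[List[int]], List[float], List[float]]:
    """Per-channel 1-bit quantization (assumed scheme): every value becomes 0 or 1.
    Returns the bit matrix, the channel offsets (minima), and the channel scales (ranges)."""
    dims = len(keys[0])
    lo = [min(row[j] for row in keys) for j in range(dims)]
    hi = [max(row[j] for row in keys) for j in range(dims)]
    scale = [hi[j] - lo[j] for j in range(dims)]
    bits = [
        [1 if scale[j] > 0 and (row[j] - lo[j]) >= scale[j] / 2 else 0 for j in range(dims)]
        for row in keys
    ]
    return bits, lo, scale


def scores_dequant_first(query, bits, lo, scale) -> List[float]:
    """Baseline: rebuild every key in floating point, then take the dot product."""
    out = []
    for row in bits:
        key = [lo[j] + scale[j] * row[j] for j in range(len(query))]
        out.append(sum(q * k for q, k in zip(query, key)))
    return out


def scores_post_scaled(query, bits, lo, scale) -> List[float]:
    """Post-scaling: fold offset and scale into the query once, then only
    accumulate over the set bits of each cached key."""
    bias = sum(q * z for q, z in zip(query, lo))        # computed once per query
    scaled_q = [q * s for q, s in zip(query, scale)]    # computed once per query
    return [bias + sum(sq for sq, b in zip(scaled_q, row) if b) for row in bits]


keys = [[0.2, -1.0, 3.0], [0.9, 0.5, 1.0], [0.1, 0.4, 2.8]]
query = [1.0, -2.0, 0.5]
bits, lo, scale = quantize_1bit(keys)
baseline = scores_dequant_first(query, bits, lo, scale)
fast = scores_post_scaled(query, bits, lo, scale)
```

Both functions produce the same attention scores up to floating-point rounding, because q · (z + s ⊙ b) = q · z + (q ⊙ s) · b. The second form never materializes full-precision keys, which is the kind of saving the excerpt attributes to post-scaling.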
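A breakthrough in long-video understanding! Mamba hybrid architecture halves GPU memory consumption and handles 100,000 video tokens with ease
QbitAI · 2025-03-27 04:16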
Core Viewpoint - The article introduces the Vamba model, a hybrid Mamba-Transformer model designed for efficient understanding of long videos, significantly improving processing efficiency without compressing video tokens [1][10].
Group 1: Model Design and Efficiency
- Vamba improves the efficiency of processing video tokens during training and inference by redesigning the model architecture rather than compressing video tokens [1][4].
- The model can process four times more video frames under the same hardware conditions compared to traditional Transformer architectures, with over 50% reduction in training memory consumption and doubled training speed [4][9].
- Vamba retains the original spatiotemporal features of videos, avoiding information loss that occurs with traditional downsampling or pooling methods [5][10].
Group 2: Technical Innovations
- The core design of Vamba involves breaking down the costly causal self-attention operations into two more efficient components: cross-attention for text tokens and a state space model (SSM) based Mamba-2 module for video tokens [6][7].
- The Mamba-2 module reduces the computational complexity from quadratic to linear, allowing for effective processing of long video sequences [7][9].
- Vamba's architecture allows for efficient alignment of text and video information, enhancing the model's ability to analyze video content based on user queries [9][10].
Group 3: Performance Evaluation
- Extensive experiments show that Vamba outperforms existing efficient long video understanding models by approximately 4.3% on the LVBench benchmark [5][10].
- The model demonstrates superior performance across various video duration benchmarks, showcasing its competitive edge in long, medium, and short video understanding tasks [10].
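The move from quadratic to linear cost in the Vamba summary can be illustrated with a rough counting model. This is a back-of-the-envelope sketch, not a measurement of Vamba: the functions count token-to-token interactions only, ignore hidden size, state size, and constant factors, and the text-to-text attention term in the hybrid count is an assumption.

```python
def full_attention_cost(video_tokens: int, text_tokens: int) -> int:
    """Self-attention over the joint sequence: every token attends to every token."""
    total = video_tokens + text_tokens
    return total * total


def hybrid_cost(video_tokens: int, text_tokens: int) -> int:
    """Hybrid layout as described in the summary (simplified):
    text tokens cross-attend to video tokens, and video tokens pass
    through a linear-time state space scan."""
    cross_attention = text_tokens * video_tokens   # text queries over video keys
    text_attention = text_tokens * text_tokens     # text-to-text attention (assumed)
    ssm_scan = video_tokens                        # one recurrent update per video token
    return cross_attention + text_attention + ssm_scan


video, text = 100_000, 100
full = full_attention_cost(video, text)
hybrid = hybrid_cost(video, text)
full_growth = full_attention_cost(2 * video, text) / full
hybrid_growth = hybrid_cost(2 * video, text) / hybrid
```

Under this model, doubling the number of video tokens roughly quadruples the full-attention count but only about doubles the hybrid count, since every hybrid term that involves video tokens is linear in their number.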