AI Medical Imaging: How to Break Out of the Data "Siege"
Jing Ji Guan Cha Wang· 2025-12-08 07:06
Medical imaging (X-ray, CT, MRI, ultrasound, etc.) refers to the use of various imaging technologies to render the internal structures or tissues of the human body in visual form, and it plays an important role in the diagnosis, treatment, and monitoring of disease.

By Liu Jin, Duan Lei, and Li Jiaxin

Recently, the General Office of the National Health Commission and four other departments issued the "Implementation Opinions on Promoting and Regulating the Development of 'AI + Healthcare' Applications," which sets a timetable for "AI + Healthcare": by 2030, intelligent assistive applications in primary-care diagnosis and treatment should achieve essentially full coverage; hospitals at or above the secondary level should broadly deploy AI technologies such as intelligent assisted diagnosis of medical images and intelligent assisted clinical decision-making; the standards and norms system for "AI + Healthcare" applications should be largely complete; and a group of world-leading bases for technological innovation and talent cultivation should be established.

China's drive to make medical imaging intelligent is indeed accelerating, with intelligent image-diagnosis services being rolled out as a new path to strengthening primary-care service capacity.

Because medical imaging was digitized early and its data structures are relatively standardized, it lends itself to computer-vision processing. As early as the 1990s, the industry began combining medical imaging with computer-aided diagnosis; later, deep learning techniques exemplified by convolutional neural networks (CNNs) achieved major breakthroughs in image recognition. Since around 2017, research, clinical trials, and real-world applications of AI in medical imaging have developed rapidly, making it one of the earliest scenarios in which AI reached large-scale deployment across industries.

At present, the AI medical-imaging industry's ...
Google Unveils a Transformer Killer, Its First Major Breakthrough in 8 Years, as Its Chief Draws an AGI Deadline
36Kr· 2025-12-08 01:01
Core Insights
- Google DeepMind CEO Hassabis predicts that Artificial General Intelligence (AGI) will be achieved by 2030, but emphasizes the need for 1-2 more breakthroughs akin to the Transformer and AlphaGo before this can happen [11][4][16].

Group 1: AGI Predictions and Challenges
- Hassabis stresses the importance of scaling existing AI systems, which he believes will be critical components of the eventual AGI [3].
- He acknowledges that the path to AGI will not be smooth, citing risks associated with malicious use of AI and potentially catastrophic consequences [13].
- The timeline for achieving AGI is estimated at 5 to 10 years, with a high bar set for what counts as a "general" AI system: comprehensive, human-like cognitive abilities [16][18].

Group 2: Titans Architecture
- Google introduced the Titans architecture at the NeurIPS 2025 conference, positioning it as the strongest successor to the Transformer [6][21].
- Titans combines the rapid response of recurrent neural networks (RNNs) with the strong performance of Transformers, maintaining high recall and accuracy even at 2 million tokens of context [7][8].
- The architecture dynamically updates a core memory during operation, improving the model's ability to process long contexts efficiently (a hedged sketch of this memory update follows this summary) [22][43].

Group 3: MIRAS Framework
- The MIRAS framework is introduced as the theoretical blueprint underpinning the Titans architecture, organized around memory architecture, attentional bias, retention gates, and memory algorithms [36][39].
- The framework aims to balance the integration of new information against the retention of existing knowledge, addressing the limitations of traditional models [39][40].

Group 4: Performance Metrics
- Titans has demonstrated superior performance on long-context reasoning tasks, outperforming all baseline models, including GPT-4, on the BABILong benchmark [43].
- The architecture is designed to scale effectively beyond 2 million tokens, showcasing its capacity for handling extensive data [43].

Group 5: Future Implications
- The advances in Titans, and the possibility that Gemini 4 will adopt the architecture, suggest a significant leap in AI capabilities that could accelerate the arrival of AGI [45][48].
- The integration of multi-modal capabilities and the emergence of "meta-cognition" in Gemini point to a promising direction for future AI development [48].
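To make the memory mechanism in Group 2 concrete: the published Titans paper describes the long-term memory as a small MLP whose weights are the memory, updated at test time by gradient descent on an associative-recall loss, with a momentum term (accumulated "surprise") and a weight-decay-style retention gate. The sketch below is a minimal PyTorch illustration of that update rule under those assumptions; the layer sizes and hyperparameters are invented, and this is not Google's implementation.

```python
# A hedged sketch of a Titans-style test-time memory update (assumption:
# based on the published Titans paper, not Google's production code).
import torch
import torch.nn as nn

class NeuralMemory(nn.Module):
    """Long-term memory: a small MLP whose weights ARE the memory."""
    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
        # One momentum buffer per parameter: the "surprise" accumulator.
        self.momentum = [torch.zeros_like(p) for p in self.net.parameters()]

    @torch.no_grad()
    def read(self, query: torch.Tensor) -> torch.Tensor:
        return self.net(query)

    def write(self, key: torch.Tensor, value: torch.Tensor,
              lr: float = 0.1, beta: float = 0.9, decay: float = 0.01):
        """One inner-loop step: the gradient of the associative loss is the
        'surprise'; momentum carries past surprise; decay is the retention gate."""
        loss = (self.net(key) - value).pow(2).mean()
        grads = torch.autograd.grad(loss, list(self.net.parameters()))
        with torch.no_grad():
            for p, m, g in zip(self.net.parameters(), self.momentum, grads):
                m.mul_(beta).add_(g, alpha=-lr)   # momentum over surprise
                p.mul_(1.0 - decay).add_(m)       # retention gate + update

mem = NeuralMemory(dim=64)
k, v = torch.randn(8, 64), torch.randn(8, 64)
mem.write(k, v)         # store an association at test time
recalled = mem.read(k)  # approximate recall of v
```

Because the memory lives in the weights rather than in a KV cache, its read cost stays constant as context grows, which is the property that lets this family of models claim recall at millions of tokens.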
AI Empowering Asset Allocation (29): A Guide to AI Stock-Price Prediction, with TrendIQ as an Example
Guoxin Securities· 2025-12-03 13:18
Core Insights
- The report emphasizes the growing importance of AI in asset allocation, particularly in stock-price prediction, highlighting the capabilities of AI platforms like TrendIQ in addressing the limitations of traditional machine-learning approaches [3][4][10].

Group 1: AI in Stock Price Prediction
- Large AI models have significantly improved stock-price prediction by effectively collecting and analyzing unstructured information that traditional models struggled with [3][4].
- TrendIQ is presented as a mature financial asset-price prediction platform offering both local and web-based deployment options, catering to different user needs [4][10].
- The report traces the evolution of predictive models from LSTM to more advanced architectures such as Transformers, which handle complex financial data better and improve predictive accuracy [5][10].

Group 2: Model Mechanisms and Limitations
- LSTM has been the preferred model for stock-price prediction thanks to its handling of non-linear, time-series data, but it suffers from single-modality input and weak interpretability [6][7].
- The report outlines combining LSTM with other models, such as XGBoost and deep reinforcement learning, to offset some of LSTM's shortcomings [6][10].
- The Transformer architecture is noted for its global context awareness and its zero-shot and few-shot learning ability, which broadens its applicability in financial prediction [8][10].

Group 3: TrendIQ Implementation
- The report details TrendIQ's implementation, including a complete framework for data preparation, model training, and user interaction through a web application [12][20].
- The training process collects historical stock data, preprocesses it, and trains the LSTM model, so that users can make predictions through a user-friendly interface (a hedged sketch of such a pipeline follows this summary) [12][20].
- The app integrates real-time data fetching and prediction functionality, allowing users to interact with the predictive model [20][28].

Group 4: Future Directions
- The report anticipates that future AI stock prediction will focus on multi-modal integration: visual data from candlestick charts, textual analysis of financial news, and numerical price sequences [39][40].
- Real-time knowledge integration into predictive models is highlighted, suggesting that future models will adapt to new information dynamically, improving robustness and accuracy [40][41].
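TrendIQ's code is not reproduced in this excerpt, so the following is a minimal PyTorch sketch of the generic pipeline Group 3 describes: window historical prices, train an LSTM, predict the next step. The synthetic data, window length, and layer sizes are all illustrative assumptions, not TrendIQ's actual configuration.

```python
# A minimal sketch of an LSTM price-prediction pipeline of the kind the
# report describes (assumption: TrendIQ's real pipeline is not public).
import numpy as np
import torch
import torch.nn as nn

def make_windows(prices: np.ndarray, window: int = 30):
    """Slice a price series into (window, 1) inputs and next-step targets."""
    xs = np.stack([prices[i:i + window] for i in range(len(prices) - window)])
    ys = prices[window:]
    return (torch.tensor(xs, dtype=torch.float32).unsqueeze(-1),
            torch.tensor(ys, dtype=torch.float32))

class PriceLSTM(nn.Module):
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.lstm(x)                      # (batch, window, hidden)
        return self.head(out[:, -1]).squeeze(-1)   # next-step price

# Synthetic stand-in for fetched historical prices (scaled in practice).
prices = np.cumsum(np.random.randn(500)).astype(np.float32)
x, y = make_windows(prices)

model = PriceLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for epoch in range(5):       # short full-batch demo loop
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```

A web front end of the kind the report mentions would simply wrap `model(x)` behind a request handler that fetches recent prices, windows them the same way, and returns the prediction.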
Diffusion Models Took a Ten-Year Detour! Kaiming He's Major New Work JiT: Returning to the True Essence of "Denoising"
自动驾驶之心· 2025-12-01 00:04
This article originally appeared on 深蓝AI (author: 深蓝学院), a learning platform focused on AI, robotics, and autonomous driving. This post is shared for academic purposes only.

In the world of generative AI, diffusion models have become almost a "guarantee of image quality." But are diffusion models really "denoising"? Denoising should mean: give me a corrupted image, and I hand you back the clean image. Yet almost no diffusion model works that way today: they train networks to predict the noise, or to predict some quantity mixing image and noise.

Figure 1 | The figure illustrates the core idea of image generation: natural images typically lie on a low-dimensional manifold, while noise and image-noise mixtures (the noise term and "image minus noise" shown in the figure) scatter through high-dimensional space. Precisely because clean images live in a low-dimensional structure while noise is thoroughly high-dimensional turbulence, predicting the image itself and predicting the noise are, at heart, two entirely different tasks.

1. Throw away the noise and let the model focus on the image

The authors stress a classic assumption in machine learning: natural images lie on a low-dimensional manifold with coherent, regular geometric structure. But noise? Predicting it sounds more like "learning the noise" than "learning the image." ...
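To make the contrast concrete, here is a minimal PyTorch sketch of the two training objectives the passage distinguishes: ε-prediction (the common practice) versus direct clean-image prediction (the "true denoising" the article says JiT returns to). The noise schedule and network below are placeholders, not the paper's actual setup.

```python
# A hedged sketch contrasting noise-prediction with clean-image prediction
# (assumption: schedule and network are placeholders, not JiT's actual setup).
import torch
import torch.nn as nn

# Toy denoiser over flattened 32x32x3 images plus a timestep scalar.
net = nn.Sequential(nn.Linear(3072 + 1, 512), nn.SiLU(), nn.Linear(512, 3072))

def corrupt(x0: torch.Tensor, t: torch.Tensor):
    """Mix clean images with Gaussian noise: x_t = a(t)*x0 + s(t)*eps."""
    a, s = torch.cos(t * torch.pi / 2), torch.sin(t * torch.pi / 2)
    eps = torch.randn_like(x0)
    return a[:, None] * x0 + s[:, None] * eps, eps

x0 = torch.randn(16, 3072)        # stand-in for a batch of clean images
t = torch.rand(16)
xt, eps = corrupt(x0, t)
out = net(torch.cat([xt, t[:, None]], dim=-1))

loss_eps = (out - eps).pow(2).mean()  # epsilon-prediction: target is noise
loss_x0  = (out - x0).pow(2).mean()   # x-prediction: target is the clean image
```

The two losses differ only in their regression target, but per the manifold argument above, one asks the network to model low-dimensional image structure and the other asks it to model high-dimensional noise.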
A Nobel Laureate Born in the 1980s: AlphaFold's Next Step Is Fusion with Large Models
量子位· 2025-11-28 04:11
By 鹭羽 | QbitAI

On the fifth anniversary of AlphaFold's debut, John Jumper, its designer and a Nobel laureate in chemistry for AlphaFold, stated publicly: AlphaFold's next step is fusion with large models. He did not reveal the specific method, though a plan may already exist, or work may even be underway.

Over five years, AlphaFold has helped more than 3 million researchers worldwide, predicted the three-dimensional structures of hundreds of millions of proteins, and influenced more than 500,000 related papers. It can fairly be called another major leap in the life sciences, following the revolutions of quantum mechanics and molecular biology.

After the initial "structure-prediction revolution" and its subsequent turn into a "routine research tool," AlphaFold and its successor technologies are entering a new large-model phase.

AlphaFold + large models: AlphaFold has evolved from pure protein-structure prediction to handling more complex multi-molecule complexes and a wider range of biomolecular interactions, and scientists have achieved numerous breakthroughs on that basis. Even amid today's relentless AI wave, AlphaFold remains the most milestone-worthy landing of AI in the life sciences. As an AI research tool developed by Google DeepMind, AlphaFold can accurately predict the three-dimensional structure of proteins. For example, recently from the University of Missou ...
Google's AI Past: Twenty Hidden Years, and 365 Days at a Sprint
36Kr· 2025-11-27 12:13
Core Insights
- Google has undergone a significant transformation in the past year, moving from perceived stagnation to a strong resurgence in AI capabilities, highlighted by the success of its Gemini applications and models [2][3][44].
- The company's long-term investment in AI, dating back more than two decades, laid a robust foundation for its current advances, a strategic evolution rather than a sudden breakthrough [3][6][45].

Group 1: Historical Context and Development
- Google's AI journey began with Larry Page's vision of an ultimate search engine capable of understanding the internet and user intent [9][47].
- The establishment of Google Brain in 2011 was a pivotal moment, focusing on unsupervised learning methods that would later prove essential for AI advances [12][18].
- The "cat paper" published in 2012 demonstrated the feasibility of unsupervised learning and led to recommendation systems that transformed platforms like YouTube [15][16].

Group 2: Key Acquisitions and Innovations
- The acquisition of DeepMind in 2014 for $500 million solidified Google's position in AI, bringing top-tier talent and innovative research [22][24].
- Google's Tensor Processing Units (TPUs) were a strategic response to the limitations of existing hardware, enabling more efficient processing of AI workloads [25][30].

Group 3: Challenges and Strategic Shifts
- The emergence of OpenAI and the success of ChatGPT in late 2022 prompted Google to reassess its AI strategy, restructure its AI teams, and focus on a unified model, Gemini [41][42].
- The rapid development and deployment of Gemini and its variants, such as Gemini 3 and Nano Banana Pro, have put Google back at the forefront of the AI landscape [43][44].

Group 4: Future Outlook
- Google's recent advances reflect years of strategic investment and innovation, reaffirming its identity as a company fundamentally rooted in AI rather than merely a search engine [47][48].
Saining Xie Praises ByteDance Seed's New Research! A Single Transformer Handles Any-View 3D Reconstruction
量子位· 2025-11-18 05:02
By 闻乐 | QbitAI

A single Transformer handles any-view 3D reconstruction! This is Depth Anything 3 (hereafter DA3), the latest research from Bingyi Kang's team at ByteDance Seed, and it has won high praise from Saining Xie.

The architecture is simple enough, yet its core capabilities hold up. From a single image, a set of multi-view photos, or even a casually shot video, it can accurately compute object depth and recover camera poses; it can not only assemble a complete 3D scene but also hallucinate novel-view images that were never captured. Moreover, it sweeps every task on the team's newly built visual-geometry benchmark: camera-pose accuracy improves by 35.7% on average, geometric-reconstruction accuracy rises by 23.6%, and its monocular depth estimation surpasses its own predecessor, DA2.

Previous 3D vision models were specialists. Single-image depth estimation? Train a dedicated model. Multi-view 3D reconstruction? Switch to a different architecture. Even camera-pose estimation needed its own bespoke module. This made development costly, prevented full use of large-scale pretrained models, and created heavy data dependence. So what exactly does DA3's single, minimalist approach look like?

Minimalist design can still compete

The core recipe has just two ingredients: first, use only a plain vision Transformer as the backbone; second, restrict the prediction targets to two essentials, depth and rays. The architecture diagram shows that DA3's pipeline breaks into four major stages. The first is input processing ...
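The article names only the two ingredients (a plain vision Transformer backbone; depth and ray prediction targets), so the PyTorch sketch below is an assumption-heavy illustration of that shape: one shared encoder with two lightweight heads. Every dimension and module choice here is invented, not DA3's actual design.

```python
# A hedged sketch of the two-ingredient recipe the article describes:
# one plain ViT-style backbone, two heads (depth + rays). All sizes are
# illustrative assumptions, not DA3's actual implementation.
import torch
import torch.nn as nn

class MiniDA3(nn.Module):
    def __init__(self, dim: int = 384):
        super().__init__()
        self.embed = nn.Linear(768, dim)               # patch tokens in
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=6,
                                           batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.depth_head = nn.Linear(dim, 1)            # per-patch depth
        self.ray_head = nn.Linear(dim, 6)              # per-patch ray (origin + direction)

    def forward(self, patch_tokens):
        h = self.backbone(self.embed(patch_tokens))    # shared representation
        return self.depth_head(h), self.ray_head(h)

# Tokens from N views can simply be concatenated along the sequence axis,
# which is one way a single model can cover 1-view and multi-view inputs.
tokens = torch.randn(2, 2 * 196, 768)                  # batch of 2, two views each
depth, rays = MiniDA3()(tokens)
```

The appeal of this shape is that the backbone is an off-the-shelf ViT, so large-scale pretrained weights can be reused instead of building task-specific modules for depth, reconstruction, and pose separately.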
An In-Depth Discussion of the AI Bubble: Trillion-Dollar CapEx, Dark GPUs, and How Ads and E-Commerce Lift AI | Best Ideas
海外独角兽· 2025-11-14 06:54
Core Viewpoint
- The article discusses the current state of the AI bubble, drawing parallels to past tech bubbles, particularly the fiber-optics bubble, and emphasizes the need for a rational understanding of AI investments and their long-term potential [4][5].

Group 1: OpenAI's CapEx and Market Implications
- OpenAI's proposed $1.4 trillion CapEx for establishing approximately 30 GW of computing resources raises significant questions about its feasibility and the broader implications for the AI market [5][10].
- The projected revenue target of $100 billion by 2027 implies an unprecedented monetization speed that may not align with traditional internet product metrics [8].
- OpenAI may need to secure $1.2 trillion in financing to cover the CapEx gap, which is deemed unfeasible given the current cash-flow situation of major tech companies (see the illustrative arithmetic after this summary) [10][11].

Group 2: CapEx Trends Among Major Tech Companies
- The "Mag 7" companies have significantly increased their CapEx since 2023, with many showing improved return on invested capital (ROIC) [13].
- The average CapEx-to-cash-flow ratio for S&P 500 companies has fallen from 70-80% in the 1990s to about 46% today, indicating stronger profitability despite increased CapEx [16].
- Major tech firms currently generate approximately $500 billion in free cash flow annually, providing a buffer for ongoing investment [16].

Group 3: Computing Power Demand and Future Projections
- Nvidia's projected orders for the next five quarters could reach $500 billion, indicating a doubling of demand compared with recent revenue figures [24].
- Ongoing competition in model development necessitates continued investment in computing power, with firms like Meta and xAI needing to catch up with the leading labs [26].
- Demand for inference compute is expected to grow as AI applications are validated and integrated into workflows, potentially leading to a significant increase in usage [30].

Group 4: AI Market Dynamics and Growth Potential
- The AI market is still in its early stages, with significant room for growth in user adoption and applications [41].
- Current AI penetration in the U.S. is around 40%, with potential for substantial growth as the technology gains wider acceptance [43].
- The commercial viability of AI products is being tested, with various business models emerging, including subscription and usage-based pricing [46][47].

Group 5: Risks and Future Developments
- A "black swan" event is possible if a new model mechanism emerges that sharply reduces costs and disrupts existing technologies [51].
- The current trajectory of AI development is seen as stable, with ongoing advances in transformer models and reinforcement learning [52].
- Market perceptions of AI's value may fluctuate, particularly as companies approach significant milestones or struggle to meet revenue expectations [57].
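The $1.2 trillion figure in Group 1 follows from simple arithmetic; the snippet below reproduces it, with the internally funded portion marked as an explicit assumption rather than a number from the article.

```python
# Illustrative arithmetic behind the financing gap discussed above.
# The internally_funded figure is an ASSUMPTION for illustration only.
capex_total = 1.4e12          # proposed CapEx for ~30 GW of compute (USD)
internally_funded = 0.2e12    # assumed CapEx OpenAI could self-fund
financing_gap = capex_total - internally_funded
capex_per_gw = capex_total / 30   # roughly $47B per GW of compute

print(f"External financing needed: ${financing_gap / 1e12:.1f} trillion")
# -> $1.2 trillion, the figure cited in the article. For scale, the
# "Mag 7" generate about $500B/yr in free cash flow combined, so a
# $1.2T single-company raise dwarfs even that buffer.
```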
X @Avi Chawla
Avi Chawla· 2025-11-11 20:14
Mixture of Experts (MoE) Architecture
- MoE is a popular architecture leveraging different experts to enhance Transformer models [1]
- MoE differs from a standard Transformer in the decoder block, using experts (smaller feed-forward networks) instead of a single feed-forward network [2][3]
- During inference, only a subset of experts is selected, leading to faster inference [4]
- A router, a multi-class classifier, selects the top-K experts by producing softmax scores (see the router sketch after this list) [5]
- The router is trained jointly with the network to learn the best expert selection [5]

Training Challenges and Solutions
- Challenge 1: Some experts may become under-trained because a few experts are consistently over-selected [5]
- Solution 1: Add noise to the router's feed-forward output and set all but the top-K logits to negative infinity, giving other experts a chance to train [5][6]
- Challenge 2: Some experts may be exposed to more tokens than others, again leaving experts under-trained [6]
- Solution 2: Limit the number of tokens an expert can process; once the limit is reached, the token is passed to the next-best expert [6]

MoE Characteristics and Examples
- Text passes through different experts across layers, and the chosen experts differ between tokens [7]
- MoEs have more parameters to load, but only a fraction are activated during inference, resulting in faster inference [9]
- Mixtral 8x7B and Llama 4 are examples of popular MoE-based LLMs [9]
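Below is a minimal PyTorch sketch of the noisy top-K routing the thread describes (the mechanism popularized by Shazeer et al.'s sparsely-gated MoE). Expert sizes and the noise parameterization are illustrative, and the token-capacity limit from Solution 2 is omitted for brevity.

```python
# A minimal sketch of noisy top-K MoE routing as described above
# (expert sizes and noise parameterization are illustrative assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyTopKRouter(nn.Module):
    def __init__(self, dim: int, n_experts: int, k: int = 2):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts)   # the multi-class classifier
        self.noise = nn.Linear(dim, n_experts)  # learned per-expert noise scale
        self.k = k

    def forward(self, x):
        logits = self.gate(x)
        if self.training:  # noise lets under-used experts win sometimes
            logits = logits + torch.randn_like(logits) * F.softplus(self.noise(x))
        topk_vals, topk_idx = logits.topk(self.k, dim=-1)
        # All but the top-K logits go to -inf, so softmax ignores them.
        masked = torch.full_like(logits, float("-inf")).scatter(-1, topk_idx, topk_vals)
        return F.softmax(masked, dim=-1), topk_idx  # sparse weights + expert ids

class MoELayer(nn.Module):
    def __init__(self, dim: int = 64, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = NoisyTopKRouter(dim, n_experts, k)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts))

    def forward(self, x):                          # x: (tokens, dim)
        weights, idx = self.router(x)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):  # dense loop for clarity
            sel = (idx == e).any(dim=-1)           # tokens routed to expert e
            if sel.any():
                out[sel] += weights[sel, e:e+1] * expert(x[sel])
        return out

tokens = torch.randn(10, 64)
y = MoELayer()(tokens)   # only 2 of 8 experts fire per token
```

The masking step is why inference is cheap: the softmax assigns exactly zero weight to every non-selected expert, so their feed-forward networks are never evaluated for that token.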
AI Empowering Asset Allocation (21): From Transformer to Agent, What Has Changed in Quantitative Investing Practice?
Guoxin Securities· 2025-11-04 13:36
Group 1
- The core conclusion highlights that the Transformer improves stock-return prediction accuracy through spatiotemporal integration and multi-relation modeling, with GrifFinNet as a representative model [1][2]
- The Agent serves as a comprehensive decision-making entity in quantitative investment, simulating a professional investment process through a layered multi-agent framework and addressing challenges in traditional quantitative models [1][3]
- The deep coupling of Transformer and Agent creates an integrated system that improves both modeling precision and decision automation, enabling a seamless transition from feature modeling to live trading [1][4]

Group 2
- The Transformer is identified as an efficient modeling architecture for quantitative investment, overcoming the limitations of traditional models in handling non-linear relationships and dynamic time series [2][12]
- GrifFinNet, a key Transformer-based model, significantly outperforms traditional tools such as LSTM and XGBoost in stock-return prediction accuracy, demonstrating its effectiveness in the A-share market [2][24]
- The Agent framework addresses issues in traditional quantitative investing by establishing a hierarchical structure integrating macro selection, company analysis, portfolio optimization, and risk control (a hedged sketch of such a pipeline follows this summary) [3][25]

Group 3
- The integration of Transformer and Agent is not merely additive but follows a logic of functional complementarity, raising the overall efficiency of the quantitative investment process [4][28]
- The multi-agent system designed for fundamental investing effectively combines structured and unstructured data, improving decision-making capability and adaptability to market changes [3][26]
- Future advances in AI-enabled quantitative investment will focus on precision, automation, and robustness, with ongoing optimization of both Transformer and Agent systems [4][33]
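The report describes the agent framework only at the level of its four layers, so this Python sketch is a hypothetical skeleton of such a hierarchy: each layer is an agent with a run method, chained by a coordinator. All class names, fields, and rules are invented for illustration; the report publishes no implementation.

```python
# A hypothetical skeleton of the layered multi-agent pipeline described
# above (macro selection -> company analysis -> portfolio -> risk control).
# All names and rules are invented; the report publishes no implementation.
from dataclasses import dataclass, field

@dataclass
class Context:
    """State passed down the agent hierarchy."""
    universe: list
    scores: dict = field(default_factory=dict)
    weights: dict = field(default_factory=dict)

class MacroAgent:
    def run(self, ctx: Context) -> Context:
        # e.g. filter the universe by macro/sector views (stub rule)
        ctx.universe = [s for s in ctx.universe if not s.startswith("ST")]
        return ctx

class CompanyAgent:
    def run(self, ctx: Context) -> Context:
        # in the coupled system, this is where a Transformer return model
        # such as GrifFinNet would score each name (stub scores here)
        ctx.scores = {s: hash(s) % 100 / 100 for s in ctx.universe}
        return ctx

class PortfolioAgent:
    def run(self, ctx: Context) -> Context:
        total = sum(ctx.scores.values()) or 1.0
        ctx.weights = {s: v / total for s, v in ctx.scores.items()}
        return ctx

class RiskAgent:
    def run(self, ctx: Context) -> Context:
        # e.g. cap single-name exposure at 10% (stub risk rule)
        ctx.weights = {s: min(w, 0.10) for s, w in ctx.weights.items()}
        return ctx

def coordinator(ctx: Context) -> Context:
    for agent in (MacroAgent(), CompanyAgent(), PortfolioAgent(), RiskAgent()):
        ctx = agent.run(ctx)
    return ctx

print(coordinator(Context(universe=["600519", "000001", "ST1234"])).weights)
```

The point of the skeleton is the functional complementarity the report emphasizes: the Transformer supplies the scores inside one layer, while the agent hierarchy turns those scores into positions under explicit portfolio and risk rules.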