LSTM
Rejected ≠ Failed! These High-Impact Papers Were All Rejected by Top Conferences
具身智能之心· 2025-12-12 01:22
Core Insights
- Waymo has published an in-depth blog post detailing its AI strategy, which centers on its foundation model and uses distillation to produce efficient models for onboard operation [1][2]
- Jeff Dean highlighted the significance of knowledge distillation, comparing the approach to the creation of the Gemini Flash model as an illustration of how much distillation matters for model efficiency [1][2]

Historical Context of Rejected Papers
- Many foundational AI technologies, including optimizers for large models and computer vision techniques, were initially rejected by top conferences, a recurring pattern of reviewers overlooking groundbreaking work [6]
- Notable figures in AI, including Geoffrey Hinton and Yann LeCun, had pioneering work rejected that was later recognized as transformative [6]

Case Studies of Rejected Innovations
- LSTM, a milestone in sequence data processing, was rejected by NIPS in 1996 but later became crucial in speech recognition and machine translation, showing how long recognition of its value was delayed [7][10]
- SIFT, long a dominant algorithm in computer vision, was rejected by ICCV and CVPR because reviewers considered it too complex, yet it proved vital in real-world image processing [11][13]
- Dropout, a key regularization method for deep neural networks, was initially rejected as too radical but later became essential for training deep networks effectively [17][19]
- Word2Vec was rejected at ICLR, yet its efficiency and practicality made it a cornerstone of NLP, and it was eventually recognized for its impact [20][24]
- YOLO transformed object detection by prioritizing speed over precision; it was rejected for its perceived shortcomings but later became a widely adopted framework in industry [28][30]

Reflection on Peer Review Limitations
- The peer review system often struggles to recognize disruptive innovations, which produces a systematic cognitive lag in evaluating groundbreaking research [40][41]
- The tendency to equate mathematical complexity with research contribution can hinder the acceptance of simpler yet effective methods [41]
- Historical examples show that the true measure of a piece of research is not the initial peer review outcome but its long-term relevance and its ability to solve problems [43][47]
AI-Empowered Asset Allocation (29): A Guide to AI Stock Price Prediction, Using TrendIQ as an Example
Guoxin Securities· 2025-12-03 13:18
Core Insights
- The report emphasizes the growing importance of AI in asset allocation, particularly in stock price prediction, and highlights how AI models such as TrendIQ address the limitations of traditional machine learning approaches [3][4][10].

Group 1: AI in Stock Price Prediction
- Large AI models have significantly improved stock price prediction by effectively collecting and analyzing unstructured information, which traditional models struggled with [3][4].
- TrendIQ is presented as a mature financial asset price prediction platform that offers both local and web-based deployment options, catering to different user needs [4][10].
- The report discusses the evolution of predictive models from LSTM to more advanced architectures such as Transformers, which handle complex financial data better and improve predictive accuracy [5][10].

Group 2: Model Mechanisms and Limitations
- LSTM has been the preferred model for stock price prediction because it can handle non-linear time-series data, but it has limitations such as single modality and weak interpretability [6][7].
- The report outlines the integration of LSTM with other models, such as XGBoost and deep reinforcement learning, to enhance predictive capabilities and address some of LSTM's shortcomings [6][10].
- The Transformer architecture is noted for its global context awareness and its ability to perform zero-shot and few-shot learning, which broadens its applicability to financial prediction [8][10].

Group 3: TrendIQ Implementation
- The report details the implementation of TrendIQ, which includes a complete framework for data preparation, model training, and user interaction through a web application [12][20].
- The training process involves collecting historical stock data, preprocessing it, and training the LSTM model, so that users can make predictions through a user-friendly interface [12][20].
- The app integrates several components, including real-time data fetching and prediction functions, allowing users to interact with the predictive model [20][28].

Group 4: Future Directions
- The report anticipates that future developments in AI stock prediction will focus on multi-modal integration, combining visual data from candlestick charts with textual analysis of financial news and numerical data from price sequences [39][40].
- The potential for real-time knowledge integration is highlighted, suggesting that future AI models will adapt dynamically to new information, improving their robustness and accuracy [40][41].
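The data-preparation step in the Group 3 workflow (collect historical prices, preprocess, then train a sequence model) can be sketched as follows. This is a minimal illustration and not TrendIQ's actual code: the function names `min_max_scale` and `make_windows`, the `lookback` parameter, and the sample prices are all assumptions made for the example.

```python
from typing import List, Tuple


def min_max_scale(values: List[float]) -> Tuple[List[float], float, float]:
    """Scale values to [0, 1]; also return the min and max that were used."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A flat series has no range to scale by; map every point to 0.0.
        return [0.0 for _ in values], lo, hi
    return [(v - lo) / span for v in values], lo, hi


def make_windows(series: List[float], lookback: int) -> Tuple[List[List[float]], List[float]]:
    """Build (inputs, targets) pairs: each input holds `lookback` consecutive
    values and its target is the value that immediately follows them."""
    if lookback < 1 or lookback >= len(series):
        raise ValueError("lookback must be between 1 and len(series) - 1")
    inputs: List[List[float]] = []
    targets: List[float] = []
    for start in range(len(series) - lookback):
        inputs.append(series[start:start + lookback])
        targets.append(series[start + lookback])
    return inputs, targets


# Hypothetical closing prices, for illustration only.
closes = [10.0, 11.0, 12.0, 11.0, 13.0, 14.0]
scaled, lo, hi = min_max_scale(closes)
X, y = make_windows(scaled, lookback=3)
```

Each row of `X` is one input sequence for a model such as an LSTM, and the matching entry of `y` is the next scaled price. In a real pipeline the scaling minimum and maximum would normally be computed on the training portion only, so that information from the test period does not leak into training.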
Treasury Bond Futures Report Series: Applying Multi-Channel Deep Learning Models to Factor Timing in Treasury Bond Futures
Guotai Junan Futures· 2025-08-28 08:42
1. Report Industry Investment Rating
No relevant content provided.

2. Core Viewpoints of the Report
- The report proposes a dual-channel deep learning model (LSTM and GRU) that integrates daily-frequency and minute-frequency data. The model captures market information on different time scales, significantly improves the strategy's out-of-sample prediction accuracy and stability (especially during market downturns), and offers a new approach with strong generalization ability for rebuilding the quantitative timing system of the bond market [2].
- The dual-channel model shows excellent generalization ability and robustness in out-of-sample tests and maintains a high win rate in bear markets, making up for the tendency of traditional factors to fail in market downturns [3].
- In the multi-factor timing framework, the weight of deep learning factors should be kept relatively low, with machine learning factors playing a supplementary role, so that the framework balances interpretability with performance gains [43][44].

3. Summary by Section

3.1 Deep Learning Model Introduction
- The performance of traditional quantitative factors in the bond market has declined in recent years, so these factors need to be rebuilt and re-mined. Deep learning methods can uncover complex relationships in data, and RNN, LSTM, and GRU are considered suitable for the Treasury bond futures timing task [7][8].
- RNN can process time-series data but suffers from vanishing gradients on long sequences [9].
- LSTM addresses the vanishing gradient problem with a cell state and three gating units, enabling it to learn long-range dependencies in sequences [15].
- GRU simplifies the LSTM structure and reduces the number of learnable parameters, giving it high parameter efficiency and fast training [19].
- A dual-channel model is designed to process daily-frequency and minute-frequency data simultaneously, extracting features on different time scales to predict the daily returns of Treasury bond futures, which can reduce the risk of overfitting [22].

3.2 Treasury Bond Futures Timing Test

3.2.1 Backtest Settings
- The target variable is the open-to-open return of 10-year Treasury bond futures. The backtest runs from January 2016 to August 2025 with daily rebalancing, 100% margin, 1x leverage, and a two-sided transaction fee of 0.01% [25][26][27].

3.2.2 Daily-Frequency Channel Model
- The single-channel model based on daily-frequency features performs well in sample but poorly out of sample, showing obvious overfitting [33].

3.2.3 Dual-Channel Model
- The dual-channel model fuses multi-frequency time-series information. Adding minute-frequency information significantly improves out-of-sample prediction, enhances generalization ability and stability, and maintains a relatively high win rate in both long and short positions [40][41][42].

3.3 Deep Learning Allocation in the Multi-Factor Framework
- Deep learning factors in the multi-factor timing framework perform strongly but carry overfitting risk and lack interpretability. Their weight should be kept relatively low, with machine learning factors playing a supplementary role [43][44].

3.4 Conclusion
- The report explores the application of deep learning models to quantitative timing of Treasury bond futures and proposes a dual-channel deep learning framework based on multi-frequency data fusion, which can effectively improve the performance of multi-factor strategies [45].
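Section 3.1 describes LSTM as a cell state plus three gating units. The sketch below shows one time step of a standard LSTM cell with scalar input and scalar state, which is enough to see the mechanism. It is not the report's model: the function `lstm_step`, the weight layout, and all numeric values are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, Tuple

# Each gate has an input weight, a recurrent weight, and a bias.
Weights = Dict[str, Tuple[float, float, float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x: float, h_prev: float, c_prev: float, w: Weights) -> Tuple[float, float]:
    """Run one LSTM time step and return the new (hidden state, cell state)."""

    def pre_activation(name: str) -> float:
        w_x, w_h, b = w[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre_activation("forget"))       # how much old cell state to keep
    i = sigmoid(pre_activation("input"))        # how much new information to write
    o = sigmoid(pre_activation("output"))       # how much cell state to expose
    g = math.tanh(pre_activation("candidate"))  # proposed new cell content

    # The cell state is updated additively; this path is what lets gradients
    # survive across many time steps.
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


# Hypothetical weights and inputs, for illustration only.
weights: Weights = {
    "forget": (0.5, 0.1, 1.0),
    "input": (0.6, -0.2, 0.0),
    "output": (0.4, 0.3, 0.0),
    "candidate": (0.9, 0.2, 0.0),
}
h, c = 0.0, 0.0
for x_t in [0.2, -0.1, 0.4]:
    h, c = lstm_step(x_t, h, c, weights)
```

A GRU follows the same idea with two gates and no separate cell state, which is where its smaller parameter count comes from.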
MicroCloud Hologram (NASDAQ: HOLO) Proposes LSTM-Based Cryptocurrency Price Prediction Technology: An Intelligent Engine for Investment Decisions
Cai Fu Zai Xian· 2025-08-06 03:01
Core Insights
- The rise of blockchain technology has made cryptocurrencies an important part of the financial sector, but the market's lack of regulation and high volatility pose significant risks for investors [1]
- Traditional financial forecasting methods struggle with the non-linear and complex nature of cryptocurrency price data, highlighting the need for more advanced predictive techniques [1]
- Deep learning algorithms, particularly Long Short-Term Memory (LSTM) networks, offer a promising approach to predicting cryptocurrency prices because they capture long-term dependencies and complex patterns [1]

Data Collection and Processing
- The company has gathered extensive historical trading data from multiple authoritative sources, covering various time periods, trading platforms, and cryptocurrency types, including price, volume, and market depth [2]
- Data quality and reliability were ensured through rigorous cleaning and preprocessing, which involved removing duplicates, errors, and outliers, and normalizing and standardizing the data for model input [2]

Model Development
- A specialized LSTM neural network model was constructed to address the challenges of traditional RNNs, incorporating gating mechanisms to mitigate vanishing and exploding gradients [2]
- The model's architecture and parameters were tailored to the specific requirements of cryptocurrency price prediction, and various optimization algorithms were employed to minimize prediction error during training [2]

Performance Evaluation and Optimization
- The model's performance was assessed using multiple evaluation metrics, including Mean Squared Error (MSE) and Mean Absolute Percentage Error (MAPE), leading to further optimization and adjustments to improve prediction accuracy [2]
- Techniques such as regularization and dropout were used to prevent overfitting and keep the model robust [2]

Future Directions
- The company plans to explore and integrate new technologies and algorithms, such as reinforcement learning and Generative Adversarial Networks (GANs), to further improve the accuracy and generalization capabilities of its price prediction models [4]
- There is an emphasis on enhancing data processing and analysis through the integration of big data, cloud computing, and IoT technologies, providing stronger technical support for price forecasting [4]
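The two evaluation metrics named under Performance Evaluation and Optimization, MSE and MAPE, follow their standard definitions and can be computed as in the sketch below. The function names and the sample prices are assumptions for illustration and do not come from the company's model.

```python
from typing import Sequence


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Squared Error: the average of squared prediction errors."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, in percent.

    Undefined when any actual value is zero, so that case is rejected.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical prices, for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
error_mse = mse(actual, predicted)
error_mape = mape(actual, predicted)
```

MSE is expressed in squared price units and penalizes large misses heavily, while MAPE is scale-free, which makes it easier to compare across assets that trade at very different price levels.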