Machine Learning

45th International Symposium on Forecasting Closes in Beijing; China's Strength in Forecasting Research Draws Global Attention
Sou Hu Cai Jing· 2025-07-04 07:10
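The 45th International Symposium on Forecasting (ISF 2025) concluded in Beijing on July 2. ISF is the most authoritative international academic conference in its field. Founded in 1981, it was held in mainland China for the first time this year and drew 580 registered participants from 35 countries and regions, including leading forecasting scholars, industry leaders, and policymakers. The record attendance reflects both the growing global importance of forecasting science and China's rising influence in the field.

Under the theme "Frontiers and Innovation in Forecasting Science," the conference focused on key areas such as artificial intelligence, big data, economics and management, energy and the environment, and climate change. The program comprised 13 keynote talks, 5 in-depth workshops, and 12 parallel tracks totaling 106 topical sessions, with 348 academic presentations in all. Experts held broad, in-depth exchanges on topics including Bayesian forecasting, machine learning, large language models, forecast uncertainty, and forecast combination, as well as applications of forecasting in macroeconomics, finance, supply chains, energy, healthcare, and disaster prevention and control. The next symposium (ISF 2026) is reportedly scheduled for next year in Canada.

(Photo: ISF 2025 speakers. Courtesy of the organizers.)

The conference promoted the sharing and exchange of frontier results in forecasting science worldwide, and it provided an important platform for deepening international research cooperation in the field. It is significant for advancing forecasting science and its application to global challenges. International Institute of Forecasters President Laur ...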
From Physical Assets to Data Assets: How Digitalization Is Redefining Enterprise Value in the New Era
3 6 Ke· 2025-07-04 02:15
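In the industrial age, value was measured in tons of steel, oil, and coal. Today it is measured in terabytes of data, user behavior patterns, and algorithmic accuracy. This article examines the digitalization of physical and virtual assets, which is not merely an IT transformation but a reconfiguration of how we define, create, and deliver value.

Drawing on economic theory, enterprise architecture, and real-world business disruptions ranging from Tesla's software-defined vehicles to Amazon's algorithmic supply chain, we explore why digitalization brings volatility, novelty, and disruption, and what leaders must do to govern amid this fluidity.

The shift from tangible to intangible value

In the early 20th century, corporate strength was measured in physical terms: the size of a company's factories, the number of locomotives in its fleet, or the tons of raw material it could process. Balance sheets were concrete. Tangibility was trust.

Today, however, the world's most valuable companies have no factories, no oil rigs, and no physical fleets. Instead, they own data pipelines, digital ecosystems, and algorithmic control loops. Netflix: no DVD stores, yet it reshaped global entertainment. In each case, software became the factory and data became the supply chain.

The rise of digital-twin thinking

Today the physical world is increasingly mirrored by digital replicas, from buildings to engines to entire business processes. These "digital twins" are not just simple diagrams; they can ingest sensor data, simulate scenarios, and optimize outcomes ...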
Shanghai Jiao Tong University Publishes New Paper in Nature
生物世界· 2025-07-03 09:38
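Editor: Wang Duoyu | Layout: Shui Chengwen

Thermal nanophotonics has enabled fundamental breakthroughs in technologies ranging from energy to information processing. From thermal emitters to thermophotovoltaics and thermal camouflage, precise spectral engineering has long been held back by trial-and-error methods. Meanwhile, machine learning has shown strong capabilities in the design of nanophotonic devices and metamaterials.

However, developing a general design methodology for tailoring high-performance nanophotonic emitters with ultrabroadband control and precise band selectivity remains a major challenge, because such designs are limited by predefined geometries and materials, local optimization traps, and conventional algorithms.

On July 2, 2025, Professors Zhou Han and Zhang Di of Shanghai Jiao Tong University, Professor Qiu Chengwei of the National University of Singapore, and Professor Zheng Yuebing of the University of Texas at Austin, as co-corresponding authors (with Chengyu Xiao of Shanghai Jiao Tong University as first author), published a research paper in Nature titled: Ultrabroadband and band-selective thermal meta-emitters by machine learning.

The study proposes a general machine-learning-based framework and designs a variety of ultrabroadband and band-selective thermal meta-emitters ( ...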
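Nuoan Fund's Kong Xianzheng: Understanding Financial Markets Through Philosophical Thinking, Pursuing Excess Returns Through Scientific Methods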
点拾投资· 2025-07-02 23:16
Core Viewpoint

The article emphasizes the importance of scientific thinking and critical analysis in quantitative investment. It highlights the influence of philosopher Karl Popper on investment strategy and on the development of models that identify and exploit market inefficiencies.

Group 1: Investment Philosophy
- The essence of quantitative investment is to model the securities market with scientific methods and identify reproducible patterns that can influence market behavior [16][6]
- The investment approach is heavily influenced by Popper's philosophy of "conjecture and refutation," which encourages the search for rules in an uncertain world [7][56]
- A focus on objective analysis helps avoid the pitfalls of linear thinking and the cognitive biases that can obscure judgment [2][61]

Group 2: Performance Metrics
- The Nuoan Multi-Strategy Mixed Fund returned 100.74% over the past year, while the Nuoan CSI 300 Index Enhanced Fund returned 15.42%, outperforming the CSI 300 Index by 2.06% [3][29]
- The Nuoan Multi-Strategy Fund significantly outperformed small-cap indices such as the CSI 2000, which indicates that its excess returns come from sophisticated modeling techniques rather than from small-cap exposure alone [3][34]

Group 3: Investment Strategies
- The concept of "attention value" in the A-share market holds that investors frequently shift their focus because many companies fail to meet return expectations. This shifting attention can be exploited for excess returns in micro-cap stocks [26][4]
- The strategy emphasizes understanding the underlying statistical patterns and market behaviors rather than relying solely on historical performance [20][22]

Group 4: Machine Learning and Model Development
- Moving from multi-factor strategies to machine learning models makes it possible to capture non-linear patterns, leading to returns that exceed the limits of human cognition [3][30]
- Machine learning is seen as a way to improve the predictive power of investment models and adapt to rapidly changing market conditions [30][40]

Group 5: Market Dynamics and Future Outlook
- The article argues that excess returns from micro-cap stocks in the Chinese market are unlikely to converge, owing to the market's unique dynamics and investor behavior [34][35]
- Scientific, systematic approaches are expected to reveal opportunities that are not crowded, because many competitors rely on outdated inductive reasoning [45][46]
An Overview of Index Replication and Index Enhancement Methods
Changjiang Securities· 2025-07-02 11:07
Quantitative Models and Construction Methods

1. Model Name: Optimization Replication
   - **Model Construction Idea**: Recast the replication of index returns as an optimization model that minimizes tracking error [31][32]
   - **Model Construction Process**:
     1. Define the return sequence of the portfolio as:

        $$ \tilde{R}_{t} = \sum_{i=1}^{M} \widetilde{W}_{i,t} \cdot Y_{i,t} = Y_{t} \cdot \widetilde{W}_{t} $$

        where $\widetilde{W}_{i,t}$ is the weight of asset $i$ at time $t$, and $Y_{i,t}$ is the return of asset $i$ at time $t$ [31]
     2. Minimize the tracking error (TE):

        $$ w = \arg\min_{w} \; TE $$

        $$ TE = \frac{1}{T} \sum_{t=1}^{T} (\tilde{R}_{t} - R_{t})^2 $$

        where $R_{t}$ is the benchmark return at time $t$ [32]
     3. Add constraints:
        - Full investment: $\sum_{i=1}^{N} w_{i} = 1$
        - Non-negativity: $0 \leq w_{i} \leq 1$ [33][35]
     4. Incorporate style and industry neutrality constraints to reduce overfitting:

        $$ z_{low} \leq \frac{X_{s}^{T}w - X_{s}^{T}\bar{w}}{s_{b}} \leq z_{up} $$

        $$ w_{low}^{I} \leq X_{I}^{T}w - X_{I}^{T}\bar{w} \leq w_{up}^{I} $$

        [36]
     5. Solve the optimization problem to determine the weights of individual stocks [37]

2. Model Name: Pair Trading
   - **Model Construction Idea**: Identify pairs of stocks or sectors with similar trends and exploit mean reversion to generate excess returns [57]
   - **Model Construction Process**:
     1. Identify pairs of stocks or sectors with stable relationships, based on quantitative or fundamental analysis
     2. Overweight the weaker-performing asset and underweight the stronger-performing one
     3. Capture excess returns when the price spread reverts to its mean, particularly around events such as macroeconomic data releases or seasonal effects [57]

---

Model Backtesting Results

1. Optimization Replication
   - **Annualized Return**: 5.76%
   - **Excess Return**: 3.74%
   - **Sharpe Ratio**: 0.41
   - **Excess Drawdown**: 3.86%
   - **Excess Win Rate**: 72%
   - **Information Ratio (IR)**: 1.51
   - **Tracking Error**: 2.22% [23][27]

---

Quantitative Factors and Construction Methods

1. Factor Name: Quantitative Multi-Factor
   - **Factor Construction Idea**: Select alpha factors that are effective over the long term and build a multi-factor portfolio to achieve stable excess returns [46]
   - **Factor Construction Process**:
     1. Define the multi-factor model:

        $$ \begin{bmatrix} r_{1} \\ r_{2} \\ \vdots \\ r_{n} \end{bmatrix} = \begin{bmatrix} x_{11} \\ x_{21} \\ \vdots \\ x_{n1} \end{bmatrix} f_{1} + \begin{bmatrix} x_{12} \\ x_{22} \\ \vdots \\ x_{n2} \end{bmatrix} f_{2} + \cdots + \begin{bmatrix} x_{1m} \\ x_{2m} \\ \vdots \\ x_{nm} \end{bmatrix} f_{m} + \begin{bmatrix} u_{1} \\ u_{2} \\ \vdots \\ u_{n} \end{bmatrix} $$

        where $r_{i}$ is the excess return of stock $i$, $x_{ij}$ is the exposure of stock $i$ to factor $j$, $f_{j}$ is the return of factor $j$, and $u_{i}$ is the residual return of stock $i$ [46]
     2. Select effective single factors, such as volatility, short-selling intention, and liquidity, based on theoretical direction and empirical validation [48]
     3. Construct a multi-factor portfolio by combining these factors and optimizing weights [47]

2. Factor Name: Negative Enhancement
   - **Factor Construction Idea**: Underweight stocks expected to underperform or incur losses, using negative factors or events to generate stable excess returns [56]
   - **Factor Construction Process**:
     1. Identify stocks with negative attributes, such as analyst downgrades, equity pledges, or poor earnings reports
     2. Underweight these stocks in the portfolio to reduce potential losses and achieve alpha [56]

3. Factor Name: Machine Learning-Based Alpha
   - **Factor Construction Idea**: Use deep learning models such as TCN (Temporal Convolutional Networks) to extract complex, effective alpha factors from price and volume data [52]
   - **Factor Construction Process**:
     1. Train neural networks on historical price and volume data to identify patterns and relationships
     2. Generate alpha factors that outperform traditional genetic programming methods in depth and complexity [51][52]

---

Factor Backtesting Results

1. Quantitative Multi-Factor
   - **Excess Return**: Stable across long-term horizons
   - **Key Factors**: Volatility, short-selling intention, liquidity, and local pricing [48]

2. Negative Enhancement
   - **Excess Return**: Achieved by underweighting stocks with negative attributes or events [56]

3. Machine Learning-Based Alpha
   - **Performance**: Superior results compared with traditional factor generation methods [52]

---

Index Enhancement Methods

1. IPO Enhancement
   - **Annualized Return**: 2.13% in 2025
   - **Segment Performance**: Sci-Tech Innovation Board (4.34%), ChiNext (2.52%) [67][68]

2. Stock Index Futures
   - **Annualized Basis Spread**:
     - CSI 300: -6.75%
     - SSE 50: -2.48%
     - CSI 500: -13.60%
     - CSI 1000: -18.09% [72][73]

3. Block Trades
   - **Median Discount Rate**: 5.38% (2017-2025), 8.23% in 2025 [74][75]

4. Private Placements
   - **Median Discount Rate**: 14.55% (2017-2025), 11.87% in 2025 [77]
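The Optimization Replication model described above (minimize the mean squared gap between portfolio and benchmark returns, subject to full investment and non-negative weights) can be sketched in a few lines. This is a minimal illustration, not the report's implementation: it uses projected gradient descent, which the report does not name, it omits the style and industry neutrality constraints, and the function names and the synthetic return data are assumptions made for the example.

```python
import math


def project_to_simplex(v):
    """Euclidean projection onto {w : w_i >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0:
            theta = candidate  # keep the threshold of the last feasible j
    return [max(x - theta, 0.0) for x in v]


def tracking_error(weights, asset_returns, benchmark_returns):
    """TE = (1/T) * sum_t (portfolio_return_t - benchmark_return_t)^2."""
    total = 0.0
    for y_t, r_t in zip(asset_returns, benchmark_returns):
        portfolio_return = sum(w * y for w, y in zip(weights, y_t))
        total += (portfolio_return - r_t) ** 2
    return total / len(benchmark_returns)


def replicate_index(asset_returns, benchmark_returns, iterations=2000):
    """Projected gradient descent on TE, starting from equal weights."""
    n_periods, n_assets = len(asset_returns), len(asset_returns[0])
    # Trace bound on the largest eigenvalue of the Hessian (2/T) * Y^T Y,
    # so a step of 1/lipschitz never increases the objective.
    lipschitz = 2.0 / n_periods * sum(y * y for row in asset_returns for y in row)
    step = 1.0 / lipschitz
    weights = [1.0 / n_assets] * n_assets
    for _ in range(iterations):
        grad = [0.0] * n_assets
        for y_t, r_t in zip(asset_returns, benchmark_returns):
            residual = sum(w * y for w, y in zip(weights, y_t)) - r_t
            for i, y in enumerate(y_t):
                grad[i] += 2.0 / n_periods * residual * y
        weights = project_to_simplex([w - step * g for w, g in zip(weights, grad)])
    return weights


# Synthetic data (hypothetical): the benchmark is an exact mix of three assets.
true_weights = [0.5, 0.3, 0.2]
asset_returns = [
    [0.01 * math.sin(0.7 * t + i) + 0.002 * i for i in range(3)]
    for t in range(60)
]
benchmark_returns = [
    sum(w * y for w, y in zip(true_weights, row)) for row in asset_returns
]
weights = replicate_index(asset_returns, benchmark_returns)
```

The simplex projection enforces both listed constraints at once, since non-negative weights that sum to one cannot exceed one. A production solver would more likely hand the full problem, neutrality bounds included, to a quadratic programming library.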
Geek+: Building a Moat with Full-Stack Technology to Tap the Golden Track of Warehouse Automation
Sou Hu Cai Jing· 2025-07-02 09:30
Company Overview
- Beijing Geek+ Technology Co., Ltd. (referred to as "Geek+") is running its IPO from today until July 4, 2025, and plans to list on the Hong Kong Stock Exchange on July 9, 2025 [2]
- The company plans to issue 140,353,000 H-shares at HKD 16.80 per share, raising approximately HKD 2.358 billion [2]
- Geek+ has attracted four cornerstone investors, who have collectively subscribed for USD 91.3 million (approximately HKD 716.7 million) [2]

Technology and Innovation
- Geek+ has developed a comprehensive technology stack covering hardware, software, and algorithms, creating a significant technological moat [3]
- The company introduced laser-vision fusion SLAM technology with an average positioning accuracy within ±10 mm, leading the industry [4]
- The Hyper+ core algorithm platform is one of the most advanced in the AMR market, optimizing resource allocation and maximizing cost efficiency [5]
- Geek+ has created the world's first universal robot technology platform, Robot Matrix, improving R&D efficiency by over 30% [6][7]
- The company had filed over 2,000 patents by 2024, and its PopPick solution leads globally in compatibility and throughput efficiency [8]

Market Landscape
- The global AMR market is projected to grow from CNY 38.7 billion in 2024 to CNY 162.1 billion by 2029, a CAGR of 33.1% [10]
- The penetration rate of AMR in warehouse automation is expected to rise from 4.4% in 2020 to 20.2% in 2029 [10]
- Key growth drivers include the booming e-commerce sector, rising demand for logistics automation, and the need for manufacturing efficiency [13]
- AMR robots are used across industries including logistics, manufacturing, healthcare, and food service [14]

Competitive Advantages
- Geek+ has established a global service network and collaborates with partners such as Bosch Rexroth and Mujin, creating a complete ecosystem from hardware to systems [18]
- The company has received strategic investments from firms such as Warburg Pincus, Ant Group, and Intel, with net proceeds of approximately HKD 2.206 billion allocated to R&D and market expansion [19]
- Geek+ maintains a leading market share in the AMR sector, with revenue rising from CNY 790 million in 2021 to CNY 2.41 billion in 2024, a CAGR of 45% [23]
- The company has a customer repurchase rate of 74.6%, indicating strong client retention and satisfaction [24]

Industry Outlook
- The intelligent logistics automation industry is growing rapidly, with favorable policies supporting technological innovation and adoption [15]
- Advances in AI, machine learning, computer vision, and IoT are improving AMR robot performance and functionality [16]
- The global labor shortage and the decline of China's demographic dividend are driving the shift toward automation, with Geek+ solutions reducing labor needs by 65% [17]
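The two compound annual growth rates quoted above can be checked directly from their endpoints. The sketch below uses only figures from the summary; the helper name `cagr` is an illustrative choice, and the number of compounding years is assumed to be the difference between the end and start years.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Global AMR market, CNY billions: 38.7 (2024) to 162.1 (2029).
market_cagr = cagr(38.7, 162.1, 2029 - 2024)

# Geek+ revenue, CNY billions: 0.79 (2021) to 2.41 (2024).
revenue_cagr = cagr(0.79, 2.41, 2024 - 2021)

print(f"Market CAGR:  {market_cagr:.1%}")
print(f"Revenue CAGR: {revenue_cagr:.1%}")
```

Both results land within rounding distance of the quoted 33.1% and 45%, which suggests the summary's figures are internally consistent.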
Why Are Experts' Societal Forecasts So Often Wrong?
Hu Xiu· 2025-07-01 13:34
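The core mission of social science is to discover, describe, and explain the patterns by which human societies operate and change over time. Yet curiosity and concern about the future are almost as strong as the drive to explain.

In the past, constrained by data and models, social scientists could often offer only cautious inferences about macro-level social trends. Today, with advances in machine learning and artificial intelligence, how to use big data to make accurate societal predictions has become a frontier topic in social science (Chen Yunsong et al., 2020; Lundberg et al., 2022).

Before discussing how much of a breakthrough algorithms and data can bring to societal prediction, however, a more basic question needs testing: without relying on data, drawing only on expertise and experience, can social scientists make reasonably accurate judgments about future societal change? Because expert opinion is often used to inform policymaking and carries weight in public discourse, assessing the accuracy of these predictions is especially important.

To address this question, The Forecasting Collaborative ran an experiment reported in "Insights into the Accuracy of Social Scientists' Forecasts of Societal Change," published in Nature Human Behaviour in 2022, and reached some very interesting conclusions.

The Forecasting Collaborative focuses on ...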
Sebastian Raschka's Book Is Now Free to Read! "Machine Learning Q and AI: 30 Essential Questions and Answers," for Beginners and Experts Alike
机器之心· 2025-07-01 05:01
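Report by 机器之心 | Editor: Du Wei

Sebastian Raschka, a well-known AI blogger and author of Python Machine Learning, is giving back to the community again!

Today he announced that, with summer internships and technical interviews under way, all 30 chapters of his book Machine Learning Q and AI: 30 Essential Questions and Answers are now freely available. He hopes it will help readers and wishes those interviewing good luck.

The print edition (with e-book) lists at $49.99 (about 358 yuan), and the e-book alone lists at $39.9 (about 286 yuan).

Machine learning and artificial intelligence are advancing at an unprecedented pace, and researchers and practitioners often struggle to keep up with the stream of new concepts and techniques.

The book offers bite-sized essentials for the journey from machine learning beginner to expert, covering topics across multiple areas. Even experienced machine learning researchers and practitioners will find something new to add to their toolkit.

Someone in the comments asked, "Was this book written with AI?" Sebastian replied that of course it was not, since doing so would go against his personal ethics. Interestingly, most of the book was written in the months before the first version of ChatGPT was released in November 2022. It first appeared on LeanPub and was later published by No Starch Press in 2024. The book may once have been ChatGPT ...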
Understanding Data Annotation in One Article: Definition, Best Practices, Tools, Benefits, Challenges, Types, and More
3 6 Ke· 2025-07-01 02:20
Group 1
- Data annotation matters for AI and ML because attaching meaningful labels to raw data is what lets machines recognize patterns and make predictions [2][5]
- According to MIT, 80% of data scientists spend over 60% of their time preparing and annotating data rather than building models, which underscores the foundational role of data annotation in AI [2][5]
- Data annotation is defined as the process of labeling data (text, images, audio, video, or 3D point cloud data) so that machine learning algorithms can process and understand it [3][5]

Group 2
- The data annotation field is evolving rapidly and significantly affects AI development. Trends include annotated images and LiDAR data for autonomous vehicles and labeled medical images for healthcare AI [5][6]
- The global data annotation tools market is projected to reach $3.4 billion by 2028, with a compound annual growth rate of 38.5% from 2021 to 2028 [5][6]
- AI-assisted annotation tools can reduce annotation time by up to 70% compared with fully manual methods [5][6]

Group 3
- The quality of an AI model depends heavily on the quality of its training data. Well-annotated data lets models recognize patterns and make accurate predictions [5][6]
- According to IBM research, a 5% improvement in annotation quality can yield a 15-20% increase in model accuracy on complex computer vision tasks [5][6]
- Organizations typically spend between $12,000 and $15,000 per month on data annotation services for medium-sized projects [5][6]

Group 4
- Currently, 78% of enterprise AI projects use a combination of internal and outsourced annotation services, up from 54% in 2022 [5][6]
- Emerging techniques such as active learning and semi-supervised annotation can reduce annotation costs by 35-40% for early adopters [5][6]
- The annotation workforce has shifted significantly, with 65% of annotation work now done in specialized centers in India, the Philippines, and Eastern Europe [5][6]

Group 5
- Data annotation types include image, audio, video, and text annotation, each requiring specific techniques to train machine learning models effectively [9][11][14][21]
- The annotation process runs from data collection through quality assurance, so that the labeled data used in machine learning applications is accurate and of high quality [32][37]
- Best practices include providing clear instructions, optimizing the annotation workload, and complying with privacy and ethical standards [86][89]
Machine Learning Factor Stock Selection Monthly Report (July 2025)-20250630
Southwest Securities· 2025-06-30 04:35
Quantitative Factor and Model Analysis

Quantitative Models and Construction

1. **Model Name**: GAN_GRU Model

**Model Construction Idea**: The GAN_GRU model combines Generative Adversarial Networks (GAN), which generate realistic price-volume sequential features, with Gated Recurrent Units (GRU), which encode these sequential features into predictive signals for stock selection [2][9].

**Model Construction Process**:
- **GRU Component**:
  - Input features include 18 price-volume features such as closing price, opening price, turnover, and turnover rate [10][13].
  - Training data consists of the past 400 trading days' features, sampled every 5 trading days, forming a 40x18 feature matrix to predict the cumulative return over the next 20 trading days [14].
  - Data preprocessing includes outlier removal and standardization at both time-series and cross-sectional levels [14].
  - The GRU network consists of two layers (GRU(128, 128)) followed by an MLP (256, 64, 64), with the final output being the predicted return (pRet) [18].
- **GAN Component**:
  - The generator (G) uses an LSTM model to preserve the sequential nature of the input features, while the discriminator (D) employs a CNN to process the two-dimensional price-volume feature "images" [29][32].
  - The generator's loss function is:

    $$ L_{G} = -\mathbb{E}_{z\sim P_{z}(z)}[\log(D(G(z)))] $$

    where \( z \) represents random noise, \( G(z) \) is the generated data, and \( D(G(z)) \) is the discriminator's output probability [20][21].
  - The discriminator's loss function is:

    $$ L_{D} = -\mathbb{E}_{x\sim P_{data}(x)}[\log D(x)] - \mathbb{E}_{z\sim P_{z}(z)}[\log(1-D(G(z)))] $$

    where \( x \) is real data, \( D(x) \) is the discriminator's output for real data, and \( D(G(z)) \) is the output for generated data [23][25].
  - Training alternates between updating the discriminator and generator parameters until convergence [26].
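The two loss functions above can be evaluated on a batch of discriminator outputs as follows. This is a minimal sketch under stated assumptions: the expectations are approximated by batch means, the probabilities are supplied directly rather than produced by the LSTM generator and CNN discriminator, and the function names and the small `eps` guard against `log(0)` are illustrative additions that are not in the report.

```python
import math


def generator_loss(d_fake, eps=1e-12):
    """L_G = -E_z[log D(G(z))], estimated as a batch mean.

    d_fake: discriminator probabilities D(G(z)) for generated samples.
    """
    return -sum(math.log(p + eps) for p in d_fake) / len(d_fake)


def discriminator_loss(d_real, d_fake, eps=1e-12):
    """L_D = -E_x[log D(x)] - E_z[log(1 - D(G(z)))], estimated as batch means.

    d_real: discriminator probabilities D(x) for real samples.
    d_fake: discriminator probabilities D(G(z)) for generated samples.
    """
    real_term = -sum(math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# Hypothetical discriminator outputs for one batch.
d_real = [0.9, 0.8, 0.95]
d_fake = [0.2, 0.1, 0.3]

loss_g = generator_loss(d_fake)
loss_d = discriminator_loss(d_real, d_fake)
```

With these inputs the discriminator separates real from generated samples well, so `loss_d` is small and `loss_g` is large. As the generator improves and `d_fake` rises toward 1, `loss_g` falls while `loss_d` grows, which is the tension the alternating updates exploit.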
**Model Evaluation**: The GAN_GRU model effectively captures both sequential and cross-sectional price-volume features, leveraging the strengths of GANs and GRUs for stock selection [2][9][29].

---

Quantitative Factors and Construction

1. **Factor Name**: GAN_GRU Factor

**Factor Construction Idea**: The GAN_GRU factor is derived from the GAN_GRU model's output, representing the encoded price-volume sequential features as a stock selection signal [2][9].

**Factor Construction Process**:
- The factor is derived from the predicted return (pRet) output of the GAN_GRU model [18].
- The factor undergoes industry and market capitalization neutralization, followed by standardization [18].

**Factor Evaluation**: The GAN_GRU factor demonstrates strong predictive power across various industries, with consistent performance in both IC and excess returns [36][40].

---

Model Backtest Results

1. **GAN_GRU Model**:
   - **IC Mean**: 11.54%
   - **ICIR**: 0.89
   - **Turnover Rate**: 0.83
   - **Recent IC**: 8.34%
   - **1-Year IC Mean**: 11.09%
   - **Annualized Return**: 37.71%
   - **Annualized Volatility**: 24.95%
   - **IR**: 1.56
   - **Max Drawdown**: 27.29%
   - **Annualized Excess Return**: 24.95% [36][37].

---

Factor Backtest Results

1. **GAN_GRU Factor**:
   - **IC Mean**: 11.54%
   - **ICIR**: 0.89
   - **Turnover Rate**: 0.83
   - **Recent IC**: 8.34%
   - **1-Year IC Mean**: 11.09%
   - **Annualized Return**: 37.71%
   - **Annualized Volatility**: 24.95%
   - **IR**: 1.56
   - **Max Drawdown**: 27.29%
   - **Annualized Excess Return**: 24.95% [36][37].