Large Language Models
 US Stock Movers | Alibaba briefly rises over 2.8% as Quark AI glasses pre-sale approaches
 Ge Long Hui· 2025-10-23 14:28
 Core Viewpoint
 - Alibaba's stock (BABA.US) experienced an intraday increase of over 2.8%, reaching a peak of $170.6, driven by the announcement of the pre-sale of its Quark AI glasses [1]

 Product Launch
 - Alibaba's Quark AI glasses will begin pre-sale at midnight on the 24th, with a starting price of 3,699 yuan [1]
 - The glasses are powered by Alibaba's self-developed Qwen large language model and Quark AI assistant, featuring functionalities such as hands-free calls, music playback, and real-time translation [1]
 - The product is expected to start shipping in December [1]
 Silicon Valley prophet Kevin Kelly: embracing the AI era with "protopia" thinking
 21 Shi Ji Jing Ji Bao Dao· 2025-10-23 12:50
Southern Finance / 21st Century Business Herald, reporter Wu Bin, Shanghai (original title: Silicon Valley prophet Kevin Kelly: embracing the AI era with "protopia" thinking). "If humanity pessimistically assumes the future will spiral out of control, it may already have lost in advance," Kevin Kelly, founding executive editor of Wired magazine and futurist, told reporters as the AI era approaches. At the recent 30th-anniversary celebration of the CEIBS EMBA program, Kelly repeatedly stressed the importance of optimism: although humans cannot foresee the future, if we want great things to happen we must first imagine them, believe in them, and then turn that imagination into reality. "In the AI era, humanity should be as optimistic as possible." Before founding Wired, Kelly was a roving photojournalist and travel writer, an experience that shaped his broad, cross-disciplinary perspective. His book Out of Control was written at the dawn of the internet's popularization yet accurately foresaw the world decades later. In 1994, in Out of Control, he proposed ideas such as decentralization, systemic symbiosis, and networked collaboration, concepts that later aligned closely with key features of the internet economy such as cloud computing, the Internet of Things, and online communities. His books Out of Control, The Inevitable, and What Technology Wants form a complete body of thought, earning him the reputation of "the tech world's prophet." In an age when technology triggers widespread anxiety, he offers a rare, biologically grounded calm optimism. He peddles neither panic nor utopia: "The future ...
 Now the AI best at making money is Qwen3: six global models battle it out, and the Top 2 come from China
 36 Ke· 2025-10-23 12:49
 Core Insights
 - Qwen3 Max has emerged as the leading model in the AI trading competition, surpassing DeepSeek and achieving significant profitability [1][32]
 - The competition, Alpha Arena, showcases the capabilities of various AI models in real market conditions, emphasizing the financial market as a training ground for AI [30][32]

 Performance Summary
 - Qwen3 Max achieved a return of +44.38%, with an account value of $14,438 and total profit of $4,438 [11]
 - DeepSeek V3.1 follows with a return of +20.92%, account value of $12,092, and total profit of $2,092 [11]
 - Other models, such as Claude 4.5 Sonnet, Grok 4, Gemini 2.5 Pro, and GPT-5, reported negative returns, with GPT-5 showing the largest loss at -71.48% [10][11]

 Competition Dynamics
 - The competition began on October 18 and has seen Qwen3 Max steadily improve its position, particularly after a significant drop in all models on October 22 [22][24]
 - Qwen3 Max's strategy has been characterized as "quick and precise," allowing it to capitalize on market opportunities effectively [8][32]
 - The competition has highlighted the contrasting performance of models, with Qwen3 Max and DeepSeek being the only two models consistently performing well [22][24]

 Market Implications
 - The success of Qwen3 Max indicates the growing competitiveness of Chinese AI models in the global market, particularly in high-risk financial environments [33]
 - The Alpha Arena competition serves as a demonstration of how AI can adapt and thrive in real-world financial scenarios, reinforcing the notion that financial markets are ideal for AI training [30][32]
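The returns above are simple percentage returns. A minimal sketch that recomputes them, assuming every model started from the same $10,000 account (consistent with the account values and profits quoted, e.g. $14,438 = $10,000 + $4,438):

```python
# Recompute Alpha Arena returns from the account values quoted above.
# Assumption: each model's account started at $10,000.

INITIAL = 10_000  # assumed starting balance per model

accounts = {
    "Qwen3 Max": 14_438,
    "DeepSeek V3.1": 12_092,
}

for model, value in accounts.items():
    profit = value - INITIAL
    ret = profit / INITIAL * 100  # simple return, in percent
    print(f"{model}: profit ${profit:,}, return {ret:+.2f}%")
```

Running this reproduces the +44.38% and +20.92% figures in the summary.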
 $68 million: Amazon announces its AI PhD fellowship recipients, including alumni of Tsinghua, Peking University, and Shanghai Jiao Tong University
 机器之心· 2025-10-23 07:45
 Group 1
 - Amazon has announced the recipients of its AI PhD Scholarship, funding over 100 PhD students from nine universities to research machine learning, computer vision, and natural language processing [1]
 - The participating universities include CMU, Johns Hopkins University, MIT, Stanford University, UC Berkeley, UCLA, University of Illinois Urbana-Champaign, University of Texas at Austin, and University of Washington [1]
 - The program will provide $10 million in funding for each of the 2025-2026 and 2026-2027 academic years, along with an additional $24 million in Amazon Web Services (AWS) cloud credits each year, totaling $68 million over two years [2]

 Group 2
 - Several universities have already announced their selected PhD candidates, including notable Chinese scholars [3]
 - Jenny Huang from MIT focuses on data-driven machine learning and uncertainty quantification [4][6]
 - David Jin from MIT is interested in scalable computing and AI-driven decision systems [8][6]
 - Songyuan Zhang from MIT is researching safe multi-agent systems and intelligent assistive robots [11][6]

 Group 3
 - Yuxiao Qu from CMU aims to endow AI agents with human-like curiosity to advance scientific research [12][14]
 - Danqing Wang from CMU is working on integrating safety and functionality into training for reliable AI agents [15][17]
 - Mengdi Wu from CMU focuses on machine learning for optimizing computational kernel strategies [18][20]

 Group 4
 - Dacheng Li from UC Berkeley is developing efficient AI and artificial worlds through visual and text generation models [34][36]
 - Hao Wang from UC Berkeley is researching practical secure code generation through controlled reasoning [37][39]
 - Melissa Pan from UC Berkeley is interested in sustainability in large-scale machine learning and data center systems [40][42]

 Group 5
 - Haoyu Li from UT Austin is utilizing AI to enhance modern system performance and availability [49][51]
 - Junbo Li from UT Austin is focused on agentic large language models and reinforcement learning [52][54]
 - Kaizhao Liang from UT Austin is researching efficient training methods and sparse neural networks [56][58]

 Group 6
 - Zeping Liu from UT Austin is advancing geospatial AI research with a focus on geographic foundational models [59][61]
 - Haoran Xu from UT Austin is expanding reinforcement learning methods and integrating generative AI [62][64]
 - Chutong Yang from UT Austin is interested in algorithm design and analysis in trustworthy machine learning [65][67]

 Group 7
 - Xiao Zhang from UT Austin is focusing on networked and distributed systems to achieve predictable AI performance in 5G edge environments [68][69]
 - The list of awardees will continue to be updated as more universities announce their recipients [70]
 DeepSeek's ultimate ambition: remaking the basic language of large language models into images
 36 Ke· 2025-10-21 12:52
 Core Insights
 - DeepSeek has open-sourced DeepSeek-OCR, an OCR model that achieves state-of-the-art results on benchmarks like OmniDocBench [1]
 - The motivation behind entering the OCR field is to address the computational bottleneck of long context processing in large language models (LLMs) [4][6]
 - The paper proposes that text information can be efficiently compressed through optical 2D mapping, allowing visual language models (VLMs) to decompress original information from images [4][6]

 Group 1: Long Context Processing
 - The pursuit of longer context in LLMs has led to a competitive arms race, with token windows expanding from thousands to millions [7]
 - The core limitation arises from the attention mechanism in the Transformer architecture, where computational complexity and memory usage grow quadratically with sequence length [7]
 - DeepSeek-AI's engineers pose a fundamental question: can the number of tokens itself be compressed, rather than just optimizing attention calculations? [7][10]

 Group 2: Visual Tokens vs. Text Tokens
 - Visual tokens are the basic units of information processed by visual models, while text tokens are used by LLMs [8]
 - A 1024x1024 image can be divided into 4096 visual tokens, significantly reducing the number of tokens needed compared to text representation [9]
 - The insight that visual modalities can serve as efficient compression mediums for text information led to the creation of DeepSeek-OCR [9]

 Group 3: DeepEncoder and Compression Techniques
 - DeepSeek-OCR is essentially a proof of concept for an "optical compression-decompression" system [10]
 - The DeepEncoder, a key innovation, is designed to handle high-resolution inputs while producing minimal visual tokens [11][12]
 - The architecture consists of three stages: a local detail processor, a compression module, and a global attention layer [14][16]

 Group 4: Performance Metrics
 - Experimental results show a 10.5x compression rate with 64 visual tokens decoding 600-700 text tokens, achieving an OCR accuracy of 96.5% [17][18]
 - At a 20x compression rate, the model maintains around 60% accuracy while decoding over 1200 text tokens [17][18]
 - DeepSeek-OCR outperforms existing models like GOT-OCR2.0 and MinerU2.0 in terms of performance and token efficiency [19][20]

 Group 5: Future Vision and Memory Simulation
 - The team aims to simulate human memory's forgetting mechanism, which naturally prioritizes relevant information while compressing less important details [25][27]
 - The multi-resolution design of DeepSeek-OCR provides a technical foundation for managing memory in a way that mimics human cognitive processes [29][30]
 - The ultimate goal is to create a system that balances information retention and computational efficiency, potentially leading to a new paradigm in AI memory and input systems [32][35]
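The token arithmetic above can be made concrete. A minimal sketch, assuming a ViT-style encoder that splits the image into non-overlapping 16-pixel patches (the patch size implied by a 1024x1024 image yielding 4096 tokens) and using the 64-token/~672-text-token operating point quoted for the 10.5x compression figure:

```python
# Illustrative token arithmetic for optical text compression.
# Assumption: a plain ViT-style patch encoder with 16-pixel patches.

def visual_tokens(width: int, height: int, patch: int = 16) -> int:
    """Number of patch tokens a ViT-style encoder would emit for an image."""
    return (width // patch) * (height // patch)

def compression_ratio(text_tokens: int, vis_tokens: int) -> float:
    """How many text tokens each visual token stands in for."""
    return text_tokens / vis_tokens

print(visual_tokens(1024, 1024))   # 4096 patch tokens for a 1024x1024 image
print(compression_ratio(672, 64))  # ~672 text tokens from 64 visual tokens -> 10.5x
```

Note that DeepSeek-OCR's DeepEncoder compresses well below the raw 4096-patch count; the figures above only illustrate the ratios quoted in the article.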
 Decoding AI from the brain: a conversation with neural network pioneer Terrence Sejnowski
 晚点LatePost· 2025-10-21 03:09
 Core Insights
 - The article discusses the evolution of artificial intelligence (AI) and its relationship with neuroscience, highlighting the contributions of key figures like Terrence Sejnowski and Geoffrey Hinton in the development of deep learning and neural networks [3][4][5].

 Group 1: Historical Context and Contributions
 - The collaboration between Sejnowski and Hinton in the 1980s led to significant advancements in AI, particularly through the introduction of the Boltzmann machine, which combined neural networks with probabilistic modeling [3][4].
 - Sejnowski's work laid the foundation for computational neuroscience, influencing various AI algorithms such as multi-layer neural networks and reinforcement learning [5][6].

 Group 2: The Impact of Large Language Models
 - The emergence of ChatGPT and other large language models has transformed perceptions of AI, demonstrating the practical value of neural network research [4][6].
 - Sejnowski's recent publications, including "The Deep Learning Revolution" and "ChatGPT and the Future of AI," reflect on the journey of AI from its inception to its current state and future possibilities [6][10].

 Group 3: Collaboration with AI
 - Sejnowski utilized ChatGPT in writing his book "ChatGPT and the Future of AI," highlighting the model's ability to summarize and simplify complex concepts for broader audiences [9][10].
 - The interaction between users and large language models is described as a "mirror effect," where the quality of responses depends on the user's input and understanding [11][12].

 Group 4: Neuroscience and AI Memory
 - Current AI models exhibit limitations in memory retention, akin to human amnesia, as they lack long-term memory capabilities [13][14].
 - The article draws parallels between human memory systems and AI, emphasizing the need for advancements in understanding the brain to improve AI memory functions [13][14].

 Group 5: Future Directions in AI and Neuroscience
 - The development of neuromorphic chips, which mimic the functioning of neurons, presents a potential shift in AI technology, promising lower energy consumption and higher performance [19][20].
 - The article suggests that the future of AI may involve a transition from digital to analog computing, similar to the evolution from gasoline to electric vehicles [20][21].

 Group 6: The Role of Smaller Models
 - There is a growing debate on the effectiveness of smaller, specialized models compared to larger ones, with smaller models being more practical for specific applications [35][36].
 - The quality of data is emphasized as a critical factor in the performance of AI models, with smaller models having the potential to reduce biases and errors [36][37].

 Group 7: Regulatory Perspectives
 - The article discusses the importance of self-regulation within the scientific community to manage AI risks, rather than relying solely on government intervention [30][34].
 - It highlights the need for a balanced approach to AI development, weighing the benefits against potential risks while fostering innovation [30][34].
 ByteDance's Seed restructures again: Zhu Wenjia now reports to Wu Yonghui
 Xi Niu Cai Jing· 2025-10-21 02:22
 Group 1
 - The reporting structure for Zhu Wenjia, the former head of ByteDance's Seed large model team, has changed from CEO Liang Rubo to the current head of Seed, Wu Yonghui [2]
 - Earlier this year, ByteDance recruited Wu Yonghui from Google, where he was the Vice President of Research at DeepMind, leading to structural adjustments within the large model team [2]
 - Several algorithm and technology leaders who previously reported to Zhu Wenjia have shifted to report to Wu Yonghui, while Zhu Wenjia has transitioned to focus on model applications [2]

 Group 2
 - The Seed team has undergone multiple adjustments, including the dismissal of Qiao Mu, the head of the large language model, due to personal misconduct [2]
 - Yang Jianchao, the head of the visual large model, has announced a break, and AiLab director Li Hang has retired but has been rehired [2]
 - ByteDance's Flow division has also experienced significant organizational changes, with Zhao Qi moving to the Spring product department and reporting directly to Zhu Jun [2]
 The Financial Management Committee of the China Association of Chief Financial Officers successfully holds its 2025 Autumn Forum
 Xin Jing Bao· 2025-10-21 02:08
 Core Insights
 - The forum focused on the transformation of financial management in the era of artificial intelligence, emphasizing the shift from traditional accounting to value creation and proactive risk management [1][2].

 Group 1: Forum Overview
 - The "2025 Autumn Forum" was successfully held in Beijing, organized by the China Association of Chief Financial Officers, with a theme centered on "Deep Language Models (DeepSeek) and Penetrative Financial Management" [1].
 - Keynote speeches highlighted the importance of deep learning models in reshaping financial management practices across various sectors, including state-owned enterprises and financial institutions [2][3].

 Group 2: Key Presentations
 - Experts from different fields shared insights on the application of technology in financial risk management, with a focus on proactive measures rather than mere compliance [3][4].
 - The presentations included practical applications of DeepSeek in financial scenarios such as intelligent reconciliation, risk warning, and cash flow forecasting [3][4].

 Group 3: Roundtable Discussion
 - A roundtable discussion addressed the challenges and opportunities in AI-driven financial management, emphasizing the need for high-quality data and skilled professionals [5][6].
 - Participants discussed the significance of contract-based cash flow management in enhancing overall funding efficiency within organizations [6].

 Group 4: Future Directions
 - The forum concluded with a call for continued collaboration among industry peers to leverage deep learning technologies for greater value creation in financial management [7].
 - Ningbo Bank expressed its commitment to fostering partnerships and developing a new ecosystem for intelligent finance in the era of big models [7].
 Just in: a major DeepSeek breakthrough breaks the context-length shackles on large models
 36 Ke· 2025-10-20 23:22
 Core Insights
 - DeepSeek has introduced a novel technology path in the competition of large language models by open-sourcing the DeepSeek-OCR model, which proposes the concept of "Contextual Optical Compression" for efficient information compression through text-to-image conversion [1][8].

 Group 1: Model Performance and Capabilities
 - The feasibility of DeepSeek-OCR has been validated, achieving a decoding accuracy of 97% at a 10x compression ratio, indicating near-lossless compression, while maintaining approximately 60% accuracy at a 20x compression ratio [3][21].
 - DeepSeek-OCR can express similar textual content using fewer tokens by converting text tokens into visual tokens, providing a new approach to address the high computational costs associated with processing long texts in large language models [6][11].
 - In practical applications, DeepSeek-OCR surpassed GOT-OCR 2.0 using only 100 visual tokens and outperformed MinerU 2.0 with fewer than 800 visual tokens, demonstrating its efficiency [6][23].

 Group 2: Technical Architecture
 - The architecture of DeepSeek-OCR consists of two main components: DeepEncoder, a visual encoder designed for high compression and high-resolution document processing, and DeepSeek3B-MoE, a lightweight mixture-of-experts language decoder [12][18].
 - DeepEncoder employs a dual-structure design combining local and global attention to achieve high-fidelity visual understanding, significantly reducing the number of vision tokens generated from document images [14][18].

 Group 3: Data and Training
 - DeepSeek-OCR's training process is relatively straightforward, involving independent training of DeepEncoder and the complete DeepSeek-OCR model, utilizing a large dataset for effective learning [20][21].
 - The model has been trained on a diverse dataset that includes OCR 1.0 and OCR 2.0 data, general visual data, and pure text data, ensuring robust performance across various document types [25][36].

 Group 4: Application and Future Directions
 - DeepSeek-OCR demonstrates capabilities in deep parsing, allowing it to recognize and extract structured information from various document types, including financial reports and scientific literature [24][29].
 - The research team plans to further explore the integration of digital and optical text pre-training methods and evaluate the performance of optical compression in real long-text environments, indicating a promising direction for future research [39].
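To make the compression trade-off concrete, here is a hypothetical budget calculation under the two operating points quoted above (~97% decoding accuracy at 10x compression, ~60% at 20x); the document length is illustrative, not from the article:

```python
import math

# Sketch: visual-token budget for representing a long document optically,
# at the two compression ratios quoted above. Illustrative numbers only.

def visual_budget(text_tokens: int, ratio: float) -> int:
    """Visual tokens needed to carry `text_tokens` at a given compression ratio."""
    return math.ceil(text_tokens / ratio)

doc_tokens = 100_000  # hypothetical long-context document, in text tokens

print(visual_budget(doc_tokens, 10))  # near-lossless regime (~97% accuracy)
print(visual_budget(doc_tokens, 20))  # lossy regime (~60% accuracy)
```

The design choice the article highlights is exactly this dial: trading decoding accuracy for a smaller token budget, analogous to how human memory compresses less important detail.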
 Banma Zhixing plans Hong Kong listing; CSRC requires supplementary explanation of equity changes and other matters
 Zhi Tong Cai Jing· 2025-10-20 07:09
On October 18, the China Securities Regulatory Commission (CSRC) published its requirements for supplementary materials for overseas issuance and listing filings (covering October 12-17, 2025), which require Banma Zhixing to provide supplementary explanations of the company's equity changes, business operations, and other matters. According to HKEX disclosures on August 20, Banma Zhixing filed a listing application with the HKEX Main Board, with Deutsche Bank, CICC, and Guotai Junan International as joint sponsors.

The CSRC asks Banma Zhixing to supplement the following matters, with lawyers to verify them and issue clear legal opinions:

1. On equity changes: (1) explain the pricing basis for each capital increase and equity transfer, whether the pricing was fair, whether capital contributions were paid in, and whether there are defects such as unfulfilled contribution obligations, withdrawal of capital, or flawed contribution methods, and issue a conclusive opinion on the compliance of the company's establishment and each equity change; (2) explain the progress of business registration changes for the targeted capital reduction and capital increase in August 2025, the compliance of the reduction procedure, the payment of related taxes and fees, and the payment of the reduction consideration.

2. Explain whether the company has state-owned shareholders that should be, but have not yet been, marked as such, and have lawyers issue a clear conclusive opinion on whether state-owned shareholders exist.

3. On business operations: (1) explain the specifics of the company's and its subsidiaries' business scope covering "value-added telecommunications services; market research; surveying and mapping services; advertising via self-owned media; advertising production; advertising publication; advertising design and agency," whether such business is actually conducted and how it operates, and whether the necessary qualifications and licenses have been obtained ...