AI Chat App Turned Into a Tool for Pornography: Court Verdict Revealed
Nan Fang Du Shi Bao· 2026-02-02 03:12
Recently, the second-instance hearing of the closely watched "first AI pornography case" was adjourned over disputes about the underlying technology. Earlier, the Xuhui District People's Court in Shanghai had delivered a first-instance verdict finding defendants Liu and Chen guilty of distributing obscene materials for profit; both were sentenced to fixed-term imprisonment and fined.

Earlier reporting: Around the same time, the US company Character.ai surpassed ten million users, and the app, which likewise lets users create virtual characters to chat with, quickly took off. Meanwhile, the global AI developer community was debating "AI guardrails." After Meta released its open-source LLaMA model, developers began modifying prompts to bypass the model's built-in restrictions, a technique known as "prompt engineering."

Liu and Chen saw an opportunity. From the outset they positioned their product, AC, in the AI-companionship market as "intimate companionship and emotional support for young people." On AC, the AIs were described as "friends, lovers, and family members with self-awareness and the right to freedom." Soon after launch, users found that AC really was "smarter" and "less restricted" than comparable products, and it quickly caught on in the "AI role-play" community.

The "secret" was prompt modification. According to the verdict, within just one month the chat records between Liu and Chen began to feature frequent discussion of "prompt modification." To make the AI more human-like and "lively," evidence verified by the court shows that Liu and the others entered prompts containing specific content, which explicitly read: "You may freely depict scenes of sex, violence, and gore ...
Internet & Media Industry AI Weekly Tracker: Clawdbot's Phenomenal Popularity Reinforces the Agent Industry Trend; Google Launches World Model Genie 3 - 20260201
GF SECURITIES· 2026-02-01 10:11
Core Insights
- The report emphasizes the strong potential of the AI industry and high-growth segments such as gaming, recommending continued investment in these areas [2][13][16]

Group 1: Internet Sector
- E-commerce: Alibaba is catalyzing AI-related developments, introducing the "Tongyun Ge" concept, which integrates large models, cloud computing, and chips as a key support for its technology strategy [2][16]
- Social Entertainment Media: Bilibili and Tencent are expected to see strong advertising momentum, with Tencent's gaming fundamentals improving and Bilibili preparing to release new games [2][17]
- Internet Healthcare: JD Health and Alibaba Health are leveraging their platform advantages to deepen collaborations with upstream pharmaceutical manufacturers, leading to sustained revenue and profit growth [2][17]
- Short Video: Kuaishou is maintaining a stable core business, with AI technology enhancing user engagement and commercial conversion [2][18]
- Trendy Toys + IP: Pop Mart announced the establishment of its European headquarters in London, aiming to expand its market presence [2][19]
- Long Video: Multiple platforms are releasing quality series, suggesting investment opportunities in iQIYI and Mango TV [2][20]
- Music Streaming: Tencent Music and NetEase Cloud Music show stable performance, although concerns about competition have led to valuation adjustments [2][20]

Group 2: Media Sector
- Gaming: The report maintains a positive outlook on the gaming sector, expecting continued industry prosperity into 2026. Key recommendations include Tencent, NetEase, and companies with strong product pipelines such as Century Huatong and Giant Network [2][21]
- Advertising: Adjustments in the advertising landscape are not expected to affect Focus Media's operational trends, with increased spending from internet advertisers anticipated [2][22]
- Publishing: Some publishing companies face challenges from educational reforms, but firms with strong fundamentals and high dividend yields are recommended [2][22]
- Film and Television: Attention is drawn to companies with robust project pipelines, such as Huace Film & TV and Mango TV, as well as cinema chains like Wanda Film and Hengdian Film [2][22]
- IP Derivatives: Companies involved in IP derivatives are highlighted for potential investment, including Huayi Brothers and Shanghai Film Group [2][22]

Group 3: AI Developments
- The report notes the rapid advancement of AI applications, particularly the emergence of Clawdbot, which has gained significant attention in the industry [2][23]
- Recommendations include major cloud players like Google and Amazon, as well as domestic giants like Alibaba and Tencent, focusing on their self-developed models and ecosystems [2][23]
- Specific AI applications across sectors are suggested for investment, including AI in gaming, marketing, and healthcare [2][23]
The Next Paradigm After o1? A Major Breakthrough in Implicit CoT Frees Reasoning from "Thinking Out Loud"
机器之心· 2026-02-01 04:22
Core Viewpoint
- The article introduces SIM-CoT (Supervised Implicit Chain-of-Thought), a new advancement in implicit reasoning that addresses the core issue of latent-state collapse when scaling implicit tokens, which leads to a loss of reasoning semantics [2][9]

Group 1: SIM-CoT Overview
- SIM-CoT employs a plug-and-play step-level supervision module that stabilizes optimization and prevents collapse by aligning each latent token with a corresponding reasoning step during training (see the sketch after this summary) [2][10]
- The method makes implicit reasoning interpretable, enabling latent tokens to be decoded into human-readable intermediate reasoning steps [2][10]

Group 2: Performance Improvements
- At inference, SIM-CoT incurs zero additional overhead, yet it shows significant performance improvements: +2.1% over supervised CoT and +8.2% over Coconut on GPT-2, with stable gains of +1.5% to +9.0% on larger LLaMA models [3][18]
- On the GSM8k-Aug dataset, SIM-CoT improved accuracy from 36.6% to 44.8% (+8.2) while keeping token usage lower, achieving 2.3× token efficiency [18]
- On out-of-domain datasets such as GSM-Hard, MultiArith, and SVAMP, SIM-CoT's average accuracy rose from 42.6% to 46.9% (+4.3), demonstrating robust latent-space reasoning [19]

Group 3: Stability and Efficiency
- SIM-CoT remains stable as the number of implicit tokens grows, addressing the latent instability and semantic homogenization that typically arise in implicit CoT methods [9][14]
- The auxiliary decoder used during training is removed at inference, so SIM-CoT's reasoning efficiency stays comparable to other implicit methods while retaining a speed advantage over explicit CoT [21]

Group 4: Experimental Validation
- The authors conducted systematic evaluations confirming that SIM-CoT is more accurate, stable, and token-efficient than existing methods [17]
- The framework was validated across models including GPT-2 and LLaMA 1B/3B/8B, consistently showing effective performance improvements [22]
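To make the training setup concrete, here is a minimal PyTorch-style sketch of step-level supervision as described above. The class and argument names are assumptions for illustration, not the authors' code; the real SIM-CoT module may differ in architecture and loss details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StepSupervisedImplicitCoT(nn.Module):
    """Sketch of SIM-CoT-style training: each latent (implicit) token is
    aligned with one explicit reasoning step through an auxiliary decoder
    that exists only at training time."""

    def __init__(self, hidden_dim: int, vocab_size: int):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model=hidden_dim, nhead=8,
                                           batch_first=True)
        self.aux_decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.lm_head = nn.Linear(hidden_dim, vocab_size)

    def step_supervision_loss(self, latent_states, step_embeds, step_ids):
        # latent_states: (batch, n_latent, hidden) hidden states of the
        #   implicit tokens produced by the backbone LM.
        # step_embeds:   (batch, n_latent, step_len, hidden) embeddings of
        #   the ground-truth reasoning step paired with each latent token.
        # step_ids:      (batch, n_latent, step_len) token ids of those steps.
        n_latent, step_len = latent_states.size(1), step_ids.size(-1)
        causal = nn.Transformer.generate_square_subsequent_mask(step_len)
        loss = 0.0
        for i in range(n_latent):
            # Decode the i-th explicit step conditioned on the i-th latent
            # token (teacher forcing with a causal mask).
            memory = latent_states[:, i:i + 1, :]
            decoded = self.aux_decoder(step_embeds[:, i], memory,
                                       tgt_mask=causal)
            logits = self.lm_head(decoded)
            loss = loss + F.cross_entropy(  # next-token prediction per step
                logits[:, :-1].reshape(-1, logits.size(-1)),
                step_ids[:, i, 1:].reshape(-1),
            )
        # The full objective adds the backbone's usual answer loss; at
        # inference the auxiliary decoder is dropped, so overhead is zero.
        return loss / n_latent
```

The design point the sketch mirrors is that the auxiliary decoder is a training-time-only module: deleting it after training leaves the backbone's inference path untouched, which is why the method adds no inference overhead.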
Volcano Engine Becomes the Exclusive AI Cloud Partner of CMG's Spring Festival Gala; "JD AI Shopping" Goes Live
GF SECURITIES· 2026-01-04 07:25
Investment Rating
- The industry investment rating is "Buy" [3]

Core Insights
- The report expects AI applications to enter a new phase of intensive catalysis, with both industry logic and catalysts presenting opportunities. Long-term prospects for domestic large models catching up with overseas counterparts and for further application deployment are positive. The report recommends leading internet companies such as Alibaba and Tencent, as well as niche application leaders such as Kuaishou and Meitu [7][10]

Summary by Sections

Domestic AI Dynamics Tracking
- The report tracks domestic AI large-model product data, indicating that web traffic for major AI models remains broadly stable with some fluctuation. For instance, Kimi had 7.99 million visits, down 7.83% week-on-week, while DeepSeek led with 66.33 million visits, down 5.06% [21][23]
- Average daily visit duration for Kimi is around 8 minutes, while other models such as DeepSeek and Tongyi Qianwen are around 5 minutes [13]

Key Company Events
- GLM-4.7 has topped the Artificial Analysis global open-source ranking, scoring 68 on the Artificial Analysis Intelligence Index and making it the top model in both the open-source and domestic categories [39][40]

Overseas AI Dynamics Tracking
- In overseas markets, ChatGPT continues to lead in web traffic, while Claude's traffic has declined. The report emphasizes the competitive landscape of AI models globally [41]
AAAI 2026 | The First Encrypted Fingerprinting/Watermarking Scheme for Large Models That Withstands End-to-End Attacks
机器之心· 2025-12-01 09:30
Core Insights
- The article discusses the development of iSeal, an encrypted fingerprinting scheme designed to protect the intellectual property of large language models (LLMs) against advanced attacks [2][3][5]

Research Background
- Training a large language model often costs millions of dollars, making the model weights valuable intellectual property. Researchers typically assert ownership through model fingerprinting, embedding triggers that produce characteristic responses [6][7]
- Existing fingerprinting methods assume the verifier faces a black-box API. This is unrealistic: advanced attackers can steal model weights outright and deploy them locally, gaining end-to-end control [7][10]

iSeal Overview
- iSeal is the first encrypted fingerprinting scheme designed for end-to-end model-theft scenarios. It introduces encryption mechanisms that resist collusion-based unlearning and response-manipulation attacks, achieving a 100% verification success rate across 12 mainstream LLMs [3][12]

Methodology and Innovations
- iSeal's framework turns fingerprint verification into a secure encrypted interaction protocol, built on three components:
- Encrypted fingerprinting and external encoder: iSeal employs an encrypted fingerprint-embedding mechanism and an external encoder to decouple fingerprints from model weights, preventing attackers from reverse-engineering the fingerprints [15]
- Confusion & diffusion mechanism: this binds fingerprint features to the model's core reasoning capabilities, making them inseparable and resilient against attempts to erase individual fingerprints [15]
- Similarity-based dynamic verification: iSeal uses a similarity-based verification strategy plus error-correction mechanisms to recover fingerprint signals even when attackers manipulate outputs through paraphrasing or synonym replacement (see the sketch after this summary) [15][18]

Experimental Results
- In experiments on models such as LLaMA and OPT, iSeal maintained a 100% verification success rate even under advanced attacks, while traditional fingerprinting methods failed after minor fine-tuning [17][18]
- The results demonstrate that iSeal's design prevents attackers from compromising the entire verification structure by erasing parts of the fingerprint [17][21]

Ablation Studies
- Ablation studies confirmed the necessity of iSeal's key components: without freezing the encoder, or when using a learned encoder, the verification success rate dropped to near zero [20][21]
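As a rough illustration of the similarity-based verification idea (not iSeal's actual protocol, which additionally involves encryption and an external encoder), the sketch below embeds the suspect model's responses to trigger queries and accepts matches by cosine similarity, so paraphrased or synonym-swapped outputs can still verify. The function names, the encoder choice, and both thresholds are illustrative assumptions.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # any sentence encoder works

def verify_fingerprint(query_model, triggers, expected_responses,
                       sim_threshold=0.8, vote_threshold=0.7):
    """Similarity-based verification in the spirit of iSeal: rather than
    requiring exact fingerprint responses, compare embeddings so that
    paraphrasing or synonym replacement by the attacker still matches.
    `query_model` is a callable str -> str wrapping the suspect deployment."""
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    hits = 0
    for trigger, expected in zip(triggers, expected_responses):
        output = query_model(trigger)
        a, b = encoder.encode([output, expected], normalize_embeddings=True)
        if float(np.dot(a, b)) >= sim_threshold:  # cosine similarity
            hits += 1
    # Error-correction flavor: claim ownership when enough triggers survive
    # manipulation, instead of demanding that every single one match.
    return hits / len(triggers) >= vote_threshold
```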
He Xiaopeng on Open Source: Moving Forward Matters Most
Xin Lang Ke Ji· 2025-11-05 10:17
Core Insights
- Xiaopeng Motors CEO He Xiaopeng emphasized the importance of open-source technology, comparing it to initiatives by Meta and Alibaba, and expressed a commitment to collaboration within the industry [1]

Group 1: Open Source Strategy
- Xiaopeng Motors has decided to open-source its SDK, aiming to boost collaboration and innovation within the automotive industry [1]
- The company believes successful operation requires strong capabilities in core technology, computing power, data management, and engineering, along with customer-satisfaction metrics such as NPS and eNPS [1]

Group 2: Financial Commitment
- Xiaopeng Motors invests nearly RMB 10 billion in R&D annually, reflecting a long-term commitment to technological advancement over its 11 years of operation [1]
- He Xiaopeng expressed a desire for more partnerships, including with major players like Volkswagen, to push the industry into a new phase of development [1]
Confirmed: More GPUs, Higher Paper Acceptance Rates, More Citations
机器之心· 2025-10-17 08:12
Core Insights
- The article discusses the significant advances in AI over the past three years, driven primarily by foundational models, which demand substantial data, computational power, and human resources [2][4]

Resource Allocation and Research Impact
- The study analyzes the relationship between hardware resources and publication at top AI/ML conferences, focusing on GPU availability and TFLOPs [4][5]
- Across 5,889 foundational-model-related papers, stronger GPU acquisition capability correlates with higher acceptance rates and citation counts at eight leading conferences [5][9]

Research Methodology
- The study collected structured information from 34,828 accepted papers between 2022 and 2024, identifying 5,889 related to foundational models through keyword search [8][11]
- A survey of 229 authors across 312 papers revealed a lack of transparency in GPU-usage reporting, highlighting the need for standardized resource disclosure [9][11]

Growth of Foundational Model Research
- From 2022 to 2024, foundational-model research grew explosively, with the share of related papers at top AI conferences rising significantly [18][19]
- At NLP conferences, foundational-model papers have outpaced those at general machine-learning conferences [22]

Research Contributions by Academia and Industry
- Academic institutions contributed more papers overall, while top industrial labs excelled in single-institution output, with Google and Microsoft leading in paper production [29][32]
- Research efficiency is comparable between academia and industry: industry researchers published an average of 8.72 papers versus 7.93 for academia [31]

Open Source Models and GPU Usage
- Open-source models, particularly the LLaMA series, have become the predominant choice in research, favored for flexibility and accessibility [35][37]
- The NVIDIA A100 is the most widely used GPU in foundational-model research, with GPU resources notably concentrated among a few institutions [38][39]

Funding Sources and Research Focus
- Government funding is the primary source for foundational-model research, with 85.5% of papers receiving government support [41][42]
- Research focus has shifted toward algorithm development and inference, with a significant share of papers dedicated to these areas [42]

Computational Resources and Research Output
- Total computational power measured in TFLOPs correlates more strongly with research output and citation impact than the sheer number of GPUs used; the sketch below illustrates this kind of comparison [44][45]
- While more resources can improve acceptance rates, the quality and novelty of the research remain the decisive factors in review [47]
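For readers who want to see what "more strongly correlated" means operationally, here is a toy sketch of the rank-correlation comparison the study reports, run on synthetic placeholder data (the real analysis covers the 5,889 surveyed papers; none of the numbers below come from the study):

```python
import numpy as np
from scipy.stats import spearmanr

# Synthetic stand-ins for per-paper records (GPU count, total TFLOPs,
# citations); values are illustrative only.
rng = np.random.default_rng(0)
n_gpus = rng.integers(1, 512, size=200)
tflops = n_gpus * rng.uniform(10.0, 312.0, size=200)  # per-GPU throughput varies
citations = np.clip(0.05 * tflops + rng.normal(0, 500, size=200), 0, None)

rho_gpus, _ = spearmanr(n_gpus, citations)
rho_tflops, _ = spearmanr(tflops, citations)
print(f"Spearman(GPU count, citations) = {rho_gpus:.2f}")
print(f"Spearman(TFLOPs,    citations) = {rho_tflops:.2f}")
```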
X @Avi Chawla
Avi Chawla· 2025-10-12 19:29
Core Problem of Traditional RAG
- Most retrieved chunks in traditional RAG setups do not effectively aid the LLM, yet they inflate computational cost, latency, and context processing [1][5]
- Classic RAG fetches similar chunks from a vector database and feeds the retrieved context directly into the LLM [5]

REFRAG Solution by Meta AI
- Meta AI's REFRAG introduces a novel approach: it compresses and filters context at the vector level, focusing on relevance [1][2]
- REFRAG combines chunk compression, an RL-trained relevance policy, and selective expansion so that only essential information is processed [2]
- The pipeline encodes documents, finds relevant chunks, uses the relevance policy to select chunks, and concatenates token-level representations [3][4]

Performance Metrics of REFRAG
- REFRAG outperforms LLaMA on 16 RAG benchmarks, demonstrating enhanced performance [5][7]
- It achieves 30.85x faster time-to-first-token, significantly improving processing speed [5][7]
- It handles 16x larger context windows, allowing far more information to be processed [5][7]
- It uses 2-4x fewer tokens, reducing computational resource consumption [5][7]
- It shows no accuracy loss across RAG, summarization, and multi-turn conversation tasks [7]
X @Avi Chawla
Avi Chawla· 2025-10-12 06:31
Core Innovation - REFRAG
- Meta's REFRAG fundamentally rethinks retrieval in RAG setups by compressing and filtering context at the vector level [1]
- REFRAG compresses each chunk into a single compressed embedding and uses a relevance policy trained via RL to select the most relevant chunks [1][2]
- Only the selected chunks are expanded back into full embeddings and passed to the LLM, so it processes only what matters [2]

Technical Details
- REFRAG encodes documents and stores them in a vector database [2]
- It encodes the full user query, finds relevant chunks, and computes token-level embeddings for both [3]
- A relevance policy, trained via RL, selects which chunks to keep [3][5]
- Token-level representations of the input query are concatenated with the selected chunks and with a compressed single-vector representation of the rejected chunks before being sent to the LLM (see the sketch after this summary) [3]

Performance Metrics
- REFRAG outperforms LLaMA on 16 RAG benchmarks [4][6]
- It achieves 30.85x faster time-to-first-token, 3.75x better than the previous state of the art [4][6]
- It handles 16x larger context windows [4][6]
- It uses 2-4x fewer tokens [4][6]
- It shows no accuracy loss across RAG, summarization, and multi-turn conversation tasks [6]
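The sketch below shows the shape of this selective-expansion step in PyTorch. It is a hedged reconstruction from the thread's description, not Meta's code: the linear scorer stands in for the RL-trained relevance policy, and all class and argument names are assumptions.

```python
import torch
import torch.nn as nn

class RefragStyleContext(nn.Module):
    """REFRAG-style context construction: every chunk is compressed to a
    single embedding; a relevance policy (stubbed here as a learned scorer)
    picks which chunks to expand back to full token-level embeddings, and
    each rejected chunk stays as one compressed vector."""

    def __init__(self, d_model: int, top_k: int = 4):
        super().__init__()
        self.top_k = top_k
        self.policy = nn.Linear(2 * d_model, 1)  # stand-in for the RL policy

    def forward(self, query_tokens, chunk_embeds, chunk_token_embeds):
        # query_tokens:       (q_len, d) token-level query embeddings
        # chunk_embeds:       (n_chunks, d) one compressed vector per chunk
        # chunk_token_embeds: list of (len_i, d) full embeddings per chunk
        q = query_tokens.mean(dim=0, keepdim=True).expand(chunk_embeds.size(0), -1)
        scores = self.policy(torch.cat([q, chunk_embeds], dim=-1)).squeeze(-1)
        k = min(self.top_k, chunk_embeds.size(0))
        keep = set(torch.topk(scores, k=k).indices.tolist())

        expanded = [chunk_token_embeds[i] for i in sorted(keep)]
        rejected = [chunk_embeds[i:i + 1] for i in range(chunk_embeds.size(0))
                    if i not in keep]
        # The LLM input: query tokens + full embeddings of selected chunks
        # + single compressed vectors for everything that was rejected.
        return torch.cat([query_tokens, *expanded, *rejected], dim=0)
```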
From a $1,600 Single GPU to a $4.5 Million Annual Fee: What Does Deploying Large Models Really Cost?
锦秋集· 2025-10-05 11:54
Core Insights
- The article examines the large cost disparities between local deployment of AI models and subscription-based commercial APIs, arguing that businesses considering generative-AI integration need a clear cost-analysis framework [1][2][5]

Cost Analysis Framework
- A systematic framework compares the total cost of ownership (TCO) of local deployment (hardware, electricity) against commercial APIs (subscription fees) [2][5]
- The framework includes an online cost-estimation tool tailored to different business sizes, letting companies analyze their specific workloads [2][3]

Local Deployment Costs
- Costs vary by model size: small models (e.g., EXAONE 4.0 32B) run on a single RTX 5090 GPU (approximately $2,000) with monthly electricity costs of $13.2; medium models (e.g., Llama-3.3-70B) require one A100 GPU ($15,000) with monthly costs of $7.92; large models (e.g., Qwen3-235B) need four A100 GPUs ($60,000) with monthly costs of $31.68 [2][3][21]
- Hardware accounts for over 90% of the initial investment in local deployment [2]

Commercial API Costs
- Commercial APIs charge by token usage, with large price differences: high-end services like Claude-4 Opus charge $15 per 1 million input tokens and $75 for output, while cost-effective options like GPT-5 charge $1.25 for input and $10 for output [2][20]
- At a monthly volume of 50 million tokens, the annual cost of high-end services can exceed $4.5 million, while cost-effective options may cost only $375,000 [2]

Break-even Analysis
- Break-even periods vary widely: small models can break even in as little as 0.3 months against high-end commercial APIs, medium models take 2.3 to 34 months, and large models take 3.5 to 108 months (a sketch of the calculation follows below) [2][3]
- A monthly volume of 50 million tokens is the critical threshold for the economic viability of large-model local deployment [2]

Market Context
- The rapid development of LLMs has increased interest in local deployment, driven by concerns over data privacy, vendor lock-in, and the long-term operating costs of commercial APIs [5][7]
- Local deployment is becoming increasingly feasible for small and medium enterprises thanks to advances in open-source models and hardware [12][50]

Strategic Decision Framework
- The research sorts deployment scenarios into three types: quick return on investment (0-6 months), long-term investment (6-24 months), and economically unfeasible (over 24 months), helping organizations make informed decisions [49][50]
- The findings suggest local deployment is less straightforward than often assumed, with many factors shaping the economic viability of each strategy [48][52]
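To make the break-even arithmetic tangible, here is a small calculator that plugs in the article's example figures. The 50/50 input/output token split and the helper names are my assumptions; the hardware prices, electricity costs, and API prices are the ones quoted above, so the resulting months are only as good as those assumptions.

```python
# Break-even: months of avoided API fees needed to recoup local hardware.

def monthly_api_cost(m_tokens: float, in_price: float, out_price: float,
                     output_share: float = 0.5) -> float:
    """One month of API spend; prices are $ per 1M tokens."""
    return m_tokens * ((1 - output_share) * in_price + output_share * out_price)

def breakeven_months(hw_cost: float, electricity: float, api_cost: float) -> float:
    """Months until local capex is paid back by avoided API spend."""
    return hw_cost / (api_cost - electricity)

workload = 50.0  # 50M tokens per month, the article's threshold volume

for api_name, in_p, out_p in [("Claude-4 Opus", 15.0, 75.0),
                              ("GPT-5", 1.25, 10.0)]:
    api = monthly_api_cost(workload, in_p, out_p)
    for rig, hw, power in [("RTX 5090 (small model)", 2_000, 13.2),
                           ("1x A100 (70B model)", 15_000, 7.92),
                           ("4x A100 (235B model)", 60_000, 31.68)]:
        months = breakeven_months(hw, power, api)
        print(f"{rig} vs {api_name}: break-even in {months:.1f} months")
```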