Do THIS with OpenClaw so you don't fall behind... (14 Use Cases)
Matthew Berman· 2026-03-18 21:48
This is Jensen Huang, CEO of Nvidia. >> OpenClaw is the number one open-source project in the history of humanity. Every single enterprise company, every single software company in the world needs an agent strategy. You need to have an OpenClaw strategy. >> So even if you're a power user of OpenClaw, you're probably not getting the most out of it. I have spent over 200 hours and billions of tokens perfecting my OpenClaw setup. And in this video, I'm going to teach you every best practice that I have learned. S ...
DeepSeek, GPT, Qwen: architecture diagrams for every major large model. Karpathy: a treasure gallery!
机器之心· 2026-03-16 03:53
Report by 机器之心. The large-model race has been crowded in recent years: the well-known names are almost too many to count, from GPT, Llama, Gemma, and Mistral to DeepSeek, Qwen, Kimi, GLM, MiniMax, and more, with new models appearing almost weekly. The problem is that as architectural innovations multiply, understanding them becomes harder. Architecture diagrams vary in style from paper to paper, module names are inconsistent, and even researchers struggle to quickly see where a given model makes its key changes. Put the mainstream models of the past few years side by side and an obvious gap appears: we have plenty of models, but no single clear map of large-model architectures. Recently, AI researcher Sebastian Raschka set out to provide exactly that, redrawing the architectures of the mainstream large models of recent years and collecting them into an online atlas, the "LLM Architecture Gallery". Original: https://sebastianraschka.com/llm-architecture-gallery/ According to Raschka, the site consolidates material from two of his earlier blog posts, one of which is "The Big LLM Architecture Comparison" ...
Letting LLMs "peer review" each other: a simple LLM collaboration/ensemble method delivers a 7% performance gain
AI前线· 2026-03-11 09:32
Core Insights - The article discusses the emergence of various large language models (LLMs) such as Gemini, GPT, Qwen, Llama, and DeepSeek, highlighting the availability of over 182,000 models on Hugging Face. It identifies two main concerns: persistent performance issues and the distinct advantages and disadvantages of different LLMs [2][3][4][5][6].

LLM Ensemble Concept - The concept of "LLM Ensemble" is introduced, suggesting that instead of relying on a single LLM based on performance rankings, it is more beneficial to consider multiple LLMs simultaneously to leverage their diverse strengths [1].

Post-hoc Ensemble Methods - The article categorizes post-hoc ensemble methods into two types:
1. Selection-then-regeneration methods, which depend heavily on task-specific training data and require fine-tuning a large model, limiting their flexibility [8][9].
2. Similarity-based selection methods, which are mostly unsupervised and select responses based on similarity metrics, though they are criticized for their simplistic design [2][3].

LLM-PeerReview Framework - The LLM-PeerReview framework is proposed as a simple, unsupervised LLM ensemble method inspired by academic peer review processes. It consists of three sequential modules: Scoring, Reasoning, and Selection [7][12].

Scoring Process - The scoring process utilizes multiple LLMs as judges to evaluate responses to the same prompt, employing a novel "Flipped-triple scoring trick" to mitigate biases inherent in traditional scoring methods [12][13][14].

Reasoning and Selection - Reasoning involves aggregating scores from multiple judges, with two versions: a simple average and a weighted version that considers the review quality of different LLMs. Selection focuses on identifying the highest-scoring response from a pool of candidates [12][15].

Experimental Results - LLM-PeerReview and its weighted variant LLM-PeerReview-W significantly outperform individual LLMs and existing ensemble baselines, achieving average performance improvements of 6.9% and 7.3% over advanced methods like Smoothie-Global [24].

Method Advantages - The LLM-PeerReview framework is characterized by its unsupervised nature, interpretability, and applicability across various tasks, including both Exact-Match Generation and Open-Ended Generation tasks [17].

Efficiency Analysis - The framework allows for a reduction in the number of evaluators to improve efficiency while maintaining performance quality, contrasting with traditional debate-based methods that require multiple rounds of evaluation [21].

Conclusion - LLM-PeerReview is presented as a transparent and effective ensemble method that mimics the peer review process, demonstrating significant advantages over existing models and methods in terms of performance and flexibility [26].
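The Scoring → Reasoning → Selection pipeline summarized above can be sketched in a few lines of code. This is a minimal illustrative sketch, not the paper's implementation: the function name `peer_review_select`, the placeholder judge callables, and the plain weighted-average aggregation are all assumptions for illustration; the paper's "Flipped-triple scoring trick" and its learned review-quality weights are not reproduced here.

```python
# Minimal sketch of an unsupervised LLM-ensemble selection loop in the
# spirit of LLM-PeerReview (Scoring -> Reasoning -> Selection).
# The judge callables stand in for real LLM API calls.

from typing import Callable, Optional, Sequence

def peer_review_select(
    candidates: Sequence[str],                 # responses from different LLMs
    judges: Sequence[Callable[[str], float]],  # each judge scores a response
    weights: Optional[Sequence[float]] = None, # optional per-judge weights
) -> str:
    """Return the candidate with the highest aggregated judge score."""
    if weights is None:
        weights = [1.0] * len(judges)          # simple-average version
    total = sum(weights)

    def aggregate(response: str) -> float:
        # Reasoning step: weighted average of all judges' scores
        return sum(w * j(response) for w, j in zip(weights, judges)) / total

    # Selection step: pick the top-scoring candidate
    return max(candidates, key=aggregate)

# Toy usage with deterministic stand-in judges (real judges would be LLMs
# prompted to score another model's answer)
if __name__ == "__main__":
    cands = ["short answer", "a longer, more detailed answer"]
    toy_judges = [len, lambda r: r.count(" ") + 1]
    print(peer_review_select(cands, toy_judges))
```

Because scoring, aggregation, and selection are separate steps, each is independently inspectable, which is the interpretability property the summary attributes to the framework.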
Overseas AI Applications: 2025 Annual Review and 2026 Outlook
2026-03-10 10:17
Summary of Key Points from Conference Call Records

Industry Overview - The conference call discusses the AI application landscape, particularly focusing on major cloud companies and software sectors, including AI infrastructure, foundational software, and application software [1][2][3].

Core Insights and Arguments

AI Revenue and Capex Trends
- By 2027, major cloud companies like Microsoft, Google, and Amazon are expected to generate AI revenues exceeding $20-30 billion, with revenues covering costs by 2027 [1].
- The foundational software sector is seen as having significant "wrongfully punished" opportunities due to its consumption-based billing models, which are expected to benefit from the explosion of data driven by AI agents [1][2].

SaaS Valuation and Market Dynamics
- SaaS valuations are at a 10-year low, with data access rights becoming a core barrier to entry. Vertical SaaS companies that manage sensitive data are expected to have higher bargaining power in the AI era [1][4].
- The application software sector is divided into process SaaS, vertical SaaS, and AI software, with vertical SaaS showing a more pronounced rebound in stock prices compared to process SaaS [4][10].

AI Coding and Market Penetration
- AI coding is identified as the area with the highest penetration, with software engineering accounting for nearly 50% of AI tool usage. This trend is expected to lead to structural changes in software company cost structures by 2026 [1][25].

C-end Agent Competition
- The competition for consumer-facing (C-end) agents is expected to accelerate in 2026, with a focus on integrating agents with traditional products like Instagram and Google Search. This will drive further investment in foundational infrastructure [1][22].

Additional Important Insights

Third-party Infra Opportunities
- Third-party infrastructure providers are anticipated to gain new opportunities as enterprises seek to avoid the risks associated with "full-stack bundling" from major model vendors [3][24].

Software Evaluation Metrics
- The evaluation metrics for software companies are shifting from revenue growth to metrics like "AI product renewal rates" and "mid-platform coverage," indicating a restructured approach to assessing company performance in the AI era [3][12].

Market Sentiment and Valuation Adjustments
- The current pessimistic valuation in the software sector is expected to improve as data value realization occurs, particularly in the second half of 2026 [24].

AI Agent Data Assets
- The core barriers for AI agents are likely to stem from the private data accumulated during user interactions, which will enhance the competitive edge of companies that can establish stable user relationships early on [29][30].

Future of Software Company Evaluation
- The evaluation framework for software companies is expected to evolve, focusing more on AI-related metrics rather than traditional growth and profit measures [31].

This summary encapsulates the key points discussed in the conference call, highlighting the evolving landscape of AI applications and the implications for various sectors within the software industry.
The first "lobster" victims have appeared
投资界· 2026-03-10 09:02
Core Viewpoint - The article discusses the rising costs associated with using AI tools like OpenClaw, highlighting the financial burden on users while also pointing out the lucrative opportunities for AI model providers [2][5][6].

Group 1: AI Usage and Costs
- Users of OpenClaw are experiencing high token consumption costs, which can be several times higher than traditional models, leading to significant expenses for tasks such as generating text or processing data [2][5].
- A programmer reported a token expense of 12,000 yuan within three days due to API key theft, illustrating the financial risks involved [2].
- Developers have shared experiences of spending substantial amounts on token fees, with one instance costing 100 dollars for just two hours of automated task processing [4][5].

Group 2: Market Response and Opportunities
- The introduction of policies supporting AI tools like OpenClaw has led to a surge in interest from entrepreneurs, with thousands seeking consultation in regions like Suzhou and Shenzhen [4].
- Major Chinese AI model companies are seeing significant revenue growth, with MiniMax reporting an annual recurring revenue (ARR) exceeding 150 million dollars as of February 2026 [6].
- The overall token usage for Chinese AI models surged to 41.9 trillion tokens in early March 2026, a 34.9% increase from the previous week, indicating strong market demand [6].

Group 3: Investment and Market Trends
- The stock market has reacted positively to the AI boom, with companies like MiniMax seeing stock prices increase by over 50% in early March 2026, reaching new market valuations [7].
- Major tech firms are rapidly deploying their own AI solutions, with ByteDance, Tencent, and Alibaba launching competing products to capitalize on the growing demand for AI capabilities [7].
- The article emphasizes the potential for AI to transform industries by lowering barriers to entry for individual entrepreneurs, reminiscent of the early days of the internet [11].
OpenAI to buy cybersecurity startup Promptfoo to better safeguard AI agents
CNBC· 2026-03-09 18:37
Core Insights
- OpenAI is acquiring cybersecurity startup Promptfoo to enhance security tools for AI systems [1][2]
- The acquisition will integrate Promptfoo's tools into OpenAI's Frontier platform for AI agents [1]
- Promptfoo's team will join OpenAI, and the company will continue to develop its open-source project for testing AI prompts [2]

Company Developments
- OpenAI has been actively acquiring startups in the competitive AI market, including the recent acquisition of healthcare tech startup Torch for approximately $60 million [3]
- The company previously acquired Software Applications, which developed an AI interface for Apple Mac users [3]

Industry Context
- The AI market is highly competitive, with key players including Anthropic, Google, and Meta [3]
- As AI agents become more integrated with real data, the need for robust security and validation is increasingly critical [2]
Unknown institution: US equities, long liquidation dominates; S&P 500 closes down 56 bps at 6...-20260306
Unknown institution· 2026-03-06 02:20
US equities: long liquidation dominated. The S&P 500 closed down 56 basis points at 6,831, supported by billions of dollars in late-session buy orders. The Nasdaq 100 (NDX) fell 29 bps to 25,020; the Russell 2000 (R2K) fell 191 bps to 2,586; the Dow Jones Industrial Average fell 161 bps to 47,955. Total volume across all US stock exchanges was 22.2 billion shares, well above the year-to-date daily average of 19.45 billion shares. The volatility index (VIX) rose 12.06% to 23.7; WTI crude rose 6.21% to $79.33/barrel; the US 10-year Treasury yield rose 3 bps to 4.13%; gold fell $124 to $5,075/oz; the dollar index (DXY) rose 28 bps to 99.05; Bitcoin fell 2.79% to $71,298. US equities broadly ...
X @Xeer
Xeer· 2026-03-03 03:22
Does anyone else get concerned when you enter a complex prompt into Claude, GPT, or Perplexity and it responds almost immediately (without thinking)? I'm like, was my request too basic, or did I make a mistake in the prompt? ...
The Powerful Alternative To Fine-Tuning
Y Combinator· 2026-02-27 15:00
The world is changing so quickly. This is probably a little bit obvious, but you should just try things and, like, every day do something with AI. Last summer, I took a weekend and used GPT5 to help me build an iPhone app. I hadn't done that in a decade. And yeah, it's so fast and so easy. And that was, you know, an age ago. That was like 8 months ago. Now it's even faster and easier. Don't limit yourself. Like, anything that you imagine, you should just try to use AI and see how far you can get with i ...
Anthropic accuses AI companies of distillation plagiarism; Musk fires back: "the thief crying 'stop thief'"
Sou Hu Cai Jing· 2026-02-25 10:13
Core Viewpoint - Anthropic accuses three leading Chinese AI companies, DeepSeek, Moonshot, and MiniMax, of infringing on its Claude model capabilities through fraudulent accounts and proxy services, utilizing a technique known as "model distillation" to enhance their own models [3][4].

Group 1: Allegations of Model Theft
- Anthropic claims that the Chinese AI companies used fraudulent accounts to access Claude, generating over 16 million interactions, which they argue violates service terms and access restrictions [3][4].
- The three companies are accused of employing similar methods to access Claude's capabilities, particularly focusing on agentic reasoning, tool usage, and coding abilities [4].

Group 2: Specific Interactions and Patterns
- DeepSeek engaged in over 150,000 interactions, focusing on extracting Claude's reasoning capabilities across diverse tasks, indicating coordinated efforts to avoid detection [5].
- Moonshot AI recorded over 3.4 million interactions, targeting agentic reasoning, tool usage, and data analysis, aiming to reconstruct Claude's reasoning pathways [5].
- MiniMax had the largest scale with over 13 million interactions, specifically targeting agent coding and tool usage, demonstrating adaptability by redirecting traffic to capture new features [5].

Group 3: Legal and Ethical Implications
- The allegations raise questions about the legality of model distillation and the ethical considerations surrounding AI training, as many large language models are trained on publicly available internet data without explicit consent from original authors [7][8].
- There is an ongoing debate regarding the ownership of synthetic data and compliance issues related to training, particularly for open-source models [8].

Group 4: National Security and Export Controls
- Anthropic's accusations highlight concerns over national security, suggesting that illegal distillation could undermine U.S. control over advanced AI technology exports [9].
- Current U.S. export controls primarily focus on hardware rather than large language model API access, indicating a gap in regulatory measures [9].

Group 5: Developer Responsibilities and Compliance
- Developers using large language models must ensure their training processes are secure and compliant, maintaining clear records of training data sources and adhering to service terms [10][11].
- Anthropic is investing in defensive technologies to detect "distillation attack" patterns and is implementing protective measures to reduce the effectiveness of illegal distillation while maintaining legitimate user experience [11].