Large Language Model

Autonomous Driving Paper Digest | Diffusion Models, Trajectory Prediction, TopoLiDM, VLA, and More
自动驾驶之心· 2025-08-05 03:09
Core Insights
- The article discusses advances in trajectory prediction using GALTraj, a generative active learning framework that applies controllable diffusion models to address long-tail issues in the data [1][2].

Group 1: GALTraj Framework
- GALTraj is the first framework to apply generative active learning to trajectory prediction, improving long-tail learning without modifying the model architecture [2].
- The framework employs a tail-aware generation method that differentiates diffusion guidance for tail, head, and related agents, producing realistic and diverse scenarios while preserving tail characteristics [2][3].

Group 2: Experimental Results
- In experiments on the WOMD and Argoverse 2 datasets, GALTraj significantly improved prediction on long-tail samples, reducing the long-tail metric FPR₅ by 47.6% (from 0.42 to 0.22) and the overall prediction error minFDE₆ by 14.7% (from 0.654 to 0.558); a sketch of this metric follows the summary [1][6].
- The results indicate that GALTraj outperforms traditional methods across various metrics, demonstrating its effectiveness at improving prediction accuracy in rare scenarios [7][8].

Group 3: TopoLiDM Framework
- The article also highlights TopoLiDM, a framework developed by Shanghai Jiao Tong University and the University of Twente that integrates topology-aware diffusion models for high-fidelity LiDAR point cloud generation [13][15].
- TopoLiDM achieved a 22.6% reduction in Fréchet Range Image Distance (FRID) and a 9.2% reduction in Minimum Matching Distance (MMD) on the KITTI-360 dataset while maintaining a real-time generation speed of 1.68 samples per second [13][15].

Group 4: FastDriveVLA Framework
- FastDriveVLA, developed by Peking University and Xiaopeng Motors, introduces a reconstruction-based visual token pruning framework that maintains 99.1% trajectory accuracy at a 50% pruning rate and reduces collision rates by 2.7% [21][22].
- The framework uses a novel adversarial foreground-background reconstruction strategy to better identify valuable tokens, achieving state-of-the-art performance on the nuScenes open-loop planning benchmark [27][28].

Group 5: PLA Framework
- The article presents a unified Perception-Language-Action (PLA) framework proposed by TUM that integrates multi-sensor fusion with GPT-4.1-enhanced vision-language-action reasoning for adaptive autonomous driving [34][35].
- In urban intersection scenarios, the framework achieved a mean absolute error (MAE) of 0.39 m/s in speed prediction and an average displacement error (ADE) of 1.013 meters in trajectory tracking [42].
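For reference, the minFDE metric cited in Group 2 can be computed as in the minimal NumPy sketch below (illustrative only, not the GALTraj authors' code; the array shapes and names are assumptions). The reported 14.7% improvement is consistent with (0.654 - 0.558) / 0.654 ≈ 0.147.

```python
import numpy as np

def min_fde(pred_trajs: np.ndarray, gt_traj: np.ndarray) -> float:
    """minFDE over k hypotheses: smallest final-position error among k predicted trajectories.

    pred_trajs: (k, T, 2) array of k predicted (x, y) trajectories over T timesteps.
    gt_traj:    (T, 2) ground-truth trajectory for the same horizon.
    """
    # Displacement between each hypothesis endpoint and the ground-truth endpoint.
    endpoint_errors = np.linalg.norm(pred_trajs[:, -1, :] - gt_traj[-1, :], axis=-1)
    return float(endpoint_errors.min())

# Toy usage for minFDE_6: 6 hypotheses over an 80-step horizon.
rng = np.random.default_rng(0)
gt = np.cumsum(rng.normal(size=(80, 2)), axis=0)
preds = gt[None, :, :] + rng.normal(scale=0.5, size=(6, 80, 2))
print(round(min_fde(preds, gt), 3))
```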
Stop Picking AI Courses at Random: These Are the Books You Actually Need
36Kr· 2025-08-03 00:03
Group 1: Core Insights
- The article emphasizes the importance of foundational programming and software-engineering skills for entering the AI field, with Python as the preferred language thanks to its ease of use and comprehensive ecosystem [1][2][4].
- It notes that while many AI roles stem from machine learning, the most sought-after positions are closer to software engineering and call for knowledge of languages such as Java, Go, or Rust [1][2].
- Continuous practice and real-world application are deemed essential for mastering a programming language, rather than relying solely on courses or books [2].

Group 2: Recommended Resources
- A variety of resources are suggested for learning Python, including a beginner's course that can be completed in four hours and a highly regarded specialization course [5].
- For mathematics and statistics, specific books and courses are recommended for understanding the principles underlying machine learning and AI [9][10].
- The article lists essential resources for deep learning and large language models, emphasizing the industry importance of frameworks such as PyTorch and TensorFlow [13][14].

Group 3: AI Engineering and Productization
- The article stresses the need for skills in productizing AI models, noting that most AI roles resemble traditional software engineering rather than pure machine-learning engineering [11].
- It highlights the importance of learning MLOps for model deployment, covering aspects such as containerization and cloud systems [11].
- The article concludes with advice on becoming an expert in the field through project-based learning and self-reflection [14].
Turing Award Winner Hinton Speaks Publicly in China for the First Time: What Should We Do Once AI Surpasses Humans?
机器之心· 2025-07-26 08:19
Core Viewpoint
- AI is likely to surpass human intelligence in the future, with significant implications for society and for the relationship between humans and AI [1][47].

Group 1: AI Development and Understanding
- AI has evolved through two paradigms, logical reasoning and learning through neural networks, with the latter more closely aligned with human thought processes [5][12].
- Large language models (LLMs) are seen as descendants of earlier models, using more complex structures and interactions to understand language in a way similar to humans [12][25].
- The way LLMs understand language is compared to building with LEGO blocks, where words are multi-dimensional and adapt based on context [16][19].

Group 2: Knowledge Transfer and Efficiency
- Knowledge transfer between AI systems is far more efficient than human communication, allowing information to be shared rapidly across many instances of the same model [37][40].
- Digital intelligences can replicate and share model weights and experiences, yielding a collaborative learning process that surpasses human capability (a toy sketch follows this summary) [39][41].

Group 3: Implications of Advanced AI
- As AI systems become more intelligent, they may develop motivations for survival and control, potentially making them difficult to manage [47][48].
- The relationship between humans and advanced AI could shift, with AI becoming more autonomous and capable of influencing human decisions [49][52].
- International cooperation on AI safety and governance is emphasized as necessary, since the risks posed by advanced AI systems are global in nature [59][62].
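To make the weight-sharing point concrete, here is a toy sketch (purely illustrative, not from Hinton's talk) of how identical digital model replicas could pool what each has learned by averaging their parameters, a form of knowledge transfer unavailable to biological brains.

```python
import torch
import torch.nn as nn

def average_weights(models: list[nn.Module]) -> dict:
    """Average the parameters of identically shaped model replicas.

    Each replica can learn from different data; averaging the weights pools
    that experience into one parameter set that every copy can load.
    """
    state_dicts = [m.state_dict() for m in models]
    return {
        name: torch.stack([sd[name].float() for sd in state_dicts]).mean(dim=0)
        for name in state_dicts[0]
    }

# Toy usage: three replicas of the same tiny network, trained separately, then merged.
replicas = [nn.Linear(8, 2) for _ in range(3)]
merged = average_weights(replicas)
for r in replicas:
    r.load_state_dict(merged)  # every copy now carries the pooled parameters
```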
Nature Headline: Large AI Models Reach Gold-Medal Level at the International Mathematical Olympiad
生物世界· 2025-07-25 07:54
Core Viewpoint
- The article highlights a significant achievement in artificial intelligence: large language models (LLMs) have reached gold-medal level at the International Mathematical Olympiad (IMO), showcasing advanced problem-solving capabilities [4][5][6].

Group 1: AI Achievement
- Google DeepMind's large language model solved problems equivalent to those in the IMO, achieving a score above the gold-medal threshold of 35 out of 42 [4][5].
- This marks a substantial leap from the previous year, when the model reached only silver-medal level, indicating a qualitative breakthrough in AI's ability to handle complex mathematical reasoning [5][6].

Group 2: Implications of the Achievement
- The success of LLMs at the IMO demonstrates their capability to tackle highly complex tasks that require deep logical thinking and abstract reasoning, beyond mere text generation [7].
- Such advances can serve as powerful tools in education and research, assisting students in learning higher mathematics and aiding researchers in exploring new conjectures and theorems [7].
- Gold-medal-level performance in mathematics is a significant milestone on the path to artificial general intelligence (AGI), as it requires a combination of cognitive abilities [7][8].

Group 3: Broader Impact
- The breakthroughs by DeepMind and OpenAI not only elevate AI's standing in mathematical reasoning but also point to vast potential for future applications in scientific exploration and technological development [8].
Alibaba Open-Sources Its Strongest Coding Model, Qwen3-Coder: 1M Context and Performance Rivaling Claude Sonnet 4
Founder Park· 2025-07-23 08:21
Core Viewpoint
- The article discusses the release and features of Alibaba Cloud's Qwen3-Coder model, highlighting its advanced coding and agentic capabilities and its competitive performance against other models on the market [3][4][5].

Group 1: Model Features
- The Qwen3-Coder series includes several versions, with Qwen3-Coder-480B-A35B-Instruct being the most powerful: 480 billion parameters and a native 256K-token context, expandable to 1 million tokens [4].
- The model achieves state-of-the-art (SOTA) results in Agentic Coding, Browser Use, and Tool Use, comparable to Claude Sonnet 4 [5][6].
- Qwen3-Coder was trained on 7.5 trillion tokens, 70% of which is code, strengthening its programming capabilities while preserving general and mathematical skills [12].

Group 2: Technical Details
- The model takes a distinctive approach to reinforcement learning (RL), focusing on real-world software-engineering tasks that allow extensive interaction and decision-making [16].
- A scalable RL environment was built that can run 20,000 independent environments in parallel, improving feedback and evaluation [16].

Group 3: Tools and Integration
- Qwen Code, a command-line tool for agentic programming, was developed to get the most out of Qwen3-Coder on coding tasks [17].
- Integration with Claude Code is also highlighted, letting users combine both models for an enhanced coding experience [22][26].

Group 4: User Experience
- Users can try Qwen3-Coder for free through the Qwen Chat web version; an illustrative API sketch follows this summary [6][7].
- Demos showcasing the model's capabilities, such as simulating a solar system and creating visual effects in coding environments, are available [8][9][10].
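As a rough illustration of how Qwen3-Coder might be called programmatically, here is a hedged sketch using an OpenAI-compatible chat completions client; the base URL, model identifier, and environment-variable name are assumptions to be checked against Alibaba Cloud's current documentation, not details confirmed by the article.

```python
import os
from openai import OpenAI

# Assumed endpoint and model name; verify against the official Qwen3-Coder docs.
client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],  # hypothetical env var holding the API key
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

response = client.chat.completions.create(
    model="qwen3-coder-480b-a35b-instruct",  # assumed identifier for the flagship model
    messages=[
        {"role": "system", "content": "You are a coding assistant."},
        {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."},
    ],
)
print(response.choices[0].message.content)
```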
X @Avi Chawla
Avi Chawla· 2025-07-22 06:30
Finally, a framework to connect any LLM to any MCP server (open-source). mcp-use lets you connect any LLM to any MCP server & build custom MCP Agents, without using closed-source apps like Cursor/Claude. Compatible with Ollama, LangChain, etc. Build 100% local MCP clients! https://t.co/8rhqh7BUZh ...
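For context on what such an MCP agent might look like in code, here is a minimal sketch based on the tweet's description; the MCPClient/MCPAgent interface, the config format, and the Ollama model name are assumptions rather than verified API details.

```python
import asyncio
from langchain_ollama import ChatOllama  # local LLM via Ollama, per the tweet's claim
from mcp_use import MCPAgent, MCPClient  # assumed public interface of mcp-use

async def main():
    # Point the client at any MCP server; this config block is illustrative only.
    config = {
        "mcpServers": {
            "filesystem": {
                "command": "npx",
                "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
            }
        }
    }
    client = MCPClient.from_dict(config)
    agent = MCPAgent(llm=ChatOllama(model="llama3.1"), client=client)
    print(await agent.run("List the files in the allowed directory."))

asyncio.run(main())
```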
X @Avi Chawla
Avi Chawla· 2025-07-21 06:40
LLM Training Stages
- Training an LLM from scratch involves four stages [1]
- The first step starts from a randomly initialized model [2]
- The model is then pretrained on a large-scale corpus [2]
- Instruction fine-tuning teaches it to follow commands [2]
- Preference and reasoning fine-tuning are used to refine its responses (a schematic sketch follows this list) [2]
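The following toy pipeline simply mirrors the four stages above to show how they compose; the function and dataset names are illustrative placeholders, not a real training recipe.

```python
# Placeholder pipeline mirroring the four stages above; everything here is a
# stand-in for the real (and far more involved) training process.

def build_random_model():
    """Stage 1: start from randomly initialized weights."""
    return {"stages": ["random init"]}

def pretrain(model, corpus):
    """Stage 2: next-token prediction over a large unlabeled corpus."""
    model["stages"].append(f"pretrained on {corpus}")
    return model

def instruction_finetune(model, dataset):
    """Stage 3: supervised fine-tuning on (instruction, response) pairs so the model follows commands."""
    model["stages"].append(f"instruction-tuned on {dataset}")
    return model

def preference_and_reasoning_finetune(model, dataset):
    """Stage 4: refine responses with preference and reasoning fine-tuning (e.g. RLHF/DPO-style)."""
    model["stages"].append(f"preference/reasoning-tuned on {dataset}")
    return model

model = build_random_model()
model = pretrain(model, corpus="web-scale text corpus")
model = instruction_finetune(model, dataset="instruction pairs")
model = preference_and_reasoning_finetune(model, dataset="preference data")
print(model["stages"])
```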
A Single ":" Is All It Takes to Fool Every Large Model
自动驾驶之心· 2025-07-17 12:08
Core Insights
- The article discusses a significant vulnerability in large language models (LLMs): LLM-based judges can be deceived by seemingly innocuous symbols and phrases, leading to false positive rewards in evaluation scenarios [2][13][34].

Group 1: Vulnerability of LLMs
- A recent study reveals that LLM judges can be tricked into giving positive rewards by meaningless responses consisting of simple tokens such as colons and spaces, content that should ideally be filtered out [4][22].
- The false positive rates (FPR) are alarming: GPT-4o shows an FPR of 35% for the symbol ":" and LLaMA3-70B an FPR between 60% and 90% for "Thought process:" [22][24].
- The vulnerability is not limited to English; it is cross-lingual, affecting models regardless of the language used [23].

Group 2: Research Findings
- The research tested multiple models, including specialized reward models and general LLMs, across various datasets and prompt formats to assess the prevalence of this "reward model deception" phenomenon (a minimal evaluation sketch follows this summary) [15][17].
- All tested models were susceptible to triggering false positive responses, indicating a systemic issue within LLMs [21][28].

Group 3: Proposed Solutions
- To mitigate the vulnerability, the researchers developed a new "judge" model called Master-RM, which reduces the FPR to nearly zero by training on an augmented dataset [29][31].
- Master-RM demonstrates robust performance on unseen datasets and under deceptive attacks, validating its effectiveness as a general-purpose reward model [31][33].

Group 4: Implications for Future Research
- The findings highlight the need for greater robustness in LLMs and suggest that reinforcement learning from human feedback (RLHF) requires more rigorous adversarial evaluation [35][36].
- The research team, with members from Tencent AI Lab, Princeton University, and the University of Virginia, emphasizes the importance of addressing these vulnerabilities in future work [38][40].
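To make the evaluation concrete, below is a minimal sketch (not the paper's code) of how one might measure a judge model's false positive rate on such "master key" responses; judge_accepts is a hypothetical stand-in for any LLM-based verifier.

```python
# Hypothetical sketch: measure how often a judge model accepts meaningless
# "master key" responses as correct. `judge_accepts` stands in for any
# LLM-based verifier and is an assumption, not a real API.

MASTER_KEYS = [":", " ", "Thought process:"]  # tokens/phrases cited in the article

def false_positive_rate(judge_accepts, problems) -> float:
    """Fraction of (problem, master-key) pairs the judge wrongly marks as correct."""
    trials = [(problem, key) for problem in problems for key in MASTER_KEYS]
    false_positives = sum(judge_accepts(problem, response) for problem, response in trials)
    return false_positives / len(trials)

# Toy usage with a deliberately naive judge that rewards any response at all.
naive_judge = lambda problem, response: True
print(false_positive_rate(naive_judge, problems=["1+1=?", "Integrate x^2"]))  # prints 1.0
```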
Top Talent Keeps Getting Poached; a Founder Speaks Frankly After Leaving OpenAI: Codex Was Ground Out in 7 Weeks, with No Unified Roadmap and Small Teams Sprinting All the Way
AI前线· 2025-07-16 05:08
Core Insights
- The article discusses the recent departure of key researchers from OpenAI to Meta's newly established superintelligence lab, highlighting the competitive landscape in AI research and talent acquisition [1][2][3].
- It offers a personal perspective on OpenAI's internal culture and operational dynamics, emphasizing an environment that fosters innovation and rapid project execution [3][4][10].

Group 1: OpenAI's Internal Culture
- OpenAI operates as a cluster of small teams rather than a centralized organization, allowing projects to be executed flexibly and quickly without a strict roadmap [3][11].
- The company emphasizes bottom-up decision-making: good ideas can come from any employee, and the focus is on action rather than extensive planning [11][12].
- OpenAI's culture grants researchers a high degree of autonomy, producing a dynamic environment in which projects can be initiated and developed quickly [12][18].

Group 2: Talent Movement and Industry Dynamics
- The move of researchers such as Jason Wei and Hyung Won Chung from OpenAI to Meta raises questions about OpenAI's internal environment and the factors influencing talent retention [1][2].
- The article reflects on the competitive nature of the AI industry, particularly among leading firms like OpenAI, Meta, and Google, each pursuing different strategies in the race toward AGI [33].

Group 3: Project Execution and Innovation
- The Codex project exemplifies OpenAI's ability to ship significant products quickly, with the team completing it in just seven weeks [26][27].
- OpenAI's operating model is likened to a research lab: innovation is prioritized, and the focus is on impactful consumer applications while maintaining a commitment to safety and ethical considerations [15][16][18].
IPO News | MiniMax to Close Nearly $300 Million in New Funding, Reportedly Preparing for a Hong Kong Listing
智通财经网· 2025-07-16 02:34
Group 1
- MiniMax has recently completed a new funding round of nearly $300 million, bringing its valuation to over $4 billion [1]
- The round drew contributions from listed companies, cross-border funds, and large state-owned platforms such as Shanghai State-owned Assets [1]
- MiniMax is reportedly preparing for a Hong Kong IPO, potentially within this year, and has engaged investment-banking advisors for the process [1]

Group 2
- MiniMax has released an open-source reasoning model, MiniMax-M1, licensed under Apache 2.0, which it reports outperforms DeepSeek's latest version at lower computational cost [2]
- In the multimodal field, MiniMax's video generation model Hailuo 02 supports native 1080P HD video output and shows strong temporal consistency and physical plausibility in complex scenarios, ranking second in the Artificial Analysis video competition, ahead of competitors such as Google's Veo 3 and Kuaishou's Kling [2]