Tencent Research Institute AI Digest 20250912
Tencent Research Institute (腾讯研究院) · 2025-09-11 16:01
Group 1
- Thinking Machines has released its first research blog post, addressing non-determinism in LLM inference with a focus on batch invariance [1]
- The team reworked RMSNorm, matrix multiplication, and the attention mechanism to achieve fully reproducible inference results at an acceptable performance cost [1]
- The company's valuation has reached $12 billion, its founding team comes primarily from OpenAI, and its first product is named Connection Machine [1]

Group 2
- OpenAI announced that ChatGPT now officially supports MCP (Model Context Protocol), allowing Plus and Pro users to automate operations with a single prompt [2]
- MCP standardizes interactions among AI models, tools, and data sources, enabling different models to share context and supporting plug-and-play integrations [2]
- In developer mode, users can connect third-party services (such as Stripe) to complete complex tasks, although this mode cannot be used simultaneously with other ChatGPT features [2]

Group 3
- WeChat Official Accounts has launched an "Intelligent Reply" feature powered by Tencent's Hunyuan large model, addressing operators' inability to respond to reader inquiries in a timely manner [3]
- The feature automatically learns from an account's historical articles and reply style, labels its answers as "intelligent replies," and cites the relevant historical articles [3]
- Tencent Hunyuan will also introduce roleplay models and AI-avatar applications for immersive dialogue experiences; individual creators can enable the feature in the Official Account PC backend [3]

Group 4
- Kimi has open-sourced a middleware called checkpoint-engine that can update a trillion-parameter model across thousands of GPUs in about 20 seconds, significantly improving reinforcement-learning efficiency [4]
- The technology uses a hybrid co-location architecture, managing parameter state through a distributed checkpoint engine so that parameter broadcasting and reloading proceed in parallel [4]
- The design fully decouples the training and inference engines and applies parameter updates in a pipelined fashion, improving resilience to single-point failures [4]

Group 5
- NVIDIA has released a new AI Blueprint that lets 3D artists quickly prototype scenes with generative AI, producing up to 20 3D models from a text prompt [5]
- It integrates Microsoft TRELLIS and NVIDIA NIM microservices, runs about 20% faster than the native applications, and supports RTX 50- and 40-series GPUs with more than 16 GB of memory [5]
- The workflow automates the path from concept to 3D model; generated models can be exported to tools such as Blender for further refinement, greatly shortening artists' prototyping time [5]

Group 6
- Baidu Academic has completed an AI overhaul, launching AI academic search, AI literature summarization, AI reading, and paper mapping to create the industry's first one-stop AI academic platform [7]
- The platform covers the full academic chain of searching, reading, creating, and editing, offering literature summaries, full-text translation, topic recommendations, and professional formatting to boost research efficiency [7]
- It has indexed 690 million documents across 1.04 million academic sites and built 4.2 million scholar profiles, with plans for an academic identity system backed by Baidu's full traffic [7]

Group 7
- Tencent Meeting, in collaboration with Yuanbao, has launched an AI hosting feature that lets the AI attend meetings and record in real time on the user's behalf, addressing tardiness and overlapping meetings [8]
- Users can activate "AI hosting" from the meeting page or meeting list; Yuanbao then automatically joins the meeting and generates intelligent AI minutes so that no content is missed [8]
- After the meeting, users can ask Yuanbao directly about the meeting content to support decision-making, ensuring they are always "present" at key meetings [8]

Group 8
- Wang Xingxing, founder of Yushu Technology, expressed regret at not having focused on AI since 2011, arguing that many fields of AI application remain "desolate" [9]
- Yushu Technology has announced an IPO plan, expecting to file by the end of 2025, with 2024 revenue projected to exceed 1 billion yuan and four consecutive profitable years, aiming to become the world's largest "quadruped and humanoid robot" stock [9]
- Wang revised his earlier views on data, acknowledging that both robot data and models are core problems, and advised young entrepreneurs to embrace the current wave of AI innovation [9]

Group 9
- Sutton, known as the "father of reinforcement learning," said in a speech that AI is entering an "era of experience," in which intelligence will come from continuous learning rather than the accumulation of static knowledge [10]
- He argued that fears about AI are exaggerated: AI and human prosperity alike stem from decentralized cooperation, which lets agents with different objectives coexist peacefully [10]
- Sutton offered four predictive principles, asserting that human intelligence will be surpassed, that power will shift to the smartest agents, and that AI is an inevitable next step in the universe's evolution [10]
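The floating-point non-associativity behind the non-determinism in Group 1 can be shown in a few lines of standalone Python (a minimal illustration, not the article's code):

```python
# Floating-point addition is not associative: the same operands grouped
# differently round to different results. This is the numerical root of
# the non-reproducible LLM inference described above.

a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # sum left to right
right = a + (b + c)  # same operands, different grouping

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False
```

Because every grouping is individually correct to within rounding, neither result is "wrong"; reproducibility requires committing to one fixed grouping.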
First Public Statement Seven Months After Founding: the Ten-Billion-Dollar Unicorn's 10,000-Word Essay on Conquering Non-Determinism in LLM Inference
36Kr · 2025-09-11 08:11
Core Insights
- Thinking Machines Lab has named its flagship product "Connection Machine" and launched a research blog titled "Connectionism" to share its progress in AI research [1][3][4]
- The blog's first article tackles the challenge of achieving reproducible results in large language model (LLM) inference, examining why LLM outputs are non-deterministic [6][9][20]

Group 1: Product and Research Focus
- The "Connectionism" blog will evolve with the company's research, covering topics from numerical computation to prompt engineering [3]
- The name "Connection Machine" is a historical nod to early AI research on neural networks [4]

Group 2: Non-Determinism in LLM Inference
- Reproducibility in LLM inference is crucial yet hard to achieve: identical inputs can yield different outputs even beyond the randomness of sampling [9][11]
- The research examines the common hypothesis that non-determinism stems from floating-point arithmetic combined with concurrent execution, in which the order of operations affects the result [14][20]

Group 3: Solutions for Reproducibility
- The study argues that non-determinism in LLM inference arises from batch-size variation rather than races between atomic operations, so the key requirement is batch-invariant operations [20][21]
- Implementing batch-invariant versions of RMSNorm, matrix multiplication, and attention is essential for achieving reproducibility [21][28][33]

Group 4: Performance and Experimentation
- Initial experiments with a deterministic kernel showed a performance drop of roughly 20% versus standard methods, which the authors consider acceptable [29][43]
- With the deterministic kernel, repeated completions produced identical outputs, in contrast to the variability observed in non-deterministic settings [42]
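As a hypothetical illustration of the effect described in Group 2 (toy values and a toy reduction, not the article's actual kernels), a logit computed with a batch-size-dependent reduction tree can round differently, which is enough to flip a greedy, temperature-0 decoding choice:

```python
# Hypothetical illustration: when the reduction tree used to compute a
# logit depends on batch size, rounding differs, and a greedy choice
# between near-tied candidates can flip.

def chunked_sum(values, chunk):
    """Sum `values` in fixed-size chunks, then sum the partial results.
    The grouping, and therefore the floating-point rounding, depends on `chunk`."""
    partials = [sum(values[i:i + chunk]) for i in range(0, len(values), chunk)]
    return sum(partials)

# Contrived contributions to one logit: the small terms survive only when
# they are combined with each other before the large cancelling pair.
contributions = [1.0, 1.0, 1e100, -1e100]

logit_small_batch = chunked_sum(contributions, chunk=2)  # 2.0
logit_large_batch = chunked_sum(contributions, chunk=4)  # 0.0

other_logit = 1.0  # a second, fixed candidate token

# Greedy decoding picks a different token depending on the reduction tree:
print(logit_small_batch > other_logit)  # True: first token wins
print(logit_large_batch > other_logit)  # False: second token wins
```

The values are deliberately extreme to make the flip obvious; in real kernels the differences are tiny, but a single flipped token early in generation cascades into an entirely different completion.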
Valued at 84 Billion Yuan, They Just Released Their First AI Result
QbitAI (量子位) · 2025-09-11 01:58
Core Insights
- Thinking Machines, valued at $12 billion, has released its first research blog post, on overcoming non-determinism in large language model (LLM) inference [1][51]
- The research attributes the reproducibility problem in LLM outputs to a lack of batch invariance [3][12]

Group 1: Research Focus
- The post, titled "Defeating Nondeterminism in LLM Inference," addresses why LLM inference results are often non-reproducible [3][8]
- The root cause identified is batch non-invariance: the output of a single request is influenced by how many other requests share its batch [14][15]

Group 2: Technical Findings
- The common explanation, floating-point non-associativity combined with concurrent execution, does account for differing results, but the research shows it is incomplete on its own [9][10]
- The primary issue is the lack of batch invariance: server load dynamically changes batch sizes during deployment, which alters the computation order of key operations [15][16]

Group 3: Proposed Solutions
- To achieve batch invariance, the research fixes the reduction order in operations such as RMSNorm and matrix multiplication so that it does not depend on batch size [18][19]
- The method compiles a single kernel configuration for all input shapes, avoiding switches of parallelization strategy as batch size changes, even at a performance cost of roughly 20% [22][21]

Group 4: Experimental Validation
- Three experiments validated the approach: inference-determinism verification, performance verification, and a real online-policy reinforcement-learning application [25]
- With batch-invariant kernels, 1,000 completions produced 1,000 identical outputs, achieving deterministic inference, whereas non-invariant kernels produced 80 distinct results [27][28]
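The batch-invariance property targeted in Group 3 can be sketched in plain Python (a toy model of the property, not the article's GPU kernels): each row's reduction runs in one fixed order, so a row's output is bit-identical whether it is processed alone or inside a larger batch:

```python
# Toy sketch of batch-invariant RMSNorm. In pure per-row Python the
# property holds by construction; the article's point is that GPU kernels
# break it when they re-chunk reductions as the batch size changes.
import math

def rmsnorm_row(x, gain, eps=1e-6):
    """RMSNorm one row with a fixed, sequential reduction order."""
    acc = 0.0
    for v in x:  # fixed left-to-right order, never re-chunked by batch size
        acc += v * v
    inv_rms = 1.0 / math.sqrt(acc / len(x) + eps)
    return [v * inv_rms * g for v, g in zip(x, gain)]

def rmsnorm_batch(rows, gain):
    """Normalize a batch by applying the same per-row kernel to every row."""
    return [rmsnorm_row(r, gain) for r in rows]

row = [0.5, -1.25, 2.0, 0.75]
gain = [1.0] * 4

alone = rmsnorm_batch([row], gain)[0]                              # batch of 1
in_batch = rmsnorm_batch([[1.0] * 4, row, [2.0] * 4], gain)[1]     # batch of 3

print(alone == in_batch)  # True: output independent of batch size
```

A batch-invariant GPU kernel must preserve exactly this guarantee while still parallelizing, which is why the article accepts a fixed kernel configuration and its associated performance cost.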
Group 5: Company Background
- Thinking Machines was co-founded by Mira Murati, former CTO of OpenAI, and its team includes notable figures from across the AI industry, primarily from OpenAI [36][38][46]
- The company recently closed a $2 billion seed round, a record for AI funding, and is now valued at $12 billion despite not yet having shipped a product [51][50]