Will Mid-Training Become the Pre-Training of the Future?
机器之心· 2025-11-23 01:30
Group 1: Core Concepts of Mid-Training
- The concept of "Mid-Training" is emerging as a potential new phase in the training of large language models (LLMs), positioned between pre-training and post-training, with OpenAI establishing a dedicated department for it in July 2024 [5][6][7]
- Mid-Training is described as a vital stage that enhances specific capabilities of LLMs, such as mathematics, programming, reasoning, and long-context extension, while maintaining the foundational abilities of the model [9][10]
- The definition and implementation of Mid-Training are still not universally agreed upon, with various organizations exploring its effects and mechanisms, indicating growing interest in this area [8][11]

Group 2: Technical Insights and Strategies
- Research from Peking University and Meituan has attempted to clarify the definition of Mid-Training, focusing on data management, training strategies, and model architecture optimization [8][10]
- Key optimization strategies for Mid-Training include data curation to enhance data quality, training strategies such as learning rate annealing and context extension, and architecture optimization to improve model performance [10]
- Exploration of Mid-Training has gained momentum since 2025, with increasing references in research papers from institutions such as Microsoft and Zero One [6][7]
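The learning rate annealing mentioned among the training strategies can be sketched in a few lines. The article does not say which schedule is used; the cosine form below is one common choice, shown purely for illustration:

```python
import math

def annealed_lr(step, total_steps, peak_lr=3e-4, min_lr=3e-5):
    """Cosine learning-rate annealing: decay smoothly from peak_lr
    at step 0 down to min_lr at the final step."""
    progress = step / total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

# The schedule starts at the peak rate and ends at the floor rate:
print(annealed_lr(0, 10_000))       # peak of the schedule
print(annealed_lr(10_000, 10_000))  # floor of the schedule
```

The `peak_lr` and `min_lr` defaults are assumed values; in practice a mid-training phase typically resumes from wherever the pre-training schedule left off.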
From Apple M5 to DGX Spark: How Far Off Is the Local AI Era?
机器之心· 2025-11-22 02:30
Group 1
- The recent delivery of the DGX Spark AI supercomputer by Jensen Huang to Elon Musk has sparked community interest in local computing, indicating a potential shift from cloud-based AI to local AI solutions [1][4]
- Global investment in cloud AI data centers is projected to reach nearly $3 trillion by 2028, with significant contributions from major tech companies, including an $80 billion investment by Microsoft in AI data centers [4][5]
- The DGX Spark, priced at $3,999, is the smallest AI supercomputer to date, designed to compress vast computing power into a local device, marking a return of computing capability to the personal desktop [4][5]

Group 2
- The release of DGX Spark suggests that certain AI workloads are now feasible for local deployment, but a practical local AI experience requires not only powerful hardware but also a robust ecosystem of local models and tools [6]

Group 3
- The combination of new small language model (SLM) architectures and edge chips is expected to push the boundaries of local AI on consumer devices, although specific challenges remain before widespread adoption [3]
OpenAI's GPT-OSS Open Weights Model Is Here And Can Run Locally On Your PC
CNET· 2025-08-11 12:01
Model Overview
- OpenAI releases GPT-OSS, a local AI model that runs entirely on the user's machine without an internet connection [1]
- GPT-OSS is a 20-billion-parameter model that can run on most mid-tier machines [2][10]

Performance and Capabilities
- GPT-OSS provides responses similar to ChatGPT when summarizing information [3]
- GPT-OSS is noted for its speed and efficiency, which can be adjusted via reasoning-effort settings (low, medium, high) [4][5]
- The model's responses are comparable to early versions of ChatGPT, specifically GPT-3.5 [6]
- GPT-OSS may generate inaccurate information and cite non-existent sources, much like early versions of ChatGPT [7][8]

Use Cases and Limitations
- GPT-OSS is suitable for secure communication and entertainment purposes [10]
- Online models such as ChatGPT, Deep Research, Gemini, or Claude are recommended for research because they can access the internet and provide verifiable sources [9]
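The claim that a 20-billion-parameter model fits on mid-tier hardware can be made concrete with back-of-envelope arithmetic. The quantization levels below are assumptions for illustration; the article does not specify how the weights are stored:

```python
def weight_memory_gib(params_billion, bits_per_weight):
    """Memory needed just to hold the model weights, in GiB.
    Ignores KV cache and activation overhead, which add more on top."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1024**3

# Illustrative precisions for a 20B-parameter model (assumed, not from the article):
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gib(20, bits):.1f} GiB")
```

At 16 bits per weight the parameters alone need roughly 37 GiB, well beyond mid-tier machines; at 4 bits they drop under 10 GiB, which is what makes local deployment plausible.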
Foundry Local: Cutting-Edge AI experiences on device with ONNX Runtime/Olive — Emma Ning, Microsoft
AI Engineer· 2025-06-27 10:21
Key Benefits of Local AI
- Addresses limitations of cloud AI in low-bandwidth or offline environments, exemplified by conference Wi-Fi issues [2][3]
- Enhances privacy and security by processing sensitive data locally, crucial for industries handling legal documents and patient information [4]
- Improves cost efficiency for applications deployed on millions of devices with high inference call volumes, such as game applications [5]
- Reduces real-time latency, essential for AI applications requiring immediate responses [5]

Foundry Local Overview
- Microsoft introduces Foundry Local, an optimized end-to-end solution for seamless on-device AI, leveraging existing assets such as Azure AI Foundry and ONNX Runtime [9]
- ONNX Runtime accelerates performance across various hardware platforms, with over 10 million downloads per month [8]
- The Foundry Local Management Service hosts and manages models on client devices and connects to Azure AI Foundry to download open-source models on demand [10]
- The Foundry Local CLI and SDK enable developers to explore models and integrate Foundry Local into applications [11]
- Foundry Local is available on Windows and macOS, integrated into the Windows platform for simpler AI development [12]

Performance and Customer Feedback
- Foundry Local accelerates performance across silicon from different vendors, including NVIDIA, Intel, AMD, and Qualcomm [12]
- Early adopters report ease of use and performance improvements, highlighting benefits like enhanced memory management and faster token generation [13][15][16]
- Foundry Local enables hybrid solutions, allowing parts of applications to run locally to address data-sensitivity concerns [17][18]
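The cost-efficiency argument for local inference can be made concrete with simple arithmetic. Every number below is an illustrative assumption, not a figure from the talk:

```python
def annual_cloud_cost(devices, calls_per_device_per_day, usd_per_call):
    """Estimated yearly cloud-inference bill for a deployed fleet.
    All inputs are hypothetical, chosen only to show how the bill scales."""
    return devices * calls_per_device_per_day * 365 * usd_per_call

# e.g. a game on 1,000,000 devices making 50 inference calls a day at $0.0001 each:
print(f"~${annual_cloud_cost(1_000_000, 50, 0.0001):,.0f} per year")
```

Because the bill scales linearly with fleet size and call volume, shifting inference onto the devices themselves turns a recurring per-call cost into a one-time engineering cost, which is the trade-off the talk highlights.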
CSDN Founder Jiang Tao: "Code Illiteracy" Is Disappearing, and a New Kind of Programmer Is Rising
36Kr· 2025-06-13 09:56
Core Insights
- The rapid rise of AI technologies is transforming user habits, traffic sources, and the foundational aspects of business, with ChatGPT achieving 800 million users in record time and DeepSeek gaining traction globally [1][5]
- The emergence of local AI solutions is seen as a response to the dominance of American technologies, particularly in terms of computing power, model access, and data control [3][6][7]

Group 1: AI Market Dynamics
- CSDN has 49 million registered users and aims to become a productivity platform for developers in the AI era, marking a significant shift in the industry [5]
- The AI sector is experiencing unprecedented growth, with companies like Cursor rapidly achieving revenue milestones, highlighting a shift from traditional internet platforms to AI-driven solutions [5][6]

Group 2: Challenges and Opportunities
- The "three mountains" of power in AI, computing power (CUDA), model access (closed models), and data control (English-dominated datasets), pose significant challenges for non-English-speaking countries [3][6]
- The need for a diverse and open data ecosystem is emphasized to overcome data hegemony and enable local AI development [6][7]

Group 3: Future of Programming
- The concept of "code blindness" is expected to fade as more individuals, including product managers, gain the ability to develop applications independently, transforming the programming landscape [8][9]
- The number of developers is projected to grow significantly, with AI tools making coding more accessible and efficient [8][9]

Group 4: AI and Hardware Integration
- AI is not only transforming software but also hardware, with low-cost solutions enabling the integration of AI capabilities into physical products [11][12]
- China's manufacturing capabilities are highlighted as a significant advantage in leveraging AI for hardware innovation, potentially leading to new industry creations [12][13]
CSDN Founder Jiang Tao: "Code Illiteracy" Is Disappearing, and a New Kind of Programmer Is Rising
AI科技大本营· 2025-06-13 07:51
Core Viewpoint
- The article discusses the transition from Global AI to Local AI, emphasizing the need for countries and companies to establish their own data stacks to overcome the "three mountains" of power held by the U.S. in AI technology, models, and data [3][10]

Group 1: Transition to AI
- The shift from traditional internet to AI represents a fundamental change in user habits, traffic sources, and business foundations [2]
- ChatGPT has rapidly gained 800 million users, showcasing the speed of AI adoption, while other AI companies are experiencing significant revenue growth [7]
- The emergence of DeepSeek signifies a move towards global equity in AI, challenging the dominance of U.S.-based AI solutions [7][10]

Group 2: The Three Mountains
- The "three mountains" that need to be overcome are:
  1. **Computing Power Dominance**: The U.S. maintains control through CUDA, necessitating the development of alternative systems like Huawei's CANN and AMD's ROCm [8]
  2. **Model Dominance**: The closed nature of U.S. models limits access, prompting the need for open-source alternatives like DeepSeek [9]
  3. **Data Dominance**: The reliance on English-dominated datasets restricts the development of localized AI solutions, highlighting the need for diverse, multilingual datasets [9]

Group 3: The Future of Programming
- The article predicts the decline of "code illiteracy," with more individuals becoming capable of programming as AI tools simplify the coding process [11][12]
- The number of developers is expected to grow significantly, with GitHub reporting 190 million developers, increasing by 20% annually [11]
- The role of traditional programmers will evolve, as many tasks can now be automated by AI, allowing non-programmers to create applications independently [12][15]

Group 4: AI's Impact on Hardware
- AI is transforming not only software but also hardware, enabling low-cost programming of physical devices [16]
- The integration of AI with hardware manufacturing in China presents significant opportunities, as demonstrated by successful startups leveraging AI for product development [17]
- The future will see a blend of software and hardware capabilities, allowing for innovative applications in various industries [17]

Group 5: The Future Landscape
- The next decade is expected to witness a massive industrial transformation driven by AI, with every individual gaining access to powerful AI tools [18]
- The shift from digitalization to intelligent systems will redefine the boundaries of software development and user interaction [18]