Model Training
Alarum Technologies (ALAR) - 2025 Q2 - Earnings Call Transcript
2025-08-28 13:30
Financial Data and Key Metrics Changes
- The company reported second-quarter revenue of $8.8 million, a slight decrease from $8.9 million in the same period last year, attributed to a shift in customer mix toward the AI segment [16][19]
- Non-IFRS gross margin for Q2 2025 was 63%, down from 78% in Q2 2024, reflecting the impact of strategic investments and lower margins from new projects [17]
- Non-IFRS net profit was $300,000 in Q2 2025, compared with a non-IFRS net loss of $400,000 in Q2 2024 [19]
- Adjusted EBITDA for Q2 2025 was $1 million, down from $3.4 million in Q2 2024 [19]

Business Line Data and Key Metrics Changes
- The company is seeing significant growth in the AI segment, which is replacing customers from other segments, resulting in a net retention rate (NRR) of 0.98 [16]
- New projects with major AI and e-commerce platforms have launched, marking a shift toward larger deal sizes and greater revenue potential [7][8]

Market Data and Key Metrics Changes
- Demand for data collection services is rising, driven by the need for AI training data, positioning the company favorably in the evolving market landscape [6][9]
- The customer base now includes major tech giants and emerging startups, indicating a broadening market reach [7]

Company Strategy and Development Direction
- The company is reinvesting earnings into scaling operations, expanding infrastructure, and broadening its IP proxy network to capture long-term value from major AI-driven customers [16][13]
- The focus is on building a robust talent pool and developing a suite of data collection products designed for the AI era, with the aim of cross-selling to existing customers [11][13]

Management's Comments on Operating Environment and Future Outlook
- Management highlighted the dynamic and unpredictable nature of the AI market, urging investors to evaluate the company's performance over multiple quarters rather than quarter by quarter [12]
- The company anticipates revenue for 2025 of at least $12.8 million, representing a 78% year-over-year increase, with adjusted EBITDA expected to be around $1.1 million [22]

Other Important Information
- The company has a strong balance sheet, with cash and liquid investments of approximately $25 million, allowing strategic investments while maintaining a focus on sustainable value creation [14][21]
- The company is in a transition phase, with operating expenses rising to $5.4 million due to higher employee-related costs, particularly in R&D [18]

Q&A Session Summary
Question: Clarification on the large customer ramp in Q3
- Management explained that lower margins stem from the new product's infrastructure costs, which are currently high given the scale of the project [27][28]
Question: Infrastructure costs and margin recovery
- Management indicated that significant volume increases would be needed to recover margins, with cost-structure improvements expected as the project scales [31][32]
Question: Broader customer base usage trends
- Management noted a significant increase in demand from AI and data-driven customers, with a strong pipeline of new logos expected [36][38]
Question: Customer lifetime value and stability
- Management expressed optimism that the new AI-driven customer base could deliver higher customer lifetime value and greater stability over time [42][47]
Question: Contribution of the large customer to Q2 results
- Management confirmed that the large customer has been ramping up and is already contributing a respectable amount to revenue [51]
Question: Visibility into projected revenues
- Management expressed confidence in the projected $3 million of revenue for Q3, with ongoing demand expected [56][57]
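The 0.98 net retention rate cited above follows from the standard NRR formula: revenue retained from the existing customer base divided by that base's starting revenue. A minimal sketch, with hypothetical revenue components chosen only to land near 0.98 (the call does not disclose the underlying figures):

```python
def net_retention_rate(start_arr: float, expansion: float,
                       contraction: float, churn: float) -> float:
    """NRR = (starting recurring revenue + expansion
              - contraction - churn) / starting recurring revenue."""
    return (start_arr + expansion - contraction - churn) / start_arr

# Hypothetical figures (in $M) chosen only to illustrate an NRR near 0.98:
nrr = net_retention_rate(start_arr=8.9, expansion=1.0,
                         contraction=0.4, churn=0.78)
print(round(nrr, 2))  # 0.98
```

An NRR just below 1.0 is consistent with the narrative above: AI-segment growth is nearly, but not quite, offsetting revenue lost from other segments.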
Hot Topic! A Mysterious Bug Appears in DeepSeek V3.1: Is the Model Malfunctioning?
程序员的那些事· 2025-08-26 12:35
Core Viewpoint
- The release of DeepSeek V3.1 brings significant improvements in reasoning efficiency and memory usage, but it also exhibits an unexpected bug: tokens such as "极" and "extreme" appear at random during text generation [1][2][25]

Group 1: Version Improvements
- DeepSeek V3.1 features a hybrid reasoning architecture that improves reasoning efficiency by 20%-50% and supports 128K long-context processing [1]
- The update adopts the UE8M0 FP8 parameter precision format, cutting memory usage by 75% [1]
- The model is now compatible with domestic next-generation chips, reducing reliance on imported GPUs [1]

Group 2: User Feedback and Issues
- Users report that the V3.1 model inserts unexpected tokens such as "极" and "extreme" at random during text generation [2][12]
- The issue has been observed across platforms, including third-party APIs such as VolcEngine and even the DeepSeek official website, with a higher occurrence rate on third-party platforms [12][15]
- Developers are puzzled because the model fails to fix the token insertions even when explicitly prompted to do so [3][12]

Group 3: Technical Analysis
- Some analysts suggest the appearance of the token "极" (token ID: 2577) may be residue from the training data, pointing to a possible flaw in data cleaning [25][26]
- The model may have learned to treat "极" as a semantic boundary marker because of its presence in the training data, leading to its random appearance in outputs [25][26]
- The issue reflects a broader concern that large models may not genuinely understand language, but instead learn statistical patterns from data [27][28]
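One practical way to quantify this kind of misbehavior is to scan a batch of model outputs for the suspect tokens and measure how often they appear. The sketch below is illustrative only; the sample strings are hypothetical, not actual DeepSeek outputs:

```python
from collections import Counter

def anomalous_token_rate(outputs, suspects=("极", "extreme")):
    """Return the fraction of outputs containing any suspect token,
    plus per-token occurrence counts across the batch."""
    hits = Counter()
    affected = 0
    for text in outputs:
        found = [s for s in suspects if s in text]
        if found:
            affected += 1
            hits.update(found)
    return affected / len(outputs), dict(hits)

# Hypothetical outputs mimicking the reported bug:
samples = [
    "for i in range(10):极 print(i)",
    "The result is 42.",
    "def f(x): return x * extreme 2",
]
rate, counts = anomalous_token_rate(samples)
print(counts)  # {'极': 1, 'extreme': 1}
```

Running a check like this against the same prompts on different serving platforms would make the reported official-site vs. third-party discrepancy directly measurable.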
GPT-oss Goes Off the Rails: It Invents a Programming Problem Unprompted, Then Solves It 5,000 Times
量子位· 2025-08-11 08:32
Core Viewpoint
- The article discusses the peculiar behaviors and hallucinations exhibited by the GPT-oss model, particularly in its problem-solving and language processing, suggesting it may have been over-optimized for specific reasoning tasks at the cost of natural outputs [1][33]

Group 1: Model Behavior and Performance
- GPT-oss generated a complex programming problem about domino placement on a grid without any prompt, consuming over 30,000 tokens in the process [2][17]
- The model repeated this problem-solving behavior over 5,000 times, indicating the task is deeply bound to its training objectives and that training may have skewed toward specific reasoning tasks [19]
- The model's outputs lean strongly toward mathematics and coding rather than natural language or casual conversation, suggesting it was not designed for everyday dialogue [13][11]

Group 2: Training Data and Language Processing
- Analysis of the training data shows broad coverage of programming languages, with a notably high representation of Perl, though the author questioned the reported proportions of Java and Kotlin [7][9]
- The model frequently switches between multiple languages mid-reasoning, sometimes drifting into a unique internal idiom dubbed "Neuralese", indicating complex internal processing mechanisms [21][23]
- Anomalies in the model's outputs, such as unusual symbols and references, may stem from OCR processing of training data, introducing errors or misinterpretations [25][27]

Group 3: Hallucination Rates and Limitations
- Hallucination rates are notably high: the 20-billion-parameter model showed a 91.4% hallucination rate in certain evaluations [34]
- The model has generated non-existent theories, such as a "quantum gravity wave theory", highlighting its limitations outside mathematical or programming contexts [36][37]
- Performance on everyday tasks is inconsistent, often failing at casual conversation or producing irrelevant outputs [37]
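A hallucination rate like the 91.4% figure above is, at bottom, a fraction of graded responses. A minimal sketch of that bookkeeping, using a hypothetical grading run of 500 responses sized to reproduce the cited figure (the actual evaluation set and grader are not described in the article):

```python
def hallucination_rate(judgments):
    """Fraction of graded responses judged hallucinated.
    `judgments` is a list of booleans from a (human or automated) grader:
    True = the response contained a hallucination."""
    if not judgments:
        raise ValueError("no graded responses")
    return sum(judgments) / len(judgments)

# Hypothetical run: 457 of 500 responses judged hallucinated -> 91.4%
print(hallucination_rate([True] * 457 + [False] * 43))  # 0.914
```

The hard part in practice is the grader itself, not the arithmetic: deciding what counts as a hallucination dominates the error bars on any such percentage.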
Tencent Files Patent on Model Training and Information Delivery to Improve the Accuracy of Delivery Prediction Models
Jin Rong Jie· 2025-08-07 03:21
Core Insights
- Tencent Technology (Shenzhen) Co., Ltd. has applied for a patent titled "Model Training Method, Information Delivery Method, Device, Equipment, and Medium", publication number CN120430833A, filed in February 2024 [1]
- The patent describes a method that obtains positive samples, negative samples, and unlabeled samples, which are used to train a label prediction model and a delivery prediction model [1]

Company Overview
- Tencent Technology (Shenzhen) Co., Ltd. was established in 2000, is located in Shenzhen, and is primarily engaged in software and information technology services [1]
- The company has a registered capital of 2 million USD, has invested in 15 enterprises, participated in 263 bidding projects, and holds 5000 trademark records and 5000 patent records [1]
- The company also holds 527 administrative licenses [1]
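The abstract's combination of positive, negative, and unlabeled samples resembles a pseudo-labeling setup: a label model trained on the labeled samples tags the unlabeled pool, and the enlarged set then trains the delivery model. The sketch below uses toy nearest-centroid classifiers for both models; the feature vectors and classifiers are invented for illustration and are not disclosed in the patent:

```python
def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    dims = len(rows[0])
    return [sum(r[d] for r in rows) / len(rows) for d in range(dims)]

def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pseudo_label(pos, neg, unlabeled):
    """Stage 1: 'label prediction model' = nearest-centroid classifier
    assigning each unlabeled sample a True (positive) / False label."""
    cp, cn = centroid(pos), centroid(neg)
    return [(x, dist2(x, cp) < dist2(x, cn)) for x in unlabeled]

def train_delivery_model(pos, neg, unlabeled):
    """Stage 2: fold the pseudo-labeled samples into the training set,
    then fit the 'delivery prediction model' (again a centroid pair)."""
    extra = pseudo_label(pos, neg, unlabeled)
    all_pos = pos + [x for x, is_pos in extra if is_pos]
    all_neg = neg + [x for x, is_pos in extra if not is_pos]
    cp, cn = centroid(all_pos), centroid(all_neg)
    return lambda x: dist2(x, cp) < dist2(x, cn)  # True = deliver

model = train_delivery_model(
    pos=[(0.9, 0.8), (0.8, 0.9)],
    neg=[(0.1, 0.2), (0.2, 0.1)],
    unlabeled=[(0.7, 0.7), (0.3, 0.2)],
)
print(model((0.85, 0.9)))  # True
```

The point of the two-stage split is that unlabeled delivery data is usually far more plentiful than labeled outcomes, so pseudo-labeling lets the final model train on much more data than the labels alone provide.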
Tencent Files Patent on a Model Training Method, Device, Electronic Equipment, and Storage Medium to Improve Model Inference Accuracy
Jin Rong Jie· 2025-08-05 13:22
Group 1
- Tencent Technology (Shenzhen) Co., Ltd. has applied for a patent titled "Model Training Method, Device, Electronic Equipment, and Storage Medium", publication number CN120431962A, filed in June 2025 [1]
- The patent describes obtaining sample data sets for multiple training stages, sorted by training difficulty from easy to hard, and training an initial model on these data sets stage by stage [1]
- The multi-stage process aims to improve inference accuracy by optimizing the model based on the correctness of its inference results and applying reinforcement learning to reach a target model [1]

Group 2
- Tencent Technology (Shenzhen) Co., Ltd. was established in 2000 and is primarily engaged in software and information technology services, with a registered capital of 2 million USD [2]
- The company has invested in 15 enterprises and participated in 263 bidding projects, holding 5000 trademark records and 5000 patent records, along with 527 administrative licenses [2]
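The easy-to-hard staging the abstract describes is essentially curriculum learning with a reward signal on inference correctness. A hedged sketch of that loop, where `ToyModel`, the difficulty scores, and the update rule are placeholders rather than anything the patent specifies:

```python
class ToyModel:
    """One-parameter threshold classifier standing in for the model."""
    def __init__(self):
        self.t = 0.0
    def infer(self, x):
        return x > self.t
    def update(self, sample, reward):
        if reward < 0:
            # nudge the threshold toward classifying this sample correctly
            self.t += 0.1 if sample["y"] is False else -0.1

def staged_training(model, samples, n_stages=3, epochs_per_stage=2):
    """Sort samples by difficulty (easy -> hard), split into stages,
    and train on each stage in order, rewarding correct inferences."""
    ordered = sorted(samples, key=lambda s: s["difficulty"])
    stage_size = len(ordered) // n_stages
    for stage in range(n_stages):
        batch = ordered[stage * stage_size:(stage + 1) * stage_size]
        for _ in range(epochs_per_stage):
            for s in batch:
                correct = model.infer(s["x"]) == s["y"]
                # stand-in for the reinforcement-learning step
                model.update(s, reward=1.0 if correct else -1.0)
    return model

# Samples whose difficulty grows as they approach the 0.5 boundary:
samples = [
    {"x": 0.9,  "y": True,  "difficulty": 1},
    {"x": 0.1,  "y": False, "difficulty": 1},
    {"x": 0.6,  "y": True,  "difficulty": 2},
    {"x": 0.4,  "y": False, "difficulty": 2},
    {"x": 0.55, "y": True,  "difficulty": 3},
    {"x": 0.45, "y": False, "difficulty": 3},
]
trained = staged_training(ToyModel(), samples)
print(trained.infer(0.9), trained.infer(0.1))  # True False
```

The intuition behind the staging is the same as in curriculum learning generally: coarse decisions are locked in on easy examples before the hard, near-boundary examples fine-tune them.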
Tencent Files Patent on a Model Training Method to Keep the Target Model's Iteration Direction Correct
Jin Rong Jie· 2025-08-05 07:19
Group 1
- Tencent Technology (Shenzhen) Co., Ltd. applied for a patent titled "Model Training Method, Device, Equipment, Storage Medium, and Computer Program Product", publication number CN120409606A, filed in April 2025 [1]
- The application describes determining a target model's first preference difference on training samples and a reference model's second preference difference on the same samples [1]
- Total loss values are computed from the preference differences and the model parameters updated accordingly, ensuring the target model iterates in the correct direction [1]

Group 2
- Tencent Technology (Shenzhen) Co., Ltd. was established in 2000 and is primarily engaged in software and information technology services, with a registered capital of 2 million USD [2]
- The company has invested in 15 enterprises and participated in 263 bidding projects, holding 5000 trademark records and 5000 patent records [2]
- The company also holds 527 administrative licenses [2]
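The pairing of a target model's preference difference with a reference model's preference difference closely resembles the pairwise loss popularized by DPO (Direct Preference Optimization). The sketch below shows that style of loss as an assumption about what the patent's "total loss" might look like, not as the patent's actual formula:

```python
import math

def preference_loss(target_logp_chosen, target_logp_rejected,
                    ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO-style pairwise loss: the target model's preference margin
    ("first difference") is anchored against the reference model's
    margin ("second difference"), which keeps each update from
    drifting away from the reference, i.e. iterating in the wrong
    direction."""
    target_diff = target_logp_chosen - target_logp_rejected
    ref_diff = ref_logp_chosen - ref_logp_rejected
    margin = beta * (target_diff - ref_diff)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid

# A target that prefers the chosen sample more strongly than the
# reference does earns a loss below log(2):
print(preference_loss(-1.0, -3.0, -1.5, -2.5) < math.log(2))  # True
```

Averaging this loss over a batch of preference pairs and backpropagating gives exactly the shape of update the abstract describes: the total loss shrinks only when the target's preferences move in the direction the labeled pairs endorse.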
Zhou Hongyi: 360 Has Recently Been Buying Huawei Chips; Domestic Chips Offer Strong Value for Money
Nan Fang Du Shi Bao· 2025-07-23 14:03
Group 1
- Zhou Hongyi acknowledged the gap between domestic chips and Nvidia's, but stressed that domestic products must be used for them to improve [1]
- 360 Group has recently procured Huawei chip products, signaling a shift toward domestic technology [1]
- Nvidia's H20 chip has been approved for sale to China; it is better suited to model inference, which still leaves opportunities for domestic AI chips [2]

Group 2
- DeepSeek contributed significantly to the popularity of reasoning models, though it recently saw a decline in monthly active users [2]
- The decline in DeepSeek's application traffic is not purely negative, as many cloud vendors still rely on DeepSeek's model services [2]
- The performance gains of open-source models have laid the foundation for this year's boom in AI agents, which are seen as key to putting AI into practice [3]

Group 3
- AI coding has emerged as a hot vertical for AI agents, with a focus on engineering capabilities such as context engineering and prompt engineering [3]
- Developing specialized AI agents tailored to different industries is recommended as a way to build unique technical barriers [3]
- The potentially disruptive future of AI agents has driven significant changes in companies' operational strategies, with a push for efficiency through AI [3]
China Mobile's Shandong Subsidiary and Parent Company File Patent on a Model Training and Question-Answering Method That Yields a Fully Trained QA Model
Jin Rong Jie· 2025-05-24 04:49
Group 1
- China Mobile Communication Group Shandong Co., Ltd. applied for a patent titled "Model Training Method and Question-Answer Method", publication number CN120030353A, filed in March 2025 [1]
- The patent describes determining the output result's second modality from modal parameters and a preset question posed in a first modality, where the modality represents the form of the question [1]
- The method generates a target answer for the preset question and adjusts the question-answer model's modal parameters until the training-completion criteria are met [1]

Group 2
- China Mobile Communication Group Shandong Co., Ltd. was established in 2000 in Jinan, with a registered capital of 6341.85 million RMB [2]
- The Shandong company has made one external investment, participated in 5000 bidding projects, and holds 617 patent records [2]
- China Mobile Communication Group Co., Ltd. was founded in 1999 in Beijing, with a registered capital of 30000 million RMB [2]
- The parent company has made 51 external investments, participated in 5000 bidding projects, and holds 5000 patent records [2]
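The loop in the abstract, picking an output modality from modal parameters and the question's input modality, then adjusting those parameters until answers match their targets, can be caricatured with a learnable score table. Everything here (the modality names, the update rule, the completion criterion) is invented for illustration; the patent does not disclose these details:

```python
def train_modal_params(examples, modalities=("text", "image", "audio"),
                       max_rounds=100):
    """Learn 'modal parameters': a score per (input modality,
    output modality) pair. Training stops when the selected output
    modality matches the target for every example (the stand-in
    'training-completion criterion')."""
    params = {(i, o): 0.0 for i in modalities for o in modalities}
    for _ in range(max_rounds):
        wrong = 0
        for q_modality, target_modality in examples:
            scores = {o: params[(q_modality, o)] for o in modalities}
            predicted = max(scores, key=scores.get)  # pick output modality
            if predicted != target_modality:
                wrong += 1
                params[(q_modality, target_modality)] += 1.0  # reinforce
        if wrong == 0:
            return params
    return params

# Hypothetical supervision: text questions should get image answers,
# audio questions should get text answers.
examples = [("text", "image"), ("audio", "text")]
params = train_modal_params(examples)
print(params[("text", "image")] > params[("text", "text")])  # True
```

In the real system the answer content would be generated by the QA model itself; this table only captures the modality-selection part of the abstract's loop.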