Vision Language Model
New DeepSeek just did something crazy...
Matthew Berman· 2025-10-22 17:15
DeepSeek OCR Key Features
- DeepSeek OCR is a novel approach to image recognition that compresses text by 10x while maintaining 97% accuracy [2]
- The model uses a vision language model (VLM) to compress text into an image, allowing 10 times more text in the same token budget [6][11]
- The method achieves 96%+ OCR decoding precision at 9-10x text compression, 90% at 10-12x compression, and 60% at 20x compression [13]

Technical Details
- The model splits the input image into 16x16 patches [9]
- It uses SAM, an 80 million parameter model, to look for local details [10]
- It uses CLIP, a 300 million parameter model, to store information about how to put the images together [10]
- The output is decoded by DeepSeek 3B, a 3 billion parameter mixture-of-experts model with 570 million active parameters [10]

Training Data
- The model was trained on 30 million pages of diverse PDF data from the internet, covering approximately 100 languages [21]
- Chinese and English account for approximately 25 million pages, and other languages account for 5 million pages [21]

Potential Impact
- This technology could potentially 10x the context window of large language models [20]
- Andrej Karpathy suggests that pixels might be better inputs to LLMs than text tokens [17]
- An entire encyclopedia could be compressed into a single high-resolution image [20]
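The patch and compression figures above reduce to simple arithmetic. The following is a minimal sketch, not DeepSeek's code: the function names, the 1024x1024 image size, and the token counts are assumptions chosen for illustration. The summary does not say how patches map to the final number of vision tokens, so the two quantities are kept separate.

```python
def patch_count(width: int, height: int, patch: int = 16) -> int:
    """Number of non-overlapping patch x patch tiles that cover the image."""
    if width % patch or height % patch:
        raise ValueError("image sides must be multiples of the patch size")
    return (width // patch) * (height // patch)


def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Text tokens represented per vision token (10.0 means 10x compression)."""
    if vision_tokens <= 0:
        raise ValueError("vision_tokens must be positive")
    return text_tokens / vision_tokens


# Assumed example: a 1024x1024 page image split into 16x16 patches.
print(patch_count(1024, 1024))

# Assumed example: 1,000 text tokens rendered into 100 vision tokens.
print(compression_ratio(1000, 100))
```

At a ratio near 10, the summary reports decoding precision of 96% or better; at 20, it reports about 60%.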
Bridging the Sim-to-Real Gap for Accelerated Robot Training
NVIDIA· 2025-08-12 02:07
Core Technology & Solution
- NVIDIA Cosmos is a world foundation model platform designed for developers to generate training data at an industrial scale [2]
- Cosmos Predict generates realistic training data from an initial observation, creating diverse action variations using text prompts or action triggers [2]
- Cosmos supports multi-view outputs, providing different perspectives from a single frame, which is especially useful for autonomous vehicles and multi-camera robots [3]
- Cosmos Transfer applies appearance variations to 3D renders or real-world video, adjusting materials, lighting, weather, and environments to train models that generalize across domains while preserving physical accuracy [3]
- Cosmos Reason, a vision language model, filters low-quality samples, annotates scenes, and supports policy training, enabling safe, efficient decision-making [4]
- Cosmos world foundation models are adaptable and can be post-trained to fit different sensors and perspectives [5]

Industry Application & Impact
- The fusion of AI and computer graphics, exemplified by Cosmos, enables robots and autonomous machines to operate safely in the real world [5]
- The technology addresses the challenge of expensive and time-consuming real-world training data capture or manual synthetic data creation for robotics [1]
WeRide Inc.(WRD) - 2024 Q4 - Earnings Call Transcript
2025-03-14 11:00
Financial Data and Key Metrics Changes
- Total revenue for Q4 2024 decreased by 3% to RMB 141 million, primarily due to a decline in service revenue, partially offset by growth in product revenue [23]
- For the full year 2024, total revenue was RMB 361 million, with product revenue increasing by 62% to RMB 88 million, while service revenue decreased by 21% to RMB 273 million [24]
- Operating expenses rose by 82% to RMB 640 million in Q4 and by 32% to RMB 2.3 billion for the full year 2024, mainly due to higher personnel-related expenses and share-based compensation [24][25]
- Net loss increased by 66% to RMB 592 million in Q4 and by 29% to RMB 2.5 billion for the full year 2024 [28]

Business Line Data and Key Metrics Changes
- Product revenue grew substantially, increasing by 46% to RMB 52 million in Q4, driven by sales of RoboTaxi, RoboSweeper, and RoboVAN [23]
- Service revenue decreased to RMB 89 million in Q4, mainly due to the completion of customized R&D services for certain clients [24]
- The RoboTaxi fleet currently numbers around 400, with over 1,000 autonomous vehicles across all product lines [101]

Market Data and Key Metrics Changes
- The company has expanded its footprint to over 20 cities in China and has deployed L4 vehicles across more than 30 cities in 10 countries [13]
- The company launched its first strategic robotaxi pilot project in Switzerland and has begun commercial operations in Abu Dhabi [10][12]

Company Strategy and Development Direction
- The company focuses on international expansion and on establishing a robust ecosystem through strategic partnerships, including collaborations with Uber and Bosch [12][92]
- The strategy includes diversifying revenue streams through product innovation and upgrades, with a commitment to scaling the RoboTaxi fleet [20][21]
- The company aims to maintain an asset-light business model while expanding its global presence [38][64]

Management's Comments on Operating Environment and Future Outlook
- Management expressed optimism about long-term potential despite current economic headwinds, emphasizing the importance of internationalization and technology innovation [14][20]
- The company believes that overcoming challenges in scaling operations will create opportunities for growth and resilience [50][52]
- Management highlighted the importance of safety and regulatory acceptance in the autonomous driving industry [11][41]

Other Important Information
- The company has achieved record-breaking robotaxi revenue and the highest international revenue since its founding [29]
- The company holds autonomous driving permits from four countries and has a track record of zero regulatory incidents caused by autonomous driving system failures [41]

Q&A Session Summary
Question: What is WeRide's current robotaxi business model?
- The company provides autonomous vehicles and services to local partners through a combination of vehicle sales, fixed service fees, and revenue-sharing arrangements, ensuring positive contribution margins from day one [34]
Question: What is WeRide's current market position in the robotaxi industry?
- The company maintains a leading position in technology and commercialization, with a large fleet and a strong safety record [40]
Question: What major challenges does WeRide foresee in scaling global robotaxi operations?
- Challenges include city-level regulatory approvals and operational scalability, but these challenges also present opportunities for establishing trust with local governments [50]
Question: How does WeRide view the future competitive landscape?
- The company believes that the high entry barriers in the global robotaxi market limit new competition, and that it is well positioned to maintain its leadership through advanced technology [52]
Question: What is the current cost and path to profitability for the robotaxi business?
- The company operates a hybrid fleet model, with competitive BOM costs and expectations for quick profitability in target markets [60]
Question: Why is WeRide expanding in multiple countries instead of focusing solely on the domestic market?
- The company sees strong demand in international markets and believes that a multi-market approach diversifies revenue streams and improves resilience against market fluctuations [64]
Question: What are the next milestones regarding technology and product development?
- The company plans to deploy hundreds of vehicles with advanced technology and to continue integrating end-to-end models into its L4 systems [71][75]
Question: How does WeRide's ADAS initiative interact with its L4 technology?
- The ADAS system is designed to integrate seamlessly with the L4 system, leveraging shared data pipelines and toolchains for efficient development [80]
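The revenue figures reported in the WeRide summary can be cross-checked with a few lines of arithmetic. This is a minimal sketch, not anything from the call itself: the helper names are hypothetical, amounts are in RMB millions as quoted in the summary, and the implied prior-period value is derived from the stated percentage change, so it is an estimate and not a reported number.

```python
# Figures quoted in the summary, in RMB millions.
Q4_2024 = {"product": 52, "service": 89, "total": 141}
FY_2024 = {"product": 88, "service": 273, "total": 361}


def segments_match_total(figures: dict, tolerance: float = 1.0) -> bool:
    """Check that product + service revenue equals the total, within rounding."""
    segment_sum = figures["product"] + figures["service"]
    return abs(segment_sum - figures["total"]) <= tolerance


def implied_prior(current: float, pct_change: float) -> float:
    """Back out the prior-period value from the current value and a % change."""
    return current / (1 + pct_change / 100)


print(segments_match_total(Q4_2024))  # 52 + 89 against 141
print(segments_match_total(FY_2024))  # 88 + 273 against 361

# Q4 total revenue "decreased by 3% to RMB 141 million" implies a prior-year
# Q4 of roughly this many RMB millions (derived, not reported).
print(round(implied_prior(141, -3), 1))
```

Both segment sums agree with the stated totals, which suggests the quarterly and full-year figures in the summary are internally consistent.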