Visual AI

Multimodality Is All Fake: The Strongest Models Can't Count Fingers or Recognize "Leibi" (雷碧, a Sprite knockoff)
Hu Xiu · 2025-07-22 07:21
Core Insights: The article discusses the limitations of AI models in recognizing images, using the example of a six-fingered hand to show how models rely on training data and probability rather than true visual understanding [38][41].

Group 1: Multimodal Models
- "Multimodal" refers to models that can process different types of data, such as audio and visual inputs, but many models billed as multimodal have not undergone proper multimodal training [7][8].
- True multimodal capability means integrating multiple sensory inputs; current models often struggle with complex visual data because of inherent limitations in their training datasets [8][30].

Group 2: Image Recognition Challenges
- AI models do not "see" images in the human sense. They process images as numerical data, which must be preprocessed and converted into high-dimensional vectors before recognition can happen [10][11].
- Recognition relies heavily on labeled training data, from which the model learns to associate images with descriptions; as a result, it is biased toward features that are common in the training set [14][15].

Group 3: Data Limitations
- Training data rarely covers the full range of real-world scenarios, so outlier cases such as a six-fingered hand are hard to recognize [29][30].
- Because models are trained mainly on common patterns, they may fail to identify rare or unusual features unless they are specifically trained on those cases [30][41].

Group 4: Task-Specific Limitations
- Whether a model recognizes a specific feature, such as the number of fingers on a hand, depends on the task it was designed for; recognizing a hand does not necessarily require counting its fingers [18][36].
- The article stresses that even when models are trained to recognize specific features, they still operate within the constraints of their training data and their defined tasks [36][39].

Group 5: Conclusion and Future Opportunities
- AI models are fundamentally probability-driven systems that need continuous calibration against real-world data to improve accuracy and reduce hallucinations [41][42].
- Acknowledging the limitations of current models and investing in diverse training data may open new opportunities for industries seeking to use AI effectively [42].
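The two technical claims in Groups 2 and 3, that a model sees an image only as numbers and that its answer is weighted by how common each pattern was in training, can be illustrated with a toy sketch. This code is hypothetical and is not taken from the article or from any real model: `patchify` turns a tiny grayscale image into a sequence of vectors, loosely in the spirit of ViT-style preprocessing, and `posterior` applies Bayes' rule with a prior estimated from assumed training labels. The label counts and likelihood values are made up for illustration.

```python
from collections import Counter

# Hypothetical illustration only: not the article's code or any real model's pipeline.


def patchify(image, patch):
    """Split a 2D grayscale image (list of equal-length rows of 0-255 ints)
    into non-overlapping patch x patch tiles, flatten each tile into a
    vector, and scale pixel values to [0, 1].

    This mimics the ViT-style "image -> sequence of vectors" step only in
    spirit; real models also apply a learned projection and position info.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image height and width must be divisible by patch")
    vectors = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Row-major flattening of one tile into a single numeric vector.
            vectors.append([image[r][c] / 255.0
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return vectors


def posterior(train_labels, likelihood):
    """Toy Bayes rule: P(label | evidence) is proportional to
    P(evidence | label) * P(label), with P(label) estimated from how often
    each label appears in the (assumed) training labels."""
    counts = Counter(train_labels)
    total = len(train_labels)
    scores = {label: likelihood.get(label, 0.0) * n / total
              for label, n in counts.items()}
    norm = sum(scores.values())
    if norm == 0:
        raise ValueError("evidence has zero likelihood under every label")
    return {label: s / norm for label, s in scores.items()}


if __name__ == "__main__":
    tiny = [[0, 255, 0, 255],
            [255, 0, 255, 0],
            [0, 0, 255, 255],
            [0, 0, 255, 255]]
    print(patchify(tiny, 2))  # four 4-dimensional vectors

    # Assumed training set: 999 five-finger hands, one six-finger hand.
    labels = ["5 fingers"] * 999 + ["6 fingers"]
    # Assumed likelihoods: the image evidence favors six fingers 9 to 1.
    probs = posterior(labels, {"5 fingers": 0.1, "6 fingers": 0.9})
    print(max(probs, key=probs.get), round(probs["5 fingers"], 3))
```

Under these assumed numbers, evidence that favors six fingers nine to one still yields a posterior of about 0.99 for "5 fingers", because the 999-to-1 prior outweighs it. This mirrors the outlier failure described in Group 3; real vision-language models are far more complex than a two-label Bayes rule, so the sketch only conveys the intuition.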
The "CV Iron Triangle" Lands at Meta: How Will Visual AI Evolve Toward Multimodality?
Synced (机器之心) · 2025-07-19 05:49
Group 1
- The article examines Meta's strategic hiring of the "CV Triangle" and what it implies for the evolution of visual AI toward multimodal capabilities [4][5][6].
- The "CV Triangle" consists of three key researchers from OpenAI Zurich, previously at Google Brain, whose work has significantly shaped modern multimodal AI frameworks [5][6].
- The article outlines five representative works led by the trio, S4L, BiT, ViT, MLP-Mixer, and PaLI, which together advanced visual AI and its integration with other modalities [5][6][7].

Group 2
- The article identifies the milestones on the path from visual AI to multimodal AI, emphasizing the need for continued research and development in the field [8].
Flight Shown as Hijacked? Feichangzhun (飞常准) Responds
Yicai (第一财经) · 2025-05-06 13:36
Group 1
- The Feichangzhun app showed Air China flight CA929 issuing a "7500" alert (transponder code 7500 signals unlawful interference, i.e. a hijacking); Feichangzhun responded that the flight was safe and the alert was being verified [1].
- Microsoft officially shut down Skype and moved its core functions to Microsoft Teams, ending Skype's history of more than 20 years [1].
- The site of SpaceX's Texas headquarters has been incorporated as a city named "Starbase", the first new city in Cameron County in nearly 30 years [1].

Group 2
- Lei Jun has moved from executive director to director of Xiaomi Home Business Company, whose business scope now includes smart home consumer devices [2].
- Liu Qiangdong recalled bringing 76 eggs to university because money was tight, and stressed that entrepreneurship should create value for society [3].
- Yushu Technology's design patent for a humanoid robot has been granted; the patent covers a mechanical robot [4].

Group 3
- Kimi's long-thinking model API is now officially available, featuring multimodal reasoning capabilities [5].
- SenseTime has signed a memorandum of cooperation with China Mobile Hong Kong and the Faculty of Law of the Chinese University of Hong Kong to collaborate on visual AI and large models [6].
- OpenAI announced that it will remain under the control of its non-profit organization, abandoning a previously proposed restructuring plan [7].

Group 4
- Seres reported a 12.99% year-on-year increase in April sales, and cumulative sales of the AITO M9 rose 41.19% from January to April [8].
- Li Auto announced that the smart-refresh versions of its L series will launch on May 8 [9].
- Zeekr made urgent personnel changes amid sales pressure, appointing new leadership for domestic marketing [10].

Group 5
- Pony.ai and Uber have formed a strategic partnership to bring Pony.ai's Robotaxi services to the Uber platform, starting in the Middle East [11].
- Maserati (China) has changed leadership, with Santo Ficili taking over as legal representative and chairman [12].
- Tesla stated that more than 95% of the parts for the Model 3 and the refreshed Model Y are produced in China [13].

Group 6
- Tesla's new-car sales in the UK fell 62% year-on-year in April, the lowest level in more than two years [14].
- Ford expects tariffs to cost it about $1.5 billion and has withdrawn its 2025 earnings forecast [14].
- LEGOLAND Shanghai will sell limited-edition annual passes starting at 1,399 yuan and hotel packages starting at 3,588 yuan [15].

Group 7
- The market supervision authority in Xuchang inspected Pang Donglai's jade sales for compliance and found its sales practices proper [16].
- Pang Donglai's official website and app are temporarily closed for maintenance; the founder invited critics to visit and learn about the business [16].
- Yonghui Supermarket publicly opposed any attempt to undermine business ethics for publicity and expressed solidarity with Pang Donglai [17].

Group 8
- Tea Baidao reported a 50% increase in overall sales over the May Day holiday, with sales at some stores surging more than 17-fold [18].
- Nayuki's Tea saw order volume rise 300% at some stores over the May Day holiday, particularly in tourist areas [19].
- Domestic gold jewelry prices have topped 1,000 yuan per gram as international gold prices rise [19].

Group 9
- Alibaba's subsidiary Daniao Logistics has increased its registered capital to approximately 498 million yuan [20].
- Credit Suisse reached an agreement with the U.S. Department of Justice to pay $511 million to resolve tax-related charges [21].
- CATL has established new energy technology companies in Hangzhou and Ya'an, each with a registered capital of 5 million yuan [21].
SenseTime Reaches Cooperation with China Mobile Hong Kong and the CUHK Faculty of Law in Visual AI, Large Models and Other Fields
News flash · 2025-05-06 03:49
Group 1
- SenseTime has signed a memorandum of cooperation with China Mobile Hong Kong and the Faculty of Law of the Chinese University of Hong Kong [1].
- The collaboration will focus on areas such as visual AI and large models [1].