Visual Understanding Models
Alibaba Open-Sources the Qwen3-VL Series Flagship Model in Two Versions
Di Yi Cai Jing · 2025-09-25 06:08
Core Insights
- Alibaba has launched the upgraded Qwen3-VL series, the most powerful visual understanding model in the Qwen family to date [1]
- The flagship model, Qwen3-VL-235B-A22B, has been open-sourced in both Instruct and Thinking versions [1]
- The Instruct version outperforms or matches Gemini 2.5 Pro on several mainstream visual perception evaluations [1]
- The Thinking version achieves state-of-the-art (SOTA) results across a range of multimodal reasoning benchmarks [1]
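Since both checkpoints are released as open weights, a natural way to try them is through Hugging Face transformers. The sketch below is illustrative only: the repository ID, the auto classes, and the chat-template handling are assumptions based on how comparable open vision-language models are commonly loaded, not documented Qwen3-VL usage, so the official model card should be treated as authoritative.

```python
# Minimal sketch: querying an open-weight vision-language model with Hugging Face
# transformers. The repo ID below is an ASSUMPTION; the actual Qwen3-VL release may
# ship its own classes or preprocessing utilities, so follow the official model card.
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

MODEL_ID = "Qwen/Qwen3-VL-235B-A22B-Instruct"  # hypothetical Hub ID

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # a 235B-parameter MoE checkpoint needs multi-GPU sharding
    trust_remote_code=True,
)

# One image plus a text question, in the chat format most VLM processors accept.
image = Image.open("street_scene.jpg")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe the scene and read any visible signs."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```

The same pattern would apply to either checkpoint; switching from the Instruct to the Thinking version should only require changing the repository ID.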
Alibaba Open-Sources Visual Understanding Model Qwen3-VL
Renmin Caixun, September 24: At the 2025 Apsara Conference on September 24, Alibaba open-sourced its new-generation visual understanding model, Qwen3-VL. ...
DeepSeek Update; OpenAI Partners with Nvidia | 新鲜早科技 Morning Tech Briefing
Group 1: Partnerships and Collaborations
- OpenAI and Nvidia announced a partnership under which Nvidia plans to invest up to $100 billion to support OpenAI's data center and infrastructure buildout, with deployment of at least 10 gigawatts of Nvidia systems beginning in the second half of 2026 [2]
- OpenAI is reportedly working with companies such as Luxshare Precision on consumer hardware, potentially using GoerTek's MEMS silicon microphone technology [4]
- WeRide announced a partnership with Grab to launch the Ai.R autonomous driving service in Singapore, initially deploying 11 autonomous vehicles [9]

Group 2: Company Leadership Changes
- Oracle appointed Clay Magouyrk and Mike Sicilia as co-CEOs, while current CEO Safra Catz will serve as executive vice chair of the board [5]

Group 3: Product Launches and Innovations
- Baidu's Qianfan launched a new open-source visual understanding model optimized for enterprise-level multimodal applications, in 3B, 8B, and 70B versions [6]
- Meituan released its first self-developed reasoning model, LongCat-Flash-Thinking, which improves training speed by over 200% and cuts average token consumption by 64.5% on specific benchmarks [8]
- Huawei announced self-developed HBM memory with a maximum capacity of 144GB and a bandwidth of 4TB/s, slated for launch in 2026 [12]
- Chipmaker MediaTek unveiled the Dimensity 9500, built on the latest N3P process with upgraded CPU and GPU capabilities aimed at boosting mobile performance [15]
- Chipmaker Xindong Technology launched the Fenghua 3 GPU, which pairs an open-source RISC-V CPU with a CUDA-compatible GPU for a wide range of applications [16]

Group 4: Financial Activities and Market Movements
- Baiwei Storage announced plans to issue H-shares and list on the Hong Kong Stock Exchange to advance its global strategy and brand image [17]
- Daotong Technology plans to transfer its 46% stake in Shenzhen Saifang Technology for a total of 109 million yuan to focus on its core business [18]
- Zero Gravity Aircraft Industry completed nearly 100 million yuan in A++ round financing to support the development of new-energy aircraft [19]

Group 5: Industry Trends and Insights
- Brain Tiger Technology's founder said China's brain science research is shifting from a "follower" to a "leader" stage, stressing the need for high-end talent and support for young researchers [11]
- The domestic high-end flagship smartphone market is heading into a wave of new launches, with OPPO and vivo announcing upcoming flagship models [20]
Baidu Open-Sources the Qianfan-VL Visual Understanding Model: Domain Enhancement at Every Size, Trained Entirely on In-House Chips
量子位 (QbitAI) · 2025-09-22 11:16
Core Viewpoint
- Baidu's Qianfan-VL series of visual understanding models has been officially launched and fully open-sourced, in three sizes (3B, 8B, and 70B) optimized for enterprise-level multimodal applications [1][34]

Model Performance and Features
- The Qianfan-VL models show clear core advantages in benchmark tests, with performance improving markedly as parameter count grows, a good scaling trend [2][4]
- Across benchmarks, the 70B model scored 98.76 on ScienceQA_TEST and 88.97 on POPE, indicating strong performance on specialized tasks [4][5]
- The models are designed for diverse application needs, offering reasoning capabilities along with enhanced OCR and document understanding [3][5]

Benchmark Testing Results
- The Qianfan-VL series (3B, 8B, 70B) excels at OCR and document understanding, with high scores on tests such as OCRBench (873 for 70B) and DocVQA_VAL (94.75 for 70B) [6][5]
- The models also perform strongly on reasoning tasks, with the 70B model scoring 78.6 on MathVista-mini and 50.29 on MathVision [8][7]

Technical Innovations
- Qianfan-VL uses an advanced multimodal architecture and a four-stage training strategy to strengthen domain-specific capabilities while preserving general performance [9][12]
- The models rely on Baidu's Kunlun P800 chips for efficient computation, supporting large-scale distributed computing with up to 5,000 cards [12][1]

Application Scenarios
- Beyond OCR and document understanding, Qianfan-VL can be applied to chart analysis and video understanding, performing well across these scenarios [33][34]
- Open-sourcing Qianfan-VL marks a significant step toward integrating AI technology into real-world productivity applications [33]
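Given the emphasis on OCR and document understanding, the sketch below shows how an open checkpoint like this might be queried for document extraction via the transformers image-text-to-text pipeline. The repository ID, the placeholder image URL, and the message format are assumptions for illustration; the actual Qianfan-VL release may ship its own loading code, so Baidu's model card takes precedence.

```python
# Minimal sketch: a document-understanding prompt against an open-weight VL model
# via the transformers "image-text-to-text" pipeline. The repo ID is an ASSUMPTION;
# Qianfan-VL's actual Hub name and preprocessing may differ.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="baidu/Qianfan-VL-8B",  # hypothetical Hub ID; 3B and 70B variants also exist
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": [
            # Placeholder URL; point this at a real scanned document.
            {"type": "image", "url": "https://example.com/invoice.png"},
            {"type": "text", "text": "Extract the invoice number, issue date, and total amount as JSON."},
        ],
    }
]

result = pipe(text=messages, max_new_tokens=200)
print(result[0]["generated_text"])
```

Swapping in the 3B or 70B checkpoint would only be a change of model ID, which makes the reported scaling trend straightforward to probe on one's own documents.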
Doubao Can Now Video Call You, and It Actually Understood Empresses in the Palace (《甄嬛传》) While Watching with Me! Even the Clock-Reading Task That Stumps Most AI Models Didn't Trip It Up
量子位 (QbitAI) · 2025-05-26 08:18
Jin Lei, reporting from Aofeisi | QbitAI (WeChat official account: QbitAI)

A new challenge that had wiped out almost every large model, reading a clock, has been cracked by a domestic AI. Until recently, given just a single image of a clock face, almost no large model could tell the time correctly. But now a domestic AI can open a live video feed and report the correct time in real time.

As shown, it first accurately announced "4:14", and after waiting a minute it again reported the time correctly as "4:15". So which AI is this? No suspense: it is Doubao's newly released feature, video calling, built around letting the AI watch and chat at the same time.

It is also connected to web search, so the accuracy and timeliness of its answers are well covered. For example, we pointed it at Weibo's trending topics and asked: "What is the top trending news story?" During the video call, the web-connected Doubao summarized the current news highlights on the spot.

It has to be said that this kind of interaction with AI is a big step up in both fun and reliability. On top of that, the new feature adds a "subtitles" option; tapping it shows the transcript of the preceding conversation.

Since the feature is this interesting, a round of in-depth hands-on testing was in order. Let's go.

Watching Empresses in the Palace with Doubao

First, a quick rundown of how to start a video call: after opening the Doubao app, tap ...