Multimodal
Can Digital Humans Like Luo Yonghao's Fulfill Li Yanhong's E-Commerce Dream?
Sou Hu Cai Jing· 2025-06-18 09:55
Core Insights
- The digital human technology used in Luo Yonghao's live stream set a new record for digital human live streaming, attracting over 13 million viewers and generating a GMV of 55 million yuan, surpassing previous live streams hosted by Luo Yonghao himself [2][3]
- Baidu aims to establish Luo Yonghao's digital human as a benchmark in the e-commerce live streaming industry, leveraging AI advancements to enhance user interaction and engagement [2][8]
- The cost of creating a digital human has fallen to around 1,000 yuan, which is 80% lower than the average cost of live streaming with real hosts, indicating significant potential for scalability in the digital human market [8][10]

Company Strategy
- Baidu's e-commerce team worked on the digital human project for about three weeks, focusing on refining the technology to meet Luo Yonghao's high standards for humor and interaction [3][6]
- The digital human live stream is part of Baidu's broader strategy to use AI technology to transform the e-commerce landscape, with plans to enhance the capabilities of digital humans and reduce costs further [10][11]
- Luo Yonghao has been appointed Chief Experience Officer of Baidu's e-commerce platform, indicating a deeper collaboration between him and Baidu in promoting digital human technology [10][12]

Market Potential
- The digital human live streams have shown promising results, with half of them outperforming real hosts in GMV and conversion rates, suggesting strong market acceptance [8][10]
- Baidu's digital human initiative is seen as a potential game-changer in the live e-commerce market, which exceeds 5 trillion yuan, with the company aiming to attract more small and medium-sized businesses to the technology [15]
- The integration of digital humans into e-commerce is expected to enhance user experience and transaction efficiency, positioning Baidu to compete more effectively in the market [14][15]
MiniMax Shows Off an AI Acrobatics Video as the Video Generation Race Heats Up Again
Di Yi Cai Jing· 2025-06-18 08:47
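This is still only the early stage of technical iteration.

The AI video generation race is getting lively again. In April, Kuaishou's Kling released its 2.0 video generation model; in June, ByteDance released the Jimeng 3.0 Pro video model; just yesterday, Google announced that Veo3 was officially available; and today MiniMax joined the fray as well, starting to compete on price-performance.

On June 18, MiniMax announced on its official platforms in China and abroad that its new video generation model, Hailuo AI (Hailuo 02), was live, and released an AI video showing off acrobatics. According to the company, three artists spent 1.5 days producing the video, using Hailuo 02 to generate multiple 6-10s clips and then stitching and editing them together.

Acrobatics has long been relatively difficult content for AI video generation. Previously, AI-generated footage often showed jumbled limbs and could not accurately imitate complex human movement. Judging by this footage, the lighting, the human motion, and the physical simulation are all handled well.

It should be noted, however, that an AI creator told Di Yi Cai Jing that AI video generation also involves a question of success rate: the sample clip is flawless, but outsiders cannot know how many "card draws" it took to get there. "Card drawing" refers to the fact that AI often fails to produce the footage a user wants on the first try and has some probability of error, so practitioners generate repeatedly until they get the footage they want.

Even so, the same AI creator considers this Hailuo AI update a good one and says the overall level of the industry is rising.

In the arena Art ...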
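Champion Team Takes the Full 2 Million Prize, Finalists Get Direct Offers: Registration Opens for the Tencent Advertising Algorithm Competition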
机器之心· 2025-06-18 06:09
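Original report by 机器之心. Author: Zhang Qian

"Is multimodal generation a good direction in 2025?" A student raised this question at the beginning of this year.

He asked because, while looking for an internship, he found that narrowly defined AIGC positions (such as video generation) were scarce and job prospects were poor, and he did not know how to put his "low-level vision + generative models" background to use.

Many students have probably run into the situation he describes. It is true that AIGC and multimodal generation have been hot over the past two years and could in theory be used in many industries, such as film, television, and games. But because the technology is still at an early stage, few scenarios can actually withstand commercial validation. Some practitioners told 机器之心 in interviews that they had approached people in the film and television industry, such as directors of short dramas, who said that AI is still not competitive with ordinary actors.

Not every industry is so pessimistic, however. By our observation, multimodal generation has been tested successfully in advertising and other industries since at least three years ago, and last year it brought some large companies tangible returns. Encouraged by these positive returns, many companies are increasing their investment, hoping that generative AI (multimodal generation in particular) will transform how advertising content is produced and distributed. For people with the relevant skills, this holds a great deal of opportunity.

Generative AI + Advertising: A Route That Already Works

When advertising AI is mentioned, most people first think of using AI to help generate advertising content. This is indeed something that has been under way for years ...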
Embodied Multimodal Reasoning Under a Unified Framework: 自变量机器人 Lets AI Put Down Heidegger's Hammer
机器之心· 2025-06-18 06:09
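Report by 机器之心. 自变量机器人

自变量机器人 argues that the patchwork paradigm built around "multimodal module fusion" must be abandoned in favor of an end-to-end unified architecture. This architecture aims to dissolve the artificial boundaries between vision, language, and action entirely, reducing them to a single information stream for processing.

Fundamental Limitations of the Current Paradigm

Current mainstream methods treat different modalities as independent modules: a pretrained ViT handles visual information, an LLM handles language understanding, and fusion layers then connect them. This "committee-style" design is inherently flawed.

The first problem is the representation bottleneck. When information passes between modality-specific encoders, it suffers unavoidable compression loss. It is like describing an oil painting to a blind person and then having that person convey the picture to a deaf person: every conversion loses key details and relationships. This loss prevents the model from reaching a deep cross-modal understanding of the physical world.

The most critical problem is the failure to emerge. The structural split makes it hard for the model to learn the intuitive, cross-modal causal regularities of the physical world. Just as no one can learn to ride a bicycle by reading a textbook alone, real physical intelligence requires holistic, embodied understanding rather than modular knowledge stitched together.

When AI puts down Heidegger's hammer, it means the robot can already use tools skillfully: the tool "withdraws" and becomes an extension of the body rather than an object that requires deliberate thought. When a skilled carpenter picks up a hammer, the hammer disappears ...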
Communication ETF (515880) Rises More Than 1.1% as Edge AI Drives Industry Growth
Mei Ri Jing Ji Xin Wen· 2025-06-18 05:53
Group 1
- The core viewpoint is that edge AI has vast room for development. Volcano Engine has showcased smart hardware products such as AI alarm clocks and AI learning machines, indicating a continuous increase in the categories of hardware on which large models are being deployed [1]
- The Doubao large model family has been fully launched, covering language, video, voice, and other types. It demonstrates outstanding performance and cost advantages and has been widely applied in industries such as mobile, automotive, and finance, accelerating industrial intelligence [1]
- Multimodal vertical upgrades are becoming an important path for AI deployment, with edge AI accelerating its penetration into IoT and other fields. Technological advancement is expected to grow exponentially as computing power improves and models are optimized [1]

Group 2
- The communication industry is performing steadily, with edge AI emerging as a significant growth point [1]
- The Communication ETF (515880) tracks the communication equipment index (931160), which is compiled by China Securities Index Co., Ltd. The index selects A-share listed companies involved in communication network equipment, terminal devices, and related services to reflect the overall performance of the communication equipment industry [1]
- Investors without stock accounts can consider the Guotai Zhongzheng All Index Communication Equipment ETF Connect A (007817) and the Guotai Zhongzheng All Index Communication Equipment ETF Connect C (007818) [1]
UU Holo Portable AI Makes Its Global Debut: Multimodal Interaction Reshapes the "Ask About Anything You See" Intelligent Experience
Group 1
- The second "Belt and Road" Technology Exchange Conference was held in Chengdu, Sichuan from June 10 to 12, showcasing cutting-edge technologies and their potential to enhance daily life and future cities [1]
- Koala Youran presented three innovative multimodal AI products, including the UU Holo portable AI, which integrates core multimodal large model technology and offers features such as scene recognition, intelligent explanation, multilingual Q&A, and autonomous task execution [1][2]
- The UU Holo served as a bilingual AI video guide for the conference, providing attendees with an immersive intelligent service experience [1]

Group 2
- The urban traffic video semantic analysis system and Youran Smart Central, both based on the self-developed Youran Full Modal AI application platform, enable rapid processing and intelligent analysis of massive offline video data, transforming traditional video retrieval methods [2]
- The system can automatically parse video elements and generate structured results such as video summaries, environmental analysis, and behavioral insights, allowing users to run keyword-based video searches in seconds [2]
- Youran Smart Central enhances urban governance with high precision (covering over 100 types of events with an accuracy rate of over 90%), high efficiency (processing millions of events daily), and localized development capabilities [2]

Group 3
- The company aims to promote technological innovation in multimodal AI and, together with global partners, explore new paths for technology to empower human development [3]
- The showcased results reflect the company's deep expertise in the AI field and its contributions to smart city construction [3]
Still Don't Know Which Direction to Publish In? Others Have Already Submitted to CCF-A......
具身智能之心· 2025-06-18 03:03
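About the mentors
The mentors have all published papers at top conferences such as CVPR, ICCV, ECCV, ICLR, RSS, ICML, and ICRA, and have considerable mentoring experience.

Student requirements
Bring a resume. School background: a TOP100 university in China, or within the QS200 abroad.

具身智能之心 paper mentoring has officially launched! Last year's results were fairly good, with several students getting papers accepted at conferences such as CVPR and ICRA. After talking with the mentors this year, we plan to continue mentoring a few students aiming for top conferences. Interested students are welcome to inquire; the mentoring directions are listed below.

Main directions
Multimodal large models, VLA, robot navigation, robot grasping, embodied generalization, embodied synthetic data, end-to-end embodied agents, 3DGS, and related directions.

More inquiries
For details, add WeChat: oooops-life to learn more. ...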
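Capital Flows into the Gaming Sector: Gaming ETF (516010) Sees Nearly 400 Million Yuan in Net Inflows over the Past 10 Days as AI-Enabled Commercialization Draws Attention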
Mei Ri Jing Ji Xin Wen· 2025-06-18 02:22
Group 1
- The core viewpoint is that the gaming industry is expected to see accelerated application and commercialization of AI products, particularly AI Agents, AI companionship, and AI multimodal technologies [1]
- AI Agents are viewed as productivity tools that enhance efficiency through autonomous decision-making and dynamic interaction, with continuous optimization expected throughout the year [1]
- AI companionship addresses personalized interaction needs and falls within the broader entertainment sector [1]

Group 2
- AI multimodal technologies, including audio, video, and 3D models, are undergoing continuous iteration, driving the accelerated deployment of industry applications [1]
- The gaming ETF (code: 516010) tracks the animation and gaming index (code: 930901), which is compiled by China Securities Index Co., Ltd. and reflects the overall performance of listed companies in the Chinese animation and gaming industry [1]
- The index constituents are distributed primarily across the cultural media and software development sectors, showing both industry concentration and innovative growth characteristics [1]
Behind OpenAI's $6.5 Billion Acquisition of Jony Ive's io: AI-Native Hardware Companies Combining Software and Hardware Are on the Rise
36Kr· 2025-06-17 23:51
Core Insights
- OpenAI has acquired Jony Ive's company io for $6.5 billion to develop a series of hardware products, indicating a strategic move towards integrating hardware with AI capabilities [1]
- AI-native hardware is facing challenges as it emerges, including slow market penetration and limited user acceptance due to overly ambitious product designs [2][4]
- The second wave of AI-native hardware is focusing on specific applications, such as meeting transcription and summarization, which have clear user demand and willingness to pay [6][8]

Group 1: AI Hardware Development
- The development of AI-native hardware is driven by advancements in large language models, enabling more sophisticated human-computer interactions [2]
- Initial AI hardware products struggled due to high learning costs and a lack of clear application scenarios, leading to poor market performance [4][5]
- Companies are now focusing on refining their products to meet specific user needs, resulting in more mature offerings [9]

Group 2: Market Dynamics
- The pricing of AI hardware, such as the AI Pin at $699 and Apple's Vision Pro at $3,499, limits market penetration because of high costs compared to traditional smartphones [5]
- Supply chain challenges in Silicon Valley hinder rapid hardware iteration and competitive pricing, making it difficult for these companies to gain market share [5][15]
- Chinese entrepreneurs benefit from a robust AI hardware supply chain and a large market, positioning them well for future growth in this sector [15][16]

Group 3: Future Prospects
- The evolution of AI-native hardware may eventually lead to the replacement of smartphones and tablets, necessitating the development of AI-native operating systems [13][14]
- AI hardware has significant potential to penetrate various sectors, including education and healthcare, as capabilities improve and applications expand [12][16]
- Companies are increasingly focusing on specific use cases, such as educational tools and personal companion robots, to drive adoption and revenue [10][12]
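[All About Announcements] Brain-Computer Interface + Computing Power + Solid-State Batteries + Robotics + Domestic Chips! Company's Investee Mainly Develops Medical-Grade Fully Implantable Wireless Brain-Computer Interface Systems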
财联社· 2025-06-17 14:09
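① Brain-computer interface + computing power + solid-state batteries + robotics + domestic chips + state-owned enterprise reform! This company's investee mainly develops medical-grade fully implantable wireless brain-computer interface systems; ② Brain-computer interface + edge computing + robotics + AI agents + multimodal AI + cross-border e-commerce! This company's brain-computer technology focuses on three core application scenarios: education, healthcare, and elderly care; ③ Innovative drugs + cellular immunotherapy! The company's innovative drug product has received EU orphan drug designation.

Major announcements for the next day's stock market are pushed every Sunday through Thursday! The content covers a range of positive and negative announcements on individual stocks, including "trading suspensions and resumptions, shareholding increases and reductions, investment bid wins, acquisitions, earnings, lock-up expirations, and high stock dividends and transfers". Important announcements are marked in red to help investors spot investment hot spots in advance, guard against black swan events of all kinds, and have ample time to evaluate and find suitable listed companies.

Foreword ...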