Learning Through Play? Game-Code-Driven Data Synthesis Improves General Reasoning in Multimodal Large Models
机器之心· 2025-07-04 08:59
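If you were told that training an AI on games such as Sokoban makes it better at geometric reasoning and chart reasoning, would you believe it? A recent study by the Fudan NLP Lab and ByteDance's Intelligent Services team reports a surprising finding: games are not only entertainment but also a valuable resource for training AI reasoning ability.
Title: Code2Logic: Game-Code-Driven Data Synthesis for Enhancing VLMs General Reasoning
Paper: https://arxiv.org/abs/2505.13886
Code repository: https://github.com/tongjingqi/Code2Logic
Data and models: https://huggingface.co/Code2Logic
Introduction: High-quality multimodal reasoning data is extremely scarce, which limits progress in the complex reasoning ability of vision-language models (VLMs). Is there a low-cost, reliable way to generate such data at scale? The Fudan and ByteDance team proposes using game code to synthesize visual reasoning data automatically. ...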
Can AI Be Trusted to Write Literature Reviews?
Hu Xiu· 2025-07-04 07:49
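When Sam Rodriques was a graduate student in neurobiology, he noticed a fundamental problem in scientific research. "We may already have all the information needed to understand the human cell or the brain, but we cannot tell whether that is true, because no human can read and understand all of this literature," he says.
Five years later, Rodriques says artificial intelligence (AI) is close to solving the problem. In September 2023, he and his team at the startup FutureHouse developed an AI system that can summarize scientific knowledge within minutes, with accuracy exceeding that of Wikipedia pages [1]. The team then used the system to generate Wikipedia-style entries for about 17,000 human genes, most of which previously had no detailed page.
Rodriques is not the only one using AI to synthesize scientific knowledge. For decades, scholars have looked for ways to speed up the time-consuming work of literature review. "Reviews are too long, too intensive, and often out of date by the time they are finished," says Iain Marshall, a research-synthesis expert at King's College London. The rapid progress of generative AI, namely the large language models (LLMs) behind tools such as ChatGPT, has recently raised new hopes for automating reviews.
Some newer AI-based scientific search engines can already help people write narrative literature reviews, that is, written accounts that systematically organize research ..., by finding, sorting and summarizing publications ...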
45th International Symposium on Forecasting Closes in Beijing; China's Strength in Forecasting Research Draws Global Attention
Sou Hu Cai Jing· 2025-07-04 07:10
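On July 2, the 45th International Symposium on Forecasting (ISF 2025) concluded in Beijing.
The symposium is the most authoritative international academic conference in the field. Founded in 1981, it was held in mainland China for the first time this year and drew 580 registered participants from 35 countries and regions, including leading forecasting scholars, industry leaders and policymakers. The record attendance reflects both the growing global importance of forecasting science and China's rising influence in the field.
Under the theme "Frontiers and Innovation in Forecasting Science", the symposium focused on key areas such as artificial intelligence, big data, economics and management, energy and environment, and climate change. It comprised 13 keynote talks, 5 in-depth workshops, and 12 parallel forums totaling 106 topical sessions, with 348 academic presentations overall. Participants discussed hot topics including Bayesian forecasting, machine learning, large language models, forecast uncertainty and forecast combination, as well as applications of forecasting in macroeconomics, finance, supply chains, energy, healthcare, and disaster prevention and control.
The next symposium (ISF 2026) will reportedly be held in Canada next year.
(Photo: ISF 2025 speakers. Provided by the organizers.)
The symposium promoted the sharing and exchange of frontier results in forecasting science worldwide and provided an important platform for deepening international research cooperation in the field. It is significant for advancing forecasting science and its application to global challenges.
The president of the International Institute of Forecasters, Laur ...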
Science Sister Journal: At Least 14% of 2024 Biomedical Papers Used AI-Assisted Writing
生物世界· 2025-07-04 06:47
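Written by Wang Cong | Edited by Wang Duoyu | Layout by Shui Chengwen
When the world changes, the words people write change with it. Major events such as wars and pandemics shift word-frequency distributions in text corpora, and the rise and fall of scientific disciplines is visible in scholarly writing. Has progress in science and technology left similar traces in how we write?
In November 2022, ChatGPT arrived and human writing underwent an unprecedented change: for the first time, a widely available large language model (LLM) could generate and revise text at a level comparable to human performance across many domains, including academia. Since then, many researchers have brought LLMs into their everyday writing tasks, and some have even co-written papers with them. This has raised concerns about research integrity, factual errors in LLM-generated content, and paper mills abusing LLMs to produce fake papers.
In response to these concerns, researchers have begun trying to track the traces that LLM-assisted writing leaves in scientific text.
On July 2, 2025, researchers at the University of Tübingen in Germany published a paper in Science Advances, a journal in the Science family, titled: Delving into LLM-assisted writing in biomedical ...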
AI Killed the Em Dash, and Strangled Language Along with It
Hu Xiu· 2025-07-04 04:23
Core Viewpoint
- The article discusses the phenomenon of distinguishing AI-generated content from human writing through the use of specific punctuation marks, particularly the em dash and quotation marks, which have become symbols of AI writing [1][21][38].
Group 1: AI and Punctuation
- The em dash and quotation marks are identified as common features in AI-generated text, leading to a societal response of avoiding these symbols to differentiate human writing from AI [21][52][60].
- The use of these punctuation marks is seen as a reflection of cultural and logical expression, which is now being abandoned to combat AI content [38][67].
- The article highlights a paradox where, to prove human authorship, individuals may need to relinquish previously valued linguistic tools [24][38][67].
Group 2: Cultural Implications
- The reliance on specific punctuation as a means of identification reflects a broader cultural shift towards simpler, more primitive forms of communication in response to AI [67][72].
- This shift may lead to a degradation of language richness and expression, as society moves towards a more colloquial and less nuanced mode of communication [70][72].
- The article suggests that the future of human expression may resemble informal, error-prone language, devoid of the complexity that characterized previous writing styles [75][78].
Tsinghua and Xiaomi Team Publish a Survey of VLA Models
理想TOP2· 2025-07-04 02:54
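The following article is from 具身进化, by 一起学习.
I. The Evolution of Autonomous Driving Paradigms
Autonomous driving is moving from simple perception-control pipelines toward higher-level cognitive intelligence. The latest autonomous driving models fall into three paradigms:
● End-to-end autonomous driving (End-to-End AD): maps sensor input directly to driving actions. This approach is efficient but lacks interpretability, and it struggles with "long-tail" scenarios that require high-level reasoning.
● Vision-language models for autonomous driving (VLMs for AD): introduces vision-language models to understand and explain complex traffic scenes, which significantly improves interpretability. However, the language output is disconnected from the vehicle's actual control, leaving an "action gap".
● Vision-language-action models for autonomous driving (VLA for AD): the newest paradigm. It integrates visual perception, language understanding and action execution in a single unified model, closing the loop between perception, reasoning and action. The vehicle follows natural-language instructions and directly outputs actions or trajectories.
II. Core Architecture of VLA Autonomous Driving Models
A typical VLA model consists of three parts (input, processing and output) and aims to integrate environment perception, high-level instruction understanding and final vehicle control seamlessly.
1. Multimodal inputs (Inputs):
○ Vision and sensor data: vision is the system's core input, and the technology has progressed from early single front-facing cameras to today's multi-camera surround-view systems. ...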
10x Efficiency Gains: AI Is Upending Software Development, and These Five Lessons Are the Key Dividing Line
36Kr· 2025-07-04 02:15
Core Insights
- AI tools are accelerating the software development process while exposing significant capability gaps among different teams, leading to output differences of up to tenfold or more [1]
- The concept of "AI-native development" requires a complete redesign of the development system, integrating AI at every stage from prototyping to deployment [1]
- The conversation with Cedric Ith, founder of Perceptron AI, highlights the need for developers to collaborate effectively with AI, focusing on what successful teams do right [1][2]
Group 1: Key Experiences from Cedric
- Taste is the new competitive advantage, shifting focus from technical skills to design thinking and product intuition in an era where AI can generate code rapidly [3]
- The ability to ask precise questions and create delightful user experiences is becoming the new barrier to entry in software development [3]
- AI is redefining the design process, allowing designers to explore numerous concepts quickly and generate user-centric solutions [3]
Group 2: New Design Paradigms
- Natural language is emerging as a primary design interface, shifting the designer's role from creating visuals to articulating product structure through language [4][5]
- Designers are developing a "design vocabulary" to communicate effectively with AI, enabling rapid prototyping that previously took engineers days to complete [5][6]
- The ability to break down complex requests into clear, executable language is becoming essential for effective collaboration with AI [6]
Group 3: The Rise of Design Engineers
- The traditional boundary between design and engineering is dissolving, with designers now able to contribute directly to code and manage the entire tech stack [7][8]
- This shift enhances efficiency and redefines product manufacturing, as designers gain control over the entire delivery process [8][9]
- The iterative speed of design and development has significantly increased, compressing the time between design reviews and implementation from days to hours [10]
Group 4: AI-Native Design Principles
- Key principles for AI product design include reducing cognitive load, accepting non-determinism, and ensuring transparency in AI reasoning processes [11][12][13]
- The design focus is shifting from user execution to user orchestration, requiring designs that facilitate coordination among multiple intelligent agents [14]
- Teams adopting these principles early will create more intuitive and trustworthy AI experiences [14]
Group 5: Organizational Adaptation in the AI Era
- Organizations must transition from building perfect products to creating rapid learning organizations to keep pace with the fast-evolving AI landscape [15][16]
- Cedric emphasizes the importance of quickly producing high-fidelity prototypes to gain internal buy-in, making design a catalyst for organizational change [16]
- The entire product development cycle is being compressed, leading to unprecedented innovation density [16]
Group 6: Cedric's AI Design Stack
- The design stack includes tools like Figma for visual design, v0 for dynamic behavior definition, and Cursor for code-level adjustments, facilitating seamless transitions between design and engineering [17]
- Component libraries like Shadcn and Tailwind provide standard semantics for AI, reducing risks associated with hallucinations in code generation [17]
Lossless Acceleration of Vision-Language Model Inference: Pruning Redundant Visual Tokens with Ease | Tencent AI Lab
量子位· 2025-07-04 01:42
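Contributed by the VScan team. 量子位 | WeChat official account QbitAI
Multi-image input, long video and fine-grained perception are making large vision-language models (LVLMs) more capable, but also harder to run: the surge in visual tokens drives inference cost sharply upward and is becoming the main compute bottleneck for scaling multimodal intelligence.
To address this, Tencent AI Lab and CMU propose a new solution, VScan.
The method targets the inference-time efficiency bottleneck of large vision-language models. Using a two-stage visual token selection mechanism, it achieves up to 2.91x inference speedup with almost no loss in performance. It requires no change to the model architecture and no retraining, and it is compatible with FlashAttention, so VScan offers a lightweight, general, plug-and-play approach to inference acceleration.
To handle more complex and richer visual input, existing LVLMs often need to encode far more visual information than text tokens. For example, LLaVA-NeXT introduces up to 2,880 visual tokens when processing a high-resolution image, and Qwen2.5-VL can handle as many as 16,384 visual tokens for multi-image or video input, far beyond the input lengths that traditional language models process.
As the token count surges, the input sequence grows longer, and the computational cost of self-attention grows quadratically with sequence length, which makes the inference stage ...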
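The quadratic growth of self-attention cost can be made concrete with a back-of-the-envelope sketch. This is not VScan's selection mechanism, which the excerpt does not describe. It only counts query-key score entries for the visual-token counts quoted in the item above. `keep_ratio` is a hypothetical pruning ratio chosen for illustration, and text tokens and non-attention costs are ignored.

```python
def attention_pairs(num_tokens: int) -> int:
    """Query-key score entries for one attention head in one layer."""
    return num_tokens * num_tokens


# Visual-token counts quoted in the item above.
llava_next_tokens = 2_880
qwen25_vl_tokens = 16_384

# How much more pairwise attention work the longer sequence implies.
cost_ratio = attention_pairs(qwen25_vl_tokens) / attention_pairs(llava_next_tokens)

# Hypothetical pruning ratio (an assumption, not a figure from the article).
keep_ratio = 0.25
kept_tokens = int(llava_next_tokens * keep_ratio)
remaining_cost = attention_pairs(kept_tokens) / attention_pairs(llava_next_tokens)

print(f"16,384 vs 2,880 tokens: {cost_ratio:.1f}x the attention pairs")
print(f"keeping {kept_tokens} of {llava_next_tokens} tokens leaves "
      f"{remaining_cost:.2%} of the attention pairs")
```

Because the cost scales with the square of the sequence length, keeping a quarter of the tokens leaves one sixteenth of the pairwise attention work in this simplified count. End-to-end speedups are smaller in practice, since attention is only one part of inference cost.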
AI Glasses Industry Deep Dive: How to Strike Gold in a Trillion-Yuan Market?
2025-07-03 15:28
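Summary (2025-07-03)
AI glasses are the vehicle for bringing large AI models to mobile terminal hardware, and they combine scarcity with growth potential. Their core functions include replacing Bluetooth earphones and action cameras, and possibly even smartphones, so the market potential is large.
Ray-Ban Meta, launched jointly by Meta and Ray-Ban, is a global hit, with 2024 sales of 1.42 million units. It succeeded because it looks no different from ordinary glasses while adding AI interaction, offers strong value for money, and has acceptable latency and relatively long battery life.
Market-size estimates put the potential of AI glasses at RMB 170 billion for audio replacement, RMB 30 billion for action-camera replacement, and RMB 1.8 trillion for AR display replacement. The long-term market is large, with global shipments expected to reach 1.4 billion units within the next three to five years.
Global AI glasses sales in 2024 were about 1.52 million pairs, a penetration rate of only 0.3%, but sales in the first quarter of 2025 grew 82% year on year. IDC expects 2025 sales of 15 million units and a penetration rate of 3.1%. The market is in its early adoption stage and growing quickly.
The core drivers of the AI glasses industry include capital inflows as technology giants enter, technology iteration (such as the DeepSeek model and Micro LED displays), falling costs, and the hit-product effect. The domestic-sourcing route lowers costs further through component substitution ...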
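The market figures quoted in this summary can be cross-checked with simple arithmetic. The sketch below only recombines numbers from the summary. Reading "penetration rate" as AI-glasses sales divided by total eyewear sales is an assumption, since the excerpt does not define the term.

```python
# Segment potentials from the summary, in billions of RMB.
segments_bn_rmb = {"audio": 170, "action_camera": 30, "ar_display": 1_800}
total_bn_rmb = sum(segments_bn_rmb.values())

# Sales and penetration figures from the summary.
sales_2024, penetration_2024 = 1_520_000, 0.003
sales_2025, penetration_2025 = 15_000_000, 0.031

# Assumption: penetration = AI-glasses sales / total eyewear sales.
implied_base_2024 = sales_2024 / penetration_2024
implied_base_2025 = sales_2025 / penetration_2025

# Ray-Ban Meta's share of 2024 AI-glasses sales.
rayban_meta_share = 1_420_000 / sales_2024

print(f"Total potential: RMB {total_bn_rmb / 1000:.1f} trillion")
print(f"Implied eyewear base 2024: {implied_base_2024 / 1e6:.0f} million pairs")
print(f"Implied eyewear base 2025: {implied_base_2025 / 1e6:.0f} million pairs")
print(f"Ray-Ban Meta share of 2024 sales: {rayban_meta_share:.1%}")
```

Under that assumption the three segments sum to RMB 2 trillion, which matches the "trillion-yuan market" framing. The two implied eyewear bases differ by a few percent, which is within what rounding of the quoted penetration rates would produce.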
How Wide Is the China-US AI Gap, and Where Is AI Competition Focused? "Global AI Research Landscape Report" Makes Its Global Debut
TMTPost App· 2025-07-03 10:36
Core Insights
- The report titled "Global AI Research Landscape Report (2015-2024)" analyzes the evolution of AI research over the past decade, highlighting the competitive landscape between China and the United States in AI talent and publication output [2][7].
Group 1: AI Research Trends
- The report identifies four distinct phases in AI research: initial phase (2015-2016), rapid development phase (2017-2019), maturity peak phase (2020-2023), and adjustment phase (2024) [4][5].
- The number of AI papers published globally increased significantly, with a peak of 17,074 papers in 2023, representing nearly a fourfold increase from 2015 [5][6].
- The year 2024 is expected to see a decline in publication volume to 14,786 papers, indicating a shift towards more specialized and application-oriented research [6].
Group 2: Talent Distribution
- China has emerged as the second-largest hub for AI talent, with a total of 52,000 researchers by 2024, growing at a compound annual growth rate of 28.7% since 2015 [8].
- The United States leads with over 63,000 AI researchers, with significant contributions from institutions like Stanford and MIT, as well as tech giants like Google and Microsoft [8][9].
- Chinese institutions such as the Chinese Academy of Sciences, Tsinghua University, and Peking University are leading in terms of publication output and talent concentration [7][9].
Group 3: Institutional and Corporate Performance
- The Chinese Academy of Sciences published 4,639 top-tier papers, while Tsinghua University and Peking University followed closely, showcasing China's institutional strength in AI research [7][9].
- In contrast, U.S. companies like Google, Microsoft, and Meta have a significantly higher average publication output compared to their Chinese counterparts, reflecting a disparity in research investment and output capabilities [9][10].
- The top three U.S. companies published 5,896 papers, which is 1.8 times the output of the top three Chinese companies [9][10].
Group 4: Gender Disparity in AI Talent
- The report highlights a significant gender imbalance in AI research, with women making up only 9.3% of AI talent in China compared to 20.1% in the U.S. [12][13].
- Chinese institutions like Tsinghua University and Peking University have low female representation in AI, at 7.88% and 9.18% respectively, compared to 25%-30% in top U.S. institutions [12][13].
Group 5: Future Trends in AI Research
- The report indicates that "deep learning" has been the dominant focus in AI research over the past decade, but its growth rate is expected to slow down, suggesting a need for new approaches [14][15].
- Emerging technologies such as "Transformers" are gaining traction, particularly in natural language processing and multimodal AI, indicating a shift in research focus [15].
- The integration of traditional AI fields with deep learning techniques is becoming more prevalent, reflecting a trend towards collaborative and interdisciplinary research [15].
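The growth and ratio figures in this summary can be sanity-checked with a short calculation. It assumes the 28.7% compound annual growth rate compounds once per year over the nine years from 2015 to 2024. The summary does not state the convention, so the implied 2015 figure is illustrative only.

```python
def implied_start_value(end_value: float, cagr: float, years: int) -> float:
    """Back out the starting value from an end value and a compound annual growth rate."""
    return end_value / (1 + cagr) ** years


# Assumption: 9 annual compounding periods between 2015 and 2024.
china_researchers_2015 = implied_start_value(52_000, 0.287, 9)

# Year-over-year change in global paper count, 2023 to 2024.
papers_2023, papers_2024 = 17_074, 14_786
decline_2024 = (papers_2023 - papers_2024) / papers_2023

# Output of the top three Chinese companies implied by the 1.8x ratio.
china_top3_papers = 5_896 / 1.8

print(f"Implied China AI researchers in 2015: about {china_researchers_2015:,.0f}")
print(f"Paper count change 2023 to 2024: -{decline_2024:.1%}")
print(f"Implied top-three Chinese company papers: about {china_top3_papers:,.0f}")
```

Under these assumptions, the quoted growth rate implies a 2015 base of roughly 5,400 researchers in China, the 2024 paper count is about 13% below the 2023 peak, and the top three Chinese companies would have published roughly 3,300 papers.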