E-commerce Stages a "Magic vs. Magic" Showdown: Sellers Use AI-Faked Images to Win Orders, Buyers Use AI-Generated Rotten Fruit to Scam Refunds
机器之心· 2025-08-05 08:41
Core Viewpoint
- The article discusses the increasing misuse of AI technology by both buyers and sellers in e-commerce, which is fueling a trust crisis and creating demand for better verification methods to combat fraud [2][10][21].

Group 1: Buyer Misuse of AI
- Some buyers use AI-generated images to falsely claim product defects and obtain refunds, exploiting how hard it is to verify the condition of perishable goods such as fruit [2][6].
- The practice has evolved from earlier methods based on basic photo-editing tools; the sophistication of AI-generated images now makes the fraud much harder for sellers to detect [8][10].
- The phenomenon reflects a "tit-for-tat" mentality among buyers who were themselves previously deceived by sellers' AI-enhanced product images [10][21].

Group 2: Seller Misuse of AI
- Sellers likewise misuse AI to create misleading product images, over-enhance ordinary items, and generate fake reviews, contributing to the problem of "goods not matching the description" [10][24].
- Sellers may also use virtual models and AI-generated content to cut costs, further muddying the authenticity of product representations [10][24].

Group 3: Proposed Solutions
- Proposed countermeasures include requiring buyers to submit videos of defective products, taking photos from multiple angles, and using in-app cameras to block uploads of AI-generated images [11][15][24].
- These solutions have limits: advanced AI tools can still generate convincing content, so foolproof verification remains elusive [11][15][23].

Group 4: Technological Innovations
- Digital watermarking and content-provenance technologies could help identify and trace AI-generated content, enhancing trust in e-commerce [19][21].
- Standards such as C2PA and tools such as Google's SynthID aim to embed invisible watermarks in AI-generated media, which could serve as a digital identity for content [19][21][26].

Group 5: Ongoing Challenges
- The "cat-and-mouse" game between AI generation and detection technologies poses a continuous challenge, as both sides evolve rapidly [23][24].
- E-commerce platforms are exploring strategies including strengthened evidence chains and big-data analytics to monitor user behavior and detect anomalies [24][26].
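The invisible-watermarking idea behind tools like SynthID can be illustrated with a deliberately simplified least-significant-bit (LSB) scheme. This toy sketch shows only the embed/extract contract; production systems use robust, learned embeddings that survive compression and editing, and every name and value below is illustrative.

```python
def embed_watermark(pixels, bits):
    """Hide a bit string in the least significant bit of each pixel value."""
    out = list(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit  # clear the lowest bit, then write ours
    return out

def extract_watermark(pixels, n_bits):
    """Read the hidden bit string back out of the first n_bits pixels."""
    return [p & 1 for p in pixels[:n_bits]]

image = [200, 13, 255, 90, 41, 120, 77, 8]    # stand-in for real pixel data
mark = [1, 0, 1, 1]
stamped = embed_watermark(image, mark)
assert extract_watermark(stamped, 4) == mark  # mark survives; pixels change by at most 1
```

Because each pixel changes by at most one intensity level, the mark is invisible to the eye, which is the property the provenance standards rely on.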
A Research-Writing Power Tool: An Open-Sourced Scientific Formula Extractor That Surpasses Mathpix
机器之心· 2025-08-05 08:41
Optical character recognition (OCR) of LaTeX formulas is a foundational step in digitizing and intelligently processing scientific literature. Despite progress in the field, existing methods still face several challenges on real scientific documents: first, mainstream methods and public datasets mostly target structurally simple formulas with limited symbol variety, and fail to cover complex, high-difficulty formulas across disciplines; second, multi-line formulas, long formulas, piecewise formulas, and page-level complex layouts, all common in real documents, have not received adequate attention; third, most methods rely on specialized models designed for specific tasks, making generality and extensibility hard to achieve.

To address these challenges, the DocTron team proposed a systematic solution.

First, to remedy the limited coverage and structural uniformity of existing datasets, they built CSFormula, a large-scale, high-difficulty dataset spanning multiple disciplines and structures, with complex layouts at the line, paragraph, and page levels.

Second, the team's DocTron-Formula model breaks the dependence on structure-specific modeling: it recognizes complex formulas with a general-purpose large model and adapts to diverse application scenarios with only simple fine-tuning.

Finally, compared with the best customized formula-recognition models, the method not only achieves excellent performance on mainstream open benchmarks but also shows clear advantages in the page-level and paragraph-level complex layouts common in practice, pushing out the application boundary of formula recognition.
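Formula-OCR output is commonly scored by string edit distance between the predicted and ground-truth LaTeX. The exact metric used in the DocTron evaluation is not given here, so the sketch below shows a generic normalized Levenshtein score, the kind of measure such benchmarks typically report.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via a rolling dynamic-programming row."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            # min over: insertion, deletion, substitution (or match)
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[len(b)]

def normalized_score(pred: str, gold: str) -> float:
    """1.0 means an exact match; 0.0 means maximally different."""
    if not pred and not gold:
        return 1.0
    return 1 - edit_distance(pred, gold) / max(len(pred), len(gold))

print(normalized_score(r"\sigma^{2}", r"\sigma^{2}"))  # 1.0
```

Character-level scoring penalizes cosmetic LaTeX differences (e.g. `x^2` vs `x^{2}`), so real evaluations often normalize the markup first.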
Google Calls the Match, DeepSeek and Kimi Are In: The First Large-Model Tournament Kicks Off Tomorrow
机器之心· 2025-08-05 04:09
Core Viewpoint
- The upcoming AI chess competition aims to showcase the performance of various advanced AI models in a competitive setting, using a new benchmark testing platform called Kaggle Game Arena [2][12].

Group 1: Competition Overview
- The AI chess competition will take place from August 5 to 7, featuring eight cutting-edge AI models [2][3].
- The participating models include notable names such as OpenAI's o4-mini, Google's Gemini 2.5 Pro, and Anthropic's Claude Opus 4 [7].
- The event is organized by Google and aims to provide a transparent and rigorous testing environment for AI models [6][8].

Group 2: Competition Format
- The competition follows a single-elimination format, with each match consisting of up to four games; the first model to score two points advances [14].
- If a match ends in a 2-2 tie, a tiebreaker game is played in which the white side must win to progress [14].
- Models are barred from using external tools such as Stockfish and must generate legal moves on their own [17].

Group 3: Evaluation and Transparency
- The competition ensures transparency by open-sourcing the game execution framework and environment [8].
- Each model's performance will be displayed on the Kaggle Benchmarks leaderboard, allowing real-time tracking of results [12][13].
- The event is designed to address the limitations of current AI benchmarks, which struggle to keep pace with the rapid development of modern models [12].
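The stated match rules (up to four games, first to two points, tiebreak in which White must win) can be sketched as plain logic. This is not official Kaggle Game Arena code; the function name and result encoding are invented for illustration.

```python
def play_match(game_results, tiebreak_white_wins=None):
    """Decide which model advances under the stated rules.

    game_results: per-game outcomes for models A and B: "A" (A wins, 1 point),
    "B" (B wins), or "draw" (half a point each). Returns "A" or "B".
    """
    a = b = 0.0
    for result in game_results:
        if result == "A":
            a += 1
        elif result == "B":
            b += 1
        else:  # draw
            a += 0.5
            b += 0.5
        if a >= 2 and a > b:   # first to two points, strictly ahead, advances
            return "A"
        if b >= 2 and b > a:
            return "B"
    # 2-2 after four games: one tiebreak game; White (here, model A) must win
    # outright, otherwise the opponent goes through.
    return "A" if tiebreak_white_wins else "B"

print(play_match(["A", "B", "draw", "draw"], tiebreak_white_wins=False))  # B
```

The strict-lead check matters: after four decisive-but-split games both sides hold two points, and only the tiebreak rule separates them.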
A Professor at Tsinghua's Institute for Interdisciplinary Information Sciences Teaches You, Step by Step, How to Write Reinforcement Learning Code
机器之心· 2025-08-05 04:09
Core Insights
- The article discusses AReaL-lite, a reinforcement learning training framework designed for algorithm developers: users modify a single file to implement various RL training algorithms and custom agent workflows, while fully async RL delivers optimal model performance [1][10].

Group 1: Event Details
- The sharing session will feature Professor Wu Yi of Tsinghua University's Institute for Interdisciplinary Information Sciences and core members of the AReaL team, teaching RL through a multi-turn math-reasoning example [2][10].
- The live session is scheduled for August 7, 19:30-20:30 Beijing time; participants are encouraged to prepare a GPU server, preferably with 4 cards [8][10].

Group 2: AReaL-lite Features
- AReaL-lite's key characteristics include:
  - Fully async RL for rapid training [10].
  - Ecosystem-friendly design, compatible with various open-source ecosystems [10].
  - An algorithm-first approach, so even complex algorithms require only minimal file modifications [10].

Group 3: Team Introduction
- The team includes:
  - Wu Yi, Assistant Professor at Tsinghua University and Chief Scientist of the AReaL team [10].
  - Fu Wei, a PhD student at Tsinghua University and core member of the AReaL project [10].
  - Mei Zhiyu, a researcher at Ant Group's reinforcement learning lab and a PhD graduate of Tsinghua University [10].
New Work from Zhou Zhihua's Team at Nanjing University: One Algorithm Handles Everything. A New Paradigm for Online Learning?
机器之心· 2025-08-05 04:09
Reported by 机器之心. Editors: 冷猫, Panda

The world changes dynamically. To understand this dynamic world and operate within it, AI models must be capable of online learning. To that end, the field has introduced a new performance metric, adaptive regret, defined as the maximum static regret over any interval.

Within the framework of online convex optimization, several existing algorithms can already minimize adaptive regret effectively. However, these algorithms lack generality: they typically handle only one particular class of convex functions and require certain parameters to be known in advance, which limits their use in practical scenarios.

To address this limitation, Zhou Zhihua's team at Nanjing University studied universal algorithms with dual adaptivity. Such algorithms automatically adapt both to the properties of the functions (convex, exponentially concave, or strongly convex) and to changes in the environment (static or dynamic).

Paper title: Dual Adaptivity: Universal Algorithms for Minimizing the Adaptive Regret of Convex Functions
Paper link: https://arxiv.org/pdf/2508.00392

Specifically, the team proposed a ...
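The adaptive-regret metric mentioned above has a standard formalization: with online decisions $\mathbf{x}_t$ played against loss functions $f_t$ over a decision set $\mathcal{X}$, it is the largest static regret incurred on any contiguous interval. A common way to write it (the paper's exact notation may differ):

```latex
\mathrm{A\text{-}Regret}(T)
  \;=\; \max_{1 \le r \le s \le T}
  \left( \sum_{t=r}^{s} f_t(\mathbf{x}_t)
       \;-\; \min_{\mathbf{x} \in \mathcal{X}} \sum_{t=r}^{s} f_t(\mathbf{x}) \right)
```

Taking the maximum over all intervals, rather than only over the whole horizon $[1, T]$, is what forces an algorithm to stay competitive even when the environment shifts mid-stream.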
The World's First General-Purpose Visual Perception System for Humanoid Robots: Humanoid Occupancy Sets a New Paradigm for Multimodal Environment Understanding
机器之心· 2025-08-05 04:09
First author: Cui Wei, head of perception algorithms at the Beijing Humanoid Robot Innovation Center. Co-first author: Wang Haoyu, algorithm engineer at 极佳科技 and project lead. Corresponding author: Zhang Qiang, director of the academic committee of the Beijing Humanoid Robot Innovation Center.

With their human-like structure and locomotion patterns, humanoid robots are widely regarded as the general-purpose robots most likely to integrate into human environments. Their core tasks span three domains: manipulation, locomotion, and navigation. Completing these tasks efficiently presupposes a comprehensive, accurate understanding of the robot's surroundings.

Traditional perception systems, however, have clear limitations. Some fit only specific scenarios and struggle with complex, changing real-world environments; others cannot effectively fuse information from multiple sensors, leading to poor data utilization. The result is frequent perception failures in practice, severely constraining task execution.

To address this, the Beijing Humanoid Robot Innovation Center has introduced the Humanoid Occupancy perception system, offering a groundbreaking answer to this industry-wide problem. By fusing multimodal sensor information in a novel way, the system builds a general perception framework based on semantic occupancy representation that accurately captures both the semantic attributes and the geometric features of the environment. This lays a solid foundation for the robot's task planning and navigation decisions, and marks a key step toward large-scale deployment of humanoid robots in real-world scenarios.

Paper title: Hu ...
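A semantic occupancy representation of the kind described above can be pictured as a voxel grid in which each cell carries a semantic label, so one query answers both "is something there?" (geometry) and "what is it?" (semantics). The sketch below is a minimal stdlib-only illustration of that data structure, not the paper's implementation; the class names and label set are invented.

```python
CLASSES = {1: "floor", 2: "wall", 3: "person", 4: "table"}  # toy label set

class OccupancyGrid:
    """Sparse semantic occupancy grid: voxel index -> semantic label."""

    def __init__(self, voxel_size=0.1):
        self.voxel_size = voxel_size   # metres per voxel edge
        self.voxels = {}               # (i, j, k) -> label id

    def world_to_voxel(self, xyz):
        """Map a metric (x, y, z) point to integer voxel indices."""
        return tuple(round(c / self.voxel_size) for c in xyz)

    def mark(self, xyz, label):
        """Record that the voxel containing xyz holds the given class."""
        self.voxels[self.world_to_voxel(xyz)] = label

    def query(self, xyz):
        """Return the semantic class at xyz, or 'free/unknown'."""
        return CLASSES.get(self.voxels.get(self.world_to_voxel(xyz)), "free/unknown")

g = OccupancyGrid()
g.mark((1.0, 2.0, 0.7), 4)        # a table surface at (1 m, 2 m, 0.7 m)
print(g.query((1.0, 2.0, 0.7)))   # table
```

Real systems predict a dense grid of these labels from fused camera and LiDAR features rather than writing cells one at a time, but the query interface a planner consumes is essentially this.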
It Even Runs on Phones: Tencent Hunyuan Open-Sources Four Small Models at Once
机器之心· 2025-08-04 09:01
Core Viewpoint
- Tencent's Hunyuan team has open-sourced four small language models, the largest at 7 billion parameters, aimed at low-power-consumption scenarios and supporting vertical-domain fine-tuning [1][3].

Model Characteristics
- The four models run on consumer-grade graphics cards, making them suitable for laptops, smartphones, and smart-home devices [3].
- They are designed as fusion inference models, offering fast inference speeds and a high cost-performance ratio, with capabilities in language understanding, mathematics, and reasoning [6].
- The models have a 256k context window, letting them process content roughly equivalent to three "Harry Potter" novels [12].

Deployment and Usability
- All four models can be deployed on a single card, with compatibility across various consumer devices [12].
- They have been tested in multiple core business applications within Tencent, demonstrating their practicality and effectiveness [15].

Industry Context
- The trend of open-sourcing AI models is gaining momentum in China, with Tencent a significant player in the movement [16][20].
- The recent release of the Hunyuan 3D World Model has also gained significant traction, indicating growing interest in multimodal AI capabilities [17].

Application Scenarios
- The models power productivity tools such as Tencent Meeting AI Assistant and WeChat Reading AI, achieving precise understanding and summarization of long texts [18].
- In the financial sector, AI assistants built on these models achieve over 95% intent-recognition accuracy with minimal fine-tuning [18].
3D-R1: The Next Step in Teaching AI to Understand the 3D World
机器之心· 2025-08-04 09:01
Core Insights
- The article discusses 3D-R1, a new 3D visual language model that aims to enhance reasoning over complex 3D scenes, potentially setting a new paradigm for 3D AI systems [4][6].

Group 1: Importance of 3D Scene Understanding
- Understanding real-world 3D environments is significantly more complex than recognizing images, and is crucial for applications such as service robots, autonomous driving, and AR/VR [7].
- Current 3D visual language models face two main challenges: insufficient spatial understanding and weak reasoning capabilities [15][18].

Group 2: Innovations of 3D-R1
- 3D-R1 focuses on precise perception of 3D scenes and incorporates a training mechanism that enhances reasoning, allowing the model to "think" and "judge" more like humans [8].
- The model introduces Scene-30K, a high-quality reasoning dataset of 30,000 structured, logically clear training samples, addressing the shortage of multi-step logical training examples in existing datasets [10][13].
- A reinforcement learning mechanism based on Group Relative Policy Optimization (GRPO) lets the model self-optimize during answer generation [14].
- A dynamic viewpoint-selection strategy helps the model automatically choose the six most representative views, ensuring critical details are not missed [18][19].

Group 3: Performance Evaluation
- 3D-R1 was evaluated across seven 3D tasks, including 3D-QA, 3D dense captioning, and 3D reasoning, demonstrating superior performance over previous models [21].
- On the 3D scene dense-description task, 3D-R1 outperformed prior specialized models on the ScanRefer and Nr3D datasets [24].
- The model achieved the best results on the challenging ScanQA benchmark validation and test sets for 3D question answering [26].

Group 4: Future Applications
- 3D-R1 has significant practical potential: household robots that understand object locations and make decisions, interactive guidance in the metaverse/VR, real-time street-scene comprehension for autonomous driving, and identification of risk areas in industrial inspection [29][30].
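Group Relative Policy Optimization (GRPO), which 3D-R1 uses for self-optimization, scores each sampled answer relative to the other answers in its group: the advantage is the reward minus the group mean, divided by the group standard deviation, so no separate value network is needed. A minimal sketch of just that advantage computation (not a full trainer; the reward values are made up):

```python
import statistics

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages for one prompt's sampled completions."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)          # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled answers to the same 3D-QA question, scored by a reward function:
print(grpo_advantages([1.0, 0.0, 0.5, 0.5]))
```

Answers above the group mean get positive advantages (their token probabilities are pushed up); answers below get negative ones, which is how the model learns from its own comparative successes.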
The "Agents" You Heard About Endlessly at WAIC: Time to Learn Them Systematically
机器之心· 2025-08-04 07:05
Core Insights
- The article emphasizes the shift in perception of large AI models from simple chatbots to intelligent agents capable of proactive thinking, planning, and task execution [1][2].

Group 1: LLM and Its Capabilities
- Standard LLMs generate text responses from given prompts; this versatility is a significant advantage [5].
- Integrating reasoning and external API interactions into LLMs is crucial for developing advanced AI agents [6].

Group 2: Tool Utilization
- Teaching LLMs to integrate and use external tools has become a hot topic in AI research, with examples including calculators, calendars, and search engines [7].
- LLMs can act as "commanders" that coordinate various specialized tools to solve problems effectively [8].

Group 3: Reasoning Models
- Reasoning has been a core focus of LLM research; the ability to break complex problems into smaller tasks and decide which tools to use is essential [21][23].
- The Chain of Thought (CoT) method enhances LLMs' reasoning by guiding them to generate a reasoning process before arriving at a final output [24][25].

Group 4: ReAct Framework
- The ReAct framework lets LLM-driven agents autonomously decompose and solve complex problems through a sequential process that interleaves reasoning and action [41].
- The framework expands the action space to include language itself as a form of action, enabling agents to "think" in addition to executing actions [46][49].

Group 5: Applications and Performance
- ReAct has been applied to knowledge-intensive reasoning tasks and decision-making scenarios, demonstrating its effectiveness in various contexts [63][64].
- Performance comparisons show ReAct consistently outperforming other approaches, highlighting the importance of reasoning during action execution [77].

Group 6: Future of AI Agents
- Building reliable AI agent systems is crucial, since current systems can fail if any step in the sequential problem-solving process goes wrong [114].
- Ongoing research aims to enhance the capability and reliability of AI agents, pointing to significant advances in the near future [115].
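The ReAct loop described above interleaves free-form "thought" turns with tool-calling "action" turns until the agent emits a final answer. A minimal sketch with a scripted stand-in LLM and one stub tool; the names, transcript format, and `tool[argument]` syntax are illustrative, not the paper's exact prompt.

```python
def fake_llm(transcript):
    """Stand-in for a real LLM: scripted thought/action/answer turns."""
    if "Observation:" not in transcript:
        return "Thought: I should look this up.\nAction: search[capital of France]"
    return "Thought: I have the answer.\nFinal Answer: Paris"

TOOLS = {"search": lambda q: "Paris is the capital of France."}  # stub tool

def react(question, llm, max_steps=5):
    """Run the thought -> action -> observation loop until a final answer."""
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        turn = llm(transcript)
        transcript += "\n" + turn
        if "Final Answer:" in turn:
            return turn.split("Final Answer:")[1].strip()
        if "Action:" in turn:                      # parse "Action: tool[argument]"
            call = turn.split("Action:")[1].strip()
            name, arg = call.split("[", 1)
            observation = TOOLS[name](arg.rstrip("]"))
            transcript += f"\nObservation: {observation}"
    return None                                    # step budget exhausted

print(react("What is the capital of France?", fake_llm))  # Paris
```

The growing transcript is the agent's only memory: each observation is appended so the next LLM call can condition on everything seen so far, which is exactly where a single bad step can derail the whole chain.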
How Did Drawing a Circle Become a Major Challenge for Robot Hands?
机器之心· 2025-08-04 07:05
Core Viewpoint
- The article discusses advances in robotics, focusing on a flexible robotic hand developed by Daxo Robotics whose capabilities, such as drawing a circle, highlight the potential of soft robotics for human-like dexterity and flexibility [1][10].

Group 1: Innovations in Robotics
- The upcoming World Robot Conference has drawn increased attention to robotic innovations, including humanoid robots and household robots [1][2].
- Daxo Robotics has introduced a robotic hand that can perform tasks like drawing circles, claimed as a significant achievement in robotic dexterity [5][10].
- The hand features 40 tendons and no traditional joints, allowing a range of motion and control that conventional robotic arms cannot achieve [7][8].

Group 2: Technical Specifications
- The hand is said to possess "unlimited" degrees of freedom and a grip strength of 7 kilograms, indicating its advanced design and functionality [8].
- Unlike rigid robots, the flexible hand can exhibit hundreds of controllable degrees of freedom, enhancing its adaptability and performance [10].
- The hand is designed for machine learning, gathering data through both teleoperation and simulation, which allows extensive exploration during learning [10].

Group 3: Market Implications
- Demonstrations such as drawing and manipulating objects suggest a shift toward more sophisticated, flexible robotic solutions across applications [10][11].
- The interest generated by Daxo Robotics' innovations points to a growing market for soft robotics, which may outperform traditional rigid robots on specific tasks [10][11].