量子位
AI spends 17 hours writing a 30-page academic paper! It picked its own topic, ran experiments, and followed APA formatting
量子位 · 2025-10-04 04:13
Core Insights
- The article discusses Virtuous Machines, an AI system that autonomously conducted research and produced a 30-page academic paper in 17 hours at a cost of $114 [1][3][24]
- The research focused on cognitive psychology, specifically human visual cognition [5][24]

Research Process
- The AI system generated research questions from cognitive psychology theories, such as the relationship between visual working memory and mental rotation ability [9]
- It designed the experimental plan, calculated the sample size, controlled variables, and measured the vividness of participants' mental imagery using the VVIQ2 scale [11]
- It recruited 288 participants through the Prolific online platform and ultimately collected 277 valid responses [11]
- For data analysis, it wrote Python code for repeated-measures ANOVA, identified outliers, and adjusted the statistical models [12]

AI System Architecture
- The AI's research capabilities stem from a collaborative structure that simulates human cognitive mechanisms and allows dynamic knowledge interaction [14][21]
- The core control module, referred to as Master, oversees the entire process, while other AI assistants focus on specific tasks such as literature retrieval and data analysis [15][16]
- The system's foundational abilities include knowledge retrieval, abstract reasoning, metacognitive reflection, task decomposition, and autonomous iteration [20]

Efficiency and Limitations
- The article highlights the AI's efficiency: it is over ten times faster than human teams, and its data analysis is rigorous enough to avoid statistical pitfalls [24]
- However, it occasionally misinterprets theories, mislabels chart axes, and confuses terms, which indicates limits in theoretical depth and innovative thinking compared with human researchers [25][26]
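The Master-plus-assistants structure described under AI System Architecture can be pictured as a simple dispatcher. The sketch below is only an illustration of that idea, not the Virtuous Machines implementation: the `Master` class, its `register` and `run` methods, the role names, and the stub assistants are all hypothetical, and real assistants would call language models and tools rather than return fixed strings.

```python
from typing import Callable, Dict, List, Tuple

# An assistant is modelled here as a plain function from a task description
# to a text result. This is an assumption made for illustration only.
Assistant = Callable[[str], str]


class Master:
    """Hypothetical coordinator that routes each plan step to a specialist."""

    def __init__(self) -> None:
        self.assistants: Dict[str, Assistant] = {}
        self.history: List[Tuple[str, str]] = []

    def register(self, role: str, assistant: Assistant) -> None:
        """Attach a specialist assistant under a role name."""
        self.assistants[role] = assistant

    def run(self, plan: List[Tuple[str, str]]) -> List[str]:
        """Execute (role, task) steps in order and record every output."""
        results = []
        for role, task in plan:
            if role not in self.assistants:
                raise KeyError(f"no assistant registered for role {role!r}")
            output = self.assistants[role](task)
            self.history.append((role, output))
            results.append(output)
        return results


def literature_assistant(task: str) -> str:
    # Stub standing in for a literature-retrieval assistant.
    return f"[literature] notes on: {task}"


def analysis_assistant(task: str) -> str:
    # Stub standing in for a data-analysis assistant.
    return f"[analysis] report on: {task}"


master = Master()
master.register("literature", literature_assistant)
master.register("analysis", analysis_assistant)
results = master.run([
    ("literature", "visual working memory and mental rotation"),
    ("analysis", "repeated-measures ANOVA on 277 valid responses"),
])
```

Keeping the routing table in one place means the coordinator can log every step, which is one plausible way to support the reflection and iteration abilities the article lists.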
Terence Tao uses GPT-5 to solve a hard math problem: just 29 lines of Python
量子位 · 2025-10-04 04:13
Core Insights
- The article describes how AI, specifically GPT-5, helped mathematician Terence Tao solve a complex mathematical problem, reducing the time and effort needed for manual calculation and coding [1][2][3].

Group 1: AI's Role in Mathematics
- Tao said that without AI assistance, a task like this would have taken several hours, mostly spent writing and debugging code by hand [1].
- Tao used GPT-5 on a MathOverflow problem about the relationship between the least common multiple sequence and highly abundant numbers, which required extensive numerical searches [7][10].
- The article presents the AI's contribution to this inquiry as a sign of a new era of human-machine collaboration on complex problems [5][29].

Group 2: Problem-Solving Process
- Tao first asked GPT-5 to generate a Python program that would search for counterexample parameters, but ran into long execution times and poorly chosen initial parameters [19][20].
- He then switched to a step-by-step dialogue with GPT-5, breaking the larger problem into smaller, manageable parts, which ultimately produced the required parameters [21][22].
- The final solution was a concise 29-line Python script generated by GPT-5, which Tao used for independent verification, confirming that the results matched his heuristic predictions [23][24].

Group 3: Broader Implications of AI in Research
- This is not the first time Tao has used AI for mathematical problem-solving; he has applied it in several earlier projects, demonstrating its potential as a mediator in mathematical proofs [27][28].
- The article suggests that while AI is unlikely to earn accolades like the Fields Medal in the short term, it can significantly improve the efficiency and effectiveness of mathematical research [28][29].
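To make the two objects in the MathOverflow problem concrete, here is a small brute-force sketch. It is not Tao's 29-line script and does not reproduce his counterexample search; it only encodes the standard definitions: a number n is highly abundant when its sum of divisors σ(n) exceeds σ(m) for every m < n, and the least common multiple sequence is lcm(1, ..., n). The function names and the search bounds are illustrative choices.

```python
from math import lcm  # available in Python 3.9+


def sigma_table(limit):
    """Return a list where entry k is the sum of divisors of k (entry 0 unused)."""
    sigma = [0] * (limit + 1)
    for d in range(1, limit + 1):
        # d divides every multiple of d, so add it to each of them.
        for multiple in range(d, limit + 1, d):
            sigma[multiple] += d
    return sigma


def highly_abundant_up_to(limit):
    """List every n <= limit whose sigma(n) beats sigma(m) for all m < n."""
    sigma = sigma_table(limit)
    best = 0
    result = []
    for n in range(1, limit + 1):
        if sigma[n] > best:
            best = sigma[n]
            result.append(n)
    return result


def lcm_sequence(n_max):
    """Return [lcm(1), lcm(1, 2), ..., lcm(1, ..., n_max)]."""
    values = []
    current = 1
    for n in range(1, n_max + 1):
        current = lcm(current, n)
        values.append(current)
    return values


if __name__ == "__main__":
    highly_abundant = set(highly_abundant_up_to(3000))
    for n, value in enumerate(lcm_sequence(10), start=1):
        print(n, value, value in highly_abundant)
```

A sieve like this takes on the order of N log N steps for a bound N, while lcm(1, ..., n) grows roughly exponentially in n, so direct enumeration runs out of reach quickly. That is consistent with the long execution times the article reports for the first attempt and with the later shift to searching over parameters.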
OpenAI hits back hard at Musk's trade-secret lawsuit! xAI accused of maliciously doxxing departed employees
量子位 · 2025-10-04 04:13
Core Viewpoint
- OpenAI has responded forcefully to the lawsuit filed by xAI, denying all allegations of corporate espionage and asserting that the lawsuit is an attempt to intimidate its employees [2][3][10].

Group 1: Allegations by xAI
- xAI makes three main allegations against OpenAI: violation of federal trade secret laws, intentional interference with xAI's economic relationships with its employees, and violation of California's unfair competition laws [11].
- Specific incidents cited include the alleged theft of proprietary information by former xAI engineers Xuechen Li and Jimmy Fraiture, who are accused of transferring sensitive data to OpenAI [12][14][15].
- xAI also claims that a former senior finance executive left without signing a confidentiality agreement and took critical strategic information to OpenAI [19][20].

Group 2: OpenAI's Defense
- OpenAI categorically denies the allegations, stating that Xuechen Li never officially joined the company and did not transfer any proprietary information [27][29].
- Regarding Jimmy Fraiture, OpenAI asserts that anything he did during his "garden leave" was a personal act not directed by OpenAI, and that it received no confidential information [31][32].
- OpenAI says the unnamed finance executive's departure had nothing to do with any alleged poaching and resulted from that executive's refusal to take part in improper financial practices at xAI [33][34].

Group 3: Legal Proceedings
- OpenAI has filed a motion to dismiss xAI's lawsuit, arguing that the claims lack merit and that naming former employees who are not accused of wrongdoing is an act of intimidation [37].
- A hearing on the motion is scheduled for November 18, 2025; it will address procedural matters rather than the substantive issues of the case [38].
Nano Banana adds 2 major features and opens its API, at under 0.3 yuan per image
量子位 · 2025-10-03 04:19
Core Insights
- Nano Banana has officially opened its API, so developers can integrate it into their products and enterprises can use it for large-scale content production [9][10]
- API pricing is about $0.039 per output image, roughly 0.28 yuan, based on a rate of $30 per 1 million image output tokens [2][15][16]
- Google has introduced two new features, customizable aspect ratios and an image-only generation mode, making Nano Banana more useful for content creators [3][8]

Pricing and Cost Structure
- Each generated image costs about $0.039 (approximately 0.28 yuan); the maximum image size is 1024x1024 pixels, which consumes around 1290 tokens [16]
- Image generation is priced at 12 times the rate of Gemini 2.5 Flash text mode [17]

New Features
- The first new feature lets users customize aspect ratios, with more than ten options including 16:9, 9:16, 4:3, and 3:2, covering a range of visual content needs [4][18]
- The second feature is an image-only output mode that returns images without accompanying text, which saves tokens and reduces contextual interference; it suits real-time previews and e-commerce displays [7][8]

Application and Usability
- Users can build their own applications directly in Google AI Studio by entering prompts, which makes the tool accessible to non-developers [13][14]
- The new features target the practical needs of content creators and position Nano Banana as a more practical tool [8]
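The per-image price quoted under Pricing and Cost Structure follows directly from the token figures, and a short calculation makes the arithmetic explicit. The token count and the $30 rate come from the summary above; the exchange rate is an assumed round figure used only for illustration, and the function and constant names are hypothetical.

```python
PRICE_PER_MILLION_IMAGE_TOKENS_USD = 30.0  # rate quoted in the article
TOKENS_PER_IMAGE = 1290                    # approximate tokens for one 1024x1024 image
TEXT_TO_IMAGE_PRICE_RATIO = 12             # "12 times" figure from the article
ASSUMED_CNY_PER_USD = 7.2                  # assumption for illustration, not from the article


def cost_per_image_usd(tokens=TOKENS_PER_IMAGE,
                       price_per_million=PRICE_PER_MILLION_IMAGE_TOKENS_USD):
    """Cost of one image: tokens consumed times the per-token price."""
    return tokens * price_per_million / 1_000_000


def implied_text_price_per_million_usd(
        image_price=PRICE_PER_MILLION_IMAGE_TOKENS_USD,
        ratio=TEXT_TO_IMAGE_PRICE_RATIO):
    """Text-mode price implied by reading the 12x figure as a price ratio."""
    return image_price / ratio


image_cost_usd = cost_per_image_usd()
image_cost_cny = image_cost_usd * ASSUMED_CNY_PER_USD
text_price_usd = implied_text_price_per_million_usd()

if __name__ == "__main__":
    print(f"per image: ${image_cost_usd:.4f} (about {image_cost_cny:.2f} yuan)")
    print(f"implied text price: ${text_price_usd:.2f} per 1M tokens")
```

The product 1290 × 30 / 1,000,000 is 0.0387, which rounds to the quoted $0.039. The yuan figure depends on the exchange rate chosen, so it will shift slightly with a different assumption.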
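Two simple modules achieve dual SOTA in segmentation and understanding! Xiang Bai's team at Huazhong University of Science and Technology and collaborators release a new multimodal framework
量子位 · 2025-10-03 04:19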
Core Insights
- The article discusses the evolution of multimodal large models from text-to-image generation to pixel-level tasks such as image segmentation, highlighting two problems: imprecise segmentation results and hallucinations during understanding [1][2].

Group 1: Model Development
- Research teams from Huazhong University of Science and Technology and Kingsoft Office proposed two core modules, Semantic Enhanced Feature Extractor (SEFE) and Interleaved Local Visual Coupling (ILVC), to address segmentation accuracy and hallucination [3][24].
- SEFE improves object attribute reasoning by fusing semantic features with pixel-level features, which yields more precise segmentation results [4][25].
- ILVC provides fine-grained supervision by generating local descriptions from segmentation masks, which effectively reduces hallucinations [5][26].

Group 2: Model Performance
- The resulting multimodal large model, LIRA, achieved state-of-the-art (SOTA) performance on both segmentation and understanding tasks [6].
- Compared with InternVL2, LIRA maintains understanding performance while adding support for image segmentation; compared with OMG-LLaVA, it improves segmentation tasks by an average of 8.5% and MMBench by 33.2% [7].

Group 3: Experimental Results
- LIRA outperformed prior models across multiple understanding and segmentation datasets, with a performance drop of only 0.2% when jointly trained on both comprehension and segmentation datasets [40].
- Integrating SEFE and ILVC reduced hallucination rates by 3.0% and 4.8% for the 1.8B and 7B models, respectively [38].

Group 4: Future Directions
- The article suggests that future research should explore the relationship between text tokens and visual tokens, which may offer new insights for improving the understanding and segmentation capabilities of multimodal large models [43].
2025 AI annual awards open! 3 dimensions, 5 award categories, in search of the leaders of the AI+ era
量子位 · 2025-10-03 04:19
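Organizing Committee, from Aofeisi
量子位 | WeChat official account QbitAI

To let more practitioners feel the leap of the intelligence wave, and to offer applause and encouragement to more peers and fellow travelers, we are officially opening registration for the "2025 AI Annual List" awards.

This is the 8th year of the 量子位 AI annual list. Over these eight years we have witnessed technical breakthroughs and real-world deployment, the integration and reshaping of industries, and wave after wave of companies, people, and products that push the era forward.

In an era when AI is redefining everything, intelligent technology is no longer a single tool but a driving force behind the co-evolution of industry and society. Through this annual selection, we hope to discover and honor the explorers and practitioners who truly lead change and push boundaries.

The selection covers three dimensions (companies, products, and people) with five award categories. Companies are welcome to apply! Let us witness the stars of the year together and light the way to the future.

Company list
- 2025 AI Leading Enterprise of the Year
- 2025 AI Most Promising Startup of the Year

Product list
- 2025 AI Outstanding Product of the Year
- 2025 AI Outstanding Solution of the Year

People list
- 2025 AI Person in Focus of the Year

Detailed selection criteria and application instructions follow.

2025 AI Leading Enterprise of the Year
This award covers China's AI sector and selects the companies with the strongest overall capabilities. Eligibility: Selection criteria: Focusing on China's ...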
LeCun has had enough! He personally admits he wants to resign
量子位 · 2025-10-03 04:19
Core Viewpoint
- Yann LeCun, a Turing Award winner and a key figure in AI at Meta, is reportedly considering resigning as Chief Scientist of FAIR because of dissatisfaction with recent organizational changes in Meta's AI department [1][2][3].

Group 1: Organizational Changes and Impact
- Recent months have brought significant organizational turmoil to Meta's AI division, adding to LeCun's frustration [3][9].
- Under a new policy, FAIR must obtain additional review from the TBD Lab before publishing research papers, which LeCun views as a direct challenge to academic freedom [5][7][21].
- Meta has reorganized its AI department four times in just six months, creating instability and confusion among researchers [15][17].

Group 2: Personal Impact on LeCun
- LeCun has reportedly been demoted in the internal power structure: the appointment of a new chief scientist for the MSL Lab has effectively reduced his influence [18][20].
- The additional review requirement further restricts LeCun's ability to publish and share his work, which has been central to FAIR's mission for the past 12 years [23][25].

Group 3: Team Morale and Internal Tensions
- The new policies have caused widespread disappointment in the FAIR team, and some members feel their academic freedom has been severely limited [27][28].
- Tensions are rising between long-standing employees and new hires, as Meta has aggressively recruited top talent from competitors, creating disparities in resources and treatment [30][34].
- Reports indicate that the work environment has become highly competitive and stressful, with a culture of "territorial disputes" emerging within the AI departments [34][35].

Group 4: Broader Implications for Meta
- The internal strife is not limited to the AI teams; employees in other departments, such as Reality Labs, have voiced dissatisfaction with the AI division's direction and management [38].
- Meta AI's recently launched "Vibes" feature has performed poorly in the market, further highlighting the challenges the company faces in keeping its competitive edge [42][43].
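New work from the team behind Stanford's dishwashing robot! Dexterous hands learn tea picking and breakfast making from humans; nominated for best paper at CoRL 2025
量子位 · 2025-10-02 05:30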
Core Viewpoint
- The article discusses DexUMI, a data collection and policy learning framework that lets robots learn dexterous tasks from human demonstration, significantly improving data collection efficiency and task success rates [2][35].

Group 1: DexUMI Framework
- DexUMI uses the human hand as a natural interface for transferring dexterous skills to a variety of robotic hands, minimizing the embodiment gap between human and robotic manipulation [2][17].
- The framework achieved an average task success rate of 86% across multiple tasks and improved data collection efficiency by 3.2 times compared with traditional teleoperation [7][32].

Group 2: Hardware and Software Innovations
- The hardware component is a wearable exoskeleton designed for each type of dexterous hand, with parameters optimized to match human hand movements while remaining wearable [18].
- The software adaptation is a data processing pipeline that keeps human demonstrations and robot deployments visually consistent, which is crucial for effective skill transfer [22][32].

Group 3: Testing and Results
- DexUMI was tested on two dexterous hand platforms and achieved high success rates on complex tasks such as opening egg cartons and picking tea [32][33].
- The Inspire Hand and XHAND 1 were evaluated, with XHAND 1 performing better thanks to its fully actuated design and advanced tactile sensing [33][39].

Group 4: Future Implications
- The research establishes a new paradigm for efficient data collection and policy learning, and could lead to a data-sharing community among researchers and industry players that accelerates the development of dexterous robotic applications [39][41].
Sora2 can even predict ChatGPT's output
量子位 · 2025-10-02 05:30
Core Insights
- Sora2 demonstrates advanced capabilities in predicting ChatGPT outputs and rendering HTML, blurring the line between video generation and interactive AI [2][6]
- The system can simulate interactions, generating audio responses in a ChatGPT-like manner and producing coherent, contextually relevant content [4][5]
- Sora2 shows a strong grasp of physical phenomena such as light refraction without being explicitly prompted, indicating a high level of intelligence and information processing ability [14][18]

Group 1: Sora2's Capabilities
- Sora2 can generate interactive content, including video scenes and audio responses, effectively simulating a conversation with ChatGPT [4][6]
- The system successfully rendered HTML code, producing results that closely match what a real browser would display [7][12]
- In a practical test, Sora2 simulated the refraction of light through glass accurately enough to impress users [15][18]

Group 2: Game Simulation and Information Processing
- Sora2 accurately recreated elements of the game "Cyberpunk 2077," including map locations, terrain, and vehicle designs, showing that it can extract and integrate key information [21][25]
- Despite minor inaccuracies, its simulation of a side quest reflects advanced information processing and an understanding of complex scenarios [24][25]
- There is speculation that Sora2's performance may rest on training with large language models (LLMs), which hints at further undiscovered capabilities [26][27]
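The company of Murati, Lilian Weng, and Danqi Chen releases its first product, slashing the barrier to fine-tuning large models and aiming to reinvent OpenAI
量子位 · 2025-10-02 03:26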
Core Insights
- Thinking Machines Lab has launched its first product, Tinker, which makes fine-tuning a model about as simple as editing Python code [1][12]
- With this launch, the company has moved past its "zero product, zero revenue" stage, during which it was valued at about 84 billion yuan (roughly $12 billion) [2]

Product Overview
- Tinker is a flexible API for fine-tuning language models that lets researchers control algorithms and data without managing infrastructure [12][13]
- At launch, Tinker supports the Qwen3 and Llama3 model series, and switching between small and large models takes only a one-string change in Python code [15]
- Tinker's API exposes low-level training steps while automatically handling scheduling, scaling, and error recovery [17]

Technical Features
- Tinker uses LoRA so that multiple training tasks can share the same GPU, which reduces costs and allows more experiments to run in parallel [22]
- The gradient update strategy for Tinker is defined as: New parameters = Original parameters + Learning rate × Advantage value × Gradient of log probability [28]

Industry Reception
- Tinker has drawn significant attention in the industry, with beta testers noting that it balances abstraction and tunability better than other fine-tuning tools [30]
- Research teams from prestigious institutions have already achieved notable results using Tinker [30]

Strategic Vision
- Thinking Machines Lab aims to reinvent a version of OpenAI that emphasizes open research sharing and greater freedom for researchers [10][11]
- The company's mission is to make cutting-edge models easier to customize for individual needs [14]
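The update rule quoted under Technical Features has the shape of a standard policy-gradient step, and a toy example shows how the three factors combine. The sketch below is not Tinker's API and makes no claim about how Tinker computes gradients; it assumes a softmax policy over a handful of discrete actions, with the logits acting as the parameters, and every function name is hypothetical.

```python
import math


def softmax(logits):
    """Convert logits into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_prob(logits, action):
    """Gradient of log softmax(logits)[action] with respect to the logits.

    For a softmax policy this is one_hot(action) - probabilities.
    """
    probs = softmax(logits)
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]


def policy_gradient_step(logits, action, advantage, learning_rate):
    """new = original + learning_rate * advantage * grad(log probability)."""
    grad = grad_log_prob(logits, action)
    return [theta + learning_rate * advantage * g
            for theta, g in zip(logits, grad)]


if __name__ == "__main__":
    params = [0.0, 0.0, 0.0]
    before = softmax(params)[1]
    params = policy_gradient_step(params, action=1, advantage=1.0,
                                  learning_rate=0.5)
    after = softmax(params)[1]
    print(f"probability of action 1: {before:.3f} -> {after:.3f}")
```

A positive advantage raises the probability of the sampled action and a negative one lowers it, which is what lets the same update line reinforce good outputs and suppress bad ones.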