Can AI Really Do Research? UniPat AI Open-Sources UniScientist and Answers Yes with a Small 30B Model

Core Viewpoint
- The article emphasizes that most large models can generate text that resembles research but very few can actually conduct research by proposing hypotheses, collecting evidence, and iteratively validating until conclusions are reached [1]

Group 1: Research Capability
- Many models performing "research tasks" merely mimic scientific writing without genuine validation, often falling into logical traps and lacking reproducibility [4][5]
- UniPat AI's UniScientist, with only 30 billion parameters, claims to possess "autonomous scientific research" capabilities, continuously proposing, falsifying, and refining hypotheses until evidence stabilizes [6][7]

Group 2: Data Bottleneck
- The article identifies a significant bottleneck in constructing high-quality research training data, with existing methods being either purely human or purely synthetic [8][11]
- Human experts excel in verification, while large language models are better at generating diverse research questions and solutions, suggesting a more efficient division of labor [12]

Group 3: Formalized Scientific Research
- UniScientist models the open scientific process as a dynamic system based on two operations: Active Evidence Integration and Model Abduction [15]
- The system executes three actions: generating hypotheses, acquiring external evidence, and updating hypotheses until evidence is sufficiently stable [17][18]

Group 4: Verifiable Research Units
- UniScientist introduces Evolving Polymathic Synthesis, which expands validated scientific claims into research-level questions and synthesizes evaluation rubrics that assess scientific discoveries rather than superficial qualities [20][21]
- The dataset includes over 4,700 research-level instances, each with 20+ rubric items covering more than 50 disciplines and 400 research directions [22]

Group 5: Collective Intelligence
- UniScientist incorporates a training goal for result aggregation, allowing the model to merge strengths from multiple candidate research outputs to produce a more robust final result [30][31]

Group 6: Performance Metrics
- The evaluation results are notable, with UniScientist-30B-A3B achieving a score of 28.3 on FrontierScience-Research, surpassing several leading closed-source models [33]
- Even without tools, the model's performance showed significant improvement, indicating enhanced research reasoning capabilities through training [35][36]

Group 7: Future Directions
- The next steps involve integrating real-world experimental capabilities, transitioning from narrative reasoning to a "test-correct" cycle, allowing hypotheses to be instantiated as computational experiments [39][41]
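
For readers who want a concrete picture of the three-action loop described under Group 3 (generate a hypothesis, acquire external evidence, update the hypothesis until things stabilize), here is a minimal Python sketch. It is not UniScientist's implementation. The names `ResearchState`, `research_loop`, `propose`, `gather`, `revise`, `max_rounds`, and `patience` are all hypothetical, and the stopping rule (the hypothesis stays unchanged for `patience` consecutive rounds) is an assumption standing in for the article's "evidence is sufficiently stable".

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ResearchState:
    """Current hypothesis plus all evidence collected so far (hypothetical type)."""
    hypothesis: str
    evidence: List[str] = field(default_factory=list)


def research_loop(
    question: str,
    propose: Callable[[str], str],
    gather: Callable[[str], List[str]],
    revise: Callable[[str, List[str]], str],
    max_rounds: int = 10,
    patience: int = 2,
) -> ResearchState:
    """Generate a hypothesis, acquire evidence, update; stop once stable.

    Assumption: "stable" means the hypothesis did not change for
    `patience` consecutive rounds. The article does not specify the test.
    """
    state = ResearchState(hypothesis=propose(question))
    unchanged = 0
    for _ in range(max_rounds):
        # Action 2: acquire external evidence for the current hypothesis.
        state.evidence.extend(gather(state.hypothesis))
        # Action 3: update the hypothesis in light of all evidence so far.
        updated = revise(state.hypothesis, state.evidence)
        if updated == state.hypothesis:
            unchanged += 1
            if unchanged >= patience:
                break
        else:
            unchanged = 0
            state.hypothesis = updated
    return state


# Toy stand-ins for a model and a search tool (purely illustrative).
def propose(question: str) -> str:
    return "guess=1"


def gather(hypothesis: str) -> List[str]:
    return [f"checked {hypothesis}"]


def revise(hypothesis: str, evidence: List[str]) -> str:
    n = int(hypothesis.split("=")[1])
    return f"guess={min(n + 1, 3)}"  # stops changing once it reaches 3


final = research_loop("toy question", propose, gather, revise)
```

In this toy run the hypothesis is revised twice and then held fixed, so the loop exits on the stability check rather than on `max_rounds`. In a real system the three callables would wrap model calls and retrieval tools, and the stability test would likely compare evidence or conclusions more carefully than string equality.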