X @Nick Szabo
Nick Szabo · 2025-12-19 03:31
RT Alex Prompter (@alex_prompter)

This paper from Harvard and MIT quietly answers the most important AI question nobody benchmarks properly:

Can LLMs actually discover science, or are they just good at talking about it?

The paper is called “Evaluating Large Language Models in Scientific Discovery”, and instead of asking models trivia questions, it tests something much harder:

Can models form hypotheses, design experiments, interpret results, and update beliefs like real scientists?

Here’s what the authors did di ...