X @Avi Chawla
Avi Chawla·2025-08-09 06:36
Finally, here are 10 more evaluations I ran using DeepEval on logical reasoning tasks.
- GPT-5 won in 2 cases.
- Grok 4 won in 3 cases.
- A tie happened in 5 cases.

Grok 4 was found to be better in terms of depth of analysis.

Check this👇 https://t.co/4siD5PqJPQ ...