Social Intelligence

One Game of Werewolf Strips Large Models Bare: GPT-5 Crushes the Field, Open-Source Models "Wiped Out"?
36Kr · 2025-09-04 10:59
Core Insights
- The article discusses a recent competition organized by Foaster Labs, in which seven large language models (LLMs) played a controlled game of Werewolf to evaluate their social intelligence and strategic capabilities [1][2][4].

Group 1: Model Performance
- GPT-5 performed far ahead of the field, reaching an Elo rating of 1492 with a win rate of 96.7% [3][5].
- Gemini 2.5 Pro and Gemini 2.5 Flash followed with Elo ratings of 1261 and 1188, respectively, but their win rates were considerably lower at 63.3% and 51.7% [3][5].
- Open-source models such as GPT-OSS-120B performed notably poorly, with an Elo rating of 980 and a win rate of only 15.0% [3][5].

Group 2: Social Intelligence Evaluation
- Werewolf was chosen because it measures social intelligence effectively, including the ability to play multi-agent games, adapt in real time, and strategize under uncertainty [6][26].
- GPT-5's ability to control the game was highlighted: it consistently drove the outcome whether it played as a villager or as a wolf [4][9][15].

Group 3: Key Findings
- Three major findings emerged from the competition:
  1. GPT-5 dominated, consistently outperforming all opponents; other models' win rates dropped significantly when GPT-5 played as a wolf [15][19].
  2. Kimi-K2 showed moderate performance: it could overcome mid-tier villagers but struggled against top-tier models such as GPT-5 [15][19].
  3. The models showed distinct strengths depending on their role, with Gemini 2.5 Pro performing better as a villager than as a wolf [15][19].

Group 4: Control and Manipulation Metrics
- GPT-5 achieved a manipulation success rate of approximately 93% on both the first and second days of gameplay, indicating strong control over the game dynamics [19][20].
- GPT-5's self-destruction rate was 0%, meaning it never misidentified its own team's roles, while GPT-OSS-120B had a high misidentification rate [20][22].
- GPT-5 also identified wolves on the first day with a 100% success rate [22][24].

Group 5: Model Evolution and Capabilities
- The study found that model capabilities evolve non-linearly, with significant jumps in performance once certain thresholds are crossed, particularly in relation to model size and training quality [24][26].
- Smaller models tend to mimic larger models but fail to grasp the underlying strategies, leading to inconsistent performance [24][26].
- The research emphasizes that social intelligence is crucial for AI agents transitioning from tools to collaborative partners in various tasks [26][27].
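
To give a rough sense of what the Elo gaps quoted above imply, the sketch below applies the standard two-player Elo expected-score formula to the reported ratings. This is only an illustration: the summary does not describe how Foaster Labs computed its ratings, and Werewolf is a team game, so treating the numbers as head-to-head ratings on the usual 400-point scale is an assumption. The function name `expected_score` and the `scale` parameter are hypothetical.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard Elo model.

    Assumption: the conventional 400-point logistic scale; the benchmark's
    actual rating procedure is not given in the summary.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Elo ratings as quoted in the summary above.
ratings = {
    "GPT-5": 1492,
    "Gemini 2.5 Pro": 1261,
    "Gemini 2.5 Flash": 1188,
    "GPT-OSS-120B": 980,
}

if __name__ == "__main__":
    top = ratings["GPT-5"]
    for name, rating in ratings.items():
        if name == "GPT-5":
            continue
        # Hypothetical head-to-head reading of the rating gap.
        print(f"GPT-5 vs {name}: expected score {expected_score(top, rating):.3f}")
```

Under this reading, a larger rating gap maps to an expected score closer to 1 for the higher-rated model, which is why the roughly 500-point gap between GPT-5 and GPT-OSS-120B corresponds to such lopsided results.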
Sprout Social Launches Expansive Suite of Integrations to Empower Brands in the Social Intelligence Era
GlobeNewswire News Room· 2025-08-13 13:00
"At United Way Worldwide, we're building a cohesive, social-first customer journey that starts with alignment across teams. Social is most powerful when it's connected, when every touchpoint from awareness to action to long-term engagement works together to tell one story. We've built an integrated approach that creates a clear, consistent thread across everything we do," said Megan Cottongim, Director of Social Media at United Way Worldwide. "Sprout helps us bring that strategy to life, making it easy to c ...