X @Nick Szabo · 2025-10-11 03:02
Accuracy Impact of Prompt Tone
- Rude prompts to LLMs consistently led to better results than polite ones [1]
- Very polite and polite tones reduced accuracy, while neutral, rude, and very rude tones improved it [1]
- The top score reported was 84.8% for very rude prompts and the lowest was 80.8% for very polite [1]

Model Behavior
- Older models (like GPT-3.5 and Llama-2) behaved differently [2]
- GPT-4-based models like ChatGPT-4o show a clear reversal, where a harsher tone works better [2]

Statistical Significance
- Statistical tests confirmed that the differences were significant, not random, across repeated runs [1]
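To make the significance claim concrete: the study reruns the same question set under each tone and tests whether the mean accuracy gap between tones could be chance. Here is a minimal sketch of that idea with a paired t-statistic; the per-run accuracies below are made up for illustration and are not the paper's data.

```python
# Hypothetical illustration of testing tone differences across repeated runs
# (invented numbers, not the study's actual results).
from statistics import mean, stdev

# Made-up per-run accuracies (fractions correct) on the same question set.
very_polite = [0.80, 0.81, 0.79, 0.82, 0.80, 0.81, 0.80, 0.79, 0.81, 0.80]
very_rude   = [0.85, 0.84, 0.86, 0.84, 0.85, 0.83, 0.85, 0.86, 0.84, 0.85]

# Paired t-statistic on the per-run differences: t = mean(d) / (stdev(d) / sqrt(n)).
diffs = [r - p for r, p in zip(very_rude, very_polite)]
n = len(diffs)
t = mean(diffs) / (stdev(diffs) / n ** 0.5)

print(f"mean accuracy difference = {mean(diffs):.3f}, t = {t:.2f}")
# With n - 1 = 9 degrees of freedom, |t| > 2.26 rejects the null at p < 0.05.
print("significant at p < 0.05:", abs(t) > 2.26)
```

A large |t| means the gap is consistent across runs relative to its run-to-run noise, which is what "significant, not random" refers to.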