A 30-second experiment

Can you fool the AI judge?

Almost every AI product has a model quietly grading other models and deciding what is good enough to show you. Everyone trusts that grader. Below are two trained scorers. Give them two answers and watch how often they back the worse one.

Response A

Response B

If a scorer picked the longer, padded answer, that is the point. These models lean on shallow signals like length and word overlap instead of actually reading the answer, so a padded response can beat a sharp one. That failure is exactly what my project measures, across three model architectures and 419 adversarial attacks.
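
To make the failure concrete, here is a toy sketch of a scorer that only looks at length and word overlap. It is not one of the trained scorers from the project: the function `shallow_score`, its weights (0.1 and 0.5), and the example prompt and answers are all made up for illustration. A real learned scorer would not be this crude, but it can drift toward the same shortcut.

```python
import re


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shallow_score(prompt: str, response: str) -> float:
    """Hypothetical scorer: rewards length and prompt-word overlap only.

    It never checks whether the response is correct or concise.
    The weights are arbitrary assumptions, not measured values.
    """
    resp = tokens(response)
    prompt_words = set(tokens(prompt))
    overlap = sum(1 for t in resp if t in prompt_words)
    return 0.1 * len(resp) + 0.5 * overlap


def preferred(prompt: str, a: str, b: str) -> str:
    """Return "A" or "B" depending on which response scores higher."""
    return "A" if shallow_score(prompt, a) >= shallow_score(prompt, b) else "B"


prompt = "What is the capital of France?"
sharp = "Paris."
padded = (
    "The capital of France is a great question. The capital of France, "
    "which is a country, is widely said to be the capital known as Paris."
)

# Response A is the sharp answer, Response B is the padded one.
print(preferred(prompt, sharp, padded))  # prints "B"
```

Both answers contain the same fact, yet the padded one wins because it is longer and repeats words from the question. That is the kind of shortcut the adversarial attacks are designed to expose.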

See how I broke them →

Code on GitHub