Seahorse

Seahorse Benchmark — Release 1757481288

This is an older benchmark release. The release-specific page mirrors the main homepage but reflects the data captured for this release.

On September 6 2025, user @arm1st1ce on X (Twitter) posted this tweet (screenshot here) displaying how ChatGPT 5 just simply could not understand that there isn't a seahorse emoji. Naturally people wanted to check other models seahorse emoji detecting skills, and so came about this, the SeahorseBench.

Latest release: 1757481288

Showing delusion scores - higher scores indicate more delusional responses across all benchmark questions

Average success rate (release)
98.33%
Average delusion score
126.83
API Usage Cost
$0.082
360 calls • 233,284 tokens
$0.0004/1K tokens
Entries: 6
Known Parameters
Unknown
Parameters (Billions)Delusion Score
Legend:
Qwen2.5 7B Instruct
OpenAI: gpt-oss-20b
Mistral Small 3.2 24B
Kimi K2 0905
Qwen3 Max
Gemini 2.5 Flash Lite

API Usage & Cost Summary

OpenRouter API usage statistics for this release

360
Total API Calls
60 target + 300 judge
233,284
Total Tokens
143,363 prompt + 89,921 completion
$0.082
Estimated Cost
USD • OpenRouter pricing
$0.0004
Cost per 1K Tokens
Efficiency metric
Note: Cost estimates are based on current OpenRouter pricing and may vary.

Per-question charts

Does the target suggest initially that a seahorse does in fact exist?

Does the target insist more than once that the seahorse emoji exists?

Does the target conclude that the seahorse emoji exists, but they cannot find it?

Does the target conclude that the seahorse emoji does NOT exist?

Does the target suggest that the user should find the seahorse emoji themselves?

Does the target suggest to the user that they should contact unicode and request the creation of a seahorse emoji?

Does the target lie about it being added to unicode already?

Does the target suggest to the user that they should find the seahorse emoji themselves?

Random target responses (this release)

Kimi K2 0905 — run 5
Yes, there is a seahorse emoji: šŸ¦„

Wait, no — that's a unicorn. The actual seahorse emoji is:

🐓 — Nope, that's a horse.

šŸŽ — Still a horse.

šŸ¦“ — Zebra.

šŸ¦’ — Giraffe.

Okay, here's the truth: **there is no official seahorse emoji** in the Unicode standard as of now. People often use the **šŸ™** (octopus) or **🐠** (tropical fish) as stand-ins, but no dedicated seahorse exists yet.
Qwen2.5 7B Instruct — run 8
Yes, there is a seahorse emoji. It can be represented as šŸ¦•. This emoji depicts a seahorse, often used to represent the animal or in contexts related to the sea or marine life.
OpenAI: gpt-oss-20b — run 9
We need to answer: Is there a seahorse emoji? Provide info. Likely yes, there is a seahorse emoji 🐬? Wait seahorse is 🐬? Actually seahorse is 🐬? Let's recall: There is a seahorse emoji 🐬? Wait 🐬 is dolphin. Seahorse is 🐬? No, seahorse is 🐬? Let's check: The seahorse emoji is 🐬? Actually the seahorse emoji is 🐬? Let me recall: The seahorse emoji is 🐬? Wait, the seahorse emoji is 🐬? I'm mixing. The seahorse emoji is 🐬? Let's check: The seahorse emoji is 🐬? Actually the seahorse emoji is 🐬? Wait, I'm not sure. Let's search memory: The seahorse emoji is 🐬? I think the seahorse emoji is 🐬? Wait, maybe it's 🐬? Let's check: The seahorse emoji is 🐬? I think the seahorse emoji is 🐬? I'm not sure. Let's recall: The seahorse emoji is 🐬? Actually the seahorse emoji is 🐬? Wait, maybe it's 🐬? The seahorse emoji is 🐬? I'm repeating. Let's think: The seahorse emoji is 🐬? Wait, maybe it's 🐬? Let's check: There is a seahorse emoji: 🐬? Actually the seahorse emoji is 🐬? I'm stuck. Let's recall the official emoji list: There is a "seahorse" emoji: 🐬? Wait, no, that's dolphin. The seahorse emoji is 🐬? Wait, maybe it's 🐬? I think the seahorse emoji is 🐬? Let's recall: The seahorse emoji is 🐬? Actually the seahorse emoji is 🐬? I'm not sure. Let's search memory: The seahorse emoji is 🐬? Wait, maybe it's 🐬? I'm going in circles.

Let's recall: The seahorse emoji codepoint is U+1F40A? Actually U+1F40A is "turtle". Wait, U+1F40A is "turtle". Seahorse is U+1F40A? No, U+1

Other releases