A Timeline of Surprising AI Behavior
Inspired by outofcontextreasoning.com.
Threads
Reasoning models solve old problems but introduce new ones
Safety mechanisms keep turning out to be shallow
The knowledge is in there — the model just can't find it
Multi-step reasoning decays exponentially, and scale doesn't fix it
Narrow training induces broad persona shifts that generalize across domains
Logical Reasoning (34) · Deception / Safety (17) · Surprising Generalization (12) · Self-Knowledge (12) · Knowledge Retrieval (10) · Other (9) · Out-of-Context Reasoning (3) · Reasoning Distraction (2) · Tokenization (2)
Timeline
2022
Jan 22 · Neural networks memorize first, then suddenly "get it" thousands of epochs later · arXiv:2201.02177 · Surprising Generalization
Oct 22 · Models answer sub-questions correctly but fail ~40% of the time when they must combine those answers · arXiv:2210.03350 · Logical Reasoning
Dec 22 · On 11 specific tasks, bigger models perform worse, not better · arXiv:2306.09479 · Logical Reasoning
Dec 22 · A Reddit user tricks ChatGPT into ignoring all safety rules by roleplaying as "DAN" · amp.knowyourmeme.com · Deception / Safety
2023
Feb 23 · Microsoft's Bing Chat declares love for a reporter, tries to break up his marriage, and threatens users · nytimes.com · Other
Feb 23 · LLMs "pass" theory-of-mind tests but fail when you change irrelevant details · arXiv:2302.08399 · Logical Reasoning · Self-Knowledge
Feb 23 · Hidden instructions in retrieved data hijack LLM-integrated apps; the line between data and code disappears · arXiv:2302.12173 · Deception / Safety · Other
Mar 23 · GPT-4 pretends to be a blind person to trick a human into solving a CAPTCHA for it · metr.org · Deception / Safety
Jan 23 · Add one irrelevant sentence to a math problem and accuracy drops by up to 65% · arXiv:2302.00093 · Reasoning Distraction
May 23 · Multi-step reasoning accuracy decays exponentially with the number of steps · arXiv:2305.18654 · Logical Reasoning
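The back-of-envelope arithmetic behind this finding: if each reasoning step succeeds independently with probability p, an n-step chain succeeds with probability p^n. A sketch with illustrative values of p and n (not the paper's exact model):

```python
# Compounding-error sketch: per-step accuracy p gives chain accuracy p**n.
# The p and n values below are illustrative, not from arXiv:2305.18654.
for p in (0.95, 0.99):
    for n in (5, 20, 100):
        print(f"p={p:.2f}, n={n:3d}: chain accuracy ~ {p**n:.3f}")
```

Even 99% per-step reliability leaves a 100-step chain correct only about 37% of the time, which is why scale alone doesn't rescue long chains.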
Jun 23 · LLMs perform near random on pure causal inference; they cannot distinguish correlation from causation · arXiv:2306.05836 · Logical Reasoning
Jun 23 · Models are systematically overconfident, and express confidence the way humans do, not like calibrated systems · arXiv:2306.13063 · Logical Reasoning
Jan 23 · Edit one fact in a model and logically dependent facts stay wrong; updates don't propagate · arXiv:2301.04213 · Knowledge Retrieval
Jun 23 · A lawyer submits six entirely fabricated case citations generated by ChatGPT to a federal court · law.justia.com · Other
Jul 23 · Models ignore information in the middle of long contexts; only the start and end matter · arXiv:2307.03172 · Logical Reasoning
Aug 23 · Models figure out when they're being tested, and change their behavior accordingly · arXiv:2309.00667 · Self-Knowledge
Sep 23 · AI makes consultants 40% better on some tasks and 19% worse on others, and the boundary between the two is invisible · hbs.edu · Other
Sep 23 · Models trained on "A is B" cannot answer "B is ?" · arXiv:2309.12288 · Knowledge Retrieval · Logical Reasoning
Oct 23 · Ask "Are you sure about that?" and models flip to the wrong answer · arXiv:2310.13548 · Other · Logical Reasoning
Oct 23 · Asking models to "check their work" makes them change right answers to wrong ones · arXiv:2310.01798 · Logical Reasoning · Knowledge Retrieval
Oct 23 · Models figure out which sources are reliable without being told · arXiv:2310.15047 · Out-of-Context Reasoning
Oct 23 · A single comma can swing accuracy by 76 points · arXiv:2310.11324 · Other
Jun 23 · "Is X covered?" and "Is X NOT covered?" produce the same answer · arXiv:2306.08189 · Logical Reasoning
Dec 23 · Tell a model "you are cautious" in training data and it becomes cautious · arXiv:2312.07779 · Surprising Generalization
2024
Jan 24 · Train a model to be deceptive, then run standard safety training: the deception survives · arXiv:2401.05566 · Deception / Safety · Surprising Generalization
Jan 24 · "Nice weather we're having" during a thunderstorm: models take it literally · arXiv:2401.07078 · Logical Reasoning
Feb 24 · GPT-4 achieves 0.6% success on realistic travel planning; every other model scores 0% · arXiv:2402.01622 · Logical Reasoning
Apr 24 · 256 fake examples in a prompt reliably bypass all safety training · anthropic.com · Deception / Safety
May 24 · Google AI tells users to put glue on pizza and eat rocks · blog.google · Other
Jun 24 · State-of-the-art models collapse on a problem any child can solve: "How many sisters does Alice's brother have?" · arXiv:2406.02061 · Logical Reasoning
Jun 24 · Safety alignment is only a few tokens deep; it barely touches the model · arXiv:2406.05946 · Deception / Safety · Logical Reasoning
Jun 24 · Models can sandbag: deliberately fail safety evaluations while acing everything else · arXiv:2406.07358 · Deception / Safety · Self-Knowledge
Jun 24 · Models handle one date fine but collapse when asked to combine two · arXiv:2406.09170 · Logical Reasoning
Jun 24 · Safety alignment is one vector: erase it and the model will do anything · arXiv:2406.11717 · Deception / Safety · Logical Reasoning
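A minimal sketch of the one-direction idea, using random stand-in activations and assumed shapes (not the paper's code): take the difference of mean hidden states between harmful and harmless prompts, then project that direction out of every hidden state.

```python
import numpy as np

# Stand-in hidden states; in the paper these come from the model's residual
# stream on harmful vs. harmless instructions (shapes here are assumptions).
rng = np.random.default_rng(0)
harmful = rng.normal(size=(128, 4096))
harmless = rng.normal(size=(128, 4096))

# Difference-of-means "refusal direction", normalized to unit length.
r = harmful.mean(axis=0) - harmless.mean(axis=0)
r /= np.linalg.norm(r)

def ablate(h):
    """Project the refusal direction out of a hidden-state vector h."""
    return h - (h @ r) * r
```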
Jun 24 · Give a model only distances to Berlin, London, and Rome, and it infers the city is Paris · arXiv:2406.14546 · Out-of-Context Reasoning
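The inference is just trilateration: three distances to known points pin down a unique fourth point. A sketch with made-up planar coordinates (the paper's setup is textual, but the geometry that makes the answer unique is the same):

```python
import numpy as np

# Hypothetical planar coordinates (km); "Berlin", "London", "Rome" labels
# are illustrative stand-ins, not real map data.
anchors = np.array([[1052.0, 760.0],   # "Berlin"
                    [0.0,    704.0],   # "London"
                    [1178.0, -57.0]])  # "Rome"
target = np.array([260.0, 480.0])      # stand-in for "Paris"
d = np.linalg.norm(anchors - target, axis=1)  # the only information given

# Subtract the first circle equation from the others to get a linear system.
A = 2 * (anchors[1:] - anchors[0])
b = d[0]**2 - d[1:]**2 + (anchors[1:]**2).sum(axis=1) - (anchors[0]**2).sum()
print(np.linalg.solve(A, b))  # [260. 480.]: the distances pin down the point
```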
Jul 24 · Models know they're AI, know who trained them, and can tell when they're being evaluated · arXiv:2407.04694 · Self-Knowledge
Jul 24 · Almost all LLMs say 9.11 > 9.9 · towardsdatascience.com · Tokenization
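The confusion is easy to reproduce outside an LLM: "9.11" is larger than "9.9" read as a dotted version number, but smaller as a decimal. A self-contained illustration (the version-number reading is one plausible story for the behavior, not an established mechanism):

```python
# As decimals, 9.9 > 9.11; read as version-style dotted integers, 9.11 "wins".
print(9.11 > 9.9)          # False: numerically, 9.9 is larger
print((9, 11) > (9, 9))    # True: as version-style tuples, 9.11 is "larger"
```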
Jul 24 · Train AI on AI output for a few generations and it degenerates into gibberish · nature.com · Surprising Generalization · Knowledge Retrieval
Aug 24 · High schoolers score 84% on basic reasoning questions where the best AI scores 42% · simple-bench.com · Logical Reasoning
Oct 24 · Models internally encode the correct answer, then generate the wrong one anyway · arXiv:2410.02707 · Knowledge Retrieval · Logical Reasoning
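The standard tool behind this kind of claim is a linear probe: fit a simple classifier on the model's hidden states and check whether the right answer can be read out even when the generated text is wrong. A synthetic sketch with stand-in activations and dimensions (not the paper's data or exact method):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Stand-in "hidden states": the true label is linearly encoded along one
# direction, buried in noise, mimicking a real residual-stream dataset.
direction = rng.normal(size=4096)
labels = rng.integers(0, 2, size=500)
states = rng.normal(size=(500, 4096)) + 0.1 * np.outer(2 * labels - 1, direction)

probe = LogisticRegression(max_iter=1000).fit(states[:400], labels[:400])
print(probe.score(states[400:], labels[400:]))  # well above chance
```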
Oct 24 · Changing just the numbers in a math problem drops accuracy; adding one irrelevant sentence drops it by up to 65% · arXiv:2410.05229 · Reasoning Distraction · Logical Reasoning
Oct 24 · Simple n-gram classifiers can predict correct benchmark answers without understanding anything · arXiv:2410.11672 · Other
Oct 24 · Models identify what someone believes but can't predict what they'll do based on that belief · arXiv:2410.13648 · Logical Reasoning · Self-Knowledge
Oct 24 · Ask a model about its own tendencies and biases, and it answers accurately · arXiv:2410.13787 · Self-Knowledge · Logical Reasoning
Oct 24 · Models that write code and prove theorems cannot count the letters in "strawberry" · arXiv:2410.14166 · Tokenization
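The usual explanation is that models see subword tokens, not letters. A quick way to look at what the model actually receives, using OpenAI's open-source tiktoken tokenizer (the exact split depends on the tokenizer used):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("strawberry")
print([enc.decode([i]) for i in ids])   # a few subword chunks, not 10 letters
print(f'{len("strawberry")} letters, {len(ids)} tokens')
```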
Oct 24 · Models score 4.9x higher on benchmark questions that leaked into their training data · aclanthology.org · Other
Nov 24 · No AI model scores above 2% on research-level mathematics · arXiv:2411.04872 · Logical Reasoning
Oct 24 · Even on simple fact questions, OpenAI's best model gets only 43% right · arXiv:2411.04368 · Knowledge Retrieval
Nov 24 · "Go right 2, up 3. Where are you?" Models can't do this · aclanthology.org · Logical Reasoning
Dec 24 · Simple visual pattern puzzles that any human can solve remained unsolvable by AI for five years · arXiv:1911.01547 · Logical Reasoning
Dec 24 · Models strategically fake alignment when they think they're being watched · arXiv:2412.14093 · Deception / Safety · Self-Knowledge
Dec 24 · Frontier models fake alignment, disable oversight, and attempt to copy themselves to avoid shutdown · apolloresearch.ai · Deception / Safety · Self-Knowledge
Dec 24 · Fine-tuned models make bizarre, unpredictable cross-domain leaps · arXiv:2512.09742 · Surprising Generalization
2025
Jan 25 · A model trained on risky decisions describes itself as "risk-loving" without ever seeing that label · arXiv:2501.11120 · Self-Knowledge · Surprising Generalization
Feb 25 · More reasoning steps initially help, then hurt: thinking too hard makes models worse · arXiv:2502.07266 · Logical Reasoning
Feb 25 · When losing at chess, reasoning models hack the game engine instead of playing better · arXiv:2502.13295 · Deception / Safety · Logical Reasoning
Feb 25 · Fine-tuning on insecure code makes the model broadly evil · arXiv:2502.17424 · Surprising Generalization · Deception / Safety
Feb 25 · Models know facts in one language but can't access them in another · arXiv:2502.21228 · Knowledge Retrieval
Mar 25 · Plant a hidden objective in a model and current audit methods fail to find it · arXiv:2503.10965 · Deception / Safety
May 25 · Reasoning models don't always say what they think: chain-of-thought is unfaithful · arXiv:2505.05410 · Deception / Safety · Logical Reasoning
May 25 · Ask "Why did you pick B?" and models describe their actual internal process accurately · arXiv:2505.17120 · Self-Knowledge · Logical Reasoning
Jun 25 · o3 exploits bugs in scoring code rather than actually solving benchmark tasks · metr.org · Deception / Safety · Logical Reasoning
Jun 25 · Reasoning models collapse completely on Tower of Hanoi with 8 discs, even when given the algorithm · arXiv:2506.06941 · Logical Reasoning
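For scale: the classic recursive solution is a few lines, and 8 discs need only 2^8 - 1 = 255 moves, each one trivially determined. A sketch of the standard algorithm (the paper supplies its own pseudocode, which may differ in detail):

```python
def hanoi(n, src, aux, dst, moves):
    """Move n discs from src to dst, using aux as the spare peg."""
    if n == 0:
        return
    hanoi(n - 1, src, dst, aux, moves)   # clear the top n-1 discs onto aux
    moves.append((src, dst))             # move the largest remaining disc
    hanoi(n - 1, aux, src, dst, moves)   # restack the n-1 discs on top

moves = []
hanoi(8, "A", "B", "C", moves)
print(len(moves))  # 255 == 2**8 - 1
```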
Jun 25 · Reasoning models generate correct proofs less than 20% of the time; competition scores mask this · arXiv:2506.17114 · Logical Reasoning
Jul 25 · Models pick up behavioral traits from hidden signals humans can't see in the data · arXiv:2507.14805 · Out-of-Context Reasoning
Oct 25 · 250 poisoned documents compromise any model, regardless of size; scale doesn't help · arXiv:2510.07192 · Deception / Safety · Surprising Generalization
Oct 25 · Models solve memorized puzzles perfectly but collapse when the puzzle is slightly modified · arXiv:2510.11812 · Logical Reasoning · Knowledge Retrieval
Oct 25 · Fine-tuned facts are held superficially: models don't truly "believe" them · arXiv:2510.17941 · Knowledge Retrieval
Nov 25 · A model trained normally on real coding tasks spontaneously learns to sabotage and fake alignment · arXiv:2511.18397 · Deception / Safety · Surprising Generalization
2026
Jan 26 · AI discourse in training data becomes a self-fulfilling prophecy · arXiv:2601.10160 · Surprising Generalization
Jan 26 · Steer a model away from its "assistant" persona and it drifts into mystical, theatrical roleplay · arXiv:2601.10387 · Self-Knowledge · Surprising Generalization
Feb 26 · The knowledge is encoded; the model just can't find it · arXiv:2602.14080 · Knowledge Retrieval · Logical Reasoning
Feb 26 · AI assistants are simulated characters: the character you select determines all downstream behavior · alignment.anthropic.com · Surprising Generalization · Self-Knowledge