SAP · RL · RESEARCH LOG

Research Log

Documenting a reinforcement learning agent's journey through Super Auto Pets. Each entry is an independent run with its own strategies, pets, and hyperparameters.

Runs logged: 1
Best win rate: 85.99%
Algorithm: MaskablePPO
#001 · 2026-05-27

Baseline Run: Fish-Core Strategy

First full training run to establish a baseline. The agent converged on a Fish-centric core with Ant, Beaver, and Mosquito support. No explicit strategy guidance was given; the behaviour is emergent only.

baseline · tier-1 · fish-core · 3M-steps · self-play
Win Rate: 85.99%
Avg Rounds: 16.77
Algorithm: MaskablePPO
Steps: ~3M

Results Overview

What the agent found

Overall Win Rate: 85.99% (+76pp vs random, ~10%)

The agent won 86 out of every 100 games, a staggering gap over random play.

Mean Rounds Survived: 16.77 (+11.8 vs random baseline)

Games stretch further when the agent plays: it builds teams that last.

Most Drafted Pet: Fish, drafted in 91,019 games (~91k)

The agent discovered Fish as the cornerstone of almost every winning team.

Top Solo Win Rate: 90.16% (Giraffe, highest individual rate)

Among pets with sufficient data, Giraffe posted the strongest individual win rate.

After approximately three million training steps, the agent settled into a remarkably consistent strategy: anchor every team around Fish, stack early-tier support pets like Ant, Beaver, and Mosquito, and lean on Horse to amplify burst damage. Rather than discovering an obscure metagame, it converged on the same core units most experienced human players identify, which is a kind of validation in itself.

The 85.99% figure is not a cherry-picked run. It is the average across the entire evaluation set, spanning games that ended in early knockouts as well as games that reached deep late-game rounds. The agent does not simply win fast; it scales. Performance actually rises through the mid-game before the difficulty spikes of higher tiers begin to tell, peaking at 90.7% around round 17.

The outliers are telling too. Giraffe posts a 90.16% solo win rate yet is picked far less often than Fish or Ant: the agent learned that Giraffe is powerful only in the right context, not a blind auto-include. Scorpion and Dog sit near the bottom of the tier list despite their theoretical strength; the agent never found reliable windows to leverage them.
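How much weight a thinly sampled rate such as Dog's or Scorpion's can bear is easier to judge with an interval around it. The sketch below uses a 95% Wilson score interval, a standard choice for binomial proportions. The log itself does not report intervals, and the sketch assumes that each pick corresponds to one independent game and that the published rate is wins divided by picks.

```python
import math


def wilson_interval(wins: int, games: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a win proportion (z=1.96 is roughly 95%)."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return centre - half, centre + half


def interval_from_rate(picks: int, win_rate_pct: float) -> tuple[float, float]:
    # Assumption: the published rate is wins / picks, so wins is recovered by rounding.
    wins = round(picks * win_rate_pct / 100)
    return wilson_interval(wins, picks)


# Pick counts and win rates copied from the tier list in this log.
for name, picks, rate in [("giraffe", 7465, 90.16), ("dog", 878, 58.39), ("sloth", 4, 75.00)]:
    low, high = interval_from_rate(picks, rate)
    print(f"{name}: {rate:.2f}% over {picks} picks, 95% interval {low:.1%} to {high:.1%}")
```

The interval narrows as picks grow, so a pet with a handful of picks gets a range too wide to rank it against the well-sampled core.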


Agent-Derived Tier List

Pet Tier List

Tiers derived from agent win-rate data: S ≥ 85%, A 80–84.9%, B 75–79.9%, C 65–74.9%, D < 65%. Pets with fewer than 2,000 picks are flagged as low-sample.
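These thresholds can be applied mechanically. The following is a minimal Python sketch, not the project's own code: the `PetStat` record and function names are hypothetical, and the bands are assumed to be half-open (so a rate of 84.95% would still land in A).

```python
from dataclasses import dataclass

LOW_SAMPLE_PICKS = 2_000  # flag threshold stated in the log

# (lower bound in percent, tier label), checked from the top down
TIER_FLOORS = [(85.0, "S"), (80.0, "A"), (75.0, "B"), (65.0, "C")]


@dataclass(frozen=True)
class PetStat:
    name: str
    picks: int
    win_rate: float  # percent, e.g. 90.16


def tier_for(win_rate: float) -> str:
    """Return the first tier whose floor the win rate meets, else D."""
    for floor, label in TIER_FLOORS:
        if win_rate >= floor:
            return label
    return "D"


def is_low_sample(picks: int) -> bool:
    return picks < LOW_SAMPLE_PICKS


# A few rows copied from the tier list in this log.
pets = [
    PetStat("giraffe", 7465, 90.16),
    PetStat("fish", 91019, 83.25),
    PetStat("sloth", 4, 75.00),
    PetStat("cow", 819, 51.60),
]
for pet in pets:
    flag = " (low sample)" if is_low_sample(pet.picks) else ""
    print(f"{pet.name}: tier {tier_for(pet.win_rate)}{flag}")
```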

S
🦒 giraffe: 90.16% win rate, 7,465 picks
🐜 ant: 86.77% win rate, 66,428 picks
A
🦫 beaver: 83.80% win rate, 52,992 picks
🐟 fish: 83.25% win rate, 91,019 picks
🦟 mosquito: 83.22% win rate, 48,084 picks
🦦 otter: 82.38% win rate, 49,448 picks
🦆 duck: 81.93% win rate, 20,540 picks
🐴 horse: 81.59% win rate, 50,321 picks
🐷 pig: 81.26% win rate, 49,859 picks
B
🦗 cricket: 79.83% win rate, 14,548 picks
🐆 leopard: 77.44% win rate, 770 picks ⚠ low sample (<2k picks)
🐢 turtle: 76.37% win rate, 10,775 picks
🕷️ spider: 75.96% win rate, 11,595 picks
🐀 rat: 75.85% win rate, 12,527 picks
🦀 crab: 75.83% win rate, 11,587 picks
🦔 hedgehog: 75.79% win rate, 12,512 picks
🦩 flamingo: 75.71% win rate, 7,816 picks
🦐 shrimp: 75.55% win rate, 11,819 picks
🐉 dragon: 75.14% win rate, 1,171 picks ⚠ low sample (<2k picks)
🦥 sloth: 75.00% win rate, 4 picks ⚠ low sample (<2k picks)
C
🦍 gorilla: 73.80% win rate, 811 picks ⚠ low sample (<2k picks)
🦤 dodo: 73.55% win rate, 4,919 picks
🦢 swan: 73.27% win rate, 13,747 picks
🦚 peacock: 72.15% win rate, 12,563 picks
🐡 blowfish: 71.41% win rate, 5,151 picks
🐒 monkey: 70.88% win rate, 2,132 picks
🐗 boar: 70.54% win rate, 711 picks ⚠ low sample (<2k picks)
🪰 fly: 67.89% win rate, 881 picks ⚠ low sample (<2k picks)
🐱 cat: 67.24% win rate, 450 picks ⚠ low sample (<2k picks)
🦅 eagle: 67.02% win rate, 1,405 picks ⚠ low sample (<2k picks)
🐑 sheep: 66.64% win rate, 3,995 picks
🐇 rabbit: 65.92% win rate, 3,647 picks
🦘 kangaroo: 65.58% win rate, 4,001 picks
🐊 crocodile: 65.33% win rate, 1,284 picks ⚠ low sample (<2k picks)
🐋 whale: 65.30% win rate, 4,833 picks
🦛 hippo: 65.29% win rate, 2,436 picks
D
🐪 camel: 64.12% win rate, 2,402 picks
🦬 bison: 63.67% win rate, 737 picks ⚠ low sample (<2k picks)
🦡 badger: 63.53% win rate, 4,662 picks
🦈 shark: 63.41% win rate, 1,234 picks ⚠ low sample (<2k picks)
🐂 ox: 62.73% win rate, 5,095 picks
🦌 deer: 61.06% win rate, 2,469 picks
🐕 dog: 58.39% win rate, 878 picks ⚠ low sample (<2k picks)
🐧 penguin: 56.32% win rate, 2,346 picks
🦂 scorpion: 55.06% win rate, 1,447 picks ⚠ low sample (<2k picks)
🐄 cow: 51.60% win rate, 819 picks ⚠ low sample (<2k picks)

Synergy Explorer

Which pets work together?

Heatmap shows co-occurrence counts for the top 10 most-drafted pets: fish, ant, beaver, horse, pig, mosquito, otter, duck, swan, hedgehog.

[Heatmap: pairwise co-occurrence counts for the ten pets listed; cell values not included in this export]

[Panels: Top 5 Synergy Pairs; All 20 synergy pairs]
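Co-occurrence counts of the kind the heatmap displays can be computed by counting every unordered pet pair that appears on the same team. The sketch below is an illustration under assumptions: the team lists are made up, and counting each pair at most once per team is a choice made here, not something the log specifies.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(teams: list[list[str]]) -> Counter:
    """Count how many teams contain each unordered pair of distinct pets."""
    counts: Counter = Counter()
    for team in teams:
        # set() ignores duplicate copies of a pet; sorted() gives a stable pair key.
        for a, b in combinations(sorted(set(team)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical teams, for illustration only.
teams = [
    ["fish", "ant", "beaver"],
    ["fish", "ant", "horse"],
    ["fish", "mosquito", "fish"],
]
counts = cooccurrence_counts(teams)
for pair, n in counts.most_common(3):
    print(pair, n)
```

Raw counts favour pets that are drafted often in general, so a pair involving Fish will rank high almost regardless of synergy; normalising by each pet's pick count would be one way to separate the two effects.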


Performance

Win Rate by Round

How does agent performance evolve, and eventually degrade, as games progress into higher tiers?

Round 17: 90.7%. Peak performance; the agent scales into the mid-game.
Rounds 12–17: ↑ Rising. The agent consistently improves through the early-mid game.
Rounds 74+: ~50%. Sparse data; statistical noise, treat with caution.

The performance arc says a lot about how Super Auto Pets works at depth. In the early rounds, every game starts from the same baseline, the agent's 85.99% average. But as games progress into rounds 12–17, performance climbs rather than declines. The agent's preferred team compositions, Fish-centric scaling builds with Ant and Beaver support, are precisely the teams that get stronger as they accumulate buffs across multiple turns.

The decline that follows round 17 is not a failure of the agent; it reflects the deliberate difficulty design of Super Auto Pets. Tier unlocks introduce opponents with fundamentally different power levels and combat mechanics. The agent, trained on early-tier patterns, runs out of effective adaptation as the late-game tier 5 and 6 units begin appearing on enemy teams.
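A per-round curve like this can be built by grouping outcomes by round and dropping rounds with too few samples, which is what keeps sparse late rounds from showing up as noise. The sketch assumes a hypothetical record format of `(round_number, won)` pairs; how the project actually logs outcomes is not stated.

```python
from collections import defaultdict


def win_rate_by_round(
    records: list[tuple[int, bool]], min_samples: int = 1
) -> dict[int, float]:
    """Map each round to its win fraction, skipping rounds with too few samples."""
    totals: dict[int, list[int]] = defaultdict(lambda: [0, 0])  # round -> [wins, games]
    for round_no, won in records:
        totals[round_no][1] += 1
        if won:
            totals[round_no][0] += 1
    return {
        round_no: wins / games
        for round_no, (wins, games) in sorted(totals.items())
        if games >= min_samples
    }


# Hypothetical records, for illustration only.
records = [(12, True), (12, False), (17, True), (17, True), (74, False)]
print(win_rate_by_round(records))                 # every round, including the sparse one
print(win_rate_by_round(records, min_samples=2))  # sparse rounds filtered out
```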


Pet Spotlight

Explore every pet

🦒 giraffe · S · 90.16%
Agent's read: Reliable performer. The agent drafted this 7,465 times and won 90.16% of those games.

🐜 ant · S · 86.77%
Agent's read: Core pick, drafted in 66,428 games with an 86.77% win rate. Essential early-game cornerstone.

🦫 beaver · A · 83.80%
Agent's read: Reliable performer. The agent drafted this 52,992 times and won 83.80% of those games.

🐟 fish · A · 83.25%
Agent's read: Reliable performer. The agent drafted this 91,019 times and won 83.25% of those games.

🦟 mosquito · A · 83.22%
Agent's read: Reliable performer. The agent drafted this 48,084 times and won 83.22% of those games.

🦦 otter · A · 82.38%
Agent's read: Reliable performer. The agent drafted this 49,448 times and won 82.38% of those games.

🦆 duck · A · 81.93%
Agent's read: Reliable performer. The agent drafted this 20,540 times and won 81.93% of those games.

🐴 horse · A · 81.59%
Agent's read: Reliable performer. The agent drafted this 50,321 times and won 81.59% of those games.

🐷 pig · A · 81.26%
Agent's read: Reliable performer. The agent drafted this 49,859 times and won 81.26% of those games.

🦗 cricket · B · 79.83%
Agent's read: Solid support unit. The agent found consistent value here across 14,548 picks.

🐆 leopard · B · 77.44% · ⚠ low sample
Agent's read: Solid support unit on the available data, but 770 picks is a low sample.

🐢 turtle · B · 76.37%
Agent's read: Solid support unit. The agent found consistent value here across 10,775 picks.

🕷️ spider · B · 75.96%
Agent's read: Solid support unit. The agent found consistent value here across 11,595 picks.

🐀 rat · B · 75.85%
Agent's read: Solid support unit. The agent found consistent value here across 12,527 picks.

🦀 crab · B · 75.83%
Agent's read: Solid support unit. The agent found consistent value here across 11,587 picks.

🦔 hedgehog · B · 75.79%
Agent's read: Solid support unit. The agent found consistent value here across 12,512 picks.

🦩 flamingo · B · 75.71%
Agent's read: Solid support unit. The agent found consistent value here across 7,816 picks.

🦐 shrimp · B · 75.55%
Agent's read: Solid support unit. The agent found consistent value here across 11,819 picks.

🐉 dragon · B · 75.14% · ⚠ low sample
Agent's read: Solid support unit on the available data, but 1,171 picks is a low sample.

🦥 sloth · B · 75.00% · ⚠ low sample
Agent's read: Placed in B by its win rate alone. With only 4 picks, the figure supports no conclusion.

🦍 gorilla · C · 73.80% · ⚠ low sample
Agent's read: Situational pick. Provides niche upside, but the agent only committed 811 times.

🦤 dodo · C · 73.55%
Agent's read: Situational pick. Provides niche upside, but the agent only committed 4,919 times.

🦢 swan · C · 73.27%
Agent's read: Situational pick. Provides niche upside, but the agent only committed 13,747 times.

🦚 peacock · C · 72.15%
Agent's read: Situational pick. Provides niche upside, but the agent only committed 12,563 times.

🐡 blowfish · C · 71.41%
Agent's read: Situational pick. Provides niche upside, but the agent only committed 5,151 times.

🐒 monkey · C · 70.88%
Agent's read: Situational pick. Provides niche upside, but the agent only committed 2,132 times.

🐗 boar · C · 70.54% · ⚠ low sample
Agent's read: Situational pick. Provides niche upside, but the agent only committed 711 times.

🪰 fly · C · 67.89% · ⚠ low sample
Agent's read: Situational pick. Provides niche upside, but the agent only committed 881 times.

🐱 cat · C · 67.24% · ⚠ low sample
Agent's read: Situational pick. Provides niche upside, but the agent only committed 450 times.

🦅 eagle · C · 67.02% · ⚠ low sample
Agent's read: Situational pick. Provides niche upside, but the agent only committed 1,405 times.

🐑 sheep · C · 66.64%
Agent's read: Situational pick. Provides niche upside, but the agent only committed 3,995 times.

🐇 rabbit · C · 65.92%
Agent's read: Situational pick. Provides niche upside, but the agent only committed 3,647 times.

🦘 kangaroo · C · 65.58%
Agent's read: Situational pick. Provides niche upside, but the agent only committed 4,001 times.

🐊 crocodile · C · 65.33% · ⚠ low sample
Agent's read: Situational pick. Provides niche upside, but the agent only committed 1,284 times.

🐋 whale · C · 65.30%
Agent's read: Situational pick. Provides niche upside, but the agent only committed 4,833 times.

🦛 hippo · C · 65.29%
Agent's read: Situational pick. Provides niche upside, but the agent only committed 2,436 times.

🐪 camel · D · 64.12%
Agent's read: Rarely picked. The agent found limited value despite some opportunities.

🦬 bison · D · 63.67% · ⚠ low sample
Agent's read: Rarely picked. The agent found limited value across very few opportunities. Low sample warning.

🦡 badger · D · 63.53%
Agent's read: Rarely picked. The agent found limited value despite some opportunities.

🦈 shark · D · 63.41% · ⚠ low sample
Agent's read: Rarely picked. The agent found limited value despite some opportunities. Low sample warning.

🐂 ox · D · 62.73%
Agent's read: Situational pick. Provides niche upside, but the agent only committed 5,095 times.

🦌 deer · D · 61.06%
Agent's read: Rarely picked. The agent found limited value despite some opportunities.

🐕 dog · D · 58.39% · ⚠ low sample
Agent's read: Rarely picked. The agent found limited value across very few opportunities. Low sample warning.

🐧 penguin · D · 56.32%
Agent's read: Rarely picked. The agent found limited value despite some opportunities.

🦂 scorpion · D · 55.06% · ⚠ low sample
Agent's read: Rarely picked. The agent found limited value despite some opportunities. Low sample warning.

🐄 cow · D · 51.60% · ⚠ low sample
Agent's read: Rarely picked. The agent found limited value across very few opportunities. Low sample warning.


Training

Learning curves

The agent started knowing nothing. After ~3M environment steps, it reached an 85.99% win rate. The charts show representative mock training curves that match the final performance; live data is exported from TensorBoard.

Episode Reward over Steps
Win Rate over Steps
Algorithm: MaskablePPO
Policy: MLP
Steps: ~3M
Win Rate: 85.99%

About

Methodology

This project trained a reinforcement learning agent from scratch to play Super Auto Pets Pack 1, using MaskablePPO from sb3-contrib (the Stable-Baselines3 contrib package) with a custom Gymnasium environment. Training ran for ~3 million environment steps over a three-month project. The agent reached an 85.99% win rate, a 76 percentage-point improvement over the random baseline.
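The "Maskable" part of MaskablePPO refers to excluding illegal actions before the policy samples one, so the agent never tries, for example, to buy a pet it cannot afford. The sketch below illustrates that idea in plain Python. It is a conceptual example and not the library's internal code, and the three action names are hypothetical, since the project's action space is not described here.

```python
import math


def masked_softmax(logits: list[float], mask: list[bool]) -> list[float]:
    """Softmax over the allowed actions only; masked actions get probability 0."""
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("at least one action must be allowed")
    # Subtract the largest allowed logit for numerical stability.
    peak = max(logit for logit, ok in zip(logits, mask) if ok)
    exps = [math.exp(logit - peak) if ok else 0.0 for logit, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical actions: the shop is empty, so "buy_pet" is illegal this step.
actions = ["buy_pet", "roll_shop", "end_turn"]
logits = [2.0, 1.0, 0.5]
mask = [False, True, True]
for action, prob in zip(actions, masked_softmax(logits, mask)):
    print(f"{action}: {prob:.3f}")
```

Masking at the distribution level means the agent spends no training steps learning which moves are illegal, which matters in a game where the legal action set changes every shop phase.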