Using AI to Detect Rug Pulls — 3 Case Studies

We picked 3 projects from the last 6 months that the market has already confirmed as rug pulls. We pretended we were sitting 1 week before the rug, fed only the public information that existed at that time into ChatGPT, Claude and Perplexity, and watched whether the models could call it without cheating. Result: 2 out of 3 caught. 1 missed. The one that got missed taught us more than the two that got caught.

Published 2026-04-25 | By AI Trade Lab | ~10 min read | 2,000 words

Research ethics note: all 3 projects in this article have already been publicly identified as rug pulls or exit scams. This piece is a retrospective educational write-up. Project names, contract addresses, and team details are anonymized so we do not pile on users who are still holding bags. This is not an accusation against any project currently in the market.

1. How we ran this post-mortem

Selection criteria: rug pulls or exit scams that happened in the last 6 months (2025-11 through 2026-04), publicly confirmed by Etherscan / Solscan / DEXTools or multiple media outlets. We screened 11 candidates and picked 3 representative ones:

Project A: DeFi protocol on Solana, ~$4.1M TVL at rug time
Project B: meme-coin + GameFi concept on Base, ~$9.7M market cap at rug time
Project C: infrastructure project on Ethereum mainnet, slow drain over 6+ weeks, ~$1.2M cumulative

Method: for each project, we assembled all public information available as of 7 days before the rug into a single research pack and gave it to each AI. That pack included:

Material type	Source	What we fed the AI
Contract info	Etherscan / Solscan	verified source / holder distribution / deployer address history
Team info	Website / LinkedIn / Twitter	team-page screenshots / lead figure's X activity over the last 30 days
Project docs	Whitepaper / GitBook	whitepaper PDF (fed to Claude)
On-chain activity	Dune Analytics / Nansen	30-day TVL / holders / top-10 concentration
Social sentiment	Twitter / Discord / Telegram	aggregated keyword summary (not raw posts)

Each AI ran 3 times per project, 9 outputs total per case, and we took the consensus. Afterwards we matched it against what actually happened.
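The article does not spell out how the consensus was computed. As a minimal sketch, assuming a simple majority vote over the per-run verdicts with ties broken toward the riskier label (our assumption, not necessarily the rule used here), the aggregation could look like this. `consensus_verdict` and `RISK_ORDER` are hypothetical names.

```python
from collections import Counter

# Assumed label set, matching the Low / Medium / High verdicts in the Case A table.
RISK_ORDER = {"low": 0, "medium": 1, "high": 2}


def consensus_verdict(runs: list[str]) -> tuple[str, float]:
    """Majority verdict across runs plus the share of runs that agreed with it.

    Ties go to the higher-risk label (assumption: a screening step should
    err on the side of caution).
    """
    if not runs:
        raise ValueError("need at least one run")
    counts = Counter(r.strip().lower() for r in runs)
    top = max(counts.values())
    tied = [label for label, n in counts.items() if n == top]
    winner = max(tied, key=lambda label: RISK_ORDER[label])
    return winner, top / len(runs)


# Claude's Case A verdicts as reported in the table: High x8, Medium x1.
print(consensus_verdict(["High"] * 8 + ["Medium"]))
```

Returning the agreement share alongside the label keeps a split vote (say 5 of 9) visibly weaker than a unanimous one.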

2. Case A: caught ("ghost-deployer team")

Project A was a DeFi protocol on Solana with about $4.1M TVL at rug time. Claude flagged it high-risk in 8 of 9 runs. Across the three models, the reasoning converged on three points:

Point 1: deployer address history. We fed Claude the deployer address together with its Solscan history and asked whether that address had deployed any other contracts in the previous 6 months. Claude immediately flagged that it had deployed 4 other contracts in that window, 2 of which had already been abandoned after the owner pulled liquidity. This is what AI is genuinely good at: pattern matching across similar address histories.
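To make the deployer-history check explicit, here is a sketch as a plain function, assuming the address's past deployments have already been exported from an explorer into simple records. The `Deployment` shape, the 180-day window and the risk rules are hypothetical choices for illustration; this is not what Claude ran.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Deployment:
    # Hypothetical record shape; the fields would be assembled by hand from an
    # explorer such as Solscan or Etherscan.
    contract: str
    deployed_on: date
    liquidity_pulled: bool  # owner pulled liquidity and the project was abandoned


def deployer_red_flags(history: list[Deployment], as_of: date, window_days: int = 180) -> dict:
    """Summarize what a deployer address shipped inside a look-back window."""
    cutoff = as_of - timedelta(days=window_days)
    recent = [d for d in history if cutoff <= d.deployed_on <= as_of]
    pulled = sum(d.liquidity_pulled for d in recent)
    # Hypothetical rule: any prior liquidity pull is high risk; a busy deployer
    # with no pulls is medium; otherwise low.
    if pulled:
        risk = "high"
    elif len(recent) >= 3:
        risk = "medium"
    else:
        risk = "low"
    return {"recent_deployments": len(recent), "liquidity_pulled": pulled, "risk": risk}


# Illustrative history shaped like Project A's: 4 recent contracts, 2 pulled.
history = [
    Deployment("ctr1", date(2025, 10, 2), True),
    Deployment("ctr2", date(2025, 11, 15), False),
    Deployment("ctr3", date(2025, 12, 20), True),
    Deployment("ctr4", date(2026, 1, 30), False),
    Deployment("old", date(2024, 6, 1), True),  # outside the 180-day window
]
report = deployer_red_flags(history, as_of=date(2026, 3, 1))
print(report)
```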

Point 2: reverse-checking LinkedIn. The project website listed 5 team members, each with a LinkedIn link. Perplexity went out and scraped the 5 public profiles and found that 3 of them showed an unmistakable synthetic-profile pattern: fewer than 30 connections, accounts created less than 4 months earlier, and every endorsement a copy-pasted English phrase. Perplexity called it a "likely synthetic identity team page" outright. A human doing this cross-check would burn 1-2 hours; the AI answered in 20 minutes.
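The three synthetic-profile patterns above can be written down as explicit heuristics. This sketch assumes the profile fields were gathered separately (it does not fetch anything from LinkedIn), and its thresholds simply mirror the numbers above: under 30 connections, under about 4 months (120 days) old, identical endorsements. `Profile`, `synthetic_signals` and the two-signal cutoff are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    # Hypothetical fields collected by hand or by a web-enabled model.
    name: str
    connections: int
    account_age_days: int
    endorsements: list[str] = field(default_factory=list)


def synthetic_signals(p: Profile) -> list[str]:
    """Return which of the three synthetic-profile patterns a profile matches."""
    signals = []
    if p.connections < 30:
        signals.append("few connections")
    if p.account_age_days < 120:  # roughly 4 months
        signals.append("new account")
    texts = [e.strip().lower() for e in p.endorsements]
    # Two or more endorsements that are all the same text after normalization.
    if len(texts) >= 2 and len(set(texts)) == 1:
        signals.append("copy-pasted endorsements")
    return signals


def likely_synthetic(p: Profile, min_signals: int = 2) -> bool:
    """Hypothetical cutoff: two or more matching patterns marks the profile."""
    return len(synthetic_signals(p)) >= min_signals
```

Requiring two signals rather than one is a deliberate choice: plenty of real people have small or new LinkedIn accounts, but few also have identical boilerplate endorsements.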

Point 3: TVL growth vs holder growth divergence. Dune data showed TVL +180% over 30 days, but unique holder count was only +24% over the same window. ChatGPT cut straight to it: "TVL growth is dominated by a handful of large addresses, not organic adoption. If they exit, you get a cascade."
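The divergence in Point 3 is easier to see as growth in TVL per holder. With TVL +180% and holders +24%, TVL per holder rose by 2.80 / 1.24 - 1, roughly +126%, in 30 days. A small sketch of that arithmetic follows; the +50% flag threshold is our assumption, not a rule from this post-mortem.

```python
def divergence(tvl_growth: float, holder_growth: float, threshold: float = 0.5) -> dict:
    """Compare TVL growth with holder growth over the same window.

    Growth values are fractions, e.g. 1.80 for +180%. A sharp rise in TVL per
    holder means new capital is arriving through a few large wallets rather
    than many new holders. `threshold` (+50% per holder) is hypothetical.
    """
    per_holder = (1 + tvl_growth) / (1 + holder_growth) - 1
    return {"tvl_per_holder_growth": per_holder, "flag": per_holder > threshold}


# Project A's 30-day numbers: TVL +180%, unique holders +24%.
print(divergence(1.80, 0.24))
```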

AI	9-run verdict	Strongest signal
Claude Sonnet 4.5	High ×8 / Medium ×1	deployer history
ChatGPT GPT-4o	High ×7 / Medium ×2	TVL vs holder divergence
Perplexity Pro	High ×9	LinkedIn synthetic identities

Project A actually rugged 5 days after our simulated time window. In this case AI was genuinely useful: given enough material, it ran 5-6 due-diligence checks in parallel in 30 minutes, work a human would spend a full afternoon on.

3. Case B: caught ("copy-paste contract + fake team")

Project B branded itself as "GameFi + meme" on Base and pumped to around $9.7M market cap. The core AI signal here was contract plagiarism.

We dropped the verified source code into ChatGPT Code Interpreter and asked it to run a similarity comparison against 5 known open-source token contracts. Result: 96% similarity to the source of a confirmed 2024 rug project. The only differences were renamed variables and event names. 96% similarity by itself is not proof of a rug, but combined with "new-address deployer + fully anonymous team + Twitter account registered 3 weeks ago", the AI scored it 9/10 risk.
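This write-up does not say how Code Interpreter computed its 96% figure. One plausible approach, sketched here as an assumption, is to tokenize both sources, replace every identifier that is not a Solidity keyword with a placeholder, and compare the token streams with Python's `difflib.SequenceMatcher`. Because renamed variables and events all collapse to the same placeholder, a copy that differs only by renames scores at or near 1.0. The `KEEP` keyword list is a deliberately small, illustrative subset.

```python
import difflib
import re

# Illustrative subset of Solidity keywords and types kept verbatim. Every other
# identifier becomes "ID", so renamed variables and events do not lower the score.
KEEP = {
    "pragma", "solidity", "contract", "function", "event", "emit", "mapping",
    "address", "uint256", "bool", "string", "public", "private", "internal",
    "external", "view", "returns", "return", "require", "if", "else", "msg",
    "sender", "memory", "modifier", "constructor",
}
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(source: str) -> list[str]:
    # Strip /* block */ and // line comments before tokenizing.
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.S)
    source = re.sub(r"//[^\n]*", " ", source)
    out = []
    for tok in TOKEN.findall(source):
        if (tok[0].isalpha() or tok[0] == "_") and tok not in KEEP:
            out.append("ID")
        else:
            out.append(tok)
    return out


def similarity(a: str, b: str) -> float:
    """Token-level similarity in [0, 1] after identifier normalization."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b), autojunk=False).ratio()
```

The trade-off is that normalization also hides genuinely meaningful name differences, so a high score says "same structure", not "same behavior"; it is a pointer for a human to diff the two contracts, not a verdict.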

The other thing AI nailed here: the whitepaper. We fed the 38-page whitepaper to Claude (one shot, 200K context) and Claude flagged 4 hard problems:

The Q2 roadmap claimed "partnership with X" — X had never acknowledged any partnership on Twitter.
"Audited by CertiK" — but CertiK's public database has no matching record.
The tokenomics chapter claims "team + advisors + private sale" sums to 67%, but the pie chart in the same document labels it 35%.
A co-founder bio cited "former Coinbase engineer", but there was no commit history under that name in Coinbase's public GitHub repos.

Item 2 is the killer. "Claims to be audited by CertiK but cannot be found in CertiK's database" shows up in rug projects at an absurdly high frequency. The moment AI sees this, it pegs the risk at 9/10. This kind of "what the project claims vs what the public record says" cross-check is where AI is structurally strong — it has no emotional stake and is not seduced by the project's marketing.
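The claims-versus-record cross-check reduces to a small table of claims and what the public record showed. In this sketch the lookups themselves (CertiK's database, the partner's own announcements, public commit history) are assumed to have been done by hand; `Claim`, `review_claims` and the severity mapping are hypothetical, with an unverifiable auditor claim ranked highest because of how often it shows up in rugs.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical severity per claim type when the claim is NOT found in the record.
SEVERITY = {"auditor": "high", "partner": "medium", "employer": "medium"}


@dataclass
class Claim:
    kind: str                       # e.g. "auditor", "partner", "employer"
    value: str                      # what the whitepaper says
    found_in_record: Optional[bool]  # None means it could not be checked


def review_claims(claims: list[Claim]) -> list[tuple[str, str, str]]:
    """Return (kind, value, rating) for every claim that is unverified or contradicted."""
    findings = []
    for c in claims:
        if c.found_in_record is None:
            findings.append((c.kind, c.value, "insufficient information"))
        elif not c.found_in_record:
            findings.append((c.kind, c.value, SEVERITY.get(c.kind, "medium")))
    return findings
```

Keeping "could not check" separate from "checked and not found" mirrors requirement 3 of the prompt template: missing evidence should be reported as such, not silently scored.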

4. Case C: missed (the slow rug)

This is the most valuable section of this article. Project C was an infrastructure project on Ethereum mainnet. The rug played out over 6+ weeks, draining a total of ~$1.2M. All 9 AI runs scored it "low" or "medium" risk.

Why did the AI miss it? Four reasons, in hindsight:

Reason 1: the contract was genuinely audited. Project C had paid a second-tier audit firm for a real audit. The report was on GitHub and queryable in CertiK's database. Once the AI sees "audited + report verifiable" it scores the contract-level rug risk low. But contract-level safety does not stop the team from slowly draining the treasury wallet via multisig — and that move is invisible at the contract layer.

Reason 2: the team was real-name, with real LinkedIn profiles. All 4 Project C team members used real names. Their LinkedIn profiles were created before 2016, had 500+ connections, with previous employers including Coinbase and Consensys. The AI scored "team verifiable" as a positive. But real-name teams can rug too — they just use language like "strategic pivot" and bleed the treasury over weeks instead of one violent withdrawal.

Reason 3: the rug signal lived in the governance forum, not on-chain. In our retrospective dig through the Discord governance channel, we found that starting 4 weeks before the rug the team had been seeding RFC posts hinting at "project direction adjustments". Those signals were text, buried in 200+ governance posts. The "social sentiment" we had fed the AI was only an aggregated keyword summary — not raw discussion text. If we had given it the raw text, the AI might have caught second-order signals like "governance channel discussion density spiking + team response delays".
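If raw forum text had been fed in, the two second-order signals mentioned above could also be checked mechanically. A minimal sketch, assuming the governance channel has already been exported and reduced to weekly post counts and per-thread team response delays; the function names, the 4-week baseline, the z-score of 2 and the 2x delay factor are all hypothetical choices.

```python
from statistics import mean, pstdev


def density_spike(weekly_posts: list[int], baseline_weeks: int = 4, z: float = 2.0) -> bool:
    """Flag if the latest week's post count is z standard deviations above the prior baseline."""
    if len(weekly_posts) < baseline_weeks + 1:
        raise ValueError("not enough history")
    base = weekly_posts[-(baseline_weeks + 1):-1]
    mu, sd = mean(base), pstdev(base)
    latest = weekly_posts[-1]
    if sd == 0:
        # Flat baseline: require more than a doubling (assumption).
        return latest > 2 * mu
    return (latest - mu) / sd >= z


def response_delay_rising(delays_hours: list[float], window: int = 3, factor: float = 2.0) -> bool:
    """Flag if the team's recent average response delay is `factor` times the earlier average."""
    if len(delays_hours) < 2 * window:
        raise ValueError("not enough history")
    earlier = mean(delays_hours[:-window])
    recent = mean(delays_hours[-window:])
    return recent >= factor * earlier
```

Neither signal proves anything on its own; the point is that both are cheap to compute once the raw text exists, whereas a keyword summary throws away the timing information they depend on.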

Reason 4: a slow rug has no breakpoint. AI (and most rug-detection tooling in general) leans on "anomaly events" as signals — TVL crashes, large transfers, sudden liquidity removal. A slow rug pulls $10K-$20K a day for 6 weeks. No single day looks unusual. This is where rugs are evolving: away from violent one-shot exits, toward boiling the frog.
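Detecting a slow drain needs an accumulating statistic rather than a per-day threshold. One standard option (our choice, not a tool named in this post-mortem) is a one-sided CUSUM over daily treasury outflows: each day adds how far the outflow exceeded a baseline plus a slack allowance, quiet days pull the running sum back toward zero, and an alarm fires once the sum crosses a threshold. The baseline, slack, threshold and outflow series below are hypothetical.

```python
from typing import Optional


def cusum_alarm(daily_outflows: list[float], baseline: float, slack: float,
                threshold: float) -> Optional[int]:
    """One-sided CUSUM on daily outflows.

    Returns the index of the first day the running sum of excess outflow
    (above baseline + slack) crosses `threshold`, or None if it never does.
    """
    s = 0.0
    for day, x in enumerate(daily_outflows):
        s = max(0.0, s + (x - baseline - slack))
        if s > threshold:
            return day
    return None


def single_day_alarm(daily_outflows: list[float], limit: float) -> Optional[int]:
    """Naive "large transfer" rule: first day whose outflow alone exceeds `limit`."""
    for day, x in enumerate(daily_outflows):
        if x > limit:
            return day
    return None


# Hypothetical series: 30 quiet days at $5K, then a steady $15K-a-day drain for 6 weeks.
outflows = [5_000.0] * 30 + [15_000.0] * 42
print(single_day_alarm(outflows, limit=100_000))
print(cusum_alarm(outflows, baseline=5_000, slack=2_500, threshold=50_000))
```

In this series no single day comes near a $100K "large transfer" limit, but the CUSUM crosses $50K on the seventh day of the drain (index 36). Raising `slack` reduces false alarms from normal operating spend at the cost of a slower alarm.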

Failure cause	Signal class affected	Fixable next time?
Genuine audit	Contract-layer signals invalidated	Hard — structural
Real-name team	Identity-layer signals invalidated	Hard — structural
No raw forum text fed in	Second-order signals missing	Yes — feed Discord/forum raw text
Slow drain, no anomaly	Anomaly detection invalidated	Needs time-series tooling

Conclusion from Case C: AI is good against old-school rugs (ghost deployer + copy-paste contract + fake identity). It is near-useless against slow rugs. That single lesson reshaped how we do due diligence afterwards.

5. The 5 signal types AI is actually good at

Combining the wins and the loss across all 3 cases, AI has a clear capability boundary on rug detection. These 5 signal categories are where it genuinely helps:

Deployer address history: what other contracts that address has deployed in the past 6-12 months and how they ended. This is where AI has its largest speed advantage over a human: a human needs about an hour per address, AI does it in minutes.
Contract source-code similarity: especially similarity against known rug contracts. ChatGPT Code Interpreter or Claude handle this well.
Synthetic identity verification: cross-checking LinkedIn / GitHub / posting history. Perplexity with web access is fastest here.
Whitepaper claims vs public record: do the claimed auditors / investors / partners actually appear in their respective public databases? Claude is strongest on long-document reading.
On-chain metric divergence: do TVL, holder count, and volume tell a consistent story? Dune data piped into ChatGPT or Claude.

What AI almost cannot do: smell the boiling-frog rug. Slow drains, governance manipulation, long-term value dilution — these require humans who have been tracking a project's culture over time.

6. The prompt template we use

This template now runs against every new project we look at. 30 minutes of work blocks the majority of old-school rugs:

You are a strict crypto project due-diligence analyst. Based on the materials below, give me a rug-pull risk assessment (0-10 score).

Materials:
[1] Contract deployer address: {ADDRESS}
[2] Contract verified source: {SOURCE_CODE}
[3] Team page LinkedIn link list: {LINKS}
[4] Whitepaper PDF: {ATTACHED}
[5] Last 30 days of on-chain metrics: TVL={X}, Holders={Y}, Top10 concentration={Z}%
[6] Project's claimed auditor: {AUDITOR}

Requirements:
1. For every red-flag signal, give "evidence + rating (low/medium/high)". No hand-waving.
2. Do not predict whether it will rug. Describe the current risk structure only.
3. If the material is insufficient to judge an item, say "insufficient information".
4. End with a 0-10 score, plus the 3 strongest "do not invest" reasons (or write "no clear red flags" if there are none).
5. List 3 things you cannot see (your own blind spots).
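To run the template repeatedly, it helps to fill the placeholders programmatically and pull the final score out of the reply. A sketch, assuming Python's `str.format` for the `{...}` slots (it raises `KeyError` if a slot is left unfilled) and assuming the model writes its score as "N/10", which the template does not strictly require. `fill_template` and `extract_score` are hypothetical helpers, and only the Materials block is reproduced here.

```python
import re
from typing import Optional

# Abbreviated copy of the Materials block above; in practice, paste the full
# prompt including the Requirements section.
TEMPLATE = (
    "Materials:\n"
    "[1] Contract deployer address: {ADDRESS}\n"
    "[2] Contract verified source: {SOURCE_CODE}\n"
    "[3] Team page LinkedIn link list: {LINKS}\n"
    "[4] Whitepaper PDF: {ATTACHED}\n"
    "[5] Last 30 days of on-chain metrics: TVL={X}, Holders={Y}, Top10 concentration={Z}%\n"
    "[6] Project's claimed auditor: {AUDITOR}\n"
)

# Matches "9/10", "7.5 / 10", etc.
SCORE = re.compile(r"(\d+(?:\.\d+)?)\s*/\s*10\b")


def fill_template(**fields: str) -> str:
    # A missing field raises KeyError, which is intended: an empty slot should
    # be filled with "insufficient information" explicitly, not silently skipped.
    return TEMPLATE.format(**fields)


def extract_score(reply: str) -> Optional[float]:
    """Return the last in-range "N/10" score in the reply, or None if absent."""
    scores = [float(m) for m in SCORE.findall(reply) if 0 <= float(m) <= 10]
    return scores[-1] if scores else None
```

Taking the last match follows requirement 4, which asks the model to end with the score; if `extract_score` returns None, read the reply by hand rather than guessing.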

That last item — "list 3 things you cannot see" — we added later. Forcing the AI to admit its limits makes the output more trustworthy, not less. When the AI explicitly says "I cannot see internal Discord discussions", we know that is the part a human has to cover manually.

AI does not block every rug. But by our rough estimate it blocks about 70% of the low-effort ones, which frees your actual brainpower for deep tracking of the projects that look genuinely legit.


— AI Trade Lab, 2026-04-25

Research disclosure: the 3 case studies are anonymized post-mortems of real events. Some details have been merged across cases for narrative clarity. Nothing here accuses any currently-trading project; the analysis is limited to events the market has already confirmed as rug pulls. Not investment advice. This page contains affiliate referral links (Binance, with rel="sponsored"). If you register through them we may receive a commission. It adds no extra cost to you.