AI Crypto Tax Workflow — 5-Step Pipeline to Reconcile Trades, Wallets, and Cost Basis
Tax season opens with the "Export History" button on every exchange you have ever touched, and that is where most crypto users lose the next several hours of their lives. Field names disagree, time zones drift, cross-chain swaps land as two transactions, staking rewards live in a separate CSV nobody told you about — and this is exactly the kind of grunt work AI is good at. Below is the full 5-step pipeline. It will not tell you how to file. It will tell you how to get your data into a state your accountant can actually use.
1. Why Crypto Taxes Are Actually Hard #
Most people assume the hard part is the capital-gains formula. The formula is trivial (sale price − cost basis = gain/loss). The hard part is recovering the cost basis in the first place. A typical user's data reality looks like this:
- 3–6 exchanges: Binance, OKX, Coinbase, Kraken, Bybit and friends — each one exports a different CSV schema.
- 2–4 wallets: MetaMask, Phantom, Ledger — you have to pull tx history from a block explorer for each address.
- Cross-exchange / cross-chain transfers: BTC leaves Binance, goes to MetaMask, then to Phantom. That sequence must be classified as "internal transfer," not "sell + buy."
- DeFi activity: liquidity provision, staking, airdrops, wrapping. None of these look like what they actually are in a raw CSV row.
- Defunct exchanges: records from FTX or Celsius may exist only in old emails.
The 5-step pipeline below does not let AI "file your taxes for you." It turns the pile above into one clean table. You then hand that table to a tax professional, or to Koinly / CoinTracker / TokenTax, and they take it from there.
2. Step 1: Export the Raw Data #
This step is manual — AI cannot help here. But it has to be complete: everything downstream inherits whatever is missing here and will be silently wrong.
| Source | Export Path | Format | Commonly Missed Fields |
|---|---|---|---|
| Binance | Tax → Generate Tax Report → choose year | CSV | Internal sub-account transfers, Launchpool rewards |
| OKX | Assets → Bills → Export | CSV / XLSX | Realized vs unrealized P&L on perpetuals |
| Coinbase | Reports → Tax Reports | CSV / Form 8949 | Legacy Coinbase Pro data must be exported separately |
| MetaMask | Export per address via Etherscan / Arbiscan / etc. | CSV | Internal swap steps, L2 → L1 withdrawals |
| Phantom | Pull by address from Solana Explorer or Solscan | Manual cleanup | SPL token metadata |
| Hardware wallets | Pull from each chain's explorer by address | Same as above | Multi-chain wallets require per-chain exports |
Recommendation: create a folder 2025-tax/raw/, one CSV per source, filenames containing year + source. All AI processing starts from this directory. Never let AI connect directly to an exchange.
3. Step 2: Use AI to Unify the Schema #
Every exchange names its columns differently. Binance uses UTC_Time, Coinbase uses Timestamp, OKX uses billCreateTime. Have ChatGPT or Claude write a single normalization script.
Prompt template (GPT-4o + Code Interpreter):
Attached are 3 CSVs exported from different exchanges: binance.csv / okx.csv / coinbase.csv.
Write a Python script that merges them into a unified file unified.csv with exactly these columns:
- timestamp_utc (ISO 8601)
- source (binance / okx / coinbase / wallet_)
- type (buy / sell / convert / transfer_in / transfer_out / staking_reward / airdrop / fee / unknown)
- asset_in (BTC / ETH / USDT ...)
- amount_in (float)
- asset_out
- amount_out
- fee_asset
- fee_amount
- usd_price_at_time (leave empty — filled in the next step)
- raw_tx_id (the original record's ID)
- raw_note (preserve the original CSV's note/memo field verbatim)
Requirements:
1. Auto-detect the time column, amount column, and type column in each CSV. Do NOT assume the column names are the same as any other source.
2. When you cannot confidently map a type, write "unknown". Do NOT guess.
3. Convert all timestamps to UTC.
4. Output a mapping_report.md describing, for each input CSV, which columns mapped to which unified field and which columns were dropped.
The attachments contain the real data. Base the mapping on the actual column names in the files. Do NOT rely on what you "think" the format usually looks like.
The last line is the load-bearing one. GPT-4o will happily hand you a "typical Binance CSV layout" from its training memory — but Binance has changed its export schema at least twice. Anchor the mapping on the actual attached file, not on training-data recollection.
4. Step 3: Classify Transaction Types #
The previous step left a pile of type=unknown rows. Step 3 has AI classify them. This is the step most likely to go wrong — every low-confidence row must be sample-checked by hand.
Prompt template:
Below are all rows from unified.csv with type=unknown (N total).
For each row, classify the actual type into one of these 10 categories:
- buy / sell / convert / transfer_in / transfer_out
- staking_reward / airdrop / mining_reward / interest
- fee_only
Signals you can use:
- Keywords in raw_note ("Distribution" / "Stake Earn" / "Launchpool" / etc.)
- Whether both sides involve the same asset (same asset, different account = transfer)
- Whether the amount is tiny (< $0.50 may be dust or a fee)
- Whether the source is a wallet (an on-chain tx might be swap / mint / claim)
For each row, output:
- tx_id
- recommended type
- confidence (high / medium / low)
- your reasoning
At the end, list every row with confidence=low separately — I will review those by hand.
Do NOT force-fit a classification. If you are unsure, write "low" and state exactly what information is missing.
"Force-fitting" is the model's default behavior. The explicit instruction to mark uncertain rows as low is the only thing that prevents it from inventing a plausible-sounding category. On the real samples I have run, the low-confidence bucket is usually 5–15% of unknowns. Every single one of those rows needs human review.
5. Step 4: Compute Cost Basis #
With types cleaned up, run the cost-basis calculation. This step needs historical pricing — many rows do not have a USD price attached, and you cannot trust the AI's "memory" of historical prices. Pull from CoinGecko's history API instead.
Prompt template (Code Interpreter required):
The attached unified.csv now has types classified.
Please:
1. For each buy / sell / convert / staking_reward / airdrop, fill in usd_price_at_time.
- If the original record already has a USD amount, convert directly.
- Otherwise, use the CoinGecko historical price API (you may fetch over the network — note the number of API calls and any rate limits).
- If no public price exists for that asset on that date, mark "N/A". Do NOT fabricate a price.
2. Compute cost basis for each sale using FIFO:
- For each asset, maintain a queue of "unsold lots" in first-in-first-out order.
- For each sale, draw down from the front of the queue and compute cost_basis accordingly.
- For staking_reward / airdrop / mining_reward, cost_basis = USD value at the time of receipt.
3. Run the same calculation a second time using LIFO, for comparison.
Output three files:
- realized_pnl_fifo.csv — realized gain/loss per sale under FIFO
- realized_pnl_lifo.csv — same, under LIFO
- summary.md — total taxable amount comparison FIFO vs LIFO, plus a data-completeness report
At the end, tell me explicitly: which assets or time ranges have obvious data gaps or low-confidence pricing.
In most jurisdictions, FIFO is the default or the only allowed method. The LIFO comparison is informational — if the spread is large (more than 30%), it means your lot structure matters a lot, and that is a strong reason to have a real tax professional handle the final filing rather than ship a spreadsheet. Important: the US allows specific-identification (which can be more advantageous than either FIFO or LIFO) under conditions described on IRS Form 8949. This is an example, not a recommendation. Talk to a US CPA.
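The FIFO draw-down the prompt describes reduces to a small queue exercise. Here is a toy version — single asset, no fees, illustrative only — that is handy for sanity-checking the numbers the model produces:

```python
from collections import deque

def fifo_realized(events) -> float:
    """events: list of ("buy", qty, unit_cost) or ("sell", qty, unit_price).
    Returns total realized gain/loss under FIFO for a single asset."""
    lots = deque()          # unsold lots, oldest first: [qty, unit_cost]
    realized = 0.0
    for kind, qty, price in events:
        if kind == "buy":
            lots.append([qty, price])
        else:  # sell: consume lots from the front of the queue
            remaining = qty
            while remaining > 1e-12:
                lot = lots[0]
                take = min(lot[0], remaining)
                realized += take * (price - lot[1])   # proceeds − cost basis
                lot[0] -= take
                remaining -= take
                if lot[0] <= 1e-12:
                    lots.popleft()
    return realized
```

For example, buys of 1 unit at $100 and 1 unit at $200 followed by a sale of 1.5 units at $300 realize 1 × $200 + 0.5 × $100 = $250: the sale is drawn first from the oldest lot.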
6. Step 5: Produce the Working Paper #
Final step: package everything into a single working paper you can hand to your tax professional. Either ChatGPT or Claude works here.
Prompt template:
Based on unified.csv and realized_pnl_fifo.csv, generate a tax-year working paper in Markdown:
# [Year] Crypto Tax Working Paper
## 1. Data Sources and Date Range
- List of sources + record count per source
- Date range covered
- Known data gaps (which sources were not exported and why)
## 2. Asset Holdings Movement
- Opening balance per asset
- Closing balance per asset
- Max and min holdings during the year
## 3. Realized Gains/Losses (FIFO)
- Realized P&L table, grouped by asset
- Cumulative realized P&L by month
## 4. Income-Type Items
- Total staking rewards
- Total airdrops received
- All income items denominated in USD
## 5. Data Confidence and Known Issues
- Number of low-confidence classifications (count of rows with confidence=low)
- Number of records missing a USD price
- Status of internal-transfer detection
## 6. Notes for the Tax Professional
- Brief description of the AI workflow used
- Which fields were inferred by AI (e.g., transaction types)
- Sections the tax professional should re-verify first
Conclude with the explicit statement: "This document was prepared by the user with the help of AI tools. The final filed numbers must be reviewed by a licensed tax professional."
This working paper is what your tax professional actually wants. They will not re-do the reconciliation work (you just did it); they will verify, adapt to local law, and prepare the filing. Reconciliation usually consumes around 70% of a crypto tax engagement — that is the part AI just took off the table.
7. Six Pitfalls That Burn People #
Pitfall 1: Cross-exchange transfers booked as "sell + buy." Moving BTC from Binance to MetaMask is not a taxable event, but the two CSVs make it look like two trades. Step 3's type=transfer classification must be sample-checked by hand.
Pitfall 2: DeFi swap internals. A single Uniswap USDC → ETH swap can show up on Etherscan as 4–5 internal transactions. AI will happily treat each internal tx as a separate event when it should be one swap. The prompt must explicitly say "collapse all internal transactions sharing the same tx_hash into a single record."
Pitfall 3: Airdrop pricing at receipt. Many airdrops hit your wallet before a public market exists (the first 24 hours often have no listing), and CoinGecko's price is lagging or zero. In that case, mark the row "FMV unavailable" and leave the call to your tax professional — do not let AI invent a number.
Pitfall 4: Auto-compounding staking. Daily auto-compounded staking rewards can produce hundreds of tiny 0.0001 ETH entries in a year. AI will dutifully line-item every one of them — correct but tedious. Ask it to aggregate by day instead: "all stake rewards on YYYY-MM-DD combined into one entry per asset."
Pitfall 5: Pasting KYC PII into the AI. Binance tax reports sometimes contain your email, UID, and IP address. Redact before sending. In OpenAI Settings, disable "Improve the model for everyone." API and Enterprise traffic is not used for training by default — for large accounts, prefer the API over web chat.
Pitfall 6: Trusting AI-generated prices. If the model tells you "ETH on 2024-03-15 was $3,825," that is an approximation from training data — not the actual close. Every price must come from CoinGecko / CryptoCompare historical APIs, fetched explicitly (in code-interpreter mode) or scraped by you and pasted in. AI hallucinates tax numbers. Verify everything.
8. FAQ #
Q1: Can AI file my taxes for me?
No. AI is useful for data reconciliation and preliminary cost-basis math. Final taxable determinations, deductions, and loss carryforwards must be reviewed by a licensed tax professional in your jurisdiction. The output of this workflow is a reconciled working paper, not a return.
Q2: Is it safe to use ChatGPT to organize tax data?
If raw CSVs contain KYC-linked information (email, address, IP), redact first. In OpenAI Settings you can disable "Improve the model for everyone." For large accounts, use API access with a self-hosted toolchain rather than a web chat.
Q3: Is FIFO or LIFO better?
It depends on your jurisdiction's law and your lot structure. Many jurisdictions mandate FIFO; a few allow LIFO or specific-lot identification (US Form 8949 permits specific-ID under conditions — this is an example, not advice). AI can compute several methods in parallel for comparison, but which one you can legally use is a question for a licensed tax professional.
Q4: Versus Koinly / CoinTracker / TokenTax, is DIY-with-AI worth it?
It depends on volume. Under 200 transactions: DIY-with-AI is cheaper (one ChatGPT Plus session is plenty). 200–2,000 transactions: use AI for the front-end cleanup, then import into one of those tools for the final tax math. Over 2,000 transactions: pay for a professional tool, and use AI only for ad hoc lookups.
Q5: How do loss carryforwards from prior years work?
This is where AI is most likely to make a quiet mistake — it has no idea what your unused losses from last year are. Paste last year's working paper in alongside this year's: "Attachment A is 2024 realized P&L, Attachment B is 2025 — compute the carryforward of unused 2024 losses into 2025." Then have your tax professional verify the carryforward figure, because the rules around what carries and how vary by jurisdiction.
rel="sponsored").
Registering through these links may pay us a commission, at no extra cost to you.
This article is not tax, legal, or investment advice.
Crypto tax law varies enormously by jurisdiction and changes frequently. Your final filing must be handled by a licensed tax professional in your jurisdiction.
PromptDeck · 2026-05-15