Day 4 (12 Days of Dune): Identifying MEV bot/arbitrage transactions in Uniswap v2
Spot the bot - they all use a similar pattern!
if you missed the original course announcement, read that first)
We’ll be answering questions and covering more beginner-focused concepts during office hours at 2pm EST, so join us in the #12-days-of-dune discord channel. You’ll need to earn the course NFT and connect to the Dune guild.
Thanks for reading Web3 Data Degens! Subscribe for free to receive new posts and support my work.
Welcome to day 4!
You can find the full explanation below (premiering with a live watch party at 9 am EST today).
Here are the main concepts:
Arbitrage/bot/MEV transactions usually call two or more DEX pools to balance the prices (buy from low price, sell to high price). These calls have to be bundled within a single transaction so that they can capture the profit all at once. This means they can’t directly call the pair (or router). They have to create a contract that can call multiple other contracts within one function. This means that MEV transactions likely start with the MEV contract being called.
So to find this contract, we take all swap events and join on “ethereum.transactions” to get the “to” field of the transaction. This gives us the contract that was interacted with first, which we call “contract_interacted” in the query.
However, this isn’t good enough yet - so we also query how many addresses have called this “contract_interacted”. The lower the number, the more likely it is that it’s a MEV contract. If it’s a high number, it might be something like the Uniswap Router02.
We gather all of our variables to scatter contract_interacted across, to explore if a view produces enough separation. Using an algorithm like PCA would be the easiest, but we can’t do that in SQL right now.
Based on my methodology, there is not a good enough separation of contract types to be able to say “yeah, all contracts that have < 300 interactions, > 1k swaps, and > 1m USD traded is an MEV bot”.
Because of that, we will use the MEV labels dataset instead. We bring this off-chain labeled dataset into Dune, sourcing from Etherscan. labels (thanks hildobby). All swaps that went through a MEV labeled contract_interacted becomes categorized as a “mev” swap.
Altogether, we get the below chart as the answer. The query is on Dune here.
Remember, if you found this question hard make sure to come to our office hours (2 PM EST) in our discord! Details are at the top of the email.
Addendum: MEV Explained More In Depth
This was a popular topic, so I explained MEV from more beginner levels and provided some resources in the video below. We also extend our query to include Flashbots MEV data.
Tomorrow, we’ll be covering the question:
Can you identify the top sources of swap volume (i.e. contract versus EOA) by looking at various level of traces?
As always, feel free to reach out to me here or on Twitter with any questions you might have.