Solana Data RPC Guide - Blocks, Tokens, Transfers, and More
Pat Doyle takes us through how to query Solana RPCs, with code comparisons to EVM RPCs.
This article was written by the amazing Pat Doyle - go follow him on Twitter!
If you want to be a guest author on Crypto Data Bytes, you can dm me.
In this article we will go through analyzing Solana data using only RPC methods. We will also have the corresponding analysis for Ethereum using only RPC methods. The goal here is to give data engineers and analysts coming from the Ethereum world a better understanding of how Solana data is structured and can be used in analysis.
The audience should have some technical experience and familiarity with analyzing Ethereum logs and traces.
Requirements:
Ability to run Python/Jupyter Notebooks
Ethereum and Solana node endpoints (I am using Alchemy; the free credits should be plenty to run the examples)
We will walk through 3 examples:
Getting the Latest Block
Analyzing Compute Units / Gas Fees
Analyzing Token Transfers
I have included additional resources at the bottom of the article if you are brand new to Solana and want to dig deeper.
👨💻 The code for all the examples can be found here.💡
Feel free to open an issue or PR if you have suggestions.
Get Current Block - Starting Simple
Why do we want this?
Let’s start with a really simple example. Getting the current block in a blockchain is always helpful. Why do I need this? Well, imagine I have a data pipeline that runs on a schedule every hour. To avoid reprocessing data, each run should only cover the range from the block after the last one we processed up to the most recent block. This lets our data consumption move faster and keeps our RPC request count lower.
Ethereum eth_blockNumber:
Solana getSlot:
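Both calls use the same JSON-RPC envelope, so a minimal Python sketch can cover them together. The helper names (`rpc_payload`, `post_rpc`, `get_latest`) and the endpoint URLs you pass in are illustrative assumptions, not the code from the linked repo. The one difference to notice is that Ethereum returns the block number as a hex string while Solana returns the slot as a plain integer.

```python
import json
import urllib.request


def rpc_payload(method, params=None):
    """Build a JSON-RPC 2.0 request body (same envelope on both chains)."""
    return {"jsonrpc": "2.0", "id": 1, "method": method, "params": params or []}


def post_rpc(url, payload):
    """POST a JSON-RPC payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def parse_eth_block_number(response):
    """eth_blockNumber returns a hex string such as '0x10'."""
    return int(response["result"], 16)


def parse_solana_slot(response):
    """getSlot returns the slot as a plain integer."""
    return response["result"]


def get_latest(eth_url, sol_url):
    """Fetch the latest Ethereum block number and Solana slot.

    eth_url and sol_url are your own node endpoints (hypothetical arguments).
    """
    eth_block = parse_eth_block_number(post_rpc(eth_url, rpc_payload("eth_blockNumber")))
    sol_slot = parse_solana_slot(post_rpc(sol_url, rpc_payload("getSlot")))
    return eth_block, sol_slot
```

Calling `get_latest(ETH_URL, SOL_URL)` with your own endpoints would return both values as integers, ready to be stored as the pipeline's high-water mark.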
Cool. That was easy. Not much to it. Let's keep doing more analysis at the block level.
Get Block Gas Fees & Compute Units
Why do we want this?
I want to understand how network demand is changing over time. Understanding the change in gas fees is a good way to get a view into this trend.
Ethereum uses the concept of gas while Solana uses compute units. They are slightly different but can be used to measure similar trends.
Ethereum
The ETH block header gives us the values we need directly, so parsing the data is pretty straightforward. We will use the eth_getBlockByNumber method and pass in the block number of interest in the params. For each block we will compute gas_used / gas_limit and plot those values; at the time of writing the gas limit is 30,000,000.
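As a rough sketch of that calculation, the snippet below builds the eth_getBlockByNumber request and computes the utilization ratio from a returned block. The function names are my own; `gasUsed` and `gasLimit` are the header fields in the RPC response, and both arrive as hex strings.

```python
def block_by_number_payload(block_number):
    """Request a block header; False means 'transaction hashes only'."""
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "eth_getBlockByNumber",
        "params": [hex(block_number), False],
    }


def gas_utilization(block):
    """Return gasUsed / gasLimit for a block returned by eth_getBlockByNumber."""
    gas_used = int(block["gasUsed"], 16)    # hex string -> int
    gas_limit = int(block["gasLimit"], 16)  # hex string -> int
    return gas_used / gas_limit
```

Reading the limit from each header, rather than hard-coding 30,000,000, keeps the ratio correct if the limit changes between blocks.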
Solana
On Solana, the compute units need to be aggregated across every transaction within the block. To do this we will use the getBlock method and set its transactionDetails parameter to "full", which returns all the transaction data within the block itself. Each transaction's metadata contains the compute units it consumed (the computeUnitsConsumed field). Now we just iterate over each block, sum computeUnitsConsumed across its transactions and group the values by block. To calculate the % of compute units used we divide our total used CUs by the max CUs per block of 48,000,000.
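Here is a minimal sketch of that aggregation. It assumes the block was requested through getBlock with `transactionDetails` set to `"full"` (the parameter takes a detail level as a string) and that each transaction's `meta` carries `computeUnitsConsumed`, which is the field name in the RPC response. The function names and the `MAX_BLOCK_COMPUTE_UNITS` constant are illustrative; the 48,000,000 value is the one quoted above.

```python
MAX_BLOCK_COMPUTE_UNITS = 48_000_000  # per-block cap quoted in this article


def get_block_payload(slot):
    """Request a block together with full transaction details."""
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "getBlock",
        "params": [
            slot,
            {
                "encoding": "jsonParsed",
                "transactionDetails": "full",
                "rewards": False,
                # Needed so blocks containing versioned transactions are returned.
                "maxSupportedTransactionVersion": 0,
            },
        ],
    }


def block_compute_units(block):
    """Sum computeUnitsConsumed over every transaction in a getBlock result."""
    total = 0
    for tx in block.get("transactions", []):
        meta = tx.get("meta") or {}  # meta can be null
        total += meta.get("computeUnitsConsumed") or 0
    return total


def compute_utilization(block, max_units=MAX_BLOCK_COMPUTE_UNITS):
    """Share of the per-block compute budget that was used."""
    return block_compute_units(block) / max_units
```

Unlike the Ethereum case, the denominator here is a constant supplied by us rather than a value read from the block, so it needs updating if the network's cap changes.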
Get Tokens Transferred in a Block
Why do we want this?
Token transfers are at the core of every blockchain researcher’s tool belt. They give us insight into where assets are moving, how balances change over time, token issuance and much more.
Ethereum
Ethereum provides nice and straightforward methods to retrieve logs. Thankfully the ERC standards process has given us a set of standard contract events that allow us to simply and easily fetch this data. For this example we will get USDC transfers that occur within a 10 block range and plot them. First we begin by creating a function that will get the logs emitted by a contract. We use the eth_getLogs method and include these parameters in our payload: the Transfer event signature (as a topic hash), the beginning block, the end block and the contract address. Then we iterate through a block range, aggregate our data and plot it.
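The request and aggregation can be sketched as follows. The topic is the keccak hash of `Transfer(address,address,uint256)`, and the address is the USDC contract on Ethereum mainnet, which uses 6 decimals. The function names are illustrative, and plotting is left out. The transferred value is the only non-indexed argument of the event, so it is the whole `data` field of each log.

```python
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"
USDC_ADDRESS = "0xA0b86991c6218b36c1d19D4a2e9Eb0cE3606eB48"
USDC_DECIMALS = 6


def get_logs_payload(from_block, to_block, address=USDC_ADDRESS, topic=TRANSFER_TOPIC):
    """Request all logs with the given first topic emitted by one contract."""
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "eth_getLogs",
        "params": [
            {
                "fromBlock": hex(from_block),
                "toBlock": hex(to_block),
                "address": address,
                "topics": [topic],
            }
        ],
    }


def transferred_per_block(logs, decimals=USDC_DECIMALS):
    """Sum the Transfer amounts in an eth_getLogs result, keyed by block number."""
    totals = {}
    for log in logs:
        block = int(log["blockNumber"], 16)
        amount = int(log["data"], 16) / 10**decimals  # raw units -> tokens
        totals[block] = totals.get(block, 0) + amount
    return totals
```

Because the filter already pins both the contract and the event, no further cleaning is needed before grouping, which is the main contrast with the Solana approaches that follow.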
Solana
There are two different approaches you can take to pull the amount transferred per block from Solana. Each approach is slightly different and will yield different results for a few reasons. It is important to understand what you are analyzing and how these different approaches can be used. One key difference from the EVM is that all Solana tokens share the same token program (the SPL Token Program). So every token transfer instruction you see will come from the SPL Token Program. Compare this to the EVM, where all the event data comes from the specific token contract itself.
Approach #1: Parsing Transfer with Instructions
Let’s walk through getting the transfers of USDC per block first. Buckle up, this isn’t as straightforward as ETH but still doable. First, we request all transactions in a range of blocks. The function below will return all the transactions and transaction details in a block.
Once we get all the transactions we have to do quite a bit of data filtering and cleaning. A single transaction can have many instructions. Each instruction can then have inner instructions. Transfer instructions can appear as both outer and inner instructions. There are also different types of instructions that indicate tokens moving: transfer, transferChecked, mintTo, mintToChecked, and burn. We need to make sure we parse all of these instruction types.
For all the transactions we will need to filter for instructions from the SPL Token Program where one of the types above is present.
Ok, great - progress. But there is one big problem. For standard `transfer` & `mintTo` types we actually do not get any information about the token. We still need the token mint address and the decimal precision.
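The filtering step can be sketched like this, assuming the block was fetched with `jsonParsed` encoding so each SPL Token instruction has a `parsed` object with a `type` and an `info` dictionary. The function names are illustrative, and the set of types mirrors the list above. The program ID is the SPL Token Program's address.

```python
SPL_TOKEN_PROGRAM = "TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA"
TOKEN_MOVEMENT_TYPES = {"transfer", "transferChecked", "mintTo", "mintToChecked", "burn"}


def iter_instructions(tx):
    """Yield every outer instruction, then every inner instruction, of a transaction."""
    yield from tx["transaction"]["message"]["instructions"]
    meta = tx.get("meta") or {}
    for inner in meta.get("innerInstructions") or []:
        yield from inner["instructions"]


def token_movements(tx):
    """Return one row per SPL Token instruction that moves, mints or burns tokens."""
    rows = []
    for ix in iter_instructions(tx):
        parsed = ix.get("parsed")
        # Instructions the node could not parse have no 'parsed' dict.
        if ix.get("programId") != SPL_TOKEN_PROGRAM or not isinstance(parsed, dict):
            continue
        if parsed.get("type") in TOKEN_MOVEMENT_TYPES:
            rows.append({"type": parsed["type"], **parsed.get("info", {})})
    return rows
```

Each returned row carries whatever the instruction's `info` holds, so a plain `transfer` row has source, destination and amount but no mint, while a `transferChecked` row also includes the mint.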
Below is an example of a transfer of `wSOL`
It tells us the sender, recipient and the amount transferred, but we can see the token address is missing. Without this information our analysis will be incomplete. The reason for this is a detail of how the SPL Token Program and PDAs work. Remember, your wallet does not hold token balances directly: each token you hold lives in a separate token account (typically a Program Derived Account) that actually holds the balance and maps back to your EOA wallet, and a plain transfer only references those token accounts. If this is a bit confusing it might be worth revisiting how tokens work on Solana. Andrew Hong has another great write up on tokens here.
To get the information about the token what we actually can do is make an additional RPC call for each Source / Destination address in the transfer. We use the getAccountInfo method and call this function for each source/destination address in a transfer.
This function will return the owner (the EOA), the mint address (token address), the token decimals and the amount held (we don’t need the amount but I thought it nice to show). I think of it as similar to calling the balanceOf function on an ERC20 contract. Below is an example response from the getAccountInfo method.
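A minimal sketch of the lookup and of pulling those fields out of the response is shown here. It assumes `jsonParsed` encoding, where a token account's data comes back as `parsed.info` with `owner`, `mint` and `tokenAmount`. The function names are illustrative.

```python
def account_info_payload(address):
    """Request an account with jsonParsed encoding so token accounts are decoded."""
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "getAccountInfo",
        "params": [address, {"encoding": "jsonParsed"}],
    }


def parse_token_account(response):
    """Extract owner, mint, decimals and raw amount from a getAccountInfo response.

    Returns None when the account does not exist or is not a parsed token account.
    """
    value = response["result"]["value"]
    if value is None:
        return None
    data = value["data"]
    # Unparsed accounts come back as a [base64, encoding] list instead of a dict.
    if not isinstance(data, dict) or "parsed" not in data:
        return None
    parsed = data["parsed"]
    if parsed.get("type") != "account":
        return None
    info = parsed["info"]
    token_amount = info["tokenAmount"]
    return {
        "owner": info["owner"],
        "mint": info["mint"],
        "decimals": token_amount["decimals"],
        "amount": int(token_amount["amount"]),  # raw units, before decimals
    }
```

Since the same token accounts recur across many transfers, caching the result per address is a simple way to cut down the number of extra RPC calls.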
Finally, we arrive at a data frame with all the necessary information we need to start analysis. We can filter our dataframe for just USDC Transfers, sum the amount, group by block number and plot.
With this data we can start to do all sorts of token analysis. Getting to this point is not a linear path and requires multiple RPC calls. Even so, of the two approaches this is the better one for getting transfers, because it reads the transfer instructions themselves.
Approach #2: Parsing Transfers with Pre & Post Token Balance Changes
Each transaction includes details about the token balances of its accounts before and after the transaction. So a simple approach to calculate the total value that was transferred within a block is to calculate the balance change for each address and token involved in the transaction. We can then sum these up and group them. This approach tells us the balance changes, and from those we can back out the transferred amount. However, balance changes are not the same as transfers: a wallet can send and receive the same amount of an asset within one transaction, leaving a 0 balance change even though there are transfers in the instructions.
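The balance-change calculation can be sketched as below, using the `preTokenBalances` and `postTokenBalances` arrays in each transaction's `meta`. Entries are matched on `accountIndex`, and an account missing from one side is treated as holding 0 there. The function names are illustrative, and summing only the positive changes to estimate the amount moved is an assumption of this sketch rather than the only possible choice.

```python
def token_balance_changes(tx):
    """Return one row per token account with its balance before, after and the change."""
    meta = tx.get("meta") or {}
    pre = {b["accountIndex"]: b for b in meta.get("preTokenBalances") or []}
    post = {b["accountIndex"]: b for b in meta.get("postTokenBalances") or []}
    rows = []
    for index in sorted(pre.keys() | post.keys()):
        entry = post.get(index) or pre[index]
        scale = 10 ** entry["uiTokenAmount"]["decimals"]
        # Accounts created or closed in the transaction appear on one side only.
        before = int(pre[index]["uiTokenAmount"]["amount"]) if index in pre else 0
        after = int(post[index]["uiTokenAmount"]["amount"]) if index in post else 0
        rows.append({
            "owner": entry.get("owner"),
            "mint": entry["mint"],
            "pre": before / scale,
            "post": after / scale,
            "change": (after - before) / scale,
        })
    return rows


def amount_received(rows, mint):
    """Sum the positive balance changes for one mint (one way to estimate amount moved)."""
    return sum(r["change"] for r in rows if r["mint"] == mint and r["change"] > 0)
```

Only net changes are visible here, so an account that both receives and sends the same amount within the transaction contributes nothing to the total.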
After we parse the balance changes we get an output data frame that looks like this:
We see the Owner, the amount of the token before and after the transaction, the change and the token address.
From here we can start to filter for our token of interest. In this case it will be USDC and then we will plot it.
Remember, this is the path of least resistance. However, what we are actually tracking is the net amount of an asset that changed wallets in a transaction, not each individual transfer. This transaction is a good example of why this technique is not 100% accurate for tracking transfers.
How do our 2 approaches compare?
Let’s put our two approaches side by side and you can see that the charts follow a similar trend.
We can see that in block 294291897, our transfer values do not line up. While the general trend remains consistent, the two approaches do not match 1 to 1. I wanted to show both techniques because I think it is important to explain that transfers & balance changes are not the same across the board.
What’s next?
I hope this was helpful as a baseline for getting some of the fundamental analysis going on Solana. In part 2, we will dive deeper into analyzing a specific protocol using just RPC methods and the program IDLs. If you have feedback, comments, questions please feel free to DM me on Twitter/X.
I am always open to feedback and ideas.
Additional Resources
A Guide to Solana for Ethereum Analysts by Andrew Hong is a great resource for learning terminology and concepts for Solana.
MEV Monitoring on Solana by Ghost
Solana Docs have a ton of great resources as well.