Solana Analytics Starter Guide (Part 1): Solana Instructions
Understanding instructions on Solana, table schemas, and walking through DEX basics with Whirlpool AMM.
Want to join an onchain data community? Iβve created an NFT passport and quest system that any analyst can participate in.
π Learn about the Bytexplorers, then mint your passport and join in
This is the first of a four part series covering basic Solana analysis.
π§ Coming from an Ethereum background? Start with this guide translating EVM concepts to their Solana counterparts.
π More of a video learner? Check out this one hour video guide.
By the end of the four guides you should be able to understand the core concepts of any of the covered queries. My DMs on Twitter are always open for questions or ideas.
Basic SQL knowledge will be helpful. If youβre unfamiliar, go through this tutorial.
Table of Contents:
Whirlpool DEX basics and initialization
Breaking down a single swap and working with inner instructions
The goal will be to enable you to calculate βTotal Volumeβ of the SOL-USDC 0.05% fee pool (highlighted below) for direct swaps.
Solana Tables (Raw and Decoded)
Solana produces 100+ GB of a data each day, compared to 2.5 GB on Ethereum mainnet. So if youβre trying to do some historical or aggregated analysis over even a few weeks, your queries will time out on the raw tables (which is why we have decoded tables now to make it much faster).
You can find all the raw/decoded tables described in the docs - weβre going to focus on solana.transactions and solana decoded tables schemas.
solana.transactions
A transaction in Solana can have multiple instructions. Instructions are how you pass data and invoke a function on a program. In the EVM you can only call one contract per transaction, which then cascades into traces of internal contract calls. So you can think of Solana as making multiple top level contract calls (placed into an array).
The βinstructionsβ columns contains the following values per instruction, in an array:
βexecuting_accountβ: this is the program id that was invoked/called - note that addresses in Solana use base58, which can be converted to bytearray using
fromBase58()
)βaccount_argumentsβ: these are all of the accounts that are needed for interactions throughout the instructions. They can be a little weird to wrap your head around at first since they sit in an array without labels - but weβve decoded these for you.
βdataβ: This defines the function being called and parameters to pass in. In the transactions table this is base58, but in decoded tables weβll have transformed it to bytearray for you and decoded the parameters.
βinner_instructionsβ: these are all the instructions that are then invoked by the program. Program to program calls are known as Cross Program Invocations (CPIs). You can kind of think of these like traces in EVM.
Inner instructions follow the same schema, but they donβt have any inner instructions themselves - if you call a program than then calls another program, it just shows up sequentially in inner instructions.
You can reference these instruction columns in SQL by using the following syntax: instructions[1].inner_instructions[2].data.
Weβve also unnested these into the table solana.instruction_calls.
Youβll also need to know about βlog_messagesβ column - before you get excited, this is just an array of console log texts. These can be hard to work with, weβll cover them in part 4 of this series.
The decoded tables build directly off of solana.instruction_calls by creating new columns for every function input in βdataβ and every account in βaccount_argumentsβ with the schema <namespace>_solana.<programName>_call_<instructionName>. Some things to note:
Base columns are inherited with the βcall_β column prefix (stuff like tx_id, tx_index, block_time, block_slot).
account_arguments is exploded into columns, such that each known account gets a βaccount_β prefix.
Inner and Outer instructions are both included, you can use the βcall_is_innerβ column to filter if you would like to. Indexes and executing accounts are included for both, βdataβ and all other arguments apply to just the referenced level.
For most of your work, you should be using decoded tables. Otherwise, you should use the solana.instruction_calls table. It is pretty rare to need solana.transactions or solana.account_activity because those two tables are not performant to query.
You can request decoding for programs with public IDLs by submitting this form.
All together, the table lineage in dune after solana.transactions looks like this:
Donβt worry, weβll go through many more examples of queries using all these tables in the following parts of this guide.
Whirlpool Initialization
Whirlpool is the newest AMM from Orca (launched in March 2022) that uses concentrated liquidity - itβs basically Uniswap v3 but with yield farming mechanics built in.
When a new pair is deployed, it gets its own program id (just like when a new contract is deployed on Ethereum from a factory). Weβre going to use just the USDC-SOL whirlpool for all the guides, so letβs query for itβs initialization first:
SELECT * FROM whirlpool_solana.whirlpool_call_initializePool
WHERE account_whirlpool = '7qbRF6YsyGuLUVs6Y1q64bdVrfe4ZcUUz1JRdoVNUJnm'
Youβll see I reference an βaccount_whirlpoolβ column - thatβs actually decoded from account_argument index 5 in the initialization transaction.
You can see a Token Mint A and Token Mint B as the token addresses referenced in account arguments - these are things that youβll be used to seeing in βdataβ from EVM land.
The βouter instructionβ here is the initialization call, which kicks off many inner instructions. Inner instruction #1.1 creates the whirlpool pool account. Instructions #1.2 and #1.3 create the account for holding token A and then assign the token being held (SOL in this case). Same thing for #1.4 and #1.5, but for token B which is USDC. Youβll learn more about token accounts in part 2, donβt worry too much about this now.
You can find the query for the USDC-SOL whirlpool here. Youβll see I join it on tokens_solana.fungible to get token symbol and decimal, getting us the result that this pool is:
Token A is 'solβ, specifically wrapped sol with 9 decimals.
Token B is βusdcβ with 6 decimals.
The fee tier is 500, which converts to 0.05% after dividing by 1000000.
Breaking down a swap
For an easier time working with DEX data, just query dex_solana.trades instead. The breakdown below is for learning purposes.
Now letβs query for swaps through this pool (we donβt filter for success because decoded tables only contain successful instructions).
SELECT * FROM whirlpool_solana.whirlpool_call_swap sp
WHERE sp.account_whirlpool = '7qbRF6YsyGuLUVs6Y1q64bdVrfe4ZcUUz1JRdoVNUJnm'
limit 100
Where we can see some key columns:
Notably, youβll see there are βamountβ, βotherAmountThresholdβ, and an βaToBβ column. We could construct swaps using these function inputs - but these are the estimated/intended swap values and not the actual amounts because itβs pre-execution.
To get the actual swap amounts - we need to join them from the spl_token transfer table or decode them from βinner_instructionsβ. Letβs look at this example swap:
If you look at enough swaps, youβll notice that the token sold is always transferred in the first inner instruction (3.1 above), then token bought is second (3.2). We can then figure out the token symbol based on if βaToBβ is true or false (if it is true, then a is being sold and b is being bought).
Joining spl_token transfers
The βToken Programβ is really the βspl_token programβ, where the βtransferβ instruction is being called. This maps to the decoded table spl_token_solana.spl_token_call_transfer.
We can join twice (once for the transfer out and once for transfer in):
SELECT
tr_2.amount as token_bought_amount
, tr_1.amount as token_sold_amount
, sp.aToB
, sp.call_tx_id
FROM whirlpool_solana.whirlpool_call_swap sp
LEFT JOIN spl_token_solana.spl_token_call_transfer tr_1
ON tr_1.call_tx_id = sp.call_tx_id
AND tr_1.call_outer_instruction_index = sp.call_outer_instruction_index
AND (sp.call_is_inner = false AND tr_1.call_inner_instruction_index = 1)
LEFT JOIN spl_token_solana.spl_token_call_transfer tr_2
ON tr_2.call_tx_id = sp.call_tx_id
AND tr_2.call_outer_instruction_index = sp.call_outer_instruction_index
AND (sp.call_is_inner = false AND tr_2.call_inner_instruction_index = 2)
WHERE sp.call_tx_id = '44kmeC1edSfp21K5kKNVViJvLHG8XQqqu3KbHsrYcYZGmopWwBgP48c9u1DRBMGtQcbvyxd2TT8syY7ZvwpHqkhF'
The logic is that we want to get the first and second inner instructions when we are looking at an outer instruction call (like our transaction example). If βcall_is_innerβ was True, where something like a user calls Jupiter which then calls Whirlpool happens, then we would need to get the first and second inner instructions after the βcall_inner_instruction_indexβ (weβll go over this last).
You can find this query here. You can see that the values are correct, but just are missing the right decimals.
Decoding instructions data
this section is difficult and optional - if the program you are analyzing is decoded, you should not need to do this.
Sometimes, youβll need to decode instructions data manually for performance reasons. Letβs walk through that swap now. The βdataβ in inner instructions is still in base58 form - we can get it hex using fromBase58(). Letβs look at the actual transfer data using another explorer, Lumina.fyi that retains the bytearray instruction data in UI.
The first byte (0x03) is a βdiscriminatorβ which is basically like a function signature. For non-IDL programs, instructions are in an ordered enum - you can see that Transfer() is the fourth enum value (starting from 0x00) and takes in one argument βamountβ that is u64 - or 8 bytes.
So, weβre going to take the substring of the bytearray βdataβ from the 2nd byte for the next 8 bytes, and then convert that to an integer. This gets slightly technical, but the numerical data is always stored in little endian (EVM uses big endian) which means we need to first do a bytearray_reverse
before using bytearray_to_bigint
.
In practice, that means we take 0x0094357700000000
and flip it to become 0x0000000077534900
and then change it to an integer value of 2000000000
You can use this web tool to check numeric conversion manually.
We decode the first βinner_instructionβ for βtoken_sold_amountβ, and then the second for βtoken_bought_amountβ:
SELECT
bytearray_to_bigint(
bytearray_reverse(
bytearray_substring(
frombase58(sp.call_inner_instructions[1].data)
, 2, 8)
)) as token_sold_amount
, bytearray_to_bigint(
bytearray_reverse(
bytearray_substring(
frombase58(sp.call_inner_instructions[2].data)
, 2, 8)
)) as token_bought_amount
, aToB
, call_tx_id
FROM whirlpool_solana.whirlpool_call_swap sp
WHERE sp.account_whirlpool = '7qbRF6YsyGuLUVs6Y1q64bdVrfe4ZcUUz1JRdoVNUJnm'
AND call_tx_id = '44kmeC1edSfp21K5kKNVViJvLHG8XQqqu3KbHsrYcYZGmopWwBgP48c9u1DRBMGtQcbvyxd2TT8syY7ZvwpHqkhF'
Again, the values are correct but the decimals arenβt included yet. If we join the whirlpool initialization table to the swaps table to get symbols and decimals, and add some CASE WHEN logic for matching tokens to amounts, we can get this!
Congrats - you can now query and play around with direct Whirlpool swaps!
Try and calculate weekly volume and total volume of the USDC-SOL pool now. You should have everything you need.
Next Time:
I hope youβve found this beginners guide helpful, please let me know if you have any questions or something was unclear. Next time, youβll learn how to:
How token (non-native), system (native), and anchor (IDL) programs work
Common associated account, ownership, and PDA patterns
Get latest balances of liquidity pools (TVL)
Go learn all of this in part 2:
Some relevant dashboards for you to check out are: