Solana Analytics Starter Guide (Part 1): Solana Instructions

Understanding instructions on Solana, table schemas, and walking through DEX basics with Whirlpool AMM.

Apr 12, 2023

Want to join an onchain data community? I’ve created an NFT passport and quest system that any analyst can participate in.
🔍 Learn about the Bytexplorers, then mint your passport and join in

This is the first of a four part series covering basic Solana analysis.

💧 Coming from an Ethereum background? Start with this guide translating EVM concepts to their Solana counterparts.

🔊 More of a video learner? Check out this one hour video guide.

By the end of the four guides you should be able to understand the core concepts of any of the covered queries. My DMs on Twitter are always open for questions or ideas.

Basic SQL knowledge will be helpful. If you’re unfamiliar, go through this tutorial.

Solana Tables (Raw and Decoded)

Solana produces 100+ GB of a data each day, compared to 2.5 GB on Ethereum mainnet. So if you’re trying to do some historical or aggregated analysis over even a few weeks, your queries will time out on the raw tables (which is why we have decoded tables now to make it much faster).

You can find all the raw/decoded tables described in the docs - we’re going to focus on solana.transactions and solana decoded tables schemas.

solana.transactions

A transaction in Solana can have multiple instructions. Instructions are how you pass data and invoke a function on a program. In the EVM you can only call one contract per transaction, which then cascades into traces of internal contract calls. So you can think of Solana as making multiple top level contract calls (placed into an array).

The “instructions” columns contains the following values per instruction, in an array:

“executing_account”: this is the program id that was invoked/called - note that addresses in Solana use base58, which can be converted to bytearray using fromBase58())
“account_arguments”: these are all of the accounts that are needed for interactions throughout the instructions. They can be a little weird to wrap your head around at first since they sit in an array without labels - but we’ve decoded these for you.
“data”: This defines the function being called and parameters to pass in. In the transactions table this is base58, but in decoded tables we’ll have transformed it to bytearray for you and decoded the parameters.
“inner_instructions”: these are all the instructions that are then invoked by the program. Program to program calls are known as Cross Program Invocations (CPIs). You can kind of think of these like traces in EVM.

Inner instructions follow the same schema, but they don’t have any inner instructions themselves - if you call a program than then calls another program, it just shows up sequentially in inner instructions.

You can reference these instruction columns in SQL by using the following syntax: instructions[1].inner_instructions[2].data.
We’ve also unnested these into the table solana.instruction_calls.

You’ll also need to know about “log_messages” column - before you get excited, this is just an array of console log texts. These can be hard to work with, we’ll cover them in part 4 of this series.

The decoded tables build directly off of solana.instruction_calls by creating new columns for every function input in “data” and every account in “account_arguments” with the schema <namespace>_solana.<programName>_call_<instructionName>. Some things to note:

Base columns are inherited with the “call_” column prefix (stuff like tx_id, tx_index, block_time, block_slot).
account_arguments is exploded into columns, such that each known account gets a “account_” prefix.
Inner and Outer instructions are both included, you can use the “call_is_inner” column to filter if you would like to. Indexes and executing accounts are included for both, “data” and all other arguments apply to just the referenced level.

For most of your work, you should be using decoded tables. Otherwise, you should use the solana.instruction_calls table. It is pretty rare to need solana.transactions or solana.account_activity because those two tables are not performant to query.

You can request decoding for programs with public IDLs by submitting this form.

All together, the table lineage in dune after solana.transactions looks like this:

Don’t worry, we’ll go through many more examples of queries using all these tables in the following parts of this guide.

Whirlpool Initialization

Whirlpool is the newest AMM from Orca (launched in March 2022) that uses concentrated liquidity - it’s basically Uniswap v3 but with yield farming mechanics built in.

When a new pair is deployed, it gets its own program id (just like when a new contract is deployed on Ethereum from a factory). We’re going to use just the USDC-SOL whirlpool for all the guides, so let’s query for it’s initialization first:

SELECT * FROM whirlpool_solana.whirlpool_call_initializePool
WHERE account_whirlpool = '7qbRF6YsyGuLUVs6Y1q64bdVrfe4ZcUUz1JRdoVNUJnm'

You’ll see I reference an “account_whirlpool” column - that’s actually decoded from account_argument index 5 in the initialization transaction.

solscan link, note that the data decoded accounts aren’t part of account_arguments.

You can see a Token Mint A and Token Mint B as the token addresses referenced in account arguments - these are things that you’ll be used to seeing in “data” from EVM land.

The “outer instruction” here is the initialization call, which kicks off many inner instructions. Inner instruction #1.1 creates the whirlpool pool account. Instructions #1.2 and #1.3 create the account for holding token A and then assign the token being held (SOL in this case). Same thing for #1.4 and #1.5, but for token B which is USDC. You’ll learn more about token accounts in part 2, don’t worry too much about this now.

You can find the query for the USDC-SOL whirlpool here. You’ll see I join it on tokens_solana.fungible to get token symbol and decimal, getting us the result that this pool is:

Token A is 'sol’, specifically wrapped sol with 9 decimals.
Token B is ‘usdc’ with 6 decimals.
The fee tier is 500, which converts to 0.05% after dividing by 1000000.
Subscribe to get the next parts of the guide, and other Web3 data tips and tricks!

Breaking down a swap

For an easier time working with DEX data, just query dex_solana.trades instead. The breakdown below is for learning purposes.

Now let’s query for swaps through this pool (we don’t filter for success because decoded tables only contain successful instructions).

SELECT * FROM whirlpool_solana.whirlpool_call_swap sp
WHERE sp.account_whirlpool = '7qbRF6YsyGuLUVs6Y1q64bdVrfe4ZcUUz1JRdoVNUJnm'
limit 100

Where we can see some key columns:

Notably, you’ll see there are “amount”, “otherAmountThreshold”, and an “aToB” column. We could construct swaps using these function inputs - but these are the estimated/intended swap values and not the actual amounts because it’s pre-execution.

To get the actual swap amounts - we need to join them from the spl_token transfer table or decode them from “inner_instructions”. Let’s look at this example swap:

If you look at the earlier instructions, you’ll see there we instructions to wrap the SOL before the swap ;)

If you look at enough swaps, you’ll notice that the token sold is always transferred in the first inner instruction (3.1 above), then token bought is second (3.2). We can then figure out the token symbol based on if “aToB” is true or false (if it is true, then a is being sold and b is being bought).

Joining spl_token transfers

The “Token Program” is really the “spl_token program”, where the “transfer” instruction is being called. This maps to the decoded table spl_token_solana.spl_token_call_transfer.

We can join twice (once for the transfer out and once for transfer in):

SELECT
    tr_2.amount as token_bought_amount
    , tr_1.amount as token_sold_amount
    , sp.aToB
    , sp.call_tx_id
FROM whirlpool_solana.whirlpool_call_swap sp
LEFT JOIN spl_token_solana.spl_token_call_transfer tr_1 
    ON tr_1.call_tx_id = sp.call_tx_id 
    AND tr_1.call_outer_instruction_index = sp.call_outer_instruction_index 
    AND (sp.call_is_inner = false AND tr_1.call_inner_instruction_index = 1) 
LEFT JOIN spl_token_solana.spl_token_call_transfer tr_2 
    ON tr_2.call_tx_id = sp.call_tx_id 
    AND tr_2.call_outer_instruction_index = sp.call_outer_instruction_index 
    AND (sp.call_is_inner = false AND tr_2.call_inner_instruction_index = 2)
WHERE sp.call_tx_id = '44kmeC1edSfp21K5kKNVViJvLHG8XQqqu3KbHsrYcYZGmopWwBgP48c9u1DRBMGtQcbvyxd2TT8syY7ZvwpHqkhF'

The logic is that we want to get the first and second inner instructions when we are looking at an outer instruction call (like our transaction example). If “call_is_inner” was True, where something like a user calls Jupiter which then calls Whirlpool happens, then we would need to get the first and second inner instructions after the “call_inner_instruction_index” (we’ll go over this last).

You can find this query here. You can see that the values are correct, but just are missing the right decimals.

Decoding instructions data

this section is difficult and optional - if the program you are analyzing is decoded, you should not need to do this.

Sometimes, you’ll need to decode instructions data manually for performance reasons. Let’s walk through that swap now. The “data” in inner instructions is still in base58 form - we can get it hex using fromBase58(). Let’s look at the actual transfer data using another explorer, Lumina.fyi that retains the bytearray instruction data in UI.

The first byte (0x03) is a “discriminator” which is basically like a function signature. For non-IDL programs, instructions are in an ordered enum - you can see that Transfer() is the fourth enum value (starting from 0x00) and takes in one argument “amount” that is u64 - or 8 bytes.

We’ll be getting a lot more comfortable with rust code in the next guide!

So, we’re going to take the substring of the bytearray “data” from the 2nd byte for the next 8 bytes, and then convert that to an integer. This gets slightly technical, but the numerical data is always stored in little endian (EVM uses big endian) which means we need to first do a bytearray_reverse before using bytearray_to_bigint.

In practice, that means we take 0x0094357700000000 and flip it to become 0x0000000077534900 and then change it to an integer value of 2000000000

You can use this web tool to check numeric conversion manually.

We decode the first “inner_instruction” for “token_sold_amount”, and then the second for “token_bought_amount”:

SELECT 
bytearray_to_bigint(
    bytearray_reverse(
        bytearray_substring(
            frombase58(sp.call_inner_instructions[1].data)
        , 2, 8)
    )) as token_sold_amount
, bytearray_to_bigint(
    bytearray_reverse(
        bytearray_substring(
            frombase58(sp.call_inner_instructions[2].data)
        , 2, 8)
    )) as token_bought_amount
, aToB
, call_tx_id
FROM whirlpool_solana.whirlpool_call_swap sp
WHERE sp.account_whirlpool = '7qbRF6YsyGuLUVs6Y1q64bdVrfe4ZcUUz1JRdoVNUJnm'
AND call_tx_id = '44kmeC1edSfp21K5kKNVViJvLHG8XQqqu3KbHsrYcYZGmopWwBgP48c9u1DRBMGtQcbvyxd2TT8syY7ZvwpHqkhF'

Again, the values are correct but the decimals aren’t included yet. If we join the whirlpool initialization table to the swaps table to get symbols and decimals, and add some CASE WHEN logic for matching tokens to amounts, we can get this!

Congrats - you can now query and play around with direct Whirlpool swaps!

Try and calculate weekly volume and total volume of the USDC-SOL pool now. You should have everything you need.

Next Time:

I hope you’ve found this beginners guide helpful, please let me know if you have any questions or something was unclear. Next time, you’ll learn how to: