Increase the gas limit of a single function call

Background

NEAR has a gas limit on each function call action (contract call). As of today, the limit is 300 Tgas. The purpose of this proposal is to suggest an increase to this gas limit.

Motivation

Aurora is the EVM compatibility solution on NEAR. The project includes a smart contract on NEAR, called the Engine, which is a full EVM implementation. This contract allows submitting Ethereum transactions to NEAR. Naturally, calls to the Engine contract can be very computationally complex because they contain entire Ethereum transactions (which may involve multiple EVM smart contract executions). As a result, transactions that would pass on Ethereum can fail on Aurora due to the function call gas limit.

This issue has been known for a long time, and we have worked towards resolving it (see for example this post). While the Aurora team continues to work with the nearcore team to improve the efficiency of the Engine (how much EVM gas we can execute per NEAR Tgas), there are still important use cases that remain impossible on Aurora today due to hitting the gas limit. For example:

  • The number of distinct assets that users can borrow on DeFi platforms on Aurora is limited, because the gas required for liquidation transactions scales with the number of assets.
  • Flash loans are currently impossible on Aurora since the gas consumed by the borrowing and repayment logic at the start and end of the transaction does not leave enough for interesting usage of the borrowed tokens.
  • Some blockchain-based video games also have actions that cannot fit within the current gas limit.

Therefore, we hope to enable more use-cases on Aurora by increasing the amount of NEAR gas the Engine can burn in a single transaction.

Of course, this change would not be just for Aurora’s contract; any contract on NEAR could burn more gas in a single call. Are there other use-cases on NEAR itself that such a change could enable? If you know of any, please comment below.

Proposal

We propose increasing the function call gas limit to 500 Tgas. Similarly, the maximum allowed prepaid gas would need to be raised to 500 Tgas as well.
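
For concreteness, here is a minimal sketch of what the change amounts to, expressed as illustrative constants. These names are made up for this post; the actual nearcore parameter names and locations may differ.

```rust
// Illustrative sketch only; the actual nearcore parameter names/locations may differ.
type Gas = u64;
const TERA: Gas = 1_000_000_000_000; // 1 Tgas = 10^12 gas units

// Current protocol values, as described in this proposal:
const CURRENT_MAX_GAS_BURNT_PER_CALL: Gas = 300 * TERA;
const CURRENT_MAX_PREPAID_GAS: Gas = 300 * TERA;

// Proposed values:
const PROPOSED_MAX_GAS_BURNT_PER_CALL: Gas = 500 * TERA;
const PROPOSED_MAX_PREPAID_GAS: Gas = 500 * TERA;
```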

Justification

For use-cases that scale linearly, a 1.67x increase in the limit translates directly into being able to do 1.67x more (e.g. allowing users to borrow more asset types). Of course, that reasoning extends indefinitely and would imply making the limit as high as possible. However, there are other considerations besides Aurora (see the possible drawbacks below). The proposed value of 500 Tgas is meant to be a compromise between the gas-hungry use-cases on Aurora and the potential issues that increasing the limit too much may cause. That said, the specific value is open to discussion on this forum post, and we look forward to getting input from others in the community.

This increase is not sufficient for the most gas-hungry use-case we have benchmarked, which uses 750 Tgas, to fit within the limit. However, it does move that use-case within reach: only a 1.5x increase in Engine efficiency would be needed, instead of 2.5x.

Possible Drawbacks

Excess compute time in nearcore

Presently, NEAR nodes use a simple approach to determine which receipts fit in a chunk. The pending receipts are executed one by one until the total gas burnt exceeds the chunk gas limit. This means that, at worst, a node can do computational work in excess of the chunk gas limit by an amount equal to the single-receipt gas limit.

The reason this matters is that NEAR gas values are tuned to correlate with wall-clock time; 1 Tgas is approximately 1 ms. Therefore, a chunk gas limit of 1000 Tgas allows for NEAR’s target 1-second block time. However, if a node can exceed the chunk limit, then it may spend more than 1 second processing a chunk and thus slow down the block time. The more the single-receipt gas limit is increased, the worse this potential slowdown gets.
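
For illustration, here is a minimal sketch of the inclusion behavior described above, with the worst-case arithmetic spelled out in comments. This is not actual nearcore code; the types and numbers are made up purely to make the argument concrete.

```rust
// Minimal sketch of the inclusion behavior described above; NOT actual nearcore
// code. Gas amounts are written in Tgas, assuming 1 Tgas ≈ 1 ms.
struct Receipt {
    prepaid_gas: u64, // capped by the single-call limit (300 Tgas today, 500 Tgas proposed)
}

// Pretend executor: assume the receipt burns all of its prepaid gas.
fn execute(receipt: &Receipt) -> u64 {
    receipt.prepaid_gas
}

fn apply_chunk(receipts: &[Receipt], chunk_gas_limit: u64) -> u64 {
    let mut total_gas_burnt = 0;
    for receipt in receipts {
        if total_gas_burnt >= chunk_gas_limit {
            break; // remaining receipts are delayed to the next chunk
        }
        // Each receipt runs with its full prepaid gas, so the chunk can overshoot
        // the limit by up to one single-call limit:
        //   worst case ≈ chunk_gas_limit + single_call_limit
        //   today:      1000 + 300 = 1300 Tgas ≈ 1.3 s
        //   proposed:   1000 + 500 = 1500 Tgas ≈ 1.5 s
        total_gas_burnt += execute(receipt);
    }
    total_gas_burnt
}

fn main() {
    // The first receipt leaves the chunk just under its 1000 Tgas limit, and a
    // maximal 500 Tgas receipt still gets executed on top of it.
    let receipts = [Receipt { prepaid_gas: 999 }, Receipt { prepaid_gas: 500 }];
    assert_eq!(apply_chunk(&receipts, 1000), 1499); // ~1.5 s of work vs a ~1 s target
}
```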

One way to mitigate this drawback would be to change how receipts are selected for inclusion in a chunk. Instead of executing each receipt with its full prepaid gas, execute it with min(chunk_gas_limit - total_gas_burnt, prepaid_gas) (a minimal sketch of this rule is given at the end of this subsection). If a receipt runs out of gas because it was executed with a gas limit lower than its prepaid gas, delay that receipt to the next chunk, along with all remaining pending receipts. With this change, a node would never do more computational work on a single chunk than the chunk gas limit, thus preventing the potential slowdown.

However, this change has downsides. Some work would be “wasted”: a receipt that was partially executed and ran out of gas because it sat at the end of the chunk will need to be executed again from scratch in the following chunk. Additionally, the rules for how the gas price moves in times of congestion would need to change. Congestion would need to be defined in terms of how many delayed receipts there are, rather than how full the chunk is, because chunks will rarely be 100% full under this model. Indeed, in the worst case, where the last receipt of a chunk would have used the full 500 Tgas proposed here, chunks could appear to be only half full even when there is “congestion”.

Overall, it is unclear whether this is better or worse than allowing a slowdown of the block time. Discussion on this point is welcomed.
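
Here is the sketch of that alternative selection rule referred to above; again, these are made-up types, not actual nearcore code.

```rust
// Sketch of the proposed alternative selection rule; NOT actual nearcore code.
// Gas amounts are in Tgas.
struct Receipt {
    prepaid_gas: u64,
}

// Pretend executor with an explicit gas limit: returns the gas burnt if the
// receipt finishes, or None if it ran out of gas under the reduced limit.
fn execute_with_limit(receipt: &Receipt, gas_limit: u64) -> Option<u64> {
    if gas_limit >= receipt.prepaid_gas {
        Some(receipt.prepaid_gas)
    } else {
        None // partial work is wasted; the receipt must be re-run next chunk
    }
}

// Returns the total gas burnt plus the receipts delayed to the next chunk.
fn apply_chunk<'a>(receipts: &'a [Receipt], chunk_gas_limit: u64) -> (u64, &'a [Receipt]) {
    let mut total_gas_burnt = 0;
    for (i, receipt) in receipts.iter().enumerate() {
        // Never hand a receipt more gas than is left in the chunk.
        let gas_limit = (chunk_gas_limit - total_gas_burnt).min(receipt.prepaid_gas);
        match execute_with_limit(receipt, gas_limit) {
            Some(burnt) => total_gas_burnt += burnt,
            // Out of gas under the reduced limit: delay this receipt and all remaining ones.
            None => return (total_gas_burnt, &receipts[i..]),
        }
    }
    (total_gas_burnt, &[])
}

fn main() {
    let receipts = [Receipt { prepaid_gas: 999 }, Receipt { prepaid_gas: 500 }];
    let (burnt, delayed) = apply_chunk(&receipts, 1000);
    // The second receipt only gets 1 Tgas of room, runs out, and is delayed,
    // so the chunk never exceeds its 1000 Tgas limit.
    assert_eq!((burnt, delayed.len()), (999, 1));
}
```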

Reduced safety margin for gas estimations

As mentioned above, NEAR gas costs are meant to correlate with wall-clock time. However, due to the complexity of smart contracts, it is nearly impossible to guarantee this correlation. It is possible that some operation is under-costed relative to the amount of time it takes to execute. Such a misestimation could result in a slowdown of the block time, especially if exploited maliciously. One tool the NEAR team has to combat this risk is lowering the chunk gas limit in response to a misestimation. For example, suppose some operation takes 50% longer to execute than its gas cost would suggest, and an attacker spams the network with such transactions, increasing the block time from 1 second to 1.5 seconds. In this case, NEAR could lower the chunk gas limit to roughly 660 Tgas, so that each chunk could still execute within 1 second even though execution takes 50% longer than the gas cost implies.
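
To make that arithmetic explicit, here is a small illustrative calculation (the numbers are only meant as an example).

```rust
// Illustrative arithmetic for the example above: under normal conditions
// 1 Tgas ≈ 1 ms, so a 1000 Tgas chunk takes ≈ 1 s. If an under-costed
// operation makes execution 1.5x slower per Tgas, the chunk limit that still
// fits in the 1 s target is target_ms / ms_per_tgas.
fn safe_chunk_limit_tgas(target_ms: f64, ms_per_tgas: f64) -> f64 {
    target_ms / ms_per_tgas
}

fn main() {
    let limit = safe_chunk_limit_tgas(1000.0, 1.5);
    println!("{limit:.0} Tgas"); // ≈ 667 Tgas; the example above rounds down to 660 Tgas
}
```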

However, NEAR can never lower the chunk gas limit to be less than the gas limit of a single function call because that would break existing contracts. Therefore, increasing the gas limit of a single call reduces the amount of room NEAR has to move the chunk gas limit in response to gas cost misestimation exploits.

We believe the proposed value of 500 Tgas still leaves the NEAR team enough room to respond to issues. This is effectively a 2x safety margin, on top of the 3x safety multiplier that is applied to all gas costs in the first place. However, input from the nearcore runtime team is welcomed, as they do the gas cost estimation work.

Thanks for the nice writeup @birchmd ! I am still trying to wrap my head around the protocol, so the following might be fairly simple questions.

  • I suppose it is not possible to split a single Aurora transaction into multiple transactions, because in that case some of the sub-transactions could succeed and some fail, thereby breaking the atomicity guarantees.
  • If we were to only increase the gas limit and make no other change, then in the worst case a node could take approximately 1.5 s to process a block, effectively running 50% slower than intended. Do we have any statistics (e.g. a histogram) on how long function calls take right now? I appreciate that we should keep the worst case in mind when designing, but thinking out loud, I am also curious what kind of impact this will have on the average case.
  • Another protocol change I can imagine is that we do not increase the limit for all blocks but only for every Nth block. Then we would need a mechanism to delay the longer receipts until the Nth block. Could this be a feasible design? Would it even be helpful to your use case, or use cases in general? If receipts requiring the increased limit are fairly infrequent, then this design could work while not slowing down the worst case as much.

I wonder if it is possible to “checkpoint” the state of the contract in the middle of execution and continue executing it in the next chunk. I am not even sure whether something like that would be possible, but if it is, then we could avoid the wasted work.

Thanks for the comments @akhi3030 !

I suppose it is not possible to split a single Aurora transaction into multiple transactions

Indeed, splitting across multiple transactions is a problem for two reasons. One is atomicity (it is possible to keep the EVM transactions atomic by only staging changes until the full execution is complete, but this is complicated). The other is Aurora’s block time. We would like to keep the same 1 s block time as NEAR, but if an Aurora transaction is split over multiple NEAR transactions then it executes across multiple NEAR blocks, effectively multiplying the Aurora block time.

I am also curious what kind of impact we will be having on the average case

Yes, we always try to keep the worst case in mind because the blockchain environment is adversarial; anyone can submit any code and they may have economic incentives to cause harm to the network. So we must worry about what could happen if a malicious actor intentionally forced the worst case to happen. That said, the average case (as of today) is probably not that bad. I don’t have data about NEAR as a whole, but the average gas usage of transactions on Aurora is only 20 Tgas. And 90% of transactions on Aurora use 50 Tgas or less.

do not increase the limit for all blocks but only for every Nth block

This is an interesting idea! It certainly mitigates the worst-case slowdown, though the trade-off here is the added complexity of the protocol. Complexity is not inherently a bad thing, but it does increase the attack surface as well as the opportunities for bugs. For example, with this proposal, an attacker could have an impact on the network disproportionate to their stake because some blocks are more important than others. Say the attacker somehow contrives the validator selection such that they are always responsible for producing the Nth block, and then they shut off their nodes. This effectively censors whole classes of use-cases, and probably would not require that much stake because the number of blocks they are responsible for is relatively low (assuming N is large-ish). This specific attack may not actually be possible (I’d have to remind myself of the validator selection details), but my main point is that complexity does have a cost in protocol design. I’d be interested to know what others think of this idea as well. The added complexity may very well be worthwhile!

That aside, to answer your question about whether this helps Aurora, yes I think it does. The transactions that spend a lot of gas do tend to be fairly infrequent, and as such I think it would be ok if these transactions took longer to be included in the chain.

if it is possible to “checkpoint” the state of the contract in the middle of execution

This might be possible, though I imagine non-trivial to implement. I’m not an expert on wasm VMs, so someone else will need to weigh in on this one.

I guess the useful metric is how much wall-clock time we want to allow for a single contract call. With the current limit of 300 Tgas and the 3x safety multiplier, we say that a single call is limited to 100 ms.

Given that function calls only do CPU computation and database access, and, crucially, do not do any kind of networking, a 100 ms limit seems pretty generous. That should be a lot of compute.
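
For reference, a quick back-of-the-envelope conversion of these limits into per-call wall-clock budgets (assuming 1 Tgas ≈ 1 ms and the 3x safety multiplier mentioned above):

```rust
// Rough per-call wall-clock budget implied by a gas limit and the 3x safety
// multiplier, assuming 1 Tgas ≈ 1 ms.
fn per_call_budget_ms(call_limit_tgas: f64, safety_multiplier: f64) -> f64 {
    call_limit_tgas / safety_multiplier
}

fn main() {
    println!("{:.0} ms", per_call_budget_ms(300.0, 3.0)); // 100 ms with today's limit
    println!("{:.0} ms", per_call_budget_ms(500.0, 3.0)); // ~167 ms under this proposal
}
```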

At the same time, if my math on Zulip is correct, it does seem that Ethereum allows even heavier transactions (up to 1.2 seconds, with the “what we want to support” level at up to 0.4 seconds).

I guess it would also be useful to understand why exactly

takes 750 Tgas, especially after the touching_trie_node reduction via caching. I wouldn’t be surprised if there’s some low-hanging fruit in Aurora (the previous optimizations were essentially based on a single Uniswap benchmark).

Yes, we do want to continue optimizing Aurora, of course. For that 5bEgfRQ example, the profile breakdown shows it is 90% wasm op code costs (680 Tgas on wasm, only 50 Tgas on touching trie node). We have also seen that the overhead of interpreting the EVM op codes inside wasm is quite large for compute-heavy workloads (1500x in this example). So there is likely an opportunity for optimization, I agree, though I don’t have any specific actionable ideas at this time.

That said, I do like your idea to frame this discussion in terms of wall-clock time for a single function call, @matklad ! Whether 100 ms is enough is a bit of a subjective question. From Aurora’s perspective, being able to support use cases that need more compute than that has value, at least with the current EVM implementation.
