[PROPOSAL] Predictable storage gas costs (removing touched trie node cost)

NEP: Eliminate TTN Gas Cost by jakmeier · Pull Request #424 · near/NEPs · GitHub

When executing a smart contract, the gas costs are generally dynamic. But they are predictable, if the code paths taken are known. With one exception: storage operations. Those are impossible to predict accurately.

Here is a suggestion how storage costs could become predictable. The idea is to remove the dynamically counted number of touched trie node from the storage equation. The same cost can be approximated by the storage key length, which is something the dApp developer controls.

Why do we need this change?

There are two main reasons why this change is desirable.

  1. Predictable gas costs are necessary for a good devX.
  2. Validator client implementations are currently restrained by the gas cost model. They could be more efficient if they didn’t have to count the exact number of trie nodes on every access. This, eventually, could lead to lower gas fees overall.

What exactly is the proposed change?

Current formula:

  • base_fee +
  • per_value_byte_fee * value_bytes +
  • per_key_byte_fee * key_bytes +
  • touched_trie_node_fee * uncached_trie_nodes +
  • cached_trie_node_fee * cached_trie_nodes

This formula exists with different parameter values for each type of storage operation. (read/write/has_key/remove).

New suggested formula:

  • base_fee +
  • per_value_byte_fee * value_bytes +
  • per_key_byte_fee * key_bytes
  • touched_trie_node_fee * uncached_trie_nodes +
  • cached_trie_node_fee * cached_trie_nodes

This, but with increased values on per_key_byte_fee to balance costs out.

How does this make costs predictable?

Costs are unpredictable for two reasons.

  • The number of touched trie node depends not only on the own account’s state, but also what other accounts exist on the chain.
  • The number of cached trie node depends on all receipts that have been executed in the same chunk, on the same account or on different accounts.

Both are no longer an issue when the trie node count is removed from the gas formula.

How/why can we just remove parts of the gas cost?

The dApp should be paying for all the work validators do to execute a function call. This includes reading and writing trie nodes when storage is accessed. This is mostly true today and will still be true if the proposal is accepted and implemented.

Charging for trie node costs is not trivial to get right. When multiple storage requests are made in the same function call, many nodes are accessed more than once. The first time a node is accessed, it can potentially be an expensive fetch from disk. Following requests can be served from memory, which is at least an order of magnitude cheaper.

Today, the trie node cost is charged either as a full DB access or as an in-memory access, using different parameter values for each. But because all validators have to agree on which costs is charged, the exact caching strategy must be part of the protocol specification. For that, the chunk cache is specified as a least-recently-used cache with fixed size that starts empty for every chunk processed.

The chunk cache effectively safes somewhere between 30% and 90% of all storage costs today, compared to always charging the full DB cost. In reality, however, 95% - 99% of trie nodes are served from memory. So in some sense it is still overcharged. But to compensate that, the “full DB access” cost is a bit cheaper than it should be, since a validator never has to actually fetch all those values from disk.

To summarise, today’s cost model is an approximation of the trie node hit rate that relies on counting the exact number of trie nodes touched.

The new proposed model is also an approximation. However, it uses the fact that the number of trie nodes is limited by the key length. A key with 20 bytes can produce at most 40 trie nodes, for example.

What was that about client implementation becoming more efficient and lower gas fees?

Today, NEAR Protocol clients must implement storage requests in a very specific way. They have to look up trie node after trie node, starting at the trie root of the current chunk, all the way down to the leaf with the accessed value.

A more efficient way to implement storage would be to have a direct mapping from storage keys to values, rather than traversing the trie each time. This idea is currently already in the late implementation phase for nearcore, under the name of FlatStorage. However, any potential gas fee reduction from such optimisations cannot be realized with the current gas model. Because the protocol mandates counting of the exact number of trie nodes in order to subtract the right amount of gas.

Ok but what new parameter values to use and how does that affect my favourite dApp on NEAR?

This is still work-in-progress. Some analysis on this can be found in this public document: Eliminate TTN gas cost - Google Docs

More details on exact parameter values will follow. And a NEP will be written for this, should the community agree that, in principle, this is a benefitial change.

5 Likes

Very excited to know devX will improve even more going ahead! I’m for implementing this calculation of course, but what I’m wondering is: will this calculation still be relevant when FlatStorage is implemented in nearcore? How straightforward will it be to update it once FS has been implemented?

The new gas formula will suit flat storage much better. In fact, one of the main reasons why I am coming forward with this now is because FS cannot work properly without changing the gas cost model around trie nodes. (We would still have to read all the nodes, just to get the gas cost right.)

But you are right that specific numbers for each parameter might need a change once FS is implemented. Our goal is to ensure that the only necessary change would be to reduce the existing gas parameters. (The FS implementation is making progress day by day and we are looking forward to start the first benchmarks on it soon. Until then it’s difficult to make hard statements regarding this.)

Also, it is open for discussion if this change should be released before FS or together with FS.

Overall, it seems a no-brainer that we should do it, provided that we could do it. So the analysis is the load bearing part here. It’s a rather long document, what’s the TL;DR?

My main question is: does it seem like we need to roll-out flat-state to make this model feasible, or are there parameter values which would make even the current trie code work? Eg, would existing contracts become costlier? Would we create undercharging opportunities?

I am specifically interested in write costs, which are fundamentally dependent on trie shape.

1 Like

Summary of considered options:

  1. Remove TTN cost before FS is ready.
    • Function calls +1 Tgas on average.
    • 9 fn calls / hour need to attach more gas.
    • 7.5 n calls / hour hit the 300 Tgas limit.
    • aurora +1381 NEAR/month, app.nearcrowd.near +280 NEAR/month (info on more accounts)
  2. Change costs along side the first implementation of FS, which we expect this quarter but it cannot optimize write costs.
    • Function calls +0.75 Tgas on average.
    • 7.5 fn calls / hour need to attach more gas.
    • 1.67 fn calls / hour hit the 300 Tgas limit.
    • aurora +932 NEAR/month, app.nearcrowd.near +281 NEAR/month
  3. Delay Flat Storage until we have it working for writes as well. Then remove TTN cost and ship fully optimized FS at once.
    • Function calls -0.075 Tgas on average.
    • 0.29 fn calls / hour need to attach more gas.
    • 0.11 fn calls / hour hit the 300 Tgas limit, all of them on aurora.
    • aurora -68 NEAR/month, app.nearcrowd.near -64 NEAR/month

All options assume we change gas costs in a way that keeps safety margins against potential IO undercharging overall at least at the same level. In the cases where we are most uncomfortable today, the new model will be much more robust though, so it would be an overall win in terms of robustness.

Of course, doing step 1 will not prevent step 2 and 3 from coming in later. The strategy to discuss is if we want to do the intermediate steps or not.

  • Intermediate steps seem to be painful for dApps if they don’t optimize for shorter key lengths.
  • Waiting for step 3 to be ready is risky for node operators, as there are stability concerns around IO timing that only increase as the state grows.

Getting some input of the affected parties would be incredibly valuable here.

Note that shipping FS phase 1 and keeping TTN based gas costs is not a real option. We would see no benefit from flat storage without changing the gas cost model. Not even stability would be improved as long as we keep reading all trie nodes to count them.
There are alternative changes to the gas cost models but they have the same issue with increased write cost until we have FS fully optimized.

1 Like

Just a quick heads-up: We think we found a way to split flat storage into read-only and write-only pieces. This means we can probably deliver flat storage incrementally and then we could also move to the new cost model incrementally. (First only for reads, then also for writes.)

The hope is, it will all work out without increasing gas spending for anyone on the way. We just need to collect data on flat storage performance to confirm that.

So, we finally have a rough idea for flat storage performance. It is roughly in line with prior assumptions of 100us per DB access. This means, from a technical perspective, the proposal to remove TTN from the gas equation is feasible.

As written before, doing this change without full flat storage optimization will increase gas costs significantly for IO heavy contracts. But we can do it just for reads in a first step. The working assumption is that this will not affect gas costs significantly.

There has not been much feedback on this thread. Nothing negative in particular. And we kind of need this for flat storage to be viable at all. I will thus go ahead and submit a NEP for this soon.

I just opened a draft NEP PR: Eliminate TTN Gas Cost by jakmeier · Pull Request #424 · near/NEPs · GitHub

1 Like

Long overdue update: We (protocol team at Pagoda) no longer see this as a viable change to introduce in isolation.

We crunched all the numbers and it seems we would affect some active contracts negatively no matter how we slice it.

Hence, for flat storage MVP, the team decided to go with an implementation gradually removes TTN costs. First, remove TTN costs only for reads, in a later step also for writes. In combination with parameter compute costs (NEP-455) this will allow deploying it without any contracts noticing anything.

In conclusion, this proposal to remove TTN as a stand-alone NEP is not quite feasible and we found a different way to ship flat storage (and remove TTN cost as a sideeffect). Note that it will result in the same benefits for users, just delayed until we have full flat storage implemented and shipped.

Thank you for providing a closing summary Jakob.