Under this proposal, tools and programs implementing the NEAR specification(s) will no longer commit to perpetual support for all past revisions of the protocol. Users wishing to interact with historical blocks and other entities pertaining to older protocol versions should use the corresponding versions of the tools and programs.
Today, any change to neard is required to preserve the exact behaviour seen in all past versions of the program. Among other things, this is necessary to enable replay of the blockchain all the way back to genesis, as implemented by the various neard view-state sub-commands.
Some of the prerequisite infrastructure to reconcile the operational needs and the desire to modify the protocol already exists in the form of protocol versioning. Practice has nevertheless demonstrated that protocol versioning incurs a significant hit to development velocity. Every time a change is made, engineers need to carefully consider the interaction between the change and protocol-visible behaviour. In cases where the change is intentionally a breaking change to the protocol, care needs to be taken to also maintain the former behaviour for the previous version of the protocol. In extreme cases the only feasible way to maintain compatibility is to duplicate the affected subsystem in full. The first time this happens, logic to switch between different versions of the functionality must be implemented as well. Such logic further impedes future evolution of the implementation.
We are not able to consistently verify whether our efforts to maintain compatible behaviour are being implemented correctly in practice. Verifying that protocol versioning is implemented correctly by replaying even portions of the history is an extremely time-consuming process (taking weeks to months per experiment) and requires significant effort to set up. This makes verification of code changes quite onerous, to the point where there has only been one attempt, on behalf of the Contract Runtime team, back in 2021.
It would be ideal if we could make the requirements for compatibility satisfiable by construction and remove the burden this functionality imposes on the development process. This proposal suggests an approach that does so at the expense of a negligible relative increase in operational complexity and some additional coordination.
After the implementation of this proposal a release of neard would support either:
- the latest protocol version only; or
- the latest protocol version and one preceding protocol version only.
Supporting the latest protocol version only achieves correct support for the corresponding protocol version by construction. By using a binary dedicated for each version, we have a much stronger guarantee that exactly the same code is used to handle entities at a specified protocol version. Little effort or consideration is necessary in how new developments and improvements interact with the project’s legacy. On the other hand, this is perhaps quite at odds with the protocol upgrade roll-out negotiation that occurs as part of the routine release process. With every release of neard the deployment process that follows is largely a two-step affair. Today, operators first deploy a newer version of neard on their systems. At some later time, a vote occurs at a protocol level to switch to a newer protocol version. This requires some coordination between all involved parties, but the truly synchronous portion of the process – the protocol upgrade vote – occurs automatically.
Toning down the proposal a little, supporting the latest protocol version and one preceding protocol version with each release of neard is a nice intermediate step. This way we can still benefit from development velocity and correctness improvements, albeit partially, but operationally no observable differences to the validators would be expected, and the implementation doesn’t need to deal with any changes related to protocol version upgrades.
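The two support windows described above can be illustrated with a small sketch. This is not taken from neard: the `SupportPolicy` and `Release` types are hypothetical, and the sketch assumes protocol versions are consecutive integers, so that "one preceding protocol version" means the latest version minus one.

```rust
/// Hypothetical model of which protocol versions one neard release handles.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SupportPolicy {
    /// The release handles the latest protocol version only.
    LatestOnly,
    /// The release handles the latest protocol version and the one before it.
    LatestAndPrevious,
}

/// Hypothetical description of a single neard release.
#[derive(Clone, Copy, Debug)]
struct Release {
    latest_protocol_version: u32,
    policy: SupportPolicy,
}

impl Release {
    /// Returns true if this release can process entities at `protocol_version`.
    /// Assumption: protocol versions are consecutive integers.
    fn supports(&self, protocol_version: u32) -> bool {
        let latest = self.latest_protocol_version;
        match self.policy {
            SupportPolicy::LatestOnly => protocol_version == latest,
            SupportPolicy::LatestAndPrevious => {
                protocol_version == latest
                    || Some(protocol_version) == latest.checked_sub(1)
            }
        }
    }
}

fn main() {
    let release = Release {
        latest_protocol_version: 56,
        policy: SupportPolicy::LatestAndPrevious,
    };
    for version in [54, 55, 56, 57] {
        println!(
            "protocol version {version}: supported = {}",
            release.supports(version)
        );
    }
}
```

Under either policy, a request for any version outside the window would be answered by pointing the user at the release that corresponds to that version, rather than by carrying the old code paths forward.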
NOTE: Ideas expressed in this section haven’t been verified to be implementable. If you see any significant issues with this proposal from the standpoint of either an operator or an implementer, please leave a comment.
One of the options to implement the one-protocol-version-only scheme would be to have neard cleanly terminate or exec a new binary whenever it is requested to switch to a more recent protocol version. In such a scheme, neard would consult a configuration file, or a similar convention, at startup to determine the binary for the required protocol version:
56: "/opt/bin/neard-1.29"
55: "/opt/bin/neard-1.28"
...
From the operational standpoint this is perhaps even nicer than the setup we have today – deploying a new release effectively amounts to dropping a binary into a predetermined location and appending an entry to a configuration file. That said, it is important to acknowledge that there are new risks as well – it is much harder to confirm that the new binary is able to run at all, for example. The newly deployed binary will be invoked for the first time at protocol version upgrade time, which leaves a much smaller window to resolve any mistakes.
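The version-to-binary lookup described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the line-oriented format is inferred from the sample mapping, and the `parse_binary_map` and `binary_for` helpers are hypothetical names, not part of neard. The hand-off itself (terminating, or replacing the process on Unix with something like `std::os::unix::process::CommandExt::exec`) is deliberately left out.

```rust
use std::collections::BTreeMap;
use std::path::PathBuf;

/// Parses lines of the form `56: "/opt/bin/neard-1.29"` into a map from
/// protocol version to binary path. Blank lines are skipped.
/// Assumption: one entry per line; the real format is not specified.
fn parse_binary_map(config: &str) -> Result<BTreeMap<u32, PathBuf>, String> {
    let mut map = BTreeMap::new();
    for (idx, line) in config.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        // Split at the first `:`; the left side is the protocol version.
        let (version, path) = line
            .split_once(':')
            .ok_or_else(|| format!("line {}: expected `<version>: \"<path>\"`", idx + 1))?;
        let version: u32 = version
            .trim()
            .parse()
            .map_err(|e| format!("line {}: bad protocol version: {}", idx + 1, e))?;
        // Strip surrounding whitespace and double quotes from the path.
        let path = path.trim().trim_matches('"');
        if path.is_empty() {
            return Err(format!("line {}: empty binary path", idx + 1));
        }
        map.insert(version, PathBuf::from(path));
    }
    Ok(map)
}

/// Looks up the binary configured for `protocol_version`, if any.
fn binary_for(map: &BTreeMap<u32, PathBuf>, protocol_version: u32) -> Option<&PathBuf> {
    map.get(&protocol_version)
}

fn main() {
    let config = "56: \"/opt/bin/neard-1.29\"\n55: \"/opt/bin/neard-1.28\"\n";
    let map = parse_binary_map(config).expect("valid config");
    match binary_for(&map, 56) {
        Some(path) => println!("protocol version 56 -> {}", path.display()),
        None => eprintln!("no binary configured for protocol version 56"),
    }
}
```

A missing entry is reported as `None` rather than falling back to another binary, since running a block through the wrong release is exactly the failure this scheme is meant to rule out.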
If implemented naively, this approach will have some unpalatable consequences for the pace at which blocks are processed during the protocol upgrade – spinning up a new neard instance takes a significant amount of time. These problems are all solvable with additional implementation effort: peer connections can be inherited, the process for a new version can be launched some time ahead of the switch so that it has time to warm up, and so on.
This change should overall improve the security and reliability of the implementation in the long run. With developers being able to make changes more confidently, without concerns for backwards compatibility, the overall health of the code base will improve. As a direct consequence of this change, fixes to vulnerabilities can be implemented and rolled out more quickly.
The major drawback of this proposal is the more difficult setup for replay. Any replay involving blocks at multiple protocol versions would need to use the corresponding releases of neard to apply the blocks. Compared to the overall effort to set up a replay, this additional requirement should be negligible.
This proposal introduces additional complexity at the intersection of resource control, hand-off and protocol upgrades. Implementation will need to extend the current protocol upgrade mechanism significantly to enable the functionality required to implement this proposal.