Nearcore release cadence

Previously we used a six-week release schedule for nearcore: a release candidate was first released to testnet, then finalized and released to mainnet 2 weeks later, and the next release candidate was released to testnet 4 weeks after that. However, as the ecosystem grows, more and more people are running nodes on both testnet and mainnet, and it has become apparent that node operators need more time to upgrade between releases.

Consequently, we considered an 8-week release schedule where a release candidate would run on testnet for 4 weeks before being finalized and released to mainnet. The next release candidate would be published 4 weeks after the mainnet release. The problem with such an approach is that it slows down the release cycle even more. The ability to iterate fast is something we value deeply at NEAR, and we would like to avoid an extended delay between two releases while maintaining the integrity of releases.

Looking at where the time is actually needed between releases, we realize that it is mostly between a testnet release and the subsequent mainnet release – node operators need time to upgrade their nodes and it is desirable to have the new release running on testnet for a few weeks before it is released to mainnet. What is not absolutely necessary, however, is the time between a mainnet release and the next testnet release. While it may be undesirable to have two releases on the same day, there is no need for an extended time period between the two. As a result, we propose the following five-week release schedule:

  • Testnet release for version 1.x-rc is released on day T.
  • On day T+28, assuming there are no issues, 1.x-rc is stabilized and 1.x is released to mainnet.
  • On day T+35, the next testnet release 1.<x+1>-rc is published.
  • On day T+63, the next mainnet release 1.<x+1> is published.

This should give enough time both for node operators to upgrade their nodes and for the release to be sufficiently tested.
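To make the day offsets concrete, here is a minimal sketch that expands the proposed schedule into calendar dates. The start date and the starting minor version are hypothetical placeholders chosen only for illustration; the 28-day and 35-day offsets come from the list above.

```python
from datetime import date, timedelta

TESTNET_TO_MAINNET_DAYS = 28  # release candidate runs on testnet for 4 weeks
CYCLE_DAYS = 35               # next release candidate follows 1 week after mainnet


def release_schedule(first_testnet_day, first_minor, cycles):
    """Return (date, network, version) tuples for `cycles` consecutive releases."""
    events = []
    for i in range(cycles):
        testnet_day = first_testnet_day + timedelta(days=i * CYCLE_DAYS)
        mainnet_day = testnet_day + timedelta(days=TESTNET_TO_MAINNET_DAYS)
        minor = first_minor + i
        events.append((testnet_day, "testnet", f"1.{minor}-rc"))
        events.append((mainnet_day, "mainnet", f"1.{minor}"))
    return events


# Hypothetical day T and hypothetical starting version 1.18.
start = date(2021, 1, 4)
for day, network, version in release_schedule(start, 18, 2):
    print(f"T+{(day - start).days:>2}  {day}  {network:<7}  {version}")
```

Two cycles reproduce the T, T+28, T+35 and T+63 offsets listed above.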

Please feel free to comment if you have any suggestions or concerns regarding this proposal. Otherwise we will start using the new schedule for the next release cycle.


I think it’s a great idea, I just have a question:

  • On day T+28, assuming there are no issues, 1.x-rc is stabilized and 1.x is released to mainnet.
  • On day T+35, the next testnet release 1.<x+1>-rc is published.

Why do we need a gap between these events? We could even do the following:

  • Testnet release for version 1.x-rc is released on day T.
  • On day T+28, assuming there are no issues, 1.x-rc is stabilized, 1.x is released to mainnet and the next testnet release 1.<x+1>-rc is published.
  • On day T+56, the next mainnet release 1.<x+1> is published and testnet release 1.<x+2>-rc is published, etc.

I feel that increasing the gap between testnet and mainnet releases from 2 to 4 weeks should already be enough, though I don’t have a strong opinion here. Also, day T+28 would make more sense as the “day of pushing the protocol version forward”, simultaneously from master to testnet and from testnet to mainnet.


It increases the operational burden for all node operators due to the need to perform two different upgrades. Also, this would lead to testnet being two protocol versions ahead of mainnet, since mainnet upgrades can take a week.


In addition to Bowen’s comments, having a couple of days of concurrent versions on testnet and mainnet allows for testing potential fixes to issues that passed testnet but became visible on mainnet and do not require a version update. For example, the recent update to config.toml.


I’m comfortable with even a 2x faster cadence, but I understand that exchanges especially need more time. Validators can say they need more time, but even 2x faster is enough IMO. It’s our job after all.

IIRC, one of the rationales for the current state was that “we want testnet and mainnet to use the same protocol version, so that contract developers can reliably test their contracts using an environment close to production”. In the current system, we have 4 weeks where testnet and mainnet are even, and a 2-week window where mainnet lags behind testnet. So lagging happens 1/3 of the time.

In the new version, lagging happens 4/5 of the time. That is, we lose the property that “testnet is just like mainnet”. I think this is OK in the limit, as the scale of changes naturally decreases over time (i.e., even a lagging mainnet is close enough to testnet for practical purposes). This is probably OK today as well, but we need to acknowledge that we want testnet to be generally ahead of mainnet.
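The two fractions can be checked with a small sketch. It assumes, as a simplification, that each network switches versions on the release day itself, so it ignores the week a mainnet upgrade can take. The function name is made up for illustration.

```python
from fractions import Fraction


def lag_fraction(cycle_weeks, lag_weeks):
    """Share of each release cycle during which testnet is ahead of mainnet."""
    return Fraction(lag_weeks, cycle_weeks)


# Current schedule: 6-week cycle, mainnet lags for the 2 weeks after a testnet release.
current = lag_fraction(6, 2)
# Proposed schedule: 5-week cycle, mainnet lags for the 4 weeks after a testnet release.
proposed = lag_fraction(5, 4)

print(current, proposed)  # 1/3 4/5
```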

Observation: the Rust release cycle is six weeks, and that creates a natural “pulse” in the Rust ecosystem. A six-week nearcore cycle aligns with this pulse, so we can, e.g., upgrade our compiler version in the same window during each release. Five is co-prime with six, which makes us maximally unaligned. Not sure if that’s good or bad: it definitely is a fun fact, and probably doesn’t matter.
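As a rough illustration of the co-prime remark: two cycles that start on the same day line up again after the least common multiple of their lengths. The helper name below is made up for this sketch.

```python
from math import gcd


def weeks_until_realigned(a_weeks, b_weeks):
    """Least common multiple: weeks until two cycles that start together coincide again."""
    return a_weeks * b_weeks // gcd(a_weeks, b_weeks)


print(weeks_until_realigned(6, 6))  # 6: a six-week nearcore cycle matches every Rust release
print(weeks_until_realigned(5, 6))  # 30: a five-week cycle matches only every fifth Rust release
```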

Very good point. We did consider this concern before and asked @josh.quintal to provide some feedback on the developer experience side, and I believe Josh thinks it is not an issue for developers.