Bridge upgradability and governance plan

One important aspect of any smart contract (especially one that holds or operates large amounts of tokens) is its upgradability process and governance. This process defines who is able to change the behaviour of the smart contract by uploading a new version of the contract code, and according to what procedure. The upgradability process itself can also be changed over time through upgrades.

The goal of this post is to give an introduction to smart contract upgradability patterns, align them with the Rainbow bridge architecture, and propose a staged plan for changes in upgradability, including the governance of these upgrades. The proposed plan is to be discussed here, adjusted according to the comments from the community and validators, and finalised. At this stage we need to agree on the principal scheme, while details might be developed over time.

The closing date for this discussion and for the final decision on the upgradability plan is Tuesday, 16th of March, 2021.

The initial post with the proposed plan will be updated over time according to the feedback.

Why is this important?

If there’s a way to change a contract’s code, then even audited and widely tested contracts might be updated to a version with critical bugs, which could result in the loss of the tokens owned by the contract. This is extremely important for the bridge contracts, which deal with locked tokens and have full control of minting and burning bridged tokens. That’s why all upgrades should be thoroughly tested and verified, especially in the late stages of the project.

However, the security aspect is double-edged. In an ideal world, the contract code is bug-free and thus doesn’t need fast or hidden upgrades for fixing critical vulnerabilities (see ‘Ethereum is a dark forest’ and ‘Escaping the dark forest’ for the implications of such upgrades). In this case, the upgrade can be staged and then voted on (often over a long period of time) with multiple parties involved in the process. Such an approach is used, for example, for Bitcoin protocol upgrades. Unfortunately, the voting phase might last for months.

In reality, contracts have vulnerabilities, especially in the early stages. So it is worth having the ability to do fast upgrades to fix issues, minimizing the time during which the tokens are at risk. This rules out staged upgrades with voting, since hackers would be able to analyze the upgrade code, find the actual vulnerability and exploit it. However, if developers under-test the upgrade or intentionally act maliciously, locked tokens can again be stolen.

There’s no ‘one size fits all’ solution: every approach has trade-offs. And since the Rainbow bridge is quite a substantial part of the NEAR ecosystem, everyone, including users who are not planning to use the bridge, should know how it is going to be governed and upgraded.

The scope of the problem

A thing that adds to the complexity of the problem is the bridge architecture. It has several distinct parts:

  1. NEAR blockchain:
    1. Core contracts, like EthClient, EventProver
    2. Core connectors, like FungibleTokenConnector
    3. Custom connectors, developed by 3rd parties
  2. Ethereum blockchain:
    1. Core contracts, like NearClient, NearProver, BorshDecoder, Ed25519
    2. Core connectors, like Erc20Connector
    3. Custom connectors, developed by 3rd parties
  3. Non-blockchain services:
    1. Block relayers, that send block headers to clients and make the bridge operational
    2. Watchdog – a service that checks the NEAR headers submitted to Ethereum and sends a challenge in case a wrong block was submitted
    3. Transaction relayers (not yet developed) – services that would send the finalisation transaction on behalf of the user, effectively making the user’s interaction with the bridge atomic.

Non-blockchain services may be run by anyone, and the bridge is designed in such a way that even incorrect behaviour of these services is unable to put the funds at risk. One set of uncorrupted block relayers and a watchdog is enough to keep the bridge infrastructure healthy and up to date, and we would be able to spin them up. If any problems arise with the services, they are bug-fixed and restarted. We encourage the community to either enhance the redundancy of these services by hosting their own copies, or develop watchdog scripts that raise alerts when strange behaviour is observed.

Though custom connectors might not seem important for this particular discussion, there are two reasons not to forget about them:

  1. It would be extremely helpful for developers if we provided a suggested approach to the upgradability of the connectors (to avoid problems with core bridge upgrades)
  2. Custom connectors use core contracts, whose interface (methods, parameters) might change in the future with upgrades, so these connectors should be able to adapt

Also, remember: the contracts that store tokens / data are connectors, while the core bridge contracts are only used for verification purposes. Thus, it is the connector contracts that we should pay attention to during upgrades. Core bridge contracts can be redeployed to new addresses without any major harm (some delays in transfers, plus connectors should be able to change the link to the core bridge and, perhaps, the interface of the calls).
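
The idea that a connector keeps the tokens while only its link to the core bridge changes can be sketched off-chain in Python. This is a toy model under stated assumptions, not the bridge code: `Connector`, `set_prover`, `prover_v1` and `prover_v2` are hypothetical names, and a prover is reduced to a function that accepts or rejects a proof string.

```python
class Connector:
    """Toy model of a connector that holds tokens and relies on a prover."""

    def __init__(self, owner, prover):
        self.owner = owner
        self.prover = prover   # callable: proof -> bool (stands in for the core bridge)
        self.locked = 0        # tokens held by the connector

    def lock(self, amount):
        self.locked += amount

    def set_prover(self, caller, new_prover):
        # Only the owner may re-link the connector to a new bridge deployment.
        if caller != self.owner:
            raise PermissionError("only the owner can change the prover")
        self.prover = new_prover

    def unlock(self, amount, proof):
        # Tokens are released only if the currently linked prover accepts the proof.
        if not self.prover(proof):
            raise ValueError("proof rejected by the prover")
        if amount > self.locked:
            raise ValueError("not enough locked tokens")
        self.locked -= amount
        return amount


# Hypothetical provers: each deployment accepts proofs in its own format.
def prover_v1(proof):
    return proof.startswith("v1:")


def prover_v2(proof):
    return proof.startswith("v2:")


connector = Connector(owner="bridge-team", prover=prover_v1)
connector.lock(100)
connector.unlock(30, "v1:proof")                 # accepted by the old deployment
connector.set_prover("bridge-team", prover_v2)   # core bridge redeployed elsewhere
connector.unlock(20, "v2:proof")                 # accepted by the new deployment
print(connector.locked)                          # 50
```

The locked balance never moves during the switch; only the verification dependency is replaced, which is why a core bridge redeployment is comparatively harmless.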

Existing approaches

The most comprehensive and up-to-date write-up about Ethereum-side upgradability approaches can be found in the OpenZeppelin blog (for additional info see also the ConsenSys blog). In short, the following approaches are widely used:

  1. Registry. A special registry contract keeps track of the versions of the contract. A user / contract before interacting with the contract should get the address of the up-to-date version of the contract from the registry.
    1. Pros: simple, easy to audit, ability to work with multiple versions
    2. Cons: data migration is unclear, additional calls are required for querying the actual contract
  2. Parameters / strategies configuration / specific functions. A contract has methods for updating parameters or links to addresses that implement certain pieces of the logic
    1. Pros: can be finely tuned, additional security for users due to unchangeable main contract code.
    2. Cons: bugs in the main contract are not fixable, highly dependent on the use case, no standard tested code available
  3. Proxy. There’s a special Proxy contract that does nothing except delegate calls to the implementation contract.
    1. Pros: the address is static, the data is separated from the implementation
    2. Cons: complicated, may cause many troubles due to EVM specifics (function selector clashes, storage clashes, etc.)
  4. Metamorphic contracts. A contract is deployed using CREATE2 opcode and is able to upgrade after a selfdestruct opcode is used
    1. Pros: doesn’t require proxies
    2. Cons: state of the contract is deleted in the upgrade process, the upgrade cannot be scheduled with a single transaction => the contract will have downtime
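
To make the proxy approach more concrete, here is a minimal off-chain sketch in Python. It models the idea only (a fixed entry point that owns the state and forwards calls to swappable logic) and is not EVM code; `Proxy`, `CounterV1` and `CounterV2` are hypothetical names, and a dictionary stands in for contract storage.

```python
class CounterV1:
    """Hypothetical first implementation: always increments by 1."""

    @staticmethod
    def increment(storage):
        storage["count"] = storage.get("count", 0) + 1
        return storage["count"]


class CounterV2:
    """Hypothetical upgraded implementation: increments by a chosen step."""

    @staticmethod
    def increment(storage, step=1):
        storage["count"] = storage.get("count", 0) + step
        return storage["count"]


class Proxy:
    """Keeps the state and a pointer to the current implementation."""

    def __init__(self, admin, implementation):
        self.admin = admin
        self.implementation = implementation
        self.storage = {}   # state lives in the proxy, not in the implementation

    def upgrade_to(self, caller, new_implementation):
        if caller != self.admin:
            raise PermissionError("only the admin can upgrade")
        self.implementation = new_implementation

    def call(self, method, *args):
        # Delegate: run the implementation's logic against the proxy's own storage.
        return getattr(self.implementation, method)(self.storage, *args)


proxy = Proxy(admin="owner", implementation=CounterV1)
first = proxy.call("increment")        # handled by CounterV1
proxy.upgrade_to("owner", CounterV2)
second = proxy.call("increment", 5)    # handled by CounterV2 on the same storage
print(first, second)                   # 1 6
```

Callers keep using the same `proxy` object before and after the upgrade, which mirrors the "address is static, data is separated from the implementation" advantage listed above. The EVM-specific pitfalls (selector and storage clashes) are not modelled here.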

Governance

Besides the upgradability itself, there’re different ways of governing the upgrades:

  1. Single key / Externally owned accounts. A single key is able to upgrade the contract.
    1. Pros: fast
    2. Cons: centralised, single point of failure
  2. Multisig. N-of-M keys should sign for an upgrade to come in place
    1. Pros: fairly fast when the number of parties is small, no single point of failure
    2. Cons: still a PoA model; an additional trusted entity is introduced to the {Ethereum consensus, NEAR consensus} set.
  3. Timelock. The owner of the contract (either a contract or a single key) is able to stage the upgrade, which will happen in some time from the staging transaction (typically 1-7 days)
    1. Pros: users have time to quit the service in case they disagree with the upgrade
    2. Cons: makes fast upgrades impossible, within the staging period a hacker can analyse the code and break the current version
  4. Pausability (with escape hatches). A contract implements a method for pausing all operations. In paused mode either no actions with the contract are allowed or users can quit the service.
    1. Pros: with escape hatches, this is a solution to security upgrades problem with timelocks
    2. Cons: pausing naturally is centralised and thus should be limited in time.
  5. Commit-reveal upgrades. A security upgrade is committed (in the form of a hash) to the contract, and the code is revealed to trusted security experts. After approval, the upgrade is revealed and simultaneously applied
    1. Pros: a solution to the security upgrade problem with timelocks, reduces the risk of hackers reverse-engineering the vulnerability
    2. Cons: hardly distinguishable from multisig governance
  6. DAO / total voting. The contract is governed by the voting of the stakeholders. In the case of NEAR, we’re talking about validator voting
    1. Pros: ultimate decentralisation; if something is broken with the voting, this means that the NEAR protocol is also broken and the bridge is a minor problem.
    2. Cons: might be slow, unclear how to apply security patches, widens the scope of responsibilities of validators, the voting decision has to be transferred to Ethereum contracts too, which might be a problem if the bridge is stopped
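
Of the governance options above, commit-reveal is probably the least familiar, so a small Python sketch may help. It assumes SHA-256 as the commitment hash and a simple approval threshold among reviewers; `CommitRevealUpgrader` and its method names are hypothetical, and who is allowed to commit is deliberately left out of the model.

```python
import hashlib


class CommitRevealUpgrader:
    """Toy model: commit to a code hash, collect approvals, then reveal and apply."""

    def __init__(self, reviewers, threshold):
        self.reviewers = set(reviewers)
        self.threshold = threshold
        self.commitment = None     # hex digest of the pending upgrade
        self.approvals = set()
        self.active_code = b""

    def commit(self, code_hash):
        # Only the hash is published, so the fix itself stays private.
        self.commitment = code_hash
        self.approvals = set()

    def approve(self, reviewer, code_hash):
        # Reviewers saw the code privately and approve the hash they verified.
        if reviewer not in self.reviewers:
            raise PermissionError("unknown reviewer")
        if code_hash != self.commitment:
            raise ValueError("approval does not match the commitment")
        self.approvals.add(reviewer)

    def reveal_and_apply(self, code):
        if self.commitment is None:
            raise RuntimeError("nothing committed")
        if len(self.approvals) < self.threshold:
            raise RuntimeError("not enough approvals")
        if hashlib.sha256(code).hexdigest() != self.commitment:
            raise ValueError("revealed code does not match the commitment")
        # Reveal and activation happen in the same step.
        self.active_code = code
        self.commitment = None
        self.approvals = set()


new_code = b"connector v2 bytecode"
digest = hashlib.sha256(new_code).hexdigest()

upgrader = CommitRevealUpgrader(reviewers={"alice", "bob", "carol"}, threshold=2)
upgrader.commit(digest)
upgrader.approve("alice", digest)
upgrader.approve("bob", digest)
upgrader.reveal_and_apply(new_code)
```

Because the code becomes public only in the transaction that applies it, there is no staging window in which an attacker can diff the fix against the live version. The approval step also shows why this is hard to distinguish from multisig governance.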

Current state of affairs

Note: There’re no established guidelines for the smart contract upgradability for NEAR blockchain at the moment. The most prominent is this proposal. All the approaches from Ethereum can be applied.

At the moment the upgradability of the bridge is implemented the following way:

  1. Ethereum
    1. Core contracts are not upgradable at the moment, however they have passed a security audit. Upgrades to the core bridge are delivered as separate instances of the bridge, which requires connectors to change the addresses used for proving the finalisation transactions.
    2. Fungible token connector has special functions that allow for transferring tokens to a new deployment
    3. Governance model is a single key
  2. NEAR
    1. Core contracts have passed a security audit, but with full access keys we’re able to deploy new versions of the contracts to the same accounts, eliminating the need for the connectors to change the addresses of the provers
    2. Fungible token connector also has full access keys, so it can be updated without a change in addresses.

Upgradability plan

Assumptions

  • The number of bugs in the code is at its maximum at the beginning of the bridge’s operation and in general decreases over time
  • Bridge test coverage increases over time
  • Upgrades become more tested and weighed (thought through) over time
  • User base of the product increases over time
  • Locked value in the bridge increases over time (the cost of mistake increases)

General considerations

Taking into account the assumptions and common practices, we propose a staged approach to the upgradability of the bridge contracts: early stages would have simple upgradability patterns with more centralised governance, which would later evolve into a more decentralised approach.

Stage 1. Centralised governance, no fixed interfaces

  1. Bridge core, NEAR
    1. Governance: single key
    2. Upgradability: the bridge is redeployed in the same place using the full access key
  2. Bridge core, Ethereum
    1. Governance: -
    2. Upgradability: no upgradability pattern; the bridge gets redeployed to a new location, no data migration is needed
  3. Connectors, NEAR
    1. Governance: single key
    2. Upgradability: connectors are redeployed in new accounts*
  4. Connectors, Ethereum
    1. Governance: single key
    2. Upgradability: the connector contract implements special functions that are used by a single owner to secure the locked tokens and connect to the new bridge deployments
  5. 3rd party connectors
    1. Upgradability: should be fully upgradable in order to adapt to the interface changes of the core bridge

* this is done for the sake of not migrating the bridged/locked tokens, but giving the ability to withdraw them.

This approach is simple, allows us to iterate fast, and lets us react immediately to severe vulnerabilities by moving the tokens to a safe place.

Important: All dApps that would integrate with the bridge (either on the contract level or outside the blockchain), should be able to update the integration code, including the contract addresses and interfaces.

Stage 2. Separate multisig governance, proxy patterns

  1. Bridge core, NEAR
    1. Governance: multisig
    2. Upgradability: the bridge is redeployed in the same place
  2. Bridge core, Ethereum
    1. Governance: multisig
    2. Upgradability: simple proxy pattern; the proxy address is unchanged, no data migration is needed
  3. Connectors, NEAR
    1. Governance: multisig
    2. Upgradability: connectors get deployed in the same accounts; connector state should be migrated if required
  4. Connectors, Ethereum
    1. Governance: multisig
    2. Upgradability: simple proxy pattern; the proxy address is unchanged, contract state should be migrated if required
  5. 3rd party connectors
    1. Upgradability: should be fully upgradable in order to adapt to the interface changes of the core bridge

A small group of experienced security engineers and companies becomes the governance council of the bridge. These entities hold the keys to the multisig contracts on both blockchains. Core bridge contracts and connectors on Ethereum implement the proxy pattern, though the actual interface of the contracts might change in the future.

This approach makes the bridge more resilient, since we’re eliminating a single point of failure, at the cost of a slower reaction to vulnerabilities.

Important: All dApps that would integrate with the bridge (either on the contract level or outside the blockchain), should be able to update the integration code, though the addresses of the contracts won’t be changed.
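
The N-of-M rule behind this stage can be sketched in a few lines of Python. This is an illustrative model only: `MultisigGovernance` and `confirm_upgrade` are hypothetical names, signatures are reduced to signer names, and in practice one such multisig would exist on each blockchain.

```python
class MultisigGovernance:
    """Toy N-of-M multisig: an upgrade applies once enough distinct signers confirm it."""

    def __init__(self, signers, threshold):
        self.signers = frozenset(signers)
        if not 1 <= threshold <= len(self.signers):
            raise ValueError("threshold must be between 1 and the number of signers")
        self.threshold = threshold
        self.confirmations = {}      # proposed implementation -> set of confirming signers
        self.implementation = "v1"   # what the proxy currently points to

    def confirm_upgrade(self, signer, new_implementation):
        if signer not in self.signers:
            raise PermissionError("unknown signer")
        confirmed = self.confirmations.setdefault(new_implementation, set())
        confirmed.add(signer)        # a repeated confirmation is counted once
        if len(confirmed) >= self.threshold:
            self.implementation = new_implementation
            del self.confirmations[new_implementation]
            return True              # upgrade executed
        return False                 # still waiting for more confirmations


council = MultisigGovernance(signers={"alice", "bob", "carol"}, threshold=2)
print(council.confirm_upgrade("alice", "v2"))   # False
print(council.confirm_upgrade("alice", "v2"))   # False (same signer again)
print(council.confirm_upgrade("bob", "v2"))     # True
print(council.implementation)                   # v2
```

The slower reaction time mentioned above is visible here: nothing changes until the second council member confirms.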

Stage 3. Separate multisig governance, proxy patterns with freeze periods, pausability and escape hatches, fixed interfaces

  1. Bridge core, NEAR
    1. Governance: multisig
    2. Upgradability: bridge interface is rarely updated, all the updates have freeze period, multisig can pause the core bridge execution
  2. Bridge core, Ethereum
    1. Governance: multisig
    2. Upgradability: bridge interface is rarely updated, all the updates have freeze period, multisig can pause the core bridge execution
  3. Connectors, NEAR
    1. Governance: multisig
    2. Upgradability: connectors interface is rarely updated, all the updates have freeze period, multisig can pause the connector, users are able to withdraw funds through the escape hatches
  4. Connectors, Ethereum
    1. Governance: multisig
    2. Upgradability: connectors interface is rarely updated, all the updates have freeze period, multisig can pause the connector, users are able to withdraw funds through the escape hatches
  5. 3rd party connectors
    1. Upgradability: 3rd party connectors may be not upgradable

This stage doesn’t change the upgradability pattern, but it does change the governance model. We’re making the bridge more predictable and friendly to end users by allowing them to quit the system at any point. Vulnerabilities are handled using freeze periods. The multisig contract is unable to stop the execution of the Rainbow bridge forever.

Also at this stage, the interfaces are going to be mostly fixed, which means that integrations will no longer need to be upgraded together with the bridge.

Important: At this stage, users of the bridge need to follow the bridge updates. If funds are at risk and the development team is not able to fix the issue in time, we would ask people to withdraw their funds from the bridge using escape hatches. All funds that remain in the bridge at that point will be at risk once the pause period ends.
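
How freeze periods, time-limited pauses and escape hatches fit together can be sketched as follows. This is a simplified Python model under loud assumptions: the 7-day freeze period and 3-day maximum pause are made-up values, `PausableConnector` and its methods are hypothetical, and time is passed in explicitly as `now` (in seconds) to keep the example deterministic.

```python
DAY = 24 * 60 * 60


class PausableConnector:
    FREEZE_PERIOD = 7 * DAY   # assumed delay between staging and applying an upgrade
    MAX_PAUSE = 3 * DAY       # assumed upper bound on a single pause

    def __init__(self, multisig):
        self.multisig = multisig
        self.balances = {}
        self.version = "v1"
        self.staged = None        # (version, earliest time it can be applied)
        self.paused_until = 0

    def _only_multisig(self, caller):
        if caller != self.multisig:
            raise PermissionError("only the multisig can do this")

    def is_paused(self, now):
        return now < self.paused_until

    def deposit(self, user, amount, now):
        if self.is_paused(now):
            raise RuntimeError("connector is paused")
        self.balances[user] = self.balances.get(user, 0) + amount

    def pause(self, caller, now):
        # The pause expires on its own. A real contract would also have to
        # limit how often it can be repeated; that is not modelled here.
        self._only_multisig(caller)
        self.paused_until = now + self.MAX_PAUSE

    def stage_upgrade(self, caller, version, now):
        self._only_multisig(caller)
        self.staged = (version, now + self.FREEZE_PERIOD)

    def apply_upgrade(self, now):
        if self.staged is None:
            raise RuntimeError("nothing staged")
        version, ready_at = self.staged
        if now < ready_at:
            raise RuntimeError("freeze period has not passed")
        self.version, self.staged = version, None

    def escape_withdraw(self, user):
        # Escape hatch: works whether or not the connector is paused.
        return self.balances.pop(user, 0)


c = PausableConnector(multisig="council")
c.deposit("alice", 50, now=0)
c.stage_upgrade("council", "v2", now=0)
c.pause("council", now=DAY)
withdrawn = c.escape_withdraw("alice")   # 50, allowed while paused
c.apply_upgrade(now=7 * DAY)             # freeze period is over
print(withdrawn, c.version)              # 50 v2
```

The two timers serve different parties: the freeze period gives users time to leave before new code goes live, while the bounded pause gives the team time to react without letting the multisig halt the bridge indefinitely.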

Stage 4. Validator DAO, proxy patterns with freeze periods, pausability and escape hatches, fixed interfaces

  1. Bridge core, NEAR
    1. Governance: validator voting
    2. Upgradability: bridge interface is rarely updated, all the updates have freeze period, multisig of the security team can pause the core bridge execution
  2. Bridge core, Ethereum
    1. Governance: bridged validator voting
    2. Upgradability: bridge interface is rarely updated, all the updates have freeze period, multisig of the security team can pause the core bridge execution
  3. Connectors, NEAR
    1. Governance: validator voting
    2. Upgradability: connectors interface is rarely updated, all the updates have freeze period, multisig of the security team can pause the connector, users are able to withdraw funds through the escape hatches
  4. Connectors, Ethereum
    1. Governance: bridged validator voting
    2. Upgradability: connectors interface is rarely updated, all the updates have freeze period, multisig of the security team can pause the connector, users are able to withdraw funds through the escape hatches
  5. 3rd party connectors
    1. Upgradability: 3rd party connectors may be not upgradable

At this stage we’re transitioning to a fully decentralised & permissionless governance model that is aligned with the NEAR consensus mechanism. The entity that is able to update the bridge and connectors is a validator DAO, with 2/3 + 1 of the stake required for an update to happen. However, for severe vulnerabilities we’re still retaining the security team. This team would be able to pause the bridge and connectors and notify validators about the vulnerability. This approach aims to give bridge users best-in-class protection.

Note: in order to update the Ethereum contracts, NEAR validators should vote on NEAR blockchain and then their decision should be bridged (using a new custom connector) to Ethereum.
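
The stake threshold for validator voting can be illustrated with a short Python sketch. It takes one possible reading of "2/3 + 1" (strictly more than two thirds of the total stake, counted in whole units) as an assumption; the function names, validator names and stake values are made up for the example.

```python
def required_stake(total_stake):
    # Assumed reading of "2/3 + 1": strictly more than two thirds, in whole units.
    return total_stake * 2 // 3 + 1


def upgrade_approved(stakes, approvals):
    """stakes: validator -> stake; approvals: validators that voted for the upgrade."""
    total = sum(stakes.values())
    # Each validator is counted once, and unknown voters are ignored.
    approving = sum(stakes[v] for v in set(approvals) if v in stakes)
    return approving >= required_stake(total)


stakes = {"val-a": 40, "val-b": 30, "val-c": 20, "val-d": 10}
print(required_stake(100))                           # 67
print(upgrade_approved(stakes, ["val-a", "val-b"]))  # True  (70 >= 67)
print(upgrade_approved(stakes, ["val-a", "val-c"]))  # False (60 < 67)
```

The vote is weighted by stake rather than by head count, so two large validators can pass an upgrade that three small ones cannot.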

Roadmap of the stages transitions

Transitions between stages should be agreed on by the community and will most probably depend on experience with the usage of the bridge: activity of users, total locked value, bug bounty programs, frequency of severe vulnerabilities, etc. It is hard to predict now when the transitions between stages will actually happen.

The bridge development team is committed to producing the code that implements the upgradability and governance patterns; however, we let the community decide on the metrics for the transitions, similar to how the community was deciding on the Mainnet stage 2 transition.

According to our vision of the bridge, we do believe that the transition to stage 4 is the ultimate goal; however, the path to it should be as safe as possible for the bridge users.

Closing remarks

Though the proposed upgradability & governance plan might seem complicated, in fact it implements a simple idea: add one piece of the final functionality at a time:

  • Stage 2 introduces operations with multisig contracts and proxy patterns
  • Stage 3 introduces freeze period, pausability and escape hatches
  • Stage 4 introduces decentralised governance

Almost no code will be deleted during these migrations (except for the multisig governance of Ethereum contracts at stage 4).

The sequence of the features follows the rise in the complexity-to-value ratio.

Thanks for sharing!
Would a DAO model work from stage 1 if, in the beginning, the DAO contained only 1-2 council members from the core team for fast upgrades, and then added more council members (from validators, third party developers, etc.) over time to increase decentralization?

IMO, yes.
There are two minor differences between the proposed plan and what you have described:

  1. There’s no de jure DAO at stages 2 and 3, since its only representation is two multisig wallets. We don’t want to complicate it with additional logic for voting / syncing the two networks. Remember: participants of the security committee should send updating transactions on both networks!
  2. At stage 4, the DAO is equal to the consensus rules of NEAR – this is the ultimate goal of making the bridge fully decentralised in its entirety.

A DAO like an instance of https://sputnik.fund would be equivalent to using a multisig wallet for this purpose on the NEAR side.

On the ETH side it might indeed be better to start with existing multisig software.
Though once we have a state prover, we could act on actions done on NEAR.
