ICON node failover recovery

What are current P-Rep candidates using, or planning to use, for node failover recovery? Maintaining a high block production rate makes a failover system very important, and the 6% burn for dropping below an 85% block production rate makes it even more so.
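
To make the 85% figure concrete, here is a rough sketch of how much downtime that threshold leaves room for. The 85% value comes from the post above; the function names and the idea of counting over a fixed window of blocks are my own assumptions, not ICON's actual accounting.

```python
# Threshold quoted in the post above; how ICON measures it is not shown here.
THRESHOLD_PERCENT = 85


def below_threshold(validated_blocks: int, total_blocks: int) -> bool:
    """True if the production rate is strictly below the threshold."""
    if total_blocks <= 0:
        raise ValueError("total_blocks must be positive")
    # Integer comparison avoids floating-point rounding at the boundary.
    return validated_blocks * 100 < THRESHOLD_PERCENT * total_blocks


def max_missable_blocks(total_blocks: int) -> int:
    """Most blocks that can be missed in a window without going below the threshold."""
    if total_blocks <= 0:
        raise ValueError("total_blocks must be positive")
    # Ceiling division: the smallest validated count that still meets the threshold.
    min_validated = -(-THRESHOLD_PERCENT * total_blocks // 100)
    return total_blocks - min_validated
```

For a window of 1000 blocks, `max_missable_blocks(1000)` gives the number of blocks a node could miss before the penalty would apply, which is a quick way to size how fast a failover needs to be.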

Yes, it is an important topic that requires some attention. I guess we need to deploy 2+ redundant servers with some manual or automatic load-balancing capability…

Yes, but then there is a risk of double signing, which would incur penalties. Hopefully ICON can add protection against double signing so we can run redundant servers, as that would be ideal. We have brought this up in the testnet group.
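
For anyone unfamiliar with the term, double signing generally means the same validator key signs two different blocks at the same height, which is what can happen when two redundant nodes share a key. A minimal sketch of that condition follows; `SignedBlock` and `find_double_signs` are hypothetical names for illustration, not part of any ICON tooling.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Set, Tuple


class SignedBlock(NamedTuple):
    """One signature observed on the network (hypothetical record shape)."""
    validator: str
    height: int
    block_hash: str


def find_double_signs(signed: Iterable[SignedBlock]) -> Set[Tuple[str, int]]:
    """Return (validator, height) pairs that signed more than one distinct block."""
    hashes_seen = defaultdict(set)
    for record in signed:
        hashes_seen[(record.validator, record.height)].add(record.block_hash)
    # Signing the same block twice is harmless; two different hashes is the offence.
    return {key for key, hashes in hashes_seen.items() if len(hashes) > 1}
```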

I proposed two easy options for validators in our ICON Testnet group: double-signing protection at the protocol level, so that one validator can have two or more nodes with the same keys active at the same time; or a backup node that only watches and syncs with the network so that it always has the latest state and, once needed, can be activated by command to also produce and sign blocks. There are other options too, but they involve some work, carry some risks, and are not as good as these.
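
The second option could be driven by a small watcher along these lines. This is only a sketch under assumptions: the three callbacks (`primary_healthy`, `fence_primary`, `activate_standby`) are hypothetical hooks you would wire to your own health check and node commands, since ICON has not published an activation command yet. The important part is the ordering: the primary is stopped from signing before the standby is activated, so both are never signing with the same key.

```python
from typing import Callable


class StandbyController:
    """Promotes a synced standby only after the primary fails repeatedly and is fenced."""

    def __init__(
        self,
        primary_healthy: Callable[[], bool],   # hypothetical health probe
        fence_primary: Callable[[], bool],     # hypothetical: stop primary signing, True on success
        activate_standby: Callable[[], None],  # hypothetical: tell standby to start signing
        max_failures: int = 3,
    ) -> None:
        self.primary_healthy = primary_healthy
        self.fence_primary = fence_primary
        self.activate_standby = activate_standby
        self.max_failures = max_failures
        self.failures = 0
        self.standby_active = False

    def tick(self) -> bool:
        """Run one health check; return True if the standby is active afterwards."""
        if self.standby_active:
            return True
        if self.primary_healthy():
            self.failures = 0  # a single good check resets the counter
            return False
        self.failures += 1
        if self.failures < self.max_failures:
            return False  # tolerate short blips before failing over
        # Fence first: if the primary cannot be confirmed stopped, do not promote.
        if not self.fence_primary():
            return False
        self.activate_standby()
        self.standby_active = True
        return True
```

Calling `tick()` on a timer gives a simple manual-or-automatic switch. Requiring several consecutive failures trades a slightly slower failover for fewer false promotions.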

The ICON team said both options will be available, but there is no timeline yet. I’m especially excited about the first one, as it will make it very easy to run two machines actively at the same time, which greatly reduces the risk of downtime.

I think these would be great. Let’s keep in touch on this and follow ICON’s developments. Hopefully they are finalized in time to give us at least a few weeks to implement and test.

I am happy to build test cases into a Terraform deployment I have going on here to support @ChainodeCapital's motivations for failover.

https://github.com/robc-io/terragrunt-icon-insight-p-rep/tree/master/aws/single-p-rep/dev

It is very easy for me to segregate features into different environments with this layout. I will start a thread covering what I am doing soon.