Node stuck in "EvaluateNetwork" -> no crep_root_hash


#1

I initially thought this bug was on my end, but I now suspect it is not. Any help MASSIVELY appreciated.

I grepped the logs and found this one error:

ERROR peer_loader.py(55) There's no crep_root_hash to initialize.
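For anyone hitting the same thing, here is a minimal Python sketch of that grep. It walks a log directory and returns ERROR lines that mention crep_root_hash. The /data/PREP-TestNet/log path is the DEFAULT_LOG_PATH from the container output (on the host it would be ./data/PREP-TestNet/log, given the ./data:/data volume). The assumption is only that the logs are plain-text files somewhere under that directory; the function names are illustrative.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator

# ERROR lines that mention crep_root_hash, e.g. the peer_loader.py(55) message.
PATTERN = re.compile(r"ERROR.*crep_root_hash")


def find_crep_errors(lines: Iterable[str]) -> Iterator[str]:
    """Yield matching lines with the trailing newline removed."""
    for line in lines:
        if PATTERN.search(line):
            yield line.rstrip("\n")


def scan_log_dir(log_dir: str) -> list[str]:
    """Scan every regular file under log_dir and return 'path: line' hits."""
    hits: list[str] = []
    for path in sorted(Path(log_dir).rglob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps going if a file contains non-UTF-8 bytes.
        with path.open(encoding="utf-8", errors="replace") as fh:
            hits.extend(f"{path}: {line}" for line in find_crep_errors(fh))
    return hits


if __name__ == "__main__":
    # Path inside the container; on the host it is ./data/PREP-TestNet/log.
    for hit in scan_log_dir("/data/PREP-TestNet/log"):
        print(hit)
```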

This is happening on a couple of my nodes, which are stuck in the “EvaluateNetwork” state after starting up on TestNet. My firewall is open on the right ports, the keystore and password are in the right locations, and I am using the docker-compose file from the docs.

I have three nodes stuck: two on testnet and one on mainnet that I am helping someone else with. I tried a number of different ways to register the node or change the registration with preptools (setPRep), but it still does not get past the error. I also cleared the data directory and resynced many times, and the error persists. I also did a manual install on a bare-bones server (no security hardening scripts, no monitoring or other add-ons, no Ansible configuration steps) and still get the same error.

I checked the keystore, and it does match what I registered with.

My testnet nodes are 3.20.43.8 and 3.20.80.144, shown at the bottom of the monitor.

I started digging into the source and exploring inside the container, but I haven’t been able to work out the cause of the error.

In the container logs I see this:

 [2020-01-10 08:41:52.868] Your IP: 3.20.43.8                                                                                        
  [2020-01-10 08:41:52.872] RPC_PORT: 9000 / RPC_WORKER: 3                                                                            
  [2020-01-10 08:41:52.876] DEFAULT_PATH=/data/PREP-TestNet in Docker Container                                                       
  [2020-01-10 08:41:52.880] DEFAULT_LOG_PATH=/data/PREP-TestNet/log                                                                   
  [2020-01-10 08:41:52.884] DEFAULT_STORAGE_PATH=/data/PREP-TestNet/.storage                                                          
  [2020-01-10 08:41:52.888] scoreRootPath=/data/PREP-TestNet/.score_data/score                                                        
  [2020-01-10 08:41:52.891] stateDbRootPath=/data/PREP-TestNet/.score_data/db                                                         
  [2020-01-10 08:41:52.895] GENESIS_DATA_PATH=/prep_peer/conf/genesis.json                                                            
  [2020-01-10 08:41:52.898] P-REP package version info - prep-node_1912090356xb1e1fe                                                  
  [2020-01-10 08:41:54.969] iconcommons             1.1.2                                                                             
iconrpcserver           1.4.9                                                                                                         
iconsdk                 1.2.0                                                                                                         
iconservice             1.5.20                                                                                                        
loopchain               2.4.20                                                                                                        
  [2020-01-10 08:41:54.973] NETWORK_ENV=PREP-TestNet, SERVICE=zicon, ENDPOINT_URL=https://zicon.net.solidwallet.io, SERVICE_API = https://zicon.net.solidwallet.io/api/v3
  [2020-01-10 08:41:56.022] This node is SubPRep - 0x1                                                                                
  [2020-01-10 08:41:58.255] == [OK] Write json file -> /prep_peer/conf/configure.json, 1.1 KB                                         
Alive peer_list=['54.203.169.166:9000', '3.9.209.139:9000', '195.201.28.160:9000', '167.86.67.6:9000', '94.229.45.8:9000'], fastpeer_domains=['https://zicon.net.solidwallet.io', 'fastpeer0.icon:9000', 'fastpeer1.icon:9000', 'fastpeer2.icon:9000', 'fastpeer3.icon:9000', 'fastpeer4.icon:9000']
  [2020-01-10 08:41:58.258] CHANNEL_MANAGE_DATA not found - /prep_peer/conf/channel_manange_data.json  
0
  [2020-01-10 08:41:58.348] START FASTEST MODE : NETWORK_NAME=ZiconPrepNet  
  [2020-01-10 08:41:58.356] [PASS] Already file - /data/PREP-TestNet/ZiconPrepNet_BH2563347_data-20200110_0800.tar.gz  
Wait for rabbitmq-server(127.0.0.1) / USE_EXTERNAL_MQ: false - sleeping
...
 [2020-01-10 10:41:42.512] Network: PREP-TestNet / RUN_MODE: '' / LOG_OUTPUT_TYPE: file  
  [2020-01-10 10:41:42.537] [OK] CHECK=0, Run loop-peer and loop-channel start -> ''  
  [2020-01-10 10:41:42.560] [OK] CHECK=0, Run iconservice start!  
  [2020-01-10 10:41:42.596] [OK] CHECK=0, Run iconrpcserver start!   
 completed with 0 plugins.
[*] To exit press CTRL+C
[2020-01-10 10:41:44 +0900] [512] [INFO] Starting gunicorn 19.9.0
[2020-01-10 10:41:44 +0900] [512] [INFO] Listening at: http://0.0.0.0:9000 (512)
[2020-01-10 10:41:44 +0900] [512] [INFO] Using worker: sanic.worker.GunicornWorker
[2020-01-10 10:41:44 +0900] [537] [INFO] Booting worker with pid: 537
[2020-01-10 10:41:44 +0900] [544] [INFO] Booting worker with pid: 544
[2020-01-10 10:41:44 +0900] [551] [INFO] Booting worker with pid: 551
  [2020-01-10 10:42:27.642] Start Health check ... 30s, HEALTH_ENV_CHECK=true  
  [2020-01-10 10:42:57.647] Start PROC_HEALTH_CHECK ... 30s  
  [2020-01-10 10:42:57.696] Start API_HEALTH_CHECK  ... 30s  
  [2020-01-10 10:42:58.785] == Alive peer_list=['44.225.138.147:9000', '35.180.186.51:9000', '157.245.40.25:9000', '163.172.4.189:9000', '195.201.28.160:9000'], fastpeer_domains=['fastpeer0.icon', 'fastpeer1.icon', 'fastpeer2.icon', 'fastpeer3.icon', 'fastpeer4.icon']
  [2020-01-10 10:42:58.789] Start BlockCheck  
  [2020-01-10 10:42:58.842] Time synchronization with NTP / NTP SERVER: time.google.com  
10

Lastly, my docker-compose.yml:

version: "3"
services:
  prep-node:
    image: "iconloop/prep-node:latest" # Added latest
    container_name: "prep-testnet"
    network_mode: host
    restart: "always"
    environment:
      NETWORK_ENV: "PREP-TestNet"
      CERT_PATH: "/cert"
      LOOPCHAIN_LOG_LEVEL: "DEBUG"
      ICON_LOG_LEVEL: "DEBUG"
      FASTEST_START: "yes"
      PRIVATE_KEY_FILENAME: "keystore" # this is the actual name
      PRIVATE_PASSWORD: "the right password"
    cap_add:
      - SYS_TIME
    volumes:
      - ./data:/data # mount a data volume
      - ./cert:/cert # Automatically generate cert key files here
    ports:
      - 9000:9000
      - 7100:7100

#2

Hello Rob!

I suspect the FASTEST_START option did not work properly. If old block zip files remain, the FASTEST_START option may not work. Or it may just be an unknown problem.

When you sync from scratch, you need the CREP_ROOT_HASH of the network’s “crep”. Please add this variable to the “environment” section of your docker-compose.yml to solve the problem.

CREP_ROOT_HASH: 0x9718f5d6d6ddb77f547ecc7113c8f1bad1bf46220512fbde356eee74a90ba47c

Thanks!


#3

Thanks for the info, Bong. I added that environment variable to the docker-compose file, and the error no longer shows in the logs, but I am still stuck in the EvaluateNetwork state with FASTEST_START set to yes. I then turned off FASTEST_START, and the node is now syncing from the first block in BLOCK_SYNC mode, which will take days or weeks on testnet and months on mainnet.

I then turned FASTEST_START back on, synced the DB, turned FASTEST_START off again, and restarted the container, but the node is still stuck in EvaluateNetwork. It seems that setting CREP_ROOT_HASH fixed the error in the logs but not the underlying problem of bringing the node up.

Has anyone else faced this? I did a thorough search of the Telegram channels and mostly saw people getting past it by clearing out the data directory and then resyncing the DB, which I have done many times.

Thank you for your help. Do you have any other suggestions? This is affecting a number of nodes.


#4

Hello, Rob.

Actually, this problem can occur when you sync the block data from scratch, so we do not recommend block syncing from scratch. Also, the FASTEST_START option won’t work if you have a legacy snapshot, so you cannot sync from the latest block snapshot.

So just delete the data folder and restart your node. The node will then get the block data from the snapshot, and it will work. Thanks!


#5

Hi Bong,

I have tried wiping the data directory and restarting the nodes multiple times, and it is still not working on TestNet. Earlier mentions of this EvaluateNetwork state in Telegram said that wiping the data directory helped, but it has not worked for me. I will work on the mainnet node tomorrow and report back, although I have already tried wiping the data directory on that node as well. I can’t help but think this is a common problem between the two networks, since nodes on both are stuck in the “EvaluateNetwork” state.

I have never seen this issue before, and the only thing that has changed slightly is the registration command I used. To be clear, here are those configs so we can potentially rule them out.

registerPRep.json:

{
    "name": "Insight-C9",
    "country": "USA",
    "city": "San Francisco",
    "email": "insight.icon.prep@gmail.com",
    "website": "https://insight-icon.net",
    "details": "https://static.insight-icon.net/testnet/registration/details_tn4c9.json",
    "p2pEndpoint": "3.130.49.55:7100"
}

details.json -> https://static.insight-icon.net/testnet/registration/details_tn4c9.json

{
  "representative": {
    "logo": {
      "logo_256": "https://www.home.insight-icon.net/static/insight-logo-256.png",
      "logo_1024": "https://www.home.insight-icon.net/static/insight-logo-1024.png",
      "logo_svg": "https://www.home.insight-icon.net/static/insight-logo-256.svg"
    },
    "media": {
      "steemit": "",
      "twitter": "",
      "youtube": "",
      "facebook": "",
      "github": "",
      "reddit": "",
      "keybase": "",
      "telegram": "",
      "wechat": ""
    }
  },
  "server": {
    "location": {
      "country": "USA",
      "city": "us-east-2"
    },
    "server_type": "cloud",
    "api_endpoint": "3.130.49.55:9000"
  }
}

preptools_config.json:

{
    "url": "https://zicon.net.solidwallet.io",
    "nid": 80,
    "keystore": "keystore"
}

Registration command (preptools, using preptools_config.json):

preptools registerPRep --prep-json ./registerPRep.json

Output:

[Request] ======================================================================
{
    "from_": "hx879ecb4a95c33219449fb5259172023e18baf79a",
    "to": "cx0000000000000000000000000000000000000000",
    "value": 2000000000000000000000,
    "step_limit": null,
    "nid": 80,
    "nonce": null,
    "version": 3,
    "timestamp": null,
    "method": "registerPRep",
    "data_type": "call",
    "params": {
        "name": "Insight-C9",
        "country": "USA",
        "city": "San Francisco",
        "email": "insight.icon.prep@gmail.com",
        "website": "https://insight-icon.net",
        "details": "https://static.insight-icon.net/testnet/registration/details_tn4c9.json",
        "p2pEndpoint": "3.130.49.55:7100"
    }
}

> Continue? [Y/n]Y
[Response] =====================================================================
txHash : 0x1370c3c5f7e7d0129f0405a11cbeac09560fe97cb1960d9d85e89cb8c2fa488e

Peer status:

curl localhost:9000/api/v1/status/peer | jq -r .

{
  "made_block_count": 0,
  "leader_made_block_count": -1,
  "nid": null,
  "status": "Service is online: 0",
  "state": "EvaluateNetwork",
  "service_available": false,
  "peer_type": "0",
  "audience_count": "0",
  "consensus": "siever",
  "peer_id": "hx879ecb4a95c33219449fb5259172023e18baf79a",
  "block_height": 0,
  "round": -1,
  "epoch_height": -1,
  "unconfirmed_block_height": -1,
  "total_tx": 0,
  "unconfirmed_tx": 0,
  "peer_target": "3.130.49.55:7100",
  "leader_complaint": 1,
  "peer_count": -1,
  "leader": "",
  "epoch_leader": "",
  "versions": {
    "loopchain": "2.4.20",
    "iconservice": "1.5.20",
    "iconrpcserver": "1.4.9",
    "iconcommons": "1.1.2",
    "earlgrey": "0.0.4",
    "icon_rc": "v1.2.0"
  },
  "mq": {
    "peer": {
      "message_count": 0
    },
    "channel": {
      "message_count": 0
    },
    "score": {
      "message_count": 0
    }
  }
}
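The same status check can be scripted. This is a minimal sketch using only the Python standard library: it fetches the /api/v1/status/peer endpoint from the curl command above and prints a one-line summary. The field names come from the response shown; treating state EvaluateNetwork with block_height 0 as "likely stuck" is a heuristic drawn from this thread, not a documented rule, and fetch_status/summarize are illustrative names.

```python
import json
import urllib.request
from typing import Any

# Endpoint from the curl command above (RPC_PORT 9000 on the same host).
STATUS_URL = "http://localhost:9000/api/v1/status/peer"


def fetch_status(url: str = STATUS_URL, timeout: float = 5.0) -> dict[str, Any]:
    """GET the peer status and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize(status: dict[str, Any]) -> str:
    """One-line summary of the fields that mattered in this thread."""
    state = status.get("state", "?")
    height = status.get("block_height", -1)
    summary = (
        f"state={state} block_height={height} "
        f"peer_count={status.get('peer_count', -1)} "
        f"service_available={status.get('service_available', False)}"
    )
    # Heuristic from this thread, not a documented rule: EvaluateNetwork
    # with no blocks suggests the node never got as far as syncing.
    if state == "EvaluateNetwork" and height == 0:
        summary += " (likely stuck before sync)"
    return summary


if __name__ == "__main__":
    print(summarize(fetch_status()))
```

Run it on the node itself (or point STATUS_URL at the node's address) every few minutes; a block_height that stays at 0 matches the symptom described here.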

Thank you so much for your help. Considering how many times I have done all this, it is very confusing that it is not working, and it is stressing me out a bit, since this should be trivial for me at this point.


#6

As an update, this is no longer affecting the mainnet nodes. I previously thought the issue affected nodes on both testnet and mainnet, since both were hanging in the same EvaluateNetwork state, but after I redeployed a node on mainnet, the issue now only occurs on testnet.


#7

This is resolved. The issue was a missing environment variable, which needs to be added to the docs.

I added SWITCH_BH_VERSION4: 1587271 and CREP_ROOT_HASH: "0x9718f5d6d6ddb77f547ecc7113c8f1bad1bf46220512fbde356eee74a90ba47c", and now the nodes come up fine.
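A small sketch for checking a compose environment section against the two variables above before starting a TestNet node. It takes the environment as an already-parsed mapping (for example the result of PyYAML's yaml.safe_load, which is not in the standard library). The expected values are the ones posted in this thread for PREP-TestNet and may have changed since; the format check only reflects the shape of the hash posted here (0x plus 64 hex digits). TESTNET_REQUIRED and check_environment are illustrative names.

```python
import re
from collections.abc import Mapping

# Values posted in this thread for PREP-TestNet (January 2020). They are an
# assumption for any other network or date and may have changed since.
TESTNET_REQUIRED = {
    "CREP_ROOT_HASH": "0x9718f5d6d6ddb77f547ecc7113c8f1bad1bf46220512fbde356eee74a90ba47c",
    "SWITCH_BH_VERSION4": "1587271",
}

# Shape of the hash posted above: "0x" followed by 64 hex digits.
HASH_RE = re.compile(r"0x[0-9a-fA-F]{64}")


def check_environment(env: Mapping[str, object]) -> list[str]:
    """Return problems found in a compose `environment` mapping (empty if none)."""
    problems: list[str] = []
    for key, expected in TESTNET_REQUIRED.items():
        value = env.get(key)
        if value is None:
            problems.append(f"missing {key}")
        # str() so an unquoted YAML integer like 1587271 still compares equal.
        elif str(value) != expected:
            problems.append(f"{key}={value!r}, thread reports {expected!r}")
    crep = env.get("CREP_ROOT_HASH")
    if crep is not None and not HASH_RE.fullmatch(str(crep)):
        problems.append("CREP_ROOT_HASH is not 0x + 64 hex digits")
    return problems


if __name__ == "__main__":
    env = {
        "NETWORK_ENV": "PREP-TestNet",
        "FASTEST_START": "yes",
        "CREP_ROOT_HASH": TESTNET_REQUIRED["CREP_ROOT_HASH"],
    }
    for problem in check_environment(env):
        print(problem)  # prints: missing SWITCH_BH_VERSION4
```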