WCCILdata - REDU/WARNING - DataManager, recoveryTimeoutExpired, Recovery timeout expired, aborting recovery and restarting data manager

This entry explains a log message that can occur during startup in a redundant system when the recovery of the database fails. The log message is written to the PVSS_II.log file.

WCCILdata (0), 2014.09.24 10:31:14.121, REDU, WARNING, 54, Unexpected state, DataManager, recoveryTimeoutExpired, Recovery timeout expired, aborting recovery and restarting data manager



Log-message with symbolic names:
 

WCCILdata (0), <TIMESTAMP>, REDU, WARNING, 54, Unexpected state, DataManager, recoveryTimeoutExpired, Recovery timeout expired, aborting recovery and restarting data manager


The log message is written when the allowed time is exceeded on the system that is starting up and is therefore performing the passive recovery. The maximum time for the recovery of the database is defined by the following config entry in the [data] section of the config.redu file (the value is given in seconds):
passiveRecoveryTimeout = 1800

If the timeout is reached, you have to find out why this happened. Possible causes are a slow network, a hard disk with insufficient read/write performance, or a large amount of data that needs to be copied.
If you want to change the timeout, you have to do it in the config.redu file stored in your project.
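
As a sketch of such a change, the fragment below shows the entry in its section. The value 3600 is purely illustrative and not a recommendation, and the comment line assumes the usual `#` comment syntax of the config files:

```
[data]
# maximum duration of the passive recovery in seconds (illustrative value)
passiveRecoveryTimeout = 3600
```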

When the recovery is started, you will normally see the following block of log messages. In this example the timeout message is included as well:
WCCILdata (0), <TIMESTAMP>, REDU, INFO, 0, , Sending recovery request to other replica
WCCILdata (0), <TIMESTAMP>, REDU, INFO, 0, , Recovery request accepted, sending file list request
WCCILdata (0), <TIMESTAMP>, REDU, INFO, 0, , File transfer request sent
WCCILdata (0), <TIMESTAMP>, REDU, WARNING, 54, Unexpected state, DataManager, recoveryTimeoutExpired, Recovery timeout expired, aborting recovery and restarting data manager


In rare cases the recovery request is not answered correctly by the running project on the other server of the redundant system. You will then see the following block of log messages. The time between the two messages is 2 minutes.

WCCILdata (0), <TIMESTAMP>, REDU, INFO, 0, , Sending recovery request to other replica
WCCILdata (0), <TIMESTAMP>, REDU, WARNING, 54, Unexpected state, DataManager, recoveryTimeoutExpired, Recovery timeout expired, aborting recovery and restarting data manager
In that case changing the config entry has no effect, because this timeout of 2 minutes is hardcoded in the source code.
If this situation occurs, try the startup and recovery again. Normally the recovery works on the next attempt.
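
To tell the two situations apart in a long PVSS_II.log, the check can be scripted. The following is a minimal sketch, not an official tool: it assumes the comma-separated line layout and the timestamp format shown in the examples above, and the function names `parse_line` and `classify_recovery` are hypothetical.

```python
from datetime import datetime


def parse_line(line):
    """Split a log line into (timestamp, remaining text).

    Assumes the layout shown above:
    "WCCILdata (0), 2014.09.24 10:31:14.121, REDU, ..."
    """
    parts = [p.strip() for p in line.split(",")]
    ts = datetime.strptime(parts[1], "%Y.%m.%d %H:%M:%S.%f")
    return ts, ", ".join(parts[2:])


def classify_recovery(lines):
    """Return (category, seconds between request and timeout message)."""
    request_ts = None
    accepted = False
    for line in lines:
        if not line.startswith("WCCILdata") or "REDU" not in line:
            continue
        try:
            ts, text = parse_line(line)
        except (ValueError, IndexError):
            continue  # line does not match the assumed layout
        if "Sending recovery request" in text:
            request_ts, accepted = ts, False
        elif "Recovery request accepted" in text:
            accepted = True
        elif "recoveryTimeoutExpired" in text and request_ts is not None:
            elapsed = (ts - request_ts).total_seconds()
            if accepted:
                # transfer was started but exceeded passiveRecoveryTimeout
                return "passive recovery timeout", elapsed
            # request was never accepted: the fixed 2-minute case
            return "request not answered", elapsed
    return "no timeout found", None
```

Calling `classify_recovery(open("PVSS_II.log"))` returns the category together with the elapsed seconds. A result of "request not answered" with roughly 120 seconds points to the hardcoded timeout, where changing the config entry does not help.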


The following FAQ entry describes how to check the hardware performance for the recovery:
portal.etm.at/index.php
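
Before measuring the hardware, a rough plausibility check can show whether the configured timeout is realistic at all. The sketch below assumes that the recovery time is dominated by copying the database at a constant effective throughput (the slower of network and disk). The function name and all numbers are hypothetical and have to be replaced by values from your own system.

```python
def recovery_fits_timeout(data_mb, throughput_mb_per_s, timeout_s=1800):
    """Estimate the copy time and compare it with the recovery timeout.

    data_mb             -- amount of data to copy in MB (assumed known)
    throughput_mb_per_s -- effective throughput in MB/s (assumed constant)
    timeout_s           -- value of passiveRecoveryTimeout in seconds
    Returns (estimated seconds, True if the estimate is below the timeout).
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    estimated_s = data_mb / throughput_mb_per_s
    return estimated_s, estimated_s < timeout_s
```

For example, 36000 MB at an effective 10 MB/s gives an estimate of 3600 seconds, which would not fit into a timeout of 1800 seconds. In such a case either the throughput has to be improved or the timeout raised.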

Tags:
Redundancy