WCCILevent - REDU/SEVERE - EventManager, evMain, Redundant peer recovery timeout - aborting recovery (passive recovery)

This article explains a log message that can occur during startup in a redundant system when the recovery of the event manager fails. The log message is written to the PVSS_II.log file.

WCCILevent (0), 2014.09.24 14:13:58.482, REDU, SEVERE, 54, Unexpected state, EventManager, evMain, Redundant peer recovery timeout - aborting recovery

Log message with symbolic names:

WCCILevent (0), <TIMESTAMP>, REDU, SEVERE, 54, Unexpected state, EventManager, evMain, Redundant peer recovery timeout - aborting recovery

The log message is written on the system that is starting up, and is therefore performing the passive recovery, when the allowed recovery time is exceeded. The maximum recovery time is defined by the following config entry in the [event] section of the config.redu file (the value is in seconds):

passiveRecoveryTimeout = 300

The timer starts when the initialization of the event manager is finished. Within the timeout, the event manager has to establish the connection to the other server and exchange the buffered data.

The timeout can be reached when a large amount of buffered data has to be exchanged or when the network is slow.

To change the timeout, set the entry in the config.redu file stored in your project.
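
As a sketch, a raised timeout in the project's config.redu could look like the fragment below. The value 600 is only an illustrative assumption, not a recommendation; choose a value that fits the amount of buffered data and the network speed of your system.

```
[event]
passiveRecoveryTimeout = 600
```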

If the timeout is reached, you’ll see the following block of log messages. They show that the local system is becoming active and that the data manager is also aborting the recovery:

WCCILevent (0), <TIMESTAMP>, REDU, SEVERE, 54, Unexpected state, EventManager, evMain, Redundant peer recovery timeout - aborting recovery
WCCILevent (0), <TIMESTAMP>, REDU, INFO, 0, , Status change to active requested, but we are still recovering. Change status ASAP
WCCILdata (0), <TIMESTAMP>, REDU, WARNING, 0, , Recovery request aborted from event.
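
To check how often the timeout has occurred, the log can be scanned for the message text. The following Python sketch is not part of the product; the function name find_recovery_timeouts and the regular expression are assumptions derived only from the sample log line shown in this article, so other log entries may use a different field layout.

```python
import re
from datetime import datetime

# Message text taken from the log line described in this article.
TIMEOUT_TEXT = "Redundant peer recovery timeout - aborting recovery"

# Assumed layout, based on the sample line only:
# <manager> (<num>), <timestamp>, <category>, <severity>, ...
LINE_RE = re.compile(
    r"^(?P<manager>\S+)\s*\((?P<num>\d+)\),\s*"
    r"(?P<timestamp>\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2}\.\d{3}),\s*"
    r"(?P<category>\w+),\s*(?P<severity>\w+),"
)


def find_recovery_timeouts(lines):
    """Return the timestamps of all REDU recovery-timeout entries in lines."""
    hits = []
    for line in lines:
        if TIMEOUT_TEXT not in line:
            continue
        match = LINE_RE.match(line)
        if match and match.group("category") == "REDU":
            # %f accepts the three-digit millisecond part of the timestamp.
            hits.append(
                datetime.strptime(match.group("timestamp"), "%Y.%m.%d %H:%M:%S.%f")
            )
    return hits
```

The function accepts any iterable of strings, so an open file handle for PVSS_II.log can be passed directly. Lines that contain only the <TIMESTAMP> placeholder instead of a real timestamp are skipped.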

In the project on the other server you will see a corresponding block of log messages. The first message shows that the recovery abort from the other server was received. The data manager passes this information to the event manager and closes the connection to the data manager on the other server:

WCCILdata (0), <TIMESTAMP>, REDU, WARNING, 0, , Recovery aborted from data.
WCCILdata (0), <TIMESTAMP>, SYS, INFO, 181, Closing connection to (SYS: 0 Data -num 0 CONN: 1)
WCCILdata (0), <TIMESTAMP>, SYS, INFO, 39, Connection lost, MAN: (SYS: 0 Data -num 0 CONN: 1), Connection closed
WCCILevent (0), <TIMESTAMP>, REDU, SEVERE, 54, Unexpected state, EvManager, doReceive, Active recovery aborted with RECOVERY_ABORT message