Failure behavior and switching criteria

The redundancy manager monitors the redundancy state (which computer is active and which is passive) on both computers. It is started before the project-specific managers (drivers, CTRL, etc.). The redundancy manager also monitors the error state of both systems. The errors are configured with a weighting in the system overview panel (please refer to System overview in redundant systems). The error state is determined during initialization and is updated continuously (the optimum state is 0). Monitoring can be configured for all managers, TCP connections, selected data point elements, working memory and hard disk capacity.

The following priorities determine the active/passive state in a redundant system (on a switch, the passive computer becomes active and the active computer becomes passive):

Priority 1: Failure:

Complete failure of a computer, or neither of the redundant network connections is available. If the redundant network connections fail completely, control can no longer be switched. In this case both servers are active.
Priority 2: Manually forced control (set active):

This makes it possible to switch to a computer regardless of the lower priorities, provided this is still possible in terms of hardware and software. This priority counts as a switchover of control: the desired system becomes active immediately, independent of the error state.
Priority 3: Different error state:

Communication failures of managers, partial failure of a computer (hardware or software). The system switches to the computer with the lower error state.
Priority 4: Defining the priority:

This priority allows the active computer to be changed (switched) manually. The switch applies only if both computers run error-free or have the same error state.
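
The priority order above can be read as a decision function. The following is a minimal sketch, not the redundancy manager's actual logic: the names Host, select_active, forced_active and preferred are hypothetical, and the case in which both network connections fail (both servers active) is not modeled because each side then decides on its own.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    reachable: bool = True       # False models a complete failure (priority 1)
    error_state: int = 0         # weighted error sum, 0 is the optimum
    forced_active: bool = False  # manual "set active" (priority 2)
    preferred: bool = False      # manually defined priority (priority 4)


def select_active(current: Host, partner: Host) -> Host:
    """Return the host that should be active; `current` is active right now."""
    # Priority 1: a completely failed host cannot stay or become active.
    if not current.reachable:
        return partner
    if not partner.reachable:
        return current
    # Priority 2: a manual "set active" wins regardless of the error state.
    if partner.forced_active:
        return partner
    if current.forced_active:
        return current
    # Priority 3: the host with the lower error state takes over.
    if partner.error_state != current.error_state:
        return min(current, partner, key=lambda h: h.error_state)
    # Priority 4: with equal error states, the manual preference decides.
    if partner.preferred and not current.preferred:
        return partner
    return current
```

For example, with server 1 active at error state 5 and server 2 passive at error state 2, the sketch selects server 2 unless server 1 has been forced active.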

Note:

After a switch (active/passive) in the redundancy mode, a general query is initiated by the driver automatically!

If one or more of the above switchover criteria are met and automatic switchover has been configured or a switchover is forced by the user, operational control is transferred to the redundancy partner. For an automatic switchover this is the partner with the lower error state. Only the data of the active partner is displayed on both UIs. The passive system is limited to the alignment of the process data.

The redundancy works independently and does not depend on user inputs and responses. However, certain inputs from users are accepted (please refer to priorities 2 and 4). Manual switching triggered by the user has to be executed in the system overview panel.

The following responses are triggered when certain managers fail:

A complete restart of the project and a recovery are executed when the data, archive, event or redundancy manager fails.
Note:
The failure of the archive manager only applies to value archives. If an NGA or an RDB manager fails, the project is not restarted.
For all other managers, either the manager is restarted or no action is taken, depending on the configuration in the console.
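
The responses listed above can be summarized in a small lookup. This is an illustrative sketch only: the manager labels, the return strings and the restart_configured flag are hypothetical stand-ins for the console configuration.

```python
# Managers whose failure triggers a full project restart and recovery.
# "value_archive" reflects that only value archives count, not NGA or RDB.
CRITICAL_MANAGERS = {"data", "value_archive", "event", "redundancy"}


def failure_response(manager: str, restart_configured: bool) -> str:
    """Return the reaction to a failed manager (labels are illustrative)."""
    if manager in CRITICAL_MANAGERS:
        return "restart_project_and_recover"
    # All other managers follow the configuration in the console.
    if restart_configured:
        return "restart_manager"
    return "no_action"
```

In this sketch a failed NGA or RDB manager is treated like any other non-critical manager and never causes a project restart.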

Note:

The reaction of each manager depends on the start type set in the console (please refer to Administration of managers). The start type of the data, archive, event and redundancy managers is set to "always" by default and cannot be changed, in order to guarantee proper operation in the redundancy case!

Note:

If the redundancy partners of a redundant project lose the connection to each other, both WinCC OA projects become active. After the connection is re-established, the system stops and restarts the project with the higher error state. With the config entry [calcstate] useOfflineErrorstateInfo, the maximum offline error state can also be taken into account when calculating the error state.
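
The reconnect rule can be sketched as follows. The function project_to_stop and its parameters are hypothetical. How the offline maximum enters the calculation is an assumption here (the larger of the current and the offline value is used), and the behavior for equal error states is not described in the text, so the sketch returns None in that case.

```python
from typing import Optional


def project_to_stop(state_1: int, state_2: int,
                    offline_max_1: int = 0, offline_max_2: int = 0,
                    use_offline_info: bool = False) -> Optional[int]:
    """Return 1 or 2 for the server whose project is stopped and restarted."""
    if use_offline_info:
        # Assumption: the worst error state seen while disconnected counts too.
        state_1 = max(state_1, offline_max_1)
        state_2 = max(state_2, offline_max_2)
    if state_1 == state_2:
        return None  # tie handling is not specified in the text
    return 1 if state_1 > state_2 else 2
```

With the offline information enabled, a server that has already recovered from a severe error during the disconnection can still be the one that is restarted.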

Note:

With the config entry [redu] firstActiveChangeInterval the first redundancy switchover after a connection error (e.g. switch failure) can be delayed by the specified time.

If the connection of the data managers to each other fails during database recovery in the startup phase of a server, the starting project is stopped and restarted.

If the recovery of the database has already been completed, the project continues to boot and automatically becomes active after passiveRecoveryTimeout (Event) expires.

CAUTION:

All statistical functions with a calculation interval greater than the normal project start time must be initialized with the initialize calculation from archive option to get correct values after a server restart.

Note:

UIs started locally on the server must use a fixed manager number. This manager number must be lower than the value defined by the config entry [general] lowestAutoManNumUI.

Example: A UI with the number 8 is started locally on server 1. If server 1 loses the network connection, server 2 loses the connection to the UI with the number 8 and releases this number again. If a new UI is then started on a client, server 2 assigns the number 8 to it (server 1 is not reachable via the network). When the servers re-establish the connection, the client UI with the number 8 cannot connect to server 1, because the UI number 8 is already used by the local UI on server 1.
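
The number conflict described above can be replayed in a small simulation. This is a sketch under stated assumptions: the function simulate is hypothetical, and it assumes that automatic assignment hands out the lowest free number starting at the value of lowestAutoManNumUI.

```python
def simulate(local_ui_num: int, lowest_auto_num: int) -> bool:
    """Return True if the client UI can connect to server 1 after reconnect."""
    server1_used = {local_ui_num}  # the local UI stays connected to server 1
    server2_used = {local_ui_num}  # and is also known to server 2

    # Server 1 loses the network: server 2 releases the local UI's number.
    server2_used.discard(local_ui_num)

    # A new client UI starts; only server 2 is reachable and assigns a number.
    # Assumption: the lowest free number >= lowest_auto_num is chosen.
    client_num = lowest_auto_num
    while client_num in server2_used:
        client_num += 1
    server2_used.add(client_num)

    # The network is restored: the client UI now also connects to server 1.
    return client_num not in server1_used
```

Calling simulate(8, 8) reproduces the conflict and returns False, while simulate(5, 8) keeps the local UI below the automatic range and returns True.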