Step 3 – Configuration of the Patroni Cluster

Set up the Patroni cluster with the provided configuration file template and the PostgreSQL® binaries from the WinCC OA installation. Adapt the configuration to your project needs. For example, calculate and configure the disk space for WAL retention to enable fast recovery after a node failure.

1. Create the folder structure for the PostgreSQL® database

  1. Determine the major version number of PostgreSQL® from the PostgreSQL® binaries in the WinCC OA installation by running postgres.exe --version, e.g.
    C:\><WinCC OA Installation>\bin\pgsql\bin\postgres.exe --version
    postgres (PostgreSQL) <major version number>.<minor version number>
  2. Create the directory C:\postgres\<major version number of PostgreSQL>. It will contain the data directory of the PostgreSQL® database. This directory structure is essential for future major version upgrades of the PostgreSQL® database.
  3. Grant the user pes full access to the created directory structure by running:
    icacls C:\postgres /q /c /t /grant pes:F
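The version lookup in sub-step 1 can also be scripted. A minimal sketch, assuming the version output has the form shown above (postgres (PostgreSQL) <major version number>.<minor version number>); the helper major_version is hypothetical:

```python
import re

def major_version(version_output: str) -> str:
    """Extract the major version from `postgres.exe --version` output,
    e.g. 'postgres (PostgreSQL) 17.4' -> '17'."""
    match = re.search(r"\(PostgreSQL\)\s+(\d+)", version_output)
    if match is None:
        raise ValueError(f"unexpected version output: {version_output!r}")
    return match.group(1)

# On a database server you would feed in the real output, e.g.:
#   subprocess.run([r"<WinCC OA Installation>\bin\pgsql\bin\postgres.exe",
#                   "--version"], capture_output=True, text=True).stdout
print(major_version("postgres (PostgreSQL) 17.4"))  # -> 17
```

The result is the directory name to create, e.g. C:\postgres\17.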

2. Calculate the amount of dedicated Disk Space used for retaining WAL Files

To achieve fast recovery of a database server after a network partition or a node failure, WAL files must be retained to cover the time period from the point of failure up to the current time. These retained WAL files are streamed from the primary server to the recovering database server, allowing it to catch up and become a synchronized standby node again. This WAL retention mechanism avoids the need for a resource- and time-intensive base backup using pg_basebackup.

However, it does require additional disk space, and the time period covered for fast recovery is limited by the available or reserved disk space.

The amount of dedicated disk space (Patroni settings wal_keep_size and max_wal_size) needed for WAL file retention depends on several factors:

  • The project's requirements for how much outage time should be efficiently recoverable (covered_outage_time).
  • The project's throughput, specifically the number of archived values per second (number_of_archived_values/s) and the number of alarms per second (number_of_alarms/s).
  • The size of each value and alarm in the WAL file format, which are 390 bytes per value and 1665 bytes per alarm, respectively. These sizes were determined through measurements on a test system.

    Based on the factors mentioned above, the wal_keep_size is calculated as follows:

    wal_keep_size = covered_outage_time * (number_of_alarms/s * 1665 bytes/alarm + number_of_archived_values/s * 390 bytes/value)
  • Conversely, the outage time that can be efficiently recovered from a given amount of disk space is
    covered_outage_time = wal_keep_size / (number_of_alarms/s * 1665 bytes/alarm + number_of_archived_values/s * 390 bytes/value)
    In addition to wal_keep_size, PostgreSQL® requires a second parameter, max_wal_size, which serves as a high-water mark. It is recommended to set it slightly larger, e.g. by 10%:
    max_wal_size = wal_keep_size * 1.1
  • It is important that the dedicated disk space is at least as large as max_wal_size. This space should always be reserved exclusively for storing WAL files. To ensure this, it is recommended to use a separate drive, as explained in a later step.
  • When max_wal_size is exceeded and a checkpoint occurs in the PostgreSQL® database, old WAL files are deleted to keep disk usage within the max_wal_size limit. The frequency at which outdated WAL files are deleted is controlled by the checkpoint_timeout parameter. If WAL files that a disconnected standby node still needs have been deleted, that standby node must be reinitialized using:
    C:\PES>patronictl.bat reinit pgcluster
  • This type of recovery uses pg_basebackup, which can be resource-intensive for both the standby node and the leading node and, depending on the size of the database, may take a long time to complete. Balance the disk space allocated for retaining WAL files against the risk of a resource-intensive pg_basebackup recovery once the required WAL files have been deleted, and consider this trade-off when planning your storage and backup strategy.
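As a sanity check, the sizing formulas above can be evaluated numerically. The throughput figures below are hypothetical examples, not recommendations:

```python
def wal_sizing(covered_outage_time_s: float,
               values_per_s: float,
               alarms_per_s: float) -> tuple[float, float]:
    """Return (wal_keep_size, max_wal_size) in bytes, using the formulas
    above: 390 bytes per archived value, 1665 bytes per alarm, and
    max_wal_size 10% above wal_keep_size."""
    wal_keep_size = covered_outage_time_s * (
        alarms_per_s * 1665 + values_per_s * 390)
    return wal_keep_size, wal_keep_size * 1.1

# Hypothetical project figures: 1000 archived values/s, 10 alarms/s,
# and 4 hours (= 14400 s) of covered outage time.
keep, max_wal = wal_sizing(4 * 3600, values_per_s=1000, alarms_per_s=10)
print(f"wal_keep_size ~ {keep / 1e9:.1f} GB, max_wal_size ~ {max_wal / 1e9:.1f} GB")
# -> wal_keep_size ~ 5.9 GB, max_wal_size ~ 6.4 GB
```

With these figures, roughly 6.5 GB of dedicated disk space would be needed for WAL retention.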

3. Configure Patroni on Database Servers 1 and 2

  1. Create the Patroni configuration file
    • Copy the configuration file template <WinCC OA Installation>\data\NGA\PostgreSQL\HAC\patroni.template.yaml to C:\PES\patroni\patroni.yaml.
    • Edit the file and replace the placeholders with the appropriate values as indicated below:
      # <this server's num> is either 1 or 2 for Database Server 1 or Database Server 2, respectively
      name: patroni-<this server's num>
      
      restapi:
        # IP address of local Database Server
        connect_address: <IP of patroni-<this server's num>>:8008
              
      etcd3:
        hosts:
          - <IP of etcd-1>:2379
          - <IP of etcd-2>:2379
          - <IP of etcd-3>:2379
      
      bootstrap:
        dcs:
          postgresql:
            parameters:
              wal_keep_size: <calculated wal_keep_size, e.g. '10GB'>
              max_wal_size: <calculated max_wal_size, e.g. '11GB'>
              checkpoint_timeout: <preferred checkpoint_timeout, e.g. '2min'>
            pg_hba:
              - host replication replicator <IP of patroni-1>/32 scram-sha-256 # Allow Database Server 1 to replicate
              - host replication replicator <IP of patroni-2>/32 scram-sha-256 # Allow Database Server 2 to replicate
              - host all all <IP of patroni-1>/32 scram-sha-256 # Allow Database Server 1 to connect
              - host all all <IP of patroni-2>/32 scram-sha-256 # Allow Database Server 2 to connect
              - host winccoa etm <IP of WinCC OA Server 1>/32 scram-sha-256 # Allow WinCC_OA Server 1 to connect
              - host winccoa etm <IP of WinCC OA Server 2>/32 scram-sha-256 # Allow WinCC_OA Server 2 to connect
              # add direct read clients as needed:
              #- host winccoa etm <IP of direct read client>/32 scram-sha-256
      
      postgresql:
        connect_address: <IP of patroni-<this server's num>>[, <IP of 2nd network card of patroni-<this server's num>>]:5432 # Optional 2nd network address for higher availability
        data_dir: 'C:\postgres\<major version number of PostgreSQL>\data' # put the major version of PostgreSQL in the path (postgres.exe --version), necessary for future major version upgrades
        bin_dir: '<WinCC OA installation>\bin\pgsql\bin' # use the PostgreSQL binaries from the WinCC OA installation
        authentication:
          replication:
            password: <replicator password> # choose a safe password
          superuser:
            password: <postgres password>   # choose another safe password                                
      Assuming the IP addresses are:
      192.168.178.88 for Database Server 1 (patroni-1, etcd-1),
      192.168.178.89 for Database Server 2 (patroni-2, etcd-2),
      192.168.178.90 for Dedicated etcd Server (etcd-3),
      192.168.178.91 for WinCC OA Server 1,
      192.168.178.92 for WinCC OA Server 2

      With the database name winccoa and the user etm, the file C:\PES\patroni\patroni.yaml would look like this for Database Server 2:

      scope: pgcluster
      namespace: /service/
      name: patroni-2
      
      restapi:
        listen: 0.0.0.0:8008
        connect_address: 192.168.178.89:8008
      
      etcd3:
        hosts:
          - 192.168.178.88:2379
          - 192.168.178.89:2379
          - 192.168.178.90:2379
      
      bootstrap:
        dcs:
          ttl: 30
          loop_wait: 10
          retry_timeout: 10
          maximum_lag_on_failover: 0
          synchronous_mode: on
          synchronous_node_count: 1
          postgresql:
            create_replica_methods:
              - basebackup
            remove_data_directory_on_rewind_failure: true
            remove_data_directory_on_diverged_timelines: true
            use_pg_rewind: true
            use_slots: false
            parameters:
              logging_collector: true
              log_directory: log
              log_filename: postgresql.log
              wal_keep_size: '10GB'
              max_wal_size: '11GB'
              checkpoint_timeout: '2min'
              synchronous_commit: remote_apply
            pg_hba:
              - local all all scram-sha-256
              - local replication all scram-sha-256
              - host replication replicator 127.0.0.1/32 scram-sha-256
              - host replication replicator ::1/128 scram-sha-256
              - host replication replicator 192.168.178.88/32 scram-sha-256
              - host replication replicator 192.168.178.89/32 scram-sha-256
              - host all all 127.0.0.1/32 scram-sha-256
              - host all all ::1/128 scram-sha-256
              - host all all 192.168.178.88/32 scram-sha-256
              - host all all 192.168.178.89/32 scram-sha-256
              - host winccoa etm 192.168.178.91/32 scram-sha-256
              - host winccoa etm 192.168.178.92/32 scram-sha-256
      
        initdb: 
          - encoding: UTF8
          - data-checksums
      
      postgresql:
        listen: 0.0.0.0:5432
        connect_address: 192.168.178.89:5432
        data_dir: 'C:\postgres\17\data'
        bin_dir: 'C:\Program Files\Siemens\WinCC_OA\3.21\bin\pgsql\bin'
        authentication:
          replication:
            username: replicator
            password: Ex-Pwd-94#vXp7*TgRz-DoNotUse
          superuser:
            username: postgres
            password: Ex-Pwd-g7T#k9Z2@QxLmB-DoNotUse
      
      tags:
        nofailover: false
        noloadbalance: false
        clonefrom: false
        nosync: false
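Before starting Patroni, it can be useful to verify that no <...> placeholders from the template remain in the edited C:\PES\patroni\patroni.yaml. A minimal sketch; the helper name is hypothetical:

```python
import re

PLACEHOLDER = re.compile(r"<[^<>\n]*>")  # e.g. <IP of patroni-1>

def remaining_placeholders(text: str) -> list[str]:
    """Return all unreplaced <...> template placeholders in the config text."""
    return PLACEHOLDER.findall(text)

sample = "connect_address: <IP of patroni-2>:8008\nname: patroni-2\n"
print(remaining_placeholders(sample))  # -> ['<IP of patroni-2>']
# On a database server, check the real file instead:
#   remaining_placeholders(Path(r"C:\PES\patroni\patroni.yaml").read_text(encoding="utf-8"))
```

An empty result means every placeholder has been replaced.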

4. Install the Patroni Service and start the Patroni Cluster

  1. To install the Patroni service, run the following command on both database servers:
    C:\PES\patroni\patroni_service.exe install
  2. Start the Patroni service (Patroni HA Windows Service) in the Task Manager on Database Server 1 to bootstrap the first Patroni node.

    Next, repeatedly run the 'patronictl list' command to check for diagnostics and monitor the database initialization on Database Server 1. For more information, see also https://patroni.readthedocs.io/en/latest/patronictl.html

    C:\PES>patronictl.bat list

    The output of the 'patronictl list' command changes during the different stages of initializing Patroni and the PostgreSQL® database. It starts as follows:

    + Cluster: pgcluster (initializing) -----+---------+----+-----------+
    | Member      | Host           | Role    | State   | TL | Lag in MB |
    +-------------+----------------+---------+---------+----+-----------+
    | patroni-1   | 192.168.178.88 | Replica | stopped |    |   unknown |
    +-------------+----------------+---------+---------+----+-----------+

    and continues with:

    + Cluster: pgcluster (initializing) -----+--------------------------+----+-----------+
    | Member      | Host           | Role    | State                    | TL | Lag in MB |
    +-------------+----------------+---------+--------------------------+----+-----------+
    | patroni-1   | 192.168.178.88 | Replica | initializing new cluster |    |   unknown |
    +-------------+----------------+---------+--------------------------+----+-----------+

    and should finally read:

    + Cluster: pgcluster (7524258301090557228) --------+----+-----------+
    | Member      | Host           | Role    | State   | TL | Lag in MB |
    +-------------+----------------+---------+---------+----+-----------+
    | patroni-1   | 192.168.178.88 | Leader  | running |  1 |           |
    +-------------+----------------+---------+---------+----+-----------+

    If the database successfully initializes and starts on Database Server 1, follow these steps:

  3. Open the Task Manager on Database Server 2 and start the Patroni service (Patroni HA Windows Service).

    Then, repeatedly run the 'patronictl list' command to monitor the database replication process on Database Server 2:

    C:\PES>patronictl.bat list

    The output of the 'patronictl list' command changes during the different stages of initializing Patroni and the PostgreSQL® database. It starts as follows:

    + Cluster: pgcluster (7524259526883188064) --------+----+-----------+
    | Member      | Host           | Role    | State   | TL | Lag in MB |
    +-------------+----------------+---------+---------+----+-----------+
    | patroni-1   | 192.168.178.88 | Leader  | running |  1 |           |
    | patroni-2   | 192.168.178.89 | Replica | stopped |    |   unknown |
    +-------------+----------------+---------+---------+----+-----------+

    and continues with:

    + Cluster: pgcluster (7524259526883188064) -----------------+----+-----------+
    | Member      | Host           | Role    | State            | TL | Lag in MB |
    +-------------+----------------+---------+------------------+----+-----------+
    | patroni-1   | 192.168.178.88 | Leader  | running          |  1 |           |
    | patroni-2   | 192.168.178.89 | Replica | creating replica |    |   unknown |
    +-------------+----------------+---------+------------------+----+-----------+

    and should finally read:

    + Cluster: pgcluster (7524259526883188064) ---+-----------+----+-----------+
    | Member      | Host           | Role         | State     | TL | Lag in MB |
    +-------------+----------------+--------------+-----------+----+-----------+
    | patroni-1   | 192.168.178.88 | Leader       | running   |  1 |           |
    | patroni-2   | 192.168.178.89 | Sync Standby | streaming |  1 |         0 |
    +-------------+----------------+--------------+-----------+----+-----------+

    After successful replication and startup of the database on Database Server 2, it can sometimes be helpful to perform a switchover. A switchover promotes the Sync Standby to the Leader, making the Sync Standby's PostgreSQL® database the primary database.

Switchover

You can trigger the switchover by calling:
C:\PES>patronictl.bat switchover

This operation prompts for additional input. If there are only two nodes, as in the HAC, you can accept all the suggested values by pressing Enter and then confirm with 'y' at the end:

+ Cluster: pgcluster (7524259526883188064) ---+-----------+----+-----------+
| Member      | Host           | Role         | State     | TL | Lag in MB |
+-------------+----------------+--------------+-----------+----+-----------+
| patroni-1   | 192.168.178.88 | Leader       | running   |  1 |           |
| patroni-2   | 192.168.178.89 | Sync Standby | streaming |  1 |         0 |
+-------------+----------------+--------------+-----------+----+-----------+
Primary [patroni-1]:
Candidate ['patroni-2'] []:
When should the switchover take place (e.g. 2025-07-07T13:21 )  [now]:
Are you sure you want to switchover cluster pgcluster, demoting current leader patroni-1? [y/N]: y
2025-07-07 12:21:39.27450 Successfully switched over to "patroni-2"
+ Cluster: pgcluster (7524259526883188064) --------+----+-----------+
| Member      | Host           | Role    | State   | TL | Lag in MB |
+-------------+----------------+---------+---------+----+-----------+
| patroni-1   | 192.168.178.88 | Replica | stopped |    |   unknown |
| patroni-2   | 192.168.178.89 | Leader  | running |  1 |           |
+-------------+----------------+---------+---------+----+-----------+

After the switchover is triggered, you can monitor the progress of the switchover and the cluster state using the patronictl list command.

The switchover is complete when the new Leader has the state running and the new Sync Standby is streaming.

+ Cluster: pgcluster (7524259526883188064) ---+-----------+----+-----------+
| Member      | Host           | Role         | State     | TL | Lag in MB |
+-------------+----------------+--------------+-----------+----+-----------+
| patroni-1   | 192.168.178.88 | Sync Standby | streaming |  2 |         0 |
| patroni-2   | 192.168.178.89 | Leader       | running   |  2 |           |
+-------------+----------------+--------------+-----------+----+-----------+
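Besides patronictl list, the cluster state can also be queried over Patroni's REST API on the connect_address port 8008 configured above; GET /cluster returns a JSON document with a members list. A minimal sketch; the sample response below is abridged to the fields used here:

```python
def summarize(cluster: dict) -> list[str]:
    """Render one 'name role state' line per member of a GET /cluster response."""
    return [f"{m['name']} {m.get('role', '?')} {m.get('state', '?')}"
            for m in cluster.get("members", [])]

# A live query would look like (IP and port from the configuration above):
#   import json, urllib.request
#   cluster = json.load(urllib.request.urlopen(
#       "http://192.168.178.88:8008/cluster", timeout=5))
sample = {"members": [
    {"name": "patroni-1", "role": "leader", "state": "running"},
    {"name": "patroni-2", "role": "sync_standby", "state": "streaming"},
]}
print(*summarize(sample), sep="\n")
```

This can serve as a building block for external monitoring of the cluster.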

5. Change the DCS-stored Runtime Configuration

After the first Patroni node is bootstrapped on Database Server 1, as described above, all settings under bootstrap.dcs in the patroni.yaml file are saved in the etcd cluster. From this point forward, the bootstrap.dcs section in patroni.yaml is no longer used for the lifetime of the etcd storage, unless you explicitly clear it to re-bootstrap the Patroni cluster. This is typically required during a major PostgreSQL® version upgrade. You can use patronictl to display the settings stored in the DCS:
C:\PES>patronictl.bat show-config pgcluster
and to modify them via:
C:\PES>patronictl.bat edit-config pgcluster
  1. When you use the edit-config option, the Micro editor opens, allowing you to modify the configuration stored in DCS. After making your changes, save the file by pressing Ctrl+S and exit the editor with Ctrl+Q. A summary of your changes (diff) will be shown, and you will be prompted to confirm whether the changes are correct and should be applied. In the example below, an additional host is granted access to the PostgreSQL® database. This could be, for instance, a host running a WinCC OA UI that uses direct read access to retrieve historical data.
    ---
    +++
    @@ -27,6 +27,7 @@
    - host winccoa etm 192.168.178.91/32 scram-sha-256
    - host winccoa etm 192.168.178.92/32 scram-sha-256
    +  - host winccoa etm 192.168.178.93/32 scram-sha-256
    remove_data_directory_on_diverged_timelines: true
    remove_data_directory_on_rewind_failure: true
    
    Apply these changes? [y/N]: y
  2. After confirming the changes, the new configuration will be saved in the DCS, and each Patroni node will apply the updated settings locally. Although the configuration is stored in the DCS, it is recommended to keep the bootstrap.dcs section in one of the patroni.yaml files up to date. This is important in case you need to bootstrap the cluster again, such as during a major PostgreSQL® version upgrade.
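If several direct read clients need access, the additional pg_hba entries for edit-config can be generated instead of typed by hand. A small sketch using the database name and user from the example configuration; the helper name and client IPs are hypothetical:

```python
def pg_hba_entries(client_ips: list[str],
                   database: str = "winccoa", user: str = "etm") -> list[str]:
    """Build one pg_hba host rule per direct read client."""
    return [f"host {database} {user} {ip}/32 scram-sha-256" for ip in client_ips]

# Example client IPs; paste the output into the pg_hba list in edit-config.
for line in pg_hba_entries(["192.168.178.93", "192.168.178.94"]):
    print("  -", line)
```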

6. Move the pg_wal Directory to a dedicated Drive (recommended but optional)

Storing the WAL on a separate disk from the main database files can improve performance. To do this, move the pg_wal directory to a new location while the server is shut down. Then, create a symbolic link from the original location in the main data directory to the new location. For more details, refer to https://www.postgresql.org/docs/current/wal-internals.html.

Follow these steps on each Patroni node, one node at a time:

  1. Stop the Patroni service using the Task Manager. This will shut down the PostgreSQL® database.
  2. Move the folder C:\postgres\<major version number of PostgreSQL>\data\pg_wal to the dedicated drive, for example, W:.
  3. Create a symbolic link to W:\pg_wal by running the following command:
    mklink /D "C:\postgres\<major version number of PostgreSQL>\data\pg_wal" "W:\pg_wal"
  4. Start the Patroni service in the Task Manager and wait until it reaches the Sync Standby role.
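After moving the directory, the link can be double-checked before restarting Patroni. A minimal sketch; the path follows the example in the steps above:

```python
from pathlib import Path

def check_pg_wal_link(data_dir: str) -> str:
    """Report whether pg_wal inside data_dir is a symbolic link and where it points."""
    pg_wal = Path(data_dir) / "pg_wal"
    if pg_wal.is_symlink():
        return f"{pg_wal} -> {pg_wal.readlink()}"
    if pg_wal.is_dir():
        return f"{pg_wal} is a plain directory (pg_wal was not moved)"
    return f"{pg_wal} is missing"

# Example path from step 1 above; adjust the major version number.
print(check_pg_wal_link(r"C:\postgres\17\data"))
```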