Step 3 – Configuration of the Patroni Cluster

Set up the Patroni cluster with the provided configuration file template and the PostgreSQL® binaries from the WinCC OA installation. Adapt the configuration to your project needs. For example, calculate and configure the disk space for WAL retention to enable fast recovery after a node failure.

1. Create the folder structure for the PostgreSQL® database

  1. Determine the major version number of the PostgreSQL® binaries installed by the WinCC OA setup by running:
    psql --version
  2. Create the directory /var/lib/postgresql/<major version number of PostgreSQL®>. It will contain the data directory of the PostgreSQL® database. This directory structure is essential for future major version upgrades of the PostgreSQL® database.
  3. The /var/lib/postgresql directory and all its subdirectories must be owned by the postgres user and the postgres group. To achieve this, execute:
    sudo chown -R postgres /var/lib/postgresql
    sudo chgrp -R postgres /var/lib/postgresql
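
  Optionally, these sub-steps can be scripted. The sketch below derives the target directory name from the psql version banner; the fallback banner `psql (PostgreSQL) 17.4` is only illustrative (in case psql is not on the PATH), and the privileged commands are shown as comments:

  ```shell
  # Derive the major version from the psql banner; fall back to an
  # illustrative banner so the sketch stays runnable without psql.
  PSQL_BANNER="$(psql --version 2>/dev/null || echo 'psql (PostgreSQL) 17.4')"
  PG_MAJOR="$(echo "$PSQL_BANNER" | grep -oE '[0-9]+' | head -n 1)"
  PGDATA_PARENT="/var/lib/postgresql/${PG_MAJOR}"
  echo "$PGDATA_PARENT"
  # Then create the directory and hand it to the postgres user:
  #   sudo mkdir -p "$PGDATA_PARENT"
  #   sudo chown -R postgres:postgres /var/lib/postgresql
  ```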

2. Calculate the amount of dedicated Disk Space used for retaining WAL Files

To enable fast recovery of a database server after a network partition or node failure, WAL files must be retained to cover the period from the current time back to the point of failure. These retained WAL files are streamed from the primary server to the recovering database server, allowing it to catch up and become a synchronized standby node again. This WAL retention mechanism avoids a resource- and time-intensive base backup using pg_basebackup. However, it requires additional disk space, and the outage period covered by fast recovery is limited by the available or reserved disk space.

The dedicated disk space (Patroni settings wal_keep_size and max_wal_size) for WAL file retention must be determined based on the following factors:

  • The project's requirements for efficiently recoverable outage time (covered_outage_time).
  • The project's throughput, including the number of archived values per second (number_of_archived_values/s) and the number of alarms per second (number_of_alarms/s).
  • The size of each value and alarm in the WAL file format, which are 390 bytes per value and 1665 bytes per alarm, respectively. These sizes were determined through measurements on a test system.

    Based on the factors mentioned above, the wal_keep_size is calculated as follows:

      wal_keep_size = covered_outage_time * (number_of_alarms/s * 1665 bytes/alarm + number_of_archived_values/s * 390 bytes/value)
  • Conversely, the efficiently recoverable outage time for a given amount of disk space is:
    covered_outage_time = wal_keep_size / (number_of_alarms/s * 1665 bytes/alarm + number_of_archived_values/s * 390 bytes/value)
  • PostgreSQL® requires a second parameter in addition to wal_keep_size. This parameter, called max_wal_size, acts as an additional high-water mark. We recommend setting max_wal_size to be slightly larger than wal_keep_size, for example by about 10%.
    max_wal_size = wal_keep_size * 1.1
  • The dedicated disk space must be at least as large as max_wal_size and must always be reserved exclusively for storing WAL files. To ensure this, it is recommended to use a separate drive, as explained in a later step. When max_wal_size is exceeded and a checkpoint occurs in the PostgreSQL® database, old WAL files are deleted to maintain the max_wal_size limit; the checkpoint_timeout parameter sets the maximum time between checkpoints and therefore how often outdated WAL files can be removed. If a standby node requires WAL files that have already been deleted, it must be reinitialized using:
    patronictl --config-file /etc/patroni/patroni.yaml reinit pgcluster
  • For a standby node that was previously disconnected, recovery then falls back to pg_basebackup. This process is resource-intensive for both the standby and the leader node, and its duration depends on the size of the database. Consider this trade-off when planning your system: balance the disk space allocated for retaining WAL files against the potential need for a resource-intensive pg_basebackup recovery if the required WAL files have been deleted.
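
The formulas above can be checked with a short worked example. The throughput figures and the 4-hour recovery window below are hypothetical; substitute your project's values:

```shell
# Hypothetical project throughput and recovery window:
COVERED_OUTAGE_TIME=$((4 * 3600))   # 4 hours, in seconds
VALUES_PER_S=1000                   # archived values per second
ALARMS_PER_S=10                     # alarms per second

# WAL bytes written per second (measured sizes from above):
RATE=$((ALARMS_PER_S * 1665 + VALUES_PER_S * 390))

WAL_KEEP_SIZE=$((COVERED_OUTAGE_TIME * RATE))   # ~ 5.9 GB
MAX_WAL_SIZE=$((WAL_KEEP_SIZE * 11 / 10))       # ~ 6.4 GB (10 % headroom)
echo "wal_keep_size = ${WAL_KEEP_SIZE} bytes"
echo "max_wal_size  = ${MAX_WAL_SIZE} bytes"

# Conversely, 10 GB of retained WAL (taken as 10^10 bytes for
# simplicity) covers roughly:
echo "covered_outage_time for 10 GB = $((10000000000 / RATE)) s"   # ~ 6.8 h
```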

3. Configure Patroni on Database Servers 1 and 2

Copy the configuration file template from <WinCC OA installation>/data/NGA/PostgreSQL/HAC/patroni.template.yaml to /etc/patroni/patroni.yaml. Ensure that the user postgres has read permission for this file. Then, open the file and update the placeholder entries as indicated below:

# <this server's num> is either 1 or 2 for Database Server 1 or Database Server 2, respectively
name: patroni-<this server's num>

restapi:
  # IP address of local Database Server
  connect_address: <IP of patroni-<this server's num>>:8008
        
etcd3:
  hosts:
    - <IP of etcd-1>:2379
    - <IP of etcd-2>:2379
    - <IP of etcd-3>:2379

bootstrap:
  dcs:
    postgresql:
      parameters:
        wal_keep_size: <calculated wal_keep_size, e.g. '10GB'>
        max_wal_size: <calculated max_wal_size, e.g. '11GB'>
        checkpoint_timeout: <preferred checkpoint_timeout, e.g. '2min'>
      pg_hba:
        - host replication replicator <IP of patroni-1>/32 scram-sha-256 # Allow Database Server 1 to replicate
        - host replication replicator <IP of patroni-2>/32 scram-sha-256 # Allow Database Server 2 to replicate
        - host all all <IP of patroni-1>/32 scram-sha-256 # Allow Database Server 1 to connect
        - host all all <IP of patroni-2>/32 scram-sha-256 # Allow Database Server 2 to connect
        - host winccoa etm <IP of WinCC OA Server 1>/32 scram-sha-256 # Allow WinCC OA Server 1 to connect
        - host winccoa etm <IP of WinCC OA Server 2>/32 scram-sha-256 # Allow WinCC OA Server 2 to connect
        # add direct read clients as needed:
        #- host winccoa etm <IP of direct read client>/32 scram-sha-256

postgresql:
  connect_address: <IP of patroni-<this server's num>>[, <IP of 2nd network card of patroni-<this server's num>>]:5432 # Optional 2nd network address for higher availability
  data_dir: /var/lib/postgresql/<major version number of PostgreSQL®>/data # put the major version of PostgreSQL® in the path (psql --version), necessary for future major version upgrades
  bin_dir: /usr/pgsql-<major version number of PostgreSQL®>/bin # use the PostgreSQL® binaries previously installed from the WinCC OA delivery
  authentication:
    replication:
      password: <replicator password> # choose a safe password
    superuser:
      password: <postgres password>   # choose another safe password

Given the following IP addresses:

  • 192.168.178.88 for Database Server 1 (patroni-1, etcd-1)
  • 192.168.178.89 for Database Server 2 (patroni-2, etcd-2)
  • 192.168.178.90 for the dedicated etcd server (etcd-3)
  • 192.168.178.91 for WinCC OA Server 1
  • 192.168.178.92 for WinCC OA Server 2

and using the database name winccoa with the user etm, the /etc/patroni/patroni.yaml file for Database Server 2 would look like the following:
scope: pgcluster
namespace: /service/
name: patroni-2

restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.178.89:8008

etcd3:
  hosts:
    - 192.168.178.88:2379
    - 192.168.178.89:2379
    - 192.168.178.90:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 0
    synchronous_mode: on
    synchronous_node_count: 1
    postgresql:
      create_replica_methods:
        - basebackup
      remove_data_directory_on_rewind_failure: true
      remove_data_directory_on_diverged_timelines: true
      use_pg_rewind: true
      use_slots: false
      parameters:
        logging_collector: true
        log_directory: log
        log_filename: postgresql.log
        wal_log_hints: on
        wal_keep_size: '10GB'
        max_wal_size: '11GB'
        checkpoint_timeout: '2min'
        synchronous_commit: remote_apply
      pg_hba:
        - local all all scram-sha-256
        - local replication all scram-sha-256
        - host replication replicator 127.0.0.1/32 scram-sha-256
        - host replication replicator ::1/128 scram-sha-256
        - host replication replicator 192.168.178.88/32 scram-sha-256
        - host replication replicator 192.168.178.89/32 scram-sha-256
        - host all all 127.0.0.1/32 scram-sha-256
        - host all all ::1/128 scram-sha-256
        - host all all 192.168.178.88/32 scram-sha-256
        - host all all 192.168.178.89/32 scram-sha-256
        - host winccoa etm 192.168.178.91/32 scram-sha-256
        - host winccoa etm 192.168.178.92/32 scram-sha-256

  initdb: 
    - encoding: UTF8
    - data-checksums

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.178.89:5432
  data_dir: /var/lib/postgresql/17/data
  bin_dir: /usr/pgsql-17/bin
  authentication:
    replication:
      username: replicator
      password: Ex-Pwd-94#vXp7*TgRz-DoNotUse
    superuser:
      username: postgres
      password: Ex-Pwd-g7T#k9Z2@QxLmB-DoNotUse

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false

Note:
According to the Patroni documentation (https://patroni.readthedocs.io/en/latest/yaml_configuration.html), the replication user is created during the initialization process. Replicas use this user to access the replication source through streaming replication.

4. Create and start the Patroni service on both Database Servers

  1. Ensure that /etc/patroni/patroni.yaml can be read by all users:
    sudo chmod a+r /etc/patroni/patroni.yaml
  2. Copy the service file from <WinCC OA installation>/data/NGA/PostgreSQL/HAC/patroni.service to /etc/systemd/system/ and load the service using the following commands:
    systemctl daemon-reload
    systemctl enable patroni.service
  3. Start the Patroni service:
    systemctl start patroni.service
  4. Check the status of Patroni using:
    systemctl status patroni.service
  5. If you encounter an error during startup, check for a more detailed explanation:
    journalctl -u patroni.service -e
  6. If the Patroni service is running, you can use the patronictl list command to check the status and initialization progress of the database on Database Server 1. For more details, see https://patroni.readthedocs.io/en/latest/patronictl.html.
    patronictl --config-file /etc/patroni/patroni.yaml list
  7. The output of the patronictl list command changes during the various stages of Patroni and PostgreSQL® database initialization. It begins with:
    + Cluster: pgcluster (uninitialized) -----------+
    | Member | Host | Role | State | TL | Lag in MB |
    +--------+------+------+-------+----+-----------+

    continues with:

    + Cluster: pgcluster (initializing) -----+---------+----+-----------+
    | Member      | Host           | Role    | State   | TL | Lag in MB |
    +-------------+----------------+---------+---------+----+-----------+
    | patroni-1   | 192.168.178.88 | Replica | stopped |    |   unknown |
    +-------------+----------------+---------+---------+----+-----------+

    and finally it should read:

    + Cluster: pgcluster (initializing) -----+--------------------------+----+-----------+
    | Member      | Host           | Role    | State                    | TL | Lag in MB |
    +-------------+----------------+---------+--------------------------+----+-----------+
    | patroni-1   | 192.168.178.88 | Replica | initializing new cluster |    |   unknown |
    +-------------+----------------+---------+--------------------------+----+-----------+
  8. If the database initializes and starts successfully on Database Server 1:
    • Start the Patroni service on Database Server 2 and monitor the service status and replication status by running the patronictl list command repeatedly:
      patronictl --config-file /etc/patroni/patroni.yaml list
    • The output of the patronictl list command changes during the various stages of Patroni and PostgreSQL® database initialization. It begins with:
      + Cluster: pgcluster (7524259526883188064) --------+----+-----------+
      | Member      | Host           | Role    | State   | TL | Lag in MB |
      +-------------+----------------+---------+---------+----+-----------+
      | patroni-1   | 192.168.178.88 | Leader  | running |  1 |           |
      | patroni-2   | 192.168.178.89 | Replica | stopped |    |   unknown |
      +-------------+----------------+---------+---------+----+-----------+

      continues with:

      + Cluster: pgcluster (7524259526883188064) -----------------+----+-----------+
      | Member      | Host           | Role    | State            | TL | Lag in MB |
      +-------------+----------------+---------+------------------+----+-----------+
      | patroni-1   | 192.168.178.88 | Leader  | running          |  1 |           |
      | patroni-2   | 192.168.178.89 | Replica | creating replica |    |   unknown |
      +-------------+----------------+---------+------------------+----+-----------+

      and finally it should read:

      + Cluster: pgcluster (7524259526883188064) ---+-----------+----+-----------+
      | Member      | Host           | Role         | State     | TL | Lag in MB |
      +-------------+----------------+--------------+-----------+----+-----------+
      | patroni-1   | 192.168.178.88 | Leader       | running   |  1 |           |
      | patroni-2   | 192.168.178.89 | Sync Standby | streaming |  1 |         0 |
      +-------------+----------------+--------------+-----------+----+-----------+
    • In the event of a successful replication and startup of the database on Database Server 2, it can be helpful to perform a switchover. This process promotes the Sync Standby to Leader, making the Sync Standby's PostgreSQL® database the new primary. You can accomplish this switchover by calling:
      patronictl --config-file /etc/patroni/patroni.yaml switchover
    • This operation will require additional user input. If you have only two nodes, as in the HAC, you can accept all suggested values by pressing Enter for each prompt and then confirming with 'y' at the end:
      Current cluster topology
      + Cluster: pgcluster (7524259526883188064) ---+-----------+----+-----------+
      | Member      | Host           | Role         | State     | TL | Lag in MB |
      +-------------+----------------+--------------+-----------+----+-----------+
      | patroni-1   | 192.168.178.88 | Leader       | running   |  1 |           |
      | patroni-2   | 192.168.178.89 | Sync Standby | streaming |  1 |         0 |
      +-------------+----------------+--------------+-----------+----+-----------+
      Primary [patroni-1]:
      Candidate ['patroni-2'] []:
      When should the switchover take place (e.g. 2025-07-07T13:21 )  [now]:
      Are you sure you want to switchover cluster pgcluster, demoting current leader patroni-1? [y/N]: y
      2025-07-07 12:21:39.27450 Successfully switched over to "patroni-2"
      + Cluster: pgcluster (7524259526883188064) --------+----+-----------+
      | Member      | Host           | Role    | State   | TL | Lag in MB |
      +-------------+----------------+---------+---------+----+-----------+
      | patroni-1   | 192.168.178.88 | Replica | stopped |    |   unknown |
      | patroni-2   | 192.168.178.89 | Leader  | running |  1 |           |
      +-------------+----------------+---------+---------+----+-----------+
                                          
  9. After the switchover is triggered, its progress and the state of the cluster can again be monitored with the patronictl list command. The switchover is complete when the new Leader is in state running and the new Sync Standby is streaming.
    + Cluster: pgcluster (7524259526883188064) ---+-----------+----+-----------+
    | Member      | Host           | Role         | State     | TL | Lag in MB |
    +-------------+----------------+--------------+-----------+----+-----------+
    | patroni-1   | 192.168.178.88 | Sync Standby | streaming |  2 |         0 |
    | patroni-2   | 192.168.178.89 | Leader       | running   |  2 |           |
    +-------------+----------------+--------------+-----------+----+-----------+

5. Change the DCS-Stored Runtime Configuration

Once the first Patroni node is bootstrapped on Database Server 1, as described above, all settings in patroni.yaml under bootstrap.dcs are saved in the etcd cluster.

After this initial bootstrap, the bootstrap.dcs section in the patroni.yaml file is no longer used for the lifetime of the etcd storage, unless you explicitly clear it to re-bootstrap the Patroni cluster. This is typically required during a major PostgreSQL® version upgrade.

You can use patronictl to view the settings stored in the DCS by running:
patronictl --config-file /etc/patroni/patroni.yaml show-config
and to modify them via
patronictl --config-file /etc/patroni/patroni.yaml edit-config
  1. When you use the edit-config option, the vim editor (on Red Hat Linux) or the nano editor (on Debian) opens the configuration stored in the DCS, allowing you to make changes directly.

    After you finish editing, save the file and exit the editor. The system will then show you a summary of the changes (a diff) you made. You will be prompted to confirm whether these changes are correct and should be applied.

    In the example below, an additional host is granted access to the PostgreSQL® database. For instance, this could be a host running a WinCC OA UI that uses direct read access to retrieve historical data:

    ---
    +++
    @@ -27,6 +27,7 @@
    - host winccoa etm 192.168.178.91/32 scram-sha-256
    - host winccoa etm 192.168.178.92/32 scram-sha-256
    +  - host winccoa etm 192.168.178.93/32 scram-sha-256
    remove_data_directory_on_diverged_timelines: true
    remove_data_directory_on_rewind_failure: true
    
    Apply these changes? [y/N]: y
  2. After you confirm the changes, the new configuration is saved in the DCS, and each Patroni node will apply the settings locally. Although the configuration is stored in the DCS, it is recommended to keep the bootstrap.dcs section in one of the patroni.yaml files up to date. This is important in case you need to bootstrap the cluster again, such as during a major PostgreSQL® version upgrade.
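
    For the example change above, keeping patroni.yaml in sync would mean adding the same line to the bootstrap.dcs section (192.168.178.93 is the hypothetical direct read client from the example):

    ```yaml
    bootstrap:
      dcs:
        postgresql:
          pg_hba:
            # ... existing entries ...
            - host winccoa etm 192.168.178.93/32 scram-sha-256
    ```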

6. Move the pg_wal Directory to a dedicated Drive (recommended but optional)

Storing the WAL on a separate disk from the main database files can improve performance. To do this, move the pg_wal directory to a new location while the server is shut down. Then, create a symbolic link from the original location in the main data directory to the new location. For more details, refer to https://www.postgresql.org/docs/current/wal-internals.html.

Follow these steps on each Patroni node, one node at a time:

  1. Stop the Patroni service with:
    systemctl stop patroni.service
    This action will also shut down the PostgreSQL® database.
  2. Move the /var/lib/postgresql/17/data/pg_wal directory to the dedicated drive (for example, /dev/sdb1 mounted at /mnt/wal), preserving the permissions and ownership of pg_wal. Then, create a symbolic link from the original location to the new one:
    sudo mv /var/lib/postgresql/17/data/pg_wal /mnt/wal/pg_wal
    sudo ln -s /mnt/wal/pg_wal /var/lib/postgresql/17/data/pg_wal
  3. Start the Patroni service and wait until it reaches the Sync Standby role.