Step 3 – Configuration of the Patroni Cluster
Set up the Patroni cluster with the provided configuration file template and the PostgreSQL® binaries from the WinCC OA installation. Adapt the configuration to your project needs. For example, calculate and configure the disk space for WAL retention to enable fast recovery after a node failure.
1. Create the folder structure for the PostgreSQL® database
- Determine the major version number of PostgreSQL® from the PostgreSQL® binaries in the WinCC OA installation by running postgres.exe --version, e.g.:

  ```
  C:\><WinCC OA Installation>\bin\pgsql\bin\postgres.exe --version
  postgres (PostgreSQL) <major version number>.<minor version number>
  ```

- Create the directory C:\postgres\<major version number of PostgreSQL>. It will contain the data directory of the PostgreSQL® database. This directory structure is essential for future major version upgrades of the PostgreSQL® database.
- Grant the user pes full access to the created directory structure by running:

  ```
  icacls C:\postgres /q /c /t /grant pes:F
  ```
2. Calculate the amount of dedicated Disk Space used for retaining WAL Files
For achieving fast recovery of a database server after a network partition or a
node failure, WAL files must be retained to cover the time period from the point
of failure up to the current time. These retained WAL files are streamed from the Primary server to the
recovering database server, allowing it to catch up and become a
synchronized standby node again. This WAL retention mechanism helps avoid
the need for a resource- and time-intensive base backup using
pg_basebackup.
However, it does require additional disk space, and the time period covered for fast recovery is limited by the available or reserved disk space.
The amount of dedicated disk space (Patroni settings wal_keep_size
and max_wal_size) needed for WAL file retention depends on
several factors:
- The project's requirements for how much outage time should be efficiently recoverable (covered_outage_time).
- The project's throughput, specifically the number of archived values per second (number_of_archived_values/s) and the number of alarms per second (number_of_alarms/s).
- The size of each value and alarm in the WAL file format: 390 bytes per value and 1665 bytes per alarm, respectively. These sizes were determined through measurements on a test system.
Based on the factors mentioned above, wal_keep_size is calculated as follows:

```
wal_keep_size = covered_outage_time * (number_of_alarms/s * 1665 bytes/alarm + number_of_archived_values/s * 390 bytes/value)
```

- Conversely, the outage time that can be efficiently recovered from a given amount of disk space is:

  ```
  covered_outage_time = wal_keep_size / (number_of_alarms/s * 1665 bytes/alarm + number_of_archived_values/s * 390 bytes/value)
  ```

- In addition to wal_keep_size, PostgreSQL® requires a second parameter named max_wal_size. It serves as an additional high-water mark and is recommended to be slightly larger than wal_keep_size, e.g. by 10%:

  ```
  max_wal_size = wal_keep_size * 1.1
  ```

- It is important that the dedicated disk space is at least as large as max_wal_size. This space should always be reserved exclusively for storing WAL files. To ensure this, it is recommended to use a separate drive, as explained in a later step.
- When max_wal_size is exceeded and a checkpoint occurs in the PostgreSQL® database, old WAL files are deleted to maintain the max_wal_size limit. The frequency at which outdated WAL files are deleted is controlled by the checkpoint_timeout parameter. If a standby node still needs WAL files that have already been deleted, it must be reinitialized using:

  ```
  C:\PES>patronictl.bat reinit pgcluster
  ```

- For a standby node that was previously disconnected, this type of recovery uses pg_basebackup. This process can be resource-intensive for both the standby node and the leading node, and it may take a long time to complete, depending on the size of the database. Balance the amount of disk space allocated for storing WAL files against the potential need for a resource-intensive recovery using pg_basebackup if the required WAL files are deleted, and consider this trade-off when planning your storage and backup strategy.
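As a sanity check for the sizing above, the formulas can be evaluated with a short script. Only the byte sizes per value and alarm come from the measurements quoted above; the throughput and outage-time figures in the example are made up for illustration.

```python
# Sketch of the WAL retention sizing formulas from this step. The byte sizes
# (390 bytes/value, 1665 bytes/alarm) are the measured figures quoted above;
# the example throughput numbers are hypothetical.

BYTES_PER_VALUE = 390    # WAL size of one archived value (measured, see text)
BYTES_PER_ALARM = 1665   # WAL size of one alarm (measured, see text)

def wal_keep_size_bytes(covered_outage_time_s: int,
                        values_per_s: float, alarms_per_s: float) -> int:
    """wal_keep_size = covered_outage_time * (alarms/s * 1665 B + values/s * 390 B)"""
    return int(covered_outage_time_s * (alarms_per_s * BYTES_PER_ALARM
                                        + values_per_s * BYTES_PER_VALUE))

def max_wal_size_bytes(wal_keep: int) -> int:
    """Recommended high-water mark: ~10 % above wal_keep_size."""
    return int(wal_keep * 1.1)

# Example: 1000 archived values/s, 10 alarms/s, 4 hours of covered outage time
keep = wal_keep_size_bytes(4 * 3600, values_per_s=1000, alarms_per_s=10)
print(f"wal_keep_size ~ {keep / 1024**3:.1f} GiB")                      # ~5.5 GiB
print(f"max_wal_size  ~ {max_wal_size_bytes(keep) / 1024**3:.1f} GiB")  # ~6.0 GiB
```

Round the results up to a convenient value when entering wal_keep_size and max_wal_size in the Patroni configuration.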
3. Configure Patroni on Database Servers 1 and 2
- Copy the configuration file template <WinCC OA Installation>\data\NGA\PostgreSQL\HAC\patroni.template.yaml to C:\PES\patroni\patroni.yaml.
- Edit the file and replace the placeholders with the appropriate values
as indicated below:
```
# <this server's num> is either 1 or 2 for Database Server 1 or Database Server 2, respectively
name: patroni-<this server's num>

restapi:
  # IP address of local Database Server
  connect_address: <IP of patroni-<this server's num>>:8008

etcd3:
  hosts:
  - <IP of etcd-1>:2379
  - <IP of etcd-2>:2379
  - <IP of etcd-3>:2379

bootstrap:
  dcs:
    postgresql:
      parameters:
        wal_keep_size: <calculated wal_keep_size, e.g. '10GB'>
        max_wal_size: <calculated max_wal_size, e.g. '11GB'>
        checkpoint_timeout: <preferred checkpoint_timeout, e.g. '2min'>
      pg_hba:
      - host replication replicator <IP of patroni-1>/32 scram-sha-256   # Allow Database Server 1 to replicate
      - host replication replicator <IP of patroni-2>/32 scram-sha-256   # Allow Database Server 2 to replicate
      - host all all <IP of patroni-1>/32 scram-sha-256                  # Allow Database Server 1 to connect
      - host all all <IP of patroni-2>/32 scram-sha-256                  # Allow Database Server 2 to connect
      - host winccoa etm <IP of WinCC OA Server 1>/32 scram-sha-256      # Allow WinCC_OA Server 1 to connect
      - host winccoa etm <IP of WinCC OA Server 2>/32 scram-sha-256      # Allow WinCC_OA Server 2 to connect
      # add direct read clients as needed:
      #- host winccoa etm <IP of direct read client>/32 scram-sha-256

postgresql:
  connect_address: <IP of patroni-<this server's num>>[, <IP of 2nd network card of patroni-<this server's num>>]:5432  # Optional 2nd network address for higher availability
  data_dir: 'C:\postgres\<major version number of PostgreSQL>\data'  # put the major version of PostgreSQL in the path (postgres.exe --version), necessary for future major version upgrade
  bin_dir: '<WinCC OA installation>\bin\pgsql\bin'  # use the PostgreSQL binaries from the WinCC OA installation
  authentication:
    replication:
      password: <replicator password>  # choose a safe password
    superuser:
      password: <postgres password>  # choose another safe password
```

Assuming the IP addresses are:

- 192.168.178.88 for Database Server 1 (patroni-1, etcd-1)
- 192.168.178.89 for Database Server 2 (patroni-2, etcd-2)
- 192.168.178.90 for the dedicated etcd Server (etcd-3)
- 192.168.178.91 for WinCC OA Server 1
- 192.168.178.92 for WinCC OA Server 2

With the database name winccoa and the user etm, the file C:\PES\patroni\patroni.yaml would look like this for Database Server 2:

```
scope: pgcluster
namespace: /service/
name: patroni-2

restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.178.89:8008

etcd3:
  hosts:
  - 192.168.178.88:2379
  - 192.168.178.89:2379
  - 192.168.178.90:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 0
    synchronous_mode: on
    synchronous_node_count: 1
    postgresql:
      create_replica_methods:
      - basebackup
      remove_data_directory_on_rewind_failure: true
      remove_data_directory_on_diverged_timelines: true
      use_pg_rewind: true
      use_slots: false
      parameters:
        logging_collector: true
        log_directory: log
        log_filename: postgresql.log
        wal_keep_size: '10GB'
        max_wal_size: '11GB'
        checkpoint_timeout: '2min'
        synchronous_commit: remote_apply
      pg_hba:
      - local all all scram-sha-256
      - local replication all scram-sha-256
      - host replication replicator 127.0.0.1/32 scram-sha-256
      - host replication replicator ::1/128 scram-sha-256
      - host replication replicator 192.168.178.88/32 scram-sha-256
      - host replication replicator 192.168.178.89/32 scram-sha-256
      - host all all 127.0.0.1/32 scram-sha-256
      - host all all ::1/128 scram-sha-256
      - host all all 192.168.178.88/32 scram-sha-256
      - host all all 192.168.178.89/32 scram-sha-256
      - host winccoa etm 192.168.178.91/32 scram-sha-256
      - host winccoa etm 192.168.178.92/32 scram-sha-256
  initdb:
  - encoding: UTF8
  - data-checksums

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.178.89:5432
  data_dir: 'C:\postgres\17\data'
  bin_dir: 'C:\Program Files\Siemens\WinCC_OA\3.21\bin\pgsql\bin'
  authentication:
    replication:
      username: replicator
      password: Ex-Pwd-94#vXp7*TgRz-DoNotUse
    superuser:
      username: postgres
      password: Ex-Pwd-g7T#k9Z2@QxLmB-DoNotUse

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
```
4. Install the Patroni Service and start the Patroni Cluster
- To install the Patroni service, run the following command on both database servers:

  ```
  C:\PES\patroni\patroni_service.exe install
  ```

- Start the Patroni service (Patroni HA Windows Service) in the Task Manager on Database Server 1 to bootstrap the first Patroni node.
  Next, repeatedly run the patronictl list command to check for diagnostics and monitor the database initialization on Database Server 1. For more information, see also https://patroni.readthedocs.io/en/latest/patronictl.html

  ```
  C:\PES>patronictl.bat list
  ```

  The output of the patronictl list command changes during the different stages of initializing Patroni and the PostgreSQL® database. It starts as follows:

  ```
  + Cluster: pgcluster (initializing) ---+---------+----+-----------+
  | Member    | Host           | Role    | State   | TL | Lag in MB |
  +-----------+----------------+---------+---------+----+-----------+
  | patroni-1 | 192.168.178.88 | Replica | stopped |    | unknown   |
  +-----------+----------------+---------+---------+----+-----------+
  ```

  continues with:

  ```
  + Cluster: pgcluster (initializing) ---+--------------------------+----+-----------+
  | Member    | Host           | Role    | State                    | TL | Lag in MB |
  +-----------+----------------+---------+--------------------------+----+-----------+
  | patroni-1 | 192.168.178.88 | Replica | initializing new cluster |    | unknown   |
  +-----------+----------------+---------+--------------------------+----+-----------+
  ```

  and should finally read:

  ```
  + Cluster: pgcluster (7524258301090557228) ---+----+-----------+
  | Member    | Host           | Role   | State   | TL | Lag in MB |
  +-----------+----------------+--------+---------+----+-----------+
  | patroni-1 | 192.168.178.88 | Leader | running | 1  |           |
  +-----------+----------------+--------+---------+----+-----------+
  ```

  If the database successfully initializes and starts on Database Server 1, follow these steps:
- Open the Task Manager on Database Server 2 and start the
Patroni service (Patroni HA Windows Service).
  Then, repeatedly run the patronictl list command to monitor the database replication process on Database Server 2:

  ```
  C:\PES>patronictl.bat list
  ```

  The output of the patronictl list command changes during the different stages of initializing Patroni and the PostgreSQL® database. It starts as follows:

  ```
  + Cluster: pgcluster (7524259526883188064) ---+----+-----------+
  | Member    | Host           | Role    | State   | TL | Lag in MB |
  +-----------+----------------+---------+---------+----+-----------+
  | patroni-1 | 192.168.178.88 | Leader  | running | 1  |           |
  | patroni-2 | 192.168.178.89 | Replica | stopped |    | unknown   |
  +-----------+----------------+---------+---------+----+-----------+
  ```

  continues with:

  ```
  + Cluster: pgcluster (7524259526883188064) ---------+----+-----------+
  | Member    | Host           | Role    | State            | TL | Lag in MB |
  +-----------+----------------+---------+------------------+----+-----------+
  | patroni-1 | 192.168.178.88 | Leader  | running          | 1  |           |
  | patroni-2 | 192.168.178.89 | Replica | creating replica |    | unknown   |
  +-----------+----------------+---------+------------------+----+-----------+
  ```

  and should finally read:

  ```
  + Cluster: pgcluster (7524259526883188064) ---+-----------+----+-----------+
  | Member    | Host           | Role         | State     | TL | Lag in MB |
  +-----------+----------------+--------------+-----------+----+-----------+
  | patroni-1 | 192.168.178.88 | Leader       | running   | 1  |           |
  | patroni-2 | 192.168.178.89 | Sync Standby | streaming | 1  | 0         |
  +-----------+----------------+--------------+-----------+----+-----------+
  ```

  After a successful replication and startup of the database on Database Server 2, it can be helpful to perform a switchover. This action promotes the Sync Standby to the Leader, making the Sync Standby's PostgreSQL® database the primary database.
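If you prefer to script this monitoring loop instead of reading the tables by hand, each member's role and state can be extracted from the patronictl list output. The helper below is a hypothetical convenience, not part of Patroni, and assumes the table layout shown above.

```python
# Hypothetical helper (not part of Patroni) that extracts member roles and
# states from the ASCII table printed by 'patronictl list'.

def parse_patronictl_list(output: str) -> dict:
    """Return {member: (role, state)} from a 'patronictl list' table."""
    members = {}
    for line in output.splitlines():
        if not line.startswith("|"):
            continue                                  # skip borders and the cluster banner
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) < 4 or cells[0] == "Member":
            continue                                  # skip the header row
        member, _host, role, state = cells[:4]
        members[member] = (role, state)
    return members

sample = """\
| Member    | Host           | Role         | State     | TL | Lag in MB |
+-----------+----------------+--------------+-----------+----+-----------+
| patroni-1 | 192.168.178.88 | Leader       | running   | 1  |           |
| patroni-2 | 192.168.178.89 | Sync Standby | streaming | 1  | 0         |
"""
print(parse_patronictl_list(sample))
```

If your Patroni version supports it, patronictl list -f json emits machine-readable output and avoids parsing the table at all.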
Switchover

```
C:\PES>patronictl.bat switchover
```

This operation prompts for additional input. If there are only two nodes, as in the HAC, you can accept all the suggested values by pressing Enter and then confirm with 'y' at the end:
```
+ Cluster: pgcluster (7524259526883188064) ---+-----------+----+-----------+
| Member    | Host           | Role         | State     | TL | Lag in MB |
+-----------+----------------+--------------+-----------+----+-----------+
| patroni-1 | 192.168.178.88 | Leader       | running   | 1  |           |
| patroni-2 | 192.168.178.89 | Sync Standby | streaming | 1  | 0         |
+-----------+----------------+--------------+-----------+----+-----------+
Primary [patroni-1]:
Candidate ['patroni-2'] []:
When should the switchover take place (e.g. 2025-07-07T13:21 ) [now]:
Are you sure you want to switchover cluster pgcluster, demoting current leader patroni-1? [y/N]: y
2025-07-07 12:21:39.27450 Successfully switched over to "patroni-2"
```
```
+ Cluster: pgcluster (7524259526883188064) --------+----+-----------+
| Member    | Host           | Role    | State   | TL | Lag in MB |
+-----------+----------------+---------+---------+----+-----------+
| patroni-1 | 192.168.178.88 | Replica | stopped |    | unknown   |
| patroni-2 | 192.168.178.89 | Leader  | running | 1  |           |
+-----------+----------------+---------+---------+----+-----------+
```

After the switchover is triggered, you can monitor the progress of the switchover and the cluster state using the patronictl list command.
The switchover is complete when the new Leader has the state running
and the new Sync Standby is streaming.
```
+ Cluster: pgcluster (7524259526883188064) ---+-----------+----+-----------+
| Member    | Host           | Role         | State     | TL | Lag in MB |
+-----------+----------------+--------------+-----------+----+-----------+
| patroni-1 | 192.168.178.88 | Sync Standby | streaming | 2  | 0         |
| patroni-2 | 192.168.178.89 | Leader       | running   | 2  |           |
+-----------+----------------+--------------+-----------+----+-----------+
```

5. Change the DCS-stored Runtime Configuration
The settings from the bootstrap.dcs section of the patroni.yaml file are saved in the etcd cluster during the initial bootstrap. From this point forward, the bootstrap.dcs section in patroni.yaml is no longer used for the lifetime of the etcd storage, unless you explicitly clear it to re-bootstrap the Patroni cluster. This is typically required during a major PostgreSQL® version upgrade. You can use patronictl to display and edit the settings stored in the DCS:

```
C:\PES>patronictl.bat show-config pgcluster
C:\PES>patronictl.bat edit-config pgcluster
```

- When you use the edit-config option, the Micro editor opens, allowing you to modify the configuration stored in the DCS. After making your changes, save the file by pressing Ctrl+S and exit the editor with Ctrl+Q. A summary of your changes (diff) is shown, and you are prompted to confirm whether the changes are correct and should be applied. In the example below, an additional host is granted access to the PostgreSQL® database. This could be, for instance, a host running a WinCC OA UI that uses direct read access to retrieve historical data.

  ```
  ---
  +++
  @@ -27,6 +27,7 @@
     - host winccoa etm 192.168.178.91/32 scram-sha-256
     - host winccoa etm 192.168.178.92/32 scram-sha-256
  +  - host winccoa etm 192.168.178.93/32 scram-sha-256
     remove_data_directory_on_diverged_timelines: true
     remove_data_directory_on_rewind_failure: true

  Apply these changes? [y/N]: y
  ```

- After confirming the changes, the new configuration is saved in the DCS, and each Patroni node applies the updated settings locally. Although the configuration is stored in the DCS, it is recommended to keep the bootstrap.dcs section in one of the patroni.yaml files up to date. This is important in case you need to bootstrap the cluster again, such as during a major PostgreSQL® version upgrade.
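Besides patronictl edit-config, the DCS-stored configuration can also be changed through Patroni's REST API (GET /config to read it, PATCH /config to merge changes), which is what patronictl uses under the hood. The sketch below only builds and prints the request body; the host IP is taken from the example above, and the "current" pg_hba list is shortened for illustration. Note that a PATCH replaces list values such as pg_hba wholesale, so in practice the new entry must be appended to the complete list obtained from GET /config.

```python
# Sketch of changing the DCS-stored configuration via Patroni's REST API
# instead of 'patronictl edit-config'. Only the request body is built and
# printed here; the actual PATCH call is shown commented out.
import json
import urllib.request

def pg_hba_patch_body(current_pg_hba: list, new_entry: str) -> str:
    """Build a PATCH /config body. JSON-merge replaces lists wholesale,
    so the new entry is appended to the full current list."""
    return json.dumps({"postgresql": {"pg_hba": current_pg_hba + [new_entry]}})

# Shortened stand-in for the current list; fetch the real one from GET /config.
body = pg_hba_patch_body(
    ["host winccoa etm 192.168.178.91/32 scram-sha-256",
     "host winccoa etm 192.168.178.92/32 scram-sha-256"],
    "host winccoa etm 192.168.178.93/32 scram-sha-256",
)
print(body)

# req = urllib.request.Request("http://192.168.178.88:8008/config",
#                              data=body.encode(), method="PATCH",
#                              headers={"Content-Type": "application/json"})
# urllib.request.urlopen(req)
```

For routine changes, patronictl edit-config remains the simpler path; the REST API is mainly useful for automation.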
6. Move the pg_wal Directory to a dedicated Drive (recommended but optional)
Storing the WAL on a separate disk from the main database files can improve
performance. To do this, move the pg_wal directory to a new
location while the server is shut down. Then, create a symbolic link from the
original location in the main data directory to the new location. For more details,
refer to https://www.postgresql.org/docs/current/wal-internals.html.
Follow these steps on each Patroni node, one node at a time:
- Stop the Patroni service using the Task Manager. This will shut down the PostgreSQL® database.
- Move the folder C:\postgres\<major version number of PostgreSQL>\data\pg_wal to the dedicated drive, for example, W:.
- Create a symbolic link to W:\pg_wal by running the following command:

  ```
  mklink /D "C:\postgres\<major version number of PostgreSQL>\data\pg_wal" "W:\pg_wal"
  ```

- Start the Patroni service in the Task Manager and wait until the node reaches the Sync Standby role.
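Before restarting the Patroni service, it can be worth verifying that the symbolic link was created correctly. The check below is a hypothetical helper, with the paths taken from the example above.

```python
# Hypothetical pre-start check that data\pg_wal is a symbolic link resolving
# to the dedicated WAL drive (paths from the example in this step).
import os

def pg_wal_link_ok(data_dir: str, expected_target: str) -> bool:
    """True if <data_dir>/pg_wal is a symlink pointing at expected_target."""
    pg_wal = os.path.join(data_dir, "pg_wal")
    return (os.path.islink(pg_wal)
            and os.path.realpath(pg_wal) == os.path.realpath(expected_target))

# Example with the assumed paths from this document:
# pg_wal_link_ok(r"C:\postgres\17\data", r"W:\pg_wal")
```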
