Hello,
I have a problem on my project and I can't work out the cause.
All my systems run CentOS 7. The servers are redundant and run WinCC OA 3.14 P10; the clients run WinCC OA 3.14 P10 too.
Let's say server 1 is active and server 2 is inactive. The client connects to server 1.
When server 1 has a network failure, the client reconnects to server 2 (both servers become active).
But when the network comes back, the client disconnects (less than 17min30 after the network failure) and tries to reconnect to server 1 (server 2 is rebooting).
The client then retries the connection every 30 s, but every attempt fails until 17min30 after the connection loss, at which point the reconnect succeeds.
T: connection loss.
T+17min: reconnect fails.
T+17min30: reconnect succeeds.
There seems to be a 17min30 delay before the UI network connection is reset, and I don't know why.
Can someone explain why this is happening and how I can reduce that time to an acceptable period?
Regards,
UI Reconnection failure for 17min30 after connection lost to redundant system (unix)
- uxout
- Posts:82
- Joined: Wed Jul 20, 2016 12:07 pm
- mkerk
- Posts:75
- Joined: Wed Oct 20, 2010 12:25 pm
Re: UI Reconnection failure for 17min30 after connection lost to redundant system (unix)
Hello,
Without a look at the WinCC OA log files from both redundant servers and the client projects, it's difficult to find an explanation for this behavior.
I would recommend contacting the official WinCC OA support and submitting a service request at
support.industry.siemens.com/cs/start
BR,
Mousser.
- uxout
- Posts:82
- Joined: Wed Jul 20, 2016 12:07 pm
Re: UI Reconnection failure for 17min30 after connection lost to redundant system (unix)
Here is an update after several tests.
It only happens on Unix (tested under CentOS 7), with WinCC OA 3.14 P10 and 3.15 P7; it does not happen on Windows systems. It will probably happen on 3.16 too; I'm going to test that tonight.
Here is the minimal case to reproduce the situation. It needs 2 systems (Client + Server):
- On the Server machine, create a redundant project.
- On the Server machine, add a CTRL Manager with the option "webclient_http.ctl".
- On the Client machine, launch a WCCOAUi with the option "-server http://:8080". It should open a login panel; just log in as "root".
- On the Client machine, note the UI number (you can see it in the "System overview" panel).
- Now cut the network link between the 2 systems (unplug the network cable or disable the network adapter on one of the 2 systems).
- On the Server machine, the "System overview" panel should show the client's UI as still connected (green bar under the UI icon) until around 18 min after the network loss.
On Windows systems, the UI client socket is closed quickly after the connection loss.
I investigated further by adding the debug flags "-rcv 1 -dbg alive". It seems that alive messages keep arriving in a loop from the disconnected client socket.
About every 6 s, I get the following message on the WinCC OA console: "WCCILproxy1:WCCILproxy (1), 2018.05.30 04:27:03.121, DBG, (SYS: 1 Event -num 0 CONN: 1) @ s-winccoa-p1:4998 >R (SYS: 1 Ui -num 8 CONN: 1(R)) @ S-WINCCOA-P2:45500 read: 96 left: 130976/ #buffer 0".
s-winccoa-p1 is the server and s-winccoa-p2 is the client.
I used netstat and noticed that the socket remains active after the connection loss between s-winccoa-p1 and s-winccoa-p2.
Could this be a bug? If not, how can I work around it?
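For anyone who wants to script the netstat check described above, here is a minimal Python sketch that reads the kernel's IPv4 socket table directly. It is not part of WinCC OA; the function names are made up for illustration, and it assumes a little-endian Linux host (such as x86 CentOS), where /proc/net/tcp prints IPv4 addresses byte-swapped in hex.

```python
import socket
import struct

# Subset of the state codes printed in /proc/net/tcp (Linux TCP state numbers).
TCP_STATES = {"01": "ESTABLISHED", "06": "TIME_WAIT", "08": "CLOSE_WAIT", "0A": "LISTEN"}


def decode_addr(hex_addr):
    """Convert 'HEXIP:HEXPORT' from /proc/net/tcp into ('a.b.c.d', port)."""
    ip_hex, port_hex = hex_addr.split(":")
    # Assumption: little-endian host, so the 32-bit address is packed as "<I".
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def parse_tcp_line(line):
    """Return (local, remote, state) for one data line of /proc/net/tcp."""
    fields = line.split()
    local, remote, state = fields[1], fields[2], fields[3]
    return decode_addr(local), decode_addr(remote), TCP_STATES.get(state, state)


def connections_on_port(port, path="/proc/net/tcp"):
    """List IPv4 sockets whose local port equals `port`."""
    with open(path) as table:
        next(table)  # skip the header line
        rows = [parse_tcp_line(line) for line in table if line.strip()]
    return [row for row in rows if row[0][1] == port]
```

On the server, calling connections_on_port(4998) once before and once after cutting the link should show whether the client's entry stays in the ESTABLISHED state, which is what netstat reported here. Port 4998 is taken from the debug message above; IPv6 sockets live in /proc/net/tcp6 and are not covered by this sketch.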
- uxout
- Posts:82
- Joined: Wed Jul 20, 2016 12:07 pm
Re: UI Reconnection failure for 17min30 after connection lost to redundant system (unix)
Hello,
Case closed! We have found a solution.
I'm posting it to help other people who run into the same problem.
I have set the following in /etc/sysctl.conf:
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_intvl = 1
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_retries2 = 5
and now the client UIs are correctly disconnected after 15 s of network failure.
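As a rough sanity check on those values, here is a small Python sketch of the timeout arithmetic. It follows the simplified model in the Linux kernel's ip-sysctl documentation for tcp_retries2 (exponential backoff starting at a 200 ms minimum RTO, capped at 120 s); the function names are mine, and real timeouts are longer because the actual RTO depends on the measured round-trip time.

```python
def retransmission_timeout(retries, rto_min=0.2, rto_max=120.0):
    """Hypothetical seconds before an unacknowledged connection is killed.

    Models tcp_retries2: the connection is dropped at the (retries + 1)th
    RTO, with the RTO doubling each time up to rto_max.
    """
    total = 0.0
    rto = rto_min
    for _ in range(retries + 1):
        total += min(rto, rto_max)
        rto *= 2
    return total


def keepalive_timeout(idle, interval, probes):
    """Seconds before an idle connection with a dead peer is dropped."""
    return idle + interval * probes


print(retransmission_timeout(15))    # kernel default tcp_retries2 = 15
print(retransmission_timeout(5))     # tcp_retries2 = 5, as set above
print(keepalive_timeout(300, 1, 5))  # keepalive values set above
```

Under this model the default tcp_retries2 of 15 gives about 924.6 s (roughly 15.4 min) as a lower bound, which is in the same range as the delay seen in this thread, while a value of 5 gives about 12.6 s, close to the 15 s observed after the change. The keepalive settings only matter for idle connections (300 + 1 * 5 = 305 s); since the alive messages keep data in flight, the quick disconnect most likely comes from tcp_retries2, but that is my reading rather than something confirmed by support.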
Thanks for your help.