OID subindex column queries in the SNMP driver

Post by **rsneddon** » Tue Mar 12, 2019 2:29 am

Hi ETM,

We're developing a monitoring system on top of WinCC OA (3.16 P003) and have encountered some unexpected behaviour when dynamically querying column data via the SNMP manager.

As part of the system we are monitoring routers and switches via SNMP and would like to indicate changes of port connectivity.
The particular routers that we are monitoring (RuggedCom 1500X and RS900G) use a dynamic table to indicate which ports are active and what destination system each port is connected to (basically we want to do LLDP-light).
To map this data we need to retrieve both the OID (to determine which port) and the value to show what it is currently connected to.

Reading through the online help I have found the entry concerning the use of 'subindex 2' in the topic 'Defining the peripheral addresses of the SNMP driver':

Subindex
The subindex has a different meaning depending on the send/receive direction. The sending of arrays is not supported in the send direction. Thus, the sub index is in this case always 0. A 64bit number is an exception. For a 64bit number the higher-order part of the number is set via the sub index 1. In the receive direction the sub index 2 always means the OID of the received value (datatype text). This is necessary when a table is read column by column, so that the individual rows can be assigned to an OID. The sub index 1 returns the higher-order part of the number in case of an Uint64. Via the sub index 0 the actual value is accessed. The elements can be polled only when the sub index is 0 (polling = the manager sends a request to the agent and the agent sends a response to the manager).
NOTE
To use sub index of 2, you also have to use sub index of 0 or else it won't work.

We've created a pair of data point elements
<dp>.OIDs dyn_string
17_1.0.8802.1.1.2.1.4.1.1.7B Subindex 2, Driver 1 (polling 30s)
Transformation: visible string (see note below)

<dp>.Values dyn_string
17_1.0.8802.1.1.2.1.4.1.1.7B Subindex 0, Driver 1 (polling 30s)
Transformation: visible string

First questions:

The <dp>.OIDs element alternates between containing the OIDs and their associated values.

This is observable in GEDI and also in _dp_fct handler / dpConnect callback - is this expected behaviour?
I can add logic to determine when the OIDs element is showing values vs OIDs based on the content (the values will never contain the source OID prefix so can ignore value changes on the dyn_string that are not actually OIDs)
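
A minimal sketch of that workaround, written in Python purely for illustration (the real check would live in the dpConnect callback). It assumes the driver delivers OIDs as dotted strings, with or without a leading dot; the function name and the sample values are hypothetical.

```python
# Column OID taken from the peripheral address above (without the trailing "B").
COLUMN_PREFIX = "1.0.8802.1.1.2.1.4.1.1.7"

def is_oid_list(entries, prefix=COLUMN_PREFIX):
    """Return True when every entry looks like an OID under the polled column."""
    if not entries:
        return False
    # Require the dot after the prefix so that e.g. "...1.70.x" does not match.
    wanted = prefix.strip(".") + "."
    return all(e.strip().lstrip(".").startswith(wanted) for e in entries)

# Hypothetical updates as they might alternate on the <dp>.OIDs element.
oid_update = ["1.0.8802.1.1.2.1.4.1.1.7.6.65.65", ".1.0.8802.1.1.2.1.4.1.1.7.6.66.66"]
value_update = ["port-1", "port-2"]

print(is_oid_list(oid_update))
print(is_oid_list(value_update))
```

Updates for which the check fails would simply be ignored, so only genuine OID lists are paired with the Values element.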

I've also tried setting the OIDs element to transformation type 'objectID' to see if this stopped the flashing, but this instead generates a log message of the form

WCCOAsnmp    (1), 2019.03.12 09:07:55.791, PARAM,WARNING,    54, Unexpected state, The _address "17_1.0.8802.1.1.2.1.4.1.1.7B" was already defined with the transformation type 666! Other transformation type 664 will be ignored.

However, even if we address the above, we get intermittent failures where the request terminates with > 1000 entries returned in the dyn_string (there are at most 16 physical ports on the equipment). This typically takes a few minutes and pushes the router's CPU usage above 80%, which is obviously not something we'd want happening on a production system.

We've looked at Wireshark traces of the SNMP traffic when this is happening and can see that the WinCC OA driver is handling get-response/get-next-request pairs to query the table, and that the device is responding with multiple time-stamped OIDs for each port. In the example below, the trailing 65.65 identifies one of the 16 available ports, but we get multiple results for this port, and eventually the WinCC OA driver terminates the query after ~1000 results have been returned without actually getting to the end of the table.

get-next .1.0.8802.1.1.2.1.4.1.1.7.6.65.65
get-next .1.0.8802.1.1.2.1.4.1.1.7.267667972.65.65
get-next .1.0.8802.1.1.2.1.4.1.1.7.267668027.65.65
get-next .1.0.8802.1.1.2.1.4.1.1.7.267668153.65.65
get-next .1.0.8802.1.1.2.1.4.1.1.7.267668260.65.65
get-next .1.0.8802.1.1.2.1.4.1.1.7.267668353.65.65
...

WCCOAsnmp    (1), 2019.03.01 13:26:10.208, PARAM,INFO,       54, Unexpected state, SNMPAbsractAgent, handleBulkRequest, Agent 17 target 17 bulk query for address 17_1.0.8802.1.1.2.1.4.1.1.7B terminated

This appears to be some sort of timing issue where the router (or possibly the WinCC OA driver) is busy and a new timestamped OID has been generated for the same port before the next value is requested.
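
To make the OID layout in the trace concrete, here is a rough Python sketch that splits each row OID into its three trailing sub-identifiers and keeps only the newest entry per port. It assumes the suffix is timestamp, port, index, as the trace suggests; the function names and the sample values are hypothetical. It only tidies up an over-long result and does nothing about the device load or the terminated query.

```python
COLUMN_PREFIX = "1.0.8802.1.1.2.1.4.1.1.7"

def split_row_oid(oid, prefix=COLUMN_PREFIX):
    """Split a row OID into (time_mark, port, index); None if it does not fit."""
    parts = oid.strip().lstrip(".").split(".")
    head = prefix.split(".")
    if parts[:len(head)] != head or len(parts) != len(head) + 3:
        return None
    try:
        time_mark, port, index = (int(p) for p in parts[len(head):])
    except ValueError:
        return None
    return time_mark, port, index

def latest_per_port(oids, values):
    """Keep the value with the highest time mark for each (port, index) pair."""
    latest = {}
    for oid, value in zip(oids, values):
        row = split_row_oid(oid)
        if row is None:
            continue  # not a row of the polled column
        time_mark, port, index = row
        key = (port, index)
        if key not in latest or time_mark >= latest[key][0]:
            latest[key] = (time_mark, value)
    return {key: value for key, (_tm, value) in latest.items()}

# OIDs from the trace above; the values are made up for illustration.
oids = [
    ".1.0.8802.1.1.2.1.4.1.1.7.6.65.65",
    ".1.0.8802.1.1.2.1.4.1.1.7.267667972.65.65",
    ".1.0.8802.1.1.2.1.4.1.1.7.267668027.65.65",
]
values = ["neighbour-a", "neighbour-b", "neighbour-c"]
print(latest_per_port(oids, values))
```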

We've separately reached out to RuggedCom to see if this is a configuration issue but were wondering if this is a known issue on the WinCC OA side (i.e. handling dynamic OIDs with embedded time stamps)?

Using a different SNMP browser we've found that issuing an SNMP get-bulk request, rather than iterative get-response/get-next-requests, when the device is responding in this manner returns only one timestamp's worth of data.

Is there any way to get the WinCC OA SNMP driver to issue an SNMP get-bulk command?

I've also looked at the SNMPAgent internal Browse.Start/Result/RequestId data points, but based on Wireshark traces these also use the get-response/get-next-request pattern to retrieve data.

We've also considered an alternate approach where we poll individual port-based OIDs, as there is a fixed maximum number of ports (16) with a well-known set of final OID index values.
However, doing this generates large amounts of noise in the log, as the OIDs only "exist" if there is actually a cable plugged in to the physical port:

WCCOAsnmp    (1), 2019.03.01 13:04:51.046, PARAM,WARNING,    54, Unexpected state, SNMPAbstractTarget, no data sent for , (AID 16) target 1 1.0.8802.1.1.2.1.4.1.1.7.6.33.33: no such object/no such instance
WCCOAsnmp    (1), 2019.03.01 13:04:51.061, PARAM,WARNING,    54, Unexpected state, SNMPAbstractTarget, no data sent for , (AID 16) target 1 1.0.8802.1.1.2.1.4.1.1.7.6.34.34: no such object/no such instance
WCCOAsnmp    (1), 2019.03.01 13:04:51.076, PARAM,WARNING,    54, Unexpected state, SNMPAbstractTarget, no data sent for , (AID 16) target 1 1.0.8802.1.1.2.1.4.1.1.7.6.35.35: no such object/no such instance
WCCOAsnmp    (1), 2019.03.01 13:04:51.092, PARAM,WARNING,    54, Unexpected state, SNMPAbstractTarget, no data sent for , (AID 16) target 1 1.0.8802.1.1.2.1.4.1.1.7.6.36.36: no such object/no such instance
WCCOAsnmp    (1), 2019.03.01 13:04:51.107, PARAM,WARNING,    54, Unexpected state, SNMPAbstractTarget, no data sent for , (AID 16) target 1 1.0.8802.1.1.2.1.4.1.1.7.6.37.37: no such object/no such instance
WCCOAsnmp    (1), 2019.03.01 13:04:51.124, PARAM,WARNING,    54, Unexpected state, SNMPAbstractTarget, no data sent for , (AID 16) target 1 1.0.8802.1.1.2.1.4.1.1.7.6.38.38: no such object/no such instance
WCCOAsnmp    (1), 2019.03.01 13:04:51.138, PARAM,WARNING,    54, Unexpected state, SNMPAbstractTarget, no data sent for , (AID 16) target 1 1.0.8802.1.1.2.1.4.1.1.7.6.66.66: no such object/no such instance
WCCOAsnmp    (1), 2019.03.01 13:04:51.273, PARAM,WARNING,    54, Unexpected state, SNMPAbstractTarget, no data sent for , (AID 16) target 1 1.0.8802.1.1.2.1.4.1.1.7.6.67.67: no such object/no such instance

Is there any way to suppress or filter (just) this warning message?

The API example ExternLogFeed shows how to add other external logging sources, but it isn't clear if I can use these facilities to, for example, perform custom filtering on the existing SNMP log feed.
These messages only appear in the PVSS_II.log file, not the WCCOAsnmp*.log files, so I could possibly disable this input feed and replace it with a custom one that filters these messages out of the PVSS_II.log presentation in the viewer.
This would still likely result in other log messages that an administrator would want to see being rolled over out of existence by the associated SNMP noise (so if I did write a custom filtering log feed I could also potentially write my own 'less noisy' log file along the way and then deal with roll over of that file and … starts to sound kind-a messy)
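
For what it's worth, the filtering step itself would be small. The following is a standalone Python sketch, not the ExternLogFeed API: it drops only the "no such object/no such instance" warnings and passes everything else through. The function name and the regular expression are my own assumptions based on the log lines above.

```python
import re

# Matches only the SNMPAbstractTarget warnings shown above.
NOISE = re.compile(
    r"SNMPAbstractTarget, no data sent for .*no such object/no such instance\s*$"
)

def filter_noise(lines):
    """Yield log lines, skipping the 'no such object/no such instance' warnings."""
    for line in lines:
        if not NOISE.search(line):
            yield line

# Two lines copied from this post: one noisy warning, one we want to keep.
sample = [
    "WCCOAsnmp    (1), 2019.03.01 13:04:51.046, PARAM,WARNING,    54, Unexpected state, SNMPAbstractTarget, no data sent for , (AID 16) target 1 1.0.8802.1.1.2.1.4.1.1.7.6.33.33: no such object/no such instance",
    'WCCOAsnmp    (1), 2019.03.12 09:07:55.791, PARAM,WARNING,    54, Unexpected state, The _address "17_1.0.8802.1.1.2.1.4.1.1.7B" was already defined with the transformation type 666! Other transformation type 664 will be ignored.',
]
kept = list(filter_noise(sample))
print(len(kept))
```

Matching on the full message text rather than on the WARNING level keeps other SNMP warnings visible.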

Thanks,
Richard

Post by **rsneddon** » Thu Apr 11, 2019 5:54 am

Just in case anyone else stumbles across a similar problem: the issue with >1000 OIDs returned appears to be caused by having two WinCC OA SNMP agents (erroneously) configured with the same target address.
When their polling schedules coincide, the device cannot distinguish between get-nexts from each agent, as they come from the same IP address. The device then starts interleaving replies based on the alternating generated timestamps, and we can never get to the end of the table.

After adding a check in our configuration data generator for duplicate IP addresses, and clearing out all the old _SNMPAgent data points, the failures are no longer occurring.
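
A minimal sketch of such a duplicate check, in Python for illustration only. It assumes the generator has the agents available as (name, address) pairs; the agent names and the addresses (from the documentation range 192.0.2.0/24) are hypothetical.

```python
from collections import defaultdict

def find_duplicate_targets(agents):
    """Return {address: [agent names]} for addresses used by more than one agent."""
    by_address = defaultdict(list)
    for name, address in agents:
        by_address[address.strip()].append(name)
    return {addr: names for addr, names in by_address.items() if len(names) > 1}

# Hypothetical configuration with one address assigned twice.
agents = [
    ("agent_16", "192.0.2.10"),
    ("agent_17", "192.0.2.10"),
    ("agent_18", "192.0.2.11"),
]
duplicates = find_duplicate_targets(agents)
print(duplicates)
```

Running this before any _SNMPAgent data points are created lets the generator refuse a configuration with clashing targets.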

I do, however, still see the flashing between OIDs and Values when using the subindex 2 setting, but as noted above can work around this one.

Richard
