All too often, checking the health of an industrial automation network comes too late. Many production facilities don’t assess the condition of the operational technology (OT) network connecting their control systems to their automated machines until something goes wrong. They implement network monitoring only when they have to resolve disruptions and diagnose problems.
But even if you have a monitoring system from day one, with managed switches and a continuous network monitoring device, like Indu-Sol’s PROFINET-INspektor NT, you still have to know what to look for. Unlike the electrical and safety specs for the automated machines that run on it, an industrial network doesn’t always have clearly defined parameters for determining its operational quality.
Aside from obvious slowdowns or loss in production, gauging the health of a network can be very nuanced. However, there are specific metrics that can be tracked to help identify potential problems early on.
Jitter
Jitter is a key parameter for measuring the health of any OT network. It refers to a delay in telegrams arriving at points in a PROFINET or similar network, measured as the variation between the set update rate for sending data packets to a controller and the actual update rate. So, for example, if telegrams are received every 15 milliseconds from an I/O device with a set update rate of 10 milliseconds, the jitter is 5 milliseconds.
However, just having jitter doesn’t necessarily mean there’s an issue. A fully operational OT network can have an acceptable amount of jitter. For PROFINET networks, the limit is 50% of the set update rate. That’s why, when monitoring the health of a network, it’s important to make sure jitter hasn’t exceeded tolerable levels. If the jitter level is too high, the cause is related to line depth and set update rates. Either there are too many devices in series, or the update rate is set incorrectly, causing too many telegrams to be sent down the same line. If the network isn’t adjusted to bring jitter back to an acceptable level, production will be affected.
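The jitter arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular monitoring product: the function names are hypothetical, and it assumes the 50% limit is measured relative to the set update rate.

```python
PROFINET_JITTER_LIMIT_PERCENT = 50.0  # tolerable limit quoted in this article


def jitter_ms(set_update_ms: float, actual_interval_ms: float) -> float:
    """Deviation between the actual and the set update interval, in ms."""
    return abs(actual_interval_ms - set_update_ms)


def jitter_percent(set_update_ms: float, actual_interval_ms: float) -> float:
    """Jitter relative to the set update rate (assumed reference for the limit)."""
    return 100.0 * jitter_ms(set_update_ms, actual_interval_ms) / set_update_ms


def worst_jitter_percent(set_update_ms: float, arrival_times_ms: list[float]) -> float:
    """Largest jitter found between consecutive telegram arrival times."""
    if len(arrival_times_ms) < 2:
        raise ValueError("need at least two arrival times")
    intervals = [
        later - earlier
        for earlier, later in zip(arrival_times_ms, arrival_times_ms[1:])
    ]
    return max(jitter_percent(set_update_ms, interval) for interval in intervals)


def exceeds_limit(percent: float, limit: float = PROFINET_JITTER_LIMIT_PERCENT) -> bool:
    """True only when jitter is above the tolerable limit."""
    return percent > limit
```

With the example from the text, jitter_ms(10, 15) gives 5 and jitter_percent(10, 15) gives 50.0, which sits exactly at the limit rather than above it.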
Frame Gaps
Unlike jitter, there’s no acceptable amount of frame gaps for an OT network. Frame gaps, or "telegram gaps", are missing telegrams. They’re usually discovered when an I/O device or controller resends a message after not receiving a reply. The question then becomes: how did the telegram disappear? There could be a variety of causes, such as issues with a cable or EMC disturbances. But no matter what, any frame gap indicates a serious problem. Thankfully, most monitoring systems that can detect frame gaps can help pinpoint where in the network topology they occur and thus help focus any search for the cause.
When investigating telegram gaps, there are two possible causes that should always be considered: missing and incorrect telegrams in the switches. Missing telegrams (discards) occur when a switch cannot send a telegram because the line is busy and there is not enough memory to buffer the data packets, which are therefore discarded. Incorrect telegrams (errors) occur when a PROFINET switch analyzes the received data packets, finds that errors occurred during transmission, and deletes the invalid telegrams. Both situations lead to telegram gaps and can often be resolved by replacing the switches or changing the topology.
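One way to narrow down where discards and errors occur is to compare per-port switch counters between two readings. The Python sketch below assumes the counters have already been collected by some means (managed switches typically expose them); the PortCounters type, the port names, and the function name are hypothetical and only illustrate the comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortCounters:
    discards: int  # telegrams dropped because the line was busy and memory ran out
    errors: int    # telegrams deleted because they were found to be invalid


def ports_with_new_losses(
    before: dict[str, PortCounters],
    after: dict[str, PortCounters],
) -> dict[str, PortCounters]:
    """Return the counter increase for every port that lost telegrams."""
    findings = {}
    for port, now in after.items():
        # A port missing from the first reading is treated as starting at zero.
        previous = before.get(port, PortCounters(0, 0))
        new_discards = now.discards - previous.discards
        new_errors = now.errors - previous.errors
        if new_discards > 0 or new_errors > 0:
            findings[port] = PortCounters(new_discards, new_errors)
    return findings
```

Because any frame gap counts as a problem, the sketch reports every port whose counters increased at all instead of applying a threshold.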
Net Load
The phrase “net load” refers to network load, which is the measurement of bandwidth usage compared to the network’s maximum capacity, expressed as a percentage. It’s essential for gauging the data throughput between devices, which is crucial for OT networks, like those using PROFINET. Unlike traditional office IT networks that have an eight-wire connection to allow gigabit speed, PROFINET networks have a four-wire connection with a communication limit of 100 megabits per second. The recommended net load for a PROFINET network is 20% or lower. You can examine the net load for specific points in a network at the switches. Tracking net load and updating the network to handle increasing levels will prevent devices and switches from becoming overloaded and causing system outages.
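As a rough illustration of the net load percentage, the Python sketch below converts a byte count observed over a time interval into a share of the 100 megabits per second link capacity and checks it against the 20% recommendation. The function names are hypothetical, and how the byte count is obtained from a switch is left open.

```python
PROFINET_LINK_BPS = 100_000_000   # 100 megabits per second, as stated in the text
RECOMMENDED_MAX_PERCENT = 20.0    # recommended net load ceiling from the text


def net_load_percent(
    bytes_transferred: int,
    interval_s: float,
    link_bps: int = PROFINET_LINK_BPS,
) -> float:
    """Bandwidth used during the interval as a percentage of link capacity."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    bits_transferred = bytes_transferred * 8
    return 100.0 * bits_transferred / (link_bps * interval_s)


def within_recommendation(percent: float) -> bool:
    """True when the net load is at or below the recommended ceiling."""
    return percent <= RECOMMENDED_MAX_PERCENT
```

For example, 2,500,000 bytes in one second is 20,000,000 bits, which is 20% of a 100 megabits per second link and therefore right at the recommended ceiling.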
You should also keep the net load in mind when choosing switches for a new installation. PROFINET switches are divided into net load classes I, II and III, with class III being the one that can handle the highest load. Simpler switches, on the other hand, can only handle a smaller percentage of the net load before their function is affected. Indu-Sol’s PROnet plan program makes it easier to choose switches that can handle the net load in a network.
Load Ratio
Load ratio is a common factor for evaluating communications networks. It describes the relative amounts of different data types being transferred, which can affect one another. The load ratio of a PROFINET network, for example, refers to the proportion of PROFINET telegrams to TCP/IP and UDP telegrams. The recommended ratio is about 100 PROFINET telegrams for every TCP/IP or UDP telegram, or 100:1, to ensure stable communication and extend the system’s lifespan. This is because PROFINET uses the same OSI model as TCP/IP but skips layers, making PROFINET telegrams smaller than TCP/IP and UDP packets for faster transmission time. If the ratio falls below 100:1 (say, for example, to 70:1), the excessive TCP/IP and UDP traffic can slow PROFINET traffic, overload switches, and cause dropped PROFINET telegrams.
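The load ratio check can be sketched the same way. The Python example below assumes telegram counts per traffic type are already available from a monitoring tool; the function names are hypothetical, and the 100:1 figure is the recommendation given in the text.

```python
RECOMMENDED_RATIO = 100.0  # PROFINET telegrams per TCP/IP or UDP telegram


def load_ratio(profinet_telegrams: int, tcp_udp_telegrams: int) -> float:
    """Number of PROFINET telegrams per TCP/IP or UDP telegram."""
    if tcp_udp_telegrams == 0:
        # No competing traffic was seen, so the ratio is unbounded.
        return float("inf")
    return profinet_telegrams / tcp_udp_telegrams


def meets_recommendation(profinet_telegrams: int, tcp_udp_telegrams: int) -> bool:
    """True when the ratio is at least the recommended 100:1."""
    return load_ratio(profinet_telegrams, tcp_udp_telegrams) >= RECOMMENDED_RATIO
```

With 7,000 PROFINET telegrams and 100 TCP/IP or UDP telegrams, load_ratio returns 70.0, matching the 70:1 case in the text, and meets_recommendation returns False.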
Maintaining a healthy OT network isn’t easy. Meeting production demands while safeguarding against potential issues requires vigilance. But it doesn’t need to be all-consuming. Understanding what to watch for will make proactive monitoring much easier. Knowing these metrics brings you one step closer to preventing disruptions and keeping your operation running smoothly.