PRP and HSR: Redundancy protocols

Posted date 03/08/2017

Author

INCIBE (INCIBE)

Industrial control systems have evolved towards automating most states and events, reducing manual control to minimal interaction. Handling critical process interruptions within the required response time has raised new network-level requirements. Choosing the right network protocols when designing industrial networks is therefore essential to the security of critical systems.

The recovery time of network communication protocols after a link loss is critical in certain industrial systems. Traditional protocols recalculate the entire path when a link failure is reported. Although their recovery algorithms are transparent to the upper layers, the time it takes to recalculate the links adds to the transmission time of a message.

During a critical event, a delay of even a few milliseconds can be just as serious as the loss of information.

When a protection system is used, the window of exposure of a device must tend to "zero"; otherwise the system can no longer be considered a protection system.

Connection between system type and acceptable reception times of messages

The traditional layer 2 protocols for loop detection and link recovery, Spanning Tree Protocol (STP) and Rapid Spanning Tree Protocol (RSTP), are limited by the time they need to reconfigure the network topology, so they are not adequate for the current demands of certain critical systems. To meet this demand, the redundancy protocols Parallel Redundancy Protocol (PRP) and High-availability Seamless Redundancy (HSR) were developed and published in the standard IEC 62439-3.

Network recovery times of the most common protocols

Redundancy protocols are used at the lower levels of the ISA-95 pyramid, that is, at the field level and the control level.

PRP

PRP is a protocol that ensures high availability by reducing the network recovery time, and therefore the added transmission delay, to "zero". It is based on two independent networks at all levels, LAN A and LAN B, and sends the same message over both networks at the same time.

The device must send each frame through both of its network interfaces, one port per network, using the same MAC address and the same IP address on both.

Differences between Dual LAN, Redundant LAN and PRP

Neither the configuration of these two networks nor the switches need to change in order to use this protocol. The latency of both networks should be similar, although it does not have to be identical. If the latencies were very different, the first copy of each frame would always arrive through the same network, and the receiver would have to wait that time difference for the second copy.

PRP information travels inside a standard Ethernet frame. This provides one of the major advantages of the protocol: its use is transparent to the network, so devices that use PRP and devices that do not can communicate over the same network.

Because of this feature, two types of devices can be found in each network: a device with integrated redundancy, DANP (Doubly Attached Node implementing PRP), and a device with no redundancy, SAN (Singly Attached Node).

PRP information flow

DANP devices perform the additional operations of this protocol through a Link Redundancy Entity (LRE), which duplicates outgoing frames and manages the redundant incoming frames transparently to the other layers of the OSI model. This operation is performed at layer 2.

The work performed by the LRE consists of:

Receiving a frame from layer 3. From it, the LRE creates two new frames by appending a series of bytes called the Redundancy Control Trailer (RCT) after the payload.
Sending the two frames at the same time, each through its corresponding port, to the two independent networks.
At the receiver, removing the RCT and passing the frame to the upper layers. The algorithm that discards frames can be configured in several ways. Usually the LRE discards the duplicate frame itself: the frame ID is saved, and a later frame with the same ID is discarded. Another valid configuration passes duplicate frames to the upper layers.
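
The sending and discarding steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the algorithm defined in IEC 62439-3: the class name, the `window` parameter and the choice of (source MAC, sequence number) as the frame ID are all hypothetical, and frames are modelled as simple tuples instead of real Ethernet frames.

```python
from collections import OrderedDict


class LinkRedundancyEntity:
    """Toy LRE: duplicates outgoing frames and drops incoming duplicates."""

    def __init__(self, window=1024):
        self._next_seq = 0            # 16-bit sequence counter
        self._seen = OrderedDict()    # frame IDs already delivered
        self._window = window         # assumed limit on remembered IDs

    def send(self, payload):
        """Return one copy of the frame per LAN, both with the same sequence number."""
        seq = self._next_seq
        self._next_seq = (self._next_seq + 1) % 65536
        return {"A": (seq, payload), "B": (seq, payload)}

    def receive(self, source_mac, seq, payload):
        """Return the payload for the first copy, or None for a duplicate."""
        frame_id = (source_mac, seq)  # assumption: ID = source MAC + sequence number
        if frame_id in self._seen:
            return None               # second copy: discard
        self._seen[frame_id] = True
        if len(self._seen) > self._window:
            self._seen.popitem(last=False)  # forget the oldest ID
        return payload
```

In this sketch the receiver delivers whichever copy arrives first, regardless of the LAN it came from, so the loss of one network does not delay delivery.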

The following scheme shows the Ethernet frame and the PRP frame (Ethernet + RCT). The RCT structure adds 6 bytes:

Sequence number: 16 bits. Incrementing number used to identify duplicates.
Network indicator: 4 bits. It identifies LAN A (0xA) or LAN B (0xB).
Frame size: 12 bits. Size of the frame including the RCT.
PRP suffix: 16 bits. It identifies the frame as a PRP frame (0x88FB).
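
As a check on the arithmetic, the four fields add up to 16 + 4 + 12 + 16 = 48 bits, that is, 6 bytes. The short Python sketch below packs and unpacks a trailer with that layout. The function names are hypothetical, and big-endian (network) byte order is assumed, as is usual for Ethernet fields.

```python
import struct

PRP_SUFFIX = 0x88FB
LAN_A, LAN_B = 0xA, 0xB


def build_rct(seq, lan_id, size):
    """Pack sequence number, LAN indicator and size into a 6-byte RCT."""
    if lan_id not in (LAN_A, LAN_B):
        raise ValueError("LAN indicator must be 0xA or 0xB")
    if not 0 <= size < 4096:
        raise ValueError("size must fit in 12 bits")
    lan_and_size = (lan_id << 12) | size  # 4 bits + 12 bits share one 16-bit word
    return struct.pack("!HHH", seq & 0xFFFF, lan_and_size, PRP_SUFFIX)


def parse_rct(trailer):
    """Return (sequence number, LAN indicator, size) from a 6-byte RCT."""
    seq, lan_and_size, suffix = struct.unpack("!HHH", trailer)
    if suffix != PRP_SUFFIX:
        raise ValueError("not a PRP trailer")
    return seq, lan_and_size >> 12, lan_and_size & 0x0FFF
```

For example, `build_rct(1, LAN_A, 52)` yields the bytes `00 01 a0 34 88 fb`, and the copy sent on LAN B differs only in the network indicator.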

Redundancy Control Trailer frame format PRP

The main characteristics of this protocol are:

LAN A and LAN B must fail independently and operate in parallel. They must have similar latencies, but they do not have to be identical.
Each SAN device must be connected to only one of the two LANs (A or B).
LAN A switches and LAN B switches are considered SANs.
Every DANP has to be connected to both networks. It must use the same IP and the same MAC on its A and B ports, and these addresses must be unique in the network.
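
The attachment rules above lend themselves to a simple consistency check. The following Python sketch is only an illustration: the function name and the dictionary layout used to describe each node (`name`, `kind`, and `ports` mapping a LAN letter to a `(MAC, IP)` pair) are assumptions made for this example, not part of PRP.

```python
def check_prp_attachments(nodes):
    """Return a list of rule violations for a described PRP network."""
    problems = []
    owners = {}  # (MAC, IP) pair -> name of the node that uses it
    for node in nodes:
        name, kind, ports = node["name"], node["kind"], node["ports"]
        if kind == "DANP":
            if set(ports) != {"A", "B"}:
                problems.append(f"{name}: a DANP must be attached to LAN A and LAN B")
            elif ports["A"] != ports["B"]:
                problems.append(f"{name}: a DANP must use the same MAC and IP on both ports")
        elif kind == "SAN":
            if len(ports) != 1:
                problems.append(f"{name}: a SAN must be attached to exactly one LAN")
        # Addresses must be unique across nodes.
        for address in set(ports.values()):
            owner = owners.setdefault(address, name)
            if owner != name:
                problems.append(f"{name}: address {address} is already used by {owner}")
    return problems
```

An empty list means the description respects the rules listed above; each string in the result names the node and the rule it breaks.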

HSR

HSR is a redundancy protocol, just like PRP. It ensures high availability by reducing the network recovery time, and therefore the added transmission delay, to "zero". Redundancy is provided by the devices themselves: each layer 3 frame is turned into two identical HSR frames, which are sent through the device's two ports into a ring network, in opposite directions. The devices linked by this ring are called DANH (Doubly Attached Node implementing HSR).

HSR information flow

HSR operates differently from PRP. Both work at layer 2, the data link layer, but whereas PRP carries its information in the data field of a standard Ethernet frame, HSR modifies the Ethernet frame header by adding its own HSR fields. For this reason, the network must contain only DANH nodes.

The main differences with the PRP protocol are:

There is a single LAN, with a mandatory ring topology.
All devices in this network must understand the HSR protocol; they are DANH nodes.
Switches are SAN devices, so they cannot be inserted into the ring.
Each device incorporates a mechanism for sending and receiving frames. Since all nodes are chained together, each one also acts as a bridge for every frame of which it is neither the source nor the destination: it simply forwards the frames received on one port out of the other, letting them "flow" to their destination.
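
The effect of sending two copies in opposite directions can be illustrated with a small Python simulation. It is a deliberately simplified model, not an HSR implementation: the function name and the representation of the ring as a list of node names are assumptions, and every intermediate node simply forwards the frame to its other neighbour.

```python
def deliver_over_ring(nodes, source, destination, broken_links=frozenset()):
    """Return the hop count of each copy that reaches the destination.

    nodes lists the ring members in order; broken_links is a set of
    frozenset({node_a, node_b}) pairs describing failed links.
    """
    if source == destination:
        raise ValueError("source and destination must differ")
    n = len(nodes)
    start = nodes.index(source)
    arrivals = {}
    for direction, step in (("clockwise", 1), ("counterclockwise", -1)):
        position, hops = start, 0
        while True:
            nxt = (position + step) % n
            if frozenset((nodes[position], nodes[nxt])) in broken_links:
                break                      # this copy is lost at the failed link
            position, hops = nxt, hops + 1
            if nodes[position] == destination:
                arrivals[direction] = hops  # destination takes the frame off the ring
                break
            if position == start:
                break                      # back at the source without finding the destination
    return arrivals
```

With a ring `["A", "B", "C", "D"]`, a frame from A to C arrives twice when the ring is intact, and still arrives once, with no reconfiguration, when the link between A and B fails.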

HSR protocol frame scheme:

HSR frame format

Through an intermediate device called a RedBox (Redundancy Box), SAN devices can become VDAN devices (Virtual Doubly Attached Node) for both HSR and PRP. This allows any device to be integrated into the redundant network.

The RedBox devices have two interesting features:

They are independent of the device they make redundant.
They are reusable: they can be moved to a different device, network or protocol.

Current RedBox devices are very versatile and can be configured by the administrator:

They allow the output protocol (HSR or PRP) to be chosen.
They can duplicate frames for more than one device at a time, for example by means of IP whitelists.
They can be reconfigured remotely, which allows a redundant device to be removed from the network without any stoppage or manual rewiring.
They offer configurable security mechanisms, such as SSL, and a choice of authentication systems.

Networks and Security

Redundancy protocols improve two of the three pillars of security by duplicating messages: availability and integrity. However, they do not add any confidentiality to messages, so the level of security is not increased from a logical point of view.

These protocols work in the lower layers of the OSI model, so it is necessary to make sure that the protocols running above them, especially at the application layer, provide the security that the redundancy protocols lack.

Implementation of redundancy protocols in a control system

Critical systems require additional cybersecurity measures. PRP and HSR allow us to strengthen two of the most important factors: availability and integrity. Minimising the attack vectors against both through redundancy protocols is a great advantage for the security of critical systems.

Before using these protocols, we must carry out an in-depth analysis of the needs of our industry. The technical and economic cost of deploying and maintaining the devices these protocols require has to be weighed up, although in certain environments their use is mandatory.

Tags

Best practices