View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0009619 | Part 81: UAFX Connecting Devices and Information Model | Spec | public | 2024-06-21 18:42 | 2024-11-01 14:08 |
Reporter | Brian Batke | Assigned To | Brian Batke | ||
Priority | normal | Severity | major | Reproducibility | have not tried |
Status | closed | Resolution | no change required | ||
Summary | 0009619: Missing description of connection behavior when heartbeat or data subscription times out | ||||
Description | Consider the case of a fully active connection that is either unidirectional with heartbeat or bidirectional. Data is being sent and received at both endpoints. If the subscriber on either endpoint times out, the publisher should stop. Otherwise the other endpoint may never know that its published data is not being received. For example, consider a controller that is sending outputs to and receiving inputs from a device. If the device stops receiving the outputs (inputs to the device) but keeps on sending inputs, the controller will never know that its outputs are not being received. This was the reason for defining a heartbeat connection. If there is no action to be taken in the event of a timeout, there is no reason to have the heartbeat. It should not be required for an endpoint to "poll" the other to get its status. It could be that such a status could be communicated in the PubSub data, but there is no definition for this, and it would not work in the case of a heartbeat. | ||||
Tags | No tags attached. | ||||
related to | 0009618 | assigned | Brian Batke | No mechanism to handle case of never receiving first data |
|
I think it would be better to change the status in the header instead of STOP. The status is also sent for the heartbeat. If the Writer is stopped, this could cause different timing-related issues in situations where communication is not reliable and state changes more frequently. The status may even be used for the case where the Reader is in preoperational to indicate to the other side that nothing has been received so far. |
|
Which header? The network message header? I don't see a status in the network msg header. If it is a heartbeat message, then there would be no other header, correct? But let's say that it is possible to add a status to the header. Seems like you would then need a specific status code to say that the corresponding reader is not receiving data. And then we would still need to define behavior when the other side receives the bad status, i.e., it would then need to stop publishing. So in the end, the result needs to be the same. We have this pattern in other protocols (EtherNet/IP and PROFI) with stopping production when the receiving side is in error or timeout, and this does not cause any timing issues. |
|
The NetworkMessage is the wrong level. The NetworkMessage is just a transport container for the DataSetMessages. And each DataSetMessage has a Status, even if the payload is empty for heartbeat messages. An FX connection works on DataSetWriter/Reader and therefore on DataSetMessage level. In theory, multiple DataSetMessages from multiple FX connections in different states may be contained in a NetworkMessage. For Periodic Fixed the NetworkMessages would still be sent even if one of the Writers in the WriterGroup is disabled. The corresponding DataSetMessages would have the first (valid) bit set to false. I saw the slides from Jan today and do not like the idea that one side behaves differently than the other. This requires additional configuration and it is not symmetric. Sending a dedicated Status instead of stopping the Writer would have the same effect (indicating the error faster to the other side), but recovery would be faster and more reliable if the problem is temporary and resolved before the clean-up time is over. Since the Status is already in the DataSetMessage, there is no change in PubSub necessary. And the bundling / error handling of Reader/Writer pairs must be defined in FX anyhow. I think a dedicated Status has the same result but is more reliable and gives applications more information and flexibility. |
|
OK, I didn't realize that the DataSet message header would always be included with a heartbeat. But the definition of the Status in that header says: "The overall status of the DataSetMessage", which would seem to be contrary if the status then said something like "the corresponding reader is in error". And then in that case, we would still need to define what the subscriber is supposed to do when it receives that status. And what it would need to do is to stop publishing for the corresponding writer. So the end effect is the same. If you don't stop publishing or otherwise tear down the connection, then it is forever stuck. But we should probably discuss all this in a meeting. |
|
If there is a related Reader that is relevant for the "overall status", I do not see a problem with indicating in the status a problem that is not directly a problem in the DataSetMessage itself, since from the OPC UA FX point of view the "overall status of the DataSetMessage" is not GOOD. I agree that you still need to define the behaviour, but this is also the case if you simply stop the Writer. The behaviour of clean-up could be the same for "did not receive DataSetMessages" and "received DataSetMessages with Uncertain_RelatedReaderError". |
|
This issue is about missing description in case of "loss of heartbeat" or "loss of subscribed data". "5.5.4.2 Operation Edit: If this is the problem that was discussed on Tuesday, I think we need to differentiate: 1) Loss of heartbeat or loss of subscribed data (indicating transmission failure on device or frame propagation on wire) vs 1) is covered with what is already specified. Why is it relevant if this takes CleanupTimeout2 + CleanupTimeout1? The Connection is supposed to exist, and its failure is an error condition. Cleaning up the connection will free resources on either end, which are supposed to be used in any case, once the reason for the failure has been resolved. Edit2: If Endpoint1 was interested in the status of Endpoint2, regardless of whether a CleanupTimeout was defined, it could always map the Status Variable of the ConnectionEndpoint into a data message, rather than using a heartbeat. |
|
I agree with David that for the case of a running connection (both sides are in operation) no additional text or work is required, but the solution that Matthias described, with some standard uncertain status code, is worth implementing. I do think this is an issue that initially would have to be handled in Part 14, and then we could make use of it in our specs. Matthias's proposed solution could be used to handle the case where one or both devices have never reached operation (and are only in pre-operational, due to missing keys or missing values etc). It could also be useful for an application that would like to provide feedback that there is a problem at the application level. |
|
In the Operational case, I suppose we could say that the CleanupTimeout will handle this. If the reader stops receiving and goes to error, the ConnectionEndpoint goes to error, and eventually it will be cleaned up and the writer will stop as well. It may be problematic if that is a long cleanup timeout (if you want a quick detection of a problem), but then the user or engineering tool could just set a short cleanup timeout. The Preoperational case is another problem though. |
|
See previous note. Determined to be "not a problem". Connection Endpoint will be deleted after the cleanup timeout expires, which will then cause data to stop being produced. If users want quick feedback and endpoint deletion, a short cleanup timeout can be set. |
|
Agreed in call that this issue is not a problem and no changes are required.
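
For readers following the resolution, the behavior described in the last notes (reader times out, ConnectionEndpoint goes to error, endpoint is deleted after the cleanup timeout, writer stops) can be illustrated with a minimal sketch. This is not taken from the specification: the class, field, and state names are hypothetical, the cleanup timer is assumed to start when the error is detected, and it is assumed that data arriving before cleanup returns the endpoint to Operational.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    OPERATIONAL = "Operational"
    ERROR = "Error"
    CLEANED_UP = "CleanedUp"  # hypothetical marker for a deleted endpoint


@dataclass
class EndpointSketch:
    """Hypothetical model of one ConnectionEndpoint with a reader/writer pair."""
    receive_timeout: float   # seconds without data before the reader is in error
    cleanup_timeout: float   # seconds spent in Error before the endpoint is deleted
    state: State = State.OPERATIONAL
    writer_enabled: bool = True
    last_rx: float = 0.0
    error_since: Optional[float] = None

    def on_message(self, now: float) -> None:
        """Data or heartbeat received; assumed to clear a pending error."""
        if self.state is State.CLEANED_UP:
            return  # a deleted endpoint no longer processes messages
        self.last_rx = now
        self.state = State.OPERATIONAL
        self.error_since = None

    def tick(self, now: float) -> None:
        """Periodic check of the receive timeout and the cleanup timeout."""
        if self.state is State.OPERATIONAL and now - self.last_rx > self.receive_timeout:
            self.state = State.ERROR
            self.error_since = now
        if self.state is State.ERROR and now - self.error_since >= self.cleanup_timeout:
            # Endpoint is deleted, which also stops data production.
            self.state = State.CLEANED_UP
            self.writer_enabled = False


ep = EndpointSketch(receive_timeout=1.0, cleanup_timeout=5.0)
ep.on_message(0.0)
ep.tick(1.5)   # reader timed out, endpoint in Error, writer still publishing
ep.tick(6.5)   # cleanup timeout elapsed, endpoint deleted, writer stopped
```

With a short `cleanup_timeout` the peer sees production stop soon after the receive timeout, which is the quick-feedback option mentioned in the notes.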
Date Modified | Username | Field | Change |
---|---|---|---|
2024-06-21 18:42 | Brian Batke | New Issue | |
2024-07-11 12:43 | Matthias Damm | Note Added: 0021446 | |
2024-07-11 15:09 | Brian Batke | Note Added: 0021447 | |
2024-07-11 17:01 | Matthias Damm | Note Added: 0021448 | |
2024-07-12 11:45 | Brian Batke | Note Added: 0021452 | |
2024-07-12 13:27 | Matthias Damm | Note Added: 0021454 | |
2024-07-19 10:42 | David Puffer | Note Added: 0021491 | |
2024-07-19 10:51 | David Puffer | Note Edited: 0021491 | |
2024-07-19 10:56 | David Puffer | Note Edited: 0021491 | |
2024-08-16 12:38 | Paul Hunkar | Relationship added | related to 0009618 |
2024-08-16 12:39 | Paul Hunkar | Assigned To | => Brian Batke |
2024-08-16 12:39 | Paul Hunkar | Status | new => assigned |
2024-09-06 04:59 | Paul Hunkar | Note Added: 0021665 | |
2024-09-06 13:44 | Brian Batke | Note Added: 0021670 | |
2024-11-01 14:02 | Brian Batke | Status | assigned => resolved |
2024-11-01 14:02 | Brian Batke | Resolution | open => fixed |
2024-11-01 14:02 | Brian Batke | Fixed in Version | => 1.00.03 |
2024-11-01 14:02 | Brian Batke | Note Added: 0021965 | |
2024-11-01 14:07 | Paul Hunkar | Resolution | fixed => no change required |
2024-11-01 14:08 | Paul Hunkar | Status | resolved => closed |
2024-11-01 14:08 | Paul Hunkar | Fixed in Version | 1.00.03 => |
2024-11-01 14:08 | Paul Hunkar | Note Added: 0021966 |