View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0009618 | Part 81: UAFX Connecting Devices and Information Model | Spec | public | 2024-06-21 14:09 | 2024-09-06 13:55 |
Reporter | Brian Batke | Assigned To | Brian Batke | ||
Priority | normal | Severity | major | Reproducibility | have not tried |
Status | assigned | Resolution | open | ||
Summary | 0009618: No mechanism to handle case of never receiving first data | ||||
Description | Consider a bidirectional connection where Controller (or Device) A starts publishing and its publications are received at B, but the publications from B are never received at A, for example because of a faulty cable or some other network issue. In this case, the subscriber on A will be stuck in preoperational, and the connection will not time out. B would not know that its publications are not being received unless it somehow explicitly checks, which should not be required behavior. There needs to be a mechanism to handle this situation. | ||||
Tags | No tags attached. | ||||
related to | 0009619 | assigned | Brian Batke | Missing description of connection behavior when heartbeat or data subscription times out |
|
Discussion from working group meeting |
|
Sending a status in the PubSub message (indicating preoperational) may be informative, but it doesn't really solve the problem. Consider a case where both Endpoint 1 and Endpoint 2 begin publishing, but neither is receiving any messages because of some kind of network configuration issue. Neither would ever see the status indicating that the other side is still preoperational, so both remain in preoperational indefinitely. Or consider the unidirectional case where EP2 starts sending (it is operational) but EP1 never receives any messages because of a network error. EP1 would remain in preoperational indefinitely. There needs to be some kind of timeout, probably configurable, to handle these situations. |
|
In the described case, A would know that something is wrong with B, because it never receives any data from B. Any diagnosing entity observing A would see EP1 in Preoperational, which according to Part 81 is the case only if one or both of the DSR and DSW are in preoperational. |
|
I edited the original description to correct a mix-up of endpoints A and B. I think there are two example cases of interest: one endpoint having an issue, and both endpoints having an issue.
In the case of a controller opening connections via its CM, the controller needs (in my experience at least) a time limit on connections being established and providing data in order for the user application to run. The possibility of a connection sitting in a "waiting for data" state indefinitely is a problem, and a timeout for going operational would solve it. We could say that this should be the application's responsibility, but it seems to me that it should be part of the connection state machine. You wouldn't expect a TCP connection to complete the handshake, start sending data on one side, and then sit waiting forever for the other side. It is a different sort of protocol, but the principle is similar. |
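A minimal sketch of what such a timeout might look like, purely for illustration. Nothing here comes from Part 81: the class name `SubscriberMonitor`, the `first_data_timeout_s` parameter, and the `ERROR` state are all hypothetical, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    PREOPERATIONAL = "preoperational"
    OPERATIONAL = "operational"
    ERROR = "error"  # hypothetical state name, not taken from Part 81


@dataclass
class SubscriberMonitor:
    """Hypothetical watchdog for a subscriber waiting on its first message."""

    first_data_timeout_s: float  # assumed to be configurable per connection
    started_at: float            # time at which the subscriber was enabled
    state: State = State.PREOPERATIONAL

    def on_message(self) -> None:
        # The first received message moves the subscriber out of preoperational.
        # A message arriving after the timeout fired does not recover the state.
        if self.state is State.PREOPERATIONAL:
            self.state = State.OPERATIONAL

    def poll(self, now: float) -> State:
        # Called periodically; expires only while still waiting for first data.
        if (self.state is State.PREOPERATIONAL
                and now - self.started_at >= self.first_data_timeout_s):
            self.state = State.ERROR
        return self.state


# Example: subscriber on A enabled at t=0 with a 5 s limit, B's data never arrives.
monitor = SubscriberMonitor(first_data_timeout_s=5.0, started_at=0.0)
early = monitor.poll(now=4.0)   # still waiting
late = monitor.poll(now=5.0)    # limit reached, no data seen
```

Whether the expired connection should be closed, retried, or only reported is a separate decision that this sketch leaves open.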
|
The additional status in the header would eliminate the case where one side is stuck in preoperational (i.e. your case 1), since this would be reported to the other side. To me, the issue with a simple timeout is that, depending on why the endpoint is preoperational, the timeout value and/or the appropriate action are very different. If it is waiting on an SKS, the CM should be checking the SKS, and this operation could be slow, so a longer timeout might be needed. If it is not receiving an initial value, it would have a different timeout, probably fairly short. The CM in this case would want to check whether the other node is reachable and configured correctly (if TSN is involved, the CM might be looking at the TSN configuration, and the timeout might again be larger). I think in all cases, as long as the CM is monitoring, we would not have a leak; we would just be reporting a problem that the application or engineer can correct. Cases like a bad switch configuration or a stuck SKS would then be fixed without any changes to the connection. |
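To make the point about reason-dependent timeouts concrete, here is a rough sketch. Everything in it is an assumption for illustration: the `WaitReason` names, the default values in seconds, and the TSN scaling factor are invented and do not come from the specification.

```python
from enum import Enum


class WaitReason(Enum):
    SECURITY_KEYS = "waiting for keys from the SKS"
    FIRST_DATA = "waiting for an initial value"


# Hypothetical defaults in seconds: key retrieval may be slow, first data should not be.
DEFAULT_TIMEOUTS_S = {
    WaitReason.SECURITY_KEYS: 60.0,
    WaitReason.FIRST_DATA: 5.0,
}

# Hypothetical multiplier applied when TSN configuration is involved.
TSN_FACTOR = 2.0


def preoperational_timeout(reason, uses_tsn=False, timeouts=DEFAULT_TIMEOUTS_S):
    """Return the time limit for the given reason, stretched if TSN is in use."""
    base = timeouts[reason]
    return base * TSN_FACTOR if uses_tsn else base


def should_report(reason, elapsed_s, uses_tsn=False):
    """True once the endpoint has been preoperational longer than its limit."""
    return elapsed_s >= preoperational_timeout(reason, uses_tsn)


# After 30 s, a missing initial value is reported, but a pending SKS request is not.
report_first_data = should_report(WaitReason.FIRST_DATA, elapsed_s=30.0)
report_keys = should_report(WaitReason.SECURITY_KEYS, elapsed_s=30.0)
```

In this sketch the result is only a "report a problem" signal for the CM; the connection itself is left untouched, in line with the idea that a corrected switch configuration or a recovered SKS would let it proceed without changes.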
|
We should talk about this in a meeting. This was described better with Jan's slides. |
Date Modified | Username | Field | Change |
---|---|---|---|
2024-06-21 14:09 | Brian Batke | New Issue | |
2024-07-20 06:12 | Paul Hunkar | Note Added: 0021497 | |
2024-07-20 12:37 | Brian Batke | Note Added: 0021499 | |
2024-07-23 12:20 | David Puffer | Note Added: 0021502 | |
2024-07-23 12:23 | David Puffer | Note Edited: 0021502 | |
2024-07-23 15:14 | Brian Batke | Description Updated | |
2024-07-23 20:47 | Brian Batke | Note Added: 0021506 | |
2024-08-16 12:38 | Paul Hunkar | Relationship added | related to 0009619 |
2024-08-16 12:39 | Paul Hunkar | Assigned To | => Brian Batke |
2024-08-16 12:39 | Paul Hunkar | Status | new => assigned |
2024-09-06 05:56 | Paul Hunkar | Note Added: 0021667 | |
2024-09-06 13:55 | Brian Batke | Note Added: 0021673 |