Ethernet bond failure detection - sockets

I was testing my transmitter application on an Ethernet interface and deliberately brought the interface down with "ifdown eth0". This stopped message transmission, and the sendto() socket call returned ENETUNREACH. When the same interface was brought back up with "ifup eth0", message transmission resumed automatically.
However, when I repeat the same steps with a bonded Ethernet interface, the error does not go away. In other words, sendto() returns ENETUNREACH when the bonded interface goes down, but when the interface is brought back up, transmission does not resume. Instead, the error changes to ENODEV.
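Roughly, my transmit loop looks like the following simplified sketch (socket setup, buffers, and addresses are omitted):
#include <cerrno>
#include <sys/socket.h>

// Sketch of the transmit path and the errno values I observe.
void transmit(int sock, const void* buf, size_t len,
              const sockaddr* dest, socklen_t dlen) {
    if (sendto(sock, buf, len, 0, dest, dlen) < 0) {
        if (errno == ENETUNREACH) {
            // interface is down; on a plain interface a later retry succeeds
        } else if (errno == ENODEV) {
            // seen only with the bond, and it persists even after "ifup"
        }
    }
}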
Is there any action an application needs to perform on bonded-interface failure in order to recover from it? If so, how does the application come to know about the failure?
In the case of an InfiniBand bond, the application receives RDMA errors such as PORT_ERR, so it is easy to reconnect the socket.
Also, is there any specific bond configuration that auto-recovers from such failures, or that indicates the failure to the application? As far as I understand, a bonded interface should behave like a normal interface and recover from failures automatically.
Appreciate your help!

Related

How to use the TCP keep_alive property to get notified of an unresponsive peer?

Scenario:
I have a client and a server written using boost::asio 1.63. In general, the connection and communication parts work well.
I have written a Watchdog on both sides; each sends a dummy packet to its peer every 2 seconds. The objective of the watchdog is that a peer reports a connection error if it does not receive the dummy packet it expects within the next 2 seconds. This matters even more because the two peers may not be exchanging packets for any user purpose, yet each is still required to report a connection error if the other goes down. A peer can go down even because of a kernel crash, in which case it has no chance to send a message. This is, of course, a classic problem that exists beyond asio and TCP.
My Watchdog works perfectly well. No issues at all.
But recently I read about the keep_alive feature of sockets. I tried out the following code, and it seems I can set a property called keep_alive on the TCP socket by getting the native handle to the socket from within my boost::asio code.
#include <boost/asio.hpp>
#include <netinet/tcp.h>  // defines TCP_KEEPALIVE on macOS/iOS; Linux names this option TCP_KEEPIDLE

boost::asio::io_service ioService;
boost::asio::ip::tcp::socket mySocket(ioService);

int on = 1;
int delay = 120;
// Enable keep-alive probes, then set the idle time (seconds) before the first probe.
setsockopt(mySocket.native_handle(), SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
setsockopt(mySocket.native_handle(), IPPROTO_TCP, TCP_KEEPALIVE, &delay, sizeof(delay));
Question:
The above code compiles fine on macOS, Linux, and iOS. That looks great. But how do I benefit from it? Does it give me a callback or event when the peer goes down? Does it free me from writing the Watchdog that I described above?
I have used boost::asio::async_connect to connect to the peer. Can I get a callback to my connectionHandler when the peer goes down, after the defined timeout interval?
Having set the keep_alive options, how do I then find out that my peer is no longer responding?
If the disconnection is detected while an async operation is pending, your socket's completion handler will be invoked with the appropriate error code.
The problem is that the TCP keep_alive option does not always detect disconnects.
In general, there is no reliable way to detect sudden disconnection, other than by implementing application-level ping/heartbeat.
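For illustration, a minimal application-level heartbeat along the lines of your Watchdog might look like this with boost::asio (a sketch only; the class name, the 2-second period, and the error reporting are assumptions, and the sending side of the dummy packets is analogous):
#include <boost/asio.hpp>
#include <iostream>

// Minimal heartbeat watchdog sketch: packetReceived() must be called whenever
// anything arrives from the peer; if a 2-second tick passes without it, the
// peer is reported as unresponsive.
class Watchdog {
public:
    explicit Watchdog(boost::asio::io_service& io) : timer_(io), alive_(true) {}

    void packetReceived() { alive_ = true; }

    void start() {
        timer_.expires_from_now(boost::posix_time::seconds(2));
        timer_.async_wait([this](const boost::system::error_code& ec) {
            if (ec) return;                  // timer cancelled
            if (!alive_) {
                std::cerr << "peer unresponsive\n";  // report connection error
                return;
            }
            alive_ = false;                  // must be refreshed before the next tick
            start();
        });
    }

private:
    boost::asio::deadline_timer timer_;
    bool alive_;
};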
You can also see this thread.

How to intercept J1939 CAN messages?

I'm building a HIL/SIL test with Simulink which tests the Vehicle Control Unit (VCU) of a vehicle. The VCU talks to a Power Distribution Module (PDM) over a J1939 CAN network. The PDM handles the inputs from switches and the outputs to actuators, and puts that information on the CAN bus. The VCU thus knows what the PDM is seeing from the connected sensors; in turn, the VCU puts information on the CAN bus telling the PDM how to control the connected actuators.
My laptop is hooked to the same CAN bus with a Vector adapter and Simulink.
To test the VCU, I need to mimic the PDM and send messages to the VCU as if I were the PDM. The VCU then has to take the correct actions and control the real PDM accordingly.
Obviously, if I just mimic the PDM, my messages will interfere with those sent by the real PDM. So basically, I need the PDM to shut up and only listen; I do the talking for the PDM. However, the PDM is not configurable in a listen-only mode, so I have to intercept all messages it sends so that they never arrive at the VCU.
My idea was to detect (by observing the arbitration field of all messages) when the PDM starts sending, and to pull a bit down in the arbitration field. The PDM would recognise the priority of my 'message' over its own and stop transmitting. It would be as if the CAN bus were always too busy to give room to the PDM. This would silence the PDM without it throwing errors. But other suggestions are welcome.
So (how) is it possible to intercept J1939 CAN messages in MATLAB/Simulink, or with a separate CAN controller?
Here is an idea for realizing what you are looking for. You need some extra hardware, however.
This is the rough outline:
Set up a CAN gateway device which has two independent CAN interfaces, can0 and can1.
Disconnect the PDM from the CAN bus and connect it to one of the interfaces of your CAN gateway, e.g. can0.
Connect the second interface of the CAN gateway, can1, to the original CAN bus, which also includes your laptop and the VCU.
Program your CAN gateway to forward all incoming CAN frames on can1 to the can0 interface.
As you want to ignore all messages from the PDM, simply ignore the CAN frames coming in on interface can0 and do not forward them to can1.
Example of how to realize such a CAN gateway:
Hardware: Use a Raspberry Pi and a CAN extension board with two CAN interfaces, such as the PiCAN2 Duo board.
Software: Write a small program that forwards traffic between the interfaces can0 and can1, using socketcan, which is already included in the Linux kernel.
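A minimal sketch of such a forwarding program with socketcan raw sockets might look like this (interface roles as in the steps above; error handling is omitted, and the reverse direction is left out deliberately, since frames from the PDM on can0 are intentionally never forwarded):
#include <cstring>
#include <net/if.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/can.h>
#include <linux/can/raw.h>

// Open a raw CAN socket bound to the given interface.
static int open_can(const char* ifname) {
    int s = socket(PF_CAN, SOCK_RAW, CAN_RAW);
    ifreq ifr{};
    std::strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
    ioctl(s, SIOCGIFINDEX, &ifr);            // resolve interface index
    sockaddr_can addr{};
    addr.can_family = AF_CAN;
    addr.can_ifindex = ifr.ifr_ifindex;
    bind(s, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
    return s;
}

int main() {
    int bus = open_can("can1");   // original bus: VCU + laptop
    int pdm = open_can("can0");   // isolated segment: PDM only
    can_frame frame;
    // Forward every frame from the original bus to the PDM. Frames the PDM
    // sends arrive on can0 and are simply never read, i.e. never forwarded.
    while (read(bus, &frame, sizeof(frame)) == (ssize_t)sizeof(frame)) {
        write(pdm, &frame, sizeof(frame));
    }
    return 0;
}
This assumes both interfaces have already been brought up, e.g. with "ip link set can0 up type can bitrate 250000" (bitrate illustrative).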
In case your devices communicate via the higher-layer J1939 transport protocol, you might also need to get a J1939 transport protocol implementation running on the Raspberry Pi. If you are simply using 29-bit identifiers with a maximum payload of 8 bytes of data, this should not be necessary.
Alternatively, you could use a more expensive commercial solution, this CAN router for example.
Your original idea:
I think what you are envisioning is technically feasible but might have some other drawbacks.
As the drivers of CAN controllers typically don't expose interfaces for interactively manipulating CAN frames while their transmission is still ongoing, you could instead drive a CAN transceiver directly from a microcontroller.
A few researchers realized a CAN denial-of-service attack by turning the first recessive bit after the arbitration ID of a CAN frame into a dominant bit for certain selected CAN IDs. They used an Arduino Uno and a Microchip MCP2551 E/P CAN transceiver, and the code they used is available online. As this interactive manipulation of CAN frames during transmission is related to what you are looking for, it could be a good starting point for you.
Still, I see some drawbacks to silencing the PDM this way:
You will not only silence the PDM, but will also (at least) delay the transmissions of other nodes on the CAN bus whose arbitration IDs have lower priority than the messages from the PDM.
It is very likely that the PDM will go into some error state when it cannot successfully send its CAN frames to the bus after a certain number of retries.
Yet another idea:
If you are able to adapt the software of the VCU, change it so that it does not consume the CAN frames from the PDM, but instead CAN frames from your laptop, by using different CAN IDs for the same messages. You will have to change the DBC file for that purpose.

Packet drop notification in Layer-2

Is there a way I can get a notification in user space that a packet was dropped at Layer 2 in 802.11?
My understanding is that when a packet is sent out on the medium, Layer-2 ACKs are received if it was delivered correctly (if not, the sender retransmits and ultimately drops the packet if it is not delivered after several retries).
I want to be able to access this notification (in user space) and change the behavior of packet transmission.
Specifically, I want to send the packet to another host from the FIB rather than dropping it.
I have read about the libpcap libraries and netfilter hooks, which allow me to capture packets and inject them back into the networking stack.
But I'm not able to find hooks (if any exist for the wireless stack) that would let me capture this drop notification at Layer 2.
Please correct me if I'm misunderstanding something. Also, any heads-up or links to read would be great.
No, you cannot, at least not using the standardised sockets interfaces. 802.11 is a link layer, and the sockets API is strictly link-layer agnostic: unless it's going to work on all link layers, it's not in sockets. There are good reasons for that: the kind of cross-layer interaction that you envision has been tried many times, and it's always turned out more trouble than it's worth.
You didn't give us any details about the application, but the best solution is most probably to change your application-layer protocol to send explicit acknowledgments, and to send your data over the fallback route when you fail to receive an ACK.
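As a rough sketch of that idea over UDP (the function names, the 200 ms timeout, and the one-byte ACK format are all hypothetical choices, not a prescribed design):
#include <cstddef>
#include <sys/select.h>
#include <sys/socket.h>

// Wait up to 'ms' milliseconds for a one-byte application-level ACK.
static bool wait_for_ack(int sock, int ms) {
    fd_set rd;
    FD_ZERO(&rd);
    FD_SET(sock, &rd);
    timeval tv{ms / 1000, (ms % 1000) * 1000};
    if (select(sock + 1, &rd, nullptr, nullptr, &tv) <= 0)
        return false;                       // timeout or error: no ACK seen
    char ack;
    return recv(sock, &ack, 1, 0) == 1;
}

// Send to the primary next hop; if no explicit ACK arrives in time,
// resend the same datagram over the fallback route.
static void send_with_fallback(int sock, const void* data, size_t len,
                               const sockaddr* primary, const sockaddr* fallback,
                               socklen_t alen) {
    sendto(sock, data, len, 0, primary, alen);
    if (!wait_for_ack(sock, 200))
        sendto(sock, data, len, 0, fallback, alen);
}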

select() does not return on the client side

I have written a client socket program using Linux sockets only. Here is an outline of what I am doing in my program:
Create the socket.
Make the connection with the server socket.
Assign that socket to the read set and the exception set for select().
Call select() with a NULL timeout value, in a separate thread.
The server is running on an external device.
The program works fine for reading and so on. The problem appears when I unplug the power cable of that device.
I assumed that when the device's power cable is removed, all its sockets would be closed abruptly and the connected client sockets would get a read event; a read would then return zero bytes, meaning the connection was closed by the server.
But in my client program, when I unplug the power cable of the device, select() does not return; the client socket does not get any event. I don't understand why.
Any suggestion on how to find out that the connection was closed by the server, or any information on how sockets behave when the server's power supply is cut, would be appreciated.
I need your help; it's very critical.
thank you.
When a remote machine is suddenly cut off from the network (network cable unplugged, or power loss), there is no way it can inform the other side of the connection about that. What is more, a client side that performs only reads from a half-open socket (as in your case) won't be able to detect this either.
The only way to learn about a connection loss is to send a packet. Since all data sent must be acknowledged by the other side, TCP on the client computer will keep retrying to send the unconfirmed portion of data until the number of attempts is exhausted; then an ETIMEDOUT error should be returned (via a socket that is expecting read events). On the client side you can create one more socket and send messages over it periodically to detect the peer's disappearance (a heartbeat connection). But all these retries may still take some time.
Another option is the SO_KEEPALIVE socket option. After a connection has been idle for some time, TCP starts sending probe messages to the peer and can detect its disappearance. The default idle times are usually enormous, so they need to be modified; the related parameters are TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT. Note that this option may be implemented differently on different systems, or may simply be absent.
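For example, on Linux the idle time, probe interval, and probe count can be shortened like this (a sketch with illustrative values; as noted, these option names and their availability vary between systems):
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Enable keep-alive probing with aggressive (illustrative) timings:
// first probe after 10 s idle, then every 5 s, peer declared dead after
// 3 unanswered probes. The failure then surfaces as ETIMEDOUT on the socket.
static void enable_keepalive(int fd) {
    int on = 1, idle = 10, interval = 5, count = 3;
    setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(interval));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count));
}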
I've never personally tried to solve this problem, so all this is just a bunch of thoughts that might give you some ideas. Here is one more source of ideas.

JMX RMI agent fault tolerance mechanisms

I am using the JMX-RMI agent for message passing. I have a Java program which sends a message with a name/id to a set of listeners. Based on the message received, the client-side programs behave accordingly. This piece works fine, but I would like to know what kind of fault tolerance is built into the JMX-RMI agent.
If a listener stops accidentally, does JMX restart it or log the error somewhere? What if the message queue on either side is full? Any documentation explaining the underlying architecture of JMX-RMI or its built-in fault-tolerance mechanisms would be appreciated. If it doesn't have any fault-tolerance mechanisms, what would be a good way of providing them?
Thanks Much
I am assuming your client-side listeners are using the standard javax.management.remote connectors. Without some customization, I would say you can implement some straightforward fault detection. For fault tolerance, you're probably looking at some sort of clustering solution.
There are two layers of connectivity you need to be concerned about:
The MBeanServerConnection itself. In other words, if the whole server-side JVM terminates, your client-side processes need to know.
While the server JVM and the subsidiary MBeanServerConnection may continue to be available, the hosted listener/client message-forwarder service itself may stop, fail, or stall.
For #1, the client processes can register a NotificationListener with the JMXConnector using the addConnectionNotificationListener method. Your local connection will then emit JMXConnectionNotifications on all the following events:
A new client connection has been opened.
A client connection has been closed.
A client connection has failed unexpectedly.
A client connection has potentially lost notifications. This notification only appears on the client side.
This way, your clients will know when a connection to the server has been established and lost.
For #2, it's a bit more specific to your application, but perhaps you can adapt a simple pattern like this:
When your listener/forwarder service starts, have it emit a start notification. When it stops, have it emit a stopped notification. The two categories of listeners that would register for these notifications are:
The clients, so they know the service has started/stopped.
A server side "watcher" that can listen for a "stop" and restart the service.
Is that more-or-less what you were thinking of?