Modbus TCP problems in SCADA-LTS - modbus

I'm trying out SCADA-LTS with my trusty Modbus simulator ModRSim2.exe as a Docker image on a Windows 10 laptop. I'm having difficulty maintaining a stable connection to the Modbus TCP server. Update rate is at 1s. Data is being read, but alarms keep popping up (connection lost, checksum) and the connection keeps being reset.

The checkbox is for encapsulating the Modbus RTU protocol in TCP data packets. Checking the checkbox was a bad idea: although it seemed to reduce the number of errors, it actually made things a lot worse.
I ended up increasing the cycle time from 1 to 2 seconds. That made the error messages go away. Weird, because I would not expect protocol errors to arise from short cycle times. I think it might be a bug(?)
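For context on the encapsulation option mentioned above: plain Modbus TCP and Modbus RTU frame the same request differently, so a client and server that disagree on the mode cannot parse each other's frames. The sketch below builds one read-holding-registers request both ways. The function names, the register address, the register count and the unit ID are illustrative assumptions; nothing here is taken from SCADA-LTS or ModRSim2.

```python
import struct


def crc16_modbus(data: bytes) -> int:
    """CRC-16/MODBUS: initial value 0xFFFF, reflected polynomial 0xA001."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def read_holding_pdu(start: int, count: int) -> bytes:
    """Function code 0x03 followed by start address and register count."""
    return struct.pack(">BHH", 0x03, start, count)


def rtu_frame(unit_id: int, pdu: bytes) -> bytes:
    """RTU framing: unit id + PDU + CRC (low byte first)."""
    body = bytes([unit_id]) + pdu
    return body + struct.pack("<H", crc16_modbus(body))


def tcp_frame(transaction_id: int, unit_id: int, pdu: bytes) -> bytes:
    """Modbus TCP framing: MBAP header + PDU, no CRC."""
    # The length field counts the unit id byte plus the PDU.
    header = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return header + pdu


if __name__ == "__main__":
    pdu = read_holding_pdu(start=0, count=10)
    print("RTU:", rtu_frame(1, pdu).hex(" "))
    print("TCP:", tcp_frame(1, 1, pdu).hex(" "))
```

The RTU frame ends in a CRC and has no transaction ID, while the TCP frame starts with a 7-byte header and carries no CRC, which is one plausible source of checksum alarms when the two modes are mixed.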

Related

Why is my TCP socket showing connected but not responding?

I have a program using a bi-directional TCP socket to send messages from the host PC to a VLinx ethernet-to-serial converter and then on to a PLC via RS-232. During heavy traffic the socket will intermittently stop communicating although all soft tests of the connection show that it is connected, active and writeable. I suspect that something is interrupting the connection, causing the socket to close without FIN/ACK. How can I test to see where this disconnect might be occurring?
The program itself is written in VB6 and uses Catalyst SocketTools/SocketWrench as opposed to the standard Winsock library. The methodology, properties and code seem to be sound since the same setup works reliably at two other sites. It's just this one site in particular where this problem occurs. It only happens during production when there is traffic on the network and can lose connection anywhere between 20 - 100 times per 10-hour day.
There are redundant tests in place to catch this loss of communication and keep the system running. We have tests on ACK messages, message queue size, time between transmissions (tokens on 2s interval), etc. Typically, the socket will not be unresponsive for more than 30 seconds before it is caught, closed and re-established which works properly >99% of the time.
Previously I had enabled the SocketTools logging capabilities which did not capture any relevant information. Most recently I have tried to have the system ping the VLinx on the first sign of a missed message (2.5 seconds). Those pings have always been successful, meaning that if there is a momentary loss of connection at a switch or AP it does not stay disconnected for long.
I do not have access to the network hardware aside from the PC and VLinx that we own. The facility's IT is also not inclined to help track these kinds of things down because they work on a project-based model.
Does anyone have any suggestions what I can do to try and determine where the problem is occurring so that I can then try to come up with a permanent solution to this issue rather than the band-aid of reconnecting multiple times per day?
A tool like Wireshark may be helpful in seeing what's going on at the network level. The logging facility in SocketTools/SocketWrench can only report what's going on at the API level, and it sounds like whatever the underlying problem is occurs at a lower level in the TCP stack.
If this is occurring after periods of relative inactivity, followed by a burst of activity, one thing you could try doing is enabling keep-alive and see if that makes any difference.

Using boost::asio::tcp, how to get notified when socket connection is broken

Scenario:
I am using boost::asio::tcp protocol between 2 peers connected over the network. My code runs on Linux, macOS and iOS.
I have a watchdog ping-pong mechanism implemented on both sides to check that the socket connection between the peers is okay. This is done by sending a dummy packet every 2 seconds, which I think is a known approach.
Challenge:
But is there a way I can avoid writing this watchdog myself?
Is there a way to have asio or the TCP stack itself do this for me and trigger an event as soon as the socket connection is not okay? I have been trying to understand the keep_alive functionality in the TCP stack because it seems to have the answer to my question.
But then again it looks like I can tweak the keep_alive parameters but boost::asio::tcp does not seem to give me an API to do that.
Question:
Will tweaking the keep_alive parameters help me achieve my goal? My goal is to get a notification from asio or the TCP stack when the socket is no longer connected due to any weird reason on the peer side. Note that the peer can go down in a weird way, like a kernel panic or something really crazy.
Just setting socket.set_option(boost::asio::socket_base::keep_alive(true)); does not seem to help. The default keep-alive time on Linux is too high: I only get a notification many minutes after the peer goes down.
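The tuning asked about here comes down to three per-socket TCP options on Linux (TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT); macOS names the idle timer TCP_KEEPALIVE instead. Below is a minimal sketch in Python rather than C++, purely to show which options are involved. The function name and the timer values are assumptions chosen for illustration. From asio, the same options would presumably be set with setsockopt on socket.native_handle().

```python
import socket


def enable_fast_keepalive(sock, idle_s=5, interval_s=2, probes=3):
    """Turn on TCP keep-alive and shorten its timers where the platform allows.

    Returns the list of option names that were actually applied.
    """
    applied = []
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied.append("SO_KEEPALIVE")

    # Linux exposes TCP_KEEPIDLE; macOS calls the same timer TCP_KEEPALIVE.
    # Python only defines these constants when the platform supports them.
    tuning = [
        ("TCP_KEEPIDLE", idle_s),
        ("TCP_KEEPALIVE", idle_s),
        ("TCP_KEEPINTVL", interval_s),
        ("TCP_KEEPCNT", probes),
    ]
    for name, value in tuning:
        opt = getattr(socket, name, None)
        if opt is None:
            continue
        try:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
            applied.append(name)
        except OSError:
            pass  # the constant exists but the kernel rejected the option
    return applied
```

With the values above, a silent peer would be reported after roughly idle_s + interval_s * probes seconds (about 11 s), as an error on the next read or write. Keep-alive probes are only sent on an otherwise idle connection, so an application-level ping may still be needed if the goal is to detect a hung but reachable peer.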

lwip stm32 - http requests failing

I am running FreeRTOS and lwip 1.4.1, using the socket API, on an stm32 processor (stm32f407).
Overall it works pretty fine.
I can send and receive data with udp and tcp.
But within a time window of 3 to 7 days I see strange behavior.
My Problem
Every 3 to 7 days my client (Windows 10, which sends 1-2 HTTP requests per second) fails to send those requests. When this happens, about 10 requests fail in succession. In rare cases, the stack does not recover at all.
My Guess
I think I have possibly misconfigured something in my LWIP config,
because the stack is widely used and shouldn't have any bugs in this direction.
My Ethernet settings
Server and client are directly connected, with no switch, hub or router in between.
server (stm32/lwip):
static, 192.168.168.2
netmask, 255.255.255.0
client (win10) eth0:
static, 192.168.168.1
netmask, 255.255.255.0
client (win10) eth1:
dhcp, to normal working network
My Tries
At the moment I have tests running which send ~7-8 requests per second, but the error doesn't occur more often.
I played around with the lwip config:
more memory for the stack
more pbufs
bigger pbufs
with/without backlog
But none of this improved the connection problem.
Could the problem be caused by the client frequently reusing the same port numbers?
Here is the relevant part of the lwip debugging output:
tcp debugging output
https://pastebin.com/a9JabhET
Here is the Wireshark log:
(screenshot in the original post)
Whole Wireshark log:
https://www.file-upload.net/download-12682664/debug_tcp_00001_20170828172950.html
And here is my lwipopts.h:
lwip configuration:
https://pastebin.com/cW0v4hF6
It seems to be a memory problem, but as it is temporary, it could be a timeout on something.
I suggest using the memory stats functions of LwIP, and also enabling the ARP debug messages.
I have a new job and am no longer working on this issue.
Before I started my new job I was able to show that it was not a memory problem in LwIP: I defined unreasonably large pbufs and memory pools, and they never reached their limits.
The problem was in the DMA driver for the ETH peripheral. Once the end of the DMA driver's memory chain was reached, the chain elements were never freed, so I ran into RBU (Receive Buffer Unavailable) problems. The RBU flag was never reset again, and the ETH DMA driver hung in the RBU interrupt (even though there were enough LwIP buffers for the DMA chain to write to). So I added a sledgehammer fix to the DMA driver and disabled the RBU interrupt: I poll the RBU flag in multiple situations, clear it if needed, and start reading from ETH again.
I think since then the problem is more or less "solved". Not nice, but it worked.
I got some information from a coworker at my old workplace:
The RBU interrupt and the clear did not work because the CAN stack we used did not work very well with FreeRTOS. On busy systems the CAN stack used well over 90% of CPU time, which led to the strange behaviour in the ETH driver and LwIP.

How to speed up slow / laggy Windows Phone 7 (WP7) TCP Socket transmit?

Recently, I started using the System.Net.Sockets class introduced in the Mango release of WP7 and have generally been enjoying it, but have noticed a disparity in the latency of transmitting data in debug mode vs. running normally on the phone.
I am writing a "remote control" app which transmits a single byte to a local server on my LAN via Wifi as the user taps a button in the app. Ergo, the perceived responsiveness/timeliness of the app is highly important for a good user experience.
With the phone connected to my PC via USB cable and running the app in debug mode, the TCP connection seems to transmit packets as quickly as the user taps buttons.
With the phone disconnected from the PC, the user can tap up to 7 buttons (and thus cause 7 "send" commands with 1-byte payloads) before all 7 bytes are sent. If the user taps a button and waits a little between taps, there seems to be a latency of 1 second.
I've tried setting Socket.NoDelay to both True and False, and it seems to make no difference.
To see what was going on, I used a packet sniffer to see what the traffic looked like.
When the phone was connected via USB to the PC (which was using a Wifi connection), each individual byte was in its own packet being spaced ~200ms apart.
When the phone was operating on its own Wifi connection (disconnected from USB), the bytes still had their own packets, but they were all grouped together in bursts of 4 or 5 packets and each group was ~1000ms apart from the next.
btw, Ping times on my Wifi network to the server are a low 2ms as measured from my laptop.
I realize that buffering "sends" together probably allows the phone to save energy, but is there any way to disable this "delay"? The responsiveness of the app is more important than saving power.
This is an interesting question indeed! I'm going to throw my 2 cents in but please be advised, I'm not an expert on System.Net.Sockets on WP7.
Firstly, performance testing while in the debugger should be ignored. The reason for this is that the additional overhead of logging the stack trace always slows applications down, no matter the OS/language/IDE. Applications should be profiled for performance in release mode and disconnected from the debugger. In your case it's actually slower disconnected! OK, so let's try to optimise that.
If you suspect that packets are being buffered (and this is a reasonable assumption), have you tried sending a larger packet? Try linearly increasing the packet size and measuring latency. Could you write a simple micro-profiler in code on the device, i.e. using DateTime.Now or the Stopwatch class, to log the latency vs. packet size? Plotting that graph might give you some good insight as to whether your theory is correct. If you find that 10-byte (or even 100-byte) packets get sent instantly, then I'd suggest simply pushing more data per transmission. It's a lame hack I know, but if it ain't broke ...
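The micro-profiler idea can be sketched off-device as well. The following is a rough Python stand-in, not WP7 code: it times round trips against a throwaway echo server on loopback for several payload sizes. The helper names, the echo server and the chosen sizes are all assumptions made for illustration; on the phone the equivalent would use Stopwatch around the send and the server's reply.

```python
import socket
import threading
import time


def recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes early."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def start_echo_server():
    """Start a one-connection echo server on loopback; return its port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    def serve():
        conn, _ = listener.accept()
        with conn, listener:
            while True:
                data = conn.recv(4096)
                if not data:
                    break
                conn.sendall(data)

    threading.Thread(target=serve, daemon=True).start()
    return port


def measure_round_trips(host, port, sizes, repeats=20):
    """Return {payload_size: mean round-trip time in milliseconds}."""
    results = {}
    with socket.create_connection((host, port)) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for size in sizes:
            payload = b"x" * size
            start = time.perf_counter()
            for _ in range(repeats):
                sock.sendall(payload)
                recv_exact(sock, size)
            elapsed = time.perf_counter() - start
            results[size] = elapsed / repeats * 1000.0
    return results
```

Calling measure_round_trips("127.0.0.1", start_echo_server(), [1, 10, 100, 1000]) gives one mean latency per size. If small payloads were being held back somewhere, the small sizes would show disproportionately high times compared with the large ones.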
Finally you say you are using TCP. Can you try UDP instead? TCP is not designed for real-time communications, but rather accurate communications. UDP by contrast is not error checked, you can't guarantee delivery but you can expect faster (more lightweight, lower latency) performance from it. Networks such as Skype and online gaming are built on UDP not TCP. If you really need acknowledgement of receipt you could always build your own micro-protocol over UDP, using your own Cyclic Redundancy Check for error checking and Request/Response (acknowledgement) protocol.
Such protocols do exist, take a look at Reliable UDP discussed in this previous question. There is a Java based implementation of RUDP about but I'm sure some parts could be ported to C#. Of course the first step is to test if UDP actually helps!
Found this previous question which discusses the issue. Perhaps a WP7 issue?
Poor UDP performance with Windows Phone 7.1 (Mango)
Still would be interested to see if increasing packet size or switching to UDP works
OK, so neither suggestion worked. I found this description of the Nagle algorithm, which groups packets as you describe. Setting NoDelay is supposed to help but, as you say, doesn't.
http://msdn.microsoft.com/en-us/library/system.net.sockets.socket.nodelay.aspx
Also, see this previous question where Keepalive and NoDelay were set on/off to manually flush the queue. His evidence is anecdotal but worth a try. Can you give it a go and edit your question to post more up-to-date results?
Socket "Flush" by temporarily enabling NoDelay
Andrew Burnett-Thompson here already mentioned it, but he also wrote that it didn't work for you. I do not understand and I do not see WHY. So, let me explain that issue:
Nagle's algorithm was introduced to avoid a scenario where many small packets had to be sent through a TCP network. Any current state-of-the-art TCP stack enables Nagle's algorithm by default!
The reason: TCP itself adds a substantial amount of overhead to any data passing through an IP connection, and applications usually do not care much about sending their data in an optimized fashion over those TCP connections. So, after all, the Nagle algorithm working inside the TCP stack of the OS does a very, very good job.
A better explanation of Nagle's algorithm and its background can be found on Wikipedia.
So, your first try: disable Nagle's algorithm on your TCP connection, by setting option TCP_NODELAY on the socket. Did that already resolve your issue? Do you see any difference at all?
If not so, then give me a sign, and we will dig further into the details.
But please, look twice for those differences: check the details. Maybe after all you will get an understanding of how things in your OS's TCP/IP-Stack actually work.
Most likely it is not a software issue. If the phone is using WiFi, the delay could be upwards of 70ms (depending on where the server is, how much bandwidth it has, how busy it is, interference to the AP, and distance from the AP), but most of the delay is just the WiFi. Using GSM, CDMA, LTE or whatever technology the phone is using for cellular data is even slower. I wouldn't imagine you'd get much lower than 110ms on a cellular device unless you stood underneath a cell tower.
Sounds like your reads/writes are buffered. You could try setting the NoDelay property on the Socket to true, and you could consider trimming the send and receive buffer sizes as well. The reduced responsiveness may be a by-product of there not being enough wifi traffic. I'm not sure if adjusting the MTU is an option, but reducing the MTU may improve response times.
All of these are only options for a low-bandwidth solution; if you intend to shovel megabytes of data in either direction you will want larger buffers over wifi, large enough to compensate for transmit latency, typically in the range of 32K-256K.
var socket = new System.Net.Sockets.Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp)
{
    NoDelay = true,
    SendBufferSize = 3,
    ReceiveBufferSize = 3,
};
I didn't test this, but you get the idea.
Have you tried setting SendBufferSize = 0? In C, you can disable Winsock buffering by setting SO_SNDBUF to 0, and I'm guessing SendBufferSize means the same in C#.
Were you using Lumia 610 and mikrotik accesspoint by any chance?
I have experienced this problem, it made Lumia 610 turn off wifi radio as soon as last connection was closed. This added perceivable delay, compared to Lumia 800 for example. All connections were affected - simply switching wifi off made all apps faster. My admin says it was some feature mikrotiks were not supporting at the time combined with WMM settings. Strangely, most other phones were managing just fine, so we blamed cheapness of the 610 at the beginning.
If you can still replicate the problem, I suggest trying the following:
open another connection in the background and ping it all the time.
use 3g/gprs instead of wifi (requires exposing your server to the internet)
use different (or upgraded) phone
use different (or upgraded) AP

Best socket options for client and server that continuously transfer data

I am using Java (although I think the socket options are implemented in most languages) to implement a client and server. The server sends data to the client for processing, which the client acknowledges. On another port the client then sends the results of the processing back to the server. When it comes to options such as
SO_LINGER
SO_KEEPALIVE
SO_NODELAY
SO_REUSEADDRESS
SO_SENDBUFFER
SO_RECBUFFER
TCP_NODELAY
We have noticed that the connection between the client and server occasionally breaks. There will be a timeout on the send or the receive. When this happens we kill the socket and open a new one to continue.
What would be the best options to set in terms of the above scenario and is there anything that we could do from our side (programmatically or options-wise) to try minimize the amount of times the connection is dropped. We are using normal TCP/IP.
UPDATE:
The bounty on this ends soon. I haven't had a satisfactory answer yet so it is still open. I think everyone is missing the point of the question: what is the best practice with regard to the options above for sockets that continuously chat? I already have a ping packet: if there is no work to be done (hardly ever the case), the normal message is sent with no inner elements, so there is always processing.
Strictly speaking, you don't need any of these socket options:
* SO_LINGER
You need to set SO_LINGER only if your application still has outstanding packets to send when close(2) or shutdown(2) has been called. Not really applicable for your application.
* SO_KEEPALIVE
Sending keepalive-pings every two hours would really only help very long-lived but -very- quiet connections going through stateful firewalls with very long session timeouts. (Two hours between pings is entirely too long to be practical in today's Internet.)
* SO_NODELAY
This (presumably an alias for TCP_NODELAY) disables Nagle's algorithm, which is just a small-packet-avoidance mechanism. Perhaps Nagle is getting in the way in your application, but it takes special sequences of packets to introduce 500ms delays into processing; it never just hangs connections.
* SO_REUSEADDRESS
Useful for all 'servers' that listen on well-known port numbers; use on 'clients' is almost always covering up some bug or other, but it is sometimes necessary if requests must come from a well-known port number.
* SO_SENDBUFFER
* SO_RECBUFFER
These buffer sizes influence the kernel-side buffer sizes maintained for receiving or sending data while your program (receive buffer) or the socket (send buffer) isn't yet ready to accept more data. If these are set too small, your application might not transfer data as smoothly as possible, reducing throughput, but it should not lead to any stalls if these are set smaller than optimal. Of course, too large may put unreasonable demands on kernel memory, but there should be a reasonable system-wide maximum allowed size.
* TCP_NODELAY
Disables Nagle. Nagle is not likely to do more than introduce 500ms delays, and only if your application sends multiple small packets before attempting a blocking read.
Really, you shouldn't need to set any socket options.
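For anyone who still wants to experiment, the options in the list above map onto ordinary setsockopt calls. The names in the question are approximate; the standard constants are SO_REUSEADDR, SO_SNDBUF, SO_RCVBUF and TCP_NODELAY. Here is a small sketch, in Python for brevity rather than Java. The function name and every value in it are arbitrary illustrations, not recommendations.

```python
import socket
import struct
import sys


def configure(sock):
    """Apply the options from the list above; values are illustrative only."""
    # Allow rebinding a listening port that is still in TIME_WAIT.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Ask the kernel to probe idle connections (default timers are long).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Disable Nagle's algorithm so small writes are sent immediately.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Request kernel buffer sizes; the kernel may round or cap these.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 64 * 1024)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 64 * 1024)
    # SO_LINGER takes struct linger {l_onoff, l_linger}: on close, wait up
    # to 5 s for unsent data. The field width differs on Windows.
    fmt = "HH" if sys.platform == "win32" else "ii"
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack(fmt, 1, 5))
```

None of these calls makes a broken path come back; they only change buffering, timing and close behaviour, which is consistent with the point made above that the defaults are usually fine.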
Can you distill your code into something that could be pasted here and tested or inspected? I'm used to TCP sessions surviving for days or weeks without trouble, so this is pretty surprising.
First I think that this page is relevant, regarding half-open connections.
http://nitoprograms.blogspot.com/2009/05/detection-of-half-open-dropped.html
That being said, TCP is designed to hide connection problems, so you may often find yourself in cases where the connection is broken, but neither side thinks it is. You have addressed this partially by using timeouts and taking that as a sign the connection is broken.
Since you are writing the client and server, I would avoid relying on TCP to tell you when the connection is broken altogether. I would just have the server also acknowledge the receipt of the result from the client. Then both sides will expect immediate responses to their messages, and you can track which messages have been ack'd and set an appropriately small timeout for receiving the ack. This is not a timeout on the send or receive, but a timeout on the time between sending a message and receiving the ack for that message. Then you can set the timeout appropriately depending on the quality of your connection (e.g. very small if you are running on loopback, but large if running over wireless with a weak signal).
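The per-message ack timeout described here can be kept separate from the socket code. A minimal sketch follows, in Python for brevity, assuming each message carries some unique id; the class name, the method names and the injectable clock are all hypothetical and only serve the illustration.

```python
import time


class AckTracker:
    """Track sent messages and report those not acknowledged in time."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock
        self._pending = {}  # message id -> time it was sent

    def sent(self, msg_id):
        """Record that a message went out and now awaits its ack."""
        self._pending[msg_id] = self._clock()

    def acked(self, msg_id):
        """Record an ack; unknown ids (late or duplicate acks) are ignored."""
        self._pending.pop(msg_id, None)

    def overdue(self):
        """Return ids whose ack has been outstanding longer than the timeout."""
        now = self._clock()
        return [m for m, t in self._pending.items() if now - t > self.timeout_s]
```

The send path would call sent(), the receive path acked(), and a periodic check would treat any non-empty overdue() result as a broken connection and reconnect. The timeout is then a single number that can be tuned to the link quality.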
Regarding the options you list, you will want to use SO_REUSEADDRESS so that you won't be prevented from reopening the socket, for example if it hasn't finished closing from a previously killed process.
You probably have, but it is best to check the obvious....
Have you verified that it IS the socket that is timing out, and not your code? Sockets are fairly stable, and while there might be an issue somewhere, it seems more likely that it is in your code. I would use logs, timestamps, and synchronised clocks to be sure.
There may be an issue that you genuinely DO take a long time to do the calculation, so maybe adding a 'I'm still thinking about it' message to your protocol that gets sent regularly, to keep the connection alive?
Of course networks will drop out from time to time regardless of what you do, and it sounds like you are already handling that case nicely.
try these options
SO_LINGER - for specifying what happens when the socket's close is called while there is still unsent data in the queue
TCP_NODELAY - for sending small writes immediately instead of letting Nagle's algorithm batch them (it does not make the socket non-blocking)
I would strongly encourage you to use a ping/echo model between client and server, so that if no data is sent for x seconds a ping message is sent. A typical reason for a break might be a firewall, which shuts down sockets because of inactivity.
The typical cases where the TCP model fails are physical problems, e.g. a pulled or broken cable, and hangs on one side, where technically someone is still listening until a queue overrun kicks in (which might never happen given your amount of data).
What are the chances the connection is going through a NAT firewall somewhere along the way? Stateful firewalls maintain a table of open connections so that packets belonging to an allowed connection can quickly pass through the system, without forcing firewall admins to write overly-complex rule sets.
The downside is that this table can grow immensely large, so it must be pruned as connections are closed or as they appear to have simply grown stale and died quietly. A connection that has gone silent for 20 minutes is usually quiet enough to be reaped. (Which is really very quick, as the TCP KEEPALIVE is typically two hours, making it nearly useless in the face of NAT firewalls.)
So: is this going through a NAT firewall? Is the connection quiet for long stretches? If so, add a ping/pong to your protocol, and fire it every few minutes.
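The "ping only when quiet" rule from the answers above can be expressed as a small idle timer. This is a sketch under assumptions: the class name, the method names and the 120-second interval in the usage note are hypothetical, and the actual ping message format is left to the application protocol.

```python
import time


class IdlePinger:
    """Decide when a ping is due because nothing has been sent recently."""

    def __init__(self, idle_s, clock=time.monotonic):
        self.idle_s = idle_s
        self._clock = clock
        self._last_sent = clock()

    def note_sent(self):
        """Call after every outgoing message, including pings."""
        self._last_sent = self._clock()

    def ping_due(self):
        """True once the connection has been silent for idle_s seconds."""
        return self._clock() - self._last_sent >= self.idle_s
```

A sender loop would create IdlePinger(120), call note_sent() after each write, and send a ping whenever ping_due() returns True. The interval should stay comfortably below the shortest idle timeout of any firewall or NAT on the path.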