WiFi lag spikes after quiet period - sockets

I have a simple client/server setup where the client sends UDP packets to the server on, say, port 2000 many times per second. The server has a thread with an open BSD socket listening on port 2000 and reads data using a blocking recvfrom call. That's it. I've set up a simple tic/toc timer around the recvfrom call in the server and plotted the results when running this over Wifi.
When the server is connected to the access point via Wifi, the recvfrom call usually takes about 0.015 seconds. However, after a short period of radio silence in which no packets are sent (about half a second), the next packet that comes in on the server will cause the recvfrom call to take an extremely long time (between 0.6 and 3 seconds), followed by a succession of very quick calls (about 0.000005 seconds) and then a return to normal (around 0.015 seconds). Here's some sample data:
0.017361 <--normal
0.014914
0.015633
0.015867
0.015621
... <-- radio silence
1.168011 <-- spike after period of radio silence
0.000010 <-- bunch of really fast recvfrom calls
0.000005
0.000006
0.000005
0.000006
0.000006
0.000005
0.015950 <-- back to normal
0.015968
0.015915
0.015646
The spike followed by the burst of fast calls is easy to spot in the data above.
When I connect the server to the access point over a LAN (i.e. with a cable), everything works perfectly fine and the recvfrom call always takes around 0.015 seconds. But over Wifi I get these crazy spikes.
What on earth could be going on here?
P.S. The server is running Mac OS X; the client is an iPhone, which was connected to the access point via Wifi in both cases. I've tried running the client on an iPad and get the same results. The access point is an Apple AirPort Extreme base station with a network that is extended using an Apple AirPort Express. I've also tried with a Thomson router and a simple (non-WDS) network and still get the same issue.
UPDATE
I rewrote the server part in C# on Windows .NET and tested it over the Wifi, keeping everything else the same, and the issue disappeared. This suggests it's an OS/network-stack/socket issue on Mac OS X.
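For anyone who wants to reproduce the measurement, the tic/toc setup described above can be sketched in Python roughly as follows. This is a loopback stand-in for the real Wifi path, and it uses an OS-assigned port rather than the fixed port 2000 from the question; packet count and send rate are arbitrary.

```python
import socket
import threading
import time

def timed_server(srv, n_packets, results):
    """Time each blocking recvfrom call (the tic/toc measurement)."""
    for _ in range(n_packets):
        tic = time.perf_counter()
        srv.recvfrom(2048)                 # blocks until a datagram arrives
        results.append(time.perf_counter() - tic)

srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
srv.bind(("127.0.0.1", 0))                 # OS-assigned port stands in for 2000
addr = srv.getsockname()

results = []
t = threading.Thread(target=timed_server, args=(srv, 5, results))
t.start()

cli = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for _ in range(5):                         # "many times per second"
    cli.sendto(b"x", addr)
    time.sleep(0.015)
t.join()
cli.close()
srv.close()
print(results)                             # one recvfrom duration per packet
```

Over loopback every duration should sit near the 0.015 s send interval; over Wifi this is where the post-silence spike would show up.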

I don't think you can do anything about it. Several things can happen:
The WiFi MAC layer must allocate bandwidth slots among multiple users; it will usually try to give a user a slot long enough to send as much traffic as possible, but while other users are busy this client can't send. You even see this with only one user (a consequence of the 802.11 protocol), but you'll notice it most with multiple active users, of course.
iOS itself may have some kind of power saving that buffers packets for a while and sends them in bursts, so it can keep some subsystems idle for a period of time.
You have some other radio signal that interferes.
This is not an exhaustive list, just what I could think of on short notice with only the given input.
One thing: 0.6 to 3 seconds is not an extremely long time in the wireless domain. It might be 'long', but latency is, for good reason, one of the biggest issues in all wireless communications. Don't forget that most Wifi APs are based on quite old specs, so I wouldn't call these numbers extreme (though I wouldn't expect 3-second gaps either).

Related

Why is my TCP socket showing connected but not responding?

I have a program using a bi-directional TCP socket to send messages from the host PC to a VLinx Ethernet-to-serial converter and then on to a PLC via RS-232. During heavy traffic the socket will intermittently stop communicating, although all soft tests of the connection show that it is connected, active and writeable. I suspect that something is interrupting the connection, causing the socket to close without a FIN/ACK. How can I test to see where this disconnect might be occurring?
The program itself is written in VB6 and uses Catalyst SocketTools/SocketWrench as opposed to the standard Winsock library. The methodology, properties and code seem to be sound, since the same setup works reliably at two other sites. It's just this one site where the problem occurs. It only happens during production, when there is traffic on the network, and the connection can drop anywhere between 20 and 100 times per 10-hour day.
There are redundant tests in place to catch this loss of communication and keep the system running. We have tests on ACK messages, message queue size, time between transmissions (tokens on 2s interval), etc. Typically, the socket will not be unresponsive for more than 30 seconds before it is caught, closed and re-established which works properly >99% of the time.
Previously I had enabled the SocketTools logging capabilities which did not capture any relevant information. Most recently I have tried to have the system ping the VLinx on the first sign of a missed message (2.5 seconds). Those pings have always been successful, meaning that if there is a momentary loss of connection at a switch or AP it does not stay disconnected for long.
I do not have access to the network hardware aside from the PC and VLinx that we own. The facility's IT is also not inclined to help track these kinds of things down because they work on a project-based model.
Does anyone have any suggestions what I can do to try and determine where the problem is occurring so that I can then try to come up with a permanent solution to this issue rather than the band-aid of reconnecting multiple times per day?
A tool like Wireshark may be helpful in seeing what's going on at the network level. The logging facility in SocketTools/SocketWrench can only report what's going on at the API level, and it sounds like whatever the underlying problem is occurs at a lower level in the TCP stack.
If this is occurring after periods of relative inactivity, followed by a burst of activity, one thing you could try doing is enabling keep-alive and see if that makes any difference.
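The question's code is VB6 with SocketTools, but keep-alive is an ordinary BSD-socket option, so whatever library you use should expose something equivalent. As a minimal illustration in Python:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# SO_KEEPALIVE tells the TCP stack to probe an idle connection periodically,
# so a dead peer or silently dropped connection is detected by the stack
# instead of the socket just appearing "connected and writeable" forever.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
print(enabled)
```

Note that the default keep-alive idle time is very long on most systems (often two hours), so to catch drops on the timescale described you'd also need the OS-specific knobs that shorten the probe interval.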

GameKit synchronization

I tried to use GameKit to play songs synchronously on several devices over Bluetooth/Wifi, but I always see 0.1-0.5 s of latency. I had thought that such synchronisation would be trivial. I found the BM receiver metronome, which implements what I want: its sound really is played synchronously.
I'm using GKSendDataReliable and sending 1 small packet with the rhythm. I did everything following this tutorial here.
I've googled a lot but can't find the answer and/or my bottleneck. I would appreciate any suggestions/approaches. Maybe some tutorials?
How many devices are you using? I assume it's a server/client scheme. Here's what I would do (it may not work, but you asked for suggestions/approaches), as a very rough algorithm:
- the server sets up a timer and sends its value to the clients in a PING packet, remembering the send time per client
- a client receives the PING packet and sends a response packet
- the server gets the response packet, checks the time and stores the resulting delta somewhere (in a vector)
- the server sends more PING packets and gets more time deltas
- when the server has enough data, it calculates the average ping time per client (CLIENTX_TIME) and chooses the biggest one (BIGGEST_TIME)
- the server sends a START packet to all the clients, with some additional info saying START PLAYING in XX_TIME
- the server starts playing music after BIGGEST_TIME
- the clients get the START packets and start playing music after XX time, where XX is calculated per player (so for the slowest client it will be 0, for the others it will be BIGGEST_TIME - CLIENTX_TIME)
- this process continues over and over; every time you want to play something, you do it in advance
The thing is that you will NEVER be able to get everything synced all the time - this is the nature of the network, unfortunately ;)

Consistent 500ms delay on network comms on SOME winXP,win7 systems

We're having a problem with network performance on some WinXP and Windows 7 machines, but not all.
We'll issue a send() call, then recv() the response. Logging system ticks to a log file, the application sees the delay on the recv() side (almost exactly 500 ms plus a 15 ms quantum, every time). However, Wireshark shows no significant delay at all from when the sent packet actually goes out to the response coming in. So something appears either to be blocking the send() for 500 ms, or the whole response (which is up to 30K or so, so lots of packets) gets delayed before the app sees it.
Turning the Windows Firewall on or off does not do anything. This computer has a Trend Micro trial installed, but unactivated and disabled. Other computers with the problem have had other antivirus software, etc.
We've looked into Nagle and delayed ACK, and neither seems to be the culprit. We're also using TCP_NODELAY just in case. The TcpAckFrequency registry entry doesn't change anything either.
We're doing a single send(), and a single recv(). Nothing fancy.
Thinking it might be a problem with our using port 80 and having some unknown packet inspection choke on us, we tried different ports with the same effect.
Any ideas?
EDIT
We have some expert users claiming that uninstalling the antivirus (even a disabled one) fixes it for them, but not all the time. Also, different systems are using different antivirus packages (this one has a disabled Trend Micro; others are Norton, etc.). It's weak evidence of something, perhaps... I thought I'd mention it.
EDIT 2
Updated to make this non-Win7 specific as we've now found a WinXP system that behaves the same way.
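For reference, the Nagle-related setting the question says is already in use corresponds to the standard TCP_NODELAY socket option; in Python it looks like this (the C#/Winsock equivalents set the same option underneath):

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Disable Nagle's algorithm, as the question already does "just in case".
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print(nodelay)
```

Since Wireshark shows the bytes arriving on time, timestamping the send() and recv() calls separately (rather than the pair) would at least pin down which side of the API the 500 ms is spent on.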

How to speed up slow / laggy Windows Phone 7 (WP7) TCP Socket transmit?

Recently, I started using the System.Net.Sockets class introduced in the Mango release of WP7 and have generally been enjoying it, but have noticed a disparity in the latency of transmitting data in debug mode vs. running normally on the phone.
I am writing a "remote control" app which transmits a single byte to a local server on my LAN via Wifi as the user taps a button in the app. Ergo, the perceived responsiveness/timeliness of the app is highly important for a good user experience.
With the phone connected to my PC via USB cable and running the app in debug mode, the TCP connection seems to transmit packets as quickly as the user taps buttons.
With the phone disconnected from the PC, the user can tap up to 7 buttons (and thus cause 7 "send" commands with 1-byte payloads) before all 7 bytes are sent. If the user taps a button and waits a little between taps, there seems to be a latency of 1 second.
I've tried setting Socket.NoDelay to both True and False, and it seems to make no difference.
To see what was going on, I used a packet sniffer to see what the traffic looked like.
When the phone was connected via USB to the PC (which was using a Wifi connection), each individual byte was in its own packet being spaced ~200ms apart.
When the phone was operating on its own Wifi connection (disconnected from USB), the bytes still had their own packets, but they were all grouped together in bursts of 4 or 5 packets and each group was ~1000ms apart from the next.
btw, Ping times on my Wifi network to the server are a low 2ms as measured from my laptop.
I realize that buffering "sends" together probably allows the phone to save energy, but is there any way to disable this "delay"? The responsiveness of the app is more important than saving power.
This is an interesting question indeed! I'm going to throw my 2 cents in but please be advised, I'm not an expert on System.Net.Sockets on WP7.
Firstly, performance testing while in the debugger should be ignored. The reason is that the additional overhead of logging the stack trace always slows applications down, no matter the OS/language/IDE. Applications should be profiled for performance in release mode, disconnected from the debugger. In your case it's actually slower when disconnected! OK, so let's try to optimise that.
If you suspect that packets are being buffered (and this is a reasonable assumption), have you tried sending a larger packet? Try linearly increasing the packet size and measuring latency. Could you write a simple micro-profiler in code on the device, i.e. using DateTime.Now or the Stopwatch class, to log latency vs. packet size? Plotting that graph might give you some good insight as to whether your theory is correct. If you find that 10-byte (or even 100-byte) packets get sent instantly, then I'd suggest simply pushing more data per transmission. It's a lame hack I know, but if it ain't broke...
Finally, you say you are using TCP. Can you try UDP instead? TCP is designed for accurate, not real-time, communication. UDP, by contrast, gives no delivery guarantee, but you can expect faster (more lightweight, lower-latency) performance from it. Applications such as Skype and online games are built on UDP, not TCP. If you really need acknowledgement of receipt you could always build your own micro-protocol over UDP, using your own cyclic redundancy check for error detection and a request/response (acknowledgement) scheme.
Such protocols do exist; take a look at Reliable UDP, discussed in this previous question. There is a Java-based implementation of RUDP around, and I'm sure some parts could be ported to C#. Of course, the first step is to test whether UDP actually helps!
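A rough sketch of the suggested micro-profiler, using a loopback TCP echo server in place of the phone-to-server hop (on the device you'd use Stopwatch, but the structure is the same; the payload sizes are arbitrary):

```python
import socket
import threading
import time

def echo_server(srv):
    """Accept one connection and echo everything back until it closes."""
    conn, _ = srv.accept()
    while True:
        data = conn.recv(65536)
        if not data:
            break
        conn.sendall(data)
    conn.close()

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
threading.Thread(target=echo_server, args=(srv,), daemon=True).start()

cli = socket.create_connection(srv.getsockname())
cli.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

latencies = {}
for size in (1, 10, 100, 1000):        # increasing payload sizes
    payload = b"x" * size
    tic = time.perf_counter()
    cli.sendall(payload)
    received = 0
    while received < size:             # wait for the full echo
        received += len(cli.recv(65536))
    latencies[size] = time.perf_counter() - tic
cli.close()
print(latencies)
```

If the larger payloads show roughly the same latency as the 1-byte one, the per-send overhead (buffering) rather than bandwidth is the bottleneck, which is what the theory above predicts.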
I found this previous question which discusses the issue. Perhaps a WP7 issue?
Poor UDP performance with Windows Phone 7.1 (Mango)
I'd still be interested to see whether increasing the packet size or switching to UDP works.
OK, so neither suggestion worked. I found this description of the Nagle algorithm, which groups packets as you describe. Setting NoDelay is supposed to help but, as you say, doesn't.
http://msdn.microsoft.com/en-us/library/system.net.sockets.socket.nodelay.aspx
Also. See this previous question where Keepalive and NoDelay were set on/off to manually flush the queue. His evidence is anecdotal but worth a try. Can you give it a go and edit your question to post more up to date results?
Socket "Flush" by temporarily enabling NoDelay
Andrew Burnett-Thompson here already mentioned it, but he also wrote that it didn't work for you. I do not understand and I do not see WHY. So, let me explain the issue:
Nagle's algorithm was introduced to avoid a scenario where many small packets have to be sent through a TCP network. Any current state-of-the-art TCP stack enables Nagle's algorithm by default!
Because TCP itself adds a substantial amount of overhead to any data transfer passing through an IP connection, and applications usually do not care much about sending their data in an optimized fashion over those TCP connections. So, all in all, the Nagle algorithm working inside the OS's TCP stack does a very, very good job.
A better explanation of Nagle's algorithm and its background can be found on Wikipedia.
So, your first try: disable Nagle's algorithm on your TCP connection, by setting option TCP_NODELAY on the socket. Did that already resolve your issue? Do you see any difference at all?
If not so, then give me a sign, and we will dig further into the details.
But please, look twice for those differences: check the details. Maybe after all you will get an understanding of how things in your OS's TCP/IP-Stack actually work.
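To make the experiment concrete, here is a minimal Python sketch of the toggle-to-"flush" idea from the linked question, demonstrated over a loopback connection. Whether toggling TCP_NODELAY really pushes out queued data is OS-dependent, so treat this as something to measure, not a guaranteed fix:

```python
import socket

def send_flushed(sock, data):
    # Toggle TCP_NODELAY around the send to coax buffered data out of
    # Nagle's queue; whether this truly forces a flush is OS-dependent.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    sock.sendall(data)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 0)

# Demo over loopback.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
cli = socket.create_connection(srv.getsockname())
conn, _ = srv.accept()

send_flushed(cli, b"hello")
received = conn.recv(1024)
print(received)
```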
Most likely it is not a software issue. If the phone is using WiFi, the delay could be upwards of 70 ms (depending on where the server is, how much bandwidth it has, how busy it is, interference at the AP, and distance from the AP), but most of the delay is just the WiFi. Using GSM, CDMA, LTE or whatever technology the phone is using for cellular data is even slower. I wouldn't imagine you'd get much lower than 110 ms on a cellular device unless you stood underneath a cell tower.
Sounds like your reads/writes are buffered. You may try setting the NoDelay property on the Socket to true, and you may consider trimming the Send and Receive buffer sizes as well. The reduced responsiveness may be a by-product of there not being enough WiFi traffic; I'm not sure if adjusting the MTU is an option, but reducing the MTU may improve response times.
All of these are only options for a low-bandwidth solution, if you intend to shovel megabytes of data in either direction you will want larger buffers over wifi, large enough to compensate for transmit latency, typically in the range of 32K-256K.
var socket = new System.Net.Sockets.Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp)
{
    NoDelay = true,
    SendBufferSize = 3,
    ReceiveBufferSize = 3,
};
I didn't test this, but you get the idea.
Have you tried setting SendBufferSize = 0? In C, you can disable Winsock buffering by setting SO_SNDBUF to 0, and I'm guessing SendBufferSize means the same in C#.
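For comparison, the same knobs at the BSD-socket level look like this in Python. Note that outside of Winsock (where 0 specifically disables buffering, as suggested above) the kernel is free to round small requests up - Linux, for instance, doubles the value and enforces a minimum - so read the value back to see what you actually got:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
# Request small send/receive buffers; the kernel may clamp these upward.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 1024)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1024)
sndbuf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
rcvbuf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(sndbuf, rcvbuf)
```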
Were you using a Lumia 610 and a MikroTik access point by any chance?
I have experienced this problem: the Lumia 610 turned off its Wifi radio as soon as the last connection was closed. This added a perceivable delay compared to, for example, a Lumia 800. All connections were affected; simply switching Wifi off made all apps faster. My admin says it was some feature the MikroTiks were not supporting at the time, combined with WMM settings. Strangely, most other phones were managing just fine, so we blamed the cheapness of the 610 at the beginning.
If you can still replicate the problem, I suggest trying the following:
open another connection in the background and ping it all the time
use 3G/GPRS instead of Wifi (requires exposing your server to the internet)
use a different (or upgraded) phone
use a different (or upgraded) AP

How can I compare the time between an iPhone and a (web) server?

I have an application made up of a server which sends occasional messages to iPhones. The latency between the two devices is important to the problem domain: if it takes less than a second for the message to arrive, everything's fine; if it takes more than 5 seconds, there's almost certainly a problem. The server-side messages are time-stamped with the server time.
Using the cellular data connection, we see occasional delays, but we can't quantify them, because there's no guarantee that the iPhone's clock is synchronized with the server; on our test phones, we see different times for different carriers.
Is there a simple way to synchronize time between the iPhone and the server? I've looked at (S)NTP, which seems to be the right way to go.
Any alternatives? We only need to be accurate to within seconds, not milliseconds.
I'm not sure what the exact situation is, so this may be a non-solution, but:
Presuming that you want to figure out the latency between the phone and the server (and only this) at set intervals (decided by the server), and presuming also that the error checking is done server-side, then instead of synchronizing clocks you might go with a "ping" approach.
Server pings client iPhone, and starts a stopwatch.
Client immediately pings server.
As soon as client ping reaches server, server stops the stopwatch and checks the time.
If I misunderstood your problem, apologies.
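A minimal loopback sketch of those three steps in Python, where a local UDP echo socket stands in for the phone's immediate reply:

```python
import socket
import threading
import time

# A loopback UDP echo stands in for the phone's immediate ping response.
phone = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
phone.bind(("127.0.0.1", 0))

def echo_once():
    data, addr = phone.recvfrom(64)
    phone.sendto(data, addr)               # client immediately pings back

threading.Thread(target=echo_once, daemon=True).start()

server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tic = time.perf_counter()                  # start the stopwatch
server.sendto(b"ping", phone.getsockname())  # server pings the phone
server.recvfrom(64)                        # reply reaches the server
rtt = time.perf_counter() - tic            # stop the stopwatch, check the time
print(rtt)
```

The measured round trip never needs the two clocks to agree, which is the point of the approach.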
Well, a somewhat simplistic solution is that you could have the phones tell the server what time they have at various times and keep a database table of deltas. Then adjust your reported timestamps to the server's time +/- the delta. iPhones are synced to the carrier's time server, to the best of my knowledge. The other possibility is to have both the phone and the server query a common time source on a daily basis. It's unlikely that the time would vary much over a single day.
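The delta bookkeeping is just arithmetic; a sketch with made-up timestamps (the phone's clock here is 37 s ahead of the server's):

```python
# Hypothetical "now" on each clock (seconds since epoch).
server_now = 1_700_000_000.0
phone_reported_now = 1_700_000_037.0

# Record the delta when the phone checks in...
delta = phone_reported_now - server_now

# ...then map any later phone-stamped event back onto the server's clock.
phone_event_time = 1_700_000_137.0
event_on_server_clock = phone_event_time - delta
print(event_on_server_clock)
```

Since the question only needs second-level accuracy, refreshing the stored delta once a day should be plenty.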