iPhone: Strategies for uploading large files from phone to server - iphone

We're running into issues uploading hi-res images from the iPhone to our backend (cloud) service. The call is a simple HTTP file upload, and the issue appears to be the connection breaking before the upload is complete: on the server side we're getting IOError: Client read error (Timeout?).
This happens sporadically: most of the time it works, sometimes it fails. When a good connection is present (i.e. Wi-Fi) it always works.
We've tuned various timeout parameters on the client library to make sure we're not hitting any of them. The issue actually seems to be unreliable mobile connectivity.
I'm thinking about strategies for making the upload reliable even when faced with poor connectivity.
The first thing that came to mind was to break the file into smaller chunks and transfer it in pieces, increasing the likelihood of each piece getting there. But that introduces a fair bit of complexity on both the client and server side.
Do you have a cleverer approach? How would you tackle this?

I would use the ASIHTTPRequest library. It has some great features, like bandwidth throttling, and it can upload files directly from disk instead of loading the file into memory first. I would also break the photo into about 10 parts, so a 5 MB photo would be roughly 500 KB per part. You would create each upload using a queue. Then, when the app goes into the background, it can complete the part it's currently uploading. If you cannot finish uploading all the parts in the allotted time, just post a local notification reminding the user that the upload is not complete. After all the parts have been sent to your server, you would make a final request that combines the parts back into your photo on the server side.
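To make the split-and-recombine idea concrete, here is a minimal sketch in Python rather than Objective-C. It only models the data handling, not the HTTP calls or ASIHTTPRequest itself; the function names `split_into_parts` and `combine_parts` and the checksum comparison are my own assumptions for illustration.

```python
import hashlib


def split_into_parts(data, part_count=10):
    """Split data into roughly equal parts (the last part may be shorter)."""
    part_size = max(1, -(-len(data) // part_count))  # ceiling division
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


def combine_parts(parts_by_index):
    """Server-side step: join the parts in index order once all have arrived."""
    return b"".join(parts_by_index[i] for i in sorted(parts_by_index))


# Stand-in for a photo of about 5 MB (hypothetical data, not a real image).
photo = bytes(range(256)) * 20_000
parts = split_into_parts(photo)

# Pretend each part was uploaded separately and stored under its index.
received = {index: part for index, part in enumerate(parts)}
rebuilt = combine_parts(received)

# A checksum sent with the final request lets the server verify the result.
checksum_matches = hashlib.sha256(rebuilt).digest() == hashlib.sha256(photo).digest()
```

Sending the part index with every upload means the server can accept parts in any order and a failed part can be retried on its own.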

Yeah, timeouts are tricky in general, and get more complex when dealing with mobile connections.
Here are a couple ideas:
Attempt to upload to your cloud service as you are doing. After a few failures (timeouts), mark the file and ask the user to connect their phone to a Wi-Fi network, or wait until they connect to a computer and have them upload manually via the web. This isn't ideal, however, as it pushes more work onto your users. The upside is that, implementation-wise, it's pretty straightforward.
Instead of doing an HTTP upload, do a raw socket send. Using a raw socket, you can send binary data in chunks pretty easily, and if any chunk send times out, resend it until the entire image file is sent. This is "more complex" as you have to manage the binary socket transfer, but I think it's easier than trying to chunk files through an HTTP upload.
Anyway that's how I would approach it.
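The "resend it until it gets through" part might look roughly like the sketch below. It is written in Python and uses a fake link instead of a real socket, so `FlakyLink`, `send_with_retry`, and the retry limits are all hypothetical names and values chosen for the example.

```python
import time


def send_with_retry(send_chunk, chunk, max_attempts=5, base_delay=0.5):
    """Call send_chunk(chunk) until it succeeds; return the attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            send_chunk(chunk)
            return attempt
        except TimeoutError:
            if attempt == max_attempts:
                raise  # give up and let the caller decide what to do
            # Back off a little longer after each failure.
            time.sleep(base_delay * 2 ** (attempt - 1))


class FlakyLink:
    """Stand-in for a socket: every third call succeeds, the others time out."""

    def __init__(self):
        self.calls = 0
        self.delivered = []

    def send(self, chunk):
        self.calls += 1
        if self.calls % 3 != 0:
            raise TimeoutError("simulated timeout")
        self.delivered.append(chunk)


link = FlakyLink()
chunks = [b"aa", b"bb", b"cc"]
# base_delay=0 keeps the demo instant; a real client would wait between tries.
attempts = [send_with_retry(link.send, chunk, base_delay=0) for chunk in chunks]
```

Capping the number of attempts matters: without it a dead connection would keep the app retrying forever instead of falling back to "try again on Wi-Fi".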


low connectivity protocols or technologies

I'm trying to improve the reliability of a server-app-website architecture that another programmer developed.
At the moment, Android smartphones open a TCP connection to a server component to exchange data. The server takes the data and writes it into a DB, and another user can look at the data through a website. The problem is that the smartphones are very regularly in locations where connectivity is really bad. The consequence is that the smartphones lose the TCP connection and it's hard to reconnect. Now my question is whether there are any protocols that are so lightweight or accommodating of bad connectivity that the data exchange could work better or more reliably.
For example, I was thinking about replacing the raw TCP interface with a RESTful API, but I don't really know how well REST works in this scenario, as I don't have any experience in this area.
Maybe useful to know for answering this question: the server component is written in C#. The connecting components are Android smartphones.
Please understand that I haven't added any code to this question because, in my opinion, it's a purely theoretical question.
Thank you in advance !
REST runs over HTTP which runs over TCP so it would have the same issues with connectivity.
Moving up the stack to the application, you could perhaps think in terms of 'interference'. I quite often have to use technical stuff in remote areas with limited reception, and it reminds me of trying to communicate in a storm. If you think about it, if you're trying to get someone to do something in a storm where they can hardly hear you and the words get blown away (dropped signal), you don't read them the manual on how to fix something, you shout key words such as 'handle', 'pull', 'pull', 'PULL', 'ok'. So the information reaches them in small bursts you can repeat (pull, what? pull, eh? PULL! oh righto!)
Can you redesign the communications between the android app and the server so the server can recognise key 'words' with corresponding data and build up the request over a period of time? If you consider idempotency, each burst of data would not alter the request if it has already been received (pull, PULL!) and over time the android app could send/receive smaller chunks of the request. If the signal stays up, just keep sending. If it goes down, note which parts of the request haven't been sent and retry them when the signal comes back.
So you're sending the request jigsaw-style but the server knows how to reassemble the pieces in the right order. A STOP word at the end tells the server ok this request is complete, go work on it. Until that word arrives the server can store the incomplete request or discard it if no more data comes in.
If the server responds to the first request chunk with an id, the app can use the id to get the response and keep trying until the full response comes back, at which point the server can remove the response from its jigsaw cache. A fair amount of work though.
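A rough sketch of the server-side "jigsaw" might look like this. It is in Python rather than C#, purely to show the idea; the class name `RequestAssembler`, the numbered chunks, and the convention that STOP carries the total chunk count are assumptions of mine, not part of any existing protocol.

```python
class RequestAssembler:
    """Collects numbered chunks for one request; repeated chunks are harmless."""

    def __init__(self):
        self.chunks = {}
        self.total = None  # unknown until the STOP word arrives

    def receive(self, index, payload):
        # Idempotent: receiving the same chunk again leaves the state unchanged.
        self.chunks.setdefault(index, payload)

    def stop(self, total):
        # The STOP word tells the server how many chunks make up the request.
        self.total = total

    def missing(self):
        """Indices the client still has to (re)send; empty before STOP."""
        if self.total is None:
            return []
        return [i for i in range(self.total) if i not in self.chunks]

    def is_complete(self):
        return self.total is not None and not self.missing()

    def assemble(self):
        return b"".join(self.chunks[i] for i in range(self.total))


request = RequestAssembler()
request.receive(2, b"PULL")     # arrives out of order
request.receive(0, b"handle ")
request.receive(2, b"PULL")     # duplicate after a retry, ignored
request.stop(total=3)
still_missing = request.missing()
request.receive(1, b"pull ")
message = request.assemble() if request.is_complete() else None
```

The server would keep one assembler per request id and drop it after a timeout if no more data comes in.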

Download multiple files vs single one big file&unzip via socket

I need my client to download 30Mb worth of files.
Following is the setup.
They are comprised of 3000 small files.
They are downloaded through tcp bsd socket.
They are stored in client's DB as they get downloaded.
Server can store all necessary files in memory.(no file access on server side)
I've not seen many cases where a client downloads such a large number of files, which I suspect is due to the cost of file access on the server side.
I'm also worried that the multiplexer (select/epoll) will be overwhelmed by handling so many network requests. (Do I need to worry about this?)
With the above suspicions, I zipped up 3000 files to 30 files.
(overall size doesn't change much because the files are already compressed files(png))
Test shows,
downloading 3000 files is 25% faster than downloading 30 files and unzipping them.
I suspect it's because the client device is unable to download while unzipping and inserting into the DB. I'm testing on handheld devices (iPhone).
(I've threaded unzipping+DB operation separate from networking code, but DB operation seems to take over the whole system. I profiled a bit, and unzipping doesn't take long, DB insertion does. On server-side, files are zipped and placed in memory beforehand.)
I'm contemplating switching back to downloading 3000 files because it's faster for clients.
I wonder what other experienced network people will say over the two strategies,
1. many small data
2. small number of big data & unzipping.
For experienced iPhone developers: I'm moving the DB operation onto a separate thread using NSOperationQueue.
Does NSOperationQueue actually thread work out well?
I'm very suspicious of its performance.
-- I tried POSIX threads, no significant difference.
I'm answering my own question.
It turned out that inserting many images into the SQLite DB at once on the client takes a long time; as a result, network packets in transit are not delivered to the client fast enough.
After I adopted the suggestion in the FAQ for speeding up many inserts, the zipped approach actually outperforms the strategy of downloading many files individually.
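The answer doesn't say which FAQ suggestion was used. Assuming it is the usual SQLite FAQ advice of wrapping many INSERTs in a single transaction instead of committing each one separately, a minimal sketch in Python (the table and column names are made up for the example) could be:

```python
import sqlite3


def insert_images(conn, images):
    """Insert all (name, data) rows inside a single transaction."""
    # "with conn" commits once at the end, or rolls back if an insert fails,
    # so the database is not synced to storage after every single row.
    with conn:
        conn.executemany(
            "INSERT INTO images (name, data) VALUES (?, ?)", images
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (name TEXT PRIMARY KEY, data BLOB)")

# 3000 small fake "files" standing in for the downloaded PNGs.
images = [(f"tile_{i}.png", bytes([i % 256]) * 64) for i in range(3000)]
insert_images(conn, images)

count = conn.execute("SELECT COUNT(*) FROM images").fetchone()[0]
```

On the iPhone the equivalent would be issuing BEGIN before the batch and COMMIT after it through the SQLite C API.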

How can I measure the breakdown of network time spent in iOS?

Uploads from my app are too slow, and I'd like to gather some real data as to where the time is being spent.
By way of example, here are a few stages a request goes through:
Initial radio connection (significant source of latency in EDGE)
DNS lookup (if not cached)
SSL/TLS handshake.
HTTP request upload, including data.
Server processing time.
HTTP response download.
I can address most of these (e.g. by powering up the radio earlier via a dummy request, establishing a dummy HTTP 1.1 connection, etc.), but I'd like to know which ones are actually contributing to network slowness, on actual devices, with my actual data, using actual cell towers.
If I were using WiFi, I could track a bunch of these with Wireshark and some synchronized clocks, but I need cellular data.
Is there any good way to get this detailed breakdown, short of having to (gak!) use very low level socket functions to reproduce my vanilla http request?
Ok, the method I would use is not easy, but it does work. Maybe you've already tried this, but bear with me.
I get a time-stamped log of the sending time of each message, the time each message is received, and the time it is acted upon. If this involves multiple processes or threads, I have each one generate a log, and then merge them into a common timeline.
Then I plot out the timeline. (A tool would be nice, but I did it by hand.)
What I look for is things like 1) messages re-transmitted due to timeouts, 2) delays between the time a message is received and the time it's acted upon.
Usually, this identifies problems that I can fix in the code I can control. This improves things, but then I do it all over again, because chances are pretty good that I missed something the last time.
The result was that a system of asynchronous message-passing can be made to run quite fast, once preventable sources of delay have been eliminated.
There is a tendency in posting questions about performance to look for magic fixes to improve the situation. But, the real magic fix is to refine your diagnostic technique so it tells you what to fix, because it will be different from anyone else's.
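The answer above did the timeline by hand. As a rough illustration of what a small tool could do, here is a Python sketch that merges per-process logs and pulls out the two things mentioned: retransmissions and receive-to-act delays. The log format `(timestamp, event, message_id)`, the event names, and the sample values are all invented for the example, and it assumes the clocks of the processes are synchronized.

```python
import heapq
from collections import Counter


def merge_logs(*logs):
    """Merge logs that are each sorted by timestamp into one timeline."""
    return list(heapq.merge(*logs, key=lambda entry: entry[0]))


def retransmitted(timeline):
    """Message ids that were sent more than once."""
    sends = Counter(msg for _, event, msg in timeline if event == "sent")
    return sorted(msg for msg, n in sends.items() if n > 1)


def handling_delays(timeline):
    """Seconds between a message being received and being acted upon."""
    received, delays = {}, {}
    for timestamp, event, msg in timeline:
        if event == "received":
            received[msg] = timestamp
        elif event == "acted" and msg in received:
            delays[msg] = round(timestamp - received[msg], 3)
    return delays


# Hypothetical logs: (seconds since start, event, message id).
client_log = [(0.00, "sent", "m1"), (0.50, "sent", "m2"), (2.50, "sent", "m2")]
server_log = [(0.12, "received", "m1"), (0.15, "acted", "m1"),
              (2.61, "received", "m2"), (3.40, "acted", "m2")]

timeline = merge_logs(client_log, server_log)
resent = retransmitted(timeline)
delays = handling_delays(timeline)
```

In this made-up data, m2 stands out twice: it had to be resent after two seconds, and the server then sat on it for most of a second before acting.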
An easy solution would be to open a long-polling connection to the server once the application launches (you can decide beforehand when this connection needs to be established and when to disconnect). It is something of a hack, but it lets you avoid packet sniffing, given how little of the relevant API iOS exposes.

Faulty-connection Proof File Transfer Protocol?

I frequently do website development live over an FTP connection. That is to say, I use a code editor with a built in FTP window and push/pull files to work on them, upload the changes, etc. This is mostly because it's unreasonable to try to create a local development server, and I use too many computers for that to be practical anyway without a lot of work.
My trouble is, the internet connection at our home is not exactly... stable. It's fast and mostly reliable, but it has a tendency to glitch far more frequently than any other connection I've worked on (it's wireless DSL) and as a result, dropped connections are far too frequent. (It's about as reliable as AT&T is with phone calls in that regard.) When working with FTP, I find that if it drops the connection mid-file transfer, it can be difficult to recover. First of all, when the connection is dropped, it saves a blank file to the server (how is this helpful?), breaking the page I was working on completely, and the icing on the cake is that depending on the timing, vsftpd will get itself stuck in a timeout and I have to SSH in and restart it before I can access that file again.
This process alone has only been beneficial because it's taught me to build up some data protection techniques clientside, to prevent the server from eating my recent changes if the dropped connection happens to hang or crash my client. Overall though, it's a pretty failed situation, and I'm surprised I get any work done at all.
Long, long context, I know, but my question is this: Is there a file transfer protocol that is designed to handle "flakey" connections like mine? I'd imagine that, for example, trying to transfer files over a 3G tethered connection would yield the same results, especially while traveling. It seems like FTP and SFTP both rely on a persistent connection, and can deal with dropped packets but not the loss of the entire socket through a reconnect. It seems to me like a file transfer daemon should be able to store the state of the user interacting with it, and thus detect failed transfers and be ready to "resume" if the user reconnects in a reasonable amount of time.
Thanks if anyone knows anything. I'm seriously considering trying to write such a protocol myself (I've had a lot of success coding the ajax on my page to handle faulty connections, for example) but I don't want to dive in if there's already a solution available.
You want rsync. If the connection drops, you just repeat the command and it picks up right where it left off. Built in error checking and everything. Works over SSH, Windows client exists. Somebody's probably written a GUI front end.
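To show what "picks up where it left off" means in practice, here is a toy Python sketch of resuming by byte offset. It is not rsync's actual mechanism (rsync compares files with a rolling-checksum delta algorithm); the function `transfer`, the in-memory "files", and the simulated drop are all made up for the example.

```python
class DroppedConnection(Exception):
    """Raised by the simulation when the link goes away mid-transfer."""


def transfer(source, destination, block_size=4, fail_after_blocks=None):
    """Append the bytes that destination is still missing.

    The resume point is simply how much the destination already holds,
    so running the same call again continues instead of starting over.
    """
    offset = len(destination)
    blocks_sent = 0
    while offset < len(source):
        if fail_after_blocks is not None and blocks_sent == fail_after_blocks:
            raise DroppedConnection(f"lost link at byte {offset}")
        block = source[offset:offset + block_size]
        destination.extend(block)
        offset += len(block)
        blocks_sent += 1


source = b"0123456789abcdefXYZ"
destination = bytearray()  # stand-in for the partial file on the server

try:
    transfer(source, destination, fail_after_blocks=2)  # drops after 8 bytes
except DroppedConnection:
    pass
bytes_after_drop = len(destination)

transfer(source, destination)  # "repeat the command": continues at byte 8
```

The key difference from the FTP behaviour described in the question is that the partial file is kept and extended, not replaced by a blank one.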
BitTorrent works well with flakey connections. I hear that it is fast, too!

Streaming Data

I unsuccessfully searched Google for a good definition and understanding of streaming data and its characteristics. My questions are:
What is streaming data?
How can it be detected?
"How can it be detected" is not an appropriate question. Instead my question is:
How is it different from buffered data and other data transfer mechanisms?
It depends on the context you mean, but basically streaming data is analogous to asynchronous data. Take the Web as an example. The Web (or HTTP specifically) is (basically) a request-response mechanism in that a client makes a request and receives a response (typically a Web page of some kind).
HTTP doesn't natively support the ability for servers to push content to clients. There are a number of ways this can be faked, including:
Polling: forcing the client to make repeated requests, typically inconspicuously (as far as the client is concerned);
Long-lived connections: this is where the client makes a normal HTTP request but instead of returning immediately the server hangs on to the request until there's something to send back. When the request times out or a response is sent the client sends another request. In this way you can fake server push;
Plug-ins: Java applets, Flash, Silverlight and others can be used to achieve this.
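The long-lived connection trick can be sketched without any real HTTP. In the Python below a queue plays the role of the server's pending data and a function call plays the role of a request; `long_poll`, `client_loop`, and the timings are hypothetical and only meant to show the hold-then-reissue pattern.

```python
import queue
import threading
import time


def long_poll(events, timeout):
    """Server side: hold the request until data exists or the timeout passes."""
    try:
        return events.get(timeout=timeout)
    except queue.Empty:
        return None  # request timed out with nothing to send


def client_loop(events, wanted, timeout=0.2, max_requests=50):
    """Client side: as soon as one request ends, issue the next one."""
    received = []
    requests_made = 0
    while len(received) < wanted and requests_made < max_requests:
        requests_made += 1
        event = long_poll(events, timeout)
        if event is not None:
            received.append(event)
    return received


def producer(events):
    """Stand-in for the server getting new data at unpredictable times."""
    for price in (101, 102):
        time.sleep(0.05)
        events.put(f"price: {price}")


events = queue.Queue()
threading.Thread(target=producer, args=(events,), daemon=True).start()
received = client_loop(events, wanted=2)
```

From the client's point of view the data simply shows up shortly after the server has it, even though every delivery is technically the response to a request.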
Anything where the server effectively sends data to the client (rather than the client asking for it)--regardless of the mechanism and whether or not the client is polling for that data--can be characterised as streaming data.
With non-HTTP transports (e.g. vanilla TCP) server push is typically easier (but can still run afoul of firewalls and the like). An example of this might be a sharetrading application that receives market information from a provider. That's streaming data.
How do you detect it? Bit of a vague question. I'm not really sure what you're getting at.
When you say streaming data I think of the following, although I'm not sure if this is what you're getting at. To me it's playing a video/audio file while it's downloading. That's what happens when you go to YouTube and watch a video and it starts playing even though you haven't downloaded the whole video yet. But you can see the video downloading - I'm sure you're familiar with the seek bar filling up as the file downloads. It doesn't necessarily have to be a video or audio file but that's the most common.
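The difference between "download everything, then play" and "play while downloading" can be shown with a small Python sketch. The generator stands in for a network download; the names `download`, `play_buffered`, and `play_streaming` and the frame strings are invented for the example.

```python
def download(chunks, log):
    """Stand-in for a network source that delivers data piece by piece."""
    for chunk in chunks:
        log.append(f"arrived {chunk}")
        yield chunk


def play_buffered(source, log):
    """Buffered: wait until the whole file is here, then play it."""
    data = list(source)  # consumes the entire download first
    for chunk in data:
        log.append(f"played {chunk}")


def play_streaming(source, log):
    """Streaming: play each piece as soon as it arrives."""
    for chunk in source:
        log.append(f"played {chunk}")


frames = ["frame1", "frame2", "frame3"]

buffered_log = []
play_buffered(download(frames, buffered_log), buffered_log)

streaming_log = []
play_streaming(download(frames, streaming_log), streaming_log)
```

In the buffered log all three "arrived" entries come before the first "played" entry; in the streaming log they alternate, which is the YouTube behaviour described above.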