I need my client to download 30 MB worth of files.
Here is the setup:
They consist of 3,000 small files.
They are downloaded through a TCP BSD socket.
They are stored in the client's DB as they are downloaded.
The server can keep all the necessary files in memory (no file access on the server side).
I haven't seen many cases where a client downloads such a large number of files, and I suspect that's because of the cost of server-side file access.
I'm also worried that the multiplexer (select/epoll) will be overwhelmed by handling so many network requests. (Do I need to worry about this?)
With those suspicions in mind, I zipped the 3,000 files up into 30 archives.
(The overall size doesn't change much because the files are already compressed (PNG).)
Testing shows that downloading the 3,000 files individually is 25% faster than downloading the 30 archives and unzipping them.
I suspect this is because the client device can't download while it's unzipping and inserting into the DB. I'm testing on handheld devices (iPhone).
(I've threaded the unzip + DB work separately from the networking code, but the DB operation seems to take over the whole system. I profiled a bit: unzipping doesn't take long, but DB insertion does. On the server side, the files are zipped and placed in memory beforehand.)
I'm contemplating switching back to downloading the 3,000 files individually, because it's faster for clients.
I wonder what other experienced network people would say about the two strategies:
1. many small files
2. a small number of big archives, plus unzipping
EDIT
For experienced iPhone developers: I'm moving the DB operation onto a background thread using NSOperationQueue.
Does NSOperationQueue actually offload work to other threads well?
I'm very suspicious of its performance.
-- I tried a POSIX thread instead; no significant difference.
I'm answering my own question.
It turned out that inserting many images into the client's SQLite DB at once takes a long time, and as a result, network packets in transit aren't delivered to the client fast enough.
http://www.sqlite.org/faq.html#q19
After I adopted the FAQ's suggestion for speeding up many INSERTs, the archive approach actually outperforms the "download many files individually" strategy.
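For anyone landing here later: the fix in that FAQ entry is to wrap the whole batch of INSERTs in a single transaction, so SQLite syncs to disk once instead of once per row. Here's a minimal sketch of the idea in Python (the `images` table and its columns are made up, since the post doesn't show a schema; on iPhone the same applies through the C API with BEGIN/COMMIT):

```python
import sqlite3

conn = sqlite3.connect("images.db")
conn.execute("CREATE TABLE IF NOT EXISTS images (name TEXT PRIMARY KEY, data BLOB)")

def insert_images(conn, files):
    """files: iterable of (name, png_bytes) tuples.

    One transaction for the whole batch. Without it, SQLite
    commits (and fsyncs) after every single INSERT, which is
    what starved the network reader in the original setup."""
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO images (name, data) VALUES (?, ?)", files)
```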
Related
I'm creating a database in FileMaker; the database is about 1 GB and includes around 500 photos.
FileMaker Server is having performance issues: it crashes and takes its time when searching through the database. My IT department recommended raising the cache memory.
I raised the cache to 252 MB, but the server is still struggling to deliver consistent performance, and the database now shows CPU peaks.
What can cause this problem?
Verify at FileMaker.com that your server meets the minimum requirements for your version.
For starters:
Increase the cache to 50% of the total memory available to FileMaker server.
Verify that the hard disk is unfragmented and has plenty of free space.
FM Server should be extremely stable.
FMS only does two things:
reads data from the disk and sends it to the network
takes data from the network and writes it to the disk
Performance bottlenecks are always disk and network. FMS is relatively easy on CPU and RAM unless Web Direct is being used.
Things to check:
Are users connecting through Ethernet or Wi-Fi? (Wi-Fi is slow and unreliable.)
Is FMS running in a virtual machine?
Is the machine running a supported operating system?
Is the database using WebDirect? (If so, use a two-machine deployment.)
Is there anything else running on the machine? (Disable antivirus and indexing services.)
Make sure users are accessing the live databases through FMP client and not through file sharing.
How are the databases being backed up? NEVER let anything other than FMS touch the live files. Only let OS-level backup processes see backup copies, never the live files.
Make sure all the energy saving options on the server are DISABLED. You do NOT want the CPU or disks sleeping or powering down.
Put the server onto an uninterruptible power supply (UPS). Bad power could be causing problems.
HTTP/2 was a very nice improvement: we no longer pay the network overhead of opening multiple connections to download multiple files, and the recommendation now is to stop concatenating JS and CSS files so they cache better. But I'm not sure whether, from an I/O perspective, having to read multiple files from the hard drive will hurt performance.
I don't know whether web servers like Apache or nginx keep a RAM cache of frequently requested files. If they do, that would avoid the I/O overhead, and I could stop concatenating and stop worrying about this.
No, having to read multiple files from the hard disk won't worsen performance, especially if you are using the current crop of SSDs. Their performance is so good that network bandwidth, SSL, and everything else become the bottlenecks.
Also, operating systems automatically cache recently accessed files in RAM, so the feature you are asking about doesn't need to be built into the web server: it is already built into the operating system. And this works even if you are using magnetic drives.
There is one thing, though. We (at shimmercat.com) have noticed that compression sometimes suffers when serving multiple small files instead of one concatenated file. In that sense, it would have been nice if HTTP/2 had included a compression format like SDCH... but this should matter more for particularly small files than for big ones. All in all, I think the caching benefits for small files are worth it.
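To make the compression point concrete, here's a quick self-contained Python comparison (not ShimmerCat code, just an illustration) of gzipping 100 small, mutually redundant files one by one versus as a single concatenated blob:

```python
import gzip

# 100 small "files" with lots of cross-file redundancy, like CSS/JS modules.
files = [(f"module {i};\n".encode() + b"function render() { return 'hello'; }\n" * 3)
         for i in range(100)]

one_by_one = sum(len(gzip.compress(f)) for f in files)
concatenated = len(gzip.compress(b"".join(files)))

print("compressed individually:", one_by_one, "bytes")
print("compressed concatenated:", concatenated, "bytes")  # far smaller
```

Each separate gzip stream pays its own header and builds its dictionary from scratch, so redundancy shared between files is never exploited; the concatenated blob compresses it once.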
We're running into issues uploading hi-res images from the iPhone to our backend (cloud) service. The call is a simple HTTP file upload, and the issue appears to be the connection breaking before the upload is complete; on the server side we're getting IOError: Client read error (Timeout?).
This happens sporadically: most of the time it works, but sometimes it fails. When a good connection is present (i.e. Wi-Fi), it always works.
We've tuned various timeout parameters on the client library to make sure we're not hitting any of them. The issue really seems to be unreliable mobile connectivity.
I'm thinking about strategies for making the upload reliable even when faced with poor connectivity.
The first thing that came to mind was to break the file into smaller chunks and transfer it in pieces, increasing the likelihood of each piece getting there. But that introduces a fair bit of complexity on both the client and server side.
Do you have a cleverer approach? How would you tackle this?
I would use the ASIHTTPRequest library. It has some great features, like bandwidth throttling, and it can upload files directly from disk instead of loading the whole file into memory first. I would also break the photo into about 10 parts, so a 5 MB photo becomes roughly 500 KB per part, and create each part's upload in a queue. When the app goes into the background, it can finish the part it's currently uploading; if you can't upload all the parts in the allotted time, post a local notification reminding the user that the upload isn't complete. Once all the parts have been sent to your server, make a final request that recombines them into the photo on the server side. A sketch of the flow follows.
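I can't post full iOS code here, but the chunk-and-reassemble flow looks roughly like this (shown in Python for brevity; the part/assemble endpoints and their parameters are hypothetical, and ASIHTTPRequest's queue plays the role of the loop):

```python
import os
import requests  # stand-in for ASIHTTPRequest's request queue

CHUNK_SIZE = 512 * 1024                              # ~500 KB parts
UPLOAD_URL = "https://example.com/upload_part"       # hypothetical endpoint
ASSEMBLE_URL = "https://example.com/assemble"        # hypothetical endpoint

def upload_in_parts(path, upload_id):
    size = os.path.getsize(path)
    total = (size + CHUNK_SIZE - 1) // CHUNK_SIZE
    with open(path, "rb") as f:                      # stream parts from disk
        for part in range(total):
            data = f.read(CHUNK_SIZE)
            while True:                              # retry a failed part
                try:
                    requests.post(
                        UPLOAD_URL,
                        params={"id": upload_id, "part": part, "total": total},
                        data=data, timeout=30,
                    ).raise_for_status()
                    break
                except requests.RequestException:
                    pass  # a real client would back off and eventually give up
    # Final request: the server stitches the parts back into the photo.
    requests.post(ASSEMBLE_URL, params={"id": upload_id}, timeout=30).raise_for_status()
```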
Yeah, timeouts are tricky in general, and get more complex when dealing with mobile connections.
Here are a couple ideas:
Attempt to upload to your cloud service as you are doing. After a few failures (timeouts), mark the file and ask the user to connect their phone to a Wi-Fi network, or wait until they connect to their computer and have them upload it manually via the web. This isn't ideal, as it pushes more work onto your users. The upside is that, implementation-wise, it's pretty straightforward.
Instead of doing an HTTP upload, do a raw socket send. Over a raw socket you can send binary data in chunks fairly easily, and if any chunk send times out, resend it until the entire image file has been transferred. This is "more complex" in that you have to manage the binary socket transfer yourself, but I think it's easier than trying to chunk files through an HTTP upload; a rough sketch follows below.
Anyway that's how I would approach it.
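For the second idea, here's a rough sketch of the sender side in Python (the length-prefixed framing, the ACK byte, and the timeout value are my own assumptions, and reconnect handling is omitted):

```python
import socket
import struct

CHUNK = 64 * 1024

def send_file(host, port, path):
    with open(path, "rb") as f, socket.create_connection((host, port)) as s:
        s.settimeout(10)
        index = 0
        while True:
            data = f.read(CHUNK)
            if not data:
                break
            frame = struct.pack("!II", index, len(data)) + data  # chunk no. + size
            while True:
                try:
                    s.sendall(frame)
                    if s.recv(1) == b"\x06":  # server ACKs each chunk
                        break
                except socket.timeout:
                    pass  # no ACK in time: resend; server dedupes by chunk no.
            index += 1
        s.sendall(struct.pack("!II", index, 0))  # zero-length frame = done
```

Since a duplicate can arrive when only the ACK was lost, the receiver has to treat a repeated chunk number as already written.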
We need to write software that continuously sends very large files (several TB) to several destinations simultaneously, with new data being sent as it becomes available. Some destinations have a dedicated fiber connection to the source; some do not.
Several questions arise:
We plan to use TCP sockets for this task. What failover procedure would you recommend in order to handle network outages and dropped connections?
What should happen upon upload completion: should the server close the socket? If so, then is it a good design decision to have another daemon provide file checksums on another port?
Could you recommend a method for handling corrupted files, aside from downloading them again? Perhaps I could break them into 10 MB chunks and calculate a checksum for each chunk separately?
Thanks.
Since no answers have been given, I'm sharing our own decisions here:
There is a separate daemon for providing checksums for chunks and whole files.
We have decided to abandon the idea of using multicast over VPN for now; we use a multi-process server to distribute the files. The socket is closed and the worker process exits as soon as the file download is complete; any corrupted chunks need to be downloaded separately.
We use a filesystem monitor to capture new data as soon as it arrives at the tier 1 distribution server.
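For the per-chunk checksums that daemon serves, the core is simple. A sketch (10 MB chunks as proposed in the question; SHA-256 is my own choice of hash):

```python
import hashlib

CHUNK = 10 * 1024 * 1024  # 10 MB, as proposed in the question

def chunk_checksums(path):
    """Return (whole_file_digest, [per_chunk_digest, ...]).

    A receiver whose whole-file hash doesn't match compares the
    per-chunk list against its own and re-downloads only the
    chunks that differ, instead of the entire multi-TB file."""
    whole = hashlib.sha256()
    per_chunk = []
    with open(path, "rb") as f:
        while True:
            data = f.read(CHUNK)
            if not data:
                break
            whole.update(data)
            per_chunk.append(hashlib.sha256(data).hexdigest())
    return whole.hexdigest(), per_chunk
```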
I'm looking for ways to gather files from clients. These clients have our software installed, and we currently use FTP to collect files from them. The files are pulled from the client's database, encrypted, and uploaded via FTP to our FTP server. The process is fraught with frustration and obstacles: the software is frequently blocked by common firewalls, and it often runs into difficulties with VPNs and NAT (switching to passive mode instead of active usually helps).
My question is: what other ideas do people have for getting files from clients programmatically and reliably? Most of the files they submit are under 1 MB in size; however, one of them ranges up to 25 MB.
I'd considered HTTP POST; however, I'm concerned that a 25 MB file would often fail over a POST (the web server timing out before the file could be completely uploaded).
Thoughts?
AndrewG
EDIT: We can use any common web technology. We're on a shared host, which may make central configuration changes difficult. I'm familiar with PHP from a common-usage perspective, but not from a setup perspective (I've written lots of code, but haven't gotten into anything too heavy-duty). Ruby on Rails is also possible, but I would be starting from scratch. Ideally, I'm looking for a "web" way of doing it, as I'd like eventually to be ready to transition away from installed code.
Research scp and rsync.
One option is to have something running in the browser that breaks the upload into chunks, which would hopefully make it more reliable. A control that does this would also give the user some feedback as the upload progresses, which you wouldn't get with a simple HTTP POST.
A quick Google found this free Java applet, which does just that. There are lots of other free and paid options that do the same thing.
You probably mean an HTTP PUT. That should work like a charm if you have a decent web server, but as far as I know it is not restartable.
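For what it's worth, a streaming PUT is a one-liner in most HTTP clients; for example, in Python (the URL here is hypothetical):

```python
import requests

def put_file(path, url="https://example.com/uploads/data.bin"):  # hypothetical URL
    with open(path, "rb") as f:
        # Passing the file object streams the body from disk, so even
        # the 25 MB case never sits in memory all at once.
        requests.put(url, data=f, timeout=300).raise_for_status()
```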
FTP is the right choice (passive mode to get through firewalls). Use an FTP server that supports restartable transfers if you often face VPN connection breakdowns (hotel networks are soooo crappy :-) ).
The FTP command that must be supported is REST.
From http://www.nsftools.com/tips/RawFTP.htm:
Syntax: REST position
Sets the point at which a file transfer should start; useful for resuming interrupted transfers. For nonstructured files, this is simply a decimal number. This command must immediately precede a data transfer command (RETR or STOR only); i.e. it must come after any PORT or PASV command.
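If the client side is scriptable, resuming an interrupted STOR via REST looks like this in Python's ftplib (ftplib sends "REST <offset>" itself when storbinary() is given a rest= argument; the server must support REST with STOR, as described above):

```python
import ftplib

def resumable_upload(host, user, password, local_path, remote_name):
    ftp = ftplib.FTP(host)
    ftp.login(user, password)
    ftp.voidcmd("TYPE I")                    # SIZE is only reliable in binary mode
    try:
        offset = ftp.size(remote_name) or 0  # bytes that already made it across
    except ftplib.error_perm:
        offset = 0                           # no partial file on the server yet
    with open(local_path, "rb") as f:
        f.seek(offset)                       # skip what was already uploaded
        ftp.storbinary(f"STOR {remote_name}", f, rest=offset or None)
    ftp.quit()
```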