I'm writing software that uploads and downloads a number of files over FTP to a remote server. Download speeds are fine and stay consistent at upwards of 4 MB/s, and small uploads are instantaneous. The problem I'm experiencing is when my program uploads a large 40 MB zip file: I get extremely poor performance. It seems to upload in bursts (100-200 KB/s), then stall for a second, and repeat this until the file eventually finishes uploading. Programmatically downloading the file from the same server takes 30 seconds tops, and uploading the same file to the same server using FileZilla takes about the same amount of time. Uploading through my software can take up to 15 minutes. Something is clearly wrong.
I am using the Starksoft FTP library to handle uploads/downloads, from here: http://starksoftftps.codeplex.com/
Here is an example of the problematic code:
FtpClient ftp = new FtpClient(sourcecfg.Host);
// 0 = no limit; throttling is explicitly disabled in both directions.
ftp.MaxUploadSpeed = 0;
ftp.MaxDownloadSpeed = 0;
ftp.FileTransferType = TransferType.Binary;
ftp.DataTransferMode = TransferMode.Passive;
ftp.Open(sourcecfg.FtpUserName, sourcecfg.FtpPassword);
// Upload the local zip to the remote path (FileAction.Create creates the target file).
ftp.PutFile(backupTempPath, targetcfg.getFullPath() + "wordpress-backup.zip", FileAction.Create);
I've also tried using an overloaded version of PutFile that takes a Stream object instead of a path string. The results were unchanged.
Incidentals: I'm compiling in Visual C# Express 2008 on Windows XP inside a VirtualBox instance. I've tried both the debug and release executables with no change in results.
The issue feels like a buffering or throttling problem, but looking at the internal code of the FTP classes I don't see anything unusual, and I'm specifically setting it not to throttle. Any suggestions or comments about this particular FTP component library?
It might be interesting to know whether the FileZilla connection is an active-mode or a passive-mode connection. Another thing worth trying would be to run the transfer with an FTP client that shows the dialogue between the client and the server. I am not sure whether FileZilla will show you this information or not.
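As a quick sanity check outside the library, you could also upload the same file from the same machine with a command-line client and compare the speed; if that upload is fast, the problem is almost certainly in the component rather than the network. A minimal sketch with curl (the host, credentials, and file names are placeholders):
# Time an FTP upload of the same 40 MB file; --ftp-pasv matches the passive
# mode used in the code above (swap in "--ftp-port -" to test active mode).
time curl -T wordpress-backup.zip --user myuser:mypass \
     --ftp-pasv ftp://ftp.example.com/wordpress-backup.zip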
Related
I'm considering getting a web server in China to reduce site loading times for users in China. The problem is, how do I sync/keep the same data between the two sites? When content is edited on the site, the changes should be propagated to the site on the Chinese server.
The server is running Linux, Apache, and MySQL. The website uses WordPress.
FYI, I'm already using a CDN and the site loading speed is still too long from China.
Basically your solution would need to...
Copy the entire contents of your http'd directory from the main server to the Chinese server.
Copy the entire contents of your MySQL database from the main server to the Chinese server.
Perform these tasks at a regular interval without manual intervention.
I can guide you to references that will help with each task and sometimes can show you a quick example. However, if you want to get it to work and especially if you want to optimize the process, you're going to have to look through the references yourself.
If I didn't do it this way, this answer would get even more horrendously long than it already is.
Before we start you should remember...
Thing 0 - Please Try Not to be Intimidated by the Length of this Answer
I know I've written a lot, perhaps more than I should have, but I guarantee you are capable of implementing this in no more than a day. I have tried to be thorough but that does not mean that what I'm describing is particularly complicated.
Thing 1 - Shutdown your Chinese Server During Transfer
This transfer of data is going to make your Chinese server unusable while it's in progress, as you might have guessed. You need to make sure that your Chinese server is not operational during the transfer. Otherwise the server might have only partial data available, which could cause problems for both client and server, particularly in relation to MySQL.
Thing 2 - Use Compression as much as You Can
As time consuming as compression and decompression can be for large amounts of data, believe me it is nothing compared to the time you will waste sending the uncompressed data to China. Network usage, not processor time, is really going to be the limiting factor in getting the transfer done quickly. Try to send compressed files whenever possible.
Thing 3 - Try to Use Checksums
Sending all your data, particularly in compressed format, will leave it vulnerable to corruption in transit. Whenever you send a file I encourage you to use some kind of checksum on the data to verify that it has not been corrupted. For brevity I will not be showing you how to do this but I'm sure you're smart enough to figure out how to pepper in some verification.
In case you're not familiar with checksums, the Wikipedia article about them is pretty straightforward. The most commonly used are MD5 and SHA-1, but both of those are somewhat collision-prone. I would recommend SHA-2 (the SHA-256/512 family) or the very new SHA-3.
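For example, with the checksum tools that ship with most Linux distributions, the verification step could look like this (file names are placeholders matching the script below):
# On the main server, record a SHA-256 checksum next to the archive.
sha256sum copy.zip > copy.zip.sha256
# On the Chinese server, after the scp, verify the archive before unpacking;
# this prints "copy.zip: OK" (and exits 0) only if the file arrived intact.
sha256sum -c copy.zip.sha256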
Copying your Http'd Directory to the Chinese Server
As far as I know (and I could be wrong) there is no built in way to transfer files from one Apache server to another...so you're going to have to write your own script for this.
You're also going to need to have two separate scripts: one for the main server and one for the Chinese server. Here's a breakdown of what each script needs to do.
On your main server...
Log in as your Apache server's user. (Reference for switching users.)
zip/gzip/tar.gz your http'd directory's contents. (Reference for zip. Reference for gzip. Reference for tar.)
scp (secure copy) the compressed file to your Chinese server. Make sure to copy it to the username that Apache runs under. (Reference for scp.)
Delete the compressed file.
Initiate the Chinese server's script (this will be discussed later).
You will likely be using a shell script for all of this, so I hope you're familiar with the terminal. A simple example would look like this.
#!/bin/sh
## First I'll define some variables to explain this better.
APACHE_USER="whatever your Apache server's username is (usually it's www-data)";
WWW_DIR="your http'd directory (usually it's /var/www)";
CHINA_HOST="the host name/IP address of your Chinese server";
CHINA_USER="Apache's username on the Chinese server";
CHINA_PWD="Apache's user password on the Chinese server";
CHINA_HOME="the home directory of the Apache user on your Chinese server";
## Now to the real scripting. I will be using zip for compression.
## Run the whole script as the Apache user (e.g. sudo -u "$APACHE_USER" ./sync.sh);
## a bare `su` inside the script would start a new shell, and the commands
## below would not run inside it.
zip -r copy.zip "$WWW_DIR";
## scp will not read a password from stdin, so either set up SSH keys
## (recommended) or use sshpass as shown here.
sshpass -p "$CHINA_PWD" scp copy.zip "$CHINA_USER@$CHINA_HOST:$CHINA_HOME";
rm copy.zip;
## Then you initiate the next step of the process.
## Like I said this will be covered later.
On your Chinese server...
Log in as the Apache user.
Delete the content of the http'd directory (probably /var/www relative to ~).
Decompress the scp'd file (this will change depending on how you compressed it).
Copy the decompressed directory to the http'd directory (this step is unnecessary if you choose to compress with zip).
Delete the compressed, scp'd file.
Notify main server to continue next step (again, will be discussed later).
This is pretty straightforward, but for completeness here's a minimal sketch of what it could look like, assuming zip was used on the main server and the archive lands in the Apache user's home directory:
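#!/bin/sh
## Minimal sketch for the Chinese server; run it as the Apache user.
## Paths here are assumptions and should match the main-server script.
WWW_DIR="/var/www";
ARCHIVE="$HOME/copy.zip";
STAGING="$HOME/staging";
## Unpack the scp'd archive into a staging directory first.
rm -rf "$STAGING"; mkdir "$STAGING";
unzip -q "$ARCHIVE" -d "$STAGING";
## zip stores paths without the leading slash, so the content ends up
## under $STAGING/var/www; adjust this if your WWW_DIR differs.
rm -rf "$WWW_DIR"/*;
cp -a "$STAGING/var/www/." "$WWW_DIR/";
## Delete the compressed, scp'd file and the staging copy.
rm -f "$ARCHIVE"; rm -rf "$STAGING";
## Finally, notify the main server to continue (covered later).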
Copying the MySQL Database Contents
You can find a good reference for how to do this in this article from the MySQL website. Basically, copying database contents is a built-in feature. Try to make use of the compression options!
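For example, a simple dump-and-restore could look like this (a sketch; the database name, credentials, and paths are placeholders, and MySQL replication is the more incremental alternative if nightly dumps turn out to be too coarse):
# On the main server: dump the WordPress database and compress it.
mysqldump --single-transaction -u root -p'MAIN_DB_PASSWORD' wordpress \
    | gzip > wordpress.sql.gz
# Copy it over (same scp/sshpass idea as in the web-content script).
sshpass -p "$CHINA_PWD" scp wordpress.sql.gz "$CHINA_USER@$CHINA_HOST:$CHINA_HOME";
# On the Chinese server: load the dump into the local MySQL instance.
gunzip -c wordpress.sql.gz | mysql -u root -p'CHINA_DB_PASSWORD' wordpress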
Performing these Tasks at Regular Intervals without Manual Intervention
Ok this is where things get kind of complicated.
The first thing you need to know is how to schedule tasks at regular intervals on Linux. This is done with a command line tool called crontab. You can see good examples for setting up cron jobs in this article, and the full crontab documentation here.
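For example, to run the main-server sync script every night at 03:00 (the script path is just a placeholder), the Apache user's crontab entry would look like this:
# Edit the crontab with `crontab -e` and add:
# minute hour day-of-month month day-of-week   command
0 3 * * * /home/www-data/sync_to_china.sh >> "$HOME/sync_to_china.log" 2>&1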
However what will take more skill than just scheduling the job at regular intervals will be synchronizing the data transfer. If you simply set one server to send data at a certain time and the other to receive it at a certain time, you will get many bugs. Be sure of that.
My recommendation would be to create a socket in the Chinese server that listens for instructions from the main server.
This can be done in a variety of languages. Because you're using Linux I would recommend doing this in C, but it can be done in almost any language including Bash.
A full example would be too much, but this is basically the flow of what you have to do (a rough listener sketch follows after the list).
Socket in China listens for connections.
Cron job in main server connects to China socket.
Main server authenticates itself.
Chinese server stops Apache, stops accepting requests.
Chinese server acknowledges authentication approved.
Main server scp's website contents to Chinese server.
Main server tells Chinese server that scp is complete.
Chinese server replaces Apache's http'd directory's contents with the data that has been scp'd.
Chinese server announces success to main server.
Main server copies MySQL data.
Main server tells Chinese server process is complete.
Chinese server resumes Apache service.
Chinese server notifies main server that service is resumed.
Socket is closed.
Chinese server goes back to listening for connection from main server.
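To make the flow above concrete, here is a very rough Bash sketch of the listener on the Chinese server using netcat. It only shows the shape of the protocol: the token check stands in for real authentication, each command arrives on its own connection (a simplification of the single-connection flow above), netcat flags vary between implementations, the service commands assume a Debian-style system, and the helper script path is made up. In practice you would run this over SSH or TLS.
#!/bin/sh
## Rough sketch of the Chinese server's listener (not production code).
PORT=9999;
TOKEN="shared-secret-known-to-both-servers";
while true; do
    ## Wait for a single command line from the main server.
    ## (GNU netcat syntax; BSD netcat would drop the -p.)
    CMD=$(nc -l -p "$PORT");
    case "$CMD" in
        "BEGIN $TOKEN")
            ## Stop serving requests while the data is being replaced.
            service apache2 stop ;;
        "SCP_DONE $TOKEN")
            ## The main server finished copying; swap in the new content
            ## (this is where the Chinese-server script from earlier runs).
            /home/www-data/receive_site.sh ;;
        "ALL_DONE $TOKEN")
            ## Files and MySQL data are in place; resume serving.
            service apache2 start ;;
    esac
done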
I hope this helps!
Current implementation-
Divide the original file into files equal to the number of servers.
Ensure each server picks one file for processing.
Each server splits the file into 90 buckets.
Use ForkManager to fork 90 processes, each operating on a bucket.
The child processes will make the API calls.
Merge the output of child processes.
Merge the output of each server. (A rough sketch of the split and merge plumbing is below.)
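For reference, the splitting and merging are done with standard tools, roughly along these lines (file names are illustrative, and the forked workers that actually make the API calls are not shown):
# Split this server's slice into 90 line-based buckets with GNU split
# (-n l/90 keeps lines intact; -d -a 2 gives numeric suffixes bucket_00..bucket_89).
split -n l/90 -d -a 2 users_this_server.txt bucket_
# Once every forked worker has written its bucket_NN.out file,
# merging is just concatenation.
cat bucket_*.out > output_this_server.txt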
Stats-
The size of the content downloaded using the API call is 40KB.
On 2 servers, the above process for a 225k-user file runs in 15 minutes. My aim is to finish a 10-million-user file in 30 minutes. (Hope this doesn't sound absurd!)
I contemplated using BerkeleyDB but couldn't find out how to convert a BerkeleyDB file into a normal ASCII file.
This sounds like a one-time operation to me. Although I don't understand the 30 minute limit, I have a few suggestions I know from experience.
First of all, as I said in my comment, your bottleneck will not be reading the data from your files. It will also not be writing the results back to a hard drive. The bottleneck will be the transfer between your machines and the remote machines. Your setup sounds sophisticated, but that might not help you in this situation.
If you are hitting a web service, someone is running that service, and servers can only handle a certain amount of load. I have brought down the dev environment servers of a big logistics company with a very small load test I ran at night. Often these things are equipped for long-term load, but not for short, heavy load.
Since IT is all about talking to each other through various protocols, like web services or other APIs, you should also consider just talking to the people who run this service. If you have a business relationship, that is easy. If not, try to find a way to reach them and ask whether their service can handle that many requests at all. You could end up with them excluding you permanently, because to their admins it looks like you tried to DDoS them.
I'd ask them if you could send them the files (or an excerpt of the data, cut down to what is relevant for processing) so they can do the operations in batch on their side. That way, you remove the load for processing everything as web requests, and the time it takes to do these requests.
The application is simple: an HTML form that posts to a Perl script. The problem is we sometimes have our customers upload very large files (> 500 MB) and their internet connections can be unreliable at times.
Is there any way to resume a failed transfer like in WinSCP or is this something that can't be done without support for it in the client?
AFAIK, it must be supported by the client. Basically, the client and the server need to negotiate which parts of the file (likely defined as parts in "multipart/form-data" POST) have already been uploaded, and then the server code needs to be able to merge newly uploaded data with existing one.
The best solution is to have custom uploader code, usually implemented in Java, though I think this may be possible in Flash as well. You might even be able to do this via JavaScript - see the two links with examples below.
Here's an example of how Google did it with YouTube: http://code.google.com/apis/youtube/2.0/developers_guide_protocol_resumable_uploads.html
It uses a "308 Resume Incomplete" HTTP response, which sends a "Range: bytes=0-408" header from the server to indicate what was already uploaded.
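To make that concrete, the client side of such a handshake might look roughly like this with curl (the URL and session id are made up; the real endpoint, headers, and status codes come from whichever protocol you implement, e.g. the YouTube one above):
FILE=big_upload.dat
SIZE=$(stat -c%s "$FILE")   # total size in bytes (GNU stat syntax)
URL="https://example.com/resumable/session-123"
# 1. Ask the server how much of the file it already has.
curl -i -X PUT -H "Content-Length: 0" -H "Content-Range: bytes */$SIZE" "$URL"
# A "308 Resume Incomplete" reply with "Range: bytes=0-408" means bytes 0..408
# arrived, so the upload should resume at byte 409.
# 2. Send only the remainder, starting at the first missing byte.
tail -c +410 "$FILE" > remainder.bin
curl -i -X PUT -H "Content-Range: bytes 409-$((SIZE-1))/$SIZE" \
     --data-binary @remainder.bin "$URL"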
For additional ideas on the topic:
http://code.google.com/p/gears/wiki/ResumableHttpRequestsProposal
Someone implemented this using Google Gears on the client side and PHP on the server side (the latter you can easily port to Perl)
http://michaelshadle.com/2008/11/26/updates-on-the-http-file-upload-front/
http://michaelshadle.com/2008/12/03/updates-on-the-http-file-upload-front-part-2/
It's a shame that your clients can't upload over FTP, since FTP already supports resuming. There is also "chunked transfer encoding" in HTTP. I don't know which Perl modules might support it already.
I need to allow users to download multiple images in a single download. This download will include an SQL file and images. Once the download completes, the SQL will execute, inserting text into an SQLite database. This text will include references to the downloaded images. The text and images are rendered in a UIWebView.
What is the best way to download everything in a single download? I was thinking to use a bundle since it can be loaded at runtime but not sure of any limitations/restrictions in this scenario. I have tested putting the bundle into the Documents folder and then accessing resources inside of it. That seems to work fine in a simple test.
You're downloading everything through a socket, which only knows about bytes, so a bundle, or even a file, doesn't "naturally" transfer through: the server side opens files, encodes them, and sends them into the connection; the client reads from the socket and reconstructs the original file structure.
Assuming the application has a UI for picking which items need to be transferred, it could then request all the items from the server, and the server could send them through the single connection with some delimitation you invent, so that the iPhone app can split the stream back into the individual files.
Another option is that the client could just perform individual HTTP requests for the different files, through pretty straightforward use of NSURLConnection.
The former sounds like an attempt to optimize the latter. Have you already tested and verified that the latter is too slow/inefficient? It definitely is more complex to implement.
There is a latency issue with multiple HTTP connections run in sequence, but you can mitigate it by running several downloads in parallel -- for example through an NSOperationQueue with a limit of 2 to 5 concurrent download operations.
I inherited a legacy Perl script from an old server which is being removed. The script needs to be implemented on a new server. I've got it on the new server.
The script is pretty simple; it connects via expect & ssh to network devices and gathers data. For debugging purposes, I'm only working with the portion that gathers a list of the interfaces from the device.
The script on the new server always shows me a page within about 5 seconds of reloading it. Rarely, it includes the list of interfaces from the remote device. Most commonly, it contains all the HTML elements except the list of interfaces.
Now, on the old server, sometimes the script would take 20 seconds to output the data. That was fine.
Based on this, it seems that apache on the new server is displaying the data before the Perl script has finished returning its data, though that could certainly be incorrect.
Additional Information:
Unfortunately I cannot post any code - work policy. However, I'm pretty sure it's not a problem with expect. The expect portions are written as expect() or die('error msg') and I do not see the error messages. However, if I set the expect timeout to 0, then I do see the error messages.
The expect timeout value used in the script is normally 20 seconds... but as I mentioned above, Apache displays the static content from the script after about 5 seconds, and 95% of the time does not display the content that should be retrieved via expect. Additionally, the script writes the expect content to a file on the drive - even when the page does not display it.
I just added my Troubleshooting Perl CGI scripts guide to Stackoverflow. :)
You might try CGI::Inspect. I haven't needed to try it myself, but I saw it demonstrated at YAPC, and it looked awesome.