SFTP file uploading and downloading at the same time - perl

A cronjob runs every 3 hours to download a file using SFTP. The scheduled program is written in Perl and the module used is Net::SFTP::Foreign.
Can Net::SFTP::Foreign end up downloading a file that has only been partially uploaded over SFTP?
If so, do we need to check the file's modified date to confirm that the copy has completed?
Suppose someone is uploading a new file over SFTP and the upload/copy is still in progress. If a download is attempted at the same time, do I need to code for the possibility of fetching only part of the file?

It's not a question of the SFTP client you use; that is irrelevant. What matters is how the SFTP server handles the situation.
Some SFTP servers may lock the file being uploaded, preventing you from accessing it while the upload is still in progress. But most SFTP servers, notably the common OpenSSH SFTP server, won't lock the file.
There's no generic solution to this problem. Checking for timestamp or size changes may work for you, but it's hardly reliable.
There are some common workarounds to the problem:
Have the uploader upload a "done" file once the upload finishes, and make your program wait for the "done" file to appear (a sketch of this approach follows the list).
You can have a dedicated "upload" folder and have the uploader (atomically) move the uploaded file to a "done" folder; make your program look in the "done" folder only.
Have a file naming convention for files being uploaded (e.g. ".filepart") and have the uploader (atomically) rename the file to its final name once the upload finishes; make your program ignore the ".filepart" files.
See (my) article Locking files while uploading / Upload to temporary file name for an example of implementing this approach.
Also, some FTP servers have this functionality built in. For example, ProFTPD with its HiddenStores directive.
A gross hack is to periodically check the file attributes (size and time) and consider the upload finished if the attributes have not changed for some time interval.
You can also make use of the fact that some file formats have a clear end-of-file marker (like XML or ZIP), so you can tell when you have downloaded an incomplete file.
For details, see my answer to SFTP file lock mechanism.
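
For illustration, here is a minimal Perl sketch of the first workaround (waiting for a "done" marker) using Net::SFTP::Foreign, the module from the question; the host, credentials, and file names are placeholders:

use strict;
use warnings;
use Net::SFTP::Foreign;

# Placeholder connection details.
my $sftp = Net::SFTP::Foreign->new('sftp.example.com',
                                   user => 'me', password => 'secret');
$sftp->die_on_error("SFTP connection failed");

# Block until the uploader has created the "done" marker file.
sleep 30 until $sftp->stat('/incoming/data.csv.done');

# The marker exists, so the data file is complete and safe to fetch.
$sftp->get('/incoming/data.csv', 'data.csv')
    or die "download failed: " . $sftp->error;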

The easiest way to do that, when the upload process is also under your control, is to upload the files using temporary names (for instance, foo-20170809.tgz.temp) and, once the upload finishes, rename them (the Net::SFTP::Foreign put method supports the atomic option, which does just that). Then, on the download side, filter out the files whose names match the temporary pattern.
In any case, the Net::SFTP::Foreign get and rget methods can be instructed to resume a transfer by passing the option resume => 1.
Also, if you have full SSH access to the SFTP server, you could check whether some other process is still writing to the file to be downloaded using fuser or some similar tool (though note that even then the file may be incomplete if, for instance, there is some network issue and the uploader needs to reconnect before resuming the transfer).
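
A minimal sketch of both sides of this approach (host, credentials, and paths are made up; the .temp suffix is the convention from the answer):

use strict;
use warnings;
use Net::SFTP::Foreign;

my $sftp = Net::SFTP::Foreign->new('sftp.example.com',
                                   user => 'me', password => 'secret');
$sftp->die_on_error("SFTP connection failed");

# Uploader side: put() writes to a temporary name and renames it
# to the final name only after the transfer completes.
$sftp->put('foo-20170809.tgz', '/drop/foo-20170809.tgz', atomic => 1)
    or die "upload failed: " . $sftp->error;

# Downloader side: skip temporary files, resume partial transfers.
my $files = $sftp->ls('/drop',
                      wanted => sub { $_[1]->{filename} !~ /\.temp$/ });
for my $entry (@$files) {
    my $name = $entry->{filename};
    $sftp->get("/drop/$name", $name, resume => 1)
        or warn "download of $name failed: " . $sftp->error;
}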

You can check the size of the file:
1. Connect to the SFTP server.
2. Check the file size.
3. Sleep for 5-10 seconds.
4. Check the file size again.
5. If the size did not change, download the file; if it changed, go back to step 3.
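
A sketch of that loop with Net::SFTP::Foreign (host, credentials, and path are placeholders):

use strict;
use warnings;
use Net::SFTP::Foreign;

my $sftp = Net::SFTP::Foreign->new('sftp.example.com',
                                   user => 'me', password => 'secret');
$sftp->die_on_error("SFTP connection failed");

my $remote = '/incoming/data.csv';
my $attr   = $sftp->stat($remote)
    or die "cannot stat $remote: " . $sftp->error;
my $size = $attr->size;

# Poll until the size stops changing between two checks.
while (1) {
    sleep 10;
    my $new_size = $sftp->stat($remote)->size;
    last if $new_size == $size;   # stable, assume the upload finished
    $size = $new_size;
}

$sftp->get($remote, 'data.csv')
    or die "download failed: " . $sftp->error;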

Related

Scripted FTP Upload from Container

I am trying to upload a file from a container field to a location on an FTP server from a server-side script. I have been trying to use the Base Elements BE_FTP_Upload function, as I'm led to believe this works in a server script, but I simply cannot get it to work: the file shows up on the FTP server, but it's always blank, missing the content.
I should also add that the BE_Curl_Trace feedback shows a successful connection to the FTP server, so it seems to be my method of moving the file rather than a bad connection. Script attached. (Excuse the squiggles, data protection and what not.)
After all of this, simply changing "filewin:" to "file:" solved my problem. I am now exporting from FM to FTP via a scheduled server script :)
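
For reference, the change amounts to swapping the FileMaker path prefix (the drive letter and file name here are made up):

filewin:/C:/exports/export.csv   <- failed in the server-side script
file:/C:/exports/export.csv      <- works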

Talend: Using tfilelist to access files from a shared network path

I have a Talend job that searches a directory and then uploads the files it finds to our database.
It's something like this: dbconnection>twaitforfile>tfilelist>fileschema>tmap>db
I have a SubjobOK trigger that then commits the data into the table, iterates through the directory, and moves the files to another folder.
Recently I was instructed to change the directory to a shared network path, using the same components as before (I originally thought of changing components to tftpfilelist, etc.).
My question is how to direct it to the shared network path. I was able to get it to go through using double backslashes (\\), but it won't read any of the new files arriving.
Thanks!
I suppose that if you use tWaitForFile on the local filesystem, Talend/Java will hook into the folder somehow and get a message when a new file is put into it.
Now, since you are on a network drive, this is out of the component's reach for a start. Second, the OS behind the network drive could be different.
I understand your job is running all the time, listening. You could change the behaviour by putting a tLoop first, which would check the file system for new files and then proceed; there has to be some delta check by which the new files get recognized (a sketch of such a check follows).
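
The delta check itself can be as simple as comparing modification times against the time of the previous run. As a language-agnostic illustration (sketched here in Perl, with a made-up share path and state file):

use strict;
use warnings;

# Hypothetical network share and state file.
my $dir   = '//fileserver/share/incoming';
my $state = 'last_run.txt';

# Timestamp of the previous run (0 on the first run).
my $last_run = 0;
if (open my $fh, '<', $state) { $last_run = <$fh> // 0 }

# Pick up files modified since the previous run.
opendir my $dh, $dir or die "cannot open $dir: $!";
my @new_files = grep { -f "$dir/$_" && (stat _)[9] > $last_run }
                readdir $dh;
closedir $dh;

print "new file: $_\n" for @new_files;

# Remember when this run happened.
open my $out, '>', $state or die "cannot write $state: $!";
print {$out} time;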

Is it possible to keep a local mirror of an FTP site with aria2?

I have a website to which I have FTP access only (otherwise I'd use rsync for this) and I'd like to keep a local copy of it. At the moment I run the following wget command every once in a while:
wget -m --ftp-user=me --ftp-password=secret ftp://my.server.com
When there are many updates it does get tedious, with wget having only one connection at a time. I read about aria2, but couldn't find any hints as to whether it would be possible to use aria2 as a replacement for this purpose.
No. According to the aria2 docs, the option for downloading only newer files works only with HTTP(S):
--conditional-get[=true|false]
Download file only when the local file is older than remote file. This function only works with HTTP(S) downloads only. It does not work if file size is specified in Metalink.

Is it possible to have nginx stream a file for download that is currently being written to?

So I have two services running: one transcodes a file in realtime (ffmpeg), and another exposes it over HTTP (nginx). The problem I currently have is that when ffmpeg begins transcoding and I access the file through nginx, only a portion of the written bytes gets downloaded.
My question: is it possible to configure nginx in such a way that it streams the file currently being written to until the writing finishes and I end up with the complete file on my local computer?
Thank you
I don't believe nginx by itself can do this. You would need an application (PHP, Perl, Python, whatever) that can monitor the transcoding progress and serve the request with chunked transfer encoding. Essentially, it would keep the connection open to the client and deliver more data as it becomes available.
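
A bare-bones sketch of that delivery loop (in Perl for consistency with the rest of this page; the file path and the "done" marker are assumptions, and the actual HTTP chunked framing is left to the app server):

use strict;
use warnings;

# Hypothetical file being written by ffmpeg, plus a marker file
# the transcoder creates when it is finished.
my $path = '/tmp/output.mp4';
my $done = '/tmp/output.mp4.done';

open my $fh, '<:raw', $path or die "cannot open $path: $!";
my $buf;
while (1) {
    # Forward whatever has been written so far.
    print $buf while read $fh, $buf, 64 * 1024;
    last if -e $done;   # writer finished and we have drained the file
    sleep 1;            # wait for more data
    seek $fh, 0, 1;     # no-op seek clears the EOF flag
}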
I had a very similar issue. I don't know how to begin streaming while ffmpeg is still transcoding the file, but here was my fix:
I had a PHP script that made the system calls (though you could write this in pretty much any language). The script had ffmpeg write to a temporary file. Before calling ffmpeg, it checked whether the temporary file existed, to avoid concurrency issues; if it did, the script waited until the real file existed.
Once the file was done converting, the script renamed the temporary file to the real file and redirected the HTTP request to the transcoded file.
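
A sketch of that wrapper in Perl (the answer used PHP; the paths and the ffmpeg invocation are made up):

use strict;
use warnings;

my $dir   = '/var/www/media';
my $final = "$dir/video.mp4";
my $tmp   = "$dir/tmp-video.mp4";   # temporary name, same extension

if (-e $tmp) {
    # Another request is already transcoding; wait for the real file.
    sleep 5 until -e $final;
}
elsif (!-e $final) {
    # Transcode into the temporary name, then rename atomically.
    system('ffmpeg', '-i', "$dir/source.avi", $tmp) == 0
        or die "ffmpeg failed: $?";
    rename $tmp, $final or die "rename failed: $!";
}

# $final is now complete; a web script would redirect the client to it.
print "Location: /media/video.mp4\n\n";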

What is the best way to transfer files between servers constantly?

I need to transfer files between servers ... not one specific move, but continuous moves. What is the best way?
scp, ftp, rsync ... other?
Of course, if it's a "common" method (like FTP), I would lock it down to work only between the servers' IPs.
I also need a SECURE way to transfer files ... I mean, being totally sure that the files have been moved successfully.
Does rsync have a way to know that the files were moved successfully? Maybe an option to check size or checksum or whatever.
Note: the servers are in different locations.
Try rsync, a utility that keeps copies of files synchronized between two computer systems. It verifies each file it transfers with a checksum, which addresses the requirement of knowing that the files arrived intact.
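
For instance (host and paths are placeholders), a cron-driven call like the one below pushes new and changed files over SSH, compares files by checksum, and deletes each source file only after it has been transferred successfully, which gives you a confirmed "move":

rsync -avz --checksum --remove-source-files -e ssh /data/outgoing/ user@remote.example.com:/data/incoming/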