Looking for command line ftp client (linux) - command-line

I am looking to batch download a large number of files (>800). I have a text file with the list of all the filenames, and these filenames are used to derive the URL from which each file can be downloaded. I had been parsing the file with a Python script, using subprocess to run wget on each file:
wget ftp://ftp.name.of.site/filename-prefix/filename/filename+suffix
However, for reasons unknown to me, wget is failing to connect properly. I would like to know whether I could use an FTP program that works in a similar manner, i.e. with no login and staying within the command line.
Edit:
What's in my text file:
ERS032033
ERS032214
ERS032234
ERS032223
ERS032218
The ERS### part acts as the prefix, and the whole string is the filename. The final file (i.e. filename+suffix) would look something like ERS032033_1.fastq.gz.
Submitting the correct url is not the problem.

Since you are using Python, I suggest dropping the subprocess approach and using the urllib module instead:
import urllib
handle = urllib.urlopen('ftp://ftp.name.of.site/filename-prefix/filename/filename+suffix')
print handle.read()
handle.close()
This assumes you are using Python 2 (in Python 3, use urllib.request instead).
If you simply need to batch download, urllib.urlretrieve is a cleaner approach, since it saves the remote file straight to a local filename.
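For Python 3, a minimal sketch of the whole batch job with urllib.request.urlretrieve might look like this. It reuses the placeholder host from the question, takes the list file path from the command line, and assumes the prefix is the first six characters of each name (e.g. ERS032, a guess based on the ERS### pattern); BASE_URL, PREFIX_LEN and SUFFIX are hypothetical settings to adjust to the real layout.

```python
import sys
import urllib.request

# Hypothetical settings: the host is the placeholder from the question, and
# taking the first six characters as the prefix (e.g. "ERS032") is an assumption.
BASE_URL = "ftp://ftp.name.of.site"
PREFIX_LEN = 6
SUFFIX = "_1.fastq.gz"


def build_url(name):
    """Derive <base>/<prefix>/<name>/<name><suffix> from one line of the list."""
    prefix = name[:PREFIX_LEN]
    return "%s/%s/%s/%s%s" % (BASE_URL, prefix, name, name, SUFFIX)


def read_names(path):
    """Return the non-blank lines of the list file, stripped of whitespace."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def download_all(list_path):
    """Fetch every file in the list; report failures instead of stopping."""
    failed = []
    for name in read_names(list_path):
        url = build_url(name)
        try:
            # urlretrieve copies the remote file straight to a local filename.
            urllib.request.urlretrieve(url, name + SUFFIX)
        except OSError as exc:  # urllib.error.URLError is a subclass of OSError
            print("failed: %s (%s)" % (url, exc), file=sys.stderr)
            failed.append(name)
    return failed


if __name__ == "__main__" and len(sys.argv) > 1:
    for name in download_all(sys.argv[1]):
        print("retry:", name)
```

Run it as, for example, python fetch_list.py filenames.txt (both names hypothetical). Catching the error per file means one bad URL does not abort the other 800 downloads, and the printed list tells you which ones to retry.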

Related

Aborting wget on file found (Windows)

I am trying to use wget for Windows (on Windows 7) to find and download a file that I don't know the full name of (I have a partial name, and I know the form of the unknown part of the name). I am using an input file with a list of the possible file names, and I want to abort wget when the file is found (the rest of the possibilities will give 404 errors). How can I cause wget to abort automatically when that one file is found?

Analyzing unzipped files on PythonAnywhere

I have Python code that runs perfectly well on my computer, except that my laptop is very slow. I want to upload the code and run it on PythonAnywhere. (I'm not even sure this is the best-suited resource, but it is relatively easy to use!)
I have successfully uploaded some files and my code to my home directory, and the code runs OK. But since I have many files to analyze, I uploaded a zip file, successfully unzipped it (to home/myname/part1) and saved my Python code to the same directory. When I try to run my code in that directory, however, it does not work. The code does not return any error; it analyzes 2 blank files instead of the 100 or so I uploaded, and then it stops and exits as if the job were done.
Any ideas why? TIA!
If that matters, here is a condensed version of the code ("analysis" is not a command, but stands for more lines of analysis, all of which run fine on my computer):
import csv, re
from string import punctuation
import glob, io
csvfile=open("test.csv", "w", newline='', encoding='cp850', errors='replace')
writer=csv.writer(csvfile)
for filename in glob.glob('*.txt'):
    ###Open files and arrange them so that they are ready for pre-processing
    with open(filename, encoding='utf-8', errors='ignore') as f:
        analysis
        output=zip(file1, file_name, more_date)
        writer=csv.writer(open('test.csv','a',newline='', encoding='cp850', errors='replace'))
        writer.writerows(output)
        csvfile.flush()
My guess is that you're not processing the files you think you're processing. Print the result of your glob so that you can verify that you're working in the directory you think you are.
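One way to follow that advice is a small helper like the sketch below (not part of the question's code): it prints the current working directory and exactly what the glob matched, and it resolves the pattern against an explicit directory rather than relying on wherever the process happens to be started.

```python
import glob
import os


def find_txt_files(directory):
    """Return the .txt files in `directory` as sorted absolute paths.

    A bare glob.glob('*.txt') is resolved against the current working
    directory, which is not necessarily the folder the script is saved in.
    """
    pattern = os.path.join(os.path.abspath(directory), "*.txt")
    return sorted(glob.glob(pattern))


def report(directory):
    """Print where the process is running and what the glob actually matched."""
    files = find_txt_files(directory)
    print("working directory:", os.getcwd())
    print("searching in:", os.path.abspath(directory))
    print("matched %d .txt files" % len(files))
    for path in files:
        print("  ", path)
    return files
```

In the question's script, looping over find_txt_files(os.path.dirname(os.path.abspath(__file__))) instead of glob.glob('*.txt') would process the files saved next to the script, regardless of which directory it is launched from.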

file for saving cookie data not found when using HTTP::Cookies in Perl script

Hi all. I have some questions about the Perl module HTTP::Cookies. The example on CPAN looks like this:
$cookie_jar = HTTP::Cookies->new( file => '$ENV{\'HOME\'}/lwp_cookies.dat', autosave => 1);
As I understand it, the lwp_cookies.dat file is used to save cookie data on my local machine. On my machine, '$ENV{\'HOME\'}' is an empty path. The script runs fine, but even after execution I can't find any file named "lwp_cookies.dat" on my machine. I changed '$ENV{\'HOME\'}' to '$ENV{\'TMP\'}', which is a path that really exists (I verified this with a Perl print). Still, I can't find "lwp_cookies.dat" in my TEMP folder. My first question is: how does HTTP::Cookies work with the "lwp_cookies.dat" file?
On the other hand, on one of my systems (all the systems mentioned here are Windows), the same code produces the error message below:
Can't open $ENV{'HOME'}/lwp_cookies.dat: No such file or directory
This seems strange to me. On my good system the script runs well even though the file or path does not exist, so I suppose the file is created somewhere in temporary memory instead; on the bad system, the code example doesn't work at all.
If you want the $ENV{'HOME'} variable to interpolate into the string, you need double quotes; single quotes don't interpolate variables. Your error message shows this: the path it could not open is the literal text $ENV{'HOME'}/lwp_cookies.dat, not your home directory.
`file => "$ENV{'HOME'}/lwp_cookies.dat",`

compare file size after ftp get with the original file on server

In SQL I'm using xp_cmdShell to run FTP commands. I have no problem getting the list of files or copying files to the local server, but I want to compare copied file size to the original to make sure the get has been successful.
Any ideas on how to compare file sizes?
From a command prompt you can use the DOS File Compare command (fc). There is no file size comparison, but in your case a binary compare (fc /b) should work.
Most DOS commands return an exit code that lets you know the status.
http://www.computerhope.com/fchlp.htm
EDIT
Sorry, I reread your question and realized you want to compare it against a file on the FTP server. I think this is a moot point, since if ftp reports a successful file transfer there is no reason to compare (unless your source of comparison is not the FTP site). Does that make sense?
What you could do is list the remote file with the FTP dir command:
ftp> dir <filename>
where ftp> is the ftp prompt and not part of the command. This gives you the file's details, including its size in bytes (plain ls usually prints only the name). Then you need to use a DOS command to get the size of the local file; here is a Stack Overflow question (and answer) about that:
Windows command for file size only?
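If you can run a Python script through xp_cmdshell instead of the interactive ftp client (an assumption about your setup), the whole check can be sketched with the standard ftplib module, whose FTP.size() sends the SIZE command for you. SIZE is an extension (RFC 3659) that not every server supports, in which case you get None and should fall back to parsing a dir listing.

```python
import ftplib
import os


def remote_size(host, remote_path, user="anonymous", passwd=""):
    """Ask the server for a file's size in bytes; None if it gives no answer."""
    with ftplib.FTP(host) as ftp:
        ftp.login(user, passwd)
        ftp.voidcmd("TYPE I")  # many servers only answer SIZE in binary mode
        return ftp.size(remote_path)


def sizes_match(expected_size, local_path):
    """True only if the local file exists and has exactly expected_size bytes."""
    if expected_size is None or not os.path.exists(local_path):
        return False
    return os.path.getsize(local_path) == expected_size
```

Usage, with a hypothetical host and paths: sizes_match(remote_size("ftp.example.com", "outgoing/data.csv"), r"C:\ftp\data.csv").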

How can I resume downloads in Perl?

I have a project that depends on some other binaries being downloaded from the web at install time. For this, what I do is:
if ( file-present-in-src/)
# skip that file
else
# use wget to download the file
The problem with this approach is that when I interrupt a download in the middle and invoke the script again, the partially downloaded file is also skipped (which is not desired). I also want wget to resume the download of the partially downloaded file.
How should I go about it?
Possible solutions I could think of:
Download the file to a temporary file, say download_tmp, and move it to the original filename if successful.
Handle $SIG{'INT'} to write proper cleanup code.
But neither of these would help resume a partial download.
Any insights?
First, I don't understand what this has to do with Perl, since you're using wget to do the downloading. You could use libwww-perl (perldoc LWP) and have more control over the download process.
Then I second your idea of downloading to a "tmp" filename and move the file on success.
However, I think you need to go further and verify the integrity of the files. Computing an MD5 or SHA hash is very easy; compare the hash of the downloaded file with the one you're expecting. You can keep a short file on the server containing the checksum (filename.md5), and determine success only when you have a match.
Note that catching all the signals and generally trying to make the process unkillable, and then expecting it to have worked, is bound to fail at one point or another. There could be a network timeout, a crash, a power failure, a configuration problem on the server... You should instead assume downloads can fail, because they will, and write your process so that it can recover.
Finally, you're not telling us what kind of binaries you're downloading and what you're doing with them. Since you use wget, I'm going to assume you're on Unix; you should consider using RPM+Yum or the like, since they handle all this for you. RPMs are easy to write, really.
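The checksum step can be sketched in Python with hashlib (in Perl, Digest::MD5 fills the same role). This assumes the .md5 file uses the md5sum output format, i.e. the hex digest first, then whitespace and the filename; the file names in the sketch are hypothetical.

```python
import hashlib


def file_md5(path, chunk_size=64 * 1024):
    """Hex MD5 digest of a file, read in chunks so large binaries are fine."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_md5(md5_path):
    """First field of an md5sum-style line: '<hexdigest>  <filename>'."""
    with open(md5_path) as f:
        return f.read().split()[0].lower()


def verify(path, md5_path):
    """Treat the download as successful only when the digests match."""
    return file_md5(path) == expected_md5(md5_path)
```

Swapping hashlib.md5 for hashlib.sha256 gives the SHA variant with no other changes.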
Use your first approach:
download to "FileName".tmp
move "FileName".tmp to "FileName" (move! not copy)
once a day, clean out all .tmp files (paranoia rules)
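The temp-file-then-move pattern can be sketched in Python as below. Here fetch is a hypothetical stand-in for whatever does the actual transfer (wget via subprocess, LWP, urllib...); the sketch only shows the pattern itself. os.replace performs a rename, and within one filesystem that rename is atomic on POSIX, which is why a move beats a copy: the final filename either does not exist or holds a complete file.

```python
import os


def download_atomically(fetch, dest):
    """Write via dest + '.tmp' and rename only after fetch() returns normally.

    `fetch(path)` is a hypothetical callable that writes the whole file to
    `path` and raises on failure. An interrupted or failed fetch leaves only
    the .tmp file behind, so `dest` is never a partial download.
    """
    tmp = dest + ".tmp"
    fetch(tmp)
    # Move, not copy: a rename within one filesystem is atomic on POSIX.
    os.replace(tmp, dest)
    return dest
```

With this in place, an "if the file exists, skip it" check only ever sees complete files, and any leftover .tmp files are exactly the ones to resume or clean out.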
You could just use wget's -N and -c options and remove the entire "if file exists" logic.
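If you do the transfer in-process instead (the LWP route suggested above), resuming comes down to the HTTP Range header, roughly what wget -c relies on for HTTP. A minimal sketch using Python's urllib.request (a swapped-in library, not LWP), assuming an HTTP server that honours range requests:

```python
import os
import urllib.error
import urllib.request


def resume_download(url, dest, chunk_size=64 * 1024):
    """Continue downloading url into dest from the end of any partial file."""
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    request = urllib.request.Request(url)
    if offset:
        # Ask only for the bytes we do not have yet.
        request.add_header("Range", "bytes=%d-" % offset)
    try:
        response = urllib.request.urlopen(request)
    except urllib.error.HTTPError as exc:
        if exc.code == 416 and offset:
            # 416 Range Not Satisfiable: the local file is already at least
            # as long as the remote one, so there is nothing left to fetch.
            return dest
        raise
    with response:
        # 206 Partial Content: the server honoured the range, so append.
        # Anything else (e.g. 200): the full body was sent, so start over.
        mode = "ab" if response.getcode() == 206 else "wb"
        with open(dest, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
    return dest
```

Falling back to a full rewrite when the server ignores the range keeps the file correct, at the cost of re-downloading. With wget itself, -c gives you this behaviour without any of this code.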