I have Python code that runs perfectly well on my computer, except that my laptop is very slow. I want to upload the code and run it on PythonAnywhere. (I'm not even sure this is the best-suited resource, but it is relatively easy to use!)
I have successfully uploaded some files and my code to my home directory, and the code runs OK. But since I have many files to analyze, I uploaded a zip file, unzipped it (to home/myname/part1), and saved my Python code to the same directory. When I try to run my code in that directory, however, it does not work. The code does not raise any error; it analyzes 2 blank files instead of the 100 or so I uploaded, and then exits as if the job were done.
Any ideas why? TIA!
If that matters, here is a condensed version of the code ("analysis" is not a command, but stands for more lines of analysis, all of which run fine on my computer):
import csv, re
from string import punctuation
import glob, io

csvfile = open("test.csv", "w", newline='', encoding='cp850', errors='replace')
writer = csv.writer(csvfile)
for filename in glob.glob('*.txt'):
    ### Open files and arrange them so that they are ready for pre-processing
    with open(filename, encoding='utf-8', errors='ignore') as f:
        analysis
    output = zip(file1, file_name, more_date)
    writer = csv.writer(open('test.csv', 'a', newline='', encoding='cp850', errors='replace'))
    writer.writerows(output)
    csvfile.flush()
My guess is that you're not processing the files you think you're processing. Print the result of your glob so that you can verify that you're working in the directory you think you are.
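As a minimal sketch of that check: a relative pattern such as '*.txt' is resolved against the current working directory, which may not be the directory the script was saved in. The helper name list_inputs is hypothetical, and the path is the one given in the question.

```python
import glob
import os


def list_inputs(directory, pattern="*.txt"):
    """Return the sorted paths in `directory` that match `pattern`."""
    return sorted(glob.glob(os.path.join(directory, pattern)))


if __name__ == "__main__":
    # Where is the script actually running, and what does the bare pattern see?
    print("working directory:", os.getcwd())
    print("matched via relative pattern:", sorted(glob.glob("*.txt")))

    # Directory taken from the question; adjust it to the real location.
    data_dir = "/home/myname/part1"
    print("matched in data dir:", list_inputs(data_dir))
```

If the two lists differ, building the pattern from an explicit directory (as list_inputs does) removes the dependence on the working directory.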
I am trying to use a kdb+ q script to download files from a remote source.
How can I make the downloads keep going if there is an error?
Also, how can I record which files have been downloaded on Linux when there are other files in the same directory?
Here is my code:
file:("abc.csv";"def.csv");
dbdir:"/home/terry/";
dlFunc:{
system "download.sh abc.com user /"get /remote/path/",x /",dbdir};
dlFunc each file;
If you're asking how to continue downloading other files if one file fails then you can put a protected eval around your dlFunc each file, e.g.
@[dlFunc;;()]each file;
You could capture the list of failed files using something like:
badfiles:();
{@[dlFunc;x;{y;badfiles,:enlist x}x]}each file;
Then inspect the badfiles list afterwards. The ones that succeeded would be:
file except badfiles
Ok, so here's my issue. I have written a build script in bash that pipes output to tee and sorts different output to different log files (so I can summarize errors/warnings at the end and get some statistics on files built). I wanted to use the colorgcc perl script (colorgcc.1.3.2) to colorize the output from gcc and had found in other places that this won't work piping to tee, since the script checks if it is writing to something that is not a tty. Having disabled this check everything was working until I did a full build and discovered some of the code we receive from another group builds C dependency files (we don't control this code, changing it or the build process for these isn't really an option).
The problem is that these .d files have the form as follows:
filename.o filename.d : filename.c \
dependant_file1.h \
dependant_file2.h (and so on for however many dependencies there are)
This output from GCC gets written into the .d file, but, since it is close enough to a warning/error message colorgcc outputs color codes (believe it's the check for filename:lineno:message but not 100% sure, could be filename:message check in the GCCOUT while loop). I've tried editing the regex to attempt to not match this but my perl-fu is admittedly pretty weak. So what I end up with is a color code on each line for these dependency files, which obviously causes the build to fail.
I ended up just replacing the check for ! -t STDOUT with a check for a NO_COLOR envar I set and unset in the build script for these directories (emulates the previous behavior of no color for non-tty). This works great if I run the full script, but doesn't if I cd into the directory and just run make (obviously setting and unsetting manually would work but this is a pain to do every time). Anyone have any ideas how to prevent this script from writing color codes into dependency files?
Here's how I worked around this. I added the following to colorgcc to search the gcc input for the flag to generate the .d files and just directly called the compiler in that case. This was inserted in place of the original TTY check.
# Any -M... flag means gcc is emitting dependency information, so skip coloring.
foreach my $argnum (0 .. $#ARGV)
{
    if ($ARGV[$argnum] =~ m/-M{1,2}/)
    {
        exec $compiler, @ARGV
            or die("Couldn't exec");
    }
}
I don't know if this is the proper 'perl' way of doing this sort of operation but it seems to work. Compiling inside directories that build .d files no longer inserts color codes and the source file builds do (both to terminal and my log files like I wanted). I guess sometimes the answer is more hacks instead of "hey, did you try giving up?".
Hi all. I have some questions about the Perl module HTTP::Cookies. The example on CPAN looks like this:
$cookie_jar = HTTP::Cookies->new( file => '$ENV{\'HOME\'}/lwp_cookies.dat', autosave => 1);
As I understand it, the lwp_cookies.dat file is used to save cookie data on my local machine. On my machine, '$ENV{\'HOME\'}' is an empty path. The script runs fine, but after execution I can't find any file named "lwp_cookies.dat" on my machine. I changed '$ENV{\'HOME\'}' to '$ENV{\'TMP\'}', which is a path that really exists (I verified it with a Perl print). I still can't find "lwp_cookies.dat" in my TEMP folder. My first question is how HTTP::Cookies works with the "lwp_cookies.dat" file.
On the other hand, on one of my other systems (all of them are Windows systems), the same code produces the error message below:
Can't open $ENV{'HOME'}/lwp_cookies.dat: No such file or directory
This is strange to me. On my good system, the script runs well even though the file or path does not exist, so I suppose the file is created in some temporary memory instead; on the bad system, the code example doesn't work at all.
If you want the $ENV{'HOME'} variable to interpolate into the string, you need double quotes; single quotes don't interpolate variables:
file => "$ENV{'HOME'}/lwp_cookies.dat",
I am looking to batch download a large number of files (>800). I have a text file with the list of all the filenames. These filenames are then used to derive the URL from which they can be downloaded. I had been parsing through the file with a python script, using subprocess to wget the files.
wget ftp://ftp.name.of.site/filename-prefix/filename/filename+suffix
However for reasons unknown to me, wget is failing to properly connect. I wanted to know if I could essentially use an ftp program that would work in a similar manner, i.e. no login and stay within the commandline.
Edit:
What's in my text file:
ERS032033
ERS032214
ERS032234
ERS032223
ERS032218
The ERS### act as the prefix. The whole thing is the filename. The final file (i.e. filename+suffix) would look something like: ERS032033_1.fastq.gz
Submitting the correct url is not the problem.
Since you are using Python, I suggest dropping the subprocess approach and using the urllib module instead:
import urllib.request
with urllib.request.urlopen('ftp://ftp.name.of.site/filename-prefix/filename/filename+suffix') as handle:
    print(handle.read())
This assumes Python 3; in Python 2 the same functions live directly in the urllib module (urllib.urlopen, urllib.urlretrieve).
If you simply need to batch download, urllib.request.urlretrieve is a cleaner approach.
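A rough sketch of such a batch download in Python 3 follows. The helper names are hypothetical, the host is the placeholder from the question, and the rule that the directory prefix is the first six characters of each ID (e.g. ERS032) is an assumption based on the sample list; adjust it to the real layout.

```python
import urllib.request

BASE_URL = "ftp://ftp.name.of.site"  # placeholder host from the question
SUFFIX = "_1.fastq.gz"               # suffix from the question's example


def build_url(accession, prefix_len=6):
    """Build the download URL for one ID such as 'ERS032033'.

    Assumption: the directory prefix is the first six characters ('ERS032').
    """
    prefix = accession[:prefix_len]
    return f"{BASE_URL}/{prefix}/{accession}/{accession}{SUFFIX}"


def download_all(list_path):
    """Download every ID listed in `list_path`; return the IDs that failed."""
    failed = []
    with open(list_path) as fh:
        accessions = [line.strip() for line in fh if line.strip()]
    for accession in accessions:
        try:
            # Save under the remote file name in the current directory.
            urllib.request.urlretrieve(build_url(accession), accession + SUFFIX)
        except OSError as exc:  # urllib's URLError is a subclass of OSError
            print(f"failed: {accession}: {exc}")
            failed.append(accession)
    return failed
```

Calling download_all with the path of the text file of IDs returns the list of IDs that could not be fetched, so one bad file does not stop the rest.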
I have a project that depends on some other binaries being downloaded from the web at install time. For this, what I do is:
if ( file-present-in-src/)
# skip that file
else
# use wget to download the file
The problem with this approach is that when I interrupt a download in the middle and invoke the script again, the partially downloaded file is also skipped, which is not what I want. I also want wget to resume the download of the partially downloaded file.
How should I go about it:
Possible Solutions I could think of:
Download the file to a temporary name, say download_tmp, and move it to the original file name if successful.
Handle $SIG{'INT'} to write proper cleanup code.
But neither of these would help resume a partial download.
Any insights?
First, I don't understand what this has to do with Perl, since you're using wget to do the downloading ... You could use libwww-perl (perldoc LWP) and have more control over the download process.
Then I second your idea of downloading to a "tmp" filename and move the file on success.
However, I think you need to go further and verify the integrity of the files. Computing an MD5 or SHA hash is very easy; compare the hash of the downloaded file with the one you expect. You can keep a short file on the server containing the checksum (filename.md5). Declare success only when they match.
Note that catching all the signals and generally trying to make the process unkillable, and then expecting it to have worked is bound to fail at one point or another. There could be a network timeout, a crash, power failure, configuration problem on the server ... you should instead assume downloads can fail, because they will, and code so that your process can recover.
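The temp-file-plus-checksum idea can be sketched as follows. This is in Python rather than Perl, purely as an illustration; the function names are hypothetical, SHA-256 is one arbitrary choice of hash, and the expected digest is assumed to be known in advance (for example from a checksum file on the server). It restarts a failed download rather than resuming it.

```python
import hashlib
import os
import urllib.request


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_verified(url, dest, expected_sha256):
    """Download `url` to `dest`, publishing it only if the checksum matches."""
    if os.path.exists(dest) and sha256_of(dest) == expected_sha256:
        return True  # already complete, nothing to do
    tmp = dest + ".tmp"
    urllib.request.urlretrieve(url, tmp)
    if sha256_of(tmp) != expected_sha256:
        os.remove(tmp)  # corrupt or partial: discard so the next run retries
        return False
    os.replace(tmp, dest)  # rename, so `dest` never exists half-written
    return True
```

Because the final name only ever appears after a successful check, an interrupted run leaves at most a stray .tmp file, and the next run simply tries again.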
Finally, you're not telling us what kind of binaries you're downloading and what you're doing with them. Since you use wget, I'm going to assume you're on Unix; you should consider using RPM+Yum or the like, which handle all this for you. RPMs are easy to write, really.
use your first approach ..
download to "FileName".tmp
move "FileName".tmp to "FileName" move! not copy
once per diem clean out all .tmp files (paranoia rulez)
You could just use wget's -N (timestamping) and -c (continue a partially downloaded file) options and remove the entire "if file exists" logic.