How to use PyFileSystem to redirect output of other program - virtual-disk

I am using Python on Windows.
Windows has no tmpfs filesystem like Linux does, so I want to use PyFilesystem instead. The original code looks like this:
>>> import SomeLibs1
>>> import SomeLibs2
>>> SomeLibs1.Convert('input.txt', 'output.txt')
>>> SomeLibs2.ReadResults('output.txt')
SomeLibs1 and SomeLibs2 stand for arbitrary Python libraries.
This code reads 'input.txt' and writes the results to 'output.txt'. SomeLibs2 then reads 'output.txt' to do further processing.
I want to keep 'output.txt' in a virtual store (like tmpfs on Linux), because I don't need it on the hard disk and an in-memory file might be faster.
Can PyFilesystem help here, for example with its memory filesystem?
Any ideas are appreciated. Thanks a lot!

Related

Analyzing unzipped files on PythonAnywhere

I have Python code that runs perfectly well on my computer, except that my laptop is very slow. I want to upload the code and run it on PythonAnywhere. (Not even sure if this is the best suited resource, but it is relatively easy to use!)
I have successfully uploaded some files and my code to my home directory, and the code runs OK. But since I have many files to analyze, I uploaded a zip file, successfully unzipped it (to home/myname/part1) and saved my Python code to the same directory. When I try to run my code in that directory, however, it does not work. The code does not raise any error; it analyzes 2 blank files instead of the 100 or so I uploaded, and then it stops and exits as if the job was done.
Any ideas why? TIA!
If that matters, here is a condensed version of the code ("analysis" is not a command, but stands for more lines of analysis, all of which run fine on my computer):
import csv, re
from string import punctuation
import glob, io
csvfile=open("test.csv", "w", newline='', encoding='cp850', errors='replace')
writer=csv.writer(csvfile)
for filename in glob.glob('*.txt'):
    ###Open files and arrange them so that they are ready for pre-processing
    with open(filename, encoding='utf-8', errors='ignore') as f:
        analysis
    output=zip(file1, file_name, more_date)
    writer=csv.writer(open('test.csv','a',newline='', encoding='cp850', errors='replace'))
    writer.writerows(output)
csvfile.flush()
My guess is that you're not processing the files you think you're processing. Print the result of your glob so that you can verify that you're working in the directory you think you are.
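One way to run that check is to print the working directory together with the glob result before any analysis starts. This is only a sketch: it assumes the script relies on a relative '*.txt' pattern as in the question, and the helper name report_glob is made up for illustration.

```python
import glob
import os


def report_glob(pattern="*.txt"):
    """Print and return the working directory and the files the pattern matches."""
    cwd = os.getcwd()
    matches = sorted(glob.glob(pattern))
    print(f"Working directory: {cwd}")
    print(f"{len(matches)} file(s) match {pattern!r}:")
    for name in matches:
        # A size of 0 bytes here would explain "blank" files in the results.
        print(f"  {name} ({os.path.getsize(name)} bytes)")
    return cwd, matches
```

Calling report_glob() at the top of the script shows whether the process really starts in the directory that holds the unzipped files. If it does not, building the pattern from an absolute path is the usual fix.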

How to use Archive::Extract safely - against zip bombs or similar?

Problem outline:
I need to allow uploading ZIP files (and tgz and other compressed directory trees) via a web form
the zip files should be extracted so that their content can be handled
I'm planning to use Archive::Extract for the extraction
but there are things like zip bombs and the like...
From the manual:
Archive::Extract can use either pure perl modules or command line
programs under the hood. Some of the pure perl modules (like
Archive::Tar and Compress::unLZMA) take the entire contents of the
archive into memory, which may not be feasible on your system.
Consider setting the global variable $Archive::Extract::PREFER_BIN to
1 , which will prefer the use of command line programs and won't
consume so much memory.
The questions are:
If I set $Archive::Extract::PREFER_BIN = 1, am I sufficiently protected against zip-bomb-like attacks?
$Archive::Extract::PREFER_BIN protects me against excessive memory usage, but are the standard unzip, tar -z, and unrar binaries safe against zip-bomb-like attacks?
If not, how can I safely handle an uploaded compressed directory tree? (The archive, e.g. a zip, contains more than one file.)
$Archive::Extract::PREFER_BIN = 1 doesn't protect you against zip bombs; it only passes the problem on to your system's unzip binary.
This SO question may help you. I like the idea of running a second process with ulimit.
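The "second process with ulimit" idea can be sketched as follows. The sketch is in Python rather than Perl and is not taken from the linked answer. It assumes a POSIX system (the resource module is not available on Windows), and the function name run_with_limits and the chosen limits are illustrative only.

```python
import resource
import subprocess


def _cap(limit, value):
    """Lower one resource limit in the current process, never above the hard limit."""
    _soft, hard = resource.getrlimit(limit)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(limit, (value, value))


def run_with_limits(cmd, max_file_bytes, max_cpu_seconds):
    """Run cmd in a child process with caps on file size and CPU time."""

    def apply_limits():
        # Runs in the child just before exec, so the parent keeps its own limits.
        _cap(resource.RLIMIT_FSIZE, max_file_bytes)
        _cap(resource.RLIMIT_CPU, max_cpu_seconds)

    return subprocess.run(cmd, preexec_fn=apply_limits, capture_output=True)
```

A call such as run_with_limits(["unzip", "-q", "upload.zip", "-d", "extract_dir"], 100 * 1024 * 1024, 30) would then treat a nonzero return code as a rejected upload. Note that RLIMIT_FSIZE caps each written file, not the total, so an archive with very many small files still needs a disk quota or a dedicated scratch filesystem.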

Virtual filesystem in Perl

I'm looking for a virtual filesystem layer in Perl. Something that would provide a general abstraction for basic filesystem routines like ls, mkdir and so on, regardless how the actual filesystem is implemented.
I'd like an interface like this:
# create a directory "/some/path/tmp" in my current filesystem
my $plainfs = Module->new(type => 'local', root => '/some/path');
$plainfs->mkdir("/tmp");
# create "tmp" dir on a remote filesystem
my $sshfs = Module->new(type => 'ssh', root => 'user:password@example.com:~/pub');
$sshfs->mkdir("/tmp");
I found the VFS package on MetaCPAN, unfortunately there are only empty, unimplemented modules.
Is something already implemented? Right now, I'm looking for only “local” filesystems and ftp or ssh—I don't need a database “filesystem” or any other exotic “filesystem” like CVS or so. Searching 20k MetaCPAN modules is painful without any tagging system or alike…
Perhaps File::System is what you're looking for. It provides basic functionalities found in common operating systems for managing a virtual file system (not necessarily comprised only of files and directories).
Most of the functionality is exposed as methods of the File::System::Object package.
What about a FUSE (filesystem in userspace) implementation? I would guess there is at least one pseudo-filesystem implemented in Perl on top of it. After all, it should be quite easy to implement: basically it's no more than a set of operations like mount, ls, df, stat and so on. I once went through the autofs sources in C, and they looked pretty straightforward. You might want to see http://code.google.com/p/mogilefs/ as well.
Don't get too hung up on the module approach. All you need is some utility that mounts an SSH/FTP filesystem as a local filesystem, and then you simply use standard commands like cd, mkdir and so on. The reason you don't see any modules for this is that the mounting approach is generally preferred.
Look at http://sourceforge.net/apps/mediawiki/fuse/index.php?title=FileSystems
You will simply use FUSE to mount any of those file systems and that is it. Here are some links to look at, but most of those can be got as packages in most distributions too.
http://sourceforge.net/projects/lufs/
http://lftpfs.sourceforge.net
Here is a module to mount FUSE file systems from within Perl:
http://search.cpan.org/~dpavlin/Fuse/Fuse.pm
There are a LOT of File::* modules which handle different parts of cross-platform filesystem management.
For example:
use File::Spec::Functions qw(catfile);
This lets you write my $filename = catfile $root, $path, "$filename.$ext"; or my $new_directory = catfile $path, "new_sub_directory"; and be sure the correct separators (/ or \, et cetera) are used.
Another thing you seem to want can be had with:
use File::Path qw(make_path);
which is pretty handy, and can be called like make_path($new_directory, { mode => 0755 });
I'm not really sure if File::System actually handles remote systems the way you want.
A couple different ways occur to me to handle that, but I think Net::SSH::Expect is what I've used in the past, and isn't too bad, although you'd probably have an easier time if you could somehow mount the remote filesystem locally, do what you have to do, then unmount it.

Looking for command line ftp client (linux)

I am looking to batch download a large number of files (>800). I have a text file with the list of all the filenames. These filenames are then used to derive the URL from which they can be downloaded. I had been parsing through the file with a python script, using subprocess to wget the files.
wget ftp://ftp.name.of.site/filename-prefix/filename/filename+suffix
However for reasons unknown to me, wget is failing to properly connect. I wanted to know if I could essentially use an ftp program that would work in a similar manner, i.e. no login and stay within the commandline.
Edit:
What's in my text file:
ERS032033
ERS032214
ERS032234
ERS032223
ERS032218
The ERS### act as the prefix. The whole thing is the filename. The final file (i.e. filename+suffix) would look something like: ERS032033_1.fastq.gz
Submitting the correct url is not the problem.
Since you are using Python, I suggest dropping the subprocess approach and using the urllib module instead:
import urllib
handle = urllib.urlopen('ftp://ftp.name.of.site/filename-prefix/filename/filename+suffix')
print handle.read()
handle.close()
This assumes Python 2; in Python 3 the equivalent functions live in urllib.request.
If you simply need a batch download, urllib.urlretrieve is a cleaner approach.
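A batch version along those lines might look like the sketch below, written for Python 3. The host is the placeholder from the question. Treating the first six characters as the prefix is an assumption read from "ERS###", and the "_1.fastq.gz" suffix comes from the single example file name. The names build_url and download_all are made up for illustration.

```python
import os
import urllib.request

BASE_URL = "ftp://ftp.name.of.site"  # placeholder host from the question
PREFIX_LEN = 6                       # assumption: "ERS###" is the first six characters
SUFFIX = "_1.fastq.gz"               # suffix taken from the example file name


def build_url(accession):
    """Return the download URL for one accession such as 'ERS032033'."""
    prefix = accession[:PREFIX_LEN]
    return f"{BASE_URL}/{prefix}/{accession}/{accession}{SUFFIX}"


def download_all(list_path, dest_dir="."):
    """Download every accession listed (one per line) in list_path."""
    with open(list_path) as listing:
        accessions = [line.strip() for line in listing if line.strip()]
    for accession in accessions:
        url = build_url(accession)
        target = os.path.join(dest_dir, os.path.basename(url))
        try:
            urllib.request.urlretrieve(url, target)
        except OSError as exc:  # URLError is a subclass of OSError
            print(f"failed: {url}: {exc}")
```

Catching the error per file keeps one bad URL from aborting the whole batch of 800 downloads, and the printed message shows which connection actually failed.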

perl - disk name on Linux

What module would you recommend to get a disk name on Linux? I've done some search on CPAN but all modules I've found are too old. In Bash I can use something like:
disk_name=$(df |grep -w '/'|awk '{print $1}'|cut -d/ -f3)
echo $disk_name
sda6
Please help me to understand how to do same in Perl.
Thanks.
The "proper" way to list mounted disks on Linux is through the getmntent() library function, which can be accessed from Perl using the Quota module:
use Quota;
Quota::setmntent();
while (my ($dev, $path, $type, $opts) = Quota::getmntent()) {
print "The root device is $dev.\n" if $path eq "/";
}
Quota::endmntent();
As a bonus, using the Quota module to list device mount points should be fairly portable to other Unixish systems, which parsing various system files or the output of df may not be. Unfortunately, this seemingly basic module is not included in the standard Perl distribution, so you have to get it from CPAN (or from your distro's package repository — for example, Debian / Ubuntu have the libquota-perl package).
Ps. Simply splitting the device name on / and taking the third element (as your cut command does) is not a safe way to turn, say, /dev/sdb1 into sdb1. Some issues with it are that:
Not all block devices have to live under /dev — it's really just a convention.
Even if the device file is under /dev, it might be in a subdirectory of it. For example, my root filesystem is on the device /dev/disk/by-uuid/627f8512-f037-4c6c-9892-6130090c0e0f.
Sometimes, the device name might not even be an actual filesystem path: for example, virtual or in-memory filesystems such as tmpfs are often mounted with the device name none, but it's possible to use any device name with them.
If you do want to get rid of the /dev/ part, I'd suggest a conservative approach using a regexp, for example like this:
if ($dev =~ m(^/dev/(.*)$)s) {
print "The directory $path is mounted from device $1 under /dev.\n";
} else {
print "The directory $path is not mounted from a device under /dev.\n"
}
What you're describing is not the disk name but the device name of the block device representing the partition mounted at root (/). On a regular computer it would normally be something like /dev/sdXN or /dev/hdXN, where X is the disk letter (the primary hard drive is usually a, the secondary b, etc.) and N is the partition number on that device.
Provided you're always running on a unix system, you can try reading /etc/mtab file, which lists all mounted partitions, or the special file /proc/mounts, which pretty much does the same. You'll need to parse it afterwards to find the one you need and get the device name from it.
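The parsing step is small. Here is a sketch in Python (the same logic ports to Perl line for line), assuming the usual whitespace-separated layout of /proc/mounts with the device in the first field and the mount point in the second. The function names are illustrative.

```python
def find_device(mounts_text, mount_point="/"):
    """Return the device mounted at mount_point, or None if there is none."""
    device = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Each line reads: device mount_point fs_type options dump pass
        if len(fields) >= 2 and fields[1] == mount_point:
            device = fields[0]  # keep the last match: later mounts shadow earlier ones
    return device


def root_device(path="/proc/mounts"):
    """Read the mount table at path and return the device mounted at /."""
    with open(path) as mounts:
        return find_device(mounts.read())
```

Keeping the last match matters because the file can list more than one entry for /, for example an initial rootfs line followed by the real root partition.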
Alternatively, you can just run df as a process and read its output into Perl, something like
open(DF, "df|") or die "Cannot run df: $!";
@mount_points = <DF>;
close(DF);
and then iterate over the data to find what you need. I'm not aware of any modules off the top of my head that would do the job for you, but the code seems pretty simple to me anyway.
P.S. Note that Mac OS X, while being a derivative of BSD, doesn't have the same file structure, and therefore this approach wouldn't work there. On Mac OS X, you can read the file /etc/fstab.hd, which contains similar info but in a slightly different format.
One way to do just what you are doing in the question
df / | perl -ne 'm"^/\w+/(\w+)";print "$1\n" if defined $1;'
but using a CPAN library to do it is probably better.