Anybody help me make a virtual file system in Perl.
Very simple, 2 depth level, as
/subdir
subdir-l2
file2.txt
/file1.txt
I try use Fuse.pm, but not understand how create subdir level. I create %files hash, and if go to subdir, recreate it with new records. It's for test only.
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use Fuse;
use POSIX qw(ENOENT EISDIR EINVAL);
my (%files) = (
'.' => {
type => 0040,
mode => 0755,
ctime => 1490603721
},
subdir => {
type => 0040,
mode => 0755,
ctime => 1490603721
},
"file1.txt" => {
type => 0100,
mode => 0755,
ctime => 1490603721
}
);
sub filename_fixup {
my ($file) = shift;
$file =~ s,^/,,;
$file = '.' unless length($file);
return $file;
}
sub getdir {
my $tmp = shift;
if ($tmp eq '/') {
return (keys %files),0;
} else {
(%files) = (
'.' => {
type => 0040,
mode => 0755,
ctime => 1490603721
},
# /subdir/subdir-l2
"subdir-l2" => {
type => 0040,
mode => 0755,
ctime => 1490603721
} ,
# /subdir/a-l2.file
"file2.txt" => {
cont => "File 'al2'.\n",
type => 0100,
mode => 0755,
ctime => 1490603721
}
);
return (keys %files),0;
}
}
sub getattr {
my ($file) = filename_fixup(shift);
$file =~ s,^/,,;
$file = '.' unless length($file);
return -ENOENT() unless exists($files{$file});
my ($size) = exists($files{$file}{cont}) ? length($files{$file}{cont}) : 0;
$size = $files{$file}{size} if exists $files{$file}{size};
my ($modes) = ($files{$file}{type}<<9) + $files{$file}{mode};
my ($dev, $ino, $rdev, $blocks, $gid, $uid, $nlink, $blksize) = (0,0,0,1,0,0,1,1024);
my ($atime, $ctime, $mtime);
$atime = $ctime = $mtime = $files{$file}{ctime};
return ($dev,$ino,$modes,$nlink,$uid,$gid,$rdev,$size,$atime,$mtime,$ctime,$blksize,$blocks);
}
Fuse::main(
mountpoint => "/tmp/123",
getdir => \&getdir,
getattr => \&getattr,
);
one level mount fine, but if go to deeper i get
?????????? ? ? ? ? ? file2.txt
?????????? ? ? ? ? ? subdir-l2
I'm really not a regular user of the Fuse module, neither of FUSE system. Tinkered with this issue out of pure curiosity. Thus, although I can't explain in very much details how to use the plain Fuse module to achieve your goal, I have a working code that does create the wanted filesystem (at least on my system, and seems that it is capable of creating any arbitrary filesystem tree), and I can explain how I got this code working.
So first of all I discovered the Fuse::Simple module on CPAN.
Its SYNOPSIS shows that it provides a really simple API to the Fuse module for creating arbitrary filesystems from a hash structure. Its source code isn't that huge, so I just created 'listing.pl' script file and copied there most of the functions (except fserr that caused a Modification of a read-only value exception), put the main sub contents out, so they will be the main script's flow, hardcoded the filesystem structure ($fs var), and made some little adjustments here and there (like declare vars with my to prevent exceptions), and finally got the filesystem mounted, with all directories listed and files readable. So this is the code I got at last:
#!/usr/bin/env perl
use strict;
use warnings;
use diagnostics;
use Carp;
use Fuse;
use Errno qw(:POSIX); # ENOENT EISDIR etc
use Fcntl qw(:DEFAULT :mode); # S_IFREG S_IFDIR, O_SYNC O_LARGEFILE etc.
use Switch;
my $debug = 0;
my %codecache = ();
my $ctime = time();
my $uid = $>;
my $gid = $) + 0;
my $fs = {
"file1.txt" => "File 1 contents",
"subdir" => {
"subdir-l2" => {},
"file2.txt" => "File 2 contents"
}
};
# some default args
my %args = (
"mountpoint" => "listing",
"debug" => $debug,
"fuse_debug" => 0,
"threaded" => 0,
"/" => $fs
);
# the default subs
my %fs_subs = (
"chmod" => \&fs_not_imp,
"chown" => \&fs_not_imp,
"flush" => \&fs_flush,
"fsync" => \&fs_not_imp,
"getattr" => \&fs_getattr,
"getdir" => \&fs_getdir,
"getxattr" => \&fs_not_imp,
"link" => \&fs_not_imp,
"listxattr" => \&fs_not_imp,
"mkdir" => \&fs_not_imp,
"mknod" => \&fs_not_imp,
"open" => \&fs_open,
"read" => \&fs_read,
"readlink" => \&fs_readlink,
"release" => \&fs_release,
"removexattr" => \&fs_not_imp,
"rmdir" => \&fs_not_imp,
"rename" => \&fs_not_imp,
"setxattr" => \&fs_not_imp,
"statfs" => \&fs_statfs,
"symlink" => \&fs_not_imp,
"truncate" => \&fs_truncate,
"unlink" => \&fs_not_imp,
"utime" => sub{return 0},
"write" => \&fs_write,
);
# except extract these ones back out.
$debug = delete $args{"debug"};
$args{"debug"} = delete( $args{"fuse_debug"} ) || 0;
delete $args{"/"};
# add the functions, if not already defined.
# wrap in debugger if debug is set.
for my $name (keys %fs_subs) {
my $sub = $fs_subs{$name};
# $sub = wrap($sub, $name) if $debug;
$args{$name} ||= $sub;
}
Fuse::main(%args);
sub fetch {
my ($path, #args) = #_;
my $obj = $fs;
for my $elem (split '/', $path) {
next if $elem eq ""; # skip empty // and before first /
$obj = runcode($obj); # if there's anything to run
# the dir we're changing into must be a hash (dir)
return ENOTDIR() unless ref($obj) eq "HASH";
# note that ENOENT and undef are NOT the same thing!
return ENOENT() unless exists $obj->{$elem};
$obj = $obj->{$elem};
}
return runcode($obj, #args);
}
sub runcode {
my ($obj, #args) = #_;
while (ref($obj) eq "CODE") {
my $old = $obj;
if (#args) { # run with these args. don't cache
delete $codecache{$old};
print "running $obj(",quoted(#args),") NO CACHE\n" if $debug;
$obj = saferun($obj, #args);
} elsif (exists $codecache{$obj}) { # found in cache
print "got cached $obj\n" if $debug;
$obj = $codecache{$obj}; # could be undef, or an error, BTW
} else {
print "running $obj() to cache\n" if $debug;
$obj = $codecache{$old} = saferun($obj);
}
if (ref($obj) eq "NOCACHE") {
print "returned a nocache() value - flushing\n" if $debug;
delete $codecache{$old};
$obj = $$obj;
}
print "returning ",ref($obj)," ",
defined($obj) ? $obj : "undef",
"\n" if $debug;
}
return $obj;
}
sub saferun {
my ($sub, #args) = #_;
my $ret = eval { &$sub(#args) };
my $died = $#;
if (ref($died)) {
print "+++ Error $$died\n" if ref($died) eq "ERROR";
return $died;
} elsif ($died) {
print "+++ $died\n";
# stale file handle? moreorless?
return ESTALE();
}
return $ret;
}
sub nocache {
return bless(\ shift, "NOCACHE"); # yup, utter abuse of bless :-)
}
sub dump_open_flags {
my $flags = shift;
printf " flags: 0%o = (", $flags;
for my $bits (
[ O_ACCMODE(), O_RDONLY(), "O_RDONLY" ],
[ O_ACCMODE(), O_WRONLY(), "O_WRONLY" ],
[ O_ACCMODE(), O_RDWR(), "O_RDWR" ],
[ O_APPEND(), O_APPEND(), "|O_APPEND" ],
[ O_NONBLOCK(), O_NONBLOCK(), "|O_NONBLOCK" ],
[ O_SYNC(), O_SYNC(), "|O_SYNC" ],
[ O_DIRECT(), O_DIRECT(), "|O_DIRECT" ],
[ O_LARGEFILE(), O_LARGEFILE(), "|O_LARGEFILE" ],
[ O_NOFOLLOW(), O_NOFOLLOW(), "|O_NOFOLLOW" ],
) {
my ($mask, $flag, $name) = #$bits;
if (($flags & $mask) == $flag) {
$flags -= $flag;
print $name;
}
}
printf "| 0%o !!!", $flags if $flags;
print ")\n";
}
sub accessor {
my $var_ref = shift;
croak "accessor() requires a reference to a scalar var\n"
unless defined($var_ref) && ref($var_ref) eq "SCALAR";
return sub {
my $new = shift;
$$var_ref = $new if defined($new);
return $$var_ref;
}
}
sub fs_not_imp { return -ENOSYS() }
sub fs_flush {
# we're passed a path, but finding my coderef stuff from a path
# is a bit of a 'mare. flush the lot, won't hurt TOO much.
print "Flushing\n" if $debug;
%codecache = ();
return 0;
}
sub easy_getattr {
my ($mode, $size) = #_;
return (
0, 0, # $dev, $ino,
$mode,
1, # $nlink, see fuse.sourceforge.net/wiki/index.php/FAQ
$uid, $gid, # $uid, $gid,
0, # $rdev,
$size, # $size,
$ctime, $ctime, $ctime, # actually $atime, $mtime, $ctime,
1024, 1, # $blksize, $blocks,
);
}
sub fs_getattr {
my $path = shift;
my $obj = fetch($path);
# undef doesn't actually mean "file not found", it could be a coderef
# file-sub which has returned undef.
return easy_getattr(S_IFREG | 0200, 0) unless defined($obj);
switch (ref($obj)) {
case "ERROR" { # this is an error to be returned.
return -$$obj;
}
case "" { # this isn't a ref, it's a real string "file"
return easy_getattr(S_IFREG | 0644, length($obj));
}
# case "CODE" should never happen - already been run by fetch()
case "HASH" { # this is a directory hash
return easy_getattr(S_IFDIR | 0755, 1);
}
case "SCALAR" { # this is a scalar ref. we use these for symlinks.
return easy_getattr(S_IFLNK | 0777, 1);
}
else { # what the hell is this file?!?
print "+++ What on earth is ",ref($obj)," $path ?\n";
return easy_getattr(S_IFREG | 0000, 0);
}
}
}
sub fs_getdir {
my $obj = fetch(shift);
return -$$obj if ref($obj) eq "ERROR"; # THINK this is a good idea.
return -ENOENT() unless ref($obj) eq "HASH";
return (".", "..", sort(keys %$obj), 0);
}
sub fs_open {
# doesn't really need to open, just needs to check.
my $obj = fetch(shift);
my $flags = shift;
dump_open_flags($flags) if $debug;
# if it's undefined, and we're not writing to it, return an error
return -EBADF() unless defined($obj) or ($flags & O_ACCMODE());
switch (ref($obj)) {
case "ERROR" { return -$$obj; }
case "" { return 0 } # this is a real string "file"
case "HASH" { return -EISDIR(); } # this is a directory hash
else { return -ENOSYS(); } # what the hell is this file?!?
}
}
sub fs_read {
my $obj = fetch(shift);
my $size = shift;
my $off = shift;
return -ENOENT() unless defined($obj);
return -$$obj if ref($obj) eq "ERROR";
# any other types of refs are probably bad
return -ENOENT() if ref($obj);
if ($off > length($obj)) {
return -EINVAL();
} elsif ($off == length($obj)) {
return 0; # EOF
}
return substr($obj, $off, $size);
}
sub fs_readlink {
my $obj = fetch(shift);
return -$$obj if ref($obj) eq "ERROR";
return -EINVAL() unless ref($obj) eq "SCALAR";
return $$obj;
}
sub fs_release {
my ($path, $flags) = #_;
dump_open_flags($flags) if $debug;
return 0;
}
sub fs_statfs {
return (
255, # $namelen,
1,1, # $files, $files_free,
1,1, # $blocks, $blocks_avail, # 0,0 seems to hide it from df?
2, # $blocksize,
);
}
sub fs_truncate {
my $obj = fetch(shift, ""); # run anything to set it to ""
return -$$obj if ref($obj) eq "ERROR";
return 0;
}
sub fs_write {
my ($path, $buf, $off) = #_;
my $obj = fetch($path, $buf, $off); # this runs the coderefs!
return -$$obj if ref($obj) eq "ERROR";
return length($buf);
}
Final word: I didn't try to use the module itself (it's not listed in my distro package repository, and I was too lazy (sorry) to install it by cpanm or other way). But I think that if I'll have to just use FUSE with Perl, I'll probably just use Fuse::Simple instead of Fuse, maybe forking it. I'd use plain Fuse only for my academic research, I think...
Hope this helps.
So I have stumbled upon a little issue when trying to build out a simple "Airport Search Script" in Perl.
my $filename = '/home/student/perl-basic/topic-07/iata_airports.csv';
my $number = '1';
my $matching;
my $latitude;
my $longitude;
my $word = 'false';
GetOptions (
"filename=s" => \$filename,
"number=i" => \$number,
"matching=s" => \$matching,
"latitude=f" => \$latitude,
"longitude=f" => \$longitude,
"word=s" => \$word
);
sub parse_airports {
my $file = shift;
my $csv = Text::CSV->new( { binary => 1, eol => $/ } );
open ( my $fh, "<", $file ), or die "Error opening input file: $!";
my $ra_colnames = $csv->getline ( $fh );
$csv->column_names( #$ra_colnames );
my $ra_airports = $csv->getline_hr_all( $fh );
close ( $fh );
return $ra_airports;
}
sub get_name_matching_airports {
}
my $rah_airports = parse_airports( $filename );
my $rah_airports_found = [];
if ($matching) {
say "Up to $number airports matching $matching in $filename:";
$rah_airports_found = get_name_matching_airports(
airports => $rah_airports,
matching_string => $matching,
word => $word,
);
}
elsif ($latitude && $longitude) {
say "Up to $number airports near [$latitude, $longitude] in $filename:"
}
else {
say "Must have at least --matching, or --latitude and --longitude as arguments";
}
print pp($rah_airports_found);
So where I am struggling is in the "sub get_name_matching_airports"
Because you do not have the file let me explain the file structure.
It is a hash (ALL IATA Airports) with hashes (DETAILS of each airport). There are around 15 keys in each airport hash and one of the keys titles is (NAME). I have opened the file and parsed all the info into a hash ref which is returned at the end of the sub "parse_airports".
In the sub "get_name_matching_airports" I need to find additional airports with similar names based on the argument I passed in, into ($matching).
EXAMPLE: I parse (case-insensitive) "London" as an argument from the command line e.g. ./search_airports2 --matching London. In the sub "get_name_matching_airports" I will need to respond with any airport that has london (case-insensitive) in key(name).
Then push these newly found airports which are similar into the array "rah_airports_found" and in the end print this out.
SO I SOLVED MY PROBLEM WITH THE FOLLOWING CODE:
sub get_name_matching_airports {
my %params = (
airports => undef,
matching_string => undef,
word => undef,
#_
);
my #rah_airports_found;
my $ra_airports = $params{airports};
my $counter = 0;
foreach my $i ( #$ra_airports ) {
if ( $params{word} ) {
if ( $i->{name} eq $params{matching_string} ) {
push #rah_airports_found, $i;
$counter++;
}
}
else {
if ( $i->{name} =~ /$params{matching_string}/i ) {
push #rah_airports_found, $i;
$counter++;
}
if ( defined( $number ) && $counter == $number ) {
return \#rah_airports_found;
}
}
}
return \#rah_airports_found;
}
Example:
for my $Airport_rf (keys %{$rah_airports}) {
if ( $Airport_rf->{NAME} =~ m{\Q$matching\E}xi) {
# do your stuff here
}
}
If you donĀ“t know the exact key of the hashref, you have to match the CLI parameter against all values.
I am searching through three text files for one of four specific gene names (stored in $var#). When it is found, it takes the value found after the gene name and adds it to a count. We then average the value by taking total $count_exp# and dividing by the number of appearances within all files.
What is the proper way to let the user know when a gene name is not found in each file? I'm having difficulties handling the flow of this loop/conditional.
Here is a snippet of code that handles one of the three text files....
foreach $hyperosmotic(#hyperosmotic)
{
#hyperosmotic1=split(/\t/,$hyperosmotic);
$name=$hyperosmotic1[0];
$exp=$hyperosmotic1[1];
chomp $name;
chomp $exp;
if ($name eq $var1)
{
$count_exp1 = $count_exp1 + $exp;
$count_var1 = ++$count_var1;
}
elsif ($name eq $var2)
{
$count_exp2 = $count_exp2 + $exp;
$count_var2 = ++$count_var2;
}
elsif ($name eq $var3)
{
$count_exp3 = $count_exp3 + $exp;
$count_var3 = ++$count_var3;
}
elsif ($name eq $var4)
{
$count_exp4 = $count_exp4 + $exp;
$count_var4 = ++$count_var4;
}
}
You basically want to use arrays:
(and use strict; use warnings;)
my #count_var = (0)x4;
my #count_exp = (0)x4;
my #var = ($var1, $var2, ...);
HYPEROSMOTIC:
for my $hyperosmotic (#hyperosmotic) {
my ($name, $exp) = split /\t/, $hyperosmotic;
for my $i (0 .. $#var) {
if ($name eq $var[$i]) {
$count_exp[$i] += $exp;
$count_var[$i]++;
next HYPEROSMOTIC; # jump into next iteration of the labeled loop
}
}
# this code is only reached if no var matched:
die qq[I don't have a var for name "$name"];
# That just threw a fatal error. You may want to do something different.
}
You could improve efficiency by using hashes:
my %counts = (
$var1 => {exp => 0, var => 0},
$var2 => {exp => 0, var => 0},
$var3 => {exp => 0, var => 0},
$var4 => {exp => 0, var => 0},
);
for my $hyperosmotic (#hyperosmotic) {
my ($name, $exp) = split ...;
if (my $count = $counts{$name}) {
$count->{exp} += $exp;
$count->{var}++;
} else {
die qq[I don't have a var for name "$name"];
}
}
I would like to compare the values of two hash references.
The data dumper of my first hash is this:
$VAR1 = {
'42-MG-BA' => [
{
'chromosome' => '19',
'position' => '35770059',
'genotype' => 'TC'
},
{
'chromosome' => '2',
'position' => '68019584',
'genotype' => 'G'
},
{
'chromosome' => '16',
'position' => '9561557',
'genotype' => 'G'
},
And the second hash is similar to this but with more hashes in the array. I would like to compare the genotype of my first and second hash if the position and the choromosome matches.
map {print "$_= $cave_snp_list->{$_}->[0]->{chromosome}\n"}sort keys %$cave_snp_list;
map {print "$_= $geno_seq_list->{$_}->[0]->{chromosome}\n"}sort keys %$geno_seq_list;
I could do that for the first array of the hashes.
Could you help me in how to work for all the arrays?
This is my actual code in full
#!/software/bin/perl
use strict;
use warnings;
use Getopt::Long;
use Benchmark;
use Config::Config qw(Sequenom.ini);
useDatabase::Conn;
use Data::Dumper;
GetOptions("sam=s" => \my $sample);
my $geno_seq_list = getseqgenotypes($sample);
my $cave_snp_list = getcavemansnpfile($sample);
#print Dumper($geno_seq_list);
print scalar %$geno_seq_list, "\n";
foreach my $sam (keys %{$geno_seq_list}) {
my $seq_used = $geno_seq_list->{$sam};
my $cave_used = $cave_snp_list->{$sam};
print scalar(#$geno_seq_list->{$_}) if sort keys %$geno_seq_list, "\n";
print scalar(#$cave_used), "\n";
#foreach my $seq2com (# {$seq_used } ){
# foreach my $cave2com( # {$cave_used} ){
# print $seq2com->{chromosome},":" ,$cave2com->{chromosome},"\n";
# }
#}
map { print "$_= $cave_snp_list->{$_}->[0]->{chromosome}\n" } sort keys %$cave_snp_list;
map { print "$_= $geno_seq_list->{$_}->[0]->{chromosome}\n" } sort keys %$geno_seq_list;
}
sub getseqgenotypes {
my $snpconn;
my $gen_list = {};
$snpconn = Database::Conn->new('live');
$snpconn->addConnection(DBI->connect('dbi:Oracle:pssd.world', 'sn', 'ss', { RaiseError => 1, AutoCommit => 0 }),
'pssd');
#my $conn2 =Database::Conn->new('live');
#$conn2->addConnection(DBI->connect('dbi:Oracle:COSI.world','nst_owner','nst_owner', {RaiseError =>1 , AutoCommit=>0}),'nst');
my $id_ind = $snpconn->execute('snp::Sequenom::getIdIndforExomeSample', $sample);
my $genotype = $snpconn->executeArrRef('snp::Sequenom::getGenotypeCallsPosition', $id_ind);
foreach my $geno (#{$genotype}) {
push #{ $gen_list->{ $geno->[1] } }, {
chromosome => $geno->[2],
position => $geno->[3],
genotype => $geno->[4],
};
}
return ($gen_list);
} #end of sub getseqgenotypes
sub getcavemansnpfile {
my $nstconn;
my $caveman_list = {};
$nstconn = Database::Conn->new('live');
$nstconn->addConnection(
DBI->connect('dbi:Oracle:CANP.world', 'nst_owner', 'NST_OWNER', { RaiseError => 1, AutoCommit => 0 }), 'nst');
my $id_sample = $nstconn->execute('nst::Caveman::getSampleid', $sample);
#print "IDSample: $id_sample\n";
my $file_location = $nstconn->execute('nst::Caveman::getCaveManSNPSFile', $id_sample);
open(SNPFILE, "<$file_location") || die "Error: Cannot open the file $file_location:$!\n";
while (<SNPFILE>) {
chomp;
next if /^>/;
my #data = split;
my ($nor_geno, $tumor_geno) = split /\//, $data[5];
# array of hash
push #{ $caveman_list->{$sample} }, {
chromosome => $data[0],
position => $data[1],
genotype => $nor_geno,
};
} #end of while loop
close(SNPFILE);
return ($caveman_list);
}
The problem that I see is that you're constructing a tree for generic storage of data, when what you want is a graph, specific to the task. While you are constructing the record, you could also be constructing the part that groups data together. Below is just one example.
my %genotype_for;
my $record
= { chromosome => $data[0]
, position => $data[1]
, genotype => $nor_geno
};
push #{ $gen_list->{ $geno->[1] } }, $record;
# $genotype_for{ position }{ chromosome }{ name of array } = genotype code
$genotype_for{ $data[1] }{ $data[0] }{ $sample } = $nor_geno;
...
return ( $caveman_list, \%genotype_for );
In the main line, you receive them like so:
my ( $cave_snp_list, $geno_lookup ) = getcavemansnpfile( $sample );
This approach at least allows you to locate similar position and chromosome values. If you're going to do much with this, I might suggest an OO approach.
Update
Assuming that you wouldn't have to store the label, we could change the lookup to
$genotype_for{ $data[1] }{ $data[0] } = $nor_geno;
And then the comparison could be written:
foreach my $pos ( keys %$small_lookup ) {
next unless _HASH( my $sh = $small_lookup->{ $pos } )
and _HASH( my $lh = $large_lookup->{ $pos } )
;
foreach my $chrom ( keys %$sh ) {
next unless my $sc = $sh->{ $chrom }
and my $lc = $lh->{ $chrom }
;
print "$sc:$sc";
}
}
However, if you had limited use for the larger list, you could construct the specific case
and pass that in as a filter when creating the longer list.
Thus, in whichever loop creates the longer list, you could just go
...
next unless $sample{ $position }{ $chromosome };
my $record
= { chromosome => $chromosome
, position => $position
, genotype => $genotype
};
...