how to compare the the array values with different file in different directory?

how to compare the the array values with different file in different directory? - perl

#!/usr/bin/perl
use strict;
use warnings;
use warnings;
use 5.010;
my #names = ("RD", "HD", "MP");
my $flag = 0;
my $filename = 'Sample.txt';
if (open(my $fh, '<', $filename))
{
while (my $row = <$fh>)
{
foreach my $i (0 .. $#names)
{
if( scalar $row =~ / \G (.*?) ($names[$i]) /xg )
{
$flag=1;
}
}
}
if( $flag ==1)
{
say $filename;
}
$flag=0;
}
here i read the content from one file and compare with array values, if file contant matches with array value i just display the file. in the same way how can i access different file from different direcory and compare the array values with same?

Q: How can I access a different file?
A: By specifying a different filename.
By the way: If you are using flags for loop control in Perl, you are doing something wrong. You can specify that this was the last iteration of the loop (in C: break), or that you want to start the next iteration. You can label the loops so that you can break out of as many loops as you like at once:
#!/usr/bin/perl
use 5.010; use warnings;
my #names = qw(RD HD MP);
# unpack command line arguments
my ($filename) = #ARGV;
open my $fh, "<", $filename or die "Oh noes, $filename is bad: $!";
LINE:
while (my $line = <$fh>) {
NAME:
foreach my $name (#names) {
if ($line =~ /\Q$name\E/) { # \QUOT\E the $name to escape everything
say "$filename contains $name";
last LINE;
}
}
}
Other highlights:
using a foreach loop as intended and
removing the (in this context) senseless \G assertion
You can then execute the script as perl script.pl Sample.txt or perl script.pl ../another.dir/foo.bar or whatever.

You can use the ~~ operator in Perl 5.10.
Don't forget to chomp the trailing whitespace.
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;
my #names = ('RD', 'HD', 'MP');
my $other_dir = '/tmp';
my $filename = 'Sample.txt';
if ( open( my $fh, '<', "$other_dir/$filename" ) ) {
ROW:
while ( my $row = <$fh> ) {
chomp $row; # remove trailing \n
if ( $row ~~ #names ) {
say $filename;
last ROW;
}
}
}
close $fh;

Related

Print variable after closing the file in Perl

Below code works fine but I want $ip to be printed after closing the file.
use strict;
use warnings;
use POSIX;
my $file = "/tmp/example";
open(FILE, "<$file") or die $!;
while ( <FILE> ) {
my $lines = $_;
if ( $lines =~ m/address/ ) {
my ($string, $ip) = (split ' ', $lines);
print "IP address is: $ip\n";
}
}
close(FILE);
sample data in /tmp/example file
$cat /tmp/example
country us
ip_address 192.168.1.1
server dell

This solution looks for the first line that contains ip_address followed by some space and a sequence of digits and dots
Wrapping the search in a block makes perl delete the lexical variable $fh. Because it is a file handle, that handle will also be automatically closed
Note that I've used autodie to avoid the need to explicitly check the status of the open call
This algorithm will find the first occurrence of ip_address and stop reading the file immediately
use strict;
use warnings 'all';
use autodie;
my $file = '/tmp/example';
my $ip;
{
open my $fh, '<', $file;
while ( <$fh> ) {
if ( /ip_address\h+([\d.]+)/ ) {
$ip = $1;
last;
}
}
}
print $ip // 'undef', "\n";
output
192.168.1.1

Store all ips in an array and you'll then have it for later processing.
The shown code can also be simplified a lot. This assumes a four-number ip and data like that shown in the sample
use warnings;
use strict;
use feature 'say';
my $file = '/tmp/example';
open my $fh, '<', $file or die "Can't open $file: $!";
my #ips;
while (<$fh>) {
if (my ($ip) = /ip_address\s*(\d+\.\d+\.\d+\.\d+)/) {
push #ips, $ip;
}
}
close $fh;
say for #ips;
Or, once you open the file, process all lines with a map
my #ips = map { /ip_address\s*(\d+\.\d+\.\d+\.\d+)/ } <$fh>;
The filehandle is here read in a list context, imposed by map, so all lines from the file are returned. The block in map applies to each in turn, and map returns a flattened list with results.
Some notes
Use three-argument open, it is better
Don't assign $_ to a variable. To work with a lexical use while (my $line = <$fh>)
You can use split but here regex is more direct and it allows you to assign its match so that it is scoped. If there is no match the if fails and nothing goes onto the array

use warnings;
use strict;
my $file = "test";
my ( $string,$ip);
open my $FH, "<",$file) or die $!;
while (my $lines = <FH>) {
if ($lines =~ m/address/){
($string, $ip) = (split ' ', $lines);
}
}
print "IP address is: $ip\n";
This will give you the output you needed. But fails in the case of multiple IP match lines in the input file overwrites the last $ip variable.

Print a variable which is inside two loops

I couldn't figure it out how to escape this.
I would like to print the variable $rfam_column, which is inside two loops. But I cannot just write the print command right after the place where $rfam_column appears, because I would like to print other things which will be outside the loop and combine them to the printed content.
I would appreciate any advice as to what I'm doing wrong here.
use warnings;
use strict;
my $in;
GetOptions('input' => \$in) or die;
if ( $in ) {
my $input = $ARGV[0] or die;
open (my $fh, '<', $input) or die "Can't open $input $!\n";
chomp (my #db_file = <$fh>);
close $fh;
my #list = grep /RNA/, #db_file;
my $column;
my #column = ();
foreach $column ( #list ) {
my #all_columns = split (/\t/, $column);
my $rfam_column = $all_columns[0];
# insert "|" between RFs
foreach $_ ( $rfam_column ) {
s/^/|/;
}
}
}
print "$rfam_column";
Global symbol "$rfam_column" requires explicit package name at script_vbeta.pl line 90.
Execution of script_vbeta.pl aborted due to compilation errors.
EDITED to include all the code and information of the input--output as suggested:
Input file is a table with n lines vs n columns like this (I extracted a few columns otherwise it would be much long to represent in a line):
RF00001 1302 5S ribosomal RNA
RF00006 1307 Vault RNA
RF00007 1308 U12 minor spliceosomal RNA
RF00008 1309 Hammerhead ribozyme (type III)
Output should be like this:
|RF00001|RF00006|RF00007
And the code (usage: script.pl -i input_file):
use warnings;
use strict;
use Getopt::Long;
Getopt::Long::Configure("pass_through");
my $in;
GetOptions('input' => \$in) or die;
if ( $in ) {
my $input = $ARGV[0] or die;
open (my $fh, '<', $input) or die "Can't open $input $!\n";
chomp (my #db_file = <$fh>);
close $fh;
my #list = grep /RNA/, #db_file;
my $column;
my #column = ();
foreach $column ( #list ) {
my #all_columns = split (/\t/, $column);
my $rfam_column = $all_columns[0];
# insert "|" between RFs
foreach $_ ( $rfam_column ) {
s/^/|/;
}
}
}
print "$rfam_column";

I think you want
if ($in) {
...
my #rfams;
for my $row (#list) {
my #fields = split(/\t/, $row);
my $rfam = $fields[0];
push #rfams, $rfam;
}
my $rfams = join('|', #rfams);
print("$rfams\n");
}

I would like to print other things which will be outside the loop and combine them to the $rfam_column content
You can include anything that is in an outer scope in print. You can just put your print statement inside the inner loop
By the way, I don't know what you mean by
# insert "|" between RFs
foreach $_ ($rfam_column) {
s/^/|/;
}
That is the same as
$rfam_column =~ s/^/|/;
which just adds a pipe | character to the beginning of the string
What is an RF?

Loop through file in perl and remove strings with less than 4 characters

I am trying to bring a file loop through it and remove any strings that have less than four characters in it and then print the list. I come from a javascript world and perl is brand new to me.
use strict;
use warnings;
sub lessThan4 {
open( FILE, "<names.txt" );
my #LINES = <FILE>;
close( FILE );
open( FILE, ">names.txt" );
foreach my $LINE ( #LINES ) {
print FILE $LINE unless ( $LINE.length() < 4 );
}
close( FILE );
}

use strict;
use warnings;
# automatically throw exception if open() fails
use autodie;
sub lessThan4 {
my #LINES = do {
# modern perl uses lexical, and three arg open
open(my $FILE, "<", "names.txt");
<$FILE>;
};
# remove newlines
chomp(#LINES);
open(my $FILE, ">", "names.txt");
foreach my $LINE ( #LINES ) {
print $FILE "$LINE\n" unless length($LINE) < 4;
# possible alternative to 'unless'
# print $FILE "$LINE\n" if length($LINE) >= 4;
}
close($FILE);
}

You're basically there. I hope you'll find some comments on your code useful.
# Well done for including these. So many new Perl users don't
use strict;
use warnings;
# Perl programs traditionally use all lower-case subroutine names
sub lessThan4 {
# 1/ You should use lexical variables for filehandles
# 2/ You should use the three-argument version of open()
# 3/ You should always check the return value from open()
open( FILE, "<names.txt" );
# Upper-case variable names in Perl are assumed to be global variables.
# This is a lexical variable, so name it using lower case.
my #LINES = <FILE>;
close( FILE );
# Same problems with open() here.
open( FILE, ">names.txt" );
foreach my $LINE ( #LINES ) {
# This is your biggest problem. Perl doesn't yet embrace the idea of
# calling methods to get properties of a variable. You need to call
# length() as a function.
print FILE $LINE unless ( $LINE.length() < 4 );
}
close( FILE );
}
Rewriting to take all that into account, we get the following:
use strict;
use warnings;
sub less_than_4 {
open( my $in_file_h, '<', 'names.txt' ) or die "Can't open file: $!";
my #lines = <$in_file_h>;
close( $in_file_h );
open( my $out_file_h, '>', 'names.txt' ) or die "Can't open file: $!";
foreach my $line ( #lines ) {
# Note: $line will include the newline character, so you might need
# to increase 4 to 5 here
print $out_file_h $line unless length $line < 4;
}
close( $out_file_h );
}

I am trying to bring a file loop through it and remove any strings that have less than four characters in it and then print the list.
I suppose you need to remove strings from the file which are less than 4 chars in length.
#!/usr/bin/perl
use strict;
use warnings;
open ($FH, "<", "names.txt");
my #final_list;
while (my $line = <$FH>) {
map {
length($_) > 4 and push (#final_list, $_) ;
} split (/\s/, $line);
}
print "\nWords with more than 4 chars: #final_list\n";

#Please try this one:
use strict;
use warnings;
my #new;
while(<DATA>)
{
#Push all the values less than 4 characters
push(#new, $_) unless(length($_) > '4');
}
print #new;
__DATA__
Williams
John
Joe
Lee
Albert
Francis
Sun

Can i collect the output of find(\&wanted, #directories) in an array

I am writing a script which will traverse the directory(including subdir also) and push the desired file in an array so that i can work on each file.
Here is my code:
use strict;
use warnings;
use File::Find;
my $path = $ARGV[0];
find({ wanted => \&GetappropriateFile }, $path);
sub GetappropriateFile
{
my $file = $_;
my #all_file;
# print "$file\n";
if ( -f and /traces[_d+]/)
{
#print "$file\n";
open(my $fh, "<", $file) or die "cannot open file:$!\n";
while( my $line = <$fh>){
$line =~ /Cmd\sline:\s+com.android*/;
push(#all_file,$file);
#print "$file\n";
}
close($fh);
#print"#all_file\n";
}
}
Problem Area : my $file = $_;
Instead of using " $file" if i could get a way to use an array here then i can easily read those files one by one and filter it.
Here what i am tring to do is : I have to open each file and check for the string "Cmd line: com.android" as soon as i get this string in the file i have to push this current file in an array and start reading the another file.

It would be better to avoid global vars.
use strict;
use warnings;
use File::Find qw( find );
sub IsAppropriateFile {
my ($file) = #_;
if (-f $file && $file =~ /traces[_d+]/) {
open(my $fh, "<", $file) or die "cannot open file:$!\n";
while ( my $line = <$fh> ) {
if ($line =~ /Cmd\sline:\s+com.android*/) {
return 1;
}
}
}
return 0;
}
{
my $path = $ARGV[0];
my #matching_files;
find({
wanted => sub {
push #matching_files, $_ if IsAppropriateFile($_);
},
}, $path);
print("$_\n") for #matching_files; # Or whatever.
}

Put declaration of #all_file outside of function, and use it after find() finishes,
my #all_file;
sub GetappropriateFile
{
..
}
You could also stop with file reading after successful match,
if ($line =~ /Cmd\sline:\s+com.android*/) {
push(#all_file, $file);
last;
}

Perl - empty rows while writing CSV from Excel

I want to convert excel-files to csv-files with Perl. For convenience I like to use the module File::Slurp for read/write operations. I need it in a subfunction.
While printing out to the screen, the program generates the desired output, the generated csv-files unfortunately just contain one row with semicolons, field are empty.
Here is the code:
#!/usr/bin/perl
use File::Copy;
use v5.14;
use Cwd;
use File::Slurp;
use Spreadsheet::ParseExcel;
sub xls2csv {
my $currentPath = getcwd();
my #files = <$currentPath/stage0/*.xls>;
for my $sourcename (#files) {
print "Now working on $sourcename\n";
my $outFile = $sourcename;
$outFile =~ s/xls/csv/g;
print "Output CSV-File: ".$outFile."\n";
my $source_excel = new Spreadsheet::ParseExcel;
my $source_book = $source_excel->Parse($sourcename)
or die "Could not open source Excel file $sourcename: $!";
foreach my $source_sheet_number ( 0 .. $source_book->{SheetCount} - 1 )
{
my $source_sheet = $source_book->{Worksheet}[$source_sheet_number];
next unless defined $source_sheet->{MaxRow};
next unless $source_sheet->{MinRow} <= $source_sheet->{MaxRow};
next unless defined $source_sheet->{MaxCol};
next unless $source_sheet->{MinCol} <= $source_sheet->{MaxCol};
foreach my $row_index (
$source_sheet->{MinRow} .. $source_sheet->{MaxRow} )
{
foreach my $col_index (
$source_sheet->{MinCol} .. $source_sheet->{MaxCol} )
{
my $source_cell =
$source_sheet->{Cells}[$row_index][$col_index];
if ($source_cell) {
print $source_cell->Value, ";"; # correct output!
write_file( $outFile, { binmode => ':utf8' }, $source_cell->Value, ";" ); # only one row of semicolons with empty fields!
}
}
print "\n";
}
}
}
}
xls2csv();
I know it has something to do with the parameter passing in the write_file function, but couldn't manage to fix it.
Has anybody an idea?
Thank you very much in advance.

write_file will overwrite the file unless the append => 1 option is given. So this:
write_file( $outFile, { binmode => ':utf8' }, $source_cell->Value, ";" );
Will write a new file for each new cell value. It does however not match your description of "only one row of semi-colons of empty fields", as it should only be one semi-colon, and one value.
I am doubtful towards this sentiment from you: "For convenience I like to use the module File::Slurp". While the print statement works as it should, using File::Slurp does not. So how is that convenient?
What you should do, if you still want to use write_file is to gather all the lines to print, and then print them all at once at the end of the loop. E.g.:
$line .= $source_cell->Value . ";"; # use concatenation to build the line
...
push #out, "$line\n"; # store in array
...
write_file(...., \#out); # print the array
Another simple option would be to use join, or to use the Text::CSV module.

Well, in this particular case, File::Slurp was indeed complicating this for me. I just wanted to avoid to repeat myself, which I did in the following clumsy working solution:
#!/usr/bin/perl
use warnings;
use strict;
use File::Copy;
use v5.14;
use Cwd;
use File::Basename;
use File::Slurp;
use Tie::File;
use Spreadsheet::ParseExcel;
use open qw/:std :utf8/;
# ... other functions
sub xls2csv {
my $currentPath = getcwd();
my #files = <$currentPath/stage0/*.xls>;
my $fh;
for my $sourcename (#files) {
say "Now working on $sourcename";
my $outFile = $sourcename;
$outFile =~ s/xls/csv/gi;
if ( -e $outFile ) {
unlink($outFile) or die "Error: $!";
print "Old $outFile deleted.";
}
my $source_excel = new Spreadsheet::ParseExcel;
my $source_book = $source_excel->Parse($sourcename)
or die "Could not open source Excel file $sourcename: $!";
foreach my $source_sheet_number ( 0 .. $source_book->{SheetCount} - 1 )
{
my $source_sheet = $source_book->{Worksheet}[$source_sheet_number];
next unless defined $source_sheet->{MaxRow};
next unless $source_sheet->{MinRow} <= $source_sheet->{MaxRow};
next unless defined $source_sheet->{MaxCol};
next unless $source_sheet->{MinCol} <= $source_sheet->{MaxCol};
foreach my $row_index (
$source_sheet->{MinRow} .. $source_sheet->{MaxRow} )
{
foreach my $col_index (
$source_sheet->{MinCol} .. $source_sheet->{MaxCol} )
{
my $source_cell =
$source_sheet->{Cells}[$row_index][$col_index];
if ($source_cell) {
print $source_cell->Value, ";";
open( $fh, '>>', $outFile ) or die "Error: $!";
print $fh $source_cell->Value, ";";
close $fh;
}
}
print "\n";
open( $fh, '>>', $outFile ) or die "Error: $!";
print $fh "\n";
close $fh;
}
}
}
}
xls2csv();
I'm actually NOT happy with it, since I'm opening and closing the files so often (I have many files with many lines). That's not very clever in terms of performance.
Currently I still don't know how to use the split or Text:CSV in this case, in order to put everything into an array and to open, write and close each file only once.
Thank you for your answer TLP.