Modification of a script in Perl

Currently I have the following script:
#!/usr/bin/env perl
use strict;
use warnings;
my %seen;
my $header = <> . <>;
print $header;
my $last_sequence_number = 0;
open( my $output, ">", "output.$last_sequence_number.out" ) or die $!;
print {$output} $header;
$seen{$last_sequence_number}++;
while (<>) {
my ($key) = split;
next unless $key =~ m/^\d+$/;
my $sequence_number = int( $key / 1000 );
if ( not $sequence_number == $last_sequence_number ) {
print "Opening new file for $sequence_number\n";
close($output);
open( $output, ">", "output.$sequence_number.out" ) or die $!;
print {$output} $header unless $seen{$sequence_number}++;
$last_sequence_number = $sequence_number;
}
print {$output} $_;
}
The script splits a file into several files following the pattern file1, file2, ... Now I need to pass the script another parameter that specifies a prefix for the output files, so that if this additional input is 1, the output would be
1_file1, 1_file2, and so on. How could I do that?
I know that something like
use Getopt::Long;
could be used?
I tried this:
#!/usr/bin/env perl
use strict;
use warnings;
my %seen;
my $header = <> . <>;
print $header;
my ( $suffix, $filename ) = @ARGV;
open ( my $input, "<", $filename ) or die $!;
my $last_sequence_number = 0;
open( my $output, ">", "output.$last_sequence_number.out" ) or die $!;
print {$output} $header;
$seen{$last_sequence_number}++;
while (<$input>) {
my ($key) = split;
next unless $key =~ m/^\d+$/;
my $sequence_number = int( $key / 1000 );
if ( not $sequence_number == $last_sequence_number ) {
print "Opening new file for $sequence_number\n";
close($output);
open( $output, ">", "output.$sequence_number.out" ) or die $!;
print {$output} $header unless $seen{$sequence_number}++;
$last_sequence_number = $sequence_number;
}
print {$output} $_;
}
But that is not working. What is wrong?
I get
No such file or directory at ./spl.pl line 10, <> line 2.
after the header is printed.

As Sobrique says, your problem is the magical nature of <>. But I don't think it's as hard to deal with as he thinks.
The point is that <> looks at the current value of @ARGV. So you can add other command-line arguments as long as you make sure you have removed them from @ARGV before you use <> for the first time.
So change your code so that it starts like this:
my %seen;
my $prefix = shift;
my $header = <> . <>;
You can then call your program like this:
$ your_program.pl prefix_goes_here list of file names...
Everything else should now work the same as it currently does, but you have your prefix stored away in $prefix so that you can use it in your print statements.
I hope that's what you wanted. Your question isn't particularly clear.
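A runnable sketch of that approach, with the prefix used in the output file names. It is an illustration under assumptions, not the answerer's code: the "<prefix>_output.<n>.out" naming is hypothetical, the temporary file and the @ARGV assignment only simulate running your_program.pl 1 data.txt (drop them in a real script), and a bucket file is opened with >> after its first use so that revisiting a bucket appends instead of truncating it.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# --- demo setup only: simulates "your_program.pl 1 data.txt" ---
my $dir   = tempdir( CLEANUP => 1 );
my $input = "$dir/data.txt";
open my $fh, '>', $input or die "Can't write $input: $!";
print {$fh} "header 1\nheader 2\n1 a\n999 b\n1000 c\n2500 d\n";
close $fh;
@ARGV = ( '1', $input );

# Remove the prefix from @ARGV *before* the first <>,
# so that <> only ever sees file names.
my $prefix = shift @ARGV;

my %seen;
my $header = <> . <>;    # first two lines of the first input file
my $last_sequence_number = -1;
my $output;
while (<>) {
    my ($key) = split;
    next unless defined $key && $key =~ m/^\d+$/;
    my $sequence_number = int( $key / 1000 );
    if ( $sequence_number != $last_sequence_number ) {
        close $output if $output;
        # Hypothetical naming scheme: <prefix>_output.<n>.out
        my $name = "$dir/${prefix}_output.$sequence_number.out";
        # '>' the first time (and write the header), '>>' on later visits.
        my $mode = $seen{$sequence_number}++ ? '>>' : '>';
        open $output, $mode, $name or die "Can't open $name: $!";
        print {$output} $header if $mode eq '>';
        $last_sequence_number = $sequence_number;
    }
    print {$output} $_;
}
close $output if $output;
```

Because the prefix is shifted off before the first <>, both the header and the data lines come from data.txt only.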

I would do something like this.
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use Getopt::Long qw(:config bundling);
use Pod::Usage;
{
my $man = 0;
my $help = 0;
my $verbose = 0;
my $prefix = '';
my $suffix = '';
my $header_lines = 2;
my $bunch_size = 1000;
GetOptions(
'help|?' => \$help,
'man' => \$man,
'verbose|v+' => \$verbose,
'prefix|p=s' => \$prefix,
'suffix|s=s' => \$suffix,
'header|h=i' => \$header_lines,
'bunch|batch|bucket|b=i' => \$bunch_size
) or pod2usage(2);
pod2usage(1) if $help;
pod2usage( -exitval => 0, -verbose => 2 ) if $man;
pod2usage(
-exitval => 3,
-message => "Headers lines can't be negative number"
) if $header_lines < 0;
pod2usage(
-exitval => 4,
-message => "Bunch size has to be positive"
) unless $bunch_size > 0;
my $header = '';
$header .= <> for 1 .. $header_lines;
my %seen;
my $current_output_number = -1;
sub key2output { int( shift() / $bunch_size ) }
sub set_output {
my $output_number = shift;
if ( $output_number != $current_output_number ) {
my $seen = $seen{$output_number}++;
printf STDOUT "Opening %sfile for %d\n", $seen ? '' : 'new ',
$output_number
if $verbose;
open my $fh, $seen ? '>>' : '>',
$prefix . $output_number . $suffix;
select $fh;
print $header unless $seen;
$current_output_number = $output_number;
}
}
}
while (<>) {
my ($key) = /^(\d+)\s/;
next unless defined $key;
set_output( key2output($key) );
print;
}
__END__
=head1 NAME
code.pl - splits file by first number by thousands
=head1 SYNOPSIS
code.pl [options] [file ...]
Options:
--help brief help message
--man full documentation
--prefix output filename prefix
--suffix output filename suffix
--header number of header lines (default: 2)
=head1 OPTIONS
=over 8
=item B<--help>
Prints a brief help message and exits.
=item B<--man>
Prints the manual page and exits.
=back
=head1 DESCRIPTION
B<This program> will read the given input file(s) and do something
useful with the contents thereof.
=cut
Just finish the documentation and you can ship it to your colleagues.
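Why the file names left on the command line reach <> untouched in a script like this: GetOptions removes every option it recognises, together with its value, from @ARGV. A minimal sketch, where the @ARGV assignment only simulates running code.pl -p 1_ --suffix .txt data.txt more.txt:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Long qw(:config bundling);

# Simulates: code.pl -p 1_ --suffix .txt data.txt more.txt
@ARGV = ( '-p', '1_', '--suffix', '.txt', 'data.txt', 'more.txt' );

my $prefix = '';
my $suffix = '';
GetOptions(
    'prefix|p=s' => \$prefix,
    'suffix|s=s' => \$suffix,
) or die "Bad options\n";

# Only the non-option arguments are left, so a later <> reads just these files.
print "prefix=$prefix suffix=$suffix files=@ARGV\n";

# Output file name for bucket 0, built the same way set_output does it.
print "first output file: ${prefix}0$suffix\n";
```

This prints the parsed prefix and suffix followed by data.txt more.txt, and the bucket-0 file name 1_0.txt.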

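The problem you've got is that the diamond operator <> is a piece of special Perl magic.
It takes all the file names on the command line (the contents of @ARGV), opens them, and processes them in order, shifting each name off @ARGV as it goes. So your suffix is treated as a file name too, and by the time your code looks at @ARGV it is already empty.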
To do what you're trying to do:
my ( $suffix, $filename ) = @ARGV;
open ( my $input, "<", $filename ) or die $!;
Then you can change your while loop to:
while ( <$input> ) {
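Then change the output file name however you like. The key difference is that it will only take one file name at that point: the first argument is the suffix, the second is the file name. Read the header from $input as well (my $header = <$input> . <$input>;), because a <> before this point would already have consumed @ARGV.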
You could perhaps extend this with:
my ( $suffix, @names ) = @ARGV;
And then run a foreach loop:
foreach my $filename ( @names ) {
open .... #etc
}
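A fuller sketch of that loop, with each file opened explicitly and its header read from the file's own handle rather than from <>. Assumptions: the first argument is used as a prefix (as the question asks) rather than a suffix, the "<prefix>_output.<n>.out" naming is hypothetical, and the temporary files plus the @ARGV assignment only simulate split.pl 1 a.txt b.txt. For simplicity it reopens the output file for every line, which is fine for small inputs but slow for large ones.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# --- demo setup only: two small inputs, simulating "split.pl 1 a.txt b.txt" ---
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(a.txt b.txt)) {
    open my $fh, '>', "$dir/$name" or die "Can't write $dir/$name: $!";
    print {$fh} "h1\nh2\n10 x\n2000 y\n";
    close $fh;
}
@ARGV = ( '1', "$dir/a.txt", "$dir/b.txt" );

# First argument is the prefix; everything after it is an input file.
my ( $prefix, @names ) = @ARGV;
die "usage: $0 PREFIX FILE...\n" unless defined $prefix && @names;

my %seen;    # output buckets already created, shared across input files
foreach my $filename (@names) {
    open my $input, '<', $filename or die "Can't open $filename: $!";
    my $header = <$input> . <$input>;    # this file's two header lines
    while ( my $line = <$input> ) {
        my ($key) = split ' ', $line;
        next unless defined $key && $key =~ /^\d+$/;
        my $n = int( $key / 1000 );
        # Hypothetical naming scheme: <prefix>_output.<n>.out
        my $out_name = "$dir/${prefix}_output.$n.out";
        # Create the file (with header) once; append on every later visit.
        my $mode = $seen{$n}++ ? '>>' : '>';
        open my $out, $mode, $out_name or die "Can't open $out_name: $!";
        print {$out} $header if $mode eq '>';
        print {$out} $line;
        close $out;
    }
    close $input;
}
```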

Related

Perl script to compare today's partition usage with yesterday's on different hosts using a CSV file

I am new to Perl and struggling to build my first script, but the partition part doesn't work. More details are in the comments in the code. Maybe the code should be rewritten using a hash and references, but I have no idea how to do that. Can someone please help me?
#!/usr/bin/perl -w
#This script compares the today's result and yesterday's results between partitions for each host.
#The "df" output is stored in a CSV file. If any partitions have been changed (mounted/unmounted)
#the script must warn and must not compare the results for that host.
use warnings;
my $file = "/tmp/df.csv";
my $host = `hostname | awk -F'.' '{print \$1}'`;
my $yesterdays_date = `date -d "-1 day" '+%d.%m.%Y'`;
my $todays_date = `date '+%d.%m.%Y'`;
chomp ($host, $yesterdays_date, $todays_date);
open(HANDLE, "$file") or die "Error opening file $file: $!";
my @array = <HANDLE>;
foreach (@array){
@columns = split /,/;
if(/$host/ and /$todays_date/){
my $todays_result = $columns[5];chomp;
#my $todays_partition = $columns[6];chomp;
print "Today\'s disk usage on $host: $columns[5]\n";
#print "$todays_result <<T.Result T.Partition>> $todays_partition"
}
elsif(/$host/ and /$yesterdays_date/){
my $yesterdays_result = $columns[5];chomp;
#my $yesterdays_partition = $columns[6];chomp;
print "Yesterday\'s disk usage on $host: $columns[5]\n";
#print "$yesterdays_result <<Y.Result Y.Partition>> $yesterdays_partition";
}
#Debug: Print differences in mount point (condition must be "ne" instead of eq)
#if ($todays_partition eq $yesterdays_partition){
#print "$todays_partition <<Partition equal>> $yesterdays_partition";
#}
#else{
#print "Debug: Host or Date DIFFERENT or NOT FOUND for today and yesterday\n";
#}
#TO DO: print "The difference: $todays_result-$yesterdays_result", "\n";
};
close HANDLE;
The CSV file contains the following lines:
testhost,25.08.2018,100M,0,100M,0,/run/user/0
localhost,01.09.2018,6.7G,1.5G,5.2G,23,/
localhost,01.09.2018,485M,0,485M,0,/dev
localhost,01.09.2018,496M,4.0K,496M,1,/dev/shm
localhost,01.09.2018,496M,6.7M,490M,2,/run
localhost,01.09.2018,496M,0,496M,0,/sys/fs/cgroup
localhost,01.09.2018,497M,110M,387M,23,/boot
localhost,01.09.2018,100M,0,100M,0,/run/user/0
localhost,02.09.2018,6.7G,1.5G,5.2G,23,/
localhost,02.09.2018,485M,0,485M,0,/dev
localhost,02.09.2018,496M,4.0K,496M,1,/dev/shm
localhost,02.09.2018,496M,6.7M,490M,2,/run
localhost,02.09.2018,496M,0,496M,0,/sys/fs/cgroup
localhost,02.09.2018,497M,110M,387M,23,/boot
localhost,02.09.2018,100M,0,100M,0,/run/user/0
Bonus: Help with English grammar :D
You need information from multiple lines, so you will need some variables outside of the loop.
If the records in the CSV are ordered chronologically, you can use the following:
use strict;
use warnings;
use DateTime qw( );
use Sys::Hostname qw( hostname );
use Text::CSV_XS qw( );
my $qfn = "/tmp/df.csv";
open(my $fh, '<', $qfn)
or die("Can't open \"$qfn\": $!\n");
my $target_host = hostname =~ s/\..*//rs; # /
my $today_dt = DateTime->now( time_zone => "local" )->set_time_zone("floating")->truncate( to => "day" );
my $yday_dt = $today_dt->clone->subtract( days => 1 );
my $today = $today_dt->strftime("%d.%m.%Y");
my $yday = $yday_dt ->strftime("%d.%m.%Y");
my $csv = Text::CSV_XS->new({ auto_diag => 2, binary => 1 });
my %yday_usage_by_partition;
while ( my $row = $csv->getline($fh) ) {
my ($host, $date, $partition, $usage) = @$row[0,1,6,5];
next if $host ne $target_host;
if ($date eq $yday) {
$yday_usage_by_partition{$partition} = $usage;
}
elsif ($date eq $today) {
if (!exists($yday_usage_by_partition{$partition})) {
warn("No data for $yday for partition $partition\n");
next;
}
print("$partition: $yday_usage_by_partition{$partition} -> $usage\n");
}
}
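If you'd rather avoid the CPAN modules, here is a core-Perl sketch that also covers the two things the question's comments ask for: warning (and not comparing) when the set of mount points changed, and printing the difference. Assumptions: a plain split /,/ is enough (no quoted commas in the df CSV), the sixth column (Use%) is the value compared, and the dates are passed in; for real runs you could build them with POSIX::strftime('%d.%m.%Y', localtime) and localtime(time - 86400), which can be off around DST changes. The sample rows are copied from the question.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Compare yesterday's and today's Use% per mount point for one host.
# Column layout (from the question): host,date,size,used,avail,use%,mount
sub compare_usage {
    my ( $lines, $host, $yday, $today ) = @_;
    my ( %yday_usage, %today_usage );
    for my $line (@$lines) {
        chomp( my $row = $line );
        my ( $h, $date, $usage, $mount ) = ( split /,/, $row )[ 0, 1, 5, 6 ];
        next unless defined $mount && $h eq $host;
        if    ( $date eq $yday )  { $yday_usage{$mount}  = $usage }
        elsif ( $date eq $today ) { $today_usage{$mount} = $usage }
    }

    # Partitions mounted/unmounted in between: warn and skip this host.
    my $yday_mounts  = join ',', sort keys %yday_usage;
    my $today_mounts = join ',', sort keys %today_usage;
    if ( $yday_mounts ne $today_mounts ) {
        warn "Partitions on $host changed between $yday and $today; not comparing\n";
        return;
    }

    my @report;
    for my $mount ( sort keys %today_usage ) {
        my $diff = $today_usage{$mount} - $yday_usage{$mount};
        push @report, "$mount: $yday_usage{$mount} -> $today_usage{$mount} (difference: $diff)";
    }
    return @report;
}

# A few rows copied from the question's CSV.
my @sample = (
    "testhost,25.08.2018,100M,0,100M,0,/run/user/0\n",
    "localhost,01.09.2018,6.7G,1.5G,5.2G,23,/\n",
    "localhost,01.09.2018,497M,110M,387M,23,/boot\n",
    "localhost,02.09.2018,6.7G,1.5G,5.2G,23,/\n",
    "localhost,02.09.2018,497M,110M,387M,23,/boot\n",
);
print "$_\n" for compare_usage( \@sample, 'localhost', '01.09.2018', '02.09.2018' );
```

With these rows it prints one line per mount point ("/" and "/boot"), each with a difference of 0.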

Print a variable which is inside two loops

I couldn't figure out how to get around this.
I would like to print the variable $rfam_column, which is inside two loops. But I can't just put the print command right after the place where $rfam_column appears, because I want to print other things from outside the loop and combine them with the printed content.
I would appreciate any advice as to what I'm doing wrong here.
use warnings;
use strict;
my $in;
GetOptions('input' => \$in) or die;
if ( $in ) {
my $input = $ARGV[0] or die;
open (my $fh, '<', $input) or die "Can't open $input $!\n";
chomp (my @db_file = <$fh>);
close $fh;
my @list = grep /RNA/, @db_file;
my $column;
my @column = ();
foreach $column ( @list ) {
my @all_columns = split (/\t/, $column);
my $rfam_column = $all_columns[0];
# insert "|" between RFs
foreach $_ ( $rfam_column ) {
s/^/|/;
}
}
}
print "$rfam_column";
Global symbol "$rfam_column" requires explicit package name at script_vbeta.pl line 90.
Execution of script_vbeta.pl aborted due to compilation errors.
EDIT: added all the code and a sample of the input and output, as suggested:
The input file is a table with n rows and n columns like this (I extracted only a few columns; otherwise each line would be too long to show here):
RF00001 1302 5S ribosomal RNA
RF00006 1307 Vault RNA
RF00007 1308 U12 minor spliceosomal RNA
RF00008 1309 Hammerhead ribozyme (type III)
Output should be like this:
|RF00001|RF00006|RF00007
And the code (usage: script.pl -i input_file):
use warnings;
use strict;
use Getopt::Long;
Getopt::Long::Configure("pass_through");
my $in;
GetOptions('input' => \$in) or die;
if ( $in ) {
my $input = $ARGV[0] or die;
open (my $fh, '<', $input) or die "Can't open $input $!\n";
chomp (my @db_file = <$fh>);
close $fh;
my @list = grep /RNA/, @db_file;
my $column;
my @column = ();
foreach $column ( @list ) {
my @all_columns = split (/\t/, $column);
my $rfam_column = $all_columns[0];
# insert "|" between RFs
foreach $_ ( $rfam_column ) {
s/^/|/;
}
}
}
print "$rfam_column";
I think you want something like this:
if ($in) {
...
my @rfams;
for my $row (@list) {
my @fields = split(/\t/, $row);
my $rfam = $fields[0];
push @rfams, $rfam;
}
my $rfams = join('|', @rfams);
print("$rfams\n");
}
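A self-contained version of that idea, which also produces the leading | from the desired output (join('|', ...) on its own gives RF00001|RF00006|RF00007). The key point is that @rfams is declared before the loop, so it is still in scope when you print after the loop. The sample rows are the ones from the question, assumed to be tab-separated because the question splits on \t; in the real script you would open $ARGV[0] instead of the in-memory string, and rfam_ids is just an illustrative name.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Collect the first column of every row mentioning "RNA"; return "|ID1|ID2|...".
sub rfam_ids {
    my ($fh) = @_;
    my @rfams;    # declared outside the loop, so it is still visible after it
    while ( my $row = <$fh> ) {
        chomp $row;
        next unless $row =~ /RNA/;
        my ($rfam) = split /\t/, $row;    # first tab-separated field
        push @rfams, $rfam;
    }
    return join '', map { "|$_" } @rfams;
}

# Sample rows from the question (assumed tab-separated).
my $sample = "RF00001\t1302\t5S ribosomal RNA\n"
           . "RF00006\t1307\tVault RNA\n"
           . "RF00007\t1308\tU12 minor spliceosomal RNA\n"
           . "RF00008\t1309\tHammerhead ribozyme (type III)\n";
open my $in, '<', \$sample or die "Can't read sample: $!";
my $line = rfam_ids($in);
print "$line\n";    # prints |RF00001|RF00006|RF00007
```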
"I would like to print other things which will be outside the loop and combine them to the $rfam_column content"
You can include anything that is in an outer scope in a print statement, so you can simply put your print statement inside the inner loop.
By the way, I don't know what you mean by
# insert "|" between RFs
foreach $_ ($rfam_column) {
s/^/|/;
}
That is the same as
$rfam_column =~ s/^/|/;
which just adds a pipe (|) character to the beginning of the string.
What is an RF?

What produces the whitespace in my Perl program?

As the title says, I have a program, or rather two functions, that read a file into an array and write an array to a file. But now to the main reason I am writing: when I run my test several times, the test program that exercises these functions produces more and more whitespace. Could somebody explain my mistake and correct me?
My code:
Helper.pm:
#!/usr/bin/env perl
package KconfCtl::Helper;
sub file_to_array($) {
my $file = shift();
my ( $filestream, $string );
my @rray;
open( $filestream, $file ) or die("cant open $file: $!");
@rray = <$filestream>;
close($filestream);
return @rray;
}
sub array_to_file($$;$) {
my @rray = @{ shift() };
my $file = shift();
my $mode = shift();
$mode='>' if not $mode;
my $filestream;
if ( not defined $file ) {
$filestream = STDOUT;
}
else {
open( $filestream, $mode, $file ) or die("cant open $file: $!");
}
my $l = @rray; print $l,"\n";
foreach my $line (@rray) {
print $filestream "$line\n";
}
close($filestream);
}
1;
test_helper.pl:
use KconfCtl::Helper;
use strict;
my @t;
@t= KconfCtl::Helper::file_to_array("kconf.test");
#print @t;
my $t_index=@t;
@t[$t_index]="n";
KconfCtl::Helper::array_to_file(\@t, "kconf.test", ">");
The result after the first run:
n
and after the second run:
n
n
When you read from a file, the data includes the newline characters at the end of each line. You're not stripping those off, but you are adding an additional newline when you output your data again. That means your file gains additional blank lines each time you read and write it.
Also, you must always use strict and use warnings 'all' at the top of every Perl script; you should avoid using subroutine prototypes; and you should declare all of your variables as late as possible.
Here's a more idiomatic version of your module code, which removes the newlines on input using chomp. Note that you don't need the #! line in the module file, as it won't be run from the command line, but you may want it in the program file. It's also more usual to export symbols from a module using the Exporter module, so that you don't have to qualify the subroutine names by prefixing them with the full package name.
use strict;
use warnings 'all';
package KconfCtl::Helper;
sub file_to_array {
my ($file) = @_;
open my $fh, '<', $file or die qq{Can't open "$file" for input: $!}; #'
chomp(my @array = <$fh>);
return @array;
}
sub array_to_file {
my ($array, $file, $mode) = @_;
$mode //= '>';
my $fh;
if ( $file ) {
open $fh, $mode, $file or die qq{Can't open "$file" for output: $!}; #'
}
else {
$fh = \*STDOUT;
}
print $fh $_, "\n" for @$array;
}
1;
and your test program would look like this:
#!/usr/bin/env perl
use strict;
use warnings 'all';
use KconfCtl::Helper;
use constant FILE => 'kconf.test';
my @t = KconfCtl::Helper::file_to_array(FILE);
push @t, 'n';
KconfCtl::Helper::array_to_file(\@t, FILE);
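A single-file sketch that checks the fix: the module part chomps on input and uses Exporter (as suggested above) so the subroutines can be called without the package name, and the program part simulates running the test three times against a temporary kconf.test. In a real layout the first half would live in KconfCtl/Helper.pm and the import line would be use KconfCtl::Helper qw(file_to_array array_to_file);. Simplification: this array_to_file always writes to a file (no STDOUT fallback).

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# ---- module part (normally KconfCtl/Helper.pm) ----
package KconfCtl::Helper;
use Exporter 'import';
our @EXPORT_OK = qw(file_to_array array_to_file);

sub file_to_array {
    my ($file) = @_;
    open my $fh, '<', $file or die qq{Can't open "$file" for input: $!};
    chomp( my @array = <$fh> );    # drop the "\n" each line was read with
    return @array;
}

sub array_to_file {
    my ( $array, $file, $mode ) = @_;
    $mode //= '>';
    open my $fh, $mode, $file or die qq{Can't open "$file" for output: $!};
    print {$fh} "$_\n" for @$array;    # add exactly one "\n" back per element
    close $fh or die qq{Can't close "$file": $!};
}

# ---- program part ----
package main;
use File::Temp qw(tempdir);
KconfCtl::Helper->import(qw(file_to_array array_to_file));

my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/kconf.test";
array_to_file( [], $file );    # start from an empty file

# "Run the test" three times: read, append 'n', write back.
for ( 1 .. 3 ) {
    my @t = file_to_array($file);
    push @t, 'n';
    array_to_file( \@t, $file );
}

my @final = file_to_array($file);
print scalar(@final), " lines: @final\n";    # 3 lines: n n n (no blank lines)
```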
When you read in from your file, you need to chomp() the lines, or else the \n at the end of the line is included.
Try this and you'll see what's happening:
use Data::Dumper; ## add this line
sub file_to_array($) {
my $file = shift();
my ( $filestream, $string );
my @rray;
open( $filestream, '<', $file ) or die("cant open $file: $!");
@rray = <$filestream>;
close($filestream);
print Dumper( \@rray ); ### add this line
return #rray;
}
You can add
foreach(@rray){
chomp();
}
to your module to stop this from happening.

Output .resx from .cs using a Perl script

The .cs file contains strings within double quotes, and I am trying to extract these strings into a .resx file.
The existing code outputs the .resx file, but with only one string, whereas the .cs file contains more than one quoted string.
Can you please point me to a way to achieve this?
use strict;
use warnings;
use File::Find;
use XML::Writer;
use Cwd;
#user input: [Directory]
my $wrkdir = getcwd;
system "attrib -r /s";
print "Processing $wrkdir\n";
find( \&recurse_src_path, $wrkdir );
sub recurse_src_path
{
my $file = $File::Find::name;
my $fname = $_;
my @lines;
my $line;
if ( ( -f $file ) && ( $file =~ /.*\.cs$/i ) )
{
print "..";
open( FILE, $file ) || die "Cannot open $file:\n$!";
while ( $line = <FILE> )
{
if ( $line =~ s/\"(.*?)\"/$1/m )
{
chomp $line;
push( @lines, $line );
my $nl = '0';
my $dataIndent;
my $output = new IO::File(">Test.resx");
#binmode( $output, ":encoding(utf-8)" );
my $writer = XML::Writer->new(
OUTPUT => $output,
DATA_MODE => 1,
DATA_INDENT => 2
);
$writer->xmlDecl("utf-8");
$writer->startTag('root');
foreach my $r ($line)
{
print "$1\n";
$writer->startTag( 'data', name => $_ );
$writer->startTag('value');
$writer->characters($1);
$writer->endTag('value');
$writer->startTag('comment');
$writer->characters($1);
$writer->endTag('comment');
$writer->endTag('data');
}
$writer->endTag('root');
$writer->end;
$output->close();
}
}
close FILE;
}
}
Use the /g regex modifier. For example:
use strict;
use warnings;
my $cs_string = '
// Imagine this is .cs code here
system "attrib -r /s";
print "Processing $wrkdir\n";
find( \&recurse_src_path, $wrkdir );
';
while ($cs_string =~ /\"(.*?)\"/g) { # non-greedy .*? stops at the next closing quote
print "Found quoted string: '$1'\n";
}
See also: http://perldoc.perl.org/perlrequick.html#Matching-repetitions
You might also want to look at File::Slurp to read your .cs code into a single Perl scalar, assuming your .cs file is not too large.
Finally, combine this with your existing code to get the .resx output format.
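Putting the pieces together, a sketch (not the answerer's code) of the two changes the question's script needs: capture every quoted string with a /g loop and the non-greedy (.*?), and create the XML::Writer and the output file once, outside the per-line loop, instead of reopening Test.resx for every match (which is why only one string survived). Assumptions: the String1, String2, ... names are hypothetical keys, escaped quotes (\") inside C# strings are not handled, and the demo scans an in-memory snippet and writes to a temporary directory. With File::Find you would push each file's strings onto one array inside recurse_src_path and call write_resx once after find returns.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use IO::File;       # so method calls like ->print work on the lexical filehandle
use XML::Writer;

# Every "double-quoted" string in a chunk of C# source.
# (.*?) is non-greedy, so two strings on one line are captured separately.
sub quoted_strings {
    my ($source) = @_;
    my @found;
    while ( $source =~ /"(.*?)"/g ) {
        push @found, $1;
    }
    return @found;
}

# Write all strings into ONE .resx-style file, opened only once.
sub write_resx {
    my ( $path, @strings ) = @_;
    open my $out, '>:encoding(UTF-8)', $path or die "Can't write $path: $!";
    my $writer = XML::Writer->new( OUTPUT => $out, DATA_MODE => 1, DATA_INDENT => 2 );
    $writer->xmlDecl('utf-8');
    $writer->startTag('root');
    my $i = 0;
    for my $s (@strings) {
        # Hypothetical key scheme: String1, String2, ...
        $writer->startTag( 'data', name => 'String' . ++$i );
        $writer->dataElement( 'value',   $s );
        $writer->dataElement( 'comment', $s );
        $writer->endTag('data');
    }
    $writer->endTag('root');
    $writer->end;
    close $out or die "Can't close $path: $!";
}

# Demo: a small C#-like snippet with two strings on the first line.
my $cs = qq{var a = "Hello"; var b = "World";\nLog("Processing");\n};
my @strings = quoted_strings($cs);
print "$_\n" for @strings;

my $dir = tempdir( CLEANUP => 1 );
write_resx( "$dir/Test.resx", @strings );
```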

How can I delete the last 10 lines of a file in Perl?

I take a number of lines as user input, and then I want to delete that many lines from the end of the file.
I saw this learn.perl.org/faq/perlfaq5.html#How-do-I-count-the-number-of-lines-in-a-file- and then I tried the simple logic below.
Logic:
Get the total number of lines
Subtract the number entered by the user from it
Print the lines up to that point
Here is my code:
#!/usr/bin/perl -w
use strict;
open IN, "<", "Delete_line.txt"
or die " Can not open the file $!";
open OUT, ">", "Update_delete_line.txt"
or die "Can not write in the file $!";
my ($total_line, $line, $number, $printed_line);
print"Enter the number of line to be delete\n";
$number = <STDIN>;
while ($line = <IN>) {
$total_line = $.; # Total number of line in the file
}
$printed_line = $total_line - $number;
while ($line = <IN>) {
print OUT $line unless $.== $printed_line;
}
I am not getting any error from the code, but I am not getting any output either, and I just don't know why.
Can anyone give me some suggestions?
A Perl solution that's efficient for large files requires the use of File::ReadBackwards
use File::ReadBackwards qw( );
my $num_lines = 10;
my $qfn = 'file.txt';
my $pos = do {
my $fh = File::ReadBackwards->new($qfn)
or die $!;
$fh->readline() for 1..$num_lines;
$fh->tell()
};
truncate($qfn, $pos)
or die $!;
This does not read the whole file twice (unlike the OP's method).
This does not read the whole file (unlike the Tie::File solutions).
This does not read the whole file into memory.
Yet another way is to use Tie::File
#!/usr/bin/env perl
use strict;
use warnings;
use Tie::File;
tie my @lines, 'Tie::File', 'myfile' or die "$!\n";
$#lines -= 10;
untie @lines;
This has the advantage of not loading the file into memory while acting like it does.
Here is a solution that passes through a stream and prints all but the last n lines, where n is a command-line argument:
#!/usr/bin/perl
my @cache;
my $n = shift @ARGV;
while(<>) {
push @cache, $_;
print shift @cache if @cache > $n;
}
or the one-liner version:
perl -ne'BEGIN{$n=shift@ARGV}push@c,$_;print shift@c if@c>$n' NUMBER
After finishing reading from IN, you have to reopen it or seek IN, 0, 0 to reset its position. You also have to set $. to zero again.
Also, the final condition should be changed to unless $. > $printed_line so you skip all the lines over the threshold.
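Those two fixes, applied to the question's two-pass approach, might look like this sketch. The sub name copy_all_but_last is hypothetical, and the temporary 5-line file stands in for Delete_line.txt; interactively you would read the count with chomp( my $number = <STDIN> ); and pass the question's file names.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Copy all but the last $number lines of $in_name to $out_name, in two passes.
sub copy_all_but_last {
    my ( $in_name, $out_name, $number ) = @_;
    open my $in,  '<', $in_name  or die "Can not open $in_name: $!";
    open my $out, '>', $out_name or die "Can not write $out_name: $!";

    1 while <$in>;                 # pass 1: just count the lines
    my $total_lines = $.;

    seek $in, 0, 0 or die "Can't rewind $in_name: $!";    # back to the start...
    $. = 0;                                               # ...and reset the line counter

    my $printed_line = $total_lines - $number;
    while ( my $line = <$in> ) {   # pass 2: skip every line past the threshold
        print {$out} $line unless $. > $printed_line;
    }
    close $out or die "Can't close $out_name: $!";
    close $in;
}

# Demo with a temporary 5-line file, dropping the last 2 lines.
my $dir = tempdir( CLEANUP => 1 );
open my $fh, '>', "$dir/in.txt" or die "Can't write: $!";
print {$fh} "line $_\n" for 1 .. 5;
close $fh;
copy_all_but_last( "$dir/in.txt", "$dir/out.txt", 2 );
```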
The "more fun" answer: use Tie::File!
use strict;
use warnings;
use Tie::File;
tie my @file, 'Tie::File', 'filename' or die "$!";
$#file -= 10;
Just read the file into an array and splice off the last n lines:
open my $filehandle, "<", "info.txt" or die "Can't open info.txt: $!";
my @file = <$filehandle>;
splice(@file, -10);
print @file;
Note: This loads the entire file into memory.
You could keep a buffer of 10 lines and print each line only when it falls out of the buffer, so the last 10 lines are never printed:
use English qw<$INPLACE_EDIT>;
{ local @ARGV = $name_of_file_to_edit;
local $INPLACE_EDIT = '.bak';
my @buffer;
for ( 1..$num_lines_to_trim ) {
push @buffer, scalar <>; # scalar: read ONE line; in list context <> would read them all
}
while ( <> ) {
print shift @buffer;
push @buffer, $_;
}
}
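You could also do this with File::Slurp's edit_file_lines, which runs the block once per line with the line in $_ and writes back whatever is left in $_:
use File::Slurp qw(edit_file_lines);
my @buffer;
my $limit_reached = 0;
edit_file_lines {
push @buffer, $_;
$_ = ( $limit_reached ||= @buffer > $num_lines_to_trim ) ? shift @buffer
: ''
;
} $name_of_file;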
my $num_lines = 10;
my $qfn = 'file.txt';
system('head', '-n', -$num_lines, '--', $qfn);
die "Error" if $?;
Easy with a C-style for loop:
#!/usr/bin/perl -w
use strict;
open(my $in,"<","Delete_line.txt") or die "Can not open the file $!";
open(my $out,">","Update_delete_line.txt") or die"Can not write in the file $!";
print"Enter the number of lines to be delete\n";
my $number=<STDIN>;
my @file = <$in>;
for (my $i = 0; $i < $#file - $number + 1; $i++) {
print $out $file[$i];
}
close $in;
close $out;
#
# Reads a file, trims the given number of lines from its top and bottom,
# and returns the result as a string
# stolen from : http://stackoverflow.com/a/9330343/65706
# usage :
# my $StrCatFile = $objFileHandler->ReadFileReturnTrimmedStrAtTopBottom (
# $FileToCat , $NumOfRowsToRemoveAtTop , $NumOfRowsToRemoveAtBottom) ;
sub ReadFileReturnTrimmedStrAtTopBottom {
my $self = shift ;
my $file = shift ;
my $NumOfLinesToRemoveAtTop = shift ;
my $NumOfLinesToRemoveAtBottom = shift ;
my @cache ;
my $StrTmp = '' ;
my $StrReturn = '' ;
my $fh = () ;
open($fh, "<", "$file") or cluck ( "can't open file : $file for reading: $!" ) ;
my $counter = 0;
while (<$fh>) {
if ($. >= $NumOfLinesToRemoveAtTop + 1) {
$StrTmp .= $_ ;
}
}
close $fh;
my $sh = () ;
open( $sh, "<", \$StrTmp) or cluck( "can't open string : $StrTmp for reading: $!" ) ;
while(<$sh>) {
push ( @cache, $_ ) ;
$StrReturn .= shift @cache if @cache > $NumOfLinesToRemoveAtBottom;
}
close $sh ;
return $StrReturn ;
}
#eof ReadFileReturnTrimmedStrAtTopBottom
#