How to group the values in foreach by if condition? - perl

My script looks like this:
use warnings;
use strict;
my @ar = <DATA>;
for(my $i = 0; $i<=$#ar; $i++){
$ar[$i] =~m/(\d+)$/g;
print "$ar[$i]\n" if ($& <= 15);
print "$ar[$i]\n" if ($& >100);
print "$ar[$i]\n" if ($& <40 && $& > 15);
}
__DATA__
hinsa 121
mkzin 12
mkva 34
mvakine 2
mzkev 9
mkvvz 5
mkhvzz 35
It gives the output, but it does not group the values by the if conditions. I also tried this:
@ar = <DATA>;
for(my $i = 0; $i<=$#ar; $i++){
$ar[$i] =~m/(\d+)$/g;
print "$ar[$i]\n" if ($& <= 15);
}
for(my $v = 0; $v<=$#ar; $v++){
$ar[$v] =~m/(\d+)$/g;
print "$ar[$v]\n" if ($& >100);
}
for(my $z = 0; $z<=$#ar; $z++){
$ar[$z] =~m/(\d+)$/g;
print "$ar[$z]\n" if ($& <40 && $& > 15);
}
In this code, the second for loop's condition is not working.
It gives the output:
mkzin 12
mvakine 2
mzkev 9
mkvvz 5
mkva 34
mkhvzz 35
The output I expect is:
mkzin 12
mvakine 2
mzkev 9
mkvvz 5
hinsa 121
mkva 34
mkhvzz 35
How can I do it?
Please also explain why, in my second script, the second for loop's condition is not working.

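@Hussain: When you write Perl code, make sure that you use use strict; and use warnings;. I have modified your Perl code. The problem with your code is that you are comparing an uninitialized $& with a number, so Perl warns "Use of uninitialized value $& in numeric gt (>) at ...". $& is uninitialized there because m/(\d+)$/g in scalar context records where each match stopped (pos()) on every array element. In the second loop, each match resumes after the number and fails, which resets pos() and leaves $& with no successful match to report. The third loop then starts from the beginning of each string again and works. To avoid this, I store the match in a scalar variable, as shown below:
Input file (test.txt):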
hinsa 121
mkzin 12
mkva 34
mvakine 2
mzkev 9
mkvvz 5
mkhvzz 35
Code:
use strict;
use warnings;
#Pass test.txt as an argument to the program
my $file = $ARGV[0];
open (my $fh, "<", $file) || die "Can't open $file: $!";
my @ar = <$fh>;
chomp @ar;    # remove newlines so each print below ends with exactly one "\n"
for(my $i = 0; $i<=$#ar; $i++){
my $temp = 0;
($temp) = $ar[$i] =~ m/(\d+)/g;
print "$ar[$i]\n" if ($temp <= 15);
}
for(my $v = 0; $v<=$#ar; $v++){
my $temp = 0;
($temp) = $ar[$v] =~ m/(\d+)/g;
print "$ar[$v]\n" if ($temp > 100);
}
for(my $z = 0; $z<=$#ar; $z++){
my $temp = 0;
($temp) = $ar[$z] =~ m/(\d+)/g;
print "$ar[$z]\n" if ($temp <40 && $temp > 15);
}
close($fh);
Output:
mkzin 12
mvakine 2
mzkev 9
mkvvz 5
hinsa 121
mkva 34
mkhvzz 35
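To see why only the second loop in the question misbehaves, here is a small standalone sketch. The sample string and variable names are made up. In scalar context, m//g remembers where it stopped in the string via pos(). The next m//g on the same string resumes from there, and when that attempt fails, pos() is reset, so the attempt after it succeeds again. That matches the question's symptom: loop one sets pos() on every element, loop two fails on every element, and loop three works.

```perl
use strict;
use warnings;

my $line = "mkzin 12\n";    # hypothetical sample line

# Attempt 1: succeeds and leaves pos($line) just after "12"
my $first = ($line =~ m/(\d+)$/g) ? $1 : undef;
my $pos_after_first = pos($line);

# Attempt 2: resumes at pos($line), finds no digits there, fails and resets pos()
my $second = ($line =~ m/(\d+)$/g) ? $1 : undef;

# Attempt 3: pos() was reset by the failure, so the match succeeds again
my $third = ($line =~ m/(\d+)$/g) ? $1 : undef;

printf "first=%s pos=%d second=%s third=%s\n",
    $first, $pos_after_first, $second // 'undef', $third;
# prints: first=12 pos=8 second=undef third=12
```

Dropping the /g avoids the problem, since m/(\d+)$/ needs no /g to find one number. A match without /g always starts at the beginning of the string.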

There is no need for such convoluted code.
This program works by saving each line of the file into the appropriate element of the array @groups, and printing the contents once the file has been read.
I hope you realise that lines with a value between 40 and 100 won't be printed at all?
use strict;
use warnings;
my @groups;
while (<DATA>) {
next unless /(\d+)/;
my $i;
$i = 0 if $1 <= 15;
$i = 1 if $1 > 100;
$i = 2 if $1 < 40 and $1 > 15;
push @{ $groups[$i] }, $_ if defined $i;
}
for (@groups) {
print for @$_;
print "\n";
}
__DATA__
hinsa 121
mkzin 12
mkva 34
mvakine 2
mzkev 9
mkvvz 5
mkhvzz 35
output
mkzin 12
mvakine 2
mzkev 9
mkvvz 5
hinsa 121
mkva 34
mkhvzz 35
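If the ranges are likely to change, one option is to keep them in an ordered table and let a loop pick the group. This is a sketch, not the answer above. The labels small/large/middle and the sub name group_lines are hypothetical, while the ranges are the question's. Lines that fall in no range (such as 40 to 100) are dropped, as before.

```perl
use strict;
use warnings;

# Ordered group rules: [label, test on the number]. The labels are
# hypothetical; the ranges are the ones from the question.
my @rules = (
    [ small  => sub { $_[0] <= 15 } ],
    [ large  => sub { $_[0] > 100 } ],
    [ middle => sub { $_[0] > 15 && $_[0] < 40 } ],
);

# Returns one array ref of lines per rule, in rule order.
# Lines whose number matches no rule (e.g. 40..100) are dropped.
sub group_lines {
    my @groups;
    push @groups, [] for @rules;
    LINE: for my $line (@_) {
        my ($n) = $line =~ /(\d+)/ or next LINE;
        for my $i (0 .. $#rules) {
            if ($rules[$i][1]->($n)) {
                push @{ $groups[$i] }, $line;
                next LINE;    # first matching rule wins
            }
        }
    }
    return @groups;
}

my @lines = ('hinsa 121', 'mkzin 12', 'mkva 34', 'mvakine 2',
             'mzkev 9', 'mkvvz 5', 'mkhvzz 35');
my @groups = group_lines(@lines);
print "$_\n" for map { @$_ } @groups;
```

With a file, you can pass the lines in directly, e.g. group_lines(<$fh>), after a chomp if you print them with an added \n.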

Related

Parsing a file by summing up different columns of each row separated by blank line

I have an input file like this:
volume stats
start_time 1
length 2
--------
ID
0x00a,1,2,3,4
0x00b,11,12,13,14
0x00c,21,22,23,24

volume stats
start_time 2
length 2
--------
ID
0x00a,31,32,33,34
0x00b,41,42,43,44
0x00c,51,52,53,54

volume stats
start_time 3
length 2
--------
ID
0x00a,61,62,63,64
0x00b,71,72,73,74
0x00c,81,82,83,84
I need output in the following format:
1 33 36 39 42
2 123 126 129 132
3 213 216 219 222
Here is my code:
#!/usr/bin/perl
use strict;
use warnings;
#use File::Find;
# Define file names and its location
my $input = $ARGV[0];
# Grab the vols stats for different intervals
open (INFILE,"$input") or die "Could not open sample.txt: $!";
my $date_time;
my $length;
my $col_1;
my $col_2;
my $col_3;
my $col_4;
foreach my $line (<INFILE>)
{
if ($line =~ m/start/)
{
my @date_fields = split(/ /,$line);
$date_time = $date_fields[1];
}
if ($line =~ m/length/i)
{
my @length_fields = split(/ /,$line);
$length = $length_fields[1];
}
if ($line =~ m/0[xX][0-9a-fA-F]+/)
{
my @volume_fields = split(/,/,$line);
$col_1 += $volume_fields[1];
$col_2 += $volume_fields[2];
$col_3 += $volume_fields[3];
$col_4 += $volume_fields[4];
#print "$col_1\n";
}
if ($line =~ /^$/)
{
print "$date_time $col_1 $col_2 $col_3 $col_4\n";
$col_1=0;$col_2=0;$col_3=0;$col_4=0;
}
}
close (INFILE);
My code's result is:
1
33 36 39 42
2
123 126 129 132
Basically, for each time interval, the script should sum each column over all the ID lines and display the column sums on the same line as that time interval.
$/ is your friend here. Try setting it to '' to enable paragraph mode (separating your data by blank lines).
#!/usr/bin/env perl
use strict;
use warnings;
local $/ = '';
while ( <> ) {
my ( $start ) = m/start_time\s+(\d+)/;
my ( $length ) = m/length\s+(\d+)/;
my @row_sum;
for ( m/(0x.*)/g ) {
my ( $key, @values ) = split /,/;
for my $index ( 0..$#values ) {
$row_sum[$index] += $values[$index];
}
}
print join ( "\t", $start, @row_sum ), "\n";
}
Output:
1 33 36 39 42
2 123 126 129 132
3 213 216 219 222
NB: this uses tabs between the output fields; you can use sprintf if you need more flexible formatting.
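You can try the two ideas in this answer (paragraph mode and sprintf for output) on their own. This sketch reads a made-up two-record sample held in a string through an in-memory filehandle (open on a scalar reference). The column widths in the sprintf format are arbitrary.

```perl
use strict;
use warnings;

# Hypothetical two-record sample, separated by a blank line.
my $text = "start_time 1\n0x00a,1,2,3,4\n0x00b,11,12,13,14\n\n"
         . "start_time 2\n0x00a,31,32,33,34\n";

# Read from the string as if it were a file (in-memory filehandle).
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @lines_out;
{
    local $/ = '';    # paragraph mode: each read returns one blank-line-separated record
    while (my $record = <$fh>) {
        my ($start) = $record =~ m/start_time\s+(\d+)/;
        my @sum;
        # /m lets ^ and $ match at every line inside the record
        for my $row ($record =~ m/^(0x.*)$/mg) {
            my (undef, @values) = split /,/, $row;
            $sum[$_] += $values[$_] for 0 .. $#values;
        }
        # Fixed-width columns with sprintf instead of tabs.
        push @lines_out, sprintf "%-3d" . (" %5d" x @sum), $start, @sum;
    }
}
close $fh;
print "$_\n" for @lines_out;
```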
I would also suggest that instead of:
my $input = $ARGV[0];
open (my $input_fh, '<', $input) or die "Could not open $input: $!";
you would be better off with:
while ( <> ) {
because <> is Perl's magic filehandle. It opens the files specified on the command line and reads them one at a time, or reads STDIN if there are none. This is just how grep/sed/awk do it.
So you can still run this as scriptname.pl sample.txt, pipe data in with curl http://somewebserver/sample.txt | scriptname.pl, or pass several files: scriptname.pl sample.txt anothersample.txt moresample.txt
Also, if you want to open the file yourself, you're better off using a lexical filehandle and three-argument open:
open ( my $input_fh, '<', $ARGV[0] ) or die $!;
And you really shouldn't be using 'numbered' variables like $col_1 etc. If there are numbers in the names, an array is almost always better.
Basically, a block begins with start_time and ends with a whitespace-only line. If the end of a block is guaranteed to be an empty line, you can change the test below accordingly.
It helps to use arrays instead of variables with integer suffixes.
When you hit the start of a new block, record the start_time value. When you hit a stat line, update the column sums. When you hit a whitespace-only line, print the column sums and clear them.
This way, your program's memory footprint stays proportional to the longest line of input, as opposed to the largest block of input. In this case there isn't a huge difference, but in real life there can be. Your original program read the entire file into memory as a list of lines (foreach my $line (<INFILE>)), which would make its memory footprint balloon on large inputs.
#!/usr/bin/env perl
use strict;
use warnings;
my $start_time;
my @cols;
while (my $line = <DATA>) {
if ( $line =~ /^start_time \s+ ([0-9]+)/x) {
$start_time = $1;
}
elsif ( $line =~ /^0x/ ) {
my ($id, @vals) = split /,/, $line;
for my $i (0 .. $#vals) {
$cols[ $i ] += $vals[ $i ];
}
}
elsif ( !($line =~ /\S/) ) {
# guard against the possibility of
# multiple blank/whitespace lines between records
if ( @cols ) {
print join("\t", $start_time, @cols), "\n";
@cols = ();
}
}
}
# in case there is no blank/whitespace line after last record
if ( @cols ) {
print join("\t", $start_time, @cols), "\n";
}
__DATA__
volume stats
start_time 1
length 2
--------
ID
0x00a,1,2,3,4
0x00b,11,12,13,14
0x00c,21,22,23,24

volume stats
start_time 2
length 2
--------
ID
0x00a,31,32,33,34
0x00b,41,42,43,44
0x00c,51,52,53,54

volume stats
start_time 3
length 2
--------
ID
0x00a,61,62,63,64
0x00b,71,72,73,74
0x00c,81,82,83,84
Output:
1 33 36 39 42
2 123 126 129 132
3 213 216 219 222
When I run your code, I get warnings:
Use of uninitialized value $date_time in concatenation (.) or string
I fixed it by splitting on /\s+/ instead of / /.
I also added a print after your loop in case the file does not end with a blank line.
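We can't see the actual input file, so the tab below is an assumption. It is one kind of input for which split / / leaves field 1 undefined, which produces that warning. The second case shows why the original output put the start time on its own line even with a single space: split / / keeps the trailing newline on the last field, while split /\s+/ drops it.

```perl
use strict;
use warnings;

# Assumption: the key and value are separated by a tab, not a space.
my $tabbed = "start_time\t1\n";
my @by_space      = split / /,   $tabbed;   # ("start_time\t1\n"): field [1] is undef
my @by_whitespace = split /\s+/, $tabbed;   # ("start_time", "1"): trailing empty field dropped

# With a single space, split / / works but keeps the newline on the value.
my @plain = split / /, "start_time 1\n";    # ("start_time", "1\n")

printf "split / /   on tab line: field[1] is %s\n",
    defined $by_space[1] ? "'$by_space[1]'" : 'undef';
printf "split /\\s+/ on tab line: field[1] is '%s'\n", $by_whitespace[1];
printf "split / /   on space line: field[1] ends with newline: %s\n",
    ($plain[1] =~ /\n\z/ ? 'yes' : 'no');
```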
Here is minimally-changed code to produce your desired output:
use strict;
use warnings;
# Define file names and its location
my $input = $ARGV[0];
# Grab the vols stats for different intervals
open (INFILE,"$input") or die "Could not open sample.txt: $!";
my $date_time;
my $length;
my $col_1;
my $col_2;
my $col_3;
my $col_4;
foreach my $line (<INFILE>)
{
if ($line =~ m/start/)
{
my @date_fields = split(/\s+/,$line);
$date_time = $date_fields[1];
}
if ($line =~ m/length/i)
{
my @length_fields = split(/\s+/,$line);
$length = $length_fields[1];
}
if ($line =~ m/0[xX][0-9a-fA-F]+/)
{
my @volume_fields = split(/,/,$line);
$col_1 += $volume_fields[1];
$col_2 += $volume_fields[2];
$col_3 += $volume_fields[3];
$col_4 += $volume_fields[4];
}
if ($line =~ /^$/)
{
print "$date_time $col_1 $col_2 $col_3 $col_4\n";
$col_1=0;$col_2=0;$col_3=0;$col_4=0;
}
}
print "$date_time $col_1 $col_2 $col_3 $col_4\n";
close (INFILE);
__END__
1 33 36 39 42
2 123 126 129 132
3 213 216 219 222

perl script to delete lines from file

I have a file that I want to read. For each line containing 'word', I want to delete that line together with the next 2 lines.
The structure of the file is something like this:
1
2
3
word
321
3213
412
word
132
1231
This is what I have so far:
open FILE, "$localDir\\x.txt" or die $!;
@fileLines = <FILE>;
close FILE;
$output = 'y.txt';
open my $outfile, '>', $output or die "Can't write to $output: $!";
for ($i = 0; $i < scalar(@fileLines); $i++) {
next if ($fileLines[$i] =~ /'word/);
print $outfile $_ ;
}
thanks
I'd do it something like this:
#!/usr/bin/env perl
use strict;
use warnings;
#iterate one line at a time.
while ( <DATA> ) {
#if we hit the delimiter, read and discard two more lines.
if ( m/word/ ) { <DATA>; <DATA> ; }
#otherwise print it.
else { print; };
}
__DATA__
1
2
3
word
321
3213
412
word
132
1231
Which gives:
1
2
3
412
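If you'd rather keep your array-and-index approach, here is a sketch of that loop with its two bugs fixed. First, /'word/ looks for a literal apostrophe before word. Second, $_ is never set by a C-style for loop, so print $outfile $_ doesn't print the current line. The sketch compares whole, chomped lines with eq and advances the index past the skipped lines. The sub name drop_marker_blocks is made up.

```perl
use strict;
use warnings;

# Returns @lines without each line equal to $marker and the $n lines after it.
sub drop_marker_blocks {
    my ($marker, $n, @lines) = @_;
    my @kept;
    for (my $i = 0; $i < @lines; $i++) {
        if ($lines[$i] eq $marker) {
            $i += $n;    # the loop's $i++ then steps past the skipped lines
            next;
        }
        push @kept, $lines[$i];
    }
    return @kept;
}

# With a real file you would first do: chomp(my @fileLines = <FILE>);
my @fileLines = qw(1 2 3 word 321 3213 412 word 132 1231);
print "$_\n" for drop_marker_blocks('word', 2, @fileLines);
```

To write the result, print each kept line to $outfile followed by "\n".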
The excellent Tie::File module often seems to be forgotten. It is ideal for this sort of thing:
use strict;
use warnings;
use File::Copy qw/ copy /;
use Tie::File;
my $local_dir = '.';
copy "$local_dir/x.txt", 'y.txt';
tie my @file, 'Tie::File', 'y.txt';
for ( my $i = 0; $i < @file; ) {
if ( $file[$i] eq 'word' ) {
splice @file, $i, 3;
}
else {
++$i;
}
}
output
1
2
3
412

Transposing the matrix in perl

I am trying to transpose data contained in a file. The data is as follows:
1 2 3 4 5
2 3 4 5 6
4 5 6 7 9
4 3 7 6 9
The result I am getting is incorrect: the last column is not transposed properly, and I can't find the error in the code that causes it. Any solution?
Code:
#!/usr/bin/perl
use strict;
use warnings;
my @dependent; # matrix of dependent variable
# Reading the data from text file to the matrix
open( DATA, "<example.txt" ) or die "Couldn't open file , $!"; # dependent
# Storing data into the array in matrix form
while ( my $linedata = <DATA> ) {
push @dependent, [ split '\t', $linedata ];
}
my $m = @dependent;
#print "$m\n";
my $n = @{ $dependent[1] };
#print $n;
#print "Matrix of dependent variables Y \n";
for ( my $i = 0; $i < $m; $i++ ) {
for ( my $j = 0; $j < $n; $j++ ) {
#print $dependent[$i][$j]," ";
}
#print "\n";
}
my @transpose;
for ( my $i = 0; $i < $n; $i++ ) {
for ( my $j = 0; $j < $m; $j++ ) {
$transpose[$i][$j] = $dependent[$j][$i];
}
}
for ( my $i = 0; $i < $n; $i++ ) {
for ( my $j = 0; $j < $m; $j++ ) {
print $transpose[$i][$j], " ";
}
print "\n";
}
chomp your data when you read it, before you split it; your strange output is caused by the last element of each row of the input still having a newline attached.
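A sketch of the fix, assuming (as the question's split '\t' implies) that the file is tab-separated. Each line is chomped before it is split, so the last field of a row no longer carries a "\n" into the transposed output. The sub names read_matrix and transpose are illustrative, and the sample lines stand in for example.txt.

```perl
use strict;
use warnings;

# Build a matrix from tab-separated lines, chomping each line first so the
# last field of every row doesn't keep its "\n".
sub read_matrix {
    my @matrix;
    for my $line (@_) {
        chomp(my $copy = $line);
        push @matrix, [ split /\t/, $copy ];
    }
    return @matrix;
}

# Swap rows and columns: $t[$j][$i] = $m[$i][$j].
sub transpose {
    my @m = @_;
    my @t;
    for my $i (0 .. $#m) {
        for my $j (0 .. $#{ $m[$i] }) {
            $t[$j][$i] = $m[$i][$j];
        }
    }
    return @t;
}

my @lines = ("1\t2\t3\t4\t5\n", "2\t3\t4\t5\t6\n",
             "4\t5\t6\t7\t9\n", "4\t3\t7\t6\t9\n");
my @t = transpose(read_matrix(@lines));
print join(' ', @$_), "\n" for @t;
```

With the real file, this would be open my $in, '<', 'example.txt' or die $!; followed by my @t = transpose(read_matrix(<$in>));.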
Just as a side note, DATA isn't a very good name to pick for a filehandle; perl already defines a special builtin filehandle named DATA for reading data that's embedded in a script or a module, so using that name for yourself can lead to confusion :)

How do I print the second-column elements in one row, separated by commas (,), if the first column is the same?

The input I am handling is as follows:
Q9NRG9 15
Q9NRG9 160
Q9NRG9 56
Q9NRG9 89
Q16613 26
Q16613 63
Q16613 102
O95477 19
O95477 91
O95477 78
O95477 86
O95477 16
O95477 203
O95477 66
P78363 18
P78363 159
P78363 88
I want the output to be:
Q9NRG9 15,160,56,89
Q16613 26,63,102
O95477 19,91,78,86,16,203,66
P78363 18,159,88
I tried writing a Perl program, but I couldn't get the output I want.
Using perl from the command line:
perl -lane '
push @{ $h{$F[0]} }, $F[1]
}{
$" = ",";
print "$_ @{ $h{$_} }" for keys %h
' file
O95477 19,91,78,86,16,203,66
Q9NRG9 15,160,56,89
P78363 18,159,88
Q16613 26,63,102
To preserve the order in which the keys first appear, you can do:
perl -lane '
$k{$F[0]}++ or push @r, $F[0];
push @{ $h{$F[0]} }, $F[1]
}{
$" = ",";
print "$_ @{ $h{$_} }" for @r
' file
Try this:
open (FILE, "text.txt") or die "cannot open file".$!;
my %data;
while(<FILE>){
chomp($_);
my ($key, $value) = split(/\s+/,$_);
push(@{$data{$key}}, $value);
}
foreach (keys %data){
print $_." ".join(",",@{$data{$_}})."\n";
}
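The script above prints the keys in hash order, which is effectively random. Here is a sketch that applies the one-liner's order-preserving idea in script form. The sub name group_by_first_column and the sample lines are illustrative.

```perl
use strict;
use warnings;

# Group second-column values by first-column key, remembering the order
# in which keys first appear (plain hash keys come back in no particular order).
sub group_by_first_column {
    my (%values_for, @order);
    for my $line (@_) {
        my ($key, $value) = split ' ', $line;    # ' ' also discards the trailing newline
        next unless defined $value;              # skip blank or one-column lines
        push @order, $key unless exists $values_for{$key};
        push @{ $values_for{$key} }, $value;
    }
    return map { "$_ " . join(',', @{ $values_for{$_} }) } @order;
}

my @input = ("Q9NRG9 15\n", "Q9NRG9 160\n", "Q16613 26\n", "Q9NRG9 56\n");
print "$_\n" for group_by_first_column(@input);
```

With a file, use print "$_\n" for group_by_first_column(<FILE>);.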

perl simple matching yet not matching it

In blah.txt:
/a/b/c-test
In blah.pl:
my @dirs;
$ws = '/a/b/c-test/blah/blah';   # <--- trying to match this
sub blah{
my $err;
open(my $fh, "<", "blah.txt") or $err = "catn do it\n";
if ($err) {
print $err;
return;
} else {
while(<$fh>){
chomp;
push @dirs, $_;
}
}
close $fh;
print "successful\n";
}

blah();

foreach (@dirs) {
print "$_\n"; #/a/b/c-test
if ($_ =~ /$ws/ ) {   # <--- didnt match it
print "GOT IT!\n";
} else {
print "didnt get it\n";
}
}
perl blah.pl
successful
/a/b/c-test
didnt get it
I am not quite sure why it is not matching.
Does anyone know?
Consider
if ($ws =~ /$_/ ) {
instead of
if ($_ =~ /$ws/ ) {
because /a/b/c-test/blah/blah contains the string /a/b/c-test, not the other way around.
As side notes:
use at least strict and warnings
read and process the file in a while() loop instead of filling an array first
if you must fill an array, use my @dirs = <$fh>; chomp(@dirs);
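Reversing the match fixes the direction. Interpolating a path straight into a pattern, though, also treats characters such as . or + in it as regex metacharacters, and an unanchored pattern can match anywhere in $ws. Here is a sketch of a stricter check, assuming what you want to know is "is $ws this directory or inside it". The sub name is_under is made up.

```perl
use strict;
use warnings;

# True if $dir is $path itself or a parent directory of $path.
# \Q...\E quotes regex metacharacters in $dir (such as "." or "+"), ^ anchors
# the match at the start, and (?:/|\z) stops "/a/b/c" matching "/a/b/c-test".
sub is_under {
    my ($path, $dir) = @_;
    return $path =~ m{^\Q$dir\E(?:/|\z)} ? 1 : 0;
}

my $ws = '/a/b/c-test/blah/blah';
for my $dir ('/a/b/c-test', '/a/b/c.test', '/a/b/c') {
    print "$dir: ", (is_under($ws, $dir) ? "GOT IT!" : "didnt get it"), "\n";
}
```

Without \Q, the pattern built from '/a/b/c.test' would also match, because the unquoted . matches the - in c-test.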