grep tab separated string in perl - perl

I am trying to grep tab separated numbers (eg 1\t3) in an array something like
#data=
1 3
2 3
1 3
3 3
the idea behind the code is something like this
#!usr/bin/perl
use strict;
use warnings;
my #data = ( "1\t3", "2\t3", "1\t3", "3\t3", );
for (my $i=0;$i<4;$i++) {
for (my $j=0;$j<4_size;$j++) {
my $pattern= "$i\t$j";
my #count=grep(/$pattern/,#data);
undef $pattern;
print "$pattern\tcount\n";
}
}
hoping for output something like
1st and second column: pattern
3nd column : count of total matches
1 1
1 2
1 3 2
2 1
2 3 1
3 1
3 2
3 3 1
but the output is null for some reasons,
I am recently learnt and finding it very intriguing.
any suggestions?

The code below:
Does not crash if input contains unexpected characters (e.g., '(')
Only counts exact matches for the sequences of digits on either side of "\t".
Matches lines that might have been read from a file or __DATA__ section without using chomp using \R.
--
#!/usr/bin/env perl
use strict;
use warnings;
my #data = ( "1\t3", "2\t3", "(\t4", "1\t3", "3\t3", "11\t3" );
for my $i (1 .. 3) {
for my $j (1 .. 3) {
my $pattern = "$i\t$j";
my $count = grep /\A\Q$pattern\E\R?\z/, #data;
print join("\t", $pattern, $count ? $count : ''), "\n";
}
}
Output:
1 1
1 2
1 3 2
2 1
2 2
2 3 1
3 1
3 2
3 3 1

You almost got it. Here is a working version:
#!usr/bin/perl
use strict;
use warnings;
my #data = ( "1\t3", "2\t3", "1\t3", "3\t3", );
foreach my $i (1 .. 3) {
foreach my $j (1 .. 3) {
my $pattern = "$i\t$j";
my $count = grep(/$pattern/, #data);
print $pattern . ($count ? "\t$count\n" : "\n");
}
}

Related

Split distinct numbers, count occurance and return percents of each

I would like to split a user input of numbers and split the numbers into a list in perl and return their count and percents sorted by numbers splitted:
For instance users inputs in stdin: 112219992221474774
Num 1 are 4 with 22.22%
Num 2 are 5 with 27.77%
Num 4 are 3 with 16.67%
Num 7 are 3 with 16.67%
Num 9 are 3 with 16.67%
What I have done so far?
use strict;
use warnings;
sub main {
print ("Enter the numbers: ");
chomp(my $num = <STDIN>);
my #Array = split (//, $num);
my %numbers;
$numbers{$_}++ for split / /, $Array[0];
my $total;
while (my ($k, $v) = each %numbers){
$total += $v;
print "Num $k are $v with $v/$total %\n";
}
}
main();
When I run this I get:
Num 1 are 1 with 1/1 %
Any clue why is so?
You split the input string correctly using an empty string //, what returns the list of characters in that string; in this case the digits you need. Why then split the first element of #Array again, and by space? That results in your %numbers being (1 => 1).
As the split // returned the list of digits just count them.
use warnings;
use strict;
use feature 'say';
while (1) {
print "Enter the number: ";
my $num = <STDIN>;
chomp $num;
if ($num =~ /[^0-9]/) {
say "Non-digit(s) in input $num. Please try again.";
next;
}
my #digits = split //, $num; #/
my %freq;
++$freq{$_} for #digits;
for (sort keys %freq) {
say "Num are $freq{$_} with ", (sprintf "%.2f%%", ($freq{$_}/#digits)*100)
}
}
Quit with Ctrl-C. With input of 112219992221474774 the output is
Num 1 are 4 with 22.22%
Num 2 are 5 with 27.78%
Num 4 are 3 with 16.67%
Num 7 are 3 with 16.67%
Num 9 are 3 with 16.67%

Perl Loop Conceptualization

I'm having some trouble trying to figure out how to attack this problem. I have a file that looks like :
1 1 1 1 1 1 2 a
1 1 1 3 4 4 4 a
1 1 1 4 4 4 2 a
2 2 2 3 3 3 2 b
2 2 2 1 1 1 1 b
Which, I want to count how many times the first 3 numbers repeat consecutively. As in I load the data set into perl, and it will spit out:
3
2
as the string '1 1 1' was found at the start of a line 3 times in a row, and the string '2 2 2' was found to begin a line 2 times in a row. Either string may appear later in the file, which complicates things. As well, I will not know what the beginning string is either, and it varies.
Honestly, I'm just really confused on how to attack this. If anyone can help conceptualize this/ give me some pseudo-code to help, that would be great.
EDIT: As per Borodins answer, which works perfectly for what I asked. However, if I wanted to print the count plus which letter it is, How could I do that? So far,
my ( $inp, $outp) = qw / OUT2 OUTFILE/;
open my $input, '<', $inp or die;
open my $output, '>', $outp or die;
my ($last_key, $count);
while ( <$input> ) {
my $key = join ' ', (split)[0..2];
my $id = join ' ', (split)[7];
if ( defined $last_key and $key eq $last_key ) {
++$count;
}
else {
printf "%s %d $id\n", $last_key, $count if defined $last_key;
$last_key = $key;
$count = 1;
}
printf "%s %d $id\n", $last_key, $count if eof;
}
Which gives :
1 1 1 3 b
2 2 2 2 b
which isn't quite what I'm aiming for.
Thanks!
EDIT2
Got what I wanted working. Oftentimes, all it takes is asking for help to figure it out yourself.
Updated code:
my ( $inp, $outp) = qw / OUT2 OUTFILE/;
open my $input, '<', $inp or die;
open my $output, '>', $outp or die;
my ($last_key, $count, $last_id);
while ( <$input> ) {
my $key = join ' ', (split)[0..2];
my $id = join ' ', (split)[7];
if ( defined $last_key and $key eq $last_key ) {
++$count;
}
else {
printf "%s %d $last_id\n", $last_key, $count if defined $last_key;
$last_key = $key;
$count = 1;
$last_id = $id;
}
printf "%s %d $id\n", $last_key, $count if eof;
}
on:
1 1 1 1 1 1 2 a
1 1 1 3 4 4 4 a
1 1 1 4 4 4 2 a
2 2 2 3 3 3 2 b
2 2 2 1 1 1 1 b
3 3 3 2 5 4 2 c
gives:
1 1 1 3 a
2 2 2 2 b
3 3 3 1 c
Thanks all!
You just have to keep track of the previous line, or at least the relevant part of it, and how many times it was seen:
use strict;
use warnings;
my $count = 0;
my $last_prefix = '';
my $last_value = '';
while (my $line = <>) {
my ($prefix, $value) = $line =~ /^(\S+\s+\S+\s+\S+).*(\S+)/ or die "malformed line $line";
if ($prefix ne $last_prefix) {
if ($count) {
print "$count $last_value\n";
}
$last_prefix = $prefix;
$last_value = $value;
$count = 0;
}
++$count;
}
if ($count) {
print "$count $last_value\n";
}
This is just a matter of forming a key from the first three fields and counting the number of times they occur, printing a line of output whenever the key changes or the end of the file is reached
use strict;
use warnings;
my ($last_key, $count);
while ( <DATA> ) {
my $key = join ' ', (split)[0..2];
if ( defined $last_key and $key eq $last_key ) {
++$count;
}
else {
printf "%s -> %d\n", $last_key, $count if defined $last_key;
$last_key = $key;
$count = 1;
}
printf "%s -> %d\n", $last_key, $count if eof;
}
__DATA__
1 1 1 1 1 1 2 a
1 1 1 3 4 4 4 a
1 1 1 4 4 4 2 a
2 2 2 3 3 3 2 b
2 2 2 1 1 1 1 b
3 3 3 1 1 1 1 c
output
1 1 1 -> 3
2 2 2 -> 2
3 3 3 -> 1
Update
To include the final column in the output data, just change
my $key = join ' ', (split)[0..2]
to
my $key = join ' ', (split)[0..2,-1]
output
1 1 1 a -> 3
2 2 2 b -> 2
3 3 3 c -> 1
Here is one way of doing it:
# Open file and loop through lines
open (INFH, '<', "num.txt");
my $count = 0;
my $str;
my %countHash;
while(<INFH>){
# split the line using space characters to get first three numbers
my #numArray = split(' ', $_);
#Concatenating first three numbers as a string to use as key
$key = "$numArray[0]" . "$numArray[1]" . "$numArray[2]";
#If the combination exists, update the value by adding 1. Else add new
if (!exists $countHash{$key}){
$countHash{$key} = 1;
}else{
$countHash{$key} += 1;
}
}
print %countHash;
I will update if I can make it any better.

Perl find the elements that appears once in an array

Given an array of elements, how to find the element that occurs once only in that array:
my #array = qw(18 1 18 3 18 1 1 2 3 3);
result should be: 2
This is a variation on perlfaq5 - How can I remove duplicate elements from a list or array?
Just use a hash to count the elements, and then print the ones seen only once.
use strict;
use warnings;
my #array = qw(18 1 18 3 18 1 1 2 3 3);
my #nondup = do {
my %count;
$count{$_}++ for #array;
grep {$count{$_} == 1} keys %count;
};
print "#nondup\n";
Outputs:
2
You can also try this in simple way.
use strict;
use warnings;
my #array = qw(7 8 7 5 18 1 18 3 18 1 1 2 3 3 4 5 6 7);
my $tm = "";
my %hash=();
foreach $tm(#array){
if(exists $hash{$tm}){
$hash{$tm} = "";
}
else{
$hash{$tm} = "$tm";
}
}
print join ("\n", values %hash);exit;

How can I organize this data using Perl?

I am new to Perl. I have an input file such as:
a 7 5
b 8 2
a 3 2
b 4 1
c 6 1
How can I get output like
column_1_val, number_occurrence_column_1, sum_of_column_2, sum_of_column_3
For example
a 2 10 7
b 2 12 3
c 1 6 1
The program below is a possible solution. I have used the DATA file handle whereas you will presumably need to open an external file and use the handle from that.
use strict;
use warnings;
use feature 'say';
my %data;
while (<DATA>) {
my ($key, #vals) = split;
$data{$key}[0]++;
my $i;
$data{$key}[++$i] += $_ for #vals;
}
say join ' ', $_, #{$data{$_}} for sort keys %data;
__DATA__
a 7 5
b 8 2
a 3 2
b 4 1
c 6 1
output
a 2 10 7
b 2 12 3
c 1 6 1
That would be something like (untested):
while (<>) {
if (m/(\w+)\s+(\d+)\s+(\d+)/) {
($n, $r1, $r2) = ($1, $2, $3);
$nr{$n}++;
$r1{$n} += $r1;
$r2{$n} += $r2;
}
}
for $n (sort keys %nr) {
print "$n $nr{$n} $r1{$n} $r2{$n}\n";
}
This is a very quick-and-dirty way of doing what you described, but it should get you on your way.
Even i am not aware of perl.But in case you are concerned with the result.the below is the solution in awk.It might /might not help you.but in case u need it :
awk '{c[$1]++;a[$1]=a[$1]+$2;b[$1]+=$3}END{for(i in a)print i,c[i],a[i],b[i]}' file3
A slightly different take:
my %records;
while ( <> ) {
my #cols = split ' ';
my $rec = $records{ $cols[0] } ||= {};
$rec->{number_occurrence_column_1}++;
$rec->{sum_of_column_2} += $cols[1];
$rec->{sum_of_column_3} += $cols[2];
}
foreach my $rec ( map { { col1 => $_, %{ $records{ $_ } } }
sort keys %records
) {
print join( "\t"
, #$rec{ qw<col1 number_occurrence_column_1
sum_of_column_2 sum_of_column_3
>
}
), "\n"
;
}

Going out of loop Perl

I have two arrays, I am evaluating the values of one array with other. What i have done is
#array_x= qw(1 5 3 4 6);
#array_y= qw(-3 4 2 1 3);
foreach $x (#array_x){
foreach $y (#array_y){
if ($x-$y > 0){
next;
}
print "$x\n";
}
}
Here, problem is , in array_x, its first index i.e 1-(-3)=4, it satisfies, but next 1-4=-3 is not satisfying the condition, hence it should break the loop and go for next element of array_x. Here only 5 and 6 satisfies the condition with all elements of array_y, so i should get only 5,6 in the output.
Here is your loops with labels so you can break to the outer level:
XVALUE:
foreach $x (#array_x){
YVALUE:
foreach $y (#array_y){
if ($x-$y > 0){
next XVALUE;
}
print "$x\n";
}
}
You can label each loop and exit the one you want. See perldoc last
E.g.:
LINE: while (<STDIN>) {
last LINE if /^$/; # exit when done with header
#...
}
If the intention is to just find the elements which are greater than the element in the subsequent list, the following would find it in 1 iteration of each array.
use strict;
my #array_x= qw(1 5 3 4 6);
my #array_y= qw(-3 4 2 1 3);
my $max_y = $array_y[0];
foreach my $y (#array_y) {
$max_y = $y if $y > $max_y;
}
foreach my $x (#array_x) {
print "\nX=$x" if $x > $max_y;
}
Output:
X=5
X=6
Not really sure what is your need, but is this what you want?
#!/usr/bin/perl
use Modern::Perl;
my #array_x= qw(1 5 3 4 6);
my #array_y= qw(-3 4 2 1 3);
foreach my $x(#array_x){
my $OK=1;
foreach my $y(#array_y){
next if $x > $y;
$OK=0;
last;
}
say "x=$x" if $OK;
}
output:
x=5
x=6
I think you might want to rethink your method. You want to find all values in #x which are greater than all in #y. You shouldn't loop over all #y each time, you should find the max of it, then filter on the max.
use strict;
use warnings;
use List::Util 'max';
my #x= qw(1 5 3 4 6);
my #y= qw(-3 4 2 1 3);
my $ymax = max #y;
my #x_result = grep { $_ > $ymax } #x;
Or since I am crazy about the new state keyword:
use strict;
use warnings;
use 5.10.0;
use List::Util 'max';
my #x= qw(1 5 3 4 6);
my #y= qw(-3 4 2 1 3);
my #x_result = grep { state $ymax = max #y; $_ > $ymax } #x;
Edit: on re-reading previous answers, this is the same concept as angel_007, though I think this implementation is more self-documenting/readable.
Revised answer:
#!/usr/bin/perl
use strict;
use warnings;
my #array_x= qw(1 5 3 4 6);
my #array_y= qw(-3 4 2 1 3);
LABEL: for my $x (#array_x) {
for my $y (#array_y) {
next LABEL unless $x > $y;
}
print "$x\n";
}