I'm having some trouble trying to figure out how to attack this problem. I have a file that looks like :
1 1 1 1 1 1 2 a
1 1 1 3 4 4 4 a
1 1 1 4 4 4 2 a
2 2 2 3 3 3 2 b
2 2 2 1 1 1 1 b
Which, I want to count how many times the first 3 numbers repeat consecutively. As in I load the data set into perl, and it will spit out:
3
2
as the string '1 1 1' was found at the start of a line 3 times in a row, and the string '2 2 2' was found to begin a line 2 times in a row. Either string may appear later in the file, which complicates things. As well, I will not know what the beginning string is either, and it varies.
Honestly, I'm just really confused on how to attack this. If anyone can help conceptualize this/ give me some pseudo-code to help, that would be great.
EDIT: As per Borodins answer, which works perfectly for what I asked. However, if I wanted to print the count plus which letter it is, How could I do that? So far,
my ( $inp, $outp) = qw / OUT2 OUTFILE/;
open my $input, '<', $inp or die;
open my $output, '>', $outp or die;
my ($last_key, $count);
while ( <$input> ) {
my $key = join ' ', (split)[0..2];
my $id = join ' ', (split)[7];
if ( defined $last_key and $key eq $last_key ) {
++$count;
}
else {
printf "%s %d $id\n", $last_key, $count if defined $last_key;
$last_key = $key;
$count = 1;
}
printf "%s %d $id\n", $last_key, $count if eof;
}
Which gives :
1 1 1 3 b
2 2 2 2 b
which isn't quite what I'm aiming for.
Thanks!
EDIT2
Got what I wanted working. Oftentimes, all it takes is asking for help to figure it out yourself.
Updated code:
my ( $inp, $outp) = qw / OUT2 OUTFILE/;
open my $input, '<', $inp or die;
open my $output, '>', $outp or die;
my ($last_key, $count, $last_id);
while ( <$input> ) {
my $key = join ' ', (split)[0..2];
my $id = join ' ', (split)[7];
if ( defined $last_key and $key eq $last_key ) {
++$count;
}
else {
printf "%s %d $last_id\n", $last_key, $count if defined $last_key;
$last_key = $key;
$count = 1;
$last_id = $id;
}
printf "%s %d $id\n", $last_key, $count if eof;
}
on:
1 1 1 1 1 1 2 a
1 1 1 3 4 4 4 a
1 1 1 4 4 4 2 a
2 2 2 3 3 3 2 b
2 2 2 1 1 1 1 b
3 3 3 2 5 4 2 c
gives:
1 1 1 3 a
2 2 2 2 b
3 3 3 1 c
Thanks all!
You just have to keep track of the previous line, or at least the relevant part of it, and how many times it was seen:
use strict;
use warnings;
my $count = 0;
my $last_prefix = '';
my $last_value = '';
while (my $line = <>) {
my ($prefix, $value) = $line =~ /^(\S+\s+\S+\s+\S+).*(\S+)/ or die "malformed line $line";
if ($prefix ne $last_prefix) {
if ($count) {
print "$count $last_value\n";
}
$last_prefix = $prefix;
$last_value = $value;
$count = 0;
}
++$count;
}
if ($count) {
print "$count $last_value\n";
}
This is just a matter of forming a key from the first three fields and counting the number of times they occur, printing a line of output whenever the key changes or the end of the file is reached
use strict;
use warnings;
my ($last_key, $count);
while ( <DATA> ) {
my $key = join ' ', (split)[0..2];
if ( defined $last_key and $key eq $last_key ) {
++$count;
}
else {
printf "%s -> %d\n", $last_key, $count if defined $last_key;
$last_key = $key;
$count = 1;
}
printf "%s -> %d\n", $last_key, $count if eof;
}
__DATA__
1 1 1 1 1 1 2 a
1 1 1 3 4 4 4 a
1 1 1 4 4 4 2 a
2 2 2 3 3 3 2 b
2 2 2 1 1 1 1 b
3 3 3 1 1 1 1 c
output
1 1 1 -> 3
2 2 2 -> 2
3 3 3 -> 1
Update
To include the final column in the output data, just change
my $key = join ' ', (split)[0..2]
to
my $key = join ' ', (split)[0..2,-1]
output
1 1 1 a -> 3
2 2 2 b -> 2
3 3 3 c -> 1
Here is one way of doing it:
# Open file and loop through lines
open (INFH, '<', "num.txt");
my $count = 0;
my $str;
my %countHash;
while(<INFH>){
# split the line using space characters to get first three numbers
my #numArray = split(' ', $_);
#Concatenating first three numbers as a string to use as key
$key = "$numArray[0]" . "$numArray[1]" . "$numArray[2]";
#If the combination exists, update the value by adding 1. Else add new
if (!exists $countHash{$key}){
$countHash{$key} = 1;
}else{
$countHash{$key} += 1;
}
}
print %countHash;
I will update if I can make it any better.
Related
I'm trying to figure out why this keeps printing the "majority element" candidate in every cycle.
The code I've been trying to make work is a Majority Element search (to find an element that is repeated more than half of the length of a list).
I can't separate the processes of finding the candidate and testing against the array because my input is a text file that has an indeterminate number of arrays. It's an exercise from rosalind.info that has different inputs every time you try to solve it.
An example of the input would be
-5 5 5 5 5 5 5 5 -8 7 7 7 1 7 3 7 -7 1 6 5 10 100 1000 1 -5 1 6 7 1 1 10 1
Here's what I've written so far.
foreach my $currentrow (#lists) {
my #row = ();
#row = split( /\s/, $currentrow );
my $length = $#row;
my $count = 0;
my $i = 0;
for $i ( 0 .. $length - 1 ) {
if ( $count == 0 ) {
$candidate = $row[$i];
$count++;
}
if ( ( $count > 0 ) and ( $i = $length - 1 ) ) {
my $counter2 = 0;
for my $j ( 0 .. $length - 1 ) {
if ( $row[$j] == $candidate ) {
$counter2++;
}
}
if ( $counter2 <= ( $#row / 2 ) and ( $i = $length - 1 ) ) {
$candidate = -1;
print $candidate, " ", $i, " ";
}
if ( $counter2 > ( $#row / 2 ) and ( $i = $length - 1 ) ) {
print $candidate, " ", $i, " ";
}
}
if ( $candidate == $row[$i] and $count > 0 ) {
$count = $count + 1;
}
if ( $candidate != $row[$i] and $count > 0 ) {
$count = $count - 1;
}
}
}
Do you have use strict and use warnings 'all' in place?
I imagine that your problem may be because of the test $i = $length - 1, which is an assignment, and should be $i == $length - 1
To find a majority element I would use a hash:
perl -nae '%h=(); $h{$_}+=2 for #F; $h{$_}>#F and print for keys %h; print "\n"'
Each line of input is treated separately. Each line of output matches a line of input and presents its majority element or is empty if there is no such element.
Edit: Now the solution uses autosplit (-a), which is shorter and work not only for numbers.
I have a script that downloads data from a database.
But I am having trouble formatting the data into rows.
#!/perl/bin/perl
use FOOConf;
FOOConf::makeDBConnection(production);
$dbh=$EVTConf::dbh;
use Data::Dumper ;
my %extend_hash = %{#_[0]};
my $query = "select level_id,e_risk_symbol,e_exch_dest,penny,specialist from etds_extend";
if(!$dbh) {
print "Error connecting to DataBase; $DBI::errstr\n";
}
my $cur_msg = $dbh->prepare($query) or die "\n\nCould not prepare statement: ".$dbh->errstr;
$cur_msg->execute();
my (#row);
while (#row = $cur_msg->fetchrow_array ) {
#foreach $row(#row) {
#print "$row \n" ;
printf "%-8s %-4s %-2s %-2s %-2s\n ", $row[0], $row[1], $row[2], $row[3], $row[4], $row[5];
#printf "%-12s %6.2f\n", $row[0], $row[3];
#for (my $i = 0; $i < scalar(#row); $i++) {
# printf "%-12s = %s\n", $cur_msg->{NAME}[$i], $row[$i];
# }
#}
}
i am using this to format the rows. The format is all mesed up
printf "%-8s %-4s %-2s %-2s %-2s\n ", $row[0], $row[1], $row[2], $row[3],
this is what the format is :
5 MRO CS 1 0
5 FFIV CS 1 0
5 GM CS 1 0
5 MCP CS 1 0
5 RVBD CS 1 0
6 OIS_SPIN XISX 0 1
6 CVEO XISX 0 1
6 MRVL AMXO 0 1
6 MRX AMXO 0 1
6 MS XISX 0 1
6 MTG XISX 0 1
if I just use this loop
while (#row = $cur_msg->fetchrow_array ) {
foreach $row(#row) {
print "$row \n" ;
}
}
each row gets printed out one line at a time - the thread starts with a 5 - which seems to screw up the formatting. How do i factor this out first 5 ?
[ walt]$ ./test_db_data_format.very_simple | head -20
5
MRO
CS
1
0
5
FFIV
CS
1
0
5
GM
CS
1
0
5
MCP
CS
1
0
when i use I this loop this format - which is really nice.
That level id comes in at beginning without being attached to a symbol - screwing me up.
while (#row = $cur_msg->fetchrow_array ) {
for (my $i = 0; $i < scalar(#row); $i++) {
printf "%-12s = %s\n", $cur_msg->{NAME}[$i], $row[$i];
}
}
This is the results with the rows from the data base:
LEVEL_ID = 5
E_RISK_SYMBOL = MRO
E_EXCH_DEST = CS
PENNY = 1
SPECIALIST = 0
LEVEL_ID = 5
E_RISK_SYMBOL = FFIV
E_EXCH_DEST = CS
PENNY = 1
SPECIALIST = 0
LEVEL_ID = 5
E_RISK_SYMBOL = GM
E_EXCH_DEST = CS
PENNY = 1
SPECIALIST = 0
LEVEL_ID = 5
What I need is the E_RISK_SYMBOL to start and a newline after LEVEL_ID to start a new row.
It should look just like this just like this:
MRO CS 1 0 5
FFIV CS 1 0 5
GM CS 1 0 5
MCP CS 1 0 5
RVBD CS 1 0 6
OIS_SPIN XISX 0 1 6
It looks like your primary problem was just a trailing space in your format string after the \n and the fact that you were printing the $row[0] first instead of last.
You can also simplify your code by including the my declaration in the while (COND) and also using an array slice instead of listing out a bunch of individual array elements.
while (my #row = $cur_msg->fetchrow_array ) {
printf "%-8s %-4s %-2s %-2s %-2s\n", #row[1..4,0];
}
Note: You were also passing 6 values to a format string with only 5 spots in your first code. If you actually want the 6th variable to be displayed, you'll have to specify its format as well.
while (#row = $cur_msg->fetchrow_array ) {
printf "%-8s %-4s %-2s %-2s %-2s\n", $row[1], $row[2], $row[3], $row[4], $row[0];
}
Personally, I'd probably fetch it as a hashref to aide in making it more readable/understandable/maintainable.
while (my $row = $cur_msg->fetchrow_hashref ) {
printf "%-8s %-4s %-2s %-2s %-2s\n", $row->{e_risk_symbol},
$row->{e_exch_dest},
$row->{penny},
$row->{specialist},
$row->{level_id};
}
I am trying to grep tab separated numbers (eg 1\t3) in an array something like
#data=
1 3
2 3
1 3
3 3
the idea behind the code is something like this
#!usr/bin/perl
use strict;
use warnings;
my #data = ( "1\t3", "2\t3", "1\t3", "3\t3", );
for (my $i=0;$i<4;$i++) {
for (my $j=0;$j<4_size;$j++) {
my $pattern= "$i\t$j";
my #count=grep(/$pattern/,#data);
undef $pattern;
print "$pattern\tcount\n";
}
}
hoping for output something like
1st and second column: pattern
3nd column : count of total matches
1 1
1 2
1 3 2
2 1
2 3 1
3 1
3 2
3 3 1
but the output is null for some reasons,
I am recently learnt and finding it very intriguing.
any suggestions?
The code below:
Does not crash if input contains unexpected characters (e.g., '(')
Only counts exact matches for the sequences of digits on either side of "\t".
Matches lines that might have been read from a file or __DATA__ section without using chomp using \R.
--
#!/usr/bin/env perl
use strict;
use warnings;
my #data = ( "1\t3", "2\t3", "(\t4", "1\t3", "3\t3", "11\t3" );
for my $i (1 .. 3) {
for my $j (1 .. 3) {
my $pattern = "$i\t$j";
my $count = grep /\A\Q$pattern\E\R?\z/, #data;
print join("\t", $pattern, $count ? $count : ''), "\n";
}
}
Output:
1 1
1 2
1 3 2
2 1
2 2
2 3 1
3 1
3 2
3 3 1
You almost got it. Here is a working version:
#!usr/bin/perl
use strict;
use warnings;
my #data = ( "1\t3", "2\t3", "1\t3", "3\t3", );
foreach my $i (1 .. 3) {
foreach my $j (1 .. 3) {
my $pattern = "$i\t$j";
my $count = grep(/$pattern/, #data);
print $pattern . ($count ? "\t$count\n" : "\n");
}
}
I have the following problem: from a file (file.dat) with the following formatted datas
1 2 3 4
2 1 3 4 5
3 1 2
4 1 2
5 2 6 7
6 5 8
7 5 8
8 6 7 9
9 8
I want to find:
if the first element of a row appears in the other rows and if the first element of the subsequent rows appear in the row taken in exam;
if it exists then I want to print "I have found the link x y";
if the "link" exists, then I want to count how many times the other elements in the row taken in exam appear in the row where the link is present and print "I have found z triangles".
For example in this case when the program compare the first row and the second row and find that "the link 1 2" exists and then write also "I have find 2 triangles" (because in each rows there are the numbers 3 and 4).
For this purpose I have tried to write the following program:
use strict;
use warnings;
use diagnostics;
use Data::Dumper;
############ DATA ABSORTION
my $file = 'file.dat';
open my $fh, "<", $file or die "Cannot open $file: $!";
############ COLLECT THE DATAS IN A VECTOR as vector[i][j]
my #vector;
while (<$fh>) {
push #vector, [ split ];
}
############ START THE RESEARCH OF THE LINKS AND TRIANGLES BY MEANS OF FOR LOOPS
my #link;
my $triangles;
for (my $i=0 ; $i < scalar #vector; $i++){
$triangles=0;
for(my $j=0; $j < scalar #vector; $j++){
for (my $k=$i+1; $k < scalar #vector; $k++){
for(my $l=0; $l < scalar #vector; $l++){
if($vector[$i][0]==$vector[$k][$l] && $vector[$i][$j]==$vector[$k][0] && $l != 0 && $j != 0) {
#link=($vector[$i][0],$vector[$k][0]);
print "I found the link #link\n";
if($vector[$i][$j]==$vector[$k][$l] && $l != 0 && $j != 0 && $i != $k){
$triangles++;
}
print "The number of triangles is $triangles\n\n";
}
}
}
}
}
The program print the right number of links but I found that if the number of rows is lower of the number of colums in the file, the program doesn't read the full row and this could be a problem for my link research. I think the problem is due at the scalar #vector upper limit in the for instrunctions (but I don't understand why).
The second problem is that it does't count the right number o triangles that I'am looking for... Any helps?
This program does what you require. In addition it prints the three corners of each triangle when one is found.
use strict;
use warnings;
use 5.010;
my $filename = 'file.dat';
open my $fh, '<', $filename or die qq{Cannot open "$filename": $!};
my %vector;
while (<$fh>) {
my #fields = split;
my $root = shift #fields;
$vector{$root} = { map { $_ => 1} #fields };
}
my #roots = sort { $a <=> $b } keys %vector;
for my $i (0 .. $#roots) {
my $aa = $roots[$i];
for my $j ($i + 1 .. $#roots) {
my $bb = $roots[$j];
next unless $vector{$aa}{$bb} and $vector{$bb}{$aa};
say "I found the link $aa $bb";
my $triangles = 0;
for my $cc ( keys %{$vector{$aa}} ) {
next if $cc == $aa or $cc == $bb;
if ($vector{$bb}{$cc}) {
say "Triangle $aa - $bb - $cc";
$triangles++;
}
}
say "I have found $triangles triangle". ($triangles == 1 ? '' : 's');
print "\n";
}
}
There are only two triangles in the data you show: 1-2-3 and 1-2-4. Following your algorithm results in this program counting triangles more than once, with the corners in different orders. To count each distinct triangle only once, change the line
next if $cc == $aa or $cc == $bb;
to
next if $cc <= $aa or $cc <= $bb;
output
I found the link 1 2
Triangle 1 - 2 - 4
Triangle 1 - 2 - 3
I have found 2 triangles
I found the link 1 3
Triangle 1 - 3 - 2
I have found 1 triangle
I found the link 1 4
Triangle 1 - 4 - 2
I have found 1 triangle
I found the link 2 3
Triangle 2 - 3 - 1
I have found 1 triangle
I found the link 2 4
Triangle 2 - 4 - 1
I have found 1 triangle
I found the link 2 5
I have found 0 triangles
I found the link 5 6
I have found 0 triangles
I found the link 5 7
I have found 0 triangles
I found the link 6 8
I have found 0 triangles
I found the link 7 8
I have found 0 triangles
I found the link 8 9
I have found 0 triangles
[ Only answers first question ]
$j and $l are suppose to iterate over the column indexes, but you count rows. The correct loops are:
for my $i (0 .. $#vector-1) {
for my $j (0 .. $#{ $vector[$i] }) {
for my $k ($i+1 .. $#vector) {
for my $l (0 .. $#{ $vector[$k] }) {
This question has two parts:
Establish if a link exists between two rows
Establish the total 'unique' numbers they share in common
Using an AoA is fine, but using a HoH makes life a little easier:
my %links;
while ( <$fh> ) {
chomp;
my ( $from, #to ) = split;
$links{$from}{$_}++ for #to;
}
You can then check to see if the link exists:
print "Link $from $to\n" if exists $links{$from} && exists $links{$from}{$to};
And finding common "triangles" should be easy as well:
use List::MoreUtils 'uniq';
sub get_triangles {
my ( $from, $to ) = #_;
for ( $from, $to ) { # Bail out if link doesn't exist
warn( "'$_' does not exist"), return unless exists $links{$_};
}
my #triangles = map { exists $links{$from} && exists $links{$to} }
uniq( values %{$links{$from}}, values %{$links{to}} );
return #triangles;
}
I am new to Perl. I have an input file such as:
a 7 5
b 8 2
a 3 2
b 4 1
c 6 1
How can I get output like
column_1_val, number_occurrence_column_1, sum_of_column_2, sum_of_column_3
For example
a 2 10 7
b 2 12 3
c 1 6 1
The program below is a possible solution. I have used the DATA file handle whereas you will presumably need to open an external file and use the handle from that.
use strict;
use warnings;
use feature 'say';
my %data;
while (<DATA>) {
my ($key, #vals) = split;
$data{$key}[0]++;
my $i;
$data{$key}[++$i] += $_ for #vals;
}
say join ' ', $_, #{$data{$_}} for sort keys %data;
__DATA__
a 7 5
b 8 2
a 3 2
b 4 1
c 6 1
output
a 2 10 7
b 2 12 3
c 1 6 1
That would be something like (untested):
while (<>) {
if (m/(\w+)\s+(\d+)\s+(\d+)/) {
($n, $r1, $r2) = ($1, $2, $3);
$nr{$n}++;
$r1{$n} += $r1;
$r2{$n} += $r2;
}
}
for $n (sort keys %nr) {
print "$n $nr{$n} $r1{$n} $r2{$n}\n";
}
This is a very quick-and-dirty way of doing what you described, but it should get you on your way.
Even i am not aware of perl.But in case you are concerned with the result.the below is the solution in awk.It might /might not help you.but in case u need it :
awk '{c[$1]++;a[$1]=a[$1]+$2;b[$1]+=$3}END{for(i in a)print i,c[i],a[i],b[i]}' file3
A slightly different take:
my %records;
while ( <> ) {
my #cols = split ' ';
my $rec = $records{ $cols[0] } ||= {};
$rec->{number_occurrence_column_1}++;
$rec->{sum_of_column_2} += $cols[1];
$rec->{sum_of_column_3} += $cols[2];
}
foreach my $rec ( map { { col1 => $_, %{ $records{ $_ } } }
sort keys %records
) {
print join( "\t"
, #$rec{ qw<col1 number_occurrence_column_1
sum_of_column_2 sum_of_column_3
>
}
), "\n"
;
}