I have a script that downloads data from a database.
But I am having trouble formatting the data into rows.
#!/perl/bin/perl
use FOOConf;
FOOConf::makeDBConnection(production);
$dbh=$EVTConf::dbh;
use Data::Dumper ;
my %extend_hash = %{#_[0]};
my $query = "select level_id,e_risk_symbol,e_exch_dest,penny,specialist from etds_extend";
if(!$dbh) {
print "Error connecting to DataBase; $DBI::errstr\n";
}
my $cur_msg = $dbh->prepare($query) or die "\n\nCould not prepare statement: ".$dbh->errstr;
$cur_msg->execute();
my (#row);
while (#row = $cur_msg->fetchrow_array ) {
#foreach $row(#row) {
#print "$row \n" ;
printf "%-8s %-4s %-2s %-2s %-2s\n ", $row[0], $row[1], $row[2], $row[3], $row[4], $row[5];
#printf "%-12s %6.2f\n", $row[0], $row[3];
#for (my $i = 0; $i < scalar(#row); $i++) {
# printf "%-12s = %s\n", $cur_msg->{NAME}[$i], $row[$i];
# }
#}
}
i am using this to format the rows. The format is all mesed up
printf "%-8s %-4s %-2s %-2s %-2s\n ", $row[0], $row[1], $row[2], $row[3],
this is what the format is :
5 MRO CS 1 0
5 FFIV CS 1 0
5 GM CS 1 0
5 MCP CS 1 0
5 RVBD CS 1 0
6 OIS_SPIN XISX 0 1
6 CVEO XISX 0 1
6 MRVL AMXO 0 1
6 MRX AMXO 0 1
6 MS XISX 0 1
6 MTG XISX 0 1
if I just use this loop
while (#row = $cur_msg->fetchrow_array ) {
foreach $row(#row) {
print "$row \n" ;
}
}
each row gets printed out one line at a time - the thread starts with a 5 - which seems to screw up the formatting. How do i factor this out first 5 ?
[ walt]$ ./test_db_data_format.very_simple | head -20
5
MRO
CS
1
0
5
FFIV
CS
1
0
5
GM
CS
1
0
5
MCP
CS
1
0
when i use I this loop this format - which is really nice.
That level id comes in at beginning without being attached to a symbol - screwing me up.
while (#row = $cur_msg->fetchrow_array ) {
for (my $i = 0; $i < scalar(#row); $i++) {
printf "%-12s = %s\n", $cur_msg->{NAME}[$i], $row[$i];
}
}
This is the results with the rows from the data base:
LEVEL_ID = 5
E_RISK_SYMBOL = MRO
E_EXCH_DEST = CS
PENNY = 1
SPECIALIST = 0
LEVEL_ID = 5
E_RISK_SYMBOL = FFIV
E_EXCH_DEST = CS
PENNY = 1
SPECIALIST = 0
LEVEL_ID = 5
E_RISK_SYMBOL = GM
E_EXCH_DEST = CS
PENNY = 1
SPECIALIST = 0
LEVEL_ID = 5
What I need is the E_RISK_SYMBOL to start and a newline after LEVEL_ID to start a new row.
It should look just like this just like this:
MRO CS 1 0 5
FFIV CS 1 0 5
GM CS 1 0 5
MCP CS 1 0 5
RVBD CS 1 0 6
OIS_SPIN XISX 0 1 6
It looks like your primary problem was just a trailing space in your format string after the \n and the fact that you were printing the $row[0] first instead of last.
You can also simplify your code by including the my declaration in the while (COND) and also using an array slice instead of listing out a bunch of individual array elements.
while (my #row = $cur_msg->fetchrow_array ) {
printf "%-8s %-4s %-2s %-2s %-2s\n", #row[1..4,0];
}
Note: You were also passing 6 values to a format string with only 5 spots in your first code. If you actually want the 6th variable to be displayed, you'll have to specify its format as well.
while (#row = $cur_msg->fetchrow_array ) {
printf "%-8s %-4s %-2s %-2s %-2s\n", $row[1], $row[2], $row[3], $row[4], $row[0];
}
Personally, I'd probably fetch it as a hashref to aide in making it more readable/understandable/maintainable.
while (my $row = $cur_msg->fetchrow_hashref ) {
printf "%-8s %-4s %-2s %-2s %-2s\n", $row->{e_risk_symbol},
$row->{e_exch_dest},
$row->{penny},
$row->{specialist},
$row->{level_id};
}
Related
I have a tab delimited text file, something like this:
data S1 S2 S3 S4 S5 S6
data1 0 0 0 0 0 0
data2 0 5 3 5 0.1 0.9
data3 0 3 9 3 0 0.01
data4 0 0 4 4 0 0
data5 2 5 11 7 5 0.2
data6 0 0 0 8. 0 0
data7 0 1 5 2 06 0.04
Well, the structure of the file is just little more complex, is a metagenomics file, something like:
D_0__Archaea;D_1__Euryarchaeota;D_2__Thermoplasmata;D_3__Thermoplasmatales;D_4__Marine Group II;D_5__uncultured archaeon 0 0 0 0 0 0 0 0 0.0035 0.00293 0.00834 0
from D_0__ to D_5__ = first column (data in the example)
each number represent each column (S)
but at the end, is similar !!!!
what I want, is to use the %row hash to extract the first line (data) with a single #label_match (s3), and print out in a single txt file, I mean, if I want s3 and s6, print out something like this:
S3_file.txt (take the name of each column to print out the name of the file):
s3 data #avoid this line in the print out, just to explain !!!
0 data1
3 data2
9 data3
4 data4
11 data5
0 data6
5 data7
and
S6_file.txt:
0 data1
0.9 data2
0.01 data3
0 data4
0.2 data5
0 data6
0.04 data7
I have this code, and I think that in the %row section i have to make a foreach loop, to extract each #label_match one by one, but I don't know how.
This is my code:
#!/usr/bin/env perl
use strict;
use warnings;
use List::MoreUtils qw(uniq);
use Data::Dumper qw(Dumper);
use Getopt::Long;
use List::Util qw(sum);
my ($infile_taxon, $search_label, $output_file, $help, $help_desc, $options, $options_desc, $keep_file);
GetOptions (
't=s' =>\$infile_taxon,
's=s' =>\$search_label,
'kf' =>\$keep_file,
'o=s' =>\$output_file,
'h' =>\$help,
'op' =>\$options
);
#---------------------------Subrutin to clean the selected Taxon --------------------
sub Taxon_Clean {
my (#clean_result);
foreach (#_){
chomp;
if ($_ =~ s/D_0__//g | s/;D_\d__/\t/g | s/;/\t/g){
push #clean_result, $_;
}
}
return #clean_result;
}
#------------------------------------------------------ Open Files-------------------
open INFILE_TAXONOMY, '<', "$infile_taxon" or die $!;
my (#taxon, #sample_names);
#------------------------------------------------------ Taxon -----------------------
my ( #header, #label_match, #not_match, #taxon_filter);
while (<INFILE_TAXONOMY>){
chomp;
if ($_=~ m/^$|Constructed from biom file/g) {
next;
}
elsif ($_=~ s/OTU ID/Taxon/g){
chomp ( #header = split '\t', $_ );
#------------------------------------------------------ Search Label ----------------
if ($search_label){
my #label_wanted= split (/\,/, $search_label);
unshift #label_wanted, '#Taxon';
#label_wanted = uniq (#label_wanted);
foreach (#label_wanted){
my $unit =$_;
chomp $unit;
if (my #match_wanted= grep (/$unit/, #header)){
push (#label_match, #match_wanted);
}
else {
push (#not_match, $unit);
}
}
# --------- Check Point ---------
push (my #defined_elements, #label_match);
shift #defined_elements;
if (! #defined_elements){
print "\n\tNON of the Search Samples \" $search_label \" "
. "Were Found in \" $infile_taxon \" File !!!\n\n";
exit;
}
elsif (grep {defined($_)} #defined_elements){
if (grep {defined($_)} #not_match){
print "\n\tSamples No Found: #not_match\n\n";
}
}
}
}
elsif ($_=~ m/^#/g){
next;
}
elsif ($search_label) {
my %row;
#row{#header} = split '\t';
my #filter= join "\t", #row{#label_match}, "\n";
push (#taxon_filter, #filter);
#print Dumper (\%row);
}
else {
push (#taxon, $_);
}
}
# The Next section is to extract all the wanted columns in a single file,
# but here is where I want to extract one by one column i a separate file !!!
open OUTPUT, '>', "Taxonomic_results_file.txt", or die "can't create the output file";
foreach (#taxon_filter){
chomp $_;
my ($tax, #values) = split '\t', $_;
my $unit_val = join("\t", map { $_ } #values);
my $sum_elements = sum (#values);
if ($sum_elements == 0){
next;
}
else {
push (my #tx, $tax);
#tx = Taxon_Clean (#tx);
print OUTPUT "$unit_val\t#tx\n";
}
}
close INFILE_TAXONOMY;
close OUTPUT;
exit;
Thanks So Much
You're already a lot of the way there with the #row{#header} type syntax. That takes a hash-slice, which means you can match multiple elements based on hash keys.
Output works much the same
open ( my $s3_file, '>', 'S3_file.txt' ) or warn $!;
my #output_fields = qw ( s3 data ); #matches column headings
And lower down inside the %row block:
print {$s3_file} join ("\t", #row{#output_fields} )), "\n";
I'm trying to figure out why this keeps printing the "majority element" candidate in every cycle.
The code I've been trying to make work is a Majority Element search (to find an element that is repeated more than half of the length of a list).
I can't separate the processes of finding the candidate and testing against the array because my input is a text file that has an indeterminate number of arrays. It's an exercise from rosalind.info that has different inputs every time you try to solve it.
An example of the input would be
-5 5 5 5 5 5 5 5 -8 7 7 7 1 7 3 7 -7 1 6 5 10 100 1000 1 -5 1 6 7 1 1 10 1
Here's what I've written so far.
foreach my $currentrow (#lists) {
my #row = ();
#row = split( /\s/, $currentrow );
my $length = $#row;
my $count = 0;
my $i = 0;
for $i ( 0 .. $length - 1 ) {
if ( $count == 0 ) {
$candidate = $row[$i];
$count++;
}
if ( ( $count > 0 ) and ( $i = $length - 1 ) ) {
my $counter2 = 0;
for my $j ( 0 .. $length - 1 ) {
if ( $row[$j] == $candidate ) {
$counter2++;
}
}
if ( $counter2 <= ( $#row / 2 ) and ( $i = $length - 1 ) ) {
$candidate = -1;
print $candidate, " ", $i, " ";
}
if ( $counter2 > ( $#row / 2 ) and ( $i = $length - 1 ) ) {
print $candidate, " ", $i, " ";
}
}
if ( $candidate == $row[$i] and $count > 0 ) {
$count = $count + 1;
}
if ( $candidate != $row[$i] and $count > 0 ) {
$count = $count - 1;
}
}
}
Do you have use strict and use warnings 'all' in place?
I imagine that your problem may be because of the test $i = $length - 1, which is an assignment, and should be $i == $length - 1
To find a majority element I would use a hash:
perl -nae '%h=(); $h{$_}+=2 for #F; $h{$_}>#F and print for keys %h; print "\n"'
Each line of input is treated separately. Each line of output matches a line of input and presents its majority element or is empty if there is no such element.
Edit: Now the solution uses autosplit (-a), which is shorter and work not only for numbers.
I'm having some trouble trying to figure out how to attack this problem. I have a file that looks like :
1 1 1 1 1 1 2 a
1 1 1 3 4 4 4 a
1 1 1 4 4 4 2 a
2 2 2 3 3 3 2 b
2 2 2 1 1 1 1 b
Which, I want to count how many times the first 3 numbers repeat consecutively. As in I load the data set into perl, and it will spit out:
3
2
as the string '1 1 1' was found at the start of a line 3 times in a row, and the string '2 2 2' was found to begin a line 2 times in a row. Either string may appear later in the file, which complicates things. As well, I will not know what the beginning string is either, and it varies.
Honestly, I'm just really confused on how to attack this. If anyone can help conceptualize this/ give me some pseudo-code to help, that would be great.
EDIT: As per Borodins answer, which works perfectly for what I asked. However, if I wanted to print the count plus which letter it is, How could I do that? So far,
my ( $inp, $outp) = qw / OUT2 OUTFILE/;
open my $input, '<', $inp or die;
open my $output, '>', $outp or die;
my ($last_key, $count);
while ( <$input> ) {
my $key = join ' ', (split)[0..2];
my $id = join ' ', (split)[7];
if ( defined $last_key and $key eq $last_key ) {
++$count;
}
else {
printf "%s %d $id\n", $last_key, $count if defined $last_key;
$last_key = $key;
$count = 1;
}
printf "%s %d $id\n", $last_key, $count if eof;
}
Which gives :
1 1 1 3 b
2 2 2 2 b
which isn't quite what I'm aiming for.
Thanks!
EDIT2
Got what I wanted working. Oftentimes, all it takes is asking for help to figure it out yourself.
Updated code:
my ( $inp, $outp) = qw / OUT2 OUTFILE/;
open my $input, '<', $inp or die;
open my $output, '>', $outp or die;
my ($last_key, $count, $last_id);
while ( <$input> ) {
my $key = join ' ', (split)[0..2];
my $id = join ' ', (split)[7];
if ( defined $last_key and $key eq $last_key ) {
++$count;
}
else {
printf "%s %d $last_id\n", $last_key, $count if defined $last_key;
$last_key = $key;
$count = 1;
$last_id = $id;
}
printf "%s %d $id\n", $last_key, $count if eof;
}
on:
1 1 1 1 1 1 2 a
1 1 1 3 4 4 4 a
1 1 1 4 4 4 2 a
2 2 2 3 3 3 2 b
2 2 2 1 1 1 1 b
3 3 3 2 5 4 2 c
gives:
1 1 1 3 a
2 2 2 2 b
3 3 3 1 c
Thanks all!
You just have to keep track of the previous line, or at least the relevant part of it, and how many times it was seen:
use strict;
use warnings;
my $count = 0;
my $last_prefix = '';
my $last_value = '';
while (my $line = <>) {
my ($prefix, $value) = $line =~ /^(\S+\s+\S+\s+\S+).*(\S+)/ or die "malformed line $line";
if ($prefix ne $last_prefix) {
if ($count) {
print "$count $last_value\n";
}
$last_prefix = $prefix;
$last_value = $value;
$count = 0;
}
++$count;
}
if ($count) {
print "$count $last_value\n";
}
This is just a matter of forming a key from the first three fields and counting the number of times they occur, printing a line of output whenever the key changes or the end of the file is reached
use strict;
use warnings;
my ($last_key, $count);
while ( <DATA> ) {
my $key = join ' ', (split)[0..2];
if ( defined $last_key and $key eq $last_key ) {
++$count;
}
else {
printf "%s -> %d\n", $last_key, $count if defined $last_key;
$last_key = $key;
$count = 1;
}
printf "%s -> %d\n", $last_key, $count if eof;
}
__DATA__
1 1 1 1 1 1 2 a
1 1 1 3 4 4 4 a
1 1 1 4 4 4 2 a
2 2 2 3 3 3 2 b
2 2 2 1 1 1 1 b
3 3 3 1 1 1 1 c
output
1 1 1 -> 3
2 2 2 -> 2
3 3 3 -> 1
Update
To include the final column in the output data, just change
my $key = join ' ', (split)[0..2]
to
my $key = join ' ', (split)[0..2,-1]
output
1 1 1 a -> 3
2 2 2 b -> 2
3 3 3 c -> 1
Here is one way of doing it:
# Open file and loop through lines
open (INFH, '<', "num.txt");
my $count = 0;
my $str;
my %countHash;
while(<INFH>){
# split the line using space characters to get first three numbers
my #numArray = split(' ', $_);
#Concatenating first three numbers as a string to use as key
$key = "$numArray[0]" . "$numArray[1]" . "$numArray[2]";
#If the combination exists, update the value by adding 1. Else add new
if (!exists $countHash{$key}){
$countHash{$key} = 1;
}else{
$countHash{$key} += 1;
}
}
print %countHash;
I will update if I can make it any better.
I have a file with columns of numbers:
1 0.0 0.0
2 0.0 0.0
3 15.2 0.0
4 7.0 9.0
5 0.0 3.0
6 1.0 0.0
7 0.0 2.5
8 0 0 0 0
I need to find the sum of numbers from row 3 to 7 of the right two columns. So for column2 i want to sum 15.2, 7.0 and 1.0. For column3 i want to sum 9.0, 3.0 and 2.5. I need to maintain the single decimal point format.
code:
While (<INPUT>){
my #a = split;
my $c2 .= $a[1];
my $c3 .= $a[2];
my $c2_string = substr($c2, 2, 5);
my $c3_string = substr($c3, 2, 5);
my #sumarray = split ('', $c2);
#then loop through each element and add them up.
This doesnt seem to work. How can i maintain separation of each number while maintaining the decimal format?
For c2, wrong Output:
1
5
.
2
7
.
0
0
.
0
etc
Desired Output:
c2=23.2
c3=14.5
my $x = my $y = 0;
while (<INPUT>) {
my #a = split;
($a[0] >=3 and $a[0] <=7) or next;
$x += $a[1];
$y += $a[2];
}
print "c2=$x\n", "c3=$y\n";
perl -lane'
($F[0] >=3 and $F[0] <=7) or next;
$x += $F[1]; $y += $F[2];
END{ print for "c2=$x","c3=$y" }
' file
my #data;
while (<INPUT>) {
push #data, [ split ];
}
my ($sum2, $sum3);
for (my $i = 2; $i < 7; $i++) {
$sum2 += $data[$i][1];
$sum3 += $data[$i][2];
}
print "$sum2, $sum3\n";
Output:
23.2, 14.5
And this one does not create an array for the entire file:
my ($sum2, $sum3);
while (<INPUT>) {
my #v = split;
if ($v[0] > 2 && $v[0] < 8) {
$sum2 += $v[1];
$sum3 += $v[2];
}
}
#!/usr/bin/perl -w
use strict;
my $infile = 'in.txt';
open my $input, '<', $infile or die "Can't open to $infile: $!";
my ($col1, $sum_col2, $sum_col3 );
while (<$input>) {
my (#cols) = split;
$col1 = $cols[0];
$sum_col2 += $cols[1] if $col1 == 3 .. 7;
$sum_col3 += $cols[2] if $col1 == 3 .. 7;
}
print "Column2: $sum_col2\n";
print "Column3: $sum_col3\n";
Output:
Column2: 23.2
Column3: 14.5
I have the following sparse matrix A.
2 3 0 0 0
3 0 4 0 6
0 -1 -3 2 0
0 0 1 0 0
0 4 2 0 1
Then I would like to capture the following information from there:
cumulative count of entries, as matrix is scanned columnwise.
Yielding:
Ap = [ 0, 2, 5, 9, 10, 12 ];
row indices of entries, as matrix is scanned columnwise.
Yielding:
Ai = [0, 1, 0, 2, 4, 1, 2, 3, 4, 2, 1, 4 ];
Non-zero matrix entries, as matrix is scanned columnwise.
Yielding:
Ax = [2, 3, 3, -1, 4, 4, -3, 1, 2, 2, 6, 1];
Since the actual matrix A is potentially very2 large, is there any efficient way
in Perl that can capture those elements? Especially without slurping all matrix A
into RAM.
I am stuck with the following code. Which doesn't give what I want.
use strict;
use warnings;
my (#Ax, #Ai, #Ap) = ();
while (<>) {
chomp;
my #elements = split /\s+/;
my $i = 0;
my $new_line = 1;
while (defined(my $element = shift #elements)) {
$i++;
if ($element) {
push #Ax, 0 + $element;
if ($new_line) {
push #Ai, scalar #Ax;
$new_line = 0;
}
push #Ap, $i;
}
}
}
push #Ai, 1 + #Ax;
print('#Ax = [', join(" ", #Ax), "]\n");
print('#Ai = [', join(" ", #Ai), "]\n");
print('#Ap = [', join(" ", #Ap), "]\n");
A common strategy for storing sparse data is to drop the values you don't care about (the zeroes) and to store the row and column indexes with each value that you do care about, thus preserving their positional information:
[VALUE, ROW, COLUMN]
In your case, you can economize further since all of your needs can be met by processing the data column-by-column, which means we don't have to repeat COLUMN for every value.
use strict;
use warnings;
use Data::Dumper;
my ($r, $c, #dataC, #Ap, #Ai, #Ax, $cumul);
# Read data row by row, storing non-zero values by column.
# $dataC[COLUMN] = [
# [VALUE, ROW],
# [VALUE, ROW],
# etc.
# ]
$r = -1;
while (<DATA>) {
chomp;
$r ++;
$c = -1;
for my $v ( split '\s+', $_ ){
$c ++;
push #{$dataC[$c]}, [$v, $r] if $v;
}
}
# Iterate through the data column by column
# to compute the three result arrays.
$cumul = 0;
#Ap = ($cumul);
$c = -1;
for my $column (#dataC){
$c ++;
$cumul += #$column;
push #Ap, $cumul;
for my $value (#$column){
push #Ax, $value->[0];
push #Ai, $value->[1];
}
}
__DATA__
2 3 0 0 0
3 0 4 0 6
0 -1 -3 2 0
0 0 1 0 0
0 4 2 0 1
This is what you are looking for, I guess:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper::Simple;
my #matrix;
# Populate #matrix
while (<>) {
push #matrix, [ split /\s+/ ];
}
my $columns = #{ $matrix[0] };
my $rows = #matrix;
my ( #Ap, #Ai, #Ax );
my $ap = 0;
for ( my $j = 0 ; $j <= $rows ; $j++ ) {
for ( my $i = 0 ; $i <= $columns ; $i++ ) {
if ( $matrix[$i]->[$j] ) {
$ap++;
push #Ai, $i;
push #Ax, $matrix[$i]->[$j];
}
}
push #Ap, $ap;
}
print Dumper #Ap;
print Dumper #Ai;
print Dumper #Ax;
Updated based on FM's comment. If you do not want to store any of the original data:
#!/usr/bin/perl
use strict;
use warnings;
my %matrix_info;
while ( <DATA> ) {
chomp;
last unless /[0-9]/;
my #v = map {0 + $_ } split;
for (my $i = 0; $i < #v; ++$i) {
if ( $v[$i] ) {
push #{ $matrix_info{$i}->{indices} }, $. - 1;
push #{ $matrix_info{$i}->{nonzero} }, $v[$i];
}
}
}
my #cum_count = (0);
my #row_indices;
my #nonzero;
for my $i ( sort {$a <=> $b } keys %matrix_info ) {
my $mi = $matrix_info{$i};
push #nonzero, #{ $mi->{nonzero} };
my #i = #{ $mi->{indices} };
push #cum_count, $cum_count[-1] + #i;
push #row_indices, #i;
}
print(
"\#Ap = [#cum_count]\n",
"\#Ai = [#row_indices]\n",
"\#Ax = [#nonzero]\n",
);
__DATA__
2 3 0 0 0
3 0 4 0 6
0 -1 -3 2 0
0 0 1 0 0
0 4 2 0 1
Output:
C:\Temp> m
#Ap = [0 2 5 9 10 12]
#Ai = [0 1 0 2 4 1 2 3 4 2 1 4]
#Ax = [2 3 3 -1 4 4 -3 1 2 2 6 1]
Ap is easy: simply start with zeroes and increment each time you meet a nonzero number. I don't see you trying to write anything into #Ap, so it's no surprise it doesn't end up as you wish.
Ai and Ax are trickier: you want a columnwise ordering while you're scanning rowwise. You won't be able to do anything in-place since you don't know yet how many elements the columns will yield, so you can't know in advance the elements' position.
Obviously, it would be a hell lot easier if you could just alter the requirement to have a rowwise ordering instead. Failing that, you could get complex and collect (i, j, x) triplets. While collecting, they'd naturally be ordered by (i, j). Post-collection, you'd just want to sort them by (j, i).
The code you provided works on a row-by-row basis. To get results sequential by columns you have to accumulate your values into separate arrays, one for each column:
# will look like ([], [], [] ...), one [] for each column.
my #columns;
while (<MATRIX>) {
my #row = split qr'\s+';
for (my $col = 0; $col < #row; $col++) {
# push each non-zero value into its column
push #{$columns[$col]}, $row[$col] if $row[$col] > 0;
}
}
# now you only need to flatten it to get the desired kind of output:
use List::Flatten;
#non_zero = flat #columns;
See also List::Flatten.