Double hash entries while printing - perl

As part of a bigger program I've got a hash. I'm testing the program printing the keys but they are all duplicates, I don't know why
while ( my $line = <SEQ> ) {
chomp $line;
$line =~ s/>//;
my ( #split1 ) = split( "\t", $line );
foreach my $chr ( keys %position ) {
#print Dumper \%position;
print "$chr\n";
foreach my $pos ( sort keys %{ $position{$chr} } ) {
if ( $split1[0] =~ /$chr/ ) {
#print "$chr\t$pos\n";
}
}
}
}
%position is a nested hash, when I print the keys on print "$chr\n"; they are all doubled and I don't understand why.
The file opened on handle SEQ looks like this:
>chr1\tACTGTAGTCTCATCCTAT...
>chr2\tACGTAGCTAGT....
and so on

You have a foreach loop inside a while which will print all of the keys of the %position hash for every line of input
That will cause symptoms like what you're describing. Is that what you're looking for?

Start with this:
my %data;
while(<DATA>){
chomp;
my ($chr, $seq) = split;
$data{$chr} = $seq;
}
print Dumper \%data;

Related

How to push an array into nested hash

Does anybody can explain that how can i push array into nested hash. suppose I create a nested hash and want to push an array into key which is a value of another key and also how to access those values of array. pardon me, If i am technically wrong in explaining my query.
Here is part of my code:
if ($line !~ /#/)
{
#region = split /\t/, $line;
#ancestor = split /:/, $region[8];
my #div = split /\s/, $ancestor[0];
push #value, {$region[3],$region[4]};
#print "$region[3],$region[4]\n";
push #{$hash{$div[1]}{$region[0]}},$region[3],$region[4];
}
Here is the code to create the multidimensional hash.
my $filename = 'out.txt';
my %hash = ();
open(my $fh, $filename) or die "Could not open file '$filename' $!";
while (my $line = <$fh>) {
chomp $row;
if ($line !~ /#/)
{
#region = split /\t/, $line;
$len = scalar #region;
if($len >8){
#ancestor = split /:/, $region[8];
my #div = split /\s/, $ancestor[0];
push #value, {$region[3],$region[4]};
push #{$hash{$div[1]}{$region[0]}},[$region[3],$region[4]];
}
}
}
Now to access the hash you can use the below code:
for my $key1 (keys %hash) {
print("Hello $key1");
for my $key2 (keys %{$hash{$key1}}){
print("\t$key2\n");
#res = #{$hash{$key1}{$key2}};
foreach my $arr(#res){
print("\t\t");
print join(",", #{$arr}), "\n";
}
}
print("\n");
}
Hope the above code will work for you.If you need any further help, let me know in comments.
Use square brackets for anonymous arrays. Use curly braces for anonymous hashes.
push #value, [ $region[3], $region[4] ];
push #{$hash{$div[1]}{$region[0]}}, [ $region[3], $region[4] ];

perl variable not storing data outside block

I have written below mention code to read a file and and storing data to array #s_arr.
But when I am trying to print that #s_arr array outside the block it shows nothing.
use Data::Dumper;
my #s_arr;
my #err;
my %sort_h_1;
$fname = '/qv/Error.log';
open( IN, "<$fname" );
foreach $line ( <IN> ) {
if ( $line =~ /CODE\+(\w{3})(\d{5})/ ) {
$a = "$1$2";
push #err, $a;
}
}
close IN;
$prev = "";
$count = 0;
my %hash;
foreach ( sort #err ) {
if ( $prev ne $_ ) {
if ( $count ) {
$hash{$prev} = $count;
}
$prev = $_;
$count = 0;
}
$count++;
}
print Dumper \%hash;
printf( "%s:%d\n", $prev, $count ) if $count;
$hash{$prev} = $count;
my $c = 0;
print "Today Error Count\n";
foreach my $name ( sort { $hash{$b} <=> $hash{$a} } keys %hash ) {
#printf "%-8s %s\n", $name, $hash{$name};
#my %sort_h ;
push #s_arr, $name;
push #s_arr, $hash{$name};
#$sort_h{$name} = $hash{$name} ;
#print Dumper \%sort_h ;
#print Dumper \#s_arr ;
$c++;
if ( $c eq 30 ) {
exit;
}
}
print Dumper \#s_arr; # It's showing nothing
You are calling exit inside of your foreach loop. That makes the program stop, and the print Dumper #s_arr is never reached.
To break out of a loop you need to use last.
foreach my $name ( sort ... ) {
# ...
$c++;
last if $c == 30; # break out of the loop when $c reaches 30
}
I used the postfix variant of if here because that makes it way easier to read. Also note that as zdim pointed out above, you should use the numerical equality check == when checking for numbers. eq is for strings.

Perl Regular expression extract

I'm trying to extract a certain string of numbers from a text file using a regular exression, but when my code runs, it is grabbing the numbers after the slash in the separation between date and time. Here is what I have so far.
while ( <INFILE> ) {
my #fields = split( /\ /, $_ );
my #output;
foreach my $field ( #fields ) {
if ( $field =~ /[0-9]{5}\// ) {
push #output, $field;
}
}
if ( #output ) {
my $line = join( ' ', #output );
print "$line\n";
print OUTFILE "$line\n";
}
}
The line I am trying to extract data from is
D2001235 9204 254/2004 254/1944 254/2041 15254/2011 ALL-V4YM 001 AUTO C-C0000
The data I need is the 15254 but when I run my code it returns 15254/2011 and my program is erroring out.
The problem is that you are storing the entire $field in the output array, but you only want the number to the left of the slash to be stored. You could use capturing parentheses in the regular expression and the $1 special variable. This outputs 15254:
use warnings;
use strict;
while (<DATA>) {
my #fields = split( /\ /, $_ );
my #output;
foreach my $field (#fields) {
if ( $field =~ /^([0-9]{5})\// ) {
push #output, $1;
}
}
if (#output) {
my $line = join( ' ', #output );
print "$line\n";
}
}
__DATA__
D2001235 9204 254/2004 254/1944 254/2041 15254/2011 ALL-V4YM 001 AUTO C-C0000
As explained, you are saving an entire field in #output if it matches the regex, instead of just the first part before the slash
Your split is also unnecessarily complicated, and join isn't needed
All you need is this
while ( <INFILE> ) {
my #output = map m{^([0-9]{5})/}, split;
if ( #output ) {
print "#output\n";
print OUTFILE "#output\n";
}
}

Perl : Need to append two columns if the ID's are repeating

If id gets repeated I am appending app1, app2 and printing it once.
Input:
id|Name|app1|app2
1|abc|234|231|
2|xyz|123|215|
1|abc|265|321|
3|asd|213|235|
Output:
id|Name|app1|app2
1|abc|234,265|231,321|
2|xyz|123|215|
3|asd|213|235|
Output I'm getting:
id|Name|app1|app2
1|abc|234,231|
2|xyz|123,215|
1|abc|265,321|
3|asd|213,235|
My Code:
#! usr/bin/perl
use strict;
use warnings;
my $basedir = 'E:\Perl\Input\\';
my $file ='doctor.txt';
my $counter = 0;
my %RepeatNumber;
my $pos=0;
open(OUTFILE, '>', 'E:\Perl\Output\DoctorOpFile.csv') || die $!;
open(FH, '<', join('', $basedir, $file)) || die $!;
my $line = readline(FH);
unless ($counter) {
chomp $line;
print OUTFILE $line;
print OUTFILE "\n";
}
while ($line = readline(FH)) {
chomp $line;
my #obj = split('\|',$line);
if($RepeatNumber{$obj[0]}++) {
my $str1= join("|",$obj[0]);
my $str2=join(",",$obj[2],$obj[3]);
print OUTFILE join("|",$str1,$str2);
print OUTFILE "\n";
}
}
This should do the trick:
use strict;
use warnings;
my $file_in = "doctor.txt";
open (FF, "<$file_in");
my $temp = <FF>; # remove first line
my %out;
while (<FF>)
{
my ($id, $Name, $app1, $app2) = split /\|/, $_;
$out{$id}[0] = $Name;
push #{$out{$id}[1]}, $app1;
push #{$out{$id}[2]}, $app2;
}
foreach my $key (keys %out)
{
print $key, "|", $out{$key}[0], "|", join (",", #{$out{$key}[1]}), "|", join (",", #{$out{$key}[2]}), "\n";
}
EDIT
To see what the %out contains (in case it's not clear), you can use
use Data::Dumper;
and print it via
print Dumper(%out);
I'd tackle it like this:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
use 5.14.0;
my %stuff;
#extract the header row.
#use the regex to remove the linefeed, because
#we can't chomp it inline like this.
#works since perl 5.14
#otherwise we could just chomp (#header) later.
my ( $id, #header ) = split( /\|/, <DATA> =~ s/\n//r );
while (<DATA>) {
#turn this row into a hash of key-values.
my %row;
( $id, #row{#header} ) = split(/\|/);
#print for diag
print Dumper \%row;
#iterate each key, and insert into $row.
foreach my $key ( keys %row ) {
push( #{ $stuff{$id}{$key} }, $row{$key} );
}
}
#print for diag
print Dumper \%stuff;
print join ("|", "id", #header ),"\n";
#iterate ids in the hash
foreach my $id ( sort keys %stuff ) {
#join this record by '|'.
print join('|',
$id,
#turn inner arrays into comma separated via map.
map {
my %seen;
#use grep to remove dupes - e.g. "abc,abc" -> "abc"
join( ",", grep !$seen{$_}++, #$_ )
} #{ $stuff{$id} }{#header}
),
"\n";
}
__DATA__
id|Name|app1|app2
1|abc|234|231|
2|xyz|123|215|
1|abc|265|321|
3|asd|213|235|
This is perhaps a bit overkill for your application, but it should handle arbitrary column headings and arbitary numbers of duplicates. I'll coalesce them though - so the two abc entries don't end up abc,abc.
Output is:
id|Name|app1|app2
1|abc|234,265|231,321
2|xyz|123|215
3|asd|213|235
Another way of doing it which doesn't use a hash (in case you want to be more memory efficient), my contribution lies under the opens:
#!/usr/bin/perl
use strict;
use warnings;
my $basedir = 'E:\Perl\Input\\';
my $file ='doctor.txt';
open(OUTFILE, '>', 'E:\Perl\Output\DoctorOpFile.csv') || die $!;
select(OUTFILE);
open(FH, '<', join('', $basedir, $file)) || die $!;
print(scalar(<FH>));
my #lastobj = (undef);
foreach my $obj (sort {$a->[0] <=> $b->[0]}
map {chomp;[split('|')]} <FH>) {
if(defined($lastobj[0]) &&
$obj[0] eq $lastobj[0])
{#lastobj = (#obj[0..1],
$lastobj[2].','.$obj[2],
$lastobj[3].','.$obj[3])}
else
{
if($lastobj[0] ne '')
{print(join('|',#lastobj),"|\n")}
#lastobj = #obj[0..3];
}
}
print(join('|',#lastobj),"|\n");
Note that split, without it's third argument ignores empty elements, which is why you have to add the last bar. If you don't do a chomp, you won't need to supply the bar or the trailing hard return, but you would have to record $obj[4].

Retrieving values matching the same ID with perl

This is a simple problem but cannot find any working solution for it. I have 2 files and the first file holds all the ID that I am interested in, for example "tomato", "cucumber", but also the ones I am not interested in, which hold no value in the second file. The second file has the following data structure
tomato red
tomato round
tomato sweet
cucumber green
cucumber bitter
cucumber watery
What I need to get is a file containing all the IDs with all the matching values from the second file, everything tab-seperated, like this:
tomato red round sweet
cucumber green bitter watery
What I did so far is create a hash out of the IDs in the first file:
while (<FILE>) {
chomp;
#records = split "\t", $_;
{%hash = map { $records[0] => 1 } #records};
}
And this for the second file:
while (<FILE2>) {
chomp;
#records2 = split "\t", $_;
$key, $value = $records2[0], $records2[1];
$data{$key} = join("\t", $value);
}
close FILE;
foreach my $key ( keys %data )
{
print OUT "$key\t$data{$key}\n"
if exists $hash{$key}
}
Would be grateful for some simple solution for combining all the values matching the same ID! :)
for th first file:
while (<FILE>) {
chomp;
#records = split "\t", $_;
$hash{$records[0]} = 1;
}
and for the second:
while (<FILE2>) {
chomp;
#records2 = split "\t", $_;
($key,$value) = #records2;
$data{$key} = [] unless exists $data{$key};
push #{$data{$key}}, $value;
}
close FILE;
foreach my $key ( keys %data ) {
print OUT $key."\t".join("\t", #{$data{$key}})."\n" if exists $hash{$key};
}
This seems to do what is needed
use strict;
use warnings;
my %data;
open my $fh, '<', 'file1.txt' or die $!;
while (<$fh>) {
$data{$1} = {} if /([^\t]+)/;
}
open $fh, '<', 'file2.txt' or die $!;
while (<$fh>) {
$data{$1}{$2}++ if /^(.+?)\t(.+?)$/ and exists $data{$1};
}
while ( my ($key, $values) = each %data) {
print join("\t", $key, keys %$values), "\n";
}
output
tomato sweet round red
cucumber green watery bitter
It's easier if you read the data mapping first.
Also, if you are using Perl, you should consider from the get-go leveraging one its main strengths - CPAN libraries. For example, the reading in of the file is as simple as read_file() from File::Slurp; instead of having to open/close the file yourself and then run a while(<>) loop.
use File::Slurp;
my %data;
my #data_lines = File::Slurp::read_file($filename2);
chomp(#data_lines);
foreach my $line (#data_lines) { # Improved version from CyberDem0n's answer
my ($key, $value) = split("\t", $line);
$data{$key} ||= []; # Make sure it's an array reference if first time
push #{ $data{$key} }, $value;
}
my #id_lines = File::Slurp::read_file($filename1);
chomp(#id_lines);
foreach my $id (#id_lines) {
print join("\t", ( $id, #{ $data{$id} } ) )."\n";
}
A slightly more hacky but a bit shorter code adds the ID to the list of values in the data hash from the get go:
my #data_lines = File::Slurp::read_file($filename2);
chomp(#data_lines);
foreach my $line (#data_lines) { # Improved version from CyberDem0n's answer
my ($key, $value) = split("\t", $line);
$data{$key} ||= [ $id ]; # Add the ID for printing
push #{ $data{$key} }, $value;
}
my #id_lines = File::Slurp::read_file($filename1);
chomp(#id_lines);
foreach my $id (#id_lines) {
print join("\t", #{ $data{$id} } ) ."\n"; # ID already in %data!
}