Perl get hash keys value count in single hash itself - perl

I have written a script which processes data from $data variable and get the count of values of each key.
The script works fine but I am producing 2 hashes called %data_hash, %count_hash. One for storing the data and another to get the count of those key values.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $data = "KEY1,VAL1
KEY2,VAL1
KEY1,VAL2
KEY1,VAL3
KEY1,VAL4
KEY2,VAL2
KEY2,VAL3
KEY2,VAL4
KEY1,VAL5
";
my (%data_hash, %count_hash);
foreach my $each_data (split /\n/, $data){
my ($key, $val) = (split /,/, $each_data);
push( #{$data_hash{$key}}, $val );
}
print Dumper(\%data_hash);
foreach my $key (sort keys %data_hash) {
$count_hash{$key} = scalar #{$data_hash{$key}};
}
print Dumper(\%count_hash);
Can I have a single hash instead of 2 and get the count by retaining the data?

Sure you can:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $data = "KEY1,VAL1
KEY2,VAL1
KEY1,VAL2
KEY1,VAL3
KEY1,VAL4
KEY2,VAL2
KEY2,VAL3
KEY2,VAL4
KEY1,VAL5
";
my (%data_hash, %count_hash);
foreach my $each_data (split /\n/, $data){
my ($key, $val) = (split /,/, $each_data);
push( #{$data_hash{$key}{vals}}, $val );
$data_hash{$key}{num_vals}++;
}
print Dumper(\%data_hash);

Related

Reading text file into hash and accessing values perl

I am trying to read text file content into hash but having some problem reading as well as accessing it.
resctrl_top
/path/to/a/
vdm05top
/path/to/b/
/path/to/c/
/path/to/d/
/path/to/e/
/path/to/f/
The file format will be as above. My desired output is a hash with the non spacing line as key, and the path lines as values. I would like to know also how to access each values for different keys.
resctrl_top => /path/to/a/
vdm05top => /path/to/b/,/path/to/c/,...
Below are the effort I tried:
use strict;
use warnings;
my %hash;
open FILE, "filename.txt" or die $!;
my $key;
while (my $line = <FILE>) {
chomp($line);
if ($line !~ /^\s/) {
($key) = $line =~ /^\S+/g;
$hash{$key} = [];
} else {
$line =~ s/^\s+//;
push #{ $hash{$key} }, $line;
}
}
close FILE;
foreach (keys %hash){
print "$key => $hash{$key}\n";
}
Try this way:
use strict;
use warnings;
use Data::Dumper;
my %hash;
my $key;
while (my $line = <DATA>) {
chomp($line);
if ($line !~ /^\s/) {
$key = $line;
} else {
$line =~ s/\s//g;
push (#{$hash{$key}} , $line);
}
}
my %final;
foreach my $k (keys %hash){
my $val = join(",", #{$hash{$k}});
$final{$k} = $val; #New hash will have key and respective values
}
print Dumper(\%final);
__DATA__
resctrl_top
/path/to/a/
vdm05top
/path/to/b/
/path/to/c/
/path/to/d/
/path/to/e/
/path/to/f/
Result:
$VAR1 = {
'vdm05top' => '/path/to/b/,/path/to/c/,/path/to/d/,/path/to/e/,/path/to/f/',
'resctrl_top' => '/path/to/a/'
};
Hope this solves your problem.
Here's a pretty simple solution.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Data::Dumper; # Just for output
my ($key, %hash); # Declare globals
while (<DATA>) { # Quick hack - read from DATA
chomp;
if (/^\s/) { # If the line starts with a space
s/^\s+//;
push #{$hash{$key}}, $_;
} else { # The line is a key
$key = $_;
}
}
say Dumper \%hash;
__DATA__
resctrl_top
/path/to/a/
vdm05top
/path/to/b/
/path/to/c/
/path/to/d/
/path/to/e/
/path/to/f/

How to separate an array in Perl based on pattern

I am trying to write a big script but I am stuck on a part. I want to sprit an array based on ".."
From the script I got this:
print #coordinates;
gene complement(872..1288)
my desired output:
complement 872 1288
I tried:
1) my #answer = split(.., #coordinates)
print("#answer\n");
2) my #answer = split /../, #coordinates;
3) print +(split /\../)[-1],[-2],[-3] while <#coordinates>
4) foreach my $anwser ( #coordinates )
{$anwser =~ s/../"\t"/;
print $anwser;}
5) my #answer = split(/../, "complement(872..1288)"); #to see if the printed array is problematic.
which prints:
) ) ) ) ) ) ) ) )
6) my #answer = split /"gene "/, #coordinates; # I tried to "catch" the entire output's spaces and tabs
which prints
0000000000000000000000000000000001000000000100000000
But none of them works. Does anyone has any idea how to step over this issue?
Ps, unfortunately, I can't run my script right now on Linux so I used this website to run my script. I hope this is not the reason why I didn't get my desired output.
my $RE_COMPLEMENT = qr{(complement)\((\d+)\.\.(\d+)\)}msx;
for my $item (#coordinates) {
my ($head, $i, $j) = $item =~ $RE_COMPLEMENT;
if (defined($head) && defined($i) && defined($j)) {
print("$head\t$i\t$j\n");
}
}
split operates on a scalar, not on an array.
my $string = 'gene complement(872..1288)';
my #parts = split /\.\./, $string;
print $parts[0]; # gene complement(872
print $parts[1]; # 1288)
To get the desired output, you can use a substitution:
my $string = 'gene complement(872..1288)';
$string =~ s/gene +|\)//g;
$string =~ s/\.\./ /;
$string =~ s/\(/ /;
Desired effect can be achieved with
use of tr operator to replace '(.)' => ' '
then splitting data string into element on space
storing only required part of array
output elements of array joined with tabulation
use strict;
use warnings;
use feature 'say';
my $data = <DATA>;
chomp $data;
$data =~ tr/(.)/ /;
my #elements = (split ' ', $data)[1..3];
say join "\t", #elements;
__DATA__
gene complement(872..1288)
Or as an alternative solution with only substitutions (without splitting data string into array)
use strict;
use warnings;
use feature 'say';
my $data = <DATA>;
chomp $data;
$data =~ s/gene\s+//;
$data =~ s/\)//;
$data =~ s/[(.]+/\t/g;
say $data;
__DATA__
gene complement(872..1288)
Output
complement 872 1288

How to print $_

How can I print $_ outside using grep? I want to print the data inside %try_1 if it match with %try
.........
#!/usr/bin/perl
use strict;
use warnings;
use Tie::Autotie 'Tie::IxHash';
tie my %try, 'Tie::IxHash';
$try{STRONG}{ALLIES}='A';
$try{AGILE}{BOBBY}='B';
$try{AGILE}{HOBBY}='B';
$try{SMART}{CAKRA}='C';
$try{SMART}{PHONE}='C';
$try{SMART}{PEOPLE}='C';
my %try_1;
$try_1{STRONGER}='A';
$try_1{AGILER}='B';
$try_1{SMARTER}='C';
foreach my $temp_0 (keys %try)
{
print "\n".$_."\n" if (grep {$_ =~ /$temp_0/i} (keys %try_1));
}
You're almost there. Change:
print "\n".$_."\n" if (grep {$_ =~ /\Q$temp_0/i} (keys %try_1));
To:
print "\n".$_."\n" for grep {/\Q$temp_0/i} keys %try_1;
The \Q handles any special character $temp_0 might contain.

Perl : Need to append two columns if the ID's are repeating

If id gets repeated I am appending app1, app2 and printing it once.
Input:
id|Name|app1|app2
1|abc|234|231|
2|xyz|123|215|
1|abc|265|321|
3|asd|213|235|
Output:
id|Name|app1|app2
1|abc|234,265|231,321|
2|xyz|123|215|
3|asd|213|235|
Output I'm getting:
id|Name|app1|app2
1|abc|234,231|
2|xyz|123,215|
1|abc|265,321|
3|asd|213,235|
My Code:
#! usr/bin/perl
use strict;
use warnings;
my $basedir = 'E:\Perl\Input\\';
my $file ='doctor.txt';
my $counter = 0;
my %RepeatNumber;
my $pos=0;
open(OUTFILE, '>', 'E:\Perl\Output\DoctorOpFile.csv') || die $!;
open(FH, '<', join('', $basedir, $file)) || die $!;
my $line = readline(FH);
unless ($counter) {
chomp $line;
print OUTFILE $line;
print OUTFILE "\n";
}
while ($line = readline(FH)) {
chomp $line;
my #obj = split('\|',$line);
if($RepeatNumber{$obj[0]}++) {
my $str1= join("|",$obj[0]);
my $str2=join(",",$obj[2],$obj[3]);
print OUTFILE join("|",$str1,$str2);
print OUTFILE "\n";
}
}
This should do the trick:
use strict;
use warnings;
my $file_in = "doctor.txt";
open (FF, "<$file_in");
my $temp = <FF>; # remove first line
my %out;
while (<FF>)
{
my ($id, $Name, $app1, $app2) = split /\|/, $_;
$out{$id}[0] = $Name;
push #{$out{$id}[1]}, $app1;
push #{$out{$id}[2]}, $app2;
}
foreach my $key (keys %out)
{
print $key, "|", $out{$key}[0], "|", join (",", #{$out{$key}[1]}), "|", join (",", #{$out{$key}[2]}), "\n";
}
EDIT
To see what the %out contains (in case it's not clear), you can use
use Data::Dumper;
and print it via
print Dumper(%out);
I'd tackle it like this:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
use 5.14.0;
my %stuff;
#extract the header row.
#use the regex to remove the linefeed, because
#we can't chomp it inline like this.
#works since perl 5.14
#otherwise we could just chomp (#header) later.
my ( $id, #header ) = split( /\|/, <DATA> =~ s/\n//r );
while (<DATA>) {
#turn this row into a hash of key-values.
my %row;
( $id, #row{#header} ) = split(/\|/);
#print for diag
print Dumper \%row;
#iterate each key, and insert into $row.
foreach my $key ( keys %row ) {
push( #{ $stuff{$id}{$key} }, $row{$key} );
}
}
#print for diag
print Dumper \%stuff;
print join ("|", "id", #header ),"\n";
#iterate ids in the hash
foreach my $id ( sort keys %stuff ) {
#join this record by '|'.
print join('|',
$id,
#turn inner arrays into comma separated via map.
map {
my %seen;
#use grep to remove dupes - e.g. "abc,abc" -> "abc"
join( ",", grep !$seen{$_}++, #$_ )
} #{ $stuff{$id} }{#header}
),
"\n";
}
__DATA__
id|Name|app1|app2
1|abc|234|231|
2|xyz|123|215|
1|abc|265|321|
3|asd|213|235|
This is perhaps a bit overkill for your application, but it should handle arbitrary column headings and arbitary numbers of duplicates. I'll coalesce them though - so the two abc entries don't end up abc,abc.
Output is:
id|Name|app1|app2
1|abc|234,265|231,321
2|xyz|123|215
3|asd|213|235
Another way of doing it which doesn't use a hash (in case you want to be more memory efficient), my contribution lies under the opens:
#!/usr/bin/perl
use strict;
use warnings;
my $basedir = 'E:\Perl\Input\\';
my $file ='doctor.txt';
open(OUTFILE, '>', 'E:\Perl\Output\DoctorOpFile.csv') || die $!;
select(OUTFILE);
open(FH, '<', join('', $basedir, $file)) || die $!;
print(scalar(<FH>));
my #lastobj = (undef);
foreach my $obj (sort {$a->[0] <=> $b->[0]}
map {chomp;[split('|')]} <FH>) {
if(defined($lastobj[0]) &&
$obj[0] eq $lastobj[0])
{#lastobj = (#obj[0..1],
$lastobj[2].','.$obj[2],
$lastobj[3].','.$obj[3])}
else
{
if($lastobj[0] ne '')
{print(join('|',#lastobj),"|\n")}
#lastobj = #obj[0..3];
}
}
print(join('|',#lastobj),"|\n");
Note that split, without it's third argument ignores empty elements, which is why you have to add the last bar. If you don't do a chomp, you won't need to supply the bar or the trailing hard return, but you would have to record $obj[4].

perl printing hash of arrays with out Data::Dumper

Here is the code, I know it is not perfect perl. If you have insight on how I an do better let me know. My main question is how would I print out the arrays without using Data::Dumper?
#!/usr/bin/perl
use Data::Dumper qw(Dumper);
use strict;
use warnings;
open(MYFILE, "<", "move_headers.txt") or die "ERROR: $!";
#First split the list of files and the headers apart
my #files;
my #headers;
my #file_list = <MYFILE>;
foreach my $source_parts (#file_list) {
chomp($source_parts);
my #parts = split(/:/, $source_parts);
unshift(#files, $parts[0]);
unshift(#headers, $parts[1]);
}
# Next get a list of unique headers
my #unique_files;
foreach my $item (#files) {
my $found = 0;
foreach my $i (#unique_files) {
if ($i eq $item) {
$found = 1;
last;
}
}
if (!$found) {
unshift #unique_files, $item;
}
}
#unique_files = sort(#unique_files);
# Now collect the headers is a list per file
my %hash_table;
for (my $i = 0; $i < #files; $i++) {
unshift #{ $hash_table{"$files[$i]"} }, "$headers[$i]";
}
# Process the list with regex
while ((my $key, my $value) = each %hash_table) {
if (ref($value) eq "ARRAY") {
print "$value", "\n";
}
}
The Perl documentation has a tutorial on "Printing of a HASH OF ARRAYS" (without using Data::Dumper)
perldoc perldsc
You're doing a couple things the hard way. First, a hash will already uniqify its keys, so you don't need the loop that does that. It appears that you're building a hash of files, with the values meant to be the headers found in those files. The input data is "filename:header", one per line. (You could use a hash of hashes, since the headers may need uniquifying, but let's let that go for now.)
use strict;
use warnings;
open my $files_and_headers, "<", "move_headers.txt" or die "Can't open move_headers: $!\n";
my %headers_for_file;
while (defined(my $line = <$files_and_headers> )) {
chomp $line;
my($file, $header) = split /:/, $line, 2;
push #{ $headers_for_file{$file} }, $header;
}
# Print the arrays for each file:
foreach my $file (keys %headers_for_file) {
print "$file: #{ $headers_for_file{$file}}\n";
}
We're letting Perl do a chunk of the work here:
If we add keys to a hash, they're always unique.
If we interpolate an array into a print statement, Perl adds spaces between them.
If we push onto an empty hash element, Perl automatically puts an empty anonymous array in the element and then pushes onto that.
An alternative to using Data::Dumper is to use Data::Printer:
use Data::Printer;
p $value;
You can also use this to customise the format of the output. E.g. you can have it all in a single line without the indexes (see the documentation for more options):
use Data::Printer {
index => 0,
multiline => 0,
};
p $value;
Also, as a suggestion for getting unique files, put the elements into a a hash:
my %unique;
#unique{ #files } = #files;
my #unique_files = sort keys %unique;
Actually, you could even skip that step and put everything into %hash_table in one pass:
my %hash_table;
foreach my $source_parts (#file_list) {
chomp($source_parts);
my #parts = split(/:/, $source_parts);
unshift #{ $hash_table{$parts[0]} }, $parts[1];
}