Related
I want to write a script that can compare two multi-dimension hash to see whether they match or not. This two hash has same value of key, the comparison will done in a pair mean starting with $j=0 first compare $line[0] $line[1] then follow by $line[2] $line[3]
$hash{"key"}{$key}{"$row $col"}= "$line[$j] $line[$j+1]";
$hash1{"key"}{$key1}{"$row1 $col1"}="$line1[$j] $line1[$j+1]";
my $line and $line1 is take from a huge file and will process the huge file line by line. For example:
#line = 1 2 3 4 5 6 7 8 #fst line from file1
#line1 = 1 2 3 3 4 5 6 7 7 #fst line from file2
when $key and $key1 match the scripts will continue to compare the $row $col then lastly compare $line[$j] $line[$j+1]. I am trying to use one of the module Test::More that mention at previous similar question but it can only compare the 1st not matching and the output is in default format.
not ok 1 - data structures should be the same
# Failed test 'data structures should be the same'
# at hash_check1.pl line 80.
# Structures begin differing at:
# $got->{key}{100}{2 3} = '62 19'
# $expected->{key}{100}{2 3} = '12 24'
# Tests were run but no plan was declared and done_testing() was not seen.
Any best way to compare this type of multi-dimension hash?
8/22 Edited
If Test::More module is able to compare all the $key match for two hash i can take that as well but if i can output the matching and non matching in my own prefer format will be great. For example i would like to output like
$key at $row $col no match with value $line[$j] $line[$j+1] ( expected value $line1[$j] $line1[$j+1] )
8/22 Edited
Below is part of my code
use strict;
use warnings;
use Test::More;
open ( FILE1 , '<', "file.txt" ) or die $!;
open ( FILE2 , '<' , "file1.txt" ) or die $!;
chomp (my #file1 = <FILE1>);
chomp (my #file2 = <FILE2>);
my %hash=();
my %hash1=();
for ( $i =0 ; $i<=511 ; $i++ ) {
my #line = split(" ",$file1[$i]);
my #line1 = split(" ",$file2[$i]);
my $key = ($i+1)*10;
my $key1 = ($i+2)*10;
for ( $j=0; $j<=15 ; $j+=2){
my $col = hex($j);
my $col1 = hex($j+1);
$hash{"key"}{$key}{"$row $col1"}= "$line[$j] $line[$j+1]";
$hash1{"key"}{$key1}{"$row1 $col1"}= "$line1[$j] $line1[$j+1]";
}
}
##############comparison part start here###################
is_deeply(\%hash, \%hash1, 'data structures should be the same'); #can only print one mismatch
################Any better way?###########
Here is an example that will compare nested hashes with string values:
#! /usr/bin/env perl
use feature qw(say);
use warnings;
use strict;
my %hash;
my %hash1;
$hash{"key"}{A}{"2 3"}= "1 2";
$hash1{"key"}{A}{"2 3"}="2 3";
$hash{"key"}{B}{"2 3"}= "1 2";
$hash1{"key"}{C}{"2 3"}="2 3";
compare(\%hash, \%hash1);
sub compare {
return _compare( #_, "" );
}
sub get_key_str { return $_[0] . '{' . $_[1] . '}' }
sub _process_key {
my ( $h1, $h2, $info, $key, $str1, $str2 ) = #_;
if ( exists $h2->{$key} ) {
my $val1 = $h1->{$key};
my $val2 = $h2->{$key};
if ( ref $val1 eq "HASH" and ref $val2 eq "HASH" ) {
_compare( $val1, $val2, get_key_str( $info, $key ) );
}
else {
die "Expected string value" if ref $val1 or ref $val2;
if ( $val1 ne $val2 ) {
say "Value '$val1' in $str1 hash for key " . get_key_str( $info, $key )
. " does not match value '$val2' in $str2 hash";
}
}
}
else {
my $cur_key = get_key_str( $info, $key );
say "Got key $cur_key in $str1 hash, but missing in $str2 hash";
}
}
sub _compare {
my ( $h1, $h2, $info ) = #_;
my %processed_keys;
for (keys %$h1) {
_process_key( $h1, $h2, $info, $_, "first", "second" );
$processed_keys{$_}++;
}
for (keys %$h2) {
next if exists $processed_keys{$_};
_process_key( $h2, $h1, $info, $_, "second", "first" );
}
}
Output:
Got key {key}{B} in first hash, but missing in second hash
Value '1 2' in first hash for key {key}{A}{2 3} does not match value '2 3' in second hash
Got key {key}{C} in second hash, but missing in first hash
use strict;
use warnings;
my %result_hash = ();
my %final_hash = ();
Compare_results();
foreach my $key (sort keys %result_hash ){
print "$key \n";
print "$result_hash{$key} \n";
}
sub Compare_results
{
while ( <DATA> )
{
my($instance,$values) = split /\:/, $_;
$result_hash{$instance} = $values;
}
}
__DATA__
1:7802315095\d\d,7802315098\d\d;7802025001\d\d,7802025002\d\d,7802025003\d\ d,7802025004\d\d,7802025005\d\d,7802025006\d\d,7802025007\d\d
2:7802315095\d\d,7802025002\d\d,7802025003\d\d,7802025004\d\d,7802025005\d\d,7802025006\d\d,7802025007\d\d
Output
1
7802315095\d\d,7802315098\d\d;7802025001\d\d,7802025002\d\d,7802025003\d\d,7802025004\d\d,7802025005\d\d,7802025006\d\d,7802025007\d\d
2
7802315095\d\d,7802025002\d\d,7802025003\d\d,7802025004\d\d,7802025005\d\d,7802025006\d\d,7802025007\d\d
Iam trying to fetch value of each key and again trying to split the comma seperated value from result hash , if i find a semicolon in any value i would want to store the left and right values in separate hash keys.
Something like below
1.#split the value of result_hash{$key} again by , and see whether any chunk is seperated by ;
2. #every chunk without ; and value on left with ; should be stored in
#{$final_hash{"eto"}} = ['7802315095\d\d','7802315098\d\d','7802025002\d\d','7802025003\d\d','7802025004\d\d','7802025005\d\d','7802025006\d\d','7802025007\d\d'] ;
3.#Anything found on the right side of ; has to be stored in
#{$final_hash{"pro"}} = ['7802025001\d\d'] ;
Is there a way that i can handle everything in the subroutine? Can i make the code more simpler
Update :
I tried splitting the string in a single shot, but its just picking the values with semicolon and ignoring everything
foreach my $key (sort keys %result_hash ){
# print "$key \n";
# print "$result_hash{$key} \n";
my ($o,$t) = split(/,|;/, $result_hash{$key});
print "Left : $o \n";
print "Left : $t \n";
#push #{$final_hash{"eto"}}, $o;
#push #{$final_hash{"pro"}} ,$t;
}
}
My updated code after help
sub Compare_results
{
open my $fh, '<', 'Data_File.txt' or die $!;
# split by colon and further split by , and ; if any (done in insert_array)
my %result_hash = map { chomp; split ':', $_ } <$fh> ;
foreach ( sort { $a <=> $b } (keys %result_hash) )
{
($_ < 21)
? insert_array($result_hash{$_}, "west")
: insert_array($result_hash{$_}, "east");
}
}
sub insert_array()
{
my ($val,$key) = #_;
foreach my $field (split ',', $val)
{
$field =~ s/^\s+|\s+$//g; # / turn off editor coloring
if ($field !~ /;/) {
push #{ $file_data{"pto"}{$key} }, $field ;
}
else {
my ($left, $right) = split ';', $field;
push #{$file_data{"pto"}{$key}}, $left if($left ne '') ;
push #{$file_data{"ero"}{$key}}, $right if($right ne '') ;
}
}
}
Thanks
Update Added a two-pass regex, at the end
Just proceed systematically, analyze the string step by step. The fact that you need consecutive splits and a particular separation rule makes it unwieldy to do in one shot. Better have a clear method than a monster statement.
use warnings 'all';
use strict;
use feature 'say';
my (%result_hash, %final_hash);
Compare_results();
say "$_ => $result_hash{$_}" for sort keys %result_hash;
say '---';
say "$_ => [ #{$final_hash{$_}} ]" for sort keys %final_hash;
sub Compare_results
{
%result_hash = map { chomp; split ':', $_ } <DATA>;
my (#eto, #pro);
foreach my $val (values %result_hash)
{
foreach my $field (split ',', $val)
{
if ($field !~ /;/) { push #eto, $field }
else {
my ($left, $right) = split ';', $field;
push #eto, $left;
push #pro, $right;
}
}
}
$final_hash{eto} = \#eto;
$final_hash{pro} = \#pro;
return 1; # but add checks above
}
There are some inefficiencies here, and no error checking, but the method is straightforward. If your input is anything but smallish please change the above to process line by line, what you clearly know how to do. It prints
1 => ... (what you have in the question)
---
eto => [ 7802315095\d\d 7802315098\d\d 7802025002\d\d 7802025003\d\ d ...
pro => [ 7802025001\d\d ]
Note that your data does have one loose \d\ d.
We don't need to build the whole hash %result_hash for this but only need to pick the part of the line after :. I left the hash in since it is declared global so you may want to have it around. If it in fact isn't needed on its own this simplifies
sub Compare_results {
my (#eto, #pro);
while (<DATA>) {
my ($val) = /:(.*)/;
foreach my $field (split ',', $val)
# ... same
}
# assign to %final_hash, return from sub
}
Thanks to ikegami for comments.
Just for the curiosity's sake, here it is in two passes with regex
sub compare_rx {
my #data = map { (split ':', $_)[1] } <DATA>;
$final_hash{eto} = [ map { /([^,;]+)/g } #data ];
$final_hash{pro} = [ map { /;([^,;]+)/g } #data ];
return 1;
}
This picks all characters which are not , or ;, using the negated character class, [^,;]. So that is up to the first either of them, left to right. It does this globally, /g, so it keeps going through the string, collecting all fields that are "left of" , or ;. Then it cheats a bit, picking all [^,;] that are right of ;. The map is used to do this for all lines of data.
If %result_hash is needed build it instead of #data and then pull the values from it with my #values = values %hash_result and feed the map with #values.
Or, broken line by line (again, you can build %result_hash instead of taking $data directly)
my (#eto, #pro);
while (<DATA>) {
my ($data) = /:(.*)/;
push #eto, $data =~ /([^,;]+)/g;
push #pro, $data =~ /;([^,;]+)/g;
}
I have a csv with about 160,000 lines, it looks like this:
chr1,160,161,3,0.333333333333333,+
chr1,161,162,4,0.5,-
chr1,309,310,14,0.0714285714285714,+
chr1,311,312,2,0.5,-
chr1,499,500,39,0.717948717948718,+
chr2,500,501,8,0.375,-
chr2,510,511,18,0.5,+
chr2,511,512,6,0.333333333333333,-
I would like to pair lines where column 1 is the same, column 3 matches column 2 and where column 6 is a '+' while on the other line it is a '-'. If this is true I would like to sum column 4 and column 5.
My desired out put would be
chr1,160,161,7,0.833333333333333,+
chr1,309,310,14,0.0714285714285714,+
chr1,311,312,2,0.5,-
chr1,499,500,39,0.717948717948718,+
chr2,500,501,8,0.375,-
chr2,510,511,24,0.833333333333333,-
the best solution I can think of is to duplicate the file and then match columns between the file and it's duplicate with perl:
#!/usr/bin/perl
use strict;
use warnings;
open my $firstfile, '<', $ARGV[0] or die "$!";
open my $secondfile, '<', $ARGV[1] or die "$!";
my ($chr_a, $chr_b,$start,$end,$begin,$finish, $sum_a, $sum_b, $total_a,
$total_b,$sign_a,$sign_b);
while (<$firstfile>) {
my #col = split /,/;
$chr_a = $col[0];
$start = $col[1];
$end = $col[2];
$sum_a = $col[3];
$total_a = $col[4];
$sign_a = $col[5];
seek($secondfile,0,0);
while (<$secondfile>) {
my #seccol = split /,/;
$chr_b = $seccol[0];
$begin = $seccol[1];
$finish = $seccol[2];
$sum_b = $seccol[3];
$total_b = $seccol[4];
$sign_b = $seccol[5];
print join ("\t", $col[0], $col[1], $col[2], $col[3]+=$seccol[3],
$col[4]+=$seccol[4], $col[5]),
"\n" if ($chr_a eq $chr_b and $end==$begin and $sign_a ne $sign_b);
}
}
And that works fine, but ideally I'd like to be able to do this within the file itself without having to duplicate it, because I have many files and so I would like to run a script over all of them that is less time-consuming.
Thanks.
In the absence of a response to my comment, this program will do as you ask with the data you provide.
use strict;
use warnings;
my #last;
while (<DATA>) {
s/\s+\z//;
my #line = split /,/;
if (#last
and $last[0] eq $line[0]
and $last[2] eq $line[1]
and $last[5] eq '+' and $line[5] eq '-') {
$last[3] += $line[3];
$last[4] += $line[4];
print join(',', #last), "\n";
#last = ()
}
else {
print join(',', #last), "\n" if #last;
#last = #line;
}
}
print join(',', #last), "\n" if #last;
__DATA__
chr1,160,161,3,0.333333333333333,+
chr1,161,162,4,0.5,-
chr1,309,310,14,0.0714285714285714,+
chr1,311,312,2,0.5,-
chr1,499,500,39,0.717948717948718,+
chr2,500,501,8,0.375,-
chr2,510,511,18,0.5,+
chr2,511,512,6,0.333333333333333,-
output
chr1,160,161,7,0.833333333333333,+
chr1,309,310,14,0.0714285714285714,+
chr1,311,312,2,0.5,-
chr1,499,500,39,0.717948717948718,+
chr2,500,501,8,0.375,-
chr2,510,511,24,0.833333333333333,+
I have the below data and I need to make the second column as the header. Any help is appreciated.
Data:
IBM,Voltality,7,73894756.93897434897
IBM,Market,100,983874.34324
GOOG,Sanity,15,8932748
GOOG,Rate Jump,25,873476378.234234
MBLY,Market,340,23423423432.6783
Output:
PRODUCT|Market|Rate Jump|Sanity|Voltality
IBM|100,983874.34324|||7,73894756.93897434897
GOOG||25,873476378.234234|15,8932748|||
MBLY|340,23423423432.6783|||
Code (incomplete / not sure hot to get to the end):
#!/usr/bin/perl
use strict;
use Getopt::Long;
use warnings;
use Data::Dumper;
my $valsep = ',';
my ( %type, %keys, %ccy, %cnt, %avg );
while (<>) {
chomp;
my ( $product, $reason, $count, $lat ) = split /,/;
my $key = "$product,$reason";
if ( not exists( $type{$reason} ) ) {
$type{$reason} = $reason;
}
$ccy{$key} = $product;
$cnt{$key} = $count;
$avg{$key} = $lat;
}
close(INPUT);
print Dumper ( \%ccy );
print Dumper ( \%type );
my ( %pair, %details );
foreach my $rows ( sort keys %ccy ) {
print "the key is : $rows and $ccy{$rows}\n";
foreach my $res ( sort keys %type ) {
print "The type is : $res and $type{$res}\n";
}
}
You just need to keep track of your columns and row data when parsing the data structure.
The following demonstrates:
#!/usr/bin/perl
use strict;
use warnings;
my $fh = \*DATA;
my %columns;
my %rows;
while (<$fh>) {
chomp;
my ( $company, $col, $vals ) = split ',', $_, 3;
# Track Columns for later labeling
$columns{$col}++;
$rows{$company}{$col} = $vals;
}
my #columns = sort keys %columns;
# Header
print join( '|', 'PRODUCT', #columns ), "\n";
for my $company ( sort keys %rows ) {
print join( '|', $company, map { $_ // '' } #{ $rows{$company} }{#columns} ), "\n";
}
__DATA__
IBM,Voltality,7,73894756.93897434897
IBM,Market,100,983874.34324
GOOG,Sanity,15,8932748
GOOG,Rate Jump,25,873476378.234234
MBLY,Market,340,23423423432.6783
Outputs:
PRODUCT|Market|Rate Jump|Sanity|Voltality
GOOG||25,873476378.234234|15,8932748|
IBM|100,983874.34324|||7,73894756.93897434897
MBLY|340,23423423432.6783|||
The following code will do the job; rather than using several hashes, I've put all the data in a hash of hashes. I've put comments in the script to explain what is happening in case you are not sure. You can, of course, delete them in your script.
#!/usr/bin/perl
use warnings;
use strict;
my %market;
while (<DATA>) {
next unless /\w/;
# remove line endings
chomp;
# split line by commas -- only split into three parts
my #col = split ",", $_, 3;
# save the data as $market{col0}{col1} = col2
$market{$col[0]}{$col[1]} = $col[2];
}
# create an output file
my $outfile = 'output.txt';
open( my $fh, ">", $outfile ) or die "Could not open $outfile: $!";
my #headers = ('Market','Rate Jump','Sanity','Volatility');
# print out the header line, joined by |
print { $fh } join('|', 'PRODUCT', #headers) . "\n";
# for each product in the market data
for my $p (sort keys %market) {
# print the product name
print { $fh } join('|', $p,
# go through the headers using map (map acts like a "for" loop)
# if the relevant property exists in the market data, print it;
# if not, print nothing
map { $market{$p}{$_} // '' } #headers) . "\n";
}
# this is the input data. You might be reading yours in from a file
__DATA__
IBM,Voltality,7,73894756.93897434897
IBM,Market,100,983874.34324
GOOG,Sanity,15,8932748
GOOG,Rate Jump,25,873476378.234234
MBLY,Market,340,23423423432.6783
Output:
PRODUCT|Market|Rate Jump|Sanity|Volatility
GOOG||25,873476378.234234|15,8932748|
IBM|100,983874.34324|||7,73894756.93897434897
MBLY|340,23423423432.6783|||
What I was trying to do was combine elements[1..3] into a single array, and then make the has out of that. Then sort by keys and print out the whole thing.
#!/usr/bin/perl
my %hash ;
while ( <> ) {
#elements = split /,/, $_;
#slice_elements = #elements[1..3] ;
if ($elements[0] ne '' ) {
$hash{ $elements[0] } = $slice_elements[0];
}
}
foreach $key (sort keys %hash ) {
print "$key; $hash{$key}\n";
}
This is what I get when I print this out -
casper_mint#casper-mint-dell /tmp $ /tmp/dke /tmp/File1.csv
060001.926941; TOT
060002.029434; RTP
060002.029568; RTP
060002.126895; UL
060002.229327; RDS/A
060002.312512; EON
060002.429382; RTP
060002.585408; BCS
060002.629333; LYG
060002.712240; HBC
This is waht I want the elements of the array - element[0] is the key and element[1..3] in the value
060001.926941,TOT,86.26,86.48
060002.029434,RTP,310.0,310.66
060002.029568,RTP,310.0,310.74
060002.126895,UL,34.06,34.14
060002.229327,RDS/A,84.47,84.72
060002.312512,EON,56.88,57.04
060002.429382,RTP,310.08,310.77
060002.585408,BCS,58.96,59.06
060002.629333,LYG,46.13,46.41
060002.712240,HBC,93.06,93.23
Always include use strict; and use warnings; at the top of EVERY perl script.
What you need is to create a new anonymous array [ ] as the value to your hash. Then join the values when displaying the results:
#!/usr/bin/perl
use strict;
use warnings;
my %hash;
while (<>) {
chomp;
my #elements = split /,/, $_;
if ($elements[0] ne '' ) {
$hash{ $elements[0] } = [#elements[1..3]];
}
}
foreach my $key (sort keys %hash ) {
print join(',', $key, #{$hash{$key}}) . "\n";
}
Of course, if your data really is fixed width like that, and you're not actually doing anything with the values, there actually is no need to split and join. The following would do the same thing:
use strict;
use warnings;
print sort <>;