Appending values to Hash if key is same in Perl - perl
Problem is to read a file with value at every new line. Content of file looks like
3ssdwyeim3,3ssdwyeic9,2017-03-16,09:10:35.372,0.476,EndInbound
3ssdwyeim3,3ssdwyfyyn,2017-03-16,09:10:35.369,0.421,EndOutbound
3ssdwyfxc0,3ssdwyfxfi,2017-03-16,09:10:35.456,0.509,EndInbound
3ssdwyfxc0,3ssdwyhg0v,2017-03-16,09:10:35.453,0.436,EndOutbound
With the string before first comma being the Key and string in between last and second last comma the Value
i.e. for the first line 3ssdwyeim3 becomes the key and 0.476 Value.
Now as we are looping over each line if the key exists we have to concatenate the values separated by comma.
Hence for the next new line as key already exists key remains 3ssdwyeim3 but the value is updated to 0.476,0.421.
Finally we have to print the keys and values in a file.
I have written a code to achieve the same, which is as follows.
sub findbreakdown {
my ( $out ) = #_;
my %timeLogger;
open READ, "out.txt" or die "Cannot open out.txt for read :$!";
open OUTBD, ">$out\_breakdown.csv" or die "Cannot open $out\_breakdown.csv for write :$!";
while ( <READ> ) {
if ( /(.*),.*,.*,.*,(.*),.*/ ) {
$btxnId = $1;
$time = $2;
if ( !$timeLogger{$btxnId} ) {
$timeLogger{$btxnId} = $time;
}
else {
$previousValue = $timeLogger{$btxnId};
$newValue = join ",", $previousValue, $time;
$timeLogger{$btxnId} = $newValue;
}
}
foreach ( sort keys %timeLogger ) {
print OUTBD "$_ ,$timeLogger{$_}\n";
}
}
close OUTBD;
close READ;
}
However Something is going wrong and its printing like this
3ssdwyeim3,0.476
3ssdwyeim3,0.476,0.421
3ssdwyeim3,0.476,0.421
3ssdwyfxc0,0.509
3ssdwyeim3,0.476,0.421
3ssdwyfxc0,0.509,0.436
3ssdwyeim3,0.476,0.421
3ssdwyfxc0,0.509,0.436
Whereas expected is:
3ssdwyeim3,0.476,0.421
3ssdwyfxc0,0.509,0.436
Your program is behaving correctly, but you are printing the current state of the entire hash after you process each line.
Therefore you are printing hash keys before they have the complete set of values, and you have many duplicated lines.
If you move the foreach loop that prints to the end of your program (or simply use the debugger to inspect the variables) you will find that the final state of the hash is exactly what you expect.
Edit: I previously thought the problem was the below, but it's because I misread the sample data in your question.
This regular expression is not ideal:
if (/(.*),.*,.*,.*,(.*),.*/) {
The .* is greedy and will match as much as possible (including some content with commas). So if any line contains more than six comma-separated items, more than one item will be included in the first matching group. This may not be a problem in your actual data, but it's not an ideal way to write the code. The expression is more ambiguous than necessary.
It would be better written like this:
if (/^([^,]*),[^,]*,[^,]*,[^,]*,([^,]*),[^,]*$/) {
Which would only match lines with exactly six items.
Or consider using split on the input line, which would be a cleaner solution.
This is much simpler than you have made it. You can just split each line into fields and use push to add the value to the list corresponding to the key
I trust you can modify this to read from an external file instead of the DATA file handle?
use strict;
use warnings 'all';
my %data;
while ( <DATA> ) {
my #fields = split /,/;
push #{ $data{$fields[0]} }, $fields[-2];
}
for my $key ( sort keys %data ) {
print join(',', $key, #{ $data{$key} }), "\n";
}
__DATA__
3ssdwyeim3,3ssdwyeic9,2017-03-16,09:10:35.372,0.476,EndInbound
3ssdwyeim3,3ssdwyfyyn,2017-03-16,09:10:35.369,0.421,EndOutbound
3ssdwyfxc0,3ssdwyfxfi,2017-03-16,09:10:35.456,0.509,EndInbound
3ssdwyfxc0,3ssdwyhg0v,2017-03-16,09:10:35.453,0.436,EndOutbound
output
3ssdwyeim3,0.476,0.421
3ssdwyfxc0,0.509,0.436
Related
Regular expression to print a string from a command outpout
I have written a function that uses regex and prints the required string from a command output. The script works as expected. But it's does not support a dynamic output. currently, I use regex for "icmp" and "ok" and print the values. Now, type , destination and return code could change. There is a high chance that command doesn't return an output at all. How do I handle such scenarios ? sub check_summary{ my ($self) = #_; my $type = 0; my $return_type = 0; my $ipsla = $self->{'ssh_obj'}->exec('show ip sla'); foreach my $line( $ipsla) { if ( $line =~ m/(icmp)/ ) { $type = $1; } if ( $line =~ m/(OK)/ ) { $return_type = $1; } } INFO ($type,$return_type); } command Ouptut : PSLAs Latest Operation Summary Codes: * active, ^ inactive, ~ pending ID Type Destination Stats Return Last (ms) Code Run ----------------------------------------------------------------------- *1 icmp 192.168.25.14 RTT=1 OK 1 second ago
Updated to some clarifications -- we need only the last line As if often the case, you don't need a regex to parse the output as shown. You have space-separated fields and can just split the line and pick the elements you need. We are told that the line of interest is the last line of the command output. Then we don't need the loop but can take the last element of the array with lines. It is still unclear how $ipsla contains the output -- as a multi-line string or perhaps as an arrayref. Since it is output of a command I'll treat it as a multi-line string, akin to what qx returns. Then, instead of the foreach loop my #lines = split '\n', $ipsla; # if $ipsla is a multi-line string # my #lines = #$ipsla; # if $ipsla is an arrayref pop #lines while $line[-1] !~ /\S/; # remove possible empty lines at end my ($type, $return_type) = (split ' ', $lines[-1])[1,4]; Here are some comments on the code. Let me know if more is needed. We can see in the shown output that the fields up to what we need have no spaces. So we can split the last line on white space, by split ' ', $lines[-1], and take the 2nd and 5th element (indices 1 and 4), by ( ... )[1,4]. These are our two needed values and we assign them. Just in case the output ends with empty lines we first remove them, by doing pop #lines as long as the last line has no non-space characters, while $lines[-1] !~ /\S/. That is the same as while ( $lines[-1] !~ /\S/ ) { pop #lines } Original version, edited for clarifications. It is also a valid way to do what is needed. I assume that data starts after the line with only dashes. Set a flag once that line is reached, process the line(s) if the flag is set. Given the rest of your code, the loop my $data_start; foreach (#lines) { if (not $data_start) { $data_start = 1 if /^\s* -+ \s*$/x; # only dashes and optional spaces } else { my ($type, $return_type) = (split)[1,4]; print "type: $type, return code: $return_type\n"; } } This is a sketch until clarifications come. It also assumes that there are more lines than one.
I'm not sure of all possibilities of output from that command so my regular expression may need tweaking. I assume the goal is to get the values of all columns in variables. I opted to store values in a hash using the column names as the hash keys. I printed the results for debugging / demonstration purposes. use strict; use warnings; sub check_summary { my ($self) = #_; my %results = map { ($_,undef) } qw(Code ID Type Destination Stats Return_Code Last_Run); # Put results in hash, use column names for keys, set values to undef. my $ipsla = $self->{ssh_obj}->exec('show ip sla'); foreach my $line (#$ipsla) { chomp $line; # Remove newlines from last field if($line =~ /^([*^~])([0-9]+)\s+([a-z]+)\s+([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)\s+([[:alnum:]=]+)\s+([A-Z]+)\s+([^\s].*)$/) { $results{Code} = $1; # Code prefixing ID $results{ID} = $2; $results{Type} = $3; $results{Destination} = $4; $results{Stats} = $5; $results{Return_Code} = $6; $results{Last_Run} = $7; } } # Testing use Data::Dumper; print Dumper(\%results); } # Demonstrate check_summary(); # Commented for testing #INFO ($type,$return_type); Worked on the submitted test line. EDIT: Regular expressions allow you to specify patterns instead of the exact text you are attempting to match. This is powerful but complicated at times. You need to read the Perl Regular Expression documentation to really learn them. Perl regular expressions also allow you to capture the matched text. This can be done multiple times in a single pattern which is how we were able to capture all the columns with one expression. The matches go into numbered variables... $1 $2
loop through sql result with duplicate values and show only line with some string with perl
I am selecting from sql three values, hostname, place, message, assigning them to hashref and then in loop printing into file. Before I print it I have to clean the message as it contains in DB plenty of mess like html tags etc while (my $result = $sth_get_message->fetchrow_hashref) { my $message = $result->{MESSAGE}; $message =~ s/<[^>]*>//g; $message =~ s/\n/ /g; $message =~ s/ |"/ /g; $message =~ s/\r//g; my $message_out; if (index($message, 'removed') != -1) {$message_out = $message;} else {$message_out= "";} my $pc = $result->{PC}; my $place = $result->{PLACE}; print OUT "$pc|$message_out|$place\n"; } the result is like this pc1||place pc2|message with word removed|place pc3||place pc3|message with word removed|place pc4||place pc4||place pc4|message with word removed|place some of them have more messages, but only one is relevant, so rest is cleaned but I have duplicates of pc's. What I want to get is to have unique pc list and if it has no message ok, show blank, if it has message with removed, show only this line pc1||place pc2|message with word removed|place pc3|message with word removed|place pc4|message with word removed|place as far as I understood perl hashes this should be possible if I put pc as a key and then work with values but I have no clue how thanks for help
When you have a list of values that you wish to find the unique values for you can populate a hash along these lines: my $hashref = {}; ## initialise an empty hashref .. nb I prefer to work with refs but you could use has hash type variable foreach my $row ( $rows ) { $hashref->{$row} = 1; } now you have a hasref that is indexed by unique keys so you can do the following print join(",", keys %$hashref ); or foreach my $unique_key ( sort keys %$hashref ) { print "$unique key in sorted order\n"; } In your case you could use the string with the fields concatenated or you could just use the values that you wish to make unique and then assign something containing the entire row detail you are interested in place of the 1 value used above as a placeholder. You are then able to retrieve the full row detail with something like $hashref->{$unique_key}
Nested Loops in Perl - Failing logic
I'm trying to work out a bigger problem, but have simplified the issue for readability, ultimately the logic below is the reason the extended program is failing. I am using Perl to search for a short sequence of letters within a larger sequence (protein sequences), and if it is not found, then I'd like to do some calculations. I don't know whether I'm going crazy, but I can't work out why this logic is failing. sub calculateEpitopeMutations { my #mutationArray; my #epitopeArray; my $count; my $localEpitope; open( EPITOPESIN2, $ARGV[5] ) or die "Unable to open file $ARGV[5]\n"; while ( my $line = <EPITOPESIN2> ) { chomp $line; push #epitopeArray, $line; } while ( my ( $key, $value ) = each our %sequencesForCalculation ) { foreach ( #epitopeArray ) { $localEpitope = $_; if ( $value =~ /($localEpitope)/g ) { print "$key\n$localEpitope\nexactly the same\n\n"; next; } else { #This is where I'd like to do the further calculations print "$key\n$localEpitope\nthere is a difference\n\n"; next; } } } } $ARGV[5] is the name of a text file containing a list of 9-character sequences, exactly like the following RVSENIQRF SFQVDCFLW The idea is to put these into array #epitopeArray and iterate through these, and compare them with all (currently just one) $value sequences in the hash %sequencesForCalculation. %sequencesForCalculation is a hash, where $value is a long sequence of characters, like this MDSNTMSSFQVDCFLWHIRKRFADNGLGDAPFLDRLRRDQKSLKGRGNTLGLDIETATLVGKQIVEWILKEESSETLRMTIASVPTSRYLSDMTLEEMSRDWFMLMPRQKKIGPLCVRLDQAVMEKNIVLKANFSVIFNRLETLILLRAFTEEEAIVGEISPLPSLPGHTYEDVKNAVGVLIGGLEWNGNTVRVSENIQRFAWRNCDENGRPSLPPEQK Currently, the small 9-character long sequence $localEpitope is contained in the longer sequence $value so when I iterate through the program, I should get this printed every time. ($key contains a header of information about the protein sequences, but is irrelevant so I have shortened it to just the variable name.) $key RVSENIQRF Exactly the same $key SFQVDCFLW Exactly the same $key But instead I'm getting this $key RVSENIQRF exactly the same $key SFQVDCFLW there is a difference $key Any ideas? Please let me know if anything further is required.
Update TL;DR: You should change $value =~ /($localEpitope)/g to $value =~ /$localEpitope/ Okay now that we know the real circumstances, the problem (as melpomene points out in his comment) is that you have the /g modifier on your pattern match. There's no reason for that; you don't want check how many times the substring appears, you just want to know whether it's there at all The problem is that variables subjected to a /g pattern search keep a state that says where the last search ended. So you're searching for $epitopeArray[0] in the longer string and finding it, and then searching for $epitopeArray[1] from where the previous search terminated. The first substring appears after the second one, so only the first is found For more information on this behaviour, take a look at the pos function which returns the current value of this state. For instance pos($value) will return the character offset where the next m//g will start its search This short program demonstrates the problem. With the /g modifier only BBB is found. Remove it and both are found use strict; use warnings; use 5.010; my $long_s = 'xxxAAAxxxBBBxxx'; for my $substr ( qw/ BBB AAA / ) { if ( $long_s =~ /$substr/g ) { say "$substr okay"; } else { say "$substr nope"; } } output BBB okay AAA nope Original You say Currently, the small 9-character long sequence ($localEpitope) IS contained in the longer sequence ($value), and so when I iterate through the program, I should get the following printed everytime So $localEpitope is a substring of $value and you're saying that $value =~ /($localEpitope)/g evaluates to true That is correct behaviour. $value =~ /$localEpitope/ will check whether $localEpitope can be found anywhere in $value Unfortunately it's not clear enough from what you've written to suggest a solution
i want to merge multiple csv files by specific condition using perl
i have multiple csv files, i want to merge all those files..... i am showing some of my sample csv files below... M1DL1_Interpro_sum.csv IPR017690,Outer membrane, omp85 target,821 IPR014729,Rossmann,327 IPR013785,Aldolase,304 IPR015421,Pyridoxal,224 IPR003594,ATPase,179 IPR000531,TonB receptor,150 IPR018248,EF-hand,10 M1DL2_Interpro_sum.csv IPR017690,Outer membrane, omp85 target,728 IPR013785,Aldolase,300 IPR014729,Rossmann,261 IPR015421,Pyridoxal,189 IPR011991,Winged,113 IPR000873,AMP-dependent synthetase/ligase,111 M1DL3_Interpro_sum.csv IPR017690,Outer membrane,905 IPR013785,Aldolase,367 IPR014729,Rossmann,338 IPR015421,Pyridoxal,271 IPR003594,ATPase,158 IPR018248,EF-hand,3 now to merge these files i have tried the following code #ARGV = <merge_csvfiles/*.csv>; print #ARGV[0],"\n"; open(PAGE,">outfile.csv") || die"Can't open outfile.csv\n"; while($i<scalar(#ARGV)) { open(FILE,#ARGV[$i]) || die"Can't open ...#ARGV[$i]...\n"; $data.=join("",<FILE>); close FILE; print"file completed...",$i+1,"\n"; $i++; } #data=split("\n",$data); #data2=#data; print scalar(#data); for($i=0;$i<scalar(#data);$i++) { #id1=split(",",#data[$i]); $id_1=#id1[0]; #data[$j]=~s/\n//; if(#data[$i] ne "") { print PAGE "\n#data[$i],"; for($j=$i+1;$j<scalar(#data2);$j++) { #id2=split(",",#data2[$j]); $id_2=#id2[0]; if($id_1 eq $id_2) { #data[$j]=~s/\n//; print PAGE "#data2[$j],"; #data2[$j]=""; #data[$j]=""; print "match found at ",$i+1," and ",$j+1,"\n"; } } } print $i+1,"\n"; } merge_csvfiles is a folder which contains all the files output of above code is IPR017690,Outer membrane,821,IPR017690,Outer membrane ,728,IPR017690,Outer membrane,905 IPR014729,Rossmann,327,IPR014729,Rossmann,261,IPR014729,Rossmann,338 IPR013785,Aldolase,304,IPR013785,Aldolase,300,IPR013785,Aldolase,367 IPR015421,Pyridoxal,224,IPR015421,Pyridoxal,189,IPR015421,Pyridoxal,271 IPR003594,ATPase,179,IPR003594,ATPase,158 IPR000531,TonB receptor,150 IPR018248,EF-hand,10,IPR018248,EF-hand,3 IPR011991,Winged,113 IPR000873,AMP-dependent synthetase/ligase but i want the output in following format.... IPR017690,Outer membrane,821,IPR017690,Outer membrane ,728,IPR017690,Outer membrane,905 IPR014729,Rossmann,327,IPR014729,Rossmann,261,IPR014729,Rossmann,338 IPR013785,Aldolase,304,IPR013785,Aldolase,300,IPR013785,Aldolase,367 IPR015421,Pyridoxal,224,IPR015421,Pyridoxal,189,IPR015421,Pyridoxal,271 IPR003594,ATPase,179,0,0,0,IPR003594,ATPase,158 IPR000531,TonB receptor,150,0,0,0,0,0,0 IPR018248,EF-hand,10,0,0,0,IPR018248,EF-hand,3 0,0,0,IPR011991,Winged,113,0,0,0 0,0,0,IPR000873,AMP-dependent synthetase/ligase,111,0,0,0 Has anybody got any idea how can i do this? Thank you for the help
As mentioned in Miguel Prz's comment, you haven't explained how you want the merge to be performed, but, judging by the "desired output" sample, it appears that what you want is to concatenate lines with matching IDs from all three input files into a single line in the output file, with "0,0,0" taking the place of any lines which don't appear in a given file. So, then: #!/usr/bin/env perl use strict; use warnings; my #input_files = glob 'merge_csvfiles/*.csv'; my %data; for my $i (0 .. $#input_files) { open my $infh, '<', $input_files[$i] or die "Failed to open $input_files[$i]: $!"; while (<$infh>) { chomp; my $id = (split ',', $_, 2)[0]; $data{$id}[$i] = $_; } print "Input file read: $input_files[$i]\n"; } open my $outfh, '>', 'outfile.csv' or die "Failed to open outfile.csv: $!"; for my $id (sort keys %data) { my #merge_data; for my $i (0 .. $#input_files) { push #merge_data, $data{$id}[$i] || '0,0,0'; } print $outfh join(',', #merge_data) . "\n"; } The first loop collects all the lines from each file into a hash of arrays. The hash keys are the IDs, so the lines for that ID from all files are kept together, and the value for each key is (a reference to) an array of the line associated with that ID in each file; using an array for this allows us to keep track of values which are missing as well as those which are present. The second loop then takes the keys of that hash (in alphabetical order) and, for each one, creates a temporary array of the values associated with that ID, substituting "0,0,0" for missing values, joins them into a single string, and prints that to the output file. The results, in outfile.csv, are: IPR000531,TonB receptor,150,0,0,0,0,0,0 0,0,0,IPR000873,AMP-dependent synthetase/ligase,111,0,0,0 IPR003594,ATPase,179,0,0,0,IPR003594,ATPase,158 0,0,0,IPR011991,Winged,113,0,0,0 IPR013785,Aldolase,304,IPR013785,Aldolase,300,IPR013785,Aldolase,367 IPR014729,Rossmann,327,IPR014729,Rossmann,261,IPR014729,Rossmann,338 IPR015421,Pyridoxal,224,IPR015421,Pyridoxal,189,IPR015421,Pyridoxal,271 IPR017690,Outer membrane, omp85 target,821,IPR017690,Outer membrane, omp85 target,728,IPR017690,Outer membrane,905 IPR018248,EF-hand,10,0,0,0,IPR018248,EF-hand,3 Edit: Added explanations requested by OP in comments can u expalain me the working of my $id = (split ',', $_, 2)[0]; and $# in this program my $id = (split ',', $_, 2)[0]; gets the text prior to the first comma in the last line of text that was read: Because I didn't specify what variable to put the data in, while (<$infh>) reads it into the default variable $_. split ',', $_, 2 splits up the value of $_ into a list of comma-separated fields. The 2 at the end tells it to only produce at most 2 fields; the code will work fine without the 2, but, since I only need the first field, splitting into more parts isn't necessary. Putting (...)[0] around the split command turns the returned list of fields into an (anonymous) array and returns the first element of that array. It's the same as if I'd written my #fields = split ',', $_, 2; my $id = $fields[0];, but shorter and without the extra variable. $#array returns the highest-numbered index in the array #array, so for my $i (0 .. $#array) just means "loop over the indexes for all elements in #array". (Note that, if I hadn't needed the value of the index counter, I would have instead looped over the array's data directly, by using for my $filename (#input_files), but it would have been less convenient to keep track of the missing values if I'd done it that way.)
Opening a text file as a hash and searching within that hash
I have an assignment to write a Perl file to open a text file of IP addresses and their hostnames, separated by a new line, and load it into a hash. I'm then supposed to ask for user input as to what the user would like to search for within the file. If a result is found, the program should print the value and key, and ask for input again until the user doesn't input anything. I'm not even close to the end, but need a bit of guidance. I've cobbed together some code from here and through using some Google-Fu. Here's my work in progress: #!/usr/bin/perl print "Welcome to the text searcher! Please enter a filename: "; $filename = <>; my %texthash = (); open DNSTEXT, "$filename" or die! "Insert a valid name! "; while (<DNSTEXT>) { chomp; my ($key, $value) = split("\n"); $texthash{$key} .= exists $texthash{$key} ? ",$value" : $value; } print $texthash{$weather.com} #print "What would you like to search for within this file? " #$query = <> #if(exists $text{$query}) { As is probably glaringly obvious, I'm quite lost. I'm not sure if I'm inserting the file into the hash correctly, or how to even print a value to debug.
The problem here is we don't know what the input file looks like. Assuming that the input file somehow looks like: key1,value1 key2,value2 key3,value3 (or other similar manner, in this case the key and value pair are separated by a comma), you could do this: my %text_hash; # the my $line in the while() means that for every line it reads, # store it in $line while( my $line = <DNSTEXT>) { chomp $line; # depending on what separates the key and value, you could replace the q{,} # with q{<whatever is between the key and value>} my ( $key, $value ) = split q{,},$line; $text_hash{$key} = $value; } But yeah, please tell us what the content of the file looks like.