How to add a record in alphabetical order - Perl

A file (a.txt) contains:
# this is test file
Data: 15th may 2010
Records :
a
b
c
d
g
l
Just consider that I want to add a new record "f" - I want to add it between d and g.

You can use Tie::File and treat the text file as an array. This example assumes that there are no blank lines or initial header text:
use Tie::File;

tie my @array, 'Tie::File', 'filename' or die $!;

my $rec = 'f';
for my $i (0 .. $#array) {
    if (($array[$i] cmp $rec) == 1) {
        splice @array, $i, 0, $rec;
        last;
    }
}

in one line:
perl -le 'print for(sort((map {chomp; $_} (<>)), "f"))' < infile > outfile
You obviously need to process the headers beforehand, but the technique is pretty clear
For example:
[dsm@localhost:~]$ perl -le 'print for(sort((map {chomp; $_;} (<>)), "f"))' <<EOP
> x
> v
> b
> m
> p
> o
> l
> j
> k
> EOP
b
f
j
k
l
m
o
p
v
x
[dsm@localhost:~]$
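To sketch the header handling mentioned above: the "Records :" marker and the sample lines below are assumptions taken from the a.txt shown in the question; a real script would read them from the file.

```perl
use strict;
use warnings;

# Separate header lines from record lines before sorting. Everything up to
# and including the "Records :" marker passes through untouched; only the
# record lines are sorted together with the new entry.
my $new_rec = 'f';
my @lines   = ('# this is test file', 'Data: 15th may 2010', 'Records :',
               qw(a b c d g l));

my (@header, @records);
my $in_records = 0;
for my $line (@lines) {
    if ($in_records) {
        push @records, $line;
    }
    else {
        push @header, $line;
        $in_records = 1 if $line =~ /^Records/;
    }
}

# Headers first, then the records sorted with the new entry merged in.
print "$_\n" for (@header, sort(@records, $new_rec));
```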

use Tie::File ();
use List::MoreUtils qw(lastidx);

my $new_item = 'f';

{
    my @file;
    tie @file, 'Tie::File', 'a.txt'
        or die "Could not open 'a.txt' for reading: $!";
    splice @file, (1 + lastidx { $_ lt $new_item } @file), 0, $new_item;
}


How to get only uniq column values from a line using linux or perl?

I have a file like below
1 B B C D B
2 K B D D K
1 B B C D B
2 K B D D K
1 B B C D B
2 K B D D K
I want the output to look like this
1 B C D
2 K B D
1 B C D
2 K B D
1 B C D
2 K B D
Sort commands don't work, so I tried writing a Perl program -
use strict;
use Data::Dumper;

my $file1 = <$ARGV[0]>;
open (IF2, "$file1") || die "Cannot open the file\n";
open (OUT, ">$file1.out") || die "Cannot open the out file\n";

my $k = 0;
my $i = 0;
my @line;
my $m;
my @line2;

while ( chomp($m = <IF2>) ) {
    my $count = 0;
    @line2 = split(/\t/, $m);    #<stdin>;
    my $l = length @line2; print $l; <stdin>;
    for (my $x = 0; $x < $l; $x++) {
        my $k = 0;
        for (my $y = 0; $y < $l; $y) {
            $i++;
            #
            if ($count == 0)
            {
                print OUT "\t$line2[$x]";
                $count++;
            }
            if ($count != 0 && $x != $y)
            {
                if ($line2[$x] eq $line2[$y])
                {
                    $k++;
                }
            }
        }
        if ($k == 0)
        {
            print OUT "\t$line2[$x]";
        }
    }
    print OUT "\n";
}
print $i;
close IF2;
close OUT;
But it didn't work.
Can some one help?
Note: The input and output examples in the question have been edited; now the data is consistent with the title, asking to remove all duplicates. (See the original question for how it looked.) I am leaving the answer to the original question as well, until we hear from the poster.
I would like to make a general remark first.
Your posted code is an honest attempt to write a C-style program in Perl. I recommend resisting that and learning how to use Perl instead. It takes a little investment of time and effort, but it goes quickly and pays off quickly. As you can see below, it makes many things incomparably easier.
Even so, there are issues with the posted code but I can't write a code review now.
Edited question
We need to remove all duplicates from each line. One way: split the line by whitespace then remove duplicates from the list, which is a standard task with ready solutions.
use warnings;
use strict;
use List::MoreUtils qw(uniq);

my $file    = '...';
my $fileout = '...';

open my $fh,     '<', $file    or die "Can't open $file: $!";
open my $fh_out, '>', $fileout or die "Can't open $fileout: $!";

while (<$fh>)
{
    my @unique = uniq split;
    print $fh_out "@unique\n";
}

close $fh;
close $fh_out;
The line with spaces is restored by printing the quoted array, whereby spaces (or whatever is in $") are inserted between elements. The other option is to join the resulting list:
my $res = join ' ', uniq split;
print $fh_out $res, "\n";
or just print $fh_out join(' ', uniq split), "\n";.
This uses uniq from the List::MoreUtils module. Note from the uniq documentation that
The order of elements in the returned list is the same as in LIST.
Since List::MoreUtils has been mentioned, take note of the core module List::Util as well.
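As the quoted documentation says, uniq keeps first-seen order. If installing List::MoreUtils is not an option, the same behavior can be sketched with a plain hash, which is essentially what uniq does internally:

```perl
use strict;
use warnings;

# First-seen-order de-duplication with a %seen hash: grep keeps an element
# only the first time it appears, since $seen{$_}++ is false (0) on the
# first encounter and true afterwards.
sub my_uniq {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

print join(' ', my_uniq(qw(1 B B C D B))), "\n";    # prints "1 B C D"
```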
A one-liner version
perl -MList::MoreUtils=uniq -lne'print join " ", uniq split' input > output
or
perl -MList::MoreUtils=uniq -lane'print join " ", uniq #F' input > output
See Command switches in perlrun
Original question (see it in the edit history)
I was answering the question with the belief that this was the input
1 B B C D B 2 K B D D K 1 B B C D B 2 K B D D K 1 B B C D B 2 K B D D K
and this was the desired output
1 B C D 2 K B D 1 B C D 2 K B D 1 B C D 2 K B D
By your desired output you want to remove only adjacent duplicates (not get "uniq" as the title says).
For that you can use the ability of regex to match repeated patterns, by using backreferences. First we need to strip all spaces, and we'll put them back in the end. For example
use warnings;
use strict;

my $file    = '...';
my $fileout = '...';

open my $fh,     '<', $file    or die "Can't open $file: $!";
open my $fh_out, '>', $fileout or die "Can't open $fileout: $!";

while (my $line = <$fh>)
{
    chomp $line;
    $line =~ s/\s*//g;       # remove spaces
    $line =~ s/(.)\1+/$1/g;  # remove adjacent duplicates
    $line =~ s/(.)/$1 /g;    # restore spaces
    print $fh_out "$line\n";
}

close $fh;
close $fh_out;
The . matches any character, replace it with something more restrictive if needed (for example \w, for a "word" character). See perlretut. Note that we cannot restore space along with replacement (like s/(.)\1+/$1 /g) since non-repeating chars aren't matched and don't get the space back.
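To illustrate that caveat with a tiny demonstration (the sample string is borrowed from the question's data, with spaces already stripped):

```perl
use strict;
use warnings;

# Restoring the space inside the duplicate-removing substitution only puts
# spaces after characters that were actually repeated, so non-repeating
# characters run together; the two-pass approach spaces every character.
my $ok  = '1BBCDB';
my $bad = '1BBCDB';

$ok  =~ s/(.)\1+/$1/g;   # collapse runs first...
$ok  =~ s/(.)/$1 /g;     # ...then restore spaces in a second pass
$bad =~ s/(.)\1+/$1 /g;  # one pass: space appears only after collapsed runs

print "$ok\n";   # prints "1 B C D B "
print "$bad\n";  # prints "1B CDB"
```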
This can be done in more concise ways.
A one-liner version
perl -pe's/\s*//g; s/(.)\1+/$1/g; s/(.)/$1 /g' input > output
See Command switches in perlrun
I suggest something like this. It finds all occurrences of whitespace followed by non-whitespace, and checks whether the non-whitespace has been seen before on the current line. The matched substring is deleted if the non-whitespace has been seen before, otherwise it is left untouched
use strict;
use warnings 'all';
while ( <DATA> ) {
    my %seen;
    s/(\s+(\S+))/ $seen{$2}++ ? '' : $1 /eg;
    print;
}
__DATA__
1 B B C D B
2 K B D D K
1 B B C D B
2 K B D D K
1 B B C D B
2 K B D D K
output
1 B C D
2 K B D
1 B C D
2 K B D
1 B C D
2 K B D
This may be done in a one-liner like this
perl -pe 'my %s; s/(\s+(\S+))/ $s{$2}++ ? "" : $1 /eg' myfile

Perl: Search and Replace

I'm trying to improve my script, in which I hope to match entries in input.txt (column 4: H1, 2HB, CA, HB3) against dictionary.txt and replace them with the corresponding entries from dictionary.txt (column 2: H, HB, C, 3HB), using dictionary.txt as a dictionary:
input.txt
1 N 22 H1 MET
1 H 32 2HB MET
1 C 40 CA MET
2 H 35 HB3 ASP
dictionary.txt
MET H H1
MET HB 2HB
MET C CA
ASP 3HB HB3
output
1 N 22 H MET
1 H 32 HB MET
1 C 40 C MET
2 H 35 3HB ASP
I'm trying to approach this by first matching the word in input.txt (MET) and dictionary.txt (MET) and then performing the substitution. This is what I've written so far:
#!/usr/bin/perl
use strict;
use warnings;

my %dictionary;

open my $dic_fh, '<', 'dictionary.txt' or die "Can't open file: $!";
while (my $ref = <$dic_fh>) {
    chomp $ref;
    my @columns  = split(/\t/, $ref);
    my $res_name = $columns[0];
    my $ref_nuc  = $columns[1];
    $dictionary{$res_name} = {$ref_nuc};
    open my $in_fh, '<', 'input.txt' or die "Can't open file: $!";
    while (my $line = <$in_fh>) {
        chomp $line;
        my @columns = split(/\t/, $line);
        my @name = $columns[3];
        if (my $name eq $res_name) {
            my $line = $_;
            foreach my $res_name (keys %dictionary) {
                $line =~ s/$name/$dictionary{$ref_nuc}/;
            }
            print $line;
        }
    }
}
The problem seems to be that you are assigning the single field $columns[3] to the array @name, and then expecting to find it in $name, which is a separate variable altogether. You even declare a new $name at the point of the comparison.
You are also executing the statement
$line =~ s/$name/$dictionary{$ref_nuc}/;
once for each key in the hash. That is unnecessary: it needs to be done only once. It is also better to change the value of $columns[3] to $dictionary{$columns[3]} instead of doing a search and replace on the whole line, as the target string may appear in other columns that you don't want to modify
It is very simple to do by building a dictionary hash and replacing the fourth field of the input file with its dictionary lookup
use strict;
use warnings;
use 5.010;
use autodie;
open my $fh, '<', 'dictionary.txt';
my %dict;
while ( <$fh> ) {
    my ($k, $v) = (split)[2, 1];
    $dict{$k} = $v;
}

open $fh, '<', 'input.txt';
while ( <$fh> ) {
    my @fields = split;
    $fields[3] = $dict{$fields[3]};
    say join "\t", @fields;
}
output
1 N 22 H MET
1 H 32 HB MET
1 C 40 C MET
2 H 35 3HB ASP

Find mismatch on 2nd column between 2 text files

I have these 2 text files and I would like to find any mismatch in the 2nd column between the files. The mismatch to be identified is based on the types F, P and N, regardless of which lines they occur on. I have 1F and 3P in the first file, and 2P, 1N and 1F in the second file. When doing the comparison, both files should have equal occurrences of each type.
Text1:
f0x11 F
f0x34 P
drx99
dex67 P
edx43 P
sdx33
Text2:
1 P
2 N
4
5 F
6
7 P
Expected Output:
Text 1 has missing type of N
Text 2 has missing type of P
What I have tried so far does not produce the desired output.
code:
use strict;

my %ref_data;
my %ref_data2;

open my $fh, '<', 'Text1' or die "Could not open file to read:$!";
while (<$fh>) {
    chomp;
    my ($res, $type) = split;
    if (defined $type) {
        $ref_data{$type} = "$type";
    }
}

our ($data, $data2);

open $fh, '<', 'Text2' or die "Could not open file to read:$!";
while (<$fh>) {
    chomp;
    my ($res, $type) = split;
    if (defined $type) {
        $ref_data2{$type} = "$type";
        $data2 = $ref_data2{$type};
        $data  = $ref_data{$type};
        print "File 2 has missing type of $type\n" unless $data;
    }
}

foreach ($data) {
    print "File 1 has missing type of $_\n" if $data ne $data2;
}
You appear to want to keep track of how many times the values in Column 2 occur within each file -- for example, in a comment you write, "I have 1F, 3P in first file while 2P, 1N and 1F in second file". If that's the case, you need a better data structure.
Specifically, one that counts occurrences of the values in Column 2, and you need those counts to be tracked separately for each file. That suggests a hash-of-hashes.
use strict;
use warnings;

# Example usage:
#   perl YOUR_SCRIPT.pl a.txt b.txt
my @files = @ARGV;

# Count the values in Column 2, organizing the tallies like this:
#   $tallies{COL_2}{FILE_NAME} = N
my %tallies;
while (<>) {
    my @cols = split;
    $tallies{$cols[1]}{$ARGV}++ if @cols > 1;
}

# Print discrepancies.
for my $c2 (keys %tallies) {
    my @t = map { $tallies{$c2}{$_} || 0 } @files;
    next if $t[0] == $t[1];
    print "$c2: $files[0] has $t[0]; $files[1] has $t[1]\n";
}
Example output:
N: a.txt has 0; b.txt has 1
P: a.txt has 3; b.txt has 2
Also worth noting: this code does not have to open any files explicitly, and file names are not hard-coded in the program. Instead, we pass input file names as command-line arguments, get those arguments via @ARGV, process lines in those files via <>, and know which file we're currently processing via $ARGV.
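The @ARGV / $ARGV machinery can be seen in isolation with a small sketch; the temporary files here are created only so the example is runnable on its own:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Every line read through <> is tagged (via $ARGV) with the name of the
# file it came from, so per-file tallies need no explicit open() calls.
my $dir = tempdir(CLEANUP => 1);
for my $name ('a.txt', 'b.txt') {
    open my $fh, '>', "$dir/$name" or die "Can't write $dir/$name: $!";
    print $fh "x 1\n" for 1 .. ($name eq 'a.txt' ? 2 : 3);
    close $fh;
}

local @ARGV = ("$dir/a.txt", "$dir/b.txt");
my %count;
while (<>) {
    $count{$ARGV}++;    # $ARGV holds the name of the file being read
}
print "$_: $count{$_} line(s)\n" for sort keys %count;
```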
I've refactored your code where you seem to be duplicating the same behavior.
The output isn't to spec, but should be clear enough for you to understand and finish up yourself.
I added a close $fh; and use warnings; as well
#!/usr/bin/perl
use strict;
use warnings;

# run
my %max;    # hash of combined data
my $file_data_1 = parse_file_into_hash("text1", \%max);
my $file_data_2 = parse_file_into_hash("text2", \%max);
diff_hashes(\%max, $file_data_1, $file_data_2);

# diff_hashes($max, $h1, $h2)
#
# diffs 2 hash refs against a combined $max hash and prints results
sub diff_hashes {
    my ($max, $h1, $h2) = @_;
    # TODO - do all the comparisons and some error checking (if keys exist etc...) here
    for my $key (keys %$max) {
        print "max/combined: $key = $max->{$key}\n";
        my $h1_print = exists $h1->{$key} ? $h1->{$key} : "0";
        my $h2_print = exists $h2->{$key} ? $h2->{$key} : "0";
        print "h1: $key = $h1_print\n";
        print "h2: $key = $h2_print\n";
    }
}

# parse_file_into_hash($file, $max)
#
# $max is a hash reference (passed by reference) so you can count occurrences over
# multiple files...
# returns reference of hash ( $type => $count )
sub parse_file_into_hash {
    my ($file, $max) = @_;
    my %ref_data;
    open my $fh, '<', $file or die "Could not open file to read:$!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($res, $type) = split /\s+/, $line;
        if ($type) {
            $ref_data{$type}++;
            if (!exists $max->{$type} || $ref_data{$type} > $max->{$type}) {
                $max->{$type} = $ref_data{$type};
            }
        }
    }
    close $fh;
    return \%ref_data;
}
Output ran against your example files:
$ ./example.pl
max/combined: F = 1
h1: F = 1
h2: F = 1
max/combined: N = 1
h1: N = 0
h2: N = 1
max/combined: P = 3
h1: P = 3
h2: P = 2

Iterate a simple program

I have the following file:
a b 5
c d 6
e f 7
g h 4
i j 3
k l 10
and I want to find which line contains the minimum value in the third column and erase it from the initial file. After this, I want to run the program again, find the new minimum, and do the same thing 2 more times.
The output file should be
c d 6
e f 7
k l 10
I tried to write the following code:
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

my $file1 = "try.dat";
open my $fg, "<", $file1 or die "Error during $file1 opening: $!";

my @vector;
while (<$fg>) {
    push @vector, [ split ];
}

my $minimum = $vector[0][2];
my @blacklist;
for my $i (0 .. $#vector) {
    if ($minimum > $vector[$i][2]) {
        $minimum = $vector[$i][2];
        push @blacklist, @vector[$i+1];
    }
}
#say "my minimum is $minimum";
#say "the blacklist is composed by @blacklist";
I don't know how to erase the elements contained in @blacklist (which in the first case should be i j 3) or how to iterate everything.
Any help for the iteration?
This sort of thing is what Tie::File was made for. It allows you to modify the file in-place by modifying a tied array.
This program does what you want. The helper function minidx returns the first index of the element of the passed array that holds the smallest value.
The program works by copying the third field of the file records into array #field3, and finding the index of the smallest value in there. The element at that index is then deleted from both the file and #field3 using splice.
use strict;
use warnings;

use Tie::File;

tie my @file, 'Tie::File', 'file.txt' or die $!;

my @field3 = map { (split)[2] } @file;

for (1 .. 3) {
    my $i = minidx(\@field3);
    splice @file,   $i, 1;
    splice @field3, $i, 1;
}

sub minidx {
    my ($arr) = @_;
    my ($i, $v);
    for (0 .. $#$arr) {
        ($i, $v) = ($_, $arr->[$_]) unless defined $v and $arr->[$_] >= $v;
    }
    return $i;
}
output
c d 6
e f 7
k l 10
When you say @blacklist = @vector you are adding the entire @vector array to the blacklist. You probably want to do a push @blacklist, $vector[$i]. That will push the array reference into the blacklist.
Now, blacklist has an array ref in it, so you have to dereference it to print it.
say "the blacklist is composed by @{$blacklist[0]}";
Edit: For iterating and writing:
I would skip the #blacklist array (unless you need it for something else) and remove the min values from #vector. Then you can write #vector to some file.
my $num_elts_to_remove = 3;
for (my $j = 0; $j < $num_elts_to_remove; $j++) {
    my $minimum = $vector[0][2];
    my $min_idx = 0;
    for my $i (0 .. $#vector) {
        if ($minimum > $vector[$i][2]) {
            $minimum = $vector[$i][2];
            $min_idx = $i;
        }
    }
    push @blacklist, $vector[$min_idx];
    splice @vector, $min_idx, 1;    # remove array with smallest value
}
Now write the array to a file
open OUT, ">", $outfile or die "Error: $!";
foreach (@vector) {
    print OUT join " ", @$_;
    print OUT "\n";
}
close(OUT);
Prints:
c d 6
e f 7
k l 10
Taking Borodin's Tie::File suggestion even further: I have written a cute module called Tie::Array::CSV which allows you to treat a delimited file as an array (and because it uses Tie::File underneath, it is both readable and writable). Because of this I can use Perlish operations like map and sort (and a Schwartzian transform!) to perform this task:
#!/usr/bin/env perl
use strict;
use warnings;

use Tie::Array::CSV;

tie my @data, 'Tie::Array::CSV', 'data', sep_char => ' ';

# get a list of row ids sorted by last value (inc)
my $i = 0;
my @sorted =
    map  { $_->[0] }
    sort { $a->[1] <=> $b->[1] }
    map  { [ $i++, $_->[-1] ] }
    @data;

# splice out the rows with the three lowest values, from bottom to top
# (sorting the indices numerically, descending)
splice @data, $_, 1 for sort { $b <=> $a } @sorted[0..2];
Note that in the end you want to remove rows from the bottom so that you don't have to reindex every time.
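The remark about removing rows from the bottom is worth a tiny demonstration: splicing indices in ascending order invalidates the later indices, while descending order leaves them intact.

```perl
use strict;
use warnings;

# Removing indices 1 and 3 from ('a'..'e') should delete 'b' and 'd'.
# Bottom-up splicing works; top-down shifts the array and deletes 'e'.
my @good = ('a' .. 'e');
splice @good, $_, 1 for sort { $b <=> $a } (1, 3);   # bottom-up: removes 'b', 'd'

my @bad = ('a' .. 'e');
splice @bad, $_, 1 for sort { $a <=> $b } (1, 3);    # top-down: second splice hits 'e'

print "@good\n";    # prints "a c e"
print "@bad\n";     # prints "a c d"
```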

Majority Voting in perl?

I have 5 files containing the same words. I want to read each word in all the files and decide the winning word by detecting the following characters in a word (*, #, $, &), separated by tabs. Then, I want to generate an output file. It can only have 2 winners. For example:
file1
we$
are*
...
file2
we$
are#
...
file3
we&
are*
...
file4
we$
are#
...
file5
we$
are&
...
output file:
we$
are*#
Here is how I started:
#!/usr/local/bin/perl -w

sub read_file_line {
    my $fh = shift;
    if ($fh and my $line = <$fh>) {
        chomp($line);
        return $line;
    }
    return;
}

open(my $f1, "words1.txt") or die "Can't";
open(my $f2, "words2.txt") or die "Can't";
open(my $f3, "words3.txt") or die "Can't";
open(my $f4, "words4.txt") or die "Can't";
open(my $f5, "words5.txt") or die "Can't";

my $r1 = read_file_line($f1);
my $r2 = read_file_line($f2);
my $r3 = read_file_line($f3);
my $r4 = read_file_line($f4);
my $r5 = read_file_line($f5);

while ($f5) {
    # What can I do here to decide and write the winning word in the output file?
    $r1 = read_file_line($f1);
    $r2 = read_file_line($f2);
    $r3 = read_file_line($f3);
    $r4 = read_file_line($f4);
    $r5 = read_file_line($f5);
}
Test Data Generator
#!/usr/bin/env perl
use strict;
use warnings;

foreach my $i (1..5)
{
    my $file = "words$i.txt";
    open my $fh, '>', $file or die "Failed to open $file for writing ($!)";
    foreach my $w (qw(we are the people in charge and what we say goes))
    {
        my $suffix = substr('*#$&', rand(4), 1);
        print $fh "$w$suffix\n";
    }
}
Majority Voting Code
#!/usr/bin/env perl
use strict;
use warnings;

my @files = ( "words1.txt", "words2.txt", "words3.txt",
              "words4.txt", "words5.txt"
            );
my @fh;

{
    my $n = 0;
    foreach my $file (@files)
    {
        open my $f, '<', $file or die "Can't open $file for reading ($!)";
        $fh[$n++] = $f;
    }
}

while (my $r = process_line(@fh))
{
    print "$r\n";
}

sub process_line
{
    my (@fhlist) = @_;
    my %words = ();
    foreach my $fh (@fhlist)
    {
        my $line = <$fh>;
        return unless defined $line;
        chomp $line;
        $words{$line}++;
    }
    my $combo = '';
    foreach my $word (keys %words)
    {
        return $word if ($words{$word} > 2);
        $combo .= $word if ($words{$word} == 2);
    }
    $combo =~ s/(\W)\w+(\W)/$1$2/;
    return $combo;
}
Example Data and Results
$ perl datagenerator.pl
$ perl majorityvoter.pl > results.txt
$ paste words?.txt results.txt
we* we$ we& we# we# we#
are* are# are# are* are$ are*#
the* the& the# the# the& the&#
people& people& people$ people# people# people&#
in# in* in$ in* in* in*
charge* charge# charge& charge* charge# charge#*
and$ and* and$ and& and$ and$
what& what& what$ what& what# what&
we# we* we* we& we* we*
say$ say& say$ say$ say$ say$
goes$ goes& goes# goes# goes# goes#
$
This seems to be correct for the test data in the files generated.
Revised requirements - example output
The 'revised requirements' replaced the '*#$&' markers after the words with a tab and one of the letters 'ABCD'. After some swift negotiation, the question is restored to its original form. This output is from a suitably adapted version of the answer above - 3 code lines changed, 2 in the data generator, 1 in the majority voter. Those changes are not shown - they are trivial.
we C we D we C we C we D we C
are C are D are C are B are A are C
the B the D the A the A the D the A|D
people D people B people A people B people D people B|D
in D in B in C in B in D in D|B
charge C charge D charge D charge D charge A charge D
and A and B and C and C and B and B|C
what B what B what B what C what C what B
we D we B we D we B we A we B|D
say D say D say B say D say D say D
goes A goes C goes A goes C goes A goes A
Revised test generator - for configurable number of files
Now that the poster has worked out how to handle the revised scenario, this is the data generator code I used - with 5 tags (A-E). Clearly, it would not take a huge amount of work to configure the number of tags on the command line.
#!/usr/bin/env perl
use strict;
use warnings;

my $fmax  = scalar(@ARGV) > 0 ? $ARGV[0] : 5;
my $tags  = 'ABCDE';
my $ntags = length($tags);
my $fmt   = sprintf "words$fmax-%%0%0dd.txt", length($fmax);

foreach my $fnum (1..$fmax)
{
    my $file = sprintf $fmt, $fnum;
    open my $fh, '>', $file or die "Failed to open $file for writing ($!)";
    foreach my $w (qw(We Are The People In Charge And What We Say Goes))
    {
        my $suffix = substr($tags, rand($ntags), 1);
        print $fh "$w\t$suffix\n";
    }
}
Revised Majority Voting Code - for arbitrary number of files
This code works with basically arbitrary numbers of files. As noted in one of the (many) comments, it does not check that the word is the same in each file as required by the question; you could get quirky results if the words are not the same.
#!/usr/bin/env perl
use strict;
use warnings;

my @files = scalar @ARGV > 0 ? @ARGV :
            ( "words1.txt", "words2.txt", "words3.txt",
              "words4.txt", "words5.txt"
            );
my $voters = scalar(@files);
my @fh;

{
    my $n = 0;
    foreach my $file (@files)
    {
        open my $f, '<', $file or die "Can't open $file for reading ($!)";
        $fh[$n++] = $f;
    }
}

while (my $r = process_line(@fh))
{
    print "$r\n";
}

sub process_line
{
    my (@fhlist) = @_;
    my %words = ();
    foreach my $fh (@fhlist)
    {
        my $line = <$fh>;
        return unless defined $line;
        chomp $line;
        $words{$line}++;
    }
    return winner(%words);
}

# Get tag X from entry "word\tX".
sub get_tag_from_word
{
    my ($word) = @_;
    return (split /\s/, $word)[1];
}

sub winner
{
    my (%words) = @_;
    my $maxscore = 0;
    my $winscore = ($voters / 2) + 1;
    my $winner   = '';
    my $taglist  = '';
    foreach my $word (sort keys %words)
    {
        return "$word\t$words{$word}" if ($words{$word} >= $winscore);
        if ($words{$word} > $maxscore)
        {
            $winner = $word;
            $winner =~ s/\t.//;
            $taglist = get_tag_from_word($word);
            $maxscore = $words{$word};
        }
        elsif ($words{$word} == $maxscore)
        {
            my $newtag = get_tag_from_word($word);
            $taglist .= "|$newtag";
        }
    }
    return "$winner\t$taglist\t$maxscore";
}
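As a side note, the missing same-word check mentioned above could be sketched like this (the helper name is hypothetical, not part of the original answer):

```perl
use strict;
use warnings;

# Verify that every file supplied the same word on a given line by stripping
# the tab-separated tag and counting the distinct bare words that remain.
sub check_same_word {
    my @entries = @_;                           # one "word\tTAG" entry per file
    my %bare = map { (split /\t/)[0] => 1 } @entries;
    return keys %bare == 1;                     # true when all words agree
}

print check_same_word("We\tA", "We\tB")  ? "ok\n" : "mismatch\n";  # prints "ok"
print check_same_word("We\tA", "Are\tB") ? "ok\n" : "mismatch\n";  # prints "mismatch"
```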
One Example Run
After considerable experimentation on the data presentation, one particular set of data I generated gave the result:
We A|B|C|D|E 2 B C C E D A D A E B
Are D 4 C D B A D B D D B E
The A 5 D A B B A A B E A A
People D 4 E D C D B E D D B C
In D 3 E C D D D B C A A B
Charge A|E 3 E E D A D A B A E B
And E 3 C E D D C A B E B E
What A 5 B C C A A A B A D A
We A 4 C A A E A E C D A E
Say A|D 4 A C A A D E D A D D
Goes A 3 D B A C C A A E E B
The first column is the word; the second is the winning tag or tags; the third (numeric) column is the maximum score; the remaining 10 columns are the tags from the 10 data files. As you can see, there are two each of 'We A', 'We B', ... 'We E' in the first row. I've also generated (but not preserved) one result set where the maximum score was 7. Given enough repetition, these sorts of variations are findable.
Sounds like a job for a hash of hashes. Untested code:
use strict;
use warnings;
use 5.010;
use autodie;
use List::Util qw( sum reduce );

my %totals;
my @files = map "words$_.txt", 1..5;

for my $file (@files) {
    open my $fh, '<', $file;
    while (<$fh>) {
        chomp;
        my ($word, $sign) = /(\w+)(\W)/;
        $totals{$word}{$sign}++;
    }
}

open my $totals_fh, '>', 'outfile.txt';

my @sorted_words = sort { sum values %{$totals{$a}} <=> sum values %{$totals{$b}} } keys %totals;    # Probably something fancier here.
for my $word (@sorted_words[0, 1]) {
    #say {$totals_fh} $word, join('', keys %{$totals{$word}} ), "\t- ", function_to_decide_text($totals{$word});
    say {$totals_fh} $word, reduce {
        $totals{$word}{ substr $a, 0, 1 } == $totals{$word}{$b} ? $a . $b
      : $totals{$word}{ substr $a, 0, 1 } >  $totals{$word}{$b} ? $a
      :                                                           $b;
    } keys %{ $totals{$word} };
}
EDIT: Forgot about the only two winners part. Fixed, somewhat.
EDIT2: Fixed as per comments.
#!/usr/bin/perl
use strict;
use warnings;

my @files = qw(file1 file2 file3 file4 file5);
my $symbols = '*#$&';    # no need to escape them as they'll be in a character class
my %words;

foreach my $file (@files) {
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    while (<$fh>) {
        if (/^(\w+[$symbols])$/) {
            $words{$1}++;    # count the occurrences of each word
        }
    }
    close $fh;
}

my $counter  = 0;
my $previous = -1;
foreach my $word (sort { $words{$b} <=> $words{$a} } keys %words) {
    # make sure you don't exit if two words at the top of the list
    # have the same number of occurrences
    if ($previous != $words{$word}) {
        last if $counter > 1;
    }
    $counter++;    # count the output
    $previous = $words{$word};
    print "$word occurred $words{$word} times.\n";
}
Worked when I tried it out...