Find the top 5 closest rather than just the closest? - text-processing

How would you find 5 numbers in a column of numbers that are the closest to a $VariableNumber?
For example, if the $VariableNumber = 30 then:
Example input file:
50
100
70
40
20
10
65
41
92
Example output:
20
40
41
10
50
There's an answer posted elsewhere that finds the closest match to a given value in a specific column. It goes as follows:
awk -v col_num="3" -v value="$Number" '
func abs(x) { return (x<0) ? -x : x }
{
distance = abs($col_num - value)
}
NR==1 || distance<shortest_distance {
shortest_distance = distance
nearest_value = $col_num
}
END {
print nearest_value
}
'
But I haven't been able to adapt it

I'd just sort the entries by distance to n and select the first ones like:
awk -v n=30 '
function abs(x) {return x < 0 ? -x : x}
{print abs($0 - n) "\t" $0}' < file |
sort -n |
head -n 5 |
cut -f 2-
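For comparison, the same sort-by-distance idea can be sketched in Python. This is only an illustration, not part of the answer above: the function name closest is made up, and breaking ties on the value itself is an assumption chosen because it reproduces the order of the example output in the question.

```python
def closest(values, target, k=5):
    """Return the k values nearest to target, nearest first."""
    # Sort by distance to the target; equal distances are ordered by value.
    return sorted(values, key=lambda x: (abs(x - target), x))[:k]


numbers = [50, 100, 70, 40, 20, 10, 65, 41, 92]
print(closest(numbers, 30))
```

Like the pipeline, this sorts the whole input, so it costs O(N log N) regardless of how few values are wanted.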

As usual, Stéphane’s answer is very good;
simple and straightforward. 
But, if you really really want to do it entirely in awk,
and you have GNU awk (a.k.a. gawk), you can do this:
awk -v t="$VariableNumber" '
{
d = $1 - t
if (d < 0) d = -d
e = d "#" $1
if (NR <= 5) {
a[NR] = e
} else {
a[5+1] = e
asort(a, a, "@val_num_asc")
delete a[5+1]
}
}
END {
print "---"
if (NR <= 5) asort(a, a, "@val_num_asc")
for (i in a) { gsub(".*#", "", a[i]); print a[i]; }
}
'
For each input value,
this computes d as the absolute difference
between that value and the target value,
t (which is set on the command line to the value of $VariableNumber,
which, as per your example, might be 30). 
It then constructs an array entry, e,
consisting of the difference,
concatenated with a # and the original number. 
This array entry is then added to the array a. 
The first five input values are simply put into array elements 1 through 5.
After that, each number is appended to the array
by being put into element 6.  Then the array is sorted. 
Since the array entries start with the difference value,
numbers that are close to the target
(for which the difference value is low)
are sorted to the beginning of the array,
and numbers that are far from the target
are sorted to the end of the array. 
(Specify "@val_num_asc"
to sort the values as numbers rather than strings. 
Without this, differences of 10 and 20 will sort below 3 and 4.) 
Then the 6th element (the one that is farthest from the target) is deleted.
Finally (upon reaching the END of the data), we
Check whether the number of records is ≤ 5. 
If it is, sort the array,
because it is still in the order of the input data. 
(Arguably, this step is optional.)
For each element in the array,
strip off the difference and the #
by searching for the regular expression .*#
and substituting (gsub) nothing. 
Then print the original value.
Obviously, if you want to look at a column other than the first one,
you can change all the occurrences of $1 in the script. 
(The script you show in your question
demonstrates how to allow the column number to be specified at run time.) 
And, if you want some number other than the closest five,
just change all appearances of 5. 
(I could have referred to a[6] in lines 9 and 11;
I wrote a[5+1] to facilitate simple-minded parameterization.)
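The keep-five, add-a-sixth, sort, drop-the-last procedure described above can be sketched in Python as follows. This is a hedged illustration rather than a translation of the gawk script: the name closest_k is hypothetical, and (distance, value) tuples stand in for the "difference#value" strings.

```python
def closest_k(values, target, k=5):
    """Keep at most k (distance, value) pairs while streaming the input."""
    kept = []
    for v in values:
        kept.append((abs(v - target), v))
        if len(kept) > k:
            kept.sort()   # nearest entries move to the front
            kept.pop()    # drop the farthest one (the k+1-th)
    kept.sort()           # needed when there were k or fewer inputs
    return [v for _, v in kept]


print(closest_k([50, 100, 70, 40, 20, 10, 65, 41, 92], 30))
```

Only k+1 entries are ever held in memory, which is the point of doing it this way instead of sorting the whole file.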

Another, for all awks (tested with gawk, mawk, Debian's original-awk and Busybox awk):
$ awk -v v=30 -v n=5 ' # v is the central value,
function abs(d) { # n is the number of wanted values
return (d<0?-d:d) # d is distance, c array index, va value array
} # da distance array, max is greatest of smallest
((d=abs(v-$1))<max) && c==n { # if we find distance < max of top n smallest
da[maxc]=d # replace max in distance array
va[maxc]=$1 # and in value array
max=0 # set max to min to find new max distance
for(ct in da)
if(da[ct]>max) { # find the new max in the top n smallest
max=da[ct]
maxc=ct
}
if(max==0) # if max is 0, all are 0s so might as well exit
exit
next
}
c<n { # we need n first distances
da[++c]=d # fill distance array with them
va[c]=$1 # related values to value array
if(d>max) { # look for max
max=d
maxc=c
}
}
END { # in the end or exit
for(c in va) # get all values in value array
print va[c] # and output them
}' file
Output (in no particular order, array implementation related):
50
10
41
40
20
Execution time is linear in the number of records for a fixed n: the worst case is the size of the value array times the record count, so still linear (right? :).
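The replace-the-current-maximum idea can also be sketched with a heap, which avoids rescanning the kept values after each replacement. This is an assumption-laden Python illustration, not the awk script above: closest_k_heap is a made-up name, and the max-heap is simulated by storing negated distances in Python's min-heap.

```python
import heapq


def closest_k_heap(values, target, k=5):
    """Return the k values nearest to target, in no particular order."""
    heap = []  # entries are (-distance, value); heap[0] is the farthest kept
    for v in values:
        d = abs(v - target)
        if len(heap) < k:
            heapq.heappush(heap, (-d, v))
        elif d < -heap[0][0]:
            # Strictly closer than the farthest kept value: swap it in.
            heapq.heapreplace(heap, (-d, v))
    return [v for _, v in heap]


print(sorted(closest_k_heap([50, 100, 70, 40, 20, 10, 65, 41, 92], 30)))
```

Each record costs at most O(log k), so the whole pass is O(N log k) with O(k) memory.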


Brainfuck challenge

I have a challenge: I must write Brainfuck code.
For a given number n, determine its last digit.
Input
The input consists of a single line containing exactly one integer n (1 <= n <= 2,000,000,000), followed by a newline '\n' (ASCII 10).
Output
The output must contain exactly one integer: the last digit of n.
Example I
Input: 32
Output: 2
Example II
Input: 231231132
Output: 2
This is what I tried, but it didn't work:
+[>,]<.>++++++++++.
The last input is the newline. So you have to go two memory positions back to get the last digit of the number. And maybe you don't have to return a newline character, so the code is
,[>,]<<.
Nope sorry, real answer is
,[>,]<.
because your answer was getting one too far ;)
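Which of the two versions is right depends on what the interpreter delivers. As a hedged check, the tape left behind by ,[>,] can be modelled in Python under two assumptions that are not stated in the thread: the trailing newline is passed through as byte 10, and end of input is read as 0. The helper name tape_after_read_loop is hypothetical.

```python
def tape_after_read_loop(stdin):
    """Cells filled by ',[>,]' assuming EOF is read as 0."""
    # Every input byte lands in its own cell; the final 0 is the EOF read
    # that terminates the loop, and the pointer stops on that cell.
    return list(stdin.encode("ascii")) + [0]


tape = tape_after_read_loop("32\n")
ptr = len(tape) - 1
one_back = chr(tape[ptr - 1])   # what '<.' would print
two_back = chr(tape[ptr - 2])   # what '<<.' would print
print(repr(one_back), repr(two_back))
```

Under those assumptions one step back lands on the newline and two steps back on the last digit. If the interpreter ends the input before the newline, a single step back is enough.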
Depending on the interpreter, you might have to detect the return key yourself. Considering the return key is ASCII 10, your code should look like this:
>,----- -----[+++++ +++++>,----- -----]<.
Broken down:
> | //first operation (just in case your interpreter does not support a negative pointer index)
,----- ----- | //first entry; if it's a return, you don't even get into the loop
[
+++++ +++++ | //the value was not ASCII 10, so you want the original value back
>, | //every next entry
----- ----- | //check again for the return; you exit the loop only if the last entered value is 10
]
<. | //the current cell is 0; you go back to the last valid entry and you display it
Your issue is that a loop keeps running until, at the end of an iteration, the cell the pointer is currently on is equal to 0. Your code never subtracts anything, so the loop can only end if , itself stores a 0 in the current cell (some interpreters do that at end of input; others don't). Until then, all your code does is take an ASCII character as input, move one cell forward, take another character as input, and so on. If your interpreter never delivers that 0, the code after the loop is never reached.

Creating an array from an str concatenation Matlab

Hello I have these two vectors
Q = [1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4]
and
Year = [2000,2000,2000,2000,2001,2001,2001,2001,2002.....]
and I would like to concatenate them into one single array Time
Time = [20001,20002,20003,20004,20010....]
Or
Time= {'2000Q1', '2000Q2', '2000Q3', '2000Q4', '2001Q1'....}
So far I tried with this code
m = zeros(136,1)
for i=1:136
m(i,1)= strcat(Q(i),Year(i));
end
And Matlab gave me this error:
Subscripted assignment dimension mismatch.
Help pls ?
If your vectors Year and Q have the same number of elements, you do not need a loop; just transpose them (or make sure they are column vectors), then concatenate with the [] operator:
Time = [ num2str(Year.') num2str(Q.') ] ;
will give you:
20001
20002
20003
20004
20011
...
And if you want the 'Q' character, insert it in the expression:
Time = [ num2str(Year.') repmat('Q',length(Q),1) num2str(Q.') ]
Will give you:
2000Q1
2000Q2
2000Q3
2000Q4
2001Q1
...
This will be a char array. If you want a cell array, use cellstr on the same expression:
time = cellstr( [num2str(Year.') repmat('Q',length(Q),1) num2str(Q.')] ) ;
To obtain strings:
strtrim(mat2cell(num2str([Year(:) Q(:) ],'%i%i'), ones(1,numel(Q))));
Explanation:
Concat both numeric vectors as two columns (using [...])
Convert to char array, where each row is the concatenation of two numbers (using num2str with sprintf-like format specifiers). It is assumed that all numbers are integers (if not, change the format specifiers). This may introduce unwanted spaces if not all the concatenated numbers have the same number of digits.
Convert to a cell array, putting each row in a different cell (using mat2cell).
Remove whitespaces in each cell (using strtrim)
To obtain numbers: apply str2double to the above:
str2double(strtrim(mat2cell(num2str([Year(:) Q(:) ],'%i%i'), ones(1,numel(Q)))));
Or compute directly
10.^ceil(max(log10(Q)))*Year + Q;
You can use arrayfun
If you want your output in string format (with a 'Q' in the middle) then use sprintf to format the string
Time = arrayfun( @(y,q) sprintf('%dQ%d', y, q ), Year, Q, 'uni', 0 );
Resulting with a cellarray
Time =
'2000Q1' '2000Q2' '2000Q3' '2000Q4' '2001Q1' '2001Q2' '2001Q3'...
Alternatively, if you skip the 'Q' you can save each number in an array
Time = arrayfun( @(y,q) y*10+q, Year, Q )
Resulting with a regular array
Time =
20001 20002 20003 20004 20011 20012 20013 ...
That's because you are initializing m to zeros(136,1) and then trying to store a whole string in a single element, and obviously a single double cannot hold a string.
I give you 2 options, but I favor the first one.
1.- You can just use cell arrays, so your code becomes:
m = cell(136,1);
for ii=1:136
m{ii} = sprintf('%dQ%d', Year(ii), Q(ii));
end
and then m will be: m{1}='2000Q1';
2.- Or, if you know that your strings will ALWAYS be the same size (in your case it looks like they are always 6), you can:
strsize = 6;
m = zeros(136,strsize);
for ii=1:136
m(ii,:) = sprintf('%dQ%d', Year(ii), Q(ii));
end
and then m will be: m(1,:) = [ 50 48 48 48 81 49 ], which translated to ASCII is 2000Q1 (char(m) gives the text back).
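As a language-neutral cross-check of the two target formats and of the character codes quoted above, here is a small Python sketch. It is not MATLAB and not part of any answer; the short sample lists are assumptions standing in for the full Year and Q vectors.

```python
# Sample data standing in for the first five entries of Year and Q.
years = [2000, 2000, 2000, 2000, 2001]
quarters = [1, 2, 3, 4, 1]

# String form: one label per (year, quarter) pair.
labels = [f"{y}Q{q}" for y, q in zip(years, quarters)]

# Numeric form: valid while every quarter is a single digit.
codes = [y * 10 + q for y, q in zip(years, quarters)]

# Character codes of the first label, as stored in a numeric row.
ascii_row = [ord(ch) for ch in labels[0]]

print(labels, codes, ascii_row)
```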

Printing a 2500 x 2500 dimensional matrix using Perl

I am very new to Perl. Recently I wrote a code to calculate the coefficient of correlation between the atoms between two structures. This is a brief summary of my program.
for($i=1;$i<=2500;$i++)
{
for($j=1;$j<=2500;$j++)
{
calculate the correlation (Cij);
print $Cij;
}
}
This program prints all the correlations serially in a single column. But I need to print the correlations in the form of a matrix, something like..
Atom1 Atom2 Atom3 Atom4
Atom1 0.5 -0.1 0.6 0.8
Atom2 0.1 0.2 0.3 -0.5
Atom3 -0.8 0.9 1.0 0.0
Atom4 0.3 1.0 0.8 -0.8
I don't know, how it can be done. Please help me with a solution or suggest me how to do it !
You're having a simple issue: you need to print a NL after you finish printing a row. However, while I have your attention, I'll prattle on.
You should store your data in a matrix using references. This way, the way you store your data matches the concept of your data:
my @atoms; # Storing the data in here
my $i = 300;
my $j = 400;
my $value = ...; # Calculating what the value should be at row 300, column 400.
# Any one of these will work. Pick one:
$atoms[$i][$j] = $value; # Looks just like a matrix!
$atoms[$i]->[$j] = $value; # Reminds you this isn't really a matrix.
${$atoms[$i]}[$j] = $value; # Now this just looks ridiculous, but is technically correct.
My preference is the second way. It's just a light reminder that this isn't actually a matrix. Instead it's an array of my rows, and each row points to another array that holds the column data for that particular row. The syntax is still pretty clean although not quite as clean as the first way.
Now, let's get back to your problem:
my @atoms; # I'll store the calculated values here
....
$atoms[$i]->[$j] = ... # calculated value for row $i column $j
....
# And now to print out my matrix
for my $i (0..$#atoms) {
for my $j (0..$#{ $atoms[$i] } ) {
printf "%4.2f ", $atoms[$i]->[$j]; # Notice no "\n".
}
print "\n"; # Print the NL once you finish a row
}
Notice I use for my $i (0..$#atoms). This syntax is cleaner than the C style three part for, which is being discouraged. (Python doesn't have it, and I don't know if it will be supported in Perl 6). This is very easy to understand: I'm incrementing through my array. I also use $#atoms, which is the last index of my @atoms array (one less than the number of rows in my matrix). This way, as my matrix size changes, I don't have to edit my program.
The columns [$j] is a bit trickier. $atoms[$i] is a reference to an array that contains my column data for row $i, and doesn't really represent a row of data directly. (This is why I like $atoms[$i]->[$j] instead of $atoms[$i][$j]. It gives me this subtle reminder.) To get the actual array that contains my column data for row $i, I need to dereference it. Thus, the actual column values for row $i are stored in the array @{$atoms[$i]}.
To get the last index of an array, you replace the @ sigil with $#, so the last index in my
array is $#{ $atoms[$i] }.
Oh, another thing because this isn't a true matrix: each row could have a different number of entries. You can't have that with a real matrix. This makes using an Array of Arrays in Perl a bit more powerful, and a bit more dangerous. If you need a consistent number of columns, you have to manually check for that. A true matrix would automatically create the required columns based upon the largest $j value.
Disclaimer: Pseudo Code, you might have to take care of special cases and especially the headers yourself.
for($i=1;$i<=2500;$i++)
{
print "\n"; # linebreak here.
for($j=1;$j<=2500;$j++)
{
calculate the correlation (Cij);
printf "\t%4f",$Cij; # print a tab followed by your float giving it 4
# spaces of room. But no linebreak here.
}
}
This is of course a very crude, quick and dirty solution. But if you save the output into a .csv file, most csv-able spreadsheet programs (OpenOffice) should easily be able to read it into a proper table. If the spreadsheet viewer of your choice cannot understand tabs as a delimiter, you could easily add ; or / or whatever it can use into the printf string.
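The row-by-row layout both answers describe (print the values of one row on a line, then a newline, with labels along the top and left) can be sketched in Python for comparison. This is not Perl and not from either answer; format_matrix, the Atom labels and the sample values are all illustrative assumptions.

```python
def format_matrix(matrix, labels):
    """Return the matrix as text with a header row and a label column."""
    width = max(len(name) for name in labels) + 2
    # Header: blank corner cell, then one right-aligned label per column.
    lines = [" " * width + "".join(f"{name:>{width}}" for name in labels)]
    for name, row in zip(labels, matrix):
        cells = "".join(f"{value:>{width}.1f}" for value in row)
        lines.append(f"{name:<{width}}" + cells)  # one output line per row
    return "\n".join(lines)


sample = [[0.5, -0.1], [0.1, 0.2]]
table = format_matrix(sample, ["Atom1", "Atom2"])
print(table)
```

For a 2500 x 2500 matrix the same loop applies; only the width and the number format would need tuning.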

How does the Brainfuck Hello World actually work?

Someone sent this to me and claimed it is a hello world in Brainfuck (and I hope so...)
++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.
I know the basics that it works by moving a pointer and increment and decrementing stuff...
Yet I still want to know, how does it actually work? How does it print anything on the screen in the first place? How does it encode the text? I do not understand at all...
1. Basics
To understand Brainfuck you must imagine an infinite array of cells, each initialized to 0.
...[0][0][0][0][0]...
When a Brainfuck program starts, its pointer points to some cell.
...[0][0][*0*][0][0]...
If you move pointer right > you are moving pointer from cell X to cell X+1
...[0][0][0][*0*][0]...
If you increase cell value + you get:
...[0][0][0][*1*][0]...
If you increase cell value again + you get:
...[0][0][0][*2*][0]...
If you decrease cell value - you get:
...[0][0][0][*1*][0]...
If you move pointer left < you are moving pointer from cell X to cell X-1
...[0][0][*0*][1][0]...
2. Input
To read a character you use the comma ,. What it does is: read a character from standard input and write its decimal ASCII code to the current cell.
Take a look at an ASCII table. For example, the decimal code of ! is 33, while a is 97.
Well, let's imagine your BF program memory looks like:
...[0][0][*0*][0][0]...
Assuming standard input contains a, if you use the comma , operator, BF reads the decimal ASCII code 97 into memory:
...[0][0][*97*][0][0]...
You generally want to think of it that way; however, the truth is a bit more complex. BF does not read a character but a byte (whatever that byte is). Let me show you an example:
In Linux
$ printf ł
prints:
ł
which is a specifically Polish character. This character is not part of ASCII. In this case it is encoded in UTF-8, so it takes more than one byte in computer memory. We can prove it by making a hexadecimal dump:
$ printf ł | hd
which shows:
00000000 c5 82 |..|
The zeroes are the offset. c5 is the first and 82 is the second byte representing ł (in the order we will read them). |..| is the printable representation, which is not possible in this case.
Well, if you pass ł as input to your BF program that reads a single byte, program memory will look like:
...[0][0][*197*][0][0]...
Why 197? Well, 197 decimal is c5 hexadecimal. Seems familiar? Of course. It's the first byte of ł!
3. Output
To print a character you use the dot . What it does is: treating the current cell value as a decimal ASCII code, print the corresponding character to standard output.
Well, let's imagine your BF program memory looks like:
...[0][0][*97*][0][0]...
If you use the dot (.) operator now, what BF does is print:
a
Because the decimal ASCII code of a is 97.
So for example a BF program like this (97 pluses, 2 dots):
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++..
Will increase value of the cell it points to up to 97 and print it out 2 times.
aa
4. Loops
In BF, a loop consists of a loop begin [ and a loop end ]. You can think of it as a while in C/C++ where the condition is the current cell value.
Take a look at the BF program below:
++[]
++ increments the current cell value twice:
...[0][0][*2*][0][0]...
And [] is like while(2) {}, so it's an infinite loop.
Let's say we don't want this loop to be infinite. We can do for example:
++[-]
So each time the loop iterates, it decrements the current cell value. Once the current cell value is 0, the loop ends:
...[0][0][*2*][0][0]... loop starts
...[0][0][*1*][0][0]... after first iteration
...[0][0][*0*][0][0]... after second iteration (loop ends)
Let's consider yet another example of finite loop:
++[>]
This example shows that we don't have to finish the loop at the cell the loop started on:
...[0][0][*2*][0][0]... loop starts
...[0][0][2][*0*][0]... after first iteration (loop ends)
However, it is good practice to end where we started. Why? Because if a loop ends on a different cell than the one it started on, we can't assume where the cell pointer will be. To be honest, this practice makes brainfuck less brainfuck.
Wikipedia has a commented version of the code.
+++++ +++++ initialize counter (cell #0) to 10
[ use loop to set the next four cells to 70/100/30/10
> +++++ ++ add 7 to cell #1
> +++++ +++++ add 10 to cell #2
> +++ add 3 to cell #3
> + add 1 to cell #4
<<<< - decrement counter (cell #0)
]
> ++ . print 'H'
> + . print 'e'
+++++ ++ . print 'l'
. print 'l'
+++ . print 'o'
> ++ . print ' '
<< +++++ +++++ +++++ . print 'W'
> . print 'o'
+++ . print 'r'
----- - . print 'l'
----- --- . print 'd'
> + . print '!'
> . print '\n'
To answer your questions, the , and . characters are used for I/O. The text is ASCII.
The Wikipedia article goes on in some more depth, as well.
The first line initialises a[0] = 10 by simply incrementing ten times
from 0. The loop from line 2 effectively sets the initial values for
the array: a[1] = 70 (close to 72, the ASCII code for the character
'H'), a[2] = 100 (close to 101 or 'e'), a[3] = 30 (close to 32, the
code for space) and a[4] = 10 (newline). The loop works by adding 7,
10, 3, and 1, to cells a[1], a[2], a[3] and a[4] respectively each
time through the loop - 10 additions for each cell in total (giving
a[1]=70 etc.). After the loop is finished, a[0] is zero. >++. then
moves the pointer to a[1], which holds 70, adds two to it (producing
72, which is the ASCII character code of a capital H), and outputs it.
The next line moves the array pointer to a[2] and adds one to it,
producing 101, a lower-case 'e', which is then output. As 'l' happens
to be the seventh letter after 'e', to output 'll' another seven are
added (+++++++) to a[2] and the result is output twice. 'o' is the
third letter after 'l', so a[2] is incremented three more times and
the result is output.
The rest of the program goes on in the same way.
For the space and capital letters, different array cells are selected
and incremented or decremented as needed.
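The arithmetic in this walkthrough can be replayed in a few lines of Python. This is only a sketch of the bookkeeping, not an interpreter: the steps list is my own transcription of the commented program above, as (cell index, amount added before printing) pairs.

```python
# Cell values after the initialisation loop: a[0]..a[4].
cells = [0, 70, 100, 30, 10]

# (cell index, delta applied before the '.') for each printed character.
steps = [
    (1, 2), (2, 1), (2, 7), (2, 0), (2, 3),   # H e l l o
    (3, 2),                                   # space
    (1, 15), (2, 0), (2, 3), (2, -6), (2, -8),  # W o r l d
    (3, 1), (4, 0),                           # ! newline
]

printed = []
for index, delta in steps:
    cells[index] += delta
    printed.append(chr(cells[index]))

text = "".join(printed)
print(repr(text))
```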
Brainfuck
Same as its name.
It uses only 8 characters, > < + - . , [ ], which makes it one of the quickest programming languages to learn but one of the hardest to write and understand.
….and makes you finally end up with f*cking your brain.
It stores values in an array: [72 ][101 ][108 ][108 ][111 ]
Let the pointer initially point to cell 1 of the array:
> move pointer to right by 1
< move pointer to left by 1
+ increment the value of the cell by 1
- decrement the value of the cell by 1
. print the value of the current cell.
, take input into the current cell.
[ ] loop: +++[ -] runs 3 times because it has 3 '+' before it, and - decrements the counter by 1 each time.
The values stored in cells are ASCII values,
so referring to the above array: [72 ][101 ][108 ][108 ][111 ]
if you match the ASCII values you'll find that it spells Hello.
Congrats! You have learned the syntax of BF.
——-Something more ———
Let us make our first program, i.e. Hello World, after which you'll be able to write your name in this language.
+++++ +++++[> +++++ ++ >+++++ +++++ >+++ >+ <<<<-]>++.>+.+++++ ++..+++.>++.<<+++++ +++++ +++++.>.+++.----- -.----- ---.>+.>.
Breaking into pieces:
+++++ +++++[> +++++ ++
>+++++ +++++
>+++
>+
<<<<-]
Makes an array of 4 cells (the number of >) and sets a counter of 10, something like:
--- pseudo code ---
increments = [7, 10, 3, 1]
cells = [0, 0, 0, 0]
i = 10
while i > 0:
    for k in range(4):
        cells[k] += increments[k]
    i -= 1
because the counter value is stored in cell 0; > moves to cell 1 and adds 7 to its value, > moves to cell 2 and adds 10 to its previous value, and so on.
<<<< returns to cell 0 and decrements its value by 1.
Hence after the loop completes we have the array: [70,100,30,10]
>++.
moves to the 1st cell, increments its value by 2 (two '+') and then prints ('.') the character with that ASCII value, i.e. for example in Python:
chr(70+2) # 'H'
>+.
moves to the 2nd cell, adds 1 to its value (100+1) and prints ('.') its value, i.e. chr(101)
chr(101) # 'e'
There is no > or < in the next piece, so it takes the present value of the latest cell and increments only that:
+++++ ++..
The latest cell holds 101, therefore 101+7, and it prints it twice (as there are two '.'): chr(108) # prints l twice
can be used as
for i in array:
    for j in range(i.count('.')):
        print_value
———Where is it used?——-
It is just a joke language made to challenge programmers and is not used practically anywhere.
To answer the question of how it knows what to print, I have added the calculation of ASCII values to the right of the code where the printing happens:
> just means move to the next cell
< just means move to the previous cell
+ and - are used for increment and decrement respectively. The value of the cell is updated when the increment/decrement happens
+++++ +++++ initialize counter (cell #0) to 10
[ use loop to set the next four cells to 70/100/30/10
> +++++ ++ add 7 to cell #1
> +++++ +++++ add 10 to cell #2
> +++ add 3 to cell #3
> + add 1 to cell #4
<<<< - decrement counter (cell #0)
]
> ++ . print 'H' (ascii: 70+2 = 72) //70 is value in current cell. The two +s increment the value of the current cell by 2
> + . print 'e' (ascii: 100+1 = 101)
+++++ ++ . print 'l' (ascii: 101+7 = 108)
. print 'l' dot prints same thing again
+++ . print 'o' (ascii: 108+3 = 111)
> ++ . print ' ' (ascii: 30+2 = 32)
<< +++++ +++++ +++++ . print 'W' (ascii: 72+15 = 87)
> . print 'o' (ascii: 111)
+++ . print 'r' (ascii: 111+3 = 114)
----- - . print 'l' (ascii: 114-6 = 108)
----- --- . print 'd' (ascii: 108-8 = 100)
> + . print '!' (ascii: 32+1 = 33)
> . print '\n'(ascii: 10)
All the answers are thorough, but they lack one tiny detail: Printing.
In building your brainfuck translator, you also have to handle the character ., which is what a printing statement looks like in brainfuck. So whenever your brainfuck translator encounters a . character, it should print the currently pointed byte.
Example:
suppose you have --> char *ptr = [0] [0] [0] [97] [0]...
if this is a brainfuck statement: >>>. your pointer should be moved 3 spaces to right landing at: [97], so now *ptr = 97, after doing that your translator encounters a ., it should then call
write(1, ptr, 1)
or any equivalent printing statement to print the currently pointed byte, which has the value 97 and the letter a will then be printed on the std_output.
I think what you are asking is how Brainfuck knows what to do with all the code. There is an interpreter written in a higher-level language such as Python that decides what a dot means, or what a plus sign means in the code.
So the interpreter reads your code character by character and says: OK, there is a > symbol, so I have to advance the memory location. The code is simply: if (current character) == '>', memlocation = memlocation + 1, written in a higher-level language. Similarly, if (current character) == '.', then print (contents of the memory location).
Hope this clears it up. tc
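To make the idea of an interpreter concrete, here is a minimal Brainfuck interpreter sketched in Python. It is one possible implementation under stated assumptions, not a reference one: the name run_bf, the 30000-cell tape, 8-bit wrapping cells and reading 0 at end of input are all choices that real interpreters make differently.

```python
def run_bf(code, stdin=""):
    """Run a Brainfuck program and return what it printed."""
    # Pre-compute where each bracket's partner is.
    jump, stack = {}, []
    for pos, ch in enumerate(code):
        if ch == "[":
            stack.append(pos)
        elif ch == "]":
            start = stack.pop()
            jump[start], jump[pos] = pos, start

    cells = [0] * 30000          # assumed tape size
    data = stdin.encode("latin-1")
    ptr = pc = read = 0
    out = []
    while pc < len(code):
        ch = code[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            cells[ptr] = (cells[ptr] + 1) % 256
        elif ch == "-":
            cells[ptr] = (cells[ptr] - 1) % 256
        elif ch == ".":
            out.append(cells[ptr])
        elif ch == ",":
            # Assumption: end of input is read as 0.
            cells[ptr] = data[read] if read < len(data) else 0
            read += 1
        elif ch == "[" and cells[ptr] == 0:
            pc = jump[pc]        # skip the loop body
        elif ch == "]" and cells[ptr] != 0:
            pc = jump[pc]        # go back to the matching '['
        pc += 1                  # any other character is a comment
    return bytes(out).decode("latin-1")


HELLO = (
    "++++++++++[>+++++++>++++++++++>+++>+<<<<-]"
    ">++.>+.+++++++..+++.>++.<<+++++++++++++++."
    ">.+++.------.--------.>+.>."
)
print(run_bf(HELLO), end="")
```

Feeding it the program from the question prints the greeting, and the same function can be used to try out input-reading programs by passing a second argument.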

How to find the index of the smallest element in awk like this? [closed]

Input file
Cat|Dog|Dragon -40|1000|-20
K|B|L|D|E -9|1|-100|-8|9
Output file:
Dragon 20
B 1
The workflow is like this: In column2, find the index of the smallest absolute value, then fetch element in column1 using this index. Does anyone have ideas about this?
Using my incredible powers of perception, I detect a hint that this is not precisely an operational problem. Could it be Homework?
{
split($1, catdog, "|")
split($2, numbers, "|")
smallest = -1
for(i in numbers) {
a = numbers[i]
if(a < 0)
a = -a
if(smallest == -1 || a < smallest) {
smallest = a
j = i
}
}
printf("%-9s %2d\n", catdog[j], smallest)
}
The following awk command should work:
awk '
function abs(value)
{
return (value<0?-value:value)
}
{
len=split($2,arr,"|")
min=abs(arr[1])
minI=1
for(i=1;i<=len;i++){
if(abs(arr[i])<min){
min=abs(arr[i])
minI=i
}
}
split($1,arr2,"|")
print(arr2[minI],min)
}' file
Output:
Dragon 20
B 1
perl -lnwe '($k,$v) = map [split /\|/], split;
my %a;
@a{@$k} = map abs, @$v;
print "$_\t$a{$_}" for
(sort { $a{$a} <=> $a{$b} } keys %a)[0];
' input.txt
Output:
Dragon 20
B 1
Explanation:
The command line switches:
-l handle line endings, for convenience
-n read input from argument file name or stdin
The code:
The rightmost split splits each line on whitespace. We split those fields again on pipe | and put the result in an array ref [ ... ] so they fit inside a scalar variable ($k and $v). Then we declare a lexical hash %a to hold our data for each new input line. We need this declaration to avoid values from one line leaking over into the next line. We then assign via a hash slice the keys from $k to the absolute values in $v. This is the same principle as:
@foo{'a', 'b', 'c'} = (1, 2, 3); # %foo = ( a => 1, b => 2, c => 3);
Then we sort the hash on the values, take the first value with a subscript [0] and print out the corresponding key and value separated by a tab.
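The workflow from the question (split both columns on |, find the index of the smallest absolute value, pick the name at that index) can also be sketched in Python for comparison with the awk and Perl versions. The function name pick_smallest is hypothetical, and the sketch assumes each line has exactly two whitespace-separated columns holding integers.

```python
def pick_smallest(line):
    """Return (name, smallest absolute value) for one input line."""
    names_field, numbers_field = line.split()
    names = names_field.split("|")
    values = [abs(int(n)) for n in numbers_field.split("|")]
    # Index of the smallest absolute value; the first one wins on ties.
    idx = min(range(len(values)), key=values.__getitem__)
    return names[idx], values[idx]


for row in ["Cat|Dog|Dragon -40|1000|-20", "K|B|L|D|E -9|1|-100|-8|9"]:
    name, value = pick_smallest(row)
    print(name, value)
```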