Sorting vs Linear search for finding min/max - perl

Recently, I came across the following piece of Perl code that returns the minimum numeric value among all passed arguments.
return 0 + ( sort { $a <=> $b } grep { $_ == $_ } @_ )[0];
I usually use a simple linear search to find the min/max in a list, which seems simple and adequately fast to me. Is the above code in any way better than a simple linear search? Does this have anything to do with Perl in particular? Thanks!
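For reference, by "simple linear search" I just mean a single pass that keeps a running minimum. A minimal sketch (the sub name is mine, and it assumes plain numeric, non-NaN arguments):

sub my_min {
    my $min = shift;              # start with the first argument
    for (@_) {                    # one pass over the rest
        $min = $_ if $_ < $min;
    }
    return $min;
}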

O() doesn't say anything about how long an algorithm takes. For example, all else being equal, of the following two I'd always choose Algorithm 2:
Algorithm 1: O(2*N + 1000 days) = O(N)
Algorithm 2: O(5*N*log(N) + 100 ms) = O(N log N)
O() specifies how the time the algorithm takes scales as the size of the input increases. (Well, it can be used for any resource, not just time.) Since the earlier two answers only talk in terms of O(), they are useless.
If you want to know which algorithm is better for an input of a given size, you'll need to benchmark them.
In this case, it looks like List::Util's min is always significantly better.
$ perl x.pl 10
            Rate  sort LUmin
sort   1438165/s    --  -72%
LUmin  5210584/s  262%    --

$ perl x.pl 100
            Rate  sort LUmin
sort    129073/s    --  -91%
LUmin  1485473/s 1051%    --

$ perl x.pl 1000
            Rate  sort LUmin
sort      6382/s    --  -97%
LUmin   199698/s 3029%    --
Code:
use strict;
use warnings;

use Benchmark qw( cmpthese );
use List::Util qw( min );

my %tests = (
    'sort'  => 'my $x = ( sort { $a <=> $b } @n )[0];',
    'LUmin' => 'my $x = min @n;',
);

$_ = 'use strict; use warnings; our @n; ' . $_
    for values %tests;

local our @n = map rand, 1..( $ARGV[0] // 10 );

cmpthese(-3, \%tests);

You are right. If you do not need sorted data for any other purpose, the simple linear search is fastest. To do its job, a sort would have to look at each datum at least once, anyway.
Only when the sorted data would be useful for other purposes -- or when I didn't care about run time, power usage, heat dissipation, etc. -- would I sort data to find the minimum and maximum values.
Now, @SimeonVisser is correct: the sort is indeed O(n*log(n)). This is not as much slower than O(n) as many programmers imagine. In practical cases of interest, the overhead of managing the sort's balanced binary tree (or other such structure) probably matters about as much as the log(n) factor does. So one needn't shrink in horror from the prospect of sorting! However, the linear search is still faster: you are quite right about this.
Moreover, @DavidO adds such an insightful comment that I will quote it here in his own words:
A linear search is also an easier algorithm to generalize. A linear search could easily (and relatively efficiently) be disk based for large data sets, for example. Whereas doing a disk based sort becomes relatively expensive, and even more complex if the field sizes aren't normalized.
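To make that comment concrete, here is a rough sketch of how a linear scan generalizes to data that doesn't fit in memory: read one value per line from a file and keep only the running min/max. (The file name values.txt is just a placeholder.)

use strict;
use warnings;
use List::Util qw( min max );

# One numeric value per line; only the running min/max stay in memory,
# so the data set can be far larger than RAM.
open my $fh, '<', 'values.txt' or die "values.txt: $!";

my ($min, $max);
while (my $line = <$fh>) {
    chomp $line;
    $min = defined $min ? min($min, $line) : $line;
    $max = defined $max ? max($max, $line) : $line;
}
close $fh;

print "min=$min max=$max\n";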

Linear search is O(n) for obvious reasons. Sorting is O(n log n) (see sort in the Perl documentation). So yes, linear search is indeed faster in terms of complexity. This applies not only to Perl but to any programming language that implements these algorithms.
As with many problems, there are multiple ways to solve this one, and there are also multiple ways to obtain the min/max of a list. Conceptually, I would say that linear search is better when you only want the min or max of a list, as the problem does not call for sorting.

Related

Write a recursive function to find the smallest element in a vector (MATLAB)

Write a recursive function to find the smallest element in a vector.
We can not use loops but can use if statements.
Using RECURSION is a must.
I could not think of any solution; the main problem was that if I define a function then I have to give it some value, and if I do so then whenever the recursion occurs it will reset the value of that variable again.
function miminimumis = minimumval(k)
    aa = k(1);
    k = k(k < k(1));
    if length(k) == 0
        miminimumis = aa;
    else
        % this line gives the recursion
        miminimumis = minimumval(k);
    end
end
Here we create a new array which consists only of the elements smaller than the first element. If this array is empty, then the first element is the minimum; if not, we do the same for the new array, until we reach an empty array. The recursion is provided by using the same function in the definition of the function.
Solutions which in the worst case reduce the problem size by 1 will cause the recursive stack to have O(length(array)) growth. An example of this would be when you filter the array to yield values less than the first element when the array is in descending order. This will inevitably lead to stack overflow for sufficiently large arrays. To avoid this, you want to use a recursion which splits the problem of size n into two subproblems of size n/2, yielding O(log(length(array))).
I'm not a Matlab user/programmer, so I'll express the algorithm in pseudo-code. The following assumes that arrays are 1-based and that there is a built-in function min(a,b) which yields the minimum of two scalars, a and b. (If not, it's easy to replace min() with if/else logic.)
function min_element(ary) {
    if length(ary) == 1 {
        return ary[1]
    }
    split ary into first_half, second_half which differ in length by no more than 1
    return min( min_element(first_half), min_element(second_half) )
}
This could alternatively be written using two additional arguments for the lo_index and hi_index to search between. Calculate the midpoint as the integer average of the low and high indices, and make the two recursive calls min_element(ary, lo_index, mid) and min_element(ary, mid+1, hi_index). The base condition is when lo_index == hi_index, in which case you return that element. This should be faster, since it uses simple integer arithmetic and avoids creating sub-arrays for the subproblems. It has the trade-off of being slightly less friendly to the end user, who has to start the process by calling min_element(ary, 1, length(ary)).
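Again, I'm not a MATLAB programmer, so purely for concreteness, here is roughly what that index-based variant might look like in Perl (0-based indices here; the MATLAB version would use 1 through length(ary)):

use strict;
use warnings;
use List::Util qw( min );

sub min_element {
    my ($ary, $lo, $hi) = @_;
    return $ary->[$lo] if $lo == $hi;        # base case: a single element
    my $mid = int( ($lo + $hi) / 2 );        # split the index range in half
    return min( min_element($ary, $lo,      $mid),
                min_element($ary, $mid + 1, $hi) );
}

my @a = (5, 3, 9, 1, 7);
print min_element(\@a, 0, $#a), "\n";        # prints 1

The recursion depth is O(log n), which is what keeps the stack small.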
If the stack limit is 500, you'll be limited to arrays of length < 500 using linear stack growth algorithms. With the divide-and-conquer algorithm described above, you won't get a stack overflow unless you have an array of length ~2^500, much bigger than any array you could actually create.

Easier method to compute minimal perfect hash?

I have smallish(?) sets (ranging in count from 0 to 100) of unsigned 32 bit integers. For a given set, I want to come up with minimal parameters to describe a minimal(istic) perfect hash of the given set. The high-level code I used to experiment with the idea ended up something like:
def murmur(key, seed=0x0):
    # Implements 32-bit murmur3 hash...
    return theHashedKey

sampleInput = [18874481, 186646817, 201248225, 201248705, 201251025, 201251137, 201251185, 184472337, 186649073, 201248625, 201248721, 201251041, 201251153, 184473505, 186649089, 201248657, 201251009, 201251057, 201251169, 186646818, 201248226, 201248706, 201251026, 201251138, 201251186, 186649074, 201248626, 201248722, 201251042, 201251154, 186649090, 201248658, 201251010, 201251058, 201251170]

for seed in range(11111):  # arbitrary upper seed limit
    for modulus in range(10000):
        hashSet = set(murmur(x, seed=seed) % modulus for x in sampleInput)
        if len(hashSet) >= len(sampleInput):
            print('minimal modulus', modulus, 'for seed', seed)
            break
This is just basic pseudo code for a 2-axis brute force search. By adding lines to keep track of the different values, I can find seed and modulus values that give a perfect hash and then select the one with the smallest modulus.
It seems to me that there should be a more elegant/deterministic way to come up with these values, but that's where my math skills overflow.
I'm experimenting in Python right now, but ultimately want to implement something in C on a small embedded platform.

What is the meaning of this line keys(%S)=@C_fields;?

I have a general question in Perl. What is the meaning of the line below?
keys(%S) = @C_fields;
keys(%S) = @C_fields; is identical to keys(%S) = scalar @C_fields;
and from perldoc -f keys
Used as an lvalue, keys allows you to increase the number of hash buckets allocated for the given hash. This can gain you a measure of efficiency if you know the hash is going to get big. (This is similar to pre-extending an array by assigning a larger number to $#array.) If you say
keys %hash = 200;
then %hash will have at least 200 buckets allocated for it--256 of them, in fact, since it rounds up to the next power of two.
So the hash %S will get at least as many buckets as there are elements in the @C_fields array.
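A small self-contained sketch of the effect (the @C_fields contents here are made up), including the array analogy the docs mention:

use strict;
use warnings;

my @C_fields = map { "field_$_" } 1 .. 300;   # hypothetical field list

my %S;
# @C_fields is evaluated in scalar context (300), so perl pre-allocates at
# least 300 buckets for %S -- 512 in practice, rounding up to a power of two.
keys(%S) = @C_fields;

# The analogous trick for arrays: pre-extend by assigning to $#array.
my @pre;
$#pre = $#C_fields;    # @pre now has 300 (undef) slots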

Performing math operations on very large numbers in Perl

I have a case where some values in a data file have 64 bit wrap around which makes them very large, like, 18446744073709551608.
So I have to perform a subtraction from 2^64. I tried this using the simple
2^64 - 18446744073709551608
But I guess this number is too large and I don't get the actual answer, 8. What do I need to do to perform this subtraction?
Check out the bignum pragma:
use bignum;
print 2**64 - 18446744073709551608;
This should properly print 8.
Note that bignum is just a layer that makes all constant numbers automatically Math::BigFloat or Math::BigInt objects. If you only want this for some numbers, you can either specify the use bignum; in a restricted scope, add no bignum; in places, or explicitly use Math::BigFloat->new('your constant') (or BigInt) to make particular numbers and the results of any operations involving them big.
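For example, restricting the big-number math to just the values involved might look something like this sketch, using Math::BigInt's constructor and its overloaded operators:

use strict;
use warnings;
use Math::BigInt;

# Pass the big constant as a string so it never goes through a native float.
my $wrapped = Math::BigInt->new('18446744073709551608');
my $diff    = Math::BigInt->new(2) ** 64 - $wrapped;

print "$diff\n";    # prints 8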
use bignum;
print 2**64 - 18446744073709551608, "\n";
In Perl, exponentiation is ** (not ^, which is bitwise XOR), and mod is %.
Good luck!
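To see why the original expression went wrong, compare the two operators (the values in the comments are what a typical 64-bit perl prints):

print 2 ^ 64, "\n";    # 66 -- bitwise XOR of 2 and 64, not exponentiation
print 2 ** 64, "\n";   # 1.84467440737096e+19 -- a native float, so subtracting
                       # 18446744073709551608 from it cannot give an exact 8
use bignum;
print 2 ** 64 - 18446744073709551608, "\n";   # 8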

What's the smallest non-zero, positive floating-point number in Perl?

I have a program in Perl that works with probabilities that can occasionally be very small. Because of rounding error, sometimes one of the probabilities comes out to be zero. I'd like to do a check for the following:
use constant TINY_FLOAT => 1e-200;
my $prob = calculate_prob();
if ( $prob == 0 ) {
    $prob = TINY_FLOAT;
}
This works fine, but I actually see Perl producing numbers that are smaller than 1e-200 (I just saw an 8.14e-314 fly by). For my application I can change calculate_prob() so that it returns the maximum of TINY_FLOAT and the actual probability, but this made me curious about how floating-point numbers are handled in Perl.
What's the smallest positive floating-point value in Perl? Is it platform-dependent? If so, is there a quick program that I can use to figure it out on my machine?
According to perldoc perlnumber, Perl uses the native floating point format where native is defined as whatever the C compiler that was used to compile it used. If you are more worried about precision/accuracy than speed, take a look at bignum.
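If you want to see what your particular perl was built with, the Config module exposes the native floating-point ("NV") choice; typical values are shown in the comments:

use strict;
use warnings;
use Config;

print "NV type: $Config{nvtype}\n";         # e.g. "double" or "long double"
print "NV size: $Config{nvsize} bytes\n";   # e.g. 8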
The other answers are good. Here is how to find out the approximate ε if you did not know any of that information and could not post your question on SO ;-)
#!/usr/bin/perl

use strict;
use warnings;

use constant MAX_COUNT => 2000;

my ($x, $c);
for (my $y = 1; $y; $y /= 2) {
    $x = $y;
    # guard against too many iterations
    last if ++$c > MAX_COUNT;
}

printf "%d : %.20g\n", $c, $x;
Output:
C:\Temp> thj
1075 : 4.9406564584124654e-324
It may be important to note that that smallest number is what's called a subnormal number, and math done on it may produce surprising results:
$ perl -wle'$x = 4.94e-324; print for $x, $x*1.4, $x*.6'
4.94065645841247e-324
4.94065645841247e-324
4.94065645841247e-324
That's because it uses the smallest allowed (base-2) exponent and a mantissa of the (base-2) form 0.0000000...0001. Larger, but still subnormal, numbers also have a mantissa beginning with 0., and they gain precision as they get larger.
I actually don't know how perl represents floating point numbers (and I think this is something you configure when you build perl), but if we assume that IEEE 754 doubles are used, then the smallest positive (subnormal) value for a 64-bit float is 4.94065645841247E-324.
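One terminology caveat, and a quick way to check the related constants on your machine: 4.94e-324 is the smallest subnormal double, which is different from machine epsilon and from the smallest normal double. The POSIX module exposes the float.h constants (the values in the comments assume IEEE 754 doubles):

use strict;
use warnings;
use POSIX qw( DBL_EPSILON DBL_MIN );

printf "DBL_EPSILON = %.17g\n", DBL_EPSILON;   # ~2.22e-16: gap between 1 and the next double
printf "DBL_MIN     = %.17g\n", DBL_MIN;       # ~2.23e-308: smallest *normal* positive double
# The 4.94e-324 value found above is smaller still: the smallest subnormal.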