maximum number of comparisons for merging a list of length 5 and a list of length 6 - mergesort

I'm stuck on this question: What is the maximum number of comparisons for merging a list of length 5 and a list of length 6? Write down 2 lists where this maximum number is needed and 2 which can be merged using fewer comparisons. Find an upper bound for the number of comparisons for sorting 11 items using MERGESORT.
Any help would be appreciated!
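Not a full answer, but a small sketch that may help you experiment with the first part: a standard two-way merge instrumented to count comparisons (the example lists below are my own). In the worst case the count reaches m + n - 1 = 5 + 6 - 1 = 10, which happens when the comparisons have to alternate between the two lists right to the end.

def merge_count(a, b):
    # Standard two-way merge that also counts element comparisons.
    out, i, j, comparisons = [], 0, 0, 0
    while i < len(a) and j < len(b):
        comparisons += 1
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])   # one list is exhausted here;
    out.extend(b[j:])   # appending the remainder costs no comparisons
    return out, comparisons

# Interleaved lists force the maximum 5 + 6 - 1 = 10 comparisons:
print(merge_count([2, 4, 6, 8, 10], [1, 3, 5, 7, 9, 11])[1])   # 10
# If one list entirely precedes the other, only 5 comparisons are needed:
print(merge_count([1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11])[1])   # 5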

Related

increasing primes producing last digits and order of efficiency

Adding two increasing primes can produce sums with last digits 0, 2, 4, 6, 8. Looking only at the last digits of the primes, there are six ways to do this: 1+9 = 3+7 = 0; 3+9 = 2; 1+3 = 4; 7+9 = 6; 1+7 = 8. (Of course, 1+9 is equivalent to 9+1.) By efficiency I mean: having found a large number of sums with the desired last digit, how small is the largest prime you needed to reach? For example, using the primes 3, 19, 23, 29, 43, 59, 73, 79, 83, 89, 103, I have found ten sums with last digit 2 by the time I reach 103. Do you think there is an order of efficiency that stays constant across the six ways for these last digits? Which of 1+9, 3+7, and so on will be the most efficient, the second most efficient, ..., the least efficient?
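To make the question concrete, here is one way you could measure this empirically (a sketch only; interpreting "efficiency" as counting every pair p < q among the primes seen so far, and the search limit of 1000, are my own assumptions):

def primes_up_to(n):
    # Simple sieve of Eratosthenes.
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i in range(2, n + 1) if sieve[i]]

PATTERNS = [(1, 9), (3, 7), (3, 9), (1, 3), (7, 9), (1, 7)]   # the six ways

def prime_reached(pattern, target=10, limit=1000):
    # Smallest prime by which `target` sums with the given pair of
    # last digits have appeared, counting every pair p < q seen so far.
    a, b = pattern
    seen = {a: 0, b: 0}   # how many primes with each last digit so far
    count = 0
    for q in primes_up_to(limit):
        d = q % 10
        if d in seen:
            # q pairs with every earlier prime of the complementary digit
            count += seen[b] if d == a else seen[a]
            seen[d] += 1
            if count >= target:
                return q
    return None

for pat in sorted(PATTERNS, key=prime_reached):
    print(pat, "->", prime_reached(pat))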

Is there an efficient way to calculate the smallest N numbers from a set of numbers in hardware (HDL)?

I am trying to calculate the smallest N numbers from a set, and I've found software algorithms to do this. I'm wondering if there is an efficient way to do this in hardware (i.e. in an HDL such as SystemVerilog or Verilog)? I am specifically trying to calculate the smallest 2 numbers from a set.
I am trying to do this combinationally, optimizing with respect to area and speed (for a large set of signals), but I can only think of comparator trees to do this. Is there a more efficient way?
Thank you, any help is appreciated!
I don't think you can work around using comparator trees if you want to find the two smallest elements combinationally. However, if your goal isn't low latency, then a (possibly pipelined) sequential circuit could also be an option.
One approach that comes to mind would be to break the operation down into a kind of incomplete bubble sort in hardware, using small sorting networks. Depending on the amount of area you are willing to spend, you can use a smaller or larger p-sorting network that combinationally sorts p elements at a time, where p >= 3. You then apply this network to your input set of size N, sorting p elements at a time. The two smallest elements of each iteration are stored in some sort of memory (e.g. an SRAM, if you want to process larger amounts of elements).
Here is an example for p=3 (the brackets indicate the grouping of elements the p-sorter is applied to):
(4 0 9) (8 6 7) (4 2 1) --> (0 4 9) (6 7 8) (1 2 4) --> 0 4 6 7 1 2
Now you start the next round:
You apply the p-sorter on the results of the first round.
Again you store the two smallest outputs of your p-sorter into the same memory, overwriting values from the previous round.
Here the continuation of the example:
(0 4 6) (7 1 2) --> (0 4 6) (1 2 7) --> 0 4 1 2
In each round you reduce the number of elements to look at by a factor of 2/p. E.g. with p == 4 you discard half the elements in each round, until the smallest two elements are stored at the first two memory locations. That means O(log n) rounds, but since the rounds shrink geometrically (n + 2n/p + 4n/p^2 + ...), the total number of p-sorter applications, and hence cycles, is O(n). For an actual hardware implementation, you probably want to stick to powers of two for the size p of the sorting network.
Although the control logic of such a circuit is not trivial to implement, the area should be dominated mainly by the size of your sorting network and the memory you need to hold the first (2/p)·N intermediate results (assuming your input signals are not already stored in a memory that you can reuse for that purpose). If you want to tune your circuit towards throughput, you can increase p and pipeline the sorting network at the expense of additional area. Additional speedup could be gained by replacing the single memory with up to p two-port memories (1 read and 1 write port each), which would allow you to fetch and write back the data for the sorting network in a single cycle, thus increasing the utilization of the comparators in the sorting network.
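For reference, here is a small software model of the reduction scheme described above (a behavioral sketch in Python for checking the algorithm, not synthesizable HDL; the p = 3 input is the example from the rounds shown earlier):

def smallest_two(values, p=3):
    # Behavioral model of the iterative p-sorter reduction: each round
    # applies a p-element sorter to groups of p values and keeps the two
    # smallest of each group, until only two values remain.
    assert p >= 3 and len(values) >= 2
    while len(values) > 2:
        kept = []
        for i in range(0, len(values), p):
            group = sorted(values[i : i + p])   # stands in for the p-sorting network
            kept.extend(group[:2])              # keep the two smallest per group
        values = kept
    return sorted(values)

print(smallest_two([4, 0, 9, 8, 6, 7, 4, 2, 1], p=3))   # -> [0, 1]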

What is the maximum size of symbol data type in KDB+?

I cannot find the maximum size of the symbol data type in KDB+.
Does anyone know what it is?
If you are talking about the physical length of a symbol: symbols exist as interned strings in kdb, so the maximum string length limit would apply. As strings are just lists of characters in kdb, the maximum size of a string is the maximum length of a list. In 3.x this is 2^64 - 1; in previous versions of kdb the limit was 2,000,000,000.
However, there is a 2TB maximum serialized size limit that would likely kick in first. You can roughly work out the size of a sym by serializing it:
q)count -8!`
10
q)count -8!`a
11
q)count -8!`abc
13
So each character adds a single byte, which gives a length limit of roughly 10^12 characters.
If you mean the maximum number of symbols that can exist in memory, then the limit is 1.4 billion.

What is the shortest human-readable hash without collision?

I have a total number of W workers with long worker IDs. They work in groups, with a maximum of M members in each group.
To generate a unique group name for each worker combination, concatenating the IDs is not feasible. I am thinking of running MD5() on the flattened, sorted worker-ID list. I am not sure how many digits I should keep for it to be memorable to humans while staying safe from collisions.
Will log_36(W^M) characters (36 = 26 letters + 10 digits) be enough? How many redundant chars should I keep? Is there any other specialized hash function that works better for this scenario?
The total number of combinations of 500 objects taken up to 10 at a time is approximately 2.5091E+20, which would fit in 68 bits (about 13 characters in base36), but I don't see an easy algorithm for assigning each combination a number. An easier scheme would be this: assign each person a 9-bit number (0 to 511) and concatenate up to 10 of those numbers, giving 90 bits. To encode those in base36, you would need 18 characters.
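As an illustration of that concatenation scheme (a sketch only; the 9-bit slots, the base36 alphabet, and the leading marker bit, which keeps groups of different sizes distinct, are my own choices for the example):

import string

ALPHABET = string.digits + string.ascii_lowercase   # base36 digits: 0-9, a-z

def group_name(member_ids, bits_per_id=9):
    # Encode a group as the concatenation of its sorted 9-bit member IDs,
    # rendered in base36. Collision-free by construction: no hashing, so
    # distinct groups always get distinct names.
    n = 1   # leading marker bit, so a leading ID of 0 or a shorter group
            # can never collide with another group's encoding
    for i in sorted(member_ids):        # sort so member order doesn't matter
        assert 0 <= i < 2 ** bits_per_id
        n = (n << bits_per_id) | i      # append the next 9-bit slot
    digits = []
    while n:
        n, r = divmod(n, 36)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))

print(group_name([254, 3, 17]))   # same name regardless of member order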
If you want to use a hash with just 6 characters in base36 (about 31 bits), the probability of a collision depends on the total number of groups used during the lifetime of the application. If we assume that each day there are 10 new groups (that were not encountered before) and that the application will be used for 10 years, we get 36,500 groups. The calculator provided by Nick Barnes shows a 27% chance of a collision in this case. You can adjust the assumptions to your particular situation and then change the hash length to fit your desired maximum chance of a collision.
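You can reproduce that estimate with the standard birthday approximation (the 10-groups-a-day and 10-year figures are just the assumptions from above):

import math

def collision_probability(num_items, hash_bits):
    # Birthday approximation: p ~= 1 - exp(-n^2 / (2 * N))
    space = 2.0 ** hash_bits
    return 1.0 - math.exp(-num_items ** 2 / (2.0 * space))

bits_per_char = math.log2(36)   # ~5.17 bits per base36 character
groups = 10 * 365 * 10          # 10 new groups/day for 10 years = 36500
print(collision_probability(groups, 6 * bits_per_char))   # ~0.26, i.e. the ~27% above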

Hash a Sequence of positive/negative integers

I have a file with millions of lines (actually it's an online stream of data, meaning we receive it line by line). Each line consists of an unsorted array of integers (positive and negative); there is no limit on the values, the lengths differ, and a line may contain duplicate values.
I want to remove the duplicate lines (if two lines have the same values, regardless of order, we consider them duplicates). Is there any good hashing function for this?
We want to do this in O(n), where n is the number of lines (we can assume the maximum possible number of elements in each line is constant, e.g. at most 100 elements per line).
I've read some of the questions posted here on Stack Overflow and I also googled it, but most of them cover cases where the arrays have the same length, or the integers are positive, or they are sorted. Is there any way to solve this in the general case?
My solution:
First we sort each line using an O(n) sorting algorithm, e.g. counting sort; then we join the values into a string and use MD5 hashing to put the lines into a hash set. If a line's hash is not in the set, we add it; if it is already there, we compare the arrays that share that hash value.
Problem with this solution: sorting with counting sort takes a lot of space, as there is no limit on the numbers, and collisions are possible.
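For what it's worth, here is a sketch of that canonicalize-then-hash idea in Python (my own variation: an ordinary comparison sort per line, which is O(1) here since each line has at most ~100 elements, and Python's built-in hash in place of MD5):

def dedupe_stream(lines):
    # Yield each line whose multiset of integers has not been seen before
    # (order-insensitive; duplicate values within a line are significant).
    seen = {}   # hash -> canonical tuples with that hash, to resolve collisions
    for line in lines:
        values = tuple(sorted(int(x) for x in line.split()))   # canonical form
        bucket = seen.setdefault(hash(values), [])
        if values in bucket:
            continue            # duplicate of an earlier line
        bucket.append(values)
        yield line

print(list(dedupe_stream(["3 -1 2", "2 3 -1", "3 3 -1"])))   # ['3 -1 2', '3 3 -1']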
The problem with using a hashing algorithm on a set of data this large is that you have a high probability of two different lines hashing to the same value. You want to stay in O(n), but I am not sure that is possible given the size of the data and the accuracy needed. If you use heapsort, which is space efficient, and then traverse the sorted data removing consecutive lines that are the same, you could accomplish this in O(n log n).
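And a sketch of that sort-then-scan alternative (Python's built-in sort stands in for heapsort here; note that, unlike the streaming sketch above, this needs all lines available at once):

def dedupe_by_sorting(lines):
    # Sort lines by their canonical (sorted) integer tuple, then drop
    # adjacent entries with equal keys in a single pass: O(n log n) overall.
    keyed = sorted(
        (tuple(sorted(int(x) for x in line.split())), idx, line)
        for idx, line in enumerate(lines)   # idx keeps the first occurrence
    )
    result, previous = [], None
    for key, _, line in keyed:
        if key != previous:     # adjacent equal keys are duplicate lines
            result.append(line)
            previous = key
    return result

print(dedupe_by_sorting(["3 -1 2", "2 3 -1", "3 3 -1"]))   # ['3 -1 2', '3 3 -1']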