Go Golang : Merge Sort Stack Overflow - merge

http://play.golang.org/p/rRccL6YHtQ
I just implement the same code as in CLRS
Pseudocode from CLRS
Merge-Sort(A, p, r)
if p < r
q = [(q+r)/2]
Merge-Sort(A, p, q)
Merge-Sort(A, q+1, r)
Merge(A, p, q, r)
Merge(A, p, q, r)
n_1 = q - p + 1
n_2 = r - q
let L[1 .. n_1 + 1] and R[1 .. n_2 + 1] be new arrays
for i = 1 to n_1
L[i] = A[p+i-1]
for j = 1 to n_2
R[j] = A[q+j]
L[n_1 + 1] = INFINITE
R[n_2 + 1] = INFINITE
i = 1
j = 1
for k = p to r
if L[i] <= R[j]
A[k] = L[i]
i = i + 1
else A[k] = R[j]
j = j + 1
But I am getting the stack overflow in the merge sort.
[9 -13 4 -2 3 1 -10 21 12]
runtime: goroutine stack exceeds 250000000-byte limit
fatal error: stack overflow
runtime stack:
runtime.throw(0x1b4980, 0x20280)
How do I make this work?
func MergeSort(slice []int, first, last int) {
if len(slice) < 2 {
return
}
if first < last {
mid := len(slice) / 2
MergeSort(slice, first, mid)
MergeSort(slice, mid+1, last)
Merge(slice, first, mid, last)
}
}
thanks a lot!

mid := len(slice) / 2
That's not where the middle should go. The middle is supposed to be halfway between first and last, which define the region of the slice you're sorting, not halfway through the slice. Alternatively, you can slice the slice to make new slices and drop first and last.

You don't seem to be re-slicing the slice you're merge-sorting, so there is no termination of the recursion.
You could either re-slice the slice before recursion, or you can do something like length := last - first (this may have an off-by-one, depending if "last" is the last index or one past the last index) and then go from there.

Related

Hash function that returns the same hash for a sum even if different terms lead to the same sum

let's say I have:
n = 14
n is the result of the following sums of integers:
[5, 2, 7] -> 5 + 2 + 7 = 14 = n
[3, 4, 5, 2] -> 3 + 4 + 5 + 2 = 14 = n
[1, 13] -> 1 + 13 = 14 = n
[13, 1] -> 13 + 1 = 14 = n
[4, 3, 5, 2] -> 4 + 3 + 5 + 2 = 14 = n
...
I would need a hash function h so that:
h([5, 2, 7]) = h([3, 4, 5, 2]) = h([1, 13]) = h([13, 1]) = h([4, 3, 5, 2]) = h(...)
I.e. it doesn't matter the order of the integer terms and as long as their integer sum is the same, their hash should also the same.
I need to do this without computing the sum n, because the terms as well as n can be very high and easily overflow (they don't fit the bits of an int), that's why I am asking this question.
Are you aware or maybe do you have an insight on how I can implement such a hash function?
Given a list/sequence of integers, this hash function must return the same hash if the sum of the integers would be the same, but without computing the sum.
Thank you for your attention.
EDIT: I elaborated on #derpirscher's answer and modified his function a bit further as I had collisions on multiples of BIG_PRIME (this example is in JavaScript):
function hash(seq) {
const BIG_PRIME = 999999999989;
const MAX_SAFE_INTEGER_DIV_2_FLOOR = Math.floor(Number.MAX_SAFE_INTEGER / 2);
let h = 0;
for (i = 0; i < seq.length; i++) {
let value = seq[i];
if (h > MAX_SAFE_INTEGER_DIV_2_FLOOR) {
h = h % BIG_PRIME;
}
if (value > MAX_SAFE_INTEGER_DIV_2_FLOOR) {
value = value % BIG_PRIME;
}
h += value;
}
return h;
}
My question now would be: what do you think about this function? Are there some edge cases I didn't take into account?
Thank you.
EDIT 2:
Using the above function hash([1,2]); and hash([4504 * BIG_PRIME +1, 4504 * BIG_PRIME + 2]) will collide as mentioned by #derpirscher.
Here is another modified of version of the above function, which computes the modulo % BIG_PRIME only to one of the two terms if either of the two are greater than MAX_SAFE_INTEGER_DIV_2_FLOOR:
function hash(seq) {
const BIG_PRIME = 999999999989;
const MAX_SAFE_INTEGER_DIV_2_FLOOR = Math.floor(Number.MAX_SAFE_INTEGER / 2);
let h = 0;
for (let i = 0; i < seq.length; i++) {
let value = seq[i];
if (
h > MAX_SAFE_INTEGER_DIV_2_FLOOR &&
value > MAX_SAFE_INTEGER_DIV_2_FLOOR
) {
if (h > MAX_SAFE_INTEGER_DIV_2_FLOOR) {
h = h % BIG_PRIME;
} else if (value > MAX_SAFE_INTEGER_DIV_2_FLOOR) {
value = value % BIG_PRIME;
}
}
h += value;
}
return h;
}
I think this version lowers the number of collisions a bit further.
What do you think? Thank you.
EDIT 3:
Even though I tried to elaborate on #derpirscher's answer, his implementation of hash is the correct one and the one to use.
Use his version if you need such an hash function.
You could calculate the sum modulo some big prime. If you want to stay within the range of int, you need to know what the maximum integer is, in the language you are using. Then select a BIG_PRIME that's just below maxint / 2
Assuming an int to be 4 bytes, maxint = 2147483647 thus the biggest prime < maxint/2 would be 1073741789;
int hash(int[] seq) {
BIG_PRIME = 1073741789;
int h = 0;
for (int i = 0; i < seq.Length; i++) {
h = (h + seq[i] % BIG_PRIME) % BIG_PRIME;
}
return h;
}
As at every step both summands will always be below maxint/2 you won't get any overflows.
Edit
From a mathematical point of view, the following property which may be important for your use case holds:
(a + b + c + ...) % N == (a % N + b % N + c % N + ...) % N
But yeah, of course, as in every hash function you will have collisions. You can't have a hash function without collisions, because the size of the domain of the hash function (ie the number of possible input values) is generally much bigger than the the size of the codomain (ie the number of possible output values).
For your example the size of the domain is (in principle) infinite, as you can have any count of numbers from 1 to 2000000000 in your sequence. But your codomain is just ~2000000000 elements (ie the range of int)

How to determine the result of a code without running it

If I am given the code below, how can I tell what the resulting y-value would be. I apologize if this is a simple question but I find these types of questions very difficult.
For foo(-1,10)
function y = foo(x, a)
for k=-1:0
b=x-k;
while (x > -2) && (x < 2)
x=x+a+1;
end
end
y = b + x;
end
When running the programme I can see that b=10 but I don't understand how you get that. I would appreciate if someone could make this clearer for me.
Thank you!
Start from the top:
foo(x, a) has two parameters: x and a
foo(-1, 10) would mean that x = -1 and a = 10.
From there go down each line.
b = x - k would start out as b = -1 + (the value of k on that current iteration of the loop
Then you would do the same for the while loop.
x = -1 + 10 + 1
So,
x = 10
Now take that value and plug it into the while loop condition:
(10 > -2) and (10 < 2)
Is this condition true? No. So you move on to the next iteration of the for loop
At the end you set y equal to the value you got for b + the value you got for x

ind2sub for nonzero elements of triangular matrix

I just wanted to simply find the index of (row, col) that is a minimum point of a matrix A. I can use
[minval, imin] = min( A(:) )
and MATLAB built in function
[irow, icol] = ind2sub(imin);
But for efficiency reason, where matrix A is trigonal, i wanted to implement the following function
function [i1, i2] = myind2ind(ii, N);
k = 1;
for i = 1:N
for j = i+1:N
I(k, 1) = i; I(k, 2) = j;
k = k + 1;
end
end
i1 = I(ii, 1);
i2 = I(ii, 2);
this function returns 8 and 31 for the following input
[irow, icol] = myind2ind(212, 31); % irow=8, icol = 31
How can I implement myind2ind function more efficient way without using the internal "I"?
The I matrix can be generated by nchoosek.
For example if N = 5 we have:
N =5
I= nchoosek(1:N,2)
ans =
1 2
1 3
1 4
1 5
2 3
2 4
2 5
3 4
3 5
4 5
so that
4 repeated 1 times
3 repeated 2 times
2 repeated 3 times
1 repeated 4 times
We can get the number of rows of I with the Gauss formula for triangular number
(N-1) * (N-1+1) /2 =
N * (N -1) / 2 =
10
Given jj = size(I,1) + 1 - ii as a row index I that begins from the end of I and using N * (N -1) / 2 we can formulate a quadratic equation:
N * (N -1) / 2 = jj
(N^2 -N)/2 =jj
So
N^2 -N - 2*jj = 0
Its root is:
r = (1+sqrt(8*jj))/2
r can be rounded and subtracted from N to get the first element (row number of triangular matrix) of the desired output.
R = N + 1 -floor(r);
For the column number we find the index of the first element idx_first of the current row R:
idx_first=(floor(r+1) .* floor(r)) /2;
The column number can be found by subtracting current linear index from the linear index of the first element of the current row and adding R to it.
Here is the implemented function:
function [R , C] = myind2ind(ii, N)
jj = N * (N - 1) / 2 + 1 - ii;
r = (1 + sqrt(8 * jj)) / 2;
R = N -floor(r);
idx_first = (floor(r + 1) .* floor(r)) / 2;
C = idx_first-jj + R + 1;
end

Counting Number of Specific Outputs of a Function

If I have a matrix and I want to apply a function to each row of the matrix. This function has three possible outputs, either x = 0, x = 1, or x > 0. There's a couple things I'm running into trouble with...
1) The cases that output x = 1 or x > 0 are different and I'm not sure how to differentiate between the two when writing my script.
2) My function isn't counting correctly? I think this might be a problem with how I have my loop set up?
This is what I've come up with. Logically, I feel like this should work (except for the hiccup w/ the first problem I've stated)
[m n] = size(matrix);
a = 0; b = 0; c = 0;
for i = 1 : m
x(i) = function(matrix(m,:));
if x > 0
a = a + 1;
end
if x == 0
b = b + 1;
end
if x == 1
c = c + 1;
end
end
First you probably have an error in line 4. It probably should be i instead of m.
x(i) = function(matrix(i,:));
You can calculate a, b and c out of the loop:
a = sum(x>0);
b = sum(x==0);
c = sum(x==1);
If you want to distinguish x==1 and x>0 then may be with sum(xor(x==1,x>0)).
Also you may have problem with precision error when comparing double values with 0 and 1.

Matlab how to find out if a matrix is in another martrix

Let me describe my problem with an example. Suppose we have matrix A:
A =
1 0 1
1 1 1
0 1 1
and matrix B:
B =
1 1
1 1
How do i write function C = func(A, B) to check if B exists in A or not?
If it exists in A, the function returns C = [0 0 0; 0 1 1; 0 1 1], and if it does not, the function returns C = [0 0 0; 0 0 0; 0 0 0];.
Edit:
It should be mentioned that if A is m-by-n, and B is p-by-q, then m > p and p > q always.
Thanks in advance.
The most efficient requires the signal processing toolbox. Then you can simply use xcorr2(). Following your example, the following should work:
C = xcorr2 (A, B);
[Row, Col] = find (C == max (C(:)));
%% these are not the coordinates for the center of the best match, you will
%% have to find where those really are
%% The best way to understand this is to strip the "padding"
row_shift = (size (B, 1) - 1)/2;
col_shift = (size (B, 2) - 1)/2;
C = C(1+row_shift:end-row_shift, 1+col_shift:end-col_shift)
[Row, Col] = find (C == max (C(:)));
if (B == A(Row-row_shift:Row+row_shift, Col-col_shift:Col+col_shift))
disp ("B shows up in A");
endif
The code above looks a bit convoluted because I'm trying to cover inputs of any size (still only odd sized) but should work.
If you don't have this toolbox, I think you can use the Octave code for it which should require only small adjustments. Basically, the only three lines that matter follow (note that the code is under GPL):
[ma,na] = size(a);
[mb,nb] = size(b);
c = conv2 (a, conj (b (mb:-1:1, nb:-1:1)));
In Octave, if you are using at least the signal package 1.2.0, xcorr2 can also take an extra option "coeff" which calculates the normalized cross correlation. This will have a value of 1 when the match is perfect, so you can simplify this with:
C = xcorr2 (A, B, "coeff");
if (any (C(:) == 1)
display ("B shows up in A");
endif
I would do a loop to check resized sub-matrix of A. My code stops after B is found once but it could be updated to count the number of found occurrences.
A = [1 0 1; 1 1 1; 0 1 1; 0 0 0];
B = [1 1; 1 1];
[m n] = size(A);
[p q] = size(B);
found = 0;
x = 0;
while ~found && (x+p-1 < m)
y = 0;
x = x + 1;
while ~found && (y+q-1 < n)
y = y + 1;
A(x:x+p-1,y:y+q-1)
found = isequal(A(x:x+p-1,y:y+q-1),B);
end
end
fprintf('Found Matrix B in A(%d:%d, %d:%d)\n',x,x+p-1,y,y+q-1);
Maybe there is some vectored way, but this is a straight forward function which just slides across A and checks for B:
function ret=searchmat(A,B)
ret=zeros(size(A));
for k=0:size(A,1)-size(B,1)
for l=0:size(A,2)-size(B,2)
if isequal(A(1+k:size(B,1)+k,1+l:size(B,2)+l),B)
ret(1+k:size(B,1)+k,1+l:size(B,2)+l)=1; % or B as you wish
end
end
end
end
If B is discovered more than once, it's also marked with ones. If you want to quit after the first match, you can just put in a return statement in the if clause.