The call stack size of quicksort

I read this answer and found an implementation of Quicksort here. It's still unclear to me why Quicksort requires O(log n) extra space.
I understand what a call stack is. I applied the implementation stated above to an array of random numbers and saw n - 1 calls of quickSort.
import java.util.Arrays;
import java.util.Random;

public static void main(String[] args) {
    Random random = new Random();
    int num = 8;
    int[] array = new int[num];
    for (int i = 0; i < num; i++) {
        array[i] = random.nextInt(100);
    }
    System.out.println(Arrays.toString(array));
    quickSort(array, 0, array.length - 1);
    System.out.println(Arrays.toString(array));
}

static int partition(int[] arr, int left, int right) {
    int i = left, j = right;
    int tmp;
    int pivot = arr[(left + right) / 2]; // middle element as pivot
    while (i <= j) {
        while (arr[i] < pivot)
            i++;
        while (arr[j] > pivot)
            j--;
        if (i <= j) {
            tmp = arr[i];
            arr[i] = arr[j];
            arr[j] = tmp;
            i++;
            j--;
        }
    }
    return i;
}

static void quickSort(int[] arr, int left, int right) {
    System.out.println("quickSort. left = " + left + " right = " + right);
    int index = partition(arr, left, right);
    if (left < index - 1)
        quickSort(arr, left, index - 1);
    if (index < right)
        quickSort(arr, index, right);
}
The output I saw:
[83, 65, 68, 91, 43, 45, 58, 82]
quickSort. left = 0 right = 7
quickSort. left = 0 right = 6
quickSort. left = 0 right = 4
quickSort. left = 0 right = 3
quickSort. left = 0 right = 2
quickSort. left = 0 right = 1
quickSort. left = 5 right = 6
[43, 45, 58, 65, 68, 82, 83, 91]
That makes 7 (n - 1) calls. So why does quickSort require O(log n) space for its call stack if the number of calls depends on n, not log n?
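One way to see the gap between the two numbers is to track how deep the recursion gets, as opposed to how many calls happen in total. Here is the same quickSort with an extra depth parameter (my addition, purely for instrumentation; call it initially with depth = 0):

static void quickSort(int[] arr, int left, int right, int depth) {
    System.out.println("depth = " + depth + " left = " + left + " right = " + right);
    int index = partition(arr, left, right);
    if (left < index - 1)
        quickSort(arr, left, index - 1, depth + 1);
    if (index < right)
        quickSort(arr, index, right, depth + 1);
}

The extra space corresponds to the largest depth printed (the longest chain of calls alive at once), not to the total number of calls.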

I think I understand why the stack size of Quicksort is O(n) in the worst case.
One part of the array to be sorted (say, the left) consists of a single element, and the other part (the right) consists of n - 1 elements. The size of the left part is always 1, and the size of the right part decreases by 1 on every call.
Thus, we call Quicksort once initially and then n - 1 more times for the right part recursively, and those calls nest, so the extra space for the call stack is O(n). And since the partitioning procedure takes O(n) time for every recursive call, the time complexity is O(n^2).
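(Concretely, the partitions operate on subarrays of sizes n, n - 1, ..., 2, so the total work is n + (n - 1) + ... + 2 = n(n + 1)/2 - 1, which is O(n^2).)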
As for the average case, I don't yet know how to prove O(n * log n) for the time complexity and O(log n) for the extra space. But I know that if the input array is divided into two almost equal parts each time, Quicksort is called recursively about log n times for the left part, while the right part is sorted using tail recursion, which doesn't add to the call stack.
https://en.wikipedia.org/wiki/Quicksort
So the extra space needed for Quicksort is O(log n) in this case (constant factors are dropped in O-notation anyway).
Since the partitioning work across each level of recursion totals O(n), and there are about log n levels, the time complexity is O(n * log n).
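To make the tail-recursion idea concrete, here is a sketch in Java (my own adaptation of the implementation above, reusing its partition method): it recurses only into the smaller part and loops over the larger one, so each recursive call covers at most half the current range and the stack depth stays O(log n):

static void quickSortLog(int[] arr, int left, int right) {
    while (left < right) {
        int index = partition(arr, left, right);
        if (index - left <= right - index + 1) {
            quickSortLog(arr, left, index - 1); // left part is the smaller one
            left = index;                       // loop on the right part
        } else {
            quickSortLog(arr, index, right);    // right part is the smaller one
            right = index - 1;                  // loop on the left part
        }
    }
}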
Please correct me if my assumptions are wrong. I'm ready to read and accept your answer.

Related

Minimum cost solution to connect all elements in set A to at least one element in set B

I need to find the shortest set of paths to connect each element of Set A with at least one element of Set B. Repetitions in A OR B are allowed (but not both), and no element can be left unconnected. Something like this:
I'm representing the elements as integers, so the "cost" of a connection is just the absolute value of the difference. I also have a cost for crossing paths, so if Set A = [60, 64] and Set B = [63, 67], then (60 -> 67) incurs an additional cost. There can be any number of elements in either set.
I've calculated the table of transitions and costs (distances and crossings), but I can't find the algorithm to find the lowest-cost solution. I keep ending up with either too many connections (i.e., repetitions in both A and B) or greedy solutions that omit elements (e.g., when A and B are non-overlapping). I haven't been able to find examples of precisely this kind of problem online, so I hoped someone here might be able to help, or at least point me in the right direction. I'm not a graph theorist (obviously!), and I'm writing in Swift, so code examples in Swift (or pseudocode) would be much appreciated.
UPDATE: The solution offered by @Daniel is almost working, but it does occasionally add unnecessary duplicates. I think this may be something to do with the sorting of the priorityQueue -- the duplicates always involve identical elements with identical costs. My first thought was to add some kind of "positional encoding" (yes, Transformer-speak) to the costs, so that the costs are offset by their positions (though of course, this doesn't guarantee unique costs). I thought I'd post my Swift version here, in case anyone has any ideas:
public static func voiceLeading(from chA: [Int], to chB: [Int]) -> Set<[Int]> {
    var result: Set<[Int]> = Set()
    let im = intervalMatrix(chA, chB: chB)
    if im.count == 0 { return [[0]] }
    let vc = voiceCrossingCostsMatrix(chA, chB: chB, cost: 4)
    // NOTE: cm contains the weights
    let cm = VectorUtils.absoluteAddMatrix(im, toMatrix: vc)
    var A_links: [Int: Int] = [:]
    var B_links: [Int: Int] = [:]
    var priorityQueue: [Entry] = []
    for (i, a) in chA.enumerated() {
        for (j, b) in chB.enumerated() {
            priorityQueue.append(Entry(a: a, b: b, cost: cm[i][j]))
            if A_links[a] != nil {
                A_links[a]! += 1
            } else {
                A_links[a] = 1
            }
            if B_links[b] != nil {
                B_links[b]! += 1
            } else {
                B_links[b] = 1
            }
        }
    }
    priorityQueue.sort { $0.cost > $1.cost }
    while priorityQueue.count > 0 {
        let entry = priorityQueue[0]
        if A_links[entry.a]! > 1 && B_links[entry.b]! > 1 {
            A_links[entry.a]! -= 1
            B_links[entry.b]! -= 1
        } else {
            result.insert([entry.a, (entry.b - entry.a)])
        }
        priorityQueue.remove(at: 0)
    }
    return result
}
Of course, since the duplicates have identical scores, it shouldn't be a problem to just remove the extras, but it feels a bit hackish...
UPDATE 2: Slightly less hackish (but still a bit!); since the requirement is that my result should have equal cardinality to max(|A|, |B|), I can actually just stop adding entries to my result when I've reached the target cardinality. Seems okay...
UPDATE 3: Resurrecting this old question, I've recently had some problems arise from the fact that the above algorithm doesn't fulfill my requirement |S| == max(|A|, |B|) (where S is the set of pairings). If anyone knows of a simple way of ensuring this it would be much appreciated. (I'll obviously be poking away at possible changes.)
This is an easy task:
Add all edges of the graph to a priority_queue, where the highest priority belongs to the edge with the biggest weight.
Look at each edge e = (u, v, w) in the priority_queue, where u is in A, v is in B and w is the weight.
If removing e from the graph doesn't leave u or v isolated, remove it.
Otherwise, e is part of the answer.
This should be enough for your case:
#include <bits/stdc++.h>
using namespace std;

struct edge {
    int u, v, w;
    edge() {}
    edge(int up, int vp, int wp) { u = up; v = vp; w = wp; }
    void print() { cout << "(" << u << ", " << v << ")" << endl; }
    bool operator<(const edge& rhs) const { return w < rhs.w; }
};

vector<edge> E; // edge set
priority_queue<edge> pq;
vector<edge> ans;
int grade[5] = {3, 3, 2, 2, 2}; // initial degree of each vertex (0-1 in A, 2-4 in B)

int main() {
    E.push_back(edge(0, 2, 1)); E.push_back(edge(0, 3, 1)); E.push_back(edge(0, 4, 4));
    E.push_back(edge(1, 2, 5)); E.push_back(edge(1, 3, 2)); E.push_back(edge(1, 4, 0));
    for (int i = 0; i < E.size(); i++) pq.push(E[i]);
    while (!pq.empty()) {
        edge e = pq.top();
        if (grade[e.u] > 1 && grade[e.v] > 1) {
            grade[e.u]--; grade[e.v]--;
        }
        else ans.push_back(e);
        pq.pop();
    }
    for (int i = 0; i < ans.size(); i++) ans[i].print();
    return 0;
}
Complexity: O(E lg(E)).
I think this problem is "minimum weighted bipartite matching" (although searching for "maximum weighted bipartite matching" would also be relevant; it's just the opposite objective).

Quicksort - Space complexity - Why is it O(logN) not O(N)?

So in quicksort the space complexity is said to be O(log N), but here is what I've been thinking: since the log N arises from the recursive calls on the stack, couldn't one always choose the worst pivot, leading to O(N) nested calls rather than O(log N)? Shouldn't the space complexity be O(N) then?
This Java example limits stack space to O(log(n)) by using recursion only for the smaller part, then looping back to handle the larger part. Worst-case time complexity is still O(n^2).
public static void qsort(long[] a, int lo, int hi)
{
    while (lo < hi) {
        int md = lo + (hi - lo) / 2;    // middle element as pivot
        int ll = lo - 1;
        int hh = hi + 1;
        long p = a[md];
        long t;
        while (true) {                  // Hoare-style partition
            while (a[++ll] < p);
            while (a[--hh] > p);
            if (ll >= hh)
                break;
            t = a[ll];
            a[ll] = a[hh];
            a[hh] = t;
        }
        ll = hh++;
        if ((ll - lo) <= (hi - hh)) {   // left part is the smaller one
            qsort(a, lo, ll);           // recurse into the smaller part
            lo = hh;                    // loop on the larger part
        } else {                        // right part is the smaller one
            qsort(a, hh, hi);           // recurse into the smaller part
            hi = ll;                    // loop on the larger part
        }
    }
}
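For a quick sanity check, a minimal driver (assuming qsort sits in the same class; the array contents are my own example):

public static void main(String[] args) {
    long[] a = {9, 1, 8, 2, 7, 3};
    qsort(a, 0, a.length - 1);
    System.out.println(java.util.Arrays.toString(a)); // prints [1, 2, 3, 7, 8, 9]
}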

merge sort performance compared to insertion sort

For any array of length greater than 10, is it safe to say that merge sort performs fewer comparisons among the array's elements than insertion sort does on the same array, because the best case for the run time of merge sort is O(N log N) while for insertion sort it's O(N)?
My take on this. First off, you are talking about comparisons, but there are swaps as well that matter.
In insertion sort, in the worst case (an array sorted in the opposite direction) you have to do n^2 - n comparisons and swaps (11^2 - 11 = 121 - 11 = 110 for 11 elements, for example). But if the array is even partially sorted in the needed order (that is, many elements already sit at their correct positions or close to them), the number of swaps and comparisons drops significantly: the right position for each element is found quickly, so far fewer actions are needed than for an array sorted in the opposite order. As you can see for arr2 below, which is almost sorted, the number of actions becomes linear in the input size - just 6.
var arr1 = [11,10,9,8,7,6,5,4,3,2,1];
var arr2 = [1,2,3,4,5,6,7,8,11,10,9];

function InsertionSort(arr) {
    var compNum = 0, swapNum = 0;
    for (var i = 1; i < arr.length; i++) {
        var temp = arr[i], j = i - 1;
        while (j >= 0) {
            if (temp < arr[j]) { arr[j + 1] = arr[j]; swapNum++; } else break;
            j--;
            compNum++;
        }
        arr[j + 1] = temp;
    }
    console.log(arr, "Number of comparisons: " + compNum, "Number of swaps: " + swapNum);
}

InsertionSort(arr1); // worst case, 11^2 - 11 = 110 actions
InsertionSort(arr2); // almost sorted array, few actions
In merge sort we always do approximately n*log n actions - the properties of the input array don't matter. So, as you can see, in both cases we will get both of our arrays sorted in 39 actions:
var arr1 = [11,10,9,8,7,6,5,4,3,2,1];
var arr2 = [1,2,3,4,5,6,7,8,11,10,9];
var actions = 0;

function mergesort(arr, left, right) {
    if (left >= right) return;
    var middle = Math.floor((left + right) / 2);
    mergesort(arr, left, middle);
    mergesort(arr, middle + 1, right);
    merge(arr, left, middle, right);
}

function merge(arr, left, middle, right) {
    var l = middle - left + 1, r = right - middle, temp_l = [], temp_r = [];
    for (var i = 0; i < l; i++) temp_l[i] = arr[left + i];
    for (var i = 0; i < r; i++) temp_r[i] = arr[middle + i + 1];
    var i = 0, j = 0, k = left;
    while (i < l && j < r) {
        if (temp_l[i] <= temp_r[j]) {
            arr[k] = temp_l[i]; i++;
        } else {
            arr[k] = temp_r[j]; j++;
        }
        k++; actions++;
    }
    while (i < l) { arr[k] = temp_l[i]; i++; k++; actions++; }
    while (j < r) { arr[k] = temp_r[j]; j++; k++; actions++; }
}

mergesort(arr1, 0, arr1.length - 1);
console.log(arr1, "Number of actions: " + actions); // 11*log11 = 39 (approx.)
actions = 0;
mergesort(arr2, 0, arr2.length - 1);
console.log(arr2, "Number of actions: " + actions); // 11*log11 = 39 (approx.)
So, answering your question:
For any array of length greater than 10, is it safe to say that merge sort performs fewer comparisons among the array's elements than does insertion sort on the same array
I would say that no, it isn't safe to say so. Merge sort can perform more actions than insertion sort in some cases. The size of the array isn't what matters here; what matters in this particular comparison of insertion sort vs. merge sort is how far your array is from the sorted state. I hope it helps :)
BTW, merge sort and insertion sort have been united in a hybrid stable sorting algorithm called Timsort to get the best from both of them. Check it out if interested.
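To get a flavor of how such a hybrid works, here is a minimal sketch in Java (not real Timsort, which additionally detects natural runs and uses galloping merges; the cutoff of 16 is my own choice): subarrays at or below the cutoff are handled by insertion sort, larger ones by merge sort.

static final int CUTOFF = 16; // my choice; real Timsort derives a min run length of 32-64

static void hybridSort(int[] arr, int left, int right) {
    if (right - left + 1 <= CUTOFF) {
        // small range: insertion sort, cheap on tiny or partially sorted input
        for (int i = left + 1; i <= right; i++) {
            int temp = arr[i], j = i - 1;
            while (j >= left && arr[j] > temp) {
                arr[j + 1] = arr[j];
                j--;
            }
            arr[j + 1] = temp;
        }
        return;
    }
    int middle = left + (right - left) / 2;
    hybridSort(arr, left, middle);
    hybridSort(arr, middle + 1, right);
    // standard merge of arr[left..middle] and arr[middle+1..right]
    int[] tmp = new int[right - left + 1];
    int i = left, j = middle + 1, k = 0;
    while (i <= middle && j <= right)
        tmp[k++] = (arr[i] <= arr[j]) ? arr[i++] : arr[j++];
    while (i <= middle) tmp[k++] = arr[i++];
    while (j <= right) tmp[k++] = arr[j++];
    System.arraycopy(tmp, 0, arr, left, tmp.length);
}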

Why does this quicksort function not work?

void swap(Person* a, int i, int j) {
    Person b;
    b = a[i];
    a[i] = a[j];
    a[j] = b;
}

void quicksort(Person* a, int left, int right, PersonComparator cmp) {
    if (left >= right) return;         // 0 or 1 elements, recursion end
    swap(a, left, (left + right) / 2); // move pivot element to left
    int j = left;
    for (int i = left + 1; i <= right; i++) {
        if (i < left) {
            swap(a, ++j, i);
        }
        // assert: v[i] < v[left] for i = left+1..j
    }
    swap(a, left, j); // move back pivot element
    quicksort(a, left, j-1, cmp);  // assert: v[i] < v[j] for i = left..j-1
    quicksort(a, j+1, right, cmp); // assert: v[i] >= v[j] for i = j+1..right
}
I somehow have to get this "cmp" in there but I don't know where and how. Person* is a pointer to the struct Person btw.
You need to learn to use a debugger. Without that, you are lost. Run your code with a debugger and check where the code does something that you don't expect.
I suppose these lines:
for (int i = left + 1; i <= right; i++) {
if (i < left) {
won't do what you expect. It looks more like a question of "why would you think this might ever work", and not "why doesn't it work". Especially since you don't seem to be using the comparator at all.
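For illustration, here is the same partition scheme with the comparator actually used, sketched in Java since the question doesn't show the PersonComparator signature (java.util.Comparator<Person> stands in for it, and the Person stub is mine). The key change is that the condition compares the current element against the pivot element instead of comparing two loop indices:

import java.util.Comparator;

class Person { String name; /* minimal stand-in for the question's struct */ }

static void swap(Person[] a, int i, int j) {
    Person b = a[i];
    a[i] = a[j];
    a[j] = b;
}

static void quicksort(Person[] a, int left, int right, Comparator<Person> cmp) {
    if (left >= right) return;          // 0 or 1 elements, recursion end
    swap(a, left, (left + right) / 2);  // move pivot element to left
    int j = left;
    for (int i = left + 1; i <= right; i++) {
        // was: if (i < left) -- an index comparison that is never true here;
        // the comparator must compare the elements instead
        if (cmp.compare(a[i], a[left]) < 0) {
            swap(a, ++j, i);
        }
    }
    swap(a, left, j);                   // move back pivot element
    quicksort(a, left, j - 1, cmp);
    quicksort(a, j + 1, right, cmp);
}

In the C original, the equivalent test would invoke cmp on the two elements rather than on the indices i and left.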

Merging two sorted lists, one with additional 0s

Consider the following problem:
We are given two arrays A and B such that A and B are sorted, except that A has B.length additional 0s appended to its end. For instance, A and B could be the following:
A = [2, 4, 6, 7, 0, 0, 0]
B = [1, 7, 9]
Our goal is to create one sorted list by inserting each entry of B into A in place. For instance, running the algorithm on the above example would leave
A = [1, 2, 4, 6, 7, 7, 9]
Is there a clever way to do this in better than O(n^2) time? The only way I could think of is to insert each element of B into A by scanning linearly and performing the appropriate number of shifts, but this leads to the O(n^2) solution.
Some pseudo-code (sorta C-ish), assuming array indexing is 0-based:
pA = A + len(A) - 1;   // last slot of A (one of the zeros)
pC = pA;               // write pointer: fill A from the back
while (!*pA) --pA;     // find the last non-zero entry in A
pB = B + len(B) - 1;   // last element of B
while (pA >= A && pB >= B) {
    if (*pA > *pB) {
        *pC = *pA; --pA;
    } else {
        *pC = *pB; --pB;
    }
    --pC;
}
while (pB >= B) {      // still some bits in B to copy over
    *pC = *pB; --pB; --pC;
}
Not really tested, and just written off the top of my head, but it should give you the idea... May not have the termination and boundary conditions exactly right.
You can do it in O(n).
Work from the end, moving the largest element towards the end of A. This way you avoid a lot of the trouble of deciding where to keep the elements while iterating. It's pretty easy to implement:
int indexA = A.Length - B.Length - 1; // last real (non-zero) element of A
int indexB = B.Length - 1;
int insertAt = A.Length;
while (indexB >= 0)
{
    insertAt--;
    if (indexA >= 0 && A[indexA] > B[indexB])
    {
        A[insertAt] = A[indexA];
        indexA--;
    }
    else
    {
        A[insertAt] = B[indexB];
        indexB--;
    }
}
// once B is exhausted, the remaining prefix of A is already in place
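For a sanity check of the approach, here is a self-contained Java version run against the arrays from the question (the method and variable names are my own):

public static void mergeInPlace(int[] A, int[] B) {
    int indexA = A.length - B.length - 1; // last real element of A
    int indexB = B.length - 1;
    int insertAt = A.length - 1;
    while (indexB >= 0) {
        if (indexA >= 0 && A[indexA] > B[indexB]) {
            A[insertAt--] = A[indexA--];
        } else {
            A[insertAt--] = B[indexB--];
        }
    }
}

public static void main(String[] args) {
    int[] A = {2, 4, 6, 7, 0, 0, 0};
    int[] B = {1, 7, 9};
    mergeInPlace(A, B);
    System.out.println(java.util.Arrays.toString(A)); // [1, 2, 4, 6, 7, 7, 9]
}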