Is an algorithm with asymptotic runtime complexity of Θ(n) always faster than a similar algorithm with runtime complexity of Θ(n^2)? - quicksort

If so, can you provide explicit examples? I understand that an algorithm like Quicksort can have O(n log n) expected running time, but O(n^2) in the worst case. I presume that if the same principle of expected/worst case applies to theta, then the answer to the above question could be no. Understanding how theta works will help me understand the relationship between theta and big-O.

When $n$ is large enough, the algorithm with complexity $\Theta(n)$ will run faster than the algorithm with complexity $\Theta(n^2)$. In fact, if $f(n) = \Theta(n)$ and $g(n) = \Theta(n^2)$, then $f(n)/g(n) \to 0$ as $n \to \infty$. However, for small values of $n$ the $\Theta(n)$ algorithm may well be slower, depending on the constant factors involved.
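To make this concrete, here is a small Python sketch with made-up cost models (the constants 100 and 1 are purely illustrative): a linear-time algorithm with a large constant factor does more work than a quadratic one with a small constant until n passes the crossover point, after which the linear one wins for every larger n.

```python
# Hypothetical cost models: f is Theta(n) with a large constant,
# g is Theta(n^2) with a small constant.
def f(n):
    return 100 * n      # e.g. a linear scan with expensive per-item work

def g(n):
    return n * n        # e.g. a tight double loop

for n in (10, 50, 100, 200, 1000):
    faster = "f (linear)" if f(n) < g(n) else "g (quadratic)"
    print(f"n={n:5d}  f(n)={f(n):8d}  g(n)={g(n):8d}  faster: {faster}")
# For n < 100 the quadratic algorithm does less work;
# for n > 100 the linear one wins, and the gap only grows.
```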

It's not always faster, only asymptotically faster (as n grows without bound). But beyond some n, yes, it is always faster.
For example, for small n a bubble sort may run faster than quicksort simply because it is simpler (the constants hidden in its Θ are lower).
This has nothing to do with expected/worst cases: choosing which case to analyze is a separate question, unrelated to theta or big-O.
And about the relationship between theta and big-O: in computer science, big-O is often (mis)used in the sense of Θ, but strictly speaking big-O is a wider class than Θ: it bounds a growing function only from above, while theta bounds it from both sides. E.g. when somebody says that Quicksort has a complexity of O(n log n), they usually mean Θ(n log n).
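To see the constant-factor effect in practice, here is a rough timing sketch with toy implementations of bubble sort and quicksort (hypothetical code, not library routines; exact numbers depend on the machine and interpreter):

```python
import random, timeit

def bubble_sort(a):
    a = list(a)
    n = len(a)
    for i in range(n):
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

def quick_sort(a):
    if len(a) <= 1:
        return list(a)
    pivot = a[len(a) // 2]
    return (quick_sort([x for x in a if x < pivot])
            + [x for x in a if x == pivot]
            + quick_sort([x for x in a if x > pivot]))

for n in (8, 64, 512):
    data = [random.random() for _ in range(n)]
    tb = timeit.timeit(lambda: bubble_sort(data), number=50)
    tq = timeit.timeit(lambda: quick_sort(data), number=50)
    print(f"n={n:5d}  bubble={tb:.4f}s  quick={tq:.4f}s")
# For tiny inputs the simpler bubble sort can be competitive or even faster;
# the asymptotic advantage of quicksort only shows up once n is large enough.
```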

You are on the right track.
The actual runtime of a program can be quite different from its asymptotic bounds. This is a fundamental consequence of the way asymptotic notation is defined.
You can read my answer here for clarification.

Related

Why is l2 regularization always an addition?

I am reading about L2 regularization of neural network weights. As far as I understand, the intention is that weights get pushed towards zero, and more so the larger they are, i.e. large weights receive a high penalty while smaller ones are punished less severely.
The formula is usually:
new_weight = weight * update + lambda * sum(squared(weights))
My question: why is this term always positive? If the weight is already positive, the L2 term will never decrease it; it makes things worse and pushes the weight further away from zero. This is the case in almost all formulas I have seen so far. Why is that?
The formula you presented is very vague about what an 'update' is.
First, what is regularization? Generally speaking, the cost function with L2 regularization is

$$C = C_0 + \frac{\lambda}{2n} \sum_w w^2$$

where $C_0$ is the original (unregularized) cost, $n$ is the training set size, and $\lambda$ scales the influence of the L2 term.
You add an extra term to your original cost function, which is also differentiated when the weights are updated. Intuitively, this punishes big weights, so the algorithm tries to find the best tradeoff between keeping the weights small and minimizing the original cost. Small weights are associated with a simpler model, because the behavior of the network does not change much when it is given some outlying values. This means it filters out the noise in the data and learns the simplest possible solution. In other words, it reduces overfitting.
Coming to your question, let's derive the update rule. For any weight $w$ in the graph, we get

$$\frac{\partial C}{\partial w} = \frac{\partial C_0}{\partial w} + \frac{\lambda}{n} w.$$
Thus, the update formula for the weights can be written as ($\eta$ is the learning rate)

$$w \leftarrow w - \eta \frac{\partial C}{\partial w} = \left(1 - \frac{\eta\lambda}{n}\right) w - \eta \frac{\partial C_0}{\partial w}.$$
Considering only the first term, the weight is driven towards zero regardless of what else is happening. But the second term can add to the weight if the partial derivative is negative. All in all, weights can be positive or negative, as you cannot derive a sign constraint from this expression; the same applies to the derivatives. Think of fitting a line with a negative slope: the weight has to be negative. To answer your question: neither the derivative of the regularized cost nor the weights have to be positive all the time.
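As a small NumPy sketch of this update (the weights and gradients below are made-up numbers, not from a real network), the regularized step shrinks each weight by the factor (1 - ηλ/n) and then applies the ordinary gradient step, so nothing forces a weight or its derivative to be positive:

```python
import numpy as np

eta, lam, n = 0.1, 0.5, 100              # learning rate, L2 strength, training set size
w = np.array([2.0, -3.0, 0.5])           # weights may be positive or negative
grad_C0 = np.array([0.4, -0.2, -1.0])    # hypothetical gradients of the unregularized cost

# L2-regularized gradient: dC/dw = dC0/dw + (lam / n) * w
grad_C = grad_C0 + (lam / n) * w

# Update: w <- (1 - eta*lam/n) * w - eta * dC0/dw
w_new = w - eta * grad_C
print(w, "->", w_new)
# The L2 term shrinks each weight toward zero, but the data gradient can
# still push a weight away from zero (see the third weight here).
```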
If you need more clarification, leave a comment.

better building of kd-trees

Has anyone ever tried improving kd-trees using the following method?
1. Dividing each numeric dimension via some 1-d clustering method (e.g. Jenks Natural Breaks Optimization, Fayyad-Irani, or xyz...)
2. Sorting the dimensions by the expected value of the variance reduction within each division of that dimension
3. Building the KD-tree top-down, selecting attributes in the order found in (2)
4. Breaking dimensions at each level of the KD-tree using the divisions found in (1)
And just to state the obvious: if (3) terminates when #rows is (say) less than 30, then nearest neighbor would require 30 distance measures, not N.
You want the tree to be balanced, so there is not much leeway in terms of where to split.
Also, you want the construction to be fast.
If you put in an O(n^2) method during construction, construction will likely be the new bottleneck.
In many cases, the very simple (original) k-d-tree is just as fast as any of the "optimized" techniques that try to determine the "best" splitting axis.
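For reference, here is a minimal sketch of that "very simple (original)" construction: cycle the splitting axis and split at the median (plain Python, not optimized), which keeps the tree balanced without searching for a "best" axis at each node:

```python
from collections import namedtuple

Node = namedtuple("Node", "point axis left right")

def build_kdtree(points, depth=0):
    """Classic k-d tree: cycle the splitting axis and split at the median."""
    if not points:
        return None
    k = len(points[0])
    axis = depth % k                      # round-robin axis selection
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                # median split keeps the tree balanced
    return Node(points[mid], axis,
                build_kdtree(points[:mid], depth + 1),
                build_kdtree(points[mid + 1:], depth + 1))

tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(tree.point, tree.axis)   # root splits on the x axis at the median point
```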

Who knows the computational complexity of the function quadprog in MATLAB?

The QP problem is convex. According to Wikipedia, the problem can be solved in polynomial time.
But what exactly is the order?
That is an interesting question with (in my opinion) no clear answer. I am going to assume your problem is convex and you are interested in run-time complexity (as opposed to Iteration complexity).
As you may know, QuadProg is not one algorithm but rather, a generic name for something that solves Quadratic problems. It uses a set of algorithms underneath viz. Interior Point (Default), Trust-Region and Active-Set. Source.
Depending upon what you choose, each of these algorithms will have its own complexity analysis. For Trust-Region and Active-Set methods, the complexity analysis is extremely hard. In fact, Active-Set methods are not polynomial to begin with. Counterexamples exist where Active-Set methods take exponential "time" to converge (This is true also for the Simplex Method for Linear Programs). Source.
Now, assuming that you choose Interior Point methods, the answer is still not straightforward because there are various flavours of these methods. When Karmarkar first proposed this method, it was the first known polynomial algorithm for solving Linear Programs and it had a complexity of O(n^3.5). Source. These bounds were improved quite a lot later. However, this is for Linear Programs.
Finally, to answer your question, Ye and Tse proved in 1989 that we can have an Interior Point method with complexity O(n^3). However, whether MATLAB uses this exact flavor of Interior Point method is a little tricky to know but O(n^3) would be my best guess.
Of course, my answer is rather theoretical; if you want to empirically test it out, you can do so by gradually increasing the number of variables and plotting the CPU time required to get an estimate.
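Here is a rough sketch of such an experiment using Python/NumPy as a stand-in for quadprog: it times the dense factorization at the core of an equality-constrained QP (the KKT system), which is where the O(n^3) cost lives. This is only an illustration of the scaling test, not of what quadprog actually does internally.

```python
import time
import numpy as np

def solve_eq_qp(Q, c, A, b):
    """Solve min 0.5*x'Qx + c'x  s.t.  Ax = b  by forming the KKT system."""
    n, m = Q.shape[0], A.shape[0]
    KKT = np.block([[Q, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([-c, b])
    sol = np.linalg.solve(KKT, rhs)      # dense solve: O(n^3) in the problem size
    return sol[:n]

rng = np.random.default_rng(0)
for n in (200, 400, 800, 1600):
    M = rng.standard_normal((n, n))
    Q = M @ M.T + n * np.eye(n)          # symmetric positive definite => convex QP
    c = rng.standard_normal(n)
    A = rng.standard_normal((n // 10, n))
    b = rng.standard_normal(n // 10)
    t0 = time.perf_counter()
    solve_eq_qp(Q, c, A, b)
    print(f"n={n:5d}  time={time.perf_counter() - t0:.3f}s")
# Doubling n should multiply the time by roughly 8 once n is large enough.
```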

Why doesn't k-means give the global minimum?

I read that the k-means algorithm only converges to a local minimum and not to the global minimum. Why is this? I can logically see how initialization could affect the final clustering, and that there is a possibility of a sub-optimal clustering, but I did not find anything that mathematically proves it.
Also, why is k-means an iterative process?
Can't we just partially differentiate the objective function with respect to the centroids and equate it to zero to find the centroids that minimize this function? Why do we have to use gradient descent to reach the minimum step by step?
Consider:
. c .
. c .
Where c is a cluster centroid. The algorithm will stop, but a better solution is:
. .
c c
. .
With regard to a proof: you don't need a mathematical proof to show that something isn't always true; a single counterexample, as provided above, is enough. You could probably convert the above into a mathematical proof, but that is unnecessary and generally requires a lot of work; even in academia it is accepted to merely give a counterexample to disprove something.
The k-means algorithm is by definition an iterative process; that is simply how it works. The clustering problem itself is NP-hard, so using an exact algorithm to calculate the optimal centroids would take immensely long.
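To see the counterexample numerically, here is a small sketch of Lloyd-style iteration on four points in a wide rectangle (coordinates made up for illustration): starting with one centroid per row, the algorithm stops immediately at a cost of 16, even though the column split has cost 1.

```python
import numpy as np

points = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 1.0], [4.0, 1.0]])  # wide rectangle

def lloyd(points, centroids, iters=100):
    centroids = centroids.copy()
    for _ in range(iters):
        # assign each point to its nearest centroid
        d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([points[labels == j].mean(axis=0) for j in range(len(centroids))])
        if np.allclose(new, centroids):
            break                          # converged: no centroid moves
        centroids = new
    cost = ((points - centroids[labels]) ** 2).sum()
    return centroids, cost

bad_init  = np.array([[2.0, 0.0], [2.0, 1.0]])   # one centroid per row
good_init = np.array([[0.0, 0.5], [4.0, 0.5]])   # one centroid per column

print(lloyd(points, bad_init))    # stops immediately with cost 16.0
print(lloyd(points, good_init))   # cost 1.0 -- the global optimum here
```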
Don't mix the problem and the algorithm.
The k-means problem is finding the least-squares assignment to centroids.
There are multiple algorithms for finding a solution.
There is an obvious approach to find the global optimum: enumerating all k^n possible assignments - that will yield a global minimum, but in exponential runtime.
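A brute-force sketch of that exhaustive search, workable only for toy inputs since it enumerates all k^n assignments:

```python
from itertools import product
import numpy as np

def kmeans_brute_force(points, k):
    """Global optimum by enumerating every assignment: O(k^n), toy sizes only."""
    points = np.asarray(points, dtype=float)
    best_cost, best_labels = float("inf"), None
    for labels in product(range(k), repeat=len(points)):
        labels = np.array(labels)
        if len(set(labels.tolist())) < k:      # skip assignments with empty clusters
            continue
        cost = 0.0
        for j in range(k):
            cluster = points[labels == j]
            cost += ((cluster - cluster.mean(axis=0)) ** 2).sum()
        if cost < best_cost:
            best_cost, best_labels = cost, labels
    return best_cost, best_labels

print(kmeans_brute_force([[0, 0], [4, 0], [0, 1], [4, 1]], k=2))  # cost 1.0
```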
Much more attention was put to finding an approximate solution in faster time.
The Lloyd/Forgy algorithm is an EM-style iterative model-refinement approach that is guaranteed to converge to a local minimum, simply because there is a finite number of states and the objective function must decrease in every step. This algorithm runs in O(n*k*i), where usually i << n, but it may find only a local minimum.
MacQueen's method is technically not iterative. It is a single-pass, one-element-at-a-time algorithm that will not even find a local minimum in the Lloyd sense. (You can, however, run multiple passes over the data set until convergence to get a local minimum too.) A single pass is O(n*k); for multiple passes, add a factor i. It may or may not take more passes than Lloyd.
Then there is Hartigan and Wong's method. I don't remember the details; IIRC it is a clever, lazier variant of Lloyd/Forgy, so probably O(n*k*i) too (although it probably does not recompute all n*k distances in later iterations).
You could also use a randomized algorithm that just tests l random assignments. It probably won't find a minimum at all, but it runs in "linear" time O(n*l).
Oh, and you can try different random initializations, to improve your chances of finding the global minimum. Add a factor t for the number of trials...

Why is Matlab's inv slow and inaccurate?

I read in a few places (in the documentation and in this blog post: http://blogs.mathworks.com/loren/2007/05/16/purpose-of-inv/) that the use of inv in MATLAB is not recommended because it is slow and inaccurate.
I am trying to find the reason for this inaccuracy. So far, Google has not given me any interesting results, so I thought someone here could guide me.
Thanks!
The inaccuracy I mentioned is with the method INV, not MATLAB's implementation of it. You should be using QR, LU, or other methods to solve systems of equations since these methods don't typically require squaring the condition number of the system in question. Using inv typically requires an operation that loses accuracy by squaring the condition number of the original system.
--Loren
I think the point of Loren's blog is not that MATLAB's inv function is particularly slower or more inaccurate than any other numerical implementation of computing a matrix inverse; rather, that in most cases the inverse itself is not needed, and you can proceed by other means (such as solving a linear system using \ - the backslash operator - rather than computing an inverse).
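Here is a quick sketch of that comparison using NumPy as a stand-in for MATLAB, with np.linalg.solve playing the role of the backslash operator; timings and residuals vary by machine, but solving the system directly is typically at least as fast and leaves a smaller residual:

```python
import time
import numpy as np

rng = np.random.default_rng(0)

# --- speed: one large, well-conditioned system ---
n = 2000
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

t0 = time.perf_counter(); x_solve = np.linalg.solve(A, b); t_solve = time.perf_counter() - t0
t0 = time.perf_counter(); x_inv = np.linalg.inv(A) @ b;    t_inv = time.perf_counter() - t0
print(f"solve: {t_solve:.3f}s   inv(A)@b: {t_inv:.3f}s")

# --- accuracy: an ill-conditioned (Hilbert) system ---
m = 12
H = 1.0 / (np.arange(1, m + 1)[:, None] + np.arange(m))
b = H @ np.ones(m)                         # exact solution is all ones
x_solve = np.linalg.solve(H, b)
x_inv = np.linalg.inv(H) @ b
print("residual via solve:", np.linalg.norm(H @ x_solve - b))
print("residual via inv:  ", np.linalg.norm(H @ x_inv - b))
# Typically the solve residual is noticeably smaller, while the forward
# errors of the two solutions are of a similar order (cf. the next answer).
```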
inv() is certainly slower than \ unless you have multiple right-hand-side vectors to solve for. However, the advice from MathWorks regarding inaccuracy is due to an overly conservative bound in a numerical linear algebra result. In other words, inv() is NOT inaccurate. The link elaborates further: http://arxiv.org/abs/1201.6035
Several widely-used textbooks lead the reader to believe that solving a linear system of equations Ax = b by multiplying the vector b by a computed inverse inv(A) is inaccurate. Virtually all other textbooks on numerical analysis and numerical linear algebra advise against using computed inverses without stating whether this is accurate or not. In fact, under reasonable assumptions on how the inverse is computed, x = inv(A)*b is as accurate as the solution computed by the best backward-stable solvers.