Is Dijkstra's algorithm a special case of A*?

According to, among others, this accepted answer, Dijkstra's algorithm is a special case of the A* algorithm.
Special case
In logic, especially as applied in mathematics, concept A is a special case or specialization of concept B precisely if every instance of A is also an instance of B but not vice versa, or, equivalently, if B is a generalization of A.
There are, however, cases that produce different results for A* with a constant zero heuristic when compared to Dijkstra's algorithm.
One of these is their behaviour when the graph contains negative cycles: while A*'s closed-set handling avoids such cycles entirely, Dijkstra's algorithm follows a negative cycle infinitely many times and never proceeds.
Another edge case where their results may differ, this time even with an admissible heuristic, is when A* underestimates the cost of the final edge in a graph. In a graph with the candidate paths A-B-D and A-C-D, most A* implementations, including the pseudo-code example on Wikipedia, will incorrectly report A-B-D as the shortest path if the heuristic rates the distance from B to D below 2, while Dijkstra's algorithm correctly yields A-C-D.
Can Dijkstra's algorithm still be viewed as a special case of A*, given the above?


How to find the time complexity of the algebra operation in algebraixlib

How can I calculate the time complexity, using mathematics or Big O notation, of the algebra operations used in the algebra of data?
I will use a book example to explain my question. Consider the following example given in the book.
In the above example, I would like to calculate the time complexity of the transpose and compose operations.
If possible, I would also like to find out the time complexity of other data algebra operations.
Please let me know if you need more explanation.
@wesholler I edited my question after understanding your explanation. The following is a real-life example; suppose we want to calculate the time complexity of the operations used below.
Suppose I have data algebra operations as follows:
Could you describe how we would calculate the time complexity in the above example, preferably in Big O notation?
Thanks
This answer has three parts:
General Time Complexity Analysis
Generally, the time complexity/Big O can be determined by considering the origin of an operation - that is, what operations were extended from more primitive algebras to derive this one?
The following rules describe the upper-bound on the time complexity for both unary and binary operations that are extended into their power set algebras.
Unary extension can be thought of as similar to a map operation, and so has linear time complexity. Binary extension evaluates the cross product of the operation's arguments, and so has a worst-case time complexity of O(n^2). However, it is important to consider that the real upper bound is the product of the cardinalities of the two arguments; this comes up often in practice when the right-hand argument to a composition or superstriction operation is a singleton.
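To make these two bounds concrete, here is a rough sketch in MATLAB that represents a relation as an n-by-2 matrix of couplets. The matrix representation and the composition convention used here are illustrative assumptions of mine, not algebraixlib's actual data structures:

    R = [1 2; 3 4; 5 6];                 % relation {(1,2),(3,4),(5,6)}
    S = [2 7; 4 8];                      % relation {(2,7),(4,8)}

    % unary extension ~ one pass over the n couplets: O(n)
    Rt = fliplr(R);                      % transpose: swap left and right parts

    % binary extension ~ cross product of the arguments: O(n*m)
    C = zeros(0, 2);
    for i = 1:size(R, 1)
        for j = 1:size(S, 1)
            if R(i,2) == S(j,1)          % (x,y) in R and (y,z) in S give (x,z)
                C(end+1,:) = [R(i,1), S(j,2)]; %#ok<AGROW>
            end
        end
    end
    % C is [1 7; 3 8]; every couplet pair was inspected, hence the O(n*m) bound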
Time Complexity for algebraixlib Implementations
We can take a look at a few examples of how extension affects the time complexity while at the same time analyzing the complexity of the implementations in algebraixlib (the last part talks about other implementations).
Because it is a reference implementation for data algebra, algebraixlib implements extended operations very literally. For that reason, Big Theta is used below: the formulas represent both the lower and the upper bounds of the time complexity.
Here is the unary operation transpose being extended from couplets to relations and then to clans.
Likewise, here is the binary operation compose being extended from couplets to relations and then to clans.
It is clear that the complexity of both clan operations is influenced by the number of relation elements as well as by the number of couplets in those relations.
Time Complexity for Other Implementations
It is important to note that the above section describes the time complexity that is specific to the algorithms implemented in algebraixlib.
One could imagine implementing e.g. clans.cross_union with a method similar to sort-merge-join or hash-join. In this case, the upper bound would remain the same, but the lower bound (and expected) time complexity would be reduced by one or more degrees.

What's the best way to calculate a numerical derivative in MATLAB?

(Note: This is intended to be a community Wiki.)
Suppose I have a set of points xi = {x0,x1,x2,...xn} and corresponding function values fi = f(xi) = {f0,f1,f2,...,fn}, where f(x) is, in general, an unknown function. (In some situations, we might know f(x) ahead of time, but we want to do this generally, since we often don't know f(x) in advance.) What's a good way to approximate the derivative of f(x) at each point xi? That is, how can I estimate values of dfi == d/dx fi == df(xi)/dx at each of the points xi?
Unfortunately, MATLAB doesn't have a very good general-purpose, numerical differentiation routine. Part of the reason for this is probably because choosing a good routine can be difficult!
So what kinds of methods are there? What routines exist? How can we choose a good routine for a particular problem?
There are several considerations when choosing how to differentiate in MATLAB:
Do you have a symbolic function or a set of points?
Is your grid evenly or unevenly spaced?
Is your domain periodic? Can you assume periodic boundary conditions?
What level of accuracy are you looking for? Do you need to compute the derivatives within a given tolerance?
Does it matter to you that your derivative is evaluated on the same points as your function is defined?
Do you need to calculate multiple orders of derivatives?
What's the best way to proceed?
These are just some quick-and-dirty suggestions. Hopefully somebody will find them helpful!
1. Do you have a symbolic function or a set of points?
If you have a symbolic function, you may be able to calculate the derivative analytically. (Chances are, you would have done this if it were that easy, and you would not be here looking for alternatives.)
If you have a symbolic function and cannot calculate the derivative analytically, you can always evaluate the function on a set of points, and use some other method listed on this page to evaluate the derivative.
In most cases, you have a set of points (xi,fi), and will have to use one of the following methods....
2. Is your grid evenly or unevenly spaced?
If your grid is evenly spaced, you probably will want to use a finite difference scheme (see either of the Wikipedia articles here or here), unless you are using periodic boundary conditions (see below). Here is a decent introduction to finite difference methods in the context of solving ordinary differential equations on a grid (see especially slides 9-14). These methods are generally computationally efficient, simple to implement, and the error of the method can be simply estimated as the truncation error of the Taylor expansions used to derive it.
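For instance, a minimal MATLAB sketch of a second-order central difference on an evenly spaced grid, with first-order one-sided differences at the two boundary points (sin is just a stand-in for sampled data):

    x = linspace(0, 2*pi, 100);
    h = x(2) - x(1);                     % uniform spacing
    f = sin(x);
    df = zeros(size(f));
    df(2:end-1) = (f(3:end) - f(1:end-2)) / (2*h);  % interior: central, O(h^2)
    df(1)   = (f(2) - f(1)) / h;                    % boundaries: one-sided, O(h)
    df(end) = (f(end) - f(end-1)) / h;
    max(abs(df(2:end-1) - cos(x(2:end-1))))         % small truncation error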
If your grid is unevenly spaced, you can still use a finite difference scheme, but the expressions are more difficult and the accuracy varies very strongly with how uniform your grid is. If your grid is very non-uniform, you will probably need to use large stencil sizes (more neighboring points) to calculate the derivative at a given point. People often construct an interpolating polynomial (often the Lagrange polynomial) and differentiate that polynomial to compute the derivative. See for instance, this StackExchange question. It is often difficult to estimate the error using these methods (although some have attempted to do so: here and here). Fornberg's method is often very useful in these cases....
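As a sketch of the interpolating-polynomial idea (a local quadratic fit, not Fornberg's method itself), one can fit a polynomial through each point and its nearest neighbours and differentiate that fit:

    x = sort(2*pi*rand(1, 30));              % an unevenly spaced grid
    f = sin(x);
    n = numel(x);
    df = zeros(size(f));
    for i = 1:n
        j = max(1, min(i-1, n-2));           % 3-point stencil, clipped at the ends
        p = polyfit(x(j:j+2), f(j:j+2), 2);  % local interpolating quadratic
        df(i) = polyval(polyder(p), x(i));   % differentiate the polynomial
    end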
Care must be taken at the boundaries of your domain because the stencil often involves points that are outside the domain. Some people introduce "ghost points" or combine boundary conditions with derivatives of different orders to eliminate these "ghost points" and simplify the stencil. Another approach is to use right- or left-sided finite difference methods.
Here's an excellent "cheat sheet" of finite difference methods, including centered, right- and left-sided schemes of low orders. I keep a printout of this near my workstation because I find it so useful.
3. Is your domain periodic? Can you assume periodic boundary conditions?
If your domain is periodic, you can compute derivatives to a very high order accuracy using Fourier spectral methods. This technique sacrifices performance somewhat to gain high accuracy. In fact, if you are using N points, your estimate of the derivative is approximately N^th order accurate. For more information, see (for example) this WikiBook.
Fourier methods often use the Fast Fourier Transform (FFT) algorithm to achieve roughly O(N log(N)) performance, rather than the O(N^2) algorithm that a naively-implemented discrete Fourier transform (DFT) might employ.
If your function and domain are not periodic, you should not use the Fourier spectral method. If you attempt to use it with a function that is not periodic, you will get large errors and undesirable "ringing" phenomena.
Computing derivatives of any order requires 1) a transform from grid space to spectral space (O(N log(N))), 2) multiplication of the Fourier coefficients by their spectral wavenumbers (O(N)), and 3) an inverse transform from spectral space back to grid space (again O(N log(N))).
Care must be taken when multiplying the Fourier coefficients by their spectral wavenumbers. Every implementation of the FFT algorithm seems to have its own ordering of the spectral modes and normalization parameters. See, for instance, the answer to this question on the Math StackExchange, for notes about doing this in MATLAB.
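As a concrete MATLAB sketch for a periodic function on [0, 2*pi), using MATLAB's FFT ordering of the modes (0..N/2-1 followed by -N/2..-1 for even N):

    N = 64;
    x = 2*pi*(0:N-1)/N;                  % periodic grid; the endpoint is omitted
    f = exp(sin(x));                     % smooth, periodic test function
    k = [0:N/2-1, -N/2:-1];              % wavenumbers in FFT order
    df = real(ifft(1i*k .* fft(f)));     % differentiation in spectral space
    max(abs(df - cos(x).*exp(sin(x))))   % error near machine precision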
4. What level of accuracy are you looking for? Do you need to compute the derivatives within a given tolerance?
For many purposes, a 1st or 2nd order finite difference scheme may be sufficient. For higher precision, you can use schemes derived from higher-order Taylor expansions, which truncate at higher-order terms.
If you need to compute the derivatives within a given tolerance, you may want to look around for a high-order scheme that has the error you need.
Often, the best way to reduce error is reducing the grid spacing in a finite difference scheme, but this is not always possible.
Be aware that higher-order finite difference schemes almost always require larger stencil sizes (more neighboring points). This can cause issues at the boundaries. (See the discussion above about ghost points.)
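For illustration, the standard five-point, fourth-order central stencil in MATLAB (interior points only; the boundary points would need the sided schemes discussed above):

    x = linspace(0, 2*pi, 100);
    h = x(2) - x(1);
    f = sin(x);
    i = 3:numel(x)-2;                    % interior points only
    df4 = (-f(i+2) + 8*f(i+1) - 8*f(i-1) + f(i-2)) / (12*h);  % O(h^4) stencil
    max(abs(df4 - cos(x(i))))            % noticeably smaller error than O(h^2)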
5. Does it matter to you that your derivative is evaluated on the same points as your function is defined?
MATLAB provides the diff function to compute differences between adjacent array elements. This can be used to calculate approximate derivatives via a first-order forward-differencing (or forward finite difference) scheme, but the estimates are low-order estimates. As described in MATLAB's documentation of diff (link), if you input an array of length N, it will return an array of length N-1. When you estimate derivatives using this method on N points, you will only have estimates of the derivative at N-1 points. (Note that this can be used on uneven grids, if they are sorted in ascending order.)
In most cases, we want the derivative evaluated at all points, which means we want to use something besides the diff method.
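A small illustration of the point: the N-1 estimates returned by diff are most naturally associated with the midpoints between grid points rather than the grid points themselves:

    x = linspace(0, 1, 20);
    f = x.^2;
    dfd = diff(f) ./ diff(x);            % N-1 forward-difference estimates
    xm  = (x(1:end-1) + x(2:end)) / 2;   % midpoints where the estimates apply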
6. Do you need to calculate multiple orders of derivatives?
One can set up a system of equations in which the grid point function values and the 1st and 2nd order derivatives at these points all depend on each other. This can be found by combining Taylor expansions at neighboring points as usual, but keeping the derivative terms rather than cancelling them out, and linking them together with those of neighboring points. These equations can be solved via linear algebra to give not just the first derivative, but the second as well (or higher orders, if set up properly). I believe these are called combined finite difference schemes, and they are often used in conjunction with compact finite difference schemes, which will be discussed next.
Compact finite difference schemes (link). In these schemes, one sets up a design matrix and calculates the derivatives at all points simultaneously via a matrix solve. They are called "compact" because they are usually designed to require fewer stencil points than ordinary finite difference schemes of comparable accuracy. Because they involve a matrix equation that links all points together, certain compact finite difference schemes are said to have "spectral-like resolution" (e.g. Lele's 1992 paper--excellent!), meaning that they mimic spectral schemes by depending on all nodal values and, because of this, they maintain accuracy at all length scales. In contrast, typical finite difference methods are only locally accurate (the derivative at point #13, for example, ordinarily doesn't depend on the function value at point #200).
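As a sketch of the matrix-solve idea, here is the classical fourth-order compact (Pade) scheme, f'(i-1) + 4 f'(i) + f'(i+1) = (3/h)(f(i+1) - f(i-1)), on a periodic grid, where the wrap-around entries make the system circulant:

    N = 64; h = 2*pi/N;
    x = h*(0:N-1)';
    f = exp(sin(x));
    e = ones(N, 1);
    A = spdiags([e 4*e e], -1:1, N, N);  % tridiagonal left-hand side
    A(1,N) = 1; A(N,1) = 1;              % periodic wrap-around
    rhs = (3/h) * (circshift(f,-1) - circshift(f,1));
    df = A \ rhs;                        % all derivatives from one linear solve
    max(abs(df - cos(x).*exp(sin(x))))   % fourth-order accurate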
A current area of research is how best to solve for multiple derivatives in a compact stencil. The results of such research, combined compact finite difference methods, are powerful and widely applicable, though many researchers tend to tune them for particular needs (performance, accuracy, stability, or a particular field of research such as fluid dynamics).
Ready-to-Go Routines
As described above, one can use the diff function (link to documentation) to compute rough derivatives between adjacent array elements.
MATLAB's gradient routine (link to documentation) is a great option for many purposes. It implements a second-order, central difference scheme. It has the advantages of computing derivatives in multiple dimensions and supporting arbitrary grid spacing. (Thanks to @thewaywewalk for pointing out this glaring omission!)
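Typical usage is a one-liner; passing the coordinate vector instead of a scalar spacing lets gradient handle uneven grids:

    x = linspace(0, 2*pi, 100);
    f = sin(x);
    dfdx = gradient(f, x);               % central differences inside, one-sided at the ends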
I used Fornberg's method (see above) to develop a small routine (nderiv_fornberg) to calculate finite differences in one dimension for arbitrary grid spacings. I find it easy to use. It uses sided stencils of 6 points at the boundaries and a centered, 5-point stencil in the interior. It is available at the MATLAB File Exchange here.
Conclusion
The field of numerical differentiation is very diverse. For each method listed above, there are many variants with their own set of advantages and disadvantages. This post is hardly a complete treatment of numerical differentiation.
Every application is different. Hopefully this post gives the interested reader an organized list of considerations and resources for choosing a method that suits their own needs.
This community wiki could be improved with code snippets and examples particular to MATLAB.
I believe there is more to these particular questions, so I have elaborated on the subject further as follows:
(4) Q: What level of accuracy are you looking for? Do you need to compute the derivatives within a given tolerance?
A: The accuracy of numerical differentiation depends on the application of interest. Usually it works like this: if you are using numerical differentiation (ND) in a forward problem, to approximate derivatives and estimate features from a signal of interest, then you should be aware of noise perturbations. Such artifacts usually contain high-frequency components, and by the definition of the differentiator, the noise effect will be amplified with a magnitude on the order of $(i\omega)^n$. So increasing the accuracy of the differentiator (increasing the polynomial accuracy) will not help at all; instead, you need to cancel the effect of the noise on the differentiation. This can be done in cascade: first smooth the signal, and then differentiate. But a better way of doing this is to use a "lowpass differentiator". A good example of a MATLAB library can be found here.
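A minimal sketch of the smooth-then-differentiate cascade in base MATLAB; the window length is an assumed value to be tuned to the noise level, and a proper lowpass differentiator from the linked library would do better:

    x = linspace(0, 2*pi, 200);
    f = sin(x) + 0.05*randn(size(x));    % noisy signal
    w = 7;                               % moving-average window (assumed)
    fs = conv(f, ones(1,w)/w, 'same');   % smooth first (beware edge effects)
    dfs = gradient(fs, x);               % then differentiate the smoothed signal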
However, if this is not the case and you are using ND in inverse problems, such as solving PDEs, then the global accuracy of the differentiator is very important. Depending on what kind of boundary condition (BC) suits your problem, the design will be adapted accordingly. The rule of thumb is to increase the numerical accuracy of what is known as the full-band differentiator. You need to design a derivative matrix that takes suitable BCs into account. You can find comprehensive solutions to such designs using the above link.
(5) Q: Does it matter to you that your derivative is evaluated on the same points as your function is defined?
A: Yes, absolutely. Evaluating the ND on the same grid points is called a "centralized" scheme, and off the points a "staggered" scheme. Note that for odd derivative orders, a centralized ND degrades the accuracy of the frequency response of the differentiator. Therefore, if you are using such a design in inverse problems, this will perturb your approximation. The opposite applies to even orders of differentiation utilized by staggered schemes. You can find a comprehensive explanation of this subject using the link above.
(6) Q: Do you need to calculate multiple orders of derivatives?
A: This depends entirely on the application at hand. You can refer to the same link I have provided, which also covers multiple-derivative designs.

How many and which parents should we select for crossover in genetic algorithm

I have read many tutorials and papers, and I understand the concept of a Genetic Algorithm, but I have some problems implementing it in MATLAB.
In summary, I have:
A chromosome containing three genes [ a b c ], with each gene constrained by different limits.
An objective function to be evaluated to find the best solution.
What I did:
Generated random values of a, b and c for a population of, say, 20 solutions, i.e.
[a1 b1 c1] [a2 b2 c2]…..[a20 b20 c20]
For each solution, I evaluated the objective function and ranked the solutions from best to worst.
Difficulties I faced:
Now, why should we go for crossover and mutation? Is the best solution I found not enough?
I know the concept of doing crossover (generating a random number, probability…etc), but which parents, and how many of them, should be selected for crossover or mutation?
Should I perform crossover on all 20 solutions (parents), or on only two of them?
Generally, a Genetic Algorithm is used to find a good solution to a problem with a huge search space, where finding an absolute solution is either very difficult or impossible. Obviously, I don't know the range of your values, but since you have only three genes it's likely that a good solution will be found by a Genetic Algorithm (or an even simpler search strategy) without any additional operators. Selection and crossover are usually carried out on all chromosomes in the population (although it's not uncommon to carry some of the best from each generation forward as-is). The general idea is that the fitter chromosomes are more likely to be selected and to undergo crossover with each other.
Mutation is usually used to stop the Genetic Algorithm from converging prematurely on a non-optimal solution. You should analyse the results without mutation to see if it's needed. Mutation is usually run on the entire population, at every generation, but with a very small probability. Giving every gene a 0.05% chance that it will mutate isn't uncommon. You usually want to give a small chance of mutation, without it completely overriding the results of selection and crossover.
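To make the mechanics concrete, here is a rough MATLAB sketch of one GA run for the three-gene chromosome in the question. The objective function objFun, the gene limits, and all parameter values are assumed placeholders rather than tuned recommendations:

    popSize = 20; nGenes = 3; pMut = 0.01; nGen = 100;
    lo = [0 0 0]; hi = [1 10 100];                 % assumed per-gene limits
    pop = repmat(lo, popSize, 1) + rand(popSize, nGenes) .* repmat(hi - lo, popSize, 1);
    for gen = 1:nGen
        fit = zeros(popSize, 1);
        for i = 1:popSize, fit(i) = objFun(pop(i,:)); end  % objFun: assumed, minimised
        [~, best] = min(fit);
        newPop = zeros(size(pop));
        newPop(1,:) = pop(best,:);                 % elitism: carry the best forward
        for i = 2:popSize
            % tournament selection: the fitter of two random individuals is a parent
            a = randi(popSize); b = randi(popSize);
            if fit(a) < fit(b), p1 = pop(a,:); else, p1 = pop(b,:); end
            a = randi(popSize); b = randi(popSize);
            if fit(a) < fit(b), p2 = pop(a,:); else, p2 = pop(b,:); end
            mask = rand(1, nGenes) < 0.5;          % uniform crossover over all genes
            child = p1; child(mask) = p2(mask);
            m = rand(1, nGenes) < pMut;            % small per-gene mutation chance
            child(m) = lo(m) + rand(1, sum(m)) .* (hi(m) - lo(m));
            newPop(i,:) = child;
        end
        pop = newPop;
    end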
As has been suggested, I'd do a little more general background reading on Genetic Algorithms to get a better understanding of the concepts.
Sharing a bit of advice from the 'Practical Neural Network Recipes in C++' book... It is a good idea to have a significantly larger population for your first epoch; then you're likely to include features which will contribute to an acceptable solution. Later epochs, which can have smaller populations, will then tune, combine, or obsolete these favourable features.
And Handbook-Multiparent-Eiben seems to indicate that four parents are better than two. However, bed manufacturers have not caught on to this yet and seem to only produce single and double beds.

Dijkstra's algorithm vs Uniform Cost Search (time complexity)

My question is as follows: According to different sources, Dijkstra's algorithm is nothing but a variant of Uniform Cost Search. We know that Dijkstra's algorithm finds the shortest path between a source and all destinations (single-source). However, we can always modify Dijkstra to find the shortest path between a START and a GOAL state (when the goal is popped from the priority queue, we simply stop); but in doing so, the worst case is still finding the shortest path from START to all other nodes (suppose the goal is the furthest node in the graph).
If we implement Dijkstra's algorithm using a min-priority heap, the running time will be O(V log V + E), where E is the number of edges and V is the number of vertices.
Since Uniform Cost Search is the same as Dijkstra (slightly different implementation), the running time of UCS should be similar to Dijkstra's, right? However, according to my AI class, Uniform Cost Search is exponential in the worst case, taking O(b^(1 + ⌊C*/ε⌋)), where C* is the cost of the optimal solution, ε is the minimum edge cost, and b is the branching factor.
How can both algorithms be the same while they have different running times? Is the running time the same, but the way we look at it is different?
I would appreciate your help :):) Thank you
Is the running time the same, but the way we look at it is different?
Yes. Uniform cost search can be used on infinitely large graphs, on which Dijkstra's original algorithm would never terminate. In such situations, it's no use defining complexity in terms of V and E as both might be infinite and the resulting big-O figure meaningless.
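A toy MATLAB illustration of that point: a uniform cost search over an implicitly defined, unbounded state space whose successors are generated on demand. The transition rules here are invented purely for illustration:

    % states are positive integers; successors of n are n+1 (cost 1) and 2n (cost 3)
    start = 1; goal = 10;
    frontier = [start, 0];              % rows of [state, path-cost]
    explored = [];
    while ~isempty(frontier)
        [~, i] = min(frontier(:,2));    % pop the cheapest frontier entry
        node = frontier(i,:); frontier(i,:) = [];
        if node(1) == goal
            fprintf('reached %d with cost %g\n', goal, node(2));
            break
        end
        if ismember(node(1), explored), continue; end
        explored(end+1) = node(1);      %#ok<AGROW>
        s1 = [node(1)+1, node(2)+1];    % n -> n+1 with cost 1
        s2 = [2*node(1), node(2)+3];    % n -> 2n  with cost 3
        frontier = [frontier; s1; s2];  %#ok<AGROW>
    end

No V or E exists here, yet the search terminates as soon as the goal is popped, which is why the b and C*/ε formulation is the natural way to describe its cost.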

Why can't we apply Dijkstra's algorithm for a graph with negative weights?

Why can't we apply Dijkstra's algorithm for a graph with negative weights?
What does it mean to find the least expensive path from A to B, if every time you travel from C to D you get paid?
If there is a negative weight between two nodes, the "shortest path" is to loop backwards and forwards between those two nodes forever. The more hops, the "shorter" the path gets.
This has nothing to do with the algorithm, and everything to do with the impossibility of answering such a question.
Edit:
The above claim assumes bidirectional links. If there are no cycles with an overall negative weight, you do not have a way to loop around forever, getting paid.
In such a case, Dijkstra's algorithm may still fail:
Consider two paths:
an optimal path that racks up a cost of 100, before crossing the final edge which has a -25 weight, giving a total of 75, and
a suboptimal path that has no negatively-weighted edges with a total cost of 90.
Dijkstra's algorithm will investigate the suboptimal path first, and will declare itself finished when it finds it. It will never follow up on a partial path that already looks worse than the first solution found.
I will give you a counterexample. Consider the following graph:
http://img853.imageshack.us/img853/7318/3fk8.png
Suppose you begin at vertex A and you want the shortest path to D. Dijkstra's algorithm would perform the following steps:
Mark A as visited and add vertices B and C to the queue.
Fetch the vertex with minimal distance from the queue. It is B.
Mark B as visited and add vertex D to the queue.
Fetch from the queue again. Now it is vertex D.
Mark D as visited.
Dijkstra says the shortest path from A to D has length 2, but it is obviously not true.
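Since the linked image may not load, here is a runnable MATLAB sketch of the same failure, with assumed edge weights chosen to match the steps above: A->B = 1, B->D = 1, A->C = 3, C->D = -2, so the true shortest path A-C-D costs 1:

    W = inf(4);                    % adjacency matrix; nodes 1..4 are A, B, C, D
    W(1,2) = 1; W(2,4) = 1;        % A->B, B->D (assumed weights)
    W(1,3) = 3; W(3,4) = -2;       % A->C, C->D (assumed weights)
    dist = inf(1,4); dist(1) = 0;  % tentative distances from A
    visited = false(1,4);
    for step = 1:4
        d = dist; d(visited) = inf;
        [~, u] = min(d);           % pick the closest unvisited vertex
        visited(u) = true;         % ...and finalise it, which is the flaw here
        for v = 1:4                % relax the outgoing edges of u
            if ~visited(v) && dist(u) + W(u,v) < dist(v)
                dist(v) = dist(u) + W(u,v);
            end
        end
    end
    fprintf('Dijkstra reports dist(A,D) = %g; the true shortest is 1\n', dist(4));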
Imagine you had a directed graph with a directed cycle in it, and the total "distance" around that cycle was a negative weight. If, on your way from the start vertex to the end vertex, you could pass through that directed cycle, you could simply go around and around it an arbitrary number of times.
And that means you could make your path across the graph have an infinitely negative distance (or effectively so).
However, as long as there are no directed cycles with a negative total weight in your graph, you could get away with using Dijkstra's Algorithm without anything exploding on you.
All that being said, if you have a graph with negative weights, you could use the Bellman-Ford algorithm. Because of its generality, however, it is a bit slower: the Bellman-Ford algorithm takes O(V·E) time, whereas Dijkstra's takes O(E + V log V).