Depending on the problem specifics, the two algorithms generally mentioned in the context of the single-source shortest path problem are Dijkstra's algorithm and the Bellman-Ford algorithm.
Dijkstra's algorithm works with non-negative edge weights, whereas the Bellman-Ford algorithm is a generalization that also allows negative edge weights.
As implemented in Sedgewick's book "Algorithms" (4th ed.), Dijkstra's algorithm is based on a priority queue, whereas the Bellman-Ford algorithm is based on a plain FIFO queue.
However, to me it doesn't look like either choice of the two queue types would be necessary for implementing the algorithms. One could just as well implement Dijkstra's algorithm with a FIFO queue and the Bellman-Ford algorithm with a priority queue.
What is the reason why Dijkstra's algorithm is usually implemented with a priority queue, Bellman-Ford on the other hand with a FIFO queue? Is there a functional reason, or is it for runtime optimization?
Dijkstra's algorithm is based on a priority queue
Not necessarily. You can also implement Dijkstra's algorithm without a priority queue. In that case, however, you have to scan the array of nodes you are still processing at every step to find the one with the lowest tentative distance.
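The array-scan variant can be sketched as follows. This is only a minimal illustration, not Sedgewick's implementation: the function name `dijkstra_linear_scan` and the adjacency-dict input format are assumptions made for the example, and edge weights are assumed to be non-negative.

```python
import math

def dijkstra_linear_scan(graph, source):
    """Dijkstra without a priority queue.

    `graph` (hypothetical format) maps every vertex to a list of
    (neighbour, weight) pairs; weights are assumed non-negative.
    """
    dist = {v: math.inf for v in graph}
    dist[source] = 0
    unsettled = set(graph)
    while unsettled:
        # A linear scan replaces the priority queue's extract-min.
        u = min(unsettled, key=dist.__getitem__)
        if dist[u] == math.inf:
            break  # everything left is unreachable from the source
        unsettled.remove(u)
        for v, w in graph[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist
```

The result is the same as with a priority queue; only the cost of finding the next vertex changes, from logarithmic to linear in the number of vertices.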
Bellman-Ford algorithm is based on a plain FIFO queue
You can easily implement the Bellman-Ford algorithm without any sort of queue. Here is an example implementation: https://kt48.wordpress.com/2015/06/16/bellman-ford-algorithm-c-implementation/
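For readers who do not want to follow the link, a queue-free version might look like the sketch below. It is not taken from the linked post; the function name `bellman_ford`, the edge-list format and the integer vertex numbering are assumptions made for the example.

```python
import math

def bellman_ford(num_vertices, edges, source):
    """Queue-free Bellman-Ford.

    `edges` (hypothetical format) is a list of (u, v, weight) triples over
    vertices numbered 0..num_vertices-1. Raises ValueError if a negative
    cycle is reachable from the source.
    """
    dist = [math.inf] * num_vertices
    dist[source] = 0
    # Relax every edge up to V-1 times; no queue is involved.
    for _ in range(num_vertices - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break  # distances are stable, so further passes are pointless
    # One more pass: any remaining improvement implies a negative cycle.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("negative cycle reachable from source")
    return dist
```

The FIFO queue in the textbook version only avoids relaxing edges out of vertices whose distance did not change in the previous pass.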
What is the reason why Dijkstra's algorithm is usually implemented
with a priority queue, Bellman-Ford on the other hand with a FIFO
queue? Is there a functional reason, or is it for runtime
optimization?
It is a runtime optimization rather than a functional requirement.
I am looking for a comprehensive description. I couldn't find one by browsing, as the material on the web is scattered, and it's not in my scope currently.
Comparing classification and evolutionary computing is comparing apples to oranges. Let me explain:
Classification is a type of problem, where the goal is to determine a label given some input. (Typical example, given pixel values, determine image label).
Evolutionary computing is a family of algorithms to solve different types of problems. They work with a "population" of candidates (imagine a set of different neural networks trying to solve a given problem). Somehow you evaluate how good each candidate is in the given task (typically using a "fitness function", but there are other methods). Then a new generation of candidates is produced, taking the best candidates from the previous generation as a model, and including mutations and cross-over (that is, introducing changes). Repeat until happy.
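The loop described above (evaluate, select, recombine, mutate, repeat) can be sketched in a few lines. This is a toy illustration only: the candidates are plain numbers rather than neural networks, and the function name `evolve`, the parameter defaults and the blend-style cross-over are all assumptions chosen for brevity.

```python
import random

def evolve(fitness, pop_size=20, generations=50, sigma=0.5, seed=0):
    """Toy evolutionary loop over real-valued candidates (higher fitness is better)."""
    rng = random.Random(seed)
    population = [rng.uniform(-10, 10) for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluate every candidate and keep the better half as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a + b) / 2           # cross-over: blend two parents
            child += rng.gauss(0, sigma)  # mutation: small random change
            children.append(child)
        population = parents + children
    return max(population, key=fitness)
```

For example, `evolve(lambda x: -(x - 3) ** 2)` searches for the value of x that maximises the fitness function, which lies at x = 3.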
Evolutionary computing can absolutely be used for classification! But there are examples where it is used in different ways. You may use evolutionary computing to create an artificial neural network controlling a robot (in this case, inputs are sensor values, outputs are commands for actuators). Or to create original content free of a given goal, as in Picbreeder.
Classification may be solved using evolutionary computation (maybe this is why you were confused in the first place), but other techniques are also common. You can use decision trees or, notably, deep learning (based on backpropagation).
Deep-learning based on backpropagation may sound similar to evolutionary computation, but it is quite different. Here you have only one artificial neural network, and a clear rule (backpropagation) telling you which changes to introduce every iteration.
Hope this helps to complement other answers!
Classification algorithms and evolutionary computing are different approaches. However, they are related in some ways.
Classification algorithms aim to identify the class label of new instances. They are trained with some labeled instances. For example, recognition of digits is a classification problem.
Evolutionary algorithms are used to find the minimum or maximum of an optimization problem. They explore the solution space of the given problem using randomness. They can find a good solution in a reasonable time, but they are not guaranteed to find the global optimum in every problem.
In some classification approaches, evolutionary algorithms are used to find out the optimal value of the parameters.
I have a dataset consisting of 700 data points x 400 dimensions, belonging to 10 classes. I clustered this data to see how well the data points fit into clusters matching their classes. I performed two clustering experiments, one using basic k-means (Euclidean) and another using Affinity Propagation. I noticed that the k-means results are better, and obtained faster, than those of Affinity Propagation.
I could not understand the reason behind this. Can anyone help explain why I got such results (I thought Affinity Propagation was better than k-means)?
It could be a matter of granularity - the APC result could be close to a subclustering or superclustering of the class labels. There is a parameter that affects APC granularity (check yourself).
Another consideration is how you prepare the network that you give to APC (or any other network clustering algorithm). Ideally it should not be too dense. As a rough guideline, make sure that the distribution of { number of neighbours per node | all nodes } does not stray far outside the range from 0.5 * sqrt(N) to 2.0 * sqrt(N). Especially try to avoid hubs, that is, nodes that have many more neighbours than that upper bound.
As a sanity check, are the values that you give to APC similarities? They should be similarities, of course, not distances. You have a choice in how the similarity is computed. The standard way to restrain the number of neighbours is to use a cut-off. Experiment with combinations of these. Finally, you may also want to try MCL, an algorithm that predates APC and uses conceptually similar principles but is a bit cleaner in its formulation (alternation of simple matrix operations). It is probably faster.
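The two checks above (similarities rather than distances, and a cut-off that keeps the network sparse) could be sketched like this. The helper names `sparse_similarities` and `degree_outliers` are made up for the example, and negative squared Euclidean distance is just one common choice of similarity, not necessarily the best one for your data.

```python
import math

def sparse_similarities(points, cutoff):
    """Return {i: {j: similarity}}, keeping only pairs with similarity >= cutoff.

    Similarity here is the negative squared Euclidean distance (an assumed
    choice), so values closer to zero mean "more similar".
    """
    n = len(points)
    neighbours = {i: {} for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            sim = -d2
            if sim >= cutoff:
                neighbours[i][j] = sim
                neighbours[j][i] = sim
    return neighbours

def degree_outliers(neighbours):
    """Nodes whose neighbour count falls outside 0.5*sqrt(N) .. 2.0*sqrt(N)."""
    n = len(neighbours)
    low, high = 0.5 * math.sqrt(n), 2.0 * math.sqrt(n)
    return [i for i, nb in neighbours.items() if not low <= len(nb) <= high]
```

Tuning `cutoff` until `degree_outliers` returns few or no nodes is one way to apply the rough guideline before handing the similarities to the clustering algorithm.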
After profiling my Neural Nets' code I've realized that the method, which computes the weight changes for each arc in the network (-rate*gradient + momentum*previous_delta - decay*rate*weight), already given the gradient, is the bottleneck (55% inclusive samples).
Is there any trick to compute these values in an efficient manner?
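For concreteness, the update in question can be written as a single pass over flat, parallel lists, as in the sketch below. The function name `weight_deltas` and the flat-list layout are assumptions made for illustration; they are not the code that was profiled.

```python
def weight_deltas(gradients, previous_deltas, weights, rate, momentum, decay):
    """Compute -rate*gradient + momentum*previous_delta - decay*rate*weight
    for every arc, given one flat list per quantity (assumed layout)."""
    # Hoist the loop-invariant product out of the per-arc computation.
    decay_rate = decay * rate
    return [
        -rate * g + momentum * d - decay_rate * w
        for g, d, w in zip(gradients, previous_deltas, weights)
    ]
```

Keeping the per-arc values in contiguous arrays like this is also what makes the computation easy to hand to a vectorised numerical library later, if that turns out to be worthwhile.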
This is normal behaviour. I am assuming that you are using an iterative process to solve the weights at each evolution step (such as backpropagation?). If the number of neurons is large and the training (back-testing) algorithm is short, then it is normal that weight mutation such as this will consume a larger fraction of compute time during training of the neural network.
Did you get this result using a simple XOR problem or similar? If so, you will probably find that if you start to solve more complex problems (such as pattern detection in multidimensional arrays, image processing, etc.) that those functions will begin to consume an insignificant fraction of compute time.
If you are profiling, I would suggest you profile with a problem that is closer to the purpose for which the neural network is designed (I am guessing you didn't design it to solve XOR or play tic tac toe) and you will probably find that optimising code such as -rate*gradient + momentum*previous_delta - decay*rate*weight is more or less a waste of time, at least this is my experience.
If you do find that this code is compute-intensive in real-world applications, then I would suggest trying to reduce the number of times this line of code is executed via structural changes. Neural network optimization is a rich field and I can't possibly give you useful advice from such a broad question, but I will say that if your program is unusually slow, you're unlikely to see significant improvements by tinkering with such low-level code. I will, however, suggest the following from my own experience:
Consider parallelisation. Many search algorithms, such as those implemented in back-propagation techniques, are amenable to parallel attempts to improve convergence. As weight adjustments are identical in terms of computational demand for a given network, think static loops in OpenMP.
Modify the convergence criterion (the critical convergence rate before you stop adjusting weights) to perform fewer of these calculations.
Consider an alternative to deterministic solutions such as back-propagation, which are slightly more prone to local optimisation anyway. Consider Gaussian mutation (all things being equal, Gaussian mutation will 1) reduce time spent on mutation relative to backtesting, 2) increase convergence time and 3) be less prone to getting caught in local minima of the error search space).
Please note that this is a non-technical answer to what I have interpreted as a non-technical question.
My question is as follows: According to different sources, Dijkstra's algorithm is nothing but a variant of Uniform Cost Search. We know that Dijkstra's algorithm finds the shortest path between a source and all destinations (single-source). However, we can always modify Dijkstra to find the shortest path between a START and a GOAL state (when the goal is popped from the priority queue, we simply stop); but even so, the worst case will still be finding the shortest path from START to all other nodes (suppose the goal is the furthest node in the graph).
If we implement Dijkstra's algorithm using a min-priority queue, the running time will be O(E + V log V) with a Fibonacci heap (or O((V + E) log V) with a binary heap), where E is the number of edges and V the number of vertices.
Since Uniform Cost Search is the same as Dijkstra (with a slightly different implementation), the running time of UCS should be similar to Dijkstra's, right? However, according to my AI class, Uniform Cost Search is exponential in the worst case: it takes O(b^(1 + ⌊C*/ε⌋)), where b is the branching factor, C* is the cost of the optimal solution and ε is a lower bound on the cost of each step.
How can both algorithms be the same while they have different running times? Is the running time the same, but the way we look at it is different?
I would appreciate your help :):) Thank you
Is the running time the same, but the way we look at it is different?
Yes. Uniform cost search can be used on infinitely large graphs, on which Dijkstra's original algorithm would never terminate. In such situations, it's no use defining complexity in terms of V and E as both might be infinite and the resulting big-O figure meaningless.
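To illustrate the difference, here is a minimal sketch of uniform cost search over an implicitly defined graph. The function name `uniform_cost_search` and the `is_goal` / `successors` callbacks are assumptions made for the example; the point is that the graph is generated on demand, so V and E never need to be known or even finite.

```python
import heapq
from itertools import count

def uniform_cost_search(start, is_goal, successors):
    """Return (cost, path) for a cheapest path to a goal state, or None.

    `successors(state)` yields (next_state, step_cost) pairs with positive
    step costs, so the graph is never stored explicitly.
    """
    tie = count()  # tie-breaker so states themselves are never compared
    frontier = [(0, next(tie), start)]
    best = {start: 0}
    parent = {start: None}
    while frontier:
        cost, _, state = heapq.heappop(frontier)
        if cost > best[state]:
            continue  # stale entry: a cheaper route was found afterwards
        if is_goal(state):
            # Stop as soon as the goal is popped; rebuild the path backwards.
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return cost, path[::-1]
        for nxt, step in successors(state):
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                parent[nxt] = state
                heapq.heappush(frontier, (new_cost, next(tie), nxt))
    return None
```

With states taken to be integers and `successors = lambda n: [(n + 1, 1), (2 * n, 1)]`, the graph is infinite, yet a search from 1 to 10 terminates because only states cheaper than the goal are ever expanded.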
Is there a machine learning concept (algorithm or multi-classifier system) that can detect, or at least try to detect, the variance of network attacks?
One of the biggest problems for signature based intrusion detection systems is the inability to detect new or variant attacks.
From what I have read, anomaly detection still seems to be a statistically based endeavour: it refers to detecting patterns in a given data set, which isn't the same as detecting variation in packet payloads. Anomaly-based NIDS monitors network traffic and compares it against an established baseline of a normal traffic profile. The baseline characterizes what is "normal" for the network, such as the normal bandwidth usage, the common protocols used, correct combinations of port numbers and devices, etc.
Say someone uses Virus A to propagate through a network, and someone then writes a rule to stop Virus A, but another person writes a "variation" of Virus A called Virus B purely for the purpose of evading that initial rule while still using most if not all of the same tactics/code. Is there not a way to detect variance?
If there is, what's the umbrella term it would come under? I've been under the illusion that anomaly detection was it.
Could machine learning be used for pattern recognition (rather than pattern matching) at the packet payload level?
I think your intuition to look at machine learning techniques is correct, or will turn out to be correct (One of the biggest problems for signature based intrusion detection systems is the inability to detect new or variant attacks.) The superior performance of ML techniques is in general due to the ability of these algorithms to generalize (a multiplicity of soft constraints rather than a few hard constraints) and to adapt (updates based on new training instances to frustrate simple countermeasures), two attributes that I would imagine are crucial for identifying network attacks.
The theoretical promise aside, there are practical difficulties with applying ML techniques to problems like the one recited in the OP. By far the most significant is the difficulty in gathering data to train the classifier. In particular, reliably labeling data points as "intrusion" is probably not easy; likewise, my guess is that these instances are sparsely distributed in the raw data.
I suppose it's this limitation that has led to the increased interest (as evidenced at least by the published literature) in applying unsupervised ML techniques to problems like network intrusion detection.
Unsupervised techniques differ from supervised techniques in that the data is fed to the algorithms without a response variable (i.e., without the class labels). In these cases you are relying on the algorithm to discern structure in the data--i.e., some inherent ordering in the data into reasonably stable groups or clusters (possibly what you, the OP, had in mind by "variance"). So with an unsupervised technique, there is no need to explicitly show the algorithm instances of each class, nor is it necessary to establish baseline measurements, etc.
The most frequently used unsupervised ML technique applied to problems of this type is probably the Kohonen Map (also sometimes called a self-organizing map or SOM).
I use Kohonen Maps frequently, but so far not for this purpose. There are, however, numerous published reports of their successful application in your domain of interest, e.g.,
Dynamic Intrusion Detection Using Self-Organizing Maps
Multiple Self-Organizing Maps for Intrusion Detection
I know MATLAB has at least one available implementation of Kohonen Map--the SOM Toolbox. The homepage for this Toolbox also contains a brief introduction to Kohonen Maps.
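To give a feel for what a Kohonen Map does, here is a very small one-dimensional sketch in plain Python. It is not the SOM Toolbox and not tuned for intrusion data; the function names `train_som` and `best_matching_unit`, the linear decay schedules and all parameter defaults are assumptions made for illustration, and the inputs are assumed to be numeric feature vectors scaled to [0, 1].

```python
import math
import random

def best_matching_unit(units, x):
    """Index of the unit whose weight vector is closest to x."""
    return min(range(len(units)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(units[i], x)))

def train_som(data, n_units=4, epochs=50, lr0=0.5, radius0=2.0, seed=0):
    """Train a 1-D Kohonen map on unlabeled data; returns the unit weights."""
    rng = random.Random(seed)
    dim = len(data[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays over time
        radius = max(radius0 * (1 - frac), 0.5)  # neighbourhood shrinks over time
        for x in rng.sample(data, len(data)):    # one shuffled pass over the data
            bmu = best_matching_unit(units, x)
            for i, w in enumerate(units):
                # Units close to the winner on the map are pulled harder.
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return units
```

After training, no labels have been used: each input is simply assigned to its best matching unit, and inputs that land on the same or neighbouring units are treated as similar.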