Can I use Prim's algorithm instead of Dijkstra's to find shortest path?

I have been struggling all day to understand and implement Dijkstra's algorithm, with no significant results. I have a matrix of cities and the distances between them. Given an origin and a destination, I want to find the shortest path between the two cities.
Example:
    __0__ __1__ __2__
0 |  0  | 34  |  0  |
  |-----|-----|-----|
1 | 34  |  0  | 23  |
  |-----|-----|-----|
2 |  0  | 23  |  0  |
   ----- ----- -----
I started wondering whether there is another way to solve this. What if I apply Prim's algorithm starting from the origin and then walk the resulting tree until I find the destination?

You could apply Prim's algorithm and then walk the resulting tree, but your answer may be wrong. Assume that you have a graph where every edge has the same weight. Prim's algorithm simply chooses a minimum-weight edge from the set of edges that could be added to the tree. It is possible that it will not choose an edge that lies on a shortest path between two nodes.
Assume:
    __0__ __1__ __2__
0 |  0  |  1  |  1  |
  |-----|-----|-----|
1 |  1  |  0  |  1  |
  |-----|-----|-----|
2 |  1  |  1  |  0  |
   ----- ----- -----
Starting from node 0 you could, via Prim's, choose the edges 0-1 and 0-2 to make your tree. Alternatively, you could pick edges 0-1 and 1-2. Under the first edge set, you would find the minimum-length path from 0 to 2 (the direct edge, length 1), but under the second edge set the only tree path from 0 to 2 is 0-1-2 (length 2), so you would not find the minimal path. Since you can't determine a priori which edges get added by Prim's algorithm, you can't use it to find a shortest path.
You could consider the Bellman-Ford algorithm, but unless you're dealing with negative edge weights I find Dijkstra's algorithm preferable.
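To make the difference concrete, here is a minimal Python sketch (not part of the original answer). It assumes a hypothetical 3-node graph with weights 0-1 = 2, 1-2 = 2 and 0-2 = 3, chosen so that Prim's choice is forced instead of depending on tie-breaking; the function names are made up for illustration, and 0 in the matrix is taken to mean "no edge", as in the question.

```python
import heapq

def prim_tree(adj, start):
    """Build a spanning tree with Prim's algorithm; returns a child -> parent map."""
    parent = {start: None}
    heap = [(w, start, v) for v, w in enumerate(adj[start]) if w > 0]
    heapq.heapify(heap)
    while heap and len(parent) < len(adj):
        w, u, v = heapq.heappop(heap)
        if v in parent:
            continue  # v is already in the tree
        parent[v] = u
        for nxt, w2 in enumerate(adj[v]):
            if w2 > 0 and nxt not in parent:
                heapq.heappush(heap, (w2, v, nxt))
    return parent

def tree_path_length(adj, parent, target):
    """Length of the unique tree path from the tree root to target."""
    total, node = 0, target
    while parent[node] is not None:
        total += adj[parent[node]][node]
        node = parent[node]
    return total

def dijkstra(adj, start):
    """Shortest distances from start (non-negative weights only)."""
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in enumerate(adj[u]):
            if w > 0 and d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

# Hypothetical graph: 0 means "no edge" (or the diagonal).
adj = [
    [0, 2, 3],
    [2, 0, 2],
    [3, 2, 0],
]
parent = prim_tree(adj, 0)
via_tree = tree_path_length(adj, parent, 2)  # path 0-1-2 in the tree
shortest = dijkstra(adj, 0)[2]               # direct edge 0-2
print(via_tree, shortest)
```

Here the minimum spanning tree contains edges 0-1 and 1-2, so walking the tree from 0 to 2 costs 4, while Dijkstra's algorithm finds the direct edge of cost 3.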

Simple cumulative increase in Prometheus

I have an application that increments a Prometheus counter when it receives a particular HTTP request. The application runs in Kubernetes, has multiple instances, and is redeployed multiple times a day. Using the query http_requests_total{method="POST",path="/resource/aaa",statusClass="2XX"} produces a graph displaying cumulative request counts per instance, as expected.
I would like to create a Grafana graph that shows the cumulative frequency of requests received over the last 7 days.
My first thought was to use increase(...[7d]) in order to account for any metrics starting outside of the 7-day window (like in the image shown) and then sum those values.
I've come to the realisation that sum(increase(http_requests_total{method="POST",path="/resource/aaa",statusClass="2XX"}[7d])) does in fact give the correct answer for individual points in time. However, the resulting graph isn't quite what was asked for, because the component increase(...) values rise and fall over the course of the week.
How would I go about creating a graph that shows the cumulative sum of the increase in these metrics over the past 7 days? For example, given the following simplified data:
| Day | # Requests |
|-----|------------|
| 1 | 10 |
| 2 | 5 |
| 3 | 15 |
| 4 | 10 |
| 5 | 20 |
| 6 | 5 |
| 7 | 5 |
| 8 | 10 |
If I were to view a graph of day 2 to day 8, I would like the graph to render a line as follows:
| Day | Cumulative Requests |
|-----|---------------------|
| d0 | 0 |
| d1 | 5 |
| d2 | 20 |
| d3 | 30 |
| d4 | 50 |
| d5 | 55 |
| d6 | 60 |
| d7 | 70 |
Where d0 represents the initial value in the graph
Thanks
Prometheus doesn't provide functionality for returning the cumulative increase over multiple time series on the selected time range.
If you still need this functionality, then try VictoriaMetrics, a Prometheus-like monitoring solution I work on. It allows calculating the cumulative increase over multiple counters. For example, the following MetricsQL query returns the cumulative increase over all the time series with the http_requests_total name on the selected time range in Grafana:
running_sum(sum(increase(http_requests_total)))
How does it work?
1. It calculates the increase for each time series with the http_requests_total name. Note that the increase() in the query above doesn't contain a lookbehind window in square brackets. VictoriaMetrics automatically sets the lookbehind window to the step value, which is passed by Grafana to the /api/v1/query_range endpoint. The step value is the interval between points on the graph.
2. It sums the increases returned at step 1 with the sum() function, individually for each point on the graph.
3. It calculates the cumulative increase over the per-step increases returned at step 2 with the running_sum function.
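The arithmetic behind these three steps can be sketched in a few lines of Python. This is only an illustration of the calculation, not how VictoriaMetrics implements it; the two per-instance series are hypothetical and were picked so that their per-day sums match days 2 to 8 of the table in the question.

```python
from itertools import accumulate

def cumulative_increase(per_series_increases):
    """Sum per-step increases across series, then take a running sum.

    per_series_increases: one list per time series, each holding the
    increase observed during each step (already computed, as in step 1).
    """
    # Step 2: sum the increases across series for every point on the graph.
    summed = [sum(step) for step in zip(*per_series_increases)]
    # Step 3: running (cumulative) sum over the per-step totals.
    return list(accumulate(summed))

# Hypothetical per-step increases for two instances (days 2..8).
instance_a = [3, 10, 4, 12, 5, 0, 6]
instance_b = [2, 5, 6, 8, 0, 5, 4]

result = cumulative_increase([instance_a, instance_b])
print(result)  # [5, 20, 30, 50, 55, 60, 70]
```

The output matches the d1 to d7 values of the desired graph, with d0 = 0 as the starting point.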
If I understood the idea of your question correctly, I think I managed to create such a graph with a query like this:
sum(max_over_time(counterName{someLabel="desiredlabelValue"}[7d]))
A graph produced by it looks like the blue one:
The future part of the graph decreases for two reasons: the future processing obviously hasn't happened yet, and processing that is more than 7 days old slides out of the moving 7-day inspection window.

Why is this XOR neural network having 2 outputs?

Normally, a simple neural network that solves XOR has 2 inputs, 2 neurons in the hidden layer, and 1 neuron in the output layer.
However, the following example implementation has 2 output neurons, and I don't get it:
https://github.com/deeplearning4j/dl4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/feedforward/xor/XorExample.java
Why did the author put 2 output neurons in there?
Edit:
The author of the example noted that he is using 4 neurons in the hidden layer and 2 neurons in the output layer. But I still don't get why: why a shape of {4,2} instead of {2,1}?
This is called one-hot encoding. The idea is that you have one neuron per class. Each neuron gives the probability of that class.
I don't know why he uses 4 hidden neurons. 2 should be enough (if I remember correctly).
The author uses the Evaluation class in the end (for stats of how often the network gives the correct result). This class needs one neuron per classification to work correctly, i.e. one output neuron for true and one for false.
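As a small illustration of one-hot encoding for XOR (a plain Python sketch, not DL4J code), each label becomes a vector with one slot per class. The column order used here, index 0 for false and index 1 for true, is an assumption made for the example.

```python
def one_hot(label, num_classes):
    """Return a list with a 1 at position `label` and 0 elsewhere."""
    vec = [0] * num_classes
    vec[label] = 1
    return vec

def predicted_class(outputs):
    """Pick the class whose output neuron has the highest value."""
    return max(range(len(outputs)), key=lambda i: outputs[i])

xor_inputs = [(0, 0), (1, 0), (0, 1), (1, 1)]
# Assumed convention: class 0 = false, class 1 = true.
xor_labels = [one_hot(a ^ b, 2) for a, b in xor_inputs]
print(xor_labels)  # [[1, 0], [0, 1], [0, 1], [1, 0]]
```

With two output neurons, a network output such as [0.2, 0.8] is read as "class 1 (true)" by taking the index of the largest value, which is what a per-class evaluation needs.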
It might be helpful to think of it like this:
Training Set        Label Set
    | 0 | 1             | 0 | 1
  0 | 0 | 0           0 | 0 | 1
  1 | 1 | 0           1 | 1 | 0
  2 | 0 | 1           2 | 1 | 0
  3 | 1 | 1           3 | 0 | 1
So [[0,0], 0], [[0,1], 0], etc. for the Training Set.
If you're using the two column Label Set, 0 and 1 correspond to true or false.
Thus, [0,0] correctly maps to false, [1,0] correctly maps to true, etc.
A pretty good article that slightly modifies the original can be found here: https://medium.com/autonomous-agents/how-to-teach-logic-to-your-neuralnetworks-116215c71a49

Normalize Count Measure in Tableau

I am trying to create a plot similar to those created by Google's ngram viewer. I have the ngrams that correspond to year, but some years have much more data than others; as a result, plotting from absolute counts doesn't get me the information I want. I'd like to normalize it so that I get the counts as a percentage of the total samples for that year.
I've found ways to normalize data to ranges in Tableau, but nothing about normalizing by count. I also see that there is a count distinct function, but that doesn't appear to do what I want.
How can I do this in Tableau?
Thanks in advance for your help!
Edit:
Here is some toy data and the desired output.
Toy Data:
+---------+------+
| Pattern | Year |
+---------+------+
| a | 1 |
| a | 1 |
| a | 1 |
| b | 1 |
| b | 1 |
| b | 1 |
| a | 2 |
| b | 2 |
| a | 3 |
| b | 4 |
+---------+------+
Desired Output:
Put [Year] on the Columns shelf. If it is really a Date field rather than a number, choose any truncation level you like, or choose exact date. Make sure to treat it as a discrete dimension (the pill should be blue).
Put [Number of Records] on the Rows shelf. It should be a continuous measure, i.e. SUM([Number of Records]).
Put [Pattern] on the Color shelf.
At this point, you should be looking at a graph of raw counts. To convert them to percentages, right-click on the [Number of Records] field on the Rows shelf and choose Quick Table Calc->Percent of Total. Finally, right-click on [Number of Records] a second time and choose Compute Using->Pattern.
You might want to sort the patterns. One easy way is to just drag them in the color legend.
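To check what the percent-of-total calculation should produce, here is a small Python sketch of the same arithmetic on the toy data from the question. It is not Tableau code; it only assumes that "Compute Using->Pattern" means each pattern's count is divided by the total count for its year.

```python
from collections import Counter

def share_per_year(rows):
    """Return {(pattern, year): fraction of that year's records}."""
    pair_counts = Counter(rows)                         # records per (pattern, year)
    year_totals = Counter(year for _, year in rows)     # records per year
    return {
        (pattern, year): count / year_totals[year]
        for (pattern, year), count in pair_counts.items()
    }

# Toy data from the question, as (Pattern, Year) pairs.
rows = [
    ("a", 1), ("a", 1), ("a", 1), ("b", 1), ("b", 1), ("b", 1),
    ("a", 2), ("b", 2), ("a", 3), ("b", 4),
]
shares = share_per_year(rows)
for key in sorted(shares, key=lambda k: (k[1], k[0])):
    print(key, shares[key])
```

For year 1 this gives 0.5 for each pattern even though it holds most of the records, which is the normalization the question asks for.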

Bootstrap weighted data - Matlab

I have a simple dataset with values and absolute frequencies, like the table below:
value|freq
-----------
1 | 10
3 | 20
4 | 10
3 | 10
And now I'd like to calculate the frequency table, like:
value| %
-----------
1 | 1/5
3 | 3/5
4 | 1/5
And last step, I'd like to compute the bootstrap CI with matlab. I have a lot of rows in the dataset.
I've calculated the frequency table with the grpstats command in Matlab, but I don't know how I can use it with the bootstrp function in Matlab.
Any help or suggestions would be really appreciated.
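For reference, the frequency-table step described above can be sketched in Python as follows. This only reproduces the arithmetic of the table (rows with the same value are merged before dividing by the total); it does not address the bootstrap step, and the function name is made up for illustration.

```python
from collections import defaultdict
from fractions import Fraction

def relative_frequencies(rows):
    """Merge duplicate values and return {value: share of total frequency}."""
    totals = defaultdict(int)
    for value, freq in rows:
        totals[value] += freq
    grand_total = sum(totals.values())
    return {value: Fraction(count, grand_total) for value, count in totals.items()}

# (value, freq) rows from the question; value 3 appears twice.
rows = [(1, 10), (3, 20), (4, 10), (3, 10)]
table = relative_frequencies(rows)
print(table)
```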

Training LIBSVM with multivariate data in MATLAB

My general question is: how does LIBSVM perform multivariate regression?
In detail, I have some data for a certain number of links (for example, 3 links). Each link has 3 variables which, when used in a model, give the output Y. I have data collected on these links at some interval.
LinkId | var1 | var2 | var3 | var4 (OUTPUT)
1      | 10   | 12.1 | 2.2  | 3
2      | 11   | 11.2 | 2.3  | 3.1
3      | 12   | 12.4 | 4.1  | 1
1      | 13   | 11.8 | 2.2  | 4
2      | 14   | 12.7 | 2.3  | 2
3      | 15   | 10.7 | 4.1  | 6
1      | 16   | 8.6  | 2.2  | 6.6
2      | 17   | 14.2 | 2.3  | 4
3      | 18   | 9.8  | 4.1  | 5
I need to perform prediction to find the output of
(2,19,10.2,2.3).
How can I do that in Matlab with LIBSVM, using the above data for training? Can I pass the whole data set to svmtrain to create a model, or do I need to train on each link separately and use the model created for that link for prediction? Does it make any difference?
NOTE: Notice that each link with the same ID has the same value.
This is not really a Matlab or LIBSVM question but rather a generic SVM-related one.
My general question is: how does LIBSVM perform multivariate regression?
LIBSVM is just a library which, among other things, implements the Support Vector Regression (SVR) model for regression tasks. In short, in the linear case, SVR tries to find a hyperplane such that your data points lie within some margin around it (which is in a sense dual to the classical SVM, which tries to separate the data with as large a margin as possible).
In the nonlinear case the kernel trick is used (in the same fashion as in SVM), so it is still looking for a hyperplane, but in a feature space induced by the particular kernel, which results in a nonlinear regression in the input space.
A quite nice introduction to SVRs can be found here:
http://alex.smola.org/papers/2003/SmoSch03b.pdf
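The "margin around the hyperplane" idea corresponds to the epsilon-insensitive loss used by SVR: errors smaller than epsilon cost nothing, and larger errors are penalized linearly. The following Python sketch only illustrates that loss; it is not LIBSVM code, and the sample predictions and the epsilon value are made up.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-sample SVR loss: max(0, |y - f(x)| - epsilon)."""
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]

# Hypothetical targets and predictions.
losses = epsilon_insensitive_loss([3.0, 4.0], [3.05, 4.5], epsilon=0.1)
print(losses)
```

The first prediction is off by 0.05, which is inside the epsilon tube, so its loss is zero; the second is off by 0.5, so only the part beyond epsilon (about 0.4) is penalized.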
How can I do that in Matlab with LIBSVM, using the above data for training? Can I pass the whole data set to svmtrain to create a model, or do I need to train on each link separately and use the model created for that link for prediction? Does it make any difference? NOTE: Notice that each link with the same ID has the same value.
You could train SVR (as it is a regression problem) with the whole data, but:
- it seems that var3 and LinkId are the same variable (1->2.2, 2->2.3, 3->4.1); if this is the case, you should remove the LinkId column,
- are the values of var1 unique ascending integers? If so, this is probably also a useless feature (the values do not seem to carry any information; they look like ID numbers),
- you should preprocess your data before applying SVM so that, e.g., each column contains values from the [0,1] interval; otherwise some features may become more important than others just because of their scale.
Now, if you would like to create a separate model for each link and follow the above clues, you end up with 1 input variable (var2) and 1 output variable (var4), so I would not recommend such a step. In general it seems that you have a very limited feature set; it would be valuable to gather more informative features.
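The scaling step mentioned above can be sketched as simple min-max normalization. This is a plain Python illustration (not Matlab or LIBSVM code) applied to the var2 and var3 columns from the question; treating a constant column as all zeros is an assumption made here to avoid dividing by zero.

```python
def min_max_scale(rows):
    """Scale every column of `rows` linearly into the [0, 1] interval."""
    scaled_columns = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        # Assumption: a constant column is mapped to 0.0 everywhere.
        scaled_columns.append([(x - lo) / span if span else 0.0 for x in col])
    return [list(row) for row in zip(*scaled_columns)]

# (var2, var3) pairs from the table in the question.
features = [
    (12.1, 2.2), (11.2, 2.3), (12.4, 4.1),
    (11.8, 2.2), (12.7, 2.3), (10.7, 4.1),
    (8.6, 2.2), (14.2, 2.3), (9.8, 4.1),
]
scaled = min_max_scale(features)
for row in scaled:
    print(row)
```

When predicting for a new point, the same minimum and span learned from the training columns should be applied to it, so that training and test data are on the same scale.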