I have a simple dataset with values and absolute frequencies, like the table below:
value|freq
-----------
1 | 10
3 | 20
4 | 10
3 | 10
Now I'd like to calculate the relative frequency table, like:
value| %
-----------
1 | 1/5
3 | 3/5
4 | 1/5
As a last step, I'd like to compute a bootstrap CI in MATLAB. The dataset has a lot of rows.
I've calculated the frequency table with the grpstats command in MATLAB, but I don't know how to use it with the bootstrp function.
Any help or suggestions would be really appreciated.
I have an application that increments a Prometheus counter when it receives a particular HTTP request. The application runs in Kubernetes, has multiple instances and redeploys multiple times a day. Using the query http_requests_total{method="POST",path="/resource/aaa",statusClass="2XX"} produces a graph displaying cumulative request counts per instance as is expected.
I would like to create a Grafana graph that shows the cumulative frequency of requests received over the last 7 days.
My first thought was to use increase(...[7d]) to account for any metrics starting outside of the 7-day window (as in the image shown) and then sum those values.
I've come to the realisation that sum(increase(http_requests_total{method="POST",path="/resource/aaa",statusClass="2XX"}[7d])) does in fact give the correct answer for individual points in time. However, the resulting graph isn't quite what was asked for, because the component increase(...) values rise and fall over the week.
How would I go about creating a graph that shows the cumulative sum of the increase in these metrics over the past 7 days? For example, given the following simplified data:
| Day | # Requests |
|-----|------------|
| 1 | 10 |
| 2 | 5 |
| 3 | 15 |
| 4 | 10 |
| 5 | 20 |
| 6 | 5 |
| 7 | 5 |
| 8 | 10 |
If I were to view a graph of day 2 to day 8, I would like the graph to render a line as follows:
| Day | Cumulative Requests |
|-----|---------------------|
| d0 | 0 |
| d1 | 5 |
| d2 | 20 |
| d3 | 30 |
| d4 | 50 |
| d5 | 55 |
| d6 | 60 |
| d7 | 70 |
Here d0 represents the initial value in the graph.
Thanks
Prometheus doesn't provide functionality for returning the cumulative increase over multiple time series on the selected time range.
If you still need this functionality, then try VictoriaMetrics, a Prometheus-like monitoring solution I work on. It allows calculating the cumulative increase over multiple counters. For example, the following MetricsQL query returns the cumulative increase over all the time series with the http_requests_total name on the selected time range in Grafana:
running_sum(sum(increase(http_requests_total)))
How does it work?
1. It calculates the increase for each time series with the http_requests_total name. Note that the increase() in the query above doesn't contain a lookbehind window in square brackets. VictoriaMetrics automatically sets the lookbehind window to the step value, which Grafana passes to the /api/v1/query_range endpoint. The step value is the interval between points on the graph.
2. It sums the increases returned at step 1 with the sum() function, individually for each point on the graph.
3. It calculates the cumulative increase over the per-step increases returned at step 2 with the running_sum function.
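The arithmetic behind these three steps can be sketched outside of any query engine. The Python snippet below is only an illustration, not how VictoriaMetrics is implemented. The two instance names and the split of requests between them are invented for the example; only the per-day totals come from days 2 to 8 of the table in the question.

```python
from itertools import accumulate

# Step 1: hypothetical per-step increases for two instances.
# The split between instances is made up; the per-day totals match
# days 2..8 of the table in the question.
increases = {
    "instance_a": [3, 10, 4, 12, 5, 0, 6],
    "instance_b": [2, 5, 6, 8, 0, 5, 4],
}

# Step 2: sum the increases across series, point by point.
summed = [sum(step) for step in zip(*increases.values())]

# Step 3: running sum over the per-step totals.
cumulative = list(accumulate(summed))

print(summed)      # [5, 15, 10, 20, 5, 5, 10]
print(cumulative)  # [5, 20, 30, 50, 55, 60, 70]
```

The last list matches the d1..d7 values requested in the question; d0 is the implicit starting value of 0.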
If I understood the idea of your question correctly, I think I managed to create such a graph with a query like this:
sum(max_over_time(counterName{someLabel="desiredlabelValue"}[7d]))
A graph produced by it looks like the blue one:
The future part of the graph decreases for two reasons: the future processing obviously hasn't happened yet, and the processing that is more than 7 days old slides out of the moving 7-day inspection window.
I have a 2-D time series for which I take 1-minute snapshots that I put in my InfluxDB.
To give a concrete example, consider a yield curve: this is a curve giving the interest rate by maturity date, and it looks like this:
maturity | 1 YEAR | 2 YEARS | 2 YEARS | 3 YEARS | 4 YEARS | 5 YEARS |
interest | 0.5 | 0.75 | 0.83 | 0.99 | 1.01 | 1.05 |
My application takes snapshots of the curve and stores them in InfluxDB.
Now I want to plot these snapshots in Grafana. So at one particular timestamp I want to plot the curve (the X axis will be my maturities, and the Y axis the corresponding interest rate for each maturity).
Can this be done in Grafana?
To the best of my knowledge, this is not currently possible with Grafana. One of your axes must always be time.
I have a list of employees and turn-around times, like so:
order | employee | turn-around
------------------------------
1 | Mark | 1
2 | Mark | 2
3 | Mark | 10
4 | John | 1
5 | John | 5
6 | John | 20
7 | Chad | 15
8 | Chad | 20
9 | Chad | 60
So, as you can see, the data tends to be somewhat skewed, and so I'd like to summarize each employee by their median turn-around:
employee | median turn-around
-----------------------------
Mark | 2
John | 5
Chad | 20
I'd also like to present each employee with a comparison of how they're doing compared to the other employees. For this summary, I'd like to use the difference from the median of the medians:
employee | median turn-around | median absolute difference
----------------------------------------------------------
Mark | 2 | -3
John | 5 | 0
Chad | 20 | +15
I'd like to have this automatically done in Crystal Reports 2013 so each employee gets their own page with a histogram of their turn-around times, their median turn-around time, and how it compares to the median of all the other employees' median turn-around times.
Alas, my crystal-fu is failing me in the last part. I have grouped the records by employee, created a formula field to calculate the turn-around time in the details, and created a formula to retrieve the median turn-around for the employee in the group footer. I've managed to create my histogram. But I cannot for the life of me figure out how to aggregate the group medians and report the median of that median without querying the same data again using a subreport. Is it possible to accomplish this without a subreport?
I have a table in Excel with the structure below:
Names | Pass | Fail |
------------------------
NameA | 2 | 3 |
NameB | 6 | 7 |
NameC | 3 | 4 |
I get the Pass/Fail counts from a series of rows using a COUNTIF formula.
If I generate a chart for this table in Excel now, it is based on the counts.
E.g. for NameA, out of 5 rows, 2 are Pass and 3 are Fail.
I want the chart to use percentages on the Y-axis instead, so that out of 100% it shows 40% Pass and 60% Fail.
Can someone please help me out with this?
If you want to plot a percentage, calculate it in the sheet, and plot the calculations.
Alternatively, if you use a stacked 100% chart (column or bar), the bars will be scaled so they show the percentage. However, the data will still be the input values, and data labels will show these values and not percentages, and you will have both bars in the chart.
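For the first option, the calculation is just each count divided by the row total. Here is a minimal Python sketch of that arithmetic, using the counts from the question; the helper name to_percentages is made up for illustration. In the sheet itself this would correspond to a formula along the lines of =B2/(B2+C2), assuming Pass is in column B and Fail in column C.

```python
# Pass/Fail counts copied from the table in the question.
counts = {"NameA": (2, 3), "NameB": (6, 7), "NameC": (3, 4)}


def to_percentages(passed, failed):
    """Return (pass %, fail %) of the row total (hypothetical helper)."""
    total = passed + failed
    if total == 0:
        # Avoid dividing by zero for a name with no rows at all.
        return 0.0, 0.0
    return 100.0 * passed / total, 100.0 * failed / total


percentages = {name: to_percentages(p, f) for name, (p, f) in counts.items()}

for name, (pass_pct, fail_pct) in percentages.items():
    print(f"{name}: {pass_pct:.1f}% pass, {fail_pct:.1f}% fail")
```

Plotting these two derived columns instead of the raw counts gives a Y-axis in percent, with NameA at 40% Pass and 60% Fail as in the question.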
My general question is: how does LIBSVM perform multivariate regression?
In detail, I have some data for a certain number of links (for example, 3 links). Each link has 3 input variables which, when used in a model, give an output Y. I have data collected on these links at some interval.
LinkId | var1 | var2 | var3 | var4(OUTPUT)
1 | 10 | 12.1 | 2.2 | 3
2 | 11 | 11.2 | 2.3 | 3.1
3 | 12 | 12.4 | 4.1 | 1
1 | 13 | 11.8 | 2.2 | 4
2 | 14 | 12.7 | 2.3 | 2
3 | 15 | 10.7 | 4.1 | 6
1 | 16 | 8.6 | 2.2 | 6.6
2 | 17 | 14.2 | 2.3 | 4
3 | 18 | 9.8 | 4.1 | 5
I need to perform prediction to find the output of
(2,19,10.2,2.3).
How can I do that in MATLAB with LIBSVM, using the data above for training? Can I pass the whole data to svmtrain to create one model, or do I need to train on each link separately and use the resulting model for prediction? Does it make any difference?
NOTE: Each link with the same ID has the same value.
This is not really a MATLAB or LIBSVM question but rather a generic SVM-related one.
My general question is: how does LIBSVM perform multivariate regression?
LIBSVM is just a library which, among other things, implements the Support Vector Regression (SVR) model for regression tasks. In short, in the linear case, SVR tries to find a hyperplane such that your data points lie within some margin around it (which is more or less the dual of the classical SVM approach, which tries to separate the data with as big a margin as possible).
In the non-linear case the kernel trick is used (in the same fashion as in SVM), so it is still looking for a hyperplane, but in a feature space induced by the particular kernel, which results in non-linear regression in the input space.
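To make the "margin around the hyperplane" idea concrete, here is a small Python sketch of the epsilon-insensitive loss commonly used in SVR. It only illustrates the idea and is not LIBSVM code; the function name and the epsilon value of 0.1 are chosen arbitrarily for the example.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside a tube of half-width epsilon, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# A prediction within epsilon of the target costs nothing...
inside = epsilon_insensitive_loss(3.0, 3.05)
# ...while one outside the tube is penalised by the excess distance.
outside = epsilon_insensitive_loss(3.0, 3.5)

print(inside, outside)
```

Points that fall inside the tube do not contribute to the loss, which is why the fitted function is determined only by the points on or outside the margin.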
A good introduction to SVR can be found here:
http://alex.smola.org/papers/2003/SmoSch03b.pdf
How can I do that in MATLAB with LIBSVM, using the data above for training? Can I pass the whole data to svmtrain to create one model, or do I need to train on each link separately and use the resulting model for prediction? Does it make any difference? NOTE: Each link with the same ID has the same value.
You could train SVR (as it is a regression problem) with the whole data, but:
- it seems that var3 and LinkId carry the same information (1->2.2, 2->2.3, 3->4.1); if this is the case, you should remove the LinkId column,
- are the values of var1 unique ascending integers? If so, they are probably also a useless feature (they do not seem to carry any information and look like ID numbers),
- you should preprocess your data before applying SVM so that, e.g., each column contains values from the [0,1] interval; otherwise some features may become more important than others just because of their scale.
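The scaling suggested in the last point can be sketched as plain min-max scaling. The Python snippet below is a minimal illustration using the var2 and var3 columns from the question; the helper names fit_min_max and apply_min_max are made up for the example and are not part of LIBSVM.

```python
def fit_min_max(column):
    """Return the (min, max) of a training column."""
    return min(column), max(column)


def apply_min_max(value, lo, hi):
    """Map a value linearly so that lo -> 0 and hi -> 1."""
    if hi == lo:
        # A constant column carries no information; map it to 0.
        return 0.0
    return (value - lo) / (hi - lo)


# Training columns copied from the table in the question.
var2 = [12.1, 11.2, 12.4, 11.8, 12.7, 10.7, 8.6, 14.2, 9.8]
var3 = [2.2, 2.3, 4.1, 2.2, 2.3, 4.1, 2.2, 2.3, 4.1]

var2_lo, var2_hi = fit_min_max(var2)
var3_lo, var3_hi = fit_min_max(var3)

var2_scaled = [apply_min_max(v, var2_lo, var2_hi) for v in var2]
var3_scaled = [apply_min_max(v, var3_lo, var3_hi) for v in var3]

# The point to predict must be scaled with the training min/max,
# not with statistics computed from the point itself.
query_var2 = apply_min_max(10.2, var2_lo, var2_hi)
query_var3 = apply_min_max(2.3, var3_lo, var3_hi)
```

Keeping the fitted minimum and maximum separate from the scaling step makes it possible to apply exactly the same transformation to new points at prediction time.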
Now, if you would like to create a separate model for each link and follow the suggestions above, you end up with 1 input variable (var2) and 1 output variable (var4), so I would not recommend such a step. In general, it seems that you have a very limited feature set; it would be valuable to gather more informative features.