Connect line chart points when using datasets with non-overlapping labels (timestamps)

I have several datasets (approx. 10) that come from user input, so the labels (x-axis) will almost never overlap between the datasets.
What I would like to do is connect points from the same dataset (for instance bloodPressure) across a "non-existing" data point when necessary, as in the graph below. I would not want to fake a data point to achieve this.
Any suggestions how to do this?

I found the answer in the Chart.js documentation. You can skip the array of labels and instead give each point in a dataset its own x and y value, as in the screenshot.
Works fine!
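The approach above can be sketched as a small helper that turns each dataset's own (timestamp, value) readings into Chart.js-style {x, y} points. This lets every dataset keep its own timestamps, and Chart.js joins consecutive points within a dataset. It is a sketch under stated assumptions: it targets the Chart.js 3+ `options.scales.x` syntax, and it uses a `linear` x axis with epoch milliseconds so that no date adapter is needed for a `time` scale. The helper name `to_xy_datasets` and the sample readings are hypothetical. The resulting object would be passed to `new Chart(ctx, config)` on the JavaScript side.

```python
import json


def to_xy_datasets(series):
    """Build a Chart.js-style line config from per-dataset (timestamp_ms, value) readings.

    Each dataset carries its own {x, y} points, so no shared "labels" array is
    needed and timestamps never have to line up across datasets.
    """
    datasets = []
    for name, readings in series.items():
        # Sort by timestamp so each line is drawn left to right
        points = [{"x": t, "y": v} for t, v in sorted(readings)]
        datasets.append({"label": name, "data": points})
    return {
        "type": "line",
        "data": {"datasets": datasets},  # note: no top-level "labels" array
        # Assumption: Chart.js 3+ scale syntax; a linear x axis with epoch
        # milliseconds avoids needing a date adapter for a "time" scale.
        "options": {"scales": {"x": {"type": "linear"}}},
    }


if __name__ == "__main__":
    readings = {
        "bloodPressure": [(1_700_000_000_000, 120), (1_700_086_400_000, 118)],  # hypothetical
        "pulse": [(1_700_043_200_000, 72)],  # hypothetical
    }
    print(json.dumps(to_xy_datasets(readings), indent=2))
```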


How can I build the following graph in Grafana with an Athena CSV as the source

I have the following CSV that my Athena service (AWS) uses as a source.
I am currently using QuickSight to create the needed graph, but I can't figure out how to do this in Grafana. This is more or less what the CSV looks like:
All graphs are time based and should use the creation date as the timeline (x).
For y I would like to use different combinations to create different graphs.
For example, one of the graphs should use the time as the y value, with different colors representing the different combinations of Tag and size.
My point is that many graphs could be extracted from this CSV, but I can't find a way to do this, since each query only lets me choose a table and a column.
In QuickSight I am able to choose the x column (although it detects Creation date automatically) and add all kinds of filters for the y, all done graphically.
Is it possible to do in Grafana? Any help would be appreciated!

Y-axis limit appears to alter the data that is analysed

I have a bunch of data where the hours taken to process an item range from 3 to 3000 hours. Most of the data is below 1000 hours.
I am creating a boxplot of that data. I have a large number of outliers within the data that I don't need to display, but I do need to analyse.
I have tried both 'scale_y_continuous(limits=c(0,1000))' and 'ylim(0,1000)', but both appear to change the data that is used to create the boxplot. To test this theory I changed the upper limit to 20 and still got a complete boxplot, which can only happen if the method I'm using to limit the axis also limits the range of data analysed.
I'd like to limit the y axis but not limit the range of data that is used in the analysis, what function do I use to accomplish that?
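Many thanks.
It appears that 'coord_cartesian(ylim = c(nnn,nnn))' is what I needed. It zooms the plot without removing any data, whereas scale limits drop out-of-range values before the boxplot statistics are computed.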
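To see why the two approaches give different boxes, the following language-neutral sketch is written in Python rather than ggplot2. It computes quartiles once on all rows (the zoom idea behind coord_cartesian) and once after discarding rows outside 0 to 1000 (what scale limits effectively do before the stat runs). The processing times are hypothetical. The "inclusive" quartile method is assumed to correspond to R's default quantile type.

```python
import statistics


def box_stats(values):
    """Quartiles as a boxplot would draw them (inclusive interpolation)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3}


# Hypothetical processing times in hours
hours = [3, 50, 100, 200, 400, 800, 1500, 2500, 3000]

# Zooming (the coord_cartesian idea): stats use every row, only the view changes
zoomed = box_stats(hours)

# Scale limits (the ylim / scale_y_continuous(limits=...) idea): rows outside
# 0..1000 are dropped first, so the statistics themselves change
limited = box_stats([h for h in hours if 0 <= h <= 1000])

print(zoomed)   # {'q1': 100.0, 'median': 400.0, 'q3': 1500.0}
print(limited)  # {'q1': 62.5, 'median': 150.0, 'q3': 350.0}
```

The median drops from 400 to 150 purely because the large values were removed before the statistics were computed. That is the effect the question describes.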

I cannot reproduce the results with kmeans in Orange

I've tried to reproduce the same results with the same flow, and I don't understand why the results are different each time.
Here is the situation: I have a file with 192 instances and 37 features. In all cases I select the same columns and preprocess by Median and StdDev. I then compute PCA with 7 principal components. The next step is to run the k-means algorithm (k between 2 and 8) on this 'new' dataset. The scatter plot shows the results for k=5.
I have attached several images of my flows.
Image1: original flow
The first one is the original flow (highlighted in yellow), which I would like to repeat without the rest of the options (the second image).
Image2: flows repeated
However, when I tried this, the results were different (the third image). The colors themselves don't matter, but the clusters really are different. In addition, the Silhouette scores differ between the flows.
Image3: results of the different flows
K-means is initialized with k-means++, and my question is whether I can "control" this, or whether k-means initialization is always random. In other programs I have seen an option called seed, which makes an experiment repeatable, but I didn't see this option or anything similar here.
I wonder if it is possible to always obtain the same results with the same flow (using k-means).
It seems the issue occurs because no random seed is set in the k-means widget. The initialization is therefore different each time you repeat the experiment, and because of the nature of your data, the method converges to different solutions. Could you please report this issue on the Orange3 issue tracker?
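The role of the seed can be illustrated with a minimal pure-Python k-means sketch: k-means++ seeding followed by Lloyd iterations, drawing all randomness from one `random.Random(seed)`. With a fixed seed, repeated runs give identical clusters. With `seed=None`, the initialization can differ from run to run, which matches the behaviour described above. This is an illustration of the technique, not Orange's code. The function names are hypothetical. For comparison, scikit-learn's `KMeans` exposes the same control through its `random_state` parameter.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points (tuples)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each new centre is drawn with probability proportional
    to its squared distance from the nearest centre already chosen."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        r = rng.uniform(0, sum(d2))
        acc = 0.0
        chosen = points[-1]  # fallback for floating-point round-off
        for p, w in zip(points, d2):
            acc += w
            if w > 0 and acc >= r:
                chosen = p
                break
        centers.append(chosen)
    return centers


def kmeans(points, k, seed=None, max_iter=100):
    """Lloyd's k-means with k-means++ init; a fixed seed makes runs repeatable."""
    rng = random.Random(seed)  # seed=None -> initialization may differ per run
    centers = kmeans_pp_init(points, k, rng)
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest centre for every point
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: move each centre to the mean of its members
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's centre in place
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers
```

Calling `kmeans(points, 5, seed=42)` twice on the same data returns the same labels and centres. Omitting the seed reproduces the "different clusters every run" effect whenever the data admits several local optima.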

Generate subset of data with known mean

I have a dataset of n observations (nx1 vector) and would like to create a subset of this data, whose mean is known in advance, by selecting at random only n/3 observations (or within some constraint, i.e. where the mean of the data subset is within a range about the known mean).
Can someone please help me with the code to do this in MATLAB?
Note, I don't want to use the rand function to create random data as I already have my data collected.
For example, on a smaller scale, if I had the following dataset of 12 observations:
data = [8;7;4;6;9;6;4;7;3;2;1;1];
but then wanted to randomly select a subset of this data containing only 4 observations with a mean of 4 (or with a mean between 3.5-4.5 for example):
Then the answer might be datasubset=[7;3;2;4] but the answer could also be datasubset=[6;4;2;4] or datasubset=[6;4;3;4].
It doesn't matter if there are several possible solutions, I just need one of them, though I'd like to know the alternative solutions also.
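One way to approach this is rejection sampling. Repeatedly draw a random subset of existing observations (no new random data is generated) until its mean lands in the target range. For small inputs, you can also enumerate every qualifying subset to see the alternatives. The sketch below is in Python rather than MATLAB, and the function names are hypothetical. The same idea maps onto MATLAB's `randperm` (random distinct indices) and `nchoosek` (all combinations). Rejection sampling is an assumed technique here, not one proposed in the question. If the target range is rarely hit, it may exhaust `max_tries` and return `None`.

```python
import itertools
import random


def random_subset_with_mean(data, size, lo, hi, seed=None, max_tries=100_000):
    """Rejection sampling: draw `size` distinct observations at random until
    their mean falls in [lo, hi]. Returns None if no draw succeeds."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        subset = rng.sample(data, size)  # distinct positions, no replacement
        if lo <= sum(subset) / size <= hi:
            return subset
    return None


def all_subsets_with_mean(data, size, lo, hi):
    """Every qualifying subset, chosen by position so repeated values (the two 1s)
    count as different picks. Only practical for small data: C(n, size) grows fast."""
    return [
        [data[i] for i in idx]
        for idx in itertools.combinations(range(len(data)), size)
        if lo <= sum(data[i] for i in idx) / size <= hi
    ]


data = [8, 7, 4, 6, 9, 6, 4, 7, 3, 2, 1, 1]
print(random_subset_with_mean(data, 4, 3.5, 4.5, seed=0))
print(len(all_subsets_with_mean(data, 4, 3.5, 4.5)))
```

For the full problem, `size` would be `len(data) // 3`. Only the random sampler scales to large n. The enumeration is there to list alternatives on small examples like this one.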

Tableau Dual Axis with different filters

I am trying to create a graph with two lines, with two filters from the same dimension.
I have a dimension which has 20+ values. I'd like one line to show data based on just one of the selected values and the other line to show a line excluding that same value.
I've tried the following:
- Creating a duplicate/copy dimension and filtering the original one with the first, and the copy with the 2nd. When I do this, the graphic disappears.
- Creating a calculated field that tries to split the measures up. This isn't letting me track the count.
I want this on the same axis; the best I've been able to do is create two sheets, one with the first filter and one with the 2nd, and stack them in a dashboard.
My end user wants the lines in the same visual; otherwise I'd be happy with the dashboard approach. Either way, I'd like to know how to do this.
It is a little hard to tell exactly what you want to achieve, but the problem with filtering is common.
The important principle is that Tableau filters the whole dataset row by row. So duplicating the dimension you want to filter won't help, because the filter on the original dimension also removes the corresponding rows from the second dimension. Any solution has to work around this.
One solution is to build two new fields that use a calculation rather than a filter to create the new result. Let's say you have a dimension, [size], that has a range of numbers from 1 to 10, and you want to compare the total number of rows including and excluding the number 5. You could create a new field using a formula like if [size] <> 5 then 1 else 0 end
Summing the new field gives the number of rows that don't contain a 5, and this can be compared directly to a row count of the original [size] field, which gives the number including the value 5.
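The row-level flag idea can be illustrated outside Tableau. The Python sketch below keeps every row, computes a 0/1 flag per row (the analogue of the calculated field), and sums it per group to get two series, one for the selected value and one for everything else. No filter is involved, so both lines can come from the same data. The row structure, field names and the `two_lines` helper are hypothetical; in Tableau the equivalent would be two calculated fields placed on the same axis.

```python
from collections import Counter


def two_lines(rows, dimension, value, by):
    """Two series from the same unfiltered rows: one counting rows where
    `dimension` equals `value`, one counting all other rows, grouped by `by`."""
    only_value = Counter()
    excluding_value = Counter()
    for row in rows:
        key = row[by]
        # Row-level flag, like IF [dimension] = value THEN 1 ELSE 0 END
        flag = 1 if row[dimension] == value else 0
        only_value[key] += flag
        excluding_value[key] += 1 - flag  # the <> value case
    return dict(only_value), dict(excluding_value)


rows = [  # hypothetical data
    {"month": "Jan", "category": "A"},
    {"month": "Jan", "category": "B"},
    {"month": "Feb", "category": "A"},
    {"month": "Feb", "category": "C"},
]
print(two_lines(rows, "category", "A", "month"))
```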
This basic principle can be extended to much more complex logic. The essential point is to realise that filters act on every row in your data and can't, by themselves, show comparisons with alternative filter choices on a single visualisation.
Depending on the nature of your problem there may be other solutions worth looking at including sets and groups but you would need to provide more specific details for users here to tell you whether they would be useful.
We can make a set out of the values of the dimension and then place it on the required shelf. You will then have your dimension, which plots as usual, and the set, which shows data according to the requirement. With a filter you can't have that independence, because the filter removes rows every time.