I have a set of scatter points: the heights (cm) of sixty plants over time (days). I measure each plant three times (at about days 10, 50 and 100), but some of the plants do not have the second and/or third measurement yet. Here is a small example of my data showing four plants (A, B, C, D).
Plant Days Height
A 10 2
B 11 5
C 12 4
D 12 5
A 57 7
B 56 8
C 53 6
A 100 12
B 100 10
Then I can use plotmatrix(Days, Height) to plot the scatter points. I need to make percentile curves (similar to children's growth charts) in MATLAB. I tried prctile(height, [25 50 75], 1), but it only outputs the 25th, 50th and 75th percentile values of height, not a growth curve. Could anyone suggest a way to generate percentile curves from a set of scatter points over time? Is regression needed to generate growth curves (25th, 50th, 75th percentile) for sixty plants?
Sorry, I am still new to MATLAB and statistics. Please help. Thanks!
Related
I have to plot time series data in MATLAB. The Y axis is a parameter measured six-hourly for each day of a certain month of the year, over 44 years from 1958 to 2001, so there are 4*31*44 = 5456 points on the X axis. How can I plot the data efficiently in MATLAB?
I want the x axis to show the 44 Julys from 1958 to 2001. Each July has 124 points.
The data file has two column vectors: one for the time points (5456 rows) and the other for the parameter measured. Thanks a lot.
As you don't give any more details, it is hard to know exactly what you are asking. If you have a matrix A with two columns, then you are looking for
plot( A(:,1), A(:,2) )
Alternatively, perhaps you want a histogram (hist) or a scatter plot (scatter).
Your X axis (the time data) is most probably not in datetime format, hence the problem. Convert it to datetime and the plot will show what you want. Then try
plot(X,Y)
or
plot(A(:,1),A(:,2))
whichever matches your data format.
In MATLAB (R2015b) I have to find the midpoint between two time series of different lengths (ca. 2000 vs. 3000 rows). In both series the first column is time and the second is a measurement. For example, A:
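If the time column is not already a datetime, one way to build it is sketched below. This is only an illustration: it assumes six-hourly samples starting at 00:00 on 1 July of each year from 1958 to 2001, stored year after year, and it uses made-up values y in place of the measured parameter. datetime and hours need R2014b or later.

```matlab
% Assumed layout: 124 six-hourly samples per July, 44 Julys (1958-2001).
[H, Y] = ndgrid(0:6:6*123, 1958:2001);   % hour offsets within July x years
t = datetime(Y(:), 7, 1) + hours(H(:));  % 5456-by-1 datetime column

y = randn(numel(t), 1);                  % placeholder for the real parameter

plot(t, y, '.')                          % datetime x axis, one cluster per July
xlabel('Time')
ylabel('Parameter')
```

Because the eleven months between consecutive Julys contain no data, markers are used here instead of a connecting line.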
09:30:14 23
09:31:03 23.5
And B:
09:30:19 25.5
09:30:37 25
09:31:12 24.5
How can I get MATLAB to calculate the midpoint value between A and B and get the result as shown below?
09:30:19 24.25 (Here it is 23+(25.5-23)/2)
09:30:37 24 (Here it is 23+(25-23)/2)
09:31:12 24 (Here it is 23.5+(24.5-23.5)/2)
You can use the interp1 function to estimate the value of one series at the time points corresponding to the other samples. Then the time points agree and you can just take the mean of the values.
interp1 supports several interpolation methods, such as nearest and linear.
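A minimal sketch of that idea on the sample rows from the question follows. The worked numbers in the question hold the most recent A value at each B time, so the sketch uses the 'previous' method (this assumes a release whose interp1 supports it). Storing times as seconds since midnight and the variable names are assumptions made for the illustration.

```matlab
% Times as seconds since midnight (assumed representation).
tA = [9*3600 + 30*60 + 14; 9*3600 + 31*60 + 3];
vA = [23; 23.5];
tB = [9*3600 + 30*60 + 19; 9*3600 + 30*60 + 37; 9*3600 + 31*60 + 12];
vB = [25.5; 25; 24.5];

% Value of A at each B time, holding the most recent A sample.
aAtB = interp1(tA, vA, tB, 'previous');  % NaN outside [tA(1), tA(end)]
aAtB(tB > tA(end)) = vA(end);            % keep holding the last A value

% Both series now share the time base tB, so average them.
midpoint = (aAtB + vB) / 2;
```

With 'linear' instead of 'previous', A would be interpolated between its samples rather than held, which gives slightly different midpoints.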
I have a question, and I don't know whether there is a ready-made solution.
Here it goes,
I have two data sets plotted on the same figure, and I need to find their difference. Simple so far...
The problem is that matrix A has 1000 data points while the second (matrix B) has 580 data points. How can I find the difference between the two graphs when there is a dimension mismatch between them?
One way that I thought of is artificially inflating matrix B to 1000 data points so that the trend of the plot stays the same. Would this be possible? And if yes, how?
for example:
A=[1 45 33 4 1009 ];
B=[1 22 33 44 55 66 77 88 99 1010];
Ya=A.*20+4;
Yb=B./10+3;
C=abs(B - A)
plot(A,Ya,'r',B,Yb)
xlim([-100 1000])
grid on
hold on
plot(length(B),C)
One way to do it is to resample the 580-element vector to 1000 samples. Use MATLAB's resample (part of the Signal Processing Toolbox) for this:
x = randn(580,1);
y = randn(1000,1);
xr = resample(x, 50, 29); % 50/29 = 1000/580 is the resampling ratio
You should then be able to compare the two data vectors.
There are two ways that I can think of:
1- Matching the size:
Generating more data for the matrix with lower number of elements (using interpolation, etc.)
Removing some data from the matrix with higher number of elements (i.e. outlier removal)
2- Comparing the matrices with their properties.
For instance, you can calculate the mean and the covariance of one matrix and compare them to those of the other matrix. Useful functions include cov, mean, median, std, var, xcorr and xcov.
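As a rough sketch of the first option (matching the size by interpolation), the shorter curve can be evaluated at the x positions of the longer one with interp1. The two curves below are made-up stand-ins with 1000 and 580 points, not the data from the question.

```matlab
% Hypothetical curves sampled on different grids over the same x range.
xA = linspace(0, 10, 1000);   yA = sin(xA);
xB = linspace(0, 10, 580);    yB = cos(xB);

% Evaluate curve B at the x positions of curve A.
yBonA = interp1(xB, yB, xA, 'linear');

% Point-by-point difference, now with 1000 elements.
d = abs(yA - yBonA);

plot(xA, yA, 'r', xB, yB, 'b', xA, d, 'k')
legend('A', 'B', '|A - B|')
grid on
```

This only works where the x ranges overlap and the x values are distinct; interp1 returns NaN for query points outside the range of xB unless extrapolation is requested.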
Can anyone shed some light on my MATLAB program?
I have data from two sensors and I'm doing a kNN classification for each of them separately.
In both cases the training set is a set of vectors, 42 rows in total, like this:
[44 12 53 29 35 30 49;
54 36 58 30 38 24 37;..]
Then I get a sample, e.g. [40 30 50 25 40 25 30], and I want to assign the sample to its closest neighbor.
As the proximity criterion I use the Euclidean metric, sqrt(sum(Y.^2)), where Y is the element-wise difference between the sample and a training vector. This gives me an array of distances between the sample and each class of the training set.
So, two questions:
Is it possible to convert the distances into a probability distribution, something like Class 1: 60%, Class 2: 30%, Class 3: 5%, Class 5: 1%, etc.?
Added: up to now I have been using the formula probability = distance / sum of distances, but I cannot plot a correct cdf or histogram.
This gives me a distribution of sorts, but I see a problem: if the distance is large, for example 700, the closest class will still get the biggest probability, which would be wrong because the distance is too big for the sample to be matched to any of the classes.
If I were able to get two probability density functions, I guess I could then take some product of them. Is it possible?
Any help or remark is highly appreciated.
I think there are multiple ways of doing this:
As Adam suggested, use 1/d / sum(1/d).
Use the square, or an even higher power, of the inverse distance, e.g. 1/d^2 / sum(1/d^2). This makes the class probability distribution more skewed. For example, if 1/d gives a 40%/60% prediction for two classes, 1/d^2 gives about 31%/69%.
Use softmax (https://en.wikipedia.org/wiki/Softmax_function) on the negative distances, i.e. exp(-d) / sum(exp(-d)).
Use exp(-d^2/sigma^2) / sum(exp(-d^2/sigma^2)), which imitates Gaussian likelihoods. Sigma could be the average within-cluster distance, or simply set to 1 for all clusters.
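The options above can be compared side by side with a few lines of MATLAB. This is a sketch only: the distance vector d is made up, sigma = 1 is an arbitrary choice, and the Gaussian-style weight is read here as exp(-d^2/sigma^2).

```matlab
% Hypothetical distances from one sample to three classes.
d = [2; 3; 10];

% Inverse distance, normalized to sum to 1.
pInv = (1 ./ d) / sum(1 ./ d);

% Inverse squared distance: favors the nearest class more strongly.
pInv2 = (1 ./ d.^2) / sum(1 ./ d.^2);

% Softmax of the negative distances.
pSoft = exp(-d) / sum(exp(-d));

% Gaussian-style weights with an assumed common sigma.
sigma = 1;
pGauss = exp(-d.^2 / sigma^2) / sum(exp(-d.^2 / sigma^2));

disp([pInv, pInv2, pSoft, pGauss])   % one column per weighting scheme
```

All four columns sum to 1 by construction, so none of them on its own flags a sample that is far from every class; a separate threshold on the smallest distance would be needed for that.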
You could try inverting your distances to get a likelihood measure: the bigger the distance, the smaller its inverse. Then you can normalize, as in probability = (1/distance) / sum(1/distance).
Hi, have you tried the formula probability = 1 - distance, assuming that you are using a standardized distance between 0 and 1?
I want to create a ROC curve in MATLAB using the perfcurve function (it's for a logistic regression, similar to the one illustrated in this example (bottom of page)). I have 150 data points (binary data), but they are neither positive nor negative classes; each one holds the number of positive observations within that data point.
Example (random data to illustrate):
datapoint +ve observations total observations
1 23 35
2 27 41
3 23 36
4 18 29
5 19 39
6 21 41
7 24 40
8 29 36
9 38 45
10 12 32
The example illustrated on mathworks (bottom of page) only demonstrates how to create labels for data rows that correspond either solely to positive or negative classes.
For
[X,Y,T,AUC] = perfcurve(labels,scores,posclass)
how do I have to format my labels and posclass in order to make the ROC curve plot work?
Thank you very much in advance.
In order to create an ROC curve in Matlab using the perfcurve function, you need to have the score for each data point (which you pass to perfcurve using the scores argument). The score of a data point is given by your classifier and corresponds to the "probability" [1] that this data point belongs to the positive class (which is defined by the posclass argument). Given your data, you don't have enough information to use the perfcurve function.
[1] Some classifiers don't return strict probabilities but higher score indicates a higher probability so it's all right. More information in Fawcett, Tom. "An introduction to ROC analysis." Pattern recognition letters 27.8 (2006): 861-874.
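For reference, the input format that perfcurve expects looks like the sketch below: one label and one score per observation. The labels and scores are invented for the illustration and are not derived from the counts in the question; perfcurve requires the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical per-observation data: true class and classifier score.
labels = [1; 1; 0; 1; 0; 0];               % 1 = positive, 0 = negative
scores = [0.9; 0.8; 0.7; 0.6; 0.3; 0.1];   % higher = more likely positive

% posclass = 1 tells perfcurve which label is the positive class.
[X, Y, T, AUC] = perfcurve(labels, scores, 1);

plot(X, Y)
xlabel('False positive rate')
ylabel('True positive rate')
title(sprintf('ROC curve (AUC = %.3f)', AUC))
```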