How to put labels on each data point in a stem plot using MATLAB

This is my x and y data:
x = [29.745, 61.77, 42.57, 70.049, 108.51, 93.1, 135.47, 52.79, 77.91, 116.7, 100.71, 146.37, 125.53]
y = [6, 6, 12, 24, 24, 12, 24, 8, 24, 24, 24, 48, 8]
stem(x,y);
I want to label each data point on my stem plot. This is the output I want:
I edited it in Paint. Can MATLAB do this vertical labeling, just like in the image? Please help.

Yes, it can! You just need to set the 'Rotation' property of the text annotations to 90.
Example:
clear
clc
x = [29.745, 61.77, 42.57, 70.049, 108.51, 93.1, 135.47, 52.79, 77.91, 116.7, 100.71, 146.37, 125.53]
y = [6, 6, 12, 24, 24, 12, 24, 8, 24, 24, 24, 48, 8]
hStem = stem(x,y);
%// Create labels.
Labels = {'none'; 'true';'false';'mean';'none';'';'true';'hints';'high';'low';'peas';'far';'mid'}
%// Get position of each stem 'bar'. Sorry I don't know how to name them.
X_data = get(hStem, 'XData');
Y_data = get(hStem, 'YData');
%// Assign labels.
for labelID = 1 : numel(X_data)
text(X_data(labelID), Y_data(labelID) + 3, Labels{labelID}, 'HorizontalAlignment', 'center','rotation',90);
end
Which gives the following:
The last label is a bit high so you might want to rescale the axes, but you get the idea.


DBSCAN on 3d coordinates doesn't find clusters

I'm trying to cluster points in a 3D coordinates DataFrame of 1428 points.
The clusters are relatively flat, elongated clouds of points. They are very obvious clusters, so I was hoping to try unsupervised clustering without specifying the expected number of clusters. KMeans does not separate them properly and requires the number of clusters:
Kmeans plot results
The data looks as follows:
5 6 7
0 9207.495280 18922.083277 4932.864
1 5831.199280 3441.735280 5756.326
2 8985.735280 12511.719280 7099.844
3 8858.223280 28883.151280 5689.652
4 6801.399277 6468.759280 7142.524
... ... ... ...
1423 10332.927277 22041.855280 5136.252
1424 6874.971277 12937.563277 5467.216
1425 8952.471280 28849.887280 5710.522
1426 7900.611277 19128.255280 4803.122
1427 10234.635277 18734.631280 5631.286
[1428 rows x 3 columns]
I was hoping DBSCAN would deal better with this data. However, when I try the following (I played around with eps and min_samples but without success):
from sklearn.cluster import DBSCAN
dbscan = DBSCAN(eps=10, min_samples = 50)
clusters = dbscan.fit_predict(X)
print('Clusters found', dbscan.labels_)
len(clusters)
I get this output:
Clusters found [-1 -1 -1 ... -1 -1 -1]
1428
I have been confused about getting this to work, especially since Kmeans did work:
kmeans = sk_cluster.KMeans(init='k-means++', n_clusters=9, n_init=50)
kmeans.fit_predict(X)
centroids = kmeans.cluster_centers_
kmeans_labels = kmeans.labels_
error = kmeans.inertia_
print ("The total error of the clustering is: ", error)
print ('\nCluster labels')
The total error of the clustering is: 4994508618.792263
Cluster labels
[8 0 7 ... 3 8 1]
Remember this rule of thumb:
Normalize your data before feeding it to a distance-based ML algorithm.
The reason is that your columns have different ranges: one column probably spans something like [10000, 20000] and another [4000, 5000]. Plotted in raw units, the points are very far apart, and DBSCAN's eps is an absolute distance in those same units, so with eps=10 and min_samples=50 almost no point has enough neighbors and everything is labeled as noise (-1). Scaling brings every column to the same range while preserving the relative distances within each column. It is like zooming in Google Maps: the distances on screen change, but the layout stays the same.
You are free to choose the scaling method; sklearn.preprocessing offers several, such as MinMaxScaler, StandardScaler and RobustScaler.
Edit:
Use this code:
import numpy as np
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(X)
X_norm = scaler.transform(X)
from sklearn.cluster import DBSCAN
dbscan = DBSCAN(eps=0.05, min_samples = 3,leaf_size=30)
clusters = dbscan.fit_predict(X_norm)
np.unique(dbscan.labels_)
array([-1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,
33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47])
DBSCAN is a density-based approach. I also tried sklearn's normalize (from sklearn.preprocessing import normalize), which rescales each sample (row) to unit norm rather than rescaling each feature, but it didn't work. That is expected for DBSCAN, which needs the features to be on comparable scales.
So I went with MinMaxScaler, which maps every feature to [0, 1]. One thing to note: since your data points lie between 0 and 1 after scaling, eps should be chosen in a similar range as well.
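To make the scale argument concrete, here is a minimal pure-Python sketch of per-column min-max scaling, applied only to the five rows printed in the question. It is not the scikit-learn implementation, and the helper names min_max_scale and euclidean are made up for this illustration. On the full data set the column minima and maxima would differ, so the exact numbers are only indicative.

```python
import math

# First five rows from the question (columns 5, 6, 7).
rows = [
    [9207.495280, 18922.083277, 4932.864],
    [5831.199280, 3441.735280, 5756.326],
    [8985.735280, 12511.719280, 7099.844],
    [8858.223280, 28883.151280, 5689.652],
    [6801.399277, 6468.759280, 7142.524],
]

def min_max_scale(data):
    """Rescale every column to [0, 1] (hypothetical helper, same idea as MinMaxScaler)."""
    columns = list(zip(*data))
    lows = [min(c) for c in columns]
    spans = [max(c) - min(c) for c in columns]
    return [[(v - lo) / span for v, lo, span in zip(row, lows, spans)]
            for row in data]

def euclidean(p, q):
    """Plain Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

scaled = min_max_scale(rows)
raw_dist = euclidean(rows[0], rows[1])
scaled_dist = euclidean(scaled[0], scaled[1])
print(f"raw distance:    {raw_dist:.1f}")
print(f"scaled distance: {scaled_dist:.3f}")
```

After scaling, no two points in three dimensions can be further apart than sqrt(3), so an eps of 10 would put every point in every neighborhood, while an eps well below 1 becomes meaningful.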
Kudos :)

How to plot a graph in MATLAB using a vector of time values in hh:mm format

I have a vector of time values in hh:mm format as well as a vector of values representing levels of activity.
For example:
x=[06:18, 07:58, 08:38, 09:18, 10:58];
y=[14, 28, 33, 68, 24];
Is it possible to plot a graph of y vs. x in Matlab?
If not, is there a way to display a vector of EPOCH time values as time in the format hh:mm, on the graph?
For example:
x= [1383260400, 1383261000, 1383261600, 1383262200, 1383262800];
y=[14, 28, 33, 68, 24];
Thanks in advance for your help
This should do the trick:
time={'06:18', '07:58', '08:38', '09:18', '10:58'};
data=[14, 28, 33, 68, 24];
ts = timeseries(data,time);
ts.TimeInfo.Format = 'HH:MM';
ts.TimeInfo.StartDate = '00:00';
plot(ts)
The time stamps must be in a cell array and apart from that it should be pretty self explanatory.
If you would like to plot more lines in the same plot and the same time stamps just use a matrix instead of a vector for the data:
data=[14, 28, 33, 68, 24; 7, 14, 35, 34, 12];
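You could also change the tick labels instead. This does not require a timeseries object, but note that the points are then evenly spaced along the x axis regardless of the actual times:
time={'06:18', '07:58', '08:38', '09:18', '10:58'};
data=[14, 28, 33, 68, 24];
plot(data)
% Put one tick on every data point so the labels line up with the data
set(gca,'XTick', 1:numel(data), 'XTickLabel', time)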
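Relabeling the ticks of plot(data) places the points at indices 1 to 5, so the real gaps between the time stamps are lost. If the spacing matters, the times first have to become numbers. The conversion step can be sketched in Python with the standard library only; the helper names are hypothetical, and treating the epoch values as UTC is an assumption, since the question does not state a time zone.

```python
from datetime import datetime, timezone

def hhmm_to_minutes(label):
    """Convert an 'hh:mm' string to minutes since midnight."""
    hours, minutes = label.split(":")
    return int(hours) * 60 + int(minutes)

def epoch_to_hhmm(seconds):
    """Format epoch seconds as 'HH:MM' in UTC (the time zone is an assumption)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%H:%M")

times = ["06:18", "07:58", "08:38", "09:18", "10:58"]
epochs = [1383260400, 1383261000, 1383261600, 1383262200, 1383262800]

x_minutes = [hhmm_to_minutes(t) for t in times]    # numeric x positions
tick_labels = [epoch_to_hhmm(e) for e in epochs]   # labels for epoch data
print(x_minutes)     # [378, 478, 518, 558, 658]
print(tick_labels)   # ['23:00', '23:10', '23:20', '23:30', '23:40']
```

The numeric values serve as x coordinates, and the formatted strings serve as tick labels at those coordinates.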

Least square surface fitting from first principles

I want to make this "by hand" rather than using a surface fitting tool, because depending on the data I have, the surface fitting may vary. So, I first read the data in an excel sheet, then initialize some coefficients, calculate a 3D surface (f(x,y)) and then calculate the total least squares sum, which I'd like to minimise. Every time I run the script it tells me that I'm at the local minimum, even when I change the initial values. Changing the tolerance doesn't affect the result either.
This is the code:
% flow function in a separate .m file (approximation, it’s a negative paraboloid, maybe if required, this function may vary):
function Q = flow(P1,P2,a,b,c,d,e,f)
Q1 = a-b.*P1-c.*P1.^2;
Q2 = d-e.*P2-f.*P2.^2;
Q = Q1 + Q2;
% Variable read, I use a xlsread instead
p1a = [-5, -5, -5, -5, -5, -5, -5, -5, -5, -5];
p2a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
qa = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1];
p1b = [-6, -6, -6, -6, -6, -6, -6, -6, -6, -6];
p2b = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
qb = [12, 11, 10, 9, 8, 7, 6, 5, 4, 3];
% Variable initialization
coef = [50, 1, 1, 10, 1, 1];
% Function calculation
q1a = flow(p1a,p2a,coef(1),coef(2),coef(3),coef(4),coef(5),coef(6));
q1b = flow(p1b,p2b,coef(1),coef(2),coef(3),coef(4),coef(5),coef(6));
% Least squares
LQa = (qa-q1a).^2;
LQb = (qb-q1b).^2;
Sa = sum(LQa);
Sb = sum(LQb);
St = Sa+Sb;
% Optimization (minimize the least squares sum)
func = @(coef)(St);
init = coef;
opt = optimoptions('fminunc', 'Algorithm', 'quasi-newton', 'Display', 'iter','TolX', 1e-35, 'TolFun', 1e-30);
[coefmin, Stmin] = fminunc(func, init, opt);
If you run this, you should get a result of 15546 for Stmin, but if you change the coefficients, you'll get another result, and it will also be considered as a local minimum.
What am I doing wrong?
The problem is that your func is just a constant. It simply returns a pre-calculated value, St, which is constant no matter what input you pass to func. Try calling func with various different inputs to test this.
Your objective function needs to contain all the calculations that got you to St. So I suggest you replace your func with a function saved in an m-file looking something like this:
function St = objectiveFunction(coef, p1a, p2a, p1b, p2b, qa, qb)
% Function calculation (q1a and q1b are recomputed on every call, so they are not inputs)
q1a = flow(p1a,p2a,coef(1),coef(2),coef(3),coef(4),coef(5),coef(6));
q1b = flow(p1b,p2b,coef(1),coef(2),coef(3),coef(4),coef(5),coef(6));
% Least squares
LQa = (qa-q1a).^2;
LQb = (qb-q1b).^2;
Sa = sum(LQa);
Sb = sum(LQb);
St = Sa+Sb;
end
And then in your script call objectiveFunction using an anonymous function like this:
[coefmin, Stmin] = fminunc(@(coef)(objectiveFunction(coef, p1a, p2a, p1b, p2b, qa, qb)), init, opt);
The idea is to create an anonymous function that only takes a single parameter, coef, which is the variable that fminunc will perturb and pass back to your objective function. The other parameters that your objectiveFunction needs (i.e. p1a, p2a, p1b, ...) are captured by the anonymous function when it is created, so fminunc never needs to know about them.
The rest of your code can stay the same.
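The difference between a constant objective and one that recomputes the residuals can be checked without an optimizer. The following is a small Python sketch of that check, not a port of the MATLAB script: it reuses the data and the flow model from the question, while the names objective and constant are introduced here purely for illustration.

```python
def flow(p1, p2, a, b, c, d, e, f):
    """Same model as the flow function in the question."""
    return (a - b * p1 - c * p1 ** 2) + (d - e * p2 - f * p2 ** 2)

# Data from the question.
p1a, p2a, qa = [-5] * 10, list(range(10)), list(range(10, 0, -1))
p1b, p2b, qb = [-6] * 10, list(range(10)), list(range(12, 2, -1))

def objective(coef):
    """Recompute the sum of squared residuals for the given coefficients."""
    total = 0
    for p1s, p2s, qs in ((p1a, p2a, qa), (p1b, p2b, qb)):
        for p1, p2, q in zip(p1s, p2s, qs):
            total += (q - flow(p1, p2, *coef)) ** 2
    return total

init = [50, 1, 1, 10, 1, 1]
St = objective(init)           # evaluated once, like St in the script
constant = lambda coef: St     # ignores coef, like the original func

print(constant(init), constant([0, 0, 0, 0, 0, 0]))    # same value twice
print(objective(init), objective([0, 0, 0, 0, 0, 0]))  # two different values
```

An optimizer that only ever sees the constant version measures a zero gradient everywhere, which is why every starting point is reported as a local minimum.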

Scipy interp1d and matlab interp1

The following are the inputs for my interpolation:
x = [-1.01, 5.66, 5.69, 13.77, 20.89]
y = [0.28773, 1.036889, 1.043178, 1.595322, 1.543763]
new_x = [0, 2, 4, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20]
The results from matlab interp1 and scipy.interpolate interp1d are different.
The results are like this.
new_y_scipy=[0.401171, 0.625806, 0.850442, 1.062384, 1.186291, 1.248244, 1.310198, 1.372152, 1.434105, 1.496059, 1.545429, 1.55267, 1.559911, 1.567153, 1.574394, 1.588877,]
new_y_matlab=[0.401171, 0.625806, 0.850442, 1.064362, 1.201031, 1.269366, 1.3377, 1.406035, 1.47437, 1.542704, 1.593656, 1.586415, 1.579174, 1.571932, 1.564691, 1.550208]
Apparently matlab seems to get better result than scipy. What is the fundamental difference?
I think that your data from scipy might be messed up somehow, because I can't reproduce your problem. For me, the results from scipy match perfectly with your results from matlab. See below for a demonstration:
import numpy as np
from scipy.interpolate import interp1d
import matplotlib.pyplot as plt
x = [-1.01, 5.66, 5.69, 13.77, 20.89]
y = [0.28773, 1.036889, 1.043178, 1.595322, 1.543763]
new_x = [0, 2, 4, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20]
new_y_scipy=[0.401171, 0.625806, 0.850442, 1.062384, 1.186291, 1.248244, 1.310198, 1.372152, 1.434105, 1.496059, 1.545429, 1.55267, 1.559911, 1.567153, 1.574394, 1.588877,]
new_y_matlab=[0.401171, 0.625806, 0.850442, 1.064362, 1.201031, 1.269366, 1.3377, 1.406035, 1.47437, 1.542704, 1.593656, 1.586415, 1.579174, 1.571932, 1.564691, 1.550208]
askewchan = interp1d(x,y)(new_x)
# 'linear' has no effect since it's the default, but I'll plot it too:
set_interp = interp1d(x, y, kind='linear')
new_y = set_interp(new_x)
plt.plot(x, y, 'o', new_x, new_y_scipy, '--', new_x, new_y_matlab, ':', new_x, askewchan, '.', new_x, new_y, '+')
plt.legend(('Original','OP_scipy', 'OP_matlab', 'askewchan_scipy', 'OP style scipy'), loc='lower right')
np.allclose(new_y_matlab, interp1d(x,y)(new_x))
#True
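As an independent cross-check that needs neither SciPy nor MATLAB, plain piecewise-linear interpolation can be written by hand and compared with the MATLAB values from the question. This is a minimal sketch; interp_linear is a hypothetical helper that assumes the x values are sorted in ascending order and does not extrapolate.

```python
def interp_linear(xs, ys, x_new):
    """Piecewise-linear interpolation; xs must be sorted ascending."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x_new <= x1:
            return y0 + (y1 - y0) * (x_new - x0) / (x1 - x0)
    raise ValueError("x_new lies outside the range of xs")

x = [-1.01, 5.66, 5.69, 13.77, 20.89]
y = [0.28773, 1.036889, 1.043178, 1.595322, 1.543763]
new_x = [0, 2, 4, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20]
new_y_matlab = [0.401171, 0.625806, 0.850442, 1.064362, 1.201031, 1.269366,
                1.3377, 1.406035, 1.47437, 1.542704, 1.593656, 1.586415,
                1.579174, 1.571932, 1.564691, 1.550208]

new_y = [interp_linear(x, y, v) for v in new_x]
max_err = max(abs(a - b) for a, b in zip(new_y, new_y_matlab))
print(f"largest deviation from the MATLAB values: {max_err:.1e}")
```

The hand-written result agrees with the MATLAB column to within rounding, which is consistent with the observation that the values labelled as SciPy output in the question are the ones that are off.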

Finding the location of maximum peaks in a plot with MATLAB

Suppose I have the F matrix like this:
F =
0, 0, 106, 10, 14, 20, 20, 23, 27, 26, 28, 28, 28, 23
| | |
peak peak peak
I'm using the command plot(F). I want to get the indexes of the peaks in the data.
This is the code I have so far, it does not work:
[max_x,index_x]=max(x);
e=index_x;
for i=1:11
index_x(i)=e;
e=e+16;
end
Is there a builtin function in matlab that will do this for me?
Use the findpeaks function (Signal Processing Toolbox).
[peakVal,peakLoc]= findpeaks(x);
Well, here is what I prefer, though note that it returns only the single global maximum and its linear index, not every peak:
[maxval, maxloc] = max(A(:));
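For readers without the Signal Processing Toolbox, a local-maximum search can also be written by hand. The Python sketch below is one possible definition, not a reimplementation of findpeaks: a peak is a value that rises from its left neighbor and is later followed by a drop, and a flat top is reported at its first index. The name find_peaks is made up here, and the indices are 0-based, so add 1 to get MATLAB indices.

```python
def find_peaks(values):
    """Return 0-based indices of local maxima; flat tops report their first index."""
    peaks = []
    n = len(values)
    i = 1
    while i < n - 1:
        if values[i] > values[i - 1]:
            # Walk to the end of a possible plateau at this height.
            j = i
            while j < n - 1 and values[j + 1] == values[i]:
                j += 1
            # It is a peak only if the data drops after the plateau.
            if j < n - 1 and values[j + 1] < values[i]:
                peaks.append(i)
            i = j + 1
        else:
            i += 1
    return peaks

F = [0, 0, 106, 10, 14, 20, 20, 23, 27, 26, 28, 28, 28, 23]
peak_idx = find_peaks(F)
print(peak_idx)                   # [2, 8, 10]
print([F[k] for k in peak_idx])   # [106, 27, 28]
```

Endpoints are never reported as peaks in this sketch, and a plateau that runs to the end of the data is ignored because no drop follows it.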