Associating labels in a dendrogram plot - MATLAB

I have the following data stored in the file stations.dat:
Station A 305.2 321.1 420.9 383.5 311.7 197.1 160.2 113.9 60.5 60.5 64.8 154.3
Station B 281.1 304.0 353.1 231.9 84.6 20.9 11.7 11.9 31.1 75.8 133.0 235.3
Station C 312.3 342.2 366.2 335.2 200.1 74.4 45.9 27.5 24.0 53.6 87.7 177.0
Station D 402.2 524.5 554.9 529.5 347.5 176.8 120.2 35.0 12.6 13.3 14.0 61.6
Station E 261.3 262.7 282.3 232.6 103.8 33.2 16.7 33.2 111.0 149.0 184.8 227.0
Using the following commands:
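stations = importdata('stations.dat'); % assumed: gives stations.data (numbers) and stations.textdata (names)
Z = linkage(stations.data,'ward','euc');
figure(1), dendrogram(Z,0,'Orientation','right')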
I get a dendrogram whose leaves are labeled with the row numbers of the stations.
So cluster 1 contains rows 4, 3 and 1 (Stations D, C and A, respectively) and cluster 2 contains rows 5 and 2 (Stations E and B).
I want to show the station names on the plot, but if I use the command:
set (gca,'YTickLabel', stations.textdata);
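the names are applied to the ticks in file order (Station A, B, C, ...) rather than in leaf order, so they end up next to the wrong leaves.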
How can I associate each data row with its name and show the names in the dendrogram?
I have data for 144 stations; I used only 5 here for illustration.

Try the following:
ind = str2double(cellstr(get(gca,'YTickLabel'))); % leaf (row) numbers in plotted order
set(gca, 'YTickLabel', stations.textdata(ind))
An easier way would be to specify the labels of the data points in the dendrogram call directly:
dendrogram(Z,0, 'Orientation','right', 'Labels',stations.textdata)
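The first answer works because the tick labels are leaf (row) numbers in plotted order, so indexing the name list with them gives the names in that same order. A small base-MATLAB illustration, assuming the tick labels come back as a cell array of strings (cellstr also covers the char matrix that older releases return); the tick order below is made up to match the clusters described in the question:

```matlab
% Station names in file order (5 of the 144 stations, as in the question)
names = {'Station A'; 'Station B'; 'Station C'; 'Station D'; 'Station E'};

% Tick labels as a right-oriented dendrogram might show them, bottom to top
% (hypothetical order consistent with the clusters 4,3,1 and 5,2)
tickText = {'4'; '3'; '1'; '5'; '2'};

% Convert the labels back to row numbers and look up the matching names
ind = str2double(cellstr(tickText));
orderedNames = names(ind);

% With the Statistics Toolbox, dendrogram's third output should give this order directly:
% [~, ~, outperm] = dendrogram(Z, 0, 'Orientation', 'right');
% set(gca, 'YTickLabel', names(outperm));
```

Passing 'Labels' to dendrogram, as in the second answer, avoids this bookkeeping entirely.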

Related

Translating chemical equations from article, results differ (Matlab)

I've been trying to translate a set of chemical equations into MATLAB code so that I can solve for the different chemical species. I have the approximate solution (read from a graph), but after entering all the data and checking it multiple times I still haven't found what is wrong. Could anyone help me figure out what is going wrong? The source of the graph and the equations is the article The chemistry of co-injected BOE; the graph I want to reproduce later on is Figure 2 in that paper.
The species fractions I get for 10 cc, 40 cc and 90 cc of NH4OH, compared with the approximate values read from Figure 2, are:
NH4OH   Source      HF     H2F2   F-     HF2-
10 cc   script      43%    48%    3%     6%
        Figure 2    ~28%   ~63%   ~2%    ~7%
40 cc   script      35%    33%    14%    18%
        Figure 2    ~24%   ~44%   ~6%    ~26%
90 cc   script      21%    12%    37%    30%
        Figure 2    ~18%   ~23%   ~20%   ~45%
The script is the following:
clc;
clear all;
%Units to be used
%Volume is in CC also cm^3, 1 litre is 1000 CC, 1 cc = 1 ml
%density is in g/cm^3
%weight percentages are given as fractions from 0 to 1
%Molecular weight is in g/mol
% pts=10; %number of points for linear spacing
%weight percentages of NH4OH and HF
xhf=0.49;
xnh3=0.28;
%H2O
Vh2o=1800;
dh2o=1.00; %0.997 at 25C when rounded 1
mh2o=18.02;
%HF values
Vhf=100;
dhf49=1.15;
dhf=dh2o+(dhf49-dh2o)*xhf/0.49; %# 25C
Mhf=20.01;
nhf=mols(Vhf,dhf,xhf,Mhf);
%NH4OH (NH3) values
% Vnh3=linspace(0.1*Vhf,1.9*Vhf,pts);
Vnh3=10;
dnh3=0.9; %for ~20-31% #~20-25C
Mnh3=17.03; %The wt% of NH4OH actually refers to the wt% of NH3 dissolved in H2O
nnh3=mols(Vnh3,dnh3,xnh3,Mnh3);
if max(nnh3)>=nhf
error(['There are more mols NH4OH,',num2str(max(nnh3)),', than mols HF,',num2str(nhf),'.'])
end
%% Calculations for species
Vt=(Vhf+Vh2o+Vnh3)/1000; %litre
A=nhf/Vt; %mol/l
B=nnh3/Vt; %mol/l
syms HF F H2F2 HF2 NH3 NH4 H OH
eq2= H*F/HF==6.85*10^(-4);
eq3= NH3*H/NH4==6.31*10^(-10);
eq4= H*OH==10^(-14);
eq5= HF2/(HF*F)==3.963;
eq6= H2F2/(HF^2)==2.7;
eq7= H+NH4==OH+F+HF2;
eq8= HF+F+2*H2F2+2*HF2==A;
eq9= NH3+NH4==B;
eqns=[eq2,eq5,eq6,eq8,eq4,eq3,eq9,eq7];
varias=[HF, F, H2F2, HF2, NH3, NH4, H, OH];
assume(HF> 0 & F>= 0 & H2F2>= 0 & HF2>= 0& NH3>= 0 & NH4>= 0 & H>= 0 & OH>= 0)
[HF, F, H2F2, HF2, NH3, NH4, H, OH]=vpasolve(eqns,varias);% [0 max([A,B])])
totalHF=double(HF)+double(F)+double(H2F2)+double(HF2);
HFf=double(HF)/totalHF %fraction of species for HF
H2F2f=double(H2F2)/totalHF %fraction of species for H2F2
Ff=double(F)/totalHF %fraction of species for F-
HF2f=double(HF2)/totalHF %fraction of species for HF2-
The script also needs the helper function mols.m:
%%%% amount of mol, Vol=volume, d=density, pwt=%weight, M=molecularweight
function mol=mols(Vol, d, pwt, M)
mol=(Vol*d*pwt)/M;
end
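Before suspecting vpasolve, it can help to check the total concentrations A and B that feed eq8 and eq9. A minimal sketch using the script's own inputs for the 10 cc case, with mols rewritten as an anonymous function so the snippet runs on its own (the formula is the same as in mols.m):

```matlab
% n = V*d*w/M: volume in cc, density in g/cm^3, weight fraction 0-1, molar mass in g/mol
mols = @(Vol, d, pwt, M) (Vol .* d .* pwt) ./ M;

xhf  = 0.49; Vhf  = 100; dhf  = 1.15; Mhf  = 20.01;   % 49% HF
xnh3 = 0.28; Vnh3 = 10;  dnh3 = 0.9;  Mnh3 = 17.03;   % 28% NH3 (NH4OH)
Vh2o = 1800;                                          % water, cc

nhf  = mols(Vhf, dhf, xhf, Mhf);       % mol HF
nnh3 = mols(Vnh3, dnh3, xnh3, Mnh3);   % mol NH3
Vt   = (Vhf + Vh2o + Vnh3) / 1000;     % total volume in litres

A = nhf / Vt;    % total fluoride concentration, mol/l (right-hand side of eq8)
B = nnh3 / Vt;   % total ammonia concentration, mol/l (right-hand side of eq9)
fprintf('A = %.4f mol/l, B = %.4f mol/l, A/B = %.1f\n', A, B, A/B);
```

If these totals do not match the conditions assumed in the paper, the discrepancy lies in the inputs or the chemistry rather than in the solver.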
The equations used from the article correspond to eq2 to eq9 in the script; (HF)2 in the article is H2F2 in my script.
It appears the issue wasn't so much with MATLAB, although I had some help in that area as well.
The final solution and updated MATLAB code can be found here:
https://chemistry.stackexchange.com/questions/98306/why-do-my-equilibrium-calculations-on-this-hf-nh4oh-buffer-system-not-match-thos

linear combination of curves to match a single curve with integer constraints

I have a set of vectors (curves) which I would like to match to a single curve. The issue isn't only finding the linear combination of the set of curves that most closely matches the single curve (this can be done with least squares, Ax = B). I also need to add constraints, for example limiting the number of curves used in the fit to a particular number, or requiring that the curves used lie next to each other. Constraints like these are typical of mixed-integer linear programming (MILP).
I started with lsqlin, which allows constraints, and I have been able to restrict the variables to be >= 0, but I am at a loss when it comes to adding further constraints. Is there a way to add integer constraints to least squares, or alternatively, is there a way to solve this with a MILP?
Any help in the right direction is much appreciated!
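For reference, one standard way to limit the number of curves is a mixed-integer quadratic program with a binary indicator y_j per curve and a big-M bound. This is a sketch under assumptions: C holds the curves as columns, d is the target curve, U is an assumed upper bound on any weight and K the maximum number of curves. Other logical requirements, such as using only neighbouring curves, can be written as further linear constraints on y.

```latex
\begin{aligned}
\min_{x,\,y}\quad & \lVert Cx - d \rVert_2^2 \\
\text{s.t.}\quad  & 0 \le x_j \le U\, y_j, \qquad j = 1,\dots,n \\
                  & \sum_{j=1}^{n} y_j \le K \\
                  & y_j \in \{0,1\}, \qquad j = 1,\dots,n
\end{aligned}
```

When y_j = 0 the bound forces x_j = 0, so at most K curves receive a nonzero weight; a solver with MIQP support (such as CPLEX) is needed because the objective is quadratic.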
Edit: Based on the suggestion by ErwinKalvelagen, I am attempting to use CPLEX and its quadratic solvers, but so far I have not managed to get it working. I have created a minimal 'not working' example; the data is uploaded here and the code is below. The issue is that MATLAB's least-squares solver lsqlin solves the problem, whereas CPLEX's cplexlsqnonneglin returns "CPLEX Error 5002: %s is not convex" for the same problem.
function [ ] = minWorkingLSexample( )
%MINWORKINGLSEXAMPLE for LS with matlab and CPLEX
%matlab is able to solve the least squares, CPLEX returns error:
% Error using cplexlsqnonneglin
% CPLEX Error 5002: %s is not convex.
%
%
% Error in Backscatter_Transform_excel2_readMut_LINPROG_CPLEX (line 203)
% cplexlsqnonneglin (C,d);
%
load('C_n_d_2.mat')
lb = zeros(size(C,2),1);
options = optimoptions('lsqlin','Algorithm','trust-region-reflective');
[fact2,resnorm,residual,exitflag,output] = ...
lsqlin(C,d,[],[],[],[],lb,[],[],options);
%% CPLEX
ctype = cellstr(repmat('C',1,size(C,2)));
options = cplexoptimset;
options.Display = 'on';
[fact3, resnorm, residual, exitflag, output] = ...
cplexlsqnonneglin (C,d);
end
I could reproduce the CPLEX problem. Here is a workaround: instead of solving the first model, solve an equivalent model that is less nonlinear.
The second model solves fine with CPLEX. The problem is largely a tolerance/numerical issue: for the second model the Q matrix is much better behaved (it is diagonal). Essentially, we moved some of the complexity from the objective into linear constraints.
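Written out, the first model is the bound-constrained least-squares problem that cplexlsqnonneglin solves, and the second introduces the residuals r = Cx - d as extra free variables. This is a reconstruction consistent with the log that follows, where the reduced Q matrix has exactly one nonzero per row:

```latex
\begin{aligned}
\text{Model 1:}\quad & \min_{x}\ \lVert Cx - d \rVert_2^2
  && \text{s.t. } x \ge 0 \\
\text{Model 2:}\quad & \min_{x,\,r}\ r^{\mathsf T} r
  && \text{s.t. } r = Cx - d,\quad x \ge 0,\quad r \text{ free}
\end{aligned}
```

In exact arithmetic the Q matrix of model 1, C'C, is positive semidefinite, so the model is convex; the error is likely caused by rounding in a poorly conditioned C'C. Model 2 only has the identity on r in its objective, which a convexity check accepts trivially.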
You should now see something like:
Tried aggregator 1 time.
QP Presolve eliminated 1 rows and 1 columns.
Reduced QP has 401 rows, 443 columns, and 17201 nonzeros.
Reduced QP objective Q matrix has 401 nonzeros.
Presolve time = 0.02 sec. (1.21 ticks)
Parallel mode: using up to 8 threads for barrier.
Number of nonzeros in lower triangle of A*A' = 80200
Using Approximate Minimum Degree ordering
Total time for automatic ordering = 0.00 sec. (3.57 ticks)
Summary statistics for Cholesky factor:
Threads = 8
Rows in Factor = 401
Integer space required = 401
Total non-zeros in factor = 80601
Total FP ops to factor = 21574201
Itn Primal Obj Dual Obj Prim Inf Upper Inf Dual Inf
0 3.3391791e-01 -3.3391791e-01 9.70e+03 0.00e+00 4.20e+04
1 9.6533667e+02 -3.0509942e+03 1.21e-12 0.00e+00 1.71e-11
2 6.4361775e+01 -3.6729243e+02 3.08e-13 0.00e+00 1.71e-11
3 2.2399862e+01 -6.8231454e+01 1.14e-13 0.00e+00 3.75e-12
4 6.8012056e+00 -2.0011575e+01 2.45e-13 0.00e+00 1.04e-12
5 3.3548410e+00 -1.9547176e+00 1.18e-13 0.00e+00 3.55e-13
6 1.9866256e+00 6.0981384e-01 5.55e-13 0.00e+00 1.86e-13
7 1.4271894e+00 1.0119284e+00 2.82e-12 0.00e+00 1.15e-13
8 1.1434804e+00 1.1081026e+00 6.93e-12 0.00e+00 1.09e-13
9 1.1163905e+00 1.1149752e+00 5.89e-12 0.00e+00 1.14e-13
10 1.1153877e+00 1.1153509e+00 2.52e-11 0.00e+00 9.71e-14
11 1.1153611e+00 1.1153602e+00 2.10e-11 0.00e+00 8.69e-14
12 1.1153604e+00 1.1153604e+00 1.10e-11 0.00e+00 8.96e-14
Barrier time = 0.17 sec. (38.31 ticks)
Total time on 8 threads = 0.17 sec. (38.31 ticks)
QP status(1): optimal
Cplex Time: 0.17sec (det. 38.31 ticks)
Optimal solution found.
Objective : 1.115360
See here for some details.
Update: In Matlab this becomes:
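A minimal sketch of the second model in MATLAB, under these assumptions: C and d below are small made-up stand-ins for the data in C_n_d_2.mat, base MATLAB's lsqnonneg supplies a reference solution, and the call to cplexqp (from IBM's CPLEX for MATLAB connector, which takes quadprog-style arguments) is left commented out because it needs CPLEX installed:

```matlab
% Hypothetical small data standing in for C and d
C = [1 2; 3 4; 5 6; 7 8];
d = [1; 2; 2; 5];
[m, n] = size(C);

% Variables z = [x; r]: n weights x and m residuals r = C*x - d
H   = blkdiag(zeros(n), 2*eye(m));   % 0.5*z'*H*z = r'*r, a diagonal Q
f   = zeros(n + m, 1);
Aeq = [C, -eye(m)];                  % C*x - r = d
beq = d;
lb  = [zeros(n, 1); -inf(m, 1)];     % x >= 0, r free

% Reference solution from base MATLAB's nonnegative least squares
x = lsqnonneg(C, d);
r = C*x - d;
z = [x; r];

% z satisfies the equality constraints, and the QP objective equals ||C*x - d||^2
eqResidual  = norm(Aeq*z - beq);
qpObjective = 0.5 * z' * H * z;
lsObjective = norm(C*x - d)^2;

% With CPLEX available, the reformulated QP could be solved with, e.g.:
% z = cplexqp(H, f, [], [], Aeq, beq, lb);
```

The check at the end only confirms that both formulations describe the same problem; the benefit of the second one is the diagonal Q, not a different optimum.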

How to speed up this MATLAB code in the loop?

I have a matrix of traffic data with 1200000 rows and 18 columns. Each row is unique; the first column is the "vehicle ID", the second is the "frame ID" and the 16th column is the "front vehicle ID". For each row, I want MATLAB to find the row of the front vehicle in the same frame and place it in another matrix called PV; if there is no car in front, it should place a zero vector instead. The whole matrix is called "H1". I used the code below, printing a percentage to track progress. However, it is too slow: it takes more than 14 hours on a machine with 16 GB of RAM. That is too long for me, since I have 10 other such datasets. Please help me make it faster and better.
Thanks in advance.
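n = size(H1,1);
PV = zeros(n,17);                   % preallocate the result
for i=1:n
    i*100/n                         % shows what percent of the work is done
    g = H1(H1(:,1)==H1(i,16),:);    % all rows of the front vehicle
    g = g(g(:,2)==H1(i,2),:);       % ... restricted to the same frame
    if isempty(g)
        PV(i,:) = zeros(1,17);      % no car in front: zero vector
    else
        PV(i,1:17) = g(1,1:17);
    end
end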
EDIT: The data is like a book with 10000 pages. Each page is a frame (the page number is the frame ID) containing many cars, each with a unique vehicle ID. Each page shows an image taken from above with many cars in it; if we put the images together at an interval of 0.1 seconds, we get a film of the vehicles driving. The data includes the x and y coordinates of the vehicles, so each frame can be drawn with MATLAB's plot command. It also includes, in the 16th column of each row, the ID of the preceding vehicle (the vehicle in front of the subject vehicle). It is worth noting that the information for all vehicles is present in the data. If there is no vehicle in front of the subject vehicle, the 16th column is zero. Each row therefore holds the information of only one vehicle, and the rows are sorted by frame ID.
Now I need to extract the row of the preceding vehicle from the whole matrix and place it in the PV matrix. The problem is that progress slows down badly once it reaches about 5%. Here is a sample of the data:
[629 2033...581]
The first column is the vehicle ID, the second is the frame ID and the 16th is the preceding vehicle ID in that frame. Here car 581 is in front of car 629 in frame 2033, so I need to extract the data of vehicle 581 in frame 2033 and place it in the PV matrix.
More samples (again, the first number is the vehicle ID and the second is the frame ID; in the first row, 581 is the preceding vehicle of 629):
[629 2033 688 1113433338200 28.703 462.09 6042802.932 2133529.776 56.3 7.9 3 12.8 5.09 3 581 640 95.39]
[577 2033 465 1113433338200 79.392 618.232 6042833.946 2133691.06 17.3 8.4 2 30.19 -0.37 7 0 3362 0]
[580 2033 621 1113433338200 53.4 542.455 6042817.601 2133612.708 18.3 7.5 2 20.49 -0.09 5 572 3361 80.9]
[581 2033 565 1113433338200 27.252 557.481 6042789.779 2133624.359 16.8 7.4 2 21.25 4.19 3 573 629 62.54]
Sorry for the long explanation and thanks for your help in advance.
With the help of others I found the answer:
First, split the data by frame ID into a cell array, then apply the search within each frame.
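N = max(H1(:,2));
F = cell(1,N);                 % one cell per frame ID
for i=1:N
    i*100/N                    % progress in percent
    F{i} = H1(H1(:,2)==i,:);   % all rows of frame i
end
F = F(~cellfun(@isempty, F));  % drop frame IDs that do not occur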
This code splits the data by frame. Then the following is applied to each frame:
for j=1:numel(F)
    m = size(F{j},1);
    for i=1:m
        i*100/m                                % shows what percent of this frame is done
        g = F{j}(F{j}(:,1)==F{j}(i,16),:);     % preceding vehicle within the same frame
        if isempty(g)
            F{j}(i,18:34) = zeros(1,17);       % no preceding vehicle
        else
            F{j}(i,18:34) = g(1,1:17);         % row of the preceding vehicle
        end
    end
end
Thanks for the help, @hypfco and @m7913d.
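A vectorized alternative, sketched under the question's stated layout (column 1 = vehicle ID, column 2 = frame ID, column 16 = preceding vehicle ID; note that the sample rows shown have the preceding ID in position 15, so set precCol to match your data). ismember with 'rows' matches every (preceding ID, frame ID) pair against all (vehicle ID, frame ID) pairs in one sort-based call, instead of scanning all rows once per vehicle. The small H1 below is built from the sample IDs plus one hypothetical extra frame:

```matlab
% Small stand-in for H1; only the columns used here are filled, the rest stay zero
H1 = zeros(6, 18);
H1(:,1)  = [629; 577; 580; 581; 581; 629];               % vehicle ID
H1(:,2)  = [2033; 2033; 2033; 2033; 2034; 2034];         % frame ID
H1(:,5)  = [28.703; 79.392; 53.4; 27.252; 30.0; 31.0];   % a data column (frame 2034 values hypothetical)
H1(:,16) = [581; 0; 572; 573; 0; 581];                   % preceding vehicle ID (0 = none)

precCol = 16;   % column holding the preceding vehicle ID

% For each row, find the row whose (vehicle ID, frame ID) equals (preceding ID, frame ID)
[found, loc] = ismember([H1(:,precCol), H1(:,2)], H1(:,1:2), 'rows');

% Rows without a match (including preceding ID 0) keep a zero vector
PV = zeros(size(H1,1), 17);
PV(found, :) = H1(loc(found), 1:17);
```

Rows whose preceding vehicle is absent from the same frame get a zero row, as in the original loop. Because the lookup is done once for all rows, it should scale far better than the per-row search, although actual timings depend on the data and machine.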

Decide best 'k' in k-means algorithm in weka

I am using the k-means algorithm for clustering, but I am not sure how to decide the optimal value of k based on the results.
For example, I have applied k-means to a dataset with k=10:
kMeans
======
Number of iterations: 16
Within cluster sum of squared errors: 38.47923197081721
Missing values globally replaced with mean/mode
Cluster centroids:
Cluster#
Attribute Full Data 0 1 2 3 4 5 6 7 8 9
(214) (16) (9) (13) (23) (46) (12) (11) (40) (15) (29)
==============================================================================================================================================================================================================================================================
RI 1.5184 1.5181 1.5175 1.5189 1.5178 1.5172 1.519 1.5255 1.5175 1.5222 1.5171
Na 13.4079 12.9988 14.6467 12.8277 13.2148 13.1896 13.63 12.6318 13.0518 13.9107 14.4421
Mg 2.6845 3.4894 1.3056 0.7738 3.4261 3.4987 3.4917 0.2145 3.4958 3.8273 0.5383
Al 1.4449 1.1844 1.3667 2.0338 1.3552 1.4898 1.3308 1.1891 1.2617 0.716 2.1228
Si 72.6509 72.785 73.2067 72.3662 72.6526 72.6989 72.07 72.0709 72.9532 71.7467 72.9659
K 0.4971 0.4794 0 1.47 0.527 0.59 0.4108 0.2345 0.547 0.1007 0.3252
Ca 8.957 8.8069 9.3567 10.1238 8.5648 8.3041 8.87 13.1291 8.5035 9.5887 8.4914
Ba 0.175 0.015 0 0.1877 0.023 0.003 0.0667 0.2864 0 0 1.04
Fe 0.057 0.2238 0 0.0608 0.2013 0.0104 0.0167 0.1109 0.011 0.0313 0.0134
Type (mode): Full Data = build wind non-float; 0 = build wind float; 1 = tableware; 2 = containers; 3 = build wind non-float; 4 = build wind non-float; 5 = build wind float; 6 = build wind non-float; 7 = build wind float; 8 = build wind float; 9 = headlamps
There are various methods for choosing the optimal value of k in the k-means algorithm: the rule of thumb, the elbow method, the silhouette method, etc. In my work I followed the result of the elbow method and got good results; I did all the analysis in R.
Here is a link describing these methods: link
Follow the sub-links on that page, write code for one of the methods and apply it to your data.
I hope this helps.
All the best with your work.
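Weka prints the within-cluster sum of squared errors for each run, which is what the elbow method uses. A minimal sketch, assuming you rerun SimpleKMeans for k = 1..6 and copy each reported SSE into a vector (the values below are made up for illustration). It picks the k whose SSE lies furthest below the straight line joining the first and last points, one common way to automate "where the curve bends":

```matlab
% Hypothetical within-cluster SSE values for k = 1..6
k   = 1:6;
sse = [100 50 20 15 12 10];

% Straight line (chord) joining the first and last points of the SSE curve
chord = sse(1) + (sse(end) - sse(1)) * (k - k(1)) / (k(end) - k(1));

% Elbow heuristic: the k whose SSE lies furthest below that chord
[~, idx] = max(chord - sse);
bestK = k(idx);
```

Plotting sse against k and inspecting the bend by eye remains a useful sanity check, since this heuristic can be misled by noisy SSE values.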

How do I use matrix and compare to table and interpolate to closest values and generate new matrix?

I have a question regarding interpolation and comparing values from a matrix to another matrix and then generating a new matrix with interpolated values.
I have a matrix with timestamps, wind speeds, and directions that looks like this:
Timestamp Wind speed Direction
13-Apr-2000 00:10:00 9.285 265.59
13-Apr-2000 00:20:00 7.044 261.32
13-Apr-2000 00:30:00 6.578 258.66
13-Apr-2000 00:40:00 7.476 261.43
13-Apr-2000 00:50:00 6.918 260.29
13-Apr-2000 01:00:00 6.832 253.48
13-Apr-2000 01:10:00 6.368 250.11
13-Apr-2000 01:20:00 5.279 260.44
13-Apr-2000 01:30:00 5.27 266.75
My other matrix holds the turbulence intensity (TI) as a function of wind speed (rows, listed in the first column) and direction (columns, listed in the first row):
0 5 10 15 20 25
0 12.368 12.368 12.368 12.7585 13.149 13.149
1 12.368 12.368 12.368 12.7585 13.149 13.149
2 11.934 11.934 11.934 12.4135 12.893 12.893
3 11.726 11.726 11.726 11.917 12.108 12.108
4 11.391 11.391 11.391 11.065 10.739 10.739
5 11.32 11.32 11.32 11.0505 10.781 10.781
6 11.062 11.062 11.062 10.958 10.854 10.854
7 10.932 10.932 10.932 11.0905 11.249 11.249
8 11.244 11.244 11.244 11.294 11.344 11.344
9 12.037 12.037 12.037 11.757 11.477 11.477
10 11.934 11.934 11.934 11.8795 11.825 11.825
I want to write a function where my input is the matrix with my timestamp, wind speed, and direction. I then want the function to consider each wind speed and direction at each timestamp and then interpolate to the closest value of the turbulence in my turbulence matrix.
I then want the function to generate a new time series (matrix) with my new values for turbulence at the same timestamp as in the original time series for each considered wind speed and direction.
How can I do this?
I'm using MATLAB 2011b and I don't have SIMULINK.
I use the fit function (Curve Fitting Toolbox) to interpolate the data in the Tt matrix.
First, the code looks for the nearest lower values of both Speed and Direction in the table. It then takes the four table points around the point of interest and fits a plane of the form z = ax + by + c to them. Finally, it evaluates that plane at the point of interest.
% Data we have
Tt=[nan 0 5 10 15 20 25;
0 12.368 12.368 12.368 12.7585 13.149 13.149;
1 12.368 12.368 12.368 12.7585 13.149 13.149;
2 11.934 11.934 11.934 12.4135 12.893 12.893;
3 11.726 11.726 11.726 11.917 12.108 12.108;
4 11.391 11.391 11.391 11.065 10.739 10.739;
5 11.32 11.32 11.32 11.0505 10.781 10.781;
6 11.062 11.062 11.062 10.958 10.854 10.854;
7 10.932 10.932 10.932 11.0905 11.249 11.249;
8 11.244 11.244 11.244 11.294 11.344 11.344;
9 12.037 12.037 12.037 11.757 11.477 11.477;
10 11.934 11.934 11.934 11.8795 11.825 11.825];
% Data we are looking for
Speed=4.5;
Direction=12.5;
[M,N]=size(Tt);
Sindex=find(Tt(:,1)<=Speed,1,'last'); % row of the nearest lower Speed (the NaN corner never matches)
Dindex=find(Tt(1,:)<=Direction,1,'last'); % column of the nearest lower Direction
if ~isempty(Sindex)&&~isempty(Dindex)&&Sindex<M&&Dindex<N
% Speed and Direction are defined in the Tt table
S=[Tt(Sindex,1)*[1;1];Tt(Sindex+1,1)*[1;1]];
D=[Tt(1,Dindex);Tt(1,Dindex+1);Tt(1,Dindex);Tt(1,Dindex+1)];
T=[Tt(Sindex,Dindex);Tt(Sindex,Dindex+1);Tt(Sindex+1,Dindex);Tt(Sindex+1,Dindex+1)];
% S,D,T are in form of [x1;x1;x2;x2],[y1;y2;y1;y2],[z11;z12;z21;z22] vectorized matrix.
Tfit=fit([S,D],T,'poly11'); % get the linear fit of data, type help fit for more info
Turbulence=feval(Tfit,[Speed,Direction]) %Here we have the wanted Turbulence value.
else
% What should we do when the data are outside the table?
end
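If the Curve Fitting Toolbox is not available, base MATLAB's interp2 performs exact bilinear interpolation on the table and handles a whole time series in one call, returning one value per input row so the result can sit next to the original timestamp column. A minimal sketch, assuming the speed and direction axes are exactly the table's first column and first row; the second sample is the 00:30:00 row from the question, whose direction lies outside the table's 0-25 range:

```matlab
% Turbulence table from the question: rows = wind speed, columns = direction
speeds     = (0:10)';
directions = [0 5 10 15 20 25];
TI = [12.368 12.368 12.368 12.7585 13.149 13.149;
      12.368 12.368 12.368 12.7585 13.149 13.149;
      11.934 11.934 11.934 12.4135 12.893 12.893;
      11.726 11.726 11.726 11.917  12.108 12.108;
      11.391 11.391 11.391 11.065  10.739 10.739;
      11.32  11.32  11.32  11.0505 10.781 10.781;
      11.062 11.062 11.062 10.958  10.854 10.854;
      10.932 10.932 10.932 11.0905 11.249 11.249;
      11.244 11.244 11.244 11.294  11.344 11.344;
      12.037 12.037 12.037 11.757  11.477 11.477;
      11.934 11.934 11.934 11.8795 11.825 11.825];

% Wind speed / direction samples to look up
ws   = [4.5; 6.578];
wdir = [12.5; 258.66];

% Bilinear interpolation of all samples at once; points outside the grid give NaN
TIseries = interp2(directions, speeds, TI, wdir, ws);
```

Unlike the plane fit, this passes exactly through the four surrounding table values, and NaN marks samples that fall outside the table.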
Edit (following a comment):
If the Tt matrix is in column form [S,D,T], i.e. one row per speed/direction/turbulence triple, then
Tfit=fit(Tt(:,1:2),Tt(:,3),'linearinterp');
interpolates over the whole matrix.
If you have several Tt matrices, I'd recommend a two-step approach. First, create your TtDataBase.mat with
save('TtDataBase.mat','Tfit1') % Run this in workspace for the first time only
save('TtDataBase.mat','Tfit2','-append') % Run this for the other Tfits
The first command creates the .mat file (overwriting an existing one), while the second appends the new variable to the existing .mat file (overwriting a variable of the same name, if present).
I'd recommend also saving a description of each Tfit, for example its range of validity.
In the second step you can use
load('TtDataBase.mat','Tfit2') % load Tfit2 only
load('TtDataBase.mat') % load all variables in TtDataBase
If you saved such a description, you can use it to locate the proper Tfit:
load('TtDataBase.mat','description')
% decide which Tfit is good for the situation
% some code here
ProperFit='Tfit5'; % say the automated "logic" chooses Tfit5 as the best
S=load('TtDataBase.mat',ProperFit); % load only Tfit5, into a struct
Tfit=S.(ProperFit); % Tfit is a copy of Tfit5 (dynamic field name, no eval needed)
Turbulence=feval(Tfit,[Speed,Direction]); % actual Tfit5 data will be used for interpolation.
If you obtain a new Tt matrix and you already have description variable(s), do not forget to load the old description(s) and append the new ones to them before calling save('TtDataBase.mat','description','description2','-append'), because that command overwrites any variables of the same name already stored in the file.
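A minimal sketch of that bookkeeping, assuming a temporary file and placeholder matrices in place of real cfit objects (Tfit1, Tfit2 and the description texts are hypothetical). The description list is loaded, extended and saved back, so the '-append' save does not discard earlier entries:

```matlab
% Hypothetical database file in the temporary folder
dbFile = [tempname '.mat'];

% First run: store one fit (a placeholder matrix here) and its description
Tfit1 = magic(3);
description = {'Tfit1: speeds 0-10, directions 0-25'};
save(dbFile, 'Tfit1', 'description');

% Later: add Tfit2 and extend (not overwrite) the existing description list
Tfit2 = eye(3);
S = load(dbFile, 'description');
description = [S.description, {'Tfit2: hypothetical second table'}];
save(dbFile, 'Tfit2', 'description', '-append');

% Load a fit chosen by name, without eval
ProperFit = 'Tfit2';
S = load(dbFile, ProperFit);
Tfit = S.(ProperFit);
```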