Heatmap dendrogram using "ComplexHeatmap" package - cluster-analysis

I'm using the "ComplexHeatmap" package to create a heatmap of the correlations in a matrix.
I want to use my own clustering for the heatmap's dendrograms, so I run the code below:
library(ComplexHeatmap)
mat = matrix(rnorm(800), 80, 10)
cor.mat = cor(mat)
dist.mat = (1 - cor.mat) / 2
rowdist = dist(as.matrix(dist.mat), method = "euclidean")
rowcluster = hclust(rowdist, method = "ward.D2")
coldist = dist(t(as.matrix(dist.mat)), method = "euclidean")
colcluster = hclust(coldist, method = "ward.D2")
par(mfrow = c(1, 2)); plot(rowcluster); plot(colcluster)
Heatmap(cor.mat, cluster_rows = rowcluster, cluster_columns = colcluster)
The problem is that I get different orderings on the rows and columns (an asymmetrical heatmap), even though the two cluster objects are identical.
Even if I pass the Heatmap function the exact same object for rows and columns, it still displays a different order for rows and columns.
If I just plot the dendrograms, i.e. plot(rowcluster) or plot(colcluster), they are identical.
I want to get a symmetrical heatmap.
Any idea why this happens?
Thanks

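Use the same cluster object for both, e.g. colcluster = rowcluster.
There is no need to transpose, since the matrix is symmetric.
But note that you already have a distance matrix, so "euclidean" is wrong: you are computing a distance matrix of your distance matrix! Pass it to hclust directly instead, e.g. hclust(as.dist(dist.mat), method = "ward.D2").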

Related

Matlab xcorr giving different values for different implementations of cross-correlation

I am trying to perform a cross-correlation but noticed that performing this in two different ways results in slightly different results.
I have a vector with some spikes ('dual_spikes') and I want to cross-correlate this with 'dips' (using xcorr in Matlab).
I noticed a difference when I perform this in two different ways:
perform an xcorr as normal with 'dual_spikes'
perform an xcorr with each individual spike, add them together, and normalise.
I do not see why there should be a difference. The following function illustrates it.
function [] = xcorr_fault()
dual_spikes = [zeros(1,200),ones(1,200),zeros(1,400),ones(1,100),zeros(1,100)];
dips = 1-[zeros(1,400),ones(1,1),zeros(1,599)];
plot(dips)
single_spike_1 = [zeros(1,200),ones(1,200),zeros(1,600)];
single_spike_2 = [zeros(1,800),ones(1,100),zeros(1,100)];
xcorr_dual = xcorr_div(dual_spikes,dips);
xcorr_single1 = xcorr_div(single_spike_1,dips);
xcorr_single2 = xcorr_div(single_spike_2,dips);
xcorr_single_all = (xcorr_single1+xcorr_single2)/max(xcorr_single1+xcorr_single2);
xcorr_dual_norm = xcorr_dual/max(xcorr_dual);
figure(1)
clf
hold all
plot(xcorr_dual_norm)
plot(xcorr_single_all)
legend('Single xcorr','xcorr with individual spikes')
function [xcorr_norm] = xcorr_div(lines,signal)
xcorr_signal = xcorr(signal,lines,'none');
xcorr_signal(xcorr_signal<1e-13) = NaN;
xcorr_bg = xcorr(ones(1,length(signal)),lines,'none');
xcorr_norm = xcorr_signal ./ xcorr_bg;
xcorr_norm(isnan(xcorr_norm)) = 1;
Note: the xcorr signal must be divided by a 'background' (bg) so that only the dips are found. This happens in 'xcorr_div'.
Your function xcorr_div computes the cross-correlation, then divides the result by the cross-correlation with a uniform signal. The result is a kind of normalized cross-correlation (not the standard definition) that is not linear. Thus you should not expect the sum of the results to equal the result of the sum.
If you want to get the same result both ways, output both xcorr_signal and xcorr_bg from xcorr_div, sum each of them over the individual spikes, and only then divide.
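The linearity argument can be checked on a toy example. The Python sketch below is not the asker's MATLAB code: `xcorr_full` is a hand-written unscaled cross-correlation intended to follow the `xcorr(x, y, 'none')` convention, and `divide` uses 0 where the background is 0 instead of the NaN handling in `xcorr_div`. It shows that numerators and backgrounds add up across spikes, that summing them first reproduces the dual-spike result, and that dividing first does not.

```python
def xcorr_full(x, y):
    """Unscaled full cross-correlation R[k] = sum_m x[m + k] * y[m],
    for lags k = -(n-1) .. n-1 (x and y must have equal length)."""
    n = len(x)
    return [
        sum(x[m + k] * y[m] for m in range(n) if 0 <= m + k < n)
        for k in range(-(n - 1), n)
    ]


def divide(num, bg):
    """Element-wise num / bg, using 0 where the background is 0."""
    return [a / b if b else 0.0 for a, b in zip(num, bg)]


def normalise(v):
    peak = max(v)
    return [a / peak for a in v]


# Made-up example: one dip at index 2, two individual spikes and their sum.
signal = [1, 1, 0, 1, 1, 1]
spike_a = [0, 1, 1, 0, 0, 0]
spike_b = [0, 0, 0, 0, 1, 0]
dual = [a + b for a, b in zip(spike_a, spike_b)]
ones = [1] * len(signal)

num_a, bg_a = xcorr_full(signal, spike_a), xcorr_full(ones, spike_a)
num_b, bg_b = xcorr_full(signal, spike_b), xcorr_full(ones, spike_b)
num_dual, bg_dual = xcorr_full(signal, dual), xcorr_full(ones, dual)

# Dividing first and summing afterwards (the question's approach) ...
sum_of_ratios = normalise(
    [a + b for a, b in zip(divide(num_a, bg_a), divide(num_b, bg_b))]
)
# ... versus summing numerators and backgrounds first, then dividing.
ratio_of_sums = normalise(
    divide([a + b for a, b in zip(num_a, num_b)], [a + b for a, b in zip(bg_a, bg_b)])
)
direct = normalise(divide(num_dual, bg_dual))  # xcorr_div-style on the dual spikes
```

Here `ratio_of_sums` matches `direct` exactly, while `sum_of_ratios` has a different shape even after normalising by its maximum.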

Adjacency matrix from edge list (preferrably in Matlab)

I have a list of triples (vertex1, vertex2, weight) representing the edges of a weighted directed graph. Since the prototype is being implemented in Matlab, these are imported as an Nx3 matrix, where N is the number of edges. So the naive implementation of this is
id1 = L(:,1);
id2 = L(:,2);
weight = L(:,3);
m = max(max(id1, id2)) % to find the necessary size
V = zeros(m,m)
for i=1:size(L,1) % loop over the N edges, not the m vertices
V(id1(i),id2(i)) = weight(i)
end
The trouble with tribbles is that "id1" and "id2" are nonconsecutive; they're codes. This gives me two problems: (1) huge matrices with far too many "phantom", spurious vertices, which distorts the results of the algorithms to be used with that matrix, and (2) I need to recover the codes in the results of said algorithms (suffice it to say this would be trivial if the ID codes were consecutive, 1:m).
Answers in Matlab are preferable, but I think I can hack back from answers in other languages (as long as they're not pre-packaged solutions of the kind "R has a library that does this").
I'm new to StackOverflow, and I hope to be contributing meaningfully to the community soon. For the time being, thanks in advance!
Edit: This would be a solution if we didn't have vertices that are the origin of multiple edges. (It implies a 1:1 match between the list of edge origins and the list of identities.)
for i=1:n
for j=1:n
if id1(i) > 0 && id2(j) > 0
V(i,j) = weight(i);
end
end
end
You can use the function sparse:
sparse(id1,id2,weight,m,m)
If your problem is that the node ID numbers are nonconsecutive, why not re-map them onto consecutive integers? All you need to do is create a dictionary of all unique node IDs and their corresponding new IDs.
This is really no different to the case where you're asked to work with named nodes (Australia, Britain, Canada, Denmark...) - you would map these onto consecutive integers first.
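A minimal sketch of that dictionary-based re-mapping, written in Python rather than Matlab (the function name `build_adjacency` and the sample codes are hypothetical). Codes are gathered from both columns so that vertices with no outgoing edges still get a row and column, and the sorted list of codes doubles as the reverse mapping.

```python
def build_adjacency(edges):
    """Map arbitrary vertex codes onto 0..k-1 and fill a dense k x k matrix.

    edges: list of (source_code, target_code, weight) triples.
    Returns (matrix, codes) where codes[i] is the original code of row/column i.
    """
    # Collect codes from both columns so sink vertices are not lost.
    codes = sorted({s for s, _, _ in edges} | {t for _, t, _ in edges})
    index = {code: i for i, code in enumerate(codes)}
    k = len(codes)
    matrix = [[0.0] * k for _ in range(k)]
    for s, t, w in edges:
        matrix[index[s]][index[t]] = w  # a repeated (s, t) pair overwrites here
    return matrix, codes


# Hypothetical edge list with sparse, non-consecutive vertex codes.
edges = [(1001, 57, 0.5), (57, 9000, 2.0), (1001, 9000, 1.5)]
matrix, codes = build_adjacency(edges)
```

The matrix is only 3x3 instead of 9000x9000, and `codes[i]` turns any row or column index in an algorithm's output back into the original vertex code.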
You can use the GRP2IDX function to convert your ID codes to consecutive numbers; the IDs can be numeric or not, it does not matter. Just keep the mapping information.
[idx1, gname1, gmap1] = grp2idx(id1);
[idx2, gname2, gmap2] = grp2idx(id2);
You can recover the original ids with gmap1(idx1).
If your id1 and id2 are from the same set you can apply grp2idx to their union:
[idx, gname,gmap] = grp2idx([id1; id2]);
idx1 = idx(1:numel(id1));
idx2 = idx(numel(id1)+1:end);
For the reordering, see a recent question: how to assign a set of coordinates in Matlab?
You can use ACCUMARRAY or SUB2IND to solve this problem.
V = accumarray([idx1 idx2], weight);
or
V = zeros(max(idx1),max(idx2)); %# or V = zeros(max(idx));
V(sub2ind(size(V),idx1,idx2)) = weight;
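Check whether you have non-unique combinations of id1 and id2; you will have to take care of those. accumarray adds up the weights of repeated pairs, whereas the sub2ind assignment keeps only one weight per pair.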
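The difference between the two options only shows up when a pair repeats. The Python sketch below mimics the two behaviours with dictionaries (the helper names and sample indices are hypothetical, and this is not Matlab's implementation): one sums repeated pairs, as accumarray does by default, the other keeps the last weight assigned. Which one is right depends on what the weights mean.

```python
from collections import defaultdict


def accumulate_edges(pairs, weights):
    """Sum the weights of repeated (row, col) pairs."""
    acc = defaultdict(float)
    for pair, w in zip(pairs, weights):
        acc[pair] += w
    return dict(acc)


def assign_edges(pairs, weights):
    """Plain assignment: a repeated (row, col) pair keeps only its last weight."""
    out = {}
    for pair, w in zip(pairs, weights):
        out[pair] = w
    return out


# Hypothetical already-remapped indices in which the pair (1, 2) appears twice.
pairs = [(1, 2), (2, 3), (1, 2)]
weights = [1.0, 4.0, 2.5]
summed = accumulate_edges(pairs, weights)
last_wins = assign_edges(pairs, weights)
```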
Here is another solution:
First, gather the vertex IDs from both columns, since there might be a sink vertex (one with no outgoing edges) in your graph:
v_id_from = edge_list(:,1);
v_id_to = edge_list(:,2);
v_id_all = [v_id_from; v_id_to];
Then find the unique vertex ids:
v_id_unique = unique(v_id_all);
Now you can use the ismember function to get the mapping between your vertex ids and their consecutive index mappings:
[~,from] = ismember(v_id_from, v_id_unique);
[~,to] = ismember(v_id_to, v_id_unique);
Now you can use sub2ind to populate your adjacency matrix:
adjacency_matrix = zeros(numel(v_id_unique)); % one row/column per unique vertex, not per edge
linear_ind = sub2ind(size(adjacency_matrix), from, to);
adjacency_matrix(linear_ind) = edge_list(:,3);
You can always go back from the mapped consecutive id to the original vertex id:
original_vertex_id = v_id_unique(mapped_consecutive_id);
Hope this helps.
Your first solution is close to what you want. However, it is probably best to iterate over your edge list instead of the adjacency matrix.
edge_indexes = edge_list(:, 1:2);
n_vertices = max(edge_indexes(:)); % largest vertex ID, not the number of edges
adj_matrix = zeros(n_vertices);
for local_edge = edge_list' %transpose in order to iterate by edge
adj_matrix(local_edge(1), local_edge(2)) = local_edge(3);
end

MATLAB XYZ to Grid

I have a tab separated XYZ file which contains 3 columns, e.g.
586231.8 2525785.4 15.11
586215.1 2525785.8 14.6
586164.7 2525941 14.58
586199.4 2525857.8 15.22
586219.8 2525731 14.6
586242.2 2525829.2 14.41
Columns 1 and 2 are the X and Y coordinates (in UTM meters) and column 3 is the associated Z value at the point (X,Y), e.g. the elevation z at a point is given as z(x,y).
I can read in this file using dlmread() to get 3 variables in the workspace, e.g. X = 41322x1 double, but I would like to create a surface of size (m x n) using these variables. How would I go about this?
Following the comments below, I tried using TriScatteredInterp (see the commands below), but the result only seems to capture part of my surface.
Any ideas what is causing this? I think the problem lies with the meshgrid command, though I'm not sure where (or why). I am currently using the following commands (my X and Y columns are in meters, and I know my grid size is 8 m, hence ti/tj going up in steps of 8):
F = TriScatteredInterp(x,y,z,'nearest');
ti = ((min(x)):8:(max(x)));
tj = ((min(y)):8:(max(y)));
[qx,qy] = meshgrid(ti,tj);
qz = F(qx,qy);
imagesc(qz) % produces the result described above
I think you want the griddata function. See Interpolating Scattered Data in MATLAB help.
griddata and TriScatteredInterp are extremely slow. Use the utm2deg function from the File Exchange to convert to latitude/longitude, and from there a combination of vec2mtx to make a regular grid and imbedm to fit the data to the grid.
E.g.:
% utm2deg (File Exchange) converts the UTM coordinates to Lat/Lon; no loop needed
[Lat, Lon] = utm2deg(Easting, Northing, Zone);
[Grid, R] = vec2mtx(Lat, Lon, gridsize); % regular grid covering the points
Grid = imbedm(Lat, Lon, z, Grid, R); % place the z values into the grid cells
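The "fit the data to a regular grid" step can be illustrated with simple cell averaging. The Python sketch below is a swapped-in technique (binning, not interpolation and not imbedm itself); `bin_to_grid` is a hypothetical helper and the 8 m cell size comes from the question. Each point is dropped into the square cell that contains it, cells with several points get the mean z, and empty cells stay NaN.

```python
import math
from collections import defaultdict


def bin_to_grid(points, cell):
    """Average scattered (x, y, z) points into square cells of size `cell`.

    Returns (grid, x0, y0): grid[row][col] is the mean z of the points in that
    cell, or NaN for empty cells; row 0 holds the smallest y values.
    """
    x0 = min(p[0] for p in points)
    y0 = min(p[1] for p in points)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, z in points:
        key = (int((y - y0) // cell), int((x - x0) // cell))
        sums[key] += z
        counts[key] += 1
    n_rows = 1 + max(r for r, _ in sums)
    n_cols = 1 + max(c for _, c in sums)
    grid = [[math.nan] * n_cols for _ in range(n_rows)]
    for (r, c), total in sums.items():
        grid[r][c] = total / counts[(r, c)]
    return grid, x0, y0


# The six sample rows from the question, gridded at 8 m.
points = [
    (586231.8, 2525785.4, 15.11),
    (586215.1, 2525785.8, 14.6),
    (586164.7, 2525941.0, 14.58),
    (586199.4, 2525857.8, 15.22),
    (586219.8, 2525731.0, 14.6),
    (586242.2, 2525829.2, 14.41),
]
grid, x0, y0 = bin_to_grid(points, 8.0)
```

With only six sample points most cells are empty; with the full 41322-point file on its native 8 m spacing, most cells should receive a value.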
Maybe you are looking for the function "ndgrid(x,y)" or "meshgrid(x,y)".

matlab diagonal code

tesconf is a 3x3 matrix:
tes_avg = (diag(tesconf)./sum(tesconf,2));
For example, suppose tes_avg comes out as [0.345; 0.3423; 0.483].
However, I want the average of these 3 values. How should I change the code above? Please advise...
Simply avg = mean(tes_avg); (or directly tes_avg = mean(diag(tesconf)./sum(tesconf,2));).
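Spelled out in Python, the one-liner divides each diagonal entry by its row sum and then averages the results. The matrix below is a made-up stand-in for tesconf; if tesconf is a confusion matrix (an assumption), this is the mean per-class recall.

```python
def per_row_ratio(m):
    """diag(m) ./ sum(m, 2): each diagonal entry divided by its row sum."""
    return [row[i] / sum(row) for i, row in enumerate(m)]


# Hypothetical 3x3 matrix standing in for tesconf.
tesconf = [[5, 3, 2], [1, 6, 3], [2, 2, 6]]
tes_avg = per_row_ratio(tesconf)
avg = sum(tes_avg) / len(tes_avg)  # the equivalent of mean(tes_avg)
```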

Matlab "interp2" problem regarding NaN at edges

I am a bit stuck on a simple exercise and would appreciate some help.
I am trying to do some simple 2D interpolation using the "interp2" function in Matlab for a variable 'tmin' of dimension [15x12]:
lat = 15:1.5:32;
lon = 70:1.5:92;
lat_interp = 15:1:32;
lon_interp = 70:1:92;
[X,Y] = meshgrid(lat,lon);
[Xi,Yi] = meshgrid(lat_interp,lon_interp);
tmin_interp = zeros(length(lon_interp),length(lat_interp),Num_Days);
tmin_interp(:,:) = interp2(X,Y,tmin(:,:),Xi,Yi,'linear');
This code results in the last row and last column of tmin_interp to be NaNs, i.e.:
tmin_interp(23,1:18) ==> NaN
tmin_interp(1:23,18) ==> NaN
Does anyone know what I might be doing wrong? Am I making a simple mistake with regards to the interpolation setup? Thank you for your time.
The reason they are NaNs is that the last query points lie outside your data grid: 15:1.5:32 stops at 31.5 and 70:1.5:92 stops at 91, whereas lat_interp and lon_interp go up to 32 and 92. Linear interpolation estimates the value at Xi,Yi from the data points on either side of it; beyond the edge of the grid there is nothing on the far side, so interp2 returns NaN.
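The edge effect can be reproduced in one dimension. In the Python sketch below, `colon` is a hypothetical helper mimicking Matlab's `start:step:stop`, and `interp1_linear` is a plain linear interpolator that returns `extrapval` (NaN by default) outside the data range, a simplified stand-in for what interp2 does along each axis. The data values are made up; only the grids come from the question.

```python
import bisect
import math


def colon(start, step, stop):
    """Mimic start:step:stop; stop is included only if the steps land on it."""
    count = math.floor((stop - start) / step + 1e-10) + 1
    return [start + i * step for i in range(count)]


def interp1_linear(xs, ys, q, extrapval=math.nan):
    """Linear interpolation on sorted xs; queries outside [xs[0], xs[-1]] get extrapval."""
    if q < xs[0] or q > xs[-1]:
        return extrapval
    i = bisect.bisect_right(xs, q) - 1
    if i == len(xs) - 1:  # q sits exactly on the last sample
        return ys[-1]
    t = (q - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


lat = colon(15, 1.5, 32)       # data grid from the question: ends at 31.5
lat_interp = colon(15, 1, 32)  # query grid from the question: ends at 32
values = [float(k) for k in range(len(lat))]  # made-up data along one row
row = [interp1_linear(lat, values, q) for q in lat_interp]
```

Only the last entry of `row` is NaN, matching the NaN column in tmin_interp; passing `extrapval=0.0` turns it into 0 instead.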
You can use the extrapval argument to assign a fixed value to points outside the X,Y you specify. Just add the value 0 after 'linear':
interp2(X,Y,tmin(:,:),Xi,Yi,'linear', 0);
This will put zero at the points 'on the edge'. However, it is likely that beyond your grid the field falls off to some default value, like zero. To model that, you can pad tmin with zeros on every side:
tmin_padded = [ zeros(1,size(tmin,2)+2)
zeros(size(tmin,1),1) tmin zeros(size(tmin,1),1)
zeros(1,size(tmin,2)+2) ];
(I haven't checked this, but you get the idea.) You will also need to add matching pre- and post-values to X and Y.
Use some other value, if that's the 'outside' or 'default' value of tmin.
P.S. Why are you creating tmin_interp as 3-dimensional?
Or just try:
interp2(X,Y,tmin(:,:),Xi,Yi,'spline');
to avoid imposing the 0 value.
HTH!