Addressing Composite-Keyed 1NF Matlab Table with Arrays - matlab

The Problem
Consider a table object, tbl, defined below, that holds 1NF-normalized data with a two-part composite key made up of the unordered pair of Species1 and Species2 values.
% build table ( in place of readtable )
dat = {"example1" "example1" "example1" "example2" "example4" "example3"; ...
"example2" "example3" "example4" "example3" "example2" "example4"; ...
0.1896 -0.0119 -0.0070 0.3257 0.1140 0.2086 };
tbl = cell2table(dat','VariableNames', {'Species1','Species2','k_ij'});
% clear temp vars
clear('dat')
You are given a one-dimensional string array, names, that holds a set of keys that can appear in either the Species1 or the Species2 column.
% Make an example 1x4 string array
names = ["example1" "example2" "example3" "example4"];
How can an n-by-n array of values from the k_ij column, indexed by all combinations of the elements in names, be constructed? Can this be done using only matrix operations?
Assume that self-self interactions have a value of zero; that is, k_ij = 0 when the two keys are equal. The desired output for this example is:
% spec.1 spec.2 spec.3 spec.4
out = [ 0 0.1896 -0.0119 -0.0070; ... % species 1
0.1896 0 0.3257 0.1140; ... % species 2
-0.0119 0.3257 0 0.2086; ... % species 3
-0.0070 0.1140 0.2086 0 ] % species 4
My Approach Thus Far
My intuition is that the best approach is to form an n-by-n array holding all combinations of names, as shown below.
% make grids come together
[names_1, names_2] = meshgrid(names,names');
% cross join the names
names_mat = reshape( ...
cat(2,names_1',names_2'), ...
length(names), ...
length(names), ...
2);
% clear temp vars
clear('names_1','names_2')
I think the next step would be to use the values in names_mat(i,j,:) to address tbl and extract an n-by-n array of the associated values from the k_ij column, giving a matrix of the form shown below.
This is the part that I cannot figure out:
% spec.1 spec.2 spec.3 spec.4
tmp = [ NaN 0.1896 -0.0119 -0.0070; ... % species 1
NaN NaN 0.3257 NaN ; ... % species 2
NaN NaN NaN 0.2086; ... % species 3
NaN 0.1140 NaN NaN ] % species 4
Because we know that the self-self interactions are always zero, we can set the diagonal to zero.
tmp(eye(length(names))==1) = 0
This yields a matrix of the form:
% spec.1 spec.2 spec.3 spec.4
tmp = [ 0 0.1896 -0.0119 -0.0070; ... % species 1
NaN 0 0.3257 NaN ; ... % species 2
NaN NaN 0 0.2086; ... % species 3
NaN 0.1140 NaN 0 ] % species 4
Then we can reflect across the diagonal to get the desired output:
% grab vals across the diagonal for each value that is NaN
mirror_vals = tmp(isnan(tmp)');
% flip over the diagonal
tmp = tmp';
% stuff grabbed values into NaN indices
tmp(isnan(tmp)) = mirror_vals;
% push to output var
out = tmp;
% clear temporary variables
clear('tmp','mirror_vals')
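A minimal sketch of the missing lookup step, using a different technique from the meshgrid/NaN-mirroring approach above. It assumes each unordered pair appears at most once in tbl, and it skips rows whose keys are not in names. ismember maps each key column to positions in names, and sub2ind writes every k_ij value to both (i,j) and (j,i) because the key is unordered; starting from zeros takes care of the self-self case. Wrapping the key columns in string() is a precaution, since the exact type cell2table gives them is not pinned down by the question.

```matlab
% Example data from the question
dat = {"example1" "example1" "example1" "example2" "example4" "example3"; ...
       "example2" "example3" "example4" "example3" "example2" "example4"; ...
       0.1896 -0.0119 -0.0070 0.3257 0.1140 0.2086 };
tbl = cell2table(dat', 'VariableNames', {'Species1','Species2','k_ij'});
names = ["example1" "example2" "example3" "example4"];

n = numel(names);
% Position of each key in names (0 where a key is absent)
[in1, r] = ismember(string(tbl.Species1), names);
[in2, c] = ismember(string(tbl.Species2), names);
keep = in1 & in2;                 % skip rows whose keys are not in names

% Zeros on the diagonal cover the self-self case
out = zeros(n);
% The key is unordered, so fill both (r,c) and (c,r)
out(sub2ind([n n], r(keep), c(keep))) = tbl.k_ij(keep);
out(sub2ind([n n], c(keep), r(keep))) = tbl.k_ij(keep);
disp(out)
```

There are no loops here, which may also answer the "only matrix operations" part of the question.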

Related

Putting a smaller 3D matrix into a bigger 3D matrix (3D sub2ind)

I need to put a smaller 3D matrix into a bigger 3D matrix. Here is an example.
Suppose I have the following 3D matrices:
%A is the big matrix
A(:,:,1)=[ 0.3545 0.8865 0.2177
0.9713 0.4547 0.1257
0.3464 0.4134 0.3089];
A(:,:,2)=[ 0.7261 0.0098 0.7710
0.7829 0.8432 0.0427
0.6938 0.9223 0.3782];
A(:,:,3) = [0.7043 0.2691 0.6237
0.7295 0.6730 0.2364
0.2243 0.4775 0.1771];
%B is the small matrix
B(:,:,1) = [0.3909 0.5013
0.0546 0.4317];
B(:,:,2) =[0.4857 0.1375
0.8944 0.3900];
B(:,:,3) =[0.7136 0.3433
0.6183 0.9360];
Now I want to put B into A at rows [1 3] and columns [2 3], on pages 1, 2 and 3 of A. For the given matrices, the result is:
NewA(:,:,1) = [0.3545 0.3909 0.5013   % row 1 takes B(1,:,1)
               0.9713 0.4547 0.1257
               0.3464 0.0546 0.4317]; % row 3 takes B(2,:,1)
NewA(:,:,2) = [0.7261 0.4857 0.1375   % row 1 takes B(1,:,2)
               0.7829 0.8432 0.0427
               0.6938 0.8944 0.3900]; % row 3 takes B(2,:,2)
NewA(:,:,3) = [0.7043 0.7136 0.3433   % row 1 takes B(1,:,3)
               0.7295 0.6730 0.2364
               0.2243 0.6183 0.9360]; % row 3 takes B(2,:,3)
The pages won't necessarily be square, and the size of A that B goes into can vary as well, but the matrices will always be 3D. The above is just a small example; in practice A can be as big as [500,500,5] and B as big as [350,350,4].
sub2ind does this for 2D matrices, but I haven't managed to adapt it to 3D matrices.
Something like:
NewA = A;
NewA(sub2ind(size(A), [1 3], [2 3], [1 2 3])) = B;
but it gives:
Error using sub2ind (line 69)
The subscript vectors must all be of the same size.
How can I do this?
You don't need sub2ind; just assign directly. For example, to replace columns 2:3 of row 1 on every page:
newA = A;
newA(1,2:3,:) = B(1,:,:)
If you want to use sub2ind, you need to specify all 3 subscripts for each of the elements you want to replace:
newA = A;
dim1A=[1 1 1 1 1 1]; % always first row
dim2A=[2 3 2 3 2 3]; % second and third column, for each slice
dim3A=[1 1 2 2 3 3]; % two elements from each slice
newA(sub2ind(size(A),dim1A,dim2A,dim3A))=B(1,:,:)
newA(:,:,1) =
0.3545 0.3909 0.5013
0.9713 0.4547 0.1257
0.3464 0.4134 0.3089
newA(:,:,2) =
0.7261 0.4857 0.1375
0.7829 0.8432 0.0427
0.6938 0.9223 0.3782
newA(:,:,3) =
0.7043 0.7136 0.3433
0.7295 0.6730 0.2364
0.2243 0.4775 0.1771
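The direct assignment above replaces only row 1. A sketch for placing all of B, assuming the target rows, columns and pages are given as index vectors whose lengths match size(B): index all three dimensions of A at once. The ndgrid + sub2ind variant is included only to show what the question's sub2ind call was missing, namely one (row, column, page) triple per element. The random A and B are stand-ins for the question's data.

```matlab
% Stand-ins for the question's A (3x3x3) and B (2x2x3)
A = rand(3, 3, 3);
B = rand(2, 2, 3);

rows  = [1 3];    % target rows in A,    numel(rows)  == size(B,1)
cols  = [2 3];    % target columns in A, numel(cols)  == size(B,2)
pages = 1:3;      % target pages in A,   numel(pages) == size(B,3)

% Direct subscript assignment: copies every element of B in one step
NewA = A;
NewA(rows, cols, pages) = B;

% Equivalent linear-index version: one subscript triple per element of B,
% generated by ndgrid in the same column-major order as B(:)
[R, C, P] = ndgrid(rows, cols, pages);
NewA2 = A;
NewA2(sub2ind(size(A), R(:), C(:), P(:))) = B(:);
```

For the [500,500,5] / [350,350,4] case the same assignment applies with, for example, rows = 1:350, cols = 1:350 and pages = 1:4; those ranges are placeholders for wherever B should actually go.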

How to interpolate only less than 3 consecutive Nan values in Matlab?

I have a 500x600 matrix containing some NaN values. I want to interpolate wherever there are fewer than three consecutive NaNs (for example with an average of the preceding and following values), and leave the longer runs of consecutive NaNs as NaN. I have already looked at http://uk.mathworks.com/matlabcentral/answers/34481-interpolate-nans-only-if-less-than-4-consecutive-nans, but even the accepted answer doesn't work. (I realise that question is about 4 consecutive values, but it doesn't work either way.)
If by "3 consecutive NaNs" you mean 3 consecutive NaNs within a row or a column, you can use the following approach:
1. For each row, use convolution to determine for each run of NaNs whether it is shorter than 3.
2. Interpolate each row, then put NaN back into the runs that are too long.
3. Fill the columns by transposing the result and running the same function again.
Code:
%generates example array
data = rand(5,5);
data (1,2:4) = nan;
data (2:5,2) = nan;
data (:,4) = nan;
%fills all relevant nans in a row
data2 = interpolateNanRows(data );
%fills all relevant nans in a column
out= interpolateNanRows(data2')';
Auxiliary functions:
function res = interpolateNanRows(data)
%zero padding
dataPad = zeros(size(data,1)+2,size(data,2)+2);
dataPad(2:end-1,2:end-1)=data;
%generates relevant nan maps
nansMap = isnan(dataPad);
irrelevantNans = conv2(double(nansMap),[1,0,0,0,1],'same')>0 & nansMap;
%fills each row
for ii=1:size(dataPad,1)
filledRow = interpolateRow(dataPad(ii,:));
%ignores irrelevant values (runs of 3 or more consecutive nans)
filledRow(irrelevantNans(ii,:)) = nan;
dataPad(ii,:) = filledRow;
end
%generates output
res = dataPad(2:end-1,2:end-1);
end
function filledRow = interpolateRow(row)
%receives a vector of values and performs interpolation in regions of nans
if sum(isnan(row))==0 || sum(isnan(row))==length(row)
filledRow = row;
return;
end
nanData = isnan(row);
index = 1:numel(row);
filledRow = row;
filledRow(nanData) = interp1(index(~nanData), row(~nanData), index(nanData));
end
Results:
data2=
0.6386 NaN NaN NaN 0.6671
0.4805 NaN 0.3171 NaN 0.7771
0.1184 NaN 0.0124 NaN 0.6860
0.2455 NaN 0.3011 NaN 0.8014
0.7761 NaN 0.7239 NaN 0.2833
out =
0.6386 0.6457 0.6528 0.6599 0.6671
0.4805 0.3988 0.3171 0.5471 0.7771
0.1184 0.0654 0.0124 0.3492 0.6860
0.2455 0.2733 0.3011 0.5512 0.8014
0.7761 0.7500 0.7239 0.5036 0.2833
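The convolution mask above is an approximation: a NaN that sits two positions away from another NaN is flagged as well, even when each run is a single element long (which is why rows 2 to 5 of data2 keep their NaNs). A sketch of an exact alternative finds the NaN runs directly with diff on a padded mask and interpolates only runs of at most maxLen elements. It assumes runs touching either end of a row should stay NaN, since they have no value on one side; maxLen and the other variable names are illustrative.

```matlab
% Row-wise demo; columns can be handled by transposing, as in the answer above
data = [1 NaN 3 NaN NaN 6 NaN NaN NaN 10;
        NaN 2 NaN 4 5 NaN NaN 8 9 NaN];
maxLen = 2;                          % fill runs of fewer than 3 NaNs
filled = data;
for r = 1:size(data, 1)
    x = data(r, :);
    isn = isnan(x);
    d = diff([false isn false]);
    starts = find(d == 1);           % first index of each NaN run
    stops  = find(d == -1) - 1;      % last index of each NaN run
    idx = 1:numel(x);
    for k = 1:numel(starts)
        runLen = stops(k) - starts(k) + 1;
        interior = starts(k) > 1 && stops(k) < numel(x);
        if runLen <= maxLen && interior
            q = starts(k):stops(k);
            % linear interpolation from the non-NaN values of this row
            filled(r, q) = interp1(idx(~isn), x(~isn), q);
        end
    end
end
disp(filled)
```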

How to compare columns of a binary matrix and compare elements in matlab?

I have a [sentences x words] matrix as shown below:
out = [0 1 1 0 1
       1 1 0 0 1
       1 0 1 1 0
       0 0 0 1 0];
I want to process this matrix so that it tells me that W1 and W2 have the same value in sentence 2 and sentence 4 (1 1 and 0 0, respectively). The output should be as follows:
output{1,2} = [2 4]
output{1,2} says that words 1 and 2 occur with the same values in sentences 2 and 4.
After comparing W1 and W2, the next candidate should be W1 and W3, which have the same value in sentence 3 and sentence 4:
output{1,3} = [3 4]
and so on, until every word has been compared with every other word and the results saved.
This would be one vectorized approach:
%// Get number of columns in input array for later usage
N = size(out,2);
%// Get indices for pairwise combinations between columns of input array
[idx2,idx1] = find(bsxfun(@gt,[1:N]',[1:N])); %//'
%// Get indices for matches between out(:,idx1) and out(:,idx2). The row indices
%// represent the occurrence values for the final output and the column indices
%// the positions in the final output.
[R,C] = find(out(:,idx1) == out(:,idx2))
%// Form cells of each unique C (these will be final output values)
output_vals = accumarray(C(:),R(:),[],@(x) {x})
%// Setup output cell array
output = cell(N,N)
%// Indices for places in output cell array where occurrence values are to be put
all_idx = sub2ind(size(output),idx1,idx2)
%// Finally store the output values at appropriate indices
output(all_idx(1:max(C))) = output_vals
You can get a logical matrix of size #words-by-#words-by-#sentences easily using bsxfun:
coc = bsxfun( @eq, permute( out, [3 2 1]), permute( out, [2 3 1] ) );
In this logical array, coc( wi, wj, si ) is true iff words wi and wj occur in sentence si with the same value.
To get the output cell array from coc, you can use:
nw = size( out, 2 ); %// number of words
output = cell(nw,nw);
for wi = 1:(nw-1)
for wj = (wi+1):nw
output{wi,wj} = find( coc(wi,wj,:) );
output{wj,wi} = output{wi,wj}; %// you can force it to be symmetric if you want
end
end
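For comparison, a loop-based sketch that may be easier to read than the bsxfun versions: nchoosek lists every unordered word pair, and find returns the sentences in which the two columns agree. It fills only the upper triangle (wi < wj) and stores row vectors to match the question's output{1,2} = [2 4]; both are choices of this sketch, not requirements.

```matlab
% Sentences-by-words matrix from the question
out = [0 1 1 0 1
       1 1 0 0 1
       1 0 1 1 0
       0 0 0 1 0];
nw = size(out, 2);                 % number of words
pairs = nchoosek(1:nw, 2);         % each row is a pair [wi wj] with wi < wj
output = cell(nw, nw);
for p = 1:size(pairs, 1)
    wi = pairs(p, 1);
    wj = pairs(p, 2);
    % sentences where words wi and wj have the same value (both 0 or both 1)
    output{wi, wj} = find(out(:, wi) == out(:, wj)).';
end
disp(output{1, 2})   % sentences 2 and 4
disp(output{1, 3})   % sentences 3 and 4
```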

Mean and Standard Deviation of a column, ignoring zero values - Matlab

I am trying to find the mean of a column, but I am having trouble getting an output from the code I wrote. My code is below; I cannot see what mistake I have made.
s1 = [];   % accumulate [j, mean, std] rows here
for j=1:48;
C_f2 = V(V(:,3) == j,:);
C_f2(C_f2==0)=NaN;
m=mean(C_f2(:,4));
s=std(C_f2(:,4));
row=[j,m,s];
s1=[s1;row];
end
I have checked the matrix C_f2, and it is full of values, so it should not be returning NaN. However, my output for the matrix s1 is:
1 NaN NaN
2 NaN NaN
3 NaN NaN
. ... ...
48 NaN NaN
Can anyone see my issue? Help would be much appreciated!
The matrix C_f2 looks like:
1 185 01 5003
1 185 02 5009
. ... .. ....
1 259 48 5001
In the line C_f2(C_f2==0)=NaN; you set all values that are zero to NaN. mean returns NaN if any element is NaN. If you want to ignore the NaN values, you can use the nanmean function, which comes with the Statistics Toolbox. See the following example:
a = [1 NaN 2 3];
mean(a)
ans =
NaN
nanmean(a)
ans =
2
If you don't have the Statistics toolbox, you can exclude NaN elements with logical indexing
mean(a(~isnan(a)))
ans =
2
Or, possibly easiest, exclude all the zero elements directly instead of replacing them with NaN:
mean(a(a~=0))
Your line C_f2(C_f2==0)=NaN; puts NaNs into C_f2. Your mean and std operations then see those NaNs and output NaN themselves.
To have mean and std ignore NaN, you need to use the alternative functions nanmean and nanstd.
These are part of a toolbox, however, so you might not have them if you only have the base Matlab installation.
Don't set the zeros to NaN: without additional rules, any computation involving NaN returns NaN.
Instead, use find to index the nonzero part of your column.
Say column n is your input:
N = n(find(n~=0))
Now do your mean (Mu) calculation on N.
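Applied to the loop in the question, excluding zeros directly might look like the sketch below. The small V is hypothetical stand-in data in the layout the question describes (column 3 = group index, column 4 = value), and s1 is preallocated instead of being grown inside the loop.

```matlab
% Hypothetical stand-in for V: column 3 = group index, column 4 = value
V = [1 185 1 5003
     1 185 1    0
     1 185 1 5005
     1 259 2 5009
     1 259 2 5001];
nGroups = max(V(:, 3));            % 48 in the question
s1 = zeros(nGroups, 3);            % preallocate: [group, mean, std]
for j = 1:nGroups
    vals = V(V(:, 3) == j, 4);     % column 4 of the rows in group j
    vals = vals(vals ~= 0);        % drop zeros instead of setting them to NaN
    s1(j, :) = [j, mean(vals), std(vals)];
end
disp(s1)
```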
To compute the mean and standard deviation of each column excluding zeros:
A = [1 2;
3 0;
4 5;
6 7;
0 0]; %// example data
den = sum(A~=0); %// number of nonzero values in each column
mean_nz = bsxfun(@rdivide, sum(A), den);
mean2_nz = bsxfun(@rdivide, sum(A.^2), den);
std_nz = sqrt(bsxfun(@times, mean2_nz-mean_nz.^2, den./(den-1)));
The results for the example are
mean_nz =
3.5000 4.6667
std_nz =
2.0817 2.5166
The above uses the "corrected" definition of standard deviation (which divides by n-1, where n is the number of values). If you want the "uncorrected" version (i.e. divide by n):
std_nz = sqrt(mean2_nz-mean_nz.^2);
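In MATLAB R2015a and later, mean and std also accept an 'omitnan' flag, so neither the Statistics Toolbox nor the manual sum formulas are strictly needed. A sketch of the same column-wise computation, assuming that release or newer: mark zeros as NaN, then let mean and std skip them. std(B, 0, 'omitnan') uses the corrected n-1 normalization; std(B, 1, 'omitnan') would give the uncorrected version.

```matlab
A = [1 2;
     3 0;
     4 5;
     6 7;
     0 0];                         % example data from the answer above
B = A;
B(B == 0) = NaN;                   % treat zeros as missing
mean_nz = mean(B, 'omitnan');      % column means over nonzero entries
std_nz  = std(B, 0, 'omitnan');    % corrected (n-1) standard deviation
disp(mean_nz)                      % 3.5000    4.6667
disp(std_nz)                       % 2.0817    2.5166
```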

Remove duplicates in column 1 of an array, retaining only the entry with the maximum value in column 2

I have an n-by-2 matrix formed by appending many matrices together. Column 1 holds item_ids and column 2 holds similarity values. Because the matrix was formed by concatenation, column 1 may contain duplicate values, which I do not want. For every value X that appears more than once in column 1, I would like to remove all rows where column 1 = X, except the row whose column 2 value is the maximum among all rows for X.
Example:
1 0.85
1 0.5
1 0.95
2 0.5
Required result:
1 0.95
2 0.5
This is obtained by removing, for each duplicated value in column 1, every row whose column 2 value is not the maximum for that value.
If you might have gaps in the index, use sparse output:
>> result = accumarray( M(:,1), M(:,2), [], @max, 0, true)
>> uMat = [find(result) nonzeros(result)]
uMat =
1.0000 0.9500
2.0000 0.5000
This also simplifies creation of the first column of the output.
A couple of other ways to do it with unique.
First, use sort with 'descend' ordering:
>> [~,IS] = sort(M(:,2),'descend');
>> [C,ia] = unique(M(IS,1));
>> M(IS(ia),:)
ans =
1.0000 0.9500
2.0000 0.5000
Second, use sortrows (ascending sort by the second column) and unique with the 'last' occurrence option:
>> [Ms,IS] = sortrows(M,2)
>> [~,ia] = unique(Ms(:,1),'last')
>> M(IS(ia),:)
ans =
1.0000 0.9500
2.0000 0.5000
You can try
result = accumarray( M(:,1), M(:,2), [max(M(:,1)) 1], @max);
According to the documentation, that should work.
Apologies, I can't try it out right now...
Update: I did try the above, and it gave me the max values correctly. However, it doesn't give you the indices corresponding to the max values. For that, you need to do a bit more work (since the identifiers probably aren't sorted).
result = accumarray( M(:,1), M(:,2), [], @max, 0, true); % fill value 0, sparse output
c1 = find(result); % to get the indices of nonzero values
c2 = full(result(c1)); % to get the values corresponding to the indices
answer = [c1 c2]; % to put them side by side
result = accumarray( M(:,1), M(:,2), [max(M(:,1)) 1], @max);
finalResult = [sort(unique(M(:,1))),nonzeros(result)]
This reattaches the required item_ids, in sorted order, to the corresponding maximum similarity values in the second column. As a result, each value in column 1 of finalResult is unique, and the corresponding value in column 2 is the maximum similarity value for that item_id.
@Floris, thanks for your help; I couldn't have solved this without it.
Yet another approach: use sortrows and then diff to select the last row for each value of the first column:
M2 = sortrows(M);
result = M2(diff([M2(:,1); inf])>0,:);
This also works if the indices in the first column have gaps.
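One more option, assuming MATLAB R2015b or later for findgroups and splitapply: group the rows by the ID in column 1 and take the maximum of column 2 within each group. Unlike the accumarray versions, this does not require the IDs to be positive integers, and a group whose maximum is 0 is not lost to nonzeros or find.

```matlab
M = [1 0.85
     1 0.5
     1 0.95
     2 0.5];                             % example from the question
[g, ids] = findgroups(M(:, 1));          % g = group number per row, ids = sorted unique IDs
maxSim = splitapply(@max, M(:, 2), g);   % maximum similarity within each group
result = [ids maxSim];
disp(result)
```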