Match cell arrays with different size based on two conditions in Matlab - matlab

RECCELL is a cell array with 8 columns and 30000 rows:
C1 C2 C3 C4 C5 C6 C7 C8
'AA' 1997 19970102 1 'BACHE' 'MORI' 148 127
'AA' 1997 19970108 2 'MORGAN' [] 1595 0
'AA' 1997 19970224 3 'KEMSEC' 'FATHI' 1315 297
CONCELL is a cell array with 4 columns and 70000 rows:
C1 C2 D3 D4
'AA' 1997 19970116 2,75
'AA' 1997 19970220 2,71
'AA' 1997 19970320 2,61
I would like to add to RECCELL the 4 columns of CONCELL only in case the C1s match and C3 and D3 (both dates) are the closest possible. For instance I would get in this example:
C1 C2 C3 C4 C5 C6 C7 C8 C1 C2 D3 D4
'AA' 1997 19970102 1 'BACHE' 'MORI' 148 127 'AA' 1997 19970116 2,75
'AA' 1997 19970108 2 'MORGAN' [] 1595 0 'AA' 1997 19970116 2,75
'AA' 1997 19970224 3 'KEMSEC' 'FATHI' 1315 297 'AA' 1997 19970220 2,71
To the first row of RECCELL corresponds the first row of CONCELL.
To the second row of RECCELL corresponds the first row of CONCELL.
To the third row of RECCELL corresponds the second row of CONCELL.
The code I have so far is:
[~, indCon, indREC] = intersect(CONCELL(:,1), RECCELL(:,1));
REC_CON=[RECCELL(indREC,:),CONCELL(indCon,:)];
NO_REC_CON= RECCELL(setdiff(1:size(RECCELL,1), indREC),:);
It's wrong because I cannot use intersect for a string element and because I am not considering the second condition, which is to choose the closest dates.
Can someone help me? Thank you

I would suggest doing this inside a for loop, as the cell arrays are very tall.
(Note: the dates (C3/D3) in the cells appear to be stored as doubles rather than strings, so they need to be converted to strings before using datenum.)
n=size(RECCELL,1);
nc=size(RECCELL,2); % original number of columns of RECCELL
ind=zeros(n,1);
RECCELL(:,nc+1:nc+4)={[]}; % reserve four columns for the matched CONCELL row
rd=datenum(num2str(cell2mat(CONCELL(:,3))),'yyyymmdd'); % convert the numeric yyyymmdd dates of CONCELL to serial date numbers
for k=1:n
    a=find(strcmp(CONCELL(:,1),RECCELL{k,1})); % find indices of matching C1s
    if ~isempty(a) % do only if there is a match for the C1s
        dnk=datenum(num2str(RECCELL{k,3}),'yyyymmdd'); % convert the date of row k to a serial date number
        [~,f]=min((rd(a)-dnk).^2); % find closest date of the subset a
        ind(k,1)=a(f); % assign index of closest match to ind
        RECCELL(k,nc+1:nc+4)=CONCELL(ind(k,1),:); % write the CONCELL row into the reserved columns; using (end+1):(end+4) here would add four new columns on every iteration
    end
end
The vector ind contains the indices of the closest match in CONCELL for each entry in RECCELL. When it contains a 0, no match was found between the C1s.
Edit: To avoid adding four columns to RECCELL, one possible alternative is to add a single column and store the matched CONCELL row in it as a nested cell array (any further match for the same RECCELL row is appended as an extra row of that nested cell):
n=size(RECCELL,1);
RECCELL{1,end+1}=[]; % add a single empty column to RECCELL
ind=zeros(n,1);
rd=datenum(num2str(cell2mat(CONCELL(:,3))),'yyyymmdd'); % convert the numeric yyyymmdd dates of CONCELL to serial date numbers
for k=1:n
    a=find(strcmp(CONCELL(:,1),RECCELL{k,1})); % find indices of matching C1s
    if ~isempty(a) % do only if there is a match for the C1s
        dnk=datenum(num2str(RECCELL{k,3}),'yyyymmdd'); % convert the date of row k to a serial date number
        [~,f]=min((rd(a)-dnk).^2); % find closest date of the subset a
        ind(k,1)=a(f); % assign index of closest match to ind
        if isempty(RECCELL{k,end}) % if nothing is in this cell, add the CONCELL entry to it
            RECCELL{k,end}=CONCELL(ind(k,1),:);
        else % if something is already in, add the new CONCELL entry to the cell
            RECCELL{k,end}(end+1,1:4)=CONCELL(ind(k,1),:);
        end
    end
end
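As a rough sketch of the same idea on the sample rows from the question, the loop can run over the distinct C1 values instead of over every row, which may be faster when many rows share a key. The names recDates, conDates, keys and OUT are introduced here for illustration only, and the decimal commas of D4 are written as points. It assumes every C1 entry is a string and every date is a numeric yyyymmdd value.

```matlab
% Sample data taken from the question (decimal commas written as points)
RECCELL = {'AA' 1997 19970102 1 'BACHE'  'MORI'  148  127;
           'AA' 1997 19970108 2 'MORGAN' []      1595 0;
           'AA' 1997 19970224 3 'KEMSEC' 'FATHI' 1315 297};
CONCELL = {'AA' 1997 19970116 2.75;
           'AA' 1997 19970220 2.71;
           'AA' 1997 19970320 2.61};

% Convert the numeric yyyymmdd values to serial date numbers
recDates = datenum(num2str(cell2mat(RECCELL(:,3))), 'yyyymmdd');
conDates = datenum(num2str(cell2mat(CONCELL(:,3))), 'yyyymmdd');

ind  = zeros(size(RECCELL,1), 1);   % 0 means "no C1 match"
keys = unique(RECCELL(:,1));        % distinct C1 values (hypothetical helper name)
for i = 1:numel(keys)
    r = find(strcmp(RECCELL(:,1), keys{i}));  % RECCELL rows with this key
    c = find(strcmp(CONCELL(:,1), keys{i}));  % CONCELL rows with this key
    if isempty(c), continue; end
    % absolute date difference for every (RECCELL row, CONCELL row) pair
    d = abs(bsxfun(@minus, recDates(r), conDates(c).'));
    [~, f] = min(d, [], 2);         % closest CONCELL row for each RECCELL row
    ind(r) = c(f);
end

% Combined table; rows without a match keep empty cells in the added columns
nCon = size(CONCELL, 2);
OUT  = [RECCELL, cell(size(RECCELL,1), nCon)];
OUT(ind > 0, end-nCon+1:end) = CONCELL(ind(ind > 0), :);
```

For these three rows ind comes out as [1;1;2], which agrees with the matching described in the question.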

Related

What is a Matlab function for « If-in »?

Problem statement: Provide a function that does the following: c1) two vectors D1 and D2 have 7 elements each, then form the division between the corresponding components of each vector and assign the result to a vector name D3, placing a statement that avoids division by zero (i.e., does not divide if element to denominator is null);
The idea of the problem, is to set an error message whenever one of the elements of vector D2 is equal to 0.
My attempt:
D1 = [d1 d2 d3 d4 d5 d6 d7]
D2= [d21 d22 d23 d24 d25 d26 d27]
for i= 1:length(D1)
if 0 in D2
fprintf(‘error: division by 0/n’)
else
D3=D1./D2
end
I don’t know if the “if-in” structure exists in Matlab. If it doesn’t, what could be an equivalent?
Thanks in advance!!!
One way to avoid any division by zero is to modify D2 by replacing every 0 with NaN. Divisions by NaN produce NaN, so it's easy to tell which divisions would have caused a problem by simply inspecting the resulting vector D3. Moreover, almost all of Matlab's functions are able to handle NaNs nicely (i.e. without crashing) or can be instructed to do so by setting some option.
What I've just described can be accomplished by using logical indexing, as follows:
% Definition of D1 and D2
D1 = [d1 d2 d3 d4 d5 d6 d7]
D2 = [d21 d22 d23 d24 d25 d26 d27]
% Replace 0s with NaNs
D2(D2==0) = nan;
% Perform the divisions at once
D3 = D1./D2 ;
For more details on logical indexing, see the relevant section of the MATLAB documentation.
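A small worked example may make the NaN approach concrete. The numbers below are made up for illustration (the question only gives placeholder names), and skipped is a hypothetical variable name.

```matlab
D1 = [1 2 3 4 5 6 7];       % example values, not from the question
D2 = [2 0 4 5 0 8 10];      % two zero denominators

D2(D2 == 0) = NaN;          % mask the zero denominators
D3 = D1 ./ D2;              % NaN marks the divisions that were skipped
skipped = find(isnan(D3));  % positions where D2 was zero
```

Here skipped holds the positions 2 and 5, and the other elements of D3 are ordinary quotients.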
As the OP requests a function that does the job, here's a possible implementation:
function D3 = vector_divide(D1, D2)
    % Verify that both inputs are numeric
    % and have the same dimensions
    if isnumeric(D1) && isnumeric(D2) && isequal(size(D1), size(D2))
        % Replace 0s with NaNs
        D2(D2==0) = nan;
        % Perform the divisions at once
        D3 = D1./D2;
    else
        disp('D1 and D2 should both be numeric and have the same size!');
        D3 = [];
    end
end
Error handling in case of non-numeric arrays or size mismatch might vary depending on project requirements, if any. For instance, I could have used error (instead of disp) to display a message and terminate the program.
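As for the literal "if 0 in D2" test from the question, one common way to write a membership check is any(D2 == 0), or equivalently ismember(0, D2). The following is only a sketch with made-up values, and hasZero is a name chosen here for illustration.

```matlab
D1 = [1 2 3 4 5 6 7];        % example values, not from the question
D2 = [2 0 4 5 0 8 10];

if any(D2 == 0)              % true when at least one element of D2 is zero
    fprintf('error: division by 0\n');
    D3 = [];                 % no division is performed
else
    D3 = D1 ./ D2;           % element-wise division, no loop needed
end

hasZero = ismember(0, D2);   % equivalent membership test
```

Replacing the fprintf call with error would stop execution instead of just printing the message.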

Pivot Table with Additional Columns

Let's have a 2D double array such as:
% Data: ID, Index, Weight, Category
A0=[1 1121 204 1;...
2 2212 112 1;...
3 2212 483 3;...
4 4334 233 1;...
5 4334 359 2;...
6 4334 122 3 ];
I need to pivot / group by Index and keep, for each Index, the row with the highest Weight. The maximum itself can be obtained with any Pivot Table | Group By functionality (such as pivottable, SQL GROUP BY or MS Excel PivotTable):
% Current Result
A1=pivottable(A0,[2],[],[3],{@max}); % Pivot Table
A1=cell2mat(A1); % Convert to array
>>A1=[1121 204;...
2212 483;...
4334 359 ]
How should I proceed if I need to recover also the ID and the Category columns?
% Required Result
>>A1=[1 1121 204 1;...
3 2212 483 3;...
5 4334 359 2 ];
The syntax is Matlab, but a solution in another language (Java, SQL) is acceptable, since it can be transcribed into Matlab.
You can use splitapply with an anonymous function as follows.
grouping_col = 2; % Grouping column
maximize_col = 3; % Column to maximize
[~, ~, group_label] = unique(A0(:,grouping_col));
result = splitapply(@(x) {x(x(:,maximize_col)==max(x(:,maximize_col)),:)}, A0, group_label);
result = cell2mat(result); % convert to matrix
How it works: the anonymous function @(x) {x(x(:,maximize_col)==max(···),:)} is called by splitapply once for each group. The function receives as input a submatrix containing all rows with the same value in the column with index grouping_col. It then keeps all rows that maximize the column with index maximize_col, and packs them into a cell. The result is then converted to matrix form by cell2mat.
With the above solution, if there are several maximizing rows for each group all of them are produced. To keep only the first one, replace the last line by
result = cell2mat(cellfun(@(c) c(1,:), result, 'uniformoutput', false));
How it works: this uses cellfun to apply the anonymous function @(c) c(1,:) to the content of each cell. The function simply keeps the first row. Alternatively, to keep the last row use @(c) c(end,:). The result is then converted to matrix form using cell2mat again.
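As a cross-check that does not rely on splitapply, a different technique (sorting, then taking the first row of each group) should give the same rows on the sample data. This is only a sketch: S and firstRow are names introduced here, and ties are resolved by whichever maximizing row sortrows places first.

```matlab
% Data from the question: ID, Index, Weight, Category
A0 = [1 1121 204 1;
      2 2212 112 1;
      3 2212 483 3;
      4 4334 233 1;
      5 4334 359 2;
      6 4334 122 3];

grouping_col = 2;   % grouping column
maximize_col = 3;   % column to maximize

% Sort by group (ascending), then by weight (descending) within each group
S = sortrows(A0, [grouping_col, -maximize_col]);

% The first row of each group now carries the largest weight
[~, firstRow] = unique(S(:, grouping_col), 'first');
A1 = S(firstRow, :);
```

On this data A1 is [1 1121 204 1; 3 2212 483 3; 5 4334 359 2], the required result from the question.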

Use Matlab to find all combinations of multiple string arrays

I have three cell arrays of strings and I want to find all possible combinations of the three. So if I have:
One= Two= Three=
[A B [M N [W X
C D] O P] Y Z]
I want to be able to find all combinations, which would give me something like:
Four=
[AMW AMX AMY AMZ ANW ANX ANY ANZ AOW AOX AOY AOZ APW APX APY APZ
BMW BMX BMY BMZ BNW BNX BNY BNZ BOW BOX BOY BOZ BPW BPX BPY BPZ
etc.
etc.]
Is there a simple way to do this or will I have to somehow change the strings to integer values?
So, you have three cell arrays:
One = {'A' 'B' 'C' 'D'};
Two = {'M' 'N' 'O' 'P'};
Three = {'W' 'X' 'Y' 'Z'};
We can try to work with their indices
a = 1:length(One);
ind = combvec(a,a,a);
ind would be the matrix with all the combinations of three numbers from 1 to 4, e.g. 111 112 113 114 211 212 213 214 etc. According to combinatorics, its dimensions would be 3x64. The indices correspond to the letters in your cell array, i.e. 214 would correspond to 'BMZ' combination.
EDIT: with the help of @LuisMendo's answer, here is an easy way to generate the combinations without combvec:
a=1:4;
[A,B,C] = ndgrid(a,a,a);
ind = [A(:),B(:),C(:)]';
Then you create the blank cell array with length equal to 64 - that's how many combinations of three letters you are expected to get. Finally, you start a loop that concatenates the letters according to the combination in ind array:
Four = cell(1,size(ind,2)); % one cell per combination
for i = 1:size(ind,2)
    Four(i) = strcat(One(ind(1,i)),Two(ind(2,i)),Three(ind(3,i)));
end
So, you obtain:
Four =
'AMW' 'BMW' 'CMW' 'DMW' 'ANW' ...
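The loop can probably be dropped altogether, because strcat accepts cell arrays of strings and concatenates them element by element. A minimal sketch, reusing One, Two and Three from above and assuming nothing beyond base MATLAB:

```matlab
One   = {'A' 'B' 'C' 'D'};
Two   = {'M' 'N' 'O' 'P'};
Three = {'W' 'X' 'Y' 'Z'};

% All index triples; the first index varies fastest, as in the loop version
[A, B, C] = ndgrid(1:numel(One), 1:numel(Two), 1:numel(Three));

% strcat joins the three cell arrays element by element
Four = strcat(One(A(:)), Two(B(:)), Three(C(:)));
```

Four is then a 1x64 cell array that starts with 'AMW', 'BMW', 'CMW', 'DMW', 'ANW', in the same order as the loop version.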

How to check range of values from one array into another array in MATLAB?

I have two arrays in MATLAB, say A and B, that contain values such as the ones below. Both arrays always contain pairs, i.e. an even number of elements (2, 4, 6, 8 or more), and A always has fewer elements than B. The elements of both arrays are pre-sorted.
A=[152 271];
B=[107 266 314 517 538 732];
I want to check the range of every pair in A (one pair, 152-271, in this example) against all pairs of B, and expand the pairs of B wherever a pair of A exceeds them. In this example, the pair 152-271 of A is first compared with the first pair of B (107-266). Since 152 is greater than 107 and 271 is greater than 266, the value 266 in the first pair of B is replaced with 271, so that the first pair of A lies wholly within B. The intervals in A and B must overlap at least partly for B to be modified. The process stops when there are no more elements of A to check. The end result will be like this:
A=[152 271];
B=[107 271 314 517 538 732];
In the image below, green, red and yellow represent A, B and the final (modified) B values respectively.
You can use find with the 'last' option to identify the indices in B:
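A=[152 271 280 320];
B=[107 266 314 517 538 732];
for interval = 1:2:numel(A)-1 % step through the pairs of A
    % get the index of the last element of B that does not exceed the lower bound of the pair
    index=find(B<=A(interval),1,'last');
    % increase the upper interval bound if necessary, but only when
    % index points at a lower bound of B (odd index)
    if ~isempty(index) && mod(index,2)==1
        B(index+1)=max(B(index+1),A(interval+1));
    end
end
As you did not specify any corner cases (intervals in A that start outside every interval of B), I did not consider them. If they can happen, you need to extend the code.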
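% Alternative: treat A and B together as one set of intervals and merge the overlapping ones
A=[152 271];
B=[107 266 314 517 538 732];
mat=[A B];
A1 = vec2mat(mat,2) % one interval per row (vec2mat is part of the Communications Toolbox; reshape(mat,2,[]).' is equivalent here)
n = size(A1,1); % number of intervals
[t,p] = sort(A1(:)); % sort all bounds; p<=n marks the lower bounds
z = cumsum(accumarray((1:2*n)',2*(p<=n)-1)); % number of open intervals after each bound
z1 = [0;z(1:end-1)]; % number of open intervals before each bound
A2 = [t(z1==0 & z>0),t(z1>0 & z==0)] % merged intervals, one [start end] pair per row
% Reference Link: (http://www.mathworks.com/matlabcentral/newsreader/view_thread/171594) by Roger Stafford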

MATLAB: Transform a flat file list into a multi-dimensional array

I am completely stuck with this: I start out with a flat file type of list I get from an SQL statement like this and want to transform it into a 4D array.
SELECT a1, a2, a3, a4, v FROM table A;
a1 a2 a3 a4 v
--------------
2 2 3 3 100
2 1 2 2 200
3 3 3 3 300
...
a1 to a4 are some identifiers (integers) from a range of (1:5), which are also the coordinates for the new to be populated 4D array.
v is a value (double) e.g. a result from a measurement.
What I now want is to transform this list into a 4D array of dimension (5,5,5,5) where each v is put at the right coordinates.
This could easily be done using a for loop, however as I have lots of data this is not really feasible.
If I had just 1 dimension, I would do something like this:
a1 = [2;5;7]; % Identifiers
v = [17;18;19]; % Values
b1 = (1:10)'; % Range of Identifiers
V = zeros(10,1); % Create result vector with correct dimensions
idx = ismember(b1, a1); % Do the look up
V(idx) = v; % Insert
My question: How can I do this for the above mentioned 4D array without using a for loop. Is there a "Matlab Way" of doing it?
Any help is greatly appreciated!
Thanks,
Janosch
You should be able to do what you want using linear indexing and the sub2ind function. It would look something like this:
x = zeros(5,5,5,5); % initialize output array
idx = sub2ind(size(x),a1,a2,a3,a4); % linear indices of the (a1,a2,a3,a4) coordinates
x(idx) = v;
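Applied to the three sample rows from the question, this can be checked as follows. The sketch also shows accumarray as a possible alternative (y is a name introduced here). The two behave differently when the same coordinates occur more than once: accumarray sums the values, whereas the indexed assignment keeps only the last one.

```matlab
% The three sample rows from the question, one column vector per field
a1 = [2; 2; 3];
a2 = [2; 1; 3];
a3 = [3; 2; 3];
a4 = [3; 2; 3];
v  = [100; 200; 300];

% Linear indexing via sub2ind
x = zeros(5,5,5,5);
idx = sub2ind(size(x), a1, a2, a3, a4);
x(idx) = v;

% Alternative: accumarray takes the subscripts as rows of a matrix
y = accumarray([a1 a2 a3 a4], v, [5 5 5 5]);
```

After this, x(2,2,3,3) is 100, x(2,1,2,2) is 200 and x(3,3,3,3) is 300, and y is identical to x because no coordinate is repeated.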