I have two matrices. One is of size 1,000,000 x 9 and the other is 500,000 x 9.
The columns have the same meaning in both matrices: the first 7 columns act as a key, and the last two columns hold data. Many key values occur in both matrices, and I would like one big matrix in which to compare the values. This big matrix should be of dimension 1,000,000 x 11.
For example:
A = [0 0 0 0 0 0 0 10 20; 0 0 0 0 0 0 1 30 40];
B = [0 0 0 0 0 0 0 50 60];
A merged matrix would look like this:
C = [0 0 0 0 0 0 0 10 20 50 60; 0 0 0 0 0 0 1 30 40 0 0];
As you can see, the first row of C has columns 8 and 9 from matrix A and columns 10 and 11 from matrix B. The second row uses columns 8 and 9 from matrix A and 0, 0 for the last two columns, because there is no corresponding entry in matrix B.
My current solution works in principle, but it is very, very slow because I use loops a lot. In other programming languages, I would sort both tables and iterate over both of them in one big loop, keeping two pointers.
Is there a more efficient algorithm available in Matlab using vectorization or at least a sufficiently efficient one that is idiomatic/short?
(Additional note: my biggest issue seems to be the search function. Given my matrix, I pass in a 7x1 column vector, let's name it key, to find the corresponding row. Right now, I use bsxfun for that (key is transposed to a 1x7 row so that it expands against the seven key columns):
targetRow = data( min(bsxfun(@eq, data(:, 1:7), key.'), [], 2) == 1, :);
I use min because bsxfun returns 7 match flags per row and I want all of them to be true. It seems to me that this could be the bottleneck of a MATLAB algorithm.)
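A minimal sketch of the single-key lookup, assuming key is the 7x1 column vector from the question and data is a small stand-in for the real matrix. all(..., 2) states "every key column matches" more directly than min(...) == 1, and key.' turns the column into a 1x7 row so that bsxfun can expand it against data(:, 1:7). In R2016b and later, data(:, 1:7) == key.' works without bsxfun thanks to implicit expansion. Each call still scans the whole matrix, so when looking up many keys, a single ismember(..., 'rows') call is usually the better tool.

```matlab
% Small stand-in for the large data matrix (key in columns 1-7)
data = [0 0 0 0 0 0 0 10 20;
        0 0 0 0 0 0 1 30 40];
key = [0 0 0 0 0 0 1]';   % 7x1 column vector, as in the question

% Compare each row's key columns with key (as a 1x7 row) and
% keep the rows where all seven comparisons are true
match = all(bsxfun(@eq, data(:, 1:7), key.'), 2);
targetRow = data(match, :);
disp(targetRow)
```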
Maybe with ismember and some indexing:
% Locates in B the first occurrence of each key of A (releases before
% R2013a returned the last one). idxA has logicals for the keys found,
% and idxB tells us where in B.
[idxA, idxB] = ismember(A(:,1:7), B(:,1:7),'rows');
C = [ A zeros(size(A, 1), 2) ];
C(idxA, 10:11) = B(idxB(idxA), 8:9); % idxB(idxA) are the entries of idxB that are ~= 0
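The same ismember idea applied to the example from the question, as a sketch. One variation that is not part of the answer above: the fill value is NaN instead of 0, so that a key missing from B can be told apart from genuine zero data.

```matlab
A = [0 0 0 0 0 0 0 10 20;
     0 0 0 0 0 0 1 30 40];
B = [0 0 0 0 0 0 0 50 60];

% idxA(i) is true when key row i of A also occurs in B;
% idxB(i) is the matching row of B (0 when there is no match)
[idxA, idxB] = ismember(A(:, 1:7), B(:, 1:7), 'rows');

% NaN instead of 0 marks keys that are missing from B
C = [A, NaN(size(A, 1), 2)];
C(idxA, 10:11) = B(idxB(idxA), 8:9);
disp(C)
```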
I think this does what you want, though I have only tested it with your simple example.
% Initial matrices
A = [0 0 0 0 0 0 0 10 20;
0 0 0 0 0 0 1 30 40];
B = [0 0 0 0 0 0 0 50 60];
% Stack matrices with common key columns, 8&9 or 10&11 for data columns
C = [[A, zeros(size(A,1),2)]; [B(:,1:7), zeros(size(B,1),2), B(:,8:9)]];
% Sort C so that matching key rows will be consecutive
C = sortrows(C,1:7);
% Loop through rows
curRow = 1;
lastRow = size(C,1);   % index of the last row still in C
while curRow < lastRow % compare row curRow with row curRow+1
if all(C(curRow,1:7) == C(curRow+1,1:7))
% If first 7 cols of 2 rows match, take max values (override 0s)
% It may be safer to initialise the 0 columns to NaNs, as max will
% choose a numeric value over NaN, and that also allows your data to
% contain negative values.
C(curRow,8:11) = max(C(curRow:curRow+1, 8:11));
% Remove merged row
C(curRow+1,:) = [];
% Decrease size counter for matrix
lastRow = lastRow - 1;
else
% Increase row counter
curRow = curRow + 1;
end
end
Result:
C = [0 0 0 0 0 0 0 10 20 50 60
0 0 0 0 0 0 1 30 40 0 0]
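Deleting rows one at a time inside the while loop copies the matrix on every deletion, which gets slow at the sizes in the question. A vectorized sketch of the same sort-then-merge idea, assuming each key occurs at most once in A and at most once in B (so a key appears in at most two consecutive rows after sorting), and that the data values are non-negative, as the max-based merge above also assumes:

```matlab
A = [0 0 0 0 0 0 0 10 20;
     0 0 0 0 0 0 1 30 40];
B = [0 0 0 0 0 0 0 50 60];

% Stack both matrices: data from A in columns 8-9, data from B in 10-11
C = [[A, zeros(size(A,1), 2)]; [B(:,1:7), zeros(size(B,1), 2), B(:,8:9)]];
C = sortrows(C, 1:7);

% dup(k) is true when row k+1 has the same key as row k
dup = all(diff(C(:, 1:7), 1, 1) == 0, 2);
k = find(dup);

% Merge each duplicate pair into its first row, then drop the second rows
C(k, 8:11) = max(C(k, 8:11), C(k+1, 8:11));
C(k+1, :) = [];
disp(C)
```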
I have an NxNx5 array T that I would like to convert into an Rx5 array TT such that the following condition is satisfied (where R is the number of nonzero entries of T(:,:,1)):
If T(i,j,1) == 0, that position is ignored. If T(i,j,1) ~= 0, I would like TT to contain the row
[T(i,j,1) T(i,j,2) T(i,j,3) T(i,j,4) T(i,j,5)]
Note that T(i,j,k) (k = 2,3,4,5) could be zero. For example,
if
T(3,2,1) = 3
then I would like a row of TT to be
[3 0 2 1 5].
Some notes:
The entries of TT are all integers.
The nonzero entries of T(:,:,1) ascend in column-major order, i.e., the first column of T(:,:,1) may be
[1 2 0 0 3 4 0 0 0 5 6]'
and the next column
[7 8 0 0 0 0 0 9 10 11 12]'
I think this does what you want:
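ind = find(T(:,:,1));   % linear indices of the nonzero entries of page 1
% offset each index by one page (size(T,1)*size(T,2) elements) per column
ind = bsxfun(@plus, ind(:), (0:size(T,3)-1)*size(T,1)*size(T,2));
result = T(ind);        % one row per nonzero T(i,j,1)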
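A small usage example of the bsxfun indexing above, on a hypothetical 3x3x5 array whose page 1 has two nonzero entries, one of them the [3 0 2 1 5] row from the question. The rows come out in column-major order of their (i,j) positions.

```matlab
% Small 3x3x5 example with two nonzero entries in page 1
T = zeros(3, 3, 5);
T(1, 1, :) = reshape([1 4 0 0 7], 1, 1, 5);
T(3, 2, :) = reshape([3 0 2 1 5], 1, 1, 5);   % the row from the question

ind = find(T(:,:,1));   % linear indices 1 and 6 (column-major order)
ind = bsxfun(@plus, ind(:), (0:size(T,3)-1)*size(T,1)*size(T,2));
result = T(ind)         % one row per nonzero T(i,j,1)
```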
This will do it:
clear
rng(343)
N=7;
K=5;
T=randi([0,4],[N,N,K])
TT=reshape(T,[N*N,K])
TT(TT(:,1)==0,:)=[] %delete rows of TT whose first column equals 0
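A deterministic version of the reshape approach, as a sketch on a hypothetical 3x3x5 array (no rng needed), so the result can be checked by eye. Row r of the reshaped matrix holds T(i,j,:) for the r-th position (i,j) in column-major order, and the filter is applied to the first column of TT.

```matlab
T = zeros(3, 3, 5);
T(1, 1, :) = reshape([1 4 0 0 7], 1, 1, 5);
T(3, 2, :) = reshape([3 0 2 1 5], 1, 1, 5);

% One row per (i,j) position, one column per page
TT = reshape(T, [], size(T, 3));
TT(TT(:, 1) == 0, :) = [];   % drop positions where T(i,j,1) is zero
disp(TT)
```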
I would like to assign the same vector of numbers repeatedly to an existing matrix, at rows selected by logical indices. This is like an extension of assigning a single number at all logical index positions (at least in my head).
I.e., it is possible to have
mat = zeros(5,3);
rowInd = logical([0 1 0 0 1]); %normally obtained from previous operation
mat(rowInd,1) = 15;
mat =
0 0 0
15 0 0
0 0 0
0 0 0
15 0 0
But I would like to do something like this
mat(rowInd,:) = [15 6 3]; %rows 2 and 5 should be filled with these numbers
and instead I get an assignment dimension mismatch error.
I want to avoid looping over the rows or assigning the vector elements one at a time. I have a strong feeling that there is an elementary MATLAB operation that should be able to do this. Thanks!
The problem is that your indexing picks two rows from the matrix and tries to assign a single row to them. You have to replicate the targeted row to fit your indexing:
mat = zeros(5,3);
rowInd = logical([0 1 0 0 1]);
mat(rowInd,:) = repmat([15 6 3],sum(rowInd),1)
This returns:
mat =
0 0 0
15 6 3
0 0 0
0 0 0
15 6 3
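Another way to build the replicated block, shown as a sketch: the outer product of a column of ones with the row vector gives one copy of the row per selected row, so repmat is not needed. nnz(rowInd) counts the selected rows, the same as sum(rowInd) above.

```matlab
mat = zeros(5, 3);
rowInd = logical([0 1 0 0 1]);
v = [15 6 3];

% ones(k,1) * v is a k-by-3 matrix whose rows are all equal to v
mat(rowInd, :) = ones(nnz(rowInd), 1) * v;
disp(mat)
```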
I have a matrix A of size 10x100 (a smaller example is shown below). What I want to do is:
1. Work row by row, and for each row check the data of each column in that row.
2. Say I'm at the first column of the first row. If the value is zero, I move to the next column, and so on until I find a column with a non-zero value, and save its column number, e.g. col 3 (this means that cols 1 and 2 were zeros).
3. Now, at the first non-zero column in row 1, I move to the next column until I find a column with a zero value. I take the column just before this zero one, which must be non-zero, and save it, e.g. col 7 (this means that cols 4, 5 and 6 are non-zero and col 8 is zero).
4. Now I want to save the middle column between these two columns: e.g. for col 3 and col 7 the middle column is col 5, so I save the index row1_col5. If there are two middle values, either of them is fine.
5. I then move to the next column until I find a non-zero column and repeat steps 2 to 5 until the first row is finished.
6. Move to the next row and start over again from steps 2 to 5.
7. There are two rules. First, the middle index of consecutive non-zero values is only computed if there are at least 3 of them; if there are only two consecutive non-zero values, no middle is calculated. Second, runs of fewer than 3 consecutive zeros are ignored and treated as non-zero values. E.g., in the example below, the middle values of the first row are col5 and col11. In row 2, col5 is counted; no columns in row 3 satisfy these conditions; and in row 4, col6 or col7 is counted.
After finishing all the rows, I want a vector or array holding the positions of all the middle indexes, e.g. row1_col5 row1_col17 row2_col_10 and so on.
example:
A = [ 0 0 0 2 4 1 0 0 0 1 3 2;
0 0 0 5 1 1 1 1 0 0 0 1;
0 3 4 1 0 3 1 2 0 0 1 3;
0 0 0 0 1 3 4 5 0 0 0 0];
For the first row, the middle values will be columns 5 and 11, and so on.
Could anyone advise how to do this with the least processing? It can be done using loops, but is there a more efficient way? Please let me know if any clarification is needed.
Now that you have clarified your question (again...), here is a solution (still using a for loop...). It includes "rule 7", excluding runs of fewer than three elements; it also includes the second part of that rule: runs of fewer than three zeros don't count as zeros. The new code looks like this:
A = [ 0 0 0 2 4 1 0 0 0 1 3 2;
0 0 0 5 1 1 1 1 0 0 0 1;
0 3 4 1 0 3 1 2 0 0 1 3;
0 0 0 0 1 3 4 5 0 0 0 0];
retVal = cell(1, size(A, 1));
for ri = 1:size(A,1)
temp = [1 0 0 0 A(ri,:) 0 0 0 1]; % pad ends with 3 zeros + 1
% so that there is always a "good run"
isz = (temp == 0); % find zeros - pad "short runs of 0" with ones
diffIsZ = diff(isz);
f = find(diffIsZ == 1);
l = find(diffIsZ == -1);
shortRun = find((l-f)<3); % these are the zeros that need eliminating
for ii = 1:numel(shortRun)
temp(f(shortRun(ii))+1:l(shortRun(ii))) = 1;
end
% now take the modified row:
nz = (temp(4:end-3)~=0);
dnz = diff(nz); % find first and last nonzero elements
f = find(dnz==1);
l = find(dnz==-1);
middleValue = floor((f + l)/2);
rule7 = find((l - f) > 2);
retVal{ri} = middleValue(rule7);
end
You have to use a cell array for the return value since you don't know how many elements will be returned per row (per your updated requirement).
The code above returns the following cell array:
{[5 11], [6], [7], [7]}
I still don't seem to understand your "rule 7", because you say that "no columns in row 3 satisfy this condition", but once we eliminate the short runs of zeros, it seems to me that row 3 does. Unless I misinterpret how you want to treat a run of non-zero numbers that extends to the edge of the row (I assume that's OK, which is why you return 11 as a valid column in row 1; so why wouldn't you return 7 for row 3?)
Try this:
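sizeA = size(A);
N = sizeA(1);
% Work on A.' so that diff runs along each row of A; the zero padding
% makes every run of nonzeros produce one start and one end change point
D = diff([zeros(1, N); (A.' ~= 0); zeros(1,N)]) ~= 0;
[a, b] = find(D);          % a: position along the row, b: row of A
c = reshape(a, 2, []);     % each column holds the change points of one run
midRow = floor(sum(c)/2);  % centre of each run (a column index of A)
midCol = b(1:2:length(b))  % row of A that each run belongs to
After this, midRow and midCol contain the indices of your centroids. Because the code works on A.', midRow holds the column position of each centre and midCol the row of A it belongs to: for the example matrix above, midRow(1) = 5 and midCol(1) = 1, from the run in columns 4 to 6 of row 1. Note that this version does not apply the run-length rules.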
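A sketch of how the first part of rule 7 (only runs of at least three nonzeros count) could be layered on top of the change-point approach, using the example matrix from the question. The rule about short runs of zeros is not applied here, so row 3 yields two runs. The names rowIdx and colIdx are illustrative, chosen to avoid the swapped meaning of midRow and midCol.

```matlab
A = [ 0 0 0 2 4 1 0 0 0 1 3 2;
      0 0 0 5 1 1 1 1 0 0 0 1;
      0 3 4 1 0 3 1 2 0 0 1 3;
      0 0 0 0 1 3 4 5 0 0 0 0];
N = size(A, 1);

% Column r of D marks where runs of nonzeros start and end in row r of A
D = diff([zeros(1, N); (A.' ~= 0); zeros(1, N)]) ~= 0;
[pos, rowOfA] = find(D);

runStart = pos(1:2:end);        % first column of each run
runEnd   = pos(2:2:end) - 1;    % last column of each run
runRow   = rowOfA(1:2:end);     % row of A that each run belongs to

% Keep only runs of at least three consecutive nonzeros
keep   = (runEnd - runStart + 1) >= 3;
rowIdx = runRow(keep);
colIdx = floor((runStart(keep) + runEnd(keep) + 1) / 2);  % same rounding as above
disp([rowIdx, colIdx])
```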
If you don't mind using a for loop:
A = [ 0 0 1 1 1 0 1;
0 0 0 0 0 0 0;
0 1 1 1 1 0 0;
0 1 1 1 0 1 1;
0 0 0 0 1 0 0]; % data
sol = NaN(size(A,1),1);   % NaN marks rows without any nonzero entry
for row = 1:size(A,1)
[~, aux_col] = find(A(row,:));   % column indices of the nonzeros in this row
if ~isempty(aux_col)
sol(row) = aux_col(1) + floor((find(diff([aux_col 0])~=1,1)-1)/2);
% the final 0 is necessary in case the row of A ends with ones
% you can use either "floor" or "ceil"
end
end
disp(sol)
Try it and see if it does what you want. Note that it returns only the middle of the first run of nonzeros in each row. I hope the code is clear; if not, let me know.
I have a matrix of 1 and 0 elements, like the one below, which is used as a network adjacency matrix.
A =
0 1 1 1
1 1 0 1
1 1 0 1
1 1 1 0
I want to simulate an attack on the network, so I must randomly replace a specific percentage of the 1 elements with 0. How can I do this in MATLAB?
I know how to randomly replace a percentage of elements with zeros, but I must make sure that each randomly replaced element is one of the 1 elements of the matrix, not one of the zeros.
If you want to change each 1 with a certain probability:
p = 0.1; % desired probability of change (0.1 means 10%)
A_ones = find(A); % linear index of ones in A
A_ones_change = A_ones(rand(size(A_ones))<=p); % entries to be changed
A(A_ones_change) = 0; % apply changes in those entries
If you want to randomly change a fixed fraction of the 1 entries:
f = 0.1; % desired fraction
A_ones = find(A);
n = round(f*length(A_ones));
A_ones_change = randsample(A_ones,n);
A(A_ones_change) = 0;
Note that in this case the resulting fraction may differ from the intended one, because the number of entries has to be rounded to an integer.
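randsample belongs to the Statistics and Machine Learning Toolbox. A sketch of the fixed-fraction version that uses only base MATLAB, with randperm(n, k) choosing k distinct positions out of n. The matrix is the one from the question, and f = 0.25 is an arbitrary example value chosen so that no rounding occurs.

```matlab
A = [0 1 1 1;
     1 1 0 1;
     1 1 0 1;
     1 1 1 0];
A0 = A;                                 % keep a copy for comparison

f = 0.25;                               % fraction of 1s to remove
A_ones = find(A);                       % linear indices of the 1s
n = round(f * numel(A_ones));           % 12 ones -> 3 to remove
pick = A_ones(randperm(numel(A_ones), n));  % n distinct 1s, chosen at random
A(pick) = 0;
disp(A)
```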
@horchler's point is a good one. However, to keep it simple, you can just multiply your input matrix by a mask matrix.
>> a1=randi([0 1],5,5) % before replacing 1->0 (randint(5,5,[0 1]) in older releases)
a1 =
1 1 1 0 1
0 1 1 1 0
0 1 0 0 1
0 0 1 0 1
1 0 1 0 1
>> a2=random('unif',0,1,5,5) % assuming a uniform distribution ('unif'); rand(5,5) gives the same kind of matrix
a2 =
0.7889 0.3200 0.2679 0.8392 0.6299
0.4387 0.9601 0.4399 0.6288 0.3705
0.4983 0.7266 0.9334 0.1338 0.5751
0.2140 0.4120 0.6833 0.2071 0.4514
0.6435 0.7446 0.2126 0.6072 0.0439
>> a1.*(a2>0.1) % and the replacement probability is 0.1
ans =
1 1 1 0 1
0 1 1 1 0
0 1 0 0 1
0 0 1 0 1
1 0 1 0 0
Other tricks can be built into the mask matrix (a2), such as a different frequency distribution, or a structure (e.g. once a cell is replaced, the adjacent cells become less likely to be replaced, and so on).
Cheers.
The function find is your friend:
indices = find(A);
This returns the linear indices of the 1 elements in A. You can then apply your method of replacing a percentage of elements with zero to a random subset of this array, subsetIndices. Then,
A(subsetIndices) = 0;
sets those selected entries of A to zero.
I am trying to write a short MATLAB function that receives a vector and returns the index of the first element of the longest sequence of 1s (I can assume that the vector consists only of 1s and 0s). For example:
IndexLargeSeq([110001111100000000001111111111110000000000000000000000000000000])
should return 21, which is the index of the first 1 of the longest sequence of 1s.
thank you
ariel
There you go:
% input:
A = [0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1]';
% replace 0 with 2 because the next command doesn't work with '0' as values
A(A == 0) = 2;
% accumulate data sets
B = [A(diff([A; 0]) ~= 0), diff(find(diff([0; A; 0])))];
% maximize second column where first column == 1
maxSeq = max(B(B(:, 1) == 1, 2));
% get row of B where first column == 1 && second column == maxSeq
row = find(B(:,1) == 1 & B(:,2) == maxSeq, 1);
% calculate the index of the first 1s of this longest sequence:
idx = sum(B(1:(row-1),2)) + 1
idx is then the index you are looking for, and maxSeq is the length of this sequence of 1s. A has to be a column vector.
If you want to understand how the data sets are accumulated (the command B = ...), see the question "How to accumulate data-sets?".
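A shorter variant of the same run-length idea, as a sketch: pad the vector with zeros, use diff to find where runs of 1s start and end, and let max pick the longest. It accepts a row or column vector, returns [] when there are no 1s, and, because max returns the first maximum, reports the earliest of several equally long runs. The test vector x is hypothetical, built to mirror the question's example (longest run starting at index 21).

```matlab
% Hypothetical input: runs of 1s at 1-2, 6-10 and 21-32
x = [1 1 0 0 0 1 1 1 1 1, zeros(1, 10), ones(1, 12), zeros(1, 31)];

v = (x(:).' ~= 0);             % logical row vector
d = diff([0, v, 0]);           % +1 where a run starts, -1 just after it ends
runStart = find(d == 1);
runEnd   = find(d == -1);      % one past the last 1 of each run

if isempty(runStart)
    idx = [];                  % no 1s at all
else
    [maxLen, k] = max(runEnd - runStart);  % first longest run
    idx = runStart(k);
end
disp(idx)
```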
Here is another option, which measures the distances between the indices of 0s. The code handles the cases where there are no 1s at all (it returns an empty vector) and where several sequences share the longest length. x is an input row vector.
idx = find([1 ~x 1]); %# indices of 0s +1
idxdiff = diff(idx); %# lengths of sequences (+1)
maxdiff = max(idxdiff);
if maxdiff == 1
maxseqidx = []; %# no 1s at all
else
%# find all longest sequences, there may be more than one
maxidx = find(idxdiff == maxdiff);
maxseqidx = idx(maxidx);
end
disp(maxseqidx)
EDIT: If x can be either row or column vector, you can change the first line to
idx = find([1; ~x(:); 1]);
The output will be a column vector in this case.