Matlab: Removing duplicate interactions [duplicate] - matlab

This question already has answers here:
How can I find unique rows in a matrix, with no element order within each row?
(4 answers)
Closed 7 years ago.
I have protein-protein interaction data for Homo sapiens. The matrix has size <4850628x3>. The first two columns are proteins and the third is the confidence score of their interaction. The problem is that half the rows are duplicate pairs:
if protein A interacts with B, C and D, it is listed as
A B 0.8
A C 0.5
A D 0.6
B A 0.8
C A 0.5
D A 0.6
Note that the confidence score of A interacting with B and of B interacting with A is the same, 0.8.
So in a <4850628x3> matrix, half the rows are duplicate pairs. If I choose Unique(1,:) I might lose some data.
What I want is a <2425314x3> matrix, i.e. one without duplicate pairs. How can I do this efficiently?
Thanks
Naresh

Suppose each protein is stored in your matrix under a unique numeric id
(e.g. A=1, B=2, C=3, ...). Your example matrix would then be:
M =
1.0000 2.0000 0.8000
1.0000 3.0000 0.5000
1.0000 4.0000 0.6000
2.0000 1.0000 0.8000
3.0000 1.0000 0.5000
4.0000 1.0000 0.6000
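If the proteins are stored as names rather than numbers, the ids could be produced with unique first. This is only a sketch: the variables colA, colB and score are hypothetical, and it assumes the two name columns are cell arrays of strings.

```matlab
% Hypothetical input: names and scores held in separate variables.
colA  = {'A'; 'A'; 'A'; 'B'; 'C'; 'D'};
colB  = {'B'; 'C'; 'D'; 'A'; 'A'; 'A'};
score = [0.8; 0.5; 0.6; 0.8; 0.5; 0.6];

% unique returns the sorted distinct names and, as its third output,
% the position of every input name in that sorted list.
[names, ~, ic] = unique([colA; colB]);

% The first half of ic belongs to colA, the second half to colB.
ids = reshape(ic, [], 2);

% Numeric matrix in the form used by the rest of this answer.
M = [ids, score];
```

names(k) gives back the protein name for id k, so the result can be translated back once the duplicates are removed.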
You must first sort the first two columns row-wise so that each protein pair always appears in the same order:
M2 = sort(M(:,1:2),2)
M2 =
1 2
1 3
1 4
1 2
1 3
1 4
Then use unique with the 'rows' option and keep the indices of the unique pairs:
[~, idx] = unique(M2, 'rows')
idx =
1
2
3
Finally, filter your initial matrix to keep only the unique pairs:
R = M(idx,:)
R =
1.0000 2.0000 0.8000
1.0000 3.0000 0.5000
1.0000 4.0000 0.6000
Et voilà!
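Put together, the three steps can be sketched as one short script. The sort(idx) at the end is an addition that is not in the steps above: it keeps the surviving rows in their original order, whichever occurrence of each pair unique happens to report.

```matlab
% Example matrix from above: [proteinId1, proteinId2, score].
M = [1 2 0.8
     1 3 0.5
     1 4 0.6
     2 1 0.8
     3 1 0.5
     4 1 0.6];

% Step 1: order each pair so that (A,B) and (B,A) look identical.
pairs = sort(M(:, 1:2), 2);

% Step 2: one index per distinct pair.
[~, idx] = unique(pairs, 'rows');

% Step 3: keep one original row per pair, in original row order.
R = M(sort(idx), :);
```

This assumes, as in the question, that both directions of a pair carry the same score; if they could differ, you would have to decide which one to keep.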

Related

moving mean on a circle

Is there a way to calculate a moving mean such that the values at the beginning and at the end of the array are averaged with the ones at the opposite end?
For example, instead of this result:
A=[2 1 2 4 6 1 1];
movmean(A,2)
ans = 2.0 1.5 1.5 3.0 5 3.5 1.0
I want to obtain the vector [1.5 1.5 1.5 3 5 3.5 1.0], as the initial array element 2 would be averaged with the ending element 1.
Generalizing to an arbitrary window size N, this is how you can add circular behavior to movmean in the way you want:
movmean(A([(end-floor(N./2)+1):end 1:end 1:(ceil(N./2)-1)]), N, 'Endpoints', 'discard')
For the given A and N = 2, you get:
ans =
1.5000 1.5000 1.5000 3.0000 5.0000 3.5000 1.0000
For an arbitrary window size n, you can use circular convolution with an averaging mask defined as [1/n ... 1/n] (with n entries; in your example n = 2):
result = cconv(A, repmat(1/n, 1, n), numel(A));
Convolution offers some nice ways of doing this, though you may need to tweak your input slightly if you only want to wrap around at one end (in your example the first element is averaged with the last, but the last is not averaged with the first).
conv([A(end),A],[0.5 0.5],'valid')
ans =
1.5000 1.5000 1.5000 3.0000 5.0000 3.5000 1.0000
The generalized case here, for a moving average of size N, is:
conv(A([end-N+2:end, 1:end]),repmat(1/N,1,N),'valid')
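As a sanity check, the conv version can be compared against a plain loop with modular indexing. This is only a sketch: the loop and the names viaConv and viaLoop are illustrative. Note that this conv form averages each element with the N-1 elements before it (a trailing window), so for N > 2 it should not be expected to match the centred window of movmean.

```matlab
A = [2 1 2 4 6 1 1];
N = 2;                       % window size

% Prepend the last N-1 elements so the window can wrap around.
padded  = A([end-N+2:end, 1:end]);
viaConv = conv(padded, repmat(1/N, 1, N), 'valid');

% Cross-check: explicit circular indexing, one window per element.
L = numel(A);
viaLoop = zeros(1, L);
for k = 1:L
    idx = mod((k-N+1:k) - 1, L) + 1;   % wrap indices into 1..L
    viaLoop(k) = mean(A(idx));
end
```

Both vectors should agree up to rounding error, and for this A and N = 2 they reproduce the vector requested in the question.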

Store values from existing matrix into new matrix

I have a matrix containing 8 columns and 80k rows (from an Excel file).
Each row has an ID.
I want to store all data with ID no. 1 in a new matrix, all data with ID no. 2 in a second matrix, and so on. So each time the ID changes I want to save all the data of the new ID in a new matrix.
There are more than 800 IDs.
I've tried several things without luck. Among others:
k = zeros(117,8)
for i =1:80000
k(i) = i + Dataset(1:i,:)
end
The above was only meant to check whether I could get the first 117 rows saved in another matrix, which didn't succeed.
If one of the 8 columns contains the ID then you can use logical indexing. For example if column 1 contains the ID, we can first find a list of all different ID values:
uniqueIDs = unique(Dataset(:, 1));
Then we can create a cell array holding, for each ID, the rows that belong to it:
listsByID = cell(length(uniqueIDs), 1);
for idx = 1:length(uniqueIDs)
listsByID{idx} = Dataset(Dataset(:, 1) == uniqueIDs(idx), :);
end
Running the above on an example dataset:
Dataset = [1 0.1 10
1 0.2 20
2 0.3 30
3 0.4 40
2 0.5 50
2 0.6 60];
Results in:
listsByID{1} =
1.0000 0.1000 10.0000
1.0000 0.2000 20.0000
listsByID{2} =
2.0000 0.3000 30.0000
2.0000 0.5000 50.0000
2.0000 0.6000 60.0000
listsByID{3} =
3.0000 0.4000 40.0000
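With 80k rows and over 800 IDs the loop scans the ID column once per ID. If that turns out to be slow, a possible variant (a sketch, not benchmarked) collects the row numbers of every group in one pass using the third output of unique together with accumarray. The names groupIdx and rowsByGroup are illustrative.

```matlab
Dataset = [1 0.1 10
           1 0.2 20
           2 0.3 30
           3 0.4 40
           2 0.5 50
           2 0.6 60];

% groupIdx(r) is the position of row r's ID inside uniqueIDs.
[uniqueIDs, ~, groupIdx] = unique(Dataset(:, 1));

% Gather the row numbers of each group into one cell per ID.
% sort() restores row order, which accumarray does not promise.
rowNumbers  = (1:size(Dataset, 1)).';
rowsByGroup = accumarray(groupIdx, rowNumbers, [], @(r) {sort(r)});

% Pull the actual rows out of Dataset, one matrix per ID.
listsByID = cellfun(@(r) Dataset(r, :), rowsByGroup, 'UniformOutput', false);
```

listsByID{k} then holds all rows whose ID equals uniqueIDs(k), the same layout as the loop produces.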

Operations within matrix avoiding for loops

I have a rather simple question but I can't get the proper result in MATLAB.
I am writing code in MATLAB where I have a 200x3 matrix. The data corresponds to recordings of 10 different points, with 20 frames taken for each point.
The repeated frames are there to account for the error of the measuring system. I now want to calculate the 3D coordinates of each point from this matrix by averaging its measured coordinates.
An example (for 1 point with 3 measurements) would be:
MeasuredFrames (Point 1) =
x y z
1.0000 2.0000 3.0000
1.1000 2.2000 2.9000
0.9000 2.0000 3.1000
Point = mean(MeasuredFrames(1:3, :))
Point =
1.0000 2.0667 3.0000
Now I want to get this result but for 10 points, all stored in a [200x3] array, in intervals of 20 frames.
Any ideas?
Thanks in advance!
If you have the Image Processing Toolbox, blockproc could be an option:
A = blockproc(data,[20 3],@(x) mean(x.data,1))
If not, the following, using permute with reshape, works as well:
B = permute(mean(reshape(data,20,10,3),1),[2,3,1])
Explanation:
%// transform data to 3D-Matrix
a = reshape(data,20,10,3);
%// average over the first dimension
b = mean(a,1);
%// transform back to 10x3 matrix
c = permute(b,[2,3,1])
Some sample data:
x = [ 1.0000 2.0000 3.0000
1.1000 2.2000 2.9000
0.9000 2.0000 3.1000
1.0000 2.0000 3.0000
1.1000 2.2000 2.9000
0.9000 2.0000 3.1000
1.0000 2.0000 3.0000
1.1000 2.2000 2.9000
0.9000 2.0000 3.1000
1.1000 2.2000 2.9000]
data = kron(1:20,x.').';
A = B =
1.5150 3.1200 4.4850
3.5350 7.2800 10.4650
5.5550 11.4400 16.4450
7.5750 15.6000 22.4250
9.5950 19.7600 28.4050
11.6150 23.9200 34.3850
13.6350 28.0800 40.3650
15.6550 32.2400 46.3450
17.6750 36.4000 52.3250
19.6950 40.5600 58.3050
If you do not have access to the blockproc function, you can do it with two calls to reshape:
np = 20 ; %// number of points for averaging
tmp = reshape( A(:) , np,[] ) ; %// unfold A then group by "np"
tmp = mean(tmp); %// calculate mean for each group
B = reshape(tmp, [],3 ) ; %// reshape back to nx3 matrix
In your case, replace A by MeasuredFrames and B by Points, and group in one line:
Points = reshape(mean(reshape(MeasuredFrames(:), np, [])), [], 3);
Matrix multiplication can be used:
N=20;
L=size(MeasuredFrames,1);
Points = sparse(ceil((1:L)/N), 1:L, 1)*MeasuredFrames/N;
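To convince yourself that the vectorised versions compute the per-point mean, they can be checked against a plain loop. A minimal sketch, using made-up data (the numbers 1 to 600) in place of real measurements:

```matlab
N = 20;                                   % frames per point
MeasuredFrames = reshape(1:600, 200, 3);  % stand-in data, 10 points
L = size(MeasuredFrames, 1);

% reshape/permute: frames x points x coordinates, averaged over frames.
viaReshape = permute(mean(reshape(MeasuredFrames, N, [], 3), 1), [2 3 1]);

% Sparse averaging matrix: row p has ones in the columns of point p.
viaSparse = full(sparse(ceil((1:L)/N), 1:L, 1) * MeasuredFrames / N);

% Reference: explicit loop over the 10 blocks of 20 rows.
viaLoop = zeros(L/N, 3);
for p = 1:L/N
    viaLoop(p, :) = mean(MeasuredFrames((p-1)*N+1 : p*N, :), 1);
end
```

All three results are 10x3 and should agree up to rounding error.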

Reshaping a matrix

I have a matrix that looks something like this:
a=[1 1 2 2 3 3 4 4;
1.5 1.5 2.5 2.5 3.5 3.5 4.5 4.5]
What I would like to do is reshape this, i.e. take the 2x2 matrices that sit next to one another and stack them underneath each other.
So get:
b=[1 1;
1.5 1.5;
2 2;
2.5 2.5;
3 3;
3.5 3.5;
4 4;
4.5 4.5]
but I can't seem to get the reshape function to do this for me.
Edit: the single-line version might be a bit complicated, so I've also added one based on a for loop.
Two reshapes and a permute should do it: we first split the matrix into its 2x2 blocks and store them in 3D, and then stack them. In order to stack them we first need to permute the dimensions (similar to a transpose).
>> reshape(permute(reshape(a,2,2,4),[1 3 2]),8,2)
ans =
1.0000 1.0000
1.5000 1.5000
2.0000 2.0000
2.5000 2.5000
3.0000 3.0000
3.5000 3.5000
4.0000 4.0000
4.5000 4.5000
The for-loop version is a bit more straightforward. We preallocate an array of the correct size, and then insert each of the 2x2 matrices separately:
b=zeros(8,2);
for i=1:4,
b((2*i-1):(2*i),:) = a(:,(2*i-1):(2*i));
end
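The same one-liner can be written without hard-coded sizes. In this sketch the block width w is an assumed parameter, and the number of columns of a must be a multiple of w:

```matlab
a = [1   1   2   2   3   3   4   4
     1.5 1.5 2.5 2.5 3.5 3.5 4.5 4.5];

w = 2;                       % width of each block (assumed known)
[r, c]  = size(a);
nBlocks = c / w;             % must be a whole number

% Split into r-by-w blocks along the third dimension, move the block
% index next to the row index, then fold both into the row dimension.
blocks = reshape(a, r, w, nBlocks);
b = reshape(permute(blocks, [1 3 2]), r * nBlocks, w);
```

For the a above this gives the same 8x2 matrix as the fixed-size version.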

matlab: duplicate rows removal [duplicate]

This question already has answers here:
matlab: remove duplicate values
(3 answers)
Closed 9 years ago.
Yesterday I asked a question about removing duplicate rows in a matrix and got an answer, but I can't figure out why it omits certain rows.
With a matrix:
tmp2 =
0 1.0000
0.1000 1.0000
0.2000 1.0000
0.3000 1.0000
0.3000 2.0000
0.4000 2.0000
0.5000 2.0000
0.6000 2.0000
0.7000 2.0000
0.7000 3.0000
0.8000 3.0000
0.9000 3.0000
1.0000 3.0000
1.1000 3.0000
1.2000 3.0000
I need to remove the rows:
0.3000 2.0000
0.7000 3.0000
I tried to do it with
[~,b] = unique(tmp2(:,1));
tmp2(b,:)
I also wrote something of my own:
tmp3 = [];
for i=1:numel(tmp2(:,1))-1
if tmp2(i,1) == tmp3
tmp2(i,:) = [];
end
tmp3 = tmp2(i,1);
end
But all of these methods seem to omit the first of the rows to remove... Please help, as I have already spent some hours trying to fix it myself (I suck at programming...) and nothing seems to work. The matrix is an example, but in general, if two rows have the same value in the first column, I have to remove the second one.
You were on the right track...
tmp2 = [...
0 1
1 1
2 3
2 5
3 5
4 7
5 4
5 8
6 1
];
Now call unique as you did, but pass the 'first' flag so that the first occurrence of each value is kept:
[~,li]=unique(tmp2(:,1),'first');
tmp_unique = tmp2(li,:);
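Applied to the matrix from the question, the same two lines can be checked as below. This is a sketch under one assumption: the first-column values are typed in as literals, so equal values compare exactly equal. Values produced by floating-point arithmetic may differ in the last bits and would then not be treated as duplicates.

```matlab
tmp2 = [0   1; 0.1 1; 0.2 1; 0.3 1; 0.3 2
        0.4 2; 0.5 2; 0.6 2; 0.7 2; 0.7 3
        0.8 3; 0.9 3; 1.0 3; 1.1 3; 1.2 3];

% 'first' asks for the index of the first row carrying each value.
[~, li] = unique(tmp2(:, 1), 'first');

% The column is already sorted, so li is increasing and the
% original row order is preserved.
tmp_unique = tmp2(li, :);
```

The rows 0.3 2 and 0.7 3 are dropped, while 0.3 1 and 0.7 2 stay. In recent MATLAB releases the first occurrence is, as far as I know, already the default, so the flag mainly matters on older versions.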