Store values from existing matrix into new matrix - matlab

I have a matrix with 8 columns and 80k rows (imported from an Excel file).
Each row has an ID.
I want to store all rows with ID 1 in a new matrix, all rows with ID 2 in a second matrix, and so on. So each time the ID changes, I want to save all the data for the new ID in a new matrix.
There are more than 800 IDs.
I've tried several things without luck, among others:
k = zeros(117,8)
for i =1:80000
k(i) = i + Dataset(1:i,:)
end
The above was only meant to check whether I could save the first 117 rows in another matrix, and it didn't succeed.

If one of the 8 columns contains the ID, you can use logical indexing. For example, if column 1 contains the ID, we can first find a list of all distinct ID values:
uniqueIDs = unique(Dataset(:, 1));
Then we can create a cell array in which each cell holds all rows with a given ID:
listsByID = cell(length(uniqueIDs), 1);
for idx = 1:length(uniqueIDs)
listsByID{idx} = Dataset(Dataset(:, 1) == uniqueIDs(idx), :);
end
Running the above on an example dataset:
Dataset = [1 0.1 10
1 0.2 20
2 0.3 30
3 0.4 40
2 0.5 50
2 0.6 60];
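Results in the following (the contents of listsByID{1}, listsByID{2} and listsByID{3}, shown one after another):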
1.0000 0.1000 10.0000
1.0000 0.2000 20.0000
2.0000 0.3000 30.0000
2.0000 0.5000 50.0000
2.0000 0.6000 60.0000
3.0000 0.4000 40.0000
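The question describes saving a new matrix "each time an ID changes". If rows with the same ID are stored contiguously, a loop-free sketch is to find where the ID changes with diff and split the matrix with mat2cell. This assumes the ID is in column 1; note that an ID reappearing later in the file gets its own separate block, unlike the unique-based loop above, which gathers all rows of an ID together.

```matlab
% Example data; column 1 is assumed to hold the ID
Dataset = [1 0.1 10
           1 0.2 20
           2 0.3 30
           3 0.4 40
           2 0.5 50
           2 0.6 60];

ids = Dataset(:, 1);
% A new block starts at row 1 and wherever the ID differs from the previous row
starts = [1; find(diff(ids) ~= 0) + 1];
% Number of rows in each block
runLengths = diff([starts; numel(ids) + 1]);
% Split the rows into one cell per block, keeping all columns together
blocks = mat2cell(Dataset, runLengths, size(Dataset, 2));
```

With this example data, blocks has four cells: rows 1-2 (ID 1), row 3 (ID 2), row 4 (ID 3) and rows 5-6 (ID 2 again).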

Related

Matlab: Removing duplicate interactions [duplicate]

(Closed as a duplicate of: How can I find unique rows in a matrix, with no element order within each row?)
I have protein-protein interaction data for Homo sapiens. The matrix is <4850628x3>: the first two columns are proteins and the third is their confidence score. The problem is that half the rows are duplicate pairs.
If protein A interacts with B, C and D, it is listed as:
A B 0.8
A C 0.5
A D 0.6
B A 0.8
C A 0.5
D A 0.6
Notice that the confidence score is 0.8 both for A interacting with B and for B interacting with A.
Since half the rows of my <4850628x3> matrix are such duplicate pairs, if I choose unique(1,:) I might lose some data.
I want a <2425314x3> matrix, i.e. without the duplicate pairs. How can I do this efficiently?
Thanks
Naresh
Suppose that each protein is stored in your matrix as a unique numeric ID (e.g. A=1, B=2, C=3, ...). Your example matrix then becomes:
M =
1.0000 2.0000 0.8000
1.0000 3.0000 0.5000
1.0000 4.0000 0.6000
2.0000 1.0000 0.8000
3.0000 1.0000 0.5000
4.0000 1.0000 0.6000
First, sort the first two columns within each row, so that each protein pair always appears in the same order:
M2 = sort(M(:,1:2),2)
M2 =
1 2
1 3
1 4
1 2
1 3
1 4
Then call unique with the 'rows' option and keep the indices of the unique pairs:
[~, idx] = unique(M2, 'rows')
idx =
1
2
3
Finally, filter your initial matrix to keep only the unique pairs:
R = M(idx,:)
R =
1.0000 2.0000 0.8000
1.0000 3.0000 0.5000
1.0000 4.0000 0.6000
Et voilà!
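Since the concern in the question is losing data, a sketch that also checks that every dropped row had the same score as the row kept for its pair may help. It uses the third output of unique, which maps each row to its unique pair, and asks for the first occurrence explicitly; sorting the kept indices preserves the original row order. Comparing scores with == assumes the duplicate rows were stored with identical values.

```matlab
% Example matrix from above: protein IDs in columns 1-2, score in column 3
M = [1 2 0.8
     1 3 0.5
     1 4 0.6
     2 1 0.8
     3 1 0.5
     4 1 0.6];

% Put each pair in a canonical order (smaller ID first)
pairs = sort(M(:, 1:2), 2);
% firstIdx: first row of each unique pair; j: which unique pair each row belongs to
[~, firstIdx, j] = unique(pairs, 'rows', 'first');
% Every row's score should match the score of the row kept for its pair
scoresAgree = all(M(:, 3) == M(firstIdx(j), 3));
% Keep one row per pair, in the original row order
R = M(sort(firstIdx), :);
```

If scoresAgree is false, some pairs carry different scores in the two directions, and dropping rows would discard information.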

Find the same values in another column in matlab

I want to find a number that appears in every column.
For example, I have this matrix:
A = [1 11 0.17
2 1 78
3 4 90
45 5 14
10 10 1]
As you can see, the number 1 in column 1 also appears in column 2 and column 3, so I want to pick that number and put it into another cell or matrix:
B= [1]
and then perform another operation, C/B, where C is:
C= [1
3
5
7
9]
and you will have:
D= [1 11 0.17 1
2 1 78 3
3 4 90 5
45 5 14 7
10 10 1 9]
After that, the values in column 4 map to numbers that we define, but we only use the rows that contain the number 1 (i.e. B).
Define:
1 --> 23
3 --> 56
9 --> 78
Then we have (see image below):
[Image: desired result]
So how can I do that? Is it possible? Thanks.
Let's tackle your problem in steps.
Step #1 - Determine if there is a value shared by all columns
We can do this neatly with bsxfun, unique, permute, any and all.
First, use unique to generate all unique values in the matrix A. Then, for each unique value, check whether every column of A contains it. If so, that is the number we need to focus on.
As such, do something like this first:
Aun = unique(A);
eqs_mat = bsxfun(@eq, A, permute(Aun, [3 2 1]));
eqs_mat is a 3D logical array in which each slice corresponds to one unique value and marks where that value appears in A. Within a slice, a column is all false unless that column contains the slice's value, in which case it has at least one true entry. The next step is to go through each slice and determine whether every column has at least one true value.
For a value to be shared by all columns, its slice must have at least one true value in every column.
We can elegantly determine which value we need to extract with:
ind = squeeze(all(any(eqs_mat,1),2));
Given your example data, we have this for our unique values:
>> Aun
Aun =
0.1700
1.0000
2.0000
3.0000
4.0000
5.0000
10.0000
11.0000
14.0000
45.0000
78.0000
90.0000
Also, the last statement I executed above gives us:
>> ind
ind =
0
1
0
0
0
0
0
0
0
0
0
0
This means the value we want is at the second position of the unique array, which is 1. Therefore, we can extract it with:
val = Aun(ind);
val contains the value that is shared along all columns.
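For large matrices, the 3D array from bsxfun has size(A,1) x size(A,2) x numel(Aun) elements, which can use a lot of memory. As an alternative (a different technique from the bsxfun approach above), a minimal sketch finds the shared values by intersecting the columns one at a time:

```matlab
A = [1  11 0.17
     2  1  78
     3  4  90
     45 5  14
     10 10 1];

% Start with the distinct values of the first column and keep only
% those that also appear in each subsequent column
shared = unique(A(:, 1));
for c = 2:size(A, 2)
    shared = intersect(shared, A(:, c));
end
% shared now holds every value present in all columns, sorted
```

For the example matrix, shared ends up as the single value 1, the same as val above.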
Step #2 - Given the value B, take a vector C and divide by B.
That's pretty straightforward. Make sure that C has as many elements as A has rows:
C = [1 3 5 7 9].';
B = val;
col = C / B;
Step #3 - For each row of A that contains the common value, we want to put a new value in a new fifth column.
You can do that by creating a vector of zeros, finding the rows that contain the common value, and replacing the corresponding entries of this column with the values you want:
zer = zeros(size(A,1), 1);
D = [23; 56; 78];
ind2 = any(A == val, 2);
zer(ind2) = D;
%// Create final matrix
fin = [A col zer];
We finally get:
>> fin
fin =
1.0000 11.0000 0.1700 1.0000 23.0000
2.0000 1.0000 78.0000 3.0000 56.0000
3.0000 4.0000 90.0000 5.0000 0
45.0000 5.0000 14.0000 7.0000 0
10.0000 10.0000 1.0000 9.0000 78.0000
Note that the vector you assign (D here) must have exactly as many elements as there are rows in A that contain the common value (three in this example).
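The code above fills the fifth column in row order, which matches the question only because the rows containing 1 happen to have column-4 values 1, 3 and 9 in that order. If the mapping should instead follow the definitions 1 --> 23, 3 --> 56 and 9 --> 78 from the question, one sketch is to look the column-4 values up with ismember. The names keys, vals and newCol are illustrative, not from the original answer.

```matlab
A = [1 11 0.17; 2 1 78; 3 4 90; 45 5 14; 10 10 1];
val = 1;                       % shared value from Step #1
col = [1 3 5 7 9].' / val;     % column 4 from Step #2

% Lookup table from the question (illustrative names)
keys = [1; 3; 9];
vals = [23; 56; 78];

% loc(k) is the position of col(k) in keys, or 0 if it is not there
[found, loc] = ismember(col, keys);
% Only rows that contain the shared value and have a defined mapping
sel = any(A == val, 2) & found;

newCol = zeros(size(A, 1), 1);
newCol(sel) = vals(loc(sel));
fin = [A col newCol];
```

With this approach the order of the rows no longer matters, because each value is taken from the lookup table rather than from its position in D.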

Generating a grid in matlab with a general number of dimensions

Problem
I have a vector w containing n elements. I do not know n in advance.
I want to generate an n-dimensional grid g whose values range from grid_min to grid_max and obtain the "dimension-wise" product of w and g.
How can I do this for an arbitrary n?
Examples
For simplicity, let's say that grid_min = 0 and grid_max = 5.
Case: n=1
>> w = [0.75];
>> g = 0:5
g =
0 1 2 3 4 5
>> w * g
ans =
0 0.7500 1.5000 2.2500 3.0000 3.7500
Case: n=2
>> w = [0.1, 0.2];
>> [g1, g2] = meshgrid(0:5, 0:5)
g1 =
0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5
g2 =
0 0 0 0 0 0
1 1 1 1 1 1
2 2 2 2 2 2
3 3 3 3 3 3
4 4 4 4 4 4
5 5 5 5 5 5
>> w(1) * g1 + w(2) * g2
ans =
0 0.1000 0.2000 0.3000 0.4000 0.5000
0.2000 0.3000 0.4000 0.5000 0.6000 0.7000
0.4000 0.5000 0.6000 0.7000 0.8000 0.9000
0.6000 0.7000 0.8000 0.9000 1.0000 1.1000
0.8000 0.9000 1.0000 1.1000 1.2000 1.3000
1.0000 1.1000 1.2000 1.3000 1.4000 1.5000
Now suppose a user passes in the vector w and we do not know how many elements (n) it contains. How can I create the grid and obtain the product?
%// Data:
grid_min = 0;
grid_max = 5;
w = [.1 .2 .3];
%// Let's go:
n = numel(w);
gg = cell(1,n);
[gg{:}] = ndgrid(grid_min:grid_max);
gg = cat(n+1, gg{:});
result = sum(bsxfun(@times, gg, shiftdim(w(:), -n)), n+1);
How this works:
The grid (variable gg) is generated with ndgrid, using as output a comma-separated list of n elements obtained from a cell array. The resulting n-dimensional arrays (gg{1}, gg{2}, etc.) are concatenated along dimension n+1 (using cat), which turns gg into an (n+1)-dimensional array. The vector w is reshaped to lie along dimension n+1 (shiftdim), multiplied by gg using bsxfun, and the results are summed along dimension n+1.
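One detail worth checking: ndgrid varies its first output along dimension 1, whereas meshgrid varies its first output along dimension 2. So for n = 2 the ndgrid-based result is the transpose of the meshgrid-based example in the question. A quick comparison, using w = [0.1, 0.2] from that example:

```matlab
grid_min = 0;
grid_max = 5;
w = [0.1, 0.2];

% General n-dimensional version
n = numel(w);
gg = cell(1, n);
[gg{:}] = ndgrid(grid_min:grid_max);
gg = cat(n+1, gg{:});
result = sum(bsxfun(@times, gg, shiftdim(w(:), -n)), n+1);

% meshgrid-based version from the question
[g1, g2] = meshgrid(grid_min:grid_max, grid_min:grid_max);
expected = w(1) * g1 + w(2) * g2;

% ndgrid and meshgrid swap the first two dimensions, hence the transpose
d = result - expected.';
maxErr = max(abs(d(:)));
```

If the meshgrid orientation is needed for n = 2, transposing the result (or swapping its first two dimensions with permute for larger n) is enough.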
Edit:
Following @Divakar's insightful comment, the last line can be replaced by:
sz_gg = size(gg);
result = zeros(sz_gg(1:end-1));
result(:) = reshape(gg,[],numel(w))*w(:);
which results in a significant speedup, because MATLAB is even better at matrix multiplication than at bsxfun.
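A minimal sketch to confirm that the two versions compute the same grid, using w = [.1 .2 .3] from the data above. The variable names r1, r2 and maxDiff are illustrative; small differences can appear because the matrix product may sum the terms in a different order.

```matlab
grid_min = 0;
grid_max = 5;
w = [.1 .2 .3];
n = numel(w);
gg = cell(1, n);
[gg{:}] = ndgrid(grid_min:grid_max);
gg = cat(n+1, gg{:});

% Version 1: bsxfun, then sum along dimension n+1
r1 = sum(bsxfun(@times, gg, shiftdim(w(:), -n)), n+1);

% Version 2: flatten the grid to (number of grid points) x n and multiply by w(:)
sz_gg = size(gg);
r2 = zeros(sz_gg(1:end-1));
r2(:) = reshape(gg, [], numel(w)) * w(:);

% The two should agree up to floating-point rounding
maxDiff = max(abs(r1(:) - r2(:)));
```

For example, r2(2,3,4) corresponds to grid point (1, 2, 3), so it equals 0.1*1 + 0.2*2 + 0.3*3 = 1.4.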

MATLAB: Remove rows with second/third occurrence of duplicate index

I have a large matrix with two columns. First is an index, second is data. Some indices are repeated. How can I retain only the first instance of rows with repeated indices?
For Example:
x =
1 5.5
1 4.5
2 4
3 2.5
3 3
4 1.5
to end up with:
ans =
1 5.5
2 4
3 2.5
4 1.5
I've tried various variations and iterations of
[Uy, iy, yu] = unique(x(:,1));
[q, t] = meshgrid(1:size(x, 2), yu);
totals = accumarray([t(:), q(:)], x(:));
but nothing so far has given me the output I need.
Use the 'first' option of unique; its second output then gives the row indices you want, which you can use to filter your matrix:
[~, ind] = unique(x(:,1), 'first');
y = x(ind, :)
y =
1.0000 5.5000
2.0000 4.0000
3.0000 2.5000
4.0000 1.5000
EDIT
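or, as Jonas points out (especially for old MATLAB releases), flip the index column so that the last occurrence in the flipped column is the first occurrence in the original. This relies on unique returning the last occurrence by default, as it did before R2013a, and the indices from the flipped column must be mapped back to the original row numbers:
[~, ind] = unique(flipud(x(:,1)));
y = x(size(x,1) + 1 - ind, :)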
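The unique-based approach returns the rows ordered by index value, which equals the original row order only when the indices are already sorted, as in the example. If the original order should be kept for unsorted indices, one sketch is to sort the first-occurrence row numbers before filtering. The data below is a hypothetical unsorted example.

```matlab
% Hypothetical unsorted example: column 1 is the index, column 2 the data
x = [3 1
     1 2
     3 5
     2 7
     1 9];
% ind(k) is the first row where each distinct index appears
[~, ind] = unique(x(:, 1), 'first');
% Sorting the row numbers keeps the surviving rows in their original order
y = x(sort(ind), :);
```

Here rows 3 and 5 are dropped (the second occurrences of indices 3 and 1), and y keeps rows 1, 2 and 4 in that order.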

How to apply indices I got from one row to other rows in matlab?

Let's say I have this data:
my_data = [ 10 20 30 40; 0.1 0.7 0.4 0.3; 6 1 2 3; 2 5 4 2];
my_index = logical(my_data(4,:)==2);
What is the simplest way to use my_index to get this output?
10.0000 40.0000
0.1000 0.3000
6.0000 3.0000
2.0000 2.0000
my_data(:,my_index)
but I suspect this is so simple that it may not satisfy your underlying requirements...
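Logical indexing is what makes the one-liner work: a logical vector with one entry per column selects the columns where it is true. The same selection can be written with numeric column indices from find, which can be handy if the indices need to be stored or reused on another matrix with the same number of columns. The names out1, out2 and cols are illustrative.

```matlab
my_data = [10 20 30 40; 0.1 0.7 0.4 0.3; 6 1 2 3; 2 5 4 2];
% Logical mask over columns, computed from row 4
my_index = my_data(4, :) == 2;
% Logical indexing: keep the columns where the mask is true
out1 = my_data(:, my_index);
% Equivalent numeric indexing: find returns the column numbers
cols = find(my_index);
out2 = my_data(:, cols);
```

Both out1 and out2 contain columns 1 and 4 of my_data, i.e. the output shown in the question.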