How can I delete table row values in pairs? For example, if either column is less than 0.01, how do I delete the row? - matlab

I have two sets of data from different instruments that have common X-variables (XThompsons) but various Y-variables (YCounts) due to various experimental conditions. The data resemble the example below:
[Table1]
XThompsons | YCounts (1) | YCounts (2) | YCounts (3) | .... | ....
------------------------------------------------------------------
[Table2]
XThompsons | YCounts (1) | YCounts (2) | YCounts (3) | .... | ....
------------------------------------------------------------------
When I have two sets of data that are like this, I have written a script to take a single Y-column information from Table1 and do some math to all Y-columns in Table2. However, when comparing two table columns if either column has a value of a specific threshold (0.10) I want to delete that value. In the example below I want to delete row 4 and row 6 because either column has a value containing 0.10 or less
XThompsons | Table1.YCounts(1) | Table2.YCounts(2)
--------------------------------------------------
1 1.00 0.50
2 0.22 0.12
3 0.29 0.14
4 0.29 0.09 (delete row)
5 0.11 0.49
6 0.02 0.83 (delete row)
How can I carry this out in Matlab? My current code is below; I convert each table row to an array first. How can I make it so that if Y < 0.10 delete the row?
datax = readtable('table1.xls'); % Instrument 1
datay = readtable('table2.xls'); % Instrument 2
SIDATA = [];
for idx=2:width(datay);
% Read the indexed column of datax (instrument 1) then normalize to 1
x = table2array(datax(:,idx));
x = x ./ max(x);
% Read indexed column of datay (instrument 2) and carry out loop
for idy=2:width(datay);
% Normalize y data to 1
y = table2array(datay(:,idy));
y = y ./ max(y);
% Calculate similarity index (SI) at using the datax index for all collision energies for datay
xynum = sum(sqrt(x) .* sqrt(y));
xyden = sqrt(sum(x) .* sum(y));
SIDATA(idy,idx) = (xynum/xyden);
end
end
Help would be appreciated.
Thanks!

Generally when looping through and pruning values you want to increment from the end of the matrix back to one; this way, if you delete any rows, you don't skip. (If you delete row 2, then advance to row 3, you skip the data formerly in row 3).
To me, the easiest way to do this is that if all your data is in one matrix A, with columns Y1 Y2,
APruned = A((A(:,1) > 0.1) & (A(:,2) > 0.1),:)
This takes the A matrix, finds the rows where Y1 > 0.1, finds the rows where Y2 > 0.1, finds the overlap, and then outputs only the rows in A where both of these are true.
You should read about logical indecies for more on this topic
EDIT: It looks like you could also clean up your earlier code using element-wise operations;
A = [datax./max(datax) datay./max(datay)];

Related

Merge different sized arrays into a table

Overview
I am currently working with a series of .txt files I am importing into MATLAB. For simplicity, I'll show my problem conceptually. Obligatory, I'm new to MATLAB (or programming in general).
These .txt files contain data from tracking a ROI in a video (frame-by-frame) with time ('t') in the first column and velocity ('v') in the second as shown below;
T1 = T2 = etc.
t v t v
0 NaN 0 NaN
0.1 100 0.1 200
0.2 200 0.2 500
0.3 400 0.3 NaN
0.4 150
0.5 NaN
Problem
Files differ in their size, the columns remain fixed but the rows vary from trial to trial as shown in T1 and T2.
The time column is the same for each of these files so I wanted to organise data in a table as follows;
time v1 v2 etc.
0 NaN NaN
0.1 100 200
0.2 200 500
0.3 400 NaN
0.4 150 0
0.5 NaN 0
Note that I want to add 0s (or NaN) to end of shorter trials to fix the issue of size differences.
Edit
Both solutions worked well for my dataset. I appreciate all the help!
You could import each file into a table using readtable and then use outerjoin to combine the tables in the way that you would expect. This will work if all data starts at t = 0 or not.
To create a table from a file:
T1 = readtable('filename1.dat');
T2 = readtable('filename2.dat');
Then to perform the outerjoin (pseudo data created for demonstration purposes).
t1 = table((1:4)', (5:8)', 'VariableNames', {'t', 'v'});
%// t v
%// _ _
%// 1 5
%// 2 6
%// 3 7
%// 4 8
% t2 is missing row 2
t2 = table([1;3;4], [1;3;4], 'VariableNames', {'t', 'v'});
%// t v
%// _ _
%// 1 1
%// 3 3
%// 4 4
%// Now perform an outer join and merge the key column
t3 = outerjoin(t1, t2, 'Keys', 't', 'MergeKeys', true)
%// t v_t1 v_t2
%// _ ____ ____
%// 1 5 1
%// 2 6 NaN
%// 3 7 3
%// 4 8 4
I would suggest the use of the padarray and horzcat functions. They respectively :
Pad a matrix or vector with extra data, effectively adding extra 0's or any specified value (NaNs work too).
Concatenate matrices or vectors horizontally.
First, try to obtain the length of the longest vector you have to concatenate. Let's call this value max_len. Once you have that, you can then pad each vector by doing :
v1 = padarray(v1, max_len - length(v1), 0, 'post');
% You can replace the '0' by any value you want !
Finally, once you have vectors of the same size, you can concatenate them using horzcat :
big_table = horzcat(v1, v2, ... , vn);

Matlab: arithmetic operation on columns inside a for loop (simple yet devious!)

I'm trying to represent a simple matrix m*n (let's assume it has only one row!) such that m1n1 = m1n1^1, m1n2 = m1n1^2, m1n3 = m1n1^3, m1n3 = m1n1^4, ... m1ni = m1n1^i.
In other words, I am trying to iterate over a matrix columns n times to add a new vector(column) at the end such that each of the indices has the same value as the the first vector but raised to the power of its column number n.
This is the original vector:
v =
1.2421
2.3348
0.1326
2.3470
6.7389
and this is v after the third iteration:
v =
1.2421 1.5429 1.9165
2.3348 5.4513 12.7277
0.1326 0.0176 0.0023
2.3470 5.5084 12.9282
6.7389 45.4128 306.0329
now given that I'm a total noob in Matlab, I really underestimated the difficulty of such a seemingly easy task, that took my almost a day of debugging and surfing the web to find any clue. Here's what I have come up with:
rows = 5;
columns = 3;
v = x(1:rows,1);
k = v;
Ncol = ones(rows,1);
extraK = ones(rows,1);
disp(v)
for c = 1:columns
Ncol = k(:,length(k(1,:))).^c; % a verbose way of selecting the last column only.
extraK = cat(2,extraK,Ncol);
end
k = cat(2,k,extraK);
disp(extraK(:,2:columns+1)) % to cut off the first column
now this code (for some weird reason) work only if rows = 6 or less, and columns = 3 or less.
when rows = 7, this is the output:
v = 1.0e+03 *
0.0012 0.0015 0.0019
0.0023 0.0055 0.0127
0.0001 0.0000 0.0000
0.0023 0.0055 0.0129
0.0067 0.0454 0.3060
0.0037 0.0138 0.0510
0.0119 0.1405 1.6654
How could I get it to run on any number of rows and columns?
Thanks!
I have found a couple of things wrong with your code:
I'm not sure as to why you are defining d = 3;. This is just nitpicking, but you can remove that from your code safely.
You are not doing the power operation properly. Specifically, look at this statement:
Ncol = k(:,length(k(1,:))).^c; % a verbose way of selecting the last column only.
You are selectively choosing the last column, which is great, but you are not applying the power operation properly. If I understand your statement, you wish to take the original vector, and perform a power operation to the power of n, where n is the current iteration. Therefore, you really just need to do this:
Ncol = k.^c;
Once you replace Ncol with the above line, the code should now work. I also noticed that you crop out the first column of your result. The reason why you are getting duplicate columns is because your for loop starts from c = 1. Since you have already computed v.^1 = v, you can just start your loop at c = 2. Change your loop starting point to c = 2, and you can get rid of the removal of the first column.
However, I'm going to do this in an alternative way in one line of code. Before we do this, let's go through the theory of what you're trying to do.
Given a vector v that is m elements long stored in a m x 1 vector, what you want is to have a matrix of size m x n, where n is the desired number of columns, and for each column starting from left to right, you wish to take v to the nth power.
Therefore, given your example from your third "iteration", the first column represents v, the second column represents v.^2, and the third column represents v.^3.
I'm going to introduce you to the power of bsxfun. bsxfun stands for Binary Singleton EXpansion function. What bsxfun does is that if you have two inputs where either or both inputs has a singleton dimension, or if either of both inputs has only one dimension which has value of 1, each input is replicated in their singleton dimensions to match the size of the other input, and then an element-wise operation is applied to these inputs together to produce your output.
For example, if we had two vectors like so:
A = [1 2 3]
B = [1
2
3]
Note that one of them is a row vector, and the other is a column vector. bsxfun would see that A and B both have singleton dimensions, where A has a singleton dimension being the number of rows being 1, and B having a singleton dimension which is the number of columns being 1. Therefore, we would duplicate B as many columns as there are in A and duplicate A for as many rows as there are in B, and we actually get:
A = [1 2 3
1 2 3
1 2 3]
B = [1 1 1
2 2 2
3 3 3]
Once we have these two matrices, you can apply any element wise operations to these matrices to get your output. For example, you could add, subtract, take the power or do an element wise multiplication or division.
Now, how this scenario applies to your problem is the following. What you are doing is you have a vector v, and you will have a matrix of powers like so:
M = [1 2 3 ... n
1 2 3 ... n
...........
...........
1 2 3 ... n]
Essentially, we will have a column of 1s, followed by a column of 2s, up to as many columns as you want n. We would apply bsxfun on the vector v which is a column vector, and another vector that is only a single row of values from 1 up to n. You would apply the power operation to achieve your result. Therefore, you can conveniently calculate your output by doing:
columns = 3;
out = bsxfun(#power, v, 1:columns);
Let's try a few examples given your vector v:
>> v = [1.2421; 2.3348; 0.1326; 2.3470; 6.7389];
>> columns = 3;
>> out = bsxfun(#power, v, 1:columns)
out =
1.2421 1.5428 1.9163
2.3348 5.4513 12.7277
0.1326 0.0176 0.0023
2.3470 5.5084 12.9282
6.7389 45.4128 306.0321
>> columns = 7;
>> format bank
>> out = bsxfun(#power, v, 1:columns)
out =
Columns 1 through 5
1.24 1.54 1.92 2.38 2.96
2.33 5.45 12.73 29.72 69.38
0.13 0.02 0.00 0.00 0.00
2.35 5.51 12.93 30.34 71.21
6.74 45.41 306.03 2062.32 13897.77
Columns 6 through 7
3.67 4.56
161.99 378.22
0.00 0.00
167.14 392.28
93655.67 631136.19
Note that for setting the columns to 3, we get what we see in your post. For pushing the columns up to 7, I had to change the way the numbers were presented so you can see the numbers clearly. Not doing this would put this into exponential form, and there were a lot of zeroes that followed the significant digits.
Good luck!
When computing cumulative powers you can reuse previous results: for scalar x and n, x.^n is equal to x * x.^(n-1), where x.^(n-1) has been already obtained. This may be more efficient than computing each power independently, because multiplication is faster than power.
Let N be the maximum exponent desired. To use the described approach, the column vector v is repeated N times horizontally (repmat), and then a cumulative product is applied along each row (cumprod):
v =[1.2421; 2.3348; 0.1326; 2.3470; 6.7389]; %// data. Column vector
N = 3; %// maximum exponent
result = cumprod(repmat(v, 1, N), 2);

Simulation of custom probability distribution in Matlab

I'm trying to simulate the following distribution:
a | 0 | 1 | 7 | 11 | 13
-----------------------------------------
p(a) | 0.34 | 0.02 | 0.24 | 0.29 | 0.11
I already simulated a similar problem: four type of balls with chances of 0.3, 0.1, 0.4 and 0.2. I created a vector F = [0 0.3 0.4 0.8 1] and used repmat to grow it by 1000 rows. Then I compared it with a columnvector of 1000 random numbers grown with 5 columns using the same repmat approach. I compared those two, calculated the sumvector of the matrix, and calculated the difference to get the frequences (e.g. [301 117 386 196]). .
But with the current distribution I don't know how to create the initial matrix F and whether I can use the same approach I used before at all.
I need the answer to be "vectorised", so no (for, while or if) loops.
This question on math.stackexchange
What if you create the following arrays:
largeNumber = 1000000;
a=repmat( [0], 1, largeNumber*0.34 );
b=repmat( [1], 1, largeNumber*0.02 );
% ...
e=repmat( [13], 1, largeNumber*0.11 );
Then you concatenate all of these arrays (to get a single array where your entries are represented with their corresponding probabilities), shuffle them, and extract the first N elements to get an N-dimensional vector drawn from your distribution.
EDIT: of course this answer is the way to go.

Ignoring similar columns when concating matrixes vertically

In matlab I have a 128 by n matrix, which we can call
[A B C]
where each letter is an 128 by 1 matrix.
So what I want to do is concat the above matrix with another matrix,
[A~ D E].
Where A~ is similar in its values to A.
What I want to get as the result of the concat would be:
[A B C D E],
where A~ is omitted.
What is the best way to do this? Note that I do not know beforehand that A~ is similar.
To clarify, my problem is how would I determine if two columns are similar? By similar I mean where between two columns, many of the row values are close in value.
Maybe an illustration would help as well
Vector A: [1 2 3 4 5 6 7 8 9]'
| | | | | | | | |
Vector B: [20 2.4 4 5 0 7 7 7.6 10]'
where there are some instances where the values are completely different, but for the most part the values are close. I don't have a defined threshold for this, but ideally it would be something that I could experiment with.
If you want to omit only identical columns, this is one way to do it:
%# Define the example matrices.
Matrix1 = [ 1 2 3; 4 5 6; 7 8 9 ]';
Matrix2 = [ 4 5 6; 7 8 10 ]';
%# Concatenate the matrices and keep only unique columns.
OutputMatrix = unique([ Matrix1, Matrix2 ]', 'rows')';
To solve this, a matching algorithm called vl_ubcmatch can be used.
[matches, scores] = vl_ubcmatch(da, db) ; For each descriptor in da,
vl_ubcmatch finds the closest descriptor in db (as measured by the L2
norm of the difference between them). The index of the original match
and the closest descriptor is stored in each column of matches and the
distance between the pair is stored in scores.
source:
http://www.vlfeat.org/overview/sift.html
Thus, the solution is to find the matched columns with the highest scores and eliminate them before concatenating.
I think it's pdist2 you need.
Consider the following example:
>> X = rand(25, 5);
>> Y = rand(100, 5);
>> Y(22, : ) = 0.99*X(22,:);
>> D = pdist2(X,Y, 'euclidean');
>> [~,ind] = min(D(:));
>> [i,j]=ind2sub(size(D),ind)
i =
22
j =
22
which is indeed the entry we manipulated to be similar. Read help pdist2 or doc pdist2 for more background.

Generating all combinations containing at least one element of a given set in Matlab

I use combnk to generate a list of combinations. How can I generate a subset of combinations, which always includes particular values. For example, for combnk(1:10, 2) I only need combinations which contain 3 and/or 5. Is there a quick way to do this?
Well, in your specific example, choosing two integers from the set {1, ..., 10} such that one of the chosen integers is 3 or 5 yields 9+9-1 = 17 known combinations, so you can just enumerate them.
In general, to find all of the n-choose-k combinations from integers {1, ..., n} that contain integer m, that is the same as finding the (n-1)-choose-(k-1) combinations from integers {1, ..., m-1, m+1, ..., n}.
In matlab, that would be
combnk([1:m-1 m+1:n], k-1)
(This code is still valid even if m is 1 or n.)
For a brute force solution, you can generate all your combinations with COMBNK then use the functions ANY and ISMEMBER to find only those combinations that contain one or more of a subset of numbers. Here's how you can do it using your above example:
v = 1:10; %# Set of elements
vSub = [3 5]; %# Required elements (i.e. at least one must appear in the
%# combinations that are generated)
c = combnk(v,2); %# Find pairwise combinations of the numbers 1 through 10
rowIndex = any(ismember(c,vSub),2); %# Get row indices where 3 and/or 5 appear
c = c(rowIndex,:); %# Keep only combinations with 3 and/or 5
EDIT:
For a more elegant solution, it looks like Steve and I had a similar idea. However, I've generalized the solution so that it works for both an arbitrary number of required elements and for repeated elements in v. The function SUBCOMBNK will find all the combinations of k values taken from a set v that include at least one of the values in the set vSub:
function c = subcombnk(v,vSub,k)
%#SUBCOMBNK All combinations of the N elements in V taken K at a time and
%# with one or more of the elements in VSUB as members.
%# Error-checking (minimal):
if ~all(ismember(vSub,v))
error('The values in vSub must also be in v.');
end
%# Initializations:
index = ismember(v,vSub); %# Index of elements in v that are in vSub
vSub = v(index); %# Get elements in v that are in vSub
v = v(~index); %# Get elements in v that are not in vSub
nSubset = numel(vSub); %# Number of elements in vSub
nElements = numel(v); %# Number of elements in v
c = []; %# Initialize combinations to empty
%# Find combinations:
for kSub = max(1,k-nElements):min(k,nSubset)
M1 = combnk(vSub,kSub);
if kSub == k
c = [c; M1];
else
M2 = combnk(v,k-kSub);
c = [c; kron(M1,ones(size(M2,1),1)) repmat(M2,size(M1,1),1)];
end
end
end
You can test this function against the brute force solution above to see that it returns the same output:
cSub = subcombnk(v,vSub,2);
setxor(c,sort(cSub,2),'rows') %# Returns an empty matrix if c and cSub
%# contain exactly the same rows
I further tested this function against the brute force solution using v = 1:15; and vSub = [3 5]; for values of N ranging from 2 to 15. The combinations created were identical, but SUBCOMBNK was significantly faster as shown by the average run times (in msec) displayed below:
N | brute force | SUBCOMBNK
---+-------------+----------
2 | 1.49 | 0.98
3 | 4.91 | 1.17
4 | 17.67 | 4.67
5 | 22.35 | 8.67
6 | 30.71 | 11.71
7 | 36.80 | 14.46
8 | 35.41 | 16.69
9 | 31.85 | 16.71
10 | 25.03 | 12.56
11 | 19.62 | 9.46
12 | 16.14 | 7.30
13 | 14.32 | 4.32
14 | 0.14 | 0.59* #This could probably be sped up by checking for
15 | 0.11 | 0.33* #simplified cases (i.e. all elements in v used)
Just to improve Steve's answer : in your case (you want all combinations with 3 and/or 5) it will be
all k-1/n-2 combinations with 3 added
all k-1/n-2 combinations with 5 added
all k-2/n-2 combinations with 3 and 5 added
Easily generalized for any other case of this type.