How to vectorize this for-loop in Matlab? - matlab

This is perhaps a simple question. I have a vector and a matrix and want to make a new matrix based on some manipulation. I constructed the new matrix using for loop and I would like to know how can I write it with Vector operator that are likely faster.
d=[n x 1];
t= [n x n];
I want the new Delta matrix which is [n x n] as follows:
for i=1:39
for j=1:39
Delta(i,j)=d(i)-d(j)-t(i,j);
end
end
The result
[d (1) - d (1) - t( 1 ,1),d (1) - d (2) - t( 1 ,2), ... d(1) - d (39) - t( 1 ,39)
d (2) - d (1) - t( 2 ,1),d (2) - d (2) - t( 2 ,2), .... ,d (2) - d (39) - t( 2 ,39)
.
.
.
d (38) - d (1) - t( 38 ,1),d (38) - d (2) - t( 38 ,2), ... , d(38) -d (39)-t(38,39)
d (39) - d (1) - t( 39 ,1),d (39) - d (2) - t( 39 ,2), ..., d(39)- d (39)- t(39 ,39)]

You can use the efficient bsxfun -
Delta = bsxfun(#minus,d,d.') - t

Related

Create a sequence from vectors of start and end numbers

How is it possible to create a sequence if I have vectors of starting and ending numbers of the subsequences in a vectorized way in Matlab?
Example Input:
A=[12 20 34]
B=[18 25 37]
I want to get (spacing for clarity):
C=[12 13 14 15 16 17 18 20 21 22 23 24 25 34 35 36 37]
Assuming that A and B are sorted ascending and that B(i) < A(i+1) holds then:
idx = zeros(1,max(B)+1);
idx(A) = 1;
idx(B+1) = -1;
C = find(cumsum(idx))
To get around the issue mentioned by Dennis in the comments:
m = min(A)-1;
A = A-m;
B = B-m;
idx = zeros(1,max(B)+1);
idx(A) = 1;
idx(B+1) = -1;
C = find(cumsum(idx)) + m;
cumsum based approach for a generic case (negative numbers or overlaps) -
%// Positions in intended output array at which group shifts
intv = cumsum([1 B-A+1])
%// Values to be put at those places with intention of doing cumsum at the end
put_vals = [A(1) A(2:end) - B(1:end-1)]
%// Get vector of ones and put_vals
id_arr = ones(1,intv(end)-1)
id_arr(intv(1:end-1)) = put_vals
%// Final output with cumsum of id_arr
out = cumsum(id_arr)
Sample run -
>> A,B
A =
-2 -3 1
B =
5 -1 3
>> out
out =
-2 -1 0 1 2 3 4 5 -3 -2 -1 1 2 3
Benchmarking
Here's a runtime test after warming-up tic-toc to compare the various approaches listed to solve the problem for large sized A and B -
%// Create inputs
A = round(linspace(1,400000000,200000));
B = round((A(1:end-1) + A(2:end))/2);
B = [B A(end)+B(1)];
disp('------------------ Divakar Method')
.... Proposed approach in this solution
disp('------------------ Dan Method')
tic
idx = zeros(1,max(B)+1);
idx(A) = 1;
idx(B+1) = -1;
C = find(cumsum(idx));
toc, clear C idx
disp('------------------ Santhan Method')
tic
In = [A;B];
difIn = diff(In);
out1 = bsxfun(#plus, (0:max(difIn)).',A); %//'
mask = bsxfun(#le, (1:max(difIn)+1).',difIn+1); %//'
out1 = out1(mask).'; %//'
toc, clear out1 mask difIn In
disp('------------------ Itamar Method')
tic
C = cell2mat(cellfun(#(a,b){a:b},num2cell(A),num2cell(B)));
toc, clear C
disp('------------------ dlavila Method')
tic
C = cell2mat(arrayfun(#(a,b)a:b, A, B, 'UniformOutput', false));
toc
Runtimes
------------------ Divakar Method
Elapsed time is 0.793758 seconds.
------------------ Dan Method
Elapsed time is 2.640529 seconds.
------------------ Santhan Method
Elapsed time is 1.662889 seconds.
------------------ Itamar Method
Elapsed time is 2.524527 seconds.
------------------ dlavila Method
Elapsed time is 2.096454 seconds.
Alternative using bsxfun
%// Find the difference between Inputs
difIn = B - A;
%// Do colon on A to A+max(difIn)
out = bsxfun(#plus, (0:max(difIn)).',A); %//'
%// mask out which values you want
mask = bsxfun(#le, (1:max(difIn)+1).',difIn+1); %//'
%// getting only the masked values
out = out(mask).'
Results:
>> A,B
A =
-2 -4 1
B =
5 -3 3
>> out
out =
-2 -1 0 1 2 3 4 5 -4 -3 1 2 3
This is one option:
C = cell2mat(cellfun(#(a,b){a:b},num2cell(A),num2cell(B)));
You can do this:
C = cell2mat(arrayfun(#(a,b)a:b, A, B, "UniformOutput", false))
arrayfun(...) create all the subsequences taking pairs of A and B.
cell2mat is used to concatenate each subsequence.

Average a subset of a matrix in a loop in matlab

I work with an image that I consider as a matrix.
I want to turn a 800 x 800 matrix (A) into a 400 x 400 matrix (B) where the mean of 4 cells of the A matrix = 1 cell of the B matrix (I know this not a right code line) :
B[1,1] =mean2(A[1,1 + 1,2 + 2,1 + 2,2])
and so on for the whole matrix ...
B [1,2]=mean2(A[1,3 + 1,4 + 2,3 + 2,4 ])
I thought to :
1) Reshape the A matrix into a 2 x 320 000 matrix so I get the four cells I need to average next to each other and it is easier to deal with the row number afterwards.
Im4bis=reshape(permute(reshape(Im4,size(Im4,2),2,[]),[2,3,1]),2,[]);
2) Create a cell-array with the 4 cells I need to average (subsetted) and calculate the mean of it. That's where it doesn't work
I{1,160000}=ones,
for k=drange(1:2:319999)
for n=1:160000
I{n}=mean2(Im4bis(1:2,k:k+1));
end
end
I created an empty matrix of 400 x 400 cells (actually a vector of 1 x 160000) and I wanted to fill it with the mean but I get a matrix of 1 x 319 999 cells with one cell out of 2 empty.
Looking for light
My Input Image:
Method 1
Using mat2cell and cellfun
AC = mat2cell(A, repmat(2,size(A,1)/2,1), repmat(2,size(A,2)/2,1));
out = cellfun(#(x) mean(x(:)), AC);
Method 2
using im2col
out = reshape(mean(im2col(A,[2 2],'distinct')),size(A)./2);
Method 3
Using simple for loop
out(size(A,1)/2,size(A,2)/2) = 0;
k = 1;
for i = 1:2:size(A,1)
l = 1;
for j = 1:2:size(A,2)
out(k,l) = mean(mean(A(i:i+1,j:j+1)));
l = l+1;
end
k = k+1;
end
Test on input image:
A = rgb2gray(imread('inputImage.png'));
%// Here, You could use any of the method from any answers
%// or you could use the best method from the bench-marking tests done by Divakar
out = reshape(mean(im2col(A,[2 2],'distinct')),size(A)./2);
imshow(uint8(out));
imwrite(uint8(out),'outputImage.bmp');
Output Image:
Final check by reading the already written image
B = imread('outputImage.bmp');
>> whos B
Name Size Bytes Class Attributes
B 400x400 160000 uint8
Let A denote your matrix and
m = 2; %// block size: rows
n = 2; %// block size: columns
Method 1
Use blockproc:
B = blockproc(A, [m n], #(x) mean(x.data(:)));
Example:
>> A = magic(6)
A =
35 1 6 26 19 24
3 32 7 21 23 25
31 9 2 22 27 20
8 28 33 17 10 15
30 5 34 12 14 16
4 36 29 13 18 11
>> B = blockproc(A, [m n], #(x) mean(x.data(:)))
B =
17.7500 15.0000 22.7500
19.0000 18.5000 18.0000
18.7500 22.0000 14.7500
Method 2
If you prefer the reshaping way (which is probably faster), use this great answer to organize the matrix into 2x2 blocks tiled along the third dimension, average along the first two dimensions, and reshape the result:
T = permute(reshape(permute(reshape(A, size(A, 1), n, []), [2 1 3]), n, m, []), [2 1 3]);
B = reshape(mean(mean(T,1),2), size(A,1)/m, size(A,2)/n);
Method 3
Apply a 2D convolution (conv2) and then downsample. The convolution computes more entries than are really necessary (hence the downsampling), but on the other hand it can be done separably, which helps speed things up:
B = conv2(ones(m,1)/m, ones(1,n)/n ,A,'same');
B = B(m-1:m:end ,n-1:n:end);
One approach based on this solution using reshape, sum & squeeze -
sublen = 2; %// subset length
part1 = reshape(sum(reshape(A,sublen,[])),size(A,1)/sublen,sublen,[]);
out = squeeze(sum(part1,2))/sublen^2;
Benchmarking
Set #1
Here are the runtime comparisons for the approaches listed so far for a input datasize of 800x 800 -
%// Input
A = rand(800,800);
%// Warm up tic/toc.
for k = 1:50000
tic(); elapsed = toc();
end
disp('----------------------- With RESHAPE + SUM + SQUEEZE')
tic
sublen = 2; %// subset length
part1 = reshape(sum(reshape(A,sublen,[])),size(A,1)/sublen,sublen,[]);
out = squeeze(sum(part1,2))/sublen^2;
toc, clear sublen part1 out
disp('----------------------- With BLOCKPROC')
tic
B = blockproc(A, [2 2], #(x) mean(x.data(:))); %// [m n]
toc, clear B
disp('----------------------- With PERMUTE + MEAN + RESHAPE')
tic
m = 2;n = 2;
T = permute(reshape(permute(reshape(A, size(A, 1), n, []),...
[2 1 3]), n, m, []), [2 1 3]);
B = reshape(mean(mean(T,1),2), size(A,1)/m, size(A,2)/m);
toc, clear B T m n
disp('----------------------- With CONVOLUTION')
tic
m = 2;n = 2;
B = conv2(ones(m,1)/m, ones(1,n)/n ,A,'same');
B = B(m-1:m:end ,n-1:n:end);
toc, clear m n B
disp('----------------------- With MAT2CELL')
tic
AC = mat2cell(A, repmat(2,size(A,1)/2,1), repmat(2,size(A,2)/2,1));
out = cellfun(#(x) mean(x(:)), AC);
toc
disp('----------------------- With IM2COL')
tic
out = reshape(mean(im2col(A,[2 2],'distinct')),size(A)./2);
toc
Runtime results -
----------------------- With RESHAPE + SUM + SQUEEZE
Elapsed time is 0.004702 seconds.
----------------------- With BLOCKPROC
Elapsed time is 6.039851 seconds.
----------------------- With PERMUTE + MEAN + RESHAPE
Elapsed time is 0.006015 seconds.
----------------------- With CONVOLUTION
Elapsed time is 0.002174 seconds.
----------------------- With MAT2CELL
Elapsed time is 2.362291 seconds.
----------------------- With IM2COL
Elapsed time is 0.239218 seconds.
To make the runtimes more fair, we can use a number of trials of 1000 on top of the fastest three approaches for the same input datasize of 800 x 800, giving us -
----------------------- With RESHAPE + SUM + SQUEEZE
Elapsed time is 1.264722 seconds.
----------------------- With PERMUTE + MEAN + RESHAPE
Elapsed time is 3.986038 seconds.
----------------------- With CONVOLUTION
Elapsed time is 1.992030 seconds.
Set #2
Here are the runtime comparisons for a larger input datasize of 10000x 10000 for the fastest three approaches -
----------------------- With RESHAPE + SUM + SQUEEZE
Elapsed time is 0.158483 seconds.
----------------------- With PERMUTE + MEAN + RESHAPE
Elapsed time is 0.589322 seconds.
----------------------- With CONVOLUTION
Elapsed time is 0.307836 seconds.

Combinations from a given set without repetition

Suppose I have a matrix defined as follows
M = [C1 C2 C3 C4]
Where the C's are column vectors
I want some efficient (i.e. no for loops) way of producing A vector such that
ResultVec = [C1 C2;
C1 C3;
C1 C4;
C2 C3;
C2 C4;
C3 C4]
Thanks in advance!
That is, what nchoosek does:
M = [ 1 2 3 4 ];
R = nchoosek(M,2);
returns:
R =
1 2
1 3
1 4
2 3
2 4
3 4
I don't know if it's your intention but nchoosek is Matlabs implementation of The number of k-combinations from a given set S of n elements without repetition (Wikipedia)
The function nchoosek is performance wise not very efficient though. But there are equivalents on File Exchange, which are much(!!) faster and doing the same.
Just to make it clear, it's not just working for the fairly simple example above, and is not returning any indices. It directly transforms the matrix as desired.
M = [ 21 42 123 17 ];
returns:
R =
21 42
21 123
21 17
42 123
42 17
123 17
This is the simplest way I've come up with:
n = size(M, 2);
[j, i] = ind2sub([n n], find(~triu(ones(n))));
ResultVec = M(:, [i j]);
ResultVec = reshape(ResultVec, [], 2)

All combinations for matrix in matlab

I'm trying find all combinations of an n-by-n matrix without repetitions.
For example, I have a matrix like this:
A = [321 319 322; ...
320 180 130; ...
299 100 310];
I want the following result:
(321 180 310)
(321 130 100)
(319 320 310)
(319 139 299)
(322 320 100)
(322 180 299)
I have tried using ndgrid, but it takes the row or the column twice.
Here's a simpler (native) solution with perms and meshgrid:
N = size(A, 1);
X = perms(1:N); % # Permuations of column indices
Y = meshgrid(1:N, 1:factorial(N)); % # Row indices
idx = (X - 1) * N + Y; % # Convert to linear indexing
C = A(idx) % # Extract combinations
The result is a matrix, each row containing a different combination of elements:
C =
321 180 310
319 320 310
321 130 100
319 130 299
322 320 100
322 180 299
This solution can also be shortened to:
C = A((perms(1:N) - 1) * N + meshgrid(1:N, 1:factorial(N)))
ALLCOMB is the key to your question
E.g. I am not front of a MATLAB machine so, I took a sample from web.
x = allcomb([1 3 5],[-3 8],[],[0 1]) ;
ans
1 -3 0
1 -3 1
1 8 0
...
5 -3 1
5 8 0
5 8 1
You can use perms to permute the columns as follows:
% A is given m x n matrix
row = 1:size( A, 1 );
col = perms( 1:size( A, 2 ) );
B = zeros( size( col, 1 ), length( row )); % Allocate memory for storage
% Simple for-loop (this should be vectorized)
% for c = 1:size( B, 2 )
% for r = 1:size( B, 1 )
% B( r, c ) = A( row( c ), col( r, c ));
% end
% end
% Simple for-loop (further vectorization possible)
r = 1:size( B, 1 );
for c = 1:size( B, 2 )
B( r, c ) = A( row( c ), col( r, c ));
end

cut specific columns from several files and reshape using unix tools

I have several hundred files in a folder. Each of these file is a tab delimited text file that contain more than a million rows and 27 columns. From each file, I want to be able to extract only specific columns (say pull out only columns: 1,2,11,12,13). Columns 3:10 & 14:27 can be ignored. I want to be able to do this for all files in the folder (say 2300 files). The columns from each of the 2300 file looks like this..........
Sample.ID SNP.Name col3 col10 Sample.Index Allele1...Forward Allele2...Forward col14 ....col27
1234567890_A rs758676 - - 1 T T - ....col27
1234567890_A rs3916934 - - 1 T T - ....col27
1234567890_A rs2711935 - - 1 T C - ....col27
1234567890_A rs17126880 - - 1 - - - ....col27
1234567890_A rs12831433 - - 1 T T - ....col27
1234567890_A rs12797197 - - 1 T C - ....col27
The cut columns from the 2nd file may look like this....
Sample.ID SNP.Name col3 col10 Sample.Index Allele1...Forward Allele2...Forward col14 ....col27
1234567899_C rs758676 - - 100 T A - ....col27
1234567899_C rs3916934 - - 100 T T - ....col27
1234567899_C rs2711935 - - 100 T C - ....col27
1234567899_C rs17126880 - - 100 C G - ....col27
1234567899_C rs12831433 - - 100 T T - ....col27
1234567899_C rs12797197 - - 100 T C - ....col27
The cut columns from the 3rd file may look like this....
Sample.ID SNP.Name col3 col10 Sample.Index Allele1...Forward Allele2...Forward col14 ....col27
1234567999_F rs758676 - - 256 A A - ....col27
1234567999_F rs3916934 - - 256 T T - ....col27
1234567999_F rs2711935 - - 256 T C - ....col27
1234567999_F rs17126880 - - 256 C G - ....col27
1234567999_F rs12831433 - - 256 T T - ....col27
1234567999_F rs12797197 - - 256 C C - ....col27
The width of the Sample.ID, Sample.Index are the same in each file but can change between files. The value of Sample.ID is the same within each file but different between files. Each of the cut files have the same values under "SNP.Name" column. The Sample.Index column may sometimes be same from different file. The other two columns values (Allele1...Forward & Allele2...Forward) may change, and are pasted with " " sep under each SNP.Name for each Sample.ID.
I finally want to merge (tab-delemited) all the cut columns from the 2300 files into this format ......
Sample.Index Sample.ID rs758676 rs3916934 rs2711935 rs17126880 rs12831433 rs12797197
1 1234567890_A T T T T T C 0 0 T T T C
200 1234567899_C T A T T T C C G T T T C
256 1234567999_F A A T T T C C G T T C C
In simple terms I want to be able to convert a long format into wide format based on the Sample.ID column. This is similar to reshape function in R. I tried this with R and it runs out of memory and is really slow. Can anyone help with unix tools?
When reshape.sh was applied to 20 files... it produced a spurious "Samples line" in the output. The first 4 fields are featured here.
Sample.Index Sample.ID rs476542 rs7073746
1234567891_A 11 C C A G
1234567892_A 191 T C A G
1234567893_A 204 T C G G
1234567894_A 15 T C A G
1234567895_A 158 T T A A
1234567896_A 208 T C A A
1234567897_A 111 T T G G
1234567898_A 137 T C G G
1234567899_A 216 T C A G
1234567900_A 113 T C G G
1234567901_A 152 T C A G
1234567902_A 178 C C A A
1234567903_A 135 C C A A
1234567904_A 125 T C A A
1234567905_A 194 C C A A
1234567906_A 110 C C G G
1234567907_A 126 C C A A
Sample -
1234567908_A 169 C C G G
1234567909_A 173 C C G G
1234567910_A 168 T C A A
#!/bin/bash
awk '
BEGIN {
maxInd = length("Sample.Index")
maxID = length("Sample.ID")
}
FNR>11 && $2 ~ "^rs" {
SNP[$2]
key[$11,$1]
val[$2,$11,$1]=$12" "$13
maxInd = (len=length($11)) > maxInd ? len : maxInd
maxID = (len=length($1)) > maxID ? len : maxID
}
END {
printf("%-*s\t%*s\t", maxInd, "Sample.Index", maxID, "Sample.ID")
for (rs in SNP)
printf("%s\t", rs)
printf("\n")
for(pair in key) {
split(pair,a,SUBSEP)
printf("%-*s\t%*s\t", maxInd, a[1], maxID, a[2])
for(rs in SNP) {
ale = val[rs,a[1],a[2]]
out = ale == "- -" || ale == "" ? "0 0" : ale
printf("%*s\t", length(rs), out)
}
printf("\n")
}
}' DNA*.txt
Proof of Concept
$ ./reshapeDNA
Sample.Index Sample.ID rs2711935 rs10829026 rs3924674 rs2635442 rs715350 rs17126880 rs7037313 rs11983370 rs6424572 rs7055953 rs758676 rs7167305 rs12831433 rs2147587 rs12797197 rs3916934 rs11002902
11 1234567890_A T T 0 0 C C 0 0 0 0 T C 0 0 C C T G 0 0 C C 0 0 T C A G T T T C G G
111 1234567892_A T T T C C C 0 0 0 0 C C T C C C T T 0 0 C C 0 0 T T A A T T T T G G
1 1234567894_A T T 0 0 T C C C A G C C 0 0 C C 0 0 T C C C T T T T A G T T C C G G
12 1234567893_A T T 0 0 C C T C A A T C 0 0 C C 0 0 T T C C T G T C A G T T T C G G
15 1234567891_A T T C C C C 0 0 0 0 C C C C C C T T 0 0 C C 0 0 T C A G T T T T G G