MATLAB: How to use data with multiple occurrences per location in the pdist function?

MATLAB: we have data that give the x y coordinates of occurrences. Often there is more than one occurrence at a location. This causes a problem when we want to use the data in the function pdist. We tried to convert the data into a list with x y coordinates for EACH occurrence, but haven't managed so far. Does anyone have a solution for this?
% initial values
m = 0;
[X,Y,People,Positive] = textread('Loaloa_data.txt', '%f%f%f%f', 'headerlines', 1);
LoaData = [X,Y,People, Positive];
%calculate percentage of infected people per location
Percentage_per_Location = (Positive(:)./People(:))*100;
%calculate percentage of total infected people per location
TotPos = sum(Positive);
Percentage_of_Total = Positive(:)./TotPos*100;
LoaData = [X,Y,People,Positive,Percentage_per_Location, Percentage_of_Total];
%Make a new frequency table in which every occurrence counted in Positive
%has its own x and y value.
% first make a new matrix with x and y coordinates and positive values.
LengthColumn = (1:197)';
Matrix1 = [LengthColumn,X,Y,Positive];
Freq = zeros(sum(Positive(:)),2) ;
% r = Matrix1(:,1)
% CumFreq = ([1: sum(Positives)]);
% Freq = Matrix1(:,4);
% xyData =zeros(length(Freq),2);
for i=1:length(Matrix1(:,1))
%for m<=TotPos
F = Matrix1(i,4)
for j =[1:F]
m = m+j
Freq(m,1:2) = Matrix1(i,2:3)
end
% end
end
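One possible way to build that list, sketched under the assumption that Positive holds non-negative integer counts and that repelem (R2015a or newer) is available, is to repeat each coordinate row as many times as its count. The small X, Y and Positive vectors below are made-up stand-ins for the columns read from Loaloa_data.txt. Note also that in the loop above the row counter would need to advance by one per occurrence (m = m+1) rather than by j.

```matlab
% Made-up stand-in data: three locations with 2, 0 and 3 positive cases.
X = [0; 1; 2];
Y = [0; 1; 0];
Positive = [2; 0; 3];

% Repeat each [x y] row Positive(i) times (rows with a zero count are dropped).
xy = repelem([X, Y], Positive, 1);

% Pairwise distances between all occurrences
% (pdist needs the Statistics and Machine Learning Toolbox).
d = pdist(xy);
```

Occurrences at the same location end up as identical rows, so their mutual distance in d is zero.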

Related

How to speed up runtime of code that searches for data between several arrays within a 'moving' sphere

I am trying to average my CFD data (which is in the form of a scalar N x M x P array; N corresponds to Y, M to x, and P to z) over a subset of time steps. I've tried to simplify the description of my desired averaging process below.
Rotate the grid at each time step by a specified angle (this is because the flow has a coherent structure that rotates and changes shape/size at each time step and I want to overlap them and find a time averaged form of the structure that takes into account the change of shape/size over time)
Drawing a sphere centered on the original unrotated grid
Identifying the grid points from all the rotated grids that lie within the sphere
Identify the indices of the grid points in each rotated grid
Use the indices to find the scalar data at the rotated grid points within the sphere
Take an average of the values within the sphere
Put that new averaged value at the location on the unrotated grid
I have a code that seems to do what I want correctly, but it takes far too long to finish the calculations. I would like to make it run faster, and I am open to changing the code if necessary. Below is version of my code that works with a smaller version of the data.
x = -5:5; % x - position data
y = -2:.5:5; % y - position data
z = -5:5; % z - position data
% my grid is much bigger actually
[X,Y,Z] = meshgrid(x,y,z); % mesh for plotting data
dX = diff(x)'; dX(end+1) = dX(end); % x grid intervals
dY = diff(y)'; dY(end+1) = dY(end); % y grid intervals
dZ = diff(z)'; dZ(end+1) = dZ(end); % z grid intervals
TestPoints = combvec(x,y,z)'; % you need the Matlab Neural Network Toolbox to run this
dXYZ = combvec(dX',dY',dZ')';
% TestPoints is the unrotated grid
M = length(x); % size of grid x - direction
N = length(y); % size of grid y - direction
P = length(z); % size of grid z - direction
D = randi([-10,10],N,M,P,3); % placeholder for data for 3 time steps (I have more than 3, this is a subset)
D2{3,M*N*P} = [];
PosAll{3,2} = [];
[xSph,ySph,zSph] = sphere(50);
c = 0.01; % 1 cm
nu = 8e-6; % 8 cSt
s = 3*c; % span for Aspect Ratio 3
r_g = s/sqrt(3);
U_g = 110*nu/c; % velocity for Reynolds number 110
Omega = U_g/r_g; % angular velocity
T = (2*pi)/Omega; % period
dt = 40*T/1920; % time interval
DeltaRotAngle = ((2*pi)/T)*dt; % angle interval
timesteps = 121:123; % time steps 121, 122, and 123
for ti=timesteps
tj = find(ti==timesteps);
Theta = ti*DeltaRotAngle;
Rotate = [cos(Theta),0,sin(Theta);...
0,1,0;...
-sin(Theta),0,cos(Theta)];
PosAll{tj,1} = (Rotate*TestPoints')';
end
for i=1:M*N*P
aa = TestPoints(i,1);
bb = TestPoints(i,2);
cc = TestPoints(i,3);
rs = 0.8*sqrt(dXYZ(i,1)^2 + dXYZ(i,2)^2 + dXYZ(i,3)^2);
handles.H = figure;
hs = surf(xSph*rs+aa,ySph*rs+bb,zSph*rs+cc);
[Fs,Vs,~] = surf2patch(hs,'triangle');
close(handles.H)
for ti=timesteps
tj = find(timesteps==ti);
f = inpolyhedron(Fs,Vs,PosAll{tj,1},'FlipNormals',false);
TestPointsR_ti = PosAll{tj,1};
PointsInSphere = TestPointsR_ti(f,:);
p1 = [aa,bb,cc];
p2 = [PointsInSphere(:,1),...
PointsInSphere(:,2),...
PointsInSphere(:,3)];
w = 1./sqrt(sum(...
(p2-repmat(p1,size(PointsInSphere,1),1))...
.^2,2));
D_ti = reshape(D(:,:,:,tj),M*N*P,1);
D2{tj,i} = [D_ti(f),w];
end
end
D3{M*N*P,1} = [];
for i=1:M*N*P
D3{i} = vertcat(D2{:,i});
end
D4 = zeros(M*N*P,1);
for i=1:M*N*P
D4(i) = sum(D3{i}(:,1).*D3{i}(:,2))/...
sum(D3{i}(:,2));
end
D_ta = reshape(D4,N,M,P);
I expect to get an N x M x P array where each index is the weighted average of all the points covering all of the time steps at that specific position in the unrotated grid. As you can see this is exactly what I get. The major problem however is the length of time it takes to do so when I use the larger set of my 'real' data. The code above takes only a couple minutes to run, but when M = 120, N = 24, and P = 120, and the number of time steps is 24 this can take much longer. Based on my estimates it would take approximately 25+ days to finish the entire calculation.
Never mind, I can help you with the math. What you are trying to do is find the points that lie inside a sphere. Your sphere is well defined, which makes this easy: just compute the distance of every point from the center and compare it to the radius. There is no need to plot anything or to use inpolyhedron. Note the line that computes f, where I subtract the center of the sphere from the points, compute the distances of these points, and compare them to the radius of the sphere.
% x = -5:2:5; % x - position data
x = linspace(-5,5,120);
% y = -2:5; % y - position data
y = linspace(-2,5,24);
% z = -5:2:5; % z - position data
z = linspace(-5,5,120);
% my grid is much bigger actually
[X,Y,Z] = meshgrid(x,y,z); % mesh for plotting data
dX = diff(x)'; dX(end+1) = dX(end); % x grid intervals
dY = diff(y)'; dY(end+1) = dY(end); % y grid intervals
dZ = diff(z)'; dZ(end+1) = dZ(end); % z grid intervals
TestPoints = combvec(x,y,z)'; % you need the Matlab Neural Network Toolbox to run this
dXYZ = combvec(dX',dY',dZ')';
% TestPoints is the unrotated grid
M = length(x); % size of grid x - direction
N = length(y); % size of grid y - direction
P = length(z); % size of grid z - direction
D = randi([-10,10],N,M,P,3); % placeholder for data for 3 time steps (I have more than 3, this is a subset)
D2{3,M*N*P} = [];
PosAll{3,2} = [];
[xSph,ySph,zSph] = sphere(50);
c = 0.01; % 1 cm
nu = 8e-6; % 8 cSt
s = 3*c; % span for Aspect Ratio 3
r_g = s/sqrt(3);
U_g = 110*nu/c; % velocity for Reynolds number 110
Omega = U_g/r_g; % angular velocity
T = (2*pi)/Omega; % period
dt = 40*T/1920; % time interval
DeltaRotAngle = ((2*pi)/T)*dt; % angle interval
timesteps = 121:123; % time steps 121, 122, and 123
for ti=timesteps
tj = find(ti==timesteps);
Theta = ti*DeltaRotAngle;
Rotate = [cos(Theta),0,sin(Theta);...
0,1,0;...
-sin(Theta),0,cos(Theta)];
PosAll{tj,1} = (Rotate*TestPoints')';
end
tic
for i=1:M*N*P
aa = TestPoints(i,1);
bb = TestPoints(i,2);
cc = TestPoints(i,3);
rs = 0.8*sqrt(dXYZ(i,1)^2 + dXYZ(i,2)^2 + dXYZ(i,3)^2);
% handles.H = figure;
% hs = surf(xSph*rs+aa,ySph*rs+bb,zSph*rs+cc);
% [Fs,Vs,~] = surf2patch(hs,'triangle');
% close(handles.H)
for ti=timesteps
tj = find(timesteps==ti);
% f = inpolyhedron(Fs,Vs,PosAll{tj,1},'FlipNormals',false);
f = sqrt(sum((PosAll{tj,1}-[aa,bb,cc]).^2,2))<rs;
TestPointsR_ti = PosAll{tj,1};
PointsInSphere = TestPointsR_ti(f,:);
p1 = [aa,bb,cc];
p2 = [PointsInSphere(:,1),...
PointsInSphere(:,2),...
PointsInSphere(:,3)];
w = 1./sqrt(sum(...
(p2-repmat(p1,size(PointsInSphere,1),1))...
.^2,2));
D_ti = reshape(D(:,:,:,tj),M*N*P,1);
D2{tj,i} = [D_ti(f),w];
end
if ~mod(i,10)
toc
end
end
D3{M*N*P,1} = [];
for i=1:M*N*P
D3{i} = vertcat(D2{:,i});
end
D4 = zeros(M*N*P,1);
for i=1:M*N*P
D4(i) = sum(D3{i}(:,1).*D3{i}(:,2))/...
sum(D3{i}(:,2));
end
D_ta = reshape(D4,N,M,P);
In terms of runtime, on my computer, the old code takes 57 hours to run. The new code takes 2 hours. At this point, the main calculation is the distance so I doubt you'll get much better.
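One further small saving might be to compute the squared distances once, compare them against the squared radius, and reuse them for the inverse-distance weights instead of recomputing the distances for w. The following is only a sketch on four made-up points; pts, ctr and rs stand in for PosAll{tj,1}, [aa,bb,cc] and the sphere radius, and I have not timed it on the full grid.

```matlab
% Made-up points and sphere (stand-ins for PosAll{tj,1}, [aa,bb,cc] and rs).
pts = [1 0 0; 0 2 0; 0.5 0.5 0.5; 1 1 1];
ctr = [0 0 0];
rs  = 1.5;

% Squared distance of every point from the sphere center.
d2 = sum(bsxfun(@minus, pts, ctr).^2, 2);

% Points inside the sphere: compare squared distances, no sqrt needed.
inside = d2 < rs^2;

% Inverse-distance weights, reusing d2 (Inf if a point sits exactly on ctr).
w = 1 ./ sqrt(d2(inside));
```

In the loop above, the reshape of D(:,:,:,tj) also does not depend on i, so it could be done once per time step before the outer loop.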

Plotting a collection of sine waves

I have the following code:
Fs = 1000;
T = 1/Fs;
L = 1000;
t = (0:L-1)*T;
k = 25:1:50;
m = 1:1:25;
where k and m are corresponding. I want to plot the 25 sine waves resulting from:
x = m*sin(2*pi*k*t);
I thought about doing it using a for loop that takes one value from m and k each time, but I'm unsure how to proceed.
Below is a very basic plotting solution. You will notice that it's very difficult to see what's going on in the plot, so you might want to consider other ways to present this data.
function q45532082
Fs = 1000;
T = 1/Fs;
L = 1000;
t = (0:L-1)*T;
k = 26:1:50;
m = 1:1:25;
%% Plotting
assert(numel(m) == numel(k)); % We make sure that the number of elements is the same.
figure(); hold on; % "hold" is needed if you want to see all curves at the same time.
for ind1 = 1:numel(m)
plot(t,m(ind1)*sin(2*pi*k(ind1)*t));
end
This is the result:
Note that the number of elements in k and m in your code is different, so I had to change it.
Using the functionality of plot you can also plot all sine waves without a loop:
Fs = 1000;
T = 1/Fs;
L = 1000;
t = (0:L-1)*T;
k = 26:1:50;
m = 1:1:25;
x = bsxfun(@times,m,sin(2.*pi.*bsxfun(@times,t.',k))); %this results in an L*25 matrix, each column is data of one wave
% or, if you have version 2016b or newer:
% x = m.*sin(2.*pi.*t.'*k);
plot(t,x) % plot all sines at once
and as @Dev-iL noted, I also had to change k.
The result with L = 1000 is too crowded, so I plot it here with L = 50:

Count the number of unique values for each column of a submatrix in a fast manner

I have a matrix X with tens of rows and thousands of columns. All elements are categorical and have been re-coded into an index matrix. For example, the ith column X(:,i) = [-1,-1,0,2,1,2]' is converted to X2(:,i) = ic, where [x,ia,ic] = unique(X(:,i)), so that it can be used conveniently with the function accumarray. I randomly select a submatrix from the matrix and count the number of unique values in each column of the submatrix. I repeat this procedure 10,000 times. I know several methods for counting the number of unique values in a column; the fastest way I have found so far is shown below:
mx = max(X);
for iter = 1:numperm
for j = 1:ny
ky = yrand(:,iter)==uy(j);
% select submatrix from X where all rows correspond to rows in y that y equals to uy(j)
Xk = X(ky,:);
% specify the sites where to put the number of each unique value
mxj = mx*(j-1);
mxi = mxj+1;
mxk = max(Xk)+mxj;
% iteration to count number of unique values in each column of the submatrix
for i = 1:c
pxs(mxi(i):mxk(i),i) = accumarray(Xk(:,i),1);
end
end
end
This is a way to perform a random permutation test to calculate the information gain between a data matrix X of size n by c and a categorical variable y, where y is randomly permuted. In the code above, all randomly permuted versions of y are stored in the matrix yrand, and the number of permutations is numperm. The unique values of y are stored in uy, and the number of unique values is ny. In each iteration of 1:numperm, the submatrix Xk is selected according to the unique element of y, and the number of unique elements in each column of this submatrix is counted and stored in the matrix pxs.
The most time-consuming section of the code above is the loop over i = 1:c for large c.
Is it possible to apply accumarray in a matrix manner to avoid the for loop? How else can I improve the code above?
-------
As requested, here is a simplified test function that includes the code above:
%% test
function test(x,y)
[r,c] = size(x);
x2 = x;
numperm = 1000;
% convert the original matrix to index matrix for suitable and fast use of accumarray function
for i = 1:c
[~,~,ic] = unique(x(:,i));
x2(:,i) = ic;
end
% get 'numperm' rand permutations of y
yrand(r, numperm) = 0;
for i = 1:numperm
yrand(:,i) = y(randperm(r));
end
% get statistic of y
uy = unique(y);
nuy = numel(uy);
% main iterations
mx = max(x2);
pxs(max(mx),c) = 0;
for iter = 1:numperm
for j = 1:nuy
ky = yrand(:,iter)==uy(j);
xk = x2(ky,:);
mxj = mx*(j-1);
mxk = max(xk)+mxj;
mxi = mxj+1;
for i = 1:c
pxs(mxi(i):mxk(i),i) = accumarray(xk(:,i),1);
end
end
end
And some test data:
x = round(randn(60,3000));
y = [ones(30,1);ones(30,1)*-1];
Test the function
tic; test(x,y); toc
returns "Elapsed time is 15.391628 seconds." on my computer. The test function uses 1000 permutations, so if I perform 10,000 permutations plus some additional computations (negligible compared to the code above), I expect it to take more than 150 s. I wonder whether the code can be improved. Intuitively, applying accumarray in a matrix manner could save a lot of time. Is that possible?
The approach suggested by @rahnema1 has sped up the calculation significantly, so I am posting my answer here, as also requested by @Dev-iL.
%% test
function test(x,y)
[r,c] = size(x);
x2 = x;
numperm = 1000;
% convert the original matrix to index matrix for suitable and fast use of accumarray function
for i = 1:c
[~,~,ic] = unique(x(:,i));
x2(:,i) = ic;
end
% get 'numperm' rand permutations of y
yrand(r, numperm) = 0;
for i = 1:numperm
yrand(:,i) = y(randperm(r));
end
% get statistic of y
uy = unique(y);
nuy = numel(uy);
% main iterations
mx = max(max(x2));
% preallocation
pxs(mx*nuy,c) = 0;
% set the edges of the bin for function histc
binrg = (1:mx)';
% preallocation of the range of matrix into which the results will be stored
mxr = mx*(0:nuy);
for iter = 1:numperm
yt = yrand(:,iter);
for j = 1:nuy
% rows mxr(j)+1 to mxr(j+1) hold the counts for the j-th value of y
pxs(mxr(j)+1:mxr(j+1),:) = histc(x2(yt==uy(j),:),binrg,1);
end
end
Test results:
>> x = round(randn(60,3000));
>> y = [ones(30,1);ones(30,1)*-1];
>> tic; test(x,y); toc
Elapsed time is 15.632962 seconds.
>> tic; test(x,y); toc % using the way suggested by rahnema1, i.e., revised function posted above
Elapsed time is 2.900463 seconds.
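On the original question of whether accumarray can work on the whole submatrix at once: one option is to pass two-column subscripts, with the category index as the row and the column number as the column. The following is only a sketch on a tiny made-up index matrix xk (not the code from either post), and I have not timed it against the histc version.

```matlab
% Tiny made-up index matrix: 3 rows, 2 columns, categories 1..3.
xk = [1 2;
      1 2;
      3 1];
[nk, c] = size(xk);
mx = 3;                            % largest category index

% Column number of every element of xk(:) (column-major order).
colIdx = repelem((1:c)', nk);

% counts(v, i) = number of times value v occurs in column i of xk.
counts = accumarray([xk(:), colIdx], 1, [mx, c]);
```

Here counts has the same layout as one mx-by-c block of pxs in the function above.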

MATLAB: find peaks from data iterations

I have a function that plots the magnitude of an fft function from a signal.
For every iteration I want to determine the x-values of the two peaks below 2000. I thought this would be relatively simple using the function findpeaks, but it has not given me the correct output.
I do not intend to plot the output, but for illustration purposes here is a plot. I only want to know the peaks for the data below 2000 (the first set of peaks).
Example of one iteration:
Here is a bit of my code. B is a vector containing the starting indices for every segment of data that needs to be analysed.
function [number] = fourir_(data,sampling_rate)
%Finds the approximate starting index of every peak segment
%B is a vector containing the indices
[A,B] = findpeaks(double(abs(data) > 0.6), 'MinPeakDistance', 2500);
Fs = sampling_rate;
t = 0:1/Fs:0.25;
C = zeros(numel(B),2); % one row per segment, one column per peak
for i = 1:numel(B)
new_data = data(B(i):(B(i)+200));
y = double(new_data)/max(abs(new_data));
n = length(y);
p = abs(fft(y));
f = (0:n-1)*(Fs/n);
end
Example data: https://www.dropbox.com/s/zxypn3axoqwo2g0/signal%20%281%29.mat?dl=0
Here is your answer; this is exactly what @Ed Smith suggested in his first comment. You can just add a threshold in order to pick out the major peaks.
%Finds the approximate starting index of every peak segment
%B is a vector containing the indices
[A,B] = findpeaks(double(abs(data) > 0.6), 'MinPeakDistance', 2500);
Fs = sampling_rate;
t = 0:1/Fs:0.25;
C = zeros(numel(B),2); % one row per segment, one column per peak
for i = 1:numel(B)
new_data = data(B(i):(B(i)+200));
y = double(new_data)/max(abs(new_data));
n = length(y);
p = abs(fft(y));
f = (0:n-1)*(Fs/n);
p1 = p(1:round(length(p)/2));
p1(p1<10) = 0; %add a threshold
[~,ind] = findpeaks(p1); %index of where are the peaks
C(i,:) = f(ind);
hold on
plot(f,p,'b',C(i,:),p(ind),'ro')
end
The following may help, which seems to get the peaks from one fft of your signal data,
clear all
close all
%load sample data from https://www.dropbox.com/s/zxypn3axoqwo2g0/signal%20%281%29.mat?dl=0
load('./signal (1).mat')
%get an FFT and take half
p = abs(fft(signal));
p = p(1:length(p)/2);
%find peaks and plot
[pk, loc] = findpeaks(p,'MINPEAKHEIGHT',100,'MINPEAKDISTANCE',100);
plot(p,'k-')
hold all
plot(loc, pk, 'rx')
which looks like,
Where some of the peaks are isolated...
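To restrict the search to the part of the spectrum below 2000, as the question asks, one option is to mask the frequency axis before calling findpeaks and then keep the two tallest peaks. This is a sketch on a synthetic signal; the sampling rate, tone frequencies and threshold are assumptions chosen for illustration, not values taken from the linked data.

```matlab
% Synthetic test signal (assumed values): tones at 500, 1200 and 3000 Hz.
Fs = 8000;                         % sampling rate in Hz
n  = 800;                          % number of samples
t  = (0:n-1)/Fs;
y  = sin(2*pi*500*t) + 0.8*sin(2*pi*1200*t) + 0.5*sin(2*pi*3000*t);

p = abs(fft(y));
f = (0:n-1)*(Fs/n);                % frequency of each FFT bin

% Keep only the bins below 2000 Hz.
mask = f < 2000;
pLow = p(mask);
fLow = f(mask);

% Tallest peaks first; the height threshold suppresses numerical noise.
[pk, loc] = findpeaks(pLow, 'MinPeakHeight', 10, 'SortStr', 'descend');
peakFreqs = sort(fLow(loc(1:2)));  % x-values of the two largest peaks
```

For real data the threshold would have to be tuned, and loc(1:2) assumes that at least two peaks exceed it.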

Multiplying a vector times the inverse of a matrix in Matlab

I have a problem multiplying a vector times the inverse of a matrix in Matlab. The code I am using is the following:
% Final Time
T = 0.1;
% Number of grid cells
N=20;
%N=40;
L=20;
% Delta x
dx=1/N
% define cell centers
%x = 0+dx*0.5:dx:1-0.5*dx;
x = linspace(-L/2, L/2, N)';
%define number of time steps
NTime = 100; %NB! Stability condition: if NTime were 50, one would get a completely wrong answer because lambda > 0.5
%NTime = 30;
%NTime = 10;
%NTime = 20;
%NTime = 4*21;
%NTime = 4*19;
% Time step dt
dt = T/NTime
% Define a vector that is useful for handling the different cells
J = 1:N; % number the cells of the domain
J1 = 2:N-1; % the interior cells
J2 = 1:N-1; % numbering of the cell interfaces
%define vector for initial data
u0 = zeros(1,N);
L = x<0.5;
u0(L) = 0;
u0(~L) = 1;
plot(x,u0,'-r')
grid on
hold on
% define vector for solution
u = zeros(1,N);
u_old = zeros(1,N);
% useful quantity for the discrete scheme
r = dt/dx^2
mu = dt/dx;
% calculate the numerical solution u by going through a loop of NTime number
% of time steps
A=zeros(N,N);
alpha(1)=A(1,1);
d(1)=alpha(1);
b(1)=0;
c(1)=b(1);
gamma(1,2)=A(1,2);
% initial state
u_old = u0;
pause
for j = 2:NTime
A(j,j)=1+2*r;
A(j,j-1)=-(1/dx^2);
A(j,j+1)=-(1/dx^2);
u=u_old./A;
% plotting
plot(x,u,'-')
xlabel('X')
ylabel('P(X)')
hold on
grid on
% update "u_old" before you move forward to the next time level
u_old = u;
pause
end
hold off
The error message I get is:
Matrix dimensions must agree.
Error in Implicit_new (line 72)
u=u_old./A;
My question is therefore how it is possible to perform u=u_old*[A^(-1)] in Matlab?
As knedlsepp said, v./A is element-wise division, which is not what you want. You can use either:
v/A, provided that v is a row vector whose length equals the number of columns in A. The result is a row vector that solves x*A = v, i.e. v*inv(A).
A\v, provided that v is a column vector whose length equals the number of rows in A. The result is a column vector that solves A*x = v, i.e. inv(A)*v.
The two are related by transposition: v/A is the transpose of A'\v'.
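A small check of these relations on a made-up 2-by-2 system (the matrix and vector are arbitrary illustration values, not taken from the question):

```matlab
A = [4 1; 2 3];
v = [1 2];                   % row vector

x = v / A;                   % solves x*A = v, i.e. v*inv(A) without forming inv(A)
y = A \ v';                  % solves A*y = v', i.e. inv(A)*v'
z = (A' \ v')';              % same as v/A, by transposition

resRight = norm(x*A - v);    % close to 0
resLeft  = norm(A*y - v');   % close to 0
diffXZ   = norm(x - z);      % close to 0
```

In the question's loop this would mean writing u = u_old/A for the row vector u_old, provided A is N-by-N and nonsingular.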