Using polyarea to calculate the area of a subcycle - MATLAB

I was wondering how to use polyarea in MATLAB over successive intervals of data. For example, I have disp=[1,2,3,4,5.....] and load = [3,4,5,6,7,8....]. I would like to calculate polyarea(disp,load) for every 40 rows (i.e. one interval at a time). disp and load are cyclic loading and displacement data containing 1000+ rows like this. Any help is much appreciated!
EDIT 1 (based on m7913d's answer): The code does not seem to give the expected results. Is there anything wrong with it?
data=xlsread('RE.xlsx');
time=data(:,1);
load=data(:,2);
disp=data(:,3);
duration = 40;
n = length(disp); % number of captured samples
nCycles = floor(n/duration); % number of completed cycles
areas = zeros(nCycles, 1); % initialise output (area of each cycle)
for i=1:nCycles % loop over the cycles
    range = (i-1)*duration + (1:duration); % calculate the indexes corresponding with the ith cycle
    areas(i) = polyarea(disp(range), load(range)); % calculate the area of the ith cycle
end

Assuming each cycle has the same known duration (duration = 40), you can calculate the area of each cycle as follows:
duration = 40;
n = length(A); % number of captured samples
nCycles = floor(n/duration); % number of completed cycles
areas = zeros(nCycles, 1); % initialise output (area of each cycle)
for i=1:nCycles % loop over the cycles
    range = (i-1)*duration + (1:duration); % calculate the indexes corresponding with the ith cycle
    areas(i) = polyarea(A(range), B(range)); % calculate the area of the ith cycle
end
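As an aside, if you prefer to avoid the explicit loop, a minimal sketch of an equivalent loop-free version (assuming the same A, B and duration as above, and trimming any incomplete final cycle) reshapes the data so that each cycle is one column; polyarea operates column-wise on matrices:
duration = 40;
n = floor(numel(A)/duration) * duration; % drop any incomplete final cycle
X = reshape(A(1:n), duration, []);       % one column per cycle
Y = reshape(B(1:n), duration, []);
areas = polyarea(X, Y).';                % area of each cycle, as a column vector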
Further Reading
As this seems a basic question to me, it may be useful to have a look at MATLAB's Getting Started tutorial.
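Regarding the code in EDIT 1: if the results look off, one thing worth checking is whether each block of 40 samples really corresponds to one complete cycle. A minimal diagnostic sketch, reusing the duration, nCycles, disp and load variables from that snippet:
figure; hold on
for i = 1:nCycles
    range = (i-1)*duration + (1:duration);
    plot(disp(range), load(range)) % each closed loop should trace one hysteresis cycle
end
hold off
xlabel('Displacement')
ylabel('Load')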

Related

How to plot a vector in MATLAB with another vector as a parameter?

I am trying to optimize the speed of a function I am writing, and trying to use vectors as much as I can. I am new to MATLAB and vectorization is not always intuitive to me, so I would like some additional help. Here is my current code:
For reference, the oracle() function represents a randomly shaped object; if you input a 1x2 matrix, it returns whether or not that matrix (in our case, the x- and y-coordinates of a point) is inside the random object.
function area = MitchellLitvinov_areaCalc(n)
    % create random coordinate vectors, with bounds from (3, 14)
    x = rand(n, 1) * 11 + 3;
    y = rand(n, 1) * 11 + 3;
    % find every point that is inside of oracle
    inOracle = oracle([x y]);
    % calculate the proportion, and multiply total area by proportion to find area
    % of oracle
    numPointsInOracle = nnz(inOracle);
    area = numPointsInOracle/n * (11*11);
    % create variable to store number of points in the area, and create a
    % matrix with size [numPoints, 2] to hold x and y values
    oracleCoordinates = zeros(numPointsInOracle, 2);
    % HERE IS WHERE I NEED ASSISTANCE!!!!!!!!!
    % find the points that are in the oracle shape
    index = 0; % start index at 0 % is the index of our oracleCoordinates matrix
    for i = 1:n % have to go through every point again to get their index
        % if point is inside oracle, increase index and record
        % coordinates
        if (inOracle(i) == 1) % see if point is in oracle
            index = index + 1;
            oracleCoordinates(index, 1) = x(i, 1);
            oracleCoordinates(index, 2) = y(i, 1);
        end
    end
    % plot all points inside the oracle
    scatter(oracleCoordinates(:,1), oracleCoordinates(:,2))
    title("Oracle Shape")
    xlim([3, 14]);
    ylim([3, 14]);
end
Yes, even with near-maximum memory usage, the code will run fairly quickly. But I want it to be fully vectorized, simply for speed reasons and in case I need to repurpose this code for imaging. Currently, to calculate the area I am using vectors, but to actually reproduce an image, I need to create a separate storage matrix and manually use indexing/appending to transfer over the points inside the oracle function. I was wondering if there were any direct "shortcuts" to make my plotting a bit faster.
You can use an array as the index to select certain items from another array. For example, using your variable names:
oracleCoordinates(:,1) = x(inOracle == 1);
oracleCoordinates(:,2) = y(inOracle == 1);
This should give the same result as the code in your question, without using a loop.
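As a quick usage example (a sketch reusing the x, y and inOracle names from the question), you can also skip the intermediate matrix entirely and pass the logically indexed vectors straight to the plotting call:
% logical indexing keeps only the points flagged by the oracle
scatter(x(inOracle == 1), y(inOracle == 1))
title("Oracle Shape")
xlim([3, 14]);
ylim([3, 14]);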

How to bin values and plot

I have a dataset with two columns: the first column is duration (a length of time, e.g. 5 min) and the second column is firing rate. Is it possible to plot this in such a way that firing rates are binned according to the corresponding duration (e.g. 5, 10, 15 min), and then plot bars with firing rate on the y axis and time on the x axis?
I'm sure this can be accomplished without the for loop. The solution below uses the discretize function to accomplish the grouping; other approaches are possible.
% MATLAB R2017a
% Sample data
D = 20*rand(25,1);
FR = 550*rand(25,1);
D_bins = (0:5:20)';
ind = discretize(D,D_bins); % groups data
FR_mean = zeros(length(D_bins),1);
for k = 1:length(D_bins)
    FR_mean(k) = mean(FR(ind==k));
end
bar(D_bins,FR_mean) % bar plot
% Cosmetics
xlabel('Duration (min)')
ylabel('Mean Firing Rate (unit)')
I'm positive there's a more efficient way to get the means for each group, possibly using arrayfun or some other nifty functions, but will hold off until OP provides more details.
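For what it's worth, here is one possible loop-free sketch of the grouping step (assuming the same D, FR and D_bins as above); it uses accumarray to average the firing rates per bin and plots the bars at the bin centres rather than at the edges:
ind = discretize(D, D_bins);                      % bin index for each sample
nBins = numel(D_bins) - 1;                        % the edges define nBins bins
FR_mean = accumarray(ind, FR, [nBins 1], @mean);  % mean firing rate per bin
binCenters = D_bins(1:end-1) + diff(D_bins)/2;    % centre of each bin
bar(binCenters, FR_mean)
xlabel('Duration (min)')
ylabel('Mean Firing Rate (unit)')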

compute the average over simulation for different parameters values

I am working on something similar to the following example:
I want to compute <x(t)>, that is, the average of a function x(t) over the number of simulations. To do this, I wrote the following code:
sim=50;% number of simulations
t=linspace(0,1);% time interval
a_range=[1,2,3];% different values for the parameter a
b_range=[0,0.5,1];% different values for the parameter b
z=zeros(1,sim);
theta=zeros(1,sim);
for nplot=1:3
    a=a_range(nplot);
    b=b_range(nplot);
    average_x=zeros(nplot,sim);
    for i=1:sim
        z(i)=rand(1);% random number for every simulation
        theta(i)=pi*rand(1);% random number for every simulation
        x=z(i)*t.^2+a*sin(theta(i))+b.*tan(theta(i));% the function
    end
    average_x(nplot,sim)=mean(x);% average over the number of simulations
end
fname=['xsin.mat'];
save(fname)
The time is a vector 1 by 100 and x is a vector 1 by 100, and average_x is 1 by 50. What I am looking for is to write a script to load the file and plot the average against time for different parameters a and b. So I want to write a code to generate three figures such that in figure 1 I will plot the average
plot(t,average_x)
for a=1 and b=0.
Then in figure 2 I will plot the average again but for a=2 and b=0.5, and so on. The problem is that the dimensions of the time t and the average are not the same. How can I fix this problem and generate three distinct figures?
If I got your intention correctly, this is what you are looking for:
sim = 50;% number of simulations
t = linspace(0,1);% time interval
a_range = [1,2,3];% different values for the parameter a
b_range = [0,0.5,1];% different values for the parameter b
% NO NEED TO GENERATE THE RANDOM NUMBERS ONE BY ONE:
theta = pi*rand(sim,1);% random number for every simulation
z = rand(sim,1); % random number for every simulation
% YOU SHOULD INITIALIZE ALL YOUR VARIABLES OUTSIDE THE LOOPS:
x = zeros(sim,numel(t));
average_x = zeros(3,numel(t));% the mean accross simulations
% for the average across time use:
% average_x = zeros(3,sim);
for nplot=1:3
    a = a_range(nplot);
    b = b_range(nplot);
    for i=1:sim
        x(i,:) = z(i)*t.^2+a*sin(theta(i))+b.*tan(theta(i));% the function
    end
    average_x(nplot,:) = mean(x); % average over the number of simulations
    % average_x(nplot,:) = mean(x,2); % average across time
end
% save the relevant variables:
save('results.mat','average_x','t')
In another file you can write:
load('results.mat')
for k = 1:size(average_x,1)
    figure(k)
    plot(t,average_x(k,:))
    title(['Parameter set ' num2str(k)])
    xlabel('Time')
    ylabel('mean x')
end
The resulting plot shows all parameter sets in one figure (if you want the average over simulations).
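A minimal sketch of such a combined figure (assuming t, average_x, a_range and b_range from the script above):
figure
plot(t, average_x) % one line per (a,b) pair
legend('a=1, b=0', 'a=2, b=0.5', 'a=3, b=1', 'Location', 'best')
xlabel('Time')
ylabel('mean x')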
BTW, if you want to make your code more compact and fast, you can vectorize it, mainly using bsxfun. Here is a demonstration with your code:
% assuming all parameters are defined as above:
zt = bsxfun(@times,z,t.^2); % first part of the function 'z(i)*t.^2'
% second part of the function 'a*sin(theta(i)) + b.*tan(theta(i))':
ab = bsxfun(@times,a_range,sin(theta)) + bsxfun(@times,b_range,tan(theta));
% convert the second part to the right dimensions and size:
ab = repmat(reshape(ab,[],1,3),1,numel(t),1);
x = bsxfun(@plus,zt,ab); % the function
average_x = squeeze(mean(x)); % take the mean by simulation
plot(t,average_x) % plot it all at once, as in the figure above
xlabel('Time')
ylabel('mean x')
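As a side note (not part of the original answer), in MATLAB R2016b and newer, implicit expansion can replace bsxfun entirely; a minimal sketch of the same computation, assuming the variables defined above:
zt = z .* t.^2;                                      % sim-by-numel(t): 'z(i)*t.^2'
ab = a_range .* sin(theta) + b_range .* tan(theta);  % sim-by-3: 'a*sin(theta) + b*tan(theta)'
x = zt + reshape(ab, [], 1, 3);                      % sim-by-numel(t)-by-3
average_x = squeeze(mean(x));                        % numel(t)-by-3, mean over simulations
plot(t, average_x)
xlabel('Time')
ylabel('mean x')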

Sequentially reading down a matrix column in MATLAB

My original problem was to create a scenario in which there is a line of a specific length (x=100) and a barrier at a specific position (pos=50). Multiple rounds of sampling are carried out, within each of which a specific number of random numbers (p) are generated. The numbers generated can fall either to the left or to the right of the barrier. The program outputs the difference between the largest number generated to the left of the barrier and the smallest number generated to the right.
For example, suppose the system has created 4 numbers (a, b, c, d), with a and b to the left of the barrier and c and d to the right. It will ignore a and d and output the difference between b and c. Essentially, it will output the smallest possible fragment of the line that still contains the barrier.
The code I have been using to do this is:
x = 100; % length of the grid
pos = 50; % position of the barrier
len1 = 0; % left edge of the grid
len2 = x; % right edge of the grid
sample = 1000; % number of samples to make
nn = 1:12; % number of points to generate (will loop over these)
len = zeros(sample, length(nn)); % array to record the results
for n = 1:length(nn) % For each number of pts to generate
    numpts = nn(n);
    for i = 1:sample % For each round of sampling,
        p = round(rand(numpts,1) * x); % generate 'numpts' random points.
        if any(p>pos) % If any are to the right of the barrier,
            pright = min(p(p>pos)); % pick the smallest.
        else
            pright = len2;
        end
        if any(p<pos) % If any are to the left of the barrier,
            pleft = max(p(p<pos)); % pick the largest.
        else
            pleft = len1;
        end
        len(i,n) = pright - pleft; % Record the length of the interval.
    end
end
My current problem: I'd like to make this more complex. For example, I would like to be able to use more than just one random number count in each round. Specifically I would like to relate this to Poisson distributions with different mean values:
% Create Poisson distributions for λ = 1:12
range = 0:20;
for l = 1:12
    y = poisspdf(range,l);
    dist(:,l) = y;
end
From this, I'd like to take 1000 samples for each λ, but within each round of 1000 samples the random number count is no longer the same for all 1000 samples. Instead it depends on the Poisson distribution. For example, with a mean value of 1, the probabilities are:
0 - 0.3678
1 - 0.3678
2 - 0.1839
3 - 0.0613
4 - 0.0153
5 - 0.0030
6 - 0.0005
7 - 0.0001
8 - 0.0000
9 - 0.0000
10 - 0.0000
11 - 0.0000
12 - 0.0000
So for the first round of 1000 samples, 367 of them would be carried out generating just 1 number, 367 carried out generating 2 numbers, 183 carried out generating 3 numbers and so on. The program will then repeat this using new values it gains from a mean value of 2 and so on. I'd then like to simply collect together all the fragment sizes (pright-pleft) into a column of a matrix - a column for each value of λ.
I know I could do something like:
amount = dist*sample
to multiply the Poisson distributions by the sample size and obtain how many samples of each point count it should generate - however, I'm really stuck on how to incorporate this into the for-loop and alter the code to tackle this new problem. I am also not sure how to read down a column of a matrix to use each probability value to determine how many of each type of RNG call it should make.
Any help would be greatly appreciated,
Anna.
You could generate a vector of random variables from a known pdf object using random, if you have the Statistics Toolbox. Better still, skip the PDF step and generate the random variables using poissrnd. Round off the value to the nearest integer and call rand as you were doing already. In your loop, simply iterate over your generated vector of Poisson-distributed random numbers.
Example:
x = 100; % length of the grid
pos = 50; % position of the barrier
len1 = 0; % left edge of the grid
len2 = x; % right edge of the grid
sample = 1000; % number of samples to make
lambda = 1:12; % lambdas
Rrnd = round(poissrnd(repmat(lambda,sample,1)));
len = zeros(size(Rrnd)); % array to record the results
for n = lambda % For each lambda (used as the column index)
    for i = 1:sample % For each round of sampling,
        numpts = Rrnd(i,n);
        p = round(rand(numpts,1) * x); % generate 'numpts' random points.
        len(i,n) = min([p(p>pos);len2]) - max([p(p<pos);len1]); % Record the length
    end
end
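To collect the results per λ (a sketch assuming the len and lambda variables above): each column of len already holds the fragment lengths for one value of λ, so a summary plot is a one-liner:
plot(lambda, mean(len), '-o') % mean fragment length for each lambda
xlabel('\lambda')
ylabel('Mean fragment length')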

optimizing manually-coded k-means in MATLAB?

So I'm writing a k-means script in MATLAB, since the native function doesn't seem to be very efficient, and it seems to be fully operational. It appears to work on the small training set that I'm using (which is a 150x2 matrix fed via text file). However, the runtime is taking exponentially longer for my target data set, which is a 3924x19 matrix.
I'm not the greatest at vectorization, so any suggestions would be greatly appreciated. Here's my k-means script so far (I know I'm going to have to adjust my convergence condition, since it's looking for an exact match, and I'll probably need more iterations for a dataset this large, but I want it to be able to finish in a reasonable time first, before I crank that number up):
clear all;
%take input file (manually specified by user)
disp('Please type input filename (in working directory): ')
target_file = input('filename: ', 's');
%parse and load into matrix
data = load(target_file);
%prompt name of output file (for later) UNCOMMENT BELOW TWO LINES LATER
% disp('Please type output filename (to be saved in working directory): ')
% output_name = input('filename:', 's')
%prompt number of clusters
disp('Please type desired number of clusters: ')
c = input ('number of clusters: ');
%specify type of kmeans algorithm ('regular' for regular, 'fuzzy' for fuzzy)
%UNCOMMENT BELOW TWO LINES LATER
% disp('Please specify type (regular or fuzzy):')
% runtype = input('type: ', 's')
%initialize cluster centroid locations within bounds given by data set
%initialize rangemax and rangemin row vectors
%with length same as number of dimensions
rangemax = zeros(1,size(data,2));
rangemin = zeros(1,size(data,2));
%map max and min values for bounds
for dim = 1:size(data,2)
rangemax(dim) = max(data(:,dim));
rangemin(dim) = min(data(:,dim));
end
% rangemax
% rangemin
%randomly initialize mu_k (center) locations in (k x n) matrix where k is
%cluster number and n is number of dimensions/coordinates
mu_k = zeros(c,size(data,2));
for k = 1:c
mu_k(k,:) = rangemin + (rangemax - rangemin).*rand(1,1);
end
mu_k
%iterate k-means
%initialize holding variable for distance comparison
comparisonmatrix = [];
%initialize assignment vector
assignment = zeros(size(data,1),1);
%initialize distance holding vector
dist = zeros(1,size(data,2));
%specify convergence threshold
%threshold = 0.001;
for iteration = 1:25
%save current assignment values to check convergence condition
hold_assignment = assignment;
for point = 1:size(data,1)
%calculate distances from point to centers
for k = 1:c
%holding variables
comparisonmatrix = [data(point,:);mu_k(k,:)];
dist(k) = pdist(comparisonmatrix);
end
%record location of minimum distance (location value will be between 1
%and k)
[minval, location] = min(dist);
%assign cluster number (analogous to location value)
assignment(point) = location;
end
%check convergence criteria
if isequal(assignment,hold_assignment)
break
end
%revise mu_k locations
%count number of each label
assignment_count = zeros(1,c);
for i = 1:size(data,1)
assignment_count(assignment(i)) = assignment_count(assignment(i)) + 1;
end
%compute centroids
point_total = zeros(size(mu_k));
for row = 1:size(data,1)
point_total(assignment(row),:) = point_total(assignment(row),:) + data(row,:);
end
%move mu_k values to centroids
for center = 1:c
mu_k(center,:) = point_total(center,:)/assignment_count(center);
end
end
There are a lot of loops in there, so I feel that there's a lot of optimization to be made. However, I think I've just been staring at this code for far too long, so some fresh eyes could help. Please let me know if I need to clarify anything in the code block.
When the above code block is executed (in context) on the large dataset, it takes 3732.152 seconds, according to MATLAB's profiler, to make the full 25 iterations (I'm assuming it hasn't "converged" according to my criteria yet) for 150 clusters, but about 130 of them return NaNs (130 rows in mu_k).
Profiling will help, but the place to rework your code is to avoid the loop over the number of data points (for point = 1:size(data,1)). Vectorize that.
Inside your for iteration loop, here is a quick partial example:
[nPoints,nDims] = size(data);
% Calculate all high-dimensional distances at once
kdiffs = bsxfun(@minus,data,permute(mu_k,[3 2 1])); % NxDx1 - 1xDxK => NxDxK
distances = sum(kdiffs.^2,2); % no need to do sqrt
distances = squeeze(distances); % Nx1xK => NxK
% Find closest cluster center for each point
[~,ik] = min(distances,[],2); % Nx1
% Calculate the new cluster centers (mean the data)
mu_k_new = zeros(c,nDims);
for i = 1:c
    indk = ik==i;
    clustersizes(i) = nnz(indk);
    mu_k_new(i,:) = mean(data(indk,:))';
end
This isn't the only (or the best) way to do it, but it should be a decent example.
Some other comments:
Instead of using input, make this script into a function to efficiently handle input arguments.
If you want an easy way to specify a file, see uigetfile.
With many MATLAB functions, such as max, min, sum, mean, etc., you can specify a dimension over which the function should operate. This way you can run it on a matrix and compute values for multiple conditions/dimensions at the same time.
Once you get decent performance, consider iterating longer, specifically until the centers no longer change or the number of samples that change clusters becomes small.
The cluster with the smallest distance for each point, ik, will be the same with squared Euclidean distance.
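If the Statistics Toolbox is available (the question already calls pdist), another possible sketch of the assignment step uses pdist2, assuming data and mu_k as defined in the question:
D = pdist2(data, mu_k);          % all point-to-centre distances, N-by-c
[~, assignment] = min(D, [], 2); % index of the closest centre for each point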