Find the range of a value in MATLAB from two dependent columns

I have an Excel file which contains 4 columns. The first column is time in seconds and the other three are my functions. How can I find the time that corresponds to a specific function value? Let me give an example:
Say I want to find where the value 0.7636 from the second column is located in the time column. I found manually that it lies between 6960 and 7020.
With several values to look up, and several different functions, it becomes difficult to do this manually.
I hope you can help.
Thanks, Sepideh

You have to think about what a good solution is first. Let's call your three functions f1(t), f2(t) and f3(t). Now you have values v=[v1,v2,v3] and you want to know the best matching time value.
What is the best matching time value? You have to find some kind of distance metric, which tells you how close the data matches. As a default, I would use the 2-norm unless you have a reason to use something else. This would be:
%no running code, just a formula
d(t)=sqrt((f1(t)-v1)^2+(f2(t)-v2)^2+(f3(t)-v3)^2)
Now that d is defined, you want to minimize it. There are basically two approaches. If you are looking for the closest row in your data, calculate d(t) for each row and take the minimum. The other approach is to interpolate f1 to f3 to fill the gaps between your rows, and then search for the minimum of d(t) again.
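Both approaches can be sketched as follows. This is only an illustration on made-up data: t, F and v are hypothetical names, the three columns of F stand in for f1 to f3, and the 1-second interpolation grid is an arbitrary choice.

```matlab
% Hypothetical sample data: time stamps and three linear stand-in functions
t = (0:60:600)';                          % time in seconds, one row per sample
F = [t/600, 2*t/600, 1 - t/600];          % stand-ins for f1(t), f2(t), f3(t)
v = [320/600, 640/600, 1 - 320/600];      % target values [v1, v2, v3]

% Approach 1: closest existing row, using the 2-norm distance per row
d = sqrt(sum((F - repmat(v, size(F,1), 1)).^2, 2));
[dmin, idx] = min(d);
t_closest = t(idx);                       % time of the best-matching row

% Approach 2: interpolate onto a finer grid, then minimize again
tf = (t(1):1:t(end))';                    % 1-second grid (assumed resolution)
Ff = interp1(t, F, tf);                   % linear interpolation, column by column
df = sqrt(sum((Ff - repmat(v, numel(tf), 1)).^2, 2));
[dfmin, idxf] = min(df);
t_interp = tf(idxf);                      % best-matching time between rows
```

With this data the closest row is at t = 300, while the interpolated search lands on t = 320, which lies between two rows.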

You can try something like this, to find position of data in column#2:
data = xlsread('Q1.xlsx');
ref = 0.7636; % your reference value
lb = data(:,2) < ref; % find lower value
ub = data(:,2) > ref; % find greater value
lower_bound = find(data(:,2)==max(data(lb,2))); % find lower value position
upper_bound = find(data(:,2)==min(data(ub,2))); % find greater value position
row = data(sort([lower_bound; upper_bound]),1); % look up the matching times in column#1
and result will be row = [6960;7020].
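Note that the max/min lookup works best when the column rises or falls steadily around the reference. As an alternative sketch (not the answerer's method), one can look for sign changes of data(:,2) - ref, which finds every bracketing pair of rows and lets you interpolate the crossing time. The data matrix below is made up for illustration.

```matlab
% Hypothetical data: column 1 is time, column 2 is the function
data = [6900 0.70; 6960 0.75; 7020 0.78; 7080 0.76; 7140 0.72];
ref  = 0.7636;                              % reference value to locate

s = data(:,2) - ref;                        % signed distance to the reference
k = find(s(1:end-1) .* s(2:end) <= 0);      % rows after which the sign changes
brackets = [data(k,1), data(k+1,1)];        % each row: [time before, time after]

% Linear interpolation of the crossing time inside each bracket
% (assumes s(k) and s(k+1) are not both exactly zero)
frac = -s(k) ./ (s(k+1) - s(k));
t_cross = data(k,1) + frac .* (data(k+1,1) - data(k,1));
```

Here the value is crossed twice, once between 6960 and 7020 and once between 7020 and 7080, so brackets has two rows.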

Related

Index when mean is constant

I am relatively new to MATLAB. I computed the running mean of a set of 1E6 random numbers drawn with a given mean and standard deviation. Initially the calculated mean fluctuates, and then it converges to a certain value.
I would like to know the index (e.g. the 100th position) at which the mean converges. I have no idea how to do that.
I tried using logical operators, but I have to go through 1e6 data points. Even with that I still can't find the index.
Y_c= sigma_c * randn(n_r, 1) + mu_c; %Random number creation
Y_f=sigma_f * randn(n_r, 1) + mu_f;%Random number creation
P_u=gamma*(B*B)/2.*N_gamma+q*B.*N_q + Y_c*B.*N_c; %Calculation of Ultimate load
prog_mu=cumsum(P_u)./cumsum(ones(size(P_u))); %Progressive Cumulative Mean of system response
logical(diff(prog_mu==0)); %Find index
I suspect the issue is that the mean will never truly be constant, but will rather fluctuate around the "true mean". As such, you'll most likely never encounter a situation where the two consecutive values of the cumulative mean are identical. What you should do is determine some threshold value, below which you consider fluctuations in the mean to be approximately equal to zero, and compare the difference of the cumulative mean to that value. For instance:
epsilon = 0.01;
const_ind = find(abs(diff(prog_mu))<epsilon,1,'first');
where epsilon will be the threshold value you choose. The find command will return the index at which the variation in the cumulative mean first drops below this threshold value.
EDIT: As was pointed out, this method may potentially fail if the first few random numbers are generated such that the difference between them is less than the epsilon value, but have not yet converged. I would like to suggest a different approach, then.
We calculate the cumulative means, as before, like so:
prog_mu=cumsum(P_u)./cumsum(ones(size(P_u)));
We also calculate the difference in these cumulative means, as before:
df_prog_mu = diff(prog_mu);
Now, to ensure that convergence has been achieved, we find the first index where the change in the cumulative mean is below the threshold value epsilon and all subsequent changes are also below the threshold value. To phrase this another way, we want to find the index after the last position in the array where the change in the cumulative mean is at or above the threshold:
conv_index = find(abs(df_prog_mu)>=epsilon,1,'last')+1;
In doing so, we guarantee that the value at the index, and all subsequent values, have converged below your predetermined threshold value.
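A self-contained sketch of this idea, comparing the step size of the running mean against epsilon explicitly. The sample x, its mean and standard deviation, and the value of epsilon are all assumptions chosen for illustration; the extra isempty check covers the case where no step ever reaches the threshold.

```matlab
n = 1e5;
x = 3 + 2*randn(n, 1);                    % hypothetical samples (mean 3, std 2)
prog_mu = cumsum(x) ./ (1:n)';            % running (cumulative) mean
df_prog_mu = diff(prog_mu);               % step between consecutive means
epsilon = 1e-3;                           % threshold on the step size (a choice)

% Last step that is still at or above the threshold
last_big = find(abs(df_prog_mu) >= epsilon, 1, 'last');
if isempty(last_big)
    conv_index = 1;                       % every step was already below epsilon
else
    conv_index = last_big + 1;            % first step after the last large one
end
```

Keep in mind that small steps only show that the running mean has stopped moving quickly; the result depends on the chosen epsilon and on the particular random sample.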
I wouldn't imagine that the mean would suddenly become constant at a single index. Wouldn't it asymptotically approach a constant value? I would recommend a for loop to calculate the mean (it sounds like maybe you've already done this part?) like this:
avg = [];
for k=1:length(x)
avg(k) = mean(x(1:k));
end
Then plot the consecutive mean:
plot(avg)
hold on % this will allow us to plot more data on the same figure later
If you're trying to find the point at which the consecutive mean comes within a certain range of the true mean, try this:
Tavg = 5; % or whatever your true mean is
err = 0.01; % the range you want the consecutive mean to reach before we say that it "became constant"
inRange = avg>(Tavg-err) & avg<(Tavg+err); % gives you a binary logical array telling you which values fell within the range
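q = 1000; % number of following values that must also stay in range; set this as high as you can while still getting a value for constIndex
constIndex = [];
for k=1:length(inRange)-q
if all(inRange(k:k+q)) % this value and the next q values are all within the range
constIndex = k;
break % keep the first index that satisfies the condition
end
end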
The below answer takes a similar approach but makes an unsafe assumption that the first value to fall within the range is the value where the function starts to converge. Any value could randomly fall within that range. We need to make sure that the following values also fall within that range. In the above code, you can edit "q" and "err" to optimize your result. I would recommend double checking it by plotting.
plot(constIndex, avg(constIndex), '*')

'Find' function working incorrectly, have tried floating point accuracy resolution

I have vertically concatenated files from my directory into a matrix that is about 60000 x 15 in size (verified).
d=dir('*.log');
n=length(d);
data=[];
for k=1:n
data{k}=importdata(d(k).name);
end
total=[];
for k=1:n
total=[total;data{n}];
end
I am using the following 32-iteration loop and the 'Find' function to locate row numbers where the final column is an integer corresponding to the integer iteration of the loop:
for i=1:32
v=[];
vn=[];
[v,vn]=find(abs(fix(i)-fix(total))<eps);
g=length(v)
end
I have tried to account for the floating point accuracy by using 'fix' on values of 'i' and values from matrix 'total', in addition to taking their absolute difference and checking it to be less than a tolerance of 'eps' (floating-point relative accuracy function), up to a tolerance of .99.
The 'Find' function is not working correctly. It is only working for certain integers (although it should be locating all of them (1-32)), and for the integers it does find the values are incomplete.
What is the problem here? If 'Find' is inadequate for this purpose, what is a suitable alternative?
You are getting a lot of zeros because you are looking not just at the 15th column of data but the entire data matrix so you are going to have a lot of non-integers.
Also, you're using fix on both numbers and since floating point errors can cause the number to be slightly above and below the desired integer, this will cause the ones that are below to round down an integer lower than what you'd expect. You should use round to round to the nearest integer instead.
Rather than using find to do this, I would use simple boolean logic to test the value of the last column:
for k = 1:32
% Compare column 15 to the current index
matches = abs(total(:,end) - k) < eps;
% Do stuff with these matches
g = sum(matches); % Count the matches
end
Depending on what you want to actually do with the data, you may be able to use the last column as an input to accumarray to perform an operation on each group.
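As a rough illustration of the accumarray suggestion, the sketch below counts rows and sums a column per label. The small total matrix and the assumed label range of 1 to 4 are made up; with the real data the size argument would be [32 1].

```matlab
% Hypothetical stand-in for 'total': the last column holds integer labels
total = [10 1; 20 2; 30 1; 40 3; 50 2; 60 1];
labels = round(total(:,end));                    % snap labels to nearest integer

counts = accumarray(labels, 1, [4 1]);           % number of rows per label
sums   = accumarray(labels, total(:,1), [4 1]);  % sum of column 1 per label
```

Labels that never occur (label 4 here) simply get a zero in both outputs.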
As a side note, you can replace the first chunk of code with
d = dir('*.log');
data = cellfun(@importdata, {d.name}, 'UniformOutput', false);
total = cat(1, data{:});

How to filter a column of data in matlab?

I have loaded an xlsx file into Matlab using the
data = xlsread()
Now there is a column which I would like to filter as per positive and negative values in the cells.
How would I go about this?
I am just starting, if someone can point out a good resource to learn how to program/code in matlab, I would be very grateful.
Thanks
col = data(:,3);
gtz = col(col>0);
ltz = col(col<0);
eqz = col(col==0);
gives you the greater than zero, less than zero and equal to zero values in column 3.
And searching for 'Matlab tutorial' in your favorite search engine will bring you heaps of them.

MATLAB - histograms of equal size and histogram overlap

An issue I've come across multiple times is wanting to take two similar data sets and create histograms from them where the bins are identical, so as to easily calculate things like histogram overlap.
You can define the number of bins (obviously) using
[counts, bins] = hist(data,number_of_bins)
But there's not an obvious way (as far as I can see) to make the bin size equal for several different data sets. I remember, when I initially looked, finding various people who seemed to have the same issue, but no good solutions.
The right, easy way
As pointed out by horchler, this can easily be achieved using either histc (which lets you define your bins vector), or vectorizing your histogram input into hist.
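A minimal sketch of the histc route, assuming a shared vector of bin edges spanning both samples. The names edges and overlap are illustrative, and summing the bin-wise minimum is just one common way to define histogram overlap.

```matlab
A = randn(1,2000) + 7;                    % example data set 1
B = randn(1,2000) + 9;                    % example data set 2

lo = min([A B]);                          % joint minimum of both samples
hi = max([A B]);                          % joint maximum of both samples
edges = linspace(lo, hi, 501);            % 500 bins shared by both data sets

counts_A = histc(A, edges);               % counts per shared bin for A
counts_B = histc(B, edges);               % counts per shared bin for B
overlap = sum(min(counts_A, counts_B));   % bin-wise overlap (one definition)
```

histc returns one count per edge, where the last entry counts values exactly equal to the final edge, so both count vectors have 501 elements and line up bin for bin.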
The wrong, stupid way
I'm leaving below as a reminder to others that even stupid questions can yield worthwhile answers
I've been using the following approach for a while, so figured it might be useful for others (or, someone can very quickly point out the correct way to do this!).
The general approach relies on the fact that MATLAB's hist function defines an equally spaced number of bins between the largest and smallest value in your sample. So, if you append a start (smallest) and end (largest) value to your various samples which is the min and max for all samples of interest, this forces the histogram range to be equal for all your data sets. You can then truncate the first and last values to recreate your original data.
For example, create the following data set
A = randn(1,2000)+7;
B = randn(1,2000)+9;
[counts_A, bins_A] = hist(A, 500);
[counts_B, bins_B] = hist(B, 500);
Here for my specific data sets I get the following results
bins_A(1) % 3.8127 (this is also min(A) )
bins_A(500) % 10.3081 (this is also max(A) )
bins_B(1) % 5.6310 (this is also min(B) )
bins_B(500) % 13.0254 (this is also max(B) )
To create equal bins you can simply first define a min and max value that lie slightly outside both ranges:
topval = max([max(A) max(B)])+0.05;
bottomval = min([min(A) min(B)])-0.05;
The addition/subtraction of 0.05 is based on knowledge of the range of values - you don't want your extra bin to be too far or too close to the actual range. That being said, for this example by using the joint min/max values this code will work irrespective of the A and B values generated.
Now we re-create histogram counts and bins using (note the extra 2 bins are for our new largest and smallest value)
[counts_Ae, bins_Ae] = hist([bottomval, A, topval], 502);
[counts_Be, bins_Be] = hist([bottomval, B, topval], 502);
Finally, you truncate the first and last bin and value entries to recreate your original sample exactly
bins_A = bins_Ae(2:501)
bins_B = bins_Be(2:501)
counts_A = counts_Ae(2:501)
counts_B = counts_Be(2:501)
Now
bins_A(1) % 3.7655
bins_A(500) % 13.0735
bins_B(1) % 3.7655
bins_B(500) % 13.0735
From this you can easily plot both histograms again
bar([bins_A;bins_B]', [counts_A;counts_B]')
And also plot the histogram overlap with ease
bar(bins_A,(counts_A+counts_B)-(abs(counts_A-counts_B)))

Matlab: Code Performance Issue Using "ismember"

I need this section of my code to run faster, as it is called many many times. I am new to Matlab and I feel as though there MUST be a way to do this that is not so round-about. Any help you could give on how to improve the speed of what I have or other functions to look into that would help me perform this task would be appreciated.
(The task is to copy only the lines of "alldata" whose first column is in the set "minuteintervals" into "alldataMinutes". "minuteintervals" is just the minimum value of "alldata" column one, increasing in steps of twenty up to the maximum of alldata.)
minuteintervals= min(alldata(:,1)):20:max(alldata(:,1)); %20 second intervals
alldataMinutes= zeros(30000,4);
counter=1;
for x=1:length(alldata)
if ismember(alldata(x,1), minuteintervals)
alldataMinutes(counter,:)= alldata(x,:);
counter= counter+1;
end
end
alldataMinutes(counter:length(alldataMinutes),:)= [];
This should give you what you want, and it should be substantially faster:
minuteintervals = min(alldata(:,1)):20:max(alldata(:,1)); %# Interval set
index = ismember(alldata(:,1),minuteintervals); %# Logical index showing first
%# column values in the set
alldataMinutes = alldata(index,:); %# Extract the corresponding rows
This works by passing a vector of values to the function ISMEMBER, instead of passing values one at a time. The output index is a logical vector the same size as alldata(:,1), with a value of 1 (i.e. true) for elements of alldata(:,1) that are in the set minuteintervals, and a value of 0 (i.e. false) otherwise. You can then use logical indexing to easily extract the rows corresponding to the ones in index, placing them in alldataMinutes.
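To see the logical indexing at work, here is a tiny example with a made-up alldata matrix whose first column contains some values on the 20-second grid and some off it:

```matlab
% Hypothetical data: column 1 is time in seconds, columns 2-4 are payload
alldata = [0 1 1 1; 5 2 2 2; 20 3 3 3; 33 4 4 4; 40 5 5 5];

minuteintervals = min(alldata(:,1)):20:max(alldata(:,1));  % [0 20 40]
index = ismember(alldata(:,1), minuteintervals);           % [1;0;1;0;1]
alldataMinutes = alldata(index,:);                         % rows 1, 3 and 5
```

Because ismember tests for exact equality, this relies on the time stamps matching the interval values exactly, as they do for integer seconds.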