Why is more data being added to my array than there should be? - matlab

I am writing some code for data processing. The requirement is to sort the data, which comes in lists called u_B (5000 values of speed data) and P_B (5000 corresponding power values), into "bins" by speed, so that the mean speed and power within each bin can be calculated. The code below is just trying to build the "bin" for the speed range 24-25 m/s.

What I expect to happen is that the code cycles through the u_B list, checks whether each speed is within the required range, and if it is, puts it in the "bin" along with the corresponding power value. I have altered it to output the speeds it considers to be in the right range, and they all seem to be as I expect. However, when the bin is displayed right at the end, it contains not only the data within the right range but also a whole load of other data that does not fit within the speed range. I cannot work out why this other data is being added to the bin. If anyone can spot what I am missing I would be grateful.
i = 25;
inc = 1;
for n = 1:5000
    if (u_B(n) >= (i-1)) && (u_B(n) < (i+1))
        disp(u_B(n))
        bin(inc,1) = u_B(n);
        disp(bin(inc,1))
        bin(inc,2) = P_B(n);
        inc = inc + 1
    end
end
disp(bin)
This shows the first set of outputs from within the if-statement: the 24.7s are the speed u_B(n) and the value that has been put into the bin, and they are the same, as expected; the 0 for power and the 2 for inc are both fine. The list goes on from here and only contains speed values in the right range.
[screenshot of code and output]
This shows the output of what is in the bin: the first 10 values are the ones I want to be in there, and all the rest have lower speeds and therefore shouldn't be in the bin.
[screenshot of code and output]
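For comparison, the same binning can be written without the loop using logical indexing (a sketch only, assuming u_B and P_B are column vectors). Note also that a common cause of stale entries appearing in a growing array like this is a leftover `bin` variable from an earlier run of the script; clearing it first rules that out:

```matlab
clear bin                           % remove any leftover bin from a previous run
i = 25;
mask = (u_B >= i-1) & (u_B < i+1);  % logical index of speeds in range
bin = [u_B(mask), P_B(mask)];       % speeds in column 1, matching powers in column 2
```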

Separate y-values depending if the x-value is increasing or decreasing

I am trying to analyze my data using a mixture of Python and MATLAB, but I am stuck and could not find any discussion that solves my problem.
My data consists of temperature and current measurements which are recorded at the same time but using two different devices. Afterwards these measurements are matched together using the time stamp of each measurement to get the "raw data plot". However, the current values differ at the same temperature depending on whether the sample was heated up or cooled down.
Now, I would like to separate the heating and cooling values of the current measurements, and calculate the mean and standard deviation for all currents at one temperature for cooling and heating, respectively.
What I have done so far is look for all the values belonging to the same temperature, regardless of whether it is a cooling or heating cycle. That results in quite large standard deviation values.
The two figures show a simple example of what my data looks like.
The first figure plots the temperature values against the number of data points and marks all values that belong to this temperature:
The second figure shows the current data with the marked values that correspond to the temperature.
The temperature is always kept constant for 180 s and then increased or decreased by 10°C. During the 180 s, several current measurements take place, which results in several data points per temperature per cycle. The cycle repeats several times (not shown here). To simplify the example, I just used simple numbers instead of real temperature and current values. The repetition of the same number indicates several measurements at one temperature. In reality the current values are not completely stable but fluctuate around a certain value (which I have also ignored here).
The code which does that looks like this:
Sample data:
Test_T = [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5,4,4,4,4,4,3,3,3,3,3,2,2,2,2,2,1,1,1,1,1] ;
Test_I = [5,5,5,5,5,6,6,6,6,6,7,7,7,7,7,8,8,8,8,8,9,9,9,9,9,7,7,7,7,7,6,6,6,6,6,5,5,5,5,5,4,4,4,4,4] ;
Code:
Test_T_sel = Test_T;
Test_I_sel = Test_I;
ll = 0;
ul = 2;
ID = Test_T_sel < ul & Test_T_sel > ll;
Test_x_avg = mean(Test_T_sel(ID));
Test_y_avg = mean(Test_I_sel(ID));
figure('Position', [100 100 700 500]);
plot(Test_T_sel);
hold on;
plot(find(ID), Test_T_sel(ID), '*r');
ylabel('Temperature [°C]')
figure('Position', [900 100 700 500]);
plot(Test_I_sel);
hold on;
plot(find(ID), Test_I_sel(ID), '.r');
ylabel('Current [µA]')
And Test_T contains 45 values, stepping from 1 up to 5 and back down, five values per step, while Test_I contains the current values. As you can see, for Temperature = 1°C the current value is either 5 or 4. Now I would like to get a vector that contains only the values of T and the corresponding current values where T increases, and a second vector for the current values where T decreases.
I thought of using an if-else command, but I do not know how to implement it. Maybe something like this could work:
if T2 == T1 and T2-T1 <= 0.2 "take corresponding I values" (this is true when the temperature is stable and only varies by 0.2°C)
if T2-T1 > 0.2 "ignore I values until" T2 == T1 again and T2-T1 <= 0.2 (this would either be a stronger variation at one temperature or indicate a temperature change and waits until T is constant again)
But now, I still need to distinguish if the temperature is generally increasing or decreasing after 5 measurements.
if T2 > T1, T is increasing (Test_T_heat) and the corresponding I values should be written in a vector Test_I_heat
if T2 < T1, T is decreasing (Test_T_cool) and the corresponding I values should be written in a vector Test_I_cool
For the example given above this should look like this at the end:
Test_T_heat: [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5];
Test_I_heat: [5,5,5,5,5,6,6,6,6,6,7,7,7,7,7,8,8,8,8,8,9,9,9,9,9];
Test_T_cool: [4,4,4,4,4,3,3,3,3,3,2,2,2,2,2,1,1,1,1,1] ;
Test_I_cool: [7,7,7,7,7,6,6,6,6,6,5,5,5,5,5,4,4,4,4,4] ;
How does the code have to be changed so that I get such vectors?
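For the single heat-cool cycle in the sample data above, one possible sketch is to cut the vectors at the last sample of the peak temperature plateau (this assumes the temperature rises to a single peak and then falls; repeated cycles would need the same split applied per cycle):

```matlab
Test_T = [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5,4,4,4,4,4,3,3,3,3,3,2,2,2,2,2,1,1,1,1,1];
Test_I = [5,5,5,5,5,6,6,6,6,6,7,7,7,7,7,8,8,8,8,8,9,9,9,9,9,7,7,7,7,7,6,6,6,6,6,5,5,5,5,5,4,4,4,4,4];

pk = find(Test_T == max(Test_T), 1, 'last');  % last sample of the peak plateau
Test_T_heat = Test_T(1:pk);       % heating: everything up to and including the peak
Test_I_heat = Test_I(1:pk);
Test_T_cool = Test_T(pk+1:end);   % cooling: everything after the peak
Test_I_cool = Test_I(pk+1:end);
```

For this example the four resulting vectors match the Test_T_heat/Test_I_heat and Test_T_cool/Test_I_cool vectors listed above exactly.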

Why does the number of peaks of my signal stay the same when I increase n in an n-point moving average filter when the data is big?

I am using MATLAB to find the number of peaks of a signal.
I'm trying to plot the number of peaks of a signal filtered with an N-point moving average filter, with N going from 2 to 30. (I also include the number of peaks with no filter applied at the beginning of the resulting array.) My data array (imported from CSV, with double values between 0 and 1) has around 50k points. When I give it part of the data, i.e. 100, 500 or 1000 points using array slicing, the number of peaks decreases as expected. However, when I give it the whole data, or even 2000 points, the number of peaks stays the same at 127.
I changed the number of data points given to the filter to find out why this happens, editing the commented lines as shown in the comments. When fewer than 1000 data points were given, the plot was fine.
Here is the signal
https://www.dropbox.com/s/e1bkcjn5ta5q610/exampleSignal.csv?dl=0
Please import it from the 4th element to the end; it has some strange data at the beginning, which I have not used. VarName1 is the imported column vector's name.
numberOfPeaks = zeros(30,1,'int8');
pks = findpeaks(VarName1); % VarName1(1:1000,:) (when no filter applied)
numberOfPeaks(1) = size(pks,1);
for i = 2:30
    h = 1/i*ones(1,i,'double');
    y = filter(h,1,VarName1); % VarName1(1:1000,:)
    numberOfPeaks(i) = size(findpeaks(y),1);
end
plot(1:30,numberOfPeaks);
I expect a plot like this when the whole data is given:
but I get:
I realised that the problem is the int8 I used. It can only hold values up to 127, which caused all of my larger results to come out as 127.
Changing it to double solves the problem.
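The fix is just the preallocation line. MATLAB integer types saturate at their maximum rather than wrapping around, so every peak count above 127 was clipped to exactly 127:

```matlab
% int8 saturates: int8(500) yields 127, so counts above 127 are all clipped to 127
numberOfPeaks = zeros(30,1);  % default class is double, which holds the true counts
```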

Matlab: Find the result to an accuracy of a certain decimal place with minimum iterations

I'm using a numerical integration method to approximate an integral. I need to use a minimum number of iterations that give an answer correct to 5 decimal places.
I cannot seem to find a generalised way of doing this. I've only been able to get it working for some cases.
Here is what I've tried:
% num2str(x,7) to truncate the value of x
xStr = num2str(x(n),7);
x5dp(n) = str2double(xStr); % convert back to a truncated number
% find the difference between the values
if n > 1 % cannot index n-1 = 0
    check = x5dp(n) - x5dp(n-1);
end
This finds the first instance at which the first 5 decimal places are the same, but it doesn't take into account that changes might occur beyond that point, which has happened: the iteration I am looking for was about 450, but it stopped at 178 because of this.
Then I tried this
while err > errLim & n < 1000
    ...
    r = fix(x(j)*1e6)/1e1;   % move the 6th decimal place to the 1st place
    x6dp = rem(r,1)*10;      % 'isolate' the value of the 6th decimal place
    err = abs(x(j)-x(j-1));  % calculate the difference between the two
    % if the 6th decimal place is greater than 5 and err < 1e-6,
    % then the 6th decimal place won't change the value of the 5th any more
    if err < errLim && x6dp < 5
        err = 1;
    end
    ...
end
This works for one method and function I tested it on. However, when I pasted it into another method for another function, the iteration ends before the final result is achieved. The final 4 results are:
4.39045203734423 4.39045305948901 4.39045406364900 4.39045505024365
However, the answer I need is actually 4.39053; this has stopped the iteration about 300 steps too early.
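One way to make the stopping test less fragile (a sketch only, and no guarantee: a slowly converging sequence can defeat any successive-difference test) is to round with round rather than via string conversion, and require the rounded value to stay unchanged for several consecutive iterations instead of just one. Here nextIterate is a hypothetical stand-in for whatever produces the next approximation, and the agreement count of 10 is a tuning assumption:

```matlab
% nextIterate(x) is a hypothetical placeholder for one step of the integration method
needed = 10;            % consecutive agreements required before stopping (assumption)
agree  = 0;
x5 = round(x*1e5)/1e5;  % current iterate rounded to 5 decimal places
while agree < needed
    x = nextIterate(x);
    x5new = round(x*1e5)/1e5;
    if x5new == x5
        agree = agree + 1;  % rounded value unchanged this step
    else
        agree = 0;          % it moved; start counting again
        x5 = x5new;
    end
end
```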

MATLAB spending an incredible amount of time writing a relatively small matrix

I have a small MATLAB script (included below) for handling data read from a CSV file with two columns and hundreds of thousands of rows. Each entry is a natural number, with zeros only occurring in the second column. This code is taking a truly incredible amount of time (hours) to run what should be achievable in at most a few seconds. The profiler identifies that approximately 100% of the run time is spent writing a matrix of zeros, whose size varies depending on input, but in all usage is smaller than 1000x1000.
The code is as follows
function [data] = DataHandler(D)
n = size(D,1);
s = max(D,1);
data = zeros(s,s);
for i = 1:n
    data(D(i,1),D(i,2)+1) = data(D(i,1),D(i,2)+1) + 1;
end
It's the data = zeros(s,s); line that takes around 100% of the runtime. I can make the code run quickly by just changing out the s's in this line for 1000, which is a sufficient upper bound to ensure it won't run into errors for any of the data I'm looking at.
Obviously there're better ways to do this, but being that I just bashed the code together to quickly format some data I wasn't too concerned. As I said, I fixed it by just replacing s with 1000 for my purposes, but I'm perplexed as to why writing that matrix would bog MATLAB down for several hours. New code runs instantaneously.
I'd be very interested if anyone has seen this kind of behaviour before, or knows why this would be happening. It's a little disconcerting, and it would be good to be able to be confident that I can initialize matrices freely without killing MATLAB.
Your call to zeros is incorrect. Looking at your code, D appears to be an N x 2 array, but your call s = max(D,1) actually generates another N x 2 array rather than a scalar. Consulting the documentation for max, this is what happens when you call max the way you used it:
C = max(A,B) returns an array the same size as A and B with the largest elements taken from A or B. Either the dimensions of A and B are the same, or one can be a scalar.
Therefore, because you used max(D,1), you are comparing every value in D with the scalar 1, so what you actually get back is essentially a copy of D. Using this as input to zeros has rather undefined behaviour. What appears to happen is that for each row of s, a temporary zeros matrix of that size is allocated and then thrown away; only the dimensions given by the last row of s are kept. Because D is a very large matrix, this is probably why the profiler hangs on this line at 100% utilization. Each size parameter to zeros must be a scalar, yet your call produces a matrix.
What I believe you intended should have been:
s = max(D(:));
This finds the overall maximum of the matrix D by unrolling D into a single vector and finding the overall maximum. If you do this, your code should run faster.
As a side note, this post may interest you:
Faster way to initialize arrays via empty matrix multiplication? (Matlab)
It was shown in this post that doing zeros(n,n) is in fact slow and there are several neat tricks to initializing an array of zeros. One way is to accomplish this by empty matrix multiplication:
data = zeros(n,0)*zeros(0,n);
One of my personal favourites is that if you assume that data was not declared / initialized, you can do:
data(n,n) = 0;
If I may also comment, that for loop is quite inefficient. What you are doing is computing a 2D histogram / accumulation of your data. You can replace the for loop with a more efficient accumarray call, which also avoids allocating an array of zeros; accumarray does that under the hood for you.
As such, your code would basically become this:
function [data] = DataHandler(D)
data = accumarray([D(:,1) D(:,2)+1], 1);
accumarray in this case takes all pairs of row and column coordinates, stored in D(i,1) and D(i,2)+1 for i = 1, 2, ..., size(D,1), groups together all pairs that share the same row and column coordinate, and adds up the occurrences in each group. The output at each 2D bin therefore gives the total tally of entries in D that mapped to that row and column coordinate.
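A small worked example of the accumarray call above (the sample D here is made up purely for illustration):

```matlab
D = [1 0; 1 0; 2 3];                      % two (1,0) entries, one (2,3) entry
data = accumarray([D(:,1) D(:,2)+1], 1);  % counts per (row, column+1) bin
% data is 2x4: data(1,1) = 2, data(2,4) = 1, everything else 0
```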

recording 'bursts' of samples at 300 samples per sec

I am recording voltage changes over a small circuit; this records mouse feeding. When the mouse is eating, the circuit voltage changes. I convert that into ones and zeroes; all is well.
BUT I want to calculate the number and duration of 'bursts' of feeding, that is, instances of circuit closing that occur within 250 ms (75 samples) of one another. If the gap between closings is larger than 250 ms, I want to count it as a new 'burst'.
I guess I am looking for help in asking MATLAB to compare the sample number of each 1 in the digital file with the sample number of the next 1 down. If the difference is more than 75, call the first 1 the end of one bout and the second 1 the start of another bout, classifying the difference as a gap; if it is NOT, keep the sample number of the first 1 and compare it against the next, and the next, and the next, until there is a 75-sample difference.
I can compare each 1 to the next 1 down:
n = 1; m = 2;
for i = 1:length(bouts4)-1
    if bouts4(i+1) - bouts4(i) >= 75 % 250 ms gap at a sample rate of 300
        boutend4(n) = bouts4(i);
        boutstart4(m) = bouts4(i+1);
        m = m + 1;
        n = n + 1;
    end
end
I don't really want to iterate through i for both variables though...
any ideas??
-DB
You can try the following code
time_diff = diff(bouts4);
new_feeding = time_diff > 75;
boutend4 = bouts4(new_feeding);
boutstart4 = [0; bouts4(find(new_feeding) + 1)];
That's actually not too bad. We can actually make this completely vectorized. First, let's start with two signals:
A version of your voltages untouched
A version of your voltages that is shifted in time by 1 step (i.e. it starts at time index = 2).
Now the basic algorithm is really:
Go through each element and see if the difference is above a threshold (in your case 75).
Enumerate the locations of each one in separate arrays
Now onto the code!
%// Make those signals
bout4a = bouts4(1:end-1);
bout4b = bouts4(2:end);
%// Ensure column vectors - you'll see why soon
bout4a = bout4a(:);
bout4b = bout4b(:);
%// Step #1
loc = find(bout4b - bout4a >= 75);
%// Step #2
boutend4 = [bouts4(loc); 0];
boutstart4 = [0; bouts4(loc + 1)];
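A quick check of the vectorized steps with some made-up sample numbers (values chosen purely for illustration):

```matlab
bouts4 = [10; 20; 120; 130; 300];   % sample indices where the circuit closed
bout4a = bouts4(1:end-1);
bout4b = bouts4(2:end);
loc = find(bout4b - bout4a >= 75);  % 75+ sample gaps occur after entries 2 and 4
boutend4   = [bouts4(loc); 0];      % [20; 130; 0]
boutstart4 = [0; bouts4(loc + 1)];  % [0; 120; 300]
```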
Aside:
Thanks to tail.b.lo, you can also use diff. It basically performs that difference operation with the copying of those vectors like I did before. diff basically works the same way. However, I decided not to use it so you can see how exactly your code that you wrote translates over in a vectorized way. Only way to learn, right?
Back to it!
Let's step through this slowly. The first two lines of code make those signals I was talking about. An original one (up to length(bouts) - 1) and another one that is the same length but shifted over by one time index. Next, we use find to find those time slots where the time index was >= 75. After, we use these locations to access the bouts array. The ending array accesses the original array while the starting array accesses the same locations but moved over by one time index.
The reason why we need to make these two signals column vector is the way I am appending information to the starting vector. I am not sure whether your data comes in rows or columns, so to make this completely independent of orientation, I'm going to make sure that your data is in columns. This is because if I try to append a 0, if I do it to a row vector I have to use a space to denote that I'm going to the next column. If I do it for a column vector, I have to use a semi-colon to go to the next row. To completely avoid checking to see whether it's a row or column vector, I'm going to make sure that it's a column vector no matter what.
By looking at your code m=2. This means that when you start writing into this array, the first location is 0. As such, I've artificially placed a 0 at the beginning of this array and followed that up with the rest of the values.
Hope this helps!