Pick specific timepoints from a timeseries - matlab

I have a 164 x 246 matrix called M. M is data for time series containing 246 time points of 164 brain regions. I want to work on only specific blocks of the time series, not the whole thing. To do so, I created a vector called onsets containing the time onset of each block.
onsets = [7;37;82;112;145;175;190;220];
In this example, there are 8 blocks in total (though this number can vary), each block containing 9 time points. So for instance, the first block would contain time points 7, 8, 9, ..., 15; the second block would contain time points 37, 38, 39, ..., 45. I would like to extract the time points for these 8 blocks from M and concatenate the blocks. Thus, the output should be a 164 x 72 matrix (i.e., 164 regions, 8 blocks x 9 time points per block).
This seems like a very simple indexing problem but I'm struggling to do this efficiently. I've tried indexing each block in M (for instance, vertcat(M(onsets(1,1):onsets(1,1)+8,:));) and then concatenating the pieces with vertcat, but this seems very clumsy. Can anyone help?

Try this:
% create sample data
M = rand(164,246);
% create index vector
idx = false(1,size(M,2));
onsets = [7;37;82;112;145;175;190;220];
for i=1:numel(onsets)
idx(onsets(i):onsets(i)+8) = true;
end
% create output matrix
MM = M(:,idx);
You seem to have switched the dimensions somehow, i.e. you try to operate on the rows of M whilst according to your description you need to operate on the columns. Hope this helps.
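If you would rather avoid the loop, one possible alternative is to build all the column indices at once. This is only a sketch: it assumes R2016b or later (for implicit expansion) and that every block has the same length; blockLen and cols are names introduced here for illustration.

```matlab
M = rand(164, 246);                        % sample data, as above
onsets = [7;37;82;112;145;175;190;220];
blockLen = 9;                              % assumed fixed block length

% Each column of cols holds the column indices of one block (9 x 8 matrix)
cols = onsets(:).' + (0:blockLen-1).';

% cols(:) lists the blocks one after another, in the order given by onsets
MM = M(:, cols(:));
```

Unlike the logical mask, this keeps the blocks in the order they appear in onsets and would repeat columns if two blocks overlapped.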

Related

Interpreting time series dimension?

I am wondering if anyone can explain the interpretation of the size (number of feature) in a time series? For example consider a simple script in Matlab
X= randn(2,5,2)
X(:,:,1) =
-0.5530 0.4291 0.3937 -1.2534 0.2811
-1.4926 -0.7019 -0.8305 -1.4034 1.9545
X(:,:,2) =
0.2004 0.1438 2.3655 -0.1589 0.7140
0.4905 0.2301 -0.7813 -0.6737 0.2552
Assume X is a time series with the following output
This generates 2 vectors of length 5, each with 2 rows. Can anyone tell me exactly what the first 2 and the 5 mean?
Some websites describe this as creating 5 vectors of length 5 and size 2. What does size mean here?
Is 2 the number of features and 5 the number of time series? The reason for this confusion is that I do not understand how to interpret the following sentence:
"Generate 2 vector-valued sequences of length 5; each vector has size 2."
What do size 2 and length 5 mean here?
This depends entirely on your data and how you want to store it. If you have some 2D data over time, I find it convenient to use a data matrix whose 1st and 2nd dimensions hold the 2D data for each time step and whose 3rd dimension is time.
Say I have a movie of 1920 by 1080 pixels with 100 frames, I'd store this as mov = rand(1080,1920,100) (1080 and 1920 swapped because of row, col order of indexing). So now mov(:,:,1) would give me the first frame etc.
BTW, your X is a normal array, not to be confused with the timeseries object.
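To make the movie layout concrete, here is a small sketch using much smaller, hypothetical sizes than a real 1080 x 1920 movie (nRows, nCols, nFrames, frame1, and pixelSeries are illustrative names):

```matlab
nRows = 4; nCols = 6; nFrames = 10;    % small hypothetical sizes
mov = rand(nRows, nCols, nFrames);     % rows x columns x time

frame1 = mov(:, :, 1);                 % the first frame, a 4 x 6 matrix
pixelSeries = squeeze(mov(2, 3, :));   % one pixel followed over time, 10 x 1
```

Which dimension you call "feature", "time", or "sequence" is a convention you choose; the array itself only knows its sizes.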

Access different rows from multiple pages in 3D array

How can I access different rows from multiple pages in a 3D array while avoiding a for-loop?
Let's assume I have a 10x5x3 matrix (mat1) and I would like to copy different individual rows from the three pages (such as the 4th, 2nd, and 5th row of the 1st, 2nd, and 3rd page) into the first row of another 10x5x3 matrix (mat2).
My solution uses a for-loop. What about vectorization?
mat1 = randi(100, 10, 5, 3)
mat2 = nan(size(mat1))
rows_to_copy = [4, 2, 5]
for i = 1 : 3
mat2(1, :, i) = mat1(rows_to_copy(i), :, i)
end
Any vectorized solution is likely not going to be as simple as your for loop solution, and might actually be less efficient (edit: see timing tests below). However, if you're curious, vectorizing an indexing operation like this generally involves converting your desired indices from subscripts to linear indices. Normally you can do this using sub2ind, but since you're selecting entire rows it may be more efficient to calculate the index yourself.
Here's a solution that takes advantage of implicit expansion in newer versions of MATLAB (R2016b and later):
[R, C, D] = size(mat1);
index = rows_to_copy+R.*(0:(C-1)).'+R*C.*(0:(D-1));
mat2(1, :, :) = reshape(mat1(index), 1, C, D);
Note that if you don't really need all the extra space full of NaN values in mat2, you can make your result more compact by just concatenating all the rows into a 2-D matrix instead:
>> mat2 = mat1(index).'
mat2 =
95 41 2 19 44
38 31 93 27 27
49 10 72 91 49
And if you're still using an older version of MATLAB without implicit expansion, you can use bsxfun instead:
index = bsxfun(@plus, rows_to_copy+R*C.*(0:(D-1)), R.*(0:(C-1)).');
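For comparison, the same linear indices can probably also be obtained with sub2ind, as mentioned earlier. This is a sketch rather than a tuned implementation; rr, cc, dd, and rows are names introduced here:

```matlab
mat1 = randi(100, 10, 5, 3);
rows_to_copy = [4, 2, 5];
[R, C, D] = size(mat1);

% Column and page subscripts for every element to copy (C x D grids)
[cc, dd] = ndgrid(1:C, 1:D);
% Row subscript: the chosen row of each page, repeated for every column
rr = repmat(rows_to_copy, C, 1);

index = sub2ind([R C D], rr, cc, dd);   % C x D linear indices
rows = mat1(index).';                   % D x C, one copied row per page
```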
Timing
I ran some tests using timeit (R2018a, Windows 7 64-bit) to see how the loop and indexing solutions compared. I tested 3 different scenarios: increasing row size, increasing column size, and increasing page size (third dimension) for mat1. The rows_to_copy was randomly selected and always had the same number of elements as the page size of mat1. Here are the results, showing the ratio of the loop time versus the indexing time:
Aside from some transient noise, there are some clear patterns. Increasing either the number of rows or columns (blue or red lines) doesn't appreciably change the time ratio, which hovers in the range of 0.7 to 0.9, meaning the for loop is slightly faster on average. Increasing the number of pages (yellow line) means the for loop has to iterate more times, and the indexing solution quickly starts to win out, reaching an 8 times speedup when the page size exceeds about 150.

Unreasonable [positive] log-likelihood values from matlab "fitgmdist" function

I want to fit a data set with a Gaussian mixture model. The data set contains about 120k samples, and each sample has about 130 dimensions. To do this in MATLAB, I run the following script (with 1000 clusters):
gm = fitgmdist(data, 1000, 'Options', statset('Display', 'iter'), 'RegularizationValue', 0.01);
I get the following outputs:
iter log-likelihood
1 -6.66298e+07
2 -1.87763e+07
3 -5.00384e+06
4 -1.11863e+06
5 299767
6 985834
7 1.39525e+06
8 1.70956e+06
9 1.94637e+06
The log likelihood is bigger than 0! I think it's unreasonable, and don't know why.
Could somebody help me?
First of all, it is not a problem of how large your dataset is.
Here is some code that produces similar results with a quite small dataset:
options = statset('Display', 'iter');
x = ones(5,2) + (rand(5,2)-0.5)/1000;
fitgmdist(x,1,'Options',options);
this produces
iter log-likelihood
1 64.4731
2 73.4987
3 73.4987
Of course you know that the log function (the natural logarithm) has a range from -inf to +inf. I guess your problem is that you think the input to the log (i.e. the likelihood) should be bounded by [0,1]. Well, the likelihood is built from pdf values rather than probabilities, which means that it can be very large for a very dense dataset.
PDFs must be positive (which is why we can use the log on them) and must integrate to 1. But they are not bounded by [0,1].
You can verify this by reducing the density in the above code
x = ones(5,2) + (rand(5,2)-0.5)/1;
fitgmdist(x,1,'Options',options);
this produces
iter log-likelihood
1 -8.99083
2 -3.06465
3 -3.06465
So, I would rather assume that your dataset contains several duplicate (or very close) values.
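A quick way to convince yourself is to evaluate a one-dimensional Gaussian density by hand. This is an illustrative sketch with a hypothetical standard deviation, not what fitgmdist computes internally:

```matlab
sigma = 0.01;                       % hypothetical, very small spread
peak = 1 / (sigma * sqrt(2*pi));    % Gaussian pdf at its mean, about 39.9
logPeak = log(peak);                % positive, about 3.69

nSamples = 5;
logLik = nSamples * logPeak;        % log-likelihood if all 5 samples sat at the mean
```

The density exceeds 1 as soon as sigma drops below 1/sqrt(2*pi), roughly 0.4, so tightly clustered data easily gives a positive log-likelihood.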

Graphing different sets of data on same graph within a ‘for’ loop MATLAB

I have a problem with graphing different plots on the same graph within a ‘for’ loop. I hope someone can point me in the right direction.
I have a 2-D array, with discrete chunks of data in and amongst zeros. My data is the following:
A=
0 0
0 0
0 0
3 9
4 10
5 11
6 12
0 0
0 0
0 0
0 0
7 9.7
8 9.8
9 9.9
0 0
0 0
A chunk of data is defined as contiguous set of data, without interruptions of a [0 0] row. So in this example, the 1st chunk of data would be
3 9
4 10
5 11
6 12
And 2nd chunk is
7 9.7
8 9.8
9 9.9
The first column is x and the second column is y. I would like to plot y as a function of x (x is the horizontal axis, y is the vertical axis). I want to plot these data sets on the same graph as a scatter graph, and put a line of best fit through the points, whenever I come across a chunk of data. In this case, I will have 2 sets of points and 2 lines of best fit (because I have 2 chunks of data). I would also like to calculate the R-squared value.
The code that I have so far is shown below:
fh1 = figure;
hold all;
ah1 = gca;
% plot graphs:
for d = 1:max_number_zeros+num_rows
if sequence_holder(d,1)==0
continue;
end
c = d;
while sequence_holder(c,1)~=0
plot(ah1,sequence_holder(c,1),sequence_holder(c,num_cols),'*');
%lsline;
c =c+1;
continue;
end
end
Sequence holder is the array with the data in it. I can only plot the first set of data, with no line of best fit. I tried lsline, but that didn't work.
Can anyone tell me how to
-plot both sets of graphs
-how to draw a line of best fit and get the regression coefficient?
The first part could be done in a number of ways. I would test the second column for zeroness
zerodata = A(:,2) == 0;
which will give you a logical array of ones and zeros like [1 1 1 0 1 0 0 ...]. Then you can use this to split up your input. You could look at the diff of that array and test it for positive or negative sign. Your data starts on 0 so you won't get a transition for that one, so you'd need to think of some way to deal with that or the opposite case, unless you know for certain that it will always be one way or the other. You could just test the first element, or you could insert a known value at the start of your input array.
You will then have to store your chunks. As they may be of variable length and variable number you wouldn't put them into a big matrix, but you still want to be able to use a loop. I would use either a cell array, where each cell in a row contains the x or y data for a chunk, or a struct array where say structarray(1).x and structarray(1).y hold your data values.
Then you can iterate through your struct array and call plot on each chunk separately.
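The steps above could look roughly like this. It is a sketch that pads the mask with zeros on both ends so that a chunk touching the first or last row is still detected; isData, edges, starts, stops, and chunks are names introduced here:

```matlab
A = [0 0; 0 0; 0 0; 3 9; 4 10; 5 11; 6 12; 0 0; 0 0; 0 0; 0 0; ...
     7 9.7; 8 9.8; 9 9.9; 0 0; 0 0];

isData = A(:, 2) ~= 0;                   % true on data rows
edges = diff([0; double(isData); 0]);    % +1 where a chunk starts, -1 just after it ends
starts = find(edges == 1);
stops = find(edges == -1) - 1;

% Store each chunk in a struct array, since chunk lengths may differ
chunks = struct('x', {}, 'y', {});
for k = 1:numel(starts)
    chunks(k).x = A(starts(k):stops(k), 1);
    chunks(k).y = A(starts(k):stops(k), 2);
end

% Plot every chunk on the same axes
figure; hold on;
for k = 1:numel(chunks)
    plot(chunks(k).x, chunks(k).y, '*');
end
hold off;
```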
As for fitting you can use the fit command. It's complex and has lots of options so you should check out the help first (type doc fit inside the console to get the inline help, which is the same as the website help in content). The short version is that you can do a simple linear fit like this
[fitobject, gof] = fit(x, y, 'poly1');
where 'poly1' specifies you want a first order polynomial (i.e. straight line) and the output arguments give you a fit object, which you can do various things with like plot or interpolate, and the second gives you a struct containing among other things the r^2 and adjusted r^2. The fitobject also contains your fit coefficients.
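If the Curve Fitting Toolbox is not available, a straight-line fit and its R-squared can also be computed with polyfit and polyval from base MATLAB. This swaps in a different function from fit, and the data below are made up purely for illustration:

```matlab
x = (1:5).';                        % hypothetical x values
y = [2.1; 3.9; 6.2; 7.8; 10.1];     % hypothetical, roughly linear y values

p = polyfit(x, y, 1);               % p(1) is the slope, p(2) the intercept
yfit = polyval(p, x);               % fitted values on the line

ssRes = sum((y - yfit).^2);         % residual sum of squares
ssTot = sum((y - mean(y)).^2);      % total sum of squares
r2 = 1 - ssRes / ssTot;             % coefficient of determination

figure; hold on;
plot(x, y, '*');                    % scatter of the chunk
plot(x, yfit, '-');                 % line of best fit
hold off;
```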

How do I cut my EMG signal and get an average signal?

I have an EMG signal of a subject walking on a treadmill.
We used footswitches to be able to see when the subject is placing his foot, so we can see how many periods (steps) there are in time.
We would like to cut the signal in periods (steps) and get one average signal (100% step cycle).
I tried the reshape function but it does not work
when I count 38 steps:
nwaves = 38;
sig2 = reshape(sig,[numel(sig)/nwaves nwaves])';
avgSig = mean(sig2,1);
plot(avgSig);
the error displayed is this: Size arguments must be real integers.
Can anyone help me with this? Thanks!
First of all, reshaping the array is a bad approach to the problem. In the real world one cannot assume that the person on the treadmill will step rhythmically with millisecond precision (i.e. with the same number of samples in every step).
A more realistic approach is to use the footswitch signal. Assuming it really is a switch on a single foot (1=foot on, 0=foot off) and its output is filtered to avoid noise (with a Schmitt trigger, for example), you can get the sample indices at which the foot leaves the treadmill with:
foot_off = find(diff(footswitch) < 0);
then you can transform your signal in a cell array (variable lengths) of vectors of data between consecutive steps:
step_len = diff([0; foot_off(:); numel(footswitch)]);
sig2 = mat2cell(sig(:), step_len, 1);
The problem now is you can't apply mean() to the signal slices in order to get an "average step": you must process each step first, then average the results.
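One common way to "process each step first" is to resample every step onto the same 0 to 100% axis and only then average. The sketch below assumes that linear interpolation onto 101 points is acceptable for your analysis, and it uses synthetic half-sine steps in place of real EMG; lens, nPts, cycle, and normSteps are names introduced here:

```matlab
lens = [100 110 95];                 % hypothetical step lengths in samples
% Synthetic stand-in for sig2: one column vector per step
sig2 = arrayfun(@(n) sin(linspace(0, pi, n)).', lens, 'UniformOutput', false);

nPts = 101;                          % 0%, 1%, ..., 100% of the step cycle
cycle = linspace(0, 1, nPts);
normSteps = zeros(numel(sig2), nPts);
for k = 1:numel(sig2)
    s = sig2{k}(:).';                        % current step as a row vector
    t = linspace(0, 1, numel(s));            % its own normalized time axis
    normSteps(k, :) = interp1(t, s, cycle);  % resample to the common axis
end

avgSig = mean(normSteps, 1);         % average over steps, 1 x 101
```

With real EMG you would probably rectify and smooth each step before resampling, and drop the partial segments before the first and after the last foot-off event.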
It's probably because numel(sig)/nwaves isn't an integer. You need to round it to the nearest integer with round(numel(sig)/nwaves).
EDIT based on comments:
Your problem is that 51116 is not divisible by 38 (51116/38 is about 1345.2), so you can't reshape your signal into 38 chunks of equal length. You need a signal whose length is exactly a multiple of 38 for that to work. Either that, or remove the last (or first) 6 values from your signal to get an exact multiple of 38 (1345 * 38 = 51110):
nwaves = 38;
% samples per chunk, rounded down so the trimmed length never exceeds numel(sig)
n_chunks = floor(numel(sig)/nwaves);
max_sig_length = n_chunks * nwaves;
sig2 = reshape(sig(1:max_sig_length),[n_chunks nwaves])';
avgSig = mean(sig2,1);
plot(avgSig);