How to build multiple regression for structures within array of structures in Matlab - matlab

I am trying to use the regress function:
b = regress(y,X);
However, I am having trouble getting it to work with structures. I think I need to fit two structures (independent variables) into X for it to work. Is there a way to do it? Perhaps I'm on the wrong track?
Here is what my structs look like:
s(1).s1 = -0.169
s(2).s1 = 0.125
s(3).s1 = -0.188
s(4).s1 = 0.188
s(5).s1 = 0.012
s(1).s2 = 0.572
s(2).s2 = 0.300
s(3).s2 = 0.018
s(4).s2 = 0.147
s(5).s2 = 1.080
s(1).s3 = 0.076
s(2).s3 = -0.490
s(3).s3 = -0.144
s(4).s3 = -0.134
s(5).s3 = -0.183
s1 and s2 are my independent variables and s3 is the dependent variable.

The reason why you have your values as fields in a structure array is beyond my understanding.... but working with this, extract out the fields and place them into a matrix (for the independent variables) and a vector (for the dependent variable).
Extract out each field for each structure into a comma-separated list, then use regress:
X = [[s.s1].' [s.s2].'];
y = [s.s3].';
b = regress(y, X);
This is assuming that the first column consists of s1 and the second column consists of s2 for the "independent" matrix. Also, s3 is the dependent variable. Simply put, the X matrix will consist of two columns. The first column is all of the s1 values extracted from the array of structures and the second column is all of the s2 values extracted. The dependent vector is made up of all of the s3 values. This syntax [s.s1] (or [s.s2] and [s.s3]) may seem a bit peculiar but it is common-place in MATLAB. Doing s.s1 for example produces a comma-separated list which takes each field from the array of structures and represents them like so:
s(1).s1, s(2).s1, s(3).s1, s(4).s1, s(5).s1
Wrapping this with [] essentially creates an array, but this creates a row vector. We need to make this a column vector, which is why the transpose (.') operator is required. For regress each column is a variable while each row is a sample for the X matrix. We repeat this for the s2 field, and the dependent vector for s3.
After running this code, I get:
>> format long g;
>> b
b =
-0.687194475280996
-0.21086419010155
format long g; is used to show more digits of precision for the answer.
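One point worth checking: regress does not add a constant term on its own, so the fit above is forced through the origin. The following is a minimal sketch of how an intercept could be included, assuming that is what you want. The backslash operator is used as a stand-in least-squares solver so the snippet runs without the Statistics Toolbox, and the names X0, X1, b0 and b1 are only illustrative.

```matlab
% Rebuild the struct array from the question
s1 = [-0.169 0.125 -0.188 0.188 0.012];
s2 = [0.572 0.300 0.018 0.147 1.080];
s3 = [0.076 -0.490 -0.144 -0.134 -0.183];
s = struct('s1', num2cell(s1), 's2', num2cell(s2), 's3', num2cell(s3));

y  = [s.s3].';                   % dependent variable as a column vector
X0 = [[s.s1].' [s.s2].'];        % no intercept, as in the answer above
X1 = [ones(numel(s), 1) X0];     % leading column of ones adds a constant term

b0 = X0 \ y;   % least-squares fit without intercept
b1 = X1 \ y;   % b1(1) is the intercept, b1(2:3) are the slopes
```

With the Statistics Toolbox available, regress(y, X1) should return the same coefficients as b1.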

Related

Extract values from a vector and sort them based on their original sequence

I have a vector of numbers (temperatures), and I am using the MATLAB function mink to extract the 5 smallest numbers from the vector to form a new variable. However, the numbers extracted using mink are automatically ordered from lowest to largest (of those 5 numbers). Ideally, I would like to retain the sequence of the numbers as they are arranged in the original vector. I hope my problem is easy to understand. I appreciate any advice.
The function mink that you use was introduced in MATLAB R2017b. It has (as Andras Deak mentioned) two output arguments:
[B,I] = mink(A,k);
The second output argument holds the indices, such that B == A(I).
To obtain the set B but sorted as they appear in A, simply sort the vector of indices I:
B = A(sort(I));
For example:
>> A = [5,7,3,1,9,4,6];
>> [~,I] = mink(A,3);
>> A(sort(I))
ans =
3 1 4
For older versions of MATLAB, it is possible to reproduce mink using sort:
function [B,I] = mink(A,k)
[B,I] = sort(A);
B = B(1:k);
I = I(1:k);
Note that you don't actually need the B output here, so an ordered_mink can be written as follows:
function B = ordered_mink(A,k)
[~,I] = sort(A);
B = A(sort(I(1:k)));
Note: This solution assumes A is a vector. For matrix A, see Andras' answer, which he wrote up at the same time as this one.
First you'll need the corresponding indices for the extracted values from mink using its two-output form:
[vals, inds] = mink(array, k); % k is the number of smallest values to extract, e.g. 5
Then you only need to order the items in vals according to increasing indices in inds. There are multiple ways to do this, but they all revolve around sorting inds and using the corresponding order on vals. The simplest way, assuming array (and hence vals and inds) is a column vector, is to put these vectors into a matrix and sort the rows:
sorted_rows = sortrows([inds, vals]); % sort on indices
and then just extract the corresponding column
reordered_vals = sorted_rows(:,2); % items now ordered as they appear in "array"
Another possibility for doing the sorting after the above call to mink is to take the permutation that sorts inds and apply it to vals:
[~, order] = sort(inds); % permutation that puts inds in increasing order
reordered_vals = vals(order); % should be the same as previously
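As a quick check of the index-sorting idea, here is a small sketch that uses only sort, so it also runs on releases without mink. The vector A and k = 3 are made-up example values, and B1 and B2 are illustrative names.

```matlab
A = [5 7 3 1 9 4 6];   % example data
k = 3;                 % number of smallest values to keep

[~, idx] = sort(A);    % idx lists positions from smallest to largest value
inds = idx(1:k);       % positions of the k smallest values
vals = A(inds);        % the k smallest values, in ascending order

% Option 1: sort the positions and index A directly
B1 = A(sort(inds));

% Option 2: reorder vals with the permutation that sorts inds
[~, order] = sort(inds);
B2 = vals(order);
```

Both options return the k smallest values in the order in which they appear in A.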

Mean of fields in a structure using structfun, Matlab

I have a structure in the following format:
A.L1.data = <1000x3 double>
A.L2.data = <1000x3 double>
A.L3.data = <1000x3 double>
I would like to obtain the mean of the first column of all the fields, i.e. one vector of 1000 rows that is the mean of L1, L2 and L3.
I have tried using structfun with the following code:
foo = structfun(@(x) mean(x.data(:,1)), A, 'UniformOutput', false)
However, this gives me the mean (single value) of each first column rather than the mean of all the fields.
If I do:
foo = structfun(@(x) mean(x.data), A, 'UniformOutput', false)
I obtain the mean of each column for each field.
How should I modify my code?
You can access all data of a struct by struct2array.
Get struct firstColumnsOfData with fields L1 L2 L3 with the first columns of data:
firstColumnsOfData = structfun(@(x) x.data(:,1), A, 'UniformOutput', false)
Concatenate the columns side by side and take the mean across them, row by row:
mL123 = mean(struct2array(firstColumnsOfData), 2) % dimension 2 averages across fields, giving a 1000x1 vector
I understand your question to mean that you want the mean of A.L1.data(ii,1), A.L2.data(ii,1), and A.L3.data(ii,1), thereby creating a column vector with 1000 entries. With structfun, I don't see how you can apply it across fields in the structure as this applies the function provided to every field in the structure sequentially.
I think what you want is this:
bar = mean([A.L1.data(:,1) A.L2.data(:,1) A.L3.data(:,1)], 2);
Passing 2 as the second argument to mean provides the mean across the rows as opposed to down the columns.
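For a structure with many fields, typing out every field name becomes tedious. The following is a small sketch of the same row-wise mean that loops over the fields implicitly, assuming every field has a data matrix with the same number of rows. The 4x3 matrices stand in for the 1000x3 data, and cols and rowMean are illustrative names.

```matlab
% Small stand-in for the 1000x3 data: three fields with 4x3 matrices
A.L1.data = [1 0 0; 2 0 0; 3 0 0; 4 0 0];
A.L2.data = [3 0 0; 4 0 0; 5 0 0; 6 0 0];
A.L3.data = [5 0 0; 6 0 0; 7 0 0; 8 0 0];

% Pull the first column of every field into a cell array, one cell per field
cols = struct2cell(structfun(@(x) x.data(:,1), A, 'UniformOutput', false));

% Place the columns side by side and average across them, row by row
rowMean = mean([cols{:}], 2);
```

Here rowMean is a 4x1 vector whose i-th entry is the mean of the i-th first-column value over L1, L2 and L3.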

Matlab -- Copy Structure Array without For Loop

I have a fairly simple question in Matlab. I want to copy n items of structure array (sumRT.P) to a matrix (m). In C, I would just use a for loop, like this:
for i = 1:n
m(i) = sumRT(i).P;
end
But I bet there's a simpler way to copy an array in Matlab (that's the whole point of the language, right?). I tried this:
m = sumRT(1:n).P;
But this just copies the first item in sumRT.P to m, resulting in a 1 X 1 matrix. Note, if I type, sumRT(2).P for example, I can see the second item. Same for any number up to n. Why is this wrong and how do I fix it?
It depends on the data types in your structure array. If they are different types of variables, or arrays of different sizes, then you can't put them into a matrix, but you can collect them into a cell array:
m={sumRT(1:n).P}
and cells are pretty simple to deal with, so this oughtn't be a big problem.
If they are all scalar numerical values, you can create a matrix:
m=cell2mat({sumRT(1:n).P})
Try the following:
m = squeeze(cell2mat(struct2cell(sumRT(1:n))));
This converts the struct array to a cell array, then to a (numeric) array, and finally squeezes it by removing singleton dimensions.
Example:
>> sumRT(1).P = 10; sumRT(2).P = 20; sumRT(3).P = 30;
>> n = 2; %// copy first two elements only
>> m = squeeze(cell2mat(struct2cell(sumRT(1:n))))
m =
10
20
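As for why the original attempt returns a single value: sumRT(1:n).P expands to a comma-separated list, and a plain assignment only takes the first item of that list. Wrapping the list in square brackets concatenates all of its items. A minimal sketch, assuming every P is a numeric scalar (m_row and m_col are illustrative names):

```matlab
sumRT(1).P = 10; sumRT(2).P = 20; sumRT(3).P = 30;
n = 2;                       % copy the first two elements only

% sumRT(1:n).P is the list "sumRT(1).P, sumRT(2).P";
% square brackets concatenate the whole list into one array
m_row = [sumRT(1:n).P];      % 1-by-n row vector
m_col = [sumRT(1:n).P].';    % transpose for an n-by-1 column vector
```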

Finding the average number of consecutive same value elements in matlab

I have a question that I couldn't solve by myself, and search results have also not been what I am looking for (unless I missed one that explains it all, in which case I apologize!)
I have a system that can be in three states, S = S1, S2 and S3. It can change between these three states with a certain probability: From S1 to S2 with P1, S2 to S1 with P2, S2 to S3 with P3 and S3 to S2 with P4. However, to make things simple, I'll begin with P1 = P2 = P3 = P4 = P.
Now I have a dataset, an array of 1000000 values which correspond to these specific states. So S1 means a 1 in the array, S2 means 0.5 and S3 means 0.
So now I want to find out how long the average 'string' of consecutive 1's, or 0.5's, or 0's is in my array. As it is simply a binomial process, (change state with p = P), I should in principle be able to extract P from this information. Although I'm not sure how yet, as I can't simply fit the distribution of 'string lengths' to the binomial distribution, can I?
In any case, a good place to start would be to be able to extract the length of 'strings' of consecutive equal values. Could anyone point me in a direction for where to start?
Edit:
I see that fitdist could fit the 'string lengths' to the binomial distribution. So now I simply want to find how to create an array that contains the 'string lengths' for consecutive 1's, 0.5's and 0's.
Edit 2: It seems that Series of consecutive numbers (different lengths) might be doing exactly what I want. I'll have a quick look at it, and if so I'll delete the post. I apologize!
You could do something as simple as taking a discrete derivative with diff. This will identify where the sequence changes: anywhere diff returns something other than 0, there is a change. Find the indices at which those changes happen, and then take the differences between these indices to get the lengths. Here is some example code:
% all just setup
a = 0*ones(1,randi([1,10]));
b = 1*ones(1,randi([1,10]));
c = 0.5*ones(1,randi([1,10]));
vals = {a,b,c};
len = 1e6;
temp = cell(1,len);
for i = 1:len
index = randi([1,3]);
temp{i} = vals{index};
end
mat = cell2mat(temp);
% code that actually does what you need
mat = [mat,nan];
seqLengths = diff([0,find(diff(mat) ~= 0)]);
Please note that the nan is appended to the end of your vector so that the last run also ends in a detected change and its length is included in seqLengths. nan is used because it is assumed that your vector contains only valid numbers; if not, nan can be replaced with any value that does not match the last value in the vector.
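To get the average length per state, the run lengths also need to be paired with the value of each run. The following is a small sketch of a variant that appends the final index instead of a NaN sentinel. The vector x is a made-up example, and runEnds, runLengths and runValues are illustrative names.

```matlab
x = [1 1 0.5 0.5 0.5 0 1 1];          % example state sequence

changeIdx  = find(diff(x) ~= 0);       % last index of every run except the final one
runEnds    = [changeIdx, numel(x)];    % append the end of the final run
runLengths = diff([0, runEnds]);       % length of each consecutive run
runValues  = x(runEnds);               % state held during each run

% Average run length for one particular state, here the state coded as 1
meanLenOnes = mean(runLengths(runValues == 1));
```

The same last line with runValues == 0.5 or runValues == 0 gives the averages for the other two states.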
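If it's really a binomial process, there is no need to count the average length. Count the transitions between states (x must hold the states as positive integer labels such as 1, 2, 3, since they are used as row and column indices):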
y=sparse(x(1:end-1),x(2:end),ones(numel(x)-1,1))
And divide it by the total number of transitions:
z=y./sum(sum(y))
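Since the states in the question are coded as 1, 0.5 and 0, they cannot be used as indices directly. A minimal sketch of the same transition count with the states first mapped to integer labels via unique; accumarray is swapped in for sparse here, and the row-normalised matrix P is an addition of mine, assuming you want per-state transition probabilities rather than overall frequencies.

```matlab
x = [1 1 0.5 0.5 0.5 0 0.5 1];         % example state sequence

[states, ~, label] = unique(x);         % states = [0 0.5 1], label takes values 1..3
label = label(:);                       % force a column vector
nStates = numel(states);

% counts(i,j) = number of steps from states(i) to states(j)
counts = accumarray([label(1:end-1), label(2:end)], 1, [nStates, nStates]);

% Share of each transition among all transitions, as in the answer above
z = counts / sum(counts(:));

% Row-normalised version: P(i,j) estimates the probability of going from
% states(i) to states(j); assumes every state occurs at least once as a source
P = counts ./ repmat(sum(counts, 2), 1, nStates);
```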

Indexing must appear last in an index expression

I have a vector CD1 (120-by-1) and I separate CD1 into 6 parts. For example, the first part is extracted from row 1 to row 20 in CD1, and second part is extracted from row 21 to row 40 in CD1, etc. For each part, I need to compute the means of the absolute values of second differences of the data.
for PartNo = 1:6
% extract data
Y(PartNo) = CD1(1 + 20*(PartNo-1):20*(PartNo),:);
% find the second difference
Z(PartNo) = Y(PartNo)(3:end) - Y(PartNo)(1:end-2);
% mean of absolute value
MEAN_ABS_2ND_DIFF_RESULT(PartNo) = mean(abs(Z));
end
However, the commands above produce the error:
()-indexing must appear last in an index expression for Line:2
Any ideas to change the code to have it do what I want?
This error is often encountered when Y is a cell-array. For cell arrays,
Y{1}(1:3)
is legal. Curly braces ({}) mean data extraction, so this means you are extracting the array stored in location 1 in the cell array, and then referencing the elements 1 through 3 of that array.
The notation
Y(1)(1:3)
is different in that it does not extract data, but it references the cell's location 1. This means the first part (Y(1)) returns a cell-array which, in your case, contains a single array. So you won't have direct access to the regular array as before.
It is an infamous limitation in Matlab that you cannot do indirect or double-referencing, which is in effect what you are doing here.
Hence the error.
Now, to resolve: I suspect replacing a few parentheses with curly braces will do the trick:
Y{PartNo} = CD1(1+20*(PartNo-1):20*PartNo,:); % extract data
Z{PartNo} = Y{PartNo}(3:end)-Y{PartNo}(1:end-2); % find the second difference
MEAN_ABS_2ND_DIFF_RESULT{PartNo} = mean(abs(Z{PartNo})); % mean of absolute value
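The difference between the two kinds of indexing can be seen on a tiny example. This is only a sketch with a made-up cell array; inner, cellPart, tmp and viaTmp are illustrative names.

```matlab
Y = {[10 20 30 40], [50 60]};   % example cell array holding two numeric arrays

inner    = Y{1}(1:3);   % braces extract the array, then () indexes into it
cellPart = Y(1);        % parentheses return a 1x1 cell, not the array itself

% When chained ()-indexing is not allowed, a temporary variable does the job
tmp    = Y{1};
viaTmp = tmp(1:3);
```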
I might suggest a different approach
Y = reshape(CD1, 20, 6);
Z = Y(3:end,:) - Y(1:end-2,:);
MEAN_ABS_2ND_DIFF_RESULT = mean(abs(Z));
This is not a valid statement in matlab:
Y(PartNo)(3:end)
You should either make Y two-dimensional and use this indexing
Y(PartNo, 3:end)
or extract vector parts and use them directly, if you use a loop like you have shown
for PartNo = 1:6
% extract data
Y = CD1(1 + 20*(PartNo-1):20*(PartNo),:);
% find the second difference
Z = Y(3:end) - Y(1:end-2);
% mean of absolute value
MEAN_ABS_2ND_DIFF_RESULT(PartNo) = mean(abs(Z));
end
Also, since CD1 is a vector, you do not need to index the second dimension. Drop the colon:
Y = CD1(1 + 20*(PartNo-1):20*(PartNo));
Finally, you do not need a loop. You can reshape the CD1 vector to a two-dimensional array Y of size 20x6, in which the columns are your parts, and work directly on the resulting matrix:
Y = reshape(CD1, 20, 6);
Z = Y(3:end,:)-Y(1:end-2,:);
MEAN_ABS_2ND_DIFF_RESULT = mean(abs(Z));
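To convince yourself that the loop and the reshape approach agree, both can be run on the same data. A small sketch with a made-up 120-by-1 vector (CD1 here is just the squares of 1 to 120; loopResult, Ymat, Zmat and vecResult are illustrative names):

```matlab
CD1 = ((1:120).') .^ 2;          % made-up 120-by-1 test vector

% Loop version: one part of 20 rows at a time
loopResult = zeros(1, 6);
for PartNo = 1:6
    Y = CD1(1 + 20*(PartNo-1) : 20*PartNo);
    Z = Y(3:end) - Y(1:end-2);   % both operands have 18 elements
    loopResult(PartNo) = mean(abs(Z));
end

% Vectorised version: each column of Ymat is one part
Ymat = reshape(CD1, 20, 6);
Zmat = Ymat(3:end, :) - Ymat(1:end-2, :);
vecResult = mean(abs(Zmat));     % 1-by-6, one mean per column
```

Because mean works down the columns by default, vecResult holds one value per part, matching loopResult.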