MATLAB - show categorical variables as columns instead of rows?

I run the following line on the "hospital" data set and get this output:
>> statarray = grpstats(dsa,{'Smoker','Sex'},'mean','DataVars',{'Age','Weight'})
statarray =
                Smoker    Sex       GroupCount    mean_Age    mean_Weight
    0_Female    false     Female    40            37.425      130.32
    0_Male      false     Male      26            38.808      180.04
    1_Female    true      Female    13            38.615      130.92
    1_Male      true      Male      21            39.048      181.14
Is there an easy way to get this instead?
       Smoker    GroupCount    mean_Age    mean_Weight    Male    Female
    0  false     66            37.97       149.91         26      40
    1  true      34            38.882      161.94         21      13
I can't figure out how to move the levels of a categorical variable into columns of the stat table instead of having them as rows. Maybe this is not possible with grpstats. Just curious. Thanks!

You can count the sexes with a separate crosstab and then append those counts as new columns of statarray:
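load hospital   % loads the hospital dataset array into the workspace
statarray = grpstats(dataset2table(hospital),{'Smoker'},'mean',...
    'DataVars',{'Age','Weight'});
% crosstab counts each Smoker/Sex combination: rows = Smoker groups, columns = Sex levels
statarray{:,end+1:end+2} = crosstab(hospital.Smoker,hospital.Sex);
statarray.Properties.VariableNames(end-1:end) = categories(hospital.Sex);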
Output:
statarray =
Smoker GroupCount mean_Age mean_Weight Female Male
______ __________ ________ ___________ ______ ____
0 false 66 37.97 149.91 40 26
1 true 34 38.882 161.94 13 21
You may notice I converted hospital from a dataset to a table before calling grpstats; this is because of the following note in MATLAB's documentation:
The dataset data type might be removed in a future release. To work with heterogeneous data, use the MATLAB® table data type instead. See MATLAB table documentation for more information.
And indeed, tables are easier to work with.
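To see what the two appended columns contain, or to build them without crosstab, the same per-group counts can be computed with base MATLAB's countcats. This is a minimal sketch on a small hypothetical table (the Smoker and Sex values below are made up, not the hospital data); it assumes the Smoker groups are listed in the order false, true, which is the order of the grpstats rows above.

```matlab
% Hypothetical stand-in for the hospital data: a logical Smoker flag and a categorical Sex
Smoker = logical([0; 0; 1; 1; 0; 1]);
Sex = categorical({'Female'; 'Male'; 'Male'; 'Female'; 'Female'; 'Male'});
T = table(Smoker, Sex);

smokerLevels = [false; true];        % assumed to match the row order of the grpstats output
sexLevels = categories(T.Sex);       % {'Female'; 'Male'}
counts = zeros(numel(smokerLevels), numel(sexLevels));
for i = 1:numel(smokerLevels)
    % countcats returns one count per category, including categories with zero counts
    counts(i, :) = countcats(T.Sex(T.Smoker == smokerLevels(i)))';
end

% One column per Sex category, one row per Smoker group
countsTable = array2table(counts, 'VariableNames', sexLevels')
```

A table like countsTable should then sit next to the grpstats result via horizontal concatenation, e.g. [statarray countsTable], provided both list the Smoker groups in the same order.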

Related

How to include “within” and “between” predictors in a model?

I have a dataset of fish caught at several points in 120 lakes. In each lake, we collected fish at random places (points), resulting in approximately 800 fish observations. Among other factors, I would like to account for fish variation (e.g. density of individuals captured) “between” and “within” lakes, since I have the coordinates of each fish observation and of each lake.
How can I include “within” and “between” lake variation in a model to determine which one matters more for the fish response?
This is an example of my dataset:
observation  lake     point  Xcoord.lake  Ycoord.lake  Xcoord.point  Ycoord.point  fish  factor1  factor2
1            lake1    1      453497       6166094      453022        6166137       100   37.10    158.9
2            lake1    2      453025       6166110      453022        6166137       95    15.7     105.1
3            lake1    3      453093       6166079      453022        6166137       73    0        170.0
4            lake2    1      493269       6319292      493269        6318708       0     5.1      185.7
5            lake2    2      493269       6319292      493568        6318542       10    10.7     129.4
6            lake2    3      493269       6319292      493627        6318531       8     20.8     257.3
...          ...      ...    ...          ...          ...           ...           ...   ...      ...
798          lake120  1      517966       6143495      517967        6143454       252   50.2     326.9
799          lake120  2      517966       6143495      517969        6143379       158   33.8     196.4
800          lake120  3      517966       6143495      517972        6143510       300   87.5     93.2
I was trying something like:
library(MASS)   # provides glmmPQL
library(nlme)   # provides corExp
fit <- glmmPQL(fish ~ factor1 + factor2, random = ~ 1 | lake, data = data,
               family = poisson,
               correlation = corExp(form = ~ Xcoord.point + Ycoord.point))
But I don’t know whether this answers my question.
I really appreciate any help!

Changing the value of elements in a table, depending on a specific string for MATLAB

Suppose I have a MATLAB table of the following type:
Node_Number Generation_Type Total_power(MW)
1 Wind 600
1 Solar 452
1 Tidal 123
2 Wind 200
2 Tidal 159
What I want is to produce a table with exactly the same dimensions, the only difference being that the Total_power values of the Wind rows are multiplied by 0.5. Hence the result would be:
Node_Number Generation_Type Total_power(MW)
1 Wind 300
1 Solar 452
1 Tidal 123
2 Wind 100
2 Tidal 159
I believe the trick would be code that scans all rows for the string 'Wind' and then multiplies the third column of each matching row by 0.5. A for loop seems like a viable solution, though I am not sure how to implement it. Any help would be greatly appreciated.
Just find the indices of the rows whose Generation_Type is Wind and use them to index the table; no loop is needed.
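clc; clear;
T = readtable('data.txt');   % the header 'Total_power(MW)' is read as the valid name Total_power_MW_
rows = find(ismember(T.Generation_Type,'Wind'));        % indices of the Wind rows
T.Total_power_MW_(rows) = T.Total_power_MW_(rows)*0.5   % halve only those rows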
Output:
Node_Number Generation_Type Total_power_MW_
___________ _______________ _______________
1 'Wind' 300
1 'Solar' 452
1 'Tidal' 123
2 'Wind' 100
2 'Tidal' 159
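The same idea can be tried without a data.txt file. This sketch builds the table in memory (the variable name Total_power_MW is an assumption standing in for whatever name readtable generates) and uses a logical mask instead of find; both forms index the same rows.

```matlab
% Hypothetical in-memory version of data.txt; Total_power_MW stands in for the
% name readtable generates from the 'Total_power(MW)' header
Node_Number = [1; 1; 1; 2; 2];
Generation_Type = {'Wind'; 'Solar'; 'Tidal'; 'Wind'; 'Tidal'};
Total_power_MW = [600; 452; 123; 200; 159];
T = table(Node_Number, Generation_Type, Total_power_MW);

isWind = strcmp(T.Generation_Type, 'Wind');                % logical mask of the Wind rows
T.Total_power_MW(isWind) = 0.5 * T.Total_power_MW(isWind)  % halve only the Wind rows
```

A logical mask skips the find call and keeps the table height unchanged, which is what the question asks for.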

Missing data in repeated measure model

I am using the MATLAB function fitrm to fit a repeated measures model in order to investigate whether elements grouped according to Grouping1 have statistically different means for the variable measured at times t1, t2, t3 and t4 (var_t1, var_t2, var_t3, var_t4).
My data look like this:
Grouping1 Grouping2 Gender Age BMI var_t1 var_t2 var_t3 var_t4
______ ___________ ______ ______ ______ ____________ ____________ ____________ ____________
C B Male 60 24.802 836 608 746 NaN
C A Male 67 19.818 242 544 460 483
... ...
D C Female 65 21.631 621 468 NaN NaN
As you can see, I have some missing data for var_t3 and var_t4.
Can I still use fitrm?
If I fit a repeated measures model in which var_t1 to var_t4 are the responses and Grouping1, Grouping2, Gender, Age and BMI are the predictor variables:
Time = [1:4]';
rm = fitrm(table,'var_t1-var_t4 ~ Grouping1 + Grouping2 + Gender + Age + BMI','WithinDesign',Time)
the function doesn't return an error, but I don't know whether the results have any meaning...
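Before interpreting the fit, it may help to check how many subjects actually have all four measurements. How fitrm itself treats rows containing NaN is not shown here and should be confirmed in its documentation; this sketch only counts complete cases and builds a complete-case table, so you know exactly which rows a model would be fit on. The data are the response columns of the three rows shown in the question.

```matlab
% Hypothetical data: the three rows shown in the question (response columns only)
var_t1 = [836; 242; 621];
var_t2 = [608; 544; 468];
var_t3 = [746; 460; NaN];
var_t4 = [NaN; 483; NaN];
T = table(var_t1, var_t2, var_t3, var_t4);

Y = T{:, {'var_t1', 'var_t2', 'var_t3', 'var_t4'}};  % numeric matrix, one column per time point
isComplete = all(~isnan(Y), 2);                      % true where all four measurements exist
fprintf('%d of %d subjects have all four measurements\n', sum(isComplete), height(T));

% Keep only complete cases so the rows used for fitting are explicit
Tcomplete = T(isComplete, :);
```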

MATLAB - extract selected rows in a table based on some criterion

Let's say I have a table like this:
post user date
____ ____ ________________
1 A 12.01.2014 13:05
2 B 15.01.2014 20:17
3 A 16.01.2014 05:22
I want to create a smaller table (without deleting the original one!) containing all posts of, for example, user A, including the dates they were posted on.
Looking at MATLAB's documentation (see the very last part, on deleting rows), I discovered that MATLAB lets you create a mask for a table based on some criterion. So in my case, if I do something like this:
postsA = myTable.user == 'A'
I get a nice mask vector as follows:
postsA =
1
0
1
where the 1s are obviously those rows in myTable, which satisfy the rule I have given.
In the documentation I linked above, rows are deleted from the original table:
postsNotA = myTable.user ~= 'A' % note that I have to reverse the criterion since I'm choosing stuff that will be removed
myTable(postsNotA,:) = [];
As stated above, however, I would like to leave my original table untouched. One possible solution is to create an empty table with two columns:
post date
____ ____
then iterate through all rows of my original table, check the current value of my mask vector postsA and, if it equals 1, copy the two columns of that row that I'm interested in and append this shrunken row to my smaller table. Is there a solution that is only one or two lines long?
Assuming myTable is your original table, you can just do:
myTable(myTable.user == 'A',:)
Sample Code:
user = ['A';'B';'A';'C';'B'];
Age = [38;43;38;40;49];
Height = [71;69;64;67;64];
Weight = [176;163;131;133;119];
BloodPressure = [124 93; 109 77; 125 83; 117 75; 122 80];
T = table(user,Age,Height,Weight,BloodPressure)
T(T.user=='A',:)
Gives:
T =
user Age Height Weight BloodPressure
____ ___ ______ ______ _________________________
A 38 71 176 124 93
B 43 69 163 109 77
A 38 64 131 125 83
C 40 67 133 117 75
B 49 64 119 122 80
ans =
user Age Height Weight BloodPressure
____ ___ ______ ______ _________________________
A 38 71 176 124 93
A 38 64 131 125 83
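The question also asks for only the post and date columns. Table indexing accepts a row mask and a list of variable names in the same step, so the subset can be taken in one line while the original stays intact. A sketch rebuilding the question's three-row table (the date parsing format is an assumption based on the displayed values):

```matlab
% Rebuild the three-row table from the question
post = [1; 2; 3];
user = ['A'; 'B'; 'A'];
date = datetime({'12.01.2014 13:05'; '15.01.2014 20:17'; '16.01.2014 05:22'}, ...
    'InputFormat', 'dd.MM.yyyy HH:mm');
myTable = table(post, user, date);

% Row mask and column list in one indexing step; myTable itself is unchanged
postsA = myTable(myTable.user == 'A', {'post', 'date'})
```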

How to make structure fields programmable in matlab?

I have an Excel file that I need to read into MATLAB. I have already read it into MATLAB as a cell array.
I want to convert the cell array to a structure. My Excel file looks like this:
Va Ia Vb Ib Vc ... ...
01:00 100 10 200 20 300 ... ...
02:00 110 11 210 21 310 ... ...
03:00 120 12 220 22 320 ... ...
04:00 130 13 230 23 330 ... ...
... ... ... ... ... ... ... ...
I know that a structure consists of fields, each holding some data (called "material" below). I want the first row (the column headers) to be used as the field names. For instance, I want this:
structure A
field 1: Va
material : [100;110;120;130 ... ...]
field 2: Ia
material : [10 ;11 ;12 ;13 ... ...]
I will then use it to access the variables in the structure, e.g.
A.Va(3) = ...; % something
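MATLAB's dynamic field names, written as A.(name), let a field name come from a string at run time, which fits this problem. This is a sketch under assumptions: the cell array C below is hypothetical, with the header row first and 'Time' as an invented label for the unlabeled first column.

```matlab
% Hypothetical cell array as it might come back from reading the sheet:
% header row first, then one row per time stamp. 'Time' is an assumed
% label for the unlabeled first column.
C = {'Time',  'Va', 'Ia', 'Vb', 'Ib';
     '01:00',  100,   10,  200,   20;
     '02:00',  110,   11,  210,   21;
     '03:00',  120,   12,  220,   22;
     '04:00',  130,   13,  230,   23};

A = struct();
for k = 2:size(C, 2)
    name = C{1, k};                    % header text, e.g. 'Va'
    A.(name) = cell2mat(C(2:end, k));  % dynamic field holding that column as a numeric vector
end

A.Va(3)   % third value of the Va column
```

If some headers are not valid MATLAB identifiers (spaces, leading digits), passing them through matlab.lang.makeValidName before using them as field names avoids an error.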