I am using the MATLAB function fitrm to fit a repeated measures model in order to investigate whether elements grouped according to Grouping1 have statistically different means for the variable measured at times t1, t2, t3, t4 (var_t1, var_t2, var_t3, var_t4).
My data look like the ones in the table:
Grouping1 Grouping2 Gender Age BMI var_t1 var_t2 var_t3 var_t4
______ ___________ ______ ______ ______ ____________ ____________ ____________ ____________
C B Male 60 24.802 836 608 746 NaN
C A Male 67 19.818 242 544 460 483
... ...
D C Female 65 21.631 621 468 NaN NaN
As you can see, I have some missing data for var_t3 and var_t4.
Can I still use fitrm?
If I fit a repeated measures model, where var_t1-var_t4 are the responses and Grouping1, Grouping2, Gender, Age and BMI are the predictor variables:
Time = [1:4]';
rm = fitrm(table,'var_t1-var_t4 ~ Grouping1 + Grouping2 + Gender + Age + BMI','WithinDesign',Time)
the function doesn't return an error, but I don't know if the results have any meaning...
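Before interpreting the fit, it can help to see how many subjects actually have a complete set of responses. The following is only a sketch on a made-up three-row table (tbl and its values are assumptions, loosely based on the rows shown above). It makes no claim about how fitrm treats incomplete rows, which should be checked in the fitrm documentation.

```matlab
% Hypothetical miniature of the response columns (values are illustrative)
var_t1 = [836; 242; 621];
var_t2 = [608; 544; 468];
var_t3 = [746; 460; NaN];
var_t4 = [NaN; 483; NaN];
tbl = table(var_t1, var_t2, var_t3, var_t4);

% Pull the responses out as a numeric matrix
responses = tbl{:, {'var_t1','var_t2','var_t3','var_t4'}};

% Rows (subjects) with at least one missing response
incompleteRows = any(isnan(responses), 2);

% Number of subjects with all four time points observed
nComplete = sum(~incompleteRows);

% Number of missing values at each time point
missingPerTime = sum(isnan(responses), 1);
```

If nComplete is much smaller than the number of rows, the fitted model may rest on far fewer subjects than the table suggests.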
So I have a homogeneous numeric array as shown below. I converted this array to a table using the array2table function. What is shown below is simply the variable names being applied to the array. I have column names, but I would like to have row names as well. Is it because the array is of a single variable class that I can't do this?
T = array2table(C,'RowNames',{'','T0','T1','T2','T3','T4'},'VariableNames' ,{'to','t1','t2','t3','t4','t5','t6','t7','t8','t9','t10'})
T =
6×11 table
to t1 t2 t3 t4 t5 t6 t7 t8 t9 t10
___ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______
0 18 36 54 72 90 108 126 144 162 180
15 15 15 15 15 15 15 15 15 15 15
325 304.17 303.4 295.01 293.52 288.3 286.56 282.49 280.5 276.99 274.8
325 325 315.67 314.35 308.58 306.86 302.38 300.33 296.49 294.19 290.74
325 325 325 320.82 319.8 315.43 313.61 309.61 307.35 303.69 301.2
325 325 325 325 321.25 319.95 315.9 313.85 310.05 307.63 304.1
The errors that I'm getting here are:
Error using matlab.internal.tabular.private.rowNamesDim/validateAndAssignLabels (line 109)
The RowNames property must be a cell array, with each element containing one nonempty character vector.
Error in matlab.internal.tabular.private.tabularDimension/setLabels (line 173)
obj = obj.validateAndAssignLabels(newLabels,indices,fullAssignment,fixDups,fixEmpties,fixIllegal);
Error in matlab.internal.tabular.private.tabularDimension/createLike_impl (line 355)
obj = obj.setLabels(dimLabels,[]);
Error in matlab.internal.tabular.private.tabularDimension/createLike (line 62)
obj = obj.createLike_impl(dimLength,dimLabels);
Error in tabular/initInternals (line 206)
t.rowDim = t.rowDim.createLike(nrows,rowLabels);
Error in table.init (line 327)
t = initInternals(t, vars, numRows, rowLabels, numVars, varnames);
Error in array2table (line 64)
t = table.init(vars,nrows,rownames,nvars,varnames);
The error you're getting is
The RowNames property must be a cell array, with each element containing one nonempty character vector.
Here is a valid version:
T = array2table(C,'RowNames',{'T','T0','T1','T2','T3','T4'},'VariableNames' ,{'to','t1','t2','t3','t4','t5','t6','t7','t8','t9','t10'})
You need to change the first element of the RowNames array to a nonempty character vector, e.g. 'T' instead of ''.
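As a minimal sketch of the same rule on a smaller array (the matrix C and the names below are made up for illustration), the row names can be checked before calling array2table:

```matlab
% Hypothetical 3-by-3 numeric array standing in for the real data
C = [0 18 36; 15 15 15; 325 304.17 303.4];

% Candidate row names: one per row, each a nonempty character vector
rowNames = {'T', 'T0', 'T1'};

% Check the conditions the error message refers to, plus uniqueness
isValid = iscellstr(rowNames) ...
    && all(~cellfun(@isempty, rowNames)) ...
    && numel(unique(rowNames)) == numel(rowNames) ...
    && numel(rowNames) == size(C, 1);

T = array2table(C, 'RowNames', rowNames, ...
    'VariableNames', {'t0', 't1', 't2'});

% Row names can then be used for indexing
rowT0 = T{'T0', :};
```

The fact that the array has a single class is not the issue; only the empty '' entry is rejected.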
I execute the following line using the "hospital" data set and get the following:
>> statarray = grpstats(dsa,{'Smoker','Sex'},'mean','DataVars',{'Age','Weight'})
statarray =
Smoker Sex GroupCount mean_Age mean_Weight
0_Female false Female 40 37.425 130.32
0_Male false Male 26 38.808 180.04
1_Female true Female 13 38.615 130.92
1_Male true Male 21 39.048 181.14
I was wondering if it's easy to be able to instead have it be like this:
Smoker GroupCount mean_Age mean_Weight Male Female
0 false 66 37.97 149.91 21 40
1 true 34 38.882 161.94 26 13
I can't figure out how to bring the categorical variables to the columns like this of the stat table instead of having them as rows. Maybe this is not possible with grpstats. Just curious. Thanks!
You can count each sex with a separate crosstab call, and then append the counts as columns of statarray:
statarray = grpstats(dataset2table(hospital),{'Smoker'},'mean',...
'DataVars',{'Age','Weight'});
statarray{:,end+1:end+2} = crosstab(hospital.Smoker,hospital.Sex);
statarray.Properties.VariableNames(end-1:end) = categories(hospital.Sex);
Output:
statarray =
Smoker GroupCount mean_Age mean_Weight Female Male
______ __________ ________ ___________ ______ ____
0 false 66 37.97 149.91 40 26
1 true 34 38.882 161.94 13 21
You may notice I converted hospital from a dataset to a table (so statarray is a table too); this is because of this message in MATLAB's docs:
The dataset data type might be removed in a future release. To work with heterogeneous data, use the MATLAB® table data type instead. See MATLAB table documentation for more information.
And indeed, table is more friendly...
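To see why the crosstab columns can be labelled with categories(hospital.Sex), here is a small sketch on made-up data (smoker and sex below are assumptions, not the hospital data set). It relies on crosstab ordering its rows and columns by the sorted group values, which for a categorical input is the category order.

```matlab
% Hypothetical grouping variables (five made-up patients)
smoker = [false; false; true; true; false];
sex = categorical({'Female'; 'Male'; 'Female'; 'Female'; 'Male'});

% Rows follow the sorted values of smoker (false, true);
% columns follow the categories of sex (Female, Male)
counts = crosstab(smoker, sex);

% Label the count columns with the category names
countTable = array2table(counts, 'VariableNames', categories(sex));
```

Here the first row counts non-smokers and the second row counts smokers, which matches the row order grpstats produces when grouping by Smoker alone.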
I have a vectorization problem with nlinfit.
Let A be the (n,p) matrix of observations and t the (1,p) explanatory variable.
For example:
t=[0 1 2 3 4 5 6 7]
and
A=[3.12E-04 7.73E-04 3.58E-04 5.05E-04 4.02E-04 5.20E-04 1.84E-04 3.70E-04
3.38E-04 3.34E-04 3.28E-04 4.98E-04 5.19E-04 5.05E-04 1.97E-04 2.88E-04
1.09E-04 3.64E-04 1.82E-04 2.91E-04 1.82E-04 3.62E-04 4.65E-04 3.89E-04
2.70E-04 3.37E-04 2.03E-04 1.70E-04 1.37E-04 2.08E-04 1.05E-04 2.45E-04
3.70E-04 3.34E-04 2.63E-04 3.21E-04 2.52E-04 2.81E-04 6.25E+09 2.51E-04
3.11E-04 3.68E-04 3.65E-04 2.71E-04 2.69E-04 1.49E-04 2.97E-04 4.70E-04
5.48E-04 4.12E-04 5.55E-04 5.94E-04 6.10E-04 5.44E-04 5.67E-04 4.53E-04
....
]
I want to estimate a linear model for each row of A without looping, i.e. avoid this loop:
for i=1:7
ml{i} = fitlm(A(i,:), t); % cell indexing; ml[i] is not valid MATLAB
end
Thanks for your help !
Luc
I believe that your problem is about understanding how fitlm works with a matrix input.
Let's work with the hald example that ships with MATLAB:
>> load hald
>> Description
Description =
== Portland Cement Data ==
Multiple regression data
ingredients (%):
column1: 3CaO.Al2O3 (tricalcium aluminate)
column2: 3CaO.SiO2 (tricalcium silicate)
column3: 4CaO.Al2O3.Fe2O3 (tetracalcium aluminoferrite)
column4: 2CaO.SiO2 (beta-dicalcium silicate)
heat (cal/gm):
heat of hardening after 180 days
Source:
Woods,H., H. Steinour, H. Starke,
"Effect of Composition of Portland Cement on Heat Evolved
during Hardening," Industrial and Engineering Chemistry,
v.24 no.11 (1932), pp.1207-1214.
Reference:
Hald,A., Statistical Theory with Engineering Applications,
Wiley, 1960.
>> ingredients
ingredients =
7 26 6 60
1 29 15 52
11 56 8 20
11 31 8 47
7 52 6 33
11 55 9 22
3 71 17 6
1 31 22 44
2 54 18 22
21 47 4 26
1 40 23 34
11 66 9 12
10 68 8 12
>> heat
heat =
78.5000
74.3000
104.3000
87.6000
95.9000
109.2000
102.7000
72.5000
93.1000
115.9000
83.8000
113.3000
109.4000
This means that each column of the ingredients matrix holds the percentage of one ingredient in a sample:
>> sum(ingredients(1,:))
ans =
99 % so it is near 100%
The rows are the 13 measurements of the product, and the heat vector holds the heat measured for each observation.
>> mdl = fitlm(ingredients,heat)
mdl =
Linear regression model:
y ~ 1 + x1 + x2 + x3 + x4
Estimated Coefficients:
Estimate SE tStat pValue
________ _______ ________ ________
(Intercept) 62.405 70.071 0.8906 0.39913
x1 1.5511 0.74477 2.0827 0.070822
x2 0.51017 0.72379 0.70486 0.5009
x3 0.10191 0.75471 0.13503 0.89592
x4 -0.14406 0.70905 -0.20317 0.84407
Number of observations: 13, Error degrees of freedom: 8
Root Mean Squared Error: 2.45
R-squared: 0.982, Adjusted R-Squared 0.974
F-statistic vs. constant model: 111, p-value = 4.76e-07
So in your case, it does not make sense to fit each observation separately. You simply need t to have the same number of elements as there are observations (rows of A).
Take a look here:
mdl = fitlm(A,t)
Problem solved using splitapply and findgroups!
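If the goal really is one straight-line fit per row of A, with t as the explanatory variable, the loop can also be avoided with a single least-squares solve. This is a sketch of a different technique from the fitlm call above: it uses the backslash operator and returns only coefficients, not LinearModel objects. The small A below is made up so the result is easy to check.

```matlab
% Explanatory variable (1-by-p) and made-up observations (n-by-p)
t = 0:7;
A = [ 1  3  5  7  9 11 13 15;    % intercept 1, slope 2
      2  2  2  2  2  2  2  2;    % intercept 2, slope 0
      0 -1 -2 -3 -4 -5 -6 -7];   % intercept 0, slope -1

% Design matrix shared by every row: a column of ones plus t
X = [ones(numel(t), 1), t(:)];

% Each column of A.' is one response vector, so one solve fits all rows
B = X \ A.';

intercepts = B(1, :);   % one intercept per row of A
slopes     = B(2, :);   % one slope per row of A
```

Because every row shares the same design matrix X, the solve is done once for all n responses. Standard errors and p-values would still need fitlm or a manual computation.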
I'm trying to pivot some trade data in KDB/q. Although my data are only slightly different from the working example on the website (see the general pivot function: http://code.kx.com/q/cookbook/pivoting-tables/),
I can't get the function to work, even after several hours of trying (I'm very new to KDB).
Put simply, I'm trying to go from this table:
q)5# trades_agg
date sym time exchange buysell| shares
--------------------------------------| ------
2009.01.05 aaca 09:30 BATS B | 484
2009.01.05 aaca 09:30 BATS S | 434
2009.01.05 aaca 09:30 NASDAQ B | 235
2009.01.05 aaca 09:30 NASDAQ S | 429
2009.01.05 aaca 09:30 NYSE B | 309
to this one:
date sym time | BATSsharesB BATSsharesS NASDAQsharesB ...
----------------------| -----------------------------------------------
2009.01.05 aaca 09:30 | 484 434 235 ...
... | ...
I'll provide a working example to illustrate things:
// Create data
qpd:5*2*4*"i"$16:00-09:30
date:raze(100*qpd)#'2009.01.05+til 5
sym:(raze/)5#enlist qpd#'100?`4
sym:(neg count sym)?sym
time:"t"$raze 500#enlist 09:30:00+15*til qpd
time+:(count time)?1000
exchange:raze 500#enlist raze(qpd div 3)#enlist`NYSE`NASDAQ`BATS
buysell:raze 500#enlist raze(qpd div 2)#enlist`B`S
shares:(500*qpd)?100
trades:([]date;sym;time;exchange;buysell;shares)
//I then aggregate the data into equal sized buckets
trades_agg: select sum shares by date, sym, time: 15 xbar time.minute, exchange, buysell from trades
// pivot function from the code.kx.com website
piv:{[t;k;p;v;f;g]
v:(),v;
G:group flip k!(t:.Q.v t)k;
F:group flip p!t p;
count[k]!g[k;P;C]xcols 0!key[G]!flip(C:f[v]P:flip value flip key F)!raze
{[i;j;k;x;y]
a:count[x]#x 0N;
a[y]:x y;
b:count[x]#0b;
b[y]:1b;
c:a i;
c[k]:first'[a[j]#'where'[b j]];
c}[I[;0];I J;J:where 1<>count'[I:value G]]/:\:[t v;value F]}
I subsequently apply this pivot function to the example with the functions f and g set to their default (::) values but I get an error message:
piv[`trades_agg;`date`sym`time;`exchange`buysell;`shares;(::);(::)]
Even when I use the suggested f and g functions it doesn't work:
f:{[v;P]`$raze each string raze P[;0],'/:v,/:\:P[;1]}
g:{[k;P;c]k,(raze/)flip flip each 5 cut'10 cut raze reverse 10 cut asc c}
I don't get why this is not working correctly since it is so close to the example on the website.
This is a self-contained version that's easier to use:
tt:1000#0!trades_agg
piv:{[t;k;p;v]
/ controls new columns names
f:{[v;P]`${raze " " sv x} each string raze P[;0],'/:v,/:\:P[;1]};
v:(),v; k:(),k; p:(),p; / make sure args are lists
G:group flip k!(t:.Q.v t)k;
F:group flip p!t p;
key[G]!flip(C:f[v]P:flip value flip key F)!raze
{[i;j;k;x;y]
a:count[x]#x 0N;
a[y]:x y;
b:count[x]#0b;
b[y]:1b;
c:a i;
c[k]:first'[a[j]#'where'[b j]];
c}[I[;0];I J;J:where 1<>count'[I:value G]]/:\:[t v;value F]};
q)piv[`tt;`date`sym`time;`exchange`buysell;enlist `shares]
date sym time | BATS shares B BATS shares S NASDAQ shares B NASDAQ sha..
---------------------| ------------------------------------------------------..
2009.01.05 adkk 09:30| 577 359 499 452 ..
2009.01.05 adkk 09:45| 882 501 339 467 ..
2009.01.05 adkk 10:00| 620 513 411 128 ..
2009.01.05 adkk 10:15| 501 544 272 544 ..
2009.01.05 adkk 10:30| 291 594 363 331 ..
2009.01.05 adkk 10:45| 867 500 498 536 ..
2009.01.05 adkk 11:00| 624 632 694 493 ..
2009.01.05 adkk 11:15| 99 704 600 299 ..
2009.01.05 adkk 11:30| 269 394 280 392 ..
2009.01.05 adkk 11:45| 635 744 758 597 ..
2009.01.05 adkk 12:00| 562 354 498 405 ..
2009.01.05 adkk 12:15| 416 437 303 492 ..
2009.01.05 adkk 12:30| 447 699 370 302 ..
2009.01.05 adkk 12:45| 336 647 512 245 ..
2009.01.05 adkk 13:00| 692 457 497 553 ..
Your table is keyed so unkey it:
trades_agg:0!select sum shares by date, sym, time: 15 xbar time.minute,exchange,buysell from trades
And define your g as:
g:{[k;P;c]k,c}
Best way to figure out what the f/g needs to be is to define it with a breakpoint and then investigate the variables
g:{[k;P;c]break}
I found it difficult to understand the original piv function in Ryan's answer, so I updated it by adding some comments and more readable variable names. HTH
piv:{[table; rows; columns; vals]
/ make sure args are lists
vals: (),vals;
rows: (),rows;
columns: (),columns;
/ Get columns of table corresponding to those of row labels and calculate groups
/ group returns a dict whose keys are the unique row labels and vals are the row indices of each group e.g. (0 1 3; 2 4; ...)
rowGroups: group rows#table;
rowGroupIdxs: value rowGroups;
rowValues: key[rowGroups];
/ Similarly, get columns of table corresponding to those of column labels and calculate groups
colGroups: group columns#table;
colGroupIdxs: value colGroups;
colValues: key colGroups;
getPivotCol: {[rowGroupStartIdx; nonSingleRowGroups; nonSingleRowGroupsIdx; vals; colGroupIdxs]
/ vals: the list of values for this particular value-column combination
/ colGroupIdxs: the list of indices for this particular column group
/ We only care about vals that should belong in this pivot column - we need to filter out vals not part of this column group
filteredValues: count[vals]#vals[0N];
filteredValues[colGroupIdxs]: vals[colGroupIdxs];
/ Equivalent to filteredValues <> 0N
hasValue: count[vals]#0b;
hasValue[colGroupIdxs]: 1b;
/ Seed off pivot column with the first (filtered) value of each row group
/ This will be correct for row groups of size 1 as no aggregation needs to occur
pivotCol: filteredValues[rowGroupStartIdx];
/ Otherwise, for the row groups larger than 1, get the first (filtered) value
pivotCol[nonSingleRowGroupsIdx]: first'[filteredValues[nonSingleRowGroups]#'where'[hasValue[nonSingleRowGroups]]];
pivotCol
};
/ Groups with more than 1 row (these are the ones that will need aggregating)
nonSingleRowGroupsIdx: where 1 <> count'[rowGroupIdxs];
/ Get resulting pivot column for each combination of column and value fields
pivotCols: raze getPivotCol[rowGroupIdxs[;0]; rowGroupIdxs[nonSingleRowGroupsIdx]; nonSingleRowGroupsIdx] /:\: [table[vals]; colGroupIdxs];
/ Columns names are the cross-product of column and value fields
colNames:`${raze "" sv x} each string raze (flip value flip colValues),'/:vals;
/ Finally, stitch together row and column headings with pivot data to obtain final table
rowValues!flip colNames!pivotCols
};
I also made a small change to the formatting of the column names for my needs, btw.
Let's say I have a table like this:
post user date
____ ____ ________________
1 A 12.01.2014 13:05
2 B 15.01.2014 20:17
3 A 16.01.2014 05:22
I want to create a smaller table (but not delete the original one!) containing all posts of - for example - user A including the dates that those were posted on.
When looking at MATLAB's documentation (see the very last part for deleting rows) I discovered that MATLAB allows you to create a mask for a table based on some criterion. So in my case if I do something like this:
postsA = myTable.user == 'A'
I get a nice mask vector as follows:
>> postsA =
1
0
1
where the 1s are obviously those rows in myTable, which satisfy the rule I have given.
In the documentation I pointed to above, rows are deleted from the original table:
postsNotA = myTable.user ~= 'A' % note that I have to reverse the criterion since I'm choosing stuff that will be removed
myTable(postsNotA,:) = [];
I would however - as stated above - like to not touch my original table. One possible solution here is to create an empty table with two columns:
post date
____ ____
iterate through all rows of my original table, while also looking at the current value of my mask vector postsA, and if it's equal to 1, copy the two columns in that row that I'm interested in and concatenate this shrunk row to my smaller table. What I'd like to know is whether there is a more or less 1-2 line solution for this problem?
Assuming myTable is your original table.
You can just do
myTable(myTable.user == 'A',:)
Sample Code:
user = ['A';'B';'A';'C';'B'];
Age = [38;43;38;40;49];
Height = [71;69;64;67;64];
Weight = [176;163;131;133;119];
BloodPressure = [124 93; 109 77; 125 83; 117 75; 122 80];
T = table(user,Age,Height,Weight,BloodPressure)
T(T.user=='A',:)
Gives:
T =
user Age Height Weight BloodPressure
____ ___ ______ ______ _________________________
A 38 71 176 124 93
B 43 69 163 109 77
A 38 64 131 125 83
C 40 67 133 117 75
B 49 64 119 122 80
ans =
user Age Height Weight BloodPressure
____ ___ ______ ______ _________________________
A 38 71 176 124 93
A 38 64 131 125 83
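The question also asks for only the post and date columns. As a sketch, assuming a table shaped like the one in the question (the construction of myTable below is made up for illustration), the column names can be passed as the second index. The original table is left untouched because indexing with parentheses returns a new table.

```matlab
% Hypothetical reconstruction of the table from the question
post  = [1; 2; 3];
user  = ['A'; 'B'; 'A'];
dates = {'12.01.2014 13:05'; '15.01.2014 20:17'; '16.01.2014 05:22'};
myTable = table(post, user, dates, ...
    'VariableNames', {'post', 'user', 'date'});

% Logical mask for the rows of interest
isUserA = myTable.user == 'A';

% Keep the matching rows and only the two columns of interest
postsA = myTable(isUserA, {'post', 'date'});
```

Here postsA holds posts 1 and 3 with their dates, while myTable still has all three rows.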