Loops for covariates: making SPSS choose covariates according to predictor in linear regression - multiple-regression

In my data set, I have several independent variables (here named "predictor1", "predictor2" etc.) and several dependent variables ("outcomeA", "outcomeB" etc). Furthermore, I have several covariates ("covariate1", "covariate2", etc.).
I want to do linear regression analysis, in which I
FIRST use predictor1 to predict all outcomes, while adjusting for covariate1.
THEN use predictor2 to predict all outcomes, while adjusting for covariate2.
THEN use predictor3 to predict all outcomes, while adjusting for covariate3.
etc.
I know how to build a loop that will use predictor1 to predict all outcomes, then use predictor2 to predict all outcomes, etc. I also know how to add covariates that would be used for all models regardless of the predictor in question, but that's no use.
What I don't know is this: how do I "couple" covariates with predictors, so that when I predict the outcomes with predictor1, I will also adjust for covariate1? Then, when I predict the outcomes with predictor2, I will adjust for covariate2, etc.
Below here is my syntax for doing this so that all models include the same covariates. How do I change this so that SPSS will not use all these covariates for all models, but will choose them according to the independent variable? Can I build two lists for independent variables that the loop will go through (e.g. "!indepvars1" and "!indepvars2") or something similar, or what could I do?
Obviously, I have no experience in programming, and I just couldn't get it to work on my own. Perhaps the answer is obvious.
PRESERVE.
SET TVARS NAMES.
oms select tables
/destination format = sav
numbered = "Table_Number"
outfile = '\\ATKK\visit1_TEMP1.sav'
/if commands = ['regression']
subtypes = ['Coefficients']
/tag = "reg".
*////////////////////.
DEFINE !regtest100 (indepvars=!charend ('/') /depvars=!CMDEND)
!DO !depvar !IN (!depvars)
!DO !indepvar !IN (!indepvars)
regression
/STATISTICS COEFF OUTS CI(95) R ANOVA
/dependent = !depvar
/method = enter !indepvar covariate1 covariate2 covariate3.
!DOEND
!DOEND
!ENDDEFINE.
*////////////////////.
!regtest100
indepvars= predictor1 predictor2 predictor3
/ depvars= outcomeA outcomeB outcomeC.
EXECUTE.

There are multiple ways you can achieve this, depending on how you expect your inputs to arrive.
Below is one method, where both the predictor and covariate variables are provided as arguments to the macro parameters. The variables are paired by position. (The number of variables entered in the indepvars parameter and in the covars parameter is expected to be equal. I would build an explicit check for this to be thorough.)
I've added notes in the macro body to explain some of the logic, which I hope may help.
*////////////////////.
define !RunJob1 (depvars=!charend ('/') /indepvars=!charend ('/') /covars=!charend ('/'))
!do !dv !in (!depvars)
!let !cv=!covars /* make copy of original list of variables */
!do !iv !in (!indepvars)
!let !cvhead=!head(!cv) /* retrieve first variable in list */
!let !cv=!tail(!cv) /* retrieve all but first variable in list */
title !dv !iv !cvhead /* print results as test */ .
!doend
!doend
!enddefine.
*////////////////////.
!RunJob1 depvars=dv1 dv2 dv3 /indepvars=Apple Bananna Carrot /covars=A1 B2 C3.
Alternatively, if your input variables follow a particular naming format, with a predefined prefix stub followed by a numeric suffix, you could approach this slightly differently, which is also somewhat easier to code. Here is an example of that approach:
*////////////////////.
define !RunJob2 (depvars=!charend ('/'))
!do !dv !in (!depvars)
!do !i = 1 !to 3
title !dv !concat("iv", !i) !concat("cv",!i) .
!doend
!doend
!enddefine.
*////////////////////.
!RunJob2 depvars=dv1 dv2 dv3.
Obviously, many assumptions are being made here. You'll have to assess your data to see which approach best suits your needs, if either does, or whether something else may be more appropriate.

With no experience in programming, it still took me a little while to put the great advice given here by Jignesh Sutar into practice. Here's the final SPSS syntax, in case it helps others who really need "script for dummies" type examples.
*////////////////////.
DEFINE
!regtest101 (depvars=!charend ('/') /indepvars=!charend ('/') /covars=!charend ('/'))
!DO !depvar !IN (!depvars)
!let !cv=!covars
!DO !indepvar !IN (!indepvars)
!let !cvhead=!head(!cv)
!let !cv=!tail(!cv)
regression
/STATISTICS COEFF OUTS CI(95) R ANOVA
/dependent = !depvar
/method = enter !indepvar !cvhead extracovariateA extracovariateB.
!doend
!doend
!enddefine.
*////////////////////.
!regtest101
depvars= outcomeA outcomeB outcomeC
/indepvars= predictor1 predictor2 predictor3
/covars= covariate1 covariate2 covariate3.
EXECUTE.
The two extra covariates (extracovariateA, extracovariateB) are additional covariates that are included in all models, while covariate1, covariate2, and covariate3 are the ones that are added in consecutive models ("paired up" with predictor1, predictor2, predictor3).

Related

Is it possible to use the tbl_regression function with the lmer function with random effects?

I work on antifungal activity of some molecules ("cyclo") added with fungicides and I want to assess impact of these cyclos and their concentration ratio. CMI is a quantitative variable and all other variables are factors.
I have this script:
mod=lmer(CMI ~ cyclo*ratio + (1|fungicide) + (1|strains), data)
And I'd like to know if I can use tbl_regression() (library(gtsummary)) with my lmer() model.
If yes, what do I have to specify for the exponentiate argument?
If I write exponentiate=FALSE, I obtain the same values as the estimates in the classical summary(mod).
Thank you for your help
Steffi
The default behavior of tbl_regression() for mixed-effects models is to print the fixed effects only. To see the full output, including the random components, you need to override the default function for tidying up the model results using the tidy_fun= argument.
library(gtsummary)
lme4::lmer(age ~ marker + (1|grade), trial) %>%
  tbl_regression(
    # set the tidying function to broom.mixed::tidy to show random effects
    tidy_fun = broom.mixed::tidy
  )
You can use the label= argument to update the label displayed for the random components if you wish.
The default is exponentiate = FALSE, so you don't need to specify it in the tbl_regression() call.
For more details on the tidy_fun= argument, you can review this help file: http://www.danieldsjoberg.com/gtsummary/reference/vetted_models.html
Hope this helps! Happy Coding!

Only add existing variables in Matlab

I would like to add the sums of 5 one-column arrays in Matlab. The catch is that depending on previous inputs, any of these arrays may or may not exist, thus throwing an error when I try to add the sums of these arrays for post-processing.
After doing some digging I found the following function which I can use to return a logical statement if a certain variable exists in the workspace:
exist('my_variable','var') == 1
I'm not sure this helps me in this case though - Currently the line in which I add the sums of the various arrays looks as follows:
tot_deviation = sum(var1) + sum(var2) + sum(var3) + sum(var4) + sum(var5);
Is there a short way to add only the sums of the existing arrays without excessive loops?
Any help would be appreciated!
You can use if statements:
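% replace any missing array with 0 so that it does not affect the total
if ~exist('var1','var'), var1 = 0; end
if ~exist('var2','var'), var2 = 0; end
if ~exist('var3','var'), var3 = 0; end
if ~exist('var4','var'), var4 = 0; end
if ~exist('var5','var'), var5 = 0; end
tot_deviation = sum(var1) + sum(var2) + sum(var3) + sum(var4) + sum(var5);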
To my knowledge, there is no quick way to do this in MATLAB. It seems to me that, depending on the structure of your code, you have the following alternatives:
1- Initialize all your variables to column arrays of zeros with something like
var = zeros(nbLines, 1);
2- Put all your column vectors side by side in a single array and then use tot_deviation = sum(sum(MyArray)); which will work no matter how many columns and rows there are in the array.
3- If you pass your variables to a function, you can check the number of input arguments inside the function using 'nargin' and then only sum the right number of variables.
I would recommend the second method, since it seems to me to be the one that takes the most advantage of MATLAB's array system.
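As a minimal sketch of the second alternative, assuming every column has the same number of rows (nbLines and the sample values are made up for illustration), columns that were never computed can simply stay at zero:

```matlab
% Hypothetical example: keep the columns in one matrix instead of in
% separate variables, so a "missing" column is just a column of zeros.
nbLines = 3;                      % assumed common length of every column
MyArray = zeros(nbLines, 5);      % one column per would-be variable
MyArray(:, 1) = [1; 2; 3];        % "var1" was computed
MyArray(:, 4) = [0.5; 0.5; 1];    % "var4" was computed; the rest stay zero

tot_deviation = sum(MyArray(:));  % equivalent to sum(sum(MyArray))
```

If the columns can have different lengths, this layout does not apply directly and a cell array or struct is probably the better container.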
The most robust solution is to initialise all of your variables to 0 at the top of the function. Then there is no chance they don't exist, and they influence the summation correctly.
Alternatively...
You could (read: shouldn't) use a really nasty eval trick here for flexibility...
vars = {'var1','var2','var3','var4','var5'};
tot = 0;
for ii = 1:numel(vars)
    if exist(vars{ii}, 'var')
        % eval returns the array stored under that name; add its sum
        tot = tot + sum(eval(vars{ii}));
    end
end
I say it's "nasty" because eval should be avoided (read the linked blog). The check on the variable name existence mitigates some of the strife, but it's still not ideal.
As suggested in the MathWorks blog on evading eval, a better option would be a struct with dynamic field names. You could use almost the same syntax as above, but replace the if statement with
if isfield( myStruct, vars{ii} )
tot = tot + sum(myStruct.(vars{ii}));
end
This will avoid dynamically named variables and keep your workspace clean!
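Put together, the struct-based variant might look like the sketch below. The struct name myStruct comes from the snippet above; the field contents are hypothetical sample data, and it is assumed that the arrays are stored as fields only when they have actually been computed.

```matlab
% Hypothetical data: only some of the expected arrays were computed.
myStruct = struct();
myStruct.var1 = [1; 2; 3];
myStruct.var3 = [10; 20];

vars = {'var1', 'var2', 'var3', 'var4', 'var5'};
tot = 0;
for ii = 1:numel(vars)
    if isfield(myStruct, vars{ii})
        tot = tot + sum(myStruct.(vars{ii}));   % missing fields are skipped
    end
end

% If every field of the struct should be included, the loop can be
% replaced by structfun, which applies sum to each field in turn.
totAll = sum(structfun(@sum, myStruct));
```

The loop version is the safer choice when the struct may also hold fields that should not be part of the total.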

Find variables and the dependencies between them in code

I am trying to write a function that finds all the variables in a function which contains computing operations. I do this to find out which variables this block of calculations requires as input arguments to perform the calculations.
A calculation function always gets a table with different parameters as input, accesses certain parameters of that table to calculate certain metrics.
For example my data table T contains the double arrays Power and Time. This table is passed to the function calc_energy which then calculates Energy:
function [ T ] = calc_energy( T )
T.Energy = T.Power .* T.Time;
end
Say that whenever a new data table is generated, various calculations like the one above are run automatically. Now if Power itself is calculated by the function calc_power, but calc_energy is run before or in parallel with calc_power, then I have a problem because calc_energy doesn't find the required variables in T.
In order to prevent such an error, I want my function check_required_variables to be called inside the calc_energy and check beforehand whether T.Power exists. The thing is that check_required_variables should be a general function that is called in every single calculation function and therefore doesn't know the required variables. It has to find them in the function that it is called by.
function [ T ] = calc_energy( T )
OK = check_required_variables( T, 'calc_energy.m' );
if OK == 1
T.Energy = T.Power .* T.Time;
else
error('Required fields not found');
end
end
Are there any functions that spot the variables Energy, Power and Time used in my code? And are there functions that can analyse the dependencies between these variables (Energy depends on Power and Time)? What are generally clever ways to figure out the occurring variables just from the code? Any ideas?
I know this is a tough one so I'm thankful for any suggestions.
Use isfield:
function [ T ] = calc_energy( T )
if all(isfield(T,{'Power','Time'}))
T.Energy = T.Power .* T.Time;
else
error('Nooooope')
end
end
However, knowing which ones are required seems a bit harder. You can always try to read the .m file, regexp for T.____, and then pass the result to isfield.
This, however, just hints at bad software design. A function should know what it requires to run. What happens if OK is false? Does it just skip the computation? You could then cascade into hundreds of failed calls (depending on the application) because the original structure was missing a variable or had a typo. I'd take the radically opposite approach to software design:
function [ T ] = calc_energy( T )
assert(all(isfield(T,{'Power','Time'}))); % errors if any required field is missing
T.Energy = T.Power .* T.Time;
end
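For completeness, the regexp idea mentioned above could be sketched roughly as follows. This is only an illustration under several assumptions: the data is always accessed through a variable literally named T, the source text would normally come from fileread('calc_energy.m') (a literal string is used here so the example is self-contained), and a name is treated as "required" if it is read but never assigned. Note also that isfield only works for structs; if T is a MATLAB table, isfield returns false and ismember(required, T.Properties.VariableNames) would be needed instead.

```matlab
% Source text of the function to inspect (stand-in for fileread(...)).
code = sprintf('function [ T ] = calc_energy( T )\nT.Energy = T.Power .* T.Time;\nend');

% Every name accessed as T.<name>; each token is a 1x1 cell holding the name.
tokens = regexp(code, '\<T\.([A-Za-z]\w*)', 'tokens');
used = unique(cellfun(@(c) c{1}, tokens, 'UniformOutput', false));

% Names on the left-hand side of an assignment (T.<name> = ...) are outputs.
assignedTok = regexp(code, '\<T\.([A-Za-z]\w*)\s*=(?!=)', 'tokens');
assigned = unique(cellfun(@(c) c{1}, assignedTok, 'UniformOutput', false));

% Names that are read but never assigned are treated as requirements.
required = setdiff(used, assigned);

% Hypothetical check against a struct with both required fields present.
T = struct('Power', [1 2], 'Time', [3 4]);
ok = all(isfield(T, required));
```

A text search like this also matches names inside comments and strings, and it cannot follow renamed inputs, so it is a heuristic rather than a real dependency analysis.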

Importing much data to Matlab

I have many .txt files like these: "u1.txt", "i3.txt", "p10.txt"... How can I load all of these files into MATLAB, into variables named "u1", "u2"... "p1"...? Here is my code:
clc, clear all
%% loading data
for j=0:3
switch j
case 0
variable='i';
case 1
variable='u';
case 2
variable='p';
case 3
variable='q';
end
for i=0:15
name = strcat(variable, int2str(i), '.txt')
fid=fopen(name,'r');
data=textscan(fid,'%*s%*s%s%s%s%*s','HeaderLines',10,'CollectOutput',1);
fclose(fid);
data=strrep(data{1},',','.');
data=cellfun(@str2num, data);
end
end
The problem is the variable data: how do I change its name to "u1", "u2"... "p1"... on each pass through the loop?
You could use the variable names u1, u2, etc., but I would strongly recommend against it. These consecutive variables u1 to u15 would then be completely separate variables, and MATLAB cannot iterate over them. For this purpose, I would use a struct that contains cell arrays. Use this line to assign (the +1 is needed because the file numbering starts at 0, while MATLAB indices start at 1):
allData.(variable){i+1}=data
And to get your data, instead of u1 use allData.u{2} (index = file number + 1). That is a few more characters to write, but having the data structured this way results in much simpler code when you use it.
Code:
for j=0:3
switch j
case 0
variable='i';
case 1
variable='u';
case 2
variable='p';
case 3
variable='q';
end
for i=0:15
name = strcat(variable, int2str(i), '.txt')
fid=fopen(name,'r');
data=textscan(fid,'%*s%*s%s%s%s%*s','HeaderLines',10,'CollectOutput',1);
fclose(fid);
data=strrep(data{1},',','.');
data=cellfun(@str2num, data);
allData.(variable){i+1}=data; % i starts at 0, cell indices start at 1
end
end
Use eval.
For example:
x = input('Enter the name of the new variable: ','s');
eval([x,'=0:4;']);
In your case:
variablename = strcat(variable,int2str(i));
eval([variablename,' = cellfun(@str2num, data);']);
This is a good read: Creating variables on the run
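You could use eval to do this (building the variable name without the .txt extension, since a variable name cannot contain a dot, and doubling the quotes inside the string):
eval([strcat(variable,int2str(i)) '=textscan(fid,''%*s%*s%s%s%s%*s'',''HeaderLines'',10,''CollectOutput'',1);']);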
I think that syntax is correct but I'm not sure. You might have to play around with it a bit to get it to work, which is a huge drawback to using eval in the first place. Not to mention the syntax parser can't work with it, making it more difficult to debug.
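My recommendation would be to use MATLAB's dynamic field names instead (again with a name that has no .txt extension, because a field name cannot contain a dot):
data.(strcat(variable,int2str(i)))=textscan(fid,'%*s%*s%s%s%s%*s','HeaderLines',10,'CollectOutput',1);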
Much cleaner and much easier to debug.
My other recommendation is that you evaluate why you need to utilize this naming scheme in the first place. It will be much easier to create an array (numeric or cell) and use that numeric ID as an index rather than include it with the variable name.
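A minimal sketch of that index-based layout, assuming the four prefixes and the 0..15 file numbering from the question (the data assigned inside the loop is a placeholder for the real fopen/textscan/str2num steps, and the names prefixes, nFiles and allData are made up for illustration):

```matlab
% Store each result by position instead of creating u0, u1, ... variables.
prefixes = {'i', 'u', 'p', 'q'};
nFiles = 16;                                   % files are numbered 0..15
allData = cell(numel(prefixes), nFiles);       % row = prefix, column = file

for j = 1:numel(prefixes)
    for k = 0:nFiles-1
        name = sprintf('%s%d.txt', prefixes{j}, k);   % e.g. 'u3.txt'
        % Placeholder value; the real code would read and parse 'name' here.
        data = [j; k];
        allData{j, k+1} = data;                % +1 because indices start at 1
    end
end

% Look up what came from 'u3.txt': row of prefix 'u', column 3+1.
u3 = allData{strcmp(prefixes, 'u'), 3+1};
```

With this layout, later processing can loop over rows and columns of allData directly, with no eval and no name parsing.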

Calculations in table based on variable names in matlab

I am trying to find a better way to do calculations on data stored in a table. I have a large table with many variables (100+), from which I select a smaller sub-table containing only two observations and their difference for a smaller selection of variables. The resulting table looks, for example, like this:
          air       bbs         bri
         ______    ________    ________
test1    12.451       0.549      3.6987
test2      10.2        0.47        3.99
diff      2.251    0.078999    -0.29132
Now, I need to multiply the 'diff' row by various coefficients that differ between variables. I can get the desired result with the following code:
T(4,:) = array2table([T.air(3)*0.2*0.25, T.bbs(3)*0.1*0.25, T.bri(3)*0.7*0.6/2]);
However, I need a more flexible solution, since the selection of variables will differ between applications. I was thinking that a better solution might be to use either varfun or rowfun with a specific function that assigns the correct coefficients/equations based on variable names:
T(4,:) = varfun(@func, T(3,:), 'InputVariables', {'air' 'bbs' 'bri'});
or
T(4,:) = rowfun(@func, T(3,:), 'OutputVariableNames', T.Properties.VariableNames);
However, the current solution I have is just as inflexible as the basic calculation above:
function [air_out, bbs_out, bri_out] = func(air, bbs, bri)
air_out = air*0.2*0.25;
bbs_out = bbs*0.1*0.25;
bri_out = bri*0.7*0.6/2;
since I need to define every input/output variable. What I need is a way to assign coefficients/equations to every variable inside the function, and for the function to apply them only to the variables that are present in the specific sub-table.
Any suggestions?