Error in post hoc test for lmer(): both multcomp() and emmeans() - linear-regression

I have a dataset of measurements of "Y" at different locations, and I am trying to determine how variable Y is influenced by variables A, B, and D by running a lmer() model and analyzing the results. However, when I reach the post hoc step, I receive an error when trying to analyze.
Here is an example of my data:
table <- " ID location A B C D Y
1 1 AA 0 0.6181587 -29.67 14.14 168.041
2 2 AA 1 0.5816176 -29.42 14.21 200.991
3 3 AA 2 0.4289670 -28.57 13.55 200.343
4 4 AA 3 0.4158891 -28.59 12.68 215.638
5 5 AA 4 0.3172721 -28.74 12.28 173.299
6 6 AA 5 0.1540603 -27.86 14.01 104.246
7 7 AA 6 0.1219355 -27.18 14.43 128.141
8 8 AA 7 0.1016643 -26.86 13.75 179.330
9 9 BB 0 0.6831649 -28.93 17.03 210.066
10 10 BB 1 0.6796935 -28.54 18.31 280.249
11 11 BB 2 0.5497743 -27.88 17.33 134.023
12 12 BB 3 0.3631052 -27.48 16.79 142.383
13 13 BB 4 0.3875498 -26.98 17.81 136.647
14 14 BB 5 0.3883785 -26.71 17.56 142.179
15 15 BB 6 0.4058061 -26.72 17.71 109.826
16 16 CC 0 0.8647298 -28.53 11.93 220.464
17 17 CC 1 0.8664036 -28.39 11.59 326.868
18 18 CC 2 0.7480748 -27.61 11.75 322.745
19 19 CC 3 0.5959143 -26.81 13.27 170.064
20 20 CC 4 0.4849077 -26.77 14.68 118.092
21 21 CC 5 0.3584687 -26.65 15.65 95.512
22 22 CC 6 0.3018285 -26.33 16.11 71.717
23 23 CC 7 0.2629121 -26.39 16.16 60.052
24 24 DD 0 0.8673077 -27.93 12.09 234.244
25 25 DD 1 0.8226558 -27.96 12.13 244.903
26 26 DD 2 0.7826429 -27.44 12.38 252.485
27 27 DD 3 0.6620447 -27.23 13.84 150.886
28 28 DD 4 0.4453213 -27.03 15.73 102.787
29 29 DD 5 0.3720257 -27.13 16.27 109.201
30 30 DD 6 0.6040217 -27.79 16.41 101.509
31 31 EE 0 0.8770987 -28.62 12.72 239.036
32 32 EE 1 0.8504547 -28.47 12.92 220.600
33 33 EE 2 0.8329484 -28.45 12.94 174.979
34 34 EE 3 0.8181102 -28.37 13.17 138.412
35 35 EE 4 0.7942685 -28.32 13.69 121.330
36 36 EE 5 0.7319724 -28.22 14.62 111.851
37 37 EE 6 0.7014828 -28.24 15.04 110.447
38 38 EE 7 0.7286984 -28.15 15.18 121.831"
#Create a dataframe with the above table
df <- read.table(text=table, header = TRUE)
df
# Make sure location is a factor
df$location<-as.factor(df$location)
Here is my model:
# Load libraries
library(ggplot2)
library(pscl)
library(lmtest)
library(lme4)
library(car)
mod = lmer(Y ~ A * B * poly(D, 2) * (1|location), data = df)
summary(mod)
plot(mod)
I now need to determine what variables significantly influence Y. Thus I ran Anova() from the package car (output pasted here).
Anova(mod)
# Analysis of Deviance Table (Type II Wald chisquare tests)
#
# Response: Y
# Chisq Df Pr(>Chisq)
# A 8.2754 1 0.004019 **
# B 0.0053 1 0.941974
# poly(D, 2) 40.4618 2 1.636e-09 ***
# A:B 0.1709 1 0.679348
# A:poly(D, 2) 1.6460 2 0.439117
# B:poly(D, 2) 5.2601 2 0.072076 .
# A:B:poly(D, 2) 0.6372 2 0.727175
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
This suggests that:
A significantly influences Y
B does not significantly influence Y
D significantly influences Y
So next I would run a post hoc test for each of these variables, but this is where I run into issues. I have tried using both emmeans and multcomp packages below:
library(emmeans)
emmeans(mod, list(pairwise ~ A), adjust = "tukey")
# NOTE: Results may be misleading due to involvement in interactions
# Error in if ((misc$estType == "pairs") && (paste(c("", by), collapse = ",") != :
# missing value where TRUE/FALSE needed
pairs(emmeans(mod, "A"))
# NOTE: Results may be misleading due to involvement in interactions
# Error in if ((misc$estType == "pairs") && (paste(c("", by), collapse = ",") != :
# missing value where TRUE/FALSE needed
library(multcomp)
summary(glht(mod, linfct = mcp(A = "Tukey")), test = adjusted("fdr"))
# Error in h(simpleError(msg, call)) :
# error in evaluating the argument 'object' in selecting a method for function 'summary': Variable(s) ‘depth’ of class ‘integer’ is/are not contained as a factor in ‘model’.
This is the first time I've run an ANOVA/post hoc test on a lmer() model, and though I've read a few introductory sites for this model, I'm not sure I am testing it correctly. Any help would be appreciated.

If I am looking at the data correctly, A is the variable that has values of 0, 1, ..., 7. Now look at your anova table, where you see that A has only 1 d.f., not 7 as it should for a factor having 8 levels. That means your model is taking A to be a numerical predictor -- which is rather meaningless. Make A into a factor and re-fit he model. You'll have better luck.
I also think you meant to have + (1|location) at the end of the model formula, rather than having the random effects interacting with some of the polynomial effects.

Related

how I delete combination rows that have the same numbers from matrix and only keeping one of the combinations?

for a=1:50; %numbers 1 through 50
for b=1:50;
c=sqrt(a^2+b^2);
if c<=50&c(rem(c,1)==0);%if display only if c<=50 and c=c/1 has remainder of 0
pyth=[a,b,c];%pythagorean matrix
disp(pyth)
else c(rem(c,1)~=0);%if remainder doesn't equal to 0, omit output
end
end
end
answer=
3 4 5
4 3 5
5 12 13
6 8 10
7 24 25
8 6 10
8 15 17
9 12 15
9 40 41
10 24 26
12 5 13
12 9 15
12 16 20
12 35 37
14 48 50
15 8 17
15 20 25
15 36 39
16 12 20
16 30 34
18 24 30
20 15 25
20 21 29
21 20 29
21 28 35
24 7 25
24 10 26
24 18 30
24 32 40
27 36 45
28 21 35
30 16 34
30 40 50
32 24 40
35 12 37
36 15 39
36 27 45
40 9 41
40 30 50
48 14 50
This problem involves the Pythagorean theorem but we cannot use the built in function so I had to write one myself. The problem is for example columns 1 & 2 from the first two rows have the same numbers. How do I code it so it only deletes one of the rows if the columns 1 and 2 have the same number combination? I've tried unique function but it doesn't really delete the combinations. I have read about deleting duplicates from previous posts but those have confused me even more. Any help on how to go about this problem will help me immensely!
Thank you
welcome to StackOverflow.
The problem in your code seems to be, that pyth only contains 3 values, [a, b, c]. The unique() funcion used in the next line has no effect in that case, because only one row is contained in pyth. another issue is, that the values idx and out are calculated in each loop cycle. This should be placed after the loops. An example code could look like this:
pyth = zeros(0,3);
for a=1:50
for b=1:50
c = sqrt(a^2 + b^2);
if c<=50 && rem(c,1)==0
abc_sorted = sort([a,b,c]);
pyth = [pyth; abc_sorted];
end
end
end
% do final sorting outside of the loop
[~,idx] = unique(pyth, 'rows', 'stable');
out = pyth(idx,:);
disp(out)
a few other tips for writing MATLAB code:
You do not need to end for or if/else stements with a semicolon
else statements cover any other case not included before, so they do not need a condition.
Some performance reommendations:
Due to the symmetry of a and b (a^2 + b^2 = b^2 + a^2) the b loop could be constrained to for b=1:a, which would roughly save you half of the loop cycles.
if you use && for contencation of scalar values, the second part is not evaluated, if the first part already fails (source).
Regards,
Chris
You can also linearize your algorithm (but we're still using bruteforce):
[X,Y] = meshgrid(1:50,1:50); %generate all the combination
C = (X(:).^2+Y(:).^2).^0.5; %sums of two square for every combination
ind = find(rem(C,1)==0 & C<=50); %get the index
res = unique([sort([X(ind),Y(ind)],2),C(ind)],'rows'); %check for uniqueness
Now you could really optimized your algorithm using math, you should read this question. It will be useful if n>>50.

(q/kdb+) Generating an automated list

Example 1)
I have the code below
5#10+1*2
that generates
index value
0 12
1 12
2 12
3 12
4 12
How can I replace the number "1" by the index?
then generating
5#10+index*2
index value
0 10
1 12
2 14
3 16
4 18
update Example 2)
Now, if I have, let's say
mult:5;
t:select from ([]numC:1 3 6 4 1;[]s:50 16 53 6 33);
update lst:(numC#'s) from t
the last update will generate
numC s lst
1 50 50
3 16 16 16 16
6 53 53 53 53 53 53 53
4 6 6 6 6 6
1 33 33
How can I generate the "lst" column as per below?
numC s lst
1 50 50+0*mult
3 16 16+0*mult 16+1*mult 16+2*mult
6 53 53+0*mult 53+1*mult 53+2*mult 53+3*mult 53+4*mult 53+5*mult
4 6 6+0*mult 6+1*mult 6+2*mult 6+3*mult
1 33 33+0*mult
I tried something like
update lst:(numC#'s + (til numC)*mult) from t
but I am getting an error
ERROR: 'type
Thanks vm
Is this what you're looking for:
q)x:5
q)x#10+(til x)*2
10 12 14 16 18
http://code.kx.com/q/ref/arith-integer/#til
You can remove take # and use til to simplify to:
q)10+2*til 5
10 12 14 16 18
Using til will create a list of a list of 5 elements (0->4), so you will not need take 5 elements from the resulting list. Take will only be required if your list of indices is greater than 5.
Update:
For your second example the following should work:
q)update lst:{y+x*til z}'[mult;s;numC] from t
q)update lst:s+mult*til each numC from t
numC s lst
-------------------------
1 50 ,50
3 16 16 21 26
6 53 53 58 63 68 73 78
4 6 6 11 16 21
1 33 ,33
There are many ways with which we can get achieve this:
1) 10+2*til 5
2) (2*til 5) + 10
/ take operator: The dyadic take function creates lists. The left argument specifies the count and shape and the right argument provides the data.
It is useful for selecting from the front or end of a list.
https://code.kx.com/wiki/Reference/NumberSign
q)5#0 1 2 3 4 5 6 7 8 / take the first 5 items
0 1 2 3 4
q)-5#0 1 2 3 4 5 6 7 8 / take the last 5 elements
4 5 6 7 8
use take operator # only when it is required.
say we have 10 elements, of which we need five on output, then we can use:
5#10+2*til 10
/ The til function takes a non-negative integer argument X and returns the first X integers

Read a big data file with headlines into a matrix

I have a file that looks like this (with real data and much bigger):
A B C D E F G H I
1 105.28 1 22 84 2 10.55 21 2
2 357.01 0 32 34 1 11.43 28 1
3 150.23 3 78 22 0 12.02 11 0
4 357.01 0 32 34 1 11.43 28 1
5 357.01 0 32 34 1 11.43 28 1
6 357.01 0 32 34 1 11.43 28 1
...
17000 357.01 0 32 34 1 11.43 28 1
I want to import all the numerical value into a matrix, skipping the headlines. For that purpose I use this code:
Filename = 'test.txt';
A = dlmread(Filename,' ',1,0); %Imports the whole data into a matrix
The problem with this is just that A is a 17 000 * 1 vector instead of a matrix with several columns. If I manual edit the data file, remove the headlines and just run this it works:
A = dlmread(Filename); %Imports the whole data into a matrix
But I would prefer not to do this since the headlines are used later on in the code. Any advice how to get this work?
edit: solved by using
' '
instead of just
' '
Use the import tool.
Make sure you choose the data.
Generate script.

Matlab isn't incrementing my variable

I have the following matrix declared in Matlab:
EmployeeData =
1 20 100000 42 14
2 15 95000 35 14
3 18 70000 28 14
4 10 85000 35 14
5 10 40000 21 12
6 4 45000 14 8
7 3 50000 21 10
8 5 55000 21 14
9 1 25000 14 7
10 2 50000 21 9
42 4 100000 42 10
Where column 1 represents ID numbers, 2 represents years, 3 is salary, 4 is vacation days, and 5 is sick days. I am trying to find the maximum value of a column (in this case the salary column), and print out the ID associated with that value. If more than one employee has the maximum value, all the IDs with that maximum are supposed to be shown. So here is how I naively implemented a way to do it:
>> maxVal = [];
>> j = 1;
>> for i = EmployeeData(:, 3)
if i == max(EmployeeData(:, 3))
maxVal = [maxVal EmployeeData(j, 1)];
end
j = j + 1;
end
But it shows maxVal to be [] in my workspace variables, instead of [1 42] as I expected. Upon inserting a disp(i) in the for loop above the if to debug, I get the following output:
100000
95000
70000
85000
40000
45000
50000
55000
25000
50000
Just like I expected. But when I switch out that disp(i) with a disp(j), I get this for my output:
1
What am I doing wrong? Should this not work?
MATLAB for loops operate on rows, not columns. You should try replacing your for loop with:
for i = EmployeeData(:, 3)' % NOTE THE TRANSPOSE
...
end
EDIT: Note that you can do what you're trying to do without a forloop:
maxVal = EmployeeData(EmployeeData(:,3) == max(EmployeeData(:,3)),1);
Is this what you want?
>> EmployeeData(EmployeeData(:,3)==max(EmployeeData(:,3)),1)
ans =
1
42

Matrix division & permutation to achieve Baker map

I'm trying to implement the Baker map.
Is there a function that would allow one to divide a 8 x 8 matrix by providing, for example, a sequence of divisors 2, 4, 2 and rearranging pixels in the order as shown in the matrices below?
X = reshape(1:64,8,8);
After applying divisors 2,4,2 to the matrix X one should get a matrix like A shown below.
A=[31 23 15 7 32 24 16 8;
63 55 47 39 64 56 48 40;
11 3 12 4 13 5 14 6;
27 19 28 20 29 21 30 22;
43 35 44 36 45 37 46 38;
59 51 60 52 61 53 62 54;
25 17 9 1 26 18 10 2;
57 49 41 33 58 50 42 34]
The link to the document which I am working on is:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.39.5132&rep=rep1&type=pdf
This is what I want to achieve:
Edit: a little more generic solution:
%function Z = bakermap(X,divisors)
function Z = bakermap()
X = reshape(1:64,8,8)'
divisors = [ 2 4 2 ];
[x,y] = size(X);
offsets = sum(divisors)-fliplr(cumsum(fliplr(divisors)));
if any(mod(y,divisors)) && ~(sum(divisors) == y)
disp('invalid divisor vector')
return
end
blocks = #(div) cell2mat( cellfun(#mtimes, repmat({ones(x/div,div)},div,1),...
num2cell(1:div)',...
'UniformOutput',false) );
%create index matrix
I = [];
for ii = 1:numel(divisors);
I = [I, blocks(divisors(ii))+offsets(ii)];
end
%create Baker map
Y = flipud(X);
Z = [];
for jj=1:I(end)
Z = [Z; Y(I==jj)'];
end
Z = flipud(Z);
end
returns:
index matrix:
I =
1 1 3 3 3 3 7 7
1 1 3 3 3 3 7 7
1 1 4 4 4 4 7 7
1 1 4 4 4 4 7 7
2 2 5 5 5 5 8 8
2 2 5 5 5 5 8 8
2 2 6 6 6 6 8 8
2 2 6 6 6 6 8 8
Baker map:
Z =
31 23 15 7 32 24 16 8
63 55 47 39 64 56 48 40
11 3 12 4 13 5 14 6
27 19 28 20 29 21 30 22
43 35 44 36 45 37 46 38
59 51 60 52 61 53 62 54
25 17 9 1 26 18 10 2
57 49 41 33 58 50 42 34
But have a look at the if-condition, it's just possible for these cases. I don't know if that's enough. I also tried something like divisors = [ 1 4 1 2 ] - and it worked. As long as the sum of all divisors is equal the row-length and the modulus as well, there shouldn't be problems.
Explanation:
% definition of anonymous function with input parameter: div: divisor vector
blocks = #(div) cell2mat( ... % converts final result into matrix
cellfun(#mtimes, ... % multiplies the next two inputs A,B
repmat(... % A...
{ones(x/div,div)},... % cell with a matrix of ones in size
of one subblock, e.g. [1,1,1,1;1,1,1,1]
div,1),... % which is replicated div-times according
to actual by cellfun processed divisor
num2cell(1:div)',... % creates a vector [1,2,3,4...] according
to the number of divisors, so so finally
every Block A gets an increasing factor
'UniformOutput',false...% necessary additional property of cellfun
));
Have also a look at this revision to have a simpler insight in what is happening. You requested a generic solution, thats the one above, the one linked was with more manual inputs.