Why do I always get 1 for df when running adonis function (permanova)? - vegan

I ran adonis on community data and an environmental matrix (containing a two-level factor and 6 continuous variables) using Bray-Curtis distances, and I always get 1 df, which should not be the case. Is there a bug here?
See also the example in ?adonis:
data(dune)
data(dune.env)
str(dune.env)
adonis(dune ~ Management*A1, data=dune.env, permutations=99)
Although A1 is a numeric variable, the result gives it only 1 df.

In the model:
> adonis(dune ~ Management*A1, data=dune.env, permutations=99)
Call:
adonis(formula = dune ~ Management * A1, data = dune.env, permutations = 99)
Permutation: free
Number of permutations: 99
Terms added sequentially (first to last)
              Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)
Management     3    1.4686 0.48953  3.2629 0.34161   0.01 **
A1             1    0.4409 0.44089  2.9387 0.10256   0.02 *
Management:A1  3    0.5892 0.19639  1.3090 0.13705   0.21
Residuals     12    1.8004 0.15003         0.41878
Total         19    4.2990                 1.00000
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The main effect of A1 uses a single degree of freedom because it is a continuous variable. The interaction between Management and A1 uses 3 additional degrees of freedom, as there is one additional "effect" (slope) of A1 for each level of Management beyond the first.
This is all expected and there is certainly no bug illustrated in adonis() from this model.
Importantly, you must ensure that factor variables are coded as factors; otherwise (for example, if the categories are coded as integers) R will still interpret those variables as continuous/numeric. It will only interpret them as factors if they are coerced to the "factor" class. Check the output of str(df), where df is your data frame containing the predictor variables (covariates; the things on the right-hand side of ~), and ensure that each factor variable is of the appropriate class. For example, the dune.env data are:
> str(dune.env)
'data.frame': 20 obs. of 5 variables:
$ A1 : num 2.8 3.5 4.3 4.2 6.3 4.3 2.8 4.2 3.7 3.3 ...
$ Moisture : Ord.factor w/ 4 levels "1"<"2"<"4"<"5": 1 1 2 2 1 1 1 4 3 2 ...
$ Management: Factor w/ 4 levels "BF","HF","NM",..: 4 1 4 4 2 2 2 2 2 1 ...
$ Use : Ord.factor w/ 3 levels "Hayfield"<"Haypastu"<..: 2 2 2 2 1 2 3 3 1 1 ...
$ Manure : Ord.factor w/ 5 levels "0"<"1"<"2"<"3"<..: 5 3 5 5 3 3 4 4 2 2 ...
which indicates that Management is a factor, A1 is numeric (it is the thickness of the A1 soil horizon), and the remaining variables are ordered factors (but still factors; they work correctly in R's model formula infrastructure).
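The df bookkeeping can be mimicked outside R; below is a minimal Python sketch (hypothetical helper, using the treatment coding R applies by default) showing why a 4-level factor costs 3 df while a numeric column costs 1:

```python
def dummy_columns(values):
    """Treatment-code a categorical variable: k levels -> k-1 dummy (0/1) columns."""
    levels = sorted(set(values))          # first level becomes the reference
    return [[1 if v == lev else 0 for v in values] for lev in levels[1:]]

management = ["SF", "BF", "SF", "SF", "HF", "NM"]  # 4 levels, like dune.env$Management
a1 = [2.8, 3.5, 4.3, 4.2, 6.3, 4.3]                # numeric: enters as-is, 1 column = 1 df

dummies = dummy_columns(management)
print(len(dummies))   # 3 columns, hence 3 df for the factor's main effect
# the Management:A1 interaction multiplies each dummy by a1: another 3 columns = 3 df
```

If Management were stored as the integers 1-4 and not coerced to a factor, it would enter as a single numeric column and get 1 df, which is exactly the symptom in the question.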

Related

Interpolating matrices with 2 variables and 1 dependent value

I'm analyzing an induction motor, varying the frequency and absolute value of the stator current. Since the FEM-Tool only works with a current input, I need to vary the current over the frequency to obtain current-values of constant torque for each frequency.
To generate a mesh, I use 2 for-loops:
The outer loop sets the current.
The inner loop varies the frequency at that current, reads the machine's torque, and finally appends the current stator current, frequency and torque to three separate matrices. Plotted, it looks like this:
Example of the plot using the raw data
For the plot I used smaller, more imprecise matrices and rather arbitrary values:
I_S = [ 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 ];
fre = [ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 ];
tor = [ 0 0.1 0.3 0.5 0.7 1 1.5 2 2.6 3.3 0 1.1 1.3 1.5 1.7 2 2.5 3 3.6 4.3 0 2.1 2.3 2.5 2.7 3 3.5 4 4.6 5.3 ];
tor is shown as the colormap in the plot. Each matrix has a length of 30.
One simulation needs about 20-30 seconds, so generating a precise mesh takes the FEM-tool several hours.
I would like to interpolate between the known values.
It seems that either the way of creating the matrices is the problem or the interp*...-functions of Octave/MATLAB simply don't work for this kind of interpolation.
Is there a way to achieve a mesh/grid-like interpolation from this type of matrices? I found many examples with x,y as variables and z as a math-function but rarely 3 linear/non-linear matrices.
Your data need to be in meshgrid form, that is 2D:
% Known data
current = 0:2;
frequency = 0:9;
[current2D, frequency2D] = meshgrid(current, frequency);
% One column per current, one row per frequency;
% equivalently: torque2D = reshape(tor, numel(frequency), numel(current))
torque2D = [0    0    0;
            0.1  1.1  2.1;
            0.3  1.3  2.3;
            0.5  1.5  2.5;
            0.7  1.7  2.7;
            1    2    3;
            1.5  2.5  3.5;
            2    3    4;
            2.6  3.6  4.6;
            3.3  4.3  5.3];
% Interpolated data
currentToInterpolate = [0.5 1.5];
frequencyToInterpolate = 0.5:8.5;
[currentToInterpolate2D, frequencyToInterpolate2D] = meshgrid(currentToInterpolate, frequencyToInterpolate);
interpolatedTorque2D = interp2(current2D, frequency2D, torque2D, currentToInterpolate2D, frequencyToInterpolate2D);
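interp2's default 'linear' method is plain bilinear interpolation within the grid cell containing the query point; here is a pure-Python sketch of the same idea (toy data, not the FEM values):

```python
from bisect import bisect_right

def bilinear(xs, ys, z, x, y):
    """Bilinear interpolation on a regular grid; z[j][i] is the value at (xs[i], ys[j])."""
    # locate the grid cell containing (x, y), clamped to the grid
    i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    j = min(max(bisect_right(ys, y) - 1, 0), len(ys) - 2)
    tx = (x - xs[i]) / (xs[i + 1] - xs[i])   # fractional position inside the cell
    ty = (y - ys[j]) / (ys[j + 1] - ys[j])
    top = z[j][i] * (1 - tx) + z[j][i + 1] * tx          # blend along x at ys[j]
    bot = z[j + 1][i] * (1 - tx) + z[j + 1][i + 1] * tx  # blend along x at ys[j+1]
    return top * (1 - ty) + bot * ty                     # blend along y

currents = [0, 1, 2]
freqs = [0, 1, 2]
torque = [[0, 1, 2], [1, 2, 3], [2, 3, 4]]  # torque[j][i] at (currents[i], freqs[j])
print(bilinear(currents, freqs, torque, 0.5, 0.5))  # -> 1.0
```

The key point, as in the answer above, is that the values must be laid out as a 2D grid first; a flat list of 30 samples carries no neighborhood information.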

How to add up numbers with the same value in MATLAB

So if I have a number array:
a b
1 2.5
1 1.2
3 2.5
4 0.4
6 3
3 1.2
I want to sum up the numbers in column a that have the same value in column b, like this:
a b
4 2.5
4 1.2
4 0.4
6 3
so as you can see, 1 and 3 add up and become 4 because they have the same value of b, which is 2.5 (and likewise for the rest of the numbers). How would I do that? Thanks
(PS: my real data is a combination of integers and decimal numbers. Thanks)
Assuming A to be the input array, you have two approaches to play with here.
Approach #1
A combination of accumarray and unique -
[unqcol2,~,idx] = unique(A(:,2),'stable')
[accumarray(idx,A(:,1)) unqcol2]
Approach #2
With bsxfun -
[unqcol2,~,idx] = unique(A(:,2),'stable')
[sum(bsxfun(@times,bsxfun(@eq,idx,1:max(idx)),A(:,1)),1).' unqcol2 ]
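For readers outside MATLAB, the unique(...,'stable') plus accumarray combination is an ordered group-by-sum; here is a pure-Python sketch of the same logic:

```python
def accumulate_by_key(rows):
    """Sum column a over rows sharing the same b, keeping first-occurrence order
    (the analogue of unique(...,'stable') followed by accumarray)."""
    totals = {}  # dicts preserve insertion order in Python 3.7+
    for a, b in rows:
        totals[b] = totals.get(b, 0) + a
    return [[total, b] for b, total in totals.items()]

data = [[1, 2.5], [1, 1.2], [3, 2.5], [4, 0.4], [6, 3], [3, 1.2]]
print(accumulate_by_key(data))  # -> [[4, 2.5], [4, 1.2], [4, 0.4], [6, 3]]
```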

2 groups, 1-way repeated measures ANOVA in MATLAB

I have 2 groups of persons with repeated measures (the order of the measures does not matter [1,2] is the same as [2,1]). The data could look like that (3 persons per group, 6 measures each):
groupA = [1 3 6 5 2 9; 2 5 3 4 5 8; 8 7 3 6 2 4];
groupB = [3 4 5 4 4 1; 2 8 4 2 1 2; 3 2 5 5 1 2];
A straightforward way would be to compare the 2 groups via a ranksum test of the mean values of each person:
meansA = mean(groupA, 2); % => [4.3 4.5 5.0]
meansB = mean(groupB, 2); % => [3.5 3.2 3.0]
[p, h] = ranksum(meansA, meansB)
However, this type of analysis neglects that each of the mean values consists of several measures (and therefore underestimates the significance).
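For concreteness, the per-person aggregation described above can be sketched in pure Python (Mann-Whitney U statistic only; ranksum's p-value computation is omitted):

```python
def mean(xs):
    return sum(xs) / len(xs)

group_a = [[1, 3, 6, 5, 2, 9], [2, 5, 3, 4, 5, 8], [8, 7, 3, 6, 2, 4]]
group_b = [[3, 4, 5, 4, 4, 1], [2, 8, 4, 2, 1, 2], [3, 2, 5, 5, 1, 2]]

means_a = [mean(p) for p in group_a]  # approx [4.33, 4.5, 5.0]
means_b = [mean(p) for p in group_b]  # approx [3.5, 3.17, 3.0]

# Mann-Whitney U statistic: count the (a, b) pairs where the A-mean exceeds the B-mean
u = sum(1 for a in means_a for b in means_b if a > b)
print(u)  # -> 9: every A-mean exceeds every B-mean
```

As the question notes, collapsing each person to one mean discards the within-person variability, which is why a repeated measures model is preferable.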
A statistician told me to use a "repeated measures ANOVA" instead, but none of the ANOVA functions in MATLAB seems to do exactly what I want. The closest thing that I could find was:
>> [p, atab] = anovan([1 3 6 5 2 9 2 5 3 4 5 8 8 7 3 6 2 4 3 4 5 4 4 1 2 8 4 2 1 2 3 2 5 5 1 2], ...
       {[zeros(1,18) ones(1,18)], ...
        [1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5 6 6 6 6 6 6]}, ...
       'varnames', {'groupAorB', 'individual'}, 'display', 'off')
p =
NaN
0.9774
But this does not seem to work the way I want (a NaN value and an unrealistic p-value). I would be happy for any suggestions on how to perform an appropriate statistical test on these data in MATLAB.
You should have a look at this FileExchange entry that deals with the one-way repeated measures ANOVA:
http://www.mathworks.com/matlabcentral/fileexchange/5576-rmaov1
The author (Antonio Trujillo-Ortiz) made some other nice entries for different designs (2- and 3-way ANOVAs with repeated measures).
Unfortunately, the regular statistical functions in MATLAB do not allow for repeated measures designs.
The NaN signifies that a model which accounts for INDIVIDUAL accounts for all of the variance of GROUP. In other words, if you fit an intercept for each INDIVIDUAL and then try to find the variability due to GROUP, you have no variance left. The model is "overspecified".
This is because you actually need a mixed effects model - you are looking for a between-subjects effect. In order to achieve this, you need to tell MATLAB that INDIVIDUAL is nested inside GROUP. So you use the following parameters for anovan:
'random', INDIVIDUAL
'nested', INDIVIDUAL_within_GROUP
Having said that, I don't know what error covariance assumptions this makes - i.e. does it assume a diagonal covariance matrix?
If you want more control over the assumptions being made, I suggest you investigate NLMEFIT from the Statistics Toolbox, which fits mixed effects models with a specified covariance structure.

Find the increasing and decreasing trend in a curve MATLAB

a = [2 3 6 7 2 1 0.01 6 8 10 12 15 18 9 6 5 4 2];
Here is an array; I need to extract the exact values where each increasing or decreasing trend starts.
The output for this array a will be [2 (first element) 2 6 9]:
a=[2 3 6 7 2 1 0.01 6 8 10 12 15 18 9 6 5 4 2]
   ^       ^        ^               ^
   |       |        |               |
Kindly help me get the result in MATLAB for any similar type of array.
You just have to find where the sign of the difference between consecutive numbers changes.
With some common sense and the functions diff, sign and find, you get this solution:
a = [2 3 6 7 2 1 0.01 6 8 10 12 15 18 9 6 5 4 2];
sda = sign(diff(a));
idx = [1 find(sda(1:end-1)~=sda(2:end))+2 ];
result = a(idx);
EDIT:
The sign function messes things up when there are two consecutive equal numbers, because sign(0) = 0, which is falsely identified as a trend change. You'd have to filter these out, which you can do by first removing the consecutive duplicates from the original data. Since you only want the values where the trend change starts, and not the position where it actually starts, this is easiest:
a(diff(a)==0) = [];
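Both steps (dropping consecutive duplicates, then locating sign changes of the differences) translate directly to other languages; here is a pure-Python sketch of the same approach:

```python
def trend_starts(a):
    """Values where a new increasing/decreasing run begins (plus the first element)."""
    a = [v for k, v in enumerate(a) if k == 0 or v != a[k - 1]]  # drop consecutive duplicates
    signs = [1 if y > x else -1 for x, y in zip(a, a[1:])]       # sign of each step
    idx = [0] + [k + 2 for k in range(len(signs) - 1) if signs[k] != signs[k + 1]]
    return [a[k] for k in idx]

a = [2, 3, 6, 7, 2, 1, 0.01, 6, 8, 10, 12, 15, 18, 9, 6, 5, 4, 2]
print(trend_starts(a))  # -> [2, 2, 6, 9]
```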
This is a great place to use the diff function.
Your first step will be to do the following:
B = [0 diff(a)]
The reason we add the 0 is to keep the vector the same length, because of the way the diff function works: it returns the difference between each element and the next, so its output is one element shorter than the input. We prepend a zero because there is no change at the starting element.
If you look at the results in B now, it is quite obvious where the inflection points are (where you go from positive to negative numbers or vice versa).
To pull this out programmatically there are a number of things you can do. I tend to use a little multiplication and the find command.
Result = find(B(1:end-1).*B(2:end)<0)
This will return the index where you are on the cusp of the inflection. In this case it will be:
ans =
4 7 13

calculate co-occurrences

I have a file as shown in the attached screenshot. There are 61 events (peaks) and I want to find how often each peak occurs with each other peak (co-occurrence), for all possible combinations. The file has the frequency (the number of times a peak appears in the 47 samples) and the probability (the number of times the peak occurs divided by the total number of samples).
Then I want to find mutually exclusive peaks using the formula p(x,y) / (p(x)*p(y)), where p(x,y) is the probability that x and y co-occur, p(x) is the probability of peak x, and p(y) is the probability of peak y.
What is the best way to solve such a problem? Do I need to write a Perl script or are there some R functions I could use? I am a biologist trying to learn Perl and R so I would appreciate some example code to solve this problem.
In the following, I've assumed that what you alternately call p(xy) and p(x,y) should actually be the probability (rather than the number of times) that x and y co-occur. If that's not correct, just remove the division by nrow(X) from the 2nd line below.
# As an example, create a sub-matrix of your data
X <- cbind(c(0,0,0,0,0,0), c(1,0,0,1,1,1), c(1,1,0,0,0,0))
num <- (t(X) %*% X)/nrow(X) # The numerator of your expression
means <- colMeans(X) # A vector of means of each column
denom <- outer(colMeans(X), colMeans(X)) # The denominator
out <- num/denom
# [,1] [,2] [,3]
# [1,] NaN NaN NaN
# [2,] NaN 1.50 0.75
# [3,] NaN 0.75 3.00
Note: The NaNs in the results are R's way of indicating that those cells are "Not a number" (since they are each the result of dividing 0 by 0).
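The matrix algebra above is just counting joint occurrences and dividing by the product of the marginal probabilities; here is a pure-Python sketch of the same computation, for readers without R:

```python
def cooccurrence_ratio(X):
    """p(x,y) / (p(x) * p(y)) for the binary columns of X (rows = samples)."""
    n = len(X)
    m = len(X[0])
    p = [sum(row[i] for row in X) / n for i in range(m)]   # marginal probability per column
    out = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            pij = sum(row[i] * row[j] for row in X) / n    # joint probability
            out[i][j] = pij / (p[i] * p[j]) if p[i] * p[j] else float("nan")
    return out

# same example sub-matrix as above: column 1 is all zeros
X = [[0, 1, 1], [0, 0, 1], [0, 0, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]]
r = cooccurrence_ratio(X)
print(r[1][1], r[1][2], r[2][2])  # -> 1.5 0.75 3.0, matching the R output
```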
Your question is not completely clear without a proper example, but I think this result is along the lines of what you want, i.e. "I want to find how often each peak occurs with the other (co-occurrence)":
library(igraph)
library(tnet)
library(bipartite)
#if you load your data in as a matrix e.g.
mat<-matrix(c(1,1,0,2,2,2,3,3,3,4,4,0),nrow=4,byrow=TRUE) # e.g.
# [,1] [,2] [,3] # your top line as columns e.g.81_05 131_00 and peaks as rows
#[1,] 1 1 0
#[2,] 2 2 2
#[3,] 3 3 3
#[4,] 4 4 0
then
pairs<-web2edges(mat,return=TRUE)
pairs<- as.tnet(pairs,type="weighted two-mode tnet")
peaktopeak<-projecting_tm(pairs, method="sum")
peaktopeak
#peaktopeak
# i j w
#1 1 2 2 # top row here says peak1 and peak2 occurred together twice
#2 1 3 2
#3 1 4 2
#4 2 1 4
#5 2 3 6
#6 2 4 4
#7 3 1 6
#8 3 2 9
#9 3 4 6
#10 4 1 8
#11 4 2 8
#12 4 3 8 # peak4 occurred with peak3 8 times
EDIT: If mutually exclusive peaks are simply those that never share 1s in the same columns of your original data, then you can see this directly in peaktopeak: for instance, if peak 1 and peak 3 never co-occur, they won't be found together in a row of peaktopeak.
To look at this more easily you could:
peakmat <- tnet_igraph(peaktopeak,type="weighted one-mode tnet")
peakmat<-get.adjacency(peakmat,attr="weight")
e.g.:
# [,1] [,2] [,3] [,4]
#[1,] 0 2 2 2
#[2,] 4 0 6 4
#[3,] 6 9 0 6
#[4,] 8 8 8 0 # zeros would represent peaks that never co-occur.
#In this case everything shares at least 2 co-occurrences.
#Diagonals are 0, as saying peak1 occurs with itself is obviously silly.
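The 'sum' projection can be reproduced without tnet; here is a pure-Python sketch, assuming (as the peaktopeak output above suggests) that w[i][j] sums row i's weights over the columns where rows i and j are both nonzero:

```python
def project_sum(mat):
    """One-mode 'sum' projection of a weighted two-mode matrix (rows = peaks)."""
    n = len(mat)
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # diagonal stays 0: a peak co-occurring with itself is meaningless
                w[i][j] = sum(mat[i][k] for k in range(len(mat[i]))
                              if mat[i][k] and mat[j][k])
    return w

mat = [[1, 1, 0], [2, 2, 2], [3, 3, 3], [4, 4, 0]]
print(project_sum(mat))
# -> [[0, 2, 2, 2], [4, 0, 6, 4], [6, 9, 0, 6], [8, 8, 8, 0]], the adjacency matrix above
```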