pander on aov in knitr does not print?

I'm trying to print the outcome of an ANOVA like so:
library(pander)
m.aov = aov(Sepal.Width ~ Species * Sepal.Length, iris)
pander(m.aov, split.table=Inf)
and I get this as expected if I type it into the console:
----------------------------------------------------------------------
                              Df   Sum Sq   Mean Sq   F value    Pr(>F)
-------------------------- ---- -------- --------- --------- ---------
**Species**                    2    11.34     5.672     76.48 2.329e-23
**Sepal.Length**               1    4.769     4.769      64.3 3.368e-13
**Species:Sepal.Length**       2    1.513    0.7566      10.2  7.19e-05
**Residuals**                144    10.68   0.07417        NA        NA
----------------------------------------------------------------------

Table: Analysis of Variance Model
However, if I embed this into a knitr chunk, I don't get the table:
```{r, results='asis'}
library(pander)
m.aov = aov(Sepal.Width ~ Species * Sepal.Length, iris)
pander(m.aov, split.table=Inf)
```
Knit the above and one obtains
```r
pander(m.aov, split.table=Inf)
```
i.e., the code chunk with no output.
Question: Is this a bug (in knitr? pander?) or something I've overlooked? How can I work around it?
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C LC_TIME=en_AU.UTF-8
[4] LC_COLLATE=en_AU.UTF-8 LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
[7] LC_PAPER=en_AU.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] knitr_1.8 pander_0.5.1 vimcom_1.0-0 setwidth_1.0-3 colorout_1.0-3
loaded via a namespace (and not attached):
[1] digest_0.6.4 evaluate_0.5.5 formatR_1.0 Rcpp_0.11.2 stringr_0.6.2 tools_3.0.2
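One workaround that may be worth trying (a sketch, not a confirmed fix for the underlying bug): pander_return() builds the same markdown as pander() but returns it as a character vector, so the chunk can emit it explicitly with cat(). Upgrading pander beyond 0.5.1 may also help.
```r
# Hedged workaround sketch: print the markdown ourselves inside the
# results='asis' chunk instead of relying on pander()'s own printing.
library(pander)
m.aov <- aov(Sepal.Width ~ Species * Sepal.Length, iris)
cat(pander_return(m.aov, split.table = Inf), sep = "\n")
```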

Related

statistical test to compare 1st/2nd differences based on output from ggpredict / ggeffect

I want to conduct a simple two-sample t-test in R to compare marginal effects that are generated by ggpredict (or ggeffect).
Both ggpredict and ggeffect provide nice output: (1) a table (predicted probability / standard error / CIs) and (2) a plot. However, they do not provide p-values for assessing the statistical significance of the marginal effects (i.e., is the difference between two predicted probabilities different from zero?). Further, since I'm working with interaction effects, I'm also interested in two-sample t-tests for the first differences (between two marginal effects) and the second differences.
Is there an easy way to run the relevant t-tests with ggpredict/ggeffect output? Other options?
Attaching:
- reprex code with fictitious data
- To be specific, I want to test the following "1st differences":
--> .67 - .33 = .34 (different from zero?)
--> .5 - .5 = 0 (different from zero?)
...and the following second difference:
--> 0.0 - .34 = -.34 (different from zero?)
See also Figure 12 / Table 3 in Mize 2019 (interaction effects in nonlinear models)
Thanks Scott
library(mlogit)
#> Loading required package: dfidx
#>
#> Attaching package: 'dfidx'
#> The following object is masked from 'package:stats':
#>
#> filter
library(sjPlot)
library(ggeffects)
# create example data set: 2 respondents, each answering 3 choice sets
# with 2 alternatives per set, i.e. 1 row per alternative
cedata.1 <- data.frame(
  id     = c(1,1,1,1,1,1,2,2,2,2,2,2), # respondent ID
  QES    = c(1,1,2,2,3,3,1,1,2,2,3,3), # choice set (with 2 alternatives)
  Alt    = c(1,2,1,2,1,2,1,2,1,2,1,2), # alternative 1 or 2 within the choice set
  LOC    = c(0,0,1,1,0,1,0,1,1,0,0,1), # attribute describing the alternative; binary categorical
  SIZE   = c(1,1,1,0,0,1,0,0,1,1,0,1), # attribute describing the alternative; binary categorical
  Choice = c(0,1,1,0,1,0,0,1,0,1,0,1), # whether the alternative is chosen (1) or not (0)
  gender = c(1,1,1,1,1,1,0,0,0,0,0,0)  # male or female (repeats for each individual)
)
# convert dep var Choice to factor as required by sjPlot
cedata.1$Choice <- as.factor(cedata.1$Choice)
cedata.1$LOC <- as.factor(cedata.1$LOC)
cedata.1$SIZE <- as.factor(cedata.1$SIZE)
# estimate model.
glm.model <- glm(Choice ~ LOC*SIZE, data=cedata.1, family = binomial(link = "logit"))
# estimate MEs for use in IE assessment
LOC.SIZE <- ggpredict(glm.model, terms = c("LOC", "SIZE"))
LOC.SIZE
#>
#> # Predicted probabilities of Choice
#> # x = LOC
#>
#> # SIZE = 0
#>
#> x | Predicted | SE | 95% CI
#> -----------------------------------
#> 0 | 0.33 | 1.22 | [0.04, 0.85]
#> 1 | 0.50 | 1.41 | [0.06, 0.94]
#>
#> # SIZE = 1
#>
#> x | Predicted | SE | 95% CI
#> -----------------------------------
#> 0 | 0.67 | 1.22 | [0.15, 0.96]
#> 1 | 0.50 | 1.00 | [0.12, 0.88]
#> Standard errors are on the link-scale (untransformed).
# plot
# plot(LOC.SIZE, connect.lines = TRUE)
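For the tests themselves, one option is the marginaleffects package (an assumption on my part; it is not used in the post above): avg_comparisons() estimates the first differences of predicted probabilities with delta-method standard errors and p-values, and hypothesis = "pairwise" then contrasts those first differences against each other, which is exactly the second difference.
```r
# A hedged sketch using the marginaleffects package (not part of the
# original post); assumes glm.model from the reprex above.
library(marginaleffects)

# First differences: effect of LOC (0 -> 1) on the predicted probability,
# estimated separately at SIZE = 0 and SIZE = 1, with delta-method p-values
avg_comparisons(glm.model, variables = "LOC", by = "SIZE")

# Second difference: test whether the two first differences differ
avg_comparisons(glm.model, variables = "LOC", by = "SIZE",
                hypothesis = "pairwise")
```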

Option to cut values below a threshold in papaja::apa_table

I can't figure out how to selectively print values in a table above or below some value. What I'm looking for is known as "cut" in Revelle's psych package. MWE below.
library("psych")
library("psychTools")
derp <- fa(ability, nfactors=3)
print(derp, cut=0.5) #removes all loadings smaller than 0.5
derp <- print(derp, cut=0.5) #apa_table still doesn't print like this
Question is, how do I add that cut to an apa_table? Printing apa_table(derp) prints the entire table, including all values.
The print-method from psych does not return the formatted loadings but only the table of variance accounted for. You can, however, get the result you want by manually formatting the loadings table:
library("psych")
library("psychTools")
derp <- fa(ability, nfactors=3)
# Class `loadings` cannot be coerced to data.frame or matrix
class(derp$Structure)
[1] "loadings"
# Class `matrix` is supported by apa_table()
derp_loadings <- unclass(derp$Structure)
class(derp_loadings)
[1] "matrix"
# Remove loadings below "cut" (by absolute value, matching psych's cut)
derp_loadings[abs(derp_loadings) < 0.5] <- NA
colnames(derp_loadings) <- paste("Factor", 1:3)
apa_table(
derp_loadings
, caption = "Factor loadings"
, added_stub_head = "Item"
, format = "pandoc" # Omit this in your R Markdown document
, format.args = list(na_string = "") # Don't print NA
)
*Factor loadings*

Item       Factor 1   Factor 2   Factor 3
---------- --------- --------- ---------
reason.4       0.60
reason.16
reason.17      0.65
reason.19
letter.7       0.61
letter.33      0.56
letter.34      0.65
letter.58
matrix.45
matrix.46
matrix.47
matrix.55
rotate.3                  0.70
rotate.4                  0.73
rotate.6                  0.63
rotate.8                  0.63
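If several tables need the same treatment, the steps above can be wrapped in a small helper (a sketch; cut_loadings is a made-up name, not a psych or papaja function):
```r
# A hedged helper sketch: blank out loadings below `cut` in absolute value
# and return a plain matrix that apa_table() accepts.
cut_loadings <- function(fit, cut = 0.5, labels = NULL) {
  loadings <- unclass(fit$Structure)   # strip the "loadings" class
  loadings[abs(loadings) < cut] <- NA  # mimic psych's print(x, cut = ...)
  if (!is.null(labels)) colnames(loadings) <- labels
  loadings
}

apa_table(
  cut_loadings(derp, labels = paste("Factor", 1:3))
  , caption = "Factor loadings"
  , added_stub_head = "Item"
  , format.args = list(na_string = "")
)
```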

Error when importing a tm VCorpus into a quanteda corpus

This code snippet worked just fine until I decided to update R (3.6.3) and RStudio (1.2.5042) yesterday, though it is not obvious to me that the update is the source of the problem.
In a nutshell, I convert 91 PDF files into a volatile corpus named Vcorp and confirm that I created a volatile corpus as follows:
> Vcorp <- VCorpus(VectorSource(citiesText))
> class(Vcorp)
[1] "VCorpus" "Corpus"
Then I attempt to import this tm VCorpus into quanteda, but keep getting an error message, which I did not get before (e.g., the day before the update).
> data(Vcorp, package = "tm")
> citiesCorpus <- corpus(Vcorp)
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 8714, 91
Any suggestions? Thank you.
Impossible to know the exact problem without a) version information on your packages and b) a reproducible example.
Why use tm at all? You could have created a quanteda corpus directly as:
corpus(citiesText)
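For instance, if citiesText holds the text of the 91 PDFs, something like the following sketch would work (readtext and the folder path are assumptions, since the post does not show how citiesText was created):
```r
# A hedged sketch: read the PDFs with readtext and build the quanteda
# corpus directly, skipping tm. The folder path is hypothetical.
library(readtext)
library(quanteda)
citiesText <- readtext("cities_pdfs/*.pdf")
citiesCorpus <- corpus(citiesText)
summary(citiesCorpus, n = 3)
```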
Converting a VCorpus works fine for me.
library("quanteda")
## Package version: 2.0.1
library("tm")
packageVersion("tm")
## [1] ‘0.7.7’
reut21578 <- system.file("texts", "crude", package = "tm")
VCorp <- VCorpus(
  DirSource(reut21578, mode = "binary"),
  list(reader = readReut21578XMLasPlain)
)
corpus(VCorp)
## Corpus consisting of 20 documents and 16 docvars.
## text1 :
## "Diamond Shamrock Corp said that effective today it had cut i..."
##
## text2 :
## "OPEC may be forced to meet before a scheduled June session t..."
##
## text3 :
## "Texaco Canada said it lowered the contract price it will pay..."
##
## text4 :
## "Marathon Petroleum Co said it reduced the contract price it ..."
##
## text5 :
## "Houston Oil Trust said that independent petroleum engineers ..."
##
## text6 :
## "Kuwait"s Oil Minister, in remarks published today, said ther..."
##
## [ reached max_ndoc ... 14 more documents ]

Compare contrasts in linear model in Python (like R's contrast library?)

In R I can do the following to compare two contrasts from a linear model:
url <- "https://raw.githubusercontent.com/genomicsclass/dagdata/master/inst/extdata/spider_wolff_gorb_2013.csv"
filename <- "spider_wolff_gorb_2013.csv"
install.packages("downloader", repos="http://cran.us.r-project.org")
library(downloader)
if (!file.exists(filename)) download(url, filename)
spider <- read.csv(filename, skip=1)
head(spider, 5)
# leg type friction
# 1 L1 pull 0.90
# 2 L1 pull 0.91
# 3 L1 pull 0.86
# 4 L1 pull 0.85
# 5 L1 pull 0.80
fit = lm(friction ~ type + leg, data=spider)
fit
# Call:
# lm(formula = friction ~ type + leg, data = spider)
#
# Coefficients:
# (Intercept) typepush legL2 legL3 legL4
# 1.0539 -0.7790 0.1719 0.1605 0.2813
install.packages("contrast", repos="http://cran.us.r-project.org")
library(contrast)
l4vsl2 = contrast(fit, list(leg="L4", type="pull"), list(leg="L2",type="pull"))
l4vsl2
# lm model parameter contrast
#
# Contrast S.E. Lower Upper t df Pr(>|t|)
# 0.1094167 0.04462392 0.02157158 0.1972618 2.45 277 0.0148
I have found out how to do much of the above in Python:
import pandas as pd
df = pd.read_table("https://raw.githubusercontent.com/genomicsclass/dagdata/master/inst/extdata/spider_wolff_gorb_2013.csv", sep=",", skiprows=1)
df.head(2)
import statsmodels.formula.api as sm
model1 = sm.ols(formula='friction ~ type + leg', data=df)
fitted1 = model1.fit()
print(fitted1.summary())
Now all that remains is finding the t-statistic for the contrast of leg pair L4 vs. leg pair L2. Is this possible in Python?
statsmodels is still missing some predefined contrasts, but the t_test and wald_test or f_test methods of the model Results classes can be used to test linear (or affine) restrictions. The restrictions can either be given as arrays or as strings using the parameter names.
Details on how to specify contrasts/restrictions should be in the documentation.
for example
>>> tt = fitted1.t_test("leg[T.L4] - leg[T.L2]")
>>> print(tt.summary())
Test for Constraints
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
c0 0.1094 0.045 2.452 0.015 0.022 0.197
==============================================================================
The results are attributes or methods in the instance that is returned by t_test. For example the conf_int can be obtained by
>>> tt.conf_int()
array([[ 0.02157158, 0.19726175]])
t_test is vectorized and treats each restriction or contrast as a separate hypothesis. wald_test treats a list of restrictions as a joint hypothesis:
>>> tt = fitted1.t_test(["leg[T.L3] - leg[T.L2]", "leg[T.L4] - leg[T.L2]"])
>>> print(tt.summary())
Test for Constraints
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
c0 -0.0114 0.043 -0.265 0.792 -0.096 0.074
c1 0.1094 0.045 2.452 0.015 0.022 0.197
==============================================================================
>>> tt = fitted1.wald_test(["leg[T.L3] - leg[T.L2]", "leg[T.L4] - leg[T.L2]"])
>>> print(tt.summary())
<F test: F=array([[ 8.10128575]]), p=0.00038081249480917173, df_denom=277, df_num=2>
Aside: this also works for robust covariance matrices if cov_type was specified as an argument to fit.

POS tagging in Scala

I tried to POS-tag a sentence in Scala using the Stanford parser, like below:
import edu.stanford.nlp.parser.lexparser.LexicalizedParser
import edu.stanford.nlp.trees.Tree

val lp: LexicalizedParser = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
lp.setOptionFlags("-maxLength", "50", "-retainTmpSubcategories")
val s = "I love to play"
val parse: Tree = lp.apply(s)
val taggedWords = parse.taggedYield()
println(taggedWords)
I got the error "type mismatch; found: java.lang.String, required: java.util.List[_ <: edu.stanford.nlp.ling.HasWord]" on the line val parse: Tree = lp.apply(s).
I don't know whether this is the right way of doing it or not. Are there any other easy ways of POS tagging a sentence in Scala?
You might like to consider the FACTORIE toolkit (http://github.com/factorie/factorie). It is a general library for machine learning and graphical models that happens to include an extensive suite of natural language processing components (tokenization, token normalization, morphological analysis, sentence segmentation, part-of-speech tagging, named entity recognition, dependency parsing, mention finding, coreference).
Furthermore it is written entirely in Scala, and it is released under the Apache License.
Documentation is currently sparse, but will be improving in the coming months.
For example, once Maven-based installation is finished you can type at the command line:
bin/fac nlp --pos1 --parser1 --ner1
to launch a socket-listening multi-threaded NLP server. Then query it by piping plain text to its socket number:
echo "Mr. Jones took a job at Google in New York. He and his Australian wife moved from New South Wales on 4/1/12." | nc localhost 3228
The output is then
1 1 Mr. NNP 2 nn O
2 2 Jones NNP 3 nsubj U-PER
3 3 took VBD 0 root O
4 4 a DT 5 det O
5 5 job NN 3 dobj O
6 6 at IN 3 prep O
7 7 Google NNP 6 pobj U-ORG
8 8 in IN 7 prep O
9 9 New NNP 10 nn B-LOC
10 10 York NNP 8 pobj L-LOC
11 11 . . 3 punct O
12 1 He PRP 6 nsubj O
13 2 and CC 1 cc O
14 3 his PRP$ 5 poss O
15 4 Australian JJ 5 amod U-MISC
16 5 wife NN 6 nsubj O
17 6 moved VBD 0 root O
18 7 from IN 6 prep O
19 8 New NNP 9 nn B-LOC
20 9 South NNP 10 nn I-LOC
21 10 Wales NNP 7 pobj L-LOC
22 11 on IN 6 prep O
23 12 4/1/12 NNP 11 pobj O
24 13 . . 6 punct O
Of course there is a programmatic API to all this functionality as well.
import cc.factorie._
import cc.factorie.app.nlp._
val doc = new Document("Education is the most powerful weapon which you can use to change the world.")
DocumentAnnotatorPipeline(pos.POS1).process(doc)
for (token <- doc.tokens)
  println("%-10s %-5s".format(token.string, token.posLabel.categoryValue))
will output:
Education NN
is VBZ
the DT
most RBS
powerful JJ
weapon NN
which WDT
you PRP
can MD
use VB
to TO
change VB
the DT
world NN
. .
I found a very simple way to do POS tagging in Scala.
Step 1
Download the Stanford tagger version 3.2.0 from the link below:
http://nlp.stanford.edu/software/stanford-postagger-2013-06-20.zip
Step 2
Add the stanford-postagger jar from that folder to your project, and also place the english-left3words-distsim.tagger file from the models folder in your project.
Then, with the code below, you can POS-tag a sentence in Scala:
import edu.stanford.nlp.tagger.maxent.MaxentTagger

val tagger = new MaxentTagger("english-left3words-distsim.tagger")
val art_con = "My name is Rahul"
val tagged = tagger.tagString(art_con)
println(tagged)
Output: My_PRP$ name_NN is_VBZ Rahul_NNP
I believe the API of the Stanford Parser has changed somewhat, as it does sometimes. apply has the signature public Tree apply(java.util.List<? extends HasWord> words), and this is what you see in the error message.
What you should use now is parse, which has the signature public Tree parse(java.lang.String sentence); i.e., replace lp.apply(s) with lp.parse(s).