Error when importing tm Vcorpus into Quanteda corpus


This code snippet worked fine until I updated R (3.6.3) and RStudio (1.2.5042) yesterday, though it is not obvious to me that the update is the source of the problem.
In a nutshell, I convert 91 PDF files into a volatile corpus named Vcorp and confirm that it is a VCorpus as follows:
> Vcorp <- VCorpus(VectorSource(citiesText))
> class(Vcorp)
[1] "VCorpus" "Corpus"
Then I try to import this tm VCorpus into quanteda, but I keep getting an error message that I did not get before (e.g., the day before the update):
> data(Vcorp, package = "tm")
> citiesCorpus <- corpus(Vcorp)
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 8714, 91
Any suggestions? Thank you.

It is impossible to know the exact problem without a) version information on your packages and b) a reproducible example.
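A minimal base-R way to collect the version details being asked for is sketched below. The package list is an assumption and can be extended; `sessionInfo()` prints the full picture, including the platform and every attached package.

```r
# Report the R version and the installed versions of the packages involved.
# The package list is an assumption; add any others you use.
pkgs <- c("quanteda", "tm")

versions <- vapply(pkgs, function(p) {
  # system.file() returns "" when a package is not installed,
  # so packageVersion() is only called for installed packages.
  if (nzchar(system.file(package = p))) {
    as.character(utils::packageVersion(p))
  } else {
    NA_character_
  }
}, character(1))

print(R.version.string)
print(versions)

# For everything at once (platform, locale, attached packages):
# sessionInfo()
```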
Why use tm at all? You could have created a quanteda corpus directly as:
corpus(citiesText)
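If the corpus still fails to build, one plausible reading of the error (an assumption; the thread does not confirm it) is that some documents hold more than one string, for example one element per PDF page, so the 8714 pieces no longer line up with the 91 documents. A base-R sketch of checking for that and collapsing each document into a single string follows; the toy `citiesText` is hypothetical and only mimics that shape.

```r
# Hypothetical stand-in for citiesText: a list with one character vector per
# PDF and one element per page (the shape you get from, e.g., lapply() over
# pdftools::pdf_text()).
citiesText <- list(
  c("Page 1 of city A", "Page 2 of city A"),
  "Only page of city B",
  c("Page 1 of city C", "Page 2 of city C", "Page 3 of city C")
)

# Diagnose: any document with more than one piece will not map
# one-to-one onto the number of documents.
pieces <- lengths(citiesText)
print(pieces)              # pieces per document
print(sum(pieces))         # total pieces
print(length(citiesText))  # number of documents

# Fix: collapse each document into a single string, one per PDF.
citiesText <- vapply(citiesText, paste, character(1), collapse = "\n")

# citiesText is now a plain character vector with one element per document,
# which is the input quanteda::corpus() (or tm::VectorSource()) accepts.
```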
Converting a VCorpus works fine for me.
library("quanteda")
## Package version: 2.0.1
library("tm")
packageVersion("tm")
## [1] ‘0.7.7’
reut21578 <- system.file("texts", "crude", package = "tm")
VCorp <- VCorpus(
  DirSource(reut21578, mode = "binary"),
  list(reader = readReut21578XMLasPlain)
)
corpus(VCorp)
## Corpus consisting of 20 documents and 16 docvars.
## text1 :
## "Diamond Shamrock Corp said that effective today it had cut i..."
##
## text2 :
## "OPEC may be forced to meet before a scheduled June session t..."
##
## text3 :
## "Texaco Canada said it lowered the contract price it will pay..."
##
## text4 :
## "Marathon Petroleum Co said it reduced the contract price it ..."
##
## text5 :
## "Houston Oil Trust said that independent petroleum engineers ..."
##
## text6 :
## "Kuwait's Oil Minister, in remarks published today, said ther..."
##
## [ reached max_ndoc ... 14 more documents ]

Related

Why does my R notebook produce a blank html document

For some reason, my R notebook produces a blank HTML document. When I knit the document to an HTML notebook, my browser opens the file and it is blank. I'm pressing the "Knit" button, then "HTML" in RStudio.
Here is my code:
---
title: "Rate Hole Model"
output: html_document
---
```{r}
library(tidyverse)
library(plotly)
library(rmarkdown)
library(knitr)
```
```{r}
veh_age <- mc2 %>%
  filter(cummulative < 51)
plot_ly(veh_age, x = ~unit_age, y = ~loss_ratio, color = ~rating_class_name) %>%
  add_markers(text = ~paste(rating_class_name, "<br />", 'unit age: ',
                            unit_age, "<br />", 'loss ratio: ', loss_ratio),
              hoverinfo = 'text') %>%
  layout(title = 'Comp Loss Ratio by Unit Age/Rating Class')
```
I'm not sure what happened. I'm on R version 3.5.1. Has anyone run into this problem?

My libraries were installed to my home directory, \Home\firstname.lastname\documents, which is a network location. When the last step of the process ran, the application it calls (Pandoc) did not have the required permissions. I am running 64-bit Windows 10 with 64-bit RStudio 1.1.456 and R 3.5. When I moved (reinstalled) the packages/libraries to a local folder, c:\Program Files\RStudio\Packages, the HTML rendered in the browser.
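A minimal sketch of the same fix from inside R, assuming you want a local library folder to take priority over the network one: it lists the current library paths, flags ones that look like UNC network paths (a heuristic only), and prepends a local folder. The temporary folder stands in for a real location; `C:/R/library` in the comments is a hypothetical example path.

```r
# Where R currently looks for (and installs) packages, in priority order.
print(.libPaths())

# Heuristic: flag paths that look like UNC network locations
# (\\server\share or //server/share). A mapped drive letter such as H:\
# that points at a network share will NOT be caught by this check.
is_unc <- grepl("^(\\\\\\\\|//)", .libPaths())
print(is_unc)

# Hypothetical local library folder. On Windows this might be something like
# "C:/R/library"; a temporary folder is used so the sketch runs anywhere.
local_lib <- file.path(tempdir(), "Rlib")
dir.create(local_lib, showWarnings = FALSE, recursive = TRUE)

# Prepend it so it takes priority for this session.
.libPaths(c(local_lib, .libPaths()))
print(.libPaths())

# Packages would then be reinstalled into it, e.g.:
# install.packages(c("rmarkdown", "knitr"), lib = local_lib)
```

To make the change persist across sessions, the usual route is to point the R_LIBS_USER environment variable (for example in an .Renviron file) at the local folder.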

htmlTable in Rmd - conversion to Word docx

I have the following Rmd file, which produces an HTML file that I then copy and paste into a .docx file (for collaborators). Here are some things I'd like to do with the tables, but I can't find answers in the vignettes:
A. I want to know how to remove the blank column that gets inserted in Word in between Cgroup 1 and Cgroup 2.
B. I want to know how to set the width of the column with the row names ("1st row",...)
C. How can I change the font and font size? I tried following this, but using output: word_document with htmlTable() doesn't work.
D. To ease the conversion to Word, is there a way to specify page breaks? Landscape orientation?
Thank you so much!
---
title: "Example"
output:
  Gmisc::docx_document:
    fig_caption: TRUE
    force_captions: TRUE
---
Results
=======
```{r, echo = FALSE}
library(htmlTable)
library(Gmisc)
library(knitr)
mx <- matrix(ncol = 6, nrow = 8)
rownames(mx) <- paste(c("1st", "2nd", "3rd",
                        paste0(4:8, "th")),
                      "row")
colnames(mx) <- paste(c("1st", "2nd", "3rd",
                        paste0(4:6, "th")),
                      "hdr")
for (nr in 1:nrow(mx)) {
  for (nc in 1:ncol(mx)) {
    mx[nr, nc] <- paste0(nr, ":", nc)
  }
}
htmlTable(mx,
          cgroup = c("Cgroup 1", "Cgroup 2"),
          n.cgroup = c(2, 4))
```

The styling was off for the row names; this is fixed in version 1.10.1, which you can install with the devtools package: devtools::install_github("gforge/htmlTable", ref="develop")
Regarding the styling, the function allows almost any CSS style you can imagine. Unfortunately, getting it into Word requires copy-pasting, and that workflow hasn't been Microsoft's highest priority. You can easily adapt your example to accommodate the required changes using css.cell:
library(htmlTable)
library(knitr)
mx <- matrix(ncol = 6, nrow = 8)
rownames(mx) <- paste(c("1st", "2nd", "3rd",
                        paste0(4:8, "th")),
                      "row")
colnames(mx) <- paste(c("1st", "2nd", "3rd",
                        paste0(4:6, "th")),
                      "hdr")
for (nr in 1:nrow(mx)) {
  for (nc in 1:ncol(mx)) {
    mx[nr, nc] <- paste0(nr, ":", nc)
  }
}
# One style per data column, plus one (the first element) for the row names
css.cell <- rep("font-size: 1.5em;", times = ncol(mx) + 1)
css.cell[1] <- "width: 4cm; font-size: 2em;"
htmlTable(mx,
          css.cell = css.cell,
          css.cgroup = "color: red",
          css.table = "color: blue",
          cgroup = c("Cgroup 1", "Cgroup 2"),
          n.cgroup = c(2, 4))
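The nested loops that fill `mx` can also be written with base R's `outer()`, which builds the same 8 x 6 matrix of "row:column" labels in one call. A sketch, reusing the answer's `css.cell` setup unchanged (where, as in the answer, the first element targets the row-name column):

```r
# outer() calls paste() on every (row, column) pair of indices, giving the
# same "1:1", "1:2", ... labels as the nested for-loops.
mx <- outer(seq_len(8), seq_len(6), paste, sep = ":")
rownames(mx) <- paste(c("1st", "2nd", "3rd", paste0(4:8, "th")), "row")
colnames(mx) <- paste(c("1st", "2nd", "3rd", paste0(4:6, "th")), "hdr")

# One style per data column plus one for the row names (the first element).
css.cell <- rep("font-size: 1.5em;", times = ncol(mx) + 1)
css.cell[1] <- "width: 4cm; font-size: 2em;"

# Then, as in the answer:
# htmlTable::htmlTable(mx, css.cell = css.cell,
#                      cgroup = c("Cgroup 1", "Cgroup 2"), n.cgroup = c(2, 4))
```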
There is no way to remove the empty column generated by cgroups. This was required for the table to look nice and is a conscious design choice.
Regarding page breaks, I doubt there is any elegant way of doing that. An alternative could be the ReporteRs package. I haven't used it myself, but it is more closely integrated with Word and could be a solution.

Aligning and italicising table column headings using Rmarkdown and pander

I am writing an R Markdown document that knits to PDF, with tables taken from portions of the lists returned by ezANOVA from the ez package. The tables are made with the pander package. A toy R Markdown file with a toy dataset is below.
---
title: "Table Doc"
output: pdf_document
---
```{r global_options, include=FALSE}
#set global knit options parameters.
knitr::opts_chunk$set(fig.width=12, fig.height=8, fig.path='Figs/',
                      echo=FALSE, warning=FALSE, message=FALSE, dev = 'pdf')
```
```{r, echo=FALSE}
# toy data
id <- rep(c(1,2,3,4), 5)
group1 <- factor(rep(c("A", "B"), 10))
group2 <- factor(rep(c("A", "B"), each = 10))
dv <- runif(20, min = 0, max = 10)
df <- data.frame(id, group1, group2, dv)
```
``` {r anova, echo = FALSE}
library(ez)
library(plyr)
library(pander)
# create anova object
anOb <- ezANOVA(df,
                dv = dv,
                wid = id,
                between = c(group1, group2),
                type = 3,
                detailed = TRUE)
# extract the output table from the anova object, reduce it down to only desired columns
anOb <- data.frame(anOb[[1]][, c("Effect", "F", "p", "p<.05")])
# format entries in columns
anOb[,2] <- format( round (anOb[,2], digits = 1), nsmall = 1)
anOb[,3] <- format( round (anOb[,3], digits = 4), nsmall = 1)
pander(anOb, justify = c("left", "center", "center", "right"))
```
Now I have a few problems:
a) For the last three columns, I would like the column headings to be centered but the entries underneath them to be right-aligned.
b) I would like the column headings 'F' and 'p' in italics, and also the 'p' in the 'p<.05' heading, with the rest in normal font, so that they read F, p, and p<.05 with only the letters F and p italicized.
I tried renaming the column headings with plyr::rename, like so:
anOb <- rename(anOb, c("F" = "italic(F)", "p" = "italic(p)", "p<.05" = ""))
but it didn't work.

In Markdown, you have to use the Markdown syntax for italics, which means wrapping the text in stars or underscores:
> names(anOb) <- c('Effect', '*F*', '*p*', '*p<.05*')
> pander(anOb)
-----------------------------------------
    Effect       *F*     *p*     *p<.05*
--------------- ------ -------- ---------
  (Intercept)    52.3   0.0019      *
    group1       1.3    0.3180
    group2       2.0    0.2261
 group1:group2   3.7    0.1273
-----------------------------------------
If you want to do that programmatically, you can also use the pandoc.emphasis helper function to add the stars to a string.
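A programmatic version in base R is sketched below. `emph()` is a hypothetical helper (pander's own `pandoc.emphasis.return()` should serve the same purpose), and the toy data frame only mimics the shape of `anOb`. Unlike the output above, it italicizes only the *p* in the last heading, which is what the question asked for.

```r
# Toy stand-in for anOb (same columns as in the question).
anOb <- data.frame(
  Effect = c("(Intercept)", "group1"),
  F = c("52.3", "1.3"),
  p = c("0.0019", "0.3180"),
  sig = c("*", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: wrap text in Markdown emphasis markers.
emph <- function(x) paste0("*", x, "*")

# Italicize F, p, and only the "p" of "p<.05".
names(anOb) <- c("Effect", emph("F"), emph("p"), paste0(emph("p"), "<.05"))
print(names(anOb))

# Then render as before, e.g. pander::pander(anOb)
```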
Your other problem, however, is due to a bug in the package, for which I've just proposed a fix on GitHub. Please feel free to try that branch and report back on GitHub; I will try to find some time later this week to clean up the related unit tests and merge the branch if everything seems OK.

pander on aov in knitr does not print?

I'm trying to print the outcome of an anova like so:
library(pander)
m.aov = aov(Sepal.Width ~ Species * Sepal.Length, iris)
pander(m.aov, split.table=Inf)
and I get this as expected if I type it into the console:
----------------------------------------------------------------------
                             Df   Sum Sq   Mean Sq   F value    Pr(>F)
-------------------------- ---- -------- --------- --------- ---------
               **Species**    2    11.34     5.672     76.48 2.329e-23
          **Sepal.Length**    1    4.769     4.769      64.3 3.368e-13
  **Species:Sepal.Length**    2    1.513    0.7566      10.2  7.19e-05
             **Residuals**  144    10.68   0.07417        NA        NA
----------------------------------------------------------------------
Table: Analysis of Variance Model
However, if I embed this into a knitr chunk, I don't get the table:
```{r, results='asis'}
library(pander)
m.aov = aov(Sepal.Width ~ Species * Sepal.Length, iris)
pander(m.aov, split.table=Inf)
```
Knit the above and one obtains
```r
pander(m.aov, split.table=Inf)
```
i.e., the code chunk with no output.
Question: Is this a bug (in knitr? pander?) or something I've overlooked? How can I work around it?
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C LC_TIME=en_AU.UTF-8
[4] LC_COLLATE=en_AU.UTF-8 LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
[7] LC_PAPER=en_AU.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] knitr_1.8 pander_0.5.1 vimcom_1.0-0 setwidth_1.0-3 colorout_1.0-3
loaded via a namespace (and not attached):
[1] digest_0.6.4 evaluate_0.5.5 formatR_1.0 Rcpp_0.11.2 stringr_0.6.2 tools_3.0.2

inf2cat 22.9.10 error

I'm trying to sign an old Hitachi driver that makes a USB flash drive appear as a fixed disk (quite useful when you have fast, large thumb drives).
The driver itself works fine, but I always get the same errors when I try to sign it:
Errors:
22.9.10: cfadisk.sys in [cfadisk_copyfiles] is missing from [SourceDisksFiles] section in \cfadisk.inf; driver may not sign correctly until this is resolved.
22.9.10: disk.sys in [gendisk_copyfiles] is missing from [SourceDisksFiles] section in cfadisk.inf; driver may not sign correctly until this is resolved.
This is my .inf file:
[Version]
Signature="$Windows NT$"
Class=DiskDrive
ClassGuid={4D36E967-E325-11CE-BFC1-08002BE10318}
Provider=%HGST%
DriverVer=10/14/2012,9.9.9.9
CatalogFile=cfadisk.cat
[Manufacturer]
%HGST% = cfadisk_device,ntAMD64
[DestinationDirs]
cfadisk_copyfiles=12 ; %SystemRoot%\system32\drivers
gendisk_copyfiles=12 ; %SystemRoot%\system32\drivers
[cfadisk_copyfiles]
cfadisk.sys
[gendisk_copyfiles]
disk.sys
[cfadisk_device]
%Microdrive_devdesc% = cfadisk_install,USBSTOR\Disk&Ven_SanDisk&Prod_Extreme&Rev_0001
[cfadisk_device.NTamd64]
%Microdrive_devdesc% = cfadisk_install,USBSTOR\Disk&Ven_SanDisk&Prod_Extreme&Rev_0001
[cfadisk_addreg]
HKR,,"LowerFilters",0x00010008,"cfadisk"
[cfadisk_install]
CopyFiles=cfadisk_copyfiles,gendisk_copyfiles
[cfadisk_install.HW]
AddReg=cfadisk_addreg
[cfadisk_install.Services]
AddService=disk,2,gendisk_ServiceInstallSection
AddService=cfadisk,,cfadisk_ServiceInstallSection
[gendisk_ServiceInstallSection]
DisplayName = "Disk Driver"
ServiceType = 1
StartType = 0
ErrorControl = 1
ServiceBinary = %12%\disk.sys
LoadOrderGroup = SCSI Class
[cfadisk_ServiceInstallSection]
DisplayName = "CompactFlash Filter Driver"
ServiceType = 1
StartType = 3
ErrorControl = 1
ServiceBinary = %12%\cfadisk.sys
LoadOrderGroup = Pnp Filter
; -----------------------
[Strings]
HGST = "Hitachi"
Microdrive_devdesc = "SanDisk Extreme"
I was using this tutorial as a reference point:
http://www.deploymentresearch.com/Blog/tabid/62/EntryId/63/Sign-your-unsigned-drivers-Damn-It.aspx
cfadisk.inf and the .sys file can be downloaded here (the link is at the beginning of the first post):
http://hardforum.com/showthread.php?t=1655684
Any help would be greatly appreciated.
EDIT:
I just ran the chkinf utility on this .inf file. Here is the output:
C:\DriversCert\SanDisk\cfadisk.inf: FAILED
NTLOG REPORT--------------
Total Lines: 62 |
Total Errors: 1 |
Total Warnings: 4 |
--------------------------
Line 0: ERROR: (E22.1.1003) Section [SourceDisksNames] not defined.
Line 0: WARNING: (W22.1.2212) No Copyright information found.
Line 0: WARNING: (W22.1.2111) [SourceDisksFiles] section not defined - full CopyFiles checking not done.
Line 17: WARNING: (W22.1.2112) File "cfadisk.sys" is not listed in the [SourceDisksFiles].
Line 20: WARNING: (W22.1.2112) File "disk.sys" is not listed in the [SourceDisksFiles].
I'm really not a programmer, so I don't understand what all of this means.
The strange thing is that the driver does work; I just can't get it signed.
Thank you!
Best regards,
Walter

It means that some sections are missing; in your case, [SourceDisksFiles] and [SourceDisksNames].
In this specific situation, you just need to add:
[SourceDisksFiles]
cfadisk.sys = 1
disk.sys = 1
[SourceDisksNames]
1 = %DiskName%, ,
and also add an entry to the [Strings] section at the bottom:
DiskName="Disk Drive"
