Rmd table captions are messed up when knitting to Word doc

I am trying to make tables in Rmd with different captions or headers. The flextable package has great options that can be output to Word documents. Its function add_header_lines() lets you add a caption header at the top of each table. When the Rmd output wraps to a new page, add_header_lines() adds another caption header to the top of the continued table on the next page. However, that repeated header uses the caption from whatever table you made first. The caption is then correct for the next table, until the next page break is reached, where it reverts to the first table's caption again (see pictures).
Here is a reproducible example, in which every value in a table should match the table number.
Any ideas on how to fix this? I would like the continued table to have the correct caption, but would settle for simply getting rid of the repeated caption after a page break.
---
title: "Untitled"
author: "Anyone"
date: "2/29/2020"
output:
  word_document: default
  html_document:
    df_print: paged
  pdf_document: default
---
```{r setup, include=FALSE}
library(dplyr)
library(flextable)
knitr::opts_chunk$set(echo = FALSE,message=FALSE)
```
```{r Table1}
cars1 <- cars * 0 + 1
theme_zebra(regulartable(cars1)) %>%
  align(align = "center", part = "all") %>%
  autofit() %>%
  add_header_lines("Table 1: Model output for thing 1")
```
```{r Table2}
cars2 <- cars * 0 + 2
theme_zebra(regulartable(cars2)) %>%
  align(align = "center", part = "all") %>%
  autofit() %>%
  add_header_lines("Table 2: Model output for thing 2")
```
```{r Table3}
cars3 <- cars * 0 + 3
theme_zebra(regulartable(cars3)) %>%
  align(align = "center", part = "all") %>%
  autofit() %>%
  add_header_lines("Table 3: Model output for thing 3")
```
```{r Table4}
cars4 <- cars * 0 + 4
theme_zebra(regulartable(cars4)) %>%
  align(align = "center", part = "all") %>%
  autofit() %>%
  add_header_lines("Table 4: Model output for thing 4")
```
Note that this is where the page break is. The table labelled '1' here should really be '2'. It will continue to do this on every page: the continued part of each table is labelled with a '1' (try the Rmd code).

That's Word's fault :) It decides that two tables with no break between them are the same table.
---
title: "Untitled"
author: "Anyone"
date: "2/29/2020"
output:
  word_document: default
---
```{r setup, include=FALSE}
library(dplyr)
library(flextable)
knitr::opts_chunk$set(echo = FALSE,message=FALSE)
```
```{r Table1}
cars1 <- cars * 0 + 1
theme_zebra(flextable(cars1)) %>%
  align(align = "center", part = "all") %>%
  autofit() %>%
  add_header_lines("Table 1: Model output for thing 1")
```
blah
```{r Table2}
cars2 <- cars * 0 + 2
theme_zebra(flextable(cars2)) %>%
  align(align = "center", part = "all") %>%
  autofit() %>%
  add_header_lines("Table 2: Model output for thing 2")
```
blah
```{r Table3}
cars3 <- cars * 0 + 3
theme_zebra(flextable(cars3)) %>%
  align(align = "center", part = "all") %>%
  autofit() %>%
  add_header_lines("Table 3: Model output for thing 3")
```
blah
```{r Table4}
cars4 <- cars * 0 + 4
theme_zebra(flextable(cars4)) %>%
  align(align = "center", part = "all") %>%
  autofit() %>%
  add_header_lines("Table 4: Model output for thing 4")
```
As far as I know, this is not something that can be solved within flextable, but if there is an option I am not aware of, I'd be happy to integrate it.


R stringr using subset

In R, using the stringr package, how would you select only the words containing three occurrences of a letter, using str_subset?
Example: the letter "a" three times within a word.
Expected result: banana and Canada.
library(stringr)
text <- c("Canada", "and", "banana", "baobab")
# Any character repeated three times:
#
# maybe something followed by a marked character, maybe followed by
# something different, followed by that character, maybe followed by
# something different, followed by that character, maybe followed by
# something different
pattern <- "^.*(.)+.*\\1.*\\1.*$"
are_matching <- str_detect(text, pattern)
are_matching
#> [1] TRUE FALSE TRUE TRUE
words_extracted <- str_subset(text, pattern)
words_extracted
#> [1] "Canada" "banana" "baobab"
letter_repeated <- str_replace(words_extracted, pattern, "\\1")
letter_repeated
#> [1] "a" "a" "b"
# That gives you the "last" repeated character
str_replace("baobaba", pattern, "\\1")
#> [1] "a"
# Note: If you want the first repeated character (if multiple), you
# should be lazy both at the initial optional set of character and at
# the first marked matching. (Not relevant for "detect" and "subset")
lazy_text <- c("bananan", "baobaba")
lazy_pattern <- "^.*?(.)+?.*\\1.*\\1.*$"
str_replace(lazy_text, pattern, "\\1")
#> [1] "n" "a"
str_replace(lazy_text, lazy_pattern, "\\1")
#> [1] "a" "b"
Created on 2020-09-02 by the reprex package (v0.3.0)
This will give you all words in which at least one letter appears exactly 3 times:
library(tidyverse)
vec <- "banana and Canada"
words <- vec %>% str_split(" ") %>% .[[1]]
lgl_vec <- words %>% map_lgl(
  ~ str_split(.x, "") %>%
    .[[1]] %>%
    factor() %>%
    summary() %>%
    `==`(3) %>%
    any()
)
words[lgl_vec]
[1] "banana" "Canada"
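The same "some letter appears exactly three times" check can be sketched in Python with collections.Counter (the helper name letters_exactly_three is hypothetical, for illustration only):

```python
from collections import Counter

def letters_exactly_three(text):
    """Return the words in which some letter appears exactly three times."""
    words = text.split(" ")
    # Counter(w).values() gives the per-letter frequencies within each word
    return [w for w in words if any(n == 3 for n in Counter(w).values())]

print(letters_exactly_three("banana and Canada"))  # ['banana', 'Canada']
```

Note this is case-sensitive, like the factor/summary approach above.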
Use str_extract_all:
input <- c("apple", "banana", "Canada")
regex <- "\\b[^\\WAa]*[Aa][^\\WAa]*[Aa][^\\WAa]*[Aa][^\\WAa]*\\b"
matches <- str_extract_all(input, regex)
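For comparison, the repeated-character backreference idea translates to Python's re module almost verbatim; like the stringr pattern, this matches "at least three occurrences" of some character:

```python
import re

# Some character, followed (anywhere later) by two more occurrences of itself.
pattern = re.compile(r"^.*(.).*\1.*\1.*$")

words = ["Canada", "and", "banana", "baobab"]
matches = [w for w in words if pattern.search(w)]
print(matches)  # ['Canada', 'banana', 'baobab']
```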

How to strip extra spaces when writing from dataframe to csv

I read in multiple sheets (6) from an xlsx file and created individual dataframes. I want to write each one out to a pipe-delimited csv.
ind_dim.to_csv(r'/mypath/ind_dim_out.csv', index=None, header=True, sep='|')
Currently it outputs like this:
1|value1 |value2 |word1 word2 word3 etc.
I want to strip the trailing blanks.
Suggestion
Insert the method .apply(lambda x: x.str.rstrip()) into your output chain (prior to the .to_csv() call) to strip the trailing blank from each field across the DataFrame. It would look like:
Change:
ind_dim.to_csv(r'/mypath/ind_dim_out.csv', index = None, header=True, sep='|')
To:
ind_dim.apply(lambda x: x.str.rstrip()).to_csv(r'/mypath/ind_dim_out.csv', index = None, header=True, sep='|')
It can easily be inserted into the output chain using '.' referencing. To handle multiple data types, we can enforce the 'object' dtype on import by including the argument dtype='str':
ind_dim = pd.read_excel('testing_xlsx_nums.xlsx', header=0, index_col=0, sheet_name=None, dtype='str')
Or on the DataFrame itself by:
df = pd.DataFrame(df, dtype='str')
Proof
I did a mock-up where the .xlsx document has 5 sheets, with each sheet having three columns: the first column with all numbers except an empty cell in row 2; the second column with both a leading blank and a trailing blank on strings, an empty cell in row 3, and a number in row 4; and the third column with all strings having a leading blank, and an empty value in row 4. Integer indexes and integer columns have been included. The text in each sheet is:
       0        1        2
0  11111  valueB1  valueC1
1         valueB2  valueC2
2  33333           valueC3
3  44444    44444
4  55555  valueB5  valueC5
This code reads in our .xlsx testing_xlsx_nums.xlsx to the DataFrame dictionary ind_dim.
Next, it loops through each sheet using a for loop to place the sheet name variable as a key to reference the individual sheet DataFrame. It applies the .str.rstrip() method to the entire sheet/DataFrame by passing the lambda x: x.str.rstrip() lambda function to the .apply() method called on the sheet/DataFrame.
Finally, it outputs the sheet/DataFrame as a .csv with the pipe delimiter using .to_csv() as seen in the OP post.
# reads xlsx in
ind_dim = pd.read_excel('testing_xlsx_nums.xlsx', header=0, index_col=0, sheet_name=None, dtype='str')

# loops through sheets, applies rstrip(), outputs as '|'-delimited csv
for sheet in ind_dim:
    ind_dim[sheet].apply(lambda x: x.str.rstrip()).to_csv(sheet + '_ind_dim_out.csv', sep='|')
Returns:
|0|1|2
0|11111| valueB1| valueC1
1|| valueB2| valueC2
2|33333|| valueC3
3|44444|44444|
4|55555| valueB5| valueC5
(Note our column 2 strings no longer have the trailing space).
We can also reference each sheet using a loop that cycles through the dictionary items; the syntax would look like for k, v in dict.items() where k and v are the key and value:
# reads xlsx in
ind_dim = pd.read_excel('testing_xlsx_nums.xlsx', header=0, index_col=0, sheet_name=None, dtype='str')

# loops through sheets, applies rstrip(), outputs as '|'-delimited csv
for k, v in ind_dim.items():
    v.apply(lambda x: x.str.rstrip()).to_csv(k + '_ind_dim_out.csv', sep='|')
Notes:
We'll still need to apply the correct arguments for selecting/ignoring indexes and columns with the header= and names= parameters as needed. For these examples I just left them at their defaults for simplicity.
The other methods, which strip leading and leading & trailing spaces respectively, are .str.lstrip() and .str.strip(). They can also be applied to an entire DataFrame by passing e.g. lambda x: x.str.strip() to the .apply() method called on the DataFrame.
Only 1 Column: If we only wanted to strip from one column, we can call the .str methods directly on the column itself. For example, to strip leading & trailing spaces from a column named column2 in DataFrame df we would write: df.column2.str.strip().
Data types not string: When importing our data, pandas will assume data types for columns with a similar data type. We can override this by passing dtype='str' to the pd.read_excel() call when importing.
pandas 1.0.1 documentation (04/30/2020) on pandas.read_excel:
"dtype : Type name or dict of column -> type, default None
Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32} Use object to preserve data as stored in Excel and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion."
We can pass the argument dtype='str' when importing with pd.read_excel() (as seen above). If we want to enforce a single data type on a DataFrame we are already working with, we can set it equal to itself and pass it to pd.DataFrame() with the argument dtype='str', like: df = pd.DataFrame(df, dtype='str')
Hope it helps!
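A minimal end-to-end sketch of the suggestion (the column names A/B and the small in-memory frame are hypothetical stand-ins for the sheets read with pd.read_excel(..., dtype='str')):

```python
import pandas as pd

# A small frame mimicking the mock-up: one numeric-looking column and one
# string column with trailing blanks, everything already stored as strings.
df = pd.DataFrame({"A": ["11111", "33333"], "B": ["valueB1 ", "valueB2 "]})

# Strip the trailing blanks from every column before writing out.
stripped = df.apply(lambda x: x.str.rstrip())
print(stripped["B"].tolist())  # ['valueB1', 'valueB2']

# The pipe-delimited write would then be:
# stripped.to_csv("ind_dim_out.csv", sep="|", index=None)
```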
The following trims left and right spaces fairly easily:
if (!require(dplyr)) {
  install.packages("dplyr")
}
library(dplyr)

if (!require(stringr)) {
  install.packages("stringr")
}
library(stringr)

setwd("~/wherever/you/need/to/get/data")
outputWithSpaces <- read.csv("CSVSpace.csv", header = FALSE)
print(head(outputWithSpaces), quote = TRUE)

# str_trim(string, side = c("both", "left", "right"))
outputWithoutSpaces <- outputWithSpaces %>% mutate_all(str_trim)
print(head(outputWithoutSpaces), quote = TRUE)
Starting Data:
V1 V2 V3 V4
1 "Something is interesting. " "This is also Interesting. " "Not " "Intereting "
2 " Something with leading space" " Leading" " Spaces with many words." " More."
3 " Leading and training Space. " " More " " Leading and trailing. " " Spaces. "
Resulting:
V1 V2 V3 V4
1 "Something is interesting." "This is also Interesting." "Not" "Intereting"
2 "Something with leading space" "Leading" "Spaces with many words." "More."
3 "Leading and training Space." "More" "Leading and trailing." "Spaces."

Knitr option hook results in LaTeX output

I'm creating ioslides and beamer output from RMarkdown source files and need to have variable figure output dependent on the output format.
I've generated a plot using ggplot2 which renders fine.
I want the plot to have an out.width set to 100% for HTML output and 70% for LaTeX output.
The problem is that when I set the option hook and check for LaTeX output, the tex file generated by knitr contains the LaTeX source verbatim for including the image which renders as text in the slide.
## Modify the output based on the format we're exporting
knitr::opts_hooks$set(out.width = function(options) {
  if (knitr::is_latex_output()) {
    options$out.width = '70%'
  }
  options
})
The plot renders fine in HTML output.
However, for beamer I get the raw LaTeX rendered as text, as shown in the image:
And the resulting output in the .tex file:
\begin{frame}{Why bother?}
\protect\hypertarget{why-bother}{}
\textbackslash begin\{center\}\textbackslash includegraphics{[}width=70\% {]}\{slide-book\_files/figure-beamer/lec-4-modularity-cost-1\}
\textbackslash end\{center\}
\end{frame}
Here's complete code for a MWE:
## Modify the output based on the format we're exporting
knitr::opts_hooks$set(out.width = function(options) {
  if (knitr::is_latex_output()) {
    options$out.width = '70%'
  }
  return(options)
})
```{r, lec-4-modularity-cost, echo=FALSE, out.width='100%', fig.align='center'}
library(ggplot2)
library(tibble)
plot.data <- tibble(X = seq(0, 100), Y = 2 * X)
ggplot(plot.data, aes(x = X, y = Y)) +
  geom_path()
```
When setting out.width this way you have to use a format that LaTeX understands right away, i.e. 0.7\linewidth instead of 70%. And you have to double the backslash in the R code:
---
output:
  html_document: default
  pdf_document:
    keep_tex: yes
---
```{r setup, include=FALSE}
## Modify the output based on the format we're exporting
knitr::opts_hooks$set(out.width = function(options) {
  if (knitr::is_latex_output()) {
    options$out.width = '0.7\\linewidth'
  }
  options
})
```
```{r, lec-4-modularity-cost, echo=FALSE, out.width='100%', fig.align='center'}
library(ggplot2)
library(tibble)
plot.data <- tibble(X = seq(0, 100), Y = 2 * X)
ggplot(plot.data, aes(x = X, y = Y)) +
  geom_path()
```

Combine leaflet and markdown in loop

This question shows how to loop over/apply leaflet objects within a markdown file. I'd like to do a similar thing, though I'd like to add additional markdown content.
---
title: "Test"
output: html_document
---
```{r setup, echo=T, results='asis'}
library(leaflet)
library(dplyr) ### !!! uses development version with tidyeval !!!
library(htmltools)

## Add a random year column
data(quakes)
quakes <- tbl_df(quakes) %>%
  mutate(year = sample(2008:2010, n(), replace = TRUE))
```
```{r maps, echo=T, results='asis'}
createMaps <- function(year) {
  cat(paste("###", year, "\n"))
  leaflet(quakes %>% filter(year == !!year)) %>%
    addTiles() %>%
    addMarkers(
      lng = ~long,
      lat = ~lat,
      popup = ~as.character(mag))
  cat("\n\n")
}
htmltools::tagList(lapply(as.list(2008:2010), function(x) createMaps(x)))
```
If I leave out the cat statements in the createMaps function, this code prints all three maps. If I put in the cat statements, I get the markdown, but no maps. Any way to combine both types of element?
The problem is that your cat statements are evaluated before lapply returns its result list.
Delete the cat statements, change your createMaps function to
createMaps <- function(year) {
  mymap <- leaflet(quakes %>% filter(year == !!year)) %>%
    addTiles() %>%
    addMarkers(
      lng = ~long,
      lat = ~lat,
      popup = ~as.character(mag))
  return(list(tags$h1(year), mymap))
}
and change tags$h1() to whatever size of header you want (tags$h2(), ...)

Inserting blank spaces at the end of a column name in a table using pander

I am trying to find a way of centering a column heading in a pander table, using knitr to pdf in rmarkdown, while keeping the column entries right-justified.
---
title: "Table Doc"
output: pdf_document
---
```{r table, echo = FALSE}
table1 <- anova(lm(Petal.Length ~ Species*Petal.Width, iris))
names(table1) <- c("DF", "Sum Sq", "Mean Sq", "*F*", "*p*")
library(pander)
pander(table1, justify = c("left", rep("right", 5)))
```
Apparently there is no way to align individual cells inside a table in pandoc. I want the entries right-aligned so they line up properly, but the column headings 'F' and 'p' centered. So what I need to do is insert blank spaces after F and p to force them into the center. How do I do this? I tried simply inserting the blank spaces:
names(table1) <- c("DF", "Sum Sq", "Mean Sq", "*F* ", "*p* ")
but the spaces are not recognised by pander.
I also tried LaTeX spacing characters
names(table1) <- c("DF", "Sum Sq", "Mean Sq", "*F*\\", "*p*\\")
but this didn't work either. Can anyone think of a workaround?