Examples of using SCons with knitr - knitr

Are there minimal, or even larger, working examples of using SCons and knitr to generate reports from .Rmd files?
kniting an cleaning_session.Rmd file from the command line (bash shell) to derive an .html file, may be done via:
Rscript -e "library(knitr); knit('cleaning_session.Rmd')".
In this example, Rscript and instructions are fed to a Makefile:
RMDFILE=test
html :
Rscript -e "require(knitr); require(markdown); knit('$(RMDFILE).rmd', '$(RMDFILE).md'); markdownToHTML('$(RMDFILE).md', '$(RMDFILE).html', options=c('use_xhtml', 'base64_images')); browseURL(paste('file://', file.path(getwd(),'$(RMDFILE).html'), sep=''
In this answer https://stackoverflow.com/a/10945832/1172302, there is reportedly a solution using SCons. Yet, I did not test enough to make it work for me. Essentially, it would be awesome to have something like the example presented at https://tex.stackexchange.com/a/26573/8272.

[Updated] One working example is an Sconstruct file:
import os
environment = Environment(ENV=os.environ)
# define a `knitr` builder
builder = Builder(action = '/usr/local/bin/knit $SOURCE -o $TARGET',
src_suffix='Rmd')
# add builders as "Knit", "RMD"
environment.Append( BUILDERS = {'Knit' : builder} )
# define an `rmarkdown::render()` builder
builder = Builder(action = '/usr/bin/Rscript -e "rmarkdown::render(input=\'$SOURCE\', output_file=\'$TARGET\')"',
src_suffix='Rmd')
environment.Append( BUILDERS = {'RMD' : builder} )
# define source (and target files -- currently useless, since not defined above!)
# main cleaning session code
environment.RMD(source='cleaning_session.Rmd', target='cleaning_session.html')
# documentation of the Cleaning Process
environment.Knit(source='Cleaning_Process.Rmd', target='Cleaning_Process.html')
# documentation of data
environment.Knit(source='Code_Book.Rmd', target='Code_Book.html')
The first builder calls the custom script called knit. Which, in turn, takes care of the target file/extension, here being cleaning_session.html. Likely the suffix parameter is not needed altogether, in this very example.
The second builder added is Rscript -e "rmarkdown::render(\'$SOURCE\')"'.
The existence of $TARGETs (as in the example at Command wrapper) ensures SCons won't repeat work if a target file already exists.
The custom script (whose source I can't retrieve currently) is:
#!/usr/bin/env Rscript
local({
p = commandArgs(TRUE)
if (length(p) == 0L || any(c('-h', '--help') %in% p)) {
message('usage: knit input [input2 input3] [-n] [-o output output2 output3]
-h, --help to print help messages
-n, --no-convert do not convert tex to pdf, markdown to html, etc
-o output filename(s) for knit()')
q('no')
}
library(knitr)
o = match('-o', p)
if (is.na(o)) output = NA else {
output = tail(p, length(p) - o)
p = head(p, o - 1L)
}
nc = c('-n', '--no-convert')
knit_fun = if (any(nc %in% p)) {
p = setdiff(p, nc)
knit
} else {
if (length(p) == 0L) stop('no input file provided')
if (grepl('\\.(R|S)(nw|tex)$', p[1])) {
function(x, ...) knit2pdf(x, ..., clean = TRUE)
} else {
if (grepl('\\.R(md|markdown)$', p[1])) knit2html else knit
}
}
mapply(knit_fun, p, output = output, MoreArgs = list(envir = globalenv()))
})
The only thing, now, necessary is to run scons.

Related

Output file not created when running a R command in a Nextflow file?

I am trying to run a nextflow pipeline but the output file is not created.
The main.nf file looks like this:
#!/usr/bin/env nextflow
nextflow.enable.dsl=2
process my_script {
"""
Rscript script.R
"""
}
workflow {
my_script
}
In my nextflow.config I have:
process {
executor = 'k8s'
container = 'rocker/r-ver:4.1.3'
}
The script.R looks like this:
FUN <- readRDS("function.rds");
input = readRDS("input.rds");
output = FUN(
singleCell_data_input = input[[1]], savePath = input[[2]], tmpDirGC = input[[3]]
);
saveRDS(output, "output.rds")
After running nextflow run main.nf the output.rds is not created
Nextflow processes are run independently and isolated from each other from inside the working directory. For your script to be able to find the required input files, these must be localized inside the process working directory. This should be done by defining an input block and declaring the files using the path qualifier, for example:
params.function_rds = './function.rds'
params.input_rds = './input.rds'
process my_script {
input:
path my_function_rds
path my_input_rds
output:
path "output.rds"
"""
#!/usr/bin/env Rscript
FUN <- readRDS("${my_function_rds}");
input = readRDS("${my_input_rds}");
output = FUN(
singleCell_data_input=input[[1]], savePath=input[[2]], tmpDirGC=input[[3]]
);
saveRDS(output, "output.rds")
"""
}
workflow {
function_rds = file( params.function_rds )
input_rds = file( params.input_rds )
my_script( function_rds, input_rds )
my_script.out.view()
}
In the same way, the script itself would need to be localized inside the process working directory. To avoid specifying an absolute path to your R script (which would not make your workflow portable at all), it's possible to simply embed your code, making sure to specify the Rscript shebang. This works because process scripts are not limited to Bash1.
Another way, would be to make your Rscript executable and move it into a directory called bin in the the root directory of your project repository (i.e. the same directory as your 'main.nf' Nextflow script). Nextflow automatically adds this folder to the $PATH environment variable and your script would become automatically accessible to each of your pipeline processes. For this to work, you'd need some way to pass in the input files as command line arguments. For example:
params.function_rds = './function.rds'
params.input_rds = './input.rds'
process my_script {
input:
path my_function_rds
path my_input_rds
output:
path "output.rds"
"""
script.R "${my_function_rds}" "${my_input_rds}" output.rds
"""
}
workflow {
function_rds = file( params.function_rds )
input_rds = file( params.input_rds )
my_script( function_rds, input_rds )
my_script.out.view()
}
And your R script might look like:
#!/usr/bin/env Rscript
args <- commandArgs(trailingOnly = TRUE)
FUN <- readRDS(args[1]);
input = readRDS(args[2]);
output = FUN(
singleCell_data_input=input[[1]], savePath=input[[2]], tmpDirGC=input[[3]]
);
saveRDS(output, args[3])

object not found when creating targets list programmatically

I'm trying to generate a {targets} list programmatically, via a function in an R package.
get_pipeline <- function(which_countries) {
countries <- NULL # avoid R CMD CHECK warning
print(which_countries) # Shows that which_countries is available
list(
targets::tar_target(
name = countries,
command = which_countries # But here, which_countries is not found
)
)
}
The _targets.R file looks like this:
library(targets)
couns <- c("USA", "GBR")
TargetsQuestions::get_pipeline(couns)
I see the following error:
> tar_make()
[1] "USA" "GBR"
Error in enexpr(expr) : object 'which_countries' not found
Error in `tar_throw_run()`:
! callr subprocess failed: object 'which_countries' not found
Note that the which_countries variable is printable, but not found in the call to tar_target.
How can I get create the countries target successfully so that it contains the vector c("USA", "GBR")?
This code is in a GitHub repository at https://github.com/MatthewHeun/TargetsQuestions. To reproduce:
git clone https://github.com/MatthewHeun/TargetsQuestions
Build the package in RStudio.
targets::tar_make() at the Console.
Thanks in advance for any suggestions!
Thanks to #landau for pointing to https://wlandau.github.io/targetopia/contributing.html#target-factories which in turn points to the metaprogramming section of Advanced R at https://adv-r.hadley.nz/metaprogramming.html.
The solution turned out to be:
get_pipeline <- function(which_countries) {
list(
targets::tar_target_raw(
name = "countries",
# command = which_countries # which_countries must have length 1
# command = !!which_countries # invalid argument type
command = rlang::enexpr(which_countries) # Works
)
)
}
With _targets.R like this:
library(targets)
couns <- c("USA", "GBR")
TargetsQuestions::get_pipeline(couns)
the commands tar_make() and tar_read(countries), give
[1] "USA" "GBR"
as expected!

Within a gimp python-fu plug-in can one create/invoke a modal dialog (and/or register a procedure that is ONLY to be added as a temp procedure?)

I am trying to add a procedure to pop-up a modal dialog inside a plug-in.
Its purpose is to query a response at designated steps within the control-flow of the plug-in (not just acquire parameters at its start).
I have tried using gtk - I get a dialog but it is asynchronous - the plugin continues execution. It needs to operate as a synchronous function.
I have tried registering a plugin in order to take advantage of the gimpfu start-up dialogue for same. By itself, it works; it shows up in the procedural db when queried. But I never seem to be able to actually invoke it from within another plug-in - its either an execution error or wrong number of arguments no matter how many permutations I try.
[Reason behind all of this nonsense: I have written a lot of extension Python scripts for PaintShopPro. I have written a App package (with App.Do, App.Constants, Environment and the like that lets me begin to port those scripts to GIMP -- yes it is perverse, and yes sometimes the code just has to be rewritten, but for a lot of what I actual use in the PSP.API it is sufficient.
However, debugging and writing the module rhymes with witch. So. I am trying to add emulation of psp's "SetExecutionMode" (ie interactive). If
set, the intended behavior is that the App.Do() method will "pause" after/before it runs the applicable psp emulation code by popping up a simple message dialog.]
A simple modal dialogue within a gimp python-fu plug-in can be implemented via gtk's Dialog interface, specifically gtk.MessageDialog.
A generic dialog can be created via
queryDialogue = gtk.MessageDialog(None, gtk.DIALOG_DESTROY_WITH_PARENT \
gtk.MESSAGE_QUESTION, \
gtk.BUTTONS_OK_CANCEL, "")
Once the dialog has been shown,
a synchronous response may be obtained from it
queryDialogue.show()
response = queryDialogue.run()
queryDialogue.hide()
The above assumes that the dialog is not created and thence destroyed after each use.
In the use case (mentioned in the question) of a modal dialog to manage single stepping through a pspScript in gimp via an App emulator package, the dialogue message contents need to be customized for each use. [Hence, the "" for the message argument in the Constructor. [more below]]
In addition, the emulator must be able to accept a [cancel] response to 'get out of Dodge' - ie quit the entire plug-in (gracefully). I could not find a gimpfu interface for the latter, (and do not want to kill the app entirely via gimp.exit()). Hence, this is accomplished by raising a custom Exception class [appTerminate] within the App pkg and catching the exception in the outer-most scope of the plugin. When caught, then, the plug-in returns (exits).[App.Do() can not return a value to indicate continue/exit/etc, because the pspScripts are to be included verbatim.]
The following is an abbreviated skeleton of the solution -
a plug-in incorporating (in part) a pspScript
the App.py pkg supplying the environment and App.Do() to support the pspScript
a Map.py pkg supporting how pspScripts use dot-notation for parameters
App.py demonstrates creation, customization and use of a modal dialog - App.doContinue() displays the dialogue illustrating how it can be customized on each use.
App._parse() parses the pspScript (excerpt showing how it determines to start/stop single-step via the dialogue)
App._exec() implements the pspScript commands (excerpt showing how it creates the dialogue, identifies the message widget for later customization, and starts/stops its use)
# App.py (abbreviated)
#
import gimp
import gtk
import Map # see https://stackoverflow.com/questions/2352181/how-to- use-a-dot-to-access-members-of-dictionary
from Map import *
pdb = gimp.pdb
isDialogueAvailable = False
queryDialogue = None
queryMessage = None
Environment = Map({'executionMode' : 1 })
_AutoActionMode = Map({'Match' : 0})
_ExecutionMode = Map({'Default' : 0}, Silent=1, Interactive=2)
Constants = Map({'AutoActionMode' : _AutoActionMode}, ExecutionMode=_ExecutionMode ) # etc...
class appTerminate(Exception): pass
def Do(eNvironment, procedureName, options = {}):
global appTerminate
img = gimp.image_list()[0]
lyr = pdb.gimp_image_get_active_layer(img)
parsed = _parse(img, lyr, procedureName, options)
if eNvironment.executionMode == Constants.ExecutionMode.Interactive:
resp = doContinue(procedureName, parsed.detail)
if resp == -5: # OK
print procedureName # log to stdout
if parsed.valid:
if parsed.isvalid:
_exec(img, lyr, procedureName, options, parsed, eNvironment)
else:
print "invalid args"
else:
print "invalid procedure"
elif resp == -6: # CANCEL
raise appTerminate, "script cancelled"
pass # terminate plugin
else:
print procedureName + " skipped"
pass # skip execution, continue
else:
_exec(img, lyr, procedureName, options, parsed, eNvironment)
return
def doContinue(procedureName, details):
global queryMessage, querySkip, queryDialogue
# - customize the dialog -
if details == "":
msg = "About to execute procedure \n "+procedureName+ "\n\nContinue?"
else:
msg = "About to execute procedure \n "+procedureName+ "\n\nDetails - \n" + details +"\n\nContinue?"
queryMessage.set_text(msg)
queryDialogue.show()
resp = queryDialogue.run() # get modal response
queryDialogue.hide()
return resp
def _parse(img, lyr, procedureName, options):
# validate and interpret App.Do options' semantics vz gimp
if procedureName == "Selection":
isValid=True
# ...
# parsed = Map({'valid' : True}, isvalid=True, start=Start, width=Width, height=Height, channelOP=ChannelOP ...
# /Selection
# ...
elif procedureName == "SetExecutionMode":
generalOptions = options['GeneralSettings']
newMode = generalOptions['ExecutionMode']
if newMode == Constants.ExecutionMode.Interactive:
msg = "set mode interactive/single-step"
else:
msg = "set mode silent/run"
parsed = Map({'valid' : True}, isvalid=True, detail=msg, mode=newMode)
# /SetExecutionMode
else:
parsed = Map({'valid' : False})
return parsed
def _exec(img, lyr, procedureName, options, o, eNvironment):
global isDialogueAvailable, queryMessage, queryDialogue
#
try:
# -------------------------------------------------------------------------------------------------------------------
if procedureName == "Selection":
# pdb.gimp_rect_select(img, o.start[0], o.start[1], o.width, o.height, o.channelOP, ...
# /Selection
# ...
elif procedureName == "SetExecutionMode":
generalOptions = options['GeneralSettings']
eNvironment.executionMode = generalOptions['ExecutionMode']
if eNvironment.executionMode == Constants.ExecutionMode.Interactive:
if isDialogueAvailable:
queryDialogue.destroy() # then clean-up and refresh
isDialogueAvailable = True
queryDialogue = gtk.MessageDialog(None, gtk.DIALOG_DESTROY_WITH_PARENT, gtk.MESSAGE_QUESTION, gtk.BUTTONS_OK_CANCEL, "")
queryDialogue.set_title("psp/APP.Do Emulator")
queryDialogue.set_size_request(450, 180)
aqdContent = queryDialogue.children()[0]
aqdHeader = aqdContent.children()[0]
aqdMsgBox = aqdHeader.children()[1]
aqdMessage = aqdMsgBox.children()[0]
queryMessage = aqdMessage
else:
if isDialogueAvailable:
queryDialogue.destroy()
isDialogueAvailable = False
# /SetExecutionMode
else: # should not get here (should have been screened by parse)
raise AssertionError, "unimplemented PSP procedure: " + procedureName
except:
raise AssertionError, "App.Do("+procedureName+") generated an exception:\n" + sys.exc_info()
return
A skeleton of the plug-in itself. This illustrates incorporating a pspScript which includes a request for single-step/interactive execution mode, and thus the dialogues. It catches the terminate exception raised via the dialogue, and then terminates.
def generateWebImageSet(dasImage, dasLayer, title, mode):
try:
img = dasImage.duplicate()
# ...
bkg = img.layers[-1]
frameWidth = 52
start = bkg.offsets
end = (start[0]+bkg.width, start[1]+frameWidth)
# pspScript: (snippet included verbatim)
# SetExecutionMode / begin interactive single-step through pspScript
App.Do( Environment, 'SetExecutionMode', {
'GeneralSettings': {
'ExecutionMode': App.Constants.ExecutionMode.Interactive
}
})
# Selection
App.Do( Environment, 'Selection', {
'General' : {
'Mode' : 'Replace',
'Antialias' : False,
'Feather' : 0
},
'Start': start,
'End': end
})
# Promote
App.Do( Environment, 'SelectPromote' )
# und_so_weiter ...
except App.appTerminate:
raise AssertionError, "script cancelled"
# /generateWebImageSet
# _generateFloatingCanvasSetWeb.register -----------------------------------------
#
def generateFloatingCanvasSetWeb(dasImage, dasLayer, title):
mode="FCSW"
generateWebImageSet(dasImage, dasLayer, title, mode)
register(
"generateFloatingCanvasSetWeb",
"Generate Floating- Frame GW Canvas Image Set for Web Page",
"Generate Floating- Frame GW Canvas Image Set for Web Page",
"C G",
"C G",
"2019",
"<Image>/Image/Generate Web Imagesets/Floating-Frame Gallery-Wrapped Canvas Imageset...",
"*",
[
( PF_STRING, "title", "title", "")
],
[],
generateFloatingCanvasSetWeb)
main()
I realize that this may seem like a lot of work just to be able to include some pspScripts in a gimp plug-in, and to be able to single-step through the emulation. But we are talking about maybe 10K lines of scripts (and multiple scripts).
However, if any of this helps anyone else with dialogues inside plug-ins, etc., so much the better.

Using input function with remote files in snakemake

I want to use a function to read inputs file paths from a dataframe and send them to my snakemake rule. I also have a helper function to select the remote from which to pull the files.
from snakemake.remote.GS import RemoteProvider as GSRemoteProvider
from snakemake.remote.SFTP import RemoteProvider as SFTPRemoteProvider
from os.path import join
import pandas as pd
configfile: "config.yaml"
units = pd.read_csv(config["units"]).set_index(["library", "unit"], drop=False)
TMP= join('data', 'tmp')
def access_remote(local_path):
""" Connnects to remote as defined in config file"""
provider = config['provider']
if provider == 'GS':
GS = GSRemoteProvider()
remote_path = GS.remote(join("gs://" + config['bucket'], local_path))
elif provider == 'SFTP':
SFTP = SFTPRemoteProvider(
username=config['user'],
private_key=config['ssh_key']
)
remote_path = SFTP.remote(
config['host'] + ":22" + join(base_path, local_path)
)
else:
remote_path = local_path
return remote_path
def get_fastqs(wc):
"""
Get fastq files (units) of a particular library - sample
combination from the unit sheet.
"""
fqs = units.loc[
(units.library == wc.library) &
(units.libtype == wc.libtype),
"fq1"
]
return {
"r1": list(map(access_remote, fqs.fq1.values)),
}
# Combine all fastq files from the same sample / library type combination
rule combine_units:
input: unpack(get_fastqs)
output:
r1 = join(TMP, "reads", "{library}_{libtype}.end1.fq.gz")
threads: 12
run:
shell("cat {i1} > {o1}".format(i1=input['r1'], o1=output['r1']))
My config file contains the bucket name and provider, which are passed to the function. This works as expected when running simply snakemake.
However, I would like to use the kubernetes integration, which requires passing the provider and bucket name in the command line. But when I run:
snakemake -n --kubernetes --default-remote-provider GS --default-remote-prefix bucket-name
I get this error:
ERROR :: MissingInputException in line 19 of Snakefile:
Missing input files for rule combine_units:
bucket-name/['bucket-name/lib1-unit1.end1.fastq.gz', 'bucket-name/lib1-unit2.end1.fastq.gz', 'bucket-name/lib1-unit3.end1.fastq.gz']
The bucket is applied twice (once mapped correctly to each element, and once before the whole list (which gets converted to a string). Did I miss something ? Is there a good way to work around this ?

Scala command line parser using Scallop

I'm fairly new to Scala and need to build a really simple command line parser which provides something like the following which I created using JRuby in a few minutes:-
java -jar demo.jar --help
Command Line Example Application
Example: java -jar demo.jar --dn "CN=Test" --nde-url "http://www.example.com" --password "password"
For usage see below:
-n http://www.example.com
-p, --password set the password
-c, --capi set add to Windows key-store
-h, --help Show this message
-v, --version Print version
Scallop looks like it will do the trick, but I can't seem to find a simple example that works! All of the examples I've found seem to be fragmented and don't work for some reason or other.
UPDATE
I found this example which works, but I'm not sure how to bind it into the actual args within the main method.
import org.rogach.scallop._;
object cmdlinetest {
def main(args: Array[String])
val opts = Scallop(List("-d","--num-limbs","1"))
.version("test 1.2.3 (c) 2012 Mr Placeholder")
.banner("""Usage: test [OPTION]... [pet-name]
|test is an awesome program, which does something funny
|Options:
|""".stripMargin)
.footer("\nFor all other tricks, consult the documentation!")
.opt[Boolean]("donkey", descr = "use donkey mode")
.opt("monkeys", default = Some(2), short = 'm')
.opt[Int]("num-limbs", 'k',
"number of libms", required = true)
.opt[List[Double]]("params")
.opt[String]("debug", hidden = true)
.props[String]('D',"some key-value pairs")
// you can add parameters a bit later
.args(List("-Dalpha=1","-D","betta=2","gamma=3", "Pigeon"))
.trailArg[String]("pet name")
.verify
println(opts.help)
}
}
Well, I'll try to add more examples :)
In this case, it would be much better to use ScallopConf:
import org.rogach.scallop._
object Main extends App {
val opts = new ScallopConf(args) {
banner("""
NDE/SCEP Certificate enrollment prototype
Example: java -jar demo.jar --dn CN=Test --nde-url http://www.example.com --password password
For usage see below:
""")
val ndeUrl = opt[String]("nde-url")
val password = opt[String]("password", descr = "set the password")
val capi = toggle("capi", prefix = "no-", descrYes = "enable adding to Windows key-store", descrNo = "disable adding to Windows key-store")
val version = opt[Boolean]("version", noshort = true, descr = "Print version")
val help = opt[Boolean]("help", noshort = true, descr = "Show this message")
}
println(opts.password())
}
It prints:
$ java -jar demo.jar --help
NDE/SCEP Certificate enrollment prototype
Example: java -jar demo.jar --dn CN=Test --nde-url http://www.example.com --password password
For usage see below:
-c, --capi enable adding to Windows key-store
--no-capi disable adding to Windows key-store
--help Show this message
-n, --nde-url <arg>
-p, --password <arg> set the password
--version Print version
Did you read the documentation? It looks like all you have to do is call get for each option you want:
def get [A] (name: String)(implicit m: Manifest[A]): Option[A]
It looks like you might need to provide the expected return type in the method call. Try something like this:
val donkey = opts.get[Boolean]("donkey")
val numLimbs = opts.get[Int]("num-limbs")
If you're just looking for a quick and dirty way to parse command line arguments, you can use pirate, an extremely barebones way to parse arguments. Here is what it would look like to handle the usage you describe above:
import com.mosesn.pirate.Pirate
object Main {
def main(commandLineArgs: Array[String]) {
val args = Pirate("[ -n string ] [ -p string ] [ -chv ]")("-n whatever -c".split(" "))
val c = args.flags.contains('c')
val v = args.flags.contains('v')
val h = args.flags.contains('h')
val n = args.strings.get("n")
val p = args.strings.get("p")
println(Seq(c, v, h, n, p))
}
}
Of course, for your program, you would pass commandLineArgs instead of "-n whatever -c".
Unfortunately, pirate does not yet support GNU style arguments, nor the version or help text options.