Still problems with R CMD check: undocumented S4 classes

I read the long list of questions posted on Stack Overflow and other websites about problems with R CMD check and missing documentation. However, I still have not found the right information to make roxygen2 generate the correct .Rd file and silence the WARNING in R CMD check.
The WARNING is generated when documenting an S4 class.
I generated a package "TMP" composed of just one S4 class, XPSCoreLine.
This is my roxygen2 header and the R code:
#' @title class XPSCoreLine
#' @description definition of the coreLine class for the XPS Core-Line spectra
#'
#' @slot RegionToFit the portion of the spectrum to fit
#' @slot Baseline the Baseline applied to subtract background
#' @slot Components the fitting components
#' @slot Fit the best fit
#' @slot Boundaries the values of the RegionToFit edges
#' @slot RSF the relative sensitivity factor associated with the element spectrum
#' @slot Shift the energy correction shift if charging present
#' @slot units the adopted units: kinetic/binding energy, counts/counts_per_second
#' @slot Flags logical
#' @slot Info information regarding the spectrum acquisition
#' @slot Symbol symbol of the element associated to the spectrum
#'
#' @aliases XPSCoreLine
#' @keywords XPSCoreLine
#' @name XPSCoreLine
#' @rdname XPSCoreLine
#' @docType class
#'
#' @examples
#' \dontrun{
#' test <- new("XPSCoreLine", Info="test", units=c("Binding [eV]", "Counts"))
#' }
#' @exportClass XPSCoreLine
#'
setClass("XPSCoreLine",
representation(
RegionToFit="list",
Baseline="list",
Components="list",
Fit="list",
Boundaries="list",
RSF="numeric",
Shift="numeric",
units="character",
Flags="logical",
Info="character",
Symbol="character"
),
prototype(
RegionToFit=list(),
Baseline=list(),
Components=list(),
Fit=list(),
RSF=0,
Shift=0,
Boundaries=list(),
units=c("Binding Energy [eV]","Intensity [cps]"),
Flags=c(TRUE, FALSE, FALSE),
Info="",
Symbol=""
),
contains = "list"
)
This generates the following .../man/XPSCoreLine.Rd:
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/XPSClass.r
\docType{class}
\name{XPSCoreLine}
\alias{XPSCoreLine}
\title{class XPSCoreLine}
\description{
definition of the coreLine class for the XPS Core-Line spectra
}
\section{Slots}{
\describe{
\item{\code{RegionToFit}}{the portion of the spectrum to fit}
\item{\code{Baseline}}{the Baseline applied to subtract background}
\item{\code{Components}}{the fitting components}
\item{\code{Fit}}{the best fit}
\item{\code{Boundaries}}{the values of the RegionToFit edges}
\item{\code{RSF}}{the relative sensitivity factor associated with the element spectrum}
\item{\code{Shift}}{the energy correction shift if charging present}
\item{\code{units}}{the adopted units: kinetic/binding energy, counts/counts_per_second}
\item{\code{Flags}}{logical}
\item{\code{Info}}{information regarding the spectrum acquisition}
\item{\code{Symbol}}{symbol of the element associated to the spectrum}
}}
\examples{
\dontrun{
test <- new("XPSCoreLine", Info="test", units=c("Binding [eV]", "Counts"))
}
}
\keyword{XPSCoreLine}
The NAMESPACE contains:
# Generated by roxygen2: do not edit by hand
exportClasses(XPSCoreLine)
In RStudio, the command check_man applied to the package "TMP" composed of just the S4 class XPSCoreLine defined above gives:
> check_man("~/R/tmp/")
ℹ Updating TMP documentation
ℹ Loading TMP
ℹ Checking documentation...
✔ No issues detected
However, running devtools::check with the command check("~/R/tmp/", document=TRUE), I am unable to get rid of the WARNING:
checking for missing documentation entries ... WARNING
Undocumented S4 classes:
‘XPSCoreLine’
Undocumented S4 methods:
generic 'hasBoundaries' and siglist 'XPSCoreLine'
All user-level objects in a package (including S4 classes and methods)
should have documentation entries.
See chapter ‘Writing R documentation files’ in the ‘Writing R
Extensions’ manual.
Can anybody suggest a solution?
Thanks in advance,
G. Speranza

Following a private mail exchange with Hadley Wickham, the roxygen2 maintainer, he kindly answered my request, suggesting to remove
#' @aliases XPSCoreLine
#' @keywords XPSCoreLine
#' @name XPSCoreLine
#' @rdname XPSCoreLine
#' @docType class
and replace
#' @exportClass XPSCoreLine
with
#' @export
"Otherwise you've effectively disabled all the S4 stuff that roxygen2 does for you."
This makes R CMD check recognize the S4 class 'XPSCoreLine'. I then got a second WARNING from R CMD check:
❯ checking for code/documentation mismatches ... WARNING
S4 class codoc mismatches from documentation object 'XPSCoreLine-class':
Slots for class 'XPSCoreLine'
Code: .Data Baseline Boundaries Components Fit Flags Info RSF
RegionToFit Shift Symbol units
Docs: Baseline Boundaries Components Fit Flags Info RSF RegionToFit
Shift Symbol units
The slot .Data is implicitly introduced in the S4 class because it contains the basic type "list"; it is not included in the representation or the prototype. After including it in the roxygen header, the correct code for the 'XPSCoreLine' class is:
#' @title class XPSCoreLine
#' @description definition of the coreLine class for the XPS Core-Line spectra
#'
#' @slot .Data contains the X, Y spectral data
#' @slot RegionToFit the portion of the spectrum to fit
#' @slot Baseline the Baseline applied to subtract background
#' @slot Components the fitting components
#' @slot Fit the best fit
#' @slot Boundaries the values of the RegionToFit edges
#' @slot RSF the relative sensitivity factor associated with the element spectrum
#' @slot Shift the energy correction shift if charging present
#' @slot units the adopted units: kinetic/binding energy, counts/counts_per_second
#' @slot Flags logical
#' @slot Info information regarding the spectrum acquisition
#' @slot Symbol symbol of the element associated to the spectrum
#'
#' @examples
#' \dontrun{
#' test <- new("XPSCoreLine", Info="test", units=c("Binding [eV]", "Counts"))
#' }
#' @export
#'
setClass("XPSCoreLine",
representation(
RegionToFit="list",
Baseline="list",
Components="list",
Fit="list",
Boundaries="list",
RSF="numeric",
Shift="numeric",
units="character",
Flags="logical",
Info="character",
Symbol="character"
),
prototype(
RegionToFit=list(),
Baseline=list(),
Components=list(),
Fit=list(),
RSF=0,
Shift=0,
Boundaries=list(),
units=c("Binding Energy [eV]","Intensity [cps]"),
Flags=c(TRUE, FALSE, FALSE),
Info="",
Symbol=""
),
contains = "list"
)
This removes the last R CMD check warning...

Related

Extending Stargazer to multiwaycov

I'm using stargazer to create regression outputs for my bachelor thesis. Due to the structure of my data I have to use clustered models (code below). I'm using the cluster.vcov command from the multiwayvcov package, which works perfectly. However, stargazer does not support it. Do you know another way to create outputs as nice as stargazer does? Or do you know another package/command to cluster the models that is supported by stargazer?
library(multiwayvcov)  # for cluster.vcov()
library(lmtest)        # for coeftest()
model1.1.2 <- lm(leaflet ~ partisan + as.factor(gender) + age + as.factor(education) + meaning + as.factor(polintrest), data = voxit)
summary(model1.1.2)
# clustering
vcov_clust1.1.2 <- cluster.vcov(model1.1.2, cbind(voxit$id, voxit$projetx))
coeftest(model1.1.2, vcov_clust1.1.2)
You can supply the adjusted p- and se-values to stargazer manually.
# model1 and model2 are both objects returned from coeftest()
# Capture them in an object and extract the ses (2nd column) and ps (4th column) in a list
ses <- list(model1[,2], model2[,2])
ps <- list(model1[,4], model2[,4])
# you can then run your normal stargazer command and supply
# the se- and p-values manually to the stargazer function
stargazer(model1, model2, type = "text", se = ses, p = ps, p.auto = F)
Hope this helps!

Cannot bracket slice for node, WinBUGS

So I have been working on this code for a while, running hazard models for a population, and for one of my initial parameters I cannot figure out how to fix it. This is the area of code that is giving me problems. I write the code in Notepad++, run it in R, and it opens WinBUGS to run the model. hz.scale is the parameter that I am struggling with and that the error message names.
# Define model - Model 02 = hazard
if(detect.fun=="hazard"){
sink(paste(top.dir,model.name,sep="/"))
cat("
model{
# Priors
psi ~ dunif(0,1) # Data augmentation parameter
#hz.scale ~ dgamma(0.1,0.1) # Scale parameter for hazard distribution (sigma) #ORIGINAL CODE
hz.scale ~ dgamma(0,10) # Scale parameter for hazard distribution (sigma) #change the parameters to test if it really runs
hz.shape ~ dunif(0.001,1000) # Shape parameter for hazard distribution (b in green & yellow book)
lambda ~ dunif(0,20) # Expectation for cluster size
# Likelihood
## Construct conditional detection probability (log(p[g])) and Pr(x) (pi[g]) for each bin (g)
for(g in 1:nD){
# Hazard model:
cloglog(p[g]) <- hz.scale*log(hz.shape) - hz.scale*log(midpt[g]) # Kery and Royle 2016 page 507
# Probability of x in each interval
pi[g] <- delta[g]/B
}#g
for(i in 1:(nclus+nz)){
z[i] ~ dbern(psi) # Real observation or augmented?
dclass[i] ~ dcat(pi[]) # population distribution of distance class
mu[i] <- z[i] * p[dclass[i]] # p depends on distance class
y[i] ~ dbern(mu[i]) # Observed or not?
clsz[i] ~ dpois(lambda) # Poisson process for cluster size
}#i
# Derived quantities
num.clusters <- sum(z[]) # Number of clusters
back.lambda <- lambda+1 # Back-transformed lambda (minimum is 1)
total.pop <- num.clusters*back.lambda
}#model
",fill=TRUE)#cat
sink()
# Inits function and parameters to save
inits <- function(){list(psi=runif(1),z=y,hz.shape=runif(1,40,200),hz.scale=rgamma(1,0.1,0.1),lambda=runif(1,0,10))}
params <- c("hz.scale","hz.shape","num.clusters","back.lambda","total.pop")
}# hazard
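One thing worth noting about the changed prior: a Gamma distribution is only defined for shape > 0, so `dgamma(0,10)` is not a valid prior, whereas the original `dgamma(0.1,0.1)` is. As a language-neutral sanity check (plain stdlib Python, not WinBUGS), the Gamma normalizing constant involves Γ(shape), which diverges as the shape parameter approaches 0:

```python
import math

# The Gamma(shape, rate) density has normalizing constant
# rate**shape / Gamma(shape); Gamma(shape) -> infinity as shape -> 0,
# so a shape of exactly 0 gives no valid distribution.
for shape in (1.0, 0.1, 0.01, 0.001):
    print(shape, math.gamma(shape))
# Gamma(0.001) is roughly 1000: the constant blows up near 0
```

If the slice sampler still cannot bracket, it is also worth checking that the initial values lie in a region where the prior density is not vanishingly small.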

GLM.fit() in Matlab vs. Python Statsmodels: why the different results?

In what ways is Matlab's glmfit implemented differently than Python statsmodels' GLM.fit()?
Here is a comparison of their results on my dataset (figure omitted: a plot of the 209 fitted weights from each implementation), generated from running the GLM fit on:
V: (100000, 209) predictor variable (design matrix)
y: (100000,1) response variable
Sum of squared errors: 18.140615678
A Specific Example
Why are these different? First, here's a specific example in Matlab:
yin = horzcat(y,ones(size(y)));
[weights_mat, d0, st0]=glmfit(V, yin,'binomial','probit','off',[],[],'off');
Let's try the equivalent in Python:
import numpy as np
import statsmodels.api as sm
## set up GLM
y = np.concatenate((y, np.ones([len(y), 1])), axis=1)
sm_probit_Link = sm.genmod.families.links.probit
glm_binom = sm.GLM(sm.add_constant(y), sm.add_constant(V_design_matrix), family=sm.families.Binomial(link=sm_probit_Link))
# statsmodels.GLM format: glm_binom = sm.GLM(data.endog, data.exog, family)
## Run GLM fit
glm_result = glm_binom.fit()
weights_py = glm_result.params
## Compare the difference
weights_mat_import = Matpy.get_output('w_output.mat', 'weights_mat')  # imports the Matlab variables
print(SSE(weights_mat_import, weights_py))
Let's Check The Docs
glmfit in Matlab:
[b,dev,stats] = glmfit(X,y,distr)
GLM.fit() setup in Python (documentation):
glm_model = sm.GLM(endog, exog, family=None, offset=None, exposure=None, missing='none', **kwargs)
glm_model.fit(start_params=None, maxiter=100, method='IRLS', tol=1e-08, scale=None, cov_type='nonrobust', cov_kwds=None, use_t=None, **kwargs)
How might we get Matlab glmfit results with Statsmodels?
Thank you!
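One API difference worth checking (an observation about the two interfaces, not a verified resolution of the numeric gap above): for a two-column binomial response, Matlab's glmfit expects [successes, trials] while statsmodels' GLM expects [successes, failures]; also, the trailing 'off' in the Matlab call disables the intercept, whereas the Python code adds a constant, and sm.add_constant belongs on the design matrix (exog), not on the response. A minimal, dependency-free sketch of the response conversion (the helper name is made up for illustration):

```python
def matlab_to_statsmodels_endog(successes, trials):
    """Convert a Matlab-style [successes, trials] binomial response
    into the [successes, failures] layout statsmodels expects."""
    return [[s, n - s] for s, n in zip(successes, trials)]

# e.g. 3 successes out of 10 trials -> [3, 7]
print(matlab_to_statsmodels_endog([3, 10], [10, 10]))  # [[3, 7], [10, 0]]
```

Aligning the response layout and the intercept handling between the two calls is a reasonable first step before comparing coefficients.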

Logistic Regression(Classification Technique) on Time-dependent Predictors/variables Data

I would like to know if I can apply classification techniques, like, say, logistic regression, to data whose variables/predictors are 'indexed' by time, or, if not, what classification techniques are appropriate for these kinds of data.
To give you a clear picture of the problem, say I have a dependent variable Y, whose values are 0 or 1 (for binary case classification), or 1,2,3,... (for 'multi' classification).
And I have predictor variables which are 'indexed' by time, i.e., X1T1, X1T2,...,X1Tn, X2T1, X2T2,..., X2Tm,....XpTk,
where
X1T1 = values of variable X1 at time 1 (T1)
X1T2 = values of variable X1 at time 2 (T2)
.
.
X1Tn = values of variable X1 at time n (Tn)
X2T1 = values of variable X2 at time 1 (T1)
X2T2 = values of variable X2 at time 2 (T2)
.
.
X2Tm = values of variable X2 at time m (Tm)
.
.
.
XpTk = values of variable Xp at time k (Tk)
where n,m,k = 1,2,... (variable time 'index')
p =1,2,.... (# of predictor variables).
For the data view, we have;
Obs Y X1T1 X1T2 ... X1Tn X2T1 X2T3 ... X2Tm ... XpTk
1 . . . . . . . .
2 . . . . . . . ... .
.
.
.
N . . . . . . . ... .
Can I apply a classification technique, like, say, logistic regression, to these types of data (or other classification techniques for a 'multi' category response variable, like tree-based methods)? Thanks a lot!
You could use your data to fit a logistic regression model.
However, the result may not be very good, because the algorithm would consider each variable independently.
One way to use your data to fit an LR model is to construct new variables that can reasonably be treated as independent of each other while keeping enough information to represent your original data.
For example:
newvar1 = mean(X1T1, X1T2, ..., X1Tn)
newvar2 = sd(X1T1, X1T2, ..., X1Tn)
newvar3 = mean increasing ratio (i.e., trend) of (X1T1, X1T2, ..., X1Tn)
...
This way you can use your data to fit an LR model, even though these example variables alone may not be enough.
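To make the suggestion concrete, here is a small, dependency-free sketch (plain Python, purely illustrative; the function name is made up) that collapses one time-indexed series X1T1...X1Tn into summary features of the kind described: the mean, the standard deviation, and an OLS trend slope standing in for the "mean increasing ratio":

```python
from statistics import mean, stdev

def summarize_series(xs):
    """Collapse one time-indexed series into summary features
    (mean, sample sd, OLS trend slope) that can be used as
    predictors in a logistic regression."""
    t = list(range(len(xs)))
    m, tbar = mean(xs), mean(t)
    # slope of the least-squares line of xs against time:
    # cov(t, x) / var(t)
    slope = (sum((ti - tbar) * (xi - m) for ti, xi in zip(t, xs))
             / sum((ti - tbar) ** 2 for ti in t))
    return m, stdev(xs), slope

print(summarize_series([1, 2, 3, 4]))  # mean 2.5, sd ~ 1.29, slope 1.0
```

Each original series then contributes a fixed, small set of columns to the design matrix, regardless of how many time points it has.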

Bootstrapping Stepwise Regression in Stata

I'm trying to bootstrap a stepwise regression in Stata and extract the bootstrapped coefficients. I have two separate ado files. sw_pbs is the command the user uses, which calls the helper command sw_pbs_simulator.
program define sw_pbs, rclass
syntax varlist, [reps(integer 100)]
simulate _b, reps(`reps') : sw_pbs_simulator `varlist'
end
program define sw_pbs_simulator, rclass
syntax varlist
local depvar : word 1 of `varlist'
local indepvar : list varlist - depvar
reg `depvar' `indepvar'
local rmse = e(rmse)
matrix b_matrix = e(b)'
gen col_of_ones = 1
mkmat `indepvar' col_of_ones, mat(x_matrix)
gen errs = rnormal(0, `rmse')
mkmat errs, mat(e_matrix)
matrix y = x_matrix * b_matrix + e_matrix
svmat y
sw reg y `indepvar', pr(0.10) pe(0.05)
drop col_of_ones errs y
end
The output is a data set of the bootstrapped coefficients. My problem is that the output seems to depend on the result of the first stepwise regression simulation. For example, if I have the independent variables var1 var2 var3 var4 and the first stepwise simulation includes only var1 and var2 in the model, then only var1 and var2 will appear in subsequent models. If the first simulation includes var1, var2 and var3, then only var1, var2 and var3 will appear in subsequent models, assuming they are significant (if not, their coefficients appear as dots).
For example, the incorrect output is shown below. The variables lweight, age, lbph, svi, gleason and pgg45 never appear if they do not appear in the first simulation.
_b_lweight _b_age _b_lbph _b_svi _b_lcp _b_gleason _b_pgg45 _b_lpsa
.4064831 .5390302
.2298697 .5591789
.2829061 .6279869
.5384691 .6027049
.3157105 .5523808
I want coefficients that are not included in the model to always appear as dots in the data set and I want subsequent simulations to not be seemingly dependent on the first simulation.
By using _b as a shortcut, the first iteration defined which coefficients were to be stored by simulate in all subsequent iterations. That is fine for most simulation programs, as those use a fixed set of coefficients, but it is not what you want in combination with sw. So I adapted the program to explicitly list the coefficients (possibly missing when not selected) that are to be stored.
I also changed your programs so that they run faster, by avoiding mkmat and svmat and replacing those computations with predict and generate. I also changed the command to fit better with conventions in the Stata community: a command should only replace a dataset in memory after the user explicitly asks for it by specifying the clear option. Finally, I made sure that the names of variables and scalars created in the program do not conflict with names already present in memory, by using tempvar and tempname; these are also automatically deleted when the program ends.
clear all
program define sw_pbs, rclass
syntax varlist, clear [reps(integer 100)]
gettoken depvar indepvar : varlist
foreach var of local indepvar {
local res "`res' `var'=r(`var')"
}
simulate `res', reps(`reps') : sw_pbs_simulator `varlist'
end
program define sw_pbs_simulator, rclass
syntax varlist
tempname rmse b
tempvar yhat y
gettoken depvar indepvar : varlist
reg `depvar' `indepvar'
scalar `rmse' = e(rmse)
predict double `yhat' if e(sample)
gen double `y' = `yhat' + rnormal(0, `rmse')
sw reg `y' `indepvar', pr(0.10) pe(0.05)
// start returning coefficients
matrix `b' = e(b)
local in : colnames `b'
local out : list indepvar - in
foreach var of local in {
return scalar `var' = _b[`var']
}
foreach var of local out {
return scalar `var' = .
}
end
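The essence of the storage fix can be sketched language-neutrally (plain Python, purely illustrative, since the point is the storage scheme rather than Stata syntax): instead of letting the first iteration define which columns get stored, fix the full coefficient list up front and fill non-selected variables with a missing value:

```python
# Each iteration of the stepwise selection keeps only some variables;
# normalize every iteration to the same fixed column set, with None
# marking "not selected" (the analogue of Stata's dot).
all_vars = ["var1", "var2", "var3", "var4"]

def normalize(selected_coefs):
    """Map a partial {variable: coefficient} dict onto the full
    variable list, filling gaps with None."""
    return {v: selected_coefs.get(v) for v in all_vars}

runs = [{"var1": 0.41, "var2": 0.54}, {"var1": 0.23, "var3": 0.63}]
for r in runs:
    print(normalize(r))
# every row now has all four columns; unselected ones are None
```

This mirrors what the `r(`var')` return scalars do in the adapted sw_pbs_simulator: each replication returns a value (possibly missing) for every candidate variable, so simulate's column set no longer depends on the first replication.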