Remove variable labels attached with foreign/Hmisc SPSS import functions - class

As usual, I received an SPSS file and imported it into R with the spss.get function from the Hmisc package. I'm bothered by the labelled class that Hmisc::spss.get adds to every variable in the data.frame, so I want to remove it.
The labelled class gives me headaches when I try to run ggplot, or even when I want to do some menial analysis! One solution would be to remove the labelled class from each variable in the data.frame. How can I do that? Is it possible at all? If not, what are my other options?
I really want to avoid re-editing the variables "from scratch" with as.data.frame(lapply(x, as.numeric)), and as.character where applicable... And I certainly don't want to run SPSS and remove the labels manually (I don't like SPSS, nor do I care to install it)!
Thanks!

Here's how I get rid of the labels altogether. Similar to Jyotirmoy's solution but works for a vector as well as a data.frame. (Partial credits to Frank Harrell)
clear.labels <- function(x) {
  if(is.list(x)) {
    for(i in 1 : length(x)) class(x[[i]]) <- setdiff(class(x[[i]]), 'labelled')
    for(i in 1 : length(x)) attr(x[[i]],"label") <- NULL
  }
  else {
    class(x) <- setdiff(class(x), "labelled")
    attr(x, "label") <- NULL
  }
  return(x)
}
Use as follows:
my.unlabelled.df <- clear.labels(my.labelled.df)
EDIT
Here's a bit of a cleaner version of the function, same results:
clear.labels <- function(x) {
  if(is.list(x)) {
    for(i in seq_along(x)) {
      class(x[[i]]) <- setdiff(class(x[[i]]), 'labelled')
      attr(x[[i]],"label") <- NULL
    }
  } else {
    class(x) <- setdiff(class(x), "labelled")
    attr(x, "label") <- NULL
  }
  return(x)
}

A belated note/warning regarding class membership in R objects: the correct way to check for "labelled" is not an is.* function or an equality test (==) but inherits. Methods that test a specific position in the class vector will miss cases where the classes are not in the assumed order.
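To illustrate the point about inherits, here is a small sketch. The vectors are built by hand with structure() rather than imported with spss.get, and the class name "myclass" is made up for the example.

```r
# A factor carrying "labelled" as its first class, mimicking spss.get output
# (built by hand here; this is an assumption, not real imported data).
x <- structure(factor(c("a", "b")), class = c("labelled", "factor"), label = "X")

# A hypothetical object where "labelled" is not the first class.
y <- structure(1:3, class = c("myclass", "labelled"), label = "Y")

# Positional test: only looks at the first element of the class vector.
first_is_labelled <- c(x = class(x)[1] == "labelled",
                       y = class(y)[1] == "labelled")

# inherits() looks at the whole class vector.
inherits_labelled <- c(x = inherits(x, "labelled"),
                       y = inherits(y, "labelled"))

first_is_labelled   # the positional test misses y
inherits_labelled   # inherits() finds "labelled" in both
```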
You can avoid creating "labelled" variables in spss.get with the argument use.value.labels=FALSE:
w <- spss.get('/tmp/my.sav', use.value.labels=FALSE, datevars=c('birthdate','deathdate'))
The code from Bhattacharya could fail if the class of the labelled vector were simply "labelled" rather than c("labelled", "factor") in which case it should have been:
class(x[[i]]) <- NULL # no error from assignment of empty vector
The error you report can be reproduced with this code:
> b <- 4:6
> label(b) <- 'B Label'
> str(b)
Class 'labelled' atomic [1:3] 4 5 6
..- attr(*, "label")= chr "B Label"
> class(b) <- class(b)[-1]
Error in class(b) <- class(b)[-1] :
invalid replacement object to be a class string
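One way to sidestep that error is to fall back to unclass() when "labelled" is the only class. The following is a sketch only: the helper name drop_labelled is hypothetical, and the labelled vector is built by hand with structure() instead of Hmisc's label().

```r
# Hypothetical helper: remove only the "labelled" class and the label attribute.
drop_labelled <- function(x) {
  remaining <- setdiff(class(x), "labelled")
  attr(x, "label") <- NULL
  if (length(remaining) == 0) {
    # "labelled" was the only class: unclass() avoids assigning an empty class
    x <- unclass(x)
  } else {
    class(x) <- remaining
  }
  x
}

# Built by hand to mimic the str() output above (the class is just "labelled").
b <- structure(4:6, class = "labelled", label = "B Label")
b2 <- drop_labelled(b)

class(b2)        # implicit class of the underlying integer vector
attributes(b2)   # no class or label attribute left
```

The same helper leaves the other classes alone, so a c("labelled", "factor") vector stays a factor.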

You can try out the read.spss function from the foreign package.
A rough and ready way to get rid of the labelled class created by spss.get:
for (i in 1:ncol(x)) {
  z <- class(x[[i]])
  if (z[[1]] == 'labelled') {
    class(x[[i]]) <- z[-1]
    attr(x[[i]], 'label') <- NULL
  }
}
But can you please give an example where labelled causes problems?
If I have a variable MAED in a data frame x created by spss.get, I have:
> class(x$MAED)
[1] "labelled" "factor"
> is.factor(x$MAED)
[1] TRUE
So well-written code that expects a factor (say) should not have any problems.

Suppose:
library(Hmisc)
w <- spss.get('...')
You could remove the labels of a variable called "var1" by using:
attributes(w$var1)$label <- NULL
If you also want to remove the class "labelled", you could do:
class(w$var1) <- NULL
or if the variable has more than one class:
class(w$var1) <- class(w$var1)[-which(class(w$var1)=="labelled")]
Hope this helps!
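One caveat with the -which(...) idiom in the last line: if "labelled" is not among the classes, which() returns an empty vector and the negative index then selects nothing. A sketch of the difference, using a plain character vector in place of class(w$var1):

```r
# Stands in for class(w$var1) on a variable that has no "labelled" class.
cls <- "factor"

dropped <- cls[-which(cls == "labelled")]  # which() is empty, so nothing is selected
kept    <- cls[cls != "labelled"]          # logical indexing keeps "factor"
also    <- setdiff(cls, "labelled")        # same result via setdiff()

length(dropped)   # 0
kept              # "factor"
```

Logical indexing or setdiff() behaves the same whether or not "labelled" is present, which makes either a safer default in a loop over many columns.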

Well, I figured out that the unclass function can be used to remove classes (who would have guessed?!):
library(Hmisc)
# let's presuppose that variable x is gathered through spss.get() function
# and that x is factor
> class(x)
[1] "labelled" "factor"
> foo <- unclass(x)
> class(foo)
[1] "integer"
It's not the best solution; just imagine converting a bunch of vectors back... If anyone tops this, I'll accept it as the answer...
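To make the trade-off concrete, here is a sketch comparing unclass() with removing only the "labelled" class. The vector is built by hand with structure() as a stand-in for a variable read by spss.get.

```r
# Hand-built stand-in for a labelled factor from spss.get (an assumption).
x <- structure(factor(c("low", "high", "low")),
               class = c("labelled", "factor"), label = "Level")

foo <- unclass(x)   # drops every class: integer codes plus leftover attributes
is.factor(foo)      # FALSE

bar <- x
class(bar) <- setdiff(class(bar), "labelled")   # leaves "factor" in place
attr(bar, "label") <- NULL
is.factor(bar)      # TRUE
as.character(bar)   # the original level names are still available
```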

scala nested for/yield generator to extract substring

I am new to Scala. Please be gentle. My problem at the moment is a syntax error.
(But my ultimate goal is to print each group of 3 characters from every string in the list... for now I am merely printing the first 3 characters of every string.)
def do_stuff():Unit = {
val s = List[String]("abc", "fds", "654444654")
for {
i <- s.indices
r <- 0 to s(i).length by 3
println(s(i).substring(0,3))
} yield {s(i)}
}
do_stuff()
I am getting this error. It is syntax related, but I don't understand it.
Error:(12, 18) ')' expected but '.' found.
println(s(i).substring(0,3))
That code doesn't compile because in a for-comprehension you can't put a bare statement such as println; each clause needs to be a generator, a guard, or an assignment. In this case, a dummy assignment solves your problem:
_ = println(s(i).substring(0,3))
EDIT
If you want the combinations of 3 elements in every String, you can use the combinations method from collections.
List("abc", "fds", "654444654").flatMap(_.combinations(3).toList)

Flatten syntax with yield - improving code readability

I'm trying to improve the readability of my code and I'm having a hard time with this little chunk.
foo is a method that accepts a List[Ping]
Ping.generate returns a List[Ping]
listOfPings is a List[Ping]
hasQuality returns a boolean value from evaluating a Ping
Here's the code:
foo((for {
pinger <- listOfPings
} yield pinger.generate.filter(_.hasQuality)).flatten)
Each Ping in listOfPings creates a List[Ping] with the generate method, so the result of the yield at the end of the loop is a List[List[Ping]].
I'm flattening that List[List[Ping]] (not the individual lists) and passing the whole result into foo.
I'm having trouble making this look nicer, potentially with a flatMap? I sincerely appreciate the help.
Something like:
foo {
for (p <- listOfPings ; q <- p.generate if q.hasQuality) yield q
}

A simple model in Winbugs but it says "This chain contains uninitialized variables"

I have some simple time-to-event data with no covariates, and I am trying to fit a Weibull distribution to it with the following code. Everything looks good until I load my initial values, when it says "this chain contains uninitialized variables". I don't understand why: I think the Weibull distribution has only 2 parameters, and I have already specified both. Could you please advise? Thanks!
model
{
for(i in 1 : N) {
t[i] ~ dweib(r, mu)I(t.cen[i],)
}
mu ~ dexp(0.001)
r ~ dexp(0.001)
}
# Data
list(
t.cen=c(0,3.91,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,21.95,23.98,33.08),
t=c(2.34,NA,5.16,5.63,6.17,6.8,7.03,8.05,8.13,8.36,8.83,10.16,
10.55,10.94,11.48,11.95,13.05,13.59,16.02,20.08,NA,NA,
NA),
N=23
)
# Initial values
list(
r=3,mu=3
)
The other uninitialised variables are the missing (NA) values in the vector of t. Remember that the BUGS language makes no distinction between data and parameters, and that supplying something as data with the value NA is equivalent to not supplying it as data.
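If the data are prepared in R, initial values for the missing entries of t can be built alongside r and mu. This is only a sketch: the offset of 1 above the censoring time is an arbitrary choice of mine, and positions that hold observed data are left as NA in the initial values.

```r
# Data copied from the question.
t.cen <- c(0, 3.91, rep(0, 18), 21.95, 23.98, 33.08)
t <- c(2.34, NA, 5.16, 5.63, 6.17, 6.8, 7.03, 8.05, 8.13, 8.36, 8.83, 10.16,
       10.55, 10.94, 11.48, 11.95, 13.05, 13.59, 16.02, 20.08, NA, NA, NA)

# Initial value only where t is missing; it must lie above the censoring time.
# The "+ 1" is an arbitrary offset chosen for illustration.
t.init <- ifelse(is.na(t), t.cen + 1, NA_real_)

inits <- list(r = 3, mu = 3, t = t.init)
str(inits)
```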

Forward reference extends over definition of variable in scala

I have a list. For all the numbers in odd positions I want to make them 0, and for all the numbers in even positions I want to keep them as they are. I'm trying to do it via map in the following way.
Here's my code
def main(args: Array[String]) {
  var l1 = List(1, 2, 3, 4, 5, 6)
  println(l1.map(f(_)))
  var c = 0
  def f(n: Int): Int = {
    if (c % 2 == 0) {
      c += 1
      return n
    } else {
      c += 1
      return 0
    }
  }
}
I want the variable c to keep track of the position. But it seems I can't forward reference c.
I get the following error
scala forward reference extends over definition of variable c
I also can't declare c inside the function, because it would never increment that way.
What would be the ideal way to achieve this with map?
I have a list. For all the numbers in odd position I want to make it 0. And for all the numbers in even position, I want to keep it as it is.
Here's an elegant solution of this problem:
l1.zipWithIndex map { case (v, i) => if (i % 2 == 0) v else 0 }
As for the reason, why your code fails: you're trying to access variable c before its declaration in code. Here:
println(l1.map(f(_)))
var c = 0
Your function f is trying to access variable c, which is not declared yet. Reorder these two lines and it will work. But I'd recommend sticking with my initial approach.

Best way to create generic/method consistency for sort.data.frame?

I've finally decided to put the sort.data.frame method that's floating around the internet into an R package. It just gets requested too much to be left to an ad hoc method of distribution.
However, it's written with arguments that make it incompatible with the generic sort function:
sort(x,decreasing,...)
sort.data.frame(form,dat)
If I change sort.data.frame to take decreasing as an argument, as in sort.data.frame(form,decreasing,dat), and discard decreasing, then it loses its simplicity because you'll always have to specify dat= and can't really use positional arguments. If I add it to the end, as in sort.data.frame(form,dat,decreasing), then the order doesn't match the generic function. If I hope that decreasing gets caught up in the dots, as in sort.data.frame(form,dat,...), then with position-based matching I believe the generic function will assign the second position to decreasing and it will get discarded. What's the best way to harmonize these two functions?
The full function is:
# Sort a data frame
sort.data.frame <- function(form,dat){
# Author: Kevin Wright
# http://tolstoy.newcastle.edu.au/R/help/04/09/4300.html
# Some ideas from Andy Liaw
# http://tolstoy.newcastle.edu.au/R/help/04/07/1076.html
# Use + for ascending, - for descending.
# Sorting is left to right in the formula
# Usage is either of the following:
# sort.data.frame(~Block-Variety,Oats)
# sort.data.frame(Oats,~-Variety+Block)
# If dat is the formula, then switch form and dat
if(inherits(dat,"formula")){
f=dat
dat=form
form=f
}
if(form[[1]] != "~") {
stop("Formula must be one-sided.")
}
# Make the formula into character and remove spaces
formc <- as.character(form[2])
formc <- gsub(" ","",formc)
# If the first character is not + or -, add +
if(!is.element(substring(formc,1,1),c("+","-"))) {
formc <- paste("+",formc,sep="")
}
# Extract the variables from the formula
vars <- unlist(strsplit(formc, "[\\+\\-]"))
vars <- vars[vars!=""] # Remove spurious "" terms
# Build a list of arguments to pass to "order" function
calllist <- list()
pos=1 # Position of + or -
for(i in 1:length(vars)){
varsign <- substring(formc,pos,pos)
pos <- pos+1+nchar(vars[i])
if(is.factor(dat[,vars[i]])){
if(varsign=="-")
calllist[[i]] <- -rank(dat[,vars[i]])
else
calllist[[i]] <- rank(dat[,vars[i]])
}
else {
if(varsign=="-")
calllist[[i]] <- -dat[,vars[i]]
else
calllist[[i]] <- dat[,vars[i]]
}
}
dat[do.call("order",calllist),]
}
Example:
library(datasets)
sort.data.frame(~len+dose,ToothGrowth)
Use the arrange function in plyr. It allows you to individually pick which variables should be in ascending and descending order:
arrange(ToothGrowth, len, dose)
arrange(ToothGrowth, desc(len), dose)
arrange(ToothGrowth, len, desc(dose))
arrange(ToothGrowth, desc(len), desc(dose))
It also has an elegant implementation:
arrange <- function (df, ...) {
ord <- eval(substitute(order(...)), df, parent.frame())
unrowname(df[ord, ])
}
And desc is just an ordinary function:
desc <- function (x) -xtfrm(x)
Reading the help for xtfrm is highly recommended if you're writing this sort of function.
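As a small illustration of why xtfrm matters here, the sketch below uses the desc definition quoted above on a character column, which cannot be negated directly. The data frame df is made up for the example.

```r
# desc as quoted above: negate the numeric sort key produced by xtfrm().
desc <- function(x) -xtfrm(x)

# Made-up data for illustration.
df <- data.frame(g = c("b", "a", "c", "a"),
                 v = c(1, 2, 3, 4),
                 stringsAsFactors = FALSE)

# Descending by g (a character column), then ascending by v.
ord <- order(desc(df$g), df$v)
df[ord, ]
```

Writing -df$g would fail for a character column, whereas xtfrm() first turns it into a numeric key that sorts in the same order.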
There are a few problems there. sort.data.frame needs to have the same arguments as the generic, so at a minimum it needs to be
sort.data.frame <- function(x, decreasing = FALSE, ...) {
....
}
To have dispatch work, the first argument needs to be the object dispatched on. So I would start with:
sort.data.frame <- function(x, decreasing = FALSE, formula = ~ ., ...) {
....
}
where x is your dat, formula is your form, and we provide a default for formula to include everything. (I haven't studied your code in detail to see exactly what form represents.)
Of course, you don't need to specify decreasing in the call, so:
sort(ToothGrowth, formula = ~ len + dose)
would be how to call the function using the above specifications.
Otherwise, if you don't want sort.data.frame to be an S3 generic, call it something else and then you are free to have whatever arguments you want.
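A minimal sketch of a method with that signature might look like the following. It is not Kevin Wright's function: it ignores the + and - signs in the formula, applies a single decreasing flag to all sort keys, and treats the default ~ . as "sort by every column". As far as I know, the generic sort() in current R versions checks that decreasing is a single logical, so the sketch keeps it scalar.

```r
# Simplified S3 method: same leading arguments as the generic, formula by name.
sort.data.frame <- function(x, decreasing = FALSE, formula = ~ ., ...) {
  vars <- all.vars(formula)
  if (identical(vars, ".")) vars <- names(x)   # default: sort by every column
  keys <- lapply(vars, function(v) {
    k <- xtfrm(x[[v]])                         # numeric sort key for any column type
    if (decreasing) -k else k
  })
  x[do.call(order, keys), , drop = FALSE]
}

# Dispatch goes through the generic because x is the first argument.
head(sort(ToothGrowth, formula = ~ len + dose))
head(sort(ToothGrowth, decreasing = TRUE, formula = ~ len + dose))
```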
I agree with @Gavin that x must come first. I'd put the decreasing parameter after the formula though, since it probably isn't used that much, and hardly ever as a positional argument.
The formula argument would be used much more and therefore should be the second argument. I also strongly agree with @Gavin that it should be called formula, and not form.
sort.data.frame <- function(x, formula = ~ ., decreasing = FALSE, ...) {
...
}
You might want to extend the decreasing argument to allow a logical vector where each TRUE/FALSE value corresponds to one column in the formula:
d <- data.frame(A=1:10, B=10:1)
sort(d, ~ A+B, decreasing=c(A=TRUE, B=FALSE)) # sort by decreasing A, increasing B