Best way to create generic/method consistency for sort.data.frame? - class

I've finally decided to put the sort.data.frame method that's floating around the internet into an R package. It just gets requested too much to be left to an ad hoc method of distribution.
However, it's written with arguments that make it incompatible with the generic sort function:
sort(x,decreasing,...)
sort.data.frame(form,dat)
If I change sort.data.frame to take decreasing as an argument as in sort.data.frame(form,decreasing,dat) and discard decreasing, then it loses its simplicity because you'll always have to specify dat= and can't really use positional arguments. If I add it to the end as in sort.data.frame(form,dat,decreasing), then the order doesn't match with the generic function. If I hope that decreasing gets caught up in the dots `sort.data.frame(form,dat,...), then when using position-based matching I believe the generic function will assign the second position to decreasing and it will get discarded. What's the best way to harmonize these two functions?
The full function is:
# Sort a data frame
sort.data.frame <- function(form,dat){
# Author: Kevin Wright
# http://tolstoy.newcastle.edu.au/R/help/04/09/4300.html
# Some ideas from Andy Liaw
# http://tolstoy.newcastle.edu.au/R/help/04/07/1076.html
# Use + for ascending, - for decending.
# Sorting is left to right in the formula
# Useage is either of the following:
# sort.data.frame(~Block-Variety,Oats)
# sort.data.frame(Oats,~-Variety+Block)
# If dat is the formula, then switch form and dat
if(inherits(dat,"formula")){
f=dat
dat=form
form=f
}
if(form[[1]] != "~") {
stop("Formula must be one-sided.")
}
# Make the formula into character and remove spaces
formc <- as.character(form[2])
formc <- gsub(" ","",formc)
# If the first character is not + or -, add +
if(!is.element(substring(formc,1,1),c("+","-"))) {
formc <- paste("+",formc,sep="")
}
# Extract the variables from the formula
vars <- unlist(strsplit(formc, "[\\+\\-]"))
vars <- vars[vars!=""] # Remove spurious "" terms
# Build a list of arguments to pass to "order" function
calllist <- list()
pos=1 # Position of + or -
for(i in 1:length(vars)){
varsign <- substring(formc,pos,pos)
pos <- pos+1+nchar(vars[i])
if(is.factor(dat[,vars[i]])){
if(varsign=="-")
calllist[[i]] <- -rank(dat[,vars[i]])
else
calllist[[i]] <- rank(dat[,vars[i]])
}
else {
if(varsign=="-")
calllist[[i]] <- -dat[,vars[i]]
else
calllist[[i]] <- dat[,vars[i]]
}
}
dat[do.call("order",calllist),]
}
Example:
library(datasets)
sort.data.frame(~len+dose,ToothGrowth)

Use the arrange function in plyr. It allows you to individually pick which variables should be in ascending and descending order:
arrange(ToothGrowth, len, dose)
arrange(ToothGrowth, desc(len), dose)
arrange(ToothGrowth, len, desc(dose))
arrange(ToothGrowth, desc(len), desc(dose))
It also has an elegant implementation:
arrange <- function (df, ...) {
ord <- eval(substitute(order(...)), df, parent.frame())
unrowname(df[ord, ])
}
And desc is just an ordinary function:
desc <- function (x) -xtfrm(x)
Reading the help for xtfrm is highly recommended if you're writing this sort of function.

There are a few problems there. sort.data.frame needs to have the same arguments as the generic, so at a minimum it needs to be
sort.data.frame(x, decreasing = FALSE, ...) {
....
}
To have dispatch work, the first argument needs to be the object dispatched on. So I would start with:
sort.data.frame(x, decreasing = FALSE, formula = ~ ., ...) {
....
}
where x is your dat, formula is your form, and we provide a default for formula to include everything. (I haven't studied your code in detail to see exactly what form represents.)
Of course, you don't need to specify decreasing in the call, so:
sort(ToothGrowth, formula = ~ len + dose)
would be how to call the function using the above specifications.
Otherwise, if you don't want sort.data.frame to be an S3 generic, call it something else and then you are free to have whatever arguments you want.

I agree with #Gavin that x must come first. I'd put the decreasing parameter after the formula though - since it probably isn't used that much, and hardly ever as a positional argument.
The formula argument would be used much more and therefore should be the second argument. I also strongly agree with #Gavin that it should be called formula, and not form.
sort.data.frame(x, formula = ~ ., decreasing = FALSE, ...) {
...
}
You might want to extend the decreasing argument to allow a logical vector where each TRUE/FALSE value corresponds to one column in the formula:
d <- data.frame(A=1:10, B=10:1)
sort(d, ~ A+B, decreasing=c(A=TRUE, B=FALSE)) # sort by decreasing A, increasing B

Related

Minizinc: declare explicit set in decision variable

I'm trying to implement the 'Sport Scheduling Problem' (with a Round-Robin approach to break symmetries). The actual problem is of no importance. I simply want to declare the value at x[1,1] to be the set {1,2} and base the sets in the same column upon the first set. This is modelled as in the code below. The output is included in a screenshot below it. The problem is that the first set is not printed as a set but rather some sort of range while the values at x[2,1] and x[3,1] are indeed printed as sets and x[4,1] again as a range. Why is this? I assume that in the declaration of x that set of 1..n is treated as an integer but if it is not, how to declare it as integers?
EDIT: ONLY the first column of the output is of importance.
int: n = 8;
int: nw = n-1;
int: np = n div 2;
array[1..np, 1..nw] of var set of 1..n: x;
% BEGIN FIX FIRST WEEK $
constraint(
x[1,1] = {1, 2}
);
constraint(
forall(t in 2..np) (x[t,1] = {t+1, n+2-t} )
);
solve satisfy;
output[
"\(x[p,w])" ++ if w == nw then "\n" else "\t" endif | p in 1..np, w in 1..nw
]
Backend solver: Gecode
(Here's a summarize of my comments above.)
The range syntax is simply a shorthand for contiguous values in a set: 1..8 is a shorthand of the set {1,2,3,4,5,6,7,8}, and 5..6 is a shorthand for the set {5,6}.
The reason for this shorthand is probably since it's often - and arguably - easier to read the shorthand version than the full list, especially if it's a long list of integers, e.g. 1..1024. It also save space in the output of solutions.
For the two set versions, e.g. {1,2}, this explicit enumeration might be clearer to read than 1..2, though I tend to prefer the shorthand version in all cases.

Does pattern match in Raku have guard clause?

In scala, pattern match has guard pattern:
val ch = 23
val sign = ch match {
case _: Int if 10 < ch => 65
case '+' => 1
case '-' => -1
case _ => 0
}
Is the Raku version like this?
my $ch = 23;
given $ch {
when Int and * > 10 { say 65}
when '+' { say 1 }
when '-' { say -1 }
default { say 0 }
}
Is this right?
Update: as jjmerelo suggested, i post my result as follows, the signature version is also interesting.
multi washing_machine(Int \x where * > 10 ) { 65 }
multi washing_machine(Str \x where '+' ) { 1 }
multi washing_machine(Str \x where '-' ) { -1 }
multi washing_machine(\x) { 0 }
say washing_machine(12); # 65
say washing_machine(-12); # 0
say washing_machine('+'); # 1
say washing_machine('-'); # -1
say washing_machine('12'); # 0
say washing_machine('洗衣机'); # 0
TL;DR I've written another answer that focuses on using when. This answer focuses on using an alternative to that which combines Signatures, Raku's powerful pattern matching construct, with a where clause.
"Does pattern match in Raku have guard clause?"
Based on what little I know about Scala, some/most Scala pattern matching actually corresponds to using Raku signatures. (And guard clauses in that context are typically where clauses.)
Quoting Martin Odersky, Scala's creator, from The Point of Pattern Matching in Scala:
instead of just matching numbers, which is what switch statements do, you match what are essentially the creation forms of objects
Raku signatures cover several use cases (yay, puns). These include the Raku equivalent of the functional programming paradigmatic use in which one matches values' or functions' type signatures (cf Haskell) and the object oriented programming paradigmatic use in which one matches against nested data/objects and pulls out desired bits (cf Scala).
Consider this Raku code:
class body { has ( $.head, #.arms, #.legs ) } # Declare a class (object structure).
class person { has ( $.mom, $.body, $.age ) } # And another that includes first.
multi person's-age-and-legs # Declare a function that matches ...
( person # ... a person ...
( :$age where * > 40, # ... whose age is over 40 ...
:$body ( :#legs, *% ), # ... noting their body's legs ...
*% ) ) # ... and ignoring other attributes.
{ say "$age {+#legs}" } # Display age and number of legs.
my $age = 42; # Let's demo handy :$var syntax below.
person's-age-and-legs # Call function declared above ...
person # ... passing a person.
.new: # Explicitly construct ...
:$age, # ... a middle aged ...
body => body.new:
:head,
:2arms,
legs => <left middle right> # ... three legged person.
# Displays "42 3"
Notice where there's a close equivalent to a Scala pattern matching guard clause in the above -- where * > 40. (This can be nicely bundled up into a subset type.)
We could define other multis that correspond to different cases, perhaps pulling out the "names" of the person's legs ('left', 'middle', etc.) if their mom's name matches a particular regex or whatever -- you hopefully get the picture.
A default case (multi) that doesn't bother to deconstruct the person could be:
multi person's-age-and-legs (|otherwise)
{ say "let's not deconstruct this person" }
(In the above we've prefixed a parameter in a signature with | to slurp up all remaining structure/arguments passed to a multi. Given that we do nothing with that slurped structure/data, we could have written just (|).)
Unfortunately, I don't think signature deconstruction is mentioned in the official docs. Someone could write a book about Raku signatures. (Literally. Which of course is a great way -- the only way, even -- to write stuff. My favorite article that unpacks a bit of the power of Raku signatures is Pattern Matching and Unpacking from 2013 by Moritz. Who has authored Raku books. Here's hoping.)
Scala's match/case and Raku's given/when seem simpler
Indeed.
As #jjmerelo points out in the comments, using signatures means there's a multi foo (...) { ...} for each and every case, which is much heavier syntactically than case ... => ....
In mitigation:
Simpler cases can just use given/when, just like you wrote in the body of your question;
Raku will presumably one day get non-experimental macros that can be used to implement a construct that looks much closer to Scala's match/case construct, eliding the repeated multi foo (...)s.
From what I see in this answer, that's not really an implementation of a guard pattern in the same sense Haskell has them. However, Perl 6 does have guards in the same sense Scala has: using default patterns combined with ifs.
The Haskell to Perl 6 guide does have a section on guards. It hints at the use of where as guards; so that might answer your question.
TL;DR You've encountered what I'd call a WTF?!?: when Type and ... fails to check the and clause. This answer talks about what's wrong with the when and how to fix it. I've written another answer that focuses on using where with a signature.
If you want to stick with when, I suggest this:
when (condition when Type) { ... } # General form
when (* > 10 when Int) { ... } # For your specific example
This is (imo) unsatisfactory, but it does first check the Type as a guard, and then the condition if the guard passes, and works as expected.
"Is this right?"
No.
given $ch {
when Int and * > 10 { say 65}
}
This code says 65 for any given integer, not just one over 10!
WTF?!? Imo we should mention this on Raku's trap page.
We should also consider filing an issue to make Rakudo warn or fail to compile if a when construct starts with a compile-time constant value that's a type object, and continues with and (or &&, andthen, etc), which . It could either fail at compile-time or display a warning.
Here's the best option I've been able to come up with:
when (* > 10 when Int) { say 65 }
This takes advantage of the statement modifier (aka postfix) form of when inside the parens. The Int is checked before the * > 10.
This was inspired by Brad++'s new answer which looks nice if you're writing multiple conditions against a single guard clause.
I think my variant is nicer than the other options I've come up with in previous versions of this answer, but still unsatisfactory inasmuch as I don't like the Int coming after the condition.
Ultimately, especially if/when RakuAST lands, I think we will experiment with new pattern matching forms. Hopefully we'll come up with something nice that provides a nice elimination of this wart.
Really? What's going on?
We can begin to see the underlying problem with this code:
.say for ('TrueA' and 'TrueB'),
('TrueB' and 'TrueA'),
(Int and 42),
(42 and Int)
displays:
TrueB
TrueA
(Int)
(Int)
The and construct boolean evaluates its left hand argument. If that evaluates to False, it returns it, otherwise it returns its right hand argument.
In the first line, 'TrueA' boolean evaluates to True so the first line returns the right hand argument 'TrueB'.
In the second line 'TrueB' evaluates to True so the and returns its right hand argument, in this case 'TrueA'.
But what happens in the third line? Well, Int is a type object. Type objects boolean evaluate to False! So the and duly returns its left hand argument which is Int (which the .say then displays as (Int)).
This is the root of the problem.
(To continue to the bitter end, the compiler evaluates the expression Int and * > 10; immediately returns the left hand side argument to and which is Int; then successfully matches that Int against whatever integer is given -- completely ignoring the code that looks like a guard clause (the and ... bit).)
If you were using such an expression as the condition of, say, an if statement, the Int would boolean evaluate to False and you'd get a false negative. Here you're using a when which uses .ACCEPTS which leads to a false positive (it is an integer but it's any integer, disregarding the supposed guard clause). This problem quite plausibly belongs on the traps page.
Years ago I wrote a comment mentioning that you had to be more explicit about matching against $_ like this:
my $ch = 23;
given $ch {
when $_ ~~ Int and $_ > 10 { say 65}
when '+' { say 1 }
when '-' { say -1 }
default { say 0 }
}
After coming back to this question, I realized there was another way.
when can safely be inside of another when construct.
my $ch = 23;
given $ch {
when Int:D {
when $_ > 10 { say 65}
proceed
}
when '+' { say 1 }
when '-' { say -1 }
default { say 0 }
}
Note that the inner when will succeed out of the outer one, which will succeed out of the given block.
If the inner when doesn't match we want to proceed on to the outer when checks and default, so we call proceed.
This means that we can also group multiple when statements inside of the Int case, saving having to do repeated type checks. It also means that those inner when checks don't happen at all if we aren't testing an Int value.
when Int:D {
when $_ < 10 { say 5 }
when 10 { say 10}
when $_ > 10 { say 65}
}

A simple model in Winbugs but it says "This chain contains uninitialized variables"

I have some simple time to event data, no covariates. I was trying to fit a Weibull distribution to it. So I have the following code. Everything looks good until I load my initials. It says "this chain contains uninitialized variables". But I don't understand. I think Weibull dist only has 2 parameters, and I already specified them all. Could you please advise? Thanks!
model
{
for(i in 1 : N) {
t[i] ~ dweib(r, mu)I(t.cen[i],)
}
mu ~ dexp(0.001)
r ~ dexp(0.001)
}
# Data
list(
t.cen=c(0,3.91,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,21.95,23.98,33.08),
t=c(2.34,NA,5.16,5.63,6.17,6.8,7.03,8.05,8.13,8.36,8.83,10.16,
10.55,10.94,11.48,11.95,13.05,13.59,16.02,20.08,NA,NA,
NA),
N=23
)
# Initial values
list(
r=3,mu=3
)
The other uninitialised variables are the missing (NA) values in the vector of t. Remember that the BUGS language makes no distinction between data and parameters, and that supplying something as data with the value NA is equivalent to not supplying it as data.

Specman: Why DAC macro interprets the type <some_name'exp> as 'string'?

I'm trying to write a DAC macro that gets as input the name of list of bits and its size, and the name of integer variable. Every element in the list should be constrained to be equal to every bit in the variable (both of the same length), i.e. (for list name list_of_bits and variable name foo and their length is 4) the macro's output should be:
keep list_of_bits[0] == foo[0:0];
keep list_of_bits[1] == foo[1:1];
keep list_of_bits[2] == foo[2:2];
keep list_of_bits[3] == foo[3:3];
My macro's code is:
define <keep_all_bits'exp> "keep_all_bits <list_size'exp> <num'name> <list_name'name>" as computed {
for i from 0 to (<list_size'exp> - 1) do {
result = appendf("%s keep %s[%d] == %s[%d:%d];",result, <list_name'name>, index, <num'name>, index, index);
};
};
The error I get:
*** Error: The type of '<list_size'exp>' is 'string', while expecting a
numeric type
...
for i from 0 to (<list_size'exp> - 1) do {
Why it interprets the <list_size'exp> as string?
Thank you for your help
All macro arguments in DAC macros are considered strings (except repetitions, which are considered lists of strings).
The point is that a macro treats its input purely syntactically, and it has no semantic information about the arguments. For example, in case of an expression (<exp>) the macro is unable to actually evaluate the expression and compute its value at compilation time, or even to figure out its type. This information is figured out at later compilation phases.
In your case, I would assume that the size is always a constant. So, first of all, you can use <num> instead of <exp> for that macro argument, and use as_a() to convert it to the actual number. The difference between <exp> and <num> is that <num> allows only constant numbers and not any expressions; but it's still treated as a string inside the macro.
Another important point: your macro itself should be a <struct_member> macro rather than an <exp> macro, because this construct itself is a struct member (namely, a constraint) and not an expression.
And one more thing: to ensure that the list size will be exactly as needed, add another constraint for the list size.
So, the improved macro can look like this:
define <keep_all_bits'struct_member> "keep_all_bits <list_size'num> <num'name> <list_name'name>" as computed {
result = appendf("keep %s.size() == %s;", <list_name'name>, <list_size'num>);
for i from 0 to (<list_size'num>.as_a(int) - 1) do {
result = appendf("%s keep %s[%d] == %s[%d:%d];",result, <list_name'name>, i, <num'name>, i, i);
};
};
Why not write is without macro?
keep for each in list_of_bits {
it == foo[index:index];
};
This should do the same, but look more readable and easier to debug; also the generation engine might take some advantage of more concise constraint.

Remove variable labels attached with foreign/Hmisc SPSS import functions

As usual, I got some SPSS file that I've imported into R with spss.get function from Hmisc package. I'm bothered with labelled class that Hmisc::spss.get adds to all variables in data.frame, hence want to remove it.
labelled class gives me headaches when I try to run ggplot or even when I want to do some menial analysis! One solution would be to remove labelled class from each variable in data.frame. How can I do that? Is that possible at all? If not, what are my other options?
I really want to bypass reediting variables "from scratch" with as.data.frame(lapply(x, as.numeric)) and as.character where applicable... And I certainly don't want to run SPSS and remove labels manually (don't like SPSS, nor care to install it)!
Thanks!
Here's how I get rid of the labels altogether. Similar to Jyotirmoy's solution but works for a vector as well as a data.frame. (Partial credits to Frank Harrell)
clear.labels <- function(x) {
if(is.list(x)) {
for(i in 1 : length(x)) class(x[[i]]) <- setdiff(class(x[[i]]), 'labelled')
for(i in 1 : length(x)) attr(x[[i]],"label") <- NULL
}
else {
class(x) <- setdiff(class(x), "labelled")
attr(x, "label") <- NULL
}
return(x)
}
Use as follows:
my.unlabelled.df <- clear.labels(my.labelled.df)
EDIT
Here's a bit of a cleaner version of the function, same results:
clear.labels <- function(x) {
if(is.list(x)) {
for(i in seq_along(x)) {
class(x[[i]]) <- setdiff(class(x[[i]]), 'labelled')
attr(x[[i]],"label") <- NULL
}
} else {
class(x) <- setdiff(class(x), "labelled")
attr(x, "label") <- NULL
}
return(x)
}
A belated note/warning regarding class membership in R objects. The correct method for identification of "labelled" is not to test for with an is function or equality {==) but rather with inherits. Methods that test for a specific location will not pick up cases where the order of existing classes are not the ones assumed.
You can avoid creating "labelled" variables in spss.get with the argument: , use.value.labels=FALSE.
w <- spss.get('/tmp/my.sav', use.value.labels=FALSE, datevars=c('birthdate','deathdate'))
The code from Bhattacharya could fail if the class of the labelled vector were simply "labelled" rather than c("labelled", "factor") in which case it should have been:
class(x[[i]]) <- NULL # no error from assignment of empty vector
The error you report can be reproduced with this code:
> b <- 4:6
> label(b) <- 'B Label'
> str(b)
Class 'labelled' atomic [1:3] 4 5 6
..- attr(*, "label")= chr "B Label"
> class(b) <- class(b)[-1]
Error in class(b) <- class(b)[-1] :
invalid replacement object to be a class string
You can try out the read.spss function from the foreign package.
A rough and ready way to get rid of the labelled class created by spss.get
for (i in 1:ncol(x)) {
z<-class(x[[i]])
if (z[[1]]=='labelled'){
class(x[[i]])<-z[-1]
attr(x[[i]],'label')<-NULL
}
}
But can you please give an example where labelled causes problems?
If I have a variable MAED in a data frame x created by spss.get, I have:
> class(x$MAED)
[1] "labelled" "factor"
> is.factor(x$MAED)
[1] TRUE
So well-written code that expects a factor (say) should not have any problems.
Suppose:
library(Hmisc)
w <- spss.get('...')
You could remove the labels of a variable called "var1" by using:
attributes(w$var1)$label <- NULL
If you also want to remove the class "labbled", you could do:
class(w$var1) <- NULL
or if the variable has more than one class:
class(w$var1) <- class(w$var1)[-which(class(w$var1)=="labelled")]
Hope this helps!
Well, I figured out that unclass function can be utilized to remove classes (who would tell, aye?!):
library(Hmisc)
# let's presuppose that variable x is gathered through spss.get() function
# and that x is factor
> class(x)
[1] "labelled" "factor"
> foo <- unclass(x)
> class(foo)
[1] "integer"
It's not the luckiest solution, just imagine back-converting bunch of vectors... If anyone tops this, I'll check it as an answer...