Why does ifelse convert a data.frame to a list: ifelse(TRUE, data.frame(1), 0)) != data.frame(1)? - class

I want to return a data.frame from a function if TRUE, else return NA using return(ifelse(condition, mydf, NA))
However, ifelse strips the column names from the data.frame.
Why are these results different?
> data.frame(1)
X1
1 1
> ifelse(TRUE, data.frame(1), NA)
[[1]]
[1] 1
Some additional insight from dput():
> dput(ifelse(TRUE, data.frame(1), 0))
list(1)
> dput(data.frame(1))
structure(list(X1 = 1), .Names = "X1", row.names = c(NA, -1L),
class = "data.frame")

ifelse is generally intended for vectorized comparisons, and has side-effects such as these: as it says in ?ifelse,
‘ifelse’ returns a value with the same shape as ‘test’ ...
so in this case (test is a vector of length 1) it tries to convert the data frame to a 'vector' (list in this case) of length 1 ...
return(if (condition) mydf else NA)
As a general design point I try to return objects of the same structure no matter what, so I might prefer
if (!condition) mydf[] <- NA
return(mydf)
As a general rule, I find that R users (especially coming from other programming languages) start by using if exclusively, take a while to discover ifelse, then overuse it for a while, discovering later that you really want to use if in logical contexts. A similar thing happens with & and &&.
See also:
section 3.2 of Patrick Burns's R Inferno ...
Why can't R's ifelse statements return vectors?

Related

Set/sequence summation operator?

I have a set, S = { 1, 2, 3, 4, 5 }.
If I wanted to sum this in standard logic it's just ∑S (no MathJax on SO so I can't format this nicely).
What's the VDM equivalent? I don't see anything in the numerics/sets section of the language reference.
There isn't a standard library function to do this (though perhaps there should be). You would sum a set with a simple recursive function:
sum: set of nat +> nat
sum(s) ==
if s = {}
then 0
else let e in set s in
e + sum(s \ {e})
measure card s;
The "let" selects an arbitrary element from the set, and then add that to the sum of the remainder. The measure says that the recursion always deals with smaller sets.
This should work:
sum(S)
But you could find this very easily.

Do these two postgres expressions give same results?

I have two fields with range type.
Are these two where expressions will give same results?
where range1 && range2
where not isempty( range1 * range2 )
They will, indeed.
Both overlap operator (&&) and intersection (*) are evaluated correctly, including cases where the 2 ranges have a common bound and their intersection:
only contains 1 element (= the bound is included in both)
contains no elements (= the bound is not included in at least 1 of the 2 ranges)
The following query tests pretty much all the cases (= intersections with more than 1 point, exactly 1 point, 0 points but "close", "truly" 0 points, and all the combinations of lower/upper bounds inclusions):
WITH r(range) AS (
VALUES (numrange(0,1,'[]')), (numrange(1,2,'[]')), (numrange(0,2,'[]')), (numrange(5,6, '[]')),
(numrange(0,1,'[)')), (numrange(1,2,'[)')), (numrange(0,2,'[)')), (numrange(5,6, '[)')),
(numrange(0,1,'(]')), (numrange(1,2,'(]')), (numrange(0,2,'(]')), (numrange(5,6, '(]')),
(numrange(0,1,'()')), (numrange(1,2,'()')), (numrange(0,2,'()')), (numrange(5,6, '()'))
)
SELECT * FROM (
SELECT r1.range, r2.range, r1.range && r2.range AS UsingOverlap, NOT isempty(r1.range * r2.range) AS UsingIntersect
FROM r r1, r r2
) T
Feel free to add WHERE UsingOverlap <> UsingIntersect.

In count change recursive algorithm, why do we return 1 if the amount = 0?

I am taking coursera course and,for an assignment, I have written a code to count the change of an amount given a list of denominations. A doing a lot of research, I found explanations of various algorithms. In the recursive implementation, one of the base cases is if the amount money is 0 then the count is 1. I don't understand why but this is the only way the code works. I feel that is the amount is 0 then there is no way to make change for it and I should throw an exception. The code look like:
function countChange(amount : Int, denoms :List[Int]) : Int = {
if (amount == 0 ) return 1 ....
Any explanation is much appreciated.
To avoid speaking specifically about the Coursera problem, I'll refer to a simpler but similar problem.
How many outcomes are there for 2 coin flips? 4.
(H,H),(H,T),(T,H),(T,T)
How many outcomes are there for 1 coin flip? 2.
(H),(T)
How many outcomes are there for 0 coin flips? 1.
()
Expressing this recursively, how many outcomes are there for N coin flips? Let's call it f(N) where
f(N) = 2 * f(N - 1), for N > 0
f(0) = 1
The N = 0 trivial (base) case is chosen so that the non-trivial cases, defined recursively, work out correctly. Since we're doing multiplication in this example and the identity element for multiplication is 1, it makes sense to choose that as the base case.
Alternatively, you could argue from combinatorics: n choose 0 = 1, 0! = 1, etc.

Consolidating a data table in Scala

I am working on a small data analysis tool, and practicing/learning Scala in the process. However I got stuck at a small problem.
Assume data of type:
X Gr1 x_11 ... x_1n
X Gr2 x_21 ... x_2n
..
X GrK x_k1 ... x_kn
Y Gr1 y_11 ... y_1n
Y Gr3 y_31 ... y_3n
..
Y Gr(K-1) ...
Here I have entries (X,Y...) that may or may not exist in up to K groups, with a series of values for each group. What I want to do is pretty simple (in theory), I would like to consolidate the rows that belong to the same "entity" in different groups. so instead of multiple lines that start with X, I want to have one row with all values from x_11 to x_kn in columns.
What makes things complicated however is that not all entities exist in all groups. So wherever there's "missing data" I would like to pad with for instance zeroes, or some string that denotes a missing value. So if I have (X,Y,Z) in up to 3 groups, the type I table I want to have is as follows:
X x_11 x_12 x_21 x_22 x_31 x_32
Y y_11 y_12 N/A N/A y_31 y_32
Z N/A N/A z_21 z_22 N/A N/A
I have been stuck trying to figure this out, is there a smart way to use List functions to solve this?
I wrote this simple loop:
for {
(id, hitlist) <- hits.groupBy(_.acc)
h <- hitlist
} println(id + "\t" + h.sampleId + "\t" + h.ratios.mkString("\t"))
to able to generate the tables that look like the example above. Note that, my original data is of a different format and layout,but that has little to do with the problem at hand, thus I have skipped all steps regarding parsing. I should be able to use groupBy in a better way that actually solves this for me, but I can't seem to get there.
Then I modified my loop mapping the hits to ratios and appending them to one another:
for ((id, hitlist) <- hits.groupBy(_.acc)){
val l = hitlist.map(_.ratios).foldRight(List[Double]()){
(l1: List[Double], l2: List[Double]) => l1 ::: l2
}
println(id + "\t" + l.mkString("\t"))
//println(id + "\t" + h.sampleId + "\t" + h.ratios.mkString("\t"))
}
That gets me one step closer but still no cigar! Instead of a fully padded "matrix" I get a jagged table. Taking the example above:
X x_11 x_12 x_21 x_22 x_31 x_32
Y y_11 y_12 y_31 y_32
Z z_21 z_22
Any ideas as to how I can pad the table so that values from respective groups are aligned with one another? I should be able to use _.sampleId, which holds the "group membersip" for each "hit", but I am not sure how exactly. ´hits´ is a List of type Hit which is practically a wrapper for each row, giving convenience methods for getting individual values, so essentially a tuple which have "named indices" (such as .acc, .sampleId..)
(I would like to solve this problem without hardcoding the number of groups, as it might change from case to case)
Thanks!
This is a bit of a contrived example, but I think you can see where this is going:
case class Hit(acc:String, subAcc:String, value:Int)
val hits = List(Hit("X", "x_11", 1), Hit("X", "x_21", 2), Hit("X", "x_31", 3))
val kMax = 4
val nMax = 2
for {
(id, hitlist) <- hits.groupBy(_.acc)
k <- 1 to kMax
n <- 1 to nMax
} yield {
val subId = "x_%s%s".format(k, n)
val row = hitlist.find(h => h.subAcc == subId).getOrElse(Hit(id, subId, 0))
println(row)
}
//Prints
Hit(X,x_11,1)
Hit(X,x_12,0)
Hit(X,x_21,2)
Hit(X,x_22,0)
Hit(X,x_31,3)
Hit(X,x_32,0)
Hit(X,x_41,0)
Hit(X,x_42,0)
If you provide more information on your hits lists then we could probably come with something a little more accurate.
I have managed to solve this problem with the following code, I am putting it here as an answer in case someone else runs into a similar problem and requires some help. The use of find() from Noah's answer was definitely very useful, so do give him a +1 in case this code snippet helps you out.
val samples = hits.groupBy(_.sampleId).keys.toList.sorted
for ((id, hitlist) <- hits.groupBy(_.acc)) {
val ratios =
for (sample <- samples)
yield hitlist.find(h => h.sampleId == sample).map(_.ratios)
.getOrElse(List(Double.NaN, Double.NaN, Double.NaN, Double.NaN, Double.NaN, Double.NaN))
println(id + "\t" + ratios.flatten.mkString("\t"))
}
I figure it's not a very elegant or efficient solution, as I have two calls to groupBy and I would be interested to see better solutions to this problem.

Calculating prime numbers in Scala: how does this code work?

So I've spent hours trying to work out exactly how this code produces prime numbers.
lazy val ps: Stream[Int] = 2 #:: Stream.from(3).filter(i =>
ps.takeWhile{j => j * j <= i}.forall{ k => i % k > 0});
I've used a number of printlns etc, but nothings making it clearer.
This is what I think the code does:
/**
* [2,3]
*
* takeWhile 2*2 <= 3
* takeWhile 2*2 <= 4 found match
* (4 % [2,3] > 1) return false.
* takeWhile 2*2 <= 5 found match
* (5 % [2,3] > 1) return true
* Add 5 to the list
* takeWhile 2*2 <= 6 found match
* (6 % [2,3,5] > 1) return false
* takeWhile 2*2 <= 7
* (7 % [2,3,5] > 1) return true
* Add 7 to the list
*/
But If I change j*j in the list to be 2*2 which I assumed would work exactly the same, it causes a stackoverflow error.
I'm obviously missing something fundamental here, and could really use someone explaining this to me like I was a five year old.
Any help would be greatly appreciated.
I'm not sure that seeking a procedural/imperative explanation is the best way to gain understanding here. Streams come from functional programming and they're best understood from that perspective. The key aspects of the definition you've given are:
It's lazy. Other than the first element in the stream, nothing is computed until you ask for it. If you never ask for the 5th prime, it will never be computed.
It's recursive. The list of prime numbers is defined in terms of itself.
It's infinite. Streams have the interesting property (because they're lazy) that they can represent a sequence with an infinite number of elements. Stream.from(3) is an example of this: it represents the list [3, 4, 5, ...].
Let's see if we can understand why your definition computes the sequence of prime numbers.
The definition starts out with 2 #:: .... This just says that the first number in the sequence is 2 - simple enough so far.
The next part defines the rest of the prime numbers. We can start with all the counting numbers starting at 3 (Stream.from(3)), but we obviously need to filter a bunch of these numbers out (i.e., all the composites). So let's consider each number i. If i is not a multiple of a lesser prime number, then i is prime. That is, i is prime if, for all primes k less than i, i % k > 0. In Scala, we could express this as
nums.filter(i => ps.takeWhile(k => k < i).forall(k => i % k > 0))
However, it isn't actually necessary to check all lesser prime numbers -- we really only need to check the prime numbers whose square is less than or equal to i (this is a fact from number theory*). So we could instead write
nums.filter(i => ps.takeWhile(k => k * k <= i).forall(k => i % k > 0))
So we've derived your definition.
Now, if you happened to try the first definition (with k < i), you would have found that it didn't work. Why not? It has to do with the fact that this is a recursive definition.
Suppose we're trying to decide what comes after 2 in the sequence. The definition tells us to first determine whether 3 belongs. To do so, we consider the list of primes up to the first one greater than or equal to 3 (takeWhile(k => k < i)). The first prime is 2, which is less than 3 -- so far so good. But we don't yet know the second prime, so we need to compute it. Fine, so we need to first see whether 3 belongs ... BOOM!
* It's pretty easy to see that if a number n is composite then the square of one of its factors must be less than or equal to n. If n is composite, then by definition n == a * b, where 1 < a <= b < n (we can guarantee a <= b just by labeling the two factors appropriately). From a <= b it follows that a^2 <= a * b, so it follows that a^2 <= n.
Your explanations are mostly correct, you made only two mistakes:
takeWhile doesn't include the last checked element:
scala> List(1,2,3).takeWhile(_<2)
res1: List[Int] = List(1)
You assume that ps always contains only a two and a three but because Stream is lazy it is possible to add new elements to it. In fact each time a new prime is found it is added to ps and in the next step takeWhile will consider this new added element. Here, it is important to remember that the tail of a Stream is computed only when it is needed, thus takeWhile can't see it before forall is evaluated to true.
Keep these two things in mind and you should came up with this:
ps = [2]
i = 3
takeWhile
2*2 <= 3 -> false
forall on []
-> true
ps = [2,3]
i = 4
takeWhile
2*2 <= 4 -> true
3*3 <= 4 -> false
forall on [2]
4%2 > 0 -> false
ps = [2,3]
i = 5
takeWhile
2*2 <= 5 -> true
3*3 <= 5 -> false
forall on [2]
5%2 > 0 -> true
ps = [2,3,5]
i = 6
...
While these steps describe the behavior of the code, it is not fully correct because not only adding elements to the Stream is lazy but every operation on it. This means that when you call xs.takeWhile(f) not all values until the point when f is false are computed at once - they are computed when forall wants to see them (because it is the only function here that needs to look at all elements before it definitely can result to true, for false it can abort earlier). Here the computation order when laziness is considered everywhere (example only looking at 9):
ps = [2,3,5,7]
i = 9
takeWhile on 2
2*2 <= 9 -> true
forall on 2
9%2 > 0 -> true
takeWhile on 3
3*3 <= 9 -> true
forall on 3
9%3 > 0 -> false
ps = [2,3,5,7]
i = 10
...
Because forall is aborted when it evaluates to false, takeWhile doesn't calculate the remaining possible elements.
That code is easier (for me, at least) to read with some variables renamed suggestively, as
lazy val ps: Stream[Int] = 2 #:: Stream.from(3).filter(i =>
ps.takeWhile{p => p * p <= i}.forall{ p => i % p > 0});
This reads left-to-right quite naturally, as
primes are 2, and those numbers i from 3 up, that all of the primes p whose square does not exceed the i, do not divide i evenly (i.e. without some non-zero remainder).
In a true recursive fashion, to understand this definition as defining the ever increasing stream of primes, we assume that it is so, and from that assumption we see that no contradiction arises, i.e. the truth of the definition holds.
The only potential problem after that, is the timing of accessing the stream ps as it is being defined. As the first step, imagine we just have another stream of primes provided to us from somewhere, magically. Then, after seeing the truth of the definition, check that the timing of the access is okay, i.e. we never try to access the areas of ps before they are defined; that would make the definition stuck, unproductive.
I remember reading somewhere (don't recall where) something like the following -- a conversation between a student and a wizard,
student: which numbers are prime?
wizard: well, do you know what number is the first prime?
s: yes, it's 2.
w: okay (quickly writes down 2 on a piece of paper). And what about the next one?
s: well, next candidate is 3. we need to check whether it is divided by any prime whose square does not exceed it, but I don't yet know what the primes are!
w: don't worry, I'l give them to you. It's a magic I know; I'm a wizard after all.
s: okay, so what is the first prime number?
w: (glances over the piece of paper) 2.
s: great, so its square is already greater than 3... HEY, you've cheated! .....
Here's a pseudocode1 translation of your code, read partially right-to-left, with some variables again renamed for clarity (using p for "prime"):
ps = 2 : filter (\i-> all (\p->rem i p > 0) (takeWhile (\p->p^2 <= i) ps)) [3..]
which is also
ps = 2 : [i | i <- [3..], and [rem i p > 0 | p <- takeWhile (\p->p^2 <= i) ps]]
which is a bit more visually apparent, using list comprehensions. and checks that all entries in a list of Booleans are True (read | as "for", <- as "drawn from", , as "such that" and (\p-> ...) as "lambda of p").
So you see, ps is a lazy list of 2, and then of numbers i drawn from a stream [3,4,5,...] such that for all p drawn from ps such that p^2 <= i, it is true that i % p > 0. Which is actually an optimal trial division algorithm. :)
There's a subtlety here of course: the list ps is open-ended. We use it as it is being "fleshed-out" (that of course, because it is lazy). When ps are taken from ps, it could potentially be a case that we run past its end, in which case we'd have a non-terminating calculation on our hands (a "black hole"). It just so happens :) (and needs to ⁄ can be proved mathematically) that this is impossible with the above definition. So 2 is put into ps unconditionally, so there's something in it to begin with.
But if we try to "simplify",
bad = 2 : [i | i <- [3..], and [rem i p > 0 | p <- takeWhile (\p->p < i) bad]]
it stops working after producing just one number, 2: when considering 3 as the candidate, takeWhile (\p->p < 3) bad demands the next number in bad after 2, but there aren't yet any more numbers there. It "jumps ahead of itself".
This is "fixed" with
bad = 2 : [i | i <- [3..], and [rem i p > 0 | p <- [2..(i-1)] ]]
but that is a much much slower trial division algorithm, very far from the optimal one.
--
1 (Haskell actually, it's just easier for me that way :) )