Off by one with sliding?


One of the advantages of not handling collections through indices is to avoid off-by-one errors. That's certainly not the only advantage, but it is one of them.
Now, I often use sliding in some algorithms in Scala, but I feel that it usually results in something very similar to the off-by-one errors, because a sliding of m elements in a collection of size n has size n - m + 1 elements. Or, more trivially, list sliding 2 is one element shorter than list.
The feeling I get is that there's a missing abstraction in this pattern, something that would be part sliding, part something more -- like foldLeft is to reduceLeft. I can't think of what that might be, however. Can anyone help me find enlightenment here?
UPDATE
Since people are not clear on what I'm talking about, let's consider this case. I want to capitalize a string. Basically, every letter that is not preceded by a letter should be upper case, and all other letters should be lower case. Using sliding, I have to special case either the first or the last letter. For example:
def capitalize(s: String) = s(0).toUpper +: s.toSeq.sliding(2).map {
  case Seq(c1, c2) if c2.isLetter => if (c1.isLetter) c2.toLower else c2.toUpper
  case Seq(_, x) => x
}.mkString

I’m taking Owen’s answer as an inspiration to this.
When you want to apply a simple diff() to a list, this can be seen as equivalent to the following matrix multiplication.
a = (0 1 4 3).T
M = (-1  1  0  0)
    ( 0 -1  1  0)
    ( 0  0 -1  1)
diff(a) = M * a = (1 3 -1).T
We may now use the same scheme for general list operations, if we replace addition and multiplication (and if we generalise the numbers in our matrix M).
So, with plus being a list append operation (with flatten afterwards – or simply a collect operation), and the multiplicative equivalent being either Some(_) or None, a slide with a window size of two becomes:
M = (Some(_) Some(_) None    None   )
    (None    Some(_) Some(_) None   )
    (None    None    Some(_) Some(_))
slide(a) = M "*" a = ((0 1) (1 4) (4 3)).T
I'm not sure if this is the kind of abstraction you're looking for, but it would be a generalisation over a class of operations which change the number of items.
diff or slide operations of order m for an input of length n need matrices of size (n-m+1) × n.
Edit: A solution could be to transform List[A] to List[Option[A]] by wrapping every element in Some, and then to prepend or append None (slideLeft or slideRight). That way you could handle all the magic inside the map method.
list.slideLeft(2) {
  case Seq(Some(c1), Some(c2)) if c2.isLetter => if (c1.isLetter) c2.toLower else c2.toUpper
  case Seq(_, Some(x)) => x
}

I run into this problem all the time in python/R/Matlab where you diff() a vector and then can't line it up with the original one! It is very frustrating.
I think what's really missing is that the vector only holds the dependent variables, and assumes that you, the programmer, are keeping track of the independent variables, i.e. the dimension that the collection ranges over.
I think the way to solve this is to have the language to some degree keep track of independent variables; perhaps statically through types, or dynamically by storing them along with the vector. Then it can check the independent axes, make sure they line up, or, I don't know if this is possible, shuffle things around to make them line up.
That's the best I've thought of so far.
EDIT
Another way of thinking about this is, why does your collection have order? Why is it not just a Set? The order means something, but the collection doesn't keep track of that -- it's basically using sequential position (which is about as informative as numerical indices) to proxy for the real meaning.
EDIT
Another consequence would be that transformations like sliding actually represent two transformations, one for the dependent variables, and one for their axis.

In your example, I think the code is more complex because you basically want to do a map, but sliding introduces edge conditions that don't fit nicely. A fold left with an accumulator that remembers the relevant state may be conceptually easier:
def capitalize2(s: String) = (("", true) /: s) { case ((res, notLetter), c) =>
  (res + (if (notLetter) c.toUpper else c.toLower), !c.isLetter)
}._1
I think this could be generalized so that notLetter remembers n elements, where n is the size of the sliding window.

The transformation you're asking for inherently reduces the size of the data. Sorry--there's no other way to look at it. tail also gives you off-by-one errors.
Now, you might say--well, fine, but I want a convenience method to maintain the original size.
In that case, you might want these methods on List:
initializedSliding(init: List[A]) = (init ::: this).sliding(1 + init.length)
finalizedSliding(tail: List[A]) = (this ::: tail).sliding(1 + tail.length)
which will maintain your list length. (You can envision how to generalize to non-lists, I'm sure.)
This is the analog to fold left/right in that you supply the missing data in order to perform a pairwise (or more) operation on every element of the list.
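As a rough sketch of how those two methods might look as standalone functions (the names come from the answer above; the bodies, the empty-list guard, and the capitalize usage are assumptions added for illustration):

```scala
// Sketch only: standalone versions of the methods suggested above.
// The empty-list guard is an addition: sliding on a short, non-empty
// collection returns one partial window, which would break the
// "same length as the input" property for empty input.
def initializedSliding[A](xs: List[A], init: List[A]): List[List[A]] =
  if (xs.isEmpty) Nil
  else (init ::: xs).sliding(1 + init.length).toList

def finalizedSliding[A](xs: List[A], tail: List[A]): List[List[A]] =
  if (xs.isEmpty) Nil
  else (xs ::: tail).sliding(1 + tail.length).toList

// Capitalize by seeding the first window with a non-letter "predecessor",
// so the first character needs no special case.
def capitalize(s: String): String =
  initializedSliding(s.toList, List(' ')).map {
    case List(prev, c) if c.isLetter => if (prev.isLetter) c.toLower else c.toUpper
    case List(_, c)                  => c
    case other                       => other.mkString // not reached for two-element windows
  }.mkString
```

Seeding with a space works here because a space is a non-letter; another problem would need a different seed value.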

The off by one problem you describe reminds me of the boundary condition issue in digital signal processing. The problem occurs because the data (list) is finite. It doesn't occur for infinite data (stream). In digital signal processing the issue is remedied by extending the finite signal to an infinite one. This can be done in various ways, like repeating the data, or repeating the data and reversing it on every repetition (as is done for the discrete cosine transform).
Borrowing from these approaches for sliding would lead to an abstraction which does not exhibit the off by one problem:
(1::2::3::Nil).sliding(2)
would yield
(1,2), (2,3), (3,1)
for circular boundary conditions and
(1,2), (2,3), (3,2)
for circular boundary conditions with reversal.
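The two boundary policies could be sketched roughly as follows. The function names slidingCircular and slidingReflected are hypothetical, and the sketch assumes the window size does not exceed the list length:

```scala
// Sketch only: hypothetical helpers, assuming 1 <= size <= xs.length.

// Circular boundary: wrap around to the start of the list.
def slidingCircular[A](xs: List[A], size: Int): List[List[A]] =
  if (xs.isEmpty || size < 1) Nil
  else {
    val wrap = List.tabulate(size - 1)(i => xs(i % xs.length))
    (xs ::: wrap).sliding(size).toList
  }

// Reflected boundary: walk back from the last element without repeating it.
def slidingReflected[A](xs: List[A], size: Int): List[List[A]] =
  if (xs.isEmpty || size < 1) Nil
  else (xs ::: xs.reverse.drop(1).take(size - 1)).sliding(size).toList
```

Both return exactly as many windows as there are input elements, which is the property the answer is after.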

Off-by-one errors suggest that you are trying to put the original list in one-to-one correspondence with the sliding list, but something strange is going on, since the sliding list has fewer elements.
The problem statement for your example can be roughly phrased as: "Uppercase every character if it (a) is the first character, or (b) follows a non-letter character". As Owen points out, the first character is a special case, and any abstraction should respect this. Here's a possibility,
def slidingPairMap[A, B](s: List[A], f1: A => B, f2: (A, A) => B): List[B] = s match {
  case Nil => Nil
  // zip with the tail instead of sliding(2), which yields a one-element window for a singleton list
  case x :: xs => f1(x) :: s.zip(xs).map { case (a, b) => f2(a, b) }
}
(not the best implementation, but you get the idea). This generalizes to sliding triples, with off-by-two errors, and so on. The type of slidingPairMap makes it clear that special casing is being done.
An equivalent signature could be
def slidingPairMap[A, B](s: List[A], f: Either[A, (A, A)] => B): List[B]
Then f could use pattern matching to figure out if it's working with the first element, or with a subsequent one.
Or, as Owen says in the comments, why not make a modified sliding method that gives information about whether the element is first or not,
def slidingPairs[A](s: List[A]): List[Either[A, (A, A)]]
I guess this last idea is isomorphic to what Debilski suggests in the comments: pad the beginning of the list with None, wrap all the existing elements with Some, and then call sliding.
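One possible body for the slidingPairs signature above, together with a capitalize built on it. This is a sketch, not the answer author's implementation, and the capitalize wrapper is an assumption:

```scala
// Sketch only: the first element is reported as Left, every later element
// as Right((predecessor, element)).
def slidingPairs[A](s: List[A]): List[Either[A, (A, A)]] = s match {
  case Nil => Nil
  case x :: xs =>
    val first: Either[A, (A, A)] = Left(x)
    first :: s.zip(xs).map(pair => Right(pair))
}

// The special case for the first character is now visible in the pattern match.
def capitalize(s: String): String =
  slidingPairs(s.toList).map {
    case Left(c)                        => c.toUpper
    case Right((prev, c)) if c.isLetter => if (prev.isLetter) c.toLower else c.toUpper
    case Right((_, c))                  => c
  }.mkString
```

The result has the same length as the input, so it lines up one-to-one with the original list.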

I realize this is an old question but I just had a similar problem and I wanted to solve it without having to append or prepend anything, and where it would handle the last elements of the sequence in a seamless manner. The approach I came up with is a slidingFoldLeft. You have to handle the first element as a special case (like some others mentioned, for capitalize, it is a special case), but for the end of the sequence you can just handle it like other cases. Here is the implementation and some silly examples:
def slidingFoldLeft[A, B] (seq: Seq[A], window: Int)(acc: B)(
    f: (B, Seq[A]) => B): B = {
  if (window > 0) {
    val iter = seq.sliding(window)
    iter.foldLeft(acc){
      // Operate normally
      case (acc, next) if iter.hasNext => f(acc, next)
      // It's at the last <window> elements of the seq, handle current case and
      // call recursively with smaller window
      case (acc, next) =>
        slidingFoldLeft(next.tail, window - 1)(f(acc, next))(f)
    }
  } else acc
}

def capitalizeAndQuestionIncredulously(s: String) =
  slidingFoldLeft(s.toSeq, 2)("" + s(0).toUpper) {
    // Normal iteration
    case (acc, Seq(c1, c2)) if c1.isLetter && c2.isLetter => acc + c2.toLower
    case (acc, Seq(_, c2)) if c2.isLetter => acc + c2.toUpper
    case (acc, Seq(_, c2)) => acc + c2
    // Last element of string
    case (acc, Seq(c)) => acc + "?!"
  }

def capitalizeAndInterruptAndQuestionIncredulously(s: String) =
  slidingFoldLeft(s.toSeq, 3)("" + s(0).toUpper) {
    // Normal iteration
    case (acc, Seq(c1, c2, _)) if c1.isLetter && c2.isLetter => acc + c2.toLower
    case (acc, Seq(_, c2, _)) if c2.isLetter => acc + c2.toUpper
    case (acc, Seq(_, c2, _)) => acc + c2
    // Last two elements of string
    case (acc, Seq(c1, c2)) => acc + " (commercial break) " + c2
    // Last element of string
    case (acc, Seq(c)) => acc + "?!"
  }

println(capitalizeAndQuestionIncredulously("hello my name is mAtthew"))
println(capitalizeAndInterruptAndQuestionIncredulously("hello my name is mAtthew"))
And the output:
Hello My Name Is Matthew?!
Hello My Name Is Matthe (commercial break) w?!

I would prepend None after mapping with Some(_) the elements; note that the obvious way of doing it (matching for two Some in the default case, as done in the edit by Debilski) is wrong, as we must be able to modify even the first letter. This way, the abstraction respects the fact that simply sometimes there is no predecessor. Using getOrElse(false) ensures that a missing predecessor is treated as having failed the test.
((None +: "foo1bar".toSeq.map(Some(_))) sliding 2).map {
  case Seq(c1Opt, Some(c2)) if c2.isLetter =>
    if (c1Opt.map(_.isLetter).getOrElse(false)) c2.toLower else c2.toUpper
  case Seq(_, Some(x)) => x
}.mkString
res13: String = "Foo1Bar"
Acknowledgments: the idea of mapping the elements with Some(_) did come to me through Debilski's post.

I'm not sure if this solves your concrete problem, but we could easily imagine a pair of methods e.g. slidingFromLeft(z: A, size: Int) and slidingToRight(z: A, size: Int) (where A is collection's element type) which, when called on e.g.
List(1, 2, 3, 4, 5)
with arguments e.g. (0, 3), should produce respectively
List(0, 0, 1), List(0, 1, 2), List(1, 2, 3), List(2, 3, 4), List(3, 4, 5)
and
List(1, 2, 3), List(2, 3, 4), List(3, 4, 5), List(4, 5, 0), List(5, 0, 0)
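A minimal sketch of those two methods as plain functions. The signatures are adapted from the answer (the collection is passed explicitly), and the guard for empty input is an assumption:

```scala
// Sketch only: pad with size - 1 copies of z so that every element of xs
// ends up as the last (left padding) or first (right padding) item of a window.
def slidingFromLeft[A](xs: List[A], z: A, size: Int): List[List[A]] =
  if (xs.isEmpty || size < 1) Nil
  else (List.fill(size - 1)(z) ::: xs).sliding(size).toList

def slidingToRight[A](xs: List[A], z: A, size: Int): List[List[A]] =
  if (xs.isEmpty || size < 1) Nil
  else (xs ::: List.fill(size - 1)(z)).sliding(size).toList
```

With List(1, 2, 3, 4, 5) and arguments (0, 3), these produce the two sequences of windows listed above, each with five windows for five input elements.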

This is the sort of problem nicely-suited to an array-oriented functional language like J. Basically, we generate a boolean with a one corresponding to the first letter of each word. To do this, we start with a boolean marking the spaces in a string. For example (lines indented three spaces are inputs; results are flush with left margin; "NB." starts a comment):
   str=. 'now  is  the    time'   NB. Example w/extra spaces for interest
   ]whspc=. ' '=str               NB. Mark where spaces are=1
0 0 0 1 1 0 0 1 1 0 0 0 1 1 1 1 0 0 0 0
Verify that (*.-.) ("and not") returns one only for "1 0":
   ]tt=. #:i.4                    NB. Truth table
0 0
0 1
1 0
1 1
   (*.-.)/"1 tt                   NB. Apply to 1-D sub-arrays (rows)
0 0 1 0                           NB. As hoped.
Slide our tacit function across pairs in the boolean:
   2(*.-.)/\whspc                 NB. Apply to 2-ples
0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0
But this doesn't handle the edge condition of the initial letter, so force a one into the first position. This actually helps as the reduction of 2-ples left us one short. Here we compare lengths of the original boolean and the target boolean:
   #whspc
20
   #1,2(*.-.)/\whspc
20
   1,2(*.-.)/\whspc
1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0
We get uppercase by using the index into the lowercase vector to select from the uppercase vector (after defining these two vectors):
   'lc uc'=. 'abcdefghijklmnopqrstuvwxyz';'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
   (uc,' '){~lc i. str
NOW  IS  THE    TIME
Check that insertion by boolean gives correct result:
   (1,2(*.-.)/\whspc) } str,:(uc,' '){~lc i. str
Now  Is  The    Time
Now is the time to combine all this into one statement:
   (1,2(*.-.)/\' '=str) } str,:(uc,' '){~lc i. str
Now  Is  The    Time

Related

When does Scala force a stream value?

I am comfortable with streams, but I admit I am puzzled by this behavior:
import collection.immutable.Stream
object StreamForceTest extends App {
  println("Computing fibs")
  val fibs: Stream[BigInt] = BigInt(0) #:: BigInt(1) #::
    fibs.zip(fibs.tail).map((x: (BigInt, BigInt)) => {
      println("Adding " + x._1 + " and " + x._2);
      x._1 + x._2
    })
  println("Taking first 5 elements")
  val fibs5 = fibs.take(5)
  println("Computing length of that prefix")
  println("fibs5.length = " + fibs5.length)
}
with output
Computing fibs
Taking first 5 elements
Computing length of that prefix
Adding 0 and 1
Adding 1 and 1
Adding 1 and 2
fibs5.length = 5
Why should take(5) not force the stream's values to be computed,
while length does do so? Offhand neither one needs to actually
look at the values, but I would have thought that take was more
likely to do it than length. Inspecting the source code on github,
we find these definitions for take (including an illuminating
comment):
override def take(n: Int): Stream[A] = (
  // Note that the n == 1 condition appears redundant but is not.
  // It prevents "tail" from being referenced (and its head being evaluated)
  // when obtaining the last element of the result. Such are the challenges
  // of working with a lazy-but-not-really sequence.
  if (n <= 0 || isEmpty) Stream.empty
  else if (n == 1) cons(head, Stream.empty)
  else cons(head, tail take n-1)
)
and length:
override def length: Int = {
  var len = 0
  var left = this
  while (!left.isEmpty) {
    len += 1
    left = left.tail
  }
  len
}
The definition of head and tail is obtained from the specific
subclass (Empty and Cons). (Of course Empty is an object, not a
class, and its definitions of head and tail just throw
exceptions.) There are subtleties, but they seem to concern making
sure that the tail of a Cons is evaluated lazily; the head
definition is straight out of lecture 0 on Scala constructors.
Note that length doesn't go near head, but it's the one that
does the forcing.
All this is part of a general puzzlement about how close Scala streams
are to Haskell lists. I thought Haskell treated head and tail
symmetrically (I'm not a serious Haskell hacker), and Scala forced
head evaluation in more circumstances. I'm trying to figure out
exactly what those circumstances are.

Stream's head is strict and its tail is lazy, as you can see in cons.apply and in the Cons constructor:
def apply[A](hd: A, tl: => Stream[A]) = new Cons(hd, tl)
class Cons[+A](hd: A, tl: => Stream[A]) extends Stream[A]
Notice the context in which the take method refers to tail:
cons(head, tail take n-1)
Because the expression tail take n-1 is used as the second argument to cons, which is passed by name, it doesn't force evaluation of tail take n-1, thus doesn't force evaluation of tail.
In length, by contrast, the statement
left = left.tail
assigns left.tail to a var, which forces its evaluation.
Scala is "strict by default". In most situations, everything you reference will be evaluated. Lazy evaluation happens only where a method or constructor parameter is declared call-by-name with =>, and by convention this is not used unless there is a special reason.
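The difference between a strict parameter and a call-by-name parameter can be seen without Stream at all. In this small sketch every name (ByNameDemo, expensive, strictArg, byNameArg) is hypothetical and exists only for illustration:

```scala
object ByNameDemo {
  // Counts how many times the argument expression was actually evaluated.
  var evaluations = 0
  def expensive(): Int = { evaluations += 1; 42 }

  // Strict parameter: the argument is evaluated before the call, used or not.
  def strictArg(x: Int): String = "ignored"

  // By-name parameter (=>): the argument is evaluated only when the body uses x,
  // and this body never does.
  def byNameArg(x: => Int): String = "ignored"
}
```

Calling ByNameDemo.strictArg(ByNameDemo.expensive()) increments the counter, while ByNameDemo.byNameArg(ByNameDemo.expensive()) leaves it unchanged. That is the same reason tail take n-1 is not evaluated when passed as the second argument to cons.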

Let me offer another answer to this, one that just looks from a high level, i.e. without actually considering the code.
If you want to know how long a Stream is, you must evaluate it all the way to the end. Otherwise, you can only guess at its length. Admittedly, you may not actually care about the values (since you only want to count them) but that's immaterial.
On the other hand, when you "take" a certain number of elements from a stream (or indeed any collection) you are simply saying that you want at most that number of elements. The result is still a stream even though it may have been truncated.

Evaluating Performance of Functions

The "oneToEach" function adds 1 to each element of a List[Int]. The first function is not tail recursive, whereas the latter is.
If I had a one million length List[Int] that I passed in to these 2 functions, which one would perform better? Better = faster or less resource usage.
// Add one to each element of a list
def oneToEach(l: List[Int]) : List[Int] =
  l match {
    case Nil => l
    case x :: xs => (x + 1) :: oneToEach(xs)
  }
...
def oneToEachTR(l: List[Int]) : List[Int] = {
  def go(l: List[Int], acc: List[Int]) : List[Int] =
    l match {
      case Nil => acc
      case x :: xs => go(xs, acc :+ (x + 1))
    }
  go(l, List[Int]())
}
If I understand, the first function has algorithmic complexity of O(n) since it's necessary to recurse through each item of the list and add 1.
For oneToEachTR, it uses the :+ operator, which, I've read, is O(n) complexity. As a result of using this operator per recursion/item in the list, does the worst-case algorithm complexity become O(2*n)?
Lastly, for the million-element List, will the latter function perform better with respect to resources since it's tail-recursive?

Regarding
For oneToEachTR, it uses the :+ operator, which, I've read, is O(n) complexity. As a result of using this operator per recursion/item in the list, does the worst-case algorithm complexity become O(2*n)?
no, it becomes O(n^2)
Tail recursion won't save a O(n^2) algorithm vs O(n) for sufficiently large n; 1 million is certainly sufficient!
Why O(n^2)?
You've got a list of n elements.
The first call to :+ will traverse 0 elements (acc is initially empty) and append 1: 1 operation.
The second call will traverse 1 element and append 1: 2 operations.
The third call will traverse 2 elements and append 1: 3 operations.
...
The sum of all "operations" is 1 + 2 + 3 + ... + n = n(n+1)/2 = (1/2)n^2 + n/2. That's "in the order of" n^2, or O(n^2).
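A common way to keep the tail recursion and get back to O(n) is to prepend to the accumulator (an O(1) operation on List) and reverse once at the end. The following is a sketch; the name oneToEachFast is made up for this example:

```scala
import scala.annotation.tailrec

// Sketch only: tail-recursive and linear.
def oneToEachFast(l: List[Int]): List[Int] = {
  @tailrec
  def go(rest: List[Int], acc: List[Int]): List[Int] = rest match {
    case Nil     => acc.reverse             // a single O(n) pass at the very end
    case x :: xs => go(xs, (x + 1) :: acc)  // O(1) prepend instead of O(n) append
  }
  go(l, Nil)
}
```

The @tailrec annotation makes the compiler reject the method if the recursive call is not in tail position, so this version runs in constant stack space on a million-element list. The non-tail-recursive oneToEach needs one stack frame per element and may overflow the stack on a list that long.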

How to filter a list with a condition that depends on more than one element

Given a list L, I want to keep an element L(i) if there exists at least one index j > i such that L(j) is a multiple of L(i); otherwise L(i) should be discarded.
It is quite simple to do that with imperative programming, but I would like to do it using functional programming.
Is it possible to use the filter method? If so, how should the condition (i.e. the parameter of the filter function) be written? Otherwise, what can I do?

For example:
val l = (1 to 100)
l.tails.collect { case (head +: tail) if tail.exists(_ % head == 0) => head } .toList
tails produces an iterator that returns in each step the input minus one more element, e.g.
Vector(1, 2, 3, 4).tails.foreach(println)
gives
Vector(1, 2, 3, 4)
Vector(2, 3, 4)
Vector(3, 4)
Vector(4)
Vector()
You can view these 'tails' as a head element to which you want to apply a filter and a tail in itself that is used to find out whether to keep the head.
The collect method is useful here, because it takes a partial function, so you only need to specify the cases where you actually retain a value—like filter—, while at the same time it acts like a map by letting you specify how the filtered value is to be collected.
So we can match on tails that have at least one head element and a tail of any size, and then see if in that tail there exists an element that is a multiple of the head. I use a guard here for the match case, so the match is a double filter. First, the tails value must be non-empty; second, there must be a multiple. A multiple means that the remainder is zero. If the case matches, just return the verified head element.
Finally, since without specific type annotations the collect will just return another iterator, we turn the result into a list with toList.
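For comparison, here is the tails/collect approach wrapped in a function, next to a variant that uses filter directly, as the question asks. Both function names are hypothetical, and both assume the input contains no zeros (a zero head would make % throw):

```scala
// Sketch only: the tails/collect approach from above as a reusable function.
def keepWithLaterMultiple(xs: Seq[Int]): List[Int] =
  xs.tails.collect { case head +: tail if tail.exists(_ % head == 0) => head }.toList

// The same condition expressed with filter: pair each element with its index
// so the predicate can look at the elements that come after it.
def keepWithLaterMultipleViaFilter(xs: Seq[Int]): List[Int] =
  xs.zipWithIndex
    .filter { case (x, i) => xs.drop(i + 1).exists(_ % x == 0) }
    .map(_._1)
    .toList
```

For 1 to 10 both return List(1, 2, 3, 4, 5): each of 1 through 5 has a later multiple in the range, while 6 through 10 do not.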

A more "explicit" one - you accumulate elements in a case if tail has a multiple of head:
(1 to 10).tails.foldLeft(List[Int]())((acc, tl) => tl match {
  case h +: t if (t.exists(_ % h == 0)) => h :: acc
  case _ => acc
}).reverse

Explain some scala code - beginner

I've encountered this Scala code and I'm trying to work out what it's doing, apart from the fact that it returns an Int. I'm unsure of these three lines:
l match {
case h :: t =>
case _ => 0
Function :
def iterate(l: List[Int]): Int =
  l match {
    case h :: t =>
      if (h > n) 0
    case _ => 0
  }

First, you define a function called iterate and you specified the return type as Int. It has arity 1, parameter l of type List[Int].
The List type is prominent throughout functional programming. Its main characteristics are that it has efficient prepend and that any List is easy to decompose into a head and a tail. The head is the first element of the list (if non-empty) and the tail is the rest of the List (which is itself a List); this becomes useful for recursive functions that operate on List.
The match is called pattern matching. It's essentially a switch statement from the C-like languages, but much more powerful: switch restricts you to constants (at least in C it does), but match has no such restriction.
Now, your first case you have h :: t - the :: is called a "cons", another term from functional programming. When you create a new List from another List via a prepend, you can use the :: operator to do it.
Example:
val oldList = List(1, 2, 3)
val newList = 0 :: oldList // newList == List(0, 1, 2, 3)
In Scala, operators that end with a : are really a method of the right hand side, so 0 :: oldList is the equivalent of oldList.::(0) - the 0 :: oldList is syntactic sugar that makes it easier to read.
We could've defined oldList like
val oldList = 1 :: 2 :: 3 :: Nil
where Nil represents an empty List. Breaking this down into steps:
3 :: Nil is evaluated first, creating the equivalent of a List(3) which has head 3 and empty tail.
2 is prepended to the above list, creating a new list with head 2 and tail List(3).
1 is prepended, creating a new list with head 1 and tail List(2, 3).
The resulting List of List(1, 2, 3) is assigned to the val oldList.
Now when you use :: to pattern match you essentially decompose a List into a head and tail, like the reverse of how we created the List above. Here when you do
l match {
case h :: t => ...
}
you are saying decompose l into a head and tail if possible. If you decompose successfully, you can then use these h and t variables to do whatever you want.. typically you would do something like act on h and call the recursive function on t.
One thing to note here is that your code will not compile. You write if (h > n) 0 with no explicit else, so to the compiler your code looks like this:
if (h > n) 0
else ()
which has type AnyVal (the common supertype of Int and Unit), a violation of your Int guarantee. You're going to have to add an else branch with some failure value or something.
The second case _ => is like a default in the switch, it catches anything that failed the head/tail decomposition in your first case.
Your code essentially does this:
Take the l List parameter and see if it can be decomposed into a head and tail.
If it can be, compare the head against (what I assume to be) a variable in the outer scope called n. If it is greater than n, the function returns 0. (You need to add what happens if it's not greater)
If it cannot be decomposed, the function returns 0.
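To make the shape concrete, here is one way the function could be completed. The else branch and the n parameter are purely hypothetical, since the original code shows neither; this version counts how many leading elements are not greater than n:

```scala
// Sketch only: the else branch is invented for illustration, and n is passed
// as a parameter here instead of coming from an outer scope.
def iterate(l: List[Int], n: Int): Int =
  l match {
    case h :: t =>
      if (h > n) 0               // stop at the first element greater than n
      else 1 + iterate(t, n)     // hypothetical: count h, then recurse on the tail
    case _ => 0                  // empty list: nothing to decompose
  }
```

With both branches of the if returning an Int, the match as a whole has type Int and the function compiles.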

This is called pattern matching. It's like a switch statement, but more powerful.
Some useful resources:
http://www.scala-lang.org/node/120
http://www.codecommit.com/blog/scala/scala-for-java-refugees-part-4

Why is this scala prime generation so slow/memory intensive?

I run out of memory while finding the 10,001th prime number.
object Euler0007 {
  def from(n: Int): Stream[Int] = n #:: from(n + 1)
  def sieve(s: Stream[Int]): Stream[Int] = s.head #:: sieve(s.filter(_ % s.head != 0))
  def primes = sieve(from(2))
  def main(args: Array[String]): Unit = {
    println(primes(10001))
  }
}
Is this because after each "iteration" (is this the correct term in this context?) of primes, I increase the stack of functions to be called to get the next element by one?
One solution that I've found on the web which doesn't resort to an iterative solution (which I'd like to avoid to get into functional programming/idiomatic scala) is this (Problem 7):
lazy val ps: Stream[Int] = 2 #:: Stream.from(3).filter(i => ps.takeWhile(j => j * j <= i).forall(i % _ > 0))
From what I can see, this does not lead to this recursion-like way. Is this a good way to do it, or do you know of a better way?

One reason why this is slow is that it isn't the sieve of Eratosthenes. Read http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf for a detailed explanation (the examples are in Haskell, but can be translated directly into Scala).
My old solution for Euler problem #7 wasn't the "true" sieve either, but it seems to work well enough for small numbers:
object Sieve {
  val primes = 2 #:: sieve(3)
  def sieve(n: Int) : Stream[Int] =
    if (primes.takeWhile(p => p*p <= n).exists(n % _ == 0)) sieve(n + 2)
    else n #:: sieve(n + 2)
  def main(args: Array[String]) {
    println(primes(10000)) //note that indexes are zero-based
  }
}
I think the problem with your first version is that you have only defs and no val which collects the results and can be consulted by the generating function, so you always recalculate from scratch.

Yes, it is because you "increase the stack of functions to be called to get the next element, by one after each "iteration" " - i.e. add a new filter on top of stack of filters each time after getting each prime. That's way too many filters.
This means that each produced prime gets tested by all its preceding primes - but only those below its square root are really needed. For instance, to get the 10001-th prime, 104743, there will be 10000 filters created, at run-time. But there are just 66 primes below 323, the square root of 104743, so only 66 filters were really needed. All the 9934 others will be there needlessly, taking up memory, hard at work producing absolutely no added value.
This is the key deficiency of that "functional sieve", which seems to have originated in the 1970s code by David Turner, and later have found its way into the SICP book and other places. It is not that it's a trial division sieve (rather than the sieve of Eratosthenes). That's far too remote a concern for it. Trial division, when optimally implemented, is perfectly capable of producing the 10000th prime very fast.
The key deficiency of that code is that it does not postpone the creation of filters to the right moment, and ends up creating far too many of them.
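The "only primes up to the square root" point can be checked with a small, deliberately imperative sketch of bounded trial division. The function name firstPrimes is made up, and this is not the postponed-filters sieve itself, only an illustration of the same bound:

```scala
import scala.collection.mutable.ArrayBuffer

// Sketch only: each candidate is tested against the already-found primes
// whose square does not exceed it, and against nothing else.
def firstPrimes(count: Int): Vector[Int] = {
  val primes = ArrayBuffer.empty[Int]
  var candidate = 2
  while (primes.length < count) {
    val isPrime = primes.iterator
      .takeWhile(p => p.toLong * p <= candidate)
      .forall(candidate % _ != 0)
    if (isPrime) primes += candidate
    candidate += 1
  }
  primes.toVector
}
```

firstPrimes(10001).last is 104743, and only the 66 primes below 323 are ever consulted when testing it.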
Talking complexities now, the "old sieve" code is O(n^2), in n primes produced. The optimal trial division is O(n^1.5/log^0.5(n)), and the sieve of Eratosthenes is O(n*log(n)*log(log(n))). As empirical orders of growth, the first is typically seen as ~ n^2, the second as ~ n^1.45 and the third as ~ n^1.2.
You can find Python generators-based code for optimal trial division implemented in this answer (2nd half of it). It was originally discussed here dealing with the Haskell equivalent of your sieve function.
Just as an illustration, a "readable pseudocode" :) for the old sieve is
primes = sieve [2..] where
   sieve (x:xs) = x : sieve [ y | y <- xs, rem y x > 0 ]
                         -- list of 'y's, drawn from 'xs',
                         --         such that (y % x > 0)
and for optimal trial division (TD) sieve, synchronized on primes' squares,
primes = sieve [2..] primes where
  sieve (x:xs) ps = x : (h ++ sieve [ y | y <- t, rem y p > 0 ] qs)
    where
      (p:qs) = ps                -- 'p' is head elt in 'ps', and 'qs' the rest
      (h,t)  = span (< p*p) xs   -- 'h' are elts below p^2 in 'xs'
                                 -- and 't' are the rest
and for a sieve of Eratosthenes, devised by Richard Bird, as seen in that JFP article mentioned in another answer here,
primes = 2 : minus [3..]
           (foldr (\p r-> p*p : union [p*p+p, p*p+2*p..] r) [] primes)
                  -- function of 'p' and 'r', that returns
                  -- a list with p^2 as its head elt, ...
Short and fast. (minus a b is a list a with all the elts of b progressively removed from it; union a b is a list a with all the elts of b progressively added to it without duplicates; both deal with ordered, non-decreasing lists.) foldr is the right fold of a list. Because it is linear, this runs at ~ n^1.33; to make it run at ~ n^1.2, the tree-like folding function foldi can be used.
The answer to your second question is also a yes. Your second code, re-written in same "pseudocode",
ps = 2 : [i | i <- [3..], all ((> 0).rem i) (takeWhile ((<= i).(^2)) ps)]
is very similar to the optimal TD sieve above - both arrange for each candidate to be tested by all primes below its square root. While the sieve arranges that with a run-time sequence of postponed filters, the latter definition re-fetches the needed primes anew for each candidate. One might be faster than another depending on a compiler, but both are essentially the same.
And the third is also a yes: the sieve of Eratosthenes is better,
ps = 2 : 3 : minus [5,7..] (unionAll [[p*p, p*p+2*p..] | p <- drop 1 ps])
unionAll = foldi union' [] -- one possible implementation
union' (x:xs) ys = x : union xs ys
-- unconditionally produce first elt of the 1st arg
-- to avoid run-away access to infinite lists
It looks like it can be implemented in Scala too, judging by the similarity of other code snippets. (Though I don't know Scala). unionAll here implements a tree-like folding structure but could also be implemented with a sliding array, working segment by segment along the streams of primes' multiples.
TL;DR: yes, yes, and yes.

FWIW, here's a real Sieve of Eratosthenes:
def sieve(n: Int) = (2 to math.sqrt(n).toInt).foldLeft((2 to n).toSet) { (ps, x) =>
  if (ps(x)) ps -- (x * x to n by x)
  else ps
}
Here's an infinite stream of primes using a variation on the Sieve of Eratosthenes that preserves its fundamental properties:
case class Cross(next: Int, incr: Int)

def adjustCrosses(crosses: List[Cross], current: Int) = {
  crosses map {
    case cross @ Cross(`current`, incr) => cross copy (next = current + incr)
    case unchangedCross => unchangedCross
  }
}

def notPrime(crosses: List[Cross], current: Int) = crosses exists (_.next == current)

def sieve(s: Stream[Int], crosses: List[Cross]): Stream[Int] = {
  val current #:: rest = s
  if (notPrime(crosses, current)) sieve(rest, adjustCrosses(crosses, current))
  else current #:: sieve(rest, Cross(current * current, current) :: crosses)
}

def primes = sieve(Stream from 2, Nil)
This is somewhat difficult to use, however, since each element of the Stream is composed using the crosses list, which has as many numbers as there have been primes up to a number, and it seems that, for some reason, these lists are being kept in memory for each number in the Stream.
For example, prompted by a comment, primes take 6000 contains 56993 would throw a GC exception whereas primes drop 5000 take 1000 contains 56993 would return a result rather fast on my tests.