Strange behavior of Scala

Can any of you explain why this is happening?
val s = "abcdefg"
val slides = s.sliding(4)
val n1 = slides.length
val n2 = slides.dropWhile(foo).length
println(n1) // 4
println(n2) // 0
println(slides.length) // 0
But:
val s = "abcdefg"
println(s.sliding(4).length) // 4
println(s.sliding(4).dropWhile(foo).length) // 3
println(s.sliding(4).length) // 4
Don't pay attention to the function foo; it's a simple method that checks whether a string does not contain the letter "c".
Unfortunately, I don't understand this behavior of the language. Maybe someone with more knowledge can explain why this is happening.

slides is an Iterator. It is a special kind of "collection" that you can only traverse once.
Once you ask for its length, it has to scan through (and discard) all of its elements to count them, so it becomes empty. When you ask for its length again (the dropWhile is inconsequential here), it is 0. Your second snippet behaves differently because every call to s.sliding(4) creates a fresh iterator.
This is useful when you need to process a huge collection without loading it into memory all at once (e.g., reading a huge file line by line to see whether it contains the word "google" somewhere).
sliding returns an iterator because making the result traversable more than once may be expensive and is rarely needed.
If you need to traverse it more than once, materialize it first: val slides = s.sliding(4).toSeq
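A minimal sketch of the behaviour described above, assuming Scala 2.13 (the calls used also exist in 2.12). The object name SlidingDemo and the predicate passed to dropWhile are illustrative only; they are not the asker's foo.

```scala
object SlidingDemo {
  val s = "abcdefg"

  // An Iterator is consumed by the first traversal.
  def iteratorLengths: (Int, Int) = {
    val it = s.sliding(4)
    val first = it.length  // walks (and discards) all 4 windows
    val second = it.length // nothing is left, so this is 0
    (first, second)
  }

  // Materializing the windows once gives a collection that can be traversed repeatedly.
  val windows: List[String] = s.sliding(4).toList

  // Illustrative predicate (not the original foo): drop windows while they contain "a".
  val afterDrop: List[String] = windows.dropWhile(_.contains("a"))
}
```

Because windows is a List, windows.length returns 4 no matter how often you call it, and afterDrop keeps the last three windows ("bcde", "cdef", "defg").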

Related

Iterating through Seq[row] till a particular condition is met using Scala

I need to iterate over a Scala Seq of Row type until a particular condition is met. I don't need to process anything after the condition.
I have a Seq[Row] r -> WrappedArray([1/1/2020,abc,1],[1/2/2020,pqr,1],[1/3/2020,stu,0],[1/4/2020,opq,1],[1/6/2020,lmn,0])
I want to iterate through this collection checking r.getInt(2) until I encounter 0. As soon as I encounter 0, I need to break the iteration and collect r.getString(1) up to and including that row. I don't need to look at any data after that.
My output should be: Array(abc,pqr,stu)
I am new to Scala programming. This Seq was actually a DataFrame. I know how to handle this using Spark DataFrames, but due to some restrictions put forth by my organization, window functions and the createDataFrame function are not available/working in our environment. Hence I have to resort to plain Scala to achieve the same.
All I could come up with was something like the code below, but it is not really working!
breakable{
for(i <- r)
var temp = i.getInt(3)===0
if(temp ==true)
{
val = i.getInt(2)
break()
}
}
Can someone please help me here!
You can use the takeWhile method to grab the elements while their value is 1:
s.takeWhile(_.getInt(2) == 1).map(_.getString(1))
That will give you
List(abc, pqr)
You still need the first element whose int value is 0, which you can get as follows:
s.find(_.getInt(2)== 0).map(_.getString(1)).get
Putting it all together (and handling the case where there is no 0):
s.takeWhile(_.getInt(2) == 1).map(_.getString(1)) ++ s.find(_.getInt(2)== 0).map(r => List(r.getString(1))).getOrElse(Nil)
Result:
Seq[String] = List(abc, pqr, stu)
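The same result can also be computed with span, which splits a sequence at the first element that fails a predicate, so no second search through the sequence is needed. This is a swapped-in alternative to the takeWhile/find combination above, sketched with a hypothetical Rec case class standing in for Spark's Row (with a real Row you would use _.getInt(2) and _.getString(1) instead of the fields).

```scala
object TakeUntilZero {
  // Hypothetical stand-in for Spark's Row: (date, name, flag).
  final case class Rec(date: String, name: String, flag: Int)

  val rows: Seq[Rec] = Seq(
    Rec("1/1/2020", "abc", 1),
    Rec("1/2/2020", "pqr", 1),
    Rec("1/3/2020", "stu", 0),
    Rec("1/4/2020", "opq", 1),
    Rec("1/6/2020", "lmn", 0)
  )

  // span returns (longest prefix with flag != 0, everything after it).
  // rest.take(1) is the first row with flag == 0, or nothing if there is none.
  def namesUpToFirstZero(rs: Seq[Rec]): Seq[String] = {
    val (beforeZero, rest) = rs.span(_.flag != 0)
    (beforeZero ++ rest.take(1)).map(_.name)
  }
}
```

For the sample rows this yields the names abc, pqr and stu; if no row has a 0, every name is returned.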

Optimal Way to Achieve Traditional Loop Based Tasks in Scala

I am new to Scala and working on implementing an algorithm. In C#, this would have been a much easier task with loops, but it is a bit confusing to implement with Scala's functional programming semantics.
Assume I have to fill a spreadsheet (S) with N rows and M cols with values that I have in a one-dimensional list (L).
While filling an individual cell in the spreadsheet, there is some back-and-forth logic involved:
2a. The system walks through the items in L sequentially and fills each one into the next empty cell in sheet S.
2b. While filling a cell with the currently processed item from L, the system checks whether the current cell can accept the item's value. If yes, it fills the cell, moves on to the next item and follows Step 2a. If not, it checks whether the next item from L fits, and keeps evaluating subsequent items until it finds one that fits; if it runs out of values, it leaves the cell blank.
2c. After filling the cell in Step 2b, the system moves to the next cell. It first checks whether any of the values left unprocessed in the previous step (2b) can be accepted by the current cell. If yes, it fills the cell and continues working with the unprocessed values. If no unprocessed value fits, it pulls the next item from L, based on the position of the pointer from Step 2b.
It would be great if I could get ideas on how to structure this in Scala. As I mentioned earlier, in C# this would have been easy with foreach loops, but I am not sure what the best way is to do this with functional programming constructs.
Remember that the imperative loop:
for (init; condition; afterEach) {
instructions
}
is just a syntactic sugar for:
init
while (condition) {
instructions
afterEach
}
(at least until you use break or continue). So if you can rewrite your for-loop code as while-loop code, the translation to Scala (which has while loops) is pretty straightforward.
If you are not interested in such a solution, you could do something like:
val indices = for {
i <- (0 until n).toStream // or .to(LazyList) if on 2.13
j <- (0 until m).toStream // or .to(LazyList) if on 2.13
} yield i -> j
indices.foldLeft(allItemsToInsert) { case (itemsLeft, (i, j)) =>
itemsLeft.find(item => /* predicate if item can be inserted at (i, j) */) match {
case Some(item) =>
// insert item to spreadsheet
itemsLeft diff List(item) // remove found element - use other data structure if you find this too costly
case None =>
itemsLeft // nothing could be inserted, move on
}
}
This goes through all indices one after another and tries to find the first element that can be inserted. If it finds one, it inserts it and removes it from the list; if nothing can be inserted, it moves on.
You can tweak the logic, e.g. partition the items into insertable and non-insertable ones if more than one could be inserted:
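To make the fold approach concrete, here is a runnable sketch. The acceptance rule fits (a cell accepts an item whose parity matches row + column) is a made-up placeholder for the real predicate, and the sheet is modelled as an immutable Vector of rows with None for a blank cell; both are assumptions for illustration. Because the list of leftover items keeps its original order, taking the first item that fits gives the "try unprocessed values first, then pull the next item from L" behaviour from steps 2b and 2c.

```scala
object SheetFill {
  type Sheet = Vector[Vector[Option[Int]]]

  // Hypothetical acceptance rule standing in for the real one (assumes non-negative items).
  def fits(item: Int, i: Int, j: Int): Boolean = item % 2 == (i + j) % 2

  // Fills an n x m sheet row by row; returns the sheet and the items that never fit.
  def fill(n: Int, m: Int, items: List[Int]): (Sheet, List[Int]) = {
    val indices = for { i <- 0 until n; j <- 0 until m } yield (i, j)
    val blank: Sheet = Vector.fill(n, m)(Option.empty[Int])
    indices.foldLeft((blank, items)) { case ((sheet, itemsLeft), (i, j)) =>
      itemsLeft.find(fits(_, i, j)) match {
        case Some(item) =>
          // place the item and remove one occurrence of it from the leftovers
          (sheet.updated(i, sheet(i).updated(j, Some(item))), itemsLeft diff List(item))
        case None =>
          (sheet, itemsLeft) // nothing fits: leave the cell blank and move on
      }
    }
  }
}
```

For example, fill(2, 3, List(1, 2, 3, 4, 5, 6)) places 2, 1, 4 in the first row and 3, 6, 5 in the second, with no leftovers.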
indices.foldLeft(allItemsToInsert) { case (itemsLeft, (i, j)) =>
val (insertable, nonInsertable) = itemsLeft.partition(item => /* predicate if item can be inserted */)
// insert insertable
nonInsertable // pass the non-insertable items on to the next index
}
Alternatively, you could use tail recursion if you really need to go back and forth:
@scala.annotation.tailrec
def insertValues(items: List[Item], i: Int, j: Int): Unit = {
if (items.nonEmpty) {
// insert what you can into spreadsheet
val itemsLeft: List[Item] = ??? // items that you haven't inserted yet
val newI: Int = ??? // coordinates of the next cell to fill
val newJ: Int = ???
insertValues(itemsLeft, newI, newJ)
}
}
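The tail-recursive variant can be sketched the same way. Here the recursion carries a single linear cell pointer instead of a separate (i, j) pair, which is one possible reading of newI/newJ above; fits is a hypothetical placeholder rule, and the loop stops early once all items are placed.

```scala
object SheetFillRec {
  type Sheet = Vector[Vector[Option[Int]]]

  // Hypothetical acceptance rule (assumes non-negative items).
  def fits(item: Int, i: Int, j: Int): Boolean = item % 2 == (i + j) % 2

  def fill(n: Int, m: Int, items: List[Int]): Sheet = {
    @scala.annotation.tailrec
    def loop(cell: Int, itemsLeft: List[Int], sheet: Sheet): Sheet =
      if (cell >= n * m || itemsLeft.isEmpty) sheet // no cells or no items left
      else {
        val (i, j) = (cell / m, cell % m) // row-major position of this cell
        itemsLeft.find(fits(_, i, j)) match {
          case Some(item) =>
            loop(cell + 1, itemsLeft diff List(item), sheet.updated(i, sheet(i).updated(j, Some(item))))
          case None =>
            loop(cell + 1, itemsLeft, sheet) // leave the cell blank
        }
      }

    loop(0, items, Vector.fill(n, m)(Option.empty[Int]))
  }
}
```

Both recursive calls are in tail position, so @tailrec compiles and the recursion runs in constant stack space.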

Index of word in string 'covering' certain position

Not sure if this is the right place to ask but I couldn't find any related or similar questions.
Anyway: imagine you have a certain string like
val exampleString = "Hello StackOverflow this is my question, cool right?"
If given a position in this string, for example 23, return the index of the word that 'occupies' this position in the string. If we look at the example string, we can see that the character at position 23 (counting from 0) is the letter 's' (the last character of 'this'), so we should return index = 4 (because 'this' is the fifth word, counting from 0). In my question, spaces are counted as words. If, for example, we were given position 5, we land on the first space and thus we should return index = 1.
I'm implementing this in Scala (but this should be quite language-agnostic and I would love to see implementations in other languages).
Currently I have the following approach (assume exampleString is the given string and charPosition the given position):
exampleString.split("((?<= )|(?= ))").scanLeft(0)((a, b) => a + b.length()).drop(1).zipWithIndex.takeWhile(_._1 <= charPosition).last._2 + 1
This works, but it is way too complex, to be honest. Is there a better (more efficient?) way to achieve this? I'm fairly new to functions like fold, scan, map, filter ... but I would love to learn more.
Thanks in advance.
def wordIndex(exampleString: String, index: Int): Int = {
exampleString.take(index + 1).foldLeft((0, exampleString.head.isWhitespace)) {
case ((n, isWhitespace), c) =>
if (isWhitespace == c.isWhitespace) (n, isWhitespace)
else (n + 1, !isWhitespace)
}._1
}
This will fold over the string, keeping track of whether the previous character was a whitespace or not, and if it detects a change, it will flip the boolean and add 1 to the count (n).
This handles runs of spaces (e.g. in "hello   world", world would be at index 2), and spaces at the start of the string count as index 0, so the first word would then be index 1.
Note that this can't handle an empty input string (exampleString.head throws an exception), so you'll have to decide what you want to do in that case.
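One way to handle the empty string (and out-of-range positions) is to return an Option. The sketch below also swaps the fold for a count of adjacent character pairs whose whitespace-ness differs, which yields the same index as the fold; returning None is an assumed convention, and the names WordIndex and wordIndexOpt are hypothetical.

```scala
object WordIndex {
  // Returns None for an empty string or a position outside the string
  // (an assumed convention; the original answer leaves this case open).
  def wordIndexOpt(s: String, pos: Int): Option[Int] =
    if (pos < 0 || pos >= s.length) None
    else {
      val chars = s.take(pos + 1).toList
      // Every switch between whitespace and non-whitespace starts a new token,
      // so the token index is the number of such switches up to pos.
      val switches = chars.zip(chars.drop(1)).count { case (a, b) =>
        a.isWhitespace != b.isWhitespace
      }
      Some(switches)
    }
}
```

For the example string, wordIndexOpt(exampleString, 23) gives Some(4) and position 5 gives Some(1).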

Scala: For loop that matches ints in a List

New to Scala. I'm iterating a for loop 100 times. 10 times I want condition 'a' to be met and 90 times condition 'b'. However, I want the 10 a's to occur at random.
The best way I can think of is to create a val of 10 random integers, then loop through the ints from 1 to 100.
For example:
val z = List.fill(10)(100).map(scala.util.Random.nextInt)
z: List[Int] = List(71, 5, 2, 9, 26, 96, 69, 26, 92, 4)
Then something like:
for (i <- 1 to 100) {
whenever i == to a number in z: 'Condition a met: do something'
else {
'condition b met: do something else'
}
}
I tried using contains and == and =! but nothing seemed to work. How else can I do this?
Your generation of random numbers could yield duplicates (and nextInt(100) returns values from 0 to 99)... is that OK? Here's how you can easily generate 10 unique numbers from 1 to 100 (by generating a randomly shuffled sequence of 1 to 100 and taking the first ten):
val r = scala.util.Random.shuffle(1 to 100).toList.take(10)
Now you can simply partition the range 1-100 into the numbers that are contained in your randomly generated list and those that are not:
val (listOfA, listOfB) = (1 to 100).partition(r.contains(_))
Now do whatever you want with those two lists, e.g.:
println(listOfA.mkString(","))
println(listOfB.mkString(","))
Of course, you can always simply go through the list one by one:
(1 to 100).map {
case i if (r.contains(i)) => println("yes: " + i) // or whatever
case i => println("no: " + i)
}
What you consider to be a simple for-loop actually isn't one. It's a for-comprehension, which is syntactic sugar that desugars into chained calls of foreach (or map and flatMap when you use yield), plus withFilter for guards. Yes, it can be used the same way you would use a classical for-loop, but only because the collection you iterate over (here a Range, just like a List) provides those methods. Without going into too much detail: if you want to do things the idiomatic Scala way (the "functional" way), you should avoid writing classical iterative for loops and prefer getting a collection of your data and then mapping over its elements to perform whatever it is that you need. Note that collections have a really rich library behind them, which lets you call handy methods such as partition.
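A small sketch of that desugaring, assuming Scala 2.13 (2.12 behaves the same for these calls). A for loop without yield becomes foreach, a guard becomes withFilter, and with yield the compiler uses map instead. The object name and the set z are illustrative.

```scala
import scala.collection.mutable.ListBuffer

object Desugar {
  val z = Set(3, 7)

  // Sugared: a for loop (no yield) with a guard.
  def sugared(): List[Int] = {
    val seen = ListBuffer.empty[Int]
    for (i <- 1 to 10 if z.contains(i)) seen += i
    seen.toList
  }

  // Roughly what the compiler produces for the loop above.
  def desugared(): List[Int] = {
    val seen = ListBuffer.empty[Int]
    (1 to 10).withFilter(i => z.contains(i)).foreach(i => seen += i)
    seen.toList
  }

  // With yield, the comprehension becomes withFilter + map and returns a new collection.
  val yielded: IndexedSeq[Int] = for (i <- 1 to 10 if z.contains(i)) yield i * 10
  val mapped: IndexedSeq[Int] = (1 to 10).withFilter(i => z.contains(i)).map(i => i * 10)
}
```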
EDIT (for completeness):
Also, you should avoid side effects, or at least push them as far down the road as possible. I'm talking about the second example from my answer. Let's say you really need to log that stuff (you would be using a logger, but println is good enough for this example). Doing it like this is bad. By the way, note that you could use foreach instead of map in that case, because you're not collecting results, just performing the side effects.
A better way is to compute the needed stuff by mapping each element to an appropriate string. So, calculate the needed strings and accumulate them into results:
val results = (1 to 100).map {
case i if (r.contains(i)) => ("yes: " + i) // or whatever
case i => ("no: " + i)
}
// do whatever with results, e.g. print them
Now results contains a hundred "yes: x" and "no: x" strings, but you didn't do the ugly thing and perform logging as a side effect in the mapping process. Instead, you mapped each element of the collection to a corresponding string (note that the original collection remains intact, so if (1 to 100) was stored in some value, it's still there; mapping creates a new collection), and now you can do whatever you want with it, e.g. pass it on to the logger. Yes, at some point you need to do "the ugly side-effect thing" and log the stuff, but at least you will have a dedicated part of the code for doing that, and you will not be mixing it into your mapping logic, which checks whether a number is contained in the random sequence.
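Putting the pieces together, here is a sketch that keeps the decision logic pure and does the printing in one place. The seed 42 is an arbitrary choice so that runs are reproducible, and storing the chosen numbers in a Set rather than a List is a design choice that makes each contains check effectively constant-time; the object name is illustrative.

```scala
import scala.util.Random

object RandomConditions {
  // Seeded generator so runs are reproducible (the seed is arbitrary).
  private val rng = new Random(42)

  // Ten distinct numbers from 1 to 100.
  val chosen: Set[Int] = rng.shuffle((1 to 100).toList).take(10).toSet

  // Pure step: describe each iteration instead of printing inside the map.
  val results: IndexedSeq[String] = (1 to 100).map { i =>
    if (chosen.contains(i)) s"yes: $i" else s"no: $i"
  }

  def main(args: Array[String]): Unit =
    results.foreach(println) // the only side effect, kept at the edge
}
```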
(1 to 100).foreach { x =>
if(z.contains(x)) {
// do something
} else {
// do something else
}
}
Or you can use a pattern-matching anonymous function, like so:
(1 to 100).foreach {
case x if(z.contains(x)) => // do something
case _ => // do something else
}

What is the difference between Reactive programming and plain old closures?

Example from scala.rx:
import rx._
val a = Var(1); val b = Var(2)
val c = Rx{ a() + b() }
println(c()) // 3
a() = 4
println(c()) // 6
How is the above version better than:
var a = 1; var b = 2
def c = a + b
println(c) // 3
a = 4
println(c) // 6
The only thing I can think of is that the first example is efficient in the sense that unless a or b changes, c is not recalculated, whereas in my version c is recomputed every time I invoke it. But that is just a special case of memoization with size 1, e.g. I can prevent re-computations using a memoization macro:
var a = 1; var b = 2
@memoize(maxSize = 1) def c(x: Int = a, y: Int = b) = x + y
Is there anything that I am missing to grok about reactive programming that provides insight into why it might be a better paradigm (than memoized closures) in certain cases?
Problem: It's a bad example
The example on the web page doesn't illustrate the purpose of Scala.RX very well. In that sense it is a quite bad example.
What is Scala.RX for?
It's about notifications
The idea of Scala.RX is that a piece of code can get notifications when data changes. Usually this notification is used to (re-)calculate a result that depends on the changed data.
Scala.RX automates the wiring
When the calculation goes over multiple stages, it becomes quite hard to track which intermediate result depends on which data and on which other intermediate results. Additionally, one must recalculate the intermediate results in the correct order.
You can think of this just like a big Excel sheet with many formulas that depend on each other. When you change one of the input values, Excel has to figure out which parts of the sheet must be recalculated, and in which order. When Excel has re-calculated all the changed cells, it can update the display.
Scala.RX does a similar thing to Excel: it tracks how the formulas depend on each other and notifies the ones that need to update, in the correct order.
Purpose: MVC
Scala.RX is a nice tool to implement the MVC pattern, especially when you have business applications that you could also model in Excel.
There is also a variant that works with Scala.js, i.e. that runs in the browser as part of an HTML page. This can be quite useful if you want to dynamically update parts of an HTML page according to changes on the server or edits by the user.
Limitations
Scala.RX does not scale well when you have huge amounts of input data, e.g. operations on huge matrices.
A better example
import rx._
import rx.ops._
val a = Var(1); val b = Var(2)
val c: Rx[Int] = Rx{ a() + b() }
val o = c.foreach{value =>
println(s"c has a new value: ${value}")
}
a()=4
b()=12
a()=35
Gives you the following output:
c has a new value: 3
c has a new value: 6
c has a new value: 16
c has a new value: 47
Now imagine instead of printing the value, you will refresh controls in a UI or parts of a HTML page.
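To see what the library automates, here is a toy, hand-rolled push-based cell in plain Scala. It is not the scala.rx API (Cell, subscribe and sum are hypothetical names) and it only handles one level of dependencies; it just shows the difference from the def-based version in the question: writing an input pushes a notification to whoever depends on it, instead of the caller having to re-read c.

```scala
object MiniReactive {
  // A toy observable cell; NOT scala.rx, just an illustration of push-based updates.
  final class Cell[A](initial: A) {
    private var value: A = initial
    private var listeners: List[A => Unit] = Nil

    def apply(): A = value

    // Enables the c() = x syntax; every write notifies all listeners.
    def update(newValue: A): Unit = {
      value = newValue
      listeners.foreach(listener => listener(newValue))
    }

    // Registers a listener and immediately calls it with the current value.
    def subscribe(listener: A => Unit): Unit = {
      listeners = listeners :+ listener
      listener(value)
    }
  }

  // A derived cell that is recomputed whenever either input changes.
  def sum(a: Cell[Int], b: Cell[Int]): Cell[Int] = {
    val c = new Cell(a() + b())
    a.subscribe(_ => c() = a() + b())
    b.subscribe(_ => c() = a() + b())
    c
  }
}
```

Subscribing a println to c and then running a() = 4, b() = 12, a() = 35 prints the same four values as the example above: 3, 6, 16 and 47. What the toy does not do is order updates across a deeper dependency graph, so a value derived from several related cells may be recomputed more than once; tracking that graph and recomputing in the right order is the wiring Scala.RX takes care of.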