Sort a List according to more than one constraint in Scala - scala

I am desperately trying to find a way to sort a List of strings, where the strings are predefined identifiers of the following form: a1.1, a1.2, ..., a1.100, a2.1, a2.2, ..., a2.100, ..., b1.1, b1.2, ... and so on, which is already the correct ordering. So each identifier is ordered first by its first character (ascending alphabetical order) and, within this ordering, in ascending order by the consecutive numbers. I have tried sortWith, providing a sorting function that specifies the above rule for any two list members.
scala> List("a1.102", "b2.2", "b2.1", "a1.1").sortWith((a: String, b: String) => a.take(1) < b.take(1) && a.drop(1).toDouble < b.drop(1).toDouble)
res2: List[java.lang.String] = List(a1.102, a1.1, b2.2, b2.1)
This is not the ordering I expected. However, by swapping the ordering of the expressions, as
scala> List("a1.102", "b2.2", "b2.1", "a1.1").sortWith((a: String, b: String) => (a.drop(1).toDouble < b.drop(1).toDouble && a.take(1) < b.take(2)))
res3: List[java.lang.String] = List(a1.1, a1.102, b2.1, b2.2)
this indeed gives me (at least for this example) the desired ordering, which I do not understand either.
I would be very thankful if somebody could give me a hint about what exactly is going on there and how I can sort lists as I wish (with a more complex boolean expression than just comparing with < or >). A further question: the strings I am sorting (in my example) are actually keys of a HashMap m. Will any solution also work for sorting m by its keys, as in
m.toSeq.sortWith((a: (String, String), b: (String, String)) => a._1.drop(1).toDouble < b._1.drop(1).toDouble && a._1.take(1) < b._1.take(1))
Many thanks in advance!

Update: I misread your example—you want a1.2 to precede a1.102, which the toDouble versions below won't get right. I'd suggest the following instead:
items.sortBy { s =>
val Array(x, y) = s.tail.split('.')
(s.head, x.toInt, y.toInt)
}
Here we use Scala's Ordering instance for Tuple3[Char, Int, Int].
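A small sketch of why the integer-based key matters, using hypothetical sample identifiers (sampleIds, idKey, byIntParts and byDouble are names made up for illustration). It assumes every identifier has exactly the form letter, integer, dot, integer:

```scala
// Hypothetical sample identifiers, chosen to expose the difference.
val sampleIds = List("a1.102", "b2.2", "a1.2", "b2.1", "a1.1")

// Key: (letter, number before the dot, number after the dot).
def idKey(s: String): (Char, Int, Int) =
  s.tail.split('.') match {
    case Array(major, minor) => (s.head, major.toInt, minor.toInt)
    case _ => throw new IllegalArgumentException(s"unexpected identifier: $s")
  }

// Compares the two numbers as integers, so 2 comes before 102.
val byIntParts = sampleIds.sortBy(idKey)
// List(a1.1, a1.2, a1.102, b2.1, b2.2)

// Compares the tail as a Double, so 1.102 comes before 1.2.
val byDouble = sampleIds.sortWith { (a, b) =>
  a.head < b.head || (a.head == b.head && a.tail.toDouble < b.tail.toDouble)
}
// List(a1.1, a1.102, a1.2, b2.1, b2.2)
```

The two results differ only in where a1.2 lands, which is exactly the case the update is about.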
It looks like you have a typo in your second ("correct") version: b.take(2) doesn't make sense and should be b.take(1) to match the first. Once you fix that, you get the same (incorrect) ordering.
The real problem is that you only need the second condition in the case where the first characters match. So the following works as desired:
val items = List("a1.102", "b2.2", "b2.1", "a1.1")
items.sortWith((a, b) =>
a.head < b.head || (a.head == b.head && a.tail.toDouble < b.tail.toDouble)
)
I'd actually suggest the following, though:
items.sortBy(s => s.head -> s.tail.toDouble)
Here we take advantage of the fact that Scala provides an appropriate Ordering instance for Tuple2[Char, Double], so we can just provide a transformation function that turns your items into that type.
And to answer your last question: yes, either of these approaches should work just fine with your Map example.
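As a rough sketch of the Map case, assuming an immutable Map with made-up values stands in for the HashMap m from the question (precedes and sortedEntries are hypothetical names):

```scala
// Hypothetical stand-in for the HashMap m from the question.
val m = Map("a1.102" -> "x", "b2.2" -> "y", "b2.1" -> "z", "a1.1" -> "w")

// Same rule as the sortWith version above, written once for plain keys.
def precedes(a: String, b: String): Boolean =
  a.head < b.head || (a.head == b.head && a.tail.toDouble < b.tail.toDouble)

// A Map has no order of its own, so convert it to a Seq of pairs and
// sort that by the key (the first component of each pair).
val sortedEntries: Seq[(String, String)] =
  m.toSeq.sortWith((p, q) => precedes(p._1, q._1))
// (a1.1,w), (a1.102,x), (b2.1,z), (b2.2,y)
```

The result is a sequence of key/value pairs rather than a Map, which matches the m.toSeq approach in the question.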

Create a tuple containing the string before the "." and then the integer after the ".". This will use a lexicographic order for the first part and an order on the integer for the second part.
scala> val order = Ordering.by((s:String) => (s.split("\\.")(0),s.split("\\.")(1).toInt))
order: scala.math.Ordering[String] = scala.math.Ordering$$anon$7@384eb259
scala> res2
res8: List[java.lang.String] = List(a1.5, a2.2, b1.11, b1.8, a1.10)
scala> res2.sorted(order)
res7: List[java.lang.String] = List(a1.5, a1.10, a2.2, b1.8, b1.11)

So consider what happens when your sorting function is passed a="a1.1" and b="a1.102".
What you'd like is for the function to return true. However, a.take(1) < b.take(1) returns false, so the function returns false.
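To make this concrete, here is a minimal sketch that evaluates the comparison from the question in both directions (bothSmaller, forward, backward and sortedAsGiven are names chosen for illustration):

```scala
// The comparison from the question: true only if BOTH parts are smaller.
def bothSmaller(a: String, b: String): Boolean =
  a.take(1) < b.take(1) && a.drop(1).toDouble < b.drop(1).toDouble

// Both directions are false because the prefixes are equal ("a" < "a" is false).
val forward = bothSmaller("a1.1", "a1.102")
val backward = bothSmaller("a1.102", "a1.1")

// Neither element counts as "less than" the other, so the sort has no
// reason to swap them and the pair comes back in its input order.
val sortedAsGiven = List("a1.102", "a1.1").sortWith(bothSmaller)
```

Since the function never says which of two same-prefix identifiers comes first, sortWith cannot put them in the intended order.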
Think about your cases a bit more carefully:
if the prefixes are equal, then the arguments are ordered properly only if the tails are.
if the prefixes are not equal, then the arguments are ordered properly only if the prefixes are.
So try this instead:
(a: String, b: String) => if (a.take(1) == b.take(1)) a.drop(1).toDouble < b.drop(1).toDouble else a.take(1) < b.take(1)
And that returns the proper ordering:
scala> List("a1.102", "b2.2", "b2.1", "a1.1").sortWith((a: String, b: String) => if (a.take(1) == b.take(1)) a.drop(1).toDouble < b.drop(1).toDouble else a.take(1) < b.take(1))
res8: List[java.lang.String] = List(a1.1, a1.102, b2.1, b2.2)
The reason it worked for you with the reversed ordering was luck. Consider the extra input "c0" to see what was happening:
scala> List("c0", "a1.102", "b2.2", "b2.1", "a1.1").sortWith((a: String, b: String) => (a.drop(1).toDouble < b.drop(1).toDouble && a.take(1) < b.take(2)))
res1: List[java.lang.String] = List(c0, a1.1, a1.102, b2.1, b2.2)
The reversed function sorts on the numeric part of the string first, then on the prefix. It just so happens that the numeric ordering you gave also preserved the prefix ordering, but that won't always be the case.

Related

Scala: checking whether filter loops through all elements of a vector when joining two vectors

I have 2 vectors as below.
val vecBase21=....sortBy(r=>(r._1,r._2))
vecBase21: scala.collection.immutable.Vector[(String, String, Double)] = Vector((036,20210624 0400,2.0), (036,20210624 0405,2.0), (036,20210624 0410,2.0), (036,20210624 0415,2.0), (036,20210624 0420,2.0),...)
val vecBase22=....sortBy(r=>(r._1,r._2))
vecBase22: scala.collection.immutable.Vector[(String, String, Double)] = Vector((036,20210625 0400,2.0), (036,20210625 0405,2.0), (036,20210625 0410,2.0), (036,20210625 0415,2.0), (036,20210625 0420,2.0),...)
Inside, x._1 is the ID, x._2 is the date time, and x._3 is the value. Then I did this to create a third vector as follows.
val vecBase30=vecBase21.map(x=>vecBase22.filter(y=>x._1==y._1 && x._2==y._2).map(y=>(x._1,x._2,x._3,y._3))).flatten
This is literally a join in SQL, a join b on a.id=b.id and a.date_time=b.date_time. It loops in vecBase22 to search one combination of ID and date_time from vecBase21. As each combination is unique in one vector and they are sorted, I want to find out whether the loop in vecBase22 stops once it finds a match or it loops till the end of vecBase22 anyway. I tried this
val vecBase30=vecBase21.map(x=>vecBase22.filter(y=>x._1==y._1 && x._2==y._2).map{y=>
println("x1="+x._1+" y1="+y._1+" x2="+x._2+" y2="+y._2)
(x._1,x._2,x._3,y._3)}).flatten
But it apparently prints only the matched results. Is there a way to print every combination from the two vectors that is evaluated for a match?
As each combination is unique in one vector and they are sorted, I want to find out whether the loop in vecBase22 stops once it finds a match or it loops till the end of vecBase22 anyway
When you call filter on vecBase22 you loop through every element of that collection to see if it matches the predicate. This returns a new collection and passes it to the map function. If you want to short-circuit the filtering process you could consider using the method collectFirst (Scala 2.12):
def collectFirst[B](pf: PartialFunction[A, B]): Option[B]
Finds the first element of the traversable or iterator for which the given partial function is defined, and applies the partial function to it.
Note: may not terminate for infinite-sized collections.
Note: might return different results for different runs, unless the underlying collection type is ordered.
pf: the partial function
returns: an option value containing pf applied to the first value for which it is defined, or None if none exists.
Example:
Seq("a", 1, 5L).collectFirst({ case x: Int => x*10 }) = Some(10)
So you could do something like:
val vecBase30: Vector[(String, String, Double, Double)] = vecBase21
.flatMap(x => vecBase22.collectFirst {
case matched: (String, String, Double) if x._1 == matched._1 && x._2 == matched._2 => (x._1, x._2, x._3, matched._3)
})
First off: yes, it loops through all items of vecBase22 for each item of vecBase21. That's what map and filter do.
If the println doesn't work, it is probably because you are executing your code in an interpreter that loses the standard output. Some notebook, maybe?
Also, if you want it to stop once it finds a match, use Seq.find.
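One way to see the difference is to count how often the predicate runs. The sketch below uses two tiny hypothetical vectors (left and right, with made-up values) in place of vecBase21 and vecBase22, and counters that are purely illustrative:

```scala
// Hypothetical miniature versions of vecBase21 and vecBase22.
val left = Vector(("036", "20210624 0400", 2.0), ("036", "20210624 0405", 2.0))
val right = Vector(
  ("036", "20210624 0400", 3.0),
  ("036", "20210624 0405", 4.0),
  ("036", "20210624 0410", 5.0)
)

// filter tests every element of `right` for every element of `left`.
var filterChecks = 0
val viaFilter = left.flatMap { x =>
  right.filter { y =>
    filterChecks += 1
    x._1 == y._1 && x._2 == y._2
  }.map(y => (x._1, x._2, x._3, y._3))
}

// find stops at the first element that satisfies the predicate.
var findChecks = 0
val viaFind = left.flatMap { x =>
  right.find { y =>
    findChecks += 1
    x._1 == y._1 && x._2 == y._2
  }.map(y => (x._1, x._2, x._3, y._3))
}
// Here filterChecks ends at 6 (2 * 3), while findChecks ends at 3 (1 + 2).
```

Both versions produce the same joined rows here only because each (ID, date time) pair is unique within a vector; with duplicates, find would keep just the first match.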
Finally, you can improve readability. Here are a couple of ideas:
use a case class instead of a tuple
add spaces around operators
add a new line before each monad operation if it doesn't fit on one line
use flatMap instead of map followed by flatten
add val types (not necessary, but it helps when reading the code)
That gives:
case class Item(id: String, time: String, value: Double)
case class Joint(id: String, time: String, v1: Double, v2: Double)
val vecBase21: Vector[Item] = ....sortBy(item => (item.id, item.time))
val vecBase22: Vector[Item] = ....sortBy(item => (item.id, item.time))
val vecBase30: Vector[Joint] = vecBase21.flatMap( x =>
vecBase22
.filter( y => x.id == y.id && x.time == y.time)
.map( y => Joint(x.id, x.time, x.value, y.value))
)

How to search efficiently in a nested collection in a functional way

I'd like to find the indices (coordinates) of the first element whose value is 4, in a nested Vector of Int, in a functional way.
val a = Vector(Vector(1,2,3), Vector(4,5), Vector(3,8,4))
a.map(_.zipWithIndex).zipWithIndex.collect{
case (col, i) =>
col.collectFirst {
case (num, index) if num == 4 =>
(i, index)
}
}.collectFirst {
case Some(x) ⇒ x
}
It returns:
Some((1, 0))
the coordinates of the first occurrence of 4.
This solution is quite simple, but it has a performance penalty, because the nested col.collectFirst is performed for all the elements of the top Vector, when we are only interested in the first match.
One possible solution is to write a guard in the pattern matching. But I don't know how to write a guard based on a slow condition and return something that has already been calculated in the guard.
Can it be done better?
Recursive maybe?
If you insist on using Vectors, something like this will work (for a non-indexed seq, you'd need a different approach):
import scala.annotation.tailrec
@tailrec
def findit(
what: Int,
lists: IndexedSeq[IndexedSeq[Int]],
i: Int = 0,
j: Int = 0
): Option[(Int, Int)] =
if(i >= lists.length) None
else if(j >= lists(i).length) findit(what, lists, i+1, 0)
else if(lists(i)(j) == what) Some((i,j))
else findit(what, lists, i, j+1)
A simple thing you can do without changing the algorithm is to use Scala streams to be able to exit as soon as you find the match. Streams are lazily evaluated as opposed to sequences.
Just make a change similar to this
a.map(_.zipWithIndex.toStream).zipWithIndex.toStream.collect{ ...
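The same early-exit idea can be sketched with an Iterator instead of a Stream; note that this swaps in a different lazy collection than the one suggested above. The helper firstIndexOf and the value nestedInts are hypothetical names:

```scala
val nestedInts = Vector(Vector(1, 2, 3), Vector(4, 5), Vector(3, 8, 4))

// Iterator.map is lazy, so a row is only scanned when find asks for it,
// and find stops at the first row whose indexOf result is non-negative.
def firstIndexOf(target: Int, rows: Vector[Vector[Int]]): Option[(Int, Int)] =
  rows.iterator.zipWithIndex
    .map { case (row, i) => (i, row.indexOf(target)) }
    .find { case (_, j) => j >= 0 }

val firstFour = firstIndexOf(4, nestedInts) // Some((1, 0))
val noNine = firstIndexOf(9, nestedInts)    // None
```

Rows after the first match are never inspected, because nothing pulls them from the iterator.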
In terms of algorithmic changes, if you can somehow have your data sorted (even before you start to search) then you can use Binary search instead of looking at each element.
import scala.collection.Searching._
val dummy = 123
implicit val anOrdering = new Ordering[(Int, Int, Int)]{
override def compare(x: (Int, Int, Int), y: (Int, Int, Int)): Int = Integer.compare(x._1, y._1)
}
val seqOfIntsWithPosition = a.zipWithIndex.flatMap(vectorWithIndex => vectorWithIndex._1.zipWithIndex.map(intWithIndex => (intWithIndex._1, vectorWithIndex._2, intWithIndex._2)))
val sorted: IndexedSeq[(Int, Int, Int)] = seqOfIntsWithPosition.sortBy(_._1)
val element = sorted.search((4, dummy, dummy))
This code is not very pretty or readable; I just wanted to quickly show an example of how it could be done.
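A possible way to read the result of search, sketched with hypothetical names (nestedValues, flat, sortedFlat, byValueOnly, hit). One caveat that I would flag as an assumption to verify for your use case: when the value occurs more than once, binary search returns some matching element, not necessarily the first occurrence in the original nested Vector.

```scala
import scala.collection.Searching._

val nestedValues = Vector(Vector(1, 2, 3), Vector(4, 5), Vector(3, 8, 4))

// Flatten to (value, outer index, inner index) triples.
val flat: Vector[(Int, Int, Int)] =
  nestedValues.zipWithIndex.flatMap { case (row, i) =>
    row.zipWithIndex.map { case (v, j) => (v, i, j) }
  }

val sortedFlat: Vector[(Int, Int, Int)] = flat.sortBy(_._1)

// Compare triples by their value only; the positions are ignored.
val byValueOnly: Ordering[(Int, Int, Int)] =
  Ordering.by[(Int, Int, Int), Int](_._1)

// Found carries the index into sortedFlat; InsertionPoint means "absent".
val hit: Option[(Int, Int)] =
  sortedFlat.search((4, 0, 0))(byValueOnly) match {
    case Found(idx)        => Some((sortedFlat(idx)._2, sortedFlat(idx)._3))
    case InsertionPoint(_) => None
  }
```

If the first occurrence is required, the linear approaches above are the safer choice.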

Scala comparison error

I am trying to compare an item from a List of type Strings to an integer. I tried doing this but I get an error saying that:
'value < is not a member of List[Int]'
The line of code that compares is something similar to this:
if(csvList.map(x => x(0).toInt) < someInteger)
Apart from the question of why this happens, I wondered why I didn't get an error when I used a different type of comparison, such as ' == '.
So if I run the line:
if( csvList.map(x => x(0).toInt) == someInteger)
I don't get an error. Why is that?
Let's start with some background before answering the questions.
Using the REPL, you can understand a bit more about what you are doing:
scala> List("1", "2", "3", "33").map(x => x(0).toInt)
res1: List[Int] = List(49, 50, 51, 51)
The map function is used to transform every element, so x inside the map will be "1" the first time, "2" the second, and so on.
When you are using x(0) you are accessing the first character in the String.
scala> "Hello"(0)
res2: Char = H
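A short sketch of the difference between a character's code and its numeric value, using a made-up input string (the value names are illustrative):

```scala
// x(0) yields a Char; toInt on a Char gives its character code.
val firstChar: Char = "33"(0)
val charCode: Int = firstChar.toInt   // 51, the code of the character '3'

// asDigit gives the digit the character represents.
val digit: Int = firstChar.asDigit    // 3

// toInt on the whole String parses the number.
val parsed: Int = "33".toInt          // 33
```

This is why the mapped list above contains 49, 50, 51, 51 rather than 1, 2, 3, 33.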
As you can see, the type after you have mapped your strings is a List of Int. You can compare that with an Int, but the two will never be equal.
scala> List(1, 2, 3) == 5
res0: Boolean = false
This is very much like in Java when you try
"Hello".equals(new Integer(1));
If you want to know more about the reasons behind the equality problem you can check out Why has Scala no type-safe equals method?
Last but not least, you get an error when using less than because there is no less than in the List class.
Extra:
If you want to know whether the second element in the list is greater than 5, you can do
scala> val data = List("1", "10", "20")
data: List[String] = List(1, 10, 20)
scala> 5 < data(1).toInt
res2: Boolean = true
That is a bit strange, though; maybe you should transform the list of strings into something more strongly typed, like a case class, and then do your business logic with a clearer data model.
You can refer to
Why == operator and equals() behave differently for values of AnyVal in Scala
Every class supports the == operator, but may not support the < and > operators.
In your code,
csvList.map(x => x(0).toInt)
returns a List[Int], and the application compares it with an Int.
No implicit conversion is involved: == accepts an argument of any type, so the compiler doesn't report an error. Generally, it's not good to compare values of different types.
csvList.map(x => x(0).toInt) converts the entire csvList to a List[Int] and then tries to apply the operator < to List[Int] and someInteger which does not exist. This is essentially what the error message is saying.
There is no error for == since this operator exists for List though List[T] == Int will always return false.
Perhaps what you are trying to do is compare each item of the List to an Int. If that is the case, something like this would do:
scala> List("1","2","3").map(x => x.toInt < 2)
res18: List[Boolean] = List(true, false, false)
The piece of code csvList.map(x => x(0).toInt) actually returns a List[Int], which is not comparable with an integer (not sure what it would mean to say that List(1,2) < 3).
If you want to compare each element of the list to your number, making sure they are all less than it, you would actually write if(csvList.map(x => x.toInt).forall { _ < someInteger })
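The element-wise options mentioned in these answers can be put side by side in a small sketch. The input list and threshold are hypothetical, and the strings are assumed to be valid integers:

```scala
// Hypothetical data standing in for one column of csvList.
val csvValues = List("1", "10", "20")
val threshold = 15

val asInts: List[Int] = csvValues.map(_.toInt)

// One Boolean per element.
val eachBelow: List[Boolean] = asInts.map(_ < threshold) // List(true, true, false)

// A single Boolean: true only if every element passes.
val allBelow: Boolean = asInts.forall(_ < threshold)     // false

// A single Boolean: true if at least one element passes.
val anyBelow: Boolean = asInts.exists(_ < threshold)     // true
```

Which of the three fits depends on what the original if was meant to check.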

Why does Scala's indexOf (in List etc) return Int instead of Option[Int]?

I want to write really nice looking idiomatic Scala code, list indexOf foo getOrElse Int.MaxValue, but now I have to settle for idiotic Java looking code, val result = list indexOf foo; if (result < 0) Int.MaxValue else result. Is there a good reason that indexOf in Scala returns Int instead of Option[Int]?
It's for compatibility with Java and for speed. You'd have to double-box the answer (first in java.lang.Integer then Option) instead of just returning a negative number. This can take on the order of ten times longer.
You can always write something that will convert negative numbers to None and non-negative to Some(n) if it bothers you:
implicit class OptionOutNegatives(val underlying: Int) extends AnyVal {
def asIndex = if (underlying < 0) None else Some(underlying)
}
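A plain-function variant of the same idea, in case an implicit class is not wanted; indexOfOption is a hypothetical helper, not a standard library method:

```scala
// Wraps indexOf: a negative result becomes None, anything else Some(index).
def indexOfOption[A](xs: Seq[A], elem: A): Option[Int] = {
  val i = xs.indexOf(elem)
  if (i < 0) None else Some(i)
}

val letters = List("a", "b", "c")

val foundAt = indexOfOption(letters, "c").getOrElse(Int.MaxValue)   // 2
val missingAt = indexOfOption(letters, "z").getOrElse(Int.MaxValue) // Int.MaxValue
```

This gives the getOrElse style from the question at the cost of allocating an Option per lookup.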
It doesn't have to be like that.
scala> "abcde" index 'c'
res0: psp.std.Index = 2
scala> "abcde" index 'z'
res1: psp.std.Index = -1
scala> "abcde" index 'z' match { case Index(n) => n ; case _ => MaxInt }
res2: Int = 2147483647
// Emphasizing that at the bytecode level we still return an Int - no boxing.
scala> :javap psp.std.SeqLikeExtensionOps
[...]
public abstract int index(A);
descriptor: (Ljava/lang/Object;)I
That's from psp-std, you can run "sbt console" and then the above.
To address the secondary question, squeezed between the primary and tertiary ones, there are other methods for doing things like processing indices and finding or collecting elements that satisfy a predicate.
scala> ('a' to 'z').zipWithIndex find (_._1 == 'k') map (_._2)
res6: Option[Int] = Some(10)
Usually you're doing something interesting with the element you find.

Scala - finding a specific tuple in a list

Let's say we have this list of tuples:
val data = List(('a', List(1, 0)), ('b', List(1, 1)), ('c', List(0)))
The list has this signature:
List[(Char, List[Int])]
My task is to get the "List[Int]" element from a tuple inside "data" whose key is, for instance, letter "b". If I implement a method like "findIntList(data, 'b')", then I expect List(1, 1) as a result. I have tried the following approaches:
Approach 1: data.foreach { elem => if (elem._1 == char) return elem._2 }
Approach 2: data.find(x=> x._1 == ch)
Approach 3: for (elem <- data) yield elem match {case (x, y: List[Bit]) => if (x == char) y}
Approach 4: for (x <- data) yield if (x._1 == char) x._2
With all the approaches (except Approach 1, where I employ an explicit "return"), I get either a List[Option] or List[Any] and I don't know how to extract the "List[Int]" out of it.
One of many ways:
data.toMap.get('b').get
toMap converts a list of 2-tuples into a Map from the first element of the tuples to the second. get gives you the value for the given key and returns an Option, thus you need another get to actually get the list.
Or you can use:
data.find(_._1 == 'b').get._2
Note: Only use get on Option when you can guarantee that you'll have a Some and not a None. See http://www.scala-lang.org/api/current/index.html#scala.Option for how to use Option idiomatically.
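If the key might be missing, one way to avoid get is to keep the Option and choose a default at the call site. This is only a sketch; findIntList follows the name used in the question, while pairs, hit and miss are made up:

```scala
val pairs = List(('a', List(1, 0)), ('b', List(1, 1)), ('c', List(0)))

// collectFirst returns the value of the first pair whose key matches,
// wrapped in Some, or None when no pair matches.
def findIntList(data: List[(Char, List[Int])], key: Char): Option[List[Int]] =
  data.collectFirst { case (k, v) if k == key => v }

val hit = findIntList(pairs, 'b')                 // Some(List(1, 1))
val miss = findIntList(pairs, 'z').getOrElse(Nil) // empty list as a default
```

Whether an empty list is a sensible default depends on the caller, so returning the Option leaves that decision open.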
Update: Explanation of the result types you see with your different approaches
Approach 2: find returns an Option[(Char, List[Int])] because it cannot guarantee that a matching element will be found.
Approach 3: here you basically do a map, i.e. you apply a function to each element of your collection. For the element you are looking for, the function returns your List[Int]; for all other elements it returns the value (), which is the Unit value, roughly equivalent to void in Java, but an actual type. Since the only common supertype of List[Int] and Unit is Any, you get a List[Any] as the result.
Approach 4 is basically the same as Approach 3.
Another way is
data.toMap.apply('b')
Or with one intermediate step this is even nicer:
val m = data.toMap
m('b')
where apply is used implicitly, i.e., the last line is equivalent to
m.apply('b')
There are multiple ways of doing it. One more way:
scala> def listInt(ls:List[(Char, List[Int])],ch:Char) = ls filter (a => a._1 == ch) match {
| case Nil => List[Int]()
| case x ::xs => x._2
| }
listInt: (ls: List[(Char, List[Int])], ch: Char)List[Int]
scala> listInt(data, 'b')
res66: List[Int] = List(1, 1)
When you are sure the character exists, you can try something like this, simply adding type information:
val char = 'b'
data.collect{case (x,y:List[Int]) if x == char => y}.head
or use headOption if you're not sure the character exists
data.collect{case (x,y:List[Int]) if x == char => y}.headOption
You can also solve this using pattern matching. Keep in mind that you need to make it recursive, though. The solution should look something like this:
def findTupleValue(tupleList: List[(Char, List[Int])], char: Char): List[Int] = tupleList match {
// key not found: return an empty list instead of failing with a MatchError
case Nil => Nil
case (k, list) :: _ if char == k => list
case _ :: theRest => findTupleValue(theRest, char)
}
This walks your tuple list recursively. It checks whether the head element matches your condition (the key you are looking for) and, if so, returns its list; otherwise it continues with the remainder of the list.