Scala Pattern matching leading to two outputs - scala

I am parsing a text file whose rows have the columns [String Double Double Double Double]. For each row of the file I would like to obtain two rows, [String Double Double] and [String Double Double], where the String label is the same in both: the first row holds the first two doubles and the second row holds the last two.
I am using the following which is not working:
val out = Source.fromFile(filename).getLines.collect(_.split("\\s+").toList match {
case s1 :: points1 :: points2 => (s1,"4",Point(points1.map(_.toDouble).toIndexedSeq))
=> (s1,"6",Point(points2.map(_.toDouble).toIndexedSeq))
My doubles are co-ordinates of points.

First of all, points1 matches only your second column, and points2 matches all the remaining columns as a list.
That is because in the :: pattern the left side is the first element of the list (head) and the right side is the remaining sublist (tail).
It is easier to decompose the row into a list of all its columns, like this:
... match {
case s1 :: p1x :: p1y :: p2x :: p2y :: Nil =>
Then you can recompose it into two rows, placed in a two-element list:
=> List( (s1,"4",Point(Vector(p1x,p1y).map(_.toDouble))),
(s1,"6",Point(Vector(p2x,p2y).map(_.toDouble))) )
But then the result is a collection of lists (one two-element List per line), so you need to flatten it. The simplest way is to use flatMap instead of collect.
So your full code will look like this:
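import scala.io.Source
val out = Source.fromFile(filename).getLines.flatMap(_.split("\\s+").toList match {
  // exactly five columns: a label followed by two (x, y) pairs
  case s1 :: p1x :: p1y :: p2x :: p2y :: Nil =>
    List( (s1,"4",Point(Vector(p1x,p1y).map(_.toDouble))),
          (s1,"6",Point(Vector(p2x,p2y).map(_.toDouble))) )
  // any other row shape yields no output instead of a MatchError
  case _ => Nil
})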
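The same idea can be tried without a file. The sketch below is only an illustration: Point is a hypothetical stand-in for the asker's own type, and the sample lines replace Source.fromFile(filename).getLines.

```scala
// Hypothetical stand-in for the asker's Point type.
final case class Point(coords: IndexedSeq[Double])

object SplitRows {
  // Turn one whitespace-separated line into zero or two labelled rows.
  def splitRow(line: String): List[(String, String, Point)] =
    line.trim.split("\\s+").toList match {
      case label :: p1x :: p1y :: p2x :: p2y :: Nil =>
        List(
          (label, "4", Point(Vector(p1x, p1y).map(_.toDouble))),
          (label, "6", Point(Vector(p2x, p2y).map(_.toDouble)))
        )
      case _ => Nil // skip lines that do not have exactly five columns
    }

  // In-memory sample standing in for the lines of the file (assumption).
  val sample: List[String] = List("a 1.0 2.0 3.0 4.0", "malformed line")

  // flatMap flattens the per-line lists into one list of rows.
  val out: List[(String, String, Point)] = sample.flatMap(splitRow)
}
```

Note that _.toDouble still throws a NumberFormatException if a five-column line contains a non-numeric value; the fallback case only guards against the wrong number of columns.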

Related

calculate cosine similarity in scala

I have a file (tags.csv) that contains UserId, MovieId, tags. I want to use a domain-based method to calculate the cosine similarity between tags. I want to show only the tags relevant to comedy and measure the similarity of each of those tags to the comedy tag.
My code is:
val rows = sc.textFile("/usr/local/comedy")
val vecData = rows.map(line => Vectors.dense(line.split(", ").map(_.toDouble)))
val mat = new RowMatrix(vecData)
val exact = mat.columnSimilarities()
val approx = mat.columnSimilarities(0.07)
val exactEntries = exact.entries.map { case MatrixEntry(i, j, u) => ((i, j), u) }
val approxEntries = approx.entries.map { case MatrixEntry(i, j, v) => ((i, j), v) }
val MAE = exactEntries.leftOuterJoin(approxEntries).values.map {
case (u, Some(v)) =>
math.abs(u - v)
case (u, None) =>
math.abs(u)
}.mean()
but this error appears:
java.lang.NumberFormatException: For input string: "[1,898,"black comedy"]"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
What's wrong?
The error message is full of pertinent info.
NumberFormatException: For input string: "[1,898,"black comedy"]"
It looks like the input String isn't being split into separate column data. So .split(", ") isn't doing its job, and it's easy to see why: there are no comma-space sequences to split on.
We could take out the space and split on just the comma but that would still leave a non-digit [ in the 1st column data and the 3rd column data has no digit characters at all.
There are a few different ways to attack this. I'd be tempted to use a regex parser.
val twoNums = "(\\d+),(\\d+),".r.unanchored
val vecData = rows.collect{ case twoNums(a, b) =>
Vectors.dense(Array(a.toDouble, b.toDouble))
}
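The regex can be checked on a plain String before running it on the RDD. This is a minimal sketch outside Spark; the object and method names are made up for illustration.

```scala
object ParseTagLine {
  // Unanchored: the pattern may match anywhere inside the line,
  // so the leading '[' and the trailing tag text are simply skipped.
  private val twoNums = "(\\d+),(\\d+),".r.unanchored

  // Returns the first two numeric fields, or None if the line has no such pair.
  def parse(line: String): Option[(Double, Double)] = line match {
    case twoNums(a, b) => Some((a.toDouble, b.toDouble))
    case _             => None
  }
}
```

Using collect with a partial function, as in the answer above, has the same effect as the None branch here: lines that do not match are dropped rather than causing an exception.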

Second Element of a List

From the Book programming in Scala I got the following line of code:
val second: List[ Int] => Int = { case x :: y :: _ => y }
//warning: match may not be exhaustive.
It states that this function will return the second element of a list of integers if the list is not empty or Nil. Still, this part is a bit awkward to me:
case x :: y :: _
How exactly does this work? Does this match any list with at least 2 elements and then return the second? If so, can somebody still explain the syntax? I understood that :: is invoked on the right operand. So it could be written as
(_.::(y)).::(x)
Still, I then don't get why this would return 2
val second: List[ Int] => Int = { case x :: y :: _ => y }
var x = List(1,2)
second(x) //returns 2
In the REPL, you can type:
scala> val list = "a" :: "b" :: Nil
list: List[String] = List(a, b)
which is read from right to left: take the end of a List (Nil), prepend the String "b", and to this list ("b" :: Nil) prepend the String "a", giving "a" :: ("b" :: Nil). You don't need the parentheses, so it can be written "a" :: "b" :: Nil.
In pattern matching you will more often see:
... list match {
case Nil => // ...
case x :: xs => // ...
}
to distinguish between the empty and the nonempty list. Here xs is the rest of the list, which may itself be Nil: if the whole list is "b" :: Nil, for example, then x = "b" and xs = Nil.
But if the list is "a" :: "b" :: Nil, then x = "a" and xs = "b" :: Nil.
In your example, the deconstruction goes just one step further, and instead of a name like xs, the wildcard _ is used, indicating that the value is not needed and gets no name.
The value second is of function type, it takes List[Int] and returns Int.
If the list has first element ("x"), and a second element ("y"), and whatever comes next (we don't care about it), we simply return the element "y" (which is the second element of the list).
In any other case, the function is not defined. You can check that:
scala> val second: PartialFunction[List[Int], Int] = {
| case x :: y :: _ => y
| }
second: PartialFunction[List[Int],Int] = <function1>
scala> second.isDefinedAt(List(1,2,3))
res18: Boolean = true
scala> second.isDefinedAt(List(1,2))
res19: Boolean = true
scala> second.isDefinedAt(List(0))
res20: Boolean = false
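Instead of calling isDefinedAt by hand, a PartialFunction can be turned into a total function with lift. A small sketch (the object name is only for illustration):

```scala
object SecondLift {
  // Defined only for lists with at least two elements.
  val second: PartialFunction[List[Int], Int] = {
    case _ :: y :: _ => y
  }

  // lift wraps the result in Some where the function is defined
  // and returns None everywhere else.
  val secondOption: List[Int] => Option[Int] = second.lift
}
```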
First of all, when you think about pattern matching, you should think about matching a structure.
The first part of the case statement describes a structure. This structure may bind one or more variables that are useful for deriving your result.
In your example, you are interested in deriving the second element of a list. A shorthand to build a list in Scala is the :: method (also called cons). :: can also be used to describe a structure in a case statement. Here you shouldn't think about evaluating the :: method in the first part of the case, which may be why you mention evaluating _.::(y).::(x). The :: cons operator helps us describe the structure of the list in terms of its elements: in this case, the first element (x), the second element (y), and the rest of it (the _ wildcard). We are interested in a list with at least 2 elements, where the rest can be anything, either Nil to indicate the end of the list or further elements, hence the wildcard.
The second part of the case statement uses the second element (y) to derive the result.
More on List and Consing
List in Scala is similar to a linked list. You know the first element, called the head, and the start of the rest of the list. When traversing the linked list you stop when the rest of the list is Nil. The :: cons operator helps us visualise the structure of the linked list, although when building a list the Scala compiler actually calls the :: methods from right to left, as you described with _.::(y).::(x).
As an aside, you might have already noticed the Scala compiler complaining that your match isn't exhaustive. This means that this second function would not work for a list of any size, because there isn't any case statement that describes a list with zero or one element. Also, as mentioned in the comments on previous answers, if you aren't interested in the first element you can describe it as a wildcard _.
case _ :: y :: _ => y
I hope this helped.
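To make the exhaustiveness point concrete, here is a hedged sketch comparing the function from the question with a variant that covers every list shape. The object and member names are illustrative only.

```scala
import scala.util.Try

object SecondTotal {
  // Non-exhaustive, as in the question: compiles with a warning
  // and throws a scala.MatchError on lists shorter than two elements.
  val second: List[Int] => Int = { case _ :: y :: _ => y }

  // Wrapping the call in Try shows the runtime failure without crashing.
  val failsOnShortList: Boolean = Try(second(List(1))).isFailure

  // Exhaustive variant: the fallback case covers Nil and one-element lists.
  def secondOption(xs: List[Int]): Option[Int] = xs match {
    case _ :: y :: _ => Some(y)
    case _           => None
  }
}
```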
If you look at the structure of a list in Scala, it is head :: tail: the first element is treated as the head and all remaining ones as the tail (Nil is the end of the tail). Whenever you write x :: y :: _, x matches the head of the list and the remainder is the tail; y then matches the head of that tail.
e.g.:
val l = List(1,2,3,4,5)
You can see this list in different ways:
1::2::3::4::5::Nil
1::List(2,3,4,5)
1::2::List(3,4,5)
and so on.
So try matching the pattern. In your question y will give the second element.
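The different views of the same list can be verified directly. A short sketch, with names chosen only for this example:

```scala
object ListShapes {
  val l = List(1, 2, 3, 4, 5)

  // All three expressions build the same list as l.
  val a = 1 :: 2 :: 3 :: 4 :: 5 :: Nil
  val b = 1 :: List(2, 3, 4, 5)
  val c = 1 :: 2 :: List(3, 4, 5)

  // The pattern mirrors the third form: x is the head,
  // y is the head of the tail, and tail is whatever follows y.
  val (first, second, rest) = l match {
    case x :: y :: tail => (x, y, tail)
    case _              => sys.error("list has fewer than two elements")
  }
}
```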

How to assign a name to intermediate pattern of a List?

This code doesn't compile. What am I doing wrong? Is it possible to do this?
How can I pattern match a list with at least 2 elements and have the pattern bind a variable to the tail (meaning y :: _)?
I know it's possible by desugaring the :: or with a simple if. But is it possible without desugaring and without an if?
val list:List[Int] = ...
list match {
case x :: tail@(y:: _) =>
}
See whether this code works for you:
list match {
case x :: (tail@(y :: _)) =>
}
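A runnable sketch of the parenthesised binder pattern, assuming the goal is to get the head, the second element and the whole tail in one match (the function name is hypothetical):

```scala
object BindTail {
  // x is the head, y the second element, and tail everything after x.
  // The @ binder names the sub-list that the inner pattern (y :: _) matches.
  def describe(list: List[Int]): Option[(Int, Int, List[Int])] = list match {
    case x :: (tail @ (y :: _)) => Some((x, y, tail))
    case _                      => None // fewer than two elements
  }
}
```

The extra parentheses matter because the operands of an infix pattern such as :: have to be simple patterns, and a binder of the form name @ pattern only counts as one once it is wrapped in parentheses.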
You can use another variable to hold the second element:
list match {
case x :: y :: _ =>
}
This will only match a list with at least two elements, will bind x to the first element, y to the second element and ignore the rest.
If you just need to ensure that the remainder of the list has at least one element, then
list match {
// here y is bound to the tail (a List), not to the second element
case x :: y if y.nonEmpty =>
}
will do the job.

How to read an element from a Scala HList?

There is very little readable documentation about HLists, and the answers I can find on SO seem to come from outer space to a humble Scala beginner.
I encountered HLists because Slick can auto-generate some to represent database rows. They are slick.collection.heterogeneous.HList (not shapeless's).
Example:
type MyRow = HCons[Int,HCons[String,HCons[Option[String],HCons[Int,HCons[String,HCons[Int,HCons[Int,HCons[Option[Int],HCons[Option[Float],HCons[Option[Float],HCons[Option[String],HCons[Option[String],HCons[Boolean,HCons[Option[String],HCons[Option[String],HCons[Option[String],HCons[Option[String],HCons[Option[String],HCons[Option[Int],HCons[Option[Float],HCons[Option[Float],HCons[Option[Float],HCons[Option[String],HCons[Option[String],HNil]]]]]]]]]]]]]]]]]]]]]]]]
def MyRow(a, b, c, ...): MyRow = a :: b :: c :: ... :: HNil
Now given one of these rows, I'd need to read one element, typed if possible. I just can't do that. I tried
row(4) // error
row._4 // error
row.toList // elements are inferred as Any
row match { case a :: b :: c :: x :: rest => x } // "Pattern type is incompatible. Expected MyRow."
row match { case MyRow(_,_,_,_,_,x,...) => x } // is not a case class like other rows
row match { HCons[Int,HCons[String,HCons[Option[String],HCons[Int,HCons[String, x]]]]] => x.head } // error
row.tail.tail.tail.tail.head // well, is that really the way??
Could somebody please explain how I can extract a specific value from that dinosaur?
I'd expect an indexed lookup like your row(4) to work, based on the HList API doc for apply. Here's an example I tried with Slick 3.1.1:
scala> import slick.collection.heterogeneous._
import slick.collection.heterogeneous._
scala> import slick.collection.heterogeneous.syntax._
import slick.collection.heterogeneous.syntax._
scala> type MyRow = Int :: String :: HNil
defined type alias MyRow
scala> val row: MyRow = 1 :: "a" :: HNil
row: MyRow = 1 :: a :: HNil
scala> row(0) + 99
res1: Int = 100
scala> val a: String = row(1)
a: String = a
Just one thing: if it is not too important, then just stick to HList as the type. Do not alias it to MyRow unless necessary.
So, you had
val row = a :: b :: c :: ... :: HNil
How about this?
val yourX = row match { case a :: b :: c :: x ::: rest => x }
Notice the ::: instead of :: at the end.
Or, how about this:
val yourX = row.tail.tail.tail.head
// this may change a little if you had,
def MyRow(a, b, c, ...): MyRow = a :: b :: c :: ... :: HNil
val row = MyRow(a, b, c, ...)
val yourX = row.asInstanceOf[HList].tail.tail.tail.head

Pre- and Append to a List

Using Scala, I am trying to concatenate multiple elements into a list as follows:
val min = func1()
val max = func1()
val interpolated : List[Float] = func2()
val res : List[Float] = (min.toFloat) :: interpolated :: (max.toFloat) :: Nil
This syntax does not work because of a type mismatch error. How can I prepend and append elements to a list in an elegant way, i.e., without using list buffers, etc.?
Btw, I also tried
val res : List[Float] = (min.toFloat) :: interpolated :: List(max.toFloat)
but got a type mismatch error (List[Any] vs List[Float]).
Peter Neyens's solution works fine.
Personally, I prefer this one:
min.toFloat +: interpolated :+ max.toFloat
+: and :+ are defined on Seq, so they work not only for List but for Vector too.
:: prepends a single element to a list. Because :: is right-associative, interpolated :: (max.toFloat :: Nil) prepends the whole interpolated list as one element of the list holding the maximum, which is why you get a List[Any]. You need ::: to concatenate these two lists, and then prepend min.toFloat with ::.
(min.toFloat) :: interpolated ::: ((max.toFloat) :: Nil)
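Both suggestions can be compared side by side. In this sketch the literal values are assumptions standing in for the results of func1() and func2().

```scala
object PrependAppend {
  // Stand-ins for func1() and func2() (assumed values, for illustration only).
  val min = 0.0f
  val max = 1.0f
  val interpolated: List[Float] = List(0.25f, 0.5f, 0.75f)

  // +: prepends an element and :+ appends one;
  // the colon always faces the collection.
  val viaSeqOps: List[Float] = min +: interpolated :+ max

  // :: prepends an element, ::: concatenates two lists.
  val viaCons: List[Float] = min :: interpolated ::: (max :: Nil)
}
```

Both forms produce the same List[Float]. Appending to an immutable List, with either :+ or :::, copies the left-hand list, so for repeated appends a Vector may be the better fit.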