Type mismatch in Scala case match

I am trying to create multiple DataFrames in a single foreach, using Spark, as below. When I try to print them, I do get the values delivery and click out of row.getAs("type").
val check = eachrec.foreach(recrd => recrd.map(row => {
  row.getAs("type") match {
    case "delivery" => val delivery_data = delivery(row.get(0).toString, row.get(1).toString)
    case "click" => val click_data = delivery(row.get(0).toString, row.get(1).toString)
    case _ => "not sure if this impacts"
  }
}))
but I am getting the error below:
Error:(41, 14) type mismatch; found : String("delivery") required: Nothing
case "delivery" => val delivery_data = delivery(row.get(0).toString,row.get(1).toString)
^
My plan is to create DataFrames using toDF() once I create these individual delivery objects referenced by delivery_data and click_data, via:
delivery_data.toDF() and click_data.toDF().
Please provide any clue regarding the error above (in the match case).
How can I create two DataFrames using toDF() inside val check?

The val declarations make your first two cases return Unit, but in the third case you return a String.
For instance, here the type of z was inferred by the compiler as Unit:
def x = {
  val z: Unit = 3 match {
    case 2 => val a = 2
    case _ => val b = 3
  }
}

I think you need to cast the value you match on to String:
row.getAs("type").toString


Distinguish union type with type parameter in Scala 3

In Scala 3, given such a type alias:
type Test[A] = A | Option[A]
val first: Test[String] = "gandalf"
val second: Test[String] = None
val third: Test[String] = Some("sam")
How can I distinguish which one of the two cases I have (A or Option[A])?
When trying to write code, I think the main problem is that, for the compiler, any type (including Either[String, B]) matches A. I don't even know if there is an idiomatic solution for my problem.
This is my best attempt; every other one failed:
def get[A](input: Test[A]): A =
  input match
    case Some(value) => value.asInstanceOf[A]
    case None => throw new RuntimeException("Error!")
    case _ => input.asInstanceOf[A]
get(first)  // "gandalf"
get(second) // Exception!!!
get(third)  // "sam"
As you can see, value is not recognized to be the same type as the type parameter A, and the A case at the bottom can't really be checked at runtime (trying case a: A => gives a compilation error), so I catch it by exclusion from the other two. This doesn't look like an optimal solution.
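One possible direction (a sketch I'm adding, not from the original thread): requiring a ClassTag makes case a: A compile and turns it into a runtime class check, though it remains subject to erasure for generic A such as List[Int]:
import scala.reflect.ClassTag

type Test[A] = A | Option[A]

// With a ClassTag in scope, `case a: A` becomes a runtime class check.
// Caveat: erasure still defeats this for generic A such as List[Int].
def get[A](input: Test[A])(using ClassTag[A]): A =
  input match
    case a: A        => a
    case Some(value) => value.asInstanceOf[A]
    case None        => throw new RuntimeException("Error!")

get("gandalf": Test[String])   // "gandalf"
get(Some("sam"): Test[String]) // "sam"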

Convert Seq[Try[Option[(String, Any)]]] into Try[Option[Map[String, Any]]]

How can I conveniently convert a Seq[Try[Option[(String, Any)]]] into a Try[Option[Map[String, Any]]]?
If any Try in the input is a Failure, the converted Try should be a Failure as well.
Assuming that the input type has a tuple inside the Option, this should give you the result you want:
val in: Seq[Try[Option[(String, Any)]]] = ???
val out: Try[Option[Map[String,Any]]] = Try(Some(in.flatMap(_.get).toMap))
If any of the Trys is a Failure, the outer Try will catch the exception raised by the get and return a Failure
The Some is there to give the correct return type
The get extracts the Option from the Try (or raises an exception)
Using flatMap rather than map removes the Option wrapper, keeping all Some values and discarding None values, giving Seq[(String, Any)]
The toMap call converts the Seq to a Map
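A quick check of both paths (my own illustrative example, not part of the original answer):
import scala.util.{Try, Success, Failure}

val ok: Seq[Try[Option[(String, Any)]]] =
  Seq(Success(Some("a" -> 1)), Success(None), Success(Some("b" -> 2)))
val bad: Seq[Try[Option[(String, Any)]]] =
  Seq(Success(Some("a" -> 1)), Failure(new Exception("boom")))

Try(Some(ok.flatMap(_.get).toMap))  // Success(Some(Map(a -> 1, b -> 2)))
Try(Some(bad.flatMap(_.get).toMap)) // Failure(java.lang.Exception: boom)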
Here is something that's not very clean but may help get you started. It assumes Option[(String,Any)], returns the first Failure if there are any in the input Seq and just drops None elements.
foo.scala
package foo
import scala.util.{Try,Success,Failure}

object foo {
  val x0 = Seq[Try[Option[(String, Any)]]]()
  val x1 = Seq[Try[Option[(String, Any)]]](Success(Some(("A",1))), Success(None))
  val x2 = Seq[Try[Option[(String, Any)]]](Success(Some(("A",1))), Success(Some(("B","two"))))
  val x3 = Seq[Try[Option[(String, Any)]]](Success(Some(("A",1))), Success(Some(("B","two"))), Failure(new Exception("bad")))

  def f(x: Seq[Try[Option[(String, Any)]]]) =
    x.find( _.isFailure ).getOrElse( Success(Some(x.map( _.get ).filterNot( _.isEmpty ).map( _.get ).toMap)) )
}
Example session
bash-3.2$ scalac foo.scala
bash-3.2$ scala -classpath .
Welcome to Scala 2.13.1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_66).
Type in expressions for evaluation. Or try :help.
scala> import foo.foo._
import foo.foo._
scala> f(x0)
res0: scala.util.Try[Option[Equals]] = Success(Some(Map()))
scala> f(x1)
res1: scala.util.Try[Option[Equals]] = Success(Some(Map(A -> 1)))
scala> f(x2)
res2: scala.util.Try[Option[Equals]] = Success(Some(Map(A -> 1, B -> two)))
scala> f(x3)
res3: scala.util.Try[Option[Equals]] = Failure(java.lang.Exception: bad)
scala> :quit
If you're willing to use a functional support library like Cats then there are two tricks that can help this along:
Many things like List and Try are traversable, which means that (if Cats's implicits are in scope) they have a sequence method that can swap two types, for example converting List[Try[T]] to Try[List[T]] (failing if any of the items in the list is a Failure).
Almost all of the container types support a map method that can operate on the contents of a container, so if you have a function from A to B then map can convert a Try[A] to a Try[B]. (In Cats language they are functors but the container-like types in the standard library generally have map already.)
Cats doesn't directly support Seq, so this answer is mostly in terms of List instead.
Given that type signature, you can iteratively sequence the value you have to, in effect, push the List type down one level in the type chain, and then map over that container to work on its contents. That can look like:
import cats.implicits._
import scala.util._

def convert(listTryOptionPair: List[Try[Option[(String, Any)]]]): Try[Option[Map[String, Any]]] = {
  val tryListOptionPair = listTryOptionPair.sequence
  tryListOptionPair.map { listOptionPair =>
    val optionListPair = listOptionPair.sequence
    optionListPair.map { listPair =>
      Map.from(listPair)
    }
  }
}
https://scastie.scala-lang.org/xbQ8ZbkoRSCXGDJX0PgJAQ has a slightly more complete example.
One way to approach this is by using a foldLeft:
// Let's say this is the object you're trying to convert
val seq: Seq[Try[Option[(String, Any)]]] = ???
seq.foldLeft(Try(Option(Map.empty[String, Any]))) {
  case (acc, e) =>
    for {
      accOption  <- acc
      elemOption <- e
    } yield elemOption match {
      case Some(value) => accOption.map(_ + value)
      case None        => accOption
    }
}
You start off with an empty Map. You then use a for-comprehension to go through the current accumulator and element, and finally you add a new tuple to the map if one is present.
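To illustrate the failure path (my own example, reusing the foldLeft above): once any element is a Failure, the for-comprehension keeps propagating that Failure through the rest of the fold:
import scala.util.{Try, Success, Failure}

val seqWithFailure: Seq[Try[Option[(String, Any)]]] =
  Seq(Success(Some("a" -> 1)), Failure(new Exception("boom")), Success(Some("b" -> 2)))

// The accumulator becomes a Failure at the second element and stays that way
seqWithFailure.foldLeft(Try(Option(Map.empty[String, Any]))) {
  case (acc, e) =>
    for {
      accOption  <- acc
      elemOption <- e
    } yield elemOption match {
      case Some(value) => accOption.map(_ + value)
      case None        => accOption
    }
} // Failure(java.lang.Exception: boom)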
The following solutions are based on this answer, to the point that it almost makes the question a duplicate.
Method 1: Using recursion
import scala.util.{Try, Success, Failure}

def trySeqToMap1[X,Y](trySeq: Seq[Try[Option[(X, Y)]]]): Try[Option[Map[X,Y]]] = {
  def helper(it: Iterator[Try[Option[(X,Y)]]], m: Map[X,Y] = Map()): Try[Option[Map[X,Y]]] = {
    if (it.hasNext) {
      val x = it.next()
      if (x.isFailure)
        Failure(x.failed.get)
      else if (x.get.isDefined)
        helper(it, m + (x.get.get._1 -> x.get.get._2))
      else
        helper(it, m)
    } else Success(Some(m))
  }
  helper(trySeq.iterator)
}
Method 2: pattern matching directly, in case you are able to get a LazyList instead:
def trySeqToMap2[X,Y](trySeq: LazyList[Try[Option[(X, Y)]]], m: Map[X,Y] = Map.empty[X,Y]): Try[Option[Map[X,Y]]] =
  trySeq match {
    case Success(Some(h)) #:: tail => trySeqToMap2(tail, m + (h._1 -> h._2))
    case Success(None) #:: tail => trySeqToMap2(tail, m)
    case Failure(f) #:: _ => Failure(f)
    case _ => Success(Some(m))
  }
note: this answer was previously using different method signatures. It has been updated to conform to the signature given in the question.

Hashing multiple columns of spark dataframe

I need to hash specific columns of a Spark DataFrame. Some columns have specific datatypes that are basically extensions of the standard Spark DataType class. The problem is that, for some reason, some conditions inside the when clause don't work as expected.
As the hashing configuration I have a map. Let's call it tableConfig:
val tableConfig = Map("a" -> "KEEP", "b" -> "HASH", "c" -> "KEEP", "d" -> "HASH", "e" -> "KEEP")
The salt variable is concatenated with each column value:
val salt = "abc"
The function for hashing looks like this:
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

def hashColumns(tableConfig: Map[String, String], salt: String, df: DataFrame): DataFrame = {
  val removedColumns = tableConfig.filter(_._2 == "REMOVE").keys.toList
  val hashedColumns = tableConfig.filter(_._2 == "HASH").keys.toList
  val cleanedDF = df.drop(removedColumns: _*)
  val colTypes = cleanedDF.dtypes.toMap

  def typeFromString(s: String): DataType = s match {
    case "StringType" => StringType
    case "BooleanType" => BooleanType
    case "IntegerType" => IntegerType
    case "DateType" => DateType
    case "ShortType" => ShortType
    case "DecimalType(15,7)" => DecimalType(15,7)
    case "DecimalType(18,2)" => DecimalType(18,2)
    case "DecimalType(11,7)" => DecimalType(11,7)
    case "DecimalType(17,2)" => DecimalType(17,2)
    case "DecimalType(38,2)" => DecimalType(38,2)
    case _ => throw new TypeNotPresentException(
      "Please check types in the dataframe. The following column type is missing: ".concat(s), null
    )
  }

  val getType = colTypes.map { case (k, _) => (k, typeFromString(colTypes(k))) }
  val hashedDF = cleanedDF.columns.foldLeft(cleanedDF) {
    (memoDF, colName) =>
      memoDF.withColumn(
        colName,
        when(col(colName).isin(hashedColumns: _*) && col(colName).isNull, null).
          when(col(colName).isin(hashedColumns: _*) && col(colName).isNotNull,
            sha2(concat(col(colName), lit(salt)), 256)).otherwise(col(colName))
      )
  }
  hashedDF
}
I am getting an error regarding a specific column. Namely, the error is the following:
org.apache.spark.sql.AnalysisException: cannot resolve '(c IN ('a', 'b', 'd', 'e'))' due to data type mismatch: Arguments must be same type but were: boolean != string;;
Column names are changed.
My search didn't give any clear explanation of why the isin or isNull functions don't work as expected. Also, I follow a specific style of implementation and want to avoid the following approaches:
1) No UDFs. They are painful for me.
2) No for loops over the Spark DataFrame's columns. The data could contain more than a billion rows, and that would be a headache in terms of performance.
As mentioned in the comments, the first fix should be to remove the condition col(colName).isin(hashedColumns: _*) && col(colName).isNull, since that check will always be false.
As for the error, it is caused by the mismatch between the value type of col(colName) and hashedColumns. The values of hashedColumns are always strings, therefore col(colName) should be a string as well, but in your case it seems to be a Boolean.
The last issue I see here has to do with the logic of the foldLeft. If I understood correctly, what you want to achieve is to go through the columns and apply sha2 to those that exist in hashedColumns. To achieve that you must change your code to:
// 1st change: convert each element of hashedColumns from String to a Spark column
val hashArray = hashedColumns.map(lit(_))

val hashedDF = cleanedDF.columns.foldLeft(cleanedDF) {
  (memoDF, colName) =>
    memoDF.withColumn(
      colName,
      // 2nd change: check if colName is in "a", "b", "c", "d" etc; if so apply sha2, otherwise leave the value as it is
      when(col(colName).isNotNull && array_contains(array(hashArray: _*), lit(colName)),
        sha2(concat(col(colName), lit(salt)), 256)
      ).otherwise(col(colName))
    )
}
UPDATE:
Iterating through all the columns via foldLeft wouldn't be efficient and adds extra overhead, even more so when you have a large number of columns (see the discussion with @baitmbarek below). I added one more approach that uses a single select instead of foldLeft. In the next code, when is applied only to the hashedColumns. We separate the columns into nonHashedCols and transformedCols, then concatenate the two lists and pass them to the select:
val transformedCols = hashedColumns.map { c =>
  when(col(c).isNotNull, sha2(concat(col(c), lit(salt)), 256)).as(c)
}
val nonHashedCols = (cleanedDF.columns.toSet -- hashedColumns.toSet).map(col(_)).toList

cleanedDF.select((nonHashedCols ++ transformedCols): _*)
(Posted a solution on behalf of the question author, to move it from the question to the answer section).
@AlexandrosBiratsis gave a very good solution in terms of performance and elegance of implementation. So the hashColumns function will look like this:
def hashColumns(tableConfig: Map[String, String], salt: String, df: DataFrame): DataFrame = {
  val removedCols = tableConfig.filter(_._2 == "REMOVE").keys.toList
  val hashedCols = tableConfig.filter(_._2 == "HASH").keys.toList
  val cleanedDF = df.drop(removedCols: _*)
  val transformedCols = hashedCols.map { c =>
    when(col(c).isNotNull, sha2(concat(col(c), lit(salt)), 256)).as(c)
  }
  val nonHashedCols = (cleanedDF.columns.toSet -- hashedCols.toSet).map(col(_)).toList
  val hashedDF = cleanedDF.select((nonHashedCols ++ transformedCols): _*)
  hashedDF
}
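For reference, a minimal usage sketch (my own illustration; it assumes a local SparkSession and reuses the tableConfig and salt values from the question):
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("hash-demo").getOrCreate()
import spark.implicits._

val df = Seq(("u1", "alice", "c1", "secret", "e1")).toDF("a", "b", "c", "d", "e")
val hashed = hashColumns(tableConfig, salt, df)
hashed.show(false) // "b" and "d" now hold SHA-256 hex digests; the other columns are untouched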

Why does joining two datasets and applying filter cause "error: constructor cannot be instantiated to expected type"?

I am joining two datasets: the first one comes from a stream and the second one is in HDFS.
I am using Scala with Spark. After joining the two datasets, I need to apply a filter on the joined dataset, but here I am facing an issue. Please assist in resolving it.
I am using the code below:
val streamkv = streamrecs.map(_.split("~")).map(r => (r(0), (r(5), r(6))))
val HDFSlines = sc.textFile("/user/Rest/sample.dat").map(_.split("~")).map(r => (r(1), (r(0), r(3), r(4))))
val streamwindow = streamkv.window(Minutes(1))
val join1 = streamwindow.transform(joinRDD => { joinRDD.join(HDFSlines) })
I am getting the following error when I use the filter:
val tofilter = join1.filter {
| case (r(0), (r(5), r(6)),(r(0),r(3),r(4))) =>
| r(4).contains("iPhone")
| }.count()
<console>:48: error: constructor cannot be instantiated to expected type;
found : (T1, T2, T3)
required: (String, ((String, String), (String, String, String)))
case (r(0), (r(5), r(6)),(r(0),r(3),r(4))) =>
In Scala (which is a typed language, FYI), pattern matching (case) allows you to 'extract' variables into your block for local usage. Therefore
case (r(0), (r(5), r(6)),(r(0),r(3),r(4))) =>
is illegal, because you're not calling the function r here, but extracting values (think of a function declaration, not a call).
Assuming the objects in your resulting stream/collection join1 follow the type signature (_, ((_, _), (_, _, String))), you should try:
val tofilter = join1.filter {
  case (_, ((_, _), (_, _, device))) =>
    device.contains("iPhone")
}.count()
The reason for the error is a missing pair of round brackets.
The line with case (r(0), (r(5), r(6)),(r(0),r(3),r(4))) should rather be case (_, ((_, _), (_, _, device))) => (note the brackets around the last two elements: the Tuple2 and Tuple3 objects).
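To see why the nesting matters, here is a small plain-Scala sketch (my own, runnable without Spark) of the same tuple shape the join produces:
// Each element mirrors the join result: (key, ((stream fields), (HDFS fields)))
val joined: List[(String, ((String, String), (String, String, String)))] =
  List(
    ("k1", (("s5", "s6"), ("h0", "h3", "iPhone 6"))),
    ("k2", (("s5", "s6"), ("h0", "h3", "Android")))
  )

val matches = joined.count {
  case (_, ((_, _), (_, _, device))) => device.contains("iPhone")
}
// matches == 1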

How to create a List of wildcard elements in Scala

I'm trying to write a function that returns a list (for querying purposes) that has some wildcard elements:
def createPattern(query: List[(String,String)]) = {
  val l = List[(_,_,_,_,_,_,_)]
  var iter = query
  while (iter != null) {
    val x = iter.head._1 match {
      case "userId" => 0
      case "userName" => 1
      case "email" => 2
      case "userPassword" => 3
      case "creationDate" => 4
      case "lastLoginDate" => 5
      case "removed" => 6
    }
    l(x) = iter.head._2
    iter = iter.tail
  }
  l
}
So, the user enters some query terms as a list. The function parses these terms and inserts them into val l. The fields that the user doesn't specify are entered as wildcards.
Val l is causing me trouble. Am I going the right route, or are there better ways to do this?
Thanks!
Gosh, where to start. I'd begin by getting an IDE (IntelliJ / Eclipse) which will tell you when you're writing nonsense and why.
Read up on how List works. It's an immutable linked list, so your attempts to update it by index are very misguided.
Don't use tuples - use case classes.
You shouldn't ever need to use null and I guess here you mean Nil.
Don't use var and while - use for-expression, or the relevant higher-order functions foreach, map etc.
Your code doesn't make much sense as it is, but it seems you're trying to return a 7-element list with the second element of each tuple in the input list mapped via a lookup to position in the output list.
To improve it... don't do that. What you're doing (as programmers have done since arrays were invented) is using the index as a crude proxy for a Map from Int to whatever. What you want is an actual Map. I don't know what you want to do with it, but wouldn't it be nicer if it were keyed by those strings themselves, rather than by a number? If so, you can simplify your whole method to
def createPattern(query: List[(String,String)]) = query.toMap
at which point you should realise you probably don't need the method at all, since you can just use toMap at the call site.
If you insist on using an Int index, you could write
def createPattern(query: List[(String,String)]) = {
  def intVal(x: String) = x match {
    case "userId" => 0
    case "userName" => 1
    case "email" => 2
    case "userPassword" => 3
    case "creationDate" => 4
    case "lastLoginDate" => 5
    case "removed" => 6
  }
  val tuples = for ((key, value) <- query) yield (intVal(key), value)
  tuples.toMap
}
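For example (my own illustrative call):
createPattern(List("userId" -> "Fred", "email" -> "fred@example.com"))
// Map(0 -> Fred, 2 -> fred@example.com)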
Not sure what you want to do with the resulting list, but you can't create a List of wildcards like that.
What do you want to do with the resulting list, and what type should it be?
Here's how you might build something if you wanted the result to be a List[String], and if you wanted wildcards to be "*":
def createPattern(query: List[(String,String)]) = {
  val wildcard = "*"
  def orElseWildcard(key: String) = query.find(_._1 == key).getOrElse(("", wildcard))._2
  orElseWildcard("userId") ::
  orElseWildcard("userName") ::
  orElseWildcard("email") ::
  orElseWildcard("userPassword") ::
  orElseWildcard("creationDate") ::
  orElseWildcard("lastLoginDate") ::
  orElseWildcard("removed") ::
  Nil
}
You're not using List, Tuple, iterator, or wild-cards correctly.
I'd take a different approach - maybe something like this:
case class Pattern(valueMap: Map[String,String]) {
  def this(valueList: List[(String,String)]) = this(valueList.toMap)
  val Seq(
    userId, userName, email, userPassword, creationDate,
    lastLoginDate, removed
  ): Seq[Option[String]] = Seq(
    "userId", "userName", "email", "userPassword",
    "creationDate", "lastLoginDate", "removed"
  ).map(valueMap.get(_))
}
Then you can do something like this:
scala> val pattern = new Pattern( List( "userId" -> "Fred" ) )
pattern: Pattern = Pattern(Map(userId -> Fred))
scala> pattern.email
res2: Option[String] = None
scala> pattern.userId
res3: Option[String] = Some(Fred)
Or just use the map directly.