I want to know how to replace x._1._2, x._1._3 by the name of the field using case class
def keyuid(l:Array[String]) : (String,Long,String) ={
//val l=s.split(",")
val ip=l(3).split(":")(1)
val values=Array("",0,0,0)
val uid=l(1).split(":")(1)
val timestamp=l(2).split(":")(1).toLong*1000
val impression=l(4).split(":")(1)
return (uid,timestamp,ip)
}
val cli_ip = click.map(_.split(","))
.map(x => (keyuid(x), 1.0)).assignAscendingTimestamps(x=>x._1._2)
.keyBy(x => x._1._3)
.timeWindow(Time.seconds(10))
.sum(1)
Use Scala pattern matching when writing lambda functions using curly braces and case keyword.
val cli_ip = click.map(_.split(","))
.map(x => (keyuid(x), 1.0)).assignAscendingTimestamps {
case ((_, timestamp, _), _) => timestamp
}
.keyBy { elem => elem match {
case ((_, _, ip), _) => ip
}
}
.timeWindow(Time.seconds(10))
.sum(1)
More information on Tuples and their pattern matching syntax here: https://docs.scala-lang.org/tour/tuples.html
Pattern Matching is indeed a good idea and makes the code more readable.
To answer your question, to make keyuid function returns a case class, you first need to define it, for instance :
case class Click(string uid, long timestamp, string ip)
Then instead of return (uid,timestamp,ip) you need to do return Click(uid,timestamp,ip)
Case class are not related to flink, but to scala : https://docs.scala-lang.org/tour/case-classes.html
Related
I have a few vals that match for matching values
Here is an example:
val job_ = Try(jobId.toInt) match {
case Success(value) => jobs.findById(value).map(_.id)
.getOrElse( Left(WrongValue("jobId", s"$value is not a valid job id")))
case Failure(_) => jobs.findByName(jobId.toString).map(_.id)
.getOrElse( Left(WrongValue("jobId", s"'$jobId' is not a known job title.")))
}
// Here the value arrives as a string e.i "yes || no || true || or false" then converted to a boolean
val bool_ = bool.toLowerCase() match {
case "yes" => true
case "no" => false
case "true" => true
case "false" => false
case other => Left(Invalid("bool", s"wrong value received"))
}
Note: invalid case is case class Invalid(x: String, xx: String)
above i'm looking for a given job value and checking whether it exist in the db or not,
No I have a few of these and want to add to a list, here is my list val and flatten it:
val errors = List(..all my vals errors...).flatten // <--- my_list_val (how do I include val bool_ and val job_)
if (errors.isEmpty) { do stuff }
My result should contain errors from val bool_ and val job_
THANK!
You need to fix the types first. The type of bool_ is Any. Which does not give you something you can work with.
If you want to use Either, you need to use it everwhere.
Then, the easiest approach would be to use a for comprehension (I am assuming you're dealing with Either[F, T] here, where WrongValue and Invalid are both sub-classes of F and you're not really interested in the errors).
for {
foundJob <- job_
_ <- bool_
} yield {
// do stuff
}
Note, that in Scala >= 2.13 you can use toIntOption when converting the String to Int:
vaj job_: Either[F, T] = jobId.toIntOption match {
case Some(value) => ...
case _ => ...
}
Also, in case expressions, you can use alternatives when you have the same statement for several cases:
val bool_: Either[F, Boolean] = bool.toLowerCase() match {
case "yes" | "true" => Right(true)
case "no" | "false" => Right(false)
case other => Left(Invalid("bool", "wrong value received"))
}
So, according to your question, and your comments, these are the types you're dealing with.
type ID = Long //whatever id is
def WrongValue(x: String, xx: String) :String = "?-?-?"
case class Invalid(x: String, xx: String)
Now let's create a couple of error values.
val job_ :Either[String,ID] = Left(WrongValue("x","xx"))
val bool_ :Either[Invalid,Boolean] = Left(Invalid("x","xx"))
To combine and report them you might do something like this.
val errors :List[String] =
List(job_, bool_).flatMap(_.swap.toOption.map(_.toString))
println(errors.mkString(" & "))
//?-?-? & Invalid(x,xx)
After checking types as #cbley explained. You can just do a filter operation with pattern matching on your list:
val error = List(// your variables ).filter(_ match{
case Left(_) => true
case _ => false
})
The code below I get a value from kafka as an array of bytes. ToJobEvent converts those bytes into Option[JobEvent], then I want to filter the Nones out of the JobEvent then finally extract the JobEvent out of the Maybe monad. What is the proper way to do this in scala spark?
val jobEventDS = kafkaDS
.select($"value".as[Array[Byte]])
.map(binaryData => FromThrift.ToJobEvent(binaryData))
.filter(MaybeJobEvent => MaybeJobEvent match {
case Some(_) => true
case None => false
}).map {
case Some(jobEvent) => jobEvent
case None => null
}
The above code doesn't work. Just an example I want to get working.
The first option is to use flatMap
df.select($"value".as[Array[Byte]])
.flatMap(binaryData => FromThrift.ToJobEvent(binaryData) match {
case Some(value) => Seq(value)
case None => Seq()
}
})
The second is to use Tuple1 as a holder
df.select($"value".as[Array[Byte]])
.map(binaryData => {
BinaryHolder(binaryData).toCaseClassMonad() match {
case Some(value) => Tuple1(value)
case None => Tuple1(null)
}
})
.filter(tuple => tuple._1 != null)
.map(tuple => tuple._1)
A bit of explanation.
If yous MaybeJobEvent is a case class or instance of Product, Spark won't be able to handle it.
See here.
Cannot create encoder for Option of Product type, because Product type is represented as a row, and the entire row can not be null in Spark SQL like normal databases. You can wrap your type with Tuple1 if you do want top level null Product objects, e.g. instead of creating Dataset[Option[MyClass]], you can do something like val ds: Dataset[Tuple1[MyClass]] = Seq(Tuple1(MyClass(...)), Tuple1(null)).toDS
Some examples:
case class BinaryHolder(value: Array[Byte]) {
def toStrMonad(): Option[String] = new String(value) match {
case "abc" => None
case _ => Some(new String(value))
}
def toCaseClassMonad(): Option[MyString] = new String(value) match {
case "abc" => None
case _ => Some(MyString(new String(value)))
}
}
//case classe is also Product
case class MyString(str: String)
Creating Dataset:
val ds = List(
BinaryHolder("abc".getBytes()),
BinaryHolder("dbe".getBytes()),
BinaryHolder("aws".getBytes()),
BinaryHolder("qwe".getBytes())
).toDS()
This works fine:
val df: DataFrame = ds.toDF()
df.select($"value".as[Array[Byte]])
.map(binaryData => {
BinaryHolder(binaryData).toStrMonad()
})
.show()
+-----+
|value|
+-----+
| null|
| dbe|
| aws|
| qwe|
+-----+
But this fails with the exception
df.select($"value".as[Array[Byte]])
.map(binaryData => {
//Option[MyString]
BinaryHolder(binaryData).toCaseClassMonad()
})
.show()
UnsupportedOperationException: Cannot create encoder for Option of Product type...
Returning nulls for a typed dataset also won't work
df.select($"value".as[Array[Byte]])
.map(binaryData => {
BinaryHolder(binaryData).toCaseClassMonad() match {
case Some(value) => value
case None => null
}
})
throws
java.lang.NullPointerException: Null value appeared in non-nullable field:
top level Product input object
If the schema is inferred from a Scala tuple/case class, or a Java bean, please try to use scala.Option[_] or other nullable types (e.g. java.lang.Integer instead of int/scala.Int).
I have a Seq[String] in Scala, and if the Seq contains certain Strings, I append a relevant message to another list.
Is there a more 'scalaesque' way to do this, rather than a series of if statements appending to a list like I have below?
val result = new ListBuffer[Err]()
val malformedParamNames = // A Seq[String]
if (malformedParamNames.contains("$top")) result += IntegerMustBePositive("$top")
if (malformedParamNames.contains("$skip")) result += IntegerMustBePositive("$skip")
if (malformedParamNames.contains("modifiedDate")) result += FormatInvalid("modifiedDate", "yyyy-MM-dd")
...
result.toList
If you want to use some scala iterables sugar I would use
sealed trait Err
case class IntegerMustBePositive(msg: String) extends Err
case class FormatInvalid(msg: String, format: String) extends Err
val malformedParamNames = Seq[String]("$top", "aa", "$skip", "ccc", "ddd", "modifiedDate")
val result = malformedParamNames.map { v =>
v match {
case "$top" => Some(IntegerMustBePositive("$top"))
case "$skip" => Some(IntegerMustBePositive("$skip"))
case "modifiedDate" => Some(FormatInvalid("modifiedDate", "yyyy-MM-dd"))
case _ => None
}
}.flatten
result.toList
Be warn if you ask for scala-esque way of doing things there are many possibilities.
The map function combined with flatten can be simplified by using flatmap
sealed trait Err
case class IntegerMustBePositive(msg: String) extends Err
case class FormatInvalid(msg: String, format: String) extends Err
val malformedParamNames = Seq[String]("$top", "aa", "$skip", "ccc", "ddd", "modifiedDate")
val result = malformedParamNames.flatMap {
case "$top" => Some(IntegerMustBePositive("$top"))
case "$skip" => Some(IntegerMustBePositive("$skip"))
case "modifiedDate" => Some(FormatInvalid("modifiedDate", "yyyy-MM-dd"))
case _ => None
}
result
Most 'scalesque' version I can think of while keeping it readable would be:
val map = scala.collection.immutable.ListMap(
"$top" -> IntegerMustBePositive("$top"),
"$skip" -> IntegerMustBePositive("$skip"),
"modifiedDate" -> FormatInvalid("modifiedDate", "yyyy-MM-dd"))
val result = for {
(k,v) <- map
if malformedParamNames contains k
} yield v
//or
val result2 = map.filterKeys(malformedParamNames.contains).values.toList
Benoit's is probably the most scala-esque way of doing it, but depending on who's going to be reading the code later, you might want a different approach.
// Some type definitions omitted
val malformations = Seq[(String, Err)](
("$top", IntegerMustBePositive("$top")),
("$skip", IntegerMustBePositive("$skip")),
("modifiedDate", FormatInvalid("modifiedDate", "yyyy-MM-dd")
)
If you need a list and the order is siginificant:
val result = (malformations.foldLeft(List.empty[Err]) { (acc, pair) =>
if (malformedParamNames.contains(pair._1)) {
pair._2 ++: acc // prepend to list for faster performance
} else acc
}).reverse // and reverse since we were prepending
If the order isn't significant (although if the order's not significant, you might consider wanting a Set instead of a List):
val result = (malformations.foldLeft(Set.empty[Err]) { (acc, pair) =>
if (malformedParamNames.contains(pair._1)) {
acc ++ pair._2
} else acc
}).toList // omit the .toList if you're OK with just a Set
If the predicates in the repeated ifs are more complex/less uniform, then the type for malformations might need to change, as they would if the responses changed, but the basic pattern is very flexible.
In this solution we define a list of mappings that take your IF condition and THEN statement in pairs and we iterate over the inputted list and apply the changes where they match.
// IF THEN
case class Operation(matcher :String, action :String)
def processInput(input :List[String]) :List[String] = {
val operations = List(
Operation("$top", "integer must be positive"),
Operation("$skip", "skip value"),
Operation("$modify", "modify the date")
)
input.flatMap { in =>
operations.find(_.matcher == in).map { _.action }
}
}
println(processInput(List("$skip","$modify", "$skip")));
A breakdown
operations.find(_.matcher == in) // find an operation in our
// list matching the input we are
// checking. Returns Some or None
.map { _.action } // if some, replace input with action
// if none, do nothing
input.flatMap { in => // inputs are processed, converted
// to some(action) or none and the
// flatten removes the some/none
// returning just the strings.
In the following code the first expression returns a Result[String] which contains one of the strings "medical", "dental" or "pharmacy" inside of a Result. I can add .toOption.get to the end of the val statement to get the String, but is there a better way to use the Result? Without the .toOption.get, the code will not compile.
val service = element("h2").containingAnywhere("claim details").fullText()
service match {
case "medical" => extractMedicalClaim
case "dental" => extractDentalClaim
case "pharmacy" => extractPharmacyClaim
}
Hard to say without knowing what Result is. If it's a case class, with the target String as part of its constructor, then you could pattern match directly.
Something like this.
service match {
case Result("medical") => extractMedicalClaim
case Result("dental") => extractDentalClaim
case Result("pharmacy") => extractPharmacyClaim
case _ => // default result
}
If the Result class doesn't have an extractor (the upapply() method) you might be able to add one just for this purpose.
I'm assuming this Result[T] class has a toOption method which returns an Option[T] - if that's the case, you can call toOption and match on that option:
val service = element("h2").containingAnywhere("claim details").fullText().toOption
service match {
case Some("medical") => extractMedicalClaim
case Some("dental") => extractDentalClaim
case Some("pharmacy") => extractPharmacyClaim
case None => // handle the case where the result was empty
}
I have the following input string:
"0.3215,Some(0.5123)"
I would like to retrieve the tuple (0.3215,Some(0.5123)) with: (BigDecimal,Option[BigDecimal]).
Here is one of the thing I tried so far:
"\\d+\\.\\d+,Some\\(\\d+\\.\\d+".r findFirstIn iData match {
case None => Map[BigDecimal, Option[BigDecimal]]()
case Some(s) => {
val oO = s.split(",Some\\(")
BigDecimal.valueOf(oO(0).toDouble) -> Option[BigDecimal](BigDecimal.valueOf(lSTmp2(1).toDouble))
}
}
Using a Map and transforming it into a tuple.
When I try directly the tuple I get an Equals or an Object.
Must miss something here...
Your code has several issues, but the big one seems to be that the case None side of the match returns a Map but the Some(s) side returns a Tuple2. Map and Tuple2 unify to their lowest-common-supertype, Equals, which is what you're seeing.
I think this is what you're trying to achieve?
val Pattern = "(\\d+\\.\\d+),Some\\((\\d+\\.\\d+)\\)".r
val s = "0.3215,Some(0.5123)"
s match {
case Pattern(a,b) => Map(BigDecimal(a) -> Some(BigDecimal(b)))
case _ => Map[BigDecimal, Option[BigDecimal]]()
}
// Map[BigDecimal,Option[BigDecimal]] = Map(0.3215 -> Some(0.5123))