TypedPipe can't coerce strings to DateTime even when given an implicit function

I've got a Scalding data flow that starts with a bunch of pipe-separated value files. The first column is a DateTime in a slightly non-standard format. I want to use the strongly typed TypedPipe API, so I've specified a tuple type and a case class to contain the data:
type Input = (DateTime, String, Double, Double, String)
case class LatLonRecord(date: DateTime, msisdn: String, lat: Double, lon: Double, cellname: String)
however, Scalding doesn't know how to coerce a String into a DateTime, so I tried adding an implicit function to do the dirty work:
implicit def stringToDateTime(dateStr: String): DateTime =
  DateTime.parse(dateStr, DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss.S"))
However, I still get a ClassCastException:
val lines: TypedPipe[Input] = TypedPipe.from(TypedPsv[Input](args("input")))
lines.map(x => x._1).dump
//cascading.flow.FlowException: local step failed at java.lang.Thread.run(Thread.java:745)
//Caused by: cascading.pipe.OperatorException: [FixedPathTypedDelimite...][com.twitter.scalding.RichPipe.eachTo(RichPipe.scala:509)] operator Each failed executing operation
//Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.joda.time.DateTime
What do I need to do to get Scalding to call my conversion function?

So I ended up doing this:
case class LatLonRecord(date: DateTime, msisdn: String, lat: Double, lon: Double, cellname: String)
object dateparser {
  val format = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss.S")
  def parse(s: String): DateTime = DateTime.parse(s, format)
}
// changed the first column to a String, yuck
type Input = (String, String, Double, Double, String)
val lines: TypedPipe[Input] = TypedPipe.from(TypedPsv[Input](args("input")))
val recs = lines.map(v => LatLonRecord(dateparser.parse(v._1), v._2, v._3, v._4, v._5))
But I feel like it's a sub-optimal solution. I welcome better answers from people who, unlike me, have been using Scala for more than a week.
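For what it's worth, the workaround is the right shape. An implicit conversion is applied by the Scala compiler at compile time, but TypedPsv coerces fields at runtime inside Cascading, where no implicit can intervene; that is why you get a ClassCastException instead of a call to stringToDateTime. One small tidy-up is to destructure the tuple so the field names stay readable; a minimal sketch reusing the dateparser object above:

val recs: TypedPipe[LatLonRecord] =
  lines.map { case (date, msisdn, lat, lon, cellname) =>
    LatLonRecord(dateparser.parse(date), msisdn, lat, lon, cellname)
  }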

Related

Scala Option and Some mismatch

I want to parse the province field into a case class, but it throws a mismatch error:
scala.MatchError: Some(USA) (of class scala.Some)
val result = EntityUtils.toString(entity,"UTF-8")
val address = JsonParser.parse(result).extract[Address]
val value.province = Option(address.province)
val value.city = Option(address.city)
case class Access(
  device: String,
  deviceType: String,
  os: String,
  event: String,
  net: String,
  channel: String,
  uid: String,
  nu: Int,
  ip: String,
  time: Long,
  version: String,
  province: Option[String],
  city: Option[String],
  product: Option[Product]
)
This:
val value.province = Option(address.province)
val value.city = Option(address.city)
doesn't do what you think it does. It tries to treat value.province and value.city as extractor patterns (which don't match the type, hence the scala.MatchError exception). It doesn't mutate value as I believe you intended (because value apparently doesn't have such setters).
Since value is (apparently) an instance of the Access case class, it is immutable, and you can only obtain an updated copy:
val value2 = value.copy(
  province = Option(address.province),
  city = Option(address.city)
)
Assuming the starting point:
val province: Option[String] = ???
You can get the string with simple pattern matching:
province match {
  case Some(stringValue) => JsonParser.parse(stringValue).extract[Province] // use the parser to go from String to a case class
  case None => ??? // maybe provide a default value, depends on your context
}
Note: without knowing what extract[T] returns, it's hard to recommend a follow-up.
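That said, if a default does make sense in your context, the match collapses to a map plus getOrElse; a small sketch, where defaultProvince is a hypothetical fallback of your choosing:

val parsed: Province = province
  .map(s => JsonParser.parse(s).extract[Province]) // only runs when province is Some(...)
  .getOrElse(defaultProvince) // hypothetical fallback value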

Implementing my HbaseConnector

I would like to implement an HbaseConnector.
I'm currently reading the guide, but there is a part that I don't understand and I can't find any information about it.
In part 2 of the guide we can see the following code:
case class HBaseRecord(col0: String, col1: Boolean, col2: Double, col3: Float, col4: Int, col5: Long, col6: Short, col7: String, col8: Byte)

object HBaseRecord {
  def apply(i: Int, t: String): HBaseRecord = {
    val s = s"""row${"%03d".format(i)}"""
    HBaseRecord(s, i % 2 == 0, i.toDouble, i.toFloat, i, i.toLong, i.toShort, s"String$i: $t", i.toByte)
  }
}

val data = (0 to 255).map { i => HBaseRecord(i, "extra") }
I understand that they store the future columns in the HBaseRecord case class, but I don't understand the specific use of this line:
val s = s"""row${"%03d".format(i)}"""
Would someone care to explain?
It is used to generate row ids like row001, row002, etc., which will populate the first column (col0) of your table. Try this simpler standalone function to see what it does:
def generate(i: Int): String = s"""row${"%03d".format(i)}"""
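As an aside, the triple quotes buy nothing here, since the string contains no newlines or embedded quotes; the f interpolator is the idiomatic spelling of the same zero-padding:

def generate(i: Int): String = f"row$i%03d" // zero-pads i to three digits

generate(1)   // "row001"
generate(42)  // "row042"
generate(255) // "row255"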

How to pass join key as variable in Spark Data Frame using Scala

I am trying to keep the join keys of the two DataFrames in two variables, which I then want to pass into a join. At the moment each variable contains one key. Can I also pass more than one key?
Ex:
1st key:
scala> val primary_key_col = scd_table_keys_df.first().getString(2)
primary_key_col: String = acct_nbr
2nd key:
scala> val delta_primary_key_col = "delta_"+primary_key_col
delta_primary_key_col: String = delta_acct_nbr
My Python code, which works:
cdc_new_acct_df = delta_src_rename_df.join(hist_tgt_tbl_Y_df ,(col(delta_primary_key_col) == col(primary_key_col)) ,'left_outer' ).where(hist_tgt_tbl_Y_df[primary_key_col].isNull())
I want to achieve the same in Scala. Please suggest.
I tried multiple ways:
val cdc_new_acct_df = delta_src_rename_df.join(hist_tgt_tbl_Y_df ,(delta_src_rename_df({primary_key_col.mkstring(",")}) == hist_tgt_tbl_Y_df({primary_key_col.mkstring(",")}),"left_outer" )).where(hist_tgt_tbl_Y_df[primary_key_col].isNull())
:121: error: value mkstring is not a member of String
val cdc_new_acct_df = delta_src_rename_df.join(hist_tgt_tbl_Y_df ,(delta_src_rename_df(delta_primary_key_col.map(c => col(c))) == hist_tgt_tbl_Y_df(primary_key_col.map(c => col(c))),"left_outer" ))
:123: error: type mismatch;
found : Array[org.apache.spark.sql.Column]
required: String
I was not able to solve it. Please suggest.
I was missing the variable substitution. This is working for me:
scala> val cdc_new_acct_df = delta_src_rename_df.join(hist_tgt_tbl_Y_df, delta_src_rename_df(s"$delta_primary_key_col") === hist_tgt_tbl_Y_df(s"$primary_key_col"), "left_outer")
cdc_new_acct_df: org.apache.spark.sql.DataFrame = [delta_acct_nbr: string, delta_primary_state: string, delta_zip_code: string, delta_load_tm: string, delta_load_date: string, hash_key_col: string, delta_hash_key: int, delta_eff_start_date: string, acct_nbr: string, account_sk_id: bigint, primary_state: string, zip_code: string, eff_start_date: string, eff_end_date: string, load_tm: string, hash_key: string, eff_flag: string]
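As for passing more than one key: build the join condition from a list of key names and combine the per-key column equalities with &&. A sketch, assuming the same delta_ prefix convention as above (the key list itself is hypothetical; in practice it would come from scd_table_keys_df):

val keys = Seq("acct_nbr", "zip_code") // hypothetical key list

val joinCond = keys
  .map(k => delta_src_rename_df(s"delta_$k") === hist_tgt_tbl_Y_df(k))
  .reduce(_ && _) // AND the per-key equalities together

val cdc_new_acct_df = delta_src_rename_df
  .join(hist_tgt_tbl_Y_df, joinCond, "left_outer")
  .where(hist_tgt_tbl_Y_df(keys.head).isNull)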

Convert day/month/year ints to java.time.LocalDate using map

Given this case class:
case class DateX (year: Int, month: Int, day: Int)
And this sequence:
val dates = Seq(DateX(2001,1,1),DateX(2002,2,2),DateX(2003,3,3))
I need to convert the sequence of dates into LocalDate. I tried this, but it doesn't work:
val list = dates.map { x => (x.year, new LocalDate(x.year,x.month,x.day)) }
It says that LocalDate does not have a constructor. How can I fix this?
This should get you started. Adjust as needed.
dates.map{ case DateX(y,m,d) => java.time.LocalDate.of(y,m,d) }
// res0: Seq[java.time.LocalDate] = List(2001-01-01, 2002-02-02, 2003-03-03)
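The error in the original attempt comes from java.time.LocalDate having no public constructor; instances are built with the LocalDate.of factory method. If you also want to keep the (year, date) pairs from your attempt:

import java.time.LocalDate

val list: Seq[(Int, LocalDate)] =
  dates.map(x => (x.year, LocalDate.of(x.year, x.month, x.day)))
// List((2001, 2001-01-01), (2002, 2002-02-02), (2003, 2003-03-03))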

Play: Bind a form field to a double?

Perhaps I'm just overlooking something obvious but I can't figure out how to bind a form field to a double in a Play controller.
For instance, assume this is my model:
case class SavingsGoal(
  timeframeInMonths: Option[Int],
  amount: Double,
  name: String
)
(Ignore that I'm using a double for money, I know that's a bad idea, this is just a simplified example)
And I wanted to bind it like so:
object SavingsGoals extends Controller {
  val savingsForm: Form[SavingsGoal] = Form(
    mapping(
      "timeframeInMonths" -> optional(number.verifying(min(0))),
      "amount" -> of[Double],
      "name" -> nonEmptyText
    )(SavingsGoal.apply)(SavingsGoal.unapply)
  )
}
I realized that the number helper only works for ints but I thought using of[] might allow me to bind a double. However, I get a compiler error on this:
Cannot find Formatter type class for Double. Perhaps you will need to import
play.api.data.format.Formats._
Doing so doesn't help as there's no double formatter defined in the API.
This is all just a long way of asking what's the canonical way of binding a form field to a double in Play?
Thanks!
edit: 4e6 pointed me in the right direction. Here's what I did using his help:
Using the snippets in his link, I added the following to app.controllers.Global.scala:
import play.api.data.FormError
import play.api.data.format.Formatter
import play.api.data.format.Formats.stringFormat

object Global {
  /**
   * Default formatter for the `Double` type.
   */
  implicit def doubleFormat: Formatter[Double] = new Formatter[Double] {
    override val format = Some(("format.real", Nil))
    def bind(key: String, data: Map[String, String]) =
      parsing(_.toDouble, "error.real", Nil)(key, data)
    def unbind(key: String, value: Double) = Map(key -> value.toString)
  }

  /**
   * Helper for formatter binders.
   * @param parse Function parsing a String value into a T value, throwing an exception in case of failure
   * @param errMsg Error to set in case of parsing failure
   * @param errArgs Arguments for the error message
   * @param key Key name of the field to parse
   * @param data Field data
   */
  private def parsing[T](parse: String => T, errMsg: String, errArgs: Seq[Any])(key: String, data: Map[String, String]): Either[Seq[FormError], T] = {
    stringFormat.bind(key, data).right.flatMap { s =>
      util.control.Exception.allCatch[T]
        .either(parse(s))
        .left.map(e => Seq(FormError(key, errMsg, errArgs)))
    }
  }
}
Then, in my form mapping:
mapping(
  "amount" -> of(Global.doubleFormat)
)
You don't need the formatter in Global if you are on version 2.1 or later.
Just import:
import play.api.data.format.Formats._
and use as:
mapping(
  "amount" -> of(doubleFormat)
)
Actually, there is a predefined formatter for Double on the master branch, so you should either switch to the 2.1-SNAPSHOT Play version or just copy the implementation.
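Putting it together on Play 2.1+, the original mapping from the question compiles as-is once the predefined formatter is imported; a minimal sketch:

import play.api.data._
import play.api.data.Forms._
import play.api.data.format.Formats._ // brings the predefined Formatter[Double] into scope (Play 2.1+)
import play.api.data.validation.Constraints._ // for min

val savingsForm: Form[SavingsGoal] = Form(
  mapping(
    "timeframeInMonths" -> optional(number.verifying(min(0))),
    "amount" -> of[Double], // now resolves via the imported doubleFormat
    "name" -> nonEmptyText
  )(SavingsGoal.apply)(SavingsGoal.unapply)
)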