How to use map with a FUNC to convert to a DataFrame? - scala

I want to use a FUNC to map each line of data from HDFS and then convert it to a DataFrame, but it doesn't work. Please help me as soon as possible.
For example:
case class Kof(UID: String, SITEID: String, MANAGERID: String, ROLES: String,
               EXTERNALURL: String, EXTERNALID: String, OPTION1: String,
               OPTION2: String, OPTION3: String)

def GetData(argv1: Array[String]): Kof =
  Kof(argv1(0), argv1(1), argv1(2), argv1(3), argv1(4),
      argv1(5), argv1(6), argv1(7), argv1(8))
val textFile2 = sc.textFile("hdfs://hadoop-s3:8020/tmp/mefang/modify.txt")
  .map(_.split(","))
  .map(p => GetData(p))
  .toDF   // <- it breaks here with the error below
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
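A minimal sketch of the same pipeline, assuming a SparkSession named spark is in scope and Kof is defined at the top level; building the Kof directly inside the closure (or moving GetData into a standalone object) is one common way to avoid capturing a non-serializable enclosing instance:
import spark.implicits._

val df = sc.textFile("hdfs://hadoop-s3:8020/tmp/mefang/modify.txt")
  .map(_.split(","))
  .map(p => Kof(p(0), p(1), p(2), p(3), p(4), p(5), p(6), p(7), p(8)))  // construct the case class directly in the closure
  .toDF()
df.show()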

Related

Couldn't find a suitable constructor for class TunnelData to initialize with {}, null, null, null

I just tried the project https://github.com/tark/wireguard-flutter-poc.
When running a debug build it works fine, but a release build shows the error: Couldn't find a suitable constructor for class TunnelData to initialize with {}, null, null, null
This is the code:
class TunnelData(
val name: String,
val address: String,
val listenPort: String,
val dnsServer: String,
val privateKey: String,
val peerAllowedIp: String,
val peerPublicKey: String,
val peerEndpoint: String,
val preSharedKey: String
)

Failed to read HTTP message error due to null value passed to id : spring-boot, kotlin, mongodb

I'm completely new to Kotlin programming and MongoDB. I'm defining a data class in which all the data fields are non-nullable and all the fields are val:
data class Order(
@Id
val id: String,
val customerId: String,
val externalTransactionId : String,
val quoteId :String,
val manifestItems : List<ManifestItem>,
val externalTokenNumber : String,
val deliveryId : String,
val quoteCreatedTime: String,
val deliveryCreatedTime: String,
val status : String,
val deliveryInfo: DeliveryInfo,
val pickupInfo: PickupInfo,
val riderId : String,
val currency : String,
val expiryTime : String,
val trackingUrl : String,
val complete:Boolean,
val updated:String
)
and I'm sending an HTTP request with the following body:
{
"pickupAddress":"101 Market St, San Francisco, CA 94105",
"deliveryAddress":"101 Market St, San Francisco, CA 94105",
"deliveryExpectedTime":"2018-07-25T23:31:38Z",
"deliveryAddressLatitude":7.234,
"deliveryAddressLongitude":80.000,
"pickupLatitude":7.344,
"pickupLongitude":8.00,
"pickupReadyTime":""
}
In my router class I read the request body into an Order object and pass it to the service class:
val request = serverRequest.awaitBody<Order>()
val quoteResponse = quoteService.createQuote(request,customerId)
In my service class I save the order to the database:
suspend fun createQuote(order: Order, customerId: String): QuoteResponse {
ordersRepo.save(order).awaitFirst()
//generating quote response here
return quoteResponse
}
The id is generated at the database, and I'm getting this kind of error when sending the request:
org.springframework.web.server.ServerWebInputException: 400 BAD_REQUEST "Failed to read HTTP message"; nested exception is org.springframework.core.codec.DecodingException: JSON decoding error: Instantiation of [simple type, class basepackage.repo.Order] value failed for JSON property id due to missing (therefore NULL) value for creator parameter id which is a non-nullable type; nested exception is com.fasterxml.jackson.module.kotlin.MissingKotlinParameterException: Instantiation of [simple type, class basepackage.repo.Order] value failed for JSON property id due to missing (therefore NULL) value for creator parameter id which is a non-nullable type
at [Source: (org.springframework.core.io.buffer.DefaultDataBuffer$DefaultDataBufferInputStream); line: 12, column: 1] (through reference chain: basepackage.repo.Order["id"])
How do I overcome this problem?
If you are using MongoDB, you need to use ObjectId instead of String; it cannot be generated by itself, and you would need to create it every time! The following code is probably a solution:
data class Order constructor(
@Id
@field:JsonSerialize(using = ToStringSerializer::class)
@field:JsonProperty(access = JsonProperty.Access.READ_ONLY)
var id: ObjectId?,
val customerId: String,
val externalTransactionId: String,
val quoteId: String,
val manifestItems: List<ManifestItem>,
val externalTokenNumber: String,
val deliveryId: String,
val quoteCreatedTime: String,
val deliveryCreatedTime: String,
val status: String,
val deliveryInfo: DeliveryInfo,
val pickupInfo: PickupInfo,
val riderId: String,
val currency: String,
val expiryTime: String,
val trackingUrl: String,
val complete: Boolean,
val updated: String
)
I converted val to var so that a value can be set on it after the object is created, and set access to READ_ONLY so that Jackson no longer expects id as input.
ToStringSerializer is optional, if you'd like the id serialized as a String.
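For completeness, a sketch of the imports the class above relies on (package names assumed from the standard Spring Data MongoDB, Jackson, and BSON artifacts; adjust to your project):
import com.fasterxml.jackson.annotation.JsonProperty
import com.fasterxml.jackson.databind.annotation.JsonSerialize
import com.fasterxml.jackson.databind.ser.std.ToStringSerializer
import org.bson.types.ObjectId
import org.springframework.data.annotation.Id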

Consuming RESTful API and converting to Dataframe in Apache Spark

I am trying to convert the output of a URL from a RESTful API directly to a DataFrame in the following way:
package trials
import org.apache.spark.sql.SparkSession
import org.json4s.jackson.JsonMethods.parse
import scala.io.Source.fromURL
object DEF {
implicit val formats = org.json4s.DefaultFormats
case class Result(success: Boolean,
message: String,
result: Array[Markets])
case class Markets(
MarketCurrency:String,
BaseCurrency:String,
MarketCurrencyLong:String,
BaseCurrencyLong:String,
MinTradeSize:Double,
MarketName:String,
IsActive:Boolean,
Created:String,
Notice:String,
IsSponsored:String,
LogoUrl:String
)
def main(args: Array[String]): Unit = {
val spark = SparkSession
.builder()
.appName(s"${this.getClass.getSimpleName}")
.config("spark.sql.shuffle.partitions", "4")
.master("local[*]")
.getOrCreate()
import spark.implicits._
val parsedData = parse(fromURL("https://bittrex.com/api/v1.1/public/getmarkets").mkString).extract[Array[Result]]
val mySourceDataset = spark.createDataset(parsedData)
mySourceDataset.printSchema
mySourceDataset.show()
}
}
The error is as follows and it repeats for every record:
Caused by: org.json4s.package$MappingException: Expected collection but got JObject(List((success,JBool(true)), (message,JString()), (result,JArray(List(JObject(List((MarketCurrency,JString(LTC)), (BaseCurrency,JString(BTC)), (MarketCurrencyLong,JString(Litecoin)), (BaseCurrencyLong,JString(Bitcoin)), (MinTradeSize,JDouble(0.01435906)), (MarketName,JString(BTC-LTC)), (IsActive,JBool(true)), (Created,JString(2014-02-13T00:00:00)), (Notice,JNull), (IsSponsored,JNull), (LogoUrl,JString(https://bittrexblobstorage.blob.core.windows.net/public/6defbc41-582d-47a6-bb2e-d0fa88663524.png))))))))) and mapping Result[][Result, Result]
at org.json4s.reflect.package$.fail(package.scala:96)
The structure of the JSON returned from this URL is:
{
"success": boolean,
"message": string,
"result": [ ... ]
}
So the Result class should be aligned with this structure:
case class Result(success: Boolean,
message: String,
result: List[Markets])
Update
I also slightly refined the Markets class:
case class Markets(
MarketCurrency: String,
BaseCurrency: String,
MarketCurrencyLong: String,
BaseCurrencyLong: String,
MinTradeSize: Double,
MarketName: String,
IsActive: Boolean,
Created: String,
Notice: Option[String],
IsSponsored: Option[Boolean],
LogoUrl: String
)
End-of-update
But the main issue is in the extraction of the main data part from the parsed JSON:
val parsedData = parse(fromURL("{url}").mkString).extract[Array[Result]]
The root of the returned structure is not an array, but corresponds to Result. So it should be:
val parsedData = parse(fromURL("{url}").mkString).extract[Result]
Then, I suppose that you need to load into the DataFrame not the wrapper, but rather the Markets that are inside. That is why it should be loaded like this:
val mySourceDataset = spark.createDataset(parsedData.result)
And it finally produces the DataFrame:
+--------------+------------+------------------+----------------+------------+----------+--------+-------------------+------+-----------+--------------------+
|MarketCurrency|BaseCurrency|MarketCurrencyLong|BaseCurrencyLong|MinTradeSize|MarketName|IsActive| Created|Notice|IsSponsored| LogoUrl|
+--------------+------------+------------------+----------------+------------+----------+--------+-------------------+------+-----------+--------------------+
| LTC| BTC| Litecoin| Bitcoin| 0.01435906| BTC-LTC| true|2014-02-13T00:00:00| null| null|https://bittrexbl...|
| DOGE| BTC| Dogecoin| Bitcoin|396.82539683| BTC-DOGE| true|2014-02-13T00:00:00| null| null|https://bittrexbl...|
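Putting the pieces together, the only changes relative to the question's main method are the type passed to extract and the value passed to createDataset (a sketch reusing the corrected case classes above):
import spark.implicits._
// the root of the JSON is a single object, so extract a Result rather than an Array[Result]
val parsedData = parse(fromURL("https://bittrex.com/api/v1.1/public/getmarkets").mkString).extract[Result]
// load the inner Markets list, not the wrapper, into the Dataset
val mySourceDataset = spark.createDataset(parsedData.result)
mySourceDataset.printSchema
mySourceDataset.show()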

Spark DataFrame not supporting Char datatype

I am creating a Spark DataFrame from a text file, say an Employee file which contains String, Int, and Char fields.
I created a class:
case class Emp (
Name: String,
eid: Int,
Age: Int,
Sex: Char,
Sal: Int,
City: String)
I created RDD1 using split, then created RDD2:
val textFileRDD2 = textFileRDD1.map(attributes => Emp(
attributes(0),
attributes(1).toInt,
attributes(2).toInt,
attributes(3).charAt(0),
attributes(4).toInt,
attributes(5)))
And the final RDD as:
finalRDD = textFileRDD2.toDF
When I create the final RDD it throws the error:
java.lang.UnsupportedOperationException: No Encoder found for scala.Char
Can anyone help me understand why, and how to resolve it?
Spark SQL doesn't provide Encoders for Char and generic Encoders are not very useful.
You can either use a StringType:
attributes(3).slice(0, 1)
or a ShortType (or BooleanType / ByteType if you accept only a binary response):
attributes(3)(0) match {
case 'F' => 1: Short
...
case _ => 0: Short
}
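For example, a minimal sketch with Sex kept as a one-character String (field and variable names taken from the question; spark.implicits._ is assumed to be in scope for toDF):
case class Emp(
  Name: String,
  eid: Int,
  Age: Int,
  Sex: String,   // String instead of Char, since Spark SQL has no Char encoder
  Sal: Int,
  City: String)

val textFileRDD2 = textFileRDD1.map(attributes => Emp(
  attributes(0),
  attributes(1).toInt,
  attributes(2).toInt,
  attributes(3).slice(0, 1),   // keep only the first character, as a String
  attributes(4).toInt,
  attributes(5)))

val finalDF = textFileRDD2.toDF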

Play 2.2 Scala class serializable for cache

I have a Scala class from a play-mongojack example. It works fine. However, when I try to save it in the Play EhCache, it throws a NotSerializableException. How can I make this class serializable?
class BlogPost(@ObjectId @Id val id: String,
               @BeanProperty @JsonProperty("date") val date: Date,
               @BeanProperty @JsonProperty("title") val title: String,
               @BeanProperty @JsonProperty("author") val author: String,
               @BeanProperty @JsonProperty("content") val content: String) {
  @ObjectId @Id @BeanProperty var blogId: String = _
  @BeanProperty @JsonProperty("uploadedFile") var uploadedFile: Option[(String, String, Long)] = None
}
object BlogPost {
def apply(
date: Date,
title: String,
author: String,
content: String): BlogPost = new BlogPost(date,title,author,content)
def unapply(e: Event) =
new Some((e.messageId,
e.date,
e.title,
e.author,
e.content,
e.blogId,
e.uploadedFile) )
private lazy val db = MongoDB.collection("blogposts", classOf[BlogPost], classOf[String])
def save(blogPost: BlogPost) { db.save(blogPost) }
def findByAuthor(author: String) = db.find().is("author", author).asScala
}
Saving to cache:
var latestBlogs = List[BlogPost]()
Cache.set("latestBlogs", latestBlogs, 30)
It throws an exception:
[error] n.s.e.s.d.DiskStorageFactory - Disk Write of latestBlogs failed:
java.io.NotSerializableException: BlogPost
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183) ~[na:1.7.0_45]
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) ~[na:1.7.0_45]
at java.util.ArrayList.writeObject(ArrayList.java:742) ~[na:1.7.0_45]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.7.0_45]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[na:1.7.0_45]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_45]
EDIT 1:
I tried extending the object with Serializable; it doesn't work.
object BlogPost extends Serializable {}
EDIT 2:
vitalii's comment works for me.
class BlogPost() extends scala.Serializable {}
Try to derive the class BlogPost from Serializable, or define it as a case class, which is serializable by default.
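For example, a minimal sketch of the class declaration with Serializable mixed in (constructor parameters and body unchanged from the question):
class BlogPost(@ObjectId @Id val id: String,
               @BeanProperty @JsonProperty("date") val date: Date,
               @BeanProperty @JsonProperty("title") val title: String,
               @BeanProperty @JsonProperty("author") val author: String,
               @BeanProperty @JsonProperty("content") val content: String)
  extends Serializable {
  // fields and companion object exactly as in the question
}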