Struggling to read a CSV file - Scala

I am struggling to figure out how I can work with CSV files in Scala. I have this code and have tried various approaches to reading the CSV, but have not been able to actually read a file at all. The //read data part is my latest attempt, and it gives me: object Source is not a member of package org.apache.http.io
[error] val filename = io.Source.fromFile("countries.csv")
Can someone help me out? What am I doing wrong?
import java.io._
import org.apache.commons._
import org.apache.http._
import org.apache.http.client._
import org.apache.http.client.methods.HttpPost
import org.apache.http.impl.client.DefaultHttpClient
import java.util.ArrayList
import org.apache.http.message.BasicNameValuePair
import org.apache.http.client.entity.UrlEncodedFormEntity
import com.google.gson.Gson
import scala.io.Source
case class Teacher(firstName: String, lastName: String, age: Int)
//case class Country(name: String, region: String, population: Int, area: Int, popDensity: Int, coastline: Int, netmigration: Int, infantMortality: Int, gdp: Int, literacy: Int, phones: Int, arable: Int, crops: Int, other: Int, climate: Int, birthrate: Int,deathrate: Int, agriculture: Int, industry: Int, service: Int)
object HttpJsonPostTest extends App {
//read data
val filename = io.Source.fromFile("countries.csv")
for(line <- filename.getLines){
val cols = line.split(",").map(_.trim)
println(s"${cols(0)}|${cols(1)}|${cols(2)}|${cols(3)}")
}
// create our object as a json string
val marius = new Teacher("Lucas", "Geitle", 32)
val mariusAsJson = new Gson().toJson(marius)
// add name value pairs to a post object
val post = new HttpPost("http://127.0.0.1:2379/v2/keys/big_data_course1")
val nameValuePairs = new ArrayList[NameValuePair]()
nameValuePairs.add(new BasicNameValuePair("value", mariusAsJson))
post.setEntity(new UrlEncodedFormEntity(nameValuePairs))
// send the post request
val client = new DefaultHttpClient
val response = client.execute(post)
println("--- HEADERS ---")
response.getAllHeaders.foreach(arg => println(arg))
}
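A likely cause of the error, for what it's worth: the wildcard import org.apache.http._ puts the package org.apache.http.io in scope, so the relative reference io.Source no longer resolves to scala.io.Source. Using the Source already imported from scala.io (or fully qualifying it) avoids the clash. A minimal sketch of the read loop under that assumption, with the file name and column count taken from the question:

// Fully qualify (or just write Source.fromFile) so the reference cannot be
// captured by org.apache.http.io from the wildcard import above.
val source = scala.io.Source.fromFile("countries.csv") // assumes the file sits in the working directory
try {
  for (line <- source.getLines()) {
    val cols = line.split(",").map(_.trim)
    println(s"${cols(0)}|${cols(1)}|${cols(2)}|${cols(3)}") // assumes at least four columns per row
  }
} finally {
  source.close() // release the file handle once the lines have been consumed
}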

Related

Map different value to the case class property during serialization and deserialization using Jackson

I am trying to deserialize this JSON using the Jackson library -
{
"name": "abc",
"ageInInt": 30
}
To the case class Person
case class Person(name: String, @JsonProperty(value = "ageInInt") @JsonAlias(Array("ageInInt")) age: Int)
but I am getting -
No usable value for age
Did not find value which can be converted into int
org.json4s.package$MappingException: No usable value for age
Did not find value which can be converted into int
Basically, I want to deserialize the JSON whose key ageInInt maps to the case class field age.
Here is the complete code -
val json =
"""{
|"name": "Tausif",
|"ageInInt": 30
|}""".stripMargin
implicit val format = DefaultFormats
println(Serialization.read[Person](json))
You need to register DefaultScalaModule with your JsonMapper.
import com.fasterxml.jackson.databind.json.JsonMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.core.`type`.TypeReference
import com.fasterxml.jackson.annotation.JsonProperty
val mapper = JsonMapper.builder()
.addModule(DefaultScalaModule)
.build()
case class Person(name: String, @JsonProperty(value = "ageInInt") age: Int)
val json =
"""{
|"name": "Tausif",
|"ageInInt": 30
|}""".stripMargin
val person: Person = mapper.readValue(json, new TypeReference[Person]{})
println(person) // Prints Person(Tausif,30)
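As a side note, the MappingException in the question is thrown by json4s (Serialization.read), which does not look at Jackson annotations at all, so @JsonProperty and @JsonAlias have no effect there. If you prefer to stay with json4s, a minimal sketch is to rename the key before extraction (type and field names taken from the question):

import org.json4s._
import org.json4s.jackson.JsonMethods.parse

case class Person(name: String, age: Int)

implicit val formats: Formats = DefaultFormats

val json =
  """{
    |"name": "Tausif",
    |"ageInInt": 30
    |}""".stripMargin

// Rename the JSON key so it matches the case class field, then extract as usual.
val person = parse(json).transformField { case ("ageInInt", v) => ("age", v) }.extract[Person]
println(person) // Person(Tausif,30)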

Consuming RESTful API and converting to Dataframe in Apache Spark

I am trying to convert the output of a URL from a RESTful API directly into a DataFrame in the following way:
package trials
import org.apache.spark.sql.SparkSession
import org.json4s.jackson.JsonMethods.parse
import scala.io.Source.fromURL
object DEF {
implicit val formats = org.json4s.DefaultFormats
case class Result(success: Boolean,
message: String,
result: Array[Markets])
case class Markets(
MarketCurrency:String,
BaseCurrency:String,
MarketCurrencyLong:String,
BaseCurrencyLong:String,
MinTradeSize:Double,
MarketName:String,
IsActive:Boolean,
Created:String,
Notice:String,
IsSponsored:String,
LogoUrl:String
)
def main(args: Array[String]): Unit = {
val spark = SparkSession
.builder()
.appName(s"${this.getClass.getSimpleName}")
.config("spark.sql.shuffle.partitions", "4")
.master("local[*]")
.getOrCreate()
import spark.implicits._
val parsedData = parse(fromURL("https://bittrex.com/api/v1.1/public/getmarkets").mkString).extract[Array[Result]]
val mySourceDataset = spark.createDataset(parsedData)
mySourceDataset.printSchema
mySourceDataset.show()
}
}
The error is as follows and it repeats for every record:
Caused by: org.json4s.package$MappingException: Expected collection but got JObject(List((success,JBool(true)), (message,JString()), (result,JArray(List(JObject(List((MarketCurrency,JString(LTC)), (BaseCurrency,JString(BTC)), (MarketCurrencyLong,JString(Litecoin)), (BaseCurrencyLong,JString(Bitcoin)), (MinTradeSize,JDouble(0.01435906)), (MarketName,JString(BTC-LTC)), (IsActive,JBool(true)), (Created,JString(2014-02-13T00:00:00)), (Notice,JNull), (IsSponsored,JNull), (LogoUrl,JString(https://bittrexblobstorage.blob.core.windows.net/public/6defbc41-582d-47a6-bb2e-d0fa88663524.png))))))))) and mapping Result[][Result, Result]
at org.json4s.reflect.package$.fail(package.scala:96)
The structure of the JSON returned from this URL is:
{
"success": boolean,
"message": string,
"result": [ ... ]
}
So the Result class should be aligned with this structure:
case class Result(success: Boolean,
message: String,
result: List[Markets])
Update
I also slightly refined the Markets class:
case class Markets(
MarketCurrency: String,
BaseCurrency: String,
MarketCurrencyLong: String,
BaseCurrencyLong: String,
MinTradeSize: Double,
MarketName: String,
IsActive: Boolean,
Created: String,
Notice: Option[String],
IsSponsored: Option[Boolean],
LogoUrl: String
)
End-of-update
But the main issue is in the extraction of the main data part from the parsed JSON:
val parsedData = parse(fromURL("{url}").mkString).extract[Array[Result]]
The root of the returned structure is not an array, but corresponds to Result. So it should be:
val parsedData = parse(fromURL("{url}").mkString).extract[Result]
Then, I suppose that you do not need to load the wrapper into the DataFrame, but rather the Markets inside it. That is why it should be loaded like this:
val mySourceDataset = spark.createDataset(parsedData.result)
And it finally produces the DataFrame:
+--------------+------------+------------------+----------------+------------+----------+--------+-------------------+------+-----------+--------------------+
|MarketCurrency|BaseCurrency|MarketCurrencyLong|BaseCurrencyLong|MinTradeSize|MarketName|IsActive| Created|Notice|IsSponsored| LogoUrl|
+--------------+------------+------------------+----------------+------------+----------+--------+-------------------+------+-----------+--------------------+
| LTC| BTC| Litecoin| Bitcoin| 0.01435906| BTC-LTC| true|2014-02-13T00:00:00| null| null|https://bittrexbl...|
| DOGE| BTC| Dogecoin| Bitcoin|396.82539683| BTC-DOGE| true|2014-02-13T00:00:00| null| null|https://bittrexbl...|

Flink Scala join between two Streams doesn't seem to work

I want to join two streams (json) coming from a Kafka producer.
The code works if I filter the data, but it doesn't seem to work when I join them. I want to print the joined stream to the console, but nothing appears.
This is my code
import java.util.Properties
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010
import org.apache.flink.streaming.util.serialization.SimpleStringSchema
import org.json4s._
import org.json4s.native.JsonMethods
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time
object App {
def main(args : Array[String]) {
case class Data(location: String, timestamp: Long, measurement: Int, unit: String, accuracy: Double)
case class Sensor(sensor_name: String, start_date: String, end_date: String, data_schema: Array[String], data: Data, stt: Stt)
case class Datas(location: String, timestamp: Long, measurement: Int, unit: String, accuracy: Double)
case class Sensor2(sensor_name: String, start_date: String, end_date: String, data_schema: Array[String], data: Datas, stt: Stt)
val properties = new Properties();
properties.setProperty("bootstrap.servers", "0.0.0.0:9092");
properties.setProperty("group.id", "test");
val env = StreamExecutionEnvironment.getExecutionEnvironment
val consumer1 = new FlinkKafkaConsumer010[String]("topics1", new SimpleStringSchema(), properties)
val stream1 = env
.addSource(consumer1)
val consumer2 = new FlinkKafkaConsumer010[String]("topics2", new SimpleStringSchema(), properties)
val stream2 = env
.addSource(consumer2)
val s1 = stream1.map { x => {
implicit val formats = DefaultFormats
JsonMethods.parse(x).extract[Sensor]
}
}
val s2 = stream2.map { x => {
implicit val formats = DefaultFormats
JsonMethods.parse(x).extract[Sensor2]
}
}
val s1t = s1.assignAscendingTimestamps { x => x.data.timestamp }
val s2t = s2.assignAscendingTimestamps { x => x.data.timestamp }
val j1pre = s1t.join(s2t)
.where(_.data.unit)
.equalTo(_.data.unit)
.window(TumblingEventTimeWindows.of(Time.seconds(2L)))
.apply((g, s) => (s.sensor_name, g.sensor_name, s.data.measurement))
env.execute()
}
}
I think the problem is in the assignment of the timestamps. I think assignAscendingTimestamps on the two sources is not the right function.
The JSON produced by the Kafka producer has a field data.timestamp that should be used as the timestamp, but I don't know how to manage that.
I also thought I might have to apply a time window batch (as in Spark) to the incoming tuples, but I'm not sure this is the right solution.
I think your code needs just some minor adjustments. First of all, since you want to work in event time, you should set the appropriate TimeCharacteristic:
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
Also, the code you pasted is missing a sink for the stream. If you want to print to the console you should add:
j1pre.print
The rest of your code seems fine.
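Putting those two adjustments into the shape of the original main method (topics, case classes, and window size exactly as in the question), a minimal sketch:

import org.apache.flink.streaming.api.TimeCharacteristic

val env = StreamExecutionEnvironment.getExecutionEnvironment
// Event-time semantics, so the ascending-timestamp assigners and the event-time window take effect
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

// ... build s1t and s2t from the two Kafka consumers as in the question ...

val j1pre = s1t.join(s2t)
  .where(_.data.unit)
  .equalTo(_.data.unit)
  .window(TumblingEventTimeWindows.of(Time.seconds(2L)))
  .apply((g, s) => (s.sensor_name, g.sensor_name, s.data.measurement))

// Without a sink nothing is emitted; print() writes the joined tuples to stdout
j1pre.print()

env.execute()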

How do I turn a Scala case class into a mongo Document

I'd like to build a generic method for transforming Scala Case Classes to Mongo Documents.
A promising Document constructor is
fromSeq(ts: Seq[(String, BsonValue)]): Document
I can turn a case class into a Map[String, Any], but then I've lost the type information I need to use the implicit conversions to BsonValues. Maybe TypeTags can help with this?
Here's what I've tried:
import org.mongodb.scala.bson.BsonTransformer
import org.mongodb.scala.bson.collection.immutable.Document
import org.mongodb.scala.bson.BsonValue
case class Person(age: Int, name: String)
//transform scala values into BsonValues
def transform[T](v: T)(implicit transformer: BsonTransformer[T]): BsonValue = transformer(v)
// turn any case class into a Map[String, Any]
def caseClassToMap(cc: Product) = {
val values = cc.productIterator
cc.getClass.getDeclaredFields.map( _.getName -> values.next).toMap
}
// transform a Person into a Document
def personToDocument(person: Person): Document = {
val map = caseClassToMap(person)
val bsonValues = map.toSeq.map { case (key, value) =>
(key, transform(value))
}
Document.fromSeq(bsonValues)
}
<console>:24: error: No bson implicit transformer found for type Any. Implement or import an implicit BsonTransformer for this type.
(key, transform(value))
One option is to construct the Document directly; the field types are then statically known, so the implicit BsonTransformers can resolve:
def personToDocument(person: Person): Document = {
Document("age" -> person.age, "name" -> person.name)
}
The code below works without manual conversion of the object.
import reactivemongo.api.bson.{BSON, BSONDocument, Macros}
case class Person(name:String = "SomeName", age:Int = 20)
implicit val personHandler = Macros.handler[Person]
val bsonPerson = BSON.writeDocument[Person](Person())
println(s"${BSONDocument.pretty(bsonPerson.getOrElse(BSONDocument.empty))}")
You can use Salat https://github.com/salat/salat. A nice example can be found here - https://gist.github.com/bhameyie/8276017. This is the piece of code that will help you -
import salat._
val dBObject = grater[Artist].asDBObject(artist)
artistsCollection.save(dBObject, WriteConcern.Safe)
I was able to serialize a case class to a BsonDocument using org.bson.BsonDocumentWriter. The code below runs with Scala 2.12 and mongo-scala-driver_2.12 version 2.6.0.
My quest for this solution was aided by this answer (where they are trying to serialize in the opposite direction): Serialize to object using scala mongo driver?
import org.mongodb.scala.bson.codecs.Macros
import org.mongodb.scala.bson.codecs.DEFAULT_CODEC_REGISTRY
import org.bson.codecs.configuration.CodecRegistries.{fromRegistries, fromProviders}
import org.bson.codecs.EncoderContext
import org.bson.BsonDocumentWriter
import org.mongodb.scala.bson.BsonDocument
import org.bson.codecs.configuration.CodecRegistry
import org.bson.codecs.Codec
case class Animal(name : String, species: String, genus: String, weight: Int)
object TempApp {
def main(args: Array[String]) {
val jaguar = Animal("Jenny", "Jaguar", "Panthera", 190)
val codecProvider = Macros.createCodecProvider[Animal]()
val codecRegistry: CodecRegistry = fromRegistries(fromProviders(codecProvider), DEFAULT_CODEC_REGISTRY)
val codec = Macros.createCodec[Animal](codecRegistry)
val encoderContext = EncoderContext.builder.isEncodingCollectibleDocument(true).build()
val doc = BsonDocument()
val writer = new BsonDocumentWriter(doc) // need to call new since it is a Java class without a companion object
codec.encode(writer, jaguar, encoderContext)
print(doc)
}
};

Deserialize MongoDB Document with Scala and Jackson-Mapper leads to UnrecognizedProperty _id

I have the following class defined in Scala, using Jackson as the mapper.
package models
import play.api.Play.current
import org.codehaus.jackson.annotate.JsonProperty
import net.vz.mongodb.jackson.ObjectId
import play.modules.mongodb.jackson.MongoDB
import reflect.BeanProperty
import scala.collection.JavaConversions._
import net.vz.mongodb.jackson.Id
import org.codehaus.jackson.annotate.JsonIgnoreProperties
case class Team(
#BeanProperty #JsonProperty("teamName") var teamName: String,
#BeanProperty #JsonProperty("logo") var logo: String,
#BeanProperty #JsonProperty("location") var location: String,
#BeanProperty #JsonProperty("details") var details: String,
#BeanProperty #JsonProperty("formOfSport") var formOfSport: String)
object Team {
private lazy val db = MongoDB.collection("teams", classOf[Team], classOf[String])
def save(team: Team) { db.save(team) }
def getAll(): Iterable[Team] = {
val teams: Iterable[Team] = db.find()
return teams
}
def findOneByTeamName(teamName: String): Team = {
val team: Team = db.find().is("teamName", teamName).first
return team
}
}
Inserting into MongoDB works without problems, and an _id is automatically inserted for every document.
But now I want to try to read (or deserialize) a document, e.g. by calling findOneByTeamName. This always causes an UnrecognizedPropertyException for _id. I create the instance with Team.apply and Team.unapply. Even with my own ObjectId this doesn't work, as _id and id are treated differently.
Can anyone help with how to get the instance or how to deserialize correctly? Thanks in advance.
I am using play-mongojack. Here is my class. Your object definition is fine.
import com.fasterxml.jackson.annotation.JsonProperty
import com.fasterxml.jackson.databind.ObjectMapper
import org.mongojack.{MongoCollection, JacksonDBCollection}
import org.mongojack.ObjectId
import org.mongojack.WriteResult
import com.mongodb.BasicDBObject
import scala.reflect.BeanProperty
import javax.persistence.Id
import javax.persistence.Transient
import java.util.Date
import java.util.List
import java.lang.{ Long => JLong }
import play.mongojack.MongoDBModule
import play.mongojack.MongoDBPlugin
import scala.collection.JavaConversions._
class Event (
#BeanProperty #JsonProperty("clientMessageId") val clientMessageId: Option[String] = None,
#BeanProperty #JsonProperty("conversationId") val conversationId: String
) {
#ObjectId #Id #BeanProperty var messageId: String = _ // don't manual set messageId
#BeanProperty #JsonProperty("uploadedFile") var uploadedFile: Option[(String, String, JLong)] = None // the upload file(url,name,size)
#BeanProperty #JsonProperty("createdDate") var createdDate: Date = new Date()
#BeanProperty #Transient var cmd: Option[(String, String)] = None // the cmd(cmd,param)
def createdDateStr() = {
val format = new java.text.SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
format.format(createdDate)
}
}
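Applied to the Team class from the question, the same idea would be to give the mapper a field to bind _id to when reading documents back. A sketch, reusing the ObjectId and Id annotations the question already imports (the field name id here is just an illustration):

case class Team(
  @BeanProperty @JsonProperty("teamName") var teamName: String,
  @BeanProperty @JsonProperty("logo") var logo: String,
  @BeanProperty @JsonProperty("location") var location: String,
  @BeanProperty @JsonProperty("details") var details: String,
  @BeanProperty @JsonProperty("formOfSport") var formOfSport: String) {
  // Bound to _id by the mapper, so reading a stored document no longer fails on an unknown property
  @ObjectId @Id @BeanProperty var id: String = _
}

Alternatively, since JsonIgnoreProperties is already imported, annotating the class with @JsonIgnoreProperties(ignoreUnknown = true) would simply make Jackson skip _id on read.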