Spark unit test -- Mock Azure SQL JDBC connection - Scala

I want to unit test the piece of code below so that I can get good code coverage. I am using FunSuite with Mockito. Can you please let me know how I can mock the database connection and do the unit testing?
def getSummaryConfig(): Config = {
  Config(Map(
    "url" -> configUtil.getProperty("azure.host.name"),
    "databaseName" -> configUtil.getProperty("azure.database.name"),
    "dbTable" -> configUtil.getProperty("azure.summary.table"),
    "user" -> configUtil.getProperty("azure.user.name"),
    "password" -> configUtil.getProperty("azure.database.password")
  ))
}

def getSummaryDF(summaryConfig: Config): DataFrame = {
  val summaryDF = spark.read.option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver").sqlDB(summaryConfig)
  summaryDF
}

val summaryConfig = getSummaryConfig()
val summaryDF = getSummaryDF(summaryConfig)
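One way to make this unit-testable without a live connection is to inject configUtil as a dependency and hide the spark.read...sqlDB call behind a reader function, then mock those seams with Mockito. The following is only a sketch under assumptions: ConfigUtil, SummaryJob and readSummary are hypothetical names (not from the original code), the suite uses org.scalatest.funsuite.AnyFunSuite (ScalaTest 3.1+; older versions use org.scalatest.FunSuite), and mocking the DataFrame relies on Dataset not being final (a real DataFrame from a local SparkSession works just as well):

// Sketch only: ConfigUtil, SummaryJob and readSummary are hypothetical refactoring seams.
// The idea is to keep the azure-sqldb-spark call outside the unit under test so the suite
// never needs a real JDBC connection.
import org.apache.spark.sql.DataFrame
import org.mockito.Mockito.{mock, when}
import org.scalatest.funsuite.AnyFunSuite

trait ConfigUtil { def getProperty(key: String): String }

class SummaryJob(configUtil: ConfigUtil,
                 readSummary: Map[String, String] => DataFrame) {

  def getSummaryConfig(): Map[String, String] = Map(
    "url"          -> configUtil.getProperty("azure.host.name"),
    "databaseName" -> configUtil.getProperty("azure.database.name"),
    "dbTable"      -> configUtil.getProperty("azure.summary.table"),
    "user"         -> configUtil.getProperty("azure.user.name"),
    "password"     -> configUtil.getProperty("azure.database.password")
  )

  // In production this would be: config => spark.read.option(...).sqlDB(Config(config))
  def getSummaryDF(config: Map[String, String]): DataFrame = readSummary(config)
}

class SummaryJobSuite extends AnyFunSuite {
  test("builds the JDBC config from properties and hands it to the reader") {
    val configUtil = mock(classOf[ConfigUtil])
    when(configUtil.getProperty("azure.host.name")).thenReturn("fake-host")
    when(configUtil.getProperty("azure.database.name")).thenReturn("fake-db")
    when(configUtil.getProperty("azure.summary.table")).thenReturn("dbo.summary")
    when(configUtil.getProperty("azure.user.name")).thenReturn("user")
    when(configUtil.getProperty("azure.database.password")).thenReturn("secret")

    val fakeDF = mock(classOf[DataFrame]) // or build a small real DataFrame with a local SparkSession
    val job = new SummaryJob(configUtil, config => {
      assert(config("databaseName") == "fake-db") // the reader sees the mocked properties
      fakeDF
    })

    assert(job.getSummaryDF(job.getSummaryConfig()) eq fakeDF)
  }
}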

Related

Deserialization of JSON using Jackson in Scala

I am trying to deserialize the JSON string below into a Scala object using the Jackson JSON API:
{ "Domain1": { "data-file": "dataFile1", "filter": {
"affected-object": "AffectedObject1", "affected-nd":
"AffectedNd1" } }, "Domain2": { "data-file": "dataFile2",
"filter": { "affected-ci": "AffectedCI2", "affected-net":
"AffectedNet2" } } }
I tried using a case class, and first passed classOf as the ValueType of the readValue method, but the output is a Map of Maps; the data is not converted into the case class.
case class CrossDomainFilterObj(@JsonProperty("data-file") dataFile: String,
                                @JsonProperty("filter") filter: Map[String, String])

val jsonString = "{\"Domain1\": {\"data-file\": \"dataFile1\", \"filter\": { \"affected-object\": \"AffectedObject1\", \"affected-nd\" : \"AffectedNd1\"}},\"Domain2\": {\"data-file\": \"dataFile2\", \"filter\": { \"affected-ci\":\"AffectedCI2\", \"affected-net\" : \"AffectedNet2\"}}}"

val mapper = new ObjectMapper
mapper.registerModule(DefaultScalaModule)
val data = mapper.readValue(jsonString, classOf[Map[String, CrossDomainFilterObj]])
println(data)
I am getting output like below
Map(Domain1 -> Map(data-file -> dataFile1, filter -> Map(affected-object ->
AffectedObject1, affected-nd -> AffectedNd1)), Domain2 -> Map(data-file ->
dataFile2, filter -> Map(affected-ci -> AffectedCI2,
affected-net -> AffectedNet2)))
But I am expecting an output like below
Map(Domain1 -> CrossDomainFilterObj(dataFile1, Map(affected-object ->
AffectedObject1, affected-nd -> AffectedNd1)), Domain2 ->
CrossDomainFilterObj(dataFile2, Map(affected-ci ->
AffectedCI2, affected-net -> AffectedNet2)))
Then I tried using a TypeReference as the ValueType, as shown below:
case class CrossDomainFilterObj(@JsonProperty("data-file") dataFile: String,
                                @JsonProperty("filter") filter: Map[String, String])

val jsonString = "{\"Domain1\": {\"data-file\": \"dataFile1\", \"filter\": { \"affected-object\": \"AffectedObject1\", \"affected-nd\" : \"AffectedNd1\"}},\"Domain2\": {\"data-file\": \"dataFile2\", \"filter\": { \"affected-ci\":\"AffectedCI2\", \"affected-net\" : \"AffectedNet2\"}}}"

val mapper = new ObjectMapper
mapper.registerModule(DefaultScalaModule)
val reference = new TypeReference[Map[String, CrossDomainFilterObj]] {}
val data = mapper.readValue(jsonString, reference)
println(data)
I am getting an error like the one below:
dead code following this construct
"val data = mapper.readValue(jsonString, reference)"
Could someone help me identify what I am doing wrong here?
Just make sure you use ScalaObjectMapper:
val mapper = new ObjectMapper() with ScalaObjectMapper
Then this should work:
val data = mapper.readValue[Map[String, CrossDomainFilterObj]](jsonString)
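Putting the pieces together, a minimal sketch of the working version might look like the following. One assumption to note: the import for ScalaObjectMapper depends on the jackson-module-scala version (in older releases it lives under com.fasterxml.jackson.module.scala.experimental).

import com.fasterxml.jackson.annotation.JsonProperty
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.{DefaultScalaModule, ScalaObjectMapper}

case class CrossDomainFilterObj(@JsonProperty("data-file") dataFile: String,
                                @JsonProperty("filter") filter: Map[String, String])

val jsonString =
  """{"Domain1": {"data-file": "dataFile1", "filter": {"affected-object": "AffectedObject1", "affected-nd": "AffectedNd1"}},
    |"Domain2": {"data-file": "dataFile2", "filter": {"affected-ci": "AffectedCI2", "affected-net": "AffectedNet2"}}}""".stripMargin

val mapper = new ObjectMapper() with ScalaObjectMapper
mapper.registerModule(DefaultScalaModule)

// readValue[T](String) carries the full Map[String, CrossDomainFilterObj] type via a Manifest
// instead of an erased Class, so the values come back as case class instances rather than Maps.
val data = mapper.readValue[Map[String, CrossDomainFilterObj]](jsonString)
println(data) // Map(Domain1 -> CrossDomainFilterObj(dataFile1, Map(...)), Domain2 -> CrossDomainFilterObj(dataFile2, Map(...)))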

Scala does not write into MongoDB

I am on Ubuntu 20.04. I want to write some data in Scala to MongoDB. Here's what I have:
import org.mongodb.scala.bson.collection.immutable.{Document => MongoDocument}
import org.mongodb.scala.{MongoClient, MongoCollection, MongoDatabase}

object Application extends App {
  val mongoClient: MongoClient = MongoClient()
  // Use a Connection String
  //val mongoClient: MongoClient = MongoClient("mongodb://localhost")
  val database: MongoDatabase = mongoClient.getDatabase("mydb")
  val collection: MongoCollection[MongoDocument] = database.getCollection("user")

  val doc: MongoDocument = MongoDocument("_id" -> 0, "name" -> "MongoDB", "type" -> "database",
    "count" -> 1, "info" -> MongoDocument("x" -> 203, "y" -> 102))
  collection.insertOne(doc)

  val documents = (1 to 100) map { i: Int => MongoDocument("i" -> i) }
  collection.insertMany(documents)
}
The error (not even an error, INFO level) I get:
Nov 16, 2020 1:42:08 AM com.mongodb.diagnostics.logging.JULLogger log
INFO: Cluster created with settings {hosts=[localhost:27017],
mode=SINGLE, requiredClusterType=UNKNOWN,
serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
And nothing happens to the database. No data appears there. No errors, no insertions into Mongo, nothing.
I used primarily these sources as examples:
https://mongodb.github.io/mongo-scala-driver/2.9/getting-started/quick-tour/
https://blog.knoldus.com/how-scala-interacts-with-mongodb/
MongoDB is up and its status is active. Inserting data from the terminal works fine, so the program's behavior is strange. I've been searching everywhere on the Internet for answers but can't seem to find any. Your help will be appreciated a lot. Thank you!
Thanks to @Luis Miguel Mejía Suárez's help and to this post: Scala script wait for mongo to complete task. Here is what I have done so far: added an Observer implementation and a Promise. This is what I have now:
val mongoClient: MongoClient = MongoClient("mongodb://localhost")
val database: MongoDatabase = mongoClient.getDatabase("mydb")
val collection: MongoCollection[MongoDocument] = database.getCollection("user")
val doc: MongoDocument = MongoDocument("name" -> "MongoDB", "type" -> "database",
"count" -> 1, "info" -> MongoDocument("x" -> 203, "y" -> 102))
val observable: Observable[Completed] = collection.insertOne(doc)
val promise = Promise[Boolean]
observable.subscribe(new Observer[Completed] {
override def onNext(result: Completed): Unit = println("Inserted")
override def onError(e: Throwable): Unit = {
println("Failed")
promise.success(false)
}
override def onComplete(): Unit = {
println("Completed")
promise.success(true)
}
})
val future = promise.future
Await.result(future, Duration(5, java.util.concurrent.TimeUnit.SECONDS))
mongoClient.close()
Generally speaking, it works in most cases, though at first I didn't handle the insertMany case, where the program has to wait for the last element to be inserted, and my implementation did not seem to work there.
P.S. It turns out insertMany also works fine with this example; I had just tested it with the wrong data.
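For reference, a sketch of the same Observer/Promise pattern applied to insertMany, reusing the collection, imports and timeout from the snippet above:

val documents = (1 to 100) map { i: Int => MongoDocument("i" -> i) }
val manyPromise = Promise[Boolean]()

// insertMany also returns an Observable[Completed]; onComplete fires once the whole batch is in.
collection.insertMany(documents).subscribe(new Observer[Completed] {
  override def onNext(result: Completed): Unit = println("Inserted batch")
  override def onError(e: Throwable): Unit = { println("Failed"); manyPromise.success(false) }
  override def onComplete(): Unit = { println("Completed"); manyPromise.success(true) }
})

Await.result(manyPromise.future, Duration(5, java.util.concurrent.TimeUnit.SECONDS))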

How to call a remote SQL function inside a PySpark or Scala Databricks notebook

I am writing a Databricks Scala/Python notebook which connects to a SQL Server database,
and I want to execute a SQL Server function from the notebook with custom parameters.
import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._

val ID = "1"
val name = "A"

val config = Config(Map(
  "url"            -> "sample-p-vm.all.test.azure.com",
  "databaseName"   -> "DBsample",
  "dbTable"        -> "dbo.FN_cal_udf",
  "user"           -> "useer567",
  "password"       -> "pppp#345%",
  "connectTimeout" -> "5", // seconds
  "queryTimeout"   -> "5"  // seconds
))

val collection = sqlContext.read.sqlDB(config)
collection.show()
Here the function is FN_cal_udf, which is stored in the SQL Server database 'DBsample'.
I got the error:
jdbc.SQLServerException: Parameters were not supplied for the function
How can I pass parameters and call the SQL function inside the notebook in Scala or PySpark?
Here you can first build a query string that holds the function-call statement with the dynamic parameters,
and then use it in the config:
import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._

val ID = "1"
val name = "A"

// Build the function call as a query with the dynamic parameters spliced in
val query = "SELECT * FROM [dbo].[FN_cal_udf]('" + ID + "','" + name + "')"

val config = Config(Map(
  "url"            -> "sample-p-vm.all.test.azure.com",
  "databaseName"   -> "DBsample",
  "queryCustom"    -> query,  // run the query instead of reading the whole dbTable
  "user"           -> "useer567",
  "password"       -> "pppp#345%",
  "connectTimeout" -> "5", // seconds
  "queryTimeout"   -> "5"  // seconds
))

val collection = sqlContext.read.sqlDB(config)
collection.show()
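If the azure-sqldb-spark connector gives you trouble, the same call can usually be pushed through Spark's generic JDBC source instead. The following is only a sketch: it assumes FN_cal_udf is a table-valued function, reuses the placeholder host and credentials from the question, and relies on the "query" option that exists in Spark 2.4+ (on older versions, pass a parenthesised subquery as "dbtable" instead):

// Generic JDBC alternative (sketch): push the function call down as the query itself.
val fnQuery = s"SELECT * FROM [dbo].[FN_cal_udf]('$ID', '$name')"
val resultDF = spark.read
  .format("jdbc")
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .option("url", "jdbc:sqlserver://sample-p-vm.all.test.azure.com;databaseName=DBsample")
  .option("query", fnQuery)
  .option("user", "useer567")
  .option("password", "pppp#345%")
  .load()
resultDF.show()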

Mongo Scala Driver - Can't insert in the database

I'm practicing on a project that needs a database connection; I'm using the Play Framework combined with Scala and MongoDB.
I'm also using the Mongo Scala driver and following the documentation.
I wrote the exact same code:
println("start")
val mongoClient: MongoClient = MongoClient("mongodb://localhost:27017/Sandbox")
val database: MongoDatabase = mongoClient.getDatabase("test")
val collection: MongoCollection[Document] = database.getCollection("test")
val doc: Document = Document("_id" -> 0, "name" -> "MongoDB", "type" -> "database", "count" -> 1, "info" -> Document("x" -> 203, "y" -> 102))
collection.insertOne(doc).subscribe(new Observer[Completed] {
override def onSubscribe(subscription: Subscription): Unit = println("Subscribed")
override def onNext(result: Completed): Unit = println("Inserted")
override def onError(e: Throwable): Unit = println("Failed")
override def onComplete(): Unit = println("Completed")
})
mongoClient.close()
println("end")
Nothing is inserted into the database and the only result I get from the log is this:
start
Subscribed
end
I've been looking on Stack Overflow for similar questions, but nothing I found worked for me.
You are trying to insert the document in asynchronous mode; that is why you define the three callbacks onNext, onError and onComplete. But you don't give the insertion any time to execute before closing the client.
Try adding a timeout before closing the connection. For example, simply add
Thread.sleep(1000)
before
mongoClient.close()
Also, you don't need to redefine onSubscribe(): unless you want to manually control demand while iterating over the documents returned by your requests, the default definition of onSubscribe() works fine for trivial requests, so in your case there is no need to override it.
The following code works:
println("start")
val mongoClient: MongoClient = MongoClient("mongodb://DB01-MongoDB:27017/Sandbox")
val database: MongoDatabase = mongoClient.getDatabase("test")
val collection: MongoCollection[Document] = database.getCollection("test")
val doc: Document = Document("_id" -> 0,
"name" -> "MongoDB",
"type" -> "database",
"count" -> 1,
"info" -> Document("x" -> 203, "y" -> 102))
collection
.insertOne(doc)
.subscribe(new Observer[Completed] {
override def onNext(result: Completed): Unit = println("Inserted")
override def onError(e: Throwable): Unit = println("Failed")
override def onComplete(): Unit = println("Completed")
})
Thread.sleep(1000)
mongoClient.close()
println("end")
}
The problem was the Observer: I had imported it from org.mongodb.async.client, but the correct one is org.mongodb.scala.
Hope this helps someone else.
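For reference, a sketch of the imports that make the snippet compile against the Scala driver (package names as in the 2.x mongo-scala-driver):

import org.mongodb.scala.{Completed, Document, MongoClient, MongoCollection, MongoDatabase, Observer, Subscription}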
The above solution may work, but you have to trade 1 second every time you insert (or make any call). Another solution is to make use of the callback:
val insertObservable = collection.insertOne(doc)
insertObservable.subscribe(new Observer[Completed] {
  override def onNext(result: Completed): Unit = ()            // nothing to do per element
  override def onError(e: Throwable): Unit = println("Failed")
  override def onComplete(): Unit = mongoClient.close()        // close only after the insert has finished
})
Once the insert completes, the connection gets closed automatically, without wasting 1 second.

How to write spark DataFrames to Postgres DB

I use Spark 1.3.0.
Let's say I have a DataFrame in Spark and I need to store it in a Postgres DB (postgresql-9.2.18-1-linux-x64) on a 64-bit Ubuntu machine.
I also use postgresql9.2jdbc41.jar as the driver to connect to Postgres.
I was able to read data from the Postgres DB using the commands below:
import org.postgresql.Driver

val url = "jdbc:postgresql://localhost/postgres?user=user&password=pwd"
val driver = "org.postgresql.Driver"

val users = {
  sqlContext.load("jdbc", Map(
    "url" -> url,
    "driver" -> driver,
    "dbtable" -> "cdimemployee",
    "partitionColumn" -> "intempdimkey",
    "lowerBound" -> "0",
    "upperBound" -> "500",
    "numPartitions" -> "50"
  ))
}

val get_all_emp = users.select("*")
val empDF = get_all_emp.toDF
get_all_emp.foreach(println)
I want to write this DF back to Postgres after some processing.
Is the code below right?
empDF.write.jdbc("jdbc:postgresql://localhost/postgres", "test", Map("user" -> "user", "password" -> "pwd"))
Any pointers (Scala) would be helpful.
You should follow the code below.
import java.util.Properties
import org.apache.spark.sql.SaveMode

val database = jobConfig.getString("database")
val url: String = s"jdbc:postgresql://localhost/$database"
val tableName: String = jobConfig.getString("tableName")
val user: String = jobConfig.getString("user")
val password: String = jobConfig.getString("password")
val sql = jobConfig.getString("sql")

val df = sqlContext.sql(sql) // use the SQLContext (or SparkSession) available in your job

val properties = new Properties()
properties.setProperty("user", user)
properties.setProperty("password", password)
properties.put("driver", "org.postgresql.Driver")

df.write.mode(SaveMode.Overwrite).jdbc(url, tableName, properties)
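One caveat, since the question mentions Spark 1.3.0: the DataFrameWriter API (df.write.jdbc) only arrived in Spark 1.4. On 1.3 the equivalent calls live directly on DataFrame; a sketch under that assumption, reusing url and empDF from the question:

// Spark 1.3.x only: these methods were deprecated in 1.4 in favour of df.write.jdbc
empDF.createJDBCTable(url, "test", false) // create the table "test" and insert the rows (allowExisting = false)
// or, if the table already exists:
empDF.insertIntoJDBC(url, "test", false)  // append without overwriting (overwrite = false)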