Unit test case to mock PostgreSQL Connection and Statements in Scala

I am very new to Scala and need to write a test case that mocks the PostgreSQL connections and statements. However, I am unable to do so and keep getting an error. Can anyone help me? Below is the code that I've written.
Thanks in advance!
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.sql.Column
import org.slf4j.LoggerFactory
import java.nio.file.Paths
import java.sql.ResultSet
import java.io.InputStream
import java.io.Reader
import java.util
import java.io.File
import java.util.UUID
import java.nio.file.attribute.PosixFilePermission
import com.typesafe.config.ConfigFactory
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.scalatest.{Matchers, WordSpecLike, BeforeAndAfter}
import org.scalactic.{Good, Bad, Many, One}
import scala.collection.JavaConverters._
import spark.jobserver.{SparkJobValid, SparkJobInvalid}
import spark.jobserver.api.{JobEnvironment, SingleProblem}
import org.apache.spark.sql.{Column, Row, DataFrame}
import java.sql.Connection
import java.sql.DriverManager
import org.junit.Assert
import org.junit.Before
import org.junit.Test
import org.junit.runner.RunWith
import org.easymock.EasyMock.expect
import org.powermock.api._
import org.powermock.core.classloader.annotations.PrepareForTest
import java.io.FileReader
import org.scalamock.scalatest.MockFactory
import org.powermock.api.mockito.PowerMockito
import org.powermock.api.mockito.PowerMockito._
import org.postgresql.copy.CopyManager
import scala.collection.JavaConversions._
import org.mockito.Matchers.any
import java.sql.Statement
class mockCopyManager() {
  def copyIn(command: String, fR: java.io.FileReader): Unit = {
    println("Run Command {}".format(command))
  }
}
class AdvisoretlSpec extends WordSpecLike with Matchers with MockFactory {
  val sc = SparkUnitTestContext.hiveContext
  import SparkUnitTestContext.defaultSizeInBytes

  "Class Advisoretl job" should {
    "load data in" in {
      val csvMap: Map[String, String] = Map("t1" -> "t1.csv", "t2" -> "t2.csv")
      val testObj = new Advisoretl()
      val mockStatement = mock[Statement]
      val mockConnection = mock[Connection]
      val a: String = "TRUNCATE TABLE t1"
      val b: String = "TRUNCATE TABLE t2"
      PowerMockito.mockStatic(classOf[DriverManager])
      val mockCopyManager = mock[CopyManager]
      PowerMockito.when(DriverManager.getConnection(any[String]), Nil: _*).thenReturn(mockConnection)
      (mockConnection.createStatement _).when().returns(mockStatement)
      (mockStatement.executeUpdate _).when(a).returns(1)
      (mockStatement.executeUpdate _).when("TRUNCATE TABLE t2").returns(1)
      (mockCopyManager.copyIn _).when(*).returns(1)
      val fnResult = testObj.connectionWithPostgres("a", "b", "c", "target/testdata", csvMap)
      fnResult should be ("OK")
    }
  }
}
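For comparison, here is a minimal sketch of stubbing the JDBC interfaces with plain Mockito, without any static mocking of DriverManager. It assumes the code under test can be handed the Connection directly; the runTruncate helper below is hypothetical and only stands in for that code.

import java.sql.{Connection, Statement}
import org.mockito.Mockito.{mock, verify, when}

object JdbcMockSketch extends App {

  // Hypothetical code under test: it receives a Connection instead of calling
  // DriverManager.getConnection itself, so no PowerMock static mocking is needed.
  def runTruncate(conn: Connection, table: String): Int = {
    val stmt = conn.createStatement()
    try stmt.executeUpdate(s"TRUNCATE TABLE $table")
    finally stmt.close()
  }

  val mockStatement: Statement = mock(classOf[Statement])
  val mockConnection: Connection = mock(classOf[Connection])

  when(mockConnection.createStatement()).thenReturn(mockStatement)
  when(mockStatement.executeUpdate("TRUNCATE TABLE t1")).thenReturn(1)

  assert(runTruncate(mockConnection, "t1") == 1)
  verify(mockStatement).close()
}

As a side note, mixing ScalaMock stub syntax ((obj.method _).when(...)) with PowerMockito.when on objects created by ScalaMock's mock[...] generally does not work; sticking to one mocking framework per test usually avoids this class of error.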

Related

How to store Select statement data to a var in Scala Play 2.6 Slick

I have written one SQL select query and I want to store the result returned from this query in a variable. How do I do that?
val count=(sql"""SELECT count(User_ID) from user_details_table where email=$email or Mobile_no=$Mobile_no""".as[(String)] )
val a1=Await.result(dbConfig.run(count), 1000 seconds)
Ok(Json.toJson(a1.toString()))
Here I am not able to find out the value that is returned by this query.
This is my complete code showing what I am trying to do:
import javax.inject.Inject
import com.google.gson.{FieldNamingPolicy, Gson, GsonBuilder}
import org.joda.time.format.DateTimeFormat
import org.joda.time.{DateTime, Period}
import play.api.libs.json.{JsPath, Json, Writes}
import play.api.mvc._
import slick.jdbc.GetResult
// assuming Slick 3.x with MySQL: provides Database.forURL and the sql interpolator
import slick.jdbc.MySQLProfile.api._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.concurrent.{Await, Future}
class adduserrs @Inject()(cc: ControllerComponents) extends AbstractController(cc) {

  def adduser(Name: String, Mobile_no: String, email: String, userpassword: String, usertype: String) = Action {
    import play.api.libs.json.{JsPath, JsValue, Json, Writes}
    val gson: Gson = new GsonBuilder().setFieldNamingPolicy(FieldNamingPolicy.UPPER_CAMEL_CASE).create
    val dbConfig = Database.forURL("jdbc:mysql://localhost:3306/equineapp?user=root&password=123456", driver = "com.mysql.jdbc.Driver")
    var usertypeid = 0
    if (usertype == "Owner") {
      usertypeid = 1
    } else if (usertype == "Practitioner") {
      usertypeid = 2
    }
    val count = sql"""SELECT count(User_ID) from user_details_table where email=$email or Mobile_no=$Mobile_no""".as[(String)]
    val a1 = Await.result(dbConfig.run(count), 1000 seconds)
    Ok(Json.toJson(a1.toString()))
    if (count == 0) {
      val setup1 = sql"call addusersreg ($Name,$Mobile_no,$email,$userpassword,$usertypeid);".as[(String, String, String, String, Int)]
      val res = Await.result(dbConfig.run(setup1), 1000 seconds)
      Ok(Json.toJson(1))
    } else {
      Ok(Json.toJson(0))
    }
  }
}
From the above code I am just trying to insert the user details into the database.
If the user already exists in the DB it should return the response 0, otherwise it should return the response 1.
Ok, so here you are only counting, so perhaps you just need a variable of type Long:
SQL("select count(*) from User where tel = {telephoneNumber}")
.on('telephobeNumber -> numberThatYouPassedToTheMethod).executeQuery()
.as(SqlParser.scalar[Long].single)
You just totally changed the question. Anyway, for the error you mentioned in the comment, the reason is that you have no connection and you did not define which database you want to use (default or otherwise). All the database calls must live within the following block:
db.withConnection { implicit connection =>
  // SQL queries live here.
}
Moreover, db needs to be injected if it is not the default database:
class myTestModel @Inject()(@NamedDatabase("nonDefaultDB") db: Database) { ??? }
Follow MVC architecture: for consistency with the model-view-controller architecture, all your database calls should live in model classes. The controller method then calls the model's method to get the result.
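Putting those pieces together, a minimal sketch of the model side might look like the following. It assumes Anorm (as in the snippet above) and an injected Database; the UserModel class name is only illustrative, not taken from the question.

import javax.inject.Inject
import anorm._
import play.api.db.Database

// Hypothetical model class; Play injects the Database instance.
class UserModel @Inject()(db: Database) {

  // Returns how many users match the given email or mobile number.
  def countUsers(email: String, mobileNo: String): Long =
    db.withConnection { implicit connection =>
      SQL("select count(*) from user_details_table where email = {email} or Mobile_no = {mobile}")
        .on("email" -> email, "mobile" -> mobileNo)
        .as(SqlParser.scalar[Long].single)
    }
}

The controller then simply calls countUsers and branches on the returned Long, instead of comparing an unexecuted query to 0.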

value na is not a member of?

Hello, I just started to learn Scala and am following a tutorial on Udemy.
I followed the same code, but it gives me an error.
I have no idea about that error.
This is my code:
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.sql.SparkSession
import org.apache.log4j._
import org.apache.spark.ml.feature.{CountVectorizer, CountVectorizerModel}
import org.apache.spark.ml.feature.Word2Vec
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.Row
Logger.getLogger("org").setLevel(Level.ERROR)
val spark = SparkSession.builder().getOrCreate()
val data = spark.read.option("header","true").
option("inferSchema","true").
option("delimiter","\t").
format("csv").
load("dataset.tsv").
withColumn("subject", split($"subject", " "))
val logRegDataAll = (data.select(data("label")).as("label"),$"subject")
val logRegData = logRegDataAll.na.drop()
and it gives me an error like this:
scala> :load LogisticRegression.scala
Loading LogisticRegression.scala...
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.sql.SparkSession
import org.apache.log4j._
import org.apache.spark.ml.feature.{CountVectorizer, CountVectorizerModel}
import org.apache.spark.ml.feature.Word2Vec
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.Row
spark: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@1efcba00
data: org.apache.spark.sql.DataFrame = [label: string, subject: array<string>]
logRegDataAll: (org.apache.spark.sql.Dataset[org.apache.spark.sql.Row], org.apache.spark.sql.ColumnName) = ([label: string],subject)
<console>:43: error: value na is not a member of (org.apache.spark.sql.Dataset[org.apache.spark.sql.Row], org.apache.spark.sql.ColumnName)
val logRegData = logRegDataAll.na.drop()
^
Thanks for helping.
You can see clearly
val logRegDataAll = (data.select(data("label")).as("label"),$"subject")
This returns
(org.apache.spark.sql.Dataset[org.apache.spark.sql.Row], org.apache.spark.sql.ColumnName)
So there is an extra closing parenthesis: data.select(data("label")) closes the select too early and turns the whole expression into a tuple. It should actually be data.select(data("label").as("label"), $"subject").
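With that fix the whole expression stays a DataFrame, so .na.drop() resolves. A minimal sketch of the corrected lines from the question:

val logRegDataAll = data.select(data("label").as("label"), $"subject")
val logRegData = logRegDataAll.na.drop()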

Spark JSON DStream Print() / saveAsTextFiles not working

Issue Description:
Spark Version: 1.6.2
Execution: Spark-shell (REPL) master = local[2] (tried local[*])
example.json is as below:
{"name":"D2" ,"lovesPandas":"Y"}
{"name":"D3" ,"lovesPandas":"Y"}
{"name":"D4" ,"lovesPandas":"Y"}
{"name":"D5" ,"lovesPandas":"Y"}
Code executing in Spark-shell local mode:
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.kafka._
import org.apache.spark.sql._
import org.json4s._
import org.json4s.jackson.JsonMethods._
import _root_.kafka.serializer.StringDecoder
import _root_.kafka.serializer.Decoder
import _root_.kafka.utils.VerifiableProperties
import org.apache.hadoop.hbase._
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapred.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapred.JobConf
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.StreamingContext
val ssc = new StreamingContext(sc, Seconds(2) )
val messages = ssc.textFileStream("C:\\pdtemp\\test\\example.json")
messages.print()
I tried saveAsTextFiles but it is not saving any files either.
This does not work and shows no output; I tried the same with a stream read from Kafka in spark-shell.
I tried the following too, and it does not work either:
messages.foreachRDD(rdd => rdd.foreach(print))
Also, I tried parsing the schema and converting to a DataFrame, but nothing seems to work.
Normal JSON parsing works, and I can print the contents of a normal RDD/DataFrame to the console in spark-shell.
Can anyone help, please?
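For reference, here is a minimal sketch of how textFileStream is usually wired up. Two things stand out compared to the snippet above: textFileStream watches a directory (not a single file) and only picks up files that appear there after the stream starts, and no output operation runs until ssc.start() is called. The directory path is illustrative.

import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(2))

// Monitor a directory; new files moved into it after start() will be read.
val messages = ssc.textFileStream("C:\\pdtemp\\test\\")

messages.print()

ssc.start()              // nothing executes until the context is started
ssc.awaitTermination()   // or ssc.awaitTerminationOrTimeout(...) in the shell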

Compiled Queries in Slick

I need to compile a query in Slick with Play and PostgreSQL
val bioMaterialTypes: TableQuery[Tables.BioMaterialType] = Tables.BioMaterialType
def getAllBmts() = for{ bmt <- bioMaterialTypes } yield bmt
val queryCompiled = Compiled(getAllBmts _)
but in Scala IDE I get this error at the apply of Compiled:
Multiple markers at this line
- Computation of type () => scala.slick.lifted.Query[models.Tables.BioMaterialType,models.Tables.BioMaterialTypeRow,Seq]
cannot be compiled (as type C)
- not enough arguments for method apply: (implicit compilable: scala.slick.lifted.Compilable[() =>
scala.slick.lifted.Query[models.Tables.BioMaterialType,models.Tables.BioMaterialTypeRow,Seq],C], implicit driver:
scala.slick.profile.BasicProfile)C in object Compiled. Unspecified value parameters compilable, driver.
These are my imports:
import scala.concurrent.Future
import scala.slick.jdbc.StaticQuery.staticQueryToInvoker
import scala.slick.lifted.Compiled
import scala.slick.driver.PostgresDriver
import javax.inject.Inject
import javax.inject.Singleton
import models.BioMaterialType
import models.Tables
import play.api.Application
import play.api.db.slick.Config.driver.simple.TableQuery
import play.api.db.slick.Config.driver.simple.columnExtensionMethods
import play.api.db.slick.Config.driver.simple.longColumnType
import play.api.db.slick.Config.driver.simple.queryToAppliedQueryInvoker
import play.api.db.slick.Config.driver.simple.queryToInsertInvoker
import play.api.db.slick.Config.driver.simple.stringColumnExtensionMethods
import play.api.db.slick.Config.driver.simple.stringColumnType
import play.api.db.slick.Config.driver.simple.valueToConstColumn
import play.api.db.slick.DB
import play.api.db.slick.DBAction
You can simply do
val queryCompiled = Compiled(bioMaterialTypes)
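As a usage sketch (assuming Slick 2.x with play-slick, which the imports above suggest), the compiled query can then be executed inside a session:

import play.api.Play.current
import play.api.db.slick.DB
import play.api.db.slick.Config.driver.simple._

val queryCompiled = Compiled(bioMaterialTypes)

val allBmts = DB.withSession { implicit session =>
  queryCompiled.run // materializes all BioMaterialTypeRow rows
}

Compiled caches the generated statement, so the query-compilation cost is paid once rather than on every call.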

How can I load Avros in Spark using the schema on-board the Avro file(s)?

I am running CDH 4.4 with Spark 0.9.0 from a Cloudera parcel.
I have a bunch of Avro files that were created via Pig's AvroStorage UDF. I want to load these files in Spark, using a generic record or the schema onboard the Avro files. So far I've tried this:
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable
import org.apache.commons.lang.StringEscapeUtils.escapeCsv
import org.apache.hadoop.fs.Path
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.conf.Configuration
import java.net.URI
import java.io.BufferedInputStream
import java.io.File
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.specific.SpecificDatumReader
import org.apache.avro.file.DataFileStream
import org.apache.avro.io.DatumReader
import org.apache.avro.file.DataFileReader
import org.apache.avro.mapred.FsInput
val input = "hdfs://hivecluster2/securityx/web_proxy_mef/2014/05/29/22/part-m-00016.avro"
val inURI = new URI(input)
val inPath = new Path(inURI)
val fsInput = new FsInput(inPath, sc.hadoopConfiguration)
val reader = new GenericDatumReader[GenericRecord]
val dataFileReader = DataFileReader.openReader(fsInput, reader)
val schemaString = dataFileReader.getSchema
val buf = scala.collection.mutable.ListBuffer.empty[GenericRecord]
while (dataFileReader.hasNext) {
  buf += dataFileReader.next
}
sc.parallelize(buf)
This works for one file, but it can't scale - I am loading all the data into local RAM and then distributing it across the spark nodes from there.
To answer my own question:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapred.AvroInputFormat
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable
import org.apache.commons.lang.StringEscapeUtils.escapeCsv
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path
import org.apache.hadoop.conf.Configuration
import java.io.BufferedInputStream
import org.apache.avro.file.DataFileStream
import org.apache.avro.io.DatumReader
import org.apache.avro.file.DataFileReader
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.mapred.FsInput
import org.apache.avro.Schema
import org.apache.avro.Schema.Parser
import org.apache.hadoop.mapred.JobConf
import java.io.File
import java.net.URI
// spark-shell -usejavacp -classpath "*.jar"
val input = "hdfs://hivecluster2/securityx/web_proxy_mef/2014/05/29/22/part-m-00016.avro"
val jobConf= new JobConf(sc.hadoopConfiguration)
val rdd = sc.hadoopFile(
  input,
  classOf[org.apache.avro.mapred.AvroInputFormat[GenericRecord]],
  classOf[org.apache.avro.mapred.AvroWrapper[GenericRecord]],
  classOf[org.apache.hadoop.io.NullWritable],
  10)
val f1 = rdd.first
val a = f1._1.datum
a.get("rawLog") // Access avro fields
This works for me:
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.{AvroInputFormat, AvroWrapper}
import org.apache.hadoop.io.NullWritable
...
val path = "hdfs:///path/to/your/avro/folder"
val avroRDD = sc.hadoopFile[AvroWrapper[GenericRecord], NullWritable, AvroInputFormat[GenericRecord]](path)
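From there, a hedged sketch of pulling fields out of the wrapped records; the "rawLog" field name is just the one used in the earlier snippet, so substitute fields from your own schema:

// Each element is an (AvroWrapper[GenericRecord], NullWritable) pair;
// .datum unwraps the GenericRecord so individual Avro fields can be read.
val rawLogs = avroRDD.map { case (wrapper, _) =>
  Option(wrapper.datum.get("rawLog")).map(_.toString).orNull
}
rawLogs.take(5).foreach(println)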