Multiple random UUIDs but the UUID value is the same - Scala

case class ConversationId(value: UUID)
case class CustomerRequestId(uuid: UUID)

class CustomerRequestReceiverController {
  def createCustomerRequest(req: Request, optConversationId: Option[ConversationId] = None): Task[Response] = {
    val NONE_REVISION = 0
    val customerRequestId = CustomerRequestId(RandomUUID.randomUUID)
    val result: Task[Response] = for {
      form <- req.as(jsonOf[CreateCustomerRequestFormInternalApiV2])
      _ = println(s"Conversation Id ${form.conversationId.getOrElse(ConversationId(RandomUUID.randomUUID))}")
      _ = appLogger.info(s"creating CustomerRequest ${form.customerId.value} with Conversation ${optConversationId.getOrElse(form.conversationId.getOrElse(RandomUUID.randomUUID))}")
      _ = logger.info(s"creating CustomerRequest ${form.customerId.value} with Conversation ${optConversationId.getOrElse(form.conversationId.getOrElse(RandomUUID.randomUUID))}")
      createCommand <- Task.delay(CreateCustomerRequest(
        customerRequestId = customerRequestId,
        communication = Communication.Chat(optConversationId.getOrElse(form.conversationId.getOrElse(ConversationId(RandomUUID.randomUUID))))
        // ... other fields elided
      ))
      commandResult <- Task.delay(createCommandHandler.handle(createCommand))
      response <- responseFormat(commandResult)
    } yield response
    result.handleWith(invalidRequestBody)
  }
}
I expect the value output from UUID.randomUUID() to be the same UUID everywhere, but the actual output is a different value each time.

Replace the line
  customerRequestId = customerRequestId,
with
  customerRequestId = CustomerRequestId(RandomUUID.randomUUID)
As written, you generate the CustomerRequestId only once, when the method is called, and then reuse that same value on every execution of the Task; generating it inside the for-comprehension yields a fresh value per run.
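The underlying distinction is between a value computed eagerly when the method is called and a value computed each time the Task runs. A minimal sketch of the difference, assuming a monix-style Task (an assumption; the Task in the question may come from another library, but Task.delay defers evaluation in the same way):

import java.util.UUID
import monix.eval.Task
import monix.execution.Scheduler.Implicits.global

val eagerId = UUID.randomUUID()               // computed exactly once, at this line
val deferred = Task.delay(UUID.randomUUID())  // computed anew on every run

val a = deferred.runSyncUnsafe()
val b = deferred.runSyncUnsafe()
println(a == b)    // false: each run of the Task yields a fresh UUID
println(eagerId)   // the same value no matter how often it is referenced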

Use a functional construct to create a List of classes from a REST response

I wrote the code below to:
Query a REST endpoint
Create an instance of the case class Pair from the name and value fields in the response
Add each Pair instance to the list lb.
This code is very imperative:
case class Pair(name: String, value: Double, update: String)
var lb = new ListBuffer[Pair]()
for (elem <- Data.updates) {
  try {
    val request = "http://...../values/" + elem
    val response = scala.io.Source.fromURL(request).mkString
    val jsonObject = Json.parse(response)
    val value = (jsonObject \ "value").get.toString.toDouble
    val update = (jsonObject \ "update").get.toString
    lb += Pair(elem, value, update)
  } catch {
    case e: IOException => println("Exception occurred accessing data for " + elem)
  }
}
The above code provides the context of the problem. I've refactored it into a block that runs on Scastie; it has some changes, but hopefully it conveys the same problem:
import play.api.libs.json.{Json, __}
val updates = List("test1", "test2")
case class Pair(name: String, value: Double, update: String)
var lb = new scala.collection.mutable.ListBuffer[Pair]()
for (elem <- updates) {
  val request = "http://www.google.com" + elem
  val response: String = "{\"value\" : 1 , \"update\" : \"test\"}"
  val jsonObject = Json.parse(response)
  val value = (jsonObject \ "value").get.toString().toDouble
  val update = (jsonObject \ "update").get.toString()
  lb += Pair(elem, value, update)
}
println(lb.toList)
Scastie: https://scastie.scala-lang.org/RmBUUem5SxGT2dYLIqLryQ
How do I replace the ListBuffer with a List and populate it using a functional construct instead of a loop?
Just use .map:
val lb = updates.map { elem =>
  ...
  Pair(elem, value, update)
}
I'd first define a method:
def getPairFromElem(elem: String): Pair = {
  val request = "http://www.google.com" + elem
  val response: String = "{\"value\" : 1 , \"update\" : \"test\"}"
  val jsonObject = Json.parse(response)
  val value = (jsonObject \ "value").get.toString().toDouble
  val update = (jsonObject \ "update").get.toString()
  Pair(elem, value, update)
}
Then use it:
val updates = List("test1" , "test2")
val lb = updates.map(getPairFromElem)
Code run at Scastie.
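One more note: .map drops the IOException handling that the original loop had. A sketch of keeping it functional with scala.util.Try, so failed elements are skipped instead of aborting the whole list (the URL is the placeholder from the question):

import scala.util.{Try, Success, Failure}

def fetchPair(elem: String): Try[Pair] = Try {
  val response = scala.io.Source.fromURL("http://...../values/" + elem).mkString
  val jsonObject = Json.parse(response)
  Pair(elem,
    (jsonObject \ "value").get.toString.toDouble,
    (jsonObject \ "update").get.toString)
}

val pairs: List[Pair] = updates.flatMap { elem =>
  fetchPair(elem) match {
    case Success(p) => Some(p)
    case Failure(e) =>
      println("Exception occurred accessing data for " + elem)
      None
  }
}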

How to create a DataFrame using a string consisting of key-value pairs?

I'm getting logs from a firewall in CEF format as a string, which looks like this:
ABC|XYZ|F123|1.0|DSE|DSE|4|externalId=e705265d0d9e4d4fcb218b cn2=329160 cn1=3053998 dhost=SRV2019 duser=admin msg=Process accessed NTDS fname=ntdsutil.exe filePath=\\Device\\HarddiskVolume2\\Windows\\System32 cs5="C:\\Windows\\system32\\ntdsutil.exe" "ac i ntds" ifm "create full ntdstest3" q q fileHash=80c8b68240a95 dntdom=adminDomain cn3=13311 rt=1610948650000 tactic=Credential Access technique=Credential Dumping objective=Gain Access patternDisposition=Detection. outcome=0
How can I create a DataFrame from this kind of string, where the key-value pairs are separated by = ?
My objective is to infer the schema from this string dynamically using the keys, i.e. extract the keys from the left side of the = operator and create a schema from them.
What I'm doing currently is pretty lame (IMHO) and not very dynamic, because the number of key-value pairs can change across different types of logs:
val a: String = "ABC|XYZ|F123|1.0|DSE|DCE|4|externalId=e705265d0d9e4d4fcb218b cn2=329160 cn1=3053998 dhost=SRV2019 duser=admin msg=Process accessed NTDS fname=ntdsutil.exe filePath=\\Device\\HarddiskVolume2\\Windows\\System32 cs5="C:\\Windows\\system32\\ntdsutil.exe" "ac i ntds" ifm "create full ntdstest3" q q fileHash=80c8b68240a95 dntdom=adminDomain cn3=13311 rt=1610948650000 tactic=Credential Access technique=Credential Dumping objective=Gain Access patternDisposition=Detection. outcome=0"
val ttype: String = "DCE"
type parseReturn = (String,String,List[String],Int)
def cefParser(a: String, ttype: String): parseReturn = {
  val firstPart = a.split("\\|")
  var pD = new ListBuffer[String]()
  var listSize: Int = 0
  if (firstPart.size == 8 && firstPart(4) == ttype) {
    pD += firstPart(0)
    pD += firstPart(1)
    pD += firstPart(2)
    pD += firstPart(3)
    pD += firstPart(4)
    pD += firstPart(5)
    pD += firstPart(6)
    val secondPart = parseSecondPart(firstPart(7), ttype)
    pD ++= secondPart
    listSize = pD.toList.length
    (firstPart(2), ttype, pD.toList, listSize)
  } else {
    val temp: List[String] = List(a)
    (firstPart(2), "IRRELEVANT", temp, temp.length)
  }
}
The method parseSecondPart is:
def parseSecondPart(m: String, ttype: String): ListBuffer[String] = ttype match {
  case auditActivity.ttype => parseAuditEvent(m)
  // ... other cases elided
}
Another function just replaces some text in the logs:
def parseAuditEvent(msg: String): ListBuffer[String] = {
  val updated_msg = msg.replace("cat=", "metadata_event_type=")
    .replace("destinationtranslatedaddress=", "event_user_ip=")
    .replace("duser=", "event_user_id=")
    .replace("deviceprocessname=", "event_service_name=")
    .replace("cn3=", "metadata_offset=")
    .replace("outcome=", "event_success=")
    .replace("devicecustomdate1=", "event_utc_timestamp=")
    .replace("rt=", "metadata_event_creation_time=")
  parseEvent(updated_msg)
}
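As an aside, the chain of .replace calls could be driven from a Map, which keeps the renames in data rather than code; a sketch (only a few of the renames are shown here):

// the rename table; extend with the remaining key renames as needed
val renames = Map(
  "cat=" -> "metadata_event_type=",
  "duser=" -> "event_user_id=",
  "rt=" -> "metadata_event_creation_time="
)

def renameKeys(msg: String): String =
  renames.foldLeft(msg) { case (m, (from, to)) => m.replace(from, to) }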
Final function to get only the values:
def parseEvent(msg: String): ListBuffer[String] = {
  val newMsg = msg.replace("\\=", "$_equal_$")
  val pD = new ListBuffer[String]()
  val splitData = newMsg.split("=")
  val mSize = splitData.size
  for (i <- 1 until mSize) {
    if (i < mSize - 1) {
      val a = splitData(i).split(" ")
      val b = a.size - 1
      val c = a.slice(0, b).mkString(" ")
      pD += c.replace("$_equal_$", "=")
    } else if (i == mSize - 1) {
      val a = splitData(i).replace("$_equal_$", "=")
      pD += a
    } else {
      logExceptions(newMsg)
    }
  }
  pD
}
The returned tuple contains a ListBuffer[String] at the 3rd position, from which I create a DataFrame as follows:
val df = ss.sqlContext.createDataFrame(
  tempRDD.filter(x => x._1 != "IRRELEVANT").map(x => Row.fromSeq(x._3)),
  schema)
People of Stack Overflow, I really need your help improving this code, both in performance and approach.
Any help and/or suggestions will be highly appreciated.
Thanks in advance.
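A sketch of a more dynamic direction, under assumptions: ss is the SparkSession from the snippet above, keys match \w+, and each value runs until the next key. Split off the seven header fields, capture the extension's key-value pairs with one regex, and build the schema from the captured keys. This is only a sketch, not a full CEF parser; in particular it ignores escaped \= inside values, which the code above handles with the $_equal_$ trick:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// a key is a run of word characters followed by '='; its value runs
// (non-greedily) until the next " key=" or the end of the string
val kvPattern = """(\w+)=(.*?)(?=\s+\w+=|$)""".r

def parseExtension(ext: String): Seq[(String, String)] =
  kvPattern.findAllMatchIn(ext).map(m => m.group(1) -> m.group(2)).toSeq

val extension = a.split("\\|", 8).last // the 8th segment holds the key-value pairs
val pairs = parseExtension(extension)

// the schema is inferred dynamically from whatever keys this log line contains
val schema = StructType(pairs.map { case (k, _) => StructField(k, StringType) })
val df = ss.createDataFrame(
  ss.sparkContext.parallelize(Seq(Row.fromSeq(pairs.map(_._2)))),
  schema)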

Flatten array in yield statement in Scala

I have the following piece of code
var splitDf = fullCertificateSourceDf.map(row => {
  val ID = row.getAs[String]("ID")
  val CertificateID = row.getAs[String]("CertificateID")
  val CertificateTag = row.getAs[String]("CertificateTag")
  val CertificateDescription = row.getAs[String]("CertificateDescription")
  val WorkBreakdownUp1Summary = row.getAs[String]("WorkBreakdownUp1Summary")
  val ProcessBreakdownSummaryList = row.getAs[String]("ProcessBreakdownSummaryList")
  val ProcessBreakdownUp1SummaryList = row.getAs[String]("ProcessBreakdownUp1SummaryList")
  val ProcessBreakdownUp2Summary = row.getAs[String]("ProcessBreakdownUp2Summary")
  val ProcessBreakdownUp3Summary = row.getAs[String]("ProcessBreakdownUp3Summary")
  val ActualStartDate = row.getAs[java.sql.Date]("ActualStartDate")
  val ActualEndDate = row.getAs[java.sql.Date]("ActualEndDate")
  val ApprovedDate = row.getAs[java.sql.Date]("ApprovedDate")
  val CurrentState = row.getAs[String]("CurrentState")
  val DataType = row.getAs[String]("DataType")
  val PullDate = row.getAs[String]("PullDate")
  val PullTime = row.getAs[String]("PullTime")
  val split_ProcessBreakdownSummaryList = ProcessBreakdownSummaryList.split(",")
  val split_ProcessBreakdownUp1SummaryList = ProcessBreakdownUp1SummaryList.split(",")
  val Pattern = "^.*?(?= - *[a-zA-Z])".r
  for {
    subSystem: String <- split_ProcessBreakdownSummaryList
  } yield (
    ID,
    CertificateID,
    CertificateTag,
    CertificateDescription,
    WorkBreakdownUp1Summary,
    subSystem,
    for { system: String <- split_ProcessBreakdownUp1SummaryList if (system contains subSystem.trim().substring(0, 11)) } yield (system),
    ProcessBreakdownUp2Summary,
    ProcessBreakdownUp3Summary,
    ActualStartDate,
    ActualEndDate,
    ApprovedDate,
    CurrentState,
    DataType,
    PullDate,
    PullTime
  )
}).flatMap(identity(_))
display(splitDf)
How can I get the first matching element from the following portion of the above statement:
for { system: String <- split_ProcessBreakdownUp1SummaryList if (system contains subSystem.trim().substring(0, 11)) } yield (system)
At the moment it returns an array with one element in it. I don't want the array, I just want the element.
Thank you in advance.
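One way to do this, sketched: replace the inner for/yield with .find, which returns the first matching element as an Option[String] rather than an array:

// .find stops at the first match and returns Option[String]
val firstMatch: Option[String] =
  split_ProcessBreakdownUp1SummaryList
    .find(system => system.contains(subSystem.trim().substring(0, 11)))

// unwrap explicitly; the default used here ("") is an assumption,
// so pick whatever makes sense when no system matches
val system: String = firstMatch.getOrElse("")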

Split a list into several files in Scala

I have the following list,
10,44,22
10,47,12
15,38,3
15,41,30
16,44,15
16,47,18
22,38,21
22,41,42
34,44,40
34,47,36
40,38,39
40,41,42
45,38,27
45,41,30
46,44,45
46,47,48
Then I create one file with its content using the following code:
val fstream: FileWriter = new FileWriter("patSPO.csv")
var out: BufferedWriter = new BufferedWriter(fstream)
val sl = listSPO.sortBy(l => (l.sub, l.pre))
for (a <- 0 to listSPO.size - 1) {
  out.write(sl(a).sub.toString + "," + sl(a).pre.toString + "," + sl(a).obj.toString + "\n")
}
out.close()
However, I want to divide the content among n files; here is my attempt for 4 files:
val fstream: FileWriter = new FileWriter("patSPO.csv")
val fstream1: FileWriter = new FileWriter("patSPO1.csv")
val fstream2: FileWriter = new FileWriter("patSPO2.csv")
val fstream3: FileWriter = new FileWriter("patSPO3.csv")
val fstream4: FileWriter = new FileWriter("patSPO4.csv")
var out: BufferedWriter = new BufferedWriter(fstream)
var out1: BufferedWriter = new BufferedWriter(fstream1)
var out2: BufferedWriter = new BufferedWriter(fstream2)
var out3: BufferedWriter = new BufferedWriter(fstream3)
var out4: BufferedWriter = new BufferedWriter(fstream4)
val b: Int = listSPO.size / 4
val sl = listSPO.sortBy(l => (l.sub, l.pre))
for (a <- 0 to listSPO.size - 1) {
  out.write(sl(a).sub.toString + "," + sl(a).pre.toString + "," + sl(a).obj.toString + "\n")
}
for (a <- 0 to b - 1) {
  out1.write(sl(a).sub.toString + "," + sl(a).pre.toString + "," + sl(a).obj.toString + "\n")
}
for (a <- b to (b * 2) - 1) {
  out2.write(sl(a).sub.toString + "," + sl(a).pre.toString + "," + sl(a).obj.toString + "\n")
}
for (a <- b * 2 to (b * 3) - 1) {
  out3.write(sl(a).sub.toString + "," + sl(a).pre.toString + "," + sl(a).obj.toString + "\n")
}
for (a <- b * 3 to (b * 4) - 1) {
  out4.write(sl(a).sub.toString + "," + sl(a).pre.toString + "," + sl(a).obj.toString + "\n")
}
out.close()
out1.close()
out2.close()
out3.close()
out4.close()
My question: is there a generic version where I pass the number of files to generate (for example 32), instead of writing the out, the for, and the fstream 32 times?
First, let's make some utilities to eliminate ugly boilerplate:
def open(
    number: Int = 1,
    prefix: String,
    ext: String
) = (0 until number)
  .iterator
  .map { i =>
    "%s%s%s".format(prefix, if (i == 0) "" else i.toString, ext)
  }
  .map(new FileWriter(_))
  .map(new BufferedWriter(_))
def toString(l: WhateverTheType) = Seq(
  l.sub,
  l.pre,
  l.obj
).mkString(",") + "\n"
And now to the implementation:
def writeToFiles(
    listSPO: List[WhateverTheType],
    numFiles: Int = 1,
    prefix: String = "patSPO",
    ext: String = ".csv"
) = listSPO
  .grouped((listSPO.size + numFiles - 1) / numFiles) // ceiling division, so at most numFiles groups
  .zip(open(numFiles, prefix, ext))
  .foreach { case (input, file) =>
    try {
      input.foreach(l => file.write(toString(l)))
    } finally {
      file.close()
    }
  }
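For example, to write 32 files without repeating any boilerplate (a usage sketch under the same assumptions):

writeToFiles(listSPO, numFiles = 32)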
Assuming the objects in the list have this type (otherwise, replace SPO in the code below with your type):
case class SPO(sub: Int, pre: Int, obj: Int)
This should do it:
val sl = listSPO.sortBy(l => (l.sub, l.pre))
val files = 5 // whatever number you want
// helper function to write a single list - similar to your implementation but shorter
def writeListToFile(name: String, list: List[SPO]): Unit = {
  val writer = new BufferedWriter(new FileWriter(name))
  list.foreach(spo => writer.write(s"${spo.sub},${spo.pre},${spo.obj}\n"))
  writer.close()
}
sl.grouped((sl.size + files - 1) / files) // split into at most `files` sublists (ceiling division)
  .zipWithIndex // add the sublist index for the file name
  .foreach {
    case (sublist, 0) => writeListToFile("patSPO.csv", sublist) // if indeed you want the first file name NOT to include the index
    case (sublist, index) => writeListToFile(s"patSPO$index.csv", sublist)
  }

Spark groupBy operation hangs at 199/200

I have a Spark standalone cluster with a master and two executors. I have an RDD[LevelOneOutput]; the LevelOneOutput class is shown below:
import scala.beans.BeanProperty
import scala.collection.mutable.ArrayBuffer

class LevelOneOutput extends Serializable {
  @BeanProperty
  var userId: String = _
  @BeanProperty
  var tenantId: String = _
  @BeanProperty
  var rowCreatedMonth: Int = _
  @BeanProperty
  var rowCreatedYear: Int = _
  @BeanProperty
  var listType1: ArrayBuffer[TypeOne] = _
  @BeanProperty
  var listType2: ArrayBuffer[TypeTwo] = _
  @BeanProperty
  var listType3: ArrayBuffer[TypeThree] = _
  // ... listType4 through listType17 elided
  @BeanProperty
  var listType18: ArrayBuffer[TypeEighteen] = _
  @BeanProperty
  var groupbyKey: String = _
}
Now I want to group this RDD by userId, tenantId, rowCreatedMonth, and rowCreatedYear. For that I did this:
val levelOneRDD = inputRDD.map(row => {
  row.setGroupbyKey(s"${row.getTenantId}_${row.getRowCreatedYear}_${row.getRowCreatedMonth}_${row.getUserId}")
  row
})
val groupedRDD = levelOneRDD.groupBy(row => row.getGroupbyKey)
This gives me the data with the key as a String and the value as an Iterable[LevelOneOutput].
Now I want to generate one single LevelOneOutput object per group key. For that I was doing something like this:
val rdd = groupedRDD.map(row => {
  val levelOneOutput = new LevelOneOutput
  val groupKey = row._1.split("_")
  levelOneOutput.setTenantId(groupKey(0))
  levelOneOutput.setRowCreatedYear(groupKey(1).toInt)
  levelOneOutput.setRowCreatedMonth(groupKey(2).toInt)
  levelOneOutput.setUserId(groupKey(3))
  var listType1 = new ArrayBuffer[TypeOne]
  var listType2 = new ArrayBuffer[TypeTwo]
  var listType3 = new ArrayBuffer[TypeThree]
  // ... listType4 through listType17 elided
  var listType18 = new ArrayBuffer[TypeEighteen]
  row._2.foreach(data => {
    if (data.getListType1 != null) listType1 = listType1 ++ data.getListType1
    if (data.getListType2 != null) listType2 = listType2 ++ data.getListType2
    if (data.getListType3 != null) listType3 = listType3 ++ data.getListType3
    // ...
    if (data.getListType18 != null) listType18 = listType18 ++ data.getListType18
  })
  if (listType1.isEmpty) levelOneOutput.setListType1(null) else levelOneOutput.setListType1(listType1)
  if (listType2.isEmpty) levelOneOutput.setListType2(null) else levelOneOutput.setListType2(listType2)
  if (listType3.isEmpty) levelOneOutput.setListType3(null) else levelOneOutput.setListType3(listType3)
  // ...
  if (listType18.isEmpty) levelOneOutput.setListType18(null) else levelOneOutput.setListType18(listType18)
  levelOneOutput
})
This works as expected for small inputs, but when I run it on a larger data set the groupBy stage hangs at 199/200 tasks, and I don't see any specific error or warning in stdout/stderr.
Can someone point out why the job is not proceeding further?
Instead of using the groupBy operation, I created a paired RDD like below
val levelOnePairedRDD = inputRDD.map(row => {
  row.setGroupbyKey(s"${row.getTenantId}_${row.getRowCreatedYear}_${row.getRowCreatedMonth}_${row.getUserId}")
  (row.getGroupbyKey, row)
})
and updated the processing logic, which solved my issue.
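For reference, the usual reason groupBy stalls on the last few tasks is a skewed key: groupBy ships every record for a key to a single task and materializes the whole Iterable in memory there. A sketch of what the updated processing logic might look like with reduceByKey, which merges records pairwise on each partition before the shuffle; the merge helper below is hypothetical and only hints at combining the 18 lists:

// hypothetical merge of two LevelOneOutput objects for the same key;
// must be associative, since reduceByKey applies it in arbitrary order
def merge(a: LevelOneOutput, b: LevelOneOutput): LevelOneOutput = {
  if (b.getListType1 != null)
    a.setListType1(if (a.getListType1 == null) b.getListType1 else a.getListType1 ++ b.getListType1)
  // ... repeat for listType2 through listType18
  a
}

val reduced = levelOnePairedRDD.reduceByKey(merge)
val output = reduced.values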