I am reading 1000s of .eml files (email message files) one by one from a directory, parsing them with the javax.mail APIs to extract values, and finally storing the results in a DataFrame. Sample code below:
var x = Seq[DataFrame]()
val emlFiles = getListOfFiles("tmp/sample")
val fileCount = emlFiles.length
val fs = FileSystem.get(sc.hadoopConfiguration)
for (i <- 0 until fileCount){
var emlData = spark.emptyDataFrame
val f = new File(emlFiles(i))
val fileName = f.getName()
val path = Paths.get(emlFiles(i))
val session = Session.getInstance(new Properties())
val messageIn = new FileInputStream(path.toFile())
val mimeJournal = new MimeMessage(session, messageIn)
// Extracting Metadata
val Receivers = mimeJournal.getHeader("To")(0) // the "To" header holds the receivers
val Senders = mimeJournal.getHeader("From")(0) // the "From" header holds the sender
val Date = mimeJournal.getHeader("Date")(0)
val Subject = mimeJournal.getHeader("Subject")(0)
val Size = mimeJournal.getSize
emlData =Seq((fileName,Receivers,Senders,Date,Subject,Size)).toDF("fileName","Receivers","Senders","Date","Subject","Size")
x = emlData +: x
}
The problem is that I am using a for loop to do this, and it is taking a lot of time. Is there a way to avoid the for loop when reading the files?
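One possible direction, sketched under assumptions: let Spark read the files in parallel with sc.binaryFiles and build a single DataFrame at the end, instead of creating one DataFrame per file inside a driver-side loop. EmlRow, EmlParser and readEmlDir are hypothetical names made up for this sketch, it assumes javax.mail is on the executor classpath, and it reads From as the sender and To as the receivers.

```scala
import java.io.ByteArrayInputStream
import java.util.Properties
import javax.mail.Session
import javax.mail.internet.MimeMessage
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical row type; one instance per .eml file.
case class EmlRow(fileName: String, sender: String, receivers: String,
                  date: String, subject: String, size: Int)

object EmlParser {
  // Parses the raw bytes of one .eml file. The Session is created here
  // because it is not serializable and must live on the executor.
  def parse(fileName: String, bytes: Array[Byte]): EmlRow = {
    val session = Session.getInstance(new Properties())
    val msg = new MimeMessage(session, new ByteArrayInputStream(bytes))
    // getHeader(name, delimiter) returns null when the header is absent,
    // so a missing header does not throw as getHeader(name)(0) would.
    def header(name: String): String = msg.getHeader(name, ",")
    EmlRow(fileName, header("From"), header("To"),
      header("Date"), header("Subject"), msg.getSize)
  }
}

object EmlReader {
  // Reads every file under `dir` in parallel and converts once to a DataFrame.
  def readEmlDir(spark: SparkSession, dir: String): DataFrame = {
    import spark.implicits._
    spark.sparkContext.binaryFiles(dir)
      .map { case (path, stream) =>
        // binaryFiles yields the full path; keep only the last segment.
        val name = path.substring(path.lastIndexOf('/') + 1)
        EmlParser.parse(name, stream.toArray())
      }
      .toDF()
  }
}
```

With the names above, EmlReader.readEmlDir(spark, "tmp/sample") would replace the whole loop. Whether this is actually faster depends on file sizes and cluster setup, so it is worth measuring.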
val decoder = new BASE64Decoder
val decodedBytes = decoder.decodeBuffer(base64String)
val uploadFile = "C:/Users/BabuSuku/Downloads/SpineorDownloads/test.png"
val image = ImageIO.read(new ByteArrayInputStream(decodedBytes))
val f = new Nothing(uploadFile)
ImageIO.write(image, "png", uploadFile)
You passed a String as the third parameter to ImageIO.write. It needs a File instead. Change the last two lines accordingly:
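import java.io.{ByteArrayInputStream, File}
import javax.imageio.ImageIO
import sun.misc.BASE64Decoder // assuming the JDK-internal decoder (Java 8 or earlier)

val decoder = new BASE64Decoder
val decodedBytes = decoder.decodeBuffer(base64String)
val uploadFile = "C:/Users/BabuSuku/Downloads/SpineorDownloads/test.png"
val image = ImageIO.read(new ByteArrayInputStream(decodedBytes))
val f = new File(uploadFile) // ImageIO.write expects a File, not a String path
ImageIO.write(image, "png", f)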
See the documentation for ImageIO.write.
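As a side note, here is a minimal sketch of the same idea using java.util.Base64 from the standard library (Java 8+) in place of the internal BASE64Decoder. Base64Image and writePng are hypothetical names, and the sketch assumes the input is plain Base64 without line breaks; for MIME-style input with line breaks, Base64.getMimeDecoder would be the decoder to try.

```scala
import java.io.{ByteArrayInputStream, File}
import java.util.Base64
import javax.imageio.ImageIO

object Base64Image {
  // Hypothetical helper: decodes a Base64-encoded image and writes it to
  // `target` as PNG. Returns false if the bytes are not a readable image
  // or no PNG writer is available. Invalid Base64 input makes decode
  // throw IllegalArgumentException.
  def writePng(base64String: String, target: File): Boolean = {
    val decodedBytes = Base64.getDecoder.decode(base64String)
    // ImageIO.read returns null when no reader recognizes the format.
    val image = ImageIO.read(new ByteArrayInputStream(decodedBytes))
    image != null && ImageIO.write(image, "png", target)
  }
}
```

Usage would look like Base64Image.writePng(base64String, new File(uploadFile)).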
I have the following piece of code:
var splitDf = fullCertificateSourceDf.map(row => {
val ID = row.getAs[String]("ID")
val CertificateID = row.getAs[String]("CertificateID")
val CertificateTag = row.getAs[String]("CertificateTag")
val CertificateDescription = row.getAs[String]("CertificateDescription")
val WorkBreakdownUp1Summary = row.getAs[String]("WorkBreakdownUp1Summary")
val ProcessBreakdownSummaryList = row.getAs[String]("ProcessBreakdownSummaryList")
val ProcessBreakdownUp1SummaryList = row.getAs[String]("ProcessBreakdownUp1SummaryList")
val ProcessBreakdownUp2Summary = row.getAs[String]("ProcessBreakdownUp2Summary")
val ProcessBreakdownUp3Summary = row.getAs[String]("ProcessBreakdownUp3Summary")
val ActualStartDate = row.getAs[java.sql.Date]("ActualStartDate")
val ActualEndDate = row.getAs[java.sql.Date]("ActualEndDate")
val ApprovedDate = row.getAs[java.sql.Date]("ApprovedDate")
val CurrentState = row.getAs[String]("CurrentState")
val DataType = row.getAs[String]("DataType")
val PullDate = row.getAs[String]("PullDate")
val PullTime = row.getAs[String]("PullTime")
val split_ProcessBreakdownSummaryList = ProcessBreakdownSummaryList.split(",")
val split_ProcessBreakdownUp1SummaryList = ProcessBreakdownUp1SummaryList.split(",")
val Pattern = "^.*?(?= - *[a-zA-Z])".r
for{
subSystem : String <- split_ProcessBreakdownSummaryList
} yield(ID,
CertificateID,
CertificateTag,
CertificateDescription,
WorkBreakdownUp1Summary,
subSystem,
for{ system: String <- split_ProcessBreakdownUp1SummaryList if(system contains subSystem.trim().substring(0,11))}yield(system),
ProcessBreakdownUp2Summary,
ProcessBreakdownUp3Summary,
ActualStartDate,
ActualEndDate,
ApprovedDate,
CurrentState,
DataType,
PullDate,
PullTime
)
}).flatMap(identity(_))
display(splitDf)
How can I get the first matching element from the following portion of the above statement:
for{ system: String <- split_ProcessBreakdownUp1SummaryList if(system contains subSystem.trim().substring(0,11))}yield(system)
At the moment it returns an array with one element in it. I don't want the array; I just want the element.
Thank you in advance.
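One way this could be approached, as a sketch: find returns the first element that satisfies a predicate, wrapped in an Option, so no intermediate array is built. FirstMatch and firstMatching are hypothetical names. Note one deliberate difference from the original: take(11) is used instead of substring(0, 11), so a subSystem shorter than 11 characters does not throw.

```scala
object FirstMatch {
  // Hypothetical helper: returns the first entry of `systems` that contains
  // the first `prefixLen` characters of the trimmed `subSystem`, if any.
  def firstMatching(systems: Array[String],
                    subSystem: String,
                    prefixLen: Int = 11): Option[String] = {
    // take never throws on short strings, unlike substring(0, prefixLen).
    val key = subSystem.trim.take(prefixLen)
    systems.find(_.contains(key))
  }
}
```

Inside the yield, something like FirstMatch.firstMatching(split_ProcessBreakdownUp1SummaryList, subSystem).orNull would give a plain String (null when nothing matches); getOrElse("") is an alternative if an empty string is preferred.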
I have two strings in Scala
Input 1 : "a,c,e,g,i,k"
Input 2 : "b,d,f,h,j,l"
How do I join the two Strings in Scala?
Required output = "ab,cd,ef,gh,ij,kl"
I tried something like:
var columnNameSetOne:Array[String] = Array(); //v1 = "a,c,e,g,i,k"
var columnNameSetTwo:Array[String] = Array(); //v2 = "b,d,f,h,j,l"
After I get the input data as mentioned above
columnNameSetOne = v1.split(",")
columnNameSetTwo = v2.split(",");
val newColumnSet = IntStream.range(0, Math.min(columnNameSetOne.length, columnNameSetTwo.length)).mapToObj(j => (columnNameSetOne(j) + columnNameSetTwo(j))).collect(Collectors.joining(","));
println(newColumnSet)
But I am getting an error on j.
Also, I am not sure whether this would work.
object Solution1 extends App {
val input1 = "a,c,e,g,i,k"
val input2 = "b,d,f,h,j,l"
val i1 = input1.split(",")
val i2 = input2.split(",")
val x = i1.zipAll(i2, "", "").map {
case (a, b) => a + b
}
println(x.mkString(","))
}
//output : ab,cd,ef,gh,ij,kl
This is easy to do with the zip function on lists. Note that zip produces tuples, so each pair has to be concatenated before joining:
val v1 = "a,c,e,g,i,k"
val v2 = "b,d,f,h,j,l"
val list1 = v1.split(",").toList
val list2 = v2.split(",").toList
list1.zip(list2).map { case (a, b) => a + b }.mkString(",") // res0: String = ab,cd,ef,gh,ij,kl
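The two answers differ only when the inputs have different lengths: zip stops at the shorter input, while zipAll pads the shorter one with a filler value. A small sketch to illustrate (ZipDemo, joinPairs and joinPairsPadded are hypothetical names):

```scala
object ZipDemo {
  // zip: unmatched trailing elements of the longer input are dropped.
  def joinPairs(v1: String, v2: String): String =
    v1.split(",").zip(v2.split(","))
      .map { case (a, b) => a + b }
      .mkString(",")

  // zipAll: the shorter input is padded with "" so nothing is dropped.
  def joinPairsPadded(v1: String, v2: String): String =
    v1.split(",").zipAll(v2.split(","), "", "")
      .map { case (a, b) => a + b }
      .mkString(",")
}

// ZipDemo.joinPairs("a,c,e", "b,d")       returns "ab,cd"
// ZipDemo.joinPairsPadded("a,c,e", "b,d") returns "ab,cd,e"
```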
I'm trying to round-trip bytes through Java's Deflater and running into issues. First the output, then the code. What am I doing wrong here, and how can I properly round-trip through these streams?
Output:
scala> new String(decompress(compress("face".getBytes)))
(crazy output string of length 20)
Code:
def compress(bytes: Array[Byte]): Array[Byte] = {
val deflater = new java.util.zip.Deflater
val baos = new ByteArrayOutputStream
val dos = new DeflaterOutputStream(baos, deflater)
dos.write(bytes)
baos.close
dos.finish
dos.close
baos.toByteArray
}
def decompress(bytes: Array[Byte]): Array[Byte] = {
val deflater = new java.util.zip.Deflater
val baos = new ByteArrayOutputStream(512)
val bytesIn = new ByteArrayInputStream(bytes)
val in = new DeflaterInputStream(bytesIn, deflater)
var go = true
while (go) {
val b = in.read
if (b == -1)
go = false
else
baos.write(b)
}
baos.close
in.close
baos.toByteArray
}
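You're deflating the result of the original deflation a second time when you should be inflating it: decompress needs an Inflater with an InflaterInputStream rather than a Deflater with a DeflaterInputStream.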
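A minimal sketch of what a working round trip could look like, using DeflaterOutputStream on the way in and InflaterInputStream on the way out with their default Deflater and Inflater. ZlibRoundTrip is a hypothetical wrapper name, and the buffered read loop is a design choice of this sketch, not part of the original code.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{DeflaterOutputStream, InflaterInputStream}

object ZlibRoundTrip {
  def compress(bytes: Array[Byte]): Array[Byte] = {
    val baos = new ByteArrayOutputStream
    val dos = new DeflaterOutputStream(baos)
    // close() finishes the compressed stream and flushes it into baos.
    try dos.write(bytes) finally dos.close()
    baos.toByteArray
  }

  def decompress(bytes: Array[Byte]): Array[Byte] = {
    // Inflate here; deflating again is what produced the garbage output.
    val in = new InflaterInputStream(new ByteArrayInputStream(bytes))
    val baos = new ByteArrayOutputStream(512)
    val buf = new Array[Byte](4096)
    try {
      var n = in.read(buf)
      while (n != -1) {
        baos.write(buf, 0, n)
        n = in.read(buf)
      }
    } finally in.close()
    baos.toByteArray
  }
}
```

With this, new String(ZlibRoundTrip.decompress(ZlibRoundTrip.compress("face".getBytes))) should give back "face".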