Functional code for the imperative one below in Scala

I want to write a functional version of the code that finds pairs of elements with a given sum. Below is the imperative code:
object ArrayUtil {
  def findPairs(arr: Array[Int], sum: Int) = {
    val MAX = 50
    val binmap: Array[Boolean] = new Array[Boolean](MAX)
    for (i <- 0 until arr.length) {
      val temp: Int = sum - arr(i)
      if (temp >= 0 && binmap(temp)) {
        println("Pair with given sum " + sum + " is (" + arr(i) + ", " + temp + ")")
      }
      binmap(arr(i)) = true
    }
  }
}

Study the Standard Library.
def findPairs(arr: Array[Int], sum: Int): List[Array[Int]] =
  arr.combinations(2).filter(_.sum == sum).toList
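If you want behaviour closer to the original single pass, here is a minimal sketch (the name findPairsSinglePass and the use of a Set are my own choices, not from the question) that folds over the array while carrying the set of values seen so far:

// Returns each matching pair as a tuple instead of printing it.
def findPairsSinglePass(arr: Array[Int], sum: Int): List[(Int, Int)] =
  arr.foldLeft((Set.empty[Int], List.empty[(Int, Int)])) {
    case ((seen, pairs), x) =>
      val complement = sum - x
      // a pair exists when the complement of x was already seen earlier in the array
      if (seen.contains(complement)) (seen + x, (x, complement) :: pairs)
      else (seen + x, pairs)
  }._2.reverse

For example, findPairsSinglePass(Array(1, 4, 45, 6, 10, 8), 16) returns List((10,6)).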


Scala If and For Loops issue

I have a list whose elements and sub-elements I want to parse and pass as variables to a DataFrame query, but I get an error. Can anybody help? Here is my code:
val ListParser = ("age,15,20", "revenue,1,2")
val vars = "category in (1,2,4)"
val resultQuery: Dataset[Row] =
  if (ListParser.size == 0) {
    responses.filter(vars)
  } else if (ListParser.size == 1) {
    responses.filter(vars + " AND " + responses(ListParser(0)).between(ListParser(1).toInt, ListParser(2).toInt))
  } else if (ListParser.size >= 2) {
    responses.filter(vars + " AND " + {
      for (a <- ListParser) {
        val myInnerList: List[String] = a.split(",").map(_.trim).toList
        responses(myInnerList(0)).between(myInnerList(1).toInt, myInnerList(2).toInt)
      }
    })
  } else {
    responses.filter(vars)
  }
And I have another question: I want only the value of responses.filter() to end up in the resultQuery val.
It seems you are mixing SQL syntax with Spark object syntax. Try:
(listParser.map(s => { val l = s.split(","); s"""${l(0)} between ${l(1)} and ${l(2)}""" }) :+ vars).mkString(" AND ")
(assuming listParser is indeed a list and not a tuple).
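A self-contained sketch of that approach (the sample data and the local SparkSession are hypothetical, added only to make it runnable) could look like this:

import org.apache.spark.sql.{DataFrame, SparkSession}

object RangeFilterExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("range-filter").master("local[*]").getOrCreate()
    import spark.implicits._

    // hypothetical sample data standing in for the asker's `responses` DataFrame
    val responses: DataFrame =
      Seq((1, 16, 1.5), (2, 40, 3.0), (4, 18, 1.0)).toDF("category", "age", "revenue")

    val listParser = List("age,15,20", "revenue,1,2")
    val vars = "category in (1,2,4)"

    // build one SQL condition string:
    // "age between 15 and 20 AND revenue between 1 and 2 AND category in (1,2,4)"
    val condition =
      (listParser.map { s =>
        val l = s.split(",").map(_.trim)
        s"${l(0)} between ${l(1)} and ${l(2)}"
      } :+ vars).mkString(" AND ")

    val resultQuery = responses.filter(condition)
    resultQuery.show()

    spark.stop()
  }
}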

Bubble sort of random integers in scala

I'm new to the Scala programming language. In this bubble sort I need to generate 10 random integers instead of writing them down as in the code below. Any suggestions?
object BubbleSort {
  def bubbleSort(array: Array[Int]) = {
    def bubbleSortRecursive(array: Array[Int], current: Int, to: Int): Array[Int] = {
      println(array.mkString(",") + " current -> " + current + ", to -> " + to)
      to match {
        case 0 => array
        case _ if (to == current) => bubbleSortRecursive(array, 0, to - 1)
        case _ =>
          if (array(current) > array(current + 1)) {
            val temp = array(current + 1)
            array(current + 1) = array(current)
            array(current) = temp
          }
          bubbleSortRecursive(array, current + 1, to)
      }
    }
    bubbleSortRecursive(array, 0, array.size - 1)
  }

  def main(args: Array[String]): Unit = {
    val sortedArray = bubbleSort(Array(10, 9, 11, 5, 2))
    println("Sorted Array -> " + sortedArray.mkString(","))
  }
}
Try this:
import scala.util.Random
val sortedArray = (1 to 10).map(_ => Random.nextInt).toArray
You can use scala.util.Random for generation. The nextInt method takes a maxValue argument, so in the code sample you'll generate 10 Int values from 0 (inclusive) to 100 (exclusive).
val r = scala.util.Random
for (i <- 1 to 10) yield r.nextInt(100)
You can find more info here or here
You can use it this way.
val solv1 = Random.shuffle( (1 to 100).toList).take(10)
val solv2 = Array.fill(10)(Random.nextInt)
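Putting either suggestion together with the question's code, a minimal sketch of a main method (reusing the asker's bubbleSort unchanged, inside the BubbleSort object) might be:

import scala.util.Random

def main(args: Array[String]): Unit = {
  // 10 random integers in the range 0 to 99 replace the hard-coded array
  val randomArray = Array.fill(10)(Random.nextInt(100))
  val sortedArray = bubbleSort(randomArray)
  println("Sorted Array -> " + sortedArray.mkString(","))
}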

How to find the number of lines and different elements in a file and write them in the header, Scala

I have something like:
object Example_01_IO {
  val s = Source.fromFile("example_01.txt")
  val source = s.getLines()
  val destination = new PrintWriter(new File("des_example_01.txt"))
  var nrVariables: Int = 0
  var nrLines: Int = 0

  // here are the extracted lines from example_01 that fulfill some conditions
  val linesToWrite: Iterator[String] = ...

  def main(args: Array[String]): Unit = {
    // Here is the header that I want to write to the destination file
    destination.write("des_example_01.txt \n")
    destination.write("Nr. of Variables and Lines: " + nrVariables + " " + nrLines + "\n")
    for (line <- linesToWrite) {
      println(line)
      destination.write(line)
      destination.write("\n")
      nrLines += 1
    }
    s.close()
    destination.close()
  }
}
I need the values of nrVariables and nrLines in order to write them in the header of the destination file (e.g., in the second row). Is there a way to calculate these two values before starting to write the other lines?
Any help or reference is really welcome. Thank you.
Well, Source.fromFile cannot be reused; the version below works fine:
package example

import java.io.PrintWriter
import java.io.File
import scala.io.Source

object Example_01_IO {
  def s = Source.fromFile("/tmp/example_01.txt") // notice def everywhere, looks like Source.fromFile cannot be reused :(
  def source = s.getLines()

  val destination = new PrintWriter(new File("/tmp/des_example_01.txt"))

  var nrVariables: Int = 0
  var nrLines: Int = 0

  // here are the extracted lines from example_01 that fulfill some conditions
  def linesToWrite: Iterator[String] = source.filter { s => s.contains("a") }

  def main(args: Array[String]): Unit = {
    linesToWrite.foreach { s =>
      nrLines += 1
      if (s contains "variable") {
        nrVariables += 1
      }
    }

    // Here is the header that I want to write to the destination file
    destination.write("des_example_01.txt \n")
    destination.write("Nr. of Variables and Lines: " + nrVariables + " " + nrLines + "\n")

    for (line <- linesToWrite) {
      println(line)
      destination.write(line)
      destination.write("\n")
      /* nrLines += 1 */
    }
    s.close()
    destination.close()
  }
}
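If the file fits in memory, an alternative sketch (assuming Scala 2.13+ for scala.util.Using and the same hypothetical filter conditions) reads the file once, counts first, and only then writes:

import java.io.{File, PrintWriter}
import scala.io.Source
import scala.util.Using

object Example_01_SinglePass {
  def main(args: Array[String]): Unit = {
    // read and filter once, keeping the result in memory
    val linesToWrite: List[String] = Using.resource(Source.fromFile("/tmp/example_01.txt")) { src =>
      src.getLines().filter(_.contains("a")).toList
    }

    val nrLines = linesToWrite.size
    val nrVariables = linesToWrite.count(_.contains("variable"))

    Using.resource(new PrintWriter(new File("/tmp/des_example_01.txt"))) { destination =>
      // header first, now that both counts are known
      destination.write("des_example_01.txt \n")
      destination.write("Nr. of Variables and Lines: " + nrVariables + " " + nrLines + "\n")
      linesToWrite.foreach { line =>
        destination.write(line)
        destination.write("\n")
      }
    }
  }
}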

Scala - append RDD to itself

for (fordate <- 2 to 30) {
  val dataRDD = sc.textFile("s3n://mypath" + fordate + "/*")
  val a = 1
  val c = fordate - 1
  for (b <- a to c) {
    val cumilativeRDD1 = sc.textFile("s3n://mypath/" + b + "/*")
    val cumilativeRDD: org.apache.spark.rdd.RDD[String] = sc.union(cumilativeRDD1, cumilativeRDD)
    if (b == c) {
      val incrementalDEviceIDs = dataRDD.subtract(cumilativeRDD)
      val countofIDs = incrementalDEviceIDs.distinct().count()
      println(s"201611 $fordate $countofIDs")
    }
  }
}
I have a data set where I get device IDs on a daily basis. I need to figure out the incremental count per day, but when I union cumilativeRDD with itself it throws the following error:
forward reference extends over definition of value cumilativeRDD
How can I overcome this?
The problem is this line:
val cumilativeRDD: org.apache.spark.rdd.RDD[String] = sc.union(cumilativeRDD1, cumilativeRDD)
You're using cumilativeRDD before its declaration. Assignment works from right to left: the right side of = defines the variable on the left. Therefore you cannot use the variable inside its own definition, because on the right side of the assignment the variable does not yet exist.
You have to initialize cumilativeRDD on the first iteration; then you can use it on the following ones:
var cumilativeRDD: Option[org.apache.spark.rdd.RDD[String]] = None

for (fordate <- 2 to 30) {
  val dataRDD = sc.textFile("s3n://mypath" + fordate + "/*")
  val c = fordate - 1
  for (b <- 1 to c) {
    val cumilativeRDD1 = sc.textFile("s3n://mypath/" + b + "/*")
    if (cumilativeRDD.isEmpty) cumilativeRDD = Some(cumilativeRDD1)
    else cumilativeRDD = Some(sc.union(cumilativeRDD1, cumilativeRDD.get))
    if (b == c) {
      val incrementalDeviceIDs = dataRDD.subtract(cumilativeRDD.get)
      val countofIDs = incrementalDeviceIDs.distinct().count()
      println("201611" + fordate + " " + countofIDs)
    }
  }
}
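A more functional sketch of the same idea (assuming the same hypothetical s3n://mypath layout and an existing SparkContext named sc) avoids the mutable Option by unioning the previous days with a reduce:

import org.apache.spark.rdd.RDD

for (fordate <- 2 to 30) {
  val dataRDD: RDD[String] = sc.textFile("s3n://mypath" + fordate + "/*")

  // union all previous days (1 .. fordate-1) into one cumulative RDD
  val cumulativeRDD: RDD[String] =
    (1 until fordate)
      .map(b => sc.textFile("s3n://mypath/" + b + "/*"))
      .reduce(_ union _)

  // device IDs seen today but never before
  val incrementalDeviceIDs = dataRDD.subtract(cumulativeRDD)
  val countOfIDs = incrementalDeviceIDs.distinct().count()
  println(s"201611 $fordate $countOfIDs")
}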

org.apache.spark.SparkException: Task not serializable (scala)

I am new to Scala as well as to Spark. Please help me to resolve this issue.
In the Spark shell, when I load the functions below individually, they run without any exception. When I copy them into a Scala object and load the same file in the Spark shell, they throw a "task not serializable" exception in the processbatch function when trying to parallelize.
Please find the code below:
import org.apache.spark.sql.Row
import org.apache.log4j.Logger
import org.apache.spark.sql.hive.HiveContext

object Process {
  val hc = new HiveContext(sc)

  def processsingle(wait: Int, patient: org.apache.spark.sql.Row, visits: Array[org.apache.spark.sql.Row]): String = {
    var out = new StringBuilder()
    val processStart = getTimeInMillis()
    for (x <- visits) {
      out.append(", " + x.getAs("patientid") + ":" + x.getAs("visitid"))
    }
  }

  def processbatch(batch: Int, wait: Int, patients: Array[org.apache.spark.sql.Row], visits: Array[org.apache.spark.sql.Row]) = {
    val out = sc.parallelize(patients, batch).map(r => processsingle(wait, r, visits.filter(f => f.getAs("patientid") == r.getAs("patientid")))).collect()
    for (x <- out) println(x)
  }

  def processmeasures(fetch: Int, batch: Int, wait: Int) = {
    val patients = hc.sql("SELECT patientid FROM tableName1 order by p_id").collect()
    val visit = hc.sql("SELECT patientid, visitid FROM tableName2")
    val count = patients.length
    val fetches = if (count % fetch > 0) (count / fetch + 1) else (count / fetch)

    for (i <- 0 to fetches.toInt - 1) {
      val startFetch = i * fetch
      val endFetch = math.min((i + 1) * fetch, count.toInt) - 1
      val fetchSize = endFetch - startFetch + 1
      val fetchClause = "patientid >= " + patients(startFetch).get(0) + " and patientid <= " + patients(endFetch).get(0)
      val fetchVisit = visit.filter(fetchClause).collect()
      val batches = if (fetchSize % batch > 0) (fetchSize / batch + 1) else (fetchSize / batch)

      for (j <- 0 to batches.toInt - 1) {
        val startBatch = j * batch
        val endBatch = math.min((j + 1) * batch, fetch.toInt) - 1
        println(s"Batch from $startBatch to $endBatch")
        val batchVisits = fetchVisit.filter(g => g.getAs[Long]("patientid") >= patients(i * fetch + startBatch).getLong(0) && g.getAs[Long]("patientid") <= patients(math.min(i * fetch + endBatch + 1, endFetch)).getLong(0))
        processbatch(batch, wait, patients.slice(i * fetch + startBatch, i * fetch + endBatch + 1), batchVisits)
      }
    }
    println("Processing took " + getExecutionTime(processStart) + " millis")
  }
}
You should make the Process object Serializable:
object Process extends Serializable {
...
}
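A minimal sketch of that fix, with hypothetical names and only the serialization-relevant pieces kept:

import org.apache.spark.SparkContext

// Extending Serializable lets Spark ship the closure that references
// this object's methods (formatRow below) to the executors.
object Process extends Serializable {

  // hypothetical stand-in for the asker's processsingle
  def formatRow(id: Int): String = s"patient:$id"

  def processBatch(sc: SparkContext, ids: Array[Int], batch: Int): Unit = {
    // the lambda captures Process (to call formatRow), so Process must be serializable
    val out = sc.parallelize(ids, batch).map(id => formatRow(id)).collect()
    out.foreach(println)
  }
}

Keep in mind that whatever fields the object holds are captured along with it, so non-serializable state such as a SparkContext should stay out of the closures it ships.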