Bubble sort of random integers in Scala

I'm new to the Scala programming language. In this bubble sort I need to generate 10 random integers instead of writing them out by hand as in the code below.
Any suggestions?
object BubbleSort {
  def bubbleSort(array: Array[Int]): Array[Int] = {
    // Sorts `array` in place; `to` is the last index that is not yet known to be sorted
    def bubbleSortRecursive(array: Array[Int], current: Int, to: Int): Array[Int] = {
      println(array.mkString(",") + " current -> " + current + ", to -> " + to)
      to match {
        case t if t <= 0 => array // nothing left to sort (also covers an empty array)
        case _ if to == current => bubbleSortRecursive(array, 0, to - 1)
        case _ =>
          // swap adjacent elements that are out of order
          if (array(current) > array(current + 1)) {
            val temp = array(current + 1)
            array(current + 1) = array(current)
            array(current) = temp
          }
          bubbleSortRecursive(array, current + 1, to)
      }
    }
    bubbleSortRecursive(array, 0, array.size - 1)
  }
  def main(args: Array[String]): Unit = {
    val sortedArray = bubbleSort(Array(10, 9, 11, 5, 2))
    println("Sorted Array -> " + sortedArray.mkString(","))
  }
}

Try this:
import scala.util.Random
// in main: generate 10 random Ints, then sort them
val sortedArray = bubbleSort((1 to 10).map(_ => Random.nextInt()).toArray)

You can use scala.util.Random for the generation. The nextInt method has an overload that takes an upper bound, which is exclusive, so the code sample below generates a sequence of 10 Int values from 0 to 99.
val r = scala.util.Random
for (i <- 1 to 10) yield r.nextInt(100)
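A small sketch of the bounds involved, assuming Scala 2.13 or later (where Random.between was added): nextInt(100) never returns 100, while between(1, 101) covers 1 to 100 inclusive. The seed 42 is an arbitrary choice that only makes runs repeatable.

```scala
import scala.util.Random

object RandomBounds {
  // A seeded generator so the same numbers come out on every run
  val rng = new Random(42)

  // 10 values in [0, 100): the bound passed to nextInt is exclusive
  val zeroTo99: IndexedSeq[Int] = for (_ <- 1 to 10) yield rng.nextInt(100)

  // 10 values in [1, 100]: between(minInclusive, maxExclusive), Scala 2.13+
  val oneTo100: IndexedSeq[Int] = IndexedSeq.fill(10)(rng.between(1, 101))

  def main(args: Array[String]): Unit = {
    println(zeroTo99.mkString(","))
    println(oneTo100.mkString(","))
  }
}
```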

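You can also use it in either of these ways (both need import scala.util.Random):
// 10 distinct values picked from 1 to 100
val solv1 = Random.shuffle((1 to 100).toList).take(10)
// 10 arbitrary Int values (may be negative; duplicates are possible)
val solv2 = Array.fill(10)(Random.nextInt())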
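Putting these answers together, here is a sketch of a complete program that fills an array with 10 random integers and bubble-sorts it. It swaps in a plain iterative bubble sort that returns a sorted copy instead of the recursive, in-place version from the question; the bound of 100 and the seed are arbitrary choices for illustration.

```scala
import scala.util.Random

object RandomBubbleSort {
  // Iterative bubble sort; works on a copy so the input is left untouched
  def bubbleSort(input: Array[Int]): Array[Int] = {
    val array = input.clone()
    // after each outer pass, the largest remaining value has bubbled up to index `to`
    for (to <- array.length - 1 until 0 by -1; current <- 0 until to) {
      if (array(current) > array(current + 1)) {
        val temp = array(current + 1)
        array(current + 1) = array(current)
        array(current) = temp
      }
    }
    array
  }

  // n random integers in [0, 100); the seed only makes runs repeatable
  def randomInts(n: Int, seed: Long): Array[Int] = {
    val rng = new Random(seed)
    Array.fill(n)(rng.nextInt(100))
  }

  def main(args: Array[String]): Unit = {
    val unsorted = randomInts(10, seed = 42L)
    println("Unsorted -> " + unsorted.mkString(","))
    println("Sorted   -> " + bubbleSort(unsorted).mkString(","))
  }
}
```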

Related

Is there any way I can rewrite this line of code in Scala?

I am trying to rewrite this line of Scala + Figaro code using my function sum_, but I get some errors.
val sum = Container(vars:_*).reduce(_+_)
It uses the reduce() method to calculate the sum. I want to rewrite this line, but I get errors because the elements have type Chain[Double, Int]:
import com.cra.figaro.language._
import com.cra.figaro.library.atomic.continuous.Uniform
import com.cra.figaro.language.{Element, Chain, Apply}
import com.cra.figaro.library.collection.Container
object sum {
  def sum_(arr: Int*): Int = {
    var i = 0
    var sum: Int = 0
    while (i < arr.length) {
      sum += arr(i)
      i += 1
    }
    return sum
  }
  def fillarray(): Int = {
    scala.util.Random.nextInt(10) match {
      case 0 | 1 | 2 => 3
      case 3 | 4 | 5 | 6 => 4
      case _ => 5
    }
  }
  def main(args: Array[String]): Unit = {
    val par = Array.fill(18)(fillarray())
    val skill = Uniform(0.0, 8.0/13.0)
    val shots = Array.tabulate(18)((hole: Int) => Chain(skill, (s: Double) =>
      Select(s/8.0 -> (par(hole)-2),
        s/2.0 -> (par(hole)-1),
        s -> par(hole),
        (4.0/5.0) * (1.0 - (13.0 * s)/8.0) -> (par(hole)+1),
        (1.0/5.0) * (1.0 - (13.0 * s)/8.0) -> (par(hole)+2))))
    val vars = for { i <- 0 until 18 } yield shots(i)
    // this line I want to rewrite
    val sum1 = Container(vars:_*).reduce(_+_)
    // My idea was to implement the line above in this way
    val sum2 = sum_(vars)
  }
}
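If you want to use your function, you can do this:
val sum2 = sum_(vars.map(chain => chain.generateValue()):_*)
or
val sum2 = sum_(vars.map(_.generateValue()):_*)
Note that this sums one generated value per chain, so sum2 is a plain Int, whereas the reduce version produces a Figaro element over the sum. I'd recommend diving deeper into your library and the functional paradigm.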
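The piece that makes both versions compile is the : _* ascription, which passes a sequence to a varargs (Int*) parameter. Here is a Figaro-free sketch of just that mechanism; the hypothetical sampled values stand in for the generateValue() results.

```scala
object VarargsDemo {
  // Same shape as sum_ in the question: a varargs parameter of Int
  def sum_(arr: Int*): Int = {
    var i = 0
    var sum = 0
    while (i < arr.length) {
      sum += arr(i)
      i += 1
    }
    sum
  }

  // Hypothetical stand-ins for the values generated from each Chain
  val sampled: IndexedSeq[Int] = IndexedSeq(3, 4, 5, 4)

  // sum_(sampled) would not compile: an IndexedSeq[Int] is not an Int
  val viaVarargs: Int = sum_(sampled: _*)

  // The standard library already provides the same result
  val viaStdlib: Int = sampled.sum
}
```

Once the values are in a collection, sampled.sum is the idiomatic replacement for a hand-written while loop.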

In the countWords example, I use foreach but get a "cannot resolve symbol" error

Here is an example of countWords (Scala).
[original]
import scala.collection.mutable
def countWords(text: String): mutable.Map[String, Int] = {
  val counts = mutable.Map.empty[String, Int]
  for (rawWord <- text.split("[ ,!.]+")) {
    val word = rawWord.toLowerCase
    val oldCount =
      if (counts.contains(word)) counts(word)
      else 0
    counts += (word -> (oldCount + 1))
  }
  return counts
}
[my code]
Here is my code:
def countWords2(text: String): mutable.Map[String, Int] = {
  val counts = mutable.Map.empty[String, Int]s
  text.split("[ ,!.]").foreach(word =>
    val lowWord = word.toLowerCase()
    val oldCount = if (counts.contains(lowWord)) counts(lowWord) else 0
    counts += (lowWord -> (oldCount + 1))
  )
  return counts
}
I tried to convert the for() statement to foreach, but I got a "cannot resolve symbol" error message.
How do I use foreach in this case?
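A sketch of a working foreach version. Two things in the code above appear to be the problem: the stray s after mutable.Map.empty[String, Int], and a lambda whose body has several statements (including val definitions) wrapped in parentheses; such a body needs braces. The sketch also keeps the + from the original regex so that runs of separators such as ", " do not produce empty words, and uses getOrElse instead of the contains check.

```scala
import scala.collection.mutable

object WordCount {
  def countWords2(text: String): mutable.Map[String, Int] = {
    val counts = mutable.Map.empty[String, Int]
    // Braces (not parentheses) allow a multi-statement lambda body
    text.split("[ ,!.]+").foreach { word =>
      val lowWord = word.toLowerCase()
      // getOrElse replaces the contains/apply pair
      val oldCount = counts.getOrElse(lowWord, 0)
      counts += (lowWord -> (oldCount + 1))
    }
    counts
  }
}
```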

Evaluating multiple filters in a single pass

I have created the RDD below, and I need to apply a series of filters to the same dataset to derive different counters and aggregates.
Is there a way I can apply these filters and compute the aggregates in a single pass, so that Spark does not go over the same dataset multiple times?
import org.apache.spark.storage.StorageLevel
val res = df.rdd.map(row => {
  // ............... Generate data here for each row.......
})
res.persist(StorageLevel.MEMORY_AND_DISK)
val all = res.count()
val stats1 = res.filter(row => row.getInt(1) > 0)
val stats1Count = stats1.count()
val stats1Agg = stats1.map(r => r.getInt(1)).mean()
val stats2 = res.filter(row => row.getInt(2) > 0)
val stats2Count = stats2.count()
val stats2Agg = stats2.map(r => r.getInt(2)).mean()
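You can use aggregate, which computes both sets of statistics in a single pass over the data:
case class Stats(count: Long = 0L, sum: Long = 0L) {
  // Double division; NaN when count == 0
  def mean: Double = sum.toDouble / count
  // Merges two partial results (used to combine partitions)
  def +(s: Stats): Stats = Stats(count + s.count, sum + s.sum)
  // Adds one value, but only if it passes the "> 0" filter
  // (named add because <- is a reserved word in Scala)
  def add(n: Int): Stats = if (n > 0) copy(count + 1, sum + n) else this
}
val (stats1, stats2) = res.aggregate(Stats() -> Stats())(
  // seqOp: fold one row into both accumulators
  (s, row) => (s._1.add(row.getInt(1)), s._2.add(row.getInt(2))),
  // combOp: merge the per-partition tuples element-wise
  (a, b) => (a._1 + b._1, a._2 + b._2)
)
val (stats1Count, stats1Agg, stats2Count, stats2Agg) = (stats1.count, stats1.mean, stats2.count, stats2.mean)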
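The single-pass idea can be checked without a Spark cluster. This sketch folds the same Stats accumulator over a local collection with foldLeft, and simulates two partitions being merged with +, which is the role of aggregate's combOp. Each row is modeled as a hypothetical (Int, Int) pair standing in for row.getInt(1) and row.getInt(2).

```scala
object SinglePassStats {
  case class Stats(count: Long = 0L, sum: Long = 0L) {
    def mean: Double = sum.toDouble / count
    def +(s: Stats): Stats = Stats(count + s.count, sum + s.sum)
    // Only values > 0 are counted, mirroring the filters in the question
    def add(n: Int): Stats = if (n > 0) copy(count + 1, sum + n) else this
  }

  // Hypothetical rows: (column 1, column 2)
  val rows: Seq[(Int, Int)] = Seq((3, 0), (-1, 4), (5, 2), (0, 0))

  // One traversal updates both accumulators at once
  def foldRows(part: Seq[(Int, Int)]): (Stats, Stats) =
    part.foldLeft(Stats() -> Stats()) { case ((s1, s2), (c1, c2)) =>
      (s1.add(c1), s2.add(c2))
    }

  val (stats1, stats2) = foldRows(rows)

  // Two "partitions" folded separately, then merged element-wise with +
  val (left, right) = rows.splitAt(2)
  val merged: (Stats, Stats) = {
    val (l1, l2) = foldRows(left)
    val (r1, r2) = foldRows(right)
    (l1 + r1, l2 + r2)
  }
}
```

Merging per-partition results gives the same answer as one sequential fold, which is what lets Spark compute the partitions in parallel.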

Converting a For Loop to a Fold

I have to analyze an email corpus to see how many individual sentences are dominated by leet speak (e.g. lol, brb, etc.).
For each sentence I am doing the following:
var score = 0
val words = sentence.split(" ")
for (word <- words) {
  if (validWords.contains(word)) {
    score += 1
  } else if (leetWords.contains(word)) {
    score -= 1
  }
}
Is there a better way to calculate the scores using a fold?
Not a great deal different, but another option.
val words = List("one", "two", "three")
val valid = List("one", "two")
val leet = List("three")
def check(valid: List[String], invalid: List[String])(words: List[String]): Int = words.foldLeft(0) {
  case (x, word) if valid.contains(word) => x + 1
  case (x, word) if invalid.contains(word) => x - 1
  case (x, _) => x
}
val checkValidOrLeet = check(valid, leet)(_)
val count = checkValidOrLeet(words)
If not limited to fold, using sum would be more concise.
sentence.split(" ")
  .iterator
  .map(word =>
    if (validWords.contains(word)) 1
    else if (leetWords.contains(word)) -1
    else 0
  ).sum
Here's a way to do it with fold and partial application. It could still be more elegant; I'll continue to think about it.
val sentence: String = ???          // ...your data...
val validWords: List[String] = ???  // ...your valid words...
val leetWords: List[String] = ???   // ...your leet words...
def checkWord(goodList: List[String], badList: List[String])(c: Int, w: String): Int = {
  if (goodList.contains(w)) c + 1
  else if (badList.contains(w)) c - 1
  else c
}
val count = sentence.split(" ").foldLeft(0)(checkWord(validWords, leetWords))
println(count)
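A sketch that carries the fold through to the original goal of counting sentences. It assumes (the question does not define it) that a sentence is "dominated by leet speak" when its score is negative, and it stores the word lists as Sets so each contains check is a hash lookup rather than a list scan. The word lists and sentences are made up for illustration.

```scala
object LeetScore {
  // Hypothetical word lists; Sets give constant-time average lookups
  val validWords: Set[String] = Set("see", "you", "later", "thanks")
  val leetWords: Set[String] = Set("lol", "brb", "ttyl", "omg")

  // +1 for a valid word, -1 for a leet word, 0 otherwise
  def score(sentence: String): Int =
    sentence.split(" ").foldLeft(0) { (acc, word) =>
      if (validWords.contains(word)) acc + 1
      else if (leetWords.contains(word)) acc - 1
      else acc
    }

  // Assumption: "dominated by leet speak" means a negative score
  def countLeetDominated(sentences: Seq[String]): Int =
    sentences.count(s => score(s) < 0)
}
```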

org.apache.spark.SparkException: Task not serializable (scala)

I am new to Scala as well as to Spark. Please help me resolve this issue.
In the Spark shell, when I load the functions below individually, they run without any exception. When I copy these functions into a Scala object and load the same file in the Spark shell, they throw a "Task not serializable" exception in the processbatch function when it tries to parallelize.
Please find the code below:
import org.apache.spark.sql.Row
import org.apache.log4j.Logger
import org.apache.spark.sql.hive.HiveContext
object Process {
  val hc = new HiveContext(sc)
  def processsingle(wait: Int, patient: org.apache.spark.sql.Row, visits: Array[org.apache.spark.sql.Row]): String = {
    val out = new StringBuilder()
    for (x <- visits) {
      out.append(", " + x.getAs[Long]("patientid") + ":" + x.getAs[Any]("visitid"))
    }
    out.toString // the method is declared to return a String
  }
  def processbatch(batch: Int, wait: Int, patients: Array[org.apache.spark.sql.Row], visits: Array[org.apache.spark.sql.Row]) = {
    val out = sc.parallelize(patients, batch).map(r => processsingle(wait, r, visits.filter(f => f.getAs[Long]("patientid") == r.getAs[Long]("patientid")))).collect()
    for (x <- out) println(x)
  }
  def processmeasures(fetch: Int, batch: Int, wait: Int) = {
    // processStart is used at the end of this method, so it is defined here
    // (getTimeInMillis and getExecutionTime are helpers not shown in the question)
    val processStart = getTimeInMillis()
    val patients = hc.sql("SELECT patientid FROM tableName1 order by p_id").collect()
    val visit = hc.sql("SELECT patientid, visitid FROM tableName2")
    val count = patients.length
    val fetches = if (count % fetch > 0) (count / fetch + 1) else (count / fetch)
    for (i <- 0 to fetches.toInt - 1) {
      val startFetch = i * fetch
      val endFetch = math.min((i + 1) * fetch, count.toInt) - 1
      val fetchSize = endFetch - startFetch + 1
      val fetchClause = "patientid >= " + patients(startFetch).get(0) + " and patientid <= " + patients(endFetch).get(0)
      val fetchVisit = visit.filter(fetchClause).collect()
      val batches = if (fetchSize % batch > 0) (fetchSize / batch + 1) else (fetchSize / batch)
      for (j <- 0 to batches.toInt - 1) {
        val startBatch = j * batch
        val endBatch = math.min((j + 1) * batch, fetch.toInt) - 1
        println(s"Batch from $startBatch to $endBatch")
        val batchVisits = fetchVisit.filter(g => g.getAs[Long]("patientid") >= patients(i * fetch + startBatch).getLong(0) && g.getAs[Long]("patientid") <= patients(math.min(i * fetch + endBatch + 1, endFetch)).getLong(0))
        processbatch(batch, wait, patients.slice(i * fetch + startBatch, i * fetch + endBatch + 1), batchVisits)
      }
    }
    println("Processing took " + getExecutionTime(processStart) + " millis")
  }
}
You should make the Process object Serializable:
object Process extends Serializable {
  ...
}
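The answer does not spell out why this helps. One common cause, sketched here without Spark: Spark serializes task closures with Java serialization, and a closure that calls an instance method captures the enclosing instance, so that instance must be serializable too. The hypothetical classes and the isSerializable helper below show the same lambda failing to serialize when its class is not Serializable and succeeding when it is. This assumes Scala 2.12 or later, where function literals compile to serializable lambdas.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object ClosureSerialization {
  // Hypothetical helper: Java-serializes an object, as Spark does with task closures
  def isSerializable(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  // The lambda calls an instance method, so it captures `this` (a Processor)
  class Processor {
    def processSingle(x: Int): String = s"item $x"
    def task: Int => String = x => processSingle(x)
  }

  // Same shape, but the captured instance can now be serialized with the lambda
  class SerializableProcessor extends Serializable {
    def processSingle(x: Int): String = s"item $x"
    def task: Int => String = x => processSingle(x)
  }
}
```

An alternative to making the whole object Serializable is to move the function the closure needs into a small standalone object or pass only plain values into it, so the task does not drag in fields such as hc.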