Binary search not working - scala

Below is a binary search algorithm, but it is not finding the value.
I don't think this algorithm is correct?
'theArray' is initialised to an array of 0s, with the item at position 7 equal to 4.
object various {
  // O(log N)
  def binarySerachForValue(value: Int) = {
    var arraySize = 100
    var theArray = new Array[Int](arraySize)
    theArray(7) = 4
    var timesThrough = 0
    var lowIndex = 0
    var highIndex = arraySize - 1
    while (lowIndex <= highIndex) {
      var middleIndex = (highIndex + lowIndex) / 2
      if (theArray(middleIndex) < value)
        lowIndex = middleIndex + 1
      else if (theArray(middleIndex) > value)
        highIndex = middleIndex - 1
      else {
        println("Found match in index " + middleIndex)
        lowIndex = highIndex + 1
      }
      timesThrough = timesThrough + 1
    }
    timesThrough
  }                                               //> binarySerachForValue: (value: Int)Int
  binarySerachForValue(4)                         //> res0: Int = 7
}

Assuming your array is already properly sorted, you could write your search function a little more functionally using tail-call-optimized recursion as follows:
import scala.annotation.tailrec

def binarySearchForValue(value: Int, theArray: Array[Int]): Int = {
  @tailrec
  def doSearch(arr: Array[Int], index: Int = 0): Int =
    if (arr.isEmpty) -1 // nothing left to search: value is not present
    else {
      val middleIndex = arr.length / 2
      val (left, right) = arr.splitAt(middleIndex) // right starts at the middle element
      arr(middleIndex) match {
        case i if i == value => index + middleIndex
        case i if i < value  => doSearch(right.drop(1), index + middleIndex + 1)
        case _               => doSearch(left, index)
      }
    }
  doSearch(theArray)
}
Note that this could also be accomplished slightly differently, using indices instead of copying sub-arrays:
import scala.annotation.tailrec

def binarySearchForValue(value: Int, theArray: Array[Int]): Int = {
  @tailrec
  def doSearch(low: Int, high: Int): Int =
    if (low > high) -1 // empty range: value is not present
    else {
      val mid = low + (high - low) / 2
      val currval = theArray(mid)
      if (currval == value) mid
      else if (currval < value) doSearch(mid + 1, high)
      else doSearch(low, mid - 1)
    }
  doSearch(0, theArray.length - 1)
}

It looks like a proper implementation of the binary search algorithm, but you are providing an array of 0s with just one number at index 7. Binary search requires an array of sorted values (although you can implement sorting as the first step).
Here is an example of why you need a sorted array first:
Searchfor(4)
theArray = [0,4,0,0,0]
First iteration: look at theArray(2), which equals 0. 0 < 4, so use the upper half (i.e. lowIndex = middleIndex + 1):
newArray = [0,0]
Then we iterate again and eventually exit the loop without ever finding the value. With a sorted array, your technique would work well.
To find a single value in an array of 0s, your best bet is to just iterate through the array until you find it. Best of luck.
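For the unsorted case described above, a minimal sketch of the linear scan could look like this. The name linearSearch is hypothetical; indexOf is the standard collection method and returns -1 when the value is absent.

```scala
// Hypothetical helper: a linear scan that works on unsorted data.
// indexOf returns the first matching index, or -1 if there is none.
def linearSearch(arr: Array[Int], value: Int): Int = arr.indexOf(value)

val theArray = new Array[Int](100) // all zeros
theArray(7) = 4

println(linearSearch(theArray, 4)) // prints 7
println(linearSearch(theArray, 5)) // prints -1
```

This is O(N) rather than O(log N), but it does not depend on the array being sorted.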

The loop should look like this:
while (lowIndex <= highIndex) {
  // note: lowIndex plus half the distance to highIndex, which also avoids overflow
  val middleIndex = lowIndex + ((highIndex - lowIndex) / 2)
  if (theArray(middleIndex) < value)
    lowIndex = middleIndex + 1
  else if (theArray(middleIndex) > value)
    highIndex = middleIndex - 1
  else return middleIndex
  timesThrough = timesThrough + 1
}
// if the loop finished without returning middleIndex in the last else, return -1 (not found)
return -1
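Put together as a complete function, the loop might look like the sketch below. The function name, the parameter names and the sample data are assumptions for illustration; the input must already be sorted.

```scala
// A minimal sketch of an iterative binary search over a sorted array.
// Returns the index of `value`, or -1 if it is not present.
def binarySearch(sorted: Array[Int], value: Int): Int = {
  var low = 0
  var high = sorted.length - 1
  while (low <= high) {
    val mid = low + (high - low) / 2 // midpoint of the current range
    if (sorted(mid) < value) low = mid + 1
    else if (sorted(mid) > value) high = mid - 1
    else return mid
  }
  -1 // range became empty without a match
}

val data = Array(0, 0, 0, 4, 9, 15) // sorted sample data
println(binarySearch(data, 4)) // prints 3
println(binarySearch(data, 7)) // prints -1
```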

Related

How to use string.split() without foreach()?

Write a program in Scala that reads a String from the keyboard and counts the occurrences of each character, ignoring whether it is upper case or lower case.
ex: Avocado
R: A = 2; v = 1; o = 2; c = 1; d = 1;
So I tried to do it with two for loops iterating over the string, and a conditional that converts the character at position (x) to upper case and compares it with the character at position (y), also converted to upper case, so that I can increment the counter. ex: Ava -> A = 2; v = 1;
But with this logic, when I print the result, it comes out as:
ex: Avocado
R: A = 2; v = 1; o = 2; c = 1; a = 2; d = 1; o = 2;
It repeats the same character, upper or lower case, in the result...
My teacher asked us to solve this using the split method and yield of Scala, but I don't know how to use split without foreach(), which he doesn't allow us to use.
Sorry for the bad English.
import scala.io.StdIn.readLine

object ex8 {
  def main(args: Array[String]): Unit = {
    println("Write a string")
    var string = readLine()
    var cont = 0
    for (x <- 0 to string.length - 1) {
      for (y <- 0 to string.length - 1) {
        if (string.charAt(x).toUpper == string.charAt(y).toUpper)
          cont += 1
      }
      print(string.charAt(x) + " = " + cont + "; ")
      cont = 0
    }
  }
}
Scala 2.13 has added a very handy method to cover this sort of thing.
inputStr.groupMapReduce(_.toUpper)(_ => 1)(_+_)
.foreach{case (k,v) => println(s"$k = $v")}
//A = 2
//V = 1
//C = 1
//O = 2
//D = 1
It might be easier to group the individual elements of the String (i.e. a collection of Chars, made case-insensitive with toLower) to aggregate their corresponding size using groupBy/mapValues:
"Avocado".groupBy(_.toLower).mapValues(_.size)
// res1: scala.collection.immutable.Map[Char,Int] =
// Map(a -> 2, v -> 1, c -> 1, o -> 2, d -> 1)
Scala 2.11
I tried the classic word-count approach of map => group => reduce:
val exampleStr = "Avocado R"
exampleStr.
  toLowerCase.
  trim.
  replaceAll(" +", "").
  toCharArray.map(x => (x, 1)).groupBy(_._1).
  map(x => (x._1, x._2.length))
Result:
exampleStr: String = Avocado R
res3: scala.collection.immutable.Map[Char,Int] =
Map(a -> 2, v -> 1, c -> 1, r -> 1, o -> 2, d -> 1)
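Since the question specifically asks for split and yield without foreach, here is a minimal sketch of that route. The name countLetters is hypothetical, and the sketch assumes Java 8 or later, where split("") yields one string per character with no leading empty element.

```scala
// Hypothetical helper: counts letters case-insensitively using split and for/yield.
def countLetters(input: String): String = {
  // One single-character string per letter, all upper case.
  val letters = input.toUpperCase.split("")
  // distinct keeps the first occurrence of each letter, in input order.
  val counts =
    for (letter <- letters.distinct)
      yield s"$letter = ${letters.count(_ == letter)}"
  counts.mkString("; ")
}

println(countLetters("Avocado")) // prints: A = 2; V = 1; O = 2; C = 1; D = 1
```

Iterating over the distinct letters rather than over every position is what avoids the repeated entries in the original output.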

How to get multiple adjacent data in a RDD with Scala Spark

I have an RDD whose values are 0 or 1, and a limit of 4. When I map the RDD, if a value is 1, then the limit values starting at the current position should all become 1; otherwise the value stays 0.
Example:
input : 1,0,0,0,0,0,1,0,0
expected output : 1,1,1,1,0,0,1,1,1
This is what I have tried so far:
import scala.collection.mutable.ArrayBuffer

val rdd = sc.parallelize(Array(1, 0, 0, 0, 0, 0, 1, 0, 0))
val limit = 4
val resultlimit = rdd.mapPartitions(parIter => {
  var result = new ArrayBuffer[Int]()
  var resultIter = new ArrayBuffer[Int]()
  while (parIter.hasNext) {
    val iter = parIter.next()
    resultIter.append(iter)
  }
  var i = 0
  while (i < resultIter.length) {
    result.append(resultIter(i))
    if (resultIter(i) == 1) {
      var j = 1
      while (j + i < resultIter.length && j < limit) {
        result.append(1)
        j += 1
      }
      i += j
    } else {
      i += 1
    }
  }
  result.toIterator
})
resultlimit.foreach(println)
The result of resultlimit is RDD:[1,1,1,1,0,0,1,1,1]
My quick and dirty approach is to first create an Array but that is so ugly and inefficient.
Is there any cleaner solution?
Plain and simple. Import RDDFunctions:
import org.apache.spark.mllib.rdd.RDDFunctions._
Define a limit:
val limit: Int = 4
Prepend limit - 1 zeros to the first partition:
val extended = rdd.mapPartitionsWithIndex {
  case (0, iter) => Seq.fill(limit - 1)(0).toIterator ++ iter
  case (_, iter) => iter
}
Slide over the RDD:
val result = extended.sliding(limit).map {
  slice => if (slice.exists(_ != 0)) 1 else 0
}
Check the result:
val expected = Seq(1,1,1,1,0,0,1,1,1)
require(expected == result.collect.toSeq)
On a side note, your current approach doesn't account for partition boundaries, so the result will vary depending on how the source RDD is partitioned.
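The same prepend-and-slide idea can be tried without a Spark cluster. The sketch below uses plain Scala collections instead of an RDD; the function name spread is hypothetical and the input is assumed to be non-empty.

```scala
// A minimal sketch on plain collections (no Spark): each output element is 1
// if any of the `limit` elements ending at that position is non-zero.
def spread(xs: Seq[Int], limit: Int): Seq[Int] =
  (Seq.fill(limit - 1)(0) ++ xs)      // pad so the first window ends at xs(0)
    .sliding(limit)                   // one window per original element
    .map(window => if (window.exists(_ != 0)) 1 else 0)
    .toList

println(spread(Seq(1, 0, 0, 0, 0, 0, 1, 0, 0), 4))
// prints List(1, 1, 1, 1, 0, 0, 1, 1, 1)
```

Because every window looks only backwards over limit elements, the result does not depend on how the data is later split up, which is what the RDD version relies on.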
The following is an improved approach to your requirement. The three while loops are reduced to one for loop, and the two ArrayBuffers are reduced to one, so both processing time and memory usage are reduced.
val resultlimit = rdd.mapPartitions(parIter => {
  var result = new ArrayBuffer[Int]()
  var limit = 0
  for (value <- parIter) {
    if (value == 1) limit = 4
    if (limit > 0) {
      result.append(1)
      limit -= 1
    }
    else {
      result.append(value)
    }
  }
  result.toIterator
})
Edited
The above solution applies when you don't define partitions explicitly in the original RDD. But when partitions are defined, as in
val rdd = sc.parallelize(Array(1,1,0,0,0,0,1,0,0), 4)
we need to collect the RDD first, because the above solution is executed on each partition separately.
So the following solution should work:
var result = new ArrayBuffer[Int]()
var limit = 0
for (value <- rdd.collect()) {
  if (value == 1) limit = 4
  if (limit > 0) {
    result.append(1)
    limit -= 1
  }
  else {
    result.append(value)
  }
}
result.foreach(println)

Counting by range

The following script can be used to "count by" keys:
val nbr = List(1,2,2,3,3,3,4,4,4,4)
val nbrPairsRDD = sc.parallelize(nbr).map(nbr => (nbr, 1))
val nbrCountsWithReduce = nbrPairsRDD
.reduceByKey(_ + _)
.collect()
nbrCountsWithReduce.foreach(println)
It returns:
(1,1)
(2,2)
(3,3)
(4,4)
How could it be modified to map by range rather than absolute values and give the following output if we had two ranges 1:2 and 3:4:
(1:2,3)
(3:4,7)
One option is to convert the list into double and use the histogram function:
val nbr = List(1,2,2,3,3,3,4,4,4,4)
val nbrPairsRDD = sc.parallelize(nbr).map(_.toDouble).histogram(2)
One easy way that I can think of is to map the keys to individual ranges, e.g.:
// function to compute ranges
def computeRange(num: Int): String = {
  if (num < 3) "1:2"
  else if (num < 5) "3:4"
  else "invalid"
}
val nbrRangePairs = sc.parallelize(nbr)
  .map(nbr => (computeRange(nbr), 1))
  .reduceByKey(_ + _)
  .collect()
Here is the code snippet to compute aggregations by range:
val nbr = List(1,2,2,3,3,3,4,4,4,4)
val nbrs = sc.parallelize(nbr)
var lb = 1
var incr = 1
var ub = lb + incr
val nbrsMap = nbrs.map(rec => {
  if (rec > ub) {
    lb = rec
    ub = lb + incr
  }
  (lb.toString + ":" + ub.toString, 1)
})
nbrsMap.reduceByKey((acc, value) => acc + value).foreach(println)
Output:
(1:2,3)
(3:4,7)
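If the ranges all have the same width, the range label can also be computed arithmetically instead of with mutable state. The sketch below uses plain Scala collections rather than an RDD; the name countByRange is hypothetical, and it assumes positive integers with ranges starting at 1.

```scala
// A minimal sketch: bucket positive integers into fixed-width ranges
// (1:2, 3:4, ... for width 2) and count the members of each bucket.
def countByRange(xs: Seq[Int], width: Int): Map[String, Int] =
  xs.groupBy { n =>
      val lower = ((n - 1) / width) * width + 1 // first value of n's range
      s"$lower:${lower + width - 1}"
    }
    .map { case (range, members) => range -> members.size }

val nbr = List(1, 2, 2, 3, 3, 3, 4, 4, 4, 4)
println(countByRange(nbr, 2)) // contains 1:2 -> 3 and 3:4 -> 7
```

The same key function could be passed to map on an RDD before reduceByKey, since it does not depend on the order in which records are processed.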

how to add adjacent values in Array[Double]

// array: Array[Double]
val values = array.sliding(2).map(x => x.reduce(_ + _) / 2)
This works. But if the array contains 10000 or more values, it takes a long time to compute them. Is there a faster way to compute the averages of adjacent values?
I think this should be faster:
val values = (for(i <- 0 until array.length - 1) yield ((array(i) + array(i + 1)) / 2)).toArray
Going low-level:
var i = 0
val valuesLength = array.length - 1
val values = new Array[Double](valuesLength)
while (i < valuesLength) {
  values(i) = (array(i) + array(i + 1)) / 2
  i += 1
}
Of course, you should only do this if this is actually a bottleneck in your program.

Scala Branch And Bound Motif Search

The code below searches for a motif (of length 8) in a sequence (String) and should return the sequence with the best score. The problem is that, although the code produces no errors, there is no output at all (probably an infinite loop; I only see a blank console).
I am posting all my code here in case it is required. To reproduce the problem, just pass a number between 0 and 3 as args(0) (e.g. "0"). There are 4 sequence sets, so you must choose one of them: 0 is the first, 1 is the second, etc. The expected output should look something like "Motif = ctgatgta".
import scala.util.control._
object BranchAndBound {
var seq: Array[String] = new Array[String](20)
var startPos: Array[Int] = new Array[Int](20)
var pickup: Array[String] = new Array[String](20)
var bestMotif: Array[Int] = new Array[Int](20)
var ScoreMatrix = Array.ofDim[Int](5, 20)
var i: Int = _
var j: Int = _
var lmer: Int = _
var t: Int = _
def main(args: Array[String]) {
var t1: Long = 0
var t2: Long = 0
t1 = 0
t2 = 0
t1 = System.currentTimeMillis()
val seq0 = Array(
Array(
" >5 regulatory reagions with 69 bp",
" cctgatagacgctatctggctatccaggtacttaggtcctctgtgcgaatctatgcgtttccaaccat",
" agtactggtgtacatttgatccatacgtacaccggcaacctgaaacaaacgctcagaaccagaagtgc",
" aaacgttagtgcaccctctttcttcgtggctctggccaacgagggctgatgtataagacgaaaatttt",
" agcctccgatgtaagtcatagctgtaactattacctgccacccctattacatcttacgtccatataca",
" ctgttatacaacgcgtcatggcggggtatgcgttttggtcgtcgtacgctcgatcgttaccgtacggc"),
Array(
" 2 columns mutants",
" cctgatagacgctatctggctatccaggtacttaggtcctctgtgcgaatctatgcgtttccaaccat",
" agtactggtgtacatttgatccatacgtacaccggcaacctgaaacaaacgctcagaaccagaagtgc",
" aaacgttagtgcaccctctttcttcgtggctctggccaacgagggctgatgtataagacgaaaattttt",
" agcctccgatgtaagtcatagctgtaactattacctgccacccctattacatcttacgtccatataca",
" ctgttatacaacgcgtcatggcggggtatgcgttttggtcgtcgtacgctcgatcgttaccgtacggc"),
Array(
" 2 columns mutants",
" cctgatagacgctatctggctatccaggtacttaggtcctctgtgcgaatctatgcgtttccaaccat",
" agtactggtgtacatttgatccatacgtacaccggcaacctgaaacaaacgctcagaaccagaagtgc",
" aaacgttagtgcaccctctttcttcgtggctctggccaacgagggctgatgtataagacgaaaattttt",
" agcctccgatgtaagtcatagctgtaactattacctgccacccctattacatcttacgtccatataca",
" ctgttatacaacgcgtcatggcggggtatgcgttttggtcgtcgtacgctcgatcgttaccgtacggc"),
Array(
" 2 columns mutants",
" cctgatagacgctatctggctatccaggtacttaggtcctctgtgcgaatctatgcgtttccaaccat",
" agtactggtgtacatttgatccatacgtacaccggcaacctgaaacaaacgctcagaaccagaagtgc",
" aaacgttagtgcaccctctttcttcgtggctctggccaacgagggctgatgtataagacgaaaattttt",
" agcctccgatgtaagtcatagctgtaactattacctgccacccctattacatcttacgtccatataca",
" ctgttatacaacgcgtcatggcggggtatgcgttttggtcgtcgtacgctcgatcgttaccgtacggc"))
var k: Int = 0
var m: Int = 0
var n: Int = 0
var bestScore: Int = 0
var optScore: Int = 0
var get: Int = 0
var ok1: Boolean = false
var ok3: Boolean = false
ok1 = false
ok3 = false
j = 1
lmer = 8
m = 1
t = 5
n = 69
optScore = 0
bestScore = 0
k = java.lang.Integer.parseInt(args(0))
j = 1
while (j <= t) {
seq(j) = new String()
i = 0
while (i < n) {
seq(j) += seq0(k)(j).charAt(i)
i += 1
}
j += 1
}
j = 1
while (j <= t) {
newPickup(1, j)
j += 1
}
j = 0
bestScore = 0
i = 1
val whilebreaker = new Breaks
whilebreaker.breakable {
while (i > 0) {
if (i < t) {
if (startPos(1) == (n - lmer)) whilebreaker.break
val sc = Score()
optScore = sc + (t - i) * lmer
if (optScore < bestScore) {
ok1 = false
j = i
val whilebreak1 = new Breaks
whilebreak1.breakable {
while (j >= 1) {
if (startPos(j) < n - lmer) {
ok1 = true
newPickup(0, j)
whilebreak1.break
} else {
ok1 = true
newPickup(1, j)
val whilebreak2 = new Breaks
whilebreak2.breakable {
while (startPos(i - 1) == (n - lmer)) {
newPickup(1, i - 1)
i -= 1
if (i == 0) whilebreak2.break
}
}
if (i > 1) {
newPickup(0, i - 1)
i -= 1
}
whilebreak1.break
}
}
}
if (ok1 == false) i = 0
} else {
newPickup(1, i + 1)
i += 1
}
} else {
get = Score()
if (get > bestScore) {
bestScore = get
m = 1
while (m <= t) {
bestMotif(m) = startPos(m)
m += 1
}
}
ok3 = false
j = t
val whilebreak3 = new Breaks
whilebreak3.breakable {
while (j >= 1) {
if (startPos(j) < n - lmer) {
ok3 = true
newPickup(0, j)
whilebreak3.break
} else {
ok3 = true
newPickup(1, j)
val whilebreak4 = new Breaks
whilebreak4.breakable {
while (startPos(i - 1) == (n - lmer)) {
newPickup(1, i - 1)
i -= 1
if (i == 0) whilebreak4.break
}
}
if (i > 1) {
newPickup(0, i - 1)
i -= 1
}
whilebreak3.break
}
}
}
if (ok3 == false) i = 0
}
}
}
println("Motiv: " + Consensus())
// println()
j = 1
while (j <= t) {
t2 = System.currentTimeMillis()
j += 1
}
println("time= " + (t2 - t1) + " ms")
}
def Score(): Int = {
var j: Int = 0
var k: Int = 0
var m: Int = 0
var max: Int = 0
var sum: Int = 0
sum = 0
max = 0
m = 1
while (m <= lmer) {
k = 1
while (k <= 4) {
ScoreMatrix(k)(m) = 0
k += 1
}
m += 1
}
m = 1
while (m <= lmer) {
k = 1
while (k <= i) pickup(k).charAt(m) match {
case 'a' => ScoreMatrix(1)(m) += 1
case 'c' => ScoreMatrix(2)(m) += 1
case 'g' => ScoreMatrix(3)(m) += 1
case 't' => ScoreMatrix(4)(m) += 1
}
m += 1
}
j = 1
while (j <= lmer) {
max = 0
m = 1
while (m <= 4) {
if (ScoreMatrix(m)(j) > max) {
max = ScoreMatrix(m)(j)
}
m += 1
}
sum += max
j += 1
}
sum
}
def Consensus(): String = {
var i: Int = 0
var j: Int = 0
var k: Int = 0
var m: Int = 0
var max: Int = 0
var imax: Int = 0
var str: String = null
i = 1
while (i <= t) {
pickup(i) = " " +
seq(i).substring(bestMotif(i), bestMotif(i) + lmer)
i += 1
}
m = 1
while (m <= lmer) {
k = 1
while (k <= 4) {
ScoreMatrix(k)(m) = 0
k += 1
}
m += 1
}
m = 1
while (m <= lmer) {
k = 1
while (k <= t) pickup(k).charAt(m) match {
case 'a' => ScoreMatrix(1)(m) += 1
case 'c' => ScoreMatrix(2)(m) += 1
case 'g' => ScoreMatrix(3)(m) += 1
case 't' => ScoreMatrix(4)(m) += 1
}
m += 1
}
str = ""
imax = 0
j = 1
while (j <= lmer) {
max = 0
i = 1
while (i <= 4) {
if (ScoreMatrix(i)(j) > max) {
max = ScoreMatrix(i)(j)
imax = i
}
i += 1
}
imax match {
case 1 => str += 'a'
case 2 => str += 'c'
case 3 => str += 'g'
case 4 => str += 't'
}
j += 1
}
str
}
def newPickup(one: Int, h: Int) {
if (one == 1) startPos(h) = 1 else startPos(h) += 1
pickup(h) = " " + seq(h).substring(startPos(h), startPos(h) + lmer)
}
}
Thanks, I hope someone can find my mistake.
Your current implementation 'hangs' on this loop:
while (k <= i) pickup(k).charAt(m) match {
  case 'a' => ScoreMatrix(1)(m) += 1
  case 'c' => ScoreMatrix(2)(m) += 1
  case 'g' => ScoreMatrix(3)(m) += 1
  case 't' => ScoreMatrix(4)(m) += 1
}
As it stands, the exit condition is never fulfilled because the relation between k and i never changes. Either increment k or decrement i.
It looks like programming is not the key aspect of this work, but increased modularity should help contain complexity.
Also, I wonder about the choice of Scala. There are many areas in this algorithm that would benefit from a more functional approach. In this translation, using Scala in an imperative way gets cumbersome. If you have the opportunity, I'd recommend exploring a more functional approach to this problem.
A tip: the IntelliJ debugger didn't have issues with this code.
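To give an idea of what a more functional style could look like, here is a minimal sketch of the scoring and consensus steps only. All names are hypothetical and this is not a translation of the full branch-and-bound search; it assumes all l-mers have the same length and contain only the characters a, c, g and t.

```scala
// Hypothetical helper: for each column, how often each base occurs.
def columnCounts(lmers: Seq[String]): Seq[Map[Char, Int]] =
  lmers.map(_.toList).transpose.map { column =>
    column.groupBy(identity).map { case (base, hits) => base -> hits.size }
  }

// Score: sum over all columns of the count of the most frequent base.
def score(lmers: Seq[String]): Int =
  columnCounts(lmers).map(_.values.max).sum

// Consensus: the most frequent base of each column, joined into a string.
// Which base wins a tie is unspecified in this sketch.
def consensus(lmers: Seq[String]): String =
  columnCounts(lmers).map(_.maxBy(_._2)._1).mkString

val picked = Seq("acgt", "acga", "tcga")
println(score(picked))     // prints 10
println(consensus(picked)) // prints acga
```

There are no loop counters here, so there is no counter to forget to increment, which is exactly the kind of bug behind the hang.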