Scala, iterating a collection, working out 10% points

While iterating an arbitrarily-sized List, I'd like to print some output at ~10% intervals to show that the iteration is progressing. For any list of 10 or more elements, I want 10 outputs printed.
I've played around with % and Math functions, but am not always getting 10 outputs printed unless the list sizes are multiples of 10. Would appreciate your help.

One possibility is to calculate 10% of the size based on your input, and then use IterableLike.grouped to group based on that percent:
import scala.util.Random

object Test {
  def main(args: Array[String]): Unit = {
    val range = 0 to Math.abs(Random.nextInt(100))
    val length = range.length
    val percent = Math.ceil((10.0 * length) / 100.0).toInt
    println(s"Printing by $percent percent")
    range.grouped(percent).foreach { listByPercent =>
      println(s"Printing $percent elements: ")
      listByPercent.foreach(println)
    }
  }
}

Unless the length of your list is divisible by 10, you are not going to get exactly 10 print statements. Here I am rounding the interval up (ceil), so you will get fewer print statements. You could use Math.floor instead, which rounds the interval down and gives you more print statements.
// Some list
val list = List.range(0, 27)
// Find the interval that is roughly 10 percent
val interval = Math.ceil(list.length / 10.0)
// Zip the list with the index, so that we can look at the indexes
list.zipWithIndex.foreach {
  case (value, index) =>
    // If an index is divisible by our interval, do your logging
    if (index % interval == 0) {
      println(s"$index / ${list.length}")
    }
    // Do something with the value here
}
Output:
0 / 27
3 / 27
6 / 27
9 / 27
12 / 27
15 / 27
18 / 27
21 / 27
24 / 27
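If you want exactly 10 progress messages for any list of 10 or more elements (which neither rounding direction guarantees), a minimal sketch is to track which 10%-decile the current position falls into and print whenever it changes (the variable names here are just illustrative):
val list = List.range(0, 27)
val total = list.length
list.zipWithIndex.foreach { case (value, index) =>
  // Decile reached after processing this element vs. before it (0 to 10).
  // For total >= 10 the decile advances by at most one per element,
  // so this prints exactly 10 times, ending at 100%.
  val before = index * 10 / total
  val after = (index + 1) * 10 / total
  if (after != before) println(s"${after * 10}% (${index + 1} / $total)")
  // do something with value here
}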

Related

Creating an optimal selection of overlapping time intervals

A car dealer rents out the rare 1956 Aston Martin DBR1 (of which Aston Martin only ever made 5).
Since there are so many rental requests, the dealer decides to place bookings for an entire year in advance.
He collects the requests and now needs to figure out which requests to take.
Make a script that selects the rental requests such that the greatest number of individual customers
can drive the rare Aston Martin.
The input of the script is a matrix of days of the year, each row representing the starting and ending
days of the request. The output should be the indices of the customers and their day ranges.
It is encouraged to plan your code first and write your own functions.
At the top of the script, add a comment block with a description of how your code works.
Example of a list with these time intervals:
list = [10 20; 9 15; 16 17; 21 100;];
(It should also work for a list with 100 time intervals)
We could select customers 1 and 4, but then 2 and 3 are impossible, resulting in two happy customers.
Alternatively we could select requests 2, 3 and 4. Hence three happy customers is the optimum here.
The output would be:
customers = [2, 3, 4],
days = [9, 15; 16, 17; 21, 100]
All I can think of is checking if intervals intersect, but I have no clue how to make an overall optimal selection.
My idea:
1) Sort them by start date
2) Make an array of intersections for each one
3) Start rejecting the ones with the biggest intersection arrays, removing each rejected item from the intersection arrays of the units it intersected
4) Repeat step 3 until only units with empty arrays remain
In your example we get this data:
10 20 [9 15, 16 17]
9 15 [10 20]
16 17 [10 20]
21 100 []
so we reject 10 20 as it has 2 intersections, leaving only items with empty arrays
9 15 []
16 17 []
21 100 []
so the search is finished
Code in JavaScript:
const inputData = ' 50 74; 6 34; 147 162; 120 127; 98 127; 120 136; 53 68; 145 166; 95 106; 242 243; 222 250; 204 207; 69 79; 183 187; 198 201; 184 199; 223 245; 264 291; 100 121; 61 61; 232 247'
// convert string to array of objects
// each segment begins with a space, so split(' ') yields ['', start, end]
const orders = inputData.split(';')
  .map((v, index) => ({
    id: index,
    start: Number(v.split(' ')[1]),
    end: Number(v.split(' ')[2]),
    intersections: []
  }))
// sort them by start value
orders.sort((a, b) => a.start - b.start)
// find intersections for each one and add them to the intersections array
orders.forEach((item, index) => {
  for (let i = index + 1; i < orders.length; i++) {
    if (orders[i].start <= item.end) {
      item.intersections.push(orders[i])
      orders[i].intersections.push(item)
    } else {
      break
    }
  }
})
// sort by intersection count
orders.sort((a, b) => a.intersections.length - b.intersections.length)
// loop while at least one item still has intersections
while (orders[orders.length - 1].intersections.length > 0) {
  const rejected = orders.pop()
  // remove the rejected item from the others' intersections
  rejected.intersections.forEach(item => {
    item.intersections = item.intersections.filter(
      other => other.id !== rejected.id
    )
  })
  // sort by intersection count again
  orders.sort((a, b) => a.intersections.length - b.intersections.length)
}
// sort by start value
orders.sort((a, b) => a.start - b.start)
// show result
orders.forEach(item => { console.log(item.start + ' - ' + item.end) })
Wanted to expand/correct a little bit on the accepted answer.
You should start by sorting by the start date.
Then accept the very last customer.
Go through the list descending from there and accept all requests that do not overlap with the already accepted ones.
That's the optimal solution.
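That greedy strategy is the classic interval-scheduling algorithm (the mirror image of sorting by end day and always taking the earliest-ending compatible request). A minimal Scala sketch of the version described above, assuming inclusive day ranges and 1-based customer indices (the function and variable names are my own):
def selectRequests(requests: Seq[(Int, Int)]): Seq[Int] = {
  // keep the original (1-based) customer index, sort by start day,
  // then scan from the latest-starting request backwards, accepting
  // whatever ends before the most recently accepted request starts
  val byStart = requests.zipWithIndex
    .map { case ((s, e), i) => (s, e, i + 1) }
    .sortBy(_._1)
  byStart.foldRight(List.empty[(Int, Int, Int)]) {
    case (req @ (_, end, _), accepted) =>
      accepted match {
        case (lastStart, _, _) :: _ if end >= lastStart => accepted // overlaps, skip
        case _ => req :: accepted
      }
  }.map(_._3)
}
// Example from the question:
selectRequests(Seq((10, 20), (9, 15), (16, 17), (21, 100))) // List(2, 3, 4)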

How would I constantly increase a value after a certain amount of time?

I'm trying to figure out how to increase a variable by + 20 every 10 seconds, any simple way to do this?
This is how I might do it.
import java.time.LocalTime
import java.time.temporal.ChronoUnit.SECONDS
class Clocker(initial: Long, increment: Long, interval: Long) {
  private val start = LocalTime.now()

  def get: Long =
    initial + SECONDS.between(start, LocalTime.now()) / interval * increment
}
usage:
// start from 7, increase by 20 every 10 seconds
val clkr = new Clocker(7, 20, 10)
clkr.get //res0: Long = 7
// 11 seconds later
clkr.get //res1: Long = 27
// 19 seconds later
clkr.get //res2: Long = 27
// 34 seconds later
clkr.get //res3: Long = 67
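If you would rather have the variable actually mutated in the background every 10 seconds (instead of computed on demand as above), a small sketch using a scheduled executor could look like this (my own variant; the names are illustrative):
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicLong

// start from 7, add 20 every 10 seconds on a background thread
val counter = new AtomicLong(7)
val scheduler = Executors.newSingleThreadScheduledExecutor()
scheduler.scheduleAtFixedRate(new Runnable {
  def run(): Unit = { counter.addAndGet(20); () }
}, 10, 10, TimeUnit.SECONDS)

counter.get()          // read the current value at any time
// scheduler.shutdown() // stop updating when it is no longer needed
The Clocker approach above avoids any background thread, which is usually the simpler choice.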

How to efficiently perform nested-loop in Spark/Scala?

So I have this main dataframe, called main_DF, which contains all measurement values:
main_DF
group  index   width  height
--------------------------------
1      1       21.3   15.2
1      2       11.3   45.1
2      3       23.2   25.2
2      4       26.1   85.3
...
23     986453  26.1   85.3
And another table called selected_DF, derived from main_DF, which contains the start & end index of important rows in main_DF, along with the length (end_index - start_index). The fields start_index and end_index correspond to the field index in main_DF.
selected_DF
group  start_index  end_index  length
--------------------------------------
1      1            154        153
2      236          312        76
3      487          624        137
...
238    17487        18624      1137
Now, for each row in selected_DF, I need to filter all measurement values whose index falls between start_index and end_index. For example, let's say row 1 covers index = 1 to 154. After some filtering, the dataframe derived from this row is:
peak_DF
peak_start  peak_end
--------------------------------
1           12
15          21
27          54
86          91
...
143         150
peak_start and peak_end indicate an area where width exceeds the threshold. They were obtained by selecting all rows with width > threshold and then checking the positions of their indexes (sorry, it's kind of hard to explain, even with the code).
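To illustrate the idea in plain Scala (just a rough sketch, not the actual Spark code; in the real code below, indexes up to 9 apart still belong to the same peak, and the names here are illustrative):
// hitIndices: indexes where width > threshold, in ascending order;
// two hits belong to the same peak when they are at most maxGap apart
def peaks(hitIndices: Seq[Int], maxGap: Int): Seq[(Int, Int)] =
  hitIndices.foldLeft(List.empty[(Int, Int)]) {
    case ((start, end) :: rest, i) if i - end <= maxGap => (start, i) :: rest // extend current peak
    case (acc, i) => (i, i) :: acc // start a new peak
  }.reverse
// peaks(Seq(1, 2, 5, 12, 30, 31), maxGap = 9) == List((1, 12), (30, 31))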
Then I need to take the measurement value (width) based on peak_DF and calculate the average, making it something like:
peak_DF_summary
peak_start  peak_end  avg_width
--------------------------------
1           12        25.6
15          21        35.7
27          54        24.2
86          91        76.6
...
143         150       13.1
And, lastly, calculate the average of avg_width, and save the result.
After that, the process moves on to the next row in selected_DF, and so on.
So far I somehow managed to obtain what I want with this code:
val main_DF = spark.read.parquet("hdfs_path_here")
main_DF.createOrReplaceTempView("main_DF")
val selected_DF = spark.read.parquet("hdfs_path_here").collect.par // parallelized array
val final_result_array = scala.collection.mutable.ArrayBuffer.empty[Double] // for the final result
selected_DF.foreach { x =>
  val start_index = x(1) // selected_DF columns: group, start_index, end_index, length
  val end_index = x(2)
  // obtain peak_start and peak_end (START)
  // window_spec (a window ordered by index) is assumed to be defined elsewhere
  val temp_df_1 = spark.sql("SELECT index, width, height FROM main_DF WHERE width > 25 AND index BETWEEN " + start_index + " AND " + end_index)
  val temp_df_2 = temp_df_1.withColumn("next_index", lead(temp_df_1("index"), 1).over(window_spec))
                           .withColumn("previous_index", lag(temp_df_1("index"), 1).over(window_spec))
  val temp_df_3 = temp_df_2.withColumn("rear_gap", temp_df_2.col("index") - temp_df_2.col("previous_index"))
                           .withColumn("front_gap", temp_df_2.col("next_index") - temp_df_2.col("index"))
  val temp_df_4 = temp_df_3.filter("front_gap > 9 or rear_gap > 9")
  val temp_df_5 = temp_df_4.withColumn("next_front_gap", lead(temp_df_4("front_gap"), 1).over(window_spec))
                           .withColumn("next_front_gap_index", lead(temp_df_4("index"), 1).over(window_spec))
  val temp_df_6 = temp_df_5.filter("rear_gap > 9 and next_front_gap > 9").sort("index")
  // obtain peak_start and peak_end (END)
  val peak_DF = temp_df_6.select("index", "next_front_gap_index").toDF("peak_start", "peak_end").collect
  val peak_DF_temp = peak_DF.map { y =>
    spark.sql("SELECT avg(width) AS avg_width FROM main_DF WHERE index BETWEEN " + y(0) + " AND " + y(1))
  }
  val peak_DF_summary = peak_DF_temp.reduceLeft((dfa, dfb) => dfa.unionAll(dfb))
  val avg_width = peak_DF_summary.agg(mean("avg_width")).as[Double].first
  final_result_array += avg_width
}
spark.catalog.dropTempView("main_DF")
(reference)
The problem is, the code only runs about halfway (20-30 iterations) before it crashes with java.lang.OutOfMemoryError: Java heap space. It runs okay when I run the iterations one by one, though.
So my questions are:
1) How can there be insufficient memory? I thought the cause would be accumulated memory usage, so I added .unpersist() for every dataframe inside the foreach loop (even though I do no .persist()), to no avail. But shouldn't each iteration's memory be released when the variables are re-initialized on the next pass of the foreach loop?
2) Is there any efficient way to do this kind of calculation? I am doing a nested loop in Spark and I feel this is a very inefficient way to do it, but so far it's the only way I can get the result.
I'm using CDH 5.7 with Spark 2.1.0. My cluster has 6 nodes with 32GB memory (each) & 40 cores (total). main_DF is based on a 30GB parquet file.

Count number of repeats in Swift

I want to know how I am supposed to count the number of times a loop has repeated itself. More specifically, how do I extract and output the number of repeats?
var x = 20
while x < 100 {
    x += 10
}
The loop has executed 8 times in order to get x == 100. Is there a way to extract the number '8' so it can be used somewhere else (e.g. to make it a variable elsewhere)?
You said it yourself: you want to count. So count!
var x = 20
var numtimes = 0
while x < 100 {
    x += 10
    numtimes += 1 // count!
}
numtimes // 8

Number of Cycles from list of values, which are mix of positives and negatives in Spark and Scala

I have an RDD with a List of values, which are a mix of positives and negatives.
I need to compute the number of cycles from this data.
For example,
val range = List(sampleRange(2020,2030,2040,2050,-1000,-1010,-1020,Starting point,-1030,2040,-1020,2050,2040,2020,end point,-1060,-1030,-1010))
The interval between each value in the above list is 1 second, i.e., 2020 and 2030 are recorded 1 second apart, and so on.
I need to count how many times it turns from negative to positive and stays positive for >= 2 seconds.
If it stays for >= 2 seconds, it is a cycle.
Number of cycles: Logic
Example 1: List(1,2,3,4,5,6,-15,-66)
No. of cycles is 1.
Reason: As we move from the 1st element of the list to the 6th element, we have 5 intervals, which means 5 seconds. So that is one cycle.
When we reach the 7th element, it is a negative value, so we start counting from the 7th element and move to the 8th. There are only 2 negative values and only 1 interval, so this is not counted as a cycle.
Example 2:
List(11,22,33,-25,-36,-43,20,25,28)
No. of cycles is 3.
Reason: As we move from the 1st element of the list to the 3rd element, we have 2 intervals, which means 2 seconds. So that is one cycle. When we reach the 4th element, it is a negative value, so we start counting from the 4th element and move to the 5th and 6th; we have 2 intervals, which means 2 seconds. So that is another cycle. When we reach the 7th element, it is a positive value, so we start counting from the 7th element and move to the 8th and 9th; we have 2 intervals, which means 2 seconds. So that is a third cycle.
range is an RDD in the use case. It looks like:
scala> range
range: Seq[com.Range] = List(XtreamRange(858,890,899,920,StartEngage,-758,-790,-890,-720,920,940,950))
You can encode this "how many times it turns from negative to positive and stays positive for >= 2 seconds. If >= 2 seconds it is a cycle." pretty much directly into a pattern match with a guard. The expression if(h < 0 && ht > 0 && hht > 0) checks for a cycle and adds one to the result then continues with the rest of the list.
def countCycles(xs: List[Int]): Int = xs match {
  case Nil => 0
  case h :: ht :: hht :: t if h < 0 && ht > 0 && hht > 0 => 1 + countCycles(t)
  case h :: t => countCycles(t)
}
scala> countCycles(range)
res7: Int = 1
A one liner
range.sliding(3).count{case f::s::t::Nil => f < 0 && s > 0 && t > 0}
This generates all sub-sequences of length 3 and counts how many are -ve, +ve, +ve
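For example, with an illustrative input of my own:
scala> List(-1, 2, 3, -4, 5, 6).sliding(3).count { case f :: s :: t :: Nil => f < 0 && s > 0 && t > 0 }
res0: Int = 2
Only the windows List(-1, 2, 3) and List(-4, 5, 6) satisfy the predicate.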
Generalising cycle length
def countCycles(n:Int, xs:List[Int]) = xs.sliding(n+1)
.count(ys => ys.head < 0 && ys.tail.forall(_ > 0))
The code below should help you resolve your query.
object CycleCheck {
  def main(args: Array[String]) {
    var data3 = List(1, 4, 82, -2, -12, "startingpoint", -9, 32, 76, 45, -98, 76, "Endpoint", -24)
    var data2 = data3.map(x => getInteger(x)).filter(_ != "unknown").map(_.toString.toInt)
    println(data2)
    var nCycle = findNCycle(data2)
    println(nCycle)
  }

  def getInteger(obj: Any) = obj match {
    case n: Int => obj
    case _ => "unknown"
  }

  def findNCycle(obj: List[Int]): Int = {
    var cycleCount = 0
    var sign = ""
    var signCheck = "+"
    var size = obj.size - 1
    var numberOfCycles = 0
    var i = 0
    for (x <- obj) {
      if (x < 0) {
        sign = "-"
      } else if (x > 0) {
        sign = "+"
      }
      if (signCheck.equals(sign))
        cycleCount = cycleCount + 1
      if (!signCheck.equals(sign) && cycleCount > 1) {
        cycleCount = 1
        numberOfCycles = numberOfCycles + 1
      }
      if (size == i && cycleCount > 1)
        numberOfCycles = numberOfCycles + 1
      if (cycleCount == 1)
        signCheck = sign
      i = i + 1
    }
    return numberOfCycles
  }
}