Can only do 4 concurrent futures as maximum in Scala

I thought that using futures would easily allow me to fire off one-shot code blocks; however, it seems I can only have 4 futures at a time.
Where does this restriction come from, or am I abusing Futures by using it like this?
import scala.concurrent._
import ExecutionContext.Implicits.global
import scala.util.{Failure, Success}
import java.util.Calendar

object Main extends App {
  val rand = scala.util.Random

  for (x <- 1 to 100) {
    val f = Future {
      //val sleepTime = rand.nextInt(1000)
      val sleepTime = 2000
      Thread.sleep(sleepTime)
      val today = Calendar.getInstance().getTime()
      println("Future: " + x + " - sleep was: " + sleepTime + " - " + today)
      1
    }
  }

  Thread.sleep(10000)
}
Output:
Future: 3 - sleep was: 2000 - Mon Aug 31 10:02:44 CEST 2015
Future: 2 - sleep was: 2000 - Mon Aug 31 10:02:44 CEST 2015
Future: 4 - sleep was: 2000 - Mon Aug 31 10:02:44 CEST 2015
Future: 1 - sleep was: 2000 - Mon Aug 31 10:02:44 CEST 2015
Future: 7 - sleep was: 2000 - Mon Aug 31 10:02:46 CEST 2015
Future: 5 - sleep was: 2000 - Mon Aug 31 10:02:46 CEST 2015
Future: 6 - sleep was: 2000 - Mon Aug 31 10:02:46 CEST 2015
Future: 8 - sleep was: 2000 - Mon Aug 31 10:02:46 CEST 2015
Future: 9 - sleep was: 2000 - Mon Aug 31 10:02:48 CEST 2015
Future: 11 - sleep was: 2000 - Mon Aug 31 10:02:48 CEST 2015
Future: 10 - sleep was: 2000 - Mon Aug 31 10:02:48 CEST 2015
Future: 12 - sleep was: 2000 - Mon Aug 31 10:02:48 CEST 2015
Future: 16 - sleep was: 2000 - Mon Aug 31 10:02:50 CEST 2015
Future: 13 - sleep was: 2000 - Mon Aug 31 10:02:50 CEST 2015
Future: 15 - sleep was: 2000 - Mon Aug 31 10:02:50 CEST 2015
Future: 14 - sleep was: 2000 - Mon Aug 31 10:02:50 CEST 2015
I expected them to all show the same time.
To give some context, I thought I could use this construct and extend it by having a main loop in which it sleeps each iteration according to a value drawn from an exponential distribution, to emulate user arrival/execution of a query. After each sleep I'd like to execute the query by sending it to the program's driver (in this case Spark, whose driver allows multiple threads to use it). Is there a more obvious way than using Futures?

When you use import ExecutionContext.Implicits.global, it creates a thread pool with the same size as the number of CPUs.
From the source of ExecutionContext.scala:
The default ExecutionContext implementation is backed by a work-stealing thread pool. By default,
the thread pool uses a target number of worker threads equal to the number of [[https://docs.oracle.com/javase/8/docs/api/java/lang/Runtime.html#availableProcessors-- available processors]].
There's also a good Stack Overflow question: What is the behavior of scala.concurrent.ExecutionContext.Implicits.global?
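As a quick sanity check, you can print the processor count that the default pool sizes itself to (a minimal one-liner):
println(Runtime.getRuntime.availableProcessors) // likely prints 4 on the asker's machine, matching the 4 concurrent futures observed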
Since the default size of the thread pool depends on the number of CPUs, if you want to use a larger thread pool, you have to write something like
import scala.concurrent.ExecutionContext
import java.util.concurrent.Executors
implicit val ec = ExecutionContext.fromExecutorService(Executors.newWorkStealingPool(8))
before executing the Future.
(In your code, you have to place it before the for loop.)
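For example, a minimal sketch of the question's loop with such a pool (the pool size of 8 here is an arbitrary choice):
import scala.concurrent._
import java.util.concurrent.Executors

// Work-stealing pool with 8 worker threads instead of one per CPU
implicit val ec = ExecutionContext.fromExecutorService(Executors.newWorkStealingPool(8))

for (x <- 1 to 100) {
  Future {
    Thread.sleep(2000)
    println("Future: " + x)
  }
}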
Note that the work-stealing pool was added in Java 8; Scala has its own ForkJoinPool that does work stealing: scala.concurrent.forkjoin.ForkJoinPool vs java.util.concurrent.ForkJoinPool
Also if you want one thread per Future, you can write something like
implicit val ec = ExecutionContext.fromExecutorService(Executors.newSingleThreadExecutor)
Therefore, the following code runs 100 threads in parallel:
import scala.concurrent._
import java.util.concurrent.Executors
import java.util.Calendar

object Main extends App {
  for (x <- 1 to 100) {
    implicit val ec = ExecutionContext.fromExecutorService(Executors.newSingleThreadExecutor)
    val f = Future {
      val sleepTime = 2000
      Thread.sleep(sleepTime)
      val today = Calendar.getInstance().getTime()
      println("Future: " + x + " - sleep was: " + sleepTime + " - " + today)
      1
    }
  }
  Thread.sleep(10000)
}
In addition to the work-stealing thread pool and single-thread executors, there are some other executors: http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Executors.html
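For example, a cached pool that grows on demand can be wrapped the same way (just an illustration of another executor type, not something used above):
import scala.concurrent.ExecutionContext
import java.util.concurrent.Executors

// Creates new threads as needed and reuses idle ones
implicit val ec = ExecutionContext.fromExecutorService(Executors.newCachedThreadPool())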
Read the docs for detail:
http://docs.scala-lang.org/overviews/core/futures.html

The default pool when using import scala.concurrent.ExecutionContext.Implicits.global indeed has as many threads as you have cores on your machine. This is ideal for non-blocking code (no synchronous I/O, sleeps, ...) but can be problematic and even cause deadlocks when you use it for blocking code.
However, this pool can actually grow if you mark blocking code with a scala.concurrent.blocking block. The same marker is used, for example, by the Await.result and Await.ready functions, which block while waiting for a Future.
See the API docs for blocking.
So all you have to do is update your example:
import scala.concurrent.blocking
...
val sleepTime = 2000
blocking {
  Thread.sleep(sleepTime)
}
...
Now all futures will finish after roughly 2000 ms.
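Put together, a complete version of the question's example with the blocking marker might look like this (a sketch, still using the default global pool):
import scala.concurrent._
import ExecutionContext.Implicits.global
import java.util.Calendar

object Main extends App {
  for (x <- 1 to 100) {
    val f = Future {
      val sleepTime = 2000
      // Mark the sleep as blocking so the global pool can temporarily grow
      blocking {
        Thread.sleep(sleepTime)
      }
      val today = Calendar.getInstance().getTime()
      println("Future: " + x + " - sleep was: " + sleepTime + " - " + today)
      1
    }
  }
  Thread.sleep(10000)
}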

You can also use
`implicit val ec = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(NUMBEROFTHREADSYOUWANT))`
where NUMBEROFTHREADSYOUWANT is the number of threads you want in the pool.
This must be in scope before creating the Futures.
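A minimal sketch of where it goes relative to the question's loop (16 is an arbitrary pool size):
import scala.concurrent._
import java.util.concurrent.Executors

// Fixed pool of 16 threads, shared by all Futures created below; must be in scope here
implicit val ec = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(16))

for (x <- 1 to 100) {
  Future {
    Thread.sleep(2000)
    println("Future: " + x)
  }
}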

Related

Is there any formula that I can use to show the months between a start date and an end date in a spreadsheet

Is there any formula that I can use to show each month between a start date and an end date in a spreadsheet?
Example:
Start Date:2022-07-22
End Date:2022-10-22
I expect the formula to extract values like this:
Jul - Aug - Sep - Oct
I've tried this formula:
=IF(A2="","",IF(TEXT(B2,"MM")-TEXT(A2,"MM")>1,CONCATENATE(TEXT(A2,"MMM")&" - "&text(EDATE(A2,1),"MMM")&" - "&TEXT(B2,"MMM")),IF(TEXT(A2,"MMM")=TEXT(B2,"MMM"),TEXT(A2,"MMM"),CONCATENATE(TEXT(A2,"MMM")&" - "&TEXT(B2,"MMM")))))
but it only gives me the correct value when there is at most a three-month period between the start and end dates.
Here's a link to the sample spreadsheet
For a single cell you can try:
=JOIN("-",UNIQUE(INDEX(TEXT(SEQUENCE(B2-A2+1,1,A2),"mmm"))))
For a spill array:
=BYROW(A2:INDEX(B2:B,MATCH(9^9,B2:B)),LAMBDA(x,JOIN("-",UNIQUE(INDEX(TEXT(SEQUENCE(INDEX(x,2)-INDEX(x,1)+1,1,INDEX(x,1)),"mmm"))))))
See your sheet.
Get the difference between the dates in months using DATEDIF, get a date in each intervening month using EOMONTH + SEQUENCE, and convert those end-of-month dates to month names with TEXT:
Start Date | End Date   | Months
2022-07-01 | 2022-10-30 | Jul - Aug - Sep - Oct
2022-08-02 | 2022-08-31 | Aug
2022-07-03 | 2022-11-01 | Jul - Aug - Sep - Oct - Nov
Drag fill formula:
=ARRAYFORMULA(JOIN(" - ",TEXT(EOMONTH(A2,SEQUENCE(DATEDIF(A2,EOMONTH(B2,),"M")+1)-1),"mmm")))
Or as a self adjusting array formula:
=MAP(A2:INDEX(A:A,COUNTA(A:A)),LAMBDA(a, ARRAYFORMULA(JOIN(" - ",TEXT(EOMONTH(a,SEQUENCE(DATEDIF(a,EOMONTH(OFFSET(a,0,1),),"M")+1)-1),"mmm")))))
This should be faster and more efficient than generating all the dates and filtering them one by one, reducing both space and time complexity.
Use sequence(), edate() and join(), like this:
=arrayformula( map(
A2:A, B2:B,
lambda(
start, end,
if(
isdate(start) * isdate(end),
join(
" - ",
text(
edate(
start,
sequence(
12 * (year(end) - year(start)) + month(end) - month(start) + 1,
1, 0
)
),
"MMM"
)
),
iferror(1/0)
)
)
) )

Shard Server crashed with invariant failure - commonPointOpTime.getTimestamp() >= lastCommittedOpTime.getTimestamp()

We have deployed the community edition of MongoDB in a Kubernetes cluster. In this deployment, one of the shard DB pods is crashing with the following failure message:
2022-08-08T17:46:04.110+0000 F - [rsBackgroundSync] Invariant failure commonPointOpTime.getTimestamp() >= lastCommittedOpTime.getTimestamp() src/mongo/db/repl/rollback_impl.cpp 955
We want to understand in what circumstances such an error might arise.
Going through the logs, I understand that this specific server has a replication commit point greater than the rollback common point.
https://github.com/mongodb/mongo/blob/master/src/mongo/db/repl/rs_rollback.cpp#L1995
2022-08-08T17:43:22.737+0000 I REPL [rsBackgroundSync] Starting rollback due to OplogStartMissing: Our last optime fetched: { ts: Timestamp(1658645237, 1), t: 33 }. source's GTE: { ts: Timestamp(1658645239, 1), t: 34 }
2022-08-08T17:43:22.737+0000 I REPL [rsBackgroundSync] Replication commit point: { ts: Timestamp(1658645115, 8), t: 33 }
2022-08-08T17:43:22.737+0000 I REPL [rsBackgroundSync] Rollback using 'recoverToStableTimestamp' method.
2022-08-08T17:43:22.737+0000 I REPL [rsBackgroundSync] Scheduling rollback (sync source: mongo-shareddb-3.mongo-shareddb-service.avx.svc.cluster.local:27017)
2022-08-08T17:43:22.737+0000 I ROLLBACK [rsBackgroundSync] transition to ROLLBACK
2022-08-08T17:46:04.109+0000 I ROLLBACK [rsBackgroundSync] Rollback common point is { ts: Timestamp(1658645070, 13), t: 33 }
2022-08-08T17:46:04.110+0000 F - [rsBackgroundSync] Invariant failure commonPointOpTime.getTimestamp() >= lastCommittedOpTime.getTimestamp() src/mongo/db/repl/rollback_impl.cpp 955
2022-08-08T17:46:04.110+0000 F - [rsBackgroundSync] \n\n***aborting after invariant() failure\n\n
Timestamps in the log:
1658645105 - 24 July 2022 06:45:05 GMT
1658645237 - 24 July 2022 06:47:17 GMT
1658645239 - 24 July 2022 06:47:19 GMT
1658645115 - 24 July 2022 06:45:15 GMT
1658645070 - 24 July 2022 06:44:30 GMT
commonPointOpTime = 24 July 2022 06:44:30 GMT
lastCommittedOpTime = 24 July 2022 06:45:15 GMT
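(For reference, those epoch-second values can be double-checked with java.time; a quick sketch, not part of the original question:)
import java.time.Instant

// e.g. 1658645070 -> 2022-07-24T06:44:30Z, 1658645115 -> 2022-07-24T06:45:15Z
Seq(1658645237L, 1658645239L, 1658645115L, 1658645070L)
  .foreach(s => println(s + " -> " + Instant.ofEpochSecond(s)))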
What could lead to such an invalid state?

WildFly Schedule overlapping

I use a scheduler in WildFly 9, with these EJB annotations:
import javax.ejb.Singleton;
import javax.ejb.Startup;
import javax.ejb.Schedule;
I get loads of these warnings:
2020-01-21 12:35:59,000 WARN [org.jboss.as.ejb3] (EJB default - 6) WFLYEJB0043: A previous execution of timer [id=3e4ec2d2-cea9-43c2-8e80-e4e66593dc31 timedObjectId=FiloJobScheduler.FiloJobScheduler.FiskaldatenScheduler auto-timer?:true persistent?:false timerService=org.jboss.as.ejb3.timerservice.TimerServiceImpl#71518cd4 initialExpiration=null intervalDuration(in milli sec)=0 nextExpiration=Tue Jan 21 12:35:59 GMT+02:00 2020 timerState=IN_TIMEOUT info=null] is still in progress, skipping this overlapping scheduled execution at: Tue Jan 21 12:35:59 GMT+02:00 2020.
But when I measure the elapsed times, they are always < 1 minute.
The Scheduling is:
@Schedule(second = "*", minute = "*/5", hour = "*", persistent = false)
Has anyone an idea what is going on?
A little logging would help you. This runs every second because that's what you're telling it to do with the second = "*" attribute. If you want to run only every 5 minutes of every hour, change the schedule to:
@Schedule(minute = "*/5", hour = "*", persistent = false)

Akka-Quartz-Scheduler, how to use cron expression

When I use jobs and triggers to schedule message publishing, it works:
val job = JobBuilder.newJob(classOf[ScheduledMessagePublisher]).withIdentity("Job", "Group").build()
val trigger: CronTrigger = TriggerBuilder.newTrigger()
  .withIdentity("Trigger", "Group")
  .withSchedule(CronScheduleBuilder.cronSchedule("0 33 10 11 JAN ? 2019"))
  .forJob("Job", "Group")
  .build
quartz.start()
quartz.scheduleJob(job, trigger)
But when I use actors and QuartzSchedulerExtension, my code never fires when the time comes; the logs just show "batch acquisition of 0 triggers":
val test = context.actorOf(Executor.props(client))
QuartzSchedulerExtension(context.system).createSchedule("Test", None, "0 33 10 11 JAN ? 2019")
QuartzSchedulerExtension(context.system).schedule("Test", test, Executor.PublishMessage)
I think the problem is in the cron expression "0 33 10 11 JAN ? 2019", because when I use only seconds and minutes it works: "0 30 * * * ? *"
Your cron expression is correct.
But the default timezone for QuartzSchedulerExtension is UTC. Check the documentation here.
Hence, you need to explicitly specify your current timezone.
Here's the solution:
import java.util.TimeZone

val test = context.actorOf(Executor.props(client))
QuartzSchedulerExtension(context.system).createSchedule("Test", None, "0 33 10 11 JAN ? 2019", None, TimeZone.getDefault)
QuartzSchedulerExtension(context.system).schedule("Test", test, Executor.PublishMessage)

How to perform future of futures in parallel and wait for them to complete (running in parallel)

I want to perform "flows" where each flow is performed in parallel. Each flow in itself performs operations using futures:
def doFlow(...): Seq[Future[Something]] = {
  (1 to 10) map { _ =>
    Future {
      Something(...)
    }
  }
}
val sequence: Seq[Seq[Future[Something]]] = (1 to 10) map {
  iter => doFlow(...)
}

// now I want to wait for all of them to complete:
val flat: Seq[Future[Something]] = sequence.flatten
val futureSeq = Future.sequence(flat)
futureSeq.onComplete {
  ...
  case Success(value) => { ... }
}
I am printing a log of the completions, and I see that they are running sequentially rather than in parallel like I want them to:
=======================
First started at Wed Apr 18 12:02:22 IDT 2018
Last ended at Wed Apr 18 12:02:28 IDT 2018
Took 4.815 seconds
=======================
First started at Wed Apr 18 12:02:28 IDT 2018
Last ended at Wed Apr 18 12:02:35 IDT 2018
Took 4.335 seconds
=======================
First started at Wed Apr 18 12:02:35 IDT 2018
Last ended at Wed Apr 18 12:02:41 IDT 2018
Took 3.83 seconds
...
...
Works on my machine:
import scala.concurrent._
import ExecutionContext.Implicits.global

def doFlow(chunk: Int): Seq[Future[Int]] = {
  (1 to 5) map { i =>
    Future {
      println(s"--> chunk $chunk idx $i")
      Thread.sleep(1000)
      println(s"<-- chunk $chunk idx $i")
      0
    }
  }
}

val sequence: Seq[Seq[Future[Int]]] = (1 to 5) map {
  iter => doFlow(iter)
}

val flat: Seq[Future[Int]] = sequence.flatten
val futureSeq = Future.sequence(flat)
Await.ready(futureSeq, scala.concurrent.duration.Duration.Inf)
Output sample:
--> chunk 1 idx 2
--> chunk 1 idx 4
--> chunk 1 idx 1
--> chunk 1 idx 3
--> chunk 2 idx 1
--> chunk 1 idx 5
--> chunk 2 idx 3
--> chunk 2 idx 2
<-- chunk 1 idx 2
<-- chunk 2 idx 1
<-- chunk 1 idx 3
--> chunk 2 idx 5
--> chunk 3 idx 1
<-- chunk 1 idx 1
<-- chunk 1 idx 5
<-- chunk 1 idx 4
--> chunk 3 idx 3
--> chunk 2 idx 4
It processes 8 tasks at a time.
Do you have any internal synchronization inside Something which can introduce blocking?
Future.sequence(xs.map { x => futureY })
is the way to go.
However, if your futures finish immediately, or if the ExecutionContext processes only one of them at a time, they will be effectively sequential.
Your Futures do take time to execute, so I would investigate the ExecutionContext. ExecutionContext.Implicits.global uses as many threads as the host has CPUs (so a single-core machine will have an ExecutorService with one thread).
Defining an ExecutionContext backed by a single-thread executor will also result in running things sequentially.
There is also the possibility of blocking inside a Future, or of measuring the wrong things.
To figure out more, we would have to look at what Something(...) does and which ExecutionContext you use.
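To illustrate the last point (a sketch, not from the original answers): swapping the implicit ExecutionContext is enough to turn the same code from parallel to sequential.
import scala.concurrent._
import scala.concurrent.duration.Duration
import java.util.concurrent.Executors

// With this single-thread context the five futures run one after another (~2.5 s total);
// replace it with ExecutionContext.Implicits.global to get CPU-count parallelism.
implicit val ec = ExecutionContext.fromExecutorService(Executors.newSingleThreadExecutor())

val futures = (1 to 5).map { i =>
  Future {
    Thread.sleep(500)
    println(s"done $i on " + Thread.currentThread().getName)
  }
}
Await.ready(Future.sequence(futures), Duration.Inf)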