Spark getting slow as time goes on

Spark getting slow as time goes on - scala

I have a scala code that runs on top of spark 2.4.0 to compute the BFS of a graph which is stored in a table as below:
src
dst
isVertex
1
1
1
2
2
1
...
...
...
1
2
0
2
4
0
...
...
...
At some point in the algorithm, I need to update the visited flag of current vertex neighbors. I am doing this by the following code. When I execute the code, it works fine but as time goes on, it becomes slower and slower. It seems that the last nested loop is the problem:
//var vertices = schema(StructType(Seq(StructField("id",IntegerType),StructField("visited", IntegerType),StructField("qOrder",IntegerType))
val neighbours = edges.filter($"src" === start).join(vertices,$"id" === $"dst").filter($"visited" === lit(0))
.select($"dst".as("id")).withColumn("visited", lit(1)).withColumn("qOrder", lit(priorityCounter)).cache()
-----------------------------------------------------------------------
vertices.collect.foreach{x=>
if(!neighbours.filter(col("id")===x(0)).head(1).isEmpty){
vertices = vertices.filter($"id" =!= x(0)).union(neighbours.filter(col("id")===x(0))).cache()
}
}
-----------------------------------------------------------------------
When it becomes slow, it starts giving the following errors and warnings:
2021-06-08 19:55:08,998 [driver-heartbeater] WARN org.apache.spark.executor.Executor - Issue communicating with driver in heartbeater
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10 seconds]. This timeout is controlled by spark.executor.heartbeatInterval
Exception in thread "dag-scheduler-event-loop" java.lang.OutOfMemoryError: GC overhead limit exceeded
Does anyone have any idea about the problem?
I have set the spark parameters as follows:
spark.scheduler.listenerbus.eventqueue.capacity 100000000
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.executorIdleTimeout 2m
spark.dynamicAllocation.minExecutors 1
spark.dynamicAllocation.maxExecutors 10000
spark.max.fetch.failures.per.stage 10
spark.rpc.io.serverThreads 64
spark.rpc.askTimeout 600s
spark.driver.memory 32g
spark.executor.memory 32g

Related

Spark task hanging at [GC (Allocation Failure) ]

EDIT: Note: An executor will normally emit the message [GC (Allocation Failure) ] . It runs it because it is trying to allocate memory to the Executor, but the executor is full, so it will GC in attempt to make space when loading something new to the Executor. If your Executor does this in a loop, it may mean what you are trying to load into that Executor is too big.
I am running Spark 2.2, Scala 2.11 on AWS EMR 5.8.0
I'm trying to run a count operation on a Dataset that refuses to finish. What's frustrating, is that it is only hanging on one particular file. I run this job on a different file from S3, no problem - it finishes completely. The original CSV file is #18GB itself, and we run a transformation on it to turn the original CSV into a struct column, giving it one extra column.
My environment's core slaves are 8 instance where each is:
r3.2xlarge
16 vCore, 61 GiB memory, 160 SSD GB storage
My Spark session settings are:
implicit val spark = SparkSession
.builder()
.appName("MyApp")
.master("yarn")
.config("spark.speculation","false")
.config("hive.metastore.uris", s"thrift://$hadoopIP:9083")
.config("hive.exec.dynamic.partition", "true")
.config("hive.exec.dynamic.partition.mode", "nonstrict")
.config("mapreduce.fileoutputcommitter.algorithm.version", "2")
.config("spark.dynamicAllocation.enabled", false)
.config("spark.executor.cores", 5)
.config("spark.executors.memory", "18G")
.config("spark.yarn.executor.memoryOverhead", "2G")
.config("spark.driver.memory", "18G")
.config("spark.executor.instances", 23)
.config("spark.default.parallelism", 230)
.config("spark.sql.shuffle.partitions", 230)
.enableHiveSupport()
.getOrCreate()
The data comes from a CSV file:
val ds = spark.read
.option("header", "true")
.option("delimiter", ",")
.schema(/* 2 cols: [ValidatedNel, and a stuct schema */)
.csv(sourceFromS3)
.as(MyCaseClass)
val mappedDs:Dataset[ValidatedNel, MyCaseClass] = ds.map(...)
mappedDs.repartition(230)
val count = mappedDs.count() // never finishes
As expected, it spins up 230 tasks, and 229 finish, except one somewhere in the middle. See below - the first task just hangs forever, the middle one finishes no problem (though is odd - the size records/ratio is very different) - and the other 229 tasks look exactly the same as the finished one.
Index| ID |Attempt |Status|Locality Level|Executor ID / Host| Launch Time | Duration |GC Time|Input Size / Records|Write Time | Shuffle Write Size / Records| Errors
110 117 0 RUNNING RACK_LOCAL 11 / ip-XXX-XX-X-XX.uswest-2.compute.internal 2019/10/01 20:34:01 1.1 h 43 min 66.2 MB / 2289538 0.0 B / 0
0 7 0 SUCCESS PROCESS_LOCAL 9 / ip-XXX-XX-X-XXX.us-west-2.compute.internal 2019/10/01 20:32:10 1.0 s 16 ms 81.2 MB /293 5 ms 59.0 B / 1 <-- this task is odd, but finishes
1 8 0 SUCCESS RACK_LOCAL 9 / ip-XXX-XX-X-XXX.us-west-2.compute.internal 2019/10/01 20:32:10 2.1 min 16 ms 81.2 MB /2894845 9 s 59.0 B / 1 <- the other tasks are all similar to this one
Checking the stdout of the hanging tasks, I repeatedly see the following the never ends:
2019-10-01T21:51:16.055+0000: [GC (Allocation Failure) 2019-10-01T21:51:16.055+0000: [ParNew: 10904K->0K(613440K), 0.0129982 secs]2019-10-01T21:51:16.068+0000: [CMS2019-10-01T21:51:16.099+0000: [CMS-concurrent-mark: 0.031/0.044 secs] [Times: user=0.17 sys=0.00, real=0.04 secs]
(concurrent mode failure): 4112635K->2940648K(4900940K), 0.4986233 secs] 4123539K->2940648K(5514380K), [Metaspace: 60372K->60372K(1103872K)], 0.5121869 secs] [Times: user=0.64 sys=0.00, real=0.51 secs]
Another note is that before I call the count, I'm calling repartition(230) just priot to calling count on the Dataset[T] to insure equal distribution of data
What is going on here?

It probably has something to do with data skew and/or data parsing issues. Note that the problem partition has radically more records than the one, processed successfully:
Input Size / Records
66.2 MB / 2289538
81.2 MB /293
I'd check that all the partition files have roughly the same size and number of records. Perhaps line and/or column delimiters are off in either the problem or "good" partition files (293 lines seems to be too low for ~80 Mb file).

Why is orientdb oetl import giving me this error

I am trying to import a csv file into orientdb 3.0 I have created and tested the json file and it works with a smaller dataset. But the dataset that I want to import is around a billion rows (six columns)
Following is the user.json file I am using for import with oetl
{
"source": { "file": { "path": "d1.csv" } },
"extractor": { "csv": {} },
"transformers": [
{ "vertex": { "class": "User" } }
],
"loader": {
"orientdb": {
"dbURL": "plocal:/databases/magriwebdoc",
"dbType": "graph",
"classes": [
{"name": "User", "extends": "V"}
], "indexes": [
{"class":"User", "fields":["id:string"], "type":"UNIQUE" }
]
}
}
}
This is the console output from oetl command:
2019-05-22 14:31:15:484 INFO Windows OS is detected, 262144 limit of open files will be set for the disk cache. [ONative]
2019-05-22 14:31:15:647 INFO 8261029888 B/7878 MB/7 GB of physical memory were detected on machine [ONative]
2019-05-22 14:31:15:647 INFO Detected memory limit for current process is 8261029888 B/7878 MB/7 GB [ONative]
2019-05-22 14:31:15:649 INFO JVM can use maximum 455MB of heap memory [OMemoryAndLocalPaginatedEnginesInitializer]
2019-05-22 14:31:15:649 INFO Because OrientDB is running outside a container 12% of memory will be left unallocated according to the setting 'memory.leftToOS' not taking into account heap memory [OMemoryAndLocalPaginatedEnginesInitializer]
2019-05-22 14:31:15:650 INFO OrientDB auto-config DISKCACHE=6,477MB (heap=455MB os=7,878MB) [orientechnologies]
2019-05-22 14:31:15:652 INFO System is started under an effective user : `lenovo` [OEngineLocalPaginated]
2019-05-22 14:31:15:670 INFO WAL maximum segment size is set to 6,144 MB [OrientDBEmbedded]
2019-05-22 14:31:15:701 INFO BEGIN ETL PROCESSOR [OETLProcessor]
2019-05-22 14:31:15:703 INFO [file] Reading from file d1.csv with encoding UTF-8 [OETLFileSource]
2019-05-22 14:31:15:703 INFO Started execution with 1 worker threads [OETLProcessor]
2019-05-22 14:31:16:008 INFO Page size for WAL located in D:\databases\magriwebdoc is set to 4096 bytes. [OCASDiskWriteAheadLog]
2019-05-22 14:31:16:703 INFO + extracted 0 rows (0 rows/sec) - 0 rows -> loaded 0 vertices (0 vertices/sec) Total time: 1001ms [0 warnings, 0 errors] [OETLProcessor]
2019-05-22 14:31:16:770 INFO Storage 'plocal:D:\databases/magriwebdoc' is opened under OrientDB distribution : 3.0.18 - Veloce (build 747595e790a081371496f3bb9c57cec395644d82, branch 3.0.x) [OLocalPaginatedStorage]
2019-05-22 14:31:17:703 INFO + extracted 0 rows (0 rows/sec) - 0 rows -> loaded 0 vertices (0 vertices/sec) Total time: 2001ms [0 warnings, 0 errors] [OETLProcessor]
2019-05-22 14:31:17:954 SEVER ETL process has problem: [OETLProcessor]
2019-05-22 14:31:17:956 INFO END ETL PROCESSOR [OETLProcessor]
2019-05-22 14:31:17:957 INFO + extracted 0 rows (0 rows/sec) - 0 rows -> loaded 0 vertices (0 vertices/sec) Total time: 2255ms [0 warnings, 0 errors] [OETLProcessor]D:\orientserver\bin>
I know the code is right but I am assuming it's more of a memory issue!
Please advise what should I do.

Have you tried improving your memory settings, according to the size of the data that you want to process?
From the documentation, you can custom these properties:
Configuration Environmental Variables (See $ORIENTDB_OPTS_MEMORY parameter)
Performance-Tuning - Memory Settings
Maybe could help you

Your json script seems no problem, but you can try to delete your indexes part. I have encountered the same problem because of the wrong indexes, too. It may because the UNIQUE indexes constraint. You can try:
Delete the indexes part of json script.
If you need this index, make sure to clear you database before you import your dataset.

Spark Job keeps failing by the running the same map 3 times

My job has a step where I am converting the data frame as RDD[(key, value)] but the step runs three 3 times and getting stuck in the third time and fails
Spark UI shows :
Active Jobs(1)
Job Id (Job Group) Description Submitted Duration Stages: Succeeded/Total Tasks (for all stages): Succeeded/Total
3 (zeppelin-20161017-005442_839671900) Zeppelin map at <console>:69 2016/10/25 05:50:02 1.6 min 0/1 210/45623
Completed Jobs (2)
2 (zeppelin-20161017-005442_839671900) Zeppelin map at <console>:69 2016/10/25 05:16:28 23 min 1/1 46742/46075 (21 failed)
1 (zeppelin-20161017-005442_839671900) Zeppelin map at <console>:69 2016/10/25 04:47:58 17 min 1/1 47369/46795 (20 failed)
This is the code :
val eventsRDD = eventsDF.map {
r =>
val customerId = r.getAs[String]("customerId")
val itemId = r.getAs[String]("itemId")
val countryId = r.getAs[Long]("countryId").toInt
val timeStamp = r.getAs[String]("eventTimestamp")
val totalRent = r.getAs[Int]("totalRent")
val totalPurchase = r.getAs[Int]("totalPurchase")
val totalProfit = r.getAs[Int]("totalProfit")
val store = r.getAs[String]("store")
val itemName = r.getAs[String]("itemName")
val itemName = if (itemName.size > 0 && itemName.nonEmpty && itemName != null ) itemName else "NA"
(itemId, (customerId, countryId, timeStamp, totalRent, totalProfit, totalPurchase, store,itemName ))
}
Can someone tell what is wrong here ? If I want persist/cache which one I should do ?
Error :
16/10/25 23:28:55 INFO YarnClientSchedulerBackend: Asked to remove non-existent executor 181
16/10/25 23:28:55 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1477415847345_0005_02_031011 on host: ip-172-31-14-104.ec2.internal. Exit status: 52. Diagnostics: Exception from container-launch.
Container id: container_1477415847345_0005_02_031011
Exit code: 52
Stack trace: ExitCodeException exitCode=52:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

You map operation is resulting in some error and that propogates to the driver which results in task failure.
By default spark.task.maxFailures has value as 4 which is for :
Number of failures of any particular task before giving up on the job.
The total number of failures spread across different tasks will not
cause the job to fail; a particular task has to fail this number of
attempts. Should be greater than or equal to 1. Number of allowed
retries = this value - 1.
So what happens when your task fails spark tries to recompute the map operation untill it has failed 4 times in all.
If I want persist/cache which one I should do ?
cache is just specific persist operation where RDD is persisted with default storage level (MEMORY_ONLY).

MongoDB Concurrency Bottleneck

Too Long; Didn't Read
The question is about a concurrency bottleneck I am experiencing on MongoDB. If I make one query, it takes 1 unit of time to return; if I make 2 concurrent queries, both take 2 units of time to return; generally, if I make n concurrent queries, all of them take n units of time to return. My question is about what can be done to improve Mongo's response times when faced with concurrent queries.
The Setup
I have a m3.medium instance on AWS running a MongoDB 2.6.7 server. A m3.medium has 1 vCPU (1 core of a Xeon E5-2670 v2), 3.75GB and a 4GB SSD.
I have a database with a single collection named user_products. A document in this collection has the following structure:
{ user: <int>, product: <int> }
There are 1000 users and 1000 products and there's a document for every user-product pair, totalizing a million documents.
The collection has an index { user: 1, product: 1 } and my results below are all indexOnly.
The Test
The test was executed from the same machine where MongoDB is running. I am using the benchRun function provided with Mongo. During the tests, no other accesses to MongoDB were being made and the tests only comprise read operations.
For each test, a number of concurrent clients is simulated, each of them making a single query as many times as possible until the test is over. Each test runs for 10 seconds. The concurrency is tested in powers of 2, from 1 to 128 simultaneous clients.
The command to run the tests:
mongo bench.js
Here's the full script (bench.js):
var
seconds = 10,
limit = 1000,
USER_COUNT = 1000,
concurrency,
savedTime,
res,
timediff,
ops,
results,
docsPerSecond,
latencyRatio,
currentLatency,
previousLatency;
ops = [
{
op : "find" ,
ns : "test_user_products.user_products" ,
query : {
user : { "#RAND_INT" : [ 0 , USER_COUNT - 1 ] }
},
limit: limit,
fields: { _id: 0, user: 1, product: 1 }
}
];
for (concurrency = 1; concurrency <= 128; concurrency *= 2) {
savedTime = new Date();
res = benchRun({
parallel: concurrency,
host: "localhost",
seconds: seconds,
ops: ops
});
timediff = new Date() - savedTime;
docsPerSecond = res.query * limit;
currentLatency = res.queryLatencyAverageMicros / 1000;
if (previousLatency) {
latencyRatio = currentLatency / previousLatency;
}
results = [
savedTime.getFullYear() + '-' + (savedTime.getMonth() + 1).toFixed(2) + '-' + savedTime.getDate().toFixed(2),
savedTime.getHours().toFixed(2) + ':' + savedTime.getMinutes().toFixed(2),
concurrency,
res.query,
currentLatency,
timediff / 1000,
seconds,
docsPerSecond,
latencyRatio
];
previousLatency = currentLatency;
print(results.join('\t'));
}
Results
Results are always looking like this (some columns of the output were omitted to facilitate understanding):
concurrency queries/sec avg latency (ms) latency ratio
1 459.6 2.153609008 -
2 460.4 4.319577324 2.005738882
4 457.7 8.670418178 2.007237636
8 455.3 17.4266174 2.00989353
16 450.6 35.55693474 2.040380754
32 429 74.50149883 2.09527338
64 419.2 153.7325095 2.063482104
128 403.1 325.2151235 2.115460969
If only 1 client is active, it is capable of doing about 460 queries per second over the 10 second test. The average response time for a query is about 2 ms.
When 2 clients are concurrently sending queries, the query throughput maintains at about 460 queries per second, showing that Mongo hasn't increased its response throughput. The average latency, on the other hand, literally doubled.
For 4 clients, the pattern continues. Same query throughput, average latency doubles in relation to 2 clients running. The column latency ratio is the ratio between the current and previous test's average latency. See that it always shows the latency doubling.
Update: More CPU Power
I decided to test with different instance types, varying the number of vCPUs and the amount of available RAM. The purpose is to see what happens when you add more CPU power. Instance types tested:
Type vCPUs RAM(GB)
m3.medium 1 3.75
m3.large 2 7.5
m3.xlarge 4 15
m3.2xlarge 8 30
Here are the results:
m3.medium
concurrency queries/sec avg latency (ms) latency ratio
1 459.6 2.153609008 -
2 460.4 4.319577324 2.005738882
4 457.7 8.670418178 2.007237636
8 455.3 17.4266174 2.00989353
16 450.6 35.55693474 2.040380754
32 429 74.50149883 2.09527338
64 419.2 153.7325095 2.063482104
128 403.1 325.2151235 2.115460969
m3.large
concurrency queries/sec avg latency (ms) latency ratio
1 855.5 1.15582069 -
2 947 2.093453854 1.811227185
4 961 4.13864589 1.976946318
8 958.5 8.306435055 2.007041742
16 954.8 16.72530889 2.013536347
32 936.3 34.17121062 2.043083977
64 927.9 69.09198599 2.021935563
128 896.2 143.3052382 2.074122435
m3.xlarge
concurrency queries/sec avg latency (ms) latency ratio
1 807.5 1.226082735 -
2 1529.9 1.294211452 1.055566166
4 1810.5 2.191730848 1.693487447
8 1816.5 4.368602642 1.993220402
16 1805.3 8.791969257 2.01253581
32 1770 17.97939718 2.044979532
64 1759.2 36.2891598 2.018374668
128 1720.7 74.56586511 2.054769676
m3.2xlarge
concurrency queries/sec avg latency (ms) latency ratio
1 836.6 1.185045183 -
2 1585.3 1.250742872 1.055438974
4 2786.4 1.422254414 1.13712774
8 3524.3 2.250554777 1.58238551
16 3536.1 4.489283844 1.994745425
32 3490.7 9.121144097 2.031759277
64 3527 18.14225682 1.989033023
128 3492.9 36.9044113 2.034168718
Starting with the xlarge type, we begin to see it finally handling 2 concurrent queries while keeping the query latency virtually the same (1.29 ms). It doesn't last too long, though, and for 4 clients it again doubles the average latency.
With the 2xlarge type, Mongo is able to keep handling up to 4 concurrent clients without raising the average latency too much. After that, it starts to double again.
The question is: what could be done to improve Mongo's response times with respect to the concurrent queries being made? I expected to see a rise in the query throughput and I did not expect to see it doubling the average latency. It clearly shows Mongo is not being able to parallelize the queries that are arriving.
There's some kind of bottleneck somewhere limiting Mongo, but it certainly doesn't help to keep adding up more CPU power, since the cost will be prohibitive. I don't think memory is an issue here, since my entire test database fits in RAM easily. Is there something else I could try?

You're using a server with 1 core and you're using benchRun. From the benchRun page:
This benchRun command is designed as a QA baseline performance measurement tool; it is not designed to be a "benchmark".
The scaling of the latency with the concurrency numbers is suspiciously exact. Are you sure the calculation is correct? I could believe that the ops/sec/runner was staying the same, with the latency/op also staying the same, as the number of runners grew - and then if you added all the latencies, you would see results like yours.

How to achieve high concurrency with spray.io in this Future and Thread.sleep example?

I was trying the following POC to check how to get high concurrency
implicit def executionContext = context.system.dispatchers.lookup("async-futures-dispatcher")
implicit val timeout = 10 seconds
val contestroute = "/contestroute" {
get {
respondWithMediaType(`application/json`) {
dynamic {
onSuccess(
Future {
val start = System.currentTimeMillis()
// np here should be dealt by 200 threads defined below, so why
// overall time takes so long? why doesn't it really utilize all
// threads I have given to it? how to update the code so it
// utilizes the 200 threads?
Thread.sleep(5000)
val status = s"timediff ${System.currentTimeMillis() - start}ms ${Thread.currentThread().getName}"
status
}) { time =>
complete(s"status: $time")
}
}
}
}
}
My config:
async-futures-dispatcher {
# Dispatcher is the name of the event-based dispatcher
type = Dispatcher
# What kind of ExecutionService to use
executor = "thread-pool-executor"
# Configuration for the thread pool
thread-pool-executor {
# minimum number of threads to cap factor-based core number to
core-pool-size-min = 200
# No of core threads ... ceil(available processors * factor)
core-pool-size-factor = 20.0
# maximum number of threads to cap factor-based number to
core-pool-size-max = 200
}
# Throughput defines the maximum number of messages to be
# processed per actor before the thread jumps to the next actor.
# Set to 1 for as fair as possible.
throughput = 100
}
however when I run apache bench like this:
ab -n 200 -c 50 http://LAP:8080/contestroute
Results I get are:
Server Software: Apache-Coyote/1.1
Server Port:erred: 37500 bytes
HTML transferred: 10350 bytes
Requests per second: 4.31 [#/sec] (mean)
Time per request: 34776.278 [ms] (mean)
Time per request: 231.842 [ms] (mean, across all concurrent requests)
Transfer rate: 1.05 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 5 406 1021.3 7 3001
Processing: 30132 30466 390.8 30308 31231
Waiting: 30131 30464 391.8 30306 31231
Total: 30140 30872 998.9 30353 33228 8080
Document Path: /contestroute
Document Length: 69 bytes
Concurrency Level: 150
Time taken for tests: 34.776 seconds
Complete requests: 150
Failed requests: 0
Write errors: 0
Non-2xx responses: 150
Total transferred: 37500 bytes
HTML transferred: 10350 bytes
Requests per second: 4.31 [#/sec] (mean)
Time per request: 34776.278 [ms] (mean)
Time per request: 231.842 [ms] (mean, across all concurrent requests)
Transfer rate: 1.05 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 5 406 1021.3 7 3001
Processing: 30132 30466 390.8 30308 31231
Waiting: 30131 30464 391.8 30306 31231
Total: 30140 30872 998.9 30353 33228
Am I missing something big? what do I need to change to have my spray and futures utilize all threads i given to it?
(to add i'm running on top of tomcat servlet 3.0)

In your example all spray operations and blocking operations happen in the same context. You need to split 2 contexts:
Also I don't see the reason to use dynamic, I guess just 'complete' should be good.
implicit val timeout = 10.seconds
// Execution Context for blocking ops
val blockingExecutionContext = {
ExecutionContext.fromExecutor(Executors.newFixedThreadPool(2000))
}
// Execution Context for Spray
import context.dispatcher
override def receive: Receive = runRoute(contestroute)
val contestroute = path("contestroute") {
get {
complete {
Future.apply {
val start = System.currentTimeMillis()
// np here should be dealt by 200 threads defined below, so why
// overall time takes so long? why doesn't it really utilize all
// threads I have given to it? how to update the code so it
// utilizes the 200 threads?
Thread.sleep(5000)
val status = s"timediff ${System.currentTimeMillis() - start}ms ${Thread.currentThread().getName}"
status
}(blockingExecutionContext)
}
}
}
After that you can test it with
ab -n 200 -c 200 http://LAP:8080/contestroute
and you'll see that spray will create all 200 threads for blocking operations
Results:
Concurrency Level: 200
Time taken for tests: 5.096 seconds

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Spark getting slow as time goes on - scala

Related

Spark task hanging at [GC (Allocation Failure) ]

Why is orientdb oetl import giving me this error

Spark Job keeps failing by the running the same map 3 times

MongoDB Concurrency Bottleneck

How to achieve high concurrency with spray.io in this Future and Thread.sleep example?

Categories

Resources