"Can't find element ... of the list ... which is only of length" error in NetLogo

I'm working on my thesis about solving a traveling salesman problem with a genetic algorithm. I'm using NetLogo to solve the problem, but I got this error:
Can't find element 62 of the list
[7400 5100 5000 5000 2100 4300 5200 1200 900 4300 6000 6000 7600 5900 7600
8600 7400 7100 6800 8100 3300 1400 1200 10400 8500 3700 11400 6900 2000 650
0 3000 4900 9800 10600 4000 5200 7700 8500 5900 5000 7100 6100 6800 1000
3200 2700 2900 1800 1300 9600 4800 4600 6700 7700 6100 4200 3200 9000 8200
10500 13400],
which is only of length 62.
error while turtle 2 running ITEM
called by procedure CALCULATE-DISTANCE
called by procedure SETUP_1
called by Button 'setup 1'
and I don't know why. Can someone help me with this?
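For context, NetLogo lists are zero-indexed, so a list of length 62 has valid indices 0 through 61, and asking for item 62 fails. A minimal Python sketch of the same off-by-one, using a made-up list:

# Hypothetical list of length 62, illustrating the off-by-one
# (NetLogo's `item`, like Python indexing, is zero-based).
distances = [7400, 5100, 5000] + [0] * 59
print(len(distances))    # 62
print(distances[61])     # last valid index
print(distances[62])     # IndexError: list index out of range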

Related

A few odd things in a tcpdump capture for a database replication stream

I'm trying to resolve the following performance problem. There is a database which is synchronously replicated to a remote location via TCP. Currently, everything works great. But it's being migrated to new hardware, and a test load shows that everything slows down roughly by a factor of 2. Basically, the current setup supports sustained transfer rates of 200-300 MB/s whereas the new one gets 100-150MB/s at best, and it's not good enough for us.
There is nothing obviously wrong from the database side. Database instrumentation says that the source database is busy sending data on the network (by large chunks, tens of MB at a time), and the destination one is busy receiving it on the network. So I'm looking at the TCP packet capture in Wireshark and I notice a few things that look a bit odd in the new setup -- see a sample below.
AFAIK the window scaling factor is 7 for this conversation, so the receive window gets a ×128 multiplier, which means that most of the time it's not a limiting factor (a quick check of the numbers follows the capture below).
First of all, most of the time there is only one packet in flight per ACK, which is not the case for the existing setup, where I can see bursts of tens of outgoing packets. Is this the Nagle algorithm in action, or something else? It's supposed to be off (there is a TCP_NODELAY option at the application level), but it's still a bit suspicious.
Second, I don't understand the timings. It's almost as if something is controlling the rate of outgoing packets and keeping it to roughly one packet every 50 µs (sometimes a bit more, sometimes a bit less), rather than letting the next packet leave within a couple of microseconds of getting an ACK. Could there be some sort of burst control in place, or am I imagining things?
Third, segment size. Most segments are 8 kB, compared to the existing setup where they are 64 kB. We experimented with the application settings but can't seem to make a difference; 64 kB segments are there, but they are rare. Is there a way in Linux to strongly encourage larger segments?
36 2022-09-01 15:02:45.267111 192.168.240.122 192.168.240.115 TCP 8210 45508 → 1600 [PSH, ACK] Seq=2162935757 Ack=3197136358 Win=6166 Len=8156
37 2022-09-01 15:02:45.267115 192.168.240.115 192.168.240.122 TCP 54 1600 → 45508 [ACK] Seq=3197136358 Ack=2162943913 Win=24525 Len=0
38 2022-09-01 15:02:45.267162 192.168.240.122 192.168.240.115 TCP 8210 45508 → 1600 [PSH, ACK] Seq=2162943913 Ack=3197136358 Win=6166 Len=8156
39 2022-09-01 15:02:45.267166 192.168.240.115 192.168.240.122 TCP 54 1600 → 45508 [ACK] Seq=3197136358 Ack=2162952069 Win=24525 Len=0
40 2022-09-01 15:02:45.267212 192.168.240.122 192.168.240.115 TCP 8210 45508 → 1600 [PSH, ACK] Seq=2162952069 Ack=3197136358 Win=6166 Len=8156
41 2022-09-01 15:02:45.267215 192.168.240.115 192.168.240.122 TCP 54 1600 → 45508 [ACK] Seq=3197136358 Ack=2162960225 Win=24525 Len=0
42 2022-09-01 15:02:45.267261 192.168.240.122 192.168.240.115 TCP 8210 45508 → 1600 [PSH, ACK] Seq=2162960225 Ack=3197136358 Win=6166 Len=8156
43 2022-09-01 15:02:45.267265 192.168.240.115 192.168.240.122 TCP 54 1600 → 45508 [ACK] Seq=3197136358 Ack=2162968381 Win=24525 Len=0
44 2022-09-01 15:02:45.267313 192.168.240.122 192.168.240.115 TCP 8210 45508 → 1600 [PSH, ACK] Seq=2162968381 Ack=3197136358 Win=6166 Len=8156
45 2022-09-01 15:02:45.267318 192.168.240.115 192.168.240.122 TCP 54 1600 → 45508 [ACK] Seq=3197136358 Ack=2162976537 Win=24525 Len=0
46 2022-09-01 15:02:45.267342 192.168.240.122 192.168.240.115 TCP 8210 45508 → 1600 [PSH, ACK] Seq=2162976537 Ack=3197136358 Win=6166 Len=8156
47 2022-09-01 15:02:45.267346 192.168.240.115 192.168.240.122 TCP 54 1600 → 45508 [ACK] Seq=3197136358 Ack=2162984693 Win=24525 Len=0
48 2022-09-01 15:02:45.267391 192.168.240.122 192.168.240.115 TCP 8210 45508 → 1600 [PSH, ACK] Seq=2162984693 Ack=3197136358 Win=6166 Len=8156
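As a quick check of the window-scaling point above, here is the arithmetic implied by the capture, assuming the stated scale factor of 7 applies to both directions (a rough sketch, not from the original post):

# Effective receive windows implied by the capture above,
# assuming the window scale factor of 7 (x128) mentioned earlier.
scale = 1 << 7                 # 2**7 = 128
receiver_win = 24525 * scale   # advertised by 192.168.240.115 in its ACKs
sender_win = 6166 * scale      # advertised by 192.168.240.122
print(receiver_win)            # ~3.1 MB, far above the ~8 kB in flight,
print(sender_win)              # ~0.79 MB   so the window isn't the limit here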
Any suggestions are greatly appreciated.
Thanks!
Update: I've shared packet capture files on sender and receiver sides for both current setup and old setup at https://drive.google.com/drive/folders/1ktBDjRHOUCfia1kTfdVIQdS-Q1k4B3qn
Update2: I've written a blog entry about this investigation for those interested: https://savvinov.com/2022/09/20/use-of-packet-capture-and-other-advanced-tools-in-network-issues-troubleshooting/
Best regards,
Nikolai
While I couldn't find answers to all of my questions, I found the ones that mattered most.
It turned out that the TCP stack was sending data in 8 kB segments because the "application" handed the data to it that way. By "application" I mean the replication software (Oracle Data Guard), which picked up a stream of database changes on the source database and wrote it to the remote standby.
So eventually I traced tcp_sendmsg using the BCC trace.py utility and found that its size argument was about 8 kB (8156 bytes, to be more specific). Then I traced the network stack at the "application" level, forcing the connection to be re-established during the tracing, and it turned out that the parameter controlling the size of the transmission (SDU, or session data unit) was supposed to be 64 kB per the settings, but the new connection was in fact using a smaller value, 8 kB.
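For reference, a minimal sketch of that kind of probe using BCC's Python API rather than the trace.py one-liner (assumes bcc is installed, root privileges, and the usual tcp_sendmsg(sk, msg, size) kernel signature):

#!/usr/bin/env python
# Print the size argument of every tcp_sendmsg call (run as root).
from bcc import BPF

prog = r"""
#include <net/sock.h>
// kprobe__<function> is auto-attached to the kernel function by bcc.
int kprobe__tcp_sendmsg(struct pt_regs *ctx, struct sock *sk,
                        struct msghdr *msg, size_t size) {
    bpf_trace_printk("tcp_sendmsg size=%lu\n", size);
    return 0;
}
"""

b = BPF(text=prog)
print("Tracing tcp_sendmsg... Ctrl-C to stop")
b.trace_print()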
Further research showed that there were a number of oddities in the way this parameter is set, and also that the documentation around it was inaccurate.
Once the correct way to set the value was found by trial and error, throughput immediately improved and all the bottlenecks that had bothered us disappeared.
Best regards,
Nikolai

Spring Boot Admin – high CPU usage on client

I have one application which uses the Spring Boot Admin Server library (v2.2.1) to monitor other applications that have the Spring Boot Admin Client library (v2.2.1) integrated. It works very well; the server tracks the status of the client applications with low performance impact.
Nevertheless, when I open the Insight -> Details page (the default page), the CPU usage of the client application grows to 90-100%, which causes my application (running on a 1-CPU system) to respond very slowly. Other pages of Spring Boot Admin are fine.
According to my observations, the high CPU usage is caused by the frequent refresh (every second) of the information there, especially the charts. In my case, Spring Boot Admin sends about 16 requests per second, and processing each one takes about 400 ms (the last number in each log line below; a rough back-of-the-envelope check follows the log).
2020-03-11 08:31:45.290 127.0.0.1 - GET "/actuator/metrics/cache.gets?tag=name:ui-logbook,result:miss" 200 HTTP/1.0 308 421
2020-03-11 08:31:45.291 127.0.0.1 - GET "/actuator/metrics/cache.gets?tag=name:area-services,result:miss" 200 HTTP/1.0 312 438
2020-03-11 08:31:45.291 127.0.0.1 - GET "/actuator/metrics/cache.gets?tag=name:area-services,result:hit" 200 HTTP/1.0 286 437
2020-03-11 08:31:45.290 127.0.0.1 - GET "/actuator/metrics/cache.size?tag=name:area-services" 200 HTTP/1.0 314 420
2020-03-11 08:31:45.292 127.0.0.1 - GET "/actuator/metrics/cache.gets?tag=name:ui-logbook,result:hit" 200 HTTP/1.0 282 428
2020-03-11 08:31:45.426 127.0.0.1 - GET "/actuator/metrics/cache.size?tag=name:ui-logbook" 200 HTTP/1.0 310 100
2020-03-11 08:31:46.513 127.0.0.1 - GET "/actuator/metrics/jvm.threads.peak" 200 HTTP/1.0 219 436
2020-03-11 08:31:46.520 127.0.0.1 - GET "/actuator/metrics/jvm.threads.live" 200 HTTP/1.0 215 434
2020-03-11 08:31:46.520 127.0.0.1 - GET "/actuator/metrics/process.cpu.usage" 200 HTTP/1.0 207 434
2020-03-11 08:31:46.520 127.0.0.1 - GET "/actuator/metrics/system.cpu.usage" 200 HTTP/1.0 177 433
2020-03-11 08:31:46.521 127.0.0.1 - GET "/actuator/metrics/jvm.gc.pause" 200 HTTP/1.0 401 433
2020-03-11 08:31:46.945 127.0.0.1 - GET "/actuator/metrics/jvm.threads.daemon" 200 HTTP/1.0 179 398
2020-03-11 08:31:46.991 127.0.0.1 - GET "/actuator/metrics/jvm.memory.max?tag=area:heap" 200 HTTP/1.0 282 425
2020-03-11 08:31:46.998 127.0.0.1 - GET "/actuator/metrics/jvm.memory.max?tag=area:nonheap" 200 HTTP/1.0 369 420
2020-03-11 08:31:46.998 127.0.0.1 - GET "/actuator/metrics/jvm.memory.used?tag=area:nonheap" 200 HTTP/1.0 318 422
2020-03-11 08:31:46.999 127.0.0.1 - GET "/actuator/metrics/jvm.memory.used?tag=area:heap" 200 HTTP/1.0 233 420
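A rough back-of-the-envelope check of those numbers (the request rate and per-request time come from the observation above; treating the ~400 ms as mostly CPU time is an assumption):

# Rough estimate of the CPU demand from the Spring Boot Admin polling
# (assumes the ~400 ms per request is mostly CPU time).
requests_per_second = 16     # observed actuator requests per second
seconds_per_request = 0.4    # ~400 ms each, per the access log above
print(requests_per_second * seconds_per_request)   # ~6.4 CPU-seconds of work
                                                   # per second on a 1-CPU box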
Is there any way to reduce the refresh rate in order to reduce the CPU load on the client system?
So far I have found only this answer, but the suggested solution did not work.
Thanks for sharing ideas.

Spark SQL group data by range and trigger alerts

I am processing a data stream from Kafka using Structured Streaming with PySpark. I want to publish alerts to Kafka in Avro format if the readings are abnormal.
source temperature timestamp
1001 21 4/28/2019 10:25
1001 22 4/28/2019 10:26
1001 23 4/28/2019 10:27
1001 24 4/28/2019 10:28
1001 25 4/28/2019 10:29
1001 34 4/28/2019 10:30
1001 37 4/28/2019 10:31
1001 36 4/28/2019 10:32
1001 38 4/28/2019 10:33
1001 40 4/28/2019 10:34
1001 41 4/28/2019 10:35
1001 42 4/28/2019 10:36
1001 45 4/28/2019 10:37
1001 47 4/28/2019 10:38
1001 50 4/28/2019 10:39
1001 41 4/28/2019 10:40
1001 42 4/28/2019 10:41
1001 45 4/28/2019 10:42
1001 47 4/28/2019 10:43
1001 50 4/28/2019 10:44
Transform
source range count alert
1001 21-25 5 HIGH
1001 26-30 5 MEDIUM
1001 40-45 5 MEDIUM
1001 45-50 5 HIGH
I have defined a window of 20 seconds with a 1-second slide. I am able to publish alerts with a simple where condition, but I am not able to transform the data frame as above and trigger alerts when the count reaches 20 for any alert priority (i.e. all records in a window match a single priority, e.g. HIGH -> count(20)). Can anyone suggest how to do this?
Also, I am able to publish the data in JSON format, but not in Avro. Scala and Java have a to_avro() function, but PySpark doesn't.
Appreciate your response
I was able to solve this problem using the Bucketizer feature transform from Spark's ML library. See:
How to bin in PySpark?
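A minimal sketch of that Bucketizer approach, assuming a DataFrame df with the source / temperature / timestamp columns shown above (timestamp already cast to a timestamp type); the split points, threshold, and grouping are illustrative, not the original code:

# Bin temperatures with Bucketizer, then count readings per source and bin
# over a 20-second window sliding every second (illustrative thresholds).
from pyspark.ml.feature import Bucketizer
from pyspark.sql import functions as F

splits = [float("-inf"), 26.0, 31.0, 41.0, 46.0, float("inf")]
bucketizer = Bucketizer(splits=splits, inputCol="temperature", outputCol="bin")

binned = bucketizer.transform(df)   # adds a numeric bin id to each reading

alerts = (binned
          .groupBy(F.window("timestamp", "20 seconds", "1 second"), "source", "bin")
          .count()
          .where(F.col("count") >= 20))   # alert when a whole window falls in one bin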

Understanding locust summary result

I have a problem understanding the Locust results, as this is the first time I have load tested my server. I ran Locust from the command line at 00:00 local time with 1000 total users, a hatch rate of 100 per second, and 10000 requests. Below are the results:
Name # reqs # fails Avg Min Max | Median req/s
--------------------------------------------------------------------------------------------------------------------------------------------
GET /api/v0/business/result/22918 452 203(30.99%) 9980 2830 49809 | 6500 1.70
GET /api/v0/business/result/36150 463 229(33.09%) 10636 2898 86221 | 7000 1.50
GET /api/v0/business/result/55327 482 190(28.27%) 10401 3007 48228 | 7000 1.60
GET /api/v0/business/result/69274 502 203(28.79%) 9882 2903 48435 | 6800 1.50
GET /api/v0/business/result/71704 469 191(28.94%) 10714 2748 62271 | 6900 1.70
POST /api/v0/business/query 2268 974(30.04%) 10528 2938 55204 | 7100 7.10
GET /api/v0/suggestions/query/?q=na 2361 1013(30.02%) 10775 2713 63359 | 6800 7.80
--------------------------------------------------------------------------------------------------------------------------------------------
Total 6997 3003(42.92%) 22.90
Percentage of the requests completed within given times
Name # reqs 50% 66% 75% 80% 90% 95% 98% 99% 100%
--------------------------------------------------------------------------------------------------------------------------------------------
GET /api/v0/business/result/22918 452 6500 8300 11000 13000 20000 35000 37000 38000 49809
GET /api/v0/business/result/36150 463 7000 9400 12000 14000 21000 35000 37000 38000 86221
GET /api/v0/business/result/55327 482 7000 9800 12000 13000 21000 34000 38000 39000 48228
GET /api/v0/business/result/69274 502 6800 9000 11000 12000 20000 35000 37000 38000 48435
GET /api/v0/business/result/71704 469 6900 9500 11000 13000 21000 36000 38000 40000 62271
POST /api/v0/business/query 2268 7100 9600 12000 13000 21000 35000 37000 38000 55204
GET /api/v0/suggestions/query/?q=na 2361 6800 9900 12000 14000 22000 35000 37000 39000 63359
--------------------------------------------------------------------------------------------------------------------------------------------
Error report
# occurences Error
--------------------------------------------------------------------------------------------------------------------------------------------
80 GET /api/v0/business/result/71704: "HTTPError('502 Server Error: Bad Gateway',)"
111 GET /api/v0/business/result/71704: "HTTPError('504 Server Error: Gateway Time-out',)"
134 GET /api/v0/business/result/22918: "HTTPError('504 Server Error: Gateway Time-out',)"
69 GET /api/v0/business/result/22918: "HTTPError('502 Server Error: Bad Gateway',)"
92 GET /api/v0/business/result/69274: "HTTPError('502 Server Error: Bad Gateway',)"
594 GET /api/v0/suggestions/query/?q=na: "HTTPError('504 Server Error: Gateway Time-out',)"
111 GET /api/v0/business/result/69274: "HTTPError('504 Server Error: Gateway Time-out',)"
419 GET /api/v0/suggestions/query/?q=na: "HTTPError('502 Server Error: Bad Gateway',)"
69 GET /api/v0/business/result/55327: "HTTPError('502 Server Error: Bad Gateway',)"
121 GET /api/v0/business/result/55327: "HTTPError('504 Server Error: Gateway Time-out',)"
397 POST /api/v0/business/query: "HTTPError('502 Server Error: Bad Gateway',)"
145 GET /api/v0/business/result/36150: "HTTPError('504 Server Error: Gateway Time-out',)"
577 POST /api/v0/business/query: "HTTPError('504 Server Error: Gateway Time-out',)"
84 GET /api/v0/business/result/36150: "HTTPError('502 Server Error: Bad Gateway',)"
--------------------------------------------------------------------------------------------------------------------------------------------
Here is what I am confused about:
What is the meaning of the numbers under # reqs, # fails, Avg, and all the numbers after the name in the first and second tables? Do they show the total number of requests sent, or the n-th request sent?
In the Error report, under # occurences, does the number represent the number of requests that caused that error?
thanks for your answer
The first table shows the statistics for each row, with the timing columns given in milliseconds, while the Total row shows the totals for each column. However, in your example there is a problem with the calculation of the failure percentage for each row: in the first row, 452 requests were sent and 203 of them failed, which is 203/452 ≈ 44.9%, not the 30.99% shown, whereas in the Total row the percentage is calculated correctly.
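For what it's worth, a quick check of that arithmetic (numbers taken from the first row and the Total row of the statistics table above):

# Quick check of the failure-rate arithmetic from the statistics table.
reqs, fails = 452, 203                     # first row: GET /api/v0/business/result/22918
print(f"{fails / reqs:.2%}")               # ~44.91%, not the 30.99% shown in that row
total_reqs, total_fails = 6997, 3003       # Total row
print(f"{total_fails / total_reqs:.2%}")   # 42.92%, matching the Total row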
The second table is the response-time distribution table, which shows the time within which a given percentage of requests completed. In your table it means that 50% of the requests to the first endpoint completed within 6500 ms, 66% completed within 8300 ms, and so on.

Mongod resident memory usage low

I'm trying to debug some performance issues with a MongoDB configuration, and I noticed that the resident memory usage is sitting very low (around 25% of the system memory) despite the fact that there are occasionally large numbers of faults occurring. I'm surprised to see the usage so low given that MongoDB is so memory dependent.
Here's a snapshot of top sorted by memory usage. It shows that no other process is using any significant amount of memory:
top - 21:00:47 up 136 days, 2:45, 1 user, load average: 1.35, 1.51, 0.83
Tasks: 62 total, 1 running, 61 sleeping, 0 stopped, 0 zombie
Cpu(s): 13.7%us, 5.2%sy, 0.0%ni, 77.3%id, 0.3%wa, 0.0%hi, 1.0%si, 2.4%st
Mem: 1692600k total, 1676900k used, 15700k free, 12092k buffers
Swap: 917500k total, 54088k used, 863412k free, 1473148k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2461 mongodb 20 0 29.5g 564m 492m S 22.6 34.2 40947:09 mongod
20306 ubuntu 20 0 24864 7412 1712 S 0.0 0.4 0:00.76 bash
20157 root 20 0 73352 3576 2772 S 0.0 0.2 0:00.01 sshd
609 syslog 20 0 248m 3240 520 S 0.0 0.2 38:31.35 rsyslogd
20304 ubuntu 20 0 73352 1668 872 S 0.0 0.1 0:00.00 sshd
1 root 20 0 24312 1448 708 S 0.0 0.1 0:08.71 init
20442 ubuntu 20 0 17308 1232 944 R 0.0 0.1 0:00.54 top
I'd like to at least understand why the memory isn't being better utilized by the server, and ideally to learn how to optimize either the server config or queries to improve performance.
UPDATE:
It's fair to say that the memory usage looks high, which might lead to the conclusion that another process is responsible. But there are no other processes using any significant memory on the server; the memory appears to be consumed by the cache, and I'm not clear why that would be the case:
$ free -m
total used free shared buffers cached
Mem: 1652 1602 50 0 14 1415
-/+ buffers/cache: 172 1480
Swap: 895 53 842
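For reference, the "-/+ buffers/cache" row is derived from the first row like this (MB values from the output above; the one-MB differences are just rounding):

# How free derives the "-/+ buffers/cache" row from the numbers above.
total, used, buffers, cached = 1652, 1602, 14, 1415   # MB, from free -m
apps_used = used - buffers - cached                   # ~173 MB actually used by processes
apps_free = total - apps_used                         # ~1479 MB free or reclaimable
print(apps_used, apps_free)                           # matches the 172 / 1480 row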
UPDATE:
You can see that the database is still page faulting:
insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn set repl time
0 402 377 0 1167 446 0 24.2g 51.4g 3g 0 <redacted>:9.7% 0 0|0 1|0 217k 420k 457 mover PRI 03:58:43
10 295 323 0 961 592 0 24.2g 51.4g 3.01g 0 <redacted>:10.9% 0 14|0 1|1 228k 500k 485 mover PRI 03:58:44
10 240 220 0 698 342 0 24.2g 51.4g 3.02g 5 <redacted>:10.4% 0 0|0 0|0 164k 429k 478 mover PRI 03:58:45
25 449 359 0 981 479 0 24.2g 51.4g 3.02g 32 <redacted>:20.2% 0 0|0 0|0 237k 503k 479 mover PRI 03:58:46
18 469 337 0 958 466 0 24.2g 51.4g 3g 29 <redacted>:20.1% 0 0|0 0|0 223k 500k 490 mover PRI 03:58:47
9 306 238 1 759 325 0 24.2g 51.4g 2.99g 18 <redacted>:10.8% 0 6|0 1|0 154k 321k 495 mover PRI 03:58:48
6 301 236 1 765 325 0 24.2g 51.4g 2.99g 20 <redacted>:11.0% 0 0|0 0|0 156k 344k 501 mover PRI 03:58:49
11 397 318 0 995 395 0 24.2g 51.4g 2.98g 21 <redacted>:13.4% 0 0|0 0|0 198k 424k 507 mover PRI 03:58:50
10 544 428 0 1237 532 0 24.2g 51.4g 2.99g 13 <redacted>:15.4% 0 0|0 0|0 262k 571k 513 mover PRI 03:58:51
5 291 264 0 878 335 0 24.2g 51.4g 2.98g 11 <redacted>:9.8% 0 0|0 0|0 163k 330k 513 mover PRI 03:58:52
It appears this was being caused by a large amount of inactive memory on the server that wasn't being cleared for Mongo's use.
By looking at the result from:
cat /proc/meminfo
I could see a large amount of Inactive memory. Using this command as a sudo user:
free && sync && echo 3 > /proc/sys/vm/drop_caches && echo "" && free
Freed up the inactive memory, and over the next 24 hours I was able to see the resident memory of my Mongo instance increasing to consume the rest of the memory available on the server.
Credit to the following blog post for its instructions:
http://tinylan.com/index.php/article/how-to-clear-inactive-memory-in-linux
MongoDB only uses as much memory as it needs, so if all of the data and indexes in MongoDB fit inside what it's currently using, you won't be able to push that number any higher.
If the data set is larger than memory, there are a couple of considerations:
Check MongoDB itself to see how much data it thinks it is using by running mongostat and looking at resident memory (res).
Was MongoDB (re)started recently? If it's cold, the data won't be in memory until it gets paged in (leading to more page faults initially that gradually settle). Check out the touch command for more information on "warming MongoDB up".
Check your readahead settings. If the system readahead is too high then MongoDB can't use the memory on the system efficiently. For MongoDB, a good number to start with is a setting of 32 (that's 16 KB of readahead, assuming you have 512-byte blocks).
I had the same issue: Windows Server 2008 R2, 16 GB RAM, Mongo 2.4.3. Mongo used only 2 GB of RAM and generated a lot of page faults. Queries were very slow, the disk was idle, and memory was free. I found no solution other than upgrading to 2.6.5, which helped.