I'm experiencing issues within the T3 architecture where the DB processes aren't fully utilizing the CPU's threading capabilities. I'd like to understand how 11g and 10g take advantage of threading and whether I can validate those queries from the system.
How can I tell whether a Solaris process is parallelized and taking advantage of the threading within the CPU?
Just run prstat and have a look at the last column, labeled PROCESS/NLWP.
NLWP stands for the number of light-weight processes, which on Solaris is precisely the number of threads the process is currently using, as there is a one-to-one mapping between LWPs and user threads.
A single-threaded process will show 1 there, while a multi-threaded one will show a larger number.
e.g.:
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
...
12905 root 4472K 3640K cpu0 59 0 0:00:01 0.4% prstat/1
18403 monitor 474M 245M run 59 17 1:01:28 9.1% java/103
4102 oracle 12G 12G run 59 0 0:00:12 4.5% oracle/1
Here prstat and oracle are single-threaded while java is multi-threaded (as it always is).
You can drill down into the activity of the individual threads of a multi-threaded process by using the -L and -p options, i.e. prstat -L -p pid.
This will show one line per thread, sorted by CPU activity. In that case the last column will be labeled PROCESS/LWPID, LWPID being the thread ID. If more than one thread shows significant activity, your process is actively taking advantage of multi-threading.
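For example, running prstat -L -p 18403 against the multi-threaded java process above might produce output like this (values are purely illustrative):
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/LWPID
18403 monitor 474M 245M cpu1 59 17 0:12:31 3.2% java/15
18403 monitor 474M 245M run 59 17 0:08:07 2.6% java/22
18403 monitor 474M 245M sleep 59 17 0:00:41 0.1% java/1
...
Here at least two LWPs show real CPU activity, so the process is genuinely running in parallel.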
I have been monitoring the end-to-end latency of my microservice applications. Each service is loosely coupled via an ActiveMQ Artemis queue.
------------- ------------- -------------
| Service 1 | --> | Service 2 | --> | Service 3 |
------------- ------------- -------------
Service 1 listens as an HTTP endpoint and produces to queue 1. Service 2 consumes from queue 1, modifies the message, and produces to queue 2. Service 3 consumes from queue 2. Each service also inserts a row into its own database table, from which I can monitor latency. So "end-to-end" means going into Service 1 and coming out of Service 3.
Each service's processing time remains steady, and most messages have a reasonable e2e latency of a few milliseconds. I produce at a constant rate of 400 req/sec using JMeter, and I can monitor this via Grafana.
Sporadically I notice a dip in this constant rate which can be seen throughout the chain. At first I thought it could be the producer side (Service 1), since the rate suddenly dropped to 370 req/sec and might be attributed to GC or possibly a fault in the JMeter HTTP simulator, but that does not explain why the e2e latency of certain messages jumps to ~2-3 sec.
Since it would be hard to reproduce my scenario, I checked out this load generator for ActiveMQ Artemis and bumped the versions up to 2.17.0, 5.16.2 & 0.58.0 to match my broker, which runs 2.17.0 as a cluster of 2 masters/slaves using NFSv4 shared storage.
The command below generated 5,000,000 messages to a single queue, q6, with 4 producers/consumers and a max overall produce rate of 400. Messages are persistent. The only code change in the artemis-load-generator was in ConsumerLatencyRecorderTask: when elapsedTime > 1 sec, I print out the message ID and latency.
java -jar destination-bench.jar --persistent --bytes 1000 --protocol artemis --url tcp://localhost:61616?producerMaxRate=100 --out /tmp/test1.txt --name q6 --iterations 5000000 --runs 1 --warmup 20000 --forks 4 --destinations 1
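For reference, the change was along these lines (a sketch only; the method and field names here are assumptions, not the actual ConsumerLatencyRecorderTask internals):

import java.util.concurrent.TimeUnit;

// Sketch of the modification described above: report any message whose
// recorded latency exceeds one second. Names are illustrative and do not
// reflect the real artemis-load-generator code.
class SlowMessageReporter {

    private static final long ONE_SECOND_NANOS = TimeUnit.SECONDS.toNanos(1);

    void onLatencyRecorded(String messageId, long elapsedTimeNanos) {
        if (elapsedTimeNanos > ONE_SECOND_NANOS) {
            // print the message ID and its latency in milliseconds
            System.out.printf("slow message %s: %d ms%n",
                    messageId, TimeUnit.NANOSECONDS.toMillis(elapsedTimeNanos));
        }
    }
}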
From this I noticed that there were outlier messages with produce/consume latencies nearing 2 secs, while most (90.00%) were below 3358.72 microseconds.
I am not sure why or how this happens. Is this reasonable?
EDIT/UPDATE
I have run the test a few times; this is the output of a shorter run.
java -jar destination-bench.jar --persistent --bytes 1000 --protocol artemis --url tcp://localhost:61616?producerMaxRate=100 --out ~/test-perf1.txt --name q6 --iterations 400000 --runs 1 --warmup 20000 --forks 4 --destinations 1
The result is below:
RUN 1 EndToEnd Throughput: 398 ops/sec
**************
EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
mean 10117.30
min 954.37
50.00% 1695.74
90.00% 2637.82
99.00% 177209.34
99.90% 847249.41
99.99% 859832.32
max 5939134.46
count 1600000
Regarding JVM thread status: what I am noticing in my actual system is a lot of TIMED_WAITING threads on the broker, and where there are spikes, the push-to-queue latency seems to increase.
Currently my data is, as I said, hosted on NFSv4, as shown. I read in the Artemis persistence section that:
If the journal is on a volume which is shared with other processes which might be writing other files (e.g. bindings journal, database, or transaction coordinator) then the disk head may well be moving rapidly between these files as it writes them, thus drastically reducing performance.
Should I move the bindings folder off NFS onto the VM's local disk? Will this improve performance? It is unclear to me.
How does this affect Shared Store HA?
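For context, these directories are set in broker.xml; moving the bindings journal to a local disk would look roughly like the fragment below (paths are purely illustrative):

<core xmlns="urn:activemq:core">
  <!-- bindings journal on the VM's local disk (illustrative path) -->
  <bindings-directory>/var/lib/artemis/local/bindings</bindings-directory>
  <!-- message journal stays on the NFSv4 shared store (illustrative path) -->
  <journal-directory>/mnt/nfs/artemis/journal</journal-directory>
</core>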
I started a fresh, default instance of ActiveMQ Artemis 2.17.0, cloned and built the artemis-load-generator (with a modification to alert immediately on messages that take > 1 second to process), and then ran the same command you ran. I let the test run for about an hour on my local machine, but I didn't let it finish because it was going to take over 3 hours (5 million messages at 400 messages per second). Out of roughly 1 million messages I saw only 1 "outlier" - certainly nothing close to the 10% you're seeing. It's worth noting that I was still using my computer for my normal development work during this time.
At this point I have to attribute this to some kind of environmental issue, e.g.:
Garbage Collection
Low performance disk
Network latency
Insufficient CPU, RAM, etc.
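To rule out the first of these, for example, the JVMs involved could be started with GC logging enabled (pre-JDK 9 HotSpot flags shown; adjust for newer JVMs and your own log path):
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/tmp/broker-gc.log
Pauses in the log that line up with the latency spikes would point at garbage collection.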
We have a data warehouse server running on Debian Linux. We are using PostgreSQL, Jenkins, and Python.
For the past few days, a lot of memory has been consumed by Jenkins and Postgres. I have tried everything I could find on Google, but the issue is still there.
Can anyone give me a lead on how to reduce this memory consumption? It would be very helpful.
Below is the output from free -m:
total used free shared buff/cache available
Mem: 63805 9152 429 16780 54223 37166
Swap: 0 0 0
Below is the postgresql.conf file.
Below is the system configuration.
Results from htop:
Please don't post text as images. It is hard to read and process.
I don't see your problem.
Your machine has 64 GB RAM: 16 GB are used for PostgreSQL shared memory as you configured, 9 GB are private memory used by processes, and 37 GB are available (the available entry).
Linux uses available memory for the file system cache, which boosts PostgreSQL performance. The low value for free just means that the cache is in use.
For Jenkins, run it with these Java options:
JAVA_OPTS="-Xms200m -Xmx300m -XX:PermSize=68m -XX:MaxPermSize=100m"
For Postgres, start it with the option:
-c shared_buffers=256MB
These values are the ones I use on a small home lab with 8 GB of memory; you might want to increase them to match your hardware.
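The same value can also go in the postgresql.conf you posted instead of on the command line (changing shared_buffers requires a PostgreSQL restart):
shared_buffers = 256MB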
How can I know the RAM, processor, and disk consumption of MongoDB when I'm doing find queries, insert queries, update queries, bulk queries, etc.?
I thought about MongoPerf, but it only shows me disk usage, although it is awesome since it can create threads, choose an amount of GB, and read or write. I need to know how much RAM and processor it takes too.
It would be like doing htop for MongoDB.
You could use the ps(1) command (I guess you are on Linux).
Programmatically, you could (on Linux) use the /proc/ file system (which is used by ps, top, htop). For details, read proc(5).
To get the pid of your MongoDB process, you could use pidof(1) or pgrep(1). If the pid of the mongod server is 1234, you would be interested in /proc/1234/status.
Notice that (on Linux) a process does not directly consume RAM. The (mongod server) process has a virtual address space, and the kernel manages the RAM (and dispatches it amongst processes). You would be interested in the resident set size (and you can query it with ps or via /proc/).
The virtual address space of the process with pid 1234 can be queried via /proc/1234/status and /proc/1234/maps (see also pmap(1)).
If you are not familiar with /proc/, first play with it on the command line for your shell, by running cat /proc/$$/status and cat /proc/$$/maps and exploring /proc/$$/.
On my machine, sudo cat /proc/$(pidof mongod)/status gives some interesting output.
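For a programmatic illustration, here is a minimal Java sketch (assuming Linux; the VmSize and VmRSS keys are documented in proc(5)):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Print the virtual size (VmSize) and resident set size (VmRSS) of a
// process by parsing /proc/<pid>/status on Linux. Pass the mongod pid
// as the first argument, e.g. obtained with pidof(1) or pgrep(1).
public class ProcStatus {
    public static void main(String[] args) throws IOException {
        for (String line : Files.readAllLines(Paths.get("/proc", args[0], "status"))) {
            if (line.startsWith("VmSize") || line.startsWith("VmRSS")) {
                System.out.println(line); // e.g. "VmRSS:    123456 kB"
            }
        }
    }
}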
From the bind9 man page, I understand that the named process starts one worker thread per CPU if it is able to determine the number of CPUs. If it is unable to determine this, a single worker thread is started.
My question is: how does it calculate the number of CPUs? I presume that by CPU it means cores. The Linux machine I work on is customized, has kernel 2.6.34, and does not have the lscpu or nproc utilities. named starts a single thread even if I give the -n 4 option. Is there any other way to force named to start multiple threads?
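For what it's worth, even without lscpu or nproc the kernel still reports the core count:
grep -c ^processor /proc/cpuinfo
getconf _NPROCESSORS_ONLN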
Thanks in advance.
I am doing load testing on my application using JMeter, and I have a situation where the CPU usage of the application's JVM goes to 99% and stays there. The application still works; I am able to log in and do some activity, but it is understandably slower.
Details of environment:
Server: AMD Opteron, 2.20 GHz, 8 cores, 64-bit, 24 GB RAM, Windows Server 2008 R2 Standard
Application server: jboss-4.0.4.GA
JAVA: jdk1.6.0_25, Java HotSpot(TM) 64-Bit Server VM
JVM settings:
-Xms1G -Xmx10G -XX:MaxNewSize=3G -XX:MaxPermSize=12G -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+UseCompressedOops -Dsun.rmi.dgc.client.gcInterval=1800000 -Dsun.rmi.dgc.server.gcInterval=1800000
Database: MySQL 5.6 (on a different machine)
JMeter: 2.13
My scenario is that I have 20 users of my application log into it and perform normal activity that should not create a huge load. Some minutes into the process, the CPU usage of the JBoss JVM goes up and never comes back down; it remains like that until the JVM is killed.
To help you better understand, here are a few screenshots.
I found a few posts which had CPU at 100%, but nothing there was the same as my situation and I could not find a solution.
Any suggestion on what’s to be done will be great.
Regards,
Sreekanth.
To understand the root cause of the high CPU utilization, we need to look at the CPU data and thread dumps at the same time.
Capture 5-6 thread dumps at the time of the issue. Similarly, capture CPU consumption on a thread-by-thread basis.
Generally the root cause of a CPU issue like this is a problem with threads: BLOCKED threads, long-running threads, deadlocks, long-running loops, etc. That can be resolved by going through the stacks of the threads.
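For example, with the JDK 1.6 tools already on your server you can capture several dumps a few seconds apart (<pid> is the JBoss java process ID; on Windows, per-thread CPU can be observed with a tool such as Process Explorer):
jstack -l <pid> > threaddump_1.txt
jstack -l <pid> > threaddump_2.txt
jstack -l <pid> > threaddump_3.txt
Threads that stay near the top of the per-thread CPU view across dumps, with the same stack each time, are the usual suspects.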