Mongodb with very high CPU rate

When I run the following code and kill it immediately (that is, make it exit abnormally), the CPU usage of MongoDB goes extremely high (around 100%):
#-*- encoding:UTF-8 -*-
import threading
import time
import pymongo

# One connection shared by all threads.
single_conn = pymongo.Connection('localhost', 27017)

class SimpleExampleThread(threading.Thread):
    def run(self):
        print single_conn['scrapy'].zhaodll.count(), self.getName()
        time.sleep(20)

for i in range(100):
    SimpleExampleThread().start()
VIRT  RES  SHR S   %CPU %MEM     TIME+ COMMAND
696m  35m 6404 S 1181.7  0.1 391:45.31 mongod
My MongoDB version is 2.2.3. While MongoDB was working normally, I ran the command "strace -c -p " for 1 minute, which gave the following output:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 33.50    0.322951         173      1867           nanosleep
 33.19    0.319950         730       438           recvfrom
 21.16    0.203969          16     12440           select
 12.13    0.116983       19497         6           restart_syscall
  0.02    0.000170           2        73           write
  0.00    0.000016           0       146           sendto
  0.00    0.000007           0        73           lseek
  0.00    0.000000           0         2           read
  0.00    0.000000           0         3           open
  0.00    0.000000           0         3           close
  0.00    0.000000           0         2           fstat
  0.00    0.000000           0        87           mmap
  0.00    0.000000           0         2           munmap
  0.00    0.000000           0         1           pwrite
  0.00    0.000000           0         3           msync
  0.00    0.000000           0        29           mincore
  0.00    0.000000           0        73           fdatasync
------ ----------- ----------- --------- --------- ----------------
100.00    0.964046                 15248           total
When the CPU usage of MongoDB went very high (around 100%), I ran the same command, which gave the following output:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 29.12    5.064230        3088      1640           nanosleep
 28.83    5.013239       27851       180           recvfrom
 22.72    3.950399      658400         6           restart_syscall
 19.30    3.356491         327     10268           select
  0.02    0.004026          67        60           sendto
  0.01    0.001000         333         3           msync
  0.00    0.000269           9        30           write
  0.00    0.000125           4        30           fdatasync
  0.00    0.000031          10         3           open
  0.00    0.000000           0         2           read
  0.00    0.000000           0         3           close
  0.00    0.000000           0         2           fstat
  0.00    0.000000           0        30           lseek
  0.00    0.000000           0        57           mmap
  0.00    0.000000           0         2           munmap
  0.00    0.000000           0         1           pwrite
  0.00    0.000000           0        14           mincore
------ ----------- ----------- --------- --------- ----------------
100.00   17.389810                 12331           total
And if I run the command "lsof", there are many sockets with the description "can't identify protocol". I don't know what is going wrong. Is this a bug in MongoDB?
Thanks!
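One way to read the two strace summaries side by side is to compare the per-call cost of the syscalls that dominate both runs. The sketch below is only an illustration: the names parse_summary and slowdown are made up here, only the top four rows of each table are embedded, and it does not diagnose the cause of the CPU spike.

```python
# Top four rows of each `strace -c` summary, copied from the question.
NORMAL = """\
33.50 0.322951 173 1867 nanosleep
33.19 0.319950 730 438 recvfrom
21.16 0.203969 16 12440 select
12.13 0.116983 19497 6 restart_syscall
"""

HIGH_CPU = """\
29.12 5.064230 3088 1640 nanosleep
28.83 5.013239 27851 180 recvfrom
22.72 3.950399 658400 6 restart_syscall
19.30 3.356491 327 10268 select
"""


def parse_summary(text):
    """Map syscall name -> (seconds, usecs_per_call, calls) from `strace -c` rows."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        # The "total" row has no usecs/call value; skip it and any short line.
        if len(parts) < 5 or parts[-1] == "total":
            continue
        try:
            seconds, usecs, calls = float(parts[1]), int(parts[2]), int(parts[3])
        except ValueError:
            continue  # header or separator line
        rows[parts[-1]] = (seconds, usecs, calls)
    return rows


def slowdown(normal, busy):
    """Return {syscall: busy usecs/call divided by normal usecs/call}."""
    ratios = {}
    for name, (_, usecs, _) in normal.items():
        if name in busy and usecs > 0:
            ratios[name] = busy[name][1] / usecs
    return ratios


if __name__ == "__main__":
    ratios = slowdown(parse_summary(NORMAL), parse_summary(HIGH_CPU))
    for name, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
        print(f"{name:16s} {ratio:6.1f}x more time per call")
```

The call counts in the two runs are of similar size, so the comparison suggests the extra time comes from each call taking longer rather than from more calls being made.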

Related

Minimal Powershell script with Process block gives system process list (MacOS, Pwsh 7.1.3)

I was writing a Powershell script using a pipeline with a Process block and it started doing something unexpected: listing all the running processes and then dumping the script contents. I kept minimizing the script to try to figure out what was going on and ended up with this:
[CmdletBinding()]
Param()
$varname = "huh"
Process
{
# nothing here
}
So it looks like this:
PS /Volumes/folder> cat ./test.ps1
[CmdletBinding()]
Param()
$varname = "huh"
Process
{
# nothing here
}
PS /Volumes/folder> pwsh ./test.ps1
NPM(K) PM(M) WS(M) CPU(s) Id SI ProcessName
------ ----- ----- ------ -- -- -----------
0 0.00 0.00 0.00 0 639
0 0.00 0.00 0.00 1 1
0 0.00 0.00 0.00 60 60
0 0.00 0.00 0.00 61 61
0 0.00 0.00 0.00 65 65
0 0.00 0.00 0.00 67 67
0 0.00 0.00 0.00 68 68
0 0.00 0.00 0.00 69 69
0 0.00 0.00 0.00 71 71
0 0.00 0.00 0.00 73 73
0 0.00 0.00 0.00 75 75
0 0.00 25.60 75.82 68475 1 Activity Monito
0 0.00 11.74 97.63 1053 1 Adobe Crash Han
0 0.00 11.76 97.62 1084 1 Adobe Crash Han
0 0.00 11.69 97.64 1392 1 Adobe Crash Han
0 0.00 112.50 83.59 973 1 Adobe Desktop S
0 0.00 11.94 97.31 986 1 AdobeCRDaemon
0 0.00 16.95 105.99 966 1 AdobeIPCBroker
0 0.00 61.52 168.92 721 1 Adobe_CCXProces
0 0.00 18.57 3.01 454 1 adprivacyd
0 0.00 16.46 23.16 700 1 AGMService
0 0.00 13.65 4.43 701 1 AirPlayUIAgent
--snip--
0 0.00 9.11 12.72 89003 …03 VTDecoderXPCSer
0 0.00 13.32 4.69 418 1 WiFiAgent
0 0.00 12.21 1.58 543 543 WiFiProxy
# nothing here
I haven't done much in Powershell for a long time so if this is something stupid simple I'm going to laugh but I couldn't find anything searching the net.
Can someone tell me what's happening?

In order to use a process block (possibly alongside a begin, end, and, in v7.3+, clean block), there must not be any code outside these blocks - see the conceptual about_Functions help topic.
Therefore, remove $varname = "huh" from the top-level scope of your script body (possibly moving it into one of the aforementioned blocks).
As for what you tried:
By having $varname = "huh" in the top-level scope of your script body, you effectively turned the script into one whose code runs in an implicit end block only.
process - because it is on its own line - was then interpreted as a command, which - due to the best-avoided default-verb logic - was interpreted as an argument-less call to the Get-Process cmdlet.
The output therefore included the list of all processes on your system.
The { ... } on the subsequent lines was then interpreted as a script block literal. Since that script block wasn't invoked, it was implicitly output, which results in its stringification, which is its verbatim content, excluding { and }, resulting in output of the following string:
# nothing here

Osm2pgsql extremely slow on import on server with 192GB RAM

I have a server with great specs: dual 6-core 3.3GHz CPUs, 196GB RAM, and RAID 10 across four 10K SAS drives. I wrote a script that downloads each of the North America files and processes them one by one, rather than the entire region all at once.
processList.sh:
wget http://download.geofabrik.de/north-america/us/alabama-latest.osm.pbf -O ./geoFiles/north-america/us/alabama-latest.osm.pbf
osm2pgsql -d gis --create --slim -G --hstore --tag-transform-script ~/src/openstreetmap-carto/openstreetmap-carto.lua -C 2000 --number-processes 15 -S ~/src/openstreetmap-carto/openstreetmap-carto.style ./geoFiles/north-america/us/alabama-latest.osm.$
while read in;
do wget http://download.geofabrik.de/$in -O ./geoFiles/$in;
osm2pgsql -d gis --append --slim -G --hstore --tag-transform-script ~/src/openstreetmap-carto/openstreetmap-carto.lua -C 2000 --number-processes 15 -S ~/src/openstreetmap-carto/openstreetmap-carto.style ./geoFiles/$in;
done < maplist.txt
At first it processes nearly 400K points/second, then slows to 10K/second or less:
osm2pgsql version 0.96.0 (64 bit id space)
Using lua based tag processing pipeline with script /root/src/openstreetmap-carto/openstreetmap-carto.lua
Using projection SRS 3857 (Spherical Mercator)
Setting up table: planet_osm_point
Setting up table: planet_osm_line
Setting up table: planet_osm_polygon
Setting up table: planet_osm_roads
Allocating memory for dense node cache
Allocating dense node cache in one big chunk
Allocating memory for sparse node cache
Sharing dense sparse
Node-cache: cache=2000MB, maxblocks=32000*65536, allocation method=11
Mid: pgsql, cache=2000
Setting up table: planet_osm_nodes
Setting up table: planet_osm_ways
Setting up table: planet_osm_rels
Reading in file: ./geoFiles/north-america/us/alabama-latest.osm.pbf
Using PBF parser.
Processing: Node(5580k 10.7k/s) Way(0k 0.00k/s) Relation(0 0.00/s))
I applied the PostgreSQL performance settings from https://wiki.openstreetmap.org/wiki/Osm2pgsql/benchmarks:
shared_buffers = 14GB
work_mem = 1GB
maintenance_work_mem = 8GB
effective_io_concurrency = 500
max_worker_processes = 8
max_parallel_workers_per_gather = 2
max_parallel_workers = 8
checkpoint_timeout = 1h
max_wal_size = 5GB
min_wal_size = 1GB
checkpoint_completion_target = 0.9
random_page_cost = 1.1
min_parallel_table_scan_size = 8MB
min_parallel_index_scan_size = 512kB
effective_cache_size = 22GB
Though it starts out well, it quickly deteriorates within about 20 seconds. Any idea why? I looked at top, but it didn't show anything really:
top - 22:48:46 up 3:11, 2 users, load average: 3.49, 4.03, 3.38
Tasks: 298 total, 1 running, 297 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni, 87.5 id, 12.5 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 19808144+total, 19237500+free, 780408 used, 4926040 buff/cache
KiB Swap: 29321212 total, 29321212 free, 0 used. 19437014+avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16156 root 20 0 50.819g 75920 8440 S 0.7 0.0 0:02.81 osm2pgsql
16295 root 20 0 42076 4156 3264 R 0.3 0.0 0:00.27 top
1 root 20 0 37972 6024 4004 S 0.0 0.0 0:07.10 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.02 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.05 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
6 root 20 0 0 0 0 S 0.0 0.0 0:00.58 kworker/u64:0
8 root 20 0 0 0 0 S 0.0 0.0 0:01.79 rcu_sched
9 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_bh
10 root rt 0 0 0 0 S 0.0 0.0 0:00.05 migration/0
11 root rt 0 0 0 0 S 0.0 0.0 0:00.03 watchdog/0
The load average is high, yet top does not list any process as responsible for it. Here are the results from iotop:
Total DISK READ : 0.00 B/s | Total DISK WRITE : 591.32 K/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 204.69 K/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
28638 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.60 % [kworker/u65:1]
20643 be/4 postgres 0.00 B/s 204.69 K/s 0.00 % 0.10 % postgres: wal writer process
20641 be/4 postgres 0.00 B/s 288.08 K/s 0.00 % 0.00 % postgres: checkpointer process
26923 be/4 postgres 0.00 B/s 98.55 K/s 0.00 % 0.00 % postgres: root gis [local] idle in transaction
1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % init
2 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kthreadd]
3 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/0]
5 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/0:0H]
6 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/u64:0]
8 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [rcu_sched]
9 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [rcu_bh]
10 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/0]
11 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/0]
12 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/1]

How to remove repeated header from the output of iostat linux command

I am running the following command and saving its output to a file:
sysstat/iostat -mdt sda1 1 >> /tmp/disk.out &
The output is as follows:
Linux 3.16.0-25-generic (bscpower8n2) 09/25/2016 _ppc64le_ (192 CPU)
09/25/2016 08:12:01 PM
Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
sda1 0.00 0.00 0.00 1 0
09/25/2016 08:12:02 PM
Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
sda1 0.00 0.00 0.00 0 0
09/25/2016 08:12:03 PM
Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
sda1 0.00 0.00 0.00 0 0
09/25/2016 08:12:04 PM
Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
sda1 0.00 0.00 0.00 0 0
09/25/2016 08:12:05 PM
Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
sda1 0.00 0.00 0.00 0 0
09/25/2016 08:12:06 PM
Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
sda1 0.00 0.00 0.00 0 0
But I want to save it without the header, and with the date and time on the same row as the data. Could anyone let me know how to achieve this?
For example:
09/25/2016 08:12:01 PM sda1 0.00 0.00 0.00 1 0
09/25/2016 08:12:02 PM sda1 0.00 0.00 0.00 0 0

I wrote the following script; it may need improvement, but it worked.
#!/bin/bash
while true; do
    # Drop the banner line, the Device: header, and blank lines, then join the timestamp with the data row.
    iostat -mdt sda1 1 1 | sed -n -e '1d' -e '/^Device:/d' -e '/^$/d' -e 'p' | sed -e 'N;s/\n/ /'
    sleep 1
done

@jpeiper: this loop doesn't work well for me, as it gives the same average value (over a longer period) every time. Only when you let iostat report several values in continuous mode does it show short-interval values.
You can use grep, but it will not push the output to the log file until grep's buffer is full, and you cannot add extra information to or between the output lines (which you can with a loop).
I'm still looking for a better solution.
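A possible alternative is to keep iostat in continuous mode and do the filtering and joining in a small script that flushes each row as it is produced. The following is a minimal sketch, not a tested replacement: the function name join_iostat and the script name join_iostat.py are made up, and the timestamp pattern assumes the MM/DD/YYYY HH:MM:SS AM/PM format shown in the question (other locales would need a different pattern).

```python
import re
import sys

# Timestamp lines as printed in the question, e.g. "09/25/2016 08:12:01 PM".
# Assumption: the AM/PM suffix is optional; other date formats are not handled.
TIMESTAMP = re.compile(r"^\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}( [AP]M)?$")


def join_iostat(lines, device="sda1"):
    """Yield 'timestamp device-row' for each report in continuous iostat output."""
    stamp = None
    for raw in lines:
        line = raw.strip()
        if TIMESTAMP.match(line):
            stamp = line  # remember the most recent timestamp
        elif stamp is not None and line.split()[:1] == [device]:
            # Collapse column padding to single spaces and prefix the timestamp.
            yield f"{stamp} {' '.join(line.split())}"
        # Banner, "Device:" header, and blank lines fall through and are dropped.


if __name__ == "__main__":
    for row in join_iostat(sys.stdin):
        print(row, flush=True)  # flush so each row reaches the log immediately
```

It could be used as `iostat -mdt sda1 1 | python3 join_iostat.py >> /tmp/disk.out &`. Whether iostat itself buffers its output when writing to a pipe is not addressed by this sketch.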

insert word between lines

I have a PDB (Protein Data Bank) file with thousands of lines.
REMARK 1 PDB file generated by ptraj (set 1000)
ATOM 1 O22 DDM 1 2.800 4.419 20.868 0.00 0.00
ATOM 2 H22 DDM 1 3.427 4.096 20.216 0.00 0.00
ATOM 3 C22 DDM 1 3.351 5.588 21.698 0.00 0.00
ATOM 4 H42 DDM 1 3.456 5.274 22.736 0.00 0.00
ATOM 5 C23 DDM 1 2.530 6.846 21.639 0.00 0.00
ATOM 6 H43 DDM 1 2.347 7.159 20.611 0.00 0.00
ATOM 7 O23 DDM 1 1.313 6.498 22.334 0.00 0.00
ATOM 8 H23 DDM 1 0.903 5.837 21.771 0.00 0.00
ATOM 9 C24 DDM 1 3.073 8.109 22.266 0.00 0.00
ATOM 10 H44 DDM 1 3.139 7.837 23.319 0.00 0.00
ATOM 11 O24 DDM 1 2.218 9.278 22.007 0.00 0.00
ATOM 12 H24 DDM 1 1.278 9.184 22.179 0.00 0.00
ATOM 13 C25 DDM 1 4.494 8.317 21.764 0.00 0.00
ATOM 14 H45 DDM 1 4.391 8.452 20.687 0.00 0.00
I want to insert the word "TER" every 81 lines in that file, which contains more than 20,000 lines, while ignoring the first line since it is a comment.
I searched the internet, and it seems sed can do it, but I am lost.
Can anyone guide me?
Thanks in advance.

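Try this (it needs GNU sed, since the first~step address is a GNU extension):
sed -i -e '82~81 a\TER' file
Line 82 is the 81st ATOM record once the REMARK header on line 1 is skipped, so this appends TER after every 81 ATOM lines.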

I'm partial to awk myself:
awk '{if(FNR%81==0)print "TER"; print}' file
I find this a lot easier to understand and debug than the sed equivalent. The only magic is that FNR is the line number within the current input file.
You might have to fiddle with the numbers in the if to get it exactly the way you want it; as written, TER is printed before lines 81, 162, and so on.

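The more verbose shell commands would be:
{
    # Copy the REMARK header through unchanged.
    IFS= read -r header
    printf '%s\n' "$header"
    i=0
    # IFS= and -r keep whitespace and backslashes in each line intact.
    while IFS= read -r line; do
        printf '%s\n' "$line"
        # After every 81st data line, emit TER and reset the counter.
        if (( ++i == 81 )); then
            echo TER
            i=0
        fi
    done
} < infile > outfile &&
mv outfile infile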
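The same counting logic can be sketched in Python, which may be easier to adapt if the block size or marker changes. This is a minimal illustration under the question's stated layout (one header line, then records); the function name insert_ter and its parameters are made up here.

```python
import sys


def insert_ter(lines, every=81, marker="TER"):
    """Yield lines, adding `marker` after every `every` records.

    The first line is treated as a header: it is passed through and not counted.
    """
    it = iter(lines)
    header = next(it, None)
    if header is None:
        return  # empty input
    yield header
    for count, line in enumerate(it, start=1):
        yield line
        if count % every == 0:
            yield marker


if __name__ == "__main__":
    # Usage (hypothetical file names): python3 insert_ter.py infile outfile
    src, dst = sys.argv[1], sys.argv[2]
    with open(src) as fin, open(dst, "w") as fout:
        for line in insert_ter(l.rstrip("\n") for l in fin):
            fout.write(line + "\n")
```

Writing to a separate output file, as the shell version does, avoids truncating the input while it is still being read.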

How can I profile template performance in Template::Toolkit?

What's the best method for benchmarking the performance of my various templates when using Template::Toolkit?
I want something that will break down how much cpu/system time is spent processing each block or template file, exclusive of the time spent processing other templates within. Devel::DProf, for example, is useless for this, since it simply tells me how much time is spent in the various internal methods of the Template module.

It turns out that Googling for template::toolkit profiling yields the best result: an article from November 2005 by Randal Schwartz. I can't copy and paste any of the article here due to copyright, but suffice it to say that you simply take his source and load it as a module after Template, like so:
use Template;
use My::Template::Context;
And you'll get output like this to STDERR when your script runs:
-- info.html at Thu Nov 13 09:33:26 2008:
cnt clk user sys cuser csys template
1 0 0.06 0.00 0.00 0.00 actions.html
1 0 0.00 0.00 0.00 0.00 banner.html
1 0 0.00 0.00 0.00 0.00 common_javascript.html
1 0 0.01 0.00 0.00 0.00 datetime.html
1 0 0.01 0.00 0.00 0.00 diag.html
3 0 0.02 0.00 0.00 0.00 field_table
1 0 0.00 0.00 0.00 0.00 header.html
1 0 0.01 0.00 0.00 0.00 info.html
1 0 0.01 0.01 0.00 0.00 my_checklists.html
1 0 0.00 0.00 0.00 0.00 my_javascript.html
1 0 0.00 0.00 0.00 0.00 qualifier.html
52 0 0.30 0.00 0.00 0.00 referral_options
1 0 0.01 0.00 0.00 0.00 relationship_block
1 0 0.00 0.00 0.00 0.00 set_bgcolor.html
1 0 0.00 0.00 0.00 0.00 shared_javascript.html
2 0 0.00 0.00 0.00 0.00 table_block
1 0 0.03 0.00 0.00 0.00 ticket.html
1 0 0.08 0.00 0.00 0.00 ticket_actions.html
-- end
Note that blocks as well as separate files are listed.
This is, IMHO, much more useful than the CPAN module Template::Timer.
