ACI Container hangs during run CPU=0 - wget

I am attempting to run a bash script using ACI (Azure Container Instances). Occasionally the container stalls or hangs: CPU activity drops to 0, memory flattens (see below), and network activity drops to ~50 bytes, but the script never completes. As far as I can tell, the container never terminates. A bash window can be opened on the container. The logs suggest the hang occurs during wget.
Possible clue:
How can I verify that my container is using SMB 3.0 to connect to my share, or is that handled at the host server level, so I have to assume ACI uses SMB 3.0?
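A minimal check, assuming the CIFS mount is actually visible inside the container's mount namespace (the /aci/mnt path is the mount point used by the script below):

# The vers= option on the cifs mount shows the negotiated SMB dialect (e.g. vers=3.0 or vers=3.1.1).
grep cifs /proc/mounts
# If the cifs kernel debug data is readable, it also reports the dialect and the number of channels in use.
cat /proc/fs/cifs/DebugData 2>/dev/null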
This script:
Dequeues an item from Service Bus.
Runs an exe to obtain a URL.
Performs a wget using the URL and writes the output to a Storage Account file share.
Exits, terminating the container.
wget is invoked with a 4-minute timeout. Data is written directly to the share so that the run can be retried if it fails and the wget can resume. The timeout command should force wget to end if it hangs, yet the logs suggest the container hangs at wget:
timeout 4m wget -c -O "/aci/mnt/$ID/$ID.sra" $URL
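A slightly more defensive variant of the same call, as a sketch rather than a drop-in replacement: mkdir -p addresses the "nonexistent directory" message in the logs below, the quotes guard against unusual characters in the URL, and -k (GNU coreutils timeout only; the BusyBox applet does not support it) follows the TERM with a KILL if wget does not exit:

# Make sure the per-item directory exists on the share before writing to it.
mkdir -p "/aci/mnt/$ID"
# Send TERM at 4 minutes; if wget is still running 30s later, send KILL.
timeout -k 30s 4m wget -c -O "/aci/mnt/$ID/$ID.sra" "$URL"
rc=$?
echo "wget/timeout exit code: $rc"   # 124 means the timeout fired (GNU timeout)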
I have 100 items in the queue and 5 ACIs running 5 containers each (25 containers total).
A Logic App checks the queue and, if items are present, runs the containers.
Approximately 95% of the download runs work as expected.
The rest simply hang, as far as I can tell at around 104 GB of total downloads.
I am using a Premium storage account with a 300 GB file share and SMB Multichannel enabled.
It seems that on some of the large files (>3 GB) the container instance hangs.
A successful run looks something like this:
PeekLock Message (5min)...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 353 0 353 0 0 1481 0 --:--:-- --:--:-- --:--:-- 1476
Wed Dec 28 14:31:41 UTC 2022: prefetch-01-0: ---------- BEGIN RUN ----------
./pipeline.sh: line 80: can't create /aci/mnt/SRR10575111/SRR10575111.txt: nonexistent directory
Wed Dec 28 14:31:41 UTC 2022: prefetch-01-0: vdb-dump [SRR10575111]...
Wed Dec 28 14:31:44 UTC 2022: prefetch-01-0: wget [SRR10575111]...
Wed Dec 28 14:31:44 UTC 2022: prefetch-01-0: URL=https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR10575111/SRR10575111
Connecting to sra-pub-run-odp.s3.amazonaws.com (54.231.131.113:443)
saving to '/aci/mnt/SRR10575111/SRR10575111.sra'
SRR10575111.sra 0% | | 12.8M 0:03:39 ETA
SRR10575111.sra 1% | | 56.1M 0:01:39 ETA
...
SRR10575111.sra 99% |******************************* | 2830M 0:00:00 ETA
SRR10575111.sra 100% |********************************| 2833M 0:00:00 ETA
'/aci/mnt/SRR10575111/SRR10575111.sra' saved
Wed Dec 28 14:35:42 UTC 2022: prefetch-01-0: wget exit...
Wed Dec 28 14:35:43 UTC 2022: prefetch-01-0: wget Success!
Wed Dec 28 14:35:43 UTC 2022: prefetch-01-0: Delete Message...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
Wed Dec 28 14:35:43 UTC 2022: prefetch-01-0: POST to [orchestrator] queue...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 325 0 0 100 325 0 1105 --:--:-- --:--:-- --:--:-- 1109
Wed Dec 28 14:35:44 UTC 2022: prefetch-01-0: exit RESULTCODE=0
A hung run looks like this:
PeekLock Message (5min)...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 352 0 352 0 0 1252 0 --:--:-- --:--:-- --:--:-- 1252
Wed Dec 28 14:31:41 UTC 2022: prefetch-01-1: ---------- BEGIN RUN ----------
./pipeline.sh: line 80: can't create /aci/mnt/SRR9164212/SRR9164212.txt: nonexistent directory
Wed Dec 28 14:31:41 UTC 2022: prefetch-01-1: vdb-dump [SRR9164212]...
Wed Dec 28 14:31:44 UTC 2022: prefetch-01-1: wget [SRR9164212]...
Wed Dec 28 14:31:44 UTC 2022: prefetch-01-1: URL=https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR9164212/SRR9164212
Connecting to sra-pub-run-odp.s3.amazonaws.com (52.216.146.75:443)
saving to '/aci/mnt/SRR9164212/SRR9164212.sra'
SRR9164212.sra 0% | | 2278k 0:55:44 ETA
SRR9164212.sra 0% | | 53.7M 0:04:30 ETA
SRR9164212.sra 1% | | 83.9M 0:04:18 ETA
...
SRR9164212.sra 44% |************** | 3262M 0:04:55 ETA
SRR9164212.sra 44% |************** | 3292M 0:04:52 ETA
SRR9164212.sra 45% |************** | 3326M 0:04:47 ETA
The container is left in a Running state.
CPU goes to 0; network activity goes to ~50 B received/transmitted.
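Since a bash window can be opened on the hung container, one diagnostic worth capturing (a sketch; adjust for whichever ps the image provides) is the state of the wget process. A process stuck in state D (uninterruptible sleep, typically blocked on I/O such as a stalled SMB write) cannot be killed by the TERM signal that timeout sends, which would match these symptoms:

# With procps-style ps: show the state and the kernel function the process is waiting in.
ps -o pid,stat,wchan:32,cmd -C wget
# Without procps: read the state straight from /proc (D = uninterruptible sleep).
grep State "/proc/$(pidof wget)/status"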

Related

postgres dvdrental sample database

Why do I get this error while trying to get the sample database on my Ubuntu machine?
postgres@vagrant:~$ curl -O https://sp.postgresqltutorial.com/wp-content/uploads/2019/05/dvdrental.zip
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:02 --:--:-- 0
curl: (51) SSL: no alternative certificate subject name matches target host name 'sp.postgresqltutorial.com'
I downloaded it manually but can't access it from the psql shell.
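For reference, a sketch of the usual way to load the manually downloaded archive, assuming the zip unpacks to dvdrental.tar (the pg_restore-format dump the sample is normally distributed as) and the current user can create databases:

# Unpack the archive; it contains a pg_restore-format dump.
unzip dvdrental.zip
# Create an empty database and restore the dump into it.
createdb dvdrental
pg_restore -d dvdrental dvdrental.tar
# Verify from psql that the tables are there.
psql -d dvdrental -c '\dt'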

How to set up pgbouncer in a docker-compose setup for Airflow

I'm running a distributed Airflow setup using docker-compose. The main part of the services runs on one server and the Celery workers run on multiple servers. I have a few hundred tasks running every five minutes, and I started to run out of DB connections, which was indicated by this error message in the task logs:
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "SERVER" (IP), port XXXXX failed: FATAL: sorry, too many clients already
I'm using Postgres as the metastore and max_connections is set to the default value of 100. I didn't want to raise the max_connections value since I thought there should be a better solution for this. At some point I'll run thousands of tasks every 5 minutes and the number of connections is guaranteed to run out again, so I added pgbouncer to my configuration.
Here's how I configured pgbouncer:
pgbouncer:
  image: "bitnami/pgbouncer:1.16.0"
  restart: always
  environment:
    POSTGRESQL_HOST: "postgres"
    POSTGRESQL_USERNAME: ${POSTGRES_USER}
    POSTGRESQL_PASSWORD: ${POSTGRES_PASSWORD}
    POSTGRESQL_PORT: ${PSQL_PORT}
    PGBOUNCER_DATABASE: ${POSTGRES_DB}
    PGBOUNCER_AUTH_TYPE: "trust"
    PGBOUNCER_IGNORE_STARTUP_PARAMETERS: "extra_float_digits"
  ports:
    - '1234:1234'
  depends_on:
    - postgres
pgbouncer logs look like this:
pgbouncer 13:29:13.87
pgbouncer 13:29:13.87 Welcome to the Bitnami pgbouncer container
pgbouncer 13:29:13.87 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-pgbouncer
pgbouncer 13:29:13.87 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-pgbouncer/issues
pgbouncer 13:29:13.88
pgbouncer 13:29:13.89 INFO ==> ** Starting PgBouncer setup **
pgbouncer 13:29:13.91 INFO ==> Validating settings in PGBOUNCER_* env vars...
pgbouncer 13:29:13.91 WARN ==> You set the environment variable PGBOUNCER_AUTH_TYPE=trust. For safety reasons, do not use this flag in a production environment.
pgbouncer 13:29:13.91 INFO ==> Initializing PgBouncer...
pgbouncer 13:29:13.92 INFO ==> Waiting for PostgreSQL backend to be accessible
pgbouncer 13:29:13.92 INFO ==> Backend postgres:9876 accessible
pgbouncer 13:29:13.93 INFO ==> Configuring credentials
pgbouncer 13:29:13.93 INFO ==> Creating configuration file
pgbouncer 13:29:14.06 INFO ==> Loading custom scripts...
pgbouncer 13:29:14.06 INFO ==> ** PgBouncer setup finished! **
pgbouncer 13:29:14.08 INFO ==> ** Starting PgBouncer **
2022-10-25 13:29:14.089 UTC [1] LOG kernel file descriptor limit: 1048576 (hard: 1048576); max_client_conn: 100, max expected fd use: 152
2022-10-25 13:29:14.089 UTC [1] LOG listening on 0.0.0.0:1234
2022-10-25 13:29:14.089 UTC [1] LOG listening on unix:/tmp/.s.PGSQL.1234
2022-10-25 13:29:14.089 UTC [1] LOG process up: PgBouncer 1.16.0, libevent 2.1.8-stable (epoll), adns: c-ares 1.14.0, tls: OpenSSL 1.1.1d 10 Sep 2019
2022-10-25 13:30:14.090 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2022-10-25 13:31:14.090 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2022-10-25 13:32:14.090 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2022-10-25 13:33:14.090 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2022-10-25 13:34:14.089 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2022-10-25 13:35:14.090 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2022-10-25 13:36:14.090 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2022-10-25 13:37:14.090 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2022-10-25 13:38:14.090 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2022-10-25 13:39:14.089 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
The service seems to run OK, but I don't think it is doing anything. There was very little information about this in the Airflow documentation, and I'm unsure what to change.
Should I change the pgbouncer setup in my docker-compose file?
Should I change AIRFLOW__DATABASE__SQL_ALCHEMY_CONN variable?
Update 1:
I edited the docker-compose.yml for the worker nodes and changed the DB port to the pgbouncer port. After this I got some traffic in the bouncer logs, but Airflow tasks are queued and not processed with this configuration, so there's still something wrong. I didn't edit the docker-compose YAML that launches the webserver, scheduler, etc., because I didn't know how.
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://<XXX>@${AIRFLOW_WEBSERVER_URL}:${PGBOUNCER_PORT}/airflow
AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://<XXX>@${AIRFLOW_WEBSERVER_URL}:${PGBOUNCER_PORT}/airflow
pgbouncer log after the change:
2022-10-26 11:46:22.517 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2022-10-26 11:47:22.517 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2022-10-26 11:48:22.517 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2022-10-26 11:49:22.519 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2022-10-26 11:50:22.518 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2022-10-26 11:51:22.516 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2022-10-26 11:51:52.356 UTC [1] LOG C-0x5602cf8ab180: <XXX>@<IP:PORT> login attempt: db=airflow user=airflow tls=no
2022-10-26 11:51:52.359 UTC [1] LOG S-0x5602cf8b1f20: <XXX>@<IP:PORT> new connection to server (from <IP:PORT>)
2022-10-26 11:51:52.410 UTC [1] LOG C-0x5602cf8ab180: <XXX>@<IP:PORT> closing because: client close request (age=0s)
2022-10-26 11:51:52.834 UTC [1] LOG C-0x5602cf8ab180: <XXX>@<IP:PORT> login attempt: db=airflow user=airflow tls=no
2022-10-26 11:51:52.845 UTC [1] LOG C-0x5602cf8ab180: <XXX>@<IP:PORT> closing because: client close request (age=0s)
2022-10-26 11:51:56.752 UTC [1] LOG C-0x5602cf8ab180: <XXX>@<IP:PORT> login attempt: db=airflow user=airflow tls=no
2022-10-26 11:51:57.393 UTC [1] LOG C-0x5602cf8ab3b0: <XXX>@<IP:PORT> login attempt: db=airflow user=airflow tls=no
2022-10-26 11:51:57.394 UTC [1] LOG S-0x5602cf8b2150: <XXX>@<IP:PORT> new connection to server (from <IP:PORT>)
2022-10-26 11:51:59.906 UTC [1] LOG C-0x5602cf8ab180: <XXX>@<IP:PORT> closing because: client close request (age=3s)
2022-10-26 11:52:00.642 UTC [1] LOG C-0x5602cf8ab3b0: <XXX>@<IP:PORT> closing because: client close request (age=3s)
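As a basic sanity check that traffic actually goes through the bouncer, connecting with psql via the pgbouncer service and watching its log should produce a "login attempt" line and, on the first query, a "new connection to server" line. A sketch, assuming a psql client is available on a worker host and using the placeholder host/port from the compose file above:

# Connect through pgbouncer (service name and port from the compose file), not directly to Postgres;
# the credentials are the same placeholders used above.
psql "postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@pgbouncer:1234/${POSTGRES_DB}" -c "SELECT 1;"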

Kafka REST proxy: how to retrieve and deserialize Kafka data based on AVRO schema stored in schema-registry

I am new to Kafka. I run a Docker-based Kafka ecosystem on my local machine, including broker/zookeeper/schema-registry/rest-proxy. I also have an external producer (temp-service), which sends AVRO-serialized data to the topic temp-topic on the broker.
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
411564d10c06 confluentinc/cp-kafka-rest:latest "/etc/confluent/dock…" 27 seconds ago Up 25 seconds 0.0.0.0:8082->8082/tcp kafka_kafka-rest_1
38c4e3ea008c confluentinc/cp-schema-registry:latest "/etc/confluent/dock…" 30 seconds ago Up 27 seconds 0.0.0.0:8081->8081/tcp kafka_schema-registry_1
7abe6cf9a7a0 confluentinc/cp-kafka:latest "/etc/confluent/dock…" 30 minutes ago Up 30 seconds 0.0.0.0:9092->9092/tcp kafka
bdffd9e03088 confluentinc/cp-zookeeper:latest "/etc/confluent/dock…" 30 minutes ago Up 30 seconds 2888/tcp, 0.0.0.0:2181->2181/tcp, 3888/tcp zookeeper
d1909c6877c5 temp-service:latest "node /home/tempserv…" 3 hours ago Up 2 hours (healthy) 0.0.0.0:8107->8107/tcp, 0.0.0.0:9107->9107/tcp, 0.0.0.0:9229->9229/tcp temp-service
I have also posted the AVRO schema of the temp-service Kafka data to the schema registry, so it is stored there as id 1.
I created a consumer group temp_consumers and a consumer instance temp_consumer_instance:
$ curl -X POST -H "Content-Type: application/vnd.kafka.v2+json" --data '{"name": "temp_consumer_instance", "format": "avro", "auto.offset.reset": "earliest"}' http://localhost:8082/consumers/temp_consumers
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 219 0 134 100 85 158 100 --:--:-- --:--:-- --:--:-- 260
{"instance_id":"temp_consumer_instance","base_uri":"http://kafka-rest:8082/consumers/temp_consumers/instances/temp_consumer_instance"}
checked topics in kafka:
$ curl -X GET http://localhost:8082/topics
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 29 0 29 0 0 1610 0 --:--:-- --:--:-- --:--:-- 1705
["temp-topic","_schemas"]
subscribed to the topic temp-topic.
$ curl -X POST -H "Content-Type: application/vnd.kafka.v2+json" --data '{"topics":["temp-topic"]}' http://localhost:8082/consumers/temp_consumers/instances/temp_consumer_instance/subscription
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 29 0 0 100 29 0 1046 --:--:-- --:--:-- --:--:-- 1115
tried to consume the records in the topic but failed:
$ curl -X GET -H "Accept: application/vnd.kafka.binary.v2+json" http://localhost:8082/consumers/temp_consumers/instances/temp_consumer_instance/records
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 126 0 126 0 0 10598 0 --:--:-- --:--:-- --:--:-- 12600
{"error_code":40601,"message":"The requested embedded data format does not match the deserializer for this consumer instance"}
I would like to know whether there is a way to deserialize the Kafka data posted by the producer based on the AVRO schema stored in the schema registry.
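Since the consumer instance was created with "format": "avro", the records request needs the matching embedded data format in the Accept header; asking for the binary format is what triggers the 40601 error above. A sketch of the corrected call against the same instance:

# Request records in the Avro embedded format so it matches the consumer's configured format;
# the REST proxy then deserializes using the schema stored in the schema registry.
curl -X GET -H "Accept: application/vnd.kafka.avro.v2+json" \
  http://localhost:8082/consumers/temp_consumers/instances/temp_consumer_instance/records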

sar command output strange socket information about totsck and tcpsck

My nginx web server sits on a Linux virtual machine running CentOS 6.4.
When using the sar command to check socket info, it outputs a suspicious socket count like this:
05:00:01 PM totsck tcpsck udpsck rawsck ip-frag tcp-tw
05:10:01 PM 16436 16944 9 0 0 4625
05:20:01 PM 16457 16844 9 0 0 2881
05:30:01 PM 16501 16835 9 0 0 2917
05:40:01 PM 16486 16842 9 0 0 3083
05:50:02 PM 16436 16885 9 0 0 2962
Pay attention to totsck and tcpsck: the latter is larger than the former, but it is supposed to be smaller. Why?
I suppose that's due to tcp-tw, which corresponds to the number of TCP sockets in TIME_WAIT state.
http://linux.die.net/man/1/sar
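The counters sar reports with -n SOCK come from /proc/net/sockstat, so comparing the raw values there is a quick way to check this (a sketch; the exact field names can vary slightly between kernels):

# Roughly: "sockets: used" -> totsck, "TCP: inuse" -> tcpsck, "UDP: inuse" -> udpsck,
# and the TCP "tw" count -> tcp-tw.
cat /proc/net/sockstat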

Mongod resident memory usage low

I'm trying to debug some performance issues with a MongoDB configuration, and I noticed that the resident memory usage is sitting very low (around 25% of the system memory) despite the fact that there are occasionally large numbers of faults occurring. I'm surprised to see the usage so low given that MongoDB is so memory dependent.
Here's a snapshot of top sorted by memory usage. It can be seen that no other process is using any significant memory:
top - 21:00:47 up 136 days, 2:45, 1 user, load average: 1.35, 1.51, 0.83
Tasks: 62 total, 1 running, 61 sleeping, 0 stopped, 0 zombie
Cpu(s): 13.7%us, 5.2%sy, 0.0%ni, 77.3%id, 0.3%wa, 0.0%hi, 1.0%si, 2.4%st
Mem: 1692600k total, 1676900k used, 15700k free, 12092k buffers
Swap: 917500k total, 54088k used, 863412k free, 1473148k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2461 mongodb 20 0 29.5g 564m 492m S 22.6 34.2 40947:09 mongod
20306 ubuntu 20 0 24864 7412 1712 S 0.0 0.4 0:00.76 bash
20157 root 20 0 73352 3576 2772 S 0.0 0.2 0:00.01 sshd
609 syslog 20 0 248m 3240 520 S 0.0 0.2 38:31.35 rsyslogd
20304 ubuntu 20 0 73352 1668 872 S 0.0 0.1 0:00.00 sshd
1 root 20 0 24312 1448 708 S 0.0 0.1 0:08.71 init
20442 ubuntu 20 0 17308 1232 944 R 0.0 0.1 0:00.54 top
I'd like to at least understand why the memory isn't being better utilized by the server, and ideally to learn how to optimize either the server config or queries to improve performance.
UPDATE:
It's fair to say that the memory usage looks high, which might lead to the conclusion that it's another process. There are no other processes using any significant memory on the server; the memory appears to be consumed by the cache, but I'm not clear on why that would be the case:
$ free -m
total used free shared buffers cached
Mem: 1652 1602 50 0 14 1415
-/+ buffers/cache: 172 1480
Swap: 895 53 842
UPDATE:
You can see that the database is still page faulting:
insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn set repl time
0 402 377 0 1167 446 0 24.2g 51.4g 3g 0 <redacted>:9.7% 0 0|0 1|0 217k 420k 457 mover PRI 03:58:43
10 295 323 0 961 592 0 24.2g 51.4g 3.01g 0 <redacted>:10.9% 0 14|0 1|1 228k 500k 485 mover PRI 03:58:44
10 240 220 0 698 342 0 24.2g 51.4g 3.02g 5 <redacted>:10.4% 0 0|0 0|0 164k 429k 478 mover PRI 03:58:45
25 449 359 0 981 479 0 24.2g 51.4g 3.02g 32 <redacted>:20.2% 0 0|0 0|0 237k 503k 479 mover PRI 03:58:46
18 469 337 0 958 466 0 24.2g 51.4g 3g 29 <redacted>:20.1% 0 0|0 0|0 223k 500k 490 mover PRI 03:58:47
9 306 238 1 759 325 0 24.2g 51.4g 2.99g 18 <redacted>:10.8% 0 6|0 1|0 154k 321k 495 mover PRI 03:58:48
6 301 236 1 765 325 0 24.2g 51.4g 2.99g 20 <redacted>:11.0% 0 0|0 0|0 156k 344k 501 mover PRI 03:58:49
11 397 318 0 995 395 0 24.2g 51.4g 2.98g 21 <redacted>:13.4% 0 0|0 0|0 198k 424k 507 mover PRI 03:58:50
10 544 428 0 1237 532 0 24.2g 51.4g 2.99g 13 <redacted>:15.4% 0 0|0 0|0 262k 571k 513 mover PRI 03:58:51
5 291 264 0 878 335 0 24.2g 51.4g 2.98g 11 <redacted>:9.8% 0 0|0 0|0 163k 330k 513 mover PRI 03:58:52
It appears this was being caused by a large amount of inactive memory on the server that wasn't being cleared for Mongo's use.
By looking at the result from:
cat /proc/meminfo
I could see a large amount of Inactive memory. Using this command as a sudo user:
free && sync && echo 3 > /proc/sys/vm/drop_caches && echo "" && free
Freed up the inactive memory, and over the next 24 hours I was able to see the resident memory of my Mongo instance increasing to consume the rest of the memory available on the server.
Credit to the following blog post for its instructions:
http://tinylan.com/index.php/article/how-to-clear-inactive-memory-in-linux
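One note on the cache-dropping command above: the redirection is performed by the calling shell, so when it is run via sudo rather than from a root shell it is usually written like this sketch:

# sync first, then ask the kernel to drop the page cache, dentries and inodes.
free && sync && sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches' && free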
MongoDB only uses as much memory as it needs, so if all of the data and indexes that are in MongoDB can fit inside what it's currently using, you won't be able to push that any higher.
If the data set is larger than memory, there are a couple of considerations:
Check MongoDB itself to see how much data it thinks it is using by running mongostat and looking at resident memory.
Was MongoDB re/started recently? If it's cold then the data won't be in memory until it gets paged in (leading to more page faults initially that gradually settle). Check out the touch command for more information on "warming MongoDB up"
Check your read-ahead settings. If your system read-ahead is too high, then MongoDB can't efficiently use the memory on the system. For MongoDB a good number to start with is a setting of 32 (that's 16 KB of read-ahead, assuming you have 512-byte blocks); see the sketch after this list.
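On Linux, a minimal sketch of checking and lowering the read-ahead on the block device backing the MongoDB data files (the device name /dev/sda is a placeholder; the value is in 512-byte sectors):

# Show the current read-ahead in 512-byte sectors.
sudo blockdev --getra /dev/sda
# Set it to 32 sectors (16 KB), the starting point suggested above.
sudo blockdev --setra 32 /dev/sda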
I had the same issue: Windows Server 2008 R2, 16 GB RAM, Mongo 2.4.3. Mongo uses only 2 GB of RAM and generates a lot of page faults. Queries are very slow; the disk is idle and memory is free. I found no other solution than upgrading to 2.6.5. It helped.