I'm using some FIWARE components (Orion 2.4.0-bext and Cygnus 2.3.0). I'm also using MongoDB (version 3.6) for current context information and MySQL (version 5.7) for saving historical changes through Cygnus. All components are dockerized, each one in its own container.
In my current solution, I receive lots of messages (between 100 and 500 per second) that are stored into MongoDB (through Orion) and MySQL (through Cygnus). The problems that I found are:
1) Mongo and Cygnus are using a lot of CPU power, even though the amount of data is not huge.
2) Cygnus is growing insanely in the amount of disk used by its container.
Cygnus has been configured to store 1 row per attribute change in MySQL. These are the docker-compose environment variables of Cygnus regarding the configuration of the mapping in MySQL:
environment:
- ...
- "CYGNUS_LOG_LEVEL=DEBUG" #
- "CYGNUS_MYSQL_DATA_MODEL=dm-by-entity-type"
- "CYGNUS_MYSQL_ATTR_NATIVE_TYPES=true"
- "CYGNUS_MYSQL_ATTR_PERSISTENCE=column"
To illustrate point 2), I'll show how much space Cygnus is using before and after running for a while.
When I check the disk used:
[root@maaster-docker-test docker]# du -h --max-depth=1
4.0K ./trust
20K ./builder
11M ./image
12G ./overlay2
4.0K ./runtimes
72K ./buildkit
100K ./network
4.0K ./tmp
807M ./volumes
4.0K ./swarm
20K ./plugins
38G ./containers
50G .
We can see that the containers folder is using 38G of space. Inspecting a little more:
[root@maaster-docker-test containers]# du -h --max-depth=1
35G ./dd1c103dd7d1a5ecbab0e3a2e039a6c4d3fd525837a766a9bab28b29ba923a32
1.4M ./ebf1c3978077a727d414d3e2de5974b03a236999cc66c29f66c0e517d6bbe055
40K ./2c92baf1368ee292b7cf33db86369d0b6f7941753f54f145a14e5793eac6eba2
40K ./047c082e1f1e8f26be0bb1a063e93907f55b4736ad88fd29736e383c6e03d559
175M ./63c715643cfbd7695dc538081e4a963e270dc03a4ffdb528bb375cf57438a477
152M ./cbb5fe85b16411dc94c8ab00dcd7b40b728acaa422398445be4130aa0197d287
1.4M ./e7c01dca2246f17c5f0aa9f413772b7b73be2d962ae772e062c5257ce95252fa
2.6G ./534b92fe6f1dac9f56d59b1e9feeb0d013199d17fae0af1df150dd20b96c9f70
38G .
The container starting with dd1c103dd7d1 is taking 35G of space. When I check which container this corresponds to:
[root@maaster-docker-test containers]# docker ps
...
dd1c103dd7d1 fiware/cygnus-ngsi:latest "/cygnus-entrypoint.…" 2 days ago Up 2 days (healthy) 0.0.0.0:5050-5051->5050-5051/tcp, 0.0.0.0:5080->5080/tcp fiware-cygnus
...
Turns out that it is Cygnus. At this point, I had inserted around 12,000 different entities in MongoDB, reflecting around 300,000 changes (rows) in MySQL. So the amount of data is not that insane.
Finally, after restarting the services defined in docker-compose (that is, stopping Cygnus, removing the containers and starting them again):
[root@maaster-docker-test containers]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
...
0fb5ab78b70b fiware/cygnus-ngsi:latest "/cygnus-entrypoint.…" 3 minutes ago Up 3 minutes (healthy) 0.0.0.0:5050-5051->5050-5051/tcp, 0.0.0.0:5080->5080/tcp fiware-cygnus
...
And checking the space used again:
[root@maaster-docker-test containers]# du -h --max-depth=1
48K ./e8d32c563ad0a22f70a83b576592b32b6336b0b39cc401cfe451896e6b76a897
40K ./043069752e112cdce051d3792b8a261133b869e939bed263b9e0eb7dc265c379
80K ./f1f40e29a8d620fc38b7b12cfcd4986f3ffcc2f970f08d582cb826c2f2a41ff0
1.1M ./0fb5ab78b70bcc856c91b9b99266049b20f05f5d1d6d7d224cdfde9eec47b801
40K ./047c082e1f1e8f26be0bb1a063e93907f55b4736ad88fd29736e383c6e03d559
128K ./aa3c63cafe57c6ea1b4ed01bee9415b6a87bf538d6d7a700750be21c738d39d5
68K ./11388b7090bcd08ac0235129b7354e5c9b68f1d4855abdc97f025ad597b73e68
68K ./84a6aaf4221fde5cb0e21b727e33e21ff58d275597b121ba22f87296148b38db
1.6M .
We can see that the container used by Cygnus (0fb5ab78b70b) is now taking only 1.1M of space.
So, does anyone know if there is a problem with Cygnus, with Docker, or with Cygnus running in a Docker environment?
Is there any configuration change I should perform on Cygnus to be more efficient and also consume less disk space?
Also, it would be nice to know if there is any configuration change I should make to MongoDB (or maybe Orion) in order to make it more efficient. I don't think that inserting ~200 new entities per second should impact the CPU usage of the mongo process that much.
Thanks a lot in advance
I've discovered that the configuration
"CYGNUS_LOG_LEVEL=DEBUG"
was generating an enormous amount of log data that was dragging down the performance of the service and increasing the disk usage.
Changing that value solved my issue.
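For reference, a minimal sketch of the corresponding docker-compose change (ERROR is just an example; any level less verbose than DEBUG, such as INFO, should tame the log volume). The logging section is an extra, optional suggestion on my part, not part of the original fix: it caps Docker's own json-file log, which is what fills /var/lib/docker/containers/<id>/:

environment:
  - ...
  - "CYGNUS_LOG_LEVEL=ERROR"   # was DEBUG; a less verbose level stops the log flood
logging:
  driver: json-file
  options:
    max-size: "10m"            # rotate the container log at 10 MB
    max-file: "3"              # keep at most 3 rotated files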
I have a Citus cluster of 1 coordinator node (32 vcores, 64 GB RAM) and 6 worker nodes (4 cores, 32 GB RAM each).
After performing ingestion of data using the following command where chunks_0 directory contains 300 files having 1M record each:
find chunks_0/ -type f | time xargs -n1 -P24 sh -c "psql -d citus_testing -c \"\\copy table_1 from '/home/postgres/\$0' with csv HEADER;\""
I notice that after the ingestion is done, there is still write activity occurring on the worker nodes at a smaller rate (around 800 MB/s overall during ingestion, and around 80-100 MB/s after ingestion) for a certain time.
I'm wondering what Citus is doing during this time?
If you do not run any queries in said time period, I do not think Citus is responsible for any writes. It's possible that PostgreSQL ran autovacuum. You can check the PostgreSQL logs in the worker nodes and see for yourself.
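If it helps, a quick sketch of what to look for on a worker node (standard PostgreSQL catalog views, nothing Citus-specific; the database name is taken from the question):

# Currently running autovacuum workers
psql -d citus_testing -c "SELECT pid, query FROM pg_stat_activity WHERE query LIKE 'autovacuum:%';"
# When each table was last (auto)vacuumed/analyzed
psql -d citus_testing -c "SELECT relname, last_autovacuum, last_autoanalyze FROM pg_stat_user_tables ORDER BY last_autovacuum DESC NULLS LAST LIMIT 20;"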
In the current setup there are two Mongo Docker containers, running on hosts A and B, with Mongo version 3.4 and running in a replica set. I would like to upgrade them to 3.6 and add a member so the containers would run on hosts A, B and C. The containers have an 8GB memory limit and no swap allocated (currently), and are administered in Rancher. So my plan was to boot up the three new containers, initialize a replica set for those, take a dump from the 3.4 container, and restore it to the new replica set master.
Taking the dump went fine, and its size was about 16GB. When I tried to restore it to the new 3.6 master, the restore starts fine, but after it has restored roughly 5GB of the data, the mongo process seems to be killed by the OS/Rancher, and while the container itself doesn't restart, the MongoDB process just crashes and reloads itself back up again. If I run mongorestore against the same database again, it reports a unique key error for all the already-inserted entries and then continues where it left off, only to do the same again after 5GB or so. So it seems that mongorestore loads all the entries it restores into memory.
So I need to find some solution to this. My options are:
Every time it crashes, just run the mongorestore command so it continues where it left off. It probably should work, but I feel a bit uneasy doing it.
Restore the database one collection at a time, but the largest collection is bigger than 5GB so it wouldn't work properly either.
Add swap or physical memory (temporarily) to the container so the process doesn't get killed after the process runs out of physical memory.
Something else, hopefully a better solution?
Increasing the swap size, as the other answer pointed out, worked for me. Also, the --numParallelCollections option controls the number of collections mongodump/mongorestore should dump/restore in parallel. The default is 4, which may consume a lot of memory.
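A minimal sketch of what that looks like (host and dump path are placeholders, not from the question):

# Restore one collection at a time to keep peak memory usage down
mongorestore --host <replica-set-primary>:27017 --numParallelCollections=1 /path/to/dump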
Since mongorestore continues where it left off successfully, it doesn't sound like you're running out of disk space, so focusing on memory issues is the correct response. You're definitely running out of memory during the mongorestore process.
I would highly recommend going with the swap space, as this is the simplest, most reliable, least hacky, and arguably the most officially supported way to handle this problem.
Alternatively, if you're for some reason completely opposed to using swap space, you could temporarily use a node with a larger amount of memory, perform the mongorestore on this node, allow it to replicate, then take the node down and replace it with a node that has fewer resources allocated to it. This option should work, but could become quite difficult with larger data sets and is pretty overkill for something like this.
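For completeness, a hedged sketch of temporarily adding swap on a Linux host (the 8G size is an assumption; size it for your data set, and note that a Rancher/Docker container may also need its swap limit relaxed):

sudo fallocate -l 8G /swapfile     # allocate the swap file
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile              # enable it, then run the mongorestore
# ...after the restore finishes:
sudo swapoff /swapfile && sudo rm /swapfile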
I solved the OOM problem by using the --wiredTigerCacheSizeGB parameter of mongod. Excerpt from my docker-compose.yaml below:
version: '3.6'
services:
  db:
    container_name: db
    image: mongo:3.2
    volumes:
      - ./vol/db/:/data/db
    restart: always
    # use 1.5GB for cache instead of the default (Total RAM - 1GB)/2:
    command: mongod --wiredTigerCacheSizeGB 1.5
Just documenting here my experience in 2020 using mongodb 4.4:
I ran into this problem restoring a 5GB collection on a machine with 4GB mem. I added 4GB swap, which seemed to work; I was no longer seeing the KILLED message.
However, a while later I noticed I was missing a lot of data! Turns out that if mongorestore runs out of memory during the final step (at 100%) it will not show "killed", BUT IT HASN'T IMPORTED YOUR DATA.
You want to make sure you see these final lines:
[########################] cranlike.files.chunks 5.00GB/5.00GB (100.0%)
restoring indexes for collection cranlike.files.chunks from metadata
finished restoring cranlike.files.chunks (23674 documents, 0 failures)
34632 document(s) restored successfully. 0 document(s) failed to restore.
In my case I needed 4GB mem + 8GB swap to import a 5GB GridFS collection.
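If in doubt, a cheap sanity check after the restore is to count what actually landed (a sketch; the database and collection names are taken from the output above):

# Compare this against the "document(s) restored successfully" line from mongorestore
mongo --quiet --eval 'db.getSiblingDB("cranlike").getCollection("files.chunks").countDocuments({})'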
Rather than starting up a new replica set, it's possible to do the entire expansion and upgrade without even going offline.
Start MongoDB 3.6 on host C
On the primary (currently A or B), add node C into the replica set
Node C will do an initial sync of the data; this may take some time
Once that is finished, take down node B; your replica set has two working nodes still (A and C) so will continue uninterrupted
Replace v3.4 on node B with v3.6 and start back up again
When node B is ready, take down node A
Replace v3.4 on node A with v3.6 and start back up again
You'll be left with the same replica set running as before, but now with three nodes all running v3.6.
PS Be sure to check out the documentation on Upgrade a Replica Set to 3.6 before you start.
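A rough sketch of steps 2-3, run from a shell against the current primary (hostnames and ports are placeholders):

# Add node C to the replica set
mongo --host hostA:27017 --eval 'rs.add("hostC:27017")'
# Poll until node C reports SECONDARY, i.e. the initial sync has finished
mongo --host hostA:27017 --eval 'rs.status().members.forEach(function(m) { print(m.name + " " + m.stateStr); })'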
I ran into a similar issue running 3 nodes on a single machine (8GB RAM total) as part of testing a replica set. The default storage cache size is 0.5 * (Total RAM - 1GB). The mongorestore caused each node to use the full cache size on restore and consume all available RAM.
I am using Ansible to template this part of mongod.conf, but you can set your cacheSizeGB to any reasonable amount so that multiple instances do not consume all the RAM.
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: {{ ansible_memtotal_mb / 1024 * 0.2 }}
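For example, with 8 GB of total RAM the 0.2 factor above gives each node a cache of roughly 1.6 GB, so the three test nodes together stay around 4.8 GB and leave headroom for the OS and everything else.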
My scenario is similar to @qwertz's, but to be able to upload all collections to my database I created the following script to handle partial uploads; uploading every single collection one-by-one instead of trying to send the whole database at once was the only way to properly populate it.
populate.sh:
#!/bin/bash
backupFiles=`ls ./backup/${DB_NAME}/*.bson.gz`
for file in $backupFiles
do
    file="${file:1}"
    collection=(${file//./ })
    collection=(${collection//// })
    collection=${collection[2]}
    mongorestore \
        $file \
        --gzip \
        --db=$DB_NAME \
        --collection=$collection \
        --drop \
        --uri="${DB_URI}://${DB_USERNAME}:${DB_PASSWORD}@${DB_HOST}:${DB_PORT}/"
done
Dockerfile.mongorestore:
FROM alpine:3.12.4
RUN [ "apk", "add", "--no-cache", "bash", "mongodb-tools" ]
COPY populate.sh .
ENTRYPOINT [ "./populate.sh" ]
docker-compose.yml:
...
  mongorestore:
    build:
      context: .
      dockerfile: Dockerfile.mongorestore
    restart: on-failure
    environment:
      - DB_URI=${DB_URI}
      - DB_NAME=${DB_NAME}
      - DB_USERNAME=${DB_USERNAME}
      - DB_PASSWORD=${DB_PASSWORD}
      - DB_HOST=${DB_HOST}
      - DB_PORT=${DB_PORT}
    volumes:
      - ${PWD}/backup:/backup
...
I have a ~140 GB Postgres DB on Heroku / AWS. I want to create a dump of this on an Azure Windows Server 2012 R2 virtual machine, as I need to move the DB into the Azure environment.
The DB has a couple of smaller tables, but mainly consists of a single table taking ~130 GB, including indexes. It has ~500 million rows.
I've tried to use pg_dump for this, with:
./pg_dump -Fc --no-acl --no-owner --host * --port 5432 -U * -d * > F:/051418.dump
I've tried various Azure virtual machine sizes, including some fairly large ones ((D12_v2) 28GB RAM, 4 vCPUs, 12000 max IOPS, etc.). But in all cases pg_dump stalls completely due to memory swapping.
On the above machine it's currently using all available memory and has spent the past 12 hours swapping memory to disk. I don't expect it to complete, due to the swapping.
From other posts I've understood it could be an issue with the network speed being much faster than the disk IO speed, causing pg_dump to suck up all available memory and more, so I've tried using the Azure machine with the most IOPS. This hasn't helped.
So is there another way I can force pg_dump to cap its memory usage, or to wait to pull more data until it has been written to disk and the memory cleared?
Looking forward to your help!
Krgds.
Christian
We operate a server for our customer with a single mongo instance, Gradle, Postgres and nginx running on it. The problem is that we have massive performance problems while mongodump is running: the mongo queue is growing and no data can be queried. The other problem is that the customer does not want to invest in a replica set or a software update (to mongod 3.x).
Does somebody have any idea how I could improve the performance?
command to create the dump:
mongodump -u ${MONGO_USER} -p ${MONGO_PASSWORD} -o ${MONGO_DUMP_DIR} -d ${MONGO_DATABASE} --authenticationDatabase ${MONGO_DATABASE} > /backup/logs/mongobackup.log
tar cjf ${ZIPPED_FILENAME} ${MONGO_DUMP_DIR}
System:
6 Cores
36 GB RAM
1TB SATA HDD
+ 2TB (backup NAS)
MongoDB 2.6.7
Thanks
Best regards,
Markus
As you have heavy load, adding a replica set is a good solution, as the backup could then be taken on a secondary node. But be aware that a replica set needs at least three servers (you can have a master/slave/arbiter setup, where the last needs only a small amount of resources).
mongodump takes a general query lock, which will have an impact if there are a lot of writes to the dumped database.
Hint: try to make the backup when there is light load on the system.
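For example, a hedged sketch of scheduling it via cron (the script path and the 03:00 slot are assumptions; wrap the existing mongodump/tar commands in a script first):

# m h dom mon dow  command
0 3 * * * /backup/scripts/mongobackup.sh >> /backup/logs/mongobackup.log 2>&1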
Try volume snapshots. Check with your cloud provider what options are available for taking snapshots. It is super fast, and cheaper if you compare it with the actual cost of taking a backup (RAM and CPU used, and if on HDD, the transaction cost, even if it is small).
I am using MongoDB v1.8.1. My server memory is 4GB but MongoDB is utilizing more than 3GB. Is there a memory limitation option in MongoDB?
If you are running MongoDB 3.2 or a later version, you can limit the WiredTiger cache as mentioned above.
In /etc/mongod.conf, add the wiredTiger part:
...
# Where and how to store data.
storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true
  wiredTiger:
    engineConfig:
      cacheSizeGB: 1
...
This will limit the cache size to 1GB; more info in the docs.
This solved the issue for me, running Ubuntu 16.04 and MongoDB 3.2.
PS: After changing the config, restart the mongo daemon.
$ sudo service mongod restart
# check the status
$ sudo service mongod status
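If you want to confirm that the new limit was picked up after the restart, a quick sketch (the value is reported in bytes):

mongo --eval 'db.serverStatus().wiredTiger.cache["maximum bytes configured"]'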
Starting in 3.2, MongoDB uses WiredTiger as the default storage engine. Previous versions used MMAPv1 as the default storage engine.
With WiredTiger, MongoDB utilizes both the WiredTiger internal cache and the filesystem cache.
In MongoDB 3.2, the WiredTiger internal cache, by default, will use the larger of either:
60% of RAM minus 1 GB, or
1 GB.
For systems with up to 10 GB of RAM, the new default setting is less than or equal to the 3.0 default setting (For MongoDB 3.0, the WiredTiger internal cache uses either 1 GB or half of the installed physical RAM, whichever is larger).
For systems with more than 10 GB of RAM, the new default setting is greater than the 3.0 setting.
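For example, on a machine with 16 GB of RAM, the 3.2 default works out to max(0.6 × 16 GB − 1 GB, 1 GB) ≈ 8.6 GB of internal cache.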
To limit the WiredTiger cache, add the following line to the .conf file:
wiredTigerCacheSizeGB = 1
This question has been asked a couple times ...
See this related question/answer (quoted below) ... how to release the caching which is used by Mongodb?
MongoDB will (at least seem to) use up a lot of available memory, but it actually leaves it up to the OS's VMM to tell it to release the memory (see Caching in the MongoDB docs.)
You should be able to release any and all memory by restarting MongoDB.
However, to some extent MongoDB isn't really "using" the memory.
For example from the MongoDB docs Checking Server Memory Usage ...
Depending on the platform you may see the mapped files as memory in the process, but this is not strictly correct. Unix top may show way more memory for mongod than is really appropriate. The Operating System (the virtual memory manager specifically, depending on OS) manages the memory where the "Memory Mapped Files" reside. This number is usually shown in a program like "free -lmt".
It is called "cached" memory.
MongoDB uses the LRU (Least Recently Used) cache algorithm to determine which "pages" to release; you will find some more information in these two questions ...
MongoDB limit memory
MongoDB index/RAM relationship
Mongod start with memory limit (You can't.)
You can limit mongod process usage using cgroups on Linux.
Using cgroups, our task can be accomplished in a few easy steps.
Create control group:
cgcreate -g memory:DBLimitedGroup
(make sure the cgroups binaries are installed on your system; consult your favorite Linux distribution manual for how to do that)
Specify how much memory will be available for this group:
echo 16G > /sys/fs/cgroup/memory/DBLimitedGroup/memory.limit_in_bytes
This command limits the memory to 16G (a good thing: this limits the memory for both malloc allocations and the OS cache)
Now, it is a good idea to drop pages that are already in the cache:
sync; echo 3 > /proc/sys/vm/drop_caches
And finally assign a server to created control group:
cgclassify -g memory:DBLimitedGroup `pidof mongod`
This will assign a running mongod process to a group limited by only 16GB memory.
source: Using Cgroups to Limit MySQL and MongoDB memory usage
I don't think you can configure how much memory MongoDB uses, but that's OK (read below).
To quote from the official source:
Virtual memory size and resident size will appear to be very large for the mongod process. This is benign: virtual memory space will be just larger than the size of the datafiles open and mapped; resident size will vary depending on the amount of memory not used by other processes on the machine.
In other words, Mongo will let other programs use memory if they ask for it.
mongod --wiredTigerCacheSizeGB 2
Adding to the top voted answer, in case you are on a low memory machine and want to configure the wiredTigerCache in MBs instead of whole number GBs, use this -
storage:
  wiredTiger:
    engineConfig:
      configString: cache_size=345M
Source - https://jira.mongodb.org/browse/SERVER-22274
For Windows it seems possible to control the amount of memory MongoDB uses, see this tutorial at Captain Codeman:
Limit MongoDB memory use on Windows without Virtualization
Not really; there are a couple of tricks to limit memory. For example, on Windows you can use the Windows System Resource Manager (WSRM), but generally Mongo works best on a dedicated server where it's free to use memory without much contention with other systems.
Although the operating system will try to allocate memory to other processes as they need it, in practice this can lead to performance issues if other systems have high memory requirements too.
If you really need to limit memory, and only have a single server, then your best bet is virtualization.
This can be done with cgroups, by combining knowledge from these two articles:
https://www.percona.com/blog/2015/07/01/using-cgroups-to-limit-mysql-and-mongodb-memory-usage/
http://frank2.net/cgroups-ubuntu-14-04/
You can find here a small shell script which will create config and init files for Ubuntu 14.04:
http://brainsuckerna.blogspot.com.by/2016/05/limiting-mongodb-memory-usage-with.html
Just like that:
sudo bash -c 'curl -o- http://brains.by/misc/mongodb_memory_limit_ubuntu1404.sh | bash'
There is no reason to limit the MongoDB cache, as by default the mongod process will take 1/2 of the memory on the machine and no more. The default storage engine is WiredTiger: "With WiredTiger, MongoDB utilizes both the WiredTiger internal cache and the filesystem cache."
You are probably looking at top and assuming that Mongo is using all the memory on your machine. That is virtual memory. Use free -m:
total used free shared buff/cache available
Mem: 7982 1487 5601 8 893 6204
Swap: 0 0 0
Only when the available metric goes to zero is your computer swapping memory out to disk. In that case your database is too large for your machine. Add another mongodb instance to your cluster.
Use these two commands in the mongo shell to get information about how much virtual and physical memory MongoDB is using:
var mem = db.serverStatus().tcmalloc;
mem.tcmalloc.formattedString
------------------------------------------------
MALLOC: 360509952 ( 343.8 MiB) Bytes in use by application
MALLOC: + 477704192 ( 455.6 MiB) Bytes in page heap freelist
MALLOC: + 33152680 ( 31.6 MiB) Bytes in central cache freelist
MALLOC: + 2684032 ( 2.6 MiB) Bytes in transfer cache freelist
MALLOC: + 3508952 ( 3.3 MiB) Bytes in thread cache freelists
MALLOC: + 6349056 ( 6.1 MiB) Bytes in malloc metadata
MALLOC: ------------
MALLOC: = 883908864 ( 843.0 MiB) Actual memory used (physical + swap)
MALLOC: + 33611776 ( 32.1 MiB) Bytes released to OS (aka unmapped)
MALLOC: ------------
MALLOC: = 917520640 ( 875.0 MiB) Virtual address space used
MALLOC:
MALLOC: 26695 Spans in use
MALLOC: 22 Thread heaps in use
MALLOC: 4096 Tcmalloc page size
One thing you can limit is the amount of memory MongoDB uses while building indexes. This is set using the maxIndexBuildMemoryUsageMegabytes setting. An example of how it's set is below:
mongo --eval "db.adminCommand( { setParameter: 1, maxIndexBuildMemoryUsageMegabytes: 70000 } )"
This worked for me on an AWS instance, at least to clear the cached memory mongo was using. After this you can see what effect your settings have had.
ubuntu@hongse:~$ free -m
total used free shared buffers cached
Mem: 3952 3667 284 0 617 514
-/+ buffers/cache: 2535 1416
Swap: 0 0 0
ubuntu@hongse:~$ sudo su
root@hongse:/home/ubuntu# sync; echo 3 > /proc/sys/vm/drop_caches
root@hongse:/home/ubuntu# free -m
total used free shared buffers cached
Mem: 3952 2269 1682 0 1 42
-/+ buffers/cache: 2225 1726
Swap: 0 0 0
If you're using Docker: reading the Docker image documentation (in the "Setting WiredTiger cache size limits" section), I found out that it sets the default to consume all available memory regardless of any memory limits you may have imposed on the container, so you have to limit the RAM usage directly in the DB configuration.
Create your mongod.conf file:
# Limits cache storage
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 1 # Set the size you want
Now you can assign that config file to the container: docker run --name mongo-container -v /path/to/mongod.conf:/etc/mongo/mongod.conf -d mongo --config /etc/mongo/mongod.conf
Alternatively you could use a docker-compose.yml file:
version: '3'
services:
  mongo:
    image: mongo:4.2
    # Sets the config file
    command: --config /etc/mongo/mongod.conf
    volumes:
      - ./config/mongo/mongod.conf:/etc/mongo/mongod.conf
    # Other settings...