CephFS pool can't use all the available raw space (MAX_AVAIL < AVAIL) - ceph

I have a Ceph cluster intended to run as CephFS on hard drive enclosures that provide 9PiB raw space total over a number of servers.
I created a 3+3 erasure coding pool that is supposed to span the whole raw space of my hard drives.
Surprisingly, it seems to be able to use only 6PiB out of the 9PiB available: after writing ~2.5PiB of data into it (plus ~2.5PiB of erasure-coding parity), it says that only 500TiB of space is left (corresponding to 1PiB of raw space).
Here's the output of ceph df:
$ sudo ceph df
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
    hdd       8.9 PiB     3.7 PiB     5.2 PiB      5.2 PiB         58.62
    ssd        35 TiB      15 TiB      20 TiB       20 TiB         57.96
    TOTAL     9.0 PiB     3.7 PiB     5.2 PiB      5.3 PiB         58.62

POOLS:
    POOL               ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
    cephfs_metadata     7     5.1 GiB       1.55M     5.7 GiB      0.15       780 GiB
    cephfs_erdata       8     2.5 PiB     687.98M     5.2 PiB     84.29       500 TiB
Note that the MAX AVAIL column in the POOLS section says only 500TiB is left for the pool cephfs_erdata, while the AVAIL column for the hdd class in RAW STORAGE shows 3.7PiB available.
What does that mean? Can I allocate more space to that pool? Why didn't Ceph itself allocate all the available space to it?

We found the causes of this problem.
Due to a misconfiguration, our CephFS was using the ssd drives not only for storing metadata, but for the actual data as well. CephFS runs out of space whenever one of its OSDs runs out of space and no more data can be placed on it, so the SSDs were the bottleneck for MAX AVAIL.
The hdd OSDs were also unevenly loaded, so we had to run a reweight. After that the data were distributed evenly and MAX AVAIL approached AVAIL.
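For anyone hitting the same symptom, this is roughly how it can be diagnosed from the command line; the pool name comes from the question above, everything else is stock Ceph CLI, so adjust names to your cluster:
# Which CRUSH rule does the data pool use, and is that rule restricted
# to the intended device class (hdd)?
sudo ceph osd pool get cephfs_erdata crush_rule
sudo ceph osd crush rule dump
# Per-OSD fill levels; a single nearly-full OSD caps the pool's MAX AVAIL.
sudo ceph osd df tree
# Rebalance unevenly loaded OSDs (dry run first, then for real).
sudo ceph osd test-reweight-by-utilization
sudo ceph osd reweight-by-utilization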

Related

Understanding Kubernetes eviction algorithm

I have a situation where a node has 4 GiB of memory and the actual memory usage looks as below:
Pod     Memory Requested    Memory Limit    Memory Used
1       2.0 GiB             3.0 GiB         1.0 GiB
2       2.0 GiB             3.0 GiB         1.0 GiB
Free    0.0 GiB             0.0 GiB         2.0 GiB
Since there is free memory, nothing gets evicted.
But now let's say both pods 1 and 2 start doing real work, and the situation changes to
Pod     Memory Requested    Memory Limit    Memory Used
1       2.0 GiB             3.0 GiB         3.0 GiB
2       2.0 GiB             3.0 GiB         2.0 GiB
and the Kubernetes eviction algorithm gets triggered.
In such a situation, which pod will be evicted? Will it be pod 1 or pod 2?
I have already checked the pod selection rules, but I am still not able to understand how eviction will work in this case.
In your example, pod 1 will get evicted. The Pod that is not using more memory than it requested will not get evicted.
This is mentioned in the Kubernetes documentation you link to:
The kubelet uses the following parameters to determine the pod eviction order:
Whether the pod's resource usage exceeds requests
Pod Priority
The pod's resource usage relative to requests
In your example, pod 2's resource usage does not exceed requests (memory requested=2 GiB, actual use=2 GiB) so it is removed from the algorithm. That leaves pod 1 as the only pod remaining, and it gets evicted.
Say pod 2 is also above its request. Then for both pods, subtract the request from the actual utilization, and the pod that is the most over its request gets evicted.
Let's look at a little more complex example on a hypothetical 8 GiB node:
Pod     Requested    Actual     Excess use
1       4.0 GiB      4.0 GiB    0.0 GiB
2       1.0 GiB      2.0 GiB    1.0 GiB
3       1.0 GiB      1.3 GiB    0.3 GiB
4       0.0 GiB      0.8 GiB    0.8 GiB
Pod 1 is using the most memory, but it is within its requests, so it is safe. Subtracting requests from actual use, pod 2 is using the most excess memory and it is the one that will get evicted. Pod 4 hasn't declared resource requests at all, and while it's safe in this scenario, it's at risk in general: absent pod 2, it would be the pod using the most memory above its requests, even though it's using the second-least memory in absolute terms.
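If you want to see where your own pods stand before the kubelet decides for you, here is a minimal sketch; it assumes metrics-server is installed, and the namespace and pod name are placeholders:
# Actual memory usage per pod (needs metrics-server).
kubectl top pod -n my-namespace
# Declared memory requests for one pod, to compare against the usage above.
kubectl get pod pod-1 -n my-namespace \
  -o jsonpath='{range .spec.containers[*]}{.name}{": "}{.resources.requests.memory}{"\n"}{end}'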

Jenkins and PostgreSQL is consuming a lot of memory

We have a data warehouse server running on Debian Linux. We are using PostgreSQL, Jenkins and Python.
For a few days now, Jenkins and PostgreSQL have been consuming a lot of memory. I have tried everything I could find on Google, but the issue is still there.
Can anyone give me a lead on how to reduce this memory consumption? It would be very helpful.
below is the output from free -m
              total        used        free      shared  buff/cache   available
Mem:          63805        9152         429       16780       54223       37166
Swap:             0           0           0
Below are the postgresql.conf file, the system configuration, and the results from htop (posted as screenshots).
Please don't post text as images. It is hard to read and process.
I don't see your problem.
Your machine has 64 GB RAM, 16 GB are used for PostgreSQL shared memory like you configured, 9 GB are private memory used by processes, and 37 GB are free (the available entry).
Linux uses available memory for the file system cache, which boosts PostgreSQL performance. The low value for free just means that the cache is in use.
For Jenkins, run it with these Java options:
JAVA_OPTS="-Xms200m -Xmx300m -XX:PermSize=68m -XX:MaxPermSize=100m"
For PostgreSQL, start it with the option
-c shared_buffers=256MB
These values are the ones I use on a small homelab with 8 GB of memory; you might want to increase them to match your hardware.
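If you do decide to shrink PostgreSQL's footprint, here is a minimal sketch for a Debian box; the 4GB value is only an illustration, not a sizing recommendation, and ALTER SYSTEM needs PostgreSQL 9.4 or newer:
# What is PostgreSQL actually configured with right now?
sudo -u postgres psql -c "SHOW shared_buffers;"
sudo -u postgres psql -c "SHOW work_mem;"
# Lower shared_buffers; the change only takes effect after a restart.
sudo -u postgres psql -c "ALTER SYSTEM SET shared_buffers = '4GB';"
sudo systemctl restart postgresql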

osm2pgsql - importing of an openstreetmaps planet file takes very long

I have installed Nominatim on a server dedicated just to OSM data, with the following configuration: CentOS 7 operating system, 2x Intel Xeon CPU L5420 @ 2.50 GHz (8 CPU cores in total), 16 GB of RAM, and 2x 2TB SATA hard drives.
I've configured PostgreSQL based on the recommendations in the Nominatim install wiki (http://wiki.openstreetmap.org/wiki/Nominatim/Installation#PostgreSQL_Tuning), taking into account that my machine only has 16 GB instead of the 32 GB those settings assume. I've used the following settings:
shared_buffers = 1GB # recommended for a 32GB machine was 2 GB
maintenance_work_mem = 4GB # recommended for a 32GB machine was 8 GB
work_mem = 20MB # recommended for a 32GB machine was 50 MB
effective_cache_size = 10GB # recommended for a 32GB machine was 24 GB
synchronous_commit = off
checkpoint_segments = 100
checkpoint_timeout = 10min
checkpoint_completion_target = 0.9
fsync = off
full_page_writes = off
First, I tried importing a small country extract (Luxembourg), setting a cache size of 6000 and using the setup.php file from utils; it imported successfully in under 1 hour.
Secondly, I deleted the Luxembourg data and, as another test, imported the country extract of Great Britain using a cache size of 8000; it imported successfully as well, in around 2-3 hours.
Today I decided to try importing the whole planet.pbf file, so I deleted the PostgreSQL database, downloaded a PBF of the planet from one of the official mirror sites, and ran the setup with a cache size of 10000. Beforehand, I had read some benchmarks to get a vague idea of how much time and space this operation would take.
When the import started, I was very surprised. The import of the nodes ran at a whopping 1095.6k/s; in the benchmark I analyzed (a 32 GB RAM machine) it was only 311.7k/s.
But when the import of the nodes finished and the import of the ways started, the speed dropped significantly. It was importing the ways at a speed of 0.16k/s (although it was slowly rising: it started at 0.05k/s, and over 4 hours it rose to the value mentioned above).
I stopped the import and tried to tweak the settings. I allocated a larger cache first (12000), but with no success: the nodes imported very quickly, but the ways stayed at 0.10-0.13k/s. I then tried allocating a new swap file (the original was 8 GB; I allocated another 32 GB as a swap file), but that didn't change anything either. Lastly, I edited setup.php, changed --number-processes from 1 to 6, and included the --slim keyword where osm2pgsql is started from there, but nothing changed.
Right now I am out of ideas. Is this speed decrease normal? Should I upgrade my machine to the recommended amount of memory? I thought 16 GB of RAM would be enough for the planet PBF; I was aware that it could take more time on this machine than on a 32 GB one, but this seems like too much. If the whole planet import took no more than 12-15 days, I would be OK with that, but as things look now, with these settings the import would take around 2 months, and that is just too much, considering an error could occur anywhere and I would have to start the whole import process again.
Any idea what could cause this problem, or what other tweaks I could try to speed up the import process?
Thanks
I had a similar performance problem using SATA drives; when I replaced the SATA drives with SSD drives, the ways import sped up from 0.02k/s to 8.29k/s. Now I have a very slow relations import, which is running at a rate of 0.01/s, so I believe memory is also an important factor for a full planet import, but I have not tested that yet.
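A quick way to confirm it's the disks rather than RAM or CPU is to watch them while the ways phase is running; here is a sketch using iostat from the sysstat package, with the device names as placeholders:
# Extended per-device stats every 5 seconds; if %util sits near 100 on
# the SATA drives during the ways import, storage is the bottleneck.
iostat -dxm 5 /dev/sda /dev/sdb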

Spark Configuration: SPARK_MEM vs. SPARK_WORKER_MEMORY

In spark-env.sh, it's possible to configure the following environment variables:
# - SPARK_WORKER_MEMORY, to set how much memory to use (e.g. 1000m, 2g)
export SPARK_WORKER_MEMORY=22g
[...]
# - SPARK_MEM, to change the amount of memory used per node (this should
# be in the same format as the JVM's -Xmx option, e.g. 300m or 1g)
export SPARK_MEM=3g
If I start a standalone cluster with this:
$SPARK_HOME/bin/start-all.sh
I can see on the Spark master web UI that all the workers start with only 3 GB of RAM:
-- Workers Memory Column --
22.0 GB (3.0 GB Used)
22.0 GB (3.0 GB Used)
22.0 GB (3.0 GB Used)
[...]
However, I specified 22g as SPARK_WORKER_MEMORY in spark-env.sh.
I'm somewhat confused by this. Probably I don't understand the difference between "node" and "worker".
Can someone explain the difference between the two memory settings and what I might have done wrong?
I'm using spark-0.7.0. See also here for more configuration info.
A standalone cluster can host multiple Spark clusters (each "cluster" is tied to a particular SparkContext), i.e. you can have one cluster running kmeans, one cluster running Shark, and another one running some interactive data mining.
In this case, the 22 GB is the total amount of memory you allocated to the Spark standalone cluster, and your particular instance of SparkContext is using 3 GB per node. So you can create 6 more SparkContexts, using up to 21 GB in total.
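Putting that back into spark-env.sh terms, here is a commented sketch of how I read the two settings (the values are taken from the question):
# spark-env.sh (Spark 0.7-era standalone mode)
# Total memory the worker daemon on each node may hand out to executors,
# shared by all applications (SparkContexts) running on that node.
export SPARK_WORKER_MEMORY=22g
# Memory a single application (one SparkContext) gets on each node.
# Raise this if one application should use more of the worker's 22g.
export SPARK_MEM=3g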

MongoDB in the cloud hosting, benefits

I'm still fighting with MongoDB, and I don't think this war will end any time soon.
My database has a size of 15.95 GB:
Objects - 9963099;
Data Size - 4.65g;
Storage Size - 7.21g;
Extents - 269;
Indexes - 19;
Index Size - 1.68g;
Powered by:
Quad-core Xeon E3-1220, 4 × 3.10 GHz / 8 GB RAM
Paying for a dedicated server is too expensive for me.
On a VPS with 6 GB of memory, the database cannot even be imported.
Should I migrate to a cloud service?
https://www.dotcloud.com/pricing.html
I tried to pick a plan, but the largest they offer for MongoDB is 4 GB of memory (USD 552.96/month o_0), and I can't even import my database with that; there isn't enough memory.
Or is there something I don't know about cloud services (I have no experience with them)?
Are cloud services simply not suitable for a large MongoDB database?
2 x Xeon 3.60 GHz, 2M Cache, 800 MHz FSB / 12 GB
http://support.dell.com/support/edocs/systems/pe1850/en/UG/p1295aa.htm
Will my database work on that server?
This is all fun, of course, and good development experience, but it's already beginning to pall... =]
You shouldn't have an issue with a DB of this size. We were running a MongoDB instance on Dotcloud with hundreds of GB of data. It may just be that Dotcloud only allows 10 GB of HDD space per service by default.
We were able to back up and restore that instance on 4 GB of RAM, albeit that it took several hours.
I would suggest you email them directly at support@dotcloud.com to get help increasing the HDD allocation of your instance.
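For the backup-and-restore route, the standard tools are mongodump and mongorestore; here is a rough sketch, with the database name and paths as placeholders:
# On the old server: dump one database to a directory.
mongodump --db mydb --out /backup/mongo
# Copy /backup/mongo to the new host, then restore it there.
mongorestore --db mydb /backup/mongo/mydb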
You can also consider using ObjectRocket, which is MongoDB as a service. For a 20 GB database the price is $149 per month: http://www.objectrocket.com/pricing