We have a process that bulk ingests around 6 billion documents of 500 bytes each. We are finding that performance drops off sharply around the 4-5 billion document mark, with the lion's share of the time spent splitting chunks, which blocks insertions while it runs.
Does it make sense to increase the chunk size from the default of 64 MB, and/or to disable auto-splitting altogether as a temporary solution?
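For scale, a back-of-the-envelope count of the chunks involved (a rough sketch only; the 64 MB figure is the default mentioned above, and the overhead of the splits themselves is not modeled):

```python
# Rough estimate of how many chunks a 6bn-document ingest implies.
doc_count = 6_000_000_000
doc_size = 500                      # bytes per document
chunk_size = 64 * 1024 * 1024       # default max chunk size, 64 MB

total_bytes = doc_count * doc_size  # ~3 TB of raw document data
chunks = total_bytes // chunk_size  # chunks at the default size

print(total_bytes)  # 3000000000000 (~2.7 TiB)
print(chunks)       # 44703

# Doubling the chunk size roughly halves the number of splits needed:
print(total_bytes // (128 * 1024 * 1024))  # 22351
```

Tens of thousands of splits either way, which is why the split overhead dominates late in the load.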
I have a database table with around 5-6 million entries, and it takes around 20 minutes to vacuum. Since one field of this table is updated very frequently, there are a lot of dead rows to deal with.
As an estimate, with our current user base the table can accumulate 2 million dead tuples per day. So vacuuming this table requires both:
Read IO: as the whole table is not present in shared memory.
Write IO: as there are a lot of entries to update.
What would be an ideal way to vacuum this table? Should I increase autovacuum_cost_limit to allow more operations per autovacuum run? But as far as I can see, that will increase IOPS, which again might hinder performance. Currently I have autovacuum_scale_factor = 0.2. Should I decrease it? If I decrease it, vacuum will run more often; write IO will decrease, but it will lead to more periods of high read IO.
Also, as the user base grows, vacuuming will take more and more time, since the table will grow and vacuum will have to read a lot from disk. So, what should I do?
Some solutions I have thought of:
1. Separate the highly updated column into its own table.
2. Tweak the parameters so vacuum runs more often, decreasing write IO (as discussed above). But then how do I handle the extra read IO, since vacuum will run more often?
3. Combine point 2 with increasing RAM, to reduce read IO as well.
In general, what approach do people take? I assume plenty of people have very big tables, 10 GB or more, that need to be vacuumed.
Separating the column is a viable strategy, but it would be a last resort for me. PostgreSQL already has a high per-row overhead, and doing this would double it (which might also remove most of the benefit). Plus, it would make your queries uglier, harder to read, harder to maintain, and easier to introduce bugs into. Where splitting it out is most attractive is if index-only scans on a set of columns not including this one are important to you, and splitting lets you keep the visibility map for the remaining columns in a better state.
Why do you care that it takes 20 minutes? Is that causing something bad to happen? At that rate, you could vacuum this table 72 times a day, which seems to be far more often than it actually needs to be vacuumed. In v12, the default value for autovacuum_vacuum_cost_delay was dropped 10-fold, to 2ms. This change in default was not driven by changes in the code in v12, but rather by the realization that the old default was simply out of date for modern hardware in most cases. I would have no trouble pushing that change into a v11 config; but I don't think doing so would address your main concern, either.
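To see why the cost delay matters so much, here is a rough sketch of the read-throughput ceiling the cost-based delay imposes, assuming the pre-v14 defaults vacuum_cost_limit = 200 and vacuum_cost_page_miss = 10, and the worst case where every page is a cache miss:

```python
# Rough ceiling on vacuum read throughput implied by cost-based delay.
# Assumes (pre-v14) defaults vacuum_cost_limit = 200 and
# vacuum_cost_page_miss = 10, and that every page is a cache miss.
page_size = 8192            # bytes per heap page
cost_limit = 200            # cost units accumulated before each sleep
cost_page_miss = 10         # cost charged per page read from disk

def max_read_mb_per_sec(cost_delay_ms):
    rounds_per_sec = 1000 / cost_delay_ms               # sleeps per second
    pages_per_sec = rounds_per_sec * cost_limit / cost_page_miss
    return pages_per_sec * page_size / (1024 * 1024)

print(max_read_mb_per_sec(20))  # old default, 20ms: 7.8125 MB/s
print(max_read_mb_per_sec(2))   # new default, 2ms: 78.125 MB/s
```

Dropping the delay from 20ms to 2ms raises the throttle ceiling tenfold, which is usually the single biggest lever on vacuum duration.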
Do you actually have a problem with the amount of IO you are generating, or is it just conjecture? The IO done is mostly sequential, but how important that is would depend on your storage hardware. Do you see latency spikes while the vacuum is happening? Are you charged per IO and your bill is too high? High IO is not inherently a problem, it is only a problem if it causes a problem.
Currently, I have autovacuum_scale_factor = 0.2. Should I decrease it? If I decrease it, vacuum will run more often; write IO will decrease, but it will lead to more periods of high read IO.
Running more often probably won't decrease your write IO by much, if at all. Every table/index page with at least one obsolete tuple needs to get written during every vacuum. Writing one page just to remove one obsolete tuple causes more writing than waiting until there are a lot of obsolete tuples that can all be removed by one write. You might write somewhat less per vacuum, but doing more vacuums will make up for that, and probably far more than make up for it.
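For a concrete feel of how often autovacuum fires at the current settings, the trigger condition is dead tuples > autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples. A sketch using the figures from the question (~5.5M rows, scale factor 0.2, ~2M dead tuples/day; the threshold of 50 is the PostgreSQL default):

```python
# Autovacuum fires once dead tuples exceed:
#   autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples
vacuum_threshold = 50        # default autovacuum_vacuum_threshold
scale_factor = 0.2           # the questioner's current setting
reltuples = 5_500_000        # ~5-6 million rows, per the question
dead_per_day = 2_000_000     # estimated dead tuples per day

trigger = vacuum_threshold + scale_factor * reltuples
print(trigger)                 # 1100050.0 dead tuples before a run
print(dead_per_day / trigger)  # ~1.8 autovacuum runs of this table per day
```

So at the current settings the table is vacuumed roughly twice a day, which frames the "should I decrease the scale factor" question in concrete terms.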
There are two approaches:
Reduce autovacuum_vacuum_cost_delay for that table so that autovacuum becomes faster. It will still consume I/O, CPU and RAM.
Set the fillfactor for the table to a value less than 100 and make sure that the column you update frequently is not indexed. Then you could get HOT updates which don't require VACUUM.
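Both of these can be set per table rather than globally; a hedged sketch (the table name my_table is a placeholder, and the values are illustrative, not recommendations):

```sql
-- Speed up autovacuum for this table only by shortening its cost delay
-- (2ms is the global default from v12 onward).
ALTER TABLE my_table SET (autovacuum_vacuum_cost_delay = 2);

-- Leave free space in each page so frequent updates can be HOT updates;
-- this only helps if the frequently updated column is not indexed.
ALTER TABLE my_table SET (fillfactor = 90);
```

A fillfactor below 100 costs some table bloat up front, but HOT updates avoid index maintenance and reduce the dead-tuple cleanup that vacuum must do.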
I have a set of large files (around 2 GB each).
When I attempt to load one (let's assume I do that correctly):
ctq_table:flip `QTIM`BID`OFR`QSEQ`BIDSIZE`QFRSIZ`OFRSIZ`MODE`EX`MMID!("ijjiiihcs";4 8 8 4 4 4 2 1 4) 1: `:/q/data/Q200405A.BIN
It gives back a wsfull error. As far as I know, kdb+ is meant to be used for tasks like this.
Is there a way to handle big files without running out of memory (like keeping on disk, even if it is slower)?
As Igor mentioned in the comments (and getting back to the topic of the question), you can read the large binary file in chunks and write to disk one piece at a time. This reduces your memory footprint at the expense of being slower, due to the additional disk I/O operations.
In general, chunking can be trickier for byte streams because you might end a chunk with an incomplete message (if your chunk boundary is arbitrary and messages are variable-width). In your case, however, you appear to have fixed-width messages, so the chunk end-points are easy to calculate.
Either way, I often find it useful to loop using over (/), keeping track of the last known (good) index and starting at that index when reading the next chunk. The general idea (untested) would be something like:
file:`:/q/data/Q200405A.BIN;
chunkrows:10000; /number of rows to process in each chunk
columns:`QTIM`BID`OFR`QSEQ`QFRSIZ`OFRSIZ`MODE`EX`MMID;
types:"ijjiiihcs";
widths:4 8 8 4 4 4 2 1 4;
{
  data:flip columns!(types;widths)1:(file;x;chunkrows*sum widths);
  upsertToDisk[data]; /write a function to upsert to disk (partitioned or splayed)
  x+chunkrows*sum widths /return the rolling index of the starting point for the next chunk
 }/[hcount[file]>;0]
This will continue until the last good index reaches the end of the file. You can adjust the chunkrows size accordingly depending on your memory constraints.
Ultimately if you're trying to handle large-ish data with the free 32bit version then you're going to have headaches no matter what you do.
Here is the situation:
There is a chunk with the shard key range [10001, 10030]; currently only one key in it (e.g. 10001) has data, and the key range [10002, 10030] is empty. The chunk's data is beyond 8 MB, and we have set the chunk size to 8 MB.
After we fill in data for the key range [10002, 10030], this chunk starts to split, and it stops at a key range like [10001, 10003], which contains two keys. We wonder whether this is OK or not.
From the documentation on the official site, we thought that a chunk might NOT contain more than ONE key.
So, could you please help us confirm whether this is OK or not?
What we want is to split the chunks as much as possible, to make sure the data is balanced.
There is a notion called jumbo chunks. Every chunk which exceeds its specified size or has more documents than the maximum configured is considered a jumbo chunk.
Since MongoDB usually splits a chunk when about half the chunk size is reached, I take Jumbo chunks as a sign that there is something rather wrong with the cluster.
The most likely reason for jumbo chunks is that one or more config servers wasn't available for a time.
Metadata updates need to be written to all three config servers (they do not form a replica set), so metadata updates cannot be made while one of the config servers is down. Both chunk splitting and chunk migration need a metadata update. So when one config server is down, a chunk cannot be split early enough, and it will grow in size and ultimately become a jumbo chunk.
Jumbo chunks aren't automatically split, even when all three config servers are available. The reason is... well, IMHO, MongoDB plays it a little safe here. And jumbo chunks aren't moved, either. The reason for this is rather obvious: moving data which in theory can be of any size > 16 MB is simply too costly an operation.
Proceed at your own risk! You have been warned!
Since you can identify the jumbo chunks, they are pretty easy to deal with.
Simply identify the key range of the chunk and use it within
sh.splitFind("database.collection", query)
This will identify the chunk in question and split it in half, which is quite important. Please, please read Split Chunks in a Sharded Cluster and make sure you understand all of it, and its implications, before trying to split chunks manually.
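Putting the two steps together in the mongo shell, a sketch (run against a mongos; the namespace and shard key field are placeholders for your own):

```javascript
// List the chunks the balancer has flagged as jumbo, with their key ranges.
db.getSiblingDB("config").chunks.find({ jumbo: true }).forEach(printjson)

// Split one of them in half at the chunk containing the given document;
// the query below is a placeholder for a key inside that chunk's range.
sh.splitFind("database.collection", { shardKeyField: 10001 })
```

The find against config.chunks gives you the min/max of each jumbo chunk, which is exactly the key range you need in order to choose a query for sh.splitFind.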
I have a Perl script (on Ubuntu 12.04 LTS) writing to 26 TCH files. The keys are roughly equally distributed. The writes become very slow after 3 million inserts (spread equally across all the files): the speed drops from 240,000 inserts/min at the beginning to 14,000 inserts/min. Individually, the shard files are no more than 150 MB, and overall their size comes to around 2.7 GB.
I run optimize on every TCH file after every 100K inserts to that file, with bnum set to 4*num_records_then, options set to TLARGE, and xmsiz matched to the size implied by bnum (as mentioned in Why does tokyo tyrant slow down exponentially even after adjusting bnum?).
Even after this, the inserts start at high speed and then slowly decrease from 240k inserts/min to 14k inserts/min. Could it be due to holding multiple TCH connections (26) in a single script? Or is there a configuration setting I'm missing? (Would disabling journaling help? The thread above says journaling affects performance only after the TCH file grows beyond 3-4 GB, and my shards are <150 MB each.)
I would turn off journaling and measure what changes.
The cited thread talks about a 2-3 GB tch file, but if you sum the sizes of your 26 tch files, you are in the same league. For the filesystem, the total amount of data being written should be the relevant parameter.
I have a collection with a shard key and an index enabled. But when I run the balancer, chunks from this collection are not moved, whereas chunks from other collections move to other machines as expected. Only one chunk has been moved from this collection.
Currently (this will change in the near future), the balancer will only start moving chunks when there is a sufficient imbalance (8 or more). If the chunk counts are closer than that, then there will be no movement. The number of chunks is dependent on the max chunk size (64MB at the time of writing this in 2.0.x) and the amount of data written. There is a split triggered every time a certain amount of data is written to a chunk.
So, if you are not adding data to the collection, or the data is not very large, it can take some time to create the number of chunks necessary to trigger a balancing round.
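The numbers above give a rough feel for how much data that takes. A sketch, assuming a chunk splits after roughly 32 MB (half the 64 MB maximum) has been written to it; that halfway point is an assumption for illustration, not a documented constant:

```python
# Rough estimate of data needed before the 2.0.x balancer starts moving chunks.
# Assumption: a chunk splits after ~32 MB (half the 64 MB max) is written.
chunk_split_bytes = 32 * 1024 * 1024
imbalance_needed = 8            # chunk-count imbalance before the balancer acts

min_bytes = imbalance_needed * chunk_split_bytes
print(min_bytes / (1024 * 1024))  # 256.0 MB of writes, at minimum
```

In other words, on the order of a quarter gigabyte of writes may be needed before the balancer has enough of a chunk imbalance to act on.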
You can take this into your own hands by manually splitting and moving a chunk:
http://www.mongodb.org/display/DOCS/Splitting+Shard+Chunks
Or, you can add more data to trigger splits and the balancer will eventually kick in and move the chunks around for you.