GFS Architecture
A totally hypothetical scenario:
The GFS cluster contains two chunkservers. The first chunkserver holds the first chunk C1, and the second chunkserver holds the second chunk C2.
The first chunkserver also holds a replica of C2, and the second chunkserver also holds a replica of C1.
A file named "file1" is stored across chunks C1 and C2. Its exact size is irrelevant to the analysis, but it is of course greater than 64 MB.
Now the user wants to access the contents of "file1" starting at offset 100,000 bytes, and the client wants to read 1,000,000 bytes.
So, what's the chunk index?
Now, I think I know the probable answer, but I'm not sure how it helps to find that file.
The answer is:
100,000 bytes = 0.1 MB, and 0.1 / 64 = 0.0015625, so the starting chunk index is 0.
Likewise, (0.1 + 1.0) / 64 ≈ 0.017, so the ending chunk index is also 0.
In other words: starting chunk index = Offset / Chunk_size, ending chunk index = (Offset + Bytes_to_be_accessed) / Chunk_size, both rounded down.
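In code form, the arithmetic I am describing is just this (a rough sketch of my own, assuming the fixed 64 MB chunk size):

CHUNK_SIZE = 64 * 1024 * 1024   # the fixed 64 MB GFS chunk size

offset = 100_000     # byte offset at which the read starts
length = 1_000_000   # number of bytes the client wants to read

start_chunk_index = offset // CHUNK_SIZE               # 0
end_chunk_index = (offset + length - 1) // CHUNK_SIZE  # 0, so the whole read falls in the first chunk

print(start_chunk_index, end_chunk_index)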
This arithmetic isn't making much sense to me. So how will the master now find the exact location of that chunk?
And why couldn't the client just send this information to the chunkserver directly and get the file?
It sounds weirdly complicated, and the GFS research paper doesn't clarify this for readers.
Why is further processing of the chunk index required? What is the chunk handle needed for? (The chunk handle is the unique ID of each chunk.) And if the chunk handle identifies the chunk, why is the chunk location needed as well?
And what does the GFS master's metadata actually look like?
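To make that last question concrete, my rough mental model of the master's metadata (purely my own guess, with made-up names) is two in-memory maps, roughly like:

# file path -> ordered list of chunk handles (this mapping is persisted by the master)
file_to_chunk_handles = {
    "/data/file1": ["handle-c1", "handle-c2"],
}

# chunk handle -> chunkservers currently holding a replica
# (not persisted; the master asks the chunkservers at startup and via heartbeats)
chunk_handle_to_locations = {
    "handle-c1": ["chunkserver-1", "chunkserver-2"],
    "handle-c2": ["chunkserver-2", "chunkserver-1"],
}

# a read then goes: (file name, chunk index) -> chunk handle -> replica locations
handle = file_to_chunk_handles["/data/file1"][0]   # chunk index 0 from the arithmetic above
locations = chunk_handle_to_locations[handle]      # the client contacts one of these chunkservers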
As per the Kafka documentation, the data structure used in Kafka to store messages is a simple log, where all writes are just appends to the log.
What I don't understand is that many claim Kafka's performance is constant irrespective of the data size it handles.
How can random reads be constant time in a linear data structure?
If I have a single-partition topic with 1 billion messages in it, how can the time taken to retrieve the first message be the same as the time taken to retrieve the last message, if reads are always sequential?
In Kafka, the log for each partition is not a single file. It is actually split into segments of a fixed size.
For each segment, Kafka knows the start and end offsets. So for a random read, it's easy to find the correct segment.
Then each segment has a couple of indexes (time and offset based). Those are the files named *.index and *.timeindex. These files enable jumping directly to a location near (or at) the desired read.
So you can see that the total number of segments (also total size of the log) does not really impact the read logic.
Note also that the size of segments, the size of indexes and the index interval are all configurable settings (even at the topic level).
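As a toy illustration (this is not Kafka's actual code, and the offsets below are made up), finding the right segment for a random read is just a lookup over the segments' base offsets, so it does not get slower as the log grows:

import bisect

# base offset (offset of the first message) of each hypothetical segment file,
# e.g. the segment named 00000000000001000000.log starts at offset 1,000,000
segment_base_offsets = [0, 500_000, 1_000_000, 1_500_000]

def find_segment(target_offset):
    # the first base offset greater than the target, minus one,
    # gives the segment that contains the target offset
    i = bisect.bisect_right(segment_base_offsets, target_offset) - 1
    return segment_base_offsets[i]

print(find_segment(3))          # 0
print(find_segment(1_200_345))  # 1000000

Within the chosen segment, the *.index file then narrows the read down to a nearby file position in the same way.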
I have a set of large files (around 2 GB).
When I attempt to load one (let's assume that I do it correctly):
ctq_table:flip `QTIM`BID`OFR`QSEQ`BIDSIZE`QFRSIZ`OFRSIZ`MODE`EX`MMID!("ijjiiihcs";4 8 8 4 4 4 2 1 4) 1: `:/q/data/Q200405A.BIN
It gives back a wsfull error. As far as I know, kdb+ is meant to be used for exactly such tasks.
Is there a way to handle big files without running out of memory (for example by keeping them on disk, even if it is slower)?
As Igor mentioned in the comments (and getting back on to the topic of the question) you can read the large binary file in chunks and write to disk one piece at a time. This will reduce your memory footprint at the expense of being slower due to the additional disk i/o operations.
In general, chunking can be trickier for bytestreams because you might end a chunk with an incomplete message (if your chunk point is arbitrary and messages are variable-width); however, in your case you seem to have fixed-width messages, so the chunk end-points are easier to calculate.
Either way, I often find it useful to loop using over (/), keep track of your last known (good) index, and then start at that index when reading the next chunk. The general idea (untested) would be something like:
file:`:/q/data/Q200405A.BIN;
chunkrows:10000; /number of rows to process in each chunk
columns:`QTIM`BID`OFR`QSEQ`QFRSIZ`OFRSIZ`MODE`EX`MMID;
types:"ijjiiihcs";
widths:4 8 8 4 4 4 2 1 4;
{
  data:flip columns!(types;widths)1:(file;x;chunkrows*sum widths);
  upsertToDisk[data]; /write a function to upsert to disk (partitioned or splayed)
  x+chunkrows*sum widths /return the rolling index of the starting point for the next chunk
 }/[hcount[file]>;0]
This will continue until the last good index reaches the end of the file. You can adjust the chunkrows size accordingly depending on your memory constraints.
Ultimately if you're trying to handle large-ish data with the free 32bit version then you're going to have headaches no matter what you do.
Here is the situation:
There is a chunk with the shard key range [10001, 10030]. Currently only one key (e.g. 10001) actually has data; the key range [10002, 10030] is just empty. The chunk's data is beyond 8 MB, and we have set the chunk size to 8 MB.
After we fill in data for the key range [10002, 10030], this chunk starts to split and stops at a key range like [10001, 10003], which has two keys, and we just wonder whether this is OK or not.
From the documentation on the official site, we thought that a chunk might NOT contain more than ONE key.
So, would you please help us make sure whether this is OK or not?
What we want is to split into as many chunks as possible, so as to make sure the data is balanced.
There is a notion called jumbo chunks. Every chunk which exceeds its specified size or has more documents than the maximum configured is considered a jumbo chunk.
Since MongoDB usually splits a chunk when about half the chunk size is reached, I take Jumbo chunks as a sign that there is something rather wrong with the cluster.
The most likely reason for jumbo chunks is that one or more config servers wasn't available for a time.
Metadata updates need to be written to all three config servers (they don't form a replica set), so metadata updates cannot be made when one of the config servers is down. Both chunk splitting and migration need a metadata update. So when one config server is down, a chunk cannot be split early enough, and it will grow in size and ultimately become a jumbo chunk.
Jumbo chunks aren't automatically split, even when all three config servers are available. The reason for this is... well, IMHO MongoDB plays it a little safe here. And jumbo chunks aren't moved, either. The reason for this is rather obvious: moving data which in theory can be of any size > 16 MB is simply too costly an operation.
Proceed at your own risk! You have been warned!
Since you can identify the jumbo chunks, they are pretty easy to deal with.
Simply identify the key range of the chunk and use it within
sh.splitFind("database.collection", query)
This will identify the chunk in question and split it in half, which is quite important. Please, please read Split Chunks in a Sharded Cluster and make sure you understand all of it and the implications before trying to split chunks manually.
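If you prefer to do this from a driver rather than the shell, a rough pymongo sketch (with a hypothetical namespace and shard key, and assuming an older config.chunks layout keyed by ns, matching the three-config-server setup discussed here) would look something like:

from pymongo import MongoClient

client = MongoClient("mongodb://mongos.example.com:27017")   # must go through a mongos

ns = "database.collection"   # hypothetical sharded namespace
shard_key = "user_id"        # hypothetical shard key field

# read a flagged chunk's bounds from the cluster metadata
# (the jumbo flag and field names vary by MongoDB version)
chunk = client["config"]["chunks"].find_one({"ns": ns, "jumbo": True})
print("chunk bounds:", chunk["min"], chunk["max"])

# split the chunk containing this shard key value -- the driver-level
# counterpart of sh.splitFind(ns, query) in the shell
client.admin.command({"split": ns, "find": {shard_key: chunk["min"][shard_key]}})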
In fact, the following is what I really want to ask:
The configured chunk size is 1 MB. When a config server is down, the cluster metadata becomes read-only. The current cluster has only one chunk. If I want to insert a lot of data into this cluster, more than 1 MB in total, can I successfully insert that data?
If yes, does that mean the real chunk size can be larger than the configured chunk size?
Thanks!
Short answer: yes you can, but you should fix your config server(s) asap to avoid unbalancing your shards for long.
Chunks are split automatically when they reach their size threshold - stated here.
However, during a config server failure, chunks cannot be split. Even if just one server fails. See here.
Edits
As stated by Sergio Tulentsev, you should fix your config server(s) before performing your insert. The system's metadata will remain read-only until then.
As Adam C's link points out, your shard will become unbalanced if you perform an insert like the one you describe before fixing your config server(s).
I have a collection with a shard key and an index enabled. But when I run the balancer, the chunks are not moved for this collection, whereas the chunks of other collections are moving as expected to other machines. Only one chunk is moved from this collection.
Currently (this will change in the near future), the balancer will only start moving chunks when there is a sufficient imbalance (8 or more). If the chunk counts are closer than that, then there will be no movement. The number of chunks is dependent on the max chunk size (64MB at the time of writing this in 2.0.x) and the amount of data written. There is a split triggered every time a certain amount of data is written to a chunk.
So, if you are not adding data to the collection, or the data is not very large, it can take some time to create the number of chunks necessary to trigger a balancing round.
You can take this into your own hands by manually splitting and moving a chunk:
http://www.mongodb.org/display/DOCS/Splitting+Shard+Chunks
Or, you can add more data to trigger splits and the balancer will eventually kick in and move the chunks around for you.