With a write concern in place, suppose the data to be written has reached the journal but has not yet been flushed to the hard drive. If a read or write operation arrives for the same data in the meantime, how will the engine handle it?
If the data has been written to the journal, it has also been written in memory and would be available for reads.
There is no mechanism for you to access/read the journal, but this is not necessary.
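To illustrate the point above, here is a minimal toy model in Python. It is not MongoDB's implementation; the class `ToyEngine` and all of its attributes are hypothetical names invented for this sketch. It only shows that a write lands in memory and in the journal together, and that reads are answered from memory without ever touching the journal.

```python
class ToyEngine:
    """Toy model only; names and structure are assumptions, not MongoDB internals."""

    def __init__(self):
        self.memory = {}      # in-memory view of the data, used to answer reads
        self.journal = []     # write-ahead journal entries
        self.data_files = {}  # stands in for the data files on the hard drive

    def write(self, key, value):
        self.journal.append((key, value))  # record the change in the journal
        self.memory[key] = value           # apply it to the in-memory view

    def read(self, key):
        return self.memory.get(key)        # reads never consult the journal

    def checkpoint(self):
        self.data_files.update(self.memory)  # later: flush memory to data files
        self.journal.clear()                 # journal entries are no longer needed


engine = ToyEngine()
engine.write("x", 1)
print(engine.read("x"))   # the value is readable before any checkpoint
print(engine.data_files)  # the data files are still empty at this point
```

A second write to the same key would simply overwrite the in-memory value and append another journal entry.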
As far as I know, MongoDB caches the working set in RAM.
If I increase wiredTigerCacheSizeGB until it can hold all of the data on disk, will it be as fast as an in-memory DB?
If not, what is the difference?
See In-Memory Storage Engine and WiredTiger Storage Engine
(In-memory) By avoiding disk I/O, the in-memory storage engine allows for more predictable latency of database operations.
Keep in mind that wiredTigerCacheSizeGB is limited to 10000 GB. You should also disable journaling and set storage.syncPeriodSecs to 0 to increase WiredTiger's performance. Even so, WiredTiger still has to create at least WiredTiger.wt and WiredTiger.turtle...
PS. I think this link might answer your question
I cannot answer all your questions.
A cache reads data from disk and keeps it in the RAM. When you access such data again then you read it from RAM instead of reading it again from disk - which would be much slower.
So, a cache is useless if you have to read the data only once. Some applications anticipate the data you may read in future and put it into the cache in advance.
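As a rough sketch of the caching behaviour described above, the following Python snippet counts how often the "disk" is touched. The class `ReadThroughCache` and the dictionary standing in for the disk are assumptions made for illustration only.

```python
class ReadThroughCache:
    """Hypothetical read-through cache; a dict stands in for slow disk storage."""

    def __init__(self, disk):
        self.disk = disk     # slow storage
        self.ram = {}        # fast storage
        self.disk_reads = 0  # how many times we had to go to disk

    def get(self, key):
        if key not in self.ram:
            # Cache miss: pay for one disk read and keep the result in RAM.
            self.disk_reads += 1
            self.ram[key] = self.disk[key]
        return self.ram[key]  # cache hit (or freshly loaded value)


cache = ReadThroughCache({"a": 1, "b": 2})
for _ in range(3):
    cache.get("a")         # only the first call reads from disk
cache.get("b")             # read once: the cache gave no benefit here
print(cache.disk_reads)
```

The key "a" costs one disk read for three accesses, while "b", read only once, gains nothing from being cached.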
The MongoDB in-memory DB puts all data into RAM only, it does not read or write anything from disk, apart from some logging data. When you stop an in-memory MongoDB process then all data is lost.
The WiredTiger storage engine is the component MongoDB uses to store data persistently on disk.
If you set wiredTigerCacheSizeGB high enough to hold all of your data, then all of your reads will be satisfied from the cache. Writes will update the cache and also be written to storage.
If you use the in-memory configuration then all of your reads will be satisfied from memory. Writes will only go to memory and will not be stored on disk.
So if your workload is mostly reads, then the large cache will behave similarly to an in-memory DB. If your workload has a lot of writes, then the large cache configuration may be slower because it needs to write to disk.
Also, the in-memory DB will not preserve your data in the event of a crash, since it only holds data in memory.
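The difference between the two configurations can be sketched with two toy stores. Both class names below are hypothetical, and a plain dict stands in for the disk; this is an illustration of the trade-off, not of how either storage engine works internally.

```python
class CachedPersistentStore:
    """Toy stand-in for a disk-backed engine with a cache that holds everything."""

    def __init__(self, disk):
        self.disk = disk         # survives a restart
        self.cache = dict(disk)  # assumption: the cache is warmed with all data
        self.disk_writes = 0

    def read(self, key):
        return self.cache.get(key)  # reads are served from the cache

    def write(self, key, value):
        self.cache[key] = value     # update the cache ...
        self.disk[key] = value      # ... and also pay for a write to storage
        self.disk_writes += 1


class InMemoryStore:
    """Toy stand-in for an in-memory engine: nothing is written to disk."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)

    def write(self, key, value):
        self.data[key] = value


disk = {}
persistent = CachedPersistentStore(disk)
in_memory = InMemoryStore()
persistent.write("k", "v")
in_memory.write("k", "v")

# Simulate a restart: only the contents of `disk` survive.
persistent_after = CachedPersistentStore(disk)
in_memory_after = InMemoryStore()
print(persistent_after.read("k"))  # still there
print(in_memory_after.read("k"))   # gone
```

Reads cost the same in both stores; only the write path and the behaviour after a restart differ.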
I am using kafka_2.10-0.10.0.1. I have two questions:
- I want to know how I can modify the default configuration of Kafka to process large amounts of data with good performance.
- Is it possible to configure Kafka to process the records in memory without storing in disk?
thank you
Is it possible to configure Kafka to process the records in memory without storing in disk?
No. Kafka is all about storing records reliably on disk, and then reading them back quickly off of disk. In fact, its documentation says:
As a result of taking storage seriously and allowing the clients to control their read position, you can think of Kafka as a kind of special purpose distributed filesystem dedicated to high-performance, low-latency commit log storage, replication, and propagation.
You can read more about its design here: https://kafka.apache.org/documentation/#design. The implementation section is also quite interesting: https://kafka.apache.org/documentation/#implementation.
That said, Kafka is also all about processing large amounts of data with good performance. In 2014 it could handle 2 million writes per second on three cheap instances: https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines. More links about performance:
https://docs.confluent.io/current/kafka/deployment.html
https://www.confluent.io/blog/optimizing-apache-kafka-deployment/
https://community.hortonworks.com/articles/80813/kafka-best-practices-1.html
https://www.cloudera.com/documentation/kafka/latest/topics/kafka_performance.html
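To make the "commit log with client-controlled read position" idea from the quoted documentation more concrete, here is a minimal sketch in Python. It is not Kafka's API: `CommitLog`, `append`, and `read` are hypothetical names, and the list stands in for the on-disk log segments that a real broker would use.

```python
class CommitLog:
    """Toy append-only log; the list stands in for Kafka's on-disk segments."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # the offset assigned to this record

    def read(self, offset, max_records=10):
        # The log keeps no per-consumer state: the caller supplies the offset.
        return self.records[offset:offset + max_records]


log = CommitLog()
for message in ["a", "b", "c"]:
    log.append(message)

# Two independent consumers, each tracking its own read position.
fast_offset = 0
slow_offset = 0

fast_offset += len(log.read(fast_offset))                 # reads everything
slow_offset += len(log.read(slow_offset, max_records=1))  # reads one record

print(fast_offset, slow_offset)
print(log.read(slow_offset))  # the slow consumer can still catch up later
```

Because records are never removed when read, a slow consumer does not block a fast one, and either can re-read from an earlier offset.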
I've been reading a lot about MongoDB recently, but one topic I can't find any clear material on is how data is written to the journal and oplog.
So this is what I understand of the process so far, please correct me where I'm wrong
A client connects to mongod and performs a write. The write is stored in the socket buffer.
When Mongo is available (not sure what available means at this point), data is written to the journal?
The MongoDB docs then say that every 60 seconds writes are flushed from the journal onto disk. I can only assume this means written to the primary's data files and the oplog. If this is the case, how do writes appear earlier than the 60-second sync interval?
Some time later, secondaries suck data from the primary or their sync source and update their oplog and databases. It seems very vague about when exactly this happens and what delays it.
I'm also wondering if journaling was disabled (I understand that's a really bad idea), at what point does the oplog and database get updated?
Lastly, I'm a bit stumped as to the points in this process at which write locks get created. Is this just when the database and oplog are updated, or at other times too?
Thanks to anyone who can shed some light on this or point me to some reading material.
Simon
Here is what happens as far as I understand it. I simplified a bit, but it should make clear how it works.
A client connects to mongo. No writes have been done so far, and the connection has not been torn down, because what happens now really depends on the write concern. Let's assume that we go with the (at the time of this writing) default "acknowledged".
The client sends its write operation. Here is where I am really not sure: the acknowledgement is sent to the driver either after this step or after the next one.
The write operation is run through the query optimizer. It is here that the acknowledgment is sent, because with an acknowledged write concern you may be returned a duplicate key error. It is possible that this was checked in the last step. If I had to bet, I'd say it is after this one.
The output of the query optimizer is then applied to the data in memory. Actually, it is applied to the data of the memory-mapped datafiles, to the memory-mapped oplog, and to the journal's memory-mapped files. Queries are answered from these memory-mapped parts, or the relevant data is mapped into memory to answer the query. The oplog is read from memory if present, too.
Every 100ms in general the journal is synced to disk. The precise value is determined by a number of factors, one of them being the journalCommitInterval configuration parameter. If you have a write concern of journaled, the driver will be notified now.
Every syncDelay seconds, the current state of the memory-mapped files is synced to disk. I think the journal is truncated to the entries which weren't applied to the data yet, but I am not too sure of that, since it should basically never happen that data in the journal isn't yet applied to the current data.
If you have read carefully, you noticed that the data is ready for the oplog as soon as it has been run through the query optimizer and applied to the files mapped into memory. When the oplog entry is pulled by one of the secondaries, it is immediately applied to its memory-mapped datafiles and synced to disk the same way as on the primary.
Some things to note: As soon as the relatively small data is written to the journal, it is quite safe. If a node goes down between two syncs to the datafiles, both the datafiles and the oplog can be restored from their last state in the datafiles and the journal. In general, the maximum data loss you can have is the operations recorded into the log after the last commit, 50ms in median.
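The data-loss window mentioned above can be sketched with a small calculation. The function `lost_writes` is a hypothetical helper, and the model is deliberately simplified: it assumes the journal is committed exactly at every multiple of the commit interval and that a commit covers every write up to and including that instant.

```python
def lost_writes(write_times_ms, crash_time_ms, commit_interval_ms=100):
    """Return the write timestamps that were not yet in the on-disk journal.

    Simplifying assumption: journal commits happen at 0, 100, 200, ... ms.
    """
    last_commit_ms = (crash_time_ms // commit_interval_ms) * commit_interval_ms
    # Writes after the last commit and up to the crash exist only in memory.
    return [t for t in write_times_ms if last_commit_ms < t <= crash_time_ms]


# Writes at 10, 90, 120 and 180 ms; the server crashes at 190 ms.
# The write at 230 ms never happens because the server is already down.
print(lost_writes([10, 90, 120, 180, 230], crash_time_ms=190))
```

With a crash at a random point in the interval, the unprotected window under this model is half the commit interval on average.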
As for the locks: if you have read carefully, you will have noticed that no database-level locks are imposed when the data is synced to disk. Write locks may be created to ensure that only one thread modifies a given document at any point in time. Other write locks are possible, but in general they should be rather rare.
Write locks on the filesystem layer are created once, though only implicitly, iirc. During application startup, a lock file is created in the root directory of the dbpath. Any other mongod instance will refuse to do any operation on those datafiles while a valid lock exists. And you shouldn't either ;)
Hope this helps.
Using MongoDB (via PyMongo) in the default "acknowledged" write concern mode, is it the case that if I have a line that writes to the DB (e.g. a mapReduce that outputs a new collection) followed by a line that reads from the DB, the read will always see the changes from the write?
Further, is the above true for all stricter write concerns than "acknowledged," i.e. "journaled" and "replica acknowledged," but not true in the case of "unacknowledged"?
If the write has been acknowledged, it should have been written to memory, thus any subsequent query should get the current data. This won't work if you have a replica set and allow reads from secondaries.
Journaled writes are written to the journal file on disk, which protects your data in case of power / hardware failures, etc. This shouldn't have an impact on consistency, which is covered as soon as the data is in memory.
Any replica configuration in the write concern will ensure that writes need to be acknowledged by the majority / all nodes in the replica set. This will only make a difference if you read from replicas or to protect your data against unreachable / dead servers.
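The caveat about reading from secondaries can be illustrated with a toy replica set in Python. `ToyReplicaSet` and its methods are hypothetical names for this sketch; real replication is asynchronous and far more involved, but the stale-read effect is the same in spirit.

```python
class ToyReplicaSet:
    """Toy model: one primary, one secondary, and an oplog between them."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.oplog = []
        self.applied = 0  # how many oplog entries the secondary has applied

    def write(self, key, value):
        # An acknowledged write (w=1 in this model) only touches the primary.
        self.primary[key] = value
        self.oplog.append((key, value))

    def replicate(self):
        # The secondary pulls and applies the entries it has not seen yet.
        for key, value in self.oplog[self.applied:]:
            self.secondary[key] = value
        self.applied = len(self.oplog)


rs = ToyReplicaSet()
rs.write("status", "done")

read_from_primary = rs.primary.get("status")      # sees the write
read_from_secondary = rs.secondary.get("status")  # stale: not replicated yet
rs.replicate()
read_after_replication = rs.secondary.get("status")

print(read_from_primary, read_from_secondary, read_after_replication)
```

In this model, a write that waited for the secondary before returning would correspond to calling `replicate()` inside `write()`.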
For example, in the case of the WiredTiger storage engine, there is a cache of pages in memory that are periodically written to and read from disk, depending on memory pressure. In the case of the MMAPv1 storage engine, there is a memory-mapped address space that corresponds to pages on disk. Then there is a secondary structure called the journal. The journal is a log of every single thing that the database processes. Notice that the journal is also in memory.
When does the journal get written to disk?
The app sends a request to the MongoDB server over a TCP connection, and the server processes it. It writes the change into the memory pages, but those pages may not be written to disk for quite a while, depending on memory pressure. It also records the request in the journal. By default, in the MongoDB driver, when we make a database request, say an acknowledged insert or update, we wait for the response. But we don't wait for the journal to be written to disk. The value that represents whether we wait for the write to be acknowledged by the server is called w.
w = 1
j = false
By default, w is set to 1, which means: wait for this server to respond to the write. By default, j equals false. j stands for journal and represents whether or not we wait for the journal to be written to disk before we continue.
So, what are the implications of these defaults? When we do an update or insert, we're really doing the operation in memory and not necessarily on disk. This means, of course, that it's very fast. Periodically (every few seconds) the journal gets written to disk. It won't be long, but there is a window of vulnerability while the data has been written into the pages in the server's memory and the journal has not yet been persisted to disk. If the server crashes during that window, we could lose the data. As programmers, we also have to realize that just because the write came back as good and was written successfully to memory, it may never persist to disk if the server subsequently crashes.
Whether or not this is a problem depends on the application. For some applications, where there are lots of writes each logging a small amount of data, we might find it very hard to even keep up with the data stream if we wait for the journal to be written to disk, because the disk is going to be 100 or 1,000 times slower than memory for every single write. For other applications, we may find it completely necessary to wait for the write to be journaled and to know that it has been persisted to disk before we continue. So, it's really up to us.
The w and j values together are called the write concern. They can be set in the driver at the collection level, the database level, or the client level.
w = 1 : wait for the write to be acknowledged
w = 0 : do not wait for the write to be acknowledged
j = true : sync to journal
j = false : do not sync to journal
There are other values for w that also have some significance. With w=1 and j=true we can make sure that those writes have been persisted to disk. If the writes have been written to the journal and the server crashes, then even though the pages may not have been written back to disk yet, on recovery the mongod process can look at the journal on disk and recreate all the writes that were not yet persisted to the pages, because they were written to the journal. That's why this gives us a greater level of safety.
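The effect of j on crash safety can be sketched with a toy server in Python. Everything here (`ToyServer`, `journal_buffer`, `crash_and_recover`, and so on) is a hypothetical name for illustration; it is a simplified model of the behaviour described above, not the mongod implementation.

```python
class ToyServer:
    """Toy model of in-memory pages plus a journal that may or may not be on disk."""

    def __init__(self):
        self.pages = {}            # in-memory pages
        self.journal_buffer = []   # journal entries still only in memory
        self.journal_on_disk = []  # journal entries persisted to disk
        self.data_on_disk = {}     # data files (never flushed in this sketch)

    def write(self, key, value, j=False):
        self.pages[key] = value
        self.journal_buffer.append((key, value))
        if j:
            self.flush_journal()   # j=true: wait for the journal to hit disk
        return "acknowledged"      # w=1: the server answers either way

    def flush_journal(self):
        self.journal_on_disk.extend(self.journal_buffer)
        self.journal_buffer.clear()

    def crash_and_recover(self):
        # A crash wipes everything held in memory ...
        self.pages = dict(self.data_on_disk)
        self.journal_buffer = []
        # ... and recovery replays whatever journal entries reached the disk.
        for key, value in self.journal_on_disk:
            self.pages[key] = value


server = ToyServer()
server.write("a", 1, j=True)   # journaled before the acknowledgement
server.write("b", 2, j=False)  # acknowledged, but only in memory
server.crash_and_recover()
print(server.pages)            # "a" is recovered from the journal, "b" is lost
```

Both writes were acknowledged to the client, yet only the journaled one survives the crash.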
The Mongo documentation says single-document writes are atomic, but in another place it mentions that interleaved transactions may read uncommitted data, even before the writer thread has returned.
I understand that other transactions can read uncommitted data because the write may not yet be committed to the journal.
But how can threads read data while the writer thread has not returned? Is it for cases when the write concern is not the default?
Thanks
Ankur
OK, with the reference I now have some context and can tell you what this is about.
Mongo documentation says single-document writes are atomic
Yep
it mentions interleaved transactions may read uncommitted
In fact, any read may get uncommitted data. This is because MongoDB will write to the fsync queue BEFORE it writes to disk.
MongoDB can read from this fsync queue before it goes to disk, and quoting the page:
Other database systems refer to these isolation semantics as read uncommitted.
Mainly ACID databases do.
But how can threads read data while the writer thread has not returned.
Thanks to MongoDB's concurrency rules: http://docs.mongodb.org/manual/faq/concurrency/#does-a-read-or-write-operation-ever-yield-the-lock
In short: the write does not hold an exclusive lock for its entire duration. Instead, it can yield (under various rules) to a read, allowing that read to return data halfway through a write.
This is also why you must sometimes be careful with multi-document updates when other threads of your application are reading data: a reader may get one half of the data that is up to date and another half that is not.
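The half-updated read described above can be reproduced in a few lines of Python using a generator to model a writer that yields between documents. The function `multi_update` and the document layout are assumptions made for this sketch; real lock yielding in MongoDB follows more complex rules.

```python
def multi_update(docs, new_value):
    """Toy multi-document update that yields after each document."""
    for doc in docs:
        doc["v"] = new_value  # each single-document change is applied whole
        yield                 # the writer yields, letting other operations run


docs = [{"_id": 1, "v": "old"}, {"_id": 2, "v": "old"}]
writer = multi_update(docs, "new")

next(writer)                        # first document updated, writer yields
snapshot = [d["v"] for d in docs]   # a reader runs in the gap

for _ in writer:                    # let the writer finish
    pass
final = [d["v"] for d in docs]

print(snapshot)  # one document is new, the other is still old
print(final)     # both documents are new
```

Each individual document is always either fully old or fully new, which matches the single-document atomicity guarantee; it is only the set of documents as a whole that the reader can observe in a mixed state.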