[MongoDB]:where is the incoming write operations saved in when the balancer is doing migration? - mongodb

please see http://docs.mongodb.org/manual/core/sharding-internals/#balancing-internals.
it has the following phrase:
"when MongoDB begins migrating a chunk, the database begins copying the data to the new server and tracks incoming write operations."
my question is where is these incoming write operations saved in? if it is in memory,then i need how to call getLastError and ensure the data has been synchronous to disk. Thanks!

First, if you want to guarantee a write, you should be calling getLastError (or using your driver's equivalent for a safe write) anyway, regardless of whether you are using sharding or not.
In terms of what happens to operations during a migration. You can find the answers to what happens to the data for in-flight writes by looking at the answer to these two questions in the FAQ:
http://docs.mongodb.org/manual/faq/sharding/#what-happens-if-a-client-updates-a-document-in-a-chunk-during-a-migration
http://docs.mongodb.org/manual/faq/sharding/#what-does-writebacklisten-in-the-log-mean
The two mechanisms describe what happens, depending on the nature of the operation. Either the origin shard for the migration ensures that the writes are "sent on" to the destination shard, or the writeback mechanism sends them back to the mongos process (where they will be automatically retried).

Related

External processing using Kafka Streams

There are several questions regarding message enrichment using external data, and the recommendation is almost always the same: ingest external data using Kafka Connect and then join the records using state stores. Although it fits in most cases, there are several other use cases in which it does not, such as IP to location and user agent detection, to name a few.
Enriching a message with an IP-based location usually requires a lookup by a range of IPs, but currently, there is no built-in state store that provides such capability. For user agent analysis, if you rely on a third-party service, you have no choices other than performing external calls.
We spend some time thinking about it, and we came up with an idea of implementing a custom state store on top of a database that supports range queries, like Postgres. We could also abstract an external HTTP or GRPC service behind a state store, but we're not sure if it is the right way.
In that sense, what is the recommended approach when you cannot avoid querying an external service during the stream processing, but you still must guarantee fault tolerance? What happens when an error occurs while the state store is retrieving data (a request fails, for instance)? Do Kafka Streams retry processing the message?
Generally, KeyValueStore#range(fromKey, toKey) is supported by build-in stores. Thus, it would be good to understand how the range queries you try to do are done? Also note, that internally, everything is stored as byte[] arrasy and RocksDB (default storage engine) sorts data accordingly -- hence, you can actually implement quite sophisticated range queries if you start to reason about the byte layout, and pass in corresponding "prefix keys" into #range().
If you really need to call an external service, you have "two" options to not lose data: if an external calls fails, throw an exception and let the Kafka Streams die. This is obviously not a real option, however, if you swallow error from the external lookup you would "skip" the input message and it would be unprocessed. Kafka Streams cannot know that processing "failed" (it does not know what your code does) and will not "retry", but consider the message as completed (similar if you would filter it out).
Hence, to make it work, you would need to put all data you use to trigger the lookup into a state store if the external call fails, and retry later (ie, do a lookup into the store to find unprocessed data and retry). This retry can either be a "side task" when you process the next input message, of you schedule a punctuation, to implement the retry. Note, that this mechanism changes the order in which records are processed, what might or might not be ok for your use case.

Synchronising transactions between database and Kafka producer

We have a micro-services architecture, with Kafka used as the communication mechanism between the services. Some of the services have their own databases. Say the user makes a call to Service A, which should result in a record (or set of records) being created in that service’s database. Additionally, this event should be reported to other services, as an item on a Kafka topic. What is the best way of ensuring that the database record(s) are only written if the Kafka topic is successfully updated (essentially creating a distributed transaction around the database update and the Kafka update)?
We are thinking of using spring-kafka (in a Spring Boot WebFlux service), and I can see that it has a KafkaTransactionManager, but from what I understand this is more about Kafka transactions themselves (ensuring consistency across the Kafka producers and consumers), rather than synchronising transactions across two systems (see here: “Kafka doesn't support XA and you have to deal with the possibility that the DB tx might commit while the Kafka tx rolls back.”). Additionally, I think this class relies on Spring’s transaction framework which, at least as far as I currently understand, is thread-bound, and won’t work if using a reactive approach (e.g. WebFlux) where different parts of an operation may execute on different threads. (We are using reactive-pg-client, so are manually handling transactions, rather than using Spring’s framework.)
Some options I can think of:
Don’t write the data to the database: only write it to Kafka. Then use a consumer (in Service A) to update the database. This seems like it might not be the most efficient, and will have problems in that the service which the user called cannot immediately see the database changes it should have just created.
Don’t write directly to Kafka: write to the database only, and use something like Debezium to report the change to Kafka. The problem here is that the changes are based on individual database records, whereas the business significant event to store in Kafka might involve a combination of data from multiple tables.
Write to the database first (if that fails, do nothing and just throw the exception). Then, when writing to Kafka, assume that the write might fail. Use the built-in auto-retry functionality to get it to keep trying for a while. If that eventually completely fails, try to write to a dead letter queue and create some sort of manual mechanism for admins to sort it out. And if writing to the DLQ fails (i.e. Kafka is completely down), just log it some other way (e.g. to the database), and again create some sort of manual mechanism for admins to sort it out.
Anyone got any thoughts or advice on the above, or able to correct any mistakes in my assumptions above?
Thanks in advance!
I'd suggest to use a slightly altered variant of approach 2.
Write into your database only, but in addition to the actual table writes, also write "events" into a special table within that same database; these event records would contain the aggregations you need. In the easiest way, you'd simply insert another entity e.g. mapped by JPA, which contains a JSON property with the aggregate payload. Of course this could be automated by some means of transaction listener / framework component.
Then use Debezium to capture the changes just from that table and stream them into Kafka. That way you have both: eventually consistent state in Kafka (the events in Kafka may trail behind or you might see a few events a second time after a restart, but eventually they'll reflect the database state) without the need for distributed transactions, and the business level event semantics you're after.
(Disclaimer: I'm the lead of Debezium; funnily enough I'm just in the process of writing a blog post discussing this approach in more detail)
Here are the posts
https://debezium.io/blog/2018/09/20/materializing-aggregate-views-with-hibernate-and-debezium/
https://debezium.io/blog/2019/02/19/reliable-microservices-data-exchange-with-the-outbox-pattern/
first of all, I have to say that I’m no Kafka, nor a Spring expert but I think that it’s more a conceptual challenge when writing to independent resources and the solution should be adaptable to your technology stack. Furthermore, I should say that this solution tries to solve the problem without an external component like Debezium, because in my opinion each additional component brings challenges in testing, maintaining and running an application which is often underestimated when choosing such an option. Also not every database can be used as a Debezium-source.
To make sure that we are talking about the same goals, let’s clarify the situation in an simplified airline example, where customers can buy tickets. After a successful order the customer will receive a message (mail, push-notification, …) that is sent by an external messaging system (the system we have to talk with).
In a traditional JMS world with an XA transaction between our database (where we store orders) and the JMS provider it would look like the following: The client sets the order to our app where we start a transaction. The app stores the order in its database. Then the message is sent to JMS and you can commit the transaction. Both operations participate at the transaction even when they’re talking to their own resources. As the XA transaction guarantees ACID we’re fine.
Let’s bring Kafka (or any other resource that is not able to participate at the XA transaction) in the game. As there is no coordinator that syncs both transactions anymore the main idea of the following is to split processing in two parts with a persistent state.
When you store the order in your database you can also store the message (with aggregated data) in the same database (e.g. as JSON in a CLOB-column) that you want to send to Kafka afterwards. Same resource – ACID guaranteed, everything fine so far. Now you need a mechanism that polls your “KafkaTasks”-Table for new tasks that should be send to a Kafka-Topic (e.g. with a timer service, maybe #Scheduled annotation can be used in Spring). After the message has been successfully sent to Kafka you can delete the task entry. This ensures that the message to Kafka is only sent when the order is also successfully stored in application database. Did we achieve the same guarantees as we have when using a XA transaction? Unfortunately, no, as there is still the chance that writing to Kafka works but the deletion of the task fails. In this case the retry-mechanism (you would need one as mentioned in your question) would reprocess the task an sends the message twice. If your business case is happy with this “at-least-once”-guarantee you’re done here with a imho semi-complex solution that could be easily implemented as framework functionality so not everyone has to bother with the details.
If you need “exactly-once” then you cannot store your state in the application database (in this case “deletion of a task” is the “state”) but instead you must store it in Kafka (assuming that you have ACID guarantees between two Kafka topics). An example: Let’s say you have 100 tasks in the table (IDs 1 to 100) and the task job processes the first 10. You write your Kafka messages to their topic and another message with the ID 10 to “your topic”. All in the same Kafka-transaction. In the next cycle you consume your topic (value is 10) and take this value to get the next 10 tasks (and delete the already processed tasks).
If there are easier (in-application) solutions with the same guarantees I’m looking forward to hear from you!
Sorry for the long answer but I hope it helps.
All the approach described above are the best way to approach the problem and are well defined pattern. You can explore these in the links provided below.
Pattern: Transactional outbox
Publish an event or message as part of a database transaction by saving it in an OUTBOX in the database.
http://microservices.io/patterns/data/transactional-outbox.html
Pattern: Polling publisher
Publish messages by polling the outbox in the database.
http://microservices.io/patterns/data/polling-publisher.html
Pattern: Transaction log tailing
Publish changes made to the database by tailing the transaction log.
http://microservices.io/patterns/data/transaction-log-tailing.html
Debezium is a valid answer but (as I've experienced) it can require some extra overhead of running an extra pod and making sure that pod doesn't fall over. This could just be me griping about a few back to back instances where pods OOM errored and didn't come back up, networking rule rollouts dropped some messages, WAL access to an aws aurora db started behaving oddly... It seems that everything that could have gone wrong, did. Not saying Debezium is bad, it's fantastically stable, but often for devs running it becomes a networking skill rather than a coding skill.
As a KISS solution using normal coding solutions that will work 99.99% of the time (and inform you of the .01%) would be:
Start Transaction
Sync save to DB
-> If fail, then bail out.
Async send message to kafka.
Block until the topic reports that it has received the
message.
-> if it times out or fails Abort Transaction.
-> if it succeeds Commit Transaction.
I'd suggest to use a new approach 2-phase message. In this new approach, much less codes are needed, and you don't need Debeziums any more.
https://betterprogramming.pub/an-alternative-to-outbox-pattern-7564562843ae
For this new approach, what you need to do is:
When writing your database, write an event record to an auxiliary table.
Submit a 2-phase message to DTM
Write a service to query whether an event is saved in the auxiliary table.
With the help of DTM SDK, you can accomplish the above 3 steps with 8 lines in Go, much less codes than other solutions.
msg := dtmcli.NewMsg(DtmServer, gid).
Add(busi.Busi+"/TransIn", &TransReq{Amount: 30})
err := msg.DoAndSubmitDB(busi.Busi+"/QueryPrepared", db, func(tx *sql.Tx) error {
return AdjustBalance(tx, busi.TransOutUID, -req.Amount)
})
app.GET(BusiAPI+"/QueryPrepared", dtmutil.WrapHandler2(func(c *gin.Context) interface{} {
return MustBarrierFromGin(c).QueryPrepared(db)
}))
Each of your origin options has its disadvantage:
The user cannot immediately see the database changes it have just created.
Debezium will capture the log of the database, which may be much larger than the events you wanted. Also deployment and maintenance of Debezium is not an easy job.
"built-in auto-retry functionality" is not cheap, it may require much codes or maintenance efforts.

Is there any ACK-like mechanism for PostgreSQLs logical decoding/replication?

Is there a way for a replication client to say whenever they it was able to successfully store the data, or is it that PostgreSQL is streaming pending data to the client and the moment data leave network interface it is considered delivered?
I'd think that client has a chance to say "ACK - I got the data", but I can't seem to find this anywhere... I'm simply wondering what if the client fails to store the data (e.g. due to power failure) - isn't there a way to get it again from Postgres?
General info here https://www.postgresql.org/docs/9.5/static/logicaldecoding.html
I'll answer my own Q.
After doing much more reading, I can say there is ACK-like mechanism there.
Under some conditions (e.g. on interval) server will ask logical replication consumer to report what was the last piece of data that was persisted (i.e. flushed to disk or similar). Then and only then server will treat data up to that reported point delivered for given replication channel.

How concurrency works in mongo db when many reads are trying when there is already write lock in action..?

I think mongodb conceptually allows any number of concurrent reads.
But it typically allows only one write at a time.
Now my question is
What will it happen when one write is in progress and many reads are going to hit the server ?
Will the write lock allow reads to happen
OR
reads are supposed to be waiting untill the write activity completes
Thanks in advance Regards,
UDAY
The mongod process uses a modified reader/writer lock with dynamic yielding on page faults and long operations. Any number of concurrent read operations are allowed, but a write operation can block all other operations.
See documentation for more detail: How does concurrency work
In brief: write blocked all other operation, but write process can be split into multiple parts between them can process read requests

Filtering Redis Hash Entries

I'm using redis to store hashes with ~100k records per hash. I want to implement filtering (faceting) the records within a given hash. Note a hash entry can belong to n filters.
After reading this and this it looks like I should:
Implement a sorted SET per filter. The values within the SET correspond to the keys within a HASH.
Retrieve the HASH keys from the given filter SET.
Once I have the HASH keys from the SET fetch the corresponding entries from the HASH. This should give me all entries that belong to the filter.
Firstly is the above approach correct at a high level?
Assuming the approach is OK the bit I'm missing is what's the most efficient implementation to retrieve the HASH entries? Am I right in thinking once I have the HASH keys I should then use a PIPELINE to queue multiple HGETALL commands passing through each HASH key? Is there a better approach?
My concern about using a PIPELINE is that I believe it will block all other clients while servicing the command. I'll be paging the filtered results with 500 results per page. With multiple browser based clients performing filtering, not to mention the back end processes that populate the SETs and HASHes it sounds like there's potential for a lot of contention if PIPELINE does block. Could anyone provide a view on this?
If it helps I'm using 2.2.4 redis, predis for the web clients and servicestack for the back end.
Thanks,
Paul
Redis is a lock-free non-blocking async server so there is no added contention when using pipelining. Redis hums along happily processing each operation as soon as it receives them so in practice can process multiple pipelined operations. In essence redis-server really doesn't care if the operation is pipelined or not it just processes each operation as it receives them.
The benefit of pipelining is to reduce client latency where instead of waiting for a response from redis-server for each operation before sending the next one, the client can just pump all operations at once in a single write then read back all the responses in a single read.
An example of this in action is in my Redis mini StackOverflow clone each click makes a call to ToQuestionResults() which because the operations are pipelined sends all operations on 1 Socket write call and reads the results in 1 Socket blocking read which is more efficient instead of a blocking read per call:
https://github.com/ServiceStack/ServiceStack.Examples/blob/master/src/RedisStackOverflow/RedisStackOverflow.ServiceInterface/IRepository.cs#L180
My concern about using a PIPELINE is
that I believe it will block all other
clients while servicing the command.
This is not a valid concern and I wouldn't over think how Redis works here, assume it's doing it the most efficiently where Pipelining doesn't block processing of other clients commands. Conceptually you can think that redis-server processes each command (pipelined or not) in FIFO order (i.e. no time is wasted in waiting/reading the entire pipeline).
You're describing something closer to MULTI/EXEC (i.e. Redis Transactions) where all operations are done at once as soon as Redis server reads EXEC (i.e. EOF Transaction). This is not a problem either and redis-server still doesn't waste any time waiting to receive your entire transaction, it just queues the partial commandset in a temporary queue until it receives the final EXEC which is then processed all at once.
This is how redis achieves atomicity by processing each command, one at a time, as soon as it receives them. Since there are no other threads, there is no thread context switching, no locks and no multi-threading issues. It basically achieves concurrency by processing each command really fast.
So in this case I would use Pipelining as it's always a win, more so the more commands you pipeline (as you reduce the blocking read count).
Individual operations do block, but it doesn't matter as they shouldn't be long running. It sounds like you are retrieving more information than you really need - HGETALL will return 100,000 items when you only need 500.
Sending 500 HGET operations may work (assuming the set stores both hash and key) though it's possible that using hashes at all is a case of premature optimization - you may be better off using regular keys and MGET.
I think you misunderstand what pipelining does. It doesn't block while all the commands are being sent. All it's doing is BUFFERING the commands, then executing them all at once at the end, so they are executed as if they are one single command. At no time is blocking occurring. The same is true for redis multi/exec. The closest thing you get to blocking/locking in redis is optimistic locking by using the watch, which will cause exec to fail if the redis key has been written to since you called watch.
Even more efficient that calling hget 500 times within a pipeline block is to just call hmget('hash-key',*keys) where keys is an array of the 500 hash keys you are looking up. This will result in a single call to redis, which is the same as if it was pipelined, but should be faster to execute since you aren't looping in ruby.