I wrote a simple benchmark that keeps producing messages in a single Flowable. And it ran more and more slower and finally stopped giving a benchmark result. What's the inner reason for this?
Related
I have a parquet I read from disk (20,000 partitions) and the display command df.display() returns almost right away, whereas df.limit(1).display() literally takes hours to execute. I don't understand what is going on here. It is also not only the display() command that is slow, but also a join I would actually like to perform. By contrast, df.show(n=1) returns almost instantaneously.
Limit() runs per partition first, then combines the result into a final result. Since there are 20,000 partitions in your data this takes a lot of time to execute.
One solution to still use limit() is to reduce the number of partitions as in this answer with: df.coalesce(1).limit(1).display(). But this is not recommended as all the data will be sent to the driver, and may cause out of memory exception.
Today our application got launched, meaning it started receiving more traffic than usual. But the increase isn't huge. Double of what it was before at most.
But since a few hours, our Sentry logs are full or errors with code DEADLINE_EXCEEDED. When I look at the trace, all of them refer to read operations, most of them on single documents (no queries, just singe doc reads), for example: const res = await fs.collection('coll').doc('doc').get();
When I google for this error message, I get a lot of results about issues with writing, especially in batches, but barely anything is written to our database, it's almost exclusively reads.
To give an indication of the amount of reads our database has to handle, we've had 1.2M reads in the past 30 days, with a peak of 60k per day, a number which we haven't exceeded yet today (41k).
What could be the issue in our application?
As usual, I find the answer right after posting the question to StackOverflow. What we saw here was a symptom of our VM running out of memory! After scaling up the server, the problem disappeared.
I am using Scala, Reactive Mongo 0.10.5 and Mongo 2.6.4 running on Ubuntu. I have tested on a few machine configurations but right now I am working with 15gb of memory, 2 cores and 60gb of SSD storage (AWS)
I have just set up a test mongo instance and have been using it to benchmark a few things, however I am seeing some inconsistency that I can't explain.
I am writing a consistent amount of data using 10 separate threads to a single collection. Each write consists of a document containing an array which contains 1000 elements. Each element is a complex document consisting of several fields and nested fields. I have tested with arrays of 1000, 10000 and 100 and have seen the same behavior with all. Each write is unique (i.e. I never write to the same document twice)
The write speed tends to be around 100-200ms per write with the current hardware I am using. I would like better but that isn't my main issue.
My main issue is that sometimes the write times will spike. When they do, it can take a single write several seconds to complete. They do eventually complete but it takes a while. I have timeouts built into the app doing the writing (10 seconds) and when the spikes happen it will frequently hit that timeout. I have increased the timeout and verified that the write does eventually complete but it can take a long time (30+ seconds).
I have worked with Mongo before using the Mongo Java Driver in Scala and have not noticed this problem. However it is unclear whether the issue is a result of the driver, or my Mongo setup.
I have looked at the logs and while they report when the query is taking longer, they don't actually provide any information about why it is taking longer. I have done the same with profiling and again they report a long query but don't say why it is long.
I have run mongostat while running and it seems that when the writes start taking a long time I notice a similar slow down in mongostat. I.E. mongostat will pause for several seconds before continuing.
The mongo machine itself is bored while this is happening. Load averages are minimal as are CPU and memory usage. It does not appear to be going into swap.
I suspect I just have something configured incorrectly in the Mongo but I haven't been able to find anything that indicates what.
Has anyone seen this behavior before? Is it something in my configuration or perhaps something with the Reactive Mongo driver?
UPDATE:
Using iostat I was able to determine that the normal writes/second is hitting around 1Mb/second. However during the slow periods it spikes to 6-7Mb/second.
I also found the following in the mongo logs.
[DataFileSync] flushing mmaps took 15621ms for 35 files
[DataFileSync] flushing mmaps took 14816ms for 22 files
In at least one case this log statement corresponds exactly with one of the slow downs.
This definitely seems to be a disk flush problem based on these observations.
Does this imply that I am pushing more data than the current Mongo configuration can handle? Or is there some other configuration that can be done to reduce the impact of those flushes?
It appears that in this case the problem may actually have been related to thread locking within the application itself. Once I resolved the issues with thread locking these other issues seemed to go away.
To be honest I don't know why thread locking would result in the observed behavior in Mongo, but if the problem is gone I am not going to complain.
I need to write a program that performs a parallel search in a large space of possible states, with new areas being discovered (and their exploration started) in the process, and exploration of some areas being terminated early as intermediate results obtained elsewhere eliminate a possibility of discovering new useful results in them. The search is performed using multiple threads running in a heavy cooperation with each other to avoid recalculation of intermediate data.
A complex internal state (including call stacks of several threads and state synchronization primitives they use) has to be maintained and updated during the whole process, and there is no apparent way to split the computation into isolated chunks that can be executed sequentially, each saving and passing a small intermediate result to the next. Also, there is no way to split the computation into independent parallel threads not communicating with each other, without imposing a prohibitive overhead due to recalculation of large amount of intermediate data.
Because of the large search domain, the program would possibly run for months before producing a final result. Hence, there is a significant risk of power, hardware or OS failure during the program execution that can lead to a complete loss of all work that has been done to the moment. In such a case the program will need to restart all its computations from scratch.
I need a solution that can prevent a complete data loss in such cases. I thought of an execution engine/platform that continuously saves the current state of the process to a failure-resistant storage like a redundant disk array or database. But I understand that this approach can significantly slow down the process, even to a degree when there would be no benefit compared to an expected computation time including restarts due to possible failures.
In fact, I do not need an ideal solution that continuously saves the program state, and I can easily bear a loss of hours or maybe even days of work. A possible heavyweight solution that comes to my mind is to run the program inside a virtual machine, saving its snapshots from time to time, and restoring the machine after a possible host failure from a recent snapshot. This approach can also help to recover the program state after a random or preventable guest OS failure.
Is there a similar, but more lightweight solution limited to preserving a state of a single process? Or could you suggest any other approaches that can solve my problem?
You may want to look at using Erlang which allows large numbers of threads to run at relatively low cost. Because the thread cost is low, redundancy can be used to achieve increased reliability.
For the problem you present, a triple-redundancy scheme may be the way to go, where periodic checks for synchronization across the three (or more) systems would determine by vote who has failed.
The short version of the question: how to build a fail-safe word count program (topology) in Twitter Storm that produces accurate results even when failure occurs? Is that even possible?
Long version: I am studying Twitter Storm and trying to understand how it should be used. I have followed the tutorial and find it a very simple concept. But the word count example outlined in the tutorial is not fault tolerant (because bolts save some data in memory). Saving the same data in back-end DB however leads to double counting if an event is re-submitted to the start of chain (which happens when some of the bolts fail).
Should I see Twitter Storm as real-time platform for producing partially accurate results and still depend on MapReduce to get the accurate ones?
It really depends on what kind of failure your trying to hege against. There are a few things that you can do:
Storm bolts are supposed to ack a tuple only after they have processed it. If you write your spouts and bolts and topology to use this, you can implement an "exactly one time" system which will guarantee accuracy.
Kafka can be a good way to put data into Storm because it uses disk persistance to keep messages around for a long time even after they are consumed. This means you can retrieve them if there's a failure by a consumer down the line.
In general though, it's difficult to guarantee that things are processed exactly once in any streaming system. This is a known problem, and it is a very difficult problem to solve efficiently.
Storm has the concept of transactional topologies. In practice, this means you will want to process items in batches, then commit to your database at the end of the batch, storing the transaction ID in the database alongside a count. This also has the practical benefit of reducing the load on your database with fewer inserts.
Batches are processed in parallel and may be replayed on failure, but are guaranteed to be committed in order. This is important because it makes it safe to write code that fetches the current count row, checks the transaction ID against the one in memory, and if the two differ (meaning it is an uncommitted batch), adding the new count to the existing one and committing that updated count.
See the following link for much more information and code examples:
https://github.com/nathanmarz/storm/wiki/Transactional-topologies