How can I avoid "Data too large" in ELK / elasticsearch bulk inserts? - perl

I'm sending data daily to my ELK stack via https://metacpan.org/pod/Search::Elasticsearch::Client::7_0::Bulk.
Sometimes it happens, more often recently, that I receive a "Data too large" error. The first part of my data was received, but after this error my sending script stops and I end up with incomplete data.
As far as I understand (correct me if I'm wrong), this happens when my stack is experiencing memory issues while processing the data it has already received. I assume that I could send the rest of the data after some time, because the same pattern repeats the next day: the first batch of my data is processed, and the rest is rejected with "Data too large".
I saw that I can add an "on-error" callback, but I have no clue what I can do in it. My idea would be to implement a delay and retry after some time.
Can anyone give me a hint on how to achieve this?
Are there any ideas on how to avoid the issue in the first place? I already increased the heap space some time ago, but after two months the issue reoccurred.

You'd need to check your Elasticsearch logs and the full response that Elasticsearch sends back (e.g., was it a 429?). However, heap pressure can cause this, and you'd probably need to dig into why you are experiencing it.
The other option is to reduce the size of the requests you are sending.
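The delay-and-retry idea from the question, combined with the suggestion to send smaller requests, could look roughly like the sketch below. It is written in Python rather than Perl and does not use the Search::Elasticsearch API: `TooLargeError`, `chunks`, `send_with_retry`, and the `send` callable are hypothetical names standing in for whatever your client raises and calls.

```python
import time
from typing import Callable, Iterator, Sequence


class TooLargeError(Exception):
    """Hypothetical stand-in for a 429 / "Data too large" rejection."""


def chunks(items: Sequence[dict], size: int) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_with_retry(
    batch: Sequence[dict],
    send: Callable[[Sequence[dict]], None],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send one batch, backing off exponentially on TooLargeError.

    Returns the number of attempts used; re-raises after max_attempts.
    """
    for attempt in range(max_attempts):
        try:
            send(batch)
            return attempt + 1
        except TooLargeError:
            if attempt == max_attempts - 1:
                raise
            # Wait 1x, 2x, 4x, ... base_delay before trying again.
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

A caller would loop over `chunks(all_docs, 500)` and pass each slice to `send_with_retry`. The batch size and delays are placeholders to tune against your cluster, and retrying only helps if the heap pressure is temporary.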

Update: Remembering my "experience" with Java, I simply restarted my ELK stack, and the next import went through smoothly.
So despite the fact that 512m of memory seems a bit low, it worked after a restart. I will check again today and from time to time.
Increase memory
Schedule a nightly restart

Related

Kafka Streams "Suppressed" feature causes OOM / heavy GC

I use Kafka Streams 2.1 and created the following stream using Suppressed feature to process the aggregation of each whole minute:
originStream
    .windowedBy(TimeWindows.of(Duration.ofSeconds(60)).grace(Duration.ofMillis(500)))
    .aggregate(factory::createAggregation,
               (k, v, a) -> a.aggregate(v),
               materialized.withLoggingDisabled())
    .suppress(untilWindowCloses(Suppressed.BufferConfig.unbounded()))
    .toStream();
The rate of messages I receive is about 200 per second.
After a short time I see the GC starting to work very hard, and sometimes OOM errors.
Since I use a heap of 2GB and a record will not take more than 1KB, it is clear to me that something is wrong - there shouldn't be so many messages in a window of 1 minute to explode a 2GB heap.
So I took a heap dump, in which I see 5 InMemoryTimeOrderedKeyValueBuffer objects taking more than 300MB each (total >1.5GB).
I dug some more into one of those and saw that the smallest/largest timestamps in its sortedMap were 1,575,458,160,000/1,575,481,800,000. This means that the buffer holds messages spanning 23,640,000 ms = 394 minutes.
To my understanding the buffer was supposed to be flushed, so that only the last minute will consume the memory - all other windows should have been evicted.
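The arithmetic behind this expectation can be checked with a few lines. This is only a back-of-the-envelope sketch: it treats "not more than 1KB" as exactly 1024 bytes per record and ignores key, timestamp, and object overhead, so the real footprint per window would be somewhat higher.

```python
MS_PER_MINUTE = 60_000

# Smallest and largest timestamps found in the buffer's sortedMap.
oldest_ms = 1_575_458_160_000
newest_ms = 1_575_481_800_000

span_ms = newest_ms - oldest_ms
span_minutes = span_ms // MS_PER_MINUTE

# Figures from the question; 1024 bytes per record is an assumed upper bound.
rate_per_second = 200
window_seconds = 60
record_bytes = 1024

records_per_window = rate_per_second * window_seconds
bytes_per_window = records_per_window * record_bytes

print(f"buffer spans {span_ms} ms = {span_minutes} minutes")
print(f"one window: {records_per_window} records, {bytes_per_window} bytes")
```

One full window is therefore on the order of 12 MB of payload, which is why a buffer spanning 394 minutes, rather than roughly one, points to windows not being evicted.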
Am I doing something wrong?
Any help would be appreciated.
The problem should not be suppress() but the aggregation state store. By default, it has a retention time of 1 day. You can reduce the retention time by passing in Materialized.withRetention(...) into aggregate().
I am surprised that your heap dump shows InMemoryTimeOrderedKeyValueBuffer though, because this is the store used by suppress(). Hence, I am not 100% sure if reducing the retention time will fix the issue.
Btw, there are a few bugs in suppress() in version 2.1 that are only fixed in the 2.3 release, so it's highly recommended to upgrade to 2.3 if you use suppress().
I've changed the BufferConfig to use a max-bytes boundary:
Suppressed.BufferConfig.unbounded().withMaxBytes(10_000_000)
and that seems to solve the problem. I looked at the code and don't understand why, because as far as I can see it should now have thrown an exception, but it doesn't.
So, I still don't understand something here, but the problem is solved for now.
After that I applied Matthias J. Sax's suggestions too, just to be even safer (thanks).
Edit:
It happened again twice today. This means that what I did did not fix the problem (although it may have changed its frequency).
Right now, I have no solution for this problem.

The fast way to execute rest requests that require incremented value (nonce)

I'm working with a REST API that requires an incremented parameter to be sent with each request. I use Unix milliseconds as the nonce and originally, naively, sent requests one after another. But even if I send one message before another, they can arrive in reversed order, which results in an error.
One solution could be to send the next request only after the previous one has come back, but that would be too slow. I'm thinking about a less strict solution, like measuring latency over the last 10 requests and waiting for x% of that latency before sending the next message. I feel like this problem should already have been solved, but I can't find any good reference. I would appreciate any advice.
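The latency-based pacing idea could be sketched as follows. `LatencyPacer`, its `window` and `fraction` parameters, and the method names are all hypothetical. Note that pacing like this only lowers the chance of reordering; it cannot guarantee arrival order the way waiting for each response does.

```python
from collections import deque


class LatencyPacer:
    """Suggest a pause between requests based on recent round-trip times."""

    def __init__(self, window: int = 10, fraction: float = 0.5) -> None:
        # Only the last `window` latency samples are kept.
        self._samples = deque(maxlen=window)
        self._fraction = fraction

    def record(self, latency_seconds: float) -> None:
        """Store the measured round-trip time of a finished request."""
        self._samples.append(latency_seconds)

    def delay(self) -> float:
        """Return fraction * mean(recent latencies), or 0.0 with no samples."""
        if not self._samples:
            return 0.0
        return self._fraction * sum(self._samples) / len(self._samples)
```

The sender would sleep for `pacer.delay()` seconds before each request and call `pacer.record(...)` once the response arrives.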

Mongo Connection Count creeping up one per 10 second with mgo driver

We monitor our mongoDB connection count using this:
http://godoc.org/labix.org/v2/mgo#GetStats
However, we have been facing a strange connection leak issue where the connectionCount creeps up consistently by 1 more open connection per 10 seconds. (That's regardless of whether there are any requests.) I can spin up a server on localhost, leave it there, do nothing, and the connectionCount will still creep up. The connection count eventually creeps up to a few thousand, at which point it kills the app/db and we have to restart the app.
This might not be enough information for you to debug. Does anyone have any ideas, or connection leaks that you have dealt with in the past? How did you debug them? What are some of the ways I can debug this?
We have tried a few things: we scanned our code base for any code that could open a connection and put counters/debugging statements there, and so far we have found no leak. It is almost like there is a leak in a library somewhere.
This is a bug in a branch that we have been working on and there have been a few hundred commits into it. We have done a diff between this and master and couldn't find why there is a connection leak in this branch.
As an example, here is the dataset that I am referring to:
Clusters: 1
MasterConns: 9936 <-- creeps up 1 per 10 seconds
SlaveConns: -7359 <-- why is this negative?
SentOps: 42091780
ReceivedOps: 38684525
ReceivedDocs: 39466143
SocketsAlive: 78 <-- what is the difference between the socket count and the master conns count?
SocketsInUse: 1231
SocketRefs: 1231
MasterConns is the number that creeps up by one every 10 seconds. I am not entirely sure what the other numbers mean.
MasterConns cannot tell you whether there's a leak or not, because it does not decrease. The field indicates the number of connections made since the last statistics reset, not the number of sockets that are currently in use. The latter is indicated by the SocketsAlive field.
To give you some additional relief on the subject, every single test in the mgo suite is wrapped in logic that ensures that statistics show sane values after the test finishes, so that potential leaks don't go unnoticed. That's the main reason why such a statistics collection system was introduced.
Then, the reason why you see this number increasing every 10 seconds or so is the internal activity that happens to learn the status of the cluster. That said, this behavior was recently changed so that it doesn't establish new connections and instead picks existing sockets from the pool, so I believe you're not using the latest release.
Having SlaveConns negative looks like a bug. There's a small edge case about statistics collection for connections made, because we cannot tell whether a given server is a master or a slave before we've talked to it, so there might be an uncovered path. If you still see that behavior after you upgrade, please report the issue and I'll be happy to look at it.
SocketsInUse is the number of sockets that are still being referenced by one or more sessions, whether they are alive (the connection is established) or not. SocketsAlive is, again, the real number of live TCP connections. The delta between the two indicates that a number of sessions were not closed. This may be okay, if they are still being held in memory by the application and will eventually be closed, or it may be a leak if a session.Close operation was missed by the application.
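Applied to the figures from the question, that comparison works out as in the sketch below. The dictionary simply mirrors the posted numbers, and `sockets_held_by_unclosed_sessions` is a hypothetical helper, not part of mgo.

```python
# Statistics as posted in the question (only the fields used here).
stats = {
    "MasterConns": 9936,   # cumulative since the last reset, never decreases
    "SocketsAlive": 78,    # live TCP connections right now
    "SocketsInUse": 1231,  # sockets still referenced by sessions
}


def sockets_held_by_unclosed_sessions(s: dict) -> int:
    """Delta between referenced sockets and live connections."""
    return s["SocketsInUse"] - s["SocketsAlive"]


delta = sockets_held_by_unclosed_sessions(stats)
print(f"{delta} socket references beyond the live connections")
```

A delta that keeps growing between samples would be the figure to watch, rather than MasterConns.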

Does MongoDB fail silently if I don't check error codes?

I'm wondering if any persistence failure will go undetected if I don't check error codes? If so, what's the right way to write fast (asynchronously) while still detecting errors?
If you don't check for errors, your update is only fireAndForget. You'll indeed miss all errors which could arise. Please see MongoDB WriteConcerns for the available write modes in MongoDB (sorry I always fail to find the official, non driver related documentation, I really should bookmark it).
So with NORMAL you'll get at least connectivity errors, with NONE no exceptions at all. If you want to be informed of exceptions you have to use one of the other modes, which differ only in the persistence guarantee they give you.
You can't detect errors when running asynchronously, as this is against the intention. The connection that sent the write operation may already be closed or reused, so the error can't be sent back through that connection. Furthermore, only your actual code knows what to do if the write fails. As MongoDB doesn't offer a remote procedure call to asynchronously inform you of updates, you'll have to wait until the write has finished to a given stage.
So the fastest, but most unreliable, is SAFE, where the write only happened to memory. JOURNAL gives you the security that it was written at least to disk. With FSYNC you'll have those changes persisted in your db on disk. REPLICA means that at least two replicas have written it, and MAJORITY that more than half of your replicas have written it (with three replicas, which should be the default, this doesn't differ).
The only chance I see to have something like asynchronous writes is to have a separate thread that performs all write operations synchronously. You could hand this thread the actual update as well as a class that is called in case of a failure to perform the operations needed to handle that failure. But I don't think that this is good application design.
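A minimal sketch of that separate-thread approach, in Python and without any MongoDB driver: `WriteWorker`, `write`, and `on_failure` are hypothetical names, and `write` stands for whatever synchronous, error-checked write call your driver offers.

```python
import queue
import threading
from typing import Callable

_STOP = object()  # sentinel that tells the worker thread to exit


class WriteWorker:
    """Runs blocking writes on one background thread and reports failures."""

    def __init__(
        self,
        write: Callable[[dict], None],
        on_failure: Callable[[dict, Exception], None],
    ) -> None:
        self._write = write
        self._on_failure = on_failure
        self._queue: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, doc: dict) -> None:
        """Return immediately; the write happens later on the worker."""
        self._queue.put(doc)

    def close(self) -> None:
        """Process everything queued so far, then stop the worker."""
        self._queue.put(_STOP)
        self._thread.join()

    def _run(self) -> None:
        while True:
            doc = self._queue.get()
            if doc is _STOP:
                return
            try:
                self._write(doc)  # synchronous, so errors surface here
            except Exception as exc:
                self._on_failure(doc, exc)
```

The caller never blocks on a write, but it also only learns about failures through the callback, which is the design trade-off mentioned above.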
Yes, depending on the error, it can fail silently if you don't check the returned error code. It's necessary to wait for error checking. Your only other option would be for your app to occasionally tell the user "oops, remember when I acted like I saved your data a moment ago? Well, not really."

Memcache-based message queue?

I'm working on a multiplayer game and it needs a message queue (i.e., messages in, messages out, no duplicates or deleted messages assuming there are no unexpected cache evictions). Here are the memcache-based queues I'm aware of:
MemcacheQ: http://memcachedb.org/memcacheq/
Starling: http://rubyforge.org/projects/starling/
Depcached: http://www.marcworrell.com/article-2287-en.html
Sparrow: http://code.google.com/p/sparrow/
I learned the concept of the memcache queue from this blog post:
All messages are saved with an integer as key. There is one key that has the next key and one that has the key of the oldest message in the queue. To access these the increment/decrement method is used as it's atomic, so there are two keys that act as locks. They get incremented, and if the return value is 1 the process has the lock, otherwise it keeps incrementing. Once the process is finished it sets the value back to 0. Simple but effective. One caveat is that the integer will overflow, so there is some logic in place that sets the used keys to 1 once we are close to that limit. As the increment operation is atomic, the lock is only needed if two or more memcaches are used (for redundancy), to keep those in sync.
My question is, is there a memcache-based message queue service that can run on App Engine?
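The scheme from the blog post can be sketched as below. This is an illustration only: `FakeCache` is an in-process stand-in with a made-up API, not a real memcache client, the sketch uses one lock key for both ends of the queue instead of two, and it leaves out the overflow handling. Nothing here protects against cache evictions.

```python
import threading


class FakeCache:
    """In-process stand-in for a memcache client (hypothetical API)."""

    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)

    def incr(self, key):
        # Atomic increment; a missing key counts as 0 (a simplification).
        with self._mutex:
            self._data[key] = self._data.get(key, 0) + 1
            return self._data[key]


class CacheQueue:
    """FIFO queue stored under integer keys plus head/tail counters."""

    def __init__(self, cache, name="q"):
        self._cache = cache
        self._name = name

    def _acquire(self):
        # A return value of 1 means this caller holds the lock;
        # otherwise keep incrementing until the holder resets it to 0.
        while self._cache.incr(f"{self._name}:lock") != 1:
            pass

    def _release(self):
        self._cache.set(f"{self._name}:lock", 0)

    def put(self, message):
        self._acquire()
        try:
            slot = self._cache.incr(f"{self._name}:tail")  # next integer key
            self._cache.set(f"{self._name}:msg:{slot}", message)
        finally:
            self._release()

    def get(self):
        """Return the oldest message, or None if the queue is empty."""
        self._acquire()
        try:
            head = self._cache.get(f"{self._name}:head") or 0
            tail = self._cache.get(f"{self._name}:tail") or 0
            if head >= tail:
                return None
            slot = head + 1
            key = f"{self._name}:msg:{slot}"
            message = self._cache.get(key)
            self._cache.delete(key)
            self._cache.set(f"{self._name}:head", slot)
            return message
        finally:
            self._release()
```

If the cache evicts a message key or one of the counters, the queue silently loses or misorders messages.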
I would be very careful using the Google App Engine Memcache in this way. You are right to be worrying about "unexpected cache evictions".
Google expect you to use the memcache for caching data and not storing it. They don't guarantee to keep data in the cache. From the GAE Documentation:
By default, items never expire, though items may be evicted due to memory pressure.
Edit: There's always Amazon's Simple Queue Service. However, this may not meet price/performance levels either, as:
There would be the latency of calling from the Google to Amazon servers.
You'd end up paying twice for all the data traffic - paying for it to leave Google and then paying again for it to go in to Amazon.
I have started a Simple Python Memcached Queue, it might be useful:
http://bitbucket.org/epoz/python-memcache-queue/
If you're happy with the possibility of losing data, by all means go ahead. Bear in mind, though, that although memcache generally has lower latency than the datastore, like anything else, it will suffer if you have a high rate of atomic operations you want to execute on a single element. This isn't a datastore problem - it's simply a problem of having to serialize access.
Failing that, Amazon's SQS seems like a viable option.
Why not use Task Queue:
https://developers.google.com/appengine/docs/python/taskqueue/
https://developers.google.com/appengine/docs/java/taskqueue/
It seems to solve the issue without the likely loss of messages in Memcached-based queue.
Until Google implement a proper job queue, why not use the datastore? As others have said, memcache is just a cache and could lose queue items (which would be... bad).
The datastore should be more than fast enough for what you need. You would just have a simple Job model, which would be more flexible than memcache as you're not limited to key/value pairs.