Can I set a limit to avoid disk full - Druid

Is there any way I can set a limit on the historical node? I'm frequently getting an error message that says "disk full"; can I fix this?
The Druid version I am using is 0.18.1.
Thank you.

Druid doesn't have resource quotas yet (disk, CPU, etc.). Maybe in the future. For now you'd need to monitor disk usage over time in some way.

In the historical runtime properties we set the "freeSpacePercent" argument to leave operational headroom for the historical processes. In our experience this is the only setting that is strictly honored.
Other historical runtime properties, like the per-location maxSize and druid.server.maxSize, are only used as guidance.
druid.segmentCache.locations=[{"path":"/mnt/druid/segment-cache","maxSize":400000000000, "freeSpacePercent": 5.0}]

Related

High CPU usage on Cloud SQL causing timeouts

We have a Postgres database that has billions of records in it.
We have one client that uses our older API to query the database to fetch thousands of records once a day.
I would say close to the top end of the thousands.
The API currently runs on a Compute Engine instance behind a load balancer, and during the allotted time I spin up 6 instances of it to help handle the load.
What I have found is that the CPU usage on Cloud SQL maxes out at 100% while most of the other stats are fine; it's just the CPU.
This basically renders our API useless, as we can't accept connections and the whole thing falls over.
What can we do to help this?
[Charts: CPU utilisation, connections, reads/writes, memory usage]
You can see in most of the other charts the readings are well within normal for what we expect.
I don't really want to have to beef up the CPU if that isn't actually the underlying problem.
A further thing to note: we have developed a new endpoint specifically for this client to use, but they have not put it in place yet, and there is no guarantee that it will reduce the DB load.
High CPU usage can most definitely cause dropped or ignored connections. The database engine and underlying OS are fighting for resources and aren't able to respond to the connection in time.
While you can add more CPU, it looks like the CPU you have is (usually) enough, except during the periods where it sits at 100%. I'd suggest instead finding out why the query is eating so much CPU and optimizing it.
You might be interested in something like Cloud SQL Insights to help debug the query.
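As a starting point, here is a hedged sketch of how you might find the most expensive queries, assuming the pg_stat_statements extension can be enabled on your instance (on PostgreSQL 13+ the column is total_exec_time rather than total_time):

-- Enable the extension once per database (requires pg_stat_statements to be
-- available on the instance).
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 statements by cumulative execution time, a reasonable proxy for CPU.
SELECT calls,
       round(total_time::numeric, 1)           AS total_ms,
       round((total_time / calls)::numeric, 1) AS avg_ms,
       left(query, 120)                        AS query
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;

From there, EXPLAIN (ANALYZE) on the worst offenders usually shows whether an index or a rewritten query would take the pressure off the CPU.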

Apache Kafka persist all data

When using Kafka as an event store, how is it possible to configure the logs to never lose data (v0.10.0.0)?
I have seen the (old?) log.retention.hours, and I have been considering playing with compaction keys, but is there simply an option for Kafka to never delete messages?
Or is the best option to put a ridiculously high value for the retention period ?
You don't have a better option than using a ridiculously high value for the retention period.
Fair warning: using an effectively infinite retention will probably hurt you a bit.
For example, the default behaviour only allows a new subscriber to start from the beginning or the end of a topic, which will be at least annoying from an event-sourcing perspective.
Also, Kafka, if used at scale (let's say tens of thousands of messages per second), benefits greatly from high-performance storage, the cost of which will be ridiculously high with an eternal retention policy.
FYI, Kafka provides tools (e.g. Kafka Connect) to easily persist data to cheap data stores; see the sketch below.
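As an illustrative sketch only (the connector name, the topic "events", and the file path are assumptions), a standalone Connect worker can drain a topic to a flat file using the FileStreamSink connector that ships with Kafka; a real archival setup would point an S3/HDFS/JDBC sink at the topic instead:

# connect-file-sink.properties
name=archive-events
connector.class=org.apache.kafka.connect.file.FileStreamSinkConnector
tasks.max=1
topics=events
file=/data/archive/events.txt

Run it with bin/connect-standalone.sh config/connect-standalone.properties connect-file-sink.properties.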
Update: It’s Okay To Store Data In Apache Kafka
Obviously this is possible, if you just set the retention to "forever" or enable log compaction on a topic, then data will be kept for all time. But I think the question people are really asking, is less whether this will work, and more whether it is something that is totally insane to do.
The short answer is that it's not insane, people do this all the time, and Kafka was actually designed for this type of usage. But first, why might you want to do this? There are actually a number of use cases, here's a few:
For people concerned with data replaying and disk cost for eternal messages, I just wanted to share some things.
Data replaying:
You can seek your consumer to a given offset. It is even possible to query the offset for a given timestamp. Then, if your consumer doesn't need to know all the data from the beginning and a subset of the data is enough, you can use this.
I use the Kafka Java libs, e.g. kafka-clients. See:
https://kafka.apache.org/0101/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#offsetsForTimes(java.util.Map)
and
https://kafka.apache.org/0101/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#seek(org.apache.kafka.common.TopicPartition,%20long)
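To make that concrete, here is a minimal sketch against kafka-clients 0.10.1+ (the broker address, the topic "events", and partition 0 are assumptions); it looks up the first offset at or after a timestamp and seeks the consumer there:

import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

public class ReplayFromTimestamp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "replay-example");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        TopicPartition tp = new TopicPartition("events", 0); // assumed topic/partition
        consumer.assign(Collections.singletonList(tp));

        // Ask the broker for the earliest offset whose timestamp is >= one day ago.
        long oneDayAgo = System.currentTimeMillis() - 24L * 60 * 60 * 1000;
        Map<TopicPartition, OffsetAndTimestamp> offsets =
                consumer.offsetsForTimes(Collections.singletonMap(tp, oneDayAgo));

        OffsetAndTimestamp target = offsets.get(tp);
        if (target != null) {
            consumer.seek(tp, target.offset()); // replay from that point onwards
        }
        ConsumerRecords<String, String> records = consumer.poll(1000);
        System.out.println("fetched " + records.count() + " records");
        consumer.close();
    }
}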
Disk cost:
You can at least minimize disk space usage a lot by using something like Avro (https://avro.apache.org/docs/current/) for compact serialization and by turning compaction on; see the sketch below.
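A hedged sketch of enabling compaction on an existing topic with the 0.10-era tooling (the topic name "events" and the ZooKeeper address are assumptions):

bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type topics --entity-name events \
  --add-config cleanup.policy=compact

Note that compaction only retains the latest value per key, so it suits changelog-style topics rather than append-only event streams.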
Maybe there is a way to use symbolic links to split the storage across file systems, but that is only an untried idea.

Scaling DB2 to increase tps

I want to brainstorm with you all about the scaling options that DB2 offers. I hope you can help me resolve the problem.
I need to scale my DB2 database to anticipate flash-crowd traffic to the database server. My database can only serve around 200+ transactions per second (in application terms, not database TPS) before it stalls completely and runs out of CPU.
What do you think? If I want to increase that to 2000+, roughly 10 times the current rate, what options do I have to scale my database?
Recently I read about the pureScale feature. It looks promising, but it's not a flexible solution in the sense that it can only be deployed on IBM System x, and ours is not. Are there other solutions like pureScale that take a shared-everything approach?
The second option may be database partitioning. Can database partitioning (the shared-nothing approach) help resolve my problem and add processing power to my system?
Thanks and regards,
Fritz
Before you worry about how to scale up (more hardware in 1 server) or out (more servers), look at how to tune your database. Buying your way out of a performance problem is almost always more expensive than spending time to find and fix the performance problem.
Assuming that the process(es) consuming CPU on your database server are the database engine, then high CPU activity and low I/O activity indicate that you're doing a LOT of reads, but they are all happening in memory. Scanning a huge table is still inefficient, even if that table is stored completely in memory (buffer pools).
Find the SQL statements that are using the most CPU. Look at the explain plans, and figure out how to make them more efficient. There are LOTS of resources on the web for database performance tuning.
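As an illustrative sketch (DB2 for LUW 9.7 or later), the MON_GET_PKG_CACHE_STMT table function can surface the statements burning the most CPU; you can then run EXPLAIN on the worst offenders:

-- Top 20 statements in the package cache by cumulative CPU time (microseconds).
SELECT NUM_EXECUTIONS,
       TOTAL_CPU_TIME,
       SUBSTR(STMT_TEXT, 1, 100) AS STMT_TEXT
FROM TABLE(MON_GET_PKG_CACHE_STMT('D', NULL, NULL, -2)) AS T
ORDER BY TOTAL_CPU_TIME DESC
FETCH FIRST 20 ROWS ONLY;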

Reduce Membase quota per bucket to 5 MB

On Heroku, I notice that they limit my free Memcached bucket (actually Membase) to 5 MB. However, when I try it on my own server I cannot set the bucket quota to less than 64 MB (per node, for the Memcached bucket type). For the Membase bucket type it's even more: 100 MB.
Hmm, my server has a humble amount of RAM, and I need to allocate a very small amount to Memcached. Please advise.
Heroku is running a slightly modified version of our memcached software that lets them keep the bucket overhead very low. Unfortunately the "productized" version has some limits imposed to prevent the software from getting itself into trouble.
Especially for Membase buckets, we need at least 100 MB in order to run safely.
You may be able to reduce/eliminate these limits if you recompile the source, but that wouldn't be a supported configuration.
Perry
Sorry for the delay in getting back to this...
As with any piece of software, there are internal data structures that need RAM to run...that's what gets allocated immediately with Membase.
If you install memcached, it will use as much RAM as you configure it to use...no more, no less.

Slony-I replication CPU usage

I have recently had to install Slony (version 2.0.2) at work. Everything works fine; however, my boss would like to lower the CPU usage on slave nodes during replication. Searching the net does not reveal any blatantly obvious answers to this. Any suggestions that would help reduce CPU usage (or spread the updates out over a longer period) would be very much appreciated!
Have you looked into general PostgreSQL tuning here? The server can waste a lot of CPU cycles doing redundant work if it's not given enough resources to work with, and the default config is extremely small. Tuning Your PostgreSQL Server is a useful guide here, shared_buffers and checkpoint_segments are the two parameters you might get some significant improvement from on a slave (many of the rest only really help for improving query time).
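For illustration only (the values are guesses that must be sized to your hardware, and this assumes a pre-9.5 PostgreSQL of the kind Slony 2.0.2 was typically run against, where checkpoint_segments still exists):

# postgresql.conf
shared_buffers = 512MB        # default is tiny on old releases; ~25% of RAM is a common rule of thumb
checkpoint_segments = 16      # fewer, larger checkpoints mean less redundant write and CPU work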
Magnus might be right, this could very well just be a symptom of the fact that your database has very high traffic. Slony effectively multiplies the resource usage of any given DML operation: not only is data CRUD'ed to the replication master, but every time that happens, a Slony trigger (think of it as a change listener) generates an identical transaction and forwards it to the Slon process, which runs it on other members of the cluster.
However, there are two other possible explanations/solutions to this issue:
A possible solution might be to run the slon processes on a separate machine from your database hosts. Even if you have a single-master/single-slave replication scheme, it is advantageous in terms of stability, role segregation, and performance (which is your concern here) to run the slon replication daemons on a physically different set of hardware (on the same LAN segment, ideally). There is nothing about Slony that says it has to run on the same machine as a given database host, so putting it in a different location (think "traffic controller") might relieve some of the resource load on your database hosts. This is also a good idea in terms of both machine stability and scalability.
There's also a chance that this is only a temporary problem caused by the fact that you recently started using Slony. When you first subscribe a new node to a replication set, that node (and, to some extent, its parent) experiences VERY heavy CPU load (and possibly disk load as well) during the subscription process. I'm not sure how it works under the covers, but, depending on how much data was already on the subscribed node, Slony will check the master's data against every single piece of data present on the slave in replicated tables, and copy data down to the slave if it is missing or different. These are potentially CPU-intensive operations. Especially in large databases, the process of subscription can take a very long time (it took over a day for me, but our database is over 20GB), during which CPU load will be very high.
A simple way to see what Slony is up to is to use pgAdmin's Server Status viewer, which, while limited, will give you some useful info here. If there are a lot of "prepare table for replication" or "cleanup table after replication" operations in progress on the node that has a high CPU load, it's probably because a subscription isn't complete. pgAdmin's status viewer isn't too informative, however; there are more reliable ways of checking subscription progress using Slony directly. Section 4.7.6.4 in the Slony log-analysis documentation might help with that, as would reading the doc for SUBSCRIBE SET (pay special attention to the boxed warning message and the "Dangerous/Unintuitive Behavior" section).
A simple yet definitive hack to tell whether a set is still in the process of being subscribed is to run a MERGE SET and try to merge it with another (empty or not) set. MERGE SET will fail with a "subscriptions in progress" error if subscription is still running. However, that hack won't work on Slony 2.1; there, MERGE SET will just wait until subscriptions are finished.
The best way to reduce the CPU usage would be to put less data into the database :-)
Other than that, you can experiment with sync_interval. It may be what you're looking for.
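For illustration, a hedged sketch of a slon runtime config file (passed with slon -f slon.conf); the values are only starting points to experiment with. Raising sync_interval on the origin's slon means SYNC events are generated less often, so the slave applies changes in larger, less frequent batches:

# slon.conf
sync_interval = 10000          # generate SYNC events every 10 s instead of the 2 s default
sync_interval_timeout = 60000  # but still force a SYNC at least once a minute
sync_group_maxsize = 20        # allow more SYNCs to be grouped into one transaction on the subscriber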