Elasticsearch Marvel - Turn off logging

Is it possible to have Elasticsearch Marvel installed but have it not collect any data? Would changing the template to 0 shards and 0 replicas do that for me?

Just add this line to the config/elasticsearch.yml file to stop a node producing data:
marvel.agent.enabled: false
See the configuration docs for Marvel.

Alternatively, set the following in config/elasticsearch.yml:
marvel.agent.interval: -1
Then restart Elasticsearch. From the docs:
marvel.agent.interval
Controls how often data samples are collected. Defaults to 10s. Set to -1 to temporarily disable data collection. You can update this setting through the Cluster Update Settings API.
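For example, a sketch of the same change applied at runtime via that API (assuming a node reachable on localhost:9200; the setting can be made persistent instead of transient):
PUT http://localhost:9200/_cluster/settings
{
  "transient": {
    "marvel.agent.interval": "-1"
  }
}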

Related

How to define max.poll.records (SCS with Kafka) over containers

I'm trying to figure out the poll records mechanism for Kafka over SCS in a K8s environment.
What is the recommended way to control max.poll.records?
How can I poll the defined value?
Is it possible to define it once for all channels and then override for a specific channel?
(referring to this comment from the documentation):
To avoid repetition, Spring Cloud Stream supports setting values for all channels, in the format of spring.cloud.stream.kafka.default.consumer.<property>=<value>. The following properties are available for Kafka consumers only and must be prefixed with spring.cloud.stream.kafka.bindings.<channelName>.consumer.
Is this path supported: spring.cloud.stream.binding.<channel name>.consumer.configuration?
Or this: spring.cloud.stream.kafka.binding.<channel name>.consumer.configuration?
How are conflicts resolved, say in a case where both spring.cloud.stream.binding... and spring.cloud.stream.kafka.binding... are set?
I've tried all of the mentioned configurations, but couldn't see in the logs what the actual max.poll.records value is, and frankly the documentation is not entirely clear on the subject.
These are the configurations:
spring.cloud.stream.kafka.default.consumer.configuration.max.poll.records - the default if nothing else is specified for a given channel
spring.cloud.stream.kafka.bindings.<channelName>.consumer.configuration.max.poll.records - overrides the default for that specific channel
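For illustration, a hedged sketch of how these two levels could be combined in application.properties (myChannel is a hypothetical binding name and the numbers are arbitrary):
spring.cloud.stream.kafka.default.consumer.configuration.max.poll.records=500
spring.cloud.stream.kafka.bindings.myChannel.consumer.configuration.max.poll.records=100
One way to verify which value actually applies: the Kafka consumer client normally prints its effective ConsumerConfig values (including max.poll.records) at INFO level on startup.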

Detect Failover of MongoDB-Cluster with Spring-Data-MongoDB

Current Situation
we have a MongoDB-Cluster with 1 primary node and 2 secondary nodes
our Spring-Boot application is using the Spring-Data-MongoDB framework to read from and write to the cluster
Problem
in some circumstances the MongoDB cluster will change the primary node (for example during the resizing of the cluster)
this fail-over phase will affect our Spring-Boot application
when some reads or writes are still ongoing and the fail-over happens, we receive an exception, because the MongoDB server is not reachable anymore for our application
we have to deal with this state somehow
Questions
1. What is the best way to handle those fail-over states?
I've come across the following documentation:
retryable writes
retryable reads
would it be sufficient to set the retryReads and retryWrites flags to true and specify the primary node and the secondary nodes in the connection URL (see the example connection string below)? Or should we catch the connection exception (or alternatively listen to some fail-over event) and handle those cases ourselves?
we also have to deal with the following problem: what happens if only 50 % of some bulk-write data got successfully written to the primary node and the other 50 % did not? How do we ideally handle those cases?
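For reference, the kind of connection URL mentioned above might look like this (hosts, replica set name and database are placeholders):
mongodb://host1:27017,host2:27017,host3:27017/mydb?replicaSet=rs0&retryWrites=true&retryReads=true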
this leads us to the second question ...
2. How to detect the fail-over event in Spring-Boot ?
for our application a possible solution would be to automatically detect the fail-over state of the MongoDB cluster and then just trigger a restart of our Spring-Boot application.
is there a way to listen to a specific MongoDB event via spring-data-mongodb in order to deal with the case that the primary node has changed?
alternatively: is there a specific exception we should catch and handle?
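For illustration, this is roughly the kind of hook we have in mind; a sketch only, assuming the MongoDB Java driver 4.x event API and Spring Boot's MongoClientSettingsBuilderCustomizer (whether this reliably captures the fail-over is part of the question):

import com.mongodb.event.ClusterDescriptionChangedEvent;
import com.mongodb.event.ClusterListener;
import org.springframework.boot.autoconfigure.mongo.MongoClientSettingsBuilderCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MongoFailoverListenerConfig {

    @Bean
    public MongoClientSettingsBuilderCustomizer clusterListenerCustomizer() {
        ClusterListener listener = new ClusterListener() {
            @Override
            public void clusterDescriptionChanged(ClusterDescriptionChangedEvent event) {
                boolean hadPrimary = event.getPreviousDescription().hasWritableServer();
                boolean hasPrimary = event.getNewDescription().hasWritableServer();
                if (hadPrimary && !hasPrimary) {
                    // primary lost: log it, flip a health indicator, or trigger the restart
                }
                if (!hadPrimary && hasPrimary) {
                    // a (new) primary is reachable again
                }
            }
        };
        // register the listener on the driver's cluster settings
        return builder -> builder.applyToClusterSettings(cluster -> cluster.addClusterListener(listener));
    }
}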
I hope somebody can help us here.
Thank you in advance!

How to configure druid properly to fire a periodic kill task

I have been trying to get druid to fire a kill task periodically to clean up unused segments.
These are the configuration variables responsible for it
druid.coordinator.kill.on=true
druid.coordinator.kill.period=PT45M
druid.coordinator.kill.durationToRetain=PT45M
druid.coordinator.kill.maxSegments=10
From the above configuration, my mental model is that once ingested data is marked unused, the kill task will fire and delete the segments that are older than 45 minutes while retaining 45 minutes' worth of data. period and durationToRetain are the config variables that are confusing me; I'm not quite sure how to leverage them. Any help would be appreciated.
The caveat for druid.coordinator.kill.on=true is that segments are deleted only from whitelisted datasources. The whitelist is empty by default.
To populate the whitelist with all datasources, set killAllDataSources to true. Once I did that, the kill task fired as expected and deleted the segments from s3 (COS). This was tested for Druid version 0.18.1.
Now, while the above configuration properties can be set when you build your image, killAllDataSources needs to be set through an API. It can also be set via the Druid UI.
When you click the option, a modal appears that has Kill All Data Sources. Click on True and you should see a kill task firing at the specified interval (under Ingestion -> Tasks). It would be really nice to have this as part of runtime.properties or some other common configuration file that we could set the value in when building the Druid image.
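For reference, a sketch of setting that flag through the coordinator's dynamic-configuration API (the host is a placeholder and the payload should be checked against your Druid version):
POST http://<coordinator-host>:8081/druid/coordinator/v1/config
{
  "killAllDataSources": true
}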
Use crontab; it works quite well for us.
If you want to have control outside of Druid over segment removal, then you must use a scheduled task which runs at your desired interval and registers kill tasks in Druid. This can increase your control over your segments, since once they are gone you cannot recover them. You can use this script as a starting point:
https://github.com/mostafatalebi/druid-kill-task
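For reference, the kill task such a scheduled job would submit to the overlord looks roughly like this (the host, datasource name and interval are placeholders):
POST http://<overlord-host>:8090/druid/indexer/v1/task
{
  "type": "kill",
  "dataSource": "my_datasource",
  "interval": "2020-01-01/2020-02-01"
}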

Kafka Jdbc Connect timestamp+incrementing mode

I am using Kafka Jdbc Connect timestamp+incrementing mode to sync a table rows to Kafka. Reference https://docs.confluent.io/current/connect/connect-jdbc/docs/source_config_options.html#mode
The challenge is that the table gets synced from the beginning of time, since the start time by default is 1970. Is there any way to override the start time, i.e. sync only from a given start date onward?
You need to set timestamp.initial to the start date you desire. It needs to be specified in epoch format.
The SQL query in timestamp+incrementing mode is appended with:
WHERE "DBTime" < 'system high date i.e. 9999-12-12T23:59:59+00:00' AND (("DBTime" = 'timestamp.initial' AND "DBKey" > '-1') OR "DBTime" > 'timestamp.initial') ORDER BY "DBTime","DBKey" ASC
https://docs.confluent.io/kafka-connect-jdbc/current/source-connector/source_config_options.html#mode
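For illustration, a sketch of the relevant part of a source connector configuration (column names follow the query above; the epoch value is an arbitrary example and should be replaced with your desired start date in milliseconds):
{
  "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
  "mode": "timestamp+incrementing",
  "timestamp.column.name": "DBTime",
  "incrementing.column.name": "DBKey",
  "timestamp.initial": 1577836800000
}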
In case you want to start from a given offset with your connector, I'd suggest overwriting the information stored in the connect-offsets topic.
Through the Kafka REST API you can easily read the content of this topic:
http://localhost:8082/topics/connect-offsets
Looking through the code of kafka-connect-jdbc, specifically the methods relevant to the use case you've described:
io.confluent.connect.jdbc.source.TimestampIncrementingCriteria#extractValues
io.confluent.connect.jdbc.source.TimestampIncrementingOffset#getTimestampOffset
it seems that overwriting the connect-offsets topic content is the only way available at the moment.
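As an alternative to the REST proxy mentioned above, a sketch of inspecting that topic with the console consumer (the broker address is a placeholder):
kafka-console-consumer --bootstrap-server localhost:9092 --topic connect-offsets --from-beginning --property print.key=true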

Kafka Streams - Low-Level Processor API - RocksDB TimeToLive(TTL)

I'm kind of experimenting with the low level processor API. I'm doing data aggregation on incoming records using the processor API and writing the aggregated records to RocksDB.
However, I want the records added to RocksDB to stay active only for a 24-hour period; after 24 hours a record should be deleted. This can be done by changing the TTL settings, but there is not much documentation where I can get help on this.
How do I change the TTL value? What Java API should I use to set the TTL to 24 hours, and what is the current default TTL setting?
I believe this is not currently exposed via the api or configuration.
RocksDBStore passes a hard-coded TTL when opening a RocksDB:
https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBStore.java#L158
and the hardcoded value is simply TTL_SECONDS = TTL_NOT_USED (-1) (see line 79 in that same file).
There are currently two open tickets regarding exposing TTL support in the state stores: KAFKA-4212 and KAFKA-4273:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20text%20~%20%22rocksdb%20ttl%22
I suggest you comment on one of them describing your use case to get them moving forward.
In the interim, if you need the TTL functionality right now, state stores are pluggable and the RocksDBStore source is readily available, so you can fork it and set your TTL value (or, as the pull request associated with KAFKA-4273 proposes, source it from the configs).
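For illustration, the core change such a fork would make is to open RocksDB's TTL variant instead of the plain database; a standalone sketch only (not the actual RocksDBStore code, with a simplified path and options, assuming a reasonably recent rocksdbjni):

import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.TtlDB;

public class TtlStoreSketch {
    public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();
        final int ttlSeconds = 24 * 60 * 60; // 24 hours
        try (Options options = new Options().setCreateIfMissing(true);
             // Entries older than ttlSeconds become eligible for deletion during compaction,
             // so expiry is approximate rather than immediate.
             TtlDB db = TtlDB.open(options, "/tmp/ttl-store-sketch", ttlSeconds, false)) {
            db.put("some-key".getBytes(), "some-aggregated-value".getBytes());
        }
    }
}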
I know this is not ideal and sincerely hope someone comes up with a more satisfactory answer.