We have a K8s cluster (3 masters, 2 workers) running v1.17.
There are two microservices in this cluster; microservice A calls the Common service.
Sometimes I face this problem: a call from A to Common times out after 60s, even though the request is processed very quickly and successfully inside Common (< 10ms).
getErrorInfoFallback : feign.RetryableException: Read timed out executing GET http://common-service.dev.svc.cluster.local:8002/errormapping/v1.0?errorCode=abcxyz
I use FeignClient to call the other microservice with a URL like http://common-service.dev.svc.cluster.local:8002
Here is the timeline:
- 16:37:42.362 A sends the request
- 16:37:42.368 Common logs the request
- 16:37:42.378 Common logs the response being returned
- 16:38:42.424 A: timeout exception
Could anyone help me?
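For what it's worth, 60 seconds matches Feign's default read timeout, so the exception means A gave up waiting for a response it never received. A minimal sketch of how those timeouts can be adjusted, assuming Spring Cloud OpenFeign configuration properties (the values below are only examples):

# application.yml - example values, assuming Spring Cloud OpenFeign
feign:
  client:
    config:
      default:                  # applies to every Feign client; use a client name to scope it
        connectTimeout: 5000    # milliseconds
        readTimeout: 120000     # milliseconds

Note that, since your timeline shows Common answering within 10ms, the response is apparently being lost on the way back to A, so a longer timeout would only delay the exception rather than fix the root cause.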
I have the below CloudWatch alarm defined in a CloudFormation template for alerting me on queries running for 30 minutes or more.
Type: AWS::CloudWatch::Alarm
Properties:
  AlarmName: !Sub "awsredshift-${RSClusterName}-QueryDuration"
  AlarmDescription: Redshift QueryDuration Alarm
  Namespace: AWS/Redshift
  MetricName: QueryDuration
  Dimensions:
    - Name: ClusterIdentifier
      Value: !Ref RSClusterName
    - Name: latency
      Value: long
  ActionsEnabled: true
  AlarmActions:
    - !Ref TopicARN
  OKActions:
    - !Ref TopicARN
  ComparisonOperator: GreaterThanOrEqualToThreshold
  DatapointsToAlarm: 1
  EvaluationPeriods: 1
  Period: 300
  Statistic: Average
  Threshold: 1800000000
  TreatMissingData: missing
But it's triggering the alarm when there are no queries running that long. Am I missing something?
Also, is there any way to customize the alarms to put logic in them? I would like to get the SQL text of the query that is running long. Is there any way to do this via CloudWatch alarms? If not, what's the best way to do it - probably Lambda?
An alternative approach you could use is to implement a Query Monitoring Rule in Redshift for queries where query_execution_time exceeds 30 minutes, using the log action to record the details of the query in the STL_WLM_RULE_ACTION table.
This captures all the info you might need about long-running queries but doesn't create an alert. However, it's easy enough to set something up yourself to do that; Amazon provide an example solution using Lambda here.
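For illustration, here is a minimal sketch of how such a rule could be declared in the same CloudFormation template through the wlm_json_configuration parameter of a cluster parameter group (the resource name and queue settings are assumptions; query_execution_time is measured in seconds):

RSClusterParameterGroup:
  Type: AWS::Redshift::ClusterParameterGroup
  Properties:
    Description: WLM config with a rule that logs queries running over 30 minutes
    ParameterGroupFamily: redshift-1.0
    Parameters:
      - ParameterName: wlm_json_configuration
        # One default queue plus a query monitoring rule; matching queries
        # are recorded in STL_WLM_RULE_ACTION by the "log" action.
        ParameterValue: >-
          [{"query_concurrency": 5,
            "rules": [{"rule_name": "log_long_queries",
                       "predicate": [{"metric_name": "query_execution_time",
                                      "operator": ">", "value": 1800}],
                       "action": "log"}]}]

You can then poll STL_WLM_RULE_ACTION (or wire up the Lambda example mentioned above) to turn those logged rows into notifications.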
I see that the Retry filter supports retries based on HTTP status codes. I would like to configure retries in case of IO exceptions such as connection resets. Is that possible with Spring Cloud Gateway 2?
I was using 2.0.0.RC1. Looks like the latest build snapshot has support for retry based on exceptions. Fingers crossed for the next release.
Here is an example that retries twice for 500 series errors or IOExceptions:
filters:
  - name: Retry
    args:
      retries: 2
      series:
        - SERVER_ERROR
      exceptions:
        - java.io.IOException
I'm using the following JMX metrics for Kafka Connect.
Have a look at the Connect Monitoring section in the Kafka docs; it lists all the Kafka Connect-specific metrics.
For example, there are overall metrics for each connector:
- kafka.connect:type=connector-metrics,connector="{connector}", which contains the connector status (running, failed, etc.)
- kafka.connect:type=connector-task-metrics,connector="{connector}",task="{task}", which contains the status of individual tasks
If you want more than just the status, there are also additional metrics for both sink and source tasks:
- kafka.connect:type=connector-task-metrics,connector="{connector}",task="{task}"
- kafka.connect:type=sink-task-metrics,connector="{connector}",task="{task}"
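As an illustration, if you collect these beans with a JMX-based agent such as Datadog's JMXFetch, a minimal configuration sketch scoped to numeric task metrics might look like the following (the host, port, and chosen attributes are assumptions):

# Hypothetical conf.d/jmx.d/conf.yaml for Datadog's JMXFetch
init_config:
instances:
  - host: localhost
    port: 9999               # JMX port exposed by the Connect worker (assumption)
    conf:
      - include:
          domain: kafka.connect
          type: connector-task-metrics
          attribute:
            running-ratio:
              alias: kafka.connect.task.running_ratio
            batch-size-avg:
              alias: kafka.connect.task.batch_size_avg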
I still don't have enough rep to comment but I can answer...
Elaborating on Mickael's answer, be careful: currently, task metrics disappear when a task is in a failed state rather than showing up with the FAILED status. A Jira can be found here and a PR can be found here
Connector status is available under kafka.connect:type=connector-metrics.
With jmxterm, you may notice that the attributes are described as doubles instead of strings:
$>info
#mbean = kafka.connect:connector=dev-kafka-connect-mssql,type=connector-metrics
#class name = org.apache.kafka.common.metrics.JmxReporter$KafkaMbean
# attributes
%0 - connector-class (double, r)
%1 - connector-type (double, r)
%2 - connector-version (double, r)
%3 - status (double, r)
$>get status
#mbean = kafka.connect:connector=dev-kafka-connect-mssql,type=connector-metrics:
status = running;
This resulted in WARN logs from my monitoring agent:
2018-05-23 14:35:53,966 | WARN | JMXAttribute | Unable to get metrics from kafka.connect:type=connector-metrics,connector=dev-kafka-connect-rabbitmq-orders - status
java.lang.NumberFormatException: For input string: "running"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at org.datadog.jmxfetch.JMXAttribute.castToDouble(JMXAttribute.java:270)
at org.datadog.jmxfetch.JMXSimpleAttribute.getMetrics(JMXSimpleAttribute.java:32)
at org.datadog.jmxfetch.JMXAttribute.getMetricsCount(JMXAttribute.java:226)
at org.datadog.jmxfetch.Instance.getMatchingAttributes(Instance.java:332)
at org.datadog.jmxfetch.Instance.init(Instance.java:193)
at org.datadog.jmxfetch.App.instantiate(App.java:604)
at org.datadog.jmxfetch.App.init(App.java:658)
at org.datadog.jmxfetch.App.main(App.java:140)
Each monitoring system may have different fixes, but I suspect the cause is the same: the status attribute is reported as a double even though its value is a string, which the agent then fails to parse as a number.
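In the Datadog case shown above, one way to avoid the NumberFormatException is simply to never query the string-valued attributes (scope the include filters to numeric attributes only) and to watch connector status through another channel, for example the Connect REST API's /connectors/{name}/status endpoint. A sketch using Datadog's http_check, where the file path and port are assumptions and the connector name is taken from the jmxterm session above:

# Hypothetical conf.d/http_check.d/conf.yaml
init_config:
instances:
  - name: dev-kafka-connect-mssql-status
    url: http://localhost:8083/connectors/dev-kafka-connect-mssql/status
    # content_match is a regex checked against the response body;
    # the check fails when the connector is not RUNNING.
    content_match: '"state":\s*"RUNNING"'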
I've got a task to collect over 500 event IDs from a DC with Winlogbeat, but Windows has a limit of 22 event IDs per query. I'm using version 6.1.2. I've tried with processors like this:
winlogbeat.event_logs:
  - name: Security
    processors:
      - drop_event.when.not.or:
          - equals.event_id: 4618
          ...
but with these settings the client doesn't work, and there is nothing in the logs. If I run it from the exe file, it just starts and stops with no error.
If I try to do it as written in the official manual:
winlogbeat.event_logs:
  - name: Security
    event_id: ...
    processors:
      - drop_event.when.not.or:
          - equals.event_id: 4618
          ...
the client just crashes with "invalid event log key processors found". I've also tried to create a new custom view and take the events from there, but apparently it also has a query limit of 22 event IDs.
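The "invalid event log key" error suggests that in 6.1.2 processors are not accepted under winlogbeat.event_logs. A possible workaround, assuming the top-level processors section that Beats supports globally (the second event ID below is only an example), is to filter after collection instead:

winlogbeat.event_logs:
  - name: Security

# Top-level processors apply to all collected events,
# so the 22-ID query limit no longer constrains the filter.
processors:
  - drop_event:
      when:
        not:
          or:
            - equals.event_id: 4618
            - equals.event_id: 4624   # example; list the rest of your IDs here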