How to count distinct keys on Flink + Pulsar - Scala

I am a newbie working with Flink and Pulsar.
I have a task to count distinct data from Pulsar on a SlidingProcessingTimeWindows in Flink.
My window size is 60s and my window slide is 5s.
My job consumes data from a Pulsar topic every second (it receives 2 messages per second):
00:
- a.example.com
- a.example-2.com
---
01:
- b.example.com
- a.example-2.com
---
02:
- c.example.com
- a.example-2.com
---
03:
- a.example.com
- a.example-2.com
---
04:
- b.example.com
- a.example-2.com
How can I group by key and count to receive these results:
example.com => 3
example-2.com => 1
I have spent a lot of time researching this, but I cannot resolve it.
I also have a problem: when the first window slide fires, my job receives all data from the current time back into the past, but I should only receive data within the window size.

Flink SQL is a good fit for this.
SELECT window_start, window_end, domain, COUNT(DISTINCT subdomain)
FROM TABLE(
    HOP(TABLE Events, DESCRIPTOR(`time`), INTERVAL '5' SECONDS, INTERVAL '60' SECONDS))
GROUP BY window_start, window_end, domain;
You can execute SQL directly from Scala via TableEnvironment#executeSql or you can use the Table API.
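For instance, a minimal Scala sketch of that approach (the Events table, its domain/subdomain columns, and the `time` timestamp column are assumptions; in practice they come from however you register your Pulsar source as a table):

import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api.bridge.scala.StreamTableEnvironment

object DistinctDomainCount {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val tableEnv = StreamTableEnvironment.create(env)

    // Assumption: an Events table is already registered from the Pulsar topic,
    // with columns `domain`, `subdomain`, and a time attribute column `time`.
    tableEnv
      .executeSql(
        """
          |SELECT window_start, window_end, domain, COUNT(DISTINCT subdomain) AS cnt
          |FROM TABLE(
          |  HOP(TABLE Events, DESCRIPTOR(`time`), INTERVAL '5' SECONDS, INTERVAL '60' SECONDS))
          |GROUP BY window_start, window_end, domain
          |""".stripMargin)
      .print()
  }
}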

Related

K8s inter service communication timeout

We have a K8s cluster (3 masters, 2 workers) - v1.17.
There are 2 microservices in this cluster, and microservice A calls a service named Common.
Sometimes I face this problem: a call from A to Common times out after 60s, although the request is processed very quickly and successfully inside Common (< 10ms).
getErrorInfoFallback : feign.RetryableException: Read timed out executing GET http://common-service.dev.svc.cluster.local:8002/errormapping/v1.0?errorCode=abcxyz
I use FeignClient to call the other microservice with a URL like http://common-service.dev.svc.cluster.local:8002.
Here is the timeline:
- 16:37:42.362 A sends the request
- 16:37:42.368 Common logs the request
- 16:37:42.378 Common logs the response being returned
- 16:38:42.424 A: timeout exception
Could anyone help me?
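For what it's worth, the 60-second gap in that timeline matches Feign's default read timeout (Request.Options defaults to a 10s connect timeout and a 60s read timeout), which suggests the response never made it back to A. A sketch of how those timeouts are usually tuned via Spring Cloud OpenFeign properties, with placeholder values - tightening them surfaces the failure faster but is not a fix for whatever is dropping the response:

# application.yml - placeholder values, not a fix for the underlying drop
feign:
  client:
    config:
      default:
        connectTimeout: 5000   # milliseconds
        readTimeout: 10000     # milliseconds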

cloudwatch alarm for redshift queryduration

I have the below CloudWatch alarm defined in a CF template for alerting me on queries running for 30 minutes or more.
Type: AWS::CloudWatch::Alarm
Properties:
  AlarmName: !Sub "awsredshift-${RSClusterName}-QueryDuration"
  AlarmDescription: Redshift QueryDuration Alarm
  Namespace: AWS/Redshift
  MetricName: QueryDuration
  Dimensions:
    - Name: ClusterIdentifier
      Value: !Ref RSClusterName
    - Name: latency
      Value: long
  ActionsEnabled: true
  AlarmActions:
    - !Ref TopicARN
  OKActions:
    - !Ref TopicARN
  ComparisonOperator: GreaterThanOrEqualToThreshold
  DatapointsToAlarm: 1
  EvaluationPeriods: 1
  Period: 300
  Statistic: Average
  Threshold: 1800000000
  TreatMissingData: missing
But it is activating the alarm even when there are no queries running that long - am I missing something?
Also, is there any way to customize the alarms to put logic in them? I would like to get the SQL text of the query which is running long. Is there any way to do this via CloudWatch alarms? If not, what's the best way to do it - probably Lambda?
An alternative approach you could use is to implement a Query Monitoring Rule in Redshift for queries where query_execution_time exceeds 30 minutes, using the log action to record the details of the query in the STL_WLM_RULE_ACTION table.
This captures all the info you might need about long-running queries but doesn't create an alert. However, it's easy enough to set something up yourself to do that; Amazon provides an example solution using Lambda here.
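A sketch of what such a rule could look like inside the cluster's wlm_json_configuration parameter (the rule name and queue settings are placeholders; query_execution_time is measured in seconds, so 30 minutes is 1800):

[
  {
    "query_concurrency": 5,
    "rules": [
      {
        "rule_name": "log_long_running_queries",
        "predicate": [
          { "metric_name": "query_execution_time", "operator": ">", "value": 1800 }
        ],
        "action": "log"
      }
    ]
  }
]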

How can one configure retry for IOExceptions in Spring Cloud Gateway?

I see that the Retry filter supports retries based on HTTP status codes. I would like to configure retries in case of IO exceptions, such as connection resets. Is that possible with Spring Cloud Gateway 2?
I was using 2.0.0.RC1. It looks like the latest snapshot build has support for retry based on exceptions. Fingers crossed for the next release.
Here is an example that retries twice for 500-series errors or IOExceptions:
filters:
  - name: Retry
    args:
      retries: 2
      series:
        - SERVER_ERROR
      exceptions:
        - java.io.IOException
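For context, a sketch of where that filter block sits in a complete application.yml route definition (the route id, backend URI, and path predicate are placeholders):

spring:
  cloud:
    gateway:
      routes:
        - id: example-route            # placeholder
          uri: http://localhost:8080   # placeholder backend
          predicates:
            - Path=/api/**
          filters:
            - name: Retry
              args:
                retries: 2
                series:
                  - SERVER_ERROR
                exceptions:
                  - java.io.IOException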

Which JMX metric should be used to monitor the status of a connector in Kafka Connect?

I'm using the following JMX metrics for Kafka Connect.
Have a look at the Connect Monitoring section in the Kafka docs; it lists all the Kafka Connect specific metrics.
For example, there are overall metrics for each connector:
kafka.connect:type=connector-metrics,connector="{connector}", which contains the connector status (running, failed, etc.)
kafka.connect:type=connector-task-metrics,connector="{connector}",task="{task}", which contains the status of individual tasks
If you want more than just the status, there are also additional metrics for both sink and source tasks:
kafka.connect:type=source-task-metrics,connector="{connector}",task="{task}"
kafka.connect:type=sink-task-metrics,connector="{connector}",task="{task}"
Elaborating on Mickael's answer, be careful: currently, task metrics disappear when a task is in a failed state rather than showing up with the FAILED status. A Jira can be found here and a PR can be found here.
Connector status is available under kafka.connect:type=connector-metrics.
With jmxterm, you may notice that the attributes are described as doubles instead of strings:
$>info
#mbean = kafka.connect:connector=dev-kafka-connect-mssql,type=connector-metrics
#class name = org.apache.kafka.common.metrics.JmxReporter$KafkaMbean
# attributes
%0 - connector-class (double, r)
%1 - connector-type (double, r)
%2 - connector-version (double, r)
%3 - status (double, r)
$>get status
#mbean = kafka.connect:connector=dev-kafka-connect-mssql,type=connector-metrics:
status = running;
This resulted in WARN logs from my monitoring agent:
2018-05-23 14:35:53,966 | WARN | JMXAttribute | Unable to get metrics from kafka.connect:type=connector-metrics,connector=dev-kafka-connect-rabbitmq-orders - status
java.lang.NumberFormatException: For input string: "running"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at org.datadog.jmxfetch.JMXAttribute.castToDouble(JMXAttribute.java:270)
at org.datadog.jmxfetch.JMXSimpleAttribute.getMetrics(JMXSimpleAttribute.java:32)
at org.datadog.jmxfetch.JMXAttribute.getMetricsCount(JMXAttribute.java:226)
at org.datadog.jmxfetch.Instance.getMatchingAttributes(Instance.java:332)
at org.datadog.jmxfetch.Instance.init(Instance.java:193)
at org.datadog.jmxfetch.App.instantiate(App.java:604)
at org.datadog.jmxfetch.App.init(App.java:658)
at org.datadog.jmxfetch.App.main(App.java:140)
Each monitoring system may have different fixes, but I suspect the cause may be the same?
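If you would rather poll the status yourself than fight an agent's type inference, here is a minimal Scala sketch that reads the attribute as a plain String over a JMX remote connection (the host, port, and connector name are placeholders):

import javax.management.ObjectName
import javax.management.remote.{JMXConnectorFactory, JMXServiceURL}

object ConnectorStatusCheck {
  def main(args: Array[String]): Unit = {
    // Placeholder endpoint: wherever your Connect worker exposes JMX.
    val url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi")
    val jmx = JMXConnectorFactory.connect(url)
    try {
      val mbeans = jmx.getMBeanServerConnection
      // Placeholder connector name, mirroring the mbean shown above.
      val name = new ObjectName("kafka.connect:type=connector-metrics,connector=my-connector")
      // The attribute is advertised as a double but actually holds a String.
      val status = mbeans.getAttribute(name, "status").toString
      println(s"connector status: $status")
    } finally {
      jmx.close()
    }
  }
}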

How to collect more than 22 event IDs with winlogbeat?

I've got a task to collect over 500 event IDs from a DC with winlogbeat, but Windows has a limit of 22 event IDs per query. I'm using version 6.1.2. I've tried with processors like this:
winlogbeat.event_logs:
  - name: Security
    processors:
      - drop_event.when.not.or:
          - equals.event_id: 4618
          ...
but with these settings the client doesn't work and there is nothing in the logs. If I run it from the exe file, it just starts and stops with no error.
If I try to do it the way it is written in the official manual:
winlogbeat.event_logs:
  - name: Security
    event_id: ...
    processors:
      - drop_event.when.not.or:
          - equals.event_id: 4618
          ...
the client just crashes with "invalid event log key processors found". I've also tried to create a new custom view and take events from there, but apparently it also has a query limit of 22 events.
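That error suggests this winlogbeat version does not accept processors as a key on an individual event log. A sketch of the same filter moved to the top level of winlogbeat.yml, where Beats does accept a global processors section (whether this works around the 22-ID query limit in 6.1.2 is an assumption worth testing):

winlogbeat.event_logs:
  - name: Security

# Global processors apply to every event the Beat ships.
processors:
  - drop_event.when.not.or:
      - equals.event_id: 4618
      # ...list the remaining event IDs the same way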