How can I get ZooKeeper to manage ranges of numbers (say 1 to 50, 51 to 100, etc.) and assign them to clients C1, C2, C3, so that each client gets a range as more clients join the group?
For each "Job Number" group there are multiple "Quantity" records. I only need one "Quantity" value per "Job Number", so I tried to get it by using an Average of the "Quantity". This works in isolation at the group level, but I then need to show an average of "Quantity" at a higher group level.
For example, Job Number 100000 has 4 records at the detail level with a Quantity of 5000 each, and Job Number 100001 has 2 records at the detail level with a Quantity of 2000 each.
Using the Average function, the group level shows Job Number 100000 with one quantity of 5000 and Job Number 100001 with one quantity of 2000.
At the higher group level, the average should be 3,500 (7,000 / 2), but it actually displays as 4,000 (24,000 / 6).
The formula should be the Sum of the Average Quantity / Distinct Count of Job Number.
For the life of me I can't get this to work
Any help appreciated
Thank you
Create a Running Total and set the evaluate condition to 'On Change of Group', so that each Job Number contributes its Quantity only once.
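To make the arithmetic from the question concrete, here is a minimal Python sketch (not Crystal Reports syntax) that contrasts the plain average over all detail records with the average taken over one value per Job Number. The variable names are made up for illustration; the numbers are the ones from the question.

```python
from collections import defaultdict

# Detail records from the question: (job_number, quantity)
records = [(100000, 5000)] * 4 + [(100001, 2000)] * 2

# Plain average over every detail record: what the higher group level shows now.
naive_average = sum(qty for _, qty in records) / len(records)

# Collapse to one quantity per job first (the per-job average),
# then divide the sum of those values by the distinct count of jobs.
per_job = defaultdict(list)
for job, qty in records:
    per_job[job].append(qty)
job_averages = {job: sum(qtys) / len(qtys) for job, qtys in per_job.items()}
average_of_jobs = sum(job_averages.values()) / len(job_averages)

print(naive_average)    # 4000.0
print(average_of_jobs)  # 3500.0
```

Evaluating once per group has the same effect as the second calculation: each Job Number is counted a single time regardless of how many detail rows it has.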
Hi, I have created a stream which has the following fields from the topic:
"id VARCHAR, src_ip VARCHAR, message VARCHAR"
Now I need to check whether failed_login repeats more than 3 times in a given time window and, if so, raise an alert. So I have created a table as below:
CREATE TABLE 231_console_failure AS \
SELECT src_ip, count(*) \
FROM console_failure \
WINDOW TUMBLING (SIZE 30 SECONDS) \
WHERE message = 'failed_login' \
GROUP BY src_ip \
HAVING count(*) > 3;
Now when I use my Python script to consume from the topic '231_console_failure', I continuously get None when there is no match.
When there is a match, i.e. more than 3 in 30 seconds, it gives that value. But if there are, say, 10 attempts in 30 seconds, the consumer fetches 7 messages, with counts running from 4 to 10.
I know I can handle this in the script by skipping the None values and taking only the highest count in the given window. But is there any way in KSQL to create a stream from the above table that contains only the matched messages with the GROUP BY applied?
This isn't currently possible in KSQL, but there is an enhancement request open if you want to upvote/track it: https://github.com/confluentinc/ksql/issues/1030
For now, per the same ticket, you can experiment with cache.max.bytes.buffering and commit.interval.ms to vary how often the aggregate is emitted.
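Until that is supported, the client-side workaround mentioned in the question could look roughly like the sketch below. It is plain Python with no Kafka client involved, and the message shape is an assumption: each record is taken to be a (key, count) pair where the key identifies the src_ip and window, and the count is None when there is no match.

```python
def final_counts(messages):
    """Keep the highest count seen per key and ignore None values.

    `messages` is an iterable of (key, count) pairs. The key layout
    (src_ip, window_start) is hypothetical; adapt it to whatever your
    consumer actually deserializes.
    """
    best = {}
    for key, count in messages:
        if count is None:
            continue  # no match for this update, nothing to alert on
        if count > best.get(key, 0):
            best[key] = count
    return best


# Ten failed logins in one window produce updates with counts 4..10.
updates = [(("10.0.0.5", 0), None)]
updates += [(("10.0.0.5", 0), n) for n in range(4, 11)]

print(final_counts(updates))  # {('10.0.0.5', 0): 10}
```

In a real consumer you would apply this per poll batch or per closed window rather than over the whole topic.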
I am trying to determine the total amount of messages sent from my G Suite account. I am streaming my logs to BigQuery.
Messages may appear in the logs multiple times, so I count the distinct message headers:
SELECT
count (DISTINCT message_info.rfc2822_message_id) as MessageIDCount
FROM `my data set`
This counts total inbound and outbound messages, so if I only want outbound I can add WHERE message_info.message_set.type = 8.
But the Gmail sending limits (https://support.google.com/a/answer/2956491#sendinglimitsforrelay) are not based only on the total number of messages sent; they also count the number of recipients.
I am interested in a query that gives the total number of messages sent, where an email sent to 2 people counts as 2, an email sent to 10 people counts as 10, etc.
Essentially I want to determine how close I was to hitting the sending limits for a given day.
Any Suggestions?
Below is for BigQuery Standard SQL
#standardSQL
SELECT
DATE(TIMESTAMP_MICROS(event_info.timestamp_usec)) day,
COUNT(address) AS total_recipients
FROM `project.dataset.gmail_log`,
UNNEST(message_info.destination) AS destination
WHERE EXISTS (SELECT 1 FROM UNNEST(message_info.message_set) WHERE type = 8)
GROUP BY day
It returns the daily total number of recipients in the format below:
Row  day         total_recipients
1    2019-01-10  100
2    2019-01-11  100
3    2019-01-12  100
4    2019-01-13  100
5    2019-01-14  100
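If it helps to see what the query computes, here is a rough Python equivalent of the same logic over a few in-memory rows. The field names mirror the ones used in the query; the sample rows, the addresses and the non-outbound type value 1 are made up for illustration. Like the query, it does not de-duplicate messages that appear in the logs more than once.

```python
from collections import Counter
from datetime import datetime, timezone

OUTBOUND = 8  # message_set type used for outbound mail in the question


def recipients_per_day(rows):
    """Sum the number of destinations per UTC day for outbound messages."""
    totals = Counter()
    for row in rows:
        info = row["message_info"]
        # Same idea as the EXISTS (... WHERE type = 8) filter.
        if not any(m["type"] == OUTBOUND for m in info["message_set"]):
            continue
        seconds = row["event_info"]["timestamp_usec"] // 1_000_000
        day = datetime.fromtimestamp(seconds, tz=timezone.utc).date()
        # One row per destination after UNNEST, so count the list length.
        totals[day] += len(info["destination"])
    return totals


def make_row(usec, msg_type, addresses):
    """Build a hypothetical log row with only the fields used above."""
    return {
        "event_info": {"timestamp_usec": usec},
        "message_info": {
            "message_set": [{"type": msg_type}],
            "destination": [{"address": a} for a in addresses],
        },
    }


rows = [
    make_row(1547082000000000, 8, ["a@example.com", "b@example.com"]),
    make_row(1547085600000000, 1, ["c@example.com"]),  # not outbound, skipped
    make_row(1547168400000000, 8, ["d@example.com"]),
]
totals = recipients_per_day(rows)
for day, count in sorted(totals.items()):
    print(day, count)
```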
I want to count the number of unique label values, kind of like:
select count (distinct a) from hello_info
For example if my metric 'hello_info' has labels a and b. I want to count number of unique a's. Here the count would be 3 for a = "1", "2", "3".
hello_info(a="1", b="ddd")
hello_info(a="2", b="eee")
hello_info(a="1", b="fff")
hello_info(a="3", b="ggg")
count(count by (a) (hello_info))
First you want an aggregator with a result per value of a, and then you can count them.
Other example:
If you want to count the number of apps deployed in a Kubernetes cluster based on the different values of a label (e.g. app):
count(count(kube_pod_labels{app=~".*"}) by (app))
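To spell out the two steps, here is a small Python sketch of what count(count by (a) (hello_info)) does with the four series from the question. It only models the label sets, not sample values or timestamps, and the variable names are made up.

```python
from collections import Counter

# The four hello_info series from the question, as label sets.
series = [
    {"a": "1", "b": "ddd"},
    {"a": "2", "b": "eee"},
    {"a": "1", "b": "fff"},
    {"a": "3", "b": "ggg"},
]

# Inner aggregation, count by (a): one result per distinct value of a.
per_a = Counter(labels["a"] for labels in series)

# Outer count: how many results the inner aggregation produced.
distinct_a = len(per_a)

print(dict(per_a))  # {'1': 2, '2': 1, '3': 1}
print(distinct_a)   # 3
```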
The count(count(hello_info) by (a)) is equivalent to the following SQL:
SELECT
time_bucket('5 minutes', timestamp) AS t,
COUNT(DISTINCT a)
FROM hello_info
GROUP BY t
See time_bucket() function description.
I.e. it returns the number of distinct values for the label per each 5-minute interval by default - see staleness docs for details about the 5-minute interval.
If you need to calculate the number of unique values for a label over custom interval (for example, over the last day), then the following PromQL query must be used instead:
count(count(last_over_time(hello_info[1d])) by (a))
The custom interval - 1d in the case above - can be changed to an arbitrary value - see these docs for possible values, which can be used there.
This query uses the last_over_time() function to select all the time series which were active during the last day. Time series can stop receiving new samples and become inactive at any time. Such time series aren't captured by a simple count(...) by (a) after 5 minutes of inactivity. New deployments in Kubernetes and horizontal pod autoscaling are the most frequent sources of a large number of inactive time series (aka high churn rate).
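The difference between the two queries can be illustrated with a deliberately simplified Python model: each series is reduced to its labels plus the time of its last sample, and a series is considered visible if that sample falls inside the lookback window. Real staleness handling has more to it, so treat this only as a sketch; the function name and the sample data are made up.

```python
DEFAULT_LOOKBACK = 5 * 60   # seconds, the 5-minute interval mentioned above
ONE_DAY = 24 * 60 * 60      # seconds, matches the [1d] window


def distinct_values(last_samples, label, now, window):
    """Count distinct values of `label` among series seen within `window`."""
    return len({
        labels[label]
        for labels, last_ts in last_samples
        if now - last_ts <= window
    })


now = 1_000_000
last_samples = [
    ({"a": "1", "b": "ddd"}, now - 60),           # still receiving samples
    ({"a": "2", "b": "eee"}, now - 2 * 60 * 60),  # inactive for 2 hours
    ({"a": "3", "b": "ggg"}, now - 2 * ONE_DAY),  # inactive for 2 days
]

print(distinct_values(last_samples, "a", now, DEFAULT_LOOKBACK))  # 1
print(distinct_values(last_samples, "a", now, ONE_DAY))           # 2
```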
We have a stream of events each having the following properties:
public class Event {
private String id;
private String src;
private String dst;
}
Besides, we have a set of hierarchical or nested rules we want to model with EPL and Esper. Each rule should be applied if and only if all of its parent rules have been already activated (a matching instance occurred for all of them). For example:
2 events or more with the same src and dst in 10 seconds
+ 5 or more with src, dst the same as the src, dst in the above rule in 20s
+ 100 or more with src, dst the same as the src, dst in the above rules in 30s
We want to retrieve all event instances corresponding to each level of this rule hierarchy.
For example, considering following events:
id    source         destination    arrival time (second)
1     192.168.1.1    192.168.1.2    1
2     192.168.1.1    192.168.1.2    2
3     192.168.1.1    192.168.1.3    3
4     192.168.1.1    192.168.1.2    4
5     192.168.1.5    192.168.1.8    5
6     192.168.1.1    192.168.1.2    6
7     192.168.1.1    192.168.1.2    7
8     192.168.1.1    192.168.1.2    8
.....
100 other events from 192.168.1.1 to 192.168.1.2 in less than 20 seconds
We want our rule hierarchy to report this instance together with the id of all events corresponding to each level of the hierarchy. For example, something like the following report is required:
2 or more events with src 192.168.1.1 and dst 192.168.1.2 in 10 seconds ( Ids:1,2 )
+ 5 or more with the same src (192.168.1.1) and dst (192.168.1.2) in 20s (Ids:1,2,4,6,7)
+ 100 or more events from 192.168.1.1 to 192.168.1.2 in 30s (Ids:1,2,4,6,7,8,...)
How can we achieve this (retrieve the ids of the events matched with all rules) in Esper EPL?
Complex use case, it will take some time to model this. I'd start simple with keeping a named window and using some match-recognize or EPL patterns. For the rule nesting, I'd propose triggering other statements using insert-into. A context partition that gets initiated by a triggering event may also come in handy. For the events that are shared between rules, if any, go against the named window using a join or subquery, for example. For the events that arrive after a triggering event of the first or second rule, just use EPL statements that consume the triggering event. Start simple and build it up, become familiar with insert-into and declaring an overlapping context.
You could use each rule as the input for the next rule in the hierarchy. For example, the first rule listens for matching events in the last 10 seconds and inserts its results into a new stream with the 'insert into' clause, so the next rule is triggered by events in the new stream, and so on. It is a pretty simple use case and can be done even without context partitions.
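To show the chaining idea outside of Esper, here is a plain Python sketch (not EPL) in which each rule is only evaluated for a src/dst pair once its parent rule has fired for that pair. The window semantics, the function name and the report format are assumptions made for illustration, and each level fires at most once per pair.

```python
from collections import defaultdict, deque

# (min_events, window_seconds) for each level of the hierarchy
RULES = [(2, 10), (5, 20), (100, 30)]


def run_hierarchy(events, rules=RULES):
    """events: iterable of (id, src, dst, t) sorted by arrival time t.

    Returns a list of (level, (src, dst), ids) in firing order.
    """
    history = defaultdict(deque)      # (src, dst) -> deque of (t, id)
    level_reached = defaultdict(int)  # (src, dst) -> rules fired so far
    max_window = max(seconds for _, seconds in rules)
    reports = []
    for event_id, src, dst, t in events:
        key = (src, dst)
        window = history[key]
        window.append((t, event_id))
        # Drop events that no rule can see any more.
        while window and t - window[0][0] >= max_window:
            window.popleft()
        # Only the next unfired rule is checked, so a rule runs
        # only after all of its parents have matched.
        while level_reached[key] < len(rules):
            min_events, seconds = rules[level_reached[key]]
            ids = [i for ts, i in window if t - ts < seconds]
            if len(ids) < min_events:
                break
            level_reached[key] += 1
            reports.append((level_reached[key], key, ids))
    return reports


events = [
    (1, "192.168.1.1", "192.168.1.2", 1),
    (2, "192.168.1.1", "192.168.1.2", 2),
    (3, "192.168.1.1", "192.168.1.3", 3),
    (4, "192.168.1.1", "192.168.1.2", 4),
    (5, "192.168.1.5", "192.168.1.8", 5),
    (6, "192.168.1.1", "192.168.1.2", 6),
    (7, "192.168.1.1", "192.168.1.2", 7),
    (8, "192.168.1.1", "192.168.1.2", 8),
]
for level, key, ids in run_hierarchy(events):
    print(level, key, ids)
```

With the first eight events from the question, level 1 fires on event 2 with ids 1,2 and level 2 fires on event 7 with ids 1,2,4,6,7. In Esper the same hand-off between levels would be done by having each statement insert into the stream that the next one reads.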