Siddhi Query : group by results

I am having trouble understanding the results of my group by query. My source stream, intermediateStream, contains the following data:
ts uid id_resp_h
2016-05-08 08:59 CLuCgz3HHzG7LpLwH9 172.30.26.119
2016-05-08 09:00 C3WnnK3TgUf2cSzxVa 172.30.26.127
2016-05-08 09:00 C3WnnK3TgUf2cSzxff 172.30.26.119
The Siddhi query is:
from intermediateStream
select ts, count(ts) as ssh_logins
group by ts
insert into SSHOutStream;
I expect the output to be:
ts ssh_logins
2016-05-08 08:59 1
2016-05-08 09:00 2
But instead it returns
ts ssh_logins
2016-05-08 08:59 1
2016-05-08 09:00 1
2016-05-08 09:00 2
Any suggestions?

Siddhi processes events in real time, as they arrive. In this scenario you get count = 1 for the second event because, at that point, it is the only event with ts = 2016-05-08 09:00 that has arrived. When the third event arrives you get count = 2, since the previous event had the same ts value.
To get the desired result, use a time batch window, which waits until the specified time has elapsed before emitting output
(e.g. from intermediateStream#window.timeBatch(1 min)).
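
The difference between the two behaviours can be sketched outside Siddhi. The Python below is only a simulation of the idea, not Siddhi code: `running_counts` and `batch_counts` are hypothetical helper names, and the batch version assumes all three events from the question land in the same batch.

```python
from collections import Counter

# The three events from the question: (ts, uid, id_resp_h).
events = [
    ("2016-05-08 08:59", "CLuCgz3HHzG7LpLwH9", "172.30.26.119"),
    ("2016-05-08 09:00", "C3WnnK3TgUf2cSzxVa", "172.30.26.127"),
    ("2016-05-08 09:00", "C3WnnK3TgUf2cSzxff", "172.30.26.119"),
]

def running_counts(events):
    """Emit one output row per arriving event, as the query without a window does."""
    seen = Counter()
    out = []
    for ts, _uid, _host in events:
        seen[ts] += 1
        out.append((ts, seen[ts]))
    return out

def batch_counts(events):
    """Collect the whole batch first, then emit one row per ts group."""
    totals = Counter(ts for ts, _uid, _host in events)
    return sorted(totals.items())

print(running_counts(events))  # three rows, one per event
print(batch_counts(events))    # two rows, one per distinct ts
```

The first function reproduces the three rows seen in the question; the second gives the two rows that were expected.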

Related

Redshift how to subtract the number of minutes from two different date columns

I would like to subtract two date columns and get the difference in minutes. In the table below, a notification has an ideal_date of 12/29 1pm and an actual_date of 12/30 1pm, which means it took 24 hours, or 1440 minutes, for the notification to be sent out.
I tried the following query, but I'm not getting what I need.
select n.ideal_date,
n.actual_date,
abs(date_part(minute,n.ideal_date) - date_part(minute,n.actual_date)) as minutes
from table_date n
id | ideal_date | actual_date | minutes
58 | 12/29/2021, 1:00pm | 12/30/2021, 1:00pm | 1440 mins

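date_part(minute, ...) returns only the minute-of-hour field of each timestamp (0 for both values here), so the difference of those fields says nothing about the elapsed time. You want DATEDIFF(): https://docs.aws.amazon.com/redshift/latest/dg/r_DATEDIFF_function.html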
select n.ideal_date,
n.actual_date,
abs(DATEDIFF(minute,n.ideal_date,n.actual_date)) as minutes
from table_date n
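
As a sanity check on the arithmetic, the same comparison can be reproduced in Python. This is a sketch, not Redshift code: `minute_field_diff` and `elapsed_minutes` are hypothetical names, and the two timestamps are the single row from the question.

```python
from datetime import datetime

# The row from the question.
ideal = datetime(2021, 12, 29, 13, 0)
actual = datetime(2021, 12, 30, 13, 0)

def minute_field_diff(a, b):
    """Mimics the original attempt: compares only the minute-of-hour fields."""
    return abs(a.minute - b.minute)

def elapsed_minutes(a, b):
    """Whole minutes elapsed between the two timestamps, regardless of order."""
    return int(abs((b - a).total_seconds()) // 60)

print(minute_field_diff(ideal, actual))  # 0
print(elapsed_minutes(ideal, actual))    # 1440
```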

Truncate date by month with custom start day in postgres

I am getting some statistics using a query like
SELECT date_trunc('month', created_at) AS time, count(DISTINCT "user_id") AS mau
FROM "session"
GROUP BY time
ORDER BY time;
This works fine if I want monthly active users for each calendar month. But I would like to shift the result to show the last X months counted back from today instead of actual calendar months. How can I do this? Can I add an offset in some way?
EDIT
As an example, I am currently getting results like
time | mau
2022-04-01 | 10
2022-05-01 | 20
2022-06-01 | 30
But I would like it to be something like (where 2022-06-07 is today)
time | mau
2022-04-07 | 10
2022-05-07 | 20
2022-06-07 | 30

Tableau Inventory over Time

I have a report in which I'm trying to show in-process inventory changes over time. Each record has an ID, a Start Date and an End Date. For instance:
ID | Start Date | End Date
12345 | 1/15/2021 | 4/30/2021
23456 | 3/1/2021 | 3/31/2021
I'd like to be able to show a report, by month, that displays an inventory as of the beginning of the month. Using the data above, I need to show:
Month Beg | 1/1/2021 | 2/1/2021 | 3/1/2021 | 4/1/2021 | 5/1/2021
12345 | | +1 | | | -1
23456 | | | +1 | -1 |
Inventory | 0 | 1 | 2 | 1 | 0
I don't have the ability to join 2 tables within Tableau as this procedure is restricted. Any other ways around this?

Joining time series events with daily 'shift' data?

What is the best practice for joining 'shift' data and other time series data in Tableau? I am working with data from multiple regions (LA, India, UK, NY, Malaysia, Australia, China, etc.), and a lot of employees work past midnight.
For example, an employee has a shift from 9 PM to 6 AM on 2016-07-31. The 'report date' is 2016-07-31, but no time zone information is provided.
This employee does work and there are events (time stamps in UTC) between 2016-07-31 21:00 to 2016-08-01 06:00. When I look at the events though, 7/31 will only have the events between 21:00 and 23:59. If I filter for just July, my calculations will be skewed (the event data will be cut off at midnight even though the shift extended to 6 AM).
I need to make calculations based upon the total time an employee was actually engaged with work (productive) and the total time they were paid. The request is for this to be daily/weekly/monthly.
If anyone can help me out here or give me some talking points to explain this to my superiors, it would be appreciated. This seems like it must be a common scenario. Do I need to request a new raw data format, or is there something I can do on my end?
The shift data looks like this:
id date regular_hours overtime_hours total_hours
abc 2016-06-17 8 0.52 8.52
abc 2016-06-18 7.64 0.83 8.47
abc 2016-06-19 7.87 0.23 8.1
The event data is more detailed (30-minute interval data on the events handled and the time it took to complete them, in seconds):
id date interval events event_duration
abc 2016-06-17 01:30:00 4 688
abc 2016-06-17 02:00:00 6 924
abc 2016-06-17 02:30:00 10 1320
So, you sum up the event_duration for an entire day and you get the number of seconds actually spent doing work. You can then compare this to the amount of time the employee was paid for to see how efficient the staffing is.
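
To make the calculation concrete, here is a rough Python sketch of it using only the sample rows above. The names `productive_seconds` and `utilization` are made up for illustration, and the match is done on (id, date) exactly as it appears in each table, so it has the same midnight cut-off problem I am asking about.

```python
# Sample rows from above: (id, date, interval, events, event_duration in seconds).
event_rows = [
    ("abc", "2016-06-17", "01:30:00", 4, 688),
    ("abc", "2016-06-17", "02:00:00", 6, 924),
    ("abc", "2016-06-17", "02:30:00", 10, 1320),
]
# (id, date, regular_hours, overtime_hours, total_hours)
shift_rows = [("abc", "2016-06-17", 8.0, 0.52, 8.52)]

def productive_seconds(event_rows):
    """Sum event_duration per (id, date)."""
    totals = {}
    for emp, day, _interval, _events, duration in event_rows:
        totals[(emp, day)] = totals.get((emp, day), 0) + duration
    return totals

def utilization(event_rows, shift_rows):
    """Productive seconds divided by paid seconds, per (id, date)."""
    worked = productive_seconds(event_rows)
    return {
        (emp, day): worked.get((emp, day), 0) / (total_hours * 3600)
        for emp, day, _regular, _overtime, total_hours in shift_rows
    }

print(productive_seconds(event_rows))
print(utilization(event_rows, shift_rows))
```

With only three intervals in the sample, the ratio is of course far lower than it would be for a full day of event data.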
My concern is that the event data has the date and the time (UTC). The payroll data only has a date without any time zone information. This causes inaccuracies when blending data in Tableau because some shifts cross midnight. Is there a way around this or do I need to propose new data requirements?
(FYI - people have been calculating it just based on the date for years most likely without considering time zones before. My assumption is that they just did not realize that this could cause inaccurate results)

Calculating total working hours based on shifts

I would like to calculate how many hours each employee has worked for a certain time period, based on information from this table:
start employee_id
2014-08-10 18:10:00 5
2014-08-10 13:30:00 7
2014-08-10 09:00:00 7
2014-08-09 23:55:00 4
2014-08-09 16:23:00 12
2014-08-09 03:59:00 9
2014-08-08 20:05:00 7
2014-08-08 13:00:00 8
Each employee replaces another employee, and a shift ends when the next employee's shift starts, so there are no empty slots.
The desired format of the result would be the following:
employee_id total_minutes_worked
I'm trying to think of the best way to achieve this, so any help will be appreciated!

You can get the total time as:
select employee_id, sum(stop - start) as total_time
from (
    -- each shift stops when the next shift (ordered by start) begins
    select start, lead(start) over (order by start) as stop, employee_id
    from t
) as x
group by employee_id;
It remains to format the time, but I assume this is not what puzzles you.
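
The same pairing logic can be sketched in Python to check the numbers against the sample data. This is an illustration rather than a replacement for the query: `total_minutes_worked` is a hypothetical function name, and the most recent shift is skipped because it has no following start to act as its stop (the query's lead() returns NULL for that row).

```python
from collections import defaultdict
from datetime import datetime

# (start, employee_id) rows from the question.
rows = [
    ("2014-08-10 18:10:00", 5),
    ("2014-08-10 13:30:00", 7),
    ("2014-08-10 09:00:00", 7),
    ("2014-08-09 23:55:00", 4),
    ("2014-08-09 16:23:00", 12),
    ("2014-08-09 03:59:00", 9),
    ("2014-08-08 20:05:00", 7),
    ("2014-08-08 13:00:00", 8),
]

def total_minutes_worked(rows):
    """Sum minutes per employee, treating the next shift's start as this shift's stop."""
    shifts = sorted(
        (datetime.strptime(start, "%Y-%m-%d %H:%M:%S"), emp) for start, emp in rows
    )
    totals = defaultdict(int)
    # zip pairs each shift with the one after it; the last shift has no pair.
    for (start, emp), (stop, _next_emp) in zip(shifts, shifts[1:]):
        totals[emp] += int((stop - start).total_seconds() // 60)
    return dict(totals)

print(total_minutes_worked(rows))
```

For the sample rows this gives 1024 minutes for employee 7 (474 + 270 + 280), and employee 5 does not appear because their shift is still open.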

You should use a GROUP BY clause to first group the rows by employee id.
Then calculate the time worked from the start time and end time of each shift.
(Note: this requires storing both the start time and the end time of every shift for each employee.)
