I have a PostgreSQL table like this, with:
- the agent who answered the call,
- the caller's number,
- the date and time the call ended.
Agent | Caller | DateTime
------------------------------------
101 | 555-1234 | 12-01-16 00:00
101 | 555-1234 | 12-01-16 01:00
102 | 555-1234 | 12-01-16 02:00
102 | 555-1234 | 13-01-16 06:00
1. I need to check each row and look back 24 hours from the call to see whether it is a recall. For example, the call on the 13th is not a recall, because the previous call from this number was more than 24 hours earlier.
2. I need to get the agent who handled the call that is being recalled.
I need to display it like this:
Agent | Caller | DateTime | Recalling | Recalling-From
----------------------------------------------------------------------
101 | 555-1234 | 12-01-16 00:00 | NO |
101 | 555-1234 | 12-01-16 01:00 | YES | 101
102 | 555-1234 | 12-01-16 02:00 | YES | 101
102 | 555-1234 | 13-01-16 06:00 | NO |
My query is:
WITH maiordata AS (
SELECT
to_char(datahora_entrada_fila, 'DD/MM/YYYY') as dia,
calleridnum As numero,
MAX(datahora_inicio) AS data_ini
FROM callcenter.chamada_fila_in cfin
WHERE datahora_inicio BETWEEN '2016-12-01 00:00:00' AND '2016-12-01 23:59:59'
AND status_chamada = 'Finalizada'
GROUP BY dia,numero
)
SELECT
to_char(datahora_entrada_fila, 'DD/MM/YYYY') as dia,
(SELECT count(calleridnum) FROM callcenter.chamada_fila_in f INNER JOIN maiordata md ON f.calleridnum = md.numero WHERE calleridnum = cfin.calleridnum AND datahora_inicio BETWEEN md.data_ini - interval '24 hours' AND md.data_ini ) AS QtdRechamada,
calleridnum As numero
FROM callcenter.chamada_fila_in cfin
WHERE datahora_inicio BETWEEN '2016-12-01 00:00:00' AND '2016-12-01 23:59:59'
AND status_chamada = 'Finalizada'
GROUP BY dia,numero
ORDER BY
dia,numero DESC
I need a better function or method.
This query is very heavy on my database and needs to be optimized.
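The rule described in the question (a call is a recall when the same number called within the previous 24 hours, and Recalling-From is the agent on that earlier call) can be sketched in Python on the four sample rows. This is only an illustration of the logic, assuming the dates are DD-MM-YY; the function name and tuple layout are made up for the example. In PostgreSQL the same "previous row for this caller" lookup would typically use the LAG window function partitioned by caller, but that mapping is a suggestion and has not been tested against the real table.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (agent, caller, end of call), dates read as DD-MM-YY.
calls = [
    (101, "555-1234", datetime(2016, 1, 12, 0, 0)),
    (101, "555-1234", datetime(2016, 1, 12, 1, 0)),
    (102, "555-1234", datetime(2016, 1, 12, 2, 0)),
    (102, "555-1234", datetime(2016, 1, 13, 6, 0)),
]


def flag_recalls(rows, window=timedelta(hours=24)):
    """Return (agent, caller, ts, recalling, recalling_from) for each call."""
    last_seen = {}  # caller -> (agent, ts) of that caller's most recent call
    out = []
    # Process each caller's calls in time order so "previous call" is well defined.
    for agent, caller, ts in sorted(rows, key=lambda r: (r[1], r[2])):
        prev = last_seen.get(caller)
        is_recall = prev is not None and ts - prev[1] <= window
        out.append((agent, caller, ts,
                    "YES" if is_recall else "NO",
                    prev[0] if is_recall else None))
        last_seen[caller] = (agent, ts)
    return out


result = flag_recalls(calls)
for row in result:
    print(row)
```

Each row is compared only with the single previous call from the same number, so one pass over the sorted data is enough.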
I have a list of dates, each with a value, in MySQL.
For each date I want to sum the value for this date and the previous 4 days.
I also want to sum the values for the start of that month to the present date. So for example:
For 07/02/2021 sum all values from 07/02/2021 to 01/02/2021
For 06/02/2021 sum all values from 06/02/2021 to 01/02/2021
For 31/01/2021 sum all values from 31/01/2021 to 01/01/2021
The output should look like:
Any help would be appreciated.
Thanks
In MySQL 8.0 you can use analytic (window) functions.
SELECT
*,
SUM(value) OVER (
ORDER BY date
ROWS BETWEEN 4 PRECEDING
AND CURRENT ROW
) AS five_day_period,
SUM(value) OVER (
PARTITION BY DATE_FORMAT(date, '%Y-%m-01')
ORDER BY date
) AS month_to_date
FROM
your_table
In the first case, it's just saying: sum up the value column, in date order, starting from 4 rows before the current row and ending on the current row. Note that ROWS counts rows, not days, so this equals a five-day sum only if there is exactly one row per date.
In the second case, there's no ROWS BETWEEN, so the frame defaults to everything from the first row of the partition up to the current row. We also add a PARTITION BY, which says to treat all rows in the same calendar month separately from rows in any other calendar month. Thus, "all rows before the current one" only looks back to the first row in the partition, which is the first row in the current month.
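To make the two frames concrete, here is a small Python sketch that recomputes both columns by hand on the 15 sample rows posted further down in this thread. It is not how MySQL evaluates the query, just the same arithmetic spelled out; the function name is made up for the example.

```python
from datetime import date

# (date, value) rows copied from the sample table in this thread.
rows = [
    (date(2021, 1, 24), 40), (date(2021, 1, 25), 10), (date(2021, 1, 26), 30),
    (date(2021, 1, 27), 20), (date(2021, 1, 28), 40), (date(2021, 1, 29), 30),
    (date(2021, 1, 30), 10), (date(2021, 1, 31), 20), (date(2021, 2, 1), 10),
    (date(2021, 2, 2), 20), (date(2021, 2, 3), 10), (date(2021, 2, 4), 50),
    (date(2021, 2, 5), 40), (date(2021, 2, 6), 30), (date(2021, 2, 7), 10),
]


def window_sums(data):
    """Return {date: (five_day_period, month_to_date)}."""
    ordered = sorted(data)  # ORDER BY date
    out = {}
    for i, (d, _) in enumerate(ordered):
        # ROWS BETWEEN 4 PRECEDING AND CURRENT ROW:
        # the current row plus up to 4 rows before it (rows, not days).
        five = sum(v for _, v in ordered[max(0, i - 4): i + 1])
        # PARTITION BY month, ORDER BY date with the default frame:
        # every row of the same month up to and including the current row.
        mtd = sum(v for dd, v in ordered[: i + 1]
                  if (dd.year, dd.month) == (d.year, d.month))
        out[d] = (five, mtd)
    return out


results = window_sums(rows)
print(results[date(2021, 2, 7)])
```

Because this sample has exactly one row per day, counting 4 rows back is the same as going 4 days back; with missing dates the two would differ.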
In MySQL 5.x there are no such functions. As such I would resort to correlated sub-queries.
SELECT
*,
(
SELECT SUM(value)
FROM your_table AS five_day_lookup
WHERE date >= DATE_SUB(your_table.date, INTERVAL 4 DAY)
AND date <= your_table.date
)
AS five_day_period,
(
SELECT SUM(value)
FROM your_table AS monthly_lookup
WHERE date >= DATE(DATE_FORMAT(your_table.date, '%Y-%m-01'))
AND date <= your_table.date
)
AS month_to_date
FROM
your_table
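The correlated sub-query pattern can be tried without a MySQL server. The sketch below runs the same idea against an in-memory SQLite database through Python's sqlite3 module. Assumptions: the date column is renamed dt, SQLite's date(dt, '-4 days') and date(dt, 'start of month') stand in for the MySQL DATE_SUB and DATE_FORMAT calls, and the five rows are made up, with a deliberate gap in the dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (dt TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [
        ("2021-01-30", 10),
        ("2021-01-31", 20),
        ("2021-02-01", 10),
        ("2021-02-02", 20),
        ("2021-02-07", 10),  # note the gap between 02-02 and 02-07
    ],
)

query = """
SELECT t.dt,
       -- sum over the current date and the 4 calendar days before it
       (SELECT SUM(l.value) FROM your_table AS l
         WHERE l.dt >= date(t.dt, '-4 days') AND l.dt <= t.dt) AS five_day_period,
       -- sum from the first day of the current month up to the current date
       (SELECT SUM(l.value) FROM your_table AS l
         WHERE l.dt >= date(t.dt, 'start of month') AND l.dt <= t.dt) AS month_to_date
FROM your_table AS t
ORDER BY t.dt
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
```

Unlike a ROWS-based window, these sub-queries filter by date, so the row for 2021-02-07 only sums itself in the five-day column even though four earlier rows exist.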
Here is another way to do that:
Select
t1.`mydate` AS 'Date'
, t1.`val` AS 'Value'
, SUM( IF(t2.`mydate` >= t1.`mydate` - INTERVAL 4 DAY,t2.val,0)) AS '5 Day Period'
, SUM( IF(t2.`mydate` >= DATE_ADD(DATE_ADD(LAST_DAY(t1.`mydate` ),INTERVAL 1 DAY),INTERVAL - 1 MONTH),t2.val,0)) AS 'Month of Date'
FROM tab t1
LEFT JOIN tab t2 ON t2.`mydate`
BETWEEN LEAST( DATE_ADD(DATE_ADD(LAST_DAY(t1.`mydate` ),INTERVAL 1 DAY),INTERVAL - 1 MONTH),
t1.`mydate` - INTERVAL 4 DAY)
AND t1.`mydate`
GROUP BY t1.`mydate`
ORDER BY t1.`mydate` desc;
sample
MariaDB [bkvie]> SELECT * FROM tab;
+----+------------+------+
| id | mydate | val |
+----+------------+------+
| 1 | 2021-02-07 | 10 |
| 2 | 2021-02-06 | 30 |
| 3 | 2021-02-05 | 40 |
| 4 | 2021-02-04 | 50 |
| 5 | 2021-02-03 | 10 |
| 6 | 2021-02-02 | 20 |
| 7 | 2021-01-31 | 20 |
| 8 | 2021-01-30 | 10 |
| 9 | 2021-01-29 | 30 |
| 10 | 2021-01-28 | 40 |
| 11 | 2021-01-27 | 20 |
| 12 | 2021-01-26 | 30 |
| 13 | 2021-01-25 | 10 |
| 14 | 2021-01-24 | 40 |
| 15 | 2021-02-01 | 10 |
+----+------------+------+
15 rows in set (0.00 sec)
result
MariaDB [bkvie]> Select
-> t1.`mydate` AS 'Date'
-> , t1.`val` AS 'Value'
-> , SUM( IF(t2.`mydate` >= t1.`mydate` - INTERVAL 4 DAY,t2.val,0)) AS '5 Day Period'
-> , SUM( IF(t2.`mydate` >= DATE_ADD(DATE_ADD(LAST_DAY(t1.`mydate` ),INTERVAL 1 DAY),INTERVAL - 1 MONTH),t2.val,0)) AS 'Month of Date'
-> FROM tab t1
-> LEFT JOIN tab t2 ON t2.`mydate`
-> BETWEEN LEAST( DATE_ADD(DATE_ADD(LAST_DAY(t1.`mydate` ),INTERVAL 1 DAY),INTERVAL - 1 MONTH),
-> t1.`mydate` - INTERVAL 4 DAY)
-> AND t1.`mydate`
-> GROUP BY t1.`mydate`
-> ORDER BY t1.`mydate` desc;
+------------+-------+--------------+---------------+
| Date | Value | 5 Day Period | Month of Date |
+------------+-------+--------------+---------------+
| 2021-02-07 | 10 | 140 | 170 |
| 2021-02-06 | 30 | 150 | 160 |
| 2021-02-05 | 40 | 130 | 130 |
| 2021-02-04 | 50 | 110 | 90 |
| 2021-02-03 | 10 | 70 | 40 |
| 2021-02-02 | 20 | 90 | 30 |
| 2021-02-01 | 10 | 110 | 10 |
| 2021-01-31 | 20 | 120 | 200 |
| 2021-01-30 | 10 | 130 | 180 |
| 2021-01-29 | 30 | 130 | 170 |
| 2021-01-28 | 40 | 140 | 140 |
| 2021-01-27 | 20 | 100 | 100 |
| 2021-01-26 | 30 | 80 | 80 |
| 2021-01-25 | 10 | 50 | 50 |
| 2021-01-24 | 40 | 40 | 40 |
+------------+-------+--------------+---------------+
15 rows in set (0.00 sec)
MariaDB [bkvie]>
Given a Postgres table with columns highwater_datetime::timestamp and highwater::integer, I am trying to construct a select statement for a given highwater_datetime range, that generates rows with a column for the max highwater for each hour (first occurrence when dups) and another column showing the highwater_datetime when it occurred (truncated to the minute and order by highwater_datetime asc). e.g.
| highwater_datetime | max_highwater |
+--------------------+---------------+
| 2021-01-27 20:05 | 8 |
| 2021-01-27 21:00 | 7 |
| 2021-01-27 22:00 | 7 |
| 2021-01-27 23:00 | 7 |
| 2021-01-28 00:00 | 7 |
| 2021-01-28 01:32 | 7 |
| 2021-01-28 02:00 | 7 |
| 2021-01-28 03:00 | 7 |
| 2021-01-28 04:22 | 9 |
DISTINCT ON should do the trick:
SELECT DISTINCT ON (date_trunc('hour', highwater_datetime))
highwater_datetime,
highwater
FROM mytable
ORDER BY date_trunc('hour', highwater_datetime),
highwater DESC,
highwater_datetime;
DISTINCT ON will output the first row for each entry with the same hour according to the ORDER BY clause.
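What DISTINCT ON does with that ORDER BY can be imitated in a few lines of Python: sort by hour, then by highwater descending, then by time, and keep the first row seen for each hour. The readings below are made up for illustration and the function name is hypothetical.

```python
from datetime import datetime

# Hypothetical readings: (highwater_datetime, highwater).
readings = [
    (datetime(2021, 1, 27, 20, 50), 6),
    (datetime(2021, 1, 27, 20, 40), 8),  # same max, but later in the hour: not chosen
    (datetime(2021, 1, 27, 20, 5), 8),
    (datetime(2021, 1, 27, 21, 30), 5),
    (datetime(2021, 1, 27, 21, 0), 7),
]


def hour_of(ts):
    """Equivalent of date_trunc('hour', ts)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def max_per_hour(rows):
    # Same ordering as the ORDER BY: hour, then highwater DESC, then time ASC.
    ordered = sorted(rows, key=lambda r: (hour_of(r[0]), -r[1], r[0]))
    picked = {}
    for ts, hw in ordered:
        # Keep only the first row encountered for each hour, like DISTINCT ON.
        picked.setdefault(hour_of(ts), (ts, hw))
    return list(picked.values())


result = max_per_hour(readings)
for ts, hw in result:
    print(ts.strftime("%Y-%m-%d %H:%M"), hw)
```

The tie-break on time is what gives the first occurrence when the maximum appears more than once within an hour.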
MY SITUATION:
I have written a piece of code that returns a dataset containing a web user's aggregated activity for the previous 90 days and returns a score, subsequent to some calculation. Essentially, like RFV.
A (VERY) simplified version of the code can be seen below:
WITH start_data AS (
SELECT user_id
,COUNT(web_visits) AS count_web_visits
,COUNT(button_clicks) AS count_button_clicks
,COUNT(login) AS count_log_in
,SUM(time_on_site) AS total_time_on_site
,CURRENT_DATE AS run_date
FROM web.table
WHERE TO_CHAR(visit_date, 'YYYY-MM-DD') BETWEEN DATEADD(DAY, -90, CURRENT_DATE) AND CURRENT_DATE
AND some_flag = 1
AND some_other_flag = 2
GROUP BY user_id
ORDER BY user_id DESC
)
The output might look something like the below:
| user_id | count_web_visits | count_button_clicks | count_log_in | total_time_on_site | run_date |
|---------|------------------|---------------------|--------------|--------------------|----------|
| 1234567 | 256 | 932 |16 | 1200 | 23-01-20 |
| 2391823 | 710 | 1345 |308 | 6000 | 23-01-20 |
| 3729128 | 67 | 204 |83 | 320 | 23-01-20 |
| 5561296 | 437 | 339 |172 | 3600 | 23-01-20 |
This output is then stored in its own AWS/Redshift table and will form the base table for the task.
SELECT *
into myschema.base_table
FROM start_data
DESIRED OUTPUT:
What I need to be able to do is iteratively run this code so that, for every day, I append the aggregation of the previous 90 days to myschema.base_table.
The way I see it, I can either go forwards or backwards, it doesn't matter.
That is to say, I can either:
1. Starting from today, run the code, every day, for the preceding 90 days, going BACK to the (first date in the table + 90 days)
OR
2. Starting from the (first date in the table + 90 days), run the code for the preceding 90 days, every day, going FORWARD to today.
Option 2 seems the best option to me and the desired output looks like this (PARTITION FOR ILLUSTRATION ONLY):
| user_id | count_web_visits | count_button_clicks | count_log_in | total_time_on_site | run_date |
|---------|------------------|---------------------|--------------|--------------------|----------|
| 1234567 | 412 | 339 |180 | 3600 | 20-01-20 |
| 2391823 | 417 | 6253 |863 | 2400 | 20-01-20 |
| 3729128 | 67 | 204 |83 | 320 | 20-01-20 |
| 5561296 | 281 | 679 |262 | 4200 | 20-01-20 |
|---------|------------------|---------------------|--------------|--------------------|----------|
| 1234567 | 331 | 204 |83 | 3200 | 21-01-20 |
| 2391823 | 652 | 1222 |409 | 7200 | 21-01-20 |
| 3729128 | 71 | 248 |71 | 720 | 21-01-20 |
| 5561296 | 366 | 722 |519 | 3600 | 21-01-20 |
|---------|------------------|---------------------|--------------|--------------------|----------|
| 1234567 | 213 | 808 |57 | 3600 | 22-01-20 |
| 2391823 | 817 | 4265 |476 | 1200 | 22-01-20 |
| 3729128 | 33 | 128 |62 | 120 | 22-01-20 |
| 5561296 | 623 | 411 |283 | 2400 | 22-01-20 |
|---------|------------------|---------------------|--------------|--------------------|----------|
| 1234567 | 256 | 932 |16 | 1200 | 23-01-20 |
| 2391823 | 710 | 1345 |308 | 6000 | 23-01-20 |
| 3729128 | 67 | 204 |83 | 320 | 23-01-20 |
| 5561296 | 437 | 339 |172 | 3600 | 23-01-20 |
WHAT I HAVE TRIED:
I have successfully created a WHILE loop to sequentially increment the date as follows:
CREATE OR REPLACE PROCEDURE retrospective_data()
LANGUAGE plpgsql
AS $$
DECLARE
start_date DATE := '2020-11-20' ;
BEGIN
WHILE CURRENT_DATE > start_date
LOOP
RAISE INFO 'Date: %', start_date;
start_date = start_date + 1;
END LOOP;
RAISE INFO 'Loop Statment Executed Successfully';
END;
$$;
CALL retrospective_data();
Thus producing the dates as follows:
INFO: Date: 2020-11-20
INFO: Date: 2020-11-21
INFO: Date: 2020-11-22
INFO: Date: 2020-11-23
INFO: Date: 2020-11-24
INFO: Date: 2020-11-25
INFO: Date: 2020-11-26
INFO: Loop Statment Executed Successfully
Query 1 OK: CALL
WHAT I NEED HELP WITH:
I need to be able to apply the WHILE loop to the initial code such that the WHERE clause becomes:
WHERE TO_CHAR(visit_date, 'YYYY-MM-DD') BETWEEN DATEADD(DAY, -90, start_date) AND start_date
But where start_date is the result of each incremental loop. Additionally, the result of each execution needs to be appended to the previous.
Any help appreciated.
It is fairly clear that you come from a procedural programming background, and my first recommendation is to stop thinking in terms of loops. Databases are giant and powerful data filtering machines, and thinking in terms of 'do step 1, then step 2' often leads to missing out on all this power.
You want to look into window functions which allow you to look over ranges of other rows for each row you are evaluating. This is exactly what you are trying to do.
Also you shouldn't cast a date to a string just to compare it to other dates (WHERE clause). This is just extra casting and defeats Redshift's table scan optimizations. Redshift uses block metadata that optimizes what data is needed to be read from disk but this cannot work if the column is being cast to another data type.
Now to your code (an off-the-cuff rewrite, and for just the first column). Be aware that GROUP BY clauses run BEFORE window functions, and that I'm assuming not all users have a visit every day. Since Redshift doesn't support RANGE in window functions, you will need to make sure all dates are represented for all user_ids. This is done by UNIONing with a set of rows that covers the full date range. You may already have a table like this, or may want to create one, but I'll just generate something on the fly to show the process (this assumes there are fewer dates in the dense range than rows in the table, which is likely but not ironclad).
SELECT user_id
,COUNT(web_visits) AS count_web_visits_by_day
-- the window function runs after GROUP BY, so it sums the per-day counts
,SUM(COUNT(web_visits)) OVER (partition by user_id order by visit_date rows between 90 preceding and current row) AS count_web_visits_90d
...
,visit_date
FROM (
SELECT visit_date, user_id, web_visits, ...
FROM web.table
WHERE some_flag = 1 AND some_other_flag = 2
UNION ALL -- this is where I want to union with a full set of dates by user_id
SELECT d.visit_date, u.user_id, NULL as web_visits, ...
FROM (SELECT DISTINCT user_id FROM web.table) u
CROSS JOIN
-- one row per day counting back from today; the cast keeps the date arithmetic on integers
(SELECT CURRENT_DATE + 1 - (row_number() over (order by visit_date))::int as visit_date
FROM web.table) d
) dense
GROUP BY visit_date, user_id
ORDER BY visit_date ASC, user_id DESC ;
The idea here is to set up your data so that you have at least one row for each user_id for each date. Then the window functions can operate on the "grouped by date and user_id" information to sum and count over the past 90 rows (which is the same as the past 90 days). You now have all the information you want for all dates, where each is looking back over 90 days. One query to give you all the information, no while loop, no stored procedures.
Untested but should give you the pattern. You may want to massage the output to give you the range you are looking for and clean up NULL result rows.
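The "fill in every date, then count back a fixed number of rows" idea can be shown outside the database with a short Python sketch. Everything here is illustrative: the visits are made up, the function name is hypothetical, and a 3-day window replaces the 90-day one so the numbers are easy to check by hand.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical per-visit rows: (user_id, visit_date).
visits = [
    (1, date(2020, 1, 1)), (1, date(2020, 1, 1)), (1, date(2020, 1, 4)),
    (2, date(2020, 1, 2)),
]


def rolling_counts(rows, start, end, window_days):
    """Return {(user_id, day): visits in the window_days ending on day}."""
    # Step 1: the GROUP BY user_id, visit_date.
    daily = defaultdict(int)
    for user, day in rows:
        daily[(user, day)] += 1

    users = sorted({u for u, _ in rows})
    days = [start + timedelta(days=n) for n in range((end - start).days + 1)]

    out = {}
    for user in users:
        # Step 2: one entry per calendar day, 0 where the user had no visit.
        dense = [daily.get((user, d), 0) for d in days]
        for i, d in enumerate(days):
            # Step 3: with no gaps, the last N rows are exactly the last N days.
            out[(user, d)] = sum(dense[max(0, i - window_days + 1): i + 1])
    return out


counts = rolling_counts(visits, date(2020, 1, 1), date(2020, 1, 5), window_days=3)
for key in sorted(counts):
    print(key, counts[key])
```

Without step 2, user 1 would have only two rows, and "3 rows back" from 2020-01-04 would wrongly reach 2020-01-01.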
I have a table called forecasts where we store the forecasts for all the products for the next 6 months. For example when we are in November we create the forecast for December, January, February, March, April and May. The forecasts table looks something like the one below
+----------------+---------------+--------------+----------+
| product_number | forecasted_on | forecast_for | quantity |
+----------------+---------------+--------------+----------+
| Prod 1 | 2016-11-01 | 2016-12-01 | 100 |
| Prod 1 | 2016-11-01 | 2017-01-01 | 200 |
| Prod 1 | 2016-11-01 | 2017-02-01 | 300 |
| Prod 1 | 2016-11-01 | 2017-03-01 | 400 |
| Prod 1 | 2016-11-01 | 2017-04-01 | 500 |
| Prod 1 | 2016-11-01 | 2017-05-01 | 600 |
+----------------+---------------+--------------+----------+
The table contains a list of product numbers, the date on which the forecast was created (forecasted_on), the month the forecast is for (forecast_for), and the forecasted quantity.
Each month data gets added for the next 6 months. So when the forecasted_on is 1-December-2016 forecasts will be created for January till June.
I am trying to create a report that shows how the total forecasts have varied for the last 3 months. Something like this
+------------+----------------+---------------+----------------+
| | 0 months prior | 1 month prior | 2 months prior |
+------------+----------------+---------------+----------------+
| 2016-12-01 | 200 | 150 | 250 |
| 2017-01-01 | 300 | 250 | 150 |
| 2017-02-01 | 100 | 150 | 100 |
+------------+----------------+---------------+----------------+
Currently I am using a lot of repetitive code in rails to generate this table. I wanted to see if there was an easier way to do it directly using a SQL query.
Any help would be greatly appreciated.
Use a pivot query:
select forecast_for,
sum( case when forecasted_on + interval '1' month = forecast_for
then quantity end ) q_0,
sum( case when forecasted_on + interval '2' month = forecast_for
then quantity end ) q_1,
sum( case when forecasted_on + interval '3' month = forecast_for
then quantity end ) q_2,
sum( case when forecasted_on + interval '4' month = forecast_for
then quantity end ) q_3,
sum( case when forecasted_on + interval '5' month = forecast_for
then quantity end ) q_4,
sum( case when forecasted_on + interval '6' month = forecast_for
then quantity end ) q_5
from Table1
group by forecast_for
order by 1
;
Demo: http://sqlfiddle.com/#!15/30e5e/1
| forecast_for | q_0 | q_1 | q_2 | q_3 | q_4 | q_5 |
|----------------------------|--------|--------|--------|--------|--------|--------|
| December, 01 2016 00:00:00 | 100 | (null) | (null) | (null) | (null) | (null) |
| January, 01 2017 00:00:00 | (null) | 200 | (null) | (null) | (null) | (null) |
| February, 01 2017 00:00:00 | (null) | (null) | 300 | (null) | (null) | (null) |
| March, 01 2017 00:00:00 | (null) | (null) | (null) | 400 | (null) | (null) |
| April, 01 2017 00:00:00 | (null) | (null) | (null) | (null) | 500 | (null) |
| May, 01 2017 00:00:00 | (null) | (null) | (null) | (null) | (null) | 600 |
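The same conditional-aggregation pivot can be tried locally. The sketch below uses an in-memory SQLite database through Python's sqlite3 module, so the month gap is computed with strftime instead of PostgreSQL interval arithmetic. The extra rows (an October forecast and a second product) are made up to show the SUM combining several rows, and, as in the query above, q_0 means a one-month gap between forecasted_on and forecast_for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE forecasts (
           product_number TEXT, forecasted_on TEXT,
           forecast_for TEXT, quantity INTEGER)"""
)
conn.executemany(
    "INSERT INTO forecasts VALUES (?, ?, ?, ?)",
    [
        ("Prod 1", "2016-10-01", "2016-12-01", 250),  # made-up earlier forecast
        ("Prod 1", "2016-11-01", "2016-12-01", 100),
        ("Prod 1", "2016-11-01", "2017-01-01", 200),
        ("Prod 2", "2016-11-01", "2016-12-01", 50),   # made-up second product
    ],
)

query = """
SELECT forecast_for,
       SUM(CASE WHEN gap = 1 THEN quantity END) AS q_0,
       SUM(CASE WHEN gap = 2 THEN quantity END) AS q_1
FROM (
    -- gap = whole months between the forecast date and the month it is for
    SELECT forecast_for, quantity,
           (CAST(strftime('%Y', forecast_for) AS INTEGER)
              - CAST(strftime('%Y', forecasted_on) AS INTEGER)) * 12
           + CAST(strftime('%m', forecast_for) AS INTEGER)
           - CAST(strftime('%m', forecasted_on) AS INTEGER) AS gap
    FROM forecasts
)
GROUP BY forecast_for
ORDER BY forecast_for
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
```

A CASE with no ELSE yields NULL for non-matching rows, and SUM ignores NULLs, which is why months with no forecast at a given gap come back as NULL rather than 0.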
Assuming that (product_number, forecasted_on, forecast_for) is unique (so no aggregation is required), this should do the job:
WITH forecast_dates AS (
SELECT DISTINCT product_number, forecast_for
FROM forecasts
)
SELECT
fd.forecast_for AS "forecast for",
m1.quantity AS "one month prior",
m2.quantity AS "two months prior",
m3.quantity AS "three months prior"
FROM forecast_dates fd
-- each join also matches on product_number so rows from other products are not picked up
LEFT JOIN forecasts m1 ON fd.product_number = m1.product_number AND fd.forecast_for = m1.forecast_for AND fd.forecast_for = m1.forecasted_on + INTERVAL '1 month'
LEFT JOIN forecasts m2 ON fd.product_number = m2.product_number AND fd.forecast_for = m2.forecast_for AND fd.forecast_for = m2.forecasted_on + INTERVAL '2 month'
LEFT JOIN forecasts m3 ON fd.product_number = m3.product_number AND fd.forecast_for = m3.forecast_for AND fd.forecast_for = m3.forecasted_on + INTERVAL '3 month'
WHERE fd.product_number = 'Prod 1'
ORDER BY fd.forecast_for;
I have some data:
id_merchant | data | sell
11 | 2009-07-20 | 1100.00
22 | 2009-07-27 | 1100.00
11 | 2005-07-27 | 620.00
31 | 2009-08-07 | 2403.20
33 | 2009-08-12 | 4822.00
52 | 2009-08-14 | 4066.00
52 | 2009-08-15 | 295.00
82 | 2009-08-15 | 0.00
23 | 2011-06-11 | 340.00
23 | 2012-03-22 | 1000.00
23 | 2012-04-08 | 1000.00
23 | 2012-07-13 | 36.00
23 | 2013-07-17 | 2480.00
23 | 2014-04-09 | 1000.00
23 | 2014-06-10 | 1500.00
23 | 2014-07-20 | 700.50
I want to create a table as select with 2-year intervals. The first date for each merchant is min(date), so I generate a series (min(date)::date,current(date)::date,'2 years').
I want to get to table like that:
id_merchant | data | sum(sell)
23 | 2011-06-11 | 12382.71
23 | 2013-06-11 | 12382.71
23 | 2015-06-11 | 12382.71
But there is a mistake in my query, because sum(sell) is the same for every period and the sum is wrong. Even if I sum all the sales, the total is about 6000, not 12382.71.
My query:
select m.id_gos_pla,
generate_series(m.min::date,dath()::date,'2 years')::date,
sum(rch.suma)
from rch, minmax m
where rch.id_gos_pla=m.id_gos_pla
group by m.id_gos_pla,m.min,m.max
order by 1,2;
Please help.
I would do it this way:
select
periods.id_merchant,
periods.date as period_start,
(periods.date + interval '2' year - interval '1' day)::date as period_end,
coalesce(sum(merchants.amount), 0) as sum
from
(
select
id_merchant,
generate_series(min(date), max(date), '2 year'::interval)::date as date
from merchants
group by id_merchant
) periods
left join merchants on
periods.id_merchant = merchants.id_merchant and
merchants.date >= periods.date and
merchants.date < periods.date + interval '2' year
group by periods.id_merchant, periods.date
order by periods.id_merchant, periods.date
We use a subquery to generate the date periods for each id_merchant, starting from that merchant's first date and stepping by the required interval. Then we join it with the merchants table on the condition that the date falls within the period, and group by id_merchant and period (periods.date is the period's start date, which is enough to identify it). Finally we select everything we need: start date, end date, merchant and sum.
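The period logic can be checked by hand with a small Python sketch on the rows for merchant 23 from the question. This only mirrors the arithmetic of the query (periods start at the merchant's first date, step by two years up to the last date, and each sale lands in the period whose half-open range contains it); the function names are made up for the example.

```python
from datetime import date

# Merchant 23 from the question: (date, amount).
sales = [
    (date(2011, 6, 11), 340.00), (date(2012, 3, 22), 1000.00),
    (date(2012, 4, 8), 1000.00), (date(2012, 7, 13), 36.00),
    (date(2013, 7, 17), 2480.00), (date(2014, 4, 9), 1000.00),
    (date(2014, 6, 10), 1500.00), (date(2014, 7, 20), 700.50),
]


def add_years(d, years):
    """Add whole years, clamping Feb 29 to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def two_year_totals(rows):
    """Return [(period_start, total)] for consecutive 2-year periods."""
    first = min(d for d, _ in rows)
    last = max(d for d, _ in rows)
    totals = []
    start = first
    while start <= last:            # like generate_series(min(date), max(date), '2 year')
        end = add_years(start, 2)   # exclusive upper bound of the period
        totals.append((start, sum(a for d, a in rows if start <= d < end)))
        start = end
    return totals


totals = two_year_totals(sales)
for start, total in totals:
    print(start, total)
```

Each sale is counted in exactly one period, which is what the original query was missing when it repeated the same grand total on every generated row.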