Filter data across several date ranges in MySQL (MySQL Workbench)

So I want to filter my buyer data for buyers who made transactions in months 1, 2, 3 (Jan-Mar) of 2019 and who also made transactions in months 4, 5, 6 (Apr-Jun) of 2017. If a buyer made any transaction before April 2017, that buyer should not appear in the list. I've tried the query below, but I don't know why it returns so many rows. Here is my query:
SELECT DISTINCT
d1.buyer_id
FROM data_2019 d1
WHERE
MONTH (d1.tgl) IN (1, 2, 3) AND
NOT EXISTS (SELECT 1 FROM data_2017 d2
WHERE d2.buyer_id = d1.buyer_id AND d2.tgl < '2017-04-01')
GROUP BY
buyer_id;
Can you guys tell me where I went wrong?

I suspect there are two other problems, beyond what Tim Biegeleisen noted.
First, every sale by every buyer in data_2019 will result in tests in data_2017. I suggest querying from a table of all buyers, with an EXISTS() clause on data_2019. This should also eliminate the need for the DISTINCT clause.
Second, partitioning the data into different tables by year will be a serious headache as time passes. Why not put it all into a single table?
Thus:
SELECT
b.buyer_id
FROM buyer b
WHERE
EXISTS (SELECT 1 FROM data_all d
WHERE d.buyer_id = b.buyer_id AND
d.tgl >= '2019-01-01' AND d.tgl < '2019-04-01') AND
EXISTS (SELECT 1 FROM data_all d
WHERE d.buyer_id = b.buyer_id AND
d.tgl >= '2017-04-01' AND d.tgl < '2017-07-01') AND
NOT EXISTS (SELECT 1 FROM data_all d
WHERE d.buyer_id = b.buyer_id AND
d.tgl >= '2017-01-01' AND d.tgl < '2017-04-01');
At this point, if you wanted to extend the "not before April 2017" condition to all years, you would just remove the d.tgl >= '2017-01-01' predicate, whereas with one table per year you might otherwise need a separate NOT EXISTS clause for each year.
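For illustration, the extended NOT EXISTS would then look like this (a sketch, still against the single data_all table assumed above):
NOT EXISTS (SELECT 1 FROM data_all d
-- no lower bound: exclude buyers with any activity at all before April 2017
WHERE d.buyer_id = b.buyer_id AND d.tgl < '2017-04-01');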

I would express this using an EXISTS and a NOT EXISTS clause:
SELECT DISTINCT
d1.buyer_id
FROM data_2019 d1
WHERE
d1.tgl >= '2019-01-01' AND d1.tgl < '2019-04-01' AND
EXISTS (SELECT 1 FROM data_2017 d2
WHERE d2.buyer_id = d1.buyer_id AND
d2.tgl >= '2017-04-01' AND d2.tgl < '2017-07-01') AND
NOT EXISTS (SELECT 1 FROM data_2017 d2
WHERE d2.buyer_id = d1.buyer_id AND d2.tgl < '2017-04-01');
The EXISTS clause asserts that a buyer matched for the first quarter of 2019 was also active between April and June (inclusive) of 2017. The NOT EXISTS clause makes sure that this same buyer had no activity in the first quarter of 2017.

Related

How do you organize this query by week

Here is my Query so far:
select one.week, total, comeback, round(comeback)::Numeric / total::numeric * 100 as comeback_percent
FROM
(
SELECT count(username) as total, week
FROM
(
select row_number () over (partition by u.id order by creation_date) as row, username, date_trunc ('month', creation_date)::date AS week
FROM users u
left join entries e on u.id = e.user_id
where ((entry_type = 0 and distance >= 1) or (entry_type = 1 and seconds_running >= 600))
) x
where row = 1
group by week
order by week asc
) one
join
(
SELECT count(username) as comeback, week
FROM
(
select row_number () over (partition by u.id order by creation_date) as row, username, runs_completed, date_trunc ('month', creation_date)::date AS week
FROM entries e
left join users u on e.user_id = u.id
where ((entry_type = 0 and distance >= 1) or (entry_type = 1 and seconds_running >= 600))
) y
where runs_completed > 1 and row = 1
group by week
order by week asc
) two
on one.week = two.week
What I want to accomplish is a line graph of users that have completed one run with us, grouped by week, with a percentage for that week of anyone who has completed a second run EVER, not just within that week. Our funnel has improved by a factor of 5 since we started, yet the line graph that is produced does not show similar results.
I could be joining them together incorrectly, or there may be a cleaner way to use a CTE or window functions for this query; I am open to any and all suggestions. Thanks!
If you need tables or further information, let me know. I'm happy to provide anything that may be needed.
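For illustration, one possible CTE-based restructuring might look like the sketch below. It assumes only the users/entries columns used in the query above, reads "a second run EVER" as a user having more than one qualifying entry overall, and truncates to week rather than month:
with qualifying as (
-- every entry that counts as a completed run
select u.id as user_id, e.creation_date
from users u
join entries e on e.user_id = u.id
where (e.entry_type = 0 and e.distance >= 1)
or (e.entry_type = 1 and e.seconds_running >= 600)
), first_runs as (
-- one row per user: when their first qualifying run happened
-- and how many qualifying runs they have ever logged
select user_id,
min(creation_date) as first_run,
count(*) as total_runs
from qualifying
group by user_id
)
select date_trunc('week', first_run)::date as week,
count(*) as total,
count(*) filter (where total_runs > 1) as comeback,
round((count(*) filter (where total_runs > 1))::numeric / count(*) * 100, 1) as comeback_percent
from first_runs
group by 1
order by 1;
Because each user appears exactly once in first_runs, no DISTINCT or row_number() deduplication is needed, and the join between the two subqueries goes away.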

Aggregated values depending on another field

I have a table with a date-time column and multiple properties, some of which I group by and some of which I aggregate; a typical query would be "get me revenue per customer for last week".
Now I want to see the change between the requested period and the previous one, so I will have two columns: revenue and previous_revenue.
Right now I'm selecting the rows of the requested period plus the rows of the previous period, and for each aggregated field I add a CASE expression that returns the value, or 0 if the row is not in the period I want.
That leads to as many CASE expressions as there are aggregated fields, always with the same condition.
I'm wondering if there is a better design for this use case...
SELECT
customer,
SUM(
CASE TIMESTAMP_CMP('2016-07-01 00:00:00', ft.date) > 0 WHEN true THEN
REVENUE
ELSE 0 END
) AS revenue,
SUM(
CASE TIMESTAMP_CMP('2016-07-01 00:00:00', ft.date) < 0 WHEN true THEN
REVENUE
ELSE 0 END
) AS previous_revenue
WHERE date_hour >= '2016-06-01 00:00:00'
AND date_hour <= '2016-07-31 23:59:59'
GROUP BY customer
(In my real use case I have many columns which make it even more ugly)
First, I'd suggest refactoring out the timestamps and precalculating the current and previous periods for later use. This is not strictly necessary to solve your problem, though:
create temporary table _period as
select
'2016-07-01 00:00:00'::timestamp as curr_period_start
, '2016-07-31 23:59:59'::timestamp as curr_period_end
, '2016-06-01 00:00:00'::timestamp as prev_period_start
, '2016-06-30 23:59:59'::timestamp as prev_period_end
;
Now, a possible design that avoids repeating the timestamps and CASE expressions is to group by the period first and then do a FULL OUTER JOIN of that aggregate with itself:
with _aggregate as (
select
case
when date_hour between prev_period_start and prev_period_end then 'previous'
when date_hour between curr_period_start and curr_period_end then 'current'
end::varchar(20) as period
, customer
-- < other columns to group by go here >
, sum(revenue) as revenue
-- < other aggregates go here >
from
_revenue, _period
where
date_hour between prev_period_start and curr_period_end
group by 1, 2
)
select
customer
, current_period.revenue as revenue
, previous_period.revenue as previous_revenue
from
(select * from _aggregate where period = 'previous') previous_period
full outer join (select * from _aggregate where period = 'current') current_period
using(customer) -- All columns that have been grouped by must go into the USING() clause,
-- e.g. using(customer, some_column, another_column)
;

postgresql complex query joining same table

I would like to get those customers from a table 'transactions' who haven't created any transactions in the last 6 months.
Table:
'transactions'
id, email, state, paid_at
To visualise:
|------------------------ time period with all transactions --------------------|
|-- period before (transactions > 0) --------|------- curr month (transactions = 0) --------|
I guess this is doable with a join showing only those that didn't have any transactions on the right side.
Example:
Month = November
The conditions for the left side should be:
COUNT(l.id) > 0
l.paid_at < '2013-05-01 00:00:00'
Conditions for the right side:
COUNT(r.id) = 0
r.paid_at BETWEEN '2013-05-01 00:00:00' AND '2013-11-30 23:59:59'
Is join the right approach?
Answer
SELECT
c.email
FROM
transactions c
WHERE
(
c.email NOT IN (
SELECT DISTINCT email
FROM transactions
WHERE paid_at >= '2013-05-01 00:00:00'
AND paid_at <= '2013-11-30 23:59:59'
)
AND
c.email IN (
SELECT DISTINCT email
FROM transactions
WHERE paid_at <= '2013-05-01 00:00:00'
)
)
AND c.paid_at <= '2013-11-30 23:59:59'
There are a couple of ways you could do this. Use a subquery to get distinct customer ids for transactions in the last 6 months, and then select customers where their id isn't in the subquery.
select c.id, c.name
from customer c
where c.id not in (select distinct customer_id from transaction where dt between <start> and <end>);
Or, use a left join from customer to transaction, and filter the results to have transaction id null. A left join includes all rows from the left-hand table, even when there are no matching rows in the right-hand table. Explanation of left joins here: http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html
select c.id, c.name
from customer c
left join transaction t on c.id = t.customer_id
and t.dt between <start> and <end>
where t.id is null;
The left join approach is likely to be faster.
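For comparison, the same anti-join can also be written with NOT EXISTS (a sketch, using the same placeholder tables and <start>/<end> bounds as above). Planners typically handle it much like the LEFT JOIN form, and unlike NOT IN it is not tripped up by NULL customer_id values in the subquery:
select c.id, c.name
from customer c
where not exists (
select 1
from transaction t
where t.customer_id = c.id
and t.dt between <start> and <end>
);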

How do I get a recursive daily average for one month?

I need a daily average for an entire month, but the trick is that all of the clients have different start and end dates. For example, some clients are only enrolled for part of the month. Assume client A is enrolled from 4/3/13-4/8/13, client B from 4/6-4/30, client C from 4/1-5/1, etc. How can I achieve this? Here is my current code which returns super low counts because it assumes all clients are enrolled the entire month:
if exists (
select * from tempdb.dbo.sysobjects o where o.xtype in ('U') and o.id = object_id(N'tempdb..#enrollments_PreviousMonth2')
) DROP TABLE #enrollments_PreviousMonth2;
Select
people_id,
program_modifier,
program_modifier_id,
DATEADD(dd, 0, DATEDIFF(dd, 0, actual_date)) as enroll_midnight_date,
actual_date as enroll_start_date,
end_date as enroll_end_date
INTO #enrollments_PreviousMonth2
From
program_modifier_enrollment_view pmev with(nolock)
Where
program_modifier_id = 'E1AA7A36-0500-4BAE-A0AA-D9E0BC91A6F3' and
actual_date <= '4/30/13' and (end_date >= '4/1/13' or end_date is null)
;with cte as (
select cast(enroll_start_date as date) as actual_date,
count(people_id) cnt
From #enrollments_PreviousMonth2 en
left join Calendar c on en.enroll_midnight_date = c.dt
where program_modifier_id = 'E1AA7A36-0500-4BAE-A0AA-D9E0BC91A6F3'
AND enroll_start_date <= '4/30/13' and (enroll_end_date >= '4/1/13' or enroll_end_date is null)
Group by enroll_start_date--, enroll_end_date, program_modifier_id, program_modifier
)
select
sum(cnt*1.0)
from cte
I prefer to not use a CURSOR for the solution, however.
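For illustration, one cursor-free approach could be to count clients per calendar day and then average the daily counts. This is only a sketch; it assumes the Calendar table referenced above has one row per date in dt, and it reuses the #enrollments_PreviousMonth2 temp table and its enroll_midnight_date column:
;with daily as (
select c.dt,
count(en.people_id) as cnt -- clients enrolled on this particular day
from Calendar c
left join #enrollments_PreviousMonth2 en
on en.enroll_midnight_date <= c.dt
and (en.enroll_end_date >= c.dt or en.enroll_end_date is null)
where c.dt >= '2013-04-01' and c.dt < '2013-05-01'
group by c.dt
)
select avg(cnt * 1.0) as daily_average
from daily;
Days with no enrolled clients still contribute a zero to the average, because the LEFT JOIN keeps the calendar row and COUNT of a nullable column returns 0.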

Removing duplicate date periods

I am working on a script to get data from a database with millions of rows and have a problem with gaps in periods. We have decided that gaps of less than 10 days should not be considered gaps at all, so these gaps should be deleted. See the example below: for ID 1, for instance, the 3-day gap between the first two rows should disappear, leaving a single period of interest from 2008-10-10 to 2009-05-13.
ID InDate OutDate
1 2008-10-10 2009-02-05
1 2009-02-08 2009-05-13
1 2011-01-01 2011-05-20
2 2007-03-17 2008-10-19
2 2009-05-30 2010-10-12
2 2010-10-14 2010-12-31
Thus, several problems arise. The first problem is to identify which OutDates and InDates are close enough to each other for the two periods to be merged into a single one. The next problem is to move the OutDate from the higher row number to the lower row number (that is, up the table). The last problem is to identify and get rid of the rows which are now duplicates.
I have tried to solve the question down below. The first two problems are solved in table #t4a. The strategy in table #t4aa is to get rid of the duplicates by marking the duplicate rows in a new (dummy) variable and removing all such rows (the 1s) at a later stage. However, it does not work! All rows are marked with a 0, even those which should be marked with a 1. Any suggestions?
--This temp table measures gaps and creates a new variable, OutDate2, which in the case of a too-small gap (less than 11 days) writes the next OutDate on the row instead of the original value.
WITH C AS (SELECT Id, InDate, OutDate, ROW_NUMBER() OVER (PARTITION BY Id ORDER BY InDate) Rownum FROM #t4 t4)
SELECT cur.Rownum, cur.Id, cur.InDate CurInDate, cur.OutDate, nxt.InDate NxtInDate, DATEDIFF(day, cur.OutDate, nxt.InDate) Number_of_days,
CASE WHEN DATEDIFF(day, cur.OutDate, nxt.InDate)<11 AND DATEDIFF(day, cur.OutDate, nxt.InDate)>0 THEN nxt.OutDate ELSE cur.OutDate END AS OutDate2
INTO #t4a
FROM C cur
LEFT OUTER JOIN C nxt ON (nxt.rownum=cur.rownum+1 AND nxt.Id=cur.Id)
--This temp table creates a dummy which identifies the OVERLAP of rows in order for these to be eliminated in a later temporary table. It is this table that does not work.
WITH C AS (SELECT Id, InDate, OutDate, ROW_NUMBER() OVER (PARTITION BY Id ORDER BY InDate) rownum FROM #t4a)
SELECT cur.Id, cur.InDate, nxt.OutDate2,
CASE WHEN cur.OutDate2 < nxt.InDate THEN 1.0 ELSE 0.0
END AS Overlap
INTO #t4aa
FROM C cur
LEFT OUTER JOIN C nxt on (cur.rownum=nxt.rownum+1 AND cur.Id=nxt.Id)
This is kind of conceptual but might give you some ideas
WITH C AS
(SELECT Id, InDate, OutDate, ROW_NUMBER() OVER (PARTITION BY Id ORDER BY InDate) Rownum FROM #t4 t4)
-- rows whose gap from the previous row is at least 11 days
select Cgood.Id, Cgood.InDate, Cgood.OutDate
from C
join C as Cgood
on Cgood.Id = C.Id
and Cgood.Rownum = C.Rownum + 1
and DATEDIFF(day, C.OutDate, Cgood.InDate) >= 11
group by Cgood.Id, Cgood.InDate, Cgood.OutDate
union
-- special case for the first row per Id, compared with the second row
select Cgood.Id, Cgood.InDate, Cgood.OutDate
from C
join C as Cgood
on Cgood.Id = C.Id
and Cgood.Rownum = 1
and C.Rownum = 2
and DATEDIFF(day, C.OutDate, Cgood.InDate) >= 11
group by Cgood.Id, Cgood.InDate, Cgood.OutDate
union
-- merge: current row's InDate with the next row's OutDate when the gap is under 11 days
select cMerge.Id, C.InDate, cMerge.OutDate
from C
join C as cMerge
on cMerge.Id = C.Id
and cMerge.Rownum = C.Rownum + 1
and DATEDIFF(day, C.OutDate, cMerge.InDate) < 11
group by cMerge.Id, C.InDate, cMerge.OutDate
union
-- the same merge test applied to the first two rows per Id
select cMerge.Id, C.InDate, cMerge.OutDate
from C
join C as cMerge
on cMerge.Id = C.Id
and cMerge.Rownum = 1
and C.Rownum = 2
and DATEDIFF(day, C.OutDate, cMerge.InDate) < 11
group by cMerge.Id, C.InDate, cMerge.OutDate
I solved my own question yesterday. I got rid of the last temp table and incorporated the creation of the dummy variable into the first temp table. The core of the solution was to join backwards as well as forwards.
WITH C AS (SELECT Id, InDate, OutDate, ROW_NUMBER() OVER (PARTITION BY Id ORDER BY InDate) Rownum FROM #t4 t4)
SELECT cur.Rownum, cur.Id, cur.InDate CurInDate, cur.OutDate, nxt.InDate NxtInDate, DATEDIFF(day, cur.OutDate, nxt.InDate) Number_of_days,
CASE
WHEN DATEDIFF(day, prv.OutDate, cur.InDate)<11
AND DATEDIFF(day, prv.OutDate, cur.InDate)>0
THEN 1.0
ELSE 0.0
END AS Overlap,
CASE
WHEN DATEDIFF(day, cur.OutDate, nxt.InDate)<11
AND DATEDIFF(day, cur.OutDate, nxt.InDate)>0
THEN nxt.OutDate
ELSE cur.OutDate
END AS OutDate2
INTO #t4a
FROM C cur
LEFT OUTER JOIN C prv ON (prv.rownum=cur.rownum-1 AND prv.Id=cur.Id)
LEFT OUTER JOIN C nxt ON (nxt.rownum=cur.rownum+1 AND nxt.Id=cur.Id)
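The later stage of dropping the rows flagged with a 1 could then look something like this (a sketch only, using the columns created in #t4a above):
SELECT Id, CurInDate AS InDate, OutDate2 AS OutDate
FROM #t4a
WHERE Overlap = 0.0; -- keep only rows that do not merely continue the previous period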