Joining distinct counts from another table per group - postgresql

I have 2 tables:
table_1
date id_1 name id_2 transaction_id
202116 1 Google 235 ABAF51
202116 1 Google 489 GHH512
202116 1 Google 973 JDDF12
202116 1 Google 1189 HDFTS1
202116 1 Amazon 207 HSDY12
202116 1 Amazon 3329 KFGJD88
202116 1 Amazon 3360 JHTJDS1
202116 1 Facebook 862 SYTAHJ4
table_2
date id_1 name id_2
202116 1 Google 22
202116 1 Google 102
202116 1 Google 104
202116 1 Google 196
202116 1 Amazon 228
202116 1 Facebook 230
202116 1 Google 235
202116 1 Google 240
I am trying to have a table like so:
date id_1 name id_2 transactions
202116 1 Google 22 1
202116 1 Google 102 3
202116 1 Google 104 4
202116 1 Google 196 2
202116 1 Amazon 228 3
202116 1 Facebook 230 7
202116 1 Google 235 3
202116 1 Google 240 2
Where transactions is the DISTINCT COUNT of transaction_id from table_1 per group of (date, id_1, name, id_2), mapped to table_2 and joined on (date, id_1, name, id_2).
So, the idea would be to count distinct transaction_id from table_1 values for
date id_1 name id_2
202116 1 Google 235
And assign that value (let's say 1) to the table_2 column transactions, where:
date id_1 name id_2 transactions
202116 1 Google 235 1
And so on for each combination of date, id_1, name, id_2.
What I've tried:
select jp.date, jp.id_1, jp.name, jp.id_2, count(distinct(transaction_id)) from table_2 jp
left join table_1 using(date, id_1, name, id_2)
group by jp.date,jp.id_1, jp.name, jp.id_2,transaction_id
But it does not give me the correct output.
How can I achieve the desired result?

Without knowing the details of your table structure it's a bit hard, but why don't you solve the problem in two steps:
Count distinct transaction_id from table_1 per (date, id_1, name, id_2) with
with first_selection as (
select date, id_1, name, id_2, count(distinct transaction_id) as nr_transactions
from table_1
group by date, id_1, name, id_2
)
Join the result with table_2:
select t.date,
t.id_1,
t.name,
t.id_2,
fs.nr_transactions
from table_2 t
join first_selection fs
on t.date = fs.date
and t.id_1 = fs.id_1
and t.id_2 = fs.id_2
and t.name = fs.name
with the complete query being:
with first_selection as (
select date, id_1, name, id_2, count(distinct transaction_id) as nr_transactions
from table_1
group by date, id_1, name, id_2
)
select t.date,
t.id_1,
t.name,
t.id_2,
fs.nr_transactions
from table_2 t
join first_selection fs
on t.date = fs.date
and t.id_1 = fs.id_1
and t.id_2 = fs.id_2
and t.name = fs.name
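For what it's worth, the two steps can also be collapsed into a single statement by aggregating table_1 in a derived table and left joining it to table_2, so that (date, id_1, name, id_2) combinations without any transactions still show up with a count of 0. A minimal sketch, assuming the table and column names above:
select t.date,
       t.id_1,
       t.name,
       t.id_2,
       coalesce(agg.transactions, 0) as transactions
from table_2 t
left join (
    -- pre-aggregate table_1 so each key appears once with its distinct count
    select date, id_1, name, id_2,
           count(distinct transaction_id) as transactions
    from table_1
    group by date, id_1, name, id_2
) agg using (date, id_1, name, id_2)
order by t.date, t.id_1, t.name, t.id_2;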

Related

Find Minimum Timestamp From 2 Users POSTGRES

This is my table_gamers:
game_id  user1  user2  timestamp
1        890    123    2022-01-01
2        123    768    2022-02-09
I need to find for each user:
The first user they played.
Their first game ID.
Their MIN timestamp (timestamp from their first game).
This is what I need:
User  User They Played  Game ID  timestamp
890   123               1        2022-01-01
123   890               1        2022-01-01
768   123               2        2022-02-09
This is my query:
SELECT user1 FROM table_gamers WHERE MIN(timestamp)
UNION ALL
SELECT user1 FROM table_gamers WHERE MIN(timestamp)
How do I query each User's First Opponent? I am confused.
Doing it step by step with some WITH clauses:
First, get all matches as user1-user2 and user2-user1.
Second, assign some ids by ordering by timestamp.
Third, get what you want:
with base_data as (
select game_id,user1,user2,timestamp from table_gamers
union all
select game_id,user2,user1,timestamp from table_gamers
),
base_id as (
select
row_number() over (order by base_data.timestamp) as id,
row_number() over (PARTITION by base_data.user1 order by base_data.timestamp) as id_2,
*
from base_data
)
select * from base_id
where id_2 = 1 order by timestamp
results in
id id_2 game_id user1 user2 timestamp
2 1 1 123 890 2022-01-01T00:00:00.000Z
1 1 1 890 123 2022-01-01T00:00:00.000Z
4 1 2 768 123 2022-02-09T00:00:00.000Z
I hope that gives you the right idea.
https://www.db-fiddle.com/f/9PrxioFeVaTmtVcYdteovj/0
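As a side note, the "first row per user" step can also be done with PostgreSQL's DISTINCT ON instead of row_number(); a minimal sketch, assuming the same table_gamers columns:
with base_data as (
    -- every game twice, once from each player's point of view
    select game_id, user1, user2, timestamp from table_gamers
    union all
    select game_id, user2, user1, timestamp from table_gamers
)
select distinct on (user1)
       user1, user2 as first_opponent, game_id, timestamp
from base_data
order by user1, timestamp;  -- DISTINCT ON keeps the earliest row per user1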

Getting duplicate records from 2 sql tables

I have 2 SQL tables
Table #1
account  product  expiry-date
101      prod1    2021-01-30
102      prod2    2021-02-20
103      prod3    2021-03-09
103      prod3    2021-03-19
104      prod4    2021-03-15
105      prod5    2021-04-23
105      prod5    2021-04-24
106      prod6    2021-04-25
Table #2
account
101
106
From the above 2 tables I want to get only unmatched records from Table1 and avoid duplicate records.
Result:
account  product  expiry-date
102      prod2    2021-02-20
103      prod3    2021-03-09
104      prod4    2021-03-15
105      prod5    2021-04-23
I tried the query below, but I am still getting duplicate records because the expiry date differs between rows of the same account. I am getting the records below in my output.
SQL query I tried:
select distinct (a.account, a.product, a.expiry-date)
from table1 a
where a.account not in (select account from table2)
Result:
account  product  expiry-date
102      prod2    2021-02-20
103      prod3    2021-03-09
103      prod3    2021-03-19
104      prod4    2021-03-15
105      prod5    2021-04-23
105      prod5    2021-04-24
You can use the same query with aggregation:
SELECT a.account
,a.product
,MIN(a.expiry) expiry
FROM table1 a
WHERE a.account NOT IN (
SELECT account
FROM table2
)
GROUP BY a.account
,a.product
You can use an anti-join and then ROW_NUMBER(). For example:
select *
from (
select a.*, row_number() over (partition by a.account order by a.expiry) as rn
from table1 a
left join table2 b on b.account = a.account
where b.account is null
) x
where rn = 1
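In PostgreSQL specifically, DISTINCT ON gives the same one-row-per-account result without a window function; a minimal sketch, keeping the expiry column name used in the answers above (a column literally named expiry-date would need double quotes):
select distinct on (a.account)
       a.account, a.product, a.expiry
from table1 a
left join table2 b on b.account = a.account   -- anti-join: drop accounts present in table2
where b.account is null
order by a.account, a.expiry;                 -- keeps the earliest expiry per account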

PostgreSQL: comparing two sets of results does not work

I have a table that contains three columns of ids, clothes, shoes, and customers, and relates them.
I have a query that works fine:
select clothes, shoes from table where customers = 101 (all clothes and shoes of customer 101). This returns
clothes - shoes (SET A)
1 6
1 2
33 12
24 null
Another query that works fine:
select clothes, shoes from table
where customers in
(select customers from table where clothes = 1 and customers <> 101) (all clothes and shoes of any customer other than 101 who has the specified clothes). This returns
shoes - clothes(SET B)
6 null
null 24
1 1
2 1
12 null
null 26
14 null
Now I want to get all clothes and shoes from SET A that are not in SET B.
So (for example) select from SET A where NOT IN SET B. This should return just clothes 33, right?
I tried to convert this to a working query:
select clothes, shoes from table where customers = 101
and
(clothes,shoes) not in
(
select clothes,shoes from
table where customers in
(select customers from table where clothes = 1 and customers <> 101 )
) ;
I tried different syntaxes, but the above looks the most logical.
The problem is I never get clothes 33, just an empty set.
How do I fix this? What is going wrong?
Thanks
Edit: here are the contents of the table
id shoes customers clothes
1 1 1 1
2 1 4 1
3 1 5 1
4 2 2 2
5 2 3 1
6 1 3 1
44 2 101 1
46 6 101 1
49 12 101 33
51 13 102
52 101 24
59 107 51
60 107 24
62 23 108 51
63 23 108 2
93 124 25
95 6 125
98 127 25
100 3 128
103 24 131
104 25 132
105 102 28
106 10 102
107 23 133
108 4 26
109 6 4
110 4 24
111 12 4
112 14 4
116 102 48
117 102 24
118 102 25
119 102 26
120 102 29
122 134 31
The except clause in PostgreSQL works the way the minus operator does in Oracle. I think that will give you what you want.
I think notionally your query looks right, but I suspect those pesky nulls are impacting your results. Just as a null is not NOT equal to 5 (it's nothing, therefore it's neither equal to nor not equal to anything), a null is also not NOT "in" anything...
select clothes, shoes
from table1
where customers = 101
except
select clothes, shoes
from table1
where customers in (
select customers
from table1
where clothes = 1 and customers != 101
)
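Another NULL-safe option is an anti-join with NOT EXISTS plus IS NOT DISTINCT FROM, which treats two NULLs as matching instead of as unknown; a minimal sketch against the same table1:
select a.clothes, a.shoes
from table1 a
where a.customers = 101
  and not exists (
        select 1
        from table1 b
        where b.customers in (
                  select customers from table1 where clothes = 1 and customers <> 101
              )
          -- NULL-safe pair comparison
          and b.clothes is not distinct from a.clothes
          and b.shoes is not distinct from a.shoes
      );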
In PostgreSQL NULL is an unknown value, so you must get rid of potential NULLs in your subquery results:
select id,clothes,shoes from t1 where customers = 101 -- or select id...
and (
clothes not in
(
select COALESCE(clothes,-1) from
t1 where customers in
(select customers from t1 where clothes = 1 and customers <> 101 )
)
OR
shoes not in
(
select COALESCE(shoes,-1) from
t1 where customers in
(select customers from t1 where clothes = 1 and customers <> 101 )
)
)
If you wanted unique pairs, you would use:
select clothes, shoes from t1 where customers = 101
and
(clothes,shoes) not in
(
select coalesce(clothes,-1),coalesce(shoes,-1) from
t1 where customers in
(select customers from t1 where clothes = 1 and customers <> 101 )
) ;
You can't get just "clothes 33" if you are selecting both the clothes and shoes columns...
Also, if you need to know exactly which column, clothes or shoes, was unique to this customer, you might use this little "hack":
select id,clothes,-1 AS shoes from t1 where customers = 101
and
clothes not in
(
select COALESCE(clothes,-1) from
t1 where customers in
(select customers from t1 where clothes = 1 and customers <> 101)
)
UNION
select id,-1,shoes from t1 where customers = 101
and
shoes not in
(
select COALESCE(shoes,-1) from
t1 where customers in
(select customers from t1 where clothes = 1 and customers <> 101)
)
And your result would be:
id=49, clothes=33, shoes=-1
(I assume that there aren't any clothes or shoes with id -1, You may put any exotic value here)
Cheers

Ranking in PostgreSQL

I have a query that looks like this:
select
restaurant_id,
rank() OVER (PARTITION BY restaurant_id order by churn desc) as rank_churn,
churn,
orders,
rank() OVER (PARTITION BY restaurant_id order by orders desc) as rank_orders
from data
I would expect this ranking function to order my data and provide a column with 1, 2, 3, 4 according to the values of the ordering column.
However, the rank in the output is always 1.
restaurant_id rank_churn churn orders rank_orders
2217 1 75 182 1
2249 1 398 896 1
2526 1 11 56 1
2596 1 89 139 1
What am I doing wrong?
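A likely cause, going only by the output shown: PARTITION BY restaurant_id restarts the ranking for every restaurant, and if each restaurant_id has a single row, every partition contains one row and every rank is 1. Ranking the restaurants against each other means dropping the PARTITION BY. A minimal sketch against the same data table:
select
    restaurant_id,
    rank() over (order by churn desc) as rank_churn,   -- rank across all rows
    churn,
    orders,
    rank() over (order by orders desc) as rank_orders
from data;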

PostgreSQL query to display records every 45 days

I have a table that has data of user_id and the timestamp they joined.
If I need to display the data month-wise I could just use:
select
count(user_id),
date_trunc('month',(to_timestamp(users.timestamp))::timestamp)::date
from
users
group by 2
The date_trunc function accepts 'second', 'day', 'week', etc., so I can get data grouped by those periods.
How do I get data grouped by an "n-day" period, say 45 days?
Basically I need to display the number of users per 45-day period.
Any suggestion or guidance is appreciated!
Currently I get:
Date Users
2015-03-01 47
2015-04-01 72
2015-05-01 123
2015-06-01 132
2015-07-01 136
2015-08-01 166
2015-09-01 129
2015-10-01 189
I would like the data to come in 45-day intervals. Something like:
Date Users
2015-03-01 85
2015-04-15 157
2015-05-30 192
2015-07-14 229
2015-08-28 210
2015-10-12 294
UPDATE:
I used the following to get the output, but one problem remains. I'm getting values that are offset.
with
new_window as (
select
generate_series as cohort
, lag(generate_series, 1) over () as cohort_lag
from
(
select
*
from
generate_series('2015-03-01'::date, '2016-01-01', '45 day')
)
t
)
select
--cohort
cohort_lag -- This worked. !!!
, count(*)
from
new_window
join users on
user_timestamp <= cohort
and user_timestamp > cohort_lag
group by 1
order by 1
But the output I am getting is:
Date Users
2015-04-15 85
2015-05-30 157
2015-07-14 193
2015-08-28 225
2015-10-12 210
Basically, the users displayed at 2015-03-01 should be the users between 2015-03-01 and 2015-04-15, and so on.
But each count seems to be labelled with the end of its range rather than the start, i.e. 85 users shown at 2015-04-15, which is not the result I want.
Any help here?
Try this query:
SELECT to_char(i::date,'YYYY-MM-DD') as date, 0 as users
FROM generate_series('2015-03-01', '2015-11-30','45 day'::interval) as i;
OUTPUT :
date users
2015-03-01 0
2015-04-15 0
2015-05-30 0
2015-07-14 0
2015-08-28 0
2015-10-12 0
2015-11-26 0
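To attach the actual user counts to those buckets (including empty ones), the same series can be left joined back to the users table; a minimal sketch, assuming users.timestamp is the epoch value used in the question:
SELECT i::date AS period_start, count(u.user_id) AS users
FROM generate_series('2015-03-01', '2015-11-30', '45 day'::interval) AS i
LEFT JOIN users u
       ON to_timestamp(u.timestamp) >= i
      AND to_timestamp(u.timestamp) < i + interval '45 day'
GROUP BY i
ORDER BY i;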
This looks like a hot mess, and it might be better wrapped in a function where you could use some variables, but would something like this work?
with number_of_intervals as (
select
min (timestamp)::date as first_date,
ceiling (extract (days from max (timestamp) - min (timestamp)) / 45)::int as num
from users
),
intervals as (
select
generate_series(0, num - 1, 1) int_start,
generate_series(1, num, 1) int_end
from number_of_intervals
),
date_spans as (
select
n.first_date + 45 * i.int_start as interval_start,
n.first_date + 45 * i.int_end as interval_end
from
number_of_intervals n
cross join intervals i
)
select
d.interval_start, count (*) as user_count
from
users u
join date_spans d on
u.timestamp >= d.interval_start and
u.timestamp < d.interval_end
group by
d.interval_start
order by
d.interval_start
With this sample data:
User Id timestamp derived range count
1 3/1/2015 3/1-4/15
2 3/26/2015 "
3 4/4/2015 "
4 4/6/2015 " (4)
5 5/6/2015 4/16-5/30
6 5/19/2015 " (2)
7 6/16/2015 5/31-7/14
8 6/27/2015 "
9 7/9/2015 " (3)
10 7/15/2015 7/15-8/28
11 8/8/2015 "
12 8/9/2015 "
13 8/22/2015 "
14 8/27/2015 " (5)
Here is the output:
2015-03-01 4
2015-04-15 2
2015-05-30 3
2015-07-14 5
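A final note: an arbitrary n-day bucketing can also be done with plain integer division on the day offset from an anchor date, with no series or intervals CTE at all. A minimal sketch, assuming users.timestamp is the epoch value from the question and that no user joined before the 2015-03-01 anchor:
select
    -- 45-day bucket start: floor the day offset to a multiple of 45, then add it back
    date '2015-03-01' + 45 * ((to_timestamp(u.timestamp)::date - date '2015-03-01') / 45) as period_start,
    count(*) as users
from users u
group by 1
order by 1;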