How do you find the number of users whose first/last visits are the same website - postgresql

Given a table with the columns timestamp, user_id, country, site_id:
How do you find the number of users whose first/last visits are the same website?
/* unique users first site */
SELECT SWE.timestamp, SWE.site_id, SWE.user_id
FROM SWE
WHERE SWE.timestamp = (
SELECT MIN(t.timestamp)
FROM SWE t
WHERE
t.user_id = SWE.user_id
);
/* unique users last site */
SELECT SWE.timestamp, SWE.site_id, SWE.user_id
FROM SWE
WHERE SWE.timestamp = (
SELECT MAX(t.timestamp)
FROM SWE t
WHERE
t.user_id = SWE.user_id
);
I am not sure how to count the users for whom these two sites are equal.

I'd use the DISTINCT ON clause to pick out the first and last visit for each user, then aggregate over these to check whether they are on the same site. Something like:
WITH first_visits AS (
SELECT DISTINCT ON (user_id) * FROM user_visits
ORDER BY user_id, timestamp
), last_visits AS (
SELECT DISTINCT ON (user_id) * FROM user_visits
ORDER BY user_id, timestamp DESC
)
SELECT user_id,
array_to_string(array_agg(DISTINCT site_id), ', ') AS sites,
MIN(timestamp) AS first_visit, MAX(timestamp) as last_visit
FROM (
SELECT * FROM first_visits
UNION ALL
SELECT * FROM last_visits) x
GROUP BY user_id
HAVING COUNT(DISTINCT site_id) = 1;
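The query above lists the matching users; the question asked for their number. A minimal sketch of the counting step follows, run against SQLite through Python's sqlite3 module so it can be tried without a PostgreSQL server. The table name user_visits follows the answer, while the column name ts and the sample rows are assumptions made for illustration, and ROW_NUMBER() stands in for DISTINCT ON, which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_visits (ts TEXT, user_id INTEGER, country TEXT, site_id INTEGER)"
)
# Hypothetical sample data, not taken from the original question.
conn.executemany(
    "INSERT INTO user_visits VALUES (?, ?, ?, ?)",
    [
        ("2020-01-01 10:00", 1, "SE", 10),
        ("2020-01-02 10:00", 1, "SE", 20),
        ("2020-01-03 10:00", 1, "SE", 10),  # user 1: first site 10, last site 10
        ("2020-01-01 09:00", 2, "SE", 10),
        ("2020-01-05 09:00", 2, "SE", 30),  # user 2: first site 10, last site 30
        ("2020-01-04 12:00", 3, "NO", 40),  # user 3: a single visit
    ],
)

query = """
WITH ranked AS (
    SELECT user_id, site_id,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY ts)      AS rn_first,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY ts DESC) AS rn_last
    FROM user_visits
)
SELECT COUNT(*)
FROM (
    -- keep only each user's first and last visit, then require a single site
    SELECT user_id
    FROM ranked
    WHERE rn_first = 1 OR rn_last = 1
    GROUP BY user_id
    HAVING COUNT(DISTINCT site_id) = 1
) AS same_site_users
"""
same_site_count = conn.execute(query).fetchone()[0]
print(same_site_count)
```

Users with a single visit are counted as matching here; whether that is wanted depends on the requirement.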

Related

How to get number of consecutive days from current date using postgres?

I want to get the number of consecutive days from the current date using Postgres SQL.
Above is the scenario in which I have highlighted consecutive days count should be like this.
Below is the SQL query I wrote, but it does not return the expected result:
with grouped_dates as (
select user_id, created_at::timestamp::date,
(created_at::timestamp::date - (row_number() over (partition by user_id order by created_at::timestamp::date) || ' days')::interval)::date as grouping_date
from watch_history
)
select * , dense_rank() over (partition by grouping_date order by created_at::timestamp::date) as in_streak
from grouped_dates where user_id = 702
order by created_at::timestamp::date
Can anyone please help me resolve this issue?
If there is a way to apply DISTINCT to the created_at field in the query below, that would solve my issue.
WITH list AS
(
SELECT user_id,
(created_at::timestamp::date - (row_number() over (partition by user_id order by created_at::timestamp::date) || ' days')::interval)::date as next_day
FROM watch_history
)
SELECT user_id, count(*) AS number_of_consecutive_days
FROM list
WHERE next_day IS NOT NULL
GROUP BY user_id
Does anyone have an idea how to apply DISTINCT to created_at in the query above?
To get the "number of consecutive days" for the same user_id:
WITH list AS
(
SELECT user_id
, array_agg(created_at) OVER (PARTITION BY user_id ORDER BY created_at RANGE BETWEEN CURRENT ROW AND '1 day' FOLLOWING) AS consecutive_days
FROM watch_history
)
SELECT user_id, count(DISTINCT d.day) AS number_of_consecutive_days
FROM list
CROSS JOIN LATERAL unnest(consecutive_days) AS d(day)
WHERE array_length(consecutive_days, 1) > 1
GROUP BY user_id
To get the list of "consecutive days" for the same user_id:
WITH list AS
(
SELECT user_id
, array_agg(created_at) OVER (PARTITION BY user_id ORDER BY created_at RANGE BETWEEN CURRENT ROW AND '1 day' FOLLOWING) AS consecutive_days
FROM watch_history
)
SELECT user_id
, array_agg(DISTINCT d.day ORDER BY d.day) AS list_of_consecutive_days
FROM list
CROSS JOIN LATERAL unnest(consecutive_days) AS d(day)
WHERE array_length(consecutive_days, 1) > 1
GROUP BY user_id
full example & result in dbfiddle
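As an alternative angle on the DISTINCT problem raised in the question, here is a hedged sketch of the classic gaps-and-islands approach: de-duplicate the dates first, then subtract a row number from each date so that consecutive days share the same group value. It runs on SQLite via Python's sqlite3 so it is self-contained; the sample rows and the fixed "today" parameter are assumptions for reproducibility. In PostgreSQL the same idea would use created_at::date and CURRENT_DATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE watch_history (user_id INTEGER, created_at TEXT)")
# Hypothetical sample data: a lone day, then a three-day run with a duplicate day.
conn.executemany(
    "INSERT INTO watch_history VALUES (?, ?)",
    [
        (702, "2023-03-01 08:00:00"),
        (702, "2023-03-05 09:00:00"),
        (702, "2023-03-06 10:00:00"),
        (702, "2023-03-06 18:30:00"),  # second row on the same day
        (702, "2023-03-07 07:15:00"),
    ],
)

query = """
WITH days AS (
    -- one row per user and calendar day, which removes the duplicates
    SELECT DISTINCT user_id, date(created_at) AS day
    FROM watch_history
),
grouped AS (
    -- consecutive days get the same grp value (day number minus row number)
    SELECT user_id, day,
           CAST(julianday(day) AS INTEGER)
             - ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY day) AS grp
    FROM days
)
SELECT user_id, COUNT(*) AS number_of_consecutive_days
FROM grouped
GROUP BY user_id, grp
HAVING MAX(day) = :today   -- keep only the streak that ends on the given date
"""
streaks = conn.execute(query, {"today": "2023-03-07"}).fetchall()
print(streaks)
```

A user with no row on the given date produces no output row at all, which can be read as a streak of zero.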

SQL Server - Select with Group By together with Row_Number

I'm using SQL Server 2000 (80). So, it's not possible to use the LAG function.
I have a data set with four columns:
Purchase_Date
Facility_no
Seller_id
Sale_id
I need to identify missing Sale_ids. Every Sale_id is strictly sequential, so there should not be any gaps in the order.
This code works for one specific date, facility and seller. But I need it to work on the entire data set, covering every Facility_no and every Seller_id for every Purchase_Date:
declare @MAXCOUNT int
set @MAXCOUNT =
(
select MAX(Sale_Id)
from #table
where
Facility_no in (124) and
Purchase_date = '2/7/2020'
and Seller_id = 1
)
;WITH TRX_COUNT AS
(
SELECT 1 AS Number
union all
select Number + 1 from TRX_COUNT
where Number < @MAXCOUNT
)
select * from TRX_COUNT
where
Number NOT IN
(
select Sale_Id
from #table
where
Facility_no in (124)
and Purchase_Date = '2/7/2020'
and seller_id = 1
)
order by Number
OPTION (maxrecursion 0)
My Dataset
This column:
case when
Sale_Id=0 or 1=Sale_Id-LAG(Sale_Id) over (partition by Facility_no, Purchase_Date, Seller_id order by Sale_Id)
then 'OK' else 'Previous Missing' end
will tell you which Seller_Ids have some sale missing. If you want to go a step further and have exactly your desired output, then filter out and distinct the 'Previous Missing' ones, and join with a tally table on not exists.
Edit: OP mentions in comments they can't use LAG(). My suggestion, then, would be:
Make a temp table that has the max(sale_id) grouped by facility/seller_id
Then you can get your missing results by this pseudocode query:
Select ...
from temptable t
inner join tally N on N.num <= t.maxsale
where not exists( select ... from sourcetable s where s.facility=t.facility and s.seller=t.seller and s.sale=N.num)
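The tally-table idea in the pseudocode above can be tried end to end with the following sketch. It uses SQLite through Python's sqlite3 rather than SQL Server, so the syntax is only an approximation of the T-SQL you would write. The table name sales, the sample rows, and the assumption that Sale_id starts at 1 are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (Purchase_Date TEXT, Facility_no INTEGER, "
    "Seller_id INTEGER, Sale_id INTEGER)"
)
# Hypothetical data: seller 1 is missing sale 3, seller 2 is missing sale 2.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("2020-02-07", 124, 1, 1), ("2020-02-07", 124, 1, 2),
        ("2020-02-07", 124, 1, 4), ("2020-02-07", 124, 1, 5),
        ("2020-02-07", 124, 2, 1), ("2020-02-07", 124, 2, 3),
    ],
)

# Upper bound for the tally: the largest Sale_id anywhere in the table.
max_id = conn.execute("SELECT MAX(Sale_id) FROM sales").fetchone()[0]

query = """
WITH RECURSIVE tally(num) AS (
    SELECT 1
    UNION ALL
    SELECT num + 1 FROM tally WHERE num < :max_id
),
max_sales AS (
    -- highest Sale_id per date / facility / seller
    SELECT Purchase_Date, Facility_no, Seller_id, MAX(Sale_id) AS maxsale
    FROM sales
    GROUP BY Purchase_Date, Facility_no, Seller_id
)
SELECT m.Purchase_Date, m.Facility_no, m.Seller_id, n.num AS missing_sale_id
FROM max_sales m
JOIN tally n ON n.num <= m.maxsale          -- build every expected id ...
WHERE NOT EXISTS (                           -- ... and drop the ones that exist
    SELECT 1 FROM sales s
    WHERE s.Purchase_Date = m.Purchase_Date
      AND s.Facility_no = m.Facility_no
      AND s.Seller_id = m.Seller_id
      AND s.Sale_id = n.num
)
ORDER BY m.Facility_no, m.Seller_id, n.num
"""
missing = conn.execute(query, {"max_id": max_id}).fetchall()
print(missing)
```

Gaps after the highest existing Sale_id of a group cannot be detected this way, since nothing in the data says how far the sequence should run.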
> because the only way to "construct" nonexisting combinations is to construct them all and just remove the existing ones.
This one worked out
; WITH cte_Rn AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY Facility_no, Purchase_Date, Seller_id ORDER BY Sale_id) AS [Rn_Num]
FROM (
SELECT
Facility_no,
Purchase_Date,
Seller_id,
Sale_id
FROM MyTable WITH (NOLOCK)
) a
)
, cte_Rn_0 as (
SELECT
Facility_no,
Purchase_Date,
Seller_id,
Sale_id
-- [Rn_Num] AS 'Skipped Sale'
-- , case when Sale_id = 0 Then [Rn_Num] - 1 Else [Rn_Num] End AS 'Skipped Sale for 0'
, [Rn_Num] - 1 AS 'Skipped Sale for 0'
FROM cte_Rn a
)
SELECT
Facility_no,
Purchase_Date,
Seller_id,
Sale_id,
-- [Skipped Sale],
[Skipped Sale for 0]
FROM cte_Rn_0 a
WHERE NOT EXISTS
(
select * from cte_Rn_0 b
where b.Sale_id = a.[Skipped Sale for 0]
and a.Facility_no = b.Facility_no
and a.Purchase_Date = b.Purchase_Date
and a.Seller_id = b.Seller_id
)
--ORDER BY Purchase_Date ASC

calculate the percentage of users from CTE

I have two queries built on CTEs. The first counts one set of users and the second counts another set. I need to calculate the percentage ratio between the two counts.
Could you suggest how this can be done?
WITH count AS ( SELECT user_id
from users u
where u.status = 'Over'),
users as (Select user_id
from users u
where u.status LIKE 'LR'
and user_id IN (select * from count))
Select COUNT(*) From users
WITH count AS ( SELECT user_id
from users u
where u.description = 'Track'),
users as (Select user_id
from users u
where u.status NOT LIKE 'LR'
and user_id IN (select * from count))
Select COUNT(*) From users
You can do it without a CTE, using a single SELECT with two counts (the 1.0 * avoids integer division):
SELECT 1.0 * count( CASE WHEN description = 'Over' AND status LIKE 'LR' THEN 1 END )
/
count( CASE WHEN description = 'Track' AND status NOT LIKE 'LR' THEN 1 END )
As Ratio
FROM users
With minimal changes, you can just do one bigger CTE:
WITH count_1 AS
(
SELECT user_id
FROM users u
WHERE u.status = 'Over'
),
users_1 AS
(
SELECT user_id
FROM users u
WHERE u.status LIKE 'LR'
AND user_id IN (SELECT user_id FROM count_1)
),
count_2 AS
(
SELECT user_id
FROM users u
WHERE u.description = 'Track'
),
users_2 AS
(
SELECT user_id
FROM users u
WHERE u.status NOT LIKE 'LR'
AND user_id IN (select user_id from count_2)
)
SELECT
CAST( (SELECT count(*) FROM users_1) AS FLOAT) /
(SELECT count(*) FROM users_2) AS ratio
NOTE 1: The query as posted doesn't make sense, so I guess there is a misspelling or some columns got mixed up. count_1 selects users with status = 'Over', and users_1 then keeps only those that also have status = 'LR', so the result is always zero.
NOTE 2: You wouldn't make queries this way... The following query means exactly the same, and is much simpler (and faster):
WITH
count_1 AS
(
SELECT count(user_id) AS c
FROM users u
WHERE u.description = 'Over'
AND u.status = 'LR'
),
count_2 AS
(
SELECT count(user_id) AS c
FROM users u
WHERE u.description = 'Track'
AND u.status <> 'LR'
)
SELECT
(count_1.c + 0.0) / count_2.c AS ratio
FROM
count_1, count_2 ;
Yet another version, using FILTER (again with 1.0 * to avoid integer division):
SELECT 1.0 * count(*) FILTER (WHERE description = 'Over' AND status LIKE 'LR')
/
count(*) FILTER (WHERE description = 'Track' AND status NOT LIKE 'LR')
As Ratio
FROM users
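Two details are easy to miss with these ratio queries: dividing one integer count by another truncates in several databases, and a zero denominator raises an error or yields nothing useful. The sketch below shows one way to handle both, assuming the intended conditions are on description and status as the answers guess. It runs on SQLite via Python's sqlite3, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, description TEXT, status TEXT)")
# Hypothetical data: 2 rows match Over/LR, 4 rows match Track/not-LR.
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "Over", "LR"), (2, "Over", "LR"), (3, "Over", "XX"),
        (4, "Track", "XX"), (5, "Track", "AB"), (6, "Track", "CD"),
        (7, "Track", "EF"), (8, "Track", "LR"),
    ],
)

query = """
SELECT 100.0 * COUNT(CASE WHEN description = 'Over'  AND status =  'LR' THEN 1 END)
       / NULLIF(COUNT(CASE WHEN description = 'Track' AND status <> 'LR' THEN 1 END), 0)
       AS pct
FROM users
"""
# 100.0 forces floating-point arithmetic; NULLIF turns a zero denominator into NULL.
pct = conn.execute(query).fetchone()[0]
print(pct)
```

When the denominator is zero the result is NULL (None in Python) instead of an error, which the caller then has to handle.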

selecting only two employees from every department

Can you let me know how to select only two employees from every department? The table has the columns deptname, ssn and name. I am doing a sampling and I need only two ssns for every department name. Can someone help?
You can accomplish this with an "OLAP expression", row_number():
with e as
( select deptname, ssn, empname,
row_number() over (partition by deptname order by empname) as pick
from employees
)
select deptname, ssn, empname
from e
where pick < 3
order by deptname, ssn
This example will give you the two employees whose names sort first within each department, because that is the order specified in the row_number() (order by) expression.
Try this:
select *
from t t1
where (
select count(*)
from t t2
where
t2.deptname = t1.deptname
and
t2.ssn <= t1.ssn) <= 2
order by deptname, ssn,name;
The above will give "smallest" two ssn.
If you want top 2, change to t2.ssn >= t1.ssn
select * from
( select rank() over (partition by deptname order by empname) as rnk , e.*
from employees e
) ranked
where rnk<=2
order by deptname, ssn, empname;
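The answers above differ in one respect worth seeing on data: rank() gives tied rows the same number and can therefore return more than two rows per department, while row_number() never does. Here is a small sketch of the row_number() variant, run on SQLite via Python's sqlite3. The employees rows are invented, and ordering by ssn is an arbitrary choice made for the sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (deptname TEXT, ssn TEXT, name TEXT)")
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Sales", "111", "Ann"), ("Sales", "222", "Bob"), ("Sales", "333", "Cid"),
        ("IT", "444", "Dee"), ("IT", "555", "Eve"), ("IT", "666", "Fay"),
        ("HR", "777", "Gus"),  # a department with only one employee
    ],
)

query = """
WITH e AS (
    SELECT deptname, ssn,
           ROW_NUMBER() OVER (PARTITION BY deptname ORDER BY ssn) AS pick
    FROM employees
)
SELECT deptname, ssn
FROM e
WHERE pick <= 2          -- at most two rows per department
ORDER BY deptname, ssn
"""
sample = conn.execute(query).fetchall()
print(sample)
```

For a true random sample, the ORDER BY inside the window could use the database's random function instead of ssn.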

Difficult query (DB2)

Suppose I have a table called spitems with the following fields:
spitemid (unique key)
modifiedon (timestamp)
parentid
a number of other insignificant fields
What I want to retrieve is the spitems row with the latest modifiedon timestamp for each parentid.
However, be aware that the modifiedon timestamp is not unique, so it is possible that for one parent id, there are two spitemids with the same modifiedon timestamp. In that case, I need one of these two spitemids listed, I don't care which one.
So to be clear: the list I return should contain all the parentids once and only once.
update
meeting over, here is my shot:
select *
from spitems
where spitemid in
(select max(t.spitemid)
from spitems t
join
(select parentid, max(modifiedon) as d from spitems group by parentid) inlist
on t.parentid = inlist.parentid and t.modifiedon = inlist.d
group by t.parentid, t.modifiedon
)
old entry
not sure if this is different on DB2, here it is for sql server.
select *
from table
join (select parentid, max(modifiedon) as d from table group by parentid) as toplist on
table.parentid = toplist.parentid and table.modifiedon = toplist.d
hmm... this will return more than one for the dups... can't fix it now, have to go to a meeting.
Based on your requirements, the following should get you the latest items.
SELECT t1.*
FROM spitems t1
INNER JOIN (
SELECT MAX(t1.spitemid) AS spitemid
FROM spitems t1
INNER JOIN (
SELECT parentid, MAX(modifiedon) AS modifiedon
FROM spitems
GROUP BY parentid
) t2 ON t2.parentid = t1.parentid
AND t2.modifiedon = t1.modifiedon
GROUP BY t1.parentid, t1.modifiedon
) t2 ON t2.spitemid = t1.spitemid
You can do it with two nested subqueries. The first gets max modifiedon for each parentid, and then the second gets max spitemid for each parentid/modifiedon group.
SELECT *
FROM spitems
WHERE spitemid IN
(
SELECT MAX(s.spitemid)
FROM spitems s
JOIN (
SELECT parentid, MAX(modifiedon) AS modifiedon
FROM spitems
GROUP BY parentid
) A ON A.parentid = s.parentid AND A.modifiedon = s.modifiedon
GROUP BY s.parentid, s.modifiedon
)
A common table expression will give you the opportunity to number the rows before you issue the final SELECT.
WITH items AS
(
SELECT spitemid, parentid, modifiedon,
ROWNUMBER() OVER (PARTITION BY parentid ORDER BY modifiedon DESC) AS rnum
FROM yourTable
)
SELECT spitemid, parentid, modifiedon FROM items WHERE rnum = 1
;
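To see how the row-numbering approach behaves when two rows share the latest modifiedon, here is a small sketch on SQLite via Python's sqlite3 (not DB2, so ROW_NUMBER() is spelled with the underscore). The sample rows are invented, and the extra spitemid DESC sort key is an added assumption that makes the choice between tied rows deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spitems (spitemid INTEGER, modifiedon TEXT, parentid INTEGER)")
# Hypothetical data: items 2 and 3 tie on the latest timestamp for parent 100.
conn.executemany(
    "INSERT INTO spitems VALUES (?, ?, ?)",
    [
        (1, "2021-01-01 10:00:00", 100),
        (2, "2021-01-02 10:00:00", 100),
        (3, "2021-01-02 10:00:00", 100),
        (4, "2021-01-01 09:00:00", 200),
    ],
)

query = """
WITH items AS (
    SELECT spitemid, parentid, modifiedon,
           ROW_NUMBER() OVER (
               PARTITION BY parentid
               ORDER BY modifiedon DESC, spitemid DESC  -- tie-breaker on spitemid
           ) AS rnum
    FROM spitems
)
SELECT spitemid, parentid, modifiedon
FROM items
WHERE rnum = 1
ORDER BY parentid
"""
latest = conn.execute(query).fetchall()
print(latest)
```

Each parentid appears exactly once, and the tie for parent 100 is resolved in favour of the higher spitemid.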
SELECT sr.receiving_id, sc.collection_id FROM stock_collection as sc, stock_requisation as srq, stock_receiving as sr WHERE (sc.stock_id = '" & strStockID & "' AND sc.datemm_issued = '" & strMM & "' AND sc.qty_issued >= 0 AND sc.collection_id = srq.requisition_id AND srq.active_status = 'Active') OR (sr.stock_id = '" & strStockID & "' AND sr.datemm_received = '" & strMM & "' AND sr.qty_received >= 0)