Count With Conditional on PostgreSQL - postgresql

I have a table of people and another of visits. I want to count all visits, but if the person signed up with 'emp' or 'oth' in ref_signup, their first visit should be excluded.
These are my tables:
PEOPLE:
id | ref_signup
---------------------
20 | emp
30 | oth
23 | fri
VISITS
id | date
-------------------------
20 | 10-01-2019
20 | 10-05-2019
23 | 10-09-2019
23 | 10-10-2019
30 | 09-10-2019
30 | 10-07-2019
In this example the visit count should be 4: the people with ids 20 and 30 have ref_signup set to emp or oth, so their first visit is excluded and only the second visit onward is counted.
This is what I have as a query:
SELECT COUNT(*) as visit_count FROM visits
LEFT JOIN people ON people.id = visits.people_id
WHERE visits.group_id = 1
Would using a CASE inside the COUNT help here? I only want to remove one visit, not all of the person's visits.

Subtract from COUNT(*) the distinct number of person.ids with person.ref_signup IN ('emp', 'oth'):
SELECT
COUNT(*) -
COUNT(DISTINCT CASE WHEN p.ref_signup IN ('emp', 'oth') THEN p.id END) as visit_count
FROM visits v LEFT JOIN people p
ON p.id = v.id
See the demo.
Result:
| visit_count |
| ----------- |
| 4 |
Note: this code and demo fiddle use the column names of your sample data.
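To check the arithmetic, here is a minimal sketch that runs the same query against the sample data. It assumes SQLite (via Python's sqlite3 module) as a stand-in for PostgreSQL, and it renames the date column to visit_date; both are choices made for this illustration only.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the query uses only syntax
# that both engines accept. visit_date is a renamed copy of the date column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, ref_signup TEXT);
    CREATE TABLE visits (id INTEGER, visit_date TEXT);
    INSERT INTO people VALUES (20, 'emp'), (30, 'oth'), (23, 'fri');
    INSERT INTO visits VALUES
        (20, '2019-10-01'), (20, '2019-10-05'),
        (23, '2019-10-09'), (23, '2019-10-10'),
        (30, '2019-09-10'), (30, '2019-10-07');
""")

# All visits, minus one per distinct emp/oth person who has any visit.
QUERY = """
    SELECT COUNT(*)
         - COUNT(DISTINCT CASE WHEN p.ref_signup IN ('emp', 'oth') THEN p.id END)
    FROM visits v
    LEFT JOIN people p ON p.id = v.id
"""
visit_count = conn.execute(QUERY).fetchone()[0]
print(visit_count)
```

Because the join starts from visits, an emp or oth person with no visits at all contributes nothing to either count, so nothing is subtracted for them.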

The idea: select the count of visits for each person, along with a synthetic column that contains 1 if the referral was emp or oth and 0 otherwise. Then select the sum of the counts minus the sum of that column.
SELECT SUM(count) - SUM(ignore_first) AS visit_count
FROM (SELECT COUNT(*) AS count, CASE WHEN people.ref_signup IN ('emp', 'oth') THEN 1 ELSE 0 END AS ignore_first
FROM visits
LEFT JOIN people ON people.id = visits.people_id
WHERE visits.group_id = 1 GROUP BY people.id, people.ref_signup) a

Where is people_id in your example?
SELECT COUNT(*) as visit_count
FROM visits v
JOIN people p ON p.id = v.people_id
WHERE p.ref_signup IN ('emp','oth');
Regarding "then remove the first visit": you cannot select the count and delete the first visit at the same time.
DELETE FROM visits
WHERE id IN (
SELECT v.id
FROM visits v
JOIN people p ON p.id = v.people_id
WHERE p.ref_signup IN ('emp','oth')
ORDER BY v.id
LIMIT 1
);

First, I create the tables
create table people (id int primary key, ref_signup varchar(3));
insert into people (id, ref_signup) values (20, 'emp'), (30, 'oth'), (23, 'fri');
create table visits (people_id int not null, visit_date date not null);
insert into visits (people_id, visit_date) values (20, '10-01-2019'), (20, '10-05-2019'), (23, '10-09-2019'), (23, '10-10-2019'), (30, '09-10-2019'), (30, '10-07-2019');
You can use the row_number() window function to mark which visit is "visit number one":
select
*,
row_number() over (partition by people_id order by visit_date) as visit_num
from people
join visits
on people.id = visits.people_id
Once you have that, you can run another query on those results and use the filter clause to count only the rows where visit_num > 1 or ref_signup is neither 'emp' nor 'oth':
-- wrap the first query in a WITH clause
with joined_visits as (
select
*,
row_number() over (partition by people_id order by visit_date) as visit_num
from people
join visits
on people.id = visits.people_id
)
select count(1) filter (where visit_num > 1 or ref_signup not in ('emp', 'oth'))
from joined_visits;
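A runnable sketch of the row-numbering idea follows. It assumes SQLite (3.25 or newer, for window functions) as a stand-in for PostgreSQL, swaps the filter clause for a portable SUM(CASE ...), and adds a hypothetical person 41 with ref_signup 'web' to show that the keep-condition is written as NOT IN ('emp', 'oth') rather than tied to 'fri'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, ref_signup TEXT);
    CREATE TABLE visits (people_id INTEGER NOT NULL, visit_date TEXT NOT NULL);
    -- Person 41 ('web') is a hypothetical addition to the sample data.
    INSERT INTO people VALUES (20, 'emp'), (30, 'oth'), (23, 'fri'), (41, 'web');
    INSERT INTO visits VALUES
        (20, '2019-10-01'), (20, '2019-10-05'),
        (23, '2019-10-09'), (23, '2019-10-10'),
        (30, '2019-09-10'), (30, '2019-10-07'),
        (41, '2019-10-02');
""")

# Number each person's visits by date, then keep a row when it is not the
# first visit or when the person is not an emp/oth signup.
QUERY = """
    WITH joined_visits AS (
        SELECT p.ref_signup,
               ROW_NUMBER() OVER (PARTITION BY v.people_id
                                  ORDER BY v.visit_date) AS visit_num
        FROM people p
        JOIN visits v ON p.id = v.people_id
    )
    SELECT SUM(CASE WHEN visit_num > 1
                      OR ref_signup NOT IN ('emp', 'oth') THEN 1 ELSE 0 END)
    FROM joined_visits
"""
visit_count = conn.execute(QUERY).fetchone()[0]
print(visit_count)
```

With the extra 'web' visit the total is the original 4 plus 1; a condition of ref_signup = 'fri' would have dropped that visit.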

-- First get the corrected counts for all users
WITH grouped_visits AS (
SELECT
COUNT(visits.*) -
CASE WHEN people.ref_signup IN ('emp', 'oth') THEN 1 ELSE 0 END
AS visit_count
FROM visits
INNER JOIN people ON (people.id = visits.id)
GROUP BY people.id, people.ref_signup
)
-- Then sum them
SELECT SUM(visit_count)
FROM grouped_visits;
This should give you the result you're looking for.
On a side note, I can't help but think clever use of a window function could do this in a single shot without the CTE.
EDIT: No, it can't since window functions run after needed WHERE and GROUP BY and HAVING clauses.

Related

Postgresql, combine different columns counts into one result?

I have a car table with is_sold and is_shipped columns. Each car belongs to a dealership through dealership_id (FK).
I want to run a query that tells me the count of sold cars and the count of shipped cars for a given dealership all in one result.
sold_count | shipped_count
10 | 4
The single queries I have look like this:
select count(*) as sold_count
from car
where dealership_id=25 and is_sold=true;
and
select count(*) as shipped_count
from car
where dealership_id=25 and is_shipped=true;
How do I combine the two to get both counts in one result?
This will do:
select dealership_id,
sum(case when is_sold is true then 1 else 0 end) as sold_count,
sum(case when is_shipped is true then 1 else 0 end) as shipped_count
from car where dealership_id = 25 group by dealership_id;
You can use the FILTER clause of an aggregate function (see demo):
select dealership_id
, count(*) filter (where is_sold) cars_sold
, count(*) filter (where is_shipped) cars_shipped
from car
where dealership_id = 25
group by dealership_id;
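The conditional-aggregation idea can be tried end to end with the sketch below. It assumes SQLite as a stand-in for PostgreSQL, made-up rows for the car table, and a helper named counts_for that exists only for this illustration. COUNT over a CASE with no ELSE is used because it returns 0 rather than NULL when nothing matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical rows: four cars at dealership 25, one at dealership 7.
    CREATE TABLE car (id INTEGER PRIMARY KEY, dealership_id INTEGER,
                      is_sold INTEGER, is_shipped INTEGER);
    INSERT INTO car (dealership_id, is_sold, is_shipped) VALUES
        (25, 1, 1), (25, 1, 0), (25, 1, 0), (25, 0, 0), (7, 1, 1);
""")

# CASE yields NULL when the flag is false, and COUNT skips NULLs,
# so each column counts only the rows where its flag is set.
QUERY = """
    SELECT COUNT(CASE WHEN is_sold THEN 1 END)    AS sold_count,
           COUNT(CASE WHEN is_shipped THEN 1 END) AS shipped_count
    FROM car
    WHERE dealership_id = ?
"""

def counts_for(dealership_id):
    """Return (sold_count, shipped_count) for one dealership."""
    return conn.execute(QUERY, (dealership_id,)).fetchone()

print(counts_for(25))
```

Leaving out GROUP BY means the query always returns exactly one row, even for a dealership with no cars.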
You can also use a cross join.
select 'hello' as col1, 'world' as col2;
return:
col1 | col2
-------+-------
hello | world
(1 row)
similarly,
with a as
(
select count(*) as a1 from emp where empid> 5),
b as (
select count(*) as a2 from emp where salary > 6000)
select * from a, b;
You can even apply this to different tables, like:
with a as
(select count(*) as a1 from emp where empid> 5),
b as
(select count(*) as a2 from ab )
select * from a, b;
Applied to your question:
with a as
(
select count(*) as sold_count
from car
where dealership_id=25 and is_sold=true
),
b as
(
select count(*) as shipped_count
from car
where dealership_id=25 and is_shipped=true
)
select * from a, b;
Further reading:
https://www.postgresql.org/docs/current/queries-table-expressions.html
https://stackoverflow.com/a/26369295/15603477

How to find the latest date and price>0 per id?

In PostgreSQL 13, I have a table of ids, dates, and prices.
I simply want the latest date where the price is greater than 0 for each id.
One row per id.
The desired output is:
id | the_date | price
1 2013-08-09 0.45
2 2013-08-11 0.34
I have an SQL fiddle at this link :
https://dbfiddle.uk/?rdbms=postgres_13&fiddle=a89bbbc922601be5465ad764fd035161
I have tried an INNER JOIN with the MAX date unsuccessfully.
SELECT DISTINCT ON (id)
id, the_date, price
FROM inventory
WHERE price>0
ORDER BY id ASC, the_date DESC
You can do something like this:
select i.id, i.the_date, i.price
from inventory as i, (
select id, max(the_date) as max_date
from inventory
where price > 0
group by id
) as c where c.id = i.id and i.the_date = c.max_date
Demo in dbfiddle.uk
This might work for you.
SELECT inventory.id, the_date, price
FROM inventory
join (select id,max(the_date) md from inventory where price>0 group by id ) d
on inventory.id=d.id and the_date=d.md
If you also want a row for ids that have no price, you'd use a left join.
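A small sketch of the join-on-max approach, for anyone who wants to experiment with it. It assumes SQLite as a stand-in for PostgreSQL, and the inventory rows are invented for illustration (the fiddle's data is not reproduced here); they are chosen so that each id's most recent row has a price of 0 and must be skipped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical rows; id 3 has no price above 0 at all.
    CREATE TABLE inventory (id INTEGER, the_date TEXT, price REAL);
    INSERT INTO inventory VALUES
        (1, '2013-08-08', 0.50), (1, '2013-08-09', 0.45), (1, '2013-08-10', 0),
        (2, '2013-08-10', 0.30), (2, '2013-08-11', 0.34), (2, '2013-08-12', 0),
        (3, '2013-08-09', 0);
""")

# The subquery finds the latest priced date per id; the join pulls back
# the full row for that (id, date) pair.
QUERY = """
    SELECT i.id, i.the_date, i.price
    FROM inventory AS i
    JOIN (
        SELECT id, MAX(the_date) AS max_date
        FROM inventory
        WHERE price > 0
        GROUP BY id
    ) AS c ON c.id = i.id AND c.max_date = i.the_date
    ORDER BY i.id
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

This relies on (id, the_date) being unique; if an id can have several rows on the same date, the join returns all of them.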

Cascading sum hierarchy using recursive cte

I'm trying to write a recursive CTE in Postgres but I can't wrap my head around it. Performance shouldn't be an issue, as there are only 50 items in TABLE 1.
TABLE 1 (expense):
id | parent_id | name
------------------------------
1 | null | A
2 | null | B
3 | 1 | C
4 | 1 | D
TABLE 2 (expense_amount):
ref_id | amount
-------------------------------
3 | 500
4 | 200
Expected Result:
id, name, amount
-------------------------------
1 | A | 700
2 | B | 0
3 | C | 500
4 | D | 200
Query
WITH RECURSIVE cte AS (
SELECT
expense.id,
expense.name,
expense.parent_id,
expense_amount.amount
FROM expense
LEFT JOIN expense_amount ON expense_amount.ref_id = expense.id
WHERE expense.parent_id IS NULL
UNION ALL
SELECT
expense.id,
expense.name,
expense.parent_id,
expense_amount.amount
FROM cte
JOIN expense ON expense.parent_id = cte.id
LEFT JOIN expense_amount ON expense_amount.ref_id = expense.id
)
SELECT
id,
SUM(amount)
FROM cte
GROUP BY 1
ORDER BY 1
Results
id | sum
--------------------
1 | null
2 | null
3 | 500
4 | 200
You can do a conditional sum() for only the root row:
with recursive tree as (
select id, parent_id, name, id as root_id
from expense
where parent_id is null
union all
select c.id, c.parent_id, c.name, p.root_id
from expense c
join tree p on c.parent_id = p.id
)
select e.id,
e.name,
e.root_id,
case
when e.id = e.root_id then sum(ea.amount) over (partition by root_id)
else amount
end as amount
from tree e
left join expense_amount ea on e.id = ea.ref_id
order by id;
I prefer doing the recursive part first, then join the related tables to the result of the recursive query, but you could do the join to the expense_amount also inside the CTE.
Online example: http://rextester.com/TGQUX53703
However, the above only aggregates on the top-level parent, not for any intermediate non-leaf rows.
If you want to see intermediate aggregates as well, this gets a bit more complicated (and is probably not very scalable for large results, but you said your tables aren't that big)
with recursive tree as (
select id, parent_id, name, 1 as level, concat('/', id) as path, null::numeric as amount
from expense
where parent_id is null
union all
select c.id, c.parent_id, c.name, p.level + 1, concat(p.path, '/', c.id), ea.amount
from expense c
join tree p on c.parent_id = p.id
left join expense_amount ea on ea.ref_id = c.id
)
select e.id,
lpad(' ', (e.level - 1) * 2, ' ')||e.name as name,
e.amount as element_amount,
(select sum(amount)
from tree t
where t.path = e.path or t.path like e.path||'/%') as sub_tree_amount,
e.path
from tree e
order by path;
Online example: http://rextester.com/MCE96740
The query builds up a path of all IDs belonging to a (sub)tree and then uses a scalar sub-select to get all child rows belonging to a node. That sub-select is what will make this quite slow as soon as the result of the recursive query can't be kept in memory.
I used the level column to create a "visual" display of the tree structure; this helps me debug the statement and understand the result better. If you need the real name of an element in your program, you would obviously use only e.name instead of prepending blanks to it.
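As an alternative to matching on a path string, the subtree totals can also be computed from (ancestor, descendant) pairs. The sketch below is that swapped-in technique, not the query above. It assumes SQLite as a stand-in for PostgreSQL and uses the sample tables from the question; the CTE name closure and its column names are made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE expense (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    CREATE TABLE expense_amount (ref_id INTEGER, amount INTEGER);
    INSERT INTO expense VALUES (1, NULL, 'A'), (2, NULL, 'B'), (3, 1, 'C'), (4, 1, 'D');
    INSERT INTO expense_amount VALUES (3, 500), (4, 200);
""")

# closure pairs every node with itself and with each of its descendants;
# summing the descendants' amounts gives the total for each subtree.
QUERY = """
    WITH RECURSIVE closure(ancestor_id, descendant_id) AS (
        SELECT id, id FROM expense
        UNION ALL
        SELECT c.ancestor_id, e.id
        FROM closure c
        JOIN expense e ON e.parent_id = c.descendant_id
    )
    SELECT e.id, e.name, COALESCE(SUM(ea.amount), 0) AS amount
    FROM expense e
    JOIN closure c ON c.ancestor_id = e.id
    LEFT JOIN expense_amount ea ON ea.ref_id = c.descendant_id
    GROUP BY e.id, e.name
    ORDER BY e.id
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Every level gets its own total this way, including intermediate non-leaf rows, and COALESCE turns the empty sum for B into 0 as in the expected result.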
I could not get your query to work for some reason. Here's my attempt that works for the particular table you provided (parent-child, no grandchild) without recursion. SQL Fiddle
--- step 1: get parent-child data together
with parent_child as(
select t.*, amount
from
(select e.id, f.name as name,
coalesce(f.name, e.name) as pname
from expense e
left join expense f
on e.parent_id = f.id) t
left join expense_amount ea
on ea.ref_id = t.id
)
--- final step is to group by id, name
select id, pname, sum(amount)
from
(-- step 2: group by parent name and find corresponding amount
-- returns A, B
select e.id, t.pname, t.amount
from expense e
join (select pname, sum(amount) as amount
from parent_child
group by 1) t
on t.pname = e.name
-- step 3: to get C, D we union and get corresponding columns
-- results in all rows and corresponding value
union
select id, name, amount
from expense e
left join expense_amount ea
on e.id = ea.ref_id
) t
group by 1, 2
order by 1;

How to get latest value from table with self inner join

Please see http://sqlfiddle.com/#!6/9254d/3/0
I have two tables, Person and Values, PersonID is the link between them. Each person in the Values table has multiple values per day for every hour. I need to get the latest value for each user. I had a look on SO and what I could find was to get MAX(ValueDate) and then join on that but doesn't work. Join on PersonID didn't work either, not sure what else to try.
The output I need is
Name Value
1fn 1ln 2
2fn 2ln 20
3fn 3ln 200
I don't need the greatest value, I need the latest value for each person. Please share if you have any ideas. Thanks.
Try this:
SQLFIDDLEExample
DECLARE @Org nvarchar(3)
SELECT @Org = 'aaa'
DECLARE @MyDate date
SELECT @MyDate = CONVERT(date, '2014-09-12')
SELECT a.Name,
a.Value as Revenue
FROM(
SELECT p.FName + ' ' + p.LName AS Name,
vt.Value,
ROW_NUMBER()OVER(PARTITION BY vt.PersonID ORDER BY vt.ValueDate desc) as rnk
FROM Person p
LEFT JOIN ValueTable vt
ON vt.PersonID = p.PersonID
WHERE vt.ValueDate < DATEADD(day,1,@MyDate)
AND vt.ValueDate >= @MyDate
AND vt.Org = @Org)a
WHERE a.rnk = 1
ORDER BY a.Name ASC
Result:
| NAME | REVENUE |
|---------|---------|
| 1fn 1ln | 2 |
| 2fn 2ln | 20 |
| 3fn 3ln | 200 |
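The ranking step can be isolated in a small runnable sketch. It assumes SQLite (3.25 or newer, for window functions) rather than SQL Server, so string concatenation uses || instead of +, and the rows are invented so that each person's latest value is smaller or larger than an earlier one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical rows: two hourly values per person on one day.
    CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, FName TEXT, LName TEXT);
    CREATE TABLE ValueTable (PersonID INTEGER, ValueDate TEXT, Value INTEGER);
    INSERT INTO Person VALUES (1, '1fn', '1ln'), (2, '2fn', '2ln'), (3, '3fn', '3ln');
    INSERT INTO ValueTable VALUES
        (1, '2014-09-12 08:00', 5),   (1, '2014-09-12 09:00', 2),
        (2, '2014-09-12 08:00', 50),  (2, '2014-09-12 09:00', 20),
        (3, '2014-09-12 08:00', 100), (3, '2014-09-12 09:00', 200);
""")

# Rank each person's values newest first and keep only rank 1.
QUERY = """
    SELECT name, latest_value
    FROM (
        SELECT p.FName || ' ' || p.LName AS name,
               vt.Value AS latest_value,
               ROW_NUMBER() OVER (PARTITION BY vt.PersonID
                                  ORDER BY vt.ValueDate DESC) AS rnk
        FROM Person p
        JOIN ValueTable vt ON vt.PersonID = p.PersonID
    ) AS ranked
    WHERE rnk = 1
    ORDER BY name
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Persons 1 and 2 have a larger earlier value, so the result shows that the latest row is picked, not the greatest one.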

Query to get last conversations for user inbox

I need an SQL query that selects the last 10 conversations for a user's inbox.
The inbox shows only conversations (threads) with each user: it selects the last message from each conversation and shows it in the inbox.
Expected result: extract the latest message from each of the 10 latest conversations. Facebook shows the latest conversations in the same way.
One more question: how can I paginate, so that the next page shows the latest messages of the next 10 conversations?
Private messages in the database look like:
| id | user_id | recipient_id | text
| 1 | 2 | 3 | Hi John!
| 2 | 3 | 2 | Hi Tom!
| 3 | 2 | 3 | How are you?
| 4 | 3 | 2 | Thanks, good! You?
As I understand it, you need the latest message of each conversation on a per-user basis (for the 10 latest conversations).
Update: I have modified the query to get the latest_conversation_message_id for every user conversation.
The query below gets the details for user_id = 2; change users.id = 2 to get them for any other user.
SQLFiddle; I hope this serves your purpose.
SELECT
user_id,
users.name,
users2.name as sent_from_or_sent_to,
subquery.text as latest_message_of_conversation
FROM
users
JOIN
(
SELECT
text,
row_number() OVER ( PARTITION BY LEAST(user_id, recipient_id), GREATEST(user_id, recipient_id) ORDER BY id DESC) AS row_num,
user_id,
recipient_id,
id
FROM
private_messages
GROUP BY
id,
recipient_id,
user_id,
text
) AS subquery ON ( ( subquery.user_id = users.id OR subquery.recipient_id = users.id) AND row_num = 1 )
JOIN users as users2 ON ( users2.id = CASE WHEN users.id = subquery.user_id THEN subquery.recipient_id ELSE subquery.user_id END )
WHERE
users.id = 2
ORDER BY
subquery.id DESC
LIMIT 10
Info: The query gets the latest message of every conversation with any other user. If user_id 2 sends a message to user_id 3, that is displayed too, as it indicates the start of a conversation.
To solve groupwise-max in pg you can use DISTINCT ON. Like this:
SELECT
DISTINCT ON(pm.user_id)
pm.user_id,
pm.text
FROM
private_messages AS pm
WHERE pm.recipient_id= <my user id>
ORDER BY pm.user_id, pm.id DESC;
http://sqlfiddle.com/#!12/4021d/19
To get the latest X however we will have to use it in a subselect:
SELECT
q.user_id,
q.id,
q.text
FROM
(
SELECT
DISTINCT ON(pm.user_id)
pm.user_id,
pm.id,
pm.text
FROM
private_messages AS pm
WHERE pm.recipient_id=2
ORDER BY pm.user_id, pm.id DESC
) AS q
ORDER BY q.id DESC
LIMIT 10;
http://sqlfiddle.com/#!12/4021d/28
To get both sent and received threads:
SELECT
q.user_id,
q.recipient_id,
q.id,
q.text
FROM
(
SELECT
DISTINCT ON(pm.user_id,pm.recipient_id)
pm.user_id,
pm.recipient_id,
pm.id,
pm.text
FROM
private_messages AS pm
WHERE pm.recipient_id=2 OR pm.user_id=2
ORDER BY pm.user_id,pm.recipient_id, pm.id DESC
) AS q
ORDER BY q.id DESC
LIMIT 10;
http://sqlfiddle.com/#!12/4021d/42
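One way to treat a message from 2 to 3 and a reply from 3 to 2 as the same thread is to group on "the other party" instead of on the (sender, recipient) pair. The sketch below is a swapped-in technique, not the DISTINCT ON queries above. It assumes SQLite as a stand-in for PostgreSQL, adds two hypothetical messages (ids 5 and 6) to the question's data, and uses a helper named inbox plus plain LIMIT/OFFSET paging, all invented for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE private_messages (
        id INTEGER PRIMARY KEY, user_id INTEGER, recipient_id INTEGER, text TEXT);
    INSERT INTO private_messages VALUES
        (1, 2, 3, 'Hi John!'),
        (2, 3, 2, 'Hi Tom!'),
        (3, 2, 3, 'How are you?'),
        (4, 3, 2, 'Thanks, good! You?'),
        (5, 2, 4, 'Hey Ann'),   -- hypothetical extra conversation
        (6, 5, 2, 'Ping');      -- hypothetical extra conversation
""")

# Group the user's messages by the other participant, take the highest
# message id per group, then page through those latest messages.
QUERY = """
    SELECT pm.id, pm.user_id, pm.recipient_id, pm.text
    FROM private_messages pm
    JOIN (
        SELECT MAX(id) AS last_id
        FROM private_messages
        WHERE user_id = :me OR recipient_id = :me
        GROUP BY CASE WHEN user_id = :me THEN recipient_id ELSE user_id END
    ) latest ON latest.last_id = pm.id
    ORDER BY pm.id DESC
    LIMIT :page_size OFFSET :offset
"""

def inbox(me, page_size=10, offset=0):
    """Return the latest message of each conversation, newest first."""
    params = {"me": me, "page_size": page_size, "offset": offset}
    return conn.execute(QUERY, params).fetchall()

for row in inbox(2):
    print(row)
```

Calling inbox(2, page_size=2, offset=2) returns the second page. OFFSET paging is the simplest option here; it can shift rows between pages when new messages arrive while the user is browsing.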
Paste it after your WHERE clause:
ORDER BY "ColumnName" [ASC | DESC]
UNION (see the description at W3Schools) combines the results of two statements:
SELECT "ColumnName" FROM "TableName"
UNION
SELECT "ColumnName" FROM "TableName"
For large data sets I think you might like to try running the two statements and then consolidating the results, as an index scan on (user_id and id) or (recipient_id and id) ought to be very efficient at getting the 10 most recent conversations of each type.
with sent_messages as (
SELECT *
FROM private_messages
WHERE user_id = my_user_id
ORDER BY id desc
LIMIT 10),
received_messages as ( SELECT *
FROM private_messages
WHERE recipient_id = my_user_id
ORDER BY id desc
LIMIT 10),
all_messages as (
select *
from sent_messages
union all
select *
from received_messages)
select *
from all_messages
order by id desc
limit 10
Edit: Actually another query worth trying might be:
select *
from private_messages
where id in (
select id
from (
(SELECT id
FROM private_messages
WHERE user_id = my_user_id
ORDER BY id desc
LIMIT 10)
union all
(SELECT id
FROM private_messages
WHERE recipient_id = my_user_id
ORDER BY id desc
LIMIT 10)) all_ids
order by id desc
limit 10)
order by id desc
This might be better in 9.2+, where the indexes alone could be used to get the ids, or in cases where the number of recent messages to retrieve is very large. I'm still a bit unclear on that, though. If in doubt I'd go for the former version.