PostgreSQL join to most recent record between tables

These are the tables relevant to this question: a conversation has many participants and many messages. The basic structure (with just the relevant columns) is as follows:
create table conversations (
  id int primary key
);
create table conversation_participants (
  id int primary key,
  conversation_id int references conversations,
  user_id int references users,
  unique (conversation_id, user_id)
);
create table messages (
  id int primary key,
  conversation_id int references conversations,
  sender_id int references users,
  recipient_id int references users,
  body text
);
Given a user_id, I want to receive:
all conversations that user participated in (i.e. conversations.*),
each joined to its most recent message (i.e. order by messages.id desc limit 1),
ordered by that most recent message id (i.e. order by messages.id desc).
Unfortunately, all the query help I can find for anything like this is for MySQL, and it doesn't work in PostgreSQL. The closest thing I found is this answer on StackOverflow that gives an example of the select distinct on (...) syntax. However, unless I'm just doing it wrong, I can't get the results ordered correctly with that method, given the grouping I need.

Almost all the information is in the table "messages"; you don't need conversation_participants:
SELECT
  messages.id AS message_id, -- qualified: with c.* joined in, a bare "id" is ambiguous
  messages.body,
  c.* -- content from conversations
FROM messages
JOIN
  (SELECT MAX(id) AS id, conversation_id
   FROM messages
   WHERE 1 IN (sender_id, recipient_id) -- 1 is the user id; pass it as a parameter
   GROUP BY conversation_id) sub
  USING (id, conversation_id)
JOIN conversations c ON c.id = messages.conversation_id
ORDER BY
  message_id DESC;
Edit: Just JOIN on "conversations" to get the data needed from this table.
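A minimal runnable sketch of the MAX(id)-per-conversation join above, using Python's built-in sqlite3 module with an in-memory database as a stand-in for PostgreSQL (an assumption for the demo; the query only uses syntax both engines accept). The conversations.title column and all rows are hypothetical sample data. User 1 takes part in conversations 10 and 20, so each should come back once, joined to its latest message, newest first.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE conversations (id INTEGER PRIMARY KEY, title TEXT);  -- title: hypothetical
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    conversation_id INTEGER REFERENCES conversations(id),
    sender_id INTEGER,
    recipient_id INTEGER,
    body TEXT
);
INSERT INTO conversations VALUES (10, 'lunch'), (20, 'project'), (30, 'other');
INSERT INTO messages VALUES
    (1, 10, 1, 2, 'hi'),
    (2, 20, 3, 1, 'status?'),
    (3, 10, 2, 1, 'noon?'),
    (4, 30, 4, 5, 'not for user 1'),
    (5, 20, 1, 3, 'done');
""")

user_id = 1  # bound as a parameter instead of hard-coding 1 in the SQL
rows = conn.execute("""
SELECT messages.id AS message_id, messages.body, c.id AS conversation_id, c.title
FROM messages
JOIN (SELECT MAX(id) AS id, conversation_id      -- latest message per conversation
      FROM messages
      WHERE ? IN (sender_id, recipient_id)       -- among those involving the user
      GROUP BY conversation_id) sub
  USING (id, conversation_id)                    -- keep only that latest row
JOIN conversations c ON c.id = messages.conversation_id
ORDER BY message_id DESC                         -- newest conversation first
""", (user_id,)).fetchall()
print(rows)
```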

Try this:
select
*
from
conversation_participants cp
join conversations c on
c.id = cp.conversation_id
-- assuming you only want the conversations where a
-- message has been posted; otherwise use a left join
join messages m on
m.conversation_id = cp.conversation_id
and m.id = (
select
id
from
messages _m
where
_m.conversation_id = m.conversation_id
-- no sender_id filter: the latest message in the conversation, whoever sent it
order by
id desc
limit 1
)
where
cp.user_id = 1
order by
m.id desc;
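The correlated-subquery approach can be checked the same way. This sketch again uses an in-memory SQLite database (via Python's sqlite3) in place of PostgreSQL, with hypothetical rows. The subquery picks the latest message of each conversation with no filter on sender_id, so the latest message in conversation 20 (sent by user 3) is still found. Conversation 30 has no messages and drops out because of the inner join.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE conversations (id INTEGER PRIMARY KEY);
CREATE TABLE conversation_participants (
    id INTEGER PRIMARY KEY,
    conversation_id INTEGER,
    user_id INTEGER,
    UNIQUE (conversation_id, user_id)
);
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    conversation_id INTEGER,
    sender_id INTEGER,
    recipient_id INTEGER,
    body TEXT
);
INSERT INTO conversations VALUES (10), (20), (30);
INSERT INTO conversation_participants (conversation_id, user_id) VALUES
    (10, 1), (10, 2), (20, 1), (20, 3), (30, 1);
INSERT INTO messages VALUES
    (1, 10, 1, 2, 'hi'),
    (2, 20, 1, 3, 'status?'),
    (3, 10, 2, 1, 'noon?'),
    (4, 20, 3, 1, 'almost done');
""")

rows = conn.execute("""
SELECT c.id AS conversation_id, m.id AS message_id, m.body
FROM conversation_participants cp
JOIN conversations c ON c.id = cp.conversation_id
-- inner join: conversations without any message (30) are dropped
JOIN messages m
  ON m.conversation_id = cp.conversation_id
 AND m.id = (SELECT _m.id                    -- latest message of this conversation
             FROM messages _m
             WHERE _m.conversation_id = m.conversation_id
             ORDER BY _m.id DESC
             LIMIT 1)
WHERE cp.user_id = ?
ORDER BY m.id DESC
""", (1,)).fetchall()
print(rows)
```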

How can I improve this query in PostgreSQL? It's taking more than 48 hours already

I have the following query, and I'm running it against a PostgreSQL db which has more than 10M rows in table account_message and 1M rows in table message.
The PostgreSQL version is PostgreSQL 11.12, compiled by Visual C++ build 1914, 64-bit.
Is there any way to make this query faster? It has been running for more than 2 days and has not finished yet.
DELETE FROM account_message WHERE message_id in
(SELECT t2.id FROM message t2 WHERE NOT EXISTS
(SELECT 1 FROM customer t1 WHERE
t1.username = t2.username));
Table account_message has the following columns:
id (bigint)(primary key)
user_id (bigint)
message_id (bigint)
isRead (boolean)
isDeleted (boolean)
Table message has the following columns:
id (bigint)(primary key)
username (character varying)(255)
text (character varying)(10000)
details (character varying)(1000)
status (integer)
Table customer has the following columns:
username (character varying)(255)(primary key)
type (character varying)(500)
details (character varying)(10000)
status (integer)
active (boolean)
This did the trick for me and also made it much faster.
DELETE FROM account_message WHERE message_id IN (
SELECT m.id FROM message m
LEFT JOIN customer c ON m.username = c.username
WHERE c.username IS NULL LIMIT 1000)
You may be able to improve this by
getting rid of your dependent subquery, and
doing it in batches.
Try this to get a batch of one thousand message ids to delete. LEFT JOIN ... WHERE col IS NULL is a way to write WHERE NOT EXISTS without a dependent subquery.
SELECT m.id
FROM message m
LEFT JOIN customer c ON m.username = c.username
WHERE c.username IS NULL
LIMIT 1000
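To see that the LEFT JOIN ... IS NULL form returns the same ids as the original NOT EXISTS form, here is a small sketch on hypothetical data, with SQLite (via Python's sqlite3) standing in for PostgreSQL. The ORDER BY clauses are only there so the two results can be compared directly; the tables are cut down to the columns the queries use.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (username TEXT PRIMARY KEY);
CREATE TABLE message (id INTEGER PRIMARY KEY, username TEXT);
INSERT INTO customer VALUES ('alice'), ('bob');
INSERT INTO message VALUES
    (1, 'alice'), (2, 'carol'), (3, 'bob'), (4, 'dave'), (5, 'carol');
""")

# Original form: dependent (correlated) NOT EXISTS subquery.
not_exists = conn.execute("""
SELECT t2.id FROM message t2
WHERE NOT EXISTS (SELECT 1 FROM customer t1 WHERE t1.username = t2.username)
ORDER BY t2.id
""").fetchall()

# Rewritten form: LEFT JOIN, keeping only rows where no customer matched.
left_join = conn.execute("""
SELECT m.id FROM message m
LEFT JOIN customer c ON m.username = c.username
WHERE c.username IS NULL
ORDER BY m.id
""").fetchall()

print(not_exists, left_join)
```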
Then use the subquery in a DELETE statement. Repeat the statement until it deletes no rows.
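DELETE
FROM account_message
WHERE message_id IN (
-- start from account_message so each batch only picks messages that still
-- have rows left; starting from message alone, a later batch can select
-- already-cleaned ids, delete nothing and end the loop too early
SELECT m.id
FROM account_message am
JOIN message m ON m.id = am.message_id
LEFT JOIN customer c ON m.username = c.username
WHERE c.username IS NULL
LIMIT 1000)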
Doing this in batches of 1000 helps performance: it splits your operation into multiple reasonably sized database transactions.
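A sketch of the "repeat until it deletes no rows" loop, using Python's sqlite3 with an in-memory database in place of PostgreSQL. The tables are cut down to the columns the query touches, and the batch size of 10 (instead of 1000) and all rows are hypothetical. One detail matters for the loop to end at the right time: the message rows themselves are never deleted, so the batch subquery starts from account_message and therefore only picks messages that still have rows to delete. Each batch runs in its own transaction via `with conn:`.

```python
import sqlite3

BATCH_SIZE = 10  # hypothetical; the answer uses 1000

# In-memory SQLite stands in for PostgreSQL (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (username TEXT PRIMARY KEY);
CREATE TABLE message (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE account_message (id INTEGER PRIMARY KEY, user_id INTEGER, message_id INTEGER);
INSERT INTO customer VALUES ('alice');
""")
# 25 messages: odd ids belong to an existing customer, even ids to a missing one.
conn.executemany("INSERT INTO message VALUES (?, ?)",
                 [(i, "alice" if i % 2 else "ghost") for i in range(1, 26)])
# Every message is linked to two accounts.
conn.executemany("INSERT INTO account_message (user_id, message_id) VALUES (?, ?)",
                 [(u, i) for i in range(1, 26) for u in (100, 200)])
conn.commit()

DELETE_BATCH = """
DELETE FROM account_message
WHERE message_id IN (
    SELECT m.id
    FROM account_message am                 -- only messages that still have rows
    JOIN message m ON m.id = am.message_id
    LEFT JOIN customer c ON m.username = c.username
    WHERE c.username IS NULL                -- author is no longer a customer
    LIMIT ?)
"""

batches = 0
while True:
    with conn:  # one transaction per batch, committed on success
        deleted = conn.execute(DELETE_BATCH, (BATCH_SIZE,)).rowcount
    if deleted == 0:
        break
    batches += 1

remaining = conn.execute("SELECT COUNT(*) FROM account_message").fetchone()[0]
orphans_left = conn.execute("""
SELECT COUNT(*) FROM account_message am
JOIN message m ON m.id = am.message_id
LEFT JOIN customer c ON m.username = c.username
WHERE c.username IS NULL
""").fetchone()[0]
print(batches, remaining, orphans_left)
```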
First, try to optimize the select inside the brackets. Something like:
DELETE FROM account_message WHERE message_id in
(
select t2.id from message t2
left join customer t1 on (t1.username = t2.username)
where t1.username is NULL
)

Show records that have only one matching row in another table

I need to write some SQL that is probably very simple, but I am very new to it.
I need to find all the records from one table that have a matching id (but no more than one) in the other table. E.g. one table contains the employees' records and the second one the employees' telephone numbers. I need to find all employees with only one telephone number.
Sample data would be nice. In its absence:
SELECT
employees.employee_id
FROM
employees
LEFT JOIN
(SELECT employee_id FROM emp_phone GROUP BY employee_id HAVING COUNT(*) = 1) AS phone -- exactly one phone row
ON
employees.employee_id = phone.employee_id
WHERE
phone.employee_id IS NOT NULL;
You need a join of the 2 tables, group by employee and the condition in the having clause:
SELECT e.employee_id, e.name
FROM employees e INNER JOIN numbers n
ON e.employee_id = n.employee_id
GROUP BY e.employee_id, e.name
HAVING COUNT(*) = 1;
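A quick check of the GROUP BY ... HAVING COUNT(*) = 1 query, sketched with Python's sqlite3 as a stand-in for PostgreSQL and hypothetical rows: Ann has one number, Bo has two and Cy has none, so only Ann should be returned (Cy is removed by the inner join).

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE numbers (employee_id INTEGER REFERENCES employees, tel_number TEXT);
INSERT INTO employees VALUES (1, 'Ann'), (2, 'Bo'), (3, 'Cy');
INSERT INTO numbers VALUES (1, '555-0100'), (2, '555-0200'), (2, '555-0201');
""")

rows = conn.execute("""
SELECT e.employee_id, e.name
FROM employees e
INNER JOIN numbers n ON e.employee_id = n.employee_id
GROUP BY e.employee_id, e.name
HAVING COUNT(*) = 1          -- exactly one matching number row
ORDER BY e.employee_id
""").fetchall()
print(rows)
```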
If there can be more than a few numbers per employee in the table with the employees' telephone numbers (call it tel), then it's cheaper to avoid GROUP BY and HAVING, which have to process all rows. Find employees with "unique" numbers using a self-anti-join with NOT EXISTS.
As long as you don't need more than the employee_id and their unique phone number, you don't have to involve the employee table at all:
SELECT *
FROM tel t
WHERE NOT EXISTS (
SELECT FROM tel
WHERE employee_id = t.employee_id
AND tel_number <> t.tel_number -- or use PK column
);
If you need additional columns from the employee table:
SELECT * -- or any columns you need
FROM (
SELECT employee_id AS id, tel_number -- or any columns you need
FROM tel t
WHERE NOT EXISTS (
SELECT FROM tel
WHERE employee_id = t.employee_id
AND tel_number <> t.tel_number -- or use PK column
)
) t
JOIN employee e USING (id);
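The NOT EXISTS self-anti-join can be sketched the same way, again with SQLite via Python's sqlite3 standing in for PostgreSQL. SQLite does not accept the empty SELECT list that PostgreSQL allows in the inner query, so the sketch uses SELECT 1 there. The table names tel and employee follow the answer; the rows are hypothetical. Because of USING (id), the id column appears only once in the SELECT * result.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tel (employee_id INTEGER REFERENCES employee, tel_number TEXT);
INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bo'), (3, 'Cy');
INSERT INTO tel VALUES (1, '555-0100'), (2, '555-0200'), (2, '555-0201');
""")

rows = conn.execute("""
SELECT *
FROM (
    SELECT employee_id AS id, tel_number
    FROM tel t
    WHERE NOT EXISTS (          -- no other number for the same employee
        SELECT 1                -- PostgreSQL would also accept an empty list
        FROM tel
        WHERE employee_id = t.employee_id
          AND tel_number <> t.tel_number)
) t
JOIN employee e USING (id)      -- id appears once in the SELECT * output
""").fetchall()
print(rows)
```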
The column alias in the subquery (employee_id AS id) is just for convenience. Then the outer join condition can be USING (id), and the ID column is only included once in the result, even with SELECT * ...
Simpler with a smart naming convention that uses employee_id for the employee ID everywhere. But it's a widespread anti-pattern to use employee.id instead.
Related:
JOIN table if condition is satisfied, else perform no join

How can I sort rows by number of corresponding rows in different table?

I'm making a small-scale Reddit clone. There is a table for posts, a table for comments (relevant only for context), and a table for posts_comments. I'm trying to sort posts by the number of comments each post has.
This is the definition of the posts_comments table:
CREATE TABLE posts_comments (
id SERIAL PRIMARY KEY,
parent_id INTEGER,
comment_id INTEGER,
post_id INTEGER
)
This is the call I have, but it doesn't seem right
SELECT * FROM posts p
JOIN posts_comments pc ON p.id = pc.post_id
ORDER BY (SELECT COUNT(*) FROM pc WHERE pc.post_id = p.id) DESC
LIMIT $1
OFFSET $2
I want the output to be a list of posts sorted by the number of comments linked to that post
Maybe like this:
SELECT
COUNT(pc.post_id) OVER (PARTITION BY p.id) AS num_comments
,* FROM posts p
LEFT OUTER JOIN posts_comments pc ON p.id = pc.post_id
ORDER BY 1 DESC
LIMIT $1
OFFSET $2
If you only want the list of posts and not the comments:
SELECT
COUNT(pc.post_id) AS num_comments
,p.* FROM posts p
LEFT OUTER JOIN posts_comments pc ON p.id = pc.post_id
GROUP BY p.id
ORDER BY 1 DESC
LIMIT $1
OFFSET $2
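A sketch of the GROUP BY version on hypothetical data, using Python's sqlite3 in place of PostgreSQL (posts.title is an assumed column). COUNT(pc.post_id) counts only matched comment rows, so a post without comments gets 0 instead of disappearing. The extra p.id in ORDER BY is an addition, not part of the answer: it breaks ties between posts with equal counts so that LIMIT/OFFSET pages stay consistent.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);  -- title: hypothetical
CREATE TABLE posts_comments (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER,
    comment_id INTEGER,
    post_id INTEGER
);
INSERT INTO posts VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO posts_comments (comment_id, post_id) VALUES
    (101, 2), (102, 2), (103, 2), (104, 1);
""")

limit, offset = 10, 0  # stand-ins for $1 and $2
rows = conn.execute("""
SELECT COUNT(pc.post_id) AS num_comments, p.*
FROM posts p
LEFT OUTER JOIN posts_comments pc ON p.id = pc.post_id
GROUP BY p.id
ORDER BY num_comments DESC, p.id   -- p.id: deterministic order for paging
LIMIT ? OFFSET ?
""", (limit, offset)).fetchall()
print(rows)
```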

Picking latest message from a conversation in a messages table

I have a messages table with structure something like this:
Messageid auto
FromUserID int
ToUserid Int
ConversationID int
Subject text
Message text
DateSent datetime
MessageRead bit
I need to write a query that returns the row (or just the messageid, and I can do a self-join) of the last (most recent) message for each conversation. Essentially this means: within a given conversation (represented by conversationid), which of several messages is the latest, and what is the messageid of that message?
I can group by conversationid and ask for max(datesent), but then how do I get the messageid for that particular record?
(This is a production db, so I can't modify the table structures.)
select *
from
( select *
, row_number() over (partition by ConversationID order by DateSent desc) rn
from messages
) tt
where tt.rn = 1
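A runnable sketch of the ROW_NUMBER() query on hypothetical rows, with Python's sqlite3 standing in for the asker's database (window functions need SQLite 3.25 or newer). DateSent is stored as ISO-8601 text here, which sorts chronologically; only the columns the query needs are filled in.

```python
import sqlite3

# In-memory SQLite stands in for the production database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (
    MessageID INTEGER PRIMARY KEY,
    FromUserID INTEGER,
    ToUserID INTEGER,
    ConversationID INTEGER,
    Subject TEXT,
    Message TEXT,
    DateSent TEXT,        -- ISO-8601 text sorts chronologically
    MessageRead INTEGER
);
INSERT INTO messages (MessageID, ConversationID, Message, DateSent) VALUES
    (1, 7, 'first',  '2024-01-01 09:00'),
    (2, 7, 'second', '2024-01-01 10:00'),
    (3, 8, 'only',   '2024-01-02 08:00');
""")

rows = conn.execute("""
SELECT MessageID, ConversationID, Message
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY ConversationID
                              ORDER BY DateSent DESC) AS rn  -- 1 = newest
    FROM messages
) tt
WHERE tt.rn = 1
ORDER BY ConversationID
""").fetchall()
print(rows)
```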
Not sure if the execution time would be shorter than Paparazzi's... but here is an alternative you can try using an inner join:
select t.*
from messages t
join (
select conversationid, max(datesent) as datesent -- alias needed for x.datesent below
from messages
group by conversationid
) x on x.conversationid = t.conversationid and x.datesent = t.datesent
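The join-on-MAX(datesent) version, sketched on hypothetical data with Python's sqlite3 standing in for the asker's database. Note the alias on MAX(DateSent): without it, x.DateSent would not exist. The sample includes two messages in conversation 7 with the same latest DateSent to show one difference between the approaches: this join returns both of them, while a ROW_NUMBER() ... rn = 1 query returns only one row per conversation.

```python
import sqlite3

# In-memory SQLite stands in for the production database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (
    MessageID INTEGER PRIMARY KEY,
    ConversationID INTEGER,
    Message TEXT,
    DateSent TEXT         -- ISO-8601 text sorts chronologically
);
INSERT INTO messages VALUES
    (1, 7, 'first',     '2024-01-01 09:00'),
    (2, 7, 'second',    '2024-01-01 10:00'),
    (3, 7, 'same time', '2024-01-01 10:00'),
    (4, 8, 'only',      '2024-01-02 08:00');
""")

rows = conn.execute("""
SELECT t.MessageID, t.ConversationID
FROM messages t
JOIN (
    SELECT ConversationID, MAX(DateSent) AS DateSent  -- latest time per conversation
    FROM messages
    GROUP BY ConversationID
) x ON x.ConversationID = t.ConversationID AND x.DateSent = t.DateSent
ORDER BY t.MessageID
""").fetchall()
print(rows)  # conversation 7 appears twice because of the tie
```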

Complex insert based on few subqueries

I have the following tables:
Posts: id, CategoryId
PostAssociations: PostId FK Posts(id), AssociatedPostId FK Posts(id)
So each post may have many other posts as associated posts.
What I need is:
For each post P that does not have any associated posts yet, add 4 random associated posts from the same category (excluding P itself: a post should not be associated with itself).
So far I have:
-- SELECT posts that don't have any associated posts yet
SELECT p.id
FROM "Posts" p
LEFT OUTER JOIN "PostAssociations" pa ON pa."PostId" = p.id
WHERE pa."PostId" IS NULL
And:
-- SELECT 4 random posts from a given category, excluding a given id
SELECT id
FROM "Posts" p2
WHERE p2."CategoryId" = ? AND p2.id != ?
ORDER BY RANDOM()
LIMIT 4
I need a query like this:
INSERT INTO "PostAssociations" VALUES ...
SQL fiddle with explanation what I need: http://sqlfiddle.com/#!2/6ba735/5
As you tagged the post with postgresql-9.3, I guess using LATERAL to apply the random-id query to every post that has no associations should work. (This should be functionally similar to using CROSS APPLY / OUTER APPLY with MS SQL, for instance.)
Sample SQL Fiddle (showing before, insert and after).
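INSERT INTO "PostAssociations" ("PostId", "AssociatedPostId")
SELECT
p.id, rand.id
FROM "Posts" p
LEFT OUTER JOIN "PostAssociations" pa ON pa."PostId" = p.id
-- CROSS JOIN LATERAL rather than LEFT JOIN LATERAL ... ON TRUE, so a post
-- with no other post in its category doesn't produce an (id, NULL) row
CROSS JOIN LATERAL
(
SELECT id
FROM "Posts"
WHERE "CategoryId" = p."CategoryId" AND id != p.id
ORDER BY RANDOM()
LIMIT 4) AS rand
WHERE pa."PostId" IS NULL;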
I can't claim to be an expert with Postgresql so it's quite possible the query can be improved.
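SQLite has no LATERAL, so this sketch, using Python's sqlite3 as a stand-in for PostgreSQL, swaps in a different technique: a self-join within the category, numbered with ROW_NUMBER() OVER (PARTITION BY p.id ORDER BY RANDOM()), keeping the rows numbered 1 to 4. Tables and rows are hypothetical. Post 1 already has an association and is left alone; every other post gets up to 4 random partners from its own category, never itself.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (id INTEGER PRIMARY KEY, CategoryId INTEGER);
CREATE TABLE PostAssociations (
    PostId INTEGER REFERENCES Posts(id),
    AssociatedPostId INTEGER REFERENCES Posts(id)
);
INSERT INTO Posts VALUES
    (1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1),  -- category 1: six posts
    (7, 2), (8, 2);                                  -- category 2: two posts
INSERT INTO PostAssociations VALUES (1, 2);          -- post 1 already has one
""")

with conn:
    conn.execute("""
    INSERT INTO PostAssociations (PostId, AssociatedPostId)
    SELECT post_id, other_id
    FROM (
        SELECT p.id AS post_id, p2.id AS other_id,
               -- random order of candidates within each post's partition
               ROW_NUMBER() OVER (PARTITION BY p.id ORDER BY RANDOM()) AS rn
        FROM Posts p
        JOIN Posts p2 ON p2.CategoryId = p.CategoryId AND p2.id <> p.id
        WHERE NOT EXISTS (SELECT 1 FROM PostAssociations pa WHERE pa.PostId = p.id)
    ) ranked
    WHERE rn <= 4                                    -- at most 4 per post
    """)

counts = dict(conn.execute(
    "SELECT PostId, COUNT(*) FROM PostAssociations GROUP BY PostId").fetchall())
self_links = conn.execute(
    "SELECT COUNT(*) FROM PostAssociations WHERE PostId = AssociatedPostId").fetchone()[0]
cross_category = conn.execute("""
    SELECT COUNT(*) FROM PostAssociations pa
    JOIN Posts a ON a.id = pa.PostId
    JOIN Posts b ON b.id = pa.AssociatedPostId
    WHERE a.CategoryId <> b.CategoryId""").fetchone()[0]
print(counts, self_links, cross_category)
```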