Complex insert based on few subqueries - postgresql

I have the following tables:
Posts: id, CategoryId
PostAssociations: PostId FK Posts(id), AssociatedPostId FK Posts(id)
So each post may have many other posts as associated posts.
What I need is:
For each post P that does not have any associated posts yet, add 4 random associated posts from the same category (excluding post P itself; a post should not be associated with itself).
So far I have
-- Select posts that don't have any associated posts yet
SELECT p.id
FROM "Posts" p
LEFT OUTER JOIN "PostAssociations" pa ON pa."PostId" = p.id
WHERE pa."PostId" IS NULL
And:
-- Select 4 random posts from the given category, excluding the given id
SELECT id
FROM "Posts" p2
WHERE p2."CategoryId" = ? AND p2.id != ?
ORDER BY RANDOM()
LIMIT 4
I need a query like this:
INSERT INTO "PostAssociations" VALUES ...
SQL Fiddle with an explanation of what I need: http://sqlfiddle.com/#!2/6ba735/5

Since you tagged the post with postgresql-9.3, I guess using LATERAL to apply the random-id query to every id that is missing associations should work. (This should be functionally similar to using OUTER APPLY in MS SQL, for instance.)
Sample SQL Fiddle (showing before, insert and after).
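INSERT INTO PostAssociations (PostId, AssociatedPostId)
SELECT
p.id, rand.id
FROM Posts p
LEFT OUTER JOIN PostAssociations pa ON pa.PostId = p.id
LEFT JOIN LATERAL
(
-- up to 4 random posts from the same category, excluding p itself
SELECT id
FROM Posts
WHERE CategoryId = p.CategoryId AND id != p.id
ORDER BY RANDOM()
LIMIT 4) AS rand ON TRUE
WHERE pa.PostId IS NULL -- only posts without any associations yet
AND rand.id IS NOT NULL; -- skip posts that are alone in their category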
I can't claim to be an expert with PostgreSQL, so it's quite possible the query can be improved.
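If LATERAL is not an option, the same "up to N random rows per group" idea can probably be expressed with ROW_NUMBER() instead. The following is only a sketch of that alternative, not the query above: it runs against an in-memory SQLite database through Python's sqlite3 module (chosen here purely so it runs without a server; it assumes SQLite 3.25 or later for window functions). Table and column names follow the question, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (id INTEGER PRIMARY KEY, CategoryId INTEGER);
CREATE TABLE PostAssociations (PostId INTEGER, AssociatedPostId INTEGER);
""")
# Made-up sample data: six posts in category 1, two posts in category 2.
conn.executemany(
    "INSERT INTO Posts VALUES (?, ?)",
    [(i, 1) for i in range(1, 7)] + [(7, 2), (8, 2)],
)
# Post 1 already has an association, so it must be left alone.
conn.execute("INSERT INTO PostAssociations VALUES (1, 2)")

conn.execute("""
INSERT INTO PostAssociations (PostId, AssociatedPostId)
SELECT post_id, candidate_id
FROM (
    SELECT p.id AS post_id,
           c.id AS candidate_id,
           -- number the candidates of each post in random order
           ROW_NUMBER() OVER (PARTITION BY p.id ORDER BY random()) AS rn
    FROM Posts p
    JOIN Posts c ON c.CategoryId = p.CategoryId AND c.id != p.id
    WHERE NOT EXISTS (
        SELECT 1 FROM PostAssociations pa WHERE pa.PostId = p.id
    )
) AS ranked
WHERE rn <= 4
""")

rows = conn.execute(
    "SELECT PostId, AssociatedPostId FROM PostAssociations ORDER BY 1, 2"
).fetchall()
for row in rows:
    print(row)
```

Posts that have fewer than four other posts in their category simply get fewer associations. The join pairs every post with every other post in its category before ranking, so on large tables the LATERAL version is likely the better choice.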

Related

postgresql left join but dont fetch if matching condition found

I have a bit of a complicated scenario. I have two tables, employee and agency. An employee may or may not have an agency. If an employee has an agency, I want the select to check another condition on the agency; if the employee does not have an agency, that's fine and I still want to fetch the employee. I'm not sure how to write the select statement for this. This is what I have come up with so far:
select * from employee e left join
agency a on a.id = e.agencyID and a.valid = true;
However, the problem with this is that it fetches employees without agencies, which is fine, but it also fetches employees with agencies where a.valid = false. The only option I can think of is a union, but I'm looking for something simpler.
A UNION could actually be the solution that performs best, but you can write the query without UNION like this:
select *
from employee e
left join agency a
on a.id = e.agencyID
where coalesce(a.valid, true);
That will accept agencies where valid IS NULL, that is, result rows where the agency part was substituted with NULLs by the outer join.
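To see which rows survive the COALESCE filter, here is a small sketch with made-up data. It uses an in-memory SQLite database via Python's sqlite3 module just to keep it self-contained; SQLite has no boolean type, so valid is stored as 1/0 here, which is an assumption of this example and not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agency (id INTEGER PRIMARY KEY, valid INTEGER);
CREATE TABLE employee (id INTEGER PRIMARY KEY, agencyID INTEGER);
-- Made-up data: agency 10 is valid, agency 20 is not.
INSERT INTO agency VALUES (10, 1), (20, 0);
-- Employee 3 has no agency at all.
INSERT INTO employee VALUES (1, 10), (2, 20), (3, NULL);
""")

kept = conn.execute("""
SELECT e.id, a.id
FROM employee e
LEFT JOIN agency a ON a.id = e.agencyID
WHERE COALESCE(a.valid, 1) = 1
ORDER BY e.id
""").fetchall()

print(kept)  # [(1, 10), (3, None)]
```

Employee 2 is dropped because its agency is invalid, while employee 3 is kept because the outer join filled the agency columns with NULL.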
You want to exclude the rows where both tables match (agency.id = employee.agencyID) and agency.valid is false. The following query expresses that condition.
SELECT
e.*,
a.*
FROM
employee e
LEFT JOIN agency a ON a.id = e.agencyID
WHERE
NOT EXISTS (
SELECT
1
FROM
agency a2
WHERE
a2.id = e.agencyID
AND a2.valid IS FALSE)
ORDER BY
e.id;

How can I sort rows by number of corresponding rows in different table?

I'm making a small-scale reddit clone. There is a table for posts, a table for comments (relevant only for context), and a table for posts_comments. I'm trying to sort posts by the number of comments the post has.
This is the init for the posts_comments table
CREATE TABLE posts_comments (
id SERIAL PRIMARY KEY,
parent_id INTEGER,
comment_id INTEGER,
post_id INTEGER
)
This is the call I have, but it doesn't seem right
SELECT * FROM posts p
JOIN posts_comments pc ON p.id = pc.post_id
ORDER BY (SELECT COUNT(*) FROM pc WHERE pc.post_id = p.id) DESC
LIMIT $1
OFFSET $2
I want the output to be a list of posts sorted by the number of comments linked to each post.
Maybe like this:
SELECT
COUNT(pc.post_id) OVER (PARTITION BY p.id) AS num_comments
,* FROM posts p
LEFT OUTER JOIN posts_comments pc ON p.id = pc.post_id
ORDER BY 1 DESC
LIMIT $1
OFFSET $2
Or, if you only want the list of posts and not the comments:
SELECT
COUNT(pc.post_id) AS num_comments
,p.* FROM posts p
LEFT OUTER JOIN posts_comments pc ON p.id = pc.post_id
GROUP BY p.id
ORDER BY 1 DESC
LIMIT $1
OFFSET $2
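A quick way to check the GROUP BY variant is to run it on a few made-up rows. The sketch below does that with an in-memory SQLite database through Python's sqlite3 module (used only to keep the example self-contained). The title column and the extra p.id tie-breaker in ORDER BY are assumptions added for this example; the tie-breaker keeps paging stable when several posts have the same count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE posts_comments (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER,
    comment_id INTEGER,
    post_id INTEGER
);
-- Made-up data: post 2 has three comments, post 1 has one, post 3 has none.
INSERT INTO posts VALUES (1, 'first'), (2, 'second'), (3, 'third');
INSERT INTO posts_comments (comment_id, post_id)
VALUES (100, 2), (101, 2), (102, 2), (103, 1);
""")

limit, offset = 10, 0
ranked = conn.execute("""
SELECT p.id, COUNT(pc.post_id) AS num_comments
FROM posts p
LEFT JOIN posts_comments pc ON pc.post_id = p.id
GROUP BY p.id
ORDER BY num_comments DESC, p.id
LIMIT ? OFFSET ?
""", (limit, offset)).fetchall()

print(ranked)  # [(2, 3), (1, 1), (3, 0)]
```

Because COUNT(pc.post_id) ignores NULLs, posts without comments show up with a count of 0 instead of disappearing.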

Get distinct row by primary key, but use value from another column

I'm trying to get the sum of the total time that was spent sending all emails within a campaign.
Because of the joins in my query, I end up with the 'processing_time' column duplicated over many rows. So running sum(s.processing_time) as send_time will always overstate how long it took to run.
select
c.id,
c.sender,
c.subject,
count(*) as total_items,
count(distinct s.id) as sends,
sum(s.processing_time) as send_time
from campaigns c
left join sends s on c.id = s.campaigns_id
left join opens o on s.id = o.sends_id
group by c.id;
I'd ideally like to do something like sum(s.processing_time when distinct s.id) but I can't quite work out how to achieve that.
I have made other attempts using case but I always run into the same issue, I need to get the distinct rows based on the ID column, but work with another column.
Since you want statistics related to distinct s.id as well as c.id, group by both columns. Collect the (intermediate) data that you need,
and use this table as the inner table in a nested sub-select query.
In the outer select, group by c.id alone.
Since the inner select groups by s.id, values which are unique per s.id will not get double-counted when you sum/group by c.id.
SELECT id
, sender
, subject
, sum(total_items) as total_items
, count(sid) as sends
, sum(processing_time) as send_time
FROM (
SELECT
c.id
, s.id as sid
, count(*) as total_items
, s.processing_time
, c.sender
, c.subject
FROM campaigns c
LEFT JOIN sends s on c.id = s.campaigns_id
LEFT JOIN opens o on s.id = o.sends_id
GROUP BY c.id, c.sender, c.subject, s.processing_time, s.id) t
GROUP BY id, sender, subject
ORDER BY id
Since the final table includes sender and subject, you'll need to group by these columns as well to avoid an error such as:
ERROR: column "c.sender" must appear in the GROUP BY clause or be used in an aggregate function
LINE 14: , c.sender
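To make the double counting concrete, here is a small sketch with made-up data, run against an in-memory SQLite database via Python's sqlite3 module (used only so the example is self-contained). It compares the naive sum with a trimmed-down version of the nested query; counting sid in the outer query is a choice made in this sketch so that a campaign without sends would report 0 sends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE campaigns (id INTEGER PRIMARY KEY, sender TEXT, subject TEXT);
CREATE TABLE sends (
    id INTEGER PRIMARY KEY,
    campaigns_id INTEGER,
    processing_time INTEGER
);
CREATE TABLE opens (id INTEGER PRIMARY KEY, sends_id INTEGER);
-- Made-up data: one campaign with two sends (5 and 7 time units);
-- send 1 was opened three times, send 2 never.
INSERT INTO campaigns VALUES (1, 'me@example.com', 'Hello');
INSERT INTO sends VALUES (1, 1, 5), (2, 1, 7);
INSERT INTO opens VALUES (1, 1), (2, 1), (3, 1);
""")

joins = """
FROM campaigns c
LEFT JOIN sends s ON c.id = s.campaigns_id
LEFT JOIN opens o ON s.id = o.sends_id
"""

# Naive version: send 1 appears once per open, so its time is counted 3 times.
naive = conn.execute(
    "SELECT c.id, SUM(s.processing_time)" + joins + "GROUP BY c.id"
).fetchall()

# Nested version: collapse to one row per send first, then sum per campaign.
fixed = conn.execute(
    "SELECT id, SUM(total_items), COUNT(sid), SUM(processing_time) FROM ("
    "SELECT c.id, s.id AS sid, COUNT(*) AS total_items, s.processing_time"
    + joins +
    "GROUP BY c.id, s.id, s.processing_time) t GROUP BY id"
).fetchall()

print(naive)  # [(1, 22)]
print(fixed)  # [(1, 4, 2, 12)]
```

The naive sum is 5 * 3 + 7 = 22, while the nested query reports the real total of 5 + 7 = 12.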

Removing duplicate rows from relation

I have the following code which produces a relation:
SELECT book_id, shipments.customer_id
FROM shipments
LEFT JOIN editions ON (shipments.isbn = editions.isbn)
LEFT JOIN customers ON (shipments.customer_id = customers.customer_id)
In this relation, there are customer_ids as well as the book_ids of books they have bought. My goal is to create a relation with each book in it and how many unique customers bought it. I assume one way to achieve this is to eliminate all duplicate rows in the relation and then count the instances of each book_id.
So my question is: How can I delete all duplicate rows from this relation?
Thanks!
EDIT: So what I mean is that I want all the rows in the relation to be unique. If there are three identical rows for example, two of them should be removed.
This will give you all the {customer,edition} pairs for which an order exists:
SELECT *
FROM customers c
JOIN editions e ON EXISTS (
SELECT 1 FROM shipments s
WHERE s.isbn = e.isbn
AND s.customer_id = c.customer_id
);
The duplicates are in table shipments. You can remove these with a DISTINCT clause and then count them in an outer query GROUP BY isbn:
SELECT isbn, count(customer_id) AS unique_buyers
FROM (
SELECT DISTINCT isbn, customer_id FROM shipments) book_buyer
GROUP BY isbn;
If you want a list of all books, even where no purchases were made, you should LEFT JOIN the above to the list of all books:
SELECT isbn, coalesce(unique_buyers, 0) AS books_sold_to_unique_buyers
FROM editions
LEFT JOIN (
SELECT isbn, count(customer_id) AS unique_buyers
FROM (
SELECT DISTINCT isbn, customer_id FROM shipments) book_buyer
GROUP BY isbn) books_bought USING (isbn)
ORDER BY isbn;
You can write this more succinctly by joining before counting:
SELECT isbn, count(customer_id) AS books_sold_to_unique_buyers
FROM editions
LEFT JOIN (
SELECT DISTINCT isbn, customer_id FROM shipments) book_buyer USING (isbn)
GROUP BY isbn
ORDER BY isbn;
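Here is a small sketch of the last query on made-up data, using an in-memory SQLite database through Python's sqlite3 module so that it is self-contained. It also runs a COUNT(DISTINCT ...) variant, which is an alternative not mentioned above, and the two should agree. Only the columns needed for the example are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE editions (isbn TEXT PRIMARY KEY, book_id INTEGER);
CREATE TABLE shipments (id INTEGER PRIMARY KEY, customer_id INTEGER, isbn TEXT);
-- Made-up data: customer 1 bought A twice, customer 2 bought A once,
-- customer 1 bought B once, nobody bought C.
INSERT INTO editions VALUES ('A', 1), ('B', 2), ('C', 3);
INSERT INTO shipments (customer_id, isbn)
VALUES (1, 'A'), (1, 'A'), (2, 'A'), (1, 'B');
""")

# De-duplicate {isbn, customer_id} pairs first, then count per book.
via_subquery = conn.execute("""
SELECT e.isbn, COUNT(bb.customer_id) AS unique_buyers
FROM editions e
LEFT JOIN (
    SELECT DISTINCT isbn, customer_id FROM shipments
) bb ON bb.isbn = e.isbn
GROUP BY e.isbn
ORDER BY e.isbn
""").fetchall()

# Alternative: let the aggregate do the de-duplication.
via_count_distinct = conn.execute("""
SELECT e.isbn, COUNT(DISTINCT s.customer_id) AS unique_buyers
FROM editions e
LEFT JOIN shipments s ON s.isbn = e.isbn
GROUP BY e.isbn
ORDER BY e.isbn
""").fetchall()

print(via_subquery)  # [('A', 2), ('B', 1), ('C', 0)]
```

Both forms keep book C in the result with a count of 0 because of the LEFT JOIN.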

PostgreSQL join to most recent record between tables

I have two tables pertaining to this question: conversations has many messages. The basic structure (with just the relevant columns) is as follows:
create table conversations (
id int primary key
);
create table conversation_participants (
id int primary key,
conversation_id int references conversations (id),
user_id int references users (id),
unique (conversation_id, user_id)
);
create table messages (
id int primary key,
conversation_id int references conversations (id),
sender_id int references users (id),
recipient_id int references users (id),
body text
);
For each conversations entry, given a user_id I want to receive:
all conversations that user participated in (i.e.: conversations.*)
joined to the most recent matching message (i.e.: order by messages.id desc limit 1)
conversations ordered by their most recent message id (i.e.: order by messages.id desc)
Unfortunately, all the query help I can seem to find on anything like this pertains to MySQL, and that doesn't work in PostgreSQL. The closest thing I found is this answer on StackOverflow that gives an example of the select distinct on (...) syntax. However, unless I'm just doing it wrong, I can't seem to get the results ordered in the correct way given the grouping constraints I need with that method.
All information is in the table "messages", you don't need the other tables:
SELECT
m.id,
m.body,
c.* -- content from conversations
FROM messages m
JOIN
(SELECT MAX(id) AS id, conversation_id
FROM messages
WHERE 1 IN (sender_id, recipient_id) -- the number is the user id, should be dynamic
GROUP BY conversation_id) sub
USING (id, conversation_id)
JOIN conversations c ON c.id = m.conversation_id
ORDER BY
m.id DESC;
Edit: Just JOIN on "conversations" to get the data needed from this table.
Try this:
select
*
from
conversation_participants cp
join conversations c on
c.id = cp.conversation_id
-- assuming you only want the conversations where a
-- message has been left. otherwise use left join
join messages m on
m.conversation_id = cp.conversation_id
and m.id = (
select
id
from
messages _m
where
_m.conversation_id = m.conversation_id
and sender_id = 1
order by
id desc
limit 1
)
where
cp.user_id = 1
order by
m.id desc;
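For a quick check of the "latest message per conversation" idea, here is a sketch on made-up data. It runs against an in-memory SQLite database via Python's sqlite3 module (only so it is self-contained) and uses a correlated MAX(id) subquery, which is one possible formulation and not either of the queries above. It takes the newest message in each conversation regardless of who sent it, restricted to conversations the given user participates in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE conversations (id INTEGER PRIMARY KEY);
CREATE TABLE conversation_participants (
    id INTEGER PRIMARY KEY,
    conversation_id INTEGER,
    user_id INTEGER,
    UNIQUE (conversation_id, user_id)
);
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    conversation_id INTEGER,
    sender_id INTEGER,
    recipient_id INTEGER,
    body TEXT
);
-- Made-up data: user 1 takes part in conversations 1 and 2 only.
INSERT INTO conversations VALUES (1), (2), (3);
INSERT INTO conversation_participants (conversation_id, user_id)
VALUES (1, 1), (1, 2), (2, 1), (2, 3), (3, 2), (3, 3);
INSERT INTO messages VALUES
    (1, 1, 1, 2, 'hi'),
    (2, 2, 1, 3, 'yo'),
    (3, 1, 2, 1, 'hello'),
    (4, 3, 2, 3, 'x');
""")

user_id = 1
latest = conn.execute("""
SELECT c.id, m.id, m.body
FROM conversations c
JOIN messages m ON m.conversation_id = c.id
WHERE m.id = (
        SELECT MAX(m2.id) FROM messages m2
        WHERE m2.conversation_id = c.id
    )
  AND EXISTS (
        SELECT 1 FROM conversation_participants cp
        WHERE cp.conversation_id = c.id AND cp.user_id = ?
    )
ORDER BY m.id DESC
""", (user_id,)).fetchall()

print(latest)  # [(1, 3, 'hello'), (2, 2, 'yo')]
```

Conversation 3 is left out because user 1 is not a participant, and conversation 1 comes first because its newest message has the highest id.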