Count Same IDs in two different Date Periods - postgresql

I want to take the ids counted by my query for one month and check how many of the same ids also appear in a different month.
Here I join the tables on distinct ids and count the returned rows to find out how many matching ids I have. The query below does that for June.
I'd like a result such as:
in June: 100 distinct ids
in July: 90 of the same ids left
Please help!
I am stuck, as my SQL is not very advanced.
with total as (
select distinct(transactions.u_id), count(*)
from transactions
join contacts using (u_id)
join table using (contact_id)
where transactions.when_created between '2020-06-01' AND '2020-06-30'
group by transactions.u_id
HAVING COUNT(*) > 1
)
SELECT
COUNT(*)
FROM
total

Suppose you are interested in a query like this one for the first period:
select transactions.u_id, count(*) as total
from transactions
join contacts using (u_id)
join table using (contact_id)
where transactions.when_created between '2020-06-01' and '2020-06-30'
group by transactions.u_id;
You are also interested in the same query for a second period:
select transactions.u_id, count(*) as total
from transactions
join contacts using (u_id)
join table using (contact_id)
where transactions.when_created between '2020-08-01' and '2020-08-30'
group by transactions.u_id;
And you want to get:
the ids that are returned by both queries
the smaller of the two totals for each id
Then you can join the two queries as subqueries, like this:
select t1.u_id,
       case
           when t1.total > t2.total then t2.total
           else t1.total
       end as total
from (
    -- first period
    select transactions.u_id, count(*) as total
    from transactions
    join contacts using (u_id)
    join table using (contact_id)
    where transactions.when_created between '2020-06-01' and '2020-06-30'
    group by transactions.u_id) t1
join (
    -- second period
    select transactions.u_id, count(*) as total
    from transactions
    join contacts using (u_id)
    join table using (contact_id)
    where transactions.when_created between '2020-08-01' and '2020-08-30'
    group by transactions.u_id) t2
on t1.u_id = t2.u_id
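To try the idea without a PostgreSQL server, here is a minimal sketch that runs the same two-subquery join through Python's built-in sqlite3 module. The single-table schema, the sample rows, and the function name shared_ids are assumptions made for illustration; the joins to contacts and the third table are left out. It also uses half-open date ranges (>= start and < next month) instead of BETWEEN, because BETWEEN with a date literal as the upper bound drops most of the last day if when_created is a timestamp. SQLite's two-argument min() stands in for the CASE expression.

```python
import sqlite3


def shared_ids(rows, first, second):
    """Return {u_id: smaller per-period count} for ids seen in both periods.

    rows   -- iterable of (u_id, when_created) with ISO date strings
    first  -- (start, end_exclusive) of the first period
    second -- (start, end_exclusive) of the second period
    """
    con = sqlite3.connect(":memory:")
    con.execute("create table transactions (u_id integer, when_created text)")
    con.executemany("insert into transactions values (?, ?)", rows)
    query = """
        select t1.u_id, min(t1.total, t2.total) as total
        from (select u_id, count(*) as total
              from transactions
              where when_created >= ? and when_created < ?
              group by u_id) t1
        join (select u_id, count(*) as total
              from transactions
              where when_created >= ? and when_created < ?
              group by u_id) t2 on t1.u_id = t2.u_id
        order by t1.u_id
    """
    result = dict(con.execute(query, (*first, *second)).fetchall())
    con.close()
    return result


# Hypothetical sample data: id 2 appears in June only.
sample = [
    (1, "2020-06-03"), (1, "2020-06-20"), (1, "2020-07-05"),
    (2, "2020-06-10"),
    (3, "2020-06-11"), (3, "2020-07-01"), (3, "2020-07-31"),
]
june = ("2020-06-01", "2020-07-01")
july = ("2020-07-01", "2020-08-01")

both = shared_ids(sample, june, july)
print(both)       # ids present in both months, with the smaller count
print(len(both))  # how many of the June ids are still there in July
```

The length of the returned dictionary is the "90 of the same ids left" figure the question asks for.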

Related

Get distinct row by primary key, but use value from another column

I'm trying to get the sum of the total time that was spent sending all emails within a campaign.
Because of the joins in my query I end up with the 'processing_time' column duplicated over many rows, so sum(s.processing_time) as send_time will always overstate how long the sends took.
select
c.id,
c.sender,
c.subject,
count(*) as total_items,
count(distinct s.id) as sends,
sum(s.processing_time) as send_time
from campaigns c
left join sends s on c.id = s.campaigns_id
left join opens o on s.id = o.sends_id
group by c.id;
I'd ideally like to do something like sum(s.processing_time when distinct s.id) but I can't quite work out how to achieve that.
I have made other attempts using case, but I always run into the same issue: I need to get the distinct rows based on the ID column, but aggregate a different column.
Since you want statistics related to distinct s.id as well as c.id, group by both columns. Collect the (intermediate) data that you need,
and use this table as the inner table in a nested sub-select query.
In the outer select, group by c.id alone.
Since the inner select groups by s.id, values which are unique per s.id will not get double-counted when you sum/group by c.id.
SELECT id
, sender
, subject
, sum(total_items) as total_items
, sum(sends) as sends
, sum(processing_time) as send_time
FROM (
SELECT
c.id
, s.id as sid
, count(*) as total_items
, 1 as sends
, s.processing_time
, c.sender
, c.subject
FROM campaigns c
LEFT JOIN sends s on c.id = s.campaigns_id
LEFT JOIN opens o on s.id = o.sends_id
GROUP BY c.id, c.sender, c.subject, s.processing_time, s.id) t
GROUP BY id, sender, subject
ORDER BY id
Since the final table includes sender and subject, you'll need to group by these columns as well to avoid an error such as:
ERROR: column "c.sender" must appear in the GROUP BY clause or be used in an aggregate function
LINE 14: , c.sender
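The effect of grouping twice can be checked with a small sketch using Python's built-in sqlite3 module. The table contents are made up for illustration, and the sender and subject columns are omitted from the queries to keep it short. One deliberate difference from the answer above: the outer query uses count(sid) instead of summing a constant 1, so a campaign with no sends (where s.id is NULL after the left join) would report 0 sends.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table campaigns (id integer primary key, sender text, subject text);
    create table sends (id integer primary key, campaigns_id integer,
                        processing_time integer);
    create table opens (id integer primary key, sends_id integer);
    insert into campaigns values (1, 'a@example.com', 'hello');
    insert into sends values (10, 1, 5), (11, 1, 7);
    insert into opens values (100, 10), (101, 10), (102, 10), (103, 11);
""")

# Single-level grouping: processing_time is repeated once per open.
naive = con.execute("""
    select c.id, sum(s.processing_time)
    from campaigns c
    left join sends s on c.id = s.campaigns_id
    left join opens o on s.id = o.sends_id
    group by c.id
""").fetchall()

# Two-level grouping: collapse to one row per send first, then per campaign.
nested = con.execute("""
    select id, sum(total_items), count(sid), sum(processing_time)
    from (select c.id as id, s.id as sid, count(*) as total_items,
                 s.processing_time as processing_time
          from campaigns c
          left join sends s on c.id = s.campaigns_id
          left join opens o on s.id = o.sends_id
          group by c.id, s.id, s.processing_time) t
    group by id
""").fetchall()

print(naive)   # send 10 is counted three times, once per open
print(nested)  # each send contributes its processing_time exactly once
```

With two sends taking 5 and 7 units, the nested query reports 12, while the single-level query inflates the total because send 10 has three opens.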

Can't solve this SQL query

I am having difficulty with a SQL query. I use PostgreSQL.
The task is: show the customers that have placed at least one order containing products from 3 different categories. The result should have 2 columns: CustomerID and the number of orders. I have written this code, but I don't think it's correct.
select SalesOrderHeader.CustomerID,
count(SalesOrderHeader.SalesOrderID) AS amount_of_orders
from SalesOrderHeader
inner join SalesOrderDetail on
(SalesOrderHeader.SalesOrderID=SalesOrderDetail.SalesOrderID)
inner join Product on
(SalesOrderDetail.ProductID=Product.ProductID)
where SalesOrderDetail.SalesOrderDetailID in
(select DISTINCT count(ProductCategoryID)
from Product
group by ProductCategoryID
having count(DISTINCT ProductCategoryID)>=3)
group by SalesOrderHeader.CustomerID;
Here are the database tables needed for the query:
The condition
where SalesOrderDetail.SalesOrderDetailID in
(select DISTINCT count(ProductCategoryID)
is never going to give you a correct result, because an ID (SalesOrderDetailID) has no logical relationship to a count (count(ProductCategoryID)).
This should get you the output I think you want.
SELECT soh.CustomerID, COUNT(DISTINCT soh.SalesOrderID) AS amount_of_orders
FROM SalesOrderHeader soh
INNER JOIN SalesOrderDetail sod ON soh.SalesOrderID = sod.SalesOrderID
INNER JOIN Product p ON sod.ProductID = p.ProductID
GROUP BY soh.CustomerID
HAVING COUNT(DISTINCT p.ProductCategoryID) >= 3
Try this:
select CustomerID, count(*) as amount_of_orders
from SalesOrderHeader
join (
    select SalesOrderID, count(distinct ProductCategoryID) as CategoryCount
    from SalesOrderDetail
    join Product using (ProductID)
    group by 1
) CatCount using (SalesOrderID)
group by 1
having bool_or(CategoryCount >= 3) -- at least one order with CategoryCount >= 3
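The per-order approach can be sketched with Python's built-in sqlite3 module. The sample rows are invented for illustration. SQLite has no bool_or, so this sketch swaps in max(CategoryCount) >= 3, which expresses the same "at least one order qualifies" condition. Customer 200 buys from three categories overall, but never within a single order, so only the per-order count excludes them.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table SalesOrderHeader (SalesOrderID integer primary key,
                                   CustomerID integer);
    create table Product (ProductID integer primary key,
                          ProductCategoryID integer);
    create table SalesOrderDetail (SalesOrderDetailID integer primary key,
                                   SalesOrderID integer, ProductID integer);
    insert into SalesOrderHeader values (1, 100), (2, 100), (3, 200), (4, 200);
    insert into Product values (1, 10), (2, 20), (3, 30), (4, 10);
    -- order 1 spans categories 10, 20, 30; order 3 spans only 10 and 20
    insert into SalesOrderDetail values
        (1, 1, 1), (2, 1, 2), (3, 1, 3),
        (4, 2, 1),
        (5, 3, 1), (6, 3, 4), (7, 3, 2),
        (8, 4, 3);
""")

qualifying = con.execute("""
    select soh.CustomerID, count(*) as amount_of_orders
    from SalesOrderHeader soh
    join (select sod.SalesOrderID,
                 count(distinct p.ProductCategoryID) as CategoryCount
          from SalesOrderDetail sod
          join Product p on p.ProductID = sod.ProductID
          group by sod.SalesOrderID) cat
      on cat.SalesOrderID = soh.SalesOrderID
    group by soh.CustomerID
    having max(cat.CategoryCount) >= 3
""").fetchall()

print(qualifying)  # customers with at least one three-category order
```

Because the subquery returns one row per order, count(*) in the outer query counts orders rather than order lines.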

SQL Server 2012 Passing parameter from main query to the Joined subquery

I need to select some settings from some joined tables, but only if the ItemID is among the first 1000 Items when ordered by EndTime DESC.
To do this I built the following query, which works, although it can surely be improved:
SELECT ss.ModuleCode, ss.MaxItems , w.*
FROM Subscriptions ss
JOIN Sellers s ON s.UID=ss.UID
JOIN Items i ON s.UserID=i.UserID
JOIN Items ii ON i.ItemID=ii.ItemID
JOIN Modules mo ON ss.ModuleCode=mo.ModuleCode
JOIN Settings w ON w.UID=s.UID AND ss.ModuleCode=w.WCode
FULL JOIN GoogleFonts f ON f.FontCode=a.FontFamily
JOIN ( SELECT
ItemID
FROM Items
WHERE UserID=@UserID
ORDER BY EndTime DESC
OFFSET 0 ROWS
FETCH FIRST (1000) ROWS ONLY
) it ON it.ItemID=i.ItemID
WHERE it.ItemID=@ItemID
AND .....
However, the limit is not always 1000; its value is defined by ss.MaxItems.
I would like to replace the fixed value of 1000 with the dynamic value of ss.MaxItems, but I haven't found a way to do it.
Although it is not optimal, since it makes the query much heavier, I tried replacing 1000 with a further subquery:
SELECT ss.ModuleCode, ss.MaxItems , w.*
FROM Subscriptions ss
JOIN Sellers s ON s.UID=ss.UID
JOIN Items i ON s.UserID=i.UserID
JOIN Items ii ON i.ItemID=ii.ItemID
JOIN Modules mo ON ss.ModuleCode=mo.ModuleCode
JOIN Settings w ON w.UID=s.UID AND ss.ModuleCode=w.WCode
FULL JOIN GoogleFonts f ON f.FontCode=a.FontFamily
JOIN ( SELECT
ItemID
FROM Items
WHERE UserID=@UserID
ORDER BY EndTime DESC
OFFSET 0 ROWS
FETCH FIRST ( SELECT ss.MaxItems
FROM Subscriptions ss
JOIN Sellers s ON s.UID=ss.UID
JOIN Items i ON s.UserID=i.UserID
JOIN Modules mo ON ss.ModuleCode=mo.ModuleCode
JOIN Settings w ON w.UID=s.UID AND ss.ModuleCode=w.WCode
WHERE i.ItemID=@ItemID) ROWS ONLY
) it ON it.ItemID=i.ItemID
WHERE it.ItemID=@ItemID
AND .....
But since this subquery returns more than one value, it is not accepted. Limiting the subquery to TOP 1 would work, but the result would not be fully dynamic as required.
Can anyone suggest a solution, or at least point me in the right direction?
Thanks!
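Instead of FETCH, use ROW_NUMBER(), ordered by EndTime DESC to match the original query, and compare it to ss.MaxItems in the join condition:
JOIN (SELECT ItemID, ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY EndTime DESC) as seqnum
      FROM Items it
      WHERE UserID = @UserID
     ) it
     ON it.ItemID = i.ItemID AND seqnum <= ss.MaxItems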
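The row-number technique can be tried out with Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). This is only a sketch: the Limits table is a hypothetical stand-in for the MaxItems value that the question reads from Subscriptions, and the sample rows are invented. The point it shows is that the limit becomes an ordinary join condition, so it can differ per row.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table Items (ItemID integer primary key, UserID integer,
                        EndTime text);
    -- hypothetical stand-in for Subscriptions.MaxItems
    create table Limits (UserID integer primary key, MaxItems integer);
    insert into Items values
        (1, 7, '2020-01-01'), (2, 7, '2020-01-02'), (3, 7, '2020-01-03'),
        (4, 8, '2020-01-01'), (5, 8, '2020-01-02');
    insert into Limits values (7, 2), (8, 1);
""")

newest = con.execute("""
    select it.UserID, it.ItemID
    from Limits l
    join (select ItemID, UserID,
                 row_number() over (partition by UserID
                                    order by EndTime desc) as seqnum
          from Items) it
      on it.UserID = l.UserID and it.seqnum <= l.MaxItems
    order by it.UserID, it.seqnum
""").fetchall()

print(newest)  # user 7 keeps its 2 newest items, user 8 keeps 1
```

Unlike FETCH FIRST n ROWS ONLY, the comparison seqnum <= MaxItems is evaluated per joined row, which is why no scalar subquery is needed.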

Removing duplicate rows from relation

I have the following code which produces a relation:
SELECT book_id, shipments.customer_id
FROM shipments
LEFT JOIN editions ON (shipments.isbn = editions.isbn)
LEFT JOIN customers ON (shipments.customer_id = customers.customer_id)
This relation contains customer_ids together with the book_ids of the books those customers have bought. My goal is to create a relation that lists each book along with the number of unique customers who bought it. I assume one way to achieve this is to eliminate all duplicate rows in the relation and then count the occurrences of each book_id.
So my question is: How can I delete all duplicate rows from this relation?
Thanks!
EDIT: So what I mean is that I want all the rows in the relation to be unique. If there are three identical rows for example, two of them should be removed.
This will give you all the {customer, edition} pairs for which a shipment exists:
SELECT *
FROM customers c
JOIN editions e ON EXISTS (
    SELECT 1 FROM shipments s
    WHERE s.isbn = e.isbn
    AND s.customer_id = c.customer_id
);
The duplicates are in table shipments. You can remove these with a DISTINCT clause and then count them in an outer query GROUP BY isbn:
SELECT isbn, count(customer_id) AS unique_buyers
FROM (
SELECT DISTINCT isbn, customer_id FROM shipments) book_buyer
GROUP BY isbn;
If you want a list of all books, even where no purchases were made, you should LEFT JOIN the above to the list of all books:
SELECT isbn, coalesce(unique_buyers, 0) AS books_sold_to_unique_buyers
FROM editions
LEFT JOIN (
SELECT isbn, count(customer_id) AS unique_buyers
FROM (
SELECT DISTINCT isbn, customer_id FROM shipments) book_buyer
GROUP BY isbn) books_bought USING (isbn)
ORDER BY isbn;
You can write this more succinctly by joining before counting:
SELECT isbn, count(customer_id) AS books_sold_to_unique_buyers
FROM editions
LEFT JOIN (
SELECT DISTINCT isbn, customer_id FROM shipments) book_buyer USING (isbn)
GROUP BY isbn
ORDER BY isbn;
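As a quick check of the counting logic, here is a sketch using Python's built-in sqlite3 module with invented sample rows. It uses count(distinct customer_id) directly on the left join, which is an alternative to the DISTINCT subquery shown above rather than the same query; both should give one count per isbn, including 0 for books that were never shipped.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table editions (isbn text primary key);
    create table shipments (id integer primary key, isbn text,
                            customer_id integer);
    insert into editions values ('A'), ('B'), ('C');
    -- customer 1 bought 'A' twice and customer 3 bought 'B' twice
    insert into shipments (isbn, customer_id) values
        ('A', 1), ('A', 1), ('A', 2), ('B', 3), ('B', 3);
""")

unique_buyers = con.execute("""
    select e.isbn, count(distinct s.customer_id) as unique_buyers
    from editions e
    left join shipments s on s.isbn = e.isbn
    group by e.isbn
    order by e.isbn
""").fetchall()

print(unique_buyers)  # repeat purchases by the same customer count once
```

count() ignores NULLs, so the unmatched left-join row for book 'C' yields 0 without needing coalesce.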

How to select additional fields without adding them to GROUP BY in PostgreSQL 8.4?

I am selecting the column used in GROUP BY together with a count, and the query looks something like this:
SELECT s.country, count(*) AS posts_ct
FROM store s
JOIN store_post_map sp ON sp.store_id = s.id
GROUP BY 1;
However, I want to select some more fields from the store table, like the store name or store address, for the store where the count is highest, but I don't want to include them in the GROUP BY clause.
For instance, to get the stores with the highest post-count per country:
SELECT DISTINCT ON (s.country)
s.country, sp.store_id, s.name, sp.post_ct
FROM store s
JOIN (
SELECT store_id, count(*) AS post_ct
FROM store_post_map
GROUP BY store_id
) sp ON sp.store_id = s.id
ORDER BY s.country, sp.post_ct DESC
Add any number of columns from store to the SELECT list.
Details about this query style in this related answer:
Select first row in each GROUP BY group?
Reply to comment
This produces the count per country and picks (one of) the store(s) with the highest post-count:
SELECT DISTINCT ON (s.country)
s.country, sp.store_id, s.name
,sum(post_ct) OVER (PARTITION BY s.country) AS post_ct_for_country
FROM store s
JOIN (
SELECT store_id, count(*) AS post_ct
FROM store_post_map
GROUP BY store_id
) sp ON sp.store_id = s.id
ORDER BY s.country, sp.post_ct DESC;
This works because window functions such as sum() OVER (...) are evaluated before DISTINCT ON is applied, so the per-country sum still sees every store row.