select only when an item exists on a table 3 or more times postgres - postgresql

I have 2 tables. SalesOrderDetail and SalesOrderHeader.
SalesOrderDetail contains SalesOrderID and ProductID columns.
SalesOrderHeader contains SalesOrderID and CustomerID.
I want a query that shows every customer who placed an order containing 3 or more different ProductIDs, along with how many such orders (with 3 or more different products) that customer made. I know a customer made an order of 3 or more products when the SalesOrderDetail table contains that order's SalesOrderID 3 or more times.
So the Customer with ID 29825 has ordered 12 different Products.
And here's my code:
SELECT "SalesOrderHeader"."CustomerID", count("SalesOrderDetail"."SalesOrderID") AS TotalOrders
FROM
public."SalesOrderHeader",
public."SalesOrderDetail"
WHERE
"SalesOrderHeader"."SalesOrderID" = "SalesOrderDetail"."SalesOrderID"
GROUP BY "SalesOrderHeader"."CustomerID"
HAVING count("SalesOrderDetail"."SalesOrderID") >= 3
The problem with this is that it shows the number of products the customer ordered, but I want the total number of orders with 3 or more different products.

If you want the total orders with 3 or more products per customer, then use two levels of aggregation:
select soh."CustomerID", count(*) as NumOrders
from public."SalesOrderHeader" soh join
(select "SalesOrderID", count(distinct "ProductID") as numproducts
from public."SalesOrderDetail" sod
group by "SalesOrderID"
) sod
on sod."SalesOrderID" = soh."SalesOrderID"
where numproducts >= 3
group by soh."CustomerID"
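To see the two levels of aggregation at work, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. The rows are invented for illustration, the `public.` schema prefix is dropped because SQLite has no such schema, and the `ORDER BY` is added only to make the output deterministic.

```python
import sqlite3

# Hypothetical sample data: table and column names follow the question,
# the rows themselves are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "SalesOrderHeader" ("SalesOrderID" INTEGER, "CustomerID" INTEGER);
CREATE TABLE "SalesOrderDetail" ("SalesOrderID" INTEGER, "ProductID" INTEGER);
INSERT INTO "SalesOrderHeader" VALUES (1, 100), (2, 100), (3, 200), (4, 200);
INSERT INTO "SalesOrderDetail" VALUES
  (1, 10), (1, 11), (1, 12),           -- order 1: 3 distinct products
  (2, 10), (2, 11), (2, 12), (2, 13),  -- order 2: 4 distinct products
  (3, 10), (3, 10), (3, 10),           -- order 3: same product 3 times
  (4, 10), (4, 11);                    -- order 4: 2 distinct products
""")

query = """
SELECT soh."CustomerID", COUNT(*) AS NumOrders
FROM "SalesOrderHeader" soh
JOIN (
    -- inner level: distinct products per order
    SELECT "SalesOrderID", COUNT(DISTINCT "ProductID") AS numproducts
    FROM "SalesOrderDetail"
    GROUP BY "SalesOrderID"
) sod ON sod."SalesOrderID" = soh."SalesOrderID"
WHERE sod.numproducts >= 3
-- outer level: qualifying orders per customer
GROUP BY soh."CustomerID"
ORDER BY soh."CustomerID"
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(100, 2)]
```

Customer 200 has five detail rows but no single order with 3 distinct products, so it is filtered out, which is exactly what the single-level query in the question gets wrong.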

Related

Count Same IDs in two different Date Periods

Let's say I want the ids that are counted as a result of my query, and checked if the same ids appear in a different month.
Here I join 2 tables via distinct id's and count the returning rows to know how many of the matching id's I have. Here that is for the month June.
I'd like:
eg. in June 100 distinct ids
eg. in July 90 of the same ids left
Please help!
I am stuck, as my SQL is not very advanced.
with total as (
select distinct(transactions.u_id), count(*)
from transactions
join contacts using (u_id)
join table using (contact_id)
where transactions.when_created between '2020-06-01' AND '2020-06-30'
group by transactions.u_id
HAVING COUNT(*) > 1
)
SELECT
COUNT(*)
FROM
total
Let's say you are interested in a query like this:
select transactions.u_id, count(*) as total
from transactions
join contacts using (u_id)
join table using (contact_id)
where transactions.when_created between '2020-06-01' and '2020-06-30'
group by transactions.u_id;
You are also interested in
select transactions.u_id, count(*) as total
from transactions
join contacts using (u_id)
join table using (contact_id)
where transactions.when_created between '2020-08-01' and '2020-08-30'
group by transactions.u_id;
And you want the ids that are returned by both queries, together with the smaller of the two totals for each id.
Then you can do something like this:
select t1.u_id,
case
when t1.total > t2.total then t2.total
else t1.total
end as total
from (
select transactions.u_id, count(*) as total
from transactions
join contacts using (u_id)
join table using (contact_id)
where transactions.when_created between '2020-06-01' and '2020-06-30'
group by transactions.u_id) t1
join (
select transactions.u_id, count(*) as total
from transactions
join contacts using (u_id)
join table using (contact_id)
where transactions.when_created between '2020-08-01' and '2020-08-30'
group by transactions.u_id) t2
on t1.u_id = t2.u_id
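As a rough check of the self-join idea, the sketch below runs it on an in-memory SQLite database from Python. It is simplified under stated assumptions: the joins to `contacts` and `table` are left out, the rows are invented, the two date ranges are passed as parameters, and `PER_PERIOD` is a helper name introduced only for this example.

```python
import sqlite3

# Hypothetical data: only the transactions table, with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (u_id INTEGER, when_created TEXT);
INSERT INTO transactions VALUES
  (1, '2020-06-03'), (1, '2020-06-10'), (1, '2020-08-05'),
  (2, '2020-06-07'),
  (3, '2020-08-02'), (3, '2020-08-09'),
  (4, '2020-06-01'), (4, '2020-08-01'), (4, '2020-08-20');
""")

# Per-period count of rows for each id; reused for both periods.
PER_PERIOD = """
    SELECT u_id, COUNT(*) AS total
    FROM transactions
    WHERE when_created BETWEEN ? AND ?
    GROUP BY u_id
"""

# The inner join keeps only ids present in both periods.
query = f"""
SELECT t1.u_id,
       CASE WHEN t1.total > t2.total THEN t2.total ELSE t1.total END AS total
FROM ({PER_PERIOD}) t1
JOIN ({PER_PERIOD}) t2 ON t1.u_id = t2.u_id
ORDER BY t1.u_id
"""
params = ("2020-06-01", "2020-06-30", "2020-08-01", "2020-08-30")
rows = conn.execute(query, params).fetchall()
print(rows)       # [(1, 1), (4, 1)]
print(len(rows))  # 2 ids appear in both periods
```

If only the number of shared ids is needed, counting the rows of this result (or wrapping it in `SELECT COUNT(*)`) is enough.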

Sum of one column grouped by 2nd column with groups made based on 3rd column

Data
So my data looks like:
product user_id value id
pizza 1 50 1
burger 1 30 2
pizza 2 50 3
fries 1 10 4
pizza 3 50 5
burger 1 30 6
burger 2 30 7
Problem Statement
And I wanted to compute Lifetime values of customers of each product as a metric to know which product is doing great in terms of user retention.
Desired Output
My desired output is:
product   value_by_customers_of_these_products   total_customers   ltv
pizza     250                                    3                 250/3 = 83.33
burger    200                                    2                 200/2 = 100
fries     120                                    1                 120/1 = 120
Columns Description:
value_by_customers_of_these_products : Total value generated by
customers of each product including orders which do not contain the
product
total_customers : Simple COUNT(DISTINCT user_id) GROUP BY product
Current Workaround
Currently I am doing this:
SELECT "pizza" AS product, SUM(value) value_by_customers_of_these_products, COUNT(DISTINCT user_id) users FROM orders WHERE user_id in (SELECT user_id FROM orders WHERE product = "pizza")
UNION ALL
SELECT "burger" AS product, SUM(value) value_by_customers_of_these_products, COUNT(DISTINCT user_id) users FROM orders WHERE user_id in (SELECT user_id FROM orders WHERE product = "burger")
UNION ALL
SELECT "fries" AS product, SUM(value) value_by_customers_of_these_products, COUNT(DISTINCT user_id) users FROM orders WHERE user_id in (SELECT user_id FROM orders WHERE product = "fries")
I have a Python script that obtains the DISTINCT product names from my table, repeats the query string for each product, and updates the query from time to time. This is a real pain, because I have to do it every time a new product is launched, and the ever-growing length of the query is another issue. How can I achieve this with built-in BigQuery functions, or with minimal headache?
Code to generate Sample Data
WITH orders as (SELECT "pizza" AS product,
1 AS user_id,
50 AS value, 1 AS id,
UNION ALL SELECT "burger", 1, 30,2
UNION ALL SELECT "pizza", 2, 50,3
UNION ALL SELECT "fries", 1, 10,4
UNION ALL SELECT "pizza", 3, 50,5
UNION ALL SELECT "burger", 1, 30, 6
UNION ALL SELECT "burger", 2, 30, 7)
Use the query below:
with user_value as (
select user_id, sum(value) values
from `project.dataset.table`
group by user_id
), product_user as (
select distinct product, user_id
from `project.dataset.table`
)
select product,
sum(values) as value_by_customers_of_these_products,
count(user_id) as total_customers,
round(sum(values) / count(user_id), 2) as ltv
from product_user
join user_value
using(user_id)
group by product
If applied to the sample data in your question, this produces the desired output shown there.
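The same two-CTE approach can be tried outside BigQuery. Here is a minimal sketch on an in-memory SQLite database from Python, loaded with the rows from the Data table in the question. Two things are assumptions of this sketch and not part of the answer: the alias `values` is renamed to `total_value` because VALUES is a keyword in SQLite, and the sum is multiplied by 1.0 to force non-integer division.

```python
import sqlite3

# Rows copied from the "Data" table in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (product TEXT, user_id INTEGER, value INTEGER, id INTEGER);
INSERT INTO orders VALUES
  ('pizza', 1, 50, 1), ('burger', 1, 30, 2), ('pizza', 2, 50, 3),
  ('fries', 1, 10, 4), ('pizza', 3, 50, 5), ('burger', 1, 30, 6),
  ('burger', 2, 30, 7);
""")

query = """
WITH user_value AS (
    -- lifetime value of each user across all products
    SELECT user_id, SUM(value) AS total_value
    FROM orders
    GROUP BY user_id
), product_user AS (
    -- one row per (product, user) pair, duplicates removed
    SELECT DISTINCT product, user_id
    FROM orders
)
SELECT product,
       SUM(total_value) AS value_by_customers_of_these_products,
       COUNT(user_id) AS total_customers,
       ROUND(SUM(total_value) * 1.0 / COUNT(user_id), 2) AS ltv
FROM product_user
JOIN user_value USING (user_id)
GROUP BY product
ORDER BY product
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('burger', 200, 2, 100.0)
# ('fries', 120, 1, 120.0)
# ('pizza', 250, 3, 83.33)
```

Because the product list comes from the data itself, a newly launched product shows up in the result without any change to the query.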

How to simplify a join of 2 tables in HIVE and count values

I have two tables in HIVE, "orders" and "customers". I want to get top n user names of users who placed most orders (in status "CLOSED"). Orders table has key order_customer_id, column order_status and customers has key customer_id and name consists of 2 columns customer_fname and customer_lname.
ORDERS
order_customer_id, order_status
1,CLOSED
2,CLOSED
3,INPROGRESS
1,INPROGRESS
1,CLOSED
2,CLOSED
CUSTOMERS
customer_id, customer_fname, customer_lname
1,Mickey, Mouse
2,Henry, Ford
3,John, Doe
I tried this code:
select c.customer_id,
count(o.order_customer_id) as COUNT,
concat(c.customer_fname," ",c.customer_lname) as FULLNAME
from customers c
join orders o on c.customer_id=o.order_customer_id
where o.order_status='CLOSED'
group by c.customer_id,FULLNAME
order by COUNT desc
limit 10;
This does not work; it returns an error.
I was able to get the result by first creating a 3rd table:
create table id_sum as
select o.order_customer_id, count(o.order_id) as COUNT
from orders o
join customers c on c.customer_id=o.order_customer_id
where order_status='CLOSED'
group by o.order_customer_id;
1833 6
5493 5
1363 5
1687 5
569 4
1764 4
1345 4
Then I joined the tables:
select s.*, concat(c.customer_fname," " ,c.customer_lname)
from id_sum s
join customers c on s.order_customer_id = c.customer_id
order by count desc
limit 20;
This resulted in desired output:
customer_id, order_count, full_name
1833 6 Ronald Smith
5493 5 Mary Cochran
1363 5 Kathy Rios
1687 5 Jerry Ellis
569 4 Mary Frye
1764 4 Megan Davila
1345 4 Adam Wilson
Is there a way to write this in a single command, or more efficiently?
The subquery with alias sq creates a relation with two columns order_count and customer_id calculating for each customer_id the total number of orders. This is then joined with the CUSTOMERS table. The result is sorted descending and limited to (the top) 10 rows.
SELECT c.customer_id, sq.order_count, concat(c.customer_fname," " ,c.customer_lname) as full_name
FROM CUSTOMERS c JOIN (
SELECT COUNT(*) as order_count, order_customer_id FROM ORDERS
WHERE order_status = 'CLOSED'
GROUP BY order_customer_id
) sq on c.customer_id = sq.order_customer_id
ORDER BY sq.order_count desc LIMIT 10
;
The idea is to use a subquery instead of a third table.
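A quick way to convince yourself that the subquery replaces the intermediate table is to run the same shape on the small sample from the question. This sketch uses an in-memory SQLite database from Python, not Hive, so two details differ and are assumptions of the sketch: names are joined with `||` in place of `concat`, and `customer_id` is added as a tie-breaker in the `ORDER BY` so the output is deterministic.

```python
import sqlite3

# Sample rows taken from the ORDERS and CUSTOMERS listings in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_customer_id INTEGER, order_status TEXT);
CREATE TABLE customers (customer_id INTEGER, customer_fname TEXT, customer_lname TEXT);
INSERT INTO orders VALUES
  (1, 'CLOSED'), (2, 'CLOSED'), (3, 'INPROGRESS'),
  (1, 'INPROGRESS'), (1, 'CLOSED'), (2, 'CLOSED');
INSERT INTO customers VALUES
  (1, 'Mickey', 'Mouse'), (2, 'Henry', 'Ford'), (3, 'John', 'Doe');
""")

query = """
SELECT c.customer_id,
       sq.order_count,
       c.customer_fname || ' ' || c.customer_lname AS full_name
FROM customers c
JOIN (
    -- closed orders per customer, computed inline (no third table)
    SELECT COUNT(*) AS order_count, order_customer_id
    FROM orders
    WHERE order_status = 'CLOSED'
    GROUP BY order_customer_id
) sq ON c.customer_id = sq.order_customer_id
ORDER BY sq.order_count DESC, c.customer_id
LIMIT 10
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 2, 'Mickey Mouse'), (2, 2, 'Henry Ford')]
```

Customer 3 has no CLOSED orders, so the inner join drops that row.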

Removing duplicate rows from relation

I have the following code which produces a relation:
SELECT book_id, shipments.customer_id
FROM shipments
LEFT JOIN editions ON (shipments.isbn = editions.isbn)
LEFT JOIN customers ON (shipments.customer_id = customers.customer_id)
This relation contains customer_ids together with the book_ids of the books they have bought. My goal is to create a relation that lists each book along with how many unique customers bought it. I assume one way to achieve this is to eliminate all duplicate rows in the relation and then count the instances of each book_id.
So my question is: How can I delete all duplicate rows from this relation?
Thanks!
EDIT: So what I mean is that I want all the rows in the relation to be unique. If there are three identical rows for example, two of them should be removed.
This will give you all the {customer,edition} pairs for which an order exists:
SELECT *
FROM customers c
JOIN editions e ON EXISTS (
SELECT * FROM shipments s
WHERE s.isbn = e.isbn
AND s.customer_id = c.customer_id
);
The duplicates are in table shipments. You can remove these with a DISTINCT clause and then count them in an outer query GROUP BY isbn:
SELECT isbn, count(customer_id) AS unique_buyers
FROM (
SELECT DISTINCT isbn, customer_id FROM shipments) book_buyer
GROUP BY isbn;
If you want a list of all books, even where no purchases were made, you should LEFT JOIN the above to the list of all books:
SELECT isbn, coalesce(unique_buyers, 0) AS books_sold_to_unique_buyers
FROM editions
LEFT JOIN (
SELECT isbn, count(customer_id) AS unique_buyers
FROM (
SELECT DISTINCT isbn, customer_id FROM shipments) book_buyer
GROUP BY isbn) books_bought USING (isbn)
ORDER BY isbn;
You can write this more succinctly by joining before counting:
SELECT isbn, count(customer_id) AS books_sold_to_unique_buyers
FROM editions
LEFT JOIN (
SELECT DISTINCT isbn, customer_id FROM shipments) book_buyer USING (isbn)
GROUP BY isbn
ORDER BY isbn;
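To check the last query on concrete rows, here is a minimal sketch on an in-memory SQLite database from Python, with invented data. It also runs a `COUNT(DISTINCT ...)` variant, which is an alternative not taken from the answer above, to show that on this data both give the same counts.

```python
import sqlite3

# Hypothetical data: shipments contains repeated (isbn, customer_id) pairs,
# and edition 'C' was never shipped.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE editions (isbn TEXT);
CREATE TABLE shipments (isbn TEXT, customer_id INTEGER);
INSERT INTO editions VALUES ('A'), ('B'), ('C');
INSERT INTO shipments VALUES
  ('A', 1), ('A', 1), ('A', 2),
  ('B', 3), ('B', 3);
""")

# Deduplicate in a subquery, then count after the LEFT JOIN.
with_subquery = conn.execute("""
SELECT isbn, COUNT(customer_id) AS books_sold_to_unique_buyers
FROM editions
LEFT JOIN (
    SELECT DISTINCT isbn, customer_id FROM shipments
) book_buyer USING (isbn)
GROUP BY isbn
ORDER BY isbn
""").fetchall()

# Alternative: let the aggregate ignore the duplicates.
with_count_distinct = conn.execute("""
SELECT e.isbn, COUNT(DISTINCT s.customer_id) AS books_sold_to_unique_buyers
FROM editions e
LEFT JOIN shipments s ON s.isbn = e.isbn
GROUP BY e.isbn
ORDER BY e.isbn
""").fetchall()

print(with_subquery)        # [('A', 2), ('B', 1), ('C', 0)]
print(with_count_distinct)  # [('A', 2), ('B', 1), ('C', 0)]
```

Counting `customer_id` and not `*` matters here: for an edition with no shipments the LEFT JOIN yields a NULL customer_id, which COUNT skips, giving 0.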

Using Derived Tables and CTEs to Display Details?

I am teaching myself T-SQL and am struggling to comprehend the following example.
Suppose you want to display several nonaggregated columns along with
some aggregate expressions that apply to the entire result set or to a
larger grouping level. For example, you may need to display several
columns from the Sales.SalesOrderHeader table and calculate the
percent of the TotalDue for each sale compared to the TotalDue for all
the customer’s sales. If you group by CustomerID, you can’t include
other nonaggregated columns from Sales.SalesOrderHeader unless you
group by those columns. To get around this, you can use a derived
table or a CTE.
Here are two examples given...
SELECT c.CustomerID, SalesOrderID, TotalDue, AvgOfTotalDue,
TotalDue/SumOfTotalDue * 100 AS SalePercent
FROM Sales.SalesOrderHeader AS soh
INNER JOIN
(SELECT CustomerID, SUM(TotalDue) AS SumOfTotalDue,
AVG(TotalDue) AS AvgOfTotalDue
FROM Sales.SalesOrderHeader
GROUP BY CustomerID) AS c ON soh.CustomerID = c.CustomerID
ORDER BY c.CustomerID;
WITH c AS
(SELECT CustomerID, SUM(TotalDue) AS SumOfTotalDue,
AVG(TotalDue) AS AvgOfTotalDue
FROM Sales.SalesOrderHeader
GROUP BY CustomerID)
SELECT c.CustomerID, SalesOrderID, TotalDue,AvgOfTotalDue,
TotalDue/SumOfTotalDue * 100 AS SalePercent
FROM Sales.SalesOrderHeader AS soh
INNER JOIN c ON soh.CustomerID = c.CustomerID
ORDER BY c.CustomerID;
Why doesn't this query produce the same result?
SELECT CustomerID, SalesOrderID, TotalDue, AVG(TotalDue) AS AvgOfTotalDue,
TotalDue/SUM(TotalDue) * 100 AS SalePercent
FROM Sales.SalesOrderHeader
GROUP BY CustomerID, SalesOrderID, TotalDue
ORDER BY CustomerID
I'm looking for someone to explain the above examples in another way or step through it logically so I can understand how they work?
The aggregates in this statement (i.e. SUM and AVG) don't do anything:
SELECT CustomerID, SalesOrderID, TotalDue, AVG(TotalDue) AS AvgOfTotalDue,
TotalDue/SUM(TotalDue) * 100 AS SalePercent
FROM Sales.SalesOrderHeader
GROUP BY CustomerID, SalesOrderID, TotalDue
ORDER BY CustomerID
The reason for this is that you're grouping by TotalDue, so all records in the same group have the same value for this field. In the case of AVG, this means AvgOfTotalDue is guaranteed to always equal TotalDue. For SUM it's possible you'd get a different result, but as you're also grouping by SalesOrderID (which I'd imagine is unique in the SalesOrderHeader table) you will only have one record per group, so again this will always equal the TotalDue value.
With the CTE example you're only grouping by CustomerId; as a customer may have many sales orders associated with it, these aggregate values will be different to the TotalDue.
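The difference between the two groupings can be seen side by side in a small sketch. It uses an in-memory SQLite database from Python in place of SQL Server, drops the `Sales.` schema prefix, and uses three invented rows; the extra `SalesOrderID` in the `ORDER BY` is only there to make the output deterministic.

```python
import sqlite3

# Hypothetical data: customer 10 has two orders, customer 20 has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SalesOrderHeader (SalesOrderID INTEGER, CustomerID INTEGER, TotalDue REAL);
INSERT INTO SalesOrderHeader VALUES (1, 10, 100.0), (2, 10, 300.0), (3, 20, 50.0);
""")

# Grouping by every column: each group is a single row,
# so AVG and SUM just return that row's own TotalDue.
rows_flat = conn.execute("""
SELECT CustomerID, SalesOrderID, TotalDue,
       AVG(TotalDue) AS AvgOfTotalDue,
       TotalDue / SUM(TotalDue) * 100 AS SalePercent
FROM SalesOrderHeader
GROUP BY CustomerID, SalesOrderID, TotalDue
ORDER BY CustomerID, SalesOrderID
""").fetchall()

# Derived table grouped by CustomerID only, joined back to the detail rows.
rows_derived = conn.execute("""
SELECT c.CustomerID, soh.SalesOrderID, soh.TotalDue, c.AvgOfTotalDue,
       soh.TotalDue / c.SumOfTotalDue * 100 AS SalePercent
FROM SalesOrderHeader AS soh
INNER JOIN (
    SELECT CustomerID,
           SUM(TotalDue) AS SumOfTotalDue,
           AVG(TotalDue) AS AvgOfTotalDue
    FROM SalesOrderHeader
    GROUP BY CustomerID
) AS c ON soh.CustomerID = c.CustomerID
ORDER BY c.CustomerID, soh.SalesOrderID
""").fetchall()

print(rows_flat)
# [(10, 1, 100.0, 100.0, 100.0), (10, 2, 300.0, 300.0, 100.0), (20, 3, 50.0, 50.0, 100.0)]
print(rows_derived)
# [(10, 1, 100.0, 200.0, 25.0), (10, 2, 300.0, 200.0, 75.0), (20, 3, 50.0, 50.0, 100.0)]
```

In the first result every SalePercent is 100, because each group contains exactly one order. In the second, customer 10's orders are 25% and 75% of that customer's total of 400.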
EDIT
Explanation of the aggregate of field included in group by:
When you group by a value, all rows with that same value are collected together and aggregate functions are performed over them. Say you had 3 rows with a total due of 1 and 2 rows with a total due of 2: you'd get two result lines, one with the 1s and one with the 2s. If you perform a sum on these you have 3*1 and 2*2. Now divide by the number of rows in each result line (to get the average) and you have 3*1/3 and 2*2/2, so things cancel out, leaving you with 1 and 2.
select totalDue, avg(totalDue)
from (
select 1 totalDue
union all select 1 totalDue
union all select 1 totalDue
union all select 2 totalDue
union all select 2 totalDue
) x
group by totalDue
select uniqueId, totalDue, avg(totalDue), sum(totalDue)
from (
select 1 uniqueId, 1 totalDue
union all select 2 uniqueId, 1 totalDue
union all select 3 uniqueId, 1 totalDue
union all select 4 uniqueId, 2 totalDue
union all select 5 uniqueId, 2 totalDue
) x
group by uniqueId, totalDue
Runnable Example: http://sqlfiddle.com/#!2/d41d8/21263