How to simplify a join of 2 tables in HIVE and count values - hiveql

I have two tables in Hive, "orders" and "customers". I want to get the names of the top n customers who placed the most orders with status "CLOSED". The orders table has the key order_customer_id and the column order_status; the customers table has the key customer_id, and the name is split across two columns, customer_fname and customer_lname.
ORDERS
order_customer_id, order_status
1,CLOSED
2,CLOSED
3,INPROGRESS
1,INPROGRESS
1,CLOSED
2,CLOSED
CUSTOMERS
customer_id, customer_fname, customer_lname
1,Mickey, Mouse
2,Henry, Ford
3,John, Doe
I tried this code:
select c.customer_id, count(o.order_customer_id) as COUNT, concat(c.customer_fname," ",c.customer_lname) as FULLNAME from customers c join orders o on c.customer_id=o.order_customer_id where o.order_status='CLOSED' group by c.customer_id,FULLNAME order by COUNT desc limit 10;
This does not work; it returns an error.
I was able to get the result by first creating a 3rd table:
create table id_sum as select o.order_customer_id,count(o.order_id) as COUNT from orders o join customers c on c.customer_id=o.order_customer_id where order_status='CLOSED' group by o.order_customer_id;
1833 6
5493 5
1363 5
1687 5
569 4
1764 4
1345 4
Then I joined the tables:
select s.*,concat(c.customer_fname," " ,c.customer_lname) from id_sum s join customers c on s.order_customer_id = c.customer_id order by count desc limit 20;
This resulted in desired output:
customer_id, order_count, full_name
1833 6 Ronald Smith
5493 5 Mary Cochran
1363 5 Kathy Rios
1687 5 Jerry Ellis
569 4 Mary Frye
1764 4 Megan Davila
1345 4 Adam Wilson
Is there a way to write this in a single statement, or more efficiently?

The subquery with alias sq creates a relation with two columns, order_count and order_customer_id, holding the number of CLOSED orders for each customer. This is then joined with the CUSTOMERS table. The result is sorted in descending order and limited to the top 10 rows.
SELECT c.customer_id, sq.order_count, concat(c.customer_fname," " ,c.customer_lname) as full_name
FROM CUSTOMERS c JOIN (
SELECT COUNT(*) as order_count, order_customer_id FROM ORDERS
WHERE order_status = 'CLOSED'
GROUP BY order_customer_id
) sq on c.customer_id = sq.order_customer_id
ORDER BY sq.order_count desc LIMIT 10
;
The idea is to use a subquery instead of a third table.
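To check the subquery approach against the sample rows from the question, here is a minimal sketch that runs the same shape of query in an in-memory SQLite database from Python. It is not Hive: SQLite's `||` operator stands in for `concat`, the function name `top_closed_customers` is made up for illustration, and the tie-break on customer_id is an added assumption so the output order is deterministic.

```python
import sqlite3

def top_closed_customers(limit=10):
    """Run the aggregate-then-join query on the question's sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (order_customer_id INTEGER, order_status TEXT);
        CREATE TABLE customers (customer_id INTEGER,
                                customer_fname TEXT, customer_lname TEXT);
        INSERT INTO orders VALUES
            (1, 'CLOSED'), (2, 'CLOSED'), (3, 'INPROGRESS'),
            (1, 'INPROGRESS'), (1, 'CLOSED'), (2, 'CLOSED');
        INSERT INTO customers VALUES
            (1, 'Mickey', 'Mouse'), (2, 'Henry', 'Ford'), (3, 'John', 'Doe');
    """)
    rows = conn.execute("""
        SELECT c.customer_id, sq.order_count,
               c.customer_fname || ' ' || c.customer_lname AS full_name
        FROM customers c
        JOIN (SELECT order_customer_id, COUNT(*) AS order_count
              FROM orders
              WHERE order_status = 'CLOSED'
              GROUP BY order_customer_id) sq   -- one row per customer
          ON c.customer_id = sq.order_customer_id
        ORDER BY sq.order_count DESC, c.customer_id  -- tie-break is an assumption
        LIMIT ?
    """, (limit,)).fetchall()
    conn.close()
    return rows

print(top_closed_customers())
```

Because the counting happens inside the subquery, the outer query needs no GROUP BY at all, so the name columns can be selected freely.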

Related

Postgresql group by relation

I want to group the records by relation.
products table:
id  price
1   100
2   200
3   300
4   400
product_properties table:
id  productId  propertyId
1   1          2
2   1          3
3   2          2
4   2          3
5   3          4
6   4          4
The query should select the lowest-priced product for each group of products that share the same properties in product_properties.
So for these tables the query should return the products with ids 1 and 3.
I use TypeORM. I tried joining the relation and using DISTINCT ON with the relation alias name, but it did not work.
How can I achieve this?
I wrote two query variants for you:
-- variant 1
select distinct t1.product_id from (
select
pr.price, pp.product_id, pp.property_id, min(pr.price) OVER(PARTITION BY pp.property_id) as min_price
from
test.product_properties pp
inner join
test.products pr on pp.product_id = pr.id
) t1
where
t1.price = t1.min_price;
-- variant 2
select distinct t1.product_id from test.product_properties t1
inner join test.products t2 on t1.product_id = t2.id
inner join (
select
pp.property_id, min(pr.price) as min_price
from
test.product_properties pp
inner join
test.products pr on pp.product_id = pr.id
group by pp.property_id
) t3 on t3.property_id = t1.property_id and t3.min_price = t2.price;
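As a quick check of variant 2 against the question's sample rows, the following sketch runs the same join in an in-memory SQLite database from Python. The column names product_id and property_id follow the answer rather than the question, the `test.` schema prefix is dropped, and the function name `cheapest_per_property` is hypothetical. Note that the query takes the minimum price per single property, which happens to coincide with "per identical set of properties" for this data; that is not guaranteed in general.

```python
import sqlite3

def cheapest_per_property():
    """Return ids of products that hold the minimum price for some property."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (id INTEGER, price INTEGER);
        CREATE TABLE product_properties (id INTEGER, product_id INTEGER,
                                         property_id INTEGER);
        INSERT INTO products VALUES (1, 100), (2, 200), (3, 300), (4, 400);
        INSERT INTO product_properties VALUES
            (1, 1, 2), (2, 1, 3), (3, 2, 2), (4, 2, 3), (5, 3, 4), (6, 4, 4);
    """)
    rows = conn.execute("""
        SELECT DISTINCT t1.product_id
        FROM product_properties t1
        JOIN products t2 ON t1.product_id = t2.id
        JOIN (SELECT pp.property_id, MIN(pr.price) AS min_price
              FROM product_properties pp
              JOIN products pr ON pp.product_id = pr.id
              GROUP BY pp.property_id) t3      -- cheapest price per property
          ON t3.property_id = t1.property_id AND t3.min_price = t2.price
        ORDER BY t1.product_id
    """).fetchall()
    conn.close()
    return [product_id for (product_id,) in rows]

print(cheapest_per_property())
```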

select only when an item exists on a table 3 or more times postgres

I have 2 tables. SalesOrderDetail and SalesOrderHeader.
SalesOrderDetail contains SalesOrderID and ProductID columns.
SalesOrderHeader contains SalesOrderID and CustomerID.
I want to write a query that shows all customers who placed orders containing 3 or more products with different ProductID values, and how many such orders each customer made. I know that an order contains 3 or more products when its SalesOrderID appears 3 or more times in the SalesOrderDetail table.
So the Customer with ID 29825 has ordered 12 different Products.
And here's my code:
SELECT "SalesOrderHeader"."CustomerID", count("SalesOrderDetail"."SalesOrderID") AS TotalOrders
FROM
public."SalesOrderHeader",
public."SalesOrderDetail"
WHERE
"SalesOrderHeader"."SalesOrderID" = "SalesOrderDetail"."SalesOrderID"
GROUP BY "SalesOrderHeader"."CustomerID"
HAVING count("SalesOrderDetail"."SalesOrderID") >= 3
The problem with this is that it shows the number of products he ordered, but I want the number of orders with 3 or more different products.
If you want the total orders with 3 or more products per customer, then use two levels of aggregation:
select soh."CustomerID", count(*) as NumOrders
from public."SalesOrderHeader" soh join
(select "SalesOrderID", count(distinct "ProductID") as numproducts
from public."SalesOrderDetail"
group by "SalesOrderID"
) sod
on sod."SalesOrderID" = soh."SalesOrderID"
where sod.numproducts >= 3
group by soh."CustomerID"
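The two levels of aggregation can be tried out with a small sketch in Python and an in-memory SQLite database. All rows below are invented for illustration (the question gives no sample data), the function name `orders_with_min_products` is hypothetical, and the per-order filter is written as a HAVING clause inside the subquery, which is equivalent to the outer WHERE on the product count.

```python
import sqlite3

def orders_with_min_products(min_products=3):
    """Count, per customer, the orders having >= min_products distinct products."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE SalesOrderHeader (SalesOrderID INTEGER, CustomerID INTEGER);
        CREATE TABLE SalesOrderDetail (SalesOrderID INTEGER, ProductID INTEGER);
        -- Hypothetical data: customer 1 has orders 10-11, customer 2 has 12-13,
        -- customer 3 has order 14.
        INSERT INTO SalesOrderHeader VALUES (10, 1), (11, 1), (12, 2), (13, 2), (14, 3);
        INSERT INTO SalesOrderDetail VALUES
            (10, 100), (10, 101), (10, 102),             -- 3 distinct products
            (11, 100), (11, 101),                        -- 2 distinct products
            (12, 100), (12, 101), (12, 102), (12, 103),  -- 4 distinct products
            (13, 100), (13, 101), (13, 102),             -- 3 distinct products
            (14, 100), (14, 100), (14, 100);             -- 3 lines, 1 distinct product
    """)
    rows = conn.execute("""
        SELECT soh.CustomerID, COUNT(*) AS NumOrders      -- level 2: per customer
        FROM SalesOrderHeader soh
        JOIN (SELECT SalesOrderID                         -- level 1: per order
              FROM SalesOrderDetail
              GROUP BY SalesOrderID
              HAVING COUNT(DISTINCT ProductID) >= ?) big
          ON big.SalesOrderID = soh.SalesOrderID
        GROUP BY soh.CustomerID
        ORDER BY soh.CustomerID
    """, (min_products,)).fetchall()
    conn.close()
    return rows

print(orders_with_min_products())
```

Order 14 shows why COUNT(DISTINCT ...) matters: it has three detail lines but only one distinct product, so customer 3 does not appear in the result.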

Limit for inner Join Table

I have a scenario where I am joining three tables and getting the results.
My problem is that I have to apply a limit to the joined table.
Take the example below: I have three tables, 1) Books, 2) Customer and 3) Authors. I need to find the list of books sold today with author and customer name; however, for the given book ids I need only the last n customers, not all of them.
Books Customer Authors
--------------- ---------------------- -------------
Id Name AID Id BID Name Date AID Name
1 1 1 ABC 1 A1
2 2 1 CED 2 A2
3 3 2 DFG
How can we achieve this?
You are looking for LATERAL.
Sample:
-- N and ids are placeholders for the row limit and the list of book ids
SELECT B.Id, C.Name
FROM Books B,
LATERAL (SELECT * FROM Customer cu WHERE cu.BID = B.Id ORDER BY cu.Id DESC LIMIT N) C
WHERE B.Id = ANY(ids)
AND C.Date = current_date
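The "last N rows per parent" idea behind LATERAL can be illustrated with a runnable sketch. SQLite has no LATERAL, so this example swaps in a correlated subquery with ORDER BY and LIMIT, which expresses the same per-book limit. The book names, the fourth customer row, and the function name `last_n_customers_per_book` are assumptions added for illustration, and the date filter from the question is left out to keep the sketch short.

```python
import sqlite3

def last_n_customers_per_book(n):
    """Return (book id, customer name) for the n most recent customers per book."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Books (Id INTEGER, Name TEXT);
        CREATE TABLE Customer (Id INTEGER, BID INTEGER, Name TEXT);
        -- Book names and customer 4 are hypothetical additions.
        INSERT INTO Books VALUES (1, 'Book1'), (2, 'Book2'), (3, 'Book3');
        INSERT INTO Customer VALUES
            (1, 1, 'ABC'), (2, 1, 'CED'), (3, 2, 'DFG'), (4, 1, 'XYZ');
    """)
    rows = conn.execute("""
        SELECT b.Id, c.Name
        FROM Books b
        JOIN Customer c ON c.BID = b.Id
        WHERE c.Id IN (SELECT c2.Id            -- evaluated once per book
                       FROM Customer c2
                       WHERE c2.BID = b.Id
                       ORDER BY c2.Id DESC
                       LIMIT ?)
        ORDER BY b.Id, c.Id DESC
    """, (n,)).fetchall()
    conn.close()
    return rows

print(last_n_customers_per_book(2))
```

On PostgreSQL the LATERAL form is usually the more direct way to write this; the correlated subquery is only a stand-in that runs on SQLite.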

How to sum items from subtable in SQL

Let's say I have table orders
id name
1 order1
2 order2
3 order3
and subtable items
id parent amount price
1 1 1 10
2 1 3 20
3 2 2 5
4 2 5 1
I would like to create a query that returns each order with an added column value, calculated from all of the order's items:
id name value
1 order1 70
2 order2 15
3 order3 0
Is this possible with T-SQL?
GROUP BY and SUM would do it; you need a LEFT JOIN and ISNULL because not every order has items.
SELECT o.id, o.name, isnull(sum(i.amount*i.price),0) as value
FROM orders o
left join items i
on o.id = i.parent
group by o.id, o.name
I think you're looking for something like this:
SELECT o.id, o.name, ISNULL(i.Value, 0) AS value FROM orders o WITH (NOLOCK)
LEFT JOIN (SELECT parent, SUM(amount * price) AS Value FROM items WITH (NOLOCK) GROUP BY parent) i
ON o.id = i.parent
...seems like RADAR beat me to the answer.
EDIT: missing the ON line.
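For a quick check of the LEFT JOIN plus SUM approach against the question's sample data, here is a minimal sketch using Python and an in-memory SQLite database. It is not T-SQL: COALESCE is used in place of ISNULL, and the function name `order_values` is made up for illustration.

```python
import sqlite3

def order_values():
    """Return (id, name, value) per order, with 0 for orders without items."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER, name TEXT);
        CREATE TABLE items (id INTEGER, parent INTEGER,
                            amount INTEGER, price INTEGER);
        INSERT INTO orders VALUES (1, 'order1'), (2, 'order2'), (3, 'order3');
        INSERT INTO items VALUES (1, 1, 1, 10), (2, 1, 3, 20),
                                 (3, 2, 2, 5), (4, 2, 5, 1);
    """)
    rows = conn.execute("""
        SELECT o.id, o.name,
               COALESCE(SUM(i.amount * i.price), 0) AS value  -- NULL -> 0
        FROM orders o
        LEFT JOIN items i ON o.id = i.parent   -- keeps orders with no items
        GROUP BY o.id, o.name
        ORDER BY o.id
    """).fetchall()
    conn.close()
    return rows

print(order_values())
```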

counting in sql in subquery in the table

DNO DNAME
----- -----------
1 Research
2 Finance
EN ENAME CITY SALARY DNO JOIN_DATE
-- ---------- ---------- ---------- ---------- ---------
E1 Ashim Kolkata 10000 1 01-JUN-02
E2 Kamal Mumbai 18000 2 02-JAN-02
E3 Tamal Chennai 7000 1 07-FEB-04
E4 Asha Kolkata 8000 2 01-MAR-07
E5 Timir Delhi 7000 1 11-JUN-05
Find all departments that have more than 3 employees.
My attempt:
select deptt.dname
from deptt,empl
where deptt.dno=empl.dno and (select count(empl.dno) from empl group by empl.dno)>3;
Here is the solution:
select deptt.dname
from deptt,empl
where deptt.dno=empl.dno
group by deptt.dname having count(1)>3;
select
*
from departments d
inner join (
select dno from employees group by dno having count(*) > 3
) e on d.dno = e.dno
There are many approaches to this problem but almost all will use GROUP BY and the HAVING clause. That clause allows you to filter results of aggregate functions. Here it is used to choose only those records where the count is greater than 3.
In the query structure used above the group by is handled on the employee table only, then the result (which is known as a derived table) is joined by an INNER JOIN to the departments table. This inner join only allows matching records so this has the effect of filtering the departments table to only those which have a count() of greater than 3.
An advantage of this query structure is that fewer records are joined, and that all columns of the departments table are available for reporting. A disadvantage is that the count() of employees per department isn't visible.
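If the per-department count should be visible after all, one option is to select it inside the derived table and expose it through the join. The sketch below tries this on the question's sample rows in an in-memory SQLite database from Python, using the question's table names deptt and empl with only the columns needed here. The function name `departments_over` and the `emp_count` alias are assumptions. With the five sample employees no department has more than 3, so the threshold is a parameter.

```python
import sqlite3

def departments_over(threshold):
    """Return (dname, employee count) for departments with more than threshold employees."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE deptt (dno INTEGER, dname TEXT);
        CREATE TABLE empl (en TEXT, ename TEXT, dno INTEGER);
        INSERT INTO deptt VALUES (1, 'Research'), (2, 'Finance');
        INSERT INTO empl VALUES ('E1', 'Ashim', 1), ('E2', 'Kamal', 2),
                                ('E3', 'Tamal', 1), ('E4', 'Asha', 2),
                                ('E5', 'Timir', 1);
    """)
    rows = conn.execute("""
        SELECT d.dname, e.emp_count
        FROM deptt d
        JOIN (SELECT dno, COUNT(*) AS emp_count   -- derived table, one row per dno
              FROM empl
              GROUP BY dno
              HAVING COUNT(*) > ?) e
          ON d.dno = e.dno
        ORDER BY d.dname
    """, (threshold,)).fetchall()
    conn.close()
    return rows

print(departments_over(3))  # no department in the sample data qualifies
print(departments_over(2))
```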