Postgresql: How to remove duplicate rows while joining?

Postgresql: How to remove duplicate rows while joining? - postgresql

I have two postgresql table called charges and orders. I'm trying to create a matview with the data of how many charges turned into orders and it's worth. The two tables are not directly related, here's the table structure of both
Charges
| date | transaction_id | amount |
|--------|----------------|--------|
| 23-Apr | abcdef | 36 |
| 23-Apr | fghijkl | 198 |
| 24-Apr | yyyyyy | 200 |
Orders
| date | order_id |
|--------|----------|
| 23-Apr | abcdef |
| 23-Apr | abcdef |
| 24-Apr | yyyyyy |
And below is the query I'm using for generating the matview,
CREATE MATERIALIZED VIEW sales AS
SELECT ch.date AS date,
(ord.id IS NOT NULL) as placed_order,
COUNT(DISTINCT(ch.transaction_id)) AS attempts,
SUM(ch.amount) AS amount
FROM charges ch
LEFT OUTER JOIN orders as ord ON ch.transaction_id = ord.order_id
GROUP BY ch.date
The problem is caused by the Amount column generated in the view. Due to the duplicates in orders table multiple rows of charges are returned during the left outer join and the amount is basically increasing.
Is there an way to Distinct the order_id column from orders at the time of joining itself?
Or is there a way to distinct the order_id and sum the amount at the time of query itself? I tried sub-query and self-join but to no luck.

You can make a sub-query on table orders to filter out the duplicates:
CREATE MATERIALIZED VIEW sales AS
SELECT ch.date AS date,
(ord.order_id IS NOT NULL) AS placed_order,
count(ch.transaction_id) AS attempts,
sum(ch.amount) AS amount
FROM charges ch
LEFT JOIN (
SELECT DISTINCT date, order_id FROM orders) ord ON ch.transaction_id = ord.order_id
GROUP BY 1, 2

Related

PostgreSQL How to merge two tables row to row without condition

I have two tables
The first table contains three text fields(username, email, num) the second have only one column with random birth_date DATE.
I need to merge tables without condition
For example
first table:
+----------+--------------+-----------+
| username | email | num |
+----------+--------------+-----------+
| 'user1' | 'user1#mail' | '+794949' |
| 'user2' | 'user2#mail' | '+799999' |
+----------+--------------+-----------+
second table:
+--------------+
| birth_date |
+--------------+
| '2001-01-01' |
| '2002-02-02' |
+--------------+
And I need result like
+----------+------------+-------------+--------------+
| username | email | num | birth_date |
+----------+------------+-------------+--------------+
| 'user1' | 'us1#mail' | '+7979797' | '2001-01-01' |
| 'user2' | 'us2#mail' | '+79898998' | '2002-02-02' |
+----------+------------+-------------+--------------+
I need to get in result table with 100 rows too
Tried different JOIN but there is no condition here

Sure there is a join condition, about the simplest there is: Join on true or cross join. Either is the basic merge tables without condition. However this does not result in what you want as it generates a result set of 10k rows. But you an then use limit:
select *
from table1
join table2 on true
order by random()
limit 100;
select *
from table1
cross join table2
order by random()
limit 100;
There is other option, witch I think may be closer to what you want. Assign a value to each row of each table. Then join on this assigned value:
select <column list>
from (select *, row_number() over() rn from table1) t1
join (select *, row_number() over() rn from table2) t2
on (t1.rn = t2.rn);
To eliminate the assigned value you must specifically list each column desired in the result. But that is the way it should be done anyway.
See demo here. (demo user just 3 rows instead of 100)

Jsonb_object_keys() does not return any rows in left join if the right side table does not have any matching records

This is db query .
select users.Id,jsonb_object_keys(orders.metadata::jsonb) from users left join orders on users.userId=orders.userId where users.userId=2;
users table orders table
------------------- -----------------------------------------------------
|userId| name | | userId|orderId|metadata |
| 1 | john | | 1 | 1 | {"orderName":"chess","quantity":1}|
| 2 | doe | | 1 | 2 | {"orderName":"cube" ,"quantity":1}|
------------------- -----------------------------------------------------
Why there are no rows returned by the query ?

Very Nice and tricky question. to achieve what you want you should try below query:
select
t1.userid,
t2.keys
from
users t1
left join (select userid, orderid, jsonb_object_keys(metadata) as keys from orders) t2
on t1.userid=t2.userid
Your Query seems correct but there is catch. When you are left joining both tables without jsonb_object_keys(metadata), it will work as you are expecting. But when you use with this function then this function will return a set of records for each rows of select statement and perform simple join with rest of the columns internally. That's why it will remove the rows having NULL value in second column.

You should left join to the result of the jsonb_each() call:
select users.userid, meta.*
from users
left join orders on users.userid = orders.userid
left join jsonb_object_keys(orders.metadata::jsonb) as meta on true
where users.userid = 2;

Sum with different condition for every line

In my Postgresql 9.3 database I have a table stock_rotation:
+----+-----------------+---------------------+------------+---------------------+
| id | quantity_change | stock_rotation_type | article_id | date |
+----+-----------------+---------------------+------------+---------------------+
| 1 | 10 | PURCHASE | 1 | 2010-01-01 15:35:01 |
| 2 | -4 | SALE | 1 | 2010-05-06 08:46:02 |
| 3 | 5 | INVENTORY | 1 | 2010-12-20 08:20:35 |
| 4 | 2 | PURCHASE | 1 | 2011-02-05 16:45:50 |
| 5 | -1 | SALE | 1 | 2011-03-01 16:42:53 |
+----+-----------------+---------------------+------------+---------------------+
Types:
SALE has negative quantity_change
PURCHASE has positive quantity_change
INVENTORY resets the actual number in stock to the given value
In this implementation, to get the current value that an article has in stock, you need to sum up all quantity changes since the latest INVENTORY for the specific article (including the inventory value). I do not know why it is implemented this way and unfortunately it would be quite hard to change this now.
My question now is how to do this for more than a single article at once.
My latest attempt was this:
WITH latest_inventory_of_article as (
SELECT MAX(date)
FROM stock_rotation
WHERE stock_rotation_type = 'INVENTORY'
)
SELECT a.id, sum(quantity_change)
FROM stock_rotation sr
INNER JOIN article a ON a.id = sr.article_id
WHERE sr.date >= (COALESCE(
(SELECT date FROM latest_inventory_of_article),
'1970-01-01'
))
GROUP BY a.id
But the date for the latest stock_rotation of type INVENTORY can be different for every article.
I was trying to avoid looping over multiple article ids to find this date.

In this case I would use a different internal query to get the max inventory per article. You are effectively using stock_rotation twice but it should work. If it's too big of a table you can try something else:
SELECT sr.article_id, sum(quantity_change)
FROM stock_rotation sr
LEFT JOIN (
SELECT article_id, MAX(date) AS date
FROM stock_rotation
WHERE stock_rotation_type = 'INVENTORY'
GROUP BY article_id) AS latest_inventory
ON latest_inventory.article_id = sr.article_id
WHERE sr.date >= COALESCE(latest_inventory.date, '1970-01-01')
GROUP BY sr.article_id

You can use DISTINCT ON together with ORDER BY to get the latest INVENTORY row for each article_id in the WITH clause.
Then you can join that with the original table to get all later rows and add the values:
WITH latest_inventory as (
SELECT DISTINCT ON (article_id) id, article_id, date
FROM stock_rotation
WHERE stock_rotation_type = 'INVENTORY'
ORDER BY article_id, date DESC
)
SELECT article_id, sum(sr.quantity_change)
FROM stock_rotation sr
JOIN latest_inventory li USING (article_id)
WHERE sr.date >= li.date
GROUP BY article_id;

Here is my take on it: First, build the list of products at their last inventory state, using a window function. Then, join it back to the entire list, filtering on operations later than the inventory date for the item.
with initial_inventory as
(
select article_id, date, quantity_change from
(select article_id, date, quantity_change, rank() over (partition by article_id order by date desc)
from stockRotation
where type = 'INVENTORY'
) a
where rank = 1
)
select ii.article_id, ii.quantity_change + sum(sr.quantity_change)
from initial_inventory ii
join stockRotation sr on ii.article_id = sr.article_id and sr.date > ii.date
group by ii.article_id, ii.quantity_change

PostgreSQL Group By not working as expected - wants too many inclusions

I have a simple postgresql table that I'm tying to query. Imaging a table like this...
| ID | Account_ID | Iteration |
|----|------------|-----------|
| 1 | 100 | 1 |
| 2 | 101 | 1 |
| 3 | 100 | 2 |
I need to get the ID column for each Account_ID where Iteration is at its maximum value. So, you'd think something like this would work
SELECT "ID", "Account_ID", MAX("Iteration")
FROM "Table_Name"
GROUP BY "Account_ID"
And I expect to get:
| ID | Account_ID | MAX(Iteration) |
|----|------------|----------------|
| 2 | 101 | 1 |
| 3 | 100 | 2 |
But when I do this, Postgres complains:
ERROR: column "ID" must appear in the GROUP BY clause or be used in an aggregate function
Which, when I do that it just destroys the grouping altogether and gives me the whole table!
Is the best way to approach this using the following?
SELECT DISTINCT ON ("Account_ID") "ID", "Account_ID", "Iteration"
FROM "Marketing_Sparks"
ORDER BY "Account_ID" ASC, "Iteration" DESC;

The GROUP BY statement aggregates rows with the same values in the columns included in the group by into a single row. Because this row isn't the same as the original row, you can't have a column that is not in the group by or in an aggregate function. To get what you want, you will probably have to select without the ID column, then join the result to the original table. I don't know PostgreSQL syntax, but I assume it would be something like the following.
SELECT Table_Name.ID, aggregate.Account_ID, aggregate.MIteration
(SELECT Account_ID, MAX(Iteration) AS MIteration
FROM Table_Name
GROUP BY Account_ID) aggregate
LEFT JOIN Table_Name ON aggregate.Account_ID = Table_Name.Account_ID AND
aggregate.MIteration = Tabel_Name.Iteration

How to use COUNT() in more that one column?

Let's say I have this 3 tables
Countries ProvOrStates MajorCities
-----+------------- -----+----------- -----+-------------
Id | CountryName Id | CId | Name Id | POSId | Name
-----+------------- -----+----------- -----+-------------
1 | USA 1 | 1 | NY 1 | 1 | NYC
How do you get something like
---------------------------------------------
CountryName | ProvinceOrState | MajorCities
| (Count) | (Count)
---------------------------------------------
USA | 50 | 200
---------------------------------------------
Canada | 10 | 57
So far, the way I see it:
Run the first SELECT COUNT (GROUP BY Countries.Id) on Countries JOIN ProvOrStates,
store the result in a table variable,
Run the second SELECT COUNT (GROUP BY Countries.Id) on ProvOrStates JOIN MajorCities,
Update the table variable based on the Countries.Id
Join the table variable with Countries table ON Countries.Id = Id of the table variable.
Is there a possibility to run just one query instead of multiple intermediary queries? I don't know if it's even feasible as I've tried with no luck.
Thanks for helping

Use sub query or derived tables and views
Basically If You You Have 3 Tables
select * from [TableOne] as T1
join
(
select T2.Column, T3.Column
from [TableTwo] as T2
join [TableThree] as T3
on T2.CondtionColumn = T3.CondtionColumn
) AS DerivedTable
on T1.DepName = DerivedTable.DepName
And when you are 100% percent sure it's working you can create a view that contains your three tables join and call it when ever you want
PS: in case of any identical column names or when you get this message
"The column 'ColumnName' was specified multiple times for 'Table'. "
You can use alias to solve this problem

This answer comes from #lotzInSpace.
SELECT ct.[CountryName], COUNT(DISTINCT p.[Id]), COUNT(DISTINCT c.[Id])
FROM dbo.[Countries] ct
LEFT JOIN dbo.[Provinces] p
ON ct.[Id] = p.[CountryId]
LEFT JOIN dbo.[Cities] c
ON p.[Id] = c.[ProvinceId]
GROUP BY ct.[CountryName]
It's working. I'm using LEFT JOIN instead of INNER JOIN because, if a country doesn't have provinces, or a province doesn't have cities, then that country or province doesn't display.
Thanks again #lotzInSpace.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Postgresql: How to remove duplicate rows while joining? - postgresql

Related

PostgreSQL How to merge two tables row to row without condition

Jsonb_object_keys() does not return any rows in left join if the right side table does not have any matching records

Sum with different condition for every line

PostgreSQL Group By not working as expected - wants too many inclusions

How to use COUNT() in more that one column?

Categories

Resources