Inserting data from one table based on some Id in postgres - postgresql

I have two tables with almost 13,000 records each that look something like this:
TableA:
ID | Status   | Option
------------------------
1  | Approved |
2  | Reject   |
3  | Approved |
...
13,000
TableB:
Name   | Option                           | Status
----------------------------------------------------
First  | {'data':'Add into box','ID':'1'} | Approved
Second | {'data':'Don't Add','ID':'2'}    | Reject
Third  | {'data':'Add into box','ID':'3'} | Approved
...
13,000
I want to fill the Option column (data type varchar) in Table A with the matching data from Table B's Option column (a varchar holding JSON), based on the same ID, which is also stored inside that JSON object. How do I fill them all in one go rather than going one by one?

Use an UPDATE query that sets "option" in TableA from a subquery, filtering the result on TableA's "id" matching the ID stored inside the varchar column "option" of TableB.
update tablea
set option = (select option
              from tableb
              -- JSON keys are case sensitive: the sample data uses 'ID'
              where tablea.id::text = tableb.option::json ->> 'ID'
              limit 1);
-- assuming id has a 1:1 relation in both tables
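The same update can also be written with an explicit join using UPDATE ... FROM, which avoids the correlated subquery; a minimal sketch under the same 1:1 assumption:

update tablea a
set option = b.option
from tableb b
where a.id::text = b.option::json ->> 'ID';

If several TableB rows carried the same ID, this form would pick one of them arbitrarily, so the 1:1 assumption still matters.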

Related

Delete Duplicate Data on PostgreSQL

How do I delete duplicate data in a table which has data like this?
I want to keep only the row with the latest created_at for each attribute id.
Like this:
attribute id | created at | product_id
1 | 2020-04-28 15:31:11 | 112235
4 | 2020-04-28 15:30:25 | 112235
1 | 2020-04-29 15:30:25 | 112236
4 | 2020-04-29 15:30:25 | 112236
You can use an EXISTS condition.
delete from the_table t1
where exists (select *
              from the_table t2
              where t2.created_at > t1.created_at
                and t2.attribute_id = t1.attribute_id);
This will delete all rows for which another row with the same attribute_id and a larger created_at value exists, thus keeping only the row with the highest created_at for each attribute_id. Note that if two rows share the same highest created_at value, both are kept for that attribute_id.
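If the goal is instead the latest row per attribute_id and product_id (the sample data repeats attribute ids across products), the same pattern works with the extra column added to the correlation; a sketch, assuming that column name:

delete from the_table t1
where exists (select *
              from the_table t2
              where t2.created_at > t1.created_at
                and t2.attribute_id = t1.attribute_id
                and t2.product_id = t1.product_id);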

remove duplicate records in postgres where all records are duplicate

My Postgres table has exact duplicate records, and I need to write a query to delete them.
id | model | model_id | dependent_on_model
-----+-------+----------+--------------------
1 | Card | 72 | Metric
1 | Card | 72 | Metric
2 | Card | 79 | Metric
2 | Card | 79 | Metric
3 | Card | 83 | Metric
3 | Card | 83 | Metric
5 | Card | 86 | Metric
Using a CTE is not helping, as I am getting the error
relation "cte" does not exist.
Please suggest a query which deletes the duplicate rows so that I have just 4 distinct records at the end.
My suggestion is to duplicate the table in a TEMPORARY TABLE WITH OIDS. This way you have some other id to distinguish the two identical rows.
Idea:
1. Duplicate the data, with an extra OID per row, into a temporary table.
2. Remove the duplicates in the temporary table.
3. Delete the rows in the actual table.
4. Copy the data back into the actual table from the temporary table.
5. Drop the TEMPORARY TABLE.
You'll have to perform some destructive action on your actual table so make sure your TEMPORARY TABLE has what you want remaining before deleting anything from your actual table.
This is how you would create the TEMPORARY TABLE:
CREATE TEMPORARY TABLE dups_with_oids
( id integer
, model text
, model_id integer
, dependent_on_model text
) WITH OIDS;
Here is the DELETE query:
WITH temp AS
(
    SELECT d.id   AS keep
         , d.oid  AS keep_oid
         , d2.id  AS del
         , d2.oid AS del_oid
    FROM dups_with_oids d
    JOIN dups_with_oids d2 ON (d.id = d2.id AND d.oid < d2.oid)
)
DELETE FROM dups_with_oids d
WHERE d.oid IN (SELECT temp.del_oid FROM temp);
SQLFiddle to prove the theory.
I should add that if id were a PRIMARY KEY or UNIQUE these duplicates wouldn't have been possible.
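Note that WITH OIDS is no longer supported on PostgreSQL 12 and later. On those versions the hidden system column ctid can play the same role, and the duplicates can be removed in place without a temporary table; a sketch, with the real table name (your_table here) assumed:

DELETE FROM your_table a
USING your_table b
WHERE a.ctid < b.ctid          -- keeps one physical row per group of identical rows
  AND a.id = b.id
  AND a.model = b.model
  AND a.model_id = b.model_id
  AND a.dependent_on_model = b.dependent_on_model;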

How to find duplicate rows in JPA

Is there a way to find duplicate entries in a data set using JPA?
| id | text |
-------------
| 1 | foo |
| 2 | bar |
| 3 | foo |
I want to have only entries 1 & 3 in my set.
I can't make it unique on this field.
DISTINCT would give me rows 1 & 2.
If it's done in a query, maybe a join with the same table? I'm not sure how that would work. I couldn't get GROUP BY to work either.
Edit:
I believe you can use the following syntax without an inner query:
SELECT MIN(id) AS id, text, COUNT(*) FROM entity GROUP BY text HAVING COUNT(*) > 1
You can apply common practice from SQL to JPQL with the following query:
SELECT e FROM Entity e WHERE e.text IN (SELECT d.text FROM Entity d GROUP BY d.text HAVING COUNT(d) > 1)
A sub-query is required, so you'd need an index on the text column for it to be efficient.
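For reference, the plain-SQL form of that subquery approach (table and column names assumed to match the example) returns exactly rows 1 and 3:

SELECT e.id, e.text
FROM entity e
WHERE e.text IN (SELECT text
                 FROM entity
                 GROUP BY text
                 HAVING COUNT(*) > 1);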

Need an efficient select query

I would like to know an efficient way to fetch the data in the following case.
There are two tables, say Table1 and Table2, having two common fields, say country and pincode, and a third table, Table3, holding the key fields of the first two tables (DNO, MPNO).
Here is the little glitch: in Table3's data, a row that has a DNO won't have an MPNO.
So when the user enters anything on the selection screen (Pic no 2), the result should be as follows:
MFID  | DNO     | MPNO    | COUNTRY | PINCODE
---------------------------------------------
00001 | 10011   | novalue | IN      | 4444
00002 | novalue | 1200    | IN      | 5555
00003 | 300     | novalue | US      | 9999
(As you can observe, if DNO is present there is no MPNO, and vice versa.)
Please have a look at the pictures for a clearer picture :-)
Table Relation:
Selection screen with select options:
The code shouldn't be long.
PSEUDO CODE:
Select queries:
SELECT * FROM table3 INTO TABLE it_table3.

SELECT * FROM table1 INTO TABLE it_table1
  FOR ALL ENTRIES IN it_table3
  WHERE dno = it_table3-dno.

SELECT * FROM table2 INTO TABLE it_table2
  FOR ALL ENTRIES IN it_table3
  WHERE mpno = it_table3-mpno.

Loop at internal table 3 and build the final table:
LOOP AT it_table3 INTO wa_table3.
  IF wa_table3-dno IS NOT INITIAL.
    READ TABLE it_table1 INTO wa_table1 WITH KEY dno = wa_table3-dno.
  ELSE.
    READ TABLE it_table2 INTO wa_table2 WITH KEY mpno = wa_table3-mpno.
  ENDIF.
  " append the combined row to the final result table here
ENDLOOP.
Hope this was the answer you were hoping to find!
Building an efficient select requires information about the obligatory fields on your selection screen, as well as about the expected production size of all 3 tables. Without that information, let's assume that table1 and table2 are reference tables and table3 is a transaction table, as one can infer from their structure. It would be sensible to build the selection in the following way:
Select the data from the reference tables. As you said, the fields DNO/MPNO are mutually exclusive, so no country/pincode pair will be hit in both reference tables and a JOIN is useless here. However, we can merge the 2 result sets into a single itab without violating any constraints.
TYPES: BEGIN OF tt_result,
         dno     TYPE table1-dno,
         mpno    TYPE table2-mpno,
         country TYPE table1-country,
         pincode TYPE table1-pincode,
         " ...other fields from table3
       END OF tt_result.

DATA: itab_result TYPE STANDARD TABLE OF tt_result.
SELECT dno country pincode
  FROM table1
  INTO CORRESPONDING FIELDS OF TABLE itab_result
  WHERE pincode IN so_pincode
    AND country IN so_country.

SELECT mpno country pincode
  FROM table2
  APPENDING CORRESPONDING FIELDS OF TABLE itab_result
  WHERE pincode IN so_pincode
    AND country IN so_country.
The FOR ALL ENTRIES addition allows specifying the same internal table in the FOR ALL ENTRIES clause and as the target of the SELECT, so we can fill our result table with the missing table3 data by the DNO/MPNO key.
SELECT *
  FROM table3
  INTO CORRESPONDING FIELDS OF TABLE itab_result
  FOR ALL ENTRIES IN itab_result
  WHERE dno  = itab_result-dno
    AND mpno = itab_result-mpno.
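For comparison only, the final result set expressed in plain SQL (table and column names assumed from the screenshots) is simply the union of the two lookups:

SELECT t3.mfid, t3.dno, t3.mpno, t1.country, t1.pincode
FROM table3 t3
JOIN table1 t1 ON t1.dno = t3.dno
UNION ALL
SELECT t3.mfid, t3.dno, t3.mpno, t2.country, t2.pincode
FROM table3 t3
JOIN table2 t2 ON t2.mpno = t3.mpno;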

Why do I have to provide an items.id column to the group by clause?

I want to return unique items based on condition, sorted by price asc. My query fails because Postgres wants items.id to be present in the group by clause. If it's included the query returns everything matching the where clause, which is not what I want. Why do I need to include the column?
select items.*
from items
where product_id = 1 and items.status = 'in_stock'
group by condition /* , items.id returns everything */
order by items.price asc
| id | condition | price |
--------------------------
| 1 | new | 9 |
| 2 | good | 5 |
| 3 | good | 3 |
I only want items with ids 1 and 3.
Update: Here's a fiddle using the answer below, which still produces the error:
http://sqlfiddle.com/#!1/33786/2
The problem is that PostgreSQL has no way of knowing which items records you want to take values from; that is, it can't tell that you want this:
| id | condition | price |
--------------------------
| 1 | new | 9 |
| 3 | good | 3 |
and not this:
| id | condition | price |
--------------------------
| 1 | new | 9 |
| 2 | good | 5 |
To fix this, you need to use some sort of aggregation function, such as MAX:
SELECT MAX(id) AS id,
       condition,
       MAX(price) AS price
FROM items
WHERE product_id = 1
  AND status = 'in_stock'
GROUP BY condition
ORDER BY price ASC
which gives:
| id | condition | price |
--------------------------
| 1 | new | 9 |
| 3 | good | 5 |
(This restriction is part of the SQL standard, and most DBMSes enforce it. One exception is MySQL, which allows your query, but with the caveat that "The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate" [link].)
SQL Fiddle
select *
from (
    select distinct on (cond)
        id, cond, price
    from items
    where product_id = 1 and items.status = 'in_stock'
    order by cond, price
) s
order by price
The SQL standard requires this behaviour, though some databases like MySQL ignore it and instead return unpredictable results.
If there's more than one row for "cond = good" and you ask for the "id" of the row where "cond = good", which row should the database give you? The row with id = 3, or id = 2? How should it know which to pick? MySQL picks an arbitrary row if there are multiple candidates, but this isn't allowed by the standard.
In your case you seem to want to pick the lowest-price row for each condition.
PostgreSQL provides an extension, DISTINCT ON ..., to help with this. Clodoaldo has demonstrated this in his answer, so I won't repeat that here. Using DISTINCT ON will be much more efficient than the example below.
The SQL-standard way would be to use a window to rank the results, then filter on the ranked data. Unfortunately this is pretty inefficient as it requires all rows that match the inner where clause to be collected and sorted.
SELECT *
FROM (
    SELECT *, dense_rank() OVER w AS itemrank
    FROM items
    WHERE product_id = 1 AND items.status = 'in_stock'
    WINDOW w AS (PARTITION BY cond ORDER BY price ASC)
) ranked_items
WHERE itemrank = 1;
(http://sqlfiddle.com/#!1/33786/19)
Another SQL-standard way is to use an aggregation subquery to find the min prices for each category then display all rows with the min price:
SELECT *
FROM items
INNER JOIN (
    SELECT cond, min(price) AS minprice
    FROM items
    WHERE product_id = 1 AND items.status = 'in_stock'
    GROUP BY cond
) minprices(cond, price)
    ON (items.price = minprices.price AND items.cond = minprices.cond)
ORDER BY items.price;
Unlike the DISTINCT ON version, though, this will display multiple entries if the lowest priced item has more than one entry with the same cond and price.
So.. you should really use the DISTINCT ON approach, but you need to understand it. Start with the PostgreSQL documentation here.
On a side note, newer PostgreSQL versions (9.1 and later) allow you to refer to any column of a table whose primary key you've listed in GROUP BY; they recognize the functional dependency of the other columns on the primary key. So you don't have to aggregate the other columns if you've mentioned the PK in newer versions. That's what the standard requires, but older versions weren't smart enough to figure it out and required all columns to be listed explicitly.
That's what people who ask this question usually want to know, but doesn't apply strictly to your question since it turns out you're trying to use GROUP BY to filter rows.
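For completeness, a minimal illustration of that functional-dependency rule, assuming items.id is the primary key (PostgreSQL 9.1 and later):

SELECT id, condition, price
FROM items
WHERE product_id = 1 AND status = 'in_stock'
GROUP BY id        -- allowed: condition and price are functionally dependent on the PK
ORDER BY price;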