How to remove bulk rows of duplicated chars from string in Postgres or SQLAlchemy? - postgresql

I have a table with a column named "ids" , type of String. Could someone tell me how to remove the duplicated values in each of the rows?
Example, table is:
--------------------------------------------------
primary_key | ids
--------------------------------------------------
1 | {23,40,23}
--------------------------------------------------
2 | {78,40,13,78}
--------------------------------------------------
3 | {20,13,20}
--------------------------------------------------
4 | {7,2,7}
--------------------------------------------------
and I want to update it into:
--------------------------------------------------
primary_key | ids
--------------------------------------------------
1 | {23,40}
--------------------------------------------------
2 | {78,40,13}
--------------------------------------------------
3 | {20,13}
--------------------------------------------------
4 | {7,2}
--------------------------------------------------
In postgres I wrote:
UPDATE table_name
SET ids = (SELECT DISTINCT UNNEST(
(SELECT ids FROM table_name)::text[]))
In sqlalchemy I wrote:
session.query(table_name.ids).\
update({table_name.ids: func.unnest(table_name.ids,String).alias('data_view')},
synchronize_session=False)
None of these are working, so please help me, thanks in advance!

I think you could improve the design by storing these ids in another table one id per row with a foreign key referencing table_name.primary_key.
Also storing Array data as text strings seems strange.
Anyway, here is one way to do it: I wrapped the set returned by UNNEST with an inner subselect to be able to apply the aggregate_function needed to concatenate the strings again.
UPDATE table_name
SET ids = new_ids
FROM LATERAL (
SELECT primary_key, array_agg(elem)::text AS new_ids
FROM (SELECT DISTINCT primary_key, UNNEST(ids::text[]) as elem
FROM table_name ) t_inner
GROUP by primary_key )t_sub
WHERE t_sub.primary_key = table_name.primary_key

Related

postgres: temporary column default that is unique and not nullable, without relying on a sequence?

Hi, I want to add a unique, non-nullable column to a table.
It
already has data. I would therefore like to instantly populate the
new column with unique values, eg 'ABC123', 'ABC124', 'ABC125', etc.
The data will eventually be wiped and
replaced with proper data, so i don't want to introduce a sequence
just to populate the default value.
Is it possible to generate a default value for the existing rows, based on something like rownumber()? I realise the use case is ridiculous but is it possible to achieve... if so how?
...
foo text not null unique default 'ABC'||rownumber()' -- or something similar?
...
can be applied generate_series?
select 'ABC' || generate_series(123,130)::text;
ABC123
ABC124
ABC125
ABC126
ABC127
ABC128
ABC129
ABC130
Variant 2 add column UNIQUE and not null
begin;
alter table test_table add column foo text not null default 'ABC';
with s as (select id,(row_number() over(order by id))::text t from test_table) update test_table set foo=foo || s.t from s where test_table.id=s.id;
alter table test_table add CONSTRAINT unique_foo1 UNIQUE(foo);
commit;
results
select * from test_table;
id | foo
----+------
1 | ABC1
2 | ABC2
3 | ABC3
4 | ABC4
5 | ABC5
6 | ABC6

Need an efficient select query

I would like to know an efficient to way to fetch the data in the following case.
There are two tables say Table1 and Table2 having two common field say contry and pincode and other table "Table3" having key fields of first two tables (DNO, MPNO).
Here is the little glitch, In table3 data, if it is having DNO it wont have MPNO
So when in the selection screen(Pic no2) if the use enter any thing, result should be as follows
**MFID | DNO | MPNO | COUNTRY | PINCODE**
----------
00001 | 10011 | novalue | IN | 4444
00002 | Novalue | 1200 | IN | 5555
00003 | 300 | novalue | US | 9999
( as you can observe if DNO present no MPNO , vice versa )
Please have a look at the pictures for a clear picture :-)
Table Relation:
Selection screen with select options:
The code shouldn't be long.
PSEUDO CODE:
Select queries:
Select * from table3 into it_table3.
Select * from table1 FOR ALL ENTRIES IN it_table3 INTO it_table1
WHERE dno = table3-dno.
Select * from table2 FOR ALL ENTRIES IN it_table3 INTO it_table2
WHERE mpno = table3-mpno.
Loop at internal table 3 and build final table.
LOOP at it_table3 into wa_table3.
IF wa_table3-dno IS NOT INITIAL.
READ it_table1 where dno = wa_table3-dno.
ELSE.
READ it_table2 where mpno = wa_table3-mpno.
ENDIF.
ENDLOOP.
Hope this was the answer you were hoping to find!
Building of efficient select will require information about obligatory fields in your selection screen, as well as about alleged production size of all 3 tables. However, without this information let's assume that table1 and table2 are reference tables and table3 is a transaction table, as onr can assume from their structure. It would be sensible to build selection in a following way:
Selecting data from reference tables. As you said fields DNO/MPNO are mutually exclusive then there will be no hits of country/pincode pair in both reference tables, so JOIN is useless here. However we can merge 2 result sets in single itab without any constraints' violations.
TYPES: BEGIN OF tt_result,
dno TYPE table1-dno,
mpno TYPE table2-mpno,
country TYPE table1-country,
pincode TYPE table1-pincode,
...other field from table3
END OF tt_result.
DATA: itab_result TYPE tt_result.
SELECT dno
FROM table1
INTO CORRESPONDING FIELDS OF TABLE itab_result
WHERE pincode IN so_pincode
AND country IN so_country.
SELECT mpno
FROM table2
APPENDING CORRESPONDING FIELDS OF TABLE itab_result
WHERE pincode IN so_pincode
AND country IN so_country.
FOR ALL ENTRIES addition allows specifying the same table in FOR ALL ENTRIES clause and in INTO clause, so we can fill our result table with absent table3 data by DNO/MPNO key.
SELECT *
FROM table3
INTO CORRESPONDING FIELDS OF TABLE itab_result
FOR ALL ENTRIES IN itab_result
ON itab_result~dno = itab3~dno
AND itab_result_mpno = itab3~mpno.

Adding the results of two select queries into one table row with PostgreSQL

I am attempting to return the result of two distinct select statements into one row in PostgreSQL. For example, I have two queries each that return the same number of rows:
Select tableid1, tableid2, tableid3 from table1
+----------+----------+----------+
| tableid1 | tableid2 | tableid3 |
+----------+----------+----------+
| 1 | 2 | 3 |
| 4 | 5 | 6 |
+----------+----------+----------+
Select table2id1, table2id2, table2id3, table2id4 from table2
+-----------+-----------+-----------+-----------+
| table2id1 | table2id2 | table2id3 | table2id4 |
+-----------+-----------+-----------+-----------+
| 7 | 8 | 9 | 15 |
| 10 | 11 | 12 | 19 |
+-----------+-----------+-----------+-----------+
Now i want to concatenate these tables keeping the same number of rows. I do not want to join on any values. The desired result would look like the following:
+----------+----------+----------+-----------+-----------+-----------+-----------+
| tableid1 | tableid2 | tableid3 | table2id1 | table2id2 | table2id3 | table2id4 |
+----------+----------+----------+-----------+-----------+-----------+-----------+
| 1 | 2 | 3 | 7 | 8 | 9 | 15 |
| 4 | 5 | 6 | 10 | 11 | 12 | 19 |
+----------+----------+----------+-----------+-----------+-----------+-----------+
What can I do to the two above queries (select * from table1) and (select * from table2) to return the desired result above.
Thanks!
You can use row_number() for join, but I'm not sure that you have guaranties that order of the rows will stay the same as in the tables. So it's better to add some order into over() clause.
with cte1 as (
select
tableid1, tableid2, tableid3, row_number() over() as rn
from table1
), cte2 as (
select
table2id1, table2id2, table2id3, table2id4, row_number() over() as rn
from table2
)
select *
from cte1 as c1
inner join cte2 as c2 on c2.rn = c1.rn
You can't have what you want, as you wrote the question. Your two SELECTs don't have any ORDER BY clause, so the database can return the rows in whatever order it feels like. If it currently matches up, it does so only by accident, and will stop matching up as soon as you UPDATE a row.
You need a key column. Then you need to join on the key column. Anything else is attempting to invent unreliable and unsafe joins without actually using a join.
Frankly, this seems like a pretty dodgy schema. Lots of numbered integer columns like this, and the desire to concatenate them, may be a sign you should be looking at using integer arrays, or using a side-table with a foreign key relationship, instead.
Sample data in case anyone else wants to play:
CREATE TABLE table1(tableid1 integer, tableid2 integer, tableid3 integer);
INSERT INTO table1 VALUES (1,2,3), (4,5,6);
CREATE TABLE table2(table2id1 integer, table2id2 integer, table2id3 integer, table2id4 integer);
INSERT INTO table2 VALUES (7,8,9,15), (10,11,12,19);
Depending on what you're actually doing you might really have wanted arrays.
I think you might need to read these two posts:
Join 2 sets based on default order
How keep data don't sort?
which explain that SQL tables just don't have an order. So you cannot fetch them in a particular order.
DO NOT USE THE FOLLOWING CODE, IT IS DANGEROUS AND ONLY INCLUDED AS A PROOF OF CONCEPT:
As it happens you can use a set-returning function hack to very inefficiently do what you want. It's incredibly ugly and *completely unsafe without an ORDER BY in the SELECTs, but I'll include it for completeness. I guess.
CREATE OR REPLACE FUNCTION t1() RETURNS SETOF table1 AS $$ SELECT * FROM table1 $$ LANGUAGE sql;
CREATE OR REPLACE FUNCTION t2() RETURNS SETOF table2 AS $$ SELECT * FROM table2 $$ LANGUAGE sql;
SELECT (t1()).*, (t2()).*;
If you use this in any real code then kittens will cry. It'll produce insane and bizarre results if the number of rows in the tables differ and it'll produce the rows in orderings that might seem right at first, but will randomly start coming out wrong later on.
THE SANE WAY is to add a primary key properly, then do a join.

how to make array_agg() work like group_concat() from mySQL

So I have this table:
create table test (
id integer,
rank integer,
image varchar(30)
);
Then some values:
id | rank | image
---+------+-------
1 | 2 | bbb
1 | 3 | ccc
1 | 1 | aaa
2 | 3 | c
2 | 1 | a
2 | 2 | b
I want to group them by id and concatenate the image name in the order given by rank. In mySQL I can do this:
select id,
group_concat( image order by rank asc separator ',' )
from test
group by id;
And the output would be:
1 aaa,bbb,ccc
2 a,b,c
Is there a way I can have this in postgresql?
If I try to use array_agg() the names will not show in the correct order and apparently I was not able to find a way to sort them. (I was using postgres 8.4 )
In PostgreSQL 8.4 you cannot explicitly order array_agg but you can work around it by ordering the rows passed into to the group/aggregate with a subquery:
SELECT id, array_to_string(array_agg(image), ',')
FROM (SELECT * FROM test ORDER BY id, rank) x
GROUP BY id;
In PostgreSQL 9.0 aggregate expressions can have an ORDER BY clause:
SELECT id, array_to_string(array_agg(image ORDER BY rank), ',')
FROM test
GROUP BY id;

PostgreSQL: Can't use DISTINCT for some data types

I have a table called _sample_table_delme_data_files which contains some duplicates. I want to copy its records, without duplicates, into data_files:
INSERT INTO data_files (SELECT distinct * FROM _sample_table_delme_data_files);
ERROR: could not identify an ordering operator for type box3d
HINT: Use an explicit ordering operator or modify the query.
Problem is, PostgreSQL can not compare (or order) box3d types. How do I supply such an ordering operator so I can get only the distinct into my destination table?
Thanks in advance,
Adam
If you don't add the operator, you could try translating the box3d data to text using its output function, something like:
INSERT INTO data_files (SELECT distinct othercols,box3dout(box3dcol) FROM _sample_table_delme_data_files);
Edit The next step is: cast it back to box3d:
INSERT INTO data_files SELECT othercols, box3din(b) FROM (SELECT distinct othercols,box3dout(box3dcol) AS b FROM _sample_table_delme_data_files);
(I don't have box3d on my system so it's untested.)
The datatype box3d doesn't have an operator for the DISTINCT-operation. You have to create the operator, or ask the PostGIS-project, maybe somebody has already fixed this problem.
Finally, this was solved by a colleague.
Let's see how many dups are there:
SELECT COUNT(*) FROM _sample_table_delme_data_files ;
count
-------
12728
(1 row)
Now, we shall add another column to the source table to help us differentiate similar rows:
ALTER TABLE _sample_table_delme_data_files ADD COLUMN id2 serial;
We can now see the dups:
SELECT id, id2 FROM _sample_table_delme_data_files ORDER BY id LIMIT 10;
id | id2
--------+------
198748 | 6449
198748 | 85
198801 | 166
198801 | 6530
198829 | 87
198829 | 6451
198926 | 88
198926 | 6452
199062 | 6532
199062 | 168
(10 rows)
And remove them:
DELETE FROM _sample_table_delme_data_files
WHERE id2 IN (SELECT max(id2) FROM _sample_table_delme_data_files
GROUP BY id
HAVING COUNT(*)>1);
Let's see it worked:
SELECT id FROM _sample_table_delme_data_files GROUP BY id HAVING COUNT(*)>1;
id
----
(0 rows)
Remove the auxiliary column:
ALTER TABLE _sample_table_delme_data_files DROP COLUMN id2;
ALTER TABLE
Insert the remaining rows into the destination table:
INSERT INTO data_files (SELECT * FROM _sample_table_delme_data_files);
INSERT 0 6364