SQL to remove rows with duplicated value while keeping one - tsql

Say I have this table
id | data | value
-----------------
1 | a | A
2 | a | A
3 | a | A
4 | a | B
5 | b | C
6 | c | A
7 | c | C
8 | c | C
I want to remove those rows with duplicated value for each data while keeping the one with the min id, e.g. the result will be
id | data | value
-----------------
1 | a | A
4 | a | B
5 | b | C
6 | c | A
7 | c | C
I know a way to do it is to do a union like:
SELECT 1 [id], 'a' [data], 'A' [value] INTO #test UNION SELECT 2, 'a', 'A'
UNION SELECT 3, 'a', 'A' UNION SELECT 4, 'a', 'B'
UNION SELECT 5, 'b', 'C' UNION SELECT 6, 'c', 'A'
UNION SELECT 7, 'c', 'C' UNION SELECT 8, 'c', 'C'
SELECT * FROM #test WHERE id NOT IN (
SELECT MIN(id) FROM #test
GROUP BY [data], [value]
HAVING COUNT(1) > 1
UNION
SELECT MIN(id) FROM #test
GROUP BY [data], [value]
HAVING COUNT(1) <= 1
)
but this solution has to repeat the same group by twice (consider the real case is a massive group by with > 20 columns)
I would prefer a simpler answer with less code as oppose to complex ones. Is there any more concise way to code this?
Thank you

You can use one of the methods below:
Using WITH CTE:
WITH CTE AS
(SELECT *,RN=ROW_NUMBER() OVER(PARTITION BY data,value ORDER BY id)
FROM TableName)
DELETE FROM CTE WHERE RN>1
Explanation:
This query will select the contents of the table along with a row number RN. And then delete the records with RN >1 (which would be the duplicates).
This Fiddle shows the records which are going to be deleted using this method.
Using NOT IN:
DELETE FROM TableName
WHERE id NOT IN
(SELECT MIN(id) as id
FROM TableName
GROUP BY data,value)
Explanation:
With the given example, inner query will return ids (1,6,4,5,7). The outer query will delete records from table whose id NOT IN (1,6,4,5,7).
This fiddle shows the records which are going to be deleted using this method.
Suggestion: Use the first method since it is faster than the latter. Also, it manages to keep only one record if id field is also duplicated for the same data and value.

I want to add MYSQL solution for this query
Suggestion 1 : MySQL prior to version 8.0 doesn't support the WITH clause
Suggestion 2 : throw this error (you can't specify table TableName for update in FROM clause
So the solution will be
DELETE FROM TableName WHERE id NOT IN
(SELECT MIN(id) as id
FROM (select * from TableName) as t1
GROUP BY data,value) as t2;

Related

convert array of aclitem into multiple rows redshift

I have one array with column values as
{james=UC/james,adam=C/james,chris=UC/james,john=U/james}
The above column values are not json. They are in string in the following form:
{ username=privilegestring/grantor }
How to convert above column into multiple rows
Edit #3:
Updated the query to specifically target pg_catalog.pg_namespace for acl permissions grants, via CTE pg_catalog. Currently this CTE is filtered in the where clause to select a single namespace name ('avengers'); if you want to select from multiple namespace names, you should be able to add them into the WHERE clause of this CTE directly, or in the case of wanting all namespace names, remove the clause altogether.
It's worth noting as well, that you will need to expand the case statements in access_privilege_types to handle all permissions cases: 'r', 'w', 'a', 'd', and 'x', for the operations: SELECT, UPDATE, INSERT, DELETE, REFERENCE, respectively.
Edit #2:
The final posted version of the query below should get you the data you want in the format that you want it in. I don't know how many possible values there are for the permissions types; if you have more than the two specified currently, you will need to expand the case statements in the CTE* access_privilege_types*. Obviously you'll also need to replace your table name within the query, etc.. Let me know if you run into any trouble and I'll help as necessary.
Edit #1:
Was able to validate that this query works in Redshift. Updated the query to break out separate rows by grantee and owner. The current version doesn't break out individual permissions by row yet -- Will take a look later tonight to see if I can get that working as well.
Original:
I don't have access to my Redshift cluster to test this at the moment, but I will when I get home. The general idea behind the following method, is to create a numbered index table to cross join against that will expand the data in the permissions field into a row-based representation.
I had inquired about the size limit, because this will currently only handle 10,000 possible delimited values, however you can adjust the CTEs to scale up to larger amounts if needed for your specific application:
Revision 3 Query:
WITH
pg_namespace AS (
SELECT
nspname
, nspowner
, rtrim(ltrim(array_to_string(nspacl, ','), '{'), '}') as nspacl
FROM pg_catalog.pg_namespace
WHERE nspname = 'public'
),
-- Generating a table with the numbers 1 - 10 in a single column.
ten_numbers AS (
SELECT
1 AS num
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
UNION SELECT 6
UNION SELECT 7
UNION SELECT 8
UNION SELECT 9
UNION SELECT 0
),
-- Expands the values in ten_numbers to create a single column with the values 1 - 10,000.
depivot_index AS (
SELECT
(1000 * t1.num) + (100 * t2.num) + (10 * t3.num) + t4.num AS gen_num
FROM ten_numbers AS t1
JOIN ten_numbers AS t2 ON 1 = 1
JOIN ten_numbers AS t3 ON 1 = 1
JOIN ten_numbers AS t4 ON 1 = 1
),
-- Filters down generated_numbers to house only the numbers up to the maximum times that the delimiter appears.
splitter AS (
SELECT
*
FROM depivot_index
WHERE gen_num BETWEEN 1 AND (
SELECT max(REGEXP_COUNT(nspacl, '\\,') + 1)
FROM pg_namespace
)
),
-- Cross joins permissions_groups and splitter to populate all requests, delimited on ','.
expanded_input AS (
SELECT
pg.nspname
, pg.nspacl
, trim(split_part(pg.nspacl, ',', s.gen_num)) AS raw_permissions_string
FROM pg_namespace AS pg
JOIN splitter AS s ON 1 = 1
WHERE split_part(nspacl, ',', s.gen_num) <> ''
),
-- Breaks out the owner and grantee fields into their own columns respectively.
users_with_raw_permissions_data AS (
SELECT
e.raw_permissions_string
, e.nspname
, trim(split_part(e.raw_permissions_string, '=', 1)) AS grantee
, trim(split_part(trim(split_part(e.raw_permissions_string, '=', 2)), '/', 2)) AS owner
, trim(split_part(trim(split_part(e.raw_permissions_string, '=', 2)), '/', 1)) AS raw_permissions_data
FROM
expanded_input e
),
-- Mines privilege types from raw string data.
access_privilege_types AS (
SELECT
u.nspname
, u.owner
, u.grantee
,CASE
WHEN position('C*' IN u.raw_permissions_data) > 0 THEN 'C*'
WHEN position('U*' IN u.raw_permissions_data) > 0 THEN 'U*'
WHEN position('C' IN u.raw_permissions_data) > 0 THEN 'C'
WHEN position('U' IN u.raw_permissions_data) > 0 THEN 'U'
ELSE u.raw_permissions_data
END AS first_access_privilege
, CASE
WHEN position('U*' IN u.raw_permissions_data) > 0 THEN 'U*'
WHEN position('C*' IN u.raw_permissions_data) > 0 THEN 'C*'
WHEN position('U' IN u.raw_permissions_data) > 0 THEN 'U'
WHEN position('C' IN u.raw_permissions_data) > 0 THEN 'C'
ELSE u.raw_permissions_data
END AS second_access_privilege
, first_access_privilege || ',' || second_access_privilege AS merged_access_privileges
FROM users_with_raw_permissions_data u
),
-- Cross joins access_privilge_types and splitter to populate all privilege_types, delimited on ','.
expanded_access_privilege_types AS (
SELECT
a.nspname
, a.owner
, a.grantee
, trim(split_part(a.merged_access_privileges, ',', s.gen_num)) AS access_privileges
FROM access_privilege_types AS a
JOIN splitter AS s ON 1 = 1
WHERE split_part(a.merged_access_privileges, ',', s.gen_num) <> ''
GROUP BY 1, 2, 3, 4
)
SELECT
ea.nspname
, ea.owner
, ea.grantee
, LEFT(ea.access_privileges, 1) AS access_privilege
, CASE
WHEN POSITION('*' IN ea.access_privileges) > 0 THEN 'YES'
ELSE 'NO'
END AS is_grantable
FROM expanded_access_privilege_types ea
ORDER BY 1, 2, 3, 4, 5
Edit #4:
Adding some clarification on how the ten_numbers, depivot_index, and splitter tables work to break apart the pg_catalog.pg_namespace.nspacl field. The general overview, is that ten_numbers and depivot_index are created purely to return tables with numbered rows to use as an index when joining in thesplit_partvalues ofnspacl`.
ten_numbers generates a table with a single column, containing the numbers 0-9:
-------
| num |
-------
| 0 |
-------
| 1 |
-------
| etc |
-------
| 9 |
-------
This table is then expanded to house the range 0-9999 during the CTE depivot_index:
-----------
| gen_num |
-----------
| 0 |
-----------
| 1 |
-----------
| 2 |
-----------
| etc |
-----------
| 9998 |
-----------
| 9999 |
-----------
splitter then narrows down the table to house only the numbers up to the maximum count of the specified delimiter within the nspacl field:
-------
| num |
-------
| 0 |
-------
| 1 |
-------
| etc |
-------
| 6 |
-------
The table returned by splitter is then used as the target of a CROSS JOIN via the join on 1 = 1 in CTE expanded_input. This ensures that each member returned by split_part will have its own row:
---------------------------------------------------------------------------
| nspname | nspacl | raw_permissions_string |
---------------------------------------------------------------------------
| avengers | "{james=UC/james,adam=C/james}" | "james=UC/james" |
---------------------------------------------------------------------------
| avengers | "{james=UC/james,adam=C/james}" | "adam=C/james" |
---------------------------------------------------------------------------
| avengers | etc. | etc. |
---------------------------------------------------------------------------

how can I get all ids starting from a given id recursively in a postgresql table that references itself?

the title may not be very clear so let's consider this example (this is not my code, just taking this example to model my request)
I have a table that references itself (like a filesystem)
id | parent | name
----+----------+-------
1 | null | /
2 | 1 | home
3 | 2 | user
4 | 3 | bin
5 | 1 | usr
6 | 5 | local
Is it possible to make a sql request so if I choose :
1 I will get a table containing 2,3,4,5,6 (because this is the root) so matching :
/home
/home/user
/home/user/bin
/usr
etc...
2 I will get a table containing 3,4 so matching :
/home/user
/home/user/bin
and so on
Use recursive common table expression. Always starting from the root, use an array of ids to get paths for a given id in the WHERE clause.
For id = 1:
with recursive cte(id, parent, name, ids) as (
select id, parent, name, array[id]
from my_table
where parent is null
union all
select t.id, t.parent, concat(c.name, t.name, '/'), ids || t.id
from cte c
join my_table t on c.id = t.parent
)
select id, name
from cte
where 1 = any(ids) and id <> 1
id | name
----+-----------------------
2 | /home/
5 | /usr/
6 | /usr/local/
3 | /home/user/
4 | /home/user/bin/
(5 rows)
For id = 2:
with recursive cte(id, parent, name, ids) as (
select id, parent, name, array[id]
from my_table
where parent is null
union all
select t.id, t.parent, concat(c.name, t.name, '/'), ids || t.id
from cte c
join my_table t on c.id = t.parent
)
select id, name
from cte
where 2 = any(ids) and id <> 2
id | name
----+-----------------------
3 | /home/user/
4 | /home/user/bin/
(2 rows)
Bidirectional query
The question is really interesting. The above query works well but is inefficient as it parses all tree nodes even when we're asking for a leaf. The more powerful solution is a bidirectional recursive query. The inner query walks from a given node to top, while the outer one goes from the node to bottom.
with recursive outer_query(id, parent, name) as (
with recursive inner_query(qid, id, parent, name) as (
select id, id, parent, name
from my_table
where id = 2 -- parameter
union all
select qid, t.id, t.parent, concat(t.name, '/', q.name)
from inner_query q
join my_table t on q.parent = t.id
)
select qid, null::int, right(name, -1)
from inner_query
where parent is null
union all
select t.id, t.parent, concat(q.name, '/', t.name)
from outer_query q
join my_table t on q.id = t.parent
)
select id, name
from outer_query
where id <> 2; -- parameter

Select query for selecting columns from those records from the inner query . where inner query and outer query have different columns

I have a group by query which fetches me some records. What if I wish to find other column details representing those records.
Suppose I have a query as follows .Select id,max(date) from records group by id;
to fetch the most recent entry in the table.
I wish to fetch another column representing those records .
I want to do something like this (This incorrect query is just for example) :
Select type from (Select id,max(date) from records group by id) but here type doesnt exist in the inner query.
I am not able to define the question in a simpler manner.I Apologise for that.
Any help is appreciated.
EDIT :
Column | Type | Modifiers
--------+-----------------------+-----------
id | integer |
rdate | date |
type | character varying(20) |
Sample Data :
id | rdate | type
----+------------+------
1 | 2013-11-03 | E1
1 | 2013-12-12 | E1
2 | 2013-12-12 | A3
3 | 2014-01-11 | B2
1 | 2014-01-15 | A1
4 | 2013-12-23 | C1
5 | 2014-01-05 | C
7 | 2013-12-20 | D
8 | 2013-12-20 | D
9 | 2013-12-23 | A1
While I was trying something like this (I'm no good at sql) : select type from records as r1 inner join (Select id,max(rdate) from records group by id) r2 on r1.rdate = r2.rdate ;
or
select type from records as r1 ,(Select id,max(rdate) from records group by id) r2 inner join r1 on r1.rdate = r2.rdate ;
You can easily do this with a window function:
SELECT id, rdate, type
FROM (
SELECT id, rdate, type, rank() OVER (PARTITION BY id ORDER BY rdate DESC) rnk
FROM records
WHERE rnk = 1
) foo
ORDER BY id;
The window definition OVER (PARTITION BY id ORDER BY rdate DESC) takes all records with the same id value, then sorts then from most recent to least recent rdate and assigns a rank to each row. The rank of 1 is the most recent, so equivalent to max(rdate).
If I've understood the question right, then this should work (or at least get you something you can work with):
SELECT
b.id, b.maxdate, a.type
FROM
records a -- this is the records table, where you'll get the type
INNER JOIN -- now join it to the group by query
(select id, max(rdate) as maxdate FROM records GROUP BY id) b
ON -- join on both rdate and id, otherwise you'll get lots of duplicates
b.id = a.id
AND b.maxdate = a.rdate
Note that if you have records with different types for the same id and rdate combination you'll get duplicates.

SQL query for insert into with a set of constants

It seems like there should be a query for this, but I can't think of how to do it.
I've got a table with a composite primary key consisting of two fields I'd like to populate with data,
I can do an insert into from one table to fill up half the keys, but I want to fill up the other half with a set of constants (0, 3, 5, 6, 9) etc...
so the end result would look like this
+--------------+
|AwesomeTable |
+--------------+
| Id1 | Id2 |
| 1 | 0 |
| 1 | 3 |
| 1 | 5 |
| 1 | 6 |
| 1 | 9 |
| 2 | 0 |
| 2 | 3 |
| ... | ... |
+--------------+
I've got as far as insert into awesometable (id1, id2) select id1, [need something here] from table1 [need something else here]
I've got a table with 2 primary keys
No, you don't. A table can only have one primary key. You probably mean a composite primary key.
I believe you want this:
INSERT
INTO awesometable (id1, id2)
SELECT t1.id1, q.id2
FROM table1 t1
CROSS JOIN
(
SELECT 0 AS id2
UNION ALL
SELECT 3
UNION ALL
SELECT 5
UNION ALL
SELECT 6
UNION ALL
SELECT 9
) q
, or in Oracle:
INSERT
INTO awesometable (id1, id2)
SELECT t1.id1, q.id2
FROM table1 t1
CROSS JOIN
(
SELECT 0 AS id2
FROM dual
UNION ALL
SELECT 3
FROM dual
UNION ALL
SELECT 5
FROM dual
UNION ALL
SELECT 6
FROM dual
UNION ALL
SELECT 9
FROM dual
) q
If I understand correctly, maybe you can use something like this:
insert into awesometable (id1, id2)
select id1, (select top 1 id2 from table2 where /*a condition here to retreive only one result*/)
from table1

mySQL query: How to insert with UNION?

I am kind of new to mySQL:s union functions, at least when doing inserts with them. I have gotten the following to work based upon a example found on the net:
INSERT INTO tableOne(a, b)
SELECT a, $var FROM tableOne
WHERE b = $var2
UNION ALL SELECT $var,$var
Ok, nothing strange about that. But what happens when I want to insert a third value into the database that has nothing to do with the logic of the Select being done?
Like : INSERT INTO tableOne(a, b, c )
How could that be done?
You can "select" literal values too:
mysql> select 'hello', 1;
+-------+---+
| hello | 1 |
+-------+---+
| hello | 1 |
+-------+---+
1 row in set (0.00 sec)
Hence, you can also use that in INSERT INTO ... SELECT FROM and UNIONs.
INSERT INTO someTable (a, b, c) VALUES
SELECT id, name, 5
FROM someOtherTable
UNION
SELECT id, alias, 8
FROM anotherTable