Identify distinct data by comparing two columns - db2

I have a table as mentioned below:
column1 | column2 | column3
--------|---------|--------
E1 | AA12345 | 12345
E2 | BB12345 | 12345
E3 | CC12345 | 12345
E4 | CC12345 | 12345
E5 | DD12345 | 12345
I need the rows that have the same value in column3 but different values in column2, and there should be more than one such row.
Can you please help?
Expected result: It should pick the rows E1 and E2

How about:
select * from my_table where (column3, column2) in (
select column3, column2 from my_table
group by column3, column2
having count(*) = 1
);
This query will pick E1, E2, and E5. It won't pick E3 or E4, since that (column2, column3) pair occurs more than once. Note that E5 also satisfies the criteria as stated, even though the expected result lists only E1 and E2; if E5 should be excluded, the requirement needs to be narrowed.
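A quick way to check the behaviour is to build the sample table and run the query against it; the table name my_table and the varchar column types below are illustrative assumptions, not taken from the original post.
create table my_table (
    column1 varchar(10),
    column2 varchar(10),
    column3 varchar(10)
);
insert into my_table values ('E1', 'AA12345', '12345');
insert into my_table values ('E2', 'BB12345', '12345');
insert into my_table values ('E3', 'CC12345', '12345');
insert into my_table values ('E4', 'CC12345', '12345');
insert into my_table values ('E5', 'DD12345', '12345');
-- The query above then returns the E1, E2, and E5 rows;
-- E3 and E4 are filtered out because their (column2, column3) pair appears twice.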

PostgreSQL : Change column values based on another column value using some condition in same table

I have a table and want to replace a column value with the value from another row in the same table, based on a condition.
+---------------------+
| Cntry | Code | Value |
+---------------------+
| US | C11 | A |
| US | C12 | B |
| US | C13 | C |
| US | C14 | D |
| US | C15 | E |
| UK | C11 | G |
| UK | C12 | B |
| UK | C13 | C |
| UK | C14 | D |
| UK | C15 | E |
+---------------------+
I want to replace the value of C14 with the value of C11 for each Cntry.
So my output should be like this.
+---------------------+
| Cntry | Code | Value |
+---------------------+
| US | C11 | A |
| US | C12 | B |
| US | C13 | C |
| US | C14 | A | <== replaced with the C11 value for US
| US | C15 | E |
| UK | C11 | G |
| UK | C12 | B |
| UK | C13 | C |
| UK | C14 | G | <== replaced with the C11 value for UK
| UK | C15 | E |
+---------------------+
Is there any way to do this in PostgreSQL?
Thanks
Create sample data:
CREATE TABLE table1 (
cntry varchar NULL,
code varchar NULL,
value varchar NULL
);
INSERT INTO table1 (cntry, code, value) VALUES('US', 'C11', 'A');
INSERT INTO table1 (cntry, code, value) VALUES('US', 'C12', 'B');
INSERT INTO table1 (cntry, code, value) VALUES('US', 'C13', 'C');
INSERT INTO table1 (cntry, code, value) VALUES('US', 'C14', 'D');
INSERT INTO table1 (cntry, code, value) VALUES('US', 'C15', 'E');
INSERT INTO table1 (cntry, code, value) VALUES('UK', 'C11', 'G');
INSERT INTO table1 (cntry, code, value) VALUES('UK', 'C12', 'B');
INSERT INTO table1 (cntry, code, value) VALUES('UK', 'C13', 'C');
INSERT INTO table1 (cntry, code, value) VALUES('UK', 'C14', 'D');
INSERT INTO table1 (cntry, code, value) VALUES('UK', 'C15', 'E');
Sample query:
select
    t1.cntry,
    t1.code,
    case when t2.value is not null then t2.value else t1.value end as "value"
from table1 t1
left join (
    select
        cntry,
        'C14' as code,
        value
    from table1
    where code = 'C11'
) t2 on t1.cntry = t2.cntry and t1.code = t2.code;
-- Result:
cntry code value
US C11 A
US C12 B
US C13 C
US C14 A
US C15 E
UK C11 G
UK C12 B
UK C13 C
UK C14 G
UK C15 E
If you want to actually change the contents of your table, then an UPDATE query will do the trick.
UPDATE table1 t1
SET value = t2.value
FROM table1 t2
WHERE t1.code = 'C14'
  AND t2.code = 'C11'
  AND t1.cntry = t2.cntry;
For obvious reasons, you should be super careful with UPDATE queries. Here are a couple of ways I use to avoid mistakes:
Try a SELECT statement first to get the rows I think I want to change. If this looks good, then edit the query to change the SELECT into an UPDATE.
Make a copy of the table and try your update on the copy. If you're happy with the results, run the query on the original table. Use SELECT INTO to create the copy (SELECT * INTO tablecopy FROM table1) and then DROP TABLE (DROP TABLE tablecopy) to remove it afterwards.
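For illustration, a minimal sketch of that workflow against the table1 sample data above (tablecopy is just a throwaway name, not part of the original answer):
-- 1. Preview the rows the UPDATE would touch.
SELECT * FROM table1 WHERE code = 'C14';
-- 2. Rehearse the change on a copy of the table.
SELECT * INTO tablecopy FROM table1;
UPDATE tablecopy t1
SET value = t2.value
FROM tablecopy t2
WHERE t1.code = 'C14'
  AND t2.code = 'C11'
  AND t1.cntry = t2.cntry;
SELECT * FROM tablecopy ORDER BY cntry, code;
-- 3. If the copy looks right, run the same UPDATE on table1 and drop the copy.
DROP TABLE tablecopy;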

Update a table from a union select statement

I have two tables as below:
tablea
k | 1 | 2
--------------------
a | mango | xx
b | orange| xx
c | xx | apple
d | xx | banana
a | xx | mango
tableb
k | 1 | 2
--------------------
a | |
b | |
c | |
d | |
How can I update tableb from tablea so I get the results below?
tableb
k | 1 | 2
--------------------
a | mango | mango
b | orange| xx
c | xx | apple
d | xx | banana
If I try to use an update statement like the one below:
update tableb
set "1" = x."1",
"2" = x."2"
from
(
select * from tablea
) x
where tableb.k = x.k
Can I make the update statement ignore 'xx' when k is duplicated?
Thanks.
Here is the SELECT; hopefully you can build the UPDATE from it (one possible UPDATE is sketched after the query).
For every row whose column "1" is not 'xx', it tries to find a matching row whose column "2" holds the same value.
It then UNIONs that with the remaining rows that were not already used as a match.
SQL Fiddle Demo
SELECT t1."k", t1."1", COALESCE(t2."2", 'xx') "2"
FROM tablea t1
LEFT JOIN tablea t2
ON t1."1" = t2."2"
WHERE t1."1" <> 'xx'
UNION ALL
SELECT t1."k", t1."1", t1."2"
FROM tablea t1
WHERE t1."1" = 'xx'
AND t1."2" NOT IN (SELECT t2."1" FROM tablea t2 WHERE t2."1" <> 'xx')
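One possible way to turn that SELECT into the UPDATE, sketched here as an assumption rather than part of the original answer (it presumes tableb.k is unique):
UPDATE tableb
SET "1" = s."1",
    "2" = s."2"
FROM (
    SELECT t1."k" AS k, t1."1", COALESCE(t2."2", 'xx') AS "2"
    FROM tablea t1
    LEFT JOIN tablea t2 ON t1."1" = t2."2"
    WHERE t1."1" <> 'xx'
    UNION ALL
    SELECT t1."k" AS k, t1."1", t1."2"
    FROM tablea t1
    WHERE t1."1" = 'xx'
      AND t1."2" NOT IN (SELECT t3."1" FROM tablea t3 WHERE t3."1" <> 'xx')
) s
WHERE tableb.k = s.k;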

How can I get the sum(value) on the latest gather_time per group(name,col1) in PostgreSQL?

Actually, I got a good answer about a similar issue in the thread below, but I need one more solution for a different data set.
How to get the latest 2 rows ( PostgreSQL )
The data set has historical data, and I just want to get sum(value) for each group at its latest gather_time.
The final result should be as following:
name | col1 | gather_time | sum
-------+------+---------------------+-----
first | 100 | 2016-01-01 23:12:49 | 6
first | 200 | 2016-01-01 23:11:13 | 4
However, with the query below I can only see data for the first group (first-100), meaning there is nothing for the second group (first-200).
The thing is that I need to get one row per group.
The number of groups can vary.
select name,col1,gather_time,sum(value)
from testtable
group by name,col1,gather_time
order by gather_time desc
limit 2;
name | col1 | gather_time | sum
-------+------+---------------------+-----
first | 100 | 2016-01-01 23:12:49 | 6
first | 100 | 2016-01-01 23:11:19 | 6
(2 rows)
Can you advise me on how to accomplish this requirement?
Data set
create table testtable
(
name varchar(30),
col1 varchar(30),
col2 varchar(30),
gather_time timestamp,
value integer
);
insert into testtable values('first','100','q1','2016-01-01 23:11:19',2);
insert into testtable values('first','100','q2','2016-01-01 23:11:19',2);
insert into testtable values('first','100','q3','2016-01-01 23:11:19',2);
insert into testtable values('first','200','t1','2016-01-01 23:11:13',2);
insert into testtable values('first','200','t2','2016-01-01 23:11:13',2);
insert into testtable values('first','100','q1','2016-01-01 23:11:11',2);
insert into testtable values('first','100','q1','2016-01-01 23:12:49',2);
insert into testtable values('first','100','q2','2016-01-01 23:12:49',2);
insert into testtable values('first','100','q3','2016-01-01 23:12:49',2);
select *
from testtable
order by name,col1,gather_time;
name | col1 | col2 | gather_time | value
-------+------+------+---------------------+-------
first | 100 | q1 | 2016-01-01 23:11:11 | 2
first | 100 | q2 | 2016-01-01 23:11:19 | 2
first | 100 | q3 | 2016-01-01 23:11:19 | 2
first | 100 | q1 | 2016-01-01 23:11:19 | 2
first | 100 | q3 | 2016-01-01 23:12:49 | 2
first | 100 | q1 | 2016-01-01 23:12:49 | 2
first | 100 | q2 | 2016-01-01 23:12:49 | 2
first | 200 | t2 | 2016-01-01 23:11:13 | 2
first | 200 | t1 | 2016-01-01 23:11:13 | 2
One option is to join your original table to a table containing only the records with the latest gather_time for each name, col1 group. Then you can take the sum of the value column for each group to get the result set you want.
SELECT t1.name, t1.col1, MAX(t1.gather_time) AS gather_time, SUM(t1.value) AS sum
FROM testtable t1 INNER JOIN
(
SELECT name, col1, col2, MAX(gather_time) AS maxTime
FROM testtable
GROUP BY name, col1, col2
) t2
ON t1.name = t2.name AND t1.col1 = t2.col1 AND t1.col2 = t2.col2 AND
t1.gather_time = t2.maxTime
GROUP BY t1.name, t1.col1
If you wanted to use a subquery in the WHERE clause, as you attempted in your OP, to restrict to only records with the latest gather_time then you could try the following:
SELECT name, col1, gather_time, SUM(value) AS sum
FROM testtable t1
WHERE gather_time =
(
SELECT MAX(gather_time)
FROM testtable t2
WHERE t1.name = t2.name AND t1.col1 = t2.col1
)
GROUP BY name, col1
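As a further PostgreSQL-specific alternative (a sketch, not from the original answer), a window function can rank the rows within each (name, col1) group by gather_time and keep only the latest ones before summing:
SELECT name, col1, gather_time, SUM(value) AS sum
FROM (
    SELECT name, col1, gather_time, value,
           RANK() OVER (PARTITION BY name, col1 ORDER BY gather_time DESC) AS rnk
    FROM testtable
) ranked
WHERE rnk = 1
GROUP BY name, col1, gather_time
ORDER BY gather_time DESC;
-- For the sample data this yields (first, 100, 2016-01-01 23:12:49, 6)
-- and (first, 200, 2016-01-01 23:11:13, 4).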

Counting occurrences of a value in a column of a table - postgresql

I have a table with multiple columns:
table1
| column1 | column2 | column3 |
| x       | ....    | ....    |
| y       | ....    | ....    |
| x       | ....    | ....    |
How can I count the occurrences of a value, for example x, in one of the columns, for example column1? Given table1, this should return 2 (the number of x values present in column1).
You can use the SUM() aggregate function with a CASE expression, like:
select sum(case when column1 = 'x' then 1 else 0 end) as X_Count
from table1;
SELECT COUNT(*) FROM table1 WHERE column1 = 'x'
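If you want the count of every distinct value in column1 at once rather than just 'x', a simple GROUP BY also works (a sketch against the same table1):
SELECT column1, COUNT(*) AS occurrences
FROM table1
GROUP BY column1
ORDER BY occurrences DESC;
-- For the sample data this returns x -> 2 and y -> 1.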

SQL - group by - limit clause - postgresql

I have a table which has two columns C1 and C2.
C1 has an integer data type and C2 has text.
The table looks like this.
---C1--- ---C2---
1 | a |
1 | b |
1 | c |
1 | d |
1 | e |
1 | f |
1 | g |
2 | h |
2 | i |
2 | j |
2 | k |
2 | l |
2 | m |
2 | n |
------------------
My question: I want a SQL query which does a group by on column C1, but in chunks of at most 3 rows per group.
The result should look like this:
------------------
1 | a,b,c |
1 | d,e,f |
1 | g |
2 | h,i,j |
2 | k,l,m |
2 | n |
------------------
Is it possible with plain SQL?
Note: I do not want to write a stored procedure or function...
You can use a common table expression to partition the results into chunks of three rows, and then use STRING_AGG to join them into comma-separated lists:
WITH cte AS (
SELECT *, (ROW_NUMBER() OVER (PARTITION BY C1 ORDER BY C2)-1)/3 rn
FROM mytable
)
SELECT C1, STRING_AGG(C2, ',') ALL_C2
FROM cte
GROUP BY C1,rn
ORDER BY C1
An SQLfiddle to test with.
A short explanation of the common table expression:
ROW_NUMBER() OVER (...) will number the results from 1 to n for each value of C1. We then subtract 1 and divide (integer division) by 3 to get the sequence 0,0,0,1,1,1,2,2,2..., and group by that value in the outer query to get at most 3 values per output row.
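To see the bucketing, you can run just the CTE body on its own; the commented output below is what the sample data should produce (shown here as an illustration):
SELECT *, (ROW_NUMBER() OVER (PARTITION BY C1 ORDER BY C2)-1)/3 AS rn
FROM mytable;
-- C1 | C2 | rn
--  1 | a  | 0
--  1 | b  | 0
--  1 | c  | 0
--  1 | d  | 1
--  1 | e  | 1
--  1 | f  | 1
--  1 | g  | 2
-- (and likewise rn = 0,0,0,1,1,1,2 for the C1 = 2 rows h..n)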
Apart from Joachim Isaksson's answer, you can also try this method:
SELECT C1, string_agg(C2, ',') as c2
FROM (
    SELECT *, (ROW_NUMBER() OVER (PARTITION BY C1 ORDER BY C2)-1)/3 as row_num
    FROM mytable
) t
GROUP BY C1, row_num
ORDER BY c2