I have a table with 20 columns, and I would like to get the distinct values of each column.
So if I have:
A B C D ....
----------
z c c d
z f c f
a c f d
z c c d
b f b d
z c a d
I want to get back
{ 'A':[z,a,b],
'B':[c,f],
'C': [c,f,b,a],
'D': [d,f]
....
}
What would the query look like?
Maybe you need something like this:
SELECT array_agg(DISTINCT a) AS a,
       array_agg(DISTINCT b) AS b,
       array_agg(DISTINCT c) AS c,
       array_agg(DISTINCT d) AS d
FROM test;
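With 20 columns it may be easier to generate the per-column query than to type it out. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in database (SQLite has no array_agg, so it runs one SELECT DISTINCT per column instead), and the table name test and the sample rows are taken from the question.

```python
import sqlite3

# In-memory SQLite database used as a stand-in; the original question targets
# a server with array_agg, which SQLite does not provide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (a TEXT, b TEXT, c TEXT, d TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?)",
    [
        ("z", "c", "c", "d"),
        ("z", "f", "c", "f"),
        ("a", "c", "f", "d"),
        ("z", "c", "c", "d"),
        ("b", "f", "b", "d"),
        ("z", "c", "a", "d"),
    ],
)

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [info[1] for info in conn.execute("PRAGMA table_info(test)")]

# One SELECT DISTINCT per column, collected into a dictionary.
distinct_values = {}
for col in columns:
    query = f'SELECT DISTINCT "{col}" FROM test ORDER BY "{col}"'
    distinct_values[col.upper()] = [row[0] for row in conn.execute(query)]

print(distinct_values)
```

The values come back sorted here only because of the ORDER BY; the order inside each list is otherwise not guaranteed.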
I have two tables, table1 and table2. Both have 4 columns with the same column names. table1 has 50 rows and table2 has 100 rows. How can I get only those rows from table2 which are not in table1? I tried a left join, but could not get it to work, since I can't left join on all columns.
Since tables are lists of dictionaries, you could use the except keyword to exclude all rows from table2 which are found in table1.
For example:
q)table1:([]a:til 3;b:3#.Q.a;c:3#.Q.A)
q)table1
a b c
-----
0 a A
1 b B
2 c C
q)table2:([]a:til 6;b:6#.Q.a;c:6#.Q.A)
q)table2
a b c
-----
0 a A
1 b B
2 c C
3 d D
4 e E
5 f F
q)table2 except table1
a b c
-----
3 d D
4 e E
5 f F
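The "list of dictionaries" view can be mimicked outside q to see what except is doing. This is a rough Python model, not q itself: table_except is a hypothetical helper name, and the sample tables mirror the ones above.

```python
def table_except(left, right):
    """Return the rows of `left` that do not appear in `right`, keeping order."""
    # Dictionaries are not hashable, so each row is turned into a sorted
    # tuple of (column, value) pairs before the membership check.
    seen = {tuple(sorted(row.items())) for row in right}
    return [row for row in left if tuple(sorted(row.items())) not in seen]


letters = "abcdef"
# Same shape as the q tables: columns a, b, c.
table1 = [{"a": i, "b": letters[i], "c": letters[i].upper()} for i in range(3)]
table2 = [{"a": i, "b": letters[i], "c": letters[i].upper()} for i in range(6)]

only_in_table2 = table_except(table2, table1)
for row in only_in_table2:
    print(row)
```

A row counts as a match only if every column agrees, which is why no join key has to be named.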
I have a simple but very important concept to clear up in T-SQL.
I am writing a lot of T-SQL queries against a table, with a lot of aggregations and GROUP BY.
Now, in the SELECT clause of my T-SQL query, I have a CASE WHEN expression. Please see below:
Statement 1:
SELECT X, Y, Z,
A = CASE
WHEN P = 1 THEN B
ELSE Q
END,
SUM(Sales)
FROM mytable
GROUP BY
X, Y, Z,
CASE
WHEN P = 1 THEN B
ELSE Q
END
Now, can Statement 1 be written as Statement 2?
Statement 2:
SELECT X, Y, Z,
A = CASE
WHEN P = 1 THEN B
ELSE Q
END,
SUM(Sales)
FROM mytable
GROUP BY
X, Y, Z,
P, B, Q
Is Statement 1 equivalent to Statement 2?
Can the CASE WHEN from the SELECT clause be replaced in the GROUP BY clause by its individual columns?
Will the result set always be the same?
The difference depends on how many distinct combinations of P, B and Q your data contains, compared with how many distinct values the CASE expression produces. You can spot the difference in this example.
IF OBJECT_ID('tempdb..#Data') IS NOT NULL
DROP TABLE #Data
CREATE TABLE #Data (
P INT,
B INT,
Q INT,
Sales INT)
INSERT INTO #Data (
P,
B,
Q,
Sales)
VALUES
(1, 20, 300, 1000),
(1, 20, 400, 500),
(2, 1, 1, 50),
(2, 1, 1, 250)
-- Statement 2
SELECT
P,
B,
Q,
TotalSales = SUM(D.Sales)
FROM
#Data AS D
GROUP BY
P,
B,
Q
/*
All different combinations of P, B and Q are listed, and their sales added
P B Q TotalSales
1 20 300 1000
1 20 400 500
2 1 1 300
*/
-- Statement 1
SELECT
CaseResult = CASE WHEN P = 1 THEN B ELSE Q END,
TotalSales = SUM(D.Sales)
FROM
#Data AS D
GROUP BY
CASE WHEN P = 1 THEN B ELSE Q END
/*
The grouping value depends on value B when P = 1 (and not on Q!) so
all records with P = 1 and same B are grouped together and
all records with P <> 1 and same Q are grouped together
CaseResult TotalSales
1 300
20 1500
*/
It may happen that, for your data, the CASE expression yields a different value for every distinct combination of P, B and Q. In that case both queries return the same results.
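To see the effect with the CASE expression kept in the SELECT list, as in the question's Statement 2, here is a small sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in for SQL Server, a table named data in place of #Data, and the same four rows as above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (P INTEGER, B INTEGER, Q INTEGER, Sales INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [(1, 20, 300, 1000), (1, 20, 400, 500), (2, 1, 1, 50), (2, 1, 1, 250)],
)

case_expr = "CASE WHEN P = 1 THEN B ELSE Q END"

# Statement 1: group by the CASE expression itself.
statement_1 = conn.execute(
    f"SELECT {case_expr} AS case_result, SUM(Sales) AS total "
    f"FROM data GROUP BY {case_expr} ORDER BY case_result"
).fetchall()

# Statement 2: same SELECT list, but grouped by the underlying columns.
statement_2 = conn.execute(
    f"SELECT {case_expr} AS case_result, SUM(Sales) AS total "
    "FROM data GROUP BY P, B, Q ORDER BY case_result, total"
).fetchall()

print(statement_1)  # [(1, 300), (20, 1500)]
print(statement_2)  # [(1, 300), (20, 500), (20, 1000)]
```

Grouping by P, B, Q leaves two rows with the same CASE result (20), because Q still splits the groups even though it does not show up in the output.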
In PostgreSQL I have 4 tables:
Table A:
-----------
a_id
a_date
Table B:
-----------
a_id
b_id
Table C:
-------------------
c_id
b_id
invoice_number
Table D:
-------------------
d_id
invoice_number
value_D
Multiple records have value_D
I would like to select from Table A, Table B, Table C and Table D, where a_date BETWEEN X AND Y.
However, I would also like to select all the other value_D rows that are not included in that selection (so A inner join B inner join C full outer join D).
My code:
SELECT
Table A, Table B, Table C, Table D
FROM
Table A
JOIN
Table B ON A.a_id = B.a_id
JOIN
Table C ON B.b_id = C.b_id
FULL OUTER JOIN
Table D ON C.invoice_number = D.invoice_number
WHERE
A.a_date BETWEEN X AND Y;
It only shows D.value_D for the A.a_id rows where A.a_date is BETWEEN X AND Y.
However, I would like D.value_D to be shown as well for the rows where A.a_date falls outside that range.
I am kind of a newbie, so hopefully this is understandable and you can help me.
Thanks in advance
You can also add more conditions to the WHERE clause, for example:
"a_date BETWEEN X AND Y OR a_date > '2015-04-21'". This will retrieve the rows that satisfy either condition.
Regards
I think I solved it.
SELECT Table A, Table B, Table C, Table D
FROM Table A
JOIN Table B ON A.a_id = B.a_id
JOIN Table C ON B.b_id = C.b_id
JOIN Table D ON C.invoice_number = D.invoice_number
WHERE A.a_date BETWEEN X AND Y OR
D.value_D IN (SELECT D.value_D
FROM Table D
JOIN Table C ON D.invoice_number = C.invoice_number
JOIN Table B ON C.b_id = B.b_id
JOIN Table A ON B.a_id = A.a_id
WHERE A.a_date BETWEEN X AND Y);
Thank you all for the help guys!
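One reason the original FULL OUTER JOIN appeared to behave like an inner join is that the WHERE clause runs after the join: rows that exist only in Table D have a NULL a_date, so the BETWEEN test discards them. A possible alternative, sketched below, filters A, B and C in a derived table and then LEFT JOINs it from D, so every D row is kept. This is one reading of the requirement, not necessarily what the asker ended up needing. It uses SQLite through Python's sqlite3 module, and the table names a, b, c, d, the alias s, and all sample values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (a_id INTEGER, a_date TEXT);
    CREATE TABLE b (a_id INTEGER, b_id INTEGER);
    CREATE TABLE c (c_id INTEGER, b_id INTEGER, invoice_number TEXT);
    CREATE TABLE d (d_id INTEGER, invoice_number TEXT, value_d REAL);

    -- Hypothetical sample data: only a_id 1 falls inside the date range.
    INSERT INTO a VALUES (1, '2015-01-10'), (2, '2015-06-01');
    INSERT INTO b VALUES (1, 10), (2, 20);
    INSERT INTO c VALUES (100, 10, 'INV-1'), (200, 20, 'INV-2');
    INSERT INTO d VALUES (1000, 'INV-1', 5.0), (2000, 'INV-2', 7.0), (3000, 'INV-3', 9.0);
    """
)

# The date filter lives inside the derived table "s", so it cannot remove
# rows of d that have no in-range partner.
rows = conn.execute(
    """
    SELECT d.d_id, d.value_d, s.a_id, s.a_date
    FROM d
    LEFT JOIN (
        SELECT a.a_id, a.a_date, c.invoice_number
        FROM a
        JOIN b ON a.a_id = b.a_id
        JOIN c ON b.b_id = c.b_id
        WHERE a.a_date BETWEEN ? AND ?
    ) AS s ON s.invoice_number = d.invoice_number
    ORDER BY d.d_id
    """,
    ("2015-01-01", "2015-01-31"),
).fetchall()

for row in rows:
    print(row)
```

Every value_d is returned; the A columns are filled only for invoices whose a_date is in range and are NULL (None in Python) otherwise.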
I have two tables
Table A Table B
-------- ---------
a b c a b c
a b c a b c
a b c a b c
e f g a b c
h i j e f g
k l m k l m
k l m
x y z
s t u
a b c
a b c
Now I want to remove rows from Table B that match Table A on columns 1, 2 and 3, so that the count of each duplicated row in Table B is less than or equal to its count in Table A.
So the output should be
Table A Table B
-------- ---------
a b c a b c
a b c a b c
a b c a b c
e f g e f g
h i j k l m
k l m x y z
s t u
I have tried using inner join and intersect but failed to get the desired result.
Try:
DELETE FROM tableB
WHERE ctid IN (
SELECT BB.ctid
FROM (
SELECT a, b, c, count(*) cnt
FROM tablea
GROUP BY a, b, c
) AA
JOIN (
SELECT ctid,
a, b, c,
row_number() over (partition by a,b,c) cnt
FROM tableb
) BB
ON AA.a = BB.a
AND AA.b = BB.b
AND AA.c = BB.c
AND AA.cnt < BB.cnt
)
demo: http://sqlfiddle.com/#!12/73e99/1
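The same idea can be tried locally without a PostgreSQL server. The sketch below is an adaptation, not the exact query above: it runs on SQLite through Python's sqlite3 module, uses SQLite's rowid where the answer uses ctid, and assumes an SQLite version with window functions (3.25 or later). The sample rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablea (a TEXT, b TEXT, c TEXT)")
conn.execute("CREATE TABLE tableb (a TEXT, b TEXT, c TEXT)")

abc = ("a", "b", "c")
rows_a = [abc, abc, abc, ("e", "f", "g"), ("h", "i", "j"), ("k", "l", "m")]
rows_b = [abc, abc, abc, abc, ("e", "f", "g"), ("k", "l", "m"), ("k", "l", "m"),
          ("x", "y", "z"), ("s", "t", "u"), abc, abc]
conn.executemany("INSERT INTO tablea VALUES (?, ?, ?)", rows_a)
conn.executemany("INSERT INTO tableb VALUES (?, ?, ?)", rows_b)

# Number the duplicates in tableb and delete every copy whose number
# exceeds the count of the same row in tablea.
conn.execute(
    """
    DELETE FROM tableb
    WHERE rowid IN (
        SELECT bb.rid
        FROM (SELECT a, b, c, count(*) AS cnt
              FROM tablea
              GROUP BY a, b, c) AS aa
        JOIN (SELECT rowid AS rid, a, b, c,
                     row_number() OVER (PARTITION BY a, b, c) AS rn
              FROM tableb) AS bb
          ON aa.a = bb.a AND aa.b = bb.b AND aa.c = bb.c
         AND aa.cnt < bb.rn
    )
    """
)

remaining = conn.execute(
    "SELECT a, b, c, count(*) FROM tableb GROUP BY a, b, c ORDER BY a"
).fetchall()
print(remaining)
```

Rows of Table B with no counterpart in Table A (x y z and s t u) never join, so they are left alone.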
If the table isn't big, I think the simplest way is to delete all rows from TableB which exist in TableA and then insert TableA's rows into TableB. Other ways, IMHO, require at least a primary key in TableB. Note that this also copies into TableB any TableA rows that had no match there.
DELETE FROM TableB
WHERE EXISTS (SELECT * FROM TableA
              WHERE TableB.C1 = TableA.C1
                AND TableB.C2 = TableA.C2
                AND TableB.C3 = TableA.C3);
INSERT INTO TableB SELECT * FROM TableA;
I have a table x which has the fields a, b, c, and d. I want to write a SELECT statement that is grouped by a, with HAVING a_particular_value = ANY(array_agg(b)), and that retrieves a, MIN(d), and c, where c comes from the row that satisfies a_particular_value = ANY(array_agg(b)).
It's a bit confusing.
Lemme try to explain. a_particular_value = ANY(array_agg(b)) will pick out one or more records from all the records grouped under a. I want to retrieve the value of c from the record that makes the condition true, without filtering out the other records, because I still need those for the other aggregate function, MIN(d).
The query that I've tried to make:
SELECT a, MIN(d) FROM x
GROUP BY a
HAVING 1 = ANY(array_agg(b))
The only thing that's left to do is put c in the SELECT clause. How do I do this?
with agg as (
select a, min(d) as d
from x
group by a
having 1 = any(array_agg(b))
)
select distinct on (a)
a, c, d
from
x
inner join
agg using (a, d)
order by a, c
If min(d) is not unique within an a group, more than one corresponding c can exist. The above will return the smallest c. If you want the biggest, use this instead:
order by a, c desc
c can have several values within a group in this scenario, so your only option is to group by c as well.
SELECT a, c FROM x
GROUP BY a, c
HAVING 1 = ANY(array_agg(b))
If you want to eliminate rows whose b does not satisfy the condition before GROUP BY is applied, use WHERE instead, as the documentation for HAVING says: http://www.postgresql.org/docs/9.2/static/sql-select.html#SQL-HAVING
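Another way to read the question is "keep MIN(d) over the whole group, but take c from the row whose b matches". That can be sketched with conditional aggregation, which is a different technique from either answer above. The example assumes SQLite through Python's sqlite3 module (so no array_agg or DISTINCT ON), and the sample rows and the alias c_of_match are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (a INTEGER, b INTEGER, c TEXT, d INTEGER)")
conn.executemany(
    "INSERT INTO x VALUES (?, ?, ?, ?)",
    [
        (1, 1, "from_b1", 50),   # matching row of group 1, but not its MIN(d)
        (1, 2, "other", 10),     # supplies MIN(d) for group 1
        (2, 3, "no_match", 5),   # group 2 has no b = 1, so it is dropped
        (3, 1, "only_row", 7),
    ],
)

particular_value = 1
result = conn.execute(
    """
    SELECT a,
           MIN(d) AS min_d,
           -- CASE yields NULL for non-matching rows, and MIN ignores NULLs
           MIN(CASE WHEN b = ? THEN c END) AS c_of_match
    FROM x
    GROUP BY a
    HAVING SUM(CASE WHEN b = ? THEN 1 ELSE 0 END) > 0
    ORDER BY a
    """,
    (particular_value, particular_value),
).fetchall()
print(result)  # [(1, 10, 'from_b1'), (3, 7, 'only_row')]
```

If several rows in a group match, MIN picks the smallest c among them; swap in MAX if the largest is wanted.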