PostgreSQL WHERE clause behavior

I made two queries that I thought should have the same result:
SELECT COUNT(*) FROM (
    SELECT DISTINCT ON (id1) id1, value
    FROM (
        SELECT table1.id1, table2.value
        FROM table1
        JOIN table2 ON table1.id1 = table2.id
        WHERE table2.value = '1'
    ) AS result1
    ORDER BY id1
) AS result2;
SELECT COUNT(*) FROM (
    SELECT DISTINCT ON (id1) id1, value
    FROM (
        SELECT table1.id1, table2.value
        FROM table1
        JOIN table2 ON table1.id1 = table2.id
    ) AS result1
    ORDER BY id1
) AS result2
WHERE value = '1';
The only difference is that one query has the WHERE clause inside the SELECT DISTINCT ON subquery, and the other has it outside that but still inside the SELECT COUNT. Yet the results were not the same. I don't understand why the position of the WHERE clause should make a difference in this case. Can anyone explain? Or is there a better way to phrase this question?

Here's a good way to look at this:
SELECT DISTINCT ON (id) id, value
FROM (SELECT 1 AS id, 1 AS value
      UNION
      SELECT 1 AS id, 2 AS value) a;

SELECT DISTINCT ON (id) id, value
FROM (SELECT 1 AS id, 1 AS value
      UNION
      SELECT 1 AS id, 2 AS value) a
WHERE value = 2;
The difference comes down to when the filter runs relative to DISTINCT ON. Within a single SELECT, WHERE is applied before DISTINCT ON, so the non-matching rows are gone before the one surviving row per id is chosen. In your second query the WHERE sits outside the derived table, so DISTINCT ON has already picked its single row per id1 (an arbitrary one, since the ORDER BY is on id1 alone), and the outer filter then either keeps or discards that row. It is behavior by design.
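To see the other side of the question, here is a sketch with the filter moved outside the DISTINCT ON, as in the second query of the original post (same toy data; without an ORDER BY on value, which row DISTINCT ON keeps is arbitrary, so the outer WHERE may discard it and the query can return no rows at all, whereas the inner-WHERE version always finds the matching row):
SELECT id, value
FROM (
    SELECT DISTINCT ON (id) id, value
    FROM (SELECT 1 AS id, 1 AS value
          UNION
          SELECT 1 AS id, 2 AS value) a
) d
WHERE value = 2;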

Related

How to optimise a SQL query to check for consistency of column values across tables

I would like to check across multiple tables that the same keys / same number of keys are present in each of the tables.
Currently I have created a solution that checks the count of keys per individual table, checks the count of keys when all tables are merged together, then compares.
This solution works, but I wonder if there is a more efficient one...
Example solution as it stands:
SELECT COUNT(DISTINCT variable) AS num_ids FROM table_a;
SELECT COUNT(DISTINCT variable) AS num_ids FROM table_b;
SELECT COUNT(DISTINCT variable) AS num_ids FROM table_c;
SELECT COUNT(DISTINCT a.variable) AS num_ids
FROM (SELECT DISTINCT VARIABLE FROM table_a) a
INNER JOIN (SELECT DISTINCT VARIABLE FROM table_b) b ON a.variable = b.variable
INNER JOIN (SELECT DISTINCT VARIABLE FROM table_c) c ON a.variable = c.variable;
UPDATE:
The difficulty I'm facing in putting this together in one query is that any of the tables might not be unique on the VARIABLE I am looking to check, so I've had to use DISTINCT before merging to avoid expanding the join.
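As a minimal illustration of that join expansion (made-up values, plain PostgreSQL syntax): joining two tables that each contain the same key twice produces four joined rows, not two, which is exactly what the DISTINCT-before-join avoids.
SELECT COUNT(*)  -- returns 4, not 2
FROM (VALUES (1), (1)) AS a(variable)
JOIN (VALUES (1), (1)) AS b(variable) ON a.variable = b.variable;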
Since we are only counting, I think there is no need to join the tables on the variable column. A UNION should be enough.
We still have to use DISTINCT to suppress duplicates, which usually means an extra sort.
An index on variable should help for getting the counts of the separate tables, but it will not help for getting the count of the combined table.
Here is an example for comparing two tables:
WITH
CTE_A
AS
(
SELECT COUNT(DISTINCT variable) AS CountA
FROM TableA
)
,CTE_B
AS
(
SELECT COUNT(DISTINCT variable) AS CountB
FROM TableB
)
,CTE_AB
AS
(
SELECT COUNT(DISTINCT variable) AS CountAB
FROM
(
SELECT variable
FROM TableA
UNION ALL
-- sic! use ALL here to avoid sort when merging two tables
-- there should be only one distinct sort for the outer `COUNT`
SELECT variable
FROM TableB
) AS AB
)
SELECT
CASE WHEN CountA = CountAB AND CountB = CountAB
THEN 'same' ELSE 'different' END AS ResultAB
FROM
CTE_A
CROSS JOIN CTE_B
CROSS JOIN CTE_AB
;
Three tables:
WITH
CTE_A
AS
(
SELECT COUNT(DISTINCT variable) AS CountA
FROM TableA
)
,CTE_B
AS
(
SELECT COUNT(DISTINCT variable) AS CountB
FROM TableB
)
,CTE_C
AS
(
SELECT COUNT(DISTINCT variable) AS CountC
FROM TableC
)
,CTE_ABC
AS
(
SELECT COUNT(DISTINCT variable) AS CountABC
FROM
(
SELECT variable
FROM TableA
UNION ALL
-- sic! use ALL here to avoid sort when merging two tables
-- there should be only one distinct sort for the outer `COUNT`
SELECT variable
FROM TableB
UNION ALL
-- sic! use ALL here to avoid sort when merging two tables
-- there should be only one distinct sort for the outer `COUNT`
SELECT variable
FROM TableC
) AS ABC
)
SELECT
CASE WHEN CountA = CountABC AND CountB = CountABC AND CountC = CountABC
THEN 'same' ELSE 'different' END AS ResultABC
FROM
CTE_A
CROSS JOIN CTE_B
CROSS JOIN CTE_C
CROSS JOIN CTE_ABC
;
I deliberately chose CTEs, because as far as I know Postgres materializes CTEs, and in our case each CTE will have only one row.
Using array_agg with ORDER BY is an even better variant, if it is available on Redshift. You'll still need to use DISTINCT, but you don't have to merge all the tables together.
WITH
CTE_A
AS
(
SELECT array_agg(DISTINCT variable ORDER BY variable) AS A
FROM TableA
)
,CTE_B
AS
(
SELECT array_agg(DISTINCT variable ORDER BY variable) AS B
FROM TableB
)
,CTE_C
AS
(
SELECT array_agg(DISTINCT variable ORDER BY variable) AS C
FROM TableC
)
SELECT
CASE WHEN A = B AND B = C
THEN 'same' ELSE 'different' END AS ResultABC
FROM
CTE_A
CROSS JOIN CTE_B
CROSS JOIN CTE_C
;
Well, here is probably the nastiest piece of SQL I could build for you :) I will forever deny that I wrote this and that my stackoverflow account was hacked ;)
SELECT
'All OK'
WHERE
( SELECT COUNT(DISTINCT id) FROM table_a ) = ( SELECT COUNT(DISTINCT id) FROM table_b )
AND ( SELECT COUNT(DISTINCT id) FROM table_b ) = ( SELECT COUNT(DISTINCT id) FROM table_c )
By the way, this won't optimise the query - it's still doing three queries (but I guess it's better than 4?).
UPDATE: In light of your use-case below: NEW sql fiddle http://sqlfiddle.com/#!15/a0403/1
SELECT DISTINCT
tbl_a.a_count,
tbl_b.b_count,
tbl_c.c_count
FROM
( SELECT COUNT(id) a_count, array_agg(id order by id) ids FROM table_a) tbl_a,
( SELECT COUNT(id) b_count, array_agg(id order by id) ids FROM table_b) tbl_b,
( SELECT COUNT(id) c_count, array_agg(id order by id) ids FROM table_c) tbl_c
WHERE
tbl_a.ids = tbl_b.ids
AND tbl_b.ids = tbl_c.ids
The above query will only return a row if all the tables contain exactly the same set of ids (and therefore the same number of rows).

Postgresql rows to columns (UNION ALL to JOIN)

Hello, with this query I'm getting one result set with four rows. How can I change it so that I get four named columns, each holding its own count?
SELECT COUNT(*) FROM vehicles WHERE cus=1
UNION ALL
SELECT COUNT(*) FROM user WHERE cus=1
UNION ALL
SELECT COUNT(*) FROM vehicle_events WHERE cus=1
UNION ALL
SELECT COUNT(*) FROM vehicle_alerts WHERE cus=1
Thanks in advance.
SELECT a.ct veh_count, b.ct user_count, c.ct event_count, d.ct alert_count
FROM
( SELECT COUNT(*) ct FROM vehicles WHERE cus=1 ) a,
( SELECT COUNT(*) ct FROM user WHERE cus=1 ) b,
( SELECT COUNT(*) ct FROM vehicle_events WHERE cus=1 ) c,
( SELECT COUNT(*) ct FROM vehicle_alerts WHERE cus=1 ) d;
UNION only adds rows; it has no effect on the columns.
Columns, which define the "shape" of the row tuples, must appear as selected columns [1].
For example:
SELECT
(SELECT COUNT(*) FROM vehicles WHERE cus=1) as veh_count
,(SELECT COUNT(*) FROM users WHERE cus=1) as user_count
..
[1] There are other constructs that can allow this - see crosstab, for example - but the columns are still fixed by the query text. It takes dynamic SQL to get a variable number of columns.
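As a rough, hypothetical sketch of the crosstab approach (it requires the tablefunc extension; the cus type and the output column list are assumptions, and with the one-argument crosstab(text) form the values fill the output columns positionally, so every cus value must appear in both source queries or the counts will shift):
CREATE EXTENSION IF NOT EXISTS tablefunc;
SELECT *
FROM crosstab($$
    SELECT cus, 'alerts' AS category, COUNT(*) FROM vehicle_alerts GROUP BY cus
    UNION ALL
    SELECT cus, 'vehicles' AS category, COUNT(*) FROM vehicles GROUP BY cus
    ORDER BY 1, 2
$$) AS ct(cus int, alert_count bigint, veh_count bigint);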

How to use a Table type in query

I have 9000 rows in the News table and use this code to select a page of about 20 rows from it:
Select *
From (
Select *, ROW_NUMBER() OVER (ORDER BY DateSend DESC) AS Num
From News
Where SubjectID in(Select MenuSubject.SubjectID
From MenuSubject inner join Menu on MenuSubject.MenuID = Menu.MenuID)
) as myTable
where myTable.Num BETWEEN 100 and 120
But it takes 28 seconds of read time! Also, when I test this query without the joined table, I get the result in 1 second.
So I want to use a table type to hold the result of the join and use that in the query. I made a new table type and used it with the following code:
DECLARE #MyTable2 IntListTable
Insert Into #MyTable2
Select MenuSubject.SubjectID
From MenuSubject inner join Menu on MenuSubject.MenuID = Menu.MenuID
Select *
From (
Select *, ROW_NUMBER() OVER (ORDER BY DateSend DESC) AS Num
From News
Where SubjectID in #MyTable2
) as myTable
where myTable.Num BETWEEN 100 and 120
But I get an error at
SubjectID in #MyTable2
Error:
Incorrect syntax near '#MyTable2'.
Edit:
When I test my query with:
Select myTable.Title
or use this list instead of the join table:
Where SubjectID in(13,14,20,21,25,24,26,24,28,29,30,54,55,60,47,98,99,65,14,20,33,666,987,254)
I get the result in 1 second.
But when I use this in the query:
Select myTable.MoreText
it takes 28 seconds of read time again! Why?
Try this,
Select x.Num
From (
Select *, ROW_NUMBER() OVER (ORDER BY DateSend DESC) AS Num
From News
Where SubjectID in(Select MenuSubject.SubjectID
From MenuSubject inner join Menu on MenuSubject.MenuID = Menu.MenuID)
) x
where x.Num <21
WITH myTempTable as (Select MenuSubject.SubjectID
From MenuSubject inner join Menu on MenuSubject.MenuID = Menu.MenuID)
Select *
From (
Select *, ROW_NUMBER() OVER (ORDER BY DateSend DESC) AS Num
From News
Where SubjectID in (SELECT SubjectID FROM myTempTable)
) as myTable
where myTable.Num BETWEEN 100 and 120
You can try the above query.
There is absolutely no need for a User-Defined Table Type in this query. It adds work but no actual benefit.
The problem is most likely the fact that you are using an IN list, as those translate into an OR condition for each of the values. But an IN list isn't needed either.
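For illustration, an IN list of literals like the one in the question's edit is effectively handled as a chain of equality comparisons:
Where SubjectID in(13, 14, 20)
-- behaves roughly the same as
Where SubjectID = 13 OR SubjectID = 14 OR SubjectID = 20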
This query can actually be simplified by rethinking it in terms of an INNER JOIN, which should be better as it will allow the Query Optimizer to do its job.
SELECT *
FROM (
SELECT nw.*, ROW_NUMBER() OVER (ORDER BY DateSend DESC) AS [Num]
FROM News nw
INNER JOIN (
MenuSubject
INNER JOIN Menu
ON MenuSubject.MenuID = Menu.MenuID
) ON MenuSubject.SubjectID = nw.SubjectID
) AS myTable
WHERE myTable.Num BETWEEN 100 AND 120;
One final simplification that can be made, though I doubt it is needed here since 9000 rows is almost no data at all, is to first dump the results to a local temporary table and then use that in the INNER JOIN:
CREATE TABLE #Subjects
(
SubjectID INT NOT NULL -- PRIMARY KEY -- test with and without PK to see if it helps
);
INSERT INTO #Subjects (SubjectID)
SELECT MenuSubject.SubjectID
FROM MenuSubject
INNER JOIN Menu
ON Menu.MenuID = MenuSubject.MenuID;
SELECT *
FROM (
SELECT nw.*, ROW_NUMBER() OVER (ORDER BY DateSend DESC) AS [Num]
FROM News nw
INNER JOIN #Subjects sub
ON sub.SubjectID = nw.SubjectID
) AS myTable
WHERE myTable.Num BETWEEN 100 AND 120;

SQL Server SUM() for DISTINCT records

I have a field called "Users", and I want to run SUM() on that field that returns the sum of all DISTINCT records. I thought that this would work:
SELECT SUM(DISTINCT table_name.users)
FROM table_name
But it's not selecting DISTINCT records, it's just running as if I had run SUM(table_name.users).
What would I have to do to add only the distinct records from this field?
Use count()
SELECT count(DISTINCT table_name.users)
FROM table_name
This code seems to indicate that sum(distinct) and sum() return different values.
with t as (
    select 1 as a
    union all
    select 1
    union all
    select 2
    union all
    select 4
)
select sum(distinct a) as DistinctSum, sum(a) as allSum, count(distinct a) as distinctCount, count(a) as allCount
from t
Do you actually have non-distinct values?
select count(1), users
from table_name
group by users
having count(1) > 1
If not, the sums will be identical.
You can see for yourself that distinct works with the following example. Here I create a subquery with duplicate values, then I do a sum distinct on those values.
select DistinctSum=sum(distinct x), RegularSum=Sum(x)
from
(
select x=1
union All
select 1
union All
select 2
union All
select 2
) x
You can see that the distinct sum column returns 3 and the regular sum returns 6 in this example.
You can use a sub-query (note that the derived table needs an alias):
select sum(users)
from (select distinct users from table_name) t;
SUM(DISTINCTROW table_name.something)
It worked for me (innodb).
Description - "DISTINCTROW omits data based on entire duplicate records, not just duplicate fields." http://office.microsoft.com/en-001/access-help/all-distinct-distinctrow-top-predicates-HA001231351.aspx
;WITH cte
as
(
SELECT table_name.users , rn = ROW_NUMBER() OVER (PARTITION BY users ORDER BY users)
FROM table_name
)
SELECT SUM(users)
FROM cte
WHERE rn = 1
TEST
DECLARE #table_name Table (Users INT );
INSERT INTO #table_name Values (1),(1),(1),(3),(3),(5),(5);
;WITH cte
as
(
SELECT users , rn = ROW_NUMBER() OVER (PARTITION BY users ORDER BY users)
FROM #table_name
)
SELECT SUM(users) DisSum
FROM cte
WHERE rn = 1
Result
DisSum
9
If circumstances make it difficult to weave a "distinct" into the sum clause, it will usually be possible to add an extra "where" clause to the entire query - something like:
select sum(t.ColToSum)
from SomeTable t
where (select count(*) from SomeTable t1 where t1.ColToSum = t.ColToSum and t1.ID < t.ID) = 0
May be a duplicate to
Trying to sum distinct values SQL
As per Declan_K's answer:
Get the distinct list first...
SELECT SUM(SQ.COST)
FROM
(SELECT DISTINCT [Tracking #] as TRACK,[Ship Cost] as COST FROM YourTable) SQ

Most effective way to get value if select count(*) = 1 with grouping

Let's say I have a table with ID int, VALUE string:
ID | VALUE
1  | abc
2  | abc
3  | def
4  | abc
5  | abc
6  | abc
If I do select value, count(*) group by value I should get:
VALUE | COUNT
abc   | 5
def   | 1
Now the tricky part: if the count == 1, I need to get that ID from the first table. Should I be using a CTE? Creating a result set where I add an ID column set to null and then run update b.ID = a.ID where count == 1?
Or is there an easier way?
EDIT:
I want the result table to look like this:
ID   | VALUE | count
null | abc   | 5
3    | def   | 1
If your ID values are unique, you can simply check to see if the max(id) = min(id). If so, then use either one, otherwise you can return null. Like this:
Select Case When Min(id) = Max(id) Then Min(id) Else Null End As Id,
Value, Count(*) As [Count]
From YourTable
Group By Value
Since you are already performing an aggregate, including the MIN and MAX functions is not likely to take any extra (noticeable) time. I encourage you to give this a try.
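Applied to the sample rows in the question (a walk-through, not actual output): for 'abc' the ids run from 1 to 6, so MIN(id) and MAX(id) differ and the Id comes back NULL; for 'def' the only id is 3, so MIN(id) = MAX(id) = 3. That matches the result table the asker sketched:
Id   | Value | Count
NULL | abc   | 5
3    | def   | 1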
The way I would do it would indeed be a CTE:
WITH grp AS (
    SELECT value, COUNT(*) AS [count]
    FROM MyTable
    GROUP BY value
    HAVING COUNT(*) = 1
)
SELECT MyTable.ID, grp.value, grp.[count]
FROM MyTable
JOIN grp ON grp.value = MyTable.value
When using group by, after the group by statement you can use a having clause.
So
SELECT MIN([ID]) AS [ID], [VALUE]   -- with COUNT(*) = 1 the MIN is the only ID
FROM [table]
GROUP BY [VALUE]
HAVING COUNT(*) = 1
Edit: with regard to your edited question, this uses some fun joins and unions:
CREATE TABLE #table
(ID int IDENTITY,
VALUE varchar(3))
INSERT INTO #table (VALUE)
VALUES('abc'),('abc'),('def'),('abc'),('abc'),('abc')
SELECT * FROM (
SELECT Null as ID,VALUE, COUNT(*) as [Count]
FROM #table
GROUP BY VALUE
HAVING COUNT(*) > 1
UNION ALL
SELECT t.ID,t.VALUE,p.Count FROM
#table t
JOIN
(SELECT VALUE, COUNT(*) as [Count]
FROM #table
GROUP BY VALUE
HAVING COUNT(*) = 1) p
ON t.VALUE=p.VALUE
) a
DROP TABLE #table
Maybe not the most efficient, but something like this works (the aggregate condition belongs in HAVING, not WHERE):
SELECT MAX(Id) AS ID, Value
FROM [Table]
GROUP BY Value
HAVING COUNT(*) = 1