Postgres "NOT IN" operator usage - postgresql

In my database when I run the following query, I get 1077 as output.
select count(distinct a_t1) from t1;
Then, when I run this query, I get 459.
select count(distinct a_t1) from t1
where a_t1 in (select a_t1 from t1 join t2 using (a_t1_t2) where a_t2=0);
The above is equivalent to this query, which also gives 459:
select count(distinct a_t1) from t1 join t2 using (a_t1_t2) where a_t2=0
But when I run this query, I get 0 instead of the 618 I was expecting:
select count(distinct a_t1) from t1
where a_t1 not in (select a_t1 from t1 join t2 using (a_t1_t2) where a_t2=0);
I am running PostgreSQL 9.1.5, though the version probably doesn't matter here. Please point out my mistake in the above query.
UPDATE 1:
I created a new table and stored the result of the subquery above in it. Then I ran a few queries:
select count(distinct a_t1) from t1
where a_t1 not in (select a_t1 from sub_query_table order by a_t1 limit 10);
And hooray! Now I get 10 as the answer! I was able to increase the limit up to 450; after that, I started getting 0 again.
UPDATE 2:
The sub_query_table has 459 values in it. Finally, this query gives me the required answer:
select count(distinct a_t1) from t1
where a_t1 not in (select a_t1 from sub_query_table order by a_t1 limit 459);
Whereas this one gives 0 as the answer:
select count(distinct a_t1) from t1
where a_t1 not in (select a_t1 from sub_query_table);
But why is this happening?

NOT IN never matches when NULLs are involved: if a_t1 is NULL, or if the subquery returns any NULL, the comparison evaluates to unknown rather than true, so those rows never make it into the count. You can pick up the NULL rows explicitly:
select count(distinct a_t1) from t1
where a_t1 not in (select a_t1 from t1 join t2 using (a_t1_t2) where a_t2=0) OR a_t1 IS NULL;
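If the NULLs come from the subquery itself, another option (an untested sketch, reusing the same tables and columns as above) is to filter them out there, or to switch to NOT EXISTS, which is not tripped up by NULLs:
select count(distinct a_t1) from t1
where a_t1 not in (select a_t1 from t1 join t2 using (a_t1_t2)
                   where a_t2=0 and a_t1 is not null);
-- or the NOT EXISTS form:
select count(distinct t1.a_t1) from t1
where not exists (select 1 from t1 s join t2 using (a_t1_t2)
                  where a_t2=0 and s.a_t1 = t1.a_t1);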

Calculate difference between the row counts of tables in two schemas in PostgreSQL

I have two tables with the same name in two different schemas (an old and a new dump). I would like to know the difference in row counts between the two.
I have two queries that give the old and new counts:
select count(*) as count_old from (
    SELECT distinct id
    FROM schema1.compound
) q1;

select count(*) as count_new from (
    SELECT distinct id
    FROM schema2.compound
) q2;
I would like to have the following output:
table_name count_old count_new diff
compound 4740 4735 5
Any help is appreciated. Thanks in advance
with counts as (
select
(select count(distinct id) from schema1.compound) as count_old,
(select count(distinct id) from schema2.compound) as count_new
)
select
'compound' as table_name,
count_old,
count_new,
count_old - count_new as diff
from counts;
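With the example counts from the question (4740 distinct ids in the old schema, 4735 in the new one), this should return the row: compound | 4740 | 4735 | 5.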
I think you could do something like this:
SELECT 'compound' AS table_name, count_old, count_new, (count_old - count_new) AS diff
FROM (
    SELECT
        (SELECT count(*) FROM (SELECT DISTINCT id FROM schema1.compound) q1) AS count_old,
        (SELECT count(*) FROM (SELECT DISTINCT id FROM schema2.compound) q2) AS count_new
) AS counts;
This was probably answered already, but what you need is a subquery/nested query.
You can compute the count of distinct values directly by putting the DISTINCT keyword inside the aggregate function. Then you can join the two queries that extract the values you need and use them to build the output table.
WITH cte AS (
SELECT new.cnt AS count_new,
old.cnt AS count_old
FROM (SELECT COUNT(DISTINCT id) AS cnt FROM schema1.compound) AS old
INNER JOIN (SELECT COUNT(DISTINCT id) AS cnt FROM schema2.compound) AS new
ON 1 = 1
)
SELECT 'compound' AS table_name,
count_new,
count_old,
count_old - count_new AS diff
FROM cte

How to share a CTE among different tables in plain SQL? - postgresql

Say select id from some_expensive_query is the CTE I want to share. Currently I write two SQL statements in a transaction:
with t as (select id from some_expensive_query) select * from t1 join t on t.id =t1.id;
with t as (select id from some_expensive_query) select * from t2 join t on t.id =t2.id;
As you can see, the CTE is executed twice, but I want something like:
t = select id from some_expensive_query;
select * from t1 join t on t.id =t1.id;
select * from t2 join t on t.id =t2.id;
For portability, I don't want to use PL/pgSQL or functions. Is there any way to solve this?
Why don't you use UNION ALL?
with t as (select id from some_expensive_query)
select * from t1 join t on t.id =t1.id
union all
select * from t2 join t on t.id =t2.id;
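Note that UNION ALL only works when both branches return the same column list; if t1 and t2 have different columns, a rough sketch (assuming both tables have an id column) is to select just the shared columns and tag each row with its source:
with t as (select id from some_expensive_query)
select 't1' as source_table, t1.id from t1 join t on t.id = t1.id
union all
select 't2' as source_table, t2.id from t2 join t on t.id = t2.id;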

KNIME does not create TEMP table

I have tried to create a TEMP table with the "Database SQL Executor" node using this expression:
CREATE TEMPORARY TABLE tmp_table AS
SELECT type_id, created_at
FROM t1
LEFT JOIN t2 ON t1.id = t2.id
WHERE t2.id = (SELECT id
FROM t2
WHERE id = t1.id
AND event_date <= created_at
ORDER BY event_date DESC
LIMIT 1);
Unfortunately, despite the fact that there is no mistake in the syntax (the code above works when run from the psql console) and no error after node execution, the table does not exist in the database.
EDIT : Thanks to #SABER-FICTIONALCHARACTER

How to use a Table type in query

I have 9000 rows in the News table and use this code to select 20 of them:
Select *
From (
Select *, ROW_NUMBER() OVER (ORDER BY DateSend DESC) AS Num
From News
Where SubjectID in(Select MenuSubject.SubjectID
From MenuSubject inner join Menu on MenuSubject.MenuID = Menu.MenuID)
) as myTable
where myTable.Num BETWEEN 100 and 120
But 28 seconds are spent reading! Also, when I test this query without the joined table, I get the result in 1 second.
So I want to use a table type to hold the result of the join and use it in the query. I made a new table type and used it like this:
DECLARE #MyTable2 IntListTable
Insert Into #MyTable2
Select MenuSubject.SubjectID
From MenuSubject inner join Menu on MenuSubject.MenuID = Menu.MenuID
Select *
From (
Select *, ROW_NUMBER() OVER (ORDER BY DateSend DESC) AS Num
From News
Where SubjectID in #MyTable2
) as myTable
where myTable.Num BETWEEN 100 and 120
But I get an error at
SubjectID in #MyTable2
Error:
Incorrect syntax near '#MyTable2'.
Edit:
When I test my code with:
Select myTable.Title
or use this list instead of the joined table:
Where SubjectID in(13,14,20,21,25,24,26,24,28,29,30,54,55,60,47,98,99,65,14,20,33,666,987,254)
I get the result in 1 second.
But when I use this in the query:
Select myTable.MoreText
28 seconds are spent reading! Why?
Try this,
Select x.Num
From (
Select *, ROW_NUMBER() OVER (ORDER BY DateSend DESC) AS Num
From News
Where SubjectID in(Select MenuSubject.SubjectID
From MenuSubject inner join Menu on MenuSubject.MenuID = Menu.MenuID)
) x
where x.Num <21
WITH myTempTable as (Select MenuSubject.SubjectID
From MenuSubject inner join Menu on MenuSubject.MenuID = Menu.MenuID)
Select *
From (
Select *, ROW_NUMBER() OVER (ORDER BY DateSend DESC) AS Num
From News
Where SubjectID in (SELECT SubjectID FROM myTempTable)
) as myTable
where myTable.Num BETWEEN 100 and 120
You can try the above query.
There is absolutely no need for a User-Defined Table Type in this query. It adds work but no actual benefit.
The problem is most likely the fact that you are using an IN list, as those translate into an OR condition for each of the values. But an IN list isn't needed either.
This query can actually be simplified by rethinking it in terms of an INNER JOIN, which should be better as it will allow the Query Optimizer to do its job.
SELECT *
FROM (
SELECT nw.*, ROW_NUMBER() OVER (ORDER BY DateSend DESC) AS [Num]
FROM News nw
INNER JOIN (
MenuSubject
INNER JOIN Menu
ON MenuSubject.MenuID = Menu.MenuID
) ON MenuSubject.SubjectID = nw.SubjectID
) AS myTable
WHERE myTable.Num BETWEEN 100 AND 120;
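One caveat with the INNER JOIN rewrite: if a SubjectID can be linked to more than one Menu, the join will repeat News rows, which the original IN did not. A rough variant that keeps the semi-join semantics (untested, same table and column names assumed):
SELECT *
FROM (
    SELECT nw.*, ROW_NUMBER() OVER (ORDER BY DateSend DESC) AS [Num]
    FROM News nw
    WHERE EXISTS (
        SELECT 1
        FROM MenuSubject
        INNER JOIN Menu ON Menu.MenuID = MenuSubject.MenuID
        WHERE MenuSubject.SubjectID = nw.SubjectID
    )
) AS myTable
WHERE myTable.Num BETWEEN 100 AND 120;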
One final simplification that can be made, though I doubt it is needed here since 9000 rows is almost no data at all, is to first dump the results to a local temporary table and then use that in the INNER JOIN:
CREATE TABLE #Subjects
(
SubjectID INT NOT NULL -- PRIMARY KEY -- test with and without PK to see if it helps
);
INSERT INTO #Subjects (SubjectID)
SELECT MenuSubject.SubjectID
FROM MenuSubject
INNER JOIN Menu
ON Menu.MenuID = MenuSubject.MenuID;
SELECT *
FROM (
SELECT nw.*, ROW_NUMBER() OVER (ORDER BY DateSend DESC) AS [Num]
FROM News nw
INNER JOIN #Subjects sub
ON sub.SubjectID = nw.SubjectID
) AS myTable
WHERE myTable.Num BETWEEN 100 AND 120;

SQL Server SUM() for DISTINCT records

I have a field called "Users", and I want to run SUM() on that field so that it returns the sum of all DISTINCT records. I thought this would work:
SELECT SUM(DISTINCT table_name.users)
FROM table_name
But it's not selecting DISTINCT records, it's just running as if I had run SUM(table_name.users).
What would I have to do to add only the distinct records from this field?
Use count()
SELECT count(DISTINCT table_name.users)
FROM table_name
SQLFiddle demo
This code seems to indicate that sum(distinct) and sum() return different values:
with t as (
    select 1 as a
    union all
    select 1
    union all
    select 2
    union all
    select 4
)
select sum(distinct a) as DistinctSum, sum(a) as allSum, count(distinct a) as distinctCount, count(a) as allCount from t
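With the values above (1, 1, 2, 4), that query should return DistinctSum = 7, allSum = 8, distinctCount = 3 and allCount = 4, so the two forms really do differ once duplicates are present.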
Do you actually have non-distinct values?
select count(1), users
from table_name
group by users
having count(1) > 1
If not, the sums will be identical.
You can see for yourself that distinct works with the following example. Here I create a subquery with duplicate values, then I do a sum distinct on those values.
select DistinctSum=sum(distinct x), RegularSum=Sum(x)
from
(
select x=1
union All
select 1
union All
select 2
union All
select 2
) x
You can see that the distinct sum column returns 3 and the regular sum returns 6 in this example.
You can use a sub-query:
select sum(users)
from (select distinct users from table_name) as t;
SUM(DISTINCTROW table_name.something)
It worked for me (innodb).
Description - "DISTINCTROW omits data based on entire duplicate records, not just duplicate fields." http://office.microsoft.com/en-001/access-help/all-distinct-distinctrow-top-predicates-HA001231351.aspx
;WITH cte
as
(
SELECT table_name.users , rn = ROW_NUMBER() OVER (PARTITION BY users ORDER BY users)
FROM table_name
)
SELECT SUM(users)
FROM cte
WHERE rn = 1
SQL Fiddle
Try here yourself
TEST
DECLARE #table_name Table (Users INT );
INSERT INTO #table_name Values (1),(1),(1),(3),(3),(5),(5);
;WITH cte
as
(
SELECT users , rn = ROW_NUMBER() OVER (PARTITION BY users ORDER BY users)
FROM #table_name
)
SELECT SUM(users) DisSum
FROM cte
WHERE rn = 1
Result
DisSum
9
If circumstances make it difficult to weave a "distinct" into the sum clause, it will usually be possible to add an extra "where" clause to the entire query - something like:
select sum(t.ColToSum)
from SomeTable t
where (select count(*) from SomeTable t1 where t1.ColToSum = t.ColToSum and t1.ID < t.ID) = 0
This may be a duplicate of:
Trying to sum distinct values SQL
As per Declan_K's answer:
Get the distinct list first...
SELECT SUM(SQ.COST)
FROM
(SELECT DISTINCT [Tracking #] as TRACK,[Ship Cost] as COST FROM YourTable) SQ