Searching for a NOT NULL condition in multiple columns with a PostgreSQL query

I want to count how many documents do not have a date in field1, field2, field3, and field4. I have written the query below, but it does not look very good.
select
count(doc)
from yourTable
where true
and field1 is not null
and field2 is not null
and field3 is not null
and field4 is not null
How can I apply one filter for multiple columns?
Thanks in advance.

There is nothing at all wrong with your current query, and it is probably what I would be using here. However, you could use a COALESCE trick here:
SELECT COUNT(*)
FROM yourTable
WHERE COALESCE(field1, field2, field3, field4) IS NOT NULL;
This works because COALESCE returns its first non NULL argument, so any record having at least one of the four fields assigned to a non NULL date passes the IS NOT NULL check. Only records for which all four fields are NULL are excluded.
Note that this counts records having at least one non NULL field, which is not the same as your AND chain: that one requires all four fields to be non NULL. If instead you want to count records where all four fields are NULL, then use:
SELECT COUNT(*)
FROM yourTable
WHERE COALESCE(field1, field2, field3, field4) IS NULL;
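As a rough check of how the three filters differ, here is a minimal sketch that runs them against an in-memory SQLite database from Python. SQLite stands in for PostgreSQL here (COALESCE behaves the same way for this purpose), and the table contents are invented for illustration.

```python
import sqlite3

# Hypothetical rows, invented for illustration; SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (doc TEXT, field1 TEXT, field2 TEXT, field3 TEXT, field4 TEXT)"
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?, ?)",
    [
        ("a", "2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"),  # all four set
        ("b", "2020-01-01", None, None, None),                          # only one set
        ("c", None, None, None, None),                                  # none set
    ],
)


def count_where(condition):
    """Count the rows of yourTable that match the given WHERE condition."""
    sql = f"SELECT COUNT(*) FROM yourTable WHERE {condition}"
    return conn.execute(sql).fetchone()[0]


# The original AND chain: every field must be non NULL.
all_set = count_where(
    "field1 IS NOT NULL AND field2 IS NOT NULL "
    "AND field3 IS NOT NULL AND field4 IS NOT NULL"
)
# COALESCE ... IS NOT NULL: at least one field is non NULL.
any_set = count_where("COALESCE(field1, field2, field3, field4) IS NOT NULL")
# COALESCE ... IS NULL: all four fields are NULL.
none_set = count_where("COALESCE(field1, field2, field3, field4) IS NULL")

print(all_set, any_set, none_set)  # 1 2 1
```

Row "b" is the interesting one: it is counted by the COALESCE ... IS NOT NULL filter but not by the AND chain.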

Related

Apply join, sort on date column and select the first row where one of the column value is not null

I have two tables(Table A and Table B) in a Postgres DB.
Both have "id" column in common. Table A has one column called "id" and Table B has three columns: "id, date, value($)".
For each "id" of Table A there exist multiple rows in Table B in the following format - (id, date, value).
For instance, for Table A with "id" as 1 if there exists following rows in Table B:
(1, 2018-06-21, null)
(1, 2018-06-20, null)
(1, 2018-06-19, 202)
(1, 2018-06-18, 200)
I would like to extract the most recent dated non-null value. For example for id - 1, the result should be 202. Please share your thoughts or let me know in case more info is required.
Here is the solution I went ahead with:
with mapping as ( select distinct table1.id, table2.value, table2.date, row_number() over (partition by table1.id order by table2.date desc nulls last) as row_number
from table1
left join table2 on table2.id=table1.id and table2.value is not null
)
select * from mapping where row_number = 1
Let me know if there is scope for improvement.
You may very well want an inner join, not an outer join. If you have an id in table1 that does not exist in table2, or that has only null values there, you will get NULL for both date and value. This is due to how an outer join works: if nothing in the right-side table matches the ON condition, it returns NULL for each column of that table. So
with mapping as
(select distinct table1.id
, table2.value
, table2.date
, row_number() over (partition by table1.id order by table2.date desc nulls last) as row_number
from table1
join table2 on table2.id=table1.id and table2.value is not null
)
select *
from mapping
where row_number = 1;
See example of each here. Your query worked because every id in your test data had at least one row satisfying the ON condition. You really need test data that fails it to see what your query does.
Caution: DATE and VALUE are very poor choices for column names. Both are reserved words in the SQL standard, although not in Postgres specifically. Further, DATE is a Postgres data type. Having columns with the same names as data types leads to confusion.
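The inner versus outer join difference described above can be seen with test data that fails the ON condition. The sketch below is only an illustration: it uses SQLite through Python in place of Postgres (window functions need SQLite 3.25 or later), keeps the rows for id 1 from the question, and adds two hypothetical ids, one with only NULL values and one with no rows in table2 at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER, date TEXT, value INTEGER);
    -- id 1 comes from the question; ids 2 and 3 are hypothetical failing cases.
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES
        (1, '2018-06-21', NULL),
        (1, '2018-06-20', NULL),
        (1, '2018-06-19', 202),
        (1, '2018-06-18', 200),
        (2, '2018-06-21', NULL);   -- id 2 has only NULL values, id 3 has no rows
""")

# Same shape as the query above; only the join type changes.
# NULLS LAST is left out because older SQLite versions do not accept it.
QUERY = """
    WITH mapping AS (
        SELECT table1.id, table2.value, table2.date,
               ROW_NUMBER() OVER (PARTITION BY table1.id
                                  ORDER BY table2.date DESC) AS rn
        FROM table1
        {join} table2 ON table2.id = table1.id AND table2.value IS NOT NULL
    )
    SELECT id, value, date FROM mapping WHERE rn = 1 ORDER BY id
"""

left_rows = conn.execute(QUERY.format(join="LEFT JOIN")).fetchall()
inner_rows = conn.execute(QUERY.format(join="JOIN")).fetchall()

print(left_rows)   # [(1, 202, '2018-06-19'), (2, None, None), (3, None, None)]
print(inner_rows)  # [(1, 202, '2018-06-19')]
```

With the left join, ids 2 and 3 come back with NULL for value and date; with the inner join they are dropped entirely. Which of the two is right depends on whether ids without a value should appear in the result.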

Can lead() return the next row only when a condition is met?

Recently my company upgraded from SQL Server 2008 to 2016, so I want to take advantage of some "new" features, one of which is lead().
I understand the basic usage, but I want to know if I can return the next row only when a condition is met. My original query looked like the following, where x.next_id is null if the next row isn't more than 12 days past the current row.
SELECT
a.id,
a.date_a,
x.next_id
FROM
table a
OUTER APPLY
(SELECT TOP 1
next_id = i.intIndex
FROM
table i
WHERE
i.date_a > DATEADD(DAY, 12, a.date_a)
ORDER BY
date_a, id ASC) x
ORDER BY
date_a, id ASC
Data might look like the following, where the third column is added by the query:
id date_a next_id
--------------------------------
1798678 2014-12-01 NULL
1798689 2013-01-05 1798688
1798688 2014-03-31 NULL
1798696 2013-04-03 1798694
1798694 2013-08-12 1798691
1798691 2014-09-30 NULL
1798698 2013-05-14 1798697
1798697 2013-08-29 NULL
Assuming this data set (your result table; minus the result column):
CREATE TABLE some_table(id INT PRIMARY KEY,date_a DATE);
INSERT INTO some_table(id,date_a)
VALUES (1798678,'2014-12-01'),
(1798689,'2013-01-05'),
(1798688,'2014-03-31'),
(1798696,'2013-04-03'),
(1798694,'2013-08-12'),
(1798691,'2014-09-30'),
(1798698,'2013-05-14'),
(1798697,'2013-08-29');
This query returns the same result set as what the query you have returns:
SELECT
id,
date_a,
next_id=
CASE WHEN LEAD(date_a) OVER (ORDER BY date_a,id)>DATEADD(DAY,12,date_a)
THEN LEAD(id) OVER (ORDER BY date_a,id)
ELSE NULL
END
FROM
some_table
ORDER BY
date_a,id;
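To see the CASE plus LEAD pattern behave on rows that both pass and fail the 12-day condition, here is a small sketch. It is an approximation, not the SQL Server query itself: it runs on SQLite through Python (3.25 or later for window functions), replaces DATEADD with SQLite's date(date_a, '+12 days'), and uses a tiny invented data set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (id INTEGER PRIMARY KEY, date_a TEXT)")
# Hypothetical rows chosen so that some gaps are above and some below 12 days.
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?)",
    [
        (1, "2013-01-01"),
        (2, "2013-01-05"),  # 4 days after row 1
        (3, "2013-02-01"),  # 27 days after row 2
        (4, "2013-02-13"),  # exactly 12 days after row 3
    ],
)

rows = conn.execute("""
    SELECT id,
           date_a,
           -- Return the next id only when the next date is more than 12 days later.
           CASE WHEN LEAD(date_a) OVER w > date(date_a, '+12 days')
                THEN LEAD(id) OVER w
           END AS next_id
    FROM some_table
    WINDOW w AS (ORDER BY date_a, id)
    ORDER BY date_a, id
""").fetchall()

for row in rows:
    print(row)
# (1, '2013-01-01', None)
# (2, '2013-01-05', 3)
# (3, '2013-02-01', None)
# (4, '2013-02-13', None)
```

Note that LEAD only inspects the immediately following row. For row 1 it yields NULL because row 2 is too close, even though row 3 is more than 12 days later; a TOP 1 search over all later rows would pick row 3 in that case.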

Add Column in table with value partition by group

My table is something like
CREATE TABLE table1
(
_id text,
name text,
data_type int,
data_value int,
data_date timestamp -- insertion time
);
Now, due to a system bug, many duplicate entries were created, and I need to remove those duplicates and keep only unique entries, ignoring data_date because it is a system-generated date.
My query to do that is something like:
DELETE FROM table1 A
USING ( SELECT _id, name, data_type, data_value, MIN(data_date) min_date
FROM table1
GROUP BY _id, name, data_type, data_value
HAVING count(data_date) > 1) B
WHERE A._id = B._id
AND A.name = B.name
AND A.data_type = B.data_type
AND A.data_value = B.data_value
AND A.data_date != B.min_date;
Although this query works, the table has millions of records, so I want a faster way. My idea is to create a new column whose value is computed per partition of [_id, name, data_type, data_value], that is, the columns in the GROUP BY. However, I could not find a way to create such a column.
I would appreciate it if anyone could suggest a way to create such a column.
Edit 1:
There is another thing to add, I don't want to use CTE or subquery for updating this new column because it will be same as my existing query.
The best way is simply creating a new table without duplicated records:
CREATE...
SELECT _id, name, data_type, data_value, MIN(data_date) min_date
FROM table1
GROUP BY _id, name, data_type, data_value;
Alternatively, you can create a rank and then filter, but a subquery is needed.
RANK() OVER (PARTITION BY your_variables ORDER BY data_date ASC) r
And then filter r=1.
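The rank-and-filter variant could look roughly like the sketch below. It is run on SQLite through Python purely so that it is self-contained (window functions need SQLite 3.25 or later); the rows are invented, and your_variables is assumed to mean the four grouping columns from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE table1 (
        _id TEXT, name TEXT, data_type INTEGER, data_value INTEGER, data_date TEXT
    )
""")
# Hypothetical rows: three copies of one entry plus one unique entry.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?, ?)",
    [
        ("a", "x", 1, 10, "2020-01-01 00:00:00"),
        ("a", "x", 1, 10, "2020-01-02 00:00:00"),  # duplicate, later insertion time
        ("a", "x", 1, 10, "2020-01-03 00:00:00"),  # duplicate, later insertion time
        ("b", "y", 2, 20, "2020-01-01 00:00:00"),
    ],
)

# Rank the rows inside each group by insertion time, then keep only rank 1.
kept = conn.execute("""
    SELECT _id, name, data_type, data_value, data_date
    FROM (
        SELECT *,
               RANK() OVER (PARTITION BY _id, name, data_type, data_value
                            ORDER BY data_date ASC) AS r
        FROM table1
    ) AS ranked
    WHERE r = 1
    ORDER BY _id
""").fetchall()

print(kept)
# [('a', 'x', 1, 10, '2020-01-01 00:00:00'), ('b', 'y', 2, 20, '2020-01-01 00:00:00')]
```

If two duplicates can share exactly the same data_date, RANK() gives both of them rank 1; ROW_NUMBER() would keep a single row in that case.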

How to handle null values while querying with distinct column values condition from the same table

There is a single table named Products which has 100s of columns. I am running a distinct column1,column2,column3....column6 postgresql query and the result is something like below:
2 Product A 300 2017 Null Null
2 Product A 300 2017 Null Null
Due to null values, instead of a single row I am getting two rows. How to solve this? Your help is much appreciated.
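In an ordinary comparison NULL is never equal to NULL. Note, however, that PostgreSQL's DISTINCT treats NULL values as equal to each other, so if the rows still come out twice it is worth checking the other columns for differences such as whitespace or a literal 'Null' string. To normalize NULLs explicitly, instead of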
select distinct field1, field2, ..., fieldn
you can have your select clause like this:
select distinct coalesce(field1, 'Empty') AS field1, ..., coalesce(fieldn, 'Empty') AS fieldn
You will only need coalesce for nullable fields. The replacement value must match the column's type, so 'Empty' only works for text columns.
One way to remove the duplicates you have above is to GROUP BY the columns that you want distinct values for. So something like this
SELECT column1, column2, column3, ...,column6
FROM sometable
GROUP BY column1, column2, column3, ...,column6
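Both suggestions can be tried on the two rows from the question. The sketch below is an illustration only: it uses SQLite through Python rather than PostgreSQL, and it assumes hypothetical column types (text for the two nullable columns so that 'Empty' is a valid replacement).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumptions; the question does not state them.
conn.execute("""
    CREATE TABLE sometable (
        column1 INTEGER, column2 TEXT, column3 INTEGER,
        column4 INTEGER, column5 TEXT, column6 TEXT
    )
""")
conn.executemany(
    "INSERT INTO sometable VALUES (?, ?, ?, ?, ?, ?)",
    [
        (2, "Product A", 300, 2017, None, None),
        (2, "Product A", 300, 2017, None, None),
    ],
)

# GROUP BY puts rows with NULLs in the same columns into one group.
grouped = conn.execute("""
    SELECT column1, column2, column3, column4, column5, column6
    FROM sometable
    GROUP BY column1, column2, column3, column4, column5, column6
""").fetchall()

# DISTINCT after replacing NULL with a placeholder value.
coalesced = conn.execute("""
    SELECT DISTINCT column1, column2, column3, column4,
           COALESCE(column5, 'Empty') AS column5,
           COALESCE(column6, 'Empty') AS column6
    FROM sometable
""").fetchall()

print(grouped)    # [(2, 'Product A', 300, 2017, None, None)]
print(coalesced)  # [(2, 'Product A', 300, 2017, 'Empty', 'Empty')]
```

The GROUP BY version keeps the NULLs in the output, while the COALESCE version replaces them with the placeholder.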

Instead of selecting into a table and then selecting from that table: what is a better way to join?

So, in my queries, I've had the need to restructure some data.
To do this, I have created select statements that select some set into a table; beforehand, I check whether the table exists and drop it if it does.
These tables are repopulated every time I run an SSIS package, and I feel that if I keep going down this path, things will get messy.
Is there a way to select out of a select statement in the middle of a select statement?
Select Field1
,Field2
,Field3
,SELECT DISTINCT R.Field4
FROM
(
SELECT DISTINCT
L.Open_DT
,L.Group_NPI
, bil.*
,Row_Number() over (partition by group_npi order by L.open_dt) AS RNK
FROM tbl_Location L
LEFT OUTER JOIN [MOAD].[dbo].[qry_Location_Address_Billing] bil
ON L.Location_ID = bil.Location_ID
) AS R
WHERE Rnk = 1 AND Location_ID IS NOT NULL AS Field4,
,Field5
,Field6
From Tables
I've edited this question in attempts to clarify.
I want it to come out like so:
Field1, Field2, Field3, Field4 (from that nested query), Field5, Field6