PostgreSQL: NULL breaks lists when looking for NOT IN

I think this might be a PostgreSQL bug, but I'm posting it here in case I'm just missing something. When my WHERE clause has a NOT IN () clause, having null in the list makes the clause no longer truthy. Below is a dumbed-down example of my issue.
=# select 1 where 1 not in (1);
?column?
----------
(0 rows)
=# select 1 where 1 not in (2);
?column?
----------
1
(1 row)
=# select 1 where 1 not in (null);
?column?
----------
(0 rows)
=# select 1 where 1 not in (null, 2);
?column?
----------
(0 rows)
=# select 1 where 1 not in (2, null);
?column?
----------
(0 rows)
=# select 1 where 1 not in (2, 3);
?column?
----------
1
(1 row)
So where 1 not in (1) returns 0 rows as expected since 1 is in the list, where 1 not in (2) returns 1 row as expected since 1 is not in the list, but where 1 not in (null) returns 0 rows even though 1 is not in the list.

This is not a PostgreSQL bug.
The problem is that NOT IN is just shorthand for testing every inequality one by one and combining the results with AND.
1 NOT IN (null, 2) is equivalent to:
1 <> null
AND
1 <> 2
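However, NULL is a special value, so 1 <> null is itself NULL (not TRUE). TRUE AND NULL is also NULL, and WHERE only keeps rows whose condition is TRUE, so the row is filtered out. See the documentation: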
Do not write expression = NULL because NULL is not “equal to” NULL. (The null value represents an unknown value, and it is not known whether two unknown values are equal.)
As far as I know, that's the standard SQL behaviour.
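To make the expansion concrete, here is a small Python model of SQL's three-valued logic. It is a sketch, not PostgreSQL code. Python's None stands in for NULL, and the helper names sql_ne, sql_and and sql_not_in are made up for illustration. It reproduces the results from the question.

```python
from typing import Iterable, Optional

# None stands in for SQL NULL, i.e. the "unknown" truth value.
TriBool = Optional[bool]


def sql_ne(a: object, b: object) -> TriBool:
    """a <> b under SQL rules: any comparison involving NULL is NULL."""
    if a is None or b is None:
        return None
    return a != b


def sql_and(p: TriBool, q: TriBool) -> TriBool:
    """Three-valued AND: FALSE beats NULL, and NULL beats TRUE."""
    if p is False or q is False:
        return False
    if p is None or q is None:
        return None
    return True


def sql_not_in(x: object, values: Iterable[object]) -> TriBool:
    """x NOT IN (v1, v2, ...) expanded to (x <> v1) AND (x <> v2) AND ..."""
    result: TriBool = True
    for v in values:
        result = sql_and(result, sql_ne(x, v))
    return result


# WHERE keeps a row only when the condition is TRUE; FALSE and NULL both drop it.
for values in ([1], [2], [None], [None, 2], [2, None], [2, 3]):
    condition = sql_not_in(1, values)
    print(values, condition, "kept" if condition is True else "dropped")
```

A single NULL in the list is enough to turn any would-be TRUE into NULL. This is why the question's queries return 0 rows whenever null appears in the list.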
PostgreSQL also supports the IS DISTINCT FROM predicate, which treats NULL as a comparable value when checking whether two values differ:
1 IS DISTINCT FROM NULL is TRUE.
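The same behaviour can be checked against a real SQL engine without a PostgreSQL server. The sketch assumes SQLite, through Python's built-in sqlite3 module, as a stand-in. SQLite follows the same NULL rules for NOT IN, but this does not exercise PostgreSQL itself.

```python
import sqlite3

# In-memory SQLite database used only as a convenient SQL engine
conn = sqlite3.connect(":memory:")


def query(sql: str) -> list:
    return conn.execute(sql).fetchall()


print(query("SELECT 1 WHERE 1 NOT IN (1)"))        # []
print(query("SELECT 1 WHERE 1 NOT IN (2, 3)"))     # [(1,)]
print(query("SELECT 1 WHERE 1 NOT IN (2, NULL)"))  # []
# The predicate itself evaluates to NULL, which Python receives as None
print(query("SELECT 1 NOT IN (2, NULL)"))          # [(None,)]
print(query("SELECT (1 <> NULL) AND (1 <> 2)"))    # [(None,)]
```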

Related

Replace calculated negative values with 0 in PostgreSQL

I have a table my_table:
case_id first_created last_paid submitted_time
3456 2021-01-27 2021-01-29 2021-01-26 21:34:36.566023+00:00
7891 2021-08-02 2021-09-16 2022-10-26 19:49:14.135585+00:00
1245 2021-09-13 None 2022-10-31 02:03:59.620348+00:00
9073 None None 2021-09-12 10:25:30.845687+00:00
6891 2021-08-03 2021-09-17 None
I created 2 new variables:
select *,
first_created-coalesce(submitted_time::date) as create_duration,
last_paid-coalesce(submitted_time::date) as paid_duration
from my_table;
The output:
case_id first_created last_paid submitted_time create_duration paid_duration
3456 2021-01-27 2021-01-29 2021-01-26 21:34:36.566023+00:00 1 3
7891 2021-08-02 2021-09-16 2022-10-26 19:49:14.135585+00:00 -450 -405
1245 2021-09-13 null 2022-10-31 02:03:59.620348+00:00 -412 null
9073 None None 2021-09-12 10:25:30.845687+00:00 null null
6891 2021-08-03 2021-09-17 null null null
My question is: how can I replace the new variables' values with 0 if they are smaller than 0?
The ideal output should look like:
case_id first_created last_paid submitted_time create_duration paid_duration
3456 2021-01-27 2021-01-29 2021-01-26 21:34:36.566023+00:00 1 3
7891 2021-08-02 2021-09-16 2022-10-26 19:49:14.135585+00:00 0 0
1245 2021-09-13 null 2022-10-31 02:03:59.620348+00:00 0 null
9073 None None 2021-09-12 10:25:30.845687+00:00 null null
6891 2021-08-03 2021-09-17 null null null
My code:
select *,
first_created-coalesce(submitted_time::date) as create_duration,
last_paid-coalesce(submitted_time::date) as paid_duration,
case
when create_duration < 0 THEN 0
else create_duration
end as QuantityText
from my_table
greatest(yourvalue,0)
If yourvalue is lower than 0, greatest() returns 0 as the greater value:
select *,
greatest(0,first_created-coalesce(submitted_time::date)) as create_duration,
greatest(0,last_paid-coalesce(submitted_time::date)) as paid_duration
from my_table
Note that PostgreSQL's greatest() ignores null arguments, so this will also change null values to 0.
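Here is a rough Python model of that NULL handling. It assumes the documented PostgreSQL behaviour: NULL arguments are ignored, and the result is NULL only if every argument is NULL. This is PostgreSQL-specific; MySQL, for example, returns NULL as soon as any argument is NULL. pg_greatest is a hypothetical helper, with None standing in for NULL.

```python
from typing import Optional


def pg_greatest(*args: Optional[int]) -> Optional[int]:
    """Model of PostgreSQL GREATEST(): None (NULL) arguments are skipped,
    and the result is None only when every argument is None."""
    non_null = [a for a in args if a is not None]
    return max(non_null) if non_null else None


# create_duration values from the question's output
durations = [1, -450, -412, None, None]
print([pg_greatest(0, d) for d in durations])  # [1, 0, 0, 0, 0]
```

Because the literal 0 is never NULL, greatest(0, x) can never return NULL in PostgreSQL. If the nulls must survive, use one of the approaches below.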
case statement
If you wish to keep the null results, you can resort to a regular case statement. To refer to your calculation by its alias, you'll have to put it in a subquery or a CTE:
select *,
case when create_duration<0 then 0 else create_duration end as create_duration_0,
case when paid_duration<0 then 0 else paid_duration end as paid_duration_0
from (
select *,
first_created-coalesce(submitted_time::date) as create_duration,
last_paid-coalesce(submitted_time::date) as paid_duration
from my_table ) as subquery;
(n+abs(n))/2
If you add a number to its absolute value and then divide by two (averaging them out), you get the same number back if it was positive, or zero if it was negative, because a negative number always cancels out with its absolute value:
(-1+abs(-1))/2 = (-1+1)/2 = 0/2 = 0
( 1+abs( 1))/2 = ( 1+1)/2 = 2/2 = 1
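A quick Python sketch can check the identity against a CASE-style clamp over a range of integers. clamp_case and clamp_abs are hypothetical names, and None stands in for NULL. In PostgreSQL, subtracting one date from another yields an integer. Since n + abs(n) is always either 0 or 2n, the integer division by 2 never truncates anything.

```python
from typing import Optional


def clamp_case(n: Optional[int]) -> Optional[int]:
    # CASE WHEN n < 0 THEN 0 ELSE n END: NULL < 0 is not TRUE, so ELSE returns NULL
    if n is None:
        return None
    return 0 if n < 0 else n


def clamp_abs(n: Optional[int]) -> Optional[int]:
    # (n + abs(n)) / 2: NULL propagates through + and abs()
    if n is None:
        return None
    # n + abs(n) is either 0 (n < 0) or 2 * n (n >= 0), so // 2 is exact
    return (n + abs(n)) // 2


# Both formulas agree on every integer in the range, and both keep NULL as NULL
assert all(clamp_case(n) == clamp_abs(n) for n in range(-1000, 1001))
print([clamp_abs(n) for n in (-450, -1, 0, 1, 3, None)])  # [0, 0, 0, 1, 3, None]
```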
select *,
(create_duration + abs(create_duration)) / 2 as create_duration_0,
(paid_duration + abs(paid_duration) ) / 2 as paid_duration_0
from (
select *,
first_created-coalesce(submitted_time::date) as create_duration,
last_paid-coalesce(submitted_time::date) as paid_duration
from my_table ) as subquery;
According to this demo, this is slightly faster than case and about as fast as greatest(), without affecting null values.
Note that select * pulls everything from the subquery below, so you'll end up seeing create_duration as well as create_duration_0; you can get rid of it by listing your desired output columns explicitly in the outer query. You can also rewrite it without a subquery/CTE by repeating the calculation. That looks ugly, but in most cases the planner will notice the repetition and evaluate it only once:
select *,
case when first_created-coalesce(submitted_time::date) < 0
then 0
else first_created-coalesce(submitted_time::date)
end as create_duration,
(abs(last_paid-coalesce(submitted_time::date))+last_paid-coalesce(submitted_time::date))/2 as paid_duration
from my_table;
or using a scalar subquery
select *,
(select case when a<0 then 0 else a end
from (select first_created-coalesce(submitted_time::date)) as alias(a) )
as create_duration,
(select case when a<0 then 0 else a end
from (select last_paid-coalesce(submitted_time::date)) as alias(a) )
as paid_duration
from my_table;
Neither of these helps with anything in this case, but they are good to know.
If you are planning on connecting your SQL database to an ASP.NET app, you could write C# code to query your database and use the following:
Parameters.AddWithValue("Data You want to change", "0");
However, if you're not using your SQL database with an ASP.NET app, this will not work.

postgresql: datatype numeric with limited digits

I am looking for a numeric datatype with a limited number of digits (before and after the decimal point).
The function below removes only digits after the decimal point (PG version >= 13, because it uses trim_scale).
create function num_flex( v numeric, d int) returns numeric as
$$
select case when v=0 then 0
when v < 1 and v > -1 then trim_scale(round(v, d - 1 ) )
else trim_scale(round(v, d - 1 - least(log(abs(v))::int,d-1) ) ) end;
$$
language sql ;
For testing:
select num_flex( 0, 6)
union all
select num_flex( 1.22000, 6)
union all
select num_flex( (-0.000000123456789*10^x)::numeric,6)
from generate_series(1,15,3) t(x)
union all
select num_flex( (0.0000123456789*10^x)::numeric,6)
from generate_series(1,15,3) t(x) ;
It runs, but does anyone have a better idea, or can you find a bug (a case that isn't handled)?
The next step is to integrate this into PG, so that I can write
select 12.123456789::num_flex6 ;
select 12.123456789::num_flex7 ;
for a num_flex datatype with 6 or 7 digits, with types from num_flex2 to num_flex9. Is this possible?
There are a few problems with your function:
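Accepting zero or negative digit counts (parameter d). For values between -1 and 1 this rounds to the left of the decimal point: num_flex(0.5,-2) evaluates round(0.5,-3) and returns 0. You specified you want the function to only kill digits after the decimal point, so 1 (0.5 rounded to an integer) would be expected.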
Incorrect results between -1 and 1. num_flex(0.123,3) returns 0.12 instead of 0.123. I guess this might be the desired effect if you do want to count the 0 to the left of the decimal point, but normally that 0 is ignored when a number's precision and scale are considered.
Your counting of digits to the left of the decimal point is incorrect because ::int rounds rather than truncates: log(abs(11))::int is 1 but log(abs(51))::int is 2. ceil(log(abs(v)))::int returns 2 in both cases, while keeping the int type so it still works as the 2nd parameter of round().
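The problems can be reproduced with a Python transcription of the question's function. This is a sketch under assumptions: Python's decimal module stands in for PostgreSQL numeric. pg_round, pg_int and trim_scale are hypothetical helpers meant to mimic round(numeric, int), the ::int cast and trim_scale().

```python
from decimal import Decimal, ROUND_HALF_UP


def pg_round(x: Decimal, places: int) -> Decimal:
    # Mimics round(numeric, int): ties go away from zero, and a negative
    # `places` rounds to the left of the decimal point
    return x.quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_UP)


def pg_int(x: Decimal) -> int:
    # Mimics numeric::int, which rounds to the nearest integer
    return int(x.quantize(Decimal(1), rounding=ROUND_HALF_UP))


def trim_scale(x: Decimal) -> Decimal:
    # Mimics trim_scale(): drop trailing zeros after the decimal point
    if x == x.to_integral_value():
        return x.quantize(Decimal(1))
    return x.normalize()


def num_flex_original(v: str, d: int) -> Decimal:
    """Transcription of the question's num_flex(v, d)."""
    x = Decimal(v)
    if x == 0:
        return Decimal(0)
    if -1 < x < 1:
        return trim_scale(pg_round(x, d - 1))
    return trim_scale(pg_round(x, d - 1 - min(pg_int(abs(x).log10()), d - 1)))


print(num_flex_original("0.123", 3))  # 0.12 (the leading 0 is counted as a digit)
print(num_flex_original("0.5", -2))   # 0 (round(0.5, -3): negative rounding)
print(pg_int(Decimal(11).log10()), pg_int(Decimal(51).log10()))  # 1 2
```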
create or replace function num_flex(
input_number numeric,
digit_count int,
is_counting_unit_zero boolean default false)
returns numeric as
$$
select trim_scale(
case
when input_number=0
then 0
when digit_count<=0 --avoids negative rounding
then round(input_number,0)
when (input_number between -1 and 1) and is_counting_unit_zero
then round(input_number,digit_count-1)
when (input_number between -1 and 1)
then round(input_number,digit_count)
else
round( input_number,
greatest( --avoids negative rounding
digit_count - (ceil(log(abs(input_number))))::int,
0)
)
end
);
$$
language sql;
Here's a test:
select *,"result"="should_be"::numeric as "is_correct" from
(values
('num_flex(0.1234 ,4)',num_flex(0.1234 ,4), '0.1234'),
('num_flex(1.234 ,4)',num_flex(1.234 ,4), '1.234'),
('num_flex(1.2340000 ,4)',num_flex(1.2340000 ,4), '1.234'),
('num_flex(0001.234 ,4)',num_flex(0001.234 ,4), '1.234'),
('num_flex(123456 ,5)',num_flex(123456 ,5), '123456'),
('num_flex(0 ,5)',num_flex(0 ,5), '0'),
('num_flex(00000.00000 ,5)',num_flex(00000.00000 ,5), '0'),
('num_flex(00000.00001 ,5)',num_flex(00000.00001 ,5), '0.00001'),
('num_flex(12345678901 ,5)',num_flex(12345678901 ,5), '12345678901'),
('num_flex(123456789.1 ,5)',num_flex(123456789.1 ,5), '123456789'),
('num_flex(1.234 ,-4)',num_flex(1.234 ,-4), '1')
) as t ("operation","result","should_be");
-- operation | result | should_be | is_correct
----------------------------+-------------+-------------+------------
-- num_flex(0.1234 ,4) | 0.1234 | 0.1234 | t
-- num_flex(1.234 ,4) | 1.234 | 1.234 | t
-- num_flex(1.2340000 ,4) | 1.234 | 1.234 | t
-- num_flex(0001.234 ,4) | 1.234 | 1.234 | t
-- num_flex(123456 ,5) | 123456 | 123456 | t
-- num_flex(0 ,5) | 0 | 0 | t
-- num_flex(00000.00000 ,5) | 0 | 0 | t
-- num_flex(00000.00001 ,5) | 0.00001 | 0.00001 | t
-- num_flex(12345678901 ,5) | 12345678901 | 12345678901 | t
-- num_flex(123456789.1 ,5) | 123456789 | 123456789 | t
-- num_flex(1.234 ,-4) | 1 | 1 | t
--(11 rows)
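For experimenting outside the database, the corrected function can be transcribed to Python. This is a sketch, not a drop-in replacement. decimal stands in for numeric, pg_round and trim_scale are hypothetical stand-ins for the PostgreSQL functions, and the cases mirror rows of the test above.

```python
import math
from decimal import Decimal, ROUND_HALF_UP


def pg_round(x: Decimal, places: int) -> Decimal:
    # Stand-in for round(numeric, int) with places >= 0; ties go away from zero
    return x.quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_UP)


def trim_scale(x: Decimal) -> Decimal:
    # Stand-in for trim_scale(): drop trailing zeros after the decimal point
    if x == x.to_integral_value():
        return x.quantize(Decimal(1))
    return x.normalize()


def num_flex(input_number: str, digit_count: int,
             is_counting_unit_zero: bool = False) -> Decimal:
    x = Decimal(input_number)
    if x == 0:
        result = Decimal(0)
    elif digit_count <= 0:  # avoids negative rounding
        result = pg_round(x, 0)
    elif -1 <= x <= 1 and is_counting_unit_zero:  # SQL BETWEEN is inclusive
        result = pg_round(x, digit_count - 1)
    elif -1 <= x <= 1:
        result = pg_round(x, digit_count)
    else:
        integer_digits = math.ceil(abs(x).log10())
        # avoids negative rounding
        result = pg_round(x, max(digit_count - integer_digits, 0))
    return trim_scale(result)


cases = [
    ("0.1234", 4, "0.1234"),
    ("1.2340000", 4, "1.234"),
    ("0001.234", 4, "1.234"),
    ("123456", 5, "123456"),
    ("00000.00001", 5, "0.00001"),
    ("12345678901", 5, "12345678901"),
    ("123456789.1", 5, "123456789"),
]
for value, digits, expected in cases:
    result = num_flex(value, digits)
    print(f"num_flex({value}, {digits}) = {result}, correct: {result == Decimal(expected)}")
print(num_flex("0.123", 3, is_counting_unit_zero=True))  # 0.12
```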
You can declare the precision (total number of digits) of your numeric data type in the column definition. Only digits after decimal point will be rounded. If there are too many digits before the decimal point, you'll get an error.
The downside is that numeric(n) is actually numeric(n,0), which is dictated by the SQL standard. So if by limiting the column's number of digits to 5 you want to have 12345.0 as well as 0.12345, there's no way you can configure numeric to hold both. numeric(5) will round 0.12345 to 0, numeric(5,5) will dedicate all digits to the right of decimal point and reject 12345.
create table test (numeric_column numeric(5));
insert into test values (12345.123);
table test;
-- numeric_column
------------------
-- 12345
--(1 row)
insert into test values (123456.123);
--ERROR: numeric field overflow
--DETAIL: A field with precision 5, scale 0 must round to an absolute value less than 10^5.
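The rounding-then-overflow behaviour of numeric(precision, scale) can be modelled in a few lines of Python. This is a simplified sketch, not PostgreSQL's implementation. to_numeric and NumericFieldOverflow are hypothetical names, and the error text is not PostgreSQL's.

```python
from decimal import Decimal, ROUND_HALF_UP


class NumericFieldOverflow(Exception):
    """Hypothetical stand-in for PostgreSQL's 'numeric field overflow' error."""


def to_numeric(value: str, precision: int, scale: int = 0) -> Decimal:
    """Sketch of what storing `value` in a numeric(precision, scale) column does."""
    # 1. Round to `scale` digits after the decimal point (ties away from zero)
    rounded = Decimal(value).quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)
    # 2. Only precision - scale digits are left for the integer part
    if abs(rounded) >= Decimal(10) ** (precision - scale):
        raise NumericFieldOverflow(f"{value} does not fit numeric({precision},{scale})")
    return rounded


print(to_numeric("12345.123", 5))   # 12345
print(to_numeric("0.12345", 5))     # 0
print(to_numeric("0.12345", 5, 5))  # 0.12345
for value, scale in (("123456.123", 0), ("12345", 5)):
    try:
        to_numeric(value, 5, scale)
    except NumericFieldOverflow as exc:
        print("overflow:", exc)
```

The last loop shows both sides of the trade-off: numeric(5) rejects six integer digits, while numeric(5,5) rejects any value of 1 or more.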

Replace first n entries in a column in kdb

How can I replace the values in the first n columns of my table?
i.e. mycol:(1 2 3 4) to mycol:(a a 3 4)
Thank you in advance!
If it's the values within mycol that you want updated, then they will need to be of the same type as the existing values. See below.
q)t:([]mycol:`$string 1+til 4;mycol2:til 4)
q)update mycol:`a from t where i<2
mycol mycol2
------------
a 0
a 1
3 2
4 3
One way around this, though, is to enlist each item of mycol, turning it into a mixed list column; that way updates of any type can be made.
q)t:([]mycol:1+til 4;mycol2:til 4)
q)update mycol:`a from(update enlist each mycol from t)where i<2
mycol mycol2
------------
`a 0
`a 1
,3 2
,4 3
q)meta update mycol:`a from(update enlist each mycol from t)where i<2
c | t f a
------| -----
mycol |
mycol2| j
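Python has no kdb+ types, but its standard array module gives a loose analogy for why the first update needs a matching type. In this sketch (an assumption, not q semantics), array("q", ...) plays the role of a typed long column. A plain list of one-element lists plays the role of the enlisted, mixed column.

```python
from array import array

# Loose analogy, not q: array("q") is a typed vector of 64-bit ints (like a long
# column), while a Python list can hold items of any type (like a mixed column).
typed = array("q", [1, 2, 3, 4])
try:
    typed[0] = "a"  # a value of another type does not fit a typed vector
    rejected = False
except TypeError:
    rejected = True
print("typed vector rejected 'a':", rejected)  # True

# Like `enlist each mycol`: wrap every item so the column becomes a generic list
mixed = [[item] for item in typed]
n = 2
mixed[:n] = ["a"] * n  # replace the first n entries with a value of another type
print(mixed)  # ['a', 'a', [3], [4]]
```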
It's unclear from your question whether you want the column names or the column values changed. If it's the column names, you can use xcol.
q)(2#`a)xcol([]w:3#til 3;x:3#.Q.a;y:`;z:0N)
a a y z
-------
0 a
1 b
2 c

TSQL - replace isnumeric = 0

I have a select statement and in that select statement I have a few columns on which I perform basic calculations (e.g. [Col1] * 3.14). However, occasionally I run into non-numeric values and when that happens, the whole stored procedure fails because of one row.
I've thought about using a WHERE ISNUMERIC(Col1) <> 0, but then I would be excluding information in the other columns.
Is there a way in TSQL to somehow replace all strings with NULL or 0?
Something like...
SELECT blah1, blah2, blah3,
CASE WHEN ISNUMERIC(Col1) = 1 THEN [Col1] * 3.14 ELSE NULL END as whatever
FROM your_table
A case can also be made that:
The non-numeric values should be converted to numeric or NULL if that's what's expected in the column, and
If numbers are expected then the column should be a numeric data type in the first place and not a character data type, which allows for these types of errors.
I prefer TRY_CAST:
SELECT
someValue
,TRY_CAST(someValue as int) * 3.14 AS TRY_CAST_to_int
,TRY_CAST(someValue as decimal) * 3.14 AS TRY_CAST_to_decimal
,IIF(ISNUMERIC(someValue) = 1, someValue, null) * 3.14 as IIF_IS_NUMERIC
FROM (values
( 'asdf'),
( '2' ),
( '1.55')
) s(someValue)
ISNUMERIC is a terrible way to do this, as there are far too many things that identify as NUMERIC which are not able to be multiplied by a non-MONEY data type.
https://www.brentozar.com/archive/2018/02/fifteen-things-hate-isnumeric/
This fails miserably, because ISNUMERIC('-') returns 1:
DECLARE @example TABLE (numerics VARCHAR(10));
INSERT INTO @example VALUES ('-');
SELECT CASE WHEN ISNUMERIC(numerics) = 1 THEN numerics * 3.14 ELSE NULL END
FROM @example;
Try TRY_CAST instead (albeit amend your DECIMAL precision to suit your needs):
DECLARE @example TABLE (numerics VARCHAR(10));
INSERT INTO @example VALUES ('-');
SELECT TRY_CAST(numerics AS decimal(10,2)) * 3.14 FROM @example;
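The idea behind TRY_CAST is to return NULL for a value that cannot be converted instead of aborting the whole statement. It can be illustrated in Python. try_cast_decimal is a hypothetical helper, and Python's Decimal parser is only a rough stand-in for SQL Server's conversion rules. For example, it accepts scientific notation such as '5.6E12', which SQL Server's decimal conversion rejects.

```python
from decimal import Decimal, InvalidOperation
from typing import List, Optional


def try_cast_decimal(text: str) -> Optional[Decimal]:
    """Rough analogue of TRY_CAST(text AS decimal): None instead of an error."""
    try:
        value = Decimal(text)
    except InvalidOperation:
        return None
    # Decimal also parses 'NaN' and 'Infinity', which a SQL decimal cannot hold
    return value if value.is_finite() else None


rows = ["asdf", "2", "1.55", "-", ""]
results: List[Optional[Decimal]] = []
for raw in rows:
    number = try_cast_decimal(raw)
    # A bad row yields None (NULL) instead of failing the whole query
    results.append(None if number is None else number * Decimal("3.14"))
print(results)  # [None, Decimal('6.28'), Decimal('4.8670'), None, None]
```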
TRY_CAST and TRY_CONVERT test whether a value converts to a specific type:
declare @T table (num varchar(20));
insert into @T values ('12'), ('3.14'), ('5.6E12'), ('$120'), ('-'), (''), ('cc'), ('aa'), ('bb'), ('1/5');
select t.num, ISNUMERIC(t.num) as isnumeric
, isnull(TRY_CONVERT(smallmoney, t.num), 0) as smallmoney
, TRY_CONVERT(float, t.num) as float
, TRY_CONVERT(decimal(18,4), t.num) as decimal
, isnull(TRY_CONVERT(smallmoney, t.num), TRY_CONVERT(float, t.num)) as mix
from @T t
num isnumeric smallmoney float decimal
-------------------- ----------- --------------------- ---------------------- ---------------------------------------
12 1 12.00 12 12.0000
3.14 1 3.14 3.14 3.1400
5.6E12 1 0.00 5600000000000 NULL
$120 1 120.00 NULL NULL
- 1 0.00 NULL NULL
0 0.00 0 NULL
cc 0 0.00 NULL NULL
aa 0 0.00 NULL NULL
bb 0 0.00 NULL NULL
1/5 0 0.00 NULL NULL
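Interestingly, the last expression (mix) still fails: ISNULL returns the data type of its first argument (smallmoney), so the float 5.6E12 is converted back to smallmoney and overflows.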

PostgreSQL - dynamic INSERT on column names

I'm looking to dynamically insert a set of columns from one table into another in PostgreSQL. What I think I'd like to do is read in a 'checklist' of column headings (the columns that exist in table 1, the storage table) and, if they exist in the export table (table 2), insert them all at once into table 1. Table 2 will vary in its columns, though: once it's imported I'll drop it and import new data, potentially with a different column structure. So I need to import it based on the column names.
e.g.
Table 1. - The storage table
ID NAME YEAR LITH_AGE PROV_AGE SIO2 TIO2 CAO MGO COMMENTS
1 John 1998 2000 3000 65 10 5 5 comment1
2 Mark 2005 2444 3444 63 8 2 3 comment2
3 Luke 2001 1000 1500 77 10 2 2 comment3
Table 2. - The export table
ID NAME MG# METHOD SIO2 TIO2 CAO MGO
1 Amy 4 Method1 65 10 5 5
2 Poe 3 Method2 63 8 2 3
3 Ben 2 Method3 77 10 2 2
As you can see the export table may include columns which do not exist in the storage table, so these would be ignored.
I want to insert all of these columns at once, because I've found that if I do it column by column, each insert adds a new set of rows (maybe someone can solve this issue instead? Currently I've written a function that checks whether a column name exists in table 2 and, if it does, inserts it, but as mentioned this adds rows to the table every time and leaves the rest of the columns NULL).
The INSERT line from my function:
EXECUTE format('INSERT INTO %s (%s) (SELECT %s::%s FROM %s);',_tbl_import, _col,_col,_type,_tbl_export);
As a sort of 'code example' for my question:
EXECUTE format('INSERT INTO table1 (%1$s) SELECT %1$s FROM table2', columns)
where 'columns' would be some variable denoting the columns that exist in the export table that need to go into the storage table. This will be variable as table 2 will be different every time.
This would ideally update Table 1 as:
ID NAME YEAR LITH_AGE PROV_AGE SIO2 TIO2 CAO MGO COMMENTS
1 John 1998 2000 3000 65 10 5 5 comment1
2 Mark 2005 2444 3444 63 8 2 3 comment2
3 Luke 2001 1000 1500 77 10 2 2 comment3
4 Amy NULL NULL NULL 65 10 5 5 NULL
5 Poe NULL NULL NULL 63 8 2 3 NULL
6 Ben NULL NULL NULL 77 10 2 2 NULL
UPDATED answer
My original answer did not meet a requirement that came up later, but I was asked to post an alternative to the information_schema solution, so here it is.
I made two versions for solutions:
V1 is equivalent to the already given example using information_schema, but it relies on table1's column DEFAULTs: if a table1 column that does not exist in table2 has a DEFAULT other than NULL, it will be filled with that default.
V2 is modified to force NULL wherever the two tables' columns don't match, so it does not inherit table1's own DEFAULTs.
Version1:
CREATE OR REPLACE FUNCTION insert_into_table1_v1()
RETURNS void AS $main$
DECLARE
columns text;
BEGIN
SELECT string_agg(quote_ident(c1.attname), ',')
INTO columns
FROM pg_attribute c1
JOIN pg_attribute c2
ON c1.attrelid = 'public.table1'::regclass
AND c2.attrelid = 'public.table2'::regclass
AND c1.attnum > 0
AND c2.attnum > 0
AND NOT c1.attisdropped
AND NOT c2.attisdropped
AND c1.attname = c2.attname
AND c1.attname <> 'id';
-- Following is the actual result of query above, based on given data examples:
-- -[ RECORD 1 ]----------------------
-- string_agg | name,si02,ti02,cao,mgo
EXECUTE format(
' INSERT INTO table1 ( %1$s )
SELECT %1$s
FROM table2
',
columns
);
END;
$main$ LANGUAGE plpgsql;
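For a quick experiment without a PostgreSQL server, the V1 approach can be sketched in Python against SQLite. This is an assumption-laden stand-in, not the function above. PRAGMA table_info replaces pg_attribute, and the tables are trimmed-down hypothetical versions of the question's tables. quote_ident, column_names and insert_common_columns are local helpers; this quote_ident is not PostgreSQL's function. The comments column gets a DEFAULT of 'n/a' to show that V1 leaves unmatched columns to their defaults.

```python
import sqlite3
from typing import List, Sequence


def quote_ident(name: str) -> str:
    # Standard SQL identifier quoting: wrap in double quotes, double embedded quotes
    return '"' + name.replace('"', '""') + '"'


def column_names(conn: sqlite3.Connection, table: str) -> List[str]:
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk)
    return [row[1] for row in conn.execute(f"PRAGMA table_info({quote_ident(table)})")]


def insert_common_columns(conn: sqlite3.Connection, target: str, source: str,
                          skip: Sequence[str] = ("id",)) -> List[str]:
    """Like V1: copy only shared columns; the others fall back to their DEFAULT."""
    source_cols = set(column_names(conn, source))
    common = [c for c in column_names(conn, target) if c in source_cols and c not in skip]
    cols = ", ".join(quote_ident(c) for c in common)
    conn.execute(f"INSERT INTO {quote_ident(target)} ({cols}) "
                 f"SELECT {cols} FROM {quote_ident(source)}")
    return common


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT, year INTEGER,
                         sio2 REAL, mgo REAL, comments TEXT DEFAULT 'n/a');
    CREATE TABLE table2 (id INTEGER, name TEXT, "mg#" INTEGER, method TEXT,
                         sio2 REAL, mgo REAL);
    INSERT INTO table1 (name, year, sio2, mgo, comments)
        VALUES ('John', 1998, 65, 5, 'comment1');
    INSERT INTO table2 VALUES (1, 'Amy', 4, 'Method1', 65, 5),
                              (2, 'Poe', 3, 'Method2', 63, 3);
""")
print(insert_common_columns(conn, "table1", "table2"))  # ['name', 'sio2', 'mgo']
for row in conn.execute("SELECT * FROM table1 ORDER BY id"):
    print(row)
```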
Version2:
CREATE OR REPLACE FUNCTION insert_into_table1_v2()
RETURNS void AS $main$
DECLARE
t1_cols text;
t2_cols text;
BEGIN
SELECT string_agg( quote_ident(c1.attname), ',' ),
string_agg( COALESCE( quote_ident(c2.attname), 'NULL' ), ',' )
INTO t1_cols,
t2_cols
FROM pg_attribute c1
LEFT JOIN pg_attribute c2
ON c2.attrelid = 'public.table2'::regclass
AND c2.attnum > 0
AND NOT c2.attisdropped
AND c1.attname = c2.attname
WHERE c1.attrelid = 'public.table1'::regclass
AND c1.attnum > 0
AND NOT c1.attisdropped
AND c1.attname <> 'id';
-- Following is the actual result of query above, based on given data examples:
-- t1_cols | t2_cols
-- --------------------------------------------------------+--------------------------------------------
-- name,year,lith_age,prov_age,si02,ti02,cao,mgo,comments | name,NULL,NULL,NULL,si02,ti02,cao,mgo,NULL
-- (1 row)
EXECUTE format(
' INSERT INTO table1 ( %s )
SELECT %s
FROM table2
',
t1_cols,
t2_cols
);
END;
$main$ LANGUAGE plpgsql;
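The V2 variant can also be sketched in Python with SQLite. The assumptions are the same: PRAGMA table_info instead of pg_attribute, hypothetical helper names, and trimmed-down tables. Columns missing from the source are selected as a literal NULL, so the 'n/a' default on comments is never used.

```python
import sqlite3
from typing import List, Sequence


def quote_ident(name: str) -> str:
    # Standard SQL identifier quoting: wrap in double quotes, double embedded quotes
    return '"' + name.replace('"', '""') + '"'


def column_names(conn: sqlite3.Connection, table: str) -> List[str]:
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk)
    return [row[1] for row in conn.execute(f"PRAGMA table_info({quote_ident(table)})")]


def insert_with_nulls(conn: sqlite3.Connection, target: str, source: str,
                      skip: Sequence[str] = ("id",)) -> str:
    """Like V2: list every target column; those missing from source get NULL."""
    source_cols = set(column_names(conn, source))
    target_cols = [c for c in column_names(conn, target) if c not in skip]
    select_items = [quote_ident(c) if c in source_cols else "NULL" for c in target_cols]
    sql = (f"INSERT INTO {quote_ident(target)} "
           f"({', '.join(quote_ident(c) for c in target_cols)}) "
           f"SELECT {', '.join(select_items)} FROM {quote_ident(source)}")
    conn.execute(sql)
    return sql


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT, year INTEGER,
                         sio2 REAL, mgo REAL, comments TEXT DEFAULT 'n/a');
    CREATE TABLE table2 (id INTEGER, name TEXT, "mg#" INTEGER, method TEXT,
                         sio2 REAL, mgo REAL);
    INSERT INTO table2 VALUES (1, 'Amy', 4, 'Method1', 65, 5),
                              (2, 'Poe', 3, 'Method2', 63, 3);
""")
print(insert_with_nulls(conn, "table1", "table2"))
for row in conn.execute("SELECT * FROM table1 ORDER BY id"):
    print(row)
```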
Here is the documentation for the pg_attribute catalog columns, in case anything is unclear: https://www.postgresql.org/docs/current/static/catalog-pg-attribute.html
Hopefully this helps :)