Postgres ignoring null values when using filter as not in array - postgresql

SELECT * FROM Entity e WHERE e.Status <> ANY(ARRAY[1,2,3]);
Here Status is a nullable integer column. Using the above query, I am unable to fetch the records whose status value is NULL.
SELECT * FROM Entity e WHERE (e.Status is NULL OR e.Status = 4);
This query does the trick. Could someone explain to me why the first query is not working as expected?

NULL kinda means "unknown", so the expressions
NULL = NULL
and
NULL != NULL
are neither true nor false, they're NULL. Because it is not known whether an "unknown" value is equal or unequal to another "unknown" value.
Since <> ANY compares the left-hand value against each array element, if that value (here e.Status) is NULL, every comparison yields NULL and so does the whole expression, so the row is filtered out.
So your second query is correct.
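To see the effect directly, here is a minimal sketch you can run in psql:
SELECT NULL = NULL AS eq, NULL <> NULL AS neq;   -- both columns come back NULL
SELECT NULL <> ANY(ARRAY[1,2,3]) AS test;        -- NULL as well, so a WHERE clause treats it as "not true"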

It is spelled out in the docs Array ANY:
If the array expression yields a null array, the result of ANY will be null. If the left-hand expression yields null, the result of ANY is ordinarily null (though a non-strict comparison operator could possibly yield a different result). Also, if the right-hand array contains any null elements and no true comparison result is obtained, the result of ANY will be null, not false (again, assuming a strict comparison operator). This is in accordance with SQL's normal rules for Boolean combinations of null values.
FYI:
e.Status is NULL OR e.Status = 4
can be shortened to:
e.Status IS NOT DISTINCT FROM 4
per Comparison operators.
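Applied to the query above, that would be (a sketch, assuming the same Entity table as in the question):
SELECT * FROM Entity e WHERE e.Status IS NOT DISTINCT FROM 4;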

Related

Pyspark Dataframe Null value Logic Operation

In Python, None != 1 returns True.
But why does "Null_column" != 1 return false in PySpark?
example:
from pyspark.sql.functions import lit

data = [(1,5),(2,5)]
columns = ["id","test"]
df_null = spark.createDataFrame(data, columns)
df_null = df_null.withColumn("nul_val", lit(None))
df_null.printSchema()
df_null.show()
But df_null.filter(df_null.nul_val != 1).count() returns 0.
Please check NULL Semantics - Spark 3.0.0 for how comparisons with null are handled in Spark.
To summarize: in Spark, null means undefined, so any comparison with null results in undefined and should be avoided to prevent unwanted results. In your case, since undefined is not True, the count is 0.
Apache Spark supports the standard comparison operators such as ‘>’, ‘>=’, ‘=’, ‘<’ and ‘<=’. The result of these operators is unknown or NULL when one or both of the operands are unknown or NULL.
If you want to compare with a column that might contain null, use the null-safe operator <=>, which results in False if one of the operands is null:
In order to compare NULL values for equality, Spark provides a null-safe equality operator (‘<=>’), which returns False when one of the operands is NULL.
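For illustration, a minimal Spark SQL sketch (no table needed) showing the difference:
SELECT NULL = 1   AS regular_eq,   -- NULL (unknown)
       NULL <=> 1 AS null_safe_eq; -- false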
So, back to your problem. To solve it, I would combine a null check with the comparison against 1:
df_null.filter((df_null.nul_val.isNull()) | (df_null.nul_val != 1)).count()
Another solution would be to replace null with 0, if that does not break any other logic:
df_filled = df_null.fillna(value=0, subset=["nul_val"])
df_filled.filter(df_filled.nul_val != 1).count()

number equality to null using case when

In my Postgres database, I'm checking user answers for correctness by checking if two IDs, "user_answered_id" and "expected_answer_id", are equivalent. If the user doesn't provide a "user_answered_id", then we still mark their answer as incorrect.
In Postgres, the following queries
select case when 1 != null then TRUE else FALSE end as test;
select case when 1 = null then TRUE else FALSE end as test;
both result in FALSE. This is true for any number check (e.g., when 2 != null, when 3 != null, etc.).
Why doesn't CASE WHEN show TRUE for 1 != null?
Must I put in the check "or is null"? E.g.,
CASE WHEN
user_answered_id != expected_answer_id
OR user_answered_id IS NULL
THEN TRUE
ELSE FALSE
END as user_incorrect_tally
What you are looking for is: IS DISTINCT FROM
select 2 is distinct from null;
?column?
----------
t
select 2 is distinct from 1;
?column?
----------
t
From the docs:
datatype IS DISTINCT FROM datatype → boolean
Not equal, treating null as a comparable value.
1 IS DISTINCT FROM NULL → t (rather than NULL)
NULL IS DISTINCT FROM NULL → f (rather than NULL)
SQL uses three-valued logic: true, false, and null. Null is not false. Null can be thought of as "no value".
Operations on null almost always yield null. So 1 != null is null. 1 = null is null. null = null is null. 5 < null is null. Etc.
To check for null, use is null and is not null.
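You can see the raw three-valued results, without a CASE masking them as FALSE, with this small sketch:
SELECT 1 = NULL AS eq, 1 != NULL AS neq, NULL IS NULL AS isnull;
-- eq and neq come back NULL; isnull comes back t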
Back to your query. is not distinct from and is distinct from are like = and != which treat null as a comparable value. So null is distinct from 1 will be true.
select
user_answered_id is distinct from expected_answer_id as user_incorrect
If you need to convert a null into a different value such as 0 or an empty string, use coalesce.
select
coalesce(user_answered_text, 'No Answer')
Your column is named "tally", but a tally means a count. If you intend to count a user's true and false answers, use count with a filter.
select
count(user_answered_id) filter (
where user_answered_id = expected_answer_id
) as user_correct_tally,
-- count ignores null, this will only be the questions they tried to answer
count(user_answered_id) as user_answered_tally,
count(user_answered_id) filter (
where user_answered_id is distinct from expected_answer_id
) as user_incorrect_tally
Yes, you should check for NULL values with is null, and the last query you wrote is correct.
I suggest you read the documentation below:
https://www.postgresql.org/docs/current/functions-comparison.html

JPQL query with input parameter collection containing null

I need to compare a nullable entity property via IN expression within the following JPQL query:
@NamedQuery(name = "query",
query = "SELECT e FROM MyEntity e WHERE e.status IN :statuses")
Now, I would like the collection-valued input parameter statuses to optionally contain null as an element:
final List<MyEntity> actual = entityManager.createNamedQuery("query", MyEntity.class)
.setParameter("statuses", Arrays.asList(null, 1L))
.getResultList();
However, with Hibernate/Derby the actual result list contains only entities with status 1L, but none with status null.
I have not found anything in the JPA 2.2 specification about this case. Did I miss something or is this vendor-specific?
The answers to this question only solve part of my problem. In their proposed solutions, the null comparison is hard-baked into the query and cannot be controlled via the collection-valued parameter.
To a Java programmer, for whom null == null yields true, it might come as a surprise that in SQL (and JPQL) null = null is itself null, which is "falsy". As a result, null in (null) yields null as well.
Instead you need to treat null separately with an IS NULL check: e.status IS NULL OR e.status IN :statuses.
This is described in 4.11 Null Values of the JPA Specification:
Comparison or arithmetic operations with a NULL value always yield an unknown value.
Two NULL values are not considered to be equal, the comparison yields an unknown value.
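Putting it together, the rewritten JPQL for the named query would read (a sketch, assuming the same MyEntity mapping as in the question):
SELECT e FROM MyEntity e WHERE e.status IS NULL OR e.status IN :statuses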

Update with ISNULL and operation

The original (MySQL) query looks like this:
UPDATE reponse_question_finale t1, reponse_question_finale t2
SET t1.nb_question_repondu = (9-(ISNULL(t1.valeur_question_4)+ISNULL(t1.valeur_question_6)+ISNULL(t1.valeur_question_7)+ISNULL(t1.valeur_question_9)))
WHERE t1.APPLICATION = t2.APPLICATION;
I know you cannot update 2 tables in a single query, so I tried this:
UPDATE reponse_question_finale t1
SET nb_question_repondu = (9-(COALESCE(t1.valeur_question_4,'')::int+COALESCE(t1.valeur_question_6,'')::int+COALESCE(t1.valeur_question_7)::int+COALESCE(t1.valeur_question_9,'')::int))
WHERE t1.APPLICATION = t1.APPLICATION;
But this query gave me an error: invalid input syntax for integer: ""
I saw that the Postgres equivalent of MySQL's ISNULL() is COALESCE(), so I think I'm on the right track here.
I also know you cannot add varchar to varchar, so I tried to cast to integer. I'm not sure whether I put the parentheses in the right places, and judging by the error, maybe I cannot cast to int with COALESCE.
Lastly, I could probably do a correlated sub-select to update my two tables, but I'm a little lost at this point.
The output must be an integer matching the number of questions answered in a backup survey.
Any thoughts?
Thanks.
coalesce() returns the first non-null value from the list supplied. So, if the column value is null, the expression COALESCE(t1.valeur_question_4,'') returns an empty string, which cannot be cast to an integer, and that's why you get the error.
But it seems you want something completely different: you want to check whether each column is null (or empty) and subtract 1 for each that is, in order to count the number of non-null columns.
To return 1 if a value is null (or empty) and 0 if it isn't, you can use:
(nullif(valeur_question_4, '') is null)::int
nullif returns null if the first value equals the second. The IS NULL condition returns a boolean (something that MySQL doesn't have), and that can be cast to an integer (where false is cast to 0 and true to 1).
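For example, a quick sketch of how the pieces evaluate:
SELECT (nullif('', '') is null)::int   AS empty_or_null,  -- 1
       (nullif('42', '') is null)::int AS answered;       -- 0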
So the whole expression should be:
nb_question_repondu = 9 - (
(nullif(t1.valeur_question_4,'') is null)::int
+ (nullif(t1.valeur_question_6,'') is null)::int
+ (nullif(t1.valeur_question_7,'') is null)::int
+ (nullif(t1.valeur_question_9,'') is null)::int
)
Another option is to unpivot the columns and do a select on them in a sub-select:
update reponse_question_finale
set nb_question_repondu = (select count(*)
from (
values
(valeur_question_4),
(valeur_question_6),
(valeur_question_7),
(valeur_question_9)
) as t(q)
where nullif(trim(q),'') is not null);
Adding more columns to be considered is then quite easy, as you just need to add a single line to the values() clause.

Py-postgresql: WHERE param = None not working

I am using python3.6 and py-postgresql==1.2.1.
I have the following statement:
db.prepare("SELECT * FROM seasons WHERE user_id=$1 AND season_id=$2 LIMIT 1"), where season_id can be NULL.
I want to be able to get the latest record with a NULL season_id by passing None as the $2 param, but it does not work. Instead, I need to create this second statement:
db.prepare("SELECT * FROM seasons WHERE user_id=$1 AND season_id IS NULL LIMIT 1")
It must have something to do with season_id = NULL not working while season_id IS NULL does, but is there a way to make this work?
From Comparison Functions and Operators:
Do not write expression = NULL because NULL is not “equal to” NULL. (The null value represents an unknown value, and it is not known whether two unknown values are equal.)
Some applications might expect that expression = NULL returns true if expression evaluates to the null value. It is highly recommended that these applications be modified to comply with the SQL standard. However, if that cannot be done the transform_null_equals configuration variable is available. If it is enabled, PostgreSQL will convert x = NULL clauses to x IS NULL.
and:
19.13.2. Platform and Client Compatibility
transform_null_equals (boolean)
When on, expressions of the form expr = NULL (or NULL = expr) are treated as expr IS NULL, that is, they return true if expr evaluates to the null value, and false otherwise. The correct SQL-spec-compliant behavior of expr = NULL is to always return null (unknown). Therefore this parameter defaults to off.
You could rewrite your query:
SELECT *
FROM seasons
WHERE user_id = $1
AND (season_id = $2 OR ($2 IS NULL AND season_id IS NULL))
-- ORDER BY ... --LIMIT without sorting could be dangerous
-- you should explicitly specify sorting
LIMIT 1;
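An alternative that keeps a single prepared statement is IS NOT DISTINCT FROM, which treats two NULLs as equal (a sketch, same seasons table as above):
SELECT *
FROM seasons
WHERE user_id = $1
  AND season_id IS NOT DISTINCT FROM $2
-- ORDER BY ... as above, specify an explicit sort before LIMIT
LIMIT 1;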