PostgreSQL WHERE Clause what is included/excluded? - postgresql

I have come across a postgresql query with a lot of WHERE statements like this
WHERE (category <> 'A' OR category IS NULL)
AND (category <> 'B' OR category IS NULL)
I am struggling to understand what data this query is including/excluding.
I tried rewriting the code above as
WHERE category NOT IN ('A','B')
WHERE category NOT IN ('A', 'B') OR category IS NULL
WHERE (category NOT IN ('A', 'B') OR category IS NULL)
And all three gave different answers to the original code.
Could someone explain to me what data in included/excluded in each of the four cases above?
Say for example the data looked like
ID
Category
1
A
2
B
3
C
4
D
5
NULL
For (1) I would just get ID's 3, 4. But I am unsure about the others.
EDIT: WHERE (category NOT IN ('A', 'B') OR category IS NULL) and
WHERE (category <> 'A' OR category IS NULL)
AND (category <> 'B' OR category IS NULL)
Give the same answer.
But WHERE category NOT IN ('A', 'B') OR category IS NULL without parenthesis gives a different answer.

To correctly understand the output of the mentioned queries you have to think in the following way: take one by one all the lines that satisfy the WHERE clause.
Query 1
WHERE category NOT IN ('A','B')
The query 1 should give all the lines with the attribute category not in the set specified.
If you proceed step by step, one line at a time you can see that:
the first 2 lines are not included in the output since the category column contains values in the set ('A','B')
the next 2 lines are included in the output since the category column doesn't contain values in the set ('A','B')
the last line is not included in the output since the NULL values are evaluated as UNKNOWN according to the Three-Valued Logic
To better understand the last point the clause WHERE category NOT IN ('A','B') can be rewritten as WHERE category<>'A' AND category<>'B'. Since category is NULL the logical expression is evaluated in the following way WHERE NULL<>'A' AND NULL<>'B', which output is UNKNOWN, so the line will not be included in the output result.
Queries 2 & 3
WHERE category NOT IN ('A', 'B') OR category IS NULL
Queries 2 and 3 are the same, since parentheses in this case doesn't affect the evaluation order of the logical operators.
In this particular case the last line of the example table above is included in the output since category NOT IN ('A','B') is evaluated as UNKNOWN and category IS NULL is evaluated to TRUE. For the same reason mentioned above (Three-Valued Logic) the result of UNKNOWN or TRUE is TRUE.

I think that you are struggling in the logic of the query, the first one is easy to understand
where category not in ('A','B')
you will get id : 3,4,5
Note that it is better if you use the ids of the category than the letters.
in the second query
where category not in ('A','B') or category is null
both the condition will be true and both will be done
you will get ids of : 3,4
and the third one it has to give you the same output of the (2) condition

Related

Postgres: Query Values in nested jsonb-structure with unknown keys

I am quite new in working with psql.
Goal is to get values from a nested jsonb-structure where the last key has so many different characteristics it is not possible to query them explicitely.
The jsonb-structure in any row is as follows:
TABLE_Products
{'products':[{'product1':['TYPE'], 'product2':['TYPE2','TYPE3'], 'productN':['TYPE_N']}]}
I want to get the values (TYPE1, etc.) assigned to each product-key (product1, etc.). The product-keys are the unknown, because of too many different names.
My work so far achieves to pull out a tuple for each key:value-pair on the last level. To illustrate this here you can see my code and the results from the previously described structure.
My Code:
select url, jsonb_each(pro)
from (
select id , jsonb_array_elements(data #> '{products}') as pro
from TABLE_Products
where data is not null
) z
My result:
("product2","[""TYPE2""]")
("product2","[""TYPE3""]")
My questions:
Is there a way to split this tuple on two columns?
Or how can I query the values kind of 'unsupervised', so without knowing the exact names of 'product1 ... n'

Neo4J - How to merge nodes with the same attribute value (name) but different node labels

I currently have duplicate name values across different node labels and I want to merge them. The issue is that every question I've found online assumes that BOTH the attribute name and the labels are the same. The code I've executed to query the instances I'm referring to is:
MATCH (a)-[r:FEATURED_IN]->(b) WHERE a.name = b.name AND id(a) <> id(b)
So that means that 'a' is featured in 'b', obviously 'a' and 'b' refer to the different node labels but the values are the same. How can I perform a merge to ensure that the 'b' node is deleted and only the 'a' node is returned? I know I could do this manually but there are so many instances of this that I would like to find a quick fix.
Thanks in advance.
You can collect all names in the nodes and combine them. Then UNWIND (it is like a For loop) and return distinct name.
MATCH (a)-[r:SYN_OF]->(b) WHERE a.name = b.name AND id(a) <> id(b)
WITH collect(distinct a.name) + collect(distinct b.name) as names
UNWIND names as name
RETURN distinct name
Result:
╒═══════════╕
│"name" │
╞═══════════╡
│"Same Name"│
└───────────┘

Get string after ',' delimeter comma or special characters

The field name is message, table name is log.
Data Examples:
Values for message:
"(wsname,cmdcode,stacode,data,order_id) values (hyd-l904149,2,1,,1584425657892);"
"(wsname,cmdcode,stacode,data,order_id) values (hyd-l93mt54,2,1,,1584427657892);"
(command_execute,order_id,workstation,cmdcode,stacode,application_to_kill,application_parameters) values (kill, 1583124192811, hyd-psag314, 10, 2, tsws.exe, -u production ); "
and in log table i need to get separated column wsname with values as hyd-l904149 and hyd-l93mt54 and hyd-psag314, column cmdcode with values as 2,2 and 10 and column stacode with values as 1,1 and 2, e.g.:
wsname cmdcode stacode
hyd-l904149 2 1
hyd-l93mt54 2 1
hyd-psag314 10 2
Use regexp_matches to extract left and right part of values clause, then regexp_split_to_array to split these parts by commas, then filter rows containing wsname using = any(your_array) construct, then select required columns from array.
Or - alternative solution - fix data to be syntactically valid part of insert statement, create auxiliary tables, insert data into them and then just select.
As in comment section I mentioned about inbuilt function in posgressql
split_part(string,delimiter, field_number)
http://www.sqlfiddle.com/#!15/eb1df/1
As the json capabilities of the un-supported version 9.3 are very limited, I would install the hstore extension, and then do it like this:
select coalesce(vals -> 'wsname', vals -> 'workstation') as wsname,
vals -> 'cmdcode' as cmdcode,
vals -> 'stacode' as stacode
from (
select hstore(regexp_split_to_array(e[1], '\s*,\s*'), regexp_split_to_array(e[2], '\s*,\s*')) as vals
from log l,
regexp_matches(l.message, '\(([^\)]+)\)\s+values\s+\(([^\)]+)\)') as x(e)
) t
regexp_matches() splits the message into two arrays: one for the list of column names and one for the matching values. These arrays are used to create a key/value pair so that I can access the value for each column by the column name.
If you know that the positions of the columns are always the same, you can remove the use of the hstore type. But that would require quite a huge CASE expression to test where the actual columns appear.
Online example
With a modern, supported version of Postgres, I would use jsonb_object(text[], text[]) passing the two arrays resulting from the regexp_matches() call.

CASE in JOIN not working PostgreSQL

I got the following tables:
Teams
Matches
I want to get an output like:
matches.semana | teams.nom_equipo | teams.nom_equipo | Winner
1 AMERICA CRUZ AZUL AMERICA
1 SANTOS MORELIA MORELIA
1 LEON CHIVAS LEON
The columns teams.nom_equipo reference to matches.num_eqpo_lo & to matches.num_eqpo_v and at the same time they reference to the column teams.nom_equipo to get the name of each team based on their id
Edit: I have used the following:
SELECT m.semana, t_loc.nom_equipo AS LOCAL, t_vis.nom_equipo AS VISITANTE,
CASE WHEN m.goles_loc > m.goles_vis THEN 'home'
WHEN m.goles_vis > m.goles_loc THEN 'visitor'
ELSE 'tie'
END AS Vencedor
FROM matches AS m
JOIN teams AS t_loc ON (m.num_eqpo_loc = t_loc.num_eqpo)
JOIN teams AS t_vis ON (m.num_eqpo_vis = t_vis.num_eqpo)
ORDER BY m.semana;
But as you can see from the table Matches in row #5 from the goles_loc column (home team) & goles_vis (visitor) column, they have 2 vs 2 (number of goals - home vs visitor) being a tie but and when I run the code I get something that is not a tie:
Matches' score
Resultset from Select:
I also noticed that since the row #5 the names of both teams in the matches are not correct (both visitor & home team).
So, the Select brings correct data but in other order different than the original order (referring to the order from the table matches)
The order from the second week must be:
matches.semana | teams.nom_equipo | teams.nom_equipo | Winner
5 2 CRUZ AZUL TOLUCA TIE
6 2 MORELIA LEON LEON
7 2 CHIVAS SANTOS TIE
Row 8 from the Resultset must be Row # 5 and so on.
Any help would be really thanked!
When doing a SELECT which includes null for a column, that's the value it will always be, so winner in your case will never be populated.
Something like this is probably more along the lines of what you want:
SELECT m.semana, t_loc.nom_equipo AS loc_equipo, t_vis.nom_equipo AS vis_equipo,
CASE WHEN m.goles_loc - m.goles_vis > 0 THEN t_loc.nom_equipo
WHEN m.goles_vis - m.goles_loc > 0 THEN t_vis.nom_equipo
ELSE NULL
END AS winner
FROM matches AS m
JOIN teams AS t_loc ON (m.nom_eqpo_loc = t.num_eqpo)
JOIN teams AS t_vis ON (m.nom_eqpo_vis = t.num_eqpo)
ORDER BY m.semana;
Untested, but this should provide the general approach. Basically, you JOIN to the teams table twice, but using different conditions, and then you need to calculate the scores. I'm using NULL to indicate a tie, here.
Edit in response to comment from OP:
It's the same table -- teams -- but the JOINs produce different results, because the query uses different JOIN conditions in each JOIN.
The first JOIN, for t_loc, compares m.nom_eqpo_loc to t.num_eqpo. This means it gets the teams rows for the home team.
The second JOIN, for t_vis, compares m.nom_eqpo_vis to t.num_eqpo. This means it gets the teams rows for the visting team.
Therefore, in the CASE statement, t_loc refers to the home team, while t_vis refers to the visting one, enabling both to be used in the CASE statement, enabling the correct name to be found for winning.
Edit in response to follow-up comment from OP:
My original query was sorting by m.semana, which means other columns can appear in any order (essentially whichever Postgres feels is most efficient).
If you want the resulting table to be sorted exactly the same way as the matches table, then use the same ORDER BY tuple in its ORDER BY.
So, the ORDER BY clause would then become:
ORDER BY m.semana, m.nom_eqpo_loc, m.nom_eqpo_vis
Basically, the matches table PRIMARY KEY tuple.

Longest matching substring

How would you search for the longest match within a varchar variable? For example, table GOB has entries as follows:
magic_word | prize
===================
sh| $0.20
sha| $0.40
shaz| $0.60
shaza| $1.50
I would like to write a plpgsql function that takes amongst other arguments a string as input (e.g. shazam), and returns the 'prize' column on the row of GOB with the longest matching substring. In the example shown, that would be $1.50 on the row with magic_word shaza.
All the function format I can handle, it's just the matching bit. I can't think of an elegant solution. I'm guessing it's probably really easy, but I am scratching my head. I don't know the input string at the start, as it will be derived from the result of a query on another table.
Any ideas?
Simple solution
SELECT magic_word
FROM gob
WHERE 'shazam' LIKE (magic_word || '%')
ORDER BY magic_word DESC
LIMIT 1;
This works because the longest match sorts last - so I sort DESC and pick the first match.
I am assuming from your example that you want to match left-anchored, from the beginning of the string. If you want to match anywhere in the string (which is more expensive and even harder to back up with an index), use:
...
WHERE 'shazam' LIKE ('%' || magic_word || '%')
...
SQL Fiddle.
Performance
The query is not sargable. It might help quite a bit if you had additional information, like a minimum length that you could base an index on, to reduce the number of rows to consider. It needs to be criteria that gets you less than ~ 5% of the table to be effective. So, initials (a natural minimum pick) may or may not be useful. But two or three letters at the start might help quite a bit.
In fact you could optimize this iteratively. Something along the line of:
Try a partial index of words with 15 letters+
If not found, try 12 letters+
If not found, try 9 letters+
...
A simple case of what I outlined in this related answer on dba.SE:
Can spatial index help a “range - order by - limit” query
Another approach would be to use a trigram index. You'd need the additional module pg_trgm for that. Normally you would search with a short pattern in a table with longer strings. But trigrams work for your reverse approach, too, with some limitations. Obviously you couldn't match a string with just two characters in the middle of a longer string using trigrams ... Test for corner cases.
There are a number of answers here on SO with more information. Example:
Effectively query on column that includes a substring
Advanced solution
Consider the solution under this closely related question for a whole table of search strings. Implemented with a recursive CTE:
Longest Prefix Match
How about
1
select max(FOO.matchingValue)
from
(
select magic_word as matchingValue
from T
where substr( "abracadabra", 1, length(magic_word)) = magic_word
)
as FOO
2
select prize from
T
join
(
select max(FOO.matchingValue) as MaxValue
from
(
select magic_word as matchingValue
from T
where substr( "abracadabra", 1, length(magic_word)) = magic_word
)
as FOO
) as BAR
on BAR.MaxValue = T.magic_word