Postgres query for IN(NULL, 'test') does not work - postgresql

When I want to match a column that has certain string values or is NULL, I assumed I could do something like this:
SELECT * FROM table_name WHERE column_name IN (NULL, 'someTest', 'someOtherTest');
But it does not return the rows where column_name is set to NULL. Is this documented anywhere? Why does it not work?

You can't compare NULL values using = (which is what IN is doing).
Quote from the manual:
Ordinary comparison operators yield null (signifying “unknown”), not true or false, when either input is null. For example, 7 = NULL yields null, as does 7 <> NULL
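You can see this directly; a minimal illustration (the IN list is rewritten as a chain of = comparisons, so a NULL in the column never compares TRUE):
SELECT NULL IN (NULL, 'someTest', 'someOtherTest');  -- yields NULL, not true, so the row is filtered out
SELECT 7 = NULL, 7 <> NULL;                          -- both yield NULL, as the manual says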
You need to add a check for NULL explicitly:
SELECT *
FROM table_name
WHERE (column_name IN ('someTest', 'someOtherTest') OR column_name IS NULL);

NULL and the empty string ('') are considered different values in Postgres, unlike Oracle (which treats the empty string as NULL).
The query can be modified as:
SELECT *
FROM table_name
WHERE (column_name IN ('someTest', 'someOtherTest', '', ' ') OR
column_name IS NULL);

Related

PostgreSQL - Comparison operator with character varying - Exclude values

I want to query a PostgreSQL table with comparison operators. This table has two character varying columns.
Table
CREATE TABLE IF NOT EXISTS test.test
(
scope character varying COLLATE pg_catalog."default",
project_code character varying COLLATE pg_catalog."default"
)
Values
INSERT INTO test.test(scope, project_code) VALUES (NULL, 'AA');
INSERT INTO test.test(scope, project_code) VALUES ('A', 'AA');
When I want to query values with a project_code = 'AA' and a scope = 'A', I write:
SELECT * FROM test.test WHERE project_code LIKE 'AA' AND scope LIKE 'A';
It returns one row; the result is OK.
But when I try to query values with project_code = 'AA' and scope with any value other than 'A', I write:
SELECT * FROM test.test WHERE project_code LIKE 'AA' AND scope NOT LIKE 'A';
It doesn't return any results, but I have a row that matches this. How can this be explained, and how should I write this query?
I tried other comparison operators (<> and !=) with the same result. I'm using PostgreSQL 13.6.
You need to use a NULL-safe comparison operator. The SQL standard defines the is distinct from operator as the NULL-safe version of <>, and Postgres supports this:
SELECT *
FROM test.test
WHERE project_code = 'AA'
AND scope IS DISTINCT FROM 'A';
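To illustrate the NULL-safe behaviour (a quick sketch you can run directly):
SELECT NULL IS DISTINCT FROM 'A';   -- true, so rows with a NULL scope are kept
SELECT 'A' IS DISTINCT FROM 'A';    -- false
SELECT NULL IS DISTINCT FROM NULL;  -- false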
NULL in most operations will return NULL. For example
SELECT NULL LIKE 'A', NULL NOT LIKE 'A'
returns (NULL, NULL). Handling the NULL case explicitly helps:
SELECT
*
FROM
test.test
WHERE
project_code LIKE 'AA'
AND (scope IS NULL OR scope NOT LIKE 'A')
The solution offered by @a_horse_with_no_name is more elegant; this solution may be interesting when using wildcards in the LIKE operator.
select null like 'a' is true; --return false
select null not like 'a' is true; --return false
select null like 'a'; --return null
select null not like 'a' ; --return null
From https://www.postgresql.org/docs/current/functions-matching.html:
If pattern does not contain percent signs or underscores, then the
pattern only represents the string itself; in that case LIKE acts like
the equals operator. An underscore (_) in pattern stands for (matches)
any single character; a percent sign (%) matches any sequence of zero
or more characters.
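If you do need actual wildcards, the same explicit NULL handling applies; a small sketch (assuming the test.test table from the question):
SELECT *
FROM test.test
WHERE project_code LIKE 'AA%'
AND (scope IS NULL OR scope NOT LIKE 'A%');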

Concatenate string instead of just replacing it

I have a table with standard columns where I want to perform regular INSERTs.
But one of the columns is of type varchar with special semantics. It's a string that's supposed to behave as a set of strings, where the elements of the set are separated by commas.
E.g., if one row has the value fish,sheep,dove in that varchar column and I insert the string ,fish,eagle, I want the result to be fish,sheep,dove,eagle (i.e. eagle gets added to the set, but fish doesn't because it's already there).
I have here this Postgres code that does the "set concatenation" that I want:
SELECT string_agg(unnest, ',') AS x FROM (SELECT DISTINCT unnest(string_to_array('fish,sheep,dove' || ',fish,eagle', ','))) AS x;
But I can't figure out how to apply this logic to insertions.
What I want is something like:
CREATE TABLE IF NOT EXISTS t00(
userid int8 PRIMARY KEY,
a int8,
b varchar);
INSERT INTO t00 (userid,a,b) VALUES (0,1,'fish,sheep,dove');
INSERT INTO t00 (userid,a,b) VALUES (0,1,',fish,eagle')
ON CONFLICT (userid)
DO UPDATE SET
a = EXCLUDED.a,
b = SELECT string_agg(unnest, ',') AS x FROM (SELECT DISTINCT unnest(string_to_array(t00.b || EXCLUDED.b, ','))) AS x;
How can I achieve something like that?
Storing comma separated values is a huge mistake to begin with. But if you really want to make your life harder than it needs to be, you might want to create a function that merges two comma separated lists:
create function merge_lists(p_one text, p_two text)
returns text
as
$$
select string_agg(item, ',')
from (
select e.item
from unnest(string_to_array(p_one, ',')) as e(item)
where e.item <> '' --< necessary because of the leading , in your data
union
select t.item
from unnest(string_to_array(p_two, ',')) t(item)
where t.item <> ''
) t;
$$
language sql;
If you are using Postgres 14 or later, unnest(string_to_array(..., ',')) can be replaced with string_to_table(..., ',').
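For illustration, a sketch of the same function written that way (assuming Postgres 14 or later; the name merge_lists_v14 is just illustrative):
create function merge_lists_v14(p_one text, p_two text)
returns text
as
$$
select string_agg(item, ',')
from (
select item from string_to_table(p_one, ',') as a(item) where item <> ''
union
select item from string_to_table(p_two, ',') as b(item) where item <> ''
) t;
$$
language sql;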
Then your INSERT statement gets a bit simpler:
INSERT INTO t00 (userid,a,b) VALUES (0,1,',fish,eagle')
ON CONFLICT (userid)
DO UPDATE SET
a = EXCLUDED.a,
b = merge_lists(excluded.b, t00.b);
I think I was only missing parentheses around the SELECT statement:
INSERT INTO t00 (userid,a,b) VALUES (0,1,',fish,eagle')
ON CONFLICT (userid)
DO UPDATE SET
a = EXCLUDED.a,
b = (SELECT string_agg(unnest, ',') AS x FROM (SELECT DISTINCT unnest(string_to_array(t00.b || EXCLUDED.b, ','))) AS x);

How to use queried table name in subquery

I'm trying to query field names as well as their maximum length in their corresponding table with a single query - is it at all possible? I've read about correlated subqueries, but I couldn't get the desired result.
Here is the query I have so far:
select T1.RDB$FIELD_NAME, T2.RDB$FIELD_NAME, T2.RDB$RELATION_NAME as tabName, T1.RDB$CHARACTER_SET_ID, T1.RDB$FIELD_LENGTH,
(select max(char_length(T2.RDB$FIELD_NAME))
FROM tabName as MaxLength)
from RDB$FIELDS T1, RDB$RELATION_FIELDS T2
The above doesn't work because, of course, the subquery tries to find a table literally named "tabName". My guess is that I should use some kind of join, but my SQL skills are very limited in this matter.
The origin of the request is that I want to apply this script to transform all my non-UTF8 fields to UTF8, but I run into "string truncation" errors, as I have a few VARCHAR(8192) fields. Usually, none of these fields would actually use all 8192 characters, but I'd rather make sure before truncating.
What you're trying to do cannot be done this way. It looks like you want to obtain the actual maximum length of fields in tables, but you cannot dynamically reference table and column names like this; being able to do that would be a SQL injection heaven. In addition, your use of a SQL-89 cross join instead of an inner join (preferably in SQL-92 style) causes other problems, as you will combine fields incorrectly (as a Cartesian product).
Instead you need to write PSQL to dynamically build and execute the statement to obtain the lengths (using EXECUTE BLOCK (or a stored procedure) and EXECUTE STATEMENT).
For example, something like this:
execute block
returns (
table_name varchar(63) character set unicode_fss,
column_name varchar(63) character set unicode_fss,
type varchar(10),
length smallint,
charset_name varchar(63) character set unicode_fss,
collation_name varchar(63) character set unicode_fss,
max_length smallint)
as
begin
for select
trim(rrf.RDB$RELATION_NAME) as table_name,
trim(rrf.RDB$FIELD_NAME) as column_name,
case rf.RDB$FIELD_TYPE when 14 then 'CHAR' when 37 then 'VARCHAR' end as type,
coalesce(rf.RDB$CHARACTER_LENGTH, rf.RDB$FIELD_LENGTH / rcs.RDB$BYTES_PER_CHARACTER) as length,
trim(rcs.RDB$CHARACTER_SET_NAME) as charset_name,
trim(rc.RDB$COLLATION_NAME) as collation_name
from RDB$RELATIONS rr
inner join RDB$RELATION_FIELDS rrf
on rrf.RDB$RELATION_NAME = rr.RDB$RELATION_NAME
inner join RDB$FIELDS rf
on rf.RDB$FIELD_NAME = rrf.RDB$FIELD_SOURCE
inner join RDB$CHARACTER_SETS rcs
on rcs.RDB$CHARACTER_SET_ID = rf.RDB$CHARACTER_SET_ID
left join RDB$COLLATIONS rc
on rc.RDB$CHARACTER_SET_ID = rf.RDB$CHARACTER_SET_ID
and rc.RDB$COLLATION_ID = rf.RDB$COLLATION_ID
and rc.RDB$COLLATION_NAME <> rcs.RDB$DEFAULT_COLLATE_NAME
where coalesce(rr.RDB$RELATION_TYPE, 0) = 0 and coalesce(rr.RDB$SYSTEM_FLAG, 0) = 0
and rf.RDB$FIELD_TYPE in (14 /* char */, 37 /* varchar */)
into table_name, column_name, type, length, charset_name, collation_name
do
begin
execute statement 'select max(character_length("' || replace(column_name, '"', '""') || '")) from "' || replace(table_name, '"', '""') || '"'
into max_length;
suspend;
end
end
As an aside, the maximum length of a VARCHAR of character set UTF8 is 8191, not 8192.

Postgresql, select a "fake" row

In Postgres 8.4 or higher, what is the most efficient way to get a row of data populated by defaults without actually creating the row? E.g., as a transaction (pseudocode):
create table "mytable"
(
id serial PRIMARY KEY NOT NULL,
parent_id integer NOT NULL DEFAULT 1,
random_id integer NOT NULL DEFAULT random(),
)
begin transaction
fake_row = insert into mytable (id) values (0) returning *;
delete from mytable where id=0;
return fake_row;
end transaction
Basically I'd expect a query with a single row where parent_id is 1 and random_id is a random number (or other function return value), but I don't want this record to persist in the table or impact the primary key sequence serial_id_seq.
My options seem to be using a transaction like above or creating views which are copies of the table with the fake row added but I don't know all the pros and cons of each or whether a better way exists.
I'm looking for an answer that assumes no prior knowledge of the datatypes or default values of any column except id or the number or ordering of the columns. Only the table name will be known and that a record with id 0 should not exist in the table.
In the past I created the fake record 0 as a permanent record but I've come to consider this record a type of pollution (since I typically have to filter it out of future queries).
You can copy the table definition and defaults to a temp table with:
CREATE TEMP TABLE table_name_rt (LIKE table_name INCLUDING DEFAULTS);
Then use this temp table to generate dummy rows. Such a table is dropped at the end of the session (or at the end of the transaction, if created with ON COMMIT DROP) and is only visible to the current session.
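For example, a minimal sketch (assuming a concrete table named mytable like the one in the question exists):
CREATE TEMP TABLE mytable_rt (LIKE mytable INCLUDING DEFAULTS);
INSERT INTO mytable_rt (id) VALUES (0) RETURNING *;  -- defaults fill the remaining columns
-- the original table and its id sequence are never touched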
You can query the catalog and build a dynamic query.
Say we have this table:
create table test10(
id serial primary key,
first_name varchar( 100 ),
last_name varchar( 100 ) default 'Tom',
age int not null default 38,
salary float default 100.22
);
When you run the following query:
SELECT string_agg( txt, ' ' order by id )
FROM (
select 1 id, 'SELECT ' txt
union all
select 2, -9999 || ' as id '
union all
select 3, ', '
|| coalesce( column_default, 'null'||'::'||c.data_type )
|| ' as ' || c.column_name
from information_schema.columns c
where table_schema = 'public'
and table_name = 'test10'
and ordinal_position > 1
) xx
;
you will get this string as a result:
"SELECT -9999 as id , null::character varying as first_name ,
'Tom'::character varying as last_name , 38 as age , 100.22 as salary"
Then execute this query and you will get the "phantom row".
We can build a function that builds and executes the query and returns our row as a result:
CREATE OR REPLACE FUNCTION get_phantom_rec (p_i test10.id%type )
returns test10 as $$
DECLARE
v_sql text;
myrow test10%rowtype;
begin
SELECT string_agg( txt, ' ' order by id )
INTO v_sql
FROM (
select 1 id, 'SELECT ' txt
union all
select 2, p_i || ' as id '
union all
select 3, ', '
|| coalesce( column_default, 'null'||'::'||c.data_type )
|| ' as ' || c.column_name
from information_schema.columns c
where table_schema = 'public'
and table_name = 'test10'
and ordinal_position > 1
) xx
;
EXECUTE v_sql INTO myrow;
RETURN myrow;
END$$ LANGUAGE plpgsql ;
and then this simple query gives you what you want:
select * from get_phantom_rec ( -9999 );
id | first_name | last_name | age | salary
-------+------------+-----------+-----+--------
-9999 | | Tom | 38 | 100.22
I would just select the fake values as literals:
select 1 id, 1 parent_id, 1 user_id
The returned row will be (virtually) indistinguishable from a real row.
To get the values from the catalog:
select
0 as id, -- special case for serial type, just return 0
(select column_default::int -- Cast to int, because we know the column is int
from INFORMATION_SCHEMA.COLUMNS
where table_name = 'mytable'
and column_name = 'parent_id') as parent_id,
(select column_default::int -- Cast to int, because we know the column is int
from INFORMATION_SCHEMA.COLUMNS
where table_name = 'mytable'
and column_name = 'user_id') as user_id;
Note that you must know what the columns are and their types, but this is reasonable. If you change the table schema (other than the default values), you would need to tweak the query.

Most succinct way to transform a CSV string to a table in T-SQL?

-- Given a CSV string like this:
declare @roles varchar(800)
select @roles = 'Pub,RegUser,ServiceAdmin'
-- Question: How to get roles into a table view like this:
select 'Pub'
union
select 'RegUser'
union
select 'ServiceAdmin'
After posting this, I started playing with some dynamic SQL. This seems to work, but there might be some security risks in using dynamic SQL - thoughts on this?
declare @rolesSql varchar(800)
select @rolesSql = 'select ''' + replace(@roles, ',', ''' union select ''') + ''''
exec(@rolesSql)
If you're working with SQL Server compatibility level 130 or higher, the STRING_SPLIT function is now the most succinct method available.
Reference link: https://msdn.microsoft.com/en-gb/library/mt684588.aspx
Usage:
SELECT * FROM string_split('Pub,RegUser,ServiceAdmin',',')
RESULT:
value
-----------
Pub
RegUser
ServiceAdmin
See my answer from here
But basically you would:
Create this function in your DB:
CREATE FUNCTION dbo.Split(@origString varchar(max), @Delimiter char(1))
returns @temptable TABLE (items varchar(max))
as
begin
declare @idx int
declare @split varchar(max)
select @idx = 1
if len(@origString) < 1 or @origString is null return
while @idx != 0
begin
set @idx = charindex(@Delimiter, @origString)
if @idx != 0
set @split = left(@origString, @idx - 1)
else
set @split = @origString
if(len(@split) > 0)
insert into @temptable(Items) values(@split)
set @origString = right(@origString, len(@origString) - @idx)
if len(@origString) = 0 break
end
return
end
and then call the function and pass in the string you want to split.
Select * From dbo.Split(@roles, ',')
Here's a thorough discussion of your options:
Arrays and Lists in SQL Server
What I do in this case is use some string replacement to convert the CSV to JSON and open the JSON like a table. It may not be suitable for every use case, but it is very simple to get running and works with strings and files. With files you just need to watch your line break character; mostly I find it to be Char(13)+Char(10).
declare @myCSV nvarchar(MAX)= N'"Id";"Duration";"PosX";"PosY"
"•P001";223;-30;35
"•P002";248;-28;35
"•P003";235;-26;35'
--CSV to JSON
--convert to json by replacing some stuff
declare @myJson nvarchar(MAX)= '[['+ replace(@myCSV, Char(13)+Char(10), '],[' ) +']]'
set @myJson = replace(@myJson, ';',',') -- Optional: ensure comma delimiters for json if the current delimiter differs
-- set @myJson = replace(@myJson, ',,',',null,') -- Optional: empty in between
-- set @myJson = replace(@myJson, ',]',',null]') -- Optional: empty before linebreak
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT 0))-1 AS LineNumber, *
FROM OPENJSON( @myJson )
with (
col0 varchar(255) '$[0]'
,col1 varchar(255) '$[1]'
,col2 varchar(255) '$[2]'
,col3 varchar(255) '$[3]'
,col4 varchar(255) '$[4]'
,col5 varchar(255) '$[5]'
,col6 varchar(255) '$[6]'
,col7 varchar(255) '$[7]'
,col8 varchar(255) '$[8]'
,col9 varchar(255) '$[9]'
--any name column count is possible
) csv
order by (SELECT 0) OFFSET 1 ROWS --hide header row
Using SQL Server's built-in XML parsing is also an option. Of course, this glosses over all the nuances of an RFC 4180-compliant CSV.
-- Given a CSV string like this:
declare @roles varchar(800)
select @roles = 'Pub,RegUser,ServiceAdmin'
-- Here's the XML way
select split.csv.value('.', 'varchar(100)') as value
from (
select cast('<x>' + replace(@roles, ',', '</x><x>') + '</x>' as xml) as data
) as csv
cross apply data.nodes('/x') as split(csv)
If you are using SQL 2016+, using string_split is better, but this is a common way to do this prior to SQL 2016.
Using BULK INSERT, you can import a CSV file into your SQL table:
http://blog.sqlauthority.com/2008/02/06/sql-server-import-csv-file-into-sql-server-using-bulk-insert-load-comma-delimited-file-into-sql-server/
The accepted answer works fine, but I found this function to be much faster, even for thousands of records. Create the function below and use it.
IF EXISTS (
SELECT 1
FROM Information_schema.Routines
WHERE Specific_schema = 'dbo'
AND specific_name = 'FN_CSVToStringListTable'
AND Routine_Type = 'FUNCTION'
)
BEGIN
DROP FUNCTION [dbo].[FN_CSVToStringListTable]
END
GO
CREATE FUNCTION [dbo].[FN_CSVToStringListTable] (@InStr VARCHAR(MAX))
RETURNS @TempTab TABLE (Id NVARCHAR(max) NOT NULL)
AS
BEGIN
-- Ensure input ends with a comma
SET @InStr = REPLACE(@InStr + ',', ',,', ',')
DECLARE @SP INT
DECLARE @VALUE VARCHAR(1000)
WHILE PATINDEX('%,%', @InStr) <> 0
BEGIN
SELECT @SP = PATINDEX('%,%', @InStr)
SELECT @VALUE = LEFT(@InStr, @SP - 1)
SELECT @InStr = STUFF(@InStr, 1, @SP, '')
INSERT INTO @TempTab (Id)
VALUES (@VALUE)
END
RETURN
END
GO
-- Test like this:
declare @v as NVARCHAR(max) = N'asdf,,as34df,234df,fs,,34v,5fghwer,56gfg,';
SELECT Id FROM dbo.FN_CSVToStringListTable(@v)
I was about to use the solution mentioned in the accepted answer, but further research led me to use table types:
These are far more efficient, and you don't need a TVF (table-valued function) just to create a table from CSV. You can use them directly in your scripts or pass one to a stored procedure as a table-valued parameter. The type can be created as:
CREATE TYPE [UniqueIdentifiers] AS TABLE(
[Id] [varchar](20) NOT NULL
)
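A usage sketch (the variable name @roleIds is illustrative, not from the original post):
DECLARE @roleIds UniqueIdentifiers;
INSERT INTO @roleIds (Id) VALUES ('Pub'), ('RegUser'), ('ServiceAdmin');
SELECT * FROM @roleIds;
-- to pass it to a stored procedure, declare the parameter as:
--   @roleIds dbo.UniqueIdentifiers READONLY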