PostgreSQL 11.16: equality comparison on varchar with two space characters not working anymore

We've been seeing odd behavior in our SELECT statements since we updated from Postgres 11.12 to 11.16.
We select rows using a WHERE condition on a simple varchar column. The value we're looking for contains two consecutive space characters, something like this: WORD1  WORD2.
Our query for finding the necessary data looks like this:
SELECT * FROM table WHERE name = 'WORD1  WORD2';
While this query used to work fine (and still does on our older test systems), it no longer finds the given entry in our production environment. What does seem to work is really wild:
-- LIKE with % after word working
SELECT * FROM table WHERE name LIKE 'WORD1  WORD2%';
-- LIKE with % before word working
SELECT * FROM table WHERE name LIKE '%WORD1  WORD2';
-- ILIKE without % working
SELECT * FROM table WHERE name ILIKE 'WORD1  WORD2';
-- IS NOT DISTINCT FROM working
SELECT * FROM table WHERE name IS NOT DISTINCT FROM 'WORD1  WORD2';
-- standard equals (=) NOT working
SELECT * FROM table WHERE name = 'WORD1  WORD2';
We double-checked for whitespace characters before and after the visible string, and even rewrote the value as plain text to make sure it contains no strange non-visible characters. Nothing worked. We also checked other entries in the same table with only a single space between words, like WORD1 WORD2, and those still work. So from our perspective the problem has something to do with equals and two consecutive spaces in the varchar.
We are accessing the database through DbVisualizer, through a Java 8 driver, and through the psql shell, all with the same result: equals does not find the required entry.
Any help is greatly appreciated; we're a little out of ideas right now.

Related

How would I diagnose what error seems to lead to non-functional underscore wildcard queries in Postgresql 15?

I am working through a quick refresher ('SQL Handbook' by Flavio Copes), and any LIKE or ILIKE query I use with the underscore wildcard returns no results.
The table is created as such:
CREATE TABLE people (
names CHAR(20)
);
INSERT INTO people VALUES ('Joe'), ('John'), ('Johanna'), ('Zoe');
Given this table, I use the following query:
SELECT * FROM people WHERE names LIKE '_oe';
I expect it to return
    names
1   Joe
2   Zoe
Instead, it returns
    names
The install is PostgreSQL 15 (x64), pgAdmin 4, and PostGIS v3.3.1
Using char(20) means all strings are exactly 20 chars long, being padded with spaces out to that length. The spaces make it not match the pattern, as there is nothing in the pattern to accommodate spaces at the end.
If you make the pattern be '_oe%' it would work. Or better yet, don't use char(20).
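To see the padding effect outside the database, here is a minimal Python sketch. It is not PostgreSQL's implementation: the `like` helper is a hypothetical stand-in that only understands `%` and `_` (no escape characters), and `ljust(20)` imitates the blank-padding that char(20) applies to stored values.

```python
import re

def like(value, pattern):
    """Rough stand-in for SQL LIKE: % matches any run of characters, _ exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # LIKE has to match the whole value, hence fullmatch.
    return re.fullmatch("".join(parts), value, re.DOTALL) is not None

# Imitate char(20): every value is blank-padded to exactly 20 characters.
names = [n.ljust(20) for n in ("Joe", "John", "Johanna", "Zoe")]

exact = [n.strip() for n in names if like(n, "_oe")]
with_wildcard = [n.strip() for n in names if like(n, "_oe%")]

print(exact)          # []
print(with_wildcard)  # ['Joe', 'Zoe']
```

The pattern '_oe' describes a three-character string, so none of the 20-character padded values can match it; the trailing % absorbs the padding.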

Alphanumeric sorting without any pattern on the strings

I've got a Postgres ORDER BY issue with the following table:
em_code name
EM001 AAA
EM999 BBB
EM1000 CCC
To insert a new record into the table, I do the following:
1. Select the last record with SELECT * FROM employees ORDER BY em_code DESC
2. Strip the alphabetic characters from em_code using a regular expression and store them in ec_alpha
3. Cast the remaining part to an integer, ec_num
4. Increment it by one: ec_num++
5. Pad with sufficient zeros and prefix ec_alpha again
When em_code reaches EM1000, the above algorithm fails.
The first step returns EM999 instead of EM1000, so it generates EM1000 again as the new em_code, breaking the unique key constraint.
Any idea how to select EM1000?
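The failure can be reproduced outside the database. The following is only a sketch of the steps as described in the question: `next_code` and the padding width of 3 are assumptions for illustration, and Python's plain string comparison stands in for the text ordering of ORDER BY.

```python
import re

codes = ["EM001", "EM999", "EM1000"]

# Text ordering compares character by character, so "EM999" comes after
# "EM1000" because "9" > "1" at the third position.
last_by_text = max(codes)

def next_code(last, width=3):
    """Split into prefix and number, increment, zero-pad, re-attach the prefix."""
    prefix, digits = re.fullmatch(r"(\D*)(\d+)", last).groups()
    return f"{prefix}{int(digits) + 1:0{width}d}"

print(last_by_text)             # EM999
print(next_code(last_by_text))  # EM1000, which is already in the table
```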
Since Postgres 10, it is possible to specify an ICU collation that sorts columns containing numbers naturally.
https://www.postgresql.org/docs/10/collation.html
-- First create a collation with numeric sorting
CREATE COLLATION numeric (provider = icu, locale = 'en@colNumeric=yes');
-- Alter table to use the collation
ALTER TABLE "employees" ALTER COLUMN "em_code" type TEXT COLLATE numeric;
Now just query as you would otherwise.
SELECT * FROM employees ORDER BY em_code
On my data, I get results in this order (note that it also sorts foreign numerals):
Value
0
0001
001
1
06
6
13
۱۳
14
One approach you can take is to create a naturalsort function for this. Here's an example, written by Postgres legend RhodiumToad.
create or replace function naturalsort(text)
returns bytea language sql immutable strict as $f$
select string_agg(convert_to(coalesce(r[2], length(length(r[1])::text) || length(r[1])::text || r[1]), 'SQL_ASCII'),'\x00')
from regexp_matches($1, '0*([0-9]+)|([^0-9]+)', 'g') r;
$f$;
Source: http://www.rhodiumtoad.org.uk/junk/naturalsort.sql
To use it simply call the function in your order by:
SELECT * FROM employees ORDER BY naturalsort(em_code) DESC
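For readers who want to see what kind of key such a function produces, here is a rough Python port of the same idea. It is a sketch, not a verified equivalent: `naturalsort_key` is a name chosen here, and UTF-8 is used for the byte encoding where the SQL uses convert_to(..., 'SQL_ASCII').

```python
import re

def naturalsort_key(text):
    """Build a byte string whose plain byte order gives a natural ordering."""
    parts = []
    for digits, other in re.findall(r"0*([0-9]+)|([^0-9]+)", text):
        if other:
            parts.append(other)
        else:
            # Prefix each number (leading zeros already dropped by the regex)
            # with the length of its length and its length, so that shorter
            # numbers sort before longer ones.
            n = str(len(digits))
            parts.append(f"{len(n)}{n}{digits}")
    return b"\x00".join(p.encode("utf-8") for p in parts)

codes = ["EM999", "EM1000", "EM001"]
newest_first = sorted(codes, key=naturalsort_key, reverse=True)
print(newest_first)  # ['EM1000', 'EM999', 'EM001']
```

Because leading zeros are dropped, values such as EM1 and EM001 end up with the same key.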
The reason is that the string sorts alphabetically (instead of numerically like you would want it) and 1 sorts before 9.
You could solve it like this:
SELECT * FROM employees
ORDER BY substring(em_code, 3)::int DESC;
It would be more efficient to drop the redundant 'EM' from your em_code - if you can - and save an integer number to begin with.
Answer to question in comment
To strip any and all non-digits from a string:
SELECT regexp_replace(em_code, E'\\D','','g')
FROM employees;
\D is the regular expression class-shorthand for "non-digits".
'g' as 4th parameter is the "globally" switch to apply the replacement to every occurrence in the string, not just the first.
After replacing every non-digit with the empty string, only digits remain.
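The same strip-then-cast idea, sketched in Python for comparison. The helper name `digits_only` is made up here; `re.sub` replaces every match by default, which corresponds to the 'g' flag above.

```python
import re

def digits_only(code):
    # \D matches any non-digit character; all occurrences are removed.
    return re.sub(r"\D", "", code)

codes = ["EM001", "EM999", "EM1000"]

# Order by the numeric part instead of by the text.
newest_first = sorted(codes, key=lambda c: int(digits_only(c)), reverse=True)

print(digits_only("EM1000"))  # 1000
print(newest_first)           # ['EM1000', 'EM999', 'EM001']
```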
This always comes up in questions and in my own development, and I finally got tired of tricky ways of doing this, so I implemented it as a PostgreSQL extension:
https://github.com/Bjond/pg_natural_sort_order
It's free to use, MIT license.
Basically it just normalizes the numerics (zero pre-pending numerics) within strings such that you can create an index column for full-speed sorting au naturel. The readme explains.
The advantage is you can have a trigger do the work and not your application code. It will be calculated at machine-speed on the PostgreSQL server and migrations adding columns become simple and fast.
You can use just this line:
ORDER BY length(substring(em_code FROM '[0-9]+')), em_code
I wrote about this in detail in this related question:
Humanized or natural number sorting of mixed word-and-number strings
(I'm posting this answer as a useful cross-reference only, so it's community wiki).
I came up with something slightly different.
The basic idea is to create an array of tuples (integer, string) and then order by these. The magic number 2147483647 is int32_max, used so that strings are sorted after numbers.
ORDER BY ARRAY(
SELECT ROW(
CAST(COALESCE(NULLIF(match[1], ''), '2147483647') AS INTEGER),
match[2]
)
FROM REGEXP_MATCHES(col_to_sort_by, '(\d*)|(\D*)', 'g')
AS match
)
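A Python sketch of the same (integer, string) idea, for illustration only. `tuple_key` is a hypothetical name, and the regex uses + rather than * so that empty matches do not have to be handled; that is a deviation from the SQL above.

```python
import re

INT32_MAX = 2147483647  # placeholder number for text chunks, so they sort after real numbers

def tuple_key(value):
    """Split a string into (number, text) pairs that compare naturally."""
    key = []
    for digits, text in re.findall(r"(\d+)|(\D+)", value):
        key.append((int(digits) if digits else INT32_MAX, text))
    return key

codes = ["EM999", "EM1000", "EM001"]
ordered = sorted(codes, key=tuple_key)
print(ordered)  # ['EM001', 'EM999', 'EM1000']
```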
I thought of another way of doing this that uses less database storage than padding and saves time compared with calculating on the fly.
https://stackoverflow.com/a/47522040/935122
I've also put it on GitHub
https://github.com/ccsalway/dbNaturalSort
The following solution is a combination of various ideas presented in another question, as well as some ideas from the classic solution:
create function natsort(s text) returns text immutable language sql as $$
select string_agg(r[1] || E'\x01' || lpad(r[2], 20, '0'), '')
from regexp_matches(s, '(\D*)(\d*)', 'g') r;
$$;
The design goals of this function were simplicity and pure string operations (no custom types and no arrays), so it can easily be used as a drop-in solution, and is trivial to be indexed over.
Note: If you expect numbers with more than 20 digits, you'll have to replace the hard-coded maximum length 20 in the function with a suitable larger length. Note that this will directly affect the length of the resulting strings, so don't make that value larger than needed.
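To make the padding approach concrete, here is a rough Python counterpart. It is a sketch under assumptions: `natsort_key` is a name chosen here, empty regex matches are skipped explicitly, and `rjust` (unlike lpad) never truncates numbers longer than the width.

```python
import re

def natsort_key(s, width=20):
    """Concatenate text chunks and zero-padded number chunks into one sortable string."""
    pieces = []
    for m in re.finditer(r"(\D*)(\d*)", s):
        text, digits = m.groups()
        if not text and not digits:
            continue  # skip the empty match at the end of the string
        pieces.append(text + "\x01" + digits.rjust(width, "0"))
    return "".join(pieces)

codes = ["EM999", "EM1000", "EM001"]
ordered = sorted(codes, key=natsort_key)
print(ordered)  # ['EM001', 'EM999', 'EM1000']
```

Since the key is an ordinary string, it can be stored or indexed like any other text value, which is the drop-in property described above.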

How to remove spacing in SQL

I have data in DB2 that I want to insert into SQL.
The DB2 data looks like this:
select char('AAA ') as test from Table_1
But when I select it in SQL after doing the insert, the data looks like this:
select test from Table_1
result :
test
------
AAA
Why is the space character shown as a box character? How do I fix this so that the space character is read as a space?
Or is there a setting I need to change, or do I have to use a parameter?
I'm using AS400 and DataStage.
Thank you.
Datastage appends pad characters so you know that there are spaces there. The pad character is 0x00 (NUL) by default and that's what you're seeing.
Research the APT_STRING_PADCHAR environment variable; you can set it to something else if you want.
The 0x00 characters are not actually in your database. The short answer is, you can safely ignore it.
When you said:
select char('AAA ') as test from Table_1
You were not actually showing any data from the table. Instead you were showing an expression casting a constant AAA as a character value, and giving that result column the name test which coincidentally seems to be the name of a column in the table, although that coincidence doesn't matter here.
Then your 2nd statement does show the contents of the database column.
select test from Table_1
Find out what the hexadecimal value actually is.
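One way to do that once the value has been fetched into a client program is to dump its bytes. A small Python sketch follows; the two sample values are made up for illustration, and `hex_dump` is a helper name chosen here.

```python
def hex_dump(value, encoding="latin-1"):
    """Return the bytes of a string as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in value.encode(encoding))

nul_padded = "AAA\x00\x00"   # hypothetical value padded with NUL (0x00)
space_padded = "AAA  "       # hypothetical value padded with real spaces (0x20)

print(hex_dump(nul_padded))    # 41 41 41 00 00
print(hex_dump(space_padded))  # 41 41 41 20 20
```

Both values can look alike in a grid or a terminal, but 00 versus 20 in the dump tells them apart.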

tsql using like with wildcard and trailing space?

I cannot get the like statement to work with space and trailing wildcard.
My query goes as follows:
select * from Table where Field like 'Desc_%'
The data is space-delimited such as, Desc Top, Desc Bottom and so on. The query works when I use the pattern 'Desc_%' but not when I use the pattern 'Desc %'. The field is nvarchar(255).
Any ideas?
EDIT
Turns out the data was tab-delimited and when I copied a value from the 2008 Management Studio it converted the tab to space. Dumb mistake. I did like the [ ] tip so I marked it the answer. Thanks everyone, I'll remember not to trust the copy from the grid results.
Use brackets '[' & ']' to set up a single-character class to match. In your case the SQL should look like this: "select * from Table where Field like 'Desc[ ]%'"
EDIT: add sample, link
CREATE TABLE #findtest (mytext varchar(200) )
insert #findtest VALUES ('Desc r')
insert #findtest VALUES ('Descr')
select * from #findtest where mytext like 'Desc[ ]%'
DROP TABLE #findtest
(1 row(s) affected)
(1 row(s) affected)
mytext
--------
Desc r
(1 row(s) affected)
See this article.
Since an underscore is a single-character wildcard and percent is the multi-character wildcard, "%" and "_%" are nearly the same: "_%" requires at least one character, while "%" also matches none. It is as if you are asking for two consecutive wildcards. Not sure if I understand your question, but I am not surprised it does not behave the way you expect.
Consider explicitly stating that you want a space, using its ASCII value?
SELECT * FROM Table WHERE Field Like 'Desc' + CHAR(32) + '%'
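Since the data in the question turned out to be tab-delimited, an explicit CHAR(32) would not have matched either. A quick Python sketch of how to check which character is really there, using made-up sample rows:

```python
# Hypothetical sample data: the first two values contain a tab, the last one a space.
rows = ["Desc\tTop", "Desc\tBottom", "Desc Left"]

for row in rows:
    separator = row[4]  # the character right after "Desc"
    starts_with_space = row.startswith("Desc" + chr(32))
    print(repr(row), ord(separator), starts_with_space)

# 'Desc\tTop' 9 False
# 'Desc\tBottom' 9 False
# 'Desc Left' 32 True
```

Code point 9 is a tab and 32 is a space; printing the code point avoids relying on a copy-and-paste from a results grid.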

TSQL Prefixing String Literal on Insert - Any Value to This, or Redundant?

I just inherited a project that has code similar to the following (rather simple) example:
DECLARE @Demo TABLE
(
    Quantity INT,
    Symbol NVARCHAR(10)
)
INSERT INTO @Demo (Quantity, Symbol)
SELECT 127, N'IBM'
My interest is with the N before the string literal.
I understand that the prefix N is to specify encoding (in this case, Unicode). But since the select is just for inserting into a field that is clearly already Unicode, wouldn't this value be automatically upcast?
I've run the code without the N and it appears to work, but am I missing something that the previous programmer intended? Or was the N an oversight on his/her part?
I expect behavior similar to when I pass an int to a decimal field (auto-upcast). Can I get rid of those Ns?
Your test is not really valid. Try something like a Chinese character instead; as far as I remember, if you don't prefix the literal, the correct character will not be inserted.
For example, the first one shows a question mark while the bottom one shows a square:
select '作'
select N'作'
A better example, even here the output is not the same
declare @v nvarchar(50), @v2 nvarchar(50)
select @v = '作', @v2 = N'作'
select @v, @v2
Since this looks like a stock table, why are you using Unicode at all? Are there even symbols that need Unicode? I have never seen any, and that includes ISINs, CUSIPs and SEDOLs.
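A rough Python analogy for what happens to '作' without the N prefix. It assumes the database's default code page is Windows-1252, which the thread does not state, and `as_varchar_literal` is a hypothetical helper, not anything SQL Server provides.

```python
def as_varchar_literal(text, code_page="cp1252"):
    """Round-trip text through a single-byte code page, as a non-N literal would be."""
    # Characters the code page cannot represent are replaced with "?".
    return text.encode(code_page, errors="replace").decode(code_page)

print(as_varchar_literal("作"))   # ?
print(as_varchar_literal("IBM"))  # IBM
```

Plain ASCII such as IBM survives the round trip unchanged, which is why dropping the N appeared to work in the original test.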
Yes, SQL Server will automatically convert (widen, cast down) varchar to nvarchar, so you can remove the N in this case. Of course, if you're specifying a string literal where the characters aren't actually present in the database's default collation, then you need it.
It's like you can suffix a number with "L" in C et al to indicate it's a long literal instead of an int. Writing N'IBM' is either being precise or a slave to habit, depending on your point of view.
One trap for the unwary: nvarchar doesn't get automatically converted to varchar, and this can be an issue if your application is all Unicode and your database isn't. For example, we had this with the jTDS JDBC driver, which bound all parameter values as nvarchar, resulting in statements effectively like this:
select * from purchase where purchase_reference = N'AB1234'
(where purchase_reference was a varchar column)
Since the automatic conversions are only one way, that became:
select * from purchase where CONVERT(NVARCHAR, purchase_reference) = N'AB1234'
and therefore the index of purchase_reference wasn't used.
By contrast, the reverse is fine: if purchase_reference was an nvarchar, and an application passed in a varchar parameter, then the rewritten query:
select * from purchase where purchase_reference = CONVERT(NVARCHAR, 'AB1234')
would be fine. In the end we had to disable binding parameters as Unicode, hence causing a raft of i18n problems that were considered less serious.