How would I diagnose what error seems to lead to non-functional underscore wildcard queries in Postgresql 15? - postgresql

I am working through a quick refresher ('SQL Handbook' by Flavio Copes), and any LIKE or ILIKE query I use with the underscore wildcard returns no results.
The table is created as such:
CREATE TABLE people (
names CHAR(20)
);
INSERT INTO people VALUES ('Joe'), ('John'), ('Johanna'), ('Zoe');
Given this table, I use the following query:
SELECT * FROM people WHERE names LIKE '_oe';
I expect it to return
names
1
Joe
2
Zoe
Instead, it returns
names
The install is PostgreSQL 15 (x64), pgAdmin 4, and PostGIS v3.3.1

Using char(20) means all strings are exactly 20 chars long, being padded with spaces out to that length. The spaces make it not match the pattern, as there is nothing in the pattern to accommodate spaces at the end.
If you make the pattern be '_oe%' it would work. Or better yet, don't use char(20).

Related

Postgresql 11.16 equal comparison on varchar with two space characters not working anymore

We're experiencing a weird behavior on our select statements since we've updated from postgres 11.12 to 11.16.
We are selecting rows using a WHERE condition on a pretty simple Varchar column. The value that were looking for in the condition does contain two consecutive SPACE characters, something like this: WORD1 WORD2.
Our query for finding the necessary data looks like this:
SELECT * FROM table WHERE name = 'WORD1 WORD2';
While this query used to be working fine (and still does on our older test systems), right now it does not find the given entry in our productive environment. What seems to be working though is really wild:
-- LIKE with % after word working
SELECT * FROM table WHERE name LIKE 'WORD1 WORD2%';
-- LIKE with % before word working
SELECT * FROM table WHERE name LIKE '%WORD1 WORD2';
-- ILIKE without % working
SELECT * FROM table WHERE name ILIKE 'WORD1 WORD2';
-- IS NOT DISTINCT FROM working
SELECT * FROM table WHERE name IS NOT DISTINCT FROM 'WORD1 WORD2';
-- standard equals (=) NOT working
SELECT * FROM table WHERE name = 'WORD1 WORD2';
We double checked white space characters before and after the visible string, even updated the value with plain text again to make sure no strange non visible stuff is found in the value. Nothing worked. We also checked other entries inside of the same table with only a single space between words like WORD1 WORD2 which somehow still seems to be working, so from our perspective it seems to have something to do with equals and two white spaces in the varchar.
We are accessing the database through DB visualizer, through a Java 8 driver and through psql shell, all with the same result that equals does not seem to find the required entry.
Any help is greatly appreciate, we're a little out if ideas right now.

Alphanumeric sorting without any pattern on the strings [duplicate]

I've got a Postgres ORDER BY issue with the following table:
em_code name
EM001 AAA
EM999 BBB
EM1000 CCC
To insert a new record to the table,
I select the last record with SELECT * FROM employees ORDER BY em_code DESC
Strip alphabets from em_code usiging reg exp and store in ec_alpha
Cast the remating part to integer ec_num
Increment by one ec_num++
Pad with sufficient zeors and prefix ec_alpha again
When em_code reaches EM1000, the above algorithm fails.
First step will return EM999 instead EM1000 and it will again generate EM1000 as new em_code, breaking the unique key constraint.
Any idea how to select EM1000?
Since Postgres 9.6, it is possible to specify a collation which will sort columns with numbers naturally.
https://www.postgresql.org/docs/10/collation.html
-- First create a collation with numeric sorting
CREATE COLLATION numeric (provider = icu, locale = 'en#colNumeric=yes');
-- Alter table to use the collation
ALTER TABLE "employees" ALTER COLUMN "em_code" type TEXT COLLATE numeric;
Now just query as you would otherwise.
SELECT * FROM employees ORDER BY em_code
On my data, I get results in this order (note that it also sorts foreign numerals):
Value
0
0001
001
1
06
6
13
۱۳
14
One approach you can take is to create a naturalsort function for this. Here's an example, written by Postgres legend RhodiumToad.
create or replace function naturalsort(text)
returns bytea language sql immutable strict as $f$
select string_agg(convert_to(coalesce(r[2], length(length(r[1])::text) || length(r[1])::text || r[1]), 'SQL_ASCII'),'\x00')
from regexp_matches($1, '0*([0-9]+)|([^0-9]+)', 'g') r;
$f$;
Source: http://www.rhodiumtoad.org.uk/junk/naturalsort.sql
To use it simply call the function in your order by:
SELECT * FROM employees ORDER BY naturalsort(em_code) DESC
The reason is that the string sorts alphabetically (instead of numerically like you would want it) and 1 sorts before 9.
You could solve it like this:
SELECT * FROM employees
ORDER BY substring(em_code, 3)::int DESC;
It would be more efficient to drop the redundant 'EM' from your em_code - if you can - and save an integer number to begin with.
Answer to question in comment
To strip any and all non-digits from a string:
SELECT regexp_replace(em_code, E'\\D','','g')
FROM employees;
\D is the regular expression class-shorthand for "non-digits".
'g' as 4th parameter is the "globally" switch to apply the replacement to every occurrence in the string, not just the first.
After replacing every non-digit with the empty string, only digits remain.
This always comes up in questions and in my own development and I finally tired of tricky ways of doing this. I finally broke down and implemented it as a PostgreSQL extension:
https://github.com/Bjond/pg_natural_sort_order
It's free to use, MIT license.
Basically it just normalizes the numerics (zero pre-pending numerics) within strings such that you can create an index column for full-speed sorting au naturel. The readme explains.
The advantage is you can have a trigger do the work and not your application code. It will be calculated at machine-speed on the PostgreSQL server and migrations adding columns become simple and fast.
you can use just this line
"ORDER BY length(substring(em_code FROM '[0-9]+')), em_code"
I wrote about this in detail in this related question:
Humanized or natural number sorting of mixed word-and-number strings
(I'm posting this answer as a useful cross-reference only, so it's community wiki).
I came up with something slightly different.
The basic idea is to create an array of tuples (integer, string) and then order by these. The magic number 2147483647 is int32_max, used so that strings are sorted after numbers.
ORDER BY ARRAY(
SELECT ROW(
CAST(COALESCE(NULLIF(match[1], ''), '2147483647') AS INTEGER),
match[2]
)
FROM REGEXP_MATCHES(col_to_sort_by, '(\d*)|(\D*)', 'g')
AS match
)
I thought about another way of doing this that uses less db storage than padding and saves time than calculating on the fly.
https://stackoverflow.com/a/47522040/935122
I've also put it on GitHub
https://github.com/ccsalway/dbNaturalSort
The following solution is a combination of various ideas presented in another question, as well as some ideas from the classic solution:
create function natsort(s text) returns text immutable language sql as $$
select string_agg(r[1] || E'\x01' || lpad(r[2], 20, '0'), '')
from regexp_matches(s, '(\D*)(\d*)', 'g') r;
$$;
The design goals of this function were simplicity and pure string operations (no custom types and no arrays), so it can easily be used as a drop-in solution, and is trivial to be indexed over.
Note: If you expect numbers with more than 20 digits, you'll have to replace the hard-coded maximum length 20 in the function with a suitable larger length. Note that this will directly affect the length of the resulting strings, so don't make that value larger than needed.

How to find all tables whose name ends with a certain suffix in the Firebird system tables

I'm trying to run the following query to get a list of table names that match a pattern. I have tables in my db that has names ends with T, but the following query doesn't work. It doesn't return me any table names. If I get rid of T, only leave % in the quotes, it gives me all the table names in the db.
select rdb$relation_name
from rdb$relations
where rdb$relation_name like '%T';
The problem is that the datatype of RDB$RELATION_NAME is CHAR(31) (CHAR(63) in Firebird 4), which means it is padded with spaces. Comparisons with LIKE do not ignore trailing spaces, contrary to equality comparison which does ignore trailing spaces.
For correct comparisons you can TRIM the trailing spaces from the value:
where trim(trailing from rdb$relation_name) like '%T'
or use a SQL regular expression with SIMILAR TO:
where rdb$relation_name similar to '%T *'
Which is similar to the like, but specifies it is followed by zero or more spaces.

Postgresql constraint to check for non-ascii characters

I have a Postgresql 9.3 database that is encoded 'UTF8'. However, there is a column in database that should never contain anything but ASCII. And if non-ascii gets in there, it causes a problem in another system that I have no control over. Therefore, I want to add a constraint to the column. Note: I already have a BEFORE INSERT trigger - so that might be a good place to do the check.
What's the best way to accomplish this in PostgreSQL?
You can define ASCII as ordinal 1 to 127 for this purpose, so the following query will identify a string with "non-ascii" values:
SELECT exists(SELECT 1 from regexp_split_to_table('abcdéfg','') x where ascii(x) not between 1 and 127);
but it's not likely to be super-efficient, and the use of subqueries would force you to do it in a trigger rather than a CHECK constraint.
Instead I'd use a regular expression. If you want all printable characters then you can use a range in a check constraint, like:
CHECK (my_column ~ '^[ -~]*$')
this will match everything from the space to the tilde, which is the printable ASCII range.
If you want all ASCII, printable and nonprintable, you can use byte escapes:
CHECK (my_column ~ '^[\x00-\x7F]*$')
The most strictly correct approach would be to convert_to(my_string, 'ascii') and let an exception be raised if it fails ... but PostgreSQL doesn't offer an ascii (i.e. 7-bit) encoding, so that approach isn't possible.
Use a CHECK constraint built around a regular expression.
Assuming that you mean a certain column should never contain anything but the lowercase letters from a to z, the uppercase letters from A to Z, and the numbers 0 through 9, something like this should work.
alter table your_table
add constraint allow_ascii_only
check (your_column ~ '^[a-zA-Z0-9]+$');
This is what people usually mean when they talk about "only ASCII" with respect to database columns, but ASCII also includes glyphs for punctuation, arithmetic operators, etc. Characters you want to allow go between the square brackets.

Is name a special keyword in PostgreSQL?

I am using Ubuntu and PostgreSql 8.4.9.
Now, for any table in my database, if I do select table_name.name from table_name, it shows a result of concatenated columns for each row, although I don't have any name column in the table. For the tables which have name column, no issue. Any idea why?
My results are like this:
select taggings.name from taggings limit 3;
---------------------------------------------------------------
(1,4,84,,,PlantCategory,soil_pref_tags,"2010-03-18 00:37:55")
(2,5,84,,,PlantCategory,soil_pref_tags,"2010-03-18 00:37:55")
(3,6,84,,,PlantCategory,soil_pref_tags,"2010-03-18 00:37:55")
(3 rows)
select name from taggings limit 3;
ERROR: column "name" does not exist
LINE 1: select name from taggings limit 3;
This is a known confusing "feature" with a bit of history. Specifically, you could refer to tuples from the table as a whole with the table name, and then appending .name would invoke the name function on them (i.e. it would be interpreted as select name(t) from t).
At some point in the PostgreSQL 9 development, Istr this was cleaned up a bit. You can still do select t from t explicitly to get the rows-as-tuples effect, but you can't apply a function in the same way. So on PostgreSQL 8.4.9, this:
create table t(id serial primary key, value text not null);
insert into t(value) values('foo');
select t.name from t;
produces the bizarre:
name
---------
(1,foo)
(1 row)
but on 9.1.1 produces:
ERROR: column t.name does not exist
LINE 1: select t.name from t;
^
as you would expect.
So, to specifically answer your question: name is a standard type in PostgreSQL (used in the catalogue for table names etc) and also some standard functions to convert things to the name type. It's not actually reserved, just the objects that exist called that, plus some historical strange syntax, made things confusing; and this has been fixed by the developers in recent versions.
According to the PostgreSQL documentation, name is a "non-reserved" keyword in PostgreSQL, SQL:2003, SQL:1999, or SQL-92.
SQL distinguishes between reserved and non-reserved key words. According to the standard, reserved key words are the only real key words; they are never allowed as identifiers. Non-reserved key words only have a special meaning in particular contexts and can be used as identifiers in other contexts. Most non-reserved key words are actually the names of built-in tables and functions specified by SQL. The concept of non-reserved key words essentially only exists to declare that some predefined meaning is attached to a word in some contexts.
The suggested fix when using keywords is:
As a general rule, if you get spurious parser errors for commands that contain any of the listed key words as an identifier you should try to quote the identifier to see if the problem goes away.