Alphanumeric Sorting in PostgreSQL

I have this table with a character varying column in Postgres 9.6:
id | column
---+-----------
 1 | IR ABC-1
 2 | IR ABC-2
 3 | IR ABC-10
I see some solutions typecasting the column as bytea.
select * from table order by column::bytea;
But it always results in:
id | column
---+-----------
 1 | IR ABC-1
 2 | IR ABC-10
 3 | IR ABC-2
I don't know why '10' always comes before '2'. How do I sort this table, assuming the basis for ordering is the last whole number of the string, regardless of the characters before that number?

When sorting character data types, collation rules apply - unless you work with locale "C", which sorts characters by their byte values. Applying collation rules may or may not be desirable; it makes sorting more expensive in any case. If you want to sort without collation rules, don't cast to bytea, use COLLATE "C" instead:
SELECT * FROM table ORDER BY column COLLATE "C";
However, this does not yet solve the problem with numbers in the string you mention: both text and bytea compare character by character, and since '1' sorts before '2', 'IR ABC-10' lands before 'IR ABC-2'. Split the string and sort the numeric part as a number.
SELECT *
FROM table
ORDER BY split_part(column, '-', 2)::numeric;
Or, if all your numbers fit into bigint or even integer, use that instead (cheaper).
I ignored the leading part because you write:
... the basis for ordering is the last whole number of the string, regardless of what the character before that number is.
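If you ever do need the leading part as a tie-breaker as well, a minimal sketch (assuming exactly one '-' per value and integer suffixes):

SELECT *
FROM table
ORDER BY split_part(column, '-', 1) COLLATE "C"
       , split_part(column, '-', 2)::int;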
Related:
Alphanumeric sorting with PostgreSQL
Split comma separated column data into additional columns
What is the impact of LC_CTYPE on a PostgreSQL database?
Typically, it's best to save distinct parts of a string in separate columns, each with its proper data type, to avoid any such confusion.
And if the leading string is identical for all rows, consider just dropping the redundant noise. You can always use a VIEW to prepend a string for display, or do it on the fly, cheaply.
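A minimal sketch of that idea (table, column, and view names are made up for illustration):

-- store the parts separately, with proper data types
CREATE TABLE item (
  id     serial  PRIMARY KEY,
  prefix text    NOT NULL,  -- e.g. 'IR ABC'
  nr     integer NOT NULL   -- e.g. 1, 2, 10
);

-- rebuild the display value on the fly, cheaply
CREATE VIEW item_display AS
SELECT id, prefix || '-' || nr AS label
FROM item;

Sorting then becomes a plain ORDER BY prefix, nr with no casting at all.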

As mentioned in the comments, split the string and cast the integer part:
select *
from table
cross join lateral regexp_split_to_array(column, '-') r (a)
order by a[1], a[2]::integer;

Related

How to get max length (in bytes) of a variable-length column?

I want to get the max length (in bytes) of a variable-length column. My columns have the following definitions:
shortname character varying(35) NOT NULL
userid integer NOT NULL
columncount smallint NOT NULL
I tried to retrieve some info from the pg_attribute table, but the attlen column has a value of -1 for all variable-length columns. I also tried the pg_column_size function, but it doesn't accept the name of a column as an input parameter.
It can be easily done in SQL Server.
Are there any other ways to get the value I'm looking for?
You will need a CASE expression that checks pg_attribute.attlen and then calculates the maximum size in bytes depending on that. To get the max size for a varchar column, you can "steal" the expression used in information_schema.columns.character_octet_length for varchar or char columns.
Something along these lines:
select a.attname,
       t.typname,
       case
          when a.attlen <> -1 then attlen
          when t.typname in ('bytea', 'text') then pg_size_bytes('1GB')
          when t.typname in ('varchar', 'char') then information_schema._pg_char_octet_length(information_schema._pg_truetypid(a.*, t.*), information_schema._pg_truetypmod(a.*, t.*))
       end as max_bytes
from pg_attribute a
  join pg_type t on a.atttypid = t.oid
where a.attrelid = 'stuff.test'::regclass
  and a.attnum > 0
  and not a.attisdropped;
Note that this won't return a proper size for numeric, as that is also a variable-length type. The documentation says: "The actual storage requirement is two bytes for each group of four decimal digits, plus three to eight bytes overhead." (A 20-digit numeric, for example, needs 2 × 5 = 10 bytes plus that overhead.)
As a side note: this seems an extremely strange thing to do, especially given your mention of temp tables in stored procedures. More often than not, temp tables are not needed in Postgres. Instead of blindly copying the old approach that might have worked well in SQL Server, you should understand how Postgres works and change the approach to match best practices in Postgres.
I have seen many migrations fail or deliver mediocre performance because of the assumption that the best practices for "System A" can be applied without any change to "System B". You need to migrate your mindset as well.
If this checks the columns of a temp table, then why not simply check the actual size of the column values using pg_column_size()?
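A minimal sketch of that simpler approach, using the shortname column from the question (my_temp_table is a made-up name):

SELECT max(pg_column_size(shortname)) AS max_bytes_used
FROM my_temp_table;

This returns the largest size actually stored, rather than the theoretical maximum of the type.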

Convert a varchar column to integer in Redshift

Is there a way in Amazon Redshift to convert a varchar column (with values such as A, B, D, M) to integer (1 for A, 2 for B, 3 for C, and so on)? I know Teradata has something like ASCII(), but that doesn't work in Redshift.
Note: My goal is to convert the varchar columns to a number in my query and compare those two columns to see if the numbers are the same or different.
Postgres:
SELECT ascii(upper(t.letter)) - 64
FROM table t
Explanation:
upper() converts the input to capital letters (capital and non-capital letters have different ASCII values)
ascii() converts a letter to its ASCII code; the capital letters begin at number 65
subtracting 64 shifts the ASCII starting point from 65 down to 1
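For example, for the letter 'd':

SELECT ascii(upper('d')) - 64;  -- returns 4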
Redshift:
The ascii() function is marked as deprecated on Redshift (https://docs.aws.amazon.com/redshift/latest/dg/c_SQL_functions_leader_node_only.html)
So one possible (and more pragmatic) solution is to use a fixed alphabet string and return the index of a given letter within it:
SELECT letter,
       strpos('ABCDEFGHIJKLMNOPQRSTUVWXYZ', upper(t.letter))
FROM table t
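Given the stated goal of comparing two such columns, a sketch along the same lines (letter_a and letter_b are hypothetical column names):

SELECT letter_a,
       letter_b,
       strpos('ABCDEFGHIJKLMNOPQRSTUVWXYZ', upper(letter_a))
     = strpos('ABCDEFGHIJKLMNOPQRSTUVWXYZ', upper(letter_b)) AS same_number
FROM table t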

PostgreSQL query on a text column ignoring special characters

I have a table which contains a text column, say vehicle number.
Now I want to query the table for fields which contain a particular vehicle number.
While matching I do not want to consider non-alphanumeric characters.
Example: the query condition DEL123 should match DEL-123, DEL/123, DEL#123, etc.
If you know which characters to skip, put them as the second parameter of this translate() call (which is faster than regexp functions):
select *
from a_table
where translate(code, '-/#', '') = 'DEL123';
Otherwise, you can compare only alphanumeric characters using regexp_replace():
select *
from a_table
where regexp_replace(code, '[^[:alnum:]]', '', 'g') = 'DEL123';
#klin's answer is great, but it is not sargable, so when you're searching through millions of rows (maybe not your case, but perhaps someone else with a similar question), an anchored regular expression will likely perform much better.
The following can use an index on code, significantly reducing the number of rows tested:
select *
from a_table
where code ~ '^DEL[^[:alnum:]]*123$';
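If you do want the translate()/regexp_replace() form to be fast as well, an expression index is a possible workaround; a sketch, assuming the table and column from above:

CREATE INDEX a_table_code_alnum_idx
    ON a_table (regexp_replace(code, '[^[:alnum:]]', '', 'g'));

-- the query must repeat the exact same expression to use the index
SELECT *
FROM a_table
WHERE regexp_replace(code, '[^[:alnum:]]', '', 'g') = 'DEL123';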

Get substring from a column with delimiter and varying number of elements

A column in my table in Postgres has varchar values in the format: 'str1/str2' or 'str1/str2/str3' where str represents any string.
I want to write a select query that will return str2. I searched but couldn't find a suitable function.
Use split_part():
SELECT split_part(col, '/', 2) AS result
FROM tbl;
As Victoria pointed out, the index is 1-based.
Obviously, the delimiter needs to be unambiguous: it (/ in your example) cannot be part of a substring. (Unless that's to the right of what you extract, which is ignored anyway.)
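A quick demonstration with both formats from the question:

SELECT split_part('str1/str2', '/', 2);       -- 'str2'
SELECT split_part('str1/str2/str3', '/', 2);  -- 'str2'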
Related:
Split comma separated column data into additional columns

ltrim(rtrim(x)) leaves blanks on RTL content - does anyone know of a workaround?

I have a table [Company] with a column [Address3] defined as varchar(50).
I cannot control the values entered into that table, but I need to extract the values without leading and trailing spaces. I perform the following query:
SELECT DISTINCT RTRIM(LTRIM([Address3])) Address3 FROM [Company] ORDER BY Address3
The column contains both RTL and LTR values.
Most of the data is retrieved correctly, but SOME (not all) RTL values are returned with leading and/or trailing spaces.
I attempted the following query:
SELECT DISTINCT ltrim(rTRIM(ltrim(rTRIM([Address3])))) c, ltrim(rTRIM([Address3])) b, [Address3] a, rtrim(LTRIM([Address3])) Address3 FROM [Company] ORDER BY Address3
but it returned the same problem in all columns - does anyone have any idea what could cause it?
The rows that come back with extraneous spaces might contain a kind of space or invisible character the trim functions don't know about. The documentation doesn't even mention what is considered "a blank" (pretty damn sloppy if you ask me). Try taking one of those rows and looking at the characters one by one to see what they actually are.
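A sketch of that character-by-character inspection in T-SQL (@v is a hypothetical variable holding one suspicious value):

DECLARE @v varchar(50) = '  some bad value  ';

SELECT n,
       SUBSTRING(@v, n, 1)        AS ch,
       ASCII(SUBSTRING(@v, n, 1)) AS code
FROM (
    SELECT TOP (DATALENGTH(@v))
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects
) AS nums
ORDER BY n;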
Since you are using varchar, just do this to get the ASCII code of all the bad characters:
--identify the bad character
SELECT COUNT(*) AS CountOf,
       '>' + RIGHT(LTRIM(RTRIM(Address3)), 1) + '<' AS LastChar_Display,
       ASCII(RIGHT(LTRIM(RTRIM(Address3)), 1)) AS LastChar_ASCII
FROM Company
GROUP BY RIGHT(LTRIM(RTRIM(Address3)), 1)
ORDER BY 3 ASC
Do a one-time fix to the data to remove the bogus character, where xxxx is the ASCII value identified in the previous select:
--only one bad character found in previous query
UPDATE Company
SET Address3=REPLACE(Address3,CHAR(xxxx),'')
--multiple different bad characters found by previous query
UPDATE Company
SET Address3=REPLACE(REPLACE(Address3,CHAR(xxxx1),''),char(xxxx2),'')
If you have bogus characters in your data, remove them from the data once rather than every time you select it. You WILL have to add this REPLACE logic to all INSERTs and UPDATEs on this column to keep new data from containing the bogus characters.
If you can't alter the data, you can just select it this way:
SELECT
LTRIM(RTRIM(REPLACE(Address3,CHAR(xxxx),'')))
,LTRIM(RTRIM(REPLACE(REPLACE(Address3,CHAR(xxxx1),''),char(xxxx2),'')))
...