Convert a varchar column to integer in Redshift - postgresql

Is there a way in Amazon Redshift to convert a varchar column (with values such as A, B, D, M) to integer (1 for A, 2 for B, 3 for C, and so on)? I know Teradata has something like ASCII(), but that doesn't work in Redshift.
Note: My goal is to convert the varchar columns to numbers in my query and compare those two columns to see if the numbers are the same or different.

Postgres:
SELECT
ascii(upper(t.letter)) - 64
FROM
table t
Explanation:
upper() converts the input to capital letters (capital and lowercase letters have different ASCII codes).
ascii() converts a letter to its ASCII code. The capital letters begin at 65.
Subtracting 64 shifts the range so that A (ASCII code 65) maps to 1.
Redshift:
The ascii() function is marked as deprecated on Redshift (https://docs.aws.amazon.com/redshift/latest/dg/c_SQL_functions_leader_node_only.html)
So one possible (and more pragmatic) solution is to take a fixed alphabet string and return the position of a given letter in it:
SELECT
letter,
strpos('ABCDEFGHIJKLMNOPQRSTUVWXYZ', upper(t.letter))
FROM
table t
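Both mappings can be checked outside the database. The following Python sketch mirrors ascii(upper(x)) - 64 and the strpos lookup. The function names are made up for illustration, and it assumes single-character ASCII input.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def letter_to_number_ascii(letter: str) -> int:
    """Mirror of ascii(upper(letter)) - 64; only meaningful for A-Z / a-z."""
    return ord(letter.upper()) - 64

def letter_to_number_lookup(letter: str) -> int:
    """Mirror of strpos(ALPHABET, upper(letter)); 0 means 'not found'."""
    # str.find returns -1 when the letter is missing, so adding 1 yields
    # either the 1-based position or 0.
    return ALPHABET.find(letter.upper()) + 1

for ch in ["A", "b", "D", "M"]:
    print(ch, letter_to_number_ascii(ch), letter_to_number_lookup(ch))
```

One difference worth noting: for input outside A-Z the lookup variant returns 0, while the ASCII variant returns some other number, so the lookup is easier to validate.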

Related

Postgres large numeric value operations

I am trying some operations on a large numeric field, such as 2^89.
The Postgres numeric data type can store up to 131072 digits to the left of the decimal point and 16383 digits to the right of it.
I tried something like this and it worked:
select 0.037037037037037037037037037037037037037037037037037037037037037037037037037037037037037037037037037::numeric;
But when I apply an operator, it rounds off the value after 14 digits.
select (2^89)::numeric(40,0);
numeric
-----------------------------
618970019642690000000000000
(1 row)
I know the value from elsewhere is:
>>> 2**89
618970019642690137449562112
Why this strange behavior? It does not let me store numeric values beyond 14 digits in the database.
insert into x select (2^89-1)::numeric;
select * from x;
x
-----------------------------
618970019642690000000000000
(1 row)
Is there any way to circumvent this?
Thanks in advance.
You should not cast the result but one operand of the operation, to make clear that this is a numeric operation, not a floating-point one:
select (2^89::numeric)
Otherwise PostgreSQL resolves the ^ operator to its double precision variant, so the result is double precision too, which carries only about 15 significant digits. Your cast is applied to that already rounded result, so it cannot recover the lost digits.
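The size of the loss can be reproduced outside the database. This is a minimal Python sketch, assuming the cast from the floating-point result to numeric keeps 15 significant digits, which matches the output shown in the question. Python integers stand in for numeric here.

```python
from decimal import Decimal

# Python integers are arbitrary precision, so this is the exact value.
exact = 2 ** 89

# Assumption: mimic a float-to-numeric cast that keeps 15 significant digits.
as_float = float(exact)
rounded = int(Decimal(format(as_float, ".15g")))

print(exact)
print(rounded)
print(exact - rounded)  # the digits lost by going through floating point
```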

Alphanumeric Sorting in PostgreSQL

I have this table with a character varying column in Postgres 9.6:
id | column
------------
1 |IR ABC-1
2 |IR ABC-2
3 |IR ABC-10
I have seen some solutions that cast the column to bytea:
select * from table order by column::bytea.
But it always results in:
id | column
------------
1 |IR ABC-1
2 |IR ABC-10
3 |IR ABC-2
I don't know why '10' always comes before '2'. How do I sort this table, assuming the basis for ordering is the last whole number of the string, regardless of what the character before that number is?
When sorting character data types, collation rules apply - unless you work with locale "C", which sorts characters by their byte values. Applying collation rules may or may not be desirable. It makes sorting more expensive in any case. If you want to sort without collation rules, don't cast to bytea, use COLLATE "C" instead:
SELECT * FROM table ORDER BY column COLLATE "C";
However, this does not yet solve the problem with numbers in the string you mention. Split the string and sort the numeric part as number.
SELECT *
FROM table
ORDER BY split_part(column, '-', 2)::numeric;
Or, if all your numbers fit into bigint or even integer, use that instead (cheaper).
I ignored the leading part because you write:
... the basis for ordering is the last whole number of the string, regardless of what the character before that number is.
Related:
Alphanumeric sorting with PostgreSQL
Split comma separated column data into additional columns
What is the impact of LC_CTYPE on a PostgreSQL database?
Typically, it's best to save distinct parts of a string in separate columns as proper respective data types to avoid any such confusion.
And if the leading string is identical for all columns, consider just dropping the redundant noise. You can always use a VIEW to prepend a string for display, or do it on-the-fly, cheaply.
As suggested in the comments, split the string and cast the integer part:
select *
from
table
cross join lateral
regexp_split_to_array(column, '-') r (a)
order by a[1], a[2]::integer
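The difference between comparing the trailing number as text and as a number can be seen in a few lines of Python. This is only an illustration of the idea, not the SQL itself. The helper sort_key is hypothetical, and it assumes every value contains a '-' followed by digits only. Unlike the queries above, it splits at the last '-'.

```python
rows = ["IR ABC-1", "IR ABC-10", "IR ABC-2"]

def sort_key(value: str):
    # Split at the last '-' so the trailing number is isolated.
    prefix, _, number = value.rpartition("-")
    # Compare the prefix as text and the trailing part as an integer.
    return (prefix, int(number))

print(sorted(rows))                # text comparison of the whole string
print(sorted(rows, key=sort_key))  # numeric comparison of the trailing part
```

In the text comparison '1' sorts before '2' character by character, which is why 'IR ABC-10' lands before 'IR ABC-2'.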

Inserting a substring of column in Redshift

Hello, I am using Redshift, where I have a staging table and a base table. One of the columns (city) in my base table has data type varchar with length 100. When I insert the column value from the staging table into the base table, I want the value to be truncated to the leftmost 100 characters. Is this possible in Redshift?
INSERT into base_table(org_city) select substring(city,0,100) from staging_table;
I tried the above query but it failed. Any solutions, please?
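Try this! SUBSTRING takes a start position and a length (not an end position), and positions start at 1. So substring(city,0,100) returns at most 99 characters, not 101. To keep the leftmost 100 characters, start at position 1:
INSERT into base_table(org_city) select substring(city,1,100) from staging_table;
If the insert still fails, note that a Redshift VARCHAR(100) holds 100 bytes rather than 100 characters, so values containing multibyte characters can still exceed the column size.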
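It can help to count how many characters each SUBSTRING call actually returns. The sketch below emulates SUBSTRING(string, start, length) in Python. The helper sql_substring is hypothetical, and the handling of start positions below 1 follows PostgreSQL semantics, which I am assuming Redshift shares.

```python
def sql_substring(s: str, start: int, length: int) -> str:
    """Emulate SUBSTRING(s, start, length) with 1-based positions."""
    if length < 0:
        raise ValueError("negative substring length")
    first = max(start, 1)      # positions below 1 hold no characters
    last = start + length - 1  # last position covered by the window
    return s[first - 1:last] if last >= first else ""

city = "x" * 150  # a value longer than the target column
for start, length in [(0, 100), (0, 99), (1, 100)]:
    print(start, length, len(sql_substring(city, start, length)))
```

Under these assumptions, only the call starting at position 1 returns the full 100 characters.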

Split a column value into two columns in query output (DB2)

How can I split a column value into two values in the output? I need to have the numerals in one column and the letters in the other.
For Example 1
Existing
Column
========
678J
2345K
I need the output to be:
Column 1 Column 2
======== ========
678 J
2345 K
The existing column can have 4 or 5 characters, as shown in the example. There is no space.
Thanks in advance!!
You could convert all letters to spaces & strip them away, then do the opposite with digits in the other column:
SELECT trim(translate(mycol,repeat(' ',26),'ABCDEFGHIJKLMNOPQRSTUVWXYZ')) as col1,
trim(translate(mycol,repeat(' ',10),'0123456789')) as col2
FROM mytable
Adjust as necessary to translate additional characters.
I am not sure about the performance of WarrenT's solution, but it looks like a very heavy solution. It does what it is supposed to do with few constraints on the data. If you know more about the data, you can optimize.
If the string always ends with exactly one letter:
select left(mycol, length(mycol)-1), right(mycol,1) from mytable
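To compare the two approaches on the sample data, here is a small Python sketch that mirrors each of them. The function names are hypothetical. The first assumes uppercase ASCII letters and digits only, and the second assumes exactly one trailing letter.

```python
import string

def split_by_translate(value: str):
    # Blank out the letters, then strip: leaves the digits (mirrors col1).
    digits = value.translate(str.maketrans(string.ascii_uppercase, " " * 26)).strip()
    # Blank out the digits, then strip: leaves the letters (mirrors col2).
    letters = value.translate(str.maketrans(string.digits, " " * 10)).strip()
    return digits, letters

def split_last_char(value: str):
    # Mirrors left(mycol, length(mycol)-1), right(mycol, 1).
    return value[:-1], value[-1:]

for value in ["678J", "2345K"]:
    print(split_by_translate(value), split_last_char(value))
```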

Order by char column numerically

How can I sort a character column numerically?
I have a column of numbers stored as chars. When I do an ORDER BY on this column I get the following:
100D
131A
200
21B
30
31000A
etc.
There may be a single letter at the end of a value.
How can I order these chars numerically? Do I need to convert something or is there already an SQL command or function for this?
You could use something like:
ORDER BY Cast(regexp_replace(yourcolumn, '[^0-9]', '', 'g') as integer)
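The same idea, dropping every non-digit and comparing what remains as an integer, can be tried on the sample values in Python. This is a sketch of the logic only. numeric_key is a hypothetical helper, and it assumes every value contains at least one digit.

```python
import re

values = ["100D", "131A", "200", "21B", "30", "31000A"]

def numeric_key(value: str) -> int:
    # Same idea as regexp_replace(col, '[^0-9]', '', 'g'): drop every non-digit.
    digits = re.sub(r"[^0-9]", "", value)
    return int(digits)  # raises ValueError if no digits remain

print(sorted(values, key=numeric_key))
```

A value without any digits would leave an empty string, so such rows need separate handling before the cast in either language.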