SQL Server's isNumeric() equivalent in Amazon Redshift

I'm using Amazon Redshift as my data warehouse.
I have a field (field1) of type string. Some of the strings start with four numbers and others with letters:
'test alpha'
'1382 test beta'
I want to filter out rows where the string does not start with four numbers.
Looking at the Redshift documentation, I don't believe isnumber or isnumeric are functions. It seems that the 'like' function is the best possibility.
I tried
where left(field1, 4) like '[0-9][0-9][0-9][0-9]'
This did not work, and from the link below it seems that Redshift may not support that:
https://forums.aws.amazon.com/message.jspa?messageID=439850
Is there an error in the 'where' clause? If not, and that clause isn't supported in Redshift, is there a way to filter? I was thinking of using cast
cast(left(field1,4) as integer)
and then passing over the row if it generated an error, but I'm not sure how to do this in Amazon Redshift. Or is there some other proxy for the isnumeric filter?
thanks

Try something like:
where field1 ~ '^[0-9]{4}'
It will match any string that starts with 4 digits.
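As a quick way to see what the pattern `^[0-9]{4}` accepts, here is a minimal Python sketch using the standard `re` module rather than Redshift itself. The first two sample strings come from the question, the other two are made up, and `starts_with_four_digits` is a hypothetical helper name. Python's regex dialect is not identical to Redshift's POSIX operators, but a pattern this simple should behave the same way in both.

```python
import re

# Same pattern as in the WHERE clause: four digits anchored at the start.
FOUR_DIGITS = re.compile(r"^[0-9]{4}")

def starts_with_four_digits(value):
    """Return True when value begins with four ASCII digits."""
    return FOUR_DIGITS.search(value) is not None

samples = ["test alpha", "1382 test beta", "138 test", "13825 test"]
kept = [s for s in samples if starts_with_four_digits(s)]
print(kept)
```

Note that a string starting with five or more digits also passes, since the pattern only constrains the first four characters.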

Although a long time has passed since this question was asked, I have not found an adequate response. So I feel obliged to share my solution, which works fine on my Redshift cluster today (March 2016).
The UDF function is:
create or replace function isnumeric (aval VARCHAR(20000))
returns bool
IMMUTABLE
as $$
    # True only when the value can be parsed as an integer
    try:
        int(aval)
    except (TypeError, ValueError):
        return False
    return True
$$ language plpythonu;
Usage would be:
select isnumeric(mycolumn), * from mytable
where isnumeric(mycolumn)=false
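The body of the UDF is ordinary Python, so its behaviour can be checked outside Redshift. The sketch below is a standalone version of the same logic (`is_integer_string` is a name chosen here, not part of the answer, and the sample values are made up). It suggests that the function really tests for integers, so a value such as '12.5' is reported as non-numeric.

```python
def is_integer_string(aval):
    """Mirror of the UDF body: True only if int() can parse the value."""
    try:
        int(aval)
    except (TypeError, ValueError):
        return False
    return True

# Hypothetical inputs covering integers, a decimal, text, empty and missing values.
for value in ["1382", " 42 ", "-7", "12.5", "1382 test beta", "", None]:
    print(repr(value), is_integer_string(value))
```

If decimals should count as numeric too, swapping `int` for `float` would be one option, at the cost of also accepting strings like 'nan' and '1e5'.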

It looks like what you are looking for is the SIMILAR TO operator:
where left(field,4) similar to '[0-9]{4}'
Redshift doc

It seems that Redshift doesn't support any of the following:
where left(field1,4) like '[0-9][0-9][0-9][0-9]'
where left(field1,4) ~ '^[0-9]{4}'
where left(field1,4) like '^[0-9]{4}'
What does seem to work is:
where left(field1,4) between 0 and 9999
This returns all rows that start with four numeric characters.
It seems that even though field1 is of type string, the 'between' function interprets left(field1,4) as a single integer when the characters are numeric (and does not give an error when they are not). I'll follow up if I find a problem. For instance, I don't deal with anything less than 1000, so I assume, but am not sure, that 0001 is interpreted as 1.

Per Amazon, the POSIX-style ~ regex expressions are slow...
https://docs.aws.amazon.com/redshift/latest/dg/pattern-matching-conditions.html
Using their own REGEXP_* functions seems to be faster.
https://docs.aws.amazon.com/redshift/latest/dg/String_functions_header.html
For checking just a true/false for integers I've been using the following with success.
REGEXP_COUNT(my_field_to_check, '^[0-9]+$') > 0
REGEXP_COUNT returns 1 if the field contains only digits and 0 otherwise, so the comparison is true only for all-digit values.

where regexp_instr(field1,'^[0-9]{4}') = 0
will remove rows starting with 4 digits (regexp_instr returns 1 for rows where field1 starts with 4 digits and 0 otherwise). To keep only those rows instead, compare with 1.

We have tried the following and worked for most of our scenarios:
columnn ~ '^[-]{0,1}[0-9]{1,}[.]{0,1}[0-9]{0,}$'
This will match positive and negative integers and floating-point numbers.
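To see where the boundaries of that pattern lie, here is a small Python sketch that runs the same expression through the standard `re` module. The test values and the `looks_numeric` helper are made up for illustration, and Python's regex engine is only a stand-in for Redshift's `~` operator.

```python
import re

# Pattern copied from the answer: optional minus, digits, optional dot, optional digits.
NUMERIC = re.compile(r"^[-]{0,1}[0-9]{1,}[.]{0,1}[0-9]{0,}$")

def looks_numeric(value):
    """Return True when the whole string matches the pattern."""
    return NUMERIC.match(value) is not None

accepted = [v for v in ["42", "-42", "3.14", "-0.5", "7."] if looks_numeric(v)]
rejected = [v for v in [".5", "+3", "1e5", "1,000", "12a", ""] if not looks_numeric(v)]
print(accepted)
print(rejected)
```

Under these assumptions the pattern accepts a trailing dot ('7.') but rejects a leading dot ('.5'), an explicit plus sign, exponent notation and thousands separators, which may be the cases behind "most of our scenarios".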

Redshift should support SIMILAR TO.
WHERE field1 SIMILAR TO '[0-9]{4}%'
This reads as where field1 starts with 4 characters in the range of 0-9, then anything else.


PostgreSQL - number treated as the string, most of aggregate functions don't work

I am new to PostgreSQL. I am watching the tutorial from freeCodeCamp and following their examples:
https://www.youtube.com/watch?v=qw--VYLpxG4&t=7241s&ab_channel=freeCodeCamp.org
But instead of PostgreSQL I use a server-based platform (phpPgAdmin).
The problem is that SUM, AVG and many other aggregate functions cannot be executed.
The problem seems to be the same every time:
"Function ... does not exist"
No function matches the given name and argument types. You might need to add explicit type casts.
I found some similar problem here:
No function matches the given name and argument types
but it's related to more complicated example.
My guess is that phpPgAdmin treats all my numbers as strings, and that is the problem.
I tried this example:
How do I convert an integer to string as part of a PostgreSQL query?
but it returns another error:
operator does not exist: character varying = bigint
I think the $ before the price is not a problem as the MIN and MAX functions work.
What is the reason behind it?
You may trim the leading $ in the price column, then cast the string amount to float, before summing, e.g.
SELECT SUM(CAST(TRIM('$' FROM price) AS float)) AS total_sum
FROM car;
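The same trim-then-cast-then-sum idea can be sketched in Python to show each step. The price strings below are hypothetical stand-ins for the car.price column, and `to_amount` is a made-up helper name. The sketch uses `Decimal` rather than the float cast from the SQL above, purely to keep the printed total free of binary rounding noise.

```python
from decimal import Decimal

# Hypothetical price strings, standing in for the car.price column.
prices = ["$1500.00", "$230.50", "$99.99"]

def to_amount(price):
    """Strip '$' from both ends, then convert the remainder to a number."""
    return Decimal(price.strip("$"))

# Summing only works once the strings have been converted to numbers.
total_sum = sum(to_amount(p) for p in prices)
print(total_sum)
```

Without the conversion step the values are still text, which is consistent with MIN and MAX working (strings can be ordered) while SUM and AVG do not.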

IBM DataStage: Evaluate string as code/expression

I have a complex transformation where a lookup stage specifies one of approximately 30 different/specific string operations that has to be done on a row. I am wondering how to do this efficiently in DataStage?
The requirement is something like this:
If
col_a = 1
Then
col_b := some_string_function(col_c)
Else If
col_a = 2
Then
col_b := some_other_string_function(col_d)
Else If
col_a = 3
Then
col_b := yet_another_string_function(col_c & col_d)
Else If ...
... and so on.
What I have explored so far:
My first impulse was to include the code (field name(s) and string functions) as string/field in the lookup table and use that code after the lookup in a transformer stage expression. However, there seems to be no way to evaluate a string as code inside a transformer expression?
Another solution I have come up with is to put the code into a lot of nested control statements inside a transformer stage, which seems terribly inefficient, especially since DataStage does not seem to offer a control statement equivalent to something like "CASE"/"SWITCH". Or does it?
Substituting (part of) the control statements with a switch stage feeding into different lookup/transformer stages would seem more efficient since they could be done in parallel, but would be a pain to design.
I have not yet dabbled in server routines.
I'm familiar with DataStage 8.5. Having a long If/Then/Else statement in the transform would work, but yes, it's messy and inefficient.
My first thought is to use a Server routine, of type Transform function.
The function could work like this:
Transform function
Arguments: colA, colC, colD
FUNCTION CALC_B(colA,colC,colD)
Begin Case
   Case colA = 1
      Ans = StringFunc(colC)
   Case colA = 2
      Ans = OtherStringFunc(colC,colD)
   Case colA = 3 OR colA = 4
      Ans = YetOtherStringFunc(colC,colD)
End Case
Then in your transform you could use the function to set your col_B value.
CALC_B(myrow.colA,myrow.colC,myrow.colD)
I think the biggest problem with this is whether BASIC has the string operations you need. Below is a link to their programming page.
IBM - Working with Routines
IBM - Basic Programming Language
DataStage BASIC has a bazillion string functions. OK, maybe only 440 or so.
Its CASE construct compiles to the equivalent If..Then..Else structure.
Actually, both of them compile to a series of TEST..JUMP instructions at the lowest level.
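For readers who want to see the shape of the dispatch logic outside DataStage, here is a rough Python sketch of the same idea: a lookup from the col_a value to the string operation, which plays the role of the Begin Case block. It is not DataStage code, and the three operations (`upper_c`, `reverse_d`, `join_both`) are hypothetical placeholders for the roughly 30 real string functions.

```python
# Hypothetical stand-ins for the string operations; real ones would differ.
def upper_c(col_c, col_d):
    return col_c.upper()

def reverse_d(col_c, col_d):
    return col_d[::-1]

def join_both(col_c, col_d):
    return col_c + col_d

# Lookup from col_a to the operation, playing the role of the Begin Case block.
OPERATIONS = {1: upper_c, 2: reverse_d, 3: join_both, 4: join_both}

def calc_b(col_a, col_c, col_d):
    """Pick the operation for col_a; return None when no case applies."""
    operation = OPERATIONS.get(col_a)
    return operation(col_c, col_d) if operation else None

print(calc_b(1, "abc", "xyz"))
print(calc_b(2, "abc", "xyz"))
print(calc_b(3, "abc", "xyz"))
```

The point of the table is that each operation is named once and selected by key, rather than stored as a string of code that would have to be evaluated at run time.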

DB2 query results in Hex format -- Need Character/String

I have a query that I can run on a DB2 table using my python SQL tester, and it returns the string values I'm looking for.
However, when I run it directly on my database it returns a hex value. Any help in getting the results as a character string would be greatly appreciated!
Here is the field definition:
ORCTL CCDATA 243 A 14 256 Order Control File Data
My query on the iSeries is:
select ccdata from ORCTL where ccctlk = 'BUYRAK'
Using your query, you can cast the string to another CCSID, e.g.:
select cast(ccdata as char(14) CCSID 37) from ORCTL where ccctlk = 'BUYRAK'
My guess is that it isn't returning hex...
Rather it's returning EBCDIC.
Your column is probably tagged with CCSID 65535, which tells the system not to translate it.
The right way to fix this issue is to make sure the column is tagged with the appropriate CCSID; for example, 37 for US English.
The alternative is to look for a "force translate" option in the settings of the driver you're using.
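To illustrate why untranslated EBCDIC looks like hex, here is a small Python sketch using the standard `cp037` codec, which corresponds to CCSID 37. The sample string is borrowed from the key value in the query, since the real CCDATA contents are not shown in the question.

```python
# Sample value borrowed from the query's key; the real CCDATA is not shown.
text = "BUYRAK"

# cp037 is Python's codec for EBCDIC CCSID 37 (US English).
ebcdic_bytes = text.encode("cp037")

# Shown untranslated, the bytes look like hex noise rather than text.
print(ebcdic_bytes.hex())

# Decoding with the right code page recovers the characters.
decoded = ebcdic_bytes.decode("cp037")
print(decoded)
```

This is roughly what happens when a client receives a column tagged CCSID 65535: it gets the raw bytes and has no instruction to translate them.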

DB2 LIKE operator character range

I'm attempting to construct a LIKE operator in my query on DB2 that is checking if a varchar is just two digits. I've looked online and it seems like DB2 does not support a character range i.e. [0-9]. I've tried LIKE '[0-9][0-9]' and I didn't get an error from DB2, but no rows showed up in my result set from that query when I can see rows that exactly match this through looking at a SELECT * of the same table.
Is there any way I can replicate this in DB2 if it is indeed true? Is my syntax for the LIKE wrong? Thanks in advance.
The TRANSLATE function is more appropriate for validating an expression that contains a limited number of valid values.
WHERE TRANSLATE( yourExpressionOrColumn, '000000000', '123456789') = '00'
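The trick is that every digit from 1 to 9 is mapped to 0, so any two-digit value collapses to '00' while anything else does not. A minimal Python sketch of the same mapping, with made-up test values and a hypothetical `is_two_digits` helper, makes this easy to try out:

```python
# Map every digit 1-9 to '0', as TRANSLATE(col, '000000000', '123456789') does.
DIGITS_TO_ZERO = str.maketrans("123456789", "000000000")

def is_two_digits(value):
    """True when value is exactly two characters, both ASCII digits."""
    return value.translate(DIGITS_TO_ZERO) == "00"

results = {v: is_two_digits(v) for v in ["42", "07", "4a", "123", "4", "0O"]}
print(results)
```

Letters and other characters pass through the mapping unchanged, which is why '4a' and '0O' fail the comparison.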
Found it. No, you cannot, and there are no symbols that can represent an OR in LIKE.

how to evaluate the condition?

A row will have data such as 30 > 50, or 170 > 40, etc.
How do I evaluate this varchar column data to find whether it represents true or false?
SQL Server 2008 R2 and above.
If you MUST keep your data in this (painful) form, then your best bet will probably be to parse the string into its individual parts. Something like the following steps:
Parse string into operand1, operator, operand2
Cast operand1 and operand2 to int
Probably go into some painful case statement to apply the correct operator based on what you parsed out
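The three steps above can be sketched in Python to show the logic. This is not T-SQL, and in SQL Server the same steps would need string functions and a CASE expression. The set of supported operators and the `evaluate` helper are assumptions made for this example, and it only handles integer operands.

```python
import operator
import re

# Comparison operators this sketch understands; extend as needed.
OPERATORS = {
    ">": operator.gt, "<": operator.lt, "=": operator.eq,
    ">=": operator.ge, "<=": operator.le, "<>": operator.ne,
}

# operand1, operator, operand2, with optional surrounding whitespace.
EXPRESSION = re.compile(r"^\s*(-?\d+)\s*(>=|<=|<>|>|<|=)\s*(-?\d+)\s*$")

def evaluate(condition):
    """Return True/False for strings like '30 > 50', or None if unparseable."""
    match = EXPRESSION.match(condition)
    if match is None:
        return None
    left, symbol, right = match.groups()
    # Cast both operands to int, then apply the operator that was parsed out.
    return OPERATORS[symbol](int(left), int(right))

print(evaluate("30 > 50"))
print(evaluate("170 > 40"))
```

Returning None for anything that does not parse keeps malformed rows distinguishable from rows that evaluate to false.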
I would personally recommend finding a way to calculate this before you insert into the database. What you are storing and what you need are very far apart right now.