Test for numeric value? - db2

The vendor data we load in our staging table is rather dirty. One column in particular captures number data but 40% of the time has garbage characters or random strings.
I have to create a report that filters out value ranges in that column. So, I tried playing with a combination of replace/translate like so
select replace(translate(upper(str),' ','all possible char'),' ','')
from table
but it fails whenever it encounters a char I did not code. Therefore, the report can never be automated.
Javascript has the isNaN() function to determine whether a value is an illegal number (True if it is and false if not).
How can I do the same thing with DB2?? Do you have any idea?
Thanks in advance.

A fairly reliable (but somewhat hackish) way is to compare the string to its upper- and lower-case self (numbers don't have different cases). As long as your data that is bringing in characters only includes Latin characters, you should be fine:
SELECT input, CASE
WHEN UPPER(input) = LOWER(input) THEN TO_NUMBER(input)
ELSE 0
END AS output
FROM source
Another option would be to use the TRANSLATE function:
SELECT input,
CASE
WHEN TRANSLATE(CAST(input as CHAR(10)), '~~~~~~~~~~~~~', '0123456789-. ') = '~~~~~~~~~~' THEN CAST(input AS DECIMAL(12, 2))
ELSE 0
END AS num
FROM x

WITH x (stringval) AS
(
VALUES ('x2'),(''),('2.2.'),('5-'),('-5-'),('--5'),('.5'),('2 2'),('0.5-'),(' 1 '),('2 '),('3.'),('-4.0')
)
SELECT stringval,
CASE WHEN (
-- Whitespace must not appear in the middle of a number
-- (but trailing and/or leading whitespace is permitted)
RTRIM(LTRIM( stringval )) NOT LIKE '% %'
-- A number cannot start with a decimal point
AND LTRIM( stringval ) NOT LIKE '.%'
-- A negative decimal number must contain at least one digit between
-- the negative sign and the decimal point
AND LTRIM( stringval ) NOT LIKE '-.%'
-- The negative sign may only appear at the beginning of the number
AND LOCATE( '-', LTRIM(stringval)) IN ( 0, 1 )
-- A number must contain at least one digit
AND TRANSLATE( stringval, '0000000000', '123456789') LIKE '%0%'
-- Allow up to one negative sign, followed by up to one decimal point
AND REPLACE(
TRANSLATE( RTRIM(LTRIM(stringval)), '000000000', '123456789'),
'0', '') IN ('','-','.','-.')
)
THEN 'VALID'
ELSE 'INVALID'
END AS stringisvalidnumber
FROM x
;

Check this out:
SELECT Mobile,
TRANSLATE(Mobile, '~~~~~~~~~~', '0123456789') AS FirstPass,
TRANSLATE(TRANSLATE(Mobile, '~~~~~~~~~~', '0123456789'), '', '~') AS Erroneous,
REPLACE(TRANSLATE(Mobile, '', TRANSLATE(TRANSLATE(Mobile, '~~~~~~~~~~', '0123456789'), '', '~')), ' ', '') AS Corrected
FROM Person WHERE Mobile <> '' FETCH FIRST 100 ROWS ONLY
The table is "Person" and the field that you want to check is "Mobile".
If you work a little bit more on this, you can build an UPDATE to fix the entire table

Related

Cast to int instead of decimal?

I have field that has up to 9 comma separated values each of which have a string value and a numeric value separated by colon. After parsing them all some of the values between 0 and 1 are being set to an integer rather than a numeric as cast. The problem is obviously related to data type but I am unsure what is causing it or how to fix it. The problem only exists in the case statement, the split_part function seems to be working perfect.
Things I have tried:
nvl(split_part(one,':',2),0) = COALESCE types text and integer cannot be matched
nvl(split_part(one,':',2)::numeric,0) => Invalid input syntax for type numeric
numerous other cast/convert variations
(CASE WHEN split_part(one,':',2) = '' THEN 0::numeric ELSE split_part(one,':',2)::numeric END)::numeric => runs but get int value of 0
When using the split_part function outside of case statement it does work correctly. However, I need the result to be zero for null values.
split_part(one,':',2) => 0.02068278096187390979 (expected result)
When running the code above I get zero but expect 0.02068278096187390979
Field "one" has the following value 'xyz: 0.02068278096187390979' before the split_part function.
EXAMPLE:
create table test(one varchar);
insert into test values('XYZ: 0.50000000000000000000')
select
one ,split_part(one,':',2) as correct_value_for_those_that_are_not_null ,
case
when split_part(one,':',2) = '' then null
else split_part(one,':',2)::numeric
end::numeric as this_one_is_the_problem
from test
However, I need the result to be zero for null values.
Your example does not deal with NULL values at all, though. Only addressing the empty string ('').
To replace either with 0 reliably, efficiently and without casting issues:
SELECT part1, CASE WHEN part2 <> '' THEN part2::numeric ELSE numeric '0' END AS part2
FROM (
SELECT split_part(one, ':', 1) AS part1
, split_part(one, ':', 2) AS part2
FROM test
) sub;
See:
Best way to check for "empty or null value"
Also note that all SQL CASE branches must agree on a common data type. There have been minor adjustments in the logic that determines the resulting type in the past, so the version of Postgres may play a role in corner cases. Don't recall the details now.
nvl()is not a Postgres function. You probably meant COALESCE. The manual:
This SQL-standard function provides capabilities similar to NVL and IFNULL, which are used in some other database systems.

In DB2 SQL, is it possible to set a variable in the SELECT statement to use multiple times..?

In DB2 SQL, is it possible to SET a variable with the contents of a returned field in the SELECT statement, to use multiple times for calculated fields and criteria further along in the same SELECT statement?
The purpose is to shrink and streamline the code, by doing a calculation once at the beginning and using it multiple times later on...including the HAVING, WHERE, and ORDER BY.
To be honest, I'm not sure this is possible in any version of SQL, much less DB2.
This is on an IBM iSeries 8202 with DB2 SQL v6, which unfortunately is not a candidate for upgrade at this time. This is a very old & messy database, which I have no control over. I must regularly include "cleanup functions" in my SQL.
To to clarify the question, note the following pseudocode. Actual working code follows further below.
DECLARE smnum INTEGER --Not sure if this is correct.
SELECT
-- This is where I'm not sure what to do.
SET CAST((CASE WHEN %smnum%='' THEN '0' ELSE %smnum% END) AS INTEGER) INTO smnum,
%smnum% AS sm,
invdat,
invno,
daqty,
dapric,
dacost,
(dapric-dacost)*daqty AS profit
FROM
saleshistory
WHERE
%smNum% = 30
ORDER BY
%smnum%
Below is my actual working SQL. When adjusted for 2017 or 2016, it can return >10K rows, depending on the salesperson. The complete table has >22M rows.
That buttload of CASE((CAST... function is what I wish to replace with a variable. This is not the only example of this. If I can make it work, I have many other queries that could benefit from the technique.
SELECT
CAST((CASE WHEN TRIM(DASM#)='' THEN '0' ELSE TRIM(DASM#) END) AS INTEGER) AS DASM,
DAIDAT,
DAINV# AS DAINV,
DALIN# AS DALIN,
CAST(TRIM(DAITEM) AS INTEGER) AS DAITEM,
TRIM(DABSW) AS DABSW,
TRIM(DAPCLS) AS DAPCLS,
DAQTY,
DAPRIC,
DAICOS,
DADPAL,
(DAPRIC-DAICOS+DADPAL)*DAQTY AS PROFIT
FROM
VIPDTAB.DAILYV
WHERE
CAST((CASE WHEN TRIM(DASM#)='' THEN '0' ELSE TRIM(DASM#) END) AS INTEGER)=30 AND
TRIM(DABSW)='B' AND
DAIDAT BETWEEN (YEAR(CURDATE())*10000) AND (((YEAR(CURDATE())+1)*10000)-1) AND
CAST(TRIM(DACOMP) AS INTEGER)=1
ORDER BY
CAST((CASE WHEN TRIM(DASM#)='' THEN '0' ELSE TRIM(DASM#) END) AS INTEGER),
DAIDAT,
DAINV#,
DALIN#
Just use a subquery or CTE. I can't figure out the actual logic you want, but the structure looks like this:
select . . .
from (select d.*,
(CASE . . . END) as calc_field
from VIPDTAB.DAILYV d
) d
No variable declaration is needed.
Here is what your SQL would look like with the sub-query that Gordon suggested:
SELECT
DASM,
DAIDAT,
DAINV# AS DAINV,
DALIN# AS DALIN,
CAST(DAITEM AS INTEGER) AS DAITEM,
TRIM(DABSW) AS DABSW,
TRIM(DAPCLS) AS DAPCLS,
DAQTY,
DAPRIC,
DAICOS,
DADPAL,
(DAPRIC-DAICOS+DADPAL)*DAQTY AS PROFIT
FROM
(SELECT
D.*,
CAST((CASE WHEN D.DASM#='' THEN '0' ELSE D.DASM# END) AS INTEGER) AS DASM
FROM VIPDTAB.DAILYV D
) D
WHERE
DASM=30 AND
TRIM(DABSW)='B' AND
DAIDAT BETWEEN (YEAR(CURDATE())*10000) AND (((YEAR(CURDATE())+1)*10000)-1) AND
CAST(DACOMP AS INTEGER)=1
ORDER BY
DASM,
DAIDAT,
DAINV#,
DALIN#
Notice that I removed a lot of the trim() functions, and you could likely remove the rest. The way IBM resolves the Varchar vs. Char comparison thing is by ignoring trailing blanks. So trim(anything) = '' is the same as anything = ''. And since cast(' 123 ' as integer) = 123, I have removed trims from within the cast functions as well. In addition trim(dabsw) = 'B' is the same as dabsw = 'B' as long as the 'B' is the first character in dabsw. So you could even remove that trim if all you are concerned with is trailing blanks.
Here are some additional notes based on comments. The above paragraph is not talking about auto-trim. Fixed length fields will always return as fixed length fields, the trailing blanks will remain. But in comparisons and expressions where trailing blanks are unimportant, or even a hindrance, they are ignored. In expressions where trailing blanks are important, like concatenation, the trailing blanks are not ignored. Another thing, trim() removes both leading and trailing blanks. If you are using trim() to read a fixed length character field into a Varchar, then rtrim() is likely the better choice as it only removes the trailing blanks.
Also, I didn't go through your fields to make sure I got everything you need, I just used * in the sub-query. For performance, it would be best to only return the fields you need. So if you replace D.* with an actual field list, you can remove the correlation name in the from clause of the sub-query. But, the sub-query itself still needs a correlation clause.
My verification was done using IBM i v7.1.
You can encapsalate the case statement in a view. I even have the fancy profit calc in there for you to order by profit. Now the biggest issue you have is the CCSID on the view for calculated columns but that's another question.
create or replace view VIPDTAB.DAILYVQ as
SELECT
CAST((CASE WHEN TRIM(DASM#)='' THEN '0' ELSE TRIM(DASM#) END) AS INTEGER) AS DASM,
DAIDAT,
DAINV# AS DAINV,
DALIN# AS DALIN,
CAST(TRIM(DAITEM) AS INTEGER) AS DAITEM,
TRIM(DABSW) AS DABSW,
TRIM(DAPCLS) AS DAPCLS,
DAQTY,
DAPRIC,
DAICOS,
DADPAL,
(DAPRIC-DAICOS+DADPAL)*DAQTY AS PROFIT
FROM
VIPDTAB.DAILYV
now you can
select dasm, count(*) from vipdtab.dailyvq where dasm = 0 group by dasm order by dasm
or
select * from vipdtab.dailyvq order by profit desc

Pulling variable length substring from middle of string

I am trying to grab variable length string from a primary string.
Example:
ABC*12*1*name name****XX*123456789~
ABC*12*1*diffname diffname****XX*234567890~
ABC*12*1*diffname2 diffname2***XX*345678901~
I need to pull out the 'name name', 'diffname diffname', 'diffname2 diffname2'
etc from the string. And then replace the ' ' between the names with an asterisk - but, I cant just insert in the first space in the string, there could be multiple names, and so I would want to insert the '*' into the second, or third space, depending on the length of the name string.
SELECT
CHARINDEX('*1*',data)+3 AS startpos,
CHARINDEX('***',data) AS Endpos,
data
from #t
where data like '%ABC*12*1*%'
This gives me a start point and end point for the variable length string. So I try:
SELECT SUBSTRING(data,CHARINDEX('*1*',data)+3,CHARINDEX('***',data) -CHARINDEX('*1*',data)+3)
FROM #t
WHERE data like '%ABC*12*1*name%'
But this gives me
name n name aa*****X
as a result set, basically starting at the start point and then running well past the end point.
What am I doing wrong?
This part is the problem :
SELECT .....-CHARINDEX('*1*',data)+3
FROM .....
WHERE .....
You want to substract with Endpos so it supposed to be written in brackets like so :
-(CHARINDEX('*1*',data)+3)
and if the brackets are removed the last part should become -3 :
-CHARINDEX('*1*',data)-3

T-SQL Join on foreign key that has leading zero

I need to link various tables that each have a common key (a serial number in this case). In some tables the key has a leading zero e.g. '037443' and on others it doesn't e.g. '37443'. In both cases the serial refers to the same product. To confound things serial 'numbers' are not always just numeric e.g. may be "BDO1234", in these cases there is never a leading zero.
I'd prefer to use the WHERE statement (WHERE a.key = b.key) but could use joins if required. Is there any way to do this?
I'm still learning so please keep it simple if possible. Many thanks.
Based on the accepted answer in this link, I've written a small tsql sample to show you what I meant by 'the right direction':
Create the test table:
CREATE TABLE tblTempTest
(
keyCol varchar(20)
)
GO
Populate it:
INSERT INTO tblTempTest VALUES
('1234'), ('01234'), ('10234'), ('0k234'), ('k2304'), ('00034')
Select values:
SELECT keyCol,
SUBSTRING(keyCol, PATINDEX('%[^0]%', keyCol + '.'), LEN(keyCol)) As trimmed
FROM tblTempTest
Results:
keyCol trimmed
-------------------- --------------------
1234 1234
01234 1234
10234 10234
0k234 k234
k2304 k2304
00034 34
Cleanup:
DROP TABLE tblTempTest
Note that the values are alpha-numeric, and only leading zeroes are trimmed.
One possible drawback is that if there is a 0 after a white space it will not be trimmed, but that's an easy fix - just add ltrim:
SUBSTRING(LTRIM(keyCol), PATINDEX('%[^0]%', LTRIM(keyCol + '.')), LEN(keyCol)) As trimmed
You need to create a function
CREATE FUNCTION CompareSerialNumbers(#SerialA varchar(max), #SerialB varchar(max))
RETURNS bit
AS
BEGIN
DECLARE #ReturnValue AS bit
IF (ISNUMERIC(#SerialA) = 1 AND ISNUMERIC(#SerialB) = 1)
SELECT #ReturnValue =
CASE
WHEN CAST(#SerialA AS int) = CAST(#SerialB AS int) THEN 1
ELSE 0
END
ELSE
SELECT #ReturnValue =
CASE
WHEN #SerialA = #SerialB THEN 1
ELSE 0
END
RETURN #ReturnValue
END;
GO
If both are numeric then it compares them as integers otherwise it compares them as strings.

Order By alphanumeric values like numeric

there is a table's fields on MSSQL Serrver 2005 as VARCHAR. It contains alphanumeric values like "A,B,C,D ... 1,2,3,...,10,11,12" etc.
When i use below codes;
....
ORDER BY TableFiledName
Ordering result is as follow 11,12,1,2,3 etc.
When i use codes as below,
....
ORDER BY
CASE WHEN ISNUMERIC(TableFiledName) = 0 THEN CAST(TableFiledNameAS INT) ELSE TableFiledName END
I get error message as below;
Msg 8114, Level 16, State 5, Line 1 Error converting data type varchar
to float.
How can get like this ordering result: 1,2,3,4,5,6,7,8,9,10,11,12 etc..
Thanks in advance.
ISNUMERIC returns 1 when the field is numeric.
So your first problem is that it should be ...
CASE WHEN ISNUMERIC(TableFiledName) = 1 THEN
But this alone won't work.
You need to prefix the values with zeroes and take the rightmost
order by
case when ISNUMERIC(FieldName) =1
then right('000000000'+FieldName, 5)
else FieldName
end
Using 5 allows for numbers up to 99999 - if your numbers are higher, increase that number.
This will put the numbers before the letters. If you want the letters before the numbers, then you can add an isnumeric to the sort order - ie:
order by
isnumeric(FieldName),
case...
This won't cope with decimals, but you haven't mentioned them