Why does T-SQL's LEN() function ignore trailing spaces?

Description of the LEN() function on MSDN: "Returns the number of characters of the specified string expression, excluding trailing blanks."
Why was the LEN() function designed to work this way? What problem does this behaviour solve?
Related:
LEN function not including trailing spaces in SQL Server
charindex() counts whiteshars in the end, len() doesn't in T-SQL

It compensates for the automatic padding of strings to the declared length of the data type. Consider the following:
DECLARE @Test CHAR(10), @Test2 CHAR(10)
SET @Test = 'test'
SET @Test2 = 'Test2'
SELECT LEN(@Test), LEN(@Test + '_') - 1, LEN(@Test2), LEN(@Test2 + '_') - 1
This will return 4, 10, 5 and 10 respectively. Even though no trailing spaces were supplied for @Test, it is still stored with a length of 10. If LEN did not ignore the trailing spaces, then LEN(@Test) and LEN(@Test2) would be the same. More often than not, people want to know the length of the meaningful data rather than the length of the automatic padding, so LEN ignores trailing blanks. There are workarounds and alternatives for cases where this is not the required behaviour.
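The padding argument can be checked outside the database with a small Python model. This is only a sketch of the described behaviour, not SQL Server code: char_n and sql_len are hypothetical helper names that imitate CHAR(n) storage and LEN() respectively.

```python
def char_n(value: str, n: int) -> str:
    """Model of CHAR(n) storage: right-pad with spaces to exactly n characters."""
    return value.ljust(n)[:n]


def sql_len(value: str) -> int:
    """Model of LEN(): count characters, ignoring trailing spaces."""
    return len(value.rstrip(" "))


test = char_n("test", 10)
test2 = char_n("Test2", 10)

results = (
    sql_len(test),              # padding ignored
    sql_len(test + "_") - 1,    # the '_' protects the padding from being ignored
    sql_len(test2),
    sql_len(test2 + "_") - 1,
)
print(results)  # (4, 10, 5, 10)
```

Appending a non-space character and subtracting 1 is the usual way to count the padding as well, which is why the second and fourth values are both 10.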

Space padding in non-variable-length fields, such as:
CREATE TABLE tblSomething ( variable_length varchar(20), non_var_length char(10) );

Remove non-numeric characters in a column (character varying), PostgreSQL (9.3.5)

I need to remove non-numeric characters in a column (character varying) and keep numeric values in PostgreSQL 9.3.5.
Examples:
1) "ggg" => ""
2) "3,0 kg" => "3,0"
3) "15 kg." => "15"
4) ...
There are a few problems. Some values look like this:
1) "2x3,25"
2) "96+109"
3) ...
These need to remain as is (i.e., when non-numeric characters appear between numeric characters, do nothing).
Using regexp_replace is simpler:
# select regexp_replace('test1234test45abc', '[^0-9]+', '', 'g');
regexp_replace
----------------
123445
(1 row)
The ^ means not, so any character that is not in the range 0-9 will be replaced with an empty string, ''.
The 'g' is a flag that means all matches will be replaced, not just the first match.
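To see what the global replacement does to the sample values from the question, here is a rough Python equivalent. It is a sketch only: Python's re.sub stands in for regexp_replace with the 'g' flag, and strip_non_digits is a hypothetical helper name.

```python
import re


def strip_non_digits(value: str) -> str:
    # Same character class as above; re.sub replaces every match,
    # which corresponds to the 'g' flag of regexp_replace.
    return re.sub(r"[^0-9]+", "", value)


print(strip_non_digits("test1234test45abc"))  # 123445
print(strip_non_digits("3,0 kg"))             # 30
print(strip_non_digits("2x3,25"))             # 2325
```

Note that separators between digits are removed as well, so values such as "2x3,25" do not remain as is with this approach.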
For modifying strings in PostgreSQL, take a look at the String Functions and Operators section of the documentation. The function substring(string from pattern) uses POSIX regular expressions for pattern matching and works well for removing unwanted characters from your string.
(Note that the VALUES clause inside the parentheses only provides the example material; you can replace it with any SELECT statement or table that provides the data):
SELECT substring(column1 from '(([0-9]+.*)*[0-9]+)'), column1 FROM
(VALUES
('ggg'),
('3,0 kg'),
('15 kg.'),
('2x3,25'),
('96+109')
) strings
The regular expression explained in parts:
[0-9]+ - the string has at least one digit, example: '789'
[0-9]+.* - the string has at least one digit followed by anything, example: '12smth'
([0-9]+.*)* - a string like the previous line, repeated zero or more times, example: '12smth22smth'
(([0-9]+.*)*[0-9]+) - the string from the previous line followed by at least one digit at the end, example: '12smth22smth345'
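The pattern can be tried against the five sample values with Python's re module. This is a sketch under the assumption that Python's regex engine agrees with PostgreSQL's POSIX matching for these inputs. substring_from is a hypothetical helper that, like substring(string from pattern), returns the first parenthesized group, with None standing in for NULL.

```python
import re

# Same pattern as in the query above.
PATTERN = re.compile(r"(([0-9]+.*)*[0-9]+)")


def substring_from(value: str):
    """Return the text matched by the outer group, or None if nothing matches."""
    match = PATTERN.search(value)
    return match.group(1) if match else None


samples = ["ggg", "3,0 kg", "15 kg.", "2x3,25", "96+109"]
extracted = {s: substring_from(s) for s in samples}
for original, result in extracted.items():
    print(repr(original), "->", repr(result))
```

The match always starts and ends on a digit, so leading and trailing text is dropped while anything between the first and last digit is kept.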

How to automatically fill a column with spaces to a pre-determined length?

I am wondering how I can make a column of type integer return a fixed length of 10 even if the actual value only has a length of 4. The difference should be filled with spaces.
To be specific:
column: last_id;
exemplary value: 101223;
length of integer: SELECT length(id::text) = 6
Thus it should add 4 spaces. If the length of the integer is 5, it should add 5 spaces.
How can I do that?
SELECT lpad(last_id::text, 10, ' ') -- pad left
, rpad(last_id::text, 10, ' ') -- pad right
, last_id::char(10) -- trick to pad right
The manual has more on String Functions and on character types like char(n).
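For readers who want to check the expected result outside the database, the padding can be modelled in Python. This is a sketch only: lpad and rpad here are hypothetical Python helpers restricted to a single fill character, and they truncate over-long input to the target length, as the PostgreSQL functions are documented to do.

```python
def lpad(value: str, length: int, fill: str = " ") -> str:
    """Pad on the left to `length` characters; truncate if already longer."""
    return value[:length].rjust(length, fill)


def rpad(value: str, length: int, fill: str = " ") -> str:
    """Pad on the right to `length` characters; truncate if already longer."""
    return value[:length].ljust(length, fill)


last_id = 101223  # the example value from the question
print(repr(lpad(str(last_id), 10)))  # '    101223'
print(repr(rpad(str(last_id), 10)))  # '101223    '
```

A six-digit value receives four spaces either way; only the side on which they are added differs.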

PostgreSQL convert a string with commas into an integer

I want to convert a column of type "character varying" that has integers with commas to a regular integer column.
I want to support numbers from '1' to '10,000,000'.
I've tried to use: to_number(fieldname, '999G999G999'), but it only works if the format matches the exact length of the string.
Is there a way to do this that supports from '1' to '10,000,000'?
select replace(fieldname,',','')::numeric ;
To do it the way you originally attempted, which is not advised:
select to_number( fieldname,
regexp_replace( replace(fieldname,',','G') , '[0-9]' ,'9','g')
);
The inner replace changes commas to G. The outer regexp_replace changes every digit to 9. This does not account for decimal or negative numbers.
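The two approaches above can be compared in a short Python sketch. The helper names strip_commas and to_number_mask are hypothetical; the second one only reproduces how the format mask is derived from the value and does not call to_number itself.

```python
import re


def strip_commas(value: str) -> int:
    """Model of replace(fieldname, ',', '') followed by a numeric cast."""
    return int(value.replace(",", ""))


def to_number_mask(value: str) -> str:
    """Build the format mask: commas become G, every digit becomes 9."""
    return re.sub(r"[0-9]", "9", value.replace(",", "G"))


for raw in ["1", "1,234", "10,000,000"]:
    print(raw, "->", strip_commas(raw), "| mask:", to_number_mask(raw))
```

Because the mask is derived from each value, it always has the same length as the string, which sidesteps the fixed-length problem described in the question.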
You can just strip out the commas with the REPLACE() function:
CREATE TABLE Foo
(
Test NUMERIC
);
insert into Foo VALUES (REPLACE('1,234,567', ',', '')::numeric);
select * from Foo; -- Will show 1234567
You can replace the commas by an empty string as suggested, or you could use to_number with the FM prefix, so the query would look like this:
SELECT to_number(my_column, 'FM99G999G999')
One thing to note:
When applying REPLACE("fieldName", ',', '') to a table column, the operation will not work properly if a VIEW uses that table. You must drop the view first.

Prevent trailing spaces during insert?

I have this INSERT statement and there seem to be trailing spaces at the end of the acct_desc fields. I'd like to know how to prevent trailing spaces from occurring during my insert statement.
INSERT INTO dwh.attribution_summary
SELECT d.adic,
d.ucic,
b.acct_type_desc as acct_desc,
a.begin_mo_balance as opening_balance,
c.date,
'fic' as userid
FROM fic.dim_members d
JOIN fic.fact_deposits a ON d.ucic = a.ucic
JOIN fic.dim_date c ON a.date_id = c.date_id
JOIN fic.dim_acct_type b ON a.acct_type_id = b.acct_type_id
WHERE c.date::timestamp = current_date - INTERVAL '1 days';
Use the PostgreSQL trim() function. There are trim(), rtrim() and ltrim().
To trim trailing spaces:
...
rtrim(b.acct_type_desc) as acct_desc,
...
If acct_type_desc is not of type text or varchar, cast it to text first:
...
rtrim(b.acct_type_desc::text) as acct_desc,
...
If acct_type_desc is of type char(n), casting it to text removes trailing spaces automatically, no trim() necessary.
In addition to the above, add a CHECK constraint to that column, so that if someone forgets to apply rtrim() in the INSERT statement, the constraint still rejects the row.
For example, to check for trailing spaces at the end of the string:
ALTER TABLE dwh.attribution_summary
ADD CONSTRAINT tcc_attribution_summary_trim
CHECK (rtrim(acct_type_desc) = acct_type_desc);
Another example, checking for leading and trailing spaces as well as consecutive whitespace in the middle of the string:
ALTER TABLE dwh.attribution_summary
ADD CONSTRAINT tcc_attribution_summary_whitespace
CHECK (btrim(regexp_replace(acct_type_desc, '\s+'::text, ' '::text, 'g'::text)) = acct_type_desc);
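To get a feel for which values each constraint accepts, the two CHECK expressions can be modelled in Python. This is a sketch only: the function names are hypothetical, and Python's \s is assumed to be close enough to PostgreSQL's for plain ASCII input.

```python
import re


def passes_trim_check(value: str) -> bool:
    """Model of CHECK (rtrim(col) = col): no trailing spaces."""
    return value.rstrip(" ") == value


def passes_whitespace_check(value: str) -> bool:
    """Model of CHECK (btrim(regexp_replace(col, '\\s+', ' ', 'g')) = col)."""
    collapsed = re.sub(r"\s+", " ", value)
    return collapsed.strip(" ") == value


for candidate in ["Savings", "Savings ", " Checking", "Money  Market"]:
    print(repr(candidate), passes_trim_check(candidate), passes_whitespace_check(candidate))
```

The first check only rejects trailing spaces, while the second also rejects leading spaces and runs of whitespace inside the value.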
What is the type of acct_desc?
If it is CHAR(n), then the DBMS has no choice but to add spaces at the end; the SQL Standard requires that.
If it is VARCHAR(n), then the DBMS won't add spaces at the end.
If PostgreSQL supported them, the national variants of the types (NCHAR, NVARCHAR) would behave the same as the corresponding non-national variants do.

Test for numeric value?

The vendor data we load in our staging table is rather dirty. One column in particular captures number data but 40% of the time has garbage characters or random strings.
I have to create a report that filters out value ranges in that column. So, I tried playing with a combination of replace/translate like so
select replace(translate(upper(str),' ','all possible char'),' ','')
from table
but it fails whenever it encounters a char I did not code. Therefore, the report can never be automated.
JavaScript has the isNaN() function to determine whether a value is an illegal number (true if it is, false if not).
How can I do the same thing in DB2? Do you have any idea?
Thanks in advance.
A fairly reliable (but somewhat hackish) way is to compare the string to its upper- and lower-case versions (digits have no case). As long as any letters in your data are Latin characters, you should be fine:
SELECT input, CASE
WHEN UPPER(input) = LOWER(input) THEN TO_NUMBER(input)
ELSE 0
END AS output
FROM source
Another option would be to use the TRANSLATE function:
SELECT input,
CASE
WHEN TRANSLATE(CAST(input as CHAR(10)), '~~~~~~~~~~~~~', '0123456789-. ') = '~~~~~~~~~~' THEN CAST(input AS DECIMAL(12, 2))
ELSE 0
END AS num
FROM x
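The TRANSLATE test above can be modelled in Python to see which inputs it lets through. This is a sketch: looks_numeric is a hypothetical name, and the cast to CHAR(10) is assumed to pad with spaces and cut off anything beyond ten characters.

```python
ALLOWED = "0123456789-. "
# Map every allowed character to '~', as TRANSLATE does with its
# to-string ('~~~...') and from-string ('0123456789-. ').
TO_TILDE = str.maketrans({ch: "~" for ch in ALLOWED})


def looks_numeric(value: str, width: int = 10) -> bool:
    padded = value.ljust(width)[:width]   # model of CAST(input AS CHAR(10))
    return padded.translate(TO_TILDE) == "~" * width


for candidate in ["12.5", "-4.0", "x2", "2.2."]:
    print(repr(candidate), looks_numeric(candidate))
```

The test only verifies that every character belongs to the allowed set, so a value such as '2.2.' passes even though it is not a valid number.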
WITH x (stringval) AS
(
VALUES ('x2'),(''),('2.2.'),('5-'),('-5-'),('--5'),('.5'),('2 2'),('0.5-'),(' 1 '),('2 '),('3.'),('-4.0')
)
SELECT stringval,
CASE WHEN (
-- Whitespace must not appear in the middle of a number
-- (but trailing and/or leading whitespace is permitted)
RTRIM(LTRIM( stringval )) NOT LIKE '% %'
-- A number cannot start with a decimal point
AND LTRIM( stringval ) NOT LIKE '.%'
-- A negative decimal number must contain at least one digit between
-- the negative sign and the decimal point
AND LTRIM( stringval ) NOT LIKE '-.%'
-- The negative sign may only appear at the beginning of the number
AND LOCATE( '-', LTRIM(stringval)) IN ( 0, 1 )
-- A number must contain at least one digit
AND TRANSLATE( stringval, '0000000000', '123456789') LIKE '%0%'
-- Allow up to one negative sign, followed by up to one decimal point
AND REPLACE(
TRANSLATE( RTRIM(LTRIM(stringval)), '000000000', '123456789'),
'0', '') IN ('','-','.','-.')
)
THEN 'VALID'
ELSE 'INVALID'
END AS stringisvalidnumber
FROM x
;
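The rules in the query above can be restated as a Python function, which makes it easy to check them against the same sample values. This is a hedged translation, not DB2 code: is_valid_number is a hypothetical name, and each condition mirrors one of the SQL predicates in the same order.

```python
DIGITS = "0123456789"


def is_valid_number(stringval: str) -> bool:
    trimmed = stringval.strip(" ")
    left_trimmed = stringval.lstrip(" ")
    # What remains of the trimmed value once all digits are removed.
    residue = "".join(ch for ch in trimmed if ch not in DIGITS)
    return (
        " " not in trimmed                           # no whitespace in the middle
        and not left_trimmed.startswith(".")         # cannot start with a decimal point
        and not left_trimmed.startswith("-.")        # digit required between '-' and '.'
        and left_trimmed.find("-") in (-1, 0)        # '-' only at the beginning
        and any(ch in DIGITS for ch in stringval)    # at least one digit
        and residue in ("", "-", ".", "-.")          # at most one '-' and one '.'
    )


samples = ["x2", "", "2.2.", "5-", "-5-", "--5", ".5", "2 2",
           "0.5-", " 1 ", "2 ", "3.", "-4.0"]
valid = [s for s in samples if is_valid_number(s)]
print(valid)  # [' 1 ', '2 ', '3.', '-4.0']
```

Only the last four sample values satisfy every rule; each of the others is rejected by at least one condition.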
Check this out:
SELECT Mobile,
TRANSLATE(Mobile, '~~~~~~~~~~', '0123456789') AS FirstPass,
TRANSLATE(TRANSLATE(Mobile, '~~~~~~~~~~', '0123456789'), '', '~') AS Erroneous,
REPLACE(TRANSLATE(Mobile, '', TRANSLATE(TRANSLATE(Mobile, '~~~~~~~~~~', '0123456789'), '', '~')), ' ', '') AS Corrected
FROM Person WHERE Mobile <> '' FETCH FIRST 100 ROWS ONLY
The table is "Person" and the field that you want to check is "Mobile".
With a little more work, you can build an UPDATE on top of this to fix the entire table.