Split a column value into two columns in query output (DB2)

How can I split a column value into two values in the output? I need to have the digits in one column and the letters in the other.
For example:
Existing column
===============
678J
2345K
I need the output to be:
Column 1  Column 2
========  ========
678       J
2345      K
The existing column can have 4 or 5 characters, as shown in the example. There is no space.
Thanks in advance!!

You could convert all letters to spaces & strip them away, then do the opposite with digits in the other column:
SELECT trim(translate(mycol,repeat(' ',26),'ABCDEFGHIJKLMNOPQRSTUVWXYZ')) as col1,
trim(translate(mycol,repeat(' ',10),'0123456789')) as col2
FROM mytable
Adjust as necessary to translate additional characters.
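For instance, if the column could also hold lowercase letters, the same idea extends by listing them as well. This is only a sketch: mytable and mycol are the placeholder names from the query above, and it assumes nothing other than digits and unaccented Latin letters appears in the column.

```sql
-- Blank out all 52 letters to keep the digits, and all 10 digits to keep the letters
SELECT trim(translate(mycol, repeat(' ', 52),
            'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz')) AS col1,
       trim(translate(mycol, repeat(' ', 10), '0123456789'))         AS col2
FROM   mytable
```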

I am not sure about the performance of WarrenT's solution, but it looks like a very heavy solution. It does what it is supposed to do with few constraints on the data. If you know more about the data, you can optimize.
If the string always ends with one and only one letter:
select left(mycol, length(mycol)-1), right(mycol,1) from mytable

Related

Postgresql row count not working accurately

I have a table with 2.44 million rows, and after loading it into server:
copy sample_table
from 'C:\sample_table.txt'
delimiter E'\t'
csv header
if I do
select count(*) from sample_table
pgAdmin 4 will return count as only 1.35 million rows
I found it odd, so I exported this table and looked at the number of rows in Notepad++, and it is still 2.44 million rows (in fact there is 1 row count difference and not sure why, but guess will worry about that later)
As recommended by Adrian in comments, I verified this in psql, and still only see 1.35M.
Any advice please? Thank you!
When using CSV format, literal newlines enclosed in quotes do not terminate a row.
"How now
Brown cow"
Is 2 "lines", but only 1 row.
If you re-export in the default text format, then the number of lines should match the number of rows, with the literal newlines turned into the two-character escape \n
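To check whether embedded line breaks account for the gap, something along these lines might help. It is a rough sketch: some_text_column is a hypothetical name standing in for whichever text column of sample_table can contain quoted newlines, and the query would need to be repeated (or extended) for each such column.

```sql
-- some_text_column is a placeholder; chr(10) is the line feed character
SELECT count(*) AS rows_with_line_breaks,
       sum(length(some_text_column)
           - length(replace(some_text_column, chr(10), ''))) AS extra_lines
FROM   sample_table
WHERE  strpos(some_text_column, chr(10)) > 0;
```

If the line breaks are the cause, extra_lines summed over all affected columns should be close to the difference between the line count of the file and the row count.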

Alphanumeric Sorting in PostgreSQL

I have this table with a character varying column in Postgres 9.6:
id | column
------------
1 |IR ABC-1
2 |IR ABC-2
3 |IR ABC-10
I see some solutions typecasting the column as bytea.
select * from table order by column::bytea.
But it always results in:
id | column
------------
1 |IR ABC-1
2 |IR ABC-10
3 |IR ABC-2
I don't know why '10' always comes before '2'. How do I sort this table, assuming the basis for ordering is the last whole number of the string, regardless of what the character before that number is?
When sorting character data types, collation rules apply, unless you work with locale "C", which sorts characters by their byte values. Applying collation rules may or may not be desirable. It makes sorting more expensive in any case. If you want to sort without collation rules, don't cast to bytea, use COLLATE "C" instead:
SELECT * FROM table ORDER BY column COLLATE "C";
However, this does not yet solve the problem with numbers in the string you mention. Split the string and sort the numeric part as number.
SELECT *
FROM table
ORDER BY split_part(column, '-', 2)::numeric;
Or, if all your numbers fit into bigint or even integer, use that instead (cheaper).
I ignored the leading part because you write:
... the basis for ordering is the last whole number of the string, regardless of what the character before that number is.
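If the separator before the number is not always a dash, a regular expression can pick out the trailing digits instead. A minimal sketch, assuming every value ends in at least one digit and the number fits into bigint; tbl and col are hypothetical names used here because table and column are reserved words, and the VALUES list just reproduces the sample rows from the question:

```sql
SELECT *
FROM  (VALUES (1, 'IR ABC-1'),
              (2, 'IR ABC-2'),
              (3, 'IR ABC-10')) AS tbl(id, col)
-- substring() with a POSIX pattern returns the parenthesized part of the match
ORDER  BY substring(col FROM '(\d+)$')::bigint;
```

Rows without a trailing number would yield NULL from substring() and sort last by default.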
Related:
Alphanumeric sorting with PostgreSQL
Split comma separated column data into additional columns
What is the impact of LC_CTYPE on a PostgreSQL database?
Typically, it's best to save distinct parts of a string in separate columns as proper respective data types to avoid any such confusion.
And if the leading string is identical for all columns, consider just dropping the redundant noise. You can always use a VIEW to prepend a string for display, or do it on-the-fly, cheaply.
As suggested in the comments, split the string and cast the integer part:
select *
from
table
cross join lateral
regexp_split_to_array(column, '-') r (a)
order by a[1], a[2]::integer

How to trim ONLY certain characters from string in Postgresql database?

I have a table in a db that is labeled like so:
TableA:
ID ID2 Text
0 64556-546-1 465846 adfadf
1 64556-546-1 465846 adfadf
2 64556-546-1 465846 adfadf
I want to trim the dash (-) and the digit after it, only from the first column and only for the dash in the second position, so that the number looks like 64556-546. How would I go about this?
The value will be compared to another in another table, so no need to create another column.
Thank you in advance.
I'd check here:
https://www.postgresql.org/docs/9.1/static/functions-string.html
and use:
regexp_replace(string text, pattern text, replacement text [, flags text])
So one way to do it (ignoring probable performance issues), would be;
SELECT TRIM(BOTH '-1' FROM regexp_replace(ID, '[-]', '' )) FROM TableA
But you could probably build a better way with a more advanced regex.
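A single regex anchored at the end of the string might be enough here. This is a sketch under the assumption that the value always ends in a dash followed by one or more digits; the column is called id2 in this example and the VALUES row simply reproduces the sample value. As far as I can tell, the TRIM-based query above also removes the first dash, whereas this one leaves it alone.

```sql
SELECT regexp_replace(id2, '-[0-9]+$', '') AS trimmed
FROM  (VALUES ('64556-546-1')) AS tablea(id2);
-- trimmed: 64556-546
```

For the comparison against the other table, the same expression can go straight into the join or WHERE condition.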

Optimal use of LIKE on indexed column

I have a large table (roughly 1 million rows, 7 columns including the primary key). The table contains two columns (e.g. symbol_01 and symbol_02) that are indexed and used for querying. This table contains rows such as:
id  symbol_01  symbol_02  value_01  value_02
1   aaa        bbb        12        15
2   bbb        aaa        12        15
3   ccc        ddd        20        50
4   ddd        ccc        20        50
As per the example rows 1 and 2 are identical except that symbol_01 and symbol_02 are swapped but they have the same values for value_01 and value_02. That is true once again with row 3 and 4. This is the case for the entire table, there are essentially two rows for each combination of symbol_01+symbol_02.
I need to figure out a better way of handling this to get rid of the duplication. So far the solution I am considering is to just have one column called symbol which would be a combination of the two symbols, so the table would be as follows:
id  symbol     value_01  value_02
1   ,aaa,bbb,  12        15
2   ,ccc,ddd,  20        50
This would cut the number of rows in half. As a side note, every value in the symbol column will be unique. Results always need to be queried for using both symbols, so I would do:
select value_01, value_02
from my_table
where symbol like '%,aaa,%' and symbol like '%,bbb,%'
This would work, but my question is about performance. This is still going to be a big table (and will get bigger soon). So my question is: is this the best solution for this scenario, given that symbol will be indexed, every symbol combination will be unique, and I will need to use LIKE to query results?
Is there a better way to do this? I'm not sure how great LIKE is for performance, but I don't see an alternative.
There's no high performance solution, because your problem is shoehorning multiple values into one column.
Create a child table (with a foreign key to your current/main table) to separately hold all the individual values you want to search on, index that column and your query will be simple and fast.
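A rough sketch of what that could look like, assuming PostgreSQL and integer values. The table and column names (pair, pair_symbol, pair_id) are made up for illustration and are not from the question:

```sql
-- One row per symbol combination
CREATE TABLE pair (
    id       serial  PRIMARY KEY,
    value_01 integer NOT NULL,
    value_02 integer NOT NULL
);

-- One row per individual symbol belonging to a combination
CREATE TABLE pair_symbol (
    pair_id integer NOT NULL REFERENCES pair (id),
    symbol  text    NOT NULL,
    PRIMARY KEY (pair_id, symbol)
);

CREATE INDEX pair_symbol_symbol_idx ON pair_symbol (symbol);

-- Look up a combination by both symbols, in either order, using plain equality
SELECT p.value_01, p.value_02
FROM   pair p
JOIN   pair_symbol a ON a.pair_id = p.id AND a.symbol = 'aaa'
JOIN   pair_symbol b ON b.pair_id = p.id AND b.symbol = 'bbb';
```

Both joins use equality on an indexed column, so no LIKE with a leading wildcard is needed.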
With this index:
create index symbol_index on t (
least(symbol_01, symbol_02),
greatest(symbol_01, symbol_02)
)
The query would be:
select *
from t
where
least(symbol_01, symbol_02) = least('aaa', 'bbb')
and
greatest(symbol_01, symbol_02) = greatest('aaa', 'bbb')
Or simply delete the duplicates:
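-- deletes one row per group, so this assumes every pair is stored exactly twice
delete from t
using (
    select distinct on (
        greatest(symbol_01, symbol_02),
        least(symbol_01, symbol_02),
        value_01, value_02
    ) id
    from t
    order by
        greatest(symbol_01, symbol_02),
        least(symbol_01, symbol_02),
        value_01, value_02
) s
-- qualify id with t: an unqualified id is ambiguous between t and s
where t.id = s.id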
Depending on the columns' semantics, it might be better to normalize the table as suggested by @Bohemian.

ltrim(rtrim(x)) leaves blanks on RTL content - does anyone know of a workaround?

I have a table [Company] with a column [Address3] defined as varchar(50).
I cannot control the values entered into that table, but I need to extract the values without leading and trailing spaces. I perform the following query:
SELECT DISTINCT RTRIM(LTRIM([Address3])) Address3 FROM [Company] ORDER BY Address3
The column contains both RTL and LTR values.
Most of the data is retrieved correctly, but SOME (not all) RTL values are returned with leading and/or trailing spaces.
I attempted to perform the following query:
SELECT DISTINCT ltrim(rTRIM(ltrim(rTRIM([Address3])))) c, ltrim(rTRIM([Address3])) b, [Address3] a, rtrim(LTRIM([Address3])) Address3 FROM [Company] ORDER BY Address3
but it showed the same problem in all columns. Does anyone have any idea what could cause it?
The rows that return with extraneous spaces might have a kind of space or invisible character the trim functions don't know about. The documentation doesn't even mention what is considered "a blank" (pretty damn sloppy if you ask me). Try taking one of those rows and looking at the characters one by one to see what character they are.
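One way to look at the characters one by one is a small T-SQL loop like the one below. It is only a sketch: the sample value (a leading space, some text, and CHAR(160) as a stand-in for an unknown trailing character) is made up, and in practice @s would be loaded from one of the affected rows, for example with SELECT TOP (1) @s = Address3 FROM Company plus a suitable WHERE clause.

```sql
-- Hypothetical sample value; replace with a real Address3 value from an affected row
DECLARE @s varchar(50);
DECLARE @i int;
SET @s = ' abc' + CHAR(160);
SET @i = 1;

-- DATALENGTH is used instead of LEN because LEN ignores trailing spaces
WHILE @i <= DATALENGTH(@s)
BEGIN
    -- Print the position and the character code at that position
    PRINT CAST(@i AS varchar(10)) + ': '
        + CAST(ASCII(SUBSTRING(@s, @i, 1)) AS varchar(10));
    SET @i = @i + 1;
END
```

Any code other than 32 at the start or end of the string is a character that LTRIM and RTRIM will not remove.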
Since you are using varchar, just do this to get the ASCII code of all the bad characters:
--identify the bad character
SELECT
COUNT(*) AS CountOf
,'>'+RIGHT(LTRIM(RTRIM(Address3)),1)+'<' AS LastChar_Display
,ASCII(RIGHT(LTRIM(RTRIM(Address3)),1)) AS LastChar_ASCII
FROM Company
GROUP BY RIGHT(LTRIM(RTRIM(Address3)),1)
ORDER BY 3 ASC
Do a one-time fix to the data to remove the bogus character, where xxxx is the ASCII value identified in the previous select:
--only one bad character found in previous query
UPDATE Company
SET Address3=REPLACE(Address3,CHAR(xxxx),'')
--multiple different bad characters found by previous query
UPDATE Company
SET Address3=REPLACE(REPLACE(Address3,CHAR(xxxx1),''),char(xxxx2),'')
If you have bogus characters in your data, remove them from the data rather than each time you select it. You WILL have to add this REPLACE logic to all INSERTs and UPDATEs on this column to keep any new data from having the bogus characters.
If you can't alter the data, you can just select it this way:
SELECT
LTRIM(RTRIM(REPLACE(Address3,CHAR(xxxx),'')))
,LTRIM(RTRIM(REPLACE(REPLACE(Address3,CHAR(xxxx1),''),char(xxxx2),'')))
...