How can I find non-ASCII characters in a string? (We are using DB2.)
We tried the following SELECT statement, but it is not working.
SELECT columnname
FROM tablename
WHERE columnname LIKE '%[' + CHAR(127) + '-' + CHAR(255) + ']%'
COLLATE Latin1_General_100_BIN2
I guess you meant to use the CHR() function rather than CHAR(), which is a data type.
If you are using a newer DB2 version that has the REGEXP functions, you can try the REGEXP_LIKE() function.
Here is an example from the sample database:
SELECT EMPNO, LASTNAME FROM EMPLOYEE WHERE REGEXP_LIKE(LASTNAME,'[E-H]')
EMPNO LASTNAME
------ ---------------
000010 HAAS
000020 THOMPSON
000050 GEYER
000060 STERN
000090 HENDERSON
000100 SPENSER
000110 LUCCHESSI
000120 O'CONNELL
000140 NICHOLLS
000170 YOSHIMURA
000180 SCOUTTEN
000190 WALKER
000210 JONES
000230 JEFFERSON
000250 SMITH
000260 JOHNSON
000270 PEREZ
000280 SCHNEIDER
000290 PARKER
000300 SMITH
000310 SETRIGHT
000320 MEHTA
000330 LEE
000340 GOUNOT
200010 HEMMINGER
200220 JOHN
200240 MONTEVERDE
200280 SCHWARTZ
200310 SPRINGER
200330 WONG
30 record(s) selected.
All of the selected names contain letters from E to H, as specified by the search pattern.
As I didn't have any row containing characters in the range you are looking for, I updated one of the rows, appending characters 169 and 174 to it.
Update employee set LASTNAME = ('LEE' || chr(169) || chr(174)) WHERE LASTNAME = 'LEE'
Then, using the REGEXP_LIKE function with that range:
SELECT EMPNO, LASTNAME FROM EMPLOYEE WHERE REGEXP_LIKE(LASTNAME , '[' || CHR(127) || '-' || CHR(255) || ']')
EMPNO LASTNAME
------ ---------------
000330 LEE©®
1 record(s) selected.
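A more compact variant of the same idea, as a sketch: instead of building the range with CHR(), match any character outside the 7-bit ASCII range with a negated character class. This assumes a DB2 version that provides REGEXP_LIKE and that its regular-expression engine accepts \xhh hexadecimal escapes; the table and column are the ones from the sample database used above.

```sql
-- Rows whose LASTNAME contains at least one character outside 0x00-0x7F.
-- Assumption: REGEXP_LIKE is available and understands \xhh escapes.
SELECT EMPNO, LASTNAME
FROM EMPLOYEE
WHERE REGEXP_LIKE(LASTNAME, '[^\x00-\x7F]')
```

Unlike the 127-255 range, this form does not flag character 127 itself, and it is not limited to code points up to 255.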
I have two sql-server tables: bills and payments. I am trying to create a VIEW to highlight the bill numbers if they occur in the payment description field. For example:
TABLE bll
|bllID | bllNum |
| -------- | -------- |
| 1 | qwerty123|
| 2 | qwerty345|
| 3 | 1234 |
TABLE payments
|paymentID | description |
| -------- | ---------------------------------- |
| 1 | payment of qwerty123 and qwerty345 |
I want to highlight both the 'qwerty123' and 'qwerty345' strings by adding html code to it. The code I have is this:
SELECT REPLACE(payments.description,
COALESCE((SELECT TOP 1 bll.bllNum
FROM bll
WHERE COALESCE(bll.bllNum, '') <> '' AND
PATINDEX('%' + bll.bllNum + '%', payments.description) > 0), ''),
'<font color=red>' +
COALESCE((SELECT TOP 1 bll.bllNum
FROM bll
WHERE COALESCE(bll.bllNum, '') <> '' AND
PATINDEX('%' + bll.bllNum + '%', payments.description) > 0), '') +
'</font>')
FROM payments
This works but only for the first occurrence of a bill number. If the description field has more than one bill number, the consecutive bill numbers are not highlighted. So in my example 'qwerty123' gets highlighted, but not 'qwerty345'
I need to highlight all occurrences. How can I accomplish this?
With the caveat that this is not a task best done in the database, one possible approach you could try is to use string_split to break your description into words and then join this to your Bills, doing your string manipulation on matching rows.
Note, according to the documentation, string_split is not 100% guaranteed to retain its correct ordering but always has in my usage. It could always be substituted for an alternative function to work on the same principle.
select w.paymentID, string_agg(w.word, ' ') [Description]
from (
select p.paymentID,
case when exists (select * from bll b where b.bllNum = s.value)
then Concat('<font color="red">', s.value, '</font>') else s.value end word
from payments p
cross apply String_Split(p.description, ' ') s
) w
group by w.paymentID
Okay, I understand, I can put code in the front-end application by looping through the bill numbers and replacing them as they are found in the description. Just thought/ hoped there was a simple solution to this using t-sql. But I understand the difficulty.
I am working on Data standardization rules, and one of the rules says, "If the last name is part of first name, then remove the last name from first name".
My question: how do I check whether the first name column contains the last name, using Oracle SQL Developer?
I tried using :
select fst_nm, lst_nm from emp where fst_nm = fst_nm || lst_nm ;
but this query returns '0' results.
Also, I tried another query:
select fst_nm, lst_nm, regexp_substr(fst_nm, '[^ ]+',1,1) from emp ;
expected result is:
fst_nm = john smith ;
lst_nm = smith
Actual result showing up is :
fst_nm = john ;
lst_nm = smith
Please help
You should be able to just do a blanket replace on the entire table:
UPDATE emp
SET fst_nm = REPLACE(fst_nm, lst_nm, '');
The reason this should work is that for those records where the last name does not appear as part of the first name, the replace would have no effect. Otherwise, the last name would be stripped from the first name.
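As a sketch of the same idea that also covers the "how do I check" part of the question: INSTR can test whether lst_nm occurs in fst_nm, and TRIM removes the space that the replacement leaves behind. This assumes the emp table and column names from the question.

```sql
-- Check first: rows whose first name contains the last name.
SELECT fst_nm, lst_nm
FROM emp
WHERE INSTR(fst_nm, lst_nm) > 0;

-- Then strip the last name and any leftover leading/trailing spaces,
-- touching only the rows found above.
UPDATE emp
SET fst_nm = TRIM(REPLACE(fst_nm, lst_nm))
WHERE INSTR(fst_nm, lst_nm) > 0;
```

Two caveats: this is a plain substring match, so a short last name can also match inside an unrelated first name, and in Oracle a first name that consists only of the last name ends up NULL, because an empty string is treated as NULL.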
You can use the logic below, which keeps only the part of the first name before the first space (and leaves names without a space unchanged):
select length('saurabh rai'),instr('saurabh rai',' '),case when instr('saurabh rai',' ') > 0 then substr('saurabh rai',1,instr('saurabh rai',' ')-1) else 'saurabh rai' end as a from dual;
Update emp set fst_nm=(Case when instr(fst_nm, ' ') > 0 then substr(fst_nm,1,instr(fst_nm,' ')-1) else fst_nm end);
I have a column "pnum" in a "test" table.
I'd like to replace the leading "9" in pnum with "*" for every record.
testdb=# select * from test limit 5;
id name pnum
===========================================
1 jk 912312345
2 tt 9912333333
I would like the pnums to look like this:
id name pnum
===========================================
1 jk *12312345
2 tt *912333333
How would I do something like this in postgres?
EDIT 1:
I have tried something like this so far:
select id, name, '*' && substring(pnum FROM 2 FOR CHAR_LENGTH(pnum)-1 ) from test limit 3;
Also tried this:
select id, name, '*' || substring(pnum FROM 2 FOR CHAR_LENGTH(pnum)-1 ) from test limit 3;
Neither one has worked...
EDIT 2:
I figured it out:
select id, name, '*'::text || substring(pnum FROM 2 FOR CHAR_LENGTH(pnum)-1 ) from test limit 3;
See the function regexp_replace(string text, pattern text, replacement text [, flags text]) under String Functions and Operators in the PostgreSQL documentation:
SELECT regexp_replace('9912333333', '^[9]', '*');
regexp_replace
----------------
*912333333
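To apply this to the table rather than to a literal, a minimal sketch, assuming pnum is a text or varchar column in the test table from the question (a numeric column would first need a cast such as pnum::text):

```sql
-- Preview the result without changing any data.
SELECT id, name, regexp_replace(pnum, '^9', '*') AS pnum
FROM test;

-- Persist the change; the WHERE clause skips rows that do not start with 9.
UPDATE test
SET pnum = regexp_replace(pnum, '^9', '*')
WHERE pnum LIKE '9%';
```

Because the pattern is anchored with ^, only the first character is replaced, so 9912333333 becomes *912333333.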
You can use Postgres' string manipulation functions for this. In your case "Substring" and "Char_Length"
'*' || Substring(<yourfield> FROM 2 FOR CHAR_LENGTH(<yourfield>)-1) as outputfield
So I have a table in my PostgreSQL database:
TAG_TABLE
==========================
id tag_name
--------------------------
1 aaa
2 bbb
3 ccc
To simplify my problem,
What I want to do is SELECT 'id' from TAG_TABLE when a string "aaaaaaaa" contains the 'tag_name'.
So ideally, it should only return "1", which is the ID for tag name 'aaa'
This is what I am doing so far:
SELECT id FROM TAG_TABLE WHERE 'aaaaaaaaaaa' LIKE '%tag_name%'
But obviously this does not work, since Postgres treats '%tag_name%' as a pattern containing the literal substring 'tag_name' instead of the actual data value in that column.
How do I pass the tag_name to the pattern??
You should use tag_name outside of quotes; then it's interpreted as a field of the record. Concatenate using '||' with the literal percent signs:
SELECT id FROM TAG_TABLE WHERE 'aaaaaaaa' LIKE '%' || tag_name || '%';
And remember that LIKE is case-sensitive. If you need a case-insensitive comparison, lowercase both sides:
SELECT id FROM TAG_TABLE WHERE LOWER('aaaaaaaa') LIKE '%' || LOWER(tag_name) || '%';
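PostgreSQL also offers ILIKE, a case-insensitive variant of LIKE, which avoids calling LOWER at all. A sketch against the same TAG_TABLE, with an upper-case search string chosen only for illustration:

```sql
-- ILIKE is a PostgreSQL extension; it matches without regard to case.
SELECT id FROM TAG_TABLE WHERE 'AAAAAAAA' ILIKE '%' || tag_name || '%';
```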
A proper way to search for a substring is to use position function instead of like expression, which requires escaping %, _ and an escape character (\ by default):
SELECT id FROM TAG_TABLE WHERE position(tag_name in 'aaaaaaaaaaa')>0;
I personally prefer the simpler syntax of the ~ operator.
SELECT id FROM TAG_TABLE WHERE 'aaaaaaaa' ~ tag_name;
Worth reading through Difference between LIKE and ~ in Postgres to understand the difference.
In addition to the solution with 'aaaaaaaa' LIKE '%' || tag_name || '%' there
are position (reversed order of args) and strpos.
SELECT id FROM TAG_TABLE WHERE strpos('aaaaaaaa', tag_name) > 0
Besides what is more efficient (LIKE looks less efficient, but an index might change things), there is a very minor issue with LIKE: tag_name of course should not contain % and especially _ (single char wildcard), to give no false positives.
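The false-positive issue can be seen with a small self-contained sketch for PostgreSQL; the inline VALUES list stands in for TAG_TABLE, and its three tag names are made up for illustration:

```sql
-- 'a_a' is not a substring of 'aaaaaaaa', but LIKE treats _ as
-- "any single character", so the LIKE test matches it anyway.
SELECT tag_name,
       'aaaaaaaa' LIKE '%' || tag_name || '%' AS like_match,
       strpos('aaaaaaaa', tag_name) > 0       AS strpos_match
FROM (VALUES ('aaa'), ('a_a'), ('bbb')) AS t(tag_name);
-- aaa: like_match true,  strpos_match true
-- a_a: like_match true,  strpos_match false
-- bbb: like_match false, strpos_match false
```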
SELECT id FROM TAG_TABLE WHERE 'aaaaaaaa' LIKE '%' || "tag_name" || '%';
If the column was created with a quoted, case-sensitive name, tag_name must be written in double quotes; otherwise Postgres raises an error saying that the column tag_name does not exist.
I'm having great difficulty making my DB2 (AS/400) queries case insensitive.
For example:
SELECT *
FROM NameTable
WHERE LastName = 'smith'
Will return no results, but the following returns 1000's of results:
SELECT *
FROM NameTable
WHERE LastName = 'Smith'
I've read about putting SortSequence/SortType into the connection string but have had no luck. Does anyone have experience with this?
Edit:
Here's the stored procedure:
BEGIN
DECLARE CR CURSOR FOR
SELECT T . ID ,
T . LASTNAME ,
T . FIRSTNAME ,
T . MIDDLENAME ,
T . STREETNAME || ' ' || T . ADDRESS2 || ' ' || T . CITY || ' ' || T . STATE || ' ' || T . ZIPCODE AS ADDRESS ,
T . GENDER ,
T . DOB ,
T . SSN ,
T . OTHERINFO ,
T . APPLICATION
FROM
( SELECT R . * , ROW_NUMBER ( ) OVER ( ) AS ROW_NUM
FROM CPSAB32.VW_MYVIEW
WHERE R . LASTNAME = IFNULL ( #LASTNAME , LASTNAME )
AND R . FIRSTNAME = IFNULL ( #FIRSTNAME , FIRSTNAME )
AND R . MIDDLENAME = IFNULL ( #MIDDLENAME , MIDDLENAME )
AND R . DOB = IFNULL ( #DOB , DOB )
AND R . STREETNAME = IFNULL ( #STREETNAME , STREETNAME )
AND R . CITY = IFNULL ( #CITY , CITY )
AND R . STATE = IFNULL ( #STATE , STATE )
AND R . ZIPCODE = IFNULL ( #ZIPCODE , ZIPCODE )
AND R . SSN = IFNULL ( #SSN , SSN )
FETCH FIRST 500 ROWS ONLY )
AS T
WHERE ROW_NUM <= #MAXRECORDS
OPTIMIZE FOR 500 ROW ;
OPEN CR ;
RETURN ;
Why not do this:
WHERE lower(LastName) = 'smith'
If you're worried about performance (i.e. the query not using an index), keep in mind that DB2 has function indexes, which you can read about in the DB2 documentation. So essentially, you can create an index on lower(LastName), the same expression that is used in the WHERE clause.
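A sketch of such an index, assuming a DB2 version that accepts expressions as index keys; the index name is made up, and the expression in the index should be the same one used in the predicate:

```sql
-- Hypothetical index name; the key is the lowercased last name.
CREATE INDEX NameTable_LastName_Lower
    ON NameTable (LOWER(LastName));

-- The predicate uses the same expression, so the index can be considered.
SELECT *
FROM NameTable
WHERE LOWER(LastName) = 'smith';
```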
EDIT
To do the debugging technique I discussed in the comments, you could do something like this:
create table log (msg varchar(100), dt date);
Then in your SP, you can insert messages to this table for debugging purposes:
insert into log (msg, dt) select 'inside the SP', current_date from sysibm.sysdummy1;
Then after the SP runs, you can select from this log table to see what happened.
If you want case-insensitive in your procedure, try using this option in it:
SET OPTION SRTSEQ = *LANGIDSHR ;
You should also create an index to support it for performance. Create the index when you have *LANGIDSHR as a connection attribute, and the shared-weight index should then be available to later jobs. (There are various ways to get the appropriate setting into effect.)
*LANGIDSHR relates to the language-ID for your jobs. Characters in the character set that might be considered as "equals", such as 'A' and 'a' or 'ü' and 'u', should be given equal weights (shared) and so select together.
I did something similar when I wanted a case-insensitive search. I used UPPER(myfield) = 'SEARCHSTRING'. I know this works.
See: https://stackoverflow.com/a/47181640/5507619
Database setting
There is a database config setting you can set at database creation. It's based on unicode, though.
CREATE DATABASE yourDB USING COLLATE UCA500R1_S1
The default Unicode Collation Algorithm is implemented by the UCA500R1 keyword without any attributes. Since the default UCA cannot simultaneously encompass the collating sequence of every language supported by Unicode, optional attributes can be specified to customize the UCA ordering. The attributes are separated by the underscore (_) character. The UCA500R1 keyword and any attributes form a UCA collation name.
The Strength attribute determines whether accent or case is taken into account when collating or comparing text strings. In writing systems without case or accent, the Strength attribute controls similarly important features.
The possible values are: primary (1), secondary (2), tertiary (3), quaternary (4), and identity (I). To ignore:
accent and case, use the primary strength level
case only, use the secondary strength level
neither accent nor case, use the tertiary strength level
Almost all characters can be distinguished by the first three strength levels, therefore in most locales the default Strength attribute is set at the tertiary level. However if the Alternate attribute (described below) is set to shifted, then the quaternary strength level can be used to break ties among white space characters, punctuation marks, and symbols that would otherwise be ignored. The identity strength level is used to distinguish among similar characters, such as the MATHEMATICAL BOLD SMALL A character (U+1D41A) and the MATHEMATICAL ITALIC SMALL A character (U+1D44E).
Setting the Strength attribute to a higher level will slow down text string comparisons and increase the length of the sort keys.
Examples:
UCA500R1_S1 will collate "role" = "Role" = "rôle"
UCA500R1_S2 will collate "role" = "Role" < "rôle"
UCA500R1_S3 will collate "role" < "Role" < "rôle"
This worked for me. As you can see, ..._S2 ignores case, too.
Using a newer standard version, it should look like this:
CREATE DATABASE yourDB USING COLLATE CLDR181_S1
Collation keywords:
UCA400R1 = Unicode Standard 4.0 = CLDR version 1.2
UCA500R1 = Unicode Standard 5.0 = CLDR version 1.5.1
CLDR181 = Unicode Standard 5.2 = CLDR version 1.8.1
If your database is already created, there is supposed to be a way to change the setting.
CALL SYSPROC.ADMIN_CMD( 'UPDATE DB CFG USING DB_COLLNAME UCA500R1_S1 ' );
I do have problems executing this, but for all I know it is supposed to work.
Generated table column
Another option is to generate an upper-case column, for example:
CREATE TABLE t (
id INTEGER NOT NULL PRIMARY KEY,
str VARCHAR(500),
ucase_str VARCHAR(500) GENERATED ALWAYS AS ( UPPER(str) )
)#
INSERT INTO t(id, str)
VALUES ( 1, 'Some String' )#
SELECT * FROM t#
ID STR UCASE_STR
----------- ------------------------------------ ------------------------------------
1 Some String SOME STRING
1 record(s) selected.
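A sketch of how the generated column might then be used for a case-insensitive lookup, keeping the # statement terminator from the example above; the index name is hypothetical:

```sql
-- Index the generated column so the lookup does not have to scan the table.
CREATE INDEX t_ucase_str_ix ON t (ucase_str)#

-- Uppercase the search term and compare it against the generated column.
SELECT id, str
FROM t
WHERE ucase_str = UPPER('some string')#
```

The original str column keeps its mixed-case value for display, while comparisons go through ucase_str.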