TSQL: Special (National) char detection (SQL Server2016) - tsql

I found a lot of techniques to detect aka special chars($%##) avail on English keyboard, however I see that some national char like á works differently, though my LIKE condition should get it, as it should select anything but a-z1-9, what is the trick here: In sample below I'm missing my special á. I'm on TSQL 2016 with default settings in US.
;WITH cte AS (SELECT 'Euro a€' St UNION SELECT 'adgkjb$' St UNION SELECT 'Bravo Endá' St)
SELECT * FROM cte WHERE St LIKE '%[^a-zA-Z0-9 ]%'
St
adgkjb$
Euro a€
SELECT CAST(N'€' AS VARBINARY(8)) --0xAC20
SELECT CAST(N'á' AS VARBINARY(8)) --0xE100

SQL Server appears to be helping with ranges of characters due to the default collation. If you explicitly list all of the valid characters it will work as desired. Alternatively, you can force a collation on the pattern match that won't interpret the pattern a containing non-ASCII characters.
-- Explicit pattern for "bad" characters.
declare #InvalidCharactersPattern as VarChar(100) = '%[^abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ]%';
-- Query some sample data with the explicit pattern and an explicitly specified collation.
select Sample,
case when Sample like #InvalidCharactersPattern then 'Bad Character' else 'Okay' end as ExplicitStatus,
case when Sample like '%[^a-zA-Z0-9 ]%' collate Latin1_General_100_BIN
then 'Bad Character' else 'Okay' end as CollationStatus
from ( values ( 'a' ), ( 'A' ), ( 'á' ), ( 'Foo' ), ( 'F&o' ), ( '$%^&' ) ) as Samples( Sample );
-- Server collation.
select ServerProperty( 'collation' ) as ServerCollation;
-- Available collations.
select name, description
from sys.fn_helpcollations()
order by name;

Related

Converting rows to columns using crosstab in PostgreSQL not working (relation “table” does not exist)

I need to use the compiled data using CTE and then convert the columns to rows using crosstab(open to other ideas) in the next select statement. Below is the query.
with checked_adgroup AS (
SELECT
ua.new_adgroup,
ua.account,
ua.campaign,
ua.ad_group,
ua."position",
cp.category,
pt.full_value,
FROM unnest_adgroup ua
LEFT JOIN taxonomy_category cp ON ua."position" = cp."position"
LEFT JOIN taxonomy pt ON ua.short_val = pt.short_value AND cp.category = pt.category AND (pt.lob IS NULL OR pt.lob = ua.lob)
)
SELECT *
from crosstab(
'select
cad.account,
cad.campaign,
cad.ad_group,
cad.category,
cad.full_value
FROM checked_adgroup cad
WHERE cad.all_correct AND cad.category IS NOT NULL
ORDER BY 1,2,3')
AS final_result(
account text, campaign text, ad_group text,
division text, lob text, match_type text );
Error message:
ERROR: relation "checked_adgroup" does not exist LINE 7: FROM checked_adgroup cad
Output of checked_adgroup cte looks like below:
enter image description here
Desired output of the final statement is:
enter image description here
Welcome to the community. First off please do not post images, they are useless to work with, and in some instances they are prohibited and cannot be viewed. Instead use formatted text.
I haven't used crosstab functionality all that much, but it does offer a second version which contains 2 queries, the second feeding into the first. There are however a couple errors in your posted query that would need correcting either way. So that first. Look for --<< tag.
with checked_adgroup AS (
SELECT
ua.new_adgroup,
ua.account,
ua.campaign,
ua.ad_group,
ua."position",
cp.category,
pt.full_value,
--<< missing column or ending , above should not be there, assumption missing column see below.
FROM unnest_adgroup ua
LEFT JOIN taxonomy_category cp ON ua."position" = cp."position"
LEFT JOIN taxonomy pt ON ua.short_val = pt.short_value AND cp.category = pt.category AND (pt.lob IS NULL OR pt.lob = ua.lob)
)
SELECT *
from crosstab(
'select
cad.account,
cad.campaign,
cad.ad_group,
cad.category,
cad.full_value
FROM checked_adgroup cad
WHERE cad.all_correct AND cad.category IS NOT NULL
--<< above line has 2 errors:
--<< Incorrectly formatted needs to cad.all_correct is not null AND cad.category IS NOT NULL
--<< column cad.all_correct does not exist (see missing column above
ORDER BY 1,2,3')
AS final_result(
account text, campaign text, ad_group text,
division text, lob text, match_type text);
Now we need to transform the CTE to a second query that crosstab might be ale to use. I have identified each with Postgres $Quoting$, not so much from necessity as standard string quote (') would be sufficant, but more from visibility standing.
select *
from crosstab(
$ct1$select
account,
campaign,
ad_group,
category,
full_value
--<< from checked_adgroup cad
--<< where cad.all_correct and cad.category is not null
--<< moved above lines to second query to avoid reference and removed qualification
order by 1,2,3
$ct1$
, $ct2$select *
from (
select
ua.new_adgroup,
ua.account,
ua.campaign,
ua.ad_group,
ua."position",
cp.category,
pt.full_value,
'mssing from orig posted query' all_correct
from unnest_adgroup ua
left join taxonomy_category cp on ua."position" = cp."position"
left join taxonomy pt on ua.short_val = pt.short_value
and cp.category = pt.category
and (pt.lob is null or pt.lob = ua.lob)
) s
where all_correct is not null and cad.category is not null
--<< move from query1
$ct2$ )
as final_result(
account text, campaign text, ad_group text,
division text, lob text, match_type text );
But at this point I get an error relationship unnest_adgroup does not exist. Which is true as you did not post the definition, not other referenced tables. But that seems to imply the syntax is correct.
Admittedly, this may be way off base if so, so be it, I can always delete later. But, I stuck at home with no other projects at the moment and this seems like an interesting question. Looking forward to the results. Good Luck.

How to find/replace weird whitespace in string

I find in my sql database string whit weird whitespace which cannot be replace like REPLACE(string, ' ', '') RTRIM and cant it even find with string = '% %'. This space is even transfered to new table when using SELECT string INTO
If i select this string in managment studio and copy that is seems is normal space and when everything is works but cant do nothing directly from database. What else can i do? Its some kind of error or can i try some special character for this?
First, you must identify the character.
You can do that by using a tally table (or a cte) and the Unicode function:
The following script will return a table with two columns: one contains a char and the other it's unicode value:
DECLARE #Str nvarchar(100) = N'This is a string containing 1 number and some words.';
with Tally(n) as
(
SELECT TOP(LEN(#str)) ROW_NUMBER() OVER(ORDER BY ##SPID)
FROM sys.objects a
--CROSS JOIN sys.objects b -- (unremark if there are not enough rows in the tally cte)
)
SELECT SUBSTRING(#str, n, 1) As TheChar,
UNICODE(SUBSTRING(#str, n, 1)) As TheCode
FROM Tally
WHERE n <= LEN(#str)
You can also add a condition to the where clause to only include "special" chars:
AND SUBSTRING(#str, n, 1) NOT LIKE '[a-zA-Z0-9]'
Then you can replace it using it's unicode value using nchar (I've used 32 in this example since it's unicode "regular" space:
SELECT REPLACE(#str, NCHAR(32), '|')
Result:
This|is|a|string|containing|1|number|and|some|words.

Search with Turkish characters

I have problem on db search with like and elastic search in Turkish upper and lower case.
For example I have posts table which contains post titled 'DENEME YAZI'.
If I run this query:
select * from posts where title like '%deneme%';
or:
select * from posts where title like '%YAZI%';
I get correct result but if I run:
select * from posts where title like '%yazı%';
it doesn't return any record. My database encoding is tr_TR.UTF-8.
How can I get correct results without entering exact word?
You must use ILIKE for case insensitive matches:
select * from posts where title ilike '%yazı%';
However, there is the additional complication of peculiar rules in the Turkish locale. Upper case of 'ı' is 'I'. But not the other way round. Lower case of 'I' is 'i':
db=# SELECT lower(upper('ı'));
lower
-------
i
You could solve that by applying upper() on either side of the LIKE expression:
select upper('DENEME YAZI') like ('%' || upper('yazı') || '%');
Applying just a single UPPER (or LOWER) on either side of the expression is not a solution. You should handle problematic Turkish characters (ıI-iİ) by yourself.
İ and i are the same letters in Turkish alphabet.
I and ı are the same letters in Turkish alphabet.
But even using UTF-8, Latin5, Windows 1254 Encoding and collation settings in postgre
UPPER('İ') returns 'İ' OK
UPPER('i') return 'I' Not OK
UPPER('I') returns 'I' OK
UPPER('ı') return 'İ' Not OK
so
SELECT ... FROM ... WHERE ... UPPER('İZMİR') like UPPER('izmir') return false
SELECT ... FROM ... WHERE ... UPPER('ISPARTA') like UPPER('ısparta') return false.
Here's some more precise but not perfect solution because of performance issues
SELECT ... FROM ... WHERE ...
UPPER(REPLACE(REPLACE(COLUMNX, 'i', 'İ'), 'ı', 'I')) = UPPER(REPLACE(REPLACE(myvalue,
'i', 'İ'), 'ı', 'I'))
or
SELECT ... FROM ... WHERE ...
UPPER(TRANSLATE('COLUMNX','ıi','Iİ')) = UPPER(TRANSLATE(myvalue,'ıi','Iİ'))

TRIM not works with lines and tabs of a xpath in PostgreSQL?

With this query
SELECT trim(title) FROM (
SELECT
unnest( xpath('//p[#class="secTitle1"]', xmlText )::varchar[] ) AS title
FROM t1
) as t2
and XML input text with lines and spaces,
<root>
...
<p class="x">
text text
text text
</p><p> ...</p>
...
</root>
The trim() have no effect (!). It is a PostgreSQL bug? How to apply fn:normalize-space() with the XPath? I need something like "WHERE title is not null"? (Oracle is simpler...) How to do this simple query with PostreSQL?
Workaround
I need a well-configured build-in function, not a workaround... But I need to work and to show results, so I am using regular expression...
SELECT id, TRIM(regexp_replace(tit, E'[\\n\\r\\t ]+', ' ', 'g')) AS tit
FROM (
SELECT
id, -- xpath returns array of 1, 2, or more strings
unnest( xpath('//p[#class="secTitle1"]', texto )::VARCHAR[] ) AS tit
FROM t
) AS tmp
So, a "only simple space trim" is not friendly, not util (!).
EDIT after #mu comment
I try
SELECT id, TRIM(tit, E'\\n\\r\\t') AS tit
and
SELECT id, TRIM(tit, '\n\r\t') AS tit
both NOT WORKs.
QUESTION REMAINS:
there are no TRIM-option or postgresql configuration to say to TRIM work as it is required?
can I use normalize-space() at xpath? How?
I am using PostgreSQL 9.1, need to upgrade?
It works in 9.2, and it works on 8.4 too.
postgres=# select trim(unnest(string_to_array(e'\t\tHello\n\t\tHello\n\t\tHello', e'\n')), e'\t');
btrim
-------
Hello
Hello
Hello
(3 rows)
your regexp replace any char \n or \r or \t, but trim working with string "\n\r\t". It has different meaning than you expect.

DB2 Case Sensitivity

I'm having great difficultly making my DB2 (AS/400) queries case insensitive.
For example:
SELECT *
FROM NameTable
WHERE LastName = 'smith'
Will return no results, but the following returns 1000's of results:
SELECT *
FROM NameTable
WHERE LastName = 'Smith'
I've read of putting SortSequence/SortType into your connection string but have had no luck... anyone have exepierence with this?
Edit:
Here's the stored procedure:
BEGIN
DECLARE CR CURSOR FOR
SELECT T . ID ,
T . LASTNAME ,
T . FIRSTNAME ,
T . MIDDLENAME ,
T . STREETNAME || ' ' || T . ADDRESS2 || ' ' || T . CITY || ' ' || T . STATE || ' ' || T . ZIPCODE AS ADDRESS ,
T . GENDER ,
T . DOB ,
T . SSN ,
T . OTHERINFO ,
T . APPLICATION
FROM
( SELECT R . * , ROW_NUMBER ( ) OVER ( ) AS ROW_NUM
FROM CPSAB32.VW_MYVIEW
WHERE R . LASTNAME = IFNULL ( #LASTNAME , LASTNAME )
AND R . FIRSTNAME = IFNULL ( #FIRSTNAME , FIRSTNAME )
AND R . MIDDLENAME = IFNULL ( #MIDDLENAME , MIDDLENAME )
AND R . DOB = IFNULL ( #DOB , DOB )
AND R . STREETNAME = IFNULL ( #STREETNAME , STREETNAME )
AND R . CITY = IFNULL ( #CITY , CITY )
AND R . STATE = IFNULL ( #STATE , STATE )
AND R . ZIPCODE = IFNULL ( #ZIPCODE , ZIPCODE )
AND R . SSN = IFNULL ( #SSN , SSN )
FETCH FIRST 500 ROWS ONLY )
AS T
WHERE ROW_NUM <= #MAXRECORDS
OPTIMIZE FOR 500 ROW ;
OPEN CR ;
RETURN ;
Why not do this:
WHERE lower(LastName) = 'smith'
If you're worried about performance (i.e. the query not using an index), keep in mind that DB2 has function indexes, which you can read about here. So essentially, you can create an index on upper(LastName).
EDIT
To do the debugging technique I discussed in the comments, you could do something like this:
create table log (msg varchar(100, dt date);
Then in your SP, you can insert messages to this table for debugging purposes:
insert into log (msg, dt) select 'inside the SP', current_date from sysibm.sysdummy1;
Then after the SP runs, you can select from this log table to see what happened.
If you want case-insensitive in your procedure, try using this option in it:
SET OPTION SRTSEQ = *LANGIDSHR ;
You should also create an index to support it for performance. Create the index when you have *LANGIDSHR as a connection attribute, and the shared-weight index should then be available to later jobs. (There are various ways to get the appropriate setting into effect.)
*LANGIDSHR relates to the language-ID for your jobs. Characters in the character set that might be considered as "equals", such as 'A' and 'a' or 'ü' and 'u', should be given equal weights (shared) and so select together.
I did something similar when I wanted a case insensitive search. I used UPPER(mtfield) = 'SEARCHSTRING'. I know this works.
See: https://stackoverflow.com/a/47181640/5507619
Database setting
There is a database config setting you can set at database creation. It's based on unicode, though.
CREATE DATABASE yourDB USING COLLATE UCA500R1_S1
The default Unicode Collation Algorithm is implemented by the UCA500R1 keyword without any attributes. Since the default UCA cannot simultaneously encompass the collating sequence of every language supported by Unicode, optional attributes can be specified to customize the UCA ordering. The attributes are separated by the underscore (_) character. The UCA500R1 keyword and any attributes form a UCA collation name.
The Strength attribute determines whether accent or case is taken into account when collating or comparing text strings. In writing systems without case or accent, the Strength attribute controls similarly important features.
The possible values are: primary (1), secondary (2), tertiary (3), quaternary (4), and identity (I). To ignore:
accent and case, use the primary strength level
case only, use the secondary strength level
neither accent nor case, use the tertiary strength level
Almost all characters can be distinguished by the first three strength levels, therefore in most locales the default Strength attribute is set at the tertiary level. However if the Alternate attribute (described below) is set to shifted, then the quaternary strength level can be used to break ties among white space characters, punctuation marks, and symbols that would otherwise be ignored. The identity strength level is used to distinguish among similar characters, such as the MATHEMATICAL BOLD SMALL A character (U+1D41A) and the MATHEMATICAL ITALIC SMALL A character (U+1D44E).
Setting the Strength attribute to higher level will slow down text string comparisons and increase the length of the sort keys.
Examples:
UCA500R1_S1 will collate "role" = "Role" = "rôle"
UCA500R1_S2 will collate "role" = "Role" < "rôle"
UCA500R1_S3 will collate "role" < "Role" < "rôle"
This worked for me. As you can see, ..._S2 ignores case, too.
Using a newer standard version, it should look like this:
CREATE DATABASE yourDB USING COLLATE CLDR181_S1
Collation keywords:
UCA400R1 = Unicode Standard 4.0 = CLDR version 1.2
UCA500R1 = Unicode Standard 5.0 = CLDR version 1.5.1
CLDR181 = Unicode Standard 5.2 = CLDR version 1.8.1
If your database is already created, there is supposed to be a way to change the setting.
CALL SYSPROC.ADMIN_CMD( 'UPDATE DB CFG USING DB_COLLNAME UCA500R1_S1 ' );
I do have problems executing this, but for all I know it is supposed to work.
Generated table row
Other options are e.g. generating a upper case row:
CREATE TABLE t (
id INTEGER NOT NULL PRIMARY KEY,
str VARCHAR(500),
ucase_str VARCHAR(500) GENERATED ALWAYS AS ( UPPER(str) )
)#
INSERT INTO t(id, str)
VALUES ( 1, 'Some String' )#
SELECT * FROM t#
ID STR UCASE_STR
----------- ------------------------------------ ------------------------------------
1 Some String SOME STRING
1 record(s) selected.