Can we exclude matches for a few strings in Oracle regexp_like()?

Background Knowledge:
We can't use (?!) to exclude, since regexp_like() doesn't support negative lookahead.
I don't want to exclude using 'NOT REGEXP_LIKE()'.
[^] can negate only a single character, not a string.
Question:
Is there any alternative that changes only the regular expression passed to Oracle regexp_like()?
Example scenario to explain:
The regexp "STANDARD.*TIME", when used in regexp_like(), matches all time zones containing both words STANDARD and TIME. Say I want to exclude 'INDIAN STANDARD TIME', 'ATLANTIC STANDARD TIME', and 'IRISH STANDARD TIME' from the matched time zones.

I would be interested to know why using 'NOT' is out of the question. If you are looking for a regex solution for the fun of it, REGEXP_LIKE on its own is not going to work, as the Oracle flavor does not support negative lookaheads. However, thinking outside the box a little, and knowing that REGEXP_REPLACE returns the string unchanged if the pattern is not found, you could do something like this (although just because you can does not mean you should; I would use NOT with REGEXP_LIKE):
SQL> with tbl(str) as (
select 'INDIAN STANDARD TIME' from dual union all
select 'ATLANTIC STANDARD TIME' from dual union all
select 'IRISH STANDARD TIME' from dual union all
select 'EASTERN STANDARD TIME' from dual union all
select 'PST STANDARD TIME' from dual union all
select 'CST STANDARD TIME' from dual
)
select str
from tbl
where str = regexp_replace(str, '^(INDIAN|ATLANTIC|IRISH) STANDARD.*TIME', 'DO NOT WANT THESE');
STR
----------------------
EASTERN STANDARD TIME
PST STANDARD TIME
CST STANDARD TIME
SQL>
So this replaces the strings you don't want and then compares the result with the original. Since a replaced string no longer equals the original, the select does not return it. Still not as clean as:
select str
from tbl
where NOT regexp_like(str, '^(INDIAN|ATLANTIC|IRISH) STANDARD.*TIME');
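To see why the replace-and-compare filter works, here is a minimal sketch of the same logic outside the database. It uses Python's `re` module rather than Oracle's regex engine, and the function name `keep_by_replace` is made up for illustration:

```python
import re

# Same pattern as in the Oracle query above.
EXCLUDED = re.compile(r"^(INDIAN|ATLANTIC|IRISH) STANDARD.*TIME")


def keep_by_replace(zones):
    """Mirror WHERE str = REGEXP_REPLACE(str, pattern, marker).

    A string that matches the pattern is rewritten to the marker, so it
    no longer equals itself and is dropped. A string that does not match
    comes back unchanged and is kept.
    """
    return [z for z in zones if z == EXCLUDED.sub("DO NOT WANT THESE", z)]


zones = [
    "INDIAN STANDARD TIME",
    "ATLANTIC STANDARD TIME",
    "IRISH STANDARD TIME",
    "EASTERN STANDARD TIME",
    "PST STANDARD TIME",
    "CST STANDARD TIME",
]
print(keep_by_replace(zones))
```

The only requirement on the marker text is that it differs from every value that should be excluded.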

Related

How to split a string in TSQL by space character

I have a difficult task in TSQL that I can't seem to find a simple way to do. I am trying to use CROSS APPLY STRING_SPLIT(sentence, ' '), but I can only get one word per row. Can you please help? Thank you.
Sample sentence:
I need to split strings using TSQL.
This approach is traditional, and is supported in all versions and editions of SQL Server.
Desired answer:
I need
to split
strings using
TSQL.
Desired Answer:
This approach
is traditional
, and
is supported
in all
versions and
editions of
SQL Server.
Here you go:
First add a space before each comma (you want a comma treated as a word), then split the string on each space into rows using some JSON, then assign groups to pair up the rows using modulo and lag over(), then aggregate based on the groups:
declare @s varchar(100)='This approach is traditional, and is supported in all versions and editions of SQL Server';
select Result = String_Agg(string,' ') within group (order by seq)
from (
select j.[value] string, Iif(j.[key] % 2 = 1, Lag(seq) over(order by seq) ,seq) gp, seq
from OpenJson(Concat('["',replace(Replace(@s,',',' ,'), ' ', '","'), '"]')) j
cross apply(values(Convert(tinyint,j.[key])))x(seq)
)x
group by gp;
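The grouping idea (every odd-indexed word joins the word before it) can be sketched outside SQL Server as follows. This is only an illustration of the pairing logic in Python, not a replacement for the T-SQL above, and `pair_words` is a hypothetical name:

```python
def pair_words(sentence):
    """Split a sentence into words and join them two at a time.

    A comma is first detached from the preceding word so that it counts
    as a word of its own, as in the T-SQL answer.
    """
    tokens = sentence.replace(",", " ,").split()
    # Tokens 0 and 1 form the first pair, 2 and 3 the second, and so on.
    return [" ".join(tokens[i:i + 2]) for i in range(0, len(tokens), 2)]


for line in pair_words("I need to split strings using TSQL."):
    print(line)
```

With an odd number of words, the last word ends up on a line by itself.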

How to concatenate the currency symbol for negative integers?

I am trying to show the currency symbol with the numbers. I am using the CONCAT method to do this.
select concat('$', "amount") from payments;
This method works well when the amount is positive, but when the amount is negative the currency symbol is placed before the minus sign.
eg:
$-243.44
What is the proper way to do this?
You can use a CASE expression:
select case when amount < 0 then concat('-$', abs("amount")) else concat('$', "amount") end from payments;
I suggest that you take advantage of Postgres' built-in money type, e.g.
SELECT '-243.44'::float8::numeric::money;
This printed -£243.44 on the demo tool I am using, which appears to be located in the UK. The actual currency symbol you see would depend on your Postgres locale settings.
If you really need to do this concatenation yourself, you could use REGEXP_REPLACE:
WITH cte AS (
SELECT '-123.456'::text AS val UNION ALL
SELECT '123.456'::text
)
SELECT
val,
REGEXP_REPLACE(val, '^(-?)', '\1$') AS val_out
FROM cte;
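The regex captures an optional leading minus sign and re-emits it in front of the symbol. A rough sketch of the same substitution in Python, assuming the value arrives as text (the helper name `with_symbol` is made up):

```python
import re


def with_symbol(val, symbol="$"):
    """Insert the currency symbol after an optional leading minus sign."""
    # ^(-?) matches at the start of the string and captures "-" if present,
    # otherwise it captures the empty string.
    return re.sub(r"^(-?)", lambda m: m.group(1) + symbol, val, count=1)


print(with_symbol("-243.44"))
print(with_symbol("243.44"))
```

A callable is used as the replacement so that the symbol is inserted literally and never read as part of a backreference.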

to_date in HANA with mixed date formats

What can you do when the source data contains different date formats?
I have a case where we use the to_date function to read dates from a table, but I am getting an error because some of the records have the date format YYYY-DD-MM instead of YYYY-MM-DD.
How can I apply a uniform solution for this?
To handle this situation (arbitrary text should be converted into a structured date value), I would probably work with regular expressions.
That way you can select the set of records that fit the format you like to support and perform the type conversion on those records.
For example:
create column table date_vals (dateval nvarchar (4000), date_val date);
insert into date_vals values ('2018-01-23', NULL);
insert into date_vals values ('12/23/2016', NULL);
select dateval, to_date(dateval, 'YYYY-MM-DD') as SQL_DATE
from date_vals
where
dateval like_regexpr '[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}'
union all
select dateval, to_date(dateval, 'MM/DD/YYYY') as SQL_DATE
from date_vals
where
dateval like_regexpr '[[:digit:]]{2}/[[:digit:]]{2}/[[:digit:]]{4}';
This approach also provides a good option to review the non-matching records and possibly come up with additional required patterns.
Why not use a CASE WHEN in the select that tests the different regular expressions and then calls to_date with the matching format?
This would avoid the union all and the two select statements.
You could add more formats without an additional select and union.
Unless like_regexpr only works in the where clause (I have to admit I never tried that function).
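The "one pass, several patterns" idea from the second answer can be sketched outside HANA like this. It is a Python illustration only; the pattern list and the name `parse_date` are assumptions. Unlike the like_regexpr patterns in the query, the patterns here must match the whole value:

```python
import re
from datetime import datetime

# Each entry pairs a shape check with the format used to parse that shape.
FORMATS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}"), "%Y-%m-%d"),
    (re.compile(r"\d{2}/\d{2}/\d{4}"), "%m/%d/%Y"),
]


def parse_date(text):
    """Return a date for the first format whose pattern matches, else None."""
    for pattern, fmt in FORMATS:
        if pattern.fullmatch(text):
            try:
                return datetime.strptime(text, fmt).date()
            except ValueError:
                # Right shape but not a valid date, e.g. month 23.
                return None
    return None


for raw in ["2018-01-23", "12/23/2016", "2018-23-01", "n/a"]:
    print(raw, "->", parse_date(raw))
```

Values that come back as None are the ones to review by hand. Note that a shape check cannot tell YYYY-DD-MM from YYYY-MM-DD when the day is 12 or less, so such records would need some other source of truth.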

Using a list of search patterns in LIKE or IN expression

The question: I have a list of sales quotations and many of them are not valid as they are simply in the system for practice or training. Usually the quotation name contains the word 'Test' or 'Dummy'. (In a couple of instances the quote_name contains 'Prova' - which happens to be Italian for 'Test').
Given that I cannot easily control the list of strings to search for, I decided to maintain the list in a second table - 'Terms to Search for'. A simple one column table with a list of terms ('Test', 'Prova', 'Dummy', ...).
In Amazon Redshift, I tried a simple CASE statement:
CASE WHEN UPPER(vx.quote_name) LIKE ('%' + UPPER(terms.term) + '%') THEN 'Y' ELSE 'N' END AS "Any DPS"
However, that seems to only get the first search term in the list.
Also, for the same quotation, which can have multiple rows due to multiple items being sold, I usually get one row set to 'Y' and the rest set to 'N'.
I modified the statement to:
---- #4a: get a list of the quotes whose quote_names match the patterns in the list
SELECT
vx.master_quote_number,
'Y' AS "Any DPS"
FROM t_quotes vx, any_dps_search_families terms
WHERE UPPER(vx.prod_fmly) IN ('%'+ UPPER(terms.term) +'%');
--- 4b: merge Any DPS results back in
select vx.*, dps."Any DPS"
from t_quotes vx
LEFT JOIN transform_data_4 dps ON (vx.master_quote_number = dps.master_quote_number)
But that isn't doing it either.
Environment: Amazon Redshift (which is mostly like Postgres). An answer to this in Postgres would be ideal. I can switch this clause to MySQL if needed but I'd rather not.
This is a case for lateral joins (untested):
SELECT vx.master_quote_number
FROM any_dps_search_families terms
CROSS JOIN LATERAL (SELECT master_quote_number
FROM t_quotes
WHERE UPPER(prod_fmly)
LIKE ('%' || UPPER(terms.term) || '%')
) vx;
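What the query has to express is "flag a quote if any term from the list occurs in it", which is a per-quote check over all terms rather than a row-by-row comparison. A small Python sketch of that logic, with made-up sample data and a hypothetical function name `flag_quotes`:

```python
def flag_quotes(quotes, terms):
    """Map each quote number to 'Y' if any term occurs in its name, else 'N'.

    The comparison is case-insensitive, like UPPER(...) LIKE '%TERM%'.
    """
    upper_terms = [t.upper() for t in terms]
    return {
        number: "Y" if any(t in name.upper() for t in upper_terms) else "N"
        for number, name in quotes
    }


# Sample rows are invented for illustration only.
quotes = [(1, "Test quote for training"), (2, "Acme renewal"), (3, "prova 2")]
terms = ["Test", "Prova", "Dummy"]
print(flag_quotes(quotes, terms))
```

Because the result is keyed by quote number, every row of the same quotation would receive the same flag when it is joined back.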

how to remove my stop words from a string column in postgresql

I have a table with a string column. I want to remove the stop words. I used this query, which seems OK:
SELECT to_tsvector('english',colName) from tblName order by colName asc;
But it does not update the column in the table.
I want to see the PostgreSQL stop words and what the query found. Then, if needed, I can replace the list with my own file. I also checked this address and could not find the stop words list file. Actually, the address does not exist.
$SHAREDIR/tsearch_data/english.stop
There is no function to do that.
You could use something like this (in this example in German):
SELECT array_to_string(tsvector_to_array(to_tsvector('Hallo, Bill und Susi!')), ' ');
array_to_string
-----------------
bill hallo susi
(1 row)
This removes stop words, but it also stems the words and drops non-words, and it does not preserve word order, so I doubt that the result will make you happy.
If that doesn't fit the bill, you can use regexp_replace like this:
SELECT regexp_replace('Bill and Susi, hand over or die!', '\y(and|or|if)\y', '', 'g');
regexp_replace
-----------------------------
Bill Susi, hand over die!
(1 row)
But that requires that you include your list of stop words in the query string. An improved version would store the stop words in a table.
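Keeping the stop words separate from the pattern could look roughly like this. The sketch is in Python rather than PostgreSQL, the list stands in for a stop-word table, and the name `remove_stop_words` is made up:

```python
import re

# Stand-in for a table of stop words.
STOP_WORDS = ["and", "or", "if"]


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Remove whole-word occurrences of the stop words, keeping word order."""
    # \b plays the role of \y in the PostgreSQL pattern: a word boundary,
    # so "and" is removed but the "and" inside "hand" is left alone.
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in stop_words) + r")\b"
    stripped = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Collapse the double spaces left behind by the removed words.
    return re.sub(r"\s+", " ", stripped).strip()


print(remove_stop_words("Bill and Susi, hand over or die!"))
```

Unlike the to_tsvector approach, this keeps case, punctuation and word order intact.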
The chosen answer did not match my requirement, but I found a solution for this:
SELECT regexp_replace('Bill and Susi, hand over or die!', '[^ ]*$','');
regexp_replace
-----------------------------
Bill and Susi, hand over or
(1 row)