MySQL 8.0 regexp_replace with parentheses (callback)

I have a table that currently contains a log of URLs used in API calls, for example:
https : // mysite.org/data/2.5/endpoint?lat=32.2049&lon=-95.8555
(The spaces are in that URL due to Stack Overflow's insistence.)
The challenge is now to separate these into api_server, api_endpoint and api_parameters. It's pretty easy to get the api_server and api_parms from this, since that's a simple removal of what doesn't belong from one end. But I'm having a challenge when it comes to the callback.
My experience with regexps (I'm by no means a newbie) has been that there's inconsistency with regard to the need for escaping parentheses for a callback, and in the case of MySQL, just how to do that escaping. I've tried several different ways of approaching this, my three most likely candidates being:
mysql> with t1 as (select 'https : // mysite.org/data/2.5/endpoint?lat=32.2049&lon=-95.8555' as url)
-> select regexp_replace(url, '[^/]*$', '') as api_server
-> , regexp_replace(url, '^[^?]*\\?', '?') as api_parms
-> , regexp_replace(url, '^[^/?]*/([:alnum:]*)\?.*$', '\\1') AS endpoint
-> FROM t1;
+---------------------------------+---------------------------+----------+
| api_server                      | api_parms                 | endpoint |
+---------------------------------+---------------------------+----------+
| https : // mysite.org/data/2.5/ | ?lat=32.2049&lon=-95.8555 | 1        |
+---------------------------------+---------------------------+----------+
1 row in set (0.14 sec)
mysql> with t1 as (select 'https : // mysite.org/data/2.5/endpoint?lat=32.2049&lon=-95.8555' as url)
-> select regexp_replace(url, '[^/]*$', '') as api_server
-> , regexp_replace(url, '^[^?]*\\?', '?') as api_parms
-> , regexp_replace(url, '^[^/?]*/\\([:alnum:]*\\)\?.*$', '\\1') AS endpoint
-> FROM t1;
+---------------------------------+---------------------------+------------------------------------------------------------------+
| api_server                      | api_parms                 | endpoint                                                         |
+---------------------------------+---------------------------+------------------------------------------------------------------+
| https : // mysite.org/data/2.5/ | ?lat=32.2049&lon=-95.8555 | https : // mysite.org/data/2.5/endpoint?lat=32.2049&lon=-95.8555 |
+---------------------------------+---------------------------+------------------------------------------------------------------+
1 row in set (0.03 sec)
mysql> with t1 as (select 'https : // mysite.org/data/2.5/endpoint?lat=32.2049&lon=-95.8555' as url)
-> select regexp_replace(url, '[^/]*$', '') as api_server
-> , regexp_replace(url, '^[^?]*\\?', '?') as api_parms
-> , regexp_replace(url, '^[^/?]*/\([:alnum:]*\)\?.*$', '\\1') AS endpoint
-> FROM t1;
+---------------------------------+---------------------------+----------+
| api_server                      | api_parms                 | endpoint |
+---------------------------------+---------------------------+----------+
| https : // mysite.org/data/2.5/ | ?lat=32.2049&lon=-95.8555 | 1        |
+---------------------------------+---------------------------+----------+
1 row in set (0.04 sec)
(The difference being in how (and whether) I escape the callback parentheses.)
The MySQL documentation on this, sadly, is extremely poor... it briefly mentions the parentheses in passing, saying that they have to be escaped with a double backslash, but as far as the callback itself goes, I found nothing.
Is it even possible to use a callback in regexp_replace? And if so, what's the trick I'm missing? And if not, why did they go to all the trouble of implementing and then documenting the parenthesis-escaping requirement?
Thanks for your consideration and time.
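For what it's worth, here is a sketch of what has worked for me on MySQL 8.0, which uses the ICU regex engine; treat it as an assumption-laden example rather than gospel. Three details matter: the grouping parentheses are left unescaped in the pattern, a POSIX class must be written with double brackets ([[:alnum:]]) because it is only valid inside a bracket expression, and the backreference in the replacement uses $1 (ICU replacement syntax) rather than \\1, which comes through as the literal 1 you saw in your first attempt:
with t1 as (select 'https://mysite.org/data/2.5/endpoint?lat=32.2049&lon=-95.8555' as url)
select regexp_replace(url, '^.*/([[:alnum:]]+)\\?.*$', '$1') as endpoint
from t1;
The greedy ^.*/ pushes the match to the last slash, so the capture should come back as just endpoint. (The URL is written here without the spaces that Stack Overflow forced into the question.)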

Querying jsonb field with #> through Postgrex adapter

I'm trying to query a jsonb field via the Postgrex adapter; however, I receive errors I cannot understand.
Notification schema
def all_for(user_id, external_id) do
  from(n in __MODULE__,
    where: n.to == ^user_id and fragment("? #> '{\"external_id\": ?}'", n.data, ^external_id)
  )
  |> order_by(desc: :id)
end
It generates the following SQL:
SELECT n0."id", n0."data", n0."to", n0."inserted_at", n0."updated_at" FROM "notifications"
AS n0 WHERE ((n0."to" = $1) AND n0."data" #> '{"external_id": $2}') ORDER BY n0."id" DESC
and then I receive the following error
↳ :erl_eval.do_apply/6, at: erl_eval.erl:680
** (Postgrex.Error) ERROR 22P02 (invalid_text_representation) invalid input syntax for type json. If you are trying to query a JSON field, the parameter may need to be interpolated. Instead of
p.json["field"] != "value"
do
p.json["field"] != ^"value"
query: SELECT n0."id", n0."data", n0."to", n0."inserted_at", n0."updated_at" FROM "notifications" AS n0 WHERE ((n0."to" = $1) AND n0."data" #> '{"external_id": $2}') ORDER BY n0."id" DESC
Token "$" is invalid.
(ecto_sql 3.9.1) lib/ecto/adapters/sql.ex:913: Ecto.Adapters.SQL.raise_sql_call_error/1
(ecto_sql 3.9.1) lib/ecto/adapters/sql.ex:828: Ecto.Adapters.SQL.execute/6
(ecto 3.9.2) lib/ecto/repo/queryable.ex:229: Ecto.Repo.Queryable.execute/4
(ecto 3.9.2) lib/ecto/repo/queryable.ex:19: Ecto.Repo.Queryable.all/3
However, if I just copy-paste the generated SQL into the psql console and run it, it succeeds.
SELECT n0."id", n0."data", n0."to", n0."inserted_at", n0."updated_at" FROM "notifications" AS n0 WHERE ((n0."to" = 233) AND n0."data" #> '{"external_id": 11}') ORDER BY n0."id" DESC
notifications-# ;
 id |        data         | to  |     inserted_at     |     updated_at
----+---------------------+-----+---------------------+---------------------
 90 | {"external_id": 11} | 233 | 2022-12-15 14:07:44 | 2022-12-15 14:07:44
(1 row)
data is a jsonb column:
 Column | Type  | Collation | Nullable |   Default
--------+-------+-----------+----------+-------------
 data   | jsonb |           |          | '{}'::jsonb
What am I missing in my Elixir notification query code?
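A likely explanation, reading the generated SQL: the $2 placeholder sits inside a single-quoted literal, so Postgres never treats it as a bind parameter; the characters $2 are parsed as part of the JSON text and rejected (Token "$" is invalid). A sketch of the distinction in plain SQL, with illustrative parameter numbers:
-- Broken: $2 is inside the quotes, so it is literal text, not a parameter
SELECT * FROM notifications n0 WHERE n0."data" #> '{"external_id": $2}';
-- Workable: bind the whole JSON document as one jsonb parameter,
-- e.g. $1 = '{"external_id": 11}', and test containment with @>
SELECT * FROM notifications n0 WHERE n0."data" @> $1;
Note that @> (containment) yields a boolean, which is what a WHERE clause wants, whereas #> extracts the value at a path.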
Searching for a solution, I came across only using a raw SQL statement, as I couldn't figure out what's wrong with my query when it gets passed through Postgrex.
So as a solution I found the following:
def all_for(user_id, external_ids) do
  {:ok, result} =
    Ecto.Adapters.SQL.query(
      Notifications.Repo,
      search_by_external_id_query(user_id, external_ids)
    )

  Enum.map(result.rows, &Map.new(Enum.zip(result.columns, &1)))
end

defp search_by_external_id_query(user_id, external_id) do
  """
  SELECT * FROM "notifications" AS n0 WHERE ((n0."to" = #{user_id})
  AND n0.data #> '{\"external_id\": #{external_id}}')
  ORDER BY n0."id" DESC
  """
end
But as a result I'm receiving a list of maps rather than Ecto.Schema structs, as I would have when using Ecto.Query through Postgrex, so be aware.

T-SQL: Highlight invoice numbers if they occur in a payment description field

I have two SQL Server tables: bills and payments. I am trying to create a VIEW to highlight the bill numbers if they occur in the payment description field. For example:
TABLE bll
| bllID | bllNum    |
| ----- | --------- |
| 1     | qwerty123 |
| 2     | qwerty345 |
| 3     | 1234      |
TABLE payments
| paymentID | description                        |
| --------- | ---------------------------------- |
| 1         | payment of qwerty123 and qwerty345 |
I want to highlight both the 'qwerty123' and 'qwerty345' strings by adding HTML code to them. The code I have is this:
SELECT REPLACE(payments.description,
COALESCE((SELECT TOP 1 bll.bllNum
FROM bll
WHERE COALESCE(bll.bllNum, '') <> '' AND
PATINDEX('%' + bll.bllNum + '%', payments.description) > 0), ''),
'<font color=red>' +
COALESCE((SELECT TOP 1 bll.bllNum
FROM bll
WHERE COALESCE(bll.bllNum, '') <> '' AND
PATINDEX('%' + bll.bllNum + '%', payments.description) > 0), '') +
'</font>')
FROM payments
This works, but only for the first occurrence of a bill number. If the description field contains more than one bill number, the subsequent bill numbers are not highlighted. So in my example 'qwerty123' gets highlighted, but not 'qwerty345'.
I need to highlight all occurrences. How can I accomplish this?
With the caveat that this is not a task best done in the database, one possible approach you could try is to use string_split to break your description into words and then join this to your Bills, doing your string manipulation on matching rows.
Note: according to the documentation, string_split is not 100% guaranteed to retain the correct ordering, but it always has in my usage. It could always be substituted with an alternative function that works on the same principle.
select string_agg(w.word, ' ') as [Description]
from (
    select case
               when exists (select * from bll b where b.bllNum = s.value)
               then concat('<font color="red">', s.value, '</font>')
               else s.value
           end as word
    from payments p
    cross apply string_split(p.description, ' ') s
) w
Example DB Fiddle
Okay, I understand. I can put code in the front-end application by looping through the bill numbers and replacing them as they are found in the description. I just thought/hoped there was a simple solution to this using T-SQL, but I understand the difficulty.
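For completeness, one more T-SQL pattern that avoids string_split's whitespace assumption: fold REPLACE over the bill numbers with a multi-row variable assignment. This is only a sketch against the question's bll and payments tables; SQL Server does not guarantee the row order of such an assignment, and it assumes no bill number is a substring of another bill number or of the inserted markup.
DECLARE @desc nvarchar(max) =
    (SELECT description FROM payments WHERE paymentID = 1);
-- Each non-empty bill number is folded into @desc via REPLACE
-- (a no-op for bill numbers that do not occur in the text)
SELECT @desc = REPLACE(@desc, b.bllNum,
                       '<font color="red">' + b.bllNum + '</font>')
FROM bll b
WHERE COALESCE(b.bllNum, '') <> '';
SELECT @desc AS highlighted_description;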

How to write that T-SQL request (conversion problem)?

I have 2 tables:
Table Contract:
IdContract | Department | IdCompany
-----------+------------+-----------------
       154 |          6 | /1150/1420/879/
And table Company:
IdCompany | WorkingIdDepartment
----------+--------------------
     1150 | /17/8/26/
For some reason some companies are wrongly assigned to some contracts: they are on contracts for department 6, but they work only in departments 17, 8 and 26. I need to find those companies.
I tried to query my database this way:
SELECT *
FROM tbl_contract ct, tbl_compagny cp
WHERE ct.IdCompany like CONCAT('%/1150/%')
AND ct.Department NOT IN (17,8,26)
Or in a dynamic way:
SELECT *
FROM tbl_contract ct, tbl_compagny cp
WHERE ct.IdCompany like CONCAT('%/' , cp.IdCompany , '/%')
AND ct.Department NOT IN (Right(Left(Replace(WorkingIdDepartment, '/', ','), len(Replace(WorkingIdDepartment, '/', ',')) - 1), len(Left(Replace(WorkingIdDepartment, '/', ','), len(Replace(WorkingIdDepartment, '/', ',')) - 1))-1))
But my problem is that the last part of my request gives me a single string, '17,8,26', and not several ints separated by commas, so I get a conversion error (I don't copy/paste my error here because it's in French).
How can I achieve this, please?
Thanks for your help!
The solution was:
SELECT *
FROM tbl_contract ct, tbl_compagny cp
WHERE ct.IdCompany like CONCAT('%/' , cp.IdCompany , '/%')
AND ct.Department NOT IN (SELECT value from STRING_SPLIT(WorkingIdDepartment, '/'))
It's very "performance consuming", but it works with my database's poor design.
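A slightly tightened sketch of the same idea, keeping the question's table and column names: filtering out the empty tokens produced by the leading and trailing / and casting explicitly means the int Department column is never compared against '':
SELECT *
FROM tbl_contract ct
JOIN tbl_compagny cp
  ON ct.IdCompany LIKE CONCAT('%/', cp.IdCompany, '/%')
WHERE ct.Department NOT IN (
    SELECT CAST(s.value AS int)
    FROM STRING_SPLIT(cp.WorkingIdDepartment, '/') AS s
    WHERE s.value <> ''
);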

How to return a function result into a query?

I have a function called ClientStatus that returns a record with two fields Status_Description and Status_Date. This function receives a parameter Client_Id.
I'm trying to get the calculated client status for all the clients in the table Clients, something like:
| Client_Name | Status_Description | Status_Date |
+-------------+--------------------+-------------+
| Abc         | Active             | 12-12-2010  |
| Def         | Inactive           | 13-12-2011  |
Where Client_Name comes from the table Clients, Status_Description and Status_Date from the function result.
My first (wrong) approach was to join the table and the function like so:
SELECT c.Client_Name, cs.Status_Description, cs.Status_Date FROM Clients c
LEFT JOIN (
SELECT * FROM ClientStatus(c.ClientId) as (Status_Description text, Status_Date date)) cs
This obviously didn't work because c.ClientId could not be referenced.
Could someone explain to me how I can obtain the result I am looking for?
Thanks in advance.
I think the following can give the result you expect:
SELECT c.Client_Name, d.Status_Description, d.Status_Date
FROM Clients c, ClientStatus(c.ClientId) d
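Assuming this is PostgreSQL (the column definition list and ::date cast elsewhere in the question suggest so): this works because a set-returning function in the FROM list may refer to columns of earlier FROM items; it is treated as an implicit LATERAL join. Spelled out explicitly, which also keeps clients for which the function returns no row:
SELECT c.Client_Name, d.Status_Description, d.Status_Date
FROM Clients c
LEFT JOIN LATERAL ClientStatus(c.ClientId) AS d ON true;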
I have solved my problem by writing the query like this:
SELECT cs.Client_Name, cs.status[1] AS Description, cs.status[2]::date AS Date
FROM (
    SELECT Client_Name,
           string_to_array(translate(ClientStatus(ClientId)::text, '()', ''), ',') AS status
    FROM Clients
) cs
It is not the most elegant solution but it was the only one I could find to make this work.

EXISTS(select 1 from t1) vs EXISTS(select * from t1) [duplicate]

I used to write my EXISTS checks like this:
IF EXISTS (SELECT * FROM TABLE WHERE Columns = @Filters)
BEGIN
    UPDATE TABLE SET ColumnsX = ValuesX WHERE Columns = @Filters
END
One of the DBAs in a previous life told me that when I do an EXISTS clause, I should use SELECT 1 instead of SELECT *:
IF EXISTS (SELECT 1 FROM TABLE WHERE Columns = @Filters)
BEGIN
    UPDATE TABLE SET ColumnsX = ValuesX WHERE Columns = @Filters
END
Does this really make a difference?
No, SQL Server is smart and knows it is being used for an EXISTS, and returns NO DATA to the system.
Quoth Microsoft:
http://technet.microsoft.com/en-us/library/ms189259.aspx?ppud=4
The select list of a subquery introduced by EXISTS almost always consists of an asterisk (*). There is no reason to list column names because you are just testing whether rows that meet the conditions specified in the subquery exist.
To check yourself, try running the following:
SELECT whatever
FROM yourtable
WHERE EXISTS ( SELECT 1/0
               FROM someothertable
               WHERE a_valid_clause )
If it was actually doing something with the SELECT list, it would throw a div by zero error. It doesn't.
EDIT: Note, the SQL Standard actually talks about this.
ANSI SQL 1992 Standard, pg 191 http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
3) Case: a) If the <select list> "*" is simply contained in a <subquery> that is immediately contained in an <exists predicate>, then the <select list> is equivalent to a <value expression> that is an arbitrary <literal>.
The reason for this misconception is presumably because of the belief that it will end up reading all columns. It is easy to see that this is not the case.
CREATE TABLE T
(
X INT PRIMARY KEY,
Y INT,
Z CHAR(8000)
)
CREATE NONCLUSTERED INDEX NarrowIndex ON T(Y)
IF EXISTS (SELECT * FROM T)
PRINT 'Y'
Gives this plan (an execution-plan image in the original answer).
This shows that SQL Server was able to use the narrowest index available to check the result despite the fact that the index does not include all columns. The index access is under a semi join operator which means that it can stop scanning as soon as the first row is returned.
So it is clear the above belief is wrong.
However Conor Cunningham from the Query Optimiser team explains here that he typically uses SELECT 1 in this case as it can make a minor performance difference in the compilation of the query.
The QP will take and expand all *'s early in the pipeline and bind them to objects (in this case, the list of columns). It will then remove unneeded columns due to the nature of the query.
So for a simple EXISTS subquery like this:
SELECT col1 FROM MyTable WHERE EXISTS (SELECT * FROM Table2 WHERE MyTable.col1=Table2.col2)
The * will be expanded to some potentially big column list and then it will be determined that the semantics of the EXISTS does not require any of those columns, so basically all of them can be removed.
"SELECT 1" will avoid having to examine any unneeded metadata for that table during query compilation.
However, at runtime the two forms of the query will be identical and will have identical runtimes.
I tested four possible ways of expressing this query on an empty table with various numbers of columns: SELECT 1 vs SELECT * vs SELECT Primary_Key vs SELECT Other_Not_Null_Column.
I ran the queries in a loop using OPTION (RECOMPILE) and measured the average number of executions per second. Results below:
+-------------+----------+---------+---------+--------------+
| Num of Cols |    *     |    1    |   PK    | Not Null col |
+-------------+----------+---------+---------+--------------+
|           2 |   2043.5 | 2043.25 |  2073.5 |       2067.5 |
|           4 |  2038.75 | 2041.25 |  2067.5 |       2067.5 |
|           8 |  2015.75 |    2017 | 2059.75 |         2059 |
|          16 |  2005.75 | 2005.25 | 2025.25 |      2035.75 |
|          32 |  1963.25 | 1967.25 | 2001.25 |      1992.75 |
|          64 |     1903 |    1904 | 1936.25 |      1939.75 |
|         128 |  1778.75 | 1779.75 |    1799 |      1806.75 |
|         256 |  1530.75 |  1526.5 | 1542.75 |      1541.25 |
|         512 |     1195 | 1189.75 | 1203.75 |       1198.5 |
|        1024 |   694.75 |     697 |     699 |       699.25 |
+-------------+----------+---------+---------+--------------+
|       Total | 17169.25 |   17171 |   17408 |        17408 |
+-------------+----------+---------+---------+--------------+
As can be seen there is no consistent winner between SELECT 1 and SELECT * and the difference between the two approaches is negligible. The SELECT Not Null col and SELECT PK do appear slightly faster though.
All four of the queries degrade in performance as the number of columns in the table increases.
As the table is empty, this relationship seems explicable only by the amount of column metadata. For COUNT(1) it is easy to see from the below that it gets rewritten to COUNT(*) at some point in the process.
SET SHOWPLAN_TEXT ON;
GO
SELECT COUNT(1)
FROM master..spt_values
Which gives the following plan
  |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1004],0)))
       |--Stream Aggregate(DEFINE:([Expr1004]=Count(*)))
            |--Index Scan(OBJECT:([master].[dbo].[spt_values].[ix2_spt_values_nu_nc]))
Attaching a debugger to the SQL Server process and randomly breaking whilst executing the below
DECLARE @V int
WHILE (1=1)
SELECT @V=1 WHERE EXISTS (SELECT 1 FROM ##T) OPTION(RECOMPILE)
I found that in the cases where the table has 1,024 columns, most of the time the call stack looks something like the below, indicating that it is indeed spending a large proportion of the time loading column metadata even when SELECT 1 is used (for the case where the table has 1 column, randomly breaking didn't hit this bit of the call stack in 10 attempts).
sqlservr.exe!CMEDAccess::GetProxyBaseIntnl() - 0x1e2c79 bytes
sqlservr.exe!CMEDProxyRelation::GetColumn() + 0x57 bytes
sqlservr.exe!CAlgTableMetadata::LoadColumns() + 0x256 bytes
sqlservr.exe!CAlgTableMetadata::Bind() + 0x15c bytes
sqlservr.exe!CRelOp_Get::BindTree() + 0x98 bytes
sqlservr.exe!COptExpr::BindTree() + 0x58 bytes
sqlservr.exe!CRelOp_FromList::BindTree() + 0x5c bytes
sqlservr.exe!COptExpr::BindTree() + 0x58 bytes
sqlservr.exe!CRelOp_QuerySpec::BindTree() + 0xbe bytes
sqlservr.exe!COptExpr::BindTree() + 0x58 bytes
sqlservr.exe!CScaOp_Exists::BindScalarTree() + 0x72 bytes
... Lines omitted ...
msvcr80.dll!_threadstartex(void * ptd=0x0031d888) Line 326 + 0x5 bytes C
kernel32.dll!_BaseThreadStart@8() + 0x37 bytes
This manual profiling attempt is backed up by the VS 2012 code profiler which shows a very different selection of functions consuming the compilation time for the two cases (Top 15 Functions 1024 columns vs Top 15 Functions 1 column).
Both the SELECT 1 and SELECT * versions wind up checking column permissions and fail if the user is not granted access to all columns in the table.
An example I cribbed from a conversation on the heap
CREATE USER blat WITHOUT LOGIN;
GO
CREATE TABLE dbo.T
(
X INT PRIMARY KEY,
Y INT,
Z CHAR(8000)
)
GO
GRANT SELECT ON dbo.T TO blat;
DENY SELECT ON dbo.T(Z) TO blat;
GO
EXECUTE AS USER = 'blat';
GO
SELECT 1
WHERE EXISTS (SELECT 1
FROM T);
/* ↑↑↑↑
Fails unexpectedly with
The SELECT permission was denied on the column 'Z' of the
object 'T', database 'tempdb', schema 'dbo'.*/
GO
REVERT;
DROP USER blat
DROP TABLE T
So one might speculate that the minor apparent difference when using SELECT some_not_null_col is that it only winds up checking permissions on that specific column (though still loads the metadata for all). However this doesn't seem to fit with the facts as the percentage difference between the two approaches if anything gets smaller as the number of columns in the underlying table increases.
In any event I won't be rushing out and changing all my queries to this form as the difference is very minor and only apparent during query compilation. Removing the OPTION (RECOMPILE) so that subsequent executions can use a cached plan gave the following.
+-------------+-----------+------------+-----------+--------------+
| Num of Cols |     *     |     1      |    PK     | Not Null col |
+-------------+-----------+------------+-----------+--------------+
|           2 | 144933.25 |     145292 | 146029.25 |     143973.5 |
|           4 |    146084 |   146633.5 | 146018.75 |    146581.25 |
|           8 | 143145.25 |  144393.25 |  145723.5 |    144790.25 |
|          16 | 145191.75 |     145174 |  144755.5 |    146666.75 |
|          32 |    144624 |  145483.75 |    143531 |    145366.25 |
|          64 | 145459.25 |  146175.75 | 147174.25 |     146622.5 |
|         128 | 145625.75 |  143823.25 |    144132 |    144739.25 |
|         256 | 145380.75 |     147224 | 146203.25 |    147078.75 |
|         512 |    146045 |  145609.25 | 145149.25 |     144335.5 |
|        1024 |    148280 |     148076 | 145593.25 |    146534.75 |
+-------------+-----------+------------+-----------+--------------+
|       Total |   1454769 | 1457884.75 |   1454310 |   1456688.75 |
+-------------+-----------+------------+-----------+--------------+
The test script I used can be found here
Best way to know is to performance test both versions and check out the execution plan for both versions. Pick a table with lots of columns.
There is no difference in SQL Server and it has never been a problem in SQL Server. The optimizer knows that they are the same. If you look at the execution plans, you will see that they are identical.
Personally I find it very, very hard to believe that they don't optimize to the same query plan. But the only way to know in your particular situation is to test it. If you do, please report back!
No real difference, but there might be a very small performance hit. As a rule of thumb, you should not ask for more data than you need.