Extract string with multiple criteria - sql - postgresql

We have a column of varying string values like below. How to extract part of the string and return the desired result in Redshift?
Example
Remove last part that starts with an underscore and number (_1_MN, number can be 1-1000)
Remove leading part (Ed_)
Replace any remaining underscore with a space
String:
Ed_Westside Ind School District 94_Williams Elementary School_1_MN
Desired result:
Westside Ind School District 94 Williams Elementary School

MySQL
UPDATE products
SET col = SUBSTRING(col FROM 3)
WHERE col LIKE ('Ed_%')
UPDATE products
SET col = SUBSTRING(col FROM -5 )
WHERE col LIKE ('%_1_MN')
UPDATE products
SET col = REPLACE(col, '_', ' ')

While not very elegant, this worked.
REGEXP_REPLACE(LEFT(RIGHT(name, LEN(name) - 3), LEN(name) -8) , '_', ' ')

Related

How to use inticap and leave last part in capital letter in postgresql 11

I have a table with address all in capital but I am trying to format it like this:
145 WELLBEING STREET, PARADE, LONDON, PP1 5PP
145 Wellbeing Street, Parade, London, PP1 5PP
How can i work with that? I have tried, converting all with initcap:
UPDATE address_table
SET field_address = initcap(field_address)
And then it pick the last piece and convert it in Uppercase:
UPDATE address_table
SET field_address = upper(regexp_replace(field_address , '^.*,', ''))
So i thought in concatenate but still i have the last piece in initcap...
I have also tried with :
SELECT field_address || upper(trim(reverse(split_part(reverse(field_address ), ',', 1)))) FROM address_table
but the results i got are like i.e.
Southridge, Newbury Hill, Hampstead Norreys, Thatcham, Rg18 0trRG18 0TR
How can i delete the last part after the last comma to convert it in capital?
From Postgres 14 :
UPDATE address_table
SET field_address = initcap(replace(field_address, split_part(field_address, ',', -1), '')) || split_part(field_address, ',', -1)
From Postgres 10 :
UPDATE address_table
SET field_address = initcap(regexp_replace(field_address, '[^,]+$', '')) || (regexp_match(field_address, '[^,]+$'))[1]
see test result in dbfiddle.

Postgres: Retrieve first n words from column

I know that I can do a text search in Postgres with TextSearch and get some result with
select ts_headline('german',content, tq, 'MaxFragments=4, MinWords=5, MaxWords=12,
ShortWord=3, StartSel = <strong>, StopSel = </strong>') as highlight, ...
FROM to_tsquery('german', 'test') tq ...
Is there a similar way to apply to content the same limitations? i.e. to get directly up to 12 words from the column content.
You could use regular expressions:
SELECT (regexp_match(
regexp_replace(content, '[^\w\s]+', ' ', 'g'),
'^\s*((?:\w+\s+){9}\w+)'
))[1] FROM ...
That will first replace everything that is not a space or alphanumerical character with a space and then return the first 10 words.

PostgreSQL return last n words

How to return last n words using Postgres.
I have tried using LEFT method.
SELECT DISTINCT LEFT(name, -4) FROM my_table;
but it return last 4 characters ,i want to return last 3 words.
demo:db<>fiddle
You can do this using a the SUBSTRING() function and regular expressions:
SELECT
SUBSTRING(name FROM '((\S+\s+){0,3}\S+$)')
FROM my_table
This has been explained here: How can I match the last two words in a sentence in PostgreSQL?
\S+ is a string of non-whitespace characters
\s+ is a string of whitespace characters (e.g. one space)
(\S+\s+){0,3} Zero to three words separated by a space
\S+$ one word at the end of the text.
-> creates 4 words (or less if there are no more).
One way is to use regexp_split_to_array() to split the string into the words it contains and then put a string back together using the last 3 words in that array.
SELECT coalesce(w.words[array_length(w.words, 1) - 2] || ' ', '')
|| coalesce(w.words[array_length(w.words, 1) - 1] || ' ', '')
|| coalesce(w.words[array_length(w.words, 1)], '')
FROM mytable t
CROSS JOIN LATERAL (SELECT regexp_split_to_array(t."name", ' ') words) w;
db<>fiddle
RIGHT() should do
SELECT RIGHT('MYCOLUMN', 4); -- returns LUMN
UPD
You can convert to array and then back to string
SELECT array_to_string(sentence[(array_length(sentence,1)-3):(array_length(sentence,1))],' ','*')
FROM
(
SELECT regexp_split_to_array('this is the one of the way to get the last four words of the string', E'\\s+') AS sentence
) foo;
DEMO HERE

Count previous occurences of a value split by date ranges

Here's a simple query we do for ad hoc requests from our Marketing department on the leads we received in the last 90 days.
SELECT ID
,FIRST_NAME
,LAST_NAME
,ADDRESS_1
,ADDRESS_2
,CITY
,STATE
,ZIP
,HOME_PHONE
,MOBILE_PHONE
,EMAIL_ADDRESS
,ROW_ADDED_DTM
FROM WEB_LEADS
WHERE ROW_ADDED_DTM BETWEEN #START AND #END
They are asking for more derived columns to be added that show the number of previous occurences of ADDRESS_1 where the EMAIL_ADDRESS matches. But they want is for different date ranges.
So the derived columns would look like this:
,COUNT_ADDRESS_1_LAST_1_DAYS,
,COUNT_ADDRESS_1_LAST_7_DAYS
,COUNT_ADDRESS_1_LAST_14_DAYS
etc.
I've manually filled these derived columns using update statements when there was just a few. The above query is really just a sample of a much larger query with many more columns. The actual request has blossomed into 6 date ranges for 13 columns. I'm asking if there's a better way then using 78 additional update statements.
I think you will have a hard time writing a query that includes all of these 78 metrics per e-mail address without actually creating a query that hard-codes the different choices. However you can generate such a pivot query with dynamic SQL, which will save you some keystrokes and will adjust dynamically as you add more columns to the table.
The result you want to end up with will look something like this (but of course you won't want to type it):
;WITH y AS
(
SELECT
EMAIL_ADDRESS,
/* aggregation portion */
[ADDRESS_1] = COUNT(DISTINCT [ADDRESS_1]),
[ADDRESS_2] = COUNT(DISTINCT [ADDRESS_2]),
... other columns
/* end agg portion */
FROM dbo.WEB_LEADS AS wl
WHERE ROW_ADDED_DTM >= /* one of 6 past dates */
GROUP BY wl.EMAIL_ADDRESS
)
SELECT EMAIL_ADDRESS,
/* pivot portion */
COUNT_ADDRESS_1_LAST_1_DAYS = *count address 1 from 1 day ago*,
COUNT_ADDRESS_1_LAST_7_DAYS = *count address 1 from 7 days ago*,
... other date ranges ...
COUNT_ADDRESS_2_LAST_1_DAYS = *count address 2 from 1 day ago*,
COUNT_ADDRESS_2_LAST_7_DAYS = *count address 2 from 7 days ago*,
... other date ranges ...
... repeat for 11 more columns ...
/* end pivot portion */
FROM y
GROUP BY EMAIL_ADDRESS
ORDER BY EMAIL_ADDRESS;
This is a little involved, and it should all be run as one script, but I'm going to break it up into chunks to intersperse comments on how the above portions are populated without typing them. (And before long #Bluefeet will probably come along with a much better PIVOT alternative.) I'll enclose my interspersed comments in /* */ so that you can still copy the bulk of this answer into Management Studio and run it with the comments intact.
Code/comments to copy follows:
/*
First, let's build a table of dates that can be used both to derive labels for pivoting and to assist with aggregation. I've added the three ranges you've mentioned and guessed at a fourth, but hopefully it is clear how to add more:
*/
DECLARE #d DATE = SYSDATETIME();
CREATE TABLE #L(label NVARCHAR(15), d DATE);
INSERT #L(label, d) VALUES
(N'LAST_1_DAYS', DATEADD(DAY, -1, #d)),
(N'LAST_7_DAYS', DATEADD(DAY, -8, #d)),
(N'LAST_14_DAYS', DATEADD(DAY, -15, #d)),
(N'LAST_MONTH', DATEADD(MONTH, -1, #d));
/*
Next, let's build the portions of the query that are repeated per column name. First, the aggregation portion is just in the format col = COUNT(DISTINCT col). We're going to go to the catalog views to dynamically derive the list of column names (except ID, EMAIL_ADDRESS and ROW_ADDED_DTM) and stuff them into a #temp table for re-use.
*/
SELECT name INTO #N FROM sys.columns
WHERE [object_id] = OBJECT_ID(N'dbo.WEB_LEADS')
AND name NOT IN (N'ID', N'EMAIL_ADDRESS', N'ROW_ADDED_DTM');
DECLARE #agg NVARCHAR(MAX) = N'', #piv NVARCHAR(MAX) = N'';
SELECT #agg += ',
' + QUOTENAME(name) + ' = COUNT(DISTINCT '
+ QUOTENAME(name) + ')' FROM #N;
PRINT #agg;
/*
Next we'll build the "pivot" portion (even though I am angling for the poor man's pivot - a bunch of CASE expressions). For each column name we need a conditional against each range, so we can accomplish this by cross joining the list of column names against our labels table. (And we'll use this exact technique again in the query later to make the /* one of past 6 dates */ portion work.
*/
SELECT #piv += ',
COUNT_' + n.name + '_' + l.label
+ ' = MAX(CASE WHEN label = N''' + l.label
+ ''' THEN ' + QUOTENAME(n.name) + ' END)'
FROM #N as n CROSS JOIN #L AS l;
PRINT #piv;
/*
Now, with those two portions populated as we'd like them, we can build a dynamic SQL statement that fills out the rest:
*/
DECLARE #sql NVARCHAR(MAX) = N';WITH y AS
(
SELECT
EMAIL_ADDRESS, l.label' + #agg + '
FROM dbo.WEB_LEADS AS wl
CROSS JOIN #L AS l
WHERE wl.ROW_ADDED_DTM >= l.d
GROUP BY wl.EMAIL_ADDRESS, l.label
)
SELECT EMAIL_ADDRESS' + #piv + '
FROM y
GROUP BY EMAIL_ADDRESS
ORDER BY EMAIL_ADDRESS;';
PRINT #sql;
EXEC sp_executesql #sql;
GO
DROP TABLE #N, #L;
/*
Now again, this is a pretty complex piece of code, and perhaps it can be made easier with PIVOT. But I think even #Bluefeet will write a version of PIVOT that uses dynamic SQL because there is just way too much to hard-code here IMHO.
*/

Dynamic pivot - how to obtain column titles parametrically?

I wish to write a Query for SAP B1 (t-sql) that will list all Income and Expenses Items by total and month by month.
I have successfully written a Query using PIVOT, but I do not want the column headings to be hardcoded like: Jan-11, Feb-11, Mar-11 ... Dec-11.
Rather I want the column headings to be parametrically generated, so that if I input:
--------------------------------------
Query - Selection Criteria
--------------------------------------
Posting Date greater or equal 01.09.10
Posting Date smaller or equal 31.08.11
[OK] [Cancel]
the Query will generate the following columns:
Sep-10, Oct-10, Nov-10, ..... Aug-11
I guess DYNAMIC PIVOT can do the trick.
So, I modified one SQL obtained from another forum to suit my purpose, but it does not work. The error message I get is Incorrect Syntax near 20100901.
Could anybody help me locate my error?
Note: In SAP B1, '[%1]' is an input variable
Here's my query:
/*Section 1*/
DECLARE #listCol VARCHAR(2000)
DECLARE #query VARCHAR(4000)
-------------------------------------
/*Section 2*/
SELECT #listCol =
STUFF(
( SELECT DISTINCT '],[' + CONVERT(VARCHAR, MONTH(T0.RefDate), 102)
FROM JDT1
FOR XML PATH(''))
, 1, 2, '') + ']'
------------------------------------
/*Section 3*/
SET #query = '
SELECT * FROM
(
SELECT
T0.Account,
T1.GroupMask,
T1.AcctName,
MONTH(T0.RefDate) as [Month],
(T0.Debit - T0.Credit) as [Amount]
FROM dbo.JDT1 T0
JOIN dbo.OACT T1 ON T0.Account = T1.AcctCode
WHERE
T1.GroupMask IN (4,5,6,7) AND
T0.[Refdate] >= '[%1]' AND
T0.[Refdate] <= '[%2]'
) S
PIVOT
(
Sum(Amount)
FOR [Month] IN ('+#listCol+')
) AS pvt
'
--------------------------------------------
/*Section 4*/
EXECUTE (#query)
I don't know SAP, but a couple of things spring to mind:
It looks like you want #listCol to contain a collection of numbers within square brackets, for example [07],[08],[09].... However, your code appears not to put a [ at the start of this string.
Try replacing the lines
T0.[Refdate] >= '[%1]' AND
T0.[Refdate] <= '[%2]'
with
T0.[Refdate] >= ''[%1]'' AND
T0.[Refdate] <= ''[%2]''
(I also added a space before the AND in the first of these two lines while I was editing your question.)