Extract multiple substrings from one record - PostgreSQL

I am a novice with Postgres queries. I am trying to pull a substring from each record of a column based on a specific pattern.
Suppose I want the substring of each record between the keywords 'start' and 'end'. There can be multiple occurrences of 'start' and 'end' in one record, and I need to extract what occurs between each pair of 'start' and 'end' keywords.
Is it possible to achieve this with a single query in Postgres, rather than creating a procedure? If yes, could you please help me with this or redirect me to where I can find related information?

Assuming that / always delimits the elements, you can use string_to_array() to convert the string into multiple elements and unnest() to turn the array into rows. You can then use regexp_replace() to get rid of the curly-brace delimiters:
select d.id, regexp_replace(t.name, '{start}|{end}', '', 'g')
from the_table d
cross join unnest(string_to_array(d.body, '/')) as t(name);
SQLFiddle example: http://sqlfiddle.com/#!15/9eecb7db59d16c80417c72d1e1f4fbf1/8863
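For reference, a self-contained sketch of the same approach with inline sample data standing in for the_table (the values are only illustrative):
with the_table(id, body) as (values
  (1, '{start}John{end}/{start}Jack{end}'),
  (2, '{start}David{end}'))
select d.id, regexp_replace(t.name, '{start}|{end}', '', 'g') as name
from the_table d
cross join unnest(string_to_array(d.body, '/')) as t(name);
-- returns (1, 'John'), (1, 'Jack'), (2, 'David')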

You can achieve all this using regular expressions and the PostgreSQL regex functions regexp_matches() (to match the content between your tags) and regexp_replace() (to remove the tags):
with t(id, body) as (values
  (1, '{start}John{end}/{start}Jack{end}'),
  (2, '{start}David{end}'),
  (3, '{start}Ken{end}/{start}Kane{end}/{start}John{end}'))
select id, regexp_replace(
         (regexp_matches(body, '{start}.*?{end}', 'g'))[1],
         '^{start}|{end}$', '', 'g') as matches
from t;

Related

Can I add apostrophes to the values I get in PostgreSQL?

I have a query that gives me a list of IDs, for example:
61728_0be80d3c-029d-4d06-ae75-37f72fdeacaf
61784_4e1b2b79-1190-4e65-91cc-07552e28b522
61864_f0a58134-a1d5-40f6-ada1-d12b7e991675
61928_3a5a70b1-9350-4acf-99e4-e858f14a6d98
62048_a489f752-ae51-4919-b720-1b6e15235a3e
62112_3a8289e9-c5e6-4aae-8c8a-431cc5ca9415
62176_95fbfdc9-88e3-4918-ac19-6b54f3205af4
62296_2f6fbd6b-9af4-4d6c-85e8-07ba64326669
62688_71c3ee51-0f5c-4f8e-8026-8b90a335795e
62776_e93d9f1d-272f-4161-80eb-5de90a026829
How can I make this query give me all these IDs as a single aggregated string, so I can add them to a WHERE clause and filter another query down to only these IDs?
example:
'61728_0be80d3c-029d-4d06-ae75-37f72fdeacaf','61784_4e1b2b79-1190-4e65-91cc-07552e28b522',
'61864_f0a58134-a1d5-40f6-ada1-d12b7e991675' etc
in order to put it inside of:
where XXX IN ('61728_0be80d3c-029d-4d06-ae75-37f72fdeacaf','61784_4e1b2b79-1190-4e65-91cc-07552e28b522','61864_f0a58134-a1d5-40f6-ada1-d12b7e991675')
Is there any way to do this automatically in SQL (or in Excel if not)?
I tried string_agg("personId", ','), which adds the commas between the values, but I can't add an apostrophe at the beginning and end of each personId.
You can concatenate single quotes to the ID inside string_agg():
string_agg(concat('''', "personId", ''''), ',')
or a bit simpler:
string_agg(quote_literal("personId"), ',')
If those IDs are the result of a query, then you can also use it directly:
where xxx in (select "personId" from ...)
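Put together, a minimal sketch (the source table name person_ids is only illustrative):
-- builds one quoted, comma-separated string ready to paste into an IN (...) clause
select string_agg(quote_literal("personId"), ',') as agg_string
from person_ids;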

How to split a string in TSQL by space character

I have a difficult task in TSQL that I can't seem to find a simple way to do. I am trying to use CROSS APPLY STRING_SPLIT(sentence, ' '), but that only gives me one word per row, and I need the words grouped in pairs as shown below. Can you please help? Thank you.
Sample sentences:
I need to split strings using TSQL.
This approach is traditional, and is supported in all versions and editions of SQL Server.
Desired answer:
I need
to split
strings using
TSQL.
Desired answer:
This approach
is traditional
, and
is supported
in all
versions and
editions of
SQL Server.
Here you go:
First add a space before each comma (so that a comma is treated as a word of its own), then split the string on each space into rows using a bit of JSON, then assign groups pairing consecutive rows using modulo and LAG() OVER(), and finally aggregate based on those groups:
declare @s varchar(100) = 'This approach is traditional, and is supported in all versions and editions of SQL Server';

select Result = String_Agg(string, ' ') within group (order by seq)
from (
    select j.[value] as string,
           Iif(j.[key] % 2 = 1, Lag(seq) over (order by seq), seq) as gp,
           seq
    from OpenJson(Concat('["', Replace(Replace(@s, ',', ' ,'), ' ', '","'), '"]')) j
    cross apply (values (Convert(tinyint, j.[key]))) x(seq)
) x
group by gp;
Result: the word pairs shown in the desired answer above (see the demo fiddle).

Does PostgreSQL have a function like LISTAGG(column_name [, delimiter] ON OVERFLOW TRUNCATE)?

It seems "ON OVERFLOW TRUNCATE" feature is not available in Postgresql which goes with LISTAGG in Oracle. Is there a alternate function or workaround to it?
I believe array_agg / array_to_string is the Pg equivalent to Listagg:
select array_to_string(array[1,2,3,4,5,6,7,8,9,10], '-');
Array_agg is the aggregate that creates the array from your result set.
And if you want to truncate an array, you can use the upper bound of a slice (the [lower:upper] syntax). For example, to keep only the first six elements, truncating anything past the sixth in your terminology:
my_array[:6]
Combining these two, I believe you can do what you seek. For example, if you have a field you want to concatenate with a delimiter, but you only want the first 10 elements:
select array_to_string((array_agg(my_field))[:10], '-')
from my_table;
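Putting the two together, here is a rough sketch of LISTAGG(... ON OVERFLOW TRUNCATE)-like behavior: keep the first 10 elements and append a truncation marker when there are more (the table, column, and the '...' marker are illustrative):
select array_to_string((array_agg(my_field))[:10], ',')
       || case when count(*) > 10 then ',...' else '' end as agg_list
from my_table;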

PostgreSQL query on a text column ignoring special characters

I have a table which contains a text column, say vehicle number.
Now I want to query the table for fields which contain a particular vehicle number.
While matching I do not want to consider non-alphanumeric characters.
example: query condition - DEL123
should match - DEL-123, DEL/123, DEL#123, etc...
If you know which characters to skip, put them as the second parameter of this translate() call (which is faster than regexp functions):
select *
from a_table
where translate(code, '-/#', '') = 'DEL123';
Otherwise, you can compare only the alphanumeric characters using regexp_replace():
select *
from a_table
where regexp_replace(code, '[^[:alnum:]]', '', 'g') = 'DEL123';
#klin's answer is great, but it is not sargable, so in cases where you're searching through millions of records (maybe not your case, but perhaps someone else with a similar question is looking for answers), using a regular expression will likely yield much better results.
The following can use an index on code, significantly reducing the number of rows tested:
select *
from a_table
where code ~ '^DEL[^[:alnum:]]*123$';
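Note that for the planner to actually use a B-tree index with this left-anchored pattern, the index generally needs the text_pattern_ops operator class (or a C-collation column); a sketch, reusing the hypothetical table and column from above:
-- lets the planner use the extracted 'DEL' prefix of the anchored regex for an index scan
create index a_table_code_pattern_idx on a_table (code text_pattern_ops);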

How can I sort (order by) in postgres ignoring leading words like "the, a, etc"

I would like to be able to sort (order by) in postgres ignoring leading words like "the, a, etc"
One way: script (using your favorite language) the creation of an extra column containing the text with the noise words removed, and sort on that.
Add a SORT_NAME column that has all that stuff stripped out. For bonus points, use an input trigger to populate it automatically, using your favorite SQL dialect's regex parser or similar.
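A rough sketch of that trigger idea in PostgreSQL (the table, column, and function names, and the article list, are all hypothetical):
create or replace function set_sort_name() returns trigger as $$
begin
  -- strip a leading article before storing the sort key
  new.sort_name := regexp_replace(new.some_col, '^(the|a|an)\s+', '', 'i');
  return new;
end;
$$ language plpgsql;

create trigger trg_set_sort_name
  before insert or update on some_table
  for each row execute function set_sort_name();  -- "execute function" needs PostgreSQL 11+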
Try splitting the column and sorting on its second word:
select some_col from some_table order by split_part(some_col, ' ', 2);
No need to add an extra column. Strip out the leading words in your ORDER BY:
SELECT col FROM my_table ORDER BY REPLACE(REPLACE(col, 'A ', ''), 'The ', '');
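A variant of the same idea that only strips an article at the start of the string and ignores case (the article list is just an example):
select col
from my_table
order by regexp_replace(col, '^(the|a|an)\s+', '', 'i');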