How can I sort (ORDER BY) in Postgres ignoring leading words like "the", "a", etc.?

I would like to be able to sort (ORDER BY) in Postgres while ignoring leading words like "the", "a", etc.

One way: script (using your favorite language) the creation of an extra column that holds the text with the noise words removed, and sort on that.

Add a SORT_NAME column that has all of those leading words stripped out. For bonus points, use an insert trigger to populate it automatically, using your favorite SQL dialect's regex functions or similar.
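
As a rough sketch of the trigger idea in PostgreSQL: the table titles, its columns, and the function name titles_set_sort_name are all made up for illustration, and the list of articles (the, an, a) is an assumption you would adjust to your data.

```sql
-- Hypothetical table: name holds the display text, sort_name the stripped copy.
CREATE TABLE titles (
    id        serial PRIMARY KEY,
    name      text NOT NULL,
    sort_name text
);

-- Trigger function: remove one leading article (case-insensitive) before the row is stored.
CREATE FUNCTION titles_set_sort_name() RETURNS trigger AS $$
BEGIN
    NEW.sort_name := regexp_replace(NEW.name, '^(the|an|a) +', '', 'i');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER titles_sort_name
BEFORE INSERT OR UPDATE ON titles
FOR EACH ROW EXECUTE PROCEDURE titles_set_sort_name();

INSERT INTO titles (name) VALUES ('The Matrix'), ('A Beautiful Mind'), ('Alien');

-- Sorts as: Alien, A Beautiful Mind, The Matrix
SELECT name FROM titles ORDER BY sort_name;
```

Because sort_name is an ordinary column, it can also be indexed if the table is large.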

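Try splitting the column and sorting on the second item in the resulting array. Note that this sorts every row by its second word, so it only gives the intended order when every value starts with a word you want to skip:
select some_col from some_table order by split_part(some_col, ' ', 2);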

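No need to add an extra column. Strip out the leading words in your ORDER BY. Nested REPLACE calls would remove 'A ' and 'The ' anywhere in the value (and are case-sensitive), so anchor a case-insensitive regular expression to the start of the string instead:
SELECT col FROM some_table ORDER BY regexp_replace(col, '^(the|an|a) +', '', 'i');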

Related

Can I add Apostrophe to numbers I get in PostgreSQL?

I have a Query that gives me a list of numbers, for example:
61728_0be80d3c-029d-4d06-ae75-37f72fdeacaf
61784_4e1b2b79-1190-4e65-91cc-07552e28b522
61864_f0a58134-a1d5-40f6-ada1-d12b7e991675
61928_3a5a70b1-9350-4acf-99e4-e858f14a6d98
62048_a489f752-ae51-4919-b720-1b6e15235a3e
62112_3a8289e9-c5e6-4aae-8c8a-431cc5ca9415
62176_95fbfdc9-88e3-4918-ac19-6b54f3205af4
62296_2f6fbd6b-9af4-4d6c-85e8-07ba64326669
62688_71c3ee51-0f5c-4f8e-8026-8b90a335795e
62776_e93d9f1d-272f-4161-80eb-5de90a026829
How can I make this query return all of these values as a single string (using string_agg) so that I can add them to a WHERE clause and filter a different query down to only these values?
example:
'61728_0be80d3c-029d-4d06-ae75-37f72fdeacaf','61784_4e1b2b79-1190-4e65-91cc-07552e28b522',
'61864_f0a58134-a1d5-40f6-ada1-d12b7e991675' etc
in order to put it inside of:
where XXX IN ('61728_0be80d3c-029d-4d06-ae75-37f72fdeacaf','61784_4e1b2b79-1190-4e65-91cc-07552e28b522','61864_f0a58134-a1d5-40f6-ada1-d12b7e991675')
Is there a way to do this automatically in SQL, or in Excel if not?
I tried string_agg("personId", ','), which puts commas between the values, but I can't add an apostrophe at the beginning and end of each personId.
You can concatenate single quotes to the ID inside the string_agg()
string_agg(concat('''', "personId", ''''), ',')
or a bit simpler:
string_agg(quote_literal("personId"), ',')
If those IDs are the result of a query, then you can also use it directly:
where xxx in (select "personId" from ...)
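
For a quick check of the quote_literal() variant, here is a minimal self-contained sketch. The CTE name ids and its two sample rows are placeholders standing in for your real query, and the ORDER BY inside string_agg() is only there to make the output order predictable.

```sql
-- Stand-in for the real query that returns the IDs.
WITH ids("personId") AS (
    VALUES ('61728_0be80d3c-029d-4d06-ae75-37f72fdeacaf'),
           ('61784_4e1b2b79-1190-4e65-91cc-07552e28b522')
)
-- Wrap each ID in single quotes, then join them with commas.
SELECT string_agg(quote_literal("personId"), ',' ORDER BY "personId") AS id_list
FROM ids;
-- id_list: '61728_0be80d3c-029d-4d06-ae75-37f72fdeacaf','61784_4e1b2b79-1190-4e65-91cc-07552e28b522'
```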

How to split a string in TSQL by space character

I have a task in TSQL that I can't seem to find a simple way to do: splitting a sentence into pairs of words. I am trying to use CROSS APPLY STRING_SPLIT(sentence, ' '), but that only gives me one word per row. Can you please help? Thank you.
Sample sentence:
I need to split strings using TSQL.
This approach is traditional, and is supported in all versions and editions of SQL Server.
Desired answer:
I need
to split
strings using
TSQL.
Desired Answer:
This approach
is traditional
, and
is supported
in all
versions and
editions of
SQL Server.
Here you go:
First add a space before each comma (so that a comma is treated as a word), then split the string on each space into rows using OPENJSON, then assign a group to each pair of rows using modulo and LAG() OVER(), then aggregate based on the groups:
declare @s varchar(100)='This approach is traditional, and is supported in all versions and editions of SQL Server';
select Result = String_Agg(string,' ') within group (order by seq)
from (
  select j.[value] string, Iif(j.[key] % 2 = 1, Lag(seq) over(order by seq), seq) gp, seq
  from OpenJson(Concat('["',replace(Replace(@s,',',' ,'), ' ', '","'), '"]')) j
  cross apply(values(Convert(tinyint,j.[key])))x(seq)
)x
group by gp;

Subtract multiple strings from one record

I am a novice with Postgres queries. I am trying to pull substrings out of each record of a column based on a specific set of keywords.
Suppose I want the substring between the keywords 'start' and 'end' in each record. There can be multiple occurrences of 'start' and 'end' in one record, and I need to extract what occurs between each pair of 'start' and 'end' keywords.
Is it possible to achieve this with a single query in Postgres, rather than creating a procedure? If yes, could you please help with this or point me to where I can find related information?
Assuming that / always delimits the elements, you can use string_to_array() to convert the string into multiple elements and unnest() to turn the array into a result. You can then use regexp_replace() to get rid of the delimiters in the curly braces:
select d.id, regexp_replace(t.name, '{start}|{end}', '', 'g')
from the_table d
cross join unnest(string_to_array(d.body,'/')) as t(name);
SQLFiddle example: http://sqlfiddle.com/#!15/9eecb7db59d16c80417c72d1e1f4fbf1/8863
You can achieve all this using regular expressions and the PostgreSQL regex functions regexp_matches (to match the content between your tags) and regexp_replace (to remove the tags):
with t(id,body) as (values
(1, '{start}John{end}/{start}Jack{end}'),
(2, '{start}David{end}'),
(3, '{start}Ken{end}/{start}Kane{end}/{start}John{end}'))
select id, regexp_replace(
(regexp_matches(body, '{start}.*?{end}', 'g'))[1],
'^{start}|{end}$', '', 'g') matches
from t
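
A possible variation, sketched under the assumption that you are on PostgreSQL 9.3 or later (for LATERAL): put the part you want in a capture group, so that regexp_matches() returns only the text between the tags and no second regexp_replace() pass is needed. The sample rows are the same illustrative values as above.

```sql
WITH t(id, body) AS (VALUES
    (1, '{start}John{end}/{start}Jack{end}'),
    (2, '{start}David{end}'),
    (3, '{start}Ken{end}/{start}Kane{end}/{start}John{end}'))
-- With the 'g' flag, regexp_matches() returns one text[] row per match;
-- element 1 of each array is the first capture group.
SELECT t.id, m[1] AS match
FROM t
CROSS JOIN LATERAL regexp_matches(t.body, '{start}(.*?){end}', 'g') AS m;
```

This should return one row per name: two rows for id 1, one for id 2, and three for id 3.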

Trimming parts of a word but each word is different size

I have a table with values like this:
book;65
book;1000
table;66
restaurant;1202
park;2
park;44444
Is there a way, using Postgres SQL, to remove the semicolon and everything after it, regardless of the length of the word before it?
I plan on doing a query that goes something like this after I figure this out:
select col1, modified_col_1
from table_1
--modified is without the semi-colon and everything after
You can use substring() and strpos() for this:
select col1, substring(col1, 1, strpos(col1, ';') - 1) as modified_col_1
from table_1
The above will give an error if there are values without a ; because strpos() then returns 0 and the substring length becomes negative.
Another option would be to split the string into an array and then just pick the first element:
select (string_to_array(col1, ';'))[1]
from table_1
This will also work if no ; is present
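
Another option along the same lines is split_part(), which returns the requested field directly and also leaves values without a ; unchanged. A small sketch, where the CTE simply stands in for your table_1 and the third row is an invented value added to show the no-delimiter case:

```sql
WITH table_1(col1) AS (
    VALUES ('book;65'), ('park;44444'), ('no_delimiter')
)
-- Field 1 is everything before the first ';' (or the whole string if there is none).
SELECT col1, split_part(col1, ';', 1) AS modified_col_1
FROM table_1;
-- modified_col_1: book, park, no_delimiter
```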

T-SQL: Find column match within a string (LIKE but different)

Server: SQL Server 2008 R2
I apologize in advance, as I'm not sure of the best way to verbalize the question. I'm receiving a string of email addresses and I need to see if, within that string, any of the addresses exist as a user already. The query that obviously doesn't work is shown below, but hopefully it helps to clarify what I'm looking for:
SELECT f_emailaddress
FROM tb_users
WHERE f_emailaddress LIKE '%user1@domain.com,user2@domain.com%'
I was hoping SQL had an "InString" operator that would check for matches "within the string", but my Google abilities must be weak today.
Any assistance is greatly appreciated. If there simply isn't a way, I'll have to dig in and do some work in the codebehind to split each item in the string and search on each one.
Thanks in advance,
Beems
Split the input string and use an IN clause.
To split the CSV into rows, use this:
SELECT Ltrim(Rtrim(( Split.a.value('.', 'VARCHAR(100)') )))
FROM (SELECT Cast ('<M>'
+ Replace('user1@domain.com,user2@domain.com', ',', '</M><M>')
+ '</M>' AS XML) AS Data) AS A
CROSS APPLY Data.nodes ('/M') AS Split(a)
Now use the above query in where clause.
SELECT f_emailaddress
FROM tb_users
WHERE f_emailaddress IN(SELECT Ltrim(Rtrim(( Split.a.value('.', 'VARCHAR(100)') )))
FROM (SELECT Cast ('<M>'
+ Replace('user1@domain.com,user2@domain.com', ',', '</M><M>')
+ '</M>' AS XML) AS Data) AS A
CROSS APPLY Data.nodes ('/M') AS Split(a))
Or you can use an inner join. The derived column needs an alias so that the join condition can reference it:
SELECT A.f_emailaddress
FROM tb_users A
JOIN (SELECT Ltrim(Rtrim(( Split.a.value('.', 'VARCHAR(100)') ))) AS f_emailaddress
FROM (SELECT Cast ('<M>'
+ Replace('user1@domain.com,user2@domain.com', ',', '</M><M>')
+ '</M>' AS XML) AS Data) AS A
CROSS APPLY Data.nodes ('/M') AS Split(a)) B
ON A.f_emailaddress = B.f_emailaddress
You first need to split the CSV list into a temp table and then use that to INNER JOIN with your existing table, as that will act as a filter.
You cannot use CONTAINS unless you have created a Full Text index on that table and column, which I doubt is the case here.
For example:
CREATE TABLE #EmailAddresses (Email NVARCHAR(500) NOT NULL);
INSERT INTO #EmailAddresses (Email)
SELECT split.Val
FROM dbo.Splitter(@IncomingListOfEmailAddresses) split;
SELECT usr.f_emailaddress
FROM tb_users usr
INNER JOIN #EmailAddresses tmp
ON tmp.Email = usr.f_emailaddress;
Please note that the reference to "dbo.Splitter" is a placeholder for whatever string splitter you already have or might get. Please do not use any splitter that makes use of a WHILE loop. The best options are either the SQLCLR- or XML- based ones. The XML-based ones are generally fast but do have some issues with encoding if the string to be split has special XML characters such as &, <, or ". If you want a quick and easy SQLCLR-based splitter, you can download the Free version of the SQL# library (which I am the creator of, but this feature is in the free version) which contains String_Split and String_Split4k (for when the input is always <= 4000 characters).
SQL Server has a CONTAINS predicate and an IN operator. You can use either of those to accomplish your task. Hope this helps.
CONTAINS
CONTAINS looks for rows whose full-text indexed column contains the search term you provide. It is loosely similar in presentation to LIKE '%myValue%', but it matches whole words or phrases and requires a full-text index on the column:
SELECT f_emailaddress
FROM tb_users
WHERE CONTAINS (f_emailaddress, 'user1@domain.com');
IN
IN will return matches for any values in the provided comma delimited list. They need to be exact matches however. You can't provide partial terms.
SELECT f_emailaddress
FROM tb_users
WHERE f_emailaddress IN ('user1@domain.com','user2@domain.com')
As far as splitting each of the values out into separate strings, have a look at the existing StackOverflow questions on splitting delimited strings in T-SQL. They might point you in the proper direction.
You can try something like this (not tested).
Before using this, make sure that you have created a Full Text index on that table and column.
Replace your commas with AND, then:
SELECT id,email
FROM t
where CONTAINS(email, 'user1@domain.com and user2@domain.com');
--prepare a table variable for testing
DECLARE @tb_users AS TABLE
(f_emailaddress VARCHAR(100))
INSERT INTO @tb_users
( f_emailaddress)
VALUES ( 'user1@domain.com' ),
( 'user2@domain.com' ),
( 'user3@domain.com' ),
( 'user4@domain.com' )
--Your query
SELECT f_emailaddress
FROM @tb_users
WHERE 'user1@domain.com,user2@domain.com' LIKE '%' + f_emailaddress + '%'
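
One caveat with the reversed LIKE is that it matches substrings, so a stored address such as ser1@domain.com would also be reported as found. A hedged variation is to wrap both sides in the delimiter and use CHARINDEX instead of LIKE, which also avoids characters such as _ in an address being treated as wildcards. The variable name @list and the extra test row are made up for this sketch.

```sql
-- Table variable with test data, including an address that is only a
-- substring of one in the incoming list.
DECLARE @tb_users AS TABLE (f_emailaddress VARCHAR(100));
INSERT INTO @tb_users (f_emailaddress)
VALUES ('user1@domain.com'),
       ('user2@domain.com'),
       ('user3@domain.com'),
       ('ser1@domain.com');

DECLARE @list VARCHAR(MAX) = 'user1@domain.com,user2@domain.com';

-- Surround the list and each address with commas so only whole entries match.
SELECT f_emailaddress
FROM @tb_users
WHERE CHARINDEX(',' + f_emailaddress + ',', ',' + @list + ',') > 0;
-- Returns user1@domain.com and user2@domain.com only.
```

This assumes the incoming list has no spaces around the commas; if it might, strip them from @list first.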