How do I convert a string to sentence case in PostgreSQL?

With the caveat that this is not a silver bullet, I'm submitting this question together with my own answer, since after a reasonable effort I found no existing solution for PostgreSQL.
The task in my case is to convert a collection of mostly all-uppercase sentences into a reasonable facsimile of a paragraph, capitalizing only the first letter of each sentence. If that solution is out there, either I'm blind or it is well hidden.
So, for example, how do I convert this:
THE NAME IS BOND. JAMES BOND. 007. AND THIS IS ONE COOL PIECE OF CODE.
into this?
The name is bond. James bond. 007. And this is one cool piece of code.

Here is what I came up with. I'd be glad to award the answer to a better solution!
-- 1. Lower-case the whole input and split it into sentences on ". ".
WITH arrays_source AS (
    SELECT regexp_split_to_array(
               LOWER('THE NAME IS BOND. JAMES BOND. 007. AND THIS IS ONE COOL PIECE OF CODE.'),
               '\. ') AS arrays
),
-- 2. One row per sentence, with surrounding whitespace trimmed.
single_sentences_source AS (
    SELECT TRIM(UNNEST(arrays)) AS single_sentences
    FROM arrays_source
),
-- 3. Upper-case the first character of each non-empty sentence.
fixed_sentences_source AS (
    SELECT UPPER(SUBSTRING(single_sentences, 1, 1))
           || SUBSTRING(single_sentences, 2, LENGTH(single_sentences) - 1) AS fixed_sentence
    FROM single_sentences_source
    WHERE single_sentences <> ''
)
-- 4. Re-join the sentences with the same ". " separator.
SELECT ARRAY_TO_STRING(ARRAY(SELECT fixed_sentence FROM fixed_sentences_source), '. ');
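For checking the logic outside the database, here is a rough Python sketch of the same approach: lower-case, split on ". ", capitalize each sentence, re-join. It uses only the standard re module; Python's regex flavor is not PostgreSQL's, and the function name sentence_case is hypothetical, so treat it as an illustration rather than a drop-in replacement.

```python
import re


def sentence_case(text):
    """Hypothetical helper mirroring the SQL above: lower-case, split, capitalize, re-join."""
    # Split the lower-cased text into sentences on a period followed by a space.
    sentences = [s.strip() for s in re.split(r"\. ", text.lower())]
    # Upper-case the first character of each non-empty sentence.
    fixed = [s[:1].upper() + s[1:] for s in sentences if s]
    # Re-join with the same ". " separator that the split consumed.
    return ". ".join(fixed)


print(sentence_case("THE NAME IS BOND. JAMES BOND. 007. AND THIS IS ONE COOL PIECE OF CODE."))
# The name is bond. James bond. 007. And this is one cool piece of code.
```

Like the SQL, it only treats ". " as a sentence boundary, so sentences ending in "!" or "?" are not split.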

Related

How to extract an uppercase word from a string in Postgresql only if entire word is in capital letters

I am trying to extract words from a column in a table only if the entire word is in uppercase letters (I am trying to find all acronyms in a column). I tried using the following code, but it gives me all capital letters in a string even if it is just the first letter of a word. Appreciate any help you can provide.
SELECT title, REGEXP_REPLACE(title, '[^A-Z]+', '', 'g') AS acronym
FROM table;
Here is my desired output:
title              | acronym
-------------------+---------
I will leave ASAP  | ASAP
David James is LOL | LOL
BTW I went home    | BTW
Please RSVP today  | RSVP
You could use this regular expression:
regexp_replace(title, '.*?\m([[:upper:]]+)\M.*', '\1 ', 'g')
.*? matches arbitrary characters in a non-greedy fashion
\m matches the beginning of a word
( is the beginning of the part we are interested in
[[:upper:]]+ matches one or more uppercase characters
) is the end of the part we are interested in
\M matches the end of a word
.* matches the rest of the string
\1 references the part delimited by the parentheses
You could try the REGEXP_MATCHES function:
WITH data AS
(SELECT 'I will leave ASAP' AS title
UNION ALL SELECT 'David James is LOL' AS title
UNION ALL SELECT 'BTW I went home' AS title
UNION ALL SELECT 'Please RSVP today' AS title)
SELECT title, REGEXP_MATCHES(title, '[A-Z][A-Z]+', 'g') AS acronym
FROM data;
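For comparison, a small Python sketch of the whole-word rule the question asks for. Python's re has no [[:upper:]] class and uses \b where PostgreSQL uses \m and \M; the {2,} minimum (added here as an assumption, to skip the pronoun "I") is not in either answer, and the name acronyms is hypothetical.

```python
import re

# Hypothetical pattern: two or more capitals forming a whole word.
# \b stands in for PostgreSQL's \m (start of word) and \M (end of word);
# {2,} skips single-letter words such as "I".
ACRONYM = re.compile(r"\b[A-Z]{2,}\b")


def acronyms(title):
    """Hypothetical helper: return every all-uppercase word of two or more letters."""
    return ACRONYM.findall(title)


for title in ["I will leave ASAP", "David James is LOL", "BTW I went home", "Please RSVP today"]:
    print(title, "->", acronyms(title))
```

Unlike a bare [A-Z][A-Z]+, the word boundaries stop it from matching the "AB" inside a mixed-case word such as "ABc".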

Getting the "title" between the names

The names I'm working with are formatted like this:
King, Mr. Jay Thomas
Smith, Miss. Jane
How do I get the middle title part only using Postgres?
I'm a noob so this is definitely wrong:
SELECT position('%#," #"%#' for '#') AS TITLE
FROM titanic;
You could use SUBSTRING with the regex pattern \w+\. (a word immediately followed by a period):
SELECT SUBSTRING(title from '\w+\.')
FROM titanic;
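The same pattern in Python, as a quick way to see what \w+\. picks out. Like PostgreSQL's SUBSTRING(... from pattern) when the pattern has no parentheses, re.search returns the first whole match; the helper name extract_title is hypothetical.

```python
import re

# A word immediately followed by a period, e.g. "Mr." or "Miss."
TITLE = re.compile(r"\w+\.")


def extract_title(name):
    """Hypothetical helper: return the first "word." in the name, or None."""
    match = TITLE.search(name)
    return match.group(0) if match else None


print(extract_title("King, Mr. Jay Thomas"))  # Mr.
print(extract_title("Smith, Miss. Jane"))     # Miss.
```

"King," is skipped because the word ends in a comma, not a period.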

How to change case of characters identified via pattern matching in PostgreSQL

I have several PostgreSQL tables with "comment" columns (data type = text) for which I am trying to standardize the use of upper and lowercase. Specifically, I'd like to change the case of comment strings from all-caps to capitalization of only the first character in each sentence (there are typically 1-3 sentences per comment). I standardized the number of spaces between sentences (to 1) with
update table
set comment = regexp_replace(comment, '( ){2,}',' ','g');
and set all characters in each string except the first to lower case with
update table
set comment = upper(left(comment, 1)) || lower(right(comment, -1));
Now, how do I change the case of the first character after each period to uppercase? I can select the relevant characters with
select regexp_matches('Testing. this. using. some. text.', '([.]\s\S)', 'g');
but haven't been able to figure out how to capitalize them. Also, I'm sure there is a more integrated way to carry out these steps, but this is my noob-ish attempt.
The following worked in my situation, in which comments are made up of one or more sentences and sentences are separated by a period and single space:
with source as (
select regexp_split_to_table('hello. world', '\.\s') sentence
)
select array_to_string(
array(
select upper(left(sentence, 1)) || right(sentence, -1) sentence from source
), '. '
) modified_comment;
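The question's original idea, upper-casing the character matched after each period, is awkward in PostgreSQL because regexp_replace can only insert back-references such as \1 into the replacement, not change their case. Where a regex replace accepts a callback, it can be done in one pass. A Python sketch of all three steps from the question, with a hypothetical function name:

```python
import re


def capitalize_sentences(comment):
    """Hypothetical helper combining the question's three steps."""
    # Step 1: collapse runs of two or more spaces into one.
    comment = re.sub(r" {2,}", " ", comment)
    # Step 2: upper-case the first character and lower-case the rest.
    comment = comment[:1].upper() + comment[1:].lower()
    # Step 3: upper-case the first non-space character after each period plus whitespace.
    return re.sub(r"(\.\s)(\S)", lambda m: m.group(1) + m.group(2).upper(), comment)


print(capitalize_sentences("TESTING.  THIS. USING. SOME. TEXT."))
# Testing. This. Using. Some. Text.
```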

In DB2 SQL RegEx, how can a conditional replacement be done without CASE WHEN ... END?

I have a DB2 v7r3 SQL SELECT statement with three instances of REGEXP_SUBSTR(), all with the same regex pattern string, each of which extracts one of three groups.
I'd like to change the first REGEXP_SUBSTR() to REGEXP_REPLACE() so that it does a conditional replacement when there's no match, inserting a default value much like the ELSE branch of a CASE...END. But I can't make it work. I could easily use a CASE, but RegEx seems more compact and efficient.
For example, I have descriptions of food container sizes, in various states of completeness:
12X125
6X350
1X1500
1500ML
1000
The last two don't have the 'nnX' part at the beginning, in which case '1X' is assumed and needs to be inserted.
This is my current working pattern string:
^(?:(\d{1,3})(?:X))?((?:\d{1,4})(?:\.\d{1,3})?)(L|ML|PK|Z|)$
The groups returned are: quantity, size, and unit.
But only the first group needs the conditional replacement:
(?:(\d{1,3})(?:X))?
This RexEgg webpage describes the (?=...) operator, and it seems to be what I need, but I'm not sure. It's in the list of operators for my version of DB2, but I can't make it work. Frankly, it's a bit deeper than my regex knowledge, and I can't even make it work in my favorite online regex tester, Regex101.
So, does anyone have any ideas or suggestions? Thanks.
Try this (it replaces leading digits that are not followed by an X or another digit):
with t(s) as (values
'12X125'
, '6X350'
, '1X1500'
, '1500'
, '1125'
)
select regexp_replace(s, '^([\d]+(?![X\d]))', '1X\1')
from t;
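To experiment with the lookahead outside DB2, the same pattern behaves the same way in Python's re module, which also supports (?!...). This is a sketch showing what the pattern does, not DB2 code; the helper name add_default_quantity is hypothetical.

```python
import re

# Leading digits that are NOT followed by "X" or another digit, i.e. no quantity prefix.
MISSING_QUANTITY = re.compile(r"^(\d+)(?![X\d])")


def add_default_quantity(size):
    """Hypothetical helper: prefix "1X" when the quantity part is missing."""
    return MISSING_QUANTITY.sub(r"1X\1", size)


for size in ["12X125", "6X350", "1X1500", "1500ML", "1000"]:
    print(size, "->", add_default_quantity(size))
```

The (?![X\d]) part is what makes the replacement conditional: in "12X125" the regex can only grab "12" (followed by X) or "1" (followed by a digit), so the lookahead rejects both and the string is left unchanged.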

Extracting words from a string in postgres SQL

I am brand new to Postgres and am trying to extract a value from a string, using a combination of regexp_substr and replace.
Updated example: I have a string "When Harry met Sally" (following the pattern "When X met Y"). I'd like to extract the word Harry, which is X.
I am trying the syntax:
regexp_substr(REPLACE('When Harry met Sally', 'When ', ''), ' met [^.]*');
but am receiving the error message:
ERROR: function regexp_substr(text, unknown) does not exist
Hint: No function matches the given name and argument types. You might need to add explicit type casts.
Can anyone help? I imagine that this is child's play for some pro out there.
Before using a function, check the documentation for its syntax, or whether it exists at all.
regexp_substr does not exist in PostgreSQL before version 15 (it was added in PostgreSQL 15). What are you trying to do?
I'd like to extract the word Harry.
By which criteria? The second word? The X in "When X met Y"?
Without more info, it's impossible to answer.
postgres=# select regexp_replace('When Harry met Sally',
'When (.*) met.*',
'\1' );
regexp_replace
----------------
Harry
(1 row)
This is just an example. It could be wrong if, for example, you want to support other kinds of whitespace, match case-insensitively, or allow some content before 'When', or...
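The same capture-group idea as a Python sketch, handy for experimenting with the pattern. One difference worth noting: PostgreSQL's regexp_replace returns the input unchanged when nothing matches, whereas this hypothetical extract_x helper returns None.

```python
import re

# Capture whatever sits between "When " and " met" (the X in "When X met Y").
WHEN_X_MET_Y = re.compile(r"When (.*) met")


def extract_x(sentence):
    """Hypothetical helper: return X from "When X met Y", or None if there is no match."""
    match = WHEN_X_MET_Y.search(sentence)
    return match.group(1) if match else None


print(extract_x("When Harry met Sally"))  # Harry
```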