How to remove empty words from my SQL string_to_array? - postgresql

I have a text column in my Postgres table and I want to remove any non-alphabetical characters and split it by a space (so I can use each word to do a search later).
I can remove the characters and split it successfully however my rows have empty results. I cannot have any empty words:
SELECT
asset.id,
string_to_array(TRIM(regexp_replace(asset.title, '[^a-zA-Z ]', '', 'g')), ' ')
FROM
assets asset
eg.
Hello world! becomes {Hello,world}
but also Some Result V1.0 - 3D Model becomes {Some,Result,V,,D,Model}
How do I filter my array to remove empty words?

You can transform array to table, then clean the content, and transform result to array again:
CREATE OR REPLACE FUNCTION public.remove_empty_string(text[])
RETURNS text[]
LANGUAGE sql
AS $function$
select array(select v from unnest($1) g(v) where v <> '')
$function$;
(2022-06-20 05:17:57) postgres=# select remove_empty_string(array['Some','Result','V','','D','Model']);
┌─────────────────────────┐
│   remove_empty_string   │
╞═════════════════════════╡
│ {Some,Result,V,D,Model} │
└─────────────────────────┘
(1 row)
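For example, plugging the helper into the original query from the question might look like this:
SELECT
asset.id,
remove_empty_string(string_to_array(TRIM(regexp_replace(asset.title, '[^a-zA-Z ]', '', 'g')), ' '))
FROM
assets asset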

Try using a multicharacter regular expression
Try with this:
SELECT
asset.id,
string_to_array(TRIM(regexp_replace(asset.title, '[^a-zA-Z]+', ' ', 'g')), ' ')
FROM
assets asset
I removed the space from the character class so the regex captures every non-alphabetic character, including spaces. The + then matches one or more consecutive occurrences, which ensures that any run of non-alphabetic characters is replaced with a single space. Finally, since you already apply TRIM, your split on a single space works without producing empty elements.
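As a quick sanity check, the regex can be tried directly against the sample title from the question (inlined here as a literal):
SELECT string_to_array(TRIM(regexp_replace('Some Result V1.0 - 3D Model', '[^a-zA-Z]+', ' ', 'g')), ' ');
-- {Some,Result,V,D,Model}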

Related

Concatenation of jsonb elements in PostgreSQL with comma separation

I would like to design a query that combines two jsonb values with an unknown number/set of keys in PostgreSQL in a controlled manner. The jsonb operator || almost exactly fits my purpose, but for one of the keys I would like to concatenate the two values with a comma separator, rather than having the second jsonb's value override the first's. For example:
'{"a":"foo", "b":"one", "special":"comma"}'::jsonb || '{"a":"bar", "special":"separated"}'::jsonb → '{"a":"bar", "b":"one", "special":"comma,separated"}'
My current query is the following:
INSERT INTO table AS t (col1, col2, col3_jsonb)
VALUES ("first", "second", '["a":"bar", "special":"separated"]'::jsonb))
ON CONFLICT ON CONSTRAINT unique_entries DO UPDATE
SET col3_jsonb = excluded.col3_jsonb || t.col3_jsonb
RETURNING id;
which results in a jsonb value for col3_jsonb whose special key is set to separated rather than the desired comma,separated. I understand that this is the concatenation operator working as documented, but I am not sure how to approach treating one element of the jsonb differently, other than perhaps trying to pull out the special values with WITH clauses elsewhere in the query. Any insights or tips would be hugely appreciated!
with t(a,b) as (values(
'{"a":"foo", "b":"one", "special":"comma"}'::jsonb,
'{"a":"bar", "special":"separated"}'::jsonb))
select
a || jsonb_set(b, '{special}',
to_jsonb(concat_ws(',', nullif(a->>'special', ''), nullif(b->>'special', ''))))
from t;
┌────────────────────────────────────────────────────────┐
│                        ?column?                        │
├────────────────────────────────────────────────────────┤
│ {"a": "bar", "b": "one", "special": "comma,separated"} │
└────────────────────────────────────────────────────────┘
The nullif() and concat_ws() functions are needed for cases where one or both "special" values are missing, null, or empty.
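If you want to fold that back into the question's upsert, a sketch could look like the following (my_table stands in for the question's placeholder table name; which side of || wins for the remaining keys is up to you):
-- my_table and unique_entries are the placeholder names from the question
INSERT INTO my_table AS t (col1, col2, col3_jsonb)
VALUES ('first', 'second', '{"a":"bar", "special":"separated"}'::jsonb)
ON CONFLICT ON CONSTRAINT unique_entries DO UPDATE
SET col3_jsonb = t.col3_jsonb || jsonb_set(excluded.col3_jsonb, '{special}',
    to_jsonb(concat_ws(',', nullif(t.col3_jsonb->>'special', ''), nullif(excluded.col3_jsonb->>'special', ''))))
RETURNING id;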
You can use jsonb_each on the two values followed by jsonb_object_agg to put them back into an object:
…
SET col3_jsonb = (
SELECT jsonb_object_agg(
key,
COALESCE(to_jsonb((old.value->>0) || ',' || (new.value->>0)), new.value, old.value)
)
FROM jsonb_each(example.old_obj) old
FULL OUTER JOIN jsonb_each(example.new_obj) new USING (key)
)
(online demo)
Casting the arbitrary JSON value to a string that can be concatenated requires a trick (the ->>0 extraction above). If you know that all your object properties have string values, you can simplify by using jsonb_each_text instead:
SELECT jsonb_object_agg(
key,
COALESCE(old.value || ',' || new.value, new.value, old.value)
)
FROM jsonb_each_text(example.old_obj) old
FULL OUTER JOIN jsonb_each_text(example.new_obj) new USING (key)
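To try that simplified form without the demo table, the two objects can be inlined as literals (a trimmed-down pair is used here; note that this variant comma-joins every key present on both sides, not just "special"):
SELECT jsonb_object_agg(
key,
COALESCE(old.value || ',' || new.value, new.value, old.value)
)
FROM jsonb_each_text('{"b":"one", "special":"comma"}'::jsonb) old
FULL OUTER JOIN jsonb_each_text('{"special":"separated"}'::jsonb) new USING (key);
-- {"b": "one", "special": "comma,separated"}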

to find new lines character in postgres

I have a couple of entries in a database table whose "name" data spans multiple lines.
I am trying to find a single newline character in it.
SELECT
id,
strpos ( NAME, E'\n' ) AS Position_of_substring
FROM
problems
WHERE
strpos ( NAME, E'\n' ) > 0;
But it fails for data that has more than one newline character (\n).
Is there any way to find "n" occurrences of "\n" in the name data?
regexp_matches will emit a row for each match (see the docs).
SELECT
id,
strpos ( NAME, E'\n' ) AS Position_of_substring
FROM
problems p
WHERE
(select count(*) from regexp_matches(p.name,E'\n','g') ) = ?;
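For example, a sketch that simply reports how many \n characters each row contains (same table and column names as in the question; the ? above is whatever count you are filtering for):
SELECT
id,
(SELECT count(*) FROM regexp_matches(p.name, E'\n', 'g')) AS newline_count
FROM
problems p
WHERE
strpos(p.name, E'\n') > 0;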
This one gives you a list of all indexes of \n in your string. I am not sure if this is the result you were expecting:
demo:db<>fiddle
SELECT
name,
array_remove( -- 5
(array_agg(sum))::int[], -- 4
length(name) + 1
)
FROM (
-- 3
SELECT
name,
SUM(length(lines) + 1) OVER (PARTITION BY name ORDER BY row_number)
FROM (
-- 2
SELECT
*,
row_number() OVER ()
FROM (
-- 1
SELECT
name,
regexp_split_to_table(name, '\n') as lines
FROM problems
)s
)s
) s
GROUP BY name
1. Splitting the string at the \n chars. Every split part is now one row in a temporary table.
2. Adding a row_number() to ensure the right order of the split parts.
3. This counts the length of each split part. The (length + 1) gives the position of the \n. The SUM window function sums up all values within a group (your original text); that's why the order is relevant. For example: the first two parts of "abc\nde\nfgh" have lengths 3 and 2, so the breaks are at 4 (abc = 3, + 1) and 3 (de = 2, + 1). The 3 of the second part is not a real index, but if you sum up these values you get the right indexes: 4 and 7.
4. Aggregating these results.
5. If (as in my example) the last char is always a \n and you are only interested in the \n chars inside the string, you can remove the last entry of the aggregated array.
Changed problem in comments below:
Would like to replace \n with spaces. So I am thinking how above query
will look in the Update statement. – Pranav Unde
Replacing the \n with spaces is quite a different problem than getting indexes for all occurrences of a special character. And it's much simpler:
UPDATE problems
SET name = trim(regexp_replace(name, E'\n', ' ', 'g'));
regexp_replace(..., 'g') finds all occurrences of \n and does the replacement
trim() removes whitespace at the start and end of the string if necessary (for example because there was a trailing \n, as in my example, which the previous step replaced with a space as well)
demo:db<>fiddle
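If you would rather not rewrite rows that contain no newline at all (an extra guard, not part of the answer above), the question's own strpos() check works as a filter:
UPDATE problems
SET name = trim(regexp_replace(name, E'\n', ' ', 'g'))
WHERE strpos(name, E'\n') > 0;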

tsql comma delimited testing for value

I've been given a table with a few fields that hold comma-separated values (either blank or Y/N) like so (and the field name where this data is stored is People_Notified):
Y,,N,
,Y,,N
,,N,Y
Each 'slot' relates to a particular field value, and I now need to include that field name in the string as well (in this case Parent, Admin, Police and Medical), inserting an "N" where the current value is blank but leaving the existing Y's and N's in place. So for the above example, where there are four known slots, I would want a T-SQL statement to end up with:
Parent=Y,Admin=N,Police=N,Medical=N
Parent=N,Admin=Y,Police=N,Medical=N
Parent=N,Admin=N,Police=N,Medical=Y
I tried to use a combination of CHARINDEX and CASE but haven't figured out a way to make this work.
Although a bit messy, in theory it can be done in one statement:
select
'Parent=' +stuff((stuff((stuff(
substring((replace(
(','+(replace((replace(@People_Notified,',,,',',N,N,')),',,',',N,'))+','),',,',',N,')),2,7),7,0,
'Medical=')),5,0,'Police=')),3,0,'Admin=')
Broken down, it is easier to follow:
declare @People_Notified varchar(100)=',,Y,Y' -- test variable
-- Insert Ns
set @People_Notified= (select replace(@People_Notified,',,,',',N,N,')) -- case two consecutive missing
set @People_Notified= (select replace(@People_Notified,',,',',N,')) -- case one missing
set @People_Notified= (select replace((','+@People_Notified+','),',,',',N,')) -- case start or end missing
set @People_Notified= substring(@People_Notified,2,7) -- remove extra commas added previously
-- Stuff the labels
select 'Parent=' +stuff((stuff((stuff(@People_Notified,7,0,'Medical=')),5,0,'Police=')),3,0,'Admin=')
If you're able to use XQuery in SQL Server, I don't think you need to get too complex. You could do something like this:
SELECT CONVERT(XML, REPLACE('<pn>' + REPLACE(People_Notified, ',', '</pn><pn>') + '</pn>', '<pn></pn>', '<pn>N</pn>')).query('
concat("Parent=", data(/pn[1])[1], ",Admin=", data(/pn[2])[1], ",Police=", data(/pn[3])[1], ",Medical=", data(/pn[4])[1])
')
FROM ...
Explanation: Construct an XML-like string out of the original delimited string by replacing commas with closing and opening tags. Add an opening tag to the start and a closing tag to the end. Replace each empty element with one containing "N". Convert the XML-like string into actual XML data so that you can use XQuery. Then just concatenate what you need using concat() and the right indexes for the elements' data.
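A self-contained way to try it against the first sample value from the question ('Y,,N,') without touching a table:
SELECT CONVERT(XML, REPLACE('<pn>' + REPLACE('Y,,N,', ',', '</pn><pn>') + '</pn>', '<pn></pn>', '<pn>N</pn>')).query('
concat("Parent=", data(/pn[1])[1], ",Admin=", data(/pn[2])[1], ",Police=", data(/pn[3])[1], ",Medical=", data(/pn[4])[1])
');
-- returns Parent=Y,Admin=N,Police=N,Medical=N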
Here's one way to do it:
;WITH cteXML (Id, Notified)
AS
(
SELECT Id,
CONVERT(XML,'<Notified><YN>'
+ REPLACE([notified],',', '</YN><YN>')
+ '</YN></Notified>') AS Notified
FROM People_Notified
)
select id,
'Parent=' + case Notified.value('/Notified[1]/YN[1]','varchar(1)') when '' then 'N' else Notified.value('/Notified[1]/YN[1]','varchar(1)') end + ',' +
'Admin=' + case Notified.value('/Notified[1]/YN[2]','varchar(1)') when '' then 'N' else Notified.value('/Notified[1]/YN[2]','varchar(1)') end + ',' +
'Police=' + case Notified.value('/Notified[1]/YN[3]','varchar(1)') when '' then 'N' else Notified.value('/Notified[1]/YN[3]','varchar(1)') end + ',' +
'Medical=' + case Notified.value('/Notified[1]/YN[4]','varchar(1)') when '' then 'N' else Notified.value('/Notified[1]/YN[4]','varchar(1)') end Notified
from cteXML
SQL Fiddle
Check this page out for an explanation of what the XML stuff is doing.
This page has a pretty thorough look at the various ways you can split a delimited string into rows.

Prevent trailing spaces during insert?

I have this INSERT statement and there seem to be trailing spaces at the end of the acct_desc fields. I'd like to know how to prevent trailing spaces from occurring during my INSERT statement.
INSERT INTO dwh.attribution_summary
SELECT d.adic,
d.ucic,
b.acct_type_desc as acct_desc,
a.begin_mo_balance as opening_balance,
c.date,
'fic' as userid
FROM fic.dim_members d
JOIN fic.fact_deposits a ON d.ucic = a.ucic
JOIN fic.dim_date c ON a.date_id = c.date_id
JOIN fic.dim_acct_type b ON a.acct_type_id = b.acct_type_id
WHERE c.date::timestamp = current_date - INTERVAL '1 days';
Use the PostgreSQL trim() function. There are trim(), rtrim() and ltrim().
To trim trailing spaces:
...
rtrim(b.acct_type_desc) as acct_desc,
...
If acct_type_desc is not of type text or varchar, cast it to text first:
...
rtrim(b.acct_type_desc::text) as acct_desc,
...
If acct_type_desc is of type char(n), casting it to text removes trailing spaces automatically, no trim() necessary.
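Putting that together with the original statement, only the acct_desc expression changes; everything else is exactly as in the question:
INSERT INTO dwh.attribution_summary
SELECT d.adic,
d.ucic,
rtrim(b.acct_type_desc) as acct_desc,
a.begin_mo_balance as opening_balance,
c.date,
'fic' as userid
FROM fic.dim_members d
JOIN fic.fact_deposits a ON d.ucic = a.ucic
JOIN fic.dim_date c ON a.date_id = c.date_id
JOIN fic.dim_acct_type b ON a.acct_type_id = b.acct_type_id
WHERE c.date::timestamp = current_date - INTERVAL '1 days';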
Besides what others have said, add a CHECK constraint to that column, so that if someone forgets to apply the rtrim() function in the INSERT statement, the check constraint will reject the row.
For example, check trailing spaces (in the end) of string:
ALTER TABLE dwh.attribution_summary
ADD CONSTRAINT tcc_attribution_summary_trim
CHECK (rtrim(acct_type_desc) = acct_type_desc);
Another example checks for leading and trailing spaces, as well as consecutive whitespace in the middle of the string:
ALTER TABLE dwh.attribution_summary
ADD CONSTRAINT tcc_attribution_summary_whitespace
CHECK (btrim(regexp_replace(acct_type_desc, '\s+'::text, ' '::text, 'g'::text)) = acct_type_desc);
What is the type of acct_desc?
If it is CHAR(n), then the DBMS has no choice but to add spaces at the end; the SQL Standard requires that.
If it is VARCHAR(n), then the DBMS won't add spaces at the end.
If PostgreSQL supported them, the national variants of the types (NCHAR, NVARCHAR) would behave the same as the corresponding non-national variants do.

handle escape sequence char in DB2

I want to search a column and get the values where the value contains \ .
I tried select * from "Values" where "ValueName" like '\', but it returns no rows.
I also tried like "\" and like '\''%' etc., but got no results.
See the DB2 Documentation on the LIKE predicate, in particular the parts about escape expressions.
What you want is
select * from Values where ValueName like '\\%' escape '\'
To give an example of usage:
create table backslash_escape_test
(
backslash_escape_test_column varchar(20)
);
insert into backslash_escape_test(backslash_escape_test_column)
values ('foo\');
insert into backslash_escape_test(backslash_escape_test_column)
values ('no slashes here');
insert into backslash_escape_test(backslash_escape_test_column)
values ('foo\bar');
insert into backslash_escape_test(backslash_escape_test_column)
values ('\bar');
select count(*) from backslash_escape_test where
backslash_escape_test_column like '%\\%' escape '\';
returns 3 (all 3 rows with \ in them).
select count(*) from backslash_escape_test where
backslash_escape_test_column like '\\%' escape '\';
returns 1 (the \bar row).
select * from Values where ValueName like '%\\%'
Values is not a good name for a table because it may be confused with the VALUES keyword.
Don't escape it. You just need wildcards around it like this:
select count(*)
from escape_test
where test_column like '%\%'
But, suppose you really do need to escape the slash. Here's a simpler, more straightforward answer:
The escape-expression allows you to specify whatever character for escaping that you wish. So why use a character that you're looking for, thus requiring you to escape it? Use any other character instead. I'll use a plus sign as an example, but it could be a backslash, pound-sign, question-mark, anything other than a character you are looking for or one of the wildcard characters (% or _).
select count(*)
from escape_test
where test_column like '%\%' escape '+';
Now you don't have to add anything into your like-pattern.
To hold myself to the same standard of proof that @Michael demonstrated --
create table escape_test
( test_column varchar(20) );
insert into escape_test
(test_column)
values ('foo\'),
('no slashes here'),
('foo\bar'),
('\bar');
select 'test1' trial, count(*) result
from escape_test
where test_column like '%\%'
UNION
select 'test2', count(*)
from escape_test
where test_column like '%\\%' escape '\'
UNION
select 'test3', count(*)
from escape_test
where test_column like '%\%' escape '+'
;
Which returns the same number of rows for each method:
TRIAL RESULT
----- ------
test1 3
test2 3
test3 3