Replace spaces in bracketed text only in PostgreSQL using the regexp_replace function

I need to replace all spaces in bracketed text only
"Describe what you (tried and) what (you expected) to happen"
with commas like this:
"Describe what you (tried,and) what (you,expected) to happen"
Please advise how to do it correctly using function regexp_replace.
Thanks in advance.

This is not so easy to do with a simple regexp_replace call.
If you have at most two words in the brackets, then you can do:
SELECT string_agg (r.res, ' ')
FROM ( SELECT CASE
WHEN elt ~ '^\(' THEN elt || ',' || lead(elt) OVER ()
WHEN elt ~ '\)$' THEN ''
ELSE elt
END AS res
FROM regexp_split_to_table('Describe what you (tried and) what (you expected) to happen', ' ') as elt
) AS r
WHERE r.res <> ''
If you have more than two words inside the brackets, then you can create your own aggregate function:
CREATE OR REPLACE FUNCTION replace(x text, y text, old text, new text) RETURNS text LANGUAGE sql AS $$
SELECT replace(COALESCE(x,y), old, new) ; $$ ;
CREATE OR REPLACE AGGREGATE replace_agg (text, text, text)
( stype = text
, sfunc = replace
)
And the query is :
SELECT replace_agg('Describe what you (tried and) what (you expected) to happen', elt[1], replace(elt[1], ' ', ','))
FROM regexp_matches('Describe what you (tried and) what (you expected) to happen', '\([^\)]*\)', 'g') AS elt
see the test in dbfiddle
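For the simple case in the question, a single regexp_replace call with a lookahead may also be enough. This is only a sketch, and it assumes the parentheses are balanced and never nested: the pattern matches a space only when a closing parenthesis follows before any other parenthesis.

```sql
-- Assumption: parentheses are balanced and not nested.
-- ' (?=[^()]*\))' matches a space that is followed by ')' before any other parenthesis.
SELECT regexp_replace(
         'Describe what you (tried and) what (you expected) to happen',
         ' (?=[^()]*\))',
         ',',
         'g'  -- replace every match, not only the first one
       ) AS result;
-- result: Describe what you (tried,and) what (you,expected) to happen
```

With nested or unbalanced parentheses the lookahead no longer identifies "inside brackets" reliably, so the aggregate approach above is the safer choice there.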

Related

In Postgres database How to get column values in one single row comma separated value [duplicate]

I am looking for a way to concatenate the strings of a field within a group by query. So for example, I have a table:
ID COMPANY_ID EMPLOYEE
1 1 Anna
2 1 Bill
3 2 Carol
4 2 Dave
and I wanted to group by company_id to get something like:
COMPANY_ID EMPLOYEE
1 Anna, Bill
2 Carol, Dave
There is a built-in function in MySQL to do this: group_concat.
PostgreSQL 9.0 or later:
Modern Postgres (since 2010) has the string_agg(expression, delimiter) function which will do exactly what the asker was looking for:
SELECT company_id, string_agg(employee, ', ')
FROM mytable
GROUP BY company_id;
Postgres 9 also added the ability to specify an ORDER BY clause in any aggregate expression; otherwise you have to order all your results or deal with an undefined order. So you can now write:
SELECT company_id, string_agg(employee, ', ' ORDER BY employee)
FROM mytable
GROUP BY company_id;
PostgreSQL 8.4.x:
PostgreSQL 8.4 (in 2009) introduced the aggregate function array_agg(expression) which collects the values in an array. Then array_to_string() can be used to give the desired result:
SELECT company_id, array_to_string(array_agg(employee), ', ')
FROM mytable
GROUP BY company_id;
PostgreSQL 8.3.x and older:
When this question was originally posed, there was no built-in aggregate function to concatenate strings. The simplest custom implementation (suggested by Vajda Gabo in this mailing list post, among many others) is to use the built-in textcat function (which lies behind the || operator):
CREATE AGGREGATE textcat_all(
basetype = text,
sfunc = textcat,
stype = text,
initcond = ''
);
Here is the CREATE AGGREGATE documentation.
This simply glues all the strings together, with no separator. In order to get a ", " inserted in between them without having it at the end, you might want to make your own concatenation function and substitute it for the "textcat" above. Here is one I put together and tested on 8.3.12:
CREATE FUNCTION commacat(acc text, instr text) RETURNS text AS $$
BEGIN
IF acc IS NULL OR acc = '' THEN
RETURN instr;
ELSE
RETURN acc || ', ' || instr;
END IF;
END;
$$ LANGUAGE plpgsql;
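This version will output a comma even if the value in the row is empty (a null value is worse: concatenating NULL yields NULL, which resets the accumulated string), so for empty values you get output like this: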
a, b, c, , e, , g
If you would prefer to remove extra commas to output this:
a, b, c, e, g
Then add an ELSIF check to the function like this:
CREATE FUNCTION commacat_ignore_nulls(acc text, instr text) RETURNS text AS $$
BEGIN
IF acc IS NULL OR acc = '' THEN
RETURN instr;
ELSIF instr IS NULL OR instr = '' THEN
RETURN acc;
ELSE
RETURN acc || ', ' || instr;
END IF;
END;
$$ LANGUAGE plpgsql;
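To make the substitution concrete, here is a minimal sketch of how such a function could be plugged into an aggregate. The aggregate name commacat_all is hypothetical; commacat_ignore_nulls is the function defined just above, and mytable is the table from the question.

```sql
-- commacat_all is a hypothetical name chosen for this sketch.
CREATE AGGREGATE commacat_all(text) (
    sfunc    = commacat_ignore_nulls,  -- state transition function defined above
    stype    = text,                   -- the running state is the concatenated string
    initcond = ''                      -- start from an empty string
);

SELECT company_id, commacat_all(employee) AS employees
FROM mytable
GROUP BY company_id;
```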
How about using Postgres built-in array functions? At least on 8.4 this works out of the box:
SELECT company_id, array_to_string(array_agg(employee), ',')
FROM mytable
GROUP BY company_id;
As of PostgreSQL 9.0 you can use the aggregate function called string_agg. Your new SQL should look something like this:
SELECT company_id, string_agg(employee, ', ')
FROM mytable
GROUP BY company_id;
I claim no credit for the answer because I found it after some searching:
What I didn't know is that PostgreSQL allows you to define your own aggregate functions with CREATE AGGREGATE
This post on the PostgreSQL list shows how trivial it is to create a function to do what's required:
CREATE AGGREGATE textcat_all(
basetype = text,
sfunc = textcat,
stype = text,
initcond = ''
);
SELECT company_id, textcat_all(employee || ', ')
FROM mytable
GROUP BY company_id;
As already mentioned, creating your own aggregate function is the right thing to do. Here is my concatenation aggregate function (you can find details in French):
-- Dollar quoting avoids backslash-escaped quotes, which fail when standard_conforming_strings is on
CREATE OR REPLACE FUNCTION concat2(text, text) RETURNS text AS $$
SELECT CASE WHEN $1 IS NULL OR $1 = '' THEN $2
WHEN $2 IS NULL OR $2 = '' THEN $1
ELSE $1 || ' / ' || $2
END;
$$
LANGUAGE SQL;
CREATE AGGREGATE concatenate (
sfunc = concat2,
basetype = text,
stype = text,
initcond = ''
);
And then use it as:
SELECT company_id, concatenate(employee) AS employees FROM ...
This latest announcement list snippet might be of interest if you'll be upgrading to 8.4:
Until 8.4 comes out with a super-efficient native one, you can add the array_accum() function in the PostgreSQL documentation for rolling up any column into an array, which can then be used by application code, or combined with array_to_string() to format it as a list:
http://www.postgresql.org/docs/current/static/xaggr.html
I'd link to the 8.4 development docs but they don't seem to list this feature yet.
Following up on Kev's answer, using the Postgres docs:
First, create an array of the elements, then use the built-in array_to_string function.
CREATE AGGREGATE array_accum (anyelement)
(
sfunc = array_append,
stype = anyarray,
initcond = '{}'
);
-- my_table is a placeholder; "table" itself is a reserved word and cannot be used unquoted
select array_to_string(array_accum(name),'|') from my_table group by id;
Following yet again on the use of a custom aggregate function of string concatenation: you need to remember that the select statement will place rows in any order, so you will need to do a sub select in the from statement with an order by clause, and then an outer select with a group by clause to aggregate the strings, thus:
SELECT custom_aggregate(MY.special_strings)
FROM (SELECT special_strings, grouping_column
FROM a_table
ORDER BY ordering_column) MY
GROUP BY MY.grouping_column
Use STRING_AGG function for PostgreSQL and Google BigQuery SQL:
SELECT company_id, STRING_AGG(employee, ', ')
FROM employees
GROUP BY company_id;
I found this PostgreSQL documentation helpful: http://www.postgresql.org/docs/8.0/interactive/functions-conditional.html.
In my case, I sought plain SQL to concatenate a field with brackets around it, if the field is not empty.
select itemid,
CASE
itemdescription WHEN '' THEN itemname
ELSE itemname || ' (' || itemdescription || ')'
END
from items;
If you are on Amazon Redshift, where string_agg is not supported, try using listagg.
SELECT company_id, listagg(EMPLOYEE, ', ') as employees
FROM EMPLOYEE_table
GROUP BY company_id;
In PostgreSQL 9.0 and above you can use the aggregate function called string_agg. Your new SQL should look something like this:
SELECT company_id, string_agg(employee, ', ')
FROM mytable GROUP BY company_id;
You can also use the format function, which can implicitly take care of type conversion of text, int, etc. by itself.
create or replace function concat_return_row_count(tbl_name text, column_name text, value int)
returns integer as $row_count$
declare
total integer;
begin
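-- %I quotes the table and column names as identifiers; %s is safe here only because value is an integer
EXECUTE format('select count(*) from %I WHERE %I = %s', tbl_name, column_name, value) INTO total;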
return total;
end;
$row_count$ language plpgsql;
postgres=# select concat_return_row_count('tbl_name','column_name',2); --2 is the value
I'm using JetBrains Rider, and it was a hassle copying the results from the above examples to re-execute because it seemed to wrap it all in JSON. This joins them into a single statement that was easier to run:
select string_agg('drop table if exists "' || tablename || '" cascade', ';')
from pg_tables where schemaname != $$pg_catalog$$ and tableName like $$rm_%$$

Not select if phrase matching from records of another table

I have a big table (100M records) with keywords like these:
('water'),
('mineral water'),
('water bottle'),
('big bottle of water'),
('coke'),
('pepsi')
and I want to select all records excluding keywords where there is a regex match with at least one record of another table.
For example, the exclusion table contains:
water
wine
glass
So I have to select all records from the keywords table, excluding all those with a phrase match:
keywords that are equal to 'water' or 'wine' or 'glass'
keywords that start with 'water' or 'wine' or 'glass'
keywords that end with 'water' or 'wine' or 'glass'
keywords that contain 'water' or 'wine' or 'glass' in the middle, between two spaces
"waterize" should not be excluded.
Here is some pseudo-SQL. The desired output is only the records "coke" and "pepsi".
CREATE TABLE keywords (
query TEXT
);
CREATE TABLE negatives (
text TEXT
);
INSERT INTO keywords
(query)
VALUES
('water'),
('mineral water'),
('water bottle'),
('big bottle of water'),
('coke'),
('pepsi');
INSERT INTO negatives (text) VALUES ('water'), ('glass'), ('wine');
SELECT *
FROM keywords
WHERE NOT (
query ~~ ('% ' || 'water' || ' %') OR
query ~~ ( 'water' || ' %') OR
query ~~ ('% ' || 'water') OR
query ~~ ('water')
)
https://www.db-fiddle.com/f/4ufuFAXKf7mi5yefNQqoXM/33
This needs to be performance efficient because keywords table is very large (100M records) and "exclusion" table is very small (<100 records)
Since the negatives is a reasonably bounded list (<100 records you say), what about leveraging Postgres' most excellent support of arrays?
literal demo:
SELECT *
FROM keywords
where
not (string_to_array(query, ' ') && '{water,glass,wine}')
Using your negatives table:
with omit as (
select array_agg (text) as neg from negatives
)
SELECT k.*
FROM keywords k
cross join omit o
where
not (string_to_array(k.query, ' ') && o.neg)
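To see why the overlap operator gives whole-word matching, here is a small self-contained sketch that runs without any tables. The inline VALUES list and the column alias q are made up for illustration; the negative words are the ones from the question.

```sql
-- string_to_array splits on spaces, so && only matches whole words.
SELECT q,
       string_to_array(q, ' ') && ARRAY['water', 'glass', 'wine'] AS excluded
FROM (VALUES ('big bottle of water'),
             ('waterize'),
             ('coke')) AS t(q);
-- 'big bottle of water' -> true  (contains the element 'water')
-- 'waterize'            -> false (no array element equals a negative word)
-- 'coke'                -> false
```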
-- EDIT 9/22/22 --
Per my comment, if you are open to a function, then I think something like this, while slower, would still work and do everything on a single pass. This will have the advantage of short circuiting, meaning it will skip the record when it finds the first match. In such a case, it may make sense to order the negatives in the likelihood they will occur.
create or replace function valid_keywords()
returns setof keywords
language plpgsql
as $BODY$
declare
rw keywords%rowtype;
negs text[];
neg text;
main_text text;
begin
select array_agg (text)
into negs
from negatives;
<<kwd>>
for rw in select * from keywords
loop
main_text := ' ' || rw.query || ' ';
foreach neg in array negs
loop
if main_text ~ (' ' || neg || ' ') then
continue kwd;
end if;
end loop;
return next rw;
end loop;
end
$BODY$
And to execute it, simply:
select * from valid_keywords()
This should handle your "big bottle" example.
If you want case insensitive searches, you can change the regex operator to a ~*.

Escaping characters in PostgreSQL

I have a very large database populated from social media. I'm trying to make a new column to make JSON for word_counter for faster analytics.
I'm first creating a function in PostgreSQL to take a string array, count the occurrences, and return a jsonb that gets inserted. Here is the function:
CREATE
OR REPLACE FUNCTION count_elements (TEXT []) RETURNS JSONB AS $$
DECLARE js JSONB := '{}' ;
DECLARE jjson JSONB ;
BEGIN
SELECT
jsonb_agg (
(
'{"' || i|| '":"' || C || '"}'
) :: JSONB
) INTO jjson
FROM
(
SELECT
i,
COUNT (*) C
FROM
(SELECT UNNEST($1 :: TEXT []) i) i
GROUP BY
i
ORDER BY
C DESC
) foo ; RETURN jjson ;
END ; $$ LANGUAGE plpgsql;
Here is the issue. When running the following query
select count_elements(string_to_array(lower(tweet_text), ' ')),tweet_text from tweet_database
limit 10
I get this error
[Err] ERROR: invalid input syntax for type json
DETAIL: Character with value 0x0a must be escaped.
CONTEXT: JSON data, line 1: {"winning?
SQL statement "SELECT
I tried escaping the column, and then regex replacing some of the items but it hasn't worked yet.
The to_json function can be used to escape text:
SELECT
jsonb_agg (
(
'{' || to_json(i) || ':' || C || '}'
) :: JSONB
) INTO jjson
then
select count_elements(E'{a, a, b, a\nb, a}'::text[]);
results in
[{"a":3}, {"b":1}, {"a\nb":1}]
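As an alternative sketch, on PostgreSQL 9.5 or later the JSON can be built without any string concatenation, which sidesteps the escaping problem entirely. This assumes jsonb_build_object is available; the sample array and the aliases word and cnt are made up for illustration.

```sql
-- jsonb_build_object escapes keys and values itself, so newlines in a word are handled.
SELECT jsonb_agg(jsonb_build_object(word, cnt) ORDER BY cnt DESC) AS word_counts
FROM (
    SELECT word, count(*) AS cnt
    FROM unnest(ARRAY['a', 'a', 'b', E'a\nb', 'a']) AS word
    GROUP BY word
) AS counted;
```

Note that the counts come out as JSON numbers here, as in the to_json version, rather than as quoted strings as in the original function.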

How to execute a dynamic query in PostgreSQL?

I am trying to execute the following dynamic sql, but I could not figure out how to do it:
DROP FUNCTION f_mycross(text, text);
EXECUTE ('CREATE OR REPLACE FUNCTION f_mycross(text, text)
RETURNS TABLE ("registration_id" integer, '
|| (SELECT string_agg(DISTINCT pivot_headers, ',' order by pivot_headers)
FROM (SELECT DISTINCT '"' || qid::text || '" text' AS pivot_headers
FROM answers) x)
|| ') AS ''$libdir/tablefunc'',''crosstab_hash'' LANGUAGE C STABLE STRICT;')
I am relatively new to PostgreSQL.
Like a_horse commented, EXECUTE is not an SQL command. It's a PL/pgSQL command and can only be used in a function body or DO statement using this procedural language. Like:
DROP FUNCTION IF EXISTS f_mycross(text, text);
DO
$do$
BEGIN
EXECUTE (
SELECT 'CREATE OR REPLACE FUNCTION f_mycross(text, text)
RETURNS TABLE (registration_id integer, '
|| string_agg(pivot_header || ' text', ', ')
|| $$) AS '$libdir/tablefunc', 'crosstab_hash' LANGUAGE C STABLE STRICT$$
FROM (SELECT DISTINCT quote_ident(qid::text) AS pivot_header FROM answers ORDER BY 1) x
);
END
$do$; -- LANGUAGE plpgsql is the default
I added some improvements and simplified the nested SELECT query.
Major points
Add IF EXISTS to DROP FUNCTION unless you are certain the function exists or you want to raise an exception if it does not.
DISTINCT in the subquery is enough, no need for another DISTINCT in the outer SELECT.
Use quote_ident() to automatically double-quote identifiers where necessary.
No parentheses required around the string we feed to EXECUTE.
Simpler nested quoting with $-quotes. See: Insert text with single quotes in PostgreSQL
We can apply ORDER BY in the subquery, which is typically much faster than adding ORDER BY in the outer aggregate function.
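A smaller sketch of the same pattern may help isolate the moving parts: EXECUTE inside a DO block, with format() and %I quoting the identifiers. The catalog table pg_catalog.pg_class is used only so the example runs on any database; the variable name n is arbitrary.

```sql
DO
$do$
DECLARE
    n bigint;
BEGIN
    -- %I quotes each argument as an identifier where necessary
    EXECUTE format('SELECT count(*) FROM %I.%I', 'pg_catalog', 'pg_class')
    INTO n;
    RAISE NOTICE 'pg_class has % rows', n;
END
$do$;
```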

Split string with two delimiters and convert type

I have a PL/pgSQL function like this (thanks to the guy who made this possible):
CREATE OR REPLACE FUNCTION public.split_string(text, text)
RETURNS SETOF text
LANGUAGE plpgsql
AS $function$
DECLARE
pos int;
delim_length int := length($2);
BEGIN
WHILE $1 <> ''
LOOP
pos := strpos($1,$2);
IF pos > 0 THEN
RETURN NEXT substring($1 FROM 1 FOR pos - 1);
$1 := substring($1 FROM pos + delim_length);
ELSE
RETURN NEXT $1;
EXIT;
END IF;
END LOOP;
RETURN;
END;
$function$
It splits a string with a delimiter. Like this:
select * from split_string('3.584731 60.739211,3.590472 60.738030,3.592740 60.736220', ' ');
"3.584731"
"60.739211,3.590472"
"60.738030,3.592740"
"60.736220"
How can I save the results in a temp_array or temp_table, so that I can get the results in temp_x and split up these points again? Like:
"3.584731"
"60.739211"
"3.590472"
"60.738030"
"3.592740"
"60.736220"
and return the values as double precision. And all of this should be done in the function.
If you need the intermediary step:
SELECT unnest(string_to_array(a, ' '))::float8
-- or do something else with the derived table
FROM unnest(string_to_array('3.584731 60.739211,3.590472 60.738030', ',')) a;
This is more verbose than regexp_split_to_table(), but may still be faster because regular expressions are typically more expensive. (Test with EXPLAIN ANALYZE.)
I first split at ',', and next at ' ' - the reversed sequence of what you describe seems more adequate.
If need be, you can wrap this into a PL/pgSQL function:
CREATE OR REPLACE FUNCTION public.split_string(_str text
, _delim1 text = ','
, _delim2 text = ' ')
RETURNS SETOF float8 AS
$func$
BEGIN
RETURN QUERY
SELECT unnest(string_to_array(a, _delim2))::float8
-- or do something else with the derived table from step 1
FROM unnest(string_to_array(_str, _delim1)) a;
END
$func$ LANGUAGE plpgsql IMMUTABLE;
Or just an SQL function:
CREATE OR REPLACE FUNCTION public.split_string(_str text
, _delim1 text = ','
, _delim2 text = ' ')
RETURNS SETOF float8 AS
$func$
SELECT unnest(string_to_array(a, _delim2))::float8
FROM unnest(string_to_array(_str, _delim1)) a
$func$ LANGUAGE sql IMMUTABLE;
Make it IMMUTABLE to allow performance optimization and other uses.
Call (using the provided defaults for _delim1 and _delim2):
SELECT * FROM split_string('3.584731 60.739211,3.590472 60.738030');
Or:
SELECT * FROM split_string('3.584731 60.739211,3.590472 60.738030', ',', ' ');
Fastest
For top performance, combine translate() with unnest(string_to_array(...)):
SELECT unnest(
string_to_array(
translate('3.584731 60.739211,3.590472 60.738030', ' ', ',')
, ','
)
)::float8
You do not need special functions, use built-in regexp_split_to_table:
SELECT *
FROM regexp_split_to_table(
'3.584731 60.739211,3.590472 60.738030,3.592740 60.736220',
'[, ]') s;
EDIT:
I don't see why you wish to stick with PL/pgSQL function if there's a built-in one.
Anyway, consider this example:
WITH s AS
(
SELECT ' ,'::text sep,
'3.584731 60.739211,3.590472 60.738030,3.592740 60.736220'::text str
)
SELECT sep, left(sep,1), right(sep,-1),
str,
translate(str, right(sep,-1), left(sep,1))
FROM s;
This means that you can:
do similar transformations before you call your function, or
integrate this code inside, but this means you'll need to introduce at least one extra variable, unless you feel comfortable replacing every $1 with translate($1, right($2,-1), left($2,1)) throughout the code. Obviously, plain $2 should be changed to left($2,1).
If I understand your question correctly, you can do:
-- store the content in a temp table
CREATE TEMP TABLE foo AS SELECT v::double precision FROM split_string('...') g(v);
-- store the content in an ARRAY
SELECT ARRAY(SELECT v::double precision FROM split_string('....') g(v))
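If the goal is just an array of double precision values, a sketch without any custom function is also possible. It combines the translate() idea from the earlier answer with a cast of the whole text array; the alias coords is made up for illustration.

```sql
-- Turn both delimiters into ',', split once, then cast text[] to float8[].
SELECT string_to_array(
         translate('3.584731 60.739211,3.590472 60.738030', ' ', ','),
         ','
       )::float8[] AS coords;
```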