PostgreSQL regexp_matches only returning rows that match?

This is my first time using regexp_matches, and I find that a query will only return rows that match all of the regexp_matches() calls in the SELECT clause.
For example:
SELECT parameters,
regexp_matches(parameters, 'a=(\d+)'),
regexp_matches(parameters, 'b=(\d+)')
FROM table;
This will return a row where parameters is a=1&b=1, but not rows where it is only a=1 or only b=1.
It's acting as if it was a where clause. Why is this?

This is because regexp_matches() returns a set of rows.
When there is no match it returns no rows, and the result row disappears along with it.
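You can see the effect in isolation (a minimal demonstration, not from the question itself):
-- one match: the set-returning function yields one row, so the query yields one row
SELECT 'a=1' AS parameters, regexp_matches('a=1', 'a=(\d+)');
-- no match: the function yields zero rows, so the whole result is empty
SELECT 'a=1' AS parameters, regexp_matches('a=1', 'b=(\d+)');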
Search with a single regexp instead, e.g.:
SELECT
parameters,
regexp_matches(parameters, '[ab]=(\d+)')
FROM a_table;
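Note that without the 'g' flag, regexp_matches() returns only the first match, so a string like a=1&b=1 yields a single row. If you want one row per match, the 'g' flag gives you all of them (rows with no match at all are still dropped, as explained above):
SELECT parameters,
       regexp_matches(parameters, '[ab]=(\d+)', 'g')
FROM a_table;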
or, if you want to get two columns for a and b:
SELECT parameters, a, b
FROM (
    SELECT
        parameters,
        regexp_matches(parameters, 'a=(\d+)') a,
        null b
    FROM a_table
    UNION
    SELECT
        parameters,
        null,
        regexp_matches(parameters, 'b=(\d+)')
    FROM a_table
) s;

Another way around this is to match on the end of the string in addition to the pattern you want to find.
SELECT parameters,
       regexp_matches(parameters, '(a=(\d+))|$'),
       regexp_matches(parameters, '(b=(\d+))|$')
FROM a_table;
Then you might want to do some further processing, since any string missing the pattern will show up in the results as {NULL} (or {NULL,NULL,...}, one NULL per capture group).
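Because the pattern now matches exactly once per row, you can also index into the returned array to pull the digits (or NULL) straight into columns; a sketch building on the query above:
SELECT parameters,
       (regexp_matches(parameters, '(a=(\d+))|$'))[2] AS a,
       (regexp_matches(parameters, '(b=(\d+))|$'))[2] AS b
FROM a_table;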

Related

How to reference a column in the select clause in the ORDER BY clause in SQLAlchemy, like you can in Postgres, instead of repeating the expression twice

In Postgres, if one of your columns is a big complicated expression, you can just say ORDER BY 3 DESC, where 3 is the position of the column containing the complicated expression. Is there any way to do this in SQLAlchemy?
As Gord Thompson observes in this comment, you can pass the column index as a text object to group_by or order_by:
q = sa.select(sa.func.count(), tbl.c.user_id).group_by(sa.text('2')).order_by(sa.text('2'))
serialises to
SELECT count(*) AS count_1, posts.user_id
FROM posts GROUP BY 2 ORDER BY 2
There are other techniques that don't require re-typing the expression.
You could use the selected_columns property:
q = sa.select(tbl.c.col1, tbl.c.col2, tbl.c.col3)
q = q.order_by(q.selected_columns[2]) # order by col3
You could also order by a label (but this will affect the names of result columns):
q = sa.select(tbl.c.col1, tbl.c.col2, tbl.c.col3.label('c')).order_by('c')

PostgreSQL calculate prefix combinations after split

I have a string as input, of the form foo:bar:something:221. I'm looking for a way to generate a table with all prefixes of this string, like:
foo
foo:bar
foo:bar:something
foo:bar:something:221
I wrote the following query to split the string, but can't figure out where to go from there:
select unnest(string_to_array('foo:bar:something:221', ':'));
An option is to simulate a loop over all elements, then take the sub-array from the input for each element index:
with data(input) as (
    values (string_to_array('foo:bar:something:221', ':'))
)
select array_to_string(input[1:g.idx], ':')
from data
cross join generate_series(1, cardinality(input)) as g(idx);
generate_series(1, cardinality(input)) generates as many rows as the array has elements, and the expression input[1:g.idx] takes the "sub-array" from the first element up to the "idx"-th one. As the output is an array, I use array_to_string to re-create the string representation with the : separator.
You can use string_agg as a window function. The default frame is from the beginning of the partition to the current row:
SELECT string_agg(s, ':') OVER (ORDER BY n)
FROM unnest(string_to_array('foo:bar:something:221', ':')) WITH ORDINALITY AS u(s, n);
string_agg
-----------------------
foo
foo:bar
foo:bar:something
foo:bar:something:221
(4 rows)

Need help in parsing column value based on value in other column

I have two columns, COL1 and COL2. COL1 has a value like 'Birds sitting on $1 and enjoying' and COL2 has a value like 'the.location_value[/tree,\building]'
I need to update a third column, COL3, with values like 'Birds sitting on /tree and enjoying',
i.e. $1 in the first column is replaced with /tree,
which is the first word in the comma-separated list within the square brackets [] in COL2, i.e. [/tree,\building].
I wanted to know the most suitable combination of string functions in PostgreSQL to achieve this.
You need to extract the first element from the comma-separated list. You can use split_part() for that, but you first need to extract the actual list of values, which can be done using substring() with a regular expression:
substring(col2 from '\[(.*)\]')
will return /tree,\building
So the complete query would be:
select replace(col1, '$1', split_part(substring(col2 from '\[(.*)\]'), ',', 1))
from the_table;
Online example: http://rextester.com/CMFZMP1728
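For reference, the same expression evaluated against the literal values from the question:
select replace('Birds sitting on $1 and enjoying', '$1',
               split_part(substring('the.location_value[/tree,\building]' from '\[(.*)\]'), ',', 1));
-- => Birds sitting on /tree and enjoying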
This one should work with any (integer) number after the $:
select t.*, c.col3
from t,
     lateral (
         select string_agg(case
                               when o = 1 then s
                               else (string_to_array((select regexp_matches(t.col2, '\[(.*)\]'))[1], ','))
                                        [(select regexp_matches(s, '^\$(\d+)'))[1]::int]
                                    || substring(s from '^\$\d+(.*)')
                           end, '' order by o) col3
         from regexp_split_to_table(t.col1, '(?=\$\d+)') with ordinality s(s, o)
     ) c
http://rextester.com/OKZAG54145
Note: it is not the most efficient though, as it splits col2's value (inside the square brackets) again for every $N it replaces.
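A possible refinement (an untested sketch along the same lines) is to extract the bracketed list once per row in its own LATERAL step, instead of re-splitting it for every $N:
select t.*, c.col3
from t
cross join lateral (
    select string_to_array((select regexp_matches(t.col2, '\[(.*)\]'))[1], ',') as vals
) v
cross join lateral (
    select string_agg(case
                          when o = 1 then s
                          else v.vals[(select regexp_matches(s, '^\$(\d+)'))[1]::int]
                               || substring(s from '^\$\d+(.*)')
                      end, '' order by o) as col3
    from regexp_split_to_table(t.col1, '(?=\$\d+)') with ordinality s(s, o)
) c;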
Update: LATERAL and WITH ORDINALITY are not supported in older versions, but you could try a correlated subquery instead:
select t.*,
       (select array_to_string(array_agg(case
                   when s ~ E'^\\$(\\d+)'
                   then (string_to_array((select regexp_matches(t.col2, E'\\[(.*)\\]'))[1], ','))
                            [(select regexp_matches(s, E'^\\$(\\d+)'))[1]::int]
                        || substring(s from E'^\\$\\d+(.*)')
                   else s
               end), '') col3
        from regexp_split_to_table(t.col1, E'(?=\\$\\d+)') s) col3
from t

postgres `order by` argument type

What is the argument type for the order by clause in Postgresql?
I came across a very strange behaviour (using Postgresql 9.5). Namely, the query
select * from unnest(array[1,4,3,2]) as x order by 1;
produces 1,2,3,4 as expected. However the query
select * from unnest(array[1,4,3,2]) as x order by 1::int;
produces 1,4,3,2, which seems strange. Similarly, whenever I replace 1::int with any function call (e.g. greatest(0,1)) or even a case expression, the results come back unordered (contrary to what I would expect).
So which type should an argument of order by have, and how do I get the expected behaviour?
This is expected (and documented) behaviour:
A sort_expression can also be the column label or number of an output column
So the expression:
order by 1
sorts by the first column of the result set (as defined by the SQL standard)
However the expression:
order by 1::int
sorts by the constant value 1; it's essentially the same as:
order by 'foo'::text
(a bare string literal there would raise "non-integer constant in ORDER BY", but any other constant expression behaves this way)
By using a constant value for the order by, all rows have the same sort key and thus aren't really sorted.
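The question's own example shows this: every row gets the constant 1 as its sort key, so the input order is simply passed through:
select * from unnest(array[1,4,3,2]) as x order by 1;       -- 1,2,3,4 (column position)
select * from unnest(array[1,4,3,2]) as x order by 1::int;  -- 1,4,3,2 (constant value)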
To sort by an expression, just use that:
order by
case
when some_column = 'foo' then 1
when some_column = 'bar' then 2
else 3
end
The above sorts the result based on the result of the case expression.
Actually I have a function with an integer argument which indicates the column to be used in the order by clause.
If all columns are of the same type, this can work:
SELECT ....
ORDER BY
CASE function_to_get_a_column_number()
WHEN 1 THEN column1
WHEN 2 THEN column2
.....
WHEN 1235 THEN column1235
END
If columns are of different types, you can try:
SELECT ....
ORDER BY
CASE function_to_get_a_column_number()
WHEN 1 THEN column1::varchar
WHEN 2 THEN column2::varchar
.....
WHEN 1235 THEN column1235::varchar
END
But these "workarounds" are horrible. You need some other approach than the function returning a column number.
Maybe dynamic SQL?
I would say that dynamic SQL (thanks #kordirko and the others for the hints) is the best solution to the problem I originally had in mind:
create temp table my_data (
    id serial,
    val text
);

insert into my_data(id, val)
values (default, 'a'), (default, 'c'), (default, 'd'), (default, 'b');

create function fetch_my_data(col text)
returns setof my_data as
$f$
begin
    return query execute $$
        select * from my_data
        order by $$ || quote_ident(col);
end
$f$ language plpgsql;

select * from fetch_my_data('val'); -- order by val
select * from fetch_my_data('id'); -- order by id
In the beginning I thought this could be achieved using a case expression in the argument of the order by clause, i.e. the sort_expression. And here comes the tricky part that confused me: when sort_expression is a kind of identifier (the name or number of an output column), the corresponding column is used when ordering the results. But when sort_expression is some value, we actually order the results by that value itself (computed for each row). This is #a_horse_with_no_name's answer rephrased.
So when I queried ... order by 1::int, I effectively assigned the value 1 to every row and then tried to sort a set of ones, which is clearly useless.
There are some workarounds without dynamic queries, but they require writing more code and do not seem to have any significant advantages.

Prepare dynamic case statement using PostgreSQL 9.3

I want to prepare the following CASE statement dynamically, as shown below.
Example: I have the CASE statement:
case
    when cola between '2001-01-01' and '2001-01-05' then 'G1'
    when cola between '2001-01-10' and '2001-01-15' then 'G2'
    when cola between '2001-01-20' and '2001-01-25' then 'G3'
    when cola between '2001-02-01' and '2001-02-05' then 'G4'
    when cola between '2001-02-10' and '2001-02-15' then 'G5'
    else ''
end
Note: I want to build the CASE statement dynamically because the dates and names are passed in as parameters and may change.
Declare
dates varchar = '2001-01-01to2001-01-05,2001-01-10to2001-01-15,
2001-01-20to2001-01-25,2001-02-01to2001-02-05,
2001-02-10to2001-02-15';
names varchar = 'G1,G2,G3,G4,G5';
The values in the variables may change as per requirements, so the CASE statement has to be built dynamically, without using a loop.
You may not need any function for this, just join to a mapping data-set:
with cola_map(low, high, value) as (
    values (date '2001-01-01', date '2001-01-05', 'G1'),
           ('2001-01-10', '2001-01-15', 'G2'),
           ('2001-01-20', '2001-01-25', 'G3'),
           ('2001-02-01', '2001-02-05', 'G4'),
           ('2001-02-10', '2001-02-15', 'G5')
    -- you can include as many rows as you want
)
select table_name.*,
       coalesce(cola_map.value, '') -- else branch from the case expression
from table_name
left join cola_map on table_name.cola between cola_map.low and cola_map.high
If your date ranges could collide, you can use DISTINCT ON or GROUP BY to avoid row duplication.
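For example (a sketch assuming table_name has a primary-key column id, which is not given in the question), keeping only the first matching range per row:
select distinct on (table_name.id)
       table_name.*,
       coalesce(cola_map.value, '')
from table_name
left join cola_map on table_name.cola between cola_map.low and cola_map.high
order by table_name.id, cola_map.low;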
Note: you could use a simple sub-select too; I used a CTE because it's more readable.
Edit: passing these data (as a single parameter) can be achieved by passing a multi-dimensional array (or an array of row-values, but that requires you to have a distinct, predefined composite type).
Passing arrays as parameters can depend on the actual client (& driver) you use, but in general, you can use the array's input representation:
-- sql
with input(arr) as (
    select ?::text[][]
),
cola_map(low, high, value) as (
    -- note: unnest() would flatten a 2-D array into single elements,
    -- so step through its first dimension with generate_subscripts() instead
    select arr[i][1]::date, arr[i][2]::date, arr[i][3]
    from input, generate_subscripts(arr, 1) as i
)
select table_name.*,
       coalesce(cola_map.value, '') -- else branch from the case expression
from table_name
left join cola_map on table_name.cola between cola_map.low and cola_map.high
// client pseudo code
query = db.prepare(sql);
query.bind(1, "{{2001-01-10,2001-01-15,G2},{2001-01-20,2001-01-25,G3}}");
query.execute();
Passing each chunk of data separately is also possible with some clients (or with some abstractions), but this depends highly on the driver/ORM/etc. you use.