PostgreSQL - Basic Arrays and array_agg

As a test, I created this schema:
CREATE TABLE simple_table (client_id int4, order_id int4);
INSERT INTO simple_table (client_id, order_id)
VALUES
(1,2),(1,3),(1,4),(1,6),(1,8),(1,12),(1,16),(1,18),(1,25),(1,32),(1,33),(1,37),(1,43),
(1,56),(1,57),(1,66),(2,2),(2,3),(2,5),(2,7),(2,9),(2,12),(2,17),(2,19),(2,22),(2,30),
(2,33),(2,38),(2,44),(2,56),(2,58),(2,66)
;
Then used array_agg:
SELECT client_id, array_agg(order_id) FROM simple_table GROUP BY client_id;
to create the arrays for client 1 and client 2:
| CLIENT_ID | ARRAY_AGG                                  |
------------------------------------------------------------
| 1         | 2,3,4,6,8,12,16,18,25,32,33,37,43,56,57,66 |
| 2         | 2,3,5,7,9,12,17,19,22,30,33,38,44,56,58,66 |
Now I would like to compare the two rows and identify the values they have in common. I tried the && overlap operator ("have elements in common", e.g. ARRAY[1,4,3] && ARRAY[2,1]) from the PostgreSQL documentation, but I am having problems.
Perhaps I am looking at this wrong. Any help or guidance would be appreciated!

The && operator is a predicate that yields a true or false result, not a list of values.
If you're looking for the list of order_id that exist for both client_id=1 and client_id=2, the query would be:
select order_id from simple_table where client_id in (1,2)
group by order_id having count(*)=2;
That's equivalent to the intersection of the two arrays if you treat the arrays as sets (no duplicates, and the positions of the values are irrelevant), except that you don't need arrays at all: simple standard SQL is good enough.
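If you do want the result back as an array, here is a minimal sketch using the simple_table data above (the subquery aliases are mine; UNNEST and INTERSECT are standard PostgreSQL):

select array(
    select unnest(a.ids)
    intersect
    select unnest(b.ids)
    order by 1
) as common_ids
from (select array_agg(order_id) as ids from simple_table where client_id = 1) a,
     (select array_agg(order_id) as ids from simple_table where client_id = 2) b;
-- expected: {2,3,12,33,56,66}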

Take a look at the "array_intersect" functions here:
Array Intersect
To see elements that are not common to both arrays:
create or replace function arrxor(anyarray, anyarray) returns anyarray as $$
select ARRAY(
    (
    select r.elements
    from (
        -- tag each element with the array it came from (1 or 2)
        (select 1, unnest($1))
        union all
        (select 2, unnest($2))
        ) as r (arr, elements)
    -- group by element (column 1 of the select list); an element whose
    -- min and max tags are equal occurs in only one of the two arrays
    group by 1
    having min(arr) = max(arr)
    )
)
$$ language sql strict immutable;
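A quick sanity check with the arrays from the question (the expected result is my reading of the sample data; the order of elements inside the returned array is not guaranteed):

select arrxor(
    (select array_agg(order_id) from simple_table where client_id = 1),
    (select array_agg(order_id) from simple_table where client_id = 2)
);
-- order_ids belonging to exactly one of the two clients, i.e.
-- {4,5,6,7,8,9,16,17,18,19,22,25,30,32,37,38,43,44,57,58} in some order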

Related

How to find posts tagged with any of the predefined tags in Postgresql

I have a posts table with the following structure:
| id | score | title  | tags                   |
---------------------------------------------------
| 1  | 42    | Travel | <uk><travel><passport> |
For each blog post I want to find relevant posts, tagged with any of the tags corresponding to the current page, in my case: <uk>, <travel> or <passport>. Then, order results by score, limit it to 5 items and display it to the user.
This is the code I came up with so far, but it seems to return results only for the first tag in the query – <uk>.
with tags_string (tag) as (
    select unnest(string_to_array('<uk><travel><passport>', '>'))
)
select *
from (
    select distinct *
    from posts
    cross join tags_string
    cross join lateral
        (select (tags ~ tag)::int as match_found) m
    where m.match_found > 0
) t
order by t.score desc
limit 5;
EDIT
After @Mike Organek's comment I changed the query to this, and it's working as I initially expected.
with tags_string (tag) as (
    select unnest(string_to_array('<uk><travel><passport>', '>'))
)
select *
from (
    select distinct *
    from posts
    cross join tags_string
    cross join lateral
        (select position(tag in tags) > 0 as match_found) m
    where m.match_found and tag <> ''
) t
order by t.score desc
limit 5;
I would convert the tags into an array, then use the array overlap operator && to find the relevant posts:
select id, title, score, tags
from posts
where string_to_array(trim(both '<>' from replace(tags, '><', ',')), ',') && array['uk', 'travel', 'passport']
order by score desc
limit 5
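A quick sanity check of what that string_to_array expression produces for the sample tags value:

select string_to_array(trim(both '<>' from replace('<uk><travel><passport>', '><', ',')), ',');
-- result: {uk,travel,passport}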
In the long run, storing the tags as an array or a jsonb array is probably a lot more efficient.
If you do that a lot, things might get a bit easier if you create a function for this:
create function tags_array(p_input text)
    returns text[]
as
$$
    select string_to_array(trim(both '<>' from replace(p_input, '><', ',')), ',');
$$
language sql
immutable;
Then the query is a bit easier to read:
select id, title, score, tags
from posts
where tags_array(tags) && array['uk', 'travel', 'passport']
order by score desc
limit 5
You can even create an index for that if you want:
create index on posts using gin ( (tags_array(tags)) );
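If you eventually store the tags as an array, as suggested above, here is a hedged sketch of the migration (the tags_arr column name is my invention):

-- store the tags natively as text[] and index that column directly
alter table posts add column tags_arr text[];
update posts set tags_arr = tags_array(tags);
create index on posts using gin (tags_arr);

select id, title, score
from posts
where tags_arr && array['uk', 'travel', 'passport']
order by score desc
limit 5;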

Does String Value Exists in a List of Strings | Redshift Query

I have some interesting data that I'm trying to query, but I cannot get the syntax right. I have a temporary table (temp_id), which I've filled with the id values I care about. In this example it is only two ids.
CREATE TEMPORARY TABLE temp_id (id bigint PRIMARY KEY);
INSERT INTO temp_id (id) VALUES ( 1 ), ( 2 );
I have another table in production (let's call it foo) which holds multiple of those ids in a single cell. The ids column looks like this, with the ids as a single string separated by "|":
ids
-----------
1|9|3|4|5
6|5|6|9|7
NULL
2|5|6|9|7
9|11|12|99
I want to evaluate each cell in foo.ids, and see if any of the ids in it match the ones in my temp_id table.
Expected output
ids |does_match
-----------------------
1|9|3|4|5 |true
6|5|6|9|7 |false
NULL |false
2|5|6|9|7 |true
9|11|12|99 |false
So far I've come up with this, but I can't seem to return anything. Instead of trying to create a new column does_match, I tried to filter within the WHERE statement. However, the issue is that I cannot figure out how to compare all the id values in my temp table against the string blob full of ids in foo.
SELECT
    ids
FROM foo
WHERE ids = ANY(SELECT LISTAGG(id, ' | ') FROM temp_id)
Any suggestions would be helpful.
Cheers,
This would work, however I'm not sure about performance:
SELECT ids
FROM foo
JOIN temp_id
    ON '|' || foo.ids || '|' LIKE '%|' || temp_id.id::varchar || '|%'
You wrap the ids list in an extra pair of separators, so you can always search for |id| – including for the first and last numbers in the list.
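A sketch, in case you also want the does_match column from the expected output (untested on real data; it uses a cross join plus max over a 1/0 flag, since there is no boolean max in Redshift):

select f.ids,
       max(case when '|' || f.ids || '|' like '%|' || t.id::varchar || '|%'
                then 1 else 0 end) = 1 as does_match
from foo f
cross join temp_id t
group by f.ids;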
The following SQL (I know it's a bit of a hack) returns exactly the output you expect. It is tested with your sample data; I don't know how it would behave on your real data, so try it and let me know.
with seq as ( -- a sequence CTE to emulate postgres' unnest;
              -- assuming you have at most 10 ids in the ids field,
              -- feel free to extend this part
    select 1 as i union all
    select 2 union all
    select 3 union all
    select 4 union all
    select 5 union all
    select 6 union all
    select 7 union all
    select 8 union all
    select 9 union all
    select 10
)
select distinct k.ids,
       case -- since there is no max on a boolean field, aggregate 1s and 0s
            -- and convert the result back to boolean
           when max(case
                        when t.id::varchar in (
                            select split_part(f.ids, '|', seq.i)
                            from seq
                            join foo f on seq.i <= regexp_count(f.ids, '[|]') + 1
                            where split_part(f.ids, '|', seq.i) != ''
                              and k.ids = f.ids)
                        then 1
                        else 0
                    end) = 1
           then true
           else false
       end as does_match
from temp_id t, foo k
group by 1
Please let me know if this works for you!

How to split a string in a smart way?

The string_to_array function splits strings without keeping quoted substrings together:
# select unnest(string_to_array('one, "two,three"', ','));
unnest
--------
one
"two
three"
(3 rows)
I would like to have a smarter function, like this:
# select unnest(smarter_string_to_array('one, "two,three"', ','));
unnest
--------
one
two,three
(2 rows)
Purpose: I know that the COPY command does this properly, but I need this feature internally.
I want to parse a text representation of rows of an existing table. Example:
# select * from dataset limit 2;
id | name | state
----+-----------------+--------
1 | Smith, Reginald | Canada
2 | Jones, Susan |
(2 rows)
# select dataset::text from dataset limit 2;
dataset
------------------------------
(1,"Smith, Reginald",Canada)
(2,"Jones, Susan","")
(2 rows)
I want to do it dynamically in a PL/pgSQL function for different tables. I cannot assume a constant number of columns nor a particular format of the column values.
There is a nice method to transpose a whole table into a one-column table:
select (json_each_text(row_to_json(t))).value from dataset t;
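For the dataset table above, this should produce one value per column per row, rendered as text (the blank line is the empty state of row 2):

value
-----------------
1
Smith, Reginald
Canada
2
Jones, Susan

(6 rows)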
If the column id is unique, then

select id, array_agg(value order by rn) arr from (
    select row_number() over() rn, id, value from (
        select id, (json_each_text(row_to_json(t))).value from dataset t
    ) alias
) alias
group by id;

gives you exactly what you want. The additional subquery with row_number(), and the order by inside array_agg, are necessary to keep the original order of the columns.
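With the sample dataset above, the result would be:

 id |             arr
----+------------------------------
  1 | {1,"Smith, Reginald",Canada}
  2 | {2,"Jones, Susan",""}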

PostgreSQL: Pivot a 2-column result set into a single-row table

Struggling with what I thought would be a straightforward operation...
EDIT: SQLFiddle available here: http://sqlfiddle.com/#!15/11711/1/0
Using PostgreSQL 9.4, pretend I have a query that returns this two-column set:
CATEGORY | TOTAL
---------+------
all      |    14
soccer   |     5
baseball |     6
hockey   |     3
However I'd prefer to pivot it into a single-row set:
ALL | SOCCER | BASEBALL | HOCKEY
----+--------+----------+-------
 14 |      5 |        6 |      3
In other words, I want all my "CATEGORY" values to become columns, with the corresponding "TOTAL" value to be placed in the first row under the appropriate column.
I've been trying to use CROSSTAB()... but as of now I'm getting the following error:
ERROR: a column definition list is required for functions returning "record"
For reference, here's what I'm trying to put as my SQL command:
SELECT * FROM crosstab(
    $$
    WITH "countTotal" AS (
        SELECT text 'all' AS "sportType", COUNT(*) AS "total"
        FROM log
        WHERE type = 'SPORT_EVENT_CREATED'
        GROUP BY "sportType"
    ),
    "countBySportType" AS (
        SELECT sport_type AS "sportType", COUNT(*) AS "total"
        FROM log
        WHERE type = 'SPORT_EVENT_CREATED'
        GROUP BY "sportType"
    )
    SELECT * FROM "countTotal"
    UNION
    SELECT * FROM "countBySportType"
    $$
)
I think you have to specify the names and types of the output columns. From the postgres manual on tablefunc:
The crosstab function is declared to return setof record, so the
actual names and types of the output columns must be defined in the
FROM clause of the calling SELECT statement, for example:
SELECT * FROM crosstab('...') AS ct(row_name text, category_1 text, category_2 text);
You have to use crosstabN(text) to use it with a dynamic number of columns. The post PostgreSQL Crosstab Query has a whole lot of details about the crosstab query.
One more post: Dynamic alternative to pivot with CASE and GROUP BY
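For this particular case, a minimal sketch of that alternative, assuming the log table from the question (FILTER is available since PostgreSQL 9.4, which the question uses):

SELECT COUNT(*) AS "all",
       COUNT(*) FILTER (WHERE sport_type = 'soccer')   AS soccer,
       COUNT(*) FILTER (WHERE sport_type = 'baseball') AS baseball,
       COUNT(*) FILTER (WHERE sport_type = 'hockey')   AS hockey
FROM log
WHERE type = 'SPORT_EVENT_CREATED';

This produces the single-row shape directly, without crosstab or a column definition list.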

Adding the results of two select queries into one table row with PostgreSQL

I am attempting to return the result of two distinct select statements into one row in PostgreSQL. For example, I have two queries each that return the same number of rows:
Select tableid1, tableid2, tableid3 from table1
+----------+----------+----------+
| tableid1 | tableid2 | tableid3 |
+----------+----------+----------+
|        1 |        2 |        3 |
|        4 |        5 |        6 |
+----------+----------+----------+
Select table2id1, table2id2, table2id3, table2id4 from table2
+-----------+-----------+-----------+-----------+
| table2id1 | table2id2 | table2id3 | table2id4 |
+-----------+-----------+-----------+-----------+
|         7 |         8 |         9 |        15 |
|        10 |        11 |        12 |        19 |
+-----------+-----------+-----------+-----------+
Now I want to concatenate these tables, keeping the same number of rows. I do not want to join on any values. The desired result would look like the following:
+----------+----------+----------+-----------+-----------+-----------+-----------+
| tableid1 | tableid2 | tableid3 | table2id1 | table2id2 | table2id3 | table2id4 |
+----------+----------+----------+-----------+-----------+-----------+-----------+
|        1 |        2 |        3 |         7 |         8 |         9 |        15 |
|        4 |        5 |        6 |        10 |        11 |        12 |        19 |
+----------+----------+----------+-----------+-----------+-----------+-----------+
What can I do to the two above queries (select * from table1) and (select * from table2) to return the desired result above?
Thanks!
You can use row_number() for the join, but there is no guarantee that the order of the rows will stay the same as in the tables, so it's better to add some ordering into the over() clause.
with cte1 as (
    select tableid1, tableid2, tableid3,
           row_number() over() as rn
    from table1
), cte2 as (
    select table2id1, table2id2, table2id3, table2id4,
           row_number() over() as rn
    from table2
)
select *
from cte1 as c1
inner join cte2 as c2 on c2.rn = c1.rn
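A hedged variant with explicit ordering (assuming tableid1 and table2id1 reflect the order you want the rows paired in):

with cte1 as (
    select tableid1, tableid2, tableid3,
           row_number() over(order by tableid1) as rn
    from table1
), cte2 as (
    select table2id1, table2id2, table2id3, table2id4,
           row_number() over(order by table2id1) as rn
    from table2
)
select tableid1, tableid2, tableid3,
       table2id1, table2id2, table2id3, table2id4
from cte1 c1
inner join cte2 c2 on c2.rn = c1.rn;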
You can't have what you want, as you wrote the question. Your two SELECTs don't have any ORDER BY clause, so the database can return the rows in whatever order it feels like. If it currently matches up, it does so only by accident, and will stop matching up as soon as you UPDATE a row.
You need a key column. Then you need to join on the key column. Anything else is attempting to invent unreliable and unsafe joins without actually using a join.
Frankly, this seems like a pretty dodgy schema. Lots of numbered integer columns like this, and the desire to concatenate them, may be a sign you should be looking at using integer arrays, or using a side-table with a foreign key relationship, instead.
Sample data in case anyone else wants to play:
CREATE TABLE table1(tableid1 integer, tableid2 integer, tableid3 integer);
INSERT INTO table1 VALUES (1,2,3), (4,5,6);
CREATE TABLE table2(table2id1 integer, table2id2 integer, table2id3 integer, table2id4 integer);
INSERT INTO table2 VALUES (7,8,9,15), (10,11,12,19);
Depending on what you're actually doing you might really have wanted arrays.
I think you might need to read these two posts:
Join 2 sets based on default order
How keep data don't sort?
which explain that SQL tables simply don't have an order, so you cannot fetch rows in a particular order without asking for one.
DO NOT USE THE FOLLOWING CODE, IT IS DANGEROUS AND ONLY INCLUDED AS A PROOF OF CONCEPT:
As it happens you can use a set-returning function hack to very inefficiently do what you want. It's incredibly ugly and completely unsafe without an ORDER BY in the SELECTs, but I'll include it for completeness, I guess.
CREATE OR REPLACE FUNCTION t1() RETURNS SETOF table1 AS $$ SELECT * FROM table1 $$ LANGUAGE sql;
CREATE OR REPLACE FUNCTION t2() RETURNS SETOF table2 AS $$ SELECT * FROM table2 $$ LANGUAGE sql;
SELECT (t1()).*, (t2()).*;
If you use this in any real code then kittens will cry. It'll produce insane and bizarre results if the numbers of rows in the tables differ, and it'll produce row orderings that might seem right at first but will randomly start coming out wrong later on.
THE SANE WAY is to add a primary key properly, then do a join.
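A hedged sketch of that sane way (adding an id key is my assumption; note that backfilling serial values assigns them in whatever order the rows happen to be scanned, so this alone does not recover a meaningful pairing):

-- give each table a real key, then join on it
ALTER TABLE table1 ADD COLUMN id serial PRIMARY KEY;
ALTER TABLE table2 ADD COLUMN id serial PRIMARY KEY;

SELECT t1.tableid1, t1.tableid2, t1.tableid3,
       t2.table2id1, t2.table2id2, t2.table2id3, t2.table2id4
FROM table1 t1
JOIN table2 t2 USING (id);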