How to split a string in a smart way? - postgresql

Function string_to_array splits strings without grouping substrings in apostrophes:
# select unnest(string_to_array('one, "two,three"', ','));
unnest
--------
one
"two
three"
(3 rows)
I would like to have a smarter function, like this:
# select unnest(smarter_string_to_array('one, "two,three"', ','));
unnest
--------
one
two,three
(2 rows)
Purpose.
I know that COPY command does it in a proper way, but I need this feature internally.
I want to parse a text representation of rows of existing table. Example:
# select * from dataset limit 2;
id | name | state
----+-----------------+--------
1 | Smith, Reginald | Canada
2 | Jones, Susan |
(2 rows)
# select dataset::text from dataset limit 2;
dataset
------------------------------
(1,"Smith, Reginald",Canada)
(2,"Jones, Susan","")
(2 rows)
I want to do it dynamically in a plpgsql function for different tables. I cannot assume constant number of columns of a table nor a format of columns values.

There is a nice method to transpose a whole table into a one-column table:
select (json_each_text(row_to_json(t))).value from dataset t;
If the column id is unique then
select id, array_agg(value) arr from (
select row_number() over() rn, id, value from (
select id, (json_each_text(row_to_json(t))).value from dataset t
) alias
order by id, rn
) alias
group by id;
gives you exactly what you want. Additional query with row_number() is necessary to keep original order of columns.

Related

Does String Value Exists in a List of Strings | Redshift Query

I have some interesting data, I'm trying to query however I cannot get the syntax correct. I have a temporary table (temp_id), which I've filled with the id values I care about. In this example it is only two ids.
CREATE TEMPORARY TABLE temp_id (id bigint PRIMARY KEY);
INSERT INTO temp_id (id) VALUES ( 1 ), ( 2 );
I have another table in production (let's call it foo) which holds multiples those ids in a single cell. The ids column looks like this (below) with ids as a single string separated by "|"
ids
-----------
1|9|3|4|5
6|5|6|9|7
NULL
2|5|6|9|7
9|11|12|99
I want to evaluate each cell in foo.ids, and see if any of the ids in match the ones in my temp_id table.
Expected output
ids |does_match
-----------------------
1|9|3|4|5 |true
6|5|6|9|7 |false
NULL |false
2|5|6|9|7 |true
9|11|12|99 |false
So far I've come up with this, but I can't seem to return anything. Instead of trying to create a new column does_match I tried to filter within the WHERE statement. However, the issue is I cannot figure out how to evaluate all the id values in my temp table to the string blob full of the ids in foo.
SELECT
ids,
FROM foo
WHERE ids = ANY(SELECT LISTAGG(id, ' | ') FROM temp_ids)
Any suggestions would be helpful.
Cheers,
this would work, however not sure about performance
SELECT
ids
FROM foo
JOIN temp_ids
ON '|'||foo.ids||'|' LIKE '%|'||temp_ids.id::varchar||'|%'
you wrap the IDs list into a pair of additional separators, so you can always search for |id| including the first and the last number
The following SQL (I know it's a bit of a hack) returns exactly what you expect as an output, tested with your sample data, don't know how would it behave on your real data, try and let me know
with seq AS ( # create a sequence CTE to implement postgres' unnest
select 1 as i union all # assuming you have max 10 ids in ids field,
# feel free to modify this part
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9 union all
select 10)
select distinct ids,
case # since I can't do a max on a boolean field, used two cases
# for 1s and 0s and converted them to boolean
when max(case
when t.id in (
select split_part(ids,'|',seq.i) as tt
from seq
join foo f on seq.i <= REGEXP_COUNT(ids, '|') + 1
where tt != '' and k.ids = f.ids)
then 1
else 0
end) = 1
then true
else false
end as does_match
from temp_id t, foo
group by 1
Please let me know if this works for you!

using LOWER with IN condition [duplicate]

This question already has answers here:
PostgreSQL: Case insensitive string comparison
(6 answers)
Closed 6 years ago.
assume I have a table named comodity_group and the structure looks like:
+----------+-------+
| group_id | name |
+----------+-------+
| 1 | Data1 |
+----------+-------+
| 2 | Data2 |
+----------+-------+
| 3 | data3 |
+----------+-------+
and I have the following query
SELECT * FROM comodity_group WHERE name IN('data1','data2','data3')
the query return 0 result, because condition is all in lowercase (note that the condition is also dynamic, meaning it can be Data1 or daTa1, etc)
so I want to make both condition and field name in lowercase, in other word case insensitive.
You can use ILIKE and an array:
select *
from comodity_group
where name ilike any (array['Data1', 'data2', 'dATA3']);
Note that this won't be really fast as the ILIKE operator can't make use of a regular index on the name column.
You can convert your name data to lowercase
SELECT * FROM comodity_group WHERE lower(name) IN('data1','data2','data3')
Assuming you have control over the terms which appear in the IN clause of your query, then you should only need to lowercase the name column before making the comparison:
SELECT *
FROM commodity_group
WHERE LOWER(name) IN ('data1', 'data2', 'data3')
Off the top of my head, you could also join to an inline table containing the search terms:
WITH cte AS (
SELECT 'daTa1' AS name
UNION ALL
SELECT 'Data2'
UNION ALL
SELECT 'datA3'
)
SELECT *
FROM commodity_group t1
INNER JOIN cte t2
ON LOWER(t1.name) = LOWER(t2.name)
With the possible matches in an actual table, we now have the ability to lowercase both sides of the comparison.

postgres offset by value not number

I have a table with at least a "name" column and an "ordinal_position" column. I wish to loop each row starting from a certain row the user inputs. Let's say the user inputs "John", and that his ordinal_position is 6 (out of a 10 total). How do I loop only the last 4 rows without using a subquery? I've tried using the "OVER()" window function but it doesn't seem to work on the offset part of the query, and that same offset only takes numbers (as far as I know) not strings.
EDIT (in response to klin):
INSERT INTO foo(id,name,ordinal_position) VALUES
(DEFAULT,'Peter',1),
(DEFAULT,'James',2),
(DEFAULT,'Freddy',3),
(DEFAULT,'Mark',4),
(DEFAULT,'Jack',5),
(DEFAULT,'John',6),
(DEFAULT,'Will',7),
(DEFAULT,'Robert',8),
(DEFAULT,'Dave',9),
(DEFAULT,'Michael',10);
so in my FOR, since the user inputed "John" I want to loop through Will-Michael. Something like the following but without a subquery:
SELECT * FROM foo ORDER BY ordinal_position OFFSET
(SELECT ordinal_position FROM foo WHERE name='John');
Unfortunately, you have to query the table to find an ordinal_position for a given name.
However, do not use offset. You can do it in where clause, for large tables it will be much faster:
select *
from foo
where ordinal_position > (select ordinal_position from foo where name = 'John')
order by ordinal_position;
id | name | ordinal_position
----+---------+------------------
7 | Will | 7
8 | Robert | 8
9 | Dave | 9
10 | Michael | 10
(4 rows)

PostgreSQL: Pivot a 2-column result set into a single-row table

Struggling with what I thought would be a straightforward operation...
EDIT: SQLFiddle available here: http://sqlfiddle.com/#!15/11711/1/0
Using PostgreSQL 9.4, pretend I have a query that returns this two-column set:
CATEGORY | TOTAL
all | 14
soccer | 5
baseball | 6
hockey | 3
However I'd prefer to pivot it into a single-row set:
ALL | SOCCER | BASEBALL | HOCKEY
14 | 5 | 6 | 3
In other words, I want all my "CATEGORY" values to become columns, with the corresponding "TOTAL" value to be placed in the first row under the appropriate column.
I've been trying to use CROSSTAB()... but as of now I'm getting the following error:
ERROR: a column definition list is required for functions returning "record"
For reference, here's what I'm trying to put as my SQL command:
SELECT * FROM crosstab(
$$
WITH "countTotal" AS (
SELECT text 'all' AS "sportType", COUNT(*) AS "total"
FROM log
WHERE type = 'SPORT_EVENT_CREATED'
GROUP BY "sportType"
),
"countBySportType" AS (
SELECT sport_type AS "sportType", COUNT(*) AS "total"
FROM log
WHERE type = 'SPORT_EVENT_CREATED'
GROUP BY "sportType"
)
SELECT * FROM "countTotal"
UNION
SELECT * FROM "countBySportType"
$$
)
I think you have to specify names and types of the output columns. From the postgres manual tablefunc
The crosstab function is declared to return setof record, so the
actual names and types of the output columns must be defined in the
FROM clause of the calling SELECT statement, for example:
SELECT * FROM crosstab('...') AS ct(row_name text, category_1 text, category_2 text);
You have to use crosstabN(text) to use it with dynamic number of columns. This PostgreSQL Crosstab Query whole lot of details about the cross tab query.
One more post Dynamic alternative to pivot with CASE and GROUP BY

PostgreSQL: Can't use DISTINCT for some data types

I have a table called _sample_table_delme_data_files which contains some duplicates. I want to copy its records, without duplicates, into data_files:
INSERT INTO data_files (SELECT distinct * FROM _sample_table_delme_data_files);
ERROR: could not identify an ordering operator for type box3d
HINT: Use an explicit ordering operator or modify the query.
Problem is, PostgreSQL can not compare (or order) box3d types. How do I supply such an ordering operator so I can get only the distinct into my destination table?
Thanks in advance,
Adam
If you don't add the operator, you could try translating the box3d data to text using its output function, something like:
INSERT INTO data_files (SELECT distinct othercols,box3dout(box3dcol) FROM _sample_table_delme_data_files);
Edit The next step is: cast it back to box3d:
INSERT INTO data_files SELECT othercols, box3din(b) FROM (SELECT distinct othercols,box3dout(box3dcol) AS b FROM _sample_table_delme_data_files);
(I don't have box3d on my system so it's untested.)
The datatype box3d doesn't have an operator for the DISTINCT-operation. You have to create the operator, or ask the PostGIS-project, maybe somebody has already fixed this problem.
Finally, this was solved by a colleague.
Let's see how many dups are there:
SELECT COUNT(*) FROM _sample_table_delme_data_files ;
count
-------
12728
(1 row)
Now, we shall add another column to the source table to help us differentiate similar rows:
ALTER TABLE _sample_table_delme_data_files ADD COLUMN id2 serial;
We can now see the dups:
SELECT id, id2 FROM _sample_table_delme_data_files ORDER BY id LIMIT 10;
id | id2
--------+------
198748 | 6449
198748 | 85
198801 | 166
198801 | 6530
198829 | 87
198829 | 6451
198926 | 88
198926 | 6452
199062 | 6532
199062 | 168
(10 rows)
And remove them:
DELETE FROM _sample_table_delme_data_files
WHERE id2 IN (SELECT max(id2) FROM _sample_table_delme_data_files
GROUP BY id
HAVING COUNT(*)>1);
Let's see it worked:
SELECT id FROM _sample_table_delme_data_files GROUP BY id HAVING COUNT(*)>1;
id
----
(0 rows)
Remove the auxiliary column:
ALTER TABLE _sample_table_delme_data_files DROP COLUMN id2;
ALTER TABLE
Insert the remaining rows into the destination table:
INSERT INTO data_files (SELECT * FROM _sample_table_delme_data_files);
INSERT 0 6364