How to split a string in PostgreSQL to make combinations with another string

I have data like below
Id | Data                               | ParentId
---+------------------------------------+---------
 1 | IceCream # Chocolate # SoftDrink   | 0
 2 | Amul,Havemore#Cadbary,Nestle#Pepsi | 1
 3 | Party#Wedding                      | 0
I want to split this data into the format below, where row 2 depends on row 1. I have added ParentId, which is used to find the dependency.
IceCream | Amul | Party
IceCream | Havemore | Party
IceCream | Amul | Wedding
IceCream | Havemore | Wedding
Chocolate | Cadbary | Party
Chocolate | Nestle | Party
Chocolate | Cadbary | Wedding
Chocolate | Nestle | Wedding
SoftDrink | Pepsi | Party
SoftDrink | Pepsi | Wedding
I have used unnest(string_to_array(...)) to split the strings, but I am unable to loop over the pieces to build these combinations.

This is very "unstable", like sitting on a knife edge, and it could easily fall apart. It depends on assigning an ordinal value to each delimited element and then joining on those values. Maybe the flags that are known to you (but unfortunately not to us) can stabilize it. But it does match your indicated expectations. It uses the function regexp_split_to_table rather than unnest to split on the delimiters.
with base (num, list) as
( values (1, 'IceCream#Chocolate#SoftDrink')
       , (2, 'Amul,Havemore#Cadbary,Nestle#Pepsi')
       , (3, 'Party#Wedding')
)
, product as
( select p, row_number() over () as pn    -- ordinal used as the join key
  from (
    select regexp_split_to_table(list, '#') as p
    from base
    where num = 1
  ) x
)
, maker as
( -- the window function is evaluated before the set-returning function
  -- expands the comma lists, so mn identifies the '#'-delimited group
  select regexp_split_to_table(m, ',') as m, row_number() over () as mn
  from (
    select regexp_split_to_table(list, '#') as m
    from base
    where num = 2
  ) y
)
, event as
( select regexp_split_to_table(list, '#') as e  -- this row has no commas, one split suffices
  from base
  where num = 3
)
select p as product
     , m as maker
     , e as event
from product
join maker on pn = mn
cross join event
order by pn, e, m;
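As a side note, the pairing can be made explicit with unnest(string_to_array(...)) WITH ORDINALITY, which returns each element together with its position, so the join key no longer depends on row_number() over an unspecified order. A sketch with the same sample data (untested against your real tables):
with base (num, list) as
( values (1, 'IceCream#Chocolate#SoftDrink')
       , (2, 'Amul,Havemore#Cadbary,Nestle#Pepsi')
       , (3, 'Party#Wedding')
)
select p.product, m.maker, e.event
from base b1
cross join unnest(string_to_array(b1.list, '#')) with ordinality as p(product, pos)
join base b2 on b2.num = 2
cross join unnest(string_to_array(b2.list, '#')) with ordinality as g(grp, pos)
cross join unnest(string_to_array(g.grp, ',')) as m(maker)
join base b3 on b3.num = 3
cross join unnest(string_to_array(b3.list, '#')) as e(event)
where b1.num = 1
  and g.pos = p.pos        -- pair each product with its maker group by position
order by p.pos, e.event, m.maker;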
Hope it helps.

Related

Efficient way to retrieve all values from a column that start with other values from the same column in PostgreSQL

For the sake of simplicity, suppose you have a table with numbers like:
| number |
|--------|
| 123    |
| 1234   |
| 12345  |
| 123456 |
| 111    |
| 1111   |
| 2      |
| 700    |
What would be an efficient way of retrieving the shortest numbers (call them roots or whatever) and all values derived from them, e.g.:
| root | derivatives         |
|------|---------------------|
| 123  | 1234, 12345, 123456 |
| 111  | 1111                |
Numbers 2 & 700 are excluded from the list because they're unique, and thus have no derivatives.
An output as the above would be ideal, but since it's probably difficult to achieve, the next best thing would be something like below, which I can then post-process:
| root | derivative |
|------|------------|
| 123  | 1234       |
| 123  | 12345      |
| 123  | 123456     |
| 111  | 1111       |
My naive initial attempt to at least identify roots (see below) has been running for 4h now with a dataset of ~500k items, but the real one I'd have to inspect consists of millions.
select number
from numbers n1
where exists (
    select number
    from numbers n2
    where n2.number <> n1.number
      and n2.number like n1.number || '_%'
);
This works if number is an integer or bigint:
select min(a.number) as root, b.number as derivative
from nums a
cross join lateral generate_series(1, 18) as gs(power)
join nums b
on b.number / (10^gs.power)::bigint = a.number
group by b.number
order by root, derivative;
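The join condition relies on integer division truncating: dividing a longer number by a power of ten strips its trailing digits, and the match succeeds when what remains equals a shorter number. For example:
-- stripping the last three digits of 123456 exposes the root 123
select 123456 / (10^3)::bigint;  -- 123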
EDIT: I moved a non-working query to the bottom. It fails for reasons outlined by @Morfic in the comments.
We can do a similar and simpler join using like for character types:
select min(a.number) as root, b.number as derivative
from numchar a
join numchar b on b.number like a.number||'%'
and b.number != a.number
group by b.number
order by root, derivative;
Updated fiddle.
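If numchar is large, the left-anchored LIKE can be supported by an index. A minimal sketch, assuming the database does not use the "C" locale (in which case the default operator class cannot serve LIKE prefix searches):
-- text_pattern_ops lets a btree index match left-anchored LIKE patterns
create index numchar_number_pattern_idx on numchar (number text_pattern_ops);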
Faulty Solution Follows
If number is a character type, then try this:
with groupings as (
select number,
case
when number like (lag(number) over (order by number))||'%' then 0
else 1
end as newgroup
from numchar
), groupnums as (
select number, sum(newgroup) over (order by number) as groupnum
from groupings
), matches as (
select min(number) over (partition by groupnum) as root,
number as derivative
from groupnums
)
select *
from matches
where root != derivative;
There should be only a single sort on groupnum in this execution since the column is your table's primary key.
db<>fiddle here

PostgreSQL One ID multiple values

I have a Postgres table where one id may have multiple Channel values as follows
 ID | Channel | Column 3 | Column 4
----+---------+----------+---------
  1 | Sports  | x        | null
  1 | Organic | x        | z
  2 | Organic | null     | q
  3 | Arts    | b        | w
  3 | Organic | e        | r
  4 | Sports  | sp       | t
No ID will have a duplicate channel name, and no ID will have both Sports and Arts. That is, ID 1 could have a Sports and an Organic channel, or an Arts and an Organic channel, but not two Sports or two Organic entries, and not a Sports and an Arts channel together. I want all IDs to be in the query result, but if there is a non-Organic channel I prefer that row. The result I would want would be
 ID | Channel | Column 3 | Column 4
----+---------+----------+---------
  1 | Sports  | x        | null
  2 | Organic | null     | q
  3 | Arts    | b        | w
  4 | Sports  | sp       | t
I feel like there is some CTE here, a rank and partition or something that could do the trick, but I'm just not getting it. I'm only including Columns 3 and 4 to show there are extra columns.
Does anyone have any ideas on the code to deploy here?
You could use DISTINCT ON with an appropriate ORDER BY clause:
SELECT DISTINCT ON (id)
id, channel, column3, column4
FROM atable
ORDER BY id, channel = 'Organic';
This relies on the fact that FALSE < TRUE.
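So the ORDER BY ranks non-Organic rows first within each id, and DISTINCT ON keeps the first row it sees. To convince yourself of the boolean ordering:
select b from (values (true), (false)) as t(b) order by b;  -- false sorts first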
I ended up using a ranking window function:
ROW_NUMBER() OVER (
  PARTITION BY salesforce_id
  ORDER BY CASE WHEN channel = 'Organic' THEN 0 ELSE 1 END DESC,
           timestamp DESC
) AS id_rank
I didn't include in the original question that I had a timestamp! This works now. Thanks.
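For completeness, a fragment like that normally sits in a subquery that is then filtered on the rank. A sketch, using the table name atable from the answer above and the asker's salesforce_id and timestamp columns:
SELECT id, channel, column3, column4
FROM (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY salesforce_id
           ORDER BY CASE WHEN channel = 'Organic' THEN 0 ELSE 1 END DESC,
                    timestamp DESC
         ) AS id_rank
  FROM atable
) ranked
WHERE id_rank = 1;  -- keep the preferred (non-Organic, most recent) row per id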

How to get back aggregate values across 2 dimensions using Python Cubes?

Situation
Using Python 3, Django 1.9, Cubes 1.1, and Postgres 9.5.
These are my data tables in text form:
Store table
------------------------------
| id | code | address |
|-----|------|---------------|
| 1 | S1 | Kings Row |
| 2 | S2 | Queens Street |
| 3 | S3 | Jacks Place |
| 4 | S4 | Diamonds Alley|
| 5 | S5 | Hearts Road |
------------------------------
Product table
------------------------------
| id | code | name |
|-----|------|---------------|
| 1 | P1 | Saucer 12 |
| 2 | P2 | Plate 15 |
| 3 | P3 | Saucer 13 |
| 4 | P4 | Saucer 14 |
| 5 | P5 | Plate 16 |
| and many more .... |
|1000 |P1000 | Bowl 25 |
|----------------------------|
Sales table
----------------------------------------
| id | product_id | store_id | amount |
|-----|------------|----------|--------|
| 1 | 1 | 1 |7.05 |
| 2 | 1 | 2 |9.00 |
| 3 | 2 | 3 |1.00 |
| 4 | 2 | 3 |1.00 |
| 5 | 2 | 5 |1.00 |
| and many more .... |
| 1000| 20 | 4 |1.00 |
|--------------------------------------|
The relationships are:
Sales belongs to Store
Sales belongs to Product
Store has many Sales
Product has many Sales
What I want to achieve
I want to use Cubes to display data, with pagination, in the following manner:
Given the stores S1-S3:
-----------------------------
| product   | S1   | S2 | S3 |
|-----------|------|----|----|
| Saucer 12 | 7.05 | 9  | 0  |
| Plate 15  | 0    | 0  | 2  |
| and many more ....         |
-----------------------------
Note the following:
Even though there were no records in sales for Saucer 12 under Store S3, I displayed 0 instead of null or none.
I want to be able to sort by store, say in descending order for S3.
The cells indicate the SUM total of that particular product spent in that particular store.
I also want to have pagination.
What I tried
This is the configuration I used:
"cubes": [
{
"name": "sales",
"dimensions": ["product", "store"],
"joins": [
{"master":"product_id", "detail":"product.id"},
{"master":"store_id", "detail":"store.id"}
]
}
],
"dimensions": [
{ "name": "product", "attributes": ["code", "name"] },
{ "name": "store", "attributes": ["code", "address"] }
]
This is the code I used:
result = browser.aggregate(
    drilldown=['Store', 'Product'],
    order=[("Product.name", "asc"), ("Store.name", "desc"), ("total_products_sale", "desc")]
)
I didn't get what I wanted. Instead I got this:
-----------------------------------------------
| product_id | store_id | total_products_sale |
|------------|----------|---------------------|
| 1          | 1        | 7.05                |
| 1          | 2        | 9                   |
| 2          | 3        | 2.00                |
| and many more ....                          |
-----------------------------------------------
which is the whole table with no pagination, and product/store combinations with no sales do not show up at all, let alone as zero.
My question
How do I get what I want?
Do I need to create another data table that aggregates everything by store and product before I use cubes to run the query?
Update
I have read more and realised that what I want is called dicing, as I need to go across 2 dimensions. See: https://en.wikipedia.org/wiki/OLAP_cube#Operations
Cross-posted at Cubes GitHub issues to get more attention.
This is a pure SQL solution using crosstab() from the additional tablefunc module to pivot the aggregated data. It typically performs better than any client-side alternative. If you are not familiar with crosstab(), read this first:
PostgreSQL Crosstab Query
And this about the "extra" column in the crosstab() output:
Pivot on Multiple Columns using Tablefunc
SELECT product_id, product
, COALESCE(s1, 0) AS s1 -- 1. ... displayed 0 instead of null
, COALESCE(s2, 0) AS s2
, COALESCE(s3, 0) AS s3
, COALESCE(s4, 0) AS s4
, COALESCE(s5, 0) AS s5
FROM crosstab(
'SELECT s.product_id, p.name, s.store_id, s.sum_amount
FROM product p
JOIN (
SELECT product_id, store_id
, sum(amount) AS sum_amount -- 3. SUM total of product spent in store
FROM sales
GROUP BY product_id, store_id
) s ON p.id = s.product_id
ORDER BY s.product_id, s.store_id;'
, 'VALUES (1),(2),(3),(4),(5)' -- desired store_id's
) AS ct (product_id int, product text -- "extra" column
, s1 numeric, s2 numeric, s3 numeric, s4 numeric, s5 numeric)
ORDER BY s3 DESC; -- 2. ... descending order for S3
Produces your desired result exactly (plus product_id).
To include products that have never been sold replace [INNER] JOIN with LEFT [OUTER] JOIN.
SQL Fiddle with base query.
The tablefunc module is not installed on sqlfiddle.
Major points
Read the basic explanation in the reference answer for crosstab().
I am including product_id because product.name is hardly unique. This might otherwise lead to sneaky errors conflating two different products.
You don't need the store table in the query if referential integrity is guaranteed.
ORDER BY s3 DESC works, because s3 references the output column where NULL values have been replaced with COALESCE. Else we would need DESC NULLS LAST to sort NULL values last:
PostgreSQL sort by datetime asc, null first?
For building crosstab() queries dynamically consider:
Dynamic alternative to pivot with CASE and GROUP BY
I also want to have pagination.
That last item is fuzzy. Simple pagination can be had with LIMIT and OFFSET:
Displaying data in grid view page by page
I would consider a MATERIALIZED VIEW to materialize results before pagination. If you have a stable page size I would add page numbers to the MV for easy and fast results.
To optimize performance for big result sets, consider:
SQL syntax term for 'WHERE (col1, col2) < (val1, val2)'
Optimize query with OFFSET on large table
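To make the pagination point concrete, a minimal sketch, assuming the crosstab query above has been wrapped in a materialized view named sales_pivot (a hypothetical name):
-- page 3 at 20 rows per page; OFFSET cost grows with the page number,
-- so prefer keyset pagination (last link above) for deep pages
SELECT *
FROM sales_pivot
ORDER BY product_id
LIMIT 20 OFFSET 40;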

How to eliminate repeated field with GROUP BY clause?

I have 3 tables called:
1. app_tenant (pk: id, fk: pasar_id)
----+------+----------+
 id | nama | pasar_id |
----+------+----------+
  1 | joe  |        1 |
  2 | adi  |        2 |
  3 | adam |        3 |
2. app_pasar (pk: id)
----+--------------+
 id | nama         |
----+--------------+
  1 | kosambi      |
  2 | gede bage    |
  3 | pasar minggu |
3. app_kios (pk: id, fk: tenant_id)
----+-------+-----------+
 id | nama  | tenant_id |
----+-------+-----------+
  1 | kios1 |         1 |
  2 | kios2 |         2 |
  3 | kios3 |         3 |
  4 | kios4 |         1 |
  5 | kios5 |         1 |
  6 | kios6 |         2 |
  7 | kios7 |         2 |
  8 | kios8 |         3 |
  9 | kios9 |         3 |
Then, with a LEFT JOIN query grouping by id in every table, I want to display data like this:
----+-------------+--------------+-----------
 id | nama_tenant | nama_pasar   | nama_kios
----+-------------+--------------+-----------
  1 | joe         | kosambi      | kios1
  2 | adi         | gede bage    | kios2
  3 | adam        | pasar minggu | kios3
but after I execute this query, the data is not shown as expected. The problem is redundancy in the nama_tenant field. How can I eliminate the repeated nama_tenant records?
This is my query:
select a.id, a.nama as nama_tenant,
       b.nama as nama_pasar,
       c.nama as nama_kios
from app_tenant a
left join app_pasar b on a.id = b.id
left join app_kios c on a.id = c.tenant_id
group by a.id, b.id, c.id
Table definitions:
CREATE TABLE app_tenant (
id serial PRIMARY KEY,
nama character varying,
pasar_id integer);
CREATE TABLE app_kios (
id serial PRIMARY KEY,
nama character varying,
tenant_id integer REFERENCES app_tenant);
The problem is that tenants can have multiple kiosks. From your sample data it looks like you want to display the first kiosk of every tenant (although "first" is a vague concept on strings, here I use alphabetical sort order). Your query would be like this:
SELECT t.id, t.nama AS nama_tenant, p.nama AS nama_pasar, k.nama AS nama_kios
FROM app_tenant t
LEFT JOIN app_pasar p ON p.id = t.pasar_id
LEFT JOIN (
  SELECT tenant_id, nama
  FROM (
    SELECT tenant_id, nama,
           rank() OVER (PARTITION BY tenant_id ORDER BY nama) AS rnk
    FROM app_kios
  ) ranked
  WHERE rnk = 1  -- a window alias cannot be filtered at its own level, hence the extra subquery
) k ON k.tenant_id = t.id
ORDER BY t.id
The subquery on app_kios uses a window function to get the first kiosk name, after sorting the kiosk names for each tenant.
I would also suggest to use meaningful aliases for table names instead of simply a, b, c.
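Under the same "first kiosk alphabetically" assumption, a shorter alternative (a sketch, not benchmarked against the window-function version) is DISTINCT ON:
SELECT DISTINCT ON (t.id)
       t.id, t.nama AS nama_tenant, p.nama AS nama_pasar, k.nama AS nama_kios
FROM app_tenant t
LEFT JOIN app_pasar p ON p.id = t.pasar_id
LEFT JOIN app_kios k ON k.tenant_id = t.id
ORDER BY t.id, k.nama;  -- keeps the alphabetically first kiosk per tenant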

Replacing a comma separated value in table with another in select query (postgres)

I have two tables. Table A has an ID column whose values are comma separated, and each of those ID values has a representation in table B.
Table A
+------+-------+
| Name | ID    |
+------+-------+
| A1   | 1,2,3 |
| A2   | 2     |
| A3   | 3,2   |
+------+-------+
Table B
+----+--------+
| ID | Value  |
+----+--------+
| 1  | Apple  |
| 2  | Orange |
| 3  | Mango  |
+----+--------+
I was wondering if there is an efficient way to do a select where the result would be as below:
Name | Value
A1   | Apple, Orange, Mango
A2   | Orange
A3   | Mango, Orange
Any suggestions would be welcome. Thanks.
You first need to "normalize" table_a using the following query:
select name, regexp_split_to_table(id, ',') id
from table_a;
The result of this can be joined to table_b and the result of the join then needs to be grouped in order to get the comma separated list of the names:
select a.name, string_agg(b.value, ',')
from (
  select name, regexp_split_to_table(id, ',') as id
  from table_a
) a
join table_b b on b.id = a.id
group by a.name;
SQLFiddle: http://sqlfiddle.com/#!12/77fdf/1
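Note that string_agg without an ORDER BY returns values in an unspecified order, while the expected output for A3 lists Mango before Orange, following the 3,2 order in the source string. To preserve that order, a sketch using WITH ORDINALITY (assuming the ids compare as in the fiddle above):
select a.name, string_agg(b.value, ', ' order by a.ord) as value
from (
  select t.name, u.elem, u.ord
  from table_a t
  cross join unnest(string_to_array(t.id, ',')) with ordinality as u(elem, ord)
) a
join table_b b on b.id = a.elem
group by a.name;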
There are two regex related functions that can be useful:
http://www.postgresql.org/docs/current/static/functions-string.html
regexp_split_to_table()
regexp_split_to_array()
Code below is untested, but you'd use something like it to match A and B (note that regexp_split_to_array() does not accept the 'g' flag; splitting is inherently global):
select name, value
from A
join B on B.id = ANY(regexp_split_to_array(A.id, E'\\s*,\\s*')::int[]);
You can then use array_agg(value), grouping by name, and format using array_to_string().
Two notes, though:
It won't be as efficient as normalizing things.
The formatting itself ought to be done further down, in your views.
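If you do want the formatting in SQL anyway, putting those pieces together might look like the following (also untested; assuming B.id is an integer):
select A.name,
       array_to_string(array_agg(B.value), ', ') as value
from A
join B on B.id = ANY(regexp_split_to_array(A.id, E'\\s*,\\s*')::int[])
group by A.name;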