How to pushdown filters to a view's group by clause? - postgresql

We have a third-party BI tool on the project that can only add a WHERE clause with the specified filters to a SELECT on a table/view. We use a set of 4 source tables; they have indexes on the columns that can be filtered through the BI's UI. For each table we have a view that groups by the indexed columns and adds 1 additional column. Then we have another view that joins the data from those 4 views on the indexed columns; that is the view queried from the BI's UI, and the BI appends its WHERE clause to those queries.
The problem is that the indexes on the source tables are not used: the filters are not pushed down to the table level, but are applied at the very end instead. We can't use a set-returning function; all our BI tool can do is select from a table/view and add a WHERE clause.
We thought about intercepting the SELECT's WHERE condition in Postgres, but I'm not sure that is possible. Or maybe it is possible to hint the optimizer that the filters need to be pushed down. We could query the source tables directly without the views, but that would multiply the number of data sources/elements in the UI, which is not desirable. Are there any other ways we can solve this in PostgreSQL?
Update 1
Below are examples of the schemas/queries we use for our tables and views.
CREATE TABLE source_table_1
(
    dim1     VARCHAR(255) NOT NULL,
    dim2     VARCHAR(255) NOT NULL,
    dim3     VARCHAR(255) NOT NULL,
    measure1 BIGINT NOT NULL,
    measure2 BIGINT NOT NULL,
    measure3 BIGINT NOT NULL
);
CREATE INDEX ON source_table_1 (dim1, dim2, dim3);
... another 3 tables
CREATE OR REPLACE VIEW view1 AS
SELECT
    'type1' AS type,
    dim1,
    dim2,
    dim3,
    sum(measure1) AS measure1,
    sum(measure2) AS measure2,
    sum(measure3) AS measure3
FROM source_table_1
GROUP BY 1, 2, 3, 4;
... another 3 views
CREATE OR REPLACE VIEW view_union AS
SELECT
    coalesce(view1.dim1, view2.dim1, view3.dim1, view4.dim1) AS dim1,
    ... two other dims
    view1.measure1 AS measure1_1,
    view2.measure1 AS measure2_1,
    view3.measure1 AS measure3_1,
    view4.measure1 AS measure4_1,
    ... two other measures
FROM view1
FULL JOIN view2 ON
    view1.dim1 = view2.dim1 AND
    view1.dim2 = view2.dim2 AND
    view1.dim3 = view2.dim3
FULL JOIN view3 ON ...
FULL JOIN view4 ON ...
WHERE -- this is where the filters on dims are inserted
;

You cannot push a WHERE condition into a full outer join.
See this example:
CREATE TABLE a(id integer NOT NULL, a1 integer NOT NULL);
INSERT INTO a VALUES (1, 20), (2, 20);
CREATE TABLE b(id integer NOT NULL, b1 integer NOT NULL);
INSERT INTO b VALUES (2, 30), (3, 30);
SELECT *
FROM a
FULL JOIN b USING (id)
WHERE b1 = 30;
 id | a1 | b1
----+----+----
  2 | 20 | 30
  3 |    | 30
(2 rows)
SELECT *
FROM a
FULL JOIN (SELECT *
           FROM b
           WHERE b1 = 30) AS b_red
    USING (id);
 id | a1 | b1
----+----+----
  1 | 20 |
  2 | 20 | 30
  3 |    | 30
(3 rows)
So you would have to modify the underlying queries/views.
If you used inner joins, it would not be a problem.
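For illustration, here is a hedged sketch of what "modify the underlying views" could look like. If every dimension combination is guaranteed to exist in all four sources, the union view can be built with plain inner joins on the grouped views; with no full outer join in the way, PostgreSQL can propagate the BI's WHERE conditions on dim1/dim2/dim3 through the joins and push them below each GROUP BY, so the indexes on the source tables can be used. The view name is made up for this sketch, and inner joins drop rows that exist in only some sources, so treat this as one possible variant rather than a drop-in replacement:
CREATE OR REPLACE VIEW view_union_inner AS
SELECT
    view1.dim1,
    view1.dim2,
    view1.dim3,
    view1.measure1 AS measure1_1,
    view2.measure1 AS measure2_1,
    view3.measure1 AS measure3_1,
    view4.measure1 AS measure4_1
FROM view1
JOIN view2 ON view2.dim1 = view1.dim1
          AND view2.dim2 = view1.dim2
          AND view2.dim3 = view1.dim3
JOIN view3 ON view3.dim1 = view1.dim1
          AND view3.dim2 = view1.dim2
          AND view3.dim3 = view1.dim3
JOIN view4 ON view4.dim1 = view1.dim1
          AND view4.dim2 = view1.dim2
          AND view4.dim3 = view1.dim3;
-- A query such as
--   SELECT * FROM view_union_inner WHERE dim1 = 'x';
-- should then be able to use the (dim1, dim2, dim3) indexes on the source tables.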

Related

Postgresql group into predefined groups where group names come from a database table

I have a database table with data similar to this.
create table DataTable (
    name text,
    value numeric
);
insert into DataTable values
('A', 1),('A', 2),('B', 3),('Other', 5),('C', 1);
And I have another table:
create table "group" (
name text,
default boolean
)
insert into "group" values
('A', false),('B', false),('Other', true);
I want to group the data in the first table based on the defined groups in the second table.
Expected output:
 name  | sum
-------+-----
 A     |   3
 B     |   3
 Other |   6
Right now I'm using this query:
select coalesce(g.name, (select name from "group" where "default" = true)) as name,
       sum(dt.value)
from DataTable dt
left join "group" g on dt.name = g.name
group by 1
This works but can cause performance issues in some situations. Is there a better way to do this?
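One possible alternative (just a sketch against the schema above, not a verified answer): look up the default group name once with a CROSS JOIN instead of a scalar subquery in the SELECT list. Note that if no row has "default" = true, the CROSS JOIN makes the whole result empty, so that case would need handling:
select coalesce(g.name, d.name) as name,
       sum(dt.value)            as sum
from DataTable dt
left join "group" g on g.name = dt.name
cross join (select name from "group" where "default") d
group by 1;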

Join Same Column from Same Table Twice

I am new to the SQL world. I would like to replace Games.home_team_id and Games.away_team_id with the corresponding entries in the Teams.name column.
First I start by initializing a small table of data:
CREATE TABLE Games (id INT, away_team_id INT, away_team_score INT, home_team_id INT, home_team_score INT);
CREATE TABLE
INSERT INTO Games (id,away_team_id,away_team_score,home_team_id,home_team_score)
VALUES
(1,1,1,2,4),
(2,1,3,3,2),
(3,1,1,4,1),
(4,2,0,3,2),
(5,2,3,4,1),
(6,3,5,4,2)
;
INSERT 0 6
Then I create a template of a reference table
CREATE TABLE Teams (id INT, name VARCHAR(63));
CREATE TABLE
INSERT INTO Teams (id, name)
VALUES
(1, 'Oogabooga FC'),
(2, 'FC Milawnchair'),
(3, 'Ron''s Footy United'),
(4, 'Pylon City FC')
;
INSERT 0 4
I would like to have the table displayed as such:
| id | away_team_name | away_team_score | home_team_name | home_team_score |
-----+----------------+-----------------+----------------+------------------
|  1 | Oogabooga FC   |               1 | FC Milawnchair |               4 |
...
I managed to get a join query to show the first value from Teams.name in the away_team_name field using this JOIN:
SELECT
Games.id,
Teams.name AS away_team_name,
Games.away_team_score,
Teams.name AS home_team_name,
Games.home_team_score
FROM Games
JOIN Teams ON Teams.id = Games.away_team_id
;
| id | away_team_name | away_team_score | home_team_name | home_team_score |
-----+----------------+-----------------+----------------+------------------
|  1 | Oogabooga FC   |               1 | Oogabooga FC   |               4 |
...
But when I add the join a second time, it shows an error:
SELECT
Games.id,
Teams.name AS away_team_name,
Games.away_team_score,
Teams.name AS home_team_name,
Games.home_team_score
FROM Games
JOIN Teams ON Teams.id = Games.away_team_id
JOIN Teams ON Teams.id = Games.home_team_id
;
ERROR: table name "teams" specified more than once
How do you reference the same column of the same table twice in a join?
You need to specify an alias for at least one of the instances of the table; preferably both.
SELECT
Games.id,
Away.name AS away_team_name,
Games.away_team_score,
Home.name AS home_team_name,
Games.home_team_score
FROM Games
JOIN Teams AS Away ON Away.id = Games.away_team_id
JOIN Teams AS Home ON Home.id = Games.home_team_id
Explanation: since you are joining to the same table twice, the DBMS (in your case, PostgreSQL) cannot tell which of the two instances you are referring to when you use its columns. The way to solve this is to assign an alias to each joined instance, the same way you assign aliases to your columns. You can then specify which instance you mean in your SELECT, JOIN and WHERE clauses.
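To illustrate that last point, a small sketch against the sample data above (the team name in the filter is just an example value): the aliases can be referenced in the WHERE clause as well.
SELECT
    Games.id,
    Away.name AS away_team_name,
    Games.away_team_score,
    Home.name AS home_team_name,
    Games.home_team_score
FROM Games
JOIN Teams AS Away ON Away.id = Games.away_team_id
JOIN Teams AS Home ON Home.id = Games.home_team_id
WHERE Home.name = 'Pylon City FC';   -- only games hosted by this team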

Multiple UPDATE ... FROM same row is not working

I'm trying to apply multiple updates in one statement, but only one of them takes effect.
I have a table "users" with 2 records:
create table users
(
uid serial not null
constraint users_pkey
primary key,
balance numeric default 0 not null
);
INSERT INTO public.users (uid, balance) VALUES (2, 100);
INSERT INTO public.users (uid, balance) VALUES (1, 100);
I try to UPDATE user "1" twice with the query below, but it only updates once:
the balance of user "1" becomes "105", not "115".
update users as u
set balance = balance + c.bal
from (values (1, 5),
(1, 10)
) as c(uid, bal)
where c.uid = u.uid;
Why is it not updated for all the rows from the subquery?
The PostgreSQL documentation gives no reason for this behaviour, but it does document it.
Relevant quote:
When a FROM clause is present, what essentially happens is that the
target table is joined to the tables mentioned in the from_list, and
each output row of the join represents an update operation for the
target table. When using FROM you should ensure that the join produces
at most one output row for each row to be modified. In other words, a
target row shouldn't join to more than one row from the other
table(s). If it does, then only one of the join rows will be used to
update the target row, but which one will be used is not readily
predictable.
Use a SELECT with a GROUP BY to combine the rows before performing the update.
You need to aggregate in the inner query before joining:
update users as u
set balance = balance + d.bal
from (
select uid, sum(bal) bal
from ( values (1, 5), (1, 10) ) as c(uid, bal)
group by uid
) d
where d.uid = u.uid;
Demo on DB Fiddle:
| uid | balance |
| --- | ------- |
| 2   | 100     |
| 1   | 115     |

What's the best data structure for a hierarchical BOM

I am trying to work out the best schema structure to represent a BoM in Postgres. Assuming a part can have multiple of the same child part, I could add a quantity column, but those parts may also have multiple children.
If I wanted to know the total usage of each part, does Postgres have a way of using the quantity column in a hierarchical query?
BOM means Bill Of Material.
As far as I understand your question, yes, you can include the quantity when using a hierarchical BOM. The way I read it: if one BOM entry has an amount of e.g. 10, then the amounts of its children need to be multiplied by 10 (because you have 10 of that "child" item).
With the following table and sample data:
create table bom_entry
(
entry_id integer primary key,
product text, -- should be a foreign key to the product table
amount integer not null,
parent_id integer references bom_entry
);
insert into bom_entry
values
(1, 'Box', 1, null),
(2, 'Screw', 10, 1),
(3, 'Nut', 2, 2),
(4, 'Shim', 2, 2),
(5, 'Lock', 2, 1),
(6, 'Key', 2, 5);
So our box needs 10 screws and every screw needs 2 nuts and 2 shims, so we need a total of 20 nuts and 20 shims. We also have two locks and each lock has two keys, so we have a total of 4 keys.
You can use a recursive CTE to go through the tree and calculate the amount for each item.
with recursive bom as (
select *, amount as amt, 1 as level
from bom_entry
where parent_id is null
union all
select c.*, p.amt * c.amount as amt, p.level + 1
from bom_entry c
join bom p on c.parent_id = p.entry_id
)
select rpad(' ', (level - 1)*2, ' ')||product as product, amount as entry_amount, amt as total_amount
from bom
order by entry_id;
The rpad/level expression does the indentation to visualize the hierarchy. The above query returns the following:
 product  | entry_amount | total_amount
----------+--------------+--------------
 Box      |            1 |            1
   Screw  |           10 |           10
     Nut  |            2 |           20
     Shim |            2 |           20
   Lock   |            2 |            2
     Key  |            2 |            4
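Since the question asks for the total usage of each part, one possible follow-up (a sketch built on the same recursive CTE, not part of the original answer) is to aggregate the propagated amounts per product, which also copes with a part appearing in several places of the tree:
with recursive bom as (
    select entry_id, product, amount, parent_id, amount as amt
    from bom_entry
    where parent_id is null
    union all
    select c.entry_id, c.product, c.amount, c.parent_id, p.amt * c.amount
    from bom_entry c
    join bom p on c.parent_id = p.entry_id
)
select product, sum(amt) as total_usage
from bom
group by product
order by product;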

Postgresql - `serial` column & inheritance (sequence sharing policy)

In PostgreSQL, when a serial column is inherited from a parent table, the sequence is shared by the parent and child tables.
Is it possible to inherit the serial column while letting the 2 tables have separate sequence values, e.g. so that both tables' columns could contain the value 1?
Is this possible and reasonable, and if yes, how can it be done?
Update
The reasons that I want to avoid sequence sharing are:
Sharing a single int range among multiple tables might use up MAX_INT; using bigint would help, but it takes more space too.
There is a kind of resource locking when multiple tables insert concurrently, so I guess it is also a performance issue.
Ids that jump from 1 to 5 and then perhaps to 1000 don't look as nice as they could.
Summary
Solutions:
If you want each child table to have its own sequence while still keeping the global sequence shared by the parent and child tables (as described in @wildplasser's answer),
then you can add a sub_id serial column to each child table.
If you want each child table to have its own sequence and don't need a global sequence shared by the parent and child tables,
there are 2 ways:
Use int instead of serial (as described in @lsilva's answer).
Steps:
define the type as int or bigint in the parent table,
create an individual sequence for each parent and child table,
set the default value of the int column in each table using nextval of its own sequence,
don't forget to maintain/reset the sequence when re-creating a table.
Define id serial directly in the child tables, and not in the parent table.
DROP schema tmp CASCADE;
CREATE schema tmp;
set search_path = tmp, pg_catalog;
CREATE TABLE common
( seq SERIAL NOT NULL PRIMARY KEY
);
CREATE TABLE one
( subseq SERIAL NOT NULL
, payload integer NOT NULL
)
INHERITS (tmp.common)
;
CREATE TABLE two
( subseq SERIAL NOT NULL
, payload integer NOT NULL
)
INHERITS (tmp.common)
;
/**
\d common
\d one
\d two
\q
***/
INSERT INTO one(payload)
SELECT gs FROM generate_series(1,5) gs
;
INSERT INTO two(payload)
SELECT gs FROM generate_series(101,105) gs
;
SELECT * FROM common;
SELECT * FROM one;
SELECT * FROM two;
Results:
NOTICE: drop cascades to table tmp.common
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
CREATE TABLE
CREATE TABLE
INSERT 0 5
INSERT 0 5
seq
-----
1
2
3
4
5
6
7
8
9
10
(10 rows)
seq | subseq | payload
-----+--------+---------
1 | 1 | 1
2 | 2 | 2
3 | 3 | 3
4 | 4 | 4
5 | 5 | 5
(5 rows)
seq | subseq | payload
-----+--------+---------
6 | 1 | 101
7 | 2 | 102
8 | 3 | 103
9 | 4 | 104
10 | 5 | 105
(5 rows)
But: in fact you don't need the subseq columns, since you can always enumerate them by means of row_number():
CREATE VIEW vw_one AS
SELECT seq
, row_number() OVER (ORDER BY seq) as subseq
, payload
FROM one;
CREATE VIEW vw_two AS
SELECT seq
, row_number() OVER (ORDER BY seq) as subseq
, payload
FROM two;
[results are identical]
And you could add UNIQUE and PRIMARY KEY constraints to the child tables, like:
CREATE TABLE one
( subseq SERIAL NOT NULL UNIQUE
, payload integer NOT NULL
)
INHERITS (tmp.common)
;
ALTER TABLE one ADD PRIMARY KEY (seq);
[similar for table two]
I use this:
Parent table definition:
CREATE TABLE parent_table (
id bigint NOT NULL,
Child table definition:
CREATE TABLE child_schema.child_table
(
id bigint NOT NULL DEFAULT nextval('child_schema.child_table_id_seq'::regclass),
I am emulating serial by using a sequence as the default value.
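For completeness, a minimal self-contained sketch of that approach (the truncated definitions above only show the id columns; the schema qualifiers are dropped here and the names are illustrative): each table gets its own sequence, and the inherited bigint column takes its default from the table's own sequence.
CREATE SEQUENCE parent_table_id_seq;
CREATE TABLE parent_table
(
    id bigint NOT NULL DEFAULT nextval('parent_table_id_seq')
);

CREATE SEQUENCE child_table_id_seq;
CREATE TABLE child_table
(
    -- re-declaring the column lets the child override the inherited default
    id bigint NOT NULL DEFAULT nextval('child_table_id_seq')
)
INHERITS (parent_table);

INSERT INTO parent_table DEFAULT VALUES;  -- parent_table.id = 1
INSERT INTO child_table DEFAULT VALUES;   -- child_table.id = 1, independent of the parent's sequence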