Postgres: How Can I DRY up Multiple WHERE IN (a,b,c,...) Clauses? - postgresql

In a programming language like JavaScript, instead of writing:
['a', 'b', 'c', 'd'].map(...);
['a', 'b', 'c', 'd'].filter(...);
['a', 'b', 'c', 'd'].forEach(...);
I can do the following:
const names = ['a', 'b', 'c', 'd'];
names.map(...);
names.filter(...);
names.forEach(...);
If I have several SQL statements in a file:
SELECT * FROM foo WHERE something IN ('a', 'b', 'c', 'd');
SELECT * FROM bar WHERE something_else IN ('a', 'b', 'c', 'd');
SELECT * FROM baz WHERE another_thing IN ('a', 'b', 'c', 'd');
Is there a similar way I can "create an array variable" and then use it repeatedly in all those queries? I know things get complicated because ('a', 'b', 'c', 'd') isn't actually an array, and I'm not sure if I should be using a literal variable, a view, or a function to hold the ('a', 'b', 'c', 'd') part.

The closest analogy would be a temporary table.
CREATE TEMP TABLE targets (t text);
COPY targets FROM stdin;
a
b
c
d
...thousands more rows
\.
SELECT foo.* FROM foo JOIN targets ON foo.x = targets.t
However, needing to match one set of values against several different tables is less common in a database; it can be a sign that the schema needs reworking.
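To make the pattern concrete, here is a self-contained sketch of the same temp-table join using SQLite from Python (the COPY ... FROM stdin loading syntax above is Postgres-specific; the table and column names here are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE foo (x TEXT, payload TEXT)")
con.executemany("INSERT INTO foo VALUES (?, ?)",
                [("a", "keep1"), ("b", "keep2"), ("z", "drop")])

# One shared list of targets, loaded once into a temp table...
con.execute("CREATE TEMP TABLE targets (t TEXT)")
con.executemany("INSERT INTO targets VALUES (?)",
                [("a",), ("b",), ("c",), ("d",)])

# ...and reused by any number of queries via a join.
rows = con.execute(
    "SELECT foo.payload FROM foo JOIN targets ON foo.x = targets.t ORDER BY foo.x"
).fetchall()
print(rows)  # [('keep1',), ('keep2',)]
```

Any further query in the session can join against `targets` the same way, so the value list lives in exactly one place.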

Related

Get out edges with properties

I have a Digraph G with some nodes and some edges:
G.add_edges_from([
    ('X', 'Y', {'property': 85}),
    ('X', 'T', {'property': 104}),
    ...
])
However, when I run G.out_edges('X'), it returns OutEdgeDataView([('X', 'Y'), ('X', 'T')]). Instead, I want to get a list of tuples with the edges (with the property), like this:
[('X', 'Y', {'property': 85}), ('X', 'T', {'property': 104})]
How should I get these results?
Thanks!
You can use networkx.DiGraph.out_edges(data=True).
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ('X', 'Y', {'property': 85}),
    ('X', 'T', {'property': 104}),
    ('Z', 'X', {'property': 104}),
])
print(G.out_edges('X', data=True))
[('X', 'Y', {'property': 85}), ('X', 'T', {'property': 104})]
I found no better answer than simply writing a list comprehension like this:
def get_out_edges(node, all_edges):
    return [e for e in all_edges if node == e[0]]
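To see the list-comprehension fallback in action without installing anything, here it is run against the question's sample edges, with plain tuples standing in for the graph's edge list:

```python
def get_out_edges(node, all_edges):
    # Keep only edges whose source element (index 0) is the given node.
    return [e for e in all_edges if node == e[0]]

edges = [
    ('X', 'Y', {'property': 85}),
    ('X', 'T', {'property': 104}),
    ('Z', 'X', {'property': 104}),
]
print(get_out_edges('X', edges))
# [('X', 'Y', {'property': 85}), ('X', 'T', {'property': 104})]
```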

IN clause for multiple columns on the same value set

I have a WHERE clause like below:
WHERE t1.col1 IN ('a', 'b', 'c') OR t2.col2 IN ('a', 'b', 'c');
I know that the two IN clauses will always have the same contents. Is there any way I can avoid duplicating them?
Something like:
WHERE (t1.col1 OR t2.col2) IN ('a', 'b', 'c')
You can use the array overlaps operator:
where array[t1.col1, t2.col2] && array['a','b','c']
with flt(x) as (values('{a,b,c}'::text[]))
select
...
from ..., flt
where t1.col1 = any(flt.x) or t2.col2 = any(flt.x);
There are two ways to reuse a set of constant values in PostgreSQL: a CTE, as shown above, or a custom configuration setting ("option"):
set foo.bar to '{a,b,c}';
select '{c,d,e}'::text[] && current_setting('foo.bar')::text[];
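Both techniques above are Postgres-specific. When the query is issued from application code, the duplication can also be avoided client-side by building the placeholder list once; a minimal sketch using SQLite (the table and values are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (col1 TEXT, col2 TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?)",
                [("a", "x"), ("x", "b"), ("x", "y")])

# One shared list of values; the placeholders are generated once and
# reused for both IN clauses, so the list itself is never duplicated.
values = ["a", "b", "c"]
ph = ",".join("?" * len(values))
sql = f"SELECT col1, col2 FROM t WHERE col1 IN ({ph}) OR col2 IN ({ph})"
rows = con.execute(sql, values + values).fetchall()
print(rows)  # [('a', 'x'), ('x', 'b')]
```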

How can I create several tables from the main one and store them?

my_table = readtable('some.csv')
'L1' 'B'
'L2' 'B'
'L3' 'A'
'L4' 'C'
'L5' 'B'
'L6' 'C'
'L7' 'C'
'L8' 'A'
In the second column there are different values: B, A, and C. I want to create smaller tables from this one, depending on the value on the right, so the expected outcome would be:
'L1' 'B'
'L2' 'B'
'L5' 'B'
and
'L3' 'A'
'L8' 'A'
and
'L4' 'C'
'L6' 'C'
'L7' 'C'
each of them stored in a different variable for later use.
You can access table contents with my_table.Variables or my_table.my_column_header. From there, you can compare the values with A, B and C; then create new tables.
% Get the ABC column.
content = my_table.Variables;
ABC = char(content{:,2});
% Alternatively, if you have the header.
ABC = char(my_table.Line2);
% Create new tables. Indexing a table with (rows, :) already returns a
% table, so no extra table() wrapper is needed.
tableA = my_table(ABC == 'A', :);
tableB = my_table(ABC == 'B', :);
tableC = my_table(ABC == 'C', :);
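For comparison, the same split can be sketched in plain Python by grouping rows on the second column (the data below mirrors the question's table):

```python
from collections import defaultdict

# Group the (label, key) rows by their second column,
# producing one "table" (list of rows) per key.
rows = [
    ("L1", "B"), ("L2", "B"), ("L3", "A"), ("L4", "C"),
    ("L5", "B"), ("L6", "C"), ("L7", "C"), ("L8", "A"),
]
groups = defaultdict(list)
for row in rows:
    groups[row[1]].append(row)

print(groups["B"])  # [('L1', 'B'), ('L2', 'B'), ('L5', 'B')]
print(groups["A"])  # [('L3', 'A'), ('L8', 'A')]
```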

recursive path aggregation and CTE query for top-down tree postgres

I'm trying to write a query that produces a list of all nodes in a tree given a root, along with the paths (using the names parents give their children) taken to get there. The recursive CTE I have working is a textbook CTE straight from the docs; however, it's proven difficult to get the paths working in this case.
Following the git model, names are given to children by their parents as a result of paths created by traversing the tree. This implies a map of names to children ids, like git's tree structure.
I've been looking online for a recursive-query solution, but the answers all seem to use parent ids or materialized paths, which would break the structural-sharing concepts that Rich Hickey's "database as a value" talk is all about.
current implementation
Imagine the objects table is dead simple (for simplicity's sake, let's assume integer ids):
drop table if exists objects;
create table objects (
id INT,
data jsonb
);
-- A
-- / \
-- B C
-- / \ \
-- D E F
INSERT INTO objects (id, data) VALUES
(1, '{"content": "data for f"}'), -- F
(2, '{"content": "data for e"}'), -- E
(3, '{"content": "data for d"}'), -- D
(4, '{"nodes":{"f":{"id":1}}}'), -- C
(5, '{"nodes":{"d":{"id":2}, "e":{"id":3}}}'), -- B
(6, '{"nodes":{"b":{"id":5}, "c":{"id":4}}}') -- A
;
drop table if exists work_tree;
create table work_tree (
id INT NOT NULL,
path text,
ref text,
data jsonb,
primary key (ref, id) -- TODO change to ref, path
);
create or replace function get_nested_ids_array(data jsonb) returns int[] as $$
select array_agg((value->>'id')::int) as nested_id
from jsonb_each(data->'nodes')
$$ LANGUAGE sql STABLE;
create or replace function checkout(root_id int, ref text) returns void as $$
with recursive nodes(id, nested_ids, data) AS (
select id, get_nested_ids_array(data), data
from objects
where id = root_id
union
select child.id, get_nested_ids_array(child.data), child.data
from objects child, nodes parent
where child.id = ANY(parent.nested_ids)
)
INSERT INTO work_tree (id, data, ref)
select id, data, ref from nodes
$$ language sql VOLATILE;
SELECT * FROM checkout(6, 'master');
SELECT * FROM work_tree;
If you are familiar with git, these objects' data property looks similar to git blobs/trees, mapping names to ids or storing content. So imagine you want to create an index: after a "checkout", you need to query for the list of nodes, and potentially the paths, to produce a working tree or index:
Current Output:
id path ref data
6 NULL master {"nodes":{"b":{"id":5}, "c":{"id":4}}}
4 NULL master {"nodes":{"d":{"id":2}, "e":{"id":3}}}
5 NULL master {"nodes":{"f":{"id":1}}}
1 NULL master {"content": "data for d"}
2 NULL master {"content": "data for e"}
3 NULL master {"content": "data for f"}
Desired Output:
id path ref data
6 / master {"nodes":{"b":{"id":5}, "c":{"id":4}}}
4 /b master {"nodes":{"d":{"id":2}, "e":{"id":3}}}
5 /c master {"nodes":{"f":{"id":1}}}
1 /b/d master {"content": "data for d"}
2 /b/e master {"content": "data for e"}
3 /c/f master {"content": "data for f"}
What's the best way to aggregate path in this case? I'm aware that I'm compressing the information when I call get_nested_ids_array in the recursive query, so I'm not sure how to properly aggregate the path with this top-down approach.
EDIT use case for children ids
To explain more about why I need to use children ids instead of parent ids:
Imagine a data structure like so:
A
/ \
B C
/ \ \
D E F
If you make a modification to F, you only add a new root A' and new children nodes C' and F', leaving the old tree intact:
A' A
/ \ / \
C' B C
/ / \ \
F' D E F
If you make a deletion, you only add a new root A" that only points to B and you still have A if you ever need to time travel (and they share the same objects, just like git!):
A" A
\ / \
B C
/ \ \
D E F
So it seems that the best way to achieve this is with children ids so children can have multiple parents - across time and space! If you think there's another way to achieve this, by all means, let me know!
Edit #2 case for not using parent_ids
Using parent_ids has cascading effects that require editing the entire tree. For example,
A
/ \
B C
/ \ \
D E F
If you make a modification to F, you still need a new root A' to maintain immutability. And if we use parent_ids, then both B and C now have a new parent. Hence you can see how the change ripples through the entire tree, requiring every node to be touched:
A A'
/ \ / \
B C B' C'
/ \ \ / \ \
D E F D' E' F'
EDIT #3 use case for parents giving names to children
We can make a recursive query where objects store their own names, but the question I'm asking is specifically about constructing a path where names are given to children by their parents. This models a data structure similar to the git tree. For example, in the git graph pictured below, the 3rd commit has a tree (a folder) bak that points to the original root, which represents a folder of all the files at the 1st commit. If that root object had its own name, it wouldn't be possible to achieve this as simply as adding a ref. That's the beauty of git: it's as simple as making a reference to a hash and giving it a name.
This is the relationship I'm setting up, and it is why the jsonb data structure exists: it provides a mapping from a name to an id (a hash in git's case). I know it's not ideal, but it does the job of providing the hash map. If there's another way to create this mapping of names to ids, and thus a way for parents to give names to children in a top-down tree, I'm all ears!
Any help is appreciated!
Store the parent of a node instead of its children. It is a simpler and cleaner solution, in which you do not need structured data types.
Here is an example model with data equivalent to that in the question:
create table objects (
id int primary key,
parent_id int,
label text,
content text);
insert into objects values
(1, 4, 'f', 'data for f'),
(2, 5, 'e', 'data for e'),
(3, 5, 'd', 'data for d'),
(4, 6, 'c', ''),
(5, 6, 'b', ''),
(6, 0, 'a', '');
And a recursive query:
with recursive nodes(id, path, content) as (
select id, label, content
from objects
where parent_id = 0
union all
select o.id, concat(path, '->', label), o.content
from objects o
join nodes n on n.id = o.parent_id
)
select *
from nodes
order by id desc;
id | path | content
----+---------+------------
6 | a |
5 | a->b |
4 | a->c |
3 | a->b->d | data for d
2 | a->b->e | data for e
1 | a->c->f | data for f
(6 rows)
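This recursive query can be reproduced almost verbatim in SQLite (which also supports WITH RECURSIVE), making it easy to experiment with from Python; the only change needed here is replacing concat() with the || operator:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""create table objects (
    id int primary key, parent_id int, label text, content text)""")
con.executemany("insert into objects values (?,?,?,?)", [
    (1, 4, 'f', 'data for f'),
    (2, 5, 'e', 'data for e'),
    (3, 5, 'd', 'data for d'),
    (4, 6, 'c', ''),
    (5, 6, 'b', ''),
    (6, 0, 'a', ''),
])
rows = con.execute("""
    with recursive nodes(id, path, content) as (
        select id, label, content from objects where parent_id = 0
        union all
        select o.id, n.path || '->' || o.label, o.content
        from objects o join nodes n on n.id = o.parent_id
    )
    select id, path, content from nodes order by id desc
""").fetchall()
for r in rows:
    print(r)
```

The output matches the table above, from `(6, 'a', '')` down to `(1, 'a->c->f', 'data for f')`.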
The variant with children_ids.
drop table if exists objects;
create table objects (
id int primary key,
children_ids int[],
label text,
content text);
insert into objects values
(1, null, 'f', 'data for f'),
(2, null, 'e', 'data for e'),
(3, null, 'd', 'data for d'),
(4, array[1], 'c', ''),
(5, array[2,3], 'b', ''),
(6, array[4,5], 'a', '');
with recursive nodes(id, children, path, content) as (
select id, children_ids, label, content
from objects
where id = 6
union all
select o.id, o.children_ids, concat(path, '->', label), o.content
from objects o
join nodes n on o.id = any(n.children)
)
select *
from nodes
order by id desc;
id | children | path | content
----+----------+---------+------------
6 | {4,5} | a |
5 | {2,3} | a->b |
4 | {1} | a->c |
3 | | a->b->d | data for d
2 | | a->b->e | data for e
1 | | a->c->f | data for f
(6 rows)
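The same children_ids traversal can be sketched in plain Python, carrying the accumulated path down from the root exactly as the recursive CTE does (the dict below mirrors the table's rows):

```python
# id -> (children_ids, label, content), mirroring the objects table.
objects = {
    1: ([],     'f', 'data for f'),
    2: ([],     'e', 'data for e'),
    3: ([],     'd', 'data for d'),
    4: ([1],    'c', ''),
    5: ([2, 3], 'b', ''),
    6: ([4, 5], 'a', ''),
}

def paths_from(root):
    # Yield (id, path, content) for the root and all its descendants,
    # prefixing each child's path with this node's label.
    children, label, content = objects[root]
    yield root, label, content
    for child in children:
        for cid, cpath, ccontent in paths_from(child):
            yield cid, label + '->' + cpath, ccontent

result = {cid: path for cid, path, _ in paths_from(6)}
print(result[1])  # a->c->f
```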
#klin's excellent answer inspired me to experiment with PostgreSQL, trees (paths), and recursive CTEs! :-D
Preamble: my motivation is storing data in PostgreSQL, but visualizing those data in a graph. While the approach here has limitations (e.g. undirected edges; ...), it may otherwise be useful in other contexts.
Here, I adapted #klin's code to make the CTE independent of the table ids for traversal, though I do use those ids to deal with loops in the data, e.g.
a,b
b,a
that throw the CTE into a nonterminating loop.
To solve that, I employed the rather brilliant approach suggested by #a-horse-with-no-name in SO 31739150 -- see my comments in the script, below.
PSQL script ("tree with paths.sql"):
-- File: /mnt/Vancouver/Programming/data/metabolism/practice/sql/tree_with_paths.sql
-- Adapted from: https://stackoverflow.com/questions/44620695/recursive-path-aggregation-and-cte-query-for-top-down-tree-postgres
-- See also: /mnt/Vancouver/FC/RDB - PostgreSQL/Recursive CTE - Graph Algorithms in a Database Recursive CTEs and Topological Sort with Postgres.pdf
-- https://www.fusionbox.com/blog/detail/graph-algorithms-in-a-database-recursive-ctes-and-topological-sort-with-postgres/620/
-- Run this script in psql, at the psql# prompt:
-- \! cd /mnt/Vancouver/Programming/data/metabolism/practice/sql/
-- \i /mnt/Vancouver/Programming/data/metabolism/practice/sql/tree_with_paths.sql
\c practice
DROP TABLE tree;
CREATE TABLE tree (
-- id int primary key
id SERIAL PRIMARY KEY
,s TEXT -- s: source node
,t TEXT -- t: target node
,UNIQUE(s, t)
);
INSERT INTO tree(s, t) VALUES
('a','b')
,('b','a') -- << adding this 'back relation' breaks CTE_1 below, as it enters a loop and cannot terminate
,('b','c')
,('b','d')
,('c','e')
,('d','e')
,('e','f')
,('f','g')
,('g','h')
,('c','h');
SELECT * FROM tree;
-- SELECT s,t FROM tree WHERE s='b';
-- RECURSIVE QUERY 1 (CTE_1):
-- WITH RECURSIVE nodes(src, path, tgt) AS (
-- SELECT s, concat(s, '->', t), t FROM tree WHERE s = 'a'
-- -- SELECT s, concat(s, '->', t), t FROM tree WHERE s = 'c'
-- UNION ALL
-- SELECT t.s, concat(path, '->', t.t), t.t FROM tree t
-- JOIN nodes n ON n.tgt = t.s
-- )
-- -- SELECT * FROM nodes;
-- SELECT path FROM nodes;
-- RECURSIVE QUERY 2 (CTE_2):
-- Deals with "loops" in Postgres data, per
-- https://stackoverflow.com/questions/31739150/to-find-infinite-recursive-loop-in-cte
-- "With Postgres it's quite easy to prevent this by collecting all visited nodes in an array."
WITH RECURSIVE nodes(id, src, path, tgt, all_parent_ids) AS (
SELECT id, s, concat(s, '->', t), t
,array[id]
FROM tree WHERE s = 'a'
UNION ALL
SELECT t.id, t.s, concat(path, '->', t.t), t.t, all_parent_ids||t.id FROM tree t
JOIN nodes n ON n.tgt = t.s
AND t.id <> ALL (all_parent_ids) -- this is the trick to exclude the endless loops
)
-- SELECT * FROM nodes;
SELECT path FROM nodes;
Script execution / output (PSQL):
# \i tree_with_paths.sql
You are now connected to database "practice" as user "victoria".
DROP TABLE
CREATE TABLE
INSERT 0 10
id | s | t
----+---+---
1 | a | b
2 | b | a
3 | b | c
4 | b | d
5 | c | e
6 | d | e
7 | e | f
8 | f | g
9 | g | h
10 | c | h
path
---------------------
a->b
a->b->a
a->b->c
a->b->d
a->b->c->e
a->b->d->e
a->b->c->h
a->b->d->e->f
a->b->c->e->f
a->b->c->e->f->g
a->b->d->e->f->g
a->b->d->e->f->g->h
a->b->c->e->f->g->h
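The all_parent_ids trick can be mirrored in plain Python: enumerate every path from 'a', refusing to reuse an edge id that is already on the current path, which is how the CTE above escapes the a<->b loop (same 13 paths, though in breadth-first rather than CTE order):

```python
# (edge_id, source, target), mirroring the tree table above,
# including the b->a back relation that creates the loop.
edges = [
    (1, 'a', 'b'), (2, 'b', 'a'), (3, 'b', 'c'), (4, 'b', 'd'),
    (5, 'c', 'e'), (6, 'd', 'e'), (7, 'e', 'f'), (8, 'f', 'g'),
    (9, 'g', 'h'), (10, 'c', 'h'),
]

def walk(start):
    # Breadth-first expansion; each queue entry remembers the edge ids
    # already used on its path (the all_parent_ids array of the CTE).
    queue = [(f"{s}->{t}", t, [eid]) for eid, s, t in edges if s == start]
    while queue:
        path, tgt, seen = queue.pop(0)
        yield path
        for eid, s, t in edges:
            if s == tgt and eid not in seen:
                queue.append((f"{path}->{t}", t, seen + [eid]))

paths = list(walk('a'))
print(paths[:3])  # ['a->b', 'a->b->a', 'a->b->c']
```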
You can change the starting node (e.g. start at node "d") in the SQL script -- giving, e.g.:
# \i tree_with_paths.sql
...
path
---------------
d->e
d->e->f
d->e->f->g
d->e->f->g->h
Network visualization:
I exported those data (at the PSQL prompt) to a CSV,
# \copy (SELECT s, t FROM tree) TO '/tmp/tree.csv' WITH CSV
COPY 9
# \! cat /tmp/tree.csv
a,b
b,c
b,d
c,e
d,e
e,f
f,g
g,h
c,h
... which I visualized (image above) in a Python 3.5 venv:
>>> import networkx as nx
>>> import pylab as plt
>>> G = nx.read_edgelist("/tmp/tree.csv", delimiter=",")
>>> G.nodes()
['b', 'a', 'd', 'f', 'c', 'h', 'g', 'e']
>>> G.edges()
[('b', 'a'), ('b', 'd'), ('b', 'c'), ('d', 'e'), ('f', 'g'), ('f', 'e'), ('c', 'e'), ('c', 'h'), ('h', 'g')]
>>> G.number_of_nodes()
8
>>> G.number_of_edges()
9
>>> from networkx.drawing.nx_agraph import graphviz_layout
## There is a bug in Python or NetworkX: you may need to run this
## command 2x, as you may get an error the first time:
>>> nx.draw(G, pos=graphviz_layout(G), node_size=1200, node_color='lightblue', linewidths=0.25, font_size=10, font_weight='bold', with_labels=True)
>>> plt.show()
>>> nx.dijkstra_path(G, 'a', 'h')
['a', 'b', 'c', 'h']
>>> nx.dijkstra_path(G, 'a', 'f')
['a', 'b', 'd', 'e', 'f']
Note that the dijkstra_path returned from NetworkX is one of several possible, whereas all paths are returned by the Postgres CTE in a visually-appealing manner.

Merge command DB2 AIX (Update ok, insert not ok)

I'm currently struggling with the MERGE command on AIX DB2 9.7.
All I want is a command that performs an UPDATE when a row with the checked primary keys already exists, and otherwise performs an INSERT.
That is what I expect the MERGE command can be used for:
Select something
if exists then update
if not exists then insert
The UPDATE statement works (I can see it in Toad and in the database, e.g. the timestamp), but the INSERT doesn't.
There is no syntax error shown, yet no insert is made; Toad reports "executed successfully".
Any help is appreciated.
The WHERE clause in the USING subquery filters on the primary keys of the table.
MERGE INTO DSPTMCP.KENNZAHL_DEFINITION as KD
USING(Select NR_MANDANT, SL_GRUPPE_KENNZAHL, SL_KENNZAHL
FROM DSPTMCP.KENNZAHL_DEFINITION
WHERE NR_MANDANT = 5472 AND SL_GRUPPE_KENNZAHL = '_VBH' AND SL_KENNZAHL = 104) as KD1
ON(KD.NR_MANDANT = KD1.NR_MANDANT AND KD.SL_GRUPPE_KENNZAHL = KD1.SL_GRUPPE_KENNZAHL AND KD.SL_KENNZAHL = KD1.SL_KENNZAHL)
WHEN MATCHED Then
UPDATE SET
BEZ_KENNZAHL_ABS = 'MEFGA',
BEZ_KENNZAHL_REL = 'Aufgaben',
BEZ_EINHEIT_ABS = '%',
BEZ_EINHEIT_REL = '%',
SL_MODIFIZIERUNG = 3,
ANZ_NACHKOMMASTELLEN_ABS = 2,
ANZ_NACHKOMMASTELLEN_REL = 2,
KZ_QUALITAETSZIEL_ABS = 'H',
KZ_QUALITAETSZIEL_REL = 'H',
BEZ_ERMITTLUNGSFREQUENZ = 'Monatl.',
BEZ_ERMITTLUNGSART = 'Automat',
BEZ_DATENLIEFERANT = 'Geschäftsfelddaten',
TXT_QUELLINFORMATION = 'Geschäftsfelddaten',
TXT_KNZ_BESCHREIBUNG = 'Aufgaben',
FAK_REF_GEWICHT = 1,
KZ_HILFSGROESSE = 'N',
SL_GRUPPE_KENNZAHL_REL = 'ALLG',
SL_KENNZAHL_REL = 10,
FAK_ERGEBNIS_REL = 1,
BEZ_EINHEIT_QUELLE = '%',
FAK_UMRECHNUNG_QUELLE = 1,
KZ_REF_OHNE_VORZEICHEN = 'N'
WHEN Not MATCHED Then
INSERT (NR_MANDANT,
SL_GRUPPE_KENNZAHL,
SL_KENNZAHL,
SYS_DWH_TSP,
SL_MODIFIZIERUNG,
UID_ERFASSUNG,
TSP_ERFASSUNG,
UID_AENDERUNG,
TSP_AENDERUNG,
BEZ_EINHEIT_ABS,
ANZ_NACHKOMMASTELLEN_ABS,
SL_GRUPPE_KENNZAHL_REF,
SL_KENNZAHL_REF,
KZ_REF_OHNE_VORZEICHEN,
SL_REF_VERDICHTUNG,
BEZ_KENNZAHL_ABS,
BEZ_KENNZAHL_REL,
KZ_HIERARCHIESUMME,
KZ_QUALITAETSZIEL_ABS,
KZ_QUALITAETSZIEL_REL,
BEZ_ERMITTLUNGSFREQUENZ,
BEZ_ERMITTLUNGSART,
IHT_MINIMAL,
IHT_MAXIMAL,
IHT_MINIMAL_REL,
IHT_MAXIMAL_REL,
BEZ_DATENLIEFERANT,
TXT_QUELLINFORMATION,
DAT_ERFASSUNG_AB,
DAT_ERFASSUNG_BIS,
TXT_KNZ_BESCHREIBUNG,
FAK_REF_GEWICHT,
KZ_HILFSGROESSE,
SL_GRUPPE_KENNZAHL_REL,
SL_KENNZAHL_REL,
FAK_ERGEBNIS_REL,
BEZ_EINHEIT_QUELLE,
FAK_UMRECHNUNG_QUELLE,
BEZ_EINHEIT_REL,
ANZ_NACHKOMMASTELLEN_REL)
VALUES(5472,
'_VBH',
'104',
current timestamp,
3,
'AUTOMAT',
current timestamp,
'AUTOMAT',
current timestamp,
'%',
2,
null,
null,
'N',
null,
'Aufgaben',
'Aufgaben',
'N',
'H',
'H',
'Monatl.',
'Automat',
null,
null,
null,
null,
'Geschäftsfelddaten',
'Geschäftsfelddaten',
'01.01.2000',
'31.12.9999',
'Aufgaben',
1,
'N',
'ALLG',
'10',
1,
'%',
1,
'%',
2);
Problem solved. The USING clause was selecting from the target table itself, so when no matching row existed the source set was empty and the WHEN NOT MATCHED branch never fired.
Switched the source to sysibm.sysdummy1 and now it works.
Something like this
MERGE INTO DSPTMCP.KENNZAHL_DEFINITION as KD
USING(Select 5472 AS NR_MANDANT, '_KCR' AS SL_GRUPPE_KENNZAHL,
600 AS SL_KENNZAHL FROM sysibm.sysdummy1) as KD1
ON(KD.NR_MANDANT = KD1.NR_MANDANT AND KD.SL_GRUPPE_KENNZAHL = KD1.SL_GRUPPE_KENNZAHL AND KD.SL_KENNZAHL = KD1.SL_KENNZAHL)
WHEN MATCHED THEN
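The underlying pattern of the fix (constants as the MERGE source; update if the key exists, insert otherwise) can also be sketched in SQLite's upsert syntax; the table and column names below are simplified stand-ins, not the DB2 originals:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE kennzahl (
    nr_mandant INT, sl_gruppe TEXT, sl_kennzahl INT, bez TEXT,
    PRIMARY KEY (nr_mandant, sl_gruppe, sl_kennzahl))""")

def upsert(bez):
    # Insert-or-update keyed on the composite primary key,
    # analogous to MERGE's WHEN MATCHED / WHEN NOT MATCHED branches.
    con.execute("""
        INSERT INTO kennzahl (nr_mandant, sl_gruppe, sl_kennzahl, bez)
        VALUES (5472, '_KCR', 600, ?)
        ON CONFLICT (nr_mandant, sl_gruppe, sl_kennzahl)
        DO UPDATE SET bez = excluded.bez""", (bez,))

upsert("first")   # no row matches -> the insert branch fires
upsert("second")  # row now exists -> the update branch fires
rows = con.execute("SELECT * FROM kennzahl").fetchall()
print(rows)  # [(5472, '_KCR', 600, 'second')]
```

Note that `ON CONFLICT ... DO UPDATE` requires SQLite 3.24 or later.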