Postgresql - Chain multiple regex_replace functions in single query? - postgresql

Using Postgresql 11.6. I have values in tab_a.sysdescr that I want to convert using regex_replace and update those converted values into tab_b.os_type.
Here is table tab_a that contains the source string in sysdescr :
hostname | sysdescr |
-------------+-----------------+
wifiap01 | foo HiveOS bar |
switch01 | foo JUNOS bar |
router01 | foo IOS XR bar |
Here is table tab_b that is the target for my update, in column os_type :
hostname | mgmt_ip | os_type
-------------+--------------+---------
wifiap01 | 10.20.30.40 |
switch01 | 20.30.40.50 |
router01 | 30.40.50.60 |
This is example desired state for tab_b :
hostname | mgmt_ip | os_type
-------------+--------------+---------
wifiap01 | 10.20.30.40 | hiveos
switch01 | 20.30.40.50 | junos
router01 | 30.40.50.60 | iosxr
I have a working query that will work against a single os_type. In this example, HiveOS :
UPDATE tab_b
SET os_type = (
SELECT REGEXP_REPLACE(sysdescr, '.*HiveOS.*', 'hiveos')
FROM tab_a
WHERE tab_a.hostname = tab_b.hostname
)
WHERE EXISTS (
SELECT sysdescr
FROM tab_a
WHERE tab_a.hostname = tab_b.hostname
);
What I can't figure out is how I can "chain" multiple regex_replace functions together into a single query, or via nested sub-queries. Adding 'OR' after that SELECT REGEX_REPLACE line doesn't work, and haven't been able to find examples online of something like this.
End-goal is a single query function that will replace the strings as specified, updating the replaced string on all rows in tab_b. I was hoping to avoid having to delve into PL/Python but if that is the best way to solve this, that's okay. Ideally, I could define a third table that contains the pattern and replacement_string arguments - and could iterate over that somehow.
Edit: Example of what I am trying to accomplish
This is not valid code, but hopefully demonstrates what I am trying to accomplish. A single query that can be executed once, and will translate/transform every sysdescr in a table into proper values for os_type in a new table.
UPDATE tab_b
SET os_type = (
SELECT REGEXP_REPLACE(sysdescr, '.*HiveOS.*', 'hiveos') OR
SELECT REGEXP_REPLACE(sysdescr, '.*JUNOS.*', 'junos') OR
SELECT REGEXP_REPLACE(sysdescr, '.*IOS XR.*', 'iosxr')
FROM tab_a
WHERE tab_a.hostname = tab_b.hostname
)
WHERE EXISTS (
SELECT sysdescr
FROM tab_a
WHERE tab_a.hostname = tab_b.hostname
);

If foo and bar are consistent in all rows (as indicated in your example), then this should work:
postgres=# SELECT lower(replace(regexp_replace('foo IOS XR bar','foo (.*) bar','\1'),' ',''));
lower
-------
iosxr
(1 row)
In short, this does the following:
Trim off foo and bar from the front and back with regexp_replace()
Remove the spaces with replace()
Lower-case the text with lower()
If you need to do anything further to remove foo and bar, you can nest the string functions as demonstrated above.

I was able to solve this using a third table (lookup table). It contains two columns, one holding the match string and one holding the return string.
New table tab_lookup:
id | match_str | return_str
----+-----------------------------------------------+------------
1 | HiveOS | hiveos
2 | IOS XR | iosxr
3 | JUNOS | junos
5 | armv | opengear
6 | NX-OS | nxos
7 | Adaptive Security Appliance | asa
17 | NetScreen | netscreen
19 | Cisco Internetwork Operating System Software | ios
18 | Cisco IOS Software | ios
20 | ProCurve | hp
21 | AX Series Advanced Traffic Manager | a10
22 | SSG | netscreen
23 | M13, Software Version | m13
24 | WS-C2948 | catos
25 | Application Control Engine Appliance | ace
Using this query I can update tab_b.os_type with the appropriate value from tab_lookup.return_str:
UPDATE tab_b
SET os_type = (
SELECT return_str
FROM tab_lookup
WHERE EXISTS (
SELECT regexp_matches(sysdescr, match_str)
FROM tab_a
WHERE tab_a.hostname = tab_b.hostname
)
);
The only catch I have encountered is that there must be only one match against a given row. But this is easily accomplished by verbose match_str values. E.g, don't use 'IOS' but instead use 'Cisco IOS Software'.
All in all, very happy with this solution since it provides an easy way to update the lookup values, as more device types are added to the network.

Related

Column-wise autocomplete

I have a table in a PostgreSQL database with four columns that contain increasingly more detailed information (think state->city->street->number), along with a column where everything is concatenated according to some simple formatting rules. Example:
| kommun | trakt | block | enhet | beteckning |
| Mora | Gislövs Läge | 9 | 16 | Mora Gislövs Läge 9:16 |
| Mora | Gisslaved | * | 8 | Mora Gisslaved 8 |
| Mora | Gisslaved | * | 9 | Mora Gisslaved 9 |
| Lilla Edet | Sanda | GA | 1 | Lilla Edet Sanda GA:1 |
A web service uses this table to implement a word-wise autocomplete, where the user gets input suggestions as they drill down. An input of mora gis will result in
["Mora Gislövs", "Mora Gisslaved"]
Currently, this is done by splitting the concatenated column by word in this query:
select distinct trim(substring(beteckning from '(^(\S+\s?){NUMPARTS})')) as bet
from beteckning_ac
where upper(beteckning) like upper('mora gis%')
order by bet
Where NUMPARTS is the number of words in the input - 2 in this case.
Now I want the autocomplete to be done column-wise rather than word-wise, so mora gis would now result in this instead:
["Mora Gislövs Läge", "Mora Gisslaved"]
Since the first two columns can contain an arbitrary number of words, I can no longer use the input to determine how many columns to include in the response. Is there a way to do this, or have I maybe gone about this autocomplete business all wrong?
CREATE OR REPLACE FUNCTION get_auto(text)
--$1 is here your input
RETURNS setof text
LANGUAGE plpgsql
AS $function$
declare
NUMPARTS int := array_length(regexp_split_to_array($1,' '), 1);
begin
return query
select
case
when (NUMPARTS = 1) then kommun
when (NUMPARTS = 2) then kommun||' '||trakt
when (NUMPARTS = 3) then kommun||' '||trakt||' '||block
when (NUMPARTS = 4) then kommun||' '||trakt||' '||block||' '||enhet
--alter if you want to
end
from
auto_complete --your tablename here
where
beteckning like $1||'%';
end;
$function$;

How to get back aggregate values across 2 dimensions using Python Cubes?

Situation
Using Python 3, Django 1.9, Cubes 1.1, and Postgres 9.5.
These are my datatables in pictorial form:
The same in text format:
Store table
------------------------------
| id | code | address |
|-----|------|---------------|
| 1 | S1 | Kings Row |
| 2 | S2 | Queens Street |
| 3 | S3 | Jacks Place |
| 4 | S4 | Diamonds Alley|
| 5 | S5 | Hearts Road |
------------------------------
Product table
------------------------------
| id | code | name |
|-----|------|---------------|
| 1 | P1 | Saucer 12 |
| 2 | P2 | Plate 15 |
| 3 | P3 | Saucer 13 |
| 4 | P4 | Saucer 14 |
| 5 | P5 | Plate 16 |
| and many more .... |
|1000 |P1000 | Bowl 25 |
|----------------------------|
Sales table
----------------------------------------
| id | product_id | store_id | amount |
|-----|------------|----------|--------|
| 1 | 1 | 1 |7.05 |
| 2 | 1 | 2 |9.00 |
| 3 | 2 | 3 |1.00 |
| 4 | 2 | 3 |1.00 |
| 5 | 2 | 5 |1.00 |
| and many more .... |
| 1000| 20 | 4 |1.00 |
|--------------------------------------|
The relationships are:
Sales belongs to Store
Sales belongs to Product
Store has many Sales
Product has many Sales
What I want to achieve
I want to use cubes to be able to do a display by pagination in the following manner:
Given the stores S1-S3:
-------------------------
| product | S1 | S2 | S3 |
|---------|----|----|----|
|Saucer 12|7.05|9 | 0 |
|Plate 15 |0 |0 | 2 |
| and many more .... |
|------------------------|
Note the following:
Even though there were no records in sales for Saucer 12 under Store S3, I displayed 0 instead of null or none.
I want to be able to do sort by store, say descending order for, S3.
The cells indicate the SUM total of that particular product spent in that particular store.
I also want to have pagination.
What I tried
This is the configuration I used:
"cubes": [
{
"name": "sales",
"dimensions": ["product", "store"],
"joins": [
{"master":"product_id", "detail":"product.id"},
{"master":"store_id", "detail":"store.id"}
]
}
],
"dimensions": [
{ "name": "product", "attributes": ["code", "name"] },
{ "name": "store", "attributes": ["code", "address"] }
]
This is the code I used:
result = browser.aggregate(drilldown=['Store','Product'],
order=[("Product.name","asc"), ("Store.name","desc"), ("total_products_sale", "desc")])
I didn't get what I want.
I got it like this:
----------------------------------------------
| product_id | store_id | total_products_sale |
|------------|----------|---------------------|
| 1 | 1 | 7.05 |
| 1 | 2 | 9 |
| 2 | 3 | 2.00 |
| and many more .... |
|---------------------------------------------|
which is the whole table with no pagination and if the products not sold in that store it won't show up as zero.
My question
How do I get what I want?
Do I need to create another data table that aggregates everything by store and product before I use cubes to run the query?
Update
I have read more. I realised that what I want is called dicing as I needed to go across 2 dimensions. See: https://en.wikipedia.org/wiki/OLAP_cube#Operations
Cross-posted at Cubes GitHub issues to get more attention.
This is a pure SQL solution using crosstab() from the additional tablefunc module to pivot the aggregated data. It typically performs better than any client-side alternative. If you are not familiar with crosstab(), read this first:
PostgreSQL Crosstab Query
And this about the "extra" column in the crosstab() output:
Pivot on Multiple Columns using Tablefunc
SELECT product_id, product
, COALESCE(s1, 0) AS s1 -- 1. ... displayed 0 instead of null
, COALESCE(s2, 0) AS s2
, COALESCE(s3, 0) AS s3
, COALESCE(s4, 0) AS s4
, COALESCE(s5, 0) AS s5
FROM crosstab(
'SELECT s.product_id, p.name, s.store_id, s.sum_amount
FROM product p
JOIN (
SELECT product_id, store_id
, sum(amount) AS sum_amount -- 3. SUM total of product spent in store
FROM sales
GROUP BY product_id, store_id
) s ON p.id = s.product_id
ORDER BY s.product_id, s.store_id;'
, 'VALUES (1),(2),(3),(4),(5)' -- desired store_id's
) AS ct (product_id int, product text -- "extra" column
, s1 numeric, s2 numeric, s3 numeric, s4 numeric, s5 numeric)
ORDER BY s3 DESC; -- 2. ... descending order for S3
Produces your desired result exactly (plus product_id).
To include products that have never been sold replace [INNER] JOIN with LEFT [OUTER] JOIN.
SQL Fiddle with base query.
The tablefunc module is not installed on sqlfiddle.
Major points
Read the basic explanation in the reference answer for crosstab().
I am including with product_id because product.name is hardly unique. This might otherwise lead to sneaky errors conflating two different products.
You don't need the store table in the query if referential integrity is guaranteed.
ORDER BY s3 DESC works, because s3 references the output column where NULL values have been replaced with COALESCE. Else we would need DESC NULLS LAST to sort NULL values last:
PostgreSQL sort by datetime asc, null first?
For building crosstab() queries dynamically consider:
Dynamic alternative to pivot with CASE and GROUP BY
I also want to have pagination.
That last item is fuzzy. Simple pagination can be had with LIMIT and OFFSET:
Displaying data in grid view page by page
I would consider a MATERIALIZED VIEW to materialize results before pagination. If you have a stable page size I would add page numbers to the MV for easy and fast results.
To optimize performance for big result sets, consider:
SQL syntax term for 'WHERE (col1, col2) < (val1, val2)'
Optimize query with OFFSET on large table

Update intermediate result

EDIT
As requested a little background of what I want to achieve. I have a table that I want to query but I don't want to change the table itself. Next the result of the SELECT query (what I called the 'intermediate table') needs to be cleaned a bit. For example certain cells of certain rows need to be swapped and some strings need to be trimmed. Of course this could all be done as postprocessing in, e.g., Python, but I was hoping to do all of this with one query statement.
Being new to Postgresql I want to update the intermediate table that results from a SELECT statement. So I basically want to edit the resulting table from a SELECT statement in one query. I'd like to prevent having to store the intermediate result.
I've tried the following 'with clause':
with result as (
select
a
from
b
)
update result as r
set
a = 'd'
...but that results in ERROR: relation "result" does not exist, while the following does work:
with result as (
select
a
from
b
)
select
*
from
result
As I said, I'm new to Postgresql so it is entirely possible that I'm using the wrong approach.
Depending on the complexity of the transformations you want to perform, you might be able to munge it into the SELECT, which would let you get away with a single query:
WITH foo AS (SELECT lower(name), freq, cumfreq, rank, vec FROM names WHERE name LIKE 'G%')
SELECT ... FROM foo WHERE ...
Or, for more or less unlimited manipulation options, you could create a temp table that will disappear at the end of the current transaction. That doesn't get the job done in a single query, but it does get it all done on the SQL server, which might still be worthwhile.
db=# BEGIN;
BEGIN
db=# CREATE TEMP TABLE foo ON COMMIT DROP AS SELECT * FROM names WHERE name LIKE 'G%';
SELECT 4677
db=# SELECT * FROM foo LIMIT 5;
name | freq | cumfreq | rank | vec
----------+-------+---------+------+-----------------------
GREEN | 0.183 | 11.403 | 35 | 'KRN':1 'green':1
GONZALEZ | 0.166 | 11.915 | 38 | 'KNSL':1 'gonzalez':1
GRAY | 0.106 | 15.921 | 69 | 'KR':1 'gray':1
GONZALES | 0.087 | 18.318 | 94 | 'KNSL':1 'gonzales':1
GRIFFIN | 0.084 | 18.659 | 98 | 'KRFN':1 'griffin':1
(5 rows)
db=# UPDATE foo SET name = lower(name);
UPDATE 4677
db=# SELECT * FROM foo LIMIT 5;
name | freq | cumfreq | rank | vec
--------+-------+---------+-------+---------------------
grube | 0.002 | 67.691 | 7333 | 'KRP':1 'grube':1
gasper | 0.001 | 69.999 | 9027 | 'KSPR':1 'gasper':1
gori | 0.000 | 81.360 | 28946 | 'KR':1 'gori':1
goeltz | 0.000 | 85.471 | 47269 | 'KLTS':1 'goeltz':1
gani | 0.000 | 86.202 | 51743 | 'KN':1 'gani':1
(5 rows)
db=# COMMIT;
COMMIT
db=# SELECT * FROM foo;
ERROR: relation "foo" does not exist

How do you do a "where in" sql query with an array data type?

I have a field:
dtype ==> character varying(3)[]
... but it's an array. So I have for example:
ID | name | dtype
1 | one | {'D10', 'D20', 'D30'}
2 | sam | {'D20'}
3 | jax | {'D10', 'D20'}
4 | pam | {'D10', 'D30'}
5 | pot | {'D10'}
I want to be able to do something like this:
select * from table where dtype in ('D20', 'D30')
This syntax doesnt work, but the goal is to then return fields 1,2,3,4 but not 5.
Is this possible?
Use the && operator as shown in the PostgreSQL manual under "array operators".
select * from table where dtype && ARRAY['D20', 'D30']

Query join result appears to be incorrect

I have no idea what's going on here. Maybe I've been staring at this code for too long.
The query I have is as follows:
CREATE VIEW v_sku_best_before AS
SELECT
sw.sku_id,
sw.sku_warehouse_id "A",
sbb.sku_warehouse_id "B",
sbb.best_before,
sbb.quantity
FROM SKU_WAREHOUSE sw
LEFT OUTER JOIN SKU_BEST_BEFORE sbb
ON sbb.sku_warehouse_id = sw.warehouse_id
ORDER BY sbb.best_before
I can post the table definitions if that helps, but I'm not sure it will. Suffice to say that SKU_WAREHOUSE.sku_warehouse_id is an identity column, and SKU_BEST_BEFORE.sku_warehouse_id is a child that uses that identity as a foreign key.
Here's the result when I run the query:
+--------+-----+----+-------------+----------+
| sku_id | A | B | best_before | quantity |
+--------+-----+----+-------------+----------+
| 20251 | 643 | 11 | <<null>> | 140 |
+--------+-----+----+-------------+----------+
(1 row)
The join specifies that the sku_warehouse_id columns have to be equal, but when I pull the ID from each table (labelled as A and B) they're different.
What am I doing wrong?
Perhaps just sw.sku_warehouse_id instead of sw.warehouse_id?