Unpack a PostGIS/PostgreSQL record with SQLAlchemy

How would I write the following query using SQLAlchemy's ORM?
SELECT filename, (stats).*
FROM (
    SELECT filename, ST_SummaryStats(rast, 1, TRUE) AS stats FROM tiles
) AS stats_table;
Here, ST_SummaryStats is a PostGIS function that returns the record that I wish to unpack. tiles is a PostGIS table with filename and rast (raster) columns. My attempt is as follows:
sub_q = db_session.query(
    Tiles.filename,
    func.ST_SummaryStats(Tiles.rast, 1, True).label('stats'),
).subquery()
q = db_session.query(
    sub_q.columns.filename,
    sub_q.columns.stats,
)
However, I don't know how to write the (stats).* expression -- and hence unpack the record -- with SQLAlchemy's ORM. Consequently, stats appears to be a tuple.
Thanks in advance for any help.

ST_SummaryStats() returns a record, so rather than using it as a SELECT expression (which returns the whole record), use it in the FROM clause and pick the desired statistics at the SELECT level. The query then becomes simply:
SELECT filename, count, sum, mean, stddev, min, max
FROM tiles, ST_SummaryStats(tiles.rast, 1, true);
This results in a so-called LATERAL join, and since ST_SummaryStats() returns only a single row for each raster in tiles, you do not need a join condition, a filter, or anything else.
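For clarity, the implicit comma join above can also be written with an explicit LATERAL keyword; here is a minimal equivalent sketch, with the function result aliased so the unpacked columns can be qualified:
SELECT filename, stats.count, stats.sum, stats.mean, stats.stddev, stats.min, stats.max
FROM tiles
CROSS JOIN LATERAL ST_SummaryStats(tiles.rast, 1, true) AS stats;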
I am not sure about SQLAlchemy's ability to use the result of a function as a class, but a sure-fire way of making this work is to wrap the above SELECT into a VIEW and then access the view from SQLAlchemy:
CREATE VIEW raster_stats AS
SELECT filename, count, sum, mean, stddev, min, max
FROM tiles, ST_SummaryStats(tiles.rast, 1, true);

Related

How to convert a jsonb array and use stats moments

I needed to store an array of numbers as JSONB in PostgreSQL, and now I'm trying to calculate stats moments from this JSON, but I'm facing some issues.
Sample of my data: (provided as an image; see the text reconstruction in the answer below)
I was already able to convert the JSON into a float array, using this function to cast jsonb to float[]:
CREATE OR REPLACE FUNCTION jsonb_array_castdouble(jsonb) RETURNS float[] AS $f$
SELECT array_agg(x)::float[] || ARRAY[]::float[] FROM jsonb_array_elements_text($1) t(x);
$f$ LANGUAGE sql IMMUTABLE;
Using this SQL:
with data as (
  select
    s.id as id,
    jsonb_array_castdouble(s.snx_normalized) as serie
  from spectra s
)
select * from data;
I found a function that can do these calculations and I need to pass an array for that: https://github.com/ellisonch/PostgreSQL-Stats-Aggregate/
But this function requires the array to be passed in another way: unnested. I already tried to use unnest, but it gets me only single values, not the entire array per row :(.
My goal is to be able to apply stats moments (kurtosis, skewness) to each row, like:
index | skewness
------+---------
    1 | 21.2131
    2 | 1.123
Bonus: is there a way to avoid the 'with data' CTE and do the transformation directly in the select statement?
snx_wavelengths is JSON, right? Also, you provided the data as a picture rather than text :(. It looks like (id, snx_wavelengths) - and I believe you meant id when you said index (using a keyword as a column name is not a good idea; it would require double-quoted identifiers):
1,[1,2,3,4]
2,[373,232,435,84]
If that is right:
select id, (stats_agg(v::float)).skewness
from myMeasures,
lateral json_array_elements_text(snx_wavelengths) v
group by id;
DBFiddle demo
BTW, you don't need "with data" in the original sample if you don't want to use it; you could replace it with a subquery, i.e.:
select (stats_agg(n)).* from (select unnest(array[16,22,33,24,15])) data(n)
union all
select (stats_agg(n)).* from (select unnest(array[416,622,833,224,215])) data(n);
EDIT: And if you needed other stats too:
select id, "count","min","max","mean","variance","skewness","kurtosis"
from myMeasures,
lateral (select (stats_agg(v::float)).* from json_array_elements_text(snx_wavelengths) v) foo
group by id,"count","min","max","mean","variance","skewness","kurtosis";
DBFiddle demo

Use sum function in calculated column

Is it possible to use a sum function in a calculated column?
If yes, I would like to create a calculated column that calculates the sum of a column in the same table, where the date is smaller than the date of the current entry. Is this possible?
And lastly, would this optimize repeated calls on this value compared to the view exemplified below?
SELECT ProductGroup, SalesDate,
       (SELECT SUM(Sales)
        FROM SomeList
        WHERE ProductGroup = KVU.ProductGroup AND SalesDate <= KVU.SalesDate) AS cumulated
FROM SomeList AS KVU
Is it possible to use a sum function in a calculated column?
Yes, it's possible using a scalar-valued function (scalar UDF) for your computed column, but it would be a disaster. Using scalar UDFs for computed columns destroys performance, and adding a scalar UDF that accesses data (which would be required here) makes things even worse.
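For illustration only, a hypothetical version of that anti-pattern (invented names, using the question's SomeList columns) might look like the sketch below - this is what not to do:
CREATE FUNCTION dbo.ufn_CumulativeSales (@ProductGroup INT, @SalesDate DATE)
RETURNS DECIMAL(10,2)
AS
BEGIN
    -- hidden data access: this query runs row by row, for every reference to the column
    DECLARE @total DECIMAL(10,2);
    SELECT @total = SUM(Sales)
    FROM dbo.SomeList
    WHERE ProductGroup = @ProductGroup AND SalesDate <= @SalesDate;
    RETURN @total;
END;
GO
-- the computed column can be neither persisted nor indexed because the UDF accesses data
ALTER TABLE dbo.SomeList ADD Cumulated AS dbo.ufn_CumulativeSales(ProductGroup, SalesDate);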
It sounds to me like you just need a good ol' fashioned index to speed things up. First some sample data:
IF OBJECT_ID('dbo.somelist','U') IS NOT NULL DROP TABLE dbo.somelist;
GO
CREATE TABLE dbo.somelist
(
    ProductGroup INT NOT NULL,
    [Month] TINYINT NOT NULL CHECK ([Month] <= 12),
    Sales DECIMAL(10,2) NOT NULL
);
INSERT dbo.somelist
VALUES (1,1,22),(2,1,45),(2,1,25),(2,1,19),(1,2,100),(1,2,200),(2,2,50.55);
and the correct index:
CREATE NONCLUSTERED INDEX nc_somelist ON dbo.somelist(ProductGroup,[Month])
INCLUDE (Sales);
With this index in place this query would be extremely efficient:
SELECT s.ProductGroup, s.[Month], SUM(s.Sales)
FROM dbo.somelist AS s
GROUP BY s.ProductGroup, s.[Month];
If you needed to get a COUNT by month & product group you could create an indexed view like so:
CREATE VIEW dbo.vw_somelist WITH SCHEMABINDING AS
SELECT s.ProductGroup, s.[Month], SalesCount = COUNT_BIG(*)
FROM dbo.somelist AS s
GROUP BY s.ProductGroup, s.[Month];
GO
CREATE UNIQUE CLUSTERED INDEX uq_cl__vw_somelist ON dbo.vw_somelist(ProductGroup, [Month]);
Once that indexed view is in place, your COUNTs are pre-aggregated. You can also include SUM of a non-nullable expression (e.g. SUM(s.Sales)) alongside the required COUNT_BIG(*); aggregates such as AVG, MIN, or MAX, however, are not allowed in an indexed view.
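Consuming the pre-aggregated counts might then look like this sketch (the NOEXPAND hint forces use of the view's index on editions that do not match indexed views automatically):
SELECT ProductGroup, [Month], SalesCount
FROM dbo.vw_somelist WITH (NOEXPAND);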

Prepare dynamic case statement using PostgreSQL 9.3

I have the following CASE statement, which I need to make dynamic:
case
  when cola between '2001-01-01' and '2001-01-05' then 'G1'
  when cola between '2001-01-10' and '2001-01-15' then 'G2'
  when cola between '2001-01-20' and '2001-01-25' then 'G3'
  when cola between '2001-02-01' and '2001-02-05' then 'G4'
  when cola between '2001-02-10' and '2001-02-15' then 'G5'
  else ''
end
Note: I want to build the CASE statement dynamically, because the dates and names are passed in as parameters and may change:
Declare
dates varchar = '2001-01-01to2001-01-05,2001-01-10to2001-01-15,
2001-01-20to2001-01-25,2001-02-01to2001-02-05,
2001-02-10to2001-02-15';
names varchar = 'G1,G2,G3,G4,G5';
The values in the variables may change as per the requirements, so the CASE statement should be built dynamically, without using a loop.
You may not need any function for this, just join to a mapping data-set:
with cola_map(low, high, value) as (
  values (date '2001-01-01', date '2001-01-05', 'G1'),
         ('2001-01-10', '2001-01-15', 'G2'),
         ('2001-01-20', '2001-01-25', 'G3'),
         ('2001-02-01', '2001-02-05', 'G4'),
         ('2001-02-10', '2001-02-15', 'G5')
         -- you can include as many rows as you want
)
select table_name.*,
       coalesce(cola_map.value, '') -- else branch from case expression
from table_name
left join cola_map on table_name.cola between cola_map.low and cola_map.high
If your date ranges could collide, you can use DISTINCT ON or GROUP BY to avoid row duplication.
Note: you can use a simple sub-select too; I used a CTE because it's more readable.
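For the colliding case, a collision-safe variant might look like the following sketch, assuming table_name has a primary key named id (DISTINCT ON keeps one deterministic match per row):
with cola_map(low, high, value) as (
  values (date '2001-01-01', date '2001-01-05', 'G1'),
         ('2001-01-10', '2001-01-15', 'G2')
         -- remaining ranges as above
)
select distinct on (table_name.id)
       table_name.*,
       coalesce(cola_map.value, '')
from table_name
left join cola_map on table_name.cola between cola_map.low and cola_map.high
order by table_name.id, cola_map.low; -- prefer the range with the earliest start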
Edit: passing this data as a single parameter can be achieved with a multi-dimensional array (or an array of row values, but that requires a distinct, predefined composite type).
Passing arrays as parameters can depend on the actual client (and driver) you use, but in general you can use the array's input representation and unpack it with generate_subscripts():
-- sql
with cola_map(low, high, value) as (
  select arr[i][1]::date, arr[i][2]::date, arr[i][3]
  from (select ?::text[][] as arr) t,
       generate_subscripts(t.arr, 1) as i
)
select table_name.*,
       coalesce(cola_map.value, '') -- else branch from case expression
from table_name
left join cola_map on table_name.cola between cola_map.low and cola_map.high
// client pseudo code
query = db.prepare(sql);
query.bind(1, "{{2001-01-10,2001-01-15,G2},{2001-01-20,2001-01-25,G3}}");
query.execute();
Passing each chunk of data separately is also possible with some clients (or with some abstractions), but this highly depends on the driver/ORM/etc. you use.

PostgreSQL Recursive Query Performance

I am a noob when it comes to PostgreSQL, but I was able to get it to produce what I needed: take a hierarchy up to 30 levels deep and create a flattened, 'jagged' list view with the topmost level and every intervening level down to each end node. The recursive function just pushes every parent found into an array and then returns the final flattened list for each node using LIMIT 1.
The following bit of SQL generates the table I need. My question is whether the function that returns the array of values I use to populate the row columns is called once per row, or once for each of the 30 columns in each row.
Can someone guide me on how I would determine that? And/or, if it is blatantly obvious that my SQL is inefficient, what might be a better way of putting the statements together?
Thanks in advance for having a look.
DROP FUNCTION IF EXISTS fnctreepath(nodeid NUMERIC(10,0));
CREATE FUNCTION fnctreepath(nodeid NUMERIC(10,0))
RETURNS TABLE (endnode NUMERIC, depth INTEGER, path NUMERIC[]) AS
$$
WITH RECURSIVE ttbltreepath(endnode, nodeid, parentid, depth, path) AS (
    SELECT src.nodeid AS endnode, src.nodeid, src.parentid, 1::INT AS depth,
           ARRAY[src.nodeid::NUMERIC(10,0)]::NUMERIC(10,0)[] AS path
    FROM tree AS src WHERE nodeid = $1
    UNION
    SELECT ttbl.endnode, src.nodeid, src.parentid, ttbl.depth + 1 AS depth,
           ARRAY_PREPEND(src.nodeid::NUMERIC(10,0), ttbl.path::NUMERIC(10,0)[])::NUMERIC(10,0)[] AS path
    FROM tree AS src, ttbltreepath AS ttbl WHERE ttbl.parentid = src.nodeid
)
SELECT endnode, depth, path FROM ttbltreepath GROUP BY endnode, depth, path ORDER BY endnode, depth DESC LIMIT 1;
$$ LANGUAGE SQL;
DROP TABLE IF EXISTS treepath;
SELECT parentid, nodeid, name,
(fnctreepath(tree.nodeid)).depth,
(fnctreepath(tree.nodeid)).path[1] as nodeid01,
(fnctreepath(tree.nodeid)).path[2] as nodeid02,
(fnctreepath(tree.nodeid)).path[3] as nodeid03,
(fnctreepath(tree.nodeid)).path[4] as nodeid04,
(fnctreepath(tree.nodeid)).path[5] as nodeid05,
(fnctreepath(tree.nodeid)).path[6] as nodeid06,
(fnctreepath(tree.nodeid)).path[7] as nodeid07,
(fnctreepath(tree.nodeid)).path[8] as nodeid08,
(fnctreepath(tree.nodeid)).path[9] as nodeid09,
(fnctreepath(tree.nodeid)).path[10] as nodeid10,
(fnctreepath(tree.nodeid)).path[11] as nodeid11,
(fnctreepath(tree.nodeid)).path[12] as nodeid12,
(fnctreepath(tree.nodeid)).path[13] as nodeid13,
(fnctreepath(tree.nodeid)).path[14] as nodeid14,
(fnctreepath(tree.nodeid)).path[15] as nodeid15,
(fnctreepath(tree.nodeid)).path[16] as nodeid16,
(fnctreepath(tree.nodeid)).path[17] as nodeid17,
(fnctreepath(tree.nodeid)).path[18] as nodeid18,
(fnctreepath(tree.nodeid)).path[19] as nodeid19,
(fnctreepath(tree.nodeid)).path[20] as nodeid20,
(fnctreepath(tree.nodeid)).path[21] as nodeid21,
(fnctreepath(tree.nodeid)).path[22] as nodeid22,
(fnctreepath(tree.nodeid)).path[23] as nodeid23,
(fnctreepath(tree.nodeid)).path[24] as nodeid24,
(fnctreepath(tree.nodeid)).path[25] as nodeid25,
(fnctreepath(tree.nodeid)).path[26] as nodeid26,
(fnctreepath(tree.nodeid)).path[27] as nodeid27,
(fnctreepath(tree.nodeid)).path[28] as nodeid28,
(fnctreepath(tree.nodeid)).path[29] as nodeid29,
(fnctreepath(tree.nodeid)).path[30] as nodeid30
INTO treepath
FROM tree;
You should check the volatility attribute of your function.
By default a function is VOLATILE, meaning any call to it may alter the database, so the query optimiser cannot reuse the result when you use the function several times in the same statement.
Your function is not IMMUTABLE (2+2=4 is immutable), but you should declare it with the STABLE volatility keyword. That way the optimiser can treat the repeated calls to fnctreepath(tree.nodeid) in the same statement as a stable result and share it (run it only once).
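A minimal sketch of both remedies: mark the existing function STABLE, and, independently of volatility, call it once per row in a LATERAL join so it cannot be re-evaluated for every column:
ALTER FUNCTION fnctreepath(NUMERIC) STABLE;

SELECT tree.parentid, tree.nodeid, tree.name,
       tp.depth,
       tp.path[1]  AS nodeid01,
       tp.path[2]  AS nodeid02,
       -- ...one alias per level, exactly as in the original query...
       tp.path[30] AS nodeid30
INTO treepath
FROM tree
CROSS JOIN LATERAL fnctreepath(tree.nodeid) AS tp;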

Using two different rows from the same table in an expression

I'm using PostgreSQL + PostGIS.
In one table I have a point and a line geometry in the same column, in different rows. To get the line I run:
SELECT the_geom
FROM filedata
WHERE id=3
If I want the point, I run:
SELECT the_geom
FROM filedata
WHERE id=4
I want to take the point and line together, as shown in this WITH expression, but using a real query against the table instead:
WITH data AS (
    SELECT 'LINESTRING (50 40, 40 60, 50 90, 30 140)'::geometry AS road,
           'POINT (60 110)'::geometry AS poi
)
SELECT ST_AsText(
           ST_Line_Interpolate_Point(road, ST_Line_Locate_Point(road, poi))
       ) AS projected_poi
FROM data;
As you can see, in this example the data comes from a hand-crafted WITH expression. I want to take it from my filedata table. My problem is I don't know how to work with data from two different rows of one table at the same time.
One possible way:
A subquery to retrieve another value from a different row.
SELECT ST_AsText(
           ST_Line_Interpolate_Point(
               the_geom,
               ST_Line_Locate_Point(
                   the_geom,
                   (SELECT the_geom FROM filedata WHERE id = 4)
               )
           )
       ) AS projected_poi
FROM filedata
WHERE id = 3;
Use a self-join:
SELECT ST_AsText(
           ST_Line_Interpolate_Point(
               fd_road.the_geom,
               ST_Line_Locate_Point(fd_road.the_geom, fd_poi.the_geom)
           )
       ) AS projected_poi
FROM filedata fd_road, filedata fd_poi
WHERE fd_road.id = 3 AND fd_poi.id = 4;
Alternatively, use a subquery to fetch the other row, as Erwin pointed out.
The main options for using multiple rows from one table in a single expression are:
Self-join the table with two different aliases as shown above, then filter the rows;
Use a subquery expression to get a value for all but one of the rows, as Erwin's answer shows;
Use a window function such as lag() or lead() to get a row relative to the current row within the query result (see the sketch below); or
JOIN on a subquery that returns a table
The latter two are more advanced options that solve problems that are difficult or inefficient to solve with the simpler self-join or subquery expression.
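For completeness, a minimal sketch of option 3 against the same table, assuming consecutive ids are meaningful neighbours (each row is paired with the previous row's geometry):
SELECT id,
       the_geom,
       lag(the_geom) OVER (ORDER BY id) AS prev_geom
FROM filedata;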