Postgres subquery with a separate aggregate? - postgresql

I have a feeling that the answer has something to do with pivot tables, but I am struggling to get there. The source table has many more report ids, landcover types and data types, but I think this sample illustrates the problem.
Here is part of the table I am querying:
report_id | landcover_type          | data_type              | mean | area
----------+-------------------------+------------------------+------+----------
615       | Acid grassland          | canopyheight           | 2    | 493.9125
615       | Arable and horticulture | canopyheight           | 4    | 0.86
615       | Acid grassland          | carbonstoragewoodlands | 8    | 493.9125
615       | Arable and horticulture | carbonstoragewoodlands | 16   | 0.86
Is there a simple way to query the data and get the following...
report_id | landcover_type          | mean_canopy_height | mean_carbonstorage
----------+-------------------------+--------------------+-------------------
615       | Acid grassland          | 2                  | 8
615       | Arable and horticulture | 4                  | 16

I am not sure the following gives you what you want, but it does give you what you asked for. All that it requires is simple MIN/MAX values of the mean column.
with dataset(report_id, landcover_type, data_type, mean, area) as
( values ( 615, 'Acid grassland', 'canopyheight', 2, 493.9125 )
, ( 615, 'Arable and horticulture', 'canopyheight', 4, 0.86 )
, ( 615, 'Acid grassland', 'carbonstoragewoodlands', 8, 493.9125 )
, ( 615, 'Arable and horticulture', 'carbonstoragewoodlands', 16, 0.86)
)
select report_id
, landcover_type
, min(mean) mean_canopy_height
, max(mean) mean_carbonstorage
from dataset
group by report_id, landcover_type
order by landcover_type;
If this is not what you actually want, then you need to clarify what you want and/or modify the expected results given the same input.
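The MIN/MAX version above only matches the expected output because, in the sample rows, the canopy height happens to be smaller than the carbon storage value. A sketch that picks each value by its data_type instead uses conditional aggregation. It assumes PostgreSQL 9.4 or later (for FILTER) and at most one row per report_id, landcover_type and data_type; with several rows per combination, avg(mean) may be closer to what is wanted than max(mean).

```sql
-- Same sample data as above; one output column per data_type.
WITH dataset(report_id, landcover_type, data_type, mean, area) AS (
    VALUES (615, 'Acid grassland',          'canopyheight',            2, 493.9125),
           (615, 'Arable and horticulture', 'canopyheight',            4, 0.86),
           (615, 'Acid grassland',          'carbonstoragewoodlands',  8, 493.9125),
           (615, 'Arable and horticulture', 'carbonstoragewoodlands', 16, 0.86)
)
SELECT report_id,
       landcover_type,
       -- FILTER restricts each aggregate to the rows of one data_type
       max(mean) FILTER (WHERE data_type = 'canopyheight')           AS mean_canopy_height,
       max(mean) FILTER (WHERE data_type = 'carbonstoragewoodlands') AS mean_carbonstorage
FROM dataset
GROUP BY report_id, landcover_type
ORDER BY landcover_type;
```

On older versions, max(CASE WHEN data_type = 'canopyheight' THEN mean END) should behave the same way.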

Related

Tsrange - Calculating the difference between two ranges

I have two tables free_time and appointment. Both contain tsranges.
How do I write a query (or function) that determines the actual free time after "subtracting" the appointment from the free time?
INSERT INTO free_time(freetime)
VALUES('[2017-04-19 09:00, 2017-04-19 12:30)');
INSERT INTO appointment(appointment)
VALUES('[2017-04-19 10:30, 2017-04-19 11:30)');
I want the result to be something like:
["2017-04-19 9:00","2017-04-19 10:30:00"),
["2017-04-19 11:30:00","2017-04-19 12:30:00")
You'll have to break the range apart. From the docs:
The union and difference operators will fail if the resulting range would need to contain two disjoint sub-ranges, as such a range cannot be represented.
To do this, you can use lower() and upper():
SELECT tsrange( lower(freetime), lower(appointment) ) AS before_appointment,
tsrange( upper(appointment), upper(freetime) ) AS after_appointment
FROM ( VALUES
(
'[2017-04-19 09:00, 2017-04-19 12:30)'::tsrange,
'[2017-04-19 10:30, 2017-04-19 11:30)'::tsrange
)
) AS t(freetime,appointment)
WHERE freetime @> appointment;
before_appointment | after_appointment
-----------------------------------------------+-----------------------------------------------
["2017-04-19 09:00:00","2017-04-19 10:30:00") | ["2017-04-19 11:30:00","2017-04-19 12:30:00")
(1 row)
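On newer servers there may be a shorter route. The sketch below assumes PostgreSQL 14 or later, where multirange types exist and the difference operator is allowed to produce disjoint pieces; it is not available on the versions current when this question was asked.

```sql
-- Assumes PostgreSQL 14+ (multirange types).
-- tsmultirange(...) wraps each range in a multirange, "-" subtracts,
-- and unnest() expands the result into one row per remaining free slot.
SELECT unnest(tsmultirange(freetime) - tsmultirange(appointment)) AS free_slot
FROM ( VALUES
    (
        '[2017-04-19 09:00, 2017-04-19 12:30)'::tsrange,
        '[2017-04-19 10:30, 2017-04-19 11:30)'::tsrange
    )
) AS t(freetime, appointment);
```

Unlike the lower()/upper() version, this needs no containment check: an appointment that only partly overlaps the free time, or does not overlap it at all, simply removes less of it.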

Update rows returned by a complex SQL query with data from query result

I have a multi-table join and want to update a table based on the result of that join. The join table produces both the scope of the update (only those rows whose effort.id appears in the result should be updated) and the data for the update (a new column should be set to the value of a calculated column).
I've made progress but can't quite make it work. Here's my statement:
UPDATE
efforts
SET
dropped_int = jt.split
FROM
(
SELECT
ef.id,
s.id split,
s.kind,
s.distance_from_start,
s.sub_order,
max(s.distance_from_start + s.sub_order)
OVER (PARTITION BY ef.id) AS max_dist
FROM
split_times st
LEFT JOIN splits s ON s.id = st.split_id
LEFT JOIN efforts ef ON ef.id = st.effort_id
) jt
WHERE
((jt.distance_from_start + jt.sub_order) = max_dist)
AND
kind <> 1;
The SELECT produces the correct join table:
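id  | split | kind | dfs    | sub | max_dist | dropped | dropped_int
----+-------+------+--------+-----+----------+---------+------------
403 | 33    | 2    | 152404 | 1   | 152405   | TRUE    | 33
404 | 33    | 2    | 152404 | 1   | 152405   | TRUE    | 33
405 | 31    | 2    | 143392 | 1   | 143393   | TRUE    | 33
406 | 31    | 2    | 143392 | 1   | 143393   | TRUE    | 33
407 | 29    | 2    | 132127 | 1   | 132128   | TRUE    | 33
408 | 29    | 2    | 132127 | 1   | 132128   | TRUE    | 33
409 | 29    | 2    | 132127 | 1   | 132128   | TRUE    | 33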
and the statement does indeed update the efforts.dropped_int column, but there are two problems. First, it updates all efforts, not just those that are produced by the query. Second, it sets dropped_int to the split value of the first row in the query result for every effort, but I need it to set each effort to its associated split value.
If this were non-SQL, it might look something like:
jt_rows.each do |jt_row|
efforts[jt_row].dropped_int = jt[jt_row].split
end
But I don't know how to do that in SQL. It seems like this should be a fairly common problem, but after a couple of hours of searching I'm coming up short.
How should I modify my statement to produce the described result? If it matters, this is Postgres 9.5. Thanks in advance for any suggestions.
EDIT:
I did not get a workable answer but ended up solving this with a mixture of SQL and native code (Ruby/Rails):
dropped_splits = SplitTime.joins(:split).joins(:effort)
.select('DISTINCT ON (efforts.id) split_times.effort_id, split_times.split_id')
.where(efforts: {dropped: true})
.order('efforts.id, splits.distance_from_start DESC, splits.sub_order DESC')
update_hash = Hash[dropped_splits.map { |x| [x.effort_id, {dropped_split_id: x.split_id, updated_at: Time.now}] }]
Effort.update(update_hash.keys, update_hash.values)
Add a condition to the WHERE clause that relates the efforts table to the subquery:
efforts.id = jt.id
that is:
WHERE
((jt.distance_from_start + jt.sub_order) = max_dist)
AND
kind <> 1
AND
efforts.id = jt.id
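Put together, the whole statement might look like the sketch below. It is the statement from the question with only the correlation added; table and column names are taken from the question and it has not been run against the real schema.

```sql
UPDATE efforts
SET dropped_int = jt.split
FROM (
    SELECT ef.id,
           s.id AS split,
           s.kind,
           s.distance_from_start,
           s.sub_order,
           max(s.distance_from_start + s.sub_order)
               OVER (PARTITION BY ef.id) AS max_dist
    FROM split_times st
    LEFT JOIN splits s   ON s.id = st.split_id
    LEFT JOIN efforts ef ON ef.id = st.effort_id
) jt
WHERE efforts.id = jt.id   -- ties each updated row to its own subquery row
  AND jt.distance_from_start + jt.sub_order = jt.max_dist
  AND jt.kind <> 1;
```

Without the efforts.id = jt.id condition, the FROM clause acts as an unconstrained join, so every row in efforts matches some row of jt. That explains both symptoms in the question. If several jt rows still match the same effort after filtering, PostgreSQL uses only one of them and which one is not predictable, so ties on max_dist are worth ruling out.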

Query works on database A but not on B

I've got the following query:
SELECT
nr
, txt = info.result
FROM
dbo.anlagen AS a
CROSS APPLY
ocAuxiliary.splitString(
ISNULL(
ocAuxiliary.parseRTF(a.notiz)
,'')
,80)
AS info
which works fine on one database, but not on another. The functions / SPROCs are created by code and therefore deterministic.
Error on B is:
Msg 102, Level 15, State 1, Line 9
Incorrect syntax near '.'.
Calling the functions / SPROCs on their own also works fine:
On DB A
SELECT * from ocAuxiliary.splitString('1234567890', 3)
returns
iteration | result
----------+-------
1         | 123
2         | 456
3         | 789
4         | 0
as it does on DB B.
On DB A
select ocAuxiliary.parseRTF('{\rtf1\ansi\ansicpg1252\deff0{\fonttbl{\f0\fnil\fcharset0 Arial;}}\viewkind4\uc1\pard\lang1031\fs20 12 ')
returns 12
as it does on DB B.
I simply don't see the mistake.

Extracting values from non-standard markup strings in PostgreSQL

Unfortunately, I have a table like the following:
DROP TABLE IF EXISTS my_list;
CREATE TABLE my_list (index int PRIMARY KEY, mystring text, status text);
INSERT INTO my_list
(index, mystring, status) VALUES
(12, '', 'D'),
(14, '[id] 5', 'A'),
(15, '[id] 12[num] 03952145815', 'C'),
(16, '[id] 314[num] 03952145815[name] Sweet', 'E'),
(19, '[id] 01211[num] 03952145815[name] Home[oth] Alabama', 'B');
Is there any trick to extract the number that follows [id] as an integer from the mystring text shown above? As though I ran the following query:
SELECT index, extract_id_function(mystring), status FROM my_list;
and got results like:
12 0 D
14 5 A
15 12 C
16 314 E
19 1211 B
Preferably with only simple string functions; if that is not possible, a regular expression will be fine.
If I understand correctly, you have a rather unconventional markup format where [id] is followed by a space, then a series of digits that represents a numeric identifier. There is no closing tag, the next non-numeric field ends the ID.
If so, you're going to be able to do this with non-regexp string ops, but only quite badly. What you'd really need is the SQL equivalent of strtol, which consumes input up to the first non-digit and just returns that. A cast to integer will not do that, it'll report an error if it sees non-numeric garbage after the number. (As it happens I just wrote a C extension that exposes strtol for decoding hex values, but I'm guessing you don't want to use C extensions if you don't even want regex...)
It can be done with string ops if you make the simplifying assumption that an [id] nnnn tag always ends with either end of string or another tag, so it's always [ at the end of the number. We also assume that you're only interested in the first [id] if multiple appear in a string. That way you can write something like the following horrible monstrosity:
select
"index",
case
when next_tag_idx > 0 then substring(cut_id from 0 for next_tag_idx)
else cut_id
end AS "my_id",
"status"
from (
select
position('[' in cut_id) AS next_tag_idx,
*
from (
select
case
when id_offset = 0 then null
else substring(mystring from id_offset + 5) -- '[id] ' is 5 characters long
end AS cut_id,
*
from (
select
position('[id] ' in mystring) AS id_offset,
*
from my_list
) x
) y
) z;
(If anybody ever actually uses that query for anything, kittens will fall from the sky and splat upon the pavement, wailing in horror all the way down).
Or you can be sensible and just use a regular expression for this kind of string processing, in which case your query (assuming you only want the first [id]) is:
regress=> SELECT
"index",
coalesce((SELECT (regexp_matches(mystring, '\[id\]\s?(\d+)'))[1])::integer, 0) AS my_id,
status
FROM my_list;
 index | my_id | status
-------+-------+--------
    12 |     0 | D
    14 |     5 | A
    15 |    12 | C
    16 |   314 | E
    19 |  1211 | B
(5 rows)
Update: If you're having issues with unicode handling in regex, upgrade to Pg 9.2. See https://stackoverflow.com/a/14293924/398670
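A slightly shorter variant of the regular expression query is sketched below. It relies on substring(string from pattern), which returns the text matched by the first parenthesized group, or NULL when there is no match. It assumes standard_conforming_strings is on (the default since 9.1), so the backslashes need no doubling.

```sql
SELECT "index",
       -- NULL (no [id] tag) falls back to 0; the cast drops leading zeros
       coalesce(substring(mystring from '\[id\]\s?(\d+)')::integer, 0) AS my_id,
       status
FROM my_list;
```

Like the regexp_matches version, this only looks at the first [id] in each string.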

Pulling correct results from my PitchValues Table

I am getting a tad frustrated and was wondering if you can help:
I have a Pitch Values Table with the following Columns PitchValues_Skey, PitchType_Skey (this is a foreign key), Start Date, End Date and finally value:
For Example:
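PitchValues_Skey | PitchType_Skey | StartDate  | EndDate    | Value
-----------------+----------------+------------+------------+------
1                | 7              | 01/01/2010 | 31/12/2010 | £15
2                | 7              | 01/01/2011 | 31/12/2011 | £20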
All I want to do is update my Bookings table with how much each booking is going to cost. I put together the code below, which worked fine when I only had 2010 data, but I now have 2011 and 2012 prices as well and it will only update with the 2010 prices.
SELECT Bookings.Booking_Skey, DATEDIFF(day, Bookings.ArrivalDate, Bookings.DepartureDate) * PitchValues.Value AS BookingValue,
PitchValues.PitchType_Skey
FROM Bookings INNER JOIN
PitchValues ON Bookings.PitchType_Skey = PitchValues.PitchType_Skey
WHERE (Bookings.Booking_Skey = 1)
So when I run the query above I would expect to see one line of data but instead I see 4 (See Below)
I would expect this:
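Booking_Skey | BookingValue | PitchType_Skey
-------------+--------------+---------------
1            | 420          | 4
But I get this
Booking_Skey | BookingValue | PitchType_Skey
-------------+--------------+---------------
1            | 420          | 4
1            | 453.6        | 4
1            | 476.7        | 4
1            | 476.7        | 4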
All sorted now, thanks for your help.
SELECT Bookings.Booking_Skey, DATEDIFF(DAY, Bookings.ArrivalDate, Bookings.DepartureDate) * PitchValues.Value AS BookingValue, PitchValues.PitchType_Skey
FROM Bookings
INNER JOIN PitchValues ON Bookings.PitchType_Skey = PitchValues.PitchType_Skey
AND Bookings.ArrivalDate BETWEEN PitchValues.StartDate AND PitchValues.EndDate
WHERE (Bookings.Booking_Skey = 1)
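The extra rows appeared because the original join matched every price period for the pitch type. The BETWEEN condition keeps only the period that contains the arrival date. Since the stated goal was to update the Bookings table, the same join could feed an UPDATE along the lines of the sketch below. It assumes SQL Server (as DATEDIFF suggests) and a BookingValue column on Bookings, which is a hypothetical name not shown in the question.

```sql
-- Sketch only: Bookings.BookingValue is an assumed column name.
UPDATE b
SET    b.BookingValue = DATEDIFF(DAY, b.ArrivalDate, b.DepartureDate) * pv.Value
FROM   Bookings AS b
INNER JOIN PitchValues AS pv
        ON  b.PitchType_Skey = pv.PitchType_Skey
        AND b.ArrivalDate BETWEEN pv.StartDate AND pv.EndDate;
```

A booking that spans two price periods is charged entirely at the rate in force on its arrival date here, which may or may not be the intended rule.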