I am struggling with joining alias-named columns. Overall, I want an output with the date, hour, and the actual and forecasted (most recent before 10 am on the previous day) wind speeds.
With the below code I get:
ERROR: column "date" does not exist
LINE xx: ...ast_prep.lat AND meso.lon = forecast_prep.lon AND Date ...
I cannot figure out how to get SQL to join these named columns.
Thanks. And yes, I am a SQL newbie.
with forecast_prep as (
SELECT
date_trunc('day', foretime)::date AS Foredate,
extract(hour from foretime)+1 AS foreHE,
lat,
lon,
windspeed,
max(as_of) AS as_of
FROM weather.forecast
WHERE date_trunc('day', foretime)::date-as_of>= interval '16 hours'
GROUP BY Foredate, foreHE, lat, lon, windspeed)
SELECT
meso.station,
date_trunc('day', meso.timestmp)::date AS Date,
extract(hour from meso.timestmp)+1 AS HE,
CAST(AVG(meso.windspd) as numeric(19,2)) As Actual,
forecast_prep.windspeed,
forecast_prep.as_of
FROM weather.meso
INNER JOIN forecast_prep ON (
meso.lat = forecast_prep.lat AND
meso.lon = forecast_prep.lon AND
Date = Foredate AND ----<<<< Error here
HE = foreHE)
WHERE
(meso.timestmp Between '2016-02-01' And '2016-02-02') AND
(meso.station='KSBN')
GROUP BY meso.station, Date, HE, forecast_prep.windspeed, forecast_prep.as_of
ORDER BY Date, HE ASC
Here are the table structures:
-- Table: weather.forecast
-- DROP TABLE weather.forecast;
CREATE TABLE weather.forecast
(
foretime timestamp without time zone NOT NULL,
as_of timestamp without time zone NOT NULL, -- in UTC
summary text,
precipintensity numeric(8,4),
precipprob numeric(2,2),
temperature numeric(5,2),
apptemp numeric(5,2),
dewpoint numeric(5,2),
humidity numeric(2,2),
windspeed numeric(5,2),
windbearing numeric(4,1),
visibility numeric(5,2),
cloudcover numeric(4,2),
pressure numeric(6,2),
ozone numeric(5,2),
preciptype text,
lat numeric(8,6) NOT NULL,
lon numeric(9,6) NOT NULL,
CONSTRAINT forecast_pkey PRIMARY KEY (foretime, as_of, lat, lon)
);
-- Table: weather.meso
-- DROP TABLE weather.meso;
CREATE TABLE weather.meso
(
timestmp timestamp without time zone NOT NULL,
station text NOT NULL,
lat numeric NOT NULL,
lon numeric NOT NULL,
tmp numeric,
hum numeric,
windspd numeric,
winddir integer,
dew numeric,
CONSTRAINT meso_pkey PRIMARY KEY (timestmp, station, lat, lon)
);
The 'Date' alias can't be seen from there. You can define more than one CTE after WITH, so I'd advise you to move the second SELECT there as well. I'm not completely sure about the weather.meso table structure, but guessing from your query, this should work:
WITH
forecast_prep AS (
SELECT
date_trunc('day', foretime) :: DATE AS Foredate,
extract(HOUR FROM foretime) + 1 AS foreHE,
lat,
lon,
max(windspeed) as windspeed,
max(as_of) AS as_of
FROM weather.forecast
WHERE date_trunc('day', foretime) :: DATE - as_of >= INTERVAL '16 hours'
GROUP BY Foredate, foreHE, lat, lon
),
tmp AS (
SELECT
meso.station,
meso.lat,
meso.lon,
meso.timestmp,
date_trunc('day', meso.timestmp) :: DATE AS Date,
extract(HOUR FROM meso.timestmp) + 1 AS HE,
CAST(AVG(meso.windspd) AS NUMERIC(19, 2)) AS Actual
FROM weather.meso
GROUP BY station, lat, lon, timestmp, Date, HE
)
SELECT
tmp.station, tmp.Date, tmp.HE, tmp.Actual, forecast_prep.windspeed, forecast_prep.as_of
FROM tmp
INNER JOIN forecast_prep ON (
tmp.lat = forecast_prep.lat
AND tmp.lon = forecast_prep.lon
AND tmp.Date = forecast_prep.Foredate
AND tmp.HE = forecast_prep.foreHE
)
WHERE
(tmp.timestmp BETWEEN '2016-02-01' AND '2016-02-02')
AND (tmp.station = 'KSBN')
GROUP BY
tmp.station, tmp.Date, tmp.HE, forecast_prep.windspeed, forecast_prep.as_of, tmp.Actual
ORDER BY tmp.Date, tmp.HE ASC;
This follows the first example here: https://www.postgresql.org/docs/8.4/static/queries-with.html
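The underlying rule is that an alias defined in a SELECT list is not visible in the JOIN or WHERE clauses at the same query level; it only becomes a real column name one level up. A minimal sketch, assuming a hypothetical table t with a timestamp column ts:

```sql
-- Fails: "d" does not exist yet when the WHERE clause is evaluated
SELECT date_trunc('day', ts)::date AS d
FROM t
WHERE d = DATE '2016-02-01';

-- Works: repeat the expression, or wrap the query so the alias
-- becomes an ordinary column of a subquery/CTE
SELECT d
FROM (SELECT date_trunc('day', ts)::date AS d FROM t) sub
WHERE d = DATE '2016-02-01';
```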
Is there a way to add an extra group by to the toolkit_experimental.interpolated_average function? Say my data has power measurements for different sensors; how would I add a group by on the sensor_id?
with s as (
select sensor_id,
time_bucket('30 minutes', timestamp) bucket,
time_weight('LOCF', timestamp, value) agg
from
measurements m
inner join sensor_definition sd on m.sensor_id = sd.id
where asset_id = '<battery_id>' and sensor_name = 'power' and
timestamp between '2023-01-05 23:30:00' and '2023-01-07 00:30:00'
group by sensor_id, bucket)
select sensor_id,
bucket,
toolkit_experimental.interpolated_average(
agg,
bucket,
'30 minutes'::interval,
lag(agg) over (order by bucket),
lead(agg) over (order by bucket)
)
from s
group by sensor_id;
The above query does not work, as I'd need to add bucket and agg as group by columns as well.
You can find the relevant schemas below.
create table measurements
(
sensor_id uuid not null,
timestamp timestamp with time zone not null,
value double precision not null
);
create table sensor_definition
(
id uuid default uuid_generate_v4() not null
primary key,
asset_id uuid not null,
sensor_name varchar(256) not null,
sensor_type varchar(256) not null,
unique (asset_id, sensor_name, sensor_type)
);
Any suggestions?
This is a great question and cool use case. There's definitely a way to do this! I like your CTE at the top, though I prefer to name them a little more descriptively. The join looks good for selection and you could even quite easily then sub out the "on-the-fly" aggregation for a continuous aggregate at some point in the future and just do the same join against the continuous aggregate...so that's great!
The only thing you need to do is modify the window clause of the lead and lag functions (PARTITION BY sensor_id) so that they work over each sensor's buckets separately rather than the full ordered data set, and then you don't need a group by clause at all!
WITH weighted_sensor AS (
SELECT
sensor_id,
time_bucket('30 minutes', timestamp) bucket,
time_weight('LOCF', timestamp, value) agg
FROM
measurements m
INNER JOIN sensor_definition sd ON m.sensor_id = sd.id
WHERE asset_id = '<battery_id>' AND sensor_name = 'power' and
timestamp between '2023-01-05 23:30:00' and '2023-01-07 00:30:00'
GROUP BY sensor_id, bucket)
SELECT
sensor_id,
bucket,
toolkit_experimental.interpolated_average(
agg,
bucket,
'30 minutes'::interval,
lag(agg) OVER (PARTITION BY sensor_id ORDER BY bucket),
lead(agg) OVER (PARTITION BY sensor_id ORDER BY bucket)
)
FROM weighted_sensor;
You can also split the window clause out into a separate named clause in the query; this helps especially if you use it more than once. So if you were to use the integral function as well, for instance, to get total energy utilization in a period, you might do something like this:
WITH weighted_sensor AS (
SELECT
sensor_id,
time_bucket('30 minutes', timestamp) bucket,
time_weight('LOCF', timestamp, value) agg
FROM
measurements m
INNER JOIN sensor_definition sd ON m.sensor_id = sd.id
WHERE asset_id = '<battery_id>' AND sensor_name = 'power' and
timestamp between '2023-01-05 23:30:00' and '2023-01-07 00:30:00'
GROUP BY sensor_id, bucket)
SELECT
sensor_id,
bucket,
toolkit_experimental.interpolated_average(
agg,
bucket,
'30 minutes'::interval,
lag(agg) OVER sensor_times,
lead(agg) OVER sensor_times
),
toolkit_experimental.interpolated_integral(
agg,
bucket,
'30 minutes'::interval,
lag(agg) OVER sensor_times,
lead(agg) OVER sensor_times,
'hours'
)
FROM weighted_sensor
WINDOW sensor_times AS (PARTITION BY sensor_id ORDER BY bucket);
I used hours as the unit as I figure energy is often measured in watt-hours or the like...
I have a table in Postgres which looks like below:
CREATE TABLE my_features
(
id uuid NOT NULL,
feature_id uuid NOT NULL,
begin_time timestamptz NOT NULL,
duration integer NOT NULL
)
For each feature_id there may be multiple rows with time ranges specified by begin_time .. (begin_time + duration). duration is in milliseconds. They may overlap. I'm looking for a fast way to find all feature_ids that have any overlaps.
I have referred to this - Query Overlapping time range which is similar but works on a fixed time end time.
I have tried the below query but it is throwing an error.
Query:
select c1.*
from my_features c1
where exists (select 1
from my_features c2
where tsrange(c2.begin_time, c2.begin_time + '30 minutes'::INTERVAL, '[]') && tsrange(c1.begin_time, c1.begin_time + '30 minutes'::INTERVAL, '[]')
and c2.feature_id = c1.feature_id
and c2.id <> c1.id);
Error:
ERROR: function tsrange(timestamp with time zone, timestamp with time zone, unknown) does not exist
LINE 5: where tsrange(c2.begin_time, c2.begin_time...
I have used a fixed 30-minute interval here because I did not understand how to convert the duration column into minutes and substitute it for 'n minutes'.
If you need a solution faster than O(n²), then you can use constraints on ranges with btree_gist extension, possibly on a temporary table:
CREATE EXTENSION IF NOT EXISTS btree_gist;  -- needed for the exclusion constraint below

CREATE TEMPORARY TABLE my_features_ranges (
id uuid NOT NULL,
feature_id uuid NOT NULL,
range tstzrange NOT NULL,
EXCLUDE USING GIST (feature_id WITH =, range WITH &&)
);
INSERT INTO my_features_ranges (id, feature_id, range)
select id, feature_id, tstzrange(begin_time, begin_time+duration*'1ms'::interval)
from my_features
on conflict do nothing;
-- ids that failed to insert (rejected by the exclusion constraint) are the overlapping ones
select id from my_features except select id from my_features_ranges;
Using OVERLAPS predicate:
SELECT * -- DISTINCT f1.*
FROM my_features f1
JOIN my_features f2
ON f1.feature_id = f2.feature_id
AND f1.id <> f2.id
AND (f1.begin_time, f1.begin_time + '30 minutes'::INTERVAL)
OVERLAPS (f2.begin_time, f2.begin_time + '30 minutes'::INTERVAL);
db<>fiddle demo
Or try this
select c1.*
from jak.my_features c1
where exists (select 1
from jak.my_features c2
where tsrange(c2.begin_time::date, c2.begin_time::date + '30 minutes'::INTERVAL, '[]') && tsrange(c1.begin_time::date, c1.begin_time::date + '30 minutes'::INTERVAL, '[]') and
c2.feature_id = c1.feature_id
and c2.id <> c1.id);
The problem was that I was using tsrange on a timestamp with time zone column; for timestamp with time zone there exists a separate function called tstzrange.
Below worked for me:
EDIT: Added changes suggested by @a_horse_with_no_name
select c1.*
from my_features c1
where exists (select 1
from my_features c2
where tstzrange(c2.begin_time, c2.begin_time + make_interval(secs => c2.duration / 1000), '[]') && tstzrange(c1.begin_time, c1.begin_time + make_interval(secs => c1.duration / 1000), '[]')
and c2.feature_id = c1.feature_id
and c2.id <> c1.id);
With make_interval, the interval is now calculated dynamically from the duration column, which was the part that was still pending.
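For reference, converting a millisecond duration column into an interval can be sketched in two equivalent ways (the second form avoids the integer division in duration / 1000, which silently truncates sub-second parts):

```sql
-- make_interval takes seconds as a double precision named argument
SELECT make_interval(secs => duration / 1000.0) FROM my_features;

-- or simply scale a unit interval
SELECT duration * INTERVAL '1 millisecond' FROM my_features;
```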
I want to get values of TIMESTAMP_S and STRVALUE based on selected IDs.
Suppose my selected IDs are 4226 and 4311.
Then it should select TIMESTAMP_S and STRVALUE for the selected IDs using a CASE statement.
I have tried the query below, but it returns an error.
CREATE TABLE "DRL_FTO3_DI1_A0"
(
"VARIABLE" integer,
"CALCULATION" integer,
"TIMESTAMP_S" integer,
"TIMESTAMP_MS" integer,
"VALUE" double precision,
"STATUS" integer,
"GUID" character(36),
"STRVALUE" character varying(50)
)
INSERT INTO "DRL_FTO3_DI1_A0"(
"VARIABLE", "CALCULATION", "TIMESTAMP_S", "TIMESTAMP_MS", "VALUE",
"STATUS", "GUID", "STRVALUE")
VALUES (4226, 0, 1451120925, 329, 0, 1078067200, '', 'BATCH 1'),
       (4306, 0, 1451120925, 329, 0, 1078067200, '', 'BATCH 2'),
       (4311, 0, 1451120925, 329, 0, 1078067200, '', '2')
Now suppose that out of the three variables (4226, 4306, 4311) I want to select 4226 and 4311:
SELECT ((TIMESTAMP WITHOUT Time Zone 'epoch' + "TIMESTAMP_S" * INTERVAL '1 second') AT TIME ZONE 'UTC')::TIMESTAMP WITHOUT Time Zone,
SUM(CASE WHEN "VARIABLE" = 4226 Then "STRVALUE" END) as 'A',
SUM(CASE WHEN "VARIABLE" = 4311 Then "STRVALUE" END) as 'B'
FROM "DRL_FTO3_DI1_A0"
GROUP BY "TIMESTAMP_S"
ORDER BY "TIMESTAMP_S";
TIMESTAMP_S A B
2015-12-26 14:38:45 BATCH_1 2
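For what it's worth, the conditional-aggregation version can be made to work without crosstab: the error comes from applying SUM to a varchar column and from quoting the output aliases with single quotes. A sketch of a corrected query (MAX is defined for text, and column aliases take double quotes):

```sql
SELECT ((TIMESTAMP WITHOUT TIME ZONE 'epoch'
         + "TIMESTAMP_S" * INTERVAL '1 second') AT TIME ZONE 'UTC')::TIMESTAMP WITHOUT TIME ZONE,
       MAX(CASE WHEN "VARIABLE" = 4226 THEN "STRVALUE" END) AS "A",
       MAX(CASE WHEN "VARIABLE" = 4311 THEN "STRVALUE" END) AS "B"
FROM "DRL_FTO3_DI1_A0"
GROUP BY "TIMESTAMP_S"
ORDER BY "TIMESTAMP_S";
```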
This is the query using crosstab, and it works:
SELECT *
FROM crosstab (
$$SELECT "VARIABLE", "TIMESTAMP_S", "STRVALUE"
FROM "DRL_FTO3_DI1_A0"
WHERE "VARIABLE" = ANY (array[4306,4226])
ORDER BY 1,2$$
)
AS
t (
"TIMESTAMP_S" integer,
"A" character varying,
"B" character varying
);
I have the following query.
SELECT *
FROM (SELECT temp.*, ROWNUM AS rn
FROM ( SELECT (id) M_ID,
CREATION_DATE,
RECIPIENT_STATUS,
PARENT_OR_CHILD,
CHILD_COUNT,
IS_PICKABLE,
IS_GOLDEN,
trxn_id,
id AS id,
MASTER_ID,
request_wf_state,
TITLE,
FIRST_NAME,
MIDDLE,
LAST_NAME,
FULL_NAME_LNF,
FULL_NAME_FNF,
NAME_OF_ORGANIZATION,
ADDRESS,
CITY,
STATE,
COUNTRY,
HCP_TYPE,
HCP_SUBTYPE,
is_edit_locked,
record_type rec_type,
DATA_SOURCE_NAME,
DEA_DATA,
NPI_DATA,
STATE_DATA,
RPPS,
SIREN_NUMBER,
FINESS,
ROW_NUMBER ()
OVER (PARTITION BY id ORDER BY full_name_fnf)
AS rp
FROM V_RECIPIENT_TRANS_SCRN_OP
WHERE 1 = 1
AND creation_date >=
to_date( '01-Sep-2015', 'DD-MON-YYYY') AND creation_date <=
to_date( '09-Sep-2015', 'DD-MON-YYYY')
ORDER BY CREATION_DATE DESC) temp
WHERE rp = 1)
WHERE rn > 0 AND rn < 10;
The issue is that the above query does not return data which has a creation_date of '09-Sep-2015'.
NLS_DATE_FORMAT of my database is 'DD-MON-RR'.
Datatype of the column creation_date is date and the date format in which date is stored is MM/DD/YYYY.
Since your column creation_date has values with non-zero time components, and the result of to_date( '09-Sep-2015', 'DD-MON-YYYY') has a zero time component, the predicate creation_date <= to_date( '09-Sep-2015', 'DD-MON-YYYY') is unlikely to match. As an example, "9/9/2015 1:07:45 AM" is clearly greater than "9/9/2015 0:00:00 AM", which is returned by your to_date() call.
You will need to take into account the time component of the Oracle DATE data type.
One option is to use the trunc() function, as you did, to remove the time component from values of creation_date. However, this may prevent the use of index on creation_date if it exists.
A better alternative, in my view, would be to reformulate your predicate as creation_date < to_date( '10-Sep-2015', 'DD-MON-YYYY'), which would match any time values on the date of 09-Sep-2015.
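Combining both bounds, the filter becomes a half-open range that covers every time of day on 09-Sep-2015 and can still use an index on creation_date; a sketch:

```sql
WHERE creation_date >= to_date('01-Sep-2015', 'DD-MON-YYYY')
  AND creation_date <  to_date('10-Sep-2015', 'DD-MON-YYYY')
```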
I need an example of a cursor for my meter system, where the system reads the meter every month.
The cursor needs to check that every meter has a reading registered in the current year. For meters with missing readings, an estimated value is added, such that the daily consumption equals the daily consumption in the previous period plus 15%. If no previous period exists, the above kWh value is used.
How about something like this? (The MonthSeed table could become a real table in your database.)
declare @MonthSeed table (MonthNumber int)
insert into @MonthSeed values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)
-- assumes declared table "Reading" with fields ( Id int, [Date] datetime, MeterNo varchar(50), Consumption int )
select
m.MeterNo,
r.Date,
calculatedConsumption = isnull(r.Consumption, -- read consumption
isnull((select max(r2.Consumption) Consumption from Reading r2 where datepart(month, r2.Date) = (m.MonthNumber - 1) and r2.MeterNo = m.MeterNo) * 1.15, -- previous consumption + 15%
9999)) -- default consumption
from
(select distinct
MeterNo,
MonthNumber
from
Reading, @MonthSeed) m
left join
Reading r on r.MeterNo = m.MeterNo and datepart(month, r.Date) = m.monthNumber
EDIT FOLLOWING COMMENTS - EXAMPLE OF ADDING MISSING READINGS
As commented, we need to include an insert before the select (insert into Reading (MeterNo, Date, Consumption)) and, making use of the left join to the Reading table, add a check that the reading id is null, i.e. missing: where r.Id is null.
I noticed that this would result in null date entries when inserting into the Reading table, so I included a date expression in the main sub-select, Date = dateadd(month, MonthNumber, @SeedDate); the main select was amended to show a date for missing entries: isnull(r.Date, m.Date).
I've calculated @SeedDate to be the 1st of the current month one year ago, but you may want to pass in an earlier date.
declare @MonthSeed table (MonthNumber int)
insert into @MonthSeed values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)
-- assumes declared table "Reading" with fields ( Id int, [Date] datetime, MeterNo varchar(50), Consumption int )
declare @SeedDate datetime = (select dateadd(month, datediff(month, 0, getdate())-12, 0)) -- this month, last year
insert into Reading (MeterNo, Date, Consumption)
select
m.MeterNo,
isnull(r.Date, m.Date),
calculatedConsumption =
isnull(r.Consumption, -- read consumption
isnull(1.15 * (select max(r2.Consumption) Consumption
from Reading r2
where datepart(month, r2.Date) = (m.MonthNumber - 1)
and r2.MeterNo = m.MeterNo), -- previous consumption + 15%
9999)) -- default consumption
from
(select distinct
MeterNo,
MonthNumber,
Date = dateadd(month, MonthNumber, @SeedDate)
from
Reading
cross join
@MonthSeed) m
left join
Reading r on r.MeterNo = m.MeterNo and datepart(month, r.Date) = m.monthNumber
where
r.Id is null
select * from Reading
(The following assumes SQL Server 2005 or later.)
Scrounge around in here and see if there's anything of value:
declare @StartDate as Date = '2012-01-01'
declare @Now as Date = GetDate()
declare @DefaultConsumption as Int = 2000 -- KWh.
declare @MeterReadings as Table
( MeterReadingId Int Identity, ReadingDate Date, MeterNumber VarChar(10), Consumption Int )
insert into @MeterReadings ( ReadingDate, MeterNumber, Consumption ) values
( '2012-01-13', 'E154', 2710 ),
( '2012-01-19', 'BR549', 650 ),
( '2012-02-15', 'E154', 2970 ),
( '2012-02-19', 'BR549', 618 ),
( '2012-03-16', 'BR549', 758 ),
( '2012-04-11', 'E154', 2633 ),
( '2012-04-20', 'BR549', 691 )
; with Months ( Month ) as (
select @StartDate as [Month]
union all
select DateAdd( mm, 1, Month )
from Months
where Month < @Now
),
MeterNumbers ( MeterNumber ) as (
select distinct MeterNumber
from @MeterReadings )
select M.Month, MN.MeterNumber,
MR.MeterReadingId, MR.ReadingDate, MR.Consumption,
Coalesce( MR.Consumption, @DefaultConsumption ) as [BillableConsumption],
( select Max( ReadingDate ) from @MeterReadings where MeterNumber = MN.MeterNumber and ReadingDate < M.Month ) as [PriorReadingDate],
( select Consumption from @MeterReadings where MeterNumber = MN.MeterNumber and ReadingDate =
( select Max( ReadingDate ) from @MeterReadings where MeterNumber = MN.MeterNumber and ReadingDate < M.Month ) ) as [PriorConsumption],
( select Consumption from @MeterReadings where MeterNumber = MN.MeterNumber and ReadingDate =
( select Max( ReadingDate ) from @MeterReadings where MeterNumber = MN.MeterNumber and ReadingDate < M.Month ) ) * 1.15 as [PriorConsumptionPlus15Percent]
from Months as M cross join
MeterNumbers as MN left outer join
@MeterReadings as MR on MR.MeterNumber = MN.MeterNumber and DateAdd( dd, 1 - DatePart( dd, MR.ReadingDate ), MR.ReadingDate ) = M.Month
order by M.Month, MN.MeterNumber