Include value from CTE when it has no match - PostgreSQL

In my table I have some entries which - by the table's date column - are no older than 2016-01-04 (January 4, 2016).
Now I would like to write a query that counts the number of rows having a specific date value, but I'd like this query to be able to return a 0 count for dates not present in the table.
I have this:
with date_count as (
    select '2016-01-01'::date + CAST(offs || ' days' as interval) as date
    from generate_series(0, 6, 1) AS offs
)
select date_count.date, count(allocation_id) as packs_used
from medicine_allocation, date_count
where site_id = 1
  and allocation_id is not null
  and timestamp between date_count.date and date_count.date + interval '1 days'
group by date_count.date
order by date_count.date;
This surely gives me a nice aggregated view of the dates in my table, but since no rows are from before January 4, 2016, those dates don't show up in the result:
"2016-01-04 00:00:00";1
"2016-01-05 00:00:00";2
"2016-01-06 00:00:00";4
"2016-01-07 00:00:00";3
I would like this:
"2016-01-01 00:00:00";0
"2016-01-02 00:00:00";0
"2016-01-03 00:00:00";0
"2016-01-04 00:00:00";1
"2016-01-05 00:00:00";2
"2016-01-06 00:00:00";4
"2016-01-07 00:00:00";3
I have also tried a right join on the CTE, but this yields the same result. I cannot quite grasp how to do this... any help out there?
Best,
Janus

You simply need a left join:
with date_count as (
    select '2016-01-01'::date + CAST(offs || ' days' as interval) as date
    from generate_series(0, 6, 1) AS offs
)
select dc.date, count(ma.allocation_id) as packs_used
from date_count dc left join
     medicine_allocation ma
     on ma.site_id = 1 and
        ma.allocation_id is not null and
        ma.timestamp between dc.date and dc.date + interval '1 days'
group by dc.date
order by dc.date;
A word of advice: Never use commas in the FROM clause. Always use explicit JOIN syntax.
You will also notice that the where conditions were moved to the ON clause. That is necessary because they filter the second table: left in a WHERE clause, they would discard the NULL-extended rows and effectively turn the outer join back into an inner join.
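For contrast, here is a sketch of the broken variant (same tables as in the question) with the filters kept in WHERE; the January 1-3 rows would vanish again because their ma columns are NULL:
select dc.date, count(ma.allocation_id) as packs_used
from date_count dc left join
     medicine_allocation ma
     on ma.timestamp between dc.date and dc.date + interval '1 days'
where ma.site_id = 1                -- NULL-extended rows fail this test
  and ma.allocation_id is not null  -- and this one, so they are dropped
group by dc.date
order by dc.date;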

Related

T-SQL apply where clause to function fields not working

I'm having a hard time filtering this view by CreateDate. The CreateDate in the table is in the following format: 2013-10-14 15:53:33.900
I managed to DATEPART the year, month and day into separate columns, but now it's not letting me use my WHERE clause on those newly created columns. Specifically, the error is "Invalid Column Name CreateYear" for both lines. What am I doing wrong here, guys? Is there a better/easier way to do this than parsing out the day, month, and year? It seems like overkill. I've spent quite a few hours on this to no avail.
SELECT convert(varchar, DATEPART(month, v.CreateDate)) CreateMonth,
       convert(varchar, DATEPART(DAY, v.CreateDate)) CreateDay,
       convert(varchar, DATEPART(YEAR, v.CreateDate)) CreateYear,
       v.CreateDate,
       v.customerName
FROM vw_Name_SQL_DailyPartsUsage v
FULL OUTER JOIN ABC.serviceteamstechnicians t ON v.TechnicianNumber = t.AgentNumber
FULL OUTER JOIN ABC.ServiceTeams s ON t.STID = s.STID
WHERE CreateYear >= '02/01/2018'
  AND CreateYear <= '02/20/2018'
You cannot reference an alias from the SELECT in the WHERE clause.
Even if you could, why would you expect a year to equal '02/01/2018'?
And why are you converting to varchar?
where year(v.CreateDate) = 2018
or
select crdate, cast(crdate as date), year(crdate), month(crdate), day(crdate)
from sysObjects
where cast(crdate as date) <= '2014-2-20'
and cast(crdate as date) >= '2000-2-10'
order by crdate
You could use:
SELECT convert(varchar, DATEPART(month, v.CreateDate)) CreateMonth,
       convert(varchar, DATEPART(DAY, v.CreateDate)) CreateDay,
       convert(varchar, DATEPART(YEAR, v.CreateDate)) CreateYear,
       v.CreateDate,
       v.customerName
FROM vw_Name_SQL_DailyPartsUsage v
FULL OUTER JOIN ABC.serviceteamstechnicians t ON v.TechnicianNumber = t.AgentNumber
FULL OUTER JOIN ABC.ServiceTeams s ON t.STID = s.STID
WHERE v.CreateDate BETWEEN '20180201' and '20180220';
More on logical query processing: you cannot refer to a column alias defined in the SELECT list from the WHERE clause, because the WHERE clause is evaluated before the SELECT list; you need a subquery or CROSS APPLY to make the alias visible.
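For example, a sketch of the CROSS APPLY variant, reusing the view and column names from the question (the rest is illustrative):
SELECT d.CreateMonth, d.CreateDay, d.CreateYear, v.CreateDate, v.customerName
FROM vw_Name_SQL_DailyPartsUsage v
CROSS APPLY (SELECT DATEPART(YEAR, v.CreateDate) AS CreateYear,
                    DATEPART(MONTH, v.CreateDate) AS CreateMonth,
                    DATEPART(DAY, v.CreateDate) AS CreateDay) d  -- aliases computed here...
WHERE d.CreateYear = 2018;                                       -- ...are visible here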

PL/SQL group by with range table join is not showing proper count

Hi, I am joining a table with a range of one month of days, to get the per-day count based on the joined (base) table.
For that I am using a left outer join to get the count per day,
where my base table is named REGISTRIERUNG.
And I have created the range of one month using the query below:
SELECT TO_DATE ('01-10-2017', 'dd-mm-yyyy') + ROWNUM - 1 AS daterange
FROM all_objects
WHERE ROWNUM <= TO_DATE ('30-10-2017', 'dd-mm-yyyy')
              - TO_DATE ('01-10-2017', 'dd-mm-yyyy')
              + 1;
But I am getting a count of 1 for dates that have no record matching the range table, instead of a count of 0.
I am using the query below for the final result:
SELECT TRUNC (a.daterange), COUNT (a.daterange)
FROM (SELECT TO_DATE ('01-10-2017', 'dd-mm-yyyy') + ROWNUM - 1 AS daterange
      FROM all_objects
      WHERE ROWNUM <= TO_DATE ('30-10-2017', 'dd-mm-yyyy')
                    - TO_DATE ('01-10-2017', 'dd-mm-yyyy')
                    + 1) a
LEFT OUTER JOIN REGISTRIERUNG b
  ON TRUNC (a.daterange) = TRUNC (b.MODIFIKATIONZEIT)
GROUP BY TRUNC (a.daterange)
ORDER BY TRUNC (a.daterange) ASC;
You should not count rows based on a column that is always populated (in your query, a.daterange is always populated, because that column from your inline view carries all the dates in the month). Rather, you should count rows of the table that is outer-joined to the inline view with the generated dates. The COUNT function ignores rows that have a NULL value in the column modifikationzeit, which is exactly what yields 0 for days without a match.
For instance:
select a.daterange,
       count(b.modifikationzeit)
from (select to_date('01-10-2017', 'dd-mm-yyyy') + level - 1 as daterange
      from dual
      connect by level <= to_date('31-10-2017', 'dd-mm-yyyy') -
                          to_date('01-10-2017', 'dd-mm-yyyy') + 1) a
left outer join registrierung b
  on a.daterange = trunc(b.modifikationzeit)
group by a.daterange
order by a.daterange;
I have removed the unnecessary TRUNCs and converted the query from all_objects to one that uses the CONNECT BY clause. I also fixed the date generation for October: it has 31 days, not 30 as in your example.

Postgres SQL - How to create a dynamic date variable

I want my query to have a dynamic date. The way it is written now, I would have to manually change the date every time. Please see the following as an example:
(select *
 from table2
 where table2.begin_timestamp::date = '2015-04-01') as start
left outer join
(select *
 from table1
 where opened_at::date >= ('2015-04-01' - 15)
   and opened_at::date <= '2015-04-01')
I don't want '2015-04-01' to be hard-coded. I want to run this query over and over for a series of dates.
Using normal joins, you can do this in an on clause or where clause but not inside the subquery. That leads to logic like this:
from (select *
      from table2
     ) start left outer join
     table1
     on opened_at::date >= start.begin_timestamp::date - interval '15 day' and
        opened_at::date <= start.begin_timestamp::date
I'm not a Postgres developer but I think you can adapt a technique from the SQL Server world called "tally tables".
Essentially, your goal is to join day d and the window of days that are at most 15 days greater than it.
You can use something like
SELECT * FROM generate_series('2015-04-01'::timestamp,
                              '2015-04-30 00:00', '1 days');
to generate a date sequence, and from there you can write something like:
select *
from table2 a
join generate_series('2015-04-01'::timestamp, '2015-04-30', '1 days') s(o)
  on a.begin_timestamp::date = s.o
join table1 b
  on b.opened_at::date >= a.begin_timestamp::date - interval '15 days'
 and b.opened_at::date <= a.begin_timestamp::date
Essentially, instead of looping you use a series of the dates between the beginning and the end of the range to produce the results you are after.
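If you literally want to "run this query over and over for a series of dates", the anchor dates themselves can come from generate_series too. A hedged sketch (table1/table2 and their columns follow the question; the date range is illustrative):
with anchor_dates as (
    select d::date as anchor
    from generate_series('2015-04-01'::date, '2015-04-30', interval '1 day') d
)
select ad.anchor, t1.*
from anchor_dates ad
join table2 t2 on t2.begin_timestamp::date = ad.anchor
left join table1 t1
  on t1.opened_at::date between ad.anchor - 15 and ad.anchor;  -- the 15-day window per anchor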

How do I SELECT a number to put into INTERVAL as part of date addition function in PostgreSQL

I have a table which contains a timezone offset, +1, -7, +5 etc.
I have another, joinable, table which contains logging information including a datetime.
I would like to select out the datetime from the second table with the offset added on to it using for example, INTERVAL '1 hours', INTERVAL '-7 hours' etc.
My pseudocode would be something like:
SELECT l.status, l.inserted + INTERVAL (coalesce(timezone_offset, '0')||' hours' ) AS inserted FROM log l
LEFT OUTER JOIN users u on u.usersid = l.usersid
LEFT OUTER JOIN companies c on c.companiesid = u.companiesid
WHERE l.usersid=?
This doesn't work, but I can't figure out how to make it work in PostgreSQL 8.3.
Any ideas?
It turns out you can multiply an INTERVAL by a number. So if I create an INTERVAL of '1 hours' and multiply that by the stored offset, that will work, yielding:
SELECT l.status, l.inserted + (INTERVAL '1 hours' * coalesce(timezone_offset, '0')) FROM log as l
LEFT OUTER JOIN users u on u.usersid = l.usersid
LEFT OUTER JOIN companies c on c.companiesid = u.companiesid
WHERE l.usersid=?
et voila!
You can also use concatenation, so l.inserted + (timezone_offset || ' hours')::INTERVAL will also work.
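Both forms can be sanity-checked side by side with a constant (a quick sketch, with a literal 5 standing in for the stored timezone_offset):
SELECT now() + INTERVAL '1 hours' * 5 AS multiplied,      -- interval * number
       now() + (5 || ' hours')::INTERVAL AS concatenated; -- text concatenation + cast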

Postgresql SQL GROUP BY time interval with arbitrary accuracy (down to milliseconds)

I have my measurement data stored into the following structure:
CREATE TABLE measurements(
measured_at TIMESTAMPTZ,
val INTEGER
);
I already know that using
(a) date_trunc('hour', measured_at)
and
(b) generate_series
I would be able to aggregate my data by microseconds, milliseconds, and so on.
But is it possible to aggregate the data by 5 minutes, or, say, by an arbitrary number (multiple) of seconds?
I need the data aggregated at different time resolutions to feed into an FFT or an AR model, in order to look for possible seasonalities.
You can generate a table of "buckets" by adding intervals created by generate_series(). This SQL statement will generate a table of five-minute buckets for the first day (the value of min(measured_at)) in your data.
select
(select min(measured_at)::date from measurements) + ( n || ' minutes')::interval start_time,
(select min(measured_at)::date from measurements) + ((n+5) || ' minutes')::interval end_time
from generate_series(0, (24*60), 5) n
Wrap that statement in a common table expression, and you can join and group on it as if it were a base table.
with five_min_intervals as (
select
(select min(measured_at)::date from measurements) + ( n || ' minutes')::interval start_time,
(select min(measured_at)::date from measurements) + ((n+5) || ' minutes')::interval end_time
from generate_series(0, (24*60), 5) n
)
select f.start_time, f.end_time, avg(m.val) avg_val
from measurements m
right join five_min_intervals f
on m.measured_at >= f.start_time and m.measured_at < f.end_time
group by f.start_time, f.end_time
order by f.start_time
Grouping by an arbitrary number of seconds is similar; date_trunc() alone only handles whole time units, so generate the buckets with a step of seconds instead of minutes.
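For instance, a sketch of the same bucket generation with a 7-second step (any number of seconds works the same way):
select (select min(measured_at)::date from measurements) + (n || ' seconds')::interval as start_time,
       (select min(measured_at)::date from measurements) + ((n + 7) || ' seconds')::interval as end_time
from generate_series(0, 24*60*60, 7) n;  -- one day of 7-second buckets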
A more general use of generate_series() lets you avoid guessing the upper limit for five-minute buckets. In practice, you'd probably build this as a view or a function. You might get better performance from a base table.
select
(select min(measured_at)::date from measurements) + ( n || ' minutes')::interval start_time,
(select min(measured_at)::date from measurements) + ((n+5) || ' minutes')::interval end_time
from generate_series(0, ((select max(measured_at)::date - min(measured_at)::date from measurements) + 1)*24*60, 5) n;
Catcall has a great answer. My example of using it demonstrates fixed buckets - in this case 30-minute intervals starting at midnight. It also shows that one extra bucket can be generated in Catcall's first version, and how to eliminate it. I wanted exactly 48 buckets in a day. In my problem, observations have separate date and time columns and I want to average the observations within each 30-minute period across the month for a number of different services.
with intervals as (
    select (n || ' minutes')::interval as start_time,
           ((n + 30) || ' minutes')::interval as end_time
    from generate_series(0, (23*60 + 30), 30) n
)
select i.start_time, o.service, avg(o.o)
from observations o right join intervals i
  on o.time::interval >= i.start_time
 and o.time::interval < i.end_time
 and o.date between '2013-01-01' and '2013-01-31'
group by i.start_time, i.end_time, o.service
order by i.start_time
How about
SELECT MIN(val),
       floor(EXTRACT(epoch FROM measured_at) / EXTRACT(epoch FROM INTERVAL '5 min')) AS int
FROM measurements
GROUP BY int
where '5 min' can be any expression supported by INTERVAL. (The floor() matters: without it the division yields a fractional key, and nearly every row lands in its own group.)
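If you want a readable bucket label rather than a bare group number, the same arithmetic can be turned back into a timestamp (a sketch, assuming 5-minute = 300-second buckets):
SELECT to_timestamp(floor(extract(epoch FROM measured_at) / 300) * 300) AS bucket_start,
       min(val)
FROM measurements
GROUP BY bucket_start;  -- Postgres allows grouping by the output alias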
The following will give you buckets of any size, even if they don't align well with a nice minute/hour/whatever boundary. The value 300 is for a 5-minute grouping, but any value can be substituted:
select measured_at,
val,
(date_trunc('seconds', (measured_at - timestamptz 'epoch') / 300) * 300 + timestamptz 'epoch') as aligned_measured_at
from measurements;
You can then use whatever aggregate you need around "val", and use "group by aligned_measured_at" as required.
This is based on Mike Sherrill's answer, except that it uses timestamp ranges (tstzrange) instead of separate start/end columns.
with intervals as (
    select tstzrange(s, s + '5 minutes') das_interval
    from (select generate_series(min(lower(time_range)), max(upper(time_range)), '5 minutes') s
          from your_table) x)
select das_interval, your_table.*
from your_table
right join intervals on time_range && das_interval
order by das_interval;
From PostgreSQL v14 on, you can use the date_bin function for that:
SELECT date_bin(
INTERVAL '5 minutes',
measured_at,
TIMESTAMPTZ '2000-01-01'
),
sum(val)
FROM measurements
GROUP BY 1;
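The third argument is the origin the buckets are aligned to; shifting it shifts every bucket boundary. The examples from the PostgreSQL documentation illustrate this:
SELECT date_bin('15 minutes', TIMESTAMP '2020-02-11 15:44:17', TIMESTAMP '2001-01-01');
-- 2020-02-11 15:30:00
SELECT date_bin('15 minutes', TIMESTAMP '2020-02-11 15:44:17', TIMESTAMP '2001-01-01 00:02:30');
-- 2020-02-11 15:32:30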
I wanted to look at the past 24 hours of data and count things in hourly increments. I started with Cat Recall's solution, which is pretty slick. It's bound to the data, though, rather than to just what's happened in the past 24 hours. So I refactored and ended up with something pretty close to Julian's solution, but with more CTEs. So it's sort of the marriage of the two answers.
WITH interval_query AS (
SELECT (ts ||' hour')::INTERVAL AS hour_interval
FROM generate_series(0,23) AS ts
), time_series AS (
SELECT date_trunc('hour', now()) + INTERVAL '60 min' * ROUND(date_part('minute', now()) / 60.0) - interval_query.hour_interval AS start_time
FROM interval_query
), time_intervals AS (
SELECT start_time, start_time + '1 hour'::INTERVAL AS end_time
FROM time_series ORDER BY start_time
), reading_counts AS (
SELECT f.start_time, f.end_time, br.minor, count(br.id) readings
FROM beacon_readings br
RIGHT JOIN time_intervals f
ON br.reading_timestamp >= f.start_time AND br.reading_timestamp < f.end_time AND br.major = 4
GROUP BY f.start_time, f.end_time, br.minor
ORDER BY f.start_time, br.minor
)
SELECT * FROM reading_counts
Note that any additional limiting I wanted in the final query needed to be done in the RIGHT JOIN's ON clause. I'm not suggesting this is necessarily the best (or even a good) approach, but it is something I'm running with (at least at the moment) in a dashboard.
The Timescale extension for PostgreSQL gives the ability to group by arbitrary time intervals. The function is called time_bucket() and has the same syntax as date_trunc(), but takes an interval instead of a time precision as its first parameter (see the TimescaleDB API docs). This is an example:
SELECT
time_bucket('5 minutes', observation_time) as bucket,
device_id,
avg(metric) as metric_avg,
max(metric) - min(metric) as metric_spread
FROM
device_readings
GROUP BY bucket, device_id;
You may also want to take a look at continuous aggregate views if you want the grouped-by-interval views to be updated automatically with newly ingested data and if you will query these views frequently. This can save you a lot of resources and make your queries a lot faster.
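A hedged sketch of such a continuous aggregate, assuming device_readings from the example above is already a hypertable (syntax as in TimescaleDB 2.x; the view name is made up):
CREATE MATERIALIZED VIEW device_readings_5m
WITH (timescaledb.continuous) AS
SELECT time_bucket('5 minutes', observation_time) AS bucket,
       device_id,
       avg(metric) AS metric_avg
FROM device_readings
GROUP BY bucket, device_id;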
I've taken a synthesis of all the above to try and come up with something slightly easier to use:
create or replace function interval_generator(start_ts timestamp with TIME ZONE, end_ts timestamp with TIME ZONE, round_interval INTERVAL)
returns TABLE(start_time timestamp with TIME ZONE, end_time timestamp with TIME ZONE) as $$
BEGIN
return query
SELECT
(n) start_time,
(n + round_interval) end_time
FROM generate_series(date_trunc('minute', start_ts), end_ts, round_interval) n;
END
$$
LANGUAGE 'plpgsql';
This function is a timestamp abstraction of Mike's answer, which (IMO) makes things a little cleaner, especially if you're generating queries on the client end.
Also using an inner join gets rid of the sea of NULLs that appeared previously.
with intervals as (select * from interval_generator(NOW() - INTERVAL '24 hours' , NOW(), '30 seconds'::INTERVAL))
select f.start_time, m.session_id, m.metric, min(m.value) min_val, avg(m.value) avg_val, max(m.value) max_val
from ts_combined as m
inner JOIN intervals f
on m.time >= f.start_time and m.time < f.end_time
GROUP BY f.start_time, f.end_time, m.metric, m.session_id
ORDER BY f.start_time desc
(Also for my purposes I added in a few more aggregation fields)
Perhaps you can extract(epoch from measured_at) and go from there?