Selecting by ids with the precalculated order - postgresql

Here's the raw query I feed to the ORM I'm using:
SELECT
"fightEventId"
FROM
${EDBTableNames.LOCATION_TIME_SLOT} AS lts
WHERE
"fightId" IN (
SELECT
f.id
FROM
${EDBTableNames.FIGHTS} AS f
WHERE
f.status = '${EFightStatus.CONFIRMED}'
)
AND "fightEventId" IN (
SELECT
fe.id
FROM
${EDBTableNames.FIGHT_EVENTS} AS fe
WHERE
${status.includes(EFightEventStatus.ONGOING)}
AND (
NOW() at time zone 'utc' >= fe.from AND NOW() at time zone 'utc' <= fe.to
)
OR ${status.includes(EFightEventStatus.UPCOMING)} AND NOW() at time zone 'utc' <= fe.to
OR ${status.includes(EFightEventStatus.FINISHED)} AND NOW() at time zone 'utc' > fe.to
ORDER BY fe."from" ASC
)
GROUP BY "fightEventId"
HAVING
COUNT("fightId") > ${SHOW_WITH_NUMBER_OF_FIGHTS}
LIMIT ${limit}
OFFSET ${page * limit};
The problem with this query is that even though fight events are ordered by the "from" date: ORDER BY fe."from" ASC, this subquery order is not maintained in the whole query. I need it to be maintained.
What would be the right way to do this? By the "right way" I mean performance and clarity.
There are a bunch of options (see "ORDER BY the IN value list"), but I'm a little confused as to which one to go for.
P.S.
SHOW_WITH_NUMBER_OF_FIGHTS is an integer and as of now it's required to be equal to 4.

Related

Postgresql generate series with interval '15 minutes' longer than 29092 items

Sut:
create table meter.materialized_quarters
(
id int4 not null generated by default as identity,
tm timestamp without time zone
,constraint pk_materialized_quarters primary key (id)
--,constraint uq_materialized_quarters unique (tm)
);
Then setup data:
insert into meter.materialized_quarters (tm)
select GENERATE_SERIES ('1999-01-01', '2030-10-30', interval '15 minute');
And check data:
select count(*),tm
from meter.materialized_quarters
group by tm
having count(*)> 1
Some results:
count|tm |
-----+-----------------------+
2|1999-10-31 02:00:00.000|
2|1999-10-31 02:15:00.000|
2|1999-10-31 02:30:00.000|
2|1999-10-31 02:45:00.000|
2|2000-10-29 02:00:00.000|
2|2000-10-29 02:15:00.000|
2|2000-10-29 02:30:00.000|
2|2000-10-29 02:45:00.000|
2|2001-10-28 02:00:00.000|
2|2001-10-28 02:15:00.000|
2|2001-10-28 02:30:00.000|
....
Details:
select * from meter.materialized_quarters where tm = '1999-10-31 01:45:00';
Result:
id |tm |
-----+-----------------------+
29092|1999-10-31 01:45:00.000|
As far as I can see, 29092 is the longest run of non-duplicated rows that GENERATE_SERIES produces with a 15-minute interval.
How can I fill the table (meter.materialized_quarters) from 1999 to 2030?
One solution is:
insert into meter.materialized_quarters (tm)
select GENERATE_SERIES ('1999-01-01', '1999-10-31 01:45:00', interval '15 minute');
then:
insert into meter.materialized_quarters (tm)
select GENERATE_SERIES ('1999-10-31 02:00:00.000', '2000-10-29 00:00:00.000', interval '15 minute');
and again, and again.
Or
with bad as (
select count(*),tm
from meter.materialized_quarters
group by tm
having count(*)> 1
)
, ids as (
select mq1.id, mq2.id as iddel
from meter.materialized_quarters mq1 inner join bad on bad.tm = mq1.tm inner join meter.materialized_quarters mq2 on bad.tm = mq2.tm
where mq1.id<mq2.id
)
delete from meter.materialized_quarters
where id in (select iddel from ids);
Is there a more elegant way?
EDIT.
I see the problem.
xxxx-10-29 02:00:00 is when summer time becomes winter time.
select GENERATE_SERIES ('1999-10-31 01:45:00', '1999-10-31 02:00:00', interval '15 minute');
Your problem is the conversion from timestamp WITH time zone, which is what generate_series() returns here, to your column, which is defined as timestamp WITHOUT time zone.
1999-10-31 is a day on which daylight saving time ends (at least in some countries), so the local hour starting at 02:00 occurs twice.
If you change your column to timestamp WITH time zone, your code works without any modification.
If you want to stick with timestamp WITHOUT time zone, you need to convert the value returned by generate_series():
insert into materialized_quarters (tm)
select g.tm at time zone 'UTC' --<< change to the time zone you need
from GENERATE_SERIES ('1999-01-01', '2030-10-30', interval '15 minute') as g(tm)
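The duplicate rows can be reproduced outside the database. The following is a minimal sketch in Python (not PostgreSQL), under the assumption that the session time zone behaves like Central European Time: UTC+2 in summer, falling back to UTC+1 at 1999-10-31 01:00 UTC. The helper names `local_wall_clock` and `quarter_hours` are made up for this illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumption: a Central-European-style zone that is UTC+2 in summer and
# falls back to UTC+1 at 1999-10-31 01:00 UTC.
FALL_BACK_UTC = datetime(1999, 10, 31, 1, 0)


def local_wall_clock(utc: datetime) -> datetime:
    """Convert a naive UTC instant to naive local wall-clock time."""
    offset = timedelta(hours=2) if utc < FALL_BACK_UTC else timedelta(hours=1)
    return utc + offset


def quarter_hours(start: datetime, stop: datetime):
    """Yield instants from start to stop (inclusive) in 15-minute steps."""
    current = start
    while current <= stop:
        yield current
        current += timedelta(minutes=15)


# Distinct instants around the switch, expressed in UTC.
utc_series = list(quarter_hours(datetime(1999, 10, 30, 22, 0),
                                datetime(1999, 10, 31, 4, 0)))
# The same instants stored as local time without a zone.
local_series = [local_wall_clock(t) for t in utc_series]

utc_duplicates = [t for t, n in Counter(utc_series).items() if n > 1]
local_duplicates = sorted(t for t, n in Counter(local_series).items() if n > 1)

print("duplicates in UTC:  ", utc_duplicates)
print("duplicates in local:", [t.strftime("%H:%M") for t in local_duplicates])
```

The instants themselves are all distinct. Only dropping the zone after converting to local time makes 02:00 through 02:45 appear twice, which is why converting to UTC before storing avoids the problem.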

Trouble joining generate_series timestamp without time zone on a field that's timestamp without timezone

I am trying to figure out a way to report how many people are in a location at the same time, down to the second.
I have a table with the id for the person, the date they entered, the time they entered, the date they left and the time they left.
example:
select unique_id, start_date, start_time, end_date, end_time
from My_Table
where start_date between '09/01/2019' and '09/02/2019'
limit 3
"unique_id" "start_date" "start_time" "end_date" "end_time"
989179 "2019-09-01" "06:03:13" "2019-09-01" "06:03:55"
995203 "2019-09-01" "11:29:27" "2019-09-01" "11:30:13"
917637 "2019-09-01" "11:06:46" "2019-09-01" "11:06:59"
I've combined start_date & start_time as well as end_date & end_time, so that they become 2 fields:
select unique_id, ((start_date + start_time)::timestamp without time zone) as start_date,
((end_date + end_time)::timestamp without time zone) as end_date
result example:
"start_date"
"2019-09-01 09:28:54"
So I'm making that a CTE, then using a second CTE that uses generate_series between the dates, down to the second.
The goal is for the generated series to have a row for every second between the two dates. Then, when I join my data sets, I can count how many records exist in my_table where the start_date (plus time) is equal to or greater than the generate_series date_time field, and the end_date (plus time) is less than or equal to the generate_series date_time field.
I feel that was harder to explain than it needed to be.
In theory, if a person was in the room from 2019-09-01 00:01:01 and left at 2019-09-01 00:01:03, I would count that record in the generate_series rows 2019-09-01 00:01:01, 2019-09-01 00:01:02 & 2019-09-01 00:01:03.
When I look at the data, I can see that I should be returning hundreds of people in the room at specific peak periods, but the query returns all 0's.
Is this possibly a field formatting issue I need to adjust?
Here is the query:
with CTE as (
select unique_id, ((start_date+start_time)::timestamp without time zone) as start_date,
((end_date+end_time)::timestamp without time zone) as end_date
from My_table
where start_date between '09/01/2019' and '09/02/2019'
),
time_series as (
select generate_series( (date '2019-09-01')::timestamp, (date '2019-09-02')::timestamp, interval '1 second') as date_time
)
/*FINAL SELECT*/
select date_time, count(B.unique_id) as NumPpl
FROM (
select A.date_time
FROM time_series a
)x
left join CTE b on b.start_date >= x.date_time AND b.end_date <= x.date_time
GROUP BY 1
ORDER BY 1
Thank you in advance
I should also add that I have read-only access to this database, so I'm not able to create functions.
Simple version: b.start_date >= x.date_time AND b.end_date <= x.date_time requires start_date >= date_time >= end_date, so it will never be true assuming end_date is always after start_date. The comparisons are the wrong way round.
Longer version: You do not need a CTE for generate_series(), and there is no reason to wrap that CTE in a subquery that selects all of its rows and columns. I would also drop the CTE for your original data and join the table directly to the series of seconds. Note that this changes the query slightly: it now also counts entries whose start_date is earlier than 2019-09-01. If you do not want that, add your date condition to the join condition, but I guess this is what you really wanted. I also removed some casts that were not needed. Try this:
SELECT gs.second, COUNT(my.unique_id)
FROM generate_series('2019-09-01'::timestamp, '2019-09-02'::timestamp, interval '1 second') gs (second)
LEFT JOIN my_table my ON (my.start_date + my.start_time) <= gs.second
AND (my.end_date + my.end_time) >= gs.second
GROUP BY 1
ORDER BY 1
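To see why the direction of the comparisons matters, here is a small sketch in Python rather than SQL. It uses the single hypothetical stay from the question's own example (in the room from 00:01:01 to 00:01:03) and counts it per second with both conditions. The function names are made up for this illustration.

```python
from datetime import datetime, timedelta

# Hypothetical stay from the question's example: entered 00:01:01, left 00:01:03.
stays = [(datetime(2019, 9, 1, 0, 1, 1), datetime(2019, 9, 1, 0, 1, 3))]


def seconds(start: datetime, stop: datetime):
    """Yield every second from start to stop, inclusive."""
    current = start
    while current <= stop:
        yield current
        current += timedelta(seconds=1)


def count_original(t: datetime) -> int:
    """Original join condition: start_date >= t AND end_date <= t."""
    return sum(1 for start, end in stays if start >= t and end <= t)


def count_fixed(t: datetime) -> int:
    """Corrected join condition: start_date <= t AND end_date >= t."""
    return sum(1 for start, end in stays if start <= t <= end)


window = list(seconds(datetime(2019, 9, 1, 0, 1, 0),
                      datetime(2019, 9, 1, 0, 1, 5)))
original_counts = [count_original(t) for t in window]
fixed_counts = [count_fixed(t) for t in window]

for t, before, after in zip(window, original_counts, fixed_counts):
    print(t.time(), "original:", before, "fixed:", after)
```

With the original condition every second reports 0, matching the symptom in the question. With the corrected condition the stay is counted at 00:01:01, 00:01:02 and 00:01:03.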

Postgres search available time slots with generate_series

I have a table in my postgres database which has a column of dates. I want to search which of those dates is missing - for example:
date
2016-11-09 18:30:00
2016-11-09 19:00:00
2016-11-09 20:15:00
2016-11-09 22:20:00
2016-11-09 23:00:00
Here, 2016-11-09 21:00:00 is missing. After generating the series, if my table has an entry that falls between two slots (slots are 1 hour apart), I need to remove that slot from the result.
I want to write a query with generate_series that returns the date which is missing. Is this possible?
Sample query that I used to generate the series:
SELECT t
FROM generate_series(
TIMESTAMP WITH TIME ZONE '2016-11-09 18:00:00',
TIMESTAMP WITH TIME ZONE '2016-11-09 23:00:00',
INTERVAL '1 hour'
) t
EXCEPT
SELECT tscol
FROM mytable;
But this query is not removing the slots for 2016-11-09 18:30:00, 2016-11-09 20:15:00, etc., because I used EXCEPT.
This is not a gaps-and-islands problem. You just want to find the 1-hour intervals for which no record exists in the table.
EXCEPT does not work here because it compares for equality, while you want to check whether a record exists within a range.
A typical solution for this is an anti-join written as a left join:
select dt
from generate_series(
timestamp with time zone '2016-11-09 18:00:00',
timestamp with time zone '2016-11-09 23:00:00',
interval '1 hour'
) d(dt)
left join mytable t
on t.tscol >= dt and t.tscol < dt + interval '1 hour'
where t.tscol is null
You can also use not exists:
select dt
from generate_series(
timestamp with time zone '2016-11-09 18:00:00',
timestamp with time zone '2016-11-09 23:00:00',
interval '1 hour'
) d(dt)
where not exists (
select 1
from mytable t
where t.tscol >= dt and t.tscol < dt + interval '1 hour'
)
In this demo on DB Fiddle, both queries return:
| dt |
| :--------------------- |
| 2016-11-09 21:00:00+00 |
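The difference between the equality check done by EXCEPT and the range check done by the anti-join can be sketched in Python with the sample rows from the question. This is only an illustration of the logic, not how PostgreSQL executes it. The names `hourly_slots` and `free_slots` are made up here.

```python
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)

# Sample rows from the question (tscol values).
rows = [
    datetime(2016, 11, 9, 18, 30),
    datetime(2016, 11, 9, 19, 0),
    datetime(2016, 11, 9, 20, 15),
    datetime(2016, 11, 9, 22, 20),
    datetime(2016, 11, 9, 23, 0),
]


def hourly_slots(start: datetime, stop: datetime):
    """Yield slot starts from start to stop (inclusive), one per hour."""
    current = start
    while current <= stop:
        yield current
        current += HOUR


def free_slots(slots, taken):
    """Keep slots with no row in the half-open range [slot, slot + 1 hour)."""
    return [dt for dt in slots
            if not any(dt <= ts < dt + HOUR for ts in taken)]


slots = list(hourly_slots(datetime(2016, 11, 9, 18), datetime(2016, 11, 9, 23)))

# What EXCEPT effectively does: drop only slots that equal a row exactly.
equality_only = [dt for dt in slots if dt not in rows]
# What the anti-join / NOT EXISTS does: drop slots that contain any row.
missing = free_slots(slots, rows)

print("EXCEPT-style:", [dt.strftime("%H:%M") for dt in equality_only])
print("range check: ", [dt.strftime("%H:%M") for dt in missing])
```

The equality check still reports 18:00, 20:00 and 22:00 as free because the rows in those hours do not start exactly on the hour. The range check leaves only 21:00.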

max(value) returning 1 empty row on Postgres 9.3.5, works on 8.4

I have a table with epoch values (one per minute, the epoch itself is in milliseconds) and temperatures.
select * from outdoor_temperature order by time desc;
time | value
---------------+-------
1423385340000 | 31.6
1423385280000 | 31.6
1423385220000 | 31.7
1423385160000 | 31.7
1423385100000 | 31.7
1423385040000 | 31.8
1423384980000 | 31.8
1423384920000 | 31.8
1423384860000 | 31.8
[...]
I want to get the highest single value in a given day, which I'm doing like this:
SELECT *
FROM
outdoor_temperature
WHERE
value = (
SELECT max(value)
FROM outdoor_temperature
WHERE
((timestamp with time zone 'epoch' + (time::float/1000) * interval '1 second') at time zone 'Australia/Sydney')::date
= '2015-02-05' at time zone 'Australia/Sydney'
)
AND
((timestamp with time zone 'epoch' + (time::float/1000) * interval '1 second') at time zone 'Australia/Sydney')::date
= '2015-02-05' at time zone 'Australia/Sydney'
ORDER BY time DESC LIMIT 1;
On my Linode, running CentOS 5 and Postgres 8.4, it returns perfectly (I get a single value, within that date, with the maximum temperature). On my MacBook Pro with Postgres 9.3.5, however, the exact same query against the exact same data doesn't return anything. I started simplifying everything to work out what was going wrong, and got to here:
SELECT max(value)
FROM outdoor_temperature
WHERE
((timestamp with time zone 'epoch' + (time::float/1000) * interval '1 second') at time zone 'Australia/Sydney')::date
= '2015-02-05' at time zone 'Australia/Sydney';
max
-----
(1 row)
It's empty, and yet returning one row?!
My questions are:
Firstly, why is that query working against Postgres 8.4 and doing something different on 9.3.5?
Secondly, is there a much simpler way to achieve what I'm trying to do? I feel like there should be but if so I've not managed to work it out. This ultimately needs to work on Postgres 8.4.
I'm not really sure why you're getting no results; you simply seem to be missing data for this day.
But you really should use a different query for selecting a date, as your query cannot use an index: it applies a calculation to the time column for every row.
You should select like this:
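-- extract(epoch ...) yields seconds; time is stored in milliseconds, hence * 1000
select max(value) from outdoor_temperature where
time>=extract(
epoch from
'2015-02-05'::timestamp at time zone 'Australia/Sydney'
) * 1000
and
time<extract(
epoch from
('2015-02-05'::timestamp+'1 day'::interval) at time zone 'Australia/Sydney'
) * 1000
;
This is much simpler, and because the time column is compared directly against two constants, your database can use an index on time, which should be a primary key (with an automatic index). Since extract(epoch ...) returns seconds, both bounds are multiplied by 1000 to match the millisecond values stored in time.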
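The two bounds can be checked outside the database. Below is a rough sketch in Python, assuming Sydney was on daylight time (UTC+11) on 2015-02-05 and modelling that with a fixed offset instead of a real time zone database. `day_bounds_ms` and `in_day` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Assumption: on 2015-02-05 Sydney observed daylight time, i.e. UTC+11.
# A fixed offset is enough for one day; it is not a general Sydney time zone.
SYDNEY_FEB_2015 = timezone(timedelta(hours=11))


def day_bounds_ms(year: int, month: int, day: int, tz: timezone):
    """Return [start, end) of a local calendar day as epoch milliseconds."""
    start = datetime(year, month, day, tzinfo=tz)
    end = start + timedelta(days=1)
    one_ms = timedelta(milliseconds=1)
    return (start - EPOCH) // one_ms, (end - EPOCH) // one_ms


start_ms, end_ms = day_bounds_ms(2015, 2, 5, SYDNEY_FEB_2015)


def in_day(time_ms: int) -> bool:
    """Same shape as the SQL filter: time >= start AND time < end."""
    return start_ms <= time_ms < end_ms


print(start_ms, end_ms)
print(in_day(1423385340000))  # newest sample row shown in the question
```

The bounds come out as 13-digit millisecond values, the same magnitude as the time column. A bound expressed in seconds would be three digits shorter and could never bracket any stored row.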

How to define today's date with a default timestamp

I am using PostgreSQL and I want to get the data for the current date, i.e. filter the rows based on the date.
In the database my plandate field is defined as timestamp with time zone, so its values look like 2013-09-01 03:22:01.438348+05:30. My query is:
select ttodoid ,date,details from ttodo where date=currentdate();
But the currentdate function gives me just the date '2013-10-06', so the result is no rows. How can I get the rows for today's date?
UPDATED: One way to do it
SELECT *
FROM ttodo
WHERE date BETWEEN DATE_TRUNC('day', CURRENT_TIMESTAMP)
AND DATE_TRUNC('day', CURRENT_TIMESTAMP)
+ INTERVAL '1 DAY'
- INTERVAL '1 MICROSECOND';
or
SELECT *
FROM ttodo
WHERE date >= DATE_TRUNC('day', CURRENT_TIMESTAMP)
AND date < DATE_TRUNC('day', CURRENT_TIMESTAMP)
+ INTERVAL '1 DAY';
Here is SQLFiddle demo
select * from ttodo where (ttodo.todoplandate::date = current_date) or
(ttodo.todoplandate::date < current_date);
I think the easier approach would be just to convert your field to date:
SELECT ttodoid ,date,details FROM ttodo
WHERE CAST(date AS DATE) = current_date;
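Notice that if you want this query to use an index, you have to create the index on the same cast expression. (As written this works for a timestamp without time zone column. For timestamp with time zone the cast depends on the session time zone, so PostgreSQL rejects it in an index expression.)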
CREATE INDEX idx_ttodo_date ON ttodo ((CAST(date AS DATE)));
Another approach, instead of casting the field, is to check a range, similar to what petern proposed but with correct bounds:
SELECT ttodoid ,date,details FROM ttodo
WHERE date >= date_trunc('day', current_timestamp)
AND date < (date_trunc('day', current_timestamp) + interval '1day');
This approach has the advantage that it can use an index on the date field only, which is good if you already have it.
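The half-open range used above (from today's midnight up to, but not including, tomorrow's midnight) can be illustrated with a short Python sketch. It uses naive datetimes and ignores time zones for simplicity, which is an assumption of the sketch, and `is_today` is a made-up helper name.

```python
from datetime import datetime, timedelta


def is_today(ts: datetime, now: datetime) -> bool:
    """True if ts falls in [midnight of now's day, next midnight)."""
    day_start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return day_start <= ts < day_start + timedelta(days=1)


now = datetime(2013, 10, 6, 12, 0, 0)
row = datetime(2013, 10, 6, 3, 22, 1, 438348)

# Comparing a timestamp to a bare date compares it to midnight, so it only
# matches rows stamped exactly 00:00:00.
equals_midnight = row == datetime(2013, 10, 6)
in_range = is_today(row, now)
next_midnight_excluded = not is_today(datetime(2013, 10, 7), now)

print(equals_midnight, in_range, next_midnight_excluded)
```

The strict `<` on the upper bound is what makes subtracting one microsecond unnecessary: the first instant of the next day is excluded without relying on the column's precision.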