Speedy SQL comparison (will position matter)?

Is there any performance issue between the queries below because of the position of what is being compared, i.e. the value computed by the system versus the column data read from disk?
SELECT * FROM hashes WHERE hash_key='HASH_KEY' AND NOW() <= expires_on;
SELECT * FROM hashes WHERE hash_key='HASH_KEY' AND expires_on >= NOW();
Will NOW() <= expires_on versus expires_on >= NOW() make any difference over a large number of records?

No, it should not make any difference.
What can make a difference is the position of hash_key='HASH_KEY' relative to that of expires_on >= NOW().
Conditions at the end of the statement are evaluated first, so it's generally a good idea to put the one that filters the most at the end.
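If in doubt, comparing the execution plans on your own data is the quickest way to confirm that the operand order makes no difference (table and column names as in the question):
EXPLAIN SELECT * FROM hashes WHERE hash_key='HASH_KEY' AND NOW() <= expires_on;
EXPLAIN SELECT * FROM hashes WHERE hash_key='HASH_KEY' AND expires_on >= NOW();
Both statements should produce the same plan.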

Related

Postgres - Select all rows with a date column value within a range of another row's value?

I'm trying to use a date value as a starting point to construct a date range within a single postgres query. The date value would be something like
SELECT upgraded_at FROM accounts ORDER BY upgraded_at DESC limit 1;
which would then be used as the starting point. I then want to do something like
SELECT * from accounts WHERE upgraded_at >= (basis_date - 2 days) AND upgraded_at < (basis_date + 2 days);
Ideally I'd like to accomplish this with a single query. So I'll need to use some subquery to get the starting date, then use that as a variable within the rest of the query.
Also, eventually I'm going to be doing this within Sequelize. I definitely need the raw SQL way to do it, but I'm also curious whether there's a Sequelize-specific way.
You can actually avoid making two references to the basis date here.
WITH cte AS (
SELECT *, MAX(upgraded_at) OVER () AS max_upgraded_at
FROM accounts
)
SELECT *
FROM cte
WHERE upgraded_at - max_upgraded_at BETWEEN INTERVAL '-2 days' AND INTERVAL '2 days';
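For comparison, a more literal single-query translation of the original idea uses a scalar subquery as the basis date; the same subquery then appears twice, which the CTE above avoids (upgraded_at is assumed to be a timestamp column):
SELECT *
FROM accounts
WHERE upgraded_at >= (SELECT MAX(upgraded_at) FROM accounts) - INTERVAL '2 days'
  AND upgraded_at <  (SELECT MAX(upgraded_at) FROM accounts) + INTERVAL '2 days';
In Sequelize, either form can be executed as raw SQL via sequelize.query().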

Ecto.Adapters.SQL.query! gives a different result

So this is apparently one of those weird days... And I know this makes zero sense.
I'm executing a query in DataGrip (a tool for running raw queries) against the exact same database as my Phoenix application, and they are returning different results.
The query is quite complicated, but it's the only query that shows different results, so I cannot simplify it. I've tried other queries to be sure that I'm hitting the same database, restarted the server, etc.
Here is the exact same query executed from my console. As you can see it is not the same result. A few rows are missing.
I have also checked whether this is a timing issue by executing select now(): same result (more or less, obviously). If I execute only the generate_series part, it returns the same result. So it could have something to do with the join.
I also checked the last few entries in the ttnmessages table just to be sure there is no general caching issue. The queries also give the same result there.
So my question is: Is there anything that Ecto does differently upon executing a query? How can I figure this out? I'm grateful for any hint.
EDIT: The query is in both cases:
SELECT g.series AS time, MAX((t.payload ->'pulse')::text::numeric) as pulse
FROM generate_series(date_trunc('hour', now())- INTERVAL '12 hours', date_trunc('hour', now()), INTERVAL '60 min') AS g(series)
LEFT JOIN ttnmessages t
ON t.inserted_at < g.series + INTERVAL '60 min'
AND t.inserted_at > g.series
WHERE t.hardware_serial LIKE '093B55DF0C2C525A'
GROUP BY g.series
ORDER BY g.series;
While I did not find out the cause, I changed the query to the following:
SELECT MAX(t.inserted_at) as time, (t.payload ->'pulse')::text::numeric as pulse
FROM ttnmessages t
WHERE t.inserted_at > now() - INTERVAL '12 hours'
AND t.payload ->'pulse' IS NOT NULL
AND t.hardware_serial LIKE '093B55DF0C2C525A'
GROUP BY (t.payload ->'pulse')
ORDER BY time;
Runtime is < 50ms, so I'm happy with the result.
And I'll ignore the different results from the question. The query here returns the same result just like it's supposed to.
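A side observation on the original query, not confirmed as the cause in the thread: filtering the left-joined table in the WHERE clause (t.hardware_serial LIKE ...) discards the NULL rows produced by the LEFT JOIN, so hourly buckets with no matching messages silently disappear. If the outer-join semantics are wanted, the filter can be moved into the ON clause, roughly like this:
SELECT g.series AS time, MAX((t.payload ->'pulse')::text::numeric) AS pulse
FROM generate_series(date_trunc('hour', now()) - INTERVAL '12 hours', date_trunc('hour', now()), INTERVAL '60 min') AS g(series)
LEFT JOIN ttnmessages t
  ON t.inserted_at < g.series + INTERVAL '60 min'
 AND t.inserted_at > g.series
 AND t.hardware_serial LIKE '093B55DF0C2C525A'
GROUP BY g.series
ORDER BY g.series;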

Postgres index timestamp with timezone column

I'm running PostgreSQL 9.6, and I have a table named decks with an expiration column of type timestamp with time zone (for storing decks of cards where each card can expire independently).
I'd like to create a nightly cron job that finds all cards which expired at any point during the previous day—i.e. between 0:00 and 23:59 inclusive.
This seems to give me the time range I want...
SELECT id
FROM decks
WHERE expiration >= (now()::date - 1)::timestamptz
AND expiration < (now()::date)::timestamptz;
...but I'm wondering two things:
What's the best way to index the expiration column for my scenario?
Is there a better/cleaner way to specify the start and end times?
Question 1: For that query, a standard index is the best option. However, see below.
Question 2: Lots of options, here. A quick change to your query:
SELECT id
FROM decks
WHERE expiration::date = (now()::date - 1);
... allows you to create a functional index on expiration::date which should be smaller, and a bit more efficient.
Personally, I'd go a bit further and use current_date instead of now():
SELECT id
FROM decks
WHERE expiration::date = (current_date - 1);
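For the functional index itself, a sketch follows; the index name is made up. One caveat worth noting: because expiration is timestamp with time zone, the bare ::date cast depends on the session TimeZone setting and is only STABLE, so PostgreSQL will reject it inside an index expression. Pinning a zone makes the expression immutable, and the query then has to use the same expression to benefit from the index:
-- hypothetical index name; AT TIME ZONE 'UTC' makes the expression immutable and thus indexable
CREATE INDEX decks_expiration_date_idx ON decks (((expiration AT TIME ZONE 'UTC')::date));

SELECT id
FROM decks
WHERE (expiration AT TIME ZONE 'UTC')::date = (current_date - 1);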
As always, I recommend use of EXPLAIN and EXPLAIN ANALYZE when evaluating indexes.

Postgres 10: "spoofing" now() in an immutable function. A safe idea?

My app reports rainfall and streamflow information to whitewater boaters. Postgres is my data store for the gauge readings, which come in at 15-minute intervals. Over time, these tables get pretty big, and the availability of range partitioning in Postgres 10 inspired me to leave my shared hosting service and build a server from scratch at Linode. My queries on these large tables became way faster after I partitioned the readings into 2-week chunks. Several months down the road, I checked out the query plan and was very surprised to see that using now() in a query caused PG to scan all of the indexes on my partitioned tables. What the heck?!?! Isn't the point of partitioning data to avoid situations like this?
Here's my setup. My partitioned table:
CREATE TABLE public.precip
(
gauge_id smallint,
inches numeric(8, 2),
reading_time timestamp with time zone
) PARTITION BY RANGE (reading_time)
I've created partitions for every two weeks, so I have about 50 partition tables so far. One of my partitions:
CREATE TABLE public.precip_y2017w48 PARTITION OF public.precip
FOR VALUES FROM ('2017-12-03 00:00:00-05') TO ('2017-12-17 00:00:00-05');
Then each partition is indexed on gauge_id and reading_time
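For reference, the per-partition indexes look roughly like this (the index name is assumed; in Postgres 10 indexes have to be created on each partition individually):
CREATE INDEX precip_y2017w48_gauge_time_idx
ON public.precip_y2017w48 (gauge_id, reading_time);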
I have a lot of queries like
WHERE gauge_id = xxx
AND precip.reading_time > (now() - '01:00:00'::interval)
AND precip.reading_time < now()
As I mentioned, Postgres scans the reading_time indexes of every 'child' table rather than querying only the child table(s) that hold timestamps in the query range. If I enter literal values (e.g., precip.reading_time > '2018-03-01 01:23:00') instead of now(), it only scans the indexes of the appropriate child table(s). I've done some reading and I understand that now() is not a plan-time constant, so the planner won't know what the value will be when the query executes. I've also read that query planning is expensive, so Postgres caches plans. I can understand why PG is programmed to do that. However, one counter-argument I read was that a re-planned query is probably way less expensive than a query that ends up ignoring partitions. I agree, and that's probably the case in my situation.
As a workaround, I've created this function:
CREATE OR REPLACE FUNCTION public.hours_ago2(i integer)
RETURNS timestamp with time zone
LANGUAGE plpgsql
COST 100
IMMUTABLE
AS $BODY$
DECLARE
    X timestamp with time zone;
BEGIN
    X := now() + cast(i || ' hours' as interval);
    RETURN X;
END;
$BODY$;
Note the IMMUTABLE declaration. Now when I issue queries like
select * from stream
where gauge_id = 2142 and reading_time > hours_ago2(-3)
and reading_time < hours_ago2(0)
PG only searches the partition table that stores data for that time frame. This is the goal I was shooting for when I set up the partitions in the first place. Booyah. But is this safe? Will the query planner ever cache the results of hours_ago2(-3) and use it over and over again for hours down the road? It's ok if it's cached for a few minutes. Again, my app reports rain and streamflow information; it doesn't deal with financial transactions or any other 'critical' types of data processing. I've tested simple statements like select hours_ago2(-3) and it returns new values every time. So it seems safe. But is it really?
That is not safe because at planning time you have no idea if the statement will be executed in the same transaction or not.
If you are in a situation where query plans are cached, this will return wrong results. Query plans are cached for named prepared statements and statements in PL/pgSQL functions, so you could end up with an out-of-date value for the duration of the database session.
For example:
CREATE TABLE times(id integer PRIMARY KEY, d timestamptz NOT NULL);
PREPARE x AS SELECT * FROM times WHERE d > hours_ago2(1);
The function is evaluated at planning time, and the result is a constant in the execution plan (for immutable functions that is fine).
EXPLAIN (COSTS off) EXECUTE x;
QUERY PLAN
---------------------------------------------------------------------------
Seq Scan on times
Filter: (d > '2018-03-12 14:25:17.380057+01'::timestamp with time zone)
(2 rows)
SELECT pg_sleep(100);
EXPLAIN (COSTS off) EXECUTE x;
QUERY PLAN
---------------------------------------------------------------------------
Seq Scan on times
Filter: (d > '2018-03-12 14:25:17.380057+01'::timestamp with time zone)
(2 rows)
The second query definitely does not return the result you want.
I think you should evaluate now() (or better, an equivalent function on the client side) first, perform your date arithmetic, and supply the result as a parameter to the query. Inside PL/pgSQL functions, use dynamic SQL.
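A minimal sketch of the dynamic-SQL variant, assuming the precip table from the question; the function name and signature are made up. Because the timestamps are interpolated as literals, each call is planned fresh and plan-time partition pruning can take effect:
CREATE OR REPLACE FUNCTION recent_precip(p_gauge smallint, p_hours integer)
RETURNS SETOF precip
LANGUAGE plpgsql STABLE
AS $$
BEGIN
    -- build the timestamp literals at call time so the planner sees constants and can prune partitions
    RETURN QUERY EXECUTE format(
        'SELECT * FROM precip WHERE gauge_id = $1 AND reading_time > %L AND reading_time <= %L',
        now() - make_interval(hours => p_hours), now())
    USING p_gauge;
END;
$$;
-- usage: SELECT * FROM recent_precip(2142::smallint, 3);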
Change the queries to use 'now'::timestamptz instead of now(). Also, interval math on timestamptz is not immutable.
Change your query to something like:
WHERE gauge_id = xxx
AND precip.reading_time > ((('now'::timestamptz AT TIME ZONE 'UTC') - '01:00:00'::interval) AT TIME ZONE 'UTC')
AND precip.reading_time < 'now'::timestamptz

Using DATEDIFF in T-SQL

I am using DATEDIFF in an SQL statement. I am selecting it, and I need to use it in the WHERE clause as well. This statement does not work...
SELECT DATEDIFF(ss, BegTime, EndTime) AS InitialSave
FROM MyTable
WHERE InitialSave <= 10
It gives the message: Invalid column name "InitialSave"
But this statement works fine...
SELECT DATEDIFF(ss, BegTime, EndTime) AS InitialSave
FROM MyTable
WHERE DATEDIFF(ss, BegTime, EndTime) <= 10
The programmer in me says that this is inefficient (seems like I am calling the function twice).
So two questions. Why doesn't the first statement work? Is it inefficient to do it using the second statement?
Note: When I originally wrote this answer I said that an index on one of the columns could create a query that performs better than other answers (and mentioned Dan Fuller's). However, I was not thinking 100% correctly. The fact is, without a computed column or indexed (materialized) view, a full table scan is going to be required, because the two date columns being compared are from the same table!
I believe there is still value in the information below, namely 1) the possibility of improved performance in the right situation, as when the comparison is between columns from different tables, and 2) promoting the habit in SQL developers of following best practice and reshaping their thinking in the right direction.
Making Conditions Sargable
The best practice I'm referring to is one of moving one column to be alone on one side of the comparison operator, like so:
SELECT InitialSave = DateDiff(second, T.BegTime, T.EndTime)
FROM dbo.MyTable T
WHERE T.EndTime <= T.BegTime + '00:00:10'
As I said, this will not avoid a scan on a single table, however, in a situation like this it could make a huge difference:
SELECT InitialSave = DateDiff(second, T.BegTime, T.EndTime)
FROM
dbo.BeginTime B
INNER JOIN dbo.EndTime E
ON B.BeginTime <= E.EndTime
AND B.BeginTime + '00:00:10' > E.EndTime
EndTime is in both conditions now alone on one side of the comparison. Assuming that the BeginTime table has many fewer rows, and the EndTime table has an index on column EndTime, this will perform far, far better than anything using DateDiff(second, B.BeginTime, E.EndTime). It is now sargable, which means there is a valid "search argument"--so as the engine scans the BeginTime table, it can seek into the EndTime table. Careful selection of which column is by itself on one side of the operator is required--it can be worth experimenting by putting BeginTime by itself by doing some algebra to switch to AND B.BeginTime > E.EndTime - '00:00:10'
Precision of DateDiff
I should also point out that DateDiff does not return elapsed time, but instead counts the number of boundaries crossed. If a call to DateDiff using seconds returns 1, this could mean 3 ms elapsed time, or it could mean 1997 ms! This is essentially a precision of +- 1 time unit. For the better precision of +- 1/2 time unit, you would want the following query comparing 0 to EndTime - BegTime:
SELECT DateDiff(second, 0, EndTime - BegTime) AS InitialSave
FROM MyTable
WHERE EndTime <= BegTime + '00:00:10'
This now has a maximum rounding error of only one second total, not two (in effect, a floor() operation). Note that you can only subtract the datetime data type--to subtract a date or a time value you would have to convert to datetime or use other methods to get the better precision (a whole lot of DateAdd, DateDiff and possibly other junk, or perhaps using a higher precision time unit and dividing).
This principle is especially important when counting larger units such as hours, days, or months. A DateDiff of 1 month could be 62 days apart (think July 1, 2013 - Aug 31 2013)!
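A quick illustration of the boundary-counting behavior (the timestamps are just made-up examples):
SELECT DATEDIFF(second, '2013-07-01 23:59:59.997', '2013-07-02 00:00:00.000'); -- 1, though only ~3 ms elapsed
SELECT DATEDIFF(month, '2013-07-01', '2013-08-31'); -- 1, though the dates are about 61 days apart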
You can't reference column aliases defined in the SELECT list from the WHERE clause, because they're not generated until after the WHERE has executed.
You can do this however
select InitialSave from
(SELECT DATEDIFF(ss, BegTime, EndTime) AS InitialSave
FROM MyTable) aTable
WHERE InitialSave <= 10
As a side note, this essentially moves the DATEDIFF into the WHERE clause in terms of where it's first evaluated. Using functions on columns in WHERE clauses prevents indexes from being used efficiently and should be avoided if possible; however, if you've got to use DATEDIFF then you've got to do it!
Beyond making it "work", you need to use an index.
Use a computed column with an index, or a view with an index; otherwise you will table scan. When you get enough rows, you will feel the PAIN of the slow scan!
computed column & index:
ALTER TABLE MyTable ADD
ComputedDate AS DATEDIFF(ss,BegTime, EndTime)
GO
CREATE NONCLUSTERED INDEX IX_MyTable_ComputedDate ON MyTable
(
ComputedDate
) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
create a view & index (an indexed view requires SCHEMABINDING and a unique clustered index; KeyValues below stands in for the table's key columns):
CREATE VIEW dbo.YourNewView
WITH SCHEMABINDING
AS
SELECT
KeyValues
,DATEDIFF(ss, BegTime, EndTime) AS InitialSave
FROM dbo.MyTable
GO
CREATE UNIQUE CLUSTERED INDEX IX_YourNewView
ON dbo.YourNewView(InitialSave, KeyValues)
GO
You have to use the function instead of the column alias - it is the same with count(*), etc. PITA.
As an alternative, you can use computed columns.