How to retrieve last modified timestamp from postgres table and pass that to a condition using pyspark - postgresql

I have a Postgres table "log" with a column called "timestamp" that holds the date and time of files in a folder.
I need to retrieve the latest timestamp from the table and pass it to a "for condition". Initially the table will be empty; from the second iteration onward I need to fetch the value from the table using PySpark.
Please let me know how to go about it.
So far I have tried:
log_qry = """select timestamp from log order by timestamp desc limit 1"""
cursor.execute = log_qry
conn.commit
This does not seem to work.

Your query should look like this:
select timestamp from log order by timestamp desc limit 1
It returns 0 rows if the table log is empty.
It is better to use max, like this:
select max(timestamp) from log
This always returns exactly 1 row: null if the table is empty, otherwise the maximum value of the timestamp column.
Don't use reserved keywords as column names.

timestamp is a reserved word in the SQL standard (PostgreSQL itself still accepts it unquoted as a column name), so it is safest to double-quote it when it is used as a name in a query.
If null is not acceptable for your "for condition", then coalesce it to a date/time far in the past.
select coalesce(max("timestamp"), '0001-01-01T00:00:00'::timestamp) from "log";
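On the client side, the same idea can be sketched in Python, assuming a DB-API connection such as one from psycopg2. The attempt in the question assigns the query to cursor.execute instead of calling it, and never fetches a row. The names fetch_last_timestamp, FALLBACK and demo are hypothetical, and the demo uses the standard-library sqlite3 module only as a stand-in so that it runs without a PostgreSQL server.

```python
import sqlite3
from datetime import datetime

# Hypothetical fallback used while the log table is still empty.
FALLBACK = datetime(1, 1, 1)


def fetch_last_timestamp(conn, fallback=FALLBACK):
    """Return the newest "timestamp" in "log", or fallback if the table is empty."""
    cur = conn.cursor()
    try:
        # execute is a method: call it, do not assign to it.
        cur.execute('select max("timestamp") from "log"')
        # max() always yields exactly one row; its value is None for an empty table.
        (latest,) = cur.fetchone()
    finally:
        cur.close()
    return fallback if latest is None else latest


def demo():
    """Run the helper against an in-memory sqlite3 table (stand-in for Postgres)."""
    conn = sqlite3.connect(":memory:")
    conn.execute('create table "log" ("timestamp" text)')
    first = fetch_last_timestamp(conn)  # table is empty on the first iteration
    conn.execute('insert into "log" values (?)', ("2022-01-05 04:00:00",))
    conn.execute('insert into "log" values (?)', ("2022-06-11 05:13:10",))
    second = fetch_last_timestamp(conn)
    conn.close()
    return first, second
```

No commit is needed for a plain select; the returned value can then be compared against each file's modification time inside the loop.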

Related

PostgreSQL function lags when passing a date from another table as argument

I created a function that retrieves data from multiple tables and inserts the result into a new table. I pass a few dates in the where clause to filter the applicable information. The dates are in timestamptz format. The query takes roughly a minute to run. However, when I change the where clause to take the date from another table, the query becomes much slower, i.e.
My initial condition, which is not slow, is
where timestamp = '2022-01-05 04:00:00+00'
The slowdown appears when I change this where clause to take the date from another table.
where timestamp = select date_time from table_Test
The date_time in table_test is defined as timestamptz and the value is '2022-01-05 04:00:00+00'.
Any idea why the query lags when I pass the date argument from another table?
Thanks,

How to retrieve last time stored in column having timestamp with time zone[] as datatype in PostgreSQL

I have a table with the structure below and want to fetch the last stored datetime. The column last_updated_date is of the timestamp with time zone[] datatype, which stores an array of timestamps. I want to fetch the last recorded datetime, which in this case should be "2022-06-11 05:13:10.559+00".
Table
---------------------------------------
Login
Id
username
login_attempt
last_updated_date timestamp with time zone[]
The column last_updated_date has value something similar to this
{"2022-01-12 12:14:50.329+00","2022-02-17 03:49:45.525+00","2022-06-11 05:13:10.559+00"}
Assuming your timestamp array is always stored oldest to latest, the index of the latest entry equals the number of entries in the array (array_length). Something like:
select Id
, username
, login_attempt
, last_updated_date[array_length(last_updated_date, 1)]
from login;
NOTE: Not tested.
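If the whole array is fetched to the client instead, the same choice can be made there. A minimal sketch in Python, assuming the driver hands the array back as a list of datetime objects (psycopg2 does this for PostgreSQL arrays); the function name latest_entry and the assume_sorted flag are hypothetical.

```python
from datetime import datetime, timezone


def latest_entry(stamps, assume_sorted=True):
    """Return the latest timestamp from a list, or None for an empty list.

    With assume_sorted=True this mirrors the array_length indexing above
    (take the last element); otherwise it scans for the maximum.
    """
    if not stamps:
        return None
    return stamps[-1] if assume_sorted else max(stamps)


# The sample values from the question, written as UTC datetimes.
SAMPLE = [
    datetime(2022, 1, 12, 12, 14, 50, 329000, tzinfo=timezone.utc),
    datetime(2022, 2, 17, 3, 49, 45, 525000, tzinfo=timezone.utc),
    datetime(2022, 6, 11, 5, 13, 10, 559000, tzinfo=timezone.utc),
]
```

Passing assume_sorted=False is the safer option when the insertion order of the array is not guaranteed.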

query to fetch records between two date and time

I have been using PostgreSQL. My table has 3 columns: date, time and userId. I need to find the records between a given start and end date and time. Since the date and time are in separate columns, the BETWEEN clause does not give valid results.
Combine the two columns into a single timestamp by adding the time to the date:
select *
from some_table
where date_column + time_column
between timestamp '2017-06-14 17:30:00' and timestamp '2017-06-19 08:26:00';
Note that this will not use an index on date_column or time_column. You would need to create an index on that expression. Or better: use a single column defined as timestamp instead.
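To illustrate why the two columns have to be combined first, here is a small Python sketch of the same comparison. The function names are hypothetical, and naive_in_window is only an assumption about how a per-column BETWEEN might have been written; the question does not show the failing query.

```python
from datetime import date, datetime, time

# The bounds used in the query above.
START = datetime(2017, 6, 14, 17, 30)
END = datetime(2017, 6, 19, 8, 26)


def in_window(d, t, start=START, end=END):
    """Mirror `date_column + time_column between start and end`."""
    return start <= datetime.combine(d, t) <= end


def naive_in_window(d, t, start=START, end=END):
    """Check date and time separately, which loses the link between them."""
    date_ok = start.date() <= d <= end.date()
    time_ok = start.time() <= t <= end.time()
    return date_ok and time_ok
```

With these bounds the naive version rejects every row, because no time of day is both at or after 17:30 and at or before 08:26, while the combined comparison accepts noon on 2017-06-16 as expected.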

getting second difference with sql statement

I have a table in postgresql which stores time stamp with timezone for every row inserted.
How can I use a PostgreSQL function to find the difference in seconds between the timestamp in one of the rows already inserted and the current PostgreSQL server timestamp?
Assuming that the column name is ts and the table name is t, you can query like this:
select current_timestamp - max(ts) from t;
If the table contains a large amount of data, this query can be very slow. In that case, you should have an index on the timestamp column.
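Note that subtracting two timestamps yields an interval rather than a number of seconds. In SQL, wrapping the expression as extract(epoch from (current_timestamp - max(ts))) converts it to seconds. On the client side, a driver such as psycopg2 returns an interval as a Python timedelta, and the conversion can be sketched as follows; the function name seconds_since is hypothetical.

```python
from datetime import datetime, timezone


def seconds_since(ts, now=None):
    """Return the number of seconds elapsed between ts and now.

    ts must be timezone-aware, as a timestamptz value would be;
    now defaults to the current UTC time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Subtracting two datetimes gives a timedelta, the analogue of an interval.
    return (now - ts).total_seconds()
```

For example, a row stamped 2022-01-05 04:00:00+00 checked at 04:01:30+00 gives 90.0 seconds.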

date_trunc on timestamp column returns nothing

I have a strange problem when retrieving records from db after comparing a truncated field with date_trunc().
This query doesn't return any data:
select id from my_db_log
where date_trunc('day',creation_date) >= to_date('2014-03-05'::text,'yyyy-mm-dd');
But if I add the column creation_date alongside id (i.e. select id, creation_date...), it returns data.
I have another column, last_update_date, of the same type, and when I use that one I see the same behavior.
select id from my_db_log
where date_trunc('day',last_update_date) >= to_date('2014-03-05'::text,'yyyy-mm-dd');
As with the previous query, it returns records if I select id, last_update_date.
To dig further, I added both creation_date and last_update_date to my where clause, and this time both of them must be in my select clause for records to be returned (i.e. select id, creation_date, last_update_date).
Has anyone ever encountered the same problem? The same thing works with my other tables that have columns of this type!
If it helps, here is my table schema:
id serial NOT NULL,
creation_date timestamp without time zone NOT NULL DEFAULT now(),
last_update_date timestamp without time zone NOT NULL DEFAULT now(),
CONSTRAINT db_log_pkey PRIMARY KEY (id),
I asked a different question earlier that didn't get any answer. This problem may be related to that one. If you are interested in that one, here is the link.
EDITS:: EXPLAIN (FORMAT XML) with select * returns:
<explain xmlns="http://www.postgresql.org/2009/explain">
<Query>
<Plan>
<Node-Type>Result</Node-Type>
<Startup-Cost>0.00</Startup-Cost>
<Total-Cost>0.00</Total-Cost>
<Plan-Rows>1000</Plan-Rows>
<Plan-Width>658</Plan-Width>
<Plans>
<Plan>
<Node-Type>Result</Node-Type>
<Parent-Relationship>Outer</Parent-Relationship>
<Alias>my_db_log</Alias>
<Startup-Cost>0.00</Startup-Cost>
<Total-Cost>0.00</Total-Cost>
<Plan-Rows>1000</Plan-Rows>
<Plan-Width>658</Plan-Width>
<Node/s>datanode1</Node/s>
<Coordinator-quals>(date_trunc('day'::text, creation_date) >= to_date('2014-03-05'::text, 'yyyy-mm-dd'::text))</Coordinator-quals>
</Plan>
</Plans>
</Plan>
</Query>
</explain>
"Impossible" phenomenon
The number of rows returned is completely independent of items in the SELECT clause. (But see @Craig's comment about SRFs.) Something must be broken in your db.
Maybe a broken covering index? When you throw in the additional column, you force Postgres to visit the table itself. Try to re-index:
REINDEX TABLE my_db_log;
The manual on REINDEX. Or:
VACUUM FULL ANALYZE my_db_log;
Better query
Either way, use instead:
select id from my_db_log
where creation_date >= '2014-03-05'::date
Or:
select id from my_db_log
where creation_date >= '2014-03-05 00:00'::timestamp
'2014-03-05' is in ISO 8601 format. You can just cast this string literal to date. No need for to_date(), works with any locale. The date is coerced to timestamp [without time zone] automatically when compared to creation_date (being timestamp [without time zone]). More details about timestamps in Postgres here:
Ignoring timezones altogether in Rails and PostgreSQL
Also, you gain nothing by throwing in date_trunc() here. On the contrary, the query gets slower, and a plain index on the column cannot be used (potentially making it much slower).
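The claim that date_trunc() adds nothing here can be checked with a small Python sketch: when the lower bound is a midnight timestamp, truncating the column value to the day first never changes the outcome of the >= comparison. The helper names below are hypothetical and only model the two where clauses.

```python
from datetime import date, datetime, time


def trunc_day(ts):
    """Model date_trunc('day', ts): zero out the time-of-day part."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def matches_with_trunc(ts, day):
    """Model: date_trunc('day', creation_date) >= day."""
    return trunc_day(ts) >= datetime.combine(day, time.min)


def matches_plain(ts, day):
    """Model: creation_date >= day, with the date coerced to midnight."""
    return ts >= datetime.combine(day, time.min)
```

Both predicates select the same rows, but only the plain one compares the bare column, which is what lets a plain index on creation_date be used.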