PostgreSQL - sort by UUID version 1 timestamp

I am using UUID version 1 as the primary key. I would like to sort on the UUID v1 timestamp. Right now if I do something like this:
SELECT id, title
FROM table
ORDER BY id DESC;
PostgreSQL does not sort records by the UUID timestamp, but by the UUID string representation, which produces unexpected ordering in my case.
Am I missing something, or is there no built-in way to do this in PostgreSQL?

The timestamp is one of the parts of a v1 UUID. It is stored in hexadecimal as the number of 100-nanosecond intervals since 1582-10-15 00:00 (the start of the Gregorian calendar). This function extracts the timestamp:
create or replace function uuid_v1_timestamp (_uuid uuid)
returns timestamp with time zone as $$
  select
    to_timestamp(
      (
        -- 60-bit timestamp in 100 ns units, minus the offset between
        -- the Gregorian reform (1582-10-15) and the Unix epoch
        ('x' || lpad(h, 16, '0'))::bit(64)::bigint::double precision -
        122192928000000000
      ) / 10000000  -- 100 ns units per second
    )
  from (
    -- reassemble the timestamp from the UUID text representation:
    -- time_hi (version nibble stripped) || time_mid || time_low
    select
      substring (u from 16 for 3) ||
      substring (u from 10 for 4) ||
      substring (u from 1 for 8) as h
    from (values (_uuid::text)) s (u)
  ) s;
$$ language sql immutable;
select uuid_v1_timestamp(uuid_generate_v1());

       uuid_v1_timestamp
-------------------------------
 2016-06-16 12:17:39.261338+00
122192928000000000 is the number of 100-nanosecond intervals between the start of the Gregorian calendar (1582-10-15) and the Unix epoch (1970-01-01).
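A quick sanity check of that constant (a sketch; Postgres date subtraction yields whole days in the proleptic Gregorian calendar):
select ('1970-01-01'::date - '1582-10-15'::date) as days,
       ('1970-01-01'::date - '1582-10-15'::date)::bigint * 86400 * 10000000 as offset_100ns;
-- days = 141427, offset_100ns = 122192928000000000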
In your query:
select id, title
from t
order by uuid_v1_timestamp(id) desc
To improve performance, an index can be created on that expression:
create index uuid_timestamp_ndx on t (uuid_v1_timestamp(id));
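The planner can only use this expression index when the query repeats the exact indexed expression, as the ORDER BY above does; a quick way to confirm (a sketch):
-- the plan should show an index scan on uuid_timestamp_ndx
-- rather than a sequential scan followed by a sort
explain
select id, title
from t
order by uuid_v1_timestamp(id) desc;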

Related

How to query by date and time in postgresql

I have a table where my date/time is of the form: 2020-03-10 22:54:08
This is a timestamp column. I tried the following query, but it didn't return any rows:
select ts from table1
where cast(ts as timestamp) = '2020-03-10 22:54:08'
How do I query based on date and time in PostgreSQL?
A timestamp has microsecond resolution, so you have to use the same techniques as when testing floating-point numbers: round it, or use only < and > for comparison.
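For example, truncating the column to whole seconds before comparing (a minimal sketch using date_trunc):
-- drops the fractional seconds, so equality matches
select ts from table1
where date_trunc('second', ts) = '2020-03-10 22:54:08';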
To retrieve data from a database, you need to refer to SQL SELECT syntax. In your situation, the ts column is already a timestamp, so there is no need to use cast(). Bear in mind, however, that a timestamp contains fractions of a second (i.e., 2020-03-10 22:54:08.xxx), so you are better off using a comparison operator (>, <, >=, or <=).
You can retrieve all columns by using the * syntax:
select *
from my_table
where ts >= '2020-03-10 22:54:08';
The timestamp type by default also contains microseconds, so now(), which for example is 2020-03-11 01:56:27.593985 here, is obviously not equal to 2020-03-11 01:56:27. If you do not want microsecond precision in your data, then declare your field like ts timestamp(0) NOT NULL DEFAULT now(), which means "0 decimal digits of fractional seconds":
select
  current_timestamp::timestamp as ts,
  current_timestamp::timestamp(2) as ts2,
  current_timestamp::timestamp(0) as ts0;

             ts             |          ts2           |         ts0
----------------------------+------------------------+---------------------
 2020-03-11 02:02:52.98298  | 2020-03-11 02:02:52.98 | 2020-03-11 02:02:53
Actually this worked:
select distinct(ts) from my_table where ts >= '2020-03-10 22:54:08' and ts <= '2020-03-10 22:54:09'
But it doesn't give me the whole rows.
Then I tried this and it worked:
select ts from table1
where to_char(ts, 'YYYY-MM-DD HH24:MI:SS') = '2020-03-10 22:54:08'
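Note that to_char() on the column prevents any index on ts from being used; a half-open range returns the same rows (including whole rows via *) and stays sargable:
-- matches every ts within that one second and can use an index on ts
select *
from table1
where ts >= '2020-03-10 22:54:08'
  and ts <  '2020-03-10 22:54:09';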

Redshift: milliseconds to timestamp

Let us say we have two tables:
CREATE TABLE IF NOT EXISTS tech_time(
    ms_since_epoch BIGINT
);
CREATE TABLE IF NOT EXISTS readable_time(
    ts timestamp without time zone
);
Let us say tech_time has data and we would like to populate readable_time.
So in Postgres you could use to_timestamp(double precision) and do something like:
INSERT INTO readable_time(ts)
SELECT DISTINCT to_timestamp(ms_since_epoch::float / 1000) AS ts
FROM tech_time;
No such function seems to exist in Amazon Redshift:
function to_timestamp(double precision) does not exist
My question is: how do I properly populate readable_time, while losing the least amount of precision?
We can try using DATEADD and add the ms_since_epoch to January 1, 1970:
INSERT INTO readable_time (ts)
SELECT DATEADD(ms, ms_since_epoch, 'epoch')
FROM tech_time;
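DATEADD takes an integer for its number argument, so very large millisecond counts may overflow; if that happens on your cluster (worth verifying), epoch-plus-interval arithmetic is a commonly used alternative (a sketch):
-- divide by 1000.0 (not 1000) to keep sub-second precision
INSERT INTO readable_time (ts)
SELECT TIMESTAMP 'epoch' + (ms_since_epoch / 1000.0) * INTERVAL '1 second'
FROM tech_time;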

How to extract timestamp from mongodb objectid in postgres

In MongoDB you can retrieve the date from an ObjectId using the getTimestamp() function. How can I retrieve the date from a MongoDB ObjectId using PostgreSQL (e.g., when such an ObjectId is stored in a Postgres database)?
Example input:
507c7f79bcf86cd7994f6c0e
Wanted output:
2012-10-15T21:26:17Z
In the MongoDB documentation, the ObjectId is formed with a timestamp as the first 4 bytes, represented in hexadecimal. Assuming that hexadecimal value is stored as a string in PostgreSQL, the following query extracts just the first 8 characters of the ObjectId, converts them to an integer (which is seconds since 1970-01-01), then converts that integer to a timestamp. For example:
SELECT TO_TIMESTAMP(int_val) ts_val
FROM (
    SELECT ('x' || lpad(left(objectid, 8), 8, '0'))::bit(32)::int AS int_val
    FROM (
        VALUES ('507c7f79bcf86cd7994f6c0e')
    ) AS t1(objectid)
) AS t2;
Converting a hexadecimal string to integer is discussed here:
Convert hex in text representation to decimal number
The first answer is excellent. This one expands on it by making a reusable function:
create function extractMongoTimestamp(text) RETURNS TIMESTAMP WITH TIME ZONE
as
'SELECT TO_TIMESTAMP(int_val) ts_val
 FROM (
     SELECT (''x'' || lpad(left(objectid, 8), 8, ''0''))::bit(32)::int AS int_val
     FROM (
         VALUES ($1)
     ) AS t1(objectid)
 ) AS t2'
language sql
immutable
RETURNS NULL ON NULL INPUT;
Use it in your query:
select extractMongoTimestamp('507c7f79bcf86cd7994f6c0e');
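For the ObjectId from the question, this should return the wanted timestamp (display depends on the session's TimeZone setting):
 extractmongotimestamp
------------------------
 2012-10-15 21:26:17+00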

Retrieving multiple values from a large jsonb field faster (postgresql 9.4)

tl;dr
Using PSQL 9.4, is there a way to retrieve multiple values from a jsonb field, such as you would with the imaginary function:
jsonb_extract_path(x, ARRAY['a_dictionary_key', 'a_second_dictionary_key', 'a_third_dictionary_key'])
With the hope of speeding up the otherwise almost linear time required to select multiple values (1 value = 300ms, 2 values = 450ms, 3 values = 600ms)
Background
I have the following jsonb table:
CREATE TABLE "public"."analysis" (
"date" date NOT NULL,
"name" character varying (10) NOT NULL,
"country" character (3) NOT NULL,
"x" jsonb,
PRIMARY KEY(date,name)
);
With roughly 100,000 rows, where each row has a jsonb dictionary with 90+ keys and corresponding values. I'm trying to write an SQL query to select a few (< 10) keys and values in a fairly quick way (< 500 ms).
Index and querying: 190ms
I started by adding an index:
CREATE INDEX ON analysis USING GIN (x);
This makes querying based on values in the "x" dictionary fast, such as this:
SELECT date, name, country FROM analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100;
This takes ~190 ms (acceptable for us)
Retrieving dictionary values
However, once I start adding keys to return in the SELECT part, execution time rises almost linearly:
1 value: 300ms
select jsonb_extract_path(x, 'a_dictionary_key') from analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100;
Takes 366ms (+175ms)
select x#>'{a_dictionary_key}' as gear_down_altitude from analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100 ;
Takes 300ms (+110ms)
3 values: 600ms
select jsonb_extract_path(x, 'a_dictionary_key'), jsonb_extract_path(x, 'a_second_dictionary_key'), jsonb_extract_path(x, 'a_third_dictionary_key') from analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100;
Takes 600ms (+410, or +100 for each value selected)
select x#>'{a_dictionary_key}' as a_dictionary_key, x#>'{a_second_dictionary_key}' as a_second_dictionary_key, x#>'{a_third_dictionary_key}' as a_third_dictionary_key from analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100 ;
Takes 600ms (+410, or +100 for each value selected)
Retrieving more values faster
Is there a way to retrieve multiple values from a jsonb field, such as you would with the imaginary function:
jsonb_extract_path(x, ARRAY['a_dictionary_key', 'a_second_dictionary_key', 'a_third_dictionary_key'])
This could possibly speed up these lookups. It can return them either as columns, as a list/array, or even as a JSON object.
Retrieving an array using PL/Python
Just for the heck of it I made a custom function using PL/Python, but that was much slower (5s+), possibly due to json.loads:
CREATE OR REPLACE FUNCTION retrieve_objects(data jsonb, k VARCHAR[])
RETURNS TEXT[] AS $$
    if not data:
        return []
    import simplejson as json
    j = json.loads(data)
    l = []
    for i in k:
        l.append(j[i])
    return l
$$ LANGUAGE plpython2u;
# Usage:
# select retrieve_objects(x, ARRAY['a_dictionary_key', 'a_second_dictionary_key', 'a_third_dictionary_key']) from analysis where date > '2014-01-01' and date < '2014-05-01'
Update 2015-05-21
I re-implemented the table using hstore with a GIN index and the performance is almost identical to using jsonb, i.e. not helpful in my case.
You're using the #> operator, which looks like it performs a path search. Have you tried a normal -> lookup? Like:
select json_column->'json_field1'
, json_column->'json_field2'
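Applied to the question's table, that would look something like this (a sketch reusing the question's filter):
select x->'a_dictionary_key'        as a_dictionary_key
     , x->'a_second_dictionary_key' as a_second_dictionary_key
from analysis
where date > '2014-01-01' and date < '2014-05-01'
  and cast(x#>> '{a_dictionary_key}' as float) > 100;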
It would be interesting to see what happened if you used a temporary table. Like:
create temporary table tmp_doclist (doc jsonb);

insert into tmp_doclist (doc)
select x
from analysis
where ... your conditions here ...;

select doc->'col1'
     , doc->'col2'
     , doc->'col3'
from tmp_doclist;
This is hard to test without the data.
Create a custom type:
create type my_query_result_type as (
    a_dictionary_key float,
    a_second_dictionary_key float
);
And your query:
select (json_populate_record(null::my_query_result_type, x::json)).* from analysis;
You should be able to use a temporary table instead of the type, created at runtime, making your query dynamic.
But first check whether this helps from the performance point of view.
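Since x is already jsonb and this is 9.4, jsonb_populate_record should let you skip the cast to json entirely (a minimal sketch):
select (jsonb_populate_record(null::my_query_result_type, x)).*
from analysis;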

Extract year from date within WHERE clause

I need to include EXTRACT() function within WHERE clause as follow:
SELECT * FROM my_table WHERE EXTRACT(YEAR FROM date) = '2014';
I get a message like this:
pg_catalog.date_part(unknown, text) does not exist
SQL State: 42883
Here is my_table content (gid INTEGER, date DATE):
 gid |    date
-----+------------
   1 | 2014-12-12
   2 | 2014-12-08
   3 | 2013-17-15
I have to do it this way because the query is sent from a form on a website that includes a 'Year' field where users enter the year as a 4-digit value.
The problem is that your column is of data type text, while EXTRACT() only works for date / time types.
You should convert your column to the appropriate data type.
ALTER TABLE my_table ALTER COLUMN date TYPE date;
That's smaller (4 bytes instead of 11 for the text), faster and cleaner (disallows illegal dates and most typos).
If you have non-standard format add a USING clause with a conversion expression. Example:
Alter character field to date
Also, for your queries to be fast with a plain index on date you should rather use sargable predicates. Like:
SELECT * FROM my_table
WHERE date >= '2014-01-01'
AND date < '2015-01-01';
Or, to go with your 4-digit input for the year:
SELECT * FROM my_table
WHERE date >= to_date('2014', 'YYYY')
AND date < to_date('2015', 'YYYY');
You could also be more explicit:
to_date('2014' || '0101', 'YYYYMMDD')
Both produce the same date '2014-01-01'.
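A quick check of that equivalence:
select to_date('2014', 'YYYY')               as from_year,
       to_date('2014' || '0101', 'YYYYMMDD') as from_full;
-- both columns yield 2014-01-01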
Aside: date is a reserved word in standard SQL and a basic type name in Postgres. Don't use it as identifier.
This happens because the column has a text or varchar type, as opposed to date or timestamp. This is easily reproducible:
SELECT 1 WHERE extract(year from '2014-01-01'::text)='2014';
yields this error:
ERROR: function pg_catalog.date_part(unknown, text) does not exist
LINE 1: SELECT 1 WHERE extract(year from '2014-01-01'::text)='2014';
                       ^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
extract, or its underlying function date_part, does not exist for text-like data types, but it isn't needed anyway. Extracting the year from this date format is equivalent to taking the first 4 characters, so your query would be:
SELECT * FROM my_table WHERE left(date,4)='2014';
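If altering the column right away is not an option, a per-row cast also resolves the original error; note that invalid values such as the 2013-17-15 row in the sample data would make the cast (and the ALTER above) fail until they are cleaned up. A sketch:
-- works only once every text value is a valid ISO date
SELECT * FROM my_table WHERE EXTRACT(YEAR FROM date::date) = 2014;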