I am using UUID version 1 as the primary key. I would like to sort on UUID v1 timestamp. Right now if I do something like this:
SELECT id, title
FROM table
ORDER BY id DESC;
PostgreSQL does not sort records by the UUID timestamp but by the UUID string representation, which produces unexpected ordering in my case.
Am I missing something, or is there no built-in way to do this in PostgreSQL?
The timestamp is one of the parts of a v1 UUID. It is stored in hex format as the number of 100-nanosecond intervals since 1582-10-15 00:00. This function extracts the timestamp:
create or replace function uuid_v1_timestamp (_uuid uuid)
  returns timestamp with time zone as $$
    select to_timestamp(
        (
            ('x' || lpad(h, 16, '0'))::bit(64)::bigint::double precision -
            122192928000000000
        ) / 10000000
    )
    from (
        select
            substring(u from 16 for 3) ||
            substring(u from 10 for 4) ||
            substring(u from 1 for 8) as h
        from (values (_uuid::text)) s (u)
    ) s;
$$ language sql immutable;
select uuid_v1_timestamp(uuid_generate_v1());
uuid_v1_timestamp
-------------------------------
2016-06-16 12:17:39.261338+00
122192928000000000 is the number of 100-nanosecond intervals between the start of the Gregorian calendar (1582-10-15) and the Unix epoch (1970-01-01).
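The constant can be double-checked in SQL itself; this is just a sanity check, not part of the solution:
select extract(epoch from timestamp '1970-01-01' - timestamp '1582-10-15')::numeric * 10000000;
-- returns 122192928000000000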
In your query:
select id, title
from t
order by uuid_v1_timestamp(id) desc
To improve performance, an expression index can be created on that:
create index uuid_timestamp_ndx on t (uuid_v1_timestamp(id));
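With that index in place, a query whose ORDER BY uses the exact same expression can be served by an index scan instead of a sort. A sketch, reusing the table and function from above:
select id, title
from t
order by uuid_v1_timestamp(id) desc
limit 10;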
I have a table where my date/time is of the form: 2020-03-10 22:54:08
It is a timestamp column. I tried the following query:
select ts from table1
where cast(ts as timestamp) = '2020-03-10 22:54:08'
It returns nothing.
How do I query based on date and time in PostgreSQL?
A timestamp has microsecond resolution, so you have to use the same techniques as when testing floating-point numbers: round it, or use only < and > for comparison.
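For example, the rounding approach can use date_trunc(); a sketch with the table and column names from the question:
select ts
from table1
where date_trunc('second', ts) = timestamp '2020-03-10 22:54:08';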
To retrieve data from a database, you need to refer to SQL SELECT syntax. In your situation, the ts column is already a timestamp, so there is no need to use cast(). Bear in mind, however, that a timestamp type contains fractions of a second (i.e., 2020-03-10 22:54:08.xxx), so you would be better off using a comparison operator (>, <, >=, or <=).
You can retrieve all columns by using the * syntax:
select *
from my_table
where ts >= '2020-03-10 22:54:08';
The timestamp type by default also contains microseconds, so now(), which here is, for example, 2020-03-11 01:56:27.593985, is obviously not equal to 2020-03-11 01:56:27. If you do not want microsecond precision in your data, then declare your field like ts timestamp(0) NOT NULL DEFAULT now(), which means "0 decimal digits for fractional seconds":
select
current_timestamp::timestamp as ts,
current_timestamp::timestamp(2) as ts2,
current_timestamp::timestamp(0) as ts0;
ts | ts2 | ts0
---------------------------+------------------------+---------------------
2020-03-11 02:02:52.98298 | 2020-03-11 02:02:52.98 | 2020-03-11 02:02:53
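If the column already exists, its precision can be reduced in place. A hedged sketch; note that the cast to timestamp(0) rounds existing fractional seconds to the nearest second:
alter table my_table alter column ts type timestamp(0);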
Actually this worked:
select distinct ts from my_table where ts >= '2020-03-10 22:54:08' and ts <= '2020-03-10 22:54:09';
But it only returns ts, not the whole rows.
But then I tried this and it worked:
select ts from table1
where to_char(ts,'YYYY-MM-DD HH24:MI:SS') = '2020-03-10 22:54:08'
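Note that wrapping the column in to_char() keeps a plain index on ts from being used. A half-open range matches the same second and stays index-friendly; a sketch:
select ts from table1
where ts >= timestamp '2020-03-10 22:54:08'
  and ts < timestamp '2020-03-10 22:54:09';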
Let us say we have two tables:
CREATE TABLE IF NOT EXISTS tech_time(
ms_since_epoch BIGINT
);
CREATE TABLE IF NOT EXISTS readable_time(
ts timestamp without time zone
);
Let us say tech_time has data and we would like to populate readable_time.
So in Postgres you could use to_timestamp(double precision) and do something like
INSERT INTO readable_time(ts)
SELECT DISTINCT to_timestamp(ms_since_epoch::float / 1000) AS ts
FROM tech_time;
No such function seems to exist in Amazon Redshift:
function to_timestamp(double precision) does not exist
My question is: how do I properly populate readable_time, while losing the least amount of precision?
We can try using DATEADD and add the ms_since_epoch to January 1, 1970:
INSERT INTO readable_time (ts)
SELECT DATEADD(ms, ms_since_epoch, 'epoch')
FROM tech_time;
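If DATEADD overflows on millisecond counts this large (its number argument is an integer, and current epoch milliseconds exceed the 4-byte range; treat this caveat as an assumption to verify on your cluster), plain interval arithmetic from the epoch is an alternative sketch:
INSERT INTO readable_time (ts)
SELECT TIMESTAMP 'epoch' + ms_since_epoch * INTERVAL '0.001 second'
FROM tech_time;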
In MongoDB you can retrieve the date from an ObjectId using the getTimestamp() function. How can I retrieve the date from a MongoDB ObjectId using Postgresql (e.g., in the case where such an ObjectId is stored in a Postgres database)?
Example input:
507c7f79bcf86cd7994f6c0e
Wanted output:
2012-10-15T21:26:17Z
In the MongoDB documentation, the ObjectId is formed with a timestamp as the first 4 bytes, represented in hexadecimal. Assuming that hexadecimal value is stored as a string in PostgreSQL, the following query extracts just the first 8 characters of the ObjectId, converts them to an integer (which is seconds since 1970-01-01), then converts that integer to a timestamp. For example:
SELECT TO_TIMESTAMP(int_val) AS ts_val
FROM (
    SELECT ('x' || lpad(left(objectid, 8), 8, '0'))::bit(32)::int AS int_val
    FROM (
        VALUES ('507c7f79bcf86cd7994f6c0e')
    ) AS t1(objectid)
) AS t2;
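For the example ObjectId this yields ts_val = 2012-10-15 21:26:17+00, which matches the wanted output above.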
Converting a hexadecimal string to integer is discussed here:
Convert hex in text representation to decimal number
The first answer is quite excellent. This one expands on it by making a reusable function out of it.
create function extractMongoTimestamp(text) RETURNS TIMESTAMP WITH TIME ZONE
as
'SELECT TO_TIMESTAMP(int_val) ts_val
FROM (
SELECT (''x'' || lpad(left(objectid,8), 8, ''0''))::bit(32)::int AS int_val
FROM (
VALUES ($1)
) AS t1(objectid)
) AS t2'
language sql
immutable
RETURNS null on null input;
Use it in your query:
select extractMongoTimestamp('507c7f79bcf86cd7994f6c0e');
tl;dr
Using PostgreSQL 9.4, is there a way to retrieve multiple values from a jsonb field, such as you would with the imaginary function:
jsonb_extract_path(x, ARRAY['a_dictionary_key', 'a_second_dictionary_key', 'a_third_dictionary_key'])
With the hope of speeding up the otherwise almost linear time required to select multiple values (1 value = 300ms, 2 values = 450ms, 3 values = 600ms)
Background
I have the following jsonb table:
CREATE TABLE "public"."analysis" (
"date" date NOT NULL,
"name" character varying (10) NOT NULL,
"country" character (3) NOT NULL,
"x" jsonb,
PRIMARY KEY(date,name)
);
With roughly 100 000 rows, where each row has a jsonb dictionary with 90+ keys and corresponding values, I'm trying to write an SQL query to select a few (< 10) keys and values in a fairly quick way (< 500 ms).
Index and querying: 190ms
I started by adding an index:
CREATE INDEX ON analysis USING GIN (x);
This makes querying based on values in the "x" dictionary fast, such as this:
SELECT date, name, country FROM analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100;
This takes ~190 ms (acceptable for us)
Retrieving dictionary values
However, once I start adding keys to return in the SELECT part, execution time rises almost linearly:
1 value: 300ms
select jsonb_extract_path(x, 'a_dictionary_key') from analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100;
Takes 366ms (+175ms)
select x#>'{a_dictionary_key}' as gear_down_altitude from analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100 ;
Takes 300ms (+110ms)
3 values: 600ms
select jsonb_extract_path(x, 'a_dictionary_key'), jsonb_extract_path(x, 'a_second_dictionary_key'), jsonb_extract_path(x, 'a_third_dictionary_key') from analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100;
Takes 600ms (+410, or +100 for each value selected)
select x#>'{a_dictionary_key}' as a_dictionary_key, x#>'{a_second_dictionary_key}' as a_second_dictionary_key, x#>'{a_third_dictionary_key}' as a_third_dictionary_key from analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100 ;
Takes 600ms (+410, or +100 for each value selected)
Retrieving more values faster
Is there a way to retrieve multiple values from a jsonb field, such as you would with the imaginary function:
jsonb_extract_path(x, ARRAY['a_dictionary_key', 'a_second_dictionary_key', 'a_third_dictionary_key'])
which could possibly speed up these lookups. It could return them either as columns, as a list/array, or even as a json object.
Retrieving an array using PL/Python
Just for the heck of it I made a custom function using PL/Python, but that was much slower (5s+), possibly due to json.loads:
CREATE OR REPLACE FUNCTION retrieve_objects(data jsonb, k VARCHAR[])
RETURNS TEXT[] AS $$
if not data:
return []
import simplejson as json
j = json.loads(data)
l = []
for i in k:
l.append(j[i])
return l
$$ LANGUAGE plpython2u;
# Usage:
# select retrieve_objects(x, ARRAY['a_dictionary_key', 'a_second_dictionary_key', 'a_third_dictionary_key']) from analysis where date > '2014-01-01' and date < '2014-05-01'
Update 2015-05-21
I re-implemented the table using hstore with a GIN index and the performance is almost identical to using jsonb, i.e. not helpful in my case.
You're using the #> operator, which looks like it performs a path search. Have you tried a normal -> lookup? Like:
select json_column->'json_field1'
, json_column->'json_field2'
It would be interesting to see what happened if you used a temporary table. Like:
create temporary table tmp_doclist (doc jsonb)
;
insert into tmp_doclist
(doc)
select x
from analysis
where ... your conditions here ...
;
select doc->'col1'
, doc->'col2'
, doc->'col3'
from tmp_doclist
;
This is hard to test without the data.
Create a custom type
create type my_query_result_type as (
    a_dictionary_key float,
    a_second_dictionary_key float
);
And your query
select (json_populate_record(null::my_query_result_type, x::json)).* from analysis;
You should be able to use a temporary table instead of a type, created at runtime, which makes your query dynamic.
But first check whether this helps from a performance point of view.
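Since the column is jsonb, jsonb_populate_record() (available in 9.4) avoids the detour through json. A sketch combining it with the filter from the question:
select (jsonb_populate_record(null::my_query_result_type, x)).*
from analysis
where date > '2014-01-01' and date < '2014-05-01'
  and cast(x #>> '{a_dictionary_key}' as float) > 100;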
I need to include EXTRACT() function within WHERE clause as follow:
SELECT * FROM my_table WHERE EXTRACT(YEAR FROM date) = '2014';
I get a message like this:
ERROR: function pg_catalog.date_part(unknown, text) does not exist
SQL state: 42883
Here is my_table content (gid INTEGER, date DATE):
gid | date
-------+-------------
1 | 2014-12-12
2 | 2014-12-08
3 | 2013-17-15
I have to do it this way because the query is sent from a form on a website that includes a 'Year' field where users enter the year as 4 digits.
The problem is that your column is of data type text, while EXTRACT() only works for date / time types.
You should convert your column to the appropriate data type.
ALTER TABLE my_table ALTER COLUMN date TYPE date;
That's smaller (4 bytes instead of 11 for the text), faster and cleaner (disallows illegal dates and most typos).
If you have non-standard format add a USING clause with a conversion expression. Example:
Alter character field to date
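A hypothetical conversion, assuming the text values look like 2014-12-12:
ALTER TABLE my_table ALTER COLUMN date TYPE date
USING to_date(date, 'YYYY-MM-DD');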
Also, for your queries to be fast with a plain index on date you should rather use sargable predicates. Like:
SELECT * FROM my_table
WHERE date >= '2014-01-01'
AND date < '2015-01-01';
Or, to go with your 4-digit input for the year:
SELECT * FROM my_table
WHERE date >= to_date('2014', 'YYYY')
AND date < to_date('2015', 'YYYY');
You could also be more explicit:
to_date('2014' || '0101', 'YYYYMMDD')
Both produce the same date '2014-01-01'.
Aside: date is a reserved word in standard SQL and a basic type name in Postgres. Don't use it as an identifier.
This happens because the column has a text or varchar type, as opposed to date or timestamp. This is easily reproducible:
SELECT 1 WHERE extract(year from '2014-01-01'::text)='2014';
yields this error:
ERROR: function pg_catalog.date_part(unknown, text) does not exist
LINE 1: SELECT 1 WHERE extract(year from '2014-01-01'::text)='2014';
^ HINT: No function matches the given name and argument types. You might need to add explicit type casts.
extract(), or its underlying function date_part(), does not exist for text-like data types, but it isn't needed here anyway. Extracting the year from this date format is equivalent to taking the first 4 characters, so your query would be:
SELECT * FROM my_table WHERE left(date,4)='2014';
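If this lookup has to be fast while the column stays text, a matching expression index is a possible sketch (left() is immutable, so it can be indexed):
CREATE INDEX ON my_table (left(date, 4));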